By Tiffany Whitfield
Jian Wu, assistant professor of computer science at Old Dominion University, has been recognized for his work on CiteSeerX, a world-renowned academic search engine.
Wu contributed to the work of C. Lee Giles, the David Reese professor of information sciences and technology at Penn State University and creator of the search engine. Wu, Giles and a team of computer scientists were honored by the Information Retrieval Specialist Group of the British Computer Society (BCS) with the Best Open Source Project award at the organization's . The BCS recognizes people, projects and organizations that have excelled in the design of search and information retrieval products and services.
Giles developed CiteSeerX, an adaptive, worldwide, large-scale, open-source academic search engine that launched as CiteSeer in 1998 and was . Wu joined the team in 2012. The search engine houses more than 10 million full-text English documents, along with metadata on 32 million authors and 240 million citation mentions. More than three million users worldwide access the site, accounting for one billion hits and hundreds of millions of downloads every year.
"The team had to overcome both financial and technical challenges to maintain such a production system in an academic setting," Wu said. "The BCS award is a recognition of the persistent work of several generations of team members."
From its inception, CiteSeerX was created to adapt to users' requirements.
"Automatically, we were able to bring up how many citations a paper had gotten," Giles said. "Indexing based on importance was revolutionary at the time."
To perform this indexing and information extraction at scale, CiteSeerX uses several machine learning methods.
The digital archive search engine was one of the pioneering platforms to implement automated citation indexing, connecting papers and researchers as a network. It actively crawls and harvests academic and scientific documents online and uses autonomous citation indexing, making it possible for users to find related papers through citation graphs. It is often considered a predecessor of academic search tools such as Google Scholar and Microsoft Academic Search.
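To illustrate the general idea behind citation indexing, here is a minimal sketch, not CiteSeerX's actual implementation. It assumes the reference lists have already been extracted and matched to paper IDs (the hard part, which CiteSeerX handles with machine learning). The toy corpus, the paper IDs "A" through "F" and the helper names `build_citation_index`, `citation_counts` and `co_cited_with` are all hypothetical. Inverting "paper cites X" into "X is cited by" yields citation counts, and co-citation (two papers appearing in the same reference list) is used here as one simple, swapped-in way to surface related papers from the citation graph.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: each paper ID maps to the IDs of the papers it cites.
# A real system would obtain these links by parsing and matching reference lists.
REFERENCES = {
    "A": ["C", "D"],
    "B": ["C", "D", "E"],
    "C": ["E"],
    "F": ["C"],
}


def build_citation_index(references):
    """Invert paper -> cited papers into cited paper -> set of citing papers."""
    cited_by = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    return cited_by


def citation_counts(cited_by):
    """Count the distinct papers that cite each paper."""
    return {paper: len(citers) for paper, citers in cited_by.items()}


def co_cited_with(paper, references):
    """Rank other papers by how often they share a reference list with `paper`."""
    scores = Counter()
    for cited_list in references.values():
        unique = set(cited_list)
        if paper in unique:
            scores.update(unique - {paper})
    return scores.most_common()


if __name__ == "__main__":
    index = build_citation_index(REFERENCES)
    print(sorted(citation_counts(index).items()))
    print(co_cited_with("C", REFERENCES))
```

Co-citation is only one of several graph-based relatedness signals; bibliographic coupling (papers that cite the same sources) could be computed from the same inverted index.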
"Dr. Wu is a very productive and creative researcher," said Gail Dodge, dean of the College of Sciences at Old Dominion University. "We are proud of his contribution to the innovative CiteSeerX project, and congratulations to him and his team on receiving the BCS award."
"I am very glad to hear that Jian received the prestigious BCS award because this indeed is a recognition of his long-term commitment to the CiteSeerX project," said Ravi Mukkamala, professor and chair of the Department of Computer Science. "He is very active in research and is a shining star among the new faculty that the department has recruited in recent years."
Wu is working with Penn State researchers on the next-generation CiteSeerX.
"We are refactoring CiteSeerX from Solr/Lucene and MySQL to Elasticsearch, all of which is open source," Wu said.
The BCS has more than 60,000 members in 150 countries and is a charity with a royal charter that aims to lead the information technology industry through its ethical challenges, support the people who work in the industry and make IT good for society.