Wednesday, May 05, 2004

Google is indexing the full content of the scientific literature

Nature Publishing Group: science journals, jobs and information

From a UMLS list posting by Mike Cumments:
Google appears to be getting ready to index all of the full content of the scientific literature. They are doing a pilot with publishers until the end of 2004. At this stage they seem to be focused on simply doing a PageRank-type search without delving into the semantic content. They do a standard Google search and apply a filter for only the URLs of the nine publishers.
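As a rough illustration of that last step (run an ordinary search, then keep only results hosted by the participating publishers), here is a minimal Python sketch. It is not how Google or CrossRef actually implement the filter; the host names in the allowlist are placeholders, and the function names are made up for this example.

```python
from urllib.parse import urlparse

# Hypothetical allowlist. The pilot's real publisher host names are not
# given here, so these are placeholders only.
PUBLISHER_HOSTS = {"publisher-a.example", "publisher-b.example"}


def is_publisher_url(url, allowed_hosts=PUBLISHER_HOSTS):
    """Return True if the URL's host is an allowed host or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in allowed_hosts)


def filter_results(urls, allowed_hosts=PUBLISHER_HOSTS):
    """Keep only the search-result URLs that point at an allowed publisher."""
    return [u for u in urls if is_publisher_url(u, allowed_hosts)]
```

Matching on the parsed host name rather than on a substring of the URL keeps look-alike domains from slipping through the filter.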

See for background.

The project description is

For a search of the full text content of these publishers using Google go to
From the infotoday article:
CrossRef, a 300-member publisher trade association, has announced a pilot project called CrossRef Search that will enable users to search the full text of scholarly journal articles, conference proceedings, and other sources from nine leading publishers. Google will supply the search technologies and CrossRef the reference links to publisher Web sites. While Google will also incorporate CrossRef content connections into its general Web search engine, users who go to publisher Web sites and click on the CrossRef Search icon will reach just the scholarly subset. However, searching through the icon will access content from all participating publishers...

... Searching CrossRef Search will be available to all Web users at no charge. Content will include current journal issues as well as back files. The system uses CrossRef’s DOIs (Digital Object Identifiers) or standard URLs to identify and link to content.

At present, publishers participating in CrossRef Search are:

American Physical Society
Annual Reviews
Association for Computing Machinery
Blackwell Publishing
Institute of Physics Publishing
International Union of Crystallography (click “search” and scroll down the page)
Nature Publishing Group
Oxford University Press (each journal’s search page includes a link)
John Wiley & Sons, Inc.

These initial publishers produce some 1,100 journals, according to Pentz. Participants have investigations underway to test how to use DOIs to improve indexing and metadata for better retrieval and to enable persistent links from search results to the full text of content at publisher sites.

The initial pilot will last throughout 2004. CrossRef plans to gather feedback from scientists, scholars, and librarians through e-mail forms and formal evaluations using external consultants, according to Pentz. CrossRef is also hoping to discuss similar programs with other search engines, Pentz said.

There are only two rules for joining the pilot program, according to Pentz. “The publisher has to have all their content indexed through the way Google indexes and make the search box available to everyone at no charge.” ...
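The excerpt mentions that the system uses DOIs or standard URLs to identify and link to content. As a small sketch of what linking by DOI can look like, the Python below turns a DOI into a resolver link. The resolver base URL (https://doi.org/) is the public DOI proxy as it exists today and is an assumption here, not something the article specifies; the function name and the loose validity check are likewise just for illustration.

```python
from urllib.parse import quote

# Assumption: the public DOI proxy. The article does not say which
# resolver CrossRef Search uses.
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi):
    """Build a persistent link for a DOI such as '10.1000/182'."""
    doi = doi.strip()
    # Loose sanity check only: DOIs start with '10.' and have a
    # prefix/suffix separated by a slash.
    if not doi.startswith("10.") or "/" not in doi:
        raise ValueError("not a DOI: %r" % doi)
    # Percent-encode characters that are unsafe in a URL path.
    return DOI_RESOLVER + quote(doi, safe="/")
```

For example, `doi_to_url("10.1000/182")` builds a link through the resolver, which redirects to wherever the publisher currently hosts the item. That indirection is what makes the link persistent.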

I doubt there's a large amount of money in this for Google, but it fits with the social mission of their CEOs. I doubt it makes their VCs happy.
