Search Engines

Limitations

Size

A search for "Barack Obama" in December 2008 retrieved over 106 million pages. When you are doing research, more is not really better: you could never review all of those pages, and you will most likely run out of patience within the first few pages of results. The enormous indexes that search engines build can backfire on you if you aren't able to narrow things down enough.

Relevance judgments

Search engines see your keywords, but they don't understand the context of your research. When a search engine evaluates a web page for your results list, it is trying to find the sites that best match your keywords and that show some sign of value. It cannot really judge the quality of the sites, though, so it relies on other signs of value instead. For instance, it might consider how often and where your keywords appear on a page (in the title, in headings, or in the body text), how many other sites link to that page, and how popular those linking sites are themselves.
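
To make the idea concrete, here is a minimal sketch in Python of that kind of scoring. It is a toy illustration, not any real engine's algorithm; the pages, fields, and weights are invented for the example. Notice that every signal it uses is a stand-in for value, and nothing in it measures accuracy, authority, or bias.

# Toy relevance scoring: count keyword matches, weight the title more
# heavily, and reward pages that many other sites link to. None of
# these signals says anything about whether a page is accurate.
def toy_relevance_score(page, query_terms):
    score = 0.0
    for term in query_terms:
        term = term.lower()
        score += page["text"].lower().count(term)       # keywords in the body
        score += 5 * page["title"].lower().count(term)  # keywords in the title count more
    score += 0.1 * page["inbound_links"]                # popularity: links from other sites
    return score

# Two invented pages: a heavily linked biography and a personal blog post.
pages = [
    {"title": "Barack Obama: Biography", "text": "Barack Obama served as ...", "inbound_links": 12000},
    {"title": "My trip downtown", "text": "I think I saw Barack Obama once.", "inbound_links": 3},
]

# Rank the pages for the query "Barack Obama", highest score first.
ranked = sorted(pages, key=lambda p: toy_relevance_score(p, ["Barack", "Obama"]), reverse=True)
print([p["title"] for p in ranked])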

It is important to understand that although these criteria can work very well for many purposes, they can be problematic for academic research. When you are looking for materials for a class, you are looking for reliable, authoritative, accurate, relatively unbiased materials, and the major search engines simply do not have a way to judge websites by these criteria.

What is more, these relevance judgments can be manipulated by savvy web developers. In fact, a whole industry has grown up around Search Engine Optimization (SEO): strategies for getting websites placed higher in results lists. There are even cases of "Google bombing," in which people manipulate the search engine to produce a particular result. A few years ago, a search for "miserable failure" returned as its top result the White House's official biography of George W. Bush. So, although appearing high in search results isn't meaningless, it is no guarantee of a site's quality, credibility, or authority.

And be sure to watch out for areas labeled "Sponsored Links," "Sponsored Results," or something similar. These are simply advertisements; they may or may not lead to good sites, but the only reason they appear where they do is that their owners have paid the search engine for that placement.

Some search engines are trying to separate out more reliable results. Hakia, for instance, offers a category of "credible sites" for some queries, and Scirus, a search engine for scientific topics, marks some results as the "preferred web." These tools can help you, but the criteria they apply in making these distinctions are not always rigorous, and because the sorting is done automatically by an algorithm, you will still need to approach the results skeptically.

Gaps

Even though the major search engines have indexed billions of pages, many pages are still missing. There are various reasons a page might not be in the index: it may be too new to have been crawled yet; no other pages may link to it, so the spider never finds it; it may sit behind a login, paywall, or search form in a database (the so-called invisible or deep web); or the site's owner may have asked crawlers not to index it.
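
That last reason, a site asking crawlers to stay away, is something you can check for yourself. The short Python sketch below uses the standard library's urllib.robotparser to read a site's robots.txt file and report whether a general-purpose crawler is permitted to fetch a given page; the URLs here are hypothetical placeholders, so substitute a real site to try it.

from urllib.robotparser import RobotFileParser

# Hypothetical example URLs; replace them with any real site and page.
ROBOTS_URL = "https://www.example.gov/robots.txt"
PAGE_URL = "https://www.example.gov/reports/annual-report.pdf"

parser = RobotFileParser()
parser.set_url(ROBOTS_URL)   # point the parser at the site's robots.txt
parser.read()                # download and parse the site's rules

# "*" means "any crawler"; a False answer means well-behaved search
# engine spiders will skip this page, so it never enters their index.
if parser.can_fetch("*", PAGE_URL):
    print("Crawlers may index this page.")
else:
    print("Crawlers are asked not to index this page.")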

According to recent studies, web search engines are still not adequately indexing U.S. government sites and collections of scholarly literature.