CONCLUSION | (ed.) Intelligent Agents for Data Mining and Information Retrieval

In this chapter, we carried out a simple case study on the overlap and distance of the search results returned by multiple search engines for a set of popular queries. We submitted 58 sample queries, provided by the WordTracker service, to four general-purpose search engines (Google, AltaVista, Alltheweb and WiseNut). The queries were divided into three categories: specific, general and adult-related. Three cases (top 10, top 20 and top 50 results) were considered in the experiment.
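This conclusion does not restate the exact overlap formula. The following Python sketch therefore assumes one common definition: the overlap of two engines at cutoff k is the number of URLs shared by their top-k lists, divided by k. The function names, engine labels and URL lists are hypothetical placeholders, not data from the study.

```python
from itertools import combinations


def overlap_at_k(results_a, results_b, k):
    """Share of the top-k slots whose URLs appear in both top-k lists.

    Assumed definition (illustrative): |top_k(A) & top_k(B)| / k.
    """
    return len(set(results_a[:k]) & set(results_b[:k])) / k


def pairwise_overlap(results_by_engine, k):
    """Overlap at cutoff k for every unordered pair of engines."""
    return {
        (a, b): overlap_at_k(results_by_engine[a], results_by_engine[b], k)
        for a, b in combinations(sorted(results_by_engine), 2)
    }


# Hypothetical ranked URL lists for a single query (not data from the study).
results = {
    "engine_1": ["u1", "u2", "u3", "u4"],
    "engine_2": ["u1", "u3", "u5", "u6"],
    "engine_3": ["u7", "u8", "u1", "u2"],
}

# Toy cutoffs; the study itself used the top 10, top 20 and top 50 results.
for k in (2, 4):
    print(k, pairwise_overlap(results, k))
```

In this toy input, engine_1 and engine_3 share nothing in their top 2 but half of their top 4. This shows that overlap can change noticeably with the cutoff. Sorting the engine names keeps the pair keys deterministic.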

The highlights of our findings in this experiment are summarized as follows:

The search results returned by different search engines have little overlap.
The search results for queries in different categories behave in dramatically different ways. Search engines usually return the same top-ranked result for a query in Category A, while there is very little overlap for queries in Category C.
Different pairs of search engines show different degrees of overlap in their search results. However, in all cases in this study, Google has the highest overlap with the other search engines.
Compared with overlap, the distance between the search results retrieved by different search engines shows only slight variation. This indicates that each search engine independently adopts its own ranking algorithm.
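Overlap alone ignores where the shared results appear in each ranking. The Python sketch below illustrates a rank-aware distance. It is a normalized footrule-style distance and is not necessarily the measure used in this chapter. A URL missing from one top-k list is assumed to sit at position k + 1. The function name and rankings are hypothetical. Both comparisons share the same two URLs, but their distances differ because those URLs are ranked differently.

```python
def footrule_distance(results_a, results_b, k):
    """Normalized footrule-style distance between two top-k URL lists.

    Assumptions (illustrative, not taken from the chapter):
    - each list holds at least k distinct URLs;
    - a URL missing from one top-k list is placed at position k + 1.
    Returns 0.0 for identical rankings and 1.0 for disjoint top-k lists.
    """
    top_a, top_b = results_a[:k], results_b[:k]
    pos_a = {url: rank for rank, url in enumerate(top_a, start=1)}
    pos_b = {url: rank for rank, url in enumerate(top_b, start=1)}
    missing = k + 1
    # Sum the rank displacement of every URL seen in either top-k list.
    total = sum(
        abs(pos_a.get(url, missing) - pos_b.get(url, missing))
        for url in pos_a.keys() | pos_b.keys()
    )
    # Two disjoint top-k lists give the maximum total, k * (k + 1).
    return total / (k * (k + 1))


# Hypothetical rankings for one query (not data from the study).
ranking_a = ["u1", "u2", "u3"]
ranking_b = ["u1", "u2", "u4"]  # shares u1, u2 with ranking_a at the same ranks
ranking_c = ["u4", "u2", "u1"]  # shares u1, u2 with ranking_a, but u1 drops to rank 3

print(footrule_distance(ranking_a, ranking_b, 3))
print(footrule_distance(ranking_a, ranking_c, 3))
```

With these inputs, the first comparison gives about 0.17 and the second gives 0.5, although both pairs have the same overlap of 2/3 at k = 3. This is why a distance measure can add information that overlap does not capture.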

Although only 58 popular queries and four major search engines were examined, this study illustrates that the distinct characteristics of queries in different categories, together with the independent ranking algorithm adopted by each search engine, lead to markedly different search results. These findings should shed light on future research into effective result-merging algorithms for metasearch engines and into algorithms for evaluating search engines.