DISCUSSION


In this experiment, the results retrieved by the four search engines overlap very little. Over 75 percent of the distinct results are returned by only one search engine, and fewer than 3 percent are retrieved by all four. We attribute this to the different ranking algorithms the engines adopt and to the differing coverage of their index databases. On the other hand, the distribution of the overlap is relatively stable across all three cases (top 10, top 20, top 50). This means that overlapping results do not necessarily occur at high ranks.
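As an illustration of how such an overlap distribution might be computed, the following is a minimal sketch. It assumes each engine's results for a query are available as a ranked list of URLs; the function name, the engine labels, and the toy data are hypothetical and are not taken from the experiment.

```python
from collections import Counter

def overlap_distribution(results, k):
    """Return, for n = 1..number of engines, the fraction of distinct
    top-k results that are returned by exactly n engines.

    results: dict mapping an engine label to its ranked list of URLs
             (hypothetical input format, assumed for this sketch).
    """
    engines_per_url = Counter()
    for ranked in results.values():
        # A set avoids counting a URL twice if one engine repeats it.
        for url in set(ranked[:k]):
            engines_per_url[url] += 1

    total_distinct = len(engines_per_url)
    if total_distinct == 0:
        return {n: 0.0 for n in range(1, len(results) + 1)}

    histogram = Counter(engines_per_url.values())
    return {n: histogram.get(n, 0) / total_distinct
            for n in range(1, len(results) + 1)}

# Toy data, invented purely for illustration.
toy_results = {
    "engine1": ["a", "b", "c"],
    "engine2": ["a", "d", "e"],
    "engine3": ["a", "b", "f"],
    "engine4": ["a", "g", "h"],
}
distribution = overlap_distribution(toy_results, k=3)
```

In the toy data, six of the eight distinct URLs come from a single engine and one URL is returned by all four, so the entry for n = 1 dominates the distribution, which is the shape reported in the discussion.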

The search results for queries in different categories behave very differently. For the queries in Category A (specific), all four search engines retrieve the same top 1 result more than 80 percent of the time. However, they show little agreement on the queries in Category C (adult). In all three cases (top 10, top 20, top 50), the total number of distinct results retrieved for queries in Category C is, on average, 25 percent higher than for the queries in Categories A or B. This also indicates that there are no obvious winning web sites for adult-related content.

We present the overlap between any two search engines and the mean overlap of one search engine with every other search engine. Altogether, four search engines give six search engine pairs. Because the overlap is affected by several factors, such as the query category and the case considered (top 10, top 20, or top 50), no single search engine pair attains the maximum overlap in all cases. However, when we calculate the average overlap for each search engine, Google achieves the highest average overlap in every case. This indicates that, to some degree, Google's results are the ones most widely shared by the other search engines.
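The pairwise and mean overlap described above can be sketched as follows. This is only an illustration under a stated assumption: overlap between two engines is taken here to be the number of URLs their top-k lists share, which may differ from the measure used in the experiment. The function names and toy data are hypothetical.

```python
from itertools import combinations

def pairwise_overlap(results, k):
    """Number of shared top-k URLs for every pair of engines.

    Assumption for this sketch: overlap is a raw count of shared URLs.
    """
    top = {name: set(ranked[:k]) for name, ranked in results.items()}
    return {(a, b): len(top[a] & top[b])
            for a, b in combinations(sorted(top), 2)}

def mean_overlap(results, k):
    """Average overlap of each engine with all the other engines."""
    sums = {name: 0 for name in results}
    for (a, b), shared in pairwise_overlap(results, k).items():
        sums[a] += shared
        sums[b] += shared
    others = len(results) - 1
    if others <= 0:
        return {name: 0.0 for name in results}
    return {name: total / others for name, total in sums.items()}

# Toy data, invented purely for illustration.
toy_results = {
    "engine1": ["a", "b", "c"],
    "engine2": ["a", "d", "e"],
    "engine3": ["a", "b", "f"],
    "engine4": ["a", "g", "h"],
}
pairs = pairwise_overlap(toy_results, k=3)
means = mean_overlap(toy_results, k=3)
```

With four engines, `pairs` holds six entries, one per engine pair. An engine whose results are frequently shared with the others ends up with the highest value in `means`.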

Unlike the overlap, the distance of search varies little across the search engine pairs. This may result from the fact that each search engine independently adopts its own ranking algorithm. Because the distance is approximately 0.4 in most cases, the ranking algorithms adopted by different search engines can, to some degree, produce similarly ranked lists. The distance for the queries in Category A in the top 20 case is lower than in the other situations, because the top 1 result overlaps most often for the queries in Category A.




(ed.) Intelligent Agents for Data Mining and Information Retrieval
Year: 2004
Pages: 171
