BACKGROUND


Many web searching studies that examine the characteristics of search engines try to estimate the coverage and overlap of the general web search engines. Using 575 queries obtained from the query log of the NEC research laboratory, Lawrence and Giles (1998) estimated that, by the end of 1997, the indexable web contained 320 million pages. Bharat and Broder (1998) described a different technique for measuring the relative size and overlap of public web search engines. In contrast to Lawrence and Giles (1998), they constructed more uniform random queries from a lexicon of 400,000 words, which was built from the vocabulary of 300,000 pages present in the Yahoo! hierarchy. In a later study, Lawrence and Giles (1999) introduced another method, random sampling of IP addresses, and estimated the size of the Web at 800 million pages as of February 1999. All of these studies found that the overlap among the index databases of the general search engines is surprisingly small. Other researchers studied the dynamic characteristics of the Web, such as search engine performance over time (see Bar-Ilan, 2001; Bar-Ilan, 2002) and the growth and update dynamics of search engines (Risvik & Michelsen, 2002).

Other studies emphasize the searching behaviors of web users by analyzing the query logs of practical web search engines. Silverstein et al. (1998) analyzed a six-week period (from August 2 to September 13, 1998) of AltaVista search engine query logs consisting of approximately 1 billion queries. Jansen et al., Spink et al. (2001) and Spink et al. (2002) analyzed Excite web search engine query logs collected at three points in time: September 1997, December 1999 and May 2001, respectively. All of them report similar findings: users tend to submit short queries, mostly view only a few top-ranked web pages, and seldom modify their queries. These studies also identify some of the most popular queries. The latest study by Spink et al. (2002) shows that, although search topics have shifted, there is little change in user search behaviors. Other related studies examine the effect of advanced operators on simple queries (Jansen, 2000) and term co-occurrence in Internet search engine queries (Wolfram, 1999).

However, users care little about the size of the Web or about which search engine has the largest index database. They are more concerned with the overlap of the top N (10, 20 or 50) hits that the general search engines return for specific queries, which motivates this study of the overlap and distance of search engine results.
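
To make the notion of top-N overlap concrete, the following is a minimal sketch and not the measure used later in this study. It assumes that overlap is taken as the fraction of the top N hits that two engines share; the function name `top_n_overlap`, the normalization, and the sample URLs are all hypothetical.

```python
def top_n_overlap(results_a, results_b, n=10):
    """Share of the top-n hits that two ranked result lists have in common.

    Assumption: overlap is the number of shared URLs divided by the size
    of the smaller top-n set. Other normalizations are equally plausible.
    """
    top_a = set(results_a[:n])
    top_b = set(results_b[:n])
    if not top_a or not top_b:
        return 0.0
    return len(top_a & top_b) / min(len(top_a), len(top_b))


# Hypothetical ranked result lists from two engines for the same query.
engine_a = ["url1", "url2", "url3", "url4"]
engine_b = ["url3", "url5", "url1", "url6"]

overlap_top4 = top_n_overlap(engine_a, engine_b, n=4)  # 2 shared of 4
overlap_top2 = top_n_overlap(engine_a, engine_b, n=2)  # no shared hits
```

This set-based view ignores rank order within the top N; a distance between result lists would also have to account for where each shared hit is ranked.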




(ed.) Intelligent Agents for Data Mining and Information Retrieval
Year: 2004
Pages: 171