How Google Searches the Web


To get the best out of Google, it helps to know a little about how Google searches the Web. This will help you write better queries and understand why the results you see are ordered the way they are. Google ranks the results of a search using “traditional” factors such as the URL (the Web page address, such as www.google.com), meta tags (invisible pieces of information that a Web page author adds to a page), and keywords (search terms), together with its own patented technology called PageRank.

Google follows four steps to complete the search:

  • Finds all the pages that contain the search keywords.

  • Ranks the pages using “traditional” factors (URL, meta tags, and keyword frequency).

  • Calculates the relevancy of the link text. How related are the keywords to what appears in the link?

  • Displays the results, using PageRank to determine the result order.

Google employs search bots (Web crawlers), which are special programs that search through Web pages, evaluating the pages based on certain criteria and creating an index that allows for rapid search results.
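To see why an index makes searching fast, consider the following minimal sketch of an inverted index, a structure that maps each word to the pages that contain it. The page URLs, the page text, and the function names are invented for illustration; this is not a description of how Google's crawler or index is actually built.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical crawled pages (URL -> visible text).
pages = {
    "example.com/house": "House plans and house prices",
    "example.com/garden": "Garden plans for a small house",
    "example.com/cars": "Used cars and prices",
}
index = build_index(pages)
matches = search(index, "house plans")
```

Because the index is built once, ahead of time, answering a query only requires looking up each keyword and intersecting the resulting sets rather than rereading every page.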

Traditional factors

Web crawlers look at certain information within a Web page to decide where the page should fall within the results of a search. The term for this is relevancy. Search engines usually list the most relevant pages first. Several factors determine how relevant a page is. Some of these are obvious, such as the keywords appearing within the text of the Web page. Other, less obvious, factors include meta tags, descriptive tags, and link text.

Meta tags

Meta tags hold information that is placed within the head section of a Web page and is not visible to the person viewing the page. The information in these tags, which is stored in name/value pairs, is passed to search bots or Web crawlers like the ones Google uses. It often includes keywords and descriptions that the Web page author would like to associate with the page.

Even though Google does not give these tags much weight, it still looks for keywords in them. There are many types of meta tags, but the two most commonly used to influence search ranking are the description and keywords meta tags. Both are written as <META> tags: the NAME attribute is set to description or keywords, and the CONTENT attribute holds the value. The description tag includes a text string describing the contents of the page. This is the description used by Google and other search engines to describe the page to Web searchers. The keywords tag contains a comma-delimited list of keywords the Web page author feels are important in finding the Web page. This can include terms that are not found within the actual text of the page. For example, an e-commerce site might include a keyword such as hefty man in the keywords tag but not include this term in the text shown on the Web site. Someone searching for hefty man clothing may then find this page, whereas without this keyword the page might not have appeared in the search results. Other HTML tags, such as <TITLE>, <H1>, and <H2>, are also searched by Google’s Web crawler for keywords.
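The following sketch shows what a crawler might see when it reads these tags. It uses Python's standard html.parser module to pull the description, the keywords, and the title out of a small sample page. The sample page and the MetaTagParser class are made up for illustration, and the code says nothing about how much weight any search engine gives each value.

```python
from html.parser import HTMLParser

class MetaTagParser(HTMLParser):
    """Collect name/content pairs from <meta> tags and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag and attribute names in lowercase.
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

# Hypothetical page for an e-commerce site.
page = """<html><head>
<title>Big and Tall Menswear</title>
<meta name="description" content="Clothing for big and tall men.">
<meta name="keywords" content="big and tall, hefty man, menswear">
</head><body><h1>Menswear</h1></body></html>"""

parser = MetaTagParser()
parser.feed(page)
keywords = [k.strip() for k in parser.meta.get("keywords", "").split(",")]
```

Note that hefty man appears only in the keywords meta tag, not in the visible text of the page, which is exactly the situation described above.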

The link text

The URL, or link text, has always been important in determining a page’s relevance. When the URL of a Web page contains the keyword, Google considers that page much more relevant than a page that contains the keyword within its text but not within its link text. For example, when you search for the word house using Google, the first five results have the word house in their URLs, as shown in Figure 1.2.

Figure 1.2: Pages whose URLs contain the search terms are considered more relevant.

Cross-Ref 

The Web browser shown in Figure 1.2 displays the Google Toolbar. See Chapter 31 to learn more about downloading and installing the Google Toolbar.

Another factor Google takes into account when considering page relevancy is the frequency with which the keyword appears on the Web page. The more times the word you are searching for appears within the Web page, the higher the relevancy of that page.
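As a rough illustration of these two factors, the sketch below scores a page by counting how often the keyword appears in its text and adding a bonus when the keyword also appears in the URL. The scoring formula, the url_bonus value, and the sample pages are all assumptions made for this example; Google's actual weighting is not given in this chapter.

```python
import re

def keyword_frequency(text, keyword):
    """Count whole-word, case-insensitive occurrences of keyword in text."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def relevance(url, text, keyword, url_bonus=10):
    """Toy score: keyword frequency plus a bonus if the URL contains the keyword."""
    score = keyword_frequency(text, keyword)
    if keyword.lower() in url.lower():
        score += url_bonus  # hypothetical weight, chosen only for illustration
    return score

# Hypothetical pages (URL -> visible text).
pages = {
    "www.example.com/house": "Find a house. Every house listed here is for sale.",
    "www.example.com/listings": "A house, a house, and another house for sale.",
}

# Sort the URLs from most relevant to least relevant for the keyword "house".
ranked = sorted(pages, key=lambda u: relevance(u, pages[u], "house"), reverse=True)
```

In this toy scoring, the first page outranks the second even though it mentions house fewer times, because the keyword also appears in its URL.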

PageRank

PageRank was created at Stanford University in 1996 as part of a research project studying a new kind of search engine. The project was developed by Larry Page and Sergey Brin. They created a functional search prototype that they called Google. Shortly after creating the PageRank technology, Larry and Sergey founded Google, Inc., making use of the PageRank technology as a key element in its new Web search software.

If you search for PageRank on Google, you find the following description:

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important.”

The PageRank that a Web page receives is determined largely by the links that point to the page. Each link is basically a vote but, unlike in a democracy, not all votes are equal. Some votes are more important and have greater value. PageRank gives more value to votes cast by pages that themselves have a high PageRank.

Note 

Some votes can count against you. To increase their PageRank, some Webmasters add their pages to link farms, which are locations on the Internet that exist only to add links to your page. Google punishes this behavior by removing the page from the Google index.

What matters most to PageRank is which Web sites link to a page. For example, when there is a Web page that talks about Labradors (a breed of dog) and the American Kennel Club (www.akc.org) has a link to that page, this link is given greater weight than, say, a link from a personal Web page. This is because the AKC Web page is specifically about dogs and breeds and is likely to have a very high PageRank of its own.
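The idea that a vote from an important page counts for more can be sketched with a simplified PageRank calculation. The code below is not Google's production algorithm: it is a basic power-iteration version, and the damping value of 0.85 comes from Page and Brin's published description of PageRank rather than from this chapter. The link graph and page names are invented to mirror the Labrador example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute simplified PageRank scores by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical link graph: three dog clubs link to "akc", nobody links to "personal".
links = {
    "akc": ["labrador_a"],
    "personal": ["labrador_b"],
    "dog_club_1": ["akc"],
    "dog_club_2": ["akc"],
    "dog_club_3": ["akc"],
    "labrador_a": [],
    "labrador_b": [],
}
ranks = pagerank(links)
```

Both Labrador pages receive exactly one link, yet labrador_a ends up with the higher score because its single vote comes from akc, a page that is itself well linked.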



Google Power Tools Bible
ISBN: 0470097124
Year: 2004
Pages: 353