Using PageRank


Google uses the PageRank algorithm to order the results returned by specific search queries. As such, understanding PageRank is crucial to core SEO efforts to improve natural search results.

Depending on who you ask, PageRank is named either after its inventor, Lawrence Page, Google's co-founder, or because it is a mechanism for ranking pages.

When a user enters a query, also called a search, into Google, the results are returned in the order of their PageRank.

Originally fairly simple in concept, PageRank now reportedly processes more than 100 variables. Since the exact nature of this "secret sauce" is, well, secret, the best thing you can do from an SEO perspective is more or less stick to the original concept.

The underlying idea behind PageRank is an old one that librarians used in the pre-Web past to provide an objective method of scoring the relative importance of scholarly documents. The more citations other documents make to a particular document, the more "important" the document is, the higher its rank in the system, and the more likely it is to be retrieved first.

Let me break it down for you:

Each web page is assigned a number depending upon the number of other pages that link to the page.

The crucial element that makes PageRank work is the nature of the Web itself, which depends almost solely on the use of hyperlinking between pages and sites. In the system that makes Google's PageRank algorithm work, links are a web popularity contest: Webmaster A thinks Webmaster B's site has good information (or is cool, or looks good, or is funny), so Webmaster A may decide to add a link to Webmaster B's site. In turn, Webmaster B might return the favor.

A link from Web site A to Web site B is called an outbound link (from A) and an inbound link (to B).
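
To make the inbound/outbound terminology concrete, here is a minimal Python sketch that counts both kinds of links. The site names and the links dictionary are made up for illustration; a real crawler would gather this data from the pages themselves.

```python
from collections import defaultdict

# Hypothetical link data: each key is a site, each value lists the sites it links to.
links = {
    "site-a": ["site-b", "site-c"],
    "site-b": ["site-a"],
    "site-c": ["site-b"],
}

def count_links(links):
    """Return (outbound, inbound) link counts per site."""
    # Outbound links: how many links leave each site.
    outbound = {site: len(targets) for site, targets in links.items()}
    # Inbound links: how many links point at each site.
    inbound = defaultdict(int)
    for targets in links.values():
        for target in targets:
            inbound[target] += 1
    return outbound, dict(inbound)

outbound, inbound = count_links(links)
print("outbound:", outbound)
print("inbound:", inbound)
```

In this made-up data, site-b collects the most inbound links, so under the simple counting idea described above it would be the most "popular" of the three.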

The more inbound links a page has (references from other sites), the more likely it is to have a higher PageRank. However, not all inbound links are of equal weight when it comes to how they contribute to PageRank, nor should they be. A web page gets a higher PageRank if another significant source (by significant source I mean a source that also receives a lot of inbound links, and thus has a higher PageRank) links to it than if a trivial site without traffic provides the inbound link.


Note: PageRank is essentially a recursive algorithm; a given page's PageRank is the sum of the PageRanks of the pages that link to it (weighted by the total number of links, of course). In this scheme, a link from a high PageRank page clearly counts for more than a link from a low-ranking page.
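
As a rough illustration of the weighting described in this note, the Python sketch below adds up what two linking pages pass along to a single target page. The PageRank values and link counts are invented for the example, and the damping factor discussed later in this sidebar is deliberately left out.

```python
# Hypothetical pages linking to one target page:
# name -> (PageRank of the linking page, number of outbound links on that page).
inbound_pages = {
    "high-rank page": (0.4, 2),
    "low-rank page": (0.02, 2),
}

def link_share(pagerank, outbound_count):
    """Share of a page's PageRank passed along each of its outbound links."""
    return pagerank / outbound_count

# Each linking page contributes its PageRank divided by its number of outbound links.
shares = {name: link_share(pr, n) for name, (pr, n) in inbound_pages.items()}
weighted_sum = sum(shares.values())

for name, share in shares.items():
    print(f"{name} contributes {share:.3f}")
print(f"weighted sum: {weighted_sum:.3f}")
```

Both linking pages have the same number of outbound links, yet the high-ranking one contributes twenty times as much, which is the point the note makes.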

The more sophisticated version of the PageRank algorithm currently used by Google involves more than simply crunching the number of links to a page and the PageRank of each page that provides an inbound link. While Google's exact method of calculating PageRank is shrouded in proprietary mystery, PageRank does try to exclude links from so-called link farms, pages that contain only links, and mutual linking (individual two-way links put up for the sole purpose of boosting PageRanks).


Tip: The easiest way to see the comparative PageRank for your web pages is to install the Google Toolbar. With a web page open, the PageRank is shown in the Toolbar on a scale of 0 to 10. These PageRanks are really between 0 and 1, so although the 0 to 10 scale is useful for comparison purposes, it does not represent an actual PageRank number. Note that you may have to specifically turn on the feature that displays PageRanks in the Google Toolbar; in some installations this feature is not enabled by default.

From the viewpoint of SEO, it's easy to understand some of the implications of PageRank. If you want your site to have a high PageRank, then you need to get as many high-ranked sites as possible to link to you. Paradoxically, outbound links reduce the PageRank of the linking site because they reduce overall traffic on the linking site (users are more likely to leave the original site if they have several links they can click).

However, useful outbound links draw traffic to the linking site and encourage other sites to return the favor because they respect the quality of the links the original site provides. So for SEO there's a delicate balancing act with outbound linking: some quality outbound links add merit to a site, but too many outbound links decrease desirability. Trial and error is probably the only way to get this one right.

Getting Technical: PageRank and Random Surfers

The PageRank formula can be thought of as a model of the behavior of "random surfers." Such a random surfer visits a random web page, keeps clicking links randomly, never clicking the back button, and eventually gets bored enough to visit a new random page by typing a web address into the browser. The probability that the random surfer visits a particular page is its PageRank. How likely the random surfer is to get bored at each page and request a new random page is set by the damping factor, represented by d in the formula.
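
The random surfer can be simulated directly. The following Python sketch is only an illustration: the four-page link graph, the step count, and the seed are invented, and it assumes that d is the chance of following a link on the current page (so 1 - d is the chance of getting bored and jumping), which is how d enters the formula quoted in this sidebar. The value 0.85 is a commonly cited choice, not something this sidebar specifies.

```python
import random

# Hypothetical four-page web: each page maps to the pages it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def simulate_random_surfer(links, d=0.85, steps=100_000, seed=0):
    """Estimate how often a random surfer lands on each page.

    Assumption of this sketch: with probability d the surfer follows a random
    link on the current page; otherwise the surfer jumps to a random page.
    """
    rng = random.Random(seed)
    pages = list(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        if links[current] and rng.random() < d:
            current = rng.choice(links[current])  # keep clicking links
        else:
            current = rng.choice(pages)  # bored: start over at a random page
    return {page: count / steps for page, count in visits.items()}

frequencies = simulate_random_surfer(links)
for page, share in sorted(frequencies.items()):
    print(f"{page}: {share:.3f}")
```

Page D has no inbound links, so the surfer reaches it only through random jumps and it ends up with the smallest share of visits. Page C, which three other pages link to, ends up well ahead of B.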

Put this way, the PageRank for a specific web page can clearly be calculated by going through all the inbound links to a page, calculating the PageRanks of all these pages, backing up to calculate the inbound links in turn to the new set of pages, and so on, all the way back until there are no more inbound links. A little more technically, a web page's PageRank can be calculated by iterating recursively through all of its inbound linked pages. This is the fundamental method behind Google's search engine, although in the real world (as you likely know if you've read this far in this sidebar) there are usually non-recursive techniques that calculate results more quickly than the corresponding recursive algorithm.

The original formula for PageRank, with further explanation, is contained in the Brin and Page paper at Stanford University (http://wwwdb.stanford.edu/~backrub/google.html). Here it is (PR stands for PageRank; A stands for a random page, identified as Page A; T1 ... Tn signifies all the pages that link to Page A; C(A) represents the number of Page A's outbound links):

 PR(A) = (1-d) + d(PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

PageRanks form a probability distribution over web pages, so the sum of all web pages' PageRanks is one.
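
Here is a minimal Python sketch of the non-recursive approach mentioned earlier: it applies the original formula to every page over and over until the numbers settle. The four-page link graph, the value d = 0.85, and the iteration count are assumptions made for the example, and this is the early academic formulation, not Google's current method. With the formula in this form, and every page having at least one outbound link, the raw scores add up to the number of pages, so the sketch divides by the total to get values that sum to one.

```python
# Hypothetical four-page web: each page maps to the pages it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}

def pagerank(links, d=0.85, iterations=100):
    """Repeatedly apply PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))."""
    ranks = {page: 1.0 for page in links}  # arbitrary starting guess
    for _ in range(iterations):
        new_ranks = {}
        for page in links:
            # Sum PR(T)/C(T) over every page T that links to this page.
            inbound_sum = sum(
                ranks[source] / len(targets)
                for source, targets in links.items()
                if page in targets
            )
            new_ranks[page] = (1 - d) + d * inbound_sum
        ranks = new_ranks
    return ranks

raw = pagerank(links)
total = sum(raw.values())
normalized = {page: score / total for page, score in raw.items()}

for page in sorted(raw):
    print(f"{page}: raw {raw[page]:.3f}, normalized {normalized[page]:.3f}")
```

Page D, which nobody links to, is left with the minimum score of 1 - d, while page C, with three inbound links, comes out on top.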

The formula for PageRank has, of course, evolved since this formulation (as I've mentioned, it now reportedly involves more than 100 variables), and its exact nature is part of Google's proprietary technology. It's still the case that the best insight for SEO purposes into how Google works comes from this early academic formulation.




Search Engine Optimization
The Truth About Search Engine Optimization
ISBN: 0789738317
Year: 2004
Pages: 54
Authors: Rebecca Lieb
