Merely displaying the list of all the pages that contain the words in the query is not much help when there are more than a dozen pages. And with Web search, that is almost always the case. So, one of the most important parts of a search engine is the ranking algorithm, the part of the search engine that decides which pages show up at the top of the results list.
Both organic results and paid results must be ranked, but the organic ranking algorithm is by far the most complicated, so we tackle that first.
Ranking Organic Search Matches
A search engine's organic ranking algorithm is one of the trickiest parts of designing a search engine, so let's start by examining the simplest kind of ranking algorithm.
Ranking is just another word for sorting, the act of collating results into a certain order. Shopping search engines typically use simple ranking algorithms that the searcher can choose. When the searcher is looking for a product to buy, the shopping search engine might start by ordering the results by price (lowest to highest), but the searcher can decide to sort the list by other columns, such as availability (in stock, within one week, and so on), or any other features of the product.
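The searcher-selected sorting described above can be sketched in a few lines. This is only an illustration: the product records, the field names, and the rank_by helper are hypothetical and do not come from any real shopping engine.

```python
from operator import itemgetter

# Hypothetical product listings, invented for illustration only.
products = [
    {"name": "Camera A", "price": 249.00, "days_to_ship": 7},
    {"name": "Camera B", "price": 199.00, "days_to_ship": 0},
    {"name": "Camera C", "price": 329.00, "days_to_ship": 3},
]

def rank_by(results, column, descending=False):
    """Return a new list of results sorted by the column the searcher chose."""
    return sorted(results, key=itemgetter(column), reverse=descending)

# Default view: lowest price first.
by_price = rank_by(products, "price")

# The searcher switches to availability: shortest shipping delay first.
by_availability = rank_by(products, "days_to_ship")
```

Each ordering depends on a single column, which is why this kind of ranking is so much simpler than relevance ranking.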
But the kind of organic search that Yahoo! and Google use over billions of Web pages requires a much more sophisticated approach to ranking. For some kinds of information, such as news stories, ranking results by the date of the information (newest first) might make sense, but most organic search results are ranked by relevance, the degree to which the pages match the subject of the query.
More and more, organic search engines differentiate themselves on how their relevance ranking algorithms work, but every search engine uses certain standard techniques. We look at several factors that go into ranking algorithms, but one of the most interesting parts of designing a search engine is the interplay between those factors. Each factor is an ingredient in the ranking soup, and some engines use more of one ingredient than another, which is one reason why different search engines show different results for the same query.
Because your goal as a search marketer is to get your pages to the top of the list, it is crucial that you understand why search engines put some pages at the top and others far down the list, where few searchers will ever see them. As we discuss these ranking factors, we constantly talk about tendencies, such as, "All else being equal, pages with more of the terms will rank higher than pages with fewer of the terms." But ranking algorithms are deliciously complicated, and all else is rarely equal. Suffice it to say that if you pay attention to each of the factors as you design your site, you will have the best chance at high rankings.
We also try to use the parlance of search marketers and refer to a keyword. Unfortunately, keyword is a somewhat ambiguous term: at times it means an individual word in a query, but at other times it denotes the entire query. We use keyword or query to refer to the entire string of words a searcher types, but refer to a word or a search term when it is important to emphasize an individual word in the search query.
Perhaps the simplest thing search engines look for is how many of the terms in the search query are actually found on the page. All other things being equal, pages that have more of the terms in the query (some search engines require all of the important terms) tend to rank higher.
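As a rough sketch of this first factor, the following counts what fraction of the distinct query terms appear at least once on a page. The tokenize and term_coverage helpers are hypothetical names, and the simple word-splitting rule is an assumption. Real engines also handle word variants and unimportant words, which this sketch ignores.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def term_coverage(page_text, query):
    """Fraction of distinct query terms that occur at least once on the page."""
    terms = set(tokenize(query))
    if not terms:
        return 0.0
    words = set(tokenize(page_text))
    return len(terms & words) / len(terms)

# Made-up page text for illustration.
full_match = term_coverage("Glaucoma eye treatment options", "glaucoma eye treatment")
partial_match = term_coverage("Glaucoma treatment centers", "glaucoma eye treatment")
```

All else being equal, the page with the higher coverage would tend to rank higher.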
However, ranking involves more than mere term occurrence, that is, more than just having all the terms of a particular query on your page. Keyword density, also known as keyword weight, is critical. In the old days of search, the more frequently the terms occurred on the page, the better. For a searcher looking for glaucoma, a page with ten occurrences of the word was considered better than a page with two.
But the advent of Web search drove people to look for quick fixes in search rankings, and they started littering pages with words. (If 10 occurrences were good, why not 50?) So search engines have cracked down on keyword density. Now they look for a particular keyword density on a page and have decided that pages with around 7 percent of the words matching the query (a 7 percent keyword density) are "good" matches.
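Keyword density as described here is simply the share of a page's words that match the query. A minimal sketch follows, assuming a naive word tokenizer. The function names and the sample page are hypothetical, and no real engine's formula is implied.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def keyword_density(page_text, query):
    """Share of the page's words that match any term in the query."""
    words = tokenize(page_text)
    if not words:
        return 0.0
    terms = set(tokenize(query))
    matches = sum(1 for word in words if word in terms)
    return matches / len(words)

# Made-up page text: 8 words, 2 of which are "glaucoma".
page = "Glaucoma treatment options: eye drops treat glaucoma early"
density = keyword_density(page, "glaucoma")
```

A real page is far longer than this sample, so its density would normally be much lower than the value computed here.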
Most search queries, however, have more than one word. So search engines look at frequencies even more deeply. Ranking algorithms often decide between pages with a higher density in one term versus another. That is why a query for "glaucoma eye treatment" might look for pages with heavier densities of glaucoma and treatment rather than eye. Search engines can make these decisions based on the frequency of occurrence of a word throughout the entire Web. So, because the engine knows the word eye is much more common than glaucoma, the word glaucoma is a better differentiator for which pages best match that query. Similarly, pages that mention glaucoma frequently along with treatment are probably better than pages that mention treatment frequently along with one occurrence of the word glaucoma.
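One common way to express "rarer words are better differentiators" is inverse document frequency weighting, the classic TF-IDF idea from information retrieval. The sketch below uses that technique as a stand-in and does not claim that any particular engine uses it. The document counts and page term counts are invented for illustration.

```python
import math

def idf(term, doc_freq, total_docs):
    """Weight a term by rarity: the fewer pages contain it, the larger the weight."""
    return math.log(total_docs / (1 + doc_freq.get(term, 0)))

def weighted_score(term_counts, query_terms, doc_freq, total_docs):
    """Sum each query term's count on the page, scaled by that term's rarity."""
    return sum(
        term_counts.get(term, 0) * idf(term, doc_freq, total_docs)
        for term in query_terms
    )

# Invented collection statistics: "eye" is common, "glaucoma" is rare.
TOTAL_DOCS = 1_000_000
DOC_FREQ = {"eye": 200_000, "treatment": 100_000, "glaucoma": 1_000}
QUERY = ["glaucoma", "eye", "treatment"]

# Page A mentions glaucoma often; page B mentions it once among many "treatment"s.
page_a = {"glaucoma": 5, "treatment": 3, "eye": 1}
page_b = {"glaucoma": 1, "treatment": 8, "eye": 5}

score_a = weighted_score(page_a, QUERY, DOC_FREQ, TOTAL_DOCS)
score_b = weighted_score(page_b, QUERY, DOC_FREQ, TOTAL_DOCS)
```

Page B contains more query words in total, but page A scores higher because its matches are concentrated on the rare term.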
However, there is even more to it than that. If you think about it, the best possible pages might have the words glaucoma and treatment right next to each other. So pages that have higher keyword proximity (the terms are closer together) are often better than those that contain the terms separated by a few words, or worse, a few paragraphs. Web search engines work hard to find as many of the terms in the query as possible, with as many occurrences on the page as possible (up to that magical 7 percent threshold), as close to each other as possible. As you might imagine, it is critical for you, the search marketer, to write your pages using these keywords and phrases. We teach you how to do that in Chapter 12, "Optimize Your Content."
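Keyword proximity can be approximated by the smallest gap, in word positions, between two query terms on a page. The following sketch assumes two distinct terms and a naive tokenizer. The helper names are hypothetical, and real engines likely use richer measures.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def min_distance(page_text, term_a, term_b):
    """Smallest gap in word positions between two distinct terms, or None if either is absent."""
    last_a = last_b = None
    best = None
    for position, word in enumerate(tokenize(page_text)):
        if word == term_a:
            last_a = position
        if word == term_b:
            last_b = position
        # Once both terms have been seen, compare their most recent positions.
        if last_a is not None and last_b is not None:
            gap = abs(last_a - last_b)
            if best is None or gap < best:
                best = gap
    return best

# Made-up page text for illustration.
adjacent = min_distance("New glaucoma treatment works", "glaucoma", "treatment")
separated = min_distance("Glaucoma is a disease; early treatment helps", "glaucoma", "treatment")
```

A smaller result means the terms sit closer together, and a result of 1 means they are adjacent.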
Besides knowing that a page contains the words in a search query, isn't it important to know where they appear on the page? You better believe it. All other things being equal, pages where query terms appear in important places, such as the page title, tend to rank higher than pages where the terms are buried at the page bottom. Pages that feature query words in titles and initial paragraphs are said to have high keyword prominence, because the keywords appear in more prominent places than on other pages.
Why do search engines emphasize keyword prominence? Because search engines are, at heart, pattern-matching machines. They are tuned to recognize various patterns associated with pages that strongly match queries: pages with a pattern of keyword matches in prominent places are stronger matches than others.
So how does the search engine evaluate the prominence of terms it finds in various parts of the page? Here are the major categories, which are also depicted in Figure 2-4:
Figure 2-4. Keyword prominence within a page. Search engines treat matching words differently based on where those words are found on a page.
You can find much more detail on how to craft your pages to be more attractive to search engines through clever term placement in Chapter 12.
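To make the idea of keyword prominence concrete, here is a minimal sketch that counts query matches in different parts of a page and weights them by location. The three page parts and their weights are assumptions chosen only to show the mechanism. They are not the categories or values any search engine actually uses.

```python
import re

# Hypothetical weights: a title match counts more than a match in the body.
FIELD_WEIGHTS = {"title": 3.0, "first_paragraph": 2.0, "body": 1.0}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def prominence_score(page, query):
    """Sum query-term matches across page parts, weighted by each part's prominence."""
    terms = set(tokenize(query))
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        words = tokenize(page.get(field, ""))
        score += weight * sum(1 for word in words if word in terms)
    return score

# Two made-up pages with the same two matching words in different places.
prominent_page = {"title": "Glaucoma Treatment Guide", "body": "Learn more here"}
buried_page = {"title": "Eye Health", "body": "See also glaucoma treatment"}

prominent = prominence_score(prominent_page, "glaucoma treatment")
buried = prominence_score(buried_page, "glaucoma treatment")
```

Both pages contain the same two query words, but the page that carries them in its title scores higher under these assumed weights.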
It might seem to you that term frequency and term placement techniques would suffice for good relevance ranking, but, in practice, they do not. These techniques formed the state of the art in ranking before the Web, but the sheer number of pages on the Web has overwhelmed their effectiveness. Luckily, the Web also made possible a new factor, called link popularity (sometimes called link analysis), that dramatically improves ranking when used in conjunction with these older techniques.
Link popularity is a fancy name for a simple concept: Web pages that other pages link to are better pages than Web pages that no one links to. It makes sense, doesn't it? The best pages on the Web are linked to by lots of other pages, and bad pages are not. Now there are certainly perfectly good pages that are new that no one has discovered yet, but the more links there are to a page, the more it tends to be a high-quality page with up-to-date and valuable information. If it weren't, people would stop linking.
For this reason, link popularity has emerged as a major factor in results ranking, sometimes outweighing the other two factors previously discussed. So, other things being equal, pages with more links to them tend to rank higher than other pages. It is easiest to see why this is a good idea by looking at an example.
Consider a search for the word "glaucoma," a one-word search query. It seems simple enough, but just imagine how difficult a task this is for the search engine. A million pages contain the word glaucoma. Why should we expect that the pages that have the most glaucoma occurrences or that contain glaucoma in the title are the best ones? There must be tens of thousands of pages with glaucoma in the title anyway. How does any search engine pick the top ten from such a long list?
Link popularity is the answer. The best pages for glaucoma are the ones that are the most respected sources of information. And the best surrogate a search engine can find for respect is how well each site is linked to the rest of the Web. Now, this being a book about search engines, we can't leave it that simple, but that is the basic idea.
For a few reasons, the actual algorithms that search engines use are more complex than a simple count of the number of links. One reason is that not all links are equal. If you think about this, you will agree it is true. If you knew that one site about glaucoma was linked to by the American Medical Association's Web site, and another glaucoma site was linked to by someone's personal Web page, which one would you trust more? Undoubtedly, it would be the site with the AMA link. But how can a search engine tell the difference between those links?
It's simple, really. Every Web site on the Internet is given a calculation of its authority, or its intrinsic value, based on the links that come to it. So, as you might expect, the AMA site has high authority, because it has thousands of links coming in, and many of those links are themselves from highly respected sites. And each high-authority site, such as the AMA, conveys some of that authority to each site it links to. So sites that are linked to by high-authority sites have a little bit of that authority rub off, which they can then pass along to the sites they link to. It is complex to calculate, but every search engine uses this type of calculation to help rank its search results.
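The way authority "rubs off" through links can be illustrated with the widely published, simplified form of the PageRank iteration. This is a sketch under stated assumptions, not any engine's production algorithm. The damping value of 0.85 is the commonly cited default, the page names and link graph are invented, and the resulting scores are fractions that sum to 1 rather than ratings on a public scale.

```python
def authority_scores(links, damping=0.85, iterations=50):
    """Iteratively pass each page's authority along its outbound links.

    links maps a page name to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    score = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline share.
        new_score = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's authority evenly among the pages it links to.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page with no outbound links spreads its authority over all pages.
                share = damping * score[page] / n
                for other in pages:
                    new_score[other] += share
        score = new_score
    return score

# Invented link graph: three sites link to "ama"; nobody links to "personal_page".
graph = {
    "site_1": ["ama"],
    "site_2": ["ama"],
    "site_3": ["ama"],
    "ama": ["glaucoma_site_a"],
    "personal_page": ["glaucoma_site_b"],
}
scores = authority_scores(graph)
```

In this toy graph, glaucoma_site_a ends up with more authority than glaucoma_site_b, because the page linking to it has more inbound links of its own.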
Google's algorithm, known as PageRank, is the best known. Google calculates the PageRank of every page on the Web and reports it as a number between zero and ten. To continue with the previous example, if the AMA page that links to a glaucoma site has a PageRank of six, and the personal home page has a PageRank of one, the AMA page confers great authority to its link, whereas the personal home page conveys almost none. Now if that personal home page turns out to be that of a well-known glaucoma researcher and other sites begin to link to that page, its PageRank might rise to three and thus confer more status on pages to which it links.
But links alone are not specific enough to yield good search rankings. A site might receive many inbound links from well-respected sites, but those links might be on different subjects from what is being searched for. Suppose the AMA linked to the glaucoma site because an AMA board member is on the board of the glaucoma organization, and the link was to that board member's biography? What at first seemed like a credible endorsement of glaucoma information now seems like quite a bit less than that.
To be sure about the relevance of the link to the searcher's query, search engines use anchor text, the words that appear as the name of the link on the page. Therefore, a link from the AMA site to the glaucoma site that is actually named "glaucoma" is much more pertinent than one that contains the name of the AMA board member. That is why the search engine uses the names of the links as part of the link popularity analysis, giving much higher consideration to links that contain search terms in the link names.
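A simple way to picture the role of anchor text is to give a link extra weight when its anchor text contains a query term. In the sketch below, the boost factor, the authority numbers, and the function names are all hypothetical. The source says only that such links receive "much higher consideration," not how much.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed rule)."""
    return re.findall(r"[a-z0-9']+", text.lower())

def link_score(inbound_links, query, match_boost=3.0):
    """Add up inbound link authority, boosting links whose anchor text matches the query.

    inbound_links is a list of (source_authority, anchor_text) pairs.
    match_boost is an assumed multiplier, chosen only for illustration.
    """
    terms = set(tokenize(query))
    total = 0.0
    for authority, anchor_text in inbound_links:
        matched = bool(terms & set(tokenize(anchor_text)))
        total += authority * (match_boost if matched else 1.0)
    return total

# The same high-authority source links to two sites with different anchor text.
topical_link = [(6.0, "glaucoma")]
biography_link = [(6.0, "board member biography")]

topical = link_score(topical_link, "glaucoma")
biography = link_score(biography_link, "glaucoma")
```

The two links come from equally authoritative pages, but only the one named "glaucoma" earns the boost for a glaucoma query.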
Search engines are fiendishly complicated, so in truth these algorithms are even more intricate than described here, but this is enough for you to understand the basics. Chapter 13, "Attract Links to Your Site," explains strategies you can use to improve the link factor for your site.
Ranking Paid Placement Matches
Paid placement matches use much simpler ranking algorithms than organic search, but they still require a bit of explanation.
The oldest way of ranking paid matches is the simplest: The highest bidder wins. Overture, the paid placement company now owned by Yahoo!, uses the high bidder technique. Overture invented the paid placement genre, and its Precision Match program supplies paid results for many search engines. Each advertiser bids the amount of money it will pay when a searcher clicks its advertisement, and the search engine displays the highest bidder's ad first among the paid results. Bids can change minute to minute, but the search engine always shows the current high bidder at the top of the list. Like Yahoo!, every other paid placement company except Google uses the highest bidder ranking technique.
Google, never one to shy away from innovation, uses a somewhat different approach. Advertisers participating in Google's AdWords program bid the amount they are willing to pay for a searcher's click, just as with the highest bidder approach, but the highest bidder does not always get the top spot at Google. Instead, Google weighs both the bid and the clickthrough rate (the percentage of searchers who click the result after it has been displayed), choosing the best combination of bid amount and clickthrough rate as #1.
In this way, Google's ranking method rewards more relevant results (those with higher clickthrough rates) because they will rank higher than less-relevant results that have higher bids. Although this ranking algorithm might provide benefits to the searcher by showing more relevant results, it is not motivated entirely by altruism: Google maximizes its overall paid placement revenue using this technique, whereas Yahoo!'s approach maximizes the bids, but at the expense of higher clickthrough.
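The difference between the two paid ranking approaches can be sketched side by side. The text says only that Google weighs the bid together with the clickthrough rate. Multiplying the two, which estimates revenue per time the ad is shown, is an assumption made here for illustration, and the advertisers, bids, and rates are invented.

```python
# Invented ads: bid is the price per click, ctr is the clickthrough rate.
ads = [
    {"advertiser": "A", "bid": 1.00, "ctr": 0.01},
    {"advertiser": "B", "bid": 0.60, "ctr": 0.03},
    {"advertiser": "C", "bid": 0.80, "ctr": 0.02},
]

def rank_by_bid(ads):
    """Highest bidder first, as in the high bidder approach."""
    return sorted(ads, key=lambda ad: ad["bid"], reverse=True)

def rank_by_bid_and_ctr(ads):
    """Best combination of bid and clickthrough first (assumed here to be bid times ctr)."""
    return sorted(ads, key=lambda ad: ad["bid"] * ad["ctr"], reverse=True)

high_bidder_order = [ad["advertiser"] for ad in rank_by_bid(ads)]
combined_order = [ad["advertiser"] for ad in rank_by_bid_and_ctr(ads)]
```

With these made-up numbers, advertiser A wins on bid alone but drops to last once clickthrough is considered, because a high bid earns little if few searchers click.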