Matching the Search Query


The search query is the technical name for what searchers type into a search engine to get search results. When someone enters "glaucoma treatments" into Google (or any other search engine), that is the search query. Experts usually describe each word in the search query as a search term. (In this example, "glaucoma" and "treatments" are search terms.) The search engine goes through several basic steps to find the pages that match, starting with analyzing the query.

Analyzing the Query

As soon as the searcher types a query and presses the Enter key, the search engine goes to work analyzing the query: examining each word (search term) in the query and deciding how to find the best Web pages in the search index that match. Search engines do not all analyze queries the same way, but most engines share some basic analysis techniques. That is what we look at in this section.

Finding Word Variants and Correcting Spelling

In English and other Western languages, the same word can be written in different cases ("Glaucoma Treatments" rather than "glaucoma treatments"). Most search engines pay no attention to case, which is usually what the searcher wants, because a word that starts a sentence (for example) is just as good a match as one that does not. Occasionally, searchers might want a query for "the White House" to match only occurrences in the proper case (and not match a sentence such as "He lives in the white house on Fourth Street"), but search engines generally find better matches by ignoring case.

In the same way, simply matching the exact words in the search query does not always locate what searchers are looking for. For Western languages, most words have multiple forms (singular and plural nouns, verb conjugations) that mean basically the same thing. Many of these words look similar to each other: house and houses express essentially the same concept in a sentence, and they look basically alike. But others that are equally related, such as mouse and mice, look a bit different. Verbs such as is and were look completely different but mean about the same thing. Some search engines know that these word variants should be searched for whenever one is used in a query, so a search for "mouse" looks for "mice," too.
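To make the idea concrete, here is a minimal sketch of case folding and variant expansion. The VARIANTS table and the function names are hypothetical, invented for illustration; a real engine would rely on a much larger dictionary or a stemming algorithm, and no particular engine is known to work exactly this way.

```python
# Hypothetical variant table (an assumption for illustration only).
VARIANTS = {
    "mouse": {"mouse", "mice"},
    "mice": {"mouse", "mice"},
    "house": {"house", "houses"},
    "houses": {"house", "houses"},
}


def expand_term(term):
    """Ignore case, then return the term together with its known variants."""
    normalized = term.casefold()
    return VARIANTS.get(normalized, {normalized})


def expand_query(query):
    """Expand each search term in the query into a set of variants."""
    return [expand_term(term) for term in query.split()]


# "Mouse" is lowercased and expanded; "Glaucoma" has no listed variants.
print(expand_query("Mouse Glaucoma"))
```

Each search term becomes a set of acceptable words, so a page that mentions only "mice" can still match a query for "Mouse."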

But that is not all that search engines do. We have all seen search engines correct our spelling for us, sometimes just going ahead and changing our entry into a correctly spelled word, but usually asking us "Did you mean . . .?" and prompting us with a more common word. Although this is a wonderful feature for searchers, search marketers should beware of clever product names that look like misspelled words, because they might get corrected to real words, making it harder for searchers to find your product. And if customers cannot spell your product names correctly, the spelling-correction algorithms will not always help, so choosing names people can spell easily will work in your favor.
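A rough sketch of a "Did you mean" suggestion follows, using the get_close_matches function from Python's standard difflib module. The vocabulary, the 0.8 similarity cutoff, and the product name "Treatmint" are all assumptions made up for this example; production spelling correction typically draws on query logs and is considerably more sophisticated.

```python
import difflib

# Hypothetical vocabulary of words the engine already knows.
KNOWN_WORDS = ["glaucoma", "treatment", "treatments", "cataract"]


def did_you_mean(term, vocabulary=KNOWN_WORDS):
    """Suggest the closest known word, or None if the term needs no fix."""
    term = term.casefold()
    if term in vocabulary:
        return None  # spelled like a known word, so nothing to suggest
    matches = difflib.get_close_matches(term, vocabulary, n=1, cutoff=0.8)
    return matches[0] if matches else None


print(did_you_mean("glacoma"))    # glaucoma
print(did_you_mean("Treatmint"))  # treatment
```

The second call shows the marketer's problem: a made-up product name that resembles a common word gets "corrected" to that word.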

Detecting Phrases, Antiphrases, and Stop Words

A phrase means something slightly different in search parlance than in its normal use. You might know that most Web search engines allow searchers to enclose multiple words in double quotation marks when they should be searched together, as though they are a single word. These enclosed words are what search engines call a phrase. Phrase searches look for the words exactly as they are in the query, in the same order, and prove useful for finding specific information.

What you might not know is that modern search engines analyze queries to look for phrases even when the searcher does not use quotation marks. Search engines can identify words that occur frequently together and give preference to pages that use the words together.

Similarly, many searchers enter extraneous words that really are not what they are looking for, as in this query: "What is the treatment for glaucoma?" You can imagine that "What is the" does not help find the proper pages and might even throw the search engine off by finding pages that contain what, is, and the. These search terms are called antiphrases and are ignored (or at least treated as less important) in queries by smarter search engines.

This is important to search marketers because you might have a brand name (such as "Where's Waldo?") that looks more like a searcher's question than an actual query. Search engines use other techniques to recognize popular examples such as "Where's Waldo," but if your brand name is not well known, the search engines might not handle it well. If you can avoid cutesy names that will confuse search engines, you will be better off.

Finally, some words are just more important than others. Extremely common words (such as a or the) are usually called stop words because in the old days search engines would not ever look for them. Modern search engines know to pay attention to stop words at times, such as when you are searching for the rock group "The Who." As a search marketer, if you can avoid using stop words as critically important words in your brands and your trademarked names, that will make your names more easily searched.

If you work for clothing retailer The Limited, however, you probably do not have the luxury of changing the name! Unfortunately, you might find it is harder to get high search rankings because Google insists on looking simply for limited even when searchers enter "the limited." Now, The Limited is a well-known company, so when searchers enter "the limited," they will still probably find the right page. If your small business is called The Company, however, you might not be so lucky. And even a company as large as The Limited would find it hard to get high rankings for queries such as "the limited sale" or "the limited locations" when there are so many other pages that have those words on them.
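The handling of antiphrases and stop words described in this section can be sketched roughly as follows. The ANTIPHRASES and STOP_WORDS lists are tiny hypothetical samples, and the sketch simply drops these words, whereas a smarter engine might only treat them as less important.

```python
# Hypothetical sample lists; real engines use much longer ones.
ANTIPHRASES = ["what is the", "where can i find", "how do i"]
STOP_WORDS = {"a", "an", "the", "for", "of"}


def clean_query(query):
    """Strip a leading antiphrase, then drop stop words."""
    text = query.casefold().strip(" ?")
    for antiphrase in ANTIPHRASES:
        if text.startswith(antiphrase + " "):
            text = text[len(antiphrase):].strip()
            break
    return [term for term in text.split() if term not in STOP_WORDS]


print(clean_query("What is the treatment for glaucoma?"))  # ['treatment', 'glaucoma']
print(clean_query("the limited"))                          # ['limited']
```

The second call shows why a name built on a stop word is hard to search for: once the is dropped, only the ordinary word limited is left to match.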

Examining Word Order

Some search engines consider word order when they search, so the results differ depending on whether the search is for "Little Joe" versus "Joe Little." These engines try to find pages in which the words occur in the same order as they do in the query.

Again, this is a boon for searchers, but not always for search marketers. If your product has the catchy name of Enterprise Management Storage System, do not be surprised if some customers remember it as Enterprise Storage Management System or Storage Management Enterprise System. To the extent that your names are memorable in the correct order, that will aid searchability.
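As an illustration, here is one simple way an engine could test whether the query words appear on a page in the same order. The function name is hypothetical, and requiring the words to be strictly adjacent is an assumption of this sketch; an actual engine would more likely use word order as one ranking signal among many.

```python
def contains_in_order(page_text, query):
    """Return True if the query words appear consecutively, in query order."""
    words = page_text.casefold().split()
    terms = query.casefold().split()
    n = len(terms)
    # Slide a window of n words across the page and compare it to the query.
    return any(words[i:i + n] == terms for i in range(len(words) - n + 1))


page = "Our Enterprise Management Storage System ships today"
print(contains_in_order(page, "enterprise management storage"))  # True
print(contains_in_order(page, "enterprise storage management"))  # False
```

A customer who misremembers the word order gets no credit for order on the page that actually describes the product.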

Processing Search Operators

A few savvy searchers know how to use the plus and minus operators in their queries, such as "big brother -tv" to find the Big Brother charitable organization rather than the TV show of the same name. Similarly, searchers can demand that a term be included in the results, as in "+the white house," which keeps the engine from ignoring the as a stop word and matching sentences that merely talk about a white house.

As search engines get smarter, it is less and less important for searchers to use these operators, but search marketers need to know them. If you have a choice, you want to avoid using brand and trademarked names that require these operators to produce good results.
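A minimal sketch of operator processing might look like the following. The function names and the returned dictionary layout are assumptions for illustration; the sketch only filters pages on the plus and minus terms and leaves the plain terms to the rest of the matching process.

```python
def parse_operators(query):
    """Split a query into required (+), excluded (-), and plain terms."""
    parsed = {"required": [], "excluded": [], "plain": []}
    for token in query.casefold().split():
        if token.startswith("+") and len(token) > 1:
            parsed["required"].append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            parsed["excluded"].append(token[1:])
        else:
            parsed["plain"].append(token)
    return parsed


def passes_operators(page_text, parsed):
    """Check that every + term is present and no - term is present."""
    words = set(page_text.casefold().split())
    has_required = all(term in words for term in parsed["required"])
    has_excluded = any(term in words for term in parsed["excluded"])
    return has_required and not has_excluded


parsed = parse_operators("big brother -tv")
print(parsed)  # {'required': [], 'excluded': ['tv'], 'plain': ['big', 'brother']}
print(passes_operators("Big Brother is a popular TV show", parsed))  # False
```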

Choosing Matches to the Query

After the query has been analyzed, the search engine must decide which results to present. With so many possibilities, how does the search engine find matches so quickly? Different approaches are taken for organic and paid results, and we look at the organic approach first.

Selecting Organic Search Matches

An organic search engine uses its search index to locate the matching pages. Basically, the query analysis determined which words to look up (not just the words that were typed as the query, but any word variants, such as mouse and mice) and which to ignore (stop words and antiphrases). The engine goes to work looking up each word in the query to see which pages contain the word.

The search index can be thought of as an alphabetic list of every word that occurs on every page of the Web (as shown in Figure 2-3). For each word, the index stores a list of every Web page that contains it. So, when you look up the word "glaucoma," you get a list of every page that contains that word.

Figure 2-3. How pages are retrieved from the index. Organic search engines check an index for the list of pages that contain each word in the search query.
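The index described above is what search specialists usually call an inverted index. A toy version can be sketched in a few lines; the page names and page text here are made up, and a real index covers billions of pages and stores far more than page names.

```python
from collections import defaultdict

# Hypothetical three-page "Web" for illustration.
PAGES = {
    "page1.html": "glaucoma treatment options",
    "page2.html": "eye treatment clinic",
    "page3.html": "glaucoma symptoms",
}


def build_index(pages):
    """Map each word to the set of pages that contain it, sorted by word."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.casefold().split():
            index[word].add(url)
    return dict(sorted(index.items()))


index = build_index(PAGES)
print(sorted(index["glaucoma"]))  # ['page1.html', 'page3.html']
```

Looking up a word is then a single dictionary access, which is why the engine never has to reread the pages themselves at query time.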


That is the simplest case. It is more complicated when searchers enter more complex queries. If the searcher were looking for "glaucoma treatment," the engine would look up every page that contains each word, giving it a list of pages that contain the word glaucoma and a list of pages that contain the word treatment. The engine must then decide which of those pages to show, and most search engines show just the pages that contain both words. So they look through the two lists and find which Web pages are listed on both.
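Finding the pages listed on both lists is a set intersection. Here is a small sketch, assuming a hypothetical index held as a dictionary of page-name sets; real engines use compressed, sorted lists rather than Python sets.

```python
# Hypothetical index contents for illustration.
INDEX = {
    "glaucoma": {"a.html", "b.html", "c.html"},
    "treatment": {"b.html", "c.html", "d.html"},
}


def pages_with_all(terms, index):
    """Return the pages that appear on the list for every term."""
    page_lists = [index.get(term, set()) for term in terms]
    if not page_lists:
        return set()
    return set.intersection(*page_lists)


print(sorted(pages_with_all(["glaucoma", "treatment"], INDEX)))  # ['b.html', 'c.html']
```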

Some engines have more sophisticated rules about what they show for multiple-word queries. Consider a query such as "glaucoma eye treatment." Because the word eye is so much more common than the other words, some search engines might show some pages that contain the words glaucoma and treatment even if they do not happen to contain the word eye.

When you start adding word variants (eye and eyes, treatment and treatments), you can see that there might be many lists of pages that the search engine must look through quickly to determine the final list of pages to display.
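One plausible way to combine those lists, sketched under the assumption that variants of the same term are interchangeable: merge the lists for each term's variants, and then keep only the pages that survive for every term. The index contents and function name are hypothetical.

```python
# Hypothetical index contents for illustration.
INDEX = {
    "eye": {"p1.html"},
    "eyes": {"p2.html"},
    "treatment": {"p1.html", "p3.html"},
    "treatments": {"p2.html"},
}


def match_with_variants(variant_groups, index):
    """Union the page lists within each group, then intersect across groups."""
    per_term = []
    for group in variant_groups:
        pages = set()
        for variant in group:
            pages |= index.get(variant, set())
        per_term.append(pages)
    return set.intersection(*per_term) if per_term else set()


groups = [{"eye", "eyes"}, {"treatment", "treatments"}]
print(sorted(match_with_variants(groups, INDEX)))  # ['p1.html', 'p2.html']
```

Four lists are consulted for a two-word query, and p2.html matches even though it contains neither eye nor treatment in the exact form typed.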

Selecting Paid Placement Matches

Paid placement results are not retrieved from a search index the way organic matches are. Instead, the search engine consults a database that stores all the listings that advertisers have submitted. Each advertiser selects the words and phrases that should match its listing and submits a bid amount, which is charged to the advertiser each time a searcher clicks the ad.

EVALUATING ORGANIC SEARCH RESULTS: WHAT ARE PRECISION AND RECALL?

Search experts traditionally evaluate organic search engines according to measurements of precision and recall.

Precision measures the percentage of search results that are "correct" answers for a query. Precision attempts to show how well the search engine provides "good" results rather than "bad" results. So, looking at the list of ten results, the searcher can subjectively decide whether each one seems an appropriate answer to the search query. Time was that a person could actually examine every result for a particular query, but with hundreds of thousands of results being returned on the Web, no one can actually measure precision anymore (at least not precisely!).

In contrast, recall compares the number of correct search results returned for a particular query to the total possible correct results that should have been returned for that query. Recall tries to measure the fraction of correct results that were found, rather than missed, by the search engine. Historically, researchers could actually examine every document in a collection and decide subjectively which ones should have been returned for a particular query. With the advent of the Web, this is no longer possible, although sometimes it is still noticeable when a result that should have been present is missing.

Precision and recall work against each other. If a query returns only one page, and it is a correct answer, that is 100 percent precision (all answers are correct), but the recall measure is probably awful. (Likely, many other pages on the Web should also have been found.) Likewise, if a query returns all four billion pages on the Web, it has 100 percent recall (all the correct answers have been returned, and none missed), but precision is lousy because the vast majority of answers are wrong.
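The two extremes are easy to verify with a little arithmetic. This sketch assumes a tiny hypothetical collection of ten pages, four of which are correct answers for the query; the page names and function names are made up.

```python
def precision(returned, correct):
    """Fraction of the returned pages that are correct answers."""
    if not returned:
        return 0.0
    return len(returned & correct) / len(returned)


def recall(returned, correct):
    """Fraction of the correct answers that were returned."""
    if not correct:
        return 0.0
    return len(returned & correct) / len(correct)


all_pages = {f"p{i}" for i in range(1, 11)}  # ten pages in total
correct = {"p1", "p2", "p3", "p4"}           # four of them are correct

# Returning one correct page: perfect precision, poor recall.
print(precision({"p1"}, correct), recall({"p1"}, correct))      # 1.0 0.25

# Returning every page: perfect recall, poor precision.
print(precision(all_pages, correct), recall(all_pages, correct))  # 0.4 1.0
```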

If no one can actually measure recall and precision on the Web, why are they important? Obviously, they are no longer important as measurements, but they remain important as concepts. Precision is important with searches that return many results. Lots of extraneous results frustrate searchers. Less frequently, searchers complain that they cannot find certain results they expect. Low recall of pages is the culprit, even if most searchers do not express the problem that way.

Although only experts (and now you) know these concepts, all searchers understand intuitively that they do not want to see wrong answers and do not want to miss correct answers. And the experts often drive search engine popularity, which is why you care about these concepts. "Buzz" among the digerati can drive usage of one search engine versus another. So if you think that "Google is slipping" or you hear experts talking about how "Yahoo! is improving its recall" or "Ask Jeeves has more accurate results," it might be a sign of a popularity shift coming.


The search engine uses the query analysis to decide which words should be searched for (just as with organic search) and looks up those words in the paid listing database. Each listing associated with the query terms is retrieved from the paid listing database.

Although the process sounds similar to organic search, in practice it is far simpler. The advertisers typically control exactly which words should match their ads, so far less analysis is required to find synonyms, for example. Moreover, rather than sifting through billions of pages, the engine has far fewer advertisements to pick from. In short, paid placement results are chosen much the same way that organic results are: the query is analyzed, and the results that match the words in the query are selected. For paid placement, however, there is typically a lot less work for the search engine to do.
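A paid listing lookup can be sketched as a simple scan of the advertisers' chosen keywords. Everything here is hypothetical: the advertiser names, the bid amounts, and the matching rule (a listing matches when one of its keywords equals the whole query or a single query term). Actual paid placement programs define their own matching options.

```python
# Hypothetical paid listing database for illustration.
PAID_LISTINGS = [
    {"advertiser": "Example Eye Clinic",
     "keywords": {"glaucoma", "glaucoma treatment"}, "bid": 0.75},
    {"advertiser": "Example Drops Co.",
     "keywords": {"eye drops"}, "bid": 0.40},
    {"advertiser": "Example Pharmacy",
     "keywords": {"treatment"}, "bid": 0.25},
]


def match_paid_listings(query, listings):
    """Return advertisers whose chosen keywords match the query."""
    normalized = " ".join(query.casefold().split())
    # Candidates are the whole query plus each individual search term.
    candidates = set(normalized.split()) | {normalized}
    return [item["advertiser"] for item in listings
            if item["keywords"] & candidates]


print(match_paid_listings("Glaucoma Treatment", PAID_LISTINGS))
# ['Example Eye Clinic', 'Example Pharmacy']
```

Because the advertisers supplied the keywords themselves, the lookup needs no variant or synonym analysis, which is the simplification the text describes.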



    Search Engine Marketing, Inc. Driving Search Traffic to Your Company's Web Site
    ISBN: 0136068685
    Year: 2005
    Pages: 138
