Blooper 36: Needle in a Haystack: Piles of Irrelevant Hits


Just as you don't want a search facility to miss relevant items, you don't want it to return a lot of items that aren't really relevant to your search terms. Search facilities are measured not only by their recall (the ability to find all relevant items) but also by their precision (the ability to exclude irrelevant items). On the Web, search facilities are also measured by their ability to sort results by relevance, so the most relevant items are listed first.
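The two measures can be made concrete with a small calculation. The following is a minimal sketch, assuming each item has an ID and that someone has already judged which items are relevant to the query. The function name and the numbers are illustrative, not taken from any particular search product.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for a single search.

    returned: IDs of the items the search facility listed (hypothetical)
    relevant: IDs of the items a person judged relevant to the query
    """
    returned, relevant = set(returned), set(relevant)
    hits = returned & relevant  # relevant items that were actually returned
    precision = len(hits) / len(returned) if returned else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative case: seven results come back, only one is relevant,
# and that one is the only relevant item that exists.
p, r = precision_recall(returned=range(1, 8), relevant=[1])
print(f"precision={p:.2f} recall={r:.2f}")
```

In this case recall is perfect because nothing relevant was missed, while precision is low because six of the seven results are noise. A search facility can score well on one measure and badly on the other.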

Unfortunately, search facilities that bury relevant items in irrelevant and barely relevant ones are even more common than search facilities that overlook relevant items. Irrelevant items are an annoying distraction even when relatively few in number, especially if presented as if they were relevant. Conversely, a large number of low-relevance items isn't too harmful if all the relevant items are listed before them. What's worst is when a search facility returns a large number of items and fails to order them by actual relevance to the user's search terms.

Spurious Matches as Distractions

United Airlines' website provides a good example of how spurious search results can distract and delay users (Figure 5.28). If you search for a flight to Minneapolis, instead of a list of flights, you first get a page that reads:

Figure 5.28: www.United.com (Feb. 2002). (A) A search for a flight to Minneapolis turns up (B) seven matches. One is Minneapolis, but the other six are irrelevant.

Uncertain City/Airport Name: More than one city were found matching with your destination entry of 'minneapolis' in our databases. Please select an airport in or nearby the city of your choice from the following list. Otherwise go back and specify a different entry.

For now we'll ignore the poor English ("... one city were found ...") and focus on the fact that only one of the airports listed, "Minneapolis: St. Paul International," has anything to do with Minneapolis. Multiple choices might make sense if Minneapolis had more than one airport, but it doesn't. The other six airports listed are not only not in Minneapolis; they are in states nowhere near Minnesota! Furthermore, their names aren't even similar to "Minneapolis." Why these match the given destination is unclear. But they do, so anyone who uses United.com to book a flight to "Minneapolis" is forced to make this entirely unnecessary choice.

Poor Results Order

A good example of search results being poorly ordered comes from online bookstore BarnesAndNoble.com. I searched for book author "Ellen Isaacs." The search facility found 21 books (Figure 5.29). As the results page indicates, the books are sorted not by relevance to the given search terms, but by "bestselling order." In this order, books by Ellen Quigley (editor) and Isaac Bickerstaff (illustrator) are at the top of the list, and Ellen Isaacs' book doesn't appear until item 20. In other words, the book that best matches the search terms is 20th in a list of 21. Doesn't that seem odd?

Figure 5.29: www.BarnesAndNoble.com (Jan. 2002). (A) A look at the top of the search results shows results sorted in "bestselling order," not by relevance. (B) At the bottom of the results, the best matching book is listed as item 20 of 21 items.

The results page provides buttons that re-sort the list alphabetically by title or by publication date, but not by relevance to the search terms. Imagine if the search facility had found 50 books, or 100, sorted in bestselling order.

VitaminShoppe.com provides a similar example in a different product domain. The user searched for "glucoseamine sulfate" (Figure 5.30). Everything it found matched at least one of those words, which is good. What isn't good is that products matching both words did not appear in the results list until item eight. Quoting the search terms didn't help. Clearly, relevance to a user's search terms is not the default order of VitaminShoppe.com's search results.

Figure 5.30: www.VitaminShoppe.com (Jan. 2002). A search for "glucoseamine sulfate" returns seven items matching only "sulfate" before the first item matching both words, which appears at item 8.
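One simple way to avoid this kind of ordering is to rank items by how many of the search terms they match, and to fall back on the site's usual order only to break ties. The sketch below assumes whole-word matching against product names. The product list is invented for illustration, with the search term spelled as in the example above.

```python
def rank_by_terms_matched(query, items):
    """Order items so that those matching more query terms come first.

    items: product names in the site's default order (for example,
    bestselling order). Python's sort is stable, so that default order
    is preserved among items that match the same number of terms.
    """
    terms = query.lower().split()

    def matched(name):
        words = set(name.lower().split())
        return sum(1 for term in terms if term in words)

    hits = [name for name in items if matched(name) > 0]  # drop non-matches
    return sorted(hits, key=matched, reverse=True)


# Hypothetical catalog, listed in bestselling order.
products = [
    "Chondroitin Sulfate",
    "Magnesium Sulfate",
    "Glucoseamine Sulfate 750 mg",
    "Glucoseamine HCl",
    "Vitamin C",
]
ranked = rank_by_terms_matched("glucoseamine sulfate", products)
for position, name in enumerate(ranked, start=1):
    print(position, name)
```

Here the one product that matches both words moves to the top, the single-word matches keep their bestselling order behind it, and the product that matches nothing is left out. Real search engines use richer scoring, but even this crude rule would have put the actual hits ahead of the partial ones.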

Searching for a Job at Dice.com

The search facility at Dice.com returns many irrelevant job descriptions and orders them poorly. Someone I know searched for "editor." It found 133 job listings supposedly matching that word. The first thirty were as follows:

1. Strong Project Editor
2. Freelance Video Editor
3. Resource Kit Technical Editor
4. Medical Managing Editor
5. Web Copy Editor
6. Freelance Editing/Final Cut Pro/Avid/Photoshop
7. Technical Documentation Editor/analyst
8. Senior Circuit Design Engineer
9. Technical Writer
10. Senior Circuit Design Engineer
11. Clarify Technical Lead/Senior Programmer
12. Photo Editor
13. IC Mask Layout Designer
14. Medical Writer/Editor
15. Filmbox Editor
16. IC Layout Mask Design Contractor
17. Senior Circuit Design Engineer SRAM, PLL, I/O
18. Backend Design Engineer
19. Clarify Tech Lead
20. Senior Technical Writer-HTML Editor
21. Oracle Application Serv Admin
22. Jr. UNIX Database Administrator-Shell Scripting, SQL, Unix Ad
23. Sr. Analog and RFIC Layout Designer
24. Product Developer
25. Unix Operator Consultant
26. Senior Analog Layout Designer
27. Senior SQL Server DBA/Biztalk
28. Sr. Mask Layout Engineer
29. Senior IC Mask Designer-Microprocessor products
30. Web Developer w/BizTalk and .Net

The first seven job titles actually contain the word "editor," which is good, but after that, the results go downhill fast. The titles of jobs 12, 14, 15, and 20 contain the word "editor," but for some reason they weren't placed before many items that don't contain "editor" and seem irrelevant. Finally, none of the jobs in items 21 through 133 seem even remotely relevant to the term "editor."

Avoiding the Blooper

Extraneous irrelevant hits usually occur for the same reasons as missed items: poor indexing and weak search methods. Poor ordering of results is usually due to faulty metrics for rating the relevance of items.
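To illustrate what a less faulty relevance metric might look like, the sketch below scores each job listing by where the search term appears, counting a match in the title for more than a match in the description, and drops listings that score below a threshold. The weights, the field names, and the job descriptions are all assumptions made up for this example. Substring matching is used only to keep the code short.

```python
TITLE_WEIGHT = 3  # assumed weights, chosen only for illustration
BODY_WEIGHT = 1


def score(term, job):
    """Rate one job listing's relevance to a single search term."""
    term = term.lower()
    points = 0
    if term in job["title"].lower():
        points += TITLE_WEIGHT
    if term in job["body"].lower():
        points += BODY_WEIGHT
    return points


def search(term, jobs, min_score=1):
    """Return titles of listings scoring at least min_score, best first."""
    scored = [(score(term, job), job) for job in jobs]
    kept = [(points, job) for points, job in scored if points >= min_score]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [job["title"] for _, job in kept]


# Titles come from the list above; the descriptions are hypothetical.
jobs = [
    {"title": "Senior Circuit Design Engineer",
     "body": "Experience with a schematic editor required."},
    {"title": "Photo Editor",
     "body": "Select and retouch images for print."},
    {"title": "Unix Operator Consultant",
     "body": "Watch batch jobs overnight."},
    {"title": "Medical Writer/Editor",
     "body": "Editor for clinical study reports."},
]
print(search("editor", jobs))
print(search("editor", jobs, min_score=TITLE_WEIGHT))
```

With the default threshold, the listing that merely mentions an editor in its description is still returned, but after the listings whose titles match. Raising the threshold to the title weight removes it entirely, which trades a little recall for better precision.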

Again, this is a back-end implementation problem that strongly affects the usability of a website. Therefore, the best remedy is a back-end design process that is just as focused on users and their tasks as the front-end design process is. Back-end developers may squirm at this, but it is crucial: You cannot slap a user-friendly front end on a back end that was designed with no thought to usability and usefulness for actual user tasks.

Incorrect keywords on data items can totally destroy the accuracy of an otherwise good search engine. Erroneous keywords sometimes get attached to data items when new items are copied from old ones haphazardly. Sometimes it happens because the people hired to add content to the site don't really understand the site's central topic.

As with Blooper 35 (Search Myopia: Missing Relevant Items), the obvious remedy to this blooper is better procedures and oversight for adding, indexing, and maintaining content. Further, a lexicon of allowed keywords can help reduce randomness in assigning keywords to content items (Rosenfeld and Morville, 2002). The goal is to ensure that keywords on items are accurate and useful.
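A keyword lexicon can be enforced mechanically when content is added. The following is a minimal sketch, assuming the lexicon is a fixed set of lowercase phrases. The lexicon entries and the function name are hypothetical.

```python
# Hypothetical lexicon of allowed keywords for a job site.
ALLOWED_KEYWORDS = {
    "editing",
    "technical writing",
    "circuit design",
    "database administration",
}


def check_keywords(item_keywords, lexicon=ALLOWED_KEYWORDS):
    """Split an item's keywords into accepted and rejected lists.

    Keywords are trimmed and lowercased before being compared with
    the lexicon, so differences in case or spacing are not rejected.
    """
    normalized = [keyword.strip().lower() for keyword in item_keywords]
    accepted = [k for k in normalized if k in lexicon]
    rejected = [k for k in normalized if k not in lexicon]
    return accepted, rejected


accepted, rejected = check_keywords(["Editing", "editor stuff", " Circuit Design "])
print("accepted:", accepted)
print("rejected:", rejected)
```

A check like this catches keywords that are outside the lexicon, but it cannot tell whether an allowed keyword is the right one for a given item. That still requires people who understand the site's content to review what gets attached.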
