When you use a website's search facility, you expect it to show you everything related to the search terms and settings you give it. You expect it not to miss anything relevant. In the field of information retrieval, a search facility's ability to fetch all relevant items is known as its recall.
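Recall can be expressed as the fraction of all relevant items that a search actually returns. A minimal sketch (the function name and sample data are illustrative, not from any real search engine):

```python
def recall(retrieved, relevant):
    """Fraction of all relevant items that the search returned."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    if not relevant:
        return 1.0  # nothing relevant, so nothing to miss
    return len(retrieved & relevant) / len(relevant)

# A search that returns 3 of the 4 relevant items has recall 0.75.
print(recall({"a", "b", "c", "x"}, {"a", "b", "c", "d"}))  # 0.75
```

A search facility that commits this blooper is one whose recall is well below 1.0 for queries users actually type.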
Search facilities that miss relevant items are bad for two reasons. First, they mislead people into believing that everything of interest has been found, when it hasn't. Second, they risk losing users' confidence. If Web users see evidence that a website's search facility misses relevant items, they lose confidence in it, and perhaps in the site or company as a whole, and therefore use it less.
Missing relevant items is such a serious blooper for a search facility that one might expect to find it only in websites thrown together on shoestring budgets by kids, impoverished nonprofit organizations, or family businesses. Not so! This blooper is found even in the websites of well-known companies.
The blooper can be found at iFilm.com. I used the Search box on the site's home page to look for films by Joe Bini, an independent filmmaker. It found several films about guys named Joe, but no Joe Bini (Figure 5.26[A]). A friend had already told me that Joe Bini is listed at iFilm, so I doubted the search results. Besides, Bini is an award-winning filmmaker; he must be in there. I tried again with quotes around the name. Not surprisingly, I got nothing, not even films about guys named Joe (Figure 5.26[B]). It gave the message "Your search for 'Joe Bini' found no results.
What's next?" I thought: what's next is that I'm giving up on this useless search and browsing the site instead. I saw the box on the lower right asking, "Not getting the results you wanted?" and offering to help me refine my search, but I didn't think it would help. After all, how much more specific could one be than the guy's name?
Eventually, by browsing the site's categories, I found iFilm's listing for Joe Bini (Figure 5.26[C]). Even though the item is filed under "People: Joe Bini," it apparently is not indexed by that name for search purposes.
InfoWorld.com, an online technology news service, has a search facility that fails to find articles when given search terms that should match. In this case, I had returned to the site to find an article I had read there previously. I knew the article was titled "White House Nixes Controversial National ID Notion." I searched for it using "white house nix," both quoted and unquoted, without success (Figure 5.27[A and B]). (I'll save the strange "estimated total number of results ..." for later discussion.) Unwilling to believe that the article had been removed, I tried "white house nixes." Oddly, that worked (Figure 5.27[C]). Normally in search functions, "nix" matches "nixes," as well as "nixed," "nixing," and "Nixon," but InfoWorld.com's search function apparently doesn't work that way.
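The expected behavior, matching a search term against the beginnings of indexed words, can be sketched in a few lines. This is a simplified illustration; real engines typically use stemming and more sophisticated matching:

```python
def prefix_match(term, word):
    """Treat the search term as a prefix of an indexed word,
    as many search engines effectively do."""
    return word.lower().startswith(term.lower())

# "nix" should match the inflected forms (and, incidentally, "Nixon"):
for w in ("nixes", "nixed", "nixing", "Nixon"):
    print(w, prefix_match("nix", w))  # all True
```

A search function that demands exact whole-word matches, as InfoWorld.com's apparently did, fails all of these.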
Like the solution to Blooper 34 (Duplicate Hits), the solution to this one focuses on the back end (the servers that store and retrieve data for a website) rather than on the front end (the design and organization of the site's pages).
Search facilities that fail to find data in the database can result from software bugs. However, that is not the usual cause of the problem. As the examples suggest, overlooked data more often results from the following:
Poorly indexed data, such as inadequate or inaccurate keywords on items
Weak search methods, such as relying completely on keywords or accepting only whole-word matches
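The Joe Bini example is consistent with the first cause: an item exists in the catalog but is missed because searches consult only its assigned keywords, and the obvious keyword was never assigned. A toy sketch, with a hypothetical catalog entry and keywords:

```python
# Hypothetical catalog entry: listed under a person's name, but that
# name was never added to its keywords, so keyword-only search misses it.
catalog = [
    {"title": "People: Joe Bini", "keywords": ["people", "filmmakers"]},
]

def keyword_search(query, items):
    """Return only items whose assigned keywords match the query exactly."""
    q = query.lower()
    return [item for item in items
            if any(q == k.lower() for k in item["keywords"])]

print(keyword_search("joe bini", catalog))    # nothing found
print(keyword_search("filmmakers", catalog))  # found via its keyword
```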
One obvious solution to such problems is to index the content more carefully, thoroughly, and consistently. When multiple people add content, the chances of their doing it inconsistently or insufficiently are higher, diminishing search accuracy. Indexing content carefully, thoroughly, and consistently means adding content, along with the data required to index it (the metadata), to the back-end system in a controlled way.
For example, when adding content items to the database, designers or content editors should try to anticipate and include, as keywords attached to the items, all the terms people might use to search for them. A site keyword lexicon listing allowable keywords and their meanings, and indicating which ones are synonyms, can help content editors choose consistent, predictable keywords for new content (Rosenfeld and Morville, 2002).
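One way such a lexicon might be applied is to map each candidate keyword to a single canonical term before it is attached to content, so synonyms all index to the same keyword. A sketch, with a made-up lexicon:

```python
# Hypothetical keyword lexicon: canonical terms mapped to allowed synonyms.
LEXICON = {
    "film": {"movie", "picture", "flick"},
    "filmmaker": {"director", "film-maker"},
}

def canonicalize(term):
    """Map a candidate keyword to its canonical lexicon entry, if any."""
    term = term.lower()
    for canonical, synonyms in LEXICON.items():
        if term == canonical or term in synonyms:
            return canonical
    return None  # not an allowed keyword; editor must pick another

print(canonicalize("movie"))     # film
print(canonicalize("director"))  # filmmaker
```

With such a step in the content-entry pipeline, two editors who type "movie" and "flick" end up attaching the same searchable keyword.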
During site design and development, conduct tests to see how typical visitors to the site will look for items. Such testing need not be expensive: early testing can be done very cheaply, without a computer, using questions on paper such as, "Suppose you wanted to find an article about XYZ; what search terms would you use to find it?"
After a site is in use, it is of course not feasible to pretest keywords whenever new content is added. However, it is feasible and advisable to evaluate how easily site users find what they are looking for and what sorts of search terms they use. This can be done either by conducting periodic user tests or by having the site monitor usage of its search function.
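Monitoring the search function's usage can be as simple as tallying queries that return no results; the most frequent failures show where indexing or matching falls short. A sketch (the function and data are illustrative):

```python
import collections

# Tally of queries that returned nothing, keyed by normalized query text.
failed_queries = collections.Counter()

def monitored_search(query, search_fn):
    """Run the search; record any query that comes back empty."""
    results = search_fn(query)
    if not results:
        failed_queries[query.lower()] += 1
    return results

# Simulated traffic: two failed searches and one successful one.
monitored_search("joe bini", lambda q: [])
monitored_search("Joe Bini", lambda q: [])
monitored_search("documentaries", lambda q: ["hit"])

print(failed_queries.most_common(1))  # [('joe bini', 2)]
```

Reviewing this tally periodically tells the site's editors which keywords to add.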
Another solution is to use stronger search methods. Some methods rely completely on keywords. If keyword-only searches miss too much of your site's data, don't rely solely on this type of search. Search the actual text of the content or at least the title and abstract or lead paragraph. And if a user types partial words for search terms, find everything that matches them (within reason, of course). The goal is to maximize the search function's ability to find relevant items without significantly increasing its tendency to return irrelevant ones (discussed in the following blooper).
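Putting a couple of these stronger methods together, here is a sketch that searches the actual text (title plus body) and treats each search term as a word prefix. The article data is hypothetical:

```python
def full_text_search(query, items):
    """Match every query term as a prefix of some word in the
    item's title or body text, rather than keywords only."""
    terms = query.lower().split()
    def hit(item):
        words = (item["title"] + " " + item["body"]).lower().split()
        return all(any(w.startswith(t) for w in words) for t in terms)
    return [item for item in items if hit(item)]

articles = [
    {"title": "White House Nixes Controversial National ID Notion",
     "body": "The proposal was shelved."},
]

# Unlike InfoWorld.com's search, "nix" now finds "Nixes":
print(full_text_search("white house nix", articles))
```

Searching the full text makes the Joe Bini case findable too, since the name appears in the item's title even when it is missing from the keywords.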
You also want the search facility not to show items of little or no relevance to your search specification, but that is a separate issue, discussed later in this chapter in Blooper 36: Needle in a Haystack: Piles of Irrelevant Hits.