8.7. Presenting Results

What happens after your search engine has assembled the results to display? There are many ways to present results, so once again you'll need to make some choices. And as usual, the mysterious art of understanding your site's content and how users want to use it should drive your selection process. When you are configuring the way your search engine displays results, there are two main issues to consider: which content components to display for each retrieved document, and how to list or group those results.

8.7.1. Which Content Components to Display

A very simple guideline is to display less information to users who know what they're looking for, and more information to users who aren't sure what they want. A variant on that approach is to show users who are clear on what they're looking for only representational content components, such as a title or author, to help them quickly distinguish the result they're seeking. Users who aren't as certain of what they're looking for will benefit from descriptive content components, such as a summary, part of an abstract, or keywords, which give a sense of what their search results are about. You can also give users some choice of what to display; again, consider your users' most common information needs before setting a default. Figures 8-9 and 8-10 show a site that provides both options to users.

Figure 8-9. Salon uses search results with summaries to help users who want to learn about the documents they've retrieved

Figure 8-10. ...and without summaries for users who have a better sense of what they need

When it's hard to distinguish retrieved documents because they share a commonly displayed field (such as the title), show more information, such as a page number, to help the user differentiate between results. Another take on the same concept is shown in Figure 8-11, which displays three versions of the same book. Some of the distinctions are meaningful; you'll want to know which library has a copy available.
Some aren't so helpful; you might not care who the publisher is.

Figure 8-11. Content components help distinguish three versions of the same book

How much information to display per result is also a function of how large a typical result set is. Perhaps your site is fairly small, or most users' queries are so specific that they retrieve only a small number of results. If you think that users would like more information in such cases, then it may be worth displaying more content components per result. But keep in mind that regardless of how many ways you indicate that there are more results than fit on one screen, many (if not most) users will never venture past that first screen. So don't go overboard with providing lots of content per result, as the first few results may obscure the rest of the retrieved set.

Which content components you display for each result also depends on which components are available in each document (i.e., how your content is structured) and on how the content will be used. Users of phone directories, for example, want phone numbers first and foremost. So it makes sense to show them the information from the phone number field in the result itself, as opposed to forcing them to click through to another document to find this information (see Figure 8-12).

Figure 8-12. A yellow pages search doesn't force us to click through for a phone number

If you don't have much structure to draw from, or if your engine is searching full text, showing the query terms within the "context" of the document's text is a useful variation on this theme (see Figure 8-13). In this example, E*Trade displays the query terms in bold, an excellent practice, as it helps the user quickly scan the results page for the relevant part of each result. E*Trade further augments this useful context by highlighting the surrounding sentence.

Figure 8-13. E*Trade bolds the search query, and highlights its surrounding sentence to show its context

8.7.2. How Many Documents to Display

How many documents are displayed depends mostly on two factors. If your engine is configured to display a lot of information for each retrieved document, you'll want to consider showing a smaller retrieval set, and vice versa. Additionally, a user's monitor resolution, connectivity speed, and browser settings will affect the number of results that can be displayed effectively. It may be safest to err toward simplicity, showing a small number of results while providing a variety of settings that the user can select based on his own needs. We do suggest that you let users know the total number of retrieved documents so they have a sense of how many documents remain as they sift through search results. Also consider providing a results navigation system to help them move through their results. In Figure 8-14, ICON Advisers provides such a navigation system, displaying the total number of results and enabling users to move through the result set 10 at a time.

Figure 8-14. ICON Advisers allows you to jump ahead through screens of ten results at a time

In many cases, the moment a user is confronted by a large result set is the moment he decides the number of results is too large. This is a golden opportunity to provide the user with the option of revising and narrowing his search. ICON Advisers could achieve this quite simply by repeating the query "retirement" in the search box in the upper right.

8.7.3. Listing Results

Now that you have a group of search results and a sense of which content components you wish to display for each, in what order should these results be listed? Again, much of the answer depends upon what kind of information needs your users start with, what sort of results they are hoping to receive, and how they would like to use the results. There are two common methods for listing retrieval results: sorting and ranking.
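The contrast between sorting and ranking can be sketched in a few lines of Python. The result records, field names, and scores below are hypothetical, not drawn from any particular search engine:

```python
# Hypothetical search results; titles, dates, and scores are invented.
results = [
    {"title": "Retirement Basics", "date": "2006-03-01", "score": 0.91},
    {"title": "About Our Funds",   "date": "2006-07-15", "score": 0.64},
    {"title": "Glossary",          "date": "2005-11-30", "score": 0.88},
]

# Sorting: order by a content component whose order users can predict,
# such as the title.
by_title = sorted(results, key=lambda r: r["title"].lower())

# Ranking: order by an algorithmic judgment, such as a relevance score,
# from most to least relevant.
by_relevance = sorted(results, key=lambda r: r["score"], reverse=True)

print([r["title"] for r in by_title])
print([r["title"] for r in by_relevance])
```

The same result set yields two very different orderings; which one serves users better depends on whether they're sorting to act or ranking to learn, as the following sections discuss.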
Retrieval results can be sorted chronologically by date, or alphabetically by any number of content component types (e.g., by title, by author, or by department). They can also be ranked by a retrieval algorithm (e.g., by relevance or popularity). Sorting is especially helpful to users who are looking to make a decision or take an action. For example, users who are comparing a list of products might want to sort by price or another feature to help them make their choice. Any content component can be used for sorting, but it's sensible to provide users with the option to sort on components that will actually help them accomplish tasks. Which ones are task-oriented and which aren't, of course, depends upon each unique situation.

Ranking is more useful when there is a need to understand information or learn something. Ranking is typically used to describe retrieved documents' relevance, from most to least. Users look to learn from those documents that are most relevant. Of course, as we shall see, relevance is relative, and you should choose relevance-ranking approaches carefully. Users will generally assume that the top few results are the best. The following sections provide examples of both sorting and ranking, as well as some ideas on what might make the most sense for your users.

8.7.3.1. Sorting by alphabet

Just about any content component can be sorted alphabetically (see Figure 8-15). Alphabetical sorts are a good general-purpose sorting approach, especially when sorting names, and in any case, it's a good bet that most users are familiar with the order of the alphabet! It works best to omit initial articles such as "a" and "the" from the sort order (certain search engines provide this option); users are more likely to look for "The Naked Bungee Jumping Guide" under "N" rather than "T."

Figure 8-15. Baseball-Reference.com displays search results in alphabetical order

8.7.3.2. Sorting by chronology

If your content (or your user) is time-sensitive, chronological sorts are a useful approach. And you can often draw from a filesystem's built-in dating if you have no other sources of date information. If your site provides access to press releases or other news-oriented information, sorting in reverse chronological order makes good sense (see Figures 8-16 and 8-17). Straight chronological order is less common, but it can be useful for presenting historical data.

Figure 8-16. The Washington Post's default list ordering is reverse-chronological

Figure 8-17. ...as is Digg's

8.7.3.3. Ranking by relevance

Relevance-ranking algorithms (there are many flavors) are typically determined by one or more factors, such as how many of the query's terms occur in a retrieved document and how frequently they occur.
Different relevance-ranking approaches make sense for different types of content, but with most search engines, the content you're searching is apples and oranges. So, for example, Document A might be ranked higher than Document B even though Document B is definitely more relevant. Why? Because while Document B is a bibliographic citation to a really relevant work, Document A is a long document that just happens to contain many instances of the terms in the search query. So the more heterogeneous your documents are, the more careful you'll need to be with relevance ranking.

Indexing by humans is another means of establishing relevance. Keyword and descriptor fields can be searched, leveraging the value judgments of human indexers. For example, manually selected recommendations, popularly known as "Best Bets," can be returned as relevant results. In Figure 8-18, the first set of results was associated with the query "monty python" in advance.

Figure 8-18. A search of the BBC site retrieves a set of manually tagged documents as well as automatic results; the recommendations are called "Best Links" rather than "Best Bets" to avoid gambling connotations

Requiring an investment of human expertise and time, the Best Bets approach isn't trivial to implement, and therefore isn't necessarily worth developing for each and every user query. Instead, recommendations are typically created for the most common queries (as determined by search-log analysis) and combined with automatically generated search results.

There are other concerns with relevance ranking. It's tempting to display relevance scores alongside results; after all, those scores are what's behind the order of the results. In Figure 8-19, we searched for "storage" at Computer Associates' web site.

Figure 8-19. What do these relevance scores really mean?

The first result does seem quite promising. But what exactly is the difference between a document with a relevance score of 50 percent and one with 49 percent?
They are scored similarly, but the top result is CA's events calendar, which covers CA events. Interestingly, the events calendar doesn't even mention "storage." Other results are a bit more relevant, but this is an excellent illustration of how relevance algorithms do strange and complicated things behind the scenes. We don't really know why the results are ranked this way, and showing scores only aggravates that sense of ignorance. Scores should be used with caution; leaving them out altogether is often a better approach.

8.7.3.4. Ranking by popularity

Popularity is the source of Google's popularity. Put another way, Google is successful in large part because it ranks results by which ones are the most popular. It does so by factoring in how many links there are to a retrieved document. Google also distinguishes the quality of these links: a link from a site that itself receives many links is worth more than a link from a little-known site (this algorithm is known as PageRank). There are other ways to determine popularity, but keep in mind that small sites, or collections of separate, nonlinked sites (often referred to as "silos"), don't take advantage of popularity as well as large, multisite environments with many users do. The latter have a wide scope of usage and a richer set of links. A smaller site isn't likely to have enough variation in the popularity of different documents to merit this approach, while in a "silo" environment, little cross-pollination results in few links between sites. It's also worth noting that, to calculate relevance, Google uses over 100 other criteria in addition to PageRank.

8.7.3.5. Ranking by users' or experts' ratings

In an increasing number of situations, users are willing to rate the value of information. User ratings can be used as the basis of retrieval result ordering.
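The link-weighting idea behind PageRank mentioned above, where a link from a well-linked page counts for more, can be sketched as a toy power iteration. This is a simplified illustration over a made-up four-page link graph, not Google's actual algorithm (which, as noted, weighs many other criteria):

```python
# Hypothetical link graph: page -> pages it links to.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
damping = 0.85  # standard damping factor in PageRank-style iterations
rank = {page: 1.0 / len(links) for page in links}  # start with equal scores

for _ in range(50):  # iterate until the scores settle
    new_rank = {page: (1 - damping) / len(links) for page in links}
    for page, outlinks in links.items():
        # Each page passes its current score, evenly split, to its targets.
        share = damping * rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] += share
    rank = new_rank

# Page "c" is linked to by well-linked pages, so it ends up ranked highest.
print(sorted(rank, key=rank.get, reverse=True))
```

Note how page "c" outranks page "a" not just by raw link count but because its inbound links come from pages that are themselves linked to, which is the quality distinction described in the text.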
In the case of Digg (see Figure 8-20), these ratings, based on Digg users' votes on the pages submitted by other Digg users, are integral to helping users judge the value of an item, and they form the foundation of an entire information economy. Of course, Digg has a lot of users who don't shrink from expressing their opinions, so there is a rich collection of judgments to draw on for ranking.

Figure 8-20. User ratings fuel the ranking of these Digg results

Most sites don't have a sufficient volume of motivated users to generate useful ratings. But if you do have that opportunity, it can be helpful to display user ratings alongside each document, even if they aren't part of the ranking algorithm itself.

8.7.3.6. Ranking by pay-for-placement

Now that banner-ad sales are no longer the most viable economic model, pay-for-placement (PFP) is becoming increasingly common in web-wide searching. Different sites bid for the right to be ranked high, or higher, on users' result lists. Yahoo! Search Marketing (Figure 8-21) is one of the most popular sites to take this approach.

Figure 8-21. Overture (now Yahoo! Search Marketing) used to auction the right to be ranked highly

If your site aggregates content from a number of different vendors, you might consider implementing PFP to present search results. Or if users are shopping, they might appreciate this approach, with the assumption that the most stable, successful sites are the ones that can afford the highest placement. This is somewhat like selecting the plumber with the largest advertisement in the yellow pages to fix your toilet.

8.7.4. Grouping Results

Despite all the ways we can list results, no single approach is perfect. Hybrid approaches like Google's show a lot of promise, but you typically need to be in the business of creating search engines to have this level of involvement with a tool. In any case, our sites are typically getting larger, not smaller.
Search result sets will accordingly get larger as well, and so will the probability that the ideal results will be buried far beyond the point where users give up looking. However, one alternative to sorting and ranking holds promise: clustering retrieved results by some common aspect. An excellent study by researchers at Microsoft and the University of California at Berkeley shows improved performance when results are clustered by category as well as presented in a ranked list. How can we cluster results? The obvious ways are, unfortunately, the least useful: we can use existing metadata, like document type (e.g., .doc, .pdf) and file creation/modification date, to divide search results into clusters. Much more useful are clusters derived from manually applied metadata, like topic, audience, language, and product family. Unfortunately, approaches based on manual effort can be prohibitively expensive.
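A minimal sketch of the metadata-based clustering described above, assuming each result already carries a manually applied topic field (the records and field names here are invented for illustration):

```python
from collections import defaultdict

# Hypothetical results, each tagged by a human indexer with a topic.
results = [
    {"title": "Backup Strategies",   "topic": "Storage"},
    {"title": "RAID Explained",      "topic": "Storage"},
    {"title": "Firewall Basics",     "topic": "Security"},
    {"title": "Choosing a Password", "topic": "Security"},
]

# Group the flat result list into clusters keyed by topic.
clusters = defaultdict(list)
for result in results:
    clusters[result["topic"]].append(result["title"])

for topic, titles in clusters.items():
    print(topic, titles)
```

The grouping step itself is trivial; the expense, as the text notes, lies in getting that topic metadata applied to the documents in the first place.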