8.7. Presenting Results

What happens after your search engine has assembled the results to display? There are many ways to present results, so once again you'll need to make some choices. And as usual, the mysterious art of understanding your site's content and how users want to use it should drive your selection process. When you are configuring the way your search engine displays results, there are two main issues to consider: which content components to display for each retrieved document, and how to list or group those results.

8.7.1. Which Content Components to Display

A very simple guideline is to display less information to users who know what they're looking for, and more information to users who aren't sure what they want. A variant on that approach is to show users who are clear on what they're looking for only representational content components, such as a title or author, to help them quickly distinguish the result they're seeking. Users who aren't as certain of what they're looking for will benefit from descriptive content components, such as a summary, part of an abstract, or keywords, which give a sense of what their search results are about. You can also give users some choice of what to display; again, consider your users' most common information needs before setting a default. Figures 8-9 and 8-10 show a site that provides both options to users.

Figure 8-9. Salon uses search results with summaries to help users who want to learn about the documents they've retrieved

Figure 8-10. ...and without summaries for users who have a better sense of what they need

When it's hard to distinguish retrieved documents because they share a commonly displayed field (such as the title), show more information, such as a page number, to help the user differentiate between results. Another take on the same concept is shown in Figure 8-11, which displays three versions of the same book. Some of the distinctions are meaningful; you'll want to know which library has a copy available.
Some aren't so helpful; you might not care who the publisher is.

Figure 8-11. Content components help distinguish three versions of the same book

How much information to display per result is also a function of how large a typical result set is. Perhaps your site is fairly small, or most users' queries are so specific that they retrieve only a small number of results. If you think that users would like more information in such cases, then it may be worth displaying more content components per result. But keep in mind that regardless of how many ways you indicate that there are more results than fit on one screen, many (if not most) users will never venture past that first screen. So don't go overboard with providing lots of content per result, as the first few results may obscure the rest of the retrieved set.

Which content components you display for each result also depends on which components are available in each document (i.e., how your content is structured) and on how the content will be used. Users of phone directories, for example, want phone numbers first and foremost. So it makes sense to show them the information from the phone number field in the result itself, as opposed to forcing them to click through to another document to find this information (see Figure 8-12).

Figure 8-12. A yellow pages search doesn't force us to click through for a phone number

If you don't have much structure to draw from, or if your engine is searching full text, showing the query terms within the "context" of the document's text is a useful variation on this theme (see Figure 8-13). In this example, E*Trade displays the query terms in bold, an excellent practice, as it helps the user quickly scan the results page for the relevant part of each result. E*Trade further augments this useful context by highlighting the surrounding sentence.

Figure 8-13. E*Trade bolds the search query, and highlights its surrounding sentence to show its context

8.7.2. How Many Documents to Display

How many documents are displayed depends mostly on two factors. If your engine is configured to display a lot of information for each retrieved document, you'll want to consider showing a smaller retrieval set, and vice versa. Additionally, a user's monitor resolution, connectivity speed, and browser settings will affect the number of results that can be displayed effectively. It may be safest to err toward simplicity, showing a small number of results while providing a variety of settings that the user can select based on his own needs. We do suggest that you let users know the total number of retrieved documents so they have a sense of how many documents remain as they sift through search results. Also consider providing a results navigation system to help them move through their results. In Figure 8-14, ICON Advisers provides such a navigation system, displaying the total number of results and enabling users to move through the result set 10 at a time.

Figure 8-14. ICON Advisers allows you to jump ahead through screens of ten results at a time

In many cases, the moment a user is confronted by a large result set is the moment he decides the number of results is too large. This is a golden opportunity to provide the user with the option of revising and narrowing his search. ICON Advisers could achieve this quite simply by repeating the query "retirement" in the search box in the upper right.

8.7.3. Listing Results

Now that you have a group of search results and a sense of which content components you wish to display for each, in what order should these results be listed? Again, much of the answer depends upon what kind of information needs your users start with, what sort of results they are hoping to receive, and how they would like to use the results. There are two common methods for listing retrieval results: sorting and ranking.
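The contrast between sorting and ranking can be sketched in a few lines of Python. The result records, field names, and scores below are hypothetical, not drawn from any particular search engine:

```python
# Hypothetical search results; titles, dates, and scores are invented.
results = [
    {"title": "Retirement Basics", "date": "2006-03-01", "score": 0.91},
    {"title": "About Our Funds",   "date": "2006-07-15", "score": 0.64},
    {"title": "Glossary",          "date": "2005-11-30", "score": 0.88},
]

# Sorting: order by a content component whose order users can predict,
# such as the title.
by_title = sorted(results, key=lambda r: r["title"].lower())

# Ranking: order by an algorithmic judgment, such as a relevance score,
# from most to least relevant.
by_relevance = sorted(results, key=lambda r: r["score"], reverse=True)

print([r["title"] for r in by_title])
print([r["title"] for r in by_relevance])
```

The same result set yields two very different orderings; which one serves users better depends on whether they're sorting to act or ranking to learn, as the following sections discuss.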
Retrieval results can be sorted chronologically by date, or alphabetically by any number of content component types (e.g., by title, by author, or by department). They can also be ranked by a retrieval algorithm (e.g., by relevance or popularity). Sorting is especially helpful to users who are looking to make a decision or take an action. For example, users who are comparing a list of products might want to sort by price or another feature to help them make their choice. Any content component can be used for sorting, but it's sensible to provide users with the option to sort on components that will actually help them accomplish tasks. Which ones are task-oriented and which aren't, of course, depends upon each unique situation.

Ranking is more useful when there is a need to understand information or learn something. Ranking is typically used to describe retrieved documents' relevance, from most to least. Users look to learn from those documents that are most relevant. Of course, as we shall see, relevance is relative, and you should choose relevance-ranking approaches carefully. Users will generally assume that the top few results are the best. The following sections provide examples of both sorting and ranking, as well as some ideas on what might make the most sense for your users.

8.7.3.1. Sorting by alphabet

Just about any content component can be sorted alphabetically (see Figure 8-15). Alphabetical sorts are a good general-purpose sorting approach, especially when sorting names, and in any case, it's a good bet that most users are familiar with the order of the alphabet! It works best to omit initial articles such as "a" and "the" from the sort order (certain search engines provide this option); users are more likely to look for "The Naked Bungee Jumping Guide" under "N" rather than "T."

Figure 8-15. Baseball-Reference.com displays search results in alphabetical order

8.7.3.2. Sorting by chronology

If your content (or your user) is time-sensitive, chronological sorts are a useful approach. And you can often draw from a filesystem's built-in dating if you have no other sources of date information. If your site provides access to press releases or other news-oriented information, sorting in reverse chronological order makes good sense (see Figures 8-16 and 8-17). Straight chronological order is less common, but it can be useful for presenting historical data.

Figure 8-16. The Washington Post's default list ordering is reverse-chronological

Figure 8-17. ...as is Digg's

8.7.3.3. Ranking by relevance

Relevance-ranking algorithms (there are many flavors) are typically determined by one or more factors, such as how many of the query's terms occur in a retrieved document and how frequently they occur.
Different relevance-ranking approaches make sense for different types of content, but with most search engines, the content you're searching is apples and oranges. So, for example, Document A might be ranked higher than Document B even though Document B is definitely more relevant. Why? Because while Document B is a bibliographic citation to a really relevant work, Document A is a long document that just happens to contain many instances of the terms in the search query. So the more heterogeneous your documents are, the more careful you'll need to be with relevance ranking.

Indexing by humans is another means of establishing relevance. Keyword and descriptor fields can be searched, leveraging the value judgments of human indexers. For example, manually selected recommendations, popularly known as "Best Bets," can be returned as relevant results. In Figure 8-18, the first set of results was associated with the query "monty python" in advance.

Figure 8-18. A search of the BBC site retrieves a set of manually tagged documents as well as automatic results; the recommendations are called "Best Links" rather than "Best Bets" to avoid gambling connotations

Requiring an investment of human expertise and time, the Best Bets approach isn't trivial to implement, and therefore isn't necessarily worth developing for each and every user query. Instead, recommendations are typically created for the most common queries (as determined by search-log analysis) and combined with automatically generated search results.

There are other concerns with relevance ranking. It's tempting to display relevance scores alongside results; after all, those scores are what's behind the order of the results. In Figure 8-19, we searched for "storage" at Computer Associates' web site.

Figure 8-19. What do these relevance scores really mean?

The first result does seem quite promising. But what exactly is the difference between a document with a relevance score of 50 percent and one with 49 percent?
They are scored similarly, but the top result is CA's events calendar, which covers CA events. Interestingly, the events calendar doesn't even mention "storage." Other results are a bit more relevant, but this is an excellent illustration of how relevance algorithms do strange and complicated things behind the scenes. We don't really know why the results are ranked this way, and showing scores only aggravates that sense of ignorance. Scores should be used with caution; leaving them out altogether is often a better approach.

8.7.3.4. Ranking by popularity

Popularity is the source of Google's popularity. Put another way, Google is successful in large part because it ranks results by which ones are the most popular. It does so by factoring in how many links there are to a retrieved document. Google also distinguishes the quality of these links: a link from a site that itself receives many links is worth more than a link from a little-known site (this algorithm is known as PageRank). There are other ways to determine popularity, but keep in mind that small sites, or collections of separate, nonlinked sites (often referred to as "silos"), don't take advantage of popularity as well as large, multisite environments with many users do. The latter have a wide scope of usage and a richer set of links. A smaller site isn't likely to have enough variation in the popularity of different documents to merit this approach, while in a "silo" environment, little cross-pollination results in few links between sites. It's also worth noting that, to calculate relevance, Google uses over 100 other criteria in addition to PageRank.

8.7.3.5. Ranking by users' or experts' ratings

In an increasing number of situations, users are willing to rate the value of information. User ratings can be used as the basis of retrieval result ordering.
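The link-weighting idea behind PageRank mentioned above, where a link from a well-linked page counts for more, can be sketched as a toy power iteration. This is a simplified illustration over a made-up four-page link graph, not Google's actual algorithm (which, as noted, weighs many other criteria):

```python
# Hypothetical link graph: page -> pages it links to.
links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
damping = 0.85  # standard damping factor in PageRank-style iterations
rank = {page: 1.0 / len(links) for page in links}  # start with equal scores

for _ in range(50):  # iterate until the scores settle
    new_rank = {page: (1 - damping) / len(links) for page in links}
    for page, outlinks in links.items():
        # Each page passes its current score, evenly split, to its targets.
        share = damping * rank[page] / len(outlinks)
        for target in outlinks:
            new_rank[target] += share
    rank = new_rank

# Page "c" is linked to by well-linked pages, so it ends up ranked highest.
print(sorted(rank, key=rank.get, reverse=True))
```

Note how page "c" outranks page "a" not just by raw link count but because its inbound links come from pages that are themselves linked to, which is the quality distinction described in the text.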
In the case of Digg (see Figure 8-20), these ratings, based on Digg users' votes on the pages submitted by other Digg users, are integral to helping users judge the value of an item, and they form the foundation of an entire information economy. Of course, Digg has a lot of users who don't shrink from expressing their opinions, so there is a rich collection of judgments to draw on for ranking.

Figure 8-20. User ratings fuel the ranking of these Digg results

Most sites don't have a sufficient volume of motivated users to generate useful ratings. But if you do have that opportunity, it can be helpful to display user ratings alongside each document, even if they aren't part of the ranking algorithm itself.

8.7.3.6. Ranking by pay-for-placement

Now that banner-ad sales are no longer the most viable economic model, pay-for-placement (PFP) is becoming increasingly common in web-wide searching. Different sites bid for the right to be ranked high, or higher, on users' result lists. Yahoo! Search Marketing (Figure 8-21) is one of the most popular sites to take this approach.

Figure 8-21. Overture (now Yahoo! Search Marketing) used to auction the right to be ranked highly

If your site aggregates content from a number of different vendors, you might consider implementing PFP to present search results. Or if users are shopping, they might appreciate this approach, with the assumption that the most stable, successful sites are the ones that can afford the highest placement. This is somewhat like selecting the plumber with the largest advertisement in the yellow pages to fix your toilet.

8.7.4. Grouping Results

Despite all the ways we can list results, no single approach is perfect. Hybrid approaches like Google's show a lot of promise, but you typically need to be in the business of creating search engines to have this level of involvement with a tool. In any case, our sites are typically getting larger, not smaller.
Search result sets will accordingly get larger as well, and so will the probability that the ideal results will be buried far beyond the point where users give up looking. However, one alternative to sorting and ranking holds promise: clustering retrieved results by some common aspect. An excellent study by researchers at Microsoft and the University of California at Berkeley shows improved performance when results are clustered by category as well as presented in a ranked list. How can we cluster results? The obvious ways are, unfortunately, the least useful: we can use existing metadata, like document type (e.g., .doc, .pdf) and file creation/modification date, to divide search results into clusters. Much more useful are clusters derived from manually applied metadata, like topic, audience, language, and product family. Unfortunately, approaches based on manual effort can be prohibitively expensive.
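A minimal sketch of the metadata-based clustering described above, assuming each result already carries a manually applied topic field (the records and field names here are invented for illustration):

```python
from collections import defaultdict

# Hypothetical results, each tagged by a human indexer with a topic.
results = [
    {"title": "Backup Strategies",   "topic": "Storage"},
    {"title": "RAID Explained",      "topic": "Storage"},
    {"title": "Firewall Basics",     "topic": "Security"},
    {"title": "Choosing a Password", "topic": "Security"},
]

# Group the flat result list into clusters keyed by topic.
clusters = defaultdict(list)
for result in results:
    clusters[result["topic"]].append(result["title"])

for topic, titles in clusters.items():
    print(topic, titles)
```

The grouping step itself is trivial; the expense, as the text notes, lies in getting that topic metadata applied to the documents in the first place.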