Chapter 8. Search Systems


What we'll cover:
Determining whether your site needs a search system
The basic anatomy of a search system
What to make searchable
A basic understanding of retrieval algorithms
How to present retrieval results
Search interface design
Where to learn more

Chapter 7 helped you create the best navigation system possible for your web site. This chapter describes another form of finding information: searching. Searching (and more broadly, information retrieval) is an expansive, challenging, and well-established field, and we can only scratch the surface here. We'll limit our discussion to what makes up a search system, when to implement search systems, and some practical advice on how to design a search interface and display search results.

This chapter often draws examples from tools that search the entire Web, in addition to site-specific search engines. Although these web-wide tools tend to index a very broad collection of content, it's nonetheless extremely useful to study them. Of all search systems, none has undergone the testing, usage, and investment that web-wide search tools have, so why not benefit from their research? Many of these tools are available for use on local sites as well.



8.1. Does Your Site Need Search?

Before we delve into search systems, we need to make a point: think twice before you make your site searchable.

Your site should, of course, support the finding of its information. But as the preceding chapters demonstrate, there are other ways to support finding. And be careful not to assume, as many do, that a search engine alone will satisfy all users' information needs. While many users want to search a site, some are natural browsers, preferring to forgo filling in that little search box and hitting the "search" button. We suggest you consider the following questions before committing to a search system for your site.


Does your site have enough content?

How much content is enough to merit the use of a search engine? It's hard to say. It could be 5, 50, or 500 pages; no specific number serves as a standard threshold. What's more important is the type of information need that's typical of your site's users. Users of a technical support site often have a specific kind of information in mind, and are more likely to require search than users of an online banking site. If your site is more like a library than a software application, then search probably makes sense. If that's the case, then consider the volume of content, balancing the time required to set up and maintain a search system with the payoff it will bring to your site's users.


Will investing in search systems divert resources from more useful navigation systems?

Because many site developers see search engines as the solution to the problems users have when trying to find information in their sites, search engines become Band-Aids for sites with poorly designed navigation systems and other architectural weaknesses. If you see yourself falling into this trap, you should probably suspend implementing your search system until you fix your navigation system's problems. You'll find that search systems often perform better if they can take advantage of aspects of strong navigation systems, such as the controlled vocabulary terms used to tag content. And users will often benefit even more from using both types of finding if they work together well. Of course, your site's navigation might be a disaster for political reasons, such as an inability among your organization's decision-makers to agree on a site-wide navigation system. In such cases, reality trumps what ought to be, and search might indeed be your best alternative.


Do you have the time and know-how to optimize your site's search system?

Search engines are fairly easy to get up and running, but like many things on the Web, they are difficult to implement effectively. As a user of the Web, you've certainly seen incomprehensible search interfaces, and we're sure that your queries have retrieved some pretty inscrutable results. This is often due to a lack of planning by the site developer, who probably installed the search engine with its default settings, pointed it at the site, and forgot about it. If you don't plan on putting some significant time into configuring your search engine properly, reconsider your decision to implement it.


Are there better alternatives?

Search may be a good way to serve your site's users, but other ways may work better. For example, if you don't have the technical expertise or confidence to configure a search engine or the money to shell out for one, consider providing a site index instead. Both site indexes and search engines help users who know what they're looking for. A site index can be a heck of a lot of work, but because it is typically created and maintained by hand, anyone who knows HTML can keep it up to date.


Will your site's users bother with search?

It may already be clear that your users would rather browse than search. For example, users of a greeting card site may prefer browsing thumbnails of cards instead of searching. Or perhaps users do want to search, but searching is a lower priority for them, and it should be a lower priority for you, too, as you decide how to spend your information architecture development budget.

Now that we've got our warnings and threats out of the way, let's discuss when you should implement search systems. Most web sites, as we know, aren't planned out in much detail before they're built. Instead, they grow organically. This may be all right for smaller web sites that aren't likely to expand much, but for ones that become popular, more and more content and functional features get piled on haphazardly, leading to a navigation nightmare. The following issues will help you decide when your site has reached the point of needing a search system.


Search helps when you have too much information to browse

There's a good analogy in physical architecture. Powell's Books (http://www.powells.com), which claims to be the largest bookstore in the world, covers an entire city block (68,000 square feet) in Portland, Oregon. We guess that it started as a single small storefront on that block, but as the business grew, the owners knocked a doorway through the wall into the next storefront, and so on, until it occupied the whole block. The result is a hodgepodge of chambers, halls with odd turns, and unexpected stairways. This chaotic labyrinth is a charming place to wander and browse, but if you're searching for a particular title, good luck. It will be difficult to find what you're looking for, although if you're really lucky you might serendipitously stumble onto something better.

Yahoo! once was a web version of Powell's. At first, everything was there and fairly easy to find. Why? Because Yahoo!, like the Web, was relatively small. At its inception, Yahoo! pointed to a few hundred Internet resources, made accessible through an easily browsable subject hierarchy. No search option was available, something unimaginable to Yahoo! users today. But things soon changed. Yahoo! had an excellent technical architecture that allowed site owners to easily self-register their sites, but Yahoo!'s information architecture was not well planned and couldn't keep up with the increasing volume of resources that were added daily. Eventually, the subject hierarchy became too cumbersome to navigate, and Yahoo! installed a search system as an alternative way of finding information in the site. Nowadays, far more people use Yahoo!'s search engine than browse its taxonomy, which indeed disappeared from Yahoo!'s main page eons ago.

Your site probably isn't as large as Yahoo!, but it's probably experienced a similar evolution. Has your content outstripped your browsing systems? Do your site's users go insane trying to spot the right link on your site's hugely long category pages? Then perhaps the time has come for search.


Search helps fragmented sites

Powell's room after room after room of books is also a good analogy for the silos of content that make up so many intranets and large public sites. As is so often the case, each business unit has gone ahead and done its own thing, developing content haphazardly with few (if any) standards, and probably no metadata to support any sort of reasonable browsing.

If this describes your situation, you have a long road ahead of you, and search won't solve all of your problems, let alone your users' problems. But your highest priority should be to set up a search system to perform full-text indexing of as much cross-departmental content as possible. Even if it's only a stopgap, search will address your users' dire need for finding information regardless of which business unit actually owns it. Search will also help you, as the information architect, to get a better handle on what content is actually out there.


Search is a learning tool

Through search-log analysis, which we touched on in Chapter 6, you can gather useful data on what users actually want from your site, and how they articulate their needs (in the form of search queries). Over time you can analyze this valuable data to diagnose and tune your site's search system, other aspects of its information architecture, the performance of its content, and many other areas as well.
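
To make this concrete, here is a minimal sketch of the kind of tallying that search-log analysis usually starts with. It assumes a hypothetical log format, a list of (query, result count) pairs; real search engines record their logs in their own formats, and the function names and sample data below are ours for illustration, not part of any product.

```python
from collections import Counter


def normalize(query):
    """Lowercase a query and collapse runs of whitespace to single spaces."""
    return " ".join(query.lower().split())


def top_queries(log, n=10):
    """Return the n most frequent normalized queries with their counts."""
    counts = Counter(normalize(query) for query, _ in log)
    return counts.most_common(n)


def failed_queries(log):
    """Count the normalized queries that retrieved zero results."""
    return Counter(normalize(query) for query, hits in log if hits == 0)


# Hypothetical log entries: (query as typed, number of results returned).
sample_log = [
    ("Return Policy", 12),
    ("return policy", 12),
    ("  shipping  rates", 4),
    ("refund form", 0),
    ("return  policy", 12),
    ("shipping rates", 4),
    ("warranty pdf", 0),
]

if __name__ == "__main__":
    print("Most common queries:", top_queries(sample_log, 2))
    print("Queries with no results:", dict(failed_queries(sample_log)))
```

The most common queries show how users phrase their needs, while the zero-result queries point to missing content or to vocabulary the site doesn't use. Both lists are only starting points for the diagnosis described above.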


Search should be there because users expect it to be there

Your site probably doesn't contain as much content as Yahoo!, but if it's a substantial site, it probably merits a search engine. There are good reasons for this. Users won't always be willing to browse through your site's structure; their time is limited, and their cognitive-overload threshold is lower than you think. Interestingly, sometimes users won't browse for the wrong reasons; that is, they search when they don't necessarily know what to search for and would be better served by browsing. But perhaps most of all, users expect that little search box wherever they go. It's a default convention, and it's hard to stand against the wave of expectations.


Search can tame dynamism

You should also consider creating a search system for your site if it contains highly dynamic content. For example, if your site is a web-based newspaper, you might be adding dozens of story files daily via a commercial newsfeed or some other form of content syndication. For this reason, you probably wouldn't have the time each day to manually catalog your content or maintain elaborate tables of contents and site indexes. A search engine could help you by automatically indexing the contents of the site once or many times daily. Automating this process ensures that users have quality access to your site's content, and you can spend time doing things other than manually indexing and linking the story files as they come in.
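
As a rough illustration of what "automatically indexing" means, the sketch below builds a tiny inverted index (a map from each word to the stories that contain it) and rebuilds it when a new story arrives. It is a toy under stated assumptions: the story IDs, the sample text, and the function names are hypothetical, and a real search engine adds ranking, stemming, incremental updates, and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each term to the set of document IDs whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return IDs of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Start from a copy so the index itself is never modified.
    matches = set(index.get(terms[0], set()))
    for term in terms[1:]:
        matches &= index.get(term, set())
    return matches


# Hypothetical story files keyed by ID.
stories = {
    "story-001": "City council approves new library budget",
    "story-002": "Local team wins the regional final",
}

if __name__ == "__main__":
    index = build_index(stories)
    print(search(index, "council"))

    # A new story arrives from the newsfeed; re-running the indexer
    # makes it findable without any manual cataloging.
    stories["story-003"] = "Council delays vote on stadium"
    index = build_index(stories)
    print(search(index, "council"))
```

Rebuilding the whole index on a schedule is the simplest approach and is usually adequate for a small site; larger collections generally call for an engine that can update its index incrementally.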