8.9. Where to Learn More

Although this is the longest chapter in this book, we've covered only the tip of the search system iceberg. If this piqued your interest, you may want to delve further into the field of information retrieval. Three of our favorite texts are:

  • Modern Information Retrieval by Ricardo Baeza-Yates and Berthier Ribeiro-Neto (Addison-Wesley).

  • Concepts of Information Retrieval by Miranda Lee Pao (Libraries Unlimited). This title is out of print, but you may be able to find used copies on Amazon.

  • On Search, the Series by Tim Bray, an excellent collection of essays on search written by the father of XML (

If you're looking for more immediate and practical advice, the most useful site for learning about search tools is, naturally, (, Avi Rappoport's compendium of installation and configuration advice, product listings, and industry news. Another excellent source is Danny Sullivan's Search Engine Watch (, which focuses on web-wide searching but is quite relevant to site-wide searching nonetheless.

Chapter 9. Thesauri, Controlled Vocabularies, and Metadata

What we'll cover:

  • Definitions of metadata and controlled vocabularies

  • Overview of synonym rings, authority files, classification schemes, and thesauri

  • Hierarchical, equivalence, and associative relationships

  • Faceted classification and guided navigation

A web site is a collection of interconnected systems with complex dependencies. A single link on a page can simultaneously be part of the site's structure, organization, labeling, navigation, and searching systems. It's useful to study these systems independently, but it's also crucial to consider how they interact. Reductionism will not tell us the whole truth.

Metadata and controlled vocabularies present a fascinating lens through which we can view the network of relationships between systems. In many large metadata-driven web sites, controlled vocabularies have become the glue that holds the systems together. A thesaurus on the back end can enable a more seamless and satisfying user experience on the front end.

In addition, the practice of thesaurus design can help bridge the gap between past and present. The first thesauri were developed for libraries, museums, and government agencies long before the invention of the World Wide Web. As information architects we can draw upon these decades of experience, but we can't copy indiscriminately. The web sites and intranets we design present new challenges and demand creative solutions.

But we're getting ahead of ourselves. Let's begin by defining some basic terms and concepts. Then we can work back toward the big picture.

9.1. Metadata

When it comes to definitions, metadata is a slippery fish. Describing it as "data about data" isn't very helpful. The following excerpt from takes us a little further:

In data processing, meta-data is definitional data that provides information about or documentation of other data managed within an application or environment. For example, meta-data would document data about data elements or attributes (name, size, data type, etc.) and data about records or data structures (length, fields, columns, etc.) and data about data (where it is located, how it is associated, ownership, etc.). Meta-data may include descriptive information about the context, quality and condition, or characteristics of the data.

While these tautological explanations could lead us into the realms of epistemology and metaphysics, we won't go there. Instead, let's focus on the role that metadata plays in the practical realm of information architecture.

Metadata tags are used to describe documents, pages, images, software, video and audio files, and other content objects for the purposes of improved navigation and retrieval. The HTML keywords meta tag used by many web sites provides a simple example. Authors can freely enter words and phrases that describe the content. These keywords are not displayed in the interface but are available for use by search engines.

<meta name="keywords" content="information architecture, content management, 
knowledge management, user experience">
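
To make the idea of keywords being "available for use by search engines" concrete, here is a minimal sketch of how an indexing program might read them. It uses Python's standard html.parser module; the class name KeywordMetaParser, the function extract_keywords, and the sample page are our own illustrative assumptions, not part of any particular search engine.

```python
from html.parser import HTMLParser


class KeywordMetaParser(HTMLParser):
    """Collects the terms listed in <meta name="keywords"> tags (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "keywords":
            return
        content = attributes.get("content") or ""
        for term in content.split(","):
            # Collapse line breaks and repeated spaces inside each term.
            term = " ".join(term.split())
            if term:
                self.keywords.append(term)


def extract_keywords(html):
    """Return the author-supplied keywords found in an HTML page."""
    parser = KeywordMetaParser()
    parser.feed(html)
    parser.close()
    return parser.keywords


# Hypothetical page that reuses the meta tag shown above.
page = """<html><head>
<meta name="keywords" content="information architecture, content management,
knowledge management, user experience">
</head><body></body></html>"""

print(extract_keywords(page))
```

Because authors enter these terms freely, nothing in this sketch checks them against an agreed list of terms; that is the gap controlled vocabularies are meant to fill.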

Many companies today are using metadata in more sophisticated ways. Leveraging content management software and controlled vocabularies, they create dynamic metadata-driven web sites that support distributed authoring and powerful navigation. This metadata-driven model represents a profound change in how web sites are created and managed. Instead of asking, "Where do I place this document in the taxonomy?" we can now ask, "How do I describe this document?" The software and vocabulary systems take care of the rest.
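
The shift from placing documents to describing them can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the field names (topic, audience), the sample documents, and the functions validate and build_navigation are hypothetical and do not reflect any specific content management product.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: the only values an author may assign.
VOCABULARY = {
    "topic": {"search", "controlled vocabularies"},
    "audience": {"employees", "authors"},
}

# Hypothetical content records. Each document is described with metadata
# rather than filed in one fixed place in a taxonomy.
DOCUMENTS = [
    {"title": "Intranet Search Guidelines", "topic": "search", "audience": "employees"},
    {"title": "Thesaurus Style Guide", "topic": "controlled vocabularies", "audience": "authors"},
    {"title": "Tagging Checklist", "topic": "controlled vocabularies", "audience": "employees"},
]


def validate(document):
    """Reject metadata values that are not in the controlled vocabulary."""
    for field, allowed in VOCABULARY.items():
        value = document.get(field)
        if value not in allowed:
            raise ValueError(f"{value!r} is not an approved {field} term")


def build_navigation(documents, field):
    """Group document titles under each value of one metadata field."""
    pages = defaultdict(list)
    for document in documents:
        validate(document)
        pages[document[field]].append(document["title"])
    return {term: sorted(titles) for term, titles in sorted(pages.items())}


# The same descriptions yield two different browsing structures.
print(build_navigation(DOCUMENTS, "topic"))
print(build_navigation(DOCUMENTS, "audience"))
```

The author answers only "How do I describe this document?" by filling in the fields. The browsing structures by topic and by audience are generated from those descriptions, so a single document can appear in several places without being filed more than once.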