Section 9.4. A Thesaurus in Action


It's not so easy to find good examples of public web sites that leverage thesauri. Until recently, not many teams have had the knowledge or support to make this significant investment. We expect this to change in the coming years as thesauri become a key tool for dealing with the growing size and importance of web sites and intranets. Another barrier to finding good examples is that it's often not obvious when a site is using a thesaurus. When it's well integrated, a thesaurus can be invisible to the untrained eye. You have to know what you're looking for to notice one. Think back to the Tilenol/Tylenol example. How many users even realize when the site adjusts for their misspelling?

One good example that will serve throughout this chapter is PubMed, a service of the National Library of Medicine. PubMed provides access to over 16 million citations from MEDLINE and additional life science journals. MEDLINE has been the premier electronic information service for doctors, researchers, and other medical professionals for many years. It leverages a huge thesaurus, Medical Subject Headings (MeSH), which includes more than 19,000 preferred terms or "main subject headings," and it provides powerful searching capabilities.

PubMed provides a simpler public interface with free access to citations, but without access to the full text of the journal articles. Let's first take a look at the interface, and then dive beneath the surface to see what's going on.

Let's say we're studying African sleeping sickness. We enter that phrase into the PubMed search engine and are rewarded with the first 20 results out of 2,778 total items found (Figure 9-14). So far, there's nothing apparently different about this search experience. For all we know, we might have just searched the full text of all 16 million journal articles. To understand what's going on, we need to look deeper.

Figure 9-14. Search results on PubMed


In fact, we didn't search the full-text articles at all. Instead, we searched the metadata records for these articles, which include a combination of abstracts and subject headings (Figure 9-15).
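The difference matters for retrieval: the query is matched against a few metadata fields, not the article body. Here is a minimal Python sketch of that idea. It assumes a hypothetical `CitationRecord` with a title, an optional abstract, and a list of subject headings, and a simple case-insensitive substring match; PubMed's actual record format and matching rules are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CitationRecord:
    """Hypothetical metadata record; the article's full text is not stored."""
    title: str
    abstract: str = ""  # some records have no abstract
    headings: list[str] = field(default_factory=list)  # subject headings


def search_metadata(records: list[CitationRecord], query: str) -> list[CitationRecord]:
    """Return records whose title, abstract, or any heading contains the query.

    Matching is a plain case-insensitive substring test over metadata
    fields only; there is no full text to search.
    """
    q = query.casefold()
    hits = []
    for rec in records:
        fields = [rec.title, rec.abstract, *rec.headings]
        if any(q in text.casefold() for text in fields):
            hits.append(rec)
    return hits


# Illustrative records, invented for this sketch.
records = [
    CitationRecord("Record A", abstract="Notes on sleeping sickness surveillance."),
    CitationRecord("Record B", headings=["Trypanosomiasis, African"]),
]
print([r.title for r in search_metadata(records, "sleeping sickness")])
# prints ['Record A']
```

Only Record A matches, because only its abstract contains the phrase. Record B is indexed under a different heading, so a purely literal search misses it.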

Figure 9-15. Sample record with abstract in PubMed


When we select another item from our search results, we find a record with subject headings ("MeSH Terms") but no abstract (Figure 9-16).

Figure 9-16. Sample record with index terms in PubMed


When we scroll down to look through the full list of terms, we see no entry for African sleeping sickness. What's going on? Why was this article retrieved? To answer that question, we need to switch gears and take a look at the MeSH Browser, an interface for navigating the structure and vocabulary of MeSH (Figure 9-17).

Figure 9-17. The MeSH Browser


The MeSH Browser enables us to navigate by browsing the hierarchical classification schemes within the thesaurus or by searching. If we try a search on "African sleeping sickness," we'll see why the article "Wolbachia. A tale of sex and survival" was retrieved in our search. "African sleeping sickness" is actually an entry term for the preferred term or MeSH heading, "Trypanosomiasis, African." (See Figure 9-18.) When we searched PubMed, our variant term was mapped to the preferred term behind the scenes. Unfortunately, PubMed doesn't go further in leveraging the underlying MeSH thesaurus. It would be nice, for example, to turn all of those MeSH terms in our sample record into live links and provide enhanced searching and browsing capabilities, similar to those provided by Amazon, as shown in Figure 9-19.
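The behind-the-scenes mapping from an entry term to its preferred term can be sketched in a few lines of Python. This is a hypothetical illustration, not PubMed's implementation. The miniature thesaurus holds only the one preferred term and entry term named in the MeSH record above, and records are assumed to be plain dicts with a `headings` list. The query is normalized, looked up, and replaced by its preferred term before it is matched against the subject headings.

```python
from __future__ import annotations

# Hypothetical miniature thesaurus: preferred term -> list of entry terms.
# Only this one pairing comes from the MeSH record discussed above.
THESAURUS = {
    "Trypanosomiasis, African": ["African sleeping sickness"],
}


def normalize(term: str) -> str:
    """Case-fold and collapse runs of whitespace so trivial variants match."""
    return " ".join(term.casefold().split())


def build_lookup(thesaurus: dict[str, list[str]]) -> dict[str, str]:
    """Map every normalized preferred term and entry term to its preferred term."""
    lookup = {}
    for preferred, entry_terms in thesaurus.items():
        lookup[normalize(preferred)] = preferred
        for entry in entry_terms:
            lookup[normalize(entry)] = preferred
    return lookup


LOOKUP = build_lookup(THESAURUS)


def to_preferred(query: str) -> str:
    """Return the preferred term for a query, or the query itself if unknown."""
    return LOOKUP.get(normalize(query), query)


def search_by_heading(records: list[dict], query: str) -> list[dict]:
    """Rewrite the query to its preferred term, then match subject headings."""
    preferred = to_preferred(query)
    return [r for r in records if preferred in r["headings"]]


records = [
    {"title": "Record B", "headings": ["Trypanosomiasis, African"]},  # invented
    {"title": "Record C", "headings": ["Malaria"]},                   # invented
]
print(to_preferred("African Sleeping Sickness"))  # Trypanosomiasis, African
print([r["title"] for r in search_by_heading(records, "african sleeping sickness")])
# prints ['Record B']
```

Because the lookup runs before retrieval, a record indexed only under "Trypanosomiasis, African" is found even though the phrase the user typed never appears in it. This mirrors the behavior observed in PubMed.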

Figure 9-18. MeSH record for trypanosomiasis (top and bottom of page)


Figure 9-19. Amazon's use of structure and subject headings for enhanced navigation


In this example, Amazon leverages the hierarchical classification scheme and subject headings to provide powerful options for searching and browsing, allowing users to iteratively refine their queries. A similar approach could be a useful enhancement to PubMed.
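One way to get this kind of iterative refinement from subject headings is to count the headings attached to the current results and let the user pick one to narrow the set. The Python sketch below is a hypothetical illustration, not how Amazon or PubMed implement it. The records and every heading other than "Trypanosomiasis, African" are invented, and the hierarchical half of Amazon's approach is not modeled.

```python
from __future__ import annotations

from collections import Counter


def heading_facets(results: list[dict], limit: int = 5) -> list[tuple[str, int]]:
    """Count subject headings across the current results, most frequent first.

    Counter.most_common keeps ties in the order headings were first seen.
    """
    counts = Counter(h for r in results for h in r["headings"])
    return counts.most_common(limit)


def refine(results: list[dict], heading: str) -> list[dict]:
    """Narrow the result set to records indexed with the chosen heading."""
    return [r for r in results if heading in r["headings"]]


# Invented result set for illustration.
results = [
    {"title": "A", "headings": ["Trypanosomiasis, African", "Vector Control"]},
    {"title": "B", "headings": ["Trypanosomiasis, African", "Drug Therapy"]},
    {"title": "C", "headings": ["Trypanosomiasis, African", "Vector Control"]},
]

print(heading_facets(results))
# [('Trypanosomiasis, African', 3), ('Vector Control', 2), ('Drug Therapy', 1)]

narrowed = refine(results, "Vector Control")
print([r["title"] for r in narrowed])  # ['A', 'C']
print(heading_facets(narrowed))        # facets recomputed for the next round
```

Each refinement recomputes the counts over a smaller set, which is what makes the narrowing iterative.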

One of the advantages of using a thesaurus is that you have tremendous power and flexibility to shape and refine the user interface over time. You can't take advantage of all the capabilities at once, but you can user-test different features, learning and adjusting as you go. PubMed may not yet leverage the full power of the MeSH thesaurus, but it's nice to have that rich network of semantic relationships to draw upon as design and development continue.




Information Architecture for the World Wide Web
Information Architecture for the World Wide Web: Designing Large-Scale Web Sites
ISBN: 0596527349
Year: 2006
Pages: 194
