Chapter 12. Symbol Tables and Binary Search Trees | Algorithms in Java, Parts 1-4 (3rd Edition) (Pts.1-4)

The retrieval of a particular piece or pieces of information from large volumes of previously stored data is a fundamental operation, called search, that is intrinsic to a great many computational tasks. As with sorting algorithms in Chapters 6 through 11 and, in particular, priority queues in Chapter 9, we work with data divided into records or items, each item having a key for use in searching. The goal of the search is to find the items with keys matching a given search key. The purpose of the search is usually to access information within the item (not merely the key) for processing.

Applications of search are widespread and involve a variety of different operations. For example, consider a bank that needs to keep track of all its customers' account information and to search through these records to check account balances and to perform transactions. Another example is an airline that needs to keep track of reservations on all its flights and to search through them to find empty seats or to cancel or otherwise modify the reservations. A third example is a search engine on a network software interface that looks for all documents in the network containing a given keyword. The demands of these applications are similar in some ways (the bank and the airline both demand accuracy and reliability) and different in others (the bank's data have a long life, compared to the data in the others); all need good search algorithms.

Definition 12.1 A symbol table is a data structure of items with keys that supports two basic operations: insert a new item, and return an item with a given key.

Symbol tables are also sometimes called dictionaries, by analogy with the time-honored system of providing definitions for words by listing them alphabetically in a reference book. In an English-language dictionary, the "keys" are the words and the "items" are the entries associated with the words that contain the definition, pronunciation, and other information. People use search algorithms to find information in a dictionary, usually depending on the fact that the entries appear in alphabetical order. Telephone books, encyclopedias, and other reference books are organized in essentially the same way, and some of the search methods that we shall discuss (for example, the binary search algorithm in Sections 2.6 and 12.4) also depend upon the entries being kept in order.

An advantage of computer-based symbol tables is that they can be much more dynamic than a dictionary or a telephone book, so most of the methods that we shall discuss build data structures that not only enable efficient search algorithms but also support efficient implementations of operations to add new items, to remove or modify items, to combine two symbol tables into one, and so forth. In this chapter, we shall revisit many of the issues related to such operations that we considered for priority queues in Chapter 9. The development of dynamic data structures to support search is one of the oldest and most widely studied problems in computer science; it will be our main focus in this chapter and in Chapters 13 through 16. As we shall see, many ingenious algorithms have been (and are still being) invented to solve the symbol-table implementation problem.

Beyond basic applications of the type just mentioned, symbol tables have been studied intensively by computer scientists and programmers because they are indispensable aids in organizing software on computer systems. A symbol table is the dictionary for a program: The keys are the symbolic names used in the program, and the items contain information describing the object named. From the early days of computing, when symbol tables allowed programmers to move from using numeric addresses in machine code to using symbolic names in assembly language, to modern applications of the new millennium, when symbolic names have meaning across worldwide computer networks, fast search algorithms have played and will play an essential role in computation.

Symbol tables are also frequently encountered in low-level abstractions, occasionally at the hardware level. The term associative memory is sometimes used to describe the concept. We shall focus on software implementations, but some of the methods that we consider are also appropriate for hardware implementation.

As with our study of sorting methods in Chapter 6, we shall begin our study of search methods in this chapter by looking at some elementary methods that are useful for small tables and in other special situations and that illustrate fundamental techniques exploited by more advanced methods. Then, for much of the remainder of the chapter, we shall focus on the binary search tree (BST), a fundamental and widely used data structure that admits fast search algorithms.

We considered two search algorithms in Section 2.6 as an illustration of the effectiveness of mathematical analysis in helping us to develop effective algorithms. For completeness in this chapter, we repeat some of the information that we considered in Section 2.6, though we refer back to that section for some proofs. Later in the chapter, we also refer to the basic properties of binary trees that we considered in Sections 5.4 and 5.5.

Top