12.6 Binary Search Trees | Algorithms in Java, Parts 1-4 (3rd Edition) (Pts.1-4)

Insertions into ordered lists and searches in unordered lists are both expensive, so neither data structure is useful when we have a mix of such operations. In this section, we see that using an explicit tree structure as the basis for a symbol-table implementation allows us to develop algorithms with fast average-case performance for the search, insert, select, and sort symbol-table operations. It is the method of choice for many applications and qualifies as one of the most fundamental algorithms in computer science.

We discussed trees at some length, in Chapter 5, but it will be useful to review the terminology. We are working with data structures comprised of nodes that contain links that either point to other nodes or to external nodes, which have no links. In a (rooted) tree, we have the restriction that every node is pointed to by just one other node, which is called its parent. In a binary tree, we have the further restriction that each node has exactly two links, which are called its left and right links. Nodes with two links are also referred to as internal nodes. For search, each internal node also has an item with a key value, and we refer to links to external nodes as null links. The key values in internal nodes are compared with the search key, and control the progress of the search.

Definition 12.2 A binary search tree (BST) is a binary tree that has a key associated with each of its internal nodes, with the additional property that the key in any node is larger than (or equal to) the keys in all nodes in that node's left subtree and smaller than (or equal to) the keys in all nodes in that node's right subtree.

Program 12.15 uses BSTs to implement the symbol-table search, insert, and construct operations. It defines nodes in BSTs as each containing an item (with a key), a left link, and a right link. The left link points to a BST for items with smaller (or equal) keys, and the right link points to a BST for items with larger (or equal) keys.

Program 12.15 BST-based symbol table

The search and insert methods in this implementation use the private recursive methods searchR and insertR that directly mirror the recursive definition of BSTs. The link head points to the root of the tree. The less and equals methods are the same as in Program 12.8 and are omitted.

 class ST    {      private class Node        { ITEM item; Node l, r;          Node(ITEM x) { item = x; }        }      private Node head;      ST(int maxN)        { head = null; }      private Node insertR(Node h, ITEM x)        {          if (h == null)              return new Node(x);          if (less(x.key(), h.item.key()))              h.l = insertR(h.l, x);          else h.r = insertR(h.r, x);          return h;        }      void insert(ITEM x)        { head = insertR(head, x); }      private ITEM searchR(Node h, KEY v)        {          if (h == null) return null;          if (equals(v, h.item.key())) return h.item;          if (less(v, h.item.key()))               return searchR(h.l, v);          else return searchR(h.r, v);        }      ITEM search(KEY key)        { return searchR(head, key); }

Given this structure, a recursive algorithm to search for a key in a BST follows immediately: If the tree is empty, we have a search miss; if the search key is equal to the key at the root, we have a search hit. Otherwise, we search (recursively) in the appropriate subtree. The searchR method in Program 12.15 implements this algorithm directly. We invoke a recursive method that takes a tree as first parameter and a key as second parameter, starting with the root of the tree and the search key. At each step, we are guaranteed that no parts of the tree other than the current subtree can contain items with the search key. Just as the size of the interval in binary search shrinks by a little more than half on each iteration, the current subtree in binary-tree search is smaller than the previous (by about half, ideally). The procedure stops either when an item with the search key is found (search hit) or when the current subtree becomes empty (search miss).

The diagram at the top in Figure 12.6 illustrates the search process for a sample tree. Starting at the top, the search procedure at each node involves a recursive invocation for one of that node's children, so the search defines a path through the tree. For a search hit, the path terminates at the node containing the key. For a search miss, the path terminates at an external node, as illustrated in the middle diagram in Figure 12.6.

Figure 12.6. BST search and insertion

In a successful search for H in this sample tree (top), we move right at the root (since H is larger than A), then left at the right subtree of the root (since H is smaller than S), and so forth, continuing down the tree until we encounter the H. In an unsuccessful search for M in this sample tree (center), we move right at the root (since M is larger than A), then left at the right subtree of the root (since M is smaller than S), and so forth, continuing down the tree until we encounter an external link at the left of N at the bottom. To insert M after the search miss, we simply replace the link that terminated the search with a link to M (bottom).

graphics/12fig06.gif

Program 12.15 uses null links to represent external nodes, and a private data member head that is a link to the root of the tree. To construct an empty BST, we set head to null. We could also use a dummy node at the root and another to represent all external nodes, in various combinations analogous to those we considered for linked lists in Table 3.1 (see Exercise 12.67).

The search method in Program 12.15 is as simple as binary search; an essential feature of BSTs is that insert is as easy to implement as search. A recursive method insertR to insert a new item into a BST follows from logic similar to that we used to develop searchR: If the tree is empty, we return a new node containing the item; if the search key is less than the key at the root, we set the left link to the result of inserting the item into the left subtree; otherwise, we set the right link to the result of inserting the key into the right subtree. For the simple BSTs that we are considering, resetting the link after the recursive call is usually unnecessary, because the link changes only if the subtree was empty, but it is as easy to set the link as to test to avoid setting it. In Section 12.8 and in 13, we shall study more advanced tree structures that are naturally expressed with this same recursive scheme but that more often actually change the link.

Program 12.16 Counting nodes and sorting in BSTs

An inorder traversal of a BST visits all the items in order of their keys, and therefore can serve as the basis for implementing the count and sort operations, as in these methods. This lazy count implementation of count is appropriate only when counts are infrequent; to implement an eager count, we could maintain a field in each node giving the number of nodes in its subtree (see Exercise 12.60).

 private int countR(Node h)    { if (h == null) return 0;      return 1 + countR(h.l) + countR(h.r);    }  int count() { return countR(head); }  private String toStringR(Node h)    { if (h == null) return "";      String s = toStringR(h.l);      s += h.item.toString() + "\n";      s += toStringR(h.r);      return s;    }  public String toString()    { return toStringR(head); }

Figures 12.7 and 12.8 show how we construct a sample BST by inserting a sequence of keys into an initially empty tree. New nodes are attached to null links at the bottom of the tree; the tree structure is not otherwise changed. Because each node has two links, the tree tends to grow out, rather than down.

Figure 12.7. BST construction

This sequence depicts the result of inserting the keys A S E R C H I N into an initially empty BST. Each insertion follows a search miss at the bottom of the tree.

graphics/12fig07.gif

Figure 12.8. BST construction (continued)

This sequence depicts insertion of the keys G X M P L to the BST started in Figure 12.7.

graphics/12fig08.gif

The sort operation for symbol tables is available with little extra work when BSTs are used. Constructing a binary search tree amounts to sorting the items, since a binary search tree represents a sorted file when we look at it the right way. In our figures, the keys appear in order if read from left to right on the page (ignoring their height and the links). A program has only the links with which to work, but a simple inorder traversal does the job, by definition, as shown by the recursive implementation toStringR in Program 12.16. To show the items in a BST in order of their keys, we show the items in the left subtree in order of their keys (recursively), then show the root, then show the items in the right subtree in order of their keys (recursively).

Program 12.17 Insertion in BSTs (nonrecursive)

Inserting an item into a BST is equivalent to doing an unsuccessful search for it, then attaching a new node for the item in place of the null link where the search terminates. Attaching the new node requires that we keep track of the parent p of the current node q as we proceed down the tree. When we reach the bottom of the tree, p points to the node whose link we must change to point to the new node inserted.

 public void insert(ITEM x)    { KEY key = x.key();      if (head == null)        { head = new Node(x); return; }      Node p = head, q = p;      while (q != null)        if (less(key, q.item.key()))             { p = q; q = q.l; }        else { p = q; q = q.r; }      if (less(key, p.item.key()))           p.l = new Node(x);      else p.r = new Node(x);    }

As discussed in Section 12.1, we shall refer on occasion to a generic visit operation for symbol tables, where we want to visit each of the items in the symbol table in a systematic manner. For BSTs, we can visit items in order of their keys by replacing "show" by "visit" in the description just given and perhaps arranging to pass an object that has a method to visit an item as a parameter (see Section 5.6).

Thinking nonrecursively when contemplating search and insert in BSTs is also instructive. In a nonrecursive implementation, the search process consists of a loop where we compare the search key against the key at the root, then move left if the search key is less and right if it is greater. Insertion consists of a search miss (ending in an empty link), then replacement of the empty link with a pointer to a new node. This process corresponds to manipulating the links explicitly along a path down the tree (see Figure 12.6). In particular, to be able to insert a new node at the bottom, we need to maintain a link to the parent of the current node, as in the implementation in Program 12.17. As usual, the recursive and nonrecursive versions are essentially equivalent, but understanding both points of view enhances our understanding of the algorithm and data structure.

The BST methods in Program 12.15 do not explicitly check for items with duplicate keys. When a new node whose key is equal to some key already in the tree is inserted, it falls to the right of the node already in the tree. One side effect of this convention is that nodes with duplicate keys do not appear contiguously in the tree (see Figure 12.9). However, we can find them by continuing the search from the point where search finds the first match, until we encounter a null link. There are several other options for dealing with items that have duplicate keys, as mentioned in Section 12.1.

Figure 12.9. Duplicate keys in BSTs

When a BST has records with duplicate keys (top), they appear scattered throughout the tree, as illustrated by the three highlighted A's. Duplicate keys do all appear on the search path for the key from the root to an external node, so they can readily be accessed. However, to avoid confusing us-ages such as "the A below the C," we use distinct keys in our examples (bottom).

graphics/12fig09.gif

BSTs are dual to quicksort. The node at the root of the tree corresponds to the partitioning element in quicksort (no keys to the left are larger, and no keys to the right are smaller). In Section 12.6, we shall see how this observation relates to the analysis of properties of the trees.

Exercises

12.57 Draw the BST that results when you insert items with the keys E A S Y Q U T I O N, in that order, into an initially empty tree.

12.58 Draw the BST that results when you insert items with the keys E A S Y Q U E S T I O N, in that order, into an initially empty tree.

12.59 Give the number of comparisons required to put the keys E A S Y Q U E S T I O N into an initially empty symbol table using a BST. Assume that a search is performed for each key, followed by an insert for each search miss, as in Program 12.6.

12.60 Add an integer field N to Node and modify the BST code in Programs 12.15 and 12.16 to implement an eager count operation that takes constant time.

12.63 Draw a diagram like Figure 12.1 that depicts three representations of the 4-key BST in Figure 12.7: one with primitive fields in the nodes (as in Exercise 12.62), one with int keys, and one with myKey keys.

12.64 Implement a searchinsert method for BSTs (Program 12.15). It should search the symbol table for an item with the same key as a given item, then insert the item if it finds none.

12.65 Write a method that returns the number of items in a BST with keys equal to a given key.

12.66 Suppose that we have an estimate ahead of time of how often search keys are to be accessed in a binary tree. Should the keys be inserted into the tree in increasing or decreasing order of likely frequency of access? Explain your answer.

See Table 3.1.)

12.68 Modify the BST implementation in Program 12.15 to keep items with duplicate keys in linked lists hanging from tree nodes. Change the interface to have search operate like sort (for all the items with the search key).

12.69 The nonrecursive insertion procedure in Program 12.17 uses a redundant comparison to determine which link of p to replace with the new node. Give an implementation that avoids this comparison.

Top