How do we choose among randomized BSTs, splay BSTs, red black BSTs, and skip lists for a particular application? We have concentrated on the differing nature of these algorithms' performance guarantees. Time and space are always primary considerations, but we must also consider a number of other factors. In this section, we shall briefly discuss implementation issues, empirical studies, estimates of running time, and space requirements.
All the tree-based algorithms depend on rotations; implementation of rotations along the search path is an essential ingredient of most balanced tree algorithms. We have used recursive implementations that implicitly save references to nodes on the search path in local variables on the recursion stack, but each of the algorithms can be implemented in a nonrecursive fashion, operating on a constant number of nodes and performing a constant number of link operations per node in one top-down pass through the tree.
Randomized BSTs are the simplest to implement of the three tree-based algorithms. The prime requirements are to have confidence in the random-number generator and to avoid spending too much time generating the random bits. Splay BSTs are slightly more complicated but are a straightforward extension to the standard root-insertion algorithm. Red black BSTs involve slightly more code stillf in order to check and manipulate the color bits. One advantage of red black trees over the other two is that the color bits can be used for both a consistency check for debugging and for a guarantee of a quick search at any time during the lifetime of the tree. There is no way to know from examining a splay BST whether or not the code that produced it made all the proper transformations; a bug might lead (only!) to performance problems. Similarly, a bug in the random-number generator for randomized BSTs or skip lists could lead to otherwise unnoticed performance problems.
Skip lists are easy to implement and are particularly attractive if a full range of symbol-table operations is to be supported, because search, insert, remove, join, select, and sort all have natural implementations that are easy to formulate. The inner loop for searching in skip lists is longer than that for trees (it involves an additional index into the link array or an additional recursive call to move down a level), so the time for search and insert is longer. Skip lists also put the programmer at the mercy of the random-number generator debugging a program whose behavior is random is a challenge, and some programmers find it particularly unsettling to work with nodes having a random number of links.
Table 13.1 gives empirical data on both the performance of the four methods that we have discussed in this chapter and on the elementary BST implementations from Chapter 12, for keys that are random 32-bit integers. The information in this table confirms what we expect from the analytic results in Sections 13.2, 13.4, and 13.5. Red black BSTs are much faster than the others for random keys. Paths in red black BSTs are 35 percent shorter than in randomized or splay BSTs, and there is less work to do in the inner loop. Randomized trees and skip lists require that we generate at least one new random number for every insertion, and splay BSTs involve a rotation at every node for every insertion and every search. By contrast, the overhead for red black BSTs is that we check the value of 2 bits at every node during insertion and occasionally need to do a rotation. For nonuniform access, splay BSTs may involve shorter paths, but this savings is likely to be offset by the fact that both search and insertion involve rotations at every node in the inner loop, except possibly in extreme cases.
Splay BSTs require no extra space for balance information, red black BSTs require 1 extra bit, and randomized BSTs require a count field. For many applications, the count field is maintained for other reasons, so it may not represent an extra cost for randomized BSTs. Indeed, we might need to add this field if we use splay BSTs, red black BSTs, or skip lists. If necessary, we can make red black BSTs as space-efficient as splay BSTs by eliminating the color bit (see Exercise 13.65). In modern applications, space is less critical than it once was, but the careful programmer still needs to be vigilant against waste. For example, we need to be aware that some systems might use a whole 32-bit word for a small count field or a 1-bit color field in a node, and that some other systems might pack the fields in memory such that unpacking them requires a significant amount of extra time. If space is tight, skip lists with large t can reduce by nearly one-half the space for links, at the cost of a slower but still logarithmic search. With some programming, the tree-based methods can also be implemented with one link per node (see Exercise 12.68).
These relative timings for building and searching BSTs from random sequences of N 32-bit integers, for various values of N, indicate that all the methods have good performance, even for huge tables, but that red black trees are significantly faster than are the other methods. All the methods use standard BST search, except splay BSTs, where we splay on search to bring frequently accessed keys near the top, and skip lists, which use essentially the same algorithm with a different underlying data structure. | ||||||||||||
construction | search misses | |||||||||||
N | B | T | T* | S | R | L | B | T | T* | S | R | L |
1250 | 20 | 28 | 28 | 10 | 14 | 15 | 11 | 10 | 10 | 10 | 10 | 16 |
2500 | 10 | 36 | 40 | 24 | 25 | 21 | 15 | 12 | 11 | 12 | 11 | 19 |
5000 | 22 | 33 | 65 | 145 | 42 | 35 | 26 | 26 | 26 | 27 | 19 | 46 |
12500 | 90 | 128 | 197 | 267 | 92 | 145 | 75 | 74 | 86 | 80 | 60 | 145 |
25000 | 167 | 569 | 492 | 215 | 181 | 385 | 175 | 180 | 212 | 195 | 158 | 355 |
50000 | 381 | 648 | 1105 | 1125 | 430 | 892 | 420 | 415 | 505 | 433 | 359 | 878 |
100000 | 1004 | 2593 | 2656 | 1148 | 1190 | 3257 | 1047 | 1041 | 1357 | 1113 | 861 | 2094 |
200000 | 2628 | 6121 | 6341 | 2784 | 2936 | 7493 | 2553 | 2573 | 2893 | 2649 | 2114 | 5109 |
Key: B Standard BST(Program 12.15) T BST built by root insertion (Program 12.19) T* Randomized BST (Program 13.2) S Splay BST(Exercise 13.33 and Program 13.5) R Red black BST (Program 13.6) L Skip list (Programs 13.7 and 13.9) |
In summary, all the methods that we have discussed in this chapter will provide good performance for typical applications, and each has its virtues for people interested in developing a high-performance symbol-table implementation. Splay BSTs will provide good performance as a self-organizing search method, particularly when frequent access to a small set of keys is a typical pattern; randomized BSTs are likely to be faster and easier to implement for a fully functional symbol table BST; skip lists are easy to understand and can provide logarithmic search with less space than the other methods, and red black BSTs are attractive for symbol-table library implementations, because they provide guaranteed performance bounds in the worst case and the fastest search and insertion algorithms for random data.
Beyond specific uses in applications, this panoply of solutions to the problem of developing efficient implementations of the symbol-table ADT is important because it illustrates fundamental approaches to algorithm design that are available to us when we consider solutions to other problems. In our constant quest for simple, optimal algorithms, we often encounter useful near-optimal algorithms, such as the ones discussed here. Moreover, as we saw with sorting, comparison-based algorithms such as these are only the beginning of the story by moving to a lower-level abstraction, where we can process pieces of keys, we can develop implementations that are even faster than the ones discussed in this chapter, as we shall see in Chapters 14 and 15.
13.85 Develop a symbol-table implementation using randomized BSTs that includes a clone implementation and supports the construct, count, search, insert, remove, and join operations for a symbol-table ADT, with support for client handles (see Exercises 12.6 and 12.7).
13.86 Develop a symbol-table implementation using skip lists that includes a clone implementation and supports the construct, count, search, insert, remove, and join operations for a symbol-table ADT, with support for client handles (see Exercises 12.6 and 12.7).
Top |