Chapter 15. Radix Search

   

Several search methods proceed by examining the search keys one small piece at a time, rather than using full comparisons between keys at each step. These methods, called radix-search methods, operate in a manner entirely analogous to the radix-sorting methods that we discussed in Chapter 10. They are useful when the pieces of the search keys are easily accessible, and they can provide efficient solutions to a variety of practical search tasks.

We use the same abstract model that we used in Chapter 10: Depending on the context, a key may be a word (a fixed-length sequence of bytes) or a string (a variable-length sequence of bytes). We treat keys that are words as numbers represented in a base-R number system, for various values of R (the radix), and work with individual digits of the numbers. We view strings as variable-length numbers terminated by a special symbol so that, for both fixed- and variable-length keys, we can base all our algorithms on the abstract operation "extract the ith digit from a key," including a convention to handle the case that the key has fewer than i digits. Accordingly, all of our implementations are based on the two-parameter static method digit from Chapter 10 that implements this operation. For clarity, we use the name bit when R is 2.

This convention gives us the flexibility to accommodate complicated keys and items by defining an appripriate class or to accommodate simple keys by defining appropriate digit or bit methods for primitive types. For example, for integer keys, we could replace KEY by int in our code and add to each class the following code:

Program 15.1 Binary key type

This code extends a key class such as Program 12.2, which defines integer-valued keys, to provide radix methods with access to the key bits. It provides a bit method that returns the indicated bit from the key (an integer that is 0 or 1), the constants bitsword and R,and a toString method that returns a representation of the key as a string of bits.

 class bitsKey extends myKey    {      public final static int bitsword = 31;      public final static int R = 2;      public int bit(int B)        { return (val >> (bitsword-B-1)) & 1; }      public String toString()        { String s = "";          for (int i = 0; i < bitsword; i++)            s = s + bit(i);          return s;        }    } 
 private final static int bitsword = 31;  private final static int R = 2;  private int bit(int val, int B)    { return (val >> (bitsword-B-1)) & 1; } 

Program 15.1 illustrates how to achieve the same effect for class key types by extending a key class to define the bit method (along with B, bitsword, and toString). In this case, we would add to each class the code

 private final static int R = bitsKey.R;  private int bit(KEY v, int B)     { return ((bitsKey) v).bit(B); } 

The same approaches apply to implementing digit, using the techniques for various types of keys that are described in Section 10.1. Program 15.9 in Section 15.4 is an example of such a class.

The principal advantages of radix-search methods are that the methods provide reasonable worst-case performance without the complication of balanced trees; they provide an easy way to handle variable-length keys; some of them allow space savings by storing part of the key implicitly within the search structure; and they can provide fast access to data, competitive with both binary search trees and hashing. The disadvantages are that some of the methods can make inefficient use of space and that, as with radix sorting, performance can suffer if efficient access to the bytes of the keys is not available.

First, we examine several search methods that proceed by examining the search keys 1 bit at a time, using them to travel through binary tree structures. We examine a series of methods, each one correcting a problem inherent in the previous one, culminating in an ingenious method that is useful for a variety of search applications.

Next, we examine generalizations to R-way trees. Again, we examine a series of methods, culminating in a flexible and efficient method that can support a basic symbol-table implementation and numerous extensions.

In radix search, we usually examine the most significant digits of the keys first. Many of the methods directly correspond to MSD radix-sorting methods, in the same way that BST-based search corresponds to quicksort. In particular, we shall see the analog to the linear-time sorts of Chapter 10 constant-time search methods based on the same principle.

We also consider the specific application of using radix-search structures for string processing, including building indexes for large text strings. The methods that we consider provide natural solutions for this application and help to set the stage for us to consider more advanced string-processing tasks in Part 6.


   
Top


Algorithms in Java, Part 1-4
Algorithms in Java, Parts 1-4 (3rd Edition) (Pts.1-4)
ISBN: 0201361205
EAN: 2147483647
Year: 2002
Pages: 158

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net