The primary conclusion that we can draw from the analytic results of Section 10.6 is that the running time of radix sorts can be sublinear in the total amount of information in the keys. In this section, we consider practical implications of this fact.
The LSD radix-sort implementation given in Section 10.5 makes bytesword passes through the file. By making R large, we get an efficient sorting method, as long as N is also large and we have space for R counters. As mentioned in the proof of Property 10.5, a reasonable choice is to make lg R (the number of bits per byte) about one-quarter of the word size so that the radix sort is four key-indexed counting passes. Each byte of each key is examined, but there are only four digits per key. This example corresponds directly to the architectural organization of many computers: one typical organization has 32-bit words, each consisting of four 8-bit bytes. We extract bytes, rather than bits, from words, which approach is likely to be much more efficient on many computers. Now, each key-indexed counting pass is linear, and, because there are only four of them, the entire sort is linear certainly the best performance we could hope for in a sort.
In fact, it turns out that we can get by with only two key-indexed counting passes. We do so by taking advantage of the fact that the file will be almost sorted if only the leading w/2 bits of the w-bit keys are used. As with quicksort, we can complete the sort efficiently by using insertion sort on the whole file afterward. This method is a trivial modification to Program 10.4. To do a right-to-left sort using the leading one-half of the keys, we simply start the outer for loop at bytesword/2-1, rather than bytesword-1. Then, we use a conventional insertion sort on the nearly ordered file that results. Figures 10.3 and 10.18 provide convincing evidence that a file sorted on its leading bits is well ordered. Insertion sort would use only six exchanges to sort the file in the fourth column of Figure 10.3,and Figure 10.18 shows that a larger file sorted on only the leading one-half of its bits also could be sorted efficiently by insertion sort.
When keys are random bits, sorting the file on the leading bits of the keys brings it nearly into order. This diagram compares a six-pass LSD radix sort (left) on a file of random 6-bit keys with a three-pass LSD radix sort, which can be followed by an insertion-sort pass (right). The latter strategy is nearly twice as fast.
For some file sizes, it might make sense to use the extra space that would otherwise be used for the auxiliary array to try to get by with just one key-indexed counting pass, doing the rearrangement in place. For example, sorting 1 million random 32-bit keys could be done with one key-indexed counting sort on the leading 20 bits, then an insertion sort. To do that, we need space just for the (1 million) counters significantly less than would be needed for an auxiliary array. Using this method is equivalent to using standard MSD radix sort with R = 220, although it is essential that a small radix be used for small files for such a sort (see the discussion after Property 10.4).
The LSD approach to radix sorting is widely used, because it involves extremely simple control structures and its basic operations are suitable for machine-language implementation, which can directly adapt to special-purpose high-performance hardware. In such an environment, it might be fastest to run a full LSD radix sort. We need to have space for just N extra references (and R counters) to use LSD radix sort, and this investment yields a method that can sort random files with only three or four passes.
These relative timings for radix sorts on random files of N 32-bit integers (all with a cutoff to insertion sort for N less than 16) indicate that, when used with care, radix sorts can be among the fastest sorts available. If we use a huge radix for tiny files, we ruin the performance of MSD radix sort, but adapting the radix to be less than the file size cures this problem. The fastest method for integer keys is LSD radix sort on the leading one-half of the bits, which we can speed up further by paying careful attention to the inner loop (see Exercise 10.51). | |||||||||
4-bit bytes | 8-bit bytes | 16-bit bytes | |||||||
N | Q | M | L | M | L | L* | M | L | M* |
12500 | 53 | 29 | 35 | 13 | 14 | 18 | 21 | 15 | 2 |
25000 | 49 | 22 | 40 | 19 | 20 | 16 | 26 | 23 | 5 |
50000 | 73 | 50 | 84 | 32 | 42 | 24 | 35 | 38 | 12 |
100000 | 91 | 99 | 170 | 66 | 89 | 51 | 61 | 74 | 64 |
200000 | 192 | 202 | 373 | 130 | 195 | 119 | 202 | 158 | 82 |
400000 | 410 | 457 | 784 | 515 | 413 | 274 | 62506 | 346 | 153 |
800000 | 892 | 1003 | 1587 | 2452 | 847 | 637 | 588076 | 726 | 1039 |
Key: Q Quicksort, standard (Program 7.1) M MSD radix sort, standard (Program 10.2) L LSD radix sort (Program 10.4) M* MSD radix sort, radix adapting to file size L* LSD radix sort on MSD bits |
In conventional programming environments, the inner loop of the key-indexed counting program on which the radix sorts are based contains a substantially higher number of instructions than do the inner loops of quicksort or mergesort. This property of the implementations implies that the sublinear methods that we have been describing may not be as much faster than quicksort (say) as we might expect in many situations.
These relative timings for various sorts on the first N words of Moby Dick (all, except heapsort, with a cutoff to insertion sort for N less than 16) indicate that the MSD-first approach is effective for string data. The cutoff for small subfiles is not effective unless we modify the insertion sort to avoid going through the leading parts of the keys (see Exercise 10.52). Three-way partitioning benefits from the large numbers of equal keys in the file and MSD radix sort benefits from the assumption that all characters are lower-case letters in more general situations, three-way radix quicksort is likely to outperform the other methods. | |||||||
N | Q | T | M | F | R | X | X* |
12500 | 75 | 59 | 68 | 74 | 69 | 64 | 43 |
25000 | 129 | 107 | 145 | 169 | 127 | 115 | 102 |
50000 | 313 | 257 | 341 | 418 | 237 | 283 | 267 |
100000 | 819 | 603 | 757 | 986 | 500 | 681 | 649 |
Key: Q Quicksort, standard (Program 7.1) T Quicksort with three-way partitioning (Program 7.5) M Mergesort (Program 8.3) F Heapsort with Floyd's improvement (see Section 9.4) R MSD radix sort (Program 10.2) X Three-way radix quicksort (Program 10.3) X* Three-way radix quicksort (with cutoff) |
General-purpose algorithms such as quicksort are more widely used than radix sort, because they adapt to a broader variety of applications. The primary reason for this state of affairs is that the key abstraction on which radix sort is built is less general than the one that we used in Chapters 6 through 9. Our use of the ITEM interface to specify that items to be sorted must have a less method (and Java's use of Comparable and compareTo for the same purpose) is to have the client provide the comparison method. This arrangement not only handles situations where the client can use specialized knowledge about complex keys to implement a fast comparison but also allows us to sort using an ordering relation that may not involve keys at all. Radix sorting may not be applicable in such situations.
When any of them could be used, the choice among quicksort and the various radix sort algorithms (and related versions of quicksort!) that we have considered in this chapter will depend not only on features of the application (such as key, record, and file size) but also on features of the programming and machine environment that relate to the efficiency of access and use of individual bits and bytes. Tables 10.1 and 10.2 give empirical results in support of the conclusion that the linear- and sublinear-time performance results that we have discussed for various applications of radix sorts make these sorting methods an attractive choice for a variety of suitable applications.
10.50 What is the major drawback of doing LSD radix sorting on the leading bits of the keys, then cleaning up with insertion sort afterward?
10.51 Develop an implementation of LSD radix sort for 32-bit keys with as few instructions as possible in the inner loop.
10.52 Implement three-way radix quicksort such that the insertion sort for small files does not use leading bytes that are known to be equal in comparisons.
10.53 Given 1 million random 32-bit keys, find the choice of byte size that minimizes the total running time when we use the method of using LSD radix sort on the first 2 bytes, then using insertion sort to clean up.
10.54 Answer Exercise 10.53 for 1 billion 64-bit keys.
10.55 Answer Exercise 10.54 for three-pass LSD radix sort.
Top |