Armed with the tools outlined in the previous three sections, we now consider the analysis of
sequential search
and
binary search
, two basic algorithms for determining whether or not any of a sequence of objects appears among a set of previously stored objects. Our purpose is to
For example, we might imagine a credit-card company that has
N
credit risks or stolen credit cards and wants to check whether any of
M
given transactions involves any one of the
N
bad
This function checks whether the number v is among a previously stored set of numbers in a[l] , a[l+1] , ... , a[r] , by comparing against each number sequentially, starting at the beginning. If we reach the end without finding the number sought, then we return the value -1 . Otherwise, we return the index of the array position containing the number.
static int search(int a[], int v, int l, int r)
{ int i;
for (i = l; i <= r; i++)
if (v == a[i]) return i;
return -1;
}
Program 2.1 implements a straightforward solution to the search problem. It is implemented as a Java method that operates on an array (see Chapter 3) for better compatibility with other code that we will examine for the same problem in Part IV. It is not necessary to understand these details to understand the algorithm: We store all the objects in an array; then, for each transaction, we look through the array sequentially, from beginning to end, checking each to see whether it is the one that we seek.
To analyze the algorithm, we note immediately that the running time depends on whether or not the object sought is in the array. We can determine that the search is
Therefore, the running time depends on the data. If all the searches are for the number that happens to be in the first position in the array, then the algorithm will be fast; if they are for the number that happens to be in the last position in the array, it will be slow. We discuss in Section 2.7 the distinction between being able to guarantee performance and being able to predict performance. In this case, the best guarantee that we can provide is that no more than N numbers will be examined.
To make a prediction, however, we need to make an assumption about the data. In this case, we might choose to assume that all the numbers are randomly chosen. This assumption implies, for example, that each number in the table is equally likely to be the object of a search. On reflection, we realize that it is that property of the search that is critical, because with
Property 2.1
Sequential search examines N numbers for each unsuccessful search and about N/ 2 numbers for each successful search on the average.
If each number in the table is equally likely to be the object of a search, then
is the average cost of a search.
Property 2.1 implies that the running time of Program 2.1 is proportional to N , subject to the implicit assumption that the average cost of comparing two numbers is constant. Thus, for example, we can expect that if we double the number of objects, we double the amount of time required for a search.
We can speed up sequential search for unsuccessful search by
This program has the same functionality as Program 2.1, but it is much more efficient.
static int search(int a[], int v, int l, int r)
{
while (r >= l)
{ int m = (l+r)/2;
if (v == a[m]) return m;
if (v < a[m]) r = m-1; else l = m+1;
}
return -1;
}
Property 2.2
Sequential search in an ordered table examines N numbers for each search in the worst case and about N/ 2 numbers for each search on the average.
We still need to specify a model for unsuccessful search. This result
The cost of an unsuccessful search ending before or after the
N
th entry in the table is the same:
N
.
Another way to state the result of Property 2.2 is to say that the running time of sequential search is proportional to
MN
for
M
transactions, on the average and in the worst case. If we double either the number of transactions or the number of objects in the table, we can expect the running time to double; if we double both, we can expect the running time to go up by a factor of 4. The result also
Program 2.2 is a classical solution to the search problem that is much more efficient than sequential search. It is based on the idea that if the numbers in the table are in order, we can eliminate half of them from consideration by comparing the one that we seek with the one at the middle position in the table. If it is equal, we have a successful search. If it is less, we apply the same method to the left half of the table. If it is greater, we apply the same method to the right half of the table. Figure 2.7 is an example of the operation of this method on a sample set of numbers.
To see whether or not 5025 is in the table of numbers in the left column, we first compare it with 6504 ; that leads us to consider the first half of the array. Then we compare against 4548 (the middle of the first half); that leads us to the second half of the first half. We continue, always working on a subarray that would contain the number being sought, if it is in the table. Eventually, we get a subarray with just one element, which is not equal to 5025 , so 5025 is not in the table.
Property 2.3
Binary search never examines more than
lg
N
+ 1
numbers.
The proof of this property illustrates the use of
To search in a table of size
N
, we examine the middle number, then search in a table of size no larger than
N/
2
. The actual cost could be less than this value either because the comparison might cause us to terminate a successful search or because the table to be searched might be of size
N/
2
- 1 (if
N
is even). As we did in the solution of Formula 2.2, we can
n
+ 1 if
N
= 2
n
and then verify the general result by induction.
Property 2.3 says that we can solve a huge search problem with up to 1 billion numbers with at most 30 comparisons per transaction, and that is likely to be much less than the time it takes to read the request or write the result on typical computers. The search problem is so important that several techniques have been developed that are even faster than this one, as we shall see in Chapters 12 through 16.
Note that we express Property 2.1 and Property 2.2 in terms of the operations that we perform most often on the data. As noted in the
We can verify the analytic evidence of Properties 2.1 and 2.2 by implementing and testing the algorithms. For example, Table 2.4 shows running times for binary search and sequential search for
M
searches in a table of size
N
(including, for binary search, the cost of sorting the table) for various values of
M
and
N
. We will not consider the implementation of the program to run these experiments in detail here because it is similar to those that we consider in full detail in Chapters 6 and 11, and because we consider the use of library methods and other details of putting together programs from
Table 2.4
This example is a prototype of our general approach to comparing algorithms. We use mathematical analysis of the frequency with which algorithms perform critical abstract operations, then use those results to deduce the functional form of the running time, which allows us to verify and extend empirical studies. As we develop algorithmic solutions to computational problems that are more and more refined, and as we develop mathematical analyses to learn their performance characteristics that are more and more refined, we call on mathematical studies from the literature, so as to keep our attention on the algorithms
2.47 Give the average number of comparisons used by Program 2.1 in the case that a N of the searches are successful, for 0
a
1.
Table 2.4. Empirical study of sequential and binary search
These relative timings validate our analytic results that sequential search takes time proportional to MN and binary search takes time proportional to M lg N for M searches in a table of N objects. When we increase N by a factor of 2, the time for sequential search
increases by a factor of 2 as well, but the time for binary search hardly changes. Sequential search is infeasible for huge M as N increases, but binary search is fast even for huge tables.M = 1000
M = 10000
M = 100000
N
S
B
S
B
S
B
125
3
3
36
12
357
126
250
6
2
63
13
636
130
500
13
1
119
14
1196
137
1250
28
1
286
15
2880
146
2500
57
1
570
16
154
5000
113
1
1172
17
164
12500
308
2
3073
17
173
25000
612
1
19
183
50000
1217
2
20
196
100000
2682
2
21
209
Key :
S sequential search (Program 2.1)
B binary search (Program 2.2)
![]()
2.48 Estimate the probability that at least one of M random 10-digit numbers matches one of a set of N given values, for M = 10, 100, and 1000 and N = 10 3 , 10 4 , 10 5 , and 10 6 .
2.49 Write a driver program that generates M random integers and puts them in an array, then counts the number of N random integers that matches one of the numbers in the array, using sequential search. Run your program for M = 10, 100, and 1000 and N = 10, 100, and 1000.
2.50 State and prove a property analogous to Property 2.3 for binary search.
| Top |