Searching Algorithms | Searching and Sorting

Table of contents:

Looking up a phone number, accessing a Web site and checking the definition of a word in a dictionary all involve searching large amounts of data. The next two sections discuss two common search algorithmsone that is easy to program yet relatively inefficient and one that is relatively efficient but more complex to program.

24.2.1. Linear Search

The linear search algorithm searches each element in an array sequentially. If the search key does not match an element in the array, the algorithm tests each element and, when the end of the array is reached, informs the user that the search key is not present. If the search key is in the array, the algorithm tests each element until it finds one that matches the search key and returns the index of that element.

As an example, consider an array containing the following values

34 56 2 10 77 51 93 30 5 52

and an application that is searching for 51. Using the linear search algorithm, the application first checks whether 34 matches the search key. It does not, so the algorithm checks whether 56 matches the search key. The application continues moving through the array sequentially, testing 2, then 10, then 77. When the application tests 51, which matches the search key, the application returns the index 5, which is the location of 51 in the array. If, after checking every array element, the application determines that the search key does not match any element in the array, the application returns a sentinel value (e.g. -1). If there are duplicate values in the array, linear search returns the index of the first element in the array that matches the search key.

Figure 24.2 declares class LinearArray. This class has two private instance variablesan array of ints named data, and a static Random object named generator to fill the array with randomly generated ints. When an object of class LinearArray is instantiated, the constructor (lines 1219) creates and initializes the array data with random ints in the range 1099.

Figure 24.2. Class that contains an array of random integers and a method that searches that array sequentially.

(This item is displayed on page 1292 in the print version)

 1 // Fig 24.2: LinearArray.cs
 2 // Class that contains an array of random integers and a method
 3 // that will search that array sequentially.
 4 using System;
 5
 6 public class LinearArray
 7 {
 8 private int[] data; // array of values
 9 private static Random generator = new Random();
10
11 // create array of given size and fill with random integers
12 public LinearArray( int size )
13 {
14 data = new int[ size ]; // create space for array
15
16 // fill array with random ints in range 10-99
17 for ( int i = 0; i < size; i++ )
18 data[ i ] = generator.Next( 10, 100 );
19 } // end LinearArray constructor
20
21 // perform a linear search on the data 
22 public int LinearSearch( int searchKey ) 
23 { 
24  // loop through array sequentially 
25  for ( int index = 0; index < data.Length; index++ )
26  if ( data[ index ] == searchKey ) 
27  return index; // return index of integer 
28 
29  return -1; // integer was not found 
30 } // end method LinearSearch 
31
32 // method to output values in array
33 public override string ToString()
34 {
35 string temporary = "";
36
37 // iterate through array
38 foreach ( int element in data )
39 temporary += element + " ";
40
41 temporary += "
"; // add newline character
42 return temporary;
43 } // end method ToString
44 } // end class LinearArray

Lines 2230 perform the linear search. The search key is passed to parameter searchKey. Lines 2527 loop through the elements in the array. Line 26 compares each element in the array with searchKey. If the values are equal, line 27 returns the index of the element. If the loop ends without finding the value, line 29 returns -1. Lines 3343 declare method ToString, which returns a string representation of the array for printing.

Figure 24.3 creates LinearArray object searchArray containing an array of 10 ints (line 13) and allows the user to search the array for specific elements. Lines 1718 prompt the user for the search key and store it in searchInt. Lines 2137 then loop until the searchInt is equal to -1. The array holds ints from 1099 (line 18 of Fig. 24.2). Line 24 calls method LinearSearch to determine whether searchInt is in the array. If searchInt is found, LinearSearch returns the position of the element, which the application outputs in lines 2729. If searchInt is not in the array, LinearSearch returns -1, and the application notifies the user (lines 3132). Lines 3536 retrieve the next integer from the user.

Figure 24.3. Sequentially searching an array for an item.

(This item is displayed on page 1293 in the print version)

 1 // Fig 24.3: LinearSearchTest.cs
 2 // Sequentially search an array for an item.
 3 using System;
 4
 5 public class LinearSearchTest
 6 {
 7 public static void Main( string[] args )
 8 {
 9 int searchInt; // search key
10 int position; // location of search key in array
11
12 // create array and output it
13 LinearArray searchArray = new LinearArray( 10 );
14 Console.WriteLine( searchArray ); // print array
15
16 // input first int from user
17 Console.Write( "Please enter an integer value (-1 to quit): " );
18 searchInt = Convert.ToInt32( Console.ReadLine() );
19
20 // repeatedly input an integer; -1 terminates the application
21 while ( searchInt != -1 )
22 {
23 // perform linear search
24 position = searchArray.LinearSearch( searchInt );
25
26 if ( position != -1 ) // integer was not found
27 Console.WriteLine(
28 "The integer {0} was found in position {1}.
",
29 searchInt, position );
30 else // integer was found
31 Console.WriteLine( "The integer {0} was not found.
",
32 searchInt );
33
34 // input next int from user
35 Console.Write( "Please enter an integer value (-1 to quit): " );
36 searchInt = Convert.ToInt32( Console.ReadLine() );
37 } // end while
38 } // end Main
39 } // end class LinearSearchTest

64 90 84 62 28 68 55 27 78 73

Please enter an integer value (-1 to quit): 78
The integer 78 was found in position 8.

Please enter an integer value (-1 to quit): 64
The integer 64 was found in position 0.

Please enter an integer value (-1 to quit): 65
The integer 65 was not found.

Please enter an integer value (-1 to quit): -1

Efficiency of Linear Search

Searching algorithms all accomplish the same goalfinding an element that matches a given search key, if such an element, in fact, exists. There are, however, a number of things that differentiate search algorithms from another. The major difference is the amount of effort they require to complete the search. One way to describe this effort is with Big O notation, which is a measure of the worst-case run time for an algorithmthat is, how hard an algorithm may have to work to solve a problem. For searching and sorting algorithms, this is particularly dependent on how many data elements there are.

Suppose an algorithm is designed to test whether the first element of an array is equal to the second element of the array. If the array has 10 elements, this algorithm requires one comparison. If the array has 1,000 elements, the algorithm still requires one comparison. In fact, the algorithm is completely independent of the number of elements in the array. This algorithm is said to have a constant run time, which is represented in Big O notation as O(1). An algorithm that is O(1) does not necessarily require only one comparison. O(1) just means that the number of comparisons is constantit does not grow as the size of the array increases. An algorithm that tests whether the first element of an array is equal to any of the next three elements is still O(1) even though it requires three comparisons.

An algorithm that tests whether the first element of an array is equal to any of the other elements of the array will require at most n 1 comparisons, where n is the number of elements in the array. If the array has 10 elements, this algorithm requires up to nine comparisons. If the array has 1,000 elements, this algorithm requires up to 999 comparisons. As n grows larger, the n part of the expression "dominates" and subtracting one becomes inconsequential. Big O is designed to highlight these dominant terms and ignore terms that become unimportant as n grows. For this reason, an algorithm that requires a total of n 1 comparisons (such as the one we described earlier) is said to be O(n). An O(n) algorithm is referred to as having a linear run time. O(n) is often pronounced "on the order of n" or more simply "order n."

Now suppose you have an algorithm that tests whether any element of an array is duplicated elsewhere in the array. The first element must be compared with every other element in the array. The second element must be compared with every other element except the first (it was already compared to the first). The third element must be compared with every other element except the first two. In the end, this algorithm will end up making (n 1) + (n 2) + ... + 2 + 1 or n2/2 n/2 comparisons. As n increases, the n2 term dominates and the n term becomes inconsequential. Again, Big O notation highlights the n2 term, leaving n2/2. But as we will soon see, constant factors are omitted in Big O notation.

Big O is concerned with how an algorithm's run time grows in relation to the number of items processed. Suppose an algorithm requires n2 comparisons. With four elements, the algorithm will require 16 comparisons; with eight elements, the algorithm will require 64 comparisons. With this algorithm, doubling the number of elements quadruples the number of comparisons. Consider a similar algorithm requiring n2/2 comparisons. With four elements, the algorithm will require eight comparisons; with eight elements, the algorithm will require 32 comparisons. Again, doubling the number of elements quadruples the number of comparisons. Both of these algorithms grow as the square of n, so Big O ignores the constant, and both algorithms are considered to be O(n2), which is referred to as quadratic run time and pronounced "on the order of n-squared" or more simply "order n-squared."

When n is small, O(n2) algorithms (running on today's billion-operation-per-second personal computers) will not noticeably affect performance. But as n grows, you will start to notice the performance degradation. An O(n2) algorithm running on a million-element array would require a trillion "operations" (where each could actually require several machine instructions to execute). This could require a few hours to execute. A billion-element array would require a quintillion operations, a number so large that the algorithm could take decades! O(n2) algorithms are easy to write, as you will see in this chapter. You will also see algorithms with more favorable Big O measures. These efficient algorithms often take a bit more cleverness and effort to create, but their superior performance can be well worth the extra effort, especially as n gets large and algorithms are compounded into larger applications.

The linear search algorithm runs in O(n) time. The worst case in this algorithm is that every element must be checked to determine whether the search item exists in the array. If the size of the array is doubled, the number of comparisons that the algorithm must perform is also doubled. Note that linear search can provide outstanding performance if the element matching the search key happens to be at or near the front of the array. But we seek algorithms that perform well, on average, across all searches, including those where the element matching the search key is near the end of the array.

Linear search is the easiest search algorithm to program, but it can be slow compared to other search algorithms. If an application needs to perform many searches on large arrays, it may be better to implement a different, more efficient algorithm, such as the binary search, which we present in the next section.

Performance Tip 24 1

Sometimes the simplest algorithms perform poorly. Their virtue is that they are easy to program, test and debug. Sometimes more complex algorithms are required to realize maximum performance.

24.2.2. Binary Search

The binary search algorithm is more efficient than the linear search algorithm, but it requires that the array be sorted. The first iteration of this algorithm tests the middle element in the array. If this matches the search key, the algorithm ends. Assuming the array is sorted in ascending order, if the search key is less than the middle element, the search key cannot match any element in the second half of the array and the algorithm continues with only the first half of the array (i.e., the first element up to, but not including, the middle element). If the search key is greater than the middle element, the search key cannot match any element in the first half of the array, and the algorithm continues with only the second half of the array (i.e., the element after the middle element through the last element). Each iteration tests the middle value of the remaining portion of the array, called a subarray. A subarray can have no elements, or it can encompass the entire array. If the search key does not match the element, the algorithm eliminates half of the remaining elements. The algorithm ends either by finding an element that matches the search key or reducing the subarray to zero size.

As an example, consider the sorted 15-element array

2 3 5 10 27 30 34 51 56 65 77 81 82 93 99

and a search key of 65. An application implementing the binary search algorithm would first check whether 51 is the search key (because 51 is the middle element of the array). The search key (65) is larger than 51, so 51 is "discarded" (i.e., eliminated from consideration) along with the first half of the array (all elements smaller than 51.) Next, the algorithm checks whether 81 (the middle element of the remainder of the array) matches the search key. The search key (65) is smaller than 81, so 81 is discarded along with the elements larger than 81. After just two tests, the algorithm has narrowed the number of values to check to three (56, 65 and 77). The algorithm then checks 65 (which indeed matches the search key), and returns the index of the array element containing 65. This algorithm required just three comparisons to determine whether the search key matched an element of the array. Using a linear search algorithm would have required 10 comparisons. [Note: In this example, we have chosen to use an array with 15 elements so that there will always be an obvious middle element in the array. With an even number of elements, the middle of the array lies between two elements. We implement the algorithm to chose the higher of the two elements.]

Figure 24.4 declares class BinaryArray. This class is similar to LinearArrayit has two private instance variables, a constructor, a search method (BinarySearch), a RemainingElements method (which creates a string containing the elements not yet searched) and a ToString method. Lines 1221 declare the constructor. After initializing the array with random ints from 1099 (lines 1718), line 20 calls method Array.Sort on the array data. Method Sort is a static method of class Array that sorts the elements in an array in ascending order. Recall that the binary search algorithm works only on sorted arrays.

Figure 24.4. Class that contains an array of random integers and a method that uses binary search to find an integer.

 1 // Fig 24.4: BinaryArray.cs
 2 // Class that contains an array of random integers and a method
 3 // that uses binary search to find an integer.
 4 using System;
 5
 6 public class BinaryArray
 7 {
 8 private int[] data; // array of values
 9 private static Random generator = new Random();
10

11 // create array of given size and fill with random integers 12 public BinaryArray( int size ) 13 { 14 data = new int[ size ]; // create space for array 15 16 // fill array with random ints in range 10-99 17 for ( int i = 0; i < size; i++ ) 18 data[ i ] = generator.Next( 10, 100 ); 19 20 Array.Sort( data ); 21 } // end BinaryArray constructor 22 23 // perform a binary search on the data 24 public int BinarySearch( int searchElement ) 25 { 26 int low = 0; // low end of the search area 27 int high = data.Length - 1; // high end of the search area 28 int middle = ( low + high + 1 ) / 2; // middle element 29 int location = -1; // return value; -1 if not found 30 31 do // loop to search for element 32 { 33 // print remaining elements of array 34 Console.Write( RemainingElements( low, high ) ); 35 36 // output spaces for alignment 37 for ( int i = 0; i < middle; i++ ) 38 Console.Write( " " ); 39 40 Console.WriteLine( " * " ); // indicate current middle 41 42 // if the element is found at the middle 43 if ( searchElement == data[ middle ] ) 44 location = middle; // location is the current middle 45 46 // middle element is too high 47 else if ( searchElement < data[ middle ] ) 48 high = middle - 1; // eliminate the higher half 49 else // middle element is too low 50 low = middle + 1; // eliminate the lower half 51 52 middle = ( low + high + 1 ) / 2; // recalculate the middle 53 } while ( ( low <= high ) && ( location == -1 ) ); 54 55 return location; // return location of search key 56 } // end method BinarySearch 57 58 // method to output certain values in array 59 public string RemainingElements( int low, int high ) 60 { 61 string temporary = ""; 62 63 // output spaces for alignment 64 for ( int i = 0; i < low; i++ ) 65 temporary += " "; 66 67 // output elements left in array 68 for ( int i = low; i <= high; i++ ) 69 temporary += data[ i ] + " "; 70 71 temporary += " "; 72 return temporary; 73 } // end method RemainingElements 74 75 // method to output values in array 76 public override string ToString() 77 { 78 return RemainingElements( 0, data.Length - 1 ); 79 } // end method ToString 80 } // end class BinaryArray

Lines 2456 declare method BinarySearch. The search key is passed into parameter searchElement (line 24). Lines 2628 calculate the low end index, high end index and middle index of the portion of the array that the application is currently searching. At the beginning of the method, the low end is 0, the high end is the length of the array minus 1 and the middle is the average of these two values. Line 29 initializes the location of the element to -1the value that will be returned if the element is not found. Lines 3153 loop until low is greater than high (this occurs when the element is not found) or location does not equal -1 (indicating that the search key was found). Line 43 tests whether the value in the middle element is equal to searchElement. If this is true, line 44 assigns middle to location. Then the loop terminates, and location is returned to the caller. Each iteration of the loop tests a single value (line 43) and eliminates half of the remaining values in the array (line 48 or 50).

Lines 2240 of Fig. 24.5 loop until the user enters -1. For each other number the user enters, the application performs a binary search to determine whether the number matches an element in the array. The first line of output from this application is the array of ints, in increasing order. When the user instructs the application to search for 72, the application first tests the middle element (indicated by * in the sample output of Fig. 24.5), which is 52. The search key is greater than 52, so the application eliminates from consideration the first half of the array and tests the middle element from the second half of the array. The search key is smaller than 82, so the application eliminates from consideration the second half of the subarray, leaving only three elements. Finally, the application checks 72 (which matches the search key) and returns the index 9.

Figure 24.5. Using binary search to locate an item in an array.

 1 // Fig 24.5: BinarySearchTest.cs
 2 // Use binary search to locate an item in an array.
 3 using System;
 4
 5 public class BinarySearchTest
 6 {
 7 public static void Main( string[] args )
 8 {
 9 int searchInt; // search key
10 int position; // location of search key in array
11
12 // create array and output it
13 BinaryArray searchArray = new BinaryArray( 15 );
14 Console.WriteLine( searchArray );
15

16 // prompt and input first int from user 17 Console.Write( "Please enter an integer value (-1 to quit): " ); 18 searchInt = Convert.ToInt32( Console.ReadLine() ); 19 Console.WriteLine(); 20 21 // repeatedly input an integer; -1 terminates the application 22 while ( searchInt != -1 ) 23 { 24 // use binary search to try to find integer 25 position = searchArray.BinarySearch( searchInt ); 26 27 // return value of -1 indicates integer was not found 28 if ( position == -1 ) 29 Console.WriteLine( "The integer {0} was not found. ", 30 searchInt ); 31 else 32 Console.WriteLine( 33 "The integer {0} was found in position {1}. ", 34 searchInt, position); 35 36 // prompt and input next int from user 37 Console.Write( "Please enter an integer value (-1 to quit): " ); 38 searchInt = Convert.ToInt32( Console.ReadLine() ); 39 Console.WriteLine(); 40 } // end while 41 } // end Main 42 } // end class BinarySearchTest

12 17 22 25 30 39 40 52 56 72 76 82 84 91 93

Please enter an integer value (-1 to quit): 72

12 17 22 25 30 39 40 52 56 72 76 82 84 91 93
 *
 56 72 76 82 84 91 93
 *
 56 72 76
 *
The integer 72 was found in position 9.

Please enter an integer value (-1 to quit): 13

12 17 22 25 30 39 40 52 56 72 76 82 84 91 93
 *
12 17 22 25 30 39 40
 *
12 17 22
 *
12
 *
The integer 13 was not found.

Please enter an integer value (-1 to quit): -1

Efficiency of Binary Search

In the worst-case scenario, searching a sorted array of 1,023 elements will take only 10 comparisons when using a binary search. Repeatedly dividing 1,023 by 2 (because after each comparison, we are able to eliminate half of the array) and rounding down (because we also remove the middle element) yields the values 511, 255, 127, 63, 31, 15, 7, 3, 1 and 0. The number 1023 (210 1) is divided by 2 only 10 times to get the value 0, which indicates that there are no more elements to test. Dividing by 2 is equivalent to one comparison in the binary-search algorithm. Thus, an array of 1,048,575 (220 1) elements takes a maximum of 20 comparisons to find the key, and an array of one billion elements (which is less than 230 1) takes a maximum of 30 comparisons to find the key. This is a tremendous improvement in performance over the linear search. For a one-billion-element array, this is a difference between an average of 500 million comparisons for the linear search and a maximum of only 30 comparisons for the binary search! The maximum number of comparisons needed for the binary search of any sorted array is the exponent of the first power of 2 greater than the number of elements in the array, which is represented as log2n. All logarithms grow at roughly the same rate, so in Big O notation the base can be omitted. This results in a big O of O(log n) for a binary search, which is also known as logarithmic run time.

Sorting Algorithms

Preface

Index

Introduction to Computers, the Internet and Visual C#

Introduction to the Visual C# 2005 Express Edition IDE

Introduction to C# Applications

Introduction to Classes and Objects

Control Statements: Part 1

Control Statements: Part 2

Methods: A Deeper Look

Arrays

Classes and Objects: A Deeper Look

Object-Oriented Programming: Inheritance

Polymorphism, Interfaces & Operator Overloading

Exception Handling

Graphical User Interface Concepts: Part 1

Graphical User Interface Concepts: Part 2

Multithreading

Strings, Characters and Regular Expressions

Graphics and Multimedia

Files and Streams

Extensible Markup Language (XML)

Database, SQL and ADO.NET

ASP.NET 2.0, Web Forms and Web Controls

Web Services

Networking: Streams-Based Sockets and Datagrams

Searching and Sorting

Data Structures

Generics

Collections

Appendix A. Operator Precedence Chart

Appendix A. Operator Precedence Chart

Appendix B. Number Systems

Appendix C. Using the Visual Studio 2005 Debugger

Appendix D. ASCII Character Set

Appendix D. ASCII Character Set

Appendix E. Unicode®

Appendix F. Introduction to XHTML: Part 1

Appendix G. Introduction to XHTML: Part 2

Appendix H. HTML/XHTML Special Characters

Appendix H. HTML/XHTML Special Characters

Appendix I. HTML/XHTML Colors

Appendix I. HTML/XHTML Colors

Appendix J. ATM Case Study Code

Appendix K. UML 2: Additional Diagram Types

Appendix L. Simple Types

Index