Searching

   


Looking up a person in the phone book, finding a specific topic in the index of a book, looking for that empty spot in the car park, or those lost keys at home are all examples of the process called searching.

Searching is the process of locating a data item in a collection of data items that matches certain criteria.

Searching is an important computer task. It allows us to let the computer sift through millions of data items in databases to find the requested information.

A database is a collection of related data items, organized to allow convenient access.

Computers, Searching, and the Human Genome Project

graphics/bulb.gif

The Human Genome Project began in 1990 and is expected to last for thirteen years. The goals of the project are to identify the 30000 or so genes in human DNA, and to determine the sequence of three billion "data" items (chemical base pairs) found here. An important part of the project has been to record the information on computer databases, and thereby allow researchers to let computers search for specific patterns in this maze of information. At the heart of many of the data analysis tools employed, we find fast sophisticated searching algorithms without which the Human Genome Project would have been an impossible task to accomplish.

The mixture of computer science and biology/medicine has become so important that a new science called bioinformatics, which combines these disciplines, has emerged.


In the next couple of sections, we will concentrate on the problem of locating a specific number, called a key value, in an array. If the number is found, the position of the number is returned. If the number occurs more than once, the algorithm can return any one of those positions. If the number is not found, a signal saying "Not found" must be returned. The array cannot be altered during the search.

The next two sections look at two different search algorithms to solve the specified problem the sequential search, which is simple but slow, and the more sophisticated and significantly faster binary search.

Sequential Search

The sequential search simply begins searching at the first item of the database (in this case, an array) and moves through the collection of items one by one. It is accordingly described as being linear. Each item visited is compared to the data item to be located. When a match is found, the position of the data item is reported and the process terminated. If no match is found, the process stops when the end of the array is reached, accompanied by an indication saying "Not found."

The search key is the data item to be located. The search key is used for comparisons during the searching process.

If the array is not sorted, the linear sequential search is more or less the only search method available. Other faster and more sophisticated search algorithms presume, during their hunt for the key value, that the array is sorted. When we discuss the binary search method, it will become clear why this particular technique only works on a sorted array.

A Phone Book Analogy

graphics/bulb.gif

The numbers in a phone book are not sorted, so if you only have a phone number and want to find the person with this number, you have to perform a sequential search, sifting through all the numbers until a match is found.

The names of the phone book, on the other hand, are sorted in alphabetical order. Thus, if you have a name, you can perform a much faster search, which would be more closely related to the binary search technique presented in the next section.


Listing 11.10 presents an example of a sequential search method. The user can search the array testScores (line 19) by entering an integer value. If the value is found, the index containing the value is returned; otherwise, the program will let the user know that no matching value was found.

Listing 11.10 SequentialSearcher.cs
01: using System; 02: 03: class SequentialSearcher 04: { 05:     public static int SequentialSearch (int key,int[] arr) 06:     { 07:         for (int i = 0; i < arr.Length; i++) 08:         { 09:             if (arr[i] == key) 10:                 return i; 11:         } 12:         return -1; 13:     } 14: 15:     public static void Main() 16:     { 17:         int searchKey; 18:         int searchResult; 19:         int [] testScores = {23, 6, 20, 12, 20, 34, 10, 5, 14, 30} ; 20: 21:         do 22:         { 23:             Console.Write("Please enter the test score you would like to locate: "); 24:             searchKey = Convert.ToInt32(Console.ReadLine()); 25:             searchResult = SequentialSearch(searchKey, testScores); 26:             if (searchResult >= 0) 27:                 Console.WriteLine("The score was found in position: {0}\n",  graphics/ccc.gifsearchResult); 28:             else 29:                 Console.WriteLine("The score was not found\n"); 30:             Console.Write("Would you like to do another search? Y)es N)o "); 31:         } while (Console.ReadLine().ToUpper() == "Y"); 32:     } 33: } Please enter the test score you would like to locate: 30<enter> The score was found in position: 9 Would you like to do another search? Y)es N)o Y<enter> Please enter the test score you would like to locate: 20<enter> The score was found in position: 2 Would you like to do another search? Y)es N)o Y<enter> Please enter the test score you would like to locate: 15<enter> The score was not found Would you like to do another search? Y)es N)o N<enter> 

The SequentialSearch method (lines 5 13) performs a sequential search on its formal parameter arr, an array of base type int. The search key is represented by the formal parameter named key. If the search key is found to match the value of an array element, the position of this element will be returned to the caller (lines 9 and 10); otherwise, the value -1 is returned.

The linear search might be too slow for large collections of data. If those data are sorted, we can resort to the efficient binary search technique instead.

Binary Search

The following example illustrates the fundamental idea behind the binary search technique. Consider the array presented in Figure 11.16 called myArray. It contains fifteen array elements with values sorted in ascending order. Our task is to find the index of the search key value 22. During this thought experiment, we can only, like the computer, access one array element at a time. Let's first look at the number in the middle of the array (called middle-index), in this case index 7 with value 16 (see Step 1 of Figure 11.16). What conclusions can we make by comparing this value to our search key 22? Well, because 22 is greater than 16, and because the array is sorted in ascending order, 22 must exist (if it does exist) somewhere in the upper half of the array more precisely, in an index for which it is true that Middle-index + 1 <= index <= 14 that is equivalent to 8 <= index <= 14, as illustrated with the horizontal arrows and in Figure 11.16. Observe that we have suddenly reduced the number of values we need to search by more than half (eight values in this case) simply by accessing one array element.

Figure 11.16. Finding value 22 in myArray using the binary search technique.
graphics/11fig16.gif

During Step 2 in Figure 11.16, we find the middle-index (index 11) of the new smaller array interval, and compare its associated value (24) with the search key 22. Because the search key this time is smaller than the value of the middle-index (24), the key value must be located somewhere between index 7 and index 11. This can be expressed as 8 <= index <= middle-index 1 or shorter, 8 <= index <= 10.

In Step 3, the middle-index of this new smaller interval is 9, and the search key is greater than its associated value (20). The new interval that can contain the key value 22 has shrunk to just one array element with index 10. Expressed as middle-index + 1 <= index <= 10 or shorter, 10 <= index <= 10.

Step 4 finds the value of the new corresponding middle-index to be equal to the search key. We can thus conclude that index 10 contains the key value.

Figure 11.17 explicitly shows how fast the segments searched in the previous example shrink.

Figure 11.17. Segments of myArray searched.
graphics/11fig17.gif

The Binary Search Technique in a Nutshell

graphics/bulb.gif

The binary search method repeatedly compares the search key with the value of the middle-index of an array segment. Initially, this array segment consists of the entire array, but, by combining the result of the comparison with the knowledge that the array is pre-sorted, the algorithm is able to divide the array into two halves one that must be searched further, and one that can be eradicated from any future searches.


Note:

  • m indicates the middle value of each segment.

Listing 11.11 implements the binary search algorithm in the BinarySearch method (lines 3 24). The Main() utilizes its functionality to search the testScores array (line 53) for values entered by the user (lines 55 65).

The PrintArray method is merely included for illustrative purposes. It lets you track the specific segments of the array searched by the BinarySearch method and provides similar output to that of Figure 11.17. However, to employ the PrintArray method correctly, it needs to be called from within the BinarySearch method. This call has not been included in Listing 11.11 to keep the initial source code uncluttered. Later in this chapter, I will explain how to utilize PrintArray correctly.

Listing 11.11 BinarySearcher.cs
01: using System; 02: 03: class BinarySearcher 04: { 05:      //Search for the key value in the arr array 06:      //using the binary search algorithm. 07:     public static int BinarySearch (int key, int[] arr) 08:     { 09:         int low = 0; 10:         int high = arr.Length - 1; 11:         int middle; 12: 13:         while (low <= high) 14:         { 15:             middle = (low + high) / 2; 16:             if (key > arr[middle]) 17:                 low = middle + 1; 18:             else if (key < arr[middle]) 19:                 high = middle - 1; 20:             else 21:                 return middle; 22:         } 23:         return -1; 24:     } 25: 26:      //PrintArray is used merely for illustrative purposes. 27:      //It will print the segment of the array arr commencing 28:      //with the low index and up to the high index. 29:      //The middle element will be marked with an m. 30:     public static void PrintArray(int low, int middle,int high, int [] arr) 31:     { 32:         for (int i = 0; i <= high; i++) 33:         { 34:             if (i < low) 35:             { 36:                 Console.Write("    "); 37:             } 38:             else 39:             { 40:                 if (i == middle) 41:                     Console.Write("{0,2} m ", arr[i]); 42:                 else 43:                     Console.Write("{0,2}   ", arr[i]); 44:             } 45:         } 46:         Console.WriteLine(); 47:     } 48: 49:     public static void Main() 50:     { 51:         int searchKey; 52:         int searchResult; 53:         int [] testScores = {2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38} ; 54: 55:         do 56:         { 57:             Console.Write("Please enter the test score you would like to locate: "); 58:             searchKey = Convert.ToInt32(Console.ReadLine()); 59:             searchResult = BinarySearch(searchKey, testScores); 60:             if (searchResult >= 0) 61:                 Console.WriteLine("The score was found in position: {0}\n",  graphics/ccc.gifsearchResult); 62:             else 63:                 Console.WriteLine("The score was not found\n"); 64:             Console.Write("Would you like to do another search? Y)es N)o "); 65:         }  while (Console.ReadLine().ToUpper() == "Y"); 66:     } 67: } Please enter the test score you would like to locate: 30<enter> The score was found in position: 14 Would you like to do another search? Y)es N)o Y<enter> Please enter the test score you would like to locate: 3<enter> The score was not found Would you like to do another search? Y)es N)o N<enter> 

The BinarySearch method (lines 3 24) performs a binary search on the formal parameter arr (line 7), which must represent a one-dimensional array of base type int sorted in ascending order. The formal parameter key (line 7) is the search key. If the key is found in arr, the method returns the index containing this value; otherwise, -1 is returned. To keep track of the current array segment where the key value (if present) must reside, the method declares two variables called low and high (lines 9 and 10). The key value (if present) must exist in the array segment enclosed by these two variables. Thus, they are the source code counterparts to the two small horizontal arrows (graphics/rightarr.gif and graphics/leftarr.gif) from Figure 11.16. Initially, the entire array might contain the key value, so low is set to zero (line 9) and high to arr.Length 1 (line 10). To represent the middle-index, we use a variable called middle (line 11).

The repetitive nature of the four steps of Figure 11.16 is represented by a while loop (lines 13 22). It should be terminated if either low is greater than high, or if middle is equal to key. The former termination condition is taken care of by the while loop condition (low <= high) (line 13). The latter by the statement return middle; (line 21) that terminates the loop and the method and returns the value of middle back to the caller. This statement is only executed if both of the following comparisons fail (key > arr[middle]) (line 16) and (key < arr[middle]), which means only if key is equal to arr[middle], thus satisfying our termination condition.

During each repetition of the while loop body we need to

  1. Find the middle-index: middle (line 15).

  2. If key is greater than the array value pointed at by the middle-index, set low equal to middle plus one (lines 16 and 17).

  3. If key is smaller than the array value pointed at by the middle-index, set high equal to middle minus one (lines 18 and 19).

  4. If both 2 and 3 fail, terminate the loop and return the value of middle as discussed previously (line 21).

If the while loop is terminated due to the loop condition (low <= high) being false, it means that key could not be found in arr, in which case, -1 is returned (line 23).

To print the array segments analyzed during each repeated execution of the while loop body, you can call the PrintArray method from within the BinarySearch method, as done in line 11 of Listing 11.12. Notice that a couple of WriteLine calls have been inserted in lines 7, 18, and 22 to make the output more presentable. By exchanging the BinarySearch version of Listing 11.11 with that of Listing 11.12, you will be able to produce the output shown after the listing.

Listing 11.12 The BinarySearch Method Calling PrintArray
01: public static int BinarySearch (int key, int[] arr) 02: { 03:     int low = 0; 04:     int high = arr.Length - 1; 05:     int middle; 06: 07:     Console.WriteLine("\nSegments of array searched:\n"); 08:     while (low <= high) 09:     { 10:         middle = (low + high) / 2; 11:         PrintArray(low, middle, high, arr); 12:         if (key > arr[middle]) 13:             low = middle + 1; 14:         else if (key < arr[middle]) 15:             high = middle - 1; 16:         else 17:         { 18:             Console.WriteLine(); 19:             return middle; 20:         } 21:     } 22:     Console.WriteLine(); 23:     return -1; 24: } Please enter the test score you would like to locate: 14<enter> Segments of array searched:  2   4   6   8  10  12  14  16  18  20m 22  24  26  28  30  32  34  36  38  2   4   6   8  10m 12  14  16  18                     12  14m 16  18 The score was found in position: 6 Would you like to do another search? Y)es N)o Y<enter> Please enter the test score you would like to locate: 22<enter> Segments of array searched:  2   4   6   8  10  12  14  16  18  20m 22  24  26  28  30  32  34  36  38                                         22  24  26  28  30m 32  34  36  38                                         22  24m 26  28                                         22m The score was found in position: 10 Would you like to do another search? Y)es N)o N<enter> 

The high efficiency of the binary search might not be apparent by searching a very short array, such as testScores in Listing 11.11. However, the extreme speed associated with this technique becomes apparent if we look at much larger collections of data items. Because the binary search eliminates half of the array elements searched after each comparison, we can find the maximum number of comparisons required for any array length by determining how many times we need to divide this length by two before we get a number equal to or smaller than 1. For example, to search 1,024 elements requires a maximum 10 searches because 210 is equal to 1,024, and 1,024 divided by 2 ten times is equal to 1. To search a collection of one billion array elements merely requires 30 comparisons; an extraordinarily low number compared to the up to one billion comparisons required by the linear search presented in the previous section.

Searching with System.Array's IndexOf Method

Instead of writing your own search method, you can apply the search functionality provided by the methods IndexOf and LastIndexOf (see Table 11.3 presented earlier). There are several overloaded versions of both of these methods, but this section will only provide an example of how to use one version of IndexOf. Please refer to the .NET Framework Documentation and Table 11.3 for information on the remaining overloaded methods. The IndexOf version applied in this section takes two arguments the array to be searched and the search key value, as illustrated in the following line:

 System.Array.IndexOf(<Array_identifier>, <Search_Key_Value>) 

It either returns the index of the first array element containing the search key value, or -1 one if not found. Notice that the array need not be sorted for the method to work.

The IndexOf method is utilized in Listing 11.13 to search the testScores array for search key values entered by the user.

Listing 11.13 DotNETSearcher.cs
01: using System; 02: 03: class DotNETSearcher 04: { 05:     public static void Main() 06:     { 07:         int searchKey; 08:         int searchResult; 09:         int [] testScores = {32, 2, 87, 23, 2, 51, 101, 22, 45, 33, 10} ; 10: 11:         do 12:         { 13:             Console.Write("Please enter the test score you would like to locate: "); 14:             searchKey = Convert.ToInt32(Console.ReadLine()); 15:             searchResult = Array.IndexOf(testScores, searchKey); 16:             if (searchResult >= 0) 17:                 Console.WriteLine("The score was found in position: {0}\n",  graphics/ccc.gifsearchResult); 18:             else 19:                 Console.WriteLine("The score was not found\n"); 20:             Console.Write("Would you like to do another search? Y)es N)o "); 21:         }  while (Console.ReadLine().ToUpper() == "Y"); 22:     } 23: } Please enter the test score you would like to locate: 2<enter> The score was found in position: 1 Would you like to do another search? Y)es N)o Y<enter> Please enter the test score you would like to locate: 20<enter> The score was not found Would you like to do another search? Y)es N)o N<enter> 

Just one call to Array.IndexOf (line 15) is required to search testScores.

The search methods IndexOf and LastIndexOf search an unsorted array, and will not be as fast as the binary search method searching a sorted array for larger data collections. Consequently, you might want to conduct your own speed tests equivalent to those performed on the covered sorting methods with data similar to those searched by the final application before you choose which methods to apply.


   


C# Primer Plus
C Primer Plus (5th Edition)
ISBN: 0672326965
EAN: 2147483647
Year: 2000
Pages: 286
Authors: Stephen Prata

Similar book on Amazon

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net