# Lecture 16: Binary Trees

## Linear and Binary Searches

In a previous lecture lists were discussed. The types of "real" problems encountered are to store data like: employee records, invoices, charge card records and data similar to these. One of the reasons we store data like these examples is to be able to retrieve the data and then to edit or to delete the records. However to be able to do these processes, we must first find the records with the desired characteristics to perform these processes. What we must do is to search the list. Searching can not be done on all types of data. The data must have the properties that the operators: >( ), <( ) and ==( ) are overloaded on the data type being searched. As has been discussed there are several different methods for searching. This section will discuss: Linear Search and Binary Search.

First let's consider a linear search on a vector. You could set up a search by creating a linearsearch( ) function like the following:

` template<typename Datatype> bool linearsearch(const vector<Datatype> & a_vector, Datatype &item, short &location) {   bool found = false;   location = 0;   for(;;)   {     if(found || (location == a_vector.size()))        break;     if(item == a_vector[location])        found = true;     else        location++;   }   return found; } `

In the above code notice that the program appears to enter an infinite for( ) loop. However, notice that before the loop is entered, the Boolean found is set to false. Notice further that the first conditional in the loop will cause the loop to terminate with a break should found have the value true or the counter location reaches the vector's size. The implementation must therefore be set up so as to evaluate the output of the function. See vector6.cpp for an example that uses this function on objects of the class Date is stored in date.h.

Next let's consider a linear search on a linked list. A search function similar to the one above can be written for a linked list. What must be different in the code is that location is no longer a counter but a pointer to the node of the linked list. Second, instead of using the [ ] operator to move down the list, pointers to nodes need to be used. For example:

` template<typename datatype> bool linearsearch(datatype &item, node* &location) {   bool found = false;   location = first;   for(;;)   {     if(found || (location == NULL))        break;     if(item == location->data)        found = true;     else        location = location->next;   }   return found; } `

Compare this linearsearch( ) with the one in the previous example and notice the similarities and differences needed for the type of data storage used in each list. For an implementation of a linear search on a linked list see the example: linked2.cpp. The search is performed on link list objects of the List class found in the file linked.h. The function linearsearch( ) is a member function of the class List. The data stored in the list are objects from the Date class in the file date.h. This search is similar to the find( ) algorithm in the STL.

Each of the examples above is very inefficient. In particular if the data sought is the last record searched or it does not even exist in any record, then the program must pass through each of the other records first before concluding that the record does not exist. However if the data was arranged in a sequential order so that the data is increasing in value according to the operator <( ), then the search could stop after the data sought exceeded the records being searched. For example suppose you were looking to see if the short: 14 was in a record in a list of shorts like: 15, 23, 67, 12, 59, 77, 11. In this case the entire list would have to be searched before determining that the short 14 was not a record. However, if the list was reordered to: 11, 12, 15, 23, 59, 67, 77, then the search could stop once the record 15 had been reached. For example if the vector in the example vector6.cpp or the linked list in the example linked2.cpp had both been sorted first, then the search would be much faster. The function linearsearch( ) is modified as in the following:

` template<typename Datatype> bool linearsearch(const vector<Datatype> & a_vector, Datatype &item, short &location) {   bool found = false;   location = 0;   for(;;)   {     if(found || (location == a_vector.size())            || (item < a_vector[location]))        break;     if(item == a_vector[location])        found = true;     else        location++;   }   cout << endl << "The number of records counted was: "        << location + 1 << endl << endl; return found; } `

Notice in the code that the only thing changed from the previous search example for vectors was an early exit should the value of item be less than an element in the vector. In the above example, there is a line of code that outputs the current location of the search. This line of code is only for informational purposes and could be removed. For an implementation of this function, see: vector7.cpp This example again uses objects from the class Date stored in date.h.

The linear searches discussed so far are still not the most efficient searches even on sorted data. A better search for a large number of records of sorted data would be the binary search. That is suppose we had the same list of short data records used in the example above: 11, 12, 15, 23, 59, 67, 77. For binary searches, the search for 14 would not begin with the first record. Instead it begins with the middle record namely 23. The program would notice that 23 is greater than 14, so it would step back to the middle of the first half of the data namely: 11, 12, 15. The record 12 as the middle would be checked next. The record 12 is less than 14. The record was again not found so the half block of data above 12 would be checked next namely 15. Since this is the last data record and it is not equal to 14, the search stops. This particular data set and the specific record sought did not take full advantage of the binary search

` bool found = false; location = 0; short counted = 0, first = 0, last = a_vector.size() - 1; for(;;) {   if(found || (first > last))      break;   ++counted;   location = (first + last)/2;   if(item < a_vector[location])      last = location - 1;   else      if(item > a_vector[location])         first = location + 1;      else         found = true; } cout << endl << "The number of records counted was: "       << counted << endl <<endl; `

Notice how the for( ) loop has been changed from the last example. In particular notice the statement:

` location = (first + last)/2; `

Compare running the same data through the code above. Notice that the two searches would take the same number of steps on the example above. However, for large sets of data, the binary search can save time. For example with vectors see vector8.cpp, This example is also using the class Date stored in date.h. No example is given for a binary search for the linked list because this type of search is not an efficient one for the linked list.