Developing a Hashtable | Data Structures Demystified (Demystified)

Creating and using a hashtable in your application is a two-step process. The first step is to define a user -defined structure similar to the way you defined nodes in a tree or a linked list. The second step is to define a Hashtable class. The Hashtable class declares an instance of the user-defined structure and defines member data and member functions that are used to add, remove, and manipulate information stored in the hashtable.

The Hashtable Class

The first step to using a hashtable in your application is to define a user-defined structure and then define the Hashtable class, which interacts with the hashtable within the application.

Begin by defining the user-defined structure, as shown here:

 typedef struct Metadata {  struct Metadata(char* key, char* value)  {  strcpy(this->key, key);  strcpy(this->value, value);  next = NULL;  }  char key[SIZE_KEY];  char value[SIZE_VALUE];  struct Metadata* next; } METADATA;

I called this structure metadata because metadata is data that describes data, similar to a column name on a spreadsheet. You ll assign data to the structure once an instance of metadata is declared in the Hashtable class.

The metadata structure has three members . The first member is a char array called key that stores the key of a key/value pair. The second member is a char array called value because it stores the corresponding value of the key/value pair. The last member is a pointer to another metadata structure, which is called next. This enables the application to link together structures at a given index.

The size of both character arrays is determined by the value of the #define macro called SIZE_KEY and SIZE_VALUE . These are defined in the HashTable.h header file (see Hashtable Using C++).

You ll notice that a constructor is defined inside the structure definition. This enables the application to pass the structure initial values for the key and value, which are then assigned the corresponding character arrays. It also initializes the pointer to the next node in the linked list. You ll see how this is used later.

Once the metadata structure is defined, you need to define the Hashtable class, which declares an instance of the metadata structure and defines member functions that interact with the metadata structure.

Here s the definition of the Hashtable class:

 class Hashtable {  private:  int tablesize;  METADATA** table;  int size;  METADATA* current_entry;  int current_index;  long hashString(char* key);  METADATA* find(char* key);  public:  Hashtable(int tablesize = DEFAULT_TABLESIZE);  virtual ~Hashtable();  bool put(char* key, char* value);  bool get(char* key, char* value);  bool contains(char* key);  bool remove(char* key);  void removeAll();  int getSize();  void initIterator();  bool hasNext();  void getNextKey(char* key); };

The Hashtable class is organized into the private access specifier and public access specifier sections. The private access specifier section contains five data members and two member functions.

The first data member is an integer called tablesize , which is later assigned the size of the array of pointers that stores entries in the hashtable. Next is a pointer to a pointer called table , an array of metadata pointers that will store information in the hashtable. Each entry in the table is a pointer to a linked list of entries. NULL indicates there isn t an entry at this index.

The third data member of the Hashtable class is an integer called size that is later assigned the number of entries in the hashtable. The last two data members are current_entry and current_index . The current_entry data member is a pointer to the current entry in the metadata structure, and the current_index is an integer representing the current key. Both iterate entries in the hashtable.

Two functions are declared within the private access specifier of the Hashtable class: hashString() and find() . The hashString() function hashes a key in a key/value pair and returns the hash code. The returned hash code is the index where the entry resides in the hashtable, the index in the array of metadata pointers. The find() function searches a hashtable for a particular key and returns a pointer to the metadata structure that contains that key. Both functions are called by other member functions and are described in detail later in this chapter.

The public access specifier section of the Hashtable class contains member functions that create and interact with the hashtable. Each of these functions is discussed in forthcoming sections of this chapter.

Constructor and Destructor

The constructor of the Hashtable class initializes data members and creates the hashtable, as illustrated in the following code snippet. The size of the array of pointers ( tablesize ) is passed to the constructor when the application declares an instance of the Hashtable class. The value passed to the constructor must be an integer, which is assigned to the tablesize data member of the Hashtable class.

The constructor allocates an array of metadata pointers, which will store the data in the hashtable. This array is assigned to the table member of the class. Previously, you learned that the table data member is an array of pointers that point to metadata structures.

Once the instance of the Hashtable class is declared, the constructor uses a for loop to initialize elements of the table array to NULL. The size data member is initialized to zero, indicating there aren t any entries in the hashtable. However, you can increase the size of the hashtable by passing the tablesize to the constructor. In Figure 11-3, the hashtable size is five elements.

Figure 11-3: The constructor declares an array of pointers where each element of the array points to an instance of the metadata structure.

 Hashtable(int tablesize) {  size = 0;  this->tablesize = tablesize;  table = new METADATA*[tablesize];  for(int i=0; i<tablesize; i++)  {  table[i] = NULL;  } }

The destructor of the Hashtable class is shown in the next code snippet. It performs two actions. First, the destructor calls the removeAll() member function to remove all entries from the hashtable. After entries are deleted, the constructor deletes the array of pointers referenced by the table data member. It does this by using the delete operator.

 ~Hashtable() {  removeAll();  delete[] table; }

Inserting a New Entry

You insert a new entry into the hashtable by calling the put() member function, which is available directly to the application because it is declared in the public access specifier section of the Hashtable class.

The put() function is shown in the following code snippet. It requires two arguments. The first argument is a char pointer called key that contains the key of the new entry. The second argument is a char pointer called value that references the value of the new entry. The put() function returns a Boolean true if the new entry is inserted into the hashtable; otherwise , a Boolean false is returned.

As you ll recall, each key must be unique. Before the new entry is placed in the hashtable, the put() function determines if the key already exists by calling the find() member function and passing find() the key.

The keys stored in the hashtable are exactly the same as the key passed into the find() function. The hash determines which bucket it goes in, and then find() compares keys to find the desired key. You ll learn how the find() function works later in this chapter.

The find() function returns a NULL if the key isn t found; otherwise, the find() function returns a pointer to the metadata structure that contains the key. If the find() function doesn t return a NULL, the key is already in the hashtable, and the put() function returns a false .

However, if the find() function returns a NULL, a new entry is placed in the hashtable by first declaring a new instance of the metadata structure, which is then passed the key and the value of the new entry. Reference to the instance is assigned to a pointer called entry.

Next, the hashString() member function is called and passed the key of the new entry. The hashString() function hashes the key and returns a hash number that is used as the array index for the entry in the hashtable. The hash number is assigned to the integer called bucket. You ll learn how the hashString() function operates later in this chapter.

The bucket integer is then used as the array index of the table data member of the Hashtable class. As you ll remember from the previous section of this chapter, the table data member is an array of metadata pointers. This means that the table[bucket] references the element of the hashtable that will be assigned the new entry.

Before the entry is assigned to this element, the current element in that bucket is assigned to the next member of the instance of the metadata called entry that is declared in the put() function. After this assignment, the new entry is assigned to the table[bucket] element of the hashtable. In effect, what this does is make the new entry the first entry in a linked list defined at this point in the array. If there was no entry at this index, the value of the index would be NULL. This would assign NULL to the next pointer of the new entry, which is okay because the new entry is the only entry in the linked list.

The put() function then increments the size data member of the Hashtable class, indicating an additional entry has been placed in the hashtable. The put() function then returns a Boolean true . Figure 11-4 illustrates the hashtable if you pass 111 as the key and Bob as the value to the put() function.

Figure 11-4: Here s what happens after the first entry is placed on the hashtable by calling the put() function.

 bool put(char* key, char* value) {  if(find(key) != NULL)  {  return false;  }  METADATA* entry = new METADATA(key, value);  int bucket = hashString(key);  entry->next = table[bucket];  table[bucket] = entry;  size++;  return true; }

Retrieving a Value

You can retrieve a value stored in a hashtable by calling the get() member function, which is illustrated in the next code snippet. The get() function requires two arguments. The first argument is a char pointer that references the key of the entry that you want to retrieve. The second argument is a char pointer that references the value of the entry. The get() function copies the value of the entry from the hashtable to this char pointer if the key is found in the hashtable. If the key is found, then a Boolean true is returned by the get() function; otherwise, a Boolean false is returned.

The get() function searches the hashtable by calling the find() function and passing it the key received as the first argument to the get() function. The find() function hashes the key before searching for the key in the hashtable. The find() function then either returns a reference to the metadata structure that contains the key, or returns a NULL.

The get() function assigns the return value of the find() function to a pointer called temp, which is a pointer to a metadata structure. The get() function then determines if the temp pointer is NULL. If so, then the first array element of the value argument is assigned the NULL character (sets value to an empty string) and the get() function returns a Boolean false , indicating that this key does not exist in the hashtable.

If temp isn t NULL, it means the find() function found the key in the hashtable and it returns a reference to the metadata that contains the entry. The value of the entry is then copied to the value argument, and the get() function returns a Boolean true .

 bool get(char* key, char* value) {  METADATA* temp = find(key);  if(temp == NULL)  {  value[0] = '  bool get(char* key, char* value) { METADATA* temp = find(key); if(temp == NULL) { value[0] = '\0'; return false; } else { strcpy(value, temp->value); return true; } } 
 ';  return false;  }  else  {  strcpy(value, temp->value);  return true;  } }

find()

You called the find() member function several times in other member functions. Now let s take a close look at how the find() member function works. As you ll recall, the purpose of the find() function is to search the hashtable for a key. If the key is found, then the find() function returns a reference to the metadata structure that contains the key and the corresponding value. If the key isn t found, then find() returns a NULL.

The find() function requires one argument, a reference to the key. The key is then passed to the hashString() member function, which hashes the key and returns a hash number that corresponds to the key. It does this because keys stored in the hashtable are hash number representations of the actual key. Therefore, the key must be converted to its corresponding hash number for the find() function to locate the entry in the array of pointers.

The hash number returned by the hashString() function is assigned to the bucket integer, which is used as the index to identify the entry in the table array that is the hashtable. The value of the table array element is a reference to a metadata structure, which is then assigned to the temp pointer of the metadata structure.

As long as the temp pointer isn t NULL, the find() function uses the strcmp() function to compare the key element of the metadata structure pointed to by temp with the key passed to the find() function. At this point, the temp variable is being used to iterate the linked list. If there is a match, then the metadata structure pointed to by the temp pointer is returned. If there isn t a match, then the next member of the temp metadata structure is assigned to the temp pointer, and the find() function continues by making another comparison.

If there isn t a match after the find() function has examined all the entries in the hash index, then the find() function returns a NULL.

 METADATA* find(char* key) {  int bucket = hashString(key);  METADATA* temp = table[bucket];  while(temp != NULL)  {  if(strcmp(key, temp->key) == 0)  {  return temp;  }  temp = temp->next;  }  return NULL; }

contains()

The purpose of the contains() member function is to determine if a key exists in the hashtable. As you can see by the following definition, the contains() function is simple to construct, but it has a critical role in working with hashtables.

As you learned previously in this chapter, each key of a hashtable must be unique. The contains() function enables your application to ensure that keys are unique by determining if the key already exists in the hashtable.

The contains() function requires one argument, which is a reference to the key that you want to know exists in the hashtable. The contains() function returns a Boolean true if the key exists or a Boolean false if the key isn t found.

The contains() function determines if the key exists by calling the find() function and passing it the key. In the previous section of this chapter, you learned that the find() function returns either a reference to the metadata that contains the key or a NULL. The contains() function determines which of these is returned.

If a NULL is returned by the find() function, then the contains() function returns a Boolean false; otherwise, if the find() function returns reference to the metadata that contains the key, a Boolean true is returned.

 bool contains(char* key) {  if(find(key) == NULL)  {  return false;  }  else  {  return true;  } }

Remove an Entry

You ll need to call the remove() member function whenever your application needs to remove an entry from the hashtable. The remove() function, shown in the next code snippet, requires one argument, which is a reference to the key of the entry that you want to remove from the hashtable. If the key is found and the entry successfully removed, then the remove() function returns a Boolean true; otherwise, a Boolean false is returned.

The remove() function hashes the key of the entry you want to remove by calling the hashString() function and passing it a reference to the key received as the argument to the remove() function. The hashString() function returns the hash number for this key, which is then assigned to an integer called bucket.

The bucket is used as the array index of the table data member of the Hashtable class. The table[bucket] references the element of the hashtable that contains the linked list, which must be searched to find the entry. This value is assigned to a temp pointer that will be used to iterate the list.

The remove() function then determines if the entry exists by comparing the temp pointer to NULL. If it is NULL, then a Boolean false is returned to the statement that calls the remove() function. If temp isn t NULL, then at least one entry exists in the linked list, and the remove() function determines where the entry appears in the hashtable linked list.

First, the remove() function determines if the entry is the first node on the linked list by using the strcmp() function to compare the key of the entry to the key passed to the remove() function. If they match, then the strcmp() function returns a zero, and the remove() function knows that the entry is the first node on the linked list. If the entry isn t the first node, then the remove() function must iterate through the linked list to locate the entry.

If the entry is the first node, then the remove() function switches entries on the linked list. As you learned earlier in this chapter, the temp metadata contains three elements: the key, the value, and a reference to the next entry called next.

Reference to the next entry is assigned to the table[bucket] array element, which currently contains reference to the entry that is being removed from the hashtable. This makes the next entry the first entry in the linked list because the current entry is the first entry in the linked list.

After this switch is made, the remove() function uses the delete operator to deallocate the current entry, which is pointed to by the temp pointer. It is at this point the entry is removed from the hashtable. The remove() function then decrements the size data member to reflect the removal of the entry and returns a Boolean true , indicating that the entry was successfully removed.

If the entry isn t the first node on the linked list, the remove() function must step through the entire linked list looking for the entry. It does this by assigning reference to the next metadata structure, which is the next entry, to the temp_next pointer. As long as the temp_next pointer isn t NULL, the remove() function calls the strcmp() function to compare the key member of the temp_next metadata structure to the key passed as an argument to the remove() function.

If they match, the entries are switched using the same steps as if the entry is the first node on the linked list. If they don t match, then the next entry (metadata structure) is assigned to the temp_next pointer and the search continues. If the key cannot be located in the linked list after the search is exhausted, the remove() function returns a Boolean false .

 bool remove(char* key) {  int bucket = hashString(key);  METADATA* temp = table[bucket];  if(temp == NULL)  {  return false;  }  else if(strcmp(key, temp->key) == 0)  {  table[bucket] = temp->next;  delete temp;  size--;  return true;  }  else  {  METADATA* temp_next = temp->next;  while(temp_next != NULL)  {  if(strcmp(key, temp_next->key) == 0)  {  temp->next = temp_next->next;  delete temp_next;  size--;  return true;  }  temp = temp->next;  temp_next = temp_next->next;  }  }  return false; }

Another way to remove entries from a hashtable is to call the removeAll() function. The removeAll() function, shown in the next code snippet, deletes all entries in the hashtable. To do this, the removeAll() function uses a for loop to iterate all the entries in the hashtable. At each entry, the while loop executes to transverse the linked list to delete all entries that are linked to the hashtable entry. Once entries on the linked list are deleted, the for loop moves to the next entry in the hashtable and repeats the process until all linked entries and all entries on the hashtable are removed.

It begins by declaring a pointer called temp that points to a metadata structure. The temp pointer is then assigned the first element in the hashtable array, which is called table.

As long as the temp pointer isn t NULL, the removeAll() function assigns reference to the next metadata associated with the current entry (metadata structure) to the next pointer. The key and value of the current entry are then displayed on the screen before the delete operator is called to remove the entry pointed to by the temp pointer.

The entry pointed to by the next pointer is then assigned to the temp pointer, and the removeAll() function returns to the top of the while loop and continues by removing the next entry from the hashtable. This continues until the temp pointer is NULL, which means that the hashtable is empty. (When temp is NULL, one linked list is finished, then Java returns to the outer loop again to process the next linked list.) The removeAll() function then sets the size data member of the Hashtable class to zero, indicating there are no entries in the hashtable.

 void removeAll() {  for(int i=0; i<tablesize; i++)  {  METADATA* temp = table[i];  while(temp != NULL)  {  METADATA* next = temp->next;  cout << "Removing node - key:" << temp->key <<  "\t" << temp->value << endl;  delete temp;  temp = next;  }  }  size = 0; }

getSize()

The getSize() member function is the simplest function of the Hashtable class because it reads the size data member of the Hashtable class and returns its value to the statement that calls the getSize() function. The getSize() function definition is listed in the next code snippet.

Why should you use the getSize() function instead of giving the application access to the size data member? To protect the integrity of the data. If you gave the application direct access to the size data member, statements within the application could assign an incorrect value to size . By controlling access to size to only function members of the hashtable, you protect the integrity of the data.

 int getSize() {  return size; }

hashString()

The hashString() member function is another function called by other member functions of the Hashtable class whenever a function needs to convert a key to a hash number key. The hashString() function requires one argument, a char pointer to the key that is being hashed . The hash number that corresponds to the key is then returned by the hashString() function as a long.

The definition of the hashString() function is listed in the next code snippet. The hashing process begins by first determining the length of the key by calling the strlen() function. The length is assigned to an integer that we call n. You ll also declare a long called h and initialize it to zero to store intermediate values of the hash key during the hashing process.

The hashing process works at the bit level of the key and, in effect, randomizes bits that comprise the key. The hashString() function iterates through each character of the key by using the for loop. During each iteration, the bits that comprise the value of the h variable are shifted (h << 2), and the bits of the current character of the key are added to the shifted bits. The result is then assigned to the h variable. This process continues until the last character of the key is hashed.

The hashString() then calculates the modulus value, the final hashed value (h), and the table size (h % tablesize). The modulus value can be either a positive or negative value. However, the hashed key must be a positive value. Therefore, the hashString() function returns the absolute value of the hashed key (abs(h % tablesize)).

 long hashString(char* key) {  int n = strlen(key);  long h = 0;   for(int i=0; i<n; i++)  {  h = (h << 2) + key[i];  }  return abs(h % tablesize); }

Note

The goal of hashing is to take a data set of keys and produce a nicely distributed sequence of indices. Although programmers agree that the hashtable size should be a prime number, there are various algorithms for hashing. However, all hashing algorithms have one thing in common: they use bit shifting.

There isn t a perfect hashing function. Some hashing functions work better on a given dataset than others. Programmers typically test a wide variety of hash functions on a dataset before settling on the best one to use for a specific dataset.

initIterator()

The initIterator() member function initializes some class variables that traverse all the entries in the linked list. The initIterator() function doesn t require any arguments and doesn t return any value, as shown in the next code snippet. The initIterator() is called by the displayAll() function that is defined by the application to display the content of the hashtable. You ll learn more about the displayAll() function in the Hashtable Using C++ section of this chapter.

When called, the initIterator() function assigns values to two data members of the Hashtable class. The current_entry data member is assigned NULL, and the current_index data member is assigned the value of the tablesize data member.

Next, a for loop finds the first entry in the hashtable. During each loop, the current element of the table array is compared to NULL. If the element is NULL, the search continues to search for the element of the table array that isn t NULL. The first element that isn t NULL is the first entry in the hashtable.

Once the first entry is found, the value of the corresponding element of the table array is assigned to the current_entry . This value is a reference to the metadata structure that contains the key and value for that entry. The index of the element is then assigned to the current_index data member. The initIterator() function then returns.

The initIterator() function is used by the displayAll() function to determine the first entry in the hashtable before the hasNext() and getNextKey() member functions are called. Both these functions directly access the current_entry and current_index data members of the Hashtable class. These functions are discussed in detail in the next section of this chapter.

 void initIterator() {  current_entry = NULL;  current_index = tablesize;  for(int i=0; i<tablesize; i++)  {   if(table[i] == NULL)  {  continue;  }  else  {  current_entry = table[i];  current_index = i;  break;  }  } }

hasNext() and getNextKey()

An application iterates the hashtable by calling the hasNext() and getNextKey() member functions. These two functions are used together with initIterator() to retrieve all the keys from a hashtable. The hasNext() function determines if there is another entry in the hashtable based on the current state of the iterator. As illustrated next, the hasNext() function doesn t require any arguments and returns a Boolean true if another entry exists or a Boolean false if the end of the hashtable is reached.

The hasNext() function makes this determination by comparing the value of the current_entry data member to NULL. If the value of the current_entry is NULL, then the hasNext() function returns a Boolean false; otherwise, a Boolean true is returned.

 bool hasNext() {  if(current_entry == NULL)  {  return false;  }  else  {  return true;  } }

As you ll recall from the initIterator() section, the current_entry and current_index are data members of the Hashtable class. The current_entry data member contains a reference to the current entry in the hashtable, and the current_index holds the index value of the table array that references the current entry.

The getNextKey() function retrieves the key of the entry pointed to by the current_entry data member. The getNextKey() function then moves to the next entry in the hashtable by first trying to go to the next element in the linked list. If the next element is NULL, it moves to the next array index and iterates the array to find the next entry.

The following code snippet is the definition of the getNextKey() function. In it, the getNextKey() function requires one argument. The argument is a char pointer to an array called key. The getNextKey() function copies the key of the current entry to this array, which is then accessed by the statement that calls the getNextKey() function.

Before the process begins, the getNextKey() function determines if the value of the current_entry is NULL. If so, a NULL character is assigned to the first element of the key array (it sets key to an empty string), and the getNextKey() returns without further processing.

If the value of the current_entry data member isn t NULL, then the key of the metadata structure that contains the current entry is copied to the key pointer by calling the strcpy() function. Once copied, the getNextKey() function sets out to locate the next entry. It does this by referencing the next member of the metadata structure that contains the current entry.

The value of the next member is compared to NULL in an if statement. If the next member isn t NULL, indicating there is another entry in the linked list at this index, the value of the next member is assigned to the current_entry data member, making the next entry the current entry of the hashtable.

However, if the value of the next member is NULL, then the getNextKey() function steps through each element of the table array to find an array element whose value isn t NULL. When it finds one, the getNextKey() function copies the value of the array element to the current_entry data member and copies the index of that array element to the current_index data member. The getNextKey() function then returns to the statement that called it.

If the remaining elements of the table array are NULL, there are no more entries in the hashtable. The getNextKey() function then assigns a NULL value to the current_entry and assigns the value of the tablesize data member to the current_index data member. These keys and values are pulled from the hashtable in no particular order; generally , hashtables do not support ordering of the data.

 void getNextKey(char* key) {  if(current_entry == NULL)  {  key[0] = '  void getNextKey(char* key) { if(current_entry == NULL) { key[0] = '\0'; return; } strcpy(key, current_entry->key); if(current_entry->next != NULL) { current_entry = current_entry->next; } else { for(int i=current_index+1; i<tablesize; i++) { if(table[i] == NULL) { continue; } current_entry = table[i]; current_index = i; return; } current_entry = NULL; current_index = tablesize; } } 
 ';  return;  }  strcpy(key, current_entry->key);  if(current_entry->next != NULL)  {  current_entry = current_entry->next;  }  else  {  for(int i=current_index+1; i<tablesize; i++)  {  if(table[i] == NULL)  {  continue;  }  current_entry = table[i];  current_index = i;  return;  }  current_entry = NULL;  current_index = tablesize;  } }