Chapter 11: What Is a Hashtable? | Data Structures Demystified (Demystified)

Hashtable is one of those programming terms whose definition is illusive. Hash means mishmash, and a table is the organization of data into columns and rows, but a table containing mishmash data seems useless to an application. Not necessarily ! Programmers use a hashtable to store and retrieve large amounts of information efficiently . You ll learn how this is done and how to use hashtables in your application in this chapter.

A Hashtable

Object-oriented applications that mimic real life must store and retrieve large amounts of information. Previously in this book, you learned that information is associated with an object and stored in an instance of a class that represents the object within the application.

Objects are stored using one of a number of data structures. The choice of data structure depends on the nature of the application. A hashtable is a common data structure to store objects that have a key/value relationship.

A hashtable is an array of pointers to data. Data takes the form of a user -defined structure that consists of up to three elements: the key, the value, and a pointer to the instance of the next structure in the hashtable. The pointer is used only if your collisions are handled in the manner described in this chapter; otherwise , it is not required.

The key uniquely identifies the value. Each user-defined structure in the hashtable must have a unique key. The value is data that is associated with the key, and it can appear more than once in the hashtable. Think of a key as a student ID and the corresponding value as a student s name. Each student is assigned a unique student ID, but two or more students can have the same name .

What makes a hashtable interesting is the way in which the program assigns a user-defined structure to an array element of the hashtable. The program hashes the key of a user-defined structure to determine which array element is assigned the user-defined structure.

A bit confused ? If so, you re not alone, because this concept isn t intuitive. To clear up any confusion, look at Figure 11-1, which shows three entries in the hashtable. I simplified this illustration by using blocks to represent each instance of the user-defined structure that contains the key/value of the entry. Later in this chapter, you ll see the actual user-defined structure used for hashtable entries.

Figure 11-1: The hashtable is an array whose elements point to user-defined structures that contain data.

Figure 11-1 also shows the index that represents the hashtable. Notice that each array element points to a user-defined structure. The user-defined structure contains the actual data for the entry of the hashtable.

Each user-defined structure is assigned to a specific array element based on the user-defined structure s key. The key is translated into a number that is the array index. This number is called a hash value and is created by using the process called hashing . The hash value becomes the array index of the array element that points to the user-defined structure whose key is hashed . Each index in Figure 11-1 is the hash value of the key of the user-defined structure.

You can think of hashing as a way to come up with the index of the array element associated with an entry in the hashtable. You ll learn how to perform hashing in your application later in this chapter.

Hashing achieves an even distribution of index values, which makes finding information faster than if a string of bits is in a natural order. In a natural order, words and names follow a predictable pattern. By shuffling the bits that make up words and names , a program no longer treats those bits as a text string and instead randomizes the bits to make them more efficient to search.

Hashing is a high-speed scheme for taking a key that has natural sequence (alphabetical or numerical order) and pseudo-randomizing it. If the key is a name, certain letters appear more often than other letters , and certain sequences of characters occur more frequently than other characters . Hashing moves the bits around to produce an even distribution of hash values, which is needed to quickly search a hashtable.

The result of hashing is a number that has no real significance, but it is used as an array index to store and retrieve information that is associated with an entry that is stored in a hashtable. Indexes shown in Figure 11-1 are hash values of the corresponding key of the user-defined structure pointed to by the array element.

Problems with Hashing

Hashing is not perfect. Occasionally, a collision occurs when two different keys hash into the same hash value and are assigned to the same array element. Programmers have come up with various techniques for dealing with this conflict.

A common way to deal with a collision is to create a linked list of entries that have the same hash value. For example, say that the key of each entry hashes to the same hash value, and this results in both being assigned to the same array element of the hashtable, as shown in Figure 11-2.

Figure 11-2: A linked list connects user-defined structures whose keys hash to the same hash value.

Because two entries cannot be assigned the same array element, the programmer creates a linked list. The first user-defined structure is assigned to the pointer in the array element. The second isn t assigned to any array element and is instead linked to the first user-defined structure, thereby forming a linked list.

As you ll see later in this chapter, the program locates an entry in a hashtable by referencing the hashed value of the entry. The hash value is the index of the array element that points to the entry. You can probably see the dilemma: the index points only to the first entry, not the second.

Programmers work around this problem by having the program read the key of the first entry. If the key isn t the one the program seeks, the program looks at the next entry in the linked list. It continues down the linked list until the program finds the desired entry or reaches the end of the linked list.