Dictionaries | Professional C# 2005 with .NET 3.0

Dictionaries represent a sophisticated data structure that allows you to access an element based on a key. Dictionaries are also known as hash tables or maps. The main feature of dictionaries is fast lookup based on keys. You can also add and remove items freely, a bit like a List<T>, but without the performance overhead of having to shift subsequent items in memory.

Figure 10-5 shows a simplified representation of a dictionary. Here employee-ids such as B4711 are the keys added to the dictionary. The key is transformed into a hash. With the hash a number is created to associate an index with the values. The index then contains a link to the value. The figure is simplified because it is possible that a single index entry can be associated with multiple values, and the index can be stored as a tree.

image from book
Figure 10-5

The .NET Framework offers several dictionary classes. The main class you can use is Dictionary<TKey, TValue>. This class offers nearly the same properties and methods as SortedList<TKey, TValue> discussed earlier; that’s why they are not repeated here.

Key Type

A type that is used as a key in the dictionary must override the method GetHashCode() of the Object class. Whenever a dictionary class needs to work out where an item should be located, it calls the GetHashCode() method. The int that is returned by GetHashCode() is used by the dictionary to calculate an index where to place the element. We don’t go into this part of the algorithm. What you should know is that it involves prime numbers, so the capacity of a dictionary is a prime number.

The implementation of GetHashCode() must satisfy these requirements:

The same object should always return the same value.
Different objects can return the same value.
It should execute as quickly as possible; it must be inexpensive to compute.
It must not throw exceptions.
It should use at least one instance field.
The hash code value should be evenly distributed across the entire range of numbers that an int can store.
At best, the hash code should not change during the lifetime of the object.

Important

A good performance of the dictionary is based on a good implementation of the method GetHashCode().

What’s the reason for having hash code values evenly distributed across the range of integers? If two keys return hashes that give the same index, the dictionary class needs to start looking for the nearest available free location to store the second item - and will have to do some searching in order to retrieve this item later on. This is obviously going to hurt performance, and clearly, if lots of your keys are tending to give the same indexes for where they should be stored, this kind of clash becomes more likely. However, because of the way that Microsoft’s part of the algorithm works, this risk is minimized when the calculated hash values are evenly distributed between int.MinValue and int.MaxValue.

Besides having an implementation of GetHashCode(), the key type also must implement the IEquality.Equals() method or override the Equals() method from the Object class. Because different key objects may return the same hash code, the method Equals() is used by the dictionary comparing keys. The dictionary examines if two keys A and B are equal; it invokes A.Equals(B). This means that you must ensure that the following is always true:

Important

If A.Equals(B) is true, then A.GetHashCode() and B.GetHashCode() must always return the same hash code.

This probably seems a fairly subtle point, but it is crucial. If you contrived some way of overriding these methods so that the preceding statement was not always true, a dictionary that uses instances of this class as its keys would simply not work properly. Instead, you’d find funny things happening. For example, you might place an object in the dictionary and then discover that you could never retrieve it, or you might try to retrieve an entry and have the wrong entry returned.

Tip

For this reason, the C# compiler will display a compilation warning if you supply an override for Equals() but don’t supply an override for GetHashCode().

For System.Object this condition is true, because Equals() simply compares references, and GetHashCode() actually returns a hash that is based solely on the address of the object. This means that hash tables based on a key that doesn’t override these methods will work correctly. However, the problem with this way of doing things is that keys are regarded as equal only if they are the same object. That means that when you place an object in the dictionary, you then have to hang onto the reference to the key. You can’t simply instantiate another key object later with the same value. If you don’t override Equals() and GetHashCode(), the type is not very convenient to use in a dictionary.

Incidentally, System.String implements the interface IEquatable and overloads GetHashCode() appropriately. Equals() provides value comparison, and GetHashCode() returns a hash based on the value of the string. Strings can be used conveniently as keys in dictionaries.

Number types such as Int32 also implement the interface IEquatable and overload GetHashCode(). However, the hash code returned by these types simply maps to the value. If the number you would like to use as a key is not itself distributed around the possible values of an integer, using integers as keys doesn’t fulfill the rule of evenly distributing key values to get the best performance. Int32 is not meant to be used in a dictionary.

If you need to use a key type that does not implement IEquatable and override GetHashCode according to the key values you store in the dictionary, you can create a comparer implementing the interface IEqualityComparer<T>. IEqualityComparer<T> defines the methods GetHashCode() and Equals() with an argument of the object passed, so you can offer an implementation different from the object type itself. An overload of the Dictionary<TKey, TValue> constructor allows passing an object implementing IEqualityComparer<T>. If such an object is assigned to the dictionary, this class is used to generate the hash codes and compare the keys.

Dictionary Example

The dictionary example is a program that sets up a dictionary of employees. The dictionary is indexed by EmployeeId objects, and each item stored in the dictionary is an Employee object that stores details of an employee.

The class EmployeeId is implemented to define a key to be used in a dictionary. The members of the class are a prefix character and a number for the employee. Both of these variables are read-only and can only be initialized in the constructor. A key within the dictionary shouldn’t change, and this way that is guaranteed. The fields are filled within the constructor. The ToString() method is overloaded to get a string representation of the employee ID. As required for a key type, EmployeeId implements the interface IEquatable and overloads the method GetHashCode().

  [Serializable] public class EmployeeId  : IEquatable<EmployeeId> {    private readonly char prefix;    private readonly int number;    public EmployeeId(string id)    {       if (id == null) throw new ArgumentNullException("id");              prefix = (id.ToUpper())[0];       int numLength = id.Length - 1;       number = int.Parse(id.Substring(1, numLength > 6 ? 6 : numLength));    }    public override string ToString()    {       return prefix.ToString() + string.Format("{0,6:000000}", number);    }    public override int GetHashCode()    {       return (number ^ number << 16) * 0x15051505;    }    public bool Equals(EmployeeId other)    {       if (other == null) return false;              return (prefix == other.prefix && number == other.number);    } }

The Equals() method that is defined by the IEquatable<T> interface compares the values of two EmployeeId objects and returns true if the both values are the same. Instead of implementing the Equals() method from the IEquatable<T> interface, you can also override the Equals() method from the Object class:

 public bool Equals(EmployeeId other) {    if (other == null) return false;    return (prefix == other.prefix && number == other.number); }

With the number variable, a value from 1 to around 190000 is expected for the employees. This doesn’t fill the range of an integer. The algorithm used by GetHashCode() shifts the number 16 bits to the left, then does a xor with the original number, and finally multiplies the result by the hex value 15051505. The hash code is fairly distributed across the range of an integer.

 public override int GetHashCode() {    return (number ^ number << 16) * 0x15051505; }

Tip

On the Internet you can find a lot of more complex algorithms that have a better distribution across the integer range. You can also use the GetHashCode() method of a string to return a hash.

The Employee class is a simple entity class containing the name, salary, and ID of the employee. The constructor initializes all values, and the method ToString() returns a string representation of an instance. The implementation of ToString() uses the StringBuilder object to create the string representation for performance reasons.

  [Serializable] public class Employee {    private string name;    private decimal salary;    private readonly EmployeeId id;    public Employee(EmployeeId id, string name, decimal salary)    {       this.id = id;       this.name = name;       this.salary = salary;    }    public override string ToString()    {       StringBuilder sb = new StringBuilder(100);       sb.AppendFormat("{0}: {1, -20} {2:C}", id.ToString(), name, salary);       return sb.ToString();    } }

In the Main() method of the sample application, a new Dictionary<TKey, TValue> instance is created, where the key is of type EmployeeId and the value is of type Employee. The constructor allocates a capacity of 31 elements. Remember, the capacity based on prime numbers. However, when you assign a value that is not a prime number, you don’t need to worry. The Dictionary<TKey, TValue> class itself takes the next prime number that follows the integer passed to the constructor to allocate the capacity. The employee objects and IDs are created and added to the dictionary with the Add() method. Instead of using the Add() method, you can also use the indexer to add keys and values to the dictionary, as shown with the employees Jeff and Casey:

  static void Main() {    Dictionary<EmployeeId, Employee> employees =          new Dictionary<EmployeeId, Employee>(31);    EmployeeId idJimmie = new EmployeeId("B4711");    Employee jimmie = new Employee(idJimmie, "Jimmie Johnson", 8909140.00m);    employees.Add(idJimmie, jimmie);    Console.WriteLine(jimmie);    EmployeeId idTony = new EmployeeId("B12836");    Employee tony = new Employee(idTony, "Tony Stewart", 7285280.00m);    employees.Add(idTony, tony);    Console.WriteLine(tony);    EmployeeId idMatt = new EmployeeId("N34434");    Employee matt = new Employee(idMatt, "Matt Kenseth", 6608920.00m);    employees.Add(idMatt, matt);    Console.WriteLine(matt);    EmployeeId idJeff = new EmployeeId("J127");    Employee jeff = new Employee(idJeff, "Jeff Gordon", 5975870.00m);    employees[idJeff] = jeff;    Console.WriteLine(jeff);    EmployeeId idCasey = new EmployeeId("K100223");    Employee casey = new Employee(idCasey, "Casey Mears", 5413340.00m);    employees[idCasey] = casey;    Console.WriteLine(casey);

After the entries are added to the dictionary, inside a while loop employees are read from the dictionary. The user is asked to enter an employee number to store the number in the variable userInput. The user can exit the application by entering X. If the key is in the dictionary, it is examined with the TryGetValue() method of the Dictionary<TKey, TValue> class. TryGetValue() returns true if the key is found and false otherwise. If the value is found, the value associated with the key is stored in the employee variable. This value is written to the console.

Tip

You can also use an indexer of the Dictionary<TKey, TValue> class instead of TryGetValue() to access a value stored in the dictionary. However, if the key is not found, the indexer throws an exception of type KeyNotFoundException.

     while (true)    {       Console.Write("Enter employee id (X to exit)> ");       string userInput = Console.ReadLine();       userInput = userInput.ToUpper();       if (userInput == "X") break;           EmployeeId id = new EmployeeId(userInput);              Employee employee;       if (!employees.TryGetValue(id, out employee))       {          Console.WriteLine("Employee with id {0} does not exist",             id);       }       else       {          Console.WriteLine(employee);       }    } }

Running the application produces the following output:

 Enter employee ID (format:A999999, X to exit)> J127 J000127: Jeff Gordon         $5,975,870.00 Enter employee ID (format:A999999, X to exit)> N34434 N034434: Matt Kenseth        $6,608,920.00 Enter employee ID (format:A999999, X to exit)> X

Other Dictionary Classes

Dictionary<TKey, TValue> is the major dictionary class from the framework. There are some more classes, and of course there are also some nongeneric dictionary classes.

Dictionaries that are based on the Object type and are available since .NET 1.0 are described in the following table.

Open table as spreadsheet

Nongeneric dictionary	Description
Hashtable	Hashtable is the mostly used dictionary implementation of .NET 1.0. Keys and values are based on the Object type.
ListDictionary	ListDictionary is located in the namespace System .Collections.Specialized and is faster than the Hashtable if 10 or fewer items are used. ListDictionary is implemented as a linked list.
HybridDictionary	HybridDictionary uses a ListDictionary if the collection is small and switches to a Hashtable as the collection grows. If you don’t know the number of items in advance, you can use the HybridDictionary.
NameObjectCollectionBase	NameObjectCollectionBase is an abstract base class to associate keys of type string to values of type object. This can be used as a base class for custom string/object collections. This class uses a Hashtable internally.
NameValueCollection	NameValueCollection derives from NameObjectCollection. Here, both the key and value are of type string. This class has another feature that multiple values can use the same key.

Since .NET 2.0 generic dictionaries are preferred over object-based dictionaries:

Open table as spreadsheet

Generic Dictionary	Description
Dictionary<TKey, TValue>	Dictionary<TKey, TValue> is the general-purpose dictionary for mapping keys to values.
SortedDictionary<TKey, TValue>	SortedDictionary<TKey, TValue> is a binary search tree where the items are sorted based on the key. The key type must implement the interface IComparable<TKey>. If the key type is not sortable, you can also create a comparer implementing IComparer<TKey> and assign the comparer as a constructor argument of the sorted dictionary.

SortedDictionary<TKey, TValue> and SortedList<TKey, TValue> have similar functionality. But because SortedList<TKey, TValue> is implemented as a list and SortedDictionary<TKey, TValue> is implemented as a dictionary, the classes have different characteristics.

SortedList<TKey, TValue> uses less memory than SortedDictionary<TKey, TValue>.
SortedDictionary<TKey, TValue> has faster insertion and removal of elements.
When populating the collection with already sorted data, SortedList<TKey, TValue> is faster, if capacity changes are not needed.

Important

SortedList consumes less memory than SortedDictionary. SortedDictionary is faster for inserts and the removal of unsorted data.