Groups of Objects | Performance Consulting: A Practical Guide for HR and Learning Professionals

Chapter 5 - C# and the Base Classes

bySimon Robinsonet al.
Wrox Press 2002

We are now going to examine the support that the .NET base classes have for data structures in which a number of similar objects are grouped together. The simplest such data structure is the ordinary array, which we saw how to use in Chapter 2. The ordinary array is actually an instance of the class System.Array , but C# wraps its own syntax around this class. System.Array has the advantages of being relatively efficient for accessing an individual element given its index, and of having its own C# syntax, which obviously makes using it more intuitive. However, Array has the big disadvantage that you need to specify its size when you instantiate it. There is no facility for adding, inserting, or removing elements later on. You also have to have a numeric index in order to be able to access an element. This is not particularly useful if, for example, you are dealing with a set of employee records and need to look up a given record from the name of the employee.

.NET has quite extensive support for a number of other data structures that are useful in different circumstances. Not only that, but there are also a number of interfaces, which classes can implement in order to declare that they support all the functionality of a particular type of data structure. Here, we are going to survey three of these structures:

Array lists
Collections
Dictionaries (also sometimes known as maps)

Other than the basic System.Array , all the data structure classes are in the System.Collections namespace.

The name System.Collections reflects another of those terminology ambiguities that plague computing. Collection is often used informally to denote any data structure. However, it also has the more specific meaning of a class that implements IEnumerable or ICollection - a particular type of data structure that we will investigate later in the chapter. In this chapter, we will always use the term 'collection' to refer to the more specific meaning, except where .NET base class names force us to use it in the general sense.

Array Lists

An array list is very similar to an array, except that it has the ability to grow. It is represented by the class System.Collections.ArrayList .

The ArrayList class also has some similarities with the StringBuilder class that we looked at earlier. Just as a StringBuilder allocated enough space in memory to store a certain number of characters, and allowed you to manipulate characters within the space, the ArrayList allocates enough memory to store a certain number of object references. You can then efficiently manipulate these object references however you wish, within that limit. If you try to add more objects to the ArrayList than the capacity allows, then it will automatically increase its capacity by allocating a new area of memory big enough to hold twice as many elements as the current capacity, and relocating to there.

You can instantiate an array list by indicating the initial capacity you want. For this example, we will assume we are creating a list of Vectors :

   ArrayList vectors = new ArrayList(20);

If you don't specify the initial size, it defaults to 16 :

   ArrayList vectors = new ArrayList();   // capacity of 16

You can then add elements using the Add() method:

   vectors.Add(new Vector(2,2,2));     vectors.Add(new Vector(3,5,6));

The ArrayList treats all its elements as object references. That means you can store whatever objects you like in an ArrayList , but when accessing the objects, you will need to cast them back to the appropriate data type:

   Vector element1 = (Vector)vectors[1];

This example also shows that ArrayList defines an indexer, so that you can access its elements with an array-like syntax. You can also insert elements into the ArrayList :

   vectors.Insert(1, new Vector(3,2,2));   // inserts at position 1

There is also a useful override of Insert that allows you to insert all the elements of a collection into an ArrayList , given an ICollection interface reference.

You may remove elements:

   vvectors.RemoveAt(1);   // removes object at position 1

You can also supply an object reference to another method, Remove() . Doing this will take longer as it will cause the ArrayList to make a linear search through the array to find the object.

Note that adding or removing an element will cause all subsequent elements to have to be correspondingly shifted in memory, even if no reallocation of the entire ArrayList is needed,

You can modify or read the capacity via the Capacity property:

   vectors.Capacity = 30;

Note, however, that changing the capacity will cause the entire ArrayList to be reallocated to a new block of memory with the required capacity.

The number of elements actually in the ArrayList can be obtained via the Count property:

   int nVectors = vectors.Count;

An array list can be really useful if you need to build up an array of objects but you do not know in advance how big the array is going to end up. In that case, you can construct the 'array' in an ArrayList , and then copy the ArrayList back to a plain old array when you have finished if you actually need the data as an array (this would be the case, for example, if the array is to be passed to a method that expects an array as a parameter). The relationship between ArrayList and Array is in many ways similar to that between StringBuilder and String .

Unfortunately, unlike the StringBuilder class, there is no single method to do this conversion from an array list to an array. You have to use a loop to manually copy references back. Note, however, that you are only copying the references not the objects, so this shouldn't give too much of a performance hit:

   // vectors is an ArrayList instance being used to store Vector instances     Vector [] vectorsArray = new Vector[vectors.Count];     for (int i=0 ; i< vectors.Count ; i++)     vectorsArray[i] = (Vector)vectors [i];

Collections

The idea of a collection is that it represents a set of objects that you can access by stepping through each element in turn . In particular, it is the set of objects that you access using a foreach loop. In other words, when you write something like this:

   foreach (string nextMessage in messageSet)   {    DoSomething(nextMessage); }

you are assuming that the variable messageSet is a collection. The ability to use a foreach loop is the main purpose of collections. They offer little in the way of additional features.

Over the next couple of pages, we are going to look in more detail at what a collection is and implement our own collection by converting the Vector sample that we have been developing. The broad concepts behind collections are actually not new to .NET. Collections have been a part of COM for years , and have also been used in Visual Basic with the convenient For...Each syntax. Java also has a foreach loop, and in both cases the underlying architecture is very similar to that for .NET collections.

What is a Collection?

Internally, an object is a collection if it is able to supply a reference to a related object, known as an enumerator , which is able to step through the items in the collection. More specifically , a collection must implement the interface System.Collections.IEnumerable . IEnumerable defines just one method, and looks like this:

   interface IEnumerable     {     IEnumerator GetEnumerator();     }

The purpose of GetEnumerator() is to return the enumerator object. As you will gather from the above code, the enumerator object is expected to implement an interface, System.Collections.IEnumerator .

There is an additional collections interface, ICollection , which is derived from IEnumerable . More sophisticated collections will implement this interface as well. Besides GetEnumerator() , it implements a property that directly returns the number of elements in the collection. It also features support for copying the collection to an array and can supply information indicating if it is thread-safe. However, here we will only consider the simpler collection interface, IEnumerable .

IEnumerator looks like this:

   interface IEnumerator     {     object Current { get; }     bool MoveNext();     void Reset();     }

IEnumerator is intended to work like this: the object that implements it should be associated with one particular collection. When this object is first initialized , it does not yet refer to any elements in the collection, and you must call MoveNext() , which moves the enumerator so that it refers to the first element in the collection. You can then retrieve this element with the Current property. Current returns an object reference, so you will have to cast it to the type of object you are expecting to find in the collection. You can do whatever you want with that object then move to the next item in the collection by calling MoveNext() again. You repeat this process until there are no more items in the collection - you will know this has happened when the Current property returns null . If you wish, you can at any time return to the start of the collection by calling the Reset() method. Note that Reset() actually returns to just before the start of the collection so if you call this method, you will need to subsequently call MoveNext() again to get to the first element.

You can see from this that the point of the collection is simply to provide a way of stepping through all the elements when you don't want to supply an index, and you are happy to rely on the collection itself to choose the order in which the elements will be returned to you. This will usually mean that you are not bothered about the order in which the elements are retrieved, as long as you get to see all of them, although in some cases it may be that a particular collection is documented as returning the elements in a certain order. In one sense, a collection is a very basic type of group of objects, because it does not allow you to add or remove items from the group . All you can do is retrieve the items in an order determined by the collection, and examine them. It is not even possible to replace or modify items in the collection, because the Current property is read-only. The most frequent use of the collection is to give you the syntactical convenience of the foreach loop.

Arrays are also collections, as should be obvious because the foreach command works successfully with arrays. For the particular case of arrays, the enumerator supplied by the System.Array class steps through the elements in increasing order of index from zero upwards.

From the above discussion, we can see the above foreach loop in C# is just a syntactical shortcut for writing:

   {     IEnumerator enumerator = MessageSet.GetEnumerator();     string nextMessage;     enumerator.MoveNext();     while ((nextMessage = enumerator.Current) != null)     {     DoSomething(nextMessage);   // NB. We only have read access     // toNextMessage     enumerator.MoveNext();     }     }

Note the enclosing curly braces around the above code snippet. We have supplied them in order to ensure that this code has exactly the same effect as the earlier foreach loop. If we hadn't included them, then this code would have differed to the extent that the nextMessage and enumerator variables would have remained in scope beyond the end of the loop.

One important aspect of collections is that the enumerator is returned as a separate object. It should not be the same object as the collection itself. The reason is to allow for the possibility that more than one enumerator might be applied simultaneously to the same collection.

Adding Collection Support to the Vector Struct

Our Vector struct that we started in Chapter 3, and to which we have already added formatting support earlier in this chapter, is about to get another extension with collection support.

When we last left the Vector struct, a Vector instance contained three components , x , y , and z , and because we had defined an indexer in Chapter 3, it was possible to treat a Vector instance as an array, so that we could access the x -component by writing SomeVector[0] , the y -component by writing SomeVector[1], and the z -component by writing SomeVector[2] .

We will now extend the Vector struct into a new code sample, the VectorAsCollection project, in which it is also possible to iterate through the components of a Vector by writing code like this:

   foreach (double component in someVector)     Console.WriteLine("Component is " + component);

Our first task is to mark Vector as a collection by having it implement the IEnumerable interface. We start by modifying the declaration of the Vector struct:

   struct Vector : IFormattable, IEnumerable   {    public double x, y, z;

Note that the IFormattable interface is present because we added support for string format specifiers earlier in this chapter. Now we need to implement the IEnumerable interface:

   public IEnumerator GetEnumerator()     {     return new VectorEnumerator(this);     }

The implementation of GetEnumerator() could hardly be simpler, but it depends on the existence of a new class, VectorEnumerator , which we need to define. Since VectorEnumerator is not a class that any outside code needs to be able to see directly, we declare it as a private class inside the Vector struct. Its definition looks like this:

   private class VectorEnumerator : IEnumerator     {     Vector theVector;      // Vector object that this enumerato refers to     int location;   // which element of theVector the enumerator is     // currently referring to     public VectorEnumerator(Vector theVector)     {     this.theVector = theVector;     location = -1;     }     public bool MoveNext()     {     ++location;     return (location > 2) ? false : true;     }     public object Current     {     get     {     if (location < 0  location > 2)     throw new InvalidOperationException(     "The enumerator is either before the first element or " +     "after the last element of the Vector");     return theVector[(uint)location];     }     }     public void Reset()     {     location = -1;     }     }

As required for an enumerator, VectorEnumerator implements the IEnumerator interface. It also contains two member fields, theVector , which is a reference to the Vector (the collection) that this enumerator is to be associated with, and location , an int that indicates where in the collection the enumerator should reference - in other words whether the Current property should retrieve the x , y , or z component of the vector.

The way we will work in this case is by treating location as an index and internally implementing the enumerator to access the Vector as an array. When accessing the Vector as an array, the valid indices are , 1 , and 2 - we will extend this by using -1 as the value that indicates the enumerator is before the start of the collection, and 3 to indicate that it is beyond the end of the collection. Hence, the initialization of this field to -1 in the VectorEnumerator constructor:

 public VectorEnumerator(Vector theVector)    {       this.theVector = theVector;   location = -1;   }

Notice the constructor also takes a reference to the Vector instance that we are to enumerate - this was supplied in the Vector.GetEnumerator() method:

 public IEnumerator GetEnumerator()    {   return new VectorEnumerator(this);   }

Dictionaries

Dictionaries represent a very sophisticated data structure that allows you to access an element based on some key, which can be of any data type you want. They are also known as maps or hash tables . Dictionaries are great for situations where you wish to store objects as if they were an array, but where you want to use some other data type rather than a numeric type to index into the structure. They also allow you to freely add and remove items, a bit like an ArrayList, but without the performance overhead of having to shift subsequent items in memory .

We will illustrate the kinds of situations in which dictionaries can be useful using the example that we will develop later in this section, the MortimerPhonesEmployees example. This example assumes that Mortimer Phones (the mobile phone company that we first introduced in Chapter 3) has some software that processes details of its employees. To that end, we need a data structure - something like an array - that contains data for employees . We assume that each Mortimer Phones employee is identified by an employee ID, which is a set of characters such as B342 or W435, and is stored as an EmployeeID object. The employee's details are stored as an EmployeeData object; for our example, this just contains the employee's ID, name, and salary.

Suppose we have this EmployeeID :

   EmployeeID id = new EmployeeID("W435");

and we have a variable called employees , which we can treat syntactically as an array of EmployeeData objects. In actuality, it is not an array - it is a dictionary, and because it is a dictionary, we can get the details of an employee with the ID declared above like this:

   EmployeeData theEmployee = employees[id];     // Note that id is NOT a numeric type - it is an EmployeeID instance

That's the power of dictionaries. They look like arrays (but are more powerful than that; they are more like ArrayList s since you can dynamically set their capacity, and add and remove elements), but you don't have to use an integer to index into them; you can use any data type you want. For a dictionary, this is called a key rather than an index. Roughly speaking, what happens is that the dictionary takes the key supplied when you access an element (in the above example this is the ID object) and it does some processing on the value of this key. This processing returns an integer that depends on the value of the key, and is used to work out where in the 'array' the entry should be stored or retrieved from. Other examples where you would use a dictionary to store objects include:

If you wish to store details of employees or other people, indexed by their social security numbers Although the social security number is basically an integer, you still couldn't use an array with social security numbers as the index since a US social security number can theoretically go up to the value 999999999. On a 32-bit system you'd never fit an array that big in a program's address space! Most of the array would be empty anyway. Using a dictionary, we can have a social security number to index an employee, but still keep the dictionary size small.
If you want to store addresses, indexed by zip code. In the USA, zip codes are just numbers, but in Canada they have letters in too. In the UK, the equivalent (postal codes) are strings that contain both letters and numbers.
Any data for objects or people that you wish to store, indexed by the name of the object or person.

Although the effect of a dictionary is that it looks to client code much like a dynamic array with a very flexible means of indexing into it, there is a lot of work that goes on behind the scenes to bring this about. Though in principle an object of any class can be used as the key to index into a dictionary, you do need to implement certain features on a class before it can be usefully used as a key. This also crucially involves the GetHashCode() method that all classes and structs inherit from System.Object . In this section, we will take a closer look under the hood at what a dictionary is, how it works, and how GetHashCode() is involved. Then, we will move on to our MortimerPhonesEmployees example, which demonstrates both how to use a dictionary, and how to set up a class so that it can be used as a key.

Dictionaries in Real Life

The name 'dictionary' is used because the structure is very similar to a real-life dictionary. In a real dictionary you will normally want to look up the meaning of a word (or in the case of a foreign dictionary, the details of how to translate a word). The couple of lines of text that give the meaning (or the translation) is the data that you are really interested in. The fact that a large dictionary will have tens of thousands of data items in it is no problem when you want to look up a meaning, because you just look for the word in alphabetical order. In a sense, the word you are looking up is equivalent to the key that you use to get at the data you are really interested in. It is not really the word itself you are interested in so much as the data associated with it. The word just provides the means to locate the entry in the dictionary. This means that there are really three things here that you need to make a dictionary:

The data you want to look up
The key
The algorithm that allows you to find where the data is in the dictionary

The algorithm is a crucial part of the dictionary. Just knowing what the key is isn't sufficient - you also need a way that you can use the key to find out the location of the item in the data structure. In real-life dictionaries, this algorithm is provided by arranging words in alphabetical order.

Dictionaries in .NET

In .NET, the basic dictionary is represented by the class Hashtable , which works on the same principles as a real-life dictionary, except that it assumes that the key and item are both of type Object . This means that a Hashtable can store whatever data structure you want - whereas a real-life dictionary uses strings as the keys.

Although Hashtable represents the generic will-store-anything dictionary, it is permissible to define your own more specialized dictionary classes. Microsoft has provided an abstract base class, DictionaryBase , which provides basic dictionary functionality, and from which you can derive your classes. There is also a ready-made .NET base class, System.Collections.Specialized.StringDictionary , which you should use in place of Hashtable if your keys are strings.

When you create a Hashtable object, you can indicate its initial capacity, just as you would do for StringBuilder and ArrayList :

   Hashtable employees = new Hashtable(53);

As usual there are many other constructors, but this is the one you will probably most commonly be using. Notice the unusual size of the initial capacity that I've chosen : 53. There is a good reason for this. Due to the internal algorithms used in dictionaries, they work most efficiently if their capacity is a prime number.

Adding an object to the Hashtable is done with the Add() method, but Hashtable.Add() takes two parameters, both of them object references. The first is a reference to the key; the second is a reference to the data. Carrying on with the EmployeeID and EmployeeData classes from the example that we will develop soon:

   EmployeeID id;     EmployeeData data;     // initialize id and data to refer to some employee     // assume employees is a Hashtable instance     //that contains EmployeeData references     employees.Add(id, data);

In order to retrieve the data for an item, you need to supply the key. Hashtable implements an indexer so that you can retrieve data - this is how we get the array syntax we saw earlier:

   EmployeeData data = employees[id];

You can also remove items from the dictionary, by supplying the key of the object to be removed:

   employees.Remove(id);

You can also find out how many items are in the hash table using the Count property:

   int nEmployees = employees.Count;

Notice, however, that there is no Insert() method. We have not yet looked at how a dictionary works internally, but there is no difference between adding and inserting data. Unlike an array or an ArrayList , you don't find one big block of data at the beginning of the structure and an empty block at the end. Instead, the situation looks more like the following diagram, in which any unmarked parts of the dictionary are empty:

When you add an entry, it will actually be placed at some location that could be anywhere in the dictionary. How the location is worked out from the key is something that you don't need to know about when you are using the dictionary. The important point is that the algorithm used to work out the location of an item is reliable. As long as you remember what the key is, you can just hand it to the Hashtable object, and it will be able to use the key to quickly work out where the item is and retrieve it for you. We will examine how the algorithm works later in this section. We will just say here that it relies on the key's GetHashCode() method.

Note that the above diagram is simplified. Each key/entry pair is not actually stored inside the dictionary structure - as usual for reference types, what is stored are the object references that indicate where on the heap the objects themselves are located.

How the Dictionary Works

So far, we've seen that dictionaries (hash tables) are extremely convenient to use, but there is a snag: Hashtable (and indeed any other dictionary class) uses some sort of algorithm to work out where to place each object based on the key, and that algorithm isn't entirely provided by the Hashtable class. It has two stages, and the code for one of these stages must be provided by the key class. If you are using a class that Microsoft has written, and which may possibly be used as a key (such as String ), then that's no problem (Microsoft will have written all the code already), but if the key class is one that you have written yourself, then you will have to write this part of the algorithm yourself.

In computer parlance, the part of the algorithm implemented by the key class is known as a hash (hence the term hash table ), and the Hashtable class looks in a very particular place for the hash algorithm. It looks in your object's GetHashCode() method, which it inherits from System.Object() that if you override GetHashCode() , there are fairly stringent requirements on how you do it, because your implementation needs to behave in certain ways for dictionary classes to work correctly (though if you don't intend your class to ever be used as a key in a dictionary, there's no need to override GetHashCode() ).

The way it works is that GetHashCode() returns an int , and it somehow uses the value of the key to generate this int . Hashtable will take this int and do some other processing on it that involves some sophisticated mathematical calculations, and which returns the index of where in the dictionary an item with the given hash should be stored. We won't go into this part of the algorithm - that part has already been coded by Microsoft, so we don't need to know about it, but we will say that it involves prime numbers and is the reason why the hash table capacity should be a prime number.

For this to work properly, there are some fairly strict requirements for the GetHashCode() override, which we will look at here. These requirements are going to sound quite abstract and daunting, but don't worry too much. As our MortimerPhonesEmployees sample will demonstrate , it is not at all difficult to code up a key class that satisfies these requirements:

It should be fast (because placing or retrieving entries in a dictionary is supposed to be fast)
It must be consistent - if you regard two keys as representing the same value, then they must give the same value for the hash
It should ideally give values that are likely to be evenly distributed across the entire range of numbers that an int can store

The reason for this last condition is because of a potential problem; what happens if you get two entries in the dictionary whose hashes both give the same index?

If this happens, the dictionary class will have to start fiddling about looking for the nearest available free location to store the second item - and will have to do some searching in order to retrieve this item later on. This is obviously going to hurt performance, and clearly, if lots of your keys are tending to give the same indexes for where they should be stored, this kind of clash becomes more likely. Due to the way Microsoft's part of the algorithm works, this risk is minimized when the calculated hash values are evenly distributed between int.MinValue and int.MaxValue .

The risk of clashes between keys also increases as the dictionary gets more full, so it's normally a good idea to make sure the capacity of the dictionary is substantially greater than the number of elements actually in it. For this reason, the Hashtable will automatically relocate in order to increase its capacity well before it actually becomes full. The proportion of the table that is full is termed the load , and you can set the maximum value that you want the load to reach before the Hashtable relocates in another of the Hashtable constructors:

   // capacity =50, Max Load = 0.5     Hashtable employees = new Hashtable(50, 0.5);

The smaller you make the maximum load, the more efficiently your hash table will work, but the more memory it will occupy. Incidentally, when a hash table relocates in order to increase its capacity, it always chooses a prime number as its new capacity.

Another important point we listed above is that the hashing algorithm must be consistent. If two objects contain what you regard as the same data, then they must give the same hash value, and this is where we come to the important restrictions on how you override the Equals() and GetHashCode() methods of System.Object . You see, the way that the Hashtable determines whether two keys A and B are equal is that it calls A.Equals(B) . This means you must ensure that the following is always true:

Important

If A.Equals(B) is true , then A.GetHashCode() and B.GetHashCode() must always return the same hash code.

This probably seems a fairly subtle point, but it is crucial. If you contrived some way of overriding these methods so that the above statement is not always true, then a hash table that uses instances of this class as its keys will simply not work properly. You will find funny things happening. For example, you might place an object in the hash table and then find you can never retrieve it, or you might try to retrieve an entry and get the wrong entry returned.

For this reason, the C# compiler will display a compilation warning if you supply an override for Equals() , but don't supply an override for GetHashCode() .

For System.Object this condition is true, because Equals() simply compares references, and GetHashCode() actually returns a hash that is based solely on the address of the object. This means that hash tables based on a key that doesn't override these methods will work correctly. However, the problem with this way of doing things is that keys are regarded as equal only if they are the same object. That means that when you place an object in the dictionary, you then have to hang on to the reference to the key. You can't simply instantiate another key object later that has the same value, because the same value is defined as meaning the very same instance. This means that if you don't override the Object versions of Equals() and GetHashCode() , your class won't be very convenient to use in a hash table. It makes more sense to implement GetHashCode() to generate a hash based on the value of the key rather than its address in memory. This is why you will invariably need to override GetHashCode() and Equals() for any class that you want to be used as a key.

Incidentally, System.String has had these methods overloaded appropriately. Equals() has been overloaded to provide value comparison, and GetHashCode() has also been correspondingly overloaded to return a hash based on the value of the string. For this reason it is convenient to use strings as keys in a dictionary.

The MortimerPhonesEmployees Example

The MortimerPhonesEmployees example is a program that sets up a dictionary of employees. As mentioned earlier, the dictionary is indexed using EmployeeID objects, and each item stored in the dictionary is an EmployeeData object that stores details of an employee. The program simply instantiates a dictionary, adds a couple of employees to it, and then invites the user to type in employee IDs. For each ID the user types in, the program attempts to use the ID to index into the dictionary and retrieve the employee's details. The process iterates until the user types in X . The example, when run, looks like this:

  MortimerPhonesEmployees  Enter employee ID (format:A999, X to exit)> B001 Employee: B001: Mortimer             100,000.00 Enter employee ID (format:A999, X to exit)> W234 Employee: W234: Arabel Jones         10,000.00 Enter employee ID (format:A999, X to exit)> X

This example contains a number of classes. In particular, we need the EmployeeID class, which is the key used to identify employees, and the EmployeeData class that stores employee data. We will examine the EmployeeID class first, since this is the one where all the action happens in terms of preparing it to be used as a dictionary key. The definition of this class is as follows :

   class EmployeeID     {     private readonly char prefix;     private readonly int number;     public EmployeeID(string id)     {     prefix = (id.ToUpper())[0];     number = int.Parse(id.Substring(1,3));

   }     public override string ToString()     {     return prefix.ToString() + string.Format("{0,3:000}", number);     }     public override int GetHashCode()     {     return ToString().GetHashCode();     }     public override bool Equals(object obj)     {     EmployeeID rhs = obj as EmployeeID;     if (rhs == null)     return false;     if (prefix == rhs.prefix && number == rhs.number)     return true;     return false;     }     }

The first part of the class definition simply stores the actual ID. Recall that we said the ID takes a format such as B001 or W234 . In other words, it consists of a single letter prefix, followed by three numeric characters. We store this as a char for the prefix and an int for the remainder of the code.

The constructor simply takes a string and breaks it up to form these fields. Note that to keep the example simple, no error checking is performed. We will just assume the string passed into the constructor is in the correct format. The ToString() method simply returns the ID as a string:

 return prefix.ToString() + string.Format("{0,3:000}", number);

Note the format specifier ( 3:000 ) that ensures the int containing the number is padded with zeros, so we get for example B001 , and not B1 .

Now we come to the two method overrides that we need for the dictionary. First, we have overridden Equals() so that it compares the values of EmployeeID instances:

 public override bool Equals(object obj)       {          EmployeeID rhs = obj as EmployeeID;          if (rhs == null)             return false;          if (prefix == rhs.prefix && number == rhs.number)             return true;          return false;       }    }

This is the first time we have seen an example of an override of Equals() . Notice that our first task is to check whether the object passed as a parameter is actually an EmployeeID instance. If it isn't, then it obviously isn't going to equal this object, so we return false . We test the type by attempting to cast it to EmployeeID using C#'s as keyword. Once we have established that we have an EmployeeID object, we just compare the values of the fields to see if they contain the same values as this object.

Next, we look at GetHashCode() . The implementation of this is shorter, though at first sight it is perhaps harder to understand what's going on:

   public override int GetHashCode()     {     string str = this.ToString();     return str.GetHashCode();     }

Earlier, we listed some strict requirements that the calculated hash code had to satisfy . Of course, there are all sorts of ways to devise simple and efficient hashing algorithms. Generally, taking the fields, multiplying them by large prime numbers, and adding the results together is a good way, but personally I'm not one for doing any more work than I have to, and Microsoft has already implemented a sophisticated, yet efficient hashing algorithm for the String class, so we may as well take advantage of that. String.GetHashCode() produces well-distributed numbers based on the contents of the string. It satisfies all the requirements of a hash code.

The only disadvantage of leveraging this method is that there is some performance loss associated with converting our EmployeeID class to a string in the first place. If you are concerned about that and need the last ounce of performance in your hashing algorithms, you will need to design your own hash. Designing hashing algorithms is a complex topic that we cannot go to in depth in this book. However, we will suggest one simple approach to the problem, which is to simply multiply numbers based on the component fields of the class by different prime numbers (for mathematical reasons, multiplying by different prime numbers helps to prevent different combinations of values of the fields from giving the same hash code). A suitable implementation of GetHashCode() would be:

   public override int GetHashCode()   // alternative implementation     {     return (int)prefix*13 + (int)number*53;     }

This particular example, will work more quickly than the ToString() -based algorithm that we use in the example, but has the disadvantage that the hash codes generated by different EmployeeIDs are less likely to be evenly spread across the range of int . Incidentally, the primitive numeric types do have GetHashCode() methods defined, but these methods simply return the value of the variable, and are hence not particularly useful. The primitive types aren't really intended to be used as keys.

Notice that our GetHashCode() and Equals() implementations do between them satisfy the requirements for equality that we mentioned earlier. With our override of Equals() , two EmployeeID objects will be considered equal if, and only if they have the same values of prefix and number , but if that's the case, ToString() will give the same value for both of them, and so they will give the same hash code. That's the crucial test that we must make sure we satisfy.

Next, we can look at the class that contains the employee data. The definition of this class is fairly basic and intuitive:

   class EmployeeData     {     private string name;     private decimal salary;     private EmployeeID id;     public EmployeeData(EmployeeID id, string name, decimal salary)     {     this.id = id;     this.name = name;     this.salary = salary;     }     public override string ToString()     {     StringBuilder sb = new StringBuilder(id.ToString(), 100);     sb.Append(": ");     sb.Append(string.Format("{0,-20}", name));     sb.Append(" ");     sb.Append(string.Format("{0:C}", salary));     return sb.ToString();     }     }

Notice how once again for performance reasons, we use a StringBuilder object to generate the string representation of an EmployeeData object. Finally, we create the test harness. This is defined in a class, TestHarness :

   class TestHarness     {     Hashtable employees = new Hashtable(31);     public void Run()     {     EmployeeID idMortimer = new EmployeeID("B001");     EmployeeData mortimer = new EmployeeData(idMortimer, "Mortimer",     100000.00M);     EmployeeID idArabel = new EmployeeID("W234");     EmployeeData arabel= new EmployeeData(idArabel, "Arabel Jones",     10000.00M);     employees.Add(idMortimer, mortimer);     employees.Add(idArabel, arabel);     while (true)     {     try     {     Console.Write("Enter employee ID (format:A999, X to exit)> ");     string userInput = Console.ReadLine();     userInput = userInput.ToUpper();     if (userInput == "X")     return;     EmployeeID id = new EmployeeID(userInput);     DisplayData(id);     }     catch (Exception e)     {     Console.WriteLine("Exception occurred. Did you use the correct     format for the employee ID?");     Console.WriteLine(e.Message);     Console.WriteLine();     }     Console.WriteLine();     }     }     private void DisplayData(EmployeeID id)     {     object empobj = employees[id];     if (empobj != null)     {     EmployeeData employee = (EmployeeData)empobj;     Console.WriteLine("Employee: " + employee.ToString());     }     else     Console.WriteLine("Employee not found: ID = " + id);     }     }

TestHarness contains the member field, which actually is the dictionary.

As usual for a dictionary, we have set the initial capacity to a prime number; in this case, 31 . The guts of the test harness are in the Run() method. This method first sets up details for a couple of employees - mortimer and arabel - and adds their details to the dictionary:

 employees.Add(idMortimer, mortimer);       employees.Add(idArabel, arabel);

Next, we enter the while loop that repeatedly asks the user to input an employeeID . There is a try block inside the while loop, which is just there to trap any problems caused by the user typing in something that's not the correct format for an EmployeeID , which would cause the EmployeeID constructor to throw an exception when it tries to construct an ID from the string:

 string userInput = Console.ReadLine();    userInput = userInput.ToUpper();    if (userInput == "X")       return;    EmployeeID id = new EmployeeID(userInput);

If the EmployeeID was constructed correctly, we display the associated employee by calling a method, DisplayData() . This is the method in which we finally get to access the dictionary with array syntax. Indeed, retrieving the employee data for the employee with this ID is the first thing we do in this method:

 private void DisplayData(EmployeeID id)    {   object empobj = employees[id];

If there is no employee with that ID in the dictionary, then employees[id] will return null , which is why we check for a null reference and display an appropriate error message if we find one. Otherwise, we simply cast our returned empobj reference to an EmployeeData (remember that Hashtable is a very generic dictionary class; as far as it is concerned, it is storing objects, so retrieving an element from it will return an object reference, which we need to explicitly cast back to the type that we originally placed in the dictionary.) Once we have our EmployeeID reference, we can simply display the employee data using the EmployeeData.ToString() method:

 EmployeeData employee = (EmployeeData)empobj;             Console.WriteLine("Employee: " + employee.ToString());

We have one final part of the code - the Main() method that kicks the whole sample off. This simply instantiates a TestHarness object and runs it:

 static void Main()       {          TestHarness harness = new TestHarness();          harness.Run();       }