Examining Groups Of Objects | Pro Visual C++ 2005 for C# Developers

This section looks at how the .NET base classes support data structures that consist of a group of similar objects. Chapter 3, "Objects and Types," introduced the ordinary array, the simplest data structure of this kind. The ordinary array is an instance of the class System.Array namespace, but C# wraps its own syntax around this class. System.Array has two advantages: it is relatively efficient for accessing an individual element given its index, and it has its own C# syntax, which obviously makes using it more intuitive. However, it also has a huge disadvantage: you must specify its size when you instantiate it. There is no facility for adding, inserting, or removing elements later on. You also have to have a numeric index in order to be able to access an element. This is not particularly useful if, for example, you are dealing with a set of employee records and need to look up a given record from the name of the employee.

.NET has quite extensive support for a number of other data structures that are useful in different circumstances. Not only that, but there are also a number of interfaces that classes can implement in order to declare that they support all the functionality of a particular type of data structure. This chapter surveys many of these structures:

Collections
Array lists
Stacks
Queues
SortedLists
Dictionaries (also sometimes known as maps)

With the exception of the basic System.Array, all the data structure classes are located in the System.Collections namespace.

Note

The name System.Collections reflects another of those terminology ambiguities that plague computing. Collection is often used informally to denote any data structure. However, it also has the more specific meaning of a class that implements IEnumerable or ICollection — a particular type of data structure that you investigate later in this chapter. This chapter always uses the term collection to refer to the more specific meaning, except where .NET base class names force us to use it in the general sense.

Collections

The idea of a collection is that it represents a set of objects that you can access by stepping through each element in turn. In particular, you access the set of objects by using a foreach loop. For example, when you write something like the following code, you are assuming that the variable messageSet is a collection:

 foreach (string nextMessage in messageSet) {    DoSomething(nextMessage); }

The ability to use a foreach loop is the main purpose of collections. They offer little in the way of additional features.

Over the next couple of pages, you look in more detail at what a collection is, and you implement your own collection by converting the Vector example from Chapter 3. The broad concepts behind collections are actually not new to the .NET Framework. Collections have been a part of COM for years and have also been used in Visual Basic 6 with the convenient For...Each syntax. Java also has a foreach loop, and in both cases the underlying architecture is very similar to that for .NET collections.

What is a collection?

Internally, an object is a collection if it is able to supply a reference to a related object, known as an enumerator, which is able to step through the items in the collection. More specifically, a collection must implement the interface System.Collections.IEnumerable. IEnumerable defines just one method and looks like this:

 interface IEnumerable { IEnumerator GetEnumerator(); }

The purpose of GetEnumerator() is to return the enumerator object. As you can gather from the preceding code, the enumerator object is expected to implement an interface, System.Collections.IEnumerator.

Note

An additional collections interface, ICollection, is derived from IEnumerable. More sophisticated collections will implement this interface as well. Besides GetEnumerator(), it implements a property that returns the number of elements in the collection. It also features support for copying the collection to an array and can supply information indicating if it is thread-safe. However, this example only considers the simpler collection interface, IEnumerable.

IEnumerator looks like this:

 interface IEnumerator  { object Current { get; } bool MoveNext(); void Reset(); }

IEnumerator is intended to work like this — the object that implements it should be associated with one particular collection. When this object is first initialized, it does not yet refer to any elements in the collection, and you must call MoveNext(), which moves the enumerator so that it refers to the first element in the collection. You can then retrieve this element with the Current property. Current returns an object reference, so you will have to cast it to the type of object you are expecting to find in the collection. You can do whatever you want with that object, then move to the next item in the collection by calling MoveNext() again. You repeat this process until there are no more items in the collection — you will know this has happened when the Current property returns null.

You can return to the start of the collection by calling the Reset() method at any time. Note that Reset() actually returns to just before the start of the collection so if you call this method, you must call MoveNext() again to get to the first element.

You can see from this example that the point of the collection is simply to provide a way of stepping through all the elements when you don't want to supply an index, and you are happy to rely on the collection itself to choose the order in which the elements are returned to you. This usually means that you are not bothered about the order in which the elements are retrieved, as long as you get to see all of them, although in some cases a particular collection might be instructed to return the elements in a certain order.

In one sense, a collection is a very basic type of group of objects, because it does not allow you to add or remove items from the group. All you can do is retrieve the items in an order determined by the collection and examine them. It is not even possible to replace or modify items in the collection, because the Current property is read-only. The most frequent use of the collection is to give you the syntactical convenience of the foreach loop.

Arrays are considered collections, as should be obvious because the foreach command works successfully with arrays. For the particular case of arrays, the enumerator supplied by the System.Array class steps through the elements in increasing order of index from zero upward.

In fact, the previous foreach loop in C# is just a syntactical shortcut for writing the following code:

 { IEnumerator enumerator = MessageSet.GetEnumerator(); string nextMessage; enumerator.MoveNext(); while ( (nextMessage = enumerator.Current) != null) { DoSomething(nextMessage);   // We only have read access // to NextMessage enumerator.MoveNext(); } }

Note the enclosing curly braces framing the previous code snippet. They are supplied to ensure that this code has exactly the same effect as the earlier foreach loop. If they weren't included, this code would have differed to the extent that the nextMessage and enumerator variables would have remained in scope after the loop had finished being executed.

One important aspect of collections is that the enumerator is returned as a separate object. It should not be the same object as the collection itself. The reason is to allow for the possibility that more than one enumerator might be applied simultaneously to the same collection.

Adding collection support to the Vector struct

The Vector struct started in Chapter 3 is about to get another extension with collection support.

So far, the Vector struct contains a Vector instance with three components, x, y, and z, and because an indexer was defined in Chapter 3, it is possible to treat a Vector instance as an array, so that you can access the x-component by writing SomeVector[0], the y-component by writing SomeVector[1], and the z-component by writing SomeVector[2].

You now extend the Vector struct into a new code sample, the VectorAsCollection project, in which it is also possible to iterate through the components of a Vector by writing code like this:

 foreach (double component in someVector) Console.WriteLine("Component is " + component);

Your first task is to mark Vector as a collection by having it implement the IEnumerable interface. You start by modifying the declaration of the Vector struct:

 struct Vector : IFormattable, IEnumerable {    public double x, y, z;

Note that the IFormattable interface is present because you added support for string format specifiers earlier. Now you need to implement the IEnumerable interface:

 public IEnumerator GetEnumerator() { return new VectorEnumerator(this); }

The implementation of GetEnumerator() could hardly be simpler, but it depends on the existence of a new class, VectorEnumerator, which you need to define. Because VectorEnumerator is not a class that any outside code has to see directly, you declare it as a private class inside the Vector struct. Its definition looks like this:

 private class VectorEnumerator : IEnumerator { Vector theVector;      // Vector object that this enumerator refers to  int location;          // which element of theVector the enumerator is // currently referring to public VectorEnumerator(Vector theVector) { this.theVector = theVector; location = -1; } public bool MoveNext() { ++location; return (location > 2) ? false : true; } public object Current { get { if (location < 0 || location > 2) throw new InvalidOperationException( "The enumerator is either before the first element or " + "after the last element of the Vector"); return theVector[(uint)location]; } } public void Reset() { location = -1; } }

As required for an enumerator, VectorEnumerator implements the IEnumerator interface. It also contains two member fields, theVector, which is a reference to the Vector (the collection) that this enumerator is to be associated with, and location, an int that indicates the enumerator's reference point in the collection — put differently, whether the Current property should retrieve the x, y, or z component of the vector.

The way, you will work in this case is by treating location as an index and internally implementing the enumerator to access Vector as an array. When accessing Vector as an array, the valid indices are 0, 1, and 2 — you will extend this by using -1 as the value that indicates where the enumerator is before the start of the collection, and 3 to indicate that it is beyond the end of the collection. Hence, the initialization of this field to -1 in the VectorEnumerator constructor:

public  VectorEnumerator(Vector theVector) {    this.theVector = theVector; location = -1; }

Notice the constructor also takes a reference to the Vector instance that you are to enumerate — this was supplied in the Vector.GetEnumerator() method:

public IEnumerator GetEnumerator() { return new VectorEnumerator(this); }

Array Lists

An array list is very similar to an array, except that it has the ability to grow. It is represented by the class System.Collections.ArrayList.

The ArrayList class also has some similarities with the StringBuilder class. Just as a StringBuilder allocates enough space in memory to store a certain number of characters and allows you to manipulate characters within the space, the ArrayList allocates enough memory to store a certain number of object references. You can then efficiently manipulate these object references. If you try to add more objects to the ArrayList than permitted, it will automatically increase its capacity by allocating a new area of memory big enough to hold twice as many elements as the current capacity and relocate the objects to this new location.

The simplest form of instantiation of an ArrayList is illustrated here:

 ArrayList baseballTeams = new ArrayList();

With this type of instantiation, the ArrayList object is created with a capacity of 16 — meaning that there are 16 items in the array. In order to get at the ArrayList class, you need to make a reference to the System.Collections namespace in your project. You can also instantiate an ArrayList by indicating the initial capacity you want. For example:

 ArrayList baseballTeams = new ArrayList(20);

It is also possible to set the capacity of the ArrayList directly after a generic instantiation with the use of the Capacity property as shown in the following example:

 ArrayList baseballTeams = new ArrayList();  baseballTeams.Capacity = 20

Note, however, that changing the capacity causes the entire ArrayList to be reallocated to a new block of memory with the required capacity. Some other important notes to understand about the capacity of the ArrayList are in the area of capacity growth. Initially, as stated, the default capacity of an ArrayList object is set to 16 items. The idea of an ArrayList is that it can dynamically grow as needed. However, when you add the 17th element to the default ArrayList, what happens is that a new ArrayList object is created — but double the size (an ArrayList with 32 elements). Once the new ArrayList is created, the contents of the original ArrayList are copied over to the new ArrayList instance. Once copied over, the original ArrayList is then marked for garbage collection. This growth behavior is the same if you set the capacity of the ArrayList yourself. For instance, if you use the following construction:

 ArrayList baseballTeams = new ArrayList(20);

an ArrayList is created with 20 items. On adding the 21st item to the ArrayList, a new ArrayList is created — but this time, the new ArrayList created contains 40 items in the collection. Just remember that when the ArrayList is forced to dynamically grow, it simply doubles itself.

The number of elements in the ArrayList can be obtained with the Count property:

 int nBaseballTeams = baseballTeams.Count;

Once the ArrayList is instantiated, you can then add elements using the Add() method. This console application example uses the Add() method and then displays each of the items that were added:

 ArrayList baseballTeams = new ArrayList(); baseballTeams.Add("St. Louis Cardinals"); baseballTeams.Add("Seattle Mariners"); baseballTeams.Add("Florida Marlins"); foreach (string item in baseballTeams) { Console.Write(item + "\n"); } Console.Readline();

One thing to note about adding items to the ArrayList is that the ArrayList treats all its elements as object references. That means you can store whatever objects you like in an ArrayList, but when accessing the objects, you will need to cast them back to the appropriate data type, as shown here:

 string element1 = (string)baseballTeams[1];

In addition to casting this item from the ArrayList to a string, the preceding example is specifying a specific item in the collection based upon an indexer. In this example, element1 is assigned a value of Seattle Mariners. When adding items to the ArrayList, you can also specify the location of the item being added using the Insert() method. This is illustrated in the following example:

 baseballTeams.Insert(1, "San Francisco Giants"); // inserts at position 1

There is also a useful override of Insert that allows you to insert all the elements of a collection into an ArrayList, given an ICollection interface reference.

You can remove elements at a specific point in the collection with the use of the RemoveAt() method. This is done using the following construction:

 baseballTeams.RemoveAt(1); // removes object at position 1

You can also supply an object reference to another method, Remove(). However, it is important to note that this takes longer because it will cause the ArrayList to make a linear search through the array to find the object.

Note that adding or removing an element causes all subsequent elements to have to be correspondingly shifted in memory, even if no reallocation of the entire ArrayList is needed.

So far, you've seen the basics of adding and removing items from the ArrayList collection. Another way of adding items to the ArrayList is through the use of the AddRange() method. The AddRange() method allows you to add an entire collection of items at a time. Going back to the previous baseball teams example, add some additional items. This time though, add a collection of items that are represented in a string array. This is illustrated in the following example:

ArrayList baseballTeams = new ArrayList(); baseballTeams.Add("St. Louis Cardinals"); baseballTeams.Add("Seattle Mariners"); baseballTeams.Add("Florida Marlins"); string[] myStringArray = new string[2]; myStringArray[0] = "San Francisco Giants"; myStringArray[1] = "Los Angeles Dodgers"; baseballTeams.AddRange(myStringArray); foreach (string item in baseballTeams) {    Console.Write(item + "\n"); } Console.Readline();

This example creates a string array, adds two items to the array, and then uses the AddRange() method of the ArrayList object to add the entire string array in one line. Once added to the ArrayList, there are then five items in the collection, which are then displayed to the console.

The RemoveRange() method allows you to specify a range of items that are to be removed from a collection of items in the ArrayList. The RemoveRange() method takes two parameters — the first being the location to start from and the second being the number of items to remove from that point.

For instance, add this one line of code to the example and see what happens to the items in the ArrayList:

ArrayList baseballTeams = new ArrayList(); baseballTeams.Add("St. Louis Cardinals"); baseballTeams.Add("Seattle Mariners"); baseballTeams.Add("Florida Marlins"); string[] myStringArray = new string[2]; myStringArray[0] = "San Francisco Giants"; myStringArray[1] = "Los Angeles Dodgers"; baseballTeams.AddRange(myStringArray); baseballTeams.RemoveRange(2, 2); foreach (string item in baseballTeams) {    Console.Write(item + "\n"); } Console.Readline();

This use of the RemoveRange() method states that the items to be removed should start from the index position of 2 ("Florida Marlins") and that two items should be removed. This means that Florida Marlins and San Francisco Giants will be removed from the ArrayList collection of items.

Another interesting method of the ArrayList object is the GetRange() method. This is used if you want to copy over a specific range of items to a new ArrayList. For example, suppose that you have an ArrayList as shown earlier that contains five baseball teams. If you need a new ArrayList object that will contain only a subset of these five baseball teams, you would accomplish this task as shown here:

 ArrayList secondArray = new ArrayList(baseballTeams.GetRange(2, 2));

The GetRange() method works similarly to the AddRange() method. It takes two parameters — the first is the place in the index to start and the second is the number of items to copy. In this example, a new ArrayList object is created that copies two items from the previous ArrayList. The two items copied are at the index positions of 2 and 3, whose values are Florida Marlins and San Francisco Giants. In addition to the GetRange() method, you will also find the SetRange() and InsertRange() methods that are there for placing items in an ArrayList.

An array list can be really useful if you need to build up an array of objects but you do not know in advance how big the array is going to be. In that case, you can construct the array in an ArrayList, and then copy the ArrayList back to a plain old array when you have finished, provided that you actually need the data as an array (this would be the case, for example, if the array is to be passed to a method that expects an array as a parameter). The relationship between ArrayList and Array is in many ways similar to that between StringBuilder and String.

Unfortunately, unlike the StringBuilder class, there is no single method to do this conversion from an ArrayList to an Array. You have to use a loop to manually copy back references. Note, however, that you are only copying the references, not the objects, so this should not result in much of a performance hit:

 // strings is an ArrayList instance being used to store string instances string[] myStringArray = new string[baseballTeams.Count]; for (int i=0 ; i< baseballTeams.Count ; i++) myStringArray[i] = (string)baseballTeams[i];

The Stack Class

A Stack is another collection type that is ideal for working with data items that are considered temporary and are disposable after their use by your application. The Stack class creates collections in a structure called Last In First Out (LIFO). This means that the items that are pulled from the collection are in the reverse order in which they are input into the collection. This is illustrated in Figure 9-1.

image from book
Figure 9-1

Items are placed into a Stack using the Push() method, while items are pulled from the collection using the Pop() method. This section starts by looking at populating a Stack object and reading back the values. This is illustrated in the following console application example:

 Stack alphabet = new Stack(); alphabet.Push("A"); alphabet.Push("B"); alphabet.Push("C"); foreach (string item in alphabet) { Console.Write(item); } Console.ReadLine();

To code this example, you have to first make a reference to the System.Collections namespace in order to get at the Stack class. In this case, a Stack object is instantiated and using the Stack object's Push() method, the letters A, B, and C are added to the collection (in that order). Then the items of the Stack collection are iterated through and written to the console screen. This produces the following results:

CBA

In this example, items are read in the order of CBA because items are read from a Stack collection in the reverse order in which they were set into the collection.

It is also possible to use the Pop() method to pull items from a Stack object. However, the big difference in pulling them off the collection using the Pop() method and just iterating through them as shown in the preceding example is that when you use the Pop() method, the item is pulled from the collection permanently. For instance, take a look at the following code example, which utilizes the Pop() method:

Stack alphabet = new Stack(); alphabet.Push("A"); alphabet.Push("B"); alphabet.Push("C"); Console.Write("First Iteration: "); foreach (string item in alphabet) {    Console.Write(item); } Console.WriteLine("\nItem pulled from collection: " + alphabet.Pop().ToString()); Console.Write("Second iteration: "); foreach (string item in alphabet) { Console.Write(item); } Console.ReadLine();

In this case, the items A, B, and C are placed in the collection and written to the console screen. From here, one of the items is popped from the collection using:

alphabet.Pop()

The value of the item popped from the collection is then written to the screen. Then the alphabet collection is iterated through again, but this time you will notice that the letter C is missing from the collection. The reason is that this item was popped off the collection permanently. Running this application produces the following results to the console screen:

First iteration: CBA  Item pulled from collection: C  Second iteration: BA

In regards to the Pop() method, if you are interested in keeping the items as part of the collection, you can always use the Peek() method, which allows you to pull the latest item from the collection without removing it from the overall collection as the Pop() method would do.

The following table details some of the methods of the Stack class.

Method	Description
Clear	Removes all the items from the collection
Clone	Creates a shadow copy of the collection
CopyTo	Copies the collection to an existing one-dimensional array
GetEnumerator	Returns an enumerator for the collection
Peek	Returns the top-most object from the Stack without removing it from the entire collection
Pop	Returns the top-most object from the Stack and removes it from the collection
Push	Places items in the Stack
ToArray	Copies the Stack to a new array

The Queue Class

You will find that working with the Queue class and the collection it creates is similar to working with the Stack class. The big difference is that the Queue class creates collections in a structure referred to as First In First Out (FIFO). This means that items that are placed in the collection are pulled from the collection in the same order in which they were placed. This is illustrated in Figure 9-2.

image from book
Figure 9-2

Items are placed into the collection using the Enqueue() method, and they are pulled from the collection using the Dequeue() method. If you take the example from the Stack object and modify it so that it uses the Queue object instead, it will produce different results. The code for this example is shown here:

 Queue alphabet = new Queue(); alphabet.Enqueue("A"); alphabet.Enqueue("B"); alphabet.Enqueue("C"); Console.Write("First Iteration: "); foreach (string item in alphabet) { Console.Write(item); } Console.WriteLine("\nItem pulled from collection: " +  alphabet.Dequeue().ToString()); Console.Write("Second iteration: "); foreach (string item in alphabet) { Console.Write(item); } Console.ReadLine();

In this example, the Queue class is used to store the values A, B, and C. Once stored, the items are iterated through and written to the console screen. From here, one of the items is removed from the collection. Much like the Stack class's Pop() method, the Queue class uses the Dequeue() method and also permanently removes the item from the collection. This is illustrated in the second iteration through the collection. The result of running this application is shown here:

First iteration: ABC  Item pulled from collection: A  Second iteration: BC

The following table details some of the methods of the Queue class.

Method	Description
Clear	Removes all the items from the collection
Clone	Creates a shadow copy of the collection
Contains	Determines whether an element is in the collection
CopyTo	Copies the collection to an existing one-dimensional array
Dequeue	Returns the first-place object in the Queue and removes it from the collection
Enqueue	Places items in the Queue
GetEnumerator	Returns an enumerator for the collection
Peek	Returns the top-most object from the Queue without removing it from the entire collection
ToArray	Copies the Queue to a new array

The SortedList Class

The SortedList class creates collections that are meant to be worked with in a different manner than when working with FIFO/LIFO collections. In working with the items in a SortedList collection, each item placed in the collection is assigned an identifying key that is later used in reference to the item.

Like the ArrayList class, when you instantiate the SortedList object, it is initially given a default capacity of 16. You place items in the collection using the Add() method:

 SortedList footballTeams = new SortedList(); footballTeams.Add(1, "St. Louis Rams"); footballTeams.Add(2, "Kansas City Chiefs"); footballTeams.Add(3, "Indianapolis Colts"); for (int i = 0; i < footballTeams.Count; i++) { Console.WriteLine("KEY: " + footballTeams.GetKey(i) + "   VALUE: " + footballTeams.GetByIndex(i)); } Console.ReadLine();

In this example, three items are added to the SortedList collection. Each string is assigned an associated key for the item placed. In this case, the keys are of type int: 1, 2, and 3. However, you could just have easily used a string value for the key, as shown here:

 footballTeams.Add("Winner", "St. Louis Rams");

It is possible to iterate through values and keys of a SortedList collection by using either the GetKeyList() or GetValueList() methods. For instance, you can iterate through the values of the footballTeams collection as illustrated here:

 foreach (string item in footballTeams.GetValueList()) { Console.WriteLine(item); }

Here is an iteration through the keys of a SortedList collection:

 foreach (int item in footballTeams.GetKeyList()) { Console.WriteLine(item); }

Notice in this example that you cast the keys coming from the SortedList as of type int because this is the value you used. If you used string values for the keys, then, of course, you would have to cast item to string as well.

The following table details the methods of the SortedList class.

Method	Description
Add	Places an item in the collection
Clear	Removes all the items from the collection
Clone	Creates a shallow copy of the collection
ContainsKey	Determines whether an item is in the collection based upon a key value
ContainsValue	Determines whether an item is in the collection based upon an item value
CopyTo	Copies the collection to an existing one-dimensional array
GetByIndex	Returns the value of an item at a specified index
GetEnumerator	Returns an enumerator for the collection
GetKey	Returns the value of an item at a specified key
GetKeyList	Returns a collection of keys from the SortedList
GetValueList	Returns a collection of values from the SortedList
Remove	Removes an item from the collection based upon the item's key value
RemoveAt	Removes an item from the collection based upon the index value of the item
SetByIndex	Replaces the value at a specified index in the collection

Dictionaries and Hashtables

Dictionaries represent a very sophisticated data structure that allows you to access an element based on some key, which can be of any data type you want. They are also known as maps or hash tables. Dictionaries are great if you want to store objects as if they were an array, but where you want to use some other data type rather than a numeric type to index into the structure. They also allow you to add and remove items freely, a bit like an ArrayList, but without the performance overhead of having to shift subsequent items in memory.

The kinds of situations in which dictionaries can be useful are illustrated using the MortimerPhones Employees example, which you develop later in this section. This example assumes that Mortimer Phones (the mobile phone company that was first introduced in Chapter 3) has some software that processes details of its employees. To that end, you need a data structure — something like an array — that contains data for employees. You assume that each Mortimer Phones employee is identified by an employee ID, which is a set of characters such as B342 or W435, and is stored as an EmployeeID object. Each employee's details are stored as an EmployeeData object; for the example, this just contains the employee's ID, name, and salary.

Suppose you have this EmployeeID

 EmployeeID id = new EmployeeID("W435");

and a variable called employees, which you can treat syntactically as an array of EmployeeData objects. In actuality, this is not an array; it is a dictionary, and because it is a dictionary, you can get the details of an employee with the previously declared ID like this:

 EmployeeData theEmployee = employees[id]; // Note that id is NOT a numeric type – it is an EmployeeID instance

That's the power of dictionaries. They look like arrays (but are more powerful than that; they are more like ArrayLists because you can dynamically set their capacity, and add and remove elements), but you don't have to use an integer to index into them; you can use any data type you want. For a dictionary, this is called a key rather than an index. Roughly speaking, what happens is that the dictionary takes the key supplied when you access an element (in the preceding example this is the ID object) and it does some processing on the value of this key. This processing returns an integer that depends on the value of the key, and is used to work out where in the array the entry should be stored or retrieved from. Here is a short list of other examples where you might want to use a dictionary to store objects:

If you want to store details of employees or other people, indexed by their Social Security numbers. Although the Social Security number is basically an integer, you cannot use an array with Social Security numbers as the index because a U.S. Social Security number theoretically can go up to the value of 999999999. On a 32-bit system you'd never fit an array that big in a program's address space! Most of the array would be empty anyway. Using a dictionary, you can have a Social Security number to index an employee, but still keep the dictionary size small.
If you want to store addresses, indexed by zip code. In the United States, zip codes are just numbers, but in Canada and the United Kingdom they use letters and numbers together.
If you want to store any data for objects or people, indexed by the name of the object or person.

Although the effect of a dictionary is that it looks to client code much like a dynamic array with a very flexible means of indexing into it, a lot of work goes on behind the scenes to bring this about. In principle you can use an object of any class as an index key for dictionaries. However, you must implement certain features on a class before it can be used as a key. This also pertains to the GetHashCode() method that all classes and structs inherit from System.Object. This section takes a closer look under the hood at what a dictionary is, how it works, and how GetHashCode() is involved. Then, you move on to the MortimerPhonesEmployees example, which demonstrates how to use a dictionary and how to set up a class so that it can be used as a key.

Dictionaries in real life

The term dictionary is used because the structure is very similar to a real-life dictionary. In a real dictionary you will normally want to look up the meaning of a word (or in the case of a foreign dictionary, the details of how to translate a word). The couple of lines of text that give the meaning (or the translation) are the data you are really interested in. The fact that a large dictionary has tens of thousands of data items in it is no problem when you want to look up a meaning, because you just look for the word in alphabetical order. In a sense, the word you are looking up is equivalent to the key that you use to get at the data you are really interested in. It is not really the word itself you are interested in so much as the data associated with it. The word just provides the means to locate the entry in the dictionary. So, you need three things to build a dictionary:

The data you want to look up
The key
The algorithm that allows you to find where the data is in the dictionary

The algorithm is a crucial part of the dictionary. Just knowing the key is not sufficient — you also need a way that you can use the key to find out the location of the item in the data structure. In real-life dictionaries, this algorithm is provided by arranging words in alphabetical order.

Dictionaries in .NET

In .NET, the basic dictionary is represented by the class Hashtable, which works on the same principles as a real-life dictionary, except that it assumes that the key and item are both of type Object. This means that a Hashtable can store whatever data structure you want. By contrast, a real-life dictionary uses strings as the keys.

Although Hashtable represents the generic will-store-anything dictionary, it is permissible to define your own more specialized dictionary classes. Microsoft has provided an abstract base class, DictionaryBase, which provides basic dictionary functionality, and from which you can derive your classes. There is also a ready-made .NET base class, System.Collections.Specialized.StringDictionary, which you should use in place of Hashtable if your keys are strings.

When you create a Hashtable object, you can indicate its initial capacity, just as you would for StringBuilder and ArrayList:

 Hashtable employees = new Hashtable(53);

As usual, there are many other constructors, but this is the one you will probably use most often. Notice the unusual size of the initial capacity: 53. There is a good reason for this. Due to the internal algorithms used in dictionaries, they work most efficiently if their capacity is a prime number.

Adding an object to the Hashtable is done with the Add() method, but Hashtable.Add() takes two parameters, both of which are object references. The first is a reference to the key; the second is a reference to the data. Carrying on with the EmployeeID and EmployeeData classes from the example, you would instantiate the various objects as such:

 EmployeeID id;  EmployeeData data; // initialize id and data to refer to some employee // assume employees is a Hashtable instance  //that contains EmployeeData references  employees.Add(id, data);

To retrieve the data for an item, you need to supply the key. Hashtable implements an indexer so that you can retrieve data — this is how you get the array syntax discussed earlier:

 EmployeeData data = employees[id];

You can also remove items from the dictionary by supplying the key of the object to be removed:

 employees.Remove(id);

You can find out how many items are in the hash table using the Count property:

 int nEmployees = employees.Count;

Notice, however, that there is no Insert() method. You have not yet seen how a dictionary works internally, but there is no difference between adding and inserting data. Unlike an array or an ArrayList, you don't find one big block of data at the beginning of the structure and an empty block at the end. Instead, the situation looks more like the diagram in Figure 9-3, in which any unmarked parts of the dictionary are empty.

image from book
Figure 9-3

When you add an entry, it will actually be placed at some location that could be anywhere in the dictionary. How the location is worked out from the key is something that you don't need to know about when you are using the dictionary. The important point is that the algorithm used to work out the location of an item is reliable. As long as you remember what the key is, you can just hand it to the Hashtable object, and it will be able to use the key to quickly work out where the item is and retrieve it for you. You see how the algorithm works later in this section. (Hint: It relies on the key's GetHashCode() method.)

Note that the diagram in Figure 9-3 is simplified. Each key/entry pair is not actually stored inside the dictionary structure — as is common for reference types, what is stored are the object references that indicate where on the heap the objects themselves are located.

How the dictionary works

So far, you've seen that dictionaries (hash tables) are extremely convenient to use, but there is a snag: Hashtable (and indeed any other dictionary class) uses some sort of algorithm to work out where to place each object based on the key, and that algorithm isn't entirely provided by the Hashtable class. It has two stages, and the code for one of these stages must be provided by the key class. If you are using a class that Microsoft has written, and which can be used as a key (such as String), there's no problem(Microsoft will have written all the code already). However, if the key class is one that you have written, you will have to write this part of the algorithm yourself.

In computer parlance, the part of the algorithm implemented by the key class is known as a hash (hence the term hash table), and the Hashtable class looks in a very particular place for the hash algorithm. It looks in your object's GetHashCode() method, which it inherits from System.Object. Whenever a dictionary class needs to work out where an item should be located, it simply calls the key object's GetHashCode() method. This is why it was emphasized when discussing System.Object() that if you override GetHashCode(), there are fairly stringent requirements on how you do it, because your implementation needs to behave in certain ways for dictionary classes to work correctly. (If you don't intend your class to ever be used as a key in a dictionary, there's no need to override GetHashCode().)

The way it works is that GetHashCode() returns an int, and it somehow uses the value of the key to generate this int. Hashtable will take this int and do some other processing on it that involves some sophisticated mathematical calculations, and which returns the index of where in the dictionary an item with the given hash should be stored. We won't go into this part of the algorithm — that part has already been coded by Microsoft, so you don't need to know about it. What you should know is that it involves prime numbers and is the reason why the hash table capacity should be a prime number.

For this to work properly, you need to adhere to some fairly strict requirements for the GetHashCode() override, as mentioned earlier. These requirements are going to sound quite abstract and daunting, but don't worry too much. As the MortimerPhonesEmployees example demonstrates, it is not at all difficult to code a key class that satisfies these requirements:

It should be fast (because placing or retrieving entries in a dictionary is supposed to be fast).
It must be consistent; if you regard two keys as representing the same value, they must give the same value for the hash.
It should ideally give values that are likely to be evenly distributed across the entire range of numbers that an int can store.

The reason for this last condition is because of a potential problem; what happens if you get two entries in the dictionary whose hashes both give the same index?

If this happens, the dictionary class will have to start fiddling about looking for the nearest available free location to store the second item — and will have to do some searching in order to retrieve this item later on. This is obviously going to hurt performance, and clearly, if lots of your keys are tending to give the same indexes for where they should be stored, this kind of clash becomes more likely. However, because of the way Microsoft's part of the algorithm works, this risk is minimized when the calculated hash values are evenly distributed between int.MinValue and int.MaxValue.

The risk of clashes between keys also increases as the dictionary gets fuller, so it's normally a good idea to make sure the capacity of the dictionary is substantially greater than the number of elements actually in it. For this reason, Hashtable will automatically relocate to increase its capacity well before it actually becomes full. The proportion of the table that is full is termed the load, and you can set the maximum value that you want the load to reach before Hashtable relocates in another of the Hashtable constructors:

 // capacity =50, Max Load = 0.5 Hashtable employees = new Hashtable(50, 0.5);

The smaller the maximum load, the more efficiently your hash table works and the more memory it occupies. Incidentally, when a hash table relocates in order to increase its capacity, it always chooses a prime number as its new capacity.

Another important point mentioned earlier is that the hashing algorithm must be consistent. If two objects contain what you regard as the same data, they must give the same hash value, and this is where the important restrictions on how you override the Equals() and GetHashCode() methods of System.Object come in. You see, the way that the Hashtable determines whether two keys A and B are equal is that it calls A.Equals(B). This means you must ensure that the following is always true:

Important

If A.Equals(B) is true, then A.GetHashCode() and B.GetHashCode() must always return the same hash code.

This probably seems a fairly subtle point, but it is crucial. If you contrived some way of overriding these methods so that the preceding statement is not always true, a hash table that uses instances of this class as its keys will simply not work properly. Instead, you'll find funny things happening. For example, you might place an object in the hash table and then discover that you can never retrieve it, or you might try to retrieve an entry and get the wrong entry returned.

Note

For this reason, the C# compiler will display a compilation warning if you supply an override for Equals() but don't supply an override for GetHashCode().

For System.Object this condition is true, because Equals() simply compares references, and GetHashCode() actually returns a hash that is based solely on the address of the object. This means that hash tables based on a key that doesn't override these methods will work correctly. However, the problem with this way of doing things is that keys are regarded as equal only if they are the same object. That means that when you place an object in the dictionary, you then have to hang on to the reference to the key. You can't simply instantiate another key object later that has the same value, because the same value is defined as meaning the very same instance. This means that if you don't override the Object versions of Equals() and GetHashCode(), your class won't be very convenient to use in a hash table. It makes more sense to implement GetHashCode() to generate a hash based on the value of the key rather than its address in memory. This is why you will invariably need to override GetHashCode() and Equals() for any class that you want to be used as a key.

Incidentally, System.String has had these methods overloaded appropriately. Equals() has been overloaded to provide value comparison, and GetHashCode() has also been correspondingly overloaded to return a hash based on the value of the string. For this reason, it is convenient to use strings as keys in a dictionary.

The MortimerPhonesEmployees example

The MortimerPhonesEmployees example is a program that sets up a dictionary of employees. As mentioned earlier, the dictionary is indexed using EmployeeID objects, and each item stored in the dictionary is an EmployeeData object that stores details of an employee. The program simply instantiates a dictionary, adds a couple of employees to it, and then invites the user to type in employee IDs. For each ID the user types in, the program attempts to use the ID to index a dictionary and retrieve the employee's details. The process iterates until the user types in X. The example, when run, looks like this:

 MortimerPhonesEmployees Enter employee ID (format:A999, X to exit          B001 Employee: B001:                      $100,000.00 Enter employee ID (format:A999, X to exit)> W234 Employee: W234: Arabel Jones         $10,000.00 Enter employee ID (format:A999, X to exit)> X

This example contains a number of classes. In particular, you need the EmployeeID class, which is the key used to identify employees, and the EmployeeData class that stores employee data. We will examine the EmployeeID class first, because this is the one where all the action happens in terms of preparing it to be used as a dictionary key. The definition of this class is as follows:

 class EmployeeID { private readonly char prefix; private readonly int number; public EmployeeID(string id) { prefix = (id.ToUpper())[0]; number = int.Parse(id.Substring(1,3)); } public override string ToString() { return prefix.ToString() + string.Format("{0,3:000}", number); } public override int GetHashCode() { return ToString().GetHashCode(); } public override bool Equals(object obj) { EmployeeID rhs = obj as EmployeeID; if (rhs == null) return false; if (prefix == rhs.prefix && number == rhs.number) return true; return false; } }

The first part of the class definition simply stores the actual ID. Remember that the ID takes a format such as B001 or W234. It consists of a single letter prefix, followed by three numeric characters. You store this as a char for the prefix and an int for the remainder of the code.

The constructor simply takes a string and breaks it up to form these fields. Note that to keep the example simple, no error checking is performed. Just assume the string passed into the constructor is in the correct format. The ToString() method simply returns the ID as a string:

return prefix.ToString() + string.Format("{0,3:000}", number);

Note the format specifier (3:000) that ensures the int containing the number is padded with zeros, so you get, for example, B001, and not B1.

Now you come to the two method overrides that you need for the dictionary. First, you have overridden Equals() so that it compares the values of EmployeeID instances:

   public override bool Equals(object obj)    {       EmployeeID rhs = obj as EmployeeID;       if (rhs == null)          return false;       if (prefix == rhs.prefix && number == rhs.number)          return true;       return false;    } }

This is the first time you have seen an example of an override of Equals(). Notice that your first task is to check whether the object passed as a parameter is actually an EmployeeID instance. If it isn't, then it obviously isn't going to equal this object, so you return false. You test the type by attempting to cast it to EmployeeID using C#'s as keyword. Once you have established that you have an EmployeeID object, you just compare the values of the fields to see if they contain the same values as this object.

The implementation of GetHashCode() is shorter, though at first sight it is perhaps harder to understand what's going on:

 public override int GetHashCode() { string str = this.ToString(); return str.GetHashCode(); }

Earlier, you learned some strict requirements that the calculated hash code has to satisfy. Of course, all sorts of ways exist to devise simple and efficient hashing algorithms. Generally, taking the fields, multiplying them by large prime numbers, and adding the results together is a good way to do this. However, for convenience, Microsoft has already implemented a sophisticated, yet efficient hashing algorithm for the String class, so you may as well take advantage of that. String.GetHashCode() produces well- distributed numbers based on the contents of the string. It satisfies all the requirements of a hash code.

The only disadvantage of leveraging this method is that some performance loss is associated with converting your EmployeeID class to a string in the first place. If you are concerned about that and need the last ounce of performance in your hashing algorithms, you will need to design your own hash. Designing hashing algorithms is a complex topic that we cannot discuss in depth in this book. However, we will suggest one simple approach to the problem, which is to multiply numbers based on the component fields of the class by different prime numbers (for mathematical reasons, multiplying by different prime numbers helps to prevent different combinations of values of the fields from giving the same hash code). The following snippet shows a suitable implementation of GetHashCode():

 public override int GetHashCode()   // alternative implementation { return (int)prefix*13 + (int)number*53; }

This particular example works more quickly than the ToString()-based algorithm used in the example but has the disadvantage that the hash codes generated by different EmployeeIDs are less likely to be evenly spread across the range of int. Incidentally, the primitive numeric types do have GetHashCode() methods defined, but these methods simply return the value of the variable, and are hence not particularly useful. The primitive types aren't really intended to be used as keys.

Notice that the GetHashCode() and Equals() implementations do between them satisfy the requirements for equality mentioned earlier. With the override of Equals(), two EmployeeID objects will be considered equal if, and only if, they have the same values of prefix and number. However, in that case ToString() provides the same value for both of them, and so they will give the same hash code. That's the crucial test that must be satisfied.

Next is the class that contains the employee data. The definition of this class is fairly basic and intuitive:

 class EmployeeData { private string name; private decimal salary; private EmployeeID id; public EmployeeData(EmployeeID id, string name, decimal salary) { this.id = id; this.name = name; this.salary = salary; } public override string ToString() { StringBuilder sb = new StringBuilder(id.ToString(), 100); sb.Append(": "); sb.Append(string.Format("{0,-20}", name)); sb.Append(" "); sb.Append(string.Format("{0:C}", salary)); return sb.ToString(); } }

Notice how once again for performance reasons, you use a StringBuilder object to generate the string representation of an EmployeeData object. Finally, you create the test harness. This is defined in the TestHarness class:

 class TestHarness { Hashtable employees = new Hashtable(31); public void Run() { EmployeeID idMortimer = new EmployeeID("B001"); EmployeeData mortimer = new EmployeeData(idMortimer, "Mortimer", 100000.00M); EmployeeID idArabel = new EmployeeID("W234"); EmployeeData arabel= new EmployeeData(idArabel, "Arabel Jones", 10000.00M); employees.Add(idMortimer, mortimer); employees.Add(idArabel, arabel); while (true) { try { Console.Write("Enter employee ID (format:A999, X to exit)> "); string userInput = Console.ReadLine(); userInput = userInput.ToUpper(); if (userInput == "X") return; EmployeeID id = new EmployeeID(userInput); DisplayData(id); } catch (Exception e) { Console.WriteLine("Exception occurred. Did you use the correct format for the employee ID?"); Console.WriteLine(e.Message); Console.WriteLine(); } Console.WriteLine(); } } private void DisplayData(EmployeeID id) { object empobj = employees[id]; if (empobj != null) { EmployeeData employee = (EmployeeData)empobj; Console.WriteLine("Employee: " + employee.ToString()); } else Console.WriteLine("Employee not found: ID = " + id); } }

TestHarness contains the member field, which actually is the dictionary.

As usual for a dictionary, you set the initial capacity to a prime number; in this case, 31. The guts of the test harness are in the Run() method. This method first sets up details for two employees — mortimer and arabel — and adds their details to the dictionary:

employees.Add(idMortimer, mortimer); employees.Add(idArabel, arabel);

Next, you enter the while loop that repeatedly asks the user to input an employeeID. There is a try block inside the while loop, which is just there to trap any problems caused by the user typing in something that's not the correct format for an EmployeeID, which would cause the EmployeeID constructor to throw an exception when it tries to construct an ID from the string:

string userInput = Console.ReadLine(); userInput = userInput.ToUpper(); if (userInput == "X")    return; EmployeeID id = new EmployeeID(userInput);

If the EmployeeID is constructed correctly, you display the associated employee by calling a method, DisplayData(). This is the method in which you finally get to access the dictionary with array syntax. Indeed, retrieving the employee data for the employee with this ID is the first thing you do in this method:

private void DisplayData(EmployeeID id) { object empobj = employees[id];

If there is no employee with that ID in the dictionary, employees[id] will return null, which is why you check for a null reference and display an appropriate error message if you find one. Otherwise, you simply cast your returned empobj reference to an EmployeeData. (Remember that Hashtable is a very generic dictionary class; it is storing objects, so retrieving an element from it will return an object reference, which you need to cast back to the type that you originally placed in the dictionary.) Once you have your EmployeeID reference, you can simply display the employee data using the EmployeeData.ToString() method:

EmployeeData employee = (EmployeeData)empobj; Console.WriteLine("Employee: " + employee.ToString());

The final part of the code — the Main() method — kicks off the whole example. This simply instantiates a TestHarness object and runs it:

static void Main() {    TestHarness harness = new TestHarness();    harness.Run(); }

Generics

To make collections, a more powerful feature and also increase their efficiency and usability, generics were introduced to C# with the release of the .NET Framework 2.0. The idea of generics is nothing new. They are a tad similar to C++ templates. You can also find generics in other languages, such as Java. Their introduction into the .NET Framework 2.0 languages is a huge benefit for the developer.

Generics enable you to make a generic collection that is still strongly typed — providing fewer chances for errors (because they occur at design time), increasing performance, and giving you IntelliSense features when you are working with the collections.

The System.Collections.Generic namespace gives you access to generic versions of the Stack, Dictionary, SortedDictionary, List, and Queue classes.

Generics are covered in detail in Chapter 10.