Section 26.4. Non-Generic Collections


26.4. Non-Generic Collections

The System.Collections namespace in the .NET Framework Class Library is the primary source for non-generic collections. These classes provide standard implementations of many of the data structures discussed in Chapter 24 with collections that store references of type Object. In this section, we demonstrate classes ArrayList, Stack and Hashtable.

26.4.1. Class ArrayList

In most programming languages, conventional arrays have a fixed sizethey cannot grow or shrink dynamically to conform to an application's execution-time memory requirements. In some applications, this fixed-size limitation presents a problem for programmers. They must choose between using fixed-size arrays that are large enough to store the maximum number of elements the application may require, and dynamic data structures that can grow and shrink the amount of memory required to store data in response to an application's changing requirements at execution time.

The .NET Framework's ArrayList collection class mimics the functionality of conventional arrays and provides dynamic resizing of the collection through the class's methods. At any time, an ArrayList contains a certain number of elements less than or equal to its capacitythe number of elements currently reserved for the ArrayList. An application can manipulate the capacity with ArrayList property Capacity. If an ArrayList needs to grow, by default it doubles its Capacity.

Performance Tip 26.1

As with linked lists, inserting additional elements into an ArrayList whose current size is less than its capacity is a fast operation.


Performance Tip 26.2

It is a slow operation to insert an element into an ArrayList that needs to grow larger to accommodate a new element. An ArrayList that is at its capacity must have its memory reallocated and the existing values copied into it.


Performance Tip 26.3

If storage is at a premium, use method TRimToSize of class ArrayList to trim an ArrayList to its exact size. This will optimize the ArrayList's memory use. Be carefulif the application needs to insert additional elements, the process will be slower because the ArrayList must grow dynamically (trimming leaves no room for growth).


Performance Tip 26.4

The default capacity increment, doubling the size of the ArrayList, may seem to waste storage, but doubling is an efficient way for an ArrayList to grow quickly to "about the right size." This is a much more efficient use of time than growing the ArrayList by one element at a time in response to insert operations.


ArrayLists store references to Objects. All classes derive from class Object, so an ArrayList can contain objects of any reference type and boxed primitive-type values. Figure 26.4 lists some useful methods and properties of class ArrayList.

Figure 26.4. Some methods and properties of class ArrayList.

Method or property

Description

Add

Adds an Object to the ArrayList and returns an Integer specifying the index at which the Object was added.

Capacity

Property that gets and sets the number of elements for which space is currently reserved in the ArrayList.

Clear

Removes all the elements from the ArrayList.

Contains

Returns TRue if the specified Object is in the ArrayList; otherwise, returns False.

Count

Read-only property that gets the number of elements stored in the ArrayList.

IndexOf

Returns the index of the first occurrence of the specified Object in the ArrayList.

Insert

Inserts an Object at the specified index.

Remove

Removes the first occurrence of the specified Object.

RemoveAt

Removes an Object at the specified index.

RemoveRange

Removes a specified number of elements starting at a specified index in the ArrayList.

Sort

Sorts the ArrayList.

trimToSize

Sets the Capacity of the ArrayList to the number of elements the ArrayList currently contains (Count).


Figure 26.5 demonstrates class ArrayList and several of its methods. Class ArrayList belongs to the System.Collections namespace (line 3). Lines 68 declare two arrays of Strings (colors and removeColors) that we will use to fill two ArrayList objects. When the application begins execution, we create an ArrayList (named list) with an initial capacity of one element (line 11). The For Each statement in lines 1517 adds the five elements of array colors to list via ArrayList's Add method, so list grows to accommodate these new elements. Line 21 uses ArrayList's overloaded constructor to create a new ArrayList (named removeList) initialized with the contents of array removeColors. This constructor can initialize the contents of an ArrayList with the elements of any ICollection passed to it. Many of the collection classes have such a constructor. Note that the constructor call in line 21 initializes ArrayList removeList in a manner similar to lines 1517.

Figure 26.5. Using class ArrayList

  1  ' Fig. 26.5: ArrayListTest.vb  2  ' Using class ArrayList.  3  Imports System.Collections  4  5  Module ArrayListTest  6     Private colors As String() = _  7       {"MAGENTA","RED","WHITE","BLUE","CYAN"}  8     Private removeColors As String() = {"RED", "WHITE", "BLUE"}  9 10     Sub Main() 11        Dim list As New ArrayList(1) ' initial capacity of 1 12 13        ' add the elements of the colors array  14        ' to the ArrayList list 15        For Each color As String In colors 16           list.Add(color) ' add color to the ArrayList list 17        Next color 18 19        ' add elements in the removeColors array to 20        ' the ArrayList removeList with the ArrayList constructor 21        Dim removeList As New ArrayList(removeColors) 22 23        Console.WriteLine("ArrayList: ") 24        DisplayInformation(list) ' output the list 25 26        ' remove from ArrayList list the colors in removeList 27        RemoveColorNames(list, removeList) 28 29        Console.WriteLine(vbCrLf & _ 30           "ArrayList after calling RemoveColors: ") 31        DisplayInformation(list) ' output list contents 32     End Sub ' Main 33 34     ' displays information on the contents of an array list 35     Private Sub DisplayInformation(ByVal arrayList As ArrayList) 36        ' iterate through array list with a For Each statement 37        For Each element As Object In arrayList               38           Console.Write("{0} " , element) ' invokes ToString 39        Next element                                          40 41        ' display the size and capacity 42        Console.WriteLine(vbCrLf & "Size = {0}; Capacity = {1}", _ 43           arrayList.Count, arrayList.Capacity) 44 45        Dim index As Integer = arrayList.IndexOf("BLUE" ) 46 47        If index <> -1 Then 48           Console.WriteLine("The array list contains BLUE at index {0}.", _ 49              index) 50        Else 51           Console.WriteLine("The array list does not contain BLUE.") 52        End If 53     End Sub ' DisplayInformation 54 55     ' remove colors specified in secondList from firstList        56     Private Sub  RemoveColorNames(ByVal firstList As ArrayList, _ 57        ByVal secondList As ArrayList)                             58        ' iterate through second ArrayList like an array           59     For count As Integer = 0 To secondList.Count - 1              60        firstList.Remove(secondList(count))                        61     Next count                                                    62     End Sub ' RemoveColors                                        63  End Module ' ArrayListTest 


 ArrayList: MAGENTA RED WHITE BLUE CYAN Size = 5; Capacity = 8 The array list contains BLUE at index 3. ArrayList after calling RemoveColors: MAGENTA CYAN Size = 2; Capacity = 8 The array list does not contain BLUE. 



Line 24 calls method DisplayInformation (lines 3553) to output the contents of the list. This method uses a For Each statement to traverse the elements of an ArrayList. As we discussed in Section 26.3, the For Each statement is a convenient shorthand for calling ArrayList's GetEnumerator method and using an enumerator to traverse the elements of the collection. Also, we use an iteration variable of type Object because class ArrayList stores references to Objects.

We use properties Count and Capacity in lines 4243 to display the current number of elements and the maximum number of elements that can be stored without allocating more memory to the ArrayList. The output of Fig. 26.5 indicates that the ArrayList has capacity 8recall that an ArrayList doubles its capacity whenever it needs more space.

In line 45, we invoke method IndexOf to determine the position of the String "BLUE" in arrayList and store the result in local variable index. IndexOf returns -1 if the element is not found. Lines 4752 determine whether arrayList contains "BLUE". If it does (i.e., index is not equal to -1), we output the index. ArrayList also provides method Contains, which simply returns true if an object is in the ArrayList, and False otherwise. Method Contains is preferred if we do not need the index of the element.

Performance Tip 26.5

ArrayList methods IndexOf and Contains each perform a linear search, which is a costly operation for large ArrayLists. If the ArrayList is sorted, use ArrayList method BinarySearch to perform a more efficient search. Method BinarySearch returns the index of the element, or a negative number if the element is not found.


After DisplayInformation returns, we call method RemoveColorNames (lines 5662) with the two ArrayLists. Lines 5961 iterate through secondList. Line 60 uses an indexer to access an ArrayList element by following the ArrayList reference name with parentheses (()) containing the desired element's index. An ArgumentOutOfRangeException occurs if the specified index is not both greater than 0 and less than the number of elements currently stored in the ArrayList (specified by the ArrayList's Count property).

We use the indexer to obtain each of secondList's elements, then remove each one from firstList with the Remove method. This method deletes a specified item from an ArrayList by performing a linear search and removing (only) the first occurrence of the specified object. All subsequent elements shift toward the beginning of the ArrayList to fill the emptied position.

After the call to RemoveColorNames, line 31 again outputs the contents of list, confirming that the elements of removeList were, indeed, removed.

26.4.2. Class Stack

The Stack class implements a stack data structure and provides much of the functionality that we defined in our own implementation in Section 24.5. Refer to that section for a discussion of stack concepts. We created a test application in Fig. 24.14. to demonstrate our StackInheritance data structure. We adapt Fig. 24.14 in Fig. 26.6 to demonstrate the .NET Framework collection class Stack.

Figure 26.6. Demonstrating class Stack

  1  ' Fig. 26.6: StackTest.vb  2  ' Demonstrating class Stack.  3  Imports System.Collections  4  5  Module StackTest  6     Sub Main()  7        Dim stack As New Stack()' default Capacity of 10  8  9        ' create objects to store in the stack 10        Dim aBoolean As Boolean = True 11        Dim aCharacter As Char = "$"c 12        Dim anInteger As Integer = 34567 13        Dim aString As String = "hello" 14 15        ' use method Push to add items to (the top of) the stack 16        stack.Push(aBoolean) 17        PrintStack(stack) 18        stack.Push(aCharacter) 19        PrintStack(stack) 20        stack.Push(anInteger) 21        PrintStack(stack) 22        stack.Push(aString) 23        PrintStack(stack) 24 25        ' check the top element of the stack 26        Console.WriteLine("The top element of the stack is {0}" & _ 27           vbCrLf, stack.Peek()) 28 29        ' remove items from stack 30        Try 31           While True                                   32              Dim removedObject As Object = stack.Pop() 33              Console.WriteLine(removedObject & " popped") 34              PrintStack(stack)                            35           End While                                       36        Catch exception  AsInvalidOperationException 37           ' if exception occurs, print stack trace 38           Console.Error.WriteLine(exception) 39        End Try 40     End Sub'Main 41 42     ' print the contents of a stack 43     Private Sub PrintStack(ByVal stack As Stack) 44        If stack.Count = 0 Then 45           ' the stack is empty 46           Console.WriteLine("stack is empty" & vbCrLf) 47        Else 48           Console.Write("The stack is: ") 49 50           ' iterate through the stack with a For Each statement 51           For Each element As Object In stack                  52              Console.Write("{0} ", element) ' invokes ToString 53           Next element                                         54 55           Console.WriteLine(vbCrLf) 56        End If 57     End Sub ' PrintStack 58  End Module ' StackTest 


[View full width]

The stack is: True The stack is: $ True The stack is: 34567 $ True The stack is: hello 34567 $ True The top element of the stack is hello hello popped The stack is: 34567 $ True 34567 popped The stack is: $ True $ popped The stack is: True True popped stack is empty System.InvalidOperationException: Stack empty. at System.Collections.Stack.Pop() at StackTest.StackTest.Main() in C:\examples \ch26\Fig26_06\StackTest\ StackTest.vb:line 32



Line 3 Imports class Stack. Line 7 creates a Stack with the default initial capacity (10 elements). Class Stack has methods Push and Pop to perform the basic stack operations.

Method Push takes an Object as an argument and inserts it at the top of the Stack. If the number of items on the Stack (the Count property) is equal to the capacity at the time of the Push operation, the Stack grows to accommodate more Objects. Lines 16, 18, 20 and 22 use method Push to add four elements (a Boolean, a Char, an Integer and a String) to the stack. After each Push, we invoke method PrintStack (lines 4357) to output the contents of the stack. Note that this non-generic Stack class can store only references to Objects, so each of the value-type itemsthe Boolean, the Char and the Integerare implicitly boxed before they are added to the Stack. (Namespace System.Collections.Generic provides a generic Stack class that has many of the same methods and properties used in Fig. 26.6.)

Method PrintStack (lines 4357) uses Stack property Count to obtain the number of elements in Stack. If the stack is not empty (i.e., Count is not equal to 0), we use a For Each statement to iterate over the stack and output its contents by implicitly invoking the ToString method of each element. The For Each statement implicitly invokes Stack's GetEnumerator method, which we could have called explicitly to traverse the stack via an enumerator.

Method Peek returns the value of the top stack element, but does not remove the element from the Stack. We use Peek at line 27 to obtain the top object of the Stack, then output that object, implicitly invoking the object's ToString method. An InvalidOperationException occurs if the Stack is empty when the application calls Peek.

Method Pop takes no argumentsit removes and returns the object currently on top of the Stack. An infinite loop (lines 3135) pops objects off the stack and outputs them until the stack is empty. When the application calls Pop on the empty stack, an InvalidOperationException is thrown. The Catch block (lines 3638) outputs the exception, implicitly invoking the InvalidOperationException's ToString method to obtain its error message and stack trace.

Common Programming Error 26.3

Attempting to Peek or Pop an empty Stack (a Stack whose Count property is 0) causes an InvalidOperationException.


Although Fig. 26.6 does not demonstrate it, class Stack also has method Contains, which returns TRue if the Stack contains the specified object, and returns False otherwise.

26.4.3. Class Hashtable

When an application creates objects, it needs to manage those objects efficiently. This includes storing and retrieving objects. Storing and retrieving information with arrays is efficient if some aspect of your data directly matches the key value and if the keys are unique and tightly packed. If you have 100 employees with nine-digit Social Security numbers and you want to store and retrieve employee data by using the Social Security number as a key, it would nominally require an array with 999,999,999 elements, because there are 999,999,999 unique nine-digit numbers. If you have an array that large, you could get very high performance storing and retrieving employee records by simply using the Social Security number as the array index, but it would be a massive waste of memory.

Many applications have this problemeither the keys are of the wrong type (i.e., not non-negative integers), or they are of the right type, but they are sparsely spread over a large range.

What is needed is a high-speed scheme for converting keys such as Social Security numbers and inventory part numbers to unique array indices. Then, when an application needs to store something, the scheme could convert the application key rapidly to an index and the record of information could be stored at that location in the array. Retrieval occurs the same wayonce the application has a key for which it wants to retrieve the data record, the application simply applies the conversion to the key, which produces the array subscript where the data resides in the array and retrieves the data.

The scheme we describe here is the basis of a technique called hashing, in which we store data in a data structure called a hash table. Why the name? Because, when we convert a key into an array subscript, we literally scramble the bits, making a "hash" of the number. The number actually has no real significance beyond its usefulness in storing and retrieving this particular data record.

A glitch in the scheme occurs when collisions occur (i.e., two different keys "hash into" the same cell, or element, in the array). Since we cannot store two different data records in the same space, we need to find an alternative home for all records beyond the first that hash to a particular array subscript. One scheme for doing this is to "hash again" (i.e., to reapply the hashing transformation to the key to provide a next candidate cell in the array). The hashing process is designed to be quite random, it is assumed is that an available cell will be found with just a few hashes.

Another scheme uses one hash to locate the first candidate cell. If the cell is occupied, successive cells are searched linearly until an available cell is found. Retrieval works the same waythe key is hashed once, and the resulting cell is checked to determine whether it contains the desired data. If it does, the search is complete. If it does not, successive cells are searched linearly until the desired data is found.

The most popular solution to hash-table collisions is to have each cell of the table be a hash "bucket"typically, a linked list of all the keyvalue pairs that hash to that cell. This is the solution that the .NET Framework's Hashtable class implements.

The load factor affects the performance of hashing schemes. The load factor is the ratio of the number of objects stored in the hash table to the total number of cells of the hash table. As this ratio gets higher, the chance of collisions increases.

Performance Tip 26.6

The load factor in a hash table is a classic example of a space/time trade-off: By increasing the load factor, we get better memory utilization, but the application runs slower due to increased hashing collisions. By decreasing the load factor, we get better application speed because of reduced hashing collisions, but we get poorer memory utilization because a larger portion of the hash table remains empty.


The .NET Framework provides class Hashtable to enable you to easily employ hashing in applications. A hash function performs a calculation that determines where to place data in the hash table. The hash function is applied to the key in a keyvalue pair of objects. Class Hashtable can accept any Object as a key. For this reason, class Object defines method GetHashCode, which all classes inherit. Most classes that are candidates to be used as keys in a hash table override this method to provide one that performs efficient hash code calculations for a specific type. For example, a String has a hash code calculation that is based on the contents of the String. Figure 26.7 uses a Hashtable to count the number of occurrences of each word in a String.

Figure 26.7. Application counts the number of occurrences of each word in a String and stores this information in a hash table

  1  ' Fig. 26.7: HashtableTest.vb  2  ' Application counts the number of occurrences of each word in a String  3  ' and stores them in a hash table.  4  Imports System.Text.RegularExpressions  5  Imports System.Collections  6  7  Module HashtableTest  8     Sub Main()  9        ' create hash table based on user input 10        Dim table As Hashtable = CollectWords() 11 12        ' display hash table content 13        DisplayHashtable(table) 14     End Sub ' Main 15 16     ' create hash table from user input 17     Private Function CollectWords() As Hashtable 18        Dim table As New Hashtable() ' create a new hash table 19 20        Console.WriteLine("Enter a string: ") ' prompt for user input 21        Dim input As String = Console.ReadLine() ' get input 22 23        ' split input text into tokens 24        Dim words As String() = Regex.Split(input, "\s+") 25 26        ' processing input words 27        For Each word As String In words                                  28           Dim wordKey As String = word.ToLower() ' get word in lowercase 29         30           ' if the hash table contains the word                          31           If table.ContainsKey(wordKey) Then                             32              table(wordKey) = Convert.ToInt32(table(wordKey)) + 1        33           Else                                                           34              ' add new word with a count of 1 to hash table              35              table.Add(wordKey, 1)                                      36           End If                                                         37        Next                                                              38 39        Return table 40     End Function ' CollectWords 41 42     ' display hash table content 43     Private Sub DisplayHashtable(ByVal table As Hashtable) 44        Console.WriteLine(vbCrLf & "Hashtable contains:"& _ 45           vbCrLf & "{0,-12}{1,-12}", "Key:", "Value:") 46 47        ' generate output for each key in hash table 48        ' by iterating through the Keys property with a For Each statement 49        For Each key As Object In table.Keys                     50            Console.WriteLine("{0,-12}{1,-12}", key, table(key)) 51        Next                                                     52 53        Console.WriteLine(vbCrLf & "size: {0}", table.Count) 54     End Sub ' DisplayHashtable 55  End Module ' HashtableTest 


 Enter a string: As idle as a painted ship upon a painted ocean Hashtable contains: Key:        Value: painted     2 a           2 upon        1 as          2 ship        1 idle        1 ocean       1 size: 7 



Lines 45 contain Imports statements for the System.Text.RegularExpressions namespace (for class Regex, discussed in Chapter 16) and the System.Collections namespace (for class Hashtable). Class HashtableTest declares three methods. Method CollectWords (lines 1740) inputs a String and returns a Hashtable in which each value stores the number of times that word appears in the String and the word is used as the key. Method DisplayHashtable (lines 4354) displays in column format the Hashtable passed to it. The Main method (lines 814) invokes CollectWords (line 10), then passes the Hashtable returned by CollectWords to DisplayHashtable (line 13).

Method CollectWords (lines 1740) initializes local variable table with a new Hashtable (line 18) that has a default initial capacity of 0 elements and a default maximum load factor of 1.0. When the number of items in the Hashtable becomes greater than the number of cells times the load factor, the capacity is increased automatically. (This implementation detail is invisible to clients of the class.) Lines 2021 prompt the user and input a String. We use Shared method Split of class Regex in line 24 to divide the String into words using whitespace characters as delimiters. This creates an array of "words," which we then store in local variable words.

The For Each statement in lines 2737 iterates through the elements of array words. Each word is converted to lowercase with String method ToLower, then stored in variable wordKey (line 28). Line 31 calls Hashtable method ContainsKey to determine whether the word is in the hash table (and thus has occurred previously in the String). If the Hashtable does not contain an entry for the word, line 35 uses Hashtable method Add to create a new entry in the hash table, with the lowercase word as the key and an object containing 1 as the value. Note that autoboxing occurs when the application passes the Integer value 1 to method Add, because the hash table stores both the key and value as references to Objects.

Common Programming Error 26.4

Using the Add method to add a key that already exists in the hash table causes an ArgumentException.


If the word is already a key in the hash table, line 32 uses the Hashtable's indexer to obtain and set the key's associated value (the word count) in the hash table. We first downcast the value from an Object to an Integer. This unboxes the value so that we can increment it by 1. We then use the indexer to store the key's associated value. The incremented value is implicitly reboxed so that it can be stored in the hash table.

Invoking the Get accessor of a Hashtable indexer with a key that does not exist in the hash table returns a Nothing reference. Using the Set accessor with a key that does not exist in the hash table creates a new entry, as if you had used the Add method.

Line 39 returns the hash table to the Main method, which then passes it to method DisplayHashtable (lines 4354), which displays all the entries. This method uses read-only property Keys (line 49) to get an ICollection that contains all the keys. Because ICollection extends IEnumerable, we can use this collection in the For Each statement in lines 4951 to iterate over the keys of the hash table. This loop accesses and outputs each key and its value in a field width of -12. The negative field width indicates that the output should be left justified. Note that a hash table is not sorted, so the keyvalue pairs are not displayed in any particular order. Line 53 uses Hashtable property Count to get the number of keyvalue pairs in the Hashtable.

Lines 4951 could also have used the For Each statement with the Hashtable object itself, rather than the Keys property. If you use a For Each statement with a Hashtable object, the iteration variable will be of type Dictionary Entry. The enumerator of a Hashtable (or any other class that implements IDictionary) uses the DictionaryEntry structure to store keyvalue pairs. This structure provides properties Key and Value for retrieving the key and value of the current element. If you do not need the key, class Hashtable also provides a read-only Values property that gets an ICollection of all the values stored in the Hashtable.

Problems with Non-Generic Collections

In the word-counting application in Fig. 26.7, our Hashtable stores its keys and data as Object references, even though we store only String keys and Integer values by convention. This results in some awkward code. For example, line 32 was forced to unbox and box the Integer data stored in the Hashtable every time it incremented the count for a particular key. This is inefficient. A similar problem occurs in lines 49 and 50the iteration variable of the For Each statement is an Object reference. If we need to use any of its String-specific methods, we need an explicit downcast.

This can cause subtle bugs. Suppose we decide to improve the readability of Fig. 26.7 by using the indexer's Set accessor instead of the Add method to add a keyvalue pair in line 41, but accidentally type:

 table(wordKey) = wordKey ' initialize to 1 

This statement creates a new entry with a String key and String value instead of an Integer value of 1. Although the application will compile correctly, this is clearly incorrect. If a word appears twice, line 32 will try to downcast this String to an Integer, causing an InvalidCastException at execution time. The error that appears at execution time will indicate that the problem is in line 32, where the exception occurred, not in line 35. This makes the error more difficult to find and debug, especially in large software applications where the exception may occur in a different fileand even in a different assembly.

In Chapter 25, we introduced generics. In the next two sections, we demonstrate how to use generic collections, which can prevent the problem we just discussed.



Visual BasicR 2005 for Programmers. DeitelR Developer Series
Visual Basic 2005 for Programmers (2nd Edition)
ISBN: 013225140X
EAN: 2147483647
Year: 2004
Pages: 435

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net