Maps | Collections

Maps associate keys to values and cannot contain duplicate keys (i.e., each key can map to only one value; this is called one-to-one mapping). Maps differ from Sets in that Maps contain keys and values, whereas Sets contain only values. Three of the several classes that implement interface Map are Hashtable, HashMap and TreeMap. Hashtables and HashMaps store elements in hash tables, and TReeMaps store elements in trees. This section discusses hash tables and provides an example that uses a HashMap to store key/value pairs. Interface SortedMap extends Map and maintains its keys in sorted ordereither the elements' natural order or an order specified by a Comparator. Class TReeMap implements SortedMap.

Map Implementation with Hash Tables

Object-oriented programming languages facilitate creating new types. When a program creates objects of new or existing types, it may need to store and retrieve them efficiently. Storing and retrieving information with arrays is efficient if some aspect of your data directly matches a numerical key value and if the keys are unique and tightly packed. If you have 100 employees with nine-digit Social Security numbers and you want to store and retrieve employee data by using the Social Security number as a key, the task would require an array with one billion elements, because there are one billion unique nine-digit numbers (000,000,000999,999,999). This is impractical for virtually all applications that use Social Security numbers as keys. A program that had an array that large could achieve high performance for both storing and retrieving employee records by simply using the Social Security number as the array index.

There are numerous applications that have this problem, namely, that either the keys are of the wrong type (e.g., not positive integers that correspond to array subscripts) or they are of the right type, but sparsely spread over a huge range. What is needed is a high-speed scheme for converting keys such as Social Security numbers, inventory part numbers and the like into unique array indices. Then, when an application needs to store something, the scheme could convert the application's key rapidly into an index, and the record of information could be stored at that slot in the array. Retrieval is accomplished the same way: Once the application has a key for which it wants to retrieve a data record, the application simply applies the conversion to the keythis produces the array index where the data is stored and retrieved.

The scheme we describe here is the basis of a technique called hashing. Why the name? When we convert a key into an array index, we literally scramble the bits, forming a kind of "mishmashed," or hashed, number. The number actually has no real significance beyond its usefulness in storing and retrieving a particular data record.

A glitch in the scheme is called a collisionthis occurs when two different keys "hash into" the same cell (or element) in the array. We cannot store two values in the same space, so we need to find an alternative home for all values beyond the first that hash to a particular array index. There are many schemes for doing this. One is to "hash again" (i.e., to apply another hashing transformation to the key to provide a next candidate cell in the array). The hashing process is designed to distribute the values throughout the table, so the assumption is that an available cell will be found with just a few hashes.

Another scheme uses one hash to locate the first candidate cell. If that cell is occupied, successive cells are searched in order until an available cell is found. Retrieval works the same way: The key is hashed once to determine the initial location and check whether it contains the desired data. If it does, the search is finished. If it does not, successive cells are searched linearly until the desired data is found.

The most popular solution to hash-table collisions is to have each cell of the table be a hash "bucket," typically a linked list of all the key-value pairs that hash to that cell. This is the solution that Java's Hashtable and HashMap classes (from package java.util) implement. Both Hashtable and HashMap implement the Map interface. The primary differences between them are that HashMap is unsynchronized (multiple threads can modify a HashMap concurrently), and allows null keys and null values.

A hash table's load factor affects the performance of hashing schemes. The load factor is the ratio of the number of occupied cells in the hash table to the total number of cells in the hash table. The closer this ratio gets to 1.0, the greater the chance of collisions.

Performance Tip 19.7

The load factor in a hash table is a classic example of a memory-space/execution-time trade-off: By increasing the load factor, we get better memory utilization, but the program runs slower, due to increased hashing collisions. By decreasing the load factor, we get better program speed, because of reduced hashing collisions, but we get poorer memory utilization, because a larger portion of the hash table remains empty.

Hash tables are complex to program. Computer science students study hashing schemes in courses called "Data Structures" and "Algorithms." Java provides classes Hashtable and HashMap to enable programmers to use hashing without having to implement hash table mechanisms. This concept is profoundly important in our study of object-oriented programming. As discussed in earlier chapters, classes encapsulate and hide complexity (i.e., implementation details) and offer user-friendly interfaces. Properly crafting classes to exhibit such behavior is one of the most valued skills in the field of object-oriented programming. Figure 19.20 uses a HashMap to count the number of occurrences of each word in a string.

Figure 19.20. HashMaps and Maps.

(This item is displayed on pages 944 - 945 in the print version)

 1 // Fig. 19.20: WordTypeCount.java
 2 // Program counts the number of occurrences of each word in a string
 3 import java.util.StringTokenizer;
 4 import java.util.Map;
 5 import java.util.HashMap;
 6 import java.util.Set;
 7 import java.util.TreeSet;
 8 import java.util.Scanner;
 9
10 public class WordTypeCount
11 {
12 private Map< String, Integer > map;
13 private Scanner scanner;
14
15 public WordTypeCount()
16 {
17 map = new HashMap< String, Integer >(); // create HashMap
18 scanner = new Scanner( System.in ); // create scanner
19 createMap(); // create map based on user input
20 displayMap(); // display map content
21 } // end WordTypeCount constructor
22
23 // create map from user input
24 private void createMap()
25 {
26 System.out.println( "Enter a string:" ); // prompt for user input
27 String input = scanner.nextLine();
28
29 // create StringTokenizer for input
30 StringTokenizer tokenizer = new StringTokenizer( input );
31
32 // processing input text
33 while ( tokenizer.hasMoreTokens() ) // while more input
34 {
35 String word = tokenizer.nextToken().toLowerCase(); // get word
36
37 // if the map contains the word
38 if ( map.containsKey( word ) ) // is word in map
39 {
40 int count = map.get( word ); // get current count
41 map.put( word, count + 1 ); // increment count
42 } // end if
43 else
44 map.put( word, 1 ); // add new word with a count of 1 to map
45 } // end while
46 } // end method createMap
47
48 // display map content
49 private void displayMap()
50 {
51 Set< String > keys = map.keySet(); // get keys
52
53 // sort keys
54 TreeSet< String > sortedKeys = new TreeSet< String >( keys );
55
56 System.out.println( "Map contains:
Key		Value" );
57
58 // generate output for each key in map
59 for ( String key : sortedKeys )
60 System.out.printf( "%-10s%10s
", key, map.get( key ) );
61
62 System.out.printf(
63 "
size:%d
isEmpty:%b
", map.size(), map.isEmpty() );
64 } // end method displayMap
65
66 public static void main( String args[] )
67 {
68 new WordTypeCount();
69 } // end main
70 } // end class WordTypeCount

Enter a string:
To be or not to be: that is the question Whether 'tis nobler to suffer
Map contains:
Key Value
'tis 1
be 1
be: 1
is 1
nobler 1
not 1
or 1
question 1
suffer 1
that 1
the 1
to 3
whether 1

size:13
isEmpty:false

Line 17 creates an empty HashMap with a default initial capacity (16 elements) and a default load factor (0.75)these defaults are built into the implementation of HashMap. When the number of occupied slots in the HashMap becomes greater than the capacity times the load factor, the capacity is doubled automatically. Note that HashMap is a generic class that takes two type arguments. The first type argument specifies the type of key (i.e., String), and the second type argument specifies the type of value (i.e., Integer). Recall that the type arguments passed to a generic class must be reference types, hence the second type argument is Integer, not int. Line 18 creates a Scanner that reads user input from the standard input stream. Line 19 calls method createMap (lines 2446), which uses a map to store the number of occurrences of each word in the sentence. Line 27 invokes method nextLine of scanner to obtain the user input, and line 30 creates a StringTokenizer to breaks the input string into its component individual words. This StringTokenier constructor takes a string argument and creates a StringTokenizer for that string and will use the whitespace to separate the string. The condition in the while statement in lines 3345 uses StringTokenizer method hasMoreTokens to determine whether there are more tokens in the string being tokenized. If so, line 35 converts the next token to lowercase letters. The next token is obtained with a call to StringTokenizer method nextToken that returns a String. [Note: Section 29.6 discusses class StringTokenizer in detail.] Then line 38 calls Map method containsKey to determine whether the word is in the map (and thus has occurred previously in the string). If the Map does not contain a mapping for the word, line 44 uses Map method put to create a new entry in the map, with the word as the key and an Integer object containing 1 as the value. Note that autoboxing occurs when the program passes integer 1 to method put, because the map stores the number of occurrences of the word as Integer. If the word does exist in the map, line 40 uses Map method get to obtain the key's associated value (the count) in the map. Line 41 increments that value and uses put to replace the key's associated value in the map. Method put returns the prior value associated with the key, or null if the key was not in the map.

Method displayMap (lines 4964) displays all the entries in the map. It uses HashMap method keySet (line 51) to get a set of the keys. The keys have type String in the map, so method keySet returns a generic type Set with type parameter specified to be String. Line 54 creates a treeSet of the keys, in which the keys are sorted. The loop in lines 5960 accesses each key and its value in the map. Line 60 displays each key and its value using format specifier %-10s to left justify each key and format specifier %10s to right justify each value. Note that the keys are displayed in ascending order. Line 63 calls Map method size to get the number of key-value pairs in the Map. Line 64 calls isEmpty, which returns a boolean indicating whether the Map is empty.