Section 16.2. Implementing Map | Java Generics and Collections

16.2. Implementing Map

The implementations, eight in all, that the Collections Framework provides for Map are shown in Figure 16.2. We shall discuss HashMap, LinkedHashMap, WeakHashMap, IdentityHashMap, and EnumMap here; the interfaces NavigableMap, ConcurrentMap, and ConcurrentNavigableMap are discussed, along with their implementations, in the sections following this one.

Figure 16-2. The structure of Map implementations in the Collections Framework

For constructors, the general rule for Map implementations is like that for Collection implementations (see Section 12.3). Every implementation excluding EnumMap has at least two constructors; taking HashMap as an example, they are:

 public HashMap() public HashMap(Map<? extends K,? extends V> m)

The first of these creates an empty map, and the second a map that will contain the key-value mappings contained in the supplied map m. The keys and values of map m must have types that are the same as (or are subtypes of) the keys and values, respectively, of the map being created. Using this second constructor has the same effect as creating an empty map with the default constructor, and then adding the contents of map m using putAll. In addition to these two, the standard implementations have other constructors for configuration purposes.

16.2.1. HashMap

In discussing HashSet in Section 13.1.1, we mentioned that it delegates all its operations to a private instance of HashMap. Figure 16.3(a) is similar to Figure 13.2, but without the simplification that removed the value elements from the map (all elements in a HashSet are stored as keys with the same constant value). The discussion in Section 13.1 of hash tables and their performance applies equally to HashMap. In particular, HashMap provides constant-time performance for put and get. Although in principle constant-time performance is only attainable with no collisions, it can be closely approached by the use of rehashing to control the load and thereby to minimize the number of collisions.

Figure 16-3. HashMap and WeakHashMap

Iteration over a collection of keys or values requires time proportional to the capacity of the map plus the number of key-value mappings that it contains. The iterators are fail-fast.

Two constructors allow the programmer to configure a new instance of HashMap:

 public HashMap(int initialCapacity) public HashMap(int initialCapacity, float loadFactor)

These constructors are like those of HashSet, allowing specification of the initial capacity and, optionally, the load factor at which the table will be rehashed.

16.2.2. LinkedHashMap

Like LinkedHashSet (Section 13.1.2), the class LinkedHashMap refines the contract of its parent class, HashMap, by guaranteeing the order in which iterators return its elements. Unlike LinkedHashSet, however, LinkedHashMap offers a choice of iteration orders; elements can be returned either in the order in which they were inserted in the map, or in the order in which they were accessed (from least-recently to most-recently accessed). An access ordered LinkedHashMap is created by supplying an argument of true for the last parameter of the constructor:

 public LinkedHashMap(int initialCapacity,                      float loadFactor,                      boolean accessOrder)

Supplying false will give an insertion-ordered map. The other constructors, which are just like those of HashMap, also produce insertion-ordered maps. As with LinkedHashSet, iteration over a LinkedHashMap takes time proportional only to the number of elements in the map, not its capacity.

Access-ordered maps are especially useful for constructing Least Recently Used (LRU) caches. A cache is an area of memory that stores frequently accessed data for fast access. In designing a cache, the key issue is the choice of algorithm that will be used to decide what data to remove in order to conserve memory. When an item from a cached data set needs to be found, the cache will be searched first. Typically, if the item is not found in the cache, it will be retrieved from the main store and added to the cache. But the cache cannot be allowed to continue growing indefinitely, so a strategy must be chosen for removing the least useful item from the cache when a new one is added. If the strategy chosen is LRU, the entry removed will be the one least recently used. This simple strategy is suitable for situations in which an access of an element increases the probability of further access in the near future of the same element. Its simplicity and speed have made it the most popular caching strategy.

Cache construction with LinkedHashMap is further aided by removeEldestEntry, the single method that it adds to those inherited from HashMap:

 protected boolean removeEldestEntry(Map.Entry<K,V> eldest)

The contract for removeEldestEntry states that the methods put and putAll will call removeEldestEntry whenever a new entry is added to the map, passing to it the "eldest" entry. In an insertion-ordered map, the eldest entry will be the one that was least recently added to the map, but in an access-ordered map it is the one least recently accessed (and if some entries have never been accessed, it is the one amongst these which was least recently added). In LinkedHashMap itself, removeEldestEntry does nothing and returns false, but subclasses can override it to return true under some circumstances. The contract for this method specifies that although it may itself remove the eldest entry, it must return false if it has done so, since it is expected that a return value of true will cause its calling method to do the removal. A simple example of removeEldestEntry would allow a map to grow to a given maximum size and then maintain that size by deleting the eldest entry each time a new one is added:

 class BoundedSizeMap extends LinkedHashMap {   private int maxEntries;   public BoundedSizeMap(int maxEntries) {     super(16, 0.75f, true);     this.maxEntries = maxEntries;   }   protected boolean removeEldestEntry(Map.Entry eldest) {     return size() > maxEntries;   } }

A refinement of this simple example could take into account the entry supplied as the argument to removeEldestEntry. For example, a directory cache might have a set of reserved names which should never be removed, even if the cache continues to grow as a result.

Notice that an insertion-ordered LinkedHashMap that overrides removeEldestEntry as shown above will implement a FIFO strategy. FIFO caching has often been used in preference to LRU because it is much simpler to implement in maps that do not offer access ordering. However LRU is usually more effective than FIFO, because the reduced cost of cache refreshes outweighs the overhead of maintaining access ordering.

Iteration over a collection of keys or values returned by a LinkedHashMap is linear in the number of elements. The iterators over such collections are fail-fast.

16.2.3. WeakHashMap

An ordinary Map keeps ordinary ("strong") references to all the objects it contains. That means that even when a key has become unreachable by any means other than through the map itself, it cannot be garbage collected. Often, that's exactly what we want; in the example at the beginning of this chapter, where we mapped tasks to clients, we wouldn't have wanted a mapping to disappear just because we weren't holding a reference to the task object that we had put into the HashMap. To look up the value associated with a supplied key, the HashMap will look for a key that matches (in the sense of equals) the supplied onethey don't have to be physically the same object.

But suppose that the objects of the key class are uniquethat is, object equality is the same as object identity. For example, each object might contain a unique serial number. In this case, once we no longer have a referencefrom outside the mapto an object being used as a key, we can never look it up again, because we can never re-create it. So the map might as well get rid of the key-value pair and, in fact, there may be a strong advantage in doing so if the map is large and memory is in short supply. That is the idea that WeakHashMap implements.

Internally WeakHashMap holds references to its key objects through references of the class java.lang.ref.WeakReference (see Figure 16.3(b)). A WeakReference introduces an extra level of indirection in reaching an object. For example, to make a weak reference to to the string "code gui" you would write:

 WeakReference<String> wref = new WeakReference<String>("code gui");

And at a later time you would recover a strong reference to it using the get method of WeakReference:

 String recoveredStringRef = wref.get();

If there are no strong references to the string "code gui" (or to any substring of it returned from its subString method), the existence of the weak reference will not by itself prevent the garbage collector from reclaiming the object. So the recovered reference value recoveredStringRef may, or may not, be null.

To see how WeakHashMap can be useful, think of a tracing garbage collector that works by determining which objects are reachable, and reclaiming all others. The starting points for a reachability search include the static variables of currently loaded classes and the local variables currently in scope. Only strong references are followed to determine reachability, so the keys of a WeakHashMap will be available to be reclaimed if they are not reachable by any other route. Note that a key cannot be reclaimed if it is strongly referenced from the corresponding value. (including from the values they correspond to).

Before most operations on a WeakHashMap are executed, the map checks which keys have been reclaimed. (It's not enough to check if a key is null, because null is a legal value for keys in a WeakHashMap. The WeakReference mechanism allows you to tell the garbage collector to leave information in a ReferenceQueue each time it reclaims a weakly referenced object.) The WeakHashMap then removes every entry of which the garbage collector has reclaimed the key.

What is a WeakHashMap good for? Imagine you have a program that allocates some transient system resourcea buffer, for exampleon request from a client. Besides passing a reference to the resource back to the client, your program might also need to store information about it locallyfor example, associating the buffer with the client that requested it. That could be implemented by means of a map from resource to client objects. But now, even after the client has disposed of the resource, the map will still hold a reference that will prevent the resource object from being garbage collectedif, that is, the reference is strong. Memory will gradually be used up by resources which are no longer in use. But if the reference is weak, held by a WeakHashMap, the garbage collector will be able to reclaim the object after the last strong reference has gone away, and the memory leak is prevented.

A more general use is in those applicationsfor example, cacheswhere you don't mind information disappearing if memory is low. Here, WeakHashMap is useful whether or not the keys are unique, because you can always re-create a key if necessary to see if the corresponding value is still in the cache. WeakHashMap isn't perfect for this purpose; one of its drawbacks is that it weakly references the map's keys rather than its values, which usually occupy much more memory. So even after the garbage collector has reclaimed a key, the real benefit in terms of available memory will not be experienced until the map has removed the stale entry. A second drawback is that weak references are too weak; the garbage collector is liable to reclaim a weakly reachable object at any time, and the programmer cannot influence this in any way. (A sister class of WeakReference, java.lang.ref.SoftReference, is treated differently: the garbage collector should postpone reclaiming these until it is under severe memory pressure. Heinz Kabutz has written a SoftReference-based map using generics; see http://www.javaspecialists.co.za/archive/Issue098.html.)

WeakHashMap performs similarly to HashMap, though more slowly because of the overheads of the extra level of indirection for keys. The cost of clearing out unwanted key-value associations before each operation is proportional to the number of associations that need to be removed. The iterators over collections of keys and values returned by WeakHashMap are fail-fast.

16.2.4. IdentityHashMap

An IdentityHashMap differs from an ordinary HashMap in that two keys are considered equal only if they are physically the same object: identity, rather than equals, is used for key comparison. That sets the contract for IdentityHashMap at odds with the contract for Map, the interface it implements, which specifies that equality should be used for key comparison. An alternative design could have avoided this problem by providing a weaker contract for Map, with two different subinterfaces strengthening the contract to specify the type of key comparison to use. This is another example of the problem we discussed in Section 11.4, of balancing the tradeoff between a framework's complexity and its precision in implementing its contracts.

IdentityHashMap is a specialized class, commonly used in operations such as serialization, in which a graph has to be traversed and information stored about each node. The algorithm used for traversing the graph must be able to check, for each node it encounters, whether that node has already been seen; otherwise, graph cycles could be followed indefinitely. For cyclic graphs, we must use identity rather than equality to check whether nodes are the same. Calculating equality between two graph node objects requires calculating the equality of their fields, which in turn means computing all their successorsand we are back to the original problem. An IdentityHashMap, by contrast, will report a node as being present only if that same node has previously been put into the map.

The standard implementation of IdentityHashMap handles collisions differently than the chaining method shown in Figure 13.2 and used by all the other variants of HashSet and HashMap. This implementation uses a technique called linear probing, in which the key and value references are stored directly in adjacent locations in the table itself rather than in cells referenced from it. With linear probing, collisions are handled by simply stepping along the table until the first free pair of locations is found. Figure 16.4 shows three stages in filling an IdentityHashMap with a capacity of 8. In (a) we are storing a key-value pair whose key hashes to 0, and in (b) a pair whose key hashes to 4. The third key, added in (c), also hashes to 4, so the algorithm steps along the table until it finds an unused location. In this case, the first one it tries, with index 6, is free and can be used. Deletions are trickier than with chaining; if key2 and value2 were removed from the table in Figure 13.2, key3 and value3 would have to be moved along to take their place.

Figure 16-4. Resolving collisions by linear probing

Out of all the Collections Framework hash implementations, why does IdentityHashMap alone use linear probing when all the others use chaining? The motivation for using probing is that it is somewhat faster than following a linked list, but that is only true when a reference to the value can be placed directly in the array, as in Figure 16.4. That isn't practical for all other hash-based collections, because they store the hash code as well as the value. This is for reasons of efficiency: a get operation must check whether it has found the right key, and since equality is an expensive operation, it makes sense to check first whether it even has the right hash code. Of course, this reasoning doesn't apply to IdentityHashMap, which checks object identity rather than object equality.

There are three constructors for IdentityHashMap:

 public IdentityHashMap() public IdentityHashMap(Map<? extends K,? extends V> m) public IdentityHashMap(int expectedMaxSize)

The first two are the standard constructors found in every general-purpose Map implementation. The third takes the place of the two constructors that in other HashMaps allow the user to control the initial capacity of the table and the load factor at which it will be rehashed. IdentityHashMap doesn't allow this, fixing it instead (at .67 in the standard implementation) in order to protect the user from the consequences of setting the load factor inappropriately: whereas the cost of finding a key in a table using chaining is proportional to the load factor l, in a table using linear probing it is proportional to 1/(1-l). So avoiding high load factors is too important to be left to the programmer! This decision is in line with the policy, mentioned earlier, of no longer providing configuration parameters when new classes are introduced into the Framework.

16.2.5. EnumMap

Implementing a mapping from an enumerated type is straightforward and very efficient, for reasons similar to those described for EnumSet (see Section 13.1.4); in an array implementation, the ordinal value of each enumerated type constant can serve as the index of the corresponding value. The basic operations of get and put can be implemented as array accesses, in constant time. An iterator over the key set takes time proportional to the number of constants in the enumerated type and returns the keys in their natural order (the order in which the enum constants are declared). Although, as with EnumSet, this class is not thread-safe, the iterators over its collection views are not fail-fast but weakly consistent.

EnumMap has three public constructors:

 EnumMap(Class<K> keyType)          // create an empty enum map EnumMap(EnumMap<K, ? extends V> m) // create an enum map, with the same                                    // key type and elements as m EnumMap(Map<K, ? extends V> m)     // create an enum map using the elements                                    // of the supplied Map, which (unless it                                    // is an EnumMap) must contain at least                                    // one association

An EnumMap contains a reified key type, which is used at run time for checking the validity of new entries being added to the map. This type is supplied by the three constructors in different ways. In the first, it is supplied as a class token; in the second, it is copied from the specified EnumMap. In the third, there are two possibilities: either the specified Map is actually an EnumMap, in which case it behaves like the second constructor, or the class of the first key of the specified Map is used (which is why the supplied Map may not be empty).