16.2. Implementing Map
The
implementations
, eight in all, that the Collections Framework provides for
Map
are shown in Figure 16.2. We shall discuss
HashMap
,
LinkedHashMap
,
WeakHashMap
,
IdentityHashMap
, and
EnumMap
here; the interfaces
NavigableMap
,
ConcurrentMap
, and
ConcurrentNavigableMap
are discussed, along with their implementations, in the sections following this one.
For constructors, the general rule for
Map
implementations is like that for
Collection
implementations (see Section 12.3). Every implementation excluding
EnumMap
has at least two constructors; taking
HashMap
as an example, they are:
public HashMap()
public HashMap(Map<? extends K,? extends V> m)
The first of these creates an empty map, and the second a map that will contain the key-value mappings contained in the supplied map
m
. The keys and values of map
m
must have types that are the same as (or are
subtypes
of) the keys and values, respectively, of the map being created. Using this second constructor has the same effect as creating an empty map with the default constructor, and then adding the contents of map
m
using
putAll
. In addition to these two, the standard implementations have other constructors for configuration purposes.
16.2.1. HashMap
In discussing
HashSet
in Section 13.1.1, we mentioned that it delegates all its operations to a private instance of
HashMap
. Figure 16.3(a) is similar to Figure 13.2, but without the simplification that removed the value elements from the map (all elements in a
HashSet
are stored as keys with the same constant value). The discussion in Section 13.1 of hash tables and their performance applies equally to
HashMap
. In particular,
HashMap
provides constant-time performance for
put
and
get
. Although in principle constant-time performance is only attainable with no collisions, it can be closely approached by the use of rehashing to control the load and thereby to minimize the number of collisions.
Iteration over a collection of keys or values requires time proportional to the capacity of the map plus the number of key-value mappings that it contains. The iterators are fail-fast.
Two constructors allow the programmer to configure a new instance of
HashMap
:
public HashMap(int initialCapacity)
public HashMap(int initialCapacity, float loadFactor)
These constructors are like those of
HashSet
, allowing specification of the initial capacity and,
optionally
, the load factor at which the table will be rehashed.
16.2.2. LinkedHashMap
Like
LinkedHashSet
(Section 13.1.2), the class
LinkedHashMap
refines the contract of its parent class,
HashMap
, by
guaranteeing
the order in which iterators return its elements. Unlike
LinkedHashSet
, however,
LinkedHashMap
offers a choice of iteration orders; elements can be returned either in the order in which they were inserted in the map, or in the order in which they were accessed (from least-recently to most-recently accessed). An access ordered
LinkedHashMap
is created by supplying an argument of
true
for the last parameter of the constructor:
public LinkedHashMap(int initialCapacity,
float loadFactor,
boolean accessOrder)
Supplying
false
will give an insertion-ordered map. The other constructors, which are just like those of
HashMap
, also produce insertion-ordered maps. As with
LinkedHashSet
, iteration over a
LinkedHashMap
takes time proportional only to the number of elements in the map, not its capacity.
Access-ordered maps are
especially
useful for constructing
Least Recently Used
(
LRU
) caches. A cache is an area of memory that stores frequently accessed data for fast access. In designing a cache, the key issue is the choice of algorithm that will be used to decide what data to remove in order to
conserve
memory. When an item from a cached data set needs to be found, the cache will be searched first. Typically, if the item is not found in the cache, it will be retrieved from the main store and added to the cache. But the cache cannot be allowed to continue growing indefinitely, so a strategy must be chosen for removing the least useful item from the cache when a new one is added. If the strategy
chosen
is LRU, the entry removed will be the one least recently used. This simple strategy is suitable for situations in which an access of an element
increases
the probability of further access in the near future of the same element. Its simplicity and speed have made it the most popular caching strategy.
Cache construction with
LinkedHashMap
is further aided by
removeEldestEntry
, the single method that it adds to those inherited from
HashMap
:
protected boolean removeEldestEntry(Map.Entry<K,V> eldest)
The contract for
removeEldestEntry
states that the
methods
put
and
putAll
will call
removeEldestEntry
whenever a new entry is added to the map, passing to it the "eldest" entry. In an insertion-ordered map, the eldest entry will be the one that was least recently added to the map, but in an access-ordered map it is the one least recently accessed (and if some entries have never been accessed, it is the one amongst these which was least recently added). In
LinkedHashMap
itself,
removeEldestEntry
does nothing and returns
false
, but subclasses can override it to return
true
under some circumstances. The contract for this method specifies that although it may itself remove the eldest entry, it must return
false
if it has done so, since it is expected that a return value of
true
will cause its calling method to do the removal. A simple example of
removeEldestEntry
would allow a map to grow to a given maximum size and then maintain that
size
by deleting the eldest entry each time a new one is added:
class BoundedSizeMap extends LinkedHashMap {
private int maxEntries;
public BoundedSizeMap(int maxEntries) {
super(16, 0.75f, true);
this.maxEntries = maxEntries;
}
protected boolean removeEldestEntry(Map.Entry eldest) {
return size() > maxEntries;
}
}
A refinement of this simple example could take into account the entry supplied as the argument to
removeEldestEntry
. For example, a directory cache might have a set of reserved
names
which should never be removed, even if the cache continues to grow as a result.
Notice that an insertion-ordered
LinkedHashMap
that
overrides
removeEldestEntry
as shown above will implement a FIFO strategy. FIFO caching has often been used in preference to LRU because it is much simpler to implement in maps that do not offer access ordering. However LRU is usually more effective than FIFO, because the reduced cost of cache refreshes outweighs the overhead of maintaining access ordering.
Iteration over a collection of keys or values returned by a
LinkedHashMap
is linear in the number of elements. The iterators over such collections are fail-fast.
16.2.3. WeakHashMap
An ordinary
Map
keeps ordinary ("strong") references to all the objects it contains. That means that even when a key has become unreachable by any means other than through the map itself, it cannot be garbage collected. Often, that's exactly what we want; in the example at the beginning of this chapter, where we mapped
tasks
to
clients
, we wouldn't have wanted a mapping to disappear just because we weren't holding a reference to the task object that we had put into the
HashMap
. To look up the value associated with a supplied key, the
HashMap
will look for a key that matches (in the sense of
equals
) the supplied onethey don't have to be physically the same object.
But suppose that the objects of the key class are uniquethat is, object equality is the same as object identity. For example, each object might contain a unique serial number. In this case, once we no longer have a referencefrom outside the mapto an object being used as a key, we can never look it up again, because we can never re-create it. So the map might as well get rid of the key-value pair and, in fact, there may be a strong advantage in doing so if the map is large and memory is in short supply. That is the idea that
WeakHashMap
implements.
Internally
WeakHashMap
holds references to its key objects through references of the class
java.lang.ref.WeakReference
(see Figure 16.3(b)). A
WeakReference
introduces an extra level of indirection in reaching an object. For example, to make a weak reference to to the string
"code gui"
you would write:
WeakReference<String> wref = new WeakReference<String>("code gui");
And at a later time you would recover a strong reference to it using the
get
method of
WeakReference
:
String recoveredStringRef = wref.get();
If there are no strong references to the string
"code gui"
(or to any substring of it returned from its
subString
method), the existence of the weak reference will not by itself prevent the garbage collector from reclaiming the object. So the recovered reference value
recoveredStringRef
may, or may not, be
null
.
To see how
WeakHashMap
can be useful, think of a tracing garbage collector that works by determining which objects are
reachable
, and reclaiming all others. The starting points for a reachability search include the static variables of currently loaded classes and the local
variables
currently in scope. Only strong references are followed to determine reachability, so the keys of a
WeakHashMap
will be available to be reclaimed if they are not reachable by any other route. Note that a key cannot be reclaimed if it is strongly referenced from the corresponding value. (including from the values they
correspond
to).
Before most operations on a
WeakHashMap
are executed, the map checks which keys have been reclaimed. (It's not enough to check if a key is
null
, because
null
is a legal value for keys in a
WeakHashMap
. The
WeakReference
mechanism allows you to tell the garbage collector to leave information in a
ReferenceQueue
each time it reclaims a
weakly
referenced object.) The
WeakHashMap
then
removes
every entry of which the garbage collector has reclaimed the key.
What is a
WeakHashMap
good for? Imagine you have a program that
allocates
some transient system resourcea buffer, for exampleon request from a client. Besides passing a reference to the resource back to the client, your program might also need to store information about it locallyfor example, associating the buffer with the client that
requested
it. That could be implemented by means of a map from resource to client objects. But now, even after the client has disposed of the resource, the map will still hold a reference that will prevent the resource object from being garbage collectedif, that is, the reference is strong. Memory will gradually be used up by resources which are no longer in use. But if the reference is weak, held by a
WeakHashMap
, the garbage collector will be able to
reclaim
the object after the last strong reference has gone away, and the memory leak is prevented.
A more general use is in those applicationsfor example, cacheswhere you don't mind information
disappearing
if memory is low. Here,
WeakHashMap
is useful whether or not the keys are unique, because you can always re-create a key if necessary to see if the corresponding value is still in the cache.
WeakHashMap
isn't perfect for this purpose; one of its drawbacks is that it weakly references the map's keys rather than its values, which usually occupy much more memory. So even after the garbage collector has reclaimed a key, the real benefit in terms of available memory will not be
experienced
until the map has removed the stale entry. A second drawback is that weak references are
too
weak; the garbage collector is liable to reclaim a weakly reachable object at any time, and the programmer cannot influence this in any way. (A sister class of
WeakReference
,
java.lang.ref.SoftReference
, is treated differently: the garbage collector should
postpone
reclaiming these until it is under severe memory pressure. Heinz Kabutz has written a
SoftReference
-based map using generics; see http://www.javaspecialists.co.za/archive/Issue098.html.)
WeakHashMap
performs
similarly to
HashMap
, though more slowly because of the overheads of the extra level of indirection for keys. The cost of clearing out unwanted key-value associations before each operation is proportional to the number of associations that need to be removed. The iterators over collections of keys and values returned by
WeakHashMap
are fail-fast.
16.2.4. IdentityHashMap
An
IdentityHashMap
differs
from an ordinary
HashMap
in that two keys are
considered
equal only if they are physically the same object: identity, rather than
equals
, is used for key comparison. That sets the contract for
IdentityHashMap
at odds with the contract for
Map
, the interface it implements, which specifies that equality should be used for key comparison. An alternative design could have avoided this problem by providing a weaker contract for
Map
, with two different subinterfaces strengthening the contract to specify the type of key comparison to use. This is another example of the problem we discussed in Section 11.4, of balancing the
tradeoff
between a framework's complexity and its precision in implementing its contracts.
IdentityHashMap
is a specialized class, commonly used in operations such as serialization, in which a graph has to be traversed and information stored about each node. The algorithm used for traversing the graph must be able to check, for each node it encounters, whether that node has already been seen;
otherwise
, graph cycles could be followed indefinitely. For cyclic graphs, we must use identity rather than equality to check whether nodes are the same. Calculating equality between two graph node objects requires calculating the equality of their fields, which in
turn
means computing all their successorsand we are back to the original problem. An
IdentityHashMap
, by contrast, will report a node as being present only if that same node has previously been put into the map.
The standard implementation of
IdentityHashMap
handles collisions differently than the chaining method shown in Figure 13.2 and used by all the other variants of
HashSet
and
HashMap
. This implementation uses a technique called
linear probing
, in which the key and value references are stored directly in adjacent locations in the table itself rather than in
cells
referenced from it. With linear probing, collisions are handled by simply stepping along the table until the first free pair of locations is found. Figure 16.4 shows three stages in filling an
IdentityHashMap
with a capacity of 8. In (a) we are storing a key-value pair whose key hashes to 0, and in (b) a pair whose key hashes to 4. The third key, added in (c), also hashes to 4, so the algorithm steps along the table until it finds an unused location. In this case, the first one it
tries
, with index 6, is free and can be used. Deletions are trickier than with chaining; if
key2
and
value2
were removed from the table in Figure 13.2,
key3
and
value3
would have to be moved along to take their place.
Out of all the Collections Framework hash implementations, why does
IdentityHashMap
alone use linear probing when all the others use chaining? The motivation for using probing is that it is somewhat faster than following a linked list, but that is only true when a reference to the value can be placed directly in the array, as in Figure 16.4. That isn't practical for all other hash-based collections, because they store the hash code as well as the value. This is for reasons of efficiency: a
get
operation must check whether it has found the right key, and since equality is an expensive operation, it makes sense to check first whether it even has the right hash code. Of course, this reasoning doesn't apply to
IdentityHashMap
, which checks object identity rather than object equality.
There are three constructors for
IdentityHashMap
:
public IdentityHashMap()
public IdentityHashMap(Map<? extends K,? extends V> m)
public IdentityHashMap(int expectedMaxSize)
The first two are the standard constructors found in every general-purpose
Map
implementation. The third takes the place of the two constructors that in other
HashMap
s allow the
user
to control the initial capacity of the table and the load factor at which it will be rehashed.
IdentityHashMap
doesn't allow this, fixing it instead (at .67 in the standard implementation) in order to protect the user from the consequences of setting the load factor inappropriately: whereas the cost of finding a key in a table using chaining is proportional to the load factor
l
, in a table using linear probing it is proportional to 1/(1-
l
). So avoiding high load factors is too important to be left to the programmer! This decision is in line with the policy, mentioned earlier, of no longer providing configuration parameters when new classes are introduced into the Framework.
16.2.5. EnumMap
Implementing a mapping from an enumerated type is straightforward and very efficient, for reasons similar to those described for
EnumSet
(see Section 13.1.4); in an array implementation, the ordinal value of each enumerated type constant can serve as the index of the corresponding value. The basic operations of
get
and
put
can be implemented as array
accesses
, in constant time. An iterator over the key set takes time proportional to the number of constants in the enumerated type and returns the keys in their natural order (the order in which the
enum
constants are declared). Although, as with
EnumSet
, this class is not thread-safe, the iterators over its collection views are not fail-fast but weakly consistent.
EnumMap
has three public constructors:
EnumMap(Class<K> keyType) // create an empty enum map
EnumMap(EnumMap<K, ? extends V> m) // create an enum map, with the same
// key type and elements as m
EnumMap(Map<K, ? extends V> m) // create an enum map using the elements
// of the supplied Map, which (unless it
// is an EnumMap) must contain at least
// one association
An
EnumMap
contains a reified key type, which is used at run time for checking the validity of new entries being added to the map. This type is supplied by the three constructors in different ways. In the first, it is supplied as a class token; in the second, it is
copied
from the specified
EnumMap
. In the third, there are two possibilities: either the specified
Map
is actually an
EnumMap
, in which case it behaves like the second constructor, or the class of the first key of the specified
Map
is used (which is why the supplied
Map
may not be empty).
|