Section 11.4. Collections

11.4. Collections

Collections are fundamental to all kinds of programming. Anywhere we need to keep a group of objects, we have some kind of collection. At the most basic level, Java supports collections in the form of arrays. But since arrays have a fixed length, they are awkward for groups of things that grow and shrink over the lifetime of an application. Arrays also do not represent abstract relationships between objects well. In the early days, the Java platform had only two basic classes to address these needs: the java.util.Vector class, which represents a dynamic list of objects, and the java.util.Hashtable, class which holds a map of key/value pairs. Today, Java has a more comprehensive approach to collections called the Collections Framework. The older classes still exist, but they have been retrofitted into the framework (with some eccentricities).

Though conceptually simple, collections are one of the most powerful parts of any programming language. Collections implement data structures that lie at the heart of managing complex information and relationships. A great deal of basic computer science is devoted to describing the most efficient ways to implement certain types of algorithms over collections. Having these tools at your disposal and understanding how to use them can make your code both much smaller and faster. It can also save you from reinventing the wheel.

Prior to Java 5.0, the Collections Framework had two major drawbacks. The first was that collections were, of necessity, untyped and worked only with anonymous Objects instead of real types like Dates and Strings. This meant that you had to perform a type cast every time you took an object out of a collection. In theory, this broke Java's compile-time type safety. In practice, this was less the problem than that it was just plain cumbersome and tedious. The second problem was that, for practical reasons, collections could work only with objects and not with primitive types. This meant that any time you wanted to put a number or other primitive type into a collection you had to store it in a wrapper class first and, of course, unpack it later upon retrieving it. The combination of these factors made code working with collections less readable and more dangerous to boot.

Two of the biggest new features in Java 5.0 (and the most basic changes to the core language in Java's lifetime) address these problems directly. First, the introduction of generic types, as we described in Chapter 8, have made it possible to have truly typesafe collections under the control of the programmer. Second, the introduction of autoboxing and unboxing of primitive types means that you can generally treat objects and primitives as equals as far as collections are concerned. The combination of these new features can significantly reduce the amount of code you write and add safety as well. As we'll see, all of the collections classes now take advantage of these features.

The Collections Framework is based around a handful of interfaces in the java.util package. These interfaces are divided into two hierarchies. The first hierarchy descends from the Collection interface. This interface (and its descendants) represents a container that holds other objects. The second, separate, hierarchy is based on the Map interface, which represents a group of key/value pairs.

11.4.1. The Collection Interface

The mother of all collections is an interface appropriately named Collection. It serves as a container that holds other objects, its elements. It doesn't specify exactly how the objects are organized; it doesn't say, for example, whether duplicate objects are allowed or whether the objects are ordered in any way. These kinds of details are left to child interfaces. Nevertheless, the Collection interface defines some basic operations common to all collections:

public boolean add( element ): This method adds the supplied object to this collection. If the operation succeeds, this method returns true. If the object already exists in this collection and the collection does not permit duplicates, false is returned. Furthermore, some collections are read-only. These collections throw an UnsupportedOperationException if this method is called.
public boolean remove( element ): This method removes the specified object from this collection. Like the add( ) method, this method returns TRue if the object is removed from the collection. If the object doesn't exist in this collection, false is returned. Read-only collections throw an UnsupportedOperationException if this method is called.
public boolean contains( element ): This method returns TRue if the collection contains the specified object.
public int size( ): Use this method to find the number of elements in this collection.
public boolean isEmpty( ): This method returns true if this collection has no elements.
public Iterator iterator( ): Use this method to examine all the elements in this collection. This method returns an Iterator, which is an object you can use to step through the collection's elements. We'll talk more about iterators in the next section.

Additionally, the methods addAll( ), removeAll( ), and containsAll( ) accept another Collection and add, remove, or test for all of the elements of the supplied collection.

11.4.1.1 Generics and collections

When using generics, the Collection type is parameterized with a specific type of element that the collection will hold. This makes a generic collection of "anything" into a specific collection of one type of element. The parameter type becomes the compile-time type of the element arguments in all of the methods of Collection (in this case, the add( ), remove( ), and contains( ) methods listed earlier). For example, in the following code, we create a Collection that works with Dates:

     Collection<Date> dates = new ArrayList<Date>(  );     dates.add( new Date(  ) );     dates.add( "foo" ) // Error; string type where Date expected!!

ArrayList is just one type of Collection; we'll talk about it a bit later. The important thing is that we've declared the variable dates to be of the type Collection<Date>, that is, a collection of Dates, and we've allocated our ArrayList to match. Since our collection has been parameterized with the type Date, the add( ) method of the collection becomes add( Date date ) and attempting to add any type of object other than a Date to the list would have caused a compile-time error.

If you are not using generics, you can simply substitute the type Object everywhere you see the element type and perform the appropriate casts. For example:

     Collection dates = new ArrayList(  );     dates.add( new Date(  ) );  // unchecked, compile-time warning

As we've described earlier in the book, this is essentially what the Java compiler is doing for us with generics. When using collections (or any generic classes) in this way under Java 5.0 or later, you will get compile-time warnings indicating that the usage is unchecked, meaning that it is possible to get an error at runtime if you have made a mistake. In this example, a mistake would not be caught until someone tried to retrieve the object from the collection and cast it to the expected type.

11.4.1.2 Runtime versus compile-time safety

With the introduction of generics, collections can now be made typesafe for a given type of element. However, a lot of existing Java code does not use generics and, in some cases, you may still simply wish to use plain Object collections. It's natural to ask if there isn't a way for us to simply subclass the existing collection classes to produce typesafe ones and, if so, why this wasn't done all along. Well, the answer is no. You can't get compile-time safety in this way because the new type-specific methods you would add would overload the general ones and not override them. (They wouldn't prevent the wider Object methods from being used.) However, you can implement runtime safety checks with special collections and this can be a reasonable thing to do. In fact, in support of generics migration, Java 5.0 supplies runtime-checked wrappers for all of the basic collection types. These wrappers enforce a specific Java element type at runtime by throwing ClassCastException if the wrong element is inserted. For example:

     List list = new ArrayList(  );     list = Collections.checkedList( list, Date.class );     list.add( new Date(  ) );     list.add( "foo" ); // Runtime ClassCastException!

Here, the static Collections.checkedList( ) method has wrapped our collection, list, in a wrapper that implements all of the methods of List, but checks that we are only holding Dates. The second argument to the method is the literal Date.class reference to the Class of Date. This serves to tell the wrapper what type we want to enforce. Corresponding "checked" collection methods exist for all of the basic collection interfaces that we'll see, including the base Collection, List, Set, and Map.

While they do not prevent mistakes from being made at compile time, runtime wrappers help catch errors quickly at runtime (on insertion rather than extraction of the element) and provide an additional level of confidence in the code.

11.4.1.3 Collections and arrays

Converting between collections and arrays is easy. As a special convenience, the elements of a collection can be placed into an array using the following methods:

     public Object[] toArray(  )     public <E> E[] toArray( E[] a )     // nongeneric equivalent     // public Object[] toArray( Object [] a )

The first method returns a plain Object array. With the second form, we can be more specific and get back an array of the correct element type. If we supply an array of sufficient size, it will be filled in with the values. But if the array is too short, a new array of the same type will be created, of the required length, and returned to us. If you don't want to bother creating the array, you can just pass in an empty array of the correct type like this:

     String [] myStrings = myCollection( new String[0] ); // generic

(This trick is a little awkward and it would be nice if Java let us specify the type explicitly, using a Class reference, but for some reason, this isn't the case.) Going the other way, you can convert an array of objects to a List collection with the static asList( ) method of the java.util.Arrays class:

     List list = Arrays.asList( myStrings );

11.4.2. Iteration

An iterator is an object that lets you step through a sequence of values. This kind of operation comes up so often that it is given a standard interface: java.util.Iterator. The Iterator interface has only two primary methods:

public Object next( ): This method returns the next element of the associated collection.
public boolean hasNext( ): This method returns TRue if you have not yet stepped through all the Collection's elements. In other words, it returns true if you can call next( ) to get the next element.

The following example shows how you could use an Iterator to print out every element of a collection:

     public void printElements(Collection c, PrintStream out) {         Iterator iterator = c.iterator(  );         while ( iterator.hasNext(  ) )             out.println( iterator.next(  ) );     }

In addition to the traversal methods, Iterator provides the ability to remove an element from a collection:

public void remove( ): This method removes the last object returned from next( ) from the associated Collection.

Not all iterators implement remove( ). It doesn't make sense to be able to remove an element from a read-only collection, for example. If element removal is not allowed, an UnsupportedOperationException is thrown from this method. If you call remove( ) before first calling next( ), or if you call remove( ) twice in a row, you'll get an IllegalStateException.

11.4.2.1 For-loop over collections

Java 5.0 made iteration over collections even easier because, as we described in Chapter 4, the for-loop has been enhanced to operate over all types of Collection objects. For example, we can now step over all of the elements of a typed collection of Date objects like so:

     Collection<Date> col = ...     for( Date date : collection )        System.out.println( date );

The enhanced for-loop applies only to Collection type collections, not Maps. Maps are another type of beast that really contain two distinct sets of objects (keys and values), so it's not obvious what your intentions would be in such a loop.

11.4.2.2 java.util.Enumeration

Prior to the introduction of the Collections API there was another iterator interface: java.util.Enumeration. It used the slightly more verbose names: nextElement( ) and hasMoreElements( ) but accomplished the same thing. Many older classes provide Enumerations where they would now use Iterator. If you aren't worried about performance, you can just convert your Enumeration into a List with a static convenience method of the java.util.Collections class:

     List list = Collections.list( enumeration );

11.4.3. Collection Types

The Collection interface has three child interfaces. Set represents a collection in which duplicate elements are not allowed. List is a collection whose elements have a specific order. The Queue interface, added in Java 5.0, is a buffer for objects with a notion of a "head" element that's next in line for processing.

11.4.3.1 Set

Set has no methods besides the ones it inherits from Collection. It simply enforces the no-duplicates rule. If you try to add an element that already exists in a Set, the add( ) method simply returns false. SortedSet adds only a few methods to Set. As you call add( ) and remove( ), the set maintains its order. You can retrieve subsets (which are also sorted) using the subSet( ), headSet( ), and tailSet( ) methods. The first( ), last( ), and comparator( ) methods provide access to the first element, the last element, and the object used to compare elements (more on this later).

11.4.3.2 List

The next child interface of Collection is List. The List is an ordered collection, similar to an array but with methods for manipulating the position of elements in the list:

public void add( E element ): This method adds the specified element to the end of the list.
public void add( int index, E element ): This method inserts the given object at the supplied position in the list. If the position is less than zero or greater than the list length, an IndexOutOfBoundsException will be thrown. The element that was previously at the supplied position, and all elements after it, are moved up one index position.
public void remove( int index ): This method removes the element at the specified position. All subsequent elements move down one index position.
public E get( int index ): This method returns the element at the given position.
public Object set( int index, E element ): This method changes the element at the given position to the specified object. There must already be an object at the index or an IndexOutOfBoundsException is thrown.

The type E in these methods refers to the parameterized element type of the List class. Collection, Set, and List are all interface types. We'll look at concrete implementations of these shortly.

11.4.3.3 Queue

A Queue is a collection that acts like a buffer for elements. The queue maintains an ordering of items placed into it and has the notion of a "head" item for removal that can be queried or accepted:

public boolean offer( E element ): This method attempts to place the element into the queue, returning true if successful. Different Queue types may have different limits or restrictions on element types (including capacity). This method differs from the Collection method add( ) in that it returns a Boolean value instead of throwing an exception to indicate that the element cannot be accepted.
public E poll( ): This method removes the element at the head of the queue and returns it. If the queue is empty, null is returned.
public E peek( ): This method returns the head element without removing it from the queue. If the queue is empty, null is returned.

11.4.3.4 BlockingQueue

BlockingQueue is part of the java.util.concurrent package. It extends the Queue interface for queues that may have a fixed capacity and allows the user to block, waiting for space to insert an item or for an available item to retrieve. It adds timed wait versions of offer( ) and poll( ) and additional, blocking take( ) and put( ) methods:

public boolean offer( E element, long time, TimeUnit units ): This method attempts to place the element into the queue, just like the method of the base Queue interface, but blocks for up to the specified period of time as it waits for space to become available.
public E poll( long time, timeUnit unit ): This method attempts to remove the element at the head of the queue, just like the method of the base Queue interface, but blocks for up to the specified period of time as it waits for an element to become available.
public E take( ): This method retrieves the element at the head of the queue, blocking if necessary until one becomes available.
public void put( E element ): This method adds an element to the queue, blocking if necessary until space becomes available.
public boolean add( E element ): This method attempts to add an element to the queue immediately. If successful, it returns true. If no space is available, it throws an IllegalStateException. This method is useful for cases where you are not expecting the queue to ever reject an item.

11.4.4. The Map Interface

The Collections Framework also includes the java.util.Map, which is a collection of key/value pairs. Other names for map are "dictionary" or "associative array." Maps store and retrieve elements with key values; they are very useful for things like caches or minimalist databases. When you store a value in a map, you associate a key object with a value. When you need to look up the value, the map retrieves it using the key.

With generics, a Map type is parameterized with two types: one for the keys and one for the values. The following snippet uses a HashMap, which is an efficient type of map implementation that we'll discuss later:

     Map<String, Date> dateMap = new HashMap<String, Date>(  );     dateMap.put( "today", new Date(  ) );     Date today = dateMap.get( "today" );           // nongeneric example     Map dateMap = new HashMap(  );     dateMap.put( "today", new Date(  ) );     Date today = (Date)dateMap.get( "today" );

Nongeneric maps simply map Object types to Object types and require the appropriate cast to retrieve values, as shown here.

The basic operations on Map are straightforward. In the following methods, the type K refers to the key parameter type and the type V refers to the value parameter type:

public V put( K key, V value): This method adds the specified key/value pair to the map. If the map already contains a value for the specified key, the old value is replaced and returned as the result.
public V get( K key ): This method retrieves the value corresponding to key from the map.
public V remove( K key ): This method removes the value corresponding to key from the map. The value removed is returned.
public int size( ): Use this method to find the number of key/value pairs in this map.

You can retrieve all the keys or values in the map:

public Set keySet( ): This method returns a Set that contains all the keys in this map.
public Collection values( ): Use this method to retrieve all the values in this map. The returned Collection can contain duplicate elements.

Map has one child interface, SortedMap. A SortedMap maintains its key/value pairs sorted in a particular order according to the key values. It provides the subMap( ), headMap( ), and tailMap( ) methods for retrieving sorted map subsets. Like SortedSet, it also provides a comparator( ) method that returns an object that determines how the map keys are sorted. We'll talk more about that later.

Finally, we should make it clear that although related, Map is not a type of Collection (Map does not extend the Collection interface). You might wonder why. All of the methods of the Collection interface would appear to make sense for Map, except for iterator( ). A Map, again, has two sets of objects: keys and values, and separate iterators for each. This is why a Map is not a Collection.

One more note about maps: some map implementations (including Java's standard HashMap) allow null to be used as a key or value, but others may not.

11.4.4.1 ConcurrentMap

The ConcurrentMap interface is part of the java.util.concurrent package. It extends the base Map interface and adds atomic put, remove, and replace functionality that is useful for concurrent programming:

public V putIfAbsent( K key, V value ): This method associates the value with the key only if the key was not already in use. If the key exists, no action is taken. If the key does not exist, it is created. The resulting value (existing or new) is returned.
public boolean remove( Object key, Object value ): This method removes the mapping (key and value) only if the current value associated with the key equals the supplied value. It returns true if the value was removed, false if not.
public boolean replace( K key, V existingValue, V newValue): This method replaces the value associated with the key only if the existing value equals the existingValue argument. It returns true if the value was replaced.
public boolean replace( K key, V value ): This method replaces the value associated with the key only if a mapping already exists for the key (i.e., it already has some value).

11.4.5. Collection Implementations

Up until this point, we've talked only about interfaces. But you can't instantiate interfaces; you need concrete implementations. Fortunately, the Collections Framework includes useful implementations of all of the collections interfaces. In some cases, there are several alternatives to choose from. To understand the tradeoffs between these implementations, it helps to have a basic understanding of the most common data structures used in all programming: arrays, linked lists, trees, and hash maps. Many books have been written about these data structures and we will not drag you into the mind-numbing details here. We'll hit the highlights briefly as a prelude to our discussion of the Java implementations. We should stress before we go on that the differences in the implementations of Java collections are only really significant when working with very large numbers of elements or with extreme time sensitivity. For the most part, they all behave well enough to be interchangeable.

11.4.5.1 Arrays

It should be fairly obvious that plain old Java arrays, shown in Figure 11-2, would be good at holding an ordered collections of elements. As we mentioned earlier however, one limitation of arrays is that they cannot grow. This means that to support a true Collection, arrays must be copied into larger arrays as capacity demands increase. Another problem with using arrays for lists is that inserting an element into the middle of an array or taking one out generally also involves copying large parts of the array to and fro, which is an expensive operation.

Figure 11-2. Array structure

Because of this, arrays are described as consuming constant time for retrieval, but linear time for insertion into or deletion from the body of the array. The term constant time here means that, in general, the time to retrieve an element stays roughly constant even as you add more elements to the array. Linear time means that the time to insert or delete an element takes longer and longer as the array adds elements; the time expense grows linearly with the number of elements. There are worse things too: exponential time, as you might imagine, means that an algorithm is useless for very large numbers of elements. All of the Java collection implementations work in linear time or better.

Arrays are useful when you are mostly reading or exclusively appending to the end of the collection.

11.4.5.2 Linked lists

A linked list, shown in Figure 11-3, holds its elements in a chain of nodes, each referencing the node before and after it (if any). In this way, the linked list forms an ordered collection that looks something like an array. Unlike the magic of an array, however, to retrieve an element from a linked list, you must traverse the list from either the head or tail to reach the correct spot. As you might have guessed, this is a linear-time operation that gets more expensive as the number of elements grows. The flip-side is that once you're at the spot, inserting or deleting an element is a piece of cake: simply change the references and you're done. This means that insertions and deletions, at least near the head and tail of a linked list, are said to be in constant time.

Figure 11-3. Linked list structure

Linked lists are useful when you are doing a lot of insertions or deletions on a collection.

11.4.5.3 Trees

A tree is like a linked list in that it holds its elements in nodes that point to their neighbors. However, a tree, as its name suggests, does not have a linear structure, but instead arranges its elements in a cascade of branches like a family tree. The power of the tree structure is in sorting and searching elements that have a specified order. A binary search tree, as shown in Figure 11-4, arranges its elements such that the children divide up a range of values. One child holds values greater than the node and one child holds values lower. By applying this knowledge recursively on a properly "balanced" tree, we can rapidly find any value. The effort to search the tree is described as log(n) time, which means that it grows only with the logarithm of the number of elements, which is much better than the linear time it would take to check all of the elements by brute force.

Figure 11-4. Tree structure

Trees are useful for maintaining or searching large collections of sorted elements.

11.4.5.4 Hash maps

Hash maps are strange and magical beasts. A hash map (or hash table as it is also called) uses a mathematical hash algorithm applied to its key value to distribute its element values into a series of "buckets." The algorithm relies on the hash algorithm to distribute the elements as uniformly (randomly) as possible. To retrieve an element by its key then simply involves searching the correct bucket. Because the hash calculation is fast and can have a large number of buckets, there are few elements to search and retrieval is very fast. As we described in Chapter 7, all Java Objects have a hash value as determined by the hashCode( ) method. We'll say a bit more about hash codes and key values for maps in a later section.

Hash map performance is governed by many factors, including the sophistication of the hash algorithm implemented by its elements (see Figure 11-5). In general, a good implementation can operate in linear time for putting and retrieving elements. Hash maps are fast at mapping unsorted collections.

Figure 11-5. Hash map structure

11.4.5.5 Java Collections implementations

Table 11-7 lists the implementations of the Java Collections Framework by interface type.

Table 11-7. Collections Framework implementation classes
Interface	Implementation
`Set`	`HashSet` `LinkedHashSet` `CopyOnWriteArraySet`
`SortedSet`	`treeSet`
`List`	`ArrayList` `LinkedList` `Vector`^a `CopyOnWriteArrayList`
`Map`	`HashMap` `LinkedHashMap` `IdentityHashMap` `Hashtable`^a
`ConcurrentMap`	`ConcurrentHashMap`
`SortedMap`	`TReeMap`
`Queue`	`LinkedList` `PriorityQueue` `DelayQueue` `SynchronousQueue` `ConcurrentLinkedQueue`
`BlockingQueue`	`ArrayBlockingQueue` `LinkedBlockingQueue` `PriorityBlockingQueue`
`Vector` and `Hashtable` are legacy classes and should generally be avoided in favor of `ArrayList` and `HashMap`, respectively.

ArrayList and LinkedList provide the array and linked list implementations of the List interface described earlier. ArrayList is satisfactory for most purposes but use LinkedList when you plan to do a lot of insertions or deletions at various points in the list.

HashSet and HashMap provide a good hash map implementation of the Set and Map interfaces. The LinkedHashSet and LinkedHashMap implementations combine the hash algorithm with a linked list that maintains insertion order of the elements. (Note that these linked collections are ordered, but not sorted collections.)

TReeSet and treeMap maintain sorted collections using a tree data structure. In the case of treeMap, it is the key values that are sorted. The sorting is accomplished by a comparator object. We'll discuss sorting later in this chapter.

Queue is implemented both by LinkedList (which implements both List and Queue) and PriorityQueue. A PriorityQueue's prioritization comes from a sorting order determined by a comparator supplied with its constructor. Elements that sort "least" or "lowest" have the highest priority. The various implementations of BlockingQueue mirror these for concurrency-aware queues.

Finally, IdentityHashMap is an alternate type of HashMap that uses object identity instead of object equality to determine which keys match which objects. Normally any two objects that test equal with equals( ) operate as the same key in a Map. With IdentityHashMap, only the original object instance retrieves the element. We'll talk about hash codes and keys more in the next section.

We should also mention three specialized collections that we'll talk about later: EnumSet and EnumMap are specifically designed to work with Java enumerations. WeakHashMap uses weak references to cooperate with Java garbage collection.

11.4.6. Hash Codes and Key Values

The names Hashtable and HashMap refer to the object hash codes these map collections use to make their associations. Specifically, an element in a Hashtable or HashMap is not associated with a key strictly by the key object's identity but rather by the key's contents. This allows keys that are equivalent to access the same object. By "equivalent," we mean those objects that compare true with equals( ). If you store an object in a Hashtable using one object as a key, you can use any other object that equals( ) tells you is equivalent to retrieve the stored object.

It's easy to see why equivalence is important if you remember our discussion of strings. You may create two String objects that have the same text in them but that come from different sources in Java. In this case, the == operator tells you that the objects are different, but the equals( ) method of the String class tells you that they are equivalent. Because they are equivalent, if we store an object in a HashMap using one of the String objects as a key, we can retrieve it using the other.

The hash code of an object makes this association based on content. As we mentioned in Chapter 7, the hash code is like a fingerprint of the object's data content. HashMap uses it to store the objects so that they can be retrieved efficiently. The hash code is nothing more than a number (an integer) that is a function of the data. The number is always the same for identical data, but the hashing function is intentionally designed to generate as random a number as possible for different combinations of data. In other words, a very small change in the data should produce a big difference in the number. It should be unlikely that two similar datasets would produce the same hash code.

As we described earlier, internally, HashMap really just keeps a number of lists of objects, but it puts objects into the lists based on their hash code. When it wants to find the object again, it can look at the hash code and know immediately how to get to the appropriate list. The HashMap still might end up with a number of objects to examine, but the list should be short. For each object in the short list it finds, it does the following comparison to see if the key matches:

     if ((keyHashcode == storedKeyHashcode) && key.equals(storedKey))         return object;

There is no prescribed way to generate hash codes. The only requirement is that they be somewhat randomly distributed and reproducible (based on the data). This means that two objects that are not the same could end up with the same hash code by accident. This is unlikely (there are 2³² possible integer values); moreover, it shouldn't cause a problem because the HashMap ultimately checks the actual keys, as well as the hash codes, to see if they are equal. Therefore, even if two objects have the same hash code, they can still coexist in the HashMap as long as they don't test equal to one another as well.

Hash codes are computed by an object's hashCode( ) method, which is inherited from the Object class if it isn't overridden. The default hashCode( ) method simply assigns each object instance a unique number to be used as a hash code. If a class does not override this method, each instance of the class will have a unique hash code. This goes along well with the default implementation of equals( ) in Object, which only compares objects for identity using ==.

You must override equals( ) in any classes for which equivalence of different objects is meaningful. Likewise, if you want equivalent objects to serve as equivalent keys, you need to override the hashCode( ) method as well to return identical hash code values. To do this, you need to create some suitably complex and arbitrary function of the contents of your object. The only criterion for the function is that it should be almost certain to return different values for objects with different data, but the same value for objects with identical data.

11.4.7. Synchronized and Unsynchronized Collections

The java.util.Collections class contains important static methods that operate on Sets and Maps. Since all the static methods in Collections operate on interfaces, they work regardless of the actual implementation classes you're using. The first methods we'll look at involve creating synchronized versions of our collections.

Most of the default collection implementations are not synchronized; that is, they are not safe for concurrent access by multiple threads. The reason for this is performance. In many applications there is no need for synchronization, so the Collections API does not provide it by default. Instead, you can create a synchronized version of any collection using the following methods of the Collections class:

     public static Collection synchronizedCollection(Collection c)     public static Set synchronizedSet(Set s)     public static List synchronizedList(List list)     public static Map synchronizedMap(Map m)     public static SortedSet synchronizedSortedSet(SortedSet s)     public static SortedMap synchronizedSortedMap(SortedMap m)

These methods return synchronized, threadsafe versions of the supplied collection, by wrapping them. For example, the following shows how to create a threadsafe List:

     List list = new ArrayList(  );     List syncList = Collections.synchronizedList(list);

Multiple threads can call methods on this list safely, blocking as necessary to wait for the other threads to complete.

In contrast to the norm, the older Hashtable and Vector collections are synchronized by default (and, therefore, may be a bit slower when that's not needed). The "copy on write" collection implementations that we'll talk about later also do not require synchronization for their special applications. Finally, the ConcurrentHashMap and ConcurrentLinkedQueue implementations that we'll cover later are threadsafe and designed specifically to support a high degree of concurrent access without incurring a significant penalty for their internal synchronization.

11.4.7.1 Synchronizing iterators

It's important to note that although synchronized collections are threadsafe, the Iterators returned from them are not. If you obtain an Iterator from a collection, you should do your own synchronization to ensure that the collection does not change as you're iterating through its elements. You can do this by synchronizing on the collection itself with the synchronized keyword:

     synchronized(syncList) {         Iterator iterator = syncList.iterator(  );         // do stuff with the iterator here     }

If you do not synchronize on the collection while iterating and the collection changes, Java attempts to throw a ConcurrentModificationException. However, this is not guaranteed.

11.4.7.2 ConcurrentHashMap and ConcurrentLinkedQueue

Java 5.0 introduced the java.util.concurrent.ConcurrentHashMap class. This is part of the new thread utilities package and provides a Map that performs much better under multithreaded access. A ConcurrentHashMap is safe for access from multiple threads, but it does not necessarily block threads during operations. Instead, some degree of overlapping operations, such as concurrent reads, are permitted safely. The ConcurrentHashMap can even allow a limited number of concurrent writes to happen while reads are being performed. These operations and iterators over the map do not throw a ConcurrentModificationException, but no guarantees are made as to exactly when one thread will see another thread's work. All views of the map are based upon the most recently committed writes.

Similarly, the ConcurrentLinkedQueue implementation provides the same sort of benefits for a linked queue, allowing some degree of overlapping writes and reads by concurrent users.

11.4.8. Read-Only and Read-Mostly Collections

You can use the Collections class to create read-only versions of any collection:

     public static Collection unmodifiableCollection(Collection c)     public static Set unmodifiableSet(Set s)     public static List unmodifiableList(List list)     public static Map unmodifiableMap(Map m)     public static SortedSet unmodifiableSortedSet(SortedSet s)     public static SortedMap unmodifiableSortedMap(SortedMap m)

Making unmodifiable versions of collections is a useful way to ensure that a collection handed off to another part of your code is not modified intentionally or inadvertently. Attempting to modify a read-only collection results in an UnsupportedOperationException.

11.4.8.1 Copy on write ("read-mostly") collections

The java.util.concurrent package contains the CopyOnWriteArrayList and CopyOnWriteArraySet List and Set implementations. These classes are threadsafe and do not require explicit synchronization, but are heavily optimized for read operations. Any write operation causes the entire data structure to be copied internally in a blocking operation. The advantage is that if you are almost always reading, these implementations are extremely fast and no synchronization is required.

11.4.9. WeakHashMap

In Chapter 5, we introduced the idea of weak referencesobject references that don't prevent their objects from being removed by the garbage collector. WeakHashMap is an implementation of Map that makes use of weak references in its keys and values. This means that you don't have to remove key/value pairs from a Map when you're finished with them. Normally, if you removed all references to a key object from the rest of your application, the Map would still contain a reference and keep the object "alive," preventing garbage collection. WeakHashMap changes this; once you remove references to a key object from the rest of the application, the WeakHashMap lets go of it, too.

11.4.10. EnumSet and EnumMap

EnumSet and EnumMap are collections designed to work specifically with the limited domain of objects defined by a Java enumerated type. (Enums are a new feature in Java 5.0. See Chapter 5.) Java enums are real Java objects and there is no reason you can't use them as keys or values in general collections. However, the EnumSet and EnumMap classes are highly optimized, taking advantage of the knowledge that the set, or keys in the map, respectively, may be one of only a few individual objects. With this knowledge, storage can be compact and fast using bit fields internally. The idea is to make using collection operations on enumerations efficient enough to replace the general usage pattern of bit-flags and make binary logic operations unnecessary. Instead of using:

     int flags = getFlags(  );     if ( flags & ( Constants.ERROR | Constants.WARNING ) != 0 )

we could use set operations:

     EnumSet flags = getFlags(  );     if ( flags.contains( Constants.Error) || flags.contains( Constants.Warning ) )

This code may not be as terse, but it is easier to understand and should be just as fast.

11.4.11. Sorting Collections

The Collections utilities include methods for performing common operations like sorting. Sorting comes in two varieties:

public static void sort(List list): This method sorts the given list. You can use this method only on lists whose elements implement the java.lang.Comparable interface. Luckily, many classes already implement this interface, including String, Date, BigInteger, and the wrapper classes for the primitive types (Integer, Double, etc.).
public static void sort(List list, Comparator c): Use this method to sort a list whose elements don't implement the Comparable interface. The supplied java.util.Comparator does the work of comparing elements. You might, for example, write an ImaginaryNumber class and want to sort a list of them. You would then create a Comparator implementation that knew how to compare two imaginary numbers.

The sorted collections we discussed earlier, SortedSet and SortedMap, maintain their collections in a specified order using the Comparable interface of their elements. If the elements do not implement Comparable, you must supply a Comparator object yourself in the constructor of the implementation. For example:

     Comparator myComparator = ...     SortedSet mySet = new TreeSet( myComparator );

Collections give you some other interesting capabilities, too. If you're interested in learning more, check out the min( ), max( ), binarySearch( ), and reverse( ) methods.

11.4.12. A Thrilling Example

Collections is a bread-and-butter topic, which means it's hard to create exciting examples. The example in this section reads a text file, parses all its words, counts the number of occurrences, sorts them, and writes the results to another file. It gives you a good feel for how to use collections in your own programs. This example also shows new Java 5.0 features, including generics, autoboxing, and the Scanner API. The previous version of this example was about 30 percent longer.

     Import java.io.*;     import java.util.*;           public class WordSort     {       public static void main(String[] args) throws IOException       {         if ( args.length < 2 ) {           System.out.println("Usage: WordSort inputfile outputfile");           return;         }         String inputfile = args[0];         String outputfile = args[1];               /*  Create the word map. Each key is a word and each value is an             Integer that represents the number of times the word occurs             in the input file.         */         Map<String,Integer> map = new TreeMap<String,Integer>(  );               Scanner scanner = new Scanner( new File(inputfile) );         while ( scanner.hasNext(  ) ) {             String word = scanner.next(  );             Integer count = map.get( word );             count = ( count == null ? 1 : count +1 );             map.put( word, count );         }         scanner.close(  );               // get the map's keys         List<String> keys = new ArrayList<String>( map.keySet(  ) );               // write the results to the output file         PrintWriter out = new PrintWriter( new FileWriter(outputfile) );         for ( String key : keys )           out.println( key + " : " + map.get(key) );         out.close(  );       }     }

Suppose, for example, that you have an input file named Ian Moore.txt:

     Well it was my love that kept you going     Kept you strong enough to fall     And it was my heart you were breaking     When he hurt your pride     So how does it feel     How does it feel     How does it feel     How does it feel

You could run the example on this file using the following command:

     % java WordSort "Ian Moore.txt" count.txt

The output file, count.txt, looks like this:

     And : 1     How : 3     Kept : 1     So : 1     Well : 1     When : 1     breaking : 1     does : 4     enough : 1     ...

The results are case-sensitive: "How" and "how" are recorded separately. You could modify this behavior by converting words to all lowercase after retrieving them from the Scanner.