Implementations | Appendix C. Collections

Implementations are the data objects used to store collections, which implement the core collection interfaces described in the section Interfaces (page 470). This section describes three kinds of implementations: general-purpose implementations, the public classes that provide the primary implementations of the core collection interfaces; wrapper implementations, used in combination with other types of implementations (often the general-purpose implementations) to provide added functionality; and convenience implementations, mini-implementations typically made available via static factory methods, which provide convenient, efficient alternatives to the general-purpose implementations for special collections (such as singleton sets).

Additionally, you can build your own implementations, based on the Java 2 SDK's abstract implementations. This topic is described in the section Custom Implementations (page 520). An advanced topic, it's not particularly difficult, but relatively few people will need to do it.

General-Purpose Implementations

The general-purpose implementations are summarized in the following table. The table highlights their regular naming pattern: Names are all of the form Implementation Inter face , where Interface is the core collection interface implemented by the class, and Implementation signifies the data structure underlying the implementation.

Table 63. Naming Patterns of the General-Purpose Implementations
Implementations
Interfaces		Hash Table	Resizable Array	Balanced Tree	Linked List
	`Set`	`HashSet`		`TreeSet`
	`List`		`ArrayList`		`LinkedList`
	`Map`	`HashMap`		`TreeMap`

Java 2 Platform v 1.2 and v 1.3 provide two implementations of each interface, with the exception of Collection, which has no direct implementations but serves as a least-common denominator for the other collection interfaces. In each case, one implementation is clearly the primary implementation: the one to use, all other things being equal. The primary implementations are HashSet, ArrayList, and HashMap. Note that the SortedSet and the SortedMap interfaces do not have rows in the table. Each of those interfaces has one implementation (TreeSet and TreeMap) and is listed in the Set and the Map rows.

The implementations have not only consistent names but also consistent behavior, implementing all the optional operations contained in their interfaces. All permit null elements, keys, and values. Each one is not synchronized. All have fail-fast iterators, which detect illegal concurrent modification during iteration and fail quickly and cleanly rather than risking arbitrary, nondeterministic behavior at an undetermined time in the future. All are Serializable, and all support a public clone method.

The fact that the new implementations are unsynchronized represents a break with the past: Vector and Hashtable were synchronized in the Java 2 SDK prior to version 1.2. The new approach was taken because collections are frequently used when the synchronization is of no benefit. Such uses include single-threaded use, read-only use, and use as part of a larger data object that does its own synchronization. In general, it is good API design practice not to make users pay for a feature they don't use. Further, unnecessary synchronization can result in deadlock under certain circumstances.

If you need a synchronized collection, the synchronization wrappers, described in the section Wrapper Implementations (page 511), allow any collection to be transformed into a synchronized collection. Thus, synchronization is optional for the new collection implementations, whereas it was mandatory for the old.

As a rule of thumb, you should be thinking about the interfaces, not the implementations. That is why there are no programming examples in this section. For the most part, the choice of implementation affects only performance. The preferred style, as mentioned in the section Interfaces (page 470), is to choose an implementation when a collection is created and to immediately assign the new collection to a variable of the corresponding interface type or to pass the collection to a method expecting an argument of the interface type. In this way, the program does not become dependent on any added methods in a given implementation, leaving the programmer free to change implementations at the drop of a hat, if performance concerns so warrant.

The general-purpose implementations are briefly discussed here. The performance of the implementations is described with such words as constant, log, linear, n log(n), and quadratic. These words refer to the asymptotic upper bound on the time complexity of performing the operation. All this is quite a mouthful, and it doesn't matter much if you don't know what it means. If you're interested, refer to any good algorithms textbook. One thing to keep in mind is that this sort of performance metric has its limitations. Sometimes, the nominally slower implementation may be faster for the collection size that you're using. When in doubt, measure the performance.

Set

The two general-purpose Set implementations are HashSet and TreeSet. It's very straightforward to decide which of these two to use. HashSet is much faster (constant time versus log time for most operations) but offers no ordering guarantees. If you need to use the operations in the SortedSet or if in-order iteration is important to you, use TreeSet. Otherwise, use HashSet. It's a fair bet that you'll end up using HashSet most of the time.

One thing worth keeping in mind about HashSet is that iteration is linear in the sum of the number of entries and the number of buckets (the capacity). Thus, it's important to choose an appropriate initial capacity if iteration performance is important. Choosing a capacity that's too high can waste both space and time. The default initial capacity is 101, and that's often more than you need. The initial capacity may be specified by using the int constructor. The following line of code allocates a HashSet whose initial capacity is 17:

Set s= new HashSet(17);

HashSets have one other tuning parameter, called the load factor. If you care deeply about the space consumption of your HashSet, read the HashSet documentation [1] for more information. Otherwise, just live with the default. If you accept the default load factor but want to specify an initial capacity, pick a number that's about twice the size that you expect the Set to grow to. If your guess is way off, it may have to grow, or you may waste a bit of space, but either way, it's no big problem. If you know a prime number of about the right size, use it. If not, use an odd number. Or use an even number. It doesn't really matter much; these things might make the HashSet perform a wee bit better, but nothing to write home about.

[1] The API documentation for HashSet is available on the CD that accompanies this book and online here: http://java.sun.com/j2se/1.3/docs/api/java/util/HashSet.html.

TreeSet has no tuning parameters. With the exception of clone, neither HashSet nor TreeSet has any operations other than those required by their respective interfaces (Set and TreeSet).

List

The two general-purpose List implementations are ArrayList and LinkedList. Most of the time, you'll probably use ArrayList. It offers constant time positional access and is just plain fast, because it does not have to allocate a node object for each element in the List, and it can take advantage of the native method System.arraycopy when it has to move multiple elements at once. Think of ArrayList as Vector without the synchronization overhead.

If you frequently add elements to the beginning of the List or iterate over the List to delete elements from its interior, you should consider using LinkedList. These operations are constant time in a LinkedList but linear time in an ArrayList. But you pay a big price! Posi-tional access is linear time in a LinkedList and constant time in an ArrayList. Furthermore, the constant factor for LinkedList is much worse. If you think that you want to use a LinkedList, measure the performance with both LinkedList and ArrayList. You may be surprised.

ArrayList has one tuning parameter, the initial capacity. It refers to the number of elements the ArrayList can hold before it has to grow. There's not much to say about it. The only ArrayList operations that are not required by List are ensureCapacity and trimToSize, which alter the excess capacity, and clone.

LinkedList has no tuning parameters and seven optional operations, one of which is clone. The other six are addFirst, getFirst, removeFirst, addLast, getLast, and removeLast. They make it a bit more convenient to use a LinkedList as a queue or as a double-ended queue (dequeue), but they also prevent you from easily switching representations when you discover that ArrayList is faster.

If you need synchronization, a Vector will be slightly faster than an ArrayList synchronized with Collections.synchronizedList. But Vector has loads of legacy operations, so be extra careful to always manipulate the Vector with the List interface, or you'll be stuck with it for life.

If your List is fixed in sizethat is, you'll never use remove, add, or any of the bulk operations other than containsAllyou have a third option that's definitely worth considering. See Arrays.asList in the section Convenience Implementations (page 514) for more information.

Map

The two general-purpose Map implementations are HashMap and TreeMap. The situation for Map is exactly analogous to Set. If you need SortedMap operations or in-order Collection view iteration, go for TreeMap; otherwise, go for HashMap. Everything else in the section Set (page 510) also applies to Map.

Completeness requires that we mention Hashtable. As with Vector and ArrayList, if you need synchronization, a Hashtable will be slightly faster than a HashMap synchronized with Collections.synchronizedMap. Again, Hashtable has loads of legacy operations, so be extra careful always to manipulate it with the Map interface, or you'll be stuck with it for life.

Wrapper Implementations

Wrapper implementations delegate all their real work to a specified collection but add extra functionality on top of what this collection offers. For design patterns fans, this is an example of the decorator pattern. Although it may seem a bit exotic, it's really pretty straightforward.

These implementations are anonymous: Rather than providing a public class, the Java 2 SDK provides a static factory method. All these implementations are found in the Collections API, which consists solely of static methods.

Synchronization Wrappers

The synchronization wrappers add automatic synchronization (thread safety) to an arbitrary collection. Each of the six core collection interfaces has one static factory method:

public static Collection synchronizedCollection(Collection c); 
public static Set synchronizedSet(Set s); 
public static List synchronizedList(List list); 
public static Map synchronizedMap(Map m); 
public static SortedSet synchronizedSortedSet(SortedSet s); 
public static SortedMap synchronizedSortedMap(SortedMap m);

Each of these methods returns a synchronized (thread-safe) Collection backed by the specified collection. In order to guarantee serial access, all access to the backing collection must be accomplished through the returned collection. The easy way to guarantee this is not to keep a reference to the backing collection. Creating the synchronized collection like this does the trick:

List list = Collections.synchronizedList(new ArrayList());

A collection created in this fashion is every bit as thread-safe as a normally synchronized collection, such as a Vector.

In the face of concurrent access, it is imperative that the user manually synchronize on the returned collection when iterating over it. The reason is that iteration is accomplished via multiple calls into the collection, which must be composed into a single atomic operation. The idiom to iterate over a wrapper-synchronized collection is:

Collection c = 
 Collections.synchronizedCollection(myCollection); 
synchronized(c) { 
 Iterator i = c.iterator(); // Must be in synchronized 
 // block! 
 while (i.hasNext()) { 
 foo(i.next()); 
 } 
}

Note that failure to follow this advice may result in nondeterministic behavior.

The idiom for iterating over a Collection view of a synchronized Map is similar; however, it is imperative that the user manually synchronize on the synchronized Map when iterating over any of its Collection views rather than synchronizing on the Collection view itself.

Map m = Collections.synchronizedMap(new HashMap()); 
 ... 
Set s = m.keySet(); // Needn't be in synchronized block 
 ... 
synchronized(m) { // Synchronizing on m, not s! 
 Iterator i = s.iterator(); // Must be in synchronized block 
 while (i.hasNext()) { 
 foo(i.next(); 
 } 
}

One minor downside of using wrapper implementations is that you do not have the ability to execute any noninterface operations of a wrapped implementation. So, for instance, in the preceding List example, one cannot call ArrayList's ensureCapacity operation on the wrapped ArrayList.

Unmodifiable Wrappers

The unmodifiable wrappers are conceptually similar to the synchronization wrappers but simpler. Rather than adding functionality to the wrapped collection, they take it away. In particular, they take away the ability to modify the collection, by intercepting all the operations that would modify the collection and throwing an UnsupportedOperationException. The unmodifiable wrappers have two main uses:

To make a collection immutable once it has been built. In this case, it's good practice not to maintain a reference to the backing collection. This absolutely guarantees immutability.
To allow "second-class citizens" read-only access to your data structures. You keep a

reference to the backing collection but hand out a reference to the wrapper. In this way, the second-class citizens can look but not touch, while you maintain full access.

Like the synchronization wrappers, each of the six core collection interfaces has one static factory method.

public static Collection unmodifiableCollection(Collection c); 
public static Set unmodifiableSet(Set s); 
public static List unmodifiableList(List list); 
public static Map unmodifiableMap(Map m); 
public static SortedSet unmodifiableSortedSet(SortedSet s); 
public static SortedMap unmodifiableSortedMap(SortedMap m);

Convenience Implementations

This section describes several mini-implementations that can be more convenient and more efficient than the general-purpose implementations when you don't need their full power. All the implementations in this section are made available via static factory methods or exported constants rather than public classes.

List View of an Array

The Arrays.asList method returns a List view of its array argument. Changes to the List write through to the array and vice versa. The size of the collection is that of the array and cannot be changed. If the add or the remove method is called on the List, an UnsupportedOperationException will result.

The normal use of this implementation is as a bridge between array-based and collection-based APIs. It allows you to pass an array to a method expecting a Collection or a List. However, this implementation also has another use. If you need a fixed-size List, it's more efficient than any general-purpose List implementation. Here's the idiom:

List l = Arrays.asList(new Object[size]);

Note that a reference to the backing array is not retained.

Immutable Multiple-Copy List

Occasionally, you'll need an immutable List consisting of multiple copies of the same element. The Collections.nCopies method returns such a List. This implementation has two main uses. The first is to initialize a newly created List. For example, suppose that you want an ArrayList initially consisting of 1,000 null elements. The following incantation does the trick:

List l = new ArrayList(Collections.nCopies(1000, null);

Of course, the initial value of each element needn't be null. The second main use is to grow an existing List. For example, suppose that you want to add 69 copies of the string fruit bat to the end of a List. It's not clear why you'd want to do such a thing, but let's just suppose you did. Here's how you'd do it:

lovablePets.addAll(Collections.nCopies(69, "fruit bat");

By using the form of addAll that takes both an index and a Collection, you can add the new elements to the middle of a List instead of at the end.

Immutable Singleton Set

Sometimes, you'll need an immutable singleton Set, which consists of a single specified element. The Collections.singleton method returns such a Set. One use of this implementation is to remove all occurrences of a specified element from a Collection:

c.removeAll(Collections.singleton(e));

A related idiom removes from a Map all elements that map to a specified value. For example, suppose that you have a Map, profession, that maps people to their line of work. Suppose that you want to eliminate all the lawyers. This one-liner will do the deed:

profession.values().removeAll(Collections.singleton(LAWYER));

One more use of this implementation is to provide a single input value to a method that is written to accept a Collection of values.

Empty Set and Empty List Constants

The Collections class provides two constants, representing the empty Set and the empty List, Collections.EMPTY_SET and Collections.EMPTY_LIST. The main use of these constants is as input to methods that take a Collection of values, when you don't want to provide any values at all.

Getting Started

Object-Oriented Programming Concepts

Language Basics

Object Basics and Simple Data Objects

Classes and Inheritance

Interfaces and Packages

Handling Errors Using Exceptions

Threads: Doing Two or More Tasks at Once

I/O: Reading and Writing

User Interfaces That Swing

Appendix A. Common Problems and Their Solutions

Appendix B. Internet-Ready Applets

Appendix C. Collections

Appendix D. Deprecated Thread Methods

Appendix E. Reference