Section 11.5. Collections and Thread Safety

11.5. Collections and Thread Safety

When a Java program is running, it is executing one or more execution streams, or threads. A thread is like a lightweight process, so a program simultaneously executing several threads can be thought of as a computer running several programs simultaneously, but with one important difference: different threads can simultaneously access the same memory locations and other system resources. On machines with multiple processors, truly concurrent thread execution can be achieved by assigning a processor to each thread. If, however, there are more threads than processorsthe usual casemultithreading is implemented by time slicing, in which a processor executes some instructions from each thread in turn before switching to the next one.

There are two good reasons for using multithread programming. An obvious one, in the case of multicore and multiprocessor machines, is to share the work and get it done quicker. (This reason is becoming ever more compelling as hardware designers turn increasingly to parallelism as the way of improving overall performance.) A second one is that two operations may take varying, perhaps unknown, amounts of time, and you do not want the response to one operation to await the completion of the other. This is particularly true for the a graphical user interface (GUI), where the response to the user clicking a button should be immediate, and should not be delayed if, say, the program happens to be running a compute-intensive part of the application at the time.

Although concurrency may be essential to achieving good performance, it comes at a price. Different threads simultaneously accessing the same memory location can produce unexpected results, unless you take care to constrain their access. Consider Example 11.2, in which the class ArrayStack uses an array and an index to implement the interface Stack, which models a stack of int (despite the similarity of names, this example is different from Example 5.1). For ArrayStack to work correctly, the variable index should always point at the top element of the stack, no matter how many elements are added to or removed from the stack. This is an invariant of the class. Now think about what can happen if two threads simultaneously attempt to push an element on to the stack. As part of the push method, each will execute the lines //1 and //2, which are correct in a single-threaded environment but in a multi-threaded environment may break the invariant. For example, if thread A executes line //1, tHRead B executes line //1 and then line //2, and finally thread A executes line //2, only the value added by thread B will now be on the stack, and it will have overwritten the value added by thread A. The stack pointer, though, will have been incremented by two, so the value in the top position of the stack is whatever happened to be there before. This is called a race condition, and it will leave the program in an inconsistent state, likely to fail because other parts of it will depend on the invariant being true.

A non-thread-safe stack implementation

 interface Stack {   public void push(int elt);   public int pop();   public boolean isEmpty(); } class     ArrayStack implements Stack{   private final int MAX_ELEMENTS = 10;   private int[] stack;   private int index;   public ArrayStack() {     stack = new int[MAX_ELEMENTS];     index = -1;   }   public void push(int elt) {     if (index != stack.length - 1) {       index++;                                        //1       stack[index] = elt;                             //2     } else {       throw new IllegalStateException("stack overflow");     }   }   public int pop() {     if (index != -1) {       return stack[index];       index--;     } else {       throw new IllegalStateException("stack underflow");     }   }   public boolean isEmpty() { return index == -1; } }

The increasing importance of concurrent programming during the lifetime of Java has led to a corresponding emphasis in the collections library on flexible and efficient concurrency policies. As a user of the Java collections, you need a basic understanding of the concurrency policies of the different collections in order to know how to choose between them and how to use them appropriately. In this section, we'll briefly outline the different ways in which the Framework collections handle concurrency, and the implications for the programmer. For a full treatment of the general theory of concurrent programming, see Concurrent Programming in Java by Doug Lea (Addison-Wesley), and for detail about concurrency in Java, and the collections implementations, see Java Concurrency in Practice by Brian Goetz et. al. (Addison-Wesley).

11.5.1. Synchronization and the Legacy Collections

Code like that in ArrayStack is not thread-safeit works when executed by a single thread, but may break in a multi-threaded environment. Since the incorrect behavior we observed involved two threads simultaneously executing the push method, we could change the program to make that impossible. Using synchronized to modify the declaration of the push method will guarantee that once a thread has started to execute it, all other threads are excluded from that method until the execution is done:

 public synchronized void push(int elt) { ... }

This is called synchronizing on a critical section of code, in this case the whole of the push method. Before a thread can execute synchronized code, it has to get the lock on some objectby default, as in this case, the current object. While a lock is held by one thread, another thread that tries to enter any critical section synchronized on that lock will blockthat is, will be suspendeduntil it can get the lock. This synchronized version of push is thread-safe; in a multi-threaded environment, each thread behaves consistently with its behavior in a single-threaded environment. To safeguard the invariant and make ArrayStack as a whole thread-safe, the methods pop and isEmpty must also be synchronized on the same object. The method isEmpty doesn't write to shared data, so synchronizing it isn't required to prevent a race condition, but for a different reason. Each thread may use a separate memory cache, which means that writes by one thread may not be seen by another unless either they both take place within blocks synchronized on the same lock, or unless the variable is marked with the volatile keyword.

Full method synchronization was, in fact, the policy of the collection classes provided in JDK1.0: Vector, Hashtable, and their subclasses; all methods that access their instance data are synchronized. These are now regarded as legacy classes to be avoided because of the high price this policy imposes on all clients of these classes, whether they require thread safety or not. Synchronization can be very expensive: forcing threads to queue up to enter the critical section one at a time slows down the overall execution of the program, and the overhead of administering locks can be very high, especially if they are often contended.

11.5.2. JDK 1.2: Synchronized Collections and Fail-Fast Iterators

The performance cost of internal synchronization in the JDK1.0 collections led the designers to avoid it when the Collections Framework was first introduced in JDK 1.2. Instead, the platform implementations of the interfaces List, Set, and Map widened the programmer's choice of concurrency policies. To provide maximum performance for single-threaded execution, the new collections provided no concurrency control at all. (More recently, the same policy change has been made for the synchronized class StringBuffer, which was complemented in Java 5 by its unsynchronized equivalent, StringBuilder.)

Along with this change came a new concurrency policy for collection iterators. In multi-threaded environments, a thread which has obtained an iterator will usually continue to use it while other threads modify the original collection. So iterator behavior has to be considered as an integral part of a collection®s concurrency policy. The policy of the iterators for the Java 2 collections is to fail fast, as described in Section 11.1: every time they access the backing collection, they check it for structural modification (which, in general, means that elements have been added or removed from the collection). If they detect structural modification, they fail immediately, throwing ConcurrentModificationException rather than continuing to attempt to iterate over the modified collection with unpredictable results. Note that this fail-fast behavior is provided to help find and diagnose bugs; it is not guaranteed as part of the collection contract.

The appearance of Java collections without compulsory synchronization was a welcome development. However, thread-safe collections were still required in many situations, so the Framework provided an option to use the new collections with the old concurrency policy, by means of synchronized wrappers (see Chapter 17). These are created by calling one of the factory methods in the Collections class, supplying an unsynchronized collection which it will encapsulate. For example, to make a synchronized List, you could supply an instance of ArrayList to be wrapped. The wrapper implements the interface by delegating method calls to the collection you supplied, but the calls are synchronized on the wrapper object itself. Example 11.3 shows a synchronized wrapper for the interface Stack of Example 11.2. To get a thread-safe Stack, you would write:

A synchronized wrapper `ArrayStack`

 public class SynchronizedArrayStack implements Stack {   private final Stack stack;   public SynchronizedArrayStack(Stack stack) {     this.stack = stack;   }   public synchronized void push(int elt) { stack.push(elt); }   public synchronized int pop() { return stack.pop(); }   public synchronized boolean isEmpty() { return stack.isEmpty(); } }

 Stack threadSafe = new SynchronizedArrayStack(new ArrayStack());

This is the preferred idiom for using synchronized wrappers; the only reference to the wrapped object is held by the wrapper, so all calls on the wrapped object will be synchronized on the same lockthat belonging to the wrapper object itself. It's important to have the synchronized wrappers available, but you won't use them more than you have to, because they suffer the same performance disadvantages as the legacy collections.

Using Synchronized Collections Safely Even a class like SynchronizedArrayStack, which has fully synchronized methods and is itself thread-safe, must still be used with care in a concurrent environment. For example, this client code is not thread-safe:

 Stack stack = new SynchronizedArrayStack(new ArrayStack()); ... // don't do this in a multi-threaded environment if (!stack.isEmpty()) {   stack.pop();              // can throw IllegalStateException }

The exception would be raised if the last element on the stack were removed by another thread in the time between the evaluation of isEmpty and the execution of pop. This is an example of a common concurrent program bug, sometimes called test-then-act, in which program behavior is guided by information that in some circumstances will be out of date. To avoid it, the test and action must be executed atomically. For synchronized collections (as for the legacy collections), this must be enforced with client-side locking:

 synchronized(stack) {   if (!stack.isEmpty()) {     stack.pop();   } }

For this technique to work reliably, the lock that the client uses to guard the atomic action should be the same one that is used by the methods of the synchronized wrapper. In this example, as in the synchronized collections, the methods of the wrapper are synchronized on the wrapper object itself. (An alternative is to confine references to the collection within a single client, which enforces its own synchronization discipline. But this strategy has limited applicability.)

Client-side locking ensures thread-safety, but at a cost: since other threads cannot use any of the collection's methods while the action is being performed, guarding a long-lasting action (say, iterating over an entire array) will have an impact on throughput. This impact can be very large if the synchronized methods are heavily used; unless your application needs a feature of the synchronized collections, such as exclusive locking, the Java 5 concurrent collections are almost always a better option.

11.5.3. Concurrent Collections: Java 5 and Beyond

Java 5 introduced thread-safe concurrent collections as part of a much larger set of concurrency utilities, including primitivesatomic variables and lockswhich give the Java programmer access to relatively recent hardware innovations for managing concurrent threads, notably compare-and-swap operations, explained below. The concurrent collections remove the necessity for client-side locking as described in the previous sectionin fact, external synchronization is not even possible with these collections, as there is no one object which when locked will block all methods. Where operations need to be atomicfor example, inserting an element into a Map only if it is currently absentthe concurrent collections provide a method specified to perform atomicallyin this case, ConcurrentMap.putIfAbsent.

If you need thread safety, the concurrent collections generally provide much better performance than synchronized collections. This is primarily because their throughput is not reduced by the need to serialize access, as is the case with the synchronized collections. Synchronized collections also suffer the overhead of managing locks, which can be high if there is much contention. These differences can lead to efficiency differences of two orders of magnitude for concurrent access by more than a few threads.

Mechanisms The concurrent collections achieve thread-safety by several different mechanisms. The first of thesethe only one that does not use the new primitivesis copy-on-write. Classes that use copy-on-write store their values in an internal array, which is effectively immutable; any change to the value of the collection results in a new array being created to represent the new values. Synchronization is used by these classes, though only briefly, during the creation of a new array; because read operations do not need to be synchronized, copy-on-write collections perform well in the situations for which they are designedthose in which reads greatly predominate over writes. Copy-on-write is used by the collection classes CopyOnWriteArrayList and CopyOnWriteArraySet.

A second group of thread-safe collections relies on compare-and-swap (CAS), a fundamental improvement on traditional synchronization. To see how it works, consider a computation in which the value of a single variable is used as input to a long-running calculation whose eventual result is used to update the variable. Traditional synchronization makes the whole computation atomic, excluding any other thread from concurrently accessing the variable. This reduces opportunities for parallel execution and hurts throughput. An algorithm based on CAS behaves differently: it makes a local copy of the variable and performs the calculation without getting exclusive access. Only when it is ready to update the variable does it call CAS, which in one atomic operation compares the variable's value with its value at the start and, if they are the same, updates it with the new value. If they are not the same, the variable must have been modified by another thread; in this situation, the CAS thread can try the whole computation again using the new value, or give up, orin some algorithms continue, because the interference will have actually done its work for it! Collections using CAS include ConcurrentLinkedQueue and ConcurrentSkipListMap.

The third group uses implementations of java.util.concurrent.locks.Lock, an interface introduced in Java 5 as a more flexible alternative to classical synchronization. A Lock has the same basic behavior as classical synchronization, but a thread can also acquire it under special conditions: only if the lock is not currently held, or with a timeout, or if the thread is not interrupted. Unlike synchronized code, in which an object lock is held while a code block or a method is executed, a Lock is held until its unlock method is called. Some of the collection classes in this group make use of these facilities to divide the collection into parts that can be separately locked, giving improved concurrency. For example, LinkedBlockingQueue has separate locks for the head and tail ends of the queue, so that elements can be added and removed in parallel. Other collections using these locks include ConcurrentHashMap and most of the implementations of BlockingQueue.

Iterators The mechanisms described above lead to iterator policies more suitable for concurrent use than fail-fast, which implicitly regards concurrent modification as a problem to be eliminated. Copy-on-write collections have snapshot iterators. These collections are backed by arrays which, once created, are never changed; if a value in the collection needs to be changed, a new array is created. So an iterator can read the values in one of these arrays (but never modify them) without danger of them being changed by another thread. Snapshot iterators do not throw ConcurrentModificationException.

Collections which rely on CAS have weakly consistent iterators, which reflect some but not necessarily all of the changes that have been made to their backing collection since they were created. For example, if elements in the collection have been modified or removed before the iterator reaches them, it definitely will reflect these changes, but no such guarantee is made for insertions. Weakly consistent iterators also do not throw ConcurrentModificationException.

The third group described above also mostly have weakly consistent iterators. Two concurrent queues, DelayQueue and PriorityLockingQueue, have fail-fast iterators, because the priority heaps on which they are based can reorder many elements during insertion. In effect, this means that you cannot iterate over one of these queues unless it is quiescent, at a time when no elements are being added or inserted; at other times you should copy its elements into an array using toArray and iterate over that instead.