What is a Memory Model, and Why would I Want One?

Suppose one thread assigns a value to aVariable:

aVariable = 3;

A memory model addresses the question "Under what conditions does a thread that reads aVariable see the value 3?" This may sound like a dumb question, but in the absence of synchronization, there are a number of reasons a thread might not immediately, or ever, see the results of an operation in another thread. Compilers may generate instructions in a different order than the "obvious" one suggested by the source code, or store variables in registers instead of in memory; processors may execute instructions in parallel or out of order; caches may vary the order in which writes to variables are committed to main memory; and values stored in processor-local caches may not be visible to other processors. These factors can prevent a thread from seeing the most up-to-date value for a variable and can cause memory actions in other threads to appear to happen out of order if you don't use adequate synchronization.
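As a concrete sketch of how this can go wrong (illustrative only; the class and field names here are not from the text), consider a reader thread that spins waiting for a flag set by the main thread. Without synchronization, the reader may loop forever because the write to ready never becomes visible to it, or it may see ready become true and still print zero, because the two writes in main appear reordered:

public class StaleValue {
    static boolean ready;
    static int number;

    public static void main(String[] args) {
        new Thread(new Runnable() {
            public void run() {
                while (!ready)       // may spin forever: the write to ready
                    Thread.yield();  // in main may never become visible
                // May print 0: the writes in main may appear reordered
                System.out.println(number);
            }
        }).start();
        number = 42;
        ready = true;
    }
}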

In a single-threaded environment, all these tricks played on our program by the environment are hidden from us and have no effect other than to speed up execution. The Java Language Specification requires the JVM to maintain within-thread as-if-serial semantics: as long as the program has the same result as if it were executed in program order in a strictly sequential environment, all these games are permissible. And that's a good thing, too, because these rearrangements are responsible for much of the improvement in computing performance in recent years. Certainly higher clock rates have contributed to improved performance, but so has increased parallelism: pipelined superscalar execution units, dynamic instruction scheduling, speculative execution, and sophisticated multilevel memory caches. As processors have become more sophisticated, so too have compilers, rearranging instructions to facilitate optimal execution and using sophisticated global register-allocation algorithms. And as processor manufacturers transition to multicore processors, largely because clock rates are getting harder to increase economically, hardware parallelism will only increase.

In a multithreaded environment, the illusion of sequentiality cannot be maintained without significant performance cost. Since most of the time threads within a concurrent application are each "doing their own thing", excessive inter-thread coordination would only slow down the application to no real benefit. It is only when multiple threads share data that it is necessary to coordinate their activities, and the JVM relies on the program to identify when this is happening by using synchronization.

The JMM specifies the minimal guarantees the JVM must make about when writes to variables become visible to other threads. It was designed to balance the need for predictability and ease of program development with the realities of implementing high-performance JVMs on a wide range of popular processor architectures. Some aspects of the JMM may be disturbing at first if you are not familiar with the tricks used by modern processors and compilers to squeeze extra performance out of your program.

16.1.1. Platform Memory Models

In a shared-memory multiprocessor architecture, each processor has its own cache that is periodically reconciled with main memory. Processor architectures provide varying degrees of cache coherence; some provide minimal guarantees that allow different processors to see different values for the same memory location at virtually any time. The operating system, compiler, and runtime (and sometimes, the program, too) must make up the difference between what the hardware provides and what thread safety requires.

Ensuring that every processor knows what every other processor is doing at all times is expensive. Most of the time this information is not needed, so processors relax their memory-coherency guarantees to improve performance. An architecture's memory model tells programs what guarantees they can expect from the memory system, and specifies the special instructions required (called memory barriers or fences) to get the additional memory coordination guarantees required when sharing data. In order to shield the Java developer from the differences between memory models across architectures, Java provides its own memory model, and the JVM deals with the differences between the JMM and the underlying platform's memory model by inserting memory barriers at the appropriate places.

One convenient mental model for program execution is to imagine that there is a single order in which the operations happen in a program, regardless of what processor they execute on, and that each read of a variable will see the last write in the execution order to that variable by any processor. This happy, if unrealistic, model is called sequential consistency. Software developers often mistakenly assume sequential consistency, but no modern multiprocessor offers sequential consistency and the JMM does not either. The classic sequential computing model, the von Neumann model, is only a vague approximation of how modern multiprocessors behave.

The bottom line is that modern shared-memory multiprocessors (and compilers) can do some surprising things when data is shared across threads, unless you've told them not to through the use of memory barriers. Fortunately, Java programs need not specify the placement of memory barriers; they need only identify when shared state is being accessed, through the proper use of synchronization.

16.1.2. Reordering

In describing race conditions and atomicity failures in Chapter 2, we used interaction diagrams depicting "unlucky timing" where the scheduler interleaved operations so as to cause incorrect results in insufficiently synchronized programs. To make matters worse, the JMM can permit actions to appear to execute in different orders from the perspective of different threads, making reasoning about ordering in the absence of synchronization even more complicated. The various reasons why operations might be delayed or appear to execute out of order can all be grouped into the general category of reordering.

PossibleReordering in Listing 16.1 illustrates how difficult it is to reason about the behavior of even the simplest concurrent programs unless they are correctly synchronized. It is fairly easy to imagine how PossibleReordering could print (1, 0), or (0, 1), or (1, 1): thread A could run to completion before B starts, B could run to completion before A starts, or their actions could be interleaved. But, strangely, PossibleReordering can also print (0, 0)! The actions in each thread have no dataflow dependence on each other, and accordingly can be executed out of order. (Even if they are executed in order, the timing by which caches are flushed to main memory can make it appear, from the perspective of B, that the assignments in A occurred in the opposite order.) Figure 16.1 shows a possible interleaving with reordering that results in printing (0, 0).

Figure 16.1. Interleaving Showing Reordering in PossibleReordering.

PossibleReordering is a trivial program, and it is still surprisingly tricky to enumerate its possible results. Reordering at the memory level can make programs behave unexpectedly. It is prohibitively difficult to reason about ordering in the absence of synchronization; it is much easier to ensure that your program uses synchronization appropriately. Synchronization inhibits the compiler, runtime, and hardware from reordering memory operations in ways that would violate the visibility guarantees provided by the JMM.[1]

[1] On most popular processor architectures, the memory model is strong enough that the performance cost of a volatile read is in line with that of a nonvolatile read.

16.1.3. The Java Memory Model in 500 Words or Less

The Java Memory Model is specified in terms of actions, which include reads and writes to variables, locks and unlocks of monitors, and starting and joining with threads. The JMM defines a partial ordering[2] called happens-before on all actions within the program. To guarantee that the thread executing action B can see the results of action A (whether or not A and B occur in different threads), there must be a happens-before relationship between A and B. In the absence of a happens-before ordering between two operations, the JVM is free to reorder them as it pleases.

[2] A partial ordering is a relation on a set that is antisymmetric, reflexive, and transitive, but for any two elements x and y, it need not be the case that x ≤ y or y ≤ x. We use partial orderings every day to express preferences; we may prefer sushi to cheeseburgers and Mozart to Mahler, but we don't necessarily have a clear preference between cheeseburgers and Mozart.

Listing 16.1. Insufficiently Synchronized Program that can have Surprising Results. Don't Do this.

public class PossibleReordering {
    static int x = 0, y = 0;
    static int a = 0, b = 0;

    public static void main(String[] args)
            throws InterruptedException {
        Thread one = new Thread(new Runnable() {
            public void run() {
                a = 1;
                x = b;
            }
        });
        Thread other = new Thread(new Runnable() {
            public void run() {
                b = 1;
                y = a;
            }
        });
        one.start(); other.start();
        one.join(); other.join();
        System.out.println("( " + x + "," + y + ")");
    }
}

A data race occurs when a variable is read by more than one thread, and written by at least one thread, but the reads and writes are not ordered by happens-before. A correctly synchronized program is one with no data races; correctly synchronized programs exhibit sequential consistency, meaning that all actions within the program appear to happen in a fixed, global order.
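To see the definition in action, here is a sketch of a fix (ours, not a listing from the text): if the field declarations in PossibleReordering were replaced as follows, every access to the shared fields would be a synchronization action ordered by happens-before, the data races would disappear, and the program would be correctly synchronized:

    // With volatile fields, PossibleReordering is sequentially consistent:
    // it can print (1, 0), (0, 1), or (1, 1), but never (0, 0).
    static volatile int x = 0, y = 0;
    static volatile int a = 0, b = 0;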

The rules for happens-before are:

Program order rule. Each action in a thread happens-before every action in that thread that comes later in the program order.

Monitor lock rule. An unlock on a monitor lock happens-before every subsequent lock on that same monitor lock.[3]

Volatile variable rule. A write to a volatile field happens-before every subsequent read of that same field (see the sketch following these rules).[4]

Thread start rule. A call to Thread.start on a thread happens-before every action in the started thread.

Thread termination rule. Any action in a thread happens-before any other thread detects that thread has terminated, either by successfully returning from Thread.join or by Thread.isAlive returning false.

Interruption rule. A thread calling interrupt on another thread happens-before the interrupted thread detects the interrupt (either by having InterruptedException thrown, or invoking isInterrupted or interrupted).

Finalizer rule. The end of a constructor for an object happens-before the start of the finalizer for that object.

Transitivity. If A happens-before B, and B happens-before C, then A happens-before C.

[3] Locks and unlocks on explicit Lock objects have the same memory semantics as intrinsic locks.

[4] Reads and writes of atomic variables have the same memory semantics as volatile variables.
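A minimal sketch (ours, not from the text) of how these rules combine: the program order rule orders the write of configuration before the volatile write of initialized in the writer, the volatile variable rule orders that write before the reader's read of initialized, and transitivity makes the write of configuration visible to the reader:

public class VolatileOrdering {
    private static int configuration;             // deliberately not volatile
    private static volatile boolean initialized;

    static void writer() {
        configuration = 42;   // happens-before the volatile write (program order)
        initialized = true;   // volatile write
    }

    static void reader() {
        if (initialized) {    // volatile read: the write happens-before this read
            // By transitivity, the write of configuration is visible here
            System.out.println(configuration);
        }
    }
}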

Even though actions are only partially ordered, synchronization actions (lock acquisition and release, and reads and writes of volatile variables) are totally ordered. This makes it sensible to describe happens-before in terms of "subsequent" lock acquisitions and reads of volatile variables.

Figure 16.2 illustrates the happens-before relation when two threads synchronize using a common lock. All the actions within thread A are ordered by the program order rule, as are the actions within thread B. Because A releases lock M and B subsequently acquires M, all the actions in A before releasing the lock are therefore ordered before the actions in B after acquiring the lock. When two threads synchronize on different locks, we can't say anything about the ordering of actions between them; there is no happens-before relation between the actions in the two threads.

Figure 16.2. Illustration of Happens-before in the Java Memory Model.
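A sketch (ours) of the situation Figure 16.2 depicts, using a common lock M: if thread A's unlock precedes thread B's lock, everything A did while holding the lock is visible to B afterward:

public class CommonLock {
    private static final Object M = new Object();
    private static int sharedValue;   // guarded by M

    static void threadA() {
        synchronized (M) {
            sharedValue = 1;   // ordered before the unlock by program order
        }                      // unlock of M happens-before B's subsequent lock
    }

    static void threadB() {
        synchronized (M) {     // lock of M
            // Sees A's write if A's unlock preceded this lock
            int v = sharedValue;
        }
    }
}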

 

16.1.4. Piggybacking on Synchronization

Because of the strength of the happens-before ordering, you can sometimes piggyback on the visibility properties of an existing synchronization. This entails combining the program order rule for happens-before with one of the other ordering rules (usually the monitor lock or volatile variable rule) to order accesses to a variable not otherwise guarded by a lock. This technique is very sensitive to the order in which statements occur and is therefore quite fragile; it is an advanced technique that should be reserved for squeezing the last drop of performance out of the most performance-critical classes like ReentrantLock.

The implementation of the protected AbstractQueuedSynchronizer methods in FutureTask illustrates piggybacking. AQS maintains an integer of synchronizer state that FutureTask uses to store the task state: running, completed, or cancelled. But FutureTask also maintains additional variables, such as the result of the computation. When one thread calls set to save the result and another thread calls get to retrieve it, the two had better be ordered by happens-before. This could be done by making the reference to the result volatile, but it is possible to exploit existing synchronization to achieve the same result at lower cost.

FutureTask is carefully crafted to ensure that a successful call to tryReleaseShared always happens-before a subsequent call to tryAcquireShared; tryReleaseShared always writes to a volatile variable that is read by tryAcquireShared. Listing 16.2 shows the innerSet and innerGet methods that are called when the result is saved or retrieved; since innerSet writes result before calling releaseShared (which calls tryReleaseShared) and innerGet reads result after calling acquireShared (which calls tryAcquireShared), the program order rule combines with the volatile variable rule to ensure that the write of result in innerSet happens-before the read of result in innerGet.

Listing 16.2. Inner Class of FutureTask Illustrating Synchronization Piggybacking.

// Inner class of FutureTask
private final class Sync extends AbstractQueuedSynchronizer {
    private static final int RUNNING = 1, RAN = 2, CANCELLED = 4;
    private V result;
    private Exception exception;

    void innerSet(V v) {
        while (true) {
            int s = getState();
            if (ranOrCancelled(s))
                return;
            if (compareAndSetState(s, RAN))
                break;
        }
        result = v;
        releaseShared(0);
        done();
    }

    V innerGet() throws InterruptedException, ExecutionException {
        acquireSharedInterruptibly(0);
        if (getState() == CANCELLED)
            throw new CancellationException();
        if (exception != null)
            throw new ExecutionException(exception);
        return result;
    }
}

We call this technique "piggybacking" because it uses an existing happens-before ordering that was created for some other reason to ensure the visibility of object X, rather than creating a happens-before ordering specifically for publishing X.

Piggybacking of the sort employed by FutureTask is quite fragile and should not be undertaken casually. However, in some cases piggybacking is perfectly reasonable, such as when a class commits to a happens-before ordering between methods as part of its specification. For example, safe publication using a BlockingQueue is a form of piggybacking. One thread putting an object on a queue and another thread subsequently retrieving it constitutes safe publication because there is guaranteed to be sufficient internal synchronization in a BlockingQueue implementation to ensure that the enqueue happens-before the dequeue.
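A sketch (ours, using the standard LinkedBlockingQueue implementation) of this form of piggybacking: the queue's internal synchronization guarantees that the put happens-before the take, so the consumer sees the item in a fully constructed state:

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueuePublication {
    private static final BlockingQueue<String> queue =
        new LinkedBlockingQueue<String>();

    static void producer() throws InterruptedException {
        queue.put("work item");   // enqueue happens-before the dequeue
    }

    static void consumer() throws InterruptedException {
        String item = queue.take();   // safely published
        System.out.println(item);
    }
}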

Other happens-before orderings guaranteed by the class library include:

  • Placing an item in a thread-safe collection happens-before another thread retrieves that item from the collection;
  • Counting down on a CountDownLatch happens-before a thread returns from await on that latch (see the sketch following this list);
  • Releasing a permit to a Semaphore happens-before acquiring a permit from that same Semaphore;
  • Actions taken by the task represented by a Future happen-before another thread successfully returns from Future.get;
  • Submitting a Runnable or Callable to an Executor happens-before the task begins execution; and
  • A thread arriving at a CyclicBarrier or Exchanger happens-before the other threads are released from that same barrier or exchange point. If CyclicBarrier uses a barrier action, arriving at the barrier happens-before the barrier action, which in turn happens-before threads are released from the barrier.
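A sketch (ours) of the CountDownLatch ordering above: the write to result is ordered before countDown by the program order rule, countDown happens-before the return from await, and transitivity guarantees that the waiting thread sees the result:

import java.util.concurrent.CountDownLatch;

public class LatchOrdering {
    private static int result;   // deliberately not volatile
    private static final CountDownLatch done = new CountDownLatch(1);

    static void worker() {
        result = 42;        // ordered before countDown by program order
        done.countDown();   // happens-before another thread's return from await
    }

    static void waiter() throws InterruptedException {
        done.await();                 // returns only after countDown
        System.out.println(result);   // guaranteed to print 42
    }
}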

