Atomic variables are finer-grained and lighter-weight than locks, and are critical for implementing high-performance concurrent code on multiprocessor systems. Atomic variables limit the scope of contention to a single variable; this is as fine-grained as you can get (assuming your algorithm can even be implemented using such fine granularity). The fast (uncontended) path for updating an atomic variable is no slower than the fast path for acquiring a lock, and usually faster; the slow path is definitely faster than the slow path for locks because it does not involve suspending and rescheduling threads. With algorithms based on atomic variables instead of locks, threads are more likely to be able to proceed without delay and have an easier time recovering if they do experience contention.
The atomic variable classes provide a generalization of volatile variables to support atomic conditional read-modify-write operations. AtomicInteger represents an int value, and provides get and set methods with the same memory semantics as reads and writes to a volatile int. It also provides an atomic compareAndSet method (which if successful has the memory effects of both reading and writing a volatile variable) and, for convenience, atomic add, increment, and decrement methods. AtomicInteger bears a superficial resemblance to an extended Counter class, but offers far greater scalability under contention because it can directly exploit underlying hardware support for concurrency.
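To make these semantics concrete, here is a minimal sketch (not one of this chapter's listings) showing both a convenience method and the equivalent hand-rolled CAS loop:

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounterDemo {
    private final AtomicInteger value = new AtomicInteger(0);

    // Convenience method: atomic read-modify-write in one call.
    public int increment() {
        return value.incrementAndGet();
    }

    // The same effect written as an explicit CAS loop: read the current
    // value, compute the update, and retry if another thread got there first.
    public int incrementViaCas() {
        while (true) {
            int current = value.get();                      // volatile-read semantics
            if (value.compareAndSet(current, current + 1))  // volatile-write semantics on success
                return current + 1;
        }
    }

    public int get() {
        return value.get();
    }
}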
There are twelve atomic variable classes, divided into four groups: scalars, field updaters, arrays, and compound variables. The most commonly used atomic variables are the scalars: AtomicInteger, AtomicLong, AtomicBoolean, and AtomicReference. All support CAS; the Integer and Long versions support arithmetic as well. (To simulate atomic variables of other primitive types, you can cast short or byte values to and from int, and use floatToIntBits or doubleToLongBits for floating-point numbers.)
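As an example of the bit-conversion trick, here is a sketch of a hypothetical AtomicFloat (java.util.concurrent.atomic provides no such class) built by storing the float's bits in an AtomicInteger:

import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical AtomicFloat, sketching the floatToIntBits trick described
// above; this class does not exist in java.util.concurrent.atomic.
public class AtomicFloat {
    private final AtomicInteger bits;

    public AtomicFloat(float initialValue) {
        bits = new AtomicInteger(Float.floatToIntBits(initialValue));
    }

    public float get() {
        return Float.intBitsToFloat(bits.get());
    }

    public void set(float newValue) {
        bits.set(Float.floatToIntBits(newValue));
    }

    public boolean compareAndSet(float expect, float update) {
        // Compare the bit patterns, not the float values.
        return bits.compareAndSet(Float.floatToIntBits(expect),
                                  Float.floatToIntBits(update));
    }
}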
The atomic array classes (available in Integer, Long, and Reference versions) are arrays whose elements can be updated atomically. The atomic array classes provide volatile access semantics to the elements of the array, a feature not available for ordinary arrays: a volatile array has volatile semantics only for the array reference, not for its elements. (The other types of atomic variables are discussed in Sections 15.4.3 and 15.4.4.)
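A brief illustrative sketch of the difference (this class is not from the book):

import java.util.concurrent.atomic.AtomicIntegerArray;

// With a plain volatile int[], only the array reference is volatile; writes
// to individual elements are not guaranteed visible to other threads.
// AtomicIntegerArray gives each element atomic, volatile-visible updates.
public class HistogramDemo {
    private final AtomicIntegerArray counts = new AtomicIntegerArray(16);

    public void record(int bucket) {
        counts.incrementAndGet(bucket);   // atomic update of one element
    }

    public int countFor(int bucket) {
        return counts.get(bucket);        // volatile read of that element
    }
}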
While the atomic scalar classes extend Number, they do not extend the primitive wrapper classes such as Integer or Long. In fact, they cannot: the primitive wrapper classes are immutable whereas the atomic variable classes are mutable. The atomic variable classes also do not redefine hashCode or equals; each instance is distinct. Like most mutable objects, they are not good candidates for keys in hash-based collections.
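A two-line illustration (not from the book) of why identity-based equality makes them poor hash keys:

import java.util.concurrent.atomic.AtomicInteger;

public class AtomicIdentityDemo {
    public static void main(String[] args) {
        AtomicInteger a = new AtomicInteger(42);
        AtomicInteger b = new AtomicInteger(42);
        System.out.println(a.equals(b));          // false: identity, not value, equality
        System.out.println(a.get() == b.get());   // true: same current value
    }
}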
15.3.1. Atomics as "Better Volatiles"
In Section 3.4.2, we used a volatile reference to an immutable object to update multiple state variables atomically. That example relied on check-then-act, but in that particular case the race was harmless because we did not care if we occasionally lost an update. In most other situations, such a check-then-act would not be harmless and could compromise data integrity. For example, NumberRange on page 67 could not be implemented safely with a volatile reference to an immutable holder object for the upper and lower bounds, nor by using atomic integers to store the bounds. Because an invariant constrains the two numbers and they cannot be updated simultaneously while preserving the invariant, a number range class using volatile references or multiple atomic integers will have unsafe check-then-act sequences.
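As a reminder of the hazard, here is a sketch along the lines of NumberRange (an approximation of the idea, not the exact listing from page 67):

import java.util.concurrent.atomic.AtomicInteger;

// Each field is individually atomic, but the check-then-act spanning both
// fields is not, so two threads can interleave and leave lower > upper.
public class UnsafeNumberRange {
    // INVARIANT: lower <= upper
    private final AtomicInteger lower = new AtomicInteger(0);
    private final AtomicInteger upper = new AtomicInteger(0);

    public void setLower(int i) {
        // Warning: unsafe check-then-act. Another thread may change upper
        // between this check and the set below.
        if (i > upper.get())
            throw new IllegalArgumentException("can't set lower to " + i + " > upper");
        lower.set(i);
    }

    public void setUpper(int i) {
        // Same race, mirrored.
        if (i < lower.get())
            throw new IllegalArgumentException("can't set upper to " + i + " < lower");
        upper.set(i);
    }
}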
We can combine the technique from OneValueCache with atomic references to close the race condition by atomically updating the reference to an immutable object holding the lower and upper bounds. CasNumberRange in Listing 15.3 uses an AtomicReference to an IntPair to hold the state; by using compareAndSet it can update the upper or lower bound without the race conditions of NumberRange.
Listing 15.3. Preserving Multivariable Invariants Using CAS.
public class CasNumberRange {
    @Immutable
    private static class IntPair {
        final int lower;  // Invariant: lower <= upper
        final int upper;
        ...
    }

    private final AtomicReference<IntPair> values =
        new AtomicReference<IntPair>(new IntPair(0, 0));

    public int getLower() { return values.get().lower; }

    public int getUpper() { return values.get().upper; }

    public void setLower(int i) {
        while (true) {
            IntPair oldv = values.get();
            if (i > oldv.upper)
                throw new IllegalArgumentException(
                        "Can't set lower to " + i + " > upper");
            IntPair newv = new IntPair(i, oldv.upper);
            if (values.compareAndSet(oldv, newv))
                return;
        }
    }
    // similarly for setUpper
}
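The elided setUpper mirrors setLower; a plausible sketch of it:

// Sketch of the elided setUpper, mirroring setLower above.
public void setUpper(int i) {
    while (true) {
        IntPair oldv = values.get();
        if (i < oldv.lower)
            throw new IllegalArgumentException(
                    "Can't set upper to " + i + " < lower");
        IntPair newv = new IntPair(oldv.lower, i);
        if (values.compareAndSet(oldv, newv))
            return;
    }
}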
15.3.2. Performance Comparison: Locks Versus Atomic Variables
To demonstrate the differences in scalability between locks and atomic variables, we constructed a benchmark comparing several implementations of a pseudorandom number generator (PRNG). In a PRNG, the next "random" number is a deterministic function of the previous number, so a PRNG must remember the previous number as part of its state.
Listings 15.4 and 15.5 show two implementations of a thread-safe PRNG, one using ReentrantLock and the other using AtomicInteger. The test driver invokes each implementation repeatedly; each iteration generates a random number (which fetches and modifies the shared seed state) and also performs a number of "busy-work" iterations that operate strictly on thread-local data. This simulates typical operations that include some portion of operating on shared state and some portion of operating on thread-local state.
Figures 15.1 and 15.2 show throughput with low and moderate levels of simulated work in each iteration. With a low level of thread-local computation, the lock or atomic variable experiences heavy contention; with more thread-local computation, the lock or atomic variable experiences less contention since it is accessed less often by each thread.
Figure 15.1. Lock and AtomicInteger Performance Under High Contention.
Figure 15.2. Lock and AtomicInteger Performance Under Moderate Contention.
Listing 15.4. Random Number Generator Using ReentrantLock.
@ThreadSafe
public class ReentrantLockPseudoRandom extends PseudoRandom {
    private final Lock lock = new ReentrantLock(false);
    private int seed;

    ReentrantLockPseudoRandom(int seed) {
        this.seed = seed;
    }

    public int nextInt(int n) {
        lock.lock();
        try {
            int s = seed;
            seed = calculateNext(s);
            int remainder = s % n;
            return remainder > 0 ? remainder : remainder + n;
        } finally {
            lock.unlock();
        }
    }
}
Listing 15.5. Random Number Generator Using AtomicInteger.
@ThreadSafe
public class AtomicPseudoRandom extends PseudoRandom {
    private AtomicInteger seed;

    AtomicPseudoRandom(int seed) {
        this.seed = new AtomicInteger(seed);
    }

    public int nextInt(int n) {
        while (true) {
            int s = seed.get();
            int nextSeed = calculateNext(s);
            if (seed.compareAndSet(s, nextSeed)) {
                int remainder = s % n;
                return remainder > 0 ? remainder : remainder + n;
            }
        }
    }
}
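Both listings extend a PseudoRandom base class whose calculateNext is not shown here. A minimal sketch that would let them compile, assuming a Marsaglia xor-shift step as the deterministic next-value function (the actual function used by the benchmark is an assumption):

// Minimal sketch of the PseudoRandom base class the listings extend.
public abstract class PseudoRandom {
    public abstract int nextInt(int n);

    // Assumed stand-in for the unspecified function: a Marsaglia xor-shift
    // step, a cheap deterministic computation of the next value from the last.
    protected int calculateNext(int prev) {
        prev ^= (prev << 6);
        prev ^= (prev >>> 21);
        prev ^= (prev << 7);
        return prev;
    }
}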
As these graphs show, at high contention levels locking tends to outperform atomic variables, but at more realistic contention levels atomic variables outperform locks.[6] This is because a lock reacts to contention by suspending threads, reducing CPU usage and synchronization traffic on the shared memory bus. (This is similar to how blocking producers in a producer-consumer design reduces the load on consumers and thereby lets them catch up.) On the other hand, with atomic variables, contention management is pushed back to the calling class. Like most CAS-based algorithms, AtomicPseudoRandom reacts to contention by trying again immediately, which is usually the right approach but in a high-contention environment just creates more contention.
[6] The same holds true in other domains: traffic lights provide better throughput for high traffic but rotaries provide better throughput for low traffic; the contention scheme used by Ethernet networks performs better at low traffic levels, but the token-passing scheme used by token ring networks does better with heavy traffic.
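A common mitigation, though not one AtomicPseudoRandom uses, is to back off briefly before retrying a failed CAS so that contending threads spread out their retries. A sketch (illustrative only; this class is not part of the book's benchmark):

import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;

// CAS retry with randomized, bounded-exponential backoff, damping the
// feedback loop in which failed CASes immediately generate more contention.
public class BackoffCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    public int increment() {
        int backoff = 1;
        while (true) {
            int current = value.get();
            if (value.compareAndSet(current, current + 1))
                return current + 1;
            // CAS failed: spin for a random interval before retrying,
            // doubling the bound so heavily contended threads spread out.
            int spins = ThreadLocalRandom.current().nextInt(backoff);
            for (int i = 0; i < spins; i++)
                Thread.onSpinWait();
            if (backoff < 1024)
                backoff <<= 1;
        }
    }
}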
Before we condemn AtomicPseudoRandom as poorly written or atomic variables as a poor choice compared to locks, we should realize that the level of contention in Figure 15.1 is unrealistically high: no real program does nothing but contend for a lock or atomic variable. In practice, atomics tend to scale better than locks because atomics deal more effectively with typical contention levels.
The performance reversal between locks and atomics at differing levels of contention illustrates the strengths and weaknesses of each. With low to moderate contention, atomics offer better scalability; with high contention, locks offer better contention avoidance. (CAS-based algorithms also outperform lock-based ones on single-CPU systems, since a CAS always succeeds on a single-CPU system except in the unlikely case that a thread is preempted in the middle of the read-modify-write operation.)
Figures 15.1 and 15.2 include a third curve: an implementation of PseudoRandom that uses a ThreadLocal for the PRNG state. This implementation approach changes the behavior of the class (each thread sees its own private sequence of pseudorandom numbers, instead of all threads sharing one sequence), but illustrates that it is often cheaper to not share state at all if it can be avoided. We can improve scalability by dealing more effectively with contention, but true scalability is achieved only by eliminating contention entirely.
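A sketch of what such a thread-confined variant might look like (the class name and seeding scheme are assumptions; the book does not show this listing):

// Each thread gets its own seed, so there is no shared state, no CAS,
// and no contention; nextInt touches only thread-confined data.
public class ThreadLocalPseudoRandom extends PseudoRandom {
    private final ThreadLocal<Integer> seed;

    public ThreadLocalPseudoRandom(final int initialSeed) {
        seed = new ThreadLocal<Integer>() {
            protected Integer initialValue() {
                // Mix in the thread id so each thread starts a distinct sequence.
                return initialSeed ^ (int) Thread.currentThread().getId();
            }
        };
    }

    public int nextInt(int n) {
        int s = seed.get();
        seed.set(calculateNext(s));
        int remainder = s % n;
        return remainder > 0 ? remainder : remainder + n;
    }
}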