Deadlock

Deadlock is illustrated by the classic, if somewhat unsanitary, "dining philosophers" problem. Five philosophers go out for Chinese food and are seated at a circular table. There are five chopsticks (not five pairs), one placed between each pair of diners. The philosophers alternate between thinking and eating. Each needs to acquire two chopsticks for long enough to eat, but can then put the chopsticks back and return to thinking. There are some chopstick-management algorithms that let everyone eat on a more or less timely basis (a hungry philosopher tries to grab both adjacent chopsticks, but if one is not available, puts down the one that is available and waits a minute or so before trying again), and some that can result in some or all of the philosophers dying of hunger (each philosopher immediately grabs the chopstick to his left and waits for the chopstick to his right to be available before putting down the left). The latter situation, where each has a resource needed by another and is waiting for a resource held by another, and will not release the one they hold until they acquire the one they don't, illustrates deadlock.

When a thread holds a lock forever, other threads attempting to acquire that lock will block forever waiting. When thread A holds lock L and tries to acquire lock M, but at the same time thread B holds M and tries to acquire L, both threads will wait forever. This situation is the simplest case of deadlock (or deadly embrace), where multiple threads wait forever due to a cyclic locking dependency. (Think of the threads as the nodes of a directed graph whose edges represent the relation "Thread A is waiting for a resource held by thread B". If this graph is cyclical, there is a deadlock.)

Database systems are designed to detect and recover from deadlock. A transaction may acquire many locks, and locks are held until the transaction commits. So it is quite possible, and in fact not uncommon, for two transactions to deadlock. Without intervention, they would wait forever (holding locks that are probably required by other transactions as well). But the database server is not going to let this happen. When it detects that a set of transactions is deadlocked (which it does by searching the is-waiting-for graph for cycles), it picks a victim and aborts that transaction. This releases the locks held by the victim, allowing the other transactions to proceed. The application can then retry the aborted transaction, which may be able to complete now that any competing transactions have completed.

The JVM is not nearly as helpful in resolving deadlocks as database servers are. When a set of Java threads deadlock, that's the end of the game: those threads are permanently out of commission. Depending on what those threads do, the application may stall completely, or a particular subsystem may stall, or performance may suffer. The only way to restore the application to health is to abort and restart it, and hope the same thing doesn't happen again.
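While the JVM will not break a deadlock for you, it can at least report one: the java.lang.management API exposes ThreadMXBean.findDeadlockedThreads, which searches for cycles among threads blocked on monitors or ownable synchronizers. A minimal sketch (the DeadlockWatchdog name and report format are ours, not part of any library):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DeadlockWatchdog {
    // Returns a description of any deadlocked threads, or null if none.
    public static String check() {
        ThreadMXBean mbean = ManagementFactory.getThreadMXBean();
        long[] ids = mbean.findDeadlockedThreads();  // null when no deadlock
        if (ids == null)
            return null;
        StringBuilder sb = new StringBuilder("Deadlock detected:\n");
        for (ThreadInfo info : mbean.getThreadInfo(ids))
            sb.append(info.getThreadName())
              .append(" waiting on ")
              .append(info.getLockName())
              .append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        String report = check();
        System.out.println(report == null ? "no deadlock" : report);
    }
}
```

Such a watchdog can only detect the deadlock after the fact; recovery still means restarting the process, but the report tells you which locks to reorder.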

Like many other concurrency hazards, deadlocks rarely manifest themselves immediately. The fact that a class has a potential deadlock doesn't mean that it ever will deadlock, just that it can. When deadlocks do manifest themselves, it is often at the worst possible time: under heavy production load.

10.1.1. Lock-ordering Deadlocks

LeftRightDeadlock in Listing 10.1 is at risk for deadlock. The leftRight and rightLeft methods each acquire the left and right locks. If one thread calls leftRight and another calls rightLeft, and their actions are interleaved as shown in Figure 10.1, they will deadlock.

Figure 10.1. Unlucky Timing in LeftRightDeadlock.

The deadlock in LeftRightDeadlock came about because the two threads attempted to acquire the same locks in a different order. If they asked for the locks in the same order, there would be no cyclic locking dependency and therefore no deadlock. If you can guarantee that every thread that needs locks L and M at the same time always acquires L and M in the same order, there will be no deadlock.

A program will be free of lock-ordering deadlocks if all threads acquire the locks they need in a fixed global order.

Verifying consistent lock ordering requires a global analysis of your program's locking behavior. It is not sufficient to inspect code paths that acquire multiple locks individually; both leftRight and rightLeft are "reasonable" ways to acquire the two locks, they are just not compatible. When it comes to locking, the left hand needs to know what the right hand is doing.

Listing 10.1. Simple Lock-ordering Deadlock. Don't Do this.

// Warning: deadlock-prone!
public class LeftRightDeadlock {
    private final Object left = new Object();
    private final Object right = new Object();

    public void leftRight() {
        synchronized (left) {
            synchronized (right) {
                doSomething();
            }
        }
    }

    public void rightLeft() {
        synchronized (right) {
            synchronized (left) {
                doSomethingElse();
            }
        }
    }
}

10.1.2. Dynamic Lock Order Deadlocks

Sometimes it is not obvious that you have sufficient control over lock ordering to prevent deadlocks. Consider the harmless-looking code in Listing 10.2 that transfers funds from one account to another. It acquires the locks on both Account objects before executing the transfer, ensuring that the balances are updated atomically and without violating invariants such as "an account cannot have a negative balance".

How can transferMoney deadlock? It may appear as if all the threads acquire their locks in the same order, but in fact the lock order depends on the order of arguments passed to transferMoney, and these in turn might depend on external inputs. Deadlock can occur if two threads call transferMoney at the same time, one transferring from X to Y, and the other doing the opposite:

Listing 10.2. Dynamic Lock-ordering Deadlock. Don't Do this.

// Warning: deadlock-prone!
public void transferMoney(Account fromAccount,
                          Account toAccount,
                          DollarAmount amount)
        throws InsufficientFundsException {
    synchronized (fromAccount) {
        synchronized (toAccount) {
            if (fromAccount.getBalance().compareTo(amount) < 0)
                throw new InsufficientFundsException();
            else {
                fromAccount.debit(amount);
                toAccount.credit(amount);
            }
        }
    }
}
A: transferMoney(myAccount, yourAccount, 10);
B: transferMoney(yourAccount, myAccount, 20);

With unlucky timing, A will acquire the lock on myAccount and wait for the lock on yourAccount, while B is holding the lock on yourAccount and waiting for the lock on myAccount.

Deadlocks like this one can be spotted the same way as in Listing 10.1: look for nested lock acquisitions. Since the order of arguments is out of our control, to fix the problem we must induce an ordering on the locks and acquire them according to the induced ordering consistently throughout the application.

One way to induce an ordering on objects is to use System.identityHashCode, which returns the value that would be returned by Object.hashCode. Listing 10.3 shows a version of transferMoney that uses System.identityHashCode to induce a lock ordering. It involves a few extra lines of code, but eliminates the possibility of deadlock.

In the rare case that two objects have the same hash code, we must use an arbitrary means of ordering the lock acquisitions, and this reintroduces the possibility of deadlock. To prevent inconsistent lock ordering in this case, a third "tie breaking" lock is used. By acquiring the tie-breaking lock before acquiring either Account lock, we ensure that only one thread at a time performs the risky task of acquiring two locks in an arbitrary order, eliminating the possibility of deadlock (so long as this mechanism is used consistently). If hash collisions were common, this technique might become a concurrency bottleneck (just as having a single, program-wide lock would), but because hash collisions with System.identityHashCode are vanishingly infrequent, this technique provides that last bit of safety at little cost.

Listing 10.3. Inducing a Lock Ordering to Avoid Deadlock.

private static final Object tieLock = new Object();

public void transferMoney(final Account fromAcct,
                          final Account toAcct,
                          final DollarAmount amount)
        throws InsufficientFundsException {
    class Helper {
        public void transfer() throws InsufficientFundsException {
            if (fromAcct.getBalance().compareTo(amount) < 0)
                throw new InsufficientFundsException();
            else {
                fromAcct.debit(amount);
                toAcct.credit(amount);
            }
        }
    }
    int fromHash = System.identityHashCode(fromAcct);
    int toHash = System.identityHashCode(toAcct);

    if (fromHash < toHash) {
        synchronized (fromAcct) {
            synchronized (toAcct) {
                new Helper().transfer();
            }
        }
    } else if (fromHash > toHash) {
        synchronized (toAcct) {
            synchronized (fromAcct) {
                new Helper().transfer();
            }
        }
    } else {
        synchronized (tieLock) {
            synchronized (fromAcct) {
                synchronized (toAcct) {
                    new Helper().transfer();
                }
            }
        }
    }
}

If Account has a unique, immutable, comparable key such as an account number, inducing a lock ordering is even easier: order objects by their key, thus eliminating the need for the tie-breaking lock.
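As a sketch of the key-ordering approach: the Account class here is a hypothetical stand-in with a unique, immutable accountNumber, and plain long balances replace DollarAmount for brevity. Because the keys are unique, no tie-breaking lock is needed.

```java
// Hypothetical stand-in for Account: a unique, immutable account number
// serves as the lock-ordering key; balances are plain longs for brevity.
class Account {
    final long accountNumber;
    private long balance;

    Account(long accountNumber, long balance) {
        this.accountNumber = accountNumber;
        this.balance = balance;
    }

    synchronized long getBalance() { return balance; }
    synchronized void debit(long amount) { balance -= amount; }
    synchronized void credit(long amount) { balance += amount; }
}

class OrderedTransfer {
    // Always lock the account with the smaller number first, so every
    // thread acquires the two locks in the same global order.
    static void transferMoney(Account from, Account to, long amount) {
        Account first = from.accountNumber < to.accountNumber ? from : to;
        Account second = (first == from) ? to : from;
        synchronized (first) {
            synchronized (second) {
                if (from.getBalance() < amount)
                    throw new IllegalStateException("insufficient funds");
                from.debit(amount);
                to.credit(amount);
            }
        }
    }
}
```

Whichever direction the transfer runs, the locks are taken in account-number order, so no cyclic wait between two transfers can form.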

You may think we're overstating the risk of deadlock because locks are usually held only briefly, but deadlocks are a serious problem in real systems. A production application may perform billions of lock acquire-release cycles per day. Only one of those needs to be timed just wrong to bring the application to deadlock, and even a thorough load-testing regimen may not disclose all latent deadlocks.[1] DemonstrateDeadlock in Listing 10.4[2] deadlocks fairly quickly on most systems.

[1] Ironically, holding locks for short periods of time, as you are supposed to do to reduce lock contention, increases the likelihood that testing will not disclose latent deadlock risks.

[2] For simplicity, DemonstrateDeadlock ignores the issue of negative account balances.

Listing 10.4. Driver Loop that Induces Deadlock Under Typical Conditions.

public class DemonstrateDeadlock {
    private static final int NUM_THREADS = 20;
    private static final int NUM_ACCOUNTS = 5;
    private static final int NUM_ITERATIONS = 1000000;

    public static void main(String[] args) {
        final Random rnd = new Random();
        final Account[] accounts = new Account[NUM_ACCOUNTS];

        for (int i = 0; i < accounts.length; i++)
            accounts[i] = new Account();

        class TransferThread extends Thread {
            public void run() {
                for (int i = 0; i < NUM_ITERATIONS; i++) {
                    int fromAcct = rnd.nextInt(NUM_ACCOUNTS);
                    int toAcct = rnd.nextInt(NUM_ACCOUNTS);
                    DollarAmount amount =
                        new DollarAmount(rnd.nextInt(1000));
                    transferMoney(accounts[fromAcct],
                                  accounts[toAcct], amount);
                }
            }
        }
        for (int i = 0; i < NUM_THREADS; i++)
            new TransferThread().start();
    }
}


10.1.3. Deadlocks Between Cooperating Objects

Multiple lock acquisition is not always as obvious as in LeftRightDeadlock or transferMoney; the two locks need not be acquired by the same method. Consider the cooperating classes in Listing 10.5, which might be used in a taxicab dispatching application. Taxi represents an individual taxi with a location and a destination; Dispatcher represents a fleet of taxis.

While no method explicitly acquires two locks, callers of setLocation and getImage can acquire two locks just the same. If a thread calls setLocation in response to an update from a GPS receiver, it first updates the taxi's location and then checks to see if it has reached its destination. If it has, it informs the dispatcher that it needs a new destination. Since both setLocation and notifyAvailable are synchronized, the thread calling setLocation acquires the Taxi lock and then the Dispatcher lock. Similarly, a thread calling getImage acquires the Dispatcher lock and then each Taxi lock (one at a time). Just as in LeftRightDeadlock, two locks are acquired by two threads in different orders, risking deadlock.

It was easy to spot the deadlock possibility in LeftRightDeadlock or transferMoney by looking for methods that acquire two locks. Spotting the deadlock possibility in Taxi and Dispatcher is a little harder: the warning sign is that an alien method (defined on page 40) is being called while holding a lock.

Invoking an alien method with a lock held is asking for liveness trouble. The alien method might acquire other locks (risking deadlock) or block for an unexpectedly long time, stalling other threads that need the lock you hold.

 

10.1.4. Open Calls

Of course, Taxi and Dispatcher didn't know that they were each half of a deadlock waiting to happen. And they shouldn't have to; a method call is an abstraction barrier intended to shield you from the details of what happens on the other side. But because you don't know what is happening on the other side of the call, calling an alien method with a lock held is difficult to analyze and therefore risky.

Calling a method with no locks held is called an open call [CPJ 2.4.1.3], and classes that rely on open calls are more well-behaved and composable than classes that make calls with locks held. Using open calls to avoid deadlock is analogous to using encapsulation to provide thread safety: while one can certainly construct a thread-safe program without any encapsulation, the thread safety analysis of a program that makes effective use of encapsulation is far easier than that of one that does not. Similarly, the liveness analysis of a program that relies exclusively on open calls is far easier than that of one that does not. Restricting yourself to open calls makes it far easier to identify the code paths that acquire multiple locks and therefore to ensure that locks are acquired in a consistent order.[3]

[3] The need to rely on open calls and careful lock ordering reflects the fundamental messiness of composing synchronized objects rather than synchronizing composed objects.

Listing 10.5. Lock-ordering Deadlock Between Cooperating Objects. Don't Do this.

// Warning: deadlock-prone!
class Taxi {
    @GuardedBy("this") private Point location, destination;
    private final Dispatcher dispatcher;

    public Taxi(Dispatcher dispatcher) {
        this.dispatcher = dispatcher;
    }

    public synchronized Point getLocation() {
        return location;
    }

    public synchronized void setLocation(Point location) {
        this.location = location;
        if (location.equals(destination))
            dispatcher.notifyAvailable(this);
    }
}

class Dispatcher {
    @GuardedBy("this") private final Set<Taxi> taxis;
    @GuardedBy("this") private final Set<Taxi> availableTaxis;

    public Dispatcher() {
        taxis = new HashSet<Taxi>();
        availableTaxis = new HashSet<Taxi>();
    }

    public synchronized void notifyAvailable(Taxi taxi) {
        availableTaxis.add(taxi);
    }

    public synchronized Image getImage() {
        Image image = new Image();
        for (Taxi t : taxis)
            image.drawMarker(t.getLocation());
        return image;
    }
}

Taxi and Dispatcher in Listing 10.5 can be easily refactored to use open calls and thus eliminate the deadlock risk. This involves shrinking the synchronized blocks to guard only operations that involve shared state, as in Listing 10.6. Very often, the cause of problems like those in Listing 10.5 is the use of synchronized methods instead of smaller synchronized blocks for reasons of compact syntax or simplicity rather than because the entire method must be guarded by a lock. (As a bonus, shrinking the synchronized block may also improve scalability; see Section 11.4.1 for advice on sizing synchronized blocks.)

Strive to use open calls throughout your program. Programs that rely on open calls are far easier to analyze for deadlock-freedom than those that allow calls to alien methods with locks held.

Restructuring a synchronized block to allow open calls can sometimes have undesirable consequences, since it takes an operation that was atomic and makes it not atomic. In many cases, the loss of atomicity is perfectly acceptable; there's no reason that updating a taxi's location and notifying the dispatcher that it is ready for a new destination need be an atomic operation. In other cases, the loss of atomicity is noticeable but the semantic changes are still acceptable. In the deadlock-prone version, getImage produces a complete snapshot of the fleet locations at that instant; in the refactored version, it fetches the location of each taxi at slightly different times.

In some cases, however, the loss of atomicity is a problem, and here you will have to use another technique to achieve atomicity. One such technique is to structure a concurrent object so that only one thread can execute the code path following the open call. For example, when shutting down a service, you may want to wait for in-progress operations to complete and then release resources used by the service. Holding the service lock while waiting for operations to complete is inherently deadlock-prone, but releasing the service lock before the service is shut down may let other threads start new operations. The solution is to hold the lock long enough to update the service state to "shutting down" so that other threads wanting to start new operations, including shutting down the service, see that the service is unavailable and do not try. You can then wait for shutdown to complete, knowing that only the shutdown thread has access to the service state after the open call completes. Thus, rather than using locking to keep the other threads out of a critical section of code, this technique relies on constructing protocols so that other threads don't try to get in.
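The "shutting down" protocol might be sketched as follows; the class and method names here are illustrative, not from any particular library. The flag is set while holding the lock, in-progress operations are drained with a guarded wait, and the resource release happens as an open call with no lock held.

```java
class ShutdownableService {
    private final Object lock = new Object();
    private boolean isShutdown = false;   // guarded by lock
    private int activeOperations = 0;     // guarded by lock

    // Refuses new work once shutdown has begun; otherwise records
    // the caller as an in-progress operation.
    public boolean tryBegin() {
        synchronized (lock) {
            if (isShutdown)
                return false;
            activeOperations++;
            return true;
        }
    }

    public void end() {
        synchronized (lock) {
            activeOperations--;
            lock.notifyAll();
        }
    }

    // Flip the state to "shutting down" under the lock, wait for
    // in-progress operations to drain (wait releases the lock), then
    // release resources as an open call with no lock held.
    public void shutdown() throws InterruptedException {
        synchronized (lock) {
            isShutdown = true;            // new operations now refused
            while (activeOperations > 0)
                lock.wait();
        }
        releaseResources();               // open call: no lock held
    }

    private void releaseResources() { /* placeholder */ }
}
```

Only the shutdown thread can reach releaseResources, because every other thread sees isShutdown and backs off; no lock is held across the open call.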

10.1.5. Resource Deadlocks

Just as threads can deadlock when they are each waiting for a lock that the other holds and will not release, they can also deadlock when waiting for resources.

Listing 10.6. Using Open Calls to Avoid Deadlock Between Cooperating Objects.

@ThreadSafe
class Taxi {
    @GuardedBy("this") private Point location, destination;
    private final Dispatcher dispatcher;
    ...
    public synchronized Point getLocation() {
        return location;
    }

    public void setLocation(Point location) {
        boolean reachedDestination;
        synchronized (this) {
            this.location = location;
            reachedDestination = location.equals(destination);
        }
        if (reachedDestination)
            dispatcher.notifyAvailable(this);
    }
}

@ThreadSafe
class Dispatcher {
    @GuardedBy("this") private final Set<Taxi> taxis;
    @GuardedBy("this") private final Set<Taxi> availableTaxis;
    ...
    public synchronized void notifyAvailable(Taxi taxi) {
        availableTaxis.add(taxi);
    }

    public Image getImage() {
        Set<Taxi> copy;
        synchronized (this) {
            copy = new HashSet<Taxi>(taxis);
        }
        Image image = new Image();
        for (Taxi t : copy)
            image.drawMarker(t.getLocation());
        return image;
    }
}

Say you have two pooled resources, such as connection pools for two different databases. Resource pools are usually implemented with semaphores (see Section 5.5.3) to facilitate blocking when the pool is empty. If a task requires connections to both databases and the two resources are not always requested in the same order, thread A could be holding a connection to database D1 while waiting for a connection to database D2, and thread B could be holding a connection to D2 while waiting for a connection to D1. (The larger the pools are, the less likely this is to occur; if each pool has N connections, deadlock requires N sets of cyclically waiting threads and a lot of unlucky timing.)
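The defense is the same as for locks: impose a fixed global order on pool acquisition. A sketch under stated assumptions, using java.util.concurrent.Semaphore as a stand-in for the two connection pools (the class name and permit counts are ours):

```java
import java.util.concurrent.Semaphore;

class DualPoolClient {
    // Stand-ins for two bounded connection pools of ten connections each.
    private final Semaphore poolD1 = new Semaphore(10);
    private final Semaphore poolD2 = new Semaphore(10);

    // Every task acquires D1 before D2, regardless of which database it
    // logically needs first, so no cyclic wait between the pools can form.
    public void withBothConnections(Runnable task) throws InterruptedException {
        poolD1.acquire();
        try {
            poolD2.acquire();
            try {
                task.run();
            } finally {
                poolD2.release();
            }
        } finally {
            poolD1.release();
        }
    }
}
```

The try/finally nesting also guarantees that permits are returned in every case, so a failed task cannot leak a pooled connection.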

Another form of resource-based deadlock is thread-starvation deadlock. We saw an example of this hazard in Section 8.1.1, where a task that submits a task and waits for its result executes in a single-threaded Executor. In that case, the first task will wait forever, permanently stalling that task and all others waiting to execute in that Executor. Tasks that wait for the results of other tasks are the primary source of thread-starvation deadlock; bounded pools and interdependent tasks do not mix well.
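The hazard can be reproduced in a few lines; in this sketch a timeout stands in for the infinite wait so the demonstration terminates (the class name and timeout value are ours):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class StarvationDemo {
    // Returns true when the outer task times out: the pool's single
    // thread is busy running the outer task, so the inner task it
    // submits to the same executor can never start.
    static boolean timedOut() throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            Future<String> outer = exec.submit(() -> {
                Future<String> inner = exec.submit(() -> "done");
                return inner.get();       // waits forever: no free thread
            });
            outer.get(500, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException expected) {
            return true;                  // thread-starvation deadlock
        } finally {
            exec.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(timedOut()
            ? "thread-starvation deadlock" : "completed");
    }
}
```

With a pool of two or more threads the same code completes normally, which is exactly why this bug tends to surface only when the pool is smaller than the dependency chain.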





Java Concurrency in Practice
ISBN: 0321349601