Testing for Correctness | Testing Concurrent Programs

Table of contents:

Testing for Performance

Developing unit tests for a concurrent class starts with the same analysis as for a sequential classidentifying invariants and postconditions that are amenable to mechanical checking. If you are lucky, many of these are present in the specification; the rest of the time, writing tests is an adventure in iterative specification discovery.

As a concrete illustration, we're going to build a set of test cases for a bounded buffer. Listing 12.1 shows our BoundedBuffer implementation, using Semaphore to implement the required bounding and blocking.

BoundedBuffer implements a fixed-length array-based queue with blocking put and take methods controlled by a pair of counting semaphores. The availableItems semaphore represents the number of elements that can be removed from the buffer, and is initially zero (since the buffer is initially empty). Similarly, availableSpaces represents how many items can be inserted into the buffer, and is initialized to the size of the buffer.

A take operation first requires that a permit be obtained from availableItems. This succeeds immediately if the buffer is nonempty, and otherwise blocks until the buffer becomes nonempty. Once a permit is obtained, the next element from the buffer is removed and a permit is released to the availableSpaces semaphore.[2] The put operation is defined conversely, so that on exit from either the put or take methods, the sum of the counts of both semaphores always equals the bound. (In practice, if you need a bounded buffer you should use ArrayBlockingQueue or LinkedBlockingQueue rather than rolling your own, but the technique used here illustrates how insertions and removals can be controlled in other data structures as well.)

[2] In a counting semaphore, the permits are not represented explicitly or associated with an owning thread; a release operation creates a permit and an acquire operation consumes one.

Listing 12.1. Bounded Buffer Using Semaphore.

@ThreadSafe
public class BoundedBuffer {
 private final Semaphore availableItems, availableSpaces;
 @GuardedBy("this") private final E[] items;
 @GuardedBy("this") private int putPosition = 0, takePosition = 0;

 public BoundedBuffer(int capacity) {
 availableItems = new Semaphore(0);
 availableSpaces = new Semaphore(capacity);
 items = (E[]) new Object[capacity];
 }
 public boolean isEmpty() {
 return availableItems.availablePermits() == 0;
 }
 public boolean isFull() {
 return availableSpaces.availablePermits() == 0;
 }

 public void put(E x) throws InterruptedException {
 availableSpaces.acquire();
 doInsert(x);
 availableItems.release();
 }
 public E take() throws InterruptedException {
 availableItems.acquire();
 E item = doExtract();
 availableSpaces.release();
 return item;
 }

 private synchronized void doInsert(E x) {
 int i = putPosition;
 items[i] = x;
 putPosition = (++i == items.length)? 0 : i;
 }
 private synchronized E doExtract() {
 int i = takePosition;
 E x = items[i];
 items[i] = null;
 takePosition = (++i == items.length)? 0 : i;
 return x;
 }
}

12.1.1. Basic Unit Tests

The most basic unit tests for BoundedBuffer are similar to what we'd use in a sequential contextcreate a bounded buffer, call its methods, and assert postconditions and invariants. Some invariants that quickly come to mind are that a freshly created buffer should identify itself as empty, and also as not full. A similar but slightly more complicated safety test is to insert N elements into a buffer with capacity N (which should succeed without blocking), and test that the buffer recognizes that it is full (and not empty). JUnit test methods for these properties are shown in Listing 12.2.

Listing 12.2. Basic Unit Tests for BoundedBuffer.

class BoundedBufferTest extends TestCase {
 void testIsEmptyWhenConstructed() {
 BoundedBuffer bb = new BoundedBuffer(10);
 assertTrue(bb.isEmpty());
 assertFalse(bb.isFull());
 }

 void testIsFullAfterPuts() throws InterruptedException {
 BoundedBuffer bb = new BoundedBuffer(10);
 for (int i = 0; i < 10; i++)
 bb.put(i);
 assertTrue(bb.isFull());
 assertFalse(bb.isEmpty());
 }
}

These simple test methods are entirely sequential. Including a set of sequential tests in your test suite is often helpful, since they can disclose when a problem is not related to concurrency issues before you start looking for data races.

12.1.2. Testing Blocking Operations

Tests of essential concurrency properties require introducing more than one thread. Most testing frameworks are not very concurrency-friendly: they rarely include facilities to create threads or monitor them to ensure that they do not die unexpectedly. If a helper thread created by a test case discovers a failure, the framework usually does not know with which test the thread is associated, so some work may be required to relay success or failure information back to the main test runner thread so it can be reported.

For the conformance tests for java.util.concurrent, it was important that failures be clearly associated with a specific test. Hence the JSR 166 Expert Group created a base class[3] that provided methods to relay and report failures during tearDown, following the convention that every test must wait until all the threads it created terminate. You may not need to go to such lengths; the key requirements are that it be clear whether the tests passed and that failure information is reported somewhere for use in diagnosing the problem.

[3] http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/test/tck/JSR166TestCase.java

If a method is supposed to block under certain conditions, then a test for that behavior should succeed only if the thread does not proceed. Testing that a method blocks is similar to testing that a method throws an exception; if the method returns normally, the test has failed.

Testing that a method blocks introduces an additional complication: once the method successfully blocks, you have to convince it somehow to unblock. The obvious way to do this is via interruptionstart a blocking activity in a separate thread, wait until the thread blocks, interrupt it, and then assert that the blocking operation completed. Of course, this requires your blocking methods to respond to interruption by returning early or throwing InterruptedException.

The "wait until the thread blocks" part is easier said than done; in practice, you have to make an arbitrary decision about how long the few instructions being executed could possibly take, and wait longer than that. You should be prepared to increase this value if you are wrong (in which case you will see spurious test failures).

Listing 12.3 shows an approach to testing blocking operations. It creates a "taker" thread that attempts to take an element from an empty buffer. If take succeeds, it registers failure. The test runner thread starts the taker thread, waits a long time, and then interrupts it. If the taker thread has correctly blocked in the take operation, it will throw InterruptedException, and the catch block for this exception treats this as success and lets the thread exit. The main test runner thread then attempts to join with the taker thread and verifies that the join returned successfully by calling Thread.isAlive; if the taker thread responded to the interrupt, the join should complete quickly.

The timed join ensures that the test completes even if take gets stuck in some unexpected way. This test method tests several properties of takenot only that it blocks but that, when interrupted, it throws InterruptedException. This is one of the few cases in which it is appropriate to subclass THRead explicitly instead of using a Runnable in a pool: in order to test proper termination with join. The same approach can be used to test that the taker thread unblocks after an element is placed in the queue by the main thread.

It is tempting to use Thread.getState to verify that the thread is actually blocked on a condition wait, but this approach is not reliable. There is nothing that requires a blocked thread ever to enter the WAITING or TIMED_WAITING states, since the JVM can choose to implement blocking by spin-waiting instead. Similarly, because spurious wakeups from Object.wait or Condition.await are permitted (see Chapter 14), a thread in the WAITING or TIMED_WAITING state may temporarily transition to RUNNABLE even if the condition for which it is waiting is not yet true. Even ignoring these implementation options, it may take some time for the target thread to settle into a blocking state. The result of Thread.getState should not be used for concurrency control, and is of limited usefulness for testingits primary utility is as a source of debugging information.

Listing 12.3. Testing Blocking and Responsiveness to Interruption.

void testTakeBlocksWhenEmpty() {
 final BoundedBuffer bb = new BoundedBuffer(10);
 Thread taker = new Thread() {
 public void run() {
 try {
 int unused = bb.take();
 fail(); // if we get here, it's an error
 } catch (InterruptedException success) { }
 }};
 try {
 taker.start();
 Thread.sleep(LOCKUP_DETECT_TIMEOUT);
 taker.interrupt();
 taker.join(LOCKUP_DETECT_TIMEOUT);
 assertFalse(taker.isAlive());
 } catch (Exception unexpected) {
 fail();
 }
}

12.1.3. Testing Safety

The tests in Listings 12.2 and 12.3 test important properties of the bounded buffer, but are unlikely to disclose errors stemming from data races. To test that a concurrent class performs correctly under unpredictable concurrent access, we need to set up multiple threads performing put and take operations over some amount of time and then somehow test that nothing went wrong.

Constructing tests to disclose safety errors in concurrent classes is a chicken-and-egg problem: the test programs themselves are concurrent programs. Developing good concurrent tests can be more difficult than developing the classes they test.

The challenge to constructing effective safety tests for concurrent classes is identifying easily checked properties that will, with high probability, fail if something goes wrong, while at the same time not letting the failureauditing code limit concurrency artificially. It is best if checking the test property does not require any synchronization.

One approach that works well with classes used in producer-consumer designs (like BoundedBuffer) is to check that everything put into a queue or buffer comes out of it, and that nothing else does. A naive implementation of this approach would insert the element into a "shadow" list when it is put on the queue, remove it from the list when it is removed from the queue, and assert that the shadow list is empty when the test has finished. But this approach would distort the scheduling of the test threads because modifying the shadow list would require synchronization and possibly blocking.

A better approach is to compute checksums of the elements that are enqueued and dequeued using an order-sensitive checksum function, and compare them. If they match, the test passes. This approach works best when there is a single producer putting elements into the buffer and a single consumer taking them out, because it can test not only that the right elements (probably) came out but that they came out in the right order.

Extending this approach to a multiple-producer, multiple-consumer situation requires using a checksum function that is insensitive to the order in which the elements are combined, so that multiple checksums can be combined after the test. Otherwise, synchronizing access to a shared checksum field could become a concurrency bottleneck or distort the timing of the test. (Any commutative operation, such as addition or XOR, meets these requirements.)

To ensure that your test actually tests what you think it does, it is important that the checksums themselves not be guessable by the compiler. It would be a bad idea to use consecutive integers as your test data because then the result would always be the same, and a smart compiler could conceivably just precompute it.

To avoid this problem, test data should be generated randomly, but many otherwise effective tests are compromised by a poor choice of random number generator (RNG). Random number generation can create couplings between classes and timing artifacts because most random number generator classes are threadsafe and therefore introduce additional synchronization.[4] Giving each thread its own RNG allows a non-thread-safe RNG to be used.

[4] Many benchmarks are, unbeknownst to their developers or users, simply tests of how great a concurrency bottleneck the RNG is.

Rather than using a general-purpose RNG, it is better to use simple pseudorandom functions. You don't need high-quality randomness; all you need is enough randomness to ensure the numbers change from run to run. The xor-Shift function in Listing 12.4 (Marsaglia, 2003) is among the cheapest mediumquality random number functions. Starting it off with values based on hashCode and nanoTime makes the sums both unguessable and almost always different for each run.

Listing 12.4. Medium-quality Random Number Generator Suitable for Testing.

static int xorShift(int y) {
 y ^= (y << 6);
 y ^= (y >>> 21);
 y ^= (y << 7);
 return y;
}

PutTakeTest in Listings 12.5 and 12.6 starts N producer threads that generate elements and enqueue them, and N consumer threads that dequeue them. Each thread updates the checksum of the elements as they go in or out, using a perthread checksum that is combined at the end of the test run so as to add no more synchronization or contention than required to test the buffer.

Depending on your platform, creating and starting a thread can be a moderately heavyweight operation. If your thread is short-running and you start a number of threads in a loop, the threads run sequentially rather than concurrently in the worst case. Even in the not-quite-worst case, the fact that the first thread has a head start on the others means that you may get fewer interleavings than expected: the first thread runs by itself for some amount of time, and then the first two threads run concurrently for some amount of time, and only eventually are all the threads running concurrently. (The same thing happens at the end of the run: the threads that got a head start also finish early.)

We presented a technique for mitigating this problem in Section 5.5.1, using a CountDownLatch as a starting gate and another as a finish gate. Another way to get the same effect is to use a CyclicBarrier, initialized with the number of worker threads plus one, and have the worker threads and the test driver wait at the barrier at the beginning and end of their run. This ensures that all threads are up and running before any start working. PutTakeTest uses this technique to coordinate starting and stopping the worker threads, creating more potential concurrent interleavings. We still can't guarantee that the scheduler won't run each thread to completion sequentially, but making the runs long enough reduces the extent to which scheduling distorts our results.

The final trick employed by PutTakeTest is to use a deterministic termination criterion so that no additional inter-thread coordination is needed to figure out when the test is finished. The test method starts exactly as many producers as consumers and each of them puts or takes the same number of elements, so the total number of items added and removed is the same.

Tests like PutTakeTest tend to be good at finding safety violations. For example, a common error in implementing semaphore-controlled buffers is to forget that the code actually doing the insertion and extraction requires mutual exclusion (using synchronized or ReentrantLock). A sample run of PutTakeTest with a version of BoundedBuffer that omits making doInsert and doExtract synchronized fails fairly quickly. Running PutTakeTest with a few dozen threads iterating a few million times on buffers of various capacity on various systems increases our confidence about the lack of data corruption in put and take.

Tests should be run on multiprocessor systems to increase the diversity of potential interleavings. However, having more than a few CPUs does not necessarily make tests more effective. To maximize the chance of detecting timing-sensitive data races, there should be more active threads than CPUs, so that at any given time some threads are running and some are switched out, thus reducing the predicatability of interactions between threads.

Listing 12.5. Producer-consumer Test Program for BoundedBuffer.

public class PutTakeTest {
 private static final ExecutorService pool
 = Executors.newCachedThreadPool();
 private final AtomicInteger putSum = new AtomicInteger(0);
 private final AtomicInteger takeSum = new AtomicInteger(0);
 private final CyclicBarrier barrier;
 private final BoundedBuffer bb;
 private final int nTrials, nPairs;

 public static void main(String[] args) {
 new PutTakeTest(10, 10, 100000).test(); // sample parameters
 pool.shutdown();
 }

 PutTakeTest(int capacity, int npairs, int ntrials) {
 this.bb = new BoundedBuffer(capacity);
 this.nTrials = ntrials;
 this.nPairs = npairs;
 this.barrier = new CyclicBarrier(npairs* 2 + 1);
 }

 void test() {
 try {
 for (int i = 0; i < nPairs; i++) {
 pool.execute(new Producer());
 pool.execute(new Consumer());
 }
 barrier.await(); // wait for all threads to be ready
 barrier.await(); // wait for all threads to finish
 assertEquals(putSum.get(), takeSum.get());
 } catch (Exception e) {
 throw new RuntimeException(e);
 }
 }

 class Producer implements Runnable { /* Listing 12.6 */ }

 class Consumer implements Runnable { /* Listing 12.6 */ }
}

Listing 12.6. Producer and Consumer Classes Used in PutTakeTest.

/* inner classes of PutTakeTest (Listing 12.5) */
class Producer implements Runnable {
 public void run() {
 try {
 int seed = (this.hashCode() ^ (int)System.nanoTime());
 int sum = 0;
 barrier.await();
 for (int i = nTrials; i > 0; --i) {
 bb.put(seed);
 sum += seed;
 seed = xorShift(seed);
 }
 putSum.getAndAdd(sum);
 barrier.await();
 } catch (Exception e) {
 throw new RuntimeException(e);
 }
 }
}

class Consumer implements Runnable {
 public void run() {
 try {
 barrier.await();
 int sum = 0;
 for (int i = nTrials; i > 0; --i) {
 sum += bb.take();
 }
 takeSum.getAndAdd(sum);
 barrier.await();
 } catch (Exception e) {
 throw new RuntimeException(e);
 }
 }
}

In tests that run until they complete a fixed number of operations, it is possible that the test case will never finish if the code being tested encounters an exception due to a bug. The most common way to handle this is to have the test framework abort tests that do not terminate within a certain amount of time; how long to wait should be determined empirically, and failures must then be analyzed to ensure that the problem wasn't just that you didn't wait long enough. (This problem is not unique to testing concurrent classes; sequential tests must also distinguish between long-running and infinite loops.)

12.1.4. Testing Resource Management

The tests so far have been concerned with a class's adherence to its specificationthat it does what it is supposed to do. A secondary aspect to test is that it does not do things it is not supposed to do, such as leak resources. Any object that holds or manages other objects should not continue to maintain references to those objects longer than necessary. Such storage leaks prevent garbage collectors from reclaiming memory (or threads, file handles, sockets, database connections, or other limited resources) and can lead to resource exhaustion and application failure.

Resource management issues are especially important for classes like BoundedBufferthe entire reason for bounding a buffer is to prevent application failure due to resource exhaustion when producers get too far ahead of consumers. Bounding causes overly productive producers to block rather than continue to create work that will consume more and more memory or other resources.

Undesirable memory retention can be easily tested with heap-inspection tools that measure application memory usage; a variety of commercial and open-source heap-profiling tools can do this. The testLeak method in Listing 12.7 contains placeholders for a heap-inspection tool to snapshot the heap, which forces a garbage collection[5] and then records information about the heap size and memory usage.

[5] Technically, it is impossible to force a garbage collection; System.gc only suggests to the JVM that this might be a good time to perform a garbage collection. HotSpot can be instructed to ignore System.gc calls with -XX:+DisableExplicitGC.

The testLeak method inserts several large objects into a bounded buffer and then removes them; memory usage at heap snapshot #2 should be approximately the same as at heap snapshot #1. On the other hand, if doExtract forgot to null out the reference to the returned element (items[i]=null), the reported memory usage at the two snapshots would definitely not be the same. (This is one of the few times where explicit nulling is necessary; most of the time, it is either not helpful or actually harmful [EJ Item 5].)

12.1.5. Using Callbacks

Callbacks to client-provided code can be helpful in constructing test cases; callbacks are often made at known points in an object's lifecycle that are good opportunities to assert invariants. For example, ThreadPoolExecutor makes calls to the task Runnables and to the ThreadFactory.

Listing 12.7. Testing for Resource Leaks.

class Big { double[] data = new double[100000]; }

void testLeak() throws InterruptedException {
 BoundedBuffer bb = new BoundedBuffer(CAPACITY);
 int heapSize1 = /* snapshot heap */ ;
 for (int i = 0; i < CAPACITY; i++)
 bb.put(new Big());
 for (int i = 0; i < CAPACITY; i++)
 bb.take();
 int heapSize2 = /* snapshot heap */ ;
 assertTrue(Math.abs(heapSize1-heapSize2) < THRESHOLD);
}

Testing a thread pool involves testing a number of elements of execution policy: that additional threads are created when they are supposed to, but not when they are not supposed to; that idle threads get reaped when they are supposed to, etc. Constructing a comprehensive test suite that covers all the possibilities is a major effort, but many of them can be tested fairly simply individually.

We can instrument thread creation by using a custom thread factory. TestingThreadFactory in Listing 12.8 maintains a count of created threads; test cases can then verify the number of threads created during a test run. TestingThreadFactory could be extended to return a custom Thread that also records when the thread terminates, so that test cases can verify that threads are reaped in accordance with the execution policy.

Listing 12.8. Thread Factory for Testing THReadPoolExecutor.

class TestingThreadFactory implements ThreadFactory {
 public final AtomicInteger numCreated = new AtomicInteger();
 private final ThreadFactory factory
 = Executors.defaultThreadFactory();

 public Thread newThread(Runnable r) {
 numCreated.incrementAndGet();
 return factory.newThread(r);
 }
}

If the core pool size is smaller than the maximum size, the thread pool should grow as demand for execution increases. Submitting long-running tasks to the pool makes the number of executing tasks stay constant for long enough to make a few assertions, such as testing that the pool is expanded as expected, as shown in Listing 12.9.

Listing 12.9. Test Method to Verify Thread Pool Expansion.

public void testPoolExpansion() throws InterruptedException {
 int MAX_SIZE = 10;
 ExecutorService exec = Executors.newFixedThreadPool(MAX_SIZE);

 for (int i = 0; i < 10* MAX_SIZE; i++)
 exec.execute(new Runnable() {
 public void run() {
 try {
 Thread.sleep(Long.MAX_VALUE);
 } catch (InterruptedException e) {
 Thread.currentThread().interrupt();
 }
 }
 });
 for (int i = 0;
 i < 20 && threadFactory.numCreated.get() < MAX_SIZE;
 i++)
 Thread.sleep(100);
 assertEquals(threadFactory.numCreated.get(), MAX_SIZE);
 exec.shutdownNow();
}

12.1.6. Generating More Interleavings

Since many of the potential failures in concurrent code are low-probability events, testing for concurrency errors is a numbers game, but there are some things you can do to improve your chances. We've already mentioned how running on multiprocessor systems with fewer processors than active threads can generate more interleavings than either a single-processor system or one with many processors. Similarly, testing on a variety of systems with different processor counts, operating systems, and processor architectures can disclose problems that might not occur on all systems.

A useful trick for increasing the number of interleavings, and therefore more effectively exploring the state space of your programs, is to use THRead.yield to encourage more context switches during operations that access shared state. (The effectiveness of this technique is platform-specific, since the JVM is free to treat THRead.yield as a no-op [JLS 17.9]; using a short but nonzero sleep would be slower but more reliable.) The method in Listing 12.10 transfers credits from one account to another; between the two update operations, invariants such as "sum of all accounts equals zero" do not hold. By sometimes yielding in the middle of an operation, you may activate timing-sensitive bugs in code that does not use adequate synchronization to access state. The inconvenience of adding these calls for testing and removing them for production can be reduced by adding them using aspect-oriented programming (AOP) tools.

Listing 12.10. Using THRead.yield to Generate More Interleavings.

public synchronized void transferCredits(Account from,
 Account to,
 int amount) {
 from.setBalance(from.getBalance() - amount);
 if (random.nextInt(1000) > THRESHOLD)
 Thread.yield();
 to.setBalance(to.getBalance() + amount);
}

Testing for Performance

Introduction

Part I: Fundamentals

Thread Safety

Sharing Objects

Composing Objects

Building Blocks

Part II: Structuring Concurrent Applications

Part II: Structuring Concurrent Applications

Task Execution

Cancellation and Shutdown

Applying Thread Pools

GUI Applications

Part III: Liveness, Performance, and Testing

Avoiding Liveness Hazards

Performance and Scalability

Testing Concurrent Programs

Part IV: Advanced Topics

Explicit Locks

Building Custom Synchronizers

Atomic Variables and Nonblocking Synchronization

The Java Memory Model