2.5 Program Safety

2.5.1 Race Conditions

As pointed out in Section 2.3, problems unique to concurrent programs occur because of the nondeterministic interactions between threads. At the heart of most of these problems is a race condition. A race condition occurs when an activity, such as a thread, produces a correct answer as long as it can complete a task without interruption, but produces an incorrect answer if another activity interrupts it in the middle of that task. A simple example of a race condition is two threads printing reports to a single printer. If both threads are allowed to access the printer at the same time, the data for the two reports can be mixed together, obviously ruining both reports. However, if one report finishes before the next one begins, both reports will be correct. The problem can be seen as a race: the thread printing the first report is racing to finish before the second thread starts. If it wins the race, the program is correct; if it loses, both threads produce incorrect results.

Two conditions are necessary for a race condition to occur. First, a resource (in this case, a printer) must be shared between the two threads. Second, an execution path must exist in which the threads use the shared resource in an unsafe manner (in this case, the ability of the two threads to access the printer at the same time). If these two conditions exist, a race condition is present in the program even if the problem never actually occurs, because the problem could happen. Even if a programmer is satisfied that the likelihood is extremely small, the program is dangerous because it is unstable. If a later change invalidates the assumptions that made the program appear safe, the program can begin to produce wrong answers. These problems are often more difficult to solve because they are intermittent and not consistently reproducible.

This is a problem for many programmers because they have a habit of "hacking" a program: writing the basic structure and then tweaking it and testing the results until it produces the desired behavior. This strategy is not effective for concurrent programs, because a concurrent program cannot be shown to be correct by testing. So, if a concurrent program cannot be tested to show it is safe, how can it be shown to be safe? Only two valid mechanisms are available. The first is to give a proof by induction that the program satisfies the conditions necessary for it to be safe. The second is to develop the program using strategies that are known to be safe, which requires that the implementation be planned and carried out carefully.

What types of strategies can be applied to make a program safe? The necessary conditions for a race condition suggest strategies to correct the problem. One strategy is simply not to share resources; for example, the system could have two printers, with each thread accessing its own printer. A program that does not share resources is obviously safe, and this strategy is often used as part of a concurrent program (for example, in the Java Event Model; see Chapter 7); however, such an approach is not always practical. Few computers have a printer for every possible output that can be produced, so some sharing is inevitable.

The second strategy for handling race conditions is to make sure that the shared resource is in a safe state before allowing another thread to access it. In the case of the printer, this means that a thread that wishes to use a printer must be guaranteed that it will have exclusive access to a printer until it has completed printing, and only then will a second thread be allowed to print. This strategy of isolating and locking a resource while it is in an unsafe state is the basis of most strategies for writing safe concurrent programs and is the topic of the rest of this section.

When doing concurrent programming, the shared resources that are most often subject to race conditions are program variables. The next section shows how race conditions can occur in a simple program that shares variables and shows the mechanisms built into Java for safely handling these shared variables.

2.5.2 Race Conditions in Programs

Just as in the printer example, race conditions exist in programs because multiple threads attempt to access a shared resource, often a program variable. Because concurrent threads interleave in a nondeterministic fashion, the variable can sometimes be changed so that the program produces an incorrect result. For example, consider Program2.6 (Exhibit 17). This program simply swaps two numbers in each of two threads. This program can produce two correct answers, when each thread swaps its own numbers, as shown in Exhibit 18. However, it can also produce incorrect answers when the threads interleave and the second thread changes the value in the tmp variable being used by the first thread, also shown in Exhibit 18.

Exhibit 17: Program2.6: Program with a Race Condition

start example

public class ShowRaceCondition implements Runnable {
    int val1, val2;                    // store the values to be swapped.
    static SwapInt so = new SwapInt(); // SwapInt object. Note that it is
                                       // static so it is shared between all
                                       // instances of this class.

    /**
     * Constructor. It just takes two integers and stores them
     * so that they can be swapped later.
     */
    public ShowRaceCondition(int i1, int i2) {
        val1 = i1; val2 = i2;
    }

    /**
     * Run method that simply swaps the integers and prints
     * the result.
     */
    public void run() {
        so.swap(this);
        System.out.println("Val1 = " + val1 + " Val2 = " + val2);
    }

    /**
     * main method that creates and starts the objects and threads.
     */
    public static void main(String args[]) {
        (new Thread(new ShowRaceCondition(4,7))).start();
        (new Thread(new ShowRaceCondition(2,5))).start();
    }
}

/**
 * Purpose: Provide an object that swaps integers.
 *          An object is needed so that an instance variable
 *          exists that can be shared. If a local variable were
 *          used, it would be stored on the stack and not shared
 *          between threads, and no race condition would exist.
 */
class SwapInt {
    volatile private int tmp;

    public void swap(ShowRaceCondition s) {
        try {
            Thread.sleep((int) (Math.random() * 100));
            tmp = s.val1;
            s.val1 = s.val2;
            Thread.sleep((int) (Math.random() * 100));
            s.val2 = tmp;
        } catch (InterruptedException e) {
        }
    }
}

end example

Exhibit 18: Possible Output from Program2.6 (Exhibit 17)

start example

(Each column shows the output of one run of the program.)

Correct Output

    Val1 = 5 Val2 = 2        Val1 = 7 Val2 = 4
    Val1 = 7 Val2 = 4        Val1 = 5 Val2 = 2

Incorrect Output

    Val1 = 5 Val2 = 4        Val1 = 7 Val2 = 2
    Val1 = 7 Val2 = 4        Val1 = 5 Val2 = 2

end example

The reason this program can produce these incorrect results is that it is unsafe: it shares a resource, the SwapInt object and its variable tmp, and it uses that variable in an unsafe manner. The unsafe scenario is that one thread enters the swap method, saves its val1 to the variable tmp, and then calls Thread.sleep. Because that thread is now sleeping, the SVM chooses the second thread to execute and does a context switch to it. The second thread can then enter the swap method and overwrite the value of tmp, losing the value stored by the first thread. This is an example of a race condition on a variable. Often the first thread executes the swap method completely before the second thread enters it, so the old value in tmp no longer matters and the program produces a correct answer. However, if the first thread loses the race and does not finish the swap method before the second thread enters it, the value of tmp is changed, which causes the first thread to swap its values incorrectly.

One of the reasons the program is unsafe is that the SwapInt object is shared. It is shared because it is declared static, so there is only one copy of the object for the class, and that copy is shared by all the ShowRaceCondition objects. One way to make this program safe is simply to remove the static modifier, creating a SwapInt object for each ShowRaceCondition object. The same effect could be achieved by declaring the SwapInt object inside the run method, giving each thread its own SwapInt object, as sketched below. An object that is not shared is safe. However, as with printers, it is often necessary to share an object, so a method must be developed to ensure that the data can be used safely. Because this program shows how a race condition can occur, it is also used to show how to solve it.
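
The following is a minimal sketch of the second idea, not code from the text: the class name NoSharedSwap is hypothetical, and it assumes a SwapInt variant whose swap method accepts this class. Because each thread constructs its own SwapInt inside run, no tmp variable is shared and the race condition cannot occur.

   // A hedged sketch: each thread builds its own SwapInt inside run(),
   // so the tmp variable inside SwapInt is never shared between threads.
   public class NoSharedSwap implements Runnable {
       int val1, val2;

       public NoSharedSwap(int i1, int i2) {
           val1 = i1; val2 = i2;
       }

       public void run() {
           SwapInt mySwapper = new SwapInt(); // local to this thread, never shared
           mySwapper.swap(this);              // assumes a swap(NoSharedSwap) method
           System.out.println("Val1 = " + val1 + " Val2 = " + val2);
       }
   }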

2.5.3 Locking Objects

The race condition in Program2.6 (Exhibit 17) occurred because the two threads could both be executing in the swap method at the same time. What is needed is a mechanism that locks the swap method when one thread enters it, preventing other threads from entering the method until the first thread has completed executing it. When the first thread leaves the method, it unlocks it, allowing another thread to obtain the lock and run in the method. This solution is shown in Program2.7 (Exhibits 19 and 20). Program2.7 is made up of two classes: a binary semaphore (as shown in Exhibit 19, Program2.7a) and a class that implements a safe solution to the problem in Program2.6 (Exhibit 17) (as shown in Exhibit 20, Program2.7b).

Exhibit 19: Program2.7a: Binary Semaphore

start example

public class BinarySemaphore {
    private boolean locked = false;

    /**
     * Get the lock.
     */
    public synchronized void getLock() {
        while (locked) {
            try {
                wait();
            } catch (InterruptedException e) {
            }
        }
        locked = true;
    }

    /**
     * Release the lock.
     */
    public synchronized void releaseLock() {
        locked = false;
        notify();
    }
}

end example

Exhibit 20: Program2.7b: Program that Safely Swaps Values

start example

public class SolveRaceCondition implements Runnable {
    int val1, val2;
    static SwapInt SO = new SwapInt();

    public SolveRaceCondition(int i1, int i2) {
        val1 = i1; val2 = i2;
    }

    public void run() {
        SO.swap(this);
        System.out.println("Val1 = " + val1 + " Val2 = " + val2);
    }

    public static void main(String args[]) {
        (new Thread(new SolveRaceCondition(4,7))).start();
        (new Thread(new SolveRaceCondition(2,5))).start();
    }
}

/**
 * Purpose:   This SwapInt object shows how a lock can be
 *            used to prevent two threads from sharing a
 *            critical section, thus fixing a race condition.
 *
 * Procedure: 1 - Get the lock; 2 - Execute the critical
 *            section; 3 - Release the lock.
 */
class SwapInt {
    private volatile int tmp;
    private BinarySemaphore mutex = new BinarySemaphore();

    public void swap(SolveRaceCondition s) {
        mutex.getLock();
        try {
            Thread.sleep((int) (Math.random() * 100));
            tmp = s.val1;
            s.val1 = s.val2;
            Thread.sleep((int) (Math.random() * 100));
            s.val2 = tmp;
        } catch (InterruptedException e) {
        }
        mutex.releaseLock();
    }
}

end example

One way to implement locking is to use a binary semaphore, such as the one implemented in Exhibit 19 (Program2.7a), which is shown later to be correct; however, for now you should assume that this binary semaphore is indeed safe. To use this binary semaphore, a thread makes a call to the method getLock. If the variable locked is false, the semaphore is not locked, so the thread obtains the lock by setting the value of locked to true. Now, if another thread tries to get the lock by calling getLock, it finds that the lock is taken (i.e., the value of locked is true), and it must wait until the lock is free to continue.

The thread that obtained the lock is now free to run in the critical section of the program. A critical section is a portion of code where only one thread may be executing at a time to ensure the safety of the program. The thread running in the critical section now does the swap operation, and because no other thread can enter this section of code the swap is safe.

When the thread has finished executing in the critical section, it makes a call to the releaseLock method, which releases the lock by setting the value of locked to false and notifying any thread that is currently waiting for the lock that it may now try to obtain the lock and proceed. For now, do not worry about the syntax or how the synchronized, wait, and notify statements actually work; they are covered in detail later. Just understand that a lock can be used to suspend threads until the lock is freed, ensuring the safety of using a critical section.
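
As a usage sketch, not code from the text, the general pattern for protecting a critical section with this BinarySemaphore is shown below. The CriticalWorker class and doCriticalWork method are hypothetical, and the try/finally block is an extra precaution beyond what the text's examples do: it guarantees the lock is released even if the critical section throws an exception.

   class CriticalWorker {
       private BinarySemaphore mutex = new BinarySemaphore();

       void doCriticalWork() {
           mutex.getLock();         // 1 - get the lock; waits if another thread holds it
           try {
               // 2 - critical section: only one thread at a time executes this code
           } finally {
               mutex.releaseLock(); // 3 - release the lock so a waiting thread can proceed
           }
       }
   }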

The binary semaphore has now been added to Exhibit 17 (Program2.6) to produce Exhibit 20 (Program2.7b). Note that the only difference is that in Exhibit 20 (Program2.7b) a locking variable, called mutex (for mutual exclusion), is created as part of the SwapInt object to protect the critical section in the swap method. The lock on mutex is obtained when a thread enters the swap method and released when the thread leaves it. The program is now safe, and it can be tested many times to show that it works; however, as was pointed out earlier, a concurrent program cannot be tested to show it is correct. The only valid way to show that a concurrent program is correct is to do an inductive proof, which is done for Exhibit 20 (Program2.7b) in the next section.

Before continuing, the reader should note that the lock for the swap method is associated with the SwapInt object, not with the swap method or the SwapInt class. This is often confusing to programmers new to concurrent programming. Because the lock belongs to the object, if more than one method in the SwapInt object uses the mutex lock, only one thread can be in any of those methods for that object at any time. If there are multiple SwapInt objects, one thread can be in the swap method of each object at the same time, as the sketch below illustrates. This should become clearer when the details of the BinarySemaphore class are discussed.
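
The following is a small fragment, not from the text, sketching this per-object behavior; the variable names a and b are hypothetical.

   SwapInt a = new SwapInt();   // a has its own mutex (its own lock)
   SwapInt b = new SwapInt();   // b has a different mutex

   // Thread 1 in a.swap(...) and thread 2 in b.swap(...) can run at the same
   // time, because they acquire different locks.
   // Thread 1 in a.swap(...) and thread 2 in a.swap(...) cannot overlap,
   // because both need the single lock belonging to object a.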

2.5.4 Proving Exhibit 20 (Program2.7b) Is Correct

As was pointed out earlier, a concurrent program cannot be tested to show it is correct; an inductive proof is necessary to show the program is correct. The technique for performing an inductive proof on a program is important to understand, even if the proof is rarely done, because the inductive proof provides the reasoning for showing a concurrent program is correct.

The inductive proof of Exhibit 20 (Program2.7b) starts with the assumption that the BinarySemaphore object is safe. Also, the variables used in the SwapInt class must be private, so that a thread cannot modify them except through the swap method, which obtains and releases the lock by calling the getLock and releaseLock methods. The base case and induction hypothesis are stated, and the inductive step is carried out to show that the program is indeed safe:

  • Base case: The two threads start at the beginning of the run method.

  • Induction hypothesis: One thread is about to enter the swap method, while the other thread is still executing code prior to entering the swap method. Because neither thread has yet run any code in the swap method, the program is safe to this point.

  • Inductive step: Show that both threads eventually execute past the swap method and that the variables are swapped correctly.

As proof, all we have to do is show that all paths from the inductive hypothesis to the inductive step result in a correct program. Assuming thread 1 has entered the swap method, the possibilities are as follows:

  • Thread 1 executes to completion of the swap method before thread 2 executes again. Thread 1 thus finishes before thread 2 can enter the swap method, so it is safe. Thread 2 enters the swap method only after thread 1 is no longer using it, so it is also safe. This path is safe.

  • Thread 1 and thread 2 interleave execution, but thread 2 never tries to enter the swap method. This path is again safe.

  • Thread 1 and thread 2 interleave, but thread 2 attempts to enter the swap method. Thread 2 is stopped on the lock, and thus thread 1 can complete safely, followed by thread 2. This path is also safe.

The second part of the proof would be to show that the program is safe if thread 2 enters the swap method before thread 1, and the proof is conducted the same as above, with the result that the program in Exhibit 20 (Program2.7b) is indeed shown to be safe.

2.5.5 The Synchronized Modifier

The proof in the previous section assumed that the BinarySemaphore object was safe. The next two sections show that this object is indeed safe and explain the workings of the wait and notify methods and the synchronized modifier. They also show how these methods can be used to implement a special type of object known as a monitor, of which the BinarySemaphore class is a good example.

The idea of protecting critical sections of programs using a lock is so useful that Java implemented it using the synchronized modifier. The synchronized modifier is placed at the beginning of a block of code, in this case an entire method. Associated with every instance of the Object class is a binary semaphore (which will be referred to as a lock). Because every object in Java extends the Object class, every object has a lock. The synchronized keyword automatically inserts the logic to obtain the lock on the object at the beginning of the block and inserts code for releasing the lock at the end of the block. When a thread attempts to enter a synchronized block of code, the lock for the object is checked, just as it was in the SwapInt object. If the lock is not taken, the thread obtains the lock, preventing any other thread from entering any other synchronized block in that object. When the thread leaves the block, the lock is released.

So the race condition in Exhibit 17 (Program2.6) could easily have been solved by adding the synchronized keyword to the swap method, as shown in Exhibit 21 (Program2.8), and the BinarySemaphore object is not needed. However, actually implementing the BinarySemaphore object is interesting for two reasons. First, it explains what happens when the synchronized modifier is used, and, second, implementation of the BinarySemaphore can be used to demonstrate the wait and notify methods.

Exhibit 21: Program2.8: Safe Swap Program Using the Synchronized Modifier

start example

public class SolveRaceCondition_2 implements Runnable {
    private int val1, val2;
    static SwapInt so = new SwapInt();

    public SolveRaceCondition_2(int i1, int i2) {
        val1 = i1; val2 = i2;
    }

    public void run() {
        so.swap(this);
        System.out.println("Val1 = " + val1 + " Val2 = " + val2);
    }

    public static void main(String args[]) {
        (new Thread(new SolveRaceCondition_2(4,7))).start();
        (new Thread(new SolveRaceCondition_2(2,5))).start();
    }
}

/**
 * Purpose: This class shows the use of the synchronized
 *          modifier to lock the swap method.
 */
class SwapInt {
    private int tmp;

    synchronized public void swap(SolveRaceCondition_2 s) {
        try {
            Thread.sleep((int) (Math.random() * 100));
            tmp = s.val1;
            s.val1 = s.val2;
            Thread.sleep((int) (Math.random() * 100));
            s.val2 = tmp;
        } catch (InterruptedException e) {
        }
    }
}

end example

Before explaining the use of wait and notify, however, one last point must be made about the synchronized modifier: it is associated with a block of code and an object. The block of code is often an entire method, and the object is often the "this" object. However, the block can be just part of a method, and the object can be any object, not simply the "this" object. To synchronize a block of code, the synchronized statement is placed at the start of the block. To synchronize on an object other than "this," that object is specified as a parameter to the synchronized statement. For example, the synchronization of the swap method in Exhibit 21 (Program2.8) was achieved by synchronizing the entire method:

   synchronized public void swap(SolveRaceCondition_2 s) {
   }

This could have been written equivalently as:

   public void swap(SolveRaceCondition_2 s) {
       synchronized (this) {
       }
   }

Because the synchronized block uses the "this" object and contains the entire method, these two forms of the method are equivalent.
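
As a further illustration, and only as a sketch that is not from the text, a block can also synchronize on some object other than "this," such as a dedicated lock object; the class and field names here are hypothetical:

   class SwapIntWithLockObject {
       private int tmp;
       private final Object lockObject = new Object(); // hypothetical dedicated lock

       public void swap(SolveRaceCondition_2 s) {
           synchronized (lockObject) { // uses lockObject's lock, not this object's lock
               tmp = s.val1;
               s.val1 = s.val2;
               s.val2 = tmp;
           }
       }
   }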

2.5.6 The Wait and Notify/NotifyAll Methods

Because Java already implements a locking mechanism for objects through the synchronized block, the BinarySemaphore is not necessary to make the swap method correct. However, the BinarySemaphore class is interesting in that it shows how to use the synchronized keyword and the wait and notify/notifyAll methods to implement what is called a monitor. Monitors are important data structures that form the basis for much of the material in Chapter 3, as well as the rest of the book. The BinarySemaphore class is a very good example of a monitor.

An easy way to understand a monitor is to think of it as an object in which all the data are private and all the methods are protected by a single lock, so that only one thread may be executing in any method of the object at any time. Because the data of the object are private and all the methods that can access the data are synchronized, unintended interactions between threads cannot cause a race condition. This is the simplest type of monitor, one that uses complete synchronization. Such an object is always thread safe, so it can always be used safely in a program that uses threads.
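
A minimal sketch of such a completely synchronized monitor, not from the text and using a hypothetical Counter class, looks like this:

   // Complete synchronization: private data, every method synchronized,
   // so only one thread at a time can be executing inside a Counter object.
   class Counter {
       private int count = 0;                 // private data

       public synchronized void increment() { // locks this Counter object
           count = count + 1;
       }

       public synchronized int getCount() {   // also locks this Counter object
           return count;
       }
   }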

Complete synchronization, while safe, is too restrictive for many objects, such as the BinarySemaphore object. So, a slightly more complex monitor can be written that has a "state," or conditions under which the methods in the object can run. In the case of the BinarySemaphore, the getLock method can only proceed when the locked variable is false. If the locked variable is true, the thread must wait until the value has changed to false before it can continue. This is accomplished by calling the wait method.

The wait method is a method of the Object class. It is matched with the notify and notifyAll methods, also in the Object class. To understand how the wait and notify methods work, we need to refine the model of contexts. Earlier, we said that the SVM could choose any context that was ready or running to execute next. The SVM keeps a list of all the threads in the system that are ready to be run, called the ready set. Associated with each thread is its context, including the PC. The SVM chooses one thread from the ready set to run and begins to execute the code at the PC for that context. After the thread has executed for some time, possibly as little as a single line of code, the SVM takes the running thread and puts it back in the ready set. The SVM again chooses a thread to run from the ready set, possibly the same thread it just put back, and executes that thread. This cycle continues until all threads have completed and the program has finished running.

However, not all threads are always ready to run; for example, a thread may be waiting on I/O or sleeping. The SVM has no reason to consider these threads when choosing a thread to run, as they are not ready; therefore, the SVM keeps a number of sets and moves threads between the sets as the program executes. When something happens, such as a thread waking up from sleeping, the SVM moves the thread from the sleeping set to the ready set. This thread can then be selected the next time the SVM chooses a thread to move from the ready set to running.

Many such sets of threads that are not ready to run are managed by the SVM. The one of interest here is the wait set associated with an object. When a thread calls wait, it is telling the SVM that it does not want to run again until notify or notifyAll is invoked on the object on which it waited. When the wait method is called, the SVM moves the thread to the wait set for that object, and it moves the thread back to the ready set only when a notify or notifyAll call is made on the object, as shown in Exhibit 22.

Exhibit 22: SVM with Wait Queue on an Object

start example

[Figure: threads are moved from the ready set to an object's wait set by a wait call, and back to the ready set by notify or notifyAll.]

end example

The two forms of the notify call are notify and notifyAll. The notify call moves one thread (the choice is nondeterministic; it is not necessarily the thread that has waited the longest) from the wait set of the specified object to the ready set. The notifyAll call moves all threads in the wait set for the object to the ready set. As will be seen in Chapter 3, the notifyAll call, while it has more overhead, is used more frequently than the notify call.
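
As a small sketch of the difference, not from the text and using a hypothetical Gate class, notifyAll is the natural choice when every waiting thread should be allowed to recheck its condition and proceed:

   // Hypothetical Gate: threads block in await() until some thread calls open().
   class Gate {
       private boolean open = false;

       public synchronized void await() throws InterruptedException {
           while (!open) {
               wait();          // releases the Gate's lock while waiting
           }
       }

       public synchronized void open() {
           open = true;
           notifyAll();         // wakes every waiter; notify() would wake only one
       }
   }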

2.5.7 Semantics of a Wait Call

Earlier it was said that only one thread can be executing inside any synchronized method of an object at a time. This condition is necessary to ensure that the safety of the object is maintained. However, if this is true, then we have a problem with the semantics of the wait call. Consider the BinarySemaphore object in Exhibit 19 (Program2.7a) as it is used in Exhibit 20 (Program2.7b). Thread 1 enters the synchronized block by calling getLock and obtains the lock for the BinarySemaphore object. It sets the value of the locked variable to true and exits the getLock method, which releases the object's lock. Thread 2 then tries to use the BinarySemaphore by entering the getLock method. It can enter the method because the object is not locked, but it finds the variable locked to be true and so calls the wait method. If thread 2 were to continue to hold the object's lock on the BinarySemaphore, then when thread 1 later tried to call the releaseLock method it would find the BinarySemaphore object locked and could not continue. Thread 1 could not continue because it could not get the object's lock held by thread 2, and thread 2 could not continue because it would be waiting for a notify call, which could never happen while it held the BinarySemaphore object's lock. This situation is called deadlock and is the subject of Section 2.6. Notice, though, that this problem would be an unavoidable consequence if this were actually how the wait and notify methods worked.

To make the semantics of the wait method work, they must be defined so that the wait method releases the object's lock when the thread is moved to the wait set. This is what the Java wait method does: when thread 2 calls wait, it releases the lock, allowing thread 1 to call the releaseLock method, which in turn calls notify, allowing thread 2 to continue.

It should be noted that when thread 2 starts to run again, it does not hold the lock on the object, because the lock was released when the thread called wait. This could be a problem. If another thread calls getLock or releaseLock while the BinarySemaphore object is unlocked, that new thread could obtain the object's lock. If thread 2 then continued to run as if it held the lock, two threads could be running in synchronized blocks on the same object. The same problem can occur if multiple threads are waiting on the same object and notifyAll is called, which releases all of them. Therefore, to ensure that the locking is maintained, a thread that is moved from the wait set back to the ready set must first reacquire the object's lock before it can continue to run. This ensures that only one thread can execute in any synchronized block of an object at any time.
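
This is also why the getLock method in Exhibit 19 (Program2.7a) tests the locked variable in a while loop rather than an if statement: by the time a notified thread reacquires the object's lock, another thread may already have taken the binary semaphore, so the condition must be rechecked. The comparison below is a sketch of that reasoning, not code from the text; the class names are hypothetical, and only the while version is safe.

   class FragileSemaphore {
       private boolean locked = false;
       public synchronized void getLock() {
           if (locked) {                 // condition checked only once
               try { wait(); } catch (InterruptedException e) { }
           }
           locked = true;                // may run even though another thread has
       }                                 // already taken the semaphore
       public synchronized void releaseLock() {
           locked = false;
           notify();
       }
   }

   class RobustSemaphore {
       private boolean locked = false;
       public synchronized void getLock() {
           while (locked) {              // rechecked every time the thread wakes up
               try { wait(); } catch (InterruptedException e) { }
           }
           locked = true;                // runs only when locked is actually false
       }
       public synchronized void releaseLock() {
           locked = false;
           notify();
       }
   }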

This also explains why notify and notifyAll cannot guarantee that the first thread to wait will be the first to execute when the object's lock becomes free. When a thread wakes up, it is moved to the ready set. When it starts to execute, it must then try to obtain the object's lock. Because the SVM is allowed to choose any thread from the ready set to execute, another thread may have obtained the lock between the time the thread was notified and the time it runs. Thus, notify and notifyAll cannot guarantee which thread will execute first and obtain the lock. This is an important point for programmers to keep in mind when using wait and notify/notifyAll in Java. The calls are not fair (that is, the first thread to wait is not necessarily the first thread to resume execution after a notify/notifyAll), and they cannot be made fair in Java itself. A programmer must build a mechanism to make the calls fair if that is needed; such a mechanism is shown in Chapter 8.

From this discussion, we now have two ways to release a synchronized lock in Java. The first is to leave the synchronized block, which is safe; the second is to call the wait method, which might not be safe. The second alternative is required to make the semantics of the wait and notify methods work correctly; however, a problem can occur if the call to wait is not used carefully. A wait call inside a synchronized method means that the method might not run from start to finish without interference from other threads, which can reintroduce race conditions into the program. A programmer therefore needs to be careful about where a wait call is used. Normally, wait is called only near the beginning of a synchronized block (as shown in Chapter 3) or with very well-defined and carefully thought through semantics (such as the notification objects discussed in Chapter 8).

Synchronization is very powerful but difficult to use correctly. If it is used incorrectly, it will result in either race conditions or deadlock; therefore, it is recommended that synchronization be implemented using standard patterns. In Chapter 3, the wait call is always used in a standard precondition block at the beginning of the method. Chapters 7 and 8 show other more complex but safe methods to do thread synchronization. If a programmer needs to use synchronization in a non-standard way, the only way to be sure that the program is correct is by supplying an inductive proof. So, the implementation of a concurrent program must be carefully considered.


