10.5 A Robust Implementation of Multiple Timers

Team-FLY

What happens if a SIGALRM signal is delivered during execution of the timerstart function? Both the timerhandler and the timerstart functions modify the timers data structure, a shared resource. This is the classical critical section problem for shared variables, and care must be taken to ensure that the timers data structure is not corrupted. It is difficult to determine by testing alone whether such a problem exists in the code. The events that might corrupt the data structure are rare and usually do not show up during testing. If such an event did occur, it would not be easily repeatable, so there might be little information about its cause.

A race condition occurs when the outcome of a program depends on the exact order in which different threads of execution execute statements. The timerstart function is executed by the main thread of execution. That same thread executes timerhandler, but the thread that generates the SIGALRM signal determines when the timer expires. You can prevent race conditions of this type by ensuring that the critical sections are executed in a mutually exclusive manner.

You must analyze the problem to determine where the critical sections are. In this case, the analysis is simple since there is only one global variable, the timers data structure. Any function that modifies this structure must do so at a time when the SIGALRM signal handler may not be entered. The simplest approach is to block the SIGALRM signal before modifying the timers data structure.
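The blocking approach can be sketched as follows. The helper names blocksigalrm, restoresigmask and sigalrmblocked are hypothetical and are not part of the timers module of Section 10.4; only sigprocmask and the signal set functions are standard POSIX calls.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>

/* Hypothetical helpers; the names are illustrative, not from the timers module. */

/* Block SIGALRM and save the previous signal mask in *oldmask.
   Returns 0 on success and -1 on error. */
int blocksigalrm(sigset_t *oldmask) {
    sigset_t blockset;

    if (sigemptyset(&blockset) == -1 || sigaddset(&blockset, SIGALRM) == -1)
        return -1;
    return sigprocmask(SIG_BLOCK, &blockset, oldmask);
}

/* Restore the signal mask that was saved by blocksigalrm. */
int restoresigmask(const sigset_t *oldmask) {
    return sigprocmask(SIG_SETMASK, oldmask, NULL);
}

/* Return 1 if SIGALRM is currently blocked, 0 if it is not, -1 on error. */
int sigalrmblocked(void) {
    sigset_t current;

    if (sigprocmask(SIG_SETMASK, NULL, &current) == -1)
        return -1;
    return sigismember(&current, SIGALRM);
}
```

A function such as timerstart would call blocksigalrm on entry and restoresigmask before each return. Restoring the saved mask, rather than unconditionally unblocking SIGALRM, is a design choice that keeps the signal blocked if the caller had already blocked it.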

Just blocking SIGALRM may not be sufficient. What happens if the interval timer expires during the execution of the timerstart function while SIGALRM is blocked? The timerstart function might make a new timer the running timer and reset the interval timer. Before the timerstart function returns, it unblocks SIGALRM. At this point, the pending signal is delivered and the handler assumes that the new timer has expired. Although this sequence of events is extremely unlikely, a correctly working program must account for all possibilities. Exercise 10.7 shows another problem.

Exercise 10.7

Describe a sequence of events in which the timerstop function could fail even if it blocked the signal on entry and unblocked it on exit.

Answer:

The timerstop function blocks the SIGALRM signal. The timer to be stopped then expires (i.e., the interval timer generates a signal). This signal is not immediately delivered to the process, since the signal is blocked. The timerstop function then starts the interval timer corresponding to the next timer to expire. Before it returns, the timerstop function unblocks the signal and the signal is delivered. The signal handler behaves as if the running timer just expired, when in fact a different timer had expired.
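The mechanism behind this failure, a signal generated while blocked being delivered as soon as it is unblocked, can be demonstrated in isolation. The following is a minimal sketch, not code from the timers module: the names stalesignaldemo, countalarm and alarmsseen are hypothetical, and raise stands in for the expiration of the interval timer.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t alarmsseen = 0;   /* handler invocations so far */

static void countalarm(int signo) {            /* stand-in for timerhandler */
    (void)signo;
    alarmsseen++;
}

/* Generate SIGALRM while it is blocked, then unblock it. Returns the number
   of handler calls triggered by the unblock, or -1 on error. */
int stalesignaldemo(void) {
    struct sigaction act, oldact;
    sigset_t alrmset, oldmask;
    int before, whileblocked, afterunblock;

    act.sa_handler = countalarm;
    act.sa_flags = 0;
    if (sigemptyset(&act.sa_mask) == -1 ||
        sigaction(SIGALRM, &act, &oldact) == -1)
        return -1;
    if (sigemptyset(&alrmset) == -1 || sigaddset(&alrmset, SIGALRM) == -1 ||
        sigprocmask(SIG_BLOCK, &alrmset, &oldmask) == -1)
        return -1;
    before = alarmsseen;
    if (raise(SIGALRM) != 0)               /* simulated expiration while blocked */
        return -1;
    whileblocked = alarmsseen - before;    /* handler has not run yet */
    if (sigprocmask(SIG_UNBLOCK, &alrmset, NULL) == -1)
        return -1;                         /* pending signal is delivered here */
    afterunblock = alarmsseen - before;
    if (sigprocmask(SIG_SETMASK, &oldmask, NULL) == -1 ||
        sigaction(SIGALRM, &oldact, NULL) == -1)
        return -1;
    return (whileblocked == 0) ? afterunblock : -1;
}
```

Anything the program does between the raise and the unblock, such as starting the interval timer for a different timer, happens before the handler runs, so the handler sees state that no longer matches the expiration that generated the signal.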

The simplest solution to the problem described in Exercise 10.7 is to modify the hardwaretimer module. The stophardwaretimer function (which should be called with the SIGALRM signal blocked) should stop the timer and check to see if the SIGALRM signal is pending by using sigpending . If it is, the stophardwaretimer function removes the signal either by calling sigwait or by ignoring it and catching it again. The sethardwaretimer function can solve a similar problem by calling stophardwaretimer .
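One way the sigwait variant of this fix might look is sketched below. It assumes that the hardwaretimer module drives the timers with the ITIMER_REAL interval timer and that the caller has already blocked SIGALRM. The name stophardwaretimer_sketch is hypothetical, chosen to avoid suggesting that this is the module's actual code.

```c
#define _XOPEN_SOURCE 700
#include <signal.h>
#include <stddef.h>
#include <sys/time.h>

/* Sketch only. Assumes ITIMER_REAL is the underlying timer and that SIGALRM
   is blocked by the caller. Returns 0 on success and -1 on error. */
int stophardwaretimer_sketch(void) {
    struct itimerval off = {{0, 0}, {0, 0}};   /* zero it_value disarms the timer */
    sigset_t pending, alrmset;
    int signo;

    if (setitimer(ITIMER_REAL, &off, NULL) == -1)
        return -1;
    if (sigpending(&pending) == -1)
        return -1;
    if (sigismember(&pending, SIGALRM) == 1) {
        /* The timer expired before it was stopped. Consume the stale signal
           so that it is not delivered when SIGALRM is later unblocked. */
        if (sigemptyset(&alrmset) == -1 || sigaddset(&alrmset, SIGALRM) == -1)
            return -1;
        if (sigwait(&alrmset, &signo) != 0)    /* returns at once: signal is pending */
            return -1;
    }
    return 0;
}
```

Because the timer is disarmed before the pending set is examined, no new SIGALRM from the interval timer can arrive between the check and the return.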

Exercise 10.8

How would you test to see if you solved this problem correctly?

Answer:

This cannot be done by simple testing alone, since the problem occurs only when a timer expires in a narrow window. To test this, you will have to widen that window by making timerstop take some extra time.

Exercise 10.9

What would happen if you put a call to sleep(10) in timerstop to increase the chance that the error would occur?

Answer:

The sleep function might be implemented with SIGALRM, so sleep should not be called from a program that catches SIGALRM; doing so can produce unpredictable results. The nanosleep function does not interact with SIGALRM and could be used in timerstop instead.
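A delay based on nanosleep might be written as in the following sketch. The function name testdelay and its millisecond parameter are hypothetical choices for illustration. If a signal interrupts the sleep, the loop resumes with the time that remains.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <time.h>

/* Hypothetical test helper: suspend the caller for ms milliseconds without
   using SIGALRM. Returns 0 on success and -1 on error. */
int testdelay(long ms) {
    struct timespec request;
    struct timespec remaining;

    if (ms < 0)
        return -1;
    request.tv_sec = (time_t)(ms / 1000);
    request.tv_nsec = (ms % 1000) * 1000000L;
    while (nanosleep(&request, &remaining) == -1) {
        if (errno != EINTR)          /* a real error rather than an interruption */
            return -1;
        request = remaining;         /* interrupted: sleep for the rest */
    }
    return 0;
}
```

A call such as testdelay(10000) placed inside timerstop, after SIGALRM has been blocked, would give the running timer ten seconds in which to expire.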

Program 10.4 is a function that can be used to waste a number of microseconds by busy waiting. It calls gettimeofday in a loop until the required number of microseconds has passed.

Program 10.4 `wastetime.c`

A function that does busy waiting for a given number of microseconds.

    #include <stdio.h>
    #include <sys/time.h>
    #define MILLION 1000000L

    int wastetime(int maxus) {               /* waste maxus microseconds of time */
        long timedif;
        struct timeval tp1, tp2;

        if (gettimeofday(&tp1, NULL)) {
            fprintf(stderr, "Failed to get initial time\n");
            return 1;
        }
        timedif = 0;
        while (timedif < maxus) {
            if (gettimeofday(&tp2, NULL)) {
                fprintf(stderr, "Failed to get check time\n");
                return 1;
            }
            timedif = MILLION*(tp2.tv_sec - tp1.tv_sec) +
                      tp2.tv_usec - tp1.tv_usec;
            if (timedif < 0)
                break;
        }
        return 0;
    }

Analyze the timerstart and timerstop functions and modify the implementation of Section 10.4 so that the timers are handled robustly. Devise a method of testing to verify that the program works correctly. (The test will involve simulating rare events.)
