3.7. Wait Queues

We discussed the process transition between the TASK_RUNNING state and the TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE states. Now, we look at another structure that's involved in this transition. When a process is waiting on an external event to occur, it is removed from the run queue and placed on a wait queue. Wait queues are doubly linked lists of wait_queue_t structures. The wait_queue_t structure holds all the information required to keep track of a waiting task. All tasks waiting on a particular external event are placed on the same wait queue. When the event occurs, the tasks on that wait queue are woken up, at which point they verify the condition they are waiting for and either resume sleeping or remove themselves from the wait queue and set themselves back to TASK_RUNNING. You might recall that the sys_wait4() system call uses wait queues when a parent requests the status of its forked child. Note that a task waiting for an external event (and therefore no longer on the run queue[4]) is in either the TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE state.

[4] A task is removed from the run queue once it sleeps and, therefore, yields control to another process.

A wait queue is a doubly linked list of wait_queue_t structures that hold pointers to the task structures of the processes that are blocking. Each list is headed by a wait_queue_head_t structure, which marks the head of the list and holds the list's spinlock to prevent race conditions during wait_queue_t additions and removals. Figure 3.14 illustrates the wait queue implementation. We now look at the wait_queue_t and wait_queue_head_t structures:

-----------------------------------------------------------------------
include/linux/wait.h
19  typedef struct __wait_queue wait_queue_t;
...
23  struct __wait_queue {
24    unsigned int flags;
25  #define WQ_FLAG_EXCLUSIVE  0x01
26    struct task_struct * task;
27    wait_queue_func_t func;
28    struct list_head task_list;
29  };
30
31  struct __wait_queue_head {
32    spinlock_t lock;
33    struct list_head task_list;
34  };
35  typedef struct __wait_queue_head wait_queue_head_t;
-----------------------------------------------------------------------

Figure 3.14. Wait Queue Structures


The wait_queue_t structure is comprised of the following fields:

  • flags. Either has the WQ_FLAG_EXCLUSIVE bit (0x01) set or is cleared to 0. The WQ_FLAG_EXCLUSIVE flag marks this process as an exclusive process. Exclusive and non-exclusive processes are discussed in the next section.

  • task. The pointer to the task descriptor of the process being placed on the wait queue.

  • func. A field holding a pointer to the function used to wake the task on the wait queue. This field defaults to default_wake_function(), which is covered in detail in the section, "Waiting on the Event."

    wait_queue_func_t is defined as follows:

     ------------------------------------------------------------------
     include/linux/wait.h
     typedef int (*wait_queue_func_t)(wait_queue_t *wait,
                                      unsigned mode, int sync);
     ------------------------------------------------------------------

    where wait is the pointer to the wait queue entry, mode is either TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE, and sync specifies whether the wakeup should be synchronous.

  • task_list. The structure that holds pointers to the previous and next elements in the wait queue.

The structure __wait_queue_head is the head of a wait queue list and is comprised of the following fields:

  • lock. One lock per list allows the addition and removal of items to and from the wait queue to be synchronized.

  • task_list. The structure that points to the first and last elements in the wait queue.
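Before a wait queue can be used, its head must be initialized so that its lock and list fields are in a known state. The following is a minimal sketch; my_queue and my_other_queue are hypothetical names of our own, while both initializers shown are standard 2.6 interfaces:

 -----------------------------------------------------------------------
 /* Example sketch; the queue names are hypothetical, not kernel symbols. */
 static DECLARE_WAIT_QUEUE_HEAD(my_queue);    /* static initialization */

 static wait_queue_head_t my_other_queue;     /* runtime initialization */
 ...
 init_waitqueue_head(&my_other_queue);
 -----------------------------------------------------------------------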

The "Wait Queues" section in Chapter 10, "Adding Your Code to the Kernel," describes an example implementation of a wait queue. In general, the way in which a process puts itself to sleep involves a call to one of the wait_event* macros (which is discussed shortly) or by executing the following steps, as in the example shown in Chapter 10:

1. Declaring the wait queue the process will sleep on, by way of DECLARE_WAIT_QUEUE_HEAD.

2. Adding itself to the wait queue by way of add_wait_queue() or add_wait_queue_exclusive().

3. Changing its state to TASK_INTERRUPTIBLE or TASK_UNINTERRUPTIBLE.

4. Testing for the external event and calling schedule(), if it has not occurred yet.

5. After the external event occurs, setting itself to the TASK_RUNNING state.

6. Removing itself from the wait queue by calling remove_wait_queue().
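To make these steps concrete, here is a minimal sketch of the same sequence in a hypothetical driver; the queue my_queue and the flag data_ready are inventions for illustration, not kernel symbols:

 -----------------------------------------------------------------------
 /* Example sketch; my_queue and data_ready are hypothetical. */
 static DECLARE_WAIT_QUEUE_HEAD(my_queue);       /* step 1 */
 static int data_ready;                          /* the external event */

 static void wait_for_data(void)
 {
   DECLARE_WAITQUEUE(wait, current);

   add_wait_queue(&my_queue, &wait);             /* step 2 */
   for (;;) {
     set_current_state(TASK_INTERRUPTIBLE);      /* step 3 */
     if (data_ready)                             /* step 4: test event */
       break;
     schedule();                                 /* step 4: yield CPU */
   }
   set_current_state(TASK_RUNNING);              /* step 5 */
   remove_wait_queue(&my_queue, &wait);          /* step 6 */
 }
 -----------------------------------------------------------------------

Whatever code sets data_ready would then call one of the wake_up macros on my_queue, which brings us to waking up.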

Waking up a process is handled by way of a call to one of the wake_up macros. These wake up processes that belong to a particular wait queue, placing each awakened task in the TASK_RUNNING state and back on the run queue.

Let's look at what happens when we call the add_wait_queue() functions.

3.7.1. Adding to the Wait Queue

Two different functions are used to add sleeping processes into a wait queue: add_wait_queue() and add_wait_queue_exclusive(). The two functions exist to accommodate the two types of sleeping processes. Non-exclusive waiting processes are those that wait for the return of a condition that is not shared by other waiting processes. Exclusive waiting processes are waiting for a condition that another waiting process might be waiting on, which potentially generates a race condition.

The add_wait_queue() function inserts a non-exclusive process into the wait queue. A non-exclusive process is one that will, under any circumstance, be woken up by the kernel when the event it is waiting for comes to fruition. The function clears the flags field of the wait queue struct that represents the sleeping process, acquires the wait queue's lock so that interrupts accessing the same wait queue cannot generate a race condition, adds the structure to the wait queue list, and releases the lock to make the queue available to other processes:

-----------------------------------------------------------------------
kernel/fork.c
93   void add_wait_queue(wait_queue_head_t *q, wait_queue_t *wait)
94   {
95     unsigned long flags;
96
97     wait->flags &= ~WQ_FLAG_EXCLUSIVE;
98     spin_lock_irqsave(&q->lock, flags);
99     __add_wait_queue(q, wait);
100    spin_unlock_irqrestore(&q->lock, flags);
101  }
-----------------------------------------------------------------------

The add_wait_queue_exclusive() function inserts an exclusive process into the wait queue. The function sets the WQ_FLAG_EXCLUSIVE bit in the flags field of the wait queue struct and proceeds in much the same manner as add_wait_queue(), with the exception that it adds exclusive processes at the tail end of the queue. This means that in a particular wait queue, the non-exclusive processes are at the front and the exclusive processes are at the end. This comes into play with the order in which processes in a wait queue are woken up, as we see when we discuss waking up sleeping processes:

-----------------------------------------------------------------------
kernel/fork.c
105  void add_wait_queue_exclusive(wait_queue_head_t *q, wait_queue_t *wait)
106  {
107    unsigned long flags;
108
109    wait->flags |= WQ_FLAG_EXCLUSIVE;
110    spin_lock_irqsave(&q->lock, flags);
111    __add_wait_queue_tail(q, wait);
112    spin_unlock_irqrestore(&q->lock, flags);
113  }
-----------------------------------------------------------------------

3.7.2. Waiting on the Event

The sleep_on(), sleep_on_timeout(), and interruptible_sleep_on() interfaces, although still supported in 2.6, will be removed for the 2.7 release. Therefore, we cover only the wait_event*() interfaces that are to replace the sleep_on*() interfaces.

The wait_event*() interfaces include wait_event(), wait_event_interruptible(), and wait_event_interruptible_timeout(). Figure 3.15 shows the skeleton of the calling routines.

Figure 3.15. wait_event*() Call Graph


We go through and describe the interfaces related to wait_event() and mention how the other two functions differ. The wait_event() interface is a wrapper around the call to __wait_event(), which loops indefinitely until the condition being waited upon becomes true. wait_event_interruptible_timeout() takes a third parameter, which specifies the timeout.

Unlike wait_event(), wait_event_interruptible() returns a value: -ERESTARTSYS if a signal broke the wait, or 0 if the condition was met:

-----------------------------------------------------------------------
include/linux/wait.h
137  #define wait_event(wq, condition)      \
138  do {                                   \
139    if (condition)                       \
140      break;                             \
141    __wait_event(wq, condition);         \
142  } while (0)
-----------------------------------------------------------------------
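As a usage sketch, consider a consumer that blocks until a producer satisfies the condition and issues the wakeup; the names my_queue and data_ready are again hypothetical. The interruptible variant is used here to show the return-value check just described:

 -----------------------------------------------------------------------
 /* Example sketch; my_queue and data_ready are hypothetical. */
 static DECLARE_WAIT_QUEUE_HEAD(my_queue);
 static int data_ready;

 static int consumer(void)
 {
   /* Sleep until data_ready is nonzero; a signal ends the wait early. */
   if (wait_event_interruptible(my_queue, data_ready))
     return -ERESTARTSYS;
   return 0;    /* condition was met */
 }

 static void producer(void)
 {
   data_ready = 1;       /* satisfy the condition first... */
   wake_up(&my_queue);   /* ...then wake any sleepers */
 }
 -----------------------------------------------------------------------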

The __wait_event() interface does all the work around the process state change and the descriptor manipulation:

-----------------------------------------------------------------------
include/linux/wait.h
121  #define __wait_event(wq, condition)           \
122  do {                                          \
123    wait_queue_t __wait;                        \
124    init_waitqueue_entry(&__wait, current);     \
125                                                \
126    add_wait_queue(&wq, &__wait);               \
127    for (;;) {                                  \
128      set_current_state(TASK_UNINTERRUPTIBLE);  \
129      if (condition)                            \
130        break;                                  \
131      schedule();                               \
132    }                                           \
133    current->state = TASK_RUNNING;              \
134    remove_wait_queue(&wq, &__wait);            \
135  } while (0)
-----------------------------------------------------------------------

Lines 124–126

Initialize the wait queue descriptor for the current process and add the descriptor entry to the wait queue that was passed in. Up to this point, __wait_event_interruptible and __wait_event_interruptible_timeout look identical to __wait_event.

Lines 127–132

This code sets up an infinite loop that is broken out of only when the condition is met. Before testing the condition, we set the state of the process to TASK_UNINTERRUPTIBLE by using the set_current_state macro. Recall that this macro references the pointer to the current process, so we do not need to pass in the process information. If the condition has not been met, the process yields the CPU to the next process by means of a call to schedule(). __wait_event_interruptible() differs in one large respect at this point: it sets the state field of the process to TASK_INTERRUPTIBLE and checks signal_pending() on the current process, so that a pending signal breaks the wait. __wait_event_interruptible_timeout is much like __wait_event_interruptible except that it calls schedule_timeout() instead of schedule() when calling the scheduler. schedule_timeout() takes as a parameter the timeout length passed in to the original wait_event_interruptible_timeout() interface.
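For comparison, the following is a condensed sketch of __wait_event_interruptible(), reformatted from its 2.6 definition in include/linux/wait.h (line numbers omitted); note the TASK_INTERRUPTIBLE state and the signal_pending() check that sets ret to -ERESTARTSYS:

 -----------------------------------------------------------------------
 include/linux/wait.h
 #define __wait_event_interruptible(wq, condition, ret)   \
 do {                                                     \
   wait_queue_t __wait;                                   \
   init_waitqueue_entry(&__wait, current);                \
                                                          \
   add_wait_queue(&wq, &__wait);                          \
   for (;;) {                                             \
     set_current_state(TASK_INTERRUPTIBLE);               \
     if (condition)                                       \
       break;                                             \
     if (!signal_pending(current)) {                      \
       schedule();                                        \
       continue;                                          \
     }                                                    \
     ret = -ERESTARTSYS;                                  \
     break;                                               \
   }                                                      \
   current->state = TASK_RUNNING;                         \
   remove_wait_queue(&wq, &__wait);                       \
 } while (0)
 -----------------------------------------------------------------------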

Lines 133134

At this point in the code, the condition has been met or, in the case of the other two interfaces, a signal might have been received or the timeout reached. The state field of the process descriptor is now set back to TASK_RUNNING (so the scheduler treats it as runnable). Finally, the entry is removed from the wait queue. The remove_wait_queue() function locks the wait queue before removing the entry, and then it releases the lock before returning.

3.7.3. Waking Up

A process must be woken up to verify whether its condition has been met. Note that a process might put itself to sleep, but it cannot wake itself up. Numerous macros can be used to wake up tasks in a wait queue, but only three main "wake_up" functions exist. The macros wake_up, wake_up_nr, wake_up_all, wake_up_interruptible, wake_up_interruptible_nr, and wake_up_interruptible_all all call __wake_up() with different parameters. The macros wake_up_all_sync and wake_up_interruptible_sync both call __wake_up_sync() with different parameters. Finally, the wake_up_locked macro defaults to the __wake_up_locked() function:


-----------------------------------------------------------------------
include/linux/wait.h
116  extern void FASTCALL(__wake_up(wait_queue_head_t *q, unsigned int mode, int nr));
117  extern void FASTCALL(__wake_up_locked(wait_queue_head_t *q, unsigned int mode));
118  extern void FASTCALL(__wake_up_sync(wait_queue_head_t *q, unsigned int mode, int nr));
119
120  #define wake_up(x)                       __wake_up((x), TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 1)
121  #define wake_up_nr(x, nr)                __wake_up((x), TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, nr)
122  #define wake_up_all(x)                   __wake_up((x), TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 0)
123  #define wake_up_all_sync(x)              __wake_up_sync((x), TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE, 0)
124  #define wake_up_interruptible(x)         __wake_up((x), TASK_INTERRUPTIBLE, 1)
125  #define wake_up_interruptible_nr(x, nr)  __wake_up((x), TASK_INTERRUPTIBLE, nr)
126  #define wake_up_interruptible_all(x)     __wake_up((x), TASK_INTERRUPTIBLE, 0)
127  #define wake_up_locked(x)                __wake_up_locked((x), TASK_UNINTERRUPTIBLE | TASK_INTERRUPTIBLE)
128  #define wake_up_interruptible_sync(x)    __wake_up_sync((x), TASK_INTERRUPTIBLE, 1)
-----------------------------------------------------------------------

Let's look at __wake_up():

-----------------------------------------------------------------------
kernel/sched.c
2336  void fastcall __wake_up(wait_queue_head_t *q, unsigned int mode, int nr_exclusive)
2337  {
2338    unsigned long flags;
2339
2340    spin_lock_irqsave(&q->lock, flags);
2341    __wake_up_common(q, mode, nr_exclusive, 0);
2342    spin_unlock_irqrestore(&q->lock, flags);
2343  }
-----------------------------------------------------------------------

Line 2336

The parameters passed to __wake_up() include q, the pointer to the wait queue; mode, the indicator of the type of thread to wake up (identified by the state of the thread); and nr_exclusive, which indicates how many exclusive waiters to wake. When nr_exclusive is 0, all the tasks in the wait queue are woken up (both exclusive and non-exclusive); otherwise, all the non-exclusive tasks and only nr_exclusive exclusive tasks (one, in the common wake_up case) are woken.

Lines 2340, 2342

These lines acquire and release the wait queue's spinlock. The lock is taken before calling __wake_up_common() to ensure that no race condition comes up.

Line 2341

The function __wake_up_common() performs the bulk of the wakeup function:

-----------------------------------------------------------------------
kernel/sched.c
2313  static void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
                                  int nr_exclusive, int sync)
2314  {
2315    struct list_head *tmp, *next;
2316
2317    list_for_each_safe(tmp, next, &q->task_list) {
2318      wait_queue_t *curr;
2319      unsigned flags;
2320      curr = list_entry(tmp, wait_queue_t, task_list);
2321      flags = curr->flags;
2322      if (curr->func(curr, mode, sync) &&
2323          (flags & WQ_FLAG_EXCLUSIVE) &&
2324          !--nr_exclusive)
2325        break;
2326    }
2327  }
-----------------------------------------------------------------------

Line 2313

The parameters passed to __wake_up_common are q, the pointer to the wait queue; mode, the type of thread to wake up; nr_exclusive, the type of wakeup previously shown; and sync, which states whether the wakeup should be synchronous.

Line 2315

Here, we set temporary pointers for list-entry manipulation.

Line 2317

The list_for_each_safe macro scans each item of the wait queue. This is the beginning of our loop.

Line 2320

The list_entry macro returns the address of the wait_queue_t structure that contains the list_head pointed to by the tmp variable.

Line 2322

The wait_queue_t's func field is called. By default, this calls default_wake_function(), which is shown here:

-----------------------------------------------------------------------
kernel/sched.c
2296  int default_wake_function(wait_queue_t *curr, unsigned mode, int sync)
2297  {
2298    task_t *p = curr->task;
2299    return try_to_wake_up(p, mode, sync);
2300  }
-----------------------------------------------------------------------

This function calls try_to_wake_up() (kernel/sched.c) on the task pointed to by the wait_queue_t structure. This function performs the bulk of the work of waking up a process, including putting it on the run queue.

Lines 2322–2325

The loop terminates once nr_exclusive exclusive processes have been woken up; for plain wake_up, that is the first exclusive process encountered. This makes sense if we realize that all the exclusive processes are queued at the end of the wait queue: after we encounter the first exclusive task, all remaining tasks are also exclusive, so we do not want to wake them, and we break out of the loop.
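To see this termination logic in isolation, here is a small user-space simulation of our own (not kernel code) of the loop's exclusive-wakeup accounting; it prints which waiters a given nr_exclusive value would wake:

 -----------------------------------------------------------------------
 /* Example sketch; a user-space model of __wake_up_common's loop. */
 #include <stdio.h>

 #define WQ_FLAG_EXCLUSIVE 0x01

 struct waiter { const char *name; unsigned flags; };

 /* Wake waiters in queue order, stopping once nr_exclusive
  * exclusive waiters have been woken. */
 static void simulate_wake_up(struct waiter *w, int n, int nr_exclusive)
 {
   int i;
   for (i = 0; i < n; i++) {
     printf("waking %s\n", w[i].name);   /* stands in for curr->func() */
     if ((w[i].flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
       break;
   }
 }

 int main(void)
 {
   struct waiter q[] = {                 /* exclusive waiters at the tail */
     { "A (non-exclusive)", 0 },
     { "B (non-exclusive)", 0 },
     { "X (exclusive)", WQ_FLAG_EXCLUSIVE },
     { "Y (exclusive)", WQ_FLAG_EXCLUSIVE },
   };
   simulate_wake_up(q, 4, 1);   /* wake_up():     wakes A, B, X    */
   printf("---\n");
   simulate_wake_up(q, 4, 0);   /* wake_up_all(): wakes A, B, X, Y */
   return 0;
 }
 -----------------------------------------------------------------------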



