2.3 Locking--Atomic Operations | Linux Network Architecture

2.3 Locking Atomic Operations

Several different forms of activity can operate and interrupt each other in the Linux kernel. (See Section 2.2.) In multiprocessor systems, different activities even operate in parallel. This is the reason why it is very important for the stability of the system that these operations run in parallel without undesired side effects.

As long as the activities in the Linux kernel operate independently, there will not be any problem. But as soon as several activities access the same data structures, there can be undesired effects, even in single-processor systems.

Figure 2-2 shows an example with two activities, A and B, trying to add the structures skb_a and skb_b to the list queue. At some point, activity A is interrupted by activity B. After some processing of B, A continues with its operations. Figure 2-3 shows the result of this procedure of the two activities. Structure skb_b was added to the list correctly.

Figure 2-2. Activity B interrupts activity A in the critical section.

Figure 2-3. (Undesired) result of the unprotected operations of activities A and B.

Undesired results can also occur in multiprocessor systems when the two activities A and B run quasi-in-parallel on different processors, as in the example shown in Figure 2-4.

Figure 2-4. Parallel operations of the activities A and B in the critical section.

To avoid these problems when several activities operate on a common data structure (the so-called critical section), then these operations have to be atomic. Atomic means that an operation composed of several steps is executed as an undividable operation. No other instance can operate on the data structure concurrently with the atomic operation (i.e., no other activity can access a critical section that's already busy [Tan95]).

The next four sections introduce mechanisms for atomic execution of operations. These mechanisms differ mainly in the way they wait for entry into a potentially occupied critical section, which implicitly depends on the size of the critical section and the expected waiting time.

2.3.1 Bit Operations

Atomic bit operations form the basis for the locking concepts spinlocks and semaphores, described in the following subsections. Locks are used to protect critical sections, and they are normally implemented by variables, which manage the status of locks (i.e., they remember how many activities there currently are in a critical section.

This means that, first of all, the status of locking variables has to be checked and then set, before entry into a critical section. In a processor, this generally adds up to two machine commands to be executed, one after the other. However, it can now happen that a situation like the one described above can occur exactly between these two machine commands (i.e., the activity is interrupted and another activity of the kernel changes the state of variables). For this reason, atomic test and set machine commands are required to support critical sections. Modern processors support these commands and many more. (It would go beyond the scope of this book to describe each one in detail.)

Some of these atomic machine operations, as well as useful operations to manipulate single bits or integer variables, which are often useful to handle protocol headers, are listed in the following examples. Another benefit of these atomic operations is that no additional locks are required for their operations. They run in one single (very fast) machine command.

test_and_set_bit(nr, void *addr) sets the bit with number nr in the unsigned long variables of the pointer addr. The previous value of the bit is returned as return value.
test_and_clear_bit(nr, void *addr) deletes the specified bit and also returns that bit's previous value.
test_and_change_bit(nr, void *addr) inverts bit nr and resets it to its original value.
set_bit(nr, void *addr) sets the bit with number nr at address addr.
clear_bit(nr, void *addr) deletes the specified bit.
change_bit(nr, void *addr) inverts bit number nr.
test_bit(nr, void *addr) returns the current values of the bits.

Integer Operations

Operations can also be run atomically on integers. To do this, however, you have to use the atomic_t data type, which corresponds to the int data type in all supported architectures.

atomic_set(atomic_t *var, int i) sets the variables to value i.
atomic_read(atomic_t *var) reads the variables.
atomic_add/atomic_sub(int i, atomic_t *var) adds or subtracts.
atomic_inc/atomic_dec(atomic_t *var) adds or subtracts in increments of 1.
atomic_..._and_test(...): see Bit Operations.

All of these atomic operations are implemented by one single machine command. However, critical sections often cannot be reduced to one single command, but consist of several operations. Using the atomic bit or integer operations introduced in this section, you can implement locks to protect larger ranges for exclusive access. These spinlocks and semaphores are introduced in the following subsections.

2.3.2 Spinlocks

Spinlocks are also called busy wait locks because of the way they work. When a critical section begins and the lock in this case the spinlock has already been set, then the processor waits actively until the lock is removed. This means that the processor continues testing the locking variable in a continuous loop until that locking variable is released by the locking activity.

Though this wastes computing time because the processor seems to continually test the locking variable "meaninglessly," it can prove more effective to continually test the lock for a brief moment and then be able to enter the critical section upon its release rather than call the scheduler and grant the computing time to another activity. A lot of time can elapse until the waiting activity will get its turn after the lock's release. In addition, a change of activity caused by calling the scheduler could eventually take more computing time than a brief wait in the busy wait loop. For this reason, we observe the principle that spinlocks represent the best locking method for small critical program points (i.e., points with short locking times).

In the Linux kernel, spinlocks are implemented by the variable spinlock_t, which consists of one single integer locking variable. To use a spinlock, you have to create and initialize a spinlock_t structure, as in the following example:

 #include <linux/spinlock.h> spinlock_t my_spinlock = SPIN_LOCK_UNLOCKED; /* You can also use spin_lock_init(&my_spinlock) instead of    an assignment in the definition. */

You can now use a set of different functions to request, set, or release spinlocks. Each of these functions is especially suited for certain application cases. First, let's look at how we can set a spinlock:

spin_lock(spinlock_t *my_spinlock) tries to set the spinlock my_spinlock. If it is not free, then we have to wait or test until it is released. The free spinlock is then set immediately.
spin_lock_irqsave(spinlock_t *my_spinlock, unsigned long flags) works similarly to spin_lock(), but it additionally prevents interrupts and stores the current value of the CPU status register in the variable flags.
spin_lock_irq(spinlock_t *my_spinlock) works similarly to spin_lock_irqsave(), but does not store the value of the CPU status register. It assumes that interrupts are already being prevented.
Similar to spin_lock(), spin_lock_bh(spinlock_t *my_spinlock) tries to set the lock, but it prevents bottom halfs (see Section 2.2.5) from running at the same time.

The following functions can be used to mark the end of the critical section, depending on the application. Each of them releases an occupied spinlock.

spin_unlock(spinlock_t *my_spinlock) releases an occupied spinlock.
spin_unlock_irqrestore(spinlock_t *my_spinlock, unsigned long flags) releases the specified spinlock and allows interrupts, if there were any activated interrupts when the CPU status register was saved to the variable flags; otherwise it doesn't allow interrupts.
spin_unlock_irq(spinlock_t *my_spinlock) releases the specified spinlock and allows interrupts.
spin_unlock_bh(spinlock_t *my_spinlock) also releases the lock and allows immediate processing of bottom halfs.

The functions introduced above can be used to protect critical sections to avoid undesired side effects from parallel operations on the critical sections from occurring. The example shown in Figures 2-2 and 2-4 can be used as follows, where the use of a spinlock prevents undesired side effects:

 #include <linux/spinlock.h> spinlock_t my_spinlock = SPIN_LOCK_UNLOCKED; // Activity A spin_lock(&my_spinlock); skb_a?gt;next = queue?gt;next; queue?gt;next = skb_a; spin_unlock(&my_spinlock); ... // Activity B spin_lock(&my_spinlock); skb_b?gt;next = queue?gt;next; queue?gt;next = skb_b; spin_unlock(&my_spinlock);

The following useful functions are available to handle spinlocks, in addition to the methods used to set and release spinlocks:

spin_is_locked(spinlock_t *my_lock) polls the current status of the lock, without changing it. For a set lock, a value unequal to zero is return; for a free lock, zero is returned.
spin_trylock(spinlock_t *my_lock) sets the spinlock, if it is currently unoccupied; otherwise, the functions immediately returns a value unequal to zero.
spin_unlock_wait(spinlock_t *my_lock) waits for the lock to be released, if the lock is occupied, but the lock is not set.

2.3.3 Read Write Spinlocks

Spinlocks represent a simple and useful element to protect parallel operations on common data structures from undesired side effects. However, they slow down the progress of activities, because these activities have to wait actively for locks to be released. Active waiting is not always necessary in certain situations. For example, there are data structures with frequent read, but few write accesses. A well-known example from the Linux network architecture for this situation would be the list of registered network devices, dev_base. It is rarely changed during a system's runtime (in fact, only by registering a network device), but it is subject to many read accesses.

During accessing of such a jointly used data structure, it is absolutely necessary to use a spinlock, but reading activities do not have to be stopped when no write activity currently operates on that data structure. This is the reason why so-called read write spinlocks are used. They allow several read activities to enter a critical section while there is no write activity operating on it. As soon as an activity with write intention occupies the lock, then no read activities must be in or enter the critical section until the write lock is released.

When we are adding a new net_device structure in our above example of the dev_base list, the undesired effect demonstrated in Figures 2-2 and 2-4 could occur. For this reason, we use the read write spinlock dev_base_lock to protect our dev_base list. The data structure rwlock_t is used to implement read write spinlocks.

The following functions are available to set and release read write spinlocks. Note that we distinguish them according to whether the lock should be entered for read or for write purposes. Again, RW spinlock functions come in different variants with regard to how they handle interrupts and bottom halfs (..._irq(), ..._bh() etc.). We will not repeat a description of their differences here, because their behavior corresponds to the spin_lock() functions.

read_lock...() tries to access a critical section for reading purposes. If it contains no activities or only reading activities, then the section is accessed immediately. If there is a write activity in the critical section, then we have to wait until that activity releases the lock.
read_unlock...() leaves the critical section, which it entered for reading purposes only. If a write activity is waiting and there is no other read activity in that section, then it can access the section.
write_lock...() tries to occupy the critical section for writing purposes. If there is already a (write or read) activity in the critical section, then the activity waits for all activities to leave that section. Subsequently, it puts an exclusive lock on the critical section.
write_unlock...() releases the (write) lock and thus the critical section.

2.3.4 Semaphores

In addition to active locks, there is a way to avoid waiting until a critical section can be accessed when a lock is set. Instead of waiting, the activity releases the CPU by calling the scheduler. This means that the computing time can be used by other activities. This concept is known by the term semaphore or mutex in computer science.

The Linux kernel offers semaphores. However, they are not frequently used in the Linux network architecture. Therefore, instead of describing here in detail, we refer our readers to [BBDK +01].