17.3. Real-Time Kernel Patch

Support for hard real time is not in the mainline kernel.org source tree. To enable hard real time, a patch must be applied. The real-time kernel patch is the cumulative result of several initiatives to reduce Linux kernel latency. The patch had many contributors, and it is currently maintained by Ingo Molnar; you can find it at http://people.redhat.com/~mingo/realtime-preempt.

The soft real-time performance of the 2.6 Linux kernel has improved significantly since the early 2.6 kernel releases. When 2.6 was first released, the 2.4 Linux kernel was substantially better in soft real-time performance. Since about Linux 2.6.12, soft real-time latency in the single-digit milliseconds is readily achieved on a reasonably fast x86 processor. Repeatable performance beyond this requires the real-time patch.

The real-time patch adds several important features to the Linux kernel. Figure 17-4 displays the configuration options for Preemption mode when the real-time patch has been applied.

Figure 17-4. Preemption modes with real-time patch

The real-time patch adds a fourth preemption mode called PREEMPT_RT, or Preempt Real Time. The four preemption modes are as follows:

PREEMPT_NONE: No forced preemption. Overall latency is, on average, good, but there can be occasional long delays. Best suited for applications for which overall throughput is the top design criterion.
PREEMPT_VOLUNTARY: First stage of latency reduction. Additional explicit preemption points are placed at strategic locations in the kernel to reduce latency. Some loss of overall throughput is traded for lower latency.
PREEMPT_DESKTOP: This mode enables preemption everywhere in the kernel except when processing within critical sections. This mode is useful for soft real-time applications such as audio and multimedia. Overall throughput is traded for further reductions in latency.
PREEMPT_RT: Features from the real-time patch are added, including replacing spinlocks with preemptable mutexes. This enables involuntary preemption everywhere within the kernel except for those areas protected by preempt_disable(). This mode significantly smooths out the variation in latency (jitter) and allows a low and predictable latency for time-critical real-time applications.

If kernel preemption is enabled in your kernel configuration, it can be disabled at boot time by adding the following kernel parameter to the kernel command line:

preempt=0
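One way to confirm that a boot parameter such as this one actually reached the kernel is to inspect /proc/cmdline after boot. The following is a minimal sketch of that check; the function names cmdline_has_param() and read_kernel_cmdline() are hypothetical helpers invented for this illustration, not part of any kernel or library API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if 'param' appears as a complete,
 * whitespace-delimited token in 'cmdline', otherwise 0.
 */
int cmdline_has_param(const char *cmdline, const char *param)
{
    size_t plen = strlen(param);
    const char *p = cmdline;

    if (plen == 0)
        return 0;

    while (*p != '\0') {
        size_t toklen;

        p += strspn(p, " \t\n");        /* Skip separators */
        toklen = strcspn(p, " \t\n");   /* Length of the next token */
        if (toklen == plen && strncmp(p, param, plen) == 0)
            return 1;
        p += toklen;
    }
    return 0;
}

/*
 * Hypothetical helper: copy the running kernel's command line into
 * 'buf' (which must hold at least one byte). Returns 0 on success,
 * -1 if /proc/cmdline cannot be opened.
 */
int read_kernel_cmdline(char *buf, size_t len)
{
    FILE *fp = fopen("/proc/cmdline", "r");

    if (fp == NULL)
        return -1;
    if (fgets(buf, (int)len, fp) == NULL)
        buf[0] = '\0';
    fclose(fp);
    return 0;
}
```

A caller would read the command line into a buffer with read_kernel_cmdline() and then pass it to cmdline_has_param() together with the parameter string of interest. Matching whole tokens avoids a false hit on a longer parameter that merely ends in the same text.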

17.3.1. Real-Time Features

Several new Linux kernel features are enabled with CONFIG_PREEMPT_RT. From Figure 17-4, we see several new configuration settings. These and other features of the real-time Linux kernel patch are described here.

17.3.1.1. Spinlock Converted to Mutex

The real-time patch converts most spinlocks in the system to mutexes. This reduces overall latency at the cost of slightly reduced throughput. The benefit of converting spinlocks to mutexes is that the code holding them can be preempted. If Process A is holding a lock, and Process B at a higher priority becomes runnable, Process B can preempt Process A in the case where Process A is holding a mutex. With a conventional spinlock, preemption remains disabled for as long as Process A holds the lock.

17.3.1.2. ISRs as Kernel Tasks

With CONFIG_PREEMPT_HARDIRQ selected, interrupt service routines^[4] (ISRs) are forced to run in process context. This gives the developer control over the priority of ISRs because they become schedulable entities. As such, they also become preemptable to allow higher-priority hardware interrupts to be handled first.

^[4] Also called HARDIRQs.

This is a powerful feature. Some hardware architectures do not enforce interrupt priorities. Those that do might not enforce the priorities consistent with your specified real-time design goals. Using CONFIG_PREEMPT_HARDIRQ, you are free to define the priorities at which each IRQ will run.
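Because a threaded ISR is an ordinary schedulable task, its priority can be adjusted with the same POSIX call used for any other process. The sketch below assumes you have already found the process ID of the IRQ thread by some other means (for example, from a process listing); the helper name set_fifo_priority() is hypothetical, and changing another task's scheduling policy normally requires root privileges.

```c
#include <errno.h>
#include <sched.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper: place the task identified by 'pid' under
 * SCHED_FIFO at the given priority. Passing 0 for 'pid' selects the
 * calling process. Returns 0 on success or a negative errno value.
 */
int set_fifo_priority(pid_t pid, int priority)
{
    struct sched_param param;

    memset(&param, 0, sizeof(param));
    param.sched_priority = priority;

    if (sched_setscheduler(pid, SCHED_FIFO, &param) == -1)
        return -errno;   /* For example, -EPERM or -EINVAL */

    return 0;
}
```

Assigning a higher priority to the thread that services a time-critical device than to the threads of less important devices is how the design freedom described above would be exercised in practice.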

Conversion of ISRs to threads can be disabled at runtime through the /proc file system or at boot time by entering a parameter on the kernel command line. When enabled in the configuration, unless you specify otherwise, ISR threading is enabled by default.

To disable ISR threading at runtime, issue the following command as root:

# echo '0' >/proc/sys/kernel/hardirq_preemption

To verify the setting, display it as follows:

# cat /proc/sys/kernel/hardirq_preemption
1

To disable ISR threading at boot time, add the following parameter to the kernel command line:

hardirq-preempt=0

17.3.1.3. Preemptable Softirqs

CONFIG_PREEMPT_SOFTIRQS reduces latency by running softirqs within the context of the kernel's softirq daemon (ksoftirqd). ksoftirqd is a proper Linux task (process). As such, it can be prioritized and scheduled along with other tasks. If your kernel is configured for real time, and CONFIG_PREEMPT_SOFTIRQS is enabled, the ksoftirqd kernel task is elevated to real-time priority to handle the softirq processing.^[5] Listing 17-3 shows the code responsible for this from a recent Linux kernel, found in .../kernel/softirq.c.

^[5] See Linux Kernel Development, referenced at the end of this chapter, to learn more about softirqs.

Listing 17-3. Promoting ksoftirqd to Real-Time Status

static int ksoftirqd(void * __bind_cpu)
{
      struct sched_param param = { .sched_priority = 24 };

      printk("ksoftirqd started up.\n");

#ifdef CONFIG_PREEMPT_SOFTIRQS
      printk("softirq RT prio: %d.\n", param.sched_priority);
      sys_sched_setscheduler(current->pid, SCHED_FIFO, &param);
#else
      set_user_nice(current, -10);
#endif
...

Here we see that if CONFIG_PREEMPT_SOFTIRQS is enabled in the kernel configuration, the ksoftirqd kernel task is promoted to a real-time task (SCHED_FIFO) at a real-time priority of 24 using the sys_sched_setscheduler() kernel function.
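The effect of this promotion can be observed from user space by querying a task's scheduling policy and priority. The following sketch uses the standard sched_getscheduler() and sched_getparam() calls; the helper names policy_name() and get_task_sched() are hypothetical and exist only for this illustration.

```c
#include <sched.h>
#include <sys/types.h>

/* Hypothetical helper: map a policy constant to a printable name. */
const char *policy_name(int policy)
{
    switch (policy) {
    case SCHED_OTHER: return "SCHED_OTHER";
    case SCHED_FIFO:  return "SCHED_FIFO";
    case SCHED_RR:    return "SCHED_RR";
    default:          return "unknown";
    }
}

/*
 * Hypothetical helper: fetch the scheduling policy and priority of
 * the task identified by 'pid' (0 means the calling process).
 * Returns 0 on success, -1 on failure with errno set by the failing call.
 */
int get_task_sched(pid_t pid, int *policy, int *priority)
{
    struct sched_param param;
    int pol = sched_getscheduler(pid);

    if (pol == -1)
        return -1;
    if (sched_getparam(pid, &param) == -1)
        return -1;

    *policy = pol;
    *priority = param.sched_priority;
    return 0;
}
```

Given the code in Listing 17-3, calling get_task_sched() with the process ID of a ksoftirqd task on a kernel built with CONFIG_PREEMPT_SOFTIRQS would be expected to report SCHED_FIFO and a priority of 24.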

SoftIRQ threading can be disabled at runtime through the /proc file system, as well as through the kernel command line at boot time. When enabled in the configuration, unless you specify otherwise, SoftIRQ threading is enabled by default. To disable SoftIRQ threading at runtime, issue the following command as root:

# echo '0' >/proc/sys/kernel/softirq_preemption

To verify the setting, display it as follows:

# cat /proc/sys/kernel/softirq_preemption
1

To disable SoftIRQ threading at boot time, add the following parameter to the kernel command line:

softirq-preempt=0
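An application that depends on ISR or softirq threading might check these /proc settings at startup rather than rely on a manual cat. The following is a minimal sketch, assuming the /proc entries named in this section exist on the running kernel; the helper name read_proc_flag() is hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical helper: read a single integer from a /proc entry such
 * as /proc/sys/kernel/hardirq_preemption. Returns the value read, or
 * -1 if the file does not exist or does not contain an integer (as
 * would be the case on a kernel without the real-time patch).
 */
int read_proc_flag(const char *path)
{
    FILE *fp = fopen(path, "r");
    int value;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%d", &value) != 1)
        value = -1;
    fclose(fp);

    return value;
}
```

A return value of 1 for both /proc/sys/kernel/hardirq_preemption and /proc/sys/kernel/softirq_preemption would indicate that threading is active for both; a return of -1 suggests that the running kernel does not expose the setting at all.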

17.3.1.4. Preempt RCU

RCU (Read-Copy-Update)^[6] is a special form of synchronization primitive in the Linux kernel designed for data that is read frequently but updated infrequently. You can think of RCU as an optimized reader lock. The real-time patch adds CONFIG_PREEMPT_RCU, which improves latency by making certain RCU sections preemptable.

^[6] See www.rdrop.com/users/paulmck/RCU/ for an in-depth discussion of RCU.

17.3.2. O(1) Scheduler

The O(1) scheduler has been around since the days of Linux 2.5. It is mentioned here because it is a critical component of a real-time solution. The O(1) scheduler is a significant improvement over the previous Linux scheduler. It scales better for systems with many processes and helps produce lower overall latency.

In case you are wondering, O(1) is the mathematical notation for constant-time complexity. In this context, it means that the time it takes to make a scheduling decision is not dependent on the number of processes on a given runqueue. The old Linux scheduler did not have this characteristic, and its performance degraded with the number of processes.^[7]

^[7] We refer you again to Robert Love's book for an excellent discussion of the O(1) scheduler, and a delightful diatribe on algorithmic complexity, from which the notation O(1) derives.
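The constant-time property can be illustrated with a priority bitmap: one bit per priority level records whether any task is runnable at that level, so choosing the next priority means scanning a fixed number of words no matter how many tasks are queued. The following user-space sketch is a simplified illustration of that idea, not the kernel's actual code; the structure and function names are hypothetical, and the choice of 140 levels is an assumption made for the example.

```c
#include <stdint.h>
#include <strings.h>   /* ffs() */

#define NUM_PRIOS  140   /* Assumed number of priority levels */
#define WORD_BITS  32
#define NUM_WORDS  ((NUM_PRIOS + WORD_BITS - 1) / WORD_BITS)

struct prio_array {
    uint32_t bitmap[NUM_WORDS];         /* Bit set: level has tasks */
    unsigned int nr_queued[NUM_PRIOS];  /* Tasks queued per level */
};

/* Record one more runnable task at 'prio' (0 is the highest). */
void prio_enqueue(struct prio_array *a, int prio)
{
    a->nr_queued[prio]++;
    a->bitmap[prio / WORD_BITS] |= (uint32_t)1 << (prio % WORD_BITS);
}

/* Remove one task at 'prio'; clear the bit when the level empties. */
void prio_dequeue(struct prio_array *a, int prio)
{
    if (a->nr_queued[prio] > 0 && --a->nr_queued[prio] == 0)
        a->bitmap[prio / WORD_BITS] &= ~((uint32_t)1 << (prio % WORD_BITS));
}

/*
 * Return the highest-priority (lowest-numbered) level with a runnable
 * task, or -1 if nothing is queued. The loop runs at most NUM_WORDS
 * times, independent of the number of queued tasks.
 */
int prio_pick_next(const struct prio_array *a)
{
    int w;

    for (w = 0; w < NUM_WORDS; w++) {
        if (a->bitmap[w] != 0)
            return w * WORD_BITS + ffs((int)a->bitmap[w]) - 1;
    }
    return -1;
}
```

Queuing a thousand tasks at one priority level costs prio_pick_next() no more than queuing one, which is the behavior the O(1) label describes. A scheduler that walked a list of all runnable tasks to find the best candidate would instead slow down as the list grew.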

17.3.3. Creating a Real-Time Process

You can designate a process as real time by setting a process attribute that the scheduler uses as part of its scheduling algorithm. Listing 17-4 shows the general method.

Listing 17-4. Creating a Real-Time Process

#include <sched.h>

/* Highest possible for SCHED_RR */
#define MY_RT_PRIORITY sched_get_priority_max(SCHED_RR)

int main(int argc, char **argv)
{
      ...
      int rc, old_scheduler_policy;
      struct sched_param my_params;
      ...
      /* Passing zero specifies caller's (our) policy */
      old_scheduler_policy = sched_getscheduler(0);

      my_params.sched_priority = MY_RT_PRIORITY;

      /* Passing zero specifies caller's (our) pid */
      rc = sched_setscheduler(0, SCHED_RR, &my_params);
      if ( rc == -1 )
           handle_error();
      ...
}

This code snippet does two things in the call to sched_setscheduler(). It changes the scheduling policy to SCHED_RR and raises its priority to the maximum possible on the system. Linux supports three scheduling policies:

SCHED_OTHER: Normal Linux process, fairness scheduling
SCHED_RR: Real-time process with a time slice; that is, if it does not block, it is allowed to run for a given period of time determined by the scheduler
SCHED_FIFO: Real-time process that runs until it either blocks or explicitly yields the processor, or until a higher-priority real-time process becomes runnable

The man page for sched_setscheduler provides more detail on the three different scheduling policies.
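The range of valid priorities differs by policy, so a program that accepts a priority from a configuration file or command line might validate it before calling sched_setscheduler(). The following sketch uses the standard sched_get_priority_min() and sched_get_priority_max() calls; the helper name clamp_priority() is hypothetical.

```c
#include <sched.h>

/*
 * Hypothetical helper: force 'requested' into the range of priorities
 * that the system accepts for 'policy'. Returns the adjusted priority,
 * or -1 if the policy is not recognized.
 */
int clamp_priority(int policy, int requested)
{
    int lo = sched_get_priority_min(policy);
    int hi = sched_get_priority_max(policy);

    if (lo == -1 || hi == -1)
        return -1;

    if (requested < lo)
        return lo;
    if (requested > hi)
        return hi;
    return requested;
}
```

Querying the limits at runtime, rather than hard-coding numeric values, keeps the program independent of how a particular kernel defines its priority range.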

17.3.4. Critical Section Management

When writing kernel code, such as a custom device driver, you will encounter data structures that you must protect from concurrent access. The easiest way to protect critical data is to disable preemption around the critical section. Keep the critical path as short as possible to maintain a low maximum latency for your system. Listing 17-5 shows an example.

Listing 17-5. Protecting Critical Section in Kernel Code

...
/*
 * Declare and initialize a global lock for your
 * critical data
 */
DEFINE_SPINLOCK(my_lock);
...
int operate_on_critical_data()
{
    ...
    spin_lock(&my_lock);
    ...
    /* Update critical/shared data */
    ...
    spin_unlock(&my_lock);
    ...
}

When a task successfully acquires a spinlock, preemption is disabled and the task that acquired the spinlock is allowed into the critical section. No task switches can occur on that processor until a spin_unlock operation takes place. The spin_lock() function is actually a macro that has several forms, depending on the kernel configuration. They are defined at the top level (architecture-independent definitions) in .../include/linux/spinlock.h. When the kernel is patched with the real-time patch, these spinlocks are promoted to mutexes so that a task holding one can be preempted by higher-priority processes.

Because the real-time patch is largely transparent to the device driver and kernel developer, the familiar constructs can be used to protect critical sections, as described in Listing 17-5. This is a major advantage of the real-time patch for real-time applications; it preserves the well-known semantics for locking and interrupt service routines.

Using the macro DEFINE_SPINLOCK as in Listing 17-5 preserves future compatibility. This and related macros are defined in .../include/linux/spinlock_types.h.