13.2. Kernel Basics

The Linux kernel is the part of the operating system that contains the code for:

sharing the CPU and RAM between competing processes
processing system calls
transferring data between processes and peripheral devices (including the network), and between processes

The kernel is a program that is loaded from disk into RAM when the computer is first turned on. It always stays in RAM, and runs until the system is turned off (or crashes). Although it's mostly written in C, some parts of the kernel are written in assembly language for efficiency reasons. User programs make use of the kernel via the system call interface.

13.2.1. Kernel Subsystems

The kernel facilities may be divided into several subsystems:

memory management
process management
interprocess communication (IPC)
input/output
file management

These subsystems interact in a fairly hierarchical way. Figure 13-1 illustrates the layering.

Figure 13-1. Linux kernel subsystems.

13.2.2. Processes and Files

The Linux kernel supports the concepts of processes and files. Processes are the "life forms" that live in the computer and make decisions. Files are containers of information that processes read and write. In addition, processes may talk to each other via several different kinds of interprocess communication mechanisms, including signals, pipes, and sockets. Figure 13-2 is an illustration of what I mean.

Figure 13-2. Linux supports processes and files.

13.2.3. Talking to the Kernel

Processes access kernel functions via the system call interface, and peripherals (special files) communicate with the kernel via hardware interrupts. Linux also provides the /proc file system (discussed in Chapter 14, "System Administration"), which is an abstraction layer that provides access to "live" kernel data.
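To give a feel for the "live" kernel data that /proc exposes, here is a minimal Python sketch that reads the status file the kernel generates for the calling process. The helper name read_proc_status is hypothetical, and the sketch assumes a Linux system with /proc mounted; it returns None anywhere else.

```python
import os


def read_proc_status(pid="self"):
    """Return the fields of /proc/<pid>/status as a dict, or None if unavailable."""
    path = f"/proc/{pid}/status"
    if not os.path.exists(path):
        return None  # not Linux, or /proc is not mounted
    fields = {}
    with open(path) as status_file:
        for line in status_file:
            # Each line looks like "Name:<tab>value".
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    status = read_proc_status()
    if status is not None:
        print(status.get("Name"), status.get("Pid"), status.get("State"))
```

Nothing is stored on disk for this file; the kernel produces its contents each time it is read.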

Since system calls and interrupts are obviously very important, I'll begin the discussion of Linux internals with a description of each mechanism.

13.2.4. System Calls

System calls are the programmer's functional interface to the kernel. They are subroutines that reside inside the Linux kernel, and support basic system functions such as the ones listed in Figure 13-3.

Figure 13-3. Common Linux system calls.
Function	System call
open a file	open
close a file	close
perform I/O	read/write
send a signal	kill
create a pipe	pipe
create a socket	socket
duplicate a process	fork/clone
overlay a process	execl/execv
terminate a process	exit
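
Several of the calls in Figure 13-3 can be exercised together in a few lines. The sketch below uses Python's os module, whose functions are thin wrappers around the C library calls of the same names; the function name pipe_round_trip and the message text are made up for this illustration.

```python
import os


def pipe_round_trip(message=b"hello from the child"):
    """Fork a child that writes `message` into a pipe; return what the parent reads."""
    read_fd, write_fd = os.pipe()              # create a pipe
    pid = os.fork()                            # duplicate the process
    if pid == 0:
        # Child: send the message, then terminate immediately.
        try:
            os.close(read_fd)                  # close the unused end
            os.write(write_fd, message)        # perform output
            os.close(write_fd)
        finally:
            os._exit(0)                        # terminate the process
    # Parent: read until end-of-file, then collect the child's exit status.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)         # perform input
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    _, status = os.waitpid(pid, 0)
    return b"".join(chunks), os.WEXITSTATUS(status)


if __name__ == "__main__":
    print(pipe_round_trip())
```

The parent closes its copy of the write end before reading; otherwise read would never see end-of-file.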

System calls may be loosely grouped into three main categories, as illustrated in Figure 13-4.

Figure 13-4. Major system call subsystems.

13.2.5. User Mode and Kernel Mode

The kernel contains several data structures that are essential to the functioning of the system. Examples include:

a task list, which is a doubly linked list of objects representing each process
a file list, which is a doubly linked list of objects representing each open file

These data structures reside in the kernel's memory space, which is protected from user processes by a memory management system that I'll describe to you later. User processes therefore cannot accidentally corrupt these important kernel data structures. System call routines are different from regular functions because they can directly manipulate kernel data structures, albeit in a carefully controlled manner.
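
To make the idea of a doubly linked task list concrete, here is a small user-space model in Python. It is only a sketch: the real kernel structures are C structs, and the class names Task and TaskList, the fields shown, and the circular layout with a sentinel node are choices made for this illustration.

```python
class Task:
    """Hypothetical stand-in for the kernel object that represents one process."""

    def __init__(self, pid, name):
        self.pid = pid
        self.name = name
        self.prev = self   # a detached node points at itself
        self.next = self


class TaskList:
    """Circular doubly linked list with a sentinel head node."""

    def __init__(self):
        self.head = Task(0, "sentinel")

    def add(self, task):
        # Link the new task in just before the sentinel, i.e. at the tail.
        last = self.head.prev
        task.prev, task.next = last, self.head
        last.next = task
        self.head.prev = task

    def remove(self, task):
        # Unlinking needs no traversal because each node knows both neighbors.
        task.prev.next = task.next
        task.next.prev = task.prev
        task.prev = task.next = task

    def __iter__(self):
        node = self.head.next
        while node is not self.head:
            yield node
            node = node.next


if __name__ == "__main__":
    tasks = TaskList()
    shell = Task(2, "sh")
    for task in (Task(1, "init"), shell, Task(3, "ls")):
        tasks.add(task)
    tasks.remove(shell)
    print([(task.pid, task.name) for task in tasks])
```

The double links are what make removal cheap: a terminating process can be unlinked in constant time.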

When a user process is running, it operates in a special machine mode called user mode. This mode prevents a process from executing certain privileged machine instructions, including those that would allow it to access the kernel data structures. The other machine mode is called kernel mode. A kernel-mode process may execute any machine instruction.

The only way for a user process to enter kernel mode is to execute a system call. Every system call is allocated a code number, starting from 1. When a process invokes a system call, the C runtime library version of the system call places the system call parameters and the system call code number into machine registers, and then executes a trap machine instruction. The trap instruction flips the machine into kernel mode and uses the system call code number as an index into a table of kernel functions to find the proper one. The code for the indexed function executes in kernel mode, modifying kernel data structures as necessary, and then performs a special return instruction that flips the machine back into user mode and returns to the user process's code.
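
The trap sequence just described can be modeled in a few lines of Python. This is a toy simulation, not kernel code: the ToyMachine class, the two entries in its table, and their code numbers are all invented for the illustration and do not match any real Linux system call numbering.

```python
USER_MODE, KERNEL_MODE = "user", "kernel"


class ToyMachine:
    """Toy model of a CPU mode flag plus a table of system call routines."""

    def __init__(self):
        self.mode = USER_MODE
        self.mode_trace = []      # mode observed while each routine runs
        self.pid = 42             # hypothetical process ID
        self.exit_status = None
        # Code numbers below are illustrative only.
        self.syscall_table = {1: self._sys_exit, 2: self._sys_getpid}

    def _sys_exit(self, status):
        self.mode_trace.append(self.mode)
        self.exit_status = status
        return 0

    def _sys_getpid(self):
        self.mode_trace.append(self.mode)
        return self.pid

    def trap(self, number, *args):
        """Model the trap instruction: enter kernel mode, dispatch, return."""
        self.mode = KERNEL_MODE
        try:
            routine = self.syscall_table.get(number)
            if routine is None:
                return -1         # unknown system call code number
            return routine(*args)
        finally:
            self.mode = USER_MODE  # the "special return" back to user mode


def getpid(machine):
    """Library-style wrapper: supplies the code number, then traps."""
    return machine.trap(2)


if __name__ == "__main__":
    machine = ToyMachine()
    print(getpid(machine), machine.mode_trace, machine.mode)
```

The user-level wrapper never touches the table itself; it only supplies a number, which is what keeps the kernel in control of which code runs in kernel mode.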

Why not just use a client/server model with a kernel server process that services system requests from client user processes? That would avoid the need for user processes to execute kernel code directly. The reason is pure and simple: speed. In current architectures, the overhead of switching between processes is too great to make the client/server approach practical. However, it's interesting to note that some modern microkernel systems take the client/server approach.

From a programmer's standpoint, using a system call is easy; you call the C function with the correct parameters, and the function returns when complete. If an error occurs, the function returns -1 and the global variable errno is set to indicate the cause of the error. Figure 13-5 illustrates the flow of control during a system call.
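
The error convention is easy to observe. In C you would test the return value for -1 and then inspect errno; Python's os module reports the same failure by raising OSError, whose errno attribute carries the code. The sketch below converts that back into the C-style pair; the function name try_open and the deliberately nonexistent default path are assumptions for the example.

```python
import errno
import os


def try_open(path="/no/such/file/for/this/sketch"):
    """Try to open `path` read-only; return (fd, error_code) in the C style."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except OSError as exc:
        # Python raises an exception where the C function would return -1
        # and set errno.
        return -1, exc.errno
    return fd, 0


if __name__ == "__main__":
    fd, code = try_open()
    if fd == -1:
        print("open failed:", errno.errorcode[code], os.strerror(code))
    else:
        os.close(fd)
```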

Figure 13-5. User mode and kernel mode.

13.2.6. Synchronous Versus Asynchronous Processing

When a process performs a system call, it cannot usually be preempted. This means that the scheduler will not assign the CPU to another process during the operation of a system call. However, some system calls request I/O operations from a device, which can take a while to complete. To avoid leaving the CPU idle during the wait for I/O completion, the kernel sends the waiting process to sleep and wakes it up again only when a hardware interrupt signaling I/O completion is received. The scheduler does not allocate a sleeping process any CPU time, and so allocates the CPU to other processes while the hardware device is servicing the I/O request.
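
You can watch a process being put to sleep from user space. In the following sketch a child process stands in for a slow device: it waits a while before writing to a pipe, and the parent blocks in read until the data arrives. The function name timed_blocking_read and the 0.2-second delay are arbitrary choices for the illustration.

```python
import os
import time


def timed_blocking_read(delay=0.2):
    """Block in read() on a pipe; return (data, wall_seconds, cpu_seconds)."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child plays the slow device: stay busy elsewhere, then deliver data.
        try:
            os.close(read_fd)
            time.sleep(delay)
            os.write(write_fd, b"done")
        finally:
            os._exit(0)
    os.close(write_fd)
    wall_start = time.monotonic()
    cpu_start = time.process_time()
    data = os.read(read_fd, 16)       # the parent sleeps here until data arrives
    cpu_used = time.process_time() - cpu_start
    wall_used = time.monotonic() - wall_start
    os.close(read_fd)
    os.waitpid(pid, 0)
    return data, wall_used, cpu_used


if __name__ == "__main__":
    print(timed_blocking_read())
```

The wall-clock time spent in read is roughly the delay, while the CPU time charged to the parent is close to zero: it was asleep, not spinning.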

An interesting consequence of the way that Linux handles read() and write() is that user processes experience synchronous execution of system calls, whereas the kernel experiences asynchronous behavior (Figure 13-6).

Figure 13-6. Synchronous and asynchronous events.

13.2.6.1. Interrupts

Interrupts are the way that hardware devices notify the kernel that they would like some attention. In the same way that processes compete for CPU time, hardware devices compete for interrupt processing. Devices are allocated an interrupt priority based on their relative importance. For example, interrupts from the system clock have a higher priority than those from the keyboard. Figure 13-7 illustrates interrupt processing.

Figure 13-7. Interrupt processing.

When an interrupt occurs, the current process is suspended and the kernel determines the source of the interrupt. It then examines its interrupt vector table (called irq_action in the kernel) to find the location of the code that processes the interrupt. This "interrupt handler" code is then executed. When the interrupt handler completes, the current process is resumed.
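
The suspend, look up, handle, resume sequence can be sketched as a table lookup. This Python model is purely illustrative: the ToyKernel class, the interrupt number, and the process name are invented, and a real vector table holds addresses of kernel routines rather than Python callables.

```python
class ToyKernel:
    """Toy model of interrupt dispatch through a vector table."""

    def __init__(self, current_process="editor"):
        self.vector_table = {}                  # interrupt number -> handler
        self.current_process = current_process  # hypothetical running process
        self.log = []

    def register(self, irq, handler):
        self.vector_table[irq] = handler

    def interrupt(self, irq):
        # 1. Suspend whatever was running.
        suspended = self.current_process
        self.current_process = None
        self.log.append(f"suspend {suspended}")
        # 2. Find the handler for this interrupt source and run it.
        handler = self.vector_table.get(irq)
        if handler is not None:
            self.log.append(handler(irq))
        # 3. Resume the suspended process.
        self.current_process = suspended
        self.log.append(f"resume {suspended}")


if __name__ == "__main__":
    kernel = ToyKernel()
    kernel.register(1, lambda irq: f"keyboard handler ran for interrupt {irq}")
    kernel.interrupt(1)
    print("\n".join(kernel.log))
```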

13.2.7. Interrupting Interrupts

Interrupt processing may itself be interrupted! If an interrupt of a higher priority than the current interrupt arrives, a similar sequence of events occurs, and the lower-priority interrupt handler is suspended until the higher-priority interrupt completes (Figure 13-8).
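
The nesting rule can also be modeled in Python. In this sketch, which is an assumption-laden toy rather than a description of the real kernel code, a higher-priority interrupt runs immediately on top of the current handler, while a lower-priority one is held pending until the higher-priority work has finished. The class name, device names, and priority values are all hypothetical.

```python
class ToyInterruptController:
    """Toy model of prioritized, nestable interrupt handling."""

    def __init__(self):
        self.current_priority = 0   # 0 means no handler is running
        self.pending = []           # (priority, name, handler) waiting their turn
        self.log = []

    def raise_irq(self, name, priority, handler):
        if priority > self.current_priority:
            self._run(name, priority, handler)   # preempts the current handler
        else:
            self.pending.append((priority, name, handler))

    def _run(self, name, priority, handler):
        saved = self.current_priority
        self.current_priority = priority
        self.log.append(f"enter {name}")
        handler(self)               # a handler may itself be interrupted
        self.log.append(f"exit {name}")
        self.current_priority = saved
        self._deliver_pending()

    def _deliver_pending(self):
        # Run held interrupts that now outrank whatever is still executing.
        self.pending.sort(key=lambda entry: entry[0])
        while self.pending and self.pending[-1][0] > self.current_priority:
            priority, name, handler = self.pending.pop()
            self._run(name, priority, handler)


if __name__ == "__main__":
    controller = ToyInterruptController()

    def clock_handler(ctrl):
        pass

    def keyboard_handler(ctrl):
        # The clock ticks while the keyboard handler is still running.
        ctrl.raise_irq("clock", 2, clock_handler)

    controller.raise_irq("keyboard", 1, keyboard_handler)
    print(controller.log)
```

In the usage example the clock outranks the keyboard, so its handler runs to completion in the middle of the keyboard handler.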

Figure 13-8. Interrupts may be interrupted.