14.6. System Operation

In this section, we consider topics that are related to the system-startup procedure.

Kernel Configuration

The software that makes up a FreeBSD kernel is defined by a configuration file that is interpreted by the /usr/sbin/config program, which is invoked as part of the kernel build process. The kernel build process has become considerably more complex in FreeBSD and is now controlled by a set of Makefile targets. To build a kernel, the user invokes make in the following way:

 make buildkernel KERNCONF=<kernel config file>

The buildkernel argument is a Makefile target that tells make to build a kernel but not to install it. KERNCONF is a Makefile variable that is set to the name of the kernel configuration. Once a kernel has been properly built, it is installed by running make in the following way:

 make installkernel KERNCONF=<kernel config file>

One reason for this new build process is the need to build and install the necessary kernel modules, which were discussed in Section 14.4. The configuration file specifies the hardware and software components that should be supported by a kernel. The build process generates several output files, some of which are compiled and linked into the kernel's load image. It also creates a directory into which all the loadable kernel modules will be built. When the kernel is installed, its modules are installed as well. A complete description of the kernel build process is given in Hamby & Mock [2004].

System Shutdown and Autoreboot

FreeBSD provides several utility programs to halt or reboot a system, or to bring a system from multiuser to single-user operation. Safe halting and rebooting of a system require support from the kernel. This support is provided by the reboot system call.

The reboot system call is a privileged call. A parameter specifies how the system should be shut down and rebooted. This parameter is a superset of the flags passed by the boot program to the system when the latter is initially bootstrapped. A system can be brought to a halt (typically by its being forced to execute an infinite loop), or it can be rebooted to single-user or multiuser operation. There are additional controls that can be used to force a crash dump before rebooting (see the next subsection for information about crash dumps) and to disable the writing of data that are in the buffer cache to disk if the information in the buffer cache is wrong.
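The parameter described above behaves like a bit mask in which each flag selects one aspect of the shutdown. The following is a minimal userland sketch of how such flags might combine. The DEMO_RB_ names and their values are assumptions made for illustration and are not copied from the kernel headers, and demo_describe() is a hypothetical helper, not part of the system.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Illustrative stand-ins for the kernel's reboot flags. The names and
 * values are assumptions for this sketch, not the real definitions.
 */
enum {
    DEMO_RB_SINGLE = 0x1,   /* come back up in single-user mode */
    DEMO_RB_NOSYNC = 0x2,   /* do not flush the buffer cache to disk */
    DEMO_RB_HALT   = 0x4,   /* halt instead of rebooting */
    DEMO_RB_DUMP   = 0x8    /* write a crash dump before going down */
};

/* Describe, in order, what a shutdown with the given flags would do. */
static const char *
demo_describe(int howto, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%s; %s; %s",
        (howto & DEMO_RB_NOSYNC) ? "skip sync" : "sync disks",
        (howto & DEMO_RB_DUMP) ? "dump" : "no dump",
        (howto & DEMO_RB_HALT) ? "halt" :
        (howto & DEMO_RB_SINGLE) ? "reboot single-user" : "reboot multiuser");
    return buf;
}
```

In this sketch a value of zero means a normal reboot to multiuser operation, and a caller that distrusts the buffer cache would combine the no-sync flag with whichever of the other flags it needs.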

Automatic rebooting is also commonly done when a catastrophic failure is recognized. The system will reboot itself automatically if it recognizes an unrecoverable failure during normal operation. Failures of this type, termed panics, are all handled by the panic() subroutine.

When the system is shutting down, it goes through three separate phases:

1. Shutdown of all services that depend on the filesystem to store data
2. Shutdown of the filesystem itself
3. Shutdown of services that do not depend on the filesystem

These three phases are necessary because some services will want to write some final data to the filesystem before it is turned off and may not be able to restart cleanly if they cannot do so.

Services register event handlers with the kernel to provide an orderly shutdown of the system. Each event handler is declared with the following macro:

 EVENTHANDLER_REGISTER(name, function, argument, priority)

The name identifies the part of the shutdown sequence in which the event handler's function will be called, as shown in Table 14.7. The function is the routine to be invoked, and the argument allows the module to pass itself any private data necessary to turn itself off. The priority orders the shutdown routines within a phase. It serves the same purpose here as the order argument does in the SYSINIT macro in creating an orderly startup sequence. The priority is necessary to make sure that services do not go away while other services depend on them.

Table 14.7. Shutdown phases.
Name	Shutdown Phase
shutdown_pre_sync	Before the disks are synced.
shutdown_post_sync	After the disks are synced.
shutdown_final	Just before stopping the CPU.

The kernel shutdown routine first walks the list of shutdown_pre_sync functions and calls each in turn, and then it shuts down the filesystems on the local disks. With the filesystems in a quiescent state, it invokes all the shutdown_post_sync functions. A kernel core dump is then made if one was requested (for example, if the shutdown routine was called because of a kernel panic). Kernel core dumps are written directly to the swap partition and not to a normal filesystem, which is why this step can come after the filesystems have been shut down. Finally, the kernel shutdown routine invokes all the functions registered in the shutdown_final group. It then goes into an infinite loop, awaiting a reset by the user.
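The ordering just described can be modeled in a few lines of userland C. This is only a sketch under stated assumptions: it does not use the kernel's EVENTHANDLER_REGISTER macro, every demo_ name is hypothetical, and it assumes that lower priority values run first within a phase.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The three event names from Table 14.7, in the order they fire. */
enum demo_phase { DEMO_PRE_SYNC, DEMO_POST_SYNC, DEMO_FINAL };

typedef void (*demo_fn)(void *arg);

struct demo_handler {
    enum demo_phase phase;
    int priority;       /* assumed: lower values run first in a phase */
    demo_fn fn;
    void *arg;          /* private data handed back to the handler */
};

#define DEMO_MAX 16
static struct demo_handler demo_list[DEMO_MAX];
static size_t demo_count;

/* Insert a handler, keeping the list ordered by phase, then priority. */
static void
demo_register(enum demo_phase phase, demo_fn fn, void *arg, int priority)
{
    size_t i = demo_count;

    assert(demo_count < DEMO_MAX);
    while (i > 0 && (demo_list[i - 1].phase > phase ||
        (demo_list[i - 1].phase == phase &&
        demo_list[i - 1].priority > priority))) {
        demo_list[i] = demo_list[i - 1];
        i--;
    }
    demo_list[i] = (struct demo_handler){ phase, priority, fn, arg };
    demo_count++;
}

/* Call every handler registered for one phase, in list order. */
static void
demo_run_phase(enum demo_phase phase)
{
    for (size_t i = 0; i < demo_count; i++)
        if (demo_list[i].phase == phase)
            demo_list[i].fn(demo_list[i].arg);
}

/* Trace of what happened, so that the ordering can be inspected. */
static char demo_trace[128];

/* Append the string passed as arg to the trace. */
static void
demo_note(void *arg)
{
    strncat(demo_trace, (const char *)arg,
        sizeof(demo_trace) - strlen(demo_trace) - 1);
}

/* Walk the sequence described in the text. */
static void
demo_shutdown(int want_dump)
{
    demo_run_phase(DEMO_PRE_SYNC);
    demo_note("[sync]");        /* filesystems are shut down here */
    demo_run_phase(DEMO_POST_SYNC);
    if (want_dump)
        demo_note("[dump]");    /* goes to swap, not to a filesystem */
    demo_run_phase(DEMO_FINAL);
}
```

Registering demo_note itself as a handler, with a short string as its argument, makes the resulting order visible in demo_trace after demo_shutdown() returns. Keeping the list sorted at registration time means the shutdown path only has to walk it, which is a reasonable design when registration is rare and shutdown must be simple.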

System Debugging

FreeBSD provides several facilities for debugging system failures. The most commonly used facility is the crash dump: a copy of memory that is saved on secondary storage by the kernel when a catastrophic failure occurs. Crash dumps are created by the doadump() routine. They occur if a reboot system call is made in which the RB_DUMP flag is specified or if the system encounters an unrecoverable and unexpected error.

The doadump() routine saves the current context with a call to the savectx() routine and then invokes the dumpsys() routine to write the contents of physical memory to secondary storage. The precise location of a crash dump is configurable; most systems place the information at the end of the primary swap partition. The write itself is done by the dump entry point of the configured disk driver.

A crash dump is retrieved from its location on disk by the /sbin/savecore program after the system is rebooted and the filesystems have been checked. It creates a file into which the crash-dump image is copied. Savecore also makes a copy of the initial kernel load image for use in debugging. The system administrator can examine crash dumps with the standard FreeBSD debugging program, gdb. The kernel is also set up so that a gdb debugger running on one machine can attach itself across a serial line to a kernel running on another machine. Once attached, it can set breakpoints, examine and modify kernel data structures, and invoke kernel routines on the machine being debugged. This form of source-level debugging is particularly useful in developing kernel device drivers, as long as the driver being developed is not the serial-line driver itself.

Passage of Information To and From the Kernel

In 4.3BSD and earlier systems, utilities that needed to get information from the kernel would open the special device /dev/kmem, which gave access to the kernel's memory. Using the name list from the kernel binary, the utility would seek to the address of the desired symbol and read the value at that location. Utilities with superuser privilege could also use this technique to modify kernel variables. Although this approach worked, it had four problems:

1. Applications did not have a way to find the binary for the currently running kernel reliably. Using an incorrect binary would result in looking at the wrong location in /dev/kmem, resulting in wildly incorrect output. For programs that modified the kernel, using the wrong binary would usually result in crashing the system by trashing some unrelated data structure.
2. Reading and interpreting the kernel name list was time-consuming. Thus, applications that had to read kernel data structures ran slowly.
3. Applications given access to the kernel memory could read the entire kernel memory. Malicious programs could snoop the terminal or network input queues looking for users who were typing sensitive information such as passwords.
4. As more of the kernel data structures became dynamically allocated, it became difficult to extract the desired information reliably. For example, in 4.3BSD, the process structures were all contained in a single statically allocated table that could be read in a single operation. In FreeBSD, process structures are allocated dynamically and are referenced through a linked list. Thus, they can be read out only one process entry at a time. Because a process entry is subdivided into many separate pieces, each of which resides in a different part of the kernel memory, every process entry takes several seeks and reads to extract through /dev/kmem.

To resolve these problems, 4.4BSD introduced the sysctl system call. This extensible-kernel interface allows controlled access to kernel data structures. The problems enumerated previously are resolved as follows:

1. Applications do not need to know which kernel binary they are running. The running kernel responds to their request and knows where its data structures are stored. Thus, the correct data structure is always returned or modified.
2. No time is spent reading or interpreting name lists. Accessing kernel data structures takes only a few system calls.
3. Sensitive data structures cannot be accessed. The kernel controls the set of data structures that it will return. Nothing else in the kernel is accessible. The kernel can impose its own access restrictions on each data structure individually.
4. The kernel can use its standard mechanisms for ensuring consistent access to distributed data structures. When requesting process entries, the kernel can acquire the appropriate locks to ensure that a coherent set of data can be returned.

Additional benefits of the interface include the following:

The sysctl interface is fully integrated with the jail system call so that processes running in jails can only access those kernel variables that are appropriate for their view of the system.
Values to be changed can be validated before the data structure is updated. If modification of the data structure requires exclusive access, an appropriate lock can be obtained before the update is done. Thus, an element can be added to a linked list without danger of another process traversing the list while the update is in progress.
Information can be computed only on demand. Infrequently requested information can be computed only when it is requested, rather than being computed continually. For example, many of the virtual-memory statistics are computed only when a system-monitoring program requests them.
The interface allows the superuser to change kernel parameters even when the system is running in secure mode (secure mode is described in Section 8.2). To prevent malfeasance, the kernel does not allow /dev/kmem to be opened for writing while the system is running in secure mode. Even when the system is running in secure mode, the sysctl interface will still allow a superuser to modify kernel data structures that do not affect security.
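Two of these benefits, validation before update and computation on demand, can be illustrated with a small userland model. The sketch below is not the kernel's handler interface: the demo_ variables, the handler signatures, and the legal range are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical tunable with a legal range; not a real kernel variable. */
static int demo_maxconn = 128;

/* Raw counters from which a statistic is derived only when asked for. */
static long demo_hits = 75, demo_lookups = 100;

/*
 * Model of a handler for a read-write variable: report the old value if
 * one is requested, then validate any new value before committing it.
 */
static int
demo_maxconn_handler(int *oldp, const int *newp)
{
    if (oldp != NULL)
        *oldp = demo_maxconn;
    if (newp != NULL) {
        if (*newp < 1 || *newp > 65535)
            return EINVAL;      /* rejected; the variable is untouched */
        /* A kernel would hold the appropriate lock around this store. */
        demo_maxconn = *newp;
    }
    return 0;
}

/* Model of a read-only variable whose value is computed on demand. */
static int
demo_hitrate_handler(int *oldp, const int *newp)
{
    if (newp != NULL)
        return EPERM;           /* writes are never permitted */
    assert(demo_lookups > 0);
    if (oldp != NULL)
        *oldp = (int)(demo_hits * 100 / demo_lookups);
    return 0;
}
```

Because every access passes through a handler, an out-of-range write fails with an error instead of corrupting state, and the percentage is calculated only at the moment a caller asks for it.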

The sysctl system call describes the kernel name space using a management information base (MIB). An MIB is a hierarchical name space much like the filesystem name space, except that each component is described with an integer value rather than with a string name. A hierarchical name space has several benefits:

New subtrees can be added without existing applications being affected.
If the kernel omits support for a subsystem, the sysctl information for that part of the system can be omitted.
Each kernel subsystem can define its own naming conventions. Thus, the network can be divided into protocol families. Each protocol family can be divided into protocol-specific information, and so on.
The name space can be divided into those parts that are machine independent and are available on every architecture and those parts that are machine dependent and are defined separately for each architecture.
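A hierarchical name space whose components are integers can be pictured as a tree that is walked one component per level. The following sketch models such a lookup in userland. The node layout, the labels, and the integer identifiers are all invented for illustration and do not correspond to the kernel's actual MIB numbering.

```c
#include <assert.h>
#include <stddef.h>

struct demo_node {
    int id;                         /* integer name of this component */
    const char *label;              /* human-readable label, for display */
    const struct demo_node *kids;   /* children, or NULL for a leaf */
    size_t nkids;
};

/* All identifiers and labels below are invented for this sketch. */
static const struct demo_node demo_proto[] = {
    { 1, "ttl", NULL, 0 },
    { 2, "forwarding", NULL, 0 },
};
static const struct demo_node demo_family[] = {
    { 1, "family_a", demo_proto, 2 },
};
static const struct demo_node demo_top[] = {
    { 1, "core", NULL, 0 },
    { 2, "network", demo_family, 1 },
};
static const struct demo_node demo_root = { 0, "root", demo_top, 2 };

/* Resolve an integer name, one component per level; NULL if absent. */
static const struct demo_node *
demo_lookup(const int *name, size_t namelen)
{
    const struct demo_node *cur = &demo_root;

    assert(name != NULL || namelen == 0);
    for (size_t i = 0; i < namelen; i++) {
        const struct demo_node *next = NULL;

        for (size_t k = 0; k < cur->nkids; k++) {
            if (cur->kids[k].id == name[i]) {
                next = &cur->kids[k];
                break;
            }
        }
        if (next == NULL)
            return NULL;    /* subtree omitted or never defined */
        cur = next;
    }
    return cur;
}
```

In this model the name {2, 1, 2} resolves to the forwarding leaf under the network subtree, while a name that begins with an unknown top-level identifier simply fails to resolve. Adding a new subtree means adding an entry to one array, which leaves every existing name unchanged.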

Since the addition of the sysctl system call in 4.4BSD, the number of variables it controls has been expanded to include about 1000 values that control the virtual memory system, filesystems, networking stacks, and the underlying hardware, as well as the kernel itself.
