2.4 Kernel Modules

We explained in Section 2.1 that monolithic operating-system kernels, including the Linux kernel, have the drawback that all functionality of the operating system is accommodated in one large kernel, making this kernel big and inflexible. To add new functionality to the operating-system kernel, you first have to build and install a new kernel. This is a rather cumbersome task and can also be expensive, because running applications have to be interrupted and the system has to be restarted. Moreover, using an operating-system kernel that includes all possible kinds of functions, drivers, and protocols is not recommended either, because the kernel would then become huge and consume an unnecessary amount of memory. In addition, there are always new functionalities we would like to integrate into the kernel, or newer versions of existing functionalities in which errors have been removed. In fact, we can assume that the set of functions of an operating-system kernel will change over time. For this reason, monolithic kernels have to be updated continually, and each update incurs the problems described above.

Linux is based on the monolithic approach, but since kernel Version 2.0 it has used a different method to solve the problems noted. Note that it does not opt for the microkernel-based approach, which also has drawbacks. The solution is kernel modules. These modules can be easily added to the kernel at runtime, and they behave as if they had belonged to the monolithic kernel since the system started. When the functionality of a module is no longer needed, it can simply be removed, and the memory space it used is freed.

We saw in Figure 2-1 in which components of the kernel we can use modules: device drivers, file systems, network protocols, and network drivers. The use of modules is actually not limited to these components. Modules can normally be used on an individual basis. However, adding some functionality means that you need a corresponding kernel interface to inform the rest of the kernel about the new components. The interfaces of the Linux network architecture and the possibilities to expand it by new functionalities are one of the central issues of this book.

When compiled as kernel modules, new functionalities can be added as needed and removed once you don't need them anymore. (See Section 2.4.1.) This means that modularization offers a flexibility very similar to that of microkernels, the only difference being that Linux modules run in the kernel address space, whereas components of microkernel systems run in the user address space. In this way, the Linux module concept combines the benefits of both operating-system variants. On the one hand, it avoids the expensive change of address spaces known from the microkernel-based approach; on the other, it lets you extend the kernel functionality selectively at runtime.

The following sections take a closer look at the structure and management of kernel modules, because modules are the best and most flexible option to enhance the Linux network architecture. Unfortunately, a detailed description of kernel modules would go beyond the scope of this book; we refer the reader mainly to [RuCo01] and [BBDK+01] instead.

2.4.1 Managing Kernel Modules

A kernel module consists of object code, which is loaded into the kernel address space at runtime, where it can be executed. When the system starts, it is not known which modules with what functionalities should be loaded, so the module has to make itself known to the respective components of the kernel. A module should also remove all references to itself when it is removed from the kernel address space. There are two methods available for these tasks, which each kernel module should implement: init_module() and cleanup_module(). We will have a closer look at these methods in Section 2.4.2; first, however, we need some general information about the management of kernel modules outside the kernel.

The following tools are used to manually load a module into the kernel, or remove it from the system:

insmod Modulename.o [arguments] This command tries to load a kernel module into the kernel address space. If it succeeds, the object code of the module is linked to the kernel; the module can now access the symbols (functions and data structures) of the kernel. Calling insmod causes the following system calls to run implicitly:
- sys_create_module() allocates memory space to accommodate the module in the kernel address space.
- sys_get_kernel_syms() returns the kernel's symbol table to resolve the missing references within the module to kernel symbols. (See Section 2.4.4.)
- sys_init_module() copies the module's object code into the kernel address space and calls the module's initialization function (init_module()).

When loading a module, we can also pass parameters (e.g., values for a device name, name; an interrupt line, irq; or an I/O port, io_addr). In the module itself, these parameters should be declared with the macro MODULE_PARM(arg, type). When the module is loaded, the parameters are simply passed by name after the module name; for example:

 root@tux # insmod wvlan_cs eth=1 network_name="myWavelan"

rmmod Modulename removes the specified module from the kernel address space. For this purpose, we use the system call sys_delete_module(), which, in turn, calls the module's method cleanup_module().
The module can be removed only if its reference counter is zero, which means that the module is currently not used at any point within the kernel. (See details in [RuCo01].)
lsmod lists all currently loaded modules and their dependencies and reference counters.
modinfo shows information about a module (e.g., its functionality, parameters, and author). This information cannot be generated automatically; it has to be set by the macros MODULE_DESCRIPTION, MODULE_AUTHOR, and so on in the module's source text.

Loading Modules Automatically

In addition to being loaded with the command-line tools described above, kernel modules can also be loaded into the kernel automatically when needed. To enable the automatic loading of modules, the corresponding support has to be activated when creating the kernel (CONFIG_KMOD).

Using the tools described in the previous section to add and remove modules always requires a user's intervention; more specifically, the intervention of root. For security reasons, only the system administrator is authorized to load and remove kernel modules. Though this approach is secure, it is somewhat inflexible, for example, when a user requires the functionality of a module that is currently not loaded in the kernel. For this reason, a means was created for loading modules automatically into the kernel upon demand.

Normally, the kernel generates an error message when a resource or a specific driver is not registered. You can ask for this component in advance by calling the kernel function request_module(). To use this function, you first have to activate the option Kernel Module Loader when configuring the kernel. The function request_module() will then try to use the modprobe command to load the desired module (and any additionally required modules) automatically. Options for this mechanism can be set in the file /etc/modules.conf.

Figure 2-5 shows an example of the configuration file /etc/modules.conf. This file specifies that the network device eth0 is currently represented by the module wvlan_cs and that, for loading of this module, the specified parameters should be passed to this module. If modprobe cannot find the module, then printk() generates an error message. (See Appendix B.1.1.)

Figure 2-5. Configuration file of the module loader: /etc/modules.conf.

 # Aliases - specify your hardware
 alias eth0  wvlan_cs
 options wvlan_cs eth=1 network_name="MyNet" station_name="neo"
 alias char-major-4             serial
 alias char-major-5             serial
 alias char-major-6             lp
 alias char-major-9             st
 alias tty-ldisc-1              slip
 alias tty-ldisc-3              ppp
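The mapping from a requested name such as eth0 to a module name can be illustrated in user space. The following is only a minimal sketch under stated assumptions, not the modprobe implementation: the function resolve_alias() and its arguments are hypothetical, and it understands nothing but simple "alias name module" lines like those in Figure 2-5.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of modprobe): scan configuration text in
 * the style of /etc/modules.conf for a line "alias <name> <module>" and
 * copy <module> into out. Returns 1 if an alias was found, 0 otherwise.
 */
int resolve_alias(const char *conf, const char *name, char *out, size_t outlen)
{
    const char *line = conf;

    assert(conf != NULL && name != NULL && out != NULL && outlen > 0);

    while (*line != '\0') {
        char buf[256], key[64], alias[64], module[64];
        const char *end = strchr(line, '\n');
        size_t len = (end != NULL) ? (size_t)(end - line) : strlen(line);

        /* Work on a copy of the current line only (overlong lines are cut). */
        if (len >= sizeof(buf))
            len = sizeof(buf) - 1;
        memcpy(buf, line, len);
        buf[len] = '\0';

        /* Comments and "options" lines do not start with "alias" and are skipped. */
        if (sscanf(buf, "%63s %63s %63s", key, alias, module) == 3 &&
            strcmp(key, "alias") == 0 && strcmp(alias, name) == 0) {
            snprintf(out, outlen, "%s", module);
            return 1;
        }
        if (end == NULL)
            break;
        line = end + 1;
    }
    return 0;
}
```

Called with the text of Figure 2-5 and the name eth0, this sketch would yield wvlan_cs; for a name without an alias line, it reports that nothing was found, which corresponds to the case where modprobe cannot find a module.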

Though this mechanism runs automatically, it can load only those modules the administrator has specified in the configuration files, to ensure that no user can load system-critical modules. Modules loaded automatically can also be removed automatically after some time. More configuration options of the Kernel Module Loader and the modprobe tool are described on the man pages and in [RuCo01].

2.4.2 Registering and Unregistering Module Functionality

In contrast to an application that runs its tasks after its start, a module normally provides functions used by other parts of the kernel in the course of the system operation. The kernel is enhanced by a new functionality, which may be removed after its use. It is not known upon system start which functionalities will be added to the kernel by modules, so we need interfaces for a module to register its functionality. The different kernel components (see Figure 2-1) have such interfaces (e.g., to register and unregister network drivers, file systems, protocols, etc.). (See Table 2-1.)

These interfaces can most easily be identified by function names. They generally begin with register_... and unregister_..., respectively. Table 2-1 showed a few examples.
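The register/unregister pattern can be modeled in user space with a simple linked list. This is a sketch under stated assumptions rather than an actual kernel interface: struct handler, register_handler(), unregister_handler(), and find_handler() are hypothetical names chosen to mirror the register_... and unregister_... naming convention.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor that a module hands to a kernel component. */
struct handler {
    const char     *name;
    int           (*receive)(int data);  /* functionality provided by the module */
    struct handler *next;
};

static struct handler *handler_list;     /* head of the list of registered handlers */

/* Modeled on register_...(): returns 0 on success, -1 if the name is taken. */
int register_handler(struct handler *h)
{
    struct handler *p;

    assert(h != NULL && h->name != NULL);
    for (p = handler_list; p != NULL; p = p->next)
        if (strcmp(p->name, h->name) == 0)
            return -1;
    h->next = handler_list;
    handler_list = h;
    return 0;
}

/* Modeled on unregister_...(): returns 0 on success, -1 if h was not registered. */
int unregister_handler(struct handler *h)
{
    struct handler **pp;

    for (pp = &handler_list; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == h) {
            *pp = h->next;      /* unlink h so that no reference to it remains */
            h->next = NULL;
            return 0;
        }
    }
    return -1;
}

/* What the rest of the "kernel" does: find a handler by name, or NULL. */
struct handler *find_handler(const char *name)
{
    struct handler *p;

    assert(name != NULL);
    for (p = handler_list; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}
```

In this model, a module would call register_handler() from its initialization method and unregister_handler() from its cleanup method; after unregistering, find_handler() no longer returns the descriptor, so nothing can call into the removed code.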

The functionality of a module is registered and initialized in the module's own method init_module(). As described earlier, it is called directly after successful integration of the module in the kernel. The method init_module() should run all initialization tasks, such as reserving memory, creating entries in the /proc directory, initializing data structures, registering the functionality, and so on.

Upon successful execution of init_module(), the functionality of the module should be known in the kernel, and all initialization steps required for it should have run. However, if something goes wrong during the initialization, all actions done up to this point should be undone in any event. The reason is that, when init_module() returns with an error code, the object code of the module is removed from the kernel address space, and all attempts to access methods of the module lead to a memory access error. [RuCo01] includes several tips to solve this problem.
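One common way to guarantee that a failed initialization leaves nothing behind is to undo the completed steps in reverse order. The following user-space sketch illustrates this idea; it is not taken from [RuCo01] or from the kernel, and struct demo_state, demo_init(), demo_cleanup(), and the fail_at parameter (which forces a step to fail for demonstration) are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* State of a hypothetical module; all names here are illustrative. */
struct demo_state {
    void *buffer;       /* step 1: memory reserved by the module */
    int   proc_entry;   /* step 2: 1 while a (pretend) /proc entry exists */
    int   registered;   /* step 3: 1 while the functionality is registered */
};

/*
 * Stand-in for init_module(). fail_at selects a step (1..3) that is forced
 * to fail, or 0 for none. Returns 0 on success and -1 on error; on error,
 * every step that had already succeeded has been undone.
 */
int demo_init(struct demo_state *s, int fail_at)
{
    assert(s != NULL);
    s->buffer = NULL;
    s->proc_entry = 0;
    s->registered = 0;

    s->buffer = (fail_at == 1) ? NULL : malloc(128);
    if (s->buffer == NULL)
        goto fail;

    if (fail_at == 2)
        goto fail_free;
    s->proc_entry = 1;

    if (fail_at == 3)
        goto fail_proc;
    s->registered = 1;
    return 0;

fail_proc:              /* undo in reverse order of setup */
    s->proc_entry = 0;
fail_free:
    free(s->buffer);
    s->buffer = NULL;
fail:
    return -1;
}

/* Stand-in for cleanup_module(): undo everything demo_init() set up. */
void demo_cleanup(struct demo_state *s)
{
    assert(s != NULL);
    s->registered = 0;
    s->proc_entry = 0;
    free(s->buffer);
    s->buffer = NULL;
}
```

The chain of labels means that each failure point jumps to exactly the cleanup it needs, so a later step can be added without duplicating the undo code of earlier steps.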

Appendix D shows a kernel module that adds a fictitious functionality to the kernel. In the further course of this book, we will introduce many elements of the Linux network architecture that can be implemented in the form of kernel modules (e.g., network drivers and protocols). You can use the module from Appendix D as a framework for modules you design yourself to enhance the Linux network architecture.

One of the module's own methods, cleanup_module(), is used to remove that module from the kernel address space. It should be used to clean up the work environment of the module (i.e., to unregister the module's functionality, free the memory it used, and remove dependencies between the module and other parts of the kernel).

Once cleanup_module() has run, there should be no more references by the kernel or other modules to the module concerned. Otherwise, a later access through such a reference would lead to a memory access error, causing the computer to crash.

The method cleanup_module() is called only if the reference counter (use counter) of the module is equal to zero. Otherwise, it is assumed that the module's functionality is currently needed, so that it cannot be removed. The macro MOD_IN_USE can be used to check the use counters.

A good example for the use of the reference counter is a module-based network driver. As soon as the relevant network device is opened, it is possible to access the driver's methods (and thus the module's methods) asynchronously. For this reason, the reference counter (for module-based drivers) is always incremented by the macro MOD_INC_USE_COUNT in the method dev->open(). When the network device is closed, so that driver methods can no longer be accessed, then MOD_DEC_USE_COUNT decrements the reference counter by one.

2.4.3 Passing Parameters When Loading a Module

We mentioned in Section 2.4.1 that parameters can be passed during loading of a kernel module. These parameters are specified either directly by insmod when loading or by modprobe in the configuration file. To be able to pass parameters to a module, you have to have previously declared these parameters in the module's source text. The following macros are available for this purpose:

MODULE_PARM(var, type) designates the variable var as a parameter of the module, and a value can be assigned to this parameter during loading. It needs to be previously declared, of course. The second parameter of the macro (type) specifies the data type of the module parameter. The following types can be specified:
- b: byte
- h: short (two bytes)
- i: integer
- l: long
- s: string (or a pointer to a string)
If the parameter is an array, then this can be specified as such by stating the array size before the type. For example, 1-3i means that the parameter is an array with integer values, and between one and three values can be assigned to this array. More information about this topic is included in the header file <linux/module.h>.
MODULE_PARM_DESC(var, desc) allows you to add a description (desc) for the parameter var. For example, this description is displayed when the tool modinfo is called. The description of a parameter should be short, but descriptive enough to make clear the task of that parameter.
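To make the format of the type string concrete, the following user-space sketch splits a string such as 1-3i into its array bounds and its type letter. It is an illustration only, not the parser used by insmod: struct parm_type and parse_parm_type() are hypothetical, and the defaults (a single value when no size is given, and a maximum equal to the minimum when only one number is given) are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Parsed form of a type string such as "i" or "1-3i" (hypothetical struct). */
struct parm_type {
    int  min;    /* minimum number of values */
    int  max;    /* maximum number of values */
    char type;   /* one of 'b', 'h', 'i', 'l', 's' */
};

/* Parse [min[-max]]type. Returns 0 on success, -1 on a malformed string. */
int parse_parm_type(const char *spec, struct parm_type *out)
{
    const char *p = spec;

    assert(spec != NULL && out != NULL);

    out->min = out->max = 1;               /* assumed default: a single value */
    if (isdigit((unsigned char)*p)) {
        out->min = 0;
        while (isdigit((unsigned char)*p))
            out->min = out->min * 10 + (*p++ - '0');
        out->max = out->min;               /* assumed: max defaults to min */
        if (*p == '-') {
            p++;
            if (!isdigit((unsigned char)*p))
                return -1;
            out->max = 0;
            while (isdigit((unsigned char)*p))
                out->max = out->max * 10 + (*p++ - '0');
        }
    }
    /* Exactly one type letter from the list above must follow. */
    if (*p == '\0' || strchr("bhils", *p) == NULL || p[1] != '\0')
        return -1;
    if (out->min > out->max)
        return -1;
    out->type = *p;
    return 0;
}
```

For 1-3i, the sketch reports a minimum of one value, a maximum of three, and the type letter i; strings with an unknown type letter or with a minimum above the maximum are rejected.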

In addition, the following macros can be used to supply additional information, which can be displayed with the command-line tool modinfo. It is recommended that one use this informative option, because there could often be situations where the user of a module does not have the source text:

MODULE_AUTHOR(name) can be used to specify the author of a module. It is recommended to also state an e-mail address, in addition to names, for easy contact in the event that the module contains errors (and, of course, to be able to accept the large number of thank-you messages for your generous contribution to the open-source movement :-)).
MODULE_DESCRIPTION(desc) should contain a description of the module's functionality. Ideally, you describe the basic functionality and include reference to further information (e.g., a URL).
MODULE_SUPPORTED_DEVICE(dev) is currently not used. However, it might be used in future kernel versions to load the module automatically when the device dev is required.

The sample module in Appendix D shows how to use the macros described above.

2.4.4 Symbol Tables of the Kernel and Modules

Kernel modules are object code, which is added to the kernel at runtime. Once it has been embedded, the module is in the kernel address space. Before the embedding of a module, however, several aspects have to be observed. As the module will probably have to call functions of the actual kernel and want to use its data structures, we first have to resolve the addresses of these functions and data structures. The Linux kernel includes a table, the ksym symbol table,^[2] for this purpose. This table includes all required information. Each row of the table contains the name and memory address of a function or variable. Information about the data type or parameters is not saved to the table, so the programmer has to ensure that each symbol is used with the correct type and parameters.

^[2] You can use the command-line call ksyms -a to view the contents of the current symbol table.

You can see in Figure 2-6 that a module can access only functions and data structures saved in the kernel's symbol table. Other parts of the kernel are not accessible to a module. This has the benefit that modules cooperate with the kernel exclusively over defined interfaces, as is true for the microkernel architectures described in Section 2.1.

Figure 2-6. Symbol table of the Linux kernel (excerpt).

 c01e2640    register_netdevice
 c01e2888    unregister_netdevice
 c01e0ef8    netdev_state_change
 c01ddf94    skb_clone
 c01de20c    skb_copy
 c01e147c    netif_rx
 c01e0b40    dev_add_pack
 c01e0b8c    dev_remove_pack
 c01e0d78    dev_get
 c01e0e94    dev_alloc
 d0a03ec4    ppp_register_channel             [ppp_generic]
 d0a03f98    ppp_unregister_channel           [ppp_generic]
 d0a08660    ppp_crc16_table                  [ppp_async]
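Resolving a reference against such a table amounts to a lookup by name. The following user-space sketch uses a few rows copied from Figure 2-6; struct ksym, demo_symtab, and lookup_symbol() are hypothetical names, and the real kernel data structures are organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the table: a name and an address, but no type information. */
struct ksym {
    const char    *name;
    unsigned long  addr;
};

/* A few rows copied from the excerpt in Figure 2-6. */
static const struct ksym demo_symtab[] = {
    { "register_netdevice",   0xc01e2640UL },
    { "unregister_netdevice", 0xc01e2888UL },
    { "skb_clone",            0xc01ddf94UL },
    { "netif_rx",             0xc01e147cUL },
};

/* Hypothetical lookup: return the address of name, or 0 if it is not exported. */
unsigned long lookup_symbol(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(demo_symtab) / sizeof(demo_symtab[0]); i++)
        if (strcmp(demo_symtab[i].name, name) == 0)
            return demo_symtab[i].addr;
    return 0;
}
```

Because a row holds only a name and an address, the caller receives a bare number; whether that address is used as a function with the right parameters or as a variable of the right type is entirely up to the module's source code.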

The instruction EXPORT_SYMBOL(xxx) from the file kernel/ksyms.c adds a function or variable of the kernel to the symbol table. From then on, each module can access this variable or call this function. In addition, modules can use the macro EXPORT_SYMBOL themselves to export selected functions and variables of their own into the symbol table of the kernel. A module that does not want to export any functions or variables can simply state this with the macro EXPORT_NO_SYMBOLS.

A module can normally access only those symbols that are listed in the symbol table when the module loads. For this reason, a situation where two modules loaded consecutively into the kernel want to access each other's symbols may cause problems. The module loaded first cannot access the symbols of the second module, because they are not yet known. Since Linux kernel Version 2.4, however, there is a solution to this problem. This solution is called intermodule communication and is introduced in [RuCo01] and [BBDK+01].
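The load-order problem can be reproduced with a small user-space model. This sketch makes strong simplifying assumptions: each module needs at most one symbol and provides at most one, loading fails outright when the needed symbol is missing, and all names (export_symbol(), struct demo_mod, load_demo_mod()) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 16

static const char *exported[MAX_SYMS];   /* names currently in the symbol table */
static int         n_exported;

/* Model of exporting a symbol at load time. Returns 0, or -1 if the table is full. */
int export_symbol(const char *name)
{
    assert(name != NULL);
    if (n_exported == MAX_SYMS)
        return -1;
    exported[n_exported++] = name;
    return 0;
}

/* Returns 1 if name is in the table, 0 otherwise. */
static int is_exported(const char *name)
{
    int i;

    for (i = 0; i < n_exported; i++)
        if (strcmp(exported[i], name) == 0)
            return 1;
    return 0;
}

/* Hypothetical description of a module: the symbol it needs and the one it provides. */
struct demo_mod {
    const char *needs;      /* one required symbol, or NULL */
    const char *provides;   /* one exported symbol, or NULL */
};

/* Model of loading: fails with -1 if the needed symbol is not yet in the table. */
int load_demo_mod(const struct demo_mod *m)
{
    assert(m != NULL);
    if (m->needs != NULL && !is_exported(m->needs))
        return -1;
    if (m->provides != NULL)
        return export_symbol(m->provides);
    return 0;
}
```

With a one-way dependency, loading the providing module first resolves the problem. With two modules that need each other's symbols, neither order works in this model, which is the situation that a separate mechanism has to address.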