2.6 Memory Management in the Kernel

Memory management is one of the main components in the kernel of any operating system. It supplies each process with a virtual memory space, often one much larger than the physical memory. This is achieved by dividing memory into pages and swapping out pages that are temporarily not needed to the swap space on the hard disk. When an application accesses a swapped-out page, the kernel intercepts the access, reloads the page into physical memory, and lets the application continue without noticing that the page had been swapped out and back in.

The memory used by the kernel itself cannot be swapped out: if the memory management were to move its own code to the swap space, that code would not be available when it is needed later, and the system would block. Performance is, of course, a further reason. Therefore, we will always distinguish between the kernel address space and the user address space in the rest of this book.

Virtual memory management is one of the most important and most complex components of an operating system. [Tan95] offers an overview of the theory of virtual memory management, and detailed information about its implementation in the Linux kernel can be found in [BBDK+01] and [BoCe00]. Within the Linux network architecture, the structure of the virtual memory management is less interesting; it matters only with regard to whether memory can be reserved and released efficiently, as we will see in the following section. We will also introduce methods to exchange data between the kernel address space and the user address space. Section 2.6.2 then gives a brief introduction to the slab cache, which efficiently manages equal-sized memory spaces (for example, those used for socket buffers).

2.6.1 Selected Memory Management Functions

This section introduces the basic functions of memory management a programmer writing kernel components or kernel modules needs. First, we will discuss how memory spaces can be reserved and released in the kernel. Then we will introduce functions used to copy data between the kernel address space and the user address space.

Reserving and Releasing Memory in the Kernel

`kmalloc()`	mm/slab.c

kmalloc(size, priority) attempts to reserve a contiguous memory space of size bytes in the kernel's memory. Somewhat more than size bytes may actually be reserved, because the kernel manages memory in so-called slabs. Slabs are caches, each managing memory spaces of a specific size. (See /proc/slabinfo.) Letting a slab cache reserve memory space performs clearly better than many other methods [Tan95].
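The rounding described above can be illustrated with a small user-space sketch. The table of size classes and the name size_class_for() are assumptions made for this example only; the size classes a real kernel uses depend on its version and configuration and can be looked up in /proc/slabinfo.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size classes (powers of two), for illustration only.
 * The real list depends on the kernel version and configuration. */
static const size_t size_classes[] = { 32, 64, 128, 256, 512, 1024, 2048, 4096 };

/* Return the smallest class that can hold `size` bytes,
 * or 0 if the request is too large for this toy table. */
size_t size_class_for(size_t size)
{
    size_t n = sizeof(size_classes) / sizeof(size_classes[0]);

    for (size_t i = 0; i < n; i++) {
        if (size <= size_classes[i])
            return size_classes[i];
    }
    return 0;
}

/* Example: a 100-byte request is served from the 128-byte class,
 * so 28 bytes more than requested are reserved. */
void size_class_demo(void)
{
    assert(size_class_for(100) == 128);
}
```

Under this model, a request never spans two classes; the price is the unused remainder between the requested size and the class size.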

The parameter priority can be used to specify options. We will briefly describe the most important options below and refer our readers to [RuCo01] for a detailed explanation of the large number of options offered by kmalloc(). The prefix GFP_ indicates that the function get_free_pages() may be used to reserve memory.

GFP_KERNEL is normally used when the requesting activity can be interrupted (i.e., put to sleep) during the reservation. It can also be used for processes that want to reserve memory within a system call. For activities that must not be interrupted (e.g., interrupt routines), GFP_KERNEL should not be used.
GFP_ATOMIC is the counterpart of GFP_KERNEL and indicates that the memory request must be atomic (i.e., it must be served without interrupting the requesting activity).
GFP_DMA indicates that memory in the DMA-enabled area should be reserved.
GFP_DMA can be combined with one of the two previous flags.
[RuCo01] introduces additional options, but we will not repeat them here, as they are of lesser interest.

The return value of kmalloc() is a pointer to the successfully reserved memory space, or NULL, if no more memory is available.

`kfree()`	mm/slab.c

kfree(objp) releases the memory space reserved at address objp. This memory space must previously have been reserved by kmalloc().
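The typical calling pattern for kmalloc() and kfree() can be sketched as follows. To keep the example compilable outside the kernel, kmalloc(), kfree(), and GFP_KERNEL are replaced here by user-space stand-ins built on malloc() and free(); in kernel code they come from the kernel headers instead. The structure conn_state and the two helper functions are hypothetical names chosen for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* User-space stand-ins so that the sketch compiles outside the kernel.
 * They are NOT the kernel implementations. */
#define GFP_KERNEL 0
static void *kmalloc(size_t size, int priority)
{
    (void)priority;            /* the stand-in ignores the flags */
    return malloc(size);
}
static void kfree(const void *objp)
{
    free((void *)objp);
}

/* Hypothetical per-connection record, used only for illustration. */
struct conn_state {
    int  id;
    char name[16];
};

/* Reserve and initialize a record; returns NULL if no memory is available. */
struct conn_state *conn_state_new(int id, const char *name)
{
    struct conn_state *cs;

    assert(name != NULL);
    cs = kmalloc(sizeof(*cs), GFP_KERNEL);
    if (cs == NULL)
        return NULL;           /* kmalloc() may fail: always check */

    cs->id = id;
    strncpy(cs->name, name, sizeof(cs->name) - 1);
    cs->name[sizeof(cs->name) - 1] = '\0';
    return cs;
}

/* Release a record that was obtained from conn_state_new(). */
void conn_state_free(struct conn_state *cs)
{
    kfree(cs);
}
```

In an interrupt routine, the same pattern would pass GFP_ATOMIC instead of GFP_KERNEL, for the reasons given in the list of options above.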

Copying Between Kernel and User Address Space

The following functions can be used to exchange data between the user address space and the kernel address space. They are defined in the file include/asm/uaccess.h.

copy_from_user(to, from, count) copies count bytes from the address from in the user address space to the address to in the kernel address space.
copy_to_user(to, from, count) copies count bytes from the address from in the kernel address space to the address to in the user address space.
[RuCo01] and [BBDK+01] introduce more functions, but most of them can be implemented by copy_from/to_user().

Before the user address space is accessed, the above functions use access_ok() to confirm that the given address range actually lies within the user address space; a page of that range that is not currently in physical memory is loaded by the normal page-fault handling. This check had to be done manually in earlier versions of the Linux kernel.
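A common way to use these two functions is to copy user data into a kernel-side variable first, validate it there, and only then act on it. Both functions return the number of bytes that could not be copied, so a nonzero result is usually turned into -EFAULT. The sketch below assumes user-space stand-ins for copy_from_user() and copy_to_user() (a NULL pointer models an invalid user address), and the structure demo_opts and the functions demo_set_opts() and demo_get_opts() are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* User-space stand-ins, NOT the kernel implementations. Like the real
 * functions, they return the number of bytes that could not be copied.
 * A NULL "user" pointer models an invalid user address. */
static unsigned long copy_from_user(void *to, const void *from, unsigned long count)
{
    if (from == NULL)
        return count;
    memcpy(to, from, count);
    return 0;
}

static unsigned long copy_to_user(void *to, const void *from, unsigned long count)
{
    if (to == NULL)
        return count;
    memcpy(to, from, count);
    return 0;
}

/* Hypothetical option block exchanged with an application. */
struct demo_opts {
    int mtu;
    int ttl;
};

static struct demo_opts current_opts = { 1500, 64 };

/* Copy the options in, validate the kernel-side copy, then commit it. */
int demo_set_opts(const void *user_ptr)
{
    struct demo_opts tmp;

    if (copy_from_user(&tmp, user_ptr, sizeof(tmp)))
        return -EFAULT;        /* part of the user buffer was not readable */
    if (tmp.mtu <= 0 || tmp.ttl <= 0)
        return -EINVAL;        /* never trust values supplied by the user */

    current_opts = tmp;
    return 0;
}

/* Copy the current options out to the application. */
int demo_get_opts(void *user_ptr)
{
    if (copy_to_user(user_ptr, &current_opts, sizeof(current_opts)))
        return -EFAULT;
    return 0;
}
```

Working on the local copy tmp rather than on the user buffer ensures that the application cannot change the values between validation and use.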

2.6.2 Memory Caches

Reserving memory spaces by calling kmalloc() can take a while, but it is the only way to reserve a memory space. However, when memory spaces of the same size are required over and over again, it makes little sense to release them with kfree() after each use. Instead, they should be kept briefly in a list and reused from there when needed.

The Linux kernel allows this approach by providing slab caches. This means that you can create a cache with memory spaces of specific sizes, where the memory spaces no longer needed are managed until they are requested again.

Information about the current slab caches, including their use and sizes, can be read from the proc file /proc/slabinfo. We will now introduce the methods required to build and tear down slab caches as well as functions to reserve and release memory spaces from a slab cache.

`kmem_cache_create()`	mm/slab.c

The function kmem_cache_create(name, size, offset, flags, ctor, dtor) is used to create a slab cache for memory spaces of size bytes each. An arbitrary number of memory spaces (of equal size) can be managed in this slab cache. The parameter name should point to a string containing the name of the slab cache, which appears in the output of /proc/slabinfo.

The parameter offset can be used to specify the offset of the first memory space within a memory page. Note, however, that this is normally not necessary, so it is usually set to 0. The parameter flags can be used to specify additional options when reserving memory spaces:

SLAB_HWCACHE_ALIGN: Aligns to the size of the first-level cache in the CPU.
SLAB_NO_REAP: Prevents the slab cache from being reduced when the kernel needs memory.
SLAB_CACHE_DMA: Specifies that the reserved memory spaces have to be within DMA-enabled areas.

The ctor and dtor parameters allow you to specify a constructor and a destructor for your memory spaces. They are then used to initialize or clean up, respectively, the reserved memory spaces.

The return value of the function kmem_cache_create() is a pointer to the management structure of the slab cache, which is of data type kmem_cache_t. In the Linux network architecture, slab caches are used, for instance, for socket buffers (see Chapter 4). The cache for socket buffers is created as follows:

 skbuff_head_cache = kmem_cache_create("skbuff_head_cache",
                                       sizeof(struct sk_buff), 0,
                                       SLAB_HWCACHE_ALIGN,
                                       skb_headerinit, NULL);

`kmem_cache_destroy()`	mm/slab.c

kmem_cache_destroy(cachep) releases the slab cache cachep. Note, however, that this call succeeds only if all memory spaces granted by the cache have been returned to it.

`kmem_cache_shrink()`	mm/slab.c

kmem_cache_shrink(cachep) is called by the kernel when the kernel itself requires memory space and may have to reduce the cache.

`kmem_cache_alloc()`	mm/slab.c

kmem_cache_alloc(cachep, flags) can be used to request a memory space from the slab cache, cachep. If memory space is available, then this call immediately returns a pointer to it to the caller. If the slab cache is empty, then kmalloc() can be used to reserve new memory space. For this call of kmalloc(), you can use flags to specify the options introduced in Section 2.6.1.

`kmem_cache_free()`	mm/slab.c

kmem_cache_free(cachep, ptr) frees the memory space that begins at address ptr and gives it back to the cache, cachep. Of course, this must be a memory space that was previously reserved with kmem_cache_alloc().
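The life cycle of a cache (create, allocate, free, destroy) can be modeled in user space with a simple free list. The sketch below is a toy model under stated assumptions, not the kernel's slab implementation: all toy_* names are hypothetical, memory comes from malloc(), and there are no pages, flags, or locking. It does mirror three points from the text: freed objects are kept for reuse, a constructor initializes newly created objects, and destroying the cache fails while objects are still in use.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header placed in front of every object; it links free objects and
 * keeps the object that follows it suitably aligned. */
union toy_hdr {
    union toy_hdr *next;
    max_align_t    align;
};

struct toy_cache {
    const char    *name;         /* name of the cache, for diagnostics */
    size_t         size;         /* size of each object in bytes */
    void         (*ctor)(void *);/* optional constructor, may be NULL */
    union toy_hdr *free_list;    /* objects waiting to be reused */
    unsigned long  in_use;       /* objects currently handed out */
};

struct toy_cache *toy_cache_create(const char *name, size_t size,
                                   void (*ctor)(void *))
{
    struct toy_cache *c = malloc(sizeof(*c));

    if (c == NULL)
        return NULL;
    c->name = name;
    c->size = size;
    c->ctor = ctor;
    c->free_list = NULL;
    c->in_use = 0;
    return c;
}

void *toy_cache_alloc(struct toy_cache *c)
{
    union toy_hdr *h = c->free_list;

    if (h != NULL) {
        c->free_list = h->next;          /* fast path: reuse a freed object */
    } else {
        h = malloc(sizeof(*h) + c->size);/* slow path: cache is empty */
        if (h == NULL)
            return NULL;
        if (c->ctor != NULL)
            c->ctor(h + 1);              /* construct only new objects */
    }
    c->in_use++;
    return h + 1;                        /* the object follows its header */
}

void toy_cache_free(struct toy_cache *c, void *ptr)
{
    union toy_hdr *h = (union toy_hdr *)ptr - 1;

    h->next = c->free_list;              /* keep the object for reuse */
    c->free_list = h;
    c->in_use--;
}

/* Returns 0 on success, -1 if objects are still handed out. */
int toy_cache_destroy(struct toy_cache *c)
{
    if (c->in_use != 0)
        return -1;
    while (c->free_list != NULL) {
        union toy_hdr *h = c->free_list;
        c->free_list = h->next;
        free(h);
    }
    free(c);
    return 0;
}

/* Example constructor: zeroes an int-sized object and counts its calls. */
int toy_ctor_calls;
void toy_zero_ctor(void *obj)
{
    *(int *)obj = 0;
    toy_ctor_calls++;
}
```

Because the constructor runs only when an object is first created, callers of this toy cache are expected to return objects in their constructed state, so that a reused object needs no further initialization.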