4.4. Slab Allocator

We discussed that pages are the basic unit of memory for the memory manager. However, processes generally request memory on the order of bytes, not on the order of pages. To support the allocation of smaller memory requests made through calls to functions like kmalloc(), the kernel implements the slab allocator, which is a layer of the memory manager that acts on acquired pages.

The slab allocator seeks to reduce the cost incurred by allocating, initializing, destroying, and freeing memory areas by maintaining a ready cache of commonly used memory areas. This cache keeps the memory areas allocated, initialized, and ready to deploy. When the requesting process no longer needs the memory areas, they are simply returned to the cache.

In practice, the slab allocator is made up of many caches, each of which stores memory areas of a different size. Caches can be specialized or general purpose. Specialized caches store memory areas that hold specific objects, such as descriptors. For example, process descriptors, the task_structs, are stored in a cache that the slab allocator maintains. The size of each memory area held by this cache is sizeof(task_struct). In the same manner, inode and dentry data structures are also maintained in caches. General caches are made of memory areas of predetermined sizes. These sizes include memory areas of 32, 64, 128, 256, 512, 1,024, 2,048, 4,096, 8,192, 16,384, 32,768, 65,536, and 131,072 bytes.[7]
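To make the role of the general caches concrete, the following minimal sketch maps a request size onto the smallest general cache size from the list above that can hold it. The table and the helper pick_general_size() are hypothetical names introduced for illustration; this is not the kernel's kmalloc() code, only a user-space model of the size classes.

```c
#include <assert.h>
#include <stddef.h>

/* General cache sizes listed in the text, in bytes. */
static const size_t general_sizes[] = {
    32, 64, 128, 256, 512, 1024, 2048, 4096,
    8192, 16384, 32768, 65536, 131072
};

/*
 * Hypothetical helper: return the smallest general cache size that can
 * hold `request` bytes, or 0 if the request exceeds the largest size.
 * A zero-byte request is treated as a caller error in this sketch.
 */
static size_t pick_general_size(size_t request)
{
    size_t i;

    assert(request > 0);
    for (i = 0; i < sizeof(general_sizes) / sizeof(general_sizes[0]); i++) {
        if (request <= general_sizes[i])
            return general_sizes[i];
    }
    return 0; /* too large for any general cache */
}
```

Under this model, a 100-byte request is served from the 128-byte size class, so 28 bytes of the object go unused. That is the price paid for keeping a small, fixed set of general caches.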
If we run the command cat /proc/slabinfo, the existing slab allocator caches are listed. Looking at the first column of the output, we can see the names of data structures and a group of entries following the format size-*. The first set corresponds to specialized object caches; the latter set corresponds to caches that hold general-purpose objects of the specified size.

You might also notice that the general-purpose caches have two entries per size, one of which ends with (DMA). This exists because memory areas from either DMA or normal zones can be requested. The slab allocator maintains caches of both types of memory to facilitate these requests. Figure 4.5 shows the output of /proc/slabinfo, which lists the caches of both types of memory.

Figure 4.5. cat /proc/slabinfo

A cache is further subdivided into containers called slabs. Each slab is made up of one or more contiguous page frames from which the smaller memory areas are allocated. That is why we say that the slabs contain the objects. The objects themselves are address intervals of a predetermined size within a page frame that belongs to a particular slab. Figure 4.6 shows the slab allocator anatomy.

Figure 4.6. Slab Allocator Anatomy

The slab allocator uses three main structures to maintain object information: the cache descriptor called kmem_cache_s, the general caches descriptor called cache_sizes, and the slab descriptor called slab. Figure 4.7 summarizes the relationships between all the descriptors.

Figure 4.7. Slab Allocator Structures

4.4.1. Cache Descriptor

Every cache has a cache descriptor of type kmem_cache_s, which holds its information. Most of these values are set or calculated at cache-creation time in kmem_cache_create() (mm/slab.c). We discuss this function in a later section. First, let's look at some of the fields in the cache descriptor and understand the information they hold.
-----------------------------------------------------------------------------
mm/slab.c
struct kmem_cache_s {
...
    struct kmem_list3 lists;
...
    unsigned int objsize;
    unsigned int flags;        /* constant flags */
    unsigned int num;          /* # of objs per slab */
...
    unsigned int gfporder;

    /* force GFP flags, e.g. GFP_DMA */
    unsigned int gfpflags;

    size_t color;              /* cache coloring range */
    unsigned int color_off;    /* color offset */
    unsigned int color_next;   /* cache coloring */
    kmem_cache_t *slabp_cache;
    unsigned int dflags;       /* dynamic flags */

    /* constructor func */
    void (*ctor)(void *, kmem_cache_t *, unsigned long);

    /* de-constructor func */
    void (*dtor)(void *, kmem_cache_t *, unsigned long);

    /* 4) cache creation/removal */
    const char *name;
    struct list_head next;
...
};
-----------------------------------------------------------------------------

4.4.1.1. lists

The lists field is a structure that holds three list heads, one for each of the three states that slabs can find themselves in: partial, full, and free. A cache can have one or more slabs in any of these states. It is by way of this data structure that the cache references the slabs. The lists themselves are doubly linked lists that are maintained by the slab descriptor field list. This is described in the "Slab Descriptor" section later in this chapter.

-----------------------------------------------------------------------------
mm/slab.c
struct kmem_list3 {
    struct list_head slabs_partial;
    struct list_head slabs_full;
    struct list_head slabs_free;
...
    unsigned long next_reap;
    struct array_cache *shared;
};
-----------------------------------------------------------------------------

lists.slabs_partial

lists.slabs_partial is the head of the list of slabs that are only partially allocated with objects.
That is, a slab in the partial state has some of its objects allocated and some free to be used.

lists.slabs_full

lists.slabs_full is the head of the list of slabs whose objects have all been allocated. These slabs contain no available objects.

lists.slabs_free

lists.slabs_free is the head of the list of slabs whose objects are all free to be allocated. Not a single one of its objects has been allocated.

Maintaining these lists reduces the time it takes to find a free object. When an object from the cache is requested, the kernel searches the partial slabs. If the partial slabs list is empty, it then looks at the free slabs. If the free slabs list is empty, a new slab is created.

lists.next_reap

Slabs have page frames allocated to them. If these pages are not in use, it is better to return them to the main memory pool. Toward this end, the caches are reaped. This field holds the time of the next cache reap. It is set in kmem_cache_create() (mm/slab.c) at cache-creation time and is updated in cache_reap() (mm/slab.c) every time it is called.

4.4.1.2. objsize

The objsize field holds the size (in bytes) of the objects in the cache. This is determined at cache-creation time based on requested size and cache alignment concerns.

4.4.1.3. flags

The flags field holds the flag mask that describes constant characteristics of the cache. Possible flags are defined in include/linux/slab.h and Table 4.4 describes them.
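The three slab states and the search order described for the lists field can be sketched in a few lines of C. This is a simplified user-space model, not kernel code: struct toy_lists tracks only how many slabs sit on each list, and slab_state() and choose_slab() are hypothetical helpers introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum slab_state { SLAB_IS_FREE, SLAB_IS_PARTIAL, SLAB_IS_FULL };
enum slab_source { FROM_PARTIAL, FROM_FREE, FROM_NEW_SLAB };

/* Simplified stand-in for kmem_list3: only the length of each list. */
struct toy_lists {
    size_t partial; /* slabs with some objects allocated */
    size_t full;    /* slabs with every object allocated */
    size_t free;    /* slabs with no object allocated */
};

/* Classify a slab from its allocated-object count and its capacity. */
static enum slab_state slab_state(unsigned int inuse, unsigned int num)
{
    assert(num > 0 && inuse <= num);
    if (inuse == 0)
        return SLAB_IS_FREE;
    if (inuse == num)
        return SLAB_IS_FULL;
    return SLAB_IS_PARTIAL;
}

/* Search order from the text: partial slabs, then free slabs, then grow. */
static enum slab_source choose_slab(const struct toy_lists *l)
{
    if (l->partial > 0)
        return FROM_PARTIAL;
    if (l->free > 0)
        return FROM_FREE;
    return FROM_NEW_SLAB; /* both lists empty: a new slab must be created */
}
```

Full slabs never take part in the search, which is why keeping them on their own list shortens the time needed to find a free object.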
4.4.1.4. num

The num field holds the number of objects per slab in this cache. This is determined upon cache creation (also in kmem_cache_create()) based on gfporder's value (see the next field), the size of the objects to be created, and the alignment they require.

4.4.1.5. gfporder

The gfporder is the order (base 2) of the number of contiguous page frames that are contained per slab in the cache. This value defaults to 0 and is set upon cache creation with the call to kmem_cache_create().

4.4.1.6. gfpflags

The gfpflags flags specify the type of page frames to be requested for the slabs in this cache. They are determined based on the flags requested of the memory area. For example, if the memory area is intended for DMA use, the gfpflags field is set to GFP_DMA, and this is passed on upon page frame request.

4.4.1.7. slabp_cache

Slab descriptors can be stored within the cache itself or external to it. If the slab descriptors for the slabs in this cache are stored externally to the cache, the slabp_cache field holds a pointer to the cache descriptor of the cache that stores objects of the type slab descriptor. See the "Slab Descriptor" section for more information on slab descriptor storage.

4.4.1.8. ctor

The ctor field holds a pointer to the constructor[8] that is associated with the cache, if one exists.
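The relationship between gfporder, the object size, the alignment, and num can be illustrated with a rough calculation. The sketch below rests on several assumptions that are not taken from the kernel source: a 4,096-byte page (TOY_PAGE_SIZE), a slab descriptor stored externally so that it consumes no space in the slab, and no cache coloring. The helpers align_up() and objs_per_slab() are hypothetical; the real computation in mm/slab.c accounts for more overhead than this.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u /* assumption: 4 KB page frames */

/* Round `size` up to the next multiple of `align`. */
static size_t align_up(size_t size, size_t align)
{
    assert(align > 0);
    return (size + align - 1) / align * align;
}

/*
 * Hypothetical estimate of objects per slab. A slab spans 2^gfporder
 * contiguous page frames; each object occupies its size rounded up to
 * the required alignment. Descriptor and coloring overhead are ignored.
 */
static size_t objs_per_slab(unsigned int gfporder, size_t objsize, size_t align)
{
    size_t slab_bytes;

    assert(objsize > 0);
    slab_bytes = (size_t)TOY_PAGE_SIZE << gfporder;
    return slab_bytes / align_up(objsize, align);
}
```

With these assumptions, 100-byte objects aligned to 32 bytes occupy 128 bytes each, so an order-0 slab holds 32 of them. An object larger than one page yields 0 at order 0, which shows why a higher gfporder is needed for large objects.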
4.4.1.9. dtor

Much like the ctor field, the dtor field holds a pointer to the destructor that is associated with the cache, if one exists. Both the constructor and destructor are defined at cache-creation time and passed as parameters to kmem_cache_create().

4.4.1.10. name

The name field holds the human-readable string of the name that is displayed when /proc/slabinfo is opened. For example, the cache that holds file pointers has a value of filp in this field. This can be better understood by executing a call to cat /proc/slabinfo. The name field of a cache has to hold a unique value. Upon creation, the name requested for a cache is compared to the names of all other caches in the list. No duplicates are allowed. Cache creation fails if another cache exists with the same name.

4.4.1.11. next

The next field links this cache descriptor into the list of cache descriptors. Because it is a struct list_head, the list is doubly linked.

4.4.2. General Purpose Cache Descriptor

As previously mentioned, the caches that hold the predetermined size objects for general use are always in pairs. One cache is for allocating the objects from DMA memory, and the other is for standard allocations from normal memory. If you recall the memory zones, you realize that the DMA cache is in ZONE_DMA and the standard cache is in ZONE_NORMAL. The struct cache_sizes is a useful way to store together all the information regarding general size caches.

-----------------------------------------------------------------------------
include/linux/slab.h
struct cache_sizes {
    size_t cs_size;
    kmem_cache_t *cs_cachep;
    kmem_cache_t *cs_dmacachep;
};
-----------------------------------------------------------------------------

4.4.2.1. cs_size

The cs_size field holds the size of the memory objects contained in this cache.

4.4.2.2. cs_cachep

The cs_cachep field holds the pointer to the normal memory cache descriptor for objects to be allocated from ZONE_NORMAL.

4.4.2.3. cs_dmacachep

The cs_dmacachep field holds the pointer to the DMA memory cache descriptor for objects to be allocated from ZONE_DMA.

One question comes to mind, "Where are the cache descriptors stored?" The slab allocator has a cache that is reserved just for that purpose. The cache_cache cache holds objects of the type cache descriptors. This slab cache is initialized statically during system bootstrapping to ensure that cache descriptor storage is available.

4.4.3. Slab Descriptor

Each slab in a cache has a descriptor that holds information particular to that slab. We just mentioned that cache descriptors are stored in the specialized cache called cache_cache. Slab descriptors in turn can be stored in two places: within the slab itself (specifically, in the first page frame) or externally, within the first "general purpose" cache with objects large enough to hold the slab descriptor. The choice is made upon cache creation, based on the space left over from object alignment. Let's look at some of the slab descriptor fields:

-----------------------------------------------------------------------------
mm/slab.c
struct slab {
    struct list_head list;
    unsigned long coloroff;
    void *s_mem;           /* including color offset */
    unsigned int inuse;    /* num of objs active in slab */
    kmem_bufctl_t free;
};
-----------------------------------------------------------------------------

4.4.3.1. list

If you recall from the cache descriptor discussion, a slab can be in one of three states: free, partial, or full. The cache descriptor holds all slab descriptors in three lists, one for each state. All slabs in a particular state are kept in a doubly linked list by means of the list field.

4.4.3.2. s_mem

The s_mem field holds the pointer to the first object in the slab.

4.4.3.3. inuse

The value inuse keeps track of the number of objects that are occupied in that slab.
For full and partial slabs, this is a positive number; for free slabs, this is 0.

4.4.3.4. free

The free field holds an index value to the array whose entries represent the objects in the slab. In particular, the free field contains the index value of the entry representing the first available object in the slab. The kmem_bufctl_t data type links all the objects within a slab. The data type is simply an unsigned integer and is defined in include/asm/types.h. These data types make up an array that is always stored right after the slab descriptor, regardless of whether the slab descriptor is stored internally or externally to the slab. This becomes clear when we look at the inline function slab_bufctl(), which returns the array:

-----------------------------------------------------------------------------
mm/slab.c
static inline kmem_bufctl_t *slab_bufctl(struct slab *slabp)
{
    return (kmem_bufctl_t *)(slabp+1);
}
-----------------------------------------------------------------------------

The function slab_bufctl() takes in a pointer to the slab descriptor and returns a pointer to the memory area immediately following the slab descriptor. When a slab is initialized, the slab->free field is set to 0 (because all objects are free, so the first one should be returned), and each entry in the kmem_bufctl_t array is set to the index value of the next member of the array. This means that the 0th element holds the value 1, the 1st element holds the value 2, and so on. The last element in the array holds the value BUFCTL_END, which indicates that this is the last element in the array.

Figure 4.8 shows how the slab descriptor, the bufctl array, and the slab objects are laid out when the slab descriptors are stored internally to the slab. Table 4.5 shows the possible values of certain slab descriptor fields when the slab is in each of the three possible states.

Figure 4.8. Slab Descriptor and bufctl
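The index-linked free list formed by free and the bufctl array can be modeled in user space. The following is a minimal sketch, not the kernel's code: the names toy_slab, toy_bufctl_t, TOY_NUM_OBJS, and TOY_BUFCTL_END are hypothetical stand-ins, the array is embedded in the structure rather than placed after the descriptor, and objects are represented only by their index.

```c
#include <assert.h>

#define TOY_NUM_OBJS   4u      /* assumption: four objects per slab */
#define TOY_BUFCTL_END 0xFFFFu /* stand-in for BUFCTL_END */

typedef unsigned short toy_bufctl_t;

struct toy_slab {
    unsigned int inuse;                /* objects currently allocated */
    toy_bufctl_t free;                 /* index of first free object */
    toy_bufctl_t bufctl[TOY_NUM_OBJS]; /* bufctl[i] = next free index after i */
};

/* Initial state from the text: free = 0 and entry i points to i + 1. */
static void toy_slab_init(struct toy_slab *s)
{
    unsigned int i;

    s->inuse = 0;
    s->free = 0;
    for (i = 0; i < TOY_NUM_OBJS; i++)
        s->bufctl[i] = (toy_bufctl_t)(i + 1);
    s->bufctl[TOY_NUM_OBJS - 1] = TOY_BUFCTL_END;
}

/* Take the first free object; returns its index, or -1 if the slab is full. */
static int toy_slab_alloc(struct toy_slab *s)
{
    toy_bufctl_t idx = s->free;

    if (idx == TOY_BUFCTL_END)
        return -1;
    s->free = s->bufctl[idx]; /* advance to the next free object */
    s->inuse++;
    return (int)idx;
}

/* Return object `idx` by pushing it onto the front of the free chain. */
static void toy_slab_free(struct toy_slab *s, int idx)
{
    assert(idx >= 0 && (unsigned int)idx < TOY_NUM_OBJS);
    assert(s->inuse > 0);
    s->bufctl[idx] = s->free;
    s->free = (toy_bufctl_t)idx;
    s->inuse--;
}
```

In this model, both allocation and release touch only the free field and one array entry, so each runs in constant time regardless of how many objects the slab holds. A freed object becomes the first candidate for the next allocation.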