
4.4. Slab Allocator

We discussed that pages are the basic unit of memory for the memory manager. However, processes generally request memory on the order of bytes, not on the order of pages. To support the allocation of smaller memory requests made through calls to functions like kmalloc(), the kernel implements the slab allocator, which is a layer of the memory manager that acts on acquired pages.

The slab allocator seeks to reduce the cost incurred by allocating, initializing, destroying, and freeing memory areas by maintaining a ready cache of commonly used memory areas. This cache maintains the memory areas allocated, initialized, and ready to deploy. When the requesting process no longer needs the memory areas, they are simply returned to the cache.

In practice, the slab allocator is made up of many caches, each of which stores memory areas of a different size. Caches can be specialized or general purpose. Specialized caches store memory areas that hold specific objects, such as descriptors. For example, process descriptors (the task_structs) are stored in a cache that the slab allocator maintains, and the size of the memory areas held by this cache is sizeof(struct task_struct). In the same manner, inode and dentry data structures are also maintained in caches. General caches are made of memory areas of predetermined sizes: 32, 64, 128, 256, 512, 1,024, 2,048, 4,096, 8,192, 16,384, 32,768, 65,536, and 131,072 bytes.[7]

[7] All general caches are aligned to the L1 cache line for performance reasons.

If we run the command cat /proc/slabinfo, the existing slab allocator caches are listed. Looking at the first column of the output, we can see the names of data structures and a group of entries following the format size-*. The former correspond to specialized object caches; the latter correspond to caches that hold general-purpose objects of the specified size.

You might also notice that the general-purpose caches have two entries per size, one of which ends with (DMA). This is because memory areas can be requested from either the DMA zone or the normal zone, and the slab allocator maintains caches of both types of memory to serve these requests. Figure 4.5 shows the output of /proc/slabinfo, including the caches for both types of memory.
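To make the naming convention concrete, here is a small user-space sketch that classifies cache names the way we just read them off the first column of /proc/slabinfo. It assumes general caches are named size-N, with a (DMA) suffix on the DMA variant, as described above. The helper names are hypothetical, and the code only inspects name strings; it does not read /proc/slabinfo itself.

```c
#include <stdlib.h>
#include <string.h>

/* Return 1 if a cache name follows the "size-*" pattern used for
 * general-purpose caches, 0 for a specialized cache. */
int is_general_cache(const char *name)
{
    return strncmp(name, "size-", 5) == 0;
}

/* Return 1 if the cache name carries the "(DMA)" suffix. */
int is_dma_cache(const char *name)
{
    const char *suffix = "(DMA)";
    size_t len = strlen(name);
    size_t slen = strlen(suffix);

    return len >= slen && strcmp(name + len - slen, suffix) == 0;
}

/* Return the object size encoded in a general cache name, or 0 for a
 * specialized cache. strtoul() stops at the '(' of a "(DMA)" suffix. */
unsigned long general_cache_size(const char *name)
{
    if (!is_general_cache(name))
        return 0;
    return strtoul(name + 5, NULL, 10);
}
```

For example, size-128(DMA) is recognized as a general DMA cache of 128-byte objects, while filp is treated as a specialized cache.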

Figure 4.5. cat /proc/slabinfo


A cache is further subdivided into containers called slabs. Each slab is made up of one or more contiguous page frames from which the smaller memory areas are allocated. That is why we say that the slabs contain the objects. The objects themselves are address intervals of a predetermined size within a page frame that belongs to a particular slab. Figure 4.6 shows the slab allocator anatomy.

Figure 4.6. Slab Allocator Anatomy


The slab allocator uses three main structures to maintain object information: the cache descriptor, struct kmem_cache_s (also known by its typedef kmem_cache_t); the general cache descriptor, struct cache_sizes; and the slab descriptor, struct slab. Figure 4.7 summarizes the relationships between all the descriptors.

Figure 4.7. Slab Allocator Structures


4.4.1. Cache Descriptor

Every cache has a cache descriptor of type kmem_cache_s, which holds its information. Most of these values are set or calculated at cache-creation time in kmem_cache_create() (mm/slab.c). We discuss this function in a later section. First, let's look at some of the fields in the cache descriptor and understand the information they hold.

-----------------------------------------------------------------------------
mm/slab.c
struct kmem_cache_s {
...
  struct kmem_list3   lists;
...
  unsigned int        objsize;
  unsigned int        flags;       /* constant flags */
  unsigned int        num;         /* # of objs per slab */
...
  unsigned int        gfporder;

  /* force GFP flags, e.g. GFP_DMA */
  unsigned int        gfpflags;

  size_t              color;       /* cache coloring range */
  unsigned int        color_off;   /* color offset */
  unsigned int        color_next;  /* cache coloring */
  kmem_cache_t        *slabp_cache;
  unsigned int        dflags;      /* dynamic flags */

  /* constructor func */
  void (*ctor)(void *, kmem_cache_t *, unsigned long);

  /* de-constructor func */
  void (*dtor)(void *, kmem_cache_t *, unsigned long);

  /* 4) cache creation/removal */
  const char          *name;
  struct list_head    next;
...
};
-----------------------------------------------------------------------------

4.4.1.1. lists

The lists field is a structure that holds three list heads, one for each of the three states a slab can be in: partial, full, and free. A cache can have one or more slabs in any of these states, and it is through this structure that the cache references its slabs. The lists themselves are doubly linked lists threaded through the list field of each slab descriptor, as described in the "Slab Descriptor" section later in this chapter.

-----------------------------------------------------------------------------
mm/slab.c
struct kmem_list3 {
  struct list_head     slabs_partial;
  struct list_head     slabs_full;
  struct list_head     slabs_free;
...
  unsigned long        next_reap;
  struct array_cache   *shared;
};
-----------------------------------------------------------------------------

lists.slabs_partial

lists.slabs_partial is the head of the list of slabs that are only partially allocated with objects. That is, a slab in the partial state has some of its objects allocated and some free to be used.

lists.slabs_full

lists.slabs_full is the head of the list of slabs whose objects have all been allocated. These slabs contain no available objects.

lists.slabs_free

lists.slabs_free is the head of the list of slabs whose objects are all free to be allocated. Not a single one of its objects has been allocated.

Maintaining these lists reduces the time it takes to find a free object. When an object from the cache is requested, the kernel searches the partial slabs. If the partial slabs list is empty, it then looks at the free slabs. If the free slabs list is empty, a new slab is created.
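The search order can be sketched as follows. This is a simplified model, not the kernel's code: we assume each list is reduced to a count of the slabs on it, and the type and function names are hypothetical.

```c
/* Hypothetical stand-in for a cache's three slab lists: only the number
 * of slabs on each list matters for this sketch. */
struct slab_lists {
    int partial;
    int full;
    int free;
};

enum slab_source { FROM_PARTIAL, FROM_FREE, FROM_NEW_SLAB };

/* Decide where the next object comes from, following the order described
 * in the text: partial slabs first, then free slabs, then a new slab. */
enum slab_source pick_slab(const struct slab_lists *l)
{
    if (l->partial > 0)
        return FROM_PARTIAL;
    if (l->free > 0)
        return FROM_FREE;
    return FROM_NEW_SLAB;   /* both lists empty: the cache must grow */
}
```

Full slabs never satisfy a request, which is why the full count does not appear in the decision.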

lists.next_reap

Slabs have page frames allocated to them. If these pages are not in use, it is better to return them to the main memory pool. Toward this end, the caches are reaped. This field holds the time of the next cache reap. It is set in kmem_cache_create() (mm/slab.c) at cache-creation time and is updated in cache_reap() (mm/slab.c) every time it is called.
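The following sketch shows one way such a reap deadline can be checked. It assumes next_reap is compared against a tick counter such as jiffies that may wrap around, so it uses the same signed-difference comparison as the kernel's time_after() macro. The function names and the interval parameter are hypothetical.

```c
/* Wraparound-safe "a is later than b" for an unsigned tick counter, using
 * the same signed-difference idea as the kernel's time_after() macro. */
int ticks_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* If the reap deadline has arrived, schedule the next one interval ticks
 * from now and return 1; otherwise leave next_reap alone and return 0. */
int reap_due(unsigned long *next_reap, unsigned long now,
             unsigned long interval)
{
    if (ticks_after(*next_reap, now))
        return 0;               /* deadline is still in the future */
    *next_reap = now + interval;
    return 1;
}
```

Comparing the signed difference rather than the raw values keeps the check correct when the counter wraps past its maximum value.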

4.4.1.2. objsize

The objsize field holds the size (in bytes) of the objects in the cache. This is determined at cache-creation time based on requested size and cache alignment concerns.
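Rounding a requested size up to an alignment boundary is the core of that calculation. The sketch below assumes the alignment is a power of two (for example, a 32-byte L1 cache line). The function name is hypothetical, and the real computation in kmem_cache_create() considers more cases than this.

```c
#include <stddef.h>

/* Round size up to the next multiple of align; align must be a power of two. */
size_t align_objsize(size_t size, size_t align)
{
    return (size + align - 1) & ~(align - 1);
}
```

With a 32-byte alignment, for instance, a 100-byte request becomes a 128-byte object, while a 128-byte request is left unchanged.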

4.4.1.3. flags

The flags field holds the flag mask that describes constant characteristics of the cache. Possible flags are defined in include/linux/slab.h and Table 4.4 describes them.

Table 4.4. Slab Flags

Flag Name            Description
SLAB_POISON          Requests that a test pattern of a5a5a5a5 be written to the slab upon creation. The pattern can then be checked to tell whether memory has been initialized.
SLAB_NO_REAP         When memory requests meet with insufficient memory conditions, the memory manager begins to reap memory areas that are not in use. Setting this flag ensures that this cache won't be automatically reaped under these conditions.
SLAB_HWCACHE_ALIGN   Requests that objects be aligned to the processor's hardware cache line to improve performance by reducing memory cycles.
SLAB_CACHE_DMA       Indicates that DMA memory should be used. When new page frames are requested, the GFP_DMA flag is passed to the buddy system.
SLAB_PANIC           Indicates that the kernel should panic if kmem_cache_create() fails for any reason.


4.4.1.4. num

The num field holds the number of objects per slab in this cache. This is determined upon cache creation (also in kmem_cache_create()) based on gfporder's value (see the next field), the size of the objects to be created, and the alignment they require.

4.4.1.5. gfporder

The gfporder field holds the order (base 2 logarithm) of the number of contiguous page frames in each slab of the cache; a slab therefore spans 2^gfporder page frames. This value defaults to 0 (one page frame per slab) and is set at cache creation in kmem_cache_create().
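The num and gfporder fields are related: a slab of order gfporder spans PAGE_SIZE << gfporder bytes, and num is roughly how many objects fit in that space. The sketch below is a simplified model that ignores cache coloring and the alignment of the slab descriptor. It assumes that when the descriptor is stored on the slab, each object also costs one kmem_bufctl_t entry. The function name and the descriptor and bufctl sizes in the usage note are hypothetical.

```c
#include <stddef.h>

/* Estimate objects per slab. desc_on_slab selects whether the slab
 * descriptor and its bufctl array share the slab with the objects. */
unsigned int objs_per_slab(size_t page_size, unsigned int gfporder,
                           size_t objsize, int desc_on_slab,
                           size_t desc_size, size_t bufctl_size)
{
    size_t slab_bytes = page_size << gfporder;   /* 2^gfporder pages */

    if (!desc_on_slab)
        return (unsigned int)(slab_bytes / objsize);
    if (slab_bytes < desc_size)
        return 0;
    /* On-slab: the descriptor comes first, and every object also needs
     * one bufctl entry in the array that follows it. */
    return (unsigned int)((slab_bytes - desc_size) / (objsize + bufctl_size));
}
```

With 4,096-byte pages and 128-byte objects, an order-0 slab holds 32 objects when its descriptor lives off-slab. If we assume a 24-byte descriptor and 4-byte bufctl entries stored on-slab, the count drops to 30.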

4.4.1.6. gfpflags

The gfpflags field specifies the type of page frames to be requested for the slabs in this cache. Its value is determined by the flags requested for the cache. For example, if the memory areas are intended for DMA use, the gfpflags field is set to GFP_DMA, and this flag is passed on whenever page frames are requested.
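A sketch of how a DMA request carries over from the cache flags to gfpflags follows. The flag values are made up for the example (the real SLAB_CACHE_DMA and GFP_DMA constants are defined in the kernel headers), and the function name is hypothetical.

```c
/* Made-up flag values for this example only. */
#define DEMO_SLAB_CACHE_DMA     0x1u
#define DEMO_SLAB_HWCACHE_ALIGN 0x2u
#define DEMO_GFP_DMA            0x10u

/* Derive the page-frame request flags for a cache from its creation flags. */
unsigned int derive_gfpflags(unsigned int cache_flags)
{
    unsigned int gfpflags = 0;

    if (cache_flags & DEMO_SLAB_CACHE_DMA)
        gfpflags |= DEMO_GFP_DMA;   /* slabs must come from DMA-capable memory */
    return gfpflags;
}
```

Flags that only affect object layout, such as the alignment flag, leave gfpflags untouched.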

4.4.1.7. slabp_cache

Slab descriptors can be stored within the cache itself or external to it. If the slab descriptors for the slabs in this cache are stored externally to the cache, the slabp_cache field holds a pointer to the cache descriptor of the cache that stores objects of the type slab descriptor. See the "Slab Descriptor" section for more information on slab descriptor storage.

4.4.1.8. ctor

The ctor field holds a pointer to the constructor[8] that is associated with the cache, if one exists.

[8] If you are familiar with object-oriented programming, the concept of constructors and destructors will not be new to you. The ctor field of the cache descriptor lets you supply a function that is called to initialize each new object the cache creates, that is, each object of a newly allocated slab. Likewise, the dtor field holds a pointer to a function that is called for each object when it is destroyed, that is, when its slab is released.

4.4.1.9. dtor

Much like the ctor field, the dtor field holds a pointer to the destructor that is associated with the cache, if one exists.

Both the constructor and destructor are defined at cache-creation time and passed as parameters to kmem_cache_create().
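To illustrate when these hooks run, the sketch below models a cache that calls the constructor on every object of a newly allocated slab and the destructor on every object before the slab's memory is released. It is a user-space model with a simplified hook signature (the kernel's ctor and dtor also receive the cache pointer and a flags argument), and all names are hypothetical.

```c
#include <stdlib.h>

/* Simplified hook type: the kernel's hooks also take the cache pointer
 * and a flags word. */
typedef void (*demo_hook)(void *obj);

/* Hypothetical, cut-down cache descriptor. */
struct demo_cache {
    size_t objsize;      /* bytes per object */
    unsigned int num;    /* objects per slab */
    demo_hook ctor;      /* may be NULL */
    demo_hook dtor;      /* may be NULL */
};

/* Allocate one slab and run the constructor on each of its objects. */
void *demo_grow(const struct demo_cache *c)
{
    char *mem = malloc(c->objsize * c->num);

    if (mem != NULL && c->ctor != NULL)
        for (unsigned int i = 0; i < c->num; i++)
            c->ctor(mem + (size_t)i * c->objsize);
    return mem;
}

/* Run the destructor on each object, then release the slab. */
void demo_destroy(const struct demo_cache *c, void *slab)
{
    char *mem = slab;

    if (mem != NULL && c->dtor != NULL)
        for (unsigned int i = 0; i < c->num; i++)
            c->dtor(mem + (size_t)i * c->objsize);
    free(mem);
}

/* Example hooks that count how often they run. */
int demo_constructed, demo_destroyed;

void demo_ctor(void *obj) { *(int *)obj = 42; demo_constructed++; }
void demo_dtor(void *obj) { (void)obj; demo_destroyed++; }
```

Because construction happens when the slab is created rather than on every allocation, an object handed back to the cache stays initialized and ready for the next request.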

4.4.1.10. name

The name field holds the human-readable name that is displayed when /proc/slabinfo is read. For example, the cache that holds file pointers has the value filp in this field, as you can confirm by running cat /proc/slabinfo. The name of a cache must be unique: upon creation, the requested name is compared to the names of all other caches in the list, and cache creation fails if a cache with the same name already exists.
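The uniqueness check amounts to a string comparison against every existing cache name. The sketch below assumes the names are available as an array, whereas the kernel walks its list of cache descriptors; the function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if name differs from all count existing cache names, else 0. */
int cache_name_is_unique(const char *const names[], size_t count,
                         const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(names[i], name) == 0)
            return 0;   /* duplicate: cache creation must not proceed */
    return 1;
}
```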

4.4.1.11. next

The next field links this cache descriptor into the list of all cache descriptors. Because next is a struct list_head, this list is doubly linked.

4.4.2. General Purpose Cache Descriptor

As previously mentioned, the caches that hold the predetermined-size objects for general use always come in pairs. One cache is for allocating the objects from DMA memory, and the other is for standard allocations from normal memory. Recall from the discussion of memory zones that the DMA cache allocates from ZONE_DMA and the standard cache from ZONE_NORMAL. The cache_sizes structure groups together all the information about one such pair of general caches.

-----------------------------------------------------------------------------
include/linux/slab.h
struct cache_sizes {
  size_t          cs_size;
  kmem_cache_t    *cs_cachep;
  kmem_cache_t    *cs_dmacachep;
};
-----------------------------------------------------------------------------

4.4.2.1. cs_size

The cs_size field holds the size of the memory objects contained in this cache.

4.4.2.2. cs_cachep

The cs_cachep field holds the pointer to the normal memory cache descriptor for objects to be allocated from ZONE_NORMAL.

4.4.2.3. cs_dmacachep

The cs_dmacachep field holds the pointer to the DMA memory cache descriptor for objects to be allocated from ZONE_DMA.
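Given an array of cache_sizes entries ordered by cs_size, choosing a general cache for a request is a walk to the first entry large enough to hold it, followed by a choice between the normal and DMA caches. The sketch below follows that idea, which is also how we understand kmalloc() to select a general cache. It uses cache names in place of kmem_cache_t pointers, and the type and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical mirror of struct cache_sizes: names stand in for the
 * kmem_cache_t pointers. */
struct demo_cache_sizes {
    size_t cs_size;
    const char *cs_cachep;
    const char *cs_dmacachep;
};

/* One entry per general size listed earlier, smallest first. */
static const struct demo_cache_sizes demo_sizes[] = {
    {     32, "size-32",     "size-32(DMA)"     },
    {     64, "size-64",     "size-64(DMA)"     },
    {    128, "size-128",    "size-128(DMA)"    },
    {    256, "size-256",    "size-256(DMA)"    },
    {    512, "size-512",    "size-512(DMA)"    },
    {   1024, "size-1024",   "size-1024(DMA)"   },
    {   2048, "size-2048",   "size-2048(DMA)"   },
    {   4096, "size-4096",   "size-4096(DMA)"   },
    {   8192, "size-8192",   "size-8192(DMA)"   },
    {  16384, "size-16384",  "size-16384(DMA)"  },
    {  32768, "size-32768",  "size-32768(DMA)"  },
    {  65536, "size-65536",  "size-65536(DMA)"  },
    { 131072, "size-131072", "size-131072(DMA)" },
};

/* Return the smallest entry whose objects can hold size bytes,
 * or NULL if the request is larger than every general cache. */
const struct demo_cache_sizes *find_general_cache(size_t size)
{
    size_t n = sizeof(demo_sizes) / sizeof(demo_sizes[0]);

    for (size_t i = 0; i < n; i++)
        if (size <= demo_sizes[i].cs_size)
            return &demo_sizes[i];
    return NULL;
}

/* Pick the normal or DMA cache name for a request. */
const char *pick_general_cache(size_t size, int want_dma)
{
    const struct demo_cache_sizes *cs = find_general_cache(size);

    if (cs == NULL)
        return NULL;
    return want_dma ? cs->cs_dmacachep : cs->cs_cachep;
}
```

A 100-byte request, for example, lands in the 128-byte cache, and a request larger than 131,072 bytes finds no general cache at all.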

One question comes to mind, "Where are the cache descriptors stored?" The slab allocator has a cache that is reserved just for that purpose. The cache_cache cache holds objects of the type cache descriptors. This slab cache is initialized statically during system bootstrapping to ensure that cache descriptor storage is available.

4.4.3. Slab Descriptor

Each slab in a cache has a descriptor that holds information particular to that slab. We just mentioned that cache descriptors are stored in the specialized cache called cache_cache. Slab descriptors, in turn, can be stored in one of two places: within the slab itself (specifically, in its first page frame) or externally, in the first general-purpose cache whose objects are large enough to hold the slab descriptor. Which of the two is used is decided at cache-creation time, based on how much space is left over in the slab after the objects are aligned.

Let's look at some of the slab descriptor fields:

-----------------------------------------------------------------------------
mm/slab.c
struct slab {
  struct list_head   list;
  unsigned long      coloroff;
  void               *s_mem;   /* including color offset */
  unsigned int       inuse;    /* num of objs active in slab */
  kmem_bufctl_t      free;
};
-----------------------------------------------------------------------------

4.4.3.1. list

If you recall from the cache descriptor discussion, a slab can be in one of three states: free, partial, or full. The cache descriptor holds all slab descriptors in three lists, one for each state. All slabs in a particular state are kept in a doubly linked list by means of the list field.

4.4.3.2. s_mem

The s_mem field holds the pointer to the first object in the slab.
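Because every object in a slab has the same size, s_mem and the cache's objsize are enough to convert between an object's index and its address. A minimal sketch, with hypothetical function names:

```c
#include <stddef.h>

/* Address of object idx, given the slab's first object and the object size. */
void *obj_at(void *s_mem, size_t objsize, unsigned int idx)
{
    return (char *)s_mem + (size_t)idx * objsize;
}

/* Index of the object that contains address obj. */
unsigned int obj_index(const void *s_mem, size_t objsize, const void *obj)
{
    return (unsigned int)(((const char *)obj - (const char *)s_mem) / objsize);
}
```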

4.4.3.3. inuse

The value inuse keeps track of the number of objects that are occupied in that slab. For full and partial slabs, this is a positive number; for free slabs, this is 0.
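Comparing inuse with the cache's num field is enough to tell which of the three lists a slab belongs on. A minimal sketch with hypothetical names:

```c
/* The three slab states described earlier. */
enum slab_state { SLAB_STATE_FREE, SLAB_STATE_PARTIAL, SLAB_STATE_FULL };

/* Classify a slab from its inuse count and the cache's num field. */
enum slab_state classify_slab(unsigned int inuse, unsigned int num)
{
    if (inuse == 0)
        return SLAB_STATE_FREE;
    if (inuse == num)
        return SLAB_STATE_FULL;
    return SLAB_STATE_PARTIAL;
}
```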

4.4.3.4. free

The free field holds an index into the array whose entries represent the objects in the slab; specifically, it holds the index of the entry representing the first available object. The kmem_bufctl_t data type links all the objects within a slab. It is simply an unsigned integer type, defined in include/asm/types.h. Entries of this type make up an array that is always stored right after the slab descriptor, regardless of whether the slab descriptor is stored internally or externally to the slab. This becomes clear when we look at the inline function slab_bufctl(), which returns the array:

-----------------------------------------------------------------------------
mm/slab.c
static inline kmem_bufctl_t *slab_bufctl(struct slab *slabp)
{
  return (kmem_bufctl_t *)(slabp+1);
}
-----------------------------------------------------------------------------

The function slab_bufctl() takes in a pointer to the slab descriptor and returns a pointer to the memory area immediately following the slab descriptor.
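The same pointer arithmetic works in user space. This sketch, with a cut-down descriptor and hypothetical names, allocates a descriptor and its bufctl array as one block and locates the array exactly as slab_bufctl() does, by stepping one descriptor past slabp.

```c
#include <stdlib.h>

typedef unsigned int demo_bufctl_t;   /* stands in for kmem_bufctl_t */

/* Cut-down slab descriptor for this sketch. */
struct demo_slab_desc {
    void *s_mem;
    unsigned int inuse;
    demo_bufctl_t free;
};

/* Same arithmetic as slab_bufctl(): slabp + 1 points just past the
 * descriptor. sizeof(struct demo_slab_desc) is a multiple of its
 * alignment, which is at least that of unsigned int, so the array
 * that starts there is properly aligned. */
demo_bufctl_t *demo_slab_bufctl(struct demo_slab_desc *slabp)
{
    return (demo_bufctl_t *)(slabp + 1);
}

/* Allocate a descriptor followed by a num-entry bufctl array in one block. */
struct demo_slab_desc *demo_desc_alloc(unsigned int num)
{
    return malloc(sizeof(struct demo_slab_desc) + num * sizeof(demo_bufctl_t));
}
```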

When a new slab is initialized, its free field is set to 0 (all objects are free, so the first object is the first one to hand out), and each entry in the kmem_bufctl_t array is set to the index of the next entry. This means that element 0 holds the value 1, element 1 holds the value 2, and so on. The last element holds the value BUFCTL_END, which marks the end of the chain of free objects.
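The chain of indexes behaves as a free list with the slab's free field as its head. The sketch below models one slab of four objects in user space: allocation takes the head and follows its bufctl entry, and freeing pushes the object back as the new head. Embedding the bufctl array in the structure, the object count, and the end-marker value are assumptions made for the example, and the names are hypothetical.

```c
#include <limits.h>

typedef unsigned int demo_bufctl_t;     /* stands in for kmem_bufctl_t */
#define DEMO_BUFCTL_END UINT_MAX         /* hypothetical end-of-chain marker */
#define DEMO_NUM 4                       /* objects per slab in this sketch */

struct demo_slab {
    unsigned int inuse;                  /* objects currently allocated */
    demo_bufctl_t free;                  /* index of first free object */
    demo_bufctl_t bufctl[DEMO_NUM];      /* follows the descriptor fields */
};

/* Initialize a new slab: all objects free, each entry names the next one. */
void demo_slab_init(struct demo_slab *s)
{
    for (unsigned int i = 0; i < DEMO_NUM - 1; i++)
        s->bufctl[i] = i + 1;
    s->bufctl[DEMO_NUM - 1] = DEMO_BUFCTL_END;
    s->free = 0;
    s->inuse = 0;
}

/* Take the first free object; return its index, or DEMO_BUFCTL_END if full. */
demo_bufctl_t demo_slab_alloc(struct demo_slab *s)
{
    demo_bufctl_t idx = s->free;

    if (idx == DEMO_BUFCTL_END)
        return DEMO_BUFCTL_END;
    s->free = s->bufctl[idx];            /* advance to the next free object */
    s->inuse++;
    return idx;
}

/* Return object idx to the slab as the new head of the free chain. */
void demo_slab_free(struct demo_slab *s, demo_bufctl_t idx)
{
    s->bufctl[idx] = s->free;
    s->free = idx;
    s->inuse--;
}
```

After all four objects are allocated, free holds the end marker; freeing object 2 then makes it the next object handed out.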

Figure 4.8 shows how the slab descriptor, the bufctl array, and the slab objects are laid out when the slab descriptors are stored internally to the slab. Table 4.5 shows the possible values of certain slab descriptor fields when the slab is in each of the three possible states.

Figure 4.8. Slab Descriptor and bufctl


Table 4.5. Slab State and Descriptor Field Values

              Free    Partial    Full
slab->inuse   0       X          N
slab->free    0       X          BUFCTL_END

N = Number of objects in slab
X = A value that varies as objects are allocated and freed





The Linux Kernel Primer. A Top-Down Approach for x86 and PowerPC Architectures
ISBN: 131181637
EAN: N/A
Year: 2005
Pages: 134
