Chapter 6: Buffer and Timer Management

Buffers are used for the interchange of data among modules in a communications system. Timers are used to keep track of timeouts for messages to be sent and acknowledgements to be received, as well as for aging out information in tables. A strategy for buffer and timer management is essential to the communications software subsystem; this chapter covers these two topics in detail.

6.1 Buffer Management

Buffers are used for data interchange among modules in a communications system. The data may be control or payload information and is required for system functioning. For example, when passing data from one process to another, a buffer may be allocated and filled in by the source process and then sent to the destination process. In fact, the buffer scheme in some operating systems evolved from inter-process communications (IPC) mechanisms.

The basic premise of buffer management in communications systems is to minimize data copying. The performance of a system is brought down dramatically if it spends a significant amount of CPU and memory bandwidth in copying data between buffers. The various techniques for buffer management build on this premise.

Buffer management is the provision of a uniform mechanism to allocate, manipulate, and free buffers within the system. Allocation involves obtaining the buffer from the global buffer pool. Manipulation includes copying data to a buffer, copying data from a buffer, deleting data from anywhere in the buffer (beginning, middle, or end), concatenating two buffers, duplicating buffers, and so on. Freeing buffers returns them to the global pool so that they can be allocated by other tasks or modules.

6.1.1 Global Buffer Management

Global buffer management uses a single pool for all buffers in the system. This is a common approach in communications systems, where a buffer pool is built out of a pre-designated memory area obtained using partition allocation calls. The number of buffers required in the system is the sum of the individual buffer requirements of the modules. The advantage of a global pool is that memory management is easier, since the buffer pool size can simply be increased whenever a new module is added.

However, the use of a global pool leads to a lack of isolation between modules: an errant or buggy module can deplete the global buffer pool, impacting well-behaved modules. Assume that Modules A, B, and C run three different protocols but use the same global buffer pool. Also assume that Module A does not release any of the buffers it allocates, thus slowly depleting the buffer pool. Eventually, the buffer allocations of Modules B and C will fail, and they will cease operation.

6.1.2 Local Buffer Management

In local buffer management, each module manages its own buffers. The advantage is that buffer representation and handling is independent of the other modules. Consider a module which requires routines only for buffer allocation and release but not other routines such as those for buffer concatenation. In this case, it can have its own 'private' buffer management library without the more complex routines. Each module can have the most efficient buffer management library for its operation.

While this provides flexibility, it requires some care at the interface between modules since the representations must be mapped. Moreover, the designer will not have a uniform view of the buffer requirements for the entire system. For these reasons, buffer management libraries are usually global, while buffers themselves can be allocated at either the global or local level.

start sidebar
Third-Party Protocol Libraries

It may not always be possible to design a uniform buffer management library for the entire system. A system may be built with protocol libraries licensed from third-party protocol stack vendors, who could provide their libraries as source or object code. If the libraries are available only as object code, communications system designers do not have visibility into the buffer management scheme of the protocol library. They are aware only of the set of interfaces for data exchange with the protocol module and will use those interfaces for buffer interchange.

If the protocol libraries are standard, the interfaces are simplified. For example, the mbuf library in Berkeley UNIX is a global library available to multiple modules. This permits modules to exchange mbufs across their interfaces without the need for mapping.

end sidebar

6.1.3 Single versus Multiple Buffer Pools

Independent of whether we use global or local buffer management, we need to determine the buffer count and buffer size distribution. In a global buffer management scheme, there are two choices:

  1. A single set of buffers, all the same size

  2. Multiple buffer pools, with all buffers within a pool the same size

Figure 6.1 illustrates this. In the first case, a single buffer pool is constructed out of the memory area, with the buffers linked to each other. Each buffer in the pool is of the same size (256 bytes). In the second case, multiple buffer pools are created out of the memory area, with each pool consisting of buffers of the same size (64, 128, or 256 bytes). Note that the memory size and the number of buffers are only illustrative; there could be a large memory area segmented into 256-byte buffers or a small memory area segmented into 64- and 128-byte buffers.

Figure 6.1: Single and Multiple Buffer Pools.

Single buffer pools are easier to manage, while multiple buffer pools can reduce wasted memory, since the most appropriately sized buffer can be chosen for each frame.
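To make the choice concrete, the following is a minimal sketch of how an allocator might select among multiple fixed-size pools. The pool sizes follow Figure 6.1; the buf_alloc_from() routine is a hypothetical per-pool allocator, not part of any real library.

start example
#include <stddef.h>

#define NUM_POOLS 3
static const size_t pool_size[NUM_POOLS] = { 64, 128, 256 };

extern void *buf_alloc_from(int pool);   /* hypothetical per-pool allocator */

void *buf_alloc(size_t frame_len)
{
    int i;

    /* Pick the smallest buffer that holds the frame, minimizing
     * internal fragmentation. */
    for (i = 0; i < NUM_POOLS; i++)
        if (frame_len <= pool_size[i])
            return buf_alloc_from(i);

    return NULL;   /* larger frames would require a buffer chain */
}
end example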

6.1.4 Buffer Size

A rule of thumb for choosing the size of a buffer in the pool is to determine the most common data size to be stored in the buffer. Consider a Layer 2 switch. If the buffers in this device are most commonly used to store minimum-size Ethernet frames (64 bytes), then choose a buffer size of 80 bytes (the extra bytes are for buffer manipulation and for passing module information). With this method, most frames are sent and received by the device without wasting much buffer space. If the frame size exceeds 64 bytes, multiple buffers are linked to each other in the form of a chain or linked list to accommodate the additional bytes. The resulting structure is often called a buffer chain. Popular buffer schemes like the mbuf library used in Berkeley UNIX follow this format.

If the frame size is less than 64 bytes, there will be internal fragmentation in the buffer, a situation familiar to students of memory allocation in operating systems. Internal fragmentation is unused space in a single buffer. When the frame size is larger than 64 bytes, internal fragmentation can occur in the last buffer of the chain if the total frame size is not an exact multiple of 64.

For example, if the received frame size is 300 bytes, the following calculations apply:

Number of buffers required = 300/64, rounded up = 4 + 1 = 5 buffers

Data in the last buffer = 300 mod 64 = 44 bytes

Unused space in the last buffer = 64 - 44 = 20 bytes
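The same arithmetic can be expressed in code. This is a small, self-contained sketch of the buffer-chain calculation above; the ceiling division generalizes it to any frame and buffer size.

start example
#include <stdio.h>

int main(void)
{
    unsigned frame_len = 300;   /* received frame size in bytes */
    unsigned buf_size  = 64;    /* data bytes per buffer */

    unsigned num_bufs = (frame_len + buf_size - 1) / buf_size;       /* 5  */
    unsigned last_len = frame_len % buf_size;                        /* 44 */
    unsigned unused   = (last_len != 0) ? buf_size - last_len : 0;   /* 20 */

    printf("%u buffers, %u bytes in last buffer, %u bytes unused\n",
           num_bufs, last_len, unused);
    return 0;
}
end example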

It is next to impossible to avoid fragmentation in a system where frame sizes vary. Designers should instead focus on the optimal size for a buffer in a pool: the size at which the most common frame fits without requiring a buffer chain.

6.1.5 Checklist for Buffer Pools and Sizes

The following provides a checklist that can be used in selecting a buffer management strategy:

  • Use global buffer management if there is no dependency on external modules provided by a third party. Even when such an external module uses its own buffer management, keep a global buffer management strategy for the rest of the system, and define interfaces for clean interchange with the external module.

  • If the packet sizes that are to be handled by the system do not vary much, choose a single buffer pool, with an optimal size.

  • Avoid buffer chaining as much as possible by choosing a single buffer size closest to the most frequently encountered packet size.

Figure 6.2: The BSD mbuf Structure.

6.1.6 The Berkeley Systems Distribution (BSD) mbuf Library

The BSD mbuf library is discussed in this section to illustrate some buffer management concepts. It was first used for communications in the UNIX kernel. The design arose from the fact that network protocols have requirements different from those of other parts of the operating system, both for peer-to-peer communication and for inter-process communication (IPC). The routines were designed for scatter/gather operations, since communications protocols prepend headers and append trailers to the data buffer. Scatter/gather implies a scheme in which the data may reside in multiple memory areas or buffers scattered in memory; to construct the complete packet, the data must be gathered together.

The mbuf, or memory buffer, is the key data structure for memory management facilities in the BSD kernel. Each mbuf is 128 bytes long, with 108 bytes used for data (see Figure 6.2). Whenever the data is larger than 108 bytes, the mbuf instead holds a pointer to an external data area called an mbuf cluster. Data is stored in the internal data area or in the external mbuf cluster, but never in both.

As Figure 6.2 shows, an mbuf can be linked to another mbuf with the m_next pointer. Multiple mbufs linked together constitute a chain, which can be a single message like a TCP packet. Multiple TCP packets can be linked together in a queue using the m_nextpkt field in the mbuf. Each mbuf has a pointer, m_data, indicating the start of "valid" data in the buffer. The m_len field indicates the length of the valid data in the buffer. Data can be deleted at the end of the mbuf by simply decrementing the valid data count. Data can be deleted at the beginning of the mbuf by incrementing the m_data pointer to point to a different part of the buffer as the start of valid data. Consider the case when a packet needs to be passed up from IP to TCP. To do this, we can increment m_data by the size of the IP header so that it then points to the first byte of the TCP header and then decrement m_len by the size of the IP header.

The same mechanism can be used when sending data from TCP to IP. The TCP header can start at a location in the mbuf which permits the IP header to be prepended to the TCP header in the same buffer. This ensures there is no need to copy data to another buffer for the new header(s).
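The header manipulation just described reduces to simple pointer arithmetic. Below is a sketch using a simplified mbuf that holds only the fields named above; the actual BSD declarations contain additional fields, types, and macros.

start example
/* Simplified mbuf view using only the fields discussed in the text. */
struct mbuf {
    struct mbuf *m_next;     /* next mbuf in this chain */
    struct mbuf *m_nextpkt;  /* next packet (chain) in the queue */
    char        *m_data;     /* start of valid data */
    int          m_len;      /* length of valid data */
};

/* Strip the IP header when passing a packet up from IP to TCP:
 * advance m_data past the header and shrink the valid-data count.
 * No bytes are copied. */
void strip_ip_header(struct mbuf *m, int ip_hdr_len)
{
    m->m_data += ip_hdr_len;
    m->m_len  -= ip_hdr_len;
}

/* Prepend an IP header on the way down from TCP to IP, assuming the
 * TCP header was placed deep enough in the buffer to leave room. */
void prepend_ip_header(struct mbuf *m, int ip_hdr_len)
{
    m->m_data -= ip_hdr_len;
    m->m_len  += ip_hdr_len;
    /* The caller now fills in the IP header at m->m_data. */
}
end example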

Another significant advantage of mbufs is the ability to link multiple mbufs to a single mbuf cluster (see Figure 6.3). This is useful when the same frame needs to be sent out on multiple interfaces. Instead of copying the frame once per interface, we can allocate mbufs that all point to the same mbuf cluster, with a count indicating the number of references to the shared data area. The reference counts are stored in a separate array of counters. Freeing an mbuf decrements the reference count for the corresponding data area, and, when the reference count reaches zero, the data area is released. This reference-counting technique is an important one for buffer management and is used in several systems.
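A minimal sketch of the reference-count logic follows, assuming (as described above) a separate array of counters indexed by cluster. The names are illustrative, not the actual kernel symbols.

start example
extern unsigned char cluster_refcnt[];        /* one counter per cluster */
extern void release_cluster(int cluster_idx); /* hypothetical release routine */

void unref_cluster(int cluster_idx)
{
    if (--cluster_refcnt[cluster_idx] == 0)
        release_cluster(cluster_idx);  /* last reference: free the data area */
}
end example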

The mbuf buffer management scheme is an example of a two-level hierarchy for buffer organization. The first level is the mbuf structure, and the second is the mbuf cluster pointed to by the mbuf. Adding data to the beginning or end of the mbuf cluster will require modifying the pointers and counts for valid data in the mbuf.

Figure 6.3: Creating an mbuf cluster with multiple mbufs.

A Quick View of the mbuf Library Routines

The routines available in the mbuf library include those for allocating a single mbuf, freeing an mbuf, deleting data from the front or end of the mbuf, copying data from an mbuf chain into a linear buffer, making a copy of an mbuf chain into another, and so on (see Table 6.1).

Table 6.1: mbuf library routines.

m_get: allocate an mbuf.
    mptr = m_get (wait, type)
    wait indicates whether the call should block or return immediately if an mbuf is not available. The kernel allocates the memory for the mbuf using malloc.

m_free: free an mbuf.
    m_free (mptr)
    Returns the buffer to the kernel pool.

m_freem: free an mbuf chain.
    m_freem (mptr)
    Returns the buffers to the kernel pool.

m_adj: delete data from the front or end of an mbuf.
    m_adj (mptr, count)
    If count is positive, count bytes are deleted from the front of the mbuf; if count is negative, they are deleted from the end.

m_copydata: copy data from an mbuf chain into a linear buffer.
    m_copydata (mptr, startingOffset, count, bufptr)
    startingOffset is the offset from the start of the mbuf at which copying begins, count is the number of bytes to copy, and bufptr points to the linear buffer into which the data is copied. We need this call when the application interface requires the contents of the packet in one contiguous buffer; it hides the mbuf implementation from the application, a common requirement.

m_copy: make a copy of an mbuf chain.
    mptr2 = m_copy (mptr1, startingOffset, count)
    mptr2 is a new mbuf chain created from count bytes starting at startingOffset in the chain pointed to by mptr1. This call is typically used when we need a partial copy of an mbuf chain for processing by a module independent of the current module.

m_cat: concatenate two mbuf chains.
    m_cat (mptr1, mptr2)
    The chain pointed to by mptr2 is appended to the end of the chain pointed to by mptr1. This is often used in IP reassembly, in which each IP fragment is a separate mbuf chain. Only the header of the first fragment is retained for the higher layer; the headers and trailers of the other fragments are "shaved" off using m_adj so that the concatenation can be done without any copying. This is one example of the power and flexibility offered by the mbuf library.

6.1.7 The STREAMS Buffer Scheme

The mbuf scheme forms the basis for a number of buffer management schemes in commercially available RTOSes. An alternate buffer scheme is available in the STREAMS programming model, which was first presented in Chapter 2.

Consider Figure 6.4, which shows the STREAMS buffer organization. There is a three-level hierarchy with a message block, a data block, and a data buffer. Each message can consist of one or more message blocks. In Figure 6.4, there are two messages, the first having one message block and the second composed of two message blocks.

Figure 6.4: STREAMS buffer organization.

Each message block has multiple fields. The b_next field points to the next message in the queue, while b_prev points to the previous message. b_cont points to the next message block for this message, while b_rptr and b_wptr point to the first unread byte and first byte that can be written in the data buffer. b_datap points to the data block for this message block. Note that the second message has two data blocks, one for each message block in the message.

In the data block, db_base points to the first byte of the buffer, while db_lim points to the last byte. db_ref indicates the reference count, i.e., the number of pointers (from message blocks) pointing to this data block (and buffer).
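Putting the fields above together, skeleton declarations for the two structures might look as follows. This is a sketch using only the fields discussed in the text; the real STREAMS headers carry additional fields (block type, priority, and so on).

start example
/* Data block: describes one data buffer. */
struct datab {
    unsigned char *db_base;  /* first byte of the data buffer */
    unsigned char *db_lim;   /* last byte of the data buffer */
    unsigned char  db_ref;   /* reference count on this data block */
};

/* Message block: one or more of these make up a message. */
struct msgb {
    struct msgb   *b_next;   /* next message on the queue */
    struct msgb   *b_prev;   /* previous message on the queue */
    struct msgb   *b_cont;   /* next message block of this message */
    unsigned char *b_rptr;   /* first unread byte in the data buffer */
    unsigned char *b_wptr;   /* first writable byte in the data buffer */
    struct datab  *b_datap;  /* data block for this message block */
};
end example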

While the structures may appear different from those of the mbuf scheme, the fundamentals are the same. The STREAMS buffer scheme manipulates links and pointers to modify, concatenate, and duplicate data without copying, and it uses reference counts when multiple structures access the same data area. Just as the mbuf scheme keeps cluster reference counts in a separate table, the STREAMS scheme keeps the reference count for a memory area in the db_ref field of its data block.

6.1.8 Comparing the Buffer Schemes

The two popular schemes for buffer and chain buffer management are the two-level hierarchy (as in mbufs) and the STREAMS three-level hierarchy. The two schemes are shown in Figure 6.5.

Which of the two schemes is more efficient? The two-level hierarchy is a simple scheme and has only one level of indirection to get data from the mbuf to the mbuf cluster or data area. The three-level hierarchy requires an additional level of indirection from the message block to the data block and to the corresponding data area. This is required only for the first data block since the message block only links to the first data block. The three-level hierarchy also requires additional memory for the message blocks, which are not present in the two-level hierarchy.

In a three-level hierarchy, the message pointer does not need to change to add data at the beginning of the message. The message block now points to a new data block with the additional bytes. This is transparent to the application since it continues to use the same pointer for the message block. With a two-level hierarchy, this could involve allocating a new mbuf at the head of the mbuf chain and ensuring that applications use the new pointer for the start of the message.

The two-level hierarchy is the same as the three-level hierarchy, but the message block is merged into the first data block (or mbuf). Both schemes are used in commercial systems and use an external data area to house the data in the buffer. This is a flexible method for handling data, since it can be used across protocols using different buffer management schemes. Consider a system implemented with protocol stack products from multiple vendors. If each of the products has its own buffer management scheme, we can still provide for data interchange on the interface without any copying if the final data is housed externally. For illustration, we can consider the two stacks to implement a three-level and a two-level hierarchy. Revisiting Figure 6.5, we can deduce how data can be manipulated across the interface; the data area is not copied between the two types of buffer schemes. Rather, only the pointers change.

Figure 6.5: (a) Three-level and (b) two-level buffer management schemes.

6.1.9 A Sample Buffer Management Scheme

This section outlines the important components of a buffer management scheme using the ideas discussed earlier. In this example, taken from a real target system, there are three types of structures: a message block, a data block, and a data buffer (see Figure 6.6). The scheme is similar to the buffer structure in STREAMS implementations.

The message block contains a pointer to the first data block of the message, and the data block contains a pointer to the actual data associated with the block. Message blocks and data blocks are allocated from DRAM and are housed in their own free pools.

Figure 6.6: Structures in a buffer management scheme.

A message control block (MCB) and a data control block (DCB) hold the configuration, status, and statistics for the message and data blocks (see Figure 6.6(a)). The buffers are allocated from DRAM and linked to the data blocks as required while the system is running. Figure 6.6(b) shows the system with two tasks after allocating and queuing messages on the task message queues. As shown, the message blocks maintain the semantics of the message queue.

Data blocks can be used for duplicating data buffers without copying. For example, two data blocks can point to the same data buffer if they need to have the same data content. Routines in the buffer management library perform the following actions:

  • Allocating and freeing of data blocks

  • Linking a data buffer to a data block

  • Queuing messages

  • Concatenating messages

  • Changing the data block pointer

The library is used by various applications to manipulate buffers for data interchange. One important factor in this buffer management scheme is the minimization of data copying-realized by the linking to data blocks.
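As an illustration of duplication without copying, the sketch below points a second data block at the same data buffer and bumps a shared reference count. The structure layout and the db_alloc() routine are assumptions for illustration, not the actual target system code.

start example
struct DataBlock {
    struct DataBlock *Next;     /* free-pool or message linkage */
    unsigned char    *DataPtr;  /* the actual data buffer */
    unsigned int     *RefCnt;   /* shared reference count for DataPtr */
};

extern struct DataBlock *db_alloc(void);  /* hypothetical pool allocator */

struct DataBlock *db_dup(struct DataBlock *src)
{
    struct DataBlock *dup = db_alloc();

    if (dup != NULL) {
        dup->DataPtr = src->DataPtr;  /* share the buffer; no copy */
        dup->RefCnt  = src->RefCnt;
        (*dup->RefCnt)++;             /* one more reference to the buffer */
    }
    return dup;
}
end example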

Message and Data Buffer Control Blocks

The structure in Listing 6.1 shows the typical format of the message control block. There is a pointer to the start of the free pool housing the available message blocks. The count of available message blocks in the free pool is the difference between the number of allocations and the number of releases (NumAllocs - NumReleases). For this example, assume the count is maintained as a separate field in the structure, FreePoolCount.

Listing 6.1: Message control block.

start example
typedef struct {
    struct MsgBlock *FreePoolPtr;
    unsigned long    FreePoolCount;
    unsigned long    NumAllocs;
    unsigned long    NumReleases;
    unsigned long    LowWaterMark;
    unsigned long    MaxAllocs;
} MsgControlBlock;
end example

When the system is idle or lightly loaded, the free-pool count stays within a small range. In an end-node TCP/IP implementation, which uses messages between the layers and with applications, a message and its message block will be processed quickly and released to the free pool. Allocations are matched by releases over a period of time, so the difference, i.e., the free-pool count, will not vary much, because few messages are held in the system awaiting processing. Sampling the number of queued messages is a quick way to check the health of the system.

When the system is heavily loaded, messages may not be processed and released rapidly, so the free-pool count may dip to a low value. However, when the system comes out of this state, the free-pool count will return to the normal range.

The LowWaterMark field indicates when the free-pool count is approaching a dangerously low number. It is a configurable parameter: when the free-pool count falls to a value equal to or below LowWaterMark, an alert is sent to the system operator warning of potential buffer depletion.

This variable should be set to a value high enough so that the system can allocate buffers to send the alert to the manager about the depletion, permitting the manager to take appropriate action. The depletion may be a temporary phenomenon or could happen when some module is holding up a number of messages. The management action for this alert could be to increase the number of message blocks at the next startup or to shut down the errant module. Choosing the correct value for LowWaterMark can permit a graceful shutdown of the system by the manager.
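The following sketch shows how the allocation routine might maintain these counters and trigger the alert. It uses the MsgControlBlock of Listing 6.1; the MsgBlock layout and the send_alert() routine are assumptions for illustration.

start example
struct MsgBlock {
    struct MsgBlock *Next;   /* free-pool linkage */
    /* ... message fields ... */
};

/* MsgControlBlock as defined in Listing 6.1. */
typedef struct {
    struct MsgBlock *FreePoolPtr;
    unsigned long    FreePoolCount;
    unsigned long    NumAllocs;
    unsigned long    NumReleases;
    unsigned long    LowWaterMark;
    unsigned long    MaxAllocs;
} MsgControlBlock;

extern void send_alert(const char *reason);  /* hypothetical operator alert */

struct MsgBlock *msg_alloc(MsgControlBlock *mcb)
{
    struct MsgBlock *mb = mcb->FreePoolPtr;

    if (mb == NULL)
        return NULL;                   /* pool exhausted */

    mcb->FreePoolPtr = mb->Next;       /* unlink from the free pool */
    mcb->NumAllocs++;
    mcb->FreePoolCount--;              /* tracks NumAllocs - NumReleases */

    if (mcb->FreePoolCount <= mcb->LowWaterMark)
        send_alert("message block pool nearing depletion");

    return mb;
}
end example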

Similar to the message control block, we have a control block for data blocks. The structure is shown in Listing 6.2.

Listing 6.2: Data control block.

start example
typedef struct {
    struct DataBlock *FreePoolPtr;
    unsigned long     FreePoolCount;
    unsigned long     NumAllocs;
    unsigned long     NumReleases;
    unsigned long     LowWaterMark;
    unsigned long     MaxAllocs;
} DataControlBlock;
end example

6.1.10 Exception Conditions in Buffer Management

If the system does not have adequate memory for buffers, or if there are issues in passing buffers between modules, the designer would have to provide for exception conditions such as the following:

  • Lack of buffers, message blocks, or data blocks

  • Modules unable to process messages fast enough

  • Errant modules not releasing buffers

  • System unable to keep up with data rates

The lack of buffers, message blocks, or data blocks was covered earlier, where we specified using a low-water mark to alert the operator. Designers should engineer pool counts for fully loaded systems and verify them when the system is tested in the lab at peak load. It is usually not possible to fix this problem on a real-time system in the field; however, the alert helps the operator determine problems to be addressed on the next system reboot.

Modules may not be able to keep up with messages for a variety of reasons. They could be spending a lot of time in algorithmic processing, or they may be scheduled less often due to lower task priorities. This causes some modules to hold up buffers. Modules which queue these buffers should perform flow control whenever the destination module queues reach a high-water mark. This is a threshold parameter to indicate a safe queue depth or count. The queuing call made by the source module checks if the queue of the target module has crossed the high-water mark. If so, it aborts the queuing operation and returns an error code to the source module. The source module can report this error or retry its operation at a later stage by storing the message in its own queue.
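A sketch of such a queuing call appears below. The structure layouts and the error code are illustrative assumptions, not part of any specific library.

start example
#define E_QUEUE_FULL (-1)

struct MsgBlock {
    struct MsgBlock *Next;   /* queue linkage */
    /* ... message fields ... */
};

struct MsgQueue {
    struct MsgBlock *Head;
    struct MsgBlock *Tail;
    unsigned long    Depth;           /* messages currently queued */
    unsigned long    HighWaterMark;   /* safe queue depth threshold */
};

int msg_enqueue(struct MsgQueue *q, struct MsgBlock *mb)
{
    /* Flow control: abort if the destination queue has reached
     * its high-water mark; the source module reports the error
     * or retries later from its own queue. */
    if (q->Depth >= q->HighWaterMark)
        return E_QUEUE_FULL;

    mb->Next = NULL;
    if (q->Tail != NULL)
        q->Tail->Next = mb;
    else
        q->Head = mb;
    q->Tail = mb;
    q->Depth++;
    return 0;
}
end example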

Errant modules are those that are unreliable due to bugs or faulty design. These cause the system to run out of message blocks and buffers. The low-water mark and alert is one way to inform the operator about these modules.

The final type of error occurs when the system cannot keep up with data rates and runs out of buffers. There are several methods to handle such congestion, including well-known techniques such as RED (Random Early Detection/Discard) and Weighted RED. Several of these techniques are now also implemented in hardware controllers.
