5.4 Speeding up Access

Our examples of the PCB and PICB used a simple linked list to organize the individual elements in the table. This structure, while simple to understand and implement, is not the most efficient for accessing the elements. Several methods exist to speed up access, depending on the type of table or data structure being accessed.

There are three ways to speed up access, namely,

  1. Optimized Access Methods for specific data structures

  2. Hardware support

  3. Caching

5.4.1 Optimized Access Methods

For faster access to the entries in a routing table, a trie structure has been shown to be more efficient than a linked list. A trie permits storing the routing table entries in the leaf nodes of the data structure, which is accessed using the Longest Prefix Match (LPM) method. Similarly, hashing can be used for efficient storage and access of the elements of a MAC filtering table. The efficiency of access depends upon the choice of the hashing algorithm and the resultant key.
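To illustrate the idea, a minimal unibit trie for IPv4 longest prefix matching might be sketched in C as follows. The structure and routine names are ours, and a production routing table would use a compressed (path- or level-compressed) trie rather than this one-bit-per-node form:

```c
#include <stdint.h>
#include <stdlib.h>

/* Minimal unibit trie for IPv4 longest prefix match (illustrative only). */
typedef struct trie_node {
    struct trie_node *child[2];   /* 0-bit and 1-bit branches */
    int has_route;                /* node holds a routing entry */
    uint32_t next_hop;            /* next hop for that entry */
} trie_node;

static trie_node *node_new(void) {
    return calloc(1, sizeof(trie_node));
}

/* Insert prefix/len with a next hop, e.g. 10.0.0.0/8. */
void trie_insert(trie_node *root, uint32_t prefix, int len, uint32_t next_hop) {
    trie_node *n = root;
    for (int i = 0; i < len; i++) {
        int bit = (prefix >> (31 - i)) & 1;
        if (!n->child[bit])
            n->child[bit] = node_new();
        n = n->child[bit];
    }
    n->has_route = 1;
    n->next_hop = next_hop;
}

/* Walk the destination's bits, remembering the deepest entry seen:
 * that entry is the longest matching prefix. Returns 1 on a match. */
int trie_lpm(const trie_node *root, uint32_t dest, uint32_t *next_hop) {
    const trie_node *n = root;
    int found = 0;
    for (int i = 0; i < 32 && n; i++) {
        if (n->has_route) { *next_hop = n->next_hop; found = 1; }
        n = n->child[(dest >> (31 - i)) & 1];
    }
    if (n && n->has_route) { *next_hop = n->next_hop; found = 1; }
    return found;
}
```

Note that the lookup does not stop at the first match; it continues to the deepest matching node, which is what makes the result the *longest* matching prefix.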

start sidebar
Over-Engineering

When planning an embedded application, developers should check prior engineering work for efficient storage and access of data structures. However, there is a risk of over-engineering: developers can spend an inordinate amount of time on efficient organization and access even when the benefits are only incremental. The final goal for an embedded application is a well-tested system with the required features and performance, delivered within the agreed schedule.

end sidebar

5.4.2 Hardware Support

Another way to speed up access is by using hardware support. An Ethernet controller may have a hardware hashing mechanism for matching the destination MAC address to a bit mask. A bit set in the mask indicates a match of a MAC address that the controller has been programmed to receive.
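The controller's hash filter can be modeled in software roughly as below. The hash shown is a simple XOR fold chosen for brevity; actual controllers typically compute a CRC-32 over the address and use a few of its bits as the bin index:

```c
#include <stdint.h>

/* Software model of a controller's 64-bin destination-address hash
 * filter. The XOR fold stands in for the controller's real hash. */
static unsigned mac_hash6(const uint8_t mac[6]) {
    unsigned h = 0;
    for (int i = 0; i < 6; i++)
        h ^= mac[i];
    return h & 0x3F;              /* 6 bits -> one of 64 bins */
}

/* Program the filter to accept frames for this address. */
void filter_add(uint64_t *mask, const uint8_t mac[6]) {
    *mask |= (uint64_t)1 << mac_hash6(mac);
}

/* A set bit means "possibly ours": distinct addresses can hash to the
 * same bin, so software still performs the exact address comparison
 * on frames the filter accepts. */
int filter_match(uint64_t mask, const uint8_t mac[6]) {
    return (int)((mask >> mac_hash6(mac)) & 1);
}
```

Because the hash can collide, the bit mask is a coarse filter: it rejects most unwanted frames in hardware, while software makes the final, exact match.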

Another common method of hardware support for table access is with a Content Addressable Memory (CAM). A CAM is a hardware device which can enable parallel searches using a key. For example, a CAM is often used to obtain the routing table entry corresponding to the Longest Prefix Match of an IP address. This scheme is one of the fastest ways to access and obtain a match—in fact, some modern network processors (e.g., the Intel IXP 2400 and 2800) have built-in CAMs for high-speed access.
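The behavior of a CAM used for route lookup can be modeled as follows. In hardware, the key is compared against all entries simultaneously; the loop below stands in for that parallel comparison and for the priority encoder that selects the first match. The sketch assumes the entries are kept sorted by decreasing prefix length, so the first hit is the longest prefix match:

```c
#include <stdint.h>

/* Software model of a ternary CAM used for route lookup. Names and
 * layout are illustrative. */
typedef struct {
    uint32_t value;     /* prefix bits */
    uint32_t mask;      /* 1-bits select which key bits must match */
    uint32_t next_hop;
} tcam_entry;

/* Returns the index of the first (highest-priority) matching entry,
 * or -1 if no entry matches. With entries sorted by decreasing
 * prefix length, the first match is the longest prefix match. */
int tcam_lookup(const tcam_entry *tbl, int n, uint32_t key, uint32_t *next_hop) {
    for (int i = 0; i < n; i++) {
        if ((key & tbl[i].mask) == tbl[i].value) {
            *next_hop = tbl[i].next_hop;
            return i;
        }
    }
    return -1;
}
```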

start sidebar
A Note on Engineering Assumptions

To illustrate how dynamic engineering assumptions can be, consider the case of a forwarding table used in routers. In the early days of routing, the focus was on reducing the memory requirements for these tables and increasing access speeds through table organization. With advances in hardware-based switching techniques and a considerable drop in the price of memory, the "traditional" ways of organizing the data structures for fast software access were no longer as relevant. The use of Content-Addressable Memories (CAMs) is a classic example: CAMs do not require a trie-based organization, since the memory is accessed based on content.

Another example of how even protocol focus can change is Multiprotocol Label Switching (MPLS). Originally proposed as a means of fixed-header-length label switching (since variable-length Longest Prefix Match, or LPM, was inefficient for hardware-based switching), the technology shifted its emphasis to traffic engineering (TE). While TE is important, hardware devices became more adept at LPM switching for IP packets, with no discernible drop in performance compared to a label-switched packet. The hardware switching bottleneck thus became less of an issue, leading protocol engineers to focus their efforts on the problem of traffic engineering.

end sidebar

5.4.3 Caching

Caching is the third method for fast access. A subset of a large table can be stored (cached) in high-speed memory for faster access. For performance reasons, software always consults the cached version first. The subset-determination algorithm is key to this scheme, since the idea is to maximize the number of cache hits. On a cache miss, the software has to obtain the entry from the lower-speed memory, negating the benefit of caching for that access.

On a Layer 2 switch, the MAC filtering table entries for the most recently seen destination addresses can be cached. This is based on the premise that the next few frames will also be destined for the same set of MAC addresses, since they are probably part of the same bidirectional traffic flow between the two end stations.
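A sketch of such a cache in front of the MAC filtering table follows. It is direct-mapped for simplicity, with the size, index hash, and names chosen purely for illustration; on a miss, the caller searches the full table in slower memory and installs the result for subsequent frames:

```c
#include <stdint.h>
#include <string.h>

#define CACHE_SLOTS 16            /* illustrative size */

/* One slot of a direct-mapped cache over the MAC filtering table. */
typedef struct {
    uint8_t mac[6];               /* destination address */
    int     port;                 /* outbound port for that address */
    int     valid;
} cache_slot;

static unsigned slot_of(const uint8_t mac[6]) {
    return (mac[4] ^ mac[5]) % CACHE_SLOTS;   /* trivial index hash */
}

/* Returns the port on a cache hit, or -1 on a miss, in which case
 * the caller falls back to the full table in slower memory. */
int cache_lookup(const cache_slot *cache, const uint8_t mac[6]) {
    const cache_slot *s = &cache[slot_of(mac)];
    if (s->valid && memcmp(s->mac, mac, 6) == 0)
        return s->port;           /* hit: served from fast memory */
    return -1;                    /* miss: consult the full table */
}

/* Install a slow-path result, overwriting whatever maps to the slot. */
void cache_install(cache_slot *cache, const uint8_t mac[6], int port) {
    cache_slot *s = &cache[slot_of(mac)];
    memcpy(s->mac, mac, 6);
    s->port = port;
    s->valid = 1;
}
```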

Since the cache size is limited, we need a method to replace entries when a new entry is to be added to a full cache. Timing out entries is one technique. Here, an entry can be removed from the table if it has not been used for a specified period of time. In a Layer 2 switch, a table entry can be timed out if two end stations have not been communicating for a while.

The removal of an entry can be done actively, with a periodic scan of the entries, or on demand. In the latter case, an entry is chosen for replacement whenever the table is full and a new entry needs to be added. The replacement may be handled by a software module called the Cache Handler, which can decide which entry to replace based on several criteria:

  • Recent use—the entry has been accessed more than once in the last few transactions

  • Static configuration—an entry is either replaceable or locked via manager configuration

  • Priority—one entry can be prioritized over another

If the cache is full, the choice of the entry to be replaced can be based on the LRU (Least Recently Used) algorithm used by operating systems. In the packet-forwarding case, this could be an address to which packets are no longer forwarded.
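An LRU victim-selection routine for a full cache might be sketched as follows. It honors the static-configuration criterion above by skipping locked entries; the structure and field names are ours:

```c
#include <stdint.h>

/* One entry of a cache managed with LRU replacement (names ours). */
typedef struct {
    uint32_t key;
    uint32_t last_used;   /* tick of the most recent access */
    int      locked;      /* manager-configured: never replaced */
    int      valid;
} lru_entry;

/* Return the index to overwrite: a free slot if one exists, otherwise
 * the unlocked entry with the oldest last_used tick (the LRU entry).
 * Returns -1 if every entry is locked. */
int lru_victim(const lru_entry *tbl, int n) {
    int victim = -1;
    uint32_t oldest = UINT32_MAX;
    for (int i = 0; i < n; i++) {
        if (!tbl[i].valid)
            return i;                         /* free slot: no eviction */
        if (!tbl[i].locked && tbl[i].last_used < oldest) {
            oldest = tbl[i].last_used;
            victim = i;
        }
    }
    return victim;
}
```

The same `last_used` tick also supports the timeout technique described earlier: a periodic scan can simply invalidate any unlocked entry whose tick is older than the aging threshold.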



Designing Embedded Communications Software
ISBN: 157820125X
Year: 2003
Pages: 126
Authors: T. Sridhar
