This section will discuss design considerations with respect to partitioning the system into
The following list is a set of guidelines for organizing the modules and tasks in a communications system:
There are one or more drivers which need to handle the various physical ports on the system. Each driver can be implemented as a task or module. There are one or more Interrupt Service Routines (ISRs) for each driver.
The drivers interface with a higher layer
Each protocol is designed as a module. It can be implemented as a task if it requires independent scheduling and handling of events, such as timers. If the overhead of context switching and message passing between the tasks is unacceptable, multiple protocol modules can run as one task.
Control and data plane functions, along with fast/slow
Housekeeping and management plane functions like SNMP
If memory protection is used, the interfaces between the tasks will be via messages. Otherwise, functional interfaces can be used. This will be discussed in greater detail in Section 3.6
The above guidelines are applied in the example of the Layer 2 switch, as discussed below.
Consider a Layer 2 Ethernet switch, which switches Ethernet
Figure 3.3 shows the software architecture of the device. It requires the following:
Driver(s) to transmit and receive Ethernet frames from the ports
A set of modules/
A set of modules/tasks required for the system operation and management
Figure 3.3: Typical Architecture of a Layer 2 Switch.
The device driver is the module
With polling, the driver
Polling is a relatively inefficient way to handle frames,
Several techniques are used to optimize controllers for receiving frames. Consider a case in which a list of multicast Ethernet addresses are to be recognized. In the Layer 2 switch, one such multicast address is used for Spanning Tree Protocol (STP) frames. STP is used to ensure that
The STP frames use the multicast MAC address 01-80-C2-00-00-00, so the Ethernet controller needs to receive and pass up all MAC frames with this multicast address in the destination address field, whenever STP is enabled. The controller can perform this match on chip registers which store the MAC addresses. The driver programs this register with the MAC addresses based on the higher layer configuration. The higher layer configuration can take place via a Command Line Interface (CLI) with a command like “Enable STP”, which, in
In Figure 3.4, a driver provides receive buffers to the controller. These buffers are located in system memory, and the controller can access these receive buffers to directly DMA the received Ethernet frames. Typically, the controller
Figure 3.4: Frame Reception and Buffer Handling
In Figure 3.4, five frames of different sizes are received into the buffers. The figure shows that a frame “points” to the buffer housing the start of the next frame. This can be done using a pointer or an end-of-frame indication. Using Figure 3.4, assume a buffer size of 258 bytes, of which two bytes are used for management or “housekeeping” purposes, and 256 bytes are used for data. The housekeeping parameters can include a count of the valid bytes in the buffer, whether this is the last buffer in the frame, the status of the reception, and so on.
Now assume that the frames received are of size 300, 600, 256, 325 and 512 bytes. Dividing each frame size by the 256 data bytes in a buffer and rounding up (i.e., adding 1 to the quotient if there is a remainder), we need 2, 3, 1, 2 and 2 buffers respectively for the five received frames (see Figure 3.4). The “Total Frame Size” column in the following table details how the frame sizes are calculated based on the count of valid bytes in the buffers constituting the frame (the last buffer in a frame is the one where the end-of-frame indication is set by the controller).
| Buffer No. | Count of Valid Bytes | Frame Reference | Total Frame Size |
|---|---|---|---|
| 1 | 256 | Frame 1 | |
| 2 | 44 | Frame 1 and end of frame | Buffer 1 count + Buffer 2 count = 300 bytes |
| 3 | 256 | Frame 2 | |
| 4 | 256 | Frame 2 | |
| 5 | 88 | Frame 2 and end of frame | Buffer 3 count + Buffer 4 count + Buffer 5 count = 600 bytes |
| 6 | 256 | Frame 3 and end of frame | Buffer 6 count = 256 bytes |
| 7 | 256 | Frame 4 | |
| 8 | 69 | Frame 4 and end of frame | Buffer 7 count + Buffer 8 count = 325 bytes |
| 9 | 256 | Frame 5 | |
| 10 | 256 | Frame 5 and end of frame | Buffer 9 count + Buffer 10 count = 512 bytes |
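
The buffer arithmetic in this example can be checked with a short sketch. It is written in Python purely for brevity (a real driver would normally be in C), and the names `buffers_needed`, `frame_sizes`, and the `(valid_bytes, end_of_frame)` tuple layout are illustrative assumptions, not a controller's actual descriptor format.

```python
DATA_BYTES_PER_BUFFER = 256  # data area of each 258-byte buffer in the example


def buffers_needed(frame_size, data_bytes=DATA_BYTES_PER_BUFFER):
    """Round up: one extra buffer is needed if the division leaves a remainder."""
    full, remainder = divmod(frame_size, data_bytes)
    return full + (1 if remainder else 0)


def frame_sizes(descriptors):
    """Add up valid-byte counts until each end-of-frame flag is reached.

    descriptors: (valid_bytes, end_of_frame) tuples, one per buffer, in the
    order the controller filled them (hypothetical layout).
    """
    sizes, running = [], 0
    for valid_bytes, end_of_frame in descriptors:
        running += valid_bytes
        if end_of_frame:
            sizes.append(running)
            running = 0
    return sizes


# The ten buffers from the example table.
table = [(256, False), (44, True),
         (256, False), (256, False), (88, True),
         (256, True),
         (256, False), (69, True),
         (256, False), (256, True)]

print([buffers_needed(n) for n in (300, 600, 256, 325, 512)])  # [2, 3, 1, 2, 2]
print(frame_sizes(table))  # [300, 600, 256, 325, 512]
```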
If there is a possibility that the device can overrun all the buffers, one of two safety mechanisms can be employed. The device can be configured to raise an overrun interrupt, in which case the driver interrupt service routine picks up the received frames. For the second safety mechanism, the driver can poll the controller for frame reception and pick up the received frames. The polling interval can be based on the frame arrival rate (i.e., it is made a function of the buffer-
Buffer handling depends on the controller. If the controller requires that buffers be located only in a specific area of memory, the driver copies the frames from the driver area to the specified area in memory. It then flags the controller to
The driver passes the received frames to the higher layer. How it does so depends on the system architecture, in particular on whether the driver and its higher layer are in the same memory space, a common scenario in embedded systems. In such a scenario, the received frame is typically enqueued to the higher layer module by the driver without copying the frame into the module’s buffers. An event notifies the higher layer module of the presence of the received frame, so that it can start processing the frame. If more than one frame is enqueued, the driver places all the frames in a First In First Out (FIFO) queue for the higher layer module to pick up.
If the driver and the higher layer are in two separate memory areas, the driver copies the frame into a common area or into the buffers of a system facility like an Inter-Process Communication (IPC) queue and signals the higher layer. The higher layer then copies the frame from system buffers into its own buffers. This approach adds an extra copy, which can degrade performance.
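
A minimal sketch of the shared-memory case follows, in Python for brevity. The class name `HigherLayerQueue` and its methods are hypothetical; the point is only that the driver queues a reference to the receive buffer, not a copy, and raises an event for the higher layer.

```python
from collections import deque
import threading


class HigherLayerQueue:
    """FIFO of received frames shared by the driver and one higher-layer module."""

    def __init__(self):
        self._frames = deque()
        self.frame_ready = threading.Event()

    def enqueue(self, frame):
        # Driver side: queue a reference to the receive buffer (no copy), then signal.
        self._frames.append(frame)
        self.frame_ready.set()

    def drain(self):
        # Higher-layer side: clear the event first, then take frames in arrival order.
        self.frame_ready.clear()
        frames = []
        while self._frames:
            frames.append(self._frames.popleft())
        return frames


rx_queue = HigherLayerQueue()
buf_a, buf_b = bytearray(b"frame-1"), bytearray(b"frame-2")
rx_queue.enqueue(buf_a)
rx_queue.enqueue(buf_b)
received = rx_queue.drain()
```

Because `received[0]` is the same object as `buf_a`, the higher layer works directly on the buffer the driver filled. In this sketch, buffer ownership and release back to the driver are left out.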
Frame transmission also depends on memory and controller functions. The considerations are much the same—whether the memory areas are separate, whether the controller can work with buffers in multiple memory areas, and so on.
In addition, the completion of transmission needs to be handled. The driver can either poll the device or be
In most systems, a driver is responsible for more than one hardware port. The driver differentiates the ports by using data structures, one for each of the ports it handles. These data structures typically include the memory or I/O address of the controller, the port number, statistics related to the driver, and so on.
A driver may be implemented either as a task or as a module with no independent thread of execution. The module form is possible because a driver does not run unless a higher layer or a lower layer (the controller ISR) gives it an impetus to do so. A driver handles transmission, reception, and errors, all of which are driven by the higher and lower layers. Therefore, many systems implement drivers as libraries or functions that can be called from the higher layer or from an interrupt service routine.
The alternative, in which the driver is itself a task, allows drivers to implement logic that is useful with hardware controllers. The polling logic for reception is one such case: if the driver is scheduled as a task that polls the controllers at periodic intervals, frame reception overrun can be avoided. Another use of a separate driver task is when chip statistics counters need to be read within a time interval. It is not always possible to have on-chip statistics counters that will never overflow, especially at higher speeds. A driver task that periodically polls the controllers to read the current value of the counters and maintains them in software alleviates this problem.
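
One possible shape for such per-port data structures is sketched below in Python. The field names and the controller base addresses are made up for illustration; an actual driver would hold whatever its controller requires.

```python
from dataclasses import dataclass, field


@dataclass
class PortStats:
    rx_frames: int = 0
    tx_frames: int = 0
    rx_errors: int = 0


@dataclass
class PortContext:
    port_number: int
    controller_base: int  # memory or I/O address of the controller
    stats: PortStats = field(default_factory=PortStats)


class EthernetDriver:
    """One driver instance serving several ports, one PortContext per port."""

    def __init__(self, controller_bases):
        self.ports = {n: PortContext(n, base)
                      for n, base in enumerate(controller_bases)}

    def on_receive(self, port_number):
        # Every entry point takes the port number and works on that port's context.
        self.ports[port_number].stats.rx_frames += 1


driver = EthernetDriver([0x4000_0000, 0x4000_1000])  # hypothetical addresses
driver.on_receive(1)
```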
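
The counter-maintenance idea can be sketched as follows. This assumes a free-running 32-bit hardware counter that wraps to zero (not a clear-on-read counter) and a polling interval short enough that the counter wraps at most once between polls. The class name `SoftwareCounter` is illustrative.

```python
COUNTER_BITS = 32  # assumed width of the on-chip counter
COUNTER_MODULUS = 1 << COUNTER_BITS


class SoftwareCounter:
    """Software total that keeps growing while the hardware counter wraps."""

    def __init__(self):
        self.total = 0      # unbounded software total
        self._last_raw = 0  # hardware value seen at the previous poll

    def poll(self, raw):
        # Modular subtraction yields the correct increment across a single wrap.
        delta = (raw - self._last_raw) % COUNTER_MODULUS
        self.total += delta
        self._last_raw = raw
        return self.total


counter = SoftwareCounter()
counter.poll(4_000_000_000)  # first poll, no wrap yet
counter.poll(5_000_000)      # hardware counter has wrapped past 2**32
```

After the second poll the software total exceeds 2**32 even though the hardware value went down, which is the situation the periodic driver task is meant to handle.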
Figure 3.3 showed the typical module partitioning of protocols in a Layer 2 switch. The switch runs the 802.1D spanning tree algorithm and protocol (STP), which detects loops in the switching topology and
Another protocol used in Layer 2 switching is the IEEE 802.1Q Generic VLAN Registration Protocol (GVRP). A VLAN (Virtual LAN) is a logical partitioning of the switching topology. Nodes on a VLAN can communicate with one another without going through a router. Nodes connected to multiple physical LANs (and switches) can be configured to be
Another method of partitioning could be to combine all the control protocols so that they are handled within one task. This has the advantage of avoiding context switches and memory requirements due to a large number of control tasks. The flip side to this is the complexity—the STP processing may hold up the GVRP processing. If they are separate, equal-priority tasks, equal-priority time slicing could be used to ensure that no one control task holds up the others.
Other than the protocols, there is a switching task that picks up the frames from one Ethernet port and switches them to another port based on the destination address in the frame. The switching task uses the information from frames to build its forwarding table and qualifies the table entries with information provided by the STP and the GVRP tasks. For example, it will not poll the Ethernet driver for frames from deactivated ports (as specified by the STP task). Similarly, it will forward a frame to a port based on the VLAN information it obtained from the GVRP task.
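
The way STP and VLAN information qualifies a forwarding decision can be sketched as a set intersection. The function name and arguments below are hypothetical; the sketch assumes STP state is reduced to a set of active ports and VLAN membership to a mapping from VLAN ID to member ports.

```python
def egress_ports(ingress_port, vlan_id, candidate_ports,
                 forwarding_ports, vlan_members):
    """Return the set of ports a frame may be sent to.

    candidate_ports:  ports picked by the forwarding-table lookup
    forwarding_ports: ports that STP has left active
    vlan_members:     dict mapping VLAN ID -> set of member ports
    """
    if ingress_port not in forwarding_ports:
        return set()  # frames from deactivated ports are not switched
    allowed = forwarding_ports & vlan_members.get(vlan_id, set())
    return (set(candidate_ports) & allowed) - {ingress_port}


forwarding = {1, 2, 3}           # port 4 deactivated by STP
members = {10: {1, 2, 4}}        # VLAN 10 spans ports 1, 2 and 4
print(egress_ports(1, 10, {1, 2, 3, 4}, forwarding, members))  # {2}
```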
Note that the switching task needs to run more often, since it processes frames from multiple Ethernet ports arriving at a rapid rate. Due to the nature of the protocols, the STP and the GVRP tasks do not need to process frames as often as the switching task, because the control frames associated with these protocols are exchanged only once every few seconds. In line with this requirement, the switching task runs at a higher priority than the other protocol tasks. If the driver is implemented as a separate task, it needs a higher priority than all the other tasks in the system, since it must process frames as fast as possible. The same logic extends upward to the switching task, which processes all the frames provided by the driver. The partitioning is thus based on the protocol and on how often the protocol needs to process frames.
Frames are provided to the GVRP, STP or IP tasks through the use of a demultiplexing (demux) operation, which is usually implemented at a layer above the driver. Demultiplexing involves pre-processing arriving frames from Ethernet ports and sending them to the appropriate task. For example, an STP multicast frame is identified by its multicast destination address and sent to the STP task. Similarly, an IP packet destined to the router (
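
The priority ordering described here can be illustrated with a toy strict-priority ready queue. The numeric values are arbitrary, and the convention that a lower number means a higher priority is an assumption (some RTOSes use the opposite).

```python
import heapq

# Lower number = higher priority (assumed convention; values are illustrative).
PRIORITIES = {"driver": 0, "switching": 1, "stp": 2, "gvrp": 2}


def run_order(ready_tasks):
    """Return ready tasks in the order a strict-priority scheduler would run them.

    Tasks of equal priority keep their arrival order.
    """
    heap = [(PRIORITIES[name], seq, name) for seq, name in enumerate(ready_tasks)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


print(run_order(["gvrp", "stp", "switching", "driver"]))
# ['driver', 'switching', 'gvrp', 'stp']
```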
Listing 3.1: Perform demultiplexing.
    {
        If frame is a multicast frame {
            Check destination multicast address and send to GVRP or STP task;
        }
        else
            Drop frame;
        If frame is destined to switch with IP protocol type
            Send to IP function
    }
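
A runnable approximation of the demultiplexing step is sketched below in Python. It makes several assumptions not in the listing: frames are untagged Ethernet frames held as `bytes`, the IP check uses the IPv4 EtherType 0x0800, unrecognized group addresses (including broadcast) are dropped, and unicast frames not addressed to the switch are handed to the bridging path. The STP address comes from the text; 01-80-C2-00-00-21 is the GVRP group address.

```python
STP_MAC = bytes.fromhex("0180C2000000")   # from the text
GVRP_MAC = bytes.fromhex("0180C2000021")  # GVRP group address
ETHERTYPE_IPV4 = 0x0800


def demux(frame, switch_mac):
    """Classify a received frame and name the task that should get it."""
    dst = frame[0:6]
    ethertype = int.from_bytes(frame[12:14], "big")
    if dst[0] & 0x01:  # group (multicast) bit of the first octet
        if dst == STP_MAC:
            return "STP"
        if dst == GVRP_MAC:
            return "GVRP"
        return "DROP"  # assumption: other group addresses are not handled here
    if dst == switch_mac and ethertype == ETHERTYPE_IPV4:
        return "IP"
    return "SWITCH"    # assumption: everything else goes to the bridging path


switch_mac = bytes.fromhex("020000000001")  # hypothetical switch address
src = bytes.fromhex("020000000099")
stp_frame = STP_MAC + src + b"\x00\x26" + bytes(38)
ip_frame = switch_mac + src + b"\x08\x00" + bytes(46)
print(demux(stp_frame, switch_mac), demux(ip_frame, switch_mac))  # STP IP
```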
In some
In the above example, the algorithm for the switching, i.e., the bridging operation, is not shown. The algorithm includes learning from the source MAC address, filtering, and forwarding of received frames.
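
For orientation, the three steps named here (learning, filtering, forwarding) can be sketched as a small Python class. This is a simplified illustration, not the algorithm as specified in 802.1D: entry aging, VLANs, and STP port states are all omitted, and the class name is hypothetical.

```python
class LearningBridge:
    def __init__(self, ports):
        self.ports = set(ports)
        self.table = {}  # source MAC -> port on which it was last seen

    def handle(self, ingress_port, src_mac, dst_mac):
        """Return the ports to send the frame to; an empty set means filtered."""
        self.table[src_mac] = ingress_port          # learning
        out = self.table.get(dst_mac)
        if out is None:
            return self.ports - {ingress_port}      # unknown destination: flood
        if out == ingress_port:
            return set()                            # filtering: same segment
        return {out}                                # forwarding


bridge = LearningBridge({1, 2, 3})
first = bridge.handle(1, "A", "B")   # B unknown, so the frame is flooded
reply = bridge.handle(2, "B", "A")   # A was learned on port 1
```

Here `first` is `{2, 3}` and `reply` is `{1}`: the first frame teaches the bridge where A is, so the reply no longer needs flooding.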
Layer 2 switches usually have a full TCP/IP stack to handle the following:
TCP over IP for telnet and HTTP over TCP for switch management
SNMP over UDP over IP, for switch management
ICMP functionality such as ping
This implies that the complete suite of IP, TCP, UDP, HTTP, SNMP protocols needs to be supported in the Layer 2 switch. Note that, since there is no IP forwarding performed, the TCP/IP stack implements only end-node functionality. Network managers connect to the switch using the IP address of any of the Ethernet ports. Figure 3.3 showed a Layer 2 Ethernet switch with complete end node functionality.
Often, the TCP/IP end node functionality is implemented with fewer tasks. For instance, IP, ICMP, UDP, and TCP can be provided in the same task. Since end node functionality is usually not time critical, each protocol function can run sequentially when an IP packet is received.
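
A single-task arrangement of this kind might look like the following sketch, where the IP input function calls the transport handler directly instead of posting a message to another task. The handler functions are placeholders, and header validation (checksum, length, fragmentation, options) is deliberately left out.

```python
IPPROTO_ICMP, IPPROTO_TCP, IPPROTO_UDP = 1, 6, 17  # IANA protocol numbers


def icmp_input(payload):
    return ("icmp", len(payload))  # placeholder for real ICMP processing


def tcp_input(payload):
    return ("tcp", len(payload))   # placeholder for real TCP processing


def udp_input(payload):
    return ("udp", len(payload))   # placeholder for real UDP processing


HANDLERS = {IPPROTO_ICMP: icmp_input, IPPROTO_TCP: tcp_input, IPPROTO_UDP: udp_input}


def ip_input(packet):
    """Parse a minimal IPv4 header and run the transport function in the same task."""
    version_ihl = packet[0]
    if version_ihl >> 4 != 4:
        return None                         # not IPv4
    header_len = (version_ihl & 0x0F) * 4   # IHL is in 32-bit words
    handler = HANDLERS.get(packet[9])       # byte 9 is the protocol field
    if handler is None:
        return None                         # unsupported protocol
    return handler(packet[header_len:])


header = bytes([0x45, 0, 0, 28, 0, 0, 0, 0, 64, 17, 0, 0,
                10, 0, 0, 1, 10, 0, 0, 2])  # 20-byte header, protocol 17 (UDP)
print(ip_input(header + bytes(8)))  # ('udp', 8)
```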
While protocol tasks form the
A Health Monitor task ensures that hardware and software are performing correctly. The Health Monitor task can
Other tasks or modules relevant to system and management functions can include buffer and timer management, inter-board communication, redundancy management, or shelf management in large hardware systems housing multiple
In the Layer 2 switch example, it was assumed that the switching was done in software, even if it was inefficient. In reality, switching is often performed by a switching chipset as detailed in Chapter 2. This switching chipset needs to be programmed for the switching parameters, including VLAN parameters, port priorities, and size of filtering tables—all of which can modify the system architecture.
When hardware acceleration is used in our Layer 2 switch example, the switching task is now responsible for the slow-