Conceptual Underpinnings of Flow Control and Quality of Service | Storage Networking Protocol Fundamentals (Vol 2)

To fully understand the purpose and operation of flow-control and QoS mechanisms, readers first need to understand several related concepts. These include the following:

The principle of operation for half-duplex upper layer protocols (ULPs) over full-duplex network protocols
The difference between half-duplex timing mechanisms and flow-control mechanisms
The difference between flow control and Quality of Service (QoS)
The difference between the two types of QoS algorithms
The relationship of delivery acknowledgement to flow control
The relationship of processing delay to flow control
The relationship of network latency to flow control
The relationship of retransmission to flow control
The factors that contribute to end-to-end latency

As previously mentioned, SCSI is a half-duplex command/response protocol. For any given I/O operation, either the initiator or the target may transmit at a given point in time. The SCSI communication model does not permit simultaneous transmission by both initiator and target within the context of a single I/O operation. However, SCSI supports full-duplex communication across multiple I/O operations. For example, an initiator may have multiple I/O operations outstanding simultaneously with a given target and may be transmitting in some of those I/O operations while receiving in others. This has the effect of increasing the aggregate throughput between each initiator/target pair. For this to occur, the end-to-end network path between each initiator/target pair must support full-duplex communication at all layers of the OSI model.

Readers should be careful not to confuse half-duplex signaling mechanisms with flow-control mechanisms. Communicating FCP devices use the Sequence Initiative bit in the FC Header to signal which device may transmit at any given point in time. Similarly, iSCSI devices use the F bit in the iSCSI Basic Header Segment (BHS) to signal which device may transmit during bidirectional commands. (iSCSI does not explicitly signal which device may transmit during unidirectional commands.) These mechanisms do not restrict the flow of data. They merely control the timing of data transmissions relative to one another.

Flow control and QoS are closely related mechanisms that complement each other to improve the efficiency of networks and the performance of applications. Flow control is concerned with pacing the rate at which frames or packets are transmitted. The ultimate goal of all flow-control mechanisms is to avoid receive buffer overruns, which improves the reliability of the delivery subsystem. By contrast, QoS is concerned with the treatment of frames or packets after they are received by a network device or end node. When congestion occurs on an egress port in a network device, frames or packets that need to be transmitted on that port must be queued until bandwidth is available. While those frames or packets are waiting in queue, other frames or packets may enter the network device and be queued on the same egress port. QoS policies enable the use of multiple queues per port and determine the order in which the queues are serviced when bandwidth becomes available. Without QoS policies, frames or packets within a queue must be transmitted according to a simple algorithm such as First In First Out (FIFO) or Last In First Out (LIFO). QoS mechanisms enable network administrators to define advanced policies for the transmission order of frames or packets. QoS policies affect both the latency and the throughput experienced by a frame or packet. The QoS concept also applies to frames or packets queued within an end node. Within an end node, QoS policies determine the order in which queued frames or packets are processed when CPU cycles and other processing resources become available.
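The difference between a single simple queue and a multi-queue QoS policy can be sketched as follows. This is a minimal illustration, not any vendor's implementation: the function names are hypothetical, and strict priority is used only as one example of an advanced policy (lower number means higher priority).

```python
from collections import deque


def drain_fifo(frames):
    """Transmit order for a single FIFO queue: oldest frame first."""
    q = deque(frames)
    return [q.popleft() for _ in range(len(q))]


def drain_lifo(frames):
    """Transmit order for a single LIFO queue: newest frame first."""
    stack = list(frames)
    return [stack.pop() for _ in range(len(stack))]


def drain_strict_priority(queues):
    """Transmit order for several FIFO queues sharing one egress port.

    `queues` maps a priority (lower number = higher priority) to a list of
    frames. Each time bandwidth becomes available, the highest-priority
    non-empty queue is serviced.
    """
    qs = {prio: deque(frames) for prio, frames in queues.items()}
    order = []
    while any(qs.values()):
        prio = min(p for p, q in qs.items() if q)
        order.append(qs[prio].popleft())
    return order


arrivals = ["f1", "f2", "f3"]
print(drain_fifo(arrivals))  # ['f1', 'f2', 'f3']
print(drain_lifo(arrivals))  # ['f3', 'f2', 'f1']
print(drain_strict_priority({1: ["bulk1", "bulk2"], 0: ["storage1"]}))
# ['storage1', 'bulk1', 'bulk2']
```

A frame that arrives later but lands in a higher-priority queue leaves first, which is how a QoS policy changes both the latency and the throughput a given frame experiences.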

All QoS algorithms fall into one of two categories: queue management and queue scheduling. Queue management algorithms are responsible for managing the number of frames or packets in a queue. Generally speaking, a frame or packet is not subject to being dropped after being admitted to a queue. Thus, queue management algorithms primarily deal with queue admission policies. By contrast, queue scheduling algorithms are responsible for selecting the next frame or packet to be transmitted from a queue. Thus, queue scheduling algorithms primarily deal with bandwidth allocation.
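The two categories can be separated cleanly in code. The sketch below is illustrative only and assumes two specific algorithms chosen for simplicity: tail drop as the queue management (admission) policy and weighted round robin as the queue scheduling policy. The class and function names are hypothetical.

```python
from collections import deque


class TailDropQueue:
    """Queue management: decides only whether a frame is admitted.

    A frame is admitted if the queue has room; otherwise it is dropped
    at the tail. Once admitted, a frame is never dropped.
    """

    def __init__(self, limit):
        self.limit = limit
        self.frames = deque()
        self.dropped = 0

    def enqueue(self, frame):
        if len(self.frames) >= self.limit:
            self.dropped += 1
            return False
        self.frames.append(frame)
        return True


def weighted_round_robin(queues, weights, slots):
    """Queue scheduling: decides which queue transmits next.

    Each round, queue i may send up to weights[i] frames. `slots` is the
    number of transmission opportunities (available bandwidth).
    """
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be positive")
    sent = []
    while len(sent) < slots and any(q.frames for q in queues):
        for q, w in zip(queues, weights):
            for _ in range(w):
                if len(sent) == slots or not q.frames:
                    break
                sent.append(q.frames.popleft())
    return sent


gold, bronze = TailDropQueue(limit=4), TailDropQueue(limit=2)
for i in range(4):
    gold.enqueue(f"g{i}")
for i in range(3):
    bronze.enqueue(f"b{i}")  # b2 is refused at admission
print(bronze.dropped)  # 1
print(weighted_round_robin([gold, bronze], [2, 1], slots=5))
# ['g0', 'g1', 'b0', 'g2', 'g3']
```

Note that the drop decision happens only in `enqueue`; the scheduler never discards anything and only allocates transmission opportunities (bandwidth) among queues.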

End-to-end flow control is closely related to delivery acknowledgement. To understand this, consider the following scenario. Device A advertises 10 available buffers to device B. Device B then transmits 10 packets to device A, but all 10 packets are transparently dropped in the network. Device B cannot transmit any more packets until device A advertises that it has free buffers. However, device A does not know it needs to send another buffer advertisement to device B. The result is a deadlock condition preventing device B from transmitting additional frames or packets to device A. If the network notifies device B of the drops, device B can increment its transmit buffers for device A. However, notification of the drops constitutes negative acknowledgement. Device A could send a data packet to device B containing in the header an indication that 10 buffers are available in device A. Although this does not constitute an acknowledgement that the 10 packets transmitted by device B were received and processed by device A, it does provide an indication that device B may transmit additional packets to device A. If device B assumes that the first 10 packets were delivered to device A, the result is an unreliable delivery subsystem (similar to UDP/IP and FC Class 3). If device B does not assume anything, the deadlock condition persists. Other contingencies exist, and in all cases, either a deadlock condition or an unreliable delivery subsystem is the result. Because the goal of flow control is to avoid packet drops due to buffer overrun, little motivation exists for implementing end-to-end flow control on unreliable delivery subsystems. So, end-to-end flow control is usually implemented only on reliable delivery subsystems. Additionally, end-to-end flow-control signaling is often integrated with the delivery acknowledgement mechanism.
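The deadlock scenario can be reproduced with a toy credit model. This is a simplified sketch, not TCP or FC: the class and function names are hypothetical, and device A is assumed to return one credit for each packet it actually processes.

```python
class CreditSender:
    """Device B: may transmit only while it holds credits advertised by A."""

    def __init__(self, credits):
        self.credits = credits

    def try_send(self, network, receiver, packet):
        if self.credits == 0:
            return False  # stalled, waiting for a new buffer advertisement
        self.credits -= 1
        network(receiver, packet)
        return True


class CreditReceiver:
    """Device A: frees (and re-advertises) a buffer only after processing."""

    def __init__(self):
        self.received = []
        self.pending_advertisements = 0

    def deliver(self, packet):
        self.received.append(packet)
        self.pending_advertisements += 1


def lossy_network(receiver, packet):
    pass  # every packet is silently dropped; nobody is notified


def reliable_network(receiver, packet):
    receiver.deliver(packet)


def simulate(network, attempts=20, credits=10):
    b, a = CreditSender(credits), CreditReceiver()
    sent = 0
    for n in range(attempts):
        if b.try_send(network, a, f"pkt{n}"):
            sent += 1
        # A advertises freed buffers back to B (e.g., in a packet header).
        b.credits += a.pending_advertisements
        a.pending_advertisements = 0
    return sent, len(a.received)


print(simulate(reliable_network))  # (20, 20)
print(simulate(lossy_network))     # (10, 0): B is stuck with zero credits
```

With the lossy network, A never learns that it should re-advertise, and B never learns that its credits were consumed by drops, so transmission stops permanently after the first 10 packets.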

End-to-end flow control is also closely tied to frame/packet processing within the receiving node. When a node receives a frame or packet, the frame or packet consumes a receive buffer until the node processes the frame or packet or copies it to another buffer for subsequent processing. The receiving node cannot acknowledge receipt of the frame or packet until the frame or packet has been processed or copied to a different buffer because acknowledgement increases the transmitting node's transmit window (TCP) or EE_Credit counter (FC). In other words, frame/packet acknowledgement implies that the frame or packet being acknowledged has been processed. Thus, processing delays within the receiving node negatively affect throughput in the same manner as network latency. For the effect on throughput to be negated, receive buffer resources must increase within the receiving node as processing delay increases. Another potential impact is the unnecessary retransmission of frames or packets if the transmitter's retransmission timer expires before acknowledgement occurs.
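The claim that processing delay hurts throughput "in the same manner as network latency" follows from a simple window argument: at most `buffers` frames can be unacknowledged, and each acknowledgement arrives one round trip plus one processing delay after transmission. The sketch below assumes that idealized model (no drops, no link-rate limit); the function names and numbers are hypothetical, and times are integer nanoseconds to keep the arithmetic exact.

```python
NS_PER_S = 1_000_000_000


def window_limited_throughput(buffers, frame_bytes, rtt_ns, processing_ns):
    """Upper bound on throughput (bytes/s) when the receiver acknowledges,
    and so re-opens the window, only after processing each frame."""
    cycle_ns = rtt_ns + processing_ns
    return buffers * frame_bytes * NS_PER_S / cycle_ns


def buffers_needed(target_bytes_per_s, frame_bytes, rtt_ns, processing_ns):
    """Smallest receive-buffer count that sustains the target throughput."""
    bytes_in_flight = target_bytes_per_s * (rtt_ns + processing_ns)
    per_buffer = frame_bytes * NS_PER_S
    return -(-bytes_in_flight // per_buffer)  # integer ceiling division


# Hypothetical: 16 buffers of 2048 bytes, 100 microsecond round trip.
print(window_limited_throughput(16, 2048, 100_000, 0))        # 327680000.0
print(window_limited_throughput(16, 2048, 100_000, 100_000))  # 163840000.0
print(buffers_needed(327_680_000, 2048, 100_000, 100_000))    # 32
```

Adding 100 microseconds of processing halves the bound exactly as another 100 microseconds of RTT would, and doubling the receive buffers restores it.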

Both reactive and proactive flow-control mechanisms are sensitive to network latency. An increase in network latency potentially yields an increase in dropped frames when using reactive flow control. This is because congestion must occur before the receiver signals the transmitter to stop transmitting. While the pause signal is in flight, any frames or packets already in flight, and any additional frames or packets transmitted before reception of the pause signal, are at risk of overrunning the receiver's buffers. As network latency increases, the number of frames or packets at risk also increases. Proactive flow control precludes this scenario, but latency is still an issue. An increase in network latency yields an increase in buffer requirements or a decrease in throughput. Because all devices have finite memory resources, degraded throughput is inevitable if network latency continues to increase over time. Few devices support dynamic reallocation of memory to or from the receive buffer pool based on real-time fluctuations in network latency (called jitter), so the maximum expected RTT, including jitter, must be used to calculate the buffer requirements to sustain optimal throughput. More buffers increase equipment cost. So, more network latency and more jitter result in higher equipment cost if optimal throughput is to be sustained.
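The buffer calculation described above is essentially a bandwidth-delay product evaluated at the worst-case RTT. The sketch below assumes a credit-style proactive mechanism in which one buffer holds one full-size frame; the names, link rate, distance, and jitter figure are hypothetical. Roughly the same product also approximates how many frames are at risk under reactive flow control, since the transmitter keeps sending for about one RTT after congestion begins.

```python
NS_PER_S = 1_000_000_000


def buffers_required(link_bps, frame_bytes, base_rtt_ns, jitter_ns):
    """Receive buffers needed to keep the link busy, sized for max RTT."""
    max_rtt_ns = base_rtt_ns + jitter_ns
    bits_in_flight = link_bps * max_rtt_ns          # scaled by NS_PER_S
    bits_per_buffer = frame_bytes * 8 * NS_PER_S    # same scaling
    return -(-bits_in_flight // bits_per_buffer)    # integer ceiling division


def achievable_bps(link_bps, buffers, frame_bytes, rtt_ns):
    """Throughput when the buffer count, not the link, is the limit."""
    window_bps = buffers * frame_bytes * 8 * NS_PER_S // rtt_ns
    return min(link_bps, window_bps)


# Hypothetical: 10 Gbps link, 2048-byte frames, 100 km of fiber
# (about 1 ms round-trip propagation at 5 us/km), 200 us of jitter.
link = 10_000_000_000
print(buffers_required(link, 2048, 1_000_000, 0))        # 611
print(buffers_required(link, 2048, 1_000_000, 200_000))  # 733

# Sized without jitter, the device falls short of line rate when RTT peaks.
print(achievable_bps(link, 611, 2048, 1_200_000) < link)  # True
```

Each additional unit of latency or jitter translates directly into more buffers (memory) per port, which is the equipment-cost effect described above.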

Support for retransmission also increases equipment cost. Aside from the research and development (R&D) cost associated with the more advanced software, devices that support retransmission must buffer transmitted frames or packets until they are acknowledged by the receiving device. This is advantageous because it avoids reliance on ULPs to detect and retransmit dropped frames or packets. However, the transmit buffer either consumes memory resources that would otherwise be available to the receive buffer (thus affecting flow control and degrading throughput) or increases the total memory requirement of a device. The latter is often the design choice made by device vendors, which increases equipment cost.
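The memory cost of retransmission comes from having to keep a copy of every unacknowledged frame. The following is a minimal sketch of such a transmit buffer, assuming a fixed memory budget, a logical clock, and a single fixed retransmission timeout; the class name and parameters are hypothetical and do not describe any particular protocol stack.

```python
class RetransmitBuffer:
    """Hypothetical transmit-side buffer for a reliable delivery subsystem.

    Every transmitted frame is retained until acknowledged so it can be
    resent without help from a ULP. `capacity_bytes` models the memory a
    device sets aside for these copies.
    """

    def __init__(self, capacity_bytes, timeout):
        self.capacity_bytes = capacity_bytes
        self.timeout = timeout
        self.unacked = {}  # sequence number -> (payload, time last sent)
        self.used_bytes = 0

    def send(self, seq, payload, now):
        if self.used_bytes + len(payload) > self.capacity_bytes:
            return False  # no memory left to hold a copy, so cannot send yet
        self.unacked[seq] = (payload, now)
        self.used_bytes += len(payload)
        return True

    def ack(self, seq):
        entry = self.unacked.pop(seq, None)  # ignore duplicate acks
        if entry is not None:
            self.used_bytes -= len(entry[0])

    def due_for_retransmission(self, now):
        """Return frames whose timer expired and restart their timers."""
        due = []
        for seq, (payload, sent_at) in list(self.unacked.items()):
            if now - sent_at >= self.timeout:
                due.append(seq)
                self.unacked[seq] = (payload, now)
        return due


buf = RetransmitBuffer(capacity_bytes=4096, timeout=10)
print(buf.send(1, b"x" * 2048, now=0))       # True
print(buf.send(2, b"y" * 2048, now=1))       # True
print(buf.send(3, b"z" * 2048, now=2))       # False: buffer full
buf.ack(1)
print(buf.due_for_retransmission(now=11))    # [2]
print(buf.send(3, b"z" * 2048, now=11))      # True
```

If the budget in `capacity_bytes` were carved out of the receive buffer pool instead of added memory, the receiver would have fewer buffers to advertise, which is the trade-off described above.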

The factors that contribute to end-to-end latency include transmission delay, serialization delay, propagation delay, and processing delay. Transmission delay is the amount of time that a frame or packet must wait in a queue before being serialized onto a wire. QoS policies affect transmission delay. Serialization delay is the amount of time required to transmit a signal onto a wire. Frames or packets must be transmitted one bit at a time when using serial communication technologies. Thus, for a given frame or packet size, bandwidth determines serialization delay. Propagation delay is the time required for a bit to propagate from the transmitting port to the receiving port. Light propagates through optical fiber at approximately 5 microseconds per kilometer. Processing delay includes, but is not limited to, the time required to:

Classify a frame or a packet according to QoS policies
Copy a frame or a packet into the correct queue
Match the configured policies for security and routing against a frame or a packet and take the necessary actions
Encrypt or decrypt a frame or a packet
Compress or decompress a frame or a packet
Perform accounting functions such as updating port statistics
Verify that a frame or a packet has a valid CRC/checksum
Make a forwarding decision
Forward a frame or a packet from the ingress port to the egress port

The order of processing steps depends on the architecture of the network device and its configuration. Processing delay varies depending on the architecture of the network device and which steps are taken.
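The four delay components can be added up hop by hop. This sketch assumes that serialization delay is frame size divided by link bandwidth, that propagation delay is 5 microseconds per kilometer of fiber, and that transmission (queuing) and processing delays are simply given per hop; the `Hop` structure and all numbers are hypothetical. Integer nanoseconds are used, so serialization delay is truncated for link rates that do not divide evenly.

```python
from dataclasses import dataclass

NS_PER_S = 1_000_000_000
FIBER_NS_PER_KM = 5_000  # roughly 5 microseconds per kilometer of fiber


@dataclass
class Hop:
    link_bps: int       # bandwidth of the link for this hop
    distance_km: int    # fiber length of the link
    queue_ns: int       # transmission (queuing) delay at the egress port
    processing_ns: int  # classification, lookups, CRC check, forwarding, ...


def serialization_ns(frame_bytes, link_bps):
    return frame_bytes * 8 * NS_PER_S // link_bps


def propagation_ns(distance_km):
    return distance_km * FIBER_NS_PER_KM


def end_to_end_ns(frame_bytes, hops):
    """Sum of transmission, serialization, propagation, processing delay."""
    return sum(
        h.queue_ns
        + serialization_ns(frame_bytes, h.link_bps)
        + propagation_ns(h.distance_km)
        + h.processing_ns
        for h in hops
    )


hops = [
    Hop(link_bps=8_000_000_000, distance_km=10, queue_ns=0, processing_ns=2_000),
    Hop(link_bps=8_000_000_000, distance_km=40, queue_ns=3_000, processing_ns=2_000),
]
print(serialization_ns(2048, 8_000_000_000))  # 2048
print(end_to_end_ns(2048, hops))              # 261096
```

In this example propagation dominates (250 of roughly 261 microseconds), while serialization shrinks as bandwidth grows and queuing and processing depend on QoS policy, device architecture, and configuration.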