TCP Congestion Control and Flow Control: Sliding Windows

One of the main principles of congestion control is avoidance. TCP tries to detect signs of congestion before it happens and to reduce or increase the load on the network accordingly. The alternative, waiting for congestion and then reacting, is much worse: once a network saturates, it does so at an exponential rate and overall throughput drops enormously. It takes a long time for the queues to drain, and then all senders repeat the cycle. By taking a proactive congestion avoidance approach, the pipe is kept as full as possible without the danger of network saturation. The key is for the sender to understand the state of the network and the client and to control the amount of traffic injected into the system. Flow control is accomplished by the receiver sending back a window to the sender. The size of this window, called the receive window, tells the sender how much data to send. When the client is saturated, it often cannot send a receive window back to the sender in time to signal it to slow down. The sliding windows protocol is therefore designed to let the sender know, before reaching a meltdown, to slow transmission through a steadily decreasing window size. While these flow control windows are going back and forth, the rate at which ACKs return from the receiver provides additional information that caps the amount of data sent to the client; this is computed indirectly. The amount of data that is to be sent to the remote peer on a specific connection is controlled by two concurrent mechanisms:
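As a sketch of how these two concurrent mechanisms interact, the sender's usable window at any moment is the smaller of its own congestion window and the receive window advertised by the peer. The function name `effective_window` and the byte values below are illustrative assumptions, not from the text:

```python
def effective_window(cwnd: int, rwnd: int) -> int:
    """A sender may have at most min(cwnd, rwnd) unacknowledged
    bytes in flight: the congestion window reflects network state,
    the receive window reflects the peer's buffer state."""
    return min(cwnd, rwnd)

# As the receiver's buffers fill, it advertises a shrinking window,
# which throttles the sender regardless of the congestion window:
for rwnd in (65535, 32768, 8192, 0):
    print(rwnd, effective_window(cwnd=48000, rwnd=rwnd))
```

When the advertised window reaches zero, the sender must stop entirely until the receiver reopens the window.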
TCP Tuning for ACK Control

FIGURE 3-10 shows how senders and receivers control ACK waiting and generation. The general strategy is to avoid sending many small packets. Receivers try to buffer up a number of received packets before sending back an acknowledgment (ACK), which triggers the sender to send more packets. The hope is that the sender will likewise buffer up more packets and send them in one large chunk rather than many small ones. The problem with small chunks is that the efficiency ratio, or useful link utilization, is reduced. For example, a one-byte data packet requires 40 bytes of IP and TCP header information and 48 bytes of Ethernet header information. The ratio works out to 1/(88+1) = 1.1 percent utilization. When a 1500-byte packet is sent, however, the utilization can be 1500/(88+1500) = 94.5 percent. Now consider many flows on the same Ethernet segment: if all flows consist of small packets, the overall throughput is low. Hence, any effort to bias transmissions toward larger chunks without incurring excessive delays is beneficial; the delay caveat matters most for interactive traffic such as Telnet.

Figure 3-10. TCP Tuning for ACK Control

FIGURE 3-10 provides an overview of the various TCP parameters. For a complete description of the tunable parameters and recommended sizes, refer to your product documentation or the Solaris AnswerBooks at docs.sun.com. Two mechanisms are used by senders and receivers to control performance:
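The efficiency ratios above can be verified with a short computation. The helper `link_efficiency` is illustrative; the 88-byte overhead constant simply restates the 40 bytes of TCP/IP headers plus 48 bytes of Ethernet framing given in the text:

```python
def link_efficiency(payload: int, overhead: int = 88) -> float:
    """Fraction of transmitted bytes that are useful payload.
    overhead = 40 bytes TCP/IP headers + 48 bytes Ethernet framing."""
    return payload / (payload + overhead)

print(f"{link_efficiency(1):.1%}")     # one-byte payload -> 1.1%
print(f"{link_efficiency(1500):.1%}")  # full-size payload -> 94.5%
```

The two extremes show why coalescing small writes into full-size segments matters so much for link utilization.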
TCP Example Tuning Scenarios

The following sections describe example scenarios where TCP requires tuning, depending on the characteristics of the underlying physical media.

Tuning TCP for Optical Network WANs

Typically, WANs are high-speed, long-haul network segments. These segments introduce some interesting challenges because of their properties. FIGURE 3-11 shows how the traffic changes as a result of a longer, yet faster, link, comparing a normal LAN and an optical WAN. The line rate has increased, resulting in more packets per unit time, but the delay from the time a packet leaves the sender to the time it reaches the receiver has also increased. This has the strange effect that many more packets are now in flight.

Figure 3-11. Comparison between Normal LAN and WAN Packet Traffic

FIGURE 3-11 compares the number of packets in the pipe for a typical LAN of 10 Mbps over 100 meters with an RTT of 71 microseconds, which is what TCP was originally designed for, and an optical WAN spanning New York to San Francisco at 1 Gbps with an RTT of 100 milliseconds. The bandwidth-delay product represents the number of packets that are actually in the network and implies the amount of buffering the network must provide. It also gives some insight into the minimum window size, which we discussed earlier. The fact that the optical WAN has a very large bandwidth-delay product compared to a normal network requires tuning as follows:
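The bandwidth-delay products of the two links in FIGURE 3-11 can be computed directly from the figures quoted above. The helper `bdp_bytes` is an illustrative sketch:

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: the number of bytes that must be
    'in flight' (and buffered) to keep the pipe full."""
    return bandwidth_bps * rtt_s / 8

lan = bdp_bytes(10e6, 71e-6)   # ~89 bytes: less than one full packet
wan = bdp_bytes(1e9, 100e-3)   # 12.5 MB: thousands of full-size packets
print(f"LAN BDP: {lan:.0f} bytes")
print(f"WAN BDP: {wan/1e6:.1f} MB, ~{wan/1500:.0f} 1500-byte packets in flight")
```

The LAN pipe holds less than a single segment, while the optical WAN must keep roughly eight thousand full-size segments outstanding, which is why the default window sizes are hopelessly small for such a link.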
Both of these parameters must be manually increased according to the actual WAN characteristics. Delayed ACKs on the receiver side should also be minimized, because they slow the growth of the window size while the sender is trying to ramp up. Because the long RTT means samples arrive infrequently, RTT measurements are adjusted less often; hence interim additional RTT values should be computed. The tunable tcp_rtt_updates parameter is somewhat related: the TCP implementation knows when enough RTT values have been sampled, and then the value is cached. tcp_rtt_updates is on by default, but a value of 0 forces the value never to be cached, which is the same as never having enough samples for an accurate RTT estimate for this particular connection.
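The RTT estimation referred to here is conventionally done with an exponentially weighted moving average, as standardized in RFC 6298. This sketch uses the standard gain of 1/8 and is a generic illustration, not the Solaris implementation:

```python
def update_srtt(srtt: float, sample: float, alpha: float = 0.125) -> float:
    """Smoothed RTT update (RFC 6298 style): blend each new RTT
    sample into the running estimate with gain alpha = 1/8."""
    return (1 - alpha) * srtt + alpha * sample

# On a 100 ms path, a single 200 ms sample nudges the estimate only slightly,
# which is why infrequent samples on long-RTT links adapt slowly:
print(update_srtt(srtt=0.100, sample=0.200))
```

With samples arriving only once per 100 ms round trip, the estimator needs many round trips to converge, illustrating why long-RTT links update their RTT state so much less frequently.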
Figure 3-12. Tuning Required to Compensate for Optical WAN

Tuning TCP for Slow Links

Wireless and satellite networks share the problem of a higher bit error rate. One tuning strategy to compensate for the lengthy delays is to increase the send window, sending as much data as possible until the first ACK arrives. This way, the link is utilized as much as possible. FIGURE 3-13 shows how slow links and normal links differ. If the send window is small, there will be significant dead time between the moment the send window's packets go out over the link and the moment an ACK arrives, allowing the sender to retransmit or send the next window of packets from the send buffer. However, because of the increased error probability, if one byte is not acknowledged by the receiver, the entire buffer must be re-sent. Hence there is a trade-off: increasing the buffer increases throughput, but increasing it too much means that a single error degrades performance through retransmissions by more than was gained. This is where manual tuning comes in: you will need to try various settings based on an estimate of the link characteristics. One major improvement in TCP is the selective acknowledgment (SACK), with which only the data that was not received is retransmitted, not the entire buffer.

Figure 3-13. Comparison between Normal LAN and Long, Low-Bandwidth Pipe Packet Traffic

Another problem introduced by these slow links is that ACKs play a major role. If ACKs are not received by the sender in a timely manner, the growth of the windows is impacted. During initial slow start, and even slow start after an idle, the send window needs to grow exponentially, adjusting to the link speed as quickly as possible for coarse tuning. It then grows linearly, for finer-grained tuning, after reaching ssthresh. However, if an ACK is lost, which has a higher probability on these types of links, throughput is again degraded.
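The exponential-then-linear window growth described above can be sketched per RTT round. The function `window_growth`, the 1460-byte MSS, and the ssthresh value are illustrative assumptions, not values from the text:

```python
def window_growth(initial_cwnd: int, ssthresh: int, rounds: int,
                  mss: int = 1460) -> list[int]:
    """Idealized per-RTT congestion window, no losses: the window
    doubles each round while below ssthresh (slow start), then
    grows by one MSS per round (congestion avoidance)."""
    cwnd = initial_cwnd
    history = [cwnd]
    for _ in range(rounds):
        cwnd = cwnd * 2 if cwnd < ssthresh else cwnd + mss
        history.append(cwnd)
    return history

print(window_growth(initial_cwnd=1460, ssthresh=16060, rounds=8))
```

Each entry costs one RTT, so on a slow link a lost ACK during the exponential phase wastes a full, long round trip, which is the degradation the text describes.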
Tuning TCP for slow links includes the following parameters:
Adjust all timeouts to compensate for long-delay satellite transmissions and possibly longer-distance WANs.
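As a rough illustration of why timeouts need compensation on long-delay paths, the conventional retransmission timeout (per RFC 6298) is the smoothed RTT plus four times the RTT variance, floored at a minimum. The numeric values below are hypothetical:

```python
def rto(srtt: float, rttvar: float, rto_min: float = 0.4) -> float:
    """Retransmission timeout, RFC 6298 style: SRTT + 4 * RTTVAR,
    floored at a configurable minimum (seconds)."""
    return max(srtt + 4 * rttvar, rto_min)

print(rto(srtt=0.550, rttvar=0.050))   # geostationary satellite path
print(rto(srtt=0.001, rttvar=0.0005))  # LAN path: the floor dominates
```

On a satellite path the computed timeout is dominated by the large SRTT and variance; if the timeout floor or initial RTO is tuned for LAN delays, the sender will retransmit spuriously before the real ACK can possibly arrive.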