Congestion Avoidance


To configure your network to avoid becoming congested, you have these options at your disposal:

  • Weighted Random Early Detection

  • Policing and Shaping

  • QoS Policy Signaling

Configuring Weighted Random Early Detection

As discussed previously in Chapter 2, TCP senders use TCP slow start, TCP congestion avoidance, or both to slow transmission during periods of congestion. Without WRED enabled, packets drop from the tail of overloaded router queues, which works well if the congestion affects only a few senders. However, during tail-drop, vast numbers of senders may commence TCP slow start at the same time. Consequently, traffic temporarily slows to an extremely low level, and then all flows initiate slow start once again, creating an effect called global synchronization.

Note

WRED's congestion-avoidance effect applies only to TCP; it does not influence the sending rate of UDP-based traffic. However, real-time UDP applications benefit from WRED because it prevents the network congestion that bandwidth-hungry TCP-based applications would otherwise cause.


You can configure WRED to drop packets randomly, slightly before congestion occurs. Thus, individual TCP senders initiate TCP slow start or congestion avoidance (or both), rather than large groups of senders at once. A router triggers WRED when the average queue size reaches a minimum threshold, after which it drops certain packets according to a drop probability (see Figure 6-6). The drop probability is 10 percent by default, but you can adjust this value to suit your environment. The router randomly drops packets in this way until the average queue size falls below the minimum threshold or reaches the maximum queue size threshold. Note that traffic queued between the minimum and maximum WRED thresholds has an increasing chance of being dropped as the queue fills. When the average queue size reaches the maximum threshold, the router performs tail-drop.

Figure 6-6. Weighted Random Early Detection


When a packet arrives at an interface, the router calculates the average queue depth. For example, if you configure three CBWFQ queues that currently have queue lengths of 50, 75, and 100, the router calculates the average queue size as 75. If the average depth is less than the minimum threshold, the router queues the arriving packet. If the average is between the minimum and maximum thresholds, the router drops the packet with a probability based on the mark probability denominator. If the average queue size is greater than or equal to the maximum threshold, the router drops the packet.
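The queue-averaging and threshold logic above can be sketched in a few lines of Python (a simplified model; real WRED computes an exponentially weighted moving average and keeps per-precedence thresholds):

```python
def wred_action(queue_depths, min_th, max_th, mark_prob_denom=10):
    """Decide what WRED does with an arriving packet (simplified sketch)."""
    avg = sum(queue_depths) / len(queue_depths)   # e.g. (50 + 75 + 100) / 3 = 75
    if avg < min_th:
        return "enqueue", 0.0                     # below min threshold: queue it
    if avg >= max_th:
        return "tail-drop", 1.0                   # at or above max: tail-drop
    # between thresholds, the drop probability ramps linearly up to
    # 1/mark_prob_denom at the maximum threshold
    p = (avg - min_th) / (max_th - min_th) / mark_prob_denom
    return "random-drop", p
```

With queue depths of 50, 75, and 100 and a minimum threshold of 80, the average of 75 stays below the threshold, so the packet is simply queued.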

Enable WRED on an interface with the random-detect interface configuration command:

 random-detect {dscp-based | prec-based} 


To configure the thresholds and mark probability denominator for WRED, use the command:

 random-detect precedence precedence min-threshold max-threshold mark-prob-denominator 


When you enable WRED without parameters in the random-detect command, the router uses default values for the mark probability denominator and the minimum and maximum thresholds. The default mark probability denominator is 10 (at most one out of every 10 packets is dropped). The router calculates the default maximum threshold based on the queue capacity and the transmission speed of the interface. The default minimum threshold depends on the IP Precedence of the packet. The minimum threshold for IP Precedence 0 is half of the maximum threshold. The minimum thresholds for IP Precedence values 1 through 7 fall between half the maximum threshold and the maximum threshold at evenly spaced intervals, so higher-priority traffic is less likely to be dropped.
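One plausible reading of these defaults can be expressed numerically (a sketch only; the exact spacing is an assumption, and actual IOS defaults depend on platform and queue limit):

```python
def default_min_threshold(precedence, max_threshold):
    # Precedence 0 starts at half the maximum threshold; precedences 1-7
    # are assumed evenly spaced between that point and the maximum.
    return max_threshold / 2 + precedence * (max_threshold / 2) / 8
```

For a maximum threshold of 40 packets, this places IP Precedence 0 at 20 and IP Precedence 4 at 30, so higher precedences tolerate deeper queues before WRED begins dropping.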

You can manually change the WRED parameters to more accurately reflect the behavior of your queues. For example, if your network contains three classes of traffic with IP Precedence values of three, four, and five, you can enable WRED on your routers with the configuration in Example 6-7.

Example 6-7. Configuring WRED for Three IP Precedence Values

 interface FastEthernet1/0/0
  random-detect prec-based
  random-detect precedence 0 32 128 50
  random-detect precedence 3 64 192 100
  random-detect precedence 4 96 256 150
  random-detect precedence 5 128 320 200

Make sure that you monitor average interface queue sizes before you adjust the default values of WRED parameters. Otherwise, the router may drop packets too aggressively, causing global synchronization to occur.

Note

WRED is RSVP-aware, and it can provide the RSVP controlled-load integrated service, as discussed later in this chapter.


Instead of randomly dropping packets or tail-dropping packets, you can configure your routers to mark packets with the Explicit Congestion Notification (ECN) field. Refer to Chapter 2 for the values of the ECN field. You can configure ECN with WRED using the sample configuration in Example 6-8.

Example 6-8. Configuring WRED with ECN

 policy-map wredecn
  class class-default
   bandwidth percent 100
   random-detect
   random-detect ecn
 interface fastethernet0/10
  service-policy input wredecn

Because Example 6-8 defines only a single QoS policy, WRED and ECN are configured on the default class, class-default.

Now, when the average queue length reaches the minimum threshold and the mark probability denominator establishes that the packet should be dropped, the router first checks the ECN value in the packet. If the ECN value is binary 01 (0b01) or 0b10, the router knows that the sending and receiving hosts are capable of understanding ECN. Therefore, instead of dropping the packet, the router sets the ECN value to 0b11, to indicate to the hosts that the network is congested. If the value of the ECN field is already 0b11, the router forwards the packet unchanged. Senders slow down transmission when they receive packets with an ECN value of 0b11.
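The ECN decision the router makes when WRED elects a packet for dropping can be sketched as follows (field values as given in the text; this is an illustrative model, not IOS code):

```python
ECT0, ECT1, CE, NOT_ECT = 0b10, 0b01, 0b11, 0b00

def ecn_congestion_action(ecn_bits):
    """Action taken when WRED would otherwise drop the packet."""
    if ecn_bits in (ECT0, ECT1):      # both endpoints understand ECN
        return "forward", CE          # mark Congestion Experienced instead of dropping
    if ecn_bits == CE:                # congestion already signaled
        return "forward", CE          # forward the packet unchanged
    return "drop", ecn_bits           # not ECN-capable: drop as usual
```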

Understanding Policing and Shaping

You can apply traffic shaping or policing at the edge of your network to control the bandwidth available to your applications within the network. Figure 6-8 illustrates the difference between traffic shaping and policing for the bandwidth graph given in Figure 6-7.

Figure 6-7. A Typical Bandwidth Graph


Figure 6-8. Traffic Shaping Versus Traffic Policing


During periods of excessive traffic, routers that you configure with traffic shaping store bursts of packets exceeding the committed information rate (CIR) in their queues. When the traffic rate eventually falls below the CIR, the router transmits the queued packets. In contrast, traffic policing simply drops traffic exceeding the CIR.

Cisco traffic shaping and policing use a token bucket metaphor to transmit packets. With traffic shaping, when a router needs to send a packet, it first withdraws a number of tokens from the bucket representing the size of the packet. The token generator replenishes the tokens at a constant rate equal to the configured CIR. However, the bucket never holds more tokens than its capacity. A full bucket represents the largest available burst in a time interval T, referred to as the committed burst size Bc. For example, if your router sends a burst of packets equal in size to the bucket during one interval T, any further packets are either discarded (when policing is enabled) or held in the queuing system (when traffic shaping is enabled). Shaping holds the packet in the queue until the token generator replenishes the bucket with enough tokens to send it.

As Figure 6-9 illustrates, when the token bucket empties, policing sends only at the CIR, which is the rate at which the token generator replenishes tokens in the bucket. The router drops traffic exceeding the CIR. In contrast, shaping stores bursts in WFQ queues until the level of traffic is less than the token replenish rate, resulting in a smooth traffic curve.

Figure 6-9. The Token Bucket Mechanism


The traffic shaping token generator increments the token bucket periodically with Bc worth of tokens at every constant interval T. You should choose the normal burst (Bc) as half a second's to several seconds' worth of traffic at the configured CIR. Therefore, you may use the following equation to calculate Bc:

Bc = (CIR / 8 bits) * 0.5 seconds

Tip

Make sure that you monitor your effective traffic throughput when you enable shaping or policing, to ensure that you have an appropriate value for Bc. A value that is too low may cause your actual throughput to differ from the CIR you configure.


For example, using a CIR value of 400,000 bps would give you Bc = (400,000 / 8) * 0.5 = 25,000 bytes. Therefore, every 500 ms, the token generator credits the token bucket with 25,000 bytes' worth of tokens. Notice that you can derive the time interval T from the token credits by using T = Bc / CIR:

T = Bc / CIR = (25,000 bytes * 8 bits) / 400,000 bps = 200,000 bits / 400,000 bps = 0.5 seconds

Conversely, you can derive the token credits Bc for a given time interval T by using Bc = CIR * T. With an interval T of 500 ms, you can calculate Bc as follows:

Bc = CIR * T = 400,000 bps * 0.5 s = 200,000 bits = 25,000 bytes
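The Bc/T relationship can be checked numerically (a quick sketch; the byte-to-bit conversions are the only subtlety):

```python
def interval_from_burst(bc_bytes, cir_bps):
    # T = Bc / CIR, with Bc converted from bytes to bits
    return bc_bytes * 8 / cir_bps

def burst_from_interval(cir_bps, t_seconds):
    # Bc = CIR * T, converted back from bits to bytes
    return cir_bps * t_seconds / 8
```

For instance, a 25,000-byte bucket replenished at 400,000 bps corresponds to a 500-ms interval, and the two functions invert each other.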

With traffic policing, the token generator does not add tokens to the bucket at fixed intervals T. Instead, policing replenishes the token bucket continuously: the generator credits the bucket with tokens every time the router sends a packet, using the inter-packet arrival time (the time between packets) to calculate the number of new tokens. In the following example, the CIR is 100,000 bps and Bc is 25,000 bytes. Assuming that the bucket has been idle for some time, it is full with 25,000 bytes' worth of tokens. If a router sends its first packet and has another packet to send 0.05 seconds later, the number of tokens replenished in the bucket for the second packet is

Bucket credit = DT * CIR = 0.05 * 100,000 = 5000 bits = 625 bytes

Therefore, the token generator credits 625 bytes' worth of tokens when the router sends the second packet. For example, with a burst of one 1500-byte packet sent every 0.05 seconds for 0.75 seconds (that is, 15 packets in total), the first packet draws on the full bucket, and each of the 14 subsequent inter-packet gaps replenishes 625 bytes. The bucket is therefore left with 25,000 - 22,500 + (14 * 625) = 11,250 bytes' worth of tokens, even though the router has sent 22,500 bytes (1500 bytes * 15 packets).
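A short simulation of this continuous-replenishment policer (an illustrative model; it assumes the first packet draws from the full bucket and each later packet is credited for the 0.05-second gap before it):

```python
def police_burst(cir_bps=100_000, bucket_bytes=25_000,
                 pkt_bytes=1_500, gap_s=0.05, packets=15):
    tokens = bucket_bytes            # bucket is full after an idle period
    sent = 0
    for i in range(packets):
        if i > 0:                    # credit inter-packet arrival time * CIR
            tokens = min(bucket_bytes, tokens + gap_s * cir_bps / 8)
        if tokens >= pkt_bytes:      # enough tokens: send and withdraw
            tokens -= pkt_bytes
            sent += pkt_bytes
    return tokens, sent
```

Running it shows the router sends all 22,500 bytes, and the bucket still holds 11,250 bytes' worth of tokens afterward.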

Both traffic shaping and policing have extended capabilities that allow the router to borrow tokens when packets are available to send but the bucket is empty. Be is the excess burst size, the number of additional tokens that the router can borrow, so the maximum burst size becomes Bc + Be. This borrowing feature helps prevent queue tail-drop by applying congestion avoidance with WRED to traffic between Bc and Bc + Be. If the router sends no packets during an interval, the token bucket may accumulate up to Be worth of extra tokens, allowing a maximum of Bc + Be bytes to be transmitted at any given time.

To configure traffic shaping, you can use either Generic Traffic Shaping (GTS) or class-based traffic shaping. To configure traffic policing, you can use Committed Access Rate (CAR), class-based policing, or two-rate policing.

Configuring Generic Traffic Shaping

As mentioned previously, the difference between traffic shaping and policing is that, when the token bucket empties, shaping queues packets until tokens become available, whereas policing drops them.

You can use the following command to enable GTS on a router:

 traffic-shape rate bit-rate [burst-size [excess-burst-size]] 


For example, if employees in your organization need to use both real-time and file transfer applications over a T1 link (1.544 Mbps), you should ensure that the file transfers do not flood the T1. To do this, you can constrain file transfers to 70 percent of the T1 on average but allow 10-second bursts to take up to 95 percent of the T1. For your calculations, 70 percent of a T1 is 1,080,800 bps, and 95 percent is 1,466,800 bps.

To calculate Bc for traffic shaping, use Bc = CIR * 0.5 seconds = 1,080,800 * 0.5, as in the previous examples. Therefore, Bc is 540,400 bits per interval T. Additionally, bursting at 95 percent of the T1 generates 1,466,800 bps * 0.5 seconds = 733,400 bits of traffic per interval. To calculate Be, subtract Bc from 733,400 to get 193,000 bits. Use the interface configuration commands in Example 6-9 to achieve these requirements.

Example 6-9. Configuring Generic Traffic Shaping

 access-list 101 permit tcp any any eq ftp
 interface serial 1
  traffic-shape group 101 1080800 540400 193000
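The burst arithmetic behind Example 6-9 can be reproduced with integer math (a sketch; the 70/95 percent targets and the 0.5-second interval come from the text):

```python
T1_BPS = 1_544_000
cir = T1_BPS * 70 // 100        # average FTP rate: 1,080,800 bps
peak = T1_BPS * 95 // 100       # burst ceiling: 1,466,800 bps
bc = cir // 2                   # CIR * 0.5 s = 540,400 bits per interval
be = peak // 2 - bc             # 733,400 - 540,400 = 193,000 excess bits
```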

Configuring Class-Based Traffic Shaping

Class-based traffic shaping uses the MQC to configure traffic-shaping policies. You can apply shaping to the overall traffic leaving an interface, or you can shape traffic classified by IP address, MAC address, and IP Precedence or DSCP. Use the following command to enable traffic shaping:

 shape {average | peak} cir [bc] [be] 


To achieve the results from Example 6-9, use the configuration in Example 6-10.

Example 6-10. Configuring Class-Based Traffic Shaping

 access-list 101 permit tcp any any eq ftp
 class-map ftp-traffic
  match access-group 101
 policy-map shape-ftp
  class ftp-traffic
   shape average 1080800 540400 193000
 interface serial 1
  service-policy output shape-ftp

Configuring Committed Access Rate (CAR)

CAR is a traffic policer that can police either incoming or outgoing interface bandwidth. Using CAR, you can apply policing to the overall interface traffic, or you can police traffic based on IP address, MAC address, and IP Precedence.

If the traffic rate reaches the CIR (that is, the token bucket is empty), you can configure CAR to drop the packet or borrow more tokens and send the packet. Alternatively, before the router sends packets, you can configure CAR to mark its IP Precedence.

CAR maintains parameters for CIR, Bc, and Be, and you can apply actions to traffic conforming to and exceeding these values. Conform actions apply to bursts smaller than Bc. Exceed actions apply to bursts between Bc and Bc + Be.

To enable CAR on an interface, use the interface configuration command:

 rate-limit {input | output} [access-group [rate-limit]] [acl-index]  bps burst-normal burst-max conform-action action exceed-action action 


In this command, burst-normal is Bc and burst-max is Bc+ Be. For example, if your router has an OC3 interface (155 Mbps), and you want traffic less than 100 Mbps to be marked with IP Precedence 3, traffic between 100 Mbps and 120 Mbps to be marked as best-effort (IP Precedence 0), and traffic exceeding 120 Mbps to be dropped, use the CAR interface configuration command:

 rate-limit output 100000000 6250000 7500000 conform-action set-prec-transmit 3 exceed-action set-prec-transmit 0


Using the equation burst-normal = Bc = (CIR / 8 bits) * 0.5 seconds discussed previously, burst-normal = (100,000,000 / 8) * 0.5 = 6.25 MB. Additionally, in this example, burst-max = Bc * 1.20 = 7.5 MB, giving Be = burst-max - Bc = 1.25 MB, the excess burst required to achieve a peak information rate (PIR) of 120 Mbps.
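The same CAR arithmetic, checked with integer math (illustrative only; the values match the OC3 example above):

```python
cir = 100_000_000               # conform rate: 100 Mbps
bc = cir // 8 // 2              # (CIR / 8) * 0.5 s = 6,250,000 bytes
burst_max = bc * 120 // 100     # Bc * 1.20 = 7,500,000 bytes
be = burst_max - bc             # 1,250,000 bytes of excess burst
```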

Configuring Class-Based Policing

Class-based policing supersedes CAR and adds support for IP DSCP marking. Additionally, you can specify actions on traffic that exceeds Bc + Be with the new command parameter violate-action. In contrast, with CAR, the router simply drops traffic exceeding Bc + Be.

To support actions on traffic that exceeds Bc + Be, the router maintains two token buckets: a conform bucket and an exceed bucket. The number of tokens in the conform bucket is equal to Bc, and the number of tokens in the exceed bucket is Be.

To enable class-based policing, use the following command:

 police bps burst-normal [burst-max] conform-action action exceed-action   action [violate-action action] 


In this command, burst-normal is Bc, the size of the conform bucket. The size of the exceed bucket is Be = burst-max - Bc.

As discussed previously, when a packet arrives at a traffic policer, the generator replenishes the token bucket with DT * CIR worth of tokens. With two buckets, the conform bucket receives the replenished tokens, and the exceed bucket receives the tokens that overflow the conform bucket.

If you are using the bandwidth requirements from the previous example, but instead you want traffic less than 100 Mbps to be marked with IP DSCP 40, traffic between 100 Mbps and 120 Mbps to be marked as IP DSCP 24, and traffic exceeding 120 Mbps as IP DSCP 0, then use the MQC configuration in Example 6-11.

Example 6-11. Configuring Class-Based Policing for Three Levels of Bandwidth

 policy-map police-oc3
  class class-default
   police 100000000 6250000 7500000 conform-action set-dscp-transmit cs5 exceed-action set-dscp-transmit cs3 violate-action set-dscp-transmit default
 interface pos 0/0
  service-policy output police-oc3

When a packet arrives and the conform bucket is not empty, the conform action is applied to the packet. If the conform bucket is empty, the exceed bucket is checked for tokens. If the exceed bucket is not empty, the exceed action is applied to the packet. If the exceed bucket is empty, the violate action is applied to the packet.
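The two-bucket decision and replenishment logic described above can be sketched as follows (a simplified model of class-based policing, not IOS source):

```python
def police(pkt_bytes, conform_tokens, exceed_tokens):
    """Return the action plus the updated bucket levels."""
    if conform_tokens >= pkt_bytes:
        return "conform", conform_tokens - pkt_bytes, exceed_tokens
    if exceed_tokens >= pkt_bytes:
        return "exceed", conform_tokens, exceed_tokens - pkt_bytes
    return "violate", conform_tokens, exceed_tokens

def replenish(dt_s, cir_bps, conform_tokens, exceed_tokens, bc, be):
    """Credit DT * CIR worth of bytes; overflow spills into the exceed bucket."""
    credit = dt_s * cir_bps / 8
    overflow = max(0.0, conform_tokens + credit - bc)
    return min(bc, conform_tokens + credit), min(be, exceed_tokens + overflow)
```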

Configuring Two-Rate Policing

To improve on the class-based policing technology discussed previously, you can use two-rate policing, which also uses two token buckets but adds support for Frame Relay DE, ATM CLP, and MPLS marking. It also uses a more intuitive command, as follows:

 Router(config-pmap-c)# police {cir cir} [bc conform-burst] {pir pir} [be peak-burst]


With this command, you can explicitly configure the CIR, Bc, Be, and peak information rate (PIR). With class-based policing, by contrast, Be and the PIR are implied within burst-max, as discussed previously. To achieve the same results as Example 6-11 using two-rate policing, use the configuration in Example 6-12.

Example 6-12. Configuring Two-Rate Policing

 policy-map police-oc3
  class class-default
   police cir 100000000 bc 6250000 pir 120000000 be 1250000 conform-action set-dscp-transmit cs5 exceed-action set-dscp-transmit cs3 violate-action set-dscp-transmit default
 interface pos 0/0
  service-policy output police-oc3

QoS Policy Signaling

Besides configuring QoS policies directly on your routers and switches to avoid congestion, you can also configure your routers and hosts to signal QoS policies to one another automatically. You can configure QoS policy signaling using BGP QoS policy propagation or with RSVP.

BGP QoS Policy Propagation

The BGP QoS policy propagation feature enables you to classify packets by IP Precedence based on BGP autonomous system (AS) paths, BGP community lists, and access lists. A BGP router classifies the traffic and associates priority levels using IP Precedence to BGP updates and advertises the updates to remote routers. When remote routers receive the BGP routing updates, they can apply QoS policies, such as traffic policing, shaping, and WRED, to traffic that follows the advertised routes.

Resource Reservation Protocol (RSVP)

RSVP is the only mechanism for hosts to inform the network of end-to-end QoS requirements, such as bandwidth and latency guarantees, before sending the actual data flow. Hosts place reservations using special out-of-band RSVP signaling control packets. Routers perform the QoS features discussed in previous sections with in-band signaling. In-band QoS signaling occurs as traffic flows through the network.

File transfer applications seldom require reservation of bandwidth because their behavior is normally bursty and short-lived. On the other hand, real-time applications generate a constant flow of data and would benefit from a long-lived reservation of network resources. Otherwise, file transfer bursts would cause perceived delay and jitter for your real-time applications.

Using RSVP, you can provide bandwidth rate guarantees to applications such as H.323 videoconferencing. H.323 encodes data at a constant rate and therefore requires a minimum amount of bandwidth. Besides bandwidth, RSVP also provides delay guarantees to applications such as streaming media.

RSVP was designed specifically for multicast networks running over simplex UDP flows. The reservation is for one-way resources, in the direction flowing from sender to receiver. However, RSVP can reserve resources for any number of senders to any number of receivers.

Clients reserve network resources with either explicit or wildcard scopes. With an explicit reservation scope, receivers must specify senders in the reservation request, whereas with a wildcard reservation scope, receivers do not specify the senders. Additionally, RSVP defines the following styles of reservations to differentiate environments containing multiple senders:

  • Fixed Filter (FF) Receivers generate a single reservation per sender. This is useful when individual senders generate distinct flows, such as a video presentation with numerous video sources generating individual streams. The receiver identifies the senders explicitly by IP address within the reservation.

  • Wildcard Filter (WF) Receivers generate a single reservation for a group of senders. You should use WF when individual senders generate a cumulative shared flow, such as in an audio presentation, where only one speaker talks at a time. The receiver does not explicitly identify individual senders; instead, it identifies the group of senders using a wildcard (*) reservation scope.

  • Shared Explicit (SE) SE enables the receiver to identify the senders in a shared flow. With SE, you should first establish a multicast distribution tree within your network using IGMP and PIM. You can then use RSVP for reserving the necessary QoS along the path that the data will flow in the multicast tree, as determined by the underlying unicast or multicast routing protocol.

The IETF Integrated Services suggests two classes of RSVP service: controlled-load and guaranteed QoS. The controlled-load level of service closely approximates best-effort service under unloaded conditions. It does this by specifying traffic shaping (similar to the traffic shaping described previously) and minimal delay-handling capabilities. RSVP achieves minimal delay by borrowing tokens or simply increasing the token bucket size to a value such that it decreases the possibility of queuing. On the other hand, the guaranteed QoS class of service specifies better delay guarantees by setting target values for delay. The routers estimate the queuing delays before the reservation is accepted based on current queue sizes throughout the network. The routers use the estimates as maximum delay guarantees to the application. That is, the delay will never be worse than the computed end-to-end delay.

For example, Figure 6-10 shows how senders send RSVP PATH messages down the multicast tree. The initial PATH message carries the bandwidth the sender expects to generate in the "sender TSpec" object. The sender TSpec contains information about the traffic that the sender will generate, such as the token bucket CIR, Bc, and Be and the minimum and maximum packet sizes. The routers along the path modify these values if they cannot supply the bandwidth that the senders are capable of generating. Additionally, guaranteed QoS accumulates an estimate of the queuing and end-to-end delays at each hop that the PATH message takes toward the receivers, whereas controlled-load does not.

Figure 6-10. A Sender PATH Message Flowing Down Toward the Receivers


In Figure 6-10, the sender sends its TSpec object down the multicast tree toward the receivers. The intermediary routers make necessary modifications of the sender's requirements, depending on the bandwidth and QoS availability at each hop, while forwarding the PATH message down the tree. The receivers take the accumulated information from the PATH message into consideration while determining what QoS to request from the network. As illustrated in Figure 6-10, modification of the CIR is necessary to accommodate the bandwidth bottlenecks in the network. This is assuming that the flow is able to use the full bandwidth of the link. In Cisco IOS, RSVP enables you to configure the overall bandwidth for RSVP and the rate per application flow, based on the bandwidth of the interface. To enable RSVP on an interface, use the interface configuration command:

 ip rsvp bandwidth [interface-kbps] [single-flow-kbps] 


During the PATH message's traversal down the tree, each router stores state information that it uses to route the returning RESV messages from the receivers along the reverse of the path the PATH messages take. The senders and receivers send periodic PATH and RESV messages to refresh the soft-state tables. This enables routing changes to occur without affecting the reservation.

Figure 6-11 shows how, after the receivers process the PATH message, they send a RESV message to request network resources. The routers merge the RESV messages as they progress toward the senders. The RESV message contains a "receiver FlowSpec" object that includes a receiver TSpec and an RSpec. The receiver uses the RSpec to request the guaranteed QoS class of service. The TSpec includes the requested token bucket size, based on information received in the PATH request. The RSpec contains the delay value, obtained from the PATH message that the routers will use to deploy queuing mechanisms along the path. For either controlled-load or guaranteed QoS reservations, you can use PQ/WFQ and WRED. PQ/WFQ is beneficial for guaranteed QoS reservations because strict priority queuing guarantees maximum levels of delay. Recall that you can allocate the number of WFQ queues for RSVP with the reservable-queues parameter of the fair-queue interface configuration command.

Figure 6-11. A Receiver RESV Message Flowing Up Toward the Sender


Each router determines whether it has the available resources to allocate for the request. If it does, it forwards the request up the tree, further accumulating the end-to-end reservation. Otherwise, the router sends a RESV ERROR message to the receiver(s). The result is essentially a copy of the largest, or most demanding, request made by the downstream receivers arriving at the individual senders. To simplify the illustration, you can consider the merged QoS as the larger of a comparison between absolute numbers. Figure 6-11 illustrates how merging RESV requests results in the senders receiving the QoS pseudo-value of 200.
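Treating each request as the single comparable pseudo-value the text describes, the merge rule at every hop is simply the maximum of the downstream requests (a deliberately minimal sketch of same-style RESV merging):

```python
def merge_resv(downstream_requests):
    # forward the largest (most demanding) downstream request upstream
    return max(downstream_requests)
```

Merging requests of 100 and 200 yields 200, the pseudo-value arriving at the sender in Figure 6-11.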

Note

RSVP does not permit merging reservations of different styles.


To mark packets that conform to or exceed requested bandwidth resources, you can use the command:

 ip rsvp precedence {conform precedence-value | exceed precedence-value} 


Keep in mind the following key items while evaluating RSVP for your network:

  • RSVP works with unicast reservations, but it is not as scalable as when you use it for multicast reservations.

  • RSVP interoperates with routers that do not support RSVP, because most core networks do not run RSVP services. For example, a reservation would terminate at the MPLS edge, take advantage of MPLS traffic engineering in the core, and resume at the other end of the MPLS tunnel.

  • The drawback to RSVP is the time it takes to set up the end-to-end reservation.

  • RSVP does not work well with LANs. You should use Subnetwork Bandwidth Manager for LANs instead.

Answer these questions before implementing RSVP in your network:

1. How much bandwidth should you allow per end-user application flow?

2. How much overall bandwidth should you allocate for RSVP flows?

3. How many WFQ queues should you allocate to RSVP?

4. How much bandwidth should you exclude from RSVP to allow for low-volume data conversations?

5. Do you have any multiaccess LANs in your network?



Content Networking Fundamentals
ISBN: 1587052407