You can implement several different queueing algorithms on Cisco routers. The most common type is Weighted Fair Queueing (WFQ), which is enabled by default on low-speed interfaces. There is also a class-based version of WFQ called Class-based Weighted Fair Queueing (CBWFQ). These algorithms have the advantage of being fast, reliable, and easy to implement. However, in some cases you might want to consider some of the other queueing systems available on Cisco routers.
Priority Queueing lets you specify absolute prioritization in your network so that more important packets always precede less important ones. This can be useful, but it is often dangerous in practice.
The other important queueing algorithm on Cisco routers is Custom Queueing, which allows you direct control over many of the queueing parameters.
Weighted Fair Queueing
A flow is loosely defined as the stream of packets associated with a single session of a single application. The common IP implementations of Fair Queueing (FQ) and WFQ assume that two packets are part of the same flow if they have the same source and destination IP addresses, the same source and destination TCP or UDP port numbers, and the same IP protocol field value. The algorithms combine these five values into a hash number, and sort the queued packets by this hash number.
The router then assigns sequence numbers to the queued packets. In the FQ algorithm, this process of sequencing the packets is optimized so that each flow gets a roughly equal share of the available bandwidth. As it receives each packet, the router assigns a sequence number based on the length of this packet, and the total number of bytes associated with this same flow that are already in the queue.
This has a similar effect to a flow-based Round Robin (RR) queueing algorithm, in which all of the flows are assigned to different queues. These queues are then processed a certain number of bytes at a time until enough bytes have accumulated for a given queue to send a whole packet. Although this is a useful way of picturing the algorithm mentally, it is important to remember that the Cisco implementations of FQ and WFQ do not actually work this way. They keep all of the packets in a single queue. So, if there is a serious congestion problem, you will still get global tail drops.
This distinction is largely irrelevant for FQ, but for WFQ it's quite important. WFQ introduces another factor besides flow and packet size into the sequence numbers. The new factor is the weight, (W), which is calculated from the IP Precedence (P) value as follows. For IOS levels after 12.0(5)T, the formula is:
W = 32768/(P+1)
For all earlier IOS levels, the weight is lower by a factor of 4,096:
W = 4096/(P+1)
Cisco increased the value to allow for finer control over weighting granularity.
The weight number for each packet is multiplied by the length of the packet when calculating sequence numbers. The result is that the router gives flows with higher IP Precedence values a larger share of the bandwidth than those with lower precedence. In fact, it is easy to calculate the relative scaling of the bandwidth shares of flows with different precedence values.
Table B-4 shows, for example, that a flow with Flash Override Precedence will get five times the bandwidth of a packet with Routine Precedence. However, if all of the flows have the same Precedence, then WFQ behaves exactly the same as FQ.
|Precedence name||Value||Relative share of bandwidth|
These fair queueing algorithms tend to do three things. First, they prevent individual flows from interfering with one another. Second, they tend to reduce queueing latency for applications with smaller packets. And third, they ensure that all of the packets from a given flow are delivered in the same order that they were sent.
In practice, of course, a router has limited memory resources, so there is a limit to how many flows it can handle. If the number of flows is too large or if the volume of traffic is too high, the router will start to have trouble with the computation. So these algorithms tend to be best on low-speed interfaces. WFQ is enabled by default on all interfaces with bandwidth of E1 (roughly two Mbps) or less. The only exceptions are interfaces that use SDLC or LAPB link layer protocols, which require FIFO queuing.
Cisco provides several mechanisms to improve the bandwidth scaling of queueing algorithms. The first is Distributed Weighted Fair Queueing (DWFQ), which is only available in routers that have Versatile Interface Processor (VIP) cards, such as 7500 series routers, or the older 7000 series with RSP7000 processors. DWFQ is essentially the same as WFQ, except that the router is able to distribute the queueing calculations to the various VIP modules. But there is another important difference. DWFQ uses a different sorting algorithm called Calendar Queueing, which uses much more memory, but operates much faster. This tradeoff means that you can use DWFQ on a VIP2-50 card containing Port Adapters Modules (PAM) with an aggregate line speed of up to OC-3. In fact, if the aggregate line speed is greater than a DS-3 (45 Mbps), we don't recommend using DWFQ on anything slower than a VIP2-50. Cisco claims that DWFQ can operate at up to OC-3 speeds. However, if you need to support several interfaces that aggregate to OC-3 speeds on one VIP module, you may want to consider a different queueing strategy, particularly CBWFQ.
The next popular queueing strategy on Cisco routers, particularly for higher speed interfaces, is Class-Based Weighted Fair Queueing (CBWFQ). CBWFQ is similar to WFQ, except that it doesn't group traffic by flows. Instead, it groups by traffic classes. A class is simply some logical grouping of traffic. It could be based on IP Precedence values, or source addresses, input interface, or a variety of other locally useful rules that you can specify on the router.
The principle advantage to CBWFQ is that it allows you to expand the functionality of WFQ to higher speeds by eliminating the need to keep track of a large number of flows. But there is another important advantage to CBWFQ. The most common and sensible way to use CBWFQ is to assign the classes according to precedence or DSCP values. You can then manually adjust the weighting factors for the different classes. As you can see in Table B-4, the standard WFQ weighting factors give traffic with an IP Precedence value of 1 twice as much bandwidth as Precedence 0 traffic. However, Precedence 7 traffic gets just under 17 percent more bandwidth than Precedence 6 traffic. For many applications, these arbitrary weighting factors are not appropriate. So the ability to adjust these weighting factors can come in handy if you need to give your highest priority traffic a larger share of the bandwidth.
Priority Queueing (PQ) is an older queueing algorithm that handles traffic with different precedence levels much more pragmatically. The Cisco implementation of Priority Queueing uses four distinct queues called "high priority," "medium priority," "normal priority," and "low priority." The PQ algorithm maintains an extremely strict concept of priority. If there are any packets in a higher priority queue, they must be sent first before any packets in the lower priority queues are sent.
Some types of critical real-time applications that absolutely cannot wait for low priority traffic work well with PQ. However, there is an obvious problem with this strategy as well. If the volume of traffic in the higher priority queues is greater than the link capacity, then no traffic from the lower priority queues will be forwarded. PQ starves low-priority applications in these cases.
So a pure PQ implementation requires that you have an extremely good understanding of your traffic patterns. The high-priority traffic must represent a small fraction of the total, with the lowest priorities having the largest net volume. Further, you must have enough link capacity that the PQ algorithm is only used during peak bursts. If there is routine link congestion, PQ will give extremely poor overall performance.
However, Cisco has recently implemented a new hybrid queue type, called Low Latency Queueing (LLQ), which you can use with CBWFQ to give the best features of PQ while avoiding the queue starvation problem. The idea is simply to use CBWFQ for all of the traffic except for a small number sub-queues that are reserved strictly for relatively low-volume real-time applications. The router services the real-time queues using a strict priority scheme, and the others using CBWFQ. So, if there is a packet in one of the real-time queues, the router will transmit it before looking in one of the other queues. However, when there is nothing in the priority queues, the router will use normal CBWFQ for everything else.
LLQ also includes the stipulation that, if the volume of high priority traffic exceeds a specified rate, the router will stop giving it absolute priority. The guarantees that LLQ will never starve the low priority queues.
This model is best suited to applications like voice or video in which the real-time data comes in a fairly continuous, but low-bandwidth, stream of small packets, as opposed to more bursty applications such as file transfers.
Custom Queueing (CQ) is one of Cisco's most popular queueing strategies. CQ was originally implemented to address the clear shortcomings of PQ. It lets you configure how many queues are to be used, what applications will use which queues, and how the queues will be serviced. Where PQ has only four queues, CQ allows you to use up to 16. And, perhaps most importantly, it includes a separate system queue so that user application data cannot starve critical network control traffic.
CQ is implemented as a round-robin queueing algorithm. The router takes a certain predetermined amount of data from each queue on each pass. You configure this as a number of bytes. This allows you to specify approximately how much of the bandwidth each queue will receive. For example, if you have four queues, all set to the same number of bytes per pass, you will expect to send roughly equal amounts of data for all of these applications. Since the queues are only used when the network link is congested, this means that each of the four applications will receive roughly one quarter of the available bandwidth.
However, it is important to remember that the router will always take data one packet at a time. So if, for example, you have a series of 1500 byte packets sitting in a particular queue, and you have configured the router to take 100 bytes from this queue on each pass, it will actually transmit one entire packet each time, and not one every 15 times. This is important because it can mean that your calculations of the relative amounts of bandwidth allocated to each queue might be different from what the router actually sends.
This difference tends to disappear as you increase the number of bytes taken each time the queues are serviced. But you don't want to let the number get too large, or you will cause unnecessary latency and jitter problems for your applications. For example, if the byte count for each of your four queues is 10,000 bytes, and all of the queues are full, the router will send 10,000 bytes from the first queue, then 10,000 bytes from the second queue, and so on. From the time it finished servicing the first queue until the time that it returns to service it again, it will have sent 30,000 bytes. It takes roughly 160 ms to send this much data through a T1 link. But the gap between the previous two packets in this queue was effectively zero. Variations in latency like this are called jitter, and they can cause serious problems for many real-time applications.
So, as with all of the other queueing algorithms we have discussed, Custom Queueing has some important advantages and disadvantages. Chapter 11 contains recipes to implement all of the queueing varieties we have discussed. You need to select the one that matches your network requirements best. None of them is perfect in all situations.