QoS Implementation

Implementing QoS covers these categories: classifying and marking traffic, choosing a queuing method, conditioning traffic (shaping and policing), and using bandwidth efficiently. The following sections examine each of these categories and how these components are implemented in a network.

Classification and Marking of Traffic

Classification is the process of grouping traffic based on its QoS needs. Two methods are used to break up traffic into classes: access control lists (ACLs) and network-based application recognition (NBAR). Marking is then used to tag the packet and frame with the traffic group's classification value.

You typically want to classify and mark traffic as close to the source as possible, which optimally means performing this process as soon as the traffic enters the network at the access layer. For server farms attached at the distribution layer, classification and marking occur there for traffic sourced from those devices. This boundary is referred to as a trust boundary. Trusted devices are devices that understand and implement QoS; untrusted devices don't. All of the routers and switches between sources and destinations that need QoS should implement QoS solutions and are therefore considered trusted devices.

Classification Methods

With IP traffic, the ToS field is used to classify traffic. The ToS field enables you to assign traffic to one of six classes (0 through 5). Non-IP traffic is more difficult to classify because non-IP Layer 3 protocols typically lack an equivalent of the IP header's ToS (Type of Service) field. Therefore, non-IP traffic is typically classified based on the source and/or destination port number used at the transport layer, unless a CoS (Class of Service) value exists in an 802.1Q/P frame. In that situation, the CoS value is used if it is trusted, which assumes that a trusted device, such as one of your switches or routers configured with QoS, has marked the frame. Table 9.2 lists the classification methods that QoS can use as well as how traffic is sorted into different classes within the IOS.

Table 9.2. Classification Methods and Classifying Traffic

(Each entry lists the classification method, followed by how traffic is selected into classes.)

  • Policy-Based Routing (PBR): Route maps

  • Priority and custom queuing: ACLs, ingress interface, Layer 3 protocol, and/or size of the packet

  • Committed Access Rate (CAR) and class-based policing: ACLs, DSCP, QoS group, and rate-limit ACLs

  • All methods, including the others in this table: Class maps (which can match on ACLs, NBAR, ingress interface, source/destination MAC addresses, QoS group, MPLS information, DSCP, and the IP ToS field)

PBR enables you to route packets based on more than just the destination address in the packet. For instance, if a packet is going to a certain destination and is coming from a particular source address or network, you might want to route it across a different path than what is currently in the routing table. ACLs and route maps are typically used to perform the matching. PBR is beyond the scope of this book, but it is covered in Cisco's BSCI course.
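
As a brief illustration, here is a minimal PBR sketch; the ACL number, addresses, and interface name are hypothetical, and exact syntax varies by IOS release. Traffic from the 10.1.1.0/24 network is matched by an ACL inside a route map and sent to an alternate next hop:

    ! classify the source network to be policy-routed
    access-list 101 permit ip 10.1.1.0 0.0.0.255 any
    !
    route-map ALTERNATE-PATH permit 10
     match ip address 101
     set ip next-hop 192.168.2.2
    !
    ! apply the policy to the ingress interface
    interface FastEthernet0/0
     ip policy route-map ALTERNATE-PATH

Packets that don't match the route map are forwarded normally, using the routing table.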

Priority and custom queuing are used to queue and service traffic on egress ports based on a configured prioritization. CAR and class-based policing limit how traffic is transmitted to ensure that it operates within expected conditions, such as using no more than an expected amount of bandwidth. These methods are discussed later in this chapter.

Marking Options

After traffic is classified by a trusted device, it must be marked so that other trusted devices can implement the appropriate QoS policy. Marking can occur at Layer 2, Layer 3, or both.

At Layer 2, the CoS field is carried in the tag that IEEE 802.1Q adds to a frame; the priority values in this field are defined by IEEE 802.1p. The tag contains the 3-bit priority field (CoS) as well as a VLAN ID. CoS supports the eight priority values (0 through 7) displayed in Table 9.3.

Table 9.3. 802.1P CoS Priorities

(Each entry lists the priority value, followed by its meaning.)

  • 0: Best-effort delivery

  • 1: Medium priority

  • 2: High priority

  • 3: Call signaling information

  • 4: Video conferencing

  • 5: Voice channel

  • 6 and 7: Reserved

A Layer 3 device typically uses ACLs to place traffic into the appropriate traffic class. Assuming that the Layer 3 protocol is IP, marking in an IP packet is done in the ToS field, which is 1 byte in length. There are two methods of marking: IP precedence, the original IPv4 method, and DiffServ. Both use the ToS field. With IP precedence, the three high-order bits mark the traffic's class and the remaining five bits are not used. In DiffServ, the six high-order bits contain the DSCP class value and the lower two bits are used for explicit congestion notification (ECN).
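
As a hedged sketch of Layer 3 marking with the modular QoS CLI (the class name, ACL number, and interface are hypothetical, and command availability depends on the IOS release), the following marks matched traffic with IP precedence 5; a DiffServ marking could be applied instead, as noted in the comment:

    class-map match-all VOICE-TRAFFIC
     match access-group 102
    !
    policy-map MARK-EDGE
     class VOICE-TRAFFIC
      ! mark with IP precedence 5; "set ip dscp ef" would apply a DiffServ marking instead
      set ip precedence 5
    !
    ! mark traffic as it enters the trust boundary
    interface FastEthernet0/1
     service-policy input MARK-EDGE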

Exam Alert: 802.1Q/P is used to mark Layer 2 frames with CoS information. The IP ToS field is used to carry QoS information in IP packets; this can be accomplished by using IP precedence or DiffServ.


Managing Congestion with Queuing

After traffic is classified and marked, trusted devices can use the information to queue information appropriately. There are three components to queuing: classification, insertion, and service (scheduling).

When a packet is received on an interface, the first step is to classify the traffic, if it is not already associated with a class. After the traffic is classified, it must be queued, based on its classification, on the egress interface. Once placed in the queue, the queue must be serviced: the packet is removed from the queue, encapsulated in a frame, possibly marked, and then sent out the interface. If packets must be dropped because the queue is full, tail dropping is used: newly arriving packets at the tail of the full queue are discarded.

Many different queuing solutions are available in the IOS. Each differs in how traffic is classified, how traffic is inserted into the queue, and how the queue is serviced. Table 9.4 displays some of the queuing methods available and the DiffServ method they are grouped under.

Table 9.4. Queuing Methods

(Each entry lists the queuing method, followed by its DiffServ grouping: assured forwarding (AF) or expedited forwarding (EF).)

  • Class-based weighted fair queuing (CB-WFQ): AF

  • Weighted random early detection (WRED): AF

  • Custom queuing (CQ): AF

  • Class-based low latency queuing (CB-LLQ): EF

  • IP RTP prioritization: EF

  • Priority queuing (PQ): EF

The following sections cover different types of queuing, including the ones just mentioned.

FIFO Queuing

FIFO (first-in, first-out) queuing doesn't provide any type of QoS: the first packet or frame received is the first one sent. Traffic is not associated with any class; instead, priority is determined by when the packet arrives on an interface. The default queuing method on Cisco Catalyst switches is FIFO queuing, which is performed in hardware.

Cisco also supports a software-based version of FIFO queuing, which breaks up RAM into four queues, each serviced with best-effort delivery. The queues are processed in a weighted round-robin (WRR) fashion, which enables you to implement a very basic form of QoS and give general preference to one queue over another.

Priority Queuing

Priority queuing (PQ) also has four queues. However, each queue has a distinct priority: high, medium, normal, or low. Strict priority is enforced in this scheme. First, the high queue is emptied. When the high queue is empty, the IOS checks whether any new packets have arrived in it; if they have, the high queue is processed again. The medium queue is processed only when the IOS checks the high queue and finds it empty. Both the high and medium queues must be empty for the normal queue to be processed, and the high, medium, and normal queues must all be empty before the low queue is processed.

Therefore, given this priority scheme, there is a chance that the lower-end queues might never be processed. It is therefore very important to know what traffic is placed into what queues. This is typically done based on the protocol of the packet, the ingress interface, the size of the packet, and ACLs.
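
A minimal priority queuing sketch follows; the protocol selections and interface are illustrative, and syntax may vary by IOS release. Telnet is placed in the high queue, other IP traffic in the normal queue, and everything else defaults to the low queue:

    priority-list 1 protocol ip high tcp telnet
    priority-list 1 protocol ip normal
    priority-list 1 default low
    !
    interface Serial0/0
     priority-group 1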

Note: The one advantage of priority queuing is that any traffic classified and placed in the high queue is always guaranteed to be serviced.


Custom Queuing

Unlike priority queuing, custom queuing (CQ) has 16 queues. The same classification techniques used in PQ are used to place packets into one of the 16 queues in CQ. The main difference between PQ and CQ is that PQ guarantees only that the high queue will be processed; CQ guarantees that every queue will be processed. In CQ, queues are processed in a round-robin fashion. To give preference to one queue over another, you specify the amount of traffic that is allowed to be processed from a given queue.

As an example, if you want to give preference to queue 1 over queue 2, you can allow queue 1 to process twice as much information as queue 2 when the IOS services the queues. Because CQ processes all queues, no one type of traffic is ever starved for bandwidth. The main problem with CQ and PQ is that they cannot adjust to changing network conditions: how traffic is placed into queues and how much traffic is processed from each queue is hard-coded in the configuration.
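
Continuing that example, here is a hedged CQ sketch (byte counts, protocol matches, and the interface are illustrative): queue 1 is allowed to transmit twice the byte count of queue 2 during each round-robin pass:

    ! place FTP traffic in queue 1; everything else defaults to queue 2
    queue-list 1 protocol ip 1 tcp ftp
    queue-list 1 default 2
    ! service up to 3,000 bytes from queue 1 and 1,500 bytes from queue 2 per cycle
    queue-list 1 queue 1 byte-count 3000
    queue-list 1 queue 2 byte-count 1500
    !
    interface Serial0/0
     custom-queue-list 1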

Weighted Fair Queuing

Weighted fair queuing (WFQ) examines traffic flows to determine how queuing occurs. A flow, which Cisco calls a conversation, is basically a single connection. The IOS examines the Layer 3 protocol type (IP, ICMP, OSPF, and so on), the source and destination addresses, and the source and destination port numbers to determine how data should be classified. Based on this information, the traffic is classified as either high or low priority.

Traffic from well-known voice and video applications, as well as interactive applications such as Telnet, is typically given higher priority. Traffic such as file transfers (FTP) and Web connections (HTTP) is given lower priority. Within the higher-priority traffic, different traffic flows are processed in a round-robin manner. The same is true of the lower-priority traffic: flows of the same priority are treated equally. WFQ is the default queuing method on IOS routers with E1 or slower WAN links.
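
Because WFQ is enabled by default on E1-and-slower serial interfaces, it usually requires no configuration; enabling it manually is a single interface command. In this sketch, the optional argument sets the congestive discard threshold (discussed later in this chapter) and is illustrative:

    interface Serial0/0
     ! enable WFQ with a congestive discard threshold of 64 messages
     fair-queue 64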

Class-Based Weighted Fair Queuing

Class-based weighted fair queuing (CB-WFQ) is an extension of WFQ. In WFQ, the IOS automatically determines what goes into the higher and lower queue structures: You have no control over this process. With CB-WFQ, you can configure up to 64 classes and control which traffic is placed in which class. You can also restrict a class to a certain amount of bandwidth on the egress interface. CB-WFQ gives you much more prioritization control on queuing on the egress interface, but requires configuration on your part. The one nice feature of WFQ is that it doesn't require any configuration on E1 or slower WAN link connections because it is already enabled and the IOS performs the prioritization for you automatically.
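
A hedged CB-WFQ sketch using the modular QoS CLI (the class name, ACL number, bandwidth value, and interface are hypothetical): one user-defined class is guaranteed 256Kbps on the egress interface, and all remaining traffic falls into the default class, which uses fair queuing:

    class-map match-all BULK-DATA
     match access-group 110
    !
    policy-map WAN-EDGE
     class BULK-DATA
      ! guarantee this class 256Kbps during congestion
      bandwidth 256
     class class-default
      fair-queue
    !
    interface Serial0/0
     service-policy output WAN-EDGE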

Low Latency Queuing

Low latency queuing (LLQ) combines two forms of queuing: PQ and CB-WFQ. LLQ first checks whether the classification of the egress traffic is high priority; you can reserve either a percentage of bandwidth or a block of bandwidth for the high-priority queue. If the traffic is high priority, it is processed first. Otherwise, CB-WFQ is used to process the traffic. One advantage that LLQ has over WFQ or CB-WFQ is that you specify which traffic is classified as high priority, and that traffic is always given preference over other types of traffic, ensuring that its configured bandwidth allocation is met.
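
LLQ is configured like CB-WFQ, except that the high-priority class uses the priority command rather than the bandwidth command. The names and values in this sketch are illustrative:

    class-map match-all VOICE
     match ip dscp ef
    !
    policy-map LLQ-POLICY
     class VOICE
      ! creates the strict-priority queue, policed to 128Kbps during congestion
      priority 128
     class class-default
      fair-queue
    !
    interface Serial0/0
     service-policy output LLQ-POLICY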

Exam Alert: LLQ uses a combination of PQ and CB-WFQ. The PQ has the highest priority and is processed first. All other traffic is processed using CB-WFQ.


Real-Time Transport Protocol Priority Queuing

RTP, the Real-Time Transport Protocol, is used to provide transport services for voice and video information. Cisco supports a queuing method called RTP priority queuing (RTP-PQ), which provides a strict prioritization scheme for delay-sensitive traffic. Delay-sensitive traffic is given higher prioritization and is processed before other queues. This queuing scheme is normally used on WAN connections.

In RTP-PQ, there are four queues, just as in PQ. The first queue, voice, has the highest priority and is always processed first. The IOS looks at UDP port numbers to determine whether traffic should be placed in this queue.

Data is typically placed in the other three queues, which use either the CB-WFQ or WFQ method to process and dispatch packets. If packets are marked with IP precedence values, they are queued based on those values. Data with an IP precedence value of 4 is placed in the second queue, sometimes referred to as the high data queue. An IP precedence value of 2 places the data in the third queue, sometimes referred to as the medium data queue. Data with an IP precedence value of 0 is placed in the fourth queue, sometimes referred to as the low data queue. If IP precedence is not used (all packets have the value set to 0), normal WFQ is used.
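
IP RTP priority is enabled with a single interface command specifying the starting UDP port, the size of the port range, and the bandwidth reserved for the voice queue. The values below are common illustrative choices (matching UDP ports 16384 through 32767 and reserving 128Kbps):

    interface Serial0/0
     ip rtp priority 16384 16383 128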

Weighted Round-Robin Queuing

Weighted round-robin queuing (WRRQ) is a queuing solution used on the egress ports of Layer 3 switches, such as the Catalyst 3550. Like RTP-PQ, WRRQ has four queues and traffic is placed in the queues based on its IP precedence value. Each queue is assigned a weight value. Whenever congestion occurs in the egress direction of the port, the weight value is used to service the queues. Higher-priority queues (more weight) are given preference over lower-priority queues (less weight); however, no queue is ever starved. In other words, all queues get at least some bandwidth, but the higher-priority queues get more bandwidth than lower-priority queues. This is somewhat similar to CQ.

One option that you can specify in WRRQ is an expedite queue, which performs a similar function as the high queue in PQ. With this enabled, the expedited queue's traffic is always processed first and the other queues are processed in a round-robin fashion. In a sense, this is a combination of PQ and CQ: PQ with the high queue and CQ on the remaining three queues. Like WFQ for routers, WRRQ is automatically enabled in the egress direction on Cisco's Layer 3 switches.
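
On a Catalyst 3550, for example, the four egress queue weights and the optional expedite queue are configured per interface; the weights shown are illustrative, and commands vary by platform and software release:

    interface GigabitEthernet0/1
     ! relative weights for queues 1 through 4
     wrr-queue bandwidth 10 20 30 40
     ! designate queue 4 as the expedite (strict-priority) queue
     priority-queue out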

Exam Alert: WRRQ is the default queuing method on egress interfaces for Layer 3 Catalyst switches.


Avoiding Congestion

Congestion avoidance is a QoS technique that deliberately drops packets; it presumes that dropping certain packets early will reduce congestion without causing serious problems for the connections whose packets are dropped. This section covers three congestion avoidance techniques.

Tail Dropping

Tail dropping is one of the most common ways of dealing with congestion during egress queuing. During a period of heavy congestion, the queue eventually fills up, leaving no room for more packets; any newly arriving packets for the egress queue are then dropped. With tail dropping, all traffic is treated equally: the IOS doesn't consider whether the traffic is UDP or TCP, data or voice. This can be detrimental for TCP-based connections because dropping one packet from a connection can cause the retransmission of multiple packets. In a network that heavily utilizes TCP, tail dropping can actually create more congestion than it reduces.

Tail dropping has the following problems:

  • Tail dropping doesn't differentiate between different traffic types.

  • When congestion occurs and dropping begins, delay- and jitter-sensitive applications will suffer.

  • For TCP connections, tail dropping causes both dropped and already-received packets to be re-sent, creating inefficient bandwidth utilization.

  • If tail dropping occurs across many TCP connections and these TCP connections resend their packets, it can create an additional burst of congestion.

  • TCP has a poor feedback mechanism: it doesn't retransmit only the dropped packets; it retransmits packets based on the negotiated window size.

However, the main advantage of the tail dropping method is that it requires very little processing on the device to drop the packets or frames. Tail dropping is typically used in environments where congestion is minimal or nonexistent, or where the dropping of packets or frames is acceptable and won't cause major disruptions for connections. As an example, you might have a large number of file transfers that are not time-sensitive; dropping packets from these connections, which causes retransmissions, won't be an issue (assuming that most of the packets are successfully transmitted).

If you're using WFQ when congestion occurs, WFQ uses a more intelligent mechanism for dropping packets: the congestive discard threshold (CDT). CDT weights the dropping decision, discarding packets from high-bandwidth connections before those from connections using small amounts of bandwidth. The main downside of WFQ and CDT, however, is that they don't scale well at the distribution and core layers.

Random Early Detection

With tail dropping, dropping occurs when the egress queue is filled up, and all traffic trying to enter the queue is dropped; no preference is given to dropping one type over another. Random early detection (RED) is a mechanism that handles congestion slightly better than tail dropping. With RED, a threshold is assigned to the queue. When this threshold is reached, traffic being placed into the queue is randomly dropped; some traffic is allowed to enter the queue and other traffic is dropped. RED, therefore, tries to deal with congestion before the queue is filled up and everything has to be dropped.

However, RED has one main problem: it doesn't look at the class of traffic (CoS or IP precedence) when dropping traffic; it just randomly drops packets. When RED drops TCP traffic, congestion can typically be averted before it gets worse, because TCP backs off in response to loss. But because RED doesn't look at the type of traffic it is dropping, it can create problems for other applications.

Weighted Random Early Detection

Weighted random early detection (WRED) is an extension of RED. Like RED, the egress queues have thresholds assigned to them, and when a threshold is reached, packets are randomly dropped. However, WRED is more selective about what it treats as random: unlike RED, WRED is CoS-aware. When a queue threshold is reached, WRED drops packets based on their CoS values. For example, the first threshold, covering packets with a CoS value of 0 or 1, might be set at 50%: when the queue is 50% full, WRED begins to randomly drop packets with a CoS value of 0 or 1. The second threshold, covering packets with a CoS value of 2 or 3, causes WRED to start dropping those packets when 80% of the buffer is filled. Given this scheme, packets with a low priority (CoS 0 or 1) are dropped before higher-priority traffic.
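
A hedged WRED sketch on a router egress interface follows (thresholds are in packets and are illustrative; note that on router interfaces the weighting is typically keyed to IP precedence rather than a Layer 2 CoS value):

    interface Serial0/0
     random-detect
     ! precedence 0: begin random drops at 20 packets queued, drop all beyond 40,
     ! with a mark probability denominator of 10 (1 in 10 packets dropped at the maximum)
     random-detect precedence 0 20 40 10
     random-detect precedence 5 35 40 10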

Exam Alert: WRED is used to avoid congestion. It does this by examining CoS information and randomly dropping packets when traffic for a specified CoS reaches its configured threshold, reducing the likelihood that upcoming congestion will cause problems for important applications or data.


Conditioning Traffic

There are two basic methods of conditioning traffic: policing and shaping. Both are used to limit the rate of traffic leaving an interface. These methods are typically used in WAN environments, such as ATM or Frame Relay, where virtual circuits are guaranteed only a certain amount of bandwidth inside the carrier's network. To enforce a rate limit on an interface, the IOS has to measure (meter) the traffic rate and compare it to the limits assigned. The IOS handles this process by using a token bucket.

Of the two methods, policing is the simpler to implement. In policing, if traffic exceeds its assigned rate limit, the IOS either drops or re-marks the offending traffic. As an example, a certain type of traffic, such as FTP, might be assigned a bandwidth limit of 1Mbps; any traffic sent beyond that limit is either dropped or marked as low priority. This process requires few resources on the IOS device because the device doesn't need memory to buffer traffic. However, policing can cause problems with connection-oriented protocols such as TCP. The IOS uses two policing methods (a hedged CAR sketch follows the list below):

  • Class-based policing

  • Committed access rate
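
Tying a CAR sketch to the FTP example above (the ACL number, rate, and burst sizes are hypothetical), traffic beyond 1Mbps is dropped; class-based policing achieves the same effect with the police command inside a policy-map class:

    access-list 120 permit tcp any any eq ftp
    !
    interface Serial0/0
     ! rate in bps, followed by normal and maximum burst sizes in bytes
     rate-limit output access-group 120 1000000 187500 375000 conform-action transmit exceed-action drop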

Shaping, on the other hand, buffers traffic that exceeds its assigned rate limit and transmits it when bandwidth is available. Because shaping buffers traffic, it requires more resources on the device. However, shaping is gentler on traffic than policing because it doesn't drop traffic unless its buffer fills up. During the buffering period, the traffic is delayed, so shaping is not typically used for voice or video; the delay can also lead to jitter, which creates further problems for voice and video. Shaping is best suited to traffic that doesn't react well to loss, such as TCP and other connection-oriented protocols. The IOS uses three shaping methods (a configuration sketch follows the list below):

  • Class-based shaping

  • Frame Relay traffic shaping

  • Generic traffic shaping
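
For illustration, here is a hedged class-based shaping sketch (the rate and names are hypothetical); all traffic on the interface is shaped to an average of 128Kbps, with generic traffic shaping shown as a commented alternative:

    policy-map SHAPE-WAN
     class class-default
      ! shape to an average rate of 128000 bps, buffering bursts above it
      shape average 128000
    !
    interface Serial0/0
     service-policy output SHAPE-WAN
     ! generic traffic shaping alternative (applied directly to the interface):
     ! traffic-shape rate 128000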

Increasing Link Efficiency

One of the main issues of QoS is that it doesn't create more bandwidth for your network. Instead, it uses your bandwidth more efficiently. If you need more bandwidth, you essentially have two options: install faster links or use compression or link efficiency mechanisms.

Compression is really a short-term solution, especially if your bandwidth needs are growing linearly. Compression is a CPU-intensive process and adds delay to each packet. However, because the resulting packets are smaller in length, the serialization delay is reduced. There are three types of compression or link efficiency methods:

  • Header compression

  • Payload compression

  • Link fragmentation and interleaving (LFI)

Header compression compresses protocol headers rather than payload. Certain headers, such as MAC and IP, can't be compressed because they must be readable by Layer 2 and Layer 3 devices to make forwarding decisions. Instead, header compression is performed at the transport layer, such as on the TCP header. Also, header compression is performed on a hop-by-hop basis, not for the entire connection. The IETF has created two standard header compression methods (an interface sketch follows the list below):

  • TCP header compression: Uses the Van Jacobson compression algorithm to compress only the TCP header portion

  • RTP header compression: Compresses the UDP and RTP headers for multimedia connections, such as voice or video
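
Both methods are enabled per interface and, because compression is hop-by-hop, must be configured on each end of the link. The interface in this sketch is illustrative:

    interface Serial0/0
     ip tcp header-compression
     ip rtp header-compression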

Payload compression compresses the payload, or data, and leaves the headers intact. This compression can occur at Layer 3 or Layer 2. At Layer 3, the IP Payload Compression Protocol (PCP) is typically used. It compresses everything but the IP header.

At Layer 2, the encapsulated packet (or payload) is compressed, leaving the frame header uncompressed. Cisco supports three compression algorithms for link compression: STAC (Stacker), Predictor, and Microsoft's Point-to-Point Compression (MPPC).

LFI fragments Layer 2 frames into small, equal-sized pieces and transmits the fragments in an interleaved fashion across the link. The advantage of this approach is that each fragment spends less time waiting in the queue, reducing delay and jitter for small packets interleaved between the fragments. The downside of fragmentation is that the remote site must reassemble the fragments into a frame. Cisco supports the following LFI solutions: PPP Multilink's interleaving, FRF.11 Annex C for voice over Frame Relay, and FRF.12 LFI for data connections.
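
A hedged sketch of PPP Multilink interleaving follows (addresses and values are illustrative, and the commands that bind a serial link to the bundle vary by IOS release): fragment-delay bounds the serialization delay of each fragment, and interleave lets small, delay-sensitive packets be transmitted between fragments:

    interface Multilink1
     ip address 192.168.1.1 255.255.255.252
     ppp multilink
     ! fragment so that no fragment takes more than 10 ms to serialize
     ppp multilink fragment-delay 10
     ppp multilink interleave
    !
    interface Serial0/0
     encapsulation ppp
     ppp multilink
     ppp multilink group 1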

Campus QoS

Now that you have a better understanding of QoS, let's take a look at where QoS should be implemented in a campus network: access, distribution, and core.

At the access layer, switches are the typical devices connected to end users. Switches provide segmentation through the use of VLANs, and they can create (mark) CoS values at Layer 2 for ingress frames. If the access layer device is a router, it can also mark DSCP information in the IP packet header.

The distribution layer typically contains Layer 3 devices, and this is where most of your QoS setup occurs. Here is where you enable QoS, set up a CoS-to-DSCP table to correctly map Layer 2 QoS information into the IP ToS field (see the configuration sketch after the following list), and configure policies that classify any traffic not already marked by your access layer devices. The access layer, the distribution layer, or both are responsible for the following QoS functions:

  • Classifying packets based on configured policies

  • Admitting and managing connections

  • Managing QoS configuration
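
For example, on a Catalyst 3550 class of switch, the following hedged sketch enables QoS, trusts CoS markings at the edge, and maps the eight CoS values to DSCP values; the mapping shown is a common choice, not the only one:

    mls qos
    !
    ! DSCP values assigned to CoS 0 through 7, in order
    mls qos map cos-dscp 0 8 16 24 32 46 48 56
    !
    interface FastEthernet0/1
     mls qos trust cos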

The core layer should not classify or mark any traffic; that should already have been done at the access or distribution layer. Instead, the core layer should enforce QoS policies. With a high-speed backbone, this should be a moot point. In most instances, low latency queuing is used to process egress traffic. The core and distribution layers are responsible for managing and avoiding congestion.

Exam Alert: Classification and marking should occur as close to the source as possible, which is typically the access layer.



