QoS Technologies


QoS technologies classify network traffic and then ensure that some of that traffic receives special handling. The special handling may include attempts to provide improved availability and reduced error rates, latency, and jitter.

However, because of the perceived complexity of QoS, many organizations choose to implement service quality differentiation through the use of separate facilities (isolation) instead of through QoS technology. For example, a separate LAN can be built to handle Voice over IP (VoIP), thereby isolating it from delays caused by large-scale file transfers on the data LAN. The QoS alternative of using frame tagging on the LAN may appear to be too complicated. In some cases, the organization may simply over-provision the transport facilities massively and hope that congestion never occurs. In addition, signaling between ISPs for QoS is generally not done, so use of QoS technologies across the public Internet is impractical.

Even when it's completely implemented, QoS does not necessarily guarantee particular performance. Performance guarantees can be quite difficult and expensive to provide in packet-switched networks, and most applications and users can be satisfied with less stringent promises, such as prioritization only, without delay guarantees.

For the stated reasons, QoS technology is primarily used in private networks and has not yet achieved the widespread use that was predicted some years ago. Nevertheless, with the growing use of VoIP and other latency-sensitive network uses, interest in QoS is growing.

This section of the chapter discusses the major QoS technologies. The QoS technologies are placed into two groups: tag-based QoS, which relies on identification tags placed into data frames and used by network switches; and traffic-shaping QoS, which tries to manage bandwidth allocations through queuing or rate-shaping at a single point instead of through the active cooperation of all network elements and explicit tagging.

This section of the chapter also discusses the alternative to the major QoS technologies. This alternative is called over-provisioning, also known as design by hope.

Tag-Based QoS

Networks forward traffic through routers, switches, and access devices. The transport infrastructure must make forwarding decisions based on the required treatment of each traffic flow.

In tag-based QoS, traffic is initially classified by having the appropriate forwarding information added to each packet. Desktops, other customer input devices, and network devices can classify and mark the packets, possibly relying on a central database of authorizations. Traffic can be identified by end user, protocol, and application at the network entry point. Then the classifier can decide whether to admit the traffic to the network and, if admitted, which classification tag to place into the data packet headers. Switches, routers, and other network devices in the core of the network then examine the tags to determine how to handle the traffic.

There are different types of QoS technologies that use classification and tagging. This subsection describes IEEE 802 LAN QoS, IP Type of Service (TOS), IP Differentiated Services (DiffServ), Multiprotocol Label Switching (MPLS), and Resource Reservation Protocol (RSVP) as examples; other technologies are also available.

IEEE 802 LAN QoS

The Institute of Electrical and Electronics Engineers (IEEE) has developed the 802 family of LAN standards. Forwarding at the data link layer within a LAN infrastructure is controlled using the IEEE 802.1D specification, which provides a three-bit field (the 802.1p field) for priority. There are therefore eight possible non-overlapping priorities. There is also a 12-bit field (the 802.1Q VLAN identifier) that enables the identification of up to 4094 virtual LANs (VLANs); two of the 4096 possible values are reserved. VLAN traffic may also receive differentiated treatment.
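
As a rough illustration of where these bits live, the following Python sketch packs and unpacks the 4-byte 802.1Q tag. The TPID value (0x8100) and the field layout come from the standard; the example priority and VLAN number are arbitrary.

  # Pack and unpack an IEEE 802.1Q tag: a 16-bit TPID (0x8100) followed
  # by a 16-bit TCI holding the 3-bit 802.1p priority, a 1-bit DEI/CFI
  # flag, and the 12-bit VLAN identifier.
  import struct

  TPID = 0x8100  # EtherType value that identifies an 802.1Q tag

  def pack_dot1q(priority, vlan_id, dei=0):
      """Build the 4-byte 802.1Q tag for the given priority and VLAN."""
      if not 0 <= priority <= 7:
          raise ValueError("802.1p priority is a 3-bit field (0-7)")
      if not 1 <= vlan_id <= 4094:
          raise ValueError("VLAN IDs 0 and 4095 are reserved")
      tci = (priority << 13) | (dei << 12) | vlan_id
      return struct.pack("!HH", TPID, tci)

  def unpack_dot1q(tag):
      """Recover (priority, dei, vlan_id) from a 4-byte 802.1Q tag."""
      tpid, tci = struct.unpack("!HH", tag)
      assert tpid == TPID, "not an 802.1Q tag"
      return (tci >> 13) & 0x7, (tci >> 12) & 0x1, tci & 0x0FFF

  # Voice traffic on VLAN 100 at priority 5:
  print(unpack_dot1q(pack_dot1q(priority=5, vlan_id=100)))  # (5, 0, 100)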

IP TOS

The TOS byte has been part of the IP header from the earliest specification. It provides three precedence bits that can be used to differentiate priority levels. Routers can examine these bits to set queuing priorities and to select among routing options. It's also possible to set router filters to examine other parts of the packet header (for example, the protocol type or the origin/destination address pair) when choosing a particular forwarding priority.

Some administrators use TOS fields and filtering to provide very coarse prioritization within the router queues. At most, one or two classes of traffic, such as certain transactions, are given priority over other traffic in the queues. No strict QoS guarantees are made, and there is no attempt to influence routing decisions at all.
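
On operating systems that expose the IP_TOS socket option, an application can ask for such a marking itself. The Python sketch below sets precedence 5 (TOS byte 0xA0) on a socket; whether any router honors the marking depends entirely on the filters and policies that administrators have configured.

  # Mark outbound traffic by setting the IP TOS byte on a socket.
  # 0xA0 puts the value 5 in the three high-order precedence bits.
  # Availability of IP_TOS varies by platform; this is a sketch, not a
  # portable recipe.
  import socket

  sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, 0xA0)
  # ...connect and send as usual; outgoing packets carry TOS 0xA0...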

IP DiffServ

DiffServ technology can provide both performance guarantees and performance prioritization. It does that by using the TOS byte (renamed the DS byte) in the IP header to indicate the QoS class. All the information that the router needs to handle the packet is contained in the packet header, so routers don't need to learn or store information about individual traffic flows. The disadvantage of DiffServ is that flows must be handled as part of larger groups; it's not possible to single out a particular flow for special handling, independent of other flows. Instead, it must be grouped with many other flows for its trip through all the routers, and it will receive the same handling as all of the other flows in its group.

Aggregation into a small set of classes simplifies the management of large numbers of flows, improving scalability for large backbones. Each router simply interprets the DS byte and applies the associated forwarding behavior.

Devices at the network boundary may be used to set the DS byte according to current resource allocation policies. They can map between the DS byte and IEEE 802 QoS tags, for example.
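
A boundary device's marking logic can be pictured as a pair of small tables, as in the Python sketch below. The DSCP code points are the commonly published values, but the DSCP-to-802.1p mapping shown is a hypothetical site policy, not part of any standard.

  # The DSCP code point occupies the six high-order bits of the DS byte.
  DSCP = {"EF": 46, "AF41": 34, "AF21": 18, "BE": 0}  # common code points

  def ds_byte(dscp):
      """Shift a 6-bit DSCP value into the high bits of the DS byte."""
      return dscp << 2

  # Assumed site policy for mapping DSCP classes to 802.1p priorities:
  DSCP_TO_DOT1P = {46: 5, 34: 4, 18: 2, 0: 0}

  print(hex(ds_byte(DSCP["EF"])))   # 0xb8: the DS byte for EF traffic
  print(DSCP_TO_DOT1P[DSCP["EF"]])  # LAN priority 5 for the same class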

MPLS

A classical router processes each packet by doing the following:

  • Parsing the IP header

  • Checking whether the packet should be discarded for being too old (the time to live has expired)

  • Looking up the destination address in a forwarding table

  • Determining the next hop

  • Adjusting header fields

  • Repackaging the IP packet for transit

  • Queuing the IP packet for forwarding

The router repeats these steps even if the next arriving packet belongs to the same flow. This approach becomes a bottleneck with higher flow volumes and faster trunk speeds. MPLS is intended to overcome the limitations of classical routers in backbones with tens of thousands of flows.

MPLS adds a tag to each packet; each tag is associated with a predefined routing and handling strategy. Each router simply reads the tag, using it to identify the next hop and the forwarding policies to use. Processing time is reduced to a table lookup, and traffic is forwarded quickly through the MPLS domain.
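
The Python sketch below caricatures that lookup. The label table and its entries are invented for illustration; a real label-switching router also manages label stacks, decrements TTL, and learns its table through a label distribution protocol.

  # One table lookup replaces the per-packet header analysis of a
  # classical router: the incoming label selects the output port and
  # the label to swap in for the next hop.
  LABEL_TABLE = {
      17: ("port-3", 99),  # label 17: forward on port-3, swap label to 99
      42: ("port-1", 12),  # label 42: forward on port-1, swap label to 12
  }

  def forward(label, payload):
      out_port, out_label = LABEL_TABLE[label]
      return out_port, out_label, payload

  print(forward(17, b"ip-packet-bytes"))  # ('port-3', 99, b'ip-packet-bytes')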

MPLS was originally designed for the dense Internet core, where high volumes must be routed with minimal delay. Administrators define the sets of routes and treatments associated with each label and distribute the information to the core routers. Edge devices are also configured to identify incoming service flows and append the appropriate tag or label.

Administrators can take advantage of this strategy to build static routes for traffic with time constraints, using routes with fewer hops and higher speed trunks, for instance. They can also choose to allow dynamic routing where the routers exchange reachability information and make adjustments on their own.

RSVP

RSVP is a common mechanism for reserving bandwidth across a single network infrastructure. A receiver initiates a reservation request for a desired flow. The request is passed through the network devices, and those that are RSVP-enabled reserve the bandwidth as requested. The traffic flow, identified by its addresses and protocol type, is then given special handling when it passes through the RSVP-enabled devices that have accepted the reservation.
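
The per-hop admission decision can be sketched as simple bookkeeping, as in the hypothetical Python fragment below. The capacity and flow identifiers are invented, and real RSVP reservations are soft state that must be refreshed periodically or they time out.

  class RsvpHop:
      """One RSVP-enabled device tracking reservations on a link."""
      def __init__(self, capacity_kbps):
          self.capacity = capacity_kbps
          self.reservations = {}  # flow id -> reserved kbps

      def reserve(self, flow_id, kbps):
          """Accept the request only if unreserved capacity remains."""
          if sum(self.reservations.values()) + kbps > self.capacity:
              return False  # reservation refused
          self.reservations[flow_id] = kbps
          return True

  hop = RsvpHop(capacity_kbps=1000)
  print(hop.reserve(("10.0.0.2", "10.0.0.9", "udp"), 640))  # True
  print(hop.reserve(("10.0.0.3", "10.0.0.9", "udp"), 640))  # False: over capacity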

One of the drawbacks of RSVP is that devices that do not support it simply pass the request through without informing anyone of that fact. The sender then transmits as though resources had been reserved along the entire route, but the packets receive no special treatment at the non-participating hops. Rerouting of traffic flows, which is not uncommon in IP-based networks, may also send a flow through routers that are unaware of the special handling its packets should receive.

The network must also handle all service flows appropriately, allocating the resources needed to comply with the constraints of every active SLA.

Traffic-Shaping QoS

Bandwidth management is an essential function for guaranteeing service quality. Network bandwidth is shared among a competing set of service flows and must be allocated and managed effectively. The most critical services must receive sufficient resources to meet the objectives set forth in SLAs.

The tag-based approaches to QoS try to perform bandwidth management by tagging specific packets and then instructing all the network equipment to give those packets preferential treatment. Those approaches have difficulties if not every piece of network equipment in the data flow's path participates. Traffic-shaping QoS is an alternative.

In traffic-shaping QoS, a special appliance or process in a router is invoked to identify data flows (by their source and destination addresses) and sort them into different queues or otherwise manage their data rates. The appliances or processes try to change the characteristics of the traffic itself rather than trying to control the handling of packets between the connection end points. If that appliance or process is located at a key point through which all traffic flows, it can control the available bandwidths even when not all the devices in a flow's path participate. Traffic-shaping QoS is not always as precise as tag-based QoS, but it's easier to implement.

There are two basic approaches to traffic-shaping QoS: rate control and queuing. Each is discussed in the following sections.

Taming the Selfish TCP Connection

A basic assumption when TCP was being fleshed out was that each connection was managed independently. The outgrowth of that independence was that each connection optimized its own delivery at the expense of the others sharing the network. As you no doubt know, TCP uses a credit mechanism: the receiving computer system specifies the amount of information it will accept from the sender. This, by itself, is a good thing because it prevents a receiving computer system with lower speed and limited resources from being swamped by a sender on a higher-speed network, as one example.

Imagine a sender is granted a credit of eight packets. (Credit is actually granted in bytes, but the point is the same.) If you are the sending system, you can boost your performance by sending those eight packets as fast as you can, banking on getting a new credit as soon as possible and then sending more data quickly in bunches.

Because connections were all treated equally, they'd each act independently, much like the "tragedy of the commons" problem in classical economics. Each attempt by one connection to grab a bigger share of available bandwidth could result in congestion and packet loss for another connection, quickly degrading overall performance. Routers being hit with multiple bursts may be forced to discard some packets, leading to retransmission delays and connection timeouts. The problem compounds itself, triggering further retransmissions, until a new credit arrives and the bursts start again.

Van Jacobson of the Lawrence Berkeley Laboratory saw the problems this approach caused and proposed the slow-start approach that is the accepted behavior today. A sender doesn't expend its entire credit immediately; instead, it sends a small portion and keeps increasing the amount until the feedback of a retransmission reveals the network's tolerance at that point. The sender then stays near that level to minimize interference among all the active connections. RFC 2581 describes these algorithms, which are required for TCP implementations. This was a significant breakthrough in understanding early performance problems and mitigating them.
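
The Python sketch below caricatures the idea: the window starts small and doubles each round trip until loss feedback marks the limit. Real TCP (per RFC 2581) also halves the window on loss and then grows it linearly; the numbers here are invented.

  def slow_start(receiver_credit, loss_threshold):
      """Grow the congestion window until loss reveals the path's limit."""
      cwnd = 1  # start far below the receiver's credit
      while cwnd < receiver_credit:
          if cwnd > loss_threshold:  # a lost packet signals the limit
              return loss_threshold  # settle near what the path can bear
          cwnd *= 2                  # double the window each round trip
      return receiver_credit

  # Credit of 64 packets, but the path tolerates bursts of only 20:
  print(slow_start(receiver_credit=64, loss_threshold=20))  # 20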

The approach has limitations in today's world because it doesn't recognize the relative priority of connections; a rogue connection that doesn't adhere to the slow-start approach can still cause degradation in other connections. In addition, slow start does not offer any way of providing different bandwidth guarantees or priorities to different flows.

The basic TCP mechanisms are not easily changed, given the size of the current installed base. Instead, there are approaches that attempt to neutralize the undesirable characteristics of TCP connection behavior.


Rate Control

Rate control is a QoS strategy that helps regulate a set of TCP flows with a range of forwarding needs. Rate control regulates the introduction of traffic into the transport infrastructures to minimize the interference among flows competing for the same network resources and to set relative priorities for access to scarce resources. Coordinating the behavior of a group of connections is a large departure from the basic free-for-all of the original TCP design concepts and implementations. (See the preceding sidebar for more details.)

Rate control is analogous to the air traffic control system. You have probably had the experience of having your flight delayed because of congestion at the destination airport. You don't take off until there is an opening for you at the other end, the weather clears, or other situations improve. In contrast, the typical TCP philosophy discussed in the sidebar could be characterized like this: launch all the planes and hope they land before they run out of gas or the skies get too crowded.

Packeteer started this niche in 1996 when it introduced its PacketShaper product line, based on its patented rate-control technology. A PacketShaper device is placed between the sender and receiver, typically in front of servers or at customer-provider network demarcation points. This enables it to intercept the receiver's feedback (TCP flow credits and acknowledgments) and adjust the actual TCP connection behavior by manipulating the timing of protocol acknowledgments and flow-control allocations. The PacketShaper inspects the packet headers for address, protocol, and application information, classifies the packets, and then applies the appropriate policies as the traffic flows through it. This approach leaves the connection endpoints unchanged and unaware of the actions taken by the shaping appliance.

The PacketShaper smooths bursty traffic and thus minimizes its impact on other service flows. TCP connections can be assigned a guaranteed bit rate by the PacketShaper, a function not explicitly enabled by the TCP specifications. The assigned rate is a minimum guaranteed rate, allowing for higher rates when additional bandwidth is available. If a flow for a traffic class cannot get the required bandwidth guarantee, the connection request can be refused, or the connection can be established without a guarantee.
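
Packeteer's window-manipulation techniques are proprietary, but the general idea of holding a flow to a target rate while allowing brief bursts can be illustrated with a classic token bucket, as in the hypothetical Python sketch below.

  import time

  class TokenBucket:
      """Admit packets at a sustained rate with a bounded burst."""
      def __init__(self, rate_bps, burst_bytes):
          self.rate = rate_bps / 8.0   # refill rate in bytes per second
          self.capacity = burst_bytes  # maximum burst size
          self.tokens = burst_bytes
          self.stamp = time.monotonic()

      def allow(self, packet_bytes):
          """Refill tokens for elapsed time, then spend them if possible."""
          now = time.monotonic()
          self.tokens = min(self.capacity,
                            self.tokens + (now - self.stamp) * self.rate)
          self.stamp = now
          if packet_bytes <= self.tokens:
              self.tokens -= packet_bytes
              return True   # forward the packet now
          return False      # rate exceeded: hold or discard the packet

  bucket = TokenBucket(rate_bps=1_000_000, burst_bytes=15_000)  # ~1 Mbps
  print(bucket.allow(1500))  # True while the burst allowance lasts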

The PacketShaper can also block service flows from the network if desired. A discard policy blocks connection attempts and discards packets without notifying the user. The granular classification lets the PacketShaper redirect web users to an "error" URL that informs them of the blockage. This technique lets administrators keep unwanted flows off of their networks or allow them only at specified times.

Queuing

Queuing can be used to reorganize the traffic streams passing through the queue. For example, a low-priority packet is queued and held if higher-priority traffic is waiting to be forwarded. An arriving high-priority packet is placed in the queue ahead of lower-priority packets.

In queuing-based QoS, the packets in an arriving flow are inspected and assigned to a class. All flows that are members of a class share a queue. Packets are transmitted from the queue based on relative queue priority and rules of fairness among queues to ensure that flows have enough (even if minimal) resources to continue operations.

Class-based queuing (CBQ), originally developed by Sally Floyd and others at the Lawrence Berkeley Laboratory, is an attempt to provide fair allocation of bandwidth without requiring massive amounts of processing power in the network devices. Each class of user is guaranteed a certain minimum bandwidth, and any excess bandwidth is allocated according to rules set up by the network administration. Specific implementations of CBQ have been designed with an entire hierarchy of classes, with, for example, excess capacity redistributed within each branch of the hierarchy as much as possible.
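
The flavor of CBQ's per-class queues can be conveyed with a weighted round-robin scheduler, as in the Python sketch below. The class names and weights are invented, and real CBQ adds a class hierarchy with rules for borrowing excess capacity.

  from collections import deque

  queues = {"voice": deque(), "transactions": deque(), "bulk": deque()}
  weights = {"voice": 3, "transactions": 2, "bulk": 1}  # packets per round

  def enqueue(cls, packet):
      queues[cls].append(packet)

  def dequeue_round():
      """One scheduler round: each class may send up to its weight."""
      sent = []
      for cls, weight in weights.items():
          for _ in range(weight):
              if queues[cls]:
                  sent.append(queues[cls].popleft())
      return sent

  for i in range(4):
      enqueue("bulk", f"bulk-{i}")
  enqueue("voice", "voice-0")
  print(dequeue_round())  # ['voice-0', 'bulk-0']: bulk held to its share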

A drawback to CBQ is that there is no fairness within a class. A burst of packets from one flow in a class extends the waiting time for packets from the other flows sharing that queue. This causes inconsistency in forwarding and possible quality fluctuations.

Weighted fair queuing (WFQ), which is more complex than CBQ and requires much more processing power, can be used to provide absolute guarantees of maximum latency.

Over-provisioning and Isolated Networks

Decreasing LAN and WAN bandwidth costs have led many organizations to adopt a design by hope approach; they install abundant bandwidth and hope that performance problems never arise. Although it is true that over-provisioning capacity makes some management tasks easier, a design by hope approach does not guarantee that the required service levels can always be delivered, in no small part because it represents a gamble on capacity, unburdened by analysis or optimization of the demands on that capacity.

There are many situations that such over-provisioning cannot solve by itself. For example, using a noncritical, bandwidth-intensive application at the wrong time of day could steal resources from other critical business services. Low-priority junk e-mail and the highest-priority real-time video traffic receive the same service. This might not be a problem in a network owned by a small group, but in most larger networks, the groups using the network for e-mail will start complaining about bearing the cost of an over-provisioned network that has to support another group's real-time speech and video traffic. Failures and disruptions will also occur from time to time and may temporarily overstress parts of the network.

Over-provisioned networks also have difficulty scaling, especially if wide-area links are involved. Any aggregation point in the network is a place where temporary congestion might occur, resulting in unanticipated packet loss.

Nevertheless, over-provisioning is still a very popular option. It's extremely simple, and problems are rare in most cases. If a critical application exists that needs priority over other applications, it's often easier to create a completely separate, isolated network for that application instead of implementing true QoS technologies.



