8.5 Performance Mechanisms

As presented in Chapter 7, the performance mechanisms discussed here include QoS, resource control (prioritization, traffic management, scheduling, and queuing), service-level agreements (SLAs), and policies. These mechanisms incorporate the general mechanisms shown in the previous section (see Figure 8.1).

Subsets of these mechanisms are usually used together to form a comprehensive approach to providing single-tier and multitier performance in a network. These mechanisms provide the means to identify traffic flow types; measure their temporal characteristics; and take various actions to improve performance for individual flows, groups of flows, or all flows in the network.

8.5.1 Quality of Service

QoS consists of determining, setting, and acting on priority levels for traffic flows. The term QoS is usually associated with IP but is used here to describe a class of mechanisms, applied at multiple layers in the network, that provision and apply priority levels. This class includes IP QoS (including multiprotocol label switching [MPLS]), type of service (ToS), ATM class of service (CoS), and frame relay committed information rate (CIR). In this section, we focus on IP QoS.

For IP-based traffic, there are two standard types of QoS: differentiated services (DiffServ, or DS) and integrated services (IntServ, or IS), intended to support two views of network service. DiffServ approaches QoS from the perspective of aggregating traffic flows on a per-hop basis based on traffic behavior, and IntServ approaches QoS from the perspective of supporting traffic flows on an individual end-to-end basis.

In DiffServ, IP packets are marked in the ToS byte for IPv4 or in the traffic class byte in IPv6 so that they will receive the corresponding performance at each network device (or hop). DiffServ defines a set of values (termed differentiated services code points [DSCPs]) for classes of traffic flows to be used by resource control mechanisms. An important concept of DiffServ is that it applies to aggregates of traffic flows (e.g., composite flows), not individual traffic flows.
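As a concrete illustration, the following Python sketch marks outbound packets with the EF code point by setting the ToS byte on a socket; the DSCP occupies the upper six bits of that byte, so the byte value is the code point shifted left by two. The destination address and payload are placeholders, and whether the mark is honored end to end depends on the operating system and on how each hop treats the DS field.

    import socket

    EF_DSCP = 46                 # expedited forwarding code point (RFC 3246)
    tos_byte = EF_DSCP << 2      # DSCP occupies the upper 6 bits of the ToS byte

    # Create a UDP socket and request EF marking on outbound packets.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, tos_byte)
    sock.sendto(b"voice payload", ("192.0.2.10", 5004))  # placeholder destination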

The main reason for aggregating flows is scalability, particularly across the Internet, although the same argument applies in a large enterprise environment. If all flows requiring priority service were treated individually in a network architecture and design, one trade-off would be the amount of resources (e.g., memory in network devices) required to store and maintain state information for each individual flow across the network. This resource requirement grows geometrically with the size of the network and therefore does not scale well. By aggregating flows into traffic classes, storing and maintaining state information becomes more tenable. State information, or state, is information about the configuration and status of flows or connections. Examples include addresses (IP or MAC-layer), time (duration of flow/connection, idle time), and temporal characteristics (data rates, packet losses). A detailed discussion of state information can be found in Chapter 10.
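To make the scaling argument concrete, the following back-of-the-envelope comparison (a sketch; the constants are purely illustrative) counts the state entries a core device would hold under each approach:

    # Rough comparison of state kept at a core device: per-flow vs. per-class.
    hosts = 10_000               # illustrative network size
    flows_per_host = 20          # illustrative concurrent flows per host
    classes = 3                  # e.g., best effort, AF, EF

    per_flow_entries = hosts * flows_per_host   # grows with the network: 200,000
    per_class_entries = classes                 # fixed, regardless of network size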

There are three traffic classes for DiffServ: best effort, assured forwarding (AF), and expedited forwarding (EF). AF and EF are the preferred traffic classes and are based on the types of performance they require. EF is usually targeted toward traffic that has delay requirements (e.g., real-time or interactive), whereas AF can be used for traffic with both delay and capacity requirements (e.g., multimedia or teleservices).
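The code points for these classes are standardized (RFC 2474 for the default behavior, RFC 2597 for AF, and RFC 3246 for EF); a minimal lookup table, independent of any vendor's configuration, might look like this:

    # Standard DSCP values (decimal) for the DiffServ traffic classes.
    DSCP = {
        "best-effort": 0,    # default per-hop behavior (RFC 2474)
        "EF": 46,            # expedited forwarding (RFC 3246)
        # Assured forwarding (RFC 2597): AFxy = class x, drop precedence y
        "AF11": 10, "AF12": 12, "AF13": 14,
        "AF21": 18, "AF22": 20, "AF23": 22,
        "AF31": 26, "AF32": 28, "AF33": 30,
        "AF41": 34, "AF42": 36, "AF43": 38,
    }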

There are times, however, when a traffic flow needs to be treated individually. IntServ defines values and mechanisms for allocating resources to flows across the end-to-end path of the flow. IntServ is closely tied to the flow nature of a network by placing importance on supporting a flow at every network device in the end-to-end path of that flow. Recall from our discussion of flow analysis that end-to-end paths are defined by you and will vary depending on what you are trying to accomplish. In defining where end-to-end applies for particular flows (e.g., flows with guaranteed requirements), you are determining where mechanisms such as IntServ could be applied.

As mentioned with DiffServ, however, the advantages of IntServ come at a price. IntServ requires support on the network devices across which the flow travels, and it requires resources (e.g., memory, processing, bandwidth) for each flow. Support across multiple networks implies coordination of service between those networks. And requiring resources for each flow means that it will not scale well in areas where flows converge (e.g., the core area of a network).

IntServ also requires a mechanism to communicate flow requirements, as well as the setup and teardown of resource allocations, across network devices in the end-to-end path of a flow. Such signaling is usually provided by the Resource Reservation Protocol (RSVP). Other signaling mechanisms, such as those associated with MPLS, are also being developed for this purpose.

RSVP is used by network devices (including user devices) to request specific QoS levels from network devices in the end-to-end path of a traffic flow. Successful RSVP requests usually result in resources being reserved at each network device along this end-to-end path, along with state information about the requested service.

Thus, a mechanism such as this is best applied in an environment where the network administrator controls the end-to-end path of the flow, such as an enterprise environment. Although IntServ is often shunned because of its complexity and scalability issues, it can be used; however, it should be applied only when there is a strong case for it (from the requirements and flow specifications), and then with an understanding of what will be required, in terms of network and personnel resources, to implement and maintain it.

A comparison of some of the functions and features of DiffServ and IntServ is presented in Figure 8.2.

Function/Feature       | Differentiated Services (DiffServ)                        | Integrated Services (IntServ)
-----------------------|-----------------------------------------------------------|----------------------------------------------------
Scalability            | Scalable to large enterprise or service-provider networks | Limited to small or medium-size enterprise networks
Granularity of Control | Traffic aggregated into classes                           | Per-flow or groups of flows
Scope of Control       | Per network device (per-hop)                              | All network devices in end-to-end path of flow


Figure 8.2: Comparison of DiffServ and IntServ.

DiffServ and IntServ can be applied individually or together within a network, and it is possible to combine these mechanisms in a variety of ways. For example, DiffServ may be applied across the entire network by itself, or both may be applied in different areas of the network. In addition, both mechanisms may be applied to the same areas of the network, targeted at different types of flows. In this case DiffServ is first applied to the network, and IntServ is then overlaid onto it.

If we consider the access/distribution/core architectural model (presented in Chapter 5) in the context of IP QoS, we can begin to see where these mechanisms might apply. The access portion of the network is where most flows are sourced and sinked, and as such, this is where they can be (most easily) supported individually. The core of the network is where bulk transport of flows occurs and where it would make the most sense to aggregate them. Thus, one way to look at prioritizing flows is presented in Figure 8.3, where flow aggregation via DiffServ is applied at the core portion, per-flow service via IntServ is applied at the access portion, and some translation occurs between the two at the boundary between access and core, possibly at the distribution network.

Figure 8.3: Where DiffServ and IntServ apply in the access/distribution/core model.

In Figure 8.3, IntServ is applied as an end-to-end mechanism, where the end points are defined between devices within each access (and possibly its serving distribution) area. Therefore, it is targeted toward those flows that remain within an access network or are sourced/sinked at the distribution network. For example, this would apply for client-server flows where the servers are located at the distribution network.

For flows that cross the core network, their performance would change from IntServ at the access and distribution networks to DiffServ at the core network, and individual state would be lost as a trade-off for scalability. However, flows that remain in the access or distribution networks would still get the full benefits of IntServ. And, importantly, IntServ could be used to signal the core network for resources for a traffic class.

It is also possible to apply both IntServ and DiffServ so that they work concurrently across the entire network. In a case like this (shown as the end-to-end flows in Figure 8.3), IntServ would be used only for a relatively small percentage of flows and would be used end-to-end. DiffServ would be used for other prioritized flows and could be aggregated at either the access or the distribution network.

DiffServ and IntServ are used to apply prioritization, scheduling, and resource control to traffic flows. How these mechanisms work is described in the following section.

8.5.2 Prioritization, Traffic Management, Scheduling, and Queuing

Prioritization, traffic management, scheduling, and queuing are at the heart of providing performance in a network. A performance architecture may include one or more of these mechanisms, in conjunction with QoS, SLAs, and policies, to provide performance to its users, applications, devices, and traffic flows.

These mechanisms are usually implemented in network devices such as routers and switches but can also be applied to the network as stand-alone hardware, such as in vendor-specific admission control and traffic management devices.

Prioritization

Prioritization is the process of determining which users, applications, devices, flows, and connections get service ahead of others or get a higher level of service. Prioritization is necessary because there will be competition among traffic flows for network resources. With a limited amount of resources available in any network, prioritization determines who gets resources first and how much is allocated.

Prioritization begins during the requirements and flow analyses. Priority levels for users, applications, and devices should be determined as part of the requirements analysis, and priority levels for traffic flows should be determined during the flow analysis process. For example, you may recall that in Chapter 4 we discussed how to prioritize flows based on various parameters, such as the number of users supported and performance requirements.

For a performance architecture, there are two high-level views of performance: single-tier performance, in which capacity, delay, and RMA are optimized for all traffic flows, and multitier performance, in which capacity, delay, and RMA are optimized for one or more groups of traffic flows, based on groups of users, applications, and/or devices. Either or both of these views can be taken for a network architecture. As with our approach to DiffServ and IntServ, single-tier performance may apply across the entire network, with multitier performance in select areas or as an addition to single-tier performance.

These two views of performance imply that there may be multiple levels (tiers) of performance required by different groups of traffic flows. Whenever there are multiple levels of performance requirements in a network (and thus multiple groups of traffic flows), there will be a need to prioritize these traffic flows. Prioritization is ranking (determining a priority level) based on importance and urgency. Ranking may be applied to users, applications, devices, or their traffic flows. A rank or priority level indicates the importance and/or urgency of that user, application, device, or flow, relative to other users, applications, devices, or flows in that network. Such priority levels are often determined during the requirements and flow analyses.

The most basic or degenerate case of prioritization is when every user, application, device, or flow has the same priority level. Such is the case in best-effort networks. When a user, application, device, flow, or groups of these require performance greater than the general case, they will have higher priority levels. In addition, an individual or group at the same priority level as other individuals or groups may change its priority level due to the urgency of its work.

For example, most users on a network have a common set of applications and devices. They will often use a similar type of desktop or laptop computer, with a standard type of network interface. They will likely use email, Web, word processing, and other common applications. Such users may constitute a group of single-tier performance users, all with a single priority level. For some networks, this is as far as prioritization gets, and the performance architecture focuses on optimizing performance for everyone in this group. In some cases, however, even a single-tier performance group may have multiple priority levels. In the preceding example, priority levels can be used to give preference to individuals who have not used the network for some time or who are willing to pay a premium for preferred access.

Priority levels may be based on the type of protocol (e.g., TCP versus User Datagram Protocol), service, or port number; by IP or MAC-layer address; or by other information embedded within the traffic. This information can be maintained in databases and coupled with policies and SLAs (discussed later in this chapter) as part of the performance architecture.

Priority levels are used by network devices to help determine whether traffic flows will be allowed on the network (admission control), to schedule traffic flows onto the network, and to condition flows throughout the network.

Traffic Management

Priority levels determine the relative importance and urgency of traffic flows and how each traffic flow will be handled within the network. Traffic management consists of admission control and traffic conditioning. Admission control is the ability to refuse access to network resources. Traffic conditioning is a set of mechanisms that modify (increase or decrease) the performance of traffic flows, as a precursor to scheduling.

Admission control uses priority levels to change the behavior of network access. In a best-effort network without admission control, access to the network is democratic in that all traffic flows have a (more or less) equal chance to get network resources. With admission control, however, access is permitted, denied, or sometimes delayed based on the relative priority of that traffic.

An example of this is assigning a higher priority to real-time traffic flows, such as voice and video. In this example, voice and video traffic flows are given access before other traffic flows. When network resources dedicated to these flows are fully utilized, further flows are refused (blocked). Admission control is most often applied at access areas.
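A minimal admission control check for this example might be sketched as follows; the reserved pool size, function names, and flow description are assumptions for illustration:

    REALTIME_POOL_KBPS = 2_000   # capacity reserved for voice/video (illustrative)
    admitted_kbps = 0            # bandwidth currently committed to admitted flows

    def admit(flow_kbps: int, realtime: bool) -> bool:
        """Admit a real-time flow only if the reserved pool can still hold it."""
        global admitted_kbps
        if not realtime:
            return True          # best-effort traffic is not blocked by this check
        if admitted_kbps + flow_kbps > REALTIME_POOL_KBPS:
            return False         # pool exhausted: the flow is refused (blocked)
        admitted_kbps += flow_kbps
        return True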

To understand traffic conditioning functions, we will follow traffic flows across a network device that implements traffic conditioning. As traffic flows enter a network device, there must be a mechanism to identify flows and distinguish between them. Classification is the ability to identify traffic flows. The classifier will look at various parts of the IP packet, such as source and destination addresses, port numbers, or protocol types. In addition, the classifier may look deeper into a packet for the necessary information. For example, voice over IP (VoIP) signaling flows may be identified by looking for Session Initiation Protocol (SIP) identifiers (RFC 3261) within the packets. Upon identifying traffic flows that are important for that network, packets within these flows may be marked, or tagged, with a priority level. Examples of marking include tagging packets with DSCPs for the best-effort, AF, and EF priority levels.
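A classifier/marker stage might be sketched as below. The match rules, the flow-tuple fields, and the choice of code points are all assumptions for illustration, not taken from any particular product:

    # Classify a flow by fields of its packets and return a DSCP mark.
    RULES = [
        (lambda f: f["proto"] == "udp" and f["dport"] == 5060, 46),  # SIP -> EF (illustrative)
        (lambda f: f["proto"] == "tcp" and f["dport"] == 443, 26),   # web app -> AF31 (illustrative)
    ]

    def classify_and_mark(flow: dict) -> int:
        for match, dscp in RULES:
            if match(flow):
                return dscp
        return 0                 # unmatched traffic remains best effort

    flow = {"proto": "udp", "sport": 40000, "dport": 5060}
    print(classify_and_mark(flow))   # 46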

Once traffic flows have been classified, they may be metered to determine their performance levels. Metering is measuring the temporal performance characteristics of a traffic flow, including traffic rates and burst sizes. Performance characteristics are measured periodically and compared with expected performance boundaries, which can come from SLAs and/or policies. Metering is most often a capability provided in network devices (e.g., routers and switches) as part of their performance implementation but can also be provided by a separate network device (e.g., some network management devices can provide metering support).

For example, a traffic flow may be metered over a period of 1 second. Each second, the peak data rate for that flow is compared to a capacity boundary of 1.5 Mb/s, which was input into the network device from an SLA developed for that traffic flow.
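In code, such a meter reduces to counting bits per interval and comparing the result against the SLA bound; the sketch below uses the 1.5-Mb/s boundary and 1-second period from the example, with illustrative function names:

    SLA_PEAK_BPS = 1_500_000     # capacity boundary from the SLA (1.5 Mb/s)
    INTERVAL_S = 1.0             # metering period (1 second)

    bits_this_interval = 0

    def on_packet(size_bytes: int) -> None:
        """Accumulate bits as packets of the flow pass through the meter."""
        global bits_this_interval
        bits_this_interval += size_bytes * 8

    def end_of_interval() -> bool:
        """At each period boundary, report conformance and reset the counter."""
        global bits_this_interval
        conforming = bits_this_interval / INTERVAL_S <= SLA_PEAK_BPS
        bits_this_interval = 0
        return conforming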

Metering a traffic flow can determine whether a flow is within performance boundaries (Figure 8.4). Conforming traffic is within performance boundaries; nonconforming traffic is outside of performance boundaries. Typically, no action is taken on conforming traffic. Conforming traffic is forwarded to the appropriate output queue (as determined by its priority level) and scheduled for transmission onto the network.

Figure 8.4: Illustration of traffic metering at a network device.

When traffic is nonconforming, however (indicating that it is exceeding the specifications of an SLA), it is subject to shaping or dropping. Shaping is delaying traffic to change a performance characteristic, and dropping is discarding traffic. Nonconforming traffic may also be marked, with no other action taken. This is done so that network devices upstream (those receiving this traffic flow) can choose to shape or drop this traffic if necessary.

To shape nonconforming traffic, the network device may send it to a shaper queue, where delay is added before the traffic is transmitted onto the network. By delaying traffic, a shaper queue changes the performance of that traffic flow. Consider an SLA for a traffic flow that specifies a peak rate of 1.5 Mb/s (Mb/s is written as Mbits/second in the following discussion for consistency in units; in practice, rates are usually expressed in Kb/s, Mb/s, or Gb/s). A meter measuring that traffic flow calculates a rate of

(200 packets/second) × (1500 bytes/packet) × (8 bits/byte) = 2.4 Mbits/second

This is compared with the SLA specification (1.5 Mbits/second) and found to be nonconforming. Subsequent packets are then forwarded to a shaper queue, where they are delayed so that they are released an average of 10 ms apart. As a result, only 100 packets can be transmitted per second, and the rate of that traffic flow becomes

(100 packets/second) × (1500 bytes/packet) × (8 bits/byte) = 1.2 Mbits/second

Shaping will continue either for a specified period or until the traffic flow is again conforming.
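One common way to implement a shaper queue is a token bucket that paces packets to the SLA rate: each packet must wait until enough tokens (bits of sending credit) have accumulated. The sketch below uses the 1.5-Mb/s rate from the example and a bucket depth of one second's worth of traffic; it is illustrative, not any specific device's implementation:

    import time

    RATE_BPS = 1_500_000         # shape to the SLA rate (1.5 Mb/s)
    tokens = 0.0                 # current credit, in bits
    last = time.monotonic()

    def send_shaped(packet: bytes, transmit) -> None:
        """Delay the packet until the bucket holds enough bits to cover it."""
        global tokens, last
        need = len(packet) * 8
        while True:
            now = time.monotonic()
            tokens = min(RATE_BPS, tokens + (now - last) * RATE_BPS)  # refill; cap burst
            last = now
            if tokens >= need:
                tokens -= need
                transmit(packet)                      # conforming now: send it
                return
            time.sleep((need - tokens) / RATE_BPS)    # wait for credit to accumulate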

The most serious action that can be taken on traffic is dropping, or discarding, packets. This is done when a traffic flow is seriously exceeding its performance boundary or when the network device is congested to the point at which dropping packets is necessary. Traffic conditioning functions are shown in Figure 8.5.

Figure 8.5: Traffic conditioning functions.

Scheduling

Once traffic has been prioritized and conditioned, it is forwarded to one or more output queues for transmission onto the network. Scheduling is the mechanism that determines the order in which traffic is processed for transmission. Scheduling uses priority levels to determine which traffic flows get processed first and most often.

Scheduling is applied at network devices throughout a network. In most network devices, such as switches and routers, scheduling is provided through network management or as part of the QoS implementation in that device.

Scheduling may be proprietary (enterprise specific) or standards based. Some commonly used standard scheduling algorithms include weighted fair queuing and class-based queuing. These algorithms provide some degree of fairness in queuing while allowing relative priority levels (weights).

The combination of QoS, prioritization, traffic management, and scheduling provides a comprehensive set of mechanisms that can be applied across a network to achieve various performance levels for traffic flows (Figure 8.6). As we will see, these mechanisms are closely tied to SLAs and policies, which are discussed in Section 8.5.3.

Figure 8.6: Performance mechanisms act on network devices.

Queuing

We complete this section with a discussion of queuing. Queuing is storing IP packets (this can also be applied to Ethernet frames or ATM cells, but for the purposes of this discussion, we will limit it to IP packets) within a network device while they wait for processing. There may be several locations where packets are stored (queues) within a network device for each type of processing that the device is performing on each packet (e.g., holding packets received from the network, processing for QoS, holding packets for transmission onto the network).

A number of queuing mechanisms are available in network devices. Each mechanism is developed to achieve a particular objective in processing packets. For example, queue mechanisms may treat all packets the same way, may randomly select packets for processing, or may favor particular packets. In this chapter, we will briefly consider the following queuing mechanisms:

  • First In First Out (FIFO)

  • Class-based queuing (CBQ)

  • Weighted fair queuing (WFQ)

  • Random early detect (RED)

  • Weighted RED (WRED)

FIFO queuing is arguably the simplest queuing mechanism available. In FIFO queuing, packets are stored in a single queue. For an output FIFO queue, packets are transmitted onto the network in the order in which they were received (at the input queue).

In CBQ, multiple queues with differing priorities are maintained. Priority levels are configurable in the network device and indicate the performance levels required for each traffic type. Packets of each priority level are placed in their respective queues. Higher-priority queues are processed before lower-priority queues, so higher-priority traffic receives more network resources and thus greater performance.

Like CBQ, WFQ assigns priorities (weights) to queues. Typically, with this mechanism, high-priority traffic flows are processed first and lower-priority traffic flows share the remaining resources.
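The sketch below shows a scheduler in the spirit of CBQ and WFQ: one queue per class, with service opportunities distributed in proportion to configured weights. Real WFQ tracks per-packet virtual finish times; this weighted round-robin is a common simplification, and the class names and weights are illustrative:

    from collections import deque

    queues = {"EF": deque(), "AF": deque(), "BE": deque()}   # one queue per class
    weights = {"EF": 4, "AF": 2, "BE": 1}                    # relative service shares

    def schedule_round() -> list:
        """One service round: each class may transmit up to its weight in packets."""
        sent = []
        for cls, q in queues.items():
            for _ in range(weights[cls]):
                if q:
                    sent.append(q.popleft())
        return sent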

Generally, when a queue becomes full (e.g., during periods of congestion), packets are dropped either from the beginning (head) or the end (tail) of the queue. In either case, dropping these packets is likely to be unfair to one or a few traffic flows. As a result, RED was developed to randomize the packet-dropping process across a queue. In addition, RED drops packets early (before the queue is actually full) to force traffic flows (in particular, TCP flows) to adjust by reducing their transmission rates.

WRED operates in the same fashion as RED but supports multiple priority levels (one for each queue) for dropping packets.
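RED's early-drop behavior can be sketched as a drop probability that rises linearly between a minimum and a maximum threshold on the average queue length, with WRED keeping a separate threshold profile per priority level. The thresholds and profiles below are illustrative; production implementations also smooth the measured queue length and adjust probability between drops:

    import random

    def red_drop(avg_qlen: float, min_th: float, max_th: float, max_p: float = 0.1) -> bool:
        """RED: never drop below min_th, always drop above max_th,
        and drop with linearly increasing probability in between."""
        if avg_qlen < min_th:
            return False
        if avg_qlen >= max_th:
            return True
        return random.random() < max_p * (avg_qlen - min_th) / (max_th - min_th)

    # WRED: a (min_th, max_th) profile per priority level (illustrative numbers).
    WRED_PROFILES = {"EF": (40, 50), "AF": (20, 40), "BE": (5, 30)}

    def wred_drop(avg_qlen: float, priority: str) -> bool:
        min_th, max_th = WRED_PROFILES[priority]
        return red_drop(avg_qlen, min_th, max_th)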

8.5.3 Service-Level Agreements

SLAs are (typically) formal contracts between a provider and a user that define the terms of the provider's responsibility to the user and the type and extent of accountability if those responsibilities are not met. Although SLAs have traditionally been contracts between various service providers (e.g., ISPs) and their customers, this concept can also be applied to the enterprise environment. In fact, the notion of customer and provider is becoming more common in enterprise networks as they evolve from treating networks as just infrastructure (cost-center approach) to treating them as centers for providing services to their customers (users).

There are two common ways to apply SLAs within a network. First, an SLA can be an agreement between network management/administration and their customers (the network users). Second, an SLA can be used to define the levels of services required from third-party service providers (e.g., cable plant providers, various service providers) for the network.

SLA performance elements may be as simple as a data rate (minimum, peak) and burst tolerance (size, duration) and can be separated into upstream (in the direction from the destination to the source) and downstream (in the direction from the source to the destination). Figure 8.7 shows upstream and downstream for a traffic flow, along with the data sources and sinks.

Figure 8.7: Upstream and downstream directions.

Although these terms can apply for any flow, they are most commonly used in service-provider networks. In such networks, the sources of most traffic flows are servers on the Internet and most destinations are subscriber PCs at the service provider's access networks. For this case, downstream is from these servers (i.e., from the Internet) to the subscriber's PC and upstream is from the PC to the Internet. For example, Web pages downloaded from the Internet to a subscriber's PC generate TCP traffic downstream from a Web server to the subscriber PC, and TCP acknowledgments upstream from the subscriber PC to the Web server (Figure 8.8).

Figure 8.8: Upstream and downstream directions for Internet Web traffic.

SLAs can include delay and RMA metrics for more complete provisioning of service. An example of an enterprise SLA is shown in Figure 8.9.

Network Service Description for My Enterprise

Service Level    | Capacity Performance                       | Delay Performance                                | Reliability Performance
-----------------|--------------------------------------------|--------------------------------------------------|-----------------------------
Basic Service    | As available (best effort)                 | As available (best effort)                       | As available (best effort)
Silver Service   | 1.5 Mb/s (bidirectional)                   | As available (best effort)                       | As available (best effort)
Gold Service     | 10 Mb/s (bidirectional), burst to 100 Mb/s | Max 100-ms round-trip (between specified points) | As available (best effort)
Platinum Service | 100/10 Mb/s up/down, burst to 1 Gb/s       | Max 40-ms round-trip (between specified points)  | 99.999% uptime (user-server)


Figure 8.9: Example of enterprise SLA.

This SLA specifies capacity, delay, and RMA levels for various types of users and their applications and devices. A basic, or best-effort, service level is shown, in which all performance characteristics are best effort. This service level is the default and would normally be provided to any user free of charge.

The other service levels (silver, gold, and platinum) specify increasing levels of capacity, delay, and RMA performance. Users who subscribe to these performance levels would likely be charged for usage. This may take the form of an internal charge within each organization, an allocation of resources, or both. For example, each valid subscriber to the platinum service (only a certain type of user may be allowed to use this service, based on the user's type of work and needs) may be allocated N hours per month. Usage exceeding N hours per month would result in a charge to that user's organization.

Figure 8.9 is an example of a fairly complex SLA and is used to illustrate how an SLA may be structured. In practice, SLAs are usually simpler, with two to three service levels. Depending on the requirements of an organization, however, an SLA can be more complex than the provided example.
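An SLA such as the one in Figure 8.9 can also be captured as structured data so that monitoring tools can test measurements against it. The sketch below encodes two of the example tiers and a simple conformance check; the field names are assumptions:

    # Two tiers from the example SLA, as structured data (field names are illustrative).
    SLA_TIERS = {
        "gold":     {"max_rtt_ms": 100},
        "platinum": {"max_rtt_ms": 40, "min_uptime_pct": 99.999},
    }

    def conforms(tier: str, measured: dict) -> bool:
        """Check measured round-trip delay and uptime against the subscribed tier."""
        sla = SLA_TIERS[tier]
        if measured["rtt_ms"] > sla["max_rtt_ms"]:
            return False
        if "min_uptime_pct" in sla and measured["uptime_pct"] < sla["min_uptime_pct"]:
            return False
        return True

    print(conforms("gold", {"rtt_ms": 85, "uptime_pct": 99.9}))   # True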

An SLA is typically a contract between users and service providers about the types of services being provided. In this sense, an SLA forms a feedback loop between users and the provider. The service provider has the responsibility of monitoring and managing services to ensure that the users are getting what they are expecting (and possibly paying for) and that users are made aware of what is available to them.

Figure 8.10 shows that, when SLAs are added to the performance architecture, they provide a means for communicating between users, staff, and management about performance needs and services.

Figure 8.10: Performance mechanisms with SLAs added.

For SLAs to be effective as feedback loops, they need to operate in conjunction with performance monitoring and trouble reporting/ticketing, as part of the performance or network management architectures.

8.5.4 Policies

Policies are formal or informal sets of high-level statements and rules about how network resources (and, therefore, performance) are to be allocated among users. They are used to create and manage one or more performance objectives. Policies complete the framework of performance for a network by coupling the high-level (e.g., management) view of how the network should perform with mechanisms to implement performance at the network devices (QoS) and feedback loops with users (SLAs) (Figure 8.11).

Figure 8.11: Performance mechanisms with policies added.

Policies may describe which network, computing, storage, or other resources are available to users, when resources are available, or which users are permitted to access certain resources. In this sense, these policies are similar to policies for security or routing.

Policy information is often implemented, stored, and managed in policy databases kept on the network. Policy information is passed between databases and network devices using protocols such as the Common Open Policy Service (COPS) and the Lightweight Directory Access Protocol (LDAP).
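Within such a database, policy statements often reduce to condition/action rules. A minimal sketch of rule evaluation, with entirely hypothetical groups, resources, and hours, might be:

    # Hypothetical policy rules: which group may use which resource, and when.
    POLICIES = [
        {"group": "engineering", "resource": "compute-cluster", "hours": range(0, 24)},
        {"group": "staff",       "resource": "video-bridge",    "hours": range(8, 18)},
    ]

    def permitted(group: str, resource: str, hour: int) -> bool:
        """Return True if any rule grants the group access to the resource at this hour."""
        return any(p["group"] == group and p["resource"] == resource and hour in p["hours"]
                   for p in POLICIES)

    print(permitted("staff", "video-bridge", 14))   # True: within permitted hours
    print(permitted("staff", "video-bridge", 22))   # False: outside permitted hours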



