Service Level Agreements | LAN Tutorial With Glossary of Terms: A Complete Introduction to Local Area Networks (LAN Networking Library)

A Service Level Agreement (SLA) details the responsibilities of an IT services provider (such as an ISP or ASP), the rights of the service provider's users, and the penalties assessed when the service provider violates any element of the SLA. An SLA also identifies and defines the service offering itself, plus the supported products, evaluation criteria, and QoS that customers should expect.

Enterprises that use outsourced services, including those from ASPs and telecommunications vendors, rely on SLAs to guarantee specific levels of functionality, network bandwidth, and uptime. In fact, research firm IDC (www.idc.com) notes that nearly 97 percent of the large companies (those with 2,500 or more employees) it queried required an SLA for network availability in the next 12 months. And why not? SLAs are the key to ensuring consistent QoS, performance, and uptime in business-critical computing environments.

SLAs first came into prominence in 1998, when the Frame Relay Forum released its "Service Level Definitions Implementation Agreement," or FRF13. The document's guidelines defined acceptable parameters for several key characteristics of frame relay service, such as frame transfer delay, frame delivery ratio, data delivery ratio, and service availability (or uptime).

As important as the guidelines themselves were (outlining the specific performance and availability metrics that users of frame relay services could expect), FRF13 also served a more important purpose: It indicated clearly that frame relay providers finally felt confident that they could meet standards they themselves had authored, thus assuring customers of their ability to deliver credible service.

Providers of various other outsourcing services have followed suit, and the SLA industry has burgeoned. In fact, IDC predicts that the market for managed- and hosted-service SLAs will grow from $278 million in 1999 to $849 million by 2004.

Generally, SLAs complement other contractual agreements that cover a variety of details, including corrective actions, penalties and incentives, dispute-resolution procedures, nonconformance, acceptable service violations, reporting policies, and rules for terminating a contract.

These contracts generally fit under what some analysts call Service Level Management (SLM), which provides management and service-contract capabilities. Currently, SLM tools on the market, such as FireHunter from Agilent (www.agilent.com) and Eccord Enterprise from Eccord Systems (www.eccordsystems.com), provide flexible ways to define SLAs and monitor their effectiveness.

An SLA's Components

The ASP Industry Consortium (www.allaboutasp.org) has identified four areas that require detailed SLAs: the network itself, any hosting services supplied by the vendor, the applications it hosts and manages, and "customer care," which includes help desk services.

Each area contains its own set of elements, metrics, typical industry ranges, and criteria for calculating these metrics. For instance, the network SLA would include details on bandwidth, performance, and QoS. SLAs should also detail the nature and types of tools required for users and service providers to monitor and manage them.

Network SLAs

The elements of a network SLA should cover the characteristics of the network itself, connection characteristics, and network security. The network SLA identifies the IP performance levels that a service provider guarantees in the course of delivering application services to its customer. While some enterprises accept a "best effort" delivery standard via the public Internet, others demand that their providers offer service over private IP networks that allow specific guarantees for application availability on a customer-by-customer basis.

A network SLA should define the type of network infrastructure that the service provider will deliver. Understanding the nature of a network's physical components helps providers set customer expectations on the performance levels they'll receive. The network SLA also spells out network availability, measured in percent of uptime, and throughput, measured in bits per second.

While 100 percent uptime might be every enterprise's goal, 99.5 percent to 99.9 percent are more realistic averages. One key element of a network SLA is specifying penalties for downtime during critical business hours versus overall downtime: For instance, downtime at 2 a.m. may not disrupt the typical enterprise's business, but it could be unsatisfactory in an e-commerce environment.
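To make these percentages concrete, the following minimal sketch converts an uptime target into the downtime it permits. The 30-day measurement window and the function name `allowed_downtime_minutes` are assumptions chosen for illustration; an actual SLA would define its own measurement period.

```python
def allowed_downtime_minutes(uptime_percent: float, period_days: int = 30) -> float:
    """Return the minutes of downtime an uptime target permits over a period.

    The 30-day default period is an illustrative assumption.
    """
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    for target in (99.5, 99.9, 100.0):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime permits {minutes:.1f} minutes of downtime in 30 days")
```

Under these assumptions, 99.5 percent uptime permits 216 minutes of downtime in 30 days, while 99.9 percent permits about 43 minutes. Whether those minutes fall at 2 a.m. or during peak business hours is exactly what the penalty terms need to distinguish.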

When specifying throughput, a network's capacity is stated in terms of the capacity of the backbone connections within the network's core. Typically these run from 56Kbits/sec (a dial-up connection) up to 10Gbits/sec (the approximate rate of an OC-192 link).

Another key part of guaranteeing network service is the connection SLA, which spells out acceptable data losses and data latency (or data delays), plus bandwidth provisioning. In the past, this was referred to as the basic bandwidth (for example, 56Kbits/sec, T-3, OC-3, and so on) for which customers are billed.

According to the ASP Industry Consortium, few service providers actually detail provisions for data loss (which results from dropped packets in saturated IP networks) in their SLAs; those that do will often guarantee 99 percent packet-delivery rates, which means up to 1 percent packet loss. While real-time applications such as Voice over IP (VoIP) or interactive media couldn't operate effectively at such a loss rate, packet loss in the 5 percent range is acceptable for typical Web browsing.

Data latency, as with data loss, is critical in VoIP and multimedia environments where delays must not impact end-user performance; real-time interactive applications require response times of 100 milliseconds (ms) or less. The ASP Industry Consortium notes that Web browsing, on the other hand, remains viable at 250ms. In practice, U.S. and European network providers often guarantee a round-trip delay of 85ms between the routers in their core networks.
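The loss and latency guarantees described above lend themselves to a simple pass/fail check. The sketch below is one possible way to express it, using the 99 percent packet-delivery rate and 85ms round-trip delay mentioned in the text as default thresholds. The class and function names are hypothetical, and a real connection SLA would specify how and where the samples are measured.

```python
from dataclasses import dataclass


@dataclass
class ConnectionSample:
    """One measurement interval on a provider's network (hypothetical structure)."""
    packets_sent: int
    packets_delivered: int
    round_trip_ms: float


def delivery_rate_percent(sample: ConnectionSample) -> float:
    """Return the percentage of sent packets that were delivered."""
    if sample.packets_sent <= 0:
        raise ValueError("packets_sent must be positive")
    return 100.0 * sample.packets_delivered / sample.packets_sent


def meets_connection_sla(sample: ConnectionSample,
                         min_delivery_percent: float = 99.0,
                         max_round_trip_ms: float = 85.0) -> bool:
    """Check a sample against delivery-rate and round-trip thresholds.

    The defaults mirror the figures quoted in the text; they are not universal.
    """
    return (delivery_rate_percent(sample) >= min_delivery_percent
            and sample.round_trip_ms <= max_round_trip_ms)


if __name__ == "__main__":
    sample = ConnectionSample(packets_sent=10_000, packets_delivered=9_950,
                              round_trip_ms=70.0)
    print(delivery_rate_percent(sample), meets_connection_sla(sample))
```

Passing different thresholds (for example, a 100ms limit for a real-time application or 250ms for Web browsing) lets the same check reflect the application-specific tolerances discussed above.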

Another key part of guaranteeing network services is the security SLA, which defines the applications, data, and services that are protected while in transit over the service provider's network. Unlike the hard-and-fast metrics of a network SLA, a security SLA is determined by customer requirements and is more subjective.

Issues specific to this SLA include the level of encryption, such as DES or TripleDES; the point in the network, such as the access point, where data is encrypted; use of public or private encryption keys; and whether certain applications require encryption at all. A network security SLA must also take into consideration how the encryption/decryption process impacts network performance and QoS. Finally, it should include severe penalties for security breaches.

Hosting SLAs

Hosting SLAs ensure the availability of server-based resources, rather than guarantee server performance levels. As such, hosting SLAs should cover three critical areas: server availability, administration of servers, and data backup and the handling of storage media.

A server availability SLA, measured in percentage of uptime, should guarantee a minimum of 99.0 percent uptime, based on a rolling 30-day period. Although a hosting SLA's objective should be to deliver always-on (100 percent) server availability, 99.5 percent to 99.9 percent uptime is more realistic.
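A rolling 30-day measurement can be sketched as follows. This is a minimal illustration that assumes downtime is recorded as minutes per calendar day; the function name and input format are hypothetical, and a hosting provider may measure availability quite differently.

```python
from collections import deque
from typing import Iterable, List

MINUTES_PER_DAY = 24 * 60


def rolling_availability(daily_downtime_minutes: Iterable[float],
                         window_days: int = 30) -> List[float]:
    """Return the uptime percentage for each full window of consecutive days.

    One result is produced per day once window_days of data have accumulated.
    """
    window = deque(maxlen=window_days)  # oldest day drops out automatically
    results = []
    for downtime in daily_downtime_minutes:
        window.append(downtime)
        if len(window) == window_days:
            total = window_days * MINUTES_PER_DAY
            results.append(100.0 * (total - sum(window)) / total)
    return results


if __name__ == "__main__":
    # One bad day (432 minutes down) followed by 30 clean days.
    history = [432.0] + [0.0] * 30
    for value in rolling_availability(history):
        print(f"{value:.2f}% uptime over the trailing 30 days")
```

Because the window rolls, a single long outage counts against the provider for 30 days and then drops out of the calculation. In the sample data, the first window sits exactly at the 99.0 percent minimum and the next returns to 100 percent.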

The server-administration component of a hosting SLA details the management responsibilities of a hosting service. Specifically, it spells out the acceptable response times for restoring failed servers and defines metrics for performing data backups.

For example, a hosting SLA should mandate that a host provider respond to a restoration request for a failed server within a set period of time (such as one or two hours); it should also guarantee that the server will be returned to service within another specified period (such as 12 to 24 hours).
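The two deadlines in this example can be checked mechanically. The sketch below assumes the restoration period is counted from the moment the provider responds, which is one reading of "another specified period"; an actual SLA should state the starting point explicitly. The function name and the two-hour and 24-hour defaults are illustrative choices taken from the ranges above.

```python
from datetime import datetime, timedelta
from typing import List


def restoration_violations(requested: datetime,
                           responded: datetime,
                           restored: datetime,
                           max_response: timedelta = timedelta(hours=2),
                           max_restore: timedelta = timedelta(hours=24)) -> List[str]:
    """Return the names of any deadlines missed for one failed-server incident.

    Assumption: the restoration clock starts when the provider responds.
    """
    violations = []
    if responded - requested > max_response:
        violations.append("response")
    if restored - responded > max_restore:
        violations.append("restoration")
    return violations


if __name__ == "__main__":
    requested = datetime(2001, 6, 1, 9, 0)
    responded = datetime(2001, 6, 1, 12, 0)   # three hours later: too slow
    restored = datetime(2001, 6, 2, 8, 0)     # 20 hours after the response
    print(restoration_violations(requested, responded, restored))
```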

In addition, it should outline the percentage of scheduled data backups that will actually be conducted; ASP Industry Consortium guidelines for this indicate that 99 percent of planned backups should be completed. The data backup SLA should also specify frequency of backups (a full nightly backup is typical) and require that the hosting service protect backup tapes by storing them offsite for a predetermined time period (such as 30 or 60 days).
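The backup-completion guideline reduces to a simple ratio. The following sketch shows the arithmetic; the function names are hypothetical, and the 99 percent default is the guideline figure quoted above.

```python
def backup_completion_percent(scheduled: int, completed: int) -> float:
    """Return the percentage of scheduled backups that were completed."""
    if scheduled <= 0:
        raise ValueError("scheduled must be positive")
    if not 0 <= completed <= scheduled:
        raise ValueError("completed must be between 0 and scheduled")
    return 100.0 * completed / scheduled


def meets_backup_sla(scheduled: int, completed: int,
                     min_percent: float = 99.0) -> bool:
    """Check the completion rate against a minimum percentage."""
    return backup_completion_percent(scheduled, completed) >= min_percent


if __name__ == "__main__":
    print(meets_backup_sla(scheduled=30, completed=29))    # one miss in a month
    print(meets_backup_sla(scheduled=365, completed=362))  # three misses in a year
```

The measurement period matters here. With nightly backups, one missed backup in a 30-day period yields about 96.7 percent and falls short of the guideline, whereas over a 365-day period the same 99 percent target tolerates three missed backups.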

On a higher level, a data backup SLA might also require the hosting service to create and regularly test an overall disaster-recovery plan. This could include contingencies for "hot site" functionality, which would give a customer access to temporary computing facilities when the customer's own site is unavailable due to a catastrophic event, such as a hurricane or an earthquake.

The Application SLA

Applications generally utilize a variety of OSI Transport-layer services. For instance, Web-based applications rely on TCP/IP, which provides specific services such as file-transfer functions. This interaction can impact application SLAs in several ways.

Most importantly, unlike traditional network and server SLAs, where lower-layer (layers 2, 3, and 4 in the OSI model) cell/packet/frame metrics are easy to define, application SLAs are impacted not only by the transaction-processing execution of the application itself but also by the delays introduced by lower-level protocol error-handling procedures.

Thus, application SLAs require the institution of application-specific metrics, that is, definitions of performance levels that relate to application utilization. For example, an application SLA should define the percent of user interactions, such as downloads or data requests, to be executed without failure.

It should also define the acceptable time lapse between a user's request for data and the moment the updated data screen appears, as well as an acceptable bit-per-second rate for data transfer in a transaction session. The time-lapse guideline works in conjunction with the execution guideline, ensuring, for example, that while a download is deemed successful even if it takes several hours, it would still violate the SLA for taking so long.
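The interplay between the execution guideline and the time-lapse guideline can be illustrated with a short sketch. The thresholds are passed in as parameters because the text gives no specific figures for them; the `Interaction` record and the rule that any slow success counts as a violation are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Interaction:
    """One user interaction, such as a download or data request (hypothetical record)."""
    succeeded: bool
    response_seconds: float


def success_percent(interactions: List[Interaction]) -> float:
    """Return the percentage of interactions executed without failure."""
    if not interactions:
        raise ValueError("at least one interaction is required")
    ok = sum(1 for item in interactions if item.succeeded)
    return 100.0 * ok / len(interactions)


def slow_successes(interactions: List[Interaction],
                   max_response_seconds: float) -> List[Interaction]:
    """Return interactions that succeeded but exceeded the time-lapse limit."""
    return [item for item in interactions
            if item.succeeded and item.response_seconds > max_response_seconds]


def meets_application_sla(interactions: List[Interaction],
                          min_success_percent: float,
                          max_response_seconds: float) -> bool:
    """Require both the success rate and the time-lapse limit to hold."""
    return (success_percent(interactions) >= min_success_percent
            and not slow_successes(interactions, max_response_seconds))


if __name__ == "__main__":
    log = [Interaction(True, 1.2), Interaction(True, 0.8),
           Interaction(True, 7200.0)]  # a download that took two hours
    print(success_percent(log), meets_application_sla(log, 99.0, 5.0))
```

In the sample log every interaction succeeds, so the execution metric alone reports 100 percent, yet the two-hour download still causes the overall check to fail.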

In The Penalty Box

As noted, effective SLAs levy penalties against service providers who violate the terms of their contracts. Generally, these come as credits against future service.

In addition to refunds for lost time or poor performance, penalties for SLA violations should also consider the impact of a violation on an enterprise's business. It is, for instance, unreasonable for a service provider to offer a refund just for a specified amount of time lost when an enterprise suffers financial loss because of an SLA violation.

Lisa Erickson-Harris, a senior analyst with Enterprise Management Associates (www.ema.com), says she has heard of cases in which a service provider actually issued a check for a violation. That, however, is rare, so enterprises should ensure that their SLAs provide penalties that cause their service providers significant "pain."

SLA Management Tools

Tere' Bracco, an analyst in research firm Current Analysis' (www.currentanalysis.com) enterprise infrastructure group, warns it is "buyer beware" when negotiating and managing SLAs. Consequently, she recommends that enterprises require service providers to offer them comprehensive tools for monitoring and managing their SLAs.

The best SLA tools let enterprise IT managers look at network performance the same way the service provider does. SLA management tools such as ViewGate Networks' (www.viewgate.com) Inteligo, typically deployed by the service provider but used by the enterprise via a Web interface, allow IT personnel to monitor network performance and manage their network SLAs on the fly, just like the service provider itself.

SLA tools should also monitor and manage an SLA's metrics in real time, rather than providing mere historical views of past performance.

Resources

The ASP Industry Consortium Web site (www.allaboutasp.org) contains a wealth of information on Service Level Agreements (SLAs), including an excerpt from a 75-page white paper on SLAs developed by the group.

The Information Technology Association of America (ITAA) Web site (www.itaa.org) also offers a variety of useful information about SLAs. This includes presentations such as "Managing Technology to Deliver SLAs" and an ITAA SLA Library available only to members.

You can find templates for sample SLAs at www.nextslm.org, a Web site developed and maintained by several networking vendors.

The book Foundations of Service Level Management (Sams; ISBN: 0672317435), by Rick Sturm, Wayne Morris, and Mary Jander, provides recommendations for Service Level Management (SLM) strategies and SLAs.

This tutorial, number 155, by Jim Carr, was originally published in the June 2001 issue of Network Magazine.