12.7 Service Level Agreements




Bandwidth management solutions are also useful for monitoring the extent to which carriers and service providers meet the network performance guarantees written into service level agreements (SLAs). SLAs are contracts between service providers and customers that define the services provided, the metrics associated with these services, acceptable and unacceptable performance levels, liabilities on the part of the provider, and what actions are to be taken in specific instances of noncompliance. Carriers and service providers write these documents in language that minimizes their exposure to liability. Usually, the extent of liability is limited to credits that are applied to the customer’s next monthly invoice.

AT&T, WorldCom, and Sprint are among the largest carriers that offer guarantees on network performance, including metrics on packet loss and latency, for their data services, which include Internet, frame relay, and ATM. SLAs are available for other services as well, from traditional T-carrier services to IP-based VPNs, intranets, and multicompany extranets. Service providers feel compelled to offer written performance guarantees to differentiate their offerings in the increasingly competitive market brought about by the Telecommunications Act of 1996.

SLAs are even becoming popular among ISPs as a means to lure business applications out of the corporate data center and into outsourced Web hosting arrangements. Doing this successfully requires that the ISP operate a carrier-class data center, offer reliability guarantees, and have the technical expertise to fix any problem, day or night. The SLA may also include penalties for poor performance, such as credits against the monthly invoice if network uptime falls below a certain threshold. Some ISPs even guarantee levels of accessibility for their dial-up remote-access customers.

Service-level management refers to the people, systems, and tools that allow the organization to monitor compliance with the SLA for each type of service.

Sometimes the service provider will make the tools available to the customer in the form of reports that can be accessed on its Web site. Depending on the service, the customer may choose to implement third-party tools that give it independent verification of SLA compliance.

It is not enough to have a clearly defined SLA and voluntary compliance to guarantee the service levels of various applications; there must be enforcement mechanisms in place to ensure the SLA is not violated. Enforcement can be automated with bandwidth management solutions, which are especially useful for IP networks. As more companies move to an intranet model that relies on information sharing and Web navigation, it is becoming necessary to ensure high-quality network services for mission-critical business applications. In addition, today’s IP network applications are both bandwidth intensive and time sensitive. They often require support for voice, video, and multimedia, all of which consume scarce bandwidth. Bandwidth management tools ensure that users and applications share this resource appropriately.

The key issue to consider with regard to SLAs is having appropriate tools to monitor the performance of the service provider. A variety of performance metrics must be measured to ascertain whether the carrier is delivering the grade of service promised in the SLA, and these metrics differ according to the type of network service. A variety of vendors provide measurement tools, and each uses a different approach to address a narrow range of metrics. By measuring the relevant performance metrics independently, however, companies gain the means to manage the SLA effectively. This can be done with a bandwidth management system or an add-on module to such a system.

12.7.1 Performance Metrics

The SLAs from carriers and service providers define such performance metrics as latency, packet loss, and availability. These metrics are especially useful for describing the performance of packet services and comparing the offerings of different service providers.

Latency

This metric refers to the round-trip delay between the time a request packet is sent to a destination and the time a response packet is received. Latency is reported in milliseconds (ms). Carriers constantly measure the latency of core areas of their networks using data collected by pings sent via the Internet Control Message Protocol (ICMP). Data is collected from designated routers in key network hubs worldwide, usually at 5-minute intervals. Monthly latency statistics are derived by averaging all samples from the previous month. Targets vary by carrier, but 65 ms or less for regional round trips within Europe and North America is considered very good performance.
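The monthly averaging described above can be sketched in a few lines of Python. This is only an illustration: the function names, the sample values, and the use of 65 ms as the threshold are assumptions chosen to mirror the text, not any carrier's actual reporting method.

```python
from statistics import mean

# Assumed target, taken from the 65 ms regional figure quoted in the text.
TARGET_MS = 65.0


def monthly_latency_ms(samples_ms):
    """Average all round-trip samples (in ms) collected during the month."""
    if not samples_ms:
        raise ValueError("no latency samples collected")
    return mean(samples_ms)


def meets_latency_target(samples_ms, target_ms=TARGET_MS):
    """Return True if the monthly average is at or below the target."""
    return monthly_latency_ms(samples_ms) <= target_ms


# Made-up 5-minute samples for illustration; not real measurements.
samples = [58.0, 61.5, 70.2, 59.3, 62.0]
print(round(monthly_latency_ms(samples), 1), meets_latency_target(samples))
```

Note that a single slow sample (70.2 ms here) does not break the target, because the guarantee applies to the monthly average rather than to each measurement.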

Packet Loss

This is a measure of packets sent to a destination that do not elicit corresponding return packets. Packet loss is reported as a percentage of total packets sent. This should be a very low figure, such as 0.016%. Instead of packet loss, some carriers use the metric packet delivery, which is also expressed as a percentage. For regional round-trip traffic within Europe and North America, for example, an SLA might promise a packet delivery rate of 99% or greater.
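Packet loss and packet delivery are two views of the same counts, as the following minimal Python sketch shows. The function names and the packet counts are hypothetical and were chosen only to reproduce the kind of figures quoted above.

```python
def packet_loss_pct(sent, received):
    """Percentage of sent packets that drew no return packet."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    if not 0 <= received <= sent:
        raise ValueError("received must be between 0 and sent")
    return 100.0 * (sent - received) / sent


def packet_delivery_pct(sent, received):
    """Packet delivery is simply the complement of packet loss."""
    return 100.0 - packet_loss_pct(sent, received)


# Made-up counts: 16 of 100,000 probe packets went unanswered.
sent, received = 100_000, 99_984
print(packet_loss_pct(sent, received), packet_delivery_pct(sent, received))
```

With these counts the loss works out to 0.016%, which comfortably satisfies a promise of 99% or greater packet delivery.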

Availability

This metric is a measure of network uptime, or the time the network is actively handling customer traffic. Availability is reported as uptime expressed as a percentage of the total time in the measurement period. For example, 99.999% uptime, often referred to as “five nines,” is viewed as the highest availability rate that can realistically be achieved. It translates to about 5.3 minutes of downtime per year. A guarantee of 99.99% (“four nines”) availability allows less than 5 minutes of outage per month. A guarantee of 99.9% (“three nines”) availability allows less than 45 minutes of outage per month.
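The downtime allowed by a given availability percentage follows from simple arithmetic, sketched below in Python. The 365-day year and 30-day month are simplifying assumptions, and the function name is hypothetical.

```python
# Assumed period lengths: a 365-day year and a 30-day month.
MINUTES_PER_YEAR = 365 * 24 * 60          # 525,600 minutes
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60   # 43,200 minutes


def allowed_downtime_minutes(availability_pct, period_minutes):
    """Minutes of downtime permitted in a period at a given availability."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return period_minutes * (1.0 - availability_pct / 100.0)


for label, pct, period in [
    ("five nines, per year", 99.999, MINUTES_PER_YEAR),
    ("four nines, per month", 99.99, MINUTES_PER_30_DAY_MONTH),
    ("three nines, per month", 99.9, MINUTES_PER_30_DAY_MONTH),
]:
    print(f"{label}: {allowed_downtime_minutes(pct, period):.2f} minutes")
```

Under these assumptions the results are roughly 5.26 minutes per year for five nines, 4.32 minutes per month for four nines, and 43.2 minutes per month for three nines. These are cumulative totals for the period, whether the downtime occurs in one outage or several.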

SLAs usually spell out conditions that do not qualify as “unavailability,” such as downtime resulting from network maintenance, circuits provided by other carriers, acts or omissions of the customer, or “acts of God,” such as civil disorder, natural cataclysm, or other occurrences beyond the reasonable control of the service provider.

Some carriers and service providers include additional metrics in their SLAs, such as provisioning time and restoration time.

  • Provisioning time. This metric refers to the agreed-upon due date for delivering a circuit. If the due date is missed, the recurring charge for that circuit may be waived for one month, for example. This applies to ATM or frame relay PVCs as well.

  • Restoration time. This metric refers to how quickly the carrier or service provider can return a circuit to full operation after a service-affecting outage. If a customer reports a frame relay service outage (even if the problem is with local access), for example, and it is not restored in 4 hours, as stipulated within the SLA, recurring charges for the affected ports and PVCs may be waived for 1 month.
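The restoration-time example can be expressed as a simple rule. The Python sketch below is a hypothetical illustration only: the 4-hour limit and the one-month waiver come from the example in the text, while the function name, timestamps, and charge amount are invented, and real SLAs define their own terms and exclusions.

```python
from datetime import datetime, timedelta

# Limit taken from the 4-hour example in the text; actual SLAs vary.
RESTORATION_LIMIT = timedelta(hours=4)


def restoration_credit(reported_at, restored_at, monthly_recurring_charge,
                       limit=RESTORATION_LIMIT):
    """Return the credit owed: one month's recurring charge if the outage
    lasted longer than the limit, otherwise nothing."""
    if restored_at < reported_at:
        raise ValueError("restoration cannot precede the outage report")
    outage = restored_at - reported_at
    if outage > limit:
        return monthly_recurring_charge
    return 0.0


# Made-up outage: reported at 09:00, restored at 14:30 (5.5 hours later).
reported = datetime(2003, 1, 6, 9, 0)
restored = datetime(2003, 1, 6, 14, 30)
print(restoration_credit(reported, restored, 1200.0))
```

The clock here starts when the customer reports the outage, which matches the wording of the example; an SLA that starts the clock at fault detection would need a different input.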

12.7.2 SLA Compliance

Companies have good reason to prefer carriers and service providers that approach their SLAs proactively. After all, customers appreciate confirmation that the carrier or service provider is aware of a problem and that steps are being taken to achieve a prompt resolution. To maintain positive relations with their customers, most carriers and service providers understand the value of proactive network monitoring, which entails continuous surveillance to identify potential problems so that steps can be taken to prevent them. Unfortunately, many carriers and service providers, such as smaller competitive local exchange carriers (CLECs) and ISPs, cannot afford the tools for proactive network monitoring and do not have mechanisms in place to warn their customers of possible service-affecting events in a timely fashion.

Although SLAs exact financial penalties when network performance does not meet expectations, it is important to remember that they do not indemnify the organization for lost business. In particular, such guarantees do not help much if the organization’s network goes down for an extended period of time, as was demonstrated in April 1998 when AT&T’s frame relay network went down for 26 hours and in August 1999 when WorldCom’s frame relay network went out of service for 10 days. Both carriers tried to smooth things over with customers: AT&T by waiving frame relay service charges until it could be sure the problem would never recur, and WorldCom by doubling the credits to 20 days to compensate users for the network outage. In both cases, many customers were left to fend for themselves in dealing with lost revenues and business opportunities, unless they had the foresight to have backup services like ISDN in place.

With regard to Internet performance, several reporting services are available on the Web that periodically rank service providers according to various metrics. These services are not intended as SLA enforcement tools; they serve only to narrow down the choice of carrier or service provider. One reason they cannot be used for SLA enforcement is that the measurements represent an “outside-in” view of a third party and not the “inside” view of a customer. Thus, because the two are measured from different vantage points, the rating service may show a carrier or service provider’s latency to be greater than 60 ms, while its SLA guarantees a latency under 60 ms.

Furthermore, ranking by monthly latency shows which providers offer the fastest service but may not take into account the consistency of this performance. For businesses that depend heavily on Internet access, it will not matter how fast the service is if it is not available. This is why ranking service providers by “reachability” or “packet loss” is a better methodology for corporate decision-making.
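The difference between the two ranking methods is easy to see with a small Python sketch. The provider names and figures below are entirely made up for illustration, and the function names are hypothetical.

```python
# Invented figures for three fictional providers; not real measurements.
providers = [
    {"name": "Provider A", "latency_ms": 48.0, "loss_pct": 0.90},
    {"name": "Provider B", "latency_ms": 62.0, "loss_pct": 0.02},
    {"name": "Provider C", "latency_ms": 55.0, "loss_pct": 0.10},
]


def rank_by_latency(rows):
    """Order providers from lowest to highest monthly latency."""
    return [p["name"] for p in sorted(rows, key=lambda p: p["latency_ms"])]


def rank_by_packet_loss(rows):
    """Order providers from lowest to highest packet loss,
    using latency only to break ties."""
    ordered = sorted(rows, key=lambda p: (p["loss_pct"], p["latency_ms"]))
    return [p["name"] for p in ordered]


print("By latency:    ", rank_by_latency(providers))
print("By packet loss:", rank_by_packet_loss(providers))
```

In this invented data set the fastest provider also loses the most packets, so it leads the latency ranking but finishes last when ranked by packet loss.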






LANs to WANs: The Complete Management Guide
ISBN: 1580535720
Year: 2003
Pages: 184
