Measuring Network Availability

Availability is expressed as a percentage and is derived from two values: the mean time between failures (MTBF) and the mean time to repair (MTTR). It is calculated using the following formula:

 Availability = [MTBF / (MTBF + MTTR)] x 100%

The same availability number can be reached using the formula shown here:

 Availability = [Uptime / (Uptime + Downtime)] x 100%
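As a quick check that the two formulas agree, the following Python sketch computes availability both ways for a hypothetical component that fails four times in an 8760-hour year, with each repair taking two hours. The failure count and repair time are invented for illustration; MTBF is taken as mean uptime per failure and MTTR as mean repair time per failure, which is how this section uses the terms.

```python
def availability_from_mtbf(mtbf_hours: float, mttr_hours: float) -> float:
    """Availability as a fraction: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def availability_from_uptime(uptime_hours: float, downtime_hours: float) -> float:
    """Availability as a fraction: Uptime / (Uptime + Downtime)."""
    return uptime_hours / (uptime_hours + downtime_hours)


# Hypothetical component observed for one 8760-hour year:
# it failed 4 times and each repair took 2 hours.
failures = 4
downtime = failures * 2.0        # 8 hours of total downtime
uptime = 8760 - downtime         # 8752 hours of total uptime

mtbf = uptime / failures         # mean uptime between failures: 2188 hours
mttr = downtime / failures       # mean time to repair: 2 hours

a1 = availability_from_mtbf(mtbf, mttr)
a2 = availability_from_uptime(uptime, downtime)
print(f"MTBF/MTTR: {a1 * 100:.3f}%   Uptime/Downtime: {a2 * 100:.3f}%")
```

Dividing both MTBF and MTTR by the same failure count does not change their ratio, which is why the two functions return the same value.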

For the purpose of this ratio, MTBF can be represented by the total uptime of a network and MTTR by its total downtime. Both averages are taken over the same number of failures, so that count cancels out of the ratio and the two formulas give the same result. For example, suppose that a car is available for use 24 hours a day for a full year, totaling 8760 hours. Mechanical problems keep the car out of service for three days, or 72 hours. Using these numbers, the availability of the car can be measured as shown here:

 Uptime = Total Hours - Downtime
 Uptime = 8760 - 72 = 8688
 Availability = [8688 / (8688 + 72)] x 100% = 99.2%

Given the previous formula, the car's availability is measured at 99.2 percent.
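The car calculation can be sketched in a few lines of Python, assuming a 365-day year of 8760 hours as in the text. The function name availability_pct is illustrative only.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours in a 365-day year


def availability_pct(uptime: float, downtime: float) -> float:
    """Availability = [Uptime / (Uptime + Downtime)] x 100%."""
    return uptime / (uptime + downtime) * 100


downtime = 72                        # three days out of service
uptime = HOURS_PER_YEAR - downtime   # Uptime = Total Hours - Downtime

print(f"Uptime = {uptime} hours")
print(f"Availability = {availability_pct(uptime, downtime):.1f}%")
```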

Network availability is measured in the same fashion; the Uptime (or MTBF) is divided by the sum of the Uptime and the Downtime (MTBF + MTTR), and then multiplied by 100 for the percentage result.

A network that is always available has an availability of 100 percent. For all intents and purposes, this goal is unattainable: planned outages for network maintenance, unplanned outages such as a carrier link or hardware failure, and other events such as power outages all take a network or its components out of service. It is the network designer's job to mitigate these situations as much as possible.

Network designers use the availability required by the users, the network administrators, or the applications themselves to determine how much protection and redundancy to build into the network. Table 19-1 shows the maximum downtime that each availability level allows over a 365-day year, for both a 24 x 7 operation and a single-shift operation.

Table 19-1. Network Availability Requirement Association
Availability	24 x 7 Downtime	Single Shift Downtime
90%	36.5 days	208 hr (26 working days)
99%	87 hr, 36 min	20.8 hr (2.6 working days)
99.9% (Three 9s)	8 hr, 45 min	2 hr, 5 min
99.99% (Four 9s)	52.6 min	12.5 min
99.999% (Five 9s)	5 min, 15 sec	75 sec
99.9999% (Six 9s)	31.6 sec	7.5 sec
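The downtime columns in Table 19-1 follow from multiplying the hours in a year by the unavailability (1 minus the availability). The sketch below recomputes them. The 24 x 7 column uses 8760 hours; the single-shift column assumes 8-hour days, 5 days a week, 52 weeks a year (2080 hours). That basis is an assumption inferred because it reproduces the table's single-shift hour figures; the text does not state it.

```python
# Hours per year under the two operating models in Table 19-1.
HOURS_24X7 = 24 * 365              # 8760 hours
HOURS_SINGLE_SHIFT = 8 * 5 * 52    # 2080 hours (assumed single-shift basis)


def allowed_downtime_hours(availability_pct: float, hours_per_year: float) -> float:
    """Hours of downtime per year permitted by an availability percentage."""
    return hours_per_year * (1 - availability_pct / 100)


def fmt(hours: float) -> str:
    """Format a duration in hours as 'Xh Ym Zs', rounded to the nearest second."""
    total_seconds = round(hours * 3600)
    h, rem = divmod(total_seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h}h {m}m {s}s"


for pct in (90, 99, 99.9, 99.99, 99.999, 99.9999):
    full = allowed_downtime_hours(pct, HOURS_24X7)
    shift = allowed_downtime_hours(pct, HOURS_SINGLE_SHIFT)
    print(f"{pct}%: 24x7 {fmt(full)}, single shift {fmt(shift)}")
```

Each additional "9" cuts the allowed downtime by a factor of ten, which is why the jump from three 9s to five 9s is so expensive to engineer.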

Consider the car example again. Its availability was measured for the car as a whole. To improve the car's availability, a mechanic must analyze the causes of its downtime by breaking the car into components, such as the engine and the transmission. Because the car is unusable if either component fails, its total availability is the product of the availability of the individual components, as illustrated in the following formula:

 Availability_TOTAL = (Availability_ENGINE) x (Availability_TRANSMISSION)

Suppose that, of the 72 hours of downtime in the earlier example, engine-related issues caused 42 hours and transmission-related issues caused 30 hours. The availability of each component is then:

 Availability_ENGINE = [8718 / (8718 + 42)] x 100% = 99.5%
 Availability_TRANSMISSION = [8730 / (8730 + 30)] x 100% = 99.7%

The following formula shows the total availability of the car based on the availability of the engine and transmission:

 Availability_TOTAL = (99.5%) x (99.7%) = 99.2%
 [Availability_TOTAL = (0.995 x 0.997) x 100 = 0.992 x 100 = 99.2%]
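The component breakdown can be sketched in Python as follows. It treats the engine and transmission as components in series (the car needs both) and assumes their failures are independent, which is the assumption behind multiplying the availabilities. Because uptime plus downtime is always the full 8760 hours here, each component's availability reduces to (8760 - downtime) / 8760.

```python
from math import prod

HOURS_PER_YEAR = 8760

# Downtime attributed to each component in the car example (hours per year).
downtime_by_component = {"engine": 42, "transmission": 30}

# Uptime / (Uptime + Downtime), where Uptime + Downtime = HOURS_PER_YEAR.
component_availability = {
    name: (HOURS_PER_YEAR - down) / HOURS_PER_YEAR
    for name, down in downtime_by_component.items()
}

# Components in series: the car is usable only if every component is up,
# so the availabilities multiply (assuming independent failures).
total = prod(component_availability.values())

for name, a in component_availability.items():
    print(f"{name}: {a * 100:.1f}%")
print(f"total: {total * 100:.1f}%")
```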

An automobile mechanic can drill down another level to determine which engine and transmission components are affecting the car's availability, and then build in appropriate safeguards, such as more frequent tune-ups and oil changes. In the same way, network administrators and designers can drill down into a network's components to determine where to build in safeguards.

The two-node network shown in Figure 19-1 is used to illustrate the availability measurements that follow.

Figure 19-1. Two-Node Network

Availability can be measured at each component along the path between the two routers, as shown in Figure 19-2.

Figure 19-2. Two-Node Network

Because every component lies on the single path between the two routers, the total availability of this two-node network is the product of the component availabilities:

 Availability_TOTAL = (Availability_ROUTERa) x (Availability_CABLEa) x (Availability_CSU/DSUa) x (Availability_SERIALa) x (Availability_WAN) x (Availability_SERIALb) x (Availability_CSU/DSUb) x (Availability_CABLEb) x (Availability_ROUTERb)
 Availability_TOTAL = (0.999) x (0.9999) x (0.9995) x (0.98) x (0.98) x (0.98) x (0.9995) x (0.9999) x (0.999)
 Availability_TOTAL = (0.938) x 100 = 93.8%
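The same product can be computed in Python from the component values in the text. The sketch also sorts the components by availability, a simple way to see which "key points" contribute most of the downtime; the component labels are illustrative names for the elements in Figure 19-2.

```python
from math import prod

# Component availabilities along the single path in Figures 19-1 and 19-2
# (values from the text), listed from Router A to Router B.
path = [
    ("Router A", 0.999),
    ("Cable A", 0.9999),
    ("CSU/DSU A", 0.9995),
    ("Serial A", 0.98),
    ("WAN", 0.98),
    ("Serial B", 0.98),
    ("CSU/DSU B", 0.9995),
    ("Cable B", 0.9999),
    ("Router B", 0.999),
]

# Linear (serial) network: every component must be up.
total = prod(a for _, a in path)
print(f"Total availability: {total * 100:.1f}%")

# Least available components first: the best candidates for redundancy.
for name, a in sorted(path, key=lambda item: item[1]):
    print(f"{name}: {(1 - a) * 100:.2f}% unavailable")
```

The three 0.98 elements (the two serial links and the WAN) sort to the top, which matches the administrator's decision in the next example to add backup serial links.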

Up to this point, the equations have been based on linear networks; that is, networks with no redundant or backup components. Higher availability of a network can be achieved by adding redundancy at key points in the network (identified in the preceding equation) and deploying a backup solution, such as failover links or hardware.

For example, using the same two-node network illustrated in Figures 19-1 and 19-2, suppose that the network administrator determines that the 94 percent (0.98 x 0.98 x 0.98) combined availability of the WAN and its two serial links does not meet the availability requirements. The administrator adds backup serial links on both sides of the network service provider's WAN, improving network access and uptime for users on each end.

Measuring the availability of a network with redundant components requires a different calculation. The network in Figure 19-3 has two equal paths between Router A and Router C.

Figure 19-3. Availability with Redundant Components

The availability of each router in Figure 19-3 is as follows:

Router A = 99 percent (0.99)
Router B = 97 percent (0.97)
Router C = 98 percent (0.98)
Router D = 95 percent (0.95)

 Availability_ABC = ((0.99) x (0.97) x (0.98)) x 100 = 94.1%

Without the redundant router (Router D), the availability of this network is 94.1 percent. However, this network has two paths between Router A and Router C: one through Router B (Path ABC) and the other through Router D (Path ADC).

To compute the availability of the network with the redundant component (Router D), the availabilities of Router B and Router D must first be combined into a single value. If they were not combined, and all four routers were treated as a linear (serial) chain, the result would be skewed, as the following demonstrates:

 Availability_ABCD = ((0.99) x (0.97) x (0.95) x (0.98)) x 100 = 89.4%

However, Router B and Router D are not needed at the same time: Router D is the backup for Router B, so traffic can reach Router C as long as either router is up. The pair fails only when both routers fail, so the combined availability of Router B and Router D is calculated as follows:

 Availability_B+D = 1 - ((1 - Availability_B) x (1 - Availability_D))
 Availability_B+D = 1 - ((1 - 0.97) x (1 - 0.95))
 Availability_B+D = 1 - ((0.03) x (0.05)) = 1 - 0.0015 = 0.9985
 Availability_B+D = (0.9985) x 100 = 99.85%
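The redundant-pair formula generalizes to any number of components where one is sufficient: the group is down only when every member is down, so its unavailability is the product of the members' unavailabilities. The Python sketch below assumes independent failures; the third 95 percent router in the last line is hypothetical and only shows how another backup would extend the formula.

```python
from math import prod


def parallel_availability(*availabilities: float) -> float:
    """Availability of redundant components where any one is sufficient.

    A = 1 - (1 - A1) x (1 - A2) x ... (assumes independent failures).
    """
    return 1 - prod(1 - a for a in availabilities)


b_plus_d = parallel_availability(0.97, 0.95)  # Router B and its backup, Router D
print(f"B+D: {b_plus_d * 100:.2f}%")

# Hypothetical: a third backup router with 95% availability.
print(f"B+D+X: {parallel_availability(0.97, 0.95, 0.95) * 100:.4f}%")
```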

With the combined availability of Router B and Router D computed, that value replaces Router B's availability in the calculation for the entire network:

 Availability_ABCD = ((0.99) x (0.9985) x (0.98)) x 100 = 96.9%
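The three results for Figure 19-3 (the single path, the incorrect all-serial calculation, and the correct series-parallel calculation) can be compared side by side with two small helpers. The helper names series and parallel are illustrative, and both assume independent failures.

```python
from math import prod

# Router availabilities from Figure 19-3.
router = {"A": 0.99, "B": 0.97, "C": 0.98, "D": 0.95}


def series(*avail: float) -> float:
    """All components required: availabilities multiply."""
    return prod(avail)


def parallel(*avail: float) -> float:
    """Any one component sufficient: 1 minus the product of unavailabilities."""
    return 1 - prod(1 - a for a in avail)


path_abc = series(router["A"], router["B"], router["C"])
# Incorrect: treats the backup Router D as if it were in series with B.
naive = series(router["A"], router["B"], router["D"], router["C"])
# Correct: B and D form a redundant pair in series with A and C.
redundant = series(router["A"], parallel(router["B"], router["D"]), router["C"])

print(f"Path ABC only:       {path_abc * 100:.1f}%")
print(f"B and D in series:   {naive * 100:.1f}%")
print(f"B and D in parallel: {redundant * 100:.1f}%")
```

Composing series and parallel in this way handles any network that can be reduced to nested chains and redundant groups.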

NOTE

Adding redundant components to a network improves availability only if the redundant and primary devices do not share common failure points, such as power supplies, local access, or routing across the network service provider backbone. To mitigate the risk of a network failure, end-to-end diversity must be maintained from the customer entrance facility on one end of a circuit to the entrance facility on the other end.

This same principle of end-to-end diversity applies to local area networks (LANs) as well as wide area networks (WANs). LAN diversity takes the form of dual NICs for servers and workstations, diverse backbone media (such as FDDI or dual Ethernet), or dual routers or switches (with dual power supplies).
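The effect of a common failure point can be sketched with the same series and parallel rules. This hypothetical example assumes the router figures from Figure 19-3 exclude power and that Router B and Router D both draw from a single power feed with 99 percent availability; the feed and its value are invented for illustration. The shared feed sits in series with the redundant pair, because losing it takes down both paths at once.

```python
from math import prod


def series(*avail: float) -> float:
    """All components required: availabilities multiply."""
    return prod(avail)


def parallel(*avail: float) -> float:
    """Any one component sufficient: 1 minus the product of unavailabilities."""
    return 1 - prod(1 - a for a in avail)


router_b, router_d = 0.97, 0.95  # from Figure 19-3
shared_power = 0.99              # hypothetical shared power feed

independent = parallel(router_b, router_d)
with_shared = series(shared_power, parallel(router_b, router_d))

print(f"Independent B+D:     {independent * 100:.2f}%")
print(f"B+D on shared power: {with_shared * 100:.2f}%")
```

Under these assumptions the redundant pair can never be more available than the shared feed, no matter how many backup routers are added, which is why end-to-end diversity matters.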
