System Availability Profiles

   

Companies with lower access demands (businesses that aren't open all hours, such as retail chain stores) might have fewer availability requirements than, for example, national banks, government agencies, and health care providers. The availability needs of the data center equipment should be determined in the project scope. Knowing which devices, or groups of devices, are mission-critical (needed 24x7x365) and which fall into an availability level below mission-critical is important in determining many aspects of data center design, primarily system redundancies. These include:

  • Device redundancies. The number of backup devices that must be available in the event of equipment failures.

  • Power redundancies. The number of feeds from different parts of the grid, the number of UPS systems, etc., that must be installed to make sure these systems stay running.

  • Cooling redundancies. The number of extra HVAC units that must be available in the event that one or more units fail.

  • Network redundancies. The amount of network equipment that must be available in the event of failure. The number of connections to your ISP. The number of network feeds needed to multiple ISPs in the event that one has a catastrophic failure.

In most situations, a data center won't have a single availability profile. Different jobs run on different machines, and some tasks require higher availability than others. Some are highly critical; others need to be highly available but are less critical. Determining the risk of all the operations is key to making many design decisions.

Consider the following example. The Chekhovian Bank of Molière decides to upgrade its computer systems and install a data center to keep up with its massive transaction needs. When deciding how to outfit the data center, the question of how available the equipment must be comes up. The data center supports several operations, each with a different availability profile. Historical data of the company's operations and general trends help to determine the availability profile of its machines.

The following graph shows the Chekhovian Bank's projected availability profile.

Figure 2-1. Availability Profile of the Chekhovian Bank Data Center


Here is an analysis of this profile:

  • ATM transactions, which are highly utilized (mission-critical), must be available around the clock. Redundant systems are essential.

  • Security and equities trading must be constantly available during business hours (mission-critical) and moderately available during the rest of the day. Redundant systems are essential.

  • Home loans are important but some occasional downtime won't be disastrous. Redundancy is a good idea, though this is where corners can be cut.

  • The Community Services Web site should be up and running around the clock so people can access the information, but this is a non-critical service and some downtime won't hurt. Redundancy is probably not worthwhile.

  • The Community Services email mailers are sent only once a week in the evening and, though important, it won't hurt the company if the mailers go out late on occasion. No redundancy is required.

Risk-assessment analysts are hired to look at each part of the profile to determine the cost of downtime in each area and help decide the best course of action. They determine that the servers for ATM transactions and equity trading are mission-critical. Either department going down will cost the bank $500,000 per minute of downtime. Using the RLU model, the data center designer can calculate that these systems require 200kW of electricity. A 200kW generator costs $2 million, and a 20-minute UPS for 200kW costs $450,000. So, for $2.45 million the bank can provide backup power to these configurations. Since a 5-minute outage alone would cost $2.5 million, a generator and a UPS are considered a viable expenditure.
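The arithmetic behind this decision can be written out as a minimal sketch. The dollar figures come from the example above; the constant and function names are illustrative only and are not part of the RLU model or any tool.

```python
# Figures from the bank example; all names here are illustrative.
DOWNTIME_COST_PER_MIN = 500_000   # dollars lost per minute of downtime
GENERATOR_200KW_COST = 2_000_000  # 200kW generator
UPS_200KW_COST = 450_000          # 20-minute UPS for 200kW


def protection_cost(generator_cost, ups_cost):
    """Total one-time cost of the generator plus the UPS."""
    return generator_cost + ups_cost


def outage_loss(cost_per_min, minutes):
    """Revenue lost during an outage of the given length."""
    return cost_per_min * minutes


def break_even_minutes(total_cost, cost_per_min):
    """Minutes of avoided downtime needed to recover the expenditure."""
    return total_cost / cost_per_min


total = protection_cost(GENERATOR_200KW_COST, UPS_200KW_COST)
five_minute_loss = outage_loss(DOWNTIME_COST_PER_MIN, 5)

print(total)             # 2450000
print(five_minute_loss)  # 2500000
print(break_even_minutes(total, DOWNTIME_COST_PER_MIN))  # 4.9
```

Under these figures, the equipment pays for itself after just under five minutes of avoided downtime, which is why a single 5-minute outage is enough to justify it.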

The servers for the Home Loan portion of the bank require 100kW of power, and the risk analysts determine that an outage to this department will cost $5,000 per minute. A 100kW generator would cost $1 million. A 20-minute UPS for 100kW would be $300,000. The risk analysts also went to the Artaudian Power & Electric Company and got historical information on power outages in the area during the last five years. This data shows that the area will average two outages a year, each lasting less than ten minutes. Because the ATM and equity trading groups already need a 200kW 20-minute UPS, that UPS can be upgraded to a 300kW 20-minute UPS for only $150,000. At two 10-minute outages a year, or $100,000 in annual losses, the UPS upgrade will pay for itself in a year and a half. This upgrade is deemed viable but the 100kW generator is not, because recouping its $1 million cost would take 200 minutes of downtime beyond the 20 minutes the UPS already covers.
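The Home Loan figures can be checked the same way. This is a rough sketch that, like the text, treats each outage as lasting the full ten minutes (an upper bound on the historical data); the variable names are illustrative.

```python
# Figures from the Home Loan example; all names here are illustrative.
HOME_LOAN_COST_PER_MIN = 5_000    # dollars lost per minute of downtime
OUTAGES_PER_YEAR = 2              # historical average from the utility
MINUTES_PER_OUTAGE = 10           # upper bound; outages were shorter
UPS_UPGRADE_COST = 150_000        # 200kW -> 300kW 20-minute UPS
GENERATOR_100KW_COST = 1_000_000  # separate 100kW generator

# Expected yearly loss if the Home Loan servers have no backup power.
annual_loss = HOME_LOAN_COST_PER_MIN * OUTAGES_PER_YEAR * MINUTES_PER_OUTAGE

# Years until the UPS upgrade has paid for itself.
ups_payback_years = UPS_UPGRADE_COST / annual_loss

# Minutes of downtime the generator must prevent to recover its cost.
# Only outages longer than the 20-minute UPS window count toward this.
generator_break_even_minutes = GENERATOR_100KW_COST / HOME_LOAN_COST_PER_MIN

print(annual_loss)                   # 100000
print(ups_payback_years)             # 1.5
print(generator_break_even_minutes)  # 200.0
```

Since the historical outages never exceed ten minutes, the UPS alone covers them, and the generator would have no expected downtime left to prevent.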

The systems that run the Community Services Web site and mailers represent no significant loss of revenue for the bank if they are down for even a few days. It is determined that no additional cost for increased availability will be approved for these systems.

The cost of services to increase availability is a continuum. Each step in increasing availability has a cost. At some point, the cost of the next step might not be worth the downtime it prevents. So, the availability profile of a configuration is determined by the cost of having that configuration unavailable. As mentioned at the beginning of the "Budget" section, it is not about providing your customers with what they want. They always want it all. It's about how much money they are willing to spend to get what they want. It's a cost-effective trade-off.

   


Enterprise Data Center Design and Methodology
ISBN: 0130473936
Year: 2002
Pages: 142
Authors: Rob Snevely
