Availability | MCSD Self-Paced Training Kit: Analyzing Requirements and Defining Microsoft .NET Solution Architectures, Exam 70-300: Analyzing Requirements and ... Exam 70-300 (Pro-Certification)

Availability is a measure of the amount of time a system or component is online and ready to perform its specified function. For example, an availability requirement for a highly available system might state that the system must be up and operating at an acceptable service level at least 99.9% of the time.
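
To make a percentage such as 99.9% concrete, it helps to convert it into the downtime it permits. The following is a minimal sketch of that arithmetic; the function name `allowed_downtime` and the list of sample targets are illustrative assumptions, not part of any requirement stated here.

```python
from datetime import timedelta


def allowed_downtime(availability_percent: float, period: timedelta) -> timedelta:
    """Return the maximum downtime permitted in `period` at the given availability."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability_percent must be between 0 and 100")
    unavailable_fraction = 1 - availability_percent / 100
    return timedelta(seconds=period.total_seconds() * unavailable_fraction)


# Illustrative targets only; a non-leap year is assumed.
year = timedelta(days=365)
for target in (99.0, 99.5, 99.9, 99.99):
    hours = allowed_downtime(target, year).total_seconds() / 3600
    print(f"{target}% availability allows about {hours:.2f} hours of downtime per year")
```

At 99.9%, the budget works out to roughly 8.76 hours of downtime per year, which is a more useful figure to discuss with users than the percentage alone.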

Although the focus of this section is availability, reliability requirements are also covered because the requirements for availability and reliability share many of the same characteristics. Reliability is the capability of a component or service to perform its required function and be free from failure at a stated instant or over a stated period of time. So availability involves reliability as well as the time required to bring a system back to normal operations after it goes offline because of failure (or for planned maintenance or an upgrade).
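
The relationship between reliability, recovery time, and availability is often summarized with the steady-state formula availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair. This formula is a common convention rather than something defined in this section, and the figures below are hypothetical. A minimal sketch:

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Estimate availability from mean time between failures and mean time to repair."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("mtbf_hours must be positive and mttr_hours non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures: one failure every 1,000 hours on average.
slow_recovery = steady_state_availability(mtbf_hours=1000, mttr_hours=4)
fast_recovery = steady_state_availability(mtbf_hours=1000, mttr_hours=1)
print(f"4-hour recovery: {slow_recovery:.4%}")
print(f"1-hour recovery: {fast_recovery:.4%}")
```

The sketch shows that availability can be improved in two ways: by failing less often (a reliability requirement) or by recovering faster (a recovery requirement).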

Understanding reliability and availability issues and solutions for an application that has not yet been built is not an easy task. Estimating the availability that satisfies the anticipated business service requirements and still meets budget and schedule expectations is often difficult. If the new application replaces an existing one, start by analyzing the currently running application to discover how often and where failures occur, the root causes of these failures, and possible ways to reduce or eliminate them. Determining how many customers to expect and at what times of day they will use your application is also helpful. This information is invaluable in determining requirements for the new application.
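
As an illustration of this kind of analysis, the following sketch summarizes an outage history by root cause and computes the availability actually observed over a period. The record layout, the sample outages, and the function name `summarize_outages` are all hypothetical; a real analysis would draw on whatever logs or incident records the existing application provides.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical outage records: (start, end, root cause).
OUTAGES = [
    (datetime(2024, 1, 3, 14, 0), datetime(2024, 1, 3, 14, 45), "operational error"),
    (datetime(2024, 1, 17, 2, 10), datetime(2024, 1, 17, 2, 25), "hardware failure"),
    (datetime(2024, 1, 22, 9, 30), datetime(2024, 1, 22, 11, 0), "operational error"),
]


def summarize_outages(outages, period_start, period_end):
    """Return (observed availability, total downtime, downtime minutes per cause)."""
    total_down = timedelta()
    minutes_by_cause = Counter()
    for start, end, cause in outages:
        duration = end - start
        total_down += duration
        minutes_by_cause[cause] += duration.total_seconds() / 60
    observed = 1 - total_down / (period_end - period_start)
    return observed, total_down, minutes_by_cause


observed, total_down, minutes_by_cause = summarize_outages(
    OUTAGES, datetime(2024, 1, 1), datetime(2024, 2, 1)
)
print(f"Observed availability: {observed:.3%}, total downtime: {total_down}")
for cause, minutes in minutes_by_cause.most_common():
    print(f"  {cause}: {minutes:.0f} minutes")
```

Ranking downtime by cause points to where new requirements are likely to have the most effect.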

In deciding the appropriate level of availability for your application, you need to understand the expected environment by answering these questions:

Who do you expect your customers to be, and what are their expectations?
How much downtime is acceptable, and at what times of day will it be tolerated? Downtime during peak business hours is rarely tolerated, but is a 15-minute offline window at 4 a.m. for system backups acceptable?
Do internal company processes depend on the service?
What are the schedule and budget?
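
The downtime question can be checked with simple arithmetic. The sketch below computes the availability ceiling implied by a daily maintenance window and tests whether a window overlaps peak hours. The peak hours of 8 a.m. to 6 p.m. are an assumption for illustration, and the overlap check assumes that neither interval crosses midnight.

```python
from datetime import time


def daily_window_availability(window_minutes: float) -> float:
    """Best possible availability if the system is offline this long every day."""
    return 1 - window_minutes / (24 * 60)


def overlaps_peak(window_start: time, window_end: time,
                  peak_start: time, peak_end: time) -> bool:
    """True if the maintenance window intersects peak hours (same-day intervals only)."""
    return window_start < peak_end and peak_start < window_end


# The 15-minute backup window at 4 a.m., checked against assumed peak hours.
print(f"Availability ceiling: {daily_window_availability(15):.2%}")
print("Overlaps peak hours:", overlaps_peak(time(4, 0), time(4, 15), time(8, 0), time(18, 0)))
```

A daily 15-minute window caps availability at about 98.96% if planned downtime counts against the target, so the requirement should state explicitly whether scheduled maintenance is included in the availability figure.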

Depending on how the application is used, very high availability might cost more than it is worth. Many applications simply do not require special engineering and redundant hardware. Misunderstandings between users and the IT organization about business requirements for availability and reliability can cause unnecessary expense and user dissatisfaction. All interested parties must understand the business requirements and the associated cost-benefit factors and then select an appropriate availability level. If the requirements say 99.5% availability, someone should ask, "Is this really necessary?" An alternative to building an application that never fails is to choose a lower level of availability and reliability and accept the consequences of an occasional failure. Before you try to create an application that never fails, be sure somebody needs it. At the same time, if your application is Web-based or mission critical and must be available around the clock because any interruption will cost your company money, your requirements will revolve around high availability and reliability.

Remember that all requirements determined in this phase must also be measured in terms of their cost versus their potential benefits before you decide to include them in the system specifications.
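
One way to frame the cost-versus-benefit comparison is to estimate what downtime costs at the current and proposed availability levels and weigh the difference against the investment needed. The sketch below does this under simplifying assumptions: downtime has a flat hourly cost, and all monetary figures are hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # non-leap year assumed


def annual_downtime_cost(availability_percent: float, cost_per_hour: float) -> float:
    """Expected yearly cost of downtime at a given availability level."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR * cost_per_hour


def net_annual_benefit(current_percent: float, target_percent: float,
                       cost_per_hour: float, annual_investment: float) -> float:
    """Downtime cost avoided by reaching the target, minus the cost of getting there."""
    avoided = (annual_downtime_cost(current_percent, cost_per_hour)
               - annual_downtime_cost(target_percent, cost_per_hour))
    return avoided - annual_investment


# Hypothetical: downtime costs 1,000 per hour; raising availability from
# 99.0% to 99.9% requires 50,000 per year in hardware and engineering.
benefit = net_annual_benefit(99.0, 99.9, cost_per_hour=1000, annual_investment=50000)
print(f"Net annual benefit: {benefit:,.0f}")
```

A negative result suggests that the higher availability level costs more than the downtime it prevents, at least under these assumptions.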

The following is a typical list of the major causes of applications becoming unavailable or unreliable:

Inadequate testing
Change management problems
Lack of ongoing monitoring and analysis
Operational errors
Weak program code
Interactions with external services or applications
Different operating conditions (usage level changes, peak overloads)
Hardware failures (disks, controllers, network devices, servers, power supplies, memory, CPU)

This list is not complete, and some causes, such as natural disasters or other unusual events, can't be avoided. However, being aware of these causes, including hardware failures, is important, especially during requirements definition. By reviewing this list, you should be able to develop requirements that eliminate (or at least minimize) the effect of these causes on your application's availability.

Because many issues of availability (and reliability) are directly related to choosing the right hardware and software technology infrastructure, it might seem that the requirements-gathering phase has a limited effect on maximizing an application's availability. To a certain extent, this might be true, but the commitment to build an application with high availability must exist throughout all stages of the system development life cycle. You need a complete set of requirements to ensure that the correct decisions about hardware and software can be made. Even if you have an existing infrastructure, you'll need those new requirements to verify that the existing infrastructure is adequate. The earlier you address availability and reliability, the fewer headaches you'll have later.