Chapter 6. Business Continuity for Mission-Critical Applications | IP Storage Networking: Straight to the Core

Business continuance encompasses several areas of an organization. Factors to consider when planning for business continuance include risk assessment, location specifics, personnel deployment, network infrastructure, operational procedures, and application and data availability. This chapter focuses on application and data availability.

Varying degrees, or levels, of availability can be achieved for applications based on the tools and technologies employed. The levels of availability range from simple disk failure resilience to quick business resumption after a complete data center outage. The tools and technologies range from disk mirroring to sophisticated automated wide-area failover clusters. Of course, cost and complexity rise with each higher level of availability.

To better understand the levels of availability and the associated array of tools and technologies, we start with the simple case of an application residing on a single server with data on nonredundant DAS and walk through the process of progressively building higher levels of availability for this application.

Backing up the data on a regular schedule to a tape device provides a fundamental level of availability. In case of data loss due to hardware failure or logical data corruption, the application and data can be made available by restoring from backup tapes.
Since storage disks have some of the lowest mean time between failures (MTBF) in a computer system, mirroring the boot disk and using RAID storage provides resilience in case of disk failure.
In the event of a system crash and reboot, using a quick recovery file system speeds up the reboot process, which in turn minimizes the amount of time the application remains unavailable.
Beyond single-system availability, deploying high-availability and clustering software with redundant server hardware enables automated detection of server failure and provides transparent failover of the application to a second server with minimal disruption to end users.
RAID storage and high-availability software do not protect applications from logical data corruption. Although data can be restored from backup tapes, the recovery process can be cumbersome and time-consuming. Keeping online point-in-time copies of data allows a more timely recovery.
The next level of availability deals with data center outage, which falls into the realm of disaster recovery and disaster tolerance. In the event of a data center outage, having replicated data over a WAN to a server at a remote site allows the application to be made available in a relatively short time. This assumes that restoring data from an offsite backup tape is not viable due to the time involved.
Most methods of remote data replication provide a passive secondary standby system. In the event of a primary data center outage, the use of wide-area failover software, which provides failure notification and automated application recovery at the secondary site, ensures a relatively quick and error-free resumption of application service at the secondary site.
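The progression above can be read as a cumulative ladder: each level addresses a failure that the levels below it leave uncovered. The following Python sketch models that reading. The names `Level`, `PROTECTS_AGAINST`, and `stack_for` are hypothetical labels chosen for illustration, and the assumption that every higher level keeps all lower ones in place is a simplification of the walkthrough, not a rule stated in the text.

```python
from enum import IntEnum
from typing import List


class Level(IntEnum):
    """Availability levels in the order the walkthrough introduces them.

    The member names are illustrative, not terminology from the chapter.
    """
    TAPE_BACKUP = 1
    DISK_REDUNDANCY = 2        # mirrored boot disk plus RAID storage
    QUICK_RECOVERY_FS = 3
    LOCAL_CLUSTERING = 4       # high-availability software, redundant servers
    POINT_IN_TIME_COPIES = 5
    REMOTE_REPLICATION = 6
    WIDE_AREA_FAILOVER = 7


# What each level adds, paraphrased from the list above.
PROTECTS_AGAINST = {
    Level.TAPE_BACKUP: "data loss from hardware failure or logical corruption",
    Level.DISK_REDUNDANCY: "failure of a single disk",
    Level.QUICK_RECOVERY_FS: "long reboot times after a system crash",
    Level.LOCAL_CLUSTERING: "failure of a single server",
    Level.POINT_IN_TIME_COPIES: "slow tape-based recovery from logical corruption",
    Level.REMOTE_REPLICATION: "loss of data in a data center outage",
    Level.WIDE_AREA_FAILOVER: "slow manual recovery at the secondary site",
}


def stack_for(target: Level) -> List[Level]:
    """Return every level up to and including the target, lowest first.

    Assumes the levels are cumulative, which is a simplification.
    """
    return [level for level in Level if level <= target]


if __name__ == "__main__":
    # List what a deployment targeting local clustering would include.
    for level in stack_for(Level.LOCAL_CLUSTERING):
        print(f"{level.value}. {level.name}: {PROTECTS_AGAINST[level]}")
```

Encoding the levels as an ordered enumeration makes the cost and complexity trade-off visible: choosing a higher target lengthens the list of technologies that must be deployed and operated.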

Figure 6-1 shows these basic levels of availability in sequence. Fundamentally, providing levels of availability to applications involves managing replicas of data and designing classes of storage for applications and data. Business continuance planning involves identifying key applications; associating metrics with these applications to determine the level of availability required for each application; and implementing tools, technologies, policies, and procedures to attain those levels of availability.

Figure 6-1. Increasing levels of availability requirements.
