The objective of business continuity is to facilitate uninterrupted business support despite the occurrence of problems. How this is achieved depends on the problem, of course: it could be a small problem affecting only a single user application through an inoperative disk drive, or a complete system failure where the outage is caused by the entire SAN or NAS configuration being down. It could also be a site disaster affecting the entire data center. Certainly, the categorization of the problem is important, but the plans in place should provide a road map, the resources, and the ability to recover from any incident.
Storage has traditionally played a key role in any recovery scenario. Without the data, the applications are useless. Consequently, most recovery scenarios center on making the data available as quickly as possible so business applications can continue the company's operations. At the risk of stating the obvious, this requires the ability to replicate data throughout an infrastructure that supports everything from a micro recovery to a macro disaster site deployment.
Traditionally, with storage tied directly to the server, storage problems and outages were treated as associated problem instances: they were viewed as server outages and were therefore combined with all the other problems and outages associated with a server that's down or inoperative. As storage networking configurations create their own infrastructures, supporting more independent external components, they become active participants in the management disciplines. This redefines the data center environment and must be reflected in the continuity planning.
Figure 21-2 reflects the change from associated server/storage components to separate and discrete server, network, and storage components. This describes a data center that has divided its processing environment into three discrete infrastructures: servers, networks, and storage. Although many data centers have evolved into viewing and managing their environment this way, the separation of responsibility by operating environment (for example, UNIX, Windows, MVS, z/OS, and Linux) must contend with the integration effects of heterogeneous storage networks.
The plans that are designed and implemented all depend on the magnitude of the problem incident. Therefore, it's important to distinguish and categorize the types of storage problems you are likely to encounter as well as larger problems that affect an entire infrastructure. This analysis provides a realistic view of the amount of insurance (redundancy, failover, and recovery methods) you need.
The interruptions in service that most commonly reflect outages can easily be lumped into two distinct categories: hardware failures and software problems. Hardware failures can be categorized through the inventory of storage networking devices and subcomponents, as well as the subsequent development of a matrix that reflects the causal relationship to each. Software problems can be organized the same way, although the causal relationships may assume complexities that are unnecessary to the task at hand. Because of this, it's best to be reasonable about what the system can and can't do when developing this list.
A useful enhancement to existing systems information is a matrix of SAN hardware components. Alongside necessary items such as component, name, vendor, serial number, and model, this type of list provides an excellent place to assign each component a priority designation in case of failure. Given our plug-and-play, component-driven world, most problem resolutions come down to swapping a field replaceable unit (FRU), which provides an excellent starting point for categorizing the list. Readily accessible and up-to-date SAN hardware information will enhance all management activities associated with the SAN hardware.
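As a rough illustration, the hardware matrix described above could be kept as a small structured inventory. The sketch below is only one possible shape: the field names follow the items listed in the text, while the priority scale (1 = restore first), the `is_fru` flag, and every sample entry (names, vendors, serial numbers, models) are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SanComponent:
    component: str      # kind of device, e.g. "switch", "HBA", "disk drive"
    name: str
    vendor: str
    serial_number: str
    model: str
    priority: int       # assumed scale: 1 = restore first
    is_fru: bool        # True if the fix is swapping a field replaceable unit


# Illustrative entries only; none of these describe real equipment.
INVENTORY = [
    SanComponent("switch", "core-sw-01", "VendorA", "SN-0001", "FS-16", 1, True),
    SanComponent("HBA", "db01-hba0", "VendorB", "SN-0002", "HB-2G", 2, True),
    SanComponent("disk array", "array-01", "VendorC", "SN-0003", "DA-500", 1, False),
    SanComponent("disk drive", "array-01-d07", "VendorC", "SN-0004", "DD-73", 3, True),
]


def recovery_order(inventory):
    """Sort by priority; within a priority, list FRU-replaceable items first."""
    return sorted(inventory, key=lambda c: (c.priority, not c.is_fru, c.name))


def fru_list(inventory):
    """Return only the components that are resolved by replacing a FRU."""
    return [c for c in inventory if c.is_fru]


if __name__ == "__main__":
    for c in recovery_order(INVENTORY):
        print(c.priority, c.component, c.name, "FRU" if c.is_fru else "non-FRU")
```

Keeping the priority and FRU designation in the same record as the serial number and model means a single lookup answers both "what is this part?" and "how urgently must it be restored?".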
A corresponding matrix for SAN software configurations can be just as important. Knowing the relationship between switch operating system releases, node firmware, and server OS releases is necessary for problem identification and resolution. Keep in mind that the SAN and NAS will place the overall solution in a multivendor scenario. The more information you have at hand, the better you can facilitate the problem resolution activities. Like the hardware matrix, the software list and categorization can articulate a level of priority and failure relationship to software elements as well as affected hardware components.
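The software matrix can be sketched in the same spirit. The example below is an assumption-laden illustration, not a vendor format: the three `kind` values mirror the switch OS, node firmware, and server OS relationship named in the text, while the element names, release strings, hardware names, and the `QUALIFIED_COMBINATIONS` set are all hypothetical. It also assumes exactly one element of each kind, which a real configuration would not.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoftwareElement:
    name: str
    kind: str        # "switch_os", "node_firmware", or "server_os"
    release: str
    vendor: str
    priority: int    # assumed scale: 1 = most critical
    affects: tuple   # names of hardware components this software runs on or drives


# Illustrative entries only; releases and vendors are invented.
SOFTWARE = [
    SoftwareElement("fabric-os", "switch_os", "4.1", "VendorA", 1, ("core-sw-01",)),
    SoftwareElement("hba-fw", "node_firmware", "2.0", "VendorB", 2, ("db01-hba0",)),
    SoftwareElement("db01-os", "server_os", "5.8", "VendorD", 2, ("db01-hba0", "array-01")),
]

# Hypothetical (switch_os, node_firmware, server_os) release sets known to work together.
QUALIFIED_COMBINATIONS = {("4.1", "2.0", "5.8"), ("4.0", "2.0", "5.8")}


def software_for(hardware_name, software=SOFTWARE):
    """List software elements tied to a hardware component, most critical first."""
    hits = [s for s in software if hardware_name in s.affects]
    return sorted(hits, key=lambda s: (s.priority, s.name))


def current_combination(software=SOFTWARE):
    """Collect the installed release per kind (assumes one element per kind)."""
    by_kind = {s.kind: s.release for s in software}
    return (by_kind["switch_os"], by_kind["node_firmware"], by_kind["server_os"])


def is_qualified(software=SOFTWARE):
    """Check the installed release set against the known-good combinations."""
    return current_combination(software) in QUALIFIED_COMBINATIONS
```

During problem identification, `software_for("db01-hba0")` would point to both the HBA firmware and the server OS as candidates, and `is_qualified()` flags whether the installed releases match a combination already known to work.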
Laying the groundwork for business continuity planning in this manner allows you to view the enterprise at a macro level. This enables the identification of a set of requirements to determine how a storage networking infrastructure fits into the data center environment. This in turn guides the level of planning that needs to occur at the micro level of storage continuity as you deal with storage recovery and, more specifically, the details surrounding storage networking.
If we refer to Figure 21-1, the ability to analyze the macro level is fairly simple. For example, go through the failure points from an external view first. This can be done, as shown in Figure 21-2, by performing a 'what if' scenario on potential network outages. If the leased lines into the NAS servers go down, what applications go offline? Asking further questions qualifies the configuration for potential continuity insurance. For example, is the affected application or application service level compromised? If so, what is the recovery scenario?
In this manner, by working through the external scenarios, you can begin to develop a picture of potential fault points, their effect, and the basis for continuity insurance. In our example, if the leased lines connecting any or all of the remote offices are down, the service level of those offices is compromised. Therefore, a level of recovery can be planned as insurance against this type of incident.
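The 'what if' walk-through can also be expressed as a small dependency traversal. The sketch below assumes a hand-built map of which items depend directly on which components; the component and application names are invented, and the model treats every dependency as a single point of failure (it does not account for redundant paths).

```python
from collections import deque

# Hypothetical dependency map: each key is a component, and each value lists
# the items that depend directly on it.
DEPENDENTS = {
    "leased-line-east": ["nas-access-east"],
    "leased-line-west": ["nas-access-west"],
    "nas-server-01": ["nas-access-east", "nas-access-west"],
    "nas-access-east": ["east-order-entry"],
    "nas-access-west": ["west-order-entry", "west-reporting"],
}

# Which of the names above are business applications (also hypothetical).
APPLICATIONS = {"east-order-entry", "west-order-entry", "west-reporting"}


def what_if(failed, dependents=DEPENDENTS):
    """Breadth-first walk returning everything downstream of a failed component."""
    seen = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def applications_offline(failed, dependents=DEPENDENTS):
    """Answer the 'what applications go offline?' question for one failure."""
    return sorted(what_if(failed, dependents) & APPLICATIONS)


if __name__ == "__main__":
    for component in ("leased-line-west", "nas-server-01"):
        print(component, "->", applications_offline(component))
```

Each component whose failure takes applications offline is a candidate fault point, and the list of affected applications is the starting point for deciding how much continuity insurance that component deserves.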
Planning and design of business continuity processes, resources, and environments require the participation of a diverse set of individuals. Although it's imperative to have the business users present, this may not always be possible. At the very least, however, your colleagues in the data center should participate in, support, and even drive the plans and processes you develop. After all, the storage infrastructure is a key requirement for the continuation of automated business processes.
Without participation and input from the business users you support, the plans you develop are considered 'out-of-context.' While that should not end your continuity planning, it leaves the judgment of business application value, and the assignment of the recovery and redundant resources needed to keep the storage infrastructure available, entirely within the IT organization. Out-of-context planning usually gets attention when the cost of business continuity is presented to the company.
The following includes additional detail and best practices when planning within your IT organization or within the user community:
Planning out of context: Planning this way makes it difficult to determine the financial and operational impact of application outages, a difficulty that can be reduced through effective planning sessions with application designers and development organizations. It's best, as stated previously, to provide a robust continuity plan that encompasses all the production processing and related data, and then present this to company management with the associated cost. This will quickly lead to participation of the end users who ultimately have to bear the burden of these costs.
Planning within context: Planning continuity with the participation of the end users presents a much more realistic view of recovery and redundancy requirements. In addition to any formalized business continuity project, this should be integrated into existing capacity planning activities. Compared with formalized meetings, this offers a less intrusive method of gathering the information. It can be accomplished through activities such as worksheet-based surveys of end users that provide a macro view of the business impact and service level requirements.
Planning with our colleagues: At a minimum, the participation of your colleagues within the data center is imperative. As pointed out in the discussion of out-of-context planning, application design and development management should be involved to provide additional analysis of the value of the data and application service levels.