Disaster Recovery and Business Continuity Planning and Processes


The next step in developing the business continuity plan is to identify recovery strategies and select the strategy or strategies that best meet the organization's needs. It is important to remember that the strategy should include the technologies required for recovery and that the policies and procedures should include specific sequencing. The sequence in which systems are recovered is important for ensuring that the organization can function effectively following a disaster. As an example, the organization might need access to the accounting systems and associated accounting functions to facilitate the purchase of equipment associated with a recovery. If the accounting personnel and systems are not brought online first, this could delay the recovery process. Using the results of the BIA, the BCP team should identify both manual and automated processes that are required for the organization to resume business operations. These processes might include notifying personnel and moving them to processing facilities; notifying partners, customers, and shareholders of a disaster; and bringing hardware, software, and data online for use in processing.
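The sequencing requirement described above can be illustrated with a dependency ordering. The following is a minimal sketch, not a prescribed method: the system names and their dependencies are hypothetical and would in practice come from the BIA. It uses Python's standard `graphlib` module (Python 3.9 or later) to produce an order in which every system is recovered after the systems it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical systems, each mapped to the systems that must be online first.
# These names and dependencies are illustrative; real ones come from the BIA.
dependencies = {
    "network": set(),
    "accounting": {"network"},
    "equipment_purchasing": {"accounting"},
    "order_processing": {"network", "equipment_purchasing"},
}


def recovery_sequence(deps):
    """Return an order in which every system follows its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


sequence = recovery_sequence(dependencies)
print(sequence)
```

In this sketch, accounting is recovered before purchasing, which mirrors the example in the text. If the dependencies contain a cycle, `static_order()` raises `graphlib.CycleError`, which would indicate that the recovery sequence needs to be rethought.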

Per ISACA, the classification matrix shown in Table 5.2 can be used to classify the criticality of systems to be recovered. This matrix will help the BCP team identify the best recovery strategies and alternative recovery strategies to be presented to senior management. The selection of the recovery strategy is based on the following:

  • The criticality of the business process and the applications supporting the process

  • The cost of the downtime and recovery

  • Time required to recover

  • Security

Table 5.2. System Classification

Classification | Description

Critical | These functions cannot be performed unless they are replaced by identical capabilities. Critical applications cannot be replaced by manual methods. Tolerance to interruption is very low; therefore, cost of interruption is very high.

Vital | These functions can be performed manually, but only for a brief period of time. There is a higher tolerance of interruption than with critical systems and, therefore, somewhat lower costs of interruption, provided that functions are restored within a certain time frame (usually five days or less).

Sensitive | These functions can be performed manually, at a tolerable cost and for an extended period of time. Although they can be performed manually, it usually is a difficult process and requires additional staff to perform.

Noncritical | These functions can be interrupted for an extended period of time, at little or no cost to the company, and require little or no catching up when restored.
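The classification in Table 5.2 can be read as a small decision procedure. The sketch below is one possible encoding under stated assumptions: the function name, its parameters, and the order of the checks are hypothetical, and the five-day default is taken from the "usually five days or less" guidance for vital systems.

```python
def classify_system(manual_possible, max_manual_days=0,
                    negligible_cost=False, brief_limit_days=5):
    """Map a system to a Table 5.2 classification (illustrative only).

    manual_possible:  can the function be performed manually at all?
    max_manual_days:  how long manual processing can be sustained
    negligible_cost:  interruption costs little or nothing, even long term
    brief_limit_days: assumed cutoff for a "brief" manual period
    """
    if negligible_cost:
        return "Noncritical"
    if not manual_possible:
        return "Critical"
    if max_manual_days <= brief_limit_days:
        return "Vital"
    return "Sensitive"


print(classify_system(manual_possible=False))
print(classify_system(manual_possible=True, max_manual_days=3))
print(classify_system(manual_possible=True, max_manual_days=30))
```

A real classification would weigh more factors than these three inputs, so this should be treated as a reading aid for the table rather than a decision tool.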


A variety of strategies exist for the recovery of critical business processes and their associated systems. The best strategy is one that takes into account the cost of downtime and recovery, the criticality of the system, and the likelihood of occurrence determined during the BIA. In addition to actual recovery procedures, the organization should implement different levels of redundancy so that a relatively small event does not escalate to a full-blown disaster. An example of this type of control is to use redundant routing or fully meshed wide area networks. This redundancy would ensure that network communication will continue if portions of the wide area network are lost. This type of redundancy acts to either remove the threat altogether or minimize the likelihood or the effect of occurrence. These types of controls should be evaluated when developing the business-recovery strategies.
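One way to weigh the cost of downtime and recovery against the likelihood of occurrence is to compare an expected annual cost for each candidate strategy. The following is a simplified sketch: the formula (standing fee plus likelihood times the loss incurred if a disaster occurs), the strategy names, and every figure are assumptions chosen for illustration, not values from ISACA or from any BIA.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    name: str
    annual_fee: float       # standing cost, paid whether or not a disaster occurs
    activation_cost: float  # one-time cost when the strategy is invoked
    downtime_hours: float   # expected outage before processing resumes


def expected_annual_cost(s, annual_likelihood, downtime_cost_per_hour):
    """Standing fee plus the likelihood-weighted loss from one disaster."""
    loss_if_disaster = s.activation_cost + s.downtime_hours * downtime_cost_per_hour
    return s.annual_fee + annual_likelihood * loss_if_disaster


def cheapest(strategies, annual_likelihood, downtime_cost_per_hour):
    """Return the strategy with the lowest expected annual cost."""
    return min(
        strategies,
        key=lambda s: expected_annual_cost(s, annual_likelihood, downtime_cost_per_hour),
    )


# Hypothetical candidates: faster recovery costs more to keep on standby.
strategies = [
    Strategy("fast recovery", 120_000, 20_000, 4),
    Strategy("moderate recovery", 40_000, 15_000, 72),
    Strategy("slow recovery", 10_000, 10_000, 504),
]

print(cheapest(strategies, 0.1, 50_000).name)  # high hourly downtime cost
print(cheapest(strategies, 0.1, 100).name)     # low hourly downtime cost
```

With these made-up numbers, the expensive fast option is preferred when downtime is costly, and the inexpensive slow option is preferred when downtime is cheap. The sketch ignores criticality and security, which the selection criteria above also require.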

The recovery solution might include the use of different types of physical processing facilities and should include agreements and the costs associated with the facility both before and during use.

Hot Sites

A hot site is a facility that is basically a mirror image of the organization's current processing facility. It can be ready for use within a short period of time and contains the equipment, network, operating systems, and applications that are compatible with the primary facility being backed up. When hot sites are used, the staff, data files, and documentation are the only additional items needed in the facility. A hot site is generally the highest cost among recovery options, but it can be justified when critical applications and data need to resume operations in a short period of time. The costs associated include subscription costs, monthly fees, testing costs, activation costs, and hourly or daily charges (when activated). The use of a hot site generally includes connectivity over public networks (WAN or Internet) to enable regular backups and periodic testing to ensure that the hardware and software are compatible.

As with any recovery plan, the hot site should be part of the testing and maintenance procedures. The organization will incur costs associated with a live recovery, which requires the organization's personnel to work onsite at the hot site facility to test the recovery of applications and data. Generally, hot sites are to be used for a relatively short recovery time; they would be used only for a period of a week to several weeks while the primary facility is repaired. The physical facility should incorporate the same level of security as the primary facility and should not be easily identifiable externally (with signs or company logos, for example). This type of external identification creates an additional vulnerability for sabotage. In addition, this facility should not be subject to the same natural disaster that could affect the originating site and, thus, should not be located in proximity to the original site.


Although hot sites are the most expensive type of alternate processing redundancy, they are very appropriate for operations that require immediate or very short recovery times.


Warm Sites

Warm sites contain only a portion of the equipment and applications required for recovery. In a warm site recovery, it is assumed that computer equipment and operating software can be procured quickly in the event of a disaster. The warm site might contain some computing equipment that is generally of a lower capacity than the equipment at the primary facility. Contracting and using a warm site generally costs less than a hot site, but it takes longer to get critical business functions back online. Because of the requirement of ordering, receiving, and installing equipment and operating systems, a warm site might be operational in days or weeks, as opposed to hours with a hot site. The costs associated with a warm site are similar to but lower than those of a hot site and include subscription costs, monthly fees, testing costs, activation costs, and hourly or daily charges (when activated).

Cold Sites

A cold site can be considered a basic recovery site, in that it has the required space for equipment and environmental controls (air conditioning, heating, power, and so on) but does not contain any equipment or connectivity. A cold site is ready to receive the equipment necessary for a recovery but will take several weeks to activate. Of the three major types of off-site processing facilities (hot, warm, and cold), a cold site offers the least, providing only for electricity and HVAC. A warm site improves upon this by providing for redundant equipment and software that can be made operational within a short time.


A cold site is often an acceptable solution for preparing for recovery of noncritical systems and data.
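The trade-off among hot, warm, and cold sites can be sketched as choosing the least expensive site type that can still be activated within the required recovery time. The activation times below (8 hours, 7 days, and 21 days) are assumptions chosen to match the "hours", "days or weeks", and "several weeks" ranges in the text; real figures would come from the vendor agreement.

```python
# (site type, assumed activation time in hours), from least to most expensive.
SITE_TYPES = [
    ("cold", 21 * 24),
    ("warm", 7 * 24),
    ("hot", 8),
]


def cheapest_site_for(required_recovery_hours):
    """Return the least expensive site type that activates in time, or None."""
    for name, activation_hours in SITE_TYPES:
        if activation_hours <= required_recovery_hours:
            return name
    return None  # no site type is fast enough under these assumptions


print(cheapest_site_for(24))       # one day
print(cheapest_site_for(10 * 24))  # ten days
print(cheapest_site_for(30 * 24))  # thirty days
```

A result of `None` suggests that the requirement cannot be met by an alternate site alone and may call for a duplicate processing facility or additional redundancy.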


Duplicate Processing Facilities

Duplicate processing facilities are similar to hot site facilities, with the exception that they are completely dedicated, self-developed recovery facilities. They are typically used by large organizations that have multiple geographic locations. For example, an organization might have a primary site in Washington, D.C., and might designate a duplicate site at one of its own facilities in Utah. The duplicate facility would have the same equipment, operating systems, and applications and might have regularly synchronized data. In this example, the facility can be activated in a relatively short period of time and does not require the organization to notify a third party for activation. Per ISACA, several principles must be in place to ensure the viability of this approach:

  • The site chosen should not be subject to the same natural disaster(s) as the original (primary) site.

  • There must be a coordination of hardware and software strategies. A reasonable degree of compatibility must exist to serve as a basis for backup.

  • Resource availability must be ensured. The workloads of the sites must be monitored to ensure that availability for emergency backup use will not be impaired.

  • There must be agreement on the priority of adding applications (workloads) until the recovery resources are fully utilized.

  • Regular testing is necessary. Even though duplicate sites are under common ownership, and even if the sites are under the same management, testing of the backup operation is necessary.

Reciprocal Agreements

Reciprocal agreements are arrangements between two or more organizations with similar equipment and applications. In this type of agreement, the organizations agree to provide computer time (and sometimes facility space) to one another in the event of an emergency. These types of agreements are generally low cost and can be used between organizations that have unique hardware or software that cannot be maintained at a hot or warm site. The disadvantages of reciprocal agreements are that they are not enforceable, that hardware and software changes are generally not communicated over time (requiring significant reconfiguration in the event of an emergency), and that the sites generally do not employ capacity planning, which may render them useless in the event of an emergency. ISACA recommends that organizations considering a reciprocal agreement clarify the terms of the agreement by answering the following questions:

  • How much time will be available at the host computer site?

  • What facilities and equipment will be available?

  • Will staff assistance be provided?

  • How quickly can access be gained to the host recovery facility?

  • How long can the emergency operation continue?

  • How frequently can the system(s) be tested for compatibility?

  • How will the confidentiality of data be maintained?

  • What type of security will be afforded for information systems operations and data?

  • How much advance notice is required for using the facility?

  • Are there certain times of the year or month when the partner's facilities are not available?


A reciprocal agreement is not usually appropriate as an alternate processing solution for organizations with large databases or live transaction processing.


In reviewing the recovery options, the BCP team should review both the agreements and the facilities to be used in recovery to ensure that they will meet the demands of the organization. The facility should have the capacity (space, network, and infrastructure) to support a recovery and should not be oversubscribed. If a facility is oversubscribed and multiple companies declare a disaster at or near the same time, the facility would not be capable of supporting recovery. The vendor that owns the facility should be able to attest to the reliability of the site, including UPS, number of subscribers, diverse network connectivity, and guarantees of space and availability.
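The oversubscription risk can be made concrete with a small calculation. The sketch below assumes that capacity and each subscriber's requirement can be expressed in a single unit (for example, racks or seats). The function names and figures are hypothetical, and a real review would rely on the vendor's attested numbers.

```python
def oversubscription_ratio(capacity_units, subscriber_needs):
    """Total contracted need divided by what the facility can actually hold."""
    return sum(subscriber_needs) / capacity_units


def guaranteed_simultaneous_declarations(capacity_units, subscriber_needs):
    """Largest k such that ANY k subscribers fit at once.

    Any k subscribers fit exactly when the k largest fit, so the
    needs are accumulated from largest to smallest until capacity runs out.
    """
    used = 0
    count = 0
    for need in sorted(subscriber_needs, reverse=True):
        if used + need > capacity_units:
            break
        used += need
        count += 1
    return count


# Hypothetical facility with 100 units and five subscribers.
needs = [40, 30, 30, 20, 10]
print(oversubscription_ratio(100, needs))
print(guaranteed_simultaneous_declarations(100, needs))
```

With these illustrative figures the facility is contracted for more than it can hold, and only three of the five subscribers are guaranteed space if they declare at the same time. A regional disaster that affects several subscribers at once is the case to ask the vendor about.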

The organization must define procedures and put in place agreements to ensure that needed hardware and software will be available. This might include the use of emergency credit lines or credit cards with banks, agreements with hardware and software vendors, and agreements for backup data. A majority of hardware vendors provide high-response services that guarantee hardware and software availability times. These agreements must be in place before the declaration of an emergency. If the organization maintains off-site backup media, there should be an agreement in place for the procurement and shipping of media to the recovery facility.

The BCP team should develop a detailed plan for recovery. This plan should include roles and responsibilities as well as specific procedures associated with the recovery. The following factors should be considered when developing the detailed plan:

  • Predisaster readiness: Contracts, maintenance and testing, policies, and procedures

  • Evacuation procedures: Personnel, required company information

  • Disaster declaration: What defines a disaster? Who is responsible for declaring?

  • Identification of critical business processes and key personnel (business and IT)

  • Plan responsibilities: Plan objectives

  • Roles and responsibilities: Who is responsible for what?

  • Contract information: Who maintains it, and where is it?

  • Procedures for recovery: Step-by-step procedures with defined responsibilities

  • Resource identification: Hardware, software, and personnel required for recovery

The BCP should be written in clear, simple language and should be understandable to all in the organization. It is important to remember that the plan will be implemented under the worst of circumstances: personnel who are assigned duties may not be available, and those who are available could be under significant emotional stress. When the plan is complete, a copy should be maintained off-site and should be easily accessible.

When the primary components of the plan are in place, it is time to organize the plan. The plan should be organized to address response, resumption, recovery, and restoration. The resources required for a successful recovery include the following:

  • People: Team members, vendors, partners, customers, clients, shareholders, employees, and services

  • Places: Alternative recovery sites, processing locations, off-site storage facilities, vaults, and so on

  • Things: Supplies, equipment (computing, office, voice and data communications), and vital records (data, software, documentation, forms, contracts)

When organizing the plan, the BCP team should define the step-by-step procedures that will take place when a disaster is declared, including notification of the personnel who are responsible for the timely resumption of critical business processes and systems. The team should also incorporate existing policies, procedures, and recovery plans. In addition, the team should define specific training for both key personnel (BCP teams) and employees.

The business continuity plan should be created to minimize the effect of disruptions. The process associated with the development of the plan should include the following steps:

  • Perform a business impact analysis to determine the effect of disruptions on critical business processes

  • Identify, prioritize, and sequence resources (systems and personnel) required to support critical business processes in the event of a disruption

  • Identify recovery strategies that meet the needs of the organization in resumption of critical business functions until permanent facilities are available

  • Develop the detailed disaster-recovery plan for the IT systems and data that support the critical business functions

  • Test both the business continuity and disaster recovery plans

  • Maintain the plan and ensure that changes in business process, critical business functions, and systems assets, such as replacement of hardware, are immediately recorded within the business continuity plan

As an IS auditor, you should review the plan to ensure that it will allow the organization to resume its critical business functions in the event of a disaster. ISACA states that the IS auditor's tasks include the following:

  • Evaluating the business continuity plans to determine their adequacy and currency, by reviewing the plans and comparing them to appropriate standards or government regulations

  • Verifying that the business continuity plans are effective, by reviewing the results from previous tests performed by both IT and end-user personnel

  • Evaluating off-site storage to ensure its adequacy, by inspecting the facility and reviewing its contents, security, and environmental controls

  • Evaluating the ability of IT and user personnel to respond effectively in emergency situations, by reviewing emergency procedures, employee training, and results of their tests and drills



Exam Cram 2. CISA
ISBN: B001EEFNHG
EAN: N/A
Year: 2005
Pages: 146
