Disaster Recovery and Business Continuity Plans

Despite preparations to minimize the impact of risk, there may be times when a major event occurs that could jeopardize your business. To deal with such disasters, a proactive approach needs to be taken to ensure the business will continue to function no matter what the circumstances. Business continuity is a process that identifies the key functions of an organization and the threats most likely to endanger them, and creates processes and procedures that ensure these functions will not be interrupted (at least not for long) in the event of an incident. It involves restoring the normal functions of all business operations, so that every element of the business can be fully restored.

Business continuity plans are a collection of different plans that are designed to prevent disasters and provide insight into recovering from disasters when they occur. Some of the plans that may be incorporated into a business continuity plan include:

  • Disaster Recovery Plan   Provides procedures for recovering from a disaster after it occurs, and addresses how to return normal IT functions to the business.

  • Business Recovery Plan   Addresses how business functions will resume after a disaster at an alternate site.

  • Business Resumption Plan   Addresses how critical systems and key functions of the business will be maintained.

  • Contingency Plan   Addresses what actions can be performed to restore normal business activities after a disaster, or when additional incidents occur during this process.

These plans have similarities, but each addresses a different part of how the organization will deal with disasters and resume business operations.

Because a business continuity plan focuses on restoring the normal business functions of an entire business, it is important that critical business functions are identified. This will establish the scope of the plan and show what elements of the company need to be addressed. Each department of the company should identify the requirements that are critical for them to continue functioning, and determine which of the functions they perform are critical to the company as a whole. If a disaster occurs, the business continuity plan can then be used to restore those functions.

The business continuity plan should address as many as possible of the different types of disasters and problems that may affect the company's ability to function. As discussed earlier, this may include natural disasters, personnel problems like strikes, infrastructure failures, telecommunication problems, and issues related to technology (such as server failures). In addressing these issues, the security administrator will be able to determine which elements of the business departments depend on most, and which would adversely affect the company if they were lost.

A business impact assessment should be created to determine the influence different events would have on key functions of a business. For example, while terrorism may affect the company as a whole, a server failing in a branch office may only have an impact on that location. Identifying how the event will impact areas of the company provides a better opportunity to prioritize threats and deal with them accordingly.

Part of the business impact assessment is determining which functions of a business are more critical than others. While every department of a company may view their needs as most important, it is up to those creating the business continuity plan to determine which ones are crucial to normal business practices. For example, a telemarketing company's need for telephone communications would probably be a higher priority than their need to surf the Internet. By setting such priorities, the administrator can establish which elements of the business need to be restored first.

Since people will want to know when they can resume their jobs, a business impact assessment also establishes estimates of how long it will take for different parts of the business to be made available again. Estimates can be made by looking at historical data, or by contacting third parties who will need to reestablish certain elements of the business. For example, they could look at how long it took for certain systems to be initially installed or for data to be restored to a server from a backup. They could also contact third parties and request estimates on how long they would take to reestablish services like telephone, Internet access, and so forth.

By creating enough of these estimates, a timeline can be created as to how long it will take to restore the company to the point where it can continue doing business. For example, if the administrator knows that it will take one hour to set up equipment, two hours to restore data to a server, and three hours to have telephone communications, then they have a timeline of when these different elements of the business will be restored. In a disaster, people can look at this timeline and have an idea of when certain functions will be restored, and work around them.
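The timeline arithmetic in this example can be sketched in a few lines of Python. The durations come from the example above; the task names, the assumption that data cannot be restored until the equipment is set up, and the assumption that telephone service is restored in parallel are illustrative only.

```python
# Hypothetical recovery tasks: name -> (duration in hours, prerequisite task or None).
# Durations come from the example in the text; the dependencies are assumptions.
TASKS = {
    "set up equipment": (1, None),
    "restore data to server": (2, "set up equipment"),
    "telephone communications": (3, None),
}


def completion_times(tasks):
    """Return the hour, counted from the start of recovery, at which each task finishes."""
    finished = {}

    def finish(name):
        if name not in finished:
            duration, prerequisite = tasks[name]
            start = finish(prerequisite) if prerequisite else 0
            finished[name] = start + duration
        return finished[name]

    for name in tasks:
        finish(name)
    return finished


timeline = completion_times(TASKS)
for name, hour in sorted(timeline.items(), key=lambda item: item[1]):
    print(f"{name}: available about {hour} hour(s) after recovery begins")
```

Under these assumptions the equipment is ready after one hour, and both the data and the telephones are available after three. If the tasks had to be done one after another instead, the durations would simply be added together.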

For the business continuity plan to work effectively, it is important that budgets be created to establish how much money will be assigned to individual components. For example, while IT systems may be a key function, the corporate intranet may be a luxury and not essential to business operations. In the same light, while the existing server room may use biometrics to control access, the cold site facility may only provide a locked closet for security. This raises another important point: just because a system is being recovered to a previous state may not mean that things will be exactly the same as before.

When generating the various strategies making up the business continuity plan, it is important to include person(s) responsible for information security and members of the local security response team. These members of the organization would be responsible for responding to incidents when they occur, and restoring equipment, software, and data. As seen in the following sections, a number of policies and procedures will be used in the business continuity plan to focus on restoring technologies. To resume normal business functions, strategies need to be created for setting up new servers, restoring data to those servers, reestablishing communication to other parts of a wide area network (WAN), and other issues of a technical nature. It would be remiss to exclude those with experience and/or training in these areas.

Test Day Tip 

A disaster recovery plan focuses on restoring information systems, while a business recovery plan addresses restoring key business functions that are needed to conduct business.

Another important point to be aware of when creating such plans and responding to disasters is that additional measures may need to be taken to protect systems from harm. While safeguards should have been implemented before a disaster to prevent vulnerabilities from being exploited, safeguards may also need to be implemented after a disaster occurs. Sometimes vulnerabilities go unnoticed until after problems arise. Once a disaster occurs, however, areas that could have been protected but were not become clearer. For example, if a hacker breaks into a server through a service that was not required, restoring this unneeded service on a replacement server would mean making the same mistake twice. Changing systems to remove vulnerabilities will not protect a system from a disaster that has already happened, but it will protect the system from repeat attacks.

Disaster Recovery Plan

Disaster recovery plans are documents that recognize potential threats, and provide guidance on how to deal with such events when they occur. When creating a disaster recovery plan, it is important to try to identify all the different types of threats that may affect the company. Disasters include such potential threats as terrorism, fire, flooding, hacking, and other incidents. Once the disasters a company could face are determined, they can then create procedures to minimize the risk of such disasters.

The issues dealt with in a disaster recovery plan may address a wide variety of subjects relating to restoring technologies and business functionality. It looks at how such areas can be recovered quickly, with the most business-critical requirements taken care of first. For example, if a company depends on sales from an e-commerce site, then restoring this server would be the primary focus. This would allow customers to continue viewing and purchasing products, while other systems are being restored. In doing so, the company is able to resume the most critical functions of the business first, while less significant functions are being recovered.

When creating the disaster recovery plan to be used by an organization, it is important that it does not violate any existing policies, regulations, or laws. Some companies must adhere to certain rules or guidelines if they are to remain in business, and failing to meet these requirements can cause more harm than the disaster that activates the plan. For example, a hospital may be required to use certain technologies or adhere to certain criteria so that patient information is kept confidential. If they were restoring elements of the network and did not adhere to these requirements, it is possible that lawsuits could result. In other situations, the business might even be shut down for failing to abide by certain regulations or legislation.

A disaster recovery plan may incorporate or include references to other policies, procedures, or documents. For example, a company may have an incident response policy (discussed later in this chapter) that outlines who is to be called and how to deal with certain incidents. Other documentation may provide information on infrastructure, procedures to be followed to fix problems, and other important data. By including or referencing other policies, procedures, and documents, those involved in the disaster recovery will be able to find the information they need to solve problems quickly.

Disaster recovery plans deal with recovering from many different types of disasters, so it follows that different types of resources will need to be addressed. Elements that must be considered are:

  • Data

  • Equipment

  • Software

  • Personnel

  • Facilities

Omitting any aspect that is necessary to the recovery of the business could be detrimental and prevent normal business functions from being reestablished.

Dealing with damaged equipment varies in complexity, depending on the availability of replacements and the steps required to restore the necessary resources. Some companies may have additional servers with identical configurations to damaged ones for use as replacements when incidents occur. Other companies may not be able to afford such measures or may not have enough additional servers to replace damaged ones. In such cases, they may have to put data on other servers and then configure applications and drive mappings so the data can be accessed from the new location. Whatever the situation, they should try to anticipate such instances in their disaster recovery plan, and devise contingency plans to deal with such problems when they arise.

The cost of applications and operating systems can make up a considerable part of a company's operating budget. To deal with the potential loss of necessary software, copies of programs and their licenses should be kept offsite so that they can be used when systems need to be restored. Configuration information should also be documented and kept offsite so that it can be used to return the system to its previous state.

Because hardware and software may not be easily installed and configured, a company may need to have outside parties involved. As such, they should check their vendor agreements to determine whether they provide onsite service within hours or days, as waiting for outsourced workers can present a significant bottleneck in restoring the system. Companies do not want to be surprised by such delays when a disaster occurs, so preparing for such possibilities is the key to readiness.

Personnel are another important consideration when creating a disaster recovery plan. Certain members of the company may have distinct skill sets, and their unavailability can cause a major loss. If a person is injured, dies, or leaves a company, their knowledge and skills are also gone. Imagine a network administrator getting injured, with no one else fully understanding how to perform that job. This would have a major impact on any recovery plans. Thus, it is important to have a secondary person with comparable skills who can replace important personnel, and to have documentation on systems architecture and other elements related to recovery, as well as clear procedures to follow in performing important tasks.

When considering the issue of personnel, members should be designated who will be part of an incident response team that will deal with disasters when they arise. Members should have a firm understanding of their roles in the disaster recovery plan and the tasks they will need to perform to restore systems. A team leader should also be identified, so a specific person will be responsible for coordinating efforts.

If a team already exists, they should be included in preparing the disaster recovery plan and testing it. Their insight may prove crucial to developing a plan that works. It is also important that they perform "dry runs" of the disaster recovery plan to ensure that developed strategies work as expected, and revise any steps that are ineffective.

A disaster recovery plan approaches risks proactively, setting up controls that can be used in the event of a disaster. This requires preparation and foresight, including having data backed up regularly, keeping needed information and tools offsite, and having the necessary facilities to recover normal business functions. Failing to do so could mean the business would be unable to recover properly from any disaster that befalls it.

Backups

Preparation for disaster recovery begins long before a disaster actually occurs, so backups of data need to be performed daily to ensure data can be recovered if needed. Backing up data is a fundamental part of any disaster recovery plan. When data is backed up, it is copied to a type of media that can be stored in a separate location. The type of media will vary depending on the amount of data being copied, but can include digital audio tape (DAT), digital linear tape (DLT), compact discs (CD-R/CD-RW), or even floppy disks. When data is destroyed, it can then be restored as if nothing had happened.

When making backups, the administrator needs to decide what data will be copied to alternative media. Critical data, such as trade secrets that the business relies on to function, and other important data crucial to business needs must be backed up. Other data, such as temporary files and applications, may not be backed up, as it can easily be reinstalled or would not be missed if it were lost. Such decisions, however, will vary from company to company.

Once the administrator has decided on what information needs to be backed up, they can then determine the type of backup that will be performed. Common backup types include:

  • Full Backup   Backs up all data in a single backup job. Generally, this will include all data, system files, and software on a system. When each file is backed up, the archive bit is cleared to indicate that the file was backed up.

  • Incremental Backup   Backs up all data that has changed since the last full or incremental backup. Because only files that have changed are backed up, this type of backup takes the least amount of time to perform. When each file is backed up, the archive bit is cleared.

  • Differential Backup   Backs up all data that has changed since the last full backup. When this type of backup is performed, the archive bit is not changed, so each differential backup will contain the same information as the previous differential backup plus any additional files that have changed.

  • Copy Backup   Makes a full backup but does not change the archive bit. Because the archive bit is not changed, it will not affect any incremental or differential backups that are performed.
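The way each backup type treats the archive bit can be illustrated with a toy model. This is a sketch, not how any particular backup product works: the file names are hypothetical, and a dictionary of booleans stands in for the archive bit that a real file system stores as a file attribute.

```python
# Toy model: each file name maps to its archive bit (True = changed since last marked).
def run_backup(files, kind):
    """Return the sorted list of files copied, updating archive bits as each type would."""
    if kind in ("full", "copy"):
        copied = sorted(files)                      # everything is copied
    elif kind in ("incremental", "differential"):
        copied = sorted(name for name, bit in files.items() if bit)
    else:
        raise ValueError(f"unknown backup type: {kind}")
    if kind in ("full", "incremental"):             # only these two clear the bit
        for name in copied:
            files[name] = False
    return copied


files = {"a.doc": True, "b.xls": True, "c.txt": True}
monday = run_backup(files, "full")              # copies all three, clears every bit
files["a.doc"] = True                           # a.doc is modified on Tuesday
tuesday = run_backup(files, "differential")     # copies a.doc, leaves its bit set
files["b.xls"] = True                           # b.xls is modified on Wednesday
wednesday = run_backup(files, "differential")   # copies a.doc and b.xls
thursday = run_backup(files, "incremental")     # copies a.doc and b.xls, clears both bits
friday = run_backup(files, "incremental")       # nothing has changed, so nothing is copied
```

The Wednesday differential repeats Tuesday's file because the bit was never cleared, while the Friday incremental is empty because Thursday's incremental cleared it.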

Because different types of backups will copy data in different ways, the methods used to back up data vary between businesses. One company may take daily full backups, while another may use a combination of full and incremental backups (or full and differential backups). This will affect how data is recovered, and what tapes need to be stored in alternative locations. Regardless of the type used, however, it is important that data is backed up on a daily basis, so large amounts of data will not be lost in the event of a disaster.

Rotation Schemes

It is important to keep at least one set of backup tapes offsite, so that all of the tapes are not kept in a single location. If backup tapes are kept in the same location as the servers that were backed up, all of the data (on the server and the backup tapes) could be destroyed in a disaster. By rotating backups between different sets of tapes, data is not always being backed up to the same tapes, and a previous set is always available in another location.

A popular rotation scheme is the Grandfather-Father-Son (GFS) rotation, which organizes rotation into a daily, weekly, and monthly set of tapes. With a GFS backup schedule, at least one full backup is performed per week, with differential or incremental backups performed on other days of the week. At the end of the week, the daily and weekly backups are stored offsite, and another set is used through the next week. To understand this better, assume a company is open from Monday through Friday. As shown in Table 5.1, a full backup of the server's volumes is performed every Monday, with differential backups performed Tuesday through Friday. On Friday, the tapes are then moved to another location, and another set of tapes is used for the following week.

Table 5.1: Sample Weekly Backup Schedule

Day          Backup Performed
Sunday       None
Monday       Full Backup
Tuesday      Differential
Wednesday    Differential
Thursday     Differential
Friday       Differential, with week's tapes moved offsite
Saturday     None

Because it would be too expensive to continually use new tapes, old tapes are reused for backups. A tape set for each week of the month would be rotated back into service and reused. For example, at the beginning of each month, the tape set for the first week of the previous month would be rotated back into service, and used for that week's backup jobs. Because one set of tapes is used for each week of the month, most sets of tapes are kept offsite at any given time. Even if one set were corrupted, the set of tapes for the previous week could still be used to restore data.

Test Day Tip 

Grandfather, Father, and Son backups may be confused with one another. Remember that, just like your own family tree, there are more members with each generation. There may be only one grandfather (a single monthly full backup), multiple fathers (weekly full backups), and even more sons (daily backups). Remembering that each generation is larger than the one before may help you recall the role each plays in a backup routine.

In the GFS rotation scheme, the weekly full backup is considered the Father, and the daily backup is considered the Son. The Grandfather segment of the GFS rotation is an additional full backup that is performed monthly and stored offsite. The Grandfather tape is not rotated back into regular service, but is stored offsite for long-term retention. Each of the Grandfather tapes can be kept for a specific amount of time (such as a year), so that data can be restored from previous backups, even after the Father and Son tapes have been rotated back into service. If someone needed data restored from several months ago, the Grandfather tape enables a network administrator to retrieve the required files.
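As a rough illustration, the roles in a GFS rotation can be assigned from the calendar date. The sketch below follows the sample schedule of a Monday full backup and differentials from Tuesday through Friday. The text does not say on which day the monthly Grandfather backup is taken, so placing it on the first Monday of the month is an assumption, as is the function name.

```python
from datetime import date


def gfs_role(day):
    """Classify a calendar day under the sample Monday-to-Friday schedule."""
    weekday = day.weekday()          # Monday is 0, Sunday is 6
    if weekday >= 5:
        return "None"                # the business is closed on weekends
    if weekday == 0:
        # Assumption: the monthly Grandfather backup is taken on the first Monday.
        if day.day <= 7:
            return "Grandfather (monthly full)"
        return "Father (weekly full)"
    return "Son (daily differential)"


for sample in (date(2003, 6, 2), date(2003, 6, 9), date(2003, 6, 10), date(2003, 6, 7)):
    print(sample.isoformat(), sample.strftime("%A"), "->", gfs_role(sample))
```

A real schedule would also record which physical tape set is in use each week, which this sketch leaves out.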

Recovery

A backup is only as good as its ability to be restored. Too often, backup jobs are routinely performed, but the network administrator never knows whether the backup was performed properly until the data needs to be restored. To ensure that data is being backed up properly and can be restored correctly, test restores of data to the server should be performed. This can be as simple as attempting to restore a directory or small group of files from the backup tape to another location on the server.
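One way to check a test restore is to compare checksums of the restored files against the originals. The sketch below is a minimal illustration using only the Python standard library; the directory and file names are hypothetical, and a plain directory copy stands in for the backup and restore steps that real backup software would perform.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def verify_restore(original_dir, restored_dir):
    """Return relative paths that are missing or differ in the restored copy."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file() or sha256_of(source) != sha256_of(restored):
            problems.append(str(relative))
    return problems


# Temporary directories stand in for the server and the test-restore location.
with tempfile.TemporaryDirectory() as work:
    server = Path(work) / "server"
    restored = Path(work) / "restored"
    server.mkdir()
    (server / "payroll.csv").write_text("id,amount\n1,100\n")
    (server / "notes.txt").write_text("quarterly review\n")
    shutil.copytree(server, restored)                 # stands in for backup + restore
    clean_result = verify_restore(server, restored)
    (restored / "notes.txt").write_text("corrupted")  # simulate a bad restore
    damaged_result = verify_restore(server, restored)

print("clean restore problems:", clean_result)
print("damaged restore problems:", damaged_result)
```

Comparing against live files only works for data that has not changed since the backup was taken. For data that changes often, checksums recorded at backup time would be a better reference.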

As part of the disaster recovery plan, the administrator needs to determine how data will need to be restored from backups. As seen earlier, there are different types of backups that can be performed. Each of these will take differing lengths of time to restore, and may require additional work.

When only full backups are performed, all of the files are backed up to a tape or other media. As the backup job can fit on a single tape (or set of tapes), the administrator may only need to restore the last backup tape or set that was used. Full backups back up everything, so additional tapes are not needed.

Incremental backups take the longest to restore. Each incremental backup contains only the data that changed since the previous backup, so many tapes may have been used since the last full backup was performed. When this type of backup is used, the last full backup and each incremental backup made since then need to be restored.

Differential backups take less time and fewer tapes to restore than incremental backups. Because differential backups back up all data that was changed since the last full backup, only two tapes (or tape sets) are needed to restore a system: the tape containing the last full backup and the tape containing the most recent differential backup.
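The restore rules for the three approaches can be expressed as a short function. This is a sketch under the assumption that a schedule uses either incrementals or differentials after a full backup, not a mixture; the function name and the tape labels are hypothetical.

```python
def tapes_to_restore(history):
    """Given (label, kind) backups in chronological order, list the tapes to restore.

    kind is "full", "incremental", or "differential".
    """
    full_positions = [i for i, (_, kind) in enumerate(history) if kind == "full"]
    if not full_positions:
        raise ValueError("no full backup available to restore from")
    last_full = full_positions[-1]
    later = history[last_full + 1:]
    incrementals = [label for label, kind in later if kind == "incremental"]
    differentials = [label for label, kind in later if kind == "differential"]
    if incrementals and differentials:
        raise ValueError("mixed schedules are outside the scope of this sketch")
    # Every incremental since the full backup is needed, but only the newest differential.
    return [history[last_full][0]] + incrementals + differentials[-1:]


full_only = tapes_to_restore([("Mon", "full"), ("Tue", "full")])
incremental_week = tapes_to_restore(
    [("Mon", "full"), ("Tue", "incremental"), ("Wed", "incremental"), ("Thu", "incremental")]
)
differential_week = tapes_to_restore(
    [("Mon", "full"), ("Tue", "differential"), ("Wed", "differential"), ("Thu", "differential")]
)
print(full_only, incremental_week, differential_week)
```

For a failure after Thursday's backup, the incremental schedule needs four tapes restored in order, while the differential schedule needs only Monday's and Thursday's.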

Since different types of backups have their own advantages and disadvantages, the administrator needs to consider what type of backup will be suitable to their needs. Some types of backups take longer than others to back up or restore, so they will need to decide whether they want data backed up quickly or restored quickly when needed. To aid the decision, Table 5.2 provides information on different aspects of backup types.

Table 5.2: Factors Associated with Different Types of Backups

Type of Backup: Daily Full Backups
  Speed of Making the Backup: Takes longer than using full backups with either incremental or differential backups.
  Speed of Restoring the Backup: Fastest to restore, as only the last full backup is needed.
  Disadvantages of the Backup Type: Takes considerably longer to back up data, as all files are backed up.

Type of Backup: Full Backup with Daily Incremental Backups
  Speed of Making the Backup: Fastest method of backing up data, as only files that have changed since the last full or incremental backup are backed up.
  Speed of Restoring the Backup: Slowest to restore, as the last full backup and each incremental backup made since that time needs to be restored.
  Disadvantages of the Backup Type: Requires more tapes than differential backups.

Type of Backup: Full Backup with Daily Differential Backups
  Speed of Making the Backup: Takes longer to back up data than incremental backups.
  Speed of Restoring the Backup: Faster to restore than incremental backups, as only the last full backup and differential backup is needed to perform the restore.
  Disadvantages of the Backup Type: Each time a backup is performed, all data modified since the last full backup (including that which was backed up in the last differential backup) is backed up to tape. This means that data contained in the last differential backup is also backed up in the next differential backup.

Offsite Storage

Once backups have been performed, the backup tapes should not be kept in the same location as the machines that were backed up. After all, a major reason for performing backups is to have the backed up data available in case of a disaster. If a fire or flood occurred and destroyed the server room, any backup tapes in that room could also be destroyed. This would make it pointless to have gone through the work of backing up data. To protect data, the backups should be stored in a different location so that they will be safe until they are needed.

Offsite storage can be achieved in a number of ways. If a company has multiple buildings, such as in different cities, the backups from other sites can be stored in one of those buildings, and the backups for servers in that building can be stored in another building. If this is not possible, then the company can consider using a firm that provides offsite storage facilities. The key is to keep the backups away from the physical location of the original data.

When deciding on an offsite storage facility, administrators should ensure that it is secure and has the environmental conditions necessary to keep the backups safe. They should also ensure that the site has air conditioning and heating, as temperature changes may affect the integrity of data. It should also be protected from moisture and flooding, and have fire protection in case a disaster befalls the storage facility. The backups need to be locked up, and policies should govern who can pick up the data when needed. At the same time, the data must be accessible when needed, so that it can be acquired from the facility without waiting until the next time the building is open for business.

Exam Warning 

Backups are an important part of disaster recovery, so it is possible you will get a question or two dealing with this topic. Remember that copies of backups must be stored in offsite locations. If the backups are not kept in offsite storage, they could be destroyed with the original data in a disaster. Offsite storage ensures backups are safe until the time they are needed.

Data is only as good as its ability to be restored. If you cannot restore it, then the work performed to maintain backups was pointless. The time to ensure that backups can be restored is not during a disaster. Test restores should be performed to determine the integrity of data and ensure that the restore process actually works.

Alternate Sites

Recovering from a disaster can be a time-consuming process with many unknown variables. In some cases, the damage will be limited and normal functions can be resumed quickly. If a virus, intruder, or other incident has adversely affected a small amount of data, it can be relatively simple to restore data from a backup and replace the damaged information. However, when disasters occur, the magnitude may extend to segments of the business, such as an entire server room or building. To restore systems to their previous condition, such circumstances require alternate sites to be used.

Alternate sites are important to disaster recovery as they allow companies to experience minimal downtime or almost no downtime at all. When a disaster occurs, a company may require a temporary facility in which data can be restored to servers and business functions can resume. Without such a facility, the company would need to find a new business location, purchase new equipment, set it up, and then go live. When a company is not prepared, such activities could take so long that the disaster could put them out of business.

There are different types of alternate sites that can be established for use during a disaster. These are:

  • Hot sites

  • Warm sites

  • Cold sites

As seen in the following paragraphs, each of these types of alternate sites is in a different state of readiness, with some allowing normal business functions to resume more quickly than others.

A hot site is the most prepared type of alternate site. It is a facility that has the necessary hardware, software, phone lines, and network connectivity to allow a business to resume normal functions almost immediately. This can be a branch office or data center, but must be online and connected to the production network. A copy of data is held on a server at that location, so little or no data is lost. Replication of data from production servers may occur in real time, so that an exact duplicate of the system is ready when needed. In other instances, the bulk of data is stored on servers, so only a minimal amount of data needs to be restored. This allows business functions to resume very quickly, with almost zero downtime.

A warm site is not as well equipped as a hot site, but has part of the necessary hardware, software, and other office needs to restore normal business functions. Such a site may have most of the equipment necessary, but will still need work to bring it online and support the needs of the business. With such a site, the bulk of the data will need to be restored to servers, and additional work (such as activating phone lines or other services) will need to be done. No data is replicated to the server, so backup tapes must be restored so that data on the servers is recent.

A cold site requires the most work to set up, as it is neither online nor part of the production network. It may have all or part of the necessary equipment and resources needed to resume business activities, but installation is required and data needs to be restored to servers. Additional work (such as activating phone lines and other services) will also need to be done. The major difference between a cold site and a hot site is that a hot site can be used immediately when a disaster occurs, while a cold site must be built from scratch.

If companies are unable to afford keeping alternate facilities available that are only to be used in the event of an emergency, more economical options may be used. Some businesses have branch offices that are networked together, and may provide the space needed to resume operations. In some cases, parts of the business may need to be temporarily split across multiple branch offices, while in others a single branch may provide the space to accommodate everyone affected by a disaster. Another alternative is to make an agreement with another company, so that one will accommodate the other in the event of an emergency. If one of the companies experiences a disaster, operations can temporarily be set up at the other's facilities. While not ideal situations, each of these options allows business to continue until a more permanent solution is found.

When deciding on appropriate locations for alternate sites, it is important that they be in different geographical locations. If the alternate site is not a significant distance from the primary site, it can fall victim to the same disaster. Imagine having a cold site across the road from a company when an earthquake happens. Both sites would experience the same disaster, so there would be no alternate site available to resume business. On the other hand, you do not want the alternate site so far away that it will significantly add to downtime. If the IT staff needs to get on a plane and fly overseas to another office, this can increase the downtime and result in additional losses. Designate a site that is far enough away to escape the same disaster, yet close enough to work from (such as within 200 miles), so that distance does not become a major issue when a disaster occurs.
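The distance guideline can be checked with a great-circle calculation between the two sites. The sketch below uses the haversine formula; the 200-mile upper limit comes from the text, while the 10-mile lower limit, the function names, and the coordinates are assumptions chosen only for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def distance_miles(a, b):
    """Great-circle distance between two (latitude, longitude) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def assess_site(primary, alternate, min_miles=10, max_miles=200):
    """Label an alternate site as 'too close', 'suitable', or 'too far'."""
    d = distance_miles(primary, alternate)
    if d < min_miles:
        return "too close"   # min_miles is an assumed threshold, not from the text
    if d > max_miles:
        return "too far"
    return "suitable"


# Hypothetical coordinates: one degree of longitude at the equator is about 69 miles.
primary = (0.0, 0.0)
for candidate in ((0.0, 0.001), (0.0, 1.0), (0.0, 10.0)):
    print(candidate, round(distance_miles(primary, candidate), 1), assess_site(primary, candidate))
```

Straight-line distance is only a first filter. Travel time, shared utilities, and regional hazards such as flood plains or fault lines matter as much as mileage.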

Exercise 5.04: Alternate Sites


A company has a main building located in the downtown section of town, and four branch offices. One branch office is located three blocks away from the main building, while a second is located in another city, a half-hour drive away on the highway. A third branch office is located on the other side of the country, while the fourth is located in Brazil. While these final two offices are useful to the company, they do not play a major role in the normal business practices of the company.

The company has asked you to assist in developing a disaster recovery plan. As part of this plan, you are expected to determine the necessary components that will be used in the event of a disaster. In doing so, you identify the need for an alternative site that can be used if normal business functions cannot be conducted at the main building or branch offices. The company wants normal business functions to resume quickly with almost zero downtime, and is willing to budget for the equipment required for a site to be maintained with copies of data stored on servers at that location. Based on this information, you need to make the following decisions:

  1. If a disaster occurs at the main facility, which of the other branch offices should be used as an alternate site to recover normal business functions?

  2. If a disaster occurs in the branch offices located within driving distance of the main facility, where should an alternate site be set up to recover normal business functions?

  3. What type of alternate site needs to be created to meet the organization's needs?

Answers to Exercise Questions

  1. Alternate sites should be far enough from the primary site to escape the same disaster, but not so far that distance becomes a major issue when a disaster occurs. As such, the branch office located in another city, a half-hour drive away on the highway, can be used as an alternate site if the main facility experiences a disaster. The branch office three blocks away is too close, while the other two are too far away.

  2. If a disaster occurs in the branch office three blocks away, then it could also affect the main facility. As such, the branch office located in another city, a half-hour drive away, should be used as an alternate site. If a disaster occurred at this location instead, then the main facility or the branch office located near it could be used as an alternate site. The other locations are too far away to be considered.

  3. A hot site is needed to meet the organization's needs. A hot site has the necessary hardware, software, phone lines, and network connectivity to allow a business to resume normal functions almost immediately. This is required because the company needs normal business functions to resume very quickly, with almost zero downtime.




SSCP Systems Security Certified Practitioner Study Guide
SSCP Study Guide and DVD Training System
ISBN: 1931836809
EAN: 2147483647
Year: 2003
Pages: 135
