Summary | Implementing Backup and Recovery: The Readiness Guide for the Enterprise

< Day Day Up >

Where Do We Start?

As you start planning your backup and recovery system, you need to start gathering detailed information on your enterprise. You need to know the network layout for all systems. If your enterprise is made up of multiple networks, you need to know how much data resides on it and the speed of each network or subnet. Obviously, it is much faster to move data across a 100-Mb/sec (100Base-T) network than a 10-Mb/sec (10Base-T) network. You need to understand the network layout and the corresponding data to help identify potential bottlenecks and take them into consideration as you architect your backup and recovery system. (This information is also necessary in determining where to put media servers and tape devices, but we will get to that in a later chapter.)

As you look at the network that makes up your enterprise, you need to understand the network speed and topology. You also need to understand the disk layout, especially for the larger file servers and database servers, or identify who has this knowledge. You should watch for bottlenecks involving the disks, as well as the networks, SCSI connections, and any other appropriate I/O paths. When considering the decisions that need to be made when architecting backup strategy, the two things you must always keep in mind are the effect on normal production and effects on restore speed and performance. This usually involves making the necessary cost trade-offs to achieve the best of all worlds.

Here are some of the steps necessary for you to gather the information needed before establishing the backup strategy:

Identify all the systems, noting the order in which they would need to be recovered following a disaster.
Identify all networks involved, including speed of network and existing load at various times throughout the 24-hour day and night.
Locate all existing backup-related hardware, such as tape drives and libraries.
Identify recovery requirements.
Identify data and application availability requirements during backup.
Determine the best way to move the data.

We discuss each of these points in a little more detail in the sections that follow.

Identify All Systems

You need to identify all systems that need to be backed up. Generally this will be most if not all of the systems in the enterprise, with the exception of user workstations. There may be some systems that are basically replicated systems and can be easily re-created. In general, it is only necessary to back up one of these systems. The following information should be gathered for all the systems:

Amount of data
Speed of system
Number and type of networks
Type of data-database or filesystem?
Priority of recovery in DR
Tape drive or library installed?

Identify All Networks Involved

The network layout is an important part of the information required. Identifying the layout can be very critical to establishing the backup and recovery strategy. This step addresses the potential performance bottlenecks, because slow networks are often some of the primary bottlenecks. If there is a significant amount of data on a slow network, a media server may need to be located on the network. Any systems that have large amounts of backup data, such as a system with more than 100 GB, should be considered as media servers and have direct connections to a tape drive or drives. Following is the information needed for the networks:

Speed of network
Amount of data residing on the systems
Location of any backup hardware
Current and proposed production traffic

Locate Backup Hardware

Identifying all the systems and mapping the network topology should provide an idea of the total backup requirements. Part of this information is the location of the potential backup devices. The next step is to make sure the hardware is correctly located within the enterprise. Any enterprise backup and recovery strategy should be based on an application that supports library and drive sharing to ensure the tape drives and libraries are connected throughout the enterprise in such a way as to minimize bottlenecks, as well as to gain the most use from these very expensive tape drive resources. In a pure local area network (LAN) environment, it might be advisable to physically locate the tape library or libraries close enough to the systems that have the largest amounts of data so they can be directly connected to the tape drives and therefore perform backups and restores without data being moved across the network. These systems become media servers and control access to their drives. To handle data from other LAN-based systems, you either need to add more drives and give these systems access to their own drives or use the media servers to handle the backups for the systems that do not have their own drives. Also, the systems must be physically located close enough to the tape devices to be directly connected via SCSI cables.

LEARN FROM THE ERRORS OF THEIR WAYS

Backup is a part of any data protection strategy, but there are other technologies, such as replication, that are part of it as well. The key to a sound strategy is to incorporate all the different technologies. I have been involved in too many discussions with people who were trying to recover from an outage only to discover they were not as protected as they thought. One particular case involved a company that had lost their primary server that ran their most critical application. They spent several hours trying to recover from mirrored disks, when the actual failure was filesystem corruption. Mirrors did not help in this case. Their outage was extended, but they were able to recover, since they had backups. My worst call while working in support was from a system administrator who had done a mass delete from the wrong window and had removed enough of the operating system that he could not reboot. When he asked what he had to do to recover, I told him part of the process would be to restore from his latest backup. To this, he answered that configuring for backups was on his 'list of things to do.'

Locate Backup Hardware-SAN Alternative

If a storage area network (SAN) is available, it can allow for more flexibility in the backup and recovery strategy. The backup hardware can be better shared amongst the large data-resident systems while still keeping the data off the production LAN. This can also allow large systems to be backed up directly to tape without making the application servers general-purpose media servers and having these systems back up other LAN-based clients.

Identify Recovery Requirements

As you identify all the systems in the enterprise, you should note the specific recovery requirements of each system. This is very helpful in setting up the backup strategy. If an order-processing application can tolerate an eight-hour outage without severe business consequences, for example, an incremental backup strategy that minimizes backup time at the expense of restore time may be appropriate. For a Web retail application, on the other hand, where every minute of downtime means permanently lost sales, a strategy that replicates data in real time might be more appropriate, even with its greater impact on application performance. The other item to note is the order in which systems need to be recovered as part of your overall disaster recovery (DR) plan.

Identify Data and Application Availability Requirements during Backup

As you assess the backup requirements of each system, you should also make sure you know which of the database applications must be kept up- remain 'hot'-during the backup and which can be shut down to be backed up 'cold.' There are performance trade-offs involved with backing up a database while it is online, but sometimes this is necessary. This is due to the increased I/O activity, since the database activity is continuing, as well as the additional backup I/O. There are other methods of handling database backups, either hot or cold, using frozen image technologies and possibly off-host backup methods. These are discussed later in the book.

Determine the Best Way to Move the Data

You have several options for moving the data from disk to tape. Each has its own advantages and disadvantages. The methods include the following:

Files. This involves using the operating system to read all the appropriate files within the backup set and move that data from disk to tape. This method has more operating system overhead but allows for single files to be backed up and restored. It also enables the application to check each file to determine access or modification time so incremental backups can be performed.
Volumes. An entire volume can be backed up without reading the filesystem structure but by doing a bit-by-bit copy of the data from disk to tape. This is called a raw backup. This method allows for much faster data transfers but in general does not allow for single-file backups and restores. It also does not allow incremental backups. This backup method results in an entire volume being backed up, even the portions that do not contain valid data.
Block level. If the filesystem has enough information about the files, it is possible to determine which blocks have been changed. If the backup application can interface with the filesystem, you can back up just the changed blocks. This type of backup is called block level incremental.
Mapped raw backup. Some backup applications, such as VERITAS Software's NetBackup, can map a raw volume and then perform a raw volume backup while retaining the filesystem map so single files can be restored. This also allows for incremental backups. This type of backup is discussed in more detail in the section on frozen image backups in Chapter 7, 'Evaluating Other Backup-Related Features and Options.'
Off-host backups. This is a mechanism where data is moved from disk to tape without the application host being directly involved in the disk reads or tape writes. This type of backup is discussed in more detail in Chapter 7.

< Day Day Up >