6.6 Best practice 6: Look for ways to shrink disaster-recovery windows


The rule here is that if your backups take too long, your restores will too. The goal should be to reduce the amount of time it takes to restore your Exchange data. As Exchange deployments become more and more mission critical, disaster-recovery windows are constantly shrinking. I typically recommend that organizations target recovery windows of 4 hours or less for Exchange data. However, this should be based on each organization’s business needs. Since a restore typically takes twice as long as the corresponding backup, a 4-hour recovery window implies a maximum tolerable backup time of 2 hours. In other words, you must be able to back up each recovery unit within 2 hours to be able to recover that data within 4 hours. There are several possible strategies to accomplish this. All involve a trade-off between the size of your recovery unit (i.e., how large your Exchange database is) and the time it takes to back up and restore (usually dictated by your disaster-recovery technology choice). Some possible methods include (1) reducing the recovery unit size, (2) increasing backup and recovery performance, and (3) using alternative technologies.
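
The arithmetic behind the 2-hour figure can be sketched in a few lines. This is only an illustration, assuming the restore-to-backup ratio of 2 quoted above; the function name and its parameters are hypothetical and not part of any Exchange tool.

```python
def max_backup_hours(recovery_window_hours: float,
                     restore_to_backup_ratio: float = 2.0) -> float:
    """Longest backup time that still lets a restore finish inside the window.

    Assumes restore time = restore_to_backup_ratio * backup time
    (the rule of thumb used in the text, not a measured value).
    """
    if recovery_window_hours <= 0 or restore_to_backup_ratio <= 0:
        raise ValueError("window and ratio must be positive")
    return recovery_window_hours / restore_to_backup_ratio


# A 4-hour recovery window with restores taking twice as long as backups.
print(max_backup_hours(4))
```

If your own measurements show a different ratio between restore and backup times, substitute it for the default and the tolerable backup time changes accordingly.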

Reducing the recovery unit size

By reducing the size of your unit of recovery, you can manage disaster recovery of that unit within the limits imposed on you. Exchange 2000/2003 is particularly helpful in this situation. Before Exchange 2000, the only way to accomplish this was to reduce the size of the information store by adding more servers to your deployment. While this does not address the increased whole-server restore time, it does help with data restore. If the maximum information store size dictated by your disaster-recovery constraints was 50 GB, you would need to add another server to your deployment once you reached that limit. With Exchange 2000/2003 support for multiple storage groups and multiple databases per storage group, you have an alternative to adding another server. If you determine that your disaster-recovery constraints limit you to a database size of 50 GB, you simply add another database to an Exchange server when that limit is reached. Another way to accomplish this is to break down a deployment with a few large information stores into many smaller information stores. For example, instead of an Exchange server with a single 100-GB database, you could configure ten 10-GB databases on a single server. Backup and restore of a single 100-GB database may take many hours. However, concurrent backup and restore of ten 10-GB databases (provided that some of the databases reside in separate storage groups) would take far less time.
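
The effect of spreading databases across storage groups can be estimated with a simple model. The sketch below assumes that databases within one storage group are processed one after another while separate storage groups run in parallel, and that every parallel stream sustains the same rate (that is, there are enough backup devices and no shared bottleneck). The function name, the layout, and the 10 GB per hour rate are illustrative assumptions, not measurements.

```python
from typing import Sequence


def estimated_hours(storage_groups: Sequence[Sequence[float]],
                    gb_per_hour_per_stream: float) -> float:
    """Rough wall-clock time for backing up (or restoring) all databases.

    storage_groups: one inner sequence of database sizes (GB) per storage group.
    Model (an assumption): serial within a storage group, parallel across groups,
    so the largest storage group determines the elapsed time.
    """
    if gb_per_hour_per_stream <= 0:
        raise ValueError("rate must be positive")
    if not storage_groups:
        return 0.0
    largest_group_gb = max(sum(databases) for databases in storage_groups)
    return largest_group_gb / gb_per_hour_per_stream


# One 100-GB database versus ten 10-GB databases in four storage groups.
single = [[100]]
split = [[10, 10, 10], [10, 10, 10], [10, 10], [10, 10]]
print(estimated_hours(single, 10), estimated_hours(split, 10))
```

Under these assumptions the single database takes 10 hours, while the split layout is bounded by its largest storage group (30 GB) and takes 3 hours. Note that ten databases placed in a single storage group would gain nothing in this model, which is why the placement across storage groups matters.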

Other techniques can help reduce the amount of data that must be backed up and restored as well. Periodic defragmentation of databases can help in situations where a large amount of data has been deleted from an Exchange database. Also, an Exchange archival solution can reduce the size of existing databases. Several third-party products provide archival functions that age user data out of the information stores based on policy. Enforcing mailbox quotas and reducing deleted item retention can also decrease the amount of data that must be recovered. You may also choose to leverage personal store files (PSTs), which allow users to store data individually. When using PST files, however, you will need to set clear policies about whether the organization will provide disaster-recovery measures for these user files. If it does, you may not accomplish any reduction in the amount of data that must be recovered. There are many creative methods to reduce the size of your Exchange information stores. If you are successful in this effort, you will be rewarded with shorter backup and restore times, fewer tapes, and an overall reduction in your Exchange disaster-recovery window.

Increasing backup/recovery performance

Many times, this is much easier said than done. Depending on how you back up your Exchange server, you may be able to apply technology and better design principles toward this goal. Many Exchange deployments with which I come into contact have limits on backup/restore performance that are imposed by either their technology choice or the architecture that technology operates within. If you have a single DAT tape drive attached to every Exchange server in your deployment, you are going to get a “ballpark estimate” maximum backup rate of about 1 to 2 GB per hour. For DLT, that number increases to about 10 to 15 GB per hour or higher. For a database that is 50 GB, the DAT device will never deliver the needed performance (~25 hours to back up and ~50 hours to restore). For the DLT tape drive, you may have acceptable backup rates (~3–4 hours), but the restore rates would be barely adequate (~5–6 hours). LTO may make this an even better story. From an architecture viewpoint, you may have high-speed devices available but perform your backup and restore operations over a network instead of to locally attached devices. In this case, the bottleneck could be the network, and a dedicated backup network backbone would be required. When attempting to reduce disaster-recovery windows with performance enhancements, you will have to evaluate the relative cost versus performance trade-offs. While an eight-drive DLT array locally attached to every Exchange server may yield the ultimate in performance (I have seen as high as 70 GB per hour on backup in a test lab with this configuration), this solution is not cost-effective for a large deployment. However, in this example, you may be able to deploy several backup servers with DLT arrays or library devices and back up your Exchange servers via a dedicated disaster-recovery backbone network. This may strike a better price/performance balance, while still accomplishing the goal of increased backup and restore performance.
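
Turning the question around, the device rates quoted in this section can be used to estimate how large a database each device could back up within the 2-hour backup window discussed earlier. The following is a minimal sketch under that assumption; the dictionary, the function name, and the default window are illustrative choices, and the rates are the ballpark figures from the text rather than vendor specifications.

```python
# Ballpark backup rates from the text, as (low, high) in GB per hour.
BACKUP_RATES_GB_PER_HOUR = {
    "DAT (single drive)": (1, 2),
    "DLT (single drive)": (10, 15),
}


def max_database_gb(rate_gb_per_hour: float,
                    backup_window_hours: float = 2.0) -> float:
    """Largest database that can be backed up within the window at a steady rate."""
    if rate_gb_per_hour <= 0 or backup_window_hours <= 0:
        raise ValueError("rate and window must be positive")
    return rate_gb_per_hour * backup_window_hours


for device, (low, high) in BACKUP_RATES_GB_PER_HOUR.items():
    print(f"{device}: {max_database_gb(low):.0f} to {max_database_gb(high):.0f} GB")
```

With a 2-hour window this gives roughly 2 to 4 GB for the DAT drive and 20 to 30 GB for the DLT drive, while the 70 GB per hour lab figure for the eight-drive array would correspond to about 140 GB. Real throughput varies with the disk subsystem, network, and backup software, so treat these as planning estimates only.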

Another excellent alternative for increasing backup/restore performance, and one that more and more Exchange administrators are turning to, is backup to disk. In fact, Microsoft’s own Operations and Technology Group (OTG) uses backup to disk as its default mechanism for Exchange 2003 backup/restore. Microsoft uses a two-step process in which backups are performed to SAN-based disk volumes and then streamed to tape. Two days of backup sets are kept on disk (enabling rapid restoration), and 28 days’ worth of backup sets are kept on tape, for a total retention of 30 days. Backup to disk can be very fast (Microsoft maintains a ~1-hour recovery SLA for databases using this strategy) and provides another somewhat obvious, but underutilized, mechanism for supercharging your Exchange backup and restore operations.
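
The two-tier retention scheme described above can be expressed as a small lookup that tells you where a backup set of a given age would be found. This is a hypothetical sketch: it assumes daily backup sets, counts age in whole days with the most recent set at age 0, and assumes a set moves off disk once it is two days old. The constant and function names are invented for illustration and do not correspond to any Microsoft tool.

```python
DISK_RETENTION_DAYS = 2    # backup sets kept on SAN-based disk (from the text)
TAPE_RETENTION_DAYS = 28   # backup sets kept on tape after that (from the text)


def restore_source(backup_age_days: int) -> str:
    """Return where a backup set of the given age lives: disk, tape, or expired.

    Boundary handling is an assumption: ages 0-1 are on disk, ages 2-29 are on
    tape, and anything 30 days or older has passed the 30-day retention period.
    """
    if backup_age_days < 0:
        raise ValueError("age cannot be negative")
    if backup_age_days < DISK_RETENTION_DAYS:
        return "disk"
    if backup_age_days < DISK_RETENTION_DAYS + TAPE_RETENTION_DAYS:
        return "tape"
    return "expired"


for age in (0, 1, 2, 29, 30):
    print(age, restore_source(age))
```

The point of the split is that the most likely restores (recent data) come from the fastest medium, while older sets remain recoverable from tape at a slower rate.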

You can look to other areas to increase disaster-recovery performance as well. The disk subsystem on your Exchange server is another key area that will impact backup and restore performance. Upgrades to this important server subsystem can ensure that backup and/or restore performance is optimized. There are many areas to consider when attempting to increase the performance of your disaster-recovery operations for Exchange. The important point is to focus on where the bottlenecks and limitations exist in your current strategy. By identifying where these bottlenecks exist, you can make sound, cost-effective decisions about which areas make the best sense for further enhancement and investment.

Using alternative technologies

In Chapter 7, I will discuss storage technology and the various features that can be leveraged to increase reliability for Exchange deployments.

Technologies such as business continuance volumes (BCVs) and data replication can add to existing disaster-recovery techniques and measures and can provide alternative recovery options. As an example, BCV technology can provide another medium that holds Exchange data in addition to the data that exists as part of your regular on-line backup. In a scenario using BCVs, they could function as a backup volume (from which backups are performed, as shown in Figure 6.4) or as a rapid-recovery measure in the event of database corruption or data loss. Since many of these alternative technologies are new, there are many caveats. I suggest that these technologies can provide some answers to the challenges of Exchange disaster recovery. However, I do not recommend that these options be used in lieu of established Microsoft-supported measures. For example, I would not use BCV technology as a replacement for regular on-line backups unless it is implemented within the Windows VSS framework with Exchange Server 2003. I do believe, however, that these technologies can be an important complement to existing practices and methods. In the future, when these technologies mature, I expect to see many of them used regularly as the primary means of increasing backup and restore performance as well as functionality. In the meantime, approach with caution and stay tuned for Chapter 7, where I discuss these technologies in more detail.

Figure 6.4: Using snapshot/clone (BCV) technology with Exchange Server.

By shrinking the amount of time it takes to accomplish disaster recovery, we can scale our Exchange deployments to larger user populations and datasets. There are several ways to approach this challenge. Reducing data, increasing performance, and leveraging alternative technologies are among the leading strategies. Whatever your approach, seek to identify ways to shrink your recovery window and thereby enable your Exchange deployment to meet the ever-growing service-level requirements of mission-critical systems.




Mission-Critical Microsoft Exchange 2003: Designing and Building Reliable Exchange Servers (HP Technologies)
ISBN: 155558294X
Year: 2003
Pages: 91
Authors: Jerry Cochran
