12.9 Disaster Recovery

Like tape backup operations, disaster recovery (DR) has been viewed as a necessary but unattractive requirement for IT storage strategies. The cost of implementing a DR solution is balanced against both the likelihood of a major disruption and the impact on business if access to corporate data is lost. Unfortunately, disaster recovery tends to move toward the top of IT priorities only after major natural or human-caused disasters. As a sad commentary on our times, disaster recovery is now a high priority for many institutions and companies, with IT administrators scrambling to implement DR strategies within the confines of budget and personnel limitations.

The scope of a DR solution is more manageable if administrators first identify the types of applications and data that are most critical to business continuance. Customer information and current transactions, for example, must be readily accessible to continue business operations. Project planning data or code for application updates is not as mission-critical, even though such code may represent a substantial investment and should be recovered at some point. Reducing the volume of data that must be accessible in the event of disaster is key to sizing a DR solution to capture what is both essential and affordable.

In addition to prioritizing corporate data, you must consider the time it will take to restore operations. What duration of outage is acceptable without risking loss of ongoing business? If a company can withstand several days of outage, tape backup and restore may be feasible. Otherwise, disk mirroring from primary to DR sites may be required. In practice, some combination of mirroring for business-critical data and tape backup for less critical data is often deployed so that full business operations can be restored in the event of catastrophe, meaning complete loss of the primary site.
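The trade-off between acceptable outage and recovery method can be sketched as a simple rule: if the estimated time to restore a data set from tape fits within the outage the business can tolerate, tape may be feasible; otherwise mirroring is indicated. The following Python sketch is illustrative only. The `DataSet` type, the restore rate, and the fixed overhead (time to retrieve tapes and prepare the DR site) are hypothetical assumptions, not figures from any product or from this chapter.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    size_gb: float
    max_outage_hours: float  # outage the business says it can tolerate (assumed input)


def tape_restore_hours(size_gb: float,
                       restore_rate_gb_per_hour: float,
                       fixed_overhead_hours: float) -> float:
    """Estimate restore time as a fixed overhead plus transfer time."""
    return fixed_overhead_hours + size_gb / restore_rate_gb_per_hour


def choose_protection(ds: DataSet,
                      restore_rate_gb_per_hour: float = 100.0,  # assumed rate
                      fixed_overhead_hours: float = 24.0) -> str:  # assumed overhead
    """Return 'tape backup' if the estimated restore fits the outage window,
    otherwise 'disk mirroring'."""
    estimate = tape_restore_hours(ds.size_gb, restore_rate_gb_per_hour,
                                  fixed_overhead_hours)
    return "tape backup" if estimate <= ds.max_outage_hours else "disk mirroring"


# Hypothetical data sets: one business-critical, one less critical.
for ds in (DataSet("customer transactions", 500, 4),
           DataSet("project planning data", 2000, 72)):
    print(ds.name, "->", choose_protection(ds))
```

Applied across all data sets, a rule of this kind yields the mixed deployment described above: mirroring for the small volume of business-critical data and tape for the remainder.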

Another fundamental challenge for DR strategies is to determine what distance is sufficient to safeguard corporate data. Formerly, for example, financial institutions in New York would create disaster recovery sites in nearby New Jersey or surrounding states. But metro or regional distances are no longer viewed as sufficient to ensure continuous operation, which now requires interregional separation of hundreds or thousands of miles. Similarly, primary data centers in geologically unstable locations such as San Francisco or Seattle are better served by DR facilities well away from fault lines or potentially devastating tsunamis.

Accommodating distance requirements for open systems storage has always been an issue for native Fibre Channel SAN extension. The high cost of dedicated dark fiber, distance limitations of ~50 miles, and performance problems beyond a metro circumference make native Fibre Channel extension unsuitable for robust DR scenarios. FCIP and iFCP can provide long distance support for Fibre Channel-originated storage traffic, whereas iSCSI offers a native IP storage solution to address the distance issue.

The maximum distance allowed depends on the type of DR strategy to be implemented. As demonstrated by the Promontory Project in 2001, streaming applications such as tape backup can be sustained over thousands of miles at gigabit speeds. Latency-sensitive applications such as synchronous disk-to-disk data replication, however, may not tolerate the unavoidable speed-of-light latency of extremely long wide area links. Therefore, as part of your DR site selection process, you should verify how much latency the vendor-specific disk replication product (synchronous or asynchronous) can tolerate.
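To give a rough sense of scale for speed-of-light latency, the sketch below estimates round-trip propagation delay over optical fiber, where light travels at roughly 200,000 km per second (about two-thirds of its speed in a vacuum). It is a lower bound only: it ignores switch, gateway, and router delays, and it assumes the fiber route is as long as the stated distance. The `round_trips_per_write` parameter is an assumption to be set from the replication product's actual behavior.

```python
FIBER_KM_PER_MS = 200.0   # approx. 200,000 km/s in fiber, i.e. 200 km per millisecond
KM_PER_MILE = 1.609344


def round_trip_ms(distance_km: float) -> float:
    """Round-trip propagation delay in milliseconds over a fiber path."""
    return 2.0 * distance_km / FIBER_KM_PER_MS


def max_serialized_writes_per_sec(distance_km: float,
                                  round_trips_per_write: int = 1) -> float:
    """Upper bound on writes per second when each synchronous write must wait
    for the remote acknowledgment before the next one is issued."""
    return 1000.0 / (round_trip_ms(distance_km) * round_trips_per_write)


for miles in (50, 500, 2500):
    km = miles * KM_PER_MILE
    print(f"{miles} miles: {round_trip_ms(km):.2f} ms round trip, "
          f"at most {max_serialized_writes_per_sec(km):.0f} serialized writes/s")
```

Streaming tape traffic keeps many frames in flight and so is largely insensitive to this delay, whereas a synchronous replication stream that waits on each acknowledgment is bounded by it directly.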

The DR configuration shown in Figure 12-13 supports both data replication and tape backup options and uses IP network services to connect the primary site to the DR site. In this example, both sites have director-class Fibre Channel switches that connect servers, storage, and tape. The iFCP gateways provide IP connectivity for the wide area link and are connected via E_Ports to the Fibre Channel directors as well as direct-connected to the storage arrays. This dual connection serves two purposes. First, it establishes a path for tape backup from the production site to the DR site. This enables less mission-critical data to be vaulted at the DR location, ready for restore on an as-needed basis. Second, the direct connection to disk enables synchronous or asynchronous disk-to-disk data replication for the most essential business data. The direct connection between the iFCP gateways and disk arrays eliminates the need to pass data replication traffic through the Fibre Channel director and so provides a more direct path from site to site.

Figure 12-13. Disaster recovery configuration using IP network services and disk-based data replication


This example can be extended to additional locations by using IP routing to provide any-to-any connectivity. As in the tape vaulting diagram shown earlier in Figure 12-12, multiple regional data centers can be configured for mutual DR support on a round-robin basis: for example, New York → Houston → Los Angeles → Seattle → New York. The primary storage array at each location could serve as a secondary mirror for its upstream neighbor. For large financial institutions, insurance companies, or other enterprises with multiple regional data centers, this design offers a means to implement an economical DR strategy using existing resources and mainstream IP network services.
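The round-robin arrangement amounts to a ring in which each site replicates to the next site in the list, and the last site wraps around to the first. A minimal Python sketch, with a hypothetical function name and the city list taken from the example above:

```python
def ring_mirror_plan(sites: list[str]) -> dict[str, str]:
    """Map each primary site to the downstream site that hosts its mirror.

    Equivalently, each site's array acts as the secondary mirror for its
    upstream neighbor. Requires at least two distinct sites.
    """
    if len(sites) < 2 or len(set(sites)) != len(sites):
        raise ValueError("need at least two distinct sites")
    n = len(sites)
    return {sites[i]: sites[(i + 1) % n] for i in range(n)}


plan = ring_mirror_plan(["New York", "Houston", "Los Angeles", "Seattle"])
for primary, mirror in plan.items():
    print(f"{primary} is mirrored at {mirror}")
```

Because every site appears exactly once as a primary and once as a mirror target, the replication load is spread evenly and no additional dedicated DR facility is needed.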



Designing Storage Area Networks: A Practical Reference for Implementing Fibre Channel and IP SANs (2nd Edition)
ISBN: 0321136500
Year: 2003
Pages: 171
Authors: Tom Clark
