When a site becomes unavailable because of a physical access limitation or a disaster such as a fire or earthquake, steps must be taken to provide any mission-critical applications or business services from a separate location.

Creating Redundant and Failover Sites

Redundant sites are created for a few reasons. First, they can be created for load balancing or to provide higher performance to clients accessing resources from different geographic locations. For example, an organization may have one site in San Francisco, California, and another in Tampa, Florida, on opposite coasts of the United States. East Coast clients would connect to Florida, and West Coast clients would connect to California. Each site would be a mirror of the other, so the data and services provided at each location would be the same.

Another reason for redundant sites is to provide, if not all the computer-based services and applications available at the main site, at least all the mission-critical resources. This way, a company can continue to function, perhaps in a limited capacity, but it would at least be able to perform its most important business functions. For more information on deciding what services and applications are most important to a business and whether they should be provided in a failover site, refer to the section "Prioritizing the Environment" in Chapter 32.

Because every organization has different requirements when it comes to designing a failover site and what it takes to fail over and eventually fail back, this section covers only the basic necessities for failing over between redundant sites successfully.

Planning for Site Failover

Companyabc.com is a fictitious marketing/graphic design firm headquartered in New York, New York. The company has a secondary failover location in Boston, Massachusetts. The New York site provides virtual private network (VPN) and Terminal Services access for the Marketing department employees who rarely make it into the office.
Because marketing is the core of Companyabc.com's business, the VPN and Terminal Services are key to business continuity. For example, as the graphic artists develop new material for the Marketing department, the Marketing department sales force and client teams use the VPN to get the data to the client for approval. The New York location also houses the accounting server, which employees use to enter daily time sheets; these in turn are used to generate invoices to bill the clients.

If the New York site becomes unavailable, the VPN, Terminal Services, the file servers containing client documentation, and the accounting servers must all be restored at the Boston site within a few hours of a site disaster to maintain business continuity. In this scenario, restoring is a relatively simple task; most organizations are much more complicated than this.

Creating the Failover Site

When an organization decides to plan for site failures as part of a disaster recovery solution, many areas need to be addressed and there can be many options to choose from. In the Companyabc.com scenario, the biggest factors are file data and remote network connectivity through VPN and Terminal Services. This means that network connectivity is a priority, along with spare servers that can accommodate the user load. The spare servers for file data and accounting need to have enough disk space to accommodate a complete restore. As a best practice to ensure a smooth transition, the following list of recommendations provides a starting point:
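The disk space requirement mentioned above lends itself to a quick automated check on each spare server. The following is a minimal Python sketch, not a prescribed procedure: the function names, the 20 percent headroom figure, and the idea of checking a single volume path are all assumptions made for illustration. It relies only on the standard library call `shutil.disk_usage`.

```python
import shutil


def can_hold_restore(restore_bytes, free_bytes, headroom=0.2):
    """Return True if free space covers the restore plus a safety margin.

    The 20 percent default headroom is an assumed figure, not a
    recommendation from the text.
    """
    return free_bytes >= restore_bytes * (1 + headroom)


def check_volume(path, restore_bytes, headroom=0.2):
    """Check whether the volume holding `path` can take a full restore."""
    free_bytes = shutil.disk_usage(path).free
    return can_hold_restore(restore_bytes, free_bytes, headroom)


if __name__ == "__main__":
    # Hypothetical example: a 500 GB restore onto the current volume.
    restore_size = 500 * 1024**3
    print("enough space:", check_volume(".", restore_size))
```

Running a check like this against the spare file and accounting servers on a schedule would flag a shortfall before a failover is ever needed, rather than during one.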
Allocating hardware and making the site ready to act as a failover site are simple tasks in concept, but the actual failover and fail-back process can be troublesome. Keep in mind that the preceding list applies to failover sites, and not to mirrored or redundant sites configured to provide load balancing.

Failing Over Between Sites

Before a failover between sites can succeed, administrators need to know which services must fail over and in what order of precedence. For example, before an Exchange Server 2003 server can be restored, Active Directory domain controllers, global catalog servers, and DNS servers must be available.

As a site failure example, suppose a fire at Companyabc.com's headquarters in New York leaves the server room soaked in fire-retardant chemicals and the servers damaged. Failing services over to the Boston location would be necessary in this case. At a high level, such a cutover requires the following tasks to be executed in a timely manner:
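The order-of-precedence requirement described above can be modeled as a dependency graph and resolved with a topological sort. The sketch below is illustrative only. The service names are hypothetical labels, and the dependencies among DNS, the domain controller, and the global catalog are assumptions; the text itself states only that all three must be available before Exchange Server 2003 is restored. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services that must be available first.
# Only the Exchange prerequisites come from the text; the chain among
# DNS, domain controller, and global catalog is an assumed ordering.
DEPENDENCIES = {
    "dns": set(),
    "domain_controller": {"dns"},
    "global_catalog": {"domain_controller"},
    "exchange_2003": {"dns", "domain_controller", "global_catalog"},
}


def failover_order(dependencies):
    """Return services in an order where every prerequisite comes first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, service in enumerate(failover_order(DEPENDENCIES), start=1):
        print(f"{step}. restore {service}")
```

Keeping the dependency map in one place means a newly added service only needs its prerequisites listed, and a circular dependency surfaces as a `graphlib.CycleError` during planning rather than during a real cutover.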
Failing Back After Site Recovery

When the initial site is back online and able to handle client requests and provide access to data, networking services, and applications, it is time to consider failing back the services. This can be a controversial subject because fail-back procedures are normally more difficult than the initial failover procedure, although usually only when database servers are involved. Most organizations plan for the failover and have a tested failover plan that may include database log shipping to the disaster recovery site. However, they do not plan how to get the current data back to the restored servers in the main or preferred site. Questions to consider for failing back are as follows:
The answers depend on the complexity of the failed-over environment. If the cutover is simple, there is no reason to wait to fail back.

Providing Alternative Methods of Client Connectivity

Even when a failover site is too expensive to be an option, an organization can still plan for site failures. Other lower-cost options are available, but they depend on how and where the employees do their work. For example, remote salespeople at Companyabc.com most likely have laptops with all the applications they need installed locally. On the other hand, the accounting employees probably do not have laptops, and even if they did, they would need to access the accounting server to query the updated time entries and generate customer invoices. The following are some ways to deal with these issues without renting or buying a separate failover site: