Site Failure Recovery


When a site becomes unavailable because of a physical access limitation or a disaster such as a fire or earthquake, steps must be taken to provide any mission-critical applications or business services from a separate location.

Creating Redundant and Failover Sites

Redundant sites are created for a few reasons. First, they can provide load balancing or higher performance to clients accessing resources from different geographic locations. For example, an organization may have one site in San Francisco, California, and another in Tampa, Florida, on opposite coasts of the United States. East Coast clients would connect to Florida, and West Coast clients would connect to California. Each site would be a mirror of the other, so the data and services provided at each location would be the same.

Another reason for redundant sites is to provide, if not all the computer-based services and applications available at the main site, then at least all the mission-critical resources. This way, a company can continue to function, perhaps in a limited capacity, and still perform its most important business functions.

For more information on deciding what services and applications are most important to a business and whether they should be provided in a failover site, refer to the section "Prioritizing the Environment" in Chapter 32.

Because every organization has different requirements for designing a failover site, failing over, and eventually failing back, this section covers only the basic requirements for failing over successfully between redundant sites.

Planning for Site Failover

Companyabc.com is a fictitious marketing/graphic design firm headquartered in New York, New York. The company has a secondary failover location in Boston, Massachusetts. The New York site provides virtual private network (VPN) and Terminal Services for the Marketing department employees who rarely make it into the office. Because marketing is the core of Companyabc.com's business, the VPN and Terminal Services are key to business continuity. For example, as the graphic artists develop new material for the Marketing department, the Marketing department sales force and client teams use the VPN to get the data to the client for approval. Also, the New York location houses the accounting server, which is used by employees to enter daily time sheets, which in turn are used to generate invoices to bill the clients. If the New York site becomes unavailable, VPN, Terminal Services, file servers containing client documentation, and the accounting servers must all be restored at the Boston site within a few hours of a site disaster for business continuity. In this scenario, restoring is a relatively simple task; most organizations are much more complicated than this.

Creating the Failover Site

When an organization decides to plan for site failures as part of a disaster recovery solution, there are many areas that need to be addressed and there can be many options to choose from. Using the Companyabc.com scenario, the biggest factors are file data and remote network connectivity through VPN and Terminal Services. This means that network connectivity is a priority, along with spare servers that can accommodate the user load. The spare servers for file data and accounting need to have enough disk space to accommodate a complete restore. As a best practice to ensure a smooth transition, the following list of recommendations provides a starting point:

  • Allocate the appropriate hardware devices including servers with enough processing power and disk space to accommodate the restored machines' resources.

  • Host the organization's DNS zones and records either on primary DNS servers located at an Internet service provider (ISP) co-location facility or on redundant DNS servers that are registered for the domain and located at both physical locations.

  • Ensure that DNS record-changing procedures are documented and available at the remote site or at an offsite data storage location.

  • For the VPN and Terminal servers, ensure that the host records in the DNS zones are set to low Time to Live (TTL) values so that DNS changes do not take extended periods to propagate across the Internet. The default TTL in Microsoft Windows Server 2003 is one hour.

  • Ensure that network connectivity is already established and stable between sites and between each site and the Internet.

  • Replicate file data between the two sites as often as possible.

  • Create at least two copies of backup media (tapes) that contain backed-up or archived company data. One copy should remain at the headquarters. A second copy should be stored with an offsite data storage company. An optional third copy could be stored at another site location; that copy can be restored to spare hardware on a regular basis so that Windows and the data are ready if a site failover becomes necessary.

  • Have a copy of all disaster recovery documentation stored at multiple locations as well as at the offsite data storage company. This will provide redundancy should a recovery become necessary.
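The effect of the TTL recommendation in the preceding list can be estimated with simple arithmetic. The following Python sketch is illustrative only: the function name, the 5-minute TTL, and the 10-minute change delay are assumptions and do not come from this chapter. It assumes a resolver cached the old record just before the DNS change was made and honors the TTL exactly, so the worst-case wait is the time needed to make the change plus one full TTL.

```python
def worst_case_cutover_seconds(ttl_seconds, change_delay_seconds=0):
    """Upper bound on how long a client may keep using the old address.

    Assumes a resolver cached the old record immediately before the DNS
    change was made, so it keeps the stale answer for one full TTL.
    """
    if ttl_seconds < 0 or change_delay_seconds < 0:
        raise ValueError("durations must be non-negative")
    return change_delay_seconds + ttl_seconds


# One-hour default TTL versus a hypothetical 5-minute TTL, assuming the
# administrator needs 10 minutes to update the records.
default_wait = worst_case_cutover_seconds(3600, change_delay_seconds=600)
low_ttl_wait = worst_case_cutover_seconds(300, change_delay_seconds=600)

print(default_wait // 60, "minutes with the default TTL")
print(low_ttl_wait // 60, "minutes with a 5-minute TTL")
```

Under these assumptions, lowering the TTL shortens the worst-case wait from 70 minutes to 15 minutes. Resolvers that ignore TTL values can take longer, so treat the result as an estimate rather than a guarantee.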

Allocating hardware and making the site ready to act as a failover site are simple tasks in concept, but the actual failover and fail-back process can be troublesome. Keep in mind that the preceding list applies to failover sites, and not mirrored or redundant sites configured to provide load balancing.

Failing Over Between Sites

Before failing over between sites can be successful, administrators need to be aware of what services need to fail over and in which order of precedence. For example, before an Exchange Server 2003 server can be restored, Active Directory domain controllers, global catalog servers, and DNS servers must be available.
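One way to work out an order of precedence is to record each service's prerequisites and sort them topologically. The following Python sketch uses the standard library graphlib module (Python 3.9 or later). Only the Exchange prerequisites come from the text; the chain from internal DNS to domain controller to global catalog is an assumption made for illustration, and a real environment will have its own dependency map.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each service maps to the set of services that must be available first.
# Only the Exchange prerequisites come from the text; the DNS -> domain
# controller -> global catalog chain is an illustrative assumption.
dependencies = {
    "internal DNS": set(),
    "domain controller": {"internal DNS"},
    "global catalog": {"domain controller"},
    "Exchange Server 2003": {
        "internal DNS",
        "domain controller",
        "global catalog",
    },
}

# static_order() yields every service after all of its prerequisites.
restore_order = list(TopologicalSorter(dependencies).static_order())

for step, service in enumerate(restore_order, start=1):
    print(step, service)
```

If the dependency map accidentally contains a loop, static_order() raises graphlib.CycleError, which is a useful check on the documentation itself.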

As a site failure example, a fire at Companyabc.com's headquarters building in New York leaves the server room soaked in fire-retardant chemicals and the servers damaged. Failing services over to the Boston location would be necessary in this case.

At a high level, such a cutover requires the following tasks to be executed in a timely manner:

  • Update Internet DNS records pointing to the VPN and Terminal servers.

  • Restore any necessary Windows Server 2003 domain controllers, global catalog servers, and internal DNS servers.

  • Restore VPN and Terminal servers.

  • Restore the file and accounting servers, using the latest available backup tape to restore the data.

  • Test client connectivity, troubleshoot, and provide remote and local client support as needed.
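The connectivity test in the last task can be partly automated. The following Python sketch is a minimal example, not a complete monitoring tool: it only checks whether a TCP connection to a service port succeeds. The function names are hypothetical, and the ports mentioned in the usage note (3389 for Terminal Services and 1723 for a PPTP-based VPN) are assumptions, because the chapter does not say which VPN protocol Companyabc.com uses.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name lookup failures.
        return False


def check_services(endpoints, timeout=3.0):
    """Map each (name, host, port) endpoint to True or False."""
    return {
        name: port_is_reachable(host, port, timeout)
        for name, host, port in endpoints
    }
```

For example, calling check_services([("Terminal Services", "ts.companyabc.com", 3389), ("VPN", "vpn.companyabc.com", 1723)]) with hypothetical host names returns a dictionary showing which services answered. A successful connection shows only that the port is open; users should still log on and confirm that applications and data are available.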

Failing Back After Site Recovery

When the initial site is back online and available to handle client requests and provide access to data, networking services, and applications, it is time to consider failing back the services. This can be a controversial subject because fail-back procedures are often more difficult than the initial failover procedure, particularly when database servers are involved. Most organizations plan for the failover and have a tested failover plan that may include database log shipping to the disaster recovery site. However, they do not plan how to get the current data back to the restored servers in the main or preferred site.

Questions to consider for failing back are as follows:

  • Will downtime be necessary to resynchronize data or databases between the sites?

  • When is the appropriate time to fail back?

  • Is the failover site less functional than the preferred site? In other words, are only mission-critical services provided in the failover site or is it a complete copy of the preferred site?

The answers really lie in the complexity of the failed-over environment. If the cutover is simple, there is no reason to wait to fail back.

Providing Alternative Methods of Client Connectivity

When failover sites are too expensive and not an option, that does not mean that an organization cannot plan for site failures. Other lower-cost options are available but depend on how and where the employees do their work. For example, remote salespeople in Companyabc.com most likely have laptops with all the necessary applications they need installed locally. On the other hand, the accounting employees probably do not have laptops, and even if they did, they would need to access the accounting server to query the updated time entries and generate customer invoices.

The following are some ways to deal with these issues without renting or buying a separate failover site:

  • Consider renting racks or cages at a local ISP to co-locate servers that can be accessed during a site failure.

  • Have users dial in from home to a Terminal server hosted at an ISP to access applications and file data, including the accounting server data.

  • Configure important folders for remote users with laptops so that they can have offline copies stored on their laptops that will synchronize with the server when the connection becomes available.

  • Rent temporary office space, printers, networking equipment, and user workstations with common standard software packages such as Microsoft Office and Internet Explorer. You can plan for and execute this option in about one day. If this is an option, be sure to find a computer rental agency and get pricing before a failure occurs; after a failure, you will have no choice but to pay whatever rates are offered.




Microsoft Windows Server 2003 Unleashed(c) R2 Edition
ISBN: 0672328984
Year: 2006
Pages: 499