Recovering from a Site Failure


When a site becomes unavailable because of a physical access limitation or a disaster such as a fire or earthquake, steps must be taken to recover the Exchange server in that site. Exchange does not have a single-step method of merging information from the failed site's server into another server, so the process involves recovering the lost server in its entirety.

To prepare for the recovery of a failed site, an organization can create redundancy in a failover site. With redundancy built into a remote site, the time and effort required for the recovery and restore process can be minimized if a recovery needs to be performed.

Creating Redundant and Failover Sites

Redundant sites are created for two main reasons. First, a redundant site can have a secondary Internet connection and bridgehead routing server so that if the primary site is down, the secondary site can handle inbound and outbound email communications. This redundancy can be built, configured, and set to provide failover automatically in case of a site failure. See Chapter 3, "Installing Exchange Server 2003," for details on creating Routing Group connectors and bridgehead servers.
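
The automatic failover described above relies on Exchange preferring the lowest-cost route and moving to another route when a connector is unavailable. As a rough illustration only, the following Python sketch models that selection: the lowest-cost connector that is currently up wins, and traffic shifts to the next one when it goes down. The `Connector` type, the connector names, and the cost values are hypothetical; this is not an Exchange API, and real Exchange routing (link state information exchanged between routing groups) is more involved.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Connector:
    """Hypothetical model of one outbound route, e.g. a bridgehead in a site."""
    name: str     # illustrative connector name
    cost: int     # lower cost = preferred route
    is_up: bool   # whether the connector/bridgehead is currently reachable


def choose_route(connectors: list[Connector]) -> Optional[Connector]:
    """Return the lowest-cost connector that is up, or None if all are down."""
    available = [c for c in connectors if c.is_up]
    if not available:
        return None
    return min(available, key=lambda c: c.cost)


routes = [
    Connector("primary-site-smtp", cost=1, is_up=True),
    Connector("secondary-site-smtp", cost=10, is_up=True),
]
print(choose_route(routes).name)  # primary-site-smtp

routes[0].is_up = False           # simulate a primary site failure
print(choose_route(routes).name)  # secondary-site-smtp
```

Giving the secondary site's connector a higher cost keeps it idle during normal operation while leaving it ready to take over.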

The other reason for preparing a redundant site is to provide a warm spare server site so that the company is prepared to restore a site server in case of a site failure. Preparation for site recovery can be as simple as keeping server documentation available in another site or as thorough as storing a full image of server information in another site. The more preparatory work is done up front, the faster the organization will be able to recover from a system failure.

Creating the Failover Site

When an organization decides to plan for site failures as part of a disaster-recovery solution, many areas need to be addressed and many options exist. For organizations looking for redundancy, network connectivity is a priority, along with spare servers that can accommodate the user load. The spare servers need enough disk space to accommodate a complete restore. The following best-practice recommendations provide a starting point for ensuring a smooth transition:

  • Allocate the appropriate hardware devices, including servers with enough processing power and disk space to accommodate the restored machines' resources.

  • Host the organization's DNS zones and records using primary DNS servers located at an Internet service provider (ISP) colocation facility, or have redundant DNS servers registered for the domain and located at both physical locations.

  • Ensure that DNS record-changing procedures are documented and available at the remote site or at an offsite data storage location.

  • For the Exchange servers, ensure that the host (A) records in DNS are set to low Time to Live (TTL) values so that DNS changes do not take extended periods to propagate across the Internet. The Microsoft Windows Server 2003 default TTL is 1 hour.

  • Ensure that network connectivity is already established and stable between sites and between each site and the Internet.

  • Create at least two copies of the backup tape media for each site. One copy should remain at one location, and a second copy should be stored with an offsite data storage company. An optional third copy could be stored at another site location, where it can be used to restore data to spare hardware on a regular basis or to restore Windows if a site failover is necessary.

  • Have a copy of all disaster-recovery documentation stored at multiple locations as well as at the offsite data storage company. This provides redundancy if a recovery becomes necessary.
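
To make the TTL recommendation concrete: a resolver that cached a record just before a change may keep handing out the old address for up to the full TTL, so the TTL is roughly the worst-case delay before Internet clients follow a DNS change (assuming resolvers honor the TTL). The sketch below flags Exchange host records whose TTL exceeds a target value. The host names, the record table, and the 300-second target are hypothetical examples, not values from this chapter; in practice the TTLs would be read from the DNS server.

```python
# Hypothetical inventory of Internet-facing Exchange host records and their TTLs (seconds).
EXCHANGE_RECORDS = {
    "mail.example.com": 3600,     # Windows Server 2003 default of 1 hour
    "webmail.example.com": 300,
    "smtp2.example.com": 600,
}

TARGET_TTL = 300  # assumed target; pick a value that fits your own failover plan


def worst_case_stale_minutes(ttl_seconds: int) -> float:
    """A resolver that cached the record just before a change may keep
    serving the old answer for up to one full TTL."""
    return ttl_seconds / 60


def records_to_lower(records: dict[str, int], target: int) -> list[str]:
    """Return host names whose TTL is above the target, sorted for stable output."""
    return sorted(name for name, ttl in records.items() if ttl > target)


for name in records_to_lower(EXCHANGE_RECORDS, TARGET_TTL):
    ttl = EXCHANGE_RECORDS[name]
    print(f"{name}: TTL {ttl}s, clients may see the old address for up to "
          f"{worst_case_stale_minutes(ttl):.0f} minutes")
```

Lowering a TTL only takes effect everywhere once the previous, longer TTL has expired from caches, so for an unplanned site failure the low values must already be in place before the failure occurs.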

Allocating hardware and making the site ready to act as a failover site are simple tasks in concept, but the actual failover and failback process can be troublesome. Keep in mind that the preceding list applies to failover sites, not mirrored or redundant sites configured to provide load balancing.

Failing Over Between Sites

Before failing over between sites can succeed, administrators need to know which services must fail over and in what order. For example, before an Exchange server can be restored, Active Directory domain controllers, Global Catalog servers, and DNS servers must be available.

At a high level, such a cutover involves the following tasks, which need to be executed in a timely manner:

  1. Update Internet DNS records pointing to the Exchange server(s).

  2. Restore any necessary Windows Server 2003 domain controllers, Global Catalog servers, and internal DNS servers.

  3. Restore the Exchange server(s).

  4. Test client connectivity, troubleshoot, and provide remote and local client support as needed.
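
The order of precedence described earlier is a dependency graph: Exchange needs domain controllers, Global Catalog servers, and DNS before it can be restored. A minimal sketch using Python's standard `graphlib` module (Python 3.9 or later) derives a valid restore order from such a graph. The service names are illustrative, and the edges beyond the Exchange dependencies stated in the text (client testing waiting on Exchange and the Internet DNS update) are assumptions; an organization would record its own dependencies in its disaster-recovery documentation.

```python
from graphlib import TopologicalSorter

# Each key lists the services that must be available before it can be restored.
# Names are hypothetical; the "exchange" entry reflects the dependencies in the text.
RESTORE_DEPENDENCIES = {
    "exchange": {"domain_controllers", "global_catalog", "internal_dns"},
    "global_catalog": {"domain_controllers"},  # a Global Catalog server is a domain controller
    "client_testing": {"exchange", "internet_dns_records"},
}


def restore_order(deps: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which every service follows its prerequisites.
    Raises graphlib.CycleError if the dependencies are circular."""
    return list(TopologicalSorter(deps).static_order())


order = restore_order(RESTORE_DEPENDENCIES)
for step, service in enumerate(order, start=1):
    print(step, service)
```

The sorter returns one valid order rather than a unique one; services with no dependency between them, such as updating Internet DNS records and restoring domain controllers, can be worked on in parallel.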

Failing Back After Site Recovery

When the initial site is back online and available to handle client requests and provide access to data, networking services, and applications, it is time to consider failing back the services. This can be a controversial subject because failback procedures are usually more difficult than the initial failover procedure. Most organizations plan for failover and have a tested failover plan that might include database log shipping to the disaster-recovery site. However, many do not plan how to get the current data back to the restored servers in the main or preferred site.

Questions to consider for failing back are as follows:

  • Will downtime be necessary to restore databases between the sites?

  • When is the appropriate time to fail back?

  • Is the failover site less functional than the preferred site? In other words, are only mission-critical services provided in the failover site, or is it a complete copy of the preferred site?

The answers depend largely on the complexity of the failed-over environment. If the cutover is simple, there is no reason to wait to fail back.

Providing Alternative Methods of Client Connectivity

When failover sites are too expensive to be an option, an organization can still plan for site failures. Other, lower-cost options are available, but they depend on how and where employees do their work. For example, users who need to access email often can do so without physically being at the site location; email can be accessed remotely from other terminals or workstations.

The following are some ways to deal with these issues without renting or buying a separate failover site:

  • Consider renting racks or cages at a local ISP to colocate servers that can be accessed during a site failure.

  • Have users dial in from home to a terminal server hosted at an ISP to access Exchange.

  • Set up remote user access using Terminal Services or Outlook Web Access at a redundant site so that users can access their email, Calendar, and contacts from any location.

  • Rent temporary office space, printers, networking equipment, and user workstations with common standard software packages such as Microsoft Office and Internet Explorer. You can plan for and execute this option in about one day. If you choose this option, find a computer rental agency and get pricing before a failure occurs, when you would otherwise have no choice but to pay whatever rates are charged.



Microsoft Exchange Server 2003 Unleashed (2nd Edition)
ISBN: 0672328070
Year: 2003
Pages: 393
Authors: Rand Morimoto
