Backing Up and Restoring Clusters


To successfully back up and restore an entire cluster or a single cluster node, the cluster administrator must first understand how to troubleshoot, back up, and restore a standalone Windows Server 2003 system. The process of backing up cluster nodes is the same as for a standalone server, but restoring a cluster may require additional steps or configurations that do not apply to a standalone server. Detailed Windows Server 2003 backup and restore techniques and disaster recovery planning best practices are discussed in Chapter 32, "Backing Up a Windows Server 2003 Environment," and Chapter 33, "Recovering from a Disaster." This section focuses mainly on backing up and restoring cluster nodes.

To be prepared to recover different types of cluster failures, you must take the following steps:

1.

For all cluster nodes (single, MNS, and single-quorum nodes), do the following:

  • Back up each cluster node's local disks.

  • Back up each cluster node's system state.

  • Back up the cluster quorum from any node running in the cluster.

  • Back up each cluster node's disk signatures and volume information.

2.

For clusters with shared storage devices, do the following in addition to Step 1:

  • On the individual cluster nodes, document storage adapter settings, including manufacturer name, model number, and configurations such as SCSI ID and IRQ when applicable. Also, note which motherboard slot each adapter is installed in.

  • On shared storage devices with built-in RAID controllers, record disk array configurations, including array type, array members, hot spares, volume definition, disk IDs, and LUNs.

  • Back up shared cluster disks.
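The documentation items in Step 2 can be kept in a structured, machine-readable form so that they survive alongside the backups. The following is a minimal sketch only: the class names, field names, and the JSON layout are hypothetical and simply mirror the settings listed above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class StorageAdapterRecord:
    """One storage adapter in a cluster node (field names are illustrative)."""
    node: str
    manufacturer: str
    model: str
    motherboard_slot: str
    scsi_id: Optional[int] = None  # record only when applicable
    irq: Optional[int] = None      # record only when applicable


@dataclass
class DiskArrayRecord:
    """One array on a shared storage device with a built-in RAID controller."""
    array_type: str
    members: List[str]
    hot_spares: List[str] = field(default_factory=list)
    volumes: List[str] = field(default_factory=list)
    luns: List[int] = field(default_factory=list)


def storage_inventory_json(adapters, arrays):
    """Serialize the recorded settings to JSON text for safekeeping."""
    inventory = {
        "adapters": [asdict(a) for a in adapters],
        "arrays": [asdict(r) for r in arrays],
    }
    return json.dumps(inventory, indent=2, sort_keys=True)
```

The resulting text can be printed or saved to a file stored away from the cluster, so the configuration is still readable when the shared storage is not.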

To back up cluster nodes and data on their storage devices, you use the Windows Server 2003 Backup utility (ntbackup.exe). For detailed information about this utility and the different backup options available, refer to Chapters 32 and 33.

Cluster Node Backup Best Practices

As a backup best practice for cluster nodes, administrators should strive to back up everything as frequently as possible. Because cluster availability is so important, here are some recommendations for cluster node backup:

  • Back up each cluster node's system state daily and immediately before and after a cluster configuration change is made.

  • Back up cluster local drives and system state daily if the schedule permits or weekly if daily backups cannot be performed.

  • Back up cluster shared drives daily if the schedule permits or weekly if daily backups cannot be performed.

  • Use the MSCS Recovery Utility (ClusterRecovery) provided in the Windows Server 2003 Resource Kit to save configuration information such as checkpoint files. These checkpoint files are stored in the quorum and are used to update Registry settings when resources are moved or failed over to another cluster node.

  • Perform an ASR backup on each node following the creation of a new cluster, monthly, and whenever a change is made on the node. For instance, back up when a new cluster application is installed or when a disk is added or removed from a cluster.
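The timing rules in these recommendations can be expressed as a small check. This is a sketch under stated assumptions: the function name is hypothetical, "daily" is treated as one day, and "monthly" is approximated as 30 days.

```python
from datetime import date, timedelta


def backups_due(today, last_system_state, last_asr, config_changed=False):
    """Return the backup types that are due according to the rules above.

    Assumptions: system state is due after 1 day, ASR after 30 days,
    and any configuration change makes both due immediately.
    """
    due = []
    if config_changed or (today - last_system_state) >= timedelta(days=1):
        due.append("system_state")
    if config_changed or (today - last_asr) >= timedelta(days=30):
        due.append("asr")
    return due
```

For example, calling the function with config_changed=True after installing a new cluster application reports both backup types as due, regardless of the dates.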

Automated System Recovery Backup

Automated System Recovery has two parts: the ASR backup and the ASR restore. An ASR backup can be used to satisfy one of a cluster node's backup requirements, backing up disk signatures and volume information. When a disk signature is overwritten and the cluster can no longer identify shared disks or read volume information, the administrator needs to restore cluster disk signatures using ASR restore. This approach, however, is a last resort and should be used only if no cluster nodes can communicate with the shared devices and all other cluster restore techniques have been exhausted.

An ASR backup of a cluster node contains a disk signature or signatures and volume information; the current system state, which includes the Registry, cluster quorum, boot files, and the COM+ class registration database; system services; and a backup of all local disks containing operating system files, including system and boot partitions. Currently, the only way to back up disk signatures is to create an ASR backup from the local server console using Windows Server 2003 Backup.

To perform an ASR backup, an administrator needs a blank floppy disk and a backup device; either a tape device or disk will suffice. Recordable CDs and CD-recording devices are not yet supported by the Backup utility, so if no tape device is available, the backup can be run to a backup file on a local or a network drive. Saving the backup file to a network drive helps to ensure that the media can be accessed when an ASR restore is necessary. One point to keep in mind is that an ASR backup will back up each local drive that contains the operating system and any applications installed. For instance, if the operating system is installed on drive C: and MS Office is installed on drive D:, both of these drives will be completely backed up. Although this can greatly simplify restore procedures, it requires additional storage and increases backup time. Using a basic installation of Windows Server 2003 Enterprise server with only the Cluster Service installed, an ASR backup averages 1.3GB in size.

To create an ASR backup, perform the following steps:

1.

Log on to the cluster node with an account that has the right to back up the system. (Any Local Administrator, Domain Administrator, or Cluster Service account has the necessary permissions to complete the operation.)

2.

Click Start, All Programs, Accessories, System Tools, Backup.

3.

If this is the first time you've run Backup, it will open in Wizard mode. Choose to run it in Advanced mode by clicking the Advanced Mode hyperlink. After you change to Advanced mode, the window should look similar to Figure 31.13.

Figure 31.13. Windows Backup in Advanced mode.


4.

Click the Automated System Recovery Wizard button to start the Automated System Recovery Preparation Wizard.

5.

Click Next after reading the Automated System Recovery Preparation Wizard Welcome screen.

6.

Choose your backup media type and choose the correct tape or file. If you're creating a new file, specify the complete path to the file, and the backup will create the file automatically. Click Next to continue.

7.

If the file you specified resides on a network drive, click OK at the warning message to continue, as shown in Figure 31.14.

Figure 31.14. Warning when selecting a resource for backup.


8.

Click Finish to complete the Automated System Recovery Preparation Wizard and to start the backup.

9.

After the tape or file backup portion completes, the ASR backup prompts you to insert a floppy disk that will contain the recovery information. Insert the disk and click OK to continue.

10.

Remove the floppy disk as requested and label the disk with the appropriate ASR backup information. Click OK to continue.

11.

When the ASR backup is complete, click Close on the Backup Progress window to return to the backup program or click Report to examine the backup report.

ASR backups should be performed periodically and immediately following any hardware changes to a cluster node, including changes on a shared storage device or local disk configuration. The information contained in the ASR floppy disk is also stored on the backup media. The ASR floppy contains two files, asr.sif and asrpnp.sif, that can be restored from the backup media and copied to a floppy disk when an ASR restore is necessary.
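Because an ASR restore depends on both files being present, it can be worth verifying a floppy (or a folder holding files recovered from the backup media) before it is needed. The following minimal sketch assumes only the two file names given above; the function name is hypothetical.

```python
from pathlib import Path

# The two files the text says an ASR floppy contains.
ASR_FLOPPY_FILES = ("asr.sif", "asrpnp.sif")


def missing_asr_files(folder):
    """Return the ASR file names that are not present in the given folder."""
    present = {p.name.lower() for p in Path(folder).iterdir() if p.is_file()}
    return [name for name in ASR_FLOPPY_FILES if name not in present]
```

An empty list means both files were found; on a node, the folder would typically be the floppy drive root, such as A:\.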

Backing Up the Cluster Quorum

The cluster quorum is backed up when the system state of any active cluster node is backed up. This backup can be used to restore a cluster node to operation when cluster database or log corruption occurs or when the cluster needs to be rolled back to a previous state on every cluster node. The cluster quorum should be backed up frequently to ensure that the latest version of the cluster configuration is saved. To back up the cluster quorum, follow the steps outlined in the next section.

Backing Up the Cluster Node System State

Each cluster node's system state should be backed up regularly and before and after any hardware or software changes, including cluster configuration changes. This backup will contain the cluster quorum, local server Registry, COM+ registration database, and boot files necessary to start the system. On a domain controller, the system state will also contain the Active Directory database and the SYSVOL folder.

To back up the system state, perform the following steps:

1.

Log on to the cluster node using an account that has the right to back up the system. (Any Local Administrator, Domain Administrator, or Cluster Service account has the necessary permissions to complete the operation.)

2.

Click Start, All Programs, Accessories, System Tools, Backup.

3.

If this is the first time you've run Backup, it will open in Wizard mode. Choose to run it in Advanced mode by clicking the Advanced Mode hyperlink. After you change to Advanced mode, the window should look like the one in Figure 31.13.

4.

Click the Backup Wizard (Advanced) button to start the Backup Wizard.

5.

Click Next on the Backup Wizard Welcome screen to continue.

6.

On the What to Back Up page, choose the Only Back Up the System State Data button, shown in Figure 31.15, and click Next to continue.

Figure 31.15. Choosing the correct option for backup.


7.

Choose your backup media type and choose the correct tape or file. If you're creating a new file, specify the complete path to the file, and the backup will create the file automatically. Click Next to continue.

8.

If the file you specified resides on a network drive, click OK at the warning message to continue.

9.

Click Finish to complete the Backup Wizard and start the backup.

10.

When the backup is complete, review the backup log for detailed information and click Close on the Backup Progress window when finished.
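The wizard steps above can also be approximated from the command line, which makes the backup easier to schedule. The following minimal sketch only assembles an argument list for ntbackup.exe and does not run it. It assumes the ntbackup backup systemstate syntax with the /J (job name), /F (backup file), /V (verify), and /L (log type) switches; the function name, default job name, and file path are hypothetical, and the switches should be confirmed with ntbackup /? on the node.

```python
def system_state_backup_command(backup_file, job_name="System State Backup"):
    """Build an ntbackup argument list for a system state backup to a file.

    The command is only constructed here; nothing is executed.
    """
    return [
        "ntbackup", "backup", "systemstate",
        "/J", job_name,      # job name shown in the backup log
        "/F", backup_file,   # path of the .bkf file to create
        "/V:yes",            # verify the data after the backup
        "/L:s",              # write a summary log
    ]
```

On a cluster node, the list could be passed to subprocess.run, or joined into a single line for a scheduled task.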

Backing Up the Local Disks on a Cluster Node

The cluster node local disks should be backed up regularly and, if possible, should be backed up with the system state. This allows both the system state and local disks to be recovered if a complete server failure should occur.

To back up a cluster node's local disks, perform the following steps:

1.

Log on to the cluster node with an account that has the right to back up the system. (Any Local Administrator, Domain Administrator, or the Cluster Service account has the necessary permissions to complete the operation.)

2.

Click Start, All Programs, Accessories, System Tools, Backup.

3.

If this is the first time you've run Backup, it will open in Wizard mode. Choose to run it in Advanced mode by clicking the Advanced Mode hyperlink. After you change to Advanced mode, the window should look like the one in Figure 31.13.

4.

Click the Backup Wizard (Advanced) button to start the Backup Wizard.

5.

Click Next on the Backup Wizard Welcome screen to continue.

6.

On the What To Back Up page, choose the Back Up Selected Files, Drives, or Network Data button and click Next to continue.

7.

In the Items To Back Up window, shown in Figure 31.16, expand Desktop\My Computer and choose each of the local drives.

Figure 31.16. Choosing items to back up.


8.

Choose your backup media type and choose the correct tape or file. If you're creating a new file, specify the complete path to the file, and the backup will create the file automatically. Click Next to continue.

9.

If the file you specified resides on a network drive, click OK at the warning message to continue.

10.

Click Finish to complete the Backup Wizard and start the backup.

11.

When the backup is complete, review the backup log for detailed information and click Close on the Backup Progress window when finished.
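A scripted counterpart to these steps might generate one ntbackup command per local drive. This is a sketch, not a definitive procedure: it assumes ntbackup accepts a drive path as the backup selection together with the /M (backup type), /J, and /F switches, and the function name, node name, and file-naming scheme are hypothetical. One command per drive is generated so that no assumption is made about passing several selections at once.

```python
def local_disk_backup_commands(drive_letters, backup_dir, node_name):
    """Build one ntbackup argument list per local drive (nothing is executed)."""
    commands = []
    for drive in drive_letters:
        # Normalize inputs such as "c", "C:" or "C:\" to a bare letter.
        letter = drive.strip().rstrip("\\").rstrip(":").upper()
        backup_file = (
            backup_dir.rstrip("\\") + "\\" + node_name + "_" + letter + ".bkf"
        )
        commands.append([
            "ntbackup", "backup", letter + ":\\",
            "/M", "normal",                        # full (normal) backup
            "/J", node_name + " drive " + letter,  # job name
            "/F", backup_file,                     # one .bkf file per drive
        ])
    return commands
```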

Backing Up Shared Disks on a Cluster

Shared storage disks can be backed up in a few different ways. The first way is to back up the disks from the node that is currently hosting them. This way, the disks can be backed up using the same process used to back up local disks, except the shared disks are chosen in the Backup Selection window.

The second way requires knowledge of the disk drive letters or mount points; it can be run and scheduled from any machine on the network using an account with permission to back up the cluster disks. If the drive letters are known, the cluster administrator can create network places that point to the cluster disk's administrative hidden shares. Alternatively, the hidden drive shares can be mapped to a local drive letter and backed up using the appropriate mapped network drives.

For example, in a cluster called CLUSTER1 with nodes named SERVER1 and SERVER2 and two shared disks named Q and F, the administrator can back up the drives by creating a network place or mapping a drive to \\cluster1\F$ and \\cluster1\Q$. If the disk resources are currently running in groups active on SERVER1, the administrator can connect to those hidden drive shares using the UNC of \\SERVER1\F$ and \\SERVER1\Q$. Using the cluster name or the network name of the particular cluster group containing a disk resource is preferred because the path will be absolute regardless of which node the group is active on.
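The UNC paths in this example follow a fixed pattern, so they can be generated rather than typed. The helper below is a minimal sketch with a hypothetical function name; it only builds the path strings and does not connect to anything.

```python
def admin_share_paths(network_name, drive_letters):
    """Return the hidden administrative share path for each drive letter.

    network_name may be the cluster name, a cluster group's network name,
    or a node name, as discussed above.
    """
    return [
        "\\\\" + network_name + "\\" + letter.strip(": \\").upper() + "$"
        for letter in drive_letters
    ]
```

Calling admin_share_paths("cluster1", ["F", "Q"]) yields \\cluster1\F$ and \\cluster1\Q$, the two paths used in the example.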

Note

If shared disks are defined as volume mount points, backing up the drive also backs up data under the mount points.


Restoring a Single-Node Cluster When the Cluster Service Fails

When Cluster Service on a single node fails and will not start, it is usually a sign of corruption in the local cluster database file CLUSDB. In the interest of time, an administrator can replace the CLUSDB file with the latest CHKxxx.tmp file from the quorum disk's MSCS directory.

To replace the CLUSDB file, follow these steps:

1.

Log on to the cluster node using an account that has the right to back up the system. (Any Local Administrator, Domain Administrator, or Cluster Service account has the necessary permissions to complete the operation.)

2.

Open Cluster Administrator on an available cluster node. Then check to ensure that all cluster groups are running properly to verify that the Cluster Service problem is only on a single node.

3.

If only one node is experiencing Cluster Service startup problems, log on to the server console and click Start, All Programs, Administrative Tools, Services.

4.

In the Services applet, locate Cluster Service and double-click it.

5.

On the General tab of the property page for Cluster Service, set Startup Type to Disabled. Click OK to save changes.

6.

Reboot the server to release any file locks on the CLUSDB file.

7.

When the server completes the reboot process, log on with a Cluster Administrator account.

8.

Click Start, Run.

9.

Connect to the cluster quorum disk by using the UNC path \\<clustername>\<quorum_drive_letter>$. For example, in a cluster named cluster1 with a quorum disk named Q, use the path \\cluster1\Q$.

10.

Double-click the MSCS directory.

11.

Choose View, Details in the Explorer window.

12.

Locate the file named CHKxxx.tmp with the latest time stamp, similar to the one shown in Figure 31.17.

Figure 31.17. Choosing a backup set for restoral.


13.

Right-click the file and choose Copy. Then close the Explorer window.

14.

Click Start, Run.

15.

Type in the full path to the cluster directory and click OK. The default path is C:\windows\cluster, where C is the system drive and windows is the %SystemRoot% directory.

16.

Locate the CLUSDB file, right-click it, and choose Rename.

17.

Rename the file to CLUSDB.old and press Enter to save. If the file cannot be renamed, make sure the Cluster Service startup type is set to Disabled, reboot the server, and then try again.

18.

Choose Edit, Paste in the Explorer window. The CHKxxx.tmp file should now be copied to the C:\windows\cluster directory.

19.

Locate the CHKxxx.tmp file, right-click it, and choose Rename.

20.

Rename the file to CLUSDB and press Enter to save. If the file cannot be renamed, make sure the Cluster Service startup type is set to Disabled, reboot the server, and then try again.

21.

Close the Explorer window.

22.

Click Start, All Programs, Administrative Tools, Services.

23.

In the Services applet, locate Cluster Service and double-click it.

24.

On the General tab of Cluster Service's property page, change Startup Type to Automatic. Click OK to save your changes.

25.

Right-click Cluster Service and choose Start.

26.

When Cluster Service starts, move the appropriate group or groups to the recovered node to test failover functionality.
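The file handling in steps 12 through 20 (pick the newest CHKxxx.tmp, keep the old CLUSDB as CLUSDB.old, and put the checkpoint in its place) can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, file modification time stands in for "latest time stamp", and Cluster Service must already be disabled and stopped as described above so that CLUSDB is not locked.

```python
import shutil
from pathlib import Path


def latest_checkpoint(mscs_dir):
    """Return the CHK*.tmp file with the most recent modification time."""
    candidates = [p for p in Path(mscs_dir).glob("CHK*.tmp") if p.is_file()]
    if not candidates:
        raise FileNotFoundError("no CHK*.tmp file found in " + str(mscs_dir))
    return max(candidates, key=lambda p: p.stat().st_mtime)


def replace_clusdb(mscs_dir, cluster_dir):
    """Swap the newest checkpoint file in as CLUSDB, keeping CLUSDB.old."""
    checkpoint = latest_checkpoint(mscs_dir)
    cluster_dir = Path(cluster_dir)
    clusdb = cluster_dir / "CLUSDB"
    if clusdb.exists():
        # Keep the suspect database; an existing CLUSDB.old is overwritten.
        clusdb.replace(cluster_dir / "CLUSDB.old")
    shutil.copy2(checkpoint, clusdb)
    return checkpoint
```

In the chapter's example, mscs_dir would be \\cluster1\Q$\MSCS and cluster_dir would be C:\windows\cluster. The checkpoint on the quorum disk is copied, not moved, so the original stays in place.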

If this process does not restore operational status to Cluster Service, restore the system state from a previous backup by following these steps:

1.

Click Start, All Programs, Accessories, System Tools, Backup.

2.

If this is the first time you've run Backup, it will open in Wizard mode. Choose to run it in Advanced mode by clicking on the Advanced Mode hyperlink. After you change to Advanced mode, the window should look like the one in Figure 31.13.

3.

Click the Restore Wizard (Advanced) button to start the Restore Wizard.

4.

Click Next on the Restore Wizard Welcome screen to continue.

5.

On the What to Restore page, select the appropriate cataloged backup media, expand the catalog selection, and check System State, as shown in Figure 31.18. Click Next to continue.

Figure 31.18. Choosing to restore the system state.


6.

If the correct tape or file backup media does not appear in this window, cancel the restore process. Then, from the Restore Wizard page, locate and catalog the appropriate media and return to the restore process from step 1.

Note

Refer to Chapter 33 for information on how to catalog tape and file backup media.

7.

On the Completing the Restore Wizard page, click Finish to start the restore.

8.

When the process is complete, review the log for detailed information and click Close when finished.

9.

Reboot the restored cluster node as prompted.

10.

When Cluster Service starts, move the appropriate group or groups to the recovered node to test failover functionality.

Restoring a Single Node After a Complete Server Failure

When a single node fails, whether because of hardware problems or software corruption that cannot be repaired in a reasonable amount of time, the node must be rebuilt from scratch. After any hardware problems are resolved, the organization can decide what the best approach to server recovery will be. The two basic approaches to node recovery are outlined next.

Evicting and Rebuilding the Failed Node

This first node recovery process evicts the failed node from the cluster and requires the cluster administrator to rebuild the cluster node from scratch, rejoin the node to the cluster, install any cluster applications, and finally reconfigure the cluster's group failover and failback configurations.

To evict and rebuild the failed node, follow these steps:

1.

Shut down the failed cluster node.

2.

On an available cluster node, log in using a Cluster Administrator account.

3.

Click Start, Administrative Tools, Cluster Administrator.

4.

If Cluster Administrator does not connect to the cluster or connects to a different cluster, choose File, Open Connection.

5.

From the Action drop-down box, choose Open Connection to Cluster. Then, in the Cluster or Server Name drop-down box, type . (period) and click OK to connect.

6.

In the left pane of the Cluster Administrator window, right-click the offline cluster node and choose Evict Node.

7.

When the node is evicted, close Cluster Administrator and immediately start a backup of the local node's system state. Refer to the previous section "Backing Up the Cluster Node System State" for detailed steps for system state backup.

8.

On the failed node, install a clean copy of Windows Server 2003 Enterprise or Datacenter server.

9.

After it is loaded, configure the server to join the correct domain and configure all local drive letters and network card IP addresses as previously configured on the original cluster node. Then reboot if necessary.

10.

Follow the steps to rejoin the cluster as outlined in the previous section, "Adding Additional Nodes to a Cluster."

11.

After the node rejoins the cluster, install any cluster applications as outlined in the vendor's installation guide for cluster installation.

12.

Configure cluster group failover and failback as necessary and move cluster groups to their preferred node.

Restoring the Failed Node Using the ASR Restore

To restore the failed node using the ASR restore, follow these steps:

1.

Shut down the failed cluster node.

2.

On an available cluster node, log in using a Cluster Administrator account.

3.

Click Start, Administrative Tools, Cluster Administrator.

4.

If Cluster Administrator does not connect to the cluster or connects to a different cluster, choose File, Open Connection.

5.

From the Action drop-down box, choose Open Connection to Cluster. Then, in the Cluster or Server Name drop-down box, type . (period) and click OK to connect.

6.

Within each cluster group, make sure to disable failback to prevent these groups from failing over to a cluster node that is not completely restored. Close Cluster Administrator.

7.

Locate the ASR floppy created for the failed node or create the floppy from the files saved in the ASR backup media. For information on creating the ASR floppy from the ASR backup media, refer to the Help and Support tool on any Windows Server 2003 system.

8.

Insert the operating system CD in the failed server and start the server.

9.

If necessary, when prompted, press F6 to install any third-party storage device drivers. This includes any third-party disk or tape controllers that Windows Server 2003 will not recognize.

10.

Press F2 when prompted to perform an automated system recovery.

11.

When prompted, insert the ASR floppy disk and press Enter.

12.

The operating system installation will proceed by restoring disk volume information and reformatting the volumes associated with the operating system. When this process is complete, restart the server as requested by pressing F3 and then Enter in the next window.

13.

After the system restarts, press a key if necessary to restart the CD installation.

14.

If necessary, when prompted, press F6 to install any third-party storage device drivers. This includes any third-party disk or tape controllers that Windows Server 2003 will not recognize.

15.

Press F2 when prompted to perform an automated system recovery.

16.

When prompted, insert the ASR floppy disk and press Enter.

17.

This time, the disks can be properly identified and will be formatted, and the system files will be copied to the respective disk volumes. When this process is complete, the ASR restore will automatically reboot the server. Remove the ASR floppy disk from the drive. The graphic-based OS installation will begin.

18.

If necessary, specify the network location of the backup media using a UNC path and enter authentication information if prompted. The ASR restore will attempt to reconnect to the backup media automatically but will be unable to do so if the backup media are on a network drive.

19.

When the media are located, open the media and click Next. Then finish recovering the remaining ASR data.

20.

When the ASR restore is complete, if any local disk data was not restored with the ASR restore, restore all local disks.

21.

Click Start, All Programs, Accessories, System Tools, Backup.

22.

If this is the first time you've run Backup, it will open in Wizard mode. Choose to run it in Advanced mode by clicking the Advanced Mode hyperlink. After you change to Advanced mode, the window should look like the one in Figure 31.13.

23.

Click the Restore Wizard (Advanced) button to start the Restore Wizard.

24.

Click Next on the Restore Wizard Welcome screen to continue.

25.

On the What To Restore page, select the appropriate cataloged backup media, expand the catalog selection, and check each local drive. Click Next to continue.

26.

If the correct tape or file backup media do not appear in this window, cancel the restore process. Then locate and catalog the appropriate media from the Restore Wizard page and return to the restore process from step 23.

Note

Refer to Chapter 33 for information on how to catalog tape and file backup media.

27.

On the Completing the Restore Wizard page, click Finish to start the restore. Because you want to restore only what ASR did not, you do not need to make any advanced restore configuration changes.

28.

When the restore is complete, reboot the server as prompted.

29.

After the reboot is complete, log on to the restored cluster node and check cluster node functionality.

30.

If everything is working properly, open Cluster Administrator and configure all cluster group failover and failback configurations.

31.

Move cluster groups to their preferred node and close Cluster Administrator.

Restoring an Entire Cluster to a Previous State

Changes to a cluster should be made with caution and, if at all possible, should be made in a lab environment first. When cluster changes have been implemented and deliver undesirable effects, the way to roll back the cluster configuration to a previous state is to restore the cluster quorum to all nodes. This process is simpler than it sounds and is performed from only one node. There are only two disadvantages to this process:

  • All the cluster nodes that were members of the cluster previously need to be currently available and operational in the cluster. For example, if Cluster1 was made up of Server1 and Server2, both of these nodes need to be active in the cluster before the previous cluster configuration can be rolled back.

  • To restore a previous cluster configuration to all cluster nodes, the entire cluster needs to be taken offline long enough to restore the backup, reboot the node from which the backup was run, and manually start Cluster Service on all remaining nodes.

Note

If a cluster node is in a failed state, the cluster configuration cannot be rolled back. Refer to the "Restoring a Single Node After a Complete Server Failure" or the "Restoring the Failed Node Using the ASR Restore" sections to restore a failed cluster node to operational status and then restore a previous cluster configuration as shown here.
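The first requirement above is a simple membership check: every node that belongs to the cluster must be active before the rollback starts. A minimal sketch follows, with a hypothetical function name; how the list of active nodes is obtained (for example, by reading it from Cluster Administrator) is left outside the sketch.

```python
def nodes_blocking_rollback(cluster_members, active_nodes):
    """Return the cluster members that are not currently active.

    An empty result means the rollback precondition is met.
    Node names are compared without regard to case.
    """
    active = {name.lower() for name in active_nodes}
    return sorted(m for m in cluster_members if m.lower() not in active)
```

For Cluster1 with Server1 and Server2, the result is empty only when both nodes are reported active; otherwise it names the node that must be restored first.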


To restore an entire cluster to a previous state, perform the following steps:

1.

Log on to the cluster node using an account that has the right to back up the system. (Any Local Administrator, Domain Administrator, or Cluster Service account has the necessary permissions to complete the operation.)

2.

Click Start, All Programs, Accessories, System Tools, Backup.

3.

If this is the first time you've run Backup, it will open in Wizard mode. Choose to run it in Advanced mode by clicking the Advanced Mode hyperlink. After you change to Advanced mode, the window should look like the one in Figure 31.13.

4.

Click the Restore Wizard (Advanced) button to start the Restore Wizard.

5.

Click Next on the Restore Wizard Welcome screen to continue.

6.

On the What To Restore page, select the appropriate cataloged backup media, expand the catalog selection, and check System State (refer to Figure 31.18). Click Next to continue.

7.

If the correct tape or file backup media does not appear in this window, cancel the restore process. Then, from the Restore Wizard page, locate and catalog the appropriate media and return to the restore process from step 4.

8.

On the Completing the Restore Wizard page, select the Advanced button to configure advanced restore settings.

9.

On the Where To Restore page, choose to restore files to the original location and click Next.

10.

A warning message will pop up stating that restoring the system state will overwrite the current system state. Click OK to continue.

11.

On the How To Restore page, choose the Leave Existing Files (Recommended) radio button and click Next to continue.

12.

On the Advanced Restore Options page, check the Restore the Cluster Registry to the Quorum Disk and All Other Nodes box, similar to the options selected in Figure 31.19, and click Next to continue.

Figure 31.19. Selecting options for restoral.


13.

A warning message pops up stating that this restore will replace the master version of the cluster quorum and will stop Cluster Service on all the other nodes in the cluster. Click Yes to continue.

14.

On the Completing The Restore Wizard page, click Finish to start the restore.

15.

When the process is complete, review the log for detailed information and click Close when finished.

16.

Reboot the restored cluster node as prompted.

17.

After the restored node completes rebooting and the previous cluster configuration is restored, start Cluster Service on all the remaining cluster nodes.

18.

Move cluster groups as desired and close Cluster Administrator.
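Step 17 requires Cluster Service to be started manually on every node other than the one that was restored. The sketch below only builds the commands and does not run them. It assumes the sc.exe remote syntax (sc \\server start service) and that the short service name of Cluster Service is ClusSvc; the function name is hypothetical.

```python
def start_cluster_service_commands(all_nodes, restored_node):
    """Build an sc.exe start command for each node except the restored one."""
    return [
        ["sc", "\\\\" + node, "start", "ClusSvc"]
        for node in all_nodes
        if node.lower() != restored_node.lower()
    ]
```

The restored node is skipped because its Cluster Service starts during the reboot in step 16.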

Restoring Cluster Nodes After a Cluster Failure

Cluster nodes can be restored after a cluster failure using a combination of the previously described restore steps, with a few added steps. If each cluster node can start but Cluster Service cannot start on any node, there is most likely a problem with the quorum drive or quorum data.

To restore the cluster nodes in this situation, follow these steps:

1.

To restore the quorum data, follow the steps outlined in the section titled "Restoring a Single-Node Cluster When the Cluster Service Fails."

2.

After the system state restore is completed, if Cluster Service starts on the first node, start Cluster Service on all the remaining nodes.

If Cluster Service does not start, there may be a problem with the cluster quorum drive. Make any necessary repairs on the cluster quorum drive and restore the cluster quorum as outlined in the section "Restoring a Single-Node Cluster When the Cluster Service Fails."

If Cluster Service still does not start, follow the instructions in the Windows Server 2003 Help and Support article named "Recover from a Corrupted Quorum Log or Quorum Disk."

When all nodes in the cluster are non-operational and the cluster nodes need to be rebuilt from scratch, follow these steps:

1.

Power off all nodes in the cluster.

2.

Power on only one cluster node and perform an ASR restore as outlined in the section "Restoring the Failed Node Using the ASR Restore." This restore should restore the node and Cluster Service and basic cluster functionality.

3.

Restore any missing local disk data and cluster disk data.

4.

Perform ASR and local disk restores on remaining cluster nodes to restore complete cluster functionality.




Microsoft Windows Server 2003 Unleashed (R2 Edition)
ISBN: 0672328984
Year: 2006
Pages: 499
