Chapter 22: Cluster Maintenance and Recovery | TruCluster Server Handbook (HP Technologies)

Overview

Clusters are great when things are working well. They are great even when things aren't working so well, since high availability and robustness are among a cluster's main benefits. Still, we need to be prepared for some bad situations. For example, what do you do if you lose an entire member boot disk (which affects only a single member), or the cluster_root, cluster_usr, and cluster_var file systems? Or what if something happens to your data file system(s)? These types of problems should be extremely rare if you've followed our advice and the advice given in the TruCluster Server documentation and built your cluster with no single point of failure. But sometimes bad things still happen. For example, an errant "rm *" in the wrong directory can cause extensive damage that may be repairable only by restoring the affected file system(s). We'll tackle some of these types of problems and show you how to work your way out of a few tight situations. In addition, we'll show how to change some of the characteristics of your cluster, such as its IP address and the cluster interconnect.

We will cover the following:

	Section
Backup and Restore of Critical Cluster File Systems	22.1
Replacing HBA and/or HSx Controllers	22.2
Installing Customer Specific Patches	22.3
Multi-Path Storage	22.4
I/O Barriers and the `cleanPR` Command	22.5
How to Replace a Failed Quorum Disk	22.6
Migrating from MC to LAN Cluster Interconnect (and vice versa)	22.7
Name and Address Changes	22.8
References	22.9