28.2 Metrocluster

     

When we consider the architecture of an Extended Serviceguard cluster alongside that of a Metrocluster, we find no architectural differences as such; the following characteristics apply to both:

  • Multiple, separate data centers are employed; we are guarding against a catastrophic site/network failure.

  • Each major data center should have the same number of nodes.

  • Metrocluster is a single cluster of up to 16 nodes.

  • Distances of up to 100 km are supported, utilizing DWDM technology.

  • All data center machines are connected to shared storage devices.

  • Networking is via single IP subnets utilizing optical link-level interfaces.

  • Redundancy of all SAN and network components is essential.

  • Failover to an adoptive node is automatic.

There are, however, fundamental differences between the two architectures:

  • Use of cluster lock disks is not supported.

  • Cluster lock functionality is provided via arbitrator nodes or a Quorum Server.

    - Two arbitrator nodes in a third data center are the preferred solution because one of these nodes can then be shut down for urgent maintenance without affecting the availability of cluster lock functionality.

  • Data replication is performed by intelligent disc arrays such as HP's XP disc arrays.

  • Additional software components are loaded on all nodes in the cluster. There are two forms of Metrocluster:

    1. Metrocluster/CA : For use in conjunction with HP XP disc arrays

    2. Metrocluster/SRDF : For use in conjunction with EMC Symmetrix disc arrays

We briefly discuss the workings of Metrocluster/CA. While the concepts for Metrocluster/SRDF are the same, the implementation of how EMC performs remote data replication is slightly different. This brief discussion assumes that you have a complete understanding of how HP's XP disc arrays operate in a Continuous Access configuration; without that prerequisite knowledge, these tasks will be almost impossible to understand, let alone implement. As well as providing the entire infrastructure that we provided for our Extended Serviceguard cluster, we need to ensure that we have performed the following:

  1. Installed Metrocluster with Continuous Access XP Toolkit (B8109BA).

  2. Verified that Serviceguard A.11.13 or later is installed on all nodes in the cluster, because the current version of Metrocluster/CA (version A.04.10) requires it.

  3. Provided cluster lock functionality either via arbitrator nodes (preferred) or a Quorum Server.

  4. Configured and tested the appropriate CA-LUNs on the main and remote XP disc arrays.

  5. Installed RAID Manager XP software on all nodes in the cluster.

  6. Set up and tested the appropriate RAID Manager instances on all nodes in the cluster.

  7. Set up the appropriate LVM/VxVM/CVM shared volume/disc groups on the CA-LUNs.
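Before going further, it can be worth confirming on each node that the pieces listed above are actually in place. The sketch below checks only for files whose paths appear later in this section; the swlist check assumes the toolkit registers under the product number B8109BA given in step 1, and check_files is simply a helper introduced here:

```shell
# Minimal per-node sanity sketch; paths are the ones used in this section.
# check_files: report which of the given paths exist; return nonzero
# if any are missing.
check_files() {
    rc=0
    for f in "$@"; do
        if [ -e "$f" ]; then
            echo "OK:      $f"
        else
            echo "MISSING: $f"
            rc=1
        fi
    done
    return $rc
}

check_files /opt/cmcluster/toolkit/SGCA/xpca.env \
            /usr/sbin/DRCheckDiskStatus || echo "some prerequisites missing"

# swlist is HP-UX's installed-software lister; adjust the level
# (-l product / -l bundle) to however B8109BA is registered on your system.
swlist -l product B8109BA >/dev/null 2>&1 || echo "MISSING: B8109BA"
```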

Now we are ready to get started. The Metrocluster/CA toolkit is a series of template files under /opt/cmcluster/toolkit/SGCA :

 

 root@hpeos001[SGCA] pwd
 /opt/cmcluster/toolkit/SGCA
 root@hpeos001[SGCA] ll
 total 44
 dr-xr-xr-x   2 bin        bin           1024 Aug 19 01:44 Samples
 -rwxr--r--   1 bin        bin            429 Jun 29  2001 sgcapkg.cntl
 -rwxr--r--   1 bin        bin          20184 Mar  8  2002 xpca.env
 root@hpeos001[SGCA] ll Samples/
 total 80
 -r--r--r--   1 bin        bin           2335 Nov  2  2000 Readme
 -rwxr--r--   1 bin        bin            620 Nov  2  2000 ftpit
 -rwxr--r--   1 bin        bin           6338 Nov  2  2000 horcm0.conf.ftsys1
 -rwxr--r--   1 bin        bin           6340 Nov  2  2000 horcm0.conf.ftsys1a
 -rwxr--r--   1 bin        bin           6338 Nov  2  2000 horcm0.conf.ftsys2
 -rwxr--r--   1 bin        bin           6340 Nov  2  2000 horcm0.conf.ftsys2a
 -rwxr--r--   1 bin        bin           3154 Nov  2  2000 mk1VGs
 -rwxr--r--   1 bin        bin           2616 Nov  2  2000 mk2imports
 -rwxr--r--   1 bin        bin            516 Nov  2  2000 services.example
 root@hpeos001[SGCA]

The files under the Samples directory give you guidelines on how to set up instances, an /etc/services file, exporting/importing volume groups, and so on. Prior to Metrocluster/CA version A.03.00, the file sgcapkg.cntl was highly significant: we would use it as our package control script in a Metrocluster environment, as it contained commands specific to Metrocluster and XP disk arrays. This is no longer the case. We can simply build our package control script as we would for any other package, i.e., cmmakepkg -s <package>.cntl; there is logic within the default package control scripts to detect when we are running in a Metrocluster environment. The real crux of the Metrocluster/CA toolkit is the file xpca.env. This file details the components of our RAID Manager/XP instances, the names of the nodes in the data centers, and what will happen in the event of a failure of one site. Do we automatically resynchronize the primary site from the remote secondary site after fixing whatever the problem was? What should we do if the secondary site fails? Should we continue with the pair relationship or destroy the relationship? These and a number of other behavioral questions are answered by setting the appropriate variables in the xpca.env file. As you can see from the output above, the file is approximately 20KB in size. I will not list the entire file here; it contains extensive comments regarding the behavior of the various configuration parameters, and it is well worth reading before proceeding. It is also well worth reading the excellent manual Designing Disaster Tolerant High Availability Clusters (Part Number: B7660-90013) available from http://docs.hp.com/ha/index.html#Metrocluster. What I will do is outline the steps to enable your packages to operate in this environment:

  1. Install Metrocluster/CA on all nodes in the cluster.

  2. Build a Serviceguard package as before, e.g.:

    1. mkdir /etc/cmcluster/clockwatch

    2. cd /etc/cmcluster/clockwatch

    3. cmmakepkg -p clockwatch.conf

    4. cmmakepkg -s clockwatch.cntl

    5. Configure both files as appropriate.

  3. Copy the xpca.env file into the package directory ensuring that the target filename is correct, e.g.:

     

     # cp /opt/cmcluster/toolkit/SGCA/xpca.env \
          /etc/cmcluster/clockwatch/clockwatch_xpca.env

  4. Configure the clockwatch_xpca.env file as appropriate. At the end of the file is the list of configuration parameters. It is a good idea to cut-and-paste these lines to ensure that you know what the default values are. You can then uncomment the variables and set them to the appropriate values:

     

     AUTO_PSUEPSUS=0
     AUTO_FENCEDATA_SPLIT=1
     AUTO_SVOLPSUS=0
     AUTO_SVOLPSUE=0
     AUTO_SVOLPFUS=0
     AUTO_PSUSSSWS=0
     AUTO_NONCURDATA=0
     MULTIPLE_PVOL_OR_SVOL_FRAMES_FOR_PKG=0

    These values have default values that seem appropriate in most cases. Be careful to ensure that you understand the implications of changing these variables.

  5. Set the HORCMPERM variable as appropriate. The default of MGRNOINST means that we are not using a RAID Manager permissions file.

     

     export HORCMPERM=MGRNOINST 

  6. Set the HORCMINST variable to the RAID Manager Instance number to be used by Metrocluster/CA.

     

     export HORCMINST=0

  7. Set the HORCMTIMEOUT value as appropriate. The default is 360 seconds (60 seconds greater than the default sidefile timeout value). If using CA-Extended (Asynchronous), set this value to be greater than the sidefile timeout value but less than the RUN_SCRIPT_TIMEOUT in the package configuration file.

     

     HORCMTIMEOUT=360 

  8. Set the WAITTIME variable as appropriate:

     

     WAITTIME=300 

    This variable sets the time to wait while monitoring the state of a pair after a pairresync has been issued. It is advised not to change this from its default value.

  9. Set the PKGDIR variable as appropriate:

     

     PKGDIR="/etc/cmcluster/clockwatch" 

  10. Set the FENCE level to DATA, NEVER, or ASYNC, as appropriate.

     

     FENCE=DATA 

  11. Define the list of hosts in Data Center 1 and Data Center 2

     

     DC1HOST[0]=""
     DC1HOST[1]=""
     DC2HOST[0]=""
     DC2HOST[1]=""

  12. Set the DEVICE_GROUP variable as appropriate:

     

     DEVICE_GROUP="oracle" 

    This is the Device Group name given to the CA-LUNs in your instance configuration files.

  13. Set the CLUSTER_TYPE variable to metro .

     

     CLUSTER_TYPE="metro" 
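With all thirteen steps done, a few cross-checks can catch configuration slips before cmcheckconf does. The sketch below simply restates the constraints described above in shell form; SIDEFILE_TIMEOUT and RUN_SCRIPT_TIMEOUT are assumed example values for illustration, not parameters of the xpca.env file itself:

```shell
# Values chosen in steps 4-13 above.
HORCMTIMEOUT=360
FENCE=DATA
CLUSTER_TYPE="metro"

# Assumed example values for this sketch only.
SIDEFILE_TIMEOUT=300      # sidefile timeout (seconds)
RUN_SCRIPT_TIMEOUT=600    # from the package configuration file

errs=0

# FENCE must be one of the three documented levels.
case "$FENCE" in
    DATA|NEVER|ASYNC) : ;;
    *) echo "bad FENCE: $FENCE"; errs=$((errs + 1)) ;;
esac

# CLUSTER_TYPE must be "metro" for Metrocluster.
[ "$CLUSTER_TYPE" = "metro" ] || { echo "bad CLUSTER_TYPE"; errs=$((errs + 1)); }

# HORCMTIMEOUT must sit between the sidefile timeout and RUN_SCRIPT_TIMEOUT.
if [ "$HORCMTIMEOUT" -le "$SIDEFILE_TIMEOUT" ] ||
   [ "$HORCMTIMEOUT" -ge "$RUN_SCRIPT_TIMEOUT" ]; then
    echo "HORCMTIMEOUT $HORCMTIMEOUT out of range"
    errs=$((errs + 1))
fi

echo "validation errors: $errs"
```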

Prior to Metrocluster/CA version A.04.20, there was nothing else to do. You would run cmcheckconf and cmapplyconf, and your package would start up as expected. Before proceeding, it is worth understanding what commands are run as part of the startup of our package. Here is an excerpt from a package control script:

 

 # START OF RUN FUNCTIONS
 ###############################################################
 # This function checks for the existence of Metrocluster or
 # ContinentalClusters packages that use physical data
 # replication via Continuous Access XP on HP SureStore XP
 # series disk arrays or SRDF on EMC Symmetrix disk arrays.
 #
 # If the /usr/sbin/DRCheckDiskStatus file exists in the system,
 # then the cluster has at least one package which will be
 # configured for remote data mirroring in a metropolitan or
 # continental cluster.
 #
 # The function is called before attempting to activate the
 # volume group. If no /usr/sbin/DRCheckDiskStatus file exists,
 # the function does nothing.
 ###############################################################
 #
 function verify_physical_data_replication
 {
     if [[ -x /usr/sbin/DRCheckDiskStatus ]]
     then
         /usr/sbin/DRCheckDiskStatus "${0}" "${VGCHANGE}" \
             "${CVM_ACTIVATION_CMD}" "${VG[*]}" \
             "${CVM_DG[*]}" "${VXVM_DG[*]}"
         exit_val=$?
         if [[ $exit_val -ne 0 ]]
         then
             exit $exit_val
         fi
     fi
 }

The command /usr/sbin/DRCheckDiskStatus is part of the Metrocluster/CA Toolkit. It is simply a script that checks the status of a CA pair and, where appropriate, takes action to resynchronize it (according to the AUTO_ variables you set in the <package>_xpca.env file). This script is relatively short; it works out whether we are running Metrocluster/CA or Metrocluster/SRDF. DRCheckDiskStatus calls a further script (a program as of version A.04.20 of Metrocluster/CA) called /usr/sbin/DRCheckXPCADevGrp, which performs all the necessary checks on a RAID Manager Device Group. If an error state occurs, DRCheckDiskStatus will exit in such a way that our package control script terminates abnormally and our package does not start up. The function verify_physical_data_replication is the first function executed in an attempt to start the package.

Okay, so far so good. We now know what happens at package startup time. What about ongoing monitoring of your CA pairs? Prior to Metrocluster/CA version A.04.20, this was entirely left to the individual administrator. As of version A.04.20, we have additional configuration possibilities to set up a device group monitoring process. Personally, I think this is crucial: if we have a catastrophic failure of a site, an XP array, or a CA link, we will want to know about it relatively quickly and be able to instigate a Disaster Recovery process to roll operations over to the remote site. This is exactly what the Device Group Monitor will accomplish via the command /usr/sbin/DRMonitorXPCADevGrp. To configure the Device Group Monitor, we set up a number of additional variables in the xpca.env file and an additional SERVICE_PROCESS in the package configuration and control scripts. Here's how it works:

  1. Set up additional service in the package configuration file:

     

     SERVICE_NAME clockwatch_devgrpmon.srv SERVICE_FAIL_FAST_ENABLED NO SERVICE_HALT_TIMEOUT 5 

    The timeout of 5 seconds is a recommended minimum.

  2. Define Service Process in the package control script:

     

     SERVICE_NAME[0]="clockwatch_devgrpmon.srv" SERVICE_CMD[0]="/usr/sbin/DRMonitorXPCADevGrp /etc/cmcluster/clockwatch/clockwatch_xpca.env" SERVICE_RESTART[0]="-r 10" 

  3. Define the Monitoring variable in the clockwatch_xpca.env file:

     

     MON_POLL_INTERVAL=10 

    Define how often to poll the status of the Device Group CA pairs (default = 10 minutes)

     

     MON_NOTIFICATION_FREQUENCY=0 

    This signifies the frequency of notifications. If left at the default of 0, notifications will occur only when a change of state in the Device Group has occurred. If set to 5, for example, the monitor will send a notification every 5 polling intervals or when a change of state is detected.

     

     MON_NOTIFICATION_EMAIL=charles.keenan@hp.com,root@hpeos001 

    The email addresses to send notifications to. By default, this is an empty string.

     

     MON_NOTIFICATION_SYSLOG=1 

    This variable will determine whether we send notifications to syslog.log (default=0, i.e., no notifications are sent to syslog.log ).

     

     MON_NOTIFICATION_CONSOLE=1 

    Decide whether to send notifications to the system console (default=0, i.e., no notifications are sent to the system console).

     

     AUTO_RESYNC=0 

    Define the behavior of the monitor in attempting a resynchronization when the CA link is down: default=0, i.e., no resynchronization is performed; 1=the monitor will split a remote Business Copy (if configured) and try to resynchronize the device; 2=the administrator is assumed to be managing any remote Business Copy volume manually. A resync will occur only if a file called MON_RESYNC exists in the package directory.
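Conceptually, one polling pass of the Device Group Monitor behaves along the lines sketched below. This is not the real DRMonitorXPCADevGrp, just an illustration of how MON_POLL_INTERVAL and MON_NOTIFICATION_FREQUENCY interact; pair_state is a stand-in for the RAID Manager status query the real monitor performs:

```shell
# Illustrative sketch only; pair_state stands in for querying the
# device group's CA status (the real tool uses RAID Manager).
pair_state() {
    echo "${FAKE_STATE:-PAIR}"
}

MON_NOTIFICATION_FREQUENCY=0   # 0 = notify on state change only
prev_state=PAIR
poll_count=0

# One polling pass: in the real monitor this runs every
# MON_POLL_INTERVAL minutes.
poll_once() {
    state=$(pair_state)
    poll_count=$((poll_count + 1))
    if [ "$state" != "$prev_state" ]; then
        echo "NOTIFY: state changed $prev_state -> $state"
    elif [ "$MON_NOTIFICATION_FREQUENCY" -gt 0 ] &&
         [ $((poll_count % MON_NOTIFICATION_FREQUENCY)) -eq 0 ]; then
        echo "NOTIFY: periodic report, state $state"
    fi
    prev_state=$state
}

FAKE_STATE=PSUE   # simulate the pair dropping into a suspended-error state
poll_once         # prints: NOTIFY: state changed PAIR -> PSUE
```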

Once we have achieved all these steps, we can run cmcheckconf, run cmapplyconf, and start the package. If you are not going to use the Device Group monitor, I would strongly suggest that you implement some automated mechanism to inform you of any state changes in the Device Group. Some administrators I know have looked into using EMS as a mechanism to fail over a package based on the status of an XP CA device. It should be noted that this solution was simplistic, and the administrators in question had to write a number of supporting scripts in an attempt to automate the resynchronization process. In their view, if a failure of this nature occurred, they wanted to know about it first and then make decisions regarding what corrective action to take. If you are planning to perform these tasks manually, make sure that you are aware of the behavior of commands such as pairresync -swapp, pairresync -swaps, and horctakeover.
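For reference, a manual session with RAID Manager typically revolves around commands of the following shape. This is a sketch only: oracle is the device group name used earlier in this section, and which options to use (and on which side of the pair to run them) depend entirely on the failure scenario, so consult the RAID Manager documentation before using any of them in anger:

```shell
pairdisplay -g oracle -fc      # inspect pair status, fence level, copy progress
horctakeover -g oracle -t 360  # take over the device group (e.g., on the SVOL side)
pairresync -g oracle -swaps    # swap PVOL/SVOL personalities and resynchronize
```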

Extensive testing of the data replication process, and of any automated processes that confirm data replication has been accomplished, is absolutely vital. As with any of the cluster solutions we are discussing, we are attempting to provide high availability for our mission-critical applications; should the data for these applications become corrupt, the applications themselves become useless. Ensure that you are fully aware of the issues relating to offsite data replication when using tools such as Continuous Access XP, and that you understand the implications of using such a solution in conjunction with your applications. One customer I know wanted to implement a Metrocluster in conjunction with Oracle Parallel Server (concurrent database instances running on different nodes) but was told that having two possible sources for the database was unsupported. The customer instead implemented an Extended Serviceguard Cluster using host-based data replication (MirrorDisk/UX), which was supported. The customer did point out that MirrorDisk/UX gave more than one source for the data as well. The software supplier rejected this argument on the grounds that MirrorDisk/UX provides logical data replication (the names of the logical volumes do not change), whereas a solution such as Continuous Access is regarded as physical data replication: access to the logical volumes requires importing volume groups and completely different device files, i.e., an additional source for the database.



HP-UX CSE(c) Official Study Guide and Desk Reference
Year: 2006
Pages: 434