25.7 Setting Up a Serviceguard Package-less Cluster

     

We have finally arrived at creating the cluster. I think you will agree that all our previous discussions were worthwhile. Now that we understand the implications and requirements for setting up a Serviceguard cluster, we can proceed. By the end of this section, we will have a running and tested package-less cluster. We use the cookbook we saw at the beginning of this chapter to set up the cluster and construct a number of tests to ensure that the cluster daemons are working as expected.

25.7.1 Understand the hardware and software implications of setting up a cluster

This topic was covered in the previous section. If you have jumped straight to this section, I think it would be beneficial for you to review the preceding six sections.

25.7.2 Set up NTP between all cluster members

The activities of all nodes in the cluster are now coordinated in an attempt to offer a unified computing resource. All nodes in the cluster should therefore be using a common time source, so it is a good idea to set up Network Time Protocol (NTP). As a minimum, use one of the machines as the time source, synchronizing with its local clock (a Local Clock Impersonator).
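As a minimal sketch, here is what the relevant NTP configuration might look like; the hostnames, driftfile path, and local-clock unit number are assumptions, and on HP-UX 11i the xntpd daemon is typically enabled via XNTPD=1 in /etc/rc.config.d/netdaemons:

 # /etc/ntp.conf on the node acting as the time source (Local Clock Impersonator)
 # 127.127.1.1 is the pseudo-address of the local clock driver (unit number assumed)
 server 127.127.1.1
 fudge  127.127.1.1 stratum 10
 driftfile /etc/ntp.drift

 # /etc/ntp.conf on every other cluster node (server hostname assumed)
 server hpeos001
 driftfile /etc/ntp.drift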

25.7.3 Ensure that any shared volume groups are not activated at boot time

We are now at the stage of coordinating activities between nodes. As we have mentioned, Serviceguard is a part of that process. An important part of Serviceguard's role is to coordinate the use of shared data. If we are using LVM volume groups, we need to ensure that our shared volume groups are not activated at system boot time. It is up to Serviceguard to activate a volume group on a node that needs access to the data. In order to disable volume group activation at boot time, we need to modify the startup script /etc/lvmrc . The first part of this process is as follows:

 

 AUTO_VG_ACTIVATE=1 

Changed to

 

 AUTO_VG_ACTIVATE=0 

You then need to tell /etc/lvmrc which volume groups are to be activated at boot time. These will be volume groups that contain files and data that are unique to individual nodes, e.g., a node may have a volume group that contains application documentation and manuals. The list does not include vg00 because vg00 is activated before /etc/lvmrc is referenced. Having your additional volume groups activated at boot time is accomplished in the function custom_vg_activation() . I have listed the entire function below; the line I edited is the parallel_vg_sync call for /dev/vgmanuals .

 

 custom_vg_activation()
 {
         # e.g., /sbin/vgchange -a y -s
         #       parallel_vg_sync "/dev/vg00 /dev/vg01"
         parallel_vg_sync "/dev/vgmanuals"
         return 0
 }

This needs to be performed on all nodes in the cluster. The list of volume groups that are to be activated will vary from machine to machine.
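A quick, hedged way to confirm the setting on every node (hostnames assumed, and remsh access between the nodes is presumed to be working) might be:

 # Check that automatic VG activation is disabled on each node
 for node in hpeos001 hpeos002
 do
     echo "=== $node ==="
     remsh $node grep AUTO_VG_ACTIVATE /etc/lvmrc
 done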

25.7.4 Install Serviceguard and any related Serviceguard patches

To get the most up-to-date patches for Serviceguard, browse on the Web to Hewlett-Packard's IT Resource Center at http://itrc.hp.com. Registration is free, and you'll get access to lots of useful information as well as access to the most up-to-date patches. The issue of patches is covered in Chapter 12, "HP-UX Patches." Patches should be installed after the base product is installed. Installing the base product is pretty straightforward; it's simply a case of a swinstall , and most recent versions of Serviceguard don't even require a reboot. Here are some commands to check whether it is installed:

 

 #  swlist -l fileset Serviceguard
 #  swlist -l fileset -s /cdrom -a is_reboot Serviceguard

If you are installing either the "Mission Critical" or "Enterprise" Operating Environment for HP-UX 11i, Serviceguard should be installed automatically.
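If you do need to install the base product by hand, a minimal sketch of the swinstall invocation would be the following; the depot path and product name are assumptions, so confirm the exact name with swlist -s /cdrom first:

 # Install Serviceguard from a mounted software depot (depot path and product name assumed)
 swinstall -s /cdrom Serviceguard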

It might be a good idea to make sure that you install two additional products at the same time:

  • EMS HA Monitors (version A.03.20.01 or later) : This usually entails at least six products on CD 2 of your 11i Core OS CD (alternatively on your Applications CD prior to 11i):

    - EMS-Config

    - EMS-Core

    - EMS-DiskMonitor

    - EMS-KRMonitor

    - EMS-MIBMonitor

    - EMS-RdbmsMon

    We use these for monitoring other critical resources.

  • Cluster Object Manager (version B.01.04 or later) : This usually entails a single product on the same CD as the EMS HA Monitors:

    - Cluster-OM

    This will be used later with the Serviceguard Manager product.

At this point, I downloaded and installed the Serviceguard patch PHSS_28851 onto my systems. NOTE: Check that you have the most up-to-date patch for your version of Serviceguard:

 

 root@hpeos001[oracle1] #  swlist PHSS_28851
 # Initializing...
 # Contacting target "hpeos001"...
 #
 # Target:  hpeos001:/
 #

 # PHSS_28851                    1.0            Serviceguard and SG-OPS Edition A.11.14
   PHSS_28851.ATS-MAN            1.0            Service Guard Advanced Tape Services
   PHSS_28851.ATS-RUN            1.0            Service Guard Advanced Tape Services
   PHSS_28851.CM-CORE            1.0            CM-CORE Serviceguard OPS Edition SD fileset
   PHSS_28851.CM-CORE-MAN        1.0            CM-CORE-MAN Serviceguard OPS Edition SD fileset
   PHSS_28851.CM-PKG             1.0            CM-PKG Serviceguard OPS Edition SD fileset
   PHSS_28851.CM-PKG-MAN         1.0            CM-PKG-MAN Serviceguard OPS Edition SD fileset
 root@hpeos001[oracle1] #

25.7.5 Installing a Quorum Server (optional in a basic cluster)

This is optional, but it's becoming more common. First, you need to choose a system or systems where you will run the Quorum Server software. I say "systems" because someone quite kindly pointed out that a single Quorum Server could be seen as an SPOF. To alleviate this, I would run my Quorum Server in a different and separate Serviceguard cluster and eliminate the SPOF by configuring a package that manages the Quorum Server application. If the primary Quorum Server node fails, it will fail over to an adoptive node. The fact that we can associate an IP address with an application means that our original cluster maintains contact with the Quorum Server via its package IP address. As an example, if you had "Finance" and "Sales" clusters, each could run a Quorum Server package for the other cluster! Now back to the installation:

  1. Choose a node or nodes that are not part of this cluster. The nodes can be running either HP-UX or Linux.

  2. Install the Quorum Server software (B8467BA version A.02.00):

    1. Either from the HP Serviceguard Distributed Components CD, or

    2. Download the product for free from http://software.hp.com, under "High Availability." The last time I looked, it was titled "Serviceguard Quorum Server." Use the appropriate tool, i.e., swinstall or rpm , to install the product.

  3. The installation doesn't put an entry into /etc/inittab (some installations will configure the Quorum Server software as a Serviceguard package that can subsequently move to an adoptive node should the original Quorum Server fail). You are going to have to do that yourself to ensure that /usr/lbin/qs ( /usr/local/qs/bin/qs for Linux) gets started at boot-up time and gets restarted (the respawn action in /etc/inittab ) if necessary. The entry should look something like this:

     

     qs:345:respawn:/usr/lbin/qs >> /var/adm/qs/qs.log 2>&1 

    It might be a good idea to ensure that the /var/adm/qs directory has been created as well. Don't run init q yet because the qs daemon will refuse to start if you don't have an authorization file in place.

  4. Set up an authorization file of all nodes requiring Quorum Services. It is simply a list of hostnames and/or IP addresses (why not put both!), one per line; ensure that all nodes in the cluster are entered in the authorization file. The authorization file is called /etc/cmcluster/qs_authfile (or /usr/local/qs/conf/qs_authfile for Linux); an example appears after this list.

  5. Now we can start the qs daemon by running init q .

  6. Check that the qs daemon is running by monitoring the file /var/adm/qs/qs.log . Here's the output from a machine that successfully started the qs daemon.

     

     Apr 05 16:51:32:0:Starting quorum server
     Apr 05 16:51:32:0:Total allocated: 440788 bytes, used: 20896 bytes, unused 419880 bytes
     Apr 05 16:51:32:0:Server is up and waiting for connections at port 1238

    1. If you need to update the authorization file, e.g., you add a new node to the cluster, ensure that you get the qs daemon to reread the authorization file. To do this, you simply run the qs command with the -update command-line argument.

      For HP-UX: # /usr/lbin/qs -update

      For Linux: # /usr/local/qs/bin/qs -update
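For reference, here is a hedged example of the authorization file mentioned in step 4; the hostnames and IP addresses are simply those of our two cluster nodes:

 # /etc/cmcluster/qs_authfile on the Quorum Server node
 # (or /usr/local/qs/conf/qs_authfile on Linux)
 # List every cluster node by hostname and by IP address, one per line
 hpeos001
 hpeos002
 192.168.0.201
 192.168.0.202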

25.7.6 Enable remote access to all nodes in the cluster

Early versions of Serviceguard required the use of the file $HOME/.rhosts for the root user. This obviously had implications for network security. To get around this, Serviceguard offers an alternative. You can use a file called /etc/cmcluster/cmclnodelist . The format of this file is the same as $HOME/.rhosts :

 

 <hostname> <username> 

You should list all hostnames in the cluster, and the username will likely be root . If you want other users to be able to monitor the cluster, list their hostname/username as well. Oh, another thing. It might be a good idea to list the IP address/username as well. Just in case something happens to your host lookup database, e.g., DNS, you don't want that causing you unnecessary problems when managing your cluster. Make sure every node has the /etc/cmcluster/cmclnodelist file in place and a complete list of hostname/username and IP address/username for all machines in the cluster.
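As a hedged example for this two-node cluster (the non-root monitoring user is purely hypothetical), the file might look like this:

 # /etc/cmcluster/cmclnodelist - keep an identical copy on every node
 hpeos001        root
 hpeos002        root
 192.168.0.201   root
 192.168.0.202   root
 # optional non-root user allowed to monitor the cluster (hypothetical)
 hpeos001        charles
 hpeos002        charles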

25.7.7 Create a default ASCII cluster configuration file

This is easier than you think. You simply need to ensure that all nodes are running and contactable, and that the cmclnodelist file is in place. Let's log in to one of the nodes in the cluster. It doesn't matter which one. This is where we will initially perform all of our configuration steps. Serviceguard supplies a command ( cmquerycl ) that will probe all the nodes we tell it to. It will work out for itself which LAN cards are active, which LAN cards "could" be standby LAN cards, and which volume groups are shared; it will list the first physical volume as a candidate for a Cluster Lock PV. If we have the Quorum Server up and running, it can even fill in those details as well. Here are a couple of examples:

 

 #  cd /etc/cmcluster
 #  cmquerycl -v -C cluster.ascii -n node1 -n node2

The -v (verbose) option is optional, but it does give you some idea what cmquerycl is doing. The -C just specifies the filename in which to store the resulting configuration file. The -n specifies the nodes to be included in the cluster. Not too challenging, is it? Alternatively, you can specify a Quorum Server ( -q qshost1 in our example) and give your cluster a name ( -c McBond in our example) up front:

 

 #  cd /etc/cmcluster
 #  cmquerycl -v -C cluster.ascii -c McBond -q qshost1 -n node1 -n node2

This takes a few seconds to complete. Errors will have to be resolved before moving on.

25.7.8 Update the ASCII cluster configuration file

I ran the first of the example cmquerycl commands above. I have listed below the content of the cluster.ascii configuration file.

 

 # **********************************************************************
 # ********* HIGH AVAILABILITY CLUSTER CONFIGURATION FILE ***************
 # ***** For complete details about cluster parameters and how to    ****
 # ***** set them, consult the Serviceguard manual.                  ****
 # **********************************************************************

 # Enter a name for this cluster. This name will be used to identify the
 # cluster when viewing or manipulating it.

 CLUSTER_NAME            cluster1

 # Cluster Lock Parameters
 #
 # The cluster lock is used as a tiebreaker for situations
 # in which a running cluster fails, and then two equal-sized
 # sub-clusters are both trying to form a new cluster. The
 # cluster lock may be configured using either a lock disk
 # or a quorum server.
 #
 # You can use either the quorum server or the lock disk as
 # a cluster lock but not both in the same cluster.
 #
 # Consider the following when configuring a cluster.
 # For a two-node cluster, you must use a cluster lock. For
 # a cluster of three or four nodes, a cluster lock is strongly
 # recommended. For a cluster of more than four nodes, a
 # cluster lock is recommended. If you decide to configure
 # a lock for a cluster of more than four nodes, it must be
 # a quorum server.

 # Lock Disk Parameters. Use the FIRST_CLUSTER_LOCK_VG and
 # FIRST_CLUSTER_LOCK_PV parameters to define a lock disk.
 # The FIRST_CLUSTER_LOCK_VG is the LVM volume group that
 # holds the cluster lock. This volume group should not be
 # used by any other cluster as a cluster lock device.

 # Quorum Server Parameters. Use the QS_HOST, QS_POLLING_INTERVAL,
 # and QS_TIMEOUT_EXTENSION parameters to define a quorum server.
 # The QS_HOST is the host name or IP address of the system
 # that is running the quorum server process. The
 # QS_POLLING_INTERVAL (microseconds) is the interval at which
 # Serviceguard checks to make sure the quorum server is running.
 # The optional QS_TIMEOUT_EXTENSION (microseconds) is used to increase
 # the time interval after which the quorum server is marked DOWN.
 #
 # The default quorum server timeout is calculated from the
 # Serviceguard cluster parameters, including NODE_TIMEOUT and
 # HEARTBEAT_INTERVAL. If you are experiencing quorum server
 # timeouts, you can adjust these parameters, or you can include
 # the QS_TIMEOUT_EXTENSION parameter.
 #
 # For example, to configure a quorum server running on node
 # "qshost" with 120 seconds for the QS_POLLING_INTERVAL and to
 # add 2 seconds to the system assigned value for the quorum server
 # timeout, enter:
 #
 # QS_HOST qshost
 # QS_POLLING_INTERVAL 120000000
 # QS_TIMEOUT_EXTENSION 2000000

 FIRST_CLUSTER_LOCK_VG           /dev/vg01

 # Definition of nodes in the cluster.
 # Repeat node definitions as necessary for additional nodes.

 NODE_NAME               hpeos001
   NETWORK_INTERFACE     lan0
     HEARTBEAT_IP        192.168.0.201
   NETWORK_INTERFACE     lan1
   FIRST_CLUSTER_LOCK_PV /dev/dsk/c1t0d0
 # List of serial device file names
 # For example:
 # SERIAL_DEVICE_FILE    /dev/tty0p0

 # Possible standby Network Interfaces for lan0: lan1.

 NODE_NAME               hpeos002
   NETWORK_INTERFACE     lan0
     HEARTBEAT_IP        192.168.0.202
   NETWORK_INTERFACE     lan1
   FIRST_CLUSTER_LOCK_PV /dev/dsk/c0t0d0
 # List of serial device file names
 # For example:
 # SERIAL_DEVICE_FILE    /dev/tty0p0

 # Possible standby Network Interfaces for lan0: lan1.

 # Cluster Timing Parameters (microseconds).
 # The NODE_TIMEOUT parameter defaults to 2000000 (2 seconds).
 # This default setting yields the fastest cluster reformations.
 # However, the use of the default value increases the potential
 # for spurious reformations due to momentary system hangs or
 # network load spikes.
 # For a significant portion of installations, a setting of
 # 5000000 to 8000000 (5 to 8 seconds) is more appropriate.
 # The maximum value recommended for NODE_TIMEOUT is 30000000
 # (30 seconds).

 HEARTBEAT_INTERVAL              1000000
 NODE_TIMEOUT                    2000000

 # Configuration/Reconfiguration Timing Parameters (microseconds).

 AUTO_START_TIMEOUT              600000000
 NETWORK_POLLING_INTERVAL        2000000

 # Package Configuration Parameters.
 # Enter the maximum number of packages which will be configured in the
 # cluster.
 # You can not add packages beyond this limit.
 # This parameter is required.

 MAX_CONFIGURED_PACKAGES         0

 # List of cluster aware LVM Volume Groups. These volume groups will
 # be used by package applications via the vgchange -a e command.
 # Neither CVM or VxVM Disk Groups should be used here.
 # For example:
 # VOLUME_GROUP          /dev/vgdatabase
 # VOLUME_GROUP          /dev/vg02

 VOLUME_GROUP            /dev/vg01

Thankfully, cmquerycl takes most of the heartache out of constructing this file. As you can see, there's quite a lot to digest. A formal definition of all these parameters can be found in the Serviceguard documentation. Here, I attempt to put the formal definition into more meaningful English. The file can be broken down into a number of sections:

  1. Cluster name

    - Use a name that means something to you. You might have a naming convention for things such as hostnames, so it might be appropriate to have a similar naming convention if you are going to run multiple clusters.

  2. Cluster lock strategy, i.e., LVM disk and/or Quorum Server

    - The name of the Quorum Server.

    - Timing parameters for the Quorum Server. Every two minutes, the Quorum Server is polled to make sure it's still alive. If you have a busy network, you can use the QS_TIMEOUT_EXTENSION parameter to extend the timeout after which the Quorum Server is marked DOWN, or simply increase the polling interval itself.

    - We also specify the name of the LVM volume group that holds the cluster lock disk. The parameter is called FIRST_CLUSTER_LOCK_VG . If your disks are powered from the same power supply as the nodes themselves, you may consider using a SECOND_CLUSTER_LOCK_VG . We could then specify a SECOND_CLUSTER_LOCK_PV for each node.

  3. Individual node specifications

    - We specify the node name as well as all LAN interfaces for that node.

    - Remember, a LAN interface can be either HEARTBEAT_IP or STATIONARY_IP . STATIONARY_IP means that we don't send heartbeat packets over that interface. The default assumed by cmquerycl is to specify HEARTBEAT_IP for all active interfaces.

    - We list the device file for the FIRST and possibly SECOND_CLUSTER_LOCK_PV .

    - If we are using a serial heartbeat, we specify its device file with the SERIAL_DEVICE_FILE parameter.

  4. Cluster timing parameters

    - HEARTBEAT_INTERVAL specifies how often a heartbeat packet is transmitted. The default of 1 second seems reasonable.

    - NODE_TIMEOUT defaults to 2 seconds. This is the time after which a node is determined to have failed. If you leave it at 2 seconds, you will get a warning that it might be a good idea to increase this value. The dilemma is that we would like quick cluster reformations when a node actually fails, hence the 2-second default. Unfortunately, if we have a busy network, heartbeat packets might not get through in time and we would experience a cluster reformation. The reality is that after the reformation, all the nodes will still be online, so nothing drastic will actually happen. The problem is that if you are monitoring syslog.log and you see a cluster reformation, it might ring some alarm bells. It's up to you; my thinking is that if you have a dedicated heartbeat LAN, then 2 seconds should be okay, as the maximum traffic on that LAN would be 16 nodes (the maximum in a cluster) each sending a heartbeat packet once every second; that's not much traffic.

  5. Configuration/reconfiguration timing parameters

    - The first time we start the cluster, we must have all nodes online, i.e., 100 percent node attendance. The AUTO_START_TIMEOUT (default = 10 minutes) is how long we wait for other nodes to become available; otherwise, we don't start the cluster.

    - NETWORK_POLLING_INTERVAL (default 2 seconds) is how often cmcld checks that our standby LAN cards are still working as specified. If we lose an active LAN card, cmcld will immediately move the IP address to the standby LAN card, so it's a good idea to check that the bridged network is operational every few seconds.

  6. Package configuration parameters

    - I want to point out only one thing here: MAX_CONFIGURED_PACKAGES (default = 0). That's fairly easy to understand. If we are to configure more packages than this parameter allows, we would need to shut down the entire cluster, so choose a reasonable value up front.

  7. LVM volume groups

    - This is a list of LVM volume groups that will be marked as "cluster aware" ( vgchange -c y ) as soon as the cmcld daemon is run. If you try this command without cmcld running, you get an error message. You don't need to list your volume groups here, but make sure when you start the cmcld daemon that you run vgchange -c y against all your shared volume groups. Being "cluster aware" sets a flag in the VGRA that allows the volume group to be activated in "exclusive" mode ( vgchange -a e ); see the sketch after this list. Here are the changes I made to this file:

    1. - Change the default cluster name:

       

       CLUSTER_NAME            McBond 

    2. - Set up a serial heartbeat if applicable. This is detailed in the individual node specifications:

       

        SERIAL_DEVICE_FILE    /dev/tty0p0
        SERIAL_DEVICE_FILE    /dev/tty1p0

    3. - Allow me to create packages in the future:

       

       MAX_CONFIGURED_PACKAGES         10 
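As promised in the discussion of cluster-aware volume groups above, here is a hedged sketch of the vgchange modes involved, using the shared volume group from this cluster; remember that marking a volume group cluster aware only works once cmcld is running:

 # Mark the shared volume group as cluster aware (requires cmcld to be running)
 vgchange -c y /dev/vg01

 # Activate the volume group in exclusive mode on the node that needs the data
 vgchange -a e /dev/vg01

 # Deactivate it again when finished
 vgchange -a n /dev/vg01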

25.7.9 Check the updated ASCII cluster configuration file

We need to ensure that all our changes are syntactically correct and can actually be applied to the nodes in question. We have a simple command to do this: cmcheckconf . Below, you see me run this command and the output I received:

 

 root@hpeos001[cmcluster] #  cmcheckconf -v -C cluster.ascii
 Checking cluster file: cluster.ascii
 Note : a NODE_TIMEOUT value of 2000000 was found in line 104. For a
 significant portion of installations, a higher setting is more appropriate.
 Refer to the comments in the cluster configuration ascii file or Serviceguard
 manual for more information on this parameter.
 Checking nodes ... Done
 Checking existing configuration ... Done
 Warning: Can not find configuration for cluster McBond
 Gathering configuration information ... Done
 Gathering configuration information .............. Done
 Checking for inconsistencies .. Done
 Maximum configured packages parameter is 10.
 Configuring 0 package(s).
 10 package(s) can be added to this cluster.
 Creating the cluster configuration for cluster McBond.
 Adding node hpeos001 to cluster McBond.
 Adding node hpeos002 to cluster McBond.
 Verification completed with no errors found.
 Use the cmapplyconf command to apply the configuration.
 root@hpeos001[cmcluster] #

Any errors need to be corrected before we move on.

25.7.10 Compile and distribute binary cluster configuration file

We can now compile the binary cluster configuration file /etc/cmcluster/cmclconfig from our ASCII template file. As you can see in the output from the cmcheckconf command, we use cmapplyconf . This will also distribute cmclconfig to all nodes in the cluster. One thing to be aware of concerns cluster lock disks. If you are using a cluster lock disk, it is a good idea to activate the volume group on the node from which you are running cmapplyconf . If you don't, cmapplyconf will attempt to activate it in exclusive mode in order to initialize the cluster lock information; if this fails, cmapplyconf will display an error, so having the volume group active in the first place avoids this error. Again, here is the output I received:

 

 root@hpeos001[cmcluster] #  cmapplyconf -v -C cluster.ascii
 Checking cluster file: cluster.ascii
 Note : a NODE_TIMEOUT value of 2000000 was found in line 104. For a
 significant portion of installations, a higher setting is more appropriate.
 Refer to the comments in the cluster configuration ascii file or Serviceguard
 manual for more information on this parameter.
 Checking nodes ... Done
 Checking existing configuration ... Done
 Warning: Can not find configuration for cluster McBond
 Gathering configuration information ... Done
 Gathering configuration information .............. Done
 Checking for inconsistencies .. Done
 Maximum configured packages parameter is 10.
 Configuring 0 package(s).
 10 package(s) can be added to this cluster.
 Creating the cluster configuration for cluster McBond.
 Adding node hpeos001 to cluster McBond.
 Adding node hpeos002 to cluster McBond.
 Completed the cluster creation.
 root@hpeos001[cmcluster] #

Before we start the cluster, we need to ensure that we have a consistent backup of the LVM structures.

25.7.11 Back up LVM structures of any cluster lock volume groups

The LVM structures in the Volume Group Reserved Area (VGRA) deal with the state and location on disk of the cluster lock. The actual cluster lock and who currently owns it is stored in the Bad Block Relocation Area (BBRA) of the disk. Although we won't back up the actual cluster lock itself, we should back up the LVM structures that relate to it in the VGRA. This is why we use vgcfgbackup at this time. Should we need to, i.e., if a cluster lock disk fails, we can recover these fields with vgcfgrestore .

In our case, it is simply a case of running the following command:

 

 root@hpeos001[] #  vgcfgbackup /dev/vg01
 Volume Group configuration for /dev/vg01 has been saved in /etc/lvmconf/vg01.conf
 root@hpeos001[] #

We should consider storing the vgcfgbackup file (/etc/lvmconf/vg01.conf in this case) on all nodes in the cluster. Because Serviceguard is responsible for activating and deactivating volume groups, we should deactivate this volume group with the following:

 

 root@hpeos001[] #  vgchange -a n /dev/vg01
 vgchange: Volume group "/dev/vg01" has been successfully changed.
 root@hpeos001[] #
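As suggested above, a hedged sketch of pushing that vgcfgbackup file to the other node follows; rcp is assumed here, and any remote-copy mechanism you trust would do just as well:

 # Copy the LVM configuration backup of the lock volume group to the other node
 rcp /etc/lvmconf/vg01.conf hpeos002:/etc/lvmconf/vg01.conf

 # If the cluster lock disk ever has to be replaced, the VGRA structures could be
 # restored onto the new disk with something like (raw device file assumed):
 # vgcfgrestore -n /dev/vg01 /dev/rdsk/c1t0d0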

We are now ready to start the cluster.

25.7.12 Start cluster services

Ensure that all nodes are online before attempting to start the cluster for the first time. We need 100 percent node attendance. Here is the output from my cluster:

 

 root@hpeos001[cmcluster] #  cmruncl -v
 Successfully started $SGLBIN/cmcld on hpeos001.
 Successfully started $SGLBIN/cmcld on hpeos002.
 cmruncl  : Waiting for cluster to form.....
 cmruncl  : Cluster successfully formed.
 cmruncl  : Check the syslog files on all nodes in the cluster
 cmruncl  : to verify that no warnings occurred during startup.
 root@hpeos001[cmcluster] #

The command cmruncl is how we start cluster services when the cluster is down. When we start the cluster, it would be helpful to have all nodes online. This is not always possible, e.g., one node is down due to urgent maintenance. In this situation, we could use the option -n <nodename> to cmruncl , listing all nodes that are currently available. Here's what happens if I use this command to start cluster services on one node (the other node hpeos001 is currently down):

 

 root@hpeos002[] #  cmruncl -v -n hpeos002
 WARNING: Performing this task overrides the data integrity protection normally
 provided by Serviceguard. You must be certain that no package applications or
 resources are running on the other nodes in the cluster:
        hpeos001
 To ensure this, these nodes should be rebooted (i.e., /usr/sbin/shutdown -r)
 before proceeding.
 Are you sure you want to continue (y/[n])?

The reason for the warning is that if the down node(s) restarted but had network problems so that they couldn't contact the nodes currently in the cluster, they could potentially form a cluster of their own. This could lead to two sets of nodes trying to start the same applications. This is not a good idea. Once the down system is rebooted, we can have that node join the cluster with the command cmrunnode .

We should start to get into the habit of running cmviewcl -v . As you can gather, I like my -v option. In a cluster, you probably want to know that everything is working as expected. You can use -n nodename just to view specific nodes. Here is the output from my cmviewcl command:

 

 root@hpeos001[cmcluster] #  cmviewcl -v

 CLUSTER      STATUS
 McBond       up

   NODE         STATUS       STATE
   hpeos001     up           running

     Network_Parameters:
     INTERFACE    STATUS       PATH         NAME
     PRIMARY      up           8/16/6       lan0
     STANDBY      up           8/20/5/2     lan1

   NODE         STATUS       STATE
   hpeos002     up           running

     Network_Parameters:
     INTERFACE    STATUS       PATH         NAME
     PRIMARY      up           2/0/2        lan0
     STANDBY      up           4/0/1        lan1

Let's look at /var/adm/syslog/syslog.log . This will detail the starting of the cluster. It takes time to get used to the output from different cluster operations. That's why we are going to test cluster functionality at the end of this section. Part of that will be to check syslog.log and find out what is happening. Here's the healthy output I received in my syslog.log when I started the cluster:

 

 root@hpeos001[cmcluster] #  more /var/adm/syslog/syslog.log
 ...
 Aug  2 15:48:57 hpeos001 CM-CMD[2733]: cmruncl -v
 Aug  2 15:48:58 hpeos001 inetd[2734]: hacl-cfg/udp: Connection from localhost (127.0.0.1) at Fri Aug  2 15:48:58 2002
 Aug  2 15:48:58 hpeos001 inetd[2735]: hacl-cfg/tcp: Connection from localhost (127.0.0.1) at Fri Aug  2 15:48:58 2002
 Aug  2 15:48:58 hpeos001 inetd[2736]: hacl-cfg/tcp: Connection from localhost (127.0.0.1) at Fri Aug  2 15:48:58 2002
 Aug  2 15:48:58 hpeos001 cmclconfd[2736]: Executing "/usr/lbin/cmcld" for node hpeos001
 Aug  2 15:48:58 hpeos001 inetd[2738]: hacl-cfg/tcp: Connection from localhost (127.0.0.1) at Fri Aug  2 15:48:58 2002
 Aug  2 15:48:58 hpeos001 cmcld: Daemon Initialization - Maximum number of packages supported for this incarnation is 10.
 Aug  2 15:48:58 hpeos001 cmcld: Global Cluster Information:
 Aug  2 15:48:58 hpeos001 cmcld: Heartbeat Interval is 1 seconds.
 Aug  2 15:48:58 hpeos001 cmcld: Node Timeout is 2 seconds.
 Aug  2 15:48:58 hpeos001 cmcld: Network Polling Interval is 2 seconds.
 Aug  2 15:48:58 hpeos001 cmcld: Auto Start Timeout is 600 seconds.
 Aug  2 15:48:58 hpeos001 cmcld: Information Specific to node hpeos001:
 Aug  2 15:48:58 hpeos001 cmcld: Cluster lock disk: /dev/dsk/c1t0d0.
 Aug  2 15:48:58 hpeos001 cmcld: lan0  0x080009ba841b  192.168.0.201  bridged net:1
 Aug  2 15:48:58 hpeos001 inetd[2739]: hacl-cfg/tcp: Connection from hpeos001 (192.168.0.201) at Fri Aug  2 15:48:58 2002
 Aug  2 15:48:58 hpeos001 cmcld: lan1  0x0800093d4c50    standby    bridged net:1
 Aug  2 15:48:58 hpeos001 cmcld: Heartbeat Subnet: 192.168.0.0
 Aug  2 15:48:58 hpeos001 cmcld: The maximum # of concurrent local connections to the daemon that will be supported is 38.
 Aug  2 15:48:59 hpeos001 cmcld: Total allocated: 2097832 bytes, used: 3726072 bytes, unused 2017224 bytes
 Aug  2 15:48:59 hpeos001 cmcld: Starting cluster management protocols.
 Aug  2 15:48:59 hpeos001 cmcld: Attempting to form a new cluster
 Aug  2 15:49:00 hpeos001 cmtaped[2743]: cmtaped: There are no ATS devices on this cluster.
 Aug  2 15:49:01 hpeos001 cmcld: New node hpeos002 is joining the cluster
 Aug  2 15:49:01 hpeos001 cmcld: Clearing Cluster Lock
 Aug  2 15:49:01 hpeos001 inetd[2749]: hacl-cfg/tcp: Connection from hpeos001 (192.168.0.201) at Fri Aug  2 15:49:01 2002
 Aug  2 15:49:03 hpeos001 cmcld: Turning on safety time protection
 Aug  2 15:49:03 hpeos001 cmcld: 2 nodes have formed a new cluster, sequence #1
 Aug  2 15:49:03 hpeos001 cmcld: The new active cluster membership is: hpeos001(id=1), hpeos002(id=2)
 Aug  2 15:49:03 hpeos001 cmlvmd: Clvmd initialized successfully.
 Aug  2 15:49:03 hpeos001 inetd[2750]: hacl-cfg/tcp: Connection from hpeos001 (192.168.0.201) at Fri Aug  2 15:49:03 2002
 Aug  2 15:49:03 hpeos001 inetd[2751]: hacl-cfg/tcp: Connection from hpeos002 (192.168.0.202) at Fri Aug  2 15:49:03 2002
 Aug  2 15:49:04 hpeos001 inetd[2752]: hacl-cfg/tcp: Connection from hpeos001 (192.168.0.201) at Fri Aug  2 15:49:04 2002
 Aug  2 15:49:16 hpeos001 inetd[2753]: registrar/tcp: Connection from hpeos001 (192.168.0.201) at Fri Aug  2 15:49:16 2002

As you can see, there is quite a lot going on. Basically, we start the cmcld daemon. In initializing, cmcld outputs our cluster timing parameters. It then identifies which LAN cards are active and which are standby cards. We then work out whether there are any shared tape devices. We then see the other node ( hpeos002 ) joining the cluster, giving us two members. Finally, the cluster LVM daemon is started. The entries you see for hacl-cfg come from the cluster configuration daemon ( cmclconfd ), which gathers information about LAN cards and volume groups. It also distributes the cluster binary file. During the startup of the cluster, all nodes communicate with each other to ensure that the cluster is formed correctly and also to elect a cluster coordinator. If cmcld needs to gather information, it will do so by making a request to a cmclconfd process, actually to a network socket being managed by inetd . inetd is listening for requests on port 5302, and it is inetd that will actually spawn the cmclconfd daemon. One problem I have seen in the past is that the entries in /etc/inetd.conf were missing. This causes weird results; I once saw "Error: Unable to establish communication to node <nodename>" when executing a cmquerycl . We checked everything from cables to linkloop commands and tried resetting LAN cards with lanadmin . The only reason I managed to fix it was that my suspicions were aroused by the lack of entries in syslog.log for cmclconfd . In the end, the customer involved admitted that he had recently uninstalled and reinstalled Serviceguard a few times. Don't ask me why; he just did. The key was getting familiar with the expected output in syslog.log and trying to troubleshoot from first principles. We should see similar output on all nodes in the cluster.
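A quick, hedged check that those entries are in place on each node looks something like this; the exact wording of the entries may vary between Serviceguard versions:

 # Serviceguard's cluster configuration daemon is spawned by inetd on port 5302
 grep hacl /etc/services     # expect hacl-cfg entries on port 5302 (tcp and udp)
 grep hacl /etc/inetd.conf   # expect cmclconfd entries for hacl-cfg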

Here is a brief overview of the election protocol every time a cluster reforms:

  1. Start a reconfiguration timer.

  2. Search for the existing cluster coordinator.

    - Send an FC_broadcast (FindCoordinator) message and wait for a reply.

  3. If the Cluster Coordinator replies, send them your "vote."

  4. If no Cluster Coordinator replies, attempt to become the Cluster Coordinator.

    - Reply to other nodes and accept "votes."

  5. After "election timeout," count the "votes."

    - <50%: Retry until the "reconfiguration timeout" expires. If still <50%, halt the node.

    - =50%: Attempt to grab the cluster lock and form the cluster. If this fails, halt the node.

    - >50%: Form the cluster.

  6. Wait for "quiescence" timeout, an elapsed time to allow other nodes to halt.

  7. New Cluster Coordinator informs the cluster members of the status and membership of the cluster.

  8. Start heartbeat packets to all cluster members.

  9. Clear the cluster lock.

    Note : The current Cluster Coordinator does not perform steps 2 and 3.

If you are interested in finding the cluster coordinator, you need to increase the Serviceguard logging level. This is achieved by using the contributed command cmsetlog . Use of this command by customers is normally only under the guidance of HP Support personnel. HP does not offer official support for this command, so be very careful if you are going to use it. The command and its options are discussed on various HP Education Services courses covering Serviceguard. If you are unfamiliar with the command, it is strongly suggested you do not use it. We need to increase the logging level to 4 ( /usr/contrib/bin/cmsetlog 4 ) if we want Serviceguard to report which node is the cluster coordinator. The node that becomes the cluster coordinator will write a message into its /var/adm/syslog/syslog.log of the form "Aug 2 17:22:57 hpeos001 cmcld: This node is now cluster coordinator".
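If you do go down that route, ideally under the guidance of HP Support, the sequence might look something like this sketch; the syslog message text is the one quoted above:

 # Raise Serviceguard logging to level 4 using the contributed, unsupported command
 /usr/contrib/bin/cmsetlog 4

 # Then watch each node's syslog for the coordinator message
 grep -i "cluster coordinator" /var/adm/syslog/syslog.log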

One last thing. We have the option of starting cluster services every time a node starts up. This is accomplished by editing the startup configuration file /etc/rc.config.d/cmcluster . The default is to not start cluster services at boot time ( AUTOSTART_CMCLD=0 ), and I agree with this for the following reason: Once started, why would a node need rebooting? If it does reboot, I would want to know why. Let's look at an example of when a node crashes due to a hardware problem. It reboots, and if we set AUTOSTART_CMCLD=1 , it rejoins the cluster. This will cause a cluster reformation. If the hardware problem is intermittent, the fault may not occur for some time. Alternatively, it could happen almost immediately. With AUTOSTART_CMCLD=1 , the node would be joining, leaving, and rejoining the cluster every few minutes. A cluster reformation in itself is not too much to ask, but it is something we want to avoid if at all possible. Having spurious reformations can confuse everyone involved and may actually "hide" real problems when they do occur. With AUTOSTART_CMCLD=0 , a node will stay out of the cluster, allowing you to investigate why it rebooted before having the node rejoin the cluster when the problem has been rectified.
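For reference, here is a hedged sketch of the relevant part of /etc/rc.config.d/cmcluster (comments paraphrased):

 # /etc/rc.config.d/cmcluster
 # AUTOSTART_CMCLD=1 -> node attempts to join the cluster automatically at boot
 # AUTOSTART_CMCLD=0 -> node stays out of the cluster until cmrunnode is run manually
 AUTOSTART_CMCLD=0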

25.7.13 Test cluster functionality

There are a number of tests we will perform. Some of them are quite straightforward and test the basic functionality of the cluster; we use Serviceguard commands to accomplish these tests. I will call these Standard Tests . Other tests are designed to uncover whether Serviceguard can provide the high availability features it claims it can. For these tests, we use "unorthodox" methods to test Serviceguard. I call these Stress Tests . We need to be sure that Serviceguard will react promptly and correctly in the event of an unexpected incident, e.g., if a LAN card fails. Let's start with the Standard Tests:

  1. Standard Tests :

    1. Cluster can start and stop successfully .

      You should be able to run the following to start the cluster:

       

       # cmruncl -v 

      You should be able to run the following to halt the cluster:

       

       # cmhaltcl -v

      You can run these commands from any node in the cluster. Check the output in /var/adm/syslog/syslog.log to ensure that everything is working as expected. This is basic functionality. Do not proceed until you are satisfied that the cluster can be started and stopped from every node.

    2. Individual nodes can leave the cluster .

      When we are performing critical maintenance on an individual node, we want to stop cluster services only on that node. Some administrators feel that if the node is going to be "out of commission" for a considerable time, then we should take it out of the cluster altogether. I can see some logic in that. My only concern is that we will have to recompile the cluster binary configuration file to remove and then add the node into the cluster. What would happen if another node were not running during this recompilation? We could be in a position where we want to re-add the original node, but we have to wait until the second node comes back online to ensure that every node has the most up-to-date cluster binary file. For this reason alone, I would leave the node as a member of the cluster, but just stop cluster services. Even if we reboot the node, it will not start cluster services as AUTOSTART_CMCLD=0 . To stop cluster services, we would run the following command:

       

       # cmhaltnode -v

      Ensure that cluster services have stopped by checking /var/adm/syslog/syslog.log and the output from cmviewcl -v . We could run cmhaltnode from any node in the cluster. If we want to halt cluster services for a node other than our own, we can run this:

       

       # cmhaltnode -v othernode

      Obviously, othernode is the hostname of the node on which we are halting cluster services. Again, check /var/adm/syslog/syslog.log and the output from cmviewcl -v to ensure that everything is functioning as expected.

    3. Individual nodes can join the cluster .

      In this instance, we want a node to rejoin a running cluster. Maybe we have concluded our critical maintenance, or the machine crashed and we have finished our investigations and repairs . We want to start cluster services only on this node. To accomplish this, we run the following:

       

       # cmrunnode -v

      Ensure that cluster services have started by checking /var/adm/syslog/syslog.log and the output from cmviewcl -v . Like cmhaltnode , we could run cmrunnode from any node in the cluster. If we want to start cluster services for a node other than our own, we can run this:

       

       # cmrunnode -v othernode

      Obviously, othernode is the hostname of the node on which we are starting up cluster services. Again, check /var/adm/syslog/syslog.log and the output from cmviewcl -v to ensure that everything is functioning as expected.

  2. Stress Tests:

    These tests are a little "unorthodox" only insofar as we are trying to think of situations that may happen in a production environment and that could threaten access to our applications. We want to test these situations in a controlled way to ensure that Serviceguard behaves as expected.

    1. Remove an active LAN card .

      There should be no perceived problems when we perform this test. Serviceguard should automatically relocate the IP address associated with our Primary LAN to the Standby LAN card. Serviceguard will also send out an ARP broadcast to all machines currently communicating via that IP address to flush their ARP cache and, hence, disassociate the IP address from the old MAC address. All clients will then need to send an ARP request to reestablish the IP-MAC mapping. In doing so, they will now find that the MAC address of the Standby LAN card is associated with the relevant IP address. This is the output I found in /var/adm/syslog/syslog.log after I pulled the cable from my active LAN card and then put it back in:

       

        Aug  2 19:39:19 hpeos001 cmcld: lan0 failed
        Aug  2 19:39:19 hpeos001 cmcld: Subnet 192.168.0.0 switched from lan0 to lan1
        Aug  2 19:39:19 hpeos001 cmcld: lan0 switched to lan1

      As we can see, Serviceguard reacted instantaneously to relocate the IP address to the standby LAN card. One word of warning: If you keep your NODE_TIMEOUT value low, i.e., 2 seconds, you may see a cluster reformation in syslog.log at the same time as Serviceguard relocates the IP address. This is due to timing issues with sending and receiving heartbeat packets. Because Serviceguard can relocate the IP address almost instantaneously, we see the cluster reform at the same time as the IP address is relocated. Here's what we see with cmviewcl -v :

       

        root@hpeos001[cmcluster] #  cmviewcl -v

        CLUSTER      STATUS
        McBond       up

          NODE         STATUS       STATE
          hpeos001     up           running

            Network_Parameters:
            INTERFACE    STATUS       PATH         NAME
            PRIMARY      down         8/16/6       lan0
            STANDBY      up           8/20/5/2     lan1

          NODE         STATUS       STATE
          hpeos002     up           running

            Network_Parameters:
            INTERFACE    STATUS       PATH         NAME
            PRIMARY      up           2/0/2        lan0
            STANDBY      up           4/0/1        lan1
        root@hpeos001[cmcluster] #

      Notice that the PRIMARY LAN card for hpeos001 is now "down." On reconnecting the LAN card, Serviceguard relocates the IP address back to the Primary LAN card, as we can see from syslog.log .

       

        Aug  2 19:45:22 hpeos001 cmcld: lan0 recovered
        Aug  2 19:45:22 hpeos001 cmcld: Subnet 192.168.0.0 switched from lan1 to lan0
        Aug  2 19:45:22 hpeos001 cmcld: lan1 switched to lan0

      If you were seeing lots of these "local LAN failover" errors, then I would consider logging a Hardware Support Call with your local Hewlett-Packard Response Center to have a Hewlett-Packard Hardware Engineer check whether your LAN card is malfunctioning. It could also be a faulty cable or a faulty hub/switch.

    2. A situation where cmcld is starved for resources.

      This is a particularly critical situation. As we now know, cmcld is a critical part of the suite of Serviceguard daemons. It is considered to be so important that it runs at an HP-UX Real-Time priority of 20. This means that when it wants to run, there's a high probability that it will be the most important process on the system; there are few processes with a higher priority. However, I have come across many installations where Real-Time priorities have been used to improve the responsiveness of critical application processes. In one such situation (a four-processor machine), the administrators had four database instances running in a Serviceguard cluster. The main database daemons were running at priority = 20 in an attempt to maximize the amount of CPU time the main database processes received. The administrators felt that it was highly unlikely that at any one time all the database processes would be executing requests with such intensity that cmcld would not get execution time on any processor. As we know from Murphy's Law, such a situation did arise. The database processes spawned a significant number of child processes. Along with cmcld , this constituted enough contention that cmcld did not get any execution time within the NODE_TIMEOUT interval. The cluster coordinator decided that the node had failed and instigated a cluster reformation. On reforming the cluster (a two-node cluster), the original node had, by that time, "resolved" its starvation problem and won the resulting election; hence, it was the only node left in the cluster. The other node instigated a Transfer Of Control (TOC) to preserve data integrity (split-brain syndrome) because it did not obtain the cluster lock. The application running on the node that instigated the TOC had to be restarted on the remaining node. The moral of the story is twofold:

      1. Be very careful if you are going to run processes at or above priority = 20.

      2. If you are going to use high priority processes, consider increasing your NODE_TIMEOUT .

      Below, we look at analyzing the resulting crashdump . We are interested in establishing a number of facts:

      1. Check out the cluster configuration files and syslog.log for additional information.

      2. Was the crash a TOC instigated by Serviceguard?

      3. When was the last time cmcld ran?

      In my example, I simply stopped cmcld by sending it signal 24 (SIGSTOP), i.e., kill -STOP $(cat /var/adm/cmcluster/cmcld.pid) , on the machine hpeos002 . This is obviously something I do not suggest you undertake on a live system. Here's the output from cmviewcl -v :

       

        root@hpeos001[cmcluster] #  cmviewcl -v

        CLUSTER      STATUS
        McBond       up

          NODE         STATUS       STATE
          hpeos001     up           running

            Network_Parameters:
            INTERFACE    STATUS       PATH         NAME
            PRIMARY      up           8/16/6       lan0
            STANDBY      up           8/20/5/2     lan1

          NODE         STATUS       STATE
          hpeos002     down         failed

            Network_Parameters:
            INTERFACE    STATUS       PATH         NAME
            PRIMARY      unknown      2/0/2        lan0
            STANDBY      unknown      4/0/1        lan1
        root@hpeos001[cmcluster] #

      We would follow this up by analyzing the information from syslog.log :

       

        Aug  2 19:52:00 hpeos001 cmcld: Timed out node hpeos002. It may have failed.
        Aug  2 19:52:00 hpeos001 cmcld: Attempting to adjust cluster membership
        Aug  2 19:52:02 hpeos001 inetd[4426]: registrar/tcp: Connection from hpeos001 (192.168.0.201) at Fri Aug  2 19:52:02 2002
        Aug  2 19:52:06 hpeos001 vmunix: SCSI: Reset requested from above -- lbolt: 547387, bus: 1
        Aug  2 19:52:06 hpeos001 cmcld: Obtaining Cluster Lock
        Aug  2 19:52:09 hpeos001 vmunix: SCSI: Resetting SCSI -- lbolt: 547687, bus: 1
        Aug  2 19:52:09 hpeos001 vmunix: SCSI: Reset detected -- lbolt: 547687, bus: 1
        Aug  2 19:52:16 hpeos001 cmcld: Unable to obtain Cluster Lock. Operation timed out.
        Aug  2 19:52:16 hpeos001 cmcld: WARNING: Cluster lock disk /dev/dsk/c1t0d0 has failed.
        Aug  2 19:52:16 hpeos001 cmcld: Until it is fixed, a single failure could
        Aug  2 19:52:16 hpeos001 cmcld: cause all nodes in the cluster to crash
        Aug  2 19:52:16 hpeos001 cmcld: Attempting to form a new cluster
        Aug  2 19:52:23 hpeos001 cmcld: Obtaining Cluster Lock
        Aug  2 19:52:24 hpeos001 cmcld: Cluster lock /dev/dsk/c1t0d0 is back on-line
        Aug  2 19:52:24 hpeos001 cmcld: Turning off safety time protection since the cluster
        Aug  2 19:52:24 hpeos001 cmcld: may now consist of a single node. If Serviceguard
        Aug  2 19:52:24 hpeos001 cmcld: fails, this node will not automatically halt

      The "SCSI: Reset -- lbolt" messages in this instance are a result of the node resetting the SCSI interface after another node leaves the cluster. Should you see any "SCSI: Reset -- lbolt" messages during normal operation, you should investigate them as a separate hardware-related problem.

      You can see that I obtain the cluster lock after the SCSI reset. I am now the only member of the cluster.

      Here's the crashdump analysis I performed on the resulting TOC of hpeos002 . Interesting findings are highlighted with accompanying notes:

       

        root@hpeos002[] #  cd /var/adm/crash
        root@hpeos002[crash] #  ll
        total 4
        -rwxr-xr-x   1 root       root             1 Aug  2 21:01 bounds
        drwxr-xr-x   2 root       root          1024 Aug  2 21:01 crash.0
        root@hpeos002[crash] #  cd crash.0
        root@hpeos002[crash.0] #  ll
        total 129856
        -rw-r--r--   1 root       root          1176 Aug  2 21:01 INDEX
        -rw-r--r--   1 root       root       8372224 Aug  2 21:01 image.1.1
        -rw-r--r--   1 root       root       8364032 Aug  2 21:01 image.1.2
        -rw-r--r--   1 root       root       8368128 Aug  2 21:01 image.1.3
        -rw-r--r--   1 root       root       8376320 Aug  2 21:01 image.1.4
        -rw-r--r--   1 root       root       8388608 Aug  2 21:01 image.1.5
        -rw-r--r--   1 root       root       4390912 Aug  2 21:01 image.1.6
        -rw-r--r--   1 root       root       20223172 Aug  2 21:01 vmunix
        root@hpeos002[crash.0] #  more INDEX
        comment   savecrash crash dump INDEX file
        version   2
        hostname  hpeos002
        modelname 9000/715
        panic     TOC, pcsq.pcoq = 0.15f4b0, isr.ior = 0.91fcf0

NOTE : Although this tells us the system instigated a Transfer Of Control (TOC), it doesn't tell us why.

 

 dumptime  1028318274 Fri Aug   2 20:57:54 BST 2002
 savetime  1028318469 Fri Aug   2 21:01:09 BST 2002
 release   @(#)     $Revision: vmunix:    vw: -proj    selectors: CUPI80_BL2000_1108 -c 'Vw for CUPI80_BL2000_1108 build' -- cupi80_bl2000_1108 'CUPI80_BL2000_1108'  Wed Nov  8 19:05:38 PST 2000 $
 memsize   268435456
 chunksize 8388608
 module    /stand/vmunix vmunix 20223172 3848474440
 image     image.1.1 0x0000000000000000 0x00000000007fc000 0x0000000000000000 0x0000000000001127 3658315572
 image     image.1.2 0x0000000000000000 0x00000000007fa000 0x0000000000001128 0x00000000000019ef 2052742134
 image     image.1.3 0x0000000000000000 0x00000000007fb000 0x00000000000019f0 0x00000000000030af 1656526062
 image     image.1.4 0x0000000000000000 0x00000000007fd000 0x00000000000030b0 0x00000000000090bf 2888801859
 image     image.1.5 0x0000000000000000 0x0000000000800000 0x00000000000090c0 0x000000000000c97f 1440262390
 image     image.1.6 0x0000000000000000 0x0000000000430000 0x000000000000c980 0x000000000000ffff 320083218
 root@hpeos002[crash.0] #
 root@hpeos002[crash.0] #  q4pxdb vmunix
 .
 Procedures: 13
 Files: 6
 root@hpeos002[crash.0] #  q4 -p .
 @(#) q4 $Revision: B.11.20f $ $Fri Aug 17 18:05:11 PDT 2001 0
 Reading kernel symbols ...
 Reading data types ...
 Initialized PA-RISC 1.1 (no buddies) address translator ...
 Initializing stack tracer ...
 script /usr/contrib/Q4/lib/q4lib/sample.q4rc.pl
 executable /usr/contrib/Q4/bin/perl
 version 5.00502
 SCRIPT_LIBRARY = /usr/contrib/Q4/lib/q4lib
 perl will try to access scripts from directory /usr/contrib/Q4/lib/q4lib
 q4: (warning) No loadable modules were found
 q4: (warning) No loadable modules were found
 q4>  ex &msgbuf+8 using s
 NOTICE: nfs3_link(): File system was registered at index 3.
 NOTICE: autofs_link(): File system was registered at index 6.
 NOTICE: cachefs_link(): File system was registered at index 7.
 1 graph3
 2 bus_adapter
 2/0/1 c720
 2/0/1.0 tgt
 2/0/1.0.0 sdisk
 2/0/1.1 tgt
 2/0/1.1.0 sdisk
 2/0/1.3 tgt
 2/0/1.3.0 stape
 2/0/1.6 tgt
 2/0/1.6.0 sdisk
 2/0/1.7 tgt
 2/0/1.7.0 sctl
 2/0/2 lan2
 2/0/4 asio0
 2/0/6 CentIf
 2/0/8 audio
 2/0/10 fdc
 2/0/11 ps2
 5 bus_adapter
 5/0/1 hil
 5/0/2 asio0
 4 eisa
 4/0/1 lan2
 8 processor
 9 memory
     System Console is on the ITE
 Entering cifs_init...
 Initialization finished successfully... slot is 9
 Logical volume 64, 0x3 configured as ROOT
 Logical volume 64, 0x2 configured as SWAP
 Logical volume 64, 0x2 configured as DUMP
     Swap device table:  (start & size given in 512-byte blocks)
         entry 0 - major is 64, minor is 0x2; start = 0, size = 1048576
     Dump device table:  (start & size given in 1-Kbyte blocks)
         entry 00000000 - major is 31, minor is 0x6000; start = 88928, size = 524288
 Starting the STREAMS daemons-phase 1
 Create STCP device files
          $Revision: vmunix:    vw: -proj    selectors: CUPI80_BL2000_1108 -c 'Vw for CUPI80_BL2000_1108 build' -- cupi80_bl2000_1108 'CUPI80_BL2000_1108'  Wed Nov  8 19:05:38 PST 2000 $
 Memory Information:
     physical page size = 4096 bytes, logical page size = 4096 bytes
     Physical: 262144 Kbytes, lockable: 185460 Kbytes, available: 213788 Kbytes
 NOTICE: vxvm:vxdmp: added disk array OTHER_DISKS, datype = OTHER_DISKS
 Serviceguard: Unable to maintain contact with cmcld daemon.
 Performing TOC to ensure data integrity.

NOTE : As we can see here, Serviceguard has given us a clear message that there is something wrong in the cluster. Let us find the cmcld process:

 

 q4>  load struct proc from proc_list max nproc next p_global_proc
 loaded 134 struct procs as a linked list (stopped by null pointer)
 q4>  print p_pid p_uid p_comm | grep cmcld
  3791     0   "cmcld"
 q4>  keep p_pid==3791
 kept 1 of 134 struct proc's, discarded 133
 q4>  load struct kthread from p_firstthreadp
 loaded 1 struct kthread as an array (stopped by max count)
 q4>  trace pile
 stack trace for process at 0x0`0316a040 (pid 3791), thread at 0x0`02f5e800 (tid 3897)
 process was not running on any processor
 _swtch+0xc4
 _sleep+0x2f4
 select+0x5e4
 syscall+0x6ec
 $syscallrtn+0x0
 q4>  print kt_tid kt_pri kt_lastrun_time ticks_since_boot
   3897    532          104635           110504

NOTE : Here, we can see the priority of cmcld = 20 (the internal kernel priority of 532 is offset by 512 from the external user priority). We can also see the discrepancy between the last time this thread ran and the cumulative ticks (one tick = 10 milliseconds) since the system was booted.

 

 q4>  (ticks_since_boot - kt_lastrun_time)/100
 072     58      0x3a

NOTE : As we can see, 58 (decimal) seconds have passed since cmcld last ran. This is well outside our NODE_TIMEOUT , so we can conclude that we are into the time when the election would be running, and this node, having lost the election, instigated a Transfer Of Control (TOC).

 

 q4>  history
 HIST NAME   LAYOUT COUNT TYPE           COMMENTS
    1 <none>   list   134 struct proc    stopped by null pointer
    2 <none> mixed?     1 struct proc    subset of 1
    3 <none>  array     1 struct kthread stopped by max count
 q4>  recall 2
 copied a pile
 q4>  print p_pid p_uid p_stat p_cursig p_comm
 p_pid p_uid p_stat p_cursig  p_comm
  3791     0  SSTOP       24 "cmcld"
 q4>

NOTE : We sent signal 24 ( STOP signal) and hence p_cursig is set. This is confirmed by the STATE of the process = SSTOP .

Process priorities do not constitute the only reason why cmcld may be starved for resources. It could be due to many other reasons. The idea here is for us to attempt to establish any possible reasons why we experienced a cluster reformation and resulting TOC. Even if we are confident about the reasons surrounding the cluster reformation and resulting TOC, I would strongly suggest that you place a Software Support Call to your local Hewlett-Packard Response Center to have a trained Response Center Engineer analyze the crashdump in detail and come to his own conclusions. We can pass on our initial findings in order to try to speed up the process of finding a root cause for this problem. It is always wise to have professional, experienced help in establishing the root cause of any system crash. Your local Response Center will continue with the investigation to establish the root cause of the failure.

We now know that our cluster has formed successfully and is behaving "properly." We have ensured through certain critical tests that Serviceguard can accommodate local LAN failures easily and that it reacts as expected when a critical component, i.e., cmcld , becomes starved for resources. At this time, we can conclude our discussions on a package-less cluster. I know of a number of installations that use Serviceguard simply to provide automatic LAN failover, i.e., what we have demonstrated so far. If that is all you want Serviceguard to perform, then that's it; happy clustering. However, most of us will be using Serviceguard to protect our applications from the types of failures we considered earlier. After all, it is the applications that run on these systems that give these systems their value within the organization. Let's move on to consider Packages. We look at constructing packages from "scratch" and consider using some of the various Serviceguard Toolkits.


