Forcing a Stonith Event with Mon


We can use Mon on the cluster node manager to access cluster services from outside the cluster. Mon can test the health of the cluster from the perspective of a client computer and use custom scripts to take corrective action to return the services to normal operation. For example, if the cluster services are not available on the VIP address, Mon can check to see if the services are still available on the cluster node RIP addresses. If the services are working normally on the RIP addresses but not the VIP address, Mon knows the Director is malfunctioning and corrective action is required.

Here is a sample mon.cf configuration file to accomplish this:

 alertdir      = /usr/lib/mon/alert.d
 mondir        = /usr/lib/mon/mon.d
 logdir        = /usr/lib/mon/logs
 histlength    = 500
 dtlogging     = yes
 dtlogfile     = /usr/lib/mon/logs/dtlog

 # The list of cluster node host names.
 hostgroup clusternodes clnode1 clnode2 clnode3 clnode4

 # 209.100.100.2 is the LVS VIP address
 hostgroup clusterservices 209.100.100.2

 watch clusternodes
     service RIP
         interval 30s
         monitor telnet.monitor
         period wd {Su-Sa}
             alert mail.alert alert@domain.com
             upalert mail.alert alert@domain.com
             alertevery 1h

 watch clusterservices
     service VIP
         interval 1m
         monitor telnet.monitor
         depend clusternodes:RIP
         period wd {Su-Sa}
             alertafter 5
             alert mail.alert alert@domain.com
             alert initiate.stonith.event
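With the configuration file in place, the mon daemon can be started against it. Here is a minimal sketch, assuming mon is on your PATH and the file was saved as /etc/mon/mon.cf (the -f flag tells mon to fork into the background, and -c names the configuration file):

 # Start the mon daemon with our configuration file
 mon -f -c /etc/mon/mon.cf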

Note 

The telnet.monitor script is included with the Mon distribution.

This configuration file tells Mon to watch the telnet service on the cluster nodes. If telnet is still available on the cluster node RIP addresses but not on the cluster VIP address (offered by the LVS Director), the Mon server will reset the power to the Director; the standby Director should then take over the cluster load-balancing service and resume offering the telnet service to client computers.
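Before trusting this logic to Mon, you can reproduce the test by hand. The following sketch assumes telnet.monitor is installed in /usr/lib/mon/mon.d and follows the usual Mon monitor convention of taking host names as arguments and returning a nonzero exit status on failure:

 #!/bin/bash
 # Reproduce Mon's dependency test manually: if the RIPs answer
 # but the VIP does not, the Director is the likely culprit.
 MONITOR=/usr/lib/mon/mon.d/telnet.monitor
 if $MONITOR clnode1 clnode2 clnode3 clnode4 && ! $MONITOR 209.100.100.2; then
     echo "RIPs answer but VIP does not: suspect the Director"
 fi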

The initiate.stonith.event script should contain the Stonith commands to power reset the LVS Director. To create this script, we first need to build a Stonith configuration file—we'll call ours /etc/stonith.cfg. In this file, place the correct syntax for your Stonith device (see Chapter 9). Here is the syntax for a Western Telematic Remote Power Switch rps10 device:

 # vi /etc/stonith.cfg
 /dev/ttyS0    lvsdirector    0

This single line in the /etc/stonith.cfg file tells the stonith command (we'll use it in a moment) that the RPS10 device is connected to the first serial port on the cluster node manager, and that it is controlling power to the host named lvsdirector using the first port on the RPS device.
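Before handing this configuration to Mon, you can confirm that the stonith command recognizes your device and configuration file. A quick sketch, assuming the stonith utility from the Heartbeat package (-L lists the device types your build supports, and -l asks the configured device which hosts it controls):

 # List the Stonith device types this build of stonith supports
 /usr/sbin/stonith -L

 # Ask the rps10 device described in /etc/stonith.cfg which
 # hosts it controls; the answer should include lvsdirector
 /usr/sbin/stonith -t rps10 -F /etc/stonith.cfg -l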

Now create the initiate.stonith.event script containing:

 #!/bin/bash
 #
 # initiate.stonith.event
 #
 # Power-cycle the LVS Director through the rps10 Stonith
 # device described in /etc/stonith.cfg.
 /usr/sbin/stonith -t rps10 -F /etc/stonith.cfg lvsdirector

Place this file in the /usr/lib/mon/alert.d directory, and make sure it has the execute bit set:

 # chmod 755 /usr/lib/mon/alert.d/initiate.stonith.event
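You can also run the alert by hand to test the wiring end to end; be warned that this really does power-cycle the Director, so save it for a maintenance window. Mon normally passes alert scripts option arguments (the service, group, and host names involved), but since this script ignores its arguments, a bare invocation behaves the same way:

 # CAUTION: this actually resets power to lvsdirector
 /usr/lib/mon/alert.d/initiate.stonith.event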

Note 

You will probably also want to create a mon.cf entry that monitors the VIP address without depending on the RIP addresses, so that you are notified by email or pager whenever the VIP address isn't accessible. Regardless of what has gone wrong, you'll want to know when resources cannot be reached on the VIP.
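A minimal sketch of such an entry, written as a second service under the existing watch clusterservices stanza (the service name VIP-notify is an arbitrary choice):

     service VIP-notify
         interval 1m
         monitor telnet.monitor
         period wd {Su-Sa}
             alert mail.alert alert@domain.com
             upalert mail.alert alert@domain.com
             alertevery 1h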


