This chapter has three Mon recipes that build on each other to provide a Mon configuration capable of monitoring the cluster nodes using SNMP. Each cluster node runs the snmpd daemon, which monitors system resources and writes information to a Management Information Base (MIB) stored locally on each cluster node. Mon, running on the cluster node manager, examines the health of each cluster node by reading each of these MIBs across the network, as shown in Figure 17-2.
Figure 17-2: Mon examines the MIB of each cluster node
Notice in Figure 17-2 that the MIB is stored locally on each cluster node. Mon can select the information it wants from each MIB on the cluster nodes.
List of ingredients:
The Mon software (on the CD-ROM included with this book or from http://www.kernel.org)
The fping package (on the CD-ROM or from http://www.fping.com)
The SNMP software package
Your first step is to compile and install the fping package. You'll find the fping source code on the CD-ROM in the chapter17 directory. After untarring the fping package (see Appendix A), enter the following commands in the fping source directory to install the /usr/local/sbin/fping program:
#./configure
#make
#make check
#make install
Note | You can use fping in conjunction with your own script to monitor the status of the cluster nodes in the network, as discussed on the fping man page. However, in this recipe we will use the powerful features of the Mon package. |
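As a rough illustration of the "your own script" approach mentioned in the note, the sketch below filters fping output for nodes that did not answer. It assumes fping's default one-line-per-host output ("clnode1 is alive", "clnode2 is unreachable"); the function name down_hosts is made up for this example and is not part of fping or Mon.

```shell
# Sketch only: report which cluster nodes fping could not reach.
# Assumes fping's default output, one line per host, such as
# "clnode1 is alive" or "clnode2 is unreachable".

# Read fping output on stdin and print the name of each unreachable host.
down_hosts() {
    awk '$2 == "is" && $3 == "unreachable" { print $1 }'
}

# Hypothetical usage (node names are the ones used later in this chapter):
#   /usr/local/sbin/fping clnode1 clnode2 clnode3 2>/dev/null | down_hosts
```

A filter like this only tells you which nodes are down at the moment you run it; scheduling, alerting, and "up again" notices are the parts Mon adds.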
Next, you need to install the Perl SNMP package that Mon will use to monitor hosts using SNMP. Your distribution probably already ships with SNMP support, but it may not include the Perl SNMP package.
You can compile the Perl SNMP package from CPAN (called simply "SNMP"). If this compile fails, you'll need to rebuild the entire SNMP package and the Perl SNMP package from the source code on the CD-ROM included with this book. To do so, first untar the net-snmp-version.tar file from the CD-ROM onto your local hard drive, and then from inside the net-snmp-version source directory, enter the following:
#./configure
#make
#umask 022
#make install
Note | You must run these commands as root. |
To build and install the Perl SNMP package, enter:
#cd perl
#perl Makefile.PL -NET-SNMP-CONFIG="../net-snmp-config"
#make
#make install
Note | The perl directory is located underneath the net-snmp source code directory. |
Instead of compiling and installing the source from the CD-ROM included with this book, you may want to get the latest version by installing the net-snmp-devel package found on the download link at http://net-snmp.sourceforge.net and using CPAN to install SNMP with the commands:
#perl -MCPAN -e shell
cpan> install SNMP
These two commands download and install the latest Perl SNMP package from the CPAN site.
The chapter17/dependencies directory on the CD-ROM holds the Perl modules you need to get Mon up and running. To install these modules, copy them to your local hard drive, untar them, and then cd into each directory and enter the commands:
#perl Makefile.PL
#make
#make test
#make install
Note | If the SNMP installation fails, use the net-snmp source code included on the CD-ROM |
You can also download the CPAN modules (see Chapter 16 for an example and more information about how to use CPAN) using the commands:
#perl -MCPAN -e shell
cpan>install Time::Period
cpan>install Time::HiRes
cpan>install Convert::BER
cpan>install Mon::Protocol
cpan>install Mon::SNMP
cpan>install Mon::Client
And, depending upon what you want to monitor, you may also wish to install the following optional modules:
cpan>install Filesys::DiskSpace
cpan>install Net::Telnet
cpan>install Net::LDAPapi
cpan>install Net::DNS
cpan>install SNMP
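Before starting Mon, it can be useful to confirm which of these modules Perl can actually load. The following is a minimal sketch rather than part of the Mon distribution: the function name missing_perl_modules is invented for this example, and it relies only on the fact that perl -MModule -e 1 exits with a non-zero status when a module cannot be loaded.

```shell
# Sketch only: list which of the named Perl modules fail to load.
missing_perl_modules() {
    for module in "$@"; do
        # "perl -MModule -e 1" exits non-zero when the module cannot be loaded.
        perl -M"$module" -e 1 >/dev/null 2>&1 || echo "$module"
    done
}

# Example: check the required modules for this recipe.
#   missing_perl_modules Time::Period Time::HiRes Convert::BER \
#       Mon::Protocol Mon::SNMP Mon::Client
```

If the function prints nothing, every module you named was found; any name it prints still needs to be installed.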
Before installing Mon, create the directory where the Mon files will reside:
#mkdir -p /usr/lib/mon
Next, untar the Mon tar file located in the chapter17 directory on the CD-ROM into this directory (or download Mon from http://www.kernel.org/software/mon):
#mount /mnt/CD
#cd /mnt/CD/chapter17
#tar xvf mon-<version>.tar -C /usr/lib
#mv /usr/lib/mon-<version>/* /usr/lib/mon
#rmdir /usr/lib/mon-<version>
You should now have a /usr/lib/mon directory with a list of files that looks like this:
alert.d  clients    CREDITS  INSTALL         mon.d    README   utils
cgi-bin  COPYING    doc      KNOWN-PROBLEMS  mon.lsm  state.d  VERSION
CHANGES  COPYRIGHT  etc      mon             muxpect  TODO
Edit /etc/services, and add the following two lines to the end of the file:
mon     2583/tcp    # MON
mon     2583/udp    # MON traps
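To confirm that both entries are in place, a small check along these lines can help. It is a sketch under the assumption that each entry starts with the service name mon followed by 2583/tcp or 2583/udp, as shown above; has_mon_service is a hypothetical helper name, and the file path is a parameter so that you can try it on a copy first.

```shell
# Sketch only: confirm that a services file defines the mon port for a protocol.
has_mon_service() {
    # $1 = services file, $2 = protocol (tcp or udp)
    awk -v proto="$2" '
        $1 == "mon" && $2 == "2583/" proto { found = 1 }
        END { exit !found }
    ' "$1"
}

# Example:
#   has_mon_service /etc/services tcp && has_mon_service /etc/services udp \
#       || echo "mon entries missing from /etc/services"
```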
To help make administration of Mon easier, create a symbolic link from the /usr/lib/mon/etc directory (just created by the software download) to the /etc/mon directory on the system with the following command:
#ln -s /usr/lib/mon/etc /etc/mon
The chapter17 directory on the CD-ROM contains the following sample configuration file.
Note | Do not enter the arrows and the numbers that follow them; they appear here for documentation purposes only. |
#
# Simplified cluster "mon.cf" configuration file
#
alertdir   = /usr/lib/mon/alert.d                ← [1]
mondir     = /usr/lib/mon/mon.d                  ← [2]
logdir     = /usr/lib/mon/logs                   ← [3]
histlength = 500                                 ← [4]
dtlogging  = yes                                 ← [5]
dtlogfile  = /usr/lib/mon/logs/dtlog             ← [6]
hostgroup clusternodes clnode1 clnode2 clnode3   ← [7]
                                                 ← [8]
watch clusternodes                               ← [9]
    service cluster-ping-check                   ← [10]
        interval 5s                              ← [11]
        monitor fping.monitor                    ← [12]
        period wd {Sa-Su}                        ← [13]
            alert mail.alert alert@domain.com    ← [14]
            upalert mail.alert alert@domain.com  ← [15]
            alertevery 1h                        ← [16]
[1] | → Path to the alert scripts. |
[2] | → Path to the monitor scripts. |
[3] | → Path to the log directory. |
[4] | → The maximum number of events to save in the log. |
[5] | → Enable downtime logging. |
[6] | → Log to store downtime events. |
[7] | → List of cluster nodes assigned to a group. |
[8] | → Required empty line after each hostgroup. |
[9] | → Watch all of the nodes in the group. |
[10] | → Call the service anything you want. |
[11] | → How frequently the fping utility is run. |
[12] | → Use the fping.monitor script. |
[13] | → Type perldoc Time::Period for syntax.[1] |
[14] | → When one of the nodes goes down. |
[15] | → When one of the nodes comes up. |
[16] | → Only send alert email once per hour. |
Note | This configuration file is located on the CD-ROM in the chapter17 directory. |
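Once the file is in place, it can help to pull the node names back out of it, for example to confirm that each one resolves before Mon starts pinging it. The sketch below assumes that the whole group sits on one "hostgroup NAME host..." line, as in the sample file, and that the callout arrows and numbers have been removed. The helper name hostgroup_members and the /etc/mon/mon.cf path in the usage comment are assumptions made for this example.

```shell
# Sketch only: print the members of one hostgroup from a mon.cf-style file.
# Assumes the group is defined on a single "hostgroup NAME host..." line.
hostgroup_members() {
    # $1 = configuration file, $2 = hostgroup name
    awk -v group="$2" '
        $1 == "hostgroup" && $2 == group {
            for (i = 3; i <= NF; i++) print $i
        }
    ' "$1"
}

# Hypothetical usage: warn about node names that do not resolve.
#   for node in $(hostgroup_members /etc/mon/mon.cf clusternodes); do
#       getent hosts "$node" >/dev/null || echo "cannot resolve $node"
#   done
```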
Note that the time period definition in this configuration file is period wd {Sa-Su}. If a host does not reply to a ping attempt while Mon is running outside this time period, no alert is sent. Be careful with the order of the days: Time::Period ranges wrap around, so {Sa-Su} covers only Saturday and Sunday. To have Mon report a failure at any time during the week, write the period as wd {Su-Sa} (Sunday through Saturday).
Note | We are using a very short interval of five seconds (5s) for testing purposes. You may want to reduce the amount of ping (ICMP) traffic on your network by increasing this time interval when you use Mon in production. |
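To get a feel for how much the interval matters, the arithmetic can be sketched as follows. This is a lower bound that counts one echo request per node per interval; any retries fping makes would add to it. The function name pings_per_hour is made up for this example.

```shell
# Sketch only: lower bound on ICMP echo requests per hour for one ping service.
# Counts one request per node per interval; fping retries would add more.
pings_per_hour() {
    # $1 = number of nodes, $2 = interval in seconds
    echo $(( $1 * 3600 / $2 ))
}

pings_per_hour 3 5     # three nodes every 5 seconds: prints 2160
pings_per_hour 3 60    # the same nodes once a minute: prints 180
```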
Complex Mon configuration files can be built using the GNU m4 macro processor. The full documentation for the m4 processor is available online at http://www.gnu.org/software/m4/manual/index.html.
Why use the m4 macro processor to process your Mon configuration files? The m4 macro processor allows you to create sophisticated configuration files that contain, among other things, variables. You don't have to retype the same thing multiple times throughout the configuration file (one email address for all of your alerts, for example). See the /usr/lib/mon/etc/example.m4 configuration file that comes with the Mon distribution for an example.
When you build a configuration file that relies on the m4 processor, you can manually convert it to a normal configuration file that Mon understands by passing it through the m4 macro processor each time you make a change, or you can start Mon with the -c option followed by the full path and file name of the configuration file and Mon will pass your configuration file through the m4 macro processor for you automatically.
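As a small illustration of the "one email address for all of your alerts" idea, the sketch below runs a configuration fragment through m4 by hand. ALERT_EMAIL is a macro name invented for this example (it is not defined by Mon), the fragment is deliberately incomplete, and render_fragment is a hypothetical helper. The only m4 feature used is the standard -D option, which defines a macro on the command line.

```shell
# Sketch only: expand one m4 macro in a fragment of a Mon configuration file.
# ALERT_EMAIL is a hypothetical macro name chosen for this example.
render_fragment() {
    # $1 = address to substitute wherever ALERT_EMAIL appears
    m4 -DALERT_EMAIL="$1" <<'EOF'
            alert mail.alert ALERT_EMAIL
            upalert mail.alert ALERT_EMAIL
EOF
}

# Example:
#   render_fragment alert@domain.com
```

Changing the address then means changing one definition instead of every alert and upalert line.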
To simulate what Mon will do, run the monitor script manually:
#/usr/lib/mon/mon.d/fping.monitor clnode1
Assuming the clnode1 computer is online and able to reply to ICMP packets, you should see output that looks like this:
start time: <current date>
end time  : <current date>
duration  : 0 seconds
------------------------------------------------------------------------------
reachable hosts                          rtt
------------------------------------------------------------------------------
209.100.100.2                            58.10 ms
This report indicates that the cluster node clnode1 at IP address 209.100.100.2 responded to the ping in 58.10 milliseconds.
If this script does not work properly, you may need to tell Perl where to find the fping utility. For example, edit the fping.monitor file, and change the line:
my $CMD = "fping -e -r $RETRIES -t $TIMEOUT";
so it looks like this:
my $CMD = "/usr/local/sbin/fping -e -r $RETRIES -t $TIMEOUT";
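If you are not sure where fping ended up on a particular system, a short search like the following can tell you which path to put in fping.monitor. It is a sketch only; find_fping is a hypothetical helper, and the directories in the usage comment are examples, with /usr/local/sbin listed first because that is where this recipe installs fping.

```shell
# Sketch only: print the full path of the first executable fping found
# in the given directories, or return non-zero if there is none.
find_fping() {
    for dir in "$@"; do
        if [ -x "$dir/fping" ] && [ ! -d "$dir/fping" ]; then
            echo "$dir/fping"
            return 0
        fi
    done
    return 1
}

# Example:
#   find_fping /usr/local/sbin /usr/sbin /usr/bin
```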
To test the email alert, enter:
#echo "Testing 123" | /usr/lib/mon/alert.d/mail.alert alert@domain.com
where alert@domain.com is the email address you would like to send email messages to.
Create the Mon log directory based on the logdir parameter you used in the mon.cf file.
#mkdir -p /usr/lib/mon/logs
Before we run Mon as a daemon, we can run it from a shell prompt to check our configuration for syntax errors with the command:
#/usr/lib/mon/mon -d
If all of the cluster nodes are "pingable," you will see (among other things) lines like the following scroll by at each interval (recall that we set the interval to five seconds for testing purposes):
PID 8211 (clusternodes/cluster-ping-check) exited with [0]
This line indicates that the nodes in the clusternodes group responded to the ping test. However, if you take down a cluster node, you'll see the exit status of the script change from 0 to 1. You can also watch the /var/log/messages file for a report that the mail.alert script was run. When Mon sees the node drop off the network, you should get an email alert, and when it returns to the network, you should receive an email upalert.
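Because the debug output scrolls by quickly, you may prefer to filter it so that only failed checks are shown. The sketch below assumes lines shaped exactly like the "exited with" example above and ignores everything else; failed_checks is a name invented for this example, and the 2>&1 in the usage comment is there only because we are not assuming which stream Mon writes its debug output to.

```shell
# Sketch only: from Mon debug output on stdin, print "group/service status"
# for every monitor run that exited with a non-zero status.
# Assumes lines shaped like:
#   PID 8211 (clusternodes/cluster-ping-check) exited with [0]
failed_checks() {
    sed -n 's/^PID [0-9][0-9]* (\(.*\)) exited with \[\([0-9][0-9]*\)]$/\1 \2/p' |
        awk '$2 != 0'
}

# Hypothetical usage:
#   /usr/lib/mon/mon -d 2>&1 | failed_checks
```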
Now let's build on this basic Mon configuration by adding support for the SNMP protocol.
[1]wd is a code passed to the Time::Period Perl module to indicate the scale used to define the time period. The module tests whether the current time falls within the period you define using the wd (weekday) scale. The wd scale permits 1-7 or su, mo, tu, we, th, fr, sa to be used in the period definition. Ranges wrap around, so {Sa-Su} covers only Saturday and Sunday, whereas {Su-Sa} means Sunday through Saturday, which is the same as saying all week or anytime. (A blank period would mean the same thing.)