10.4 Programming Monitors | Linux Kernel in a Nutshell (In a Nutshell (OReilly))

Many administration tools will perform a task and then exit. For example, you may write a tool that queries SNMP variables on a device and then prints the results. This is a relatively straightforward program. However, you may also find that you occasionally want to write a program that runs all the time, monitoring a network service.

Ideally, you will not need to write a program like this yourself but can instead use the capabilities of other monitoring software. It is risky to have too many separate pieces of software performing monitoring functions. If one of them should die because of a bug or other problem, would you notice? If you have one central piece of monitoring software, you probably will, especially if you have software monitoring the monitor itself. But if you have 10 different scripts monitoring conditions of your network, all of which report only once in a while, how long will it take you to notice that one of them has stopped reporting? Regardless, the following sections provide a few tips that will help if you do find you need to write a monitoring script.

10.4.1 Loop Timing

Pay careful attention to the parts of your program that have loops in them. When writing a monitoring script, you will probably have at least one loop: the large loop around the entire program that keeps it going. Be sure to add delays to the loop as needed! You can do this with the sleep command in both Perl and the Bourne shell, as demonstrated below. Remember that the computer will try to execute your script as quickly as it can. If your script is sending SNMP probes, sending email, or doing anything that interacts with the rest of the world, you will want to make sure you do not cause an operational problem by overwhelming some resource. Let's say you are going to write a script that checks whether a machine is responding to SNMP requests. If the machine does not respond to SNMP, the script will send a piece of email to the network administrators:

    #!/bin/sh
    host=www.example.com
    to=admins@example.com
    community=public
    while [ 1 ]; do
        # Probe the host; discard snmpget's output and error messages
        snmpget -v 1 $host $community system.sysDescr.0 > /dev/null 2> /dev/null
        if [ $? -eq 1 ]; then
            echo "The host $host is not pinging" | /usr/lib/sendmail $to
            sleep 1800   # wait 30 minutes after sending mail
        fi
        sleep 60         # wait one minute between probes
    done

This script has a number of shortcomings: It assumes sendmail is available, and it will repeatedly send email while the host being probed is unresponsive. The latter point will be addressed in the next section. It also does not include a subject line, which is a problem addressed in Section 10.4.4.

The important thing to notice is that there are two sleep statements in this example. The first one is at the end of the while loop, and it causes the program to sit and do nothing for a full minute. This ensures that the program does not do anything more often than once a minute. If this sleep statement were not present, the script would send a flood of SNMP requests to the host you wish to probe. Also vital is the sleep statement after email is sent, which causes it to wait 30 minutes (1800 seconds) after sending mail before doing anything else. If it were not present, and www.example.com stopped responding to SNMP requests for an extended period of time, the script would send a piece of email once every minute. Just imagine what would happen if neither sleep statement were included and www.example.com was down.
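The timing logic described above can be sketched with the two delays held in named variables, so that each one can be tuned in a single place. This is only a sketch under stated assumptions: the names run_monitor, probe, notify, check_interval, alert_holdoff, and max_checks are hypothetical and do not appear in the original script, and probe and notify are placeholders for the snmpget and sendmail commands. The optional max_checks bound exists only so that the loop can be exercised without running forever.

```shell
#!/bin/sh
# Sketch only: probe and notify stand in for the snmpget and sendmail
# commands used in the example above.

check_interval=60      # seconds between probes
alert_holdoff=1800     # extra seconds to wait after sending mail

probe()  { true; }                  # placeholder: succeeds when the host answers
notify() { echo "alert: $1"; }      # placeholder: would pipe a message to sendmail

# run_monitor HOST [MAX_CHECKS]
# Probes HOST forever, or MAX_CHECKS times when a nonzero bound is given.
run_monitor() {
    host=$1
    max_checks=${2:-0}
    checks=0
    while [ "$max_checks" -eq 0 ] || [ "$checks" -lt "$max_checks" ]; do
        if ! probe "$host"; then
            notify "$host is not responding"
            sleep "$alert_holdoff"      # do not mail again right away
        fi
        sleep "$check_interval"         # never probe more often than this
        checks=$((checks + 1))
    done
}
```

Calling run_monitor www.example.com with no second argument loops indefinitely, as a real monitor would. Note that the time spent in the probe itself is added on top of check_interval, so the period between probes is at least, not exactly, one interval.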

10.4.2 State Machines

Instead of receiving a piece of email every half hour in the above example, it would be preferable to receive a piece of email only when the machine either has been responding but stops or has not been responding but begins to respond. To make this happen, you must introduce the concept of state into your script. You will use a variable that indicates the current state of the SNMP status of the machine (we can call the states "alive" and "dead"), and when the state changes, the program triggers a message:

    #!/bin/sh
    host="www.example.com"
    to="admins@example.com"
    community="public"
    state="alive"
    while [ 1 ]; do
        snmpget -v 1 $host $community system.sysDescr.0 > /dev/null 2> /dev/null
        response=$?
        if [ "$state" = "alive" ]; then
            # Was responding; report only if it has stopped
            if [ $response -eq 1 ]; then
                echo "The host $host has stopped pinging" | \
                    /usr/lib/sendmail $to
                sleep 300
                state="dead"
            fi
        elif [ "$state" = "dead" ]; then
            # Was not responding; report only if it has recovered
            if [ $response -eq 0 ]; then
                echo "The host $host has started pinging" | \
                    /usr/lib/sendmail $to
                sleep 300
                state="alive"
            fi
        fi
        sleep 60
    done

If the SNMP state is "alive" but the machine is no longer responding, a notification is sent, and the state is changed to "dead." If the state is "dead" and the machine is responding, notification is sent and the state is changed to "alive." When the state stays the same (SNMP was dead and is still not responding or SNMP was alive and continues to respond), no message is sent.
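The transitions just described can also be written as a small function that maps the current state and the probe result to the next state, which keeps the decision separate from the mailing and sleeping. The following is a minimal sketch, not the script's actual structure: the function name next_state and its two-line output convention are hypothetical, and unlike the script above it treats any nonzero exit status as a failed probe rather than testing for 1 specifically.

```shell
#!/bin/sh
# next_state STATE RESPONSE
# STATE is "alive" or "dead"; RESPONSE is the exit status of the probe.
# Prints the new state on the first line. When the state changes, prints
# the notification text on a second line.
next_state() {
    state=$1
    response=$2
    if [ "$response" -eq 0 ]; then result=up; else result=down; fi
    case "$state:$result" in
        alive:down) echo "dead";  echo "has stopped responding" ;;
        dead:up)    echo "alive"; echo "has started responding" ;;
        alive:up|dead:down) echo "$state" ;;    # no change, no message
        *) echo "next_state: unknown state '$state'" >&2; return 2 ;;
    esac
}
```

Because every state and result pair appears as its own case label, a missing transition shows up as a gap in the list, and an unexpected state is reported instead of being silently ignored.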

A program that has different states like this is called a finite state machine. The possible states of the program, along with indications of how to transition between those states, can be represented in a simple state transition diagram, as in Figure 10.1. This program is simple and has only two states, making the state transition diagram useful only as an exercise. But add another state or two to your program, and it can easily become complicated enough that a state transition diagram will be a great help. It will ensure that you fully understand the structure of the state machine and that you do not forget to deal with any unexpected circumstances. It is surprisingly easy to forget about a state transition or two if you have even a modest number of states in your program.

Figure 10.1. Simple State Transition Diagram.

10.4.3 Keeping It Running

When writing a monitoring tool, especially one that will remain silent when there are no problems to report, you must take care to write the program so that it is not likely to exit unexpectedly. For scripting languages like the Bourne shell and Perl, this mostly means keeping the scripts simple and avoiding calls to exit and other functions that terminate the program. In languages such as C, you will need to take many additional measures to ensure the program does not exit because of memory corruption or other bugs that are unlikely to occur in a scripting language.
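When a call to exit cannot be avoided, for instance inside a helper borrowed from another script, one possible safeguard in the Bourne shell is to run the helper in a subshell, so that exit ends only the subshell and not the monitoring loop. The sketch below illustrates the idea; run_isolated and fragile_check are hypothetical names introduced here and are not part of the scripts above.

```shell
#!/bin/sh
# run_isolated COMMAND [ARGS...]
# Runs COMMAND in a subshell. If COMMAND calls exit, only the subshell
# terminates, and its exit status is returned to the caller.
run_isolated() {
    ( "$@" )
}

# Hypothetical helper that bails out with exit when given no host name.
# It stands in for any code that might terminate the shell.
fragile_check() {
    [ -n "$1" ] || exit 3
    echo "checked $1"
}
```

Inside the main loop, a call such as run_isolated fragile_check "$host" lets the loop inspect the status and carry on. The cost is that variable assignments made inside the subshell are not visible to the loop afterward.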

10.4.4 Sending Nicer Mail with Sendmail

The sendmail program does not have a convenient mechanism for specifying a subject line or other message headers on the command line. In order to send a message that includes these headers, you can create an appropriately formatted file and run sendmail with the -t argument. The following program fragment demonstrates one way to do this in the Bourne shell:

    tmpmsg=/tmp/msg.$$
    echo "To: admins@example.com" > $tmpmsg
    echo "From: root@server.example.com" >> $tmpmsg
    echo "Subject: Monitor for $host" >> $tmpmsg
    echo "" >> $tmpmsg
    echo "$host is not responding" >> $tmpmsg
    cat $tmpmsg | /usr/lib/sendmail -t
    rm -f $tmpmsg

Or, if you want to impress your friends, you can use this fancy shortcut:

    tmpmsg=/tmp/msg.$$
    cat > $tmpmsg <<EOF
    To: admins@example.com
    From: root@server.example.com
    Subject: Monitor for $host

    $host is not responding
    EOF
    cat $tmpmsg | /usr/lib/sendmail -t
    rm $tmpmsg

Make sure the blank line is present between the subject line and the message body; this is how sendmail determines where the headers end and the message begins.
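The header and body layout can also be produced by a small function whose output is piped straight into sendmail -t, which avoids the temporary file. This is a sketch rather than a replacement for the fragments above: the function name build_message and its argument order are hypothetical, and the usage comment assumes sendmail is installed at /usr/lib/sendmail as in the text.

```shell
#!/bin/sh
# build_message TO FROM SUBJECT BODY
# Prints a message in the form sendmail -t expects: the headers, one
# blank line, then the body.
build_message() {
    printf 'To: %s\n' "$1"
    printf 'From: %s\n' "$2"
    printf 'Subject: %s\n' "$3"
    printf '\n'                 # blank line that ends the headers
    printf '%s\n' "$4"
}

# Usage (not run here):
# build_message admins@example.com root@server.example.com \
#     "Monitor for $host" "$host is not responding" | /usr/lib/sendmail -t
```

Keeping the message construction in one function also makes it easy to check the output by eye before any mail is sent.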

In Perl, the process is much the same: Compose the message you wish to send, headers first, and feed it to the standard input of sendmail -t through a pipe:

    # The leading | opens a pipe to sendmail's standard input
    open (SENDMAIL, "| /usr/lib/sendmail -t");
    print SENDMAIL "To: admins\@example.com\n";
    print SENDMAIL "From: root\@server.example.com\n";
    print SENDMAIL "Subject: Monitor for $host\n";
    print SENDMAIL "\n";
    print SENDMAIL "$host is not responding.\n";
    close(SENDMAIL);

Note that at signs (@) inside double-quoted strings need to be escaped with a backslash and that a newline character is explicitly included at the end of each line.