Chapter 17: The Simple Network Management Protocol and Mon


In this chapter, we'll see how the Simple Network Management Protocol (SNMP) can be used in conjunction with the Mon monitoring package to alert you to problems on the cluster nodes. The SNMP daemon, snmpd, runs on each cluster node, and the Mon monitoring daemon runs on the cluster node manager sitting outside of the cluster. When Mon finds a problem, it alerts you via an email message or a text message sent to your PDA, so that you can take action before the problem affects users running mission-critical applications on the cluster.

Mon

The humbly named Mon monitoring package uses a hierarchical relationship between events to decide whether or not to alert the system administrator. For example, if you are watching the http and telnet daemons on a cluster node, and you are also watching to see if the node is alive using the ping utility (and ICMP packets), you don't need three alerts: the first reporting that http is not available, the second complaining that telnet doesn't work, and finally a third telling you the node is not available on the network. One alert reporting that the node is unreachable is enough, because the http and telnet failures follow from it.
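The idea of suppressing alerts for services that depend on a failed service can be sketched in a few lines of Python. This is only an illustrative model of the http, telnet, and ping example above, not Mon's own code or configuration syntax; the `DEPENDS_ON` table and the `services_to_alert` function are hypothetical names, and the sketch looks only at direct dependencies.

```python
# Hypothetical model of hierarchical alert suppression. The names below
# are illustrative and are not part of Mon's configuration or API.

# Each service lists the services it directly depends on.
DEPENDS_ON = {
    "ping": [],
    "http": ["ping"],
    "telnet": ["ping"],
}


def services_to_alert(failed):
    """Return the failed services whose direct dependencies are all healthy."""
    failed = set(failed)
    return sorted(
        service
        for service in failed
        if not any(dep in failed for dep in DEPENDS_ON.get(service, []))
    )


if __name__ == "__main__":
    # The node is down, so all three checks fail, but only ping is reported.
    print(services_to_alert(["http", "telnet", "ping"]))  # ['ping']
    # The node is up and only the web server has failed.
    print(services_to_alert(["http"]))                    # ['http']
```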

Mon Alerts

Mon can send alerts using a variety of methods, including:

  • SNMP traps.

  • Email messages.

  • Short Message Service (SMS) text messages to a cell phone or PDA.

  • A custom script or program.

Mon lets you control how often an alert is repeated while a service remains down. It also lets you control where and how often alerts are sent according to the day of the week or the time of day.
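To make these two controls concrete, here is a rough Python sketch of a repeat limit combined with a day-of-week and time-of-day window. It is not how Mon is configured or implemented; the `AlertThrottle` class, its parameters, and the example values are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


class AlertThrottle:
    """Hypothetical helper: limit repeat alerts and confine them to a time window."""

    def __init__(self, min_interval, weekdays, start_hour, end_hour):
        self.min_interval = min_interval  # timedelta between repeat alerts
        self.weekdays = set(weekdays)     # 0 = Monday ... 6 = Sunday
        self.start_hour = start_hour      # first hour of the window (inclusive)
        self.end_hour = end_hour          # last hour of the window (exclusive)
        self.last_sent = None             # time the previous alert was sent

    def in_period(self, now):
        """Return True if `now` falls inside the configured day and hour window."""
        return (
            now.weekday() in self.weekdays
            and self.start_hour <= now.hour < self.end_hour
        )

    def should_send(self, now):
        """Return True, and record the send time, if an alert may go out at `now`."""
        if not self.in_period(now):
            return False
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            return False
        self.last_sent = now
        return True
```

With an interval of one hour and a window of Monday through Friday, 7 a.m. to 10 p.m., a service that stays down produces at most one alert per hour during working days and none on the weekend.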

Mon Monitoring Scripts

To monitor a service, Mon runs a monitoring script and passes arguments to it (an IP address to monitor, for example). Mon looks at three things returned by the script:

  • The exit or return status of the script.

  • The first line of output printed by the monitor script (the status summary in Mon-speak).

  • Any remaining output from the monitor script (the status detail).
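A monitor that follows this contract might look like the Python sketch below. It treats each command-line argument as a host to check, prints the failed hosts on the first line as the status summary, prints one line of status detail per failure, and exits with a nonzero status if anything failed. The TCP port, the timeout, and the function names are assumptions chosen for illustration; this is not one of the monitors shipped with Mon.

```python
#!/usr/bin/env python3
"""Sketch of a Mon-style monitor: check a TCP port on each host argument."""
import socket
import sys

PORT = 80      # assumed port, chosen for illustration
TIMEOUT = 5.0  # assumed connection timeout in seconds


def check_host(host, port=PORT, timeout=TIMEOUT):
    """Return None if the port accepts a connection, else an error string."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except OSError as exc:
        return str(exc)


def run(hosts, check=check_host):
    """Return (exit_status, summary_line, detail_lines) for the given hosts."""
    failures = {}
    for host in hosts:
        error = check(host)
        if error is not None:
            failures[host] = error
    summary = " ".join(sorted(failures))
    detail = [f"{host}: {error}" for host, error in sorted(failures.items())]
    return (1 if failures else 0), summary, detail


if __name__ == "__main__":
    status, summary, detail = run(sys.argv[1:])
    print(summary)       # first line of output: the status summary
    for line in detail:  # remaining output: the status detail
        print(line)
    if status:
        sys.exit(status)  # nonzero tells the caller that the test failed
    # Falling off the end exits with status 0, meaning success.
```

Passing the check function into `run` as a parameter keeps the reporting logic separate from the network test, so the summary and detail formatting can be exercised without touching the network.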

Mon and Monitoring Script Return Codes

Mon looks at the return code of the monitoring script. A return code of 0 tells Mon that nothing bad has happened; any nonzero value tells Mon that the test failed. If a script keeps returning a nonzero status, Mon starts comparing the status summary (the first line of text printed by the script) with the one from the previous run to decide whether the change in the summary warrants another alert.
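The decision described above can be modeled roughly as follows. This is a simplified sketch of the rule as stated in the text, not Mon's actual implementation, and it ignores any other settings that affect how often alerts repeat; the `should_alert` function and its arguments are hypothetical.

```python
def should_alert(exit_status, summary, previous):
    """Decide whether a monitor run should trigger an alert.

    `previous` is the (exit_status, summary) pair from the last run,
    or None if the monitor has not run before.
    """
    if exit_status == 0:
        return False                 # the test succeeded
    if previous is None or previous[0] == 0:
        return True                  # first failure after a success
    return summary != previous[1]    # repeated failure: alert only on change


if __name__ == "__main__":
    # (exit status, status summary) for four consecutive runs of one monitor.
    runs = [(0, ""), (1, "node1"), (1, "node1"), (1, "node1 node2")]
    previous = None
    for status, summary in runs:
        print(status, repr(summary), should_alert(status, summary, previous))
        previous = (status, summary)
```

In this run, the second and fourth results trigger an alert: the second because the test has just started failing, and the fourth because the status summary changed while the test was still failing.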



The Linux Enterprise Cluster. Build a Highly Available Cluster with Commodity Hardware and Free Software
ISBN: 1593270364
EAN: 2147483647
Year: 2003
Pages: 219
Authors: Karl Kopper
