7.5 Configuring Sysmon

Once you have verified that sysmond is up and running, you can begin to fill out the configuration with the devices you wish to monitor. The Sysmon configuration is made up of a list of objects to be monitored, along with a section of global configuration options. Within each object definition, you must specify the IP address of the device to be tested, the kind of test that should be performed, and any objects the device depends on. When object A is configured to depend on object B, notifications will not be sent about the status of object A if object B is also unreachable.

Note that each line of the configuration ends with a semicolon, except for lines that end with an open curly brace. Additionally, lines beginning with a pound sign (#) are ignored and can be used for comments.
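These two lexical rules are simple enough to check before starting sysmond. The Python sketch below is a hypothetical helper, not something shipped with Sysmon: it reports every non-blank line that is not a comment and ends with neither a semicolon nor an open curly brace. It assumes one statement per line, as in the multi-line examples in this section, and tolerates leading whitespace before the pound sign (an assumption); it does not otherwise parse the file.

```python
def find_unterminated_lines(config_text):
    """Return 1-based numbers of lines that break the termination rule.

    Hypothetical helper: blank lines and comment lines are skipped; every
    other line must end with ';' or '{'. Leading whitespace before '#'
    is tolerated here, which is an assumption.
    """
    bad = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank or comment line
        if not stripped.endswith((";", "{")):
            bad.append(number)
    return bad


sample = """# monitor the core router
root="server";
object router1 {
    ip "192.0.2.5";
    type ping
};
"""
print(find_unterminated_lines(sample))  # [5]
```

Line 5 is reported because the semicolon after `type ping` is missing.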

7.5.1 The Root Node

Sysmon requires one object to be defined as the root of the device hierarchy. This is the object that all others depend upon to be up. A good choice is the server on which Sysmon itself runs, as configured in the earlier example. Note that in defining the root node, the name of the object must be in quotation marks:

 root="server";

7.5.2 Objects and Dependencies

Now you can add other objects to be monitored. Start by adding a simple ping test for the router that server.example.com is connected to:

 object router1 {
     ip "192.0.2.5";
     type ping;
     desc "Router1";
     dep "server";
     contact "admin@example.com";
 };

The Object Name

First, note the name of the object, here router1. The name does not need to match the hostname of the device, though keeping the two consistent may make the configuration easier to maintain. The object name is used solely for referencing the object elsewhere in the configuration. It can be any text string you like, as long as each object has a unique name.
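Because object names exist only to be referenced, a mistyped name in a dep line or a duplicated object name is an easy mistake to make. The following Python sketch checks a hypothetical pre-parsed form of the configuration (a list of name and dependency pairs in file order) against the structural rules in this section: names must be unique, the root must be defined and has no dependencies, every other object needs at least one dependency, and every dependency must name a defined object. The function name and data layout are illustrative assumptions, not part of Sysmon.

```python
def validate(objects, root):
    """Return a list of problems found in (name, deps) pairs.

    `objects` is a hypothetical pre-parsed configuration in file order;
    an empty result means none of the checked rules were violated.
    """
    problems = []
    names = set()
    for name, _deps in objects:
        if name in names:
            problems.append(f"duplicate object name: {name}")
        names.add(name)
    if root not in names:
        problems.append(f"root object not defined: {root}")
    for name, deps in objects:
        if name == root and deps:
            problems.append(f"root object {name} should have no dependencies")
        elif name != root and not deps:
            problems.append(f"{name} has no dependencies")
        for dep in deps:
            if dep not in names:
                problems.append(f"{name} depends on undefined object {dep}")
    return problems


config = [
    ("server", []),
    ("router1", ["server"]),
    ("web-server", ["router1"]),
    ("router1", ["server"]),  # duplicate name
    ("mail", ["routre1"]),    # typo in the dependency
]
for problem in validate(config, root="server"):
    print(problem)
```

This reports the duplicate `router1` and the undefined dependency `routre1`.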

Setting the IP Address

The first option listed in the object above is ip, which specifies the device to be tested. Despite its name, this field can contain either the hostname or the IP address of the device, and each choice has advantages and disadvantages. If an IP address is used, Sysmon does not depend on DNS working properly, and the test will be performed correctly even if DNS has failed. However, if you have many pieces of equipment to keep track of, and a monitored device may be replaced by one with a different IP address that takes over the old hostname, it is preferable to use hostnames. This way, your Sysmon configuration does not become a second place where DNS information must be maintained. Sysmon does keep its own DNS cache, which gives you some control over the interaction between Sysmon and DNS independent of the other software running on the server. The options that control this behavior are described in the section on Global Options.

Setting the Test Type

The next option, type, directs Sysmon to perform a ping test on the device in question. There are 10 other possible values for the test type, all of which are listed in Figure 7.3. Most of these tests are configured just like the ping test: simply declare the test type, and you're done. A few of them require extra options. For example, an object that tests Web service with the www test type must include the url and urltext options.

Figure 7.3. Sysmon Test Types.

Test	Function	Options
`ping`	standard ping test
`pop3`	working POP3 server	`username`, `password`
`tcp`	generic listening TCP port	`port`
`udp`	generic listening UDP port	`port`
`radius`	working RADIUS server	`username`, `password`, `secret`
`nntp`	listening news server
`smtp`	listening mail server
`imap`	listening IMAP server
`x500`	listening X.500 directory server
`www`	listening web server	`url`, `urltext`
`sysmon`	running remote Sysmon server

For example, a Web server object might be defined like this:

 object web-server {
     ip "www.example.com";
     type www;
     desc "Main Web Server";
     dep "router1";
     url "http://www.example.com/";
     urltext "<TITLE>";
     contact "admin@example.com";
 };

This tests the URL http://www.example.com/ . If the page is not loadable or does not contain the text <TITLE>, the test will fail.
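The pass/fail rule for the www test can be written down as a small function. This Python sketch models the behavior described above and is not Sysmon's implementation: `fetch` is a hypothetical stand-in for an HTTP client that returns the page body or raises OSError, and the case-sensitive substring match is an assumption. In real use, `fetch` could wrap `urllib.request.urlopen`, whose `URLError` and `HTTPError` exceptions are subclasses of OSError; here a fake fetcher keeps the example self-contained.

```python
from typing import Callable


def www_test(url: str, urltext: str, fetch: Callable[[str], str]) -> bool:
    """Model of the www test: the page must load and contain `urltext`."""
    try:
        body = fetch(url)
    except OSError:
        return False  # page not loadable: the test fails
    return urltext in body  # loaded, but the expected text may be missing


def fake_fetch(url: str) -> str:
    """Hypothetical fetcher serving a single canned page."""
    pages = {"http://www.example.com/": "<HTML><TITLE>Example</TITLE></HTML>"}
    if url not in pages:
        raise OSError("connection refused")
    return pages[url]


print(www_test("http://www.example.com/", "<TITLE>", fake_fetch))   # True
print(www_test("http://www.example.com/", "Welcome", fake_fetch))   # False
print(www_test("http://down.example.com/", "<TITLE>", fake_fetch))  # False
```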

The tcp and udp tests require a port number so that Sysmon knows which port to test. The pop3 test requires a valid username and password, and the radius test requires the same, along with the RADIUS secret string.
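A "generic listening TCP port" check amounts to attempting a connection and seeing whether it is accepted. The following Python sketch shows that idea with the standard socket module; the function name and the 5-second timeout are assumptions, and Sysmon's own test may differ in details such as retries. UDP is harder to check this way, because an unanswered datagram does not prove that nothing is listening, so the sketch covers TCP only.

```python
import socket


def tcp_test(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, or name lookup failed
        return False


# Usage: open a listener on an ephemeral local port and test it.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]
print(tcp_test("127.0.0.1", port))  # True
listener.close()
```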

Setting the Object Description

The object description, set with desc, is simply text you add to help identify the object in reports. Sysmon will run if you do not include this option, but because of the way the configuration file is parsed, if the desc option is not present, the description from the previous object will be used. This is probably not the desired behavior. If you explicitly want to set no description for a device, you can do it with:

 desc "";

Specifying Dependencies

An object's dependencies are set with the dep command. Every object must depend on at least one other object, with the exception of the root node, which has no dependencies. An object may depend on more than one other object by including multiple dep lines:

 object server1 {
     ip "server1.example.com";
     type ping;
     desc "Server 1";
     dep "router1-servernet";
     dep "router2-servernet";
     contact "admin@example.com";
 };

In this example, server1.example.com has a connection to two different routers, which are objects defined elsewhere in the config. If either router is functioning, there will be connectivity to the server. By listing two dependencies, you can direct Sysmon to ignore the server's status only if both routers are down.

Also notice that each dependency names an object for a specific router interface (router1-servernet rather than simply router1). It is good practice to define a separate object for each interface and depend on that object, instead of testing one of the router's addresses for every dependency. This way, if a particular interface goes down, only the services behind that interface are ignored, instead of the services behind the entire router.
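The suppression rule for multiple dependencies can be stated precisely: a failing object is reported unless every object it depends on is also down. The Python sketch below models that rule with a set of failing object names and a mapping from each object to its dependencies. It reuses the object names from the server1 example plus a root named server, and it illustrates the documented behavior rather than reproducing Sysmon's code; it looks only at outages, not at recovery notices.

```python
def should_notify(name, down, deps):
    """Decide whether an outage of `name` should be reported.

    Hypothetical model: `down` is the set of objects currently failing
    their tests, and `deps` maps each object to the objects it depends
    on. An object with no dependencies (the root) is always reported.
    """
    if name not in down:
        return False  # this model only looks at outages
    parents = deps.get(name, [])
    # Suppress only when every dependency is down; one working path is enough.
    return not parents or not all(parent in down for parent in parents)


deps = {
    "server": [],
    "router1-servernet": ["server"],
    "router2-servernet": ["server"],
    "server1": ["router1-servernet", "router2-servernet"],
}
print(should_notify("server1", {"server1", "router1-servernet"}, deps))  # True
print(should_notify(
    "server1", {"server1", "router1-servernet", "router2-servernet"}, deps
))  # False
```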

Setting the Contact

The contact command specifies the email address that should be notified if an object fails its tests. You can list multiple email addresses by separating them with commas:

 contact "admin@example.com,joe-pager@example.com";

Using the Spawn Option

Email is not always the best way to notify an administrator of a critical problem. To begin with, email is not guaranteed to be a timely service. Though unusual, it is possible for an email message to be queued for hours, or even days. Additionally, email notifications are useless if the mail system itself is unavailable. It is preferable to send critical notifications via some other mechanism, such as a direct message to a pager or cell phone.

Sysmon does not yet have support for sending pages by itself, but there is a hook that will let you do it. The spawn command will execute a program of your choosing when notification needs to be sent for an object. The argument to spawn is the name of the program to execute and the arguments that should be passed to that program. In those arguments, you must specify the format of the message to be sent. Sysmon has a number of "replacement" variables that will translate to different pieces of Sysmon information. For example, %H is replaced with the DNS name of the host being monitored, %s is the name of the service, and %U is the state of the service, either "up" or "down." So the spawn line might look like:

 spawn "/var/tmp/notify.sh %H %s %U";

Then you can create a simple program /var/tmp/notify.sh :

 #!/bin/sh
 # Mail the arguments (the expanded spawn text) to the administrator.
 echo "$*" | /usr/lib/sendmail admin@example.com 2> /dev/null

Of course, it would be silly for your notification script to send email like this since you could accomplish your goal just as well with the contact command. It is used here simply as an example. In your own environment, you would instead send the text to a program that would send a page, such as QuickPage. ^[1]

^[1] The QuickPage program is available from http://www.qpage.org/.

The result of the above spawn command, if a service goes down, would be text that looks like:

 WWW.EXAMPLE.COM www down

There are many more replacement variables available, and you can use them to create as detailed a message as you would like. They are all listed in Figure 7.4, which is taken from the Sysmon documentation.

Figure 7.4. Sysmon Replacement Variables. From Sysmon online documentation at `www.sysmon.org/config.html` .

Var	Replacement
`%m`	local host name
`%H`	DNS name of host being monitored
`%s`	service
`%p`	port number (numeric)
`%T`	Current Time hh:mm:ss
`%t`	Current Time mm dd hh:mm:ss
`%d`	Downtime dd:hh:mm
`%D`	Downtime with seconds dd:hh:mm:ss
`%i`	Unique ID for outage
`%I`	IP of host down
`%w`	warning/what
`%u`	error-type converted into string describing it
`%h`	hostname with failure
`%r`	reliability percentage
`%V`	Verbose History (not implemented)
`%c`	Failure iteration count (since last success)
`%C`	Success iteration count (since last failure)
`%U`	Service state (as 'up' or 'down')
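Conceptually, each replacement variable is a percent sign followed by one letter that is swapped for a value when a notification is generated. This Python sketch models that substitution; the mapping of letters to values is supplied by the caller here, and leaving unknown sequences untouched is an assumption about an edge case the table does not cover.

```python
import re


def expand(template, values):
    """Replace each %X in `template` with values["X"], if present."""
    return re.sub(
        r"%([A-Za-z])",
        lambda m: values.get(m.group(1), m.group(0)),  # unknown: keep as-is
        template,
    )


values = {"H": "WWW.EXAMPLE.COM", "s": "www", "U": "down"}
print(expand("%H %s %U", values))  # WWW.EXAMPLE.COM www down
```

With these values, the spawn arguments from the earlier example produce the same text as the sample notification.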

Other Object Options

A couple of other options can be used in an object definition. In particular, the contact_on option can direct Sysmon to send notifications only when a service goes up or when it goes down, instead of on both occasions, as is the default behavior. There is also an option called reverse that swaps the meaning of "up" and "down" for service status. This is useful if you use Sysmon to monitor another Sysmon server.
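The two options described above act at different points. In this Python sketch, reverse flips the raw test result before it becomes "up" or "down", and contact_on filters which status changes produce a notification. The values "up", "down", and "both" are hypothetical stand-ins; the exact syntax of contact_on is described in the Sysmon documentation.

```python
def effective_state(test_passed, reverse=False):
    """Map a raw test result to "up" or "down", honoring `reverse`."""
    up = test_passed != reverse  # reverse swaps the meaning of the result
    return "up" if up else "down"


def notify_on_change(old_state, new_state, contact_on="both"):
    """Hypothetical filter: `contact_on` is "up", "down", or "both"."""
    if old_state == new_state:
        return False  # no status change, nothing to send
    return contact_on == "both" or contact_on == new_state


# A reversed test that passes is treated as "down".
print(effective_state(True, reverse=True))         # down
print(notify_on_change("up", "down", contact_on="up"))  # False
```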

The use of these and other object options is listed in the documentation that comes with the Sysmon package and on the Sysmon Web page.

7.5.3 Global Options

Most of the global options in the Sysmon configuration start with the word config, followed by the option name. While global options can be listed anywhere in the configuration, they are usually placed at the beginning of the file, before any objects are defined.

The Status File

When the Sysmon daemon is running, it will periodically write a file with the status of services that it is monitoring. This can either be in HTML to produce a Web page or in a raw text format suitable for viewing from a terminal. By default, only services that are down are listed, but you can change this by using the showupalso option described below. Use the config statusfile option to direct Sysmon to write the file:

 config statusfile html "/usr/local/apache/htdocs/sysmon.html";

Or if you want a text file:

 config statusfile text "/var/tmp/status.txt";

The time interval that Sysmon waits between refreshing these files is 60 seconds by default, but you can change it with the html refresh option. For example:

 config html refresh 30;

would cause the file to be rewritten every 30 seconds.

Viewing Both Up and Down Services

The status file will print a list of services that are not responding, but if you would like it to also include those services that are responding, use the showupalso option:

 config showupalso;

In the HTML version of the status file, hosts that are up are printed in green, hosts that are down in red, and hosts that are failing tests and may soon go down in yellow. These colors can be changed with the upcolor, downcolor, and recentcolor options, respectively.
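The filtering and coloring rules for the HTML status file can be modeled as follows. This Python sketch produces table rows in a made-up format (Sysmon's real HTML layout differs), and it assumes that services in the "recent" state, which are failing tests but not yet declared down, are listed even without showupalso.

```python
import html

# Default colors named in the text; the upcolor, downcolor, and
# recentcolor options would change them.
DEFAULT_COLORS = {"up": "green", "down": "red", "recent": "yellow"}


def status_rows(services, show_up_also=False, colors=DEFAULT_COLORS):
    """Render hypothetical HTML rows for (name, state) pairs.

    `state` is "up", "down", or "recent" (failing tests, not yet down).
    Services that are up are skipped unless show_up_also is set.
    """
    rows = []
    for name, state in services:
        if state == "up" and not show_up_also:
            continue
        rows.append(
            f'<tr><td bgcolor="{colors[state]}">{html.escape(name)}</td>'
            f"<td>{state}</td></tr>"
        )
    return "\n".join(rows)


services = [("router1", "up"), ("web-server", "down"), ("mail", "recent")]
print(status_rows(services))
print(status_rows(services, show_up_also=True))
```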

Mail Header Options

Ordinarily, Sysmon sends mail messages with a from address of "root" at the server the software is running on. You can change this and other mail headers in the global configuration:

 config from "admin@example.com";
 config replyto "admin@example.com";
 config errorsto "errors@example.com";

You can also change the format of the subject line, using the same replacement variables as before:

 config subject "Sysmon: %H %s %U";

Or you can disable subject lines entirely with:

 config nosubject;

This can be advantageous if you use email to send messages to a phone or pager that does not handle subject lines gracefully.

Test Queuing Options

There are a few options available that control how Sysmon processes service tests and notifications. First, the numfailures option controls how many tests a service must fail before a notification message is sent. By default, a service must fail four tests in a row, but if you wanted notification after only three failures:

 config numfailures 3;
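The numfailures rule amounts to counting consecutive failures and notifying when the count reaches the threshold. The class below is a hypothetical Python model of that rule, using the default of four from the text; it ignores the notification sent on recovery and assumes that a single successful test resets the count.

```python
class FailureCounter:
    """Track consecutive failures for one service (hypothetical model).

    A notification is due on the test that brings the run of failures
    up to `numfailures`; later failures in the same outage do not
    trigger another one, and a success resets the count.
    """

    def __init__(self, numfailures=4):
        self.numfailures = numfailures
        self.failures = 0

    def record(self, passed):
        """Record one test result; return True if a notification is due."""
        if passed:
            self.failures = 0
            return False
        self.failures += 1
        return self.failures == self.numfailures


counter = FailureCounter(numfailures=3)
results = [False, False, True, False, False, False, False]
print([counter.record(r) for r in results])
# [False, False, False, False, False, True, False]
```

The success in the third test resets the count, so the notification comes only after the next three failures in a row.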

Normally, Sysmon schedules a test to run 60 seconds after the last time the same test completed. If you are monitoring a very large number of services, you can reduce the load on the server by increasing this interval. If you need tests to run more often, making Sysmon more sensitive, you can decrease it. The following would set the interval to 90 seconds:

 config queuetime 90;

Though it is tempting to lower the queue time in order to make Sysmon detect problems quickly, anything lower than 60 seconds is probably excessive.

Sysmon will happily run more than one test at a time; by default, it will run up to 100 tests simultaneously. You can raise or lower this value with the maxqueued option:

 config maxqueued 50;

A server that has limited resources and many tests to run may need to run fewer tests at once, while a fast server can benefit by taking advantage of the ability to run more tests at a time.
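The effect of maxqueued is a cap on how many tests are in flight at once. This Python sketch uses a thread pool purely to illustrate such a cap; it is a swapped-in technique, not a description of how sysmond schedules its tests internally. The demo records the peak number of tests running simultaneously to show that the limit holds.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_tests(tests, maxqueued=100):
    """Run zero-argument test callables, at most `maxqueued` at a time.

    Results come back in the same order as `tests`.
    """
    with ThreadPoolExecutor(max_workers=maxqueued) as pool:
        return list(pool.map(lambda test: test(), tests))


# Demo: each fake test records how many tests are running concurrently.
lock = threading.Lock()
active = 0
peak = 0


def slow_test():
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # stand-in for waiting on a network reply
    with lock:
        active -= 1
    return True


results = run_tests([slow_test] * 20, maxqueued=5)
print(all(results), peak <= 5)  # True True
```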

Usually Sysmon sends only one notification when a service changes status. If you would like it to send repeated messages while a service is down, use the pageinterval option. Unlike the other interval options, which take seconds, pageinterval is specified in minutes:

 config pageinterval 20;

This would send a reminder notification every 20 minutes when a service is down.
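Assuming reminders are spaced exactly pageinterval minutes apart, starting from the moment the service goes down, the schedule for one outage can be computed as below. This is a hypothetical Python model: times are in minutes, the initial notification itself is not listed, and no reminder is sent once the service is back up.

```python
def reminder_times(down_at, up_at, pageinterval):
    """Return the minutes at which reminders are sent during one outage.

    Hypothetical model: the service goes down at `down_at` and comes back
    at `up_at` (both in minutes), the initial notification at `down_at` is
    not listed, and reminders stop as soon as the service is up again.
    """
    if pageinterval <= 0:
        raise ValueError("pageinterval must be positive")
    times = []
    t = down_at + pageinterval
    while t < up_at:
        times.append(t)
        t += pageinterval
    return times


print(reminder_times(0, 65, 20))  # [20, 40, 60]
```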

DNS Options

As mentioned earlier, Sysmon keeps its own DNS cache separate from the one the server's operating system uses. By default, entries from the cache are expired every 15 minutes, but you can change this with the dnsexpire option, which takes an argument in seconds:

 config dnsexpire 300;

This would expire entries from the cache every 5 minutes.
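The idea behind dnsexpire can be sketched as a small cache with an age limit. This Python model expires each entry individually once it is older than dnsexpire seconds; whether Sysmon instead purges its whole cache on a timer is not specified in the text, so treat that choice as an assumption. The resolver defaults to `socket.gethostbyname` and the clock to `time.monotonic`; both can be replaced, as the usage does, to avoid real DNS traffic.

```python
import socket
import time


class DnsCache:
    """Tiny name cache whose entries expire after `dnsexpire` seconds."""

    def __init__(self, dnsexpire=900, resolver=socket.gethostbyname,
                 clock=time.monotonic):
        self.dnsexpire = dnsexpire
        self.resolver = resolver
        self.clock = clock
        self.entries = {}  # hostname -> (address, time stored)

    def lookup(self, hostname):
        now = self.clock()
        cached = self.entries.get(hostname)
        if cached is not None and now - cached[1] < self.dnsexpire:
            return cached[0]  # still fresh: no DNS query
        address = self.resolver(hostname)
        self.entries[hostname] = (address, now)
        return address


# Usage with a fake clock and resolver, so no real DNS traffic is needed.
now = [0.0]
queries = []


def fake_resolver(hostname):
    queries.append(hostname)
    return "192.0.2.80"


cache = DnsCache(dnsexpire=300, resolver=fake_resolver, clock=lambda: now[0])
cache.lookup("www.example.com")  # first lookup: queries the resolver
now[0] = 200.0
cache.lookup("www.example.com")  # 200 s old: answered from the cache
now[0] = 301.0
cache.lookup("www.example.com")  # older than 300 s: queried again
print(len(queries))  # 2
```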

Every 10 minutes, Sysmon also sends information about the cache to syslog. This interval can be changed with the dnslog option, which also takes seconds. For example, the following would log the cache information every 15 minutes:

 config dnslog 900;

Message Formatting Options

If you do not like the message format that Sysmon uses by default, you can change it by using the replacement variables described earlier, in conjunction with the pmesg global configuration option. For example:

 config pmesg "%H %s %U";

would produce the same simple text as the spawn example, but this time as the body of every mail notification. The default message format is:

 %H (%I) %w is %u %d

Using Variables

If you have even a moderately sized installation, you may have many objects configured with the same information again and again, and worse yet, some of that information may need to change over time. Say you have a couple of different groups of people who should receive notifications when different services go down: one group is responsible for a certain set of hardware, and another is responsible for a different set. You can list all the email addresses in every object, but then whenever an address needs to be added or removed, you will need to edit every object. Instead, you can use a global variable to store the information once and reference it throughout the rest of the file. When you wish to make a change, you will have to do it in only one place. The following example demonstrates the use of variables to store lists of contacts:

 set network-group = "netops@example.com, joe-pager@example.com";
 set network-group-nopage = "netops@example.com";
 set web-group = "frank@example.com, jill@example.com";
 object router5 {
     ip "router5-backbone.example.com";
     type ping;
     desc "Router 5 Backbone";
     dep "server";
     contact "$network-group";
 };
 object web-ping {
     ip "www.example.com";
     type ping;
     desc "Web Server Ping";
     dep "server";
     contact "$network-group-nopage";
 };
 object web-server {
     ip "www.example.com";
     type www;
     desc "Main Web Server";
     dep "web-ping";
     url "http://www.example.com";
     urltext "<TITLE>";
     contact "$web-group";
 };

The variables defined at the beginning are referenced later with a dollar sign in front of the variable name. Notice that quotation marks still need to be used, even when a variable is referenced.
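Variable substitution of this kind is essentially a textual replacement performed while the file is read. The Python sketch below models it for one-statement-per-line input: it records each set line, drops it from the output, and replaces every $name with the stored value. The regular expressions, the requirement that variables be set before use, and raising KeyError for an undefined name are assumptions of the sketch, not documented Sysmon behavior.

```python
import re

# Variable names may contain hyphens, as in the example above.
SET_RE = re.compile(r'^\s*set\s+([A-Za-z0-9_-]+)\s*=\s*"([^"]*)"\s*;\s*$')
REF_RE = re.compile(r"\$([A-Za-z0-9_-]+)")


def expand_variables(config_text):
    """Collect `set` lines and substitute $name references elsewhere."""
    variables = {}
    output = []
    for line in config_text.splitlines():
        match = SET_RE.match(line)
        if match:
            variables[match.group(1)] = match.group(2)
            continue  # `set` lines produce no output of their own
        # An undefined name raises KeyError from inside the substitution.
        output.append(REF_RE.sub(lambda m: variables[m.group(1)], line))
    return "\n".join(output)


config = '''set network-group = "netops@example.com, joe-pager@example.com";
set network-group-nopage = "netops@example.com";
object router5 {
    contact "$network-group";
};
object web-ping {
    contact "$network-group-nopage";
};'''
print(expand_variables(config))
```

Because the reference pattern is greedy, `$network-group-nopage` is matched as one name rather than as `$network-group` followed by `-nopage`.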

Using Includes

Another problem for large installations is that the configuration file can quickly become large and unwieldy. It will be easier to maintain if you break it down into smaller files by whatever grouping makes the most sense for you. Sysmon will let you do this with the include option:

 include "/usr/local/etc/sysmon.webservers.conf";

The named file will have its contents included wherever the statement is placed in the configuration.
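The include statement can be modeled as recursive text inlining. This Python sketch is hypothetical: it assumes one statement per line and double-quoted paths, follows nested includes, and refuses a file that includes itself; whether Sysmon permits nested includes at all is not covered here.

```python
import re
import tempfile
from pathlib import Path

# One include statement per line, with a double-quoted path (assumption).
INCLUDE_RE = re.compile(r'^\s*include\s+"([^"]+)"\s*;\s*$')


def inline_includes(path, _stack=()):
    """Return the text of `path` with each include line replaced by that file.

    A file that includes itself, directly or indirectly, raises ValueError.
    """
    path = Path(path).resolve()
    if path in _stack:
        raise ValueError(f"include loop at {path}")
    lines = []
    for line in path.read_text().splitlines():
        match = INCLUDE_RE.match(line)
        if match:
            lines.append(inline_includes(match.group(1), _stack + (path,)))
        else:
            lines.append(line)
    return "\n".join(lines)


# Usage: a main file that pulls in a separate file of Web server objects.
with tempfile.TemporaryDirectory() as tmp:
    web = Path(tmp, "sysmon.webservers.conf")
    web.write_text('object web-ping {\n    type ping;\n};\n')
    main = Path(tmp, "sysmon.conf")
    main.write_text(f'root="server";\ninclude "{web}";\n')
    print(inline_includes(main))
```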

There are a number of different ways to organize groups of configuration files. You may wish to have different files for different services: one for Web servers, one for ping tests, and so on. Or perhaps it would work better in your environment to organize into different files for different physical parts of the network.

Other Global Options

Other global configuration options available include an option to change the file to which the process ID is written, an option to turn off registration messages sent to the Sysmon registration server, and an option to change the facility to which the program sends syslog messages. These are all described in the documentation that comes with Sysmon.