Section 8.1. Internal Polling | Essential SNMP, Second Edition

8.1. Internal Polling

It may seem like a waste of bandwidth to poll a device just to find out that everything is OK. On a typical day, you may poll dozens of devices hundreds or thousands of times without discovering any failures or outages. Of course, that's really what you want to find out, and you'll probably conclude that SNMP has served its purpose the first time you discover a failed device and get the device back online before users have had a chance to start complaining. However, in the best of all possible worlds, you'd get the benefits of polling without the cost: that is, without devoting a significant chunk of your network's bandwidth to monitoring its health.

This is where internal polling comes in. As its name implies, internal polling is performed by an agent that is internal, or built in, to the device you want to manage. Since polling is internal to the device, it doesn't require traffic between the agent and your NMS. Furthermore, the agent doing the polling does not have to be an actual SNMP agent, which can allow you to monitor systems (either machines or software) that do not support SNMP. For example, some industrial-strength air-conditioning equipment vendors provide operational status information via a serial port. If the air-conditioning unit is attached to a terminal server or similar device, it becomes easy to use scripting languages to monitor the unit and generate traps if the temperature exceeds a certain threshold. This internal program can be written in your favorite scripting language, and it can check any status information to which you can get access. All you need is a way to get data from the script to the management station.
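As a rough illustration of the air-conditioning case, the check itself can be very small. The sketch below assumes a hypothetical status line of the form `UNIT1 TEMP=31.5 FAN=OK`; real equipment will use its own format, so the pattern and the function name `over_threshold` are placeholders, and sending the trap is left to whatever mechanism you use.

```python
import re
from typing import Optional

# Hypothetical status format: "UNIT1 TEMP=31.5 FAN=OK" (not from any real vendor).
TEMP_PATTERN = re.compile(r"TEMP=(-?\d+(?:\.\d+)?)")


def over_threshold(status_line: str, limit_c: float) -> Optional[float]:
    """Return the temperature if it exceeds limit_c, otherwise None.

    None is also returned when the line carries no temperature reading.
    """
    match = TEMP_PATTERN.search(status_line)
    if match is None:
        return None
    temperature = float(match.group(1))
    return temperature if temperature > limit_c else None


reading = over_threshold("UNIT1 TEMP=31.5 FAN=OK", limit_c=30.0)
if reading is not None:
    # This is the point at which a real script would send its trap.
    print(f"temperature {reading} exceeds limit")
```

Keeping the parsing separate from the notification step makes it easy to test the threshold logic without a device or an NMS attached.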

One strategy for writing a polling program is to use "hooks" within a program to extract information that can then be fed into an SNMP trap and sent to the NMS. We will cover traps more in Chapter 9. Another way to do internal polling is to use a program (e.g., sh, Perl, or C) that is run at set intervals. (On Unix, you would use cron to run a program at fixed intervals; there are similar services on other operating systems.) Hooks and cron-driven scripts both allow you to check internal variables and report errors as they are found. Here is a Perl script that checks for the existence of a file and sends a trap if the file is not found:

 #!/usr/local/bin/perl
 # Filename: /opt/local/perl_scripts/check4file.pl

 use SNMP_util "0.54";  # This will load the BER and SNMP_Session modules for us

 $FILENAME = "/etc/passwd";

 #
 # if the /etc/passwd file does not exist, send a trap!
 #
 if(!(-e $FILENAME)) {
     snmptrap("public\@nms:162", ".1.3.6.1.4.1.2789", "sunserver1", 6, 1547,
              ".1.3.6.1.4.1.2789.1547.1", "string",
              "File \:$FILENAME\: Could NOT Be Found");
 }

Here is what the Sun-style crontab looks like:

 $ crontab -l
 # Check for this file every 15 minutes and report trap if not found
 4,19,34,49 * * * * /opt/local/perl_scripts/check4file.pl

Notice that we poll four minutes after each quarter hour rather than on the quarter hour. The next poll we insert into the crontab file may run five minutes after the quarter hour (5,20,35,50). This practice prevents us from starting a huge number of programs at the same time. It's a particularly good idea to avoid polling on the hour; that's a popular time for random programs and cron jobs to start up. Consult the cron manpage if you are unfamiliar with its operation.
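The staggered minute lists are easy to generate rather than type by hand. The following sketch is only an illustration; the helper name `staggered_minutes` is an assumption, not part of cron or of any tool described here.

```python
def staggered_minutes(offset: int, period: int = 15) -> str:
    """Build a crontab minute field that fires every `period` minutes,
    shifted `offset` minutes past each period boundary."""
    if 60 % period != 0:
        raise ValueError("period must divide 60 evenly")
    if not 0 <= offset < period:
        raise ValueError("offset must be in the range 0..period-1")
    return ",".join(str(minute) for minute in range(offset, 60, period))


print(staggered_minutes(4))  # 4,19,34,49
print(staggered_minutes(5))  # 5,20,35,50
```

Giving each new polling job the next unused offset spreads the jobs across the quarter hour instead of starting them all at once.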

8.1.1. Remote Monitoring (RMON)

RMON is a supplement to the MIB-II group. This group, if supported by the device's SNMP agent, allows us to do both internal and external polling. We can poll devices through a remote NMS (external polling) or have the local RMON agent check itself periodically and report any errors (internal polling). The RMON agent will send traps when error conditions are found.

Many devices support RMON, making it an effective mechanism for internal polling. For example, Cisco supports the Events and Alarms RMON categories. You can configure the Alarms category to poll MIBs internally and react in different ways when a rising or falling threshold is crossed. Each threshold has the option of calling an internal event. Figure 8-1 shows the flow that these two RMON categories take.

Figure 8-1. RMON process flow

The distinction between alarms and events is important. Each alarm is tied to a specific event that defines what action to perform when the alarm goes off. Once a threshold is met, triggering an alarm, the alarm calls the event, which can perform additional functions, including sending traps to the NMS and writing a record in a log. Standard SNMP traps are preconfigured by the agent's vendor, which gives network managers no control over setting any kind of thresholds; however, RMON allows a network manager to set rising and falling thresholds. Figure 8-2 represents the interaction between a router's RMON agent and an NMS.

Figure 8-2. RMON and NMS interaction

In Figure 8-2, the Cisco router's SNMP agent forwards a trap to the NMS. Notice the direction of communication: RMON trap transmission is unidirectional. The NMS receives the trap from the Cisco router and decides what action to take, if any.

In addition to sending traps, we can also log events; if we so choose, we can even log the event without generating a trap. Logging can be particularly useful when you are initially configuring RMON alarms and events. If you make your alarm conditions too sensitive, you can clog your NMS with trigger-happy RMON events. Logging can help you fine-tune your RMON alarms before they are released into production.

8.1.1.1. RMON configuration

As a practical example of how to configure RMON, we will use Cisco's RMON implementation, starting with events. The following IOS command defines an RMON event:

 rmon event number [log] [trap community] [description string] [owner string]

If you're familiar with IOS, you should be expecting a corresponding no command that discards an RMON event:

 no rmon event number

The parameters to these IOS commands are:

number: Specifies the unique identification number for the event. This value must be greater than 0; a value of 0 is not allowed.
log: Tells the agent to log the entry when triggered. This argument is optional.
trap community: Specifies the trap community string, i.e., a community string to be included with the trap. Many network management programs can be configured to respond only to traps with a particular community string.
description string: Describes the event.
owner string: Ties the event or item to a particular person.

Here are two examples of how to create Cisco RMON events. The first line creates an event intended for a rising alarm; it is logged and sends a trap to the NMS. The second creates an event intended for a falling alarm, which might indicate that traffic has returned to an acceptable level (this event is logged but doesn't generate a trap):

 (config)#rmon event 1 log trap public description "High ifInOctets" owner dmauro
 (config)#rmon event 2 log description "Low ifInOctets" owner dmauro

You can also use logging to keep track of when the events were called. Though you can configure traps without logging, what happens if the line to your NMS goes down? Logging ensures that you don't lose information when the NMS is disabled. We suggest using both log and trap on all your events. You can view the logs of your RMON events by issuing the following command on the router:

 orarouter1# show rmon event
 Event 1 is active, owned by dmauro
  Description is High ifInOctets
  Event firing causes log and trap to community public, last fired 00:00:00
 Event 2 is active, owned by dmauro
  Description is Low ifInOctets
  Event firing causes log, last fired 00:00:00
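The difference between the two events (log and trap versus log only) can be modeled in a few lines. This is a simplified sketch of the behavior described above, not Cisco's implementation; the `Event` class and the `send_trap` callback are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Event:
    number: int
    description: str
    log: bool = False
    trap_community: Optional[str] = None  # None means no trap is sent
    log_entries: List[str] = field(default_factory=list)

    def fire(self, send_trap: Callable[[str, str], None]) -> None:
        """Record a log entry and/or send a trap, depending on configuration."""
        if self.log:
            self.log_entries.append(self.description)
        if self.trap_community is not None:
            send_trap(self.trap_community, self.description)


sent_traps = []  # stands in for the NMS in this sketch


def record_trap(community: str, message: str) -> None:
    sent_traps.append((community, message))


high = Event(1, "High ifInOctets", log=True, trap_community="public")
low = Event(2, "Low ifInOctets", log=True)

high.fire(record_trap)
low.fire(record_trap)

print(sent_traps)       # [('public', 'High ifInOctets')]
print(low.log_entries)  # ['Low ifInOctets']
```

Both events leave a log entry, but only event 1 reaches the NMS, which mirrors the "log and trap" versus "log" wording in the router output.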

The following Net-SNMP command walks the rmon event table, which displays the values we just set:

 $ snmpwalk -v1 -c public -m ALL orarouter1 .iso.org.dod.internet.mgmt.mib-2.rmon
 RMON-MIB::eventIndex.1 = INTEGER: 1
 RMON-MIB::eventIndex.2 = INTEGER: 2
 RMON-MIB::eventDescription.1 = STRING: High ifInOctets
 RMON-MIB::eventDescription.2 = STRING: Low ifInOctets
 RMON-MIB::eventType.1 = INTEGER: logandtrap(4)
 RMON-MIB::eventType.2 = INTEGER: log(2)
 RMON-MIB::eventCommunity.1 = STRING: "public"
 RMON-MIB::eventCommunity.2 = ""
 RMON-MIB::eventLastTimeSent.1 = Timeticks: (0) 0:00:00.00
 RMON-MIB::eventLastTimeSent.2 = Timeticks: (0) 0:00:00.00
 RMON-MIB::eventOwner.1 = STRING: "dmauro"
 RMON-MIB::eventOwner.2 = STRING: "dmauro"
 RMON-MIB::eventStatus.1 = INTEGER: valid(1)
 RMON-MIB::eventStatus.2 = INTEGER: valid(1)

Most of the information we set on the command line is available through SNMP. We see two events, with indexes 1 and 2. The first event has the description High ifInOctets; it is logged and a trap is generated; the community string for the event is public; the event's owner is dmauro; the event is valid, which essentially means that it is enabled; and we also see that the event has not yet occurred because the value of eventLastTimeSent is 0:00:00.00.^[*] Instead of using the command line to define these events, we could have used snmpset either to create new events or to modify events we already have. If you take this route, keep in mind that you must set the eventEntry.eventStatus to 1, for "valid," for the event to work properly.

^[*] Timeticks: (0) shows that no event occurred. This value is useful if you plan to write your own script to query the RMON objects on your router.
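If you do write your own script around these objects, one simple approach is to parse the text that snmpwalk prints. The sketch below assumes the output format shown above and uses a hypothetical helper, `events_never_fired`, to list the events whose eventLastTimeSent is still zero.

```python
import re
from typing import List

# Matches lines such as:
#   RMON-MIB::eventLastTimeSent.1 = Timeticks: (0) 0:00:00.00
LAST_SENT = re.compile(r"eventLastTimeSent\.(\d+) = Timeticks: \((\d+)\)")


def events_never_fired(walk_output: str) -> List[int]:
    """Return the indexes of events whose eventLastTimeSent is 0 timeticks."""
    return [
        int(index)
        for index, ticks in LAST_SENT.findall(walk_output)
        if int(ticks) == 0
    ]


WALK_OUTPUT = """\
RMON-MIB::eventLastTimeSent.1 = Timeticks: (0) 0:00:00.00
RMON-MIB::eventLastTimeSent.2 = Timeticks: (0) 0:00:00.00
"""

print(events_never_fired(WALK_OUTPUT))  # [1, 2]
```

Comparing the raw timeticks value in parentheses is more robust than comparing the formatted time string.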

You can poll the objects ifDescr and ifType in the mgmt.interfaces.ifEntry subtree to help you identify which instance number you should use for your devices. If you are using a device with multiple ports, you may need to search the ifType, ifAdminStatus, and ifOperStatus objects to help you identify what's what. In the next section, "External Polling," we will see that it is not necessary to keep track of these MIB variables (the external polling software takes care of this for us).
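Finding the right instance number from a walk of ifDescr can also be scripted. The interface names in this sketch are hypothetical sample data, and `find_instances` is an illustrative helper rather than part of any SNMP library; it assumes Net-SNMP's usual `IF-MIB::ifDescr.N = STRING: name` output format.

```python
import re
from typing import List

IF_DESCR = re.compile(r"ifDescr\.(\d+) = STRING: (.*)")


def find_instances(walk_output: str, needle: str) -> List[int]:
    """Return the instance numbers whose description contains `needle`
    (case-insensitive)."""
    return [
        int(instance)
        for instance, description in IF_DESCR.findall(walk_output)
        if needle.lower() in description.lower()
    ]


# Hypothetical walk of ifDescr on a small router.
SAMPLE_WALK = """\
IF-MIB::ifDescr.1 = STRING: Ethernet0
IF-MIB::ifDescr.2 = STRING: Serial0
IF-MIB::ifDescr.3 = STRING: Serial1
"""

print(find_instances(SAMPLE_WALK, "serial"))  # [2, 3]
```

The returned numbers are the instance suffixes you would append to an object such as ifInOctets when naming the variable to poll.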

Now that we have our events configured, let's start configuring alarms to do some internal polling. We need to know what we are going to poll, what type of data is returned, and how often we should poll. Assume that the router is our default gateway to the Internet. We want to poll the router's second interface, which is a serial interface. Therefore, we want to poll mgmt.interfaces.ifEntry.ifInOctets.2 to get the number of inbound octets on that interface, which is an INTEGER type.^[] (To be precise, the .2 at the end of the OID indicates the second entry in the ifEntry table. On our router, this denotes the second interface, which is the one we want to poll.) We want to be notified if the traffic on this interface exceeds 90,000 octets/second; we'll assume things are back to normal when the traffic falls back under 85,000 octets/second. This gives us the rising and falling thresholds for our alarm. Next, we need to figure out the interval at which we are going to poll this object. Let's start by polling every 60 seconds.

^[] From RFC 2819.

Now we need to put all this information into a Cisco RMON alarm command. Here is the command to create an alarm:

 rmon alarm number variable interval {delta | absolute}
     rising-threshold value [event-number]
     falling-threshold value [event-number]
     [owner string]

The following command discards the alarm:

 no rmon alarm number

The parameters to these commands are:

number: Specifies the unique identification number assigned to the alarm.
variable: Specifies which MIB object to monitor.
interval: Specifies the frequency (in seconds) at which the alarm monitors the MIB variable.
delta: Indicates that the threshold values given in the command should be interpreted in terms of the difference between successive readings.
absolute: Indicates that the threshold values given in the command should be interpreted as absolute values; i.e., the difference between the current value and preceding values is irrelevant.
rising-threshold value [event-number]: Specifies the value at which the alarm should be triggered, calling the event, when the value is rising. event-number is the event that should be called when the alarm occurs. The event number is optional because the threshold doesn't have to be assigned an event. If the event number is omitted, it is set to 0, which does nothing.
falling-threshold value [event-number]: Specifies the value at which the alarm should be triggered, calling the event, when the value is falling. event-number is the event that should be called when the alarm occurs. The event number is optional because the threshold doesn't have to be assigned an event. If the event number is omitted, it is set to 0, which does nothing.
owner string: Ties this alarm to a particular person.
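To make the delta and absolute modes and the two thresholds more concrete, here is a simplified model of how samples turn into rising and falling alarms. It is a sketch under stated assumptions, not the router's actual algorithm: the function name `run_alarm` is hypothetical, counter wrap is ignored, and it assumes that after one alarm fires, the same alarm does not fire again until the opposite threshold has been crossed.

```python
from typing import List, Sequence, Tuple


def run_alarm(readings: Sequence[int], mode: str,
              rising: int, falling: int) -> List[Tuple[int, str]]:
    """Return (reading index, "rising" | "falling") for each alarm raised.

    mode is "absolute" (compare each reading) or "delta" (compare the
    difference between successive readings).
    """
    if mode not in ("absolute", "delta"):
        raise ValueError("mode must be 'absolute' or 'delta'")

    alarms: List[Tuple[int, str]] = []
    armed = "both"      # which alarm is allowed to fire next
    previous = None     # last raw reading, used only in delta mode

    for index, raw in enumerate(readings):
        if mode == "delta":
            if previous is None:
                previous = raw
                continue  # the first reading only primes the delta
            value = raw - previous
            previous = raw
        else:
            value = raw

        if value >= rising and armed in ("both", "rising"):
            alarms.append((index, "rising"))
            armed = "falling"
        elif value <= falling and armed in ("both", "falling"):
            alarms.append((index, "falling"))
            armed = "rising"
    return alarms


samples = [87051, 91000, 92000, 86000, 84000, 95000]
print(run_alarm(samples, "absolute", rising=90000, falling=85000))
# [(1, 'rising'), (4, 'falling'), (5, 'rising')]
```

Note that the readings of 92000 and 86000 raise nothing: the first is still above the rising threshold after the alarm has already fired, and the second sits between the two thresholds. The gap between the thresholds is what keeps a value hovering near the limit from generating a stream of events.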

To configure the alarm settings we just described, enter the following command, in configuration mode, on a Cisco console:

 orarouter1(config)#rmon alarm 25 ifEntry.10.2 60 absolute \
     rising-threshold 90000 1 falling-threshold 85000 2 owner dmauro

This command configures alarm number 25, which monitors the object in ifEntry.10.2 (instance 2 of ifEntry.ifInOctets, or the input octets on interface 2) every 60 seconds. It has a rising threshold of 90,000 octets, which has event number 1 tied to it: event 1 is called when traffic on this interface exceeds 90,000 octets/second. The falling threshold is set to 85,000 octets and has event number 2 tied to it. Here's how the alarm looks in the router's internal tables:

 orarouter1#show rmon alarm
 Alarm 25 is active, owned by dmauro
  Monitors ifEntry.10.2 every 60 second(s)
  Taking absolute samples, last value was 87051
  Rising threshold is 90000, assigned to event 1
  Falling threshold is 85000, assigned to event 2
  On startup enable rising or falling alarm

The last line of output says that either a rising or a falling alarm may be generated when the alarm first becomes active, such as after a reboot. As you'd expect, you can also look at the alarm settings through the RMON MIB, beginning with the subtree 1.3.6.1.2.1.16. As with the events themselves, we can create, modify, and delete entries using snmpset.

One problem with internal polling is that getting trends and seeing the data in a graph or table is difficult. Even if you develop the backend systems to gather MIB objects and display them graphically, retrieving data is sometimes painful. The Multi Router Traffic Grapher (MRTG) is a great program that allows you to do both internal and external polling. Furthermore, it is designed to generate graphs of your data in HTML format. MRTG is covered in Chapter 12.