9.2. Receiving Traps

Let's start by discussing how to deal with incoming traps. Handling incoming traps is the responsibility of the NMS. Some NMSs do as little as display the incoming traps to standard output (stdout). However, an NMS server typically has the ability to react to SNMP traps it receives. For example, when an NMS receives a linkDown trap from a router, it might respond to the event by paging the contact person, displaying a pop-up message on a management console, or forwarding the event to another NMS. This procedure is streamlined in commercial packages but can still be achieved with freely available open source programs.

9.2.1. HP OpenView

OpenView uses three pieces of software to receive and interpret traps:

ovtrapd (1M)
xnmtrap
xnmevents

OpenView's main trap-handling daemon is called ovtrapd. This program listens for traps generated by devices on the network and hands them off to the Postmaster daemon (pmd). In turn, pmd triggers what OpenView calls an event. Events can be configured to perform actions such as sending a pop-up window to NNM users, forwarding the event to other NMSs, or doing nothing at all. The configuration process uses xnmtrap, the Event Configurations GUI. The xnmevents program displays the events that have arrived, sorting them into user-configurable categories.

OpenView keeps a history of all the traps it has received; to retrieve that history, use the command $OV_BIN/ovdumpevents. In older versions of OpenView, traps were kept in an event logging file, $OV_LOG/trapd.log. By default, this file rolls over after it grows to 4 MB; it is then renamed trapd.log.old and a new trapd.log file is started. If you are having problems with traps, either because you don't know whether they are reaching the NMS or because your NMS is being bombarded by too many events, you can use tail -f to watch trapd.log and see the traps as they arrive. (You can also use ovdumpevents to create a new file.) To learn more about the format of this file, refer to OpenView's manual pages for trapd.conf (4) and ovdumpevents (1M).

Recent releases of OpenView instead put traps into the OpenView Event Database. Many admins prefer the old logfile format, however. If you are running a recent release of OpenView and want to see a trapd.log file of all your traps, run ovdumpevents to create this file.

It might be helpful to define exactly what an OpenView event is. Think of it as a small record, similar to a database record. This record defines which trap OpenView should watch out for. It further defines what sort of action (send an email, page someone, etc.), if any, should be performed.
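To make the record analogy concrete, here is a minimal Python sketch of such a record and a lookup keyed by the trap's Event Object Identifier. The field names (`name`, `oid`, `category`, `severity`, `log_format`, `action`) are our own labels for the Event Configurator fields, not OpenView's internal format, and the single entry simply restates the OV_Node_Down settings described in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EventRecord:
    """One event definition: which trap to watch for and what to do with it.

    Field names are illustrative labels, not OpenView's internal format.
    """
    name: str                     # e.g. "OV_Node_Down"
    oid: str                      # Event Object Identifier the trap must match
    category: str                 # Event Category the event is filed under
    severity: str                 # e.g. "Warning"
    log_format: str               # Event Log Message text
    action: Optional[str] = None  # command for automatic action, if any


def build_table(records):
    """Index event records by OID so an incoming trap can be matched quickly."""
    return {r.oid: r for r in records}


def match(table, trap_oid):
    """Return the record for this trap OID, or None if no event is defined."""
    return table.get(trap_oid)


records = [
    EventRecord("OV_Node_Down", ".1.3.6.1.4.1.11.2.17.1.0.58916865",
                "Status Events", "Warning", "Node down"),
]
table = build_table(records)
```

A trap whose OID has no record corresponds to the "no format" traps discussed later in this section.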

9.2.2. Using NNM's Event Configurations

OpenView uses an internal definition file to determine how to react to particular situations. This definition file is maintained by the xnmtrap program. We can start xnmtrap by selecting Options → Event Configurations (in the NNM GUI) or by giving the command $OV_BIN/xnmtrap. In the Enterprise Identification window, scroll down and click on the enterprise name OpenView .1.3.6.1.4.1.11.2.17.1. This displays a list in the Event Identification window. Scroll down in this list until you reach OV_Node_Down. Double-click on this event to bring up the Event Configurator (Figure 9-1).

Figure 9-1. OpenView Event Configurator: OV_Node_Down

Figure 9-1 shows the OV_Node_Down event in the Event Configurator. When this event is triggered, it inserts an entry containing the message "Node down," with a severity level of Warning, into the Status Events category. OpenView puts a 0 (zero) in the Event Object Identifier, which indicates whether this is an event or trap; there is no way to change this value yourself. The number before the 0 is the enterprise OID; the number after the 0 is the specific trap number, in this case 58916865.^[*] Later we will use these numbers as parameters when generating our own traps.

^[*] This is the default number that OpenView uses for this OV_Node_Down trap.
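The relationship between these numbers can be expressed in a few lines. This Python sketch, assuming the enterprise.0.specific layout described above, composes an Event Object Identifier from its parts and splits one back apart; the function names are hypothetical helpers, not OpenView commands.

```python
def make_event_oid(enterprise: str, specific: int) -> str:
    """Join an enterprise OID and a specific trap number with the 0 separator."""
    return f"{enterprise.rstrip('.')}.0.{specific}"


def split_event_oid(event_oid: str):
    """Split enterprise.0.specific back into (enterprise, specific)."""
    arcs = event_oid.split(".")
    # The specific trap number is the last arc and the 0 is the one before it.
    if len(arcs) < 3 or arcs[-2] != "0":
        raise ValueError(f"not in enterprise.0.specific form: {event_oid}")
    return ".".join(arcs[:-2]), int(arcs[-1])


oid = make_event_oid(".1.3.6.1.4.1.11.2.17.1", 58916865)
```

Splitting on the second-to-last arc, rather than on the first ".0." found, keeps the sketch correct even if the enterprise OID itself contains a 0 arc.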

9.2.2.1. Selecting event sources

The Source option is useful when you want to receive traps from certain nodes and ignore traps from other nodes. For example, if you have a development router that people are taking up and down all day, you probably would rather not receive all the events generated by the router's activity. In this case, you could use the Source field to list all the nodes from which you would like to receive traps and leave out the development router. To do this, you can either type each hostname by hand and click Add after each one, or select each node (using the Ctrl and mouse-click sequence) on your OpenView Network Node Map and click Add From Map. Unfortunately, the resulting list isn't easy to manage. Even if you take the time to add all the current routers to the Event Sources, you'll eventually add a new router (or some other hardware you want to manage). You then have to go back to all your events and add your new devices as sources. OpenView allows you to use pattern matching and source files, making it easier to tailor and maintain the source list.
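OpenView's own pattern syntax isn't covered here, so the following Python sketch uses shell-style wildcards (`fnmatch`) as a stand-in to show why patterns beat a hand-maintained list: a new router whose name fits an existing pattern is accepted with no edits, while the development router stays excluded. The hostnames and patterns are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical source list: accept every router, but not the development one.
ACCEPT = ["rtr-*.example.com", "core-*"]
REJECT = ["rtr-dev*.example.com"]


def accept_source(hostname: str) -> bool:
    """Return True if traps from this host should become events."""
    name = hostname.lower()
    # Exclusions are checked first so a broad ACCEPT pattern can't override them.
    if any(fnmatch(name, p) for p in REJECT):
        return False
    return any(fnmatch(name, p) for p in ACCEPT)
```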

9.2.2.2. Setting event categories

When NNM receives an event, it sorts the event into an event category. The Categories drop-down box lets you assign the event you're configuring to a category. The list of available categories will probably include the following predefined categories (you can customize this list by adding categories specific to your network and deleting categories, as we'll see later in this section):

Error events
Threshold events
Status events
Configuration events
Application alert events
Don't log or display
Log only

The last two categories really aren't event categories in the true sense of the word. If you select "Don't log or display," OpenView will not save the event in its database and will not display the Event Log Message in any Event Categories. OpenView will display the Popup Notification in a pop-up window and run the Command for Automatic Action. The "Log only" option tells OpenView not to display the event but to keep a log of the event in its database.^[*]

^[*] As mentioned earlier, you can convert the database into a logfile using the ovdumpevents command.

"Log only" is useful if you have some events that are primarily informational; you don't want to see them when they arrive, but you would like to record them for future reference. The frame relay event frDLCIStatusChange (.1.3.6.1.2.1.10.32.0.1), which Cisco routers send, is a good example of such an event. It tells us when a virtual circuit has changed its operational state. If these events were displayed, a node going down would produce two notifications: one saying the node is down and another saying its circuit has changed its operational state to down. The second notification is redundant because the "node down" status event has already told us what we need to know.^[*] With this event set to "Log only," we can go to the logfile only when we think things are fishy.

^[*] OpenView has a feature called Event Correlation that groups certain events together to avoid flooding the user with redundant information. You can customize these settings with a developer's kit.

9.2.2.3. Forwarding events and event severities

The Forward Event radio button, once checked, allows you to forward an event to other NMSs. This feature is useful if you have multiple NMSs or a distributed network management architecture. Say that you are based in Atlanta, but your network has a management station in New York in addition to the one on your desk. You don't want to receive all of New York's events, but you would like the node_down information forwarded to you. On New York's NMS, you could click Forward Event and insert the IP address of your NMS in Atlanta. When New York receives a node_down event, it will forward the event to Atlanta.

The Severity drop-down list assigns a severity level to the event. OpenView supports six severity levels: Unknown, Normal, Warning, Minor, Major, and Critical. The severity levels are color-coded to make identification easier; Table 9-1 shows the color associated with each severity level. The levels are listed in order of increasing severity. For example, an event with a severity level of Minor has a higher precedence than an event with a severity level of Warning.

Table 9-1. OpenView severity levels
Severity	Color
Unknown	Blue
Normal	Green
Warning	Cyan
Minor	Yellow
Major	Orange
Critical	Red

The colors are used both on OpenView's maps and in the Event Categories. Parent objects, which represent the starting point for a network, are displayed in the color of the highest severity level associated with any object underneath them.^[*] For example, if an object represents a network with 250 nodes and one of those nodes is down (a Critical severity), the object will be colored red, regardless of how many nodes are up and functioning normally. The term for how OpenView displays colors in relation to objects is status source; it is explained in more detail in Chapter 5.

^[*] Parent objects can show status (colors) in four ways: Symbol, Object, Compound, or Propagated.
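The ordering in Table 9-1 and the "highest severity wins" rule for parent objects can be sketched in Python. This models only the rule described above, not OpenView's other status-source modes, and treating a parent with no children as Unknown is our own assumption.

```python
from enum import IntEnum


class Severity(IntEnum):
    """OpenView severity levels from Table 9-1, in increasing order."""
    UNKNOWN = 0
    NORMAL = 1
    WARNING = 2
    MINOR = 3
    MAJOR = 4
    CRITICAL = 5


COLOR = {
    Severity.UNKNOWN: "blue",
    Severity.NORMAL: "green",
    Severity.WARNING: "cyan",
    Severity.MINOR: "yellow",
    Severity.MAJOR: "orange",
    Severity.CRITICAL: "red",
}


def parent_color(child_severities):
    """Color a parent object by the most severe status among its children."""
    # An empty parent is shown as Unknown (an assumption for this sketch).
    worst = max(child_severities, default=Severity.UNKNOWN)
    return COLOR[worst]


# 249 healthy nodes and one that is down: the parent still shows red.
network = [Severity.NORMAL] * 249 + [Severity.CRITICAL]
```

Because `IntEnum` members compare as integers, `max()` picks the most severe child directly from the declaration order.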

9.2.2.4. Log messages, notifications, and automatic actions

Returning to Figure 9-1, the Event Log Message and Popup Notification fields are similar but serve different purposes. The Event Log Message is displayed when you view the Event Categories and select a category from the drop-down list. The Popup Notification, which is optional, displays its message in a window that appears on any server running OpenView's NNM. Figure 9-2 shows a typical pop-up message. The event name, delme in this case, appears in the titlebar. The time and date at which the event occurred are followed by the event message, "Popup Message Here." To create a pop-up message like this, insert "Popup Message Here" in the Popup Notification section of the Event Configurator. Every time the event is triggered, a pop-up will appear.

Figure 9-2. OpenView pop-up message

The last section of the Event Configurator is the Command for Automatic Action. The automatic action allows you to specify a Unix command or script to execute when OpenView receives an event. You can run multiple commands by separating them with a semicolon, much as you would in a Unix shell. When configuring an automatic action, remember that rsh can be very useful. We like to use rsh sunserver1 audioplay -v50 /opt/local/sounds/siren.au, which causes a siren audio file to play. The automatic action can range from touching a file to opening a trouble ticket.
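Because the automatic action is handed to a shell, it pays to be careful when data taken from a trap ends up on the command line: a trap carrying a string such as "x; rm ..." could otherwise inject its own command. The Python sketch below is our own illustration, not how OpenView executes actions. It runs a semicolon-separated command list through /bin/sh and shell-quotes every value before inserting it. The function name `run_action` and its `{name}` placeholder syntax are hypothetical and unrelated to OpenView's `$` variables.

```python
import shlex
import subprocess


def run_action(template: str, timeout: float = 30.0, **values: str) -> str:
    """Run an automatic-action command line through /bin/sh and return stdout.

    Values taken from the trap are shell-quoted before being inserted, so a
    hostile string in a trap cannot smuggle in extra commands.
    Placeholders use Python's {name} syntax (our convention, not OpenView's).
    """
    quoted = {k: shlex.quote(v) for k, v in values.items()}
    command = template.format(**quoted)
    result = subprocess.run(["/bin/sh", "-c", command],
                            capture_output=True, text=True, timeout=timeout)
    return result.stdout
```

The timeout keeps a hung pager or ticketing script from piling up processes during an event storm.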

In each Event Log Message, Popup Notification, and Command for Automatic Action, special variables can help you identify the values from your traps or events. These variables provide the user with additional information about the event. Here are some of the variables you can use (the online help has a complete list):

$1: Print the first passed attribute (i.e., the value of the first variable binding) from the trap.
$2: Print the second passed attribute.
$n: Print the nth attribute as a value string. Must be in the range of 1-99.
$*: Print all the attributes as [seq] name (type).
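The substitution rules listed above can be approximated in Python. This sketch is not OpenView's implementation: representing each variable binding as a `(name, type, value)` tuple and expanding out-of-range `$n` to an empty string are our assumptions. The sample bindings follow the order of the RMON risingAlarm trap (alarmIndex, alarmVariable, alarmSampleType, alarmValue, alarmRisingThreshold); the values are made up.

```python
import re
from typing import List, Tuple

# A variable binding as (name, type, value); this layout is our assumption.
Binding = Tuple[str, str, str]

# $1..$99, or $* for all attributes.
_VAR = re.compile(r"\$(\*|[1-9][0-9]?)")


def expand(template: str, bindings: List[Binding]) -> str:
    """Replace $1..$99 with binding values and $* with a summary of all bindings."""
    def sub(m):
        token = m.group(1)
        if token == "*":
            return " ".join(f"[{i}] {name} ({typ})"
                            for i, (name, typ, _value) in enumerate(bindings, 1))
        n = int(token)
        # Missing attributes expand to an empty string (our choice).
        return bindings[n - 1][2] if n <= len(bindings) else ""
    return _VAR.sub(sub, template)


rmon = [("alarmIndex", "Integer", "3"),
        ("alarmVariable", "ObjectID", ".1.3.6.1.2.1.2.2.1.10.1"),
        ("alarmSampleType", "Integer", "2"),
        ("alarmValue", "Integer", "91234"),
        ("alarmRisingThreshold", "Integer", "90000")]
```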

Before you start running scripts for an event, find out the average number of traps you are likely to receive for that event. This is especially true for OV_Node_Down. If you write a script that opens a trouble ticket whenever a node goes down, you could end up with hundreds of tickets by the end of the day. Monitoring your network will make you painfully aware of how much your network "flaps," or goes up and down. Even if the network goes down for a second, for whatever reason, you'll get a trap, which will in turn generate an event, which might register a new ticket, send you a page, etc. The last thing you want is "The Network That Cried Down!" You and other people on your staff will start ignoring all the false warnings and may miss any serious problems that arise. One way to estimate how frequently you will receive events is to log events in a file ("Log only"). After a week or so, inspect the logfile to see how many events accumulated (i.e., the number of traps received). This is by no means scientific, but it will give you an idea of what you can expect.
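Once a week or so of "Log only" data has accumulated, the arithmetic is simple. This Python sketch assumes you have already turned the log into `(timestamp, event name)` pairs (for example, from ovdumpevents output); that parsing step is not shown because it depends on your log format. It reports the daily average and the busiest day, which is usually the better guide to how noisy an automatic action would be.

```python
from collections import Counter
from datetime import datetime


def summarize(records, event, days_observed):
    """Return (average per day, busiest day, count on that day) for one event."""
    if days_observed <= 0:
        raise ValueError("days_observed must be positive")
    per_day = Counter(ts.date() for ts, name in records if name == event)
    if not per_day:
        return 0.0, None, 0
    day, peak = per_day.most_common(1)[0]
    return sum(per_day.values()) / days_observed, day, peak


# Illustrative records; "Printer_Jammed" is a hypothetical event name.
sample = [
    (datetime(2005, 5, 2, 9, 0), "OV_Node_Down"),
    (datetime(2005, 5, 2, 9, 5), "OV_Node_Down"),
    (datetime(2005, 5, 4, 13, 0), "OV_Node_Down"),
    (datetime(2005, 5, 4, 13, 1), "Printer_Jammed"),
]
```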

9.2.3. Custom Event Categories

OpenView uses the default categories for all its default events. Look through the $OV_CONF/C/trapd.conf file to see how the default events are assigned to categories. You can add categories by going to Event Configuration → Edit → Configure → Event Categories. Figure 9-3 shows this menu, with some custom categories added.

It's worth your while to spend time thinking about what categories are appropriate for your environment. If you plow everything into the default categories, you will be bothered by the Critical "Printer Needs Paper" event when you really want to be notified of the Critical "Production Server on Fire" event. Either event will turn Status Events red. The categories in Figure 9-3 are a good start, but think about the types of events and activities that will be useful for your network. The Scheduled and Unscheduled (S/U) Downtime category is a great example of a category that is more for human intervention than for reporting network errors. Printer Events is a nice destination for your "Printer Needs Paper" and "Printer Jammed" messages.

Figure 9-3. Adding event categories in OpenView

Even though none of the default categories is required (except for Error), we recommend that you don't delete them, precisely because they are used for all of the default events. Deleting the default categories without first reconfiguring all the default events will cause problems: any event that does not have an event category available will be put into the default Error category. To change which category an event belongs to, copy the trapd.conf file into /tmp and modify /tmp/trapd.conf with your favorite editor. The file has some large warnings telling you never to edit it by hand, but sometimes a few simple edits are the best way to reassign events. An entry in the portion of the file that defines event behavior looks like this:

  EVENT RMON_Rise_Alarm .1.3.6.1.2.1.16.0.1 "Threshold Events" Warning
  FORMAT RMON Rising Alarm: $2 exceeded threshold $5; value = $4. (Sample type = \
  $3; alarm index = $1)
  SDESC
  This event is sent when an RMON device exceeds a preconfigured threshold.
  EDESC

It's fairly obvious what these lines do: they map a particular RMON event into the Threshold Events category with a severity of Warning; they also specify the Event Log Message (the FORMAT line) and a description of the event (the text between SDESC and EDESC). To map this event into another category, change Threshold Events to the appropriate category. Once you've edited the file, use the following command to merge in your updates:

  $ $OV_BIN/xnmevents -l load /tmp/trapd.conf
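If you reassign events often, the category edit can be scripted instead of done in an editor. This Python sketch assumes only the EVENT line layout shown above (`EVENT name OID "category" severity`) and touches nothing else in the file; the function name and the "RMON Events" category are hypothetical. It works on lines without trailing newlines, so read the file with `splitlines()`, write the result back to /tmp/trapd.conf with `"\n".join(...)`, and then run the merge command.

```python
import re

# Matches: EVENT <name> <oid> "<category>" <severity ...>
_EVENT = re.compile(r'^(EVENT\s+(\S+)\s+\S+\s+)"([^"]*)"(\s+\S+.*)$')


def reassign(lines, event_name, new_category):
    """Return a copy of trapd.conf lines with one event moved to new_category."""
    out = []
    for line in lines:
        m = _EVENT.match(line)
        if m and m.group(2) == event_name:
            # Keep the name, OID, and severity; swap only the quoted category.
            line = f'{m.group(1)}"{new_category}"{m.group(4)}'
        out.append(line)
    return out
```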

9.2.4. The Event Categories Display

The Event Categories window (Figure 9-4) is displayed on the user's screen when NNM is started. It provides a very brief summary of what's happening on your network; if it is set up appropriately, you can tell at a glance whether there are any problems you should be worrying about.

If the window gets closed during an OpenView session, you can restart it using the Fault → Events menu item or by issuing the command $OV_BIN/xnmevents. The window displays all the event categories, including any categories you have created.

Figure 9-4. OpenView Event Categories

Two categories are special: the Error category is the default category used when an event is associated with a category that cannot be found; the All category is a placeholder for all events and cannot be configured by the Event Configurator. The window shows you the highest severity level of any event in each event category.

The box to the left of Status Events is cyan (a light blue), showing that the highest unacknowledged severity in the Status Events category is Warning. Clicking on that box displays an alarm browser that lists all the events received in the category. A nice feature of the Event Categories display is the ability to restore a browser's state or reload events from the trapd.log and trapd.log.old files. Reloading events is useful if you find that you need to restore messages you deleted in the past.

OpenView extends the abilities of Event Categories by keeping a common database of acknowledged and unacknowledged events. Thus, when a user acknowledges an event, all other users see this event updated.

At the bottom of Figure 9-4, the phrase "[Read-Only]" means that you don't have write access to Event Categories. If this phrase isn't present, you have write access. OpenView keeps track of events in a single database, though older versions stored events on a per-user basis, using a special database located in $OV_LOG/xnmevents.<username>. With write access, you can update this file whenever you exit. By default, you have write access to your own event category database, unless someone has already opened an Event Categories session under your username. Only one Event Categories session per user can have write access: the first session gets write access, and all others are read-only.

9.2.5. The Alarms Browser

Figure 9-5 shows the alarms browser for the Status Events category. In it we see a single Warning event, which is causing the Status Events category to show cyan.

Figure 9-5. OpenView alarms browser

The color of the Status Events box is determined by the highest precedence event in the category. Therefore, the color won't change until either you acknowledge the highest precedence event or an event arrives with an even higher precedence. Clicking in the far-left column (Ack)^[*] acknowledges the message and sets the severity to 0.

^[*] OpenView also supports Event Correlation, which has a column in this window as well.

The Actions menu in the alarms browser allows you to acknowledge, deacknowledge, or delete some or all events. You can even change the severity of an event. Keep in mind that this does not change the severity of the event on other Event Categories sessions that are running. For example, if one user changes the severity of an event from Critical to Normal, the event will remain Critical for other users. The View menu lets you define filters, which allow you to include or discard messages that match the filter.

When configuring events, keep in mind that you may receive more traps than you want. When this happens, you have two choices. First, you can go to the agent and turn off trap generation, if the agent supports this. Second, you can configure your trap view to ignore these traps. We saw how to do this earlier: you can set the event to "Log only" or try excluding the device from the Event Sources list. If bandwidth is a concern, you should investigate why the agent is sending out so many traps before trying to mask the problem.

9.2.6. Creating Events Within OpenView

OpenView gives you the option of creating additional (private) events. Private events are just like regular events, except that they belong to your private enterprise subtree rather than to a public MIB. To create your own events, launch the Event Configuration window from the Options menu of NNM. You will see a list of all currently loaded events (Figure 9-6).

Figure 9-6. OpenView's Event Configuration window

The window is divided into two panes. The top pane displays the Enterprise Identification, which is the leftmost part of an OID. Clicking on an enterprise ID displays all the events belonging to that enterprise in the lower pane. To add your own enterprise ID, select Edit → Add → Enterprise Identification and insert your enterprise name and a registered enterprise ID. Then select Edit → Add → Event, and type the Event Name for your new event, making sure to use Enterprise Specific (the default) for the event type. Insert an Event Object Identifier. This identifier can be any number that hasn't already been assigned to an event in the currently selected enterprise. Finally, click OK and save the event configuration (using File → Save).

To copy an existing event, click on the event you wish to copy and select Edit → Copy → Event; you'll see a new window with the event you selected. From this point on, the process is the same.

Traps with "no format" are traps for which nothing has been defined in the Event Configuration window. There are two ways to solve this problem: you can either create the necessary events on your own, or load a MIB that contains the necessary trap definitions, as discussed in Chapter 5. "No format" traps are frequently traps defined in a vendor-specific MIB that hasn't been loaded. Loading the appropriate MIB often fixes the problem by defining the vendor's traps and their associated names, IDs, comments, severity levels, etc.

Before loading a MIB, review the types of traps the MIB supports. You will find that most traps you load come, by default, in LOGONLY mode. This means that you will not be notified when the traps come in. After you load the MIB, you may want to edit the events it defines, specifying the local configuration that best fits your site.

9.2.7. Monitoring Traps with Perl

If you can't afford an expensive package like OpenView, you can use the Perl language to write your own monitoring and logging utility. You get what you pay for: you will have to write almost everything from scratch. On the other hand, you'll learn a lot and probably have a better appreciation for the finer points of network management. One of the most elementary, but effective, programs to receive traps is in a distribution of SNMP Support for Perl 5, written by Simon Leinen. Here's a modified version of Simon's program:
  #!/usr/bin/perl
  # Receive SNMPv1 traps and print them as they arrive.
  # Binding to the trap port (UDP 162) usually requires root privileges.
  use SNMP_Session;
  use BER;
  use Socket;

  $| = 1;    # unbuffered STDOUT, so a program reading our output sees traps at once

  $session = SNMPv1_Session->open_trap_session ();
  while (($trap, $sender, $sender_port) = $session->receive_trap ()) {
      # Backticks run date(1) and capture its output as the timestamp.
      chomp ($DATE = `/bin/date '+%a %b %e %T'`);
      print "\n$DATE - " . inet_ntoa($sender)
          . " - port: $sender_port\n";
      print_trap ($session, $trap);
  }

  sub print_trap {
      ($this, $trap) = @_;
      my ($community, $ent, $agent, $gen, $spec, $dt, $bindings)
          = $this->decode_trap_request ($trap);
      print "   Community:\t".$community."\n";
      print "   Enterprise:\t".BER::pretty_oid ($ent)."\n";
      print "   Agent addr:\t".inet_ntoa ($agent)."\n";
      print "   Generic ID:\t$gen\n";
      print "   Specific ID:\t$spec\n";
      print "   Uptime:\t".BER::pretty_uptime_value ($dt)."\n";
      $prefix = "   bindings:\t";
      my ($binding, $oid, $value);
      # Each variable binding is a SEQUENCE of an OID (%O) and any value (%@).
      while ($bindings ne '') {
          ($binding, $bindings) = &decode_sequence ($bindings);
          ($oid, $value) = decode_by_template ($binding, "%O%@");
          print $prefix.BER::pretty_oid ($oid).
                " => ".pretty_print ($value)."\n";
          $prefix = "\t\t";
      }
  }
This program displays traps as they are received from different devices in the network. Here's some output, showing two traps:
  Mon Apr 28 22:07:44 - 10.123.46.26 - port: 63968
     community:   public
     enterprise:  1.3.6.1.4.1.2789.2500
     agent addr:  10.123.46.26
     generic ID:  6
     specific ID: 5247
     uptime:      0:00:00
     bindings:    1.3.6.1.4.1.2789.2500.1234 => 14264026886

  Mon Apr 28 22:09:46 - 172.16.51.25 - port: 63970
     community:   public
     enterprise:  1.3.6.1.4.1.2789.2500
     agent addr:  172.16.253.2
     generic ID:  6
     specific ID: 5247
     uptime:      0:00:00
     bindings:    1.3.6.1.4.1.2789.2500.2468 => Hot Swap Now In Sync
The output format is the same for both traps. The first line shows the date and time at which the trap was received, together with the IP address and source port of the device that sent the trap. Most of the remaining output items should be familiar to you. The bindings output item lists the variable bindings that were sent in the trap PDU. In the preceding example, each trap contained one variable binding. The object ID is in numeric form, which isn't particularly friendly. If a trap has more than one variable binding, this program displays each binding, one after another.

An ad hoc monitoring system can be fashioned by using this Perl script to collect traps and some other program to inspect the traps as they are received. Once the traps are parsed, the possibilities are endless. You can write user-defined rules that watch for significant traps and, when triggered, send an email alert, update an event database, send a message to a pager, etc. These kinds of solutions work well if you're in a business with little or no budget for commercially available NMS software or if you're on a small network and don't need a heavyweight management tool.
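The "user-defined rules" idea can be sketched as a list of condition/action pairs. In this Python sketch the trap is assumed to have already been parsed into a dictionary (the keys `agent`, `specific`, and `value` are our own choice), and the pager is simulated by appending to a list; a real deployment would call your mail or paging tool instead. The sample field values are taken from the trap output shown in this section.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Trap = Dict[str, str]  # parsed trap fields; key names are our assumption


@dataclass
class Rule:
    name: str
    matches: Callable[[Trap], bool]
    action: Callable[[Trap], None]


def dispatch(trap: Trap, rules: List[Rule]) -> List[str]:
    """Run every rule whose condition matches; return the names that fired."""
    fired = []
    for rule in rules:
        if rule.matches(trap):
            rule.action(trap)
            fired.append(rule.name)
    return fired


alerts: List[str] = []  # stand-in for email or pager delivery
rules = [
    Rule("failover",
         lambda t: "Fail Over" in t.get("value", ""),
         lambda t: alerts.append(f"page on-call: {t['agent']} {t['value']}")),
    Rule("log-all", lambda t: True, lambda t: None),
]
```

Running every matching rule, rather than stopping at the first, lets a catch-all logging rule coexist with targeted alerts.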

9.2.8. Using the Network Computing Technologies Trap Receiver

The Trap Receiver by Network Computing Technologies is a freely available program that's worth trying.^[*] This program, which currently runs only on Windows-based systems, displays trap information as it's received. It has a standard interface but can be configured to execute certain actions against traps, like OpenView's Command for Automatic Action function. Figure 9-7 shows Trap Receiver's user interface.
^[*] This software can be found at http://www.ncomtech.com/download.htm.

Figure 9-7. Trap Receiver

You can log and forward messages and traps, send email or a page in response to a trap, as well as execute commands. By writing some code in C or C++, you can gain access to an internal trap stream. This program can be a great starting place for Windows administrators who want to use SNMP but lack the resources to implement something like OpenView. It's simple to use, extensible, and free.

9.2.9. Receiving Traps Using Net-SNMP

The last trap receiver we'll discuss is part of the Net-SNMP package, which is also freely available. snmptrapd allows you to send SNMP trap messages to facilities such as Unix syslog or stdout. For most applications the program works in the background, shipping messages to syslog(8). There are some configuration parameters for the syslog side of snmptrapd; these tell snmptrapd what facility level it should use for the syslog messages. The following command forwards traps to standard output (-Lo) rather than to syslog and does not fork off into the background (-f):
  $ ./snmptrapd -f -Lo
  2005-05-05 08:00:24 NET-SNMP version 5.2.1 Started.
  2005-05-05 08:03:05 sunserver2.ora.com [12.1.45.26] (via UDP: [12.1.45.26]:37223) TRAP, SNMP v1, community public
          SNMPv2-SMI::enterprises.2789.2500.1224 Enterprise Specific Trap (1224) Uptime: 60 days, 14:41:38.72
          SNMPv2-SMI::enterprises.2789.2500.1224 = INTEGER:
  2005-05-05 08:10:16 sunserver2.ora.com [12.1.45.26] (via UDP: [12.1.45.26]:37223) TRAP, SNMP v1, community public
          SNMPv2-SMI::enterprises.2789.2500.1445 Enterprise Specific Trap (1445) Uptime: 60 days, 14:41:38.72
          SNMPv2-SMI::enterprises.2789.2500.1445 = STRING: "Fail Over Complete"
By now the output should look familiar; it's similar to the reports generated by the other programs we've seen in this chapter. The Net-SNMP trap daemon is another great tool for scriptwriters. A simple Perl script can watch the file in which snmptrapd logs its traps, looking for important events and reacting accordingly. It's easy to build a powerful and flexible monitoring system at little or no expense.
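Here is what such a watcher might look like, sketched in Python rather than Perl. The regular expressions are assumptions fitted to the sample output above, not a documented snmptrapd format, so adjust them if you change snmptrapd's output options. `follow()` mimics `tail -f`; the log path you pass it is up to your configuration.

```python
import re
import time
from typing import Iterable, Iterator, List

# Fitted to the sample output: "<date> <time> <host> [<ip>] ..." starts a trap.
_HEADER = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (\S+) \[([\d.]+)\]")
_STRING = re.compile(r'= STRING: "([^"]*)"')


def group_traps(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group snmptrapd output into one list of lines per trap (header first)."""
    current: List[str] = []
    for line in lines:
        if _HEADER.match(line):
            if current:
                yield current
            current = [line]
        elif current:
            current.append(line)
    if current:
        yield current


def important(traps, pattern):
    """Yield (timestamp, host, message) for traps whose STRING value matches."""
    wanted = re.compile(pattern)
    for trap in traps:
        ts, host, _addr = _HEADER.match(trap[0]).groups()
        for line in trap[1:]:
            m = _STRING.search(line)
            if m and wanted.search(m.group(1)):
                yield ts, host, m.group(1)


def follow(path, poll=1.0):
    """Yield lines appended to a file, like tail -f (runs until interrupted)."""
    with open(path) as f:
        f.seek(0, 2)  # start at the current end of the file
        while True:
            line = f.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(poll)
```

Chained as `important(group_traps(follow(path)), "Fail Over")`, a trap is reported only once the next trap's header arrives, because that is when `group_traps` knows the previous trap is complete. For low-volume networks you may prefer to match each line as it arrives instead.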

The Net-SNMP trap daemon can also handle SNMPv2/SNMPv3 traps and informs. Recall that inform was introduced in SNMPv2. It allows a sender to receive an acknowledgment when the receiver gets the trap. Configuring snmptrapd to receive SNMPv3 traps and informs takes some care: you must add a createUser command to snmptrapd's configuration file, snmptrapd.conf. For example, to receive informs, I have the following in my snmptrapd.conf file:
 createUser kschmidt MD5 mysecretpass DES mypassphrase 
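Informs need no further configuration, because the receiver of an inform is the authoritative SNMP engine: the sender discovers snmptrapd's engine ID automatically. Traps work the other way around; the sender is authoritative, so snmptrapd must be told the sender's engine ID. To understand the role of the engine ID, let's look at a createUser entry that can be used to handle SNMPv3 traps: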
 createUser -e 0x012345 kschmidt MD5 mysecretpass DES mypassphrase 
The difference is the inclusion of the -e switch. This configures the engine ID of the remote machine that will send us traps. In other words, we need to know it ahead of time. Refer to RFC 3411 for a specific algorithm for creating the engine ID. The next section shows how to send SNMPv3 traps from Net-SNMP.
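As one illustration of the RFC 3411 layout, this Python sketch builds an engine ID in the IPv4-address format: four octets holding the IANA enterprise number with the most significant bit set, a format octet of 1 (IPv4), and the four address octets. The example uses 8072, Net-SNMP's enterprise number, and the sender address from the earlier snmptrapd output. Whether a particular agent actually uses an ID built this way depends on its configuration, so confirm the sender's real engine ID before copying it into a createUser line.

```python
import ipaddress


def engine_id_ipv4(enterprise: int, address: str) -> str:
    """Build an RFC 3411 SnmpEngineID in the IPv4-address format.

    Octets 1-4: IANA enterprise number with the most significant bit set.
    Octet 5:    format indicator (1 = IPv4 address).
    Octets 6-9: the IPv4 address.
    """
    if not 0 <= enterprise < 2**31:
        raise ValueError("enterprise number must fit in 31 bits")
    prefix = (0x80000000 | enterprise).to_bytes(4, "big")
    raw = prefix + bytes([1]) + ipaddress.IPv4Address(address).packed
    return "0x" + raw.hex()


engine_id = engine_id_ipv4(8072, "12.1.45.26")
```

The result is written in the same 0x-prefixed hexadecimal form that the -e switch takes above.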

In this section, we have looked at several packages that can receive traps and act on them, based on the traps' content. Keep in mind that all of these programs, whether they're free or cost tens of thousands of dollars, are basically doing the same thing: listening on some port (usually UDP port 162) and waiting for SNMP messages to arrive. What sets the various packages apart is their ability to do something constructive with the traps. Some let you program hooks that execute some other program when a certain trap is received. The simpler trap monitors just send a message logging the trap to one or more files or facilities. These packages are generally less expensive than the commercial trap monitors but can be made to operate like full-fledged systems with some additional programming effort. Languages such as Perl give you the ability to extend these simpler packages.
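To make that common core concrete, here is a minimal Python sketch of it: a UDP socket bound to port 162 and a tiny BER reader that peeks at each message's version, community, and PDU type without decoding the variable bindings. It is our own illustration, not part of any package discussed above. The tag values come from the SNMP message definitions (SEQUENCE 0x30, SNMPv1 Trap-PDU 0xA4, InformRequest 0xA6, SNMPv2-Trap 0xA7). A real receiver would also have to acknowledge informs, which this sketch does not do.

```python
import socket

PDU_NAMES = {0xA4: "Trap-PDU (SNMPv1)", 0xA6: "InformRequest",
             0xA7: "SNMPv2-Trap"}
VERSIONS = {0: "SNMPv1", 1: "SNMPv2c", 3: "SNMPv3"}


def _tlv(buf: bytes, pos: int):
    """Decode one BER tag and length; return (tag, value_start, value_length)."""
    tag = buf[pos]
    first = buf[pos + 1]
    pos += 2
    if first < 0x80:               # short form: length fits in one octet
        length = first
    else:                          # long form: next (first & 0x7F) octets
        n = first & 0x7F
        length = int.from_bytes(buf[pos:pos + n], "big")
        pos += n
    return tag, pos, length


def classify(datagram: bytes) -> dict:
    """Peek at an SNMP message header without decoding the whole PDU."""
    tag, pos, _ = _tlv(datagram, 0)
    if tag != 0x30:
        raise ValueError("not an SNMP message (expected SEQUENCE)")
    tag, pos, length = _tlv(datagram, pos)
    version = int.from_bytes(datagram[pos:pos + length], "big")
    info = {"version": VERSIONS.get(version, f"unknown ({version})")}
    if version in (0, 1):          # community-based messages only
        tag, pos, length = _tlv(datagram, pos + length)
        info["community"] = datagram[pos:pos + length].decode("latin-1")
        pdu_tag = datagram[pos + length]
        info["pdu"] = PDU_NAMES.get(pdu_tag, hex(pdu_tag))
    return info


def listen(port: int = 162, host: str = "0.0.0.0"):
    """Print a one-line summary of each datagram (port 162 usually needs root)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, (addr, sport) = sock.recvfrom(65535)
            try:
                print(addr, sport, classify(data))
            except (ValueError, IndexError):
                print(addr, sport, "unparseable datagram")
```

Everything the larger packages add, such as decoding bindings, MIB lookups, categories, and actions, sits on top of a loop like `listen()`.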