Using Built-in Events | Performance and Fault Management: A Practical Guide to Effectively Managing Cisco Network Devices (Cisco Press Core Series)

Most network devices and host computers have a variety of triggers already set or programmed into them to generate events. Before you start configuring events, you should understand what is already set up on the devices you are interested in.

Built-in events are reported using a variety of techniques, including the following:

Devices write messages to their console. Some of these messages may be the result of thresholds being crossed, for example, when the temperature of the device exceeds a set point.
Many systems log messages to files. Windows NT has an event log that records system events. Many UNIX systems use the syslog protocol to log system events. Network devices such as routers and switches can record system events on a syslog server. Applications may also log messages to files. Some of these messages are the result of applications finding that a threshold has been crossed, such as a database management system noticing that a database partition is within 90 percent of capacity.
Devices supporting SNMP generate notifications (traps or informs) when any of a variety of conditions occurs. These notifications are defined as part of the MIBs supported by these devices.

All of these events or messages are generated from built-in triggers on these devices. Some of these events will be useful; others won't. Some triggers can be disabled to prevent them from generating undesired events. Others can't be disabled, so even if their events are not useful, you'll still need to process and discard them. One of the most important functions of an event management system is to filter events and generate faults only when an action needs to be taken. Suppressing useless events at the source makes the work of an event management system that much easier.

For example, devices supporting SNMP generate link up and link down notifications whenever the link status of an interface changes. It would seem that you want to know if a link in your network goes down. However, do you really want to know when someone reboots their PC? On the other hand, if the link to a large office in New York goes down, you'll want to be notified as soon as possible. Here, you have the same trigger and the same object type on different interfaces, but vastly different priorities.
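The distinction between the PC port and the New York link can be sketched as a small lookup. This is a minimal illustration, not a feature of any particular event management product: the device names, interface names, and priority labels are hypothetical, and a real system would likely load the list of "interesting" interfaces from an inventory database.

```python
from dataclasses import dataclass

# Hypothetical inventory of "interesting" interfaces and their priorities.
INTERESTING_LINKS = {
    ("nyc-wan-rtr1", "Serial0/0"): "critical",
    ("hq-core-sw1", "GigabitEthernet1/1"): "major",
}


@dataclass
class LinkEvent:
    device: str
    interface: str
    state: str  # "up" or "down"


def triage(event: LinkEvent) -> str:
    """Return a priority for a link event, or "discard" for uninteresting links."""
    priority = INTERESTING_LINKS.get((event.device, event.interface))
    if priority is None:
        # Same trigger and object type, but nobody needs to act on it.
        return "discard"
    # Assumption: a link coming back up is informational only.
    return priority if event.state == "down" else "info"
```

With this table, a link down event on the New York WAN interface is rated critical, while the same event on an unlisted access port is discarded.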

The next three sections cover each of the built-in event types in detail.

Console Messages

Many devices print messages to a console port. Sometimes, these messages are extremely important to capture and process. For example, a device that crashes may be able to write a key message explaining the nature of the crash or the problem leading up to the crash to the console port, but not be able to send the message anywhere else. Several techniques allow you to capture these messages and process them.

You can configure Cisco routers and switches to send the majority of their console messages as syslog messages, which makes it less essential to take special action to capture the console messages. However, even these devices produce some console messages that are not available any other way and that can be highly beneficial in troubleshooting. For example, messages printed just before a crash can help diagnose its nature and cause. A device in the midst of crashing is unlikely to be able to generate a syslog message and send it out. Also, devices that are failing to boot usually are able to print messages only to the console.

Usually, the devices whose console messages need to be monitored are clustered in computer rooms and equipment closets. A recommended way of logging these messages for processing is to connect the console ports of these devices to terminal servers. Then, one or more centralized systems can run a process that keeps Telnet sessions open to these devices and logs the output. This output can be processed as explained in Chapter 7's section "Collecting and Normalizing Log Files."
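The core of such a logging process can be sketched as follows. Console output arrives over the session in arbitrary chunks, so the logger has to reassemble complete lines before tagging each one with a timestamp and the device name. The class name, record layout, and device name below are assumptions made for illustration; opening the session to the terminal server and handling Telnet option negotiation are left out.

```python
from datetime import datetime, timezone
from typing import List, Optional


class ConsoleLineLogger:
    """Reassemble complete lines from raw console output chunks."""

    def __init__(self, device: str):
        self.device = device
        self._buffer = b""

    def feed(self, chunk: bytes, now: Optional[datetime] = None) -> List[str]:
        """Add a chunk of bytes and return any completed lines as log records."""
        now = now or datetime.now(timezone.utc)
        self._buffer += chunk
        records = []
        while b"\n" in self._buffer:
            line, self._buffer = self._buffer.split(b"\n", 1)
            text = line.rstrip(b"\r").decode("ascii", errors="replace")
            if text:  # skip blank lines
                records.append(f"{now.isoformat()} {self.device} {text}")
        return records
```

In use, each block of bytes read from the session would be passed to `feed`, and the returned records appended to a per-device log file. A partial line stays in the buffer until its newline arrives.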

An additional benefit of connecting your network devices to terminal servers is that it is fairly easy to get to the console port of any device from any other point in the network. This can greatly speed up troubleshooting and problem resolution.

However, terminal servers normally allow only one session to a given port. So, you will need some method that allows you to log messages from the console port and still gain access for troubleshooting. One way is to terminate the logging Telnet session attached to the terminal server port for a given console before you start an interactive Telnet session. A much more elegant way is to enhance the process managing the Telnet sessions so that you can remotely tap into a session, with two-way communication and logging of both sides of the conversation. This approach has the added benefit of giving you a log of changes made through the console port on all your network devices.

System Event Logs, Syslog Messages, and Applications Logs

Cisco IOS devices can be configured, starting in IOS version 11.2, to send syslog messages above a configurable level as SNMP notifications to a management station.

Scripts can be written (and some probably already exist) to process UNIX syslog messages and forward them to an event management system. The same holds true for the Windows NT event log and applications logs. Although processing these logs falls more in the domain of systems management, the borders between systems management and network management continue to blur. The network manager increasingly needs information about the health of systems and applications to be able to answer the age-old question: "The network is down, can you fix it?"
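As a rough sketch of such a script, the fragment below decides whether a raw syslog message is worth forwarding. It relies on the leading priority field of the syslog wire format, `<PRI>`, where the value is the facility multiplied by 8 plus the severity, and lower severity numbers are more urgent. The function names and the default cutoff are assumptions for illustration; the forwarding step itself is omitted.

```python
import re
from typing import Optional, Tuple

# Matches the leading "<PRI>" field of a syslog message.
PRI_PATTERN = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_pri(raw: str) -> Optional[Tuple[int, int, str]]:
    """Split a raw message into (facility, severity, text), or None if malformed."""
    match = PRI_PATTERN.match(raw)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # highest valid value: facility 23, severity 7
        return None
    facility, severity = divmod(pri, 8)
    return facility, severity, match.group(2)


def should_forward(raw: str, max_severity: int = 3) -> bool:
    """Forward only messages at or above the urgency of max_severity."""
    parsed = parse_pri(raw)
    return parsed is not None and parsed[1] <= max_severity
```

With the default cutoff of 3 (error), a message beginning with `<187>` (facility 23, severity 3) would be forwarded, while one beginning with `<189>` (severity 5, notice) would be dropped.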

SNMP Notifications

Most network devices support SNMP and send SNMP notifications. Many notifications are preconfigured and others can be enabled. MIB II defines several notifications that are mandatory for all SNMP devices to support:

coldStart: The device fully rebooted
warmStart: The device soft or warm reinitialized
authenticationFailure: SNMP community string incorrect
linkDown: An interface changed state to linkDown
linkUp: An interface changed state from linkDown to one of several other states
egpNeighborLoss: The device lost an EGP neighbor

The authenticationFailure notification and the two link state notifications can generate lots of messages. We recommend that the authenticationFailure notifications be counted and discarded. If excessive notifications of this type are seen on a device or across the network, further steps could be taken to determine the source of the SNMP queries with invalid community strings.
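The count-and-discard approach might look like the sketch below. The class name and the threshold are hypothetical; a production version would probably also reset the counts at a fixed interval so that the threshold represents a rate rather than a lifetime total.

```python
from collections import Counter


class AuthFailureCounter:
    """Count authenticationFailure notifications per reporting device."""

    def __init__(self, threshold: int = 10):
        self.threshold = threshold  # hypothetical cutoff for "excessive"
        self.counts = Counter()

    def record(self, source_device: str) -> bool:
        """Count one notification; return True exactly when the threshold is reached."""
        self.counts[source_device] += 1
        return self.counts[source_device] == self.threshold
```

Each notification is counted and otherwise dropped. `record` returns `True` only once per device, at the moment its count reaches the threshold, so a single fault can be raised to prompt a search for the source of the bad queries.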

As we discussed before, you should only enable link state notifications on "interesting" interfaces. Otherwise, you'll be buried in notifications, especially if you have hyperactive PC users or flapping WAN links.

See the second part of this book for more details on notifications applicable to specific devices and management areas.