Controlling Event Generation | Performance and Fault Management: A Practical Guide to Effectively Managing Cisco Network Devices (Cisco Press Core Series)

Generating events when a problem occurs is essential. It is equally important to limit the number of events generated, to prevent an excessive load on your network and your event management system.

For example, if utilization on a link changes rapidly, it may cross a threshold many times and generate many events. Similarly, if a WAN link is flapping (changing rapidly between up and down), a threshold on that interface will generate many events.

You can control the rate at which events are generated in the following ways:

Throttling events
Lengthening the sample rate
Hysteresis

The following three sections cover these techniques in detail.

Throttling Events

Throttling events means that a device discards events generated above a certain rate. Often, the device throttles events system-wide. This means that if one event is repeated often, the system indiscriminately discards events, including unrelated ones, to keep the overall event rate below the configured limit. This technique is very useful for preventing a device from flooding a network or system with events. However, it is also one of the many reasons an event may never reach the event management system.
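The following Python sketch illustrates system-wide throttling. It is not how any particular device implements it: the fixed-window scheme, the EventThrottle class, the throttle_events function, and the event names are all assumptions made for illustration.

```python
class EventThrottle:
    """Hypothetical fixed-window throttle: allow at most max_events per window."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self.window_start = None
        self.count = 0
        self.dropped = 0

    def allow(self, timestamp):
        """Return True if an event at this timestamp may be sent."""
        # Start a new window when none is open or the current one has expired.
        if (self.window_start is None
                or timestamp - self.window_start >= self.window_seconds):
            self.window_start = timestamp
            self.count = 0
        if self.count < self.max_events:
            self.count += 1
            return True
        # Over the limit: discard the event, whatever its source.
        self.dropped += 1
        return False


def throttle_events(events, max_events, window_seconds):
    """events: time-ordered list of (timestamp, name). Returns (sent, dropped)."""
    throttle = EventThrottle(max_events, window_seconds)
    sent = [event for event in events if throttle.allow(event[0])]
    return sent, throttle.dropped


# A flapping link produces one event per second; one unrelated event arrives too.
events = sorted([(float(t), "linkFlap") for t in range(10)] + [(5.5, "fanFailure")])
sent, dropped = throttle_events(events, max_events=3, window_seconds=10.0)
print(sent)     # only the first three linkFlap events
print(dropped)  # 8, including the unrelated fanFailure event
```

Because the limit applies to the device as a whole, the unrelated fanFailure event is discarded along with the repeated linkFlap events.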

Lengthening the Sample Rate

Lengthening the sample interval (that is, lowering the sample rate) ensures that events are generated less often. If your data stream is so variable that every sample generates an event and you sample once per second, you will generate one event per second, which is probably far more than you or your event management system can handle. Sampling once every five minutes instead may produce useful events at a more reasonable rate. It also significantly reduces the load on the system processing the threshold. Longer sample intervals also smooth the data stream, which will probably reduce the number of events generated at the cost of missing some peaks. You, as the network administrator, must determine what sample rate is best for your network. Many network administrators start with very aggressive rates and are then overwhelmed by the amount of data generated.
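The smoothing effect can be seen in a small Python sketch. It assumes that each sample reports the average utilization over its interval, as it would if utilization were derived from counter deltas. The function name events_per_interval and the data are invented for illustration.

```python
def events_per_interval(samples, interval, threshold):
    """Average per-second samples over blocks of `interval` seconds and
    count how many averaged samples exceed `threshold`."""
    events = 0
    for start in range(0, len(samples) - interval + 1, interval):
        block = samples[start:start + interval]
        if sum(block) / interval > threshold:
            events += 1
    return events


# Ten minutes of made-up per-second utilization alternating between 90 and 40 percent.
samples = [90, 40] * 300

print(events_per_interval(samples, interval=1, threshold=70))    # 300
print(events_per_interval(samples, interval=300, threshold=70))  # 0
```

With one-second samples, every 90 percent peak generates an event. With five-minute samples, the average is 65 percent, so no event is generated and the peaks are missed entirely.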

If you are polling and then processing the data to generate events, high sample rates will also increase the load that network management adds to your network. If you are using the Remote Network Monitoring (RMON) MIB or another method to have the device check for events, the sample rate will affect the traffic on the network only if events are triggered very often.

Hysteresis

Hysteresis is a mechanism to reduce the volume of events generated by thresholds on rapidly varying continuous or time-series data.

For example, suppose you are monitoring utilization on a network segment leading to a file server, with a 70 percent threshold and no hysteresis. A large project is coming to an end, and the engineers are pushing many large drawing files to this file server in a rush to finish their parts of the project. During this time, the segment goes from an average utilization of 10 to 20 percent to averaging between 60 and 80 percent. Every time the utilization rises above 70 percent, an event is generated. It would be more useful to know when the threshold is first crossed and how long the abnormally high utilization lasts. Hysteresis gives you this information without a flood of redundant events.

This mechanism works as follows:

A threshold is set up to track the value of an object.
Different rising and falling set points are assigned to this threshold, with the rising set point exceeding the falling set point.
An event is triggered when the rising set point is crossed.
Once the rising set point is crossed, another rising event is not generated until the falling set point has been crossed. The same mechanism prevents another falling event from being generated until the rising set point is crossed again.
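The steps above can be sketched in Python as a small state machine. This is an illustration rather than the RMON implementation: the function names, the comparisons (at or above the rising set point, at or below the falling set point), the 40 percent falling set point, and the utilization figures are all assumptions.

```python
def simple_threshold_events(values, threshold):
    """No hysteresis: one event each time the value rises to or above threshold."""
    events = []
    above = False
    for index, value in enumerate(values):
        now_above = value >= threshold
        if now_above and not above:
            events.append(index)
        above = now_above
    return events


def hysteresis_events(values, rising, falling):
    """Hysteresis: rising and falling events must alternate."""
    if rising <= falling:
        raise ValueError("rising set point must exceed falling set point")
    armed = "rising"  # initial state: waiting for a rising crossing
    events = []
    for index, value in enumerate(values):
        if armed == "rising" and value >= rising:
            events.append((index, "rising"))
            armed = "falling"
        elif armed == "falling" and value <= falling:
            events.append((index, "falling"))
            armed = "rising"
    return events


# Made-up utilization samples, in percent.
utilization = [15, 20, 65, 75, 68, 78, 66, 80, 62, 72, 30, 18, 74]

print(simple_threshold_events(utilization, threshold=70))
# [3, 5, 7, 9, 12]
print(hysteresis_events(utilization, rising=70, falling=40))
# [(3, 'rising'), (10, 'falling'), (12, 'rising')]
```

The hysteresis version reports when the high utilization began (sample 3) and when it ended (sample 10) in two events, where the simple threshold generates four events over the same period.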

Figure 5-1 presents an example of how the hysteresis mechanism operates.

Figure 5-1. The Hysteresis Mechanism


Notice in Figure 5-1 that because the initial state was set to trigger on a rising threshold, no alarm is generated at point A. As the value of the object increases above the rising threshold, an alarm is generated at point B. No alarms are generated at points C, D, E, or F; the next alarm is a falling alarm, generated when the falling threshold is crossed at point G. Likewise, no alarm is generated at point H; the next one is triggered by crossing the rising threshold at point I. This mechanism can drastically reduce the volume of events without eliminating the information required to determine whether a fault exists. Without hysteresis, nine alarms would have been generated; with hysteresis, only three are generated. Often, the reduction is more dramatic.