Using Event Correlation Tools


As a company's enterprise grows to include more systems and devices, the company usually also wants to limit the number of staff needed to manage it. The sheer volume of events can be overwhelming.

To avoid floods of information in an environment with multiple managed nodes, some type of event correlation is needed. For example, in a clustered environment, each node may notice the failure of a shared disk device. Without intelligent event filtering, the management station may receive a critical event from each cluster node. Event correlation tools provide event reduction and consolidation, exclude unnecessary and meaningless information, and identify root causes.
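The shared-disk scenario above can be illustrated with a minimal Python sketch. It is not taken from any of the products discussed here; the event fields (`node`, `resource`, `condition`) and the `consolidate` function are hypothetical names chosen for illustration. The idea is simply that reports of the same condition on the same resource are folded into one consolidated event.

```python
from collections import defaultdict


def consolidate(events):
    """Fold events that describe the same condition on the same resource
    into a single summary event (illustrative only; field names are assumed)."""
    groups = defaultdict(list)
    for ev in events:
        # Events are considered duplicates when resource and condition match.
        groups[(ev["resource"], ev["condition"])].append(ev["node"])
    return [
        {
            "resource": resource,
            "condition": condition,
            "reported_by": sorted(nodes),
            "count": len(nodes),
        }
        for (resource, condition), nodes in groups.items()
    ]


# Three cluster nodes report the same shared-disk failure.
raw_events = [
    {"node": "node1", "resource": "shared_disk0", "condition": "failed"},
    {"node": "node2", "resource": "shared_disk0", "condition": "failed"},
    {"node": "node3", "resource": "shared_disk0", "condition": "failed"},
    {"node": "node2", "resource": "lan0", "condition": "down"},
]

consolidated = consolidate(raw_events)
for ev in consolidated:
    print(ev)
```

Here four raw events become two consolidated events, and the management station sees one disk failure reported by three nodes instead of three separate critical events.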

In addition to the notification flexibility provided by IT/O, EMS, and other tools, you need additional control over the messages received. A network printer failure provides a good example. Although you may want to know when a network printer fails, the event could inundate the Message Browser if it is detected and reported by every user of the printer. Similarly, if the same event is reported multiple times, you may miss a more critical event among the repeats. Filtering can make these scenarios much easier to manage. Correlation tools can also be used to help resolve problems. For example, a network printer may be unavailable to some users but not to others. The tools can be used to pinpoint the problem.

We have selected a few event correlation tools to discuss in this section. Many other event correlation products are available. Here is a list of some that are not discussed:

  • Netcool/OMNIbus from Micromuse provides event correlation capabilities. Netcool/Reporter can generate reports based on the events that interest a particular user.

  • Tivoli Enterprise Console, part of Tivoli TME, can be configured to do simple event correlation.

  • COMMAND/POST, from Boole & Babbage Inc., does event correlation. It works with Tivoli TME, BMC PATROL, and OpenView.

  • InCharge, from System Management Arts (SMARTS), does event correlation. It is preconfigured with information about common network problems. InCharge is integrated with Tivoli TME.

OpenView Event Correlation Services

HP OpenView provides the Event Correlation Services (ECS) tool, which integrates with OpenView NNM and IT/O. ECS can be used to reduce and consolidate events and to suppress event storms. It can be configured to filter out unnecessary information and help you identify the root cause of a problem. With ECS, you can arrange to be notified only of critical events. ECS can receive events as SNMP traps, CMIP notifications, opcmsg messages, and ASCII events. It can be used to manage networks, systems, databases, and applications, and it is extensible, so new events and event sources are easy to add.

The OpenView ECS product has two major components: the ECS Designer and the ECS Engine. The ECS Engine is the runtime correlation engine, the real-time component that applies the defined correlation rules to incoming events. The ECS Designer is a graphical utility that simplifies designing and developing correlation rules. It also provides simulation capabilities for testing rules before they are deployed.

A correlation circuit in ECS is a collection of nodes that are configured to perform correlation and filtering. ECS can correlate events independent of the order in which they were received. You can also configure ECS to suppress repeated events using a time-based filter.
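The time-based suppression of repeated events mentioned above can be sketched in Python as follows. This is not ECS circuit syntax or the ECS implementation; `RepeatSuppressor`, the event keys, and the 60-second window are assumptions made purely to illustrate the idea of passing the first occurrence and suppressing repeats that arrive within a window.

```python
class RepeatSuppressor:
    """Pass the first occurrence of an event key, then suppress repeats
    until `window_seconds` have elapsed since the last passed occurrence.
    Illustrative sketch only; not how ECS is implemented."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self._last_passed = {}  # event key -> timestamp of last passed event

    def allow(self, key, timestamp):
        last = self._last_passed.get(key)
        if last is not None and timestamp - last < self.window:
            return False  # repeat inside the window: suppress
        self._last_passed[key] = timestamp
        return True


suppressor = RepeatSuppressor(window_seconds=60)

# (event key, arrival time in seconds); all values are made up for the example.
arrivals = [
    ("printer1:offline", 0),
    ("printer1:offline", 10),
    ("printer1:offline", 59),
    ("printer1:offline", 61),
    ("router7:down", 62),
]

passed = [(key, t) for key, t in arrivals if suppressor.allow(key, t)]
print(passed)
```

With these arrivals, the repeats at 10 and 59 seconds are suppressed, while the occurrence at 61 seconds and the unrelated router event are passed on.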

With the ECS integration in IT/O, events from multiple devices can be correlated locally by the agent or centrally at the management server. Correlating locally not only reduces network traffic, but also allows the root cause to be determined faster and closer to the source, providing more efficient problem resolution.

Correlation templates are easy to distribute to managed nodes. They are distributed the same way as other IT/O templates. Automatic and operator-initiated actions can also be configured and executed locally, even when the management server is down.

ECS also integrates with NNM. ECS can discover the network topology from NNM and can use it for correlation. It also comes with some out-of-the-box correlation circuits for network event correlation.

Seagate NerveCenter

Seagate NerveCenter provides network event correlation and behavior management for UNIX and NT systems. It uses rules-based filtering and advanced correlation to pinpoint root causes and help you manage the volume of critical network issues and events in the enterprise. NerveCenter can also be configured to perform automated corrective actions.

NerveCenter has three main components: NCServer, AdminTool, and NC Client. The NCServer maintains a database of events. It detects events, distributes information to clients, and communicates with the network management platform. The AdminTool is used for maintaining and configuring NerveCenter domains.

NerveCenter comes with a graphical alarm interface. It provides a display to show the health of the environment at a glance. There is an Alarm and Traffic Summary window that shows existing alarms and their severity.

Rules are also user-configurable via a drag-and-drop GUI. The user can specify particular events and define corrective actions.

Behavior models are used to define the relationships between critical conditions and specific corrective actions. NerveCenter comes with several predefined models for monitoring network traffic, performance, status, security, and error conditions.

There are many different types of automatic actions available to configure for specific events. The following is a list of automatic actions available from NerveCenter:

  • Send a notification to a management station or another NCServer.

  • Execute a UNIX or NT command or Perl script.

  • Log data to a file or database.

  • Send a page or e-mail.

  • Send an SNMP trap.

  • Perform an SNMP set.

NerveCenter correlates across network devices, UNIX systems, and NT systems. Correlation can be done at a central management station or using a distributed management model. In the distributed model, each NCServer is responsible for managing different domains.

NerveCenter gathers the events it correlates by polling devices via SNMP and by listening for SNMP traps. It can be configured to monitor specific MIB variables. New MIBs can easily be added, so NerveCenter can be configured to poll additional MIB objects.

NerveCenter integrates with most network management platforms, including OpenView NNM and IT/O, IBM NetView for AIX, Tivoli TME, and Unicenter TNG. It can also be run standalone. With IT/O integration, NerveCenter can forward messages to IT/O, and IT/O can forward messages to NerveCenter.

IT Masters MasterCell

IT Masters provides an event correlation tool called MasterCell. It provides event management capabilities for mission-critical applications in a distributed environment.

MasterCell uses a distributed architecture, which employs agents and a central server. The agents perform limited filtering and can perform actions as well as forward events to the server. Because the agents are intelligent, MasterCell can eliminate bottlenecks by adding more cells to distribute the load. This multi-tiered structure also puts intelligence as close to the source as possible. Events can be analyzed through intermediate queries and then selectively propagated. The architecture also eliminates a single point of failure (SPOF) in the event management environment.

MasterCell has four main components: the event processors/cells, an event browser, the Knowledge Base Editor, and adapters. Adapters, which run on the managed nodes, feed events into the cells. The cells collect and analyze the events. The event browser and Knowledge Base Editor, which connect to one or more cells, are written in Java, so you can run them from anywhere.

Cells are the event processors in the MasterCell environment. These lightweight correlation engines are distributed across the enterprise. They collect and analyze events, then respond, store, propagate, or group events according to defined rules and actions. You can have networks of cells called domains. Cells can be grouped by geographical, functional, or organizational boundaries. Operators can be assigned to specific cell domains. Local actions can be performed on the workstation where a cell is installed.

The MasterCell event browser can be used to browse events from one or more cells. Cells group events into collectors. Grouped events are shown as icons in the browser with color-coded indicators. You can drill down to view the events for each collector. Figure 9-2 shows an example of the collectors and their status indicators defined under demo1. You can also interactively trigger corrective actions from the browser.

Figure 9-2. Using the MasterCell event browser to view the status of collectors.


The Knowledge Base Editor, or KB Editor, is used to define classes and event instances. It is also used to define configuration rules. A wizard is available to walk you through the steps. You can edit the knowledge base(s) offline, then distribute them to the cells. Figure 9-3 shows some instances of application events, such as APP_DOWN, which have been defined using the KB Editor.

Figure 9-3. Using the MasterCell KB Editor to define events and rules.


Several new features have been added in the 2.0 release of MasterCell to dynamically maintain up-to-date configurations across cells and browsers in the enterprise. Configuration changes in the knowledge bases are maintained on a configuration server and are automatically distributed to the cells. When the browser is invoked, it automatically retrieves configuration data from the server.

The adapters feed events into cells. Adapters, which run on monitored systems, are background processes responsible for detecting events, translating them into BAROC, and sending them to the cell. BAROC is the language MasterCell uses to represent events; it is also understood by the Tivoli Enterprise Console. MasterCell comes with adapters for SNMP, the NT event log, and generic text logs such as syslog. If an adapter is unable to propagate events to the cell, it buffers them.
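The buffering behavior described above can be sketched in Python. This is not MasterCell adapter code, and it does not attempt to show BAROC syntax; `BufferingAdapter`, `FakeCell`, and the buffer size are hypothetical names and values used only to show the store-and-forward idea: events are queued while the cell is unreachable and delivered in order once it is reachable again.

```python
from collections import deque


class BufferingAdapter:
    """Store-and-forward sketch: queue events while the cell is unreachable,
    then deliver them in arrival order. Names and limits are assumptions."""

    def __init__(self, send, max_buffer=1000):
        self._send = send  # callable that raises ConnectionError on failure
        # A bounded deque silently discards the oldest event when full.
        self._buffer = deque(maxlen=max_buffer)

    def forward(self, event):
        self._buffer.append(event)
        self.flush()

    def flush(self):
        while self._buffer:
            try:
                self._send(self._buffer[0])
            except ConnectionError:
                return False  # keep the event buffered and retry later
            self._buffer.popleft()
        return True

    def pending(self):
        return len(self._buffer)


class FakeCell:
    """Stand-in for a cell so the example is self-contained."""

    def __init__(self):
        self.up = False
        self.received = []

    def accept(self, event):
        if not self.up:
            raise ConnectionError("cell unreachable")
        self.received.append(event)


cell = FakeCell()
adapter = BufferingAdapter(cell.accept)

adapter.forward("disk full on /var")
adapter.forward("login failure for root")
pending_while_down = adapter.pending()

cell.up = True
adapter.forward("service restarted")
print(pending_while_down, cell.received)
```

The bounded buffer is one possible design choice: it caps memory use on the monitored system at the cost of losing the oldest events during a long outage.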

The MasterCell rules engine goes through nine distinct processing steps for each event collected by a cell. It refines an event by gathering more information about it, or by executing a command to qualify it, before the event is processed. It filters events, deciding whether an event should be dropped or passed on based on knowledge base rules. It regulates events by holding repetitive occurrences until a time-based threshold is met. It updates existing events in response to new events. It performs abstraction, which creates higher-level events by combining low-level events. It correlates an event with others to establish cause-and-effect relationships. The execute phase is where operators are notified and rule-based corrective actions are performed. The timer phase schedules actions based on time factors in the rules. The propagate phase forwards events to other cells.
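A phased rules engine of this kind can be sketched in Python as a simple pipeline. This is not MasterCell's rule language or engine. The phase names come from the description above and are run in the order the text lists them, which is assumed here to be the processing order; `run_pipeline`, the handler functions, and the event fields are hypothetical.

```python
# Phase names taken from the description above; the ordering is an assumption.
PHASES = [
    "refine", "filter", "regulate", "update", "abstract",
    "correlate", "execute", "timer", "propagate",
]


def run_pipeline(event, handlers):
    """Run an event through the phases that have a handler registered.
    A handler returns the (possibly modified) event, or None to drop it."""
    trace = []
    for phase in PHASES:
        handler = handlers.get(phase)
        if handler is None:
            continue  # no rule defined for this phase
        trace.append(phase)
        event = handler(event)
        if event is None:
            break  # event was dropped; later phases never see it
    return event, trace


def refine(event):
    # Hypothetical enrichment: attach an owner to the event.
    return {**event, "owner": "ops"}


def drop_informational(event):
    # Hypothetical filter rule: discard purely informational events.
    return None if event["severity"] == "info" else event


handlers = {"refine": refine, "filter": drop_informational}

kept, kept_trace = run_pipeline(
    {"msg": "APP_DOWN", "severity": "critical"}, handlers
)
dropped, dropped_trace = run_pipeline(
    {"msg": "heartbeat", "severity": "info"}, handlers
)
print(kept, kept_trace)
print(dropped, dropped_trace)
```

The critical event passes through both registered phases and comes out enriched, while the informational event is dropped at the filter phase.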

MasterCell also has a performance throttling feature to control the rate at which events are processed. This feature can be used to control the impact of event storms on server resources.
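How MasterCell implements throttling is not described here, so the following Python sketch shows only one generic way to cap the processing rate: events are queued as they arrive, and at most a fixed number are released per interval. `Throttle`, `max_per_interval`, and the event names are assumptions for illustration.

```python
from collections import deque


class Throttle:
    """Release at most `max_per_interval` events each time drain() is called.
    A generic rate-limiting sketch, not MasterCell's actual mechanism."""

    def __init__(self, max_per_interval):
        self.max_per_interval = max_per_interval
        self.backlog = deque()

    def submit(self, event):
        self.backlog.append(event)

    def drain(self):
        # Intended to be called once per interval by a scheduler.
        batch = []
        while self.backlog and len(batch) < self.max_per_interval:
            batch.append(self.backlog.popleft())
        return batch


throttle = Throttle(max_per_interval=3)

# Simulate an event storm of seven events arriving at once.
for i in range(7):
    throttle.submit(f"event-{i}")

first = throttle.drain()
second = throttle.drain()
third = throttle.drain()
print(first, second, third)
```

The storm of seven events is spread over three intervals, so the server never handles more than three events per interval.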

MasterCell runs on Solaris, AIX, HP-UX, and Windows NT. In addition, it integrates with Tivoli's TME Enterprise Console, BMC PATROL, and Remedy AR System.



UNIX Fault Management: A Guide for System Administrators
ISBN: 013026525X
EAN: 2147483647
Year: 1999
Pages: 90
