Chapter 6. Event and Fault Management


Your event management system (EMS) is the place where everything comes together and where the network lets you know when it needs attention. If your EMS is configured and working properly and your thresholds are set correctly, you can relax and wait for the system to report problems to you. If you don't want to spend your days analyzing reports or listening to complaints from network users, you'll want to invest the time required to configure this system properly for your network.

The problem is that the network will produce many more events than you'll want to deal with directly. Your EMS therefore needs to process all of these events and report to you only when there's a problem that needs your attention. To do so, your EMS must have the knowledge to determine which events require which type of action, if any. This chapter helps you ensure that your EMS can perform this function successfully.
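As a rough illustration of the "knowledge" an EMS needs, the sketch below maps each incoming event to an action (ignore, log, or notify) using a small ordered rule table. It is a minimal sketch under stated assumptions: the `Event` fields, the event kinds (`"link_down"`, `"config_change"`), the 1-5 severity scale, and the rule format are all hypothetical. Real EMS products express these rules in their own configuration languages.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"   # noise: drop the event
    LOG = "log"         # low priority: record it, but don't notify anyone
    NOTIFY = "notify"   # a fault that needs someone's attention


@dataclass(frozen=True)
class Event:
    source: str    # device or monitor that produced the event
    kind: str      # hypothetical event type, e.g. "link_down"
    severity: int  # hypothetical 1-5 scale, higher = more severe


# Hypothetical rule table: (event kind, minimum severity, action).
# Rules are checked in order and the first match wins, so the more
# specific (higher-severity) rule for a kind must come first.
RULES = [
    ("link_down", 4, Action.NOTIFY),
    ("link_down", 1, Action.LOG),
    ("config_change", 1, Action.LOG),
]


def classify(event: Event) -> Action:
    """Return the action the EMS should take for a single event."""
    for kind, min_severity, action in RULES:
        if event.kind == kind and event.severity >= min_severity:
            return action
    # Events that no rule recognizes are treated as noise.
    return Action.IGNORE
```

With these rules, `classify(Event("core-rtr-1", "link_down", 5))` returns `Action.NOTIFY`, the same event at severity 2 returns `Action.LOG`, and an unlisted kind such as `"fan_speed"` returns `Action.IGNORE`. Defaulting unrecognized events to `IGNORE` keeps noise down but can hide new kinds of problems; defaulting to `LOG` is the safer choice if you aren't sure your rules are complete.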

Your EMS should be the single point to which all events are delivered and the single place where anything interested in faults goes to find them. For example, when your availability monitor loses contact with devices or regains contact with them, it should deliver this information as events to your EMS. Network devices themselves also detect issues that you'll want to process through your EMS. Your EMS is responsible for determining when there are faults and for distributing those faults to your team for repair and to your network health displays. The EMS is also responsible for logging low-priority faults. Finally, it needs to record faults and their time to resolution and deliver them to reporting systems so that you can determine your network's reliability or uptime. These relationships are shown in Figure 6-1.
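To make the "record faults and time to resolution" responsibility concrete, here is a minimal sketch of a fault tracker fed by availability-monitor events. It assumes, hypothetically, that the monitor reports each device as reachable or unreachable along with a timestamp in seconds. The `Fault` and `FaultTracker` names and the `handle` and `availability` methods are illustrative only. A real EMS would also route faults to people and displays, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Fault:
    device: str
    opened_at: float                  # seconds, as reported by the monitor
    closed_at: Optional[float] = None

    @property
    def time_to_resolution(self) -> Optional[float]:
        """Seconds from fault open to close, or None if still open."""
        if self.closed_at is None:
            return None
        return self.closed_at - self.opened_at


@dataclass
class FaultTracker:
    """Hypothetical tracker: one open fault per device at a time."""
    open_faults: Dict[str, Fault] = field(default_factory=dict)
    closed_faults: List[Fault] = field(default_factory=list)

    def handle(self, device: str, reachable: bool, timestamp: float) -> None:
        if not reachable and device not in self.open_faults:
            # Lost contact: open a fault. Repeated "down" events are ignored.
            self.open_faults[device] = Fault(device, timestamp)
        elif reachable and device in self.open_faults:
            # Regained contact: close the fault and keep it for reporting.
            fault = self.open_faults.pop(device)
            fault.closed_at = timestamp
            self.closed_faults.append(fault)

    def availability(self, device: str, window_start: float,
                     window_end: float) -> float:
        """Fraction of [window_start, window_end] the device was up.

        Still-open faults are counted as down until window_end.
        """
        downtime = 0.0
        for f in self.closed_faults + list(self.open_faults.values()):
            if f.device != device:
                continue
            end_time = f.closed_at if f.closed_at is not None else window_end
            start = max(f.opened_at, window_start)
            end = min(end_time, window_end)
            if end > start:
                downtime += end - start
        return 1.0 - downtime / (window_end - window_start)
```

For example, if `core-rtr-1` is reported unreachable at t=1000 and reachable again at t=1600, the closed fault's `time_to_resolution` is 600 seconds, and `availability("core-rtr-1", 0.0, 86400.0)` over a one-day window is 1 - 600/86400, or roughly 99.3 percent. Keeping closed faults rather than discarding them is what lets the reporting systems compute uptime later.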

Figure 6-1. Event and Fault Management System

For clarity of discussion, this chapter treats your EMS as separate and distinct from the rest of your network management system (NMS), whether your EMS is a separate product, a set of scripts, or an integrated part of a commercial NMS.

This chapter covers the following:

  • An overview of events, including event producers, event types, and event delivery protocols and methods

  • Event processing to determine faults, including collection, normalization, correlation, and fault determination

  • Fault management, including fault tracking, fault delivery, and network self-repair



Performance and Fault Management
Performance and Fault Management: A Practical Guide to Effectively Managing Cisco Network Devices (Cisco Press Core Series)
ISBN: 1578701805
EAN: 2147483647
Year: 2005
Pages: 200
