Performance and Fault Management
Authors: Maggiora P. L. Elliott C. E. Thompson J. M.
Published year: 2005
Pages: 9-10/200

Preface

The server guys just came into the room and said it's your problem: the network is slow. People have been calling the help desk since early this morning (it's now lunchtime), complaining that the sales order application has been extremely slow or unreachable. You looked at your Network Management Station earlier and noticed nothing particularly wrong. Yet the server guys claim it's the network, not their servers.

Also seated in the room are your boss, her boss, and the boss's boss. Orders aren't being placed and everyone now seems to think the fault lies in the network; it's your problem now.

This is a common situation for people who design and manage networks. When something goes wrong, no matter what it is, the first reaction is to blame the network. Maybe the network is to blame for the problem; maybe it is not. To know for sure, you need to have implemented some level of performance and fault management that helps you isolate the cause of a particular performance-related problem.


The Meaning of Performance and Fault Management

Performance and fault management encompasses a wide array of tools and topics.

Traditionally, network management tools provided logs of network systems messages (traps, syslog) and colored statuses of each device and interface (green means up, red means down). While these systems were valuable for network troubleshooting, they did not report on or inform an engineer of the health of a network system until it actually went down.

Understanding the activities of a router or switch requires more than knowing whether the device is up or down. You probably want to know some of the following types of information:

  • How much traffic is passing through the interfaces? Is it too much? How much is too much?

  • Is the device CPU busy? How busy? How much is too much?

  • Is the device running out of memory?
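
As a rough illustration of the first question, interface utilization is commonly estimated from two samples of an octet counter (for example, ifInOctets from the standard interfaces MIB) taken a known interval apart, compared against the interface speed (ifSpeed). The sketch below is only that: the function names and the sample values are invented for illustration, and it assumes a 32-bit counter that wraps at most once between samples.

```python
# Minimal sketch: estimate interface utilization from two octet-counter samples.
# Function names and sample values are hypothetical, chosen for illustration.

COUNTER32_MODULUS = 2 ** 32  # assumption: a 32-bit counter such as ifInOctets


def counter_delta(earlier, later, modulus=COUNTER32_MODULUS):
    """Return the increase between two samples, allowing for a single wrap."""
    return (later - earlier) % modulus


def utilization_percent(octets_earlier, octets_later, interval_seconds, speed_bps):
    """Return average utilization (percent) over the sampling interval."""
    if interval_seconds <= 0 or speed_bps <= 0:
        raise ValueError("interval_seconds and speed_bps must be positive")
    bits = counter_delta(octets_earlier, octets_later) * 8
    return 100.0 * bits / (interval_seconds * speed_bps)


if __name__ == "__main__":
    # Two made-up samples taken 300 seconds apart on a 10 Mbps interface.
    pct = utilization_percent(1_000_000, 38_500_000, 300, 10_000_000)
    print(f"Average inbound utilization: {pct:.1f}%")
```

A number like this answers "how much," but not "how much is too much"; that judgment still depends on what is normal for the network in question.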

Aside from general device health, you probably also want to understand how characteristics of the network devices affect the reliability of the network. For instance, are too many people trying to dial into your ISDN router? Are too many collisions occurring on the Ethernet segments?

How much traffic is too much? How much is normal for the network? These are essential questions that network engineers and managers ask. Questions about how much is too much come into the Cisco TAC, and the answer, much to customer dismay, is "it depends."

Performance and fault management encompasses issues such as the following:

  • You need to implement performance and fault management, but don't know where to begin.

  • You find people are either ignoring the current network management tools or that the tools are providing useless information.

  • Problem resolution times are considerably longer than they should be.

  • Cisco publishes bunches of MIBs. Where do you begin?

  • You need to select network management tools but aren't quite sure how to do so.

  • You've been told to manage the Frame Relay connections for your 17,000-site network, but you aren't sure what characteristics to look at to determine whether the connections are working properly.

  • You want to make sure the network devices are healthy, but don't know what to measure or what is considered acceptable in making that determination.

  • You don't know what traps are available or how to configure your NMS trap receiver to print the trap information in a human-readable format.

This book addresses these issues. In addition, it teaches you how to navigate Cisco's documentation, MIBs, and management in order to keep up with the constantly changing pace of managing Cisco devices.
