Performance Measurement and Reporting Tools | Performance and Fault Management: A Practical Guide to Effectively Managing Cisco Network Devices (Cisco Press Core Series)

It's important that you be informed of problems before your users start screaming. Monitoring your network is often the first defense for detecting and resolving problems. The main areas you will want to measure or monitor are availability, response time, utilization, and accuracy. These areas are covered in detail in Chapter 4, "Performance Measurement and Reporting."
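As a small illustration of the utilization measurement, the sketch below computes inbound utilization from two samples of an interface's octet counter. It assumes a 32-bit counter such as ifInOctets, at most one counter wrap between samples, and a known interface speed in bits per second. The function names are hypothetical and are not taken from any particular tool.

```python
COUNTER32_MODULUS = 2 ** 32  # assumption: samples come from a 32-bit SNMP counter


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Return the increase between two counter samples, allowing for one wrap."""
    return (current - previous) % modulus


def input_utilization_percent(octets_before, octets_after, seconds, if_speed_bps):
    """Return inbound utilization over the interval as a percentage of link speed."""
    if seconds <= 0 or if_speed_bps <= 0:
        raise ValueError("seconds and if_speed_bps must be positive")
    bits_received = counter_delta(octets_before, octets_after) * 8
    return 100.0 * bits_received / (seconds * if_speed_bps)


# Example: a 1,544,000 bps link polled 300 seconds apart.
utilization = input_utilization_percent(1_000_000, 6_790_000, 300, 1_544_000)
```

If the polling interval is long enough for the counter to wrap more than once, the result will be wrong, so fast links need shorter intervals or 64-bit counters.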

As Figure 9-1 shows, data about what to monitor is obtained from the knowledge base and is used to drive the collection of this information from the devices in your network. That information then can be used by other tools to trigger events and to generate reports.

The data collected through performance monitoring is useful for both reactive and proactive reporting. In reactive mode, it can be used with your event-management tools to trigger events when set conditions are met. In proactive mode, your performance-reporting tools can use this data to produce reports and graphs to analyze your network's performance and help plan for future capacity needs or identify other issues that will need attention.

Two issues will affect your selection of a tool or tools to perform these tasks: reducing NMS traffic and tracking interface data. The next two sections cover these issues.

Reducing NMS Traffic

If you selected an NMS framework, it probably includes the capability to auto-discover your network. It may also automatically poll your network devices for availability or other performance measurements. Other tools you select may also auto-discover your network. None of these tools attempts to coordinate its polling with the others to lessen the load on your network and on the devices being polled, so you will want to be aware of the duplication and work to lessen its impact on your network.

For example, a tool may have the capability to monitor your network for availability, but you are using the tool for another purpose. Be sure to turn off the availability function on this tool to lessen the load on the network and the confusion about which tool to use for what function.

Auto-discovery is another example. You may find that although a tool can discover your network, you have another tool that you prefer to use for discovery. Many tools support importing devices, and sometimes topology information, from other products, which reduces the amount of traffic that network-discovery processes put on your network.

Basically, you should select the best tool for each task and ensure that your other tools don't duplicate that function and add unnecessary traffic to your network or consume additional resources.
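One way to spot this kind of duplication is to keep a simple inventory of which functions are enabled in each tool and look for overlaps. The sketch below is only an illustration: the tool names and function labels are hypothetical, and the inventory would have to be maintained by hand or exported from your own tools.

```python
from collections import defaultdict


def find_duplicated_functions(enabled_functions):
    """Return {function: [tools]} for every function enabled in more than one tool."""
    tools_by_function = defaultdict(list)
    for tool, functions in enabled_functions.items():
        for function in functions:
            tools_by_function[function].append(tool)
    return {
        function: sorted(tools)
        for function, tools in tools_by_function.items()
        if len(tools) > 1
    }


# Hypothetical inventory of three tools and the functions each has switched on.
inventory = {
    "framework": {"discovery", "availability"},
    "reporter": {"discovery", "utilization"},
    "response_probe": {"response_time", "availability"},
}
duplicates = find_duplicated_functions(inventory)
```

Each entry in the result is a candidate for turning the function off in all but the tool you prefer for that task.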

Tracking Interface Data

Another issue to pay attention to when evaluating tools that collect information on a per-interface basis is how the tool keeps track of interfaces. The ifIndex of an interface can change as a result of several events, including the following:

If the device is reset or the SNMP process on the device is reinitialized, all ifIndexes may be renumbered.
If a board is removed and a new board is inserted to replace it, the ifIndexes of the interfaces on that board may change.
An interface may go down and you may, in the process of debugging the problem, move the physical connection from one interface to another. Thus, the ifIndex for the physical interface didn't change, but the function of the interface did.

The application should have some mechanism for tracking interfaces that is independent of ifIndex. SNMP provides several alternative variables that can be used. These include ifDescr from the MIB-II interfaces table, ifName and ifAlias from the extended interfaces table (ifXTable) introduced by RFC 1573 and revised in RFC 2233, and enterprise-specific variables such as portName from the CISCO-STACK-MIB. You need to determine what behavior you prefer and then evaluate tools to determine whether they work as you want them to.

First, you need to determine whether you want to track interfaces based upon function, such as 'WAN link to Kenogami Lake, Ontario'; or upon physical interface, such as 'Board 1, Port 1 on device Switch1.' You may want to handle your WAN links one way and your LAN links another because it is often more important to track performance on WAN links, and because the configuration of WAN links, including speed, tends to vary more from link to link than it does for LAN links.

For interfaces that you want to track by function, you will need to label the interface in some way. Cisco routers allow you to set ifAlias with the description interface configuration command. Cisco Catalyst switches allow you to set portName with the set port name command. Your network management software will need to track the relationship between the object(s) you set and ifIndex.

Tracking interfaces by physical interface doesn't require you to set anything. Your network management software will need to choose an object such as ifName to label each interface with. It will need to track the relationship between this object and ifIndex to ensure that the correct data is associated with the correct interface.
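The bookkeeping described above can be sketched as follows. The example assumes that the tool stored a label-to-ifIndex mapping at the previous poll and has just walked the labeling object (for instance ifName) again. It also assumes that labels are unique on a device. The function name and data shapes are hypothetical.

```python
def remap_interfaces(known, walked):
    """
    Reconcile stored interface labels with a fresh SNMP walk.

    known:  {label: ifIndex} recorded at the previous poll
    walked: {ifIndex: label} returned by the current walk
    Returns (current, moved, missing):
      current - {label: ifIndex} as of this walk
      moved   - {label: (old_ifIndex, new_ifIndex)} for labels whose index changed
      missing - sorted labels that were known before but are absent now
    """
    # Assumption: labels are unique, so inverting the walk loses nothing.
    current = {label: index for index, label in walked.items()}
    moved = {
        label: (old_index, current[label])
        for label, old_index in known.items()
        if label in current and current[label] != old_index
    }
    missing = sorted(label for label in known if label not in current)
    return current, moved, missing


# Hypothetical device whose interfaces were renumbered after a reload.
known = {"Serial0": 1, "Serial1": 2, "Ethernet0": 3}
walked = {1: "Ethernet0", 2: "Serial0", 4: "Ethernet1"}
current, moved, missing = remap_interfaces(known, walked)
```

Historical data stays keyed by the label, so a renumbering only changes which ifIndex is polled, not which data series the new samples are appended to.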

You need a way to specify types of ports and devices that should be included in the polling. Polling every single device and interface found by auto-discovery might be okay in some instances, but in others it may lead to information overload.
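A polling filter of this kind could be expressed as a short list of rules that match on device name and interface type. The sketch below is an assumption about how such rules might look, not the configuration format of any specific product. The device names are hypothetical, and the interface types are written as ifType names.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Interface:
    device: str
    name: str
    if_type: str  # ifType name, e.g. "ethernetCsmacd" or "frameRelay"


@dataclass(frozen=True)
class PollRule:
    device_pattern: str   # shell-style wildcard matched against the device name
    if_types: frozenset   # interface types to poll on matching devices


def should_poll(interface, rules):
    """Return True if any rule selects this interface for polling."""
    return any(
        fnmatchcase(interface.device, rule.device_pattern)
        and interface.if_type in rule.if_types
        for rule in rules
    )


# Hypothetical policy: WAN interfaces on routers, Ethernet ports on core switches.
rules = [
    PollRule("rtr-*", frozenset({"frameRelay", "ppp"})),
    PollRule("core-sw*", frozenset({"ethernetCsmacd"})),
]
discovered = [
    Interface("rtr-east", "Serial0", "frameRelay"),
    Interface("rtr-east", "Loopback0", "softwareLoopback"),
    Interface("access-sw1", "2/1", "ethernetCsmacd"),
    Interface("core-sw1", "1/1", "ethernetCsmacd"),
]
to_poll = [interface for interface in discovered if should_poll(interface, rules)]
```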

To reduce the amount of performance data that is stored, some systems will automatically aggregate data as it ages. This can be a real advantage, allowing you to see trends over a longer period of time and reducing the time it takes to generate reports because less data has to be processed. Make sure, however, that the aggregation process doesn't hinder the capability to compare recent data to past data.
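A minimal sketch of age-based aggregation follows. It assumes samples are (timestamp, value) pairs with timestamps in seconds, keeps recent samples untouched, and replaces older samples with one average per fixed-size time bucket. The function name and parameters are hypothetical, and real tools may aggregate differently.

```python
from collections import defaultdict


def aggregate_old_samples(samples, now, keep_raw_seconds, bucket_seconds):
    """
    Roll samples older than keep_raw_seconds into per-bucket averages.

    Returns a time-ordered list of (timestamp, value) pairs in which each
    aggregated entry carries the start time of its bucket.
    """
    cutoff = now - keep_raw_seconds
    recent = sorted((t, v) for t, v in samples if t >= cutoff)
    buckets = defaultdict(list)
    for t, v in samples:
        if t < cutoff:
            buckets[t - t % bucket_seconds].append(v)
    rolled = [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]
    return rolled + recent


# Keep the last hour raw and roll anything older into hourly averages.
samples = [(0, 10.0), (1800, 20.0), (3600, 30.0), (7200, 40.0), (9000, 50.0)]
stored = aggregate_old_samples(samples, now=9000, keep_raw_seconds=3600, bucket_seconds=3600)
```

Averaging smooths out short peaks, which is one way aggregation can make recent and past data harder to compare. Storing a per-bucket maximum alongside the average is a common way to keep that information.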

You may find that it is easier to collect data on all ports and not aggregate data, but throw data away after it gets to a certain age. This strategy is more practical now that disk space has become rather cheap.

Performance Monitoring and Reporting

Collecting data is only half of the job. What you do with this data falls into two broad categories: performance monitoring and performance reporting.

Performance monitoring is reactive in nature. You will need to set up triggers and thresholds against which events are generated and processed by your event management system. Tools to help you do this are covered in the section called "Event and Fault Management."
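The idea of a threshold that generates events can be sketched as follows. This example assumes a rising threshold paired with a lower falling threshold that re-arms the trigger, so a value hovering near the limit does not produce an event on every poll. The function name and the sample values are hypothetical.

```python
def threshold_events(samples, rising, falling):
    """
    Return (timestamp, kind) events for a series of (timestamp, value) samples.

    A "rising" event fires when an armed trigger sees a value at or above
    `rising`. The trigger is re-armed, with a "falling" event, once a value
    drops to or below `falling`.
    """
    if falling > rising:
        raise ValueError("falling threshold must not exceed rising threshold")
    events = []
    armed = True
    for timestamp, value in samples:
        if armed and value >= rising:
            events.append((timestamp, "rising"))
            armed = False
        elif not armed and value <= falling:
            events.append((timestamp, "falling"))
            armed = True
    return events


# Hypothetical utilization percentages sampled every 60 seconds.
samples = [(0, 50), (60, 85), (120, 90), (180, 75), (240, 60), (300, 82)]
events = threshold_events(samples, rising=80, falling=70)
```

In practice the events would be handed to the event management system rather than collected in a list.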

Performance reporting is proactive in nature. Here, you are using tools that generate tabular reports, graphs, gauges, and dials; and using this data for planning and troubleshooting. These reports will also help you baseline your network, so you know what is normal and what triggers and thresholds to set for your network.
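As one illustration of using a baseline to choose a threshold, the sketch below suggests a value a fixed number of standard deviations above the historical mean. This rule of thumb is an assumption made for the example, not a method prescribed here, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def suggest_threshold(history, deviations=2.0):
    """Suggest a threshold `deviations` standard deviations above the mean."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    return mean(history) + deviations * pstdev(history)


# Hypothetical utilization percentages collected during normal operation.
baseline = [2, 4, 4, 4, 5, 5, 7, 9]
threshold = suggest_threshold(baseline)
```

A baseline drawn from too short a period, or from a period that included an outage, will produce a misleading threshold, so the history should cover what you consider normal operation.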

Because more and more users expect reports to be available through the Web, your reporting tool should be able to present data through a web interface. You should not be limited to canned reports. You should be able to define new reports or modify existing ones to get exactly the information you want, preferably through that same web interface.

Most likely, you will need different levels of reports. First are reports that tend to be more ad hoc and specific. Second are general reports that operations and design staff can use to understand general trends in the network. Third are reports for upper management: reports that demonstrate at a high level how the network is performing and help justify any further expenditures for network growth.

Criteria for Selecting Performance Management and Reporting Tools

The criteria you'll want to consider for performance management and reporting tools, in addition to the general criteria covered in the beginning of this chapter, include the following:

The tools should work with your other tools, including the capability to use the knowledge base to obtain the list of devices to monitor instead of duplicating the functions (and the network traffic) of other tools. They should also integrate with your event and fault management tools.
The tools should handle ifIndex changes intelligently and in the way that you prefer to have them handled.
The monitoring tools should be able to limit polling to those devices and interfaces that you are interested in. Alternatively, they should be able to collect data on all devices and interfaces without your intervention, saving you time at the cost of resources.
You may want a tool that aggregates data over time to reduce the resources that are required to store the data and speed up generating reports.

Some of the tools that provide performance measurement and reporting are listed in the following sections.

Availability:

Aprisma Spectrum
Castle Rock SNMPc Enterprise Edition
Computer Associates Unicenter TNG
Hewlett Packard OpenView Network Node Manager
Ipswitch WhatsUp Gold
Tivoli NetView

Response-time monitoring:

Concord Network Health Suite
Cisco Internetwork Performance Monitor
Cisco Traffic Director
NetScout Manager
Perform Sage/X

Accuracy:

Concord Network Health Suite
Multi Router Traffic Grapher (MRTG)
Tavve Performance Reporting Monitor (PRM)

Utilization:

Aprisma Spectrum
Avesta Trinity
Castle Rock SNMPc Enterprise Edition
Cisco Traffic Director
Computer Associates Unicenter TNG
Concord Network Health Suite
Desktalk TREND
Hewlett Packard OpenView Network Node Manager
Multi Router Traffic Grapher
NetOps DSM
NetScout Manager
NextPoint S³ Traffic Manager
ProactiveNet eBiz.IT
Onion Peel Network Reporting Depot
Perform Sage/X
Smarts InCharge
Tavve Performance Reporting Monitor
Tivoli NetView

Performance reporting:

Aprisma Spectrum
Avesta Trinity
Castle Rock SNMPc Enterprise Edition
Computer Associates Unicenter TNG
Concord Network Health Suite
Desktalk TREND
Hewlett Packard OpenView Network Node Manager
Ipswitch WhatsUp Gold
Onion Peel Network Reporting Depot
Perform Dashboard
ProactiveNet eBiz.IT
SAS Institute IT Service Vision
Tavve Performance Reporting Monitor
Tivoli NetView