Collecting Network Performance Data


This section describes some of the tools available for collecting network performance information. Some overlap exists between network performance management and system performance management. For example, a network application may have a low network transmission rate because it isn't getting sufficient CPU time to send its data. Consequently, overlap also exists between the tools used for system and network performance management.

Using RMON and RMON-II Instrumentation

RMON I and RMON II (defined in RFCs 1757 and 2021, respectively) together form a standard for network performance management. Monitors, or probes, are placed on specific LAN segments and then collect data on those segments. Packet statistics and performance history can be stored. An RMON management application can collect data from remote RMON probes and present it at a central site. Alarms can also be triggered based on specified conditions and sent to the management station.

The RMON definition consists of the following optional sections, or groups:

  • Statistics

  • History

  • Alarm

  • Hosts

  • HostsTopN

  • Matrix

  • Filter

  • Capture

  • Event

The statistics section provides information on the amount of data being sent on a particular Ethernet interface, including the number of broadcasts, errors, and collisions. You can also see the distribution of packets of different sizes.

The history group allows statistics to be sampled at a specified polling interval and summed over time. The RMON statistics are Ethernet-centric, but extensions to support Token Ring are defined in RFC 1513.

The alarm and event groups are used to report when configured thresholds are met. Statistics are periodically checked to see whether a threshold has been met. After an event is sent, no additional events are sent until the opposite threshold has been met.
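The threshold behavior described above is a form of hysteresis. A minimal Python sketch of the idea follows; it is not the RMON MIB itself, and the class name `HysteresisAlarm`, the helper `events_for`, and the sample values are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HysteresisAlarm:
    """Threshold check with hysteresis, in the spirit of the RMON alarm group."""
    rising: float                      # rising threshold
    falling: float                     # falling threshold (assumed <= rising)
    last_event: Optional[str] = None   # most recent event that was generated

    def check(self, sample: float) -> Optional[str]:
        """Return "rising", "falling", or None for one sampled value."""
        if sample >= self.rising and self.last_event != "rising":
            # Report once; stay quiet until the falling threshold is crossed.
            self.last_event = "rising"
            return "rising"
        if sample <= self.falling and self.last_event != "falling":
            self.last_event = "falling"
            return "falling"
        return None


def events_for(samples: List[float], rising: float, falling: float) -> List[Optional[str]]:
    """Run a fresh alarm over a series of periodic samples."""
    alarm = HysteresisAlarm(rising=rising, falling=falling)
    return [alarm.check(s) for s in samples]
```

With a rising threshold of 75 and a falling threshold of 25, a statistic that stays above 75 for several polling periods produces only one rising event, which keeps a statistic hovering near a threshold from flooding the management station.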

RMON probes can discover new systems, or hosts, by watching the source and destination station addresses in packets passing on the LAN segment. The hosts group stores each address seen on the interface. A variety of important statistics are kept about each system, such as the number of packets sent and received, the number of error packets sent, and the number of broadcasts sent. This can be invaluable information for catching a misbehaving system that is dominating the network bandwidth on a LAN segment.

The hostsTopN group keeps track of the top systems on the segment for a specific statistic, such as packets received, error packets sent, or broadcast packets sent. The number of systems in the list is configurable.
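As a rough illustration of what a top-N report computes, the following Python sketch ranks hosts by one statistic. The function name `top_n_hosts`, the dictionary layout, and the metric names are assumptions made for this example, not part of the RMON definition.

```python
import heapq
from typing import Dict, List, Tuple


def top_n_hosts(host_stats: Dict[str, Dict[str, int]],
                metric: str, n: int) -> List[Tuple[str, int]]:
    """Return the n (address, value) pairs with the largest value of metric.

    host_stats maps a station address to its counters (hypothetical layout).
    Hosts that lack the metric are treated as having a value of 0.
    """
    ranked = ((addr, counters.get(metric, 0))
              for addr, counters in host_stats.items())
    return heapq.nlargest(n, ranked, key=lambda pair: pair[1])
```

Changing `n` corresponds to the configurable list size, and changing `metric` corresponds to choosing a different statistic to rank by.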

The matrix group can be used to determine traffic problems. Statistics are kept on the network conversations between two addresses. Information collected includes the number of packets sent in either direction, and the number of error packets sent.
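A conversation table of this kind can be sketched in Python as a dictionary keyed by the (source, destination) address pair. The names `build_matrix` and `conversation`, and the tuple format of the input, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Counters = Dict[str, int]


def build_matrix(packets: Iterable[Tuple[str, str, bool]]) -> Dict[Tuple[str, str], Counters]:
    """Count packets and error packets per (source, destination) pair.

    packets is an iterable of (src, dst, is_error) tuples (assumed format).
    """
    matrix = defaultdict(lambda: {"packets": 0, "errors": 0})
    for src, dst, is_error in packets:
        entry = matrix[(src, dst)]
        entry["packets"] += 1
        if is_error:
            entry["errors"] += 1
    return dict(matrix)


def conversation(matrix: Dict[Tuple[str, str], Counters], a: str, b: str) -> Dict[str, Counters]:
    """Return the counters for both directions of the conversation between a and b."""
    empty = {"packets": 0, "errors": 0}
    return {"a_to_b": matrix.get((a, b), empty),
            "b_to_a": matrix.get((b, a), empty)}
```

Keeping each direction as its own entry makes it easy to see when one side of a conversation sends far more traffic, or far more errors, than the other.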

The filter and capture groups work together to capture packets sent on a segment. A management station can then download the packets.

RMON focuses on the physical and data link layers of the network stack. RMON-II extends the specification to include the network, transport, session, presentation, and application layer protocols. Additional groups provided by RMON-II include:

  • Protocol Directory

  • Protocol Distribution

  • Address Mapping

  • Network Layer Host

  • Network Layer Matrix

  • Application Layer Host

  • Application Layer Matrix

The Protocol Directory lists all the protocols that the monitor supports. A monitor can add more protocols, but the extensibility is limited.

The Protocol Distribution group shows per-protocol statistics for the amount of data and number of packets being sent on a LAN segment. From this information, you can calculate the bandwidth utilization per protocol and, hopefully, isolate a troublesome application.
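The per-protocol bandwidth calculation can be sketched as follows, assuming you have two snapshots of per-protocol octet counters taken a known number of seconds apart and you know the link speed. The function name `protocol_utilization` and its parameters are hypothetical, and counter wraparound is ignored for simplicity.

```python
from typing import Dict


def protocol_utilization(octets_before: Dict[str, int],
                         octets_after: Dict[str, int],
                         interval_s: float,
                         link_bps: float) -> Dict[str, float]:
    """Return percent of link bandwidth used by each protocol over the interval.

    utilization % = (octet delta * 8 bits) / (interval seconds * link bits/s) * 100
    """
    utilization = {}
    for proto, after in octets_after.items():
        delta = after - octets_before.get(proto, 0)  # octets sent during the interval
        utilization[proto] = 100.0 * delta * 8 / (interval_s * link_bps)
    return utilization
```

For example, 7,500,000 octets of one protocol in 60 seconds on a 10-Mbit/s Ethernet segment works out to 10 percent of the available bandwidth.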

The Address Mapping group is used to map a network address to the station address to which it is bound. The Network Layer Host group then shows the amount of traffic for each network address. The Network Layer Matrix group shows network traffic statistics between pairs of network addresses. You can also see the top traffic producers.

The Application Layer Host shows network traffic statistics for a particular network address, broken down by protocol. Similarly, the Application Layer Matrix shows network traffic statistics for traffic between a pair of network addresses.

As previously mentioned, both Ethernet and Token Ring use the RMON standard. Other standards are being considered for other link types. For example, to account for the switching technology in ATM, a new specification called AMON has been proposed by Hewlett-Packard. Other vendors have additional proposals for supporting ATM.

NetMetrix Site Manager

As mentioned earlier, NetMetrix Site Manager can be used to monitor the status and utilization of your network devices. Some of the different errors reported were discussed earlier. Utilization reports can also be generated, such as the top users on a network segment, and the devices sending the most packets.

NetMetrix also includes a MIB Browser so that you can query the MIB values of a selected network device.

MeasureWare

The MeasureWare Agent is a Hewlett-Packard product that collects and logs resource and performance metrics. MeasureWare agents run and collect data on the individual server systems being monitored. Agents exist for many platforms and operating systems, including HP-UX, Solaris, and AIX.

The MeasureWare Agent collects data, summarizes it, timestamps it, logs it, and sends alarms when appropriate. The agents collect and report on a wide variety of system resources, performance metrics, and user-defined data. The information can then be exported to spreadsheets or to performance analysis programs, such as PerfView. The data can be used by these programs to generate alarms to warn of potential performance problems. By using historical data, trends can be discovered. This can help to solve resource issues before they affect system performance.

MeasureWare agents collect data at three different levels: global system metrics, application metrics, and process metrics. Global and application data is summarized at five-minute intervals, whereas process data is summarized at one-minute intervals. Important applications can be defined by an administrator by listing the processes that make up the application in a configuration file.
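To make the idea of interval summarization concrete, here is a minimal Python sketch that groups timestamped samples into fixed-length intervals and averages them. It is not MeasureWare's actual algorithm; the function name `summarize` and the use of a simple mean are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def summarize(samples: Iterable[Tuple[int, float]],
              interval_s: int = 300) -> Dict[int, float]:
    """Average (epoch_seconds, value) samples over fixed intervals.

    Returns a mapping from the start time of each interval to the mean of the
    samples that fall inside it. The default of 300 seconds matches the
    five-minute interval used for global and application data.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        bucket_start = timestamp - timestamp % interval_s
        buckets[bucket_start].append(value)
    return {start: sum(values) / len(values)
            for start, values in sorted(buckets.items())}
```

Calling the same function with `interval_s=60` gives the one-minute granularity used for process data.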

The basic categories of MeasureWare data are system, application, process, and transaction. Optional modules exist for database and networking support, too. MeasureWare agents also collect data provided through the Data Source Integration (DSI) interface. For instance, data from NetMetrix is provided by the Network Response Facility (NRF) via DSI integration.

The following network metrics are available from MeasureWare. Additional metrics provided by MeasureWare are covered in other chapters.

  • Number of configured LAN interfaces

  • Number and rate of NFS requests during interval

  • Rate of LAN errors

  • Rate of LAN collisions

  • Number and rate of inbound LAN packets, per LAN interface, during interval

  • Number and rate of outbound LAN packets, per LAN interface, during interval

  • Number and rate of LAN errors, per LAN interface, during interval

  • Number and rate of LAN collisions, per LAN interface, during interval

MeasureWare agents provide data and alarms to PerfView for analysis, and also to the IT/O management console. SNMP traps can be sent at the time that a threshold condition is met. Automated actions can be taken, or the operator can choose to take a suggested action.

MeasureWare's extract command can be used to export data to other tools, such as spreadsheet programs.

Application Resource Measurement (ARM) APIs can be used to instrument applications so that response times can be measured. The application response time information can be passed along to MeasureWare agents for analysis. The ARM APIs are described in more detail in Chapter 7.

GlancePlus

GlancePlus is a real-time, graphical, performance monitoring tool from Hewlett-Packard. It is used to monitor the performance and system resource utilization of a single system. Both Motif-based and character-based interfaces are available. The product can be used on HP-UX, Sun Solaris, and many other operating systems.

GlancePlus collects information similar to that collected by MeasureWare, and samples data more frequently than MeasureWare. GlancePlus can be used to graphically view current system and network resource activity and utilization. It can also show application and process information. Transaction information can be shown if the MeasureWare Agent is installed and active. In addition to system metrics, GlancePlus can show alarm information, color-coded to reflect severity.

GlancePlus is also capable of setting and receiving performance-related alarms. Customizable rules determine when a system performance problem should be sent as an alarm. The rules are managed by the GlancePlus Adviser. The Adviser menu includes the Edit Adviser Syntax option, which you can select to view and optionally modify all alarm conditions.

Alarms result in onscreen notification, with color representing the criticality of an alarm. An alarm can also trigger a command or script to be executed automatically. Instead of sending an alarm, GlancePlus can print messages, or notify you by executing a UNIX command, such as mailx, by using its EXEC feature.

To configure events, you need to edit a configuration file. The GlancePlus Adviser syntax file (/var/opt/perf/adviser.syntax) contains the symptom and alarm configuration. Additional syntax files can also be used. A condition for an alarm to be sent can be based on rules involving different symptoms.

You can also execute scripts in command mode. To execute a script, type:

 glance -adviser_only -syntax <script file name>

GlancePlus can be used to show general network performance characteristics for a system. For example, GlancePlus can show the rate of incoming network packets, and the percentage of all packets that were errors or Ethernet collisions. This can be shown either globally or for each network interface. An example of a GlancePlus networking graph is shown in Figure 6-9. You can see packet transmission rates and error rates in this graph.
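The packet rate and percentage figures mentioned above can be derived from raw interface counters. The Python sketch below shows one way to do it; the function name `packet_percentages` is hypothetical, and the choice of denominators (errors relative to all packets, collisions relative to outbound packets) is an assumption rather than the documented GlancePlus definition.

```python
from typing import Dict


def packet_percentages(in_pkts: int, out_pkts: int, errors: int,
                       collisions: int, interval_s: float) -> Dict[str, float]:
    """Derive a packet rate and error/collision percentages for one interval.

    All counts are the number of events seen during the interval.
    """
    total = in_pkts + out_pkts
    return {
        # Incoming packets per second over the interval.
        "in_rate": in_pkts / interval_s,
        # Assumed definition: errors as a share of all packets.
        "error_pct": 100.0 * errors / total if total else 0.0,
        # Assumed definition: collisions as a share of outbound packets.
        "collision_pct": 100.0 * collisions / out_pkts if out_pkts else 0.0,
    }
```

The same function can be applied to the counters of a single interface or to the sum across all interfaces to get the global view.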

Figure 6-9. GlancePlus view of network performance.


GlancePlus can also be used to focus on a specific network problem area, such as a network interface. Also, focusing on a specific network interface can enable the operator to ignore noncritical network components, such as the interface to a subnetwork under development.

For NFS, GlancePlus keeps track of performance statistics, such as the number of read operations per second, number of write operations per second, and number of I/O operations per second for all clients or servers communicating with the server.

More than 600 metrics are accessible from GlancePlus. Some of these metrics are discussed in other chapters. The complete list of metrics can be found by using the online help facility. This information can also be found in the directory /opt/perf/paperdocs/gp/C.

GlancePlus enables you to use filters to reduce the amount of information shown. For example, you can set up a filter in the process view to show only the more active system processes. GlancePlus can also show short-term historical information. When selected, the alarm buttons, visible on the main GlancePlus screen, show a history of alarms that have occurred.

GlancePlus will also show the Process Resource Manager's (PRM) behavior, if PRM is installed, and allows you to change PRM process group entitlements. PRM is a Hewlett-Packard product for dividing a system's CPU, memory, or I/O bandwidth resources among users or applications.

PerfView

PerfView is a graphical performance analysis tool from Hewlett-Packard that is used to graphically display performance and system resource utilization for one system or multiple systems simultaneously, so that comparisons can be made. A variety of performance graphs can be displayed. The graphs are based on data collected over a period of time, unlike the real-time graphs of GlancePlus. This tool runs on HP-UX or NT systems and works with data collected by MeasureWare agents.

PerfView has the following three main components:

  • PerfView Monitor: Provides the ability to receive alarms. A textual description of an alarm can be displayed. Alarms can be filtered by severity, type, or source system. Also, after an alarm is received, the alarm can be selected, to display a graph of related metrics. An operator can monitor trends leading to failures, and then take proactive actions to avoid problems. Graphs can be used for comparison between systems and to show a history of resource consumption. An internal database is maintained that keeps a history of alarm notification messages.

  • PerfView Analyzer: Provides resource and performance analyses for system resources. System metrics can be shown at three different levels: process, application (configured by the user as a set of processes), and global system information. It relies on data received from MeasureWare agents on managed nodes. Data can be analyzed from up to eight systems concurrently. All MeasureWare data sources are supported. PerfView Analyzer is required by both PerfView Monitor and PerfView Planner.

  • PerfView Planner: Provides forecasting capability. Graphs can be extrapolated into the future. A variety of graphs (such as linear, exponential, s-curve, and smoothed) can be shown for forecasted data.
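To illustrate what extrapolating a graph into the future means in the linear case, the following Python sketch fits a least-squares line to equally spaced historical values and projects it forward. It is a generic textbook technique, not PerfView Planner's implementation, and the function name `linear_forecast` is hypothetical.

```python
from typing import List, Sequence


def linear_forecast(values: Sequence[float], steps_ahead: int) -> List[float]:
    """Fit y = intercept + slope * x by least squares and extrapolate.

    values are samples taken at equally spaced times x = 0, 1, 2, ...
    Returns the predicted values for the next steps_ahead time steps.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two samples to fit a line")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [intercept + slope * (n - 1 + k) for k in range(1, steps_ahead + 1)]
```

Given weekly utilization figures of 10, 20, 30, and 40 percent, this projects 50 and 60 percent for the following two weeks, which is the kind of trend a planner would use to anticipate a resource shortage.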

PerfView's ability to show history and trend information can be helpful in diagnosing system problems. Graphing performance information can help you to understand whether a persistent problem exists or an anomaly is simply a momentary spike of activity. An example of a PerfView graph of network performance statistics is shown in Figure 6-10. Each network statistic is listed at the top of the graph and is color-coded.

Figure 6-10. PerfView graph of network performance.


To diagnose a problem further, PerfView Monitor can allow users to change time intervals, to try to find the specific time a problem occurred. The graph is redrawn showing the new time period.

PerfView is integrated with several other monitoring tools. You can launch GlancePlus from within PerfView by accessing the Tools menu. PerfView can also be launched from the IT/O Applications Bank. When troubleshooting an event in the IT/O Message Browser window, you can launch PerfView to see a related performance graph.

PerfView Monitor is not used with IT/O. Instead, the IT/O Message Browser is used. When an alarm is received in IT/O, the operator can click the alarm, and a related PerfView graph can be shown.

In a single performance graph, PerfView can show information collected from multiple systems. The PerfView and ClusterView products have also been integrated to enable the operator to select a cluster symbol on an HP OpenView submap and launch the PerfView application, to quickly show a performance comparison between the cluster systems.

Additional information about MeasureWare, GlancePlus, and PerfView can be found at the HP Resource and Performance Management Web site at http://www.openview.hp.com/solutions/application/.

BMC PATROL for UNIX

BMC PATROL does not provide a separate networking Knowledge Module (KM). Instead, the BMC System KM includes monitors for network metrics. The System KM can identify network overloads. From the console, it can show TCP/IP-related processes and list NIS accounts and groups. The alerts can be sent via SNMP to other management stations by using Patrolink.

Network General Sniffer Pro

A LAN analyzer can show you the type of traffic being transferred on your network. Identifying the protocols being used can give some indication of the types of applications running in the environment. With this information, you can determine whether protocol filtering is needed. An analyzer may be the only reliable way to detect an improperly terminated LAN cable. Network General's Sniffer Pro product views, captures, and analyzes network traffic without affecting server performance or introducing network traffic.

Many vendors now provide analyzers as pure software solutions. Sniffer Pro is one such software solution; it can run on your management station and collect data on different LAN segments. An example network performance graph is shown in Figure 6-11.

Figure 6-11. Sniffer Pro summary of LAN statistics.




UNIX Fault Management: A Guide for System Administrators
ISBN: 013026525X
Year: 1999
Pages: 90
