Resource and Performance Monitoring Tools

I l @ ve RuBoard

As mentioned earlier, the majority of database performance problems are the result of a poor database design or poorly coded database queries. Other reasons for performance problems are data access contention ("hot spots"), inadequate hardware resources, or a poorly configured database server. This section describes a variety of tools and techniques for locating database performance problems.

Application Resource Measurement

Because measuring the performance of an application is difficult, Hewlett-Packard and Tivoli Systems jointly developed an open set of APIs for measuring application response times. This standard set of APIs is called Application Resource Measurement (ARM). If an application has been modified to use the new APIs at key points, then you can measure and control the application's end-to-end performance. API calls are made in the application to mark the beginning and end of transactions. A Software Developer's Kit (SDK) is freely available for ARM from either Tivoli or Hewlett-Packard to make this process easier. Table 8-5 lists and describes the few available ARM APIs.

Baan has announced that the next release of its ERP application will be instrumented using the ARM API, which will make collecting application performance data for Baan more straightforward.

Table 8-5. ARM APIs
Name Function
arm_init Initializes ARM environment for application
arm_getid Names each transaction class used in application
arm_start Marks the start of execution of transaction instance
arm_update Updates information about transaction instance
arm_stop Marks the end of transaction
arm_end Cleans up application's memory used by ARM

After the application data is collected, it can be presented through performance management tools such as MeasureWare and HP PerfView.

Oracle Trace

Oracle has an API called Oracle Trace that is similar to the ARM APIs, except that Oracle Trace is available only for Oracle environments. Oracle7 server release 7.3.2 and SQL*NET release 2.3 are already instrumented with Oracle Trace calls. Oracle Trace can be used to instrument an Oracle application to get performance data. This set of APIs can also be used to detect the following two types of events occurring in an application:

  • Duration or transaction events: Have a start and stop time associated with them.

  • Asynchronous events: Refer to special conditions, such as errors.

Tables 8-6 and 8-7 show the Oracle Server 7.3 duration and asynchronous events, respectively, that are instrumented for Oracle Trace.

An operator can use the asynchronous events to create an audit trail of database accesses , for example. The duration of various database transactions can be studied over time to look for trends.

An application that has been modified to include calls to Oracle Trace routines is called a facility. The user configures the facility definition file to define which metrics, or items, to collect when the application is traced. Table 8-8 shows the items available, by default, for every event.

Additional cross-facility items, such as the Oracle7 Server Transaction ID, can be used to correlate data collected from multiple applications.

After an application has been instrumented, performance analysis can be enabled from the OEM console. In fact, the Oracle Trace Manager can automatically discover applications that have been instrumented using Oracle Trace APIs. A trace can be started or stopped , and data can be viewed from past or currently active traces. Applications can also start and stop tracing from within themselves . Multiple facilities can be included in the same trace. When the trace, called a collection, is stopped, the data is written to a collection file.

In addition to viewing trace information from the GUI, reports can be run against the collection file. The otrcrep command generates a detailed trace report, displaying statistics for all items associated with every occurrence of every event for every facility involved in the collection. Reports can be generated for a specific event or for a specific process, for example. Data can also be exported to an Oracle7 Server database, where data can be queried using SQL scripts. Oracle provides a set of sample scripts to show how this is done.

Table 8-6. Oracle Trace Duration Events
Event Description
Parse Start Start of event containing SQL query information (actual text of query)
Parse End End of event containing SQL query information
Execute Start Start of event containing information for execution of SQL query plan
Execute End End of event containing information for execution of SQL query plan
Fetch Start Start of event containing actual row retrieval information
Fetch End End of event containing actual row retrieval information
LogicalTX Start Start of event marking the first time a database update is performed that may change the database status
LogicalTX End End of event marking the first time a database update is performed that may change the database status
PhysicalTX Start Start of event marking a definite change in database status
PhysicalTX End End of event marking a definite change in database status
Table 8-7. Oracle Trace Asynchronous Events
Event Description
Connection Records each connection to a database
Disconnect Records each disconnection from a database
ErrorStack Code stack for core dump
Migration Session migration between shared server processes
ApplReg Application context information
RowSource Row information
SQLSegment Text of SQL statement
Wait Records a generic WAIT event; context is provided in the event strings
Table 8-8. Oracle Trace Standard Items
Name Description
UCPU Amount of CPU time in user mode
SCPU Amount of CPU time in system mode
INPUT_IO Number of times filesystem performed input
OUTPUT_IO Number of times filesystem performed output
PAGEFAULTS Number of hard and soft page faults
PAGEFAULT_IO Number of hard page faults
MAX_RSSIZE Maximum resident set size used

Oracle provides a function for calculating the elapsed time for duration events. Reports can also be generated on the frequency of different types of events, such as connection events. Statistics available include the minimum, maximum, and average times spent doing different events. The Oracle Trace APIs are listed with a brief description in Table 8-9.

Oracle V$ Tables

Oracle provides a set of virtual tables that can be used to access memory structures within the shared global area. Many of these tables contain database performance information that is continuously updated. Table 8-10 lists and describes the contents of some of the more interesting V$ tables.

Table 8-9. Oracle Trace APIs
Name Function
epc_init Initial trace call; used to register facility with Oracle Trace
epc_add_reg_id Adds a registration ID for a facility
epc_remove_reg_id Removes a registration ID for a facility
epc_start_event Records start of duration event
epc_end_event Records end of duration event
epc_event Records asynchronous event
epc_collect Starts a collection from within a facility
epc_cancel Stops a collection started by epc_collect
epc_bind Binds a process to a collection that is already active
epc_flush Writes collection data buffers to disk
epc_context Stores collection data on per-thread basis
epc_delete_context Indicates the end of thread
epc_set_cf_items Sends cross-facility items to another process
epc_cf_value Assigns values to cross-facility items
epc_get_cf_items Obtains cross-facility items from another process
Table 8-10. Oracle V$ Tables
V$ Table Contains
V$DATABASE Database information from the control file
V$FILESTAT File read/write statistics
V$INSTANCE Status information for the current instance
V$LIBRARYCACHE Library cache performance information
V$LICENSE Information on license limits
V$PROCESS Process information
V$ROWCACHE Data dictionary performance information
V$SESSION Session information
V$SESSTAT Information on the memory usage for each current session
V$SGA Summary information on system global area
V$SGASTAT Detailed information on system global area
V$SYSSTAT Current system-wide value for each variable in V$SESSTAT

To get a list of all available Oracle V$ performance monitoring tables, execute the following SQL query:SELECT * FROM X$KQFVI;

For more information on the V$ tables, consult Oracle's Oracle Server Tuning Guide and Oracle Server Reference manuals.

GlancePlus Pak 2000

GlancePlus Pak 2000 is a product from Hewlett-Packard. It includes performance management, fault management, and real-time performance monitoring. It has three components :

  • MeasureWare

  • GlancePlus

  • ITO/Special Edition

MeasureWare is an HP product that collects and logs resource and performance metrics. MeasureWare agents run and collect data on the individual server systems being monitored . Agents exist for many platforms and operating systems, including HP-UX, Solaris, and AIX.

The MeasureWare agents collect data, summarize it, timestamp it, log it, and send alarms when appropriate. The agents collect and report on a wide variety of system resources, performance metrics, and user-defined data. The information can then be exported to spreadsheets or to performance analysis programs, such as PerfView. The data can be used by these programs to generate alarms to warn of potential performance problems. By using historical data, you can discover trends, which can help you address resource issues before they affect system performance.

MeasureWare collects system, application, process, and transaction data. MeasureWare agents collect data at three different levels: global system metrics, application, and process metrics. Global and application data is summarized at five-minute intervals, whereas process data is summarized at one-minute intervals.

GlancePlus is a real-time, graphical, performance monitoring tool from Hewlett-Packard. It is used to monitor the performance and system resource utilization of a single system. You should use GlancePlus to see whether a performance problem can be isolated to the system or the database. Both Motif-based and character-based interfaces are available. The product can be used on HP-UX, Solaris, and many other operating systems.

GlancePlus collects information similar to MeasureWare, but samples data more frequently. GlancePlus can be used to graphically view current CPU, memory, swap, and disk activity, as well as utilization at the system level. It can also show application and process information. Transaction information can be shown if the MeasureWare Agent is installed and active.

ITO/Special Edition is the single-system version of IT/O. From the Java-based GUI, you can view events from a single system in the Event Browser. Diagnostic tools are also provided from the GUI. ITO/Special Edition uses IT/O's intelligent agent technology, which includes the ability to collect data from various sources and execute automated actions.

To monitor an application in GlancePlus or MeasureWare, the application must first be defined. To define important applications, you must first define the processes that comprise the application. This is done in the parm file (located at /var/opt/perf/parm), in which you list the process names that make up the application. Wildcards can be used if the exact names vary. For example, ora* could be used to define an application based on all processes beginning with ora. GlancePlus and MeasureWare summarize process information for processes listed in the application definition.

Customizable rules determine when a system performance problem should be sent as an alarm. In GlancePlus, the rules are managed by the GlancePlus Adviser. Choosing the Edit Adviser Syntax option on the Adviser menu enables you to see and modify all alarm conditions. The GlancePlus Adviser syntax file (/var/opt/perf/adviser.syntax) contains the symptom and alarm configuration. Additional syntax files can also be used. For MeasureWare, alarms are defined in the alarmdef file, located at /var/opt/perf/alarmdef.

In GlancePlus, alarms result in onscreen notifications, with color representing the criticality of an alarm. An alarm can also trigger a command or script to be executed automatically. Instead of sending an alarm, GlancePlus can print messages or notify you by executing a UNIX command, such as mailx, using its EXEC feature.

You can also have alarms sent according to conditions that are based on a combination of metrics. For example, a CPU bottleneck alarm can be based on the CPU use and CPU run queue length. MeasureWare agents provide these alarms to PerfView for analysis, and to the IT/O management console. SNMP traps can also be sent at the time a threshold condition is met. Automated actions can be taken, or the operator can choose to take a suggested action.

Application- and transaction-level information is available. Transactions must be defined by the application using the ARM API. The following are some of the available metrics:

  • Transaction average response time

  • Distribution of transaction response times

  • Number of aborted transactions

  • Response time of aborted transactions

More than 600 metrics are accessible from GlancePlus and MeasureWare. The complete list of metrics can be found by using the product's online help facility. On systems with GlancePlus Pak 2000 installed, this information can also be found in the directory /opt/perf/paperdocs/gp/C.

Hewlett-Packard also provides a separate product, the Process Resource Manager (PRM), which can be used to control CPU, memory, and disk utilization between users or applications. PRM has done some special integration with Oracle, as well. If PRM is being used, GlancePlus shows how well PRM application groups are staying within their resource entitlements . From GlancePlus, you can also change PRM process group entitlements.

Although MeasureWare provides extensive performance and resource information, it has limited configuration information and no data about system faults. For further information, visit the HP Resource and Performance Management Web site at http://www.openview.hp.com/solutions/application/.

Oracle Management Pak

The Oracle Management Pak from Hewlett-Packard can be used to monitor Oracle databases and is used in conjunction with GlancePlus Pak 2000. You can also migrate from this single-server-environment solution to an enterprise solution simply by adding the intelligent agent for IT/O.

Numerous database metrics are available. Oracle Management Pak can report the status of the database instance, individual tablespaces and rollback segments, and configured Oracle background processes. The number of snapshot refresh errors can also be reported .

The following is a summary of the resource management metrics that are included for Oracle:

  • Maximum number of sessions since server was last started

  • Percentage of free tablespace to total available

  • Number of free chunks in tablespace

  • Number of segments that cannot add extents

  • Number of segments approaching maximum extents

  • Percentage of open cursors to total cursors configured

  • Percentage of DML locks used

  • Percentage of maximum processes used

  • Percentage of enqueue resources used

  • Number of current user logons

  • Archive free-space percentage

  • Space remaining on dump devices

In addition to resource management, the following Oracle performance metrics are available:

  • Ratio of blocks read to physical reads

  • Disk sort rate

  • Percentage of memory sorts to total

  • Average number of rows per sort

  • Block buffer hit ratio (current and total)

  • Dictionary cache hit ratio

  • Library cache reload percentage

  • Percentage of free memory to total shared pool memory

  • Transaction commit rate

  • Transaction rollback rate

  • Percentage of rollbacks to commits

  • Background checkpoint completion rate

  • User call rate

  • Ratio of recursive calls to user calls

  • Percentage of chained rows fetched to total

  • Rate at which full table scans occur

  • Buffer busy rate

  • Number of waits for redo log buffer space

  • Percentage of rollback buffer waits to gets

All of the threshold monitoring and real-time graphs included with GlancePlus Pak 2000 can be applied to these database metrics as well.

PerfView

PerfView is a graphical performance analysis tool from Hewlett-Packard. It is used to graphically display performance and system resource utilization for one system or for multiple systems simultaneously , so comparisons can be made. A variety of performance graphs can be displayed. The graphs are based on data collected over a period of time, unlike the real-time graphs of GlancePlus, which show only recent history. PerfView runs on HP-UX or NT systems and works with data collected by MeasureWare agents.

PerfView has three main components:

  • PerfView Monitor: Provides the ability to receive alarms. A textual description of an alarm can be displayed. Alarms can be filtered by severity, type, or source system. Also, after an alarm is received, the alarm can be selected, which displays a graph of related metrics. An operator can monitor trends leading to failures, and can then take proactive actions to avoid problems. Graphs can be used for comparison between systems and to show a history of resource consumption. An internal database is maintained that keeps a history of alarm notification messages.

  • PerfView Analyzer: Provides resource and performance analysis for disks and other resources. System metrics can be shown at three different levels: process, application (configured by the user as a set of processes), and global system information. It relies on data received from MeasureWare agents on the managed nodes. Data can be analyzed from up to eight systems concurrently. All MeasureWare data sources are supported. Perf-View Analyzer is required by both PerfView Monitor and PerfView Planner.

  • PerfView Planner: Provides forecasting capability. Graphs can be extrapolated into the future. A variety of graphs (such as linear, exponential, s-curve , and smoothed) can be shown for forecasted data.

In addition to graphing and analyzing system resources, process and application resources can be graphed and analyzed using PerfView. With PerfView, users can define the set of processes that make up an application. PerfView can then be used to show a history of a specified application's utilization. PerfView's ability to show history and trend information can be helpful in diagnosing system problems. Graphing performance information can help you understand whether a persistent problem exists or an anomaly is simply a momentary spike of activity.

With PerfView, you can analyze, alarm, forecast, and report on the Oracle Management Pak metrics and compare them with system, network, and application metrics. The graph in Figure 8-2 shows commit, or transaction, rates over time.

Figure 8-2. Graphing transaction rates in PerfView.

graphics/08fig02.gif

To diagnose a problem further, PerfView Monitor enables you to change time intervals, to try to find the specific time a problem occurred. The graph is redrawn showing the new time period.

PerfView is integrated with several other monitoring tools. You can launch GlancePlus from within PerfView by accessing the Tools menu. PerfView can be launched from the IT/O Applications Bank. When troubleshooting an event in the IT/O Message Browser window, you can launch PerfView to see a related performance graph.

When using IT/O with PerfView, the PerfView Monitor is disabled. The IT/O Message Browser should be used to view alarms, instead. When an alarm is received in IT/O, you can click the alarm to display a related PerfView graph.

In a single performance graph, PerfView can show information collected from multiple systems. The PerfView and ClusterView products have also been integrated, enabling you to select a cluster symbol on an HP OpenView submap and launch the PerfView application, to quickly show a performance comparison between all systems in the cluster.

For further information, visit the HP Resource and Performance Management Web site at http://www.openview.hp.com/solutions/application/.

SMART Plug-Ins for Databases

IT/Operations is meant to provide a consistent interface for managing systems and networks. The goal of HP's SMART Plug-Ins (SPIs) is to extend support for applications and databases. SPIs are preconfigured solutions that are intended to be used with IT/O, although IT/O is not required. Today, SPIs exist for SAP R/3, Baan, Oracle, Informix, and Sybase.

SPIs for Databases provide the same set of metrics as provided by the Oracle Management Pak, plus the ability to report information via IT/O's reliable RPC mechanism, not via SNMP traps.

The tight integration between SPIs and IT/O means that other features are available to SPIs, such as event filtering and correlation, and task escalation. Events include built-in instructions and operator text that can aid in diagnosis and recovery. Corrective actions are predefined for many SPI events.

Figure 8-3 shows how the SPI for Informix integrates into IT/O. You can see the new message groups that were added for Informix by the SPI for Informix. The Application Group window shows all the database tools added for SPI for Informix. The Message Browser enables you to see messages about the Informix database.

Figure 8-3. Using IT/O to see events about Informix databases.

graphics/08fig03.gif

Unlike EMS, which is focused on fault management, SPIs for Databases focus on database configuration and resource management. IT/O templates can specify different thresholds for different database instances. SPIs can verify that the database processes defined in Oracle's init.ora configuration file are actually running, and their performance can be monitored in GlancePlus and PerfView as a single application.

BMC PATROL Knowledge Modules

BMC Software, Inc. provides monitoring capabilities through its PATROL software suite. With PATROL, you can define thresholds and send events to a console or to other management applications. PATROL consists of a console, intelligent agents, and more than 40 different monitoring programs, or Knowledge Modules (KMs). Each KM is optional, so you can pick the monitors that are best for your environment.

The console provides a centralized graphical display, with icons representing system components or other monitored components. Icons change color to correspond to their status. For example, an icon may change to red to represent a critical event on its node or component. The PATROL console also provides an Event Browser, in which you can filter, correlate, sort, and escalate events. Alarms can be configured so that events are sent to the console and shown in the Event Browser. Events can also be configured to go to HP OpenView IT/O or other SNMP-based management stations .

Intelligent agents provide the ability to discover the system, database, and application components in an enterprise. The agents reside on each server. On an ongoing basis, the agents look for problems; when they discover a problem, they either take preconfigured actions or send notification so that recovery can be done manually.

The KMs contain the expertise used by PATROL to know what to monitor and how to react when problems occur. KMs are used to monitor a set of "parameters," which can include a description of the monitored attribute, the polling interval, the method for measuring the attribute, and a threshold for abnormal values. The KMs provide rules to detect events and perform corrective actions. Events are sent to an operator console when an error or warning condi tion occurs. BMC PATROL includes KMs for UNIX, SAP R/3, Oracle, Informix, and other applications.

BMC's primary business is database and application monitoring. It provides KMs for Oracle, Sybase, and Informix databases. BMC also has KMs for DB2, Red Brick, and Ingres. PATROL arguably provides more database metrics than are available from the other products described in this chapter, although you may need to include multiple KMs.

For Oracle environments, PATROL monitors more than 70 metrics for Oracle servers, tablespaces, and users. Metrics include user connection information, active locks, I/O statistics, dictionary hit ratios, and CPU utilization. Remote servers can be monitored by using SQL*NET, even if the server is not running the PATROL agent software. After the Oracle KM is loaded, PATROL determines all the Oracle database instances on the system. The PATROL administrator manually starts and stops the monitoring of instances.

PATROL provides the following categories of monitoring information for Oracle:

  • Overall Database Performance

  • Server Instance Information

    • Log Buffer

    • Shared Pool

    • Internal Structures

  • Multithreaded Server Information

  • Database Information

    • General Statistics

    • Trace & Archive

    • Space Management

  • User and Application Information

  • Transaction Information

    • General Statistics

    • Block Level

    • Row Level

    • User & Application

    • Rollback

  • User Session Information

    • Backup and Recovery

    • Miscellaneous

A variety of database performance graphs are available as icons from the PATROL console. Figure 8-4 shows some of the metrics maintained by the KM for Oracle. You can double-click these icons to see a graph or gauge.

Figure 8-4. BMC PATROL icons for performance graphs.

graphics/08fig04.gif

BMC also provides database administration tools. PATROL DB includes database administrative tools such as Pathfinder, DB-Alter, DB- Reorg , DB-Change Manager, DB-Integrity, DB-Voyager, and SQL-Explorer. These tools are integrated into the PATROL framework and can be launched from the PATROL console. By contrast, IT/O provides monitoring only, relying on database vendors to provide these administrative tools.

BMC has bundled its database products into a PATROL Availability Suite for Oracle. The product bundle includes the PATROL KM for Oracle, PATROL DB-Stats for Oracle, PATROL DB-Reorg for Oracle, and PATROL DB-Integrity.

Unlike MC/ServiceGuard, PATROL does not provide any failover capability. However, a limited set of automated recovery actions is available. Recovery actions can be performed by the agents without requiring communication with the console.

HP used to rely on PATROL to provide database information to its MeasureWare Agent, but the Database KMs were resold through a special licensing agreement, so HP now uses its SPIs for Databases or the Oracle Management Pak to gather database information for MeasureWare.

PLATINUM DBVision

PLATINUM Technology, Inc. provides the ProVision suite of products for monitoring the performance of servers, databases, applications, and networks. ProVision consists of four products:

  • ServerVision: Provides performance information about the computer system, including the load average, number of current users, disk rates, most active CPU process, and CPU and disk usage per process.

  • DBVision: PLATINUM's product for database monitoring, it can monitor Oracle, Sybase, Informix, and DB2 databases. A Motif GUI and a Windows NT GUI are supported. The emphasis is on resource and performance monitoring. DBVision can detect problems such as running out of available tablespace, high CPU utilization, low swap space, processes waiting for locks, and runaway processes. The product can also be used to see which user is tying up system resources.

  • TransTracker: A pre-deployment tool for measuring the system, database, and network usage of a transaction. Instead of requiring an application to be instrumented, TransTracker is able to identify transactions by collecting data from network packets.

  • WireTap: Also used to measure transaction response time, it provides network monitoring by capturing packet-level data and categorizing data by protocol (such as HTTP and TCP). The user can identify SQL statements to look for, and define alarm thresholds so that notification can be sent when transaction response times are not acceptable. WireTap provides Web performance data for HTML requests , as well as Web server "hit" rates.

When problems are detected , alerts can be sent to a pager, via e-mail, or to a centralized console receiving information from multiple databases. In response to an event, predetermined corrective actions can be taken, which can be defined in customized scripts that are executed automatically.

The database performance data provided by DBVision includes the following statistics:

  • Average number of physical writes per second per database instance

  • Average total physical reads per second

  • Average total physical writes per second

  • Average block reads per second

  • Average block writes per second

  • Average number of logical reads per second

  • Total active transactions

  • Active transactions per session

  • Block change ratio

DBVision also includes database resource information, such as:

  • Free tablespace

  • Number of current logins

  • Tablespace fragmentation

  • A table's number of extents

  • Free space per chunk

In addition to database monitoring, PLATINUM provides tools for database administration and for identifying database performance bottlenecks. Database administration tools include Enterprise DBA and TSReorg.

Enterprise DBA provides user administration, schema, and content management. Administrative tasks can be done across different databases, with Oracle, Informix, and DB2 databases supported. Remote agents are used to change the database.

TSReorg supports both Oracle and Sybase databases. It can be used to reorganize database tables, indexes, and tablespaces. Using DBVision and TSReorg together, you can configure corrective actions to be performed automatically by TSReorg in response to a DBVision event. Tablespaces can be monitored so that database reorganization is triggered automatically. Tablespaces can be reorganized while a database stays online. TSReorg can also be used to view database structures graphically, showing where database fragmentation exists.

A Database Analyzer tool can be used to graphically display tablespace and database usage information, which can be used for capacity planning. The Oracle Server Manager can also be used to monitor Oracle databases. Log Analyzer can be used as an additional troubleshooting tool for Sybase and DB2 database environments.

I l @ ve RuBoard


UNIX Fault Management. A Guide for System Administrators
UNIX Fault Management: A Guide for System Administrators
ISBN: 013026525X
EAN: 2147483647
Year: 1999
Pages: 90

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net