I l @ ve RuBoard |
As mentioned earlier, the majority of database performance problems are the result of a poor database design or poorly coded database queries. Other reasons for performance problems are data access contention ("hot spots"), inadequate hardware resources, or a poorly configured database server. This section describes a variety of tools and techniques for locating database performance problems. Application Resource MeasurementBecause measuring the performance of an application is difficult, Hewlett-Packard and Tivoli Systems jointly developed an open set of APIs for measuring application response times. This standard set of APIs is called Application Resource Measurement (ARM). If an application has been modified to use the new APIs at key points, then you can measure and control the application's end-to-end performance. API calls are made in the application to mark the beginning and end of transactions. A Software Developer's Kit (SDK) is freely available for ARM from either Tivoli or Hewlett-Packard to make this process easier. Table 8-5 lists and describes the few available ARM APIs. Baan has announced that the next release of its ERP application will be instrumented using the ARM API, which will make collecting application performance data for Baan more straightforward. Table 8-5. ARM APIs
After the application data is collected, it can be presented through performance management tools such as MeasureWare and HP PerfView. Oracle TraceOracle has an API called Oracle Trace that is similar to the ARM APIs, except that Oracle Trace is available only for Oracle environments. Oracle7 server release 7.3.2 and SQL*NET release 2.3 are already instrumented with Oracle Trace calls. Oracle Trace can be used to instrument an Oracle application to get performance data. This set of APIs can also be used to detect the following two types of events occurring in an application:
Tables 8-6 and 8-7 show the Oracle Server 7.3 duration and asynchronous events, respectively, that are instrumented for Oracle Trace. An operator can use the asynchronous events to create an audit trail of database accesses , for example. The duration of various database transactions can be studied over time to look for trends. An application that has been modified to include calls to Oracle Trace routines is called a facility. The user configures the facility definition file to define which metrics, or items, to collect when the application is traced. Table 8-8 shows the items available, by default, for every event. Additional cross-facility items, such as the Oracle7 Server Transaction ID, can be used to correlate data collected from multiple applications. After an application has been instrumented, performance analysis can be enabled from the OEM console. In fact, the Oracle Trace Manager can automatically discover applications that have been instrumented using Oracle Trace APIs. A trace can be started or stopped , and data can be viewed from past or currently active traces. Applications can also start and stop tracing from within themselves . Multiple facilities can be included in the same trace. When the trace, called a collection, is stopped, the data is written to a collection file. In addition to viewing trace information from the GUI, reports can be run against the collection file. The otrcrep command generates a detailed trace report, displaying statistics for all items associated with every occurrence of every event for every facility involved in the collection. Reports can be generated for a specific event or for a specific process, for example. Data can also be exported to an Oracle7 Server database, where data can be queried using SQL scripts. Oracle provides a set of sample scripts to show how this is done. Table 8-6. Oracle Trace Duration Events
Table 8-7. Oracle Trace Asynchronous Events
Table 8-8. Oracle Trace Standard Items
Oracle provides a function for calculating the elapsed time for duration events. Reports can also be generated on the frequency of different types of events, such as connection events. Statistics available include the minimum, maximum, and average times spent doing different events. The Oracle Trace APIs are listed with a brief description in Table 8-9. Oracle V$ TablesOracle provides a set of virtual tables that can be used to access memory structures within the shared global area. Many of these tables contain database performance information that is continuously updated. Table 8-10 lists and describes the contents of some of the more interesting V$ tables. Table 8-9. Oracle Trace APIs
Table 8-10. Oracle V$ Tables
To get a list of all available Oracle V$ performance monitoring tables, execute the following SQL query:SELECT * FROM X$KQFVI; For more information on the V$ tables, consult Oracle's Oracle Server Tuning Guide and Oracle Server Reference manuals. GlancePlus Pak 2000GlancePlus Pak 2000 is a product from Hewlett-Packard. It includes performance management, fault management, and real-time performance monitoring. It has three components :
MeasureWare is an HP product that collects and logs resource and performance metrics. MeasureWare agents run and collect data on the individual server systems being monitored . Agents exist for many platforms and operating systems, including HP-UX, Solaris, and AIX. The MeasureWare agents collect data, summarize it, timestamp it, log it, and send alarms when appropriate. The agents collect and report on a wide variety of system resources, performance metrics, and user-defined data. The information can then be exported to spreadsheets or to performance analysis programs, such as PerfView. The data can be used by these programs to generate alarms to warn of potential performance problems. By using historical data, you can discover trends, which can help you address resource issues before they affect system performance. MeasureWare collects system, application, process, and transaction data. MeasureWare agents collect data at three different levels: global system metrics, application, and process metrics. Global and application data is summarized at five-minute intervals, whereas process data is summarized at one-minute intervals. GlancePlus is a real-time, graphical, performance monitoring tool from Hewlett-Packard. It is used to monitor the performance and system resource utilization of a single system. You should use GlancePlus to see whether a performance problem can be isolated to the system or the database. Both Motif-based and character-based interfaces are available. The product can be used on HP-UX, Solaris, and many other operating systems. GlancePlus collects information similar to MeasureWare, but samples data more frequently. GlancePlus can be used to graphically view current CPU, memory, swap, and disk activity, as well as utilization at the system level. It can also show application and process information. Transaction information can be shown if the MeasureWare Agent is installed and active. ITO/Special Edition is the single-system version of IT/O. From the Java-based GUI, you can view events from a single system in the Event Browser. Diagnostic tools are also provided from the GUI. ITO/Special Edition uses IT/O's intelligent agent technology, which includes the ability to collect data from various sources and execute automated actions. To monitor an application in GlancePlus or MeasureWare, the application must first be defined. To define important applications, you must first define the processes that comprise the application. This is done in the parm file (located at /var/opt/perf/parm), in which you list the process names that make up the application. Wildcards can be used if the exact names vary. For example, ora* could be used to define an application based on all processes beginning with ora. GlancePlus and MeasureWare summarize process information for processes listed in the application definition. Customizable rules determine when a system performance problem should be sent as an alarm. In GlancePlus, the rules are managed by the GlancePlus Adviser. Choosing the Edit Adviser Syntax option on the Adviser menu enables you to see and modify all alarm conditions. The GlancePlus Adviser syntax file (/var/opt/perf/adviser.syntax) contains the symptom and alarm configuration. Additional syntax files can also be used. For MeasureWare, alarms are defined in the alarmdef file, located at /var/opt/perf/alarmdef. In GlancePlus, alarms result in onscreen notifications, with color representing the criticality of an alarm. An alarm can also trigger a command or script to be executed automatically. Instead of sending an alarm, GlancePlus can print messages or notify you by executing a UNIX command, such as mailx, using its EXEC feature. You can also have alarms sent according to conditions that are based on a combination of metrics. For example, a CPU bottleneck alarm can be based on the CPU use and CPU run queue length. MeasureWare agents provide these alarms to PerfView for analysis, and to the IT/O management console. SNMP traps can also be sent at the time a threshold condition is met. Automated actions can be taken, or the operator can choose to take a suggested action. Application- and transaction-level information is available. Transactions must be defined by the application using the ARM API. The following are some of the available metrics:
More than 600 metrics are accessible from GlancePlus and MeasureWare. The complete list of metrics can be found by using the product's online help facility. On systems with GlancePlus Pak 2000 installed, this information can also be found in the directory /opt/perf/paperdocs/gp/C. Hewlett-Packard also provides a separate product, the Process Resource Manager (PRM), which can be used to control CPU, memory, and disk utilization between users or applications. PRM has done some special integration with Oracle, as well. If PRM is being used, GlancePlus shows how well PRM application groups are staying within their resource entitlements . From GlancePlus, you can also change PRM process group entitlements. Although MeasureWare provides extensive performance and resource information, it has limited configuration information and no data about system faults. For further information, visit the HP Resource and Performance Management Web site at http://www.openview.hp.com/solutions/application/. Oracle Management PakThe Oracle Management Pak from Hewlett-Packard can be used to monitor Oracle databases and is used in conjunction with GlancePlus Pak 2000. You can also migrate from this single-server-environment solution to an enterprise solution simply by adding the intelligent agent for IT/O. Numerous database metrics are available. Oracle Management Pak can report the status of the database instance, individual tablespaces and rollback segments, and configured Oracle background processes. The number of snapshot refresh errors can also be reported . The following is a summary of the resource management metrics that are included for Oracle:
In addition to resource management, the following Oracle performance metrics are available:
All of the threshold monitoring and real-time graphs included with GlancePlus Pak 2000 can be applied to these database metrics as well. PerfViewPerfView is a graphical performance analysis tool from Hewlett-Packard. It is used to graphically display performance and system resource utilization for one system or for multiple systems simultaneously , so comparisons can be made. A variety of performance graphs can be displayed. The graphs are based on data collected over a period of time, unlike the real-time graphs of GlancePlus, which show only recent history. PerfView runs on HP-UX or NT systems and works with data collected by MeasureWare agents. PerfView has three main components:
In addition to graphing and analyzing system resources, process and application resources can be graphed and analyzed using PerfView. With PerfView, users can define the set of processes that make up an application. PerfView can then be used to show a history of a specified application's utilization. PerfView's ability to show history and trend information can be helpful in diagnosing system problems. Graphing performance information can help you understand whether a persistent problem exists or an anomaly is simply a momentary spike of activity. With PerfView, you can analyze, alarm, forecast, and report on the Oracle Management Pak metrics and compare them with system, network, and application metrics. The graph in Figure 8-2 shows commit, or transaction, rates over time. Figure 8-2. Graphing transaction rates in PerfView.
To diagnose a problem further, PerfView Monitor enables you to change time intervals, to try to find the specific time a problem occurred. The graph is redrawn showing the new time period. PerfView is integrated with several other monitoring tools. You can launch GlancePlus from within PerfView by accessing the Tools menu. PerfView can be launched from the IT/O Applications Bank. When troubleshooting an event in the IT/O Message Browser window, you can launch PerfView to see a related performance graph. When using IT/O with PerfView, the PerfView Monitor is disabled. The IT/O Message Browser should be used to view alarms, instead. When an alarm is received in IT/O, you can click the alarm to display a related PerfView graph. In a single performance graph, PerfView can show information collected from multiple systems. The PerfView and ClusterView products have also been integrated, enabling you to select a cluster symbol on an HP OpenView submap and launch the PerfView application, to quickly show a performance comparison between all systems in the cluster. For further information, visit the HP Resource and Performance Management Web site at http://www.openview.hp.com/solutions/application/. SMART Plug-Ins for DatabasesIT/Operations is meant to provide a consistent interface for managing systems and networks. The goal of HP's SMART Plug-Ins (SPIs) is to extend support for applications and databases. SPIs are preconfigured solutions that are intended to be used with IT/O, although IT/O is not required. Today, SPIs exist for SAP R/3, Baan, Oracle, Informix, and Sybase. SPIs for Databases provide the same set of metrics as provided by the Oracle Management Pak, plus the ability to report information via IT/O's reliable RPC mechanism, not via SNMP traps. The tight integration between SPIs and IT/O means that other features are available to SPIs, such as event filtering and correlation, and task escalation. Events include built-in instructions and operator text that can aid in diagnosis and recovery. Corrective actions are predefined for many SPI events. Figure 8-3 shows how the SPI for Informix integrates into IT/O. You can see the new message groups that were added for Informix by the SPI for Informix. The Application Group window shows all the database tools added for SPI for Informix. The Message Browser enables you to see messages about the Informix database. Figure 8-3. Using IT/O to see events about Informix databases.
Unlike EMS, which is focused on fault management, SPIs for Databases focus on database configuration and resource management. IT/O templates can specify different thresholds for different database instances. SPIs can verify that the database processes defined in Oracle's init.ora configuration file are actually running, and their performance can be monitored in GlancePlus and PerfView as a single application. BMC PATROL Knowledge ModulesBMC Software, Inc. provides monitoring capabilities through its PATROL software suite. With PATROL, you can define thresholds and send events to a console or to other management applications. PATROL consists of a console, intelligent agents, and more than 40 different monitoring programs, or Knowledge Modules (KMs). Each KM is optional, so you can pick the monitors that are best for your environment. The console provides a centralized graphical display, with icons representing system components or other monitored components. Icons change color to correspond to their status. For example, an icon may change to red to represent a critical event on its node or component. The PATROL console also provides an Event Browser, in which you can filter, correlate, sort, and escalate events. Alarms can be configured so that events are sent to the console and shown in the Event Browser. Events can also be configured to go to HP OpenView IT/O or other SNMP-based management stations . Intelligent agents provide the ability to discover the system, database, and application components in an enterprise. The agents reside on each server. On an ongoing basis, the agents look for problems; when they discover a problem, they either take preconfigured actions or send notification so that recovery can be done manually. The KMs contain the expertise used by PATROL to know what to monitor and how to react when problems occur. KMs are used to monitor a set of "parameters," which can include a description of the monitored attribute, the polling interval, the method for measuring the attribute, and a threshold for abnormal values. The KMs provide rules to detect events and perform corrective actions. Events are sent to an operator console when an error or warning condi tion occurs. BMC PATROL includes KMs for UNIX, SAP R/3, Oracle, Informix, and other applications. BMC's primary business is database and application monitoring. It provides KMs for Oracle, Sybase, and Informix databases. BMC also has KMs for DB2, Red Brick, and Ingres. PATROL arguably provides more database metrics than are available from the other products described in this chapter, although you may need to include multiple KMs. For Oracle environments, PATROL monitors more than 70 metrics for Oracle servers, tablespaces, and users. Metrics include user connection information, active locks, I/O statistics, dictionary hit ratios, and CPU utilization. Remote servers can be monitored by using SQL*NET, even if the server is not running the PATROL agent software. After the Oracle KM is loaded, PATROL determines all the Oracle database instances on the system. The PATROL administrator manually starts and stops the monitoring of instances. PATROL provides the following categories of monitoring information for Oracle:
A variety of database performance graphs are available as icons from the PATROL console. Figure 8-4 shows some of the metrics maintained by the KM for Oracle. You can double-click these icons to see a graph or gauge. Figure 8-4. BMC PATROL icons for performance graphs.
BMC also provides database administration tools. PATROL DB includes database administrative tools such as Pathfinder, DB-Alter, DB- Reorg , DB-Change Manager, DB-Integrity, DB-Voyager, and SQL-Explorer. These tools are integrated into the PATROL framework and can be launched from the PATROL console. By contrast, IT/O provides monitoring only, relying on database vendors to provide these administrative tools. BMC has bundled its database products into a PATROL Availability Suite for Oracle. The product bundle includes the PATROL KM for Oracle, PATROL DB-Stats for Oracle, PATROL DB-Reorg for Oracle, and PATROL DB-Integrity. Unlike MC/ServiceGuard, PATROL does not provide any failover capability. However, a limited set of automated recovery actions is available. Recovery actions can be performed by the agents without requiring communication with the console. HP used to rely on PATROL to provide database information to its MeasureWare Agent, but the Database KMs were resold through a special licensing agreement, so HP now uses its SPIs for Databases or the Oracle Management Pak to gather database information for MeasureWare. PLATINUM DBVisionPLATINUM Technology, Inc. provides the ProVision suite of products for monitoring the performance of servers, databases, applications, and networks. ProVision consists of four products:
When problems are detected , alerts can be sent to a pager, via e-mail, or to a centralized console receiving information from multiple databases. In response to an event, predetermined corrective actions can be taken, which can be defined in customized scripts that are executed automatically. The database performance data provided by DBVision includes the following statistics:
DBVision also includes database resource information, such as:
In addition to database monitoring, PLATINUM provides tools for database administration and for identifying database performance bottlenecks. Database administration tools include Enterprise DBA and TSReorg. Enterprise DBA provides user administration, schema, and content management. Administrative tasks can be done across different databases, with Oracle, Informix, and DB2 databases supported. Remote agents are used to change the database. TSReorg supports both Oracle and Sybase databases. It can be used to reorganize database tables, indexes, and tablespaces. Using DBVision and TSReorg together, you can configure corrective actions to be performed automatically by TSReorg in response to a DBVision event. Tablespaces can be monitored so that database reorganization is triggered automatically. Tablespaces can be reorganized while a database stays online. TSReorg can also be used to view database structures graphically, showing where database fragmentation exists. A Database Analyzer tool can be used to graphically display tablespace and database usage information, which can be used for capacity planning. The Oracle Server Manager can also be used to monitor Oracle databases. Log Analyzer can be used as an additional troubleshooting tool for Sybase and DB2 database environments. |
I l @ ve RuBoard |