Introduction to ADDM

Introduction to `ADDM`

As mentioned in previous chapters, Oracle Database 10g comes with a number of advisors built in. For the performance analyst, the Automatic Database Diagnostic Monitor (ADDM) is the most important of the advisors and is indeed the starting point in the investigation of any performance issue. Affectionately called "ADDuM" or "ADAM," this component of the Database Diagnostic Pack is a powerful aide to the performance analyst when it is understood and used carefully. Simply stated, it is a self-diagnostic mechanism built into the kernel that automatically examines and analyzes the AWR snapshots at the end of every snapshot with the objective of determining any performance-affecting issue. It is then able to recommend corrective action; these recommendations come with an expected benefit.

ADDM does not stop with these recommendations. Just as a general practitioner examines a patient in an initial investigation and recommends further examination in particular areas by other specialists, ADDM can direct the performance analyst to other advisors such as the SQL Tuning Advisor or the SQL Access Advisor when it determines that a problem exists. By default, ADDM executes at the end of every scheduled snapshot, performing a scheduled "database performance health checkup" using the captured AWR snapshots. In effect, it is casting an expert eye on the AWR report in a scheduled manner, every time, all the time. This is equivalent to an unpaid expert DBA performing analysis of these snapshots 24/7/365! These recommendations and findings are stored in the database so that you can analyze and report on them later on, at a convenient time.

The Goal of `ADDM`

Ultimately, the goal of ADDM is to reduce the DB Time component generated by any load on the database. As you saw in Chapter 10, "Adopting a New Approach to Tuning," the DB Time is the total time spent by the foreground sessions performing useful work. In other words, it is a combination of CPU spent parsing and executing SQL, PL/SQL, and Java as well other overheads such as process setup and management. When triggered, ADDM drills down into the performance statistics to identify the root cause of problems rather than just the symptoms, and reports the overall impact of the issue on the system as a whole. In making a recommendation, it reports the benefits that can be expected, again in terms of this DB Time. The use of this common currency allows the impact of several problems or recommendations to be compared effectively.

Inside Scoop on `ADDM` Rules

Because the Time and Wait model helps greatly in summarizing and classifying the various performance statistics, ADDM can quickly focus on where time is spent by using a set of top-down tree-structured rules. These rules are based on decades of collective experience of performance experts in a dedicated Oracle Server Technologies Performance group at Oracle HQ. Apparently, these rules have been validated by applying them on a number of STATSPACK reports and comparing the results with conclusions arrived via experience. We understand that the same personnel who created STATSPACK in Oracle 8i and improved it in Oracle 9i were behind ADDM, so that certainly explains both the similarities as well as the improvements.

Using such a well-understood and easily identifiable set of components to quantify the impact also prevents judgments based on experience rather than hard figures. For example, a rule of thumb, based on experience, might have said that an IOPS (the number of I/O operations per second) should not exceed the rate of, say, 1,000 per second. Anything more than this rate was classified as a problem that should be fixed. That said, we are aware of many systems that can run significantly higher IOPS rates without noticeably affecting performance. Using the new Time and Wait model data in AWR, ADDM can now report quantitatively that such I/O operations are, say, taking 30% of time spent in the database during that period. This quantified value makes it much easier to understand the problem and help determine the effect of fixing the issue, rather than just making a judgmental statement such as "The database is performing too much I/O." Better still, it helps the performance analyst concentrate on what is important to tune so that the fix has the most effect.

Problem Areas Handled by `ADDM`

ADDM handles the most frequently observed performance problems and drills down to the root cause rather than taking the easier approach of just reporting symptoms. This reporting includes but is not limited to problems seen in the following areas:

CPU bottlenecks. Is the system CPU bound by Oracle processes or by some other applications?
Excessive parsing. Is there too much parsing due to use of short SQLs that do not use bind variables?
Lock contention. Is there application-level lock contention?
Concurrency. Is there an excessive number of buffer busy waits, latching, and the like, which reduce concurrency and thus prevent the application from scaling effectively?
I/O capacity. Is the I/O subsystem performing as required, as compared to a set of expected I/O throughput figures?
Incorrect sizing of Oracle memory and file structures. Are Oracle memory structures, such as the buffer cache and redo log buffer, adequate? Are Oracle's file structures, such as the size of redo logs, adequate? Are Oracle settings, such as an aggressive MTTR (mean time to recover), stressing the system?
High-load SQL statements. Are any SQL statements consuming excessive system resources?
High-load Java and PL/SQL time. Are Java and PL/SQL statements consuming a large amount of resources?
Poor connection management. Are there excessive logon/logoff rates?
Hot objects. Are any "hot" objects assessed repeatedly and needing investigation?
RAC-specific issues. Are there any hot blocks in the global cache that result in inter instance contention? Is the interconnect behaving properly, without any latency issues?

ADDM reports these problems as "findings," but does not stop with the diagnosis; it recommends possible solutions, based on the detected problem areas. When appropriate, ADDM recommends multiple solutions for the performance analyst to choose from. These are in the form of recommendations, and include the following:

Hardware changes. This includes increasing the CPU capacity or changing the I/O subsystem configuration.
Database-configuration changes. This includes changing initialization parameter settings, such as those for session caching of cursors, sort area size, and so on.

Schema-level changes. ADDM may recommend partitioning a table or index, using automatic segment-space management (ASSM) for certain segments, and so on.

Recommendations for Individual Session Tuning

ADDM recommendations are made only at the database level. ADDM does not target the tuning of individual response time. You will have to use ASH and the tracing techniques described in Chapters 10, "Adopting a New Approach to Tuning," and 11, "Effectively Using the Automatic Workload Repository," in order to determine performance issues within individual sessions.

Application changes. ADDM may recommend using the cache option for sequences when it encounters high access rates for SEQ$, and recommend using bind variables when it observes short SQLs that have hard-coded values.
Using other advisors. ADDM may recommend running the SQL Tuning Advisor on high-load SQL or running the Segment Advisor on hot objects.

Using `ADDM` with `ASH`

When ADDM is combined with ASH data, it is possible to quickly determine the root cause as well as to drill down to the depths of session details to document what exactly transpired. It is thus possible to even determine what occurred a while ago without having to resort to setting up a replay of the workload in a test environment. Even if ASH data is flushed from the buffers, keep in mind that every 10th sample is retained, and this provides at least a coarse drill down. In comparison, STATSPACK and most other tools do not store session-level data that could assist in such drill downs. As mentioned previously, the extended SQL trace cannot be used in this case because trace needs to be switched on prior to the occurrence of the problem.

Nonproblematic Areas

ADDM highlights nonproblematic areas in addition to the problem areas. This is based on wait classes that have been determined as not affecting the result significantly, but are nevertheless listed. A performance analyst can then quickly see that these wait classes were eliminated and hence not spend time and effort working on something that will not produce significant improvement. This is akin to a general practitioner assuring a patient that an otherwise worrying symptom is not the cause of a deeper issue, helping the patient concentrate on battling what is more important to his or her health. Sometimes, large values in STATSPACK reports that actually do not pose a problem seem alarming. Many a performance analyst has wasted time and resources trying to chase and fix such issues. This is almost entirely avoided in Oracle Database 10g as you'll see later.

Cases When `ADDM` Analysis Is Not Done

ADDM does not spend time performing analysis when it is not required to do so. When it is invoked at the end of every AWR snapshot, ADDM first determines the DB Time spent within the snapshot period. When this value is insignificant compared to the total time period of analysis, ADDM does not proceed further, instead recording the fact that there was no significant activity and thus the report was not generated.

Helpful `ADDM` Views

It pays to be aware of a number of views that ADDM uses. You can use these views in many innovative ways. For example, you can use them to help preserve interesting scenarios and findings before the data is purged by AWR. A small number of these views is listed in Table 12.1. More details can be found in the Oracle Database 10g Reference Manual as well as in the Oracle Database 10g Performance Guide.

Table 12.1. Some `ADDM` Views
View Name	Description
`DBA_ADVISOR_LOG`	This view shows the state of all tasks from all advisors, including `ADDM`. A scheduled check of the `STATUS` and `ERROR` columns is advised to make sure that nothing amiss is occurring on these scheduled tasks. Scheduled `ADDM` runs are stamped as ADDM`:DBID_Instance_Snapshot_ID` for easy identification, while manually scheduled ADDM runs are named `TASK_Task_ID`. This can help differentiate between manual and automated runs.
`DBA_ADVISOR_FINDINGS`	This view exposes the findings of the various advisors. Both the type of finding and the expected impact is shown. See Listing 12.1 for an example of how this view can be used. When no significant activity occurs between two snapshots, the `MESSAGE` column in this view records the message `There was no significant database activity to run the ADDM`. Thus you can use this as a criterion for discarding any `ADDM` tasks that need not be considered.
`DBA_ADVISOR_RECOMMENDATIONS`	This view details the advisor recommendations that follow the findings. The recommendations are tied to the findings. As well, the `BENEFIT_TYPE` and `BENEFIT` columns display the type and amount of expected benefit by following that recommendation.
`DBA_ADVISOR_ACTIONS`	This view exposes the actions required by the recommendations of the aforementioned advisors. Of particular interest is the `MESSAGE` column.
`DBA_ADVISOR_RATIONALE`	This view lists the rationale behind the recommendations from the aforementioned advisors. It identifies the impact of each recommendation. Look at the `MESSAGE`, `IMPACT`, and `IMPACT_TYPE` columns for more information.

Before you move to the next section, let's look at a brief example of how one of the views can be used. A performance analyst has many other tasks, so rather than expecting him or her to review all the ADDM reports that are automatically produced by the system throughout the day and night, the DBA_ADVISOR_FINDINGS view can be used to summarize this information. The query and output are shown in Listing 12.1.

Listing 12.1. Determining Whether `ADDM` Found Problems in the Past 24 Hours

 SQL> select type, count(*) from dba_advisor_findings   2  where task_id in   3  (select task_id from dba_advisor_log where execution_start > sysdate - 1)   4  group by type; TYPE          COUNT(*) ----------- ---------- INFORMATION         40 PROBLEM             31 SYMPTOM             18

From the result, you can see that there were at least 31 problems detected by ADDM during the past 24 hours. This will warrant more research and analysis of individual ADDM tasks. You can use the query shown in Listing 12.2 to determine which ADDM task IDs to query on.

Listing 12.2. Determining the `ADDM` Task IDs with Problems in the Past 24 Hours

 SQL> select distinct f.task_id, f.task_name, l.execution_start   2  from dba_advisor_findings f, dba_advisor_log l   3  where f.type in ('PROBLEM', 'SYMPTOM')   4  and f.task_id = l.task_id   5  and l.execution_start > sysdate  1;    TASK_ID TASK_NAME                      EXECUTION ---------- ------------------------------ ---------       1490 TASK_1490                      05-FEB-05       1493 ADDM:845721274_1_1522          05-FEB-05       1494 TASK_1494                      05-FEB-05       1495 ADDM:845721274_1_1523          05-FEB-05

You can then use this output to produce a detailed ADDM report, given the task IDs. You also notice that tasks 1490 and 1494 were probably manually performed, while ADDM automatically produced tasks 1493 and 1495.