Section 7.9. Un-Instrumented Oracle Kernel Code | Optimizing Oracle Performance

7.9 Un-Instrumented Oracle Kernel Code

The final cause of missing time in a trace file that I'll cover is un-instrumented Oracle kernel code. As you've now seen, Oracle provides instrumentation for database calls in the form of c and e data, which represent total CPU consumption and total elapsed duration, respectively. For segments of Oracle kernel code that can consume significant response time but not much CPU capacity, Oracle Corporation gives us the "wait event" instrumentation, complete with an elapsed time and a distinguishing name for the code segment being executed.

Chapter 12 lists the number of code segments that are instrumented in a few popular releases of the Oracle kernel since 7.3.4. Notice that the number grows significantly with each release listed in the table. There are, for example, 146 more instrumented system calls in release 9.2.0 than there are in release 8.1.7. Certainly, some of these newly instrumented events represent new product features. It is possible, however, that some new names in V$EVENT_NAME correspond to code segments that were present in an earlier Oracle kernel release but not yet instrumented.

7.9.1 Effect

When Oracle Corporation leaves a sequence of kernel instructions un-instrumented, the missing time becomes apparent in one of two ways:

Missing time within a database call

Any un-instrumented code that occurs within the context of a database call will leave a gap between the database call's elapsed duration (e) and the value of c + Σ ela for the call. Within the trace file, the phenomenon will be indistinguishable from the time-spent-not-executing problem that I described earlier. On systems that do not exhibit much paging or swapping, the presence of a large gap (Δ) in the following equation for an entire trace file is an indicator of an un-instrumented time problem:

$$e = c + \sum ela + \Delta$$

To envision this problem, imagine what would happen if a five-second database file read executed by a fetch were un-instrumented. The elapsed time for the fetch (e) would include the five-second duration, but neither the total CPU consumption for the fetch (c) nor the wait event durations (ela values) would be large enough to account for the whole elapsed duration.
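
As a rough illustration of this arithmetic, the following Python sketch computes the gap e - (c + Σ ela) for each database call line in a trace excerpt. The line shapes are modeled on Oracle9i extended SQL trace output, with times in microseconds. The regular expressions, the set of between-call event names, the function name call_gaps, and the sample values are all assumptions made for the example. The sketch also assumes that the excerpt contains no recursive SQL, so that every wait line belongs to the next database call line emitted.

```python
import re

# Assumed line shapes, modeled on Oracle9i extended SQL trace output.
CALL_RE = re.compile(r"^(PARSE|EXEC|FETCH) #\d+:c=(\d+),e=(\d+),")
WAIT_RE = re.compile(r"^WAIT #\d+: nam='([^']*)' ela=\s*(\d+)")

# Assumption: events that occur between database calls, not within them.
BETWEEN_CALL_EVENTS = {
    "SQL*Net message from client",
    "SQL*Net message to client",
}


def call_gaps(lines):
    """Return (call, e, c, sum_ela, gap) for each database call line.

    Within-call wait lines are assumed to precede the line of the
    database call that executed them (no recursive SQL).
    """
    results = []
    pending_ela = 0  # within-call wait time seen since the last call line
    for line in lines:
        wait = WAIT_RE.match(line)
        if wait:
            if wait.group(1) not in BETWEEN_CALL_EVENTS:
                pending_ela += int(wait.group(2))
            continue
        call = CALL_RE.match(line)
        if call:
            c, e = int(call.group(2)), int(call.group(3))
            results.append((call.group(1), e, c, pending_ela, e - (c + pending_ela)))
            pending_ela = 0
    return results


# Hypothetical excerpt: a fetch whose five-second read left no wait line.
sample = [
    "WAIT #1: nam='db file sequential read' ela= 2000 p1=1 p2=51 p3=1",
    "FETCH #1:c=10000,e=5012000,p=1,cr=3,cu=0,mis=0,r=1,dep=0,og=4,tim=1017039304725071",
]

for call, e, c, sum_ela, gap in call_gaps(sample):
    print(f"{call}: e={e} c={c} sum(ela)={sum_ela} gap={gap}")
```

In the hypothetical excerpt, the fetch reports about five seconds of elapsed time that neither its c value nor its single wait line explains, so nearly all of e shows up as gap.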

Missing time between database calls

Any un-instrumented code that occurs outside of the context of a database call cannot be detected in the same manner as un-instrumented code that occurs within a database call. You can detect missing time between calls in one of two ways. First, a sequence of between-call events with no intervening database calls is an indication of an un-instrumented Oracle kernel code path. Second, you can detect un-instrumented calls by inspecting tim statistic values within a trace file. If you see "adjacent" tim values that are much farther apart in time than the intervening database call and wait event lines can account for, then you've discovered this problem. A trace file exhibits this issue if it has a large Δ value in the following formula, in which R denotes the known response time for which the trace file was supposed to account:

$$R = \sum_{dep=0} e + \sum_{\text{between calls}} ela + \Delta$$
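
The tim inspection can be sketched in Python as follows. The sketch assumes that each tim value marks the approximate completion time of its database call, that times are in microseconds as in Oracle9i trace files, and that there is no recursive SQL. The regular expressions, the set of between-call event names, the function name between_call_gaps, and the sample values are hypothetical. For each pair of adjacent database call lines, it subtracts from the tim difference the later call's elapsed duration and the durations of any intervening between-call events; whatever remains is time that the trace lines do not explain.

```python
import re

# Assumed line shapes, modeled on Oracle9i extended SQL trace output.
CALL_RE = re.compile(r"^(?:PARSE|EXEC|FETCH) #\d+:c=\d+,e=(\d+),.*tim=(\d+)")
WAIT_RE = re.compile(r"^WAIT #\d+: nam='([^']*)' ela=\s*(\d+)")

# Assumption: events that occur between database calls, not within them.
BETWEEN_CALL_EVENTS = {
    "SQL*Net message from client",
    "SQL*Net message to client",
}


def between_call_gaps(lines):
    """Return the unexplained time between each pair of adjacent call lines."""
    gaps = []
    prev_tim = None
    between_ela = 0  # between-call wait time seen since the last call line
    for line in lines:
        wait = WAIT_RE.match(line)
        if wait:
            # Within-call waits are already part of the call's e value,
            # so only between-call events are accumulated here.
            if wait.group(1) in BETWEEN_CALL_EVENTS:
                between_ela += int(wait.group(2))
            continue
        call = CALL_RE.match(line)
        if not call:
            continue
        e, tim = int(call.group(1)), int(call.group(2))
        if prev_tim is not None:
            gaps.append((tim - prev_tim) - e - between_ela)
        prev_tim = tim
        between_ela = 0
    return gaps


# Hypothetical excerpt with three seconds missing between two calls.
sample = [
    "EXEC #1:c=0,e=1000,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=4,tim=1000000",
    "WAIT #1: nam='SQL*Net message from client' ela= 200000 p1=1650815232 p2=1 p3=0",
    "WAIT #1: nam='db file sequential read' ela= 40000 p1=1 p2=51 p3=1",
    "FETCH #1:c=10000,e=100000,p=1,cr=3,cu=0,mis=0,r=1,dep=0,og=4,tim=4300000",
]

print(between_call_gaps(sample))
```

Small residuals in either direction are to be expected in real trace data; only a remainder that is large relative to the response time under study suggests un-instrumented code.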

Oracle bug number 2425312 is one example of this problem. It is a case in which entire database calls executed through the PL/SQL remote procedure call (RPC) interface emit no trace data whatsoever. The result is a potentially enormous gap in the time accounting within a trace file.

In practice, you may never encounter a situation in which un-instrumented system calls consume an important proportion of a program's elapsed time. We encounter the phenomenon at a rate of fewer than five per thousand trace files at www.hotsos.com. One case of un-instrumented database activity is documented as bug number 2425312 at Oracle MetaLink. You may encounter this bug if you trace Oracle Forms applications with embedded (client-side) PL/SQL. You will perhaps encounter other cases in which un-instrumented time materially affects your analysis, but those cases will be rare.

7.9.2 Trace Writing

You will encounter at least one un-instrumented system call every time you use SQL trace, although its performance impact is usually small. It is the write call that the Oracle kernel uses to write SQL trace output to the trace file. Using strace allows you to see quite plainly how the Oracle kernel writes each line of data to a trace file. Of the several hundred extended SQL trace files collected at www.hotsos.com by the time of this writing, fewer than 1% exhibit accumulation of unaccounted-for time that might be explained by slow trace file writing. However, you should follow these recommendations to reduce the risk that the very act of tracing an application program will materially degrade the performance of an application:

Check with Oracle MetaLink to ensure that your system is not susceptible to Oracle kernel bugs that might unnecessarily impede the performance of trace file writing. For example, bug number 2202613 affects the performance of trace file writing on some Microsoft Windows 2000 ports. Bug number 1210242 needlessly degrades Oracle performance while tracing is activated.
Place your USER_DUMP_DEST and BACKGROUND_DUMP_DEST directories on efficient I/O channels. Don't write trace data to your root filesystem or the oldest, slowest disk drive on your system. Although the outcome of the diagnostic process will often be significant performance improvement, no analyst wants to be accused of even temporarily degrading the performance of an application.
Keep load that competes with trace file I/O as low as possible during a trace. For example, avoid tracing more than one session at a time to the same I/O device. Exceptions include application programs that naturally emit more than one trace file, such as parallel operations or any program that distributes workload over more than one Oracle server process.
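
To gauge how much time the trace file write calls themselves consume, one option is to capture the Oracle server process's system calls with strace's -T option, which appends the time spent in each call, and then total the write durations per file descriptor. The Python sketch below assumes output lines shaped like the example in its comment, with no process ID prefix. The function name write_time_by_fd and the sample values are invented for illustration, and you would still need to determine which file descriptor belongs to the trace file.

```python
import re

# Assumed shape of a write line in `strace -T` output, for example:
#   write(5, "WAIT #1: nam='db file sequential"..., 66) = 66 <0.000043>
WRITE_RE = re.compile(r"^write\((\d+), .*\)\s+=\s+(-?\d+).*<(\d+\.\d+)>\s*$")


def write_time_by_fd(strace_lines):
    """Return {fd: (call_count, total_seconds)} for write calls."""
    totals = {}
    for line in strace_lines:
        m = WRITE_RE.match(line)
        if m:
            fd = int(m.group(1))
            calls, seconds = totals.get(fd, (0, 0.0))
            totals[fd] = (calls + 1, seconds + float(m.group(3)))
    return totals


# Hypothetical strace excerpt; file descriptor 5 stands in for the trace file.
sample = [
    "write(5, \"WAIT #1: nam='db file sequential\"..., 66) = 66 <0.000043>",
    "read(9, \"\\0\\25\\0\\0\\6\\0\\0\\0\"..., 2064) = 21 <0.002100>",
    "write(5, \"FETCH #1:c=0,e=128,p=1,cr=3,cu=0\"..., 71) = 71 <0.000057>",
]

for fd, (calls, seconds) in sorted(write_time_by_fd(sample).items()):
    print(f"fd {fd}: {calls} write calls, {seconds:.6f} seconds")
```

If the total for the trace file's descriptor is a meaningful fraction of the unaccounted-for time in the trace, slow trace file writing is a plausible explanation.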

Don't let the overhead of writing to trace files deter you from using extended SQL trace data as a performance diagnostic tool. Keep the overhead in perspective. The potential overhead is not noticeable in most cases. Even if the performance overhead were nearly unbearable, the overhead of tracing a program once is a worthwhile investment if the diagnosis results in either of the following outcomes:

You can repair the program under analysis, resulting in significant conservation of system capacity and a significant reduction in end-user response time.
You can prove that the program under analysis performs as well as it can, and thus that further optimization investment will be futile.
