Section 5.4. Evolution of the Response Time Model

5.4 Evolution of the Response Time Model

In the 1980s, when most of today's "tuning methods " were invented, Oracle's SQL trace facility did not yet have the capability to emit wait event timing information ”the WAIT lines ”into the trace file. The c , e , and tim data were the only trace data elements that we had. Of course, if most of an application's response time had been spent consuming CPU, then the c and e data told us most of what we needed to know about the performance of our database calls. However, if some of a database call's response time was not due to CPU consumption, then our analysis became more difficult.

For example, consider the following fetch call statistics obtained from an application running on Oracle 8.1.7.2:

 FETCH #1:c=80741,e=151841,p=9628,cr=34304348,cu=10,mis=0,r=0,dep=0,og=4,tim=87762034

This fetch call consumed 1,518.41 seconds of elapsed time, only 807.41 of which was spent on the CPU. Where did the other 711.00 seconds of response time go? Was it latch contention ? Enqueue waits? Long disk queues? Excessive paging? We simply cannot know by looking at this FETCH line. Its statistics contain insufficient information to determine where the unaccounted-for 711 seconds of elapsed time went. Certainly, a large p value is a clue that some of the unaccounted-for e time might have been consumed by OS read calls, but there are roughly 200 different wait events that Oracle could have executed during those 711 seconds. From viewing only the fetch statistics shown here, we cannot know how the 711 seconds were consumed.

In 1992 with the release of kernel Version 7.0.12, Oracle Corporation published an elegant solution to this problem. The new mechanism that Oracle provided was simply to instrument several events executed by the Oracle kernel that consume elapsed time but not CPU capacity. The value of the so-called wait data is absolutely extraordinary. It helps to fill in the time gap between e and c . Anjo Kolk and Shari Yamaguchi were the first to document the use of "wait data" in the document that became the landmark YAPP Method [Kolk and Yamaguchi (1999)].

Let's revisit our previous example, in which we had 711 seconds of unaccounted-for time. Instructing the Oracle kernel to produce the WAIT statistics adds 9,748 more lines of data to our trace file before the fetch call. Executing the Perl program in Example 5-7 upon 9,749 lines of trace data produces the following resource profile:

 $  prof-cid waits.1.trc  Duration     Pct  Oracle kernel event ---------  ------  ----------------------------------------   807.41s   53.2%  total CPU   426.26s   28.1%  direct path write   197.29s   13.0%  db file sequential read    76.23s    5.0%  unaccounted-for     8.28s    0.5%  latch free     2.87s    0.2%  db file scattered read     0.05s    0.0%  file open     0.02s    0.0%  buffer busy waits     0.00s    0.0%  SQL*Net message to client ---------  ------  ----------------------------------------  1518.41s  100.0%  Total response time

Now we know. Over 53% of the response time for the fetch was consumed on a CPU in user mode. Over 28% was consumed writing (surprise!) to disk. Another 13% was consumed by reading from disk, and roughly another 6% of the response time was consumed in various other wait events.

Example 5-7. A Perl program that creates a resource profile from raw SQL trace data for a single, simple Oracle database call (with no associated recursive database calls)

 #!/usr/bin/perl      # $Header: /home/cvs/cvm-book1/sqltrace/prof-cid.pl,v 1.4 2003/03/20 23:32:32 cvm Exp $ # Cary Millsap (cary.millsap@hotsos.com) # Copyright (c) 1999-2003 by Hotsos Enterprises, Ltd. All rights reserved.      # Create a resource profile for a single database call. # Usage:  #!/usr/bin/perl # $Header: /home/cvs/cvm-book1/sqltrace/prof-cid.pl,v 1.4 2003/03/20 23:32:32 cvm Exp $ # Cary Millsap (cary.millsap@hotsos.com) # Copyright (c) 1999-2003 by Hotsos Enterprises, Ltd. All rights reserved. # Create a resource profile for a single database call. # Usage: $0 file.trc # Requires input of Oracle extended SQL trace data (level 8 or level 12) # that has been pre-filtered to contain only a single database call (that # is, a single PARSE, EXEC, FETCH, UNMAP, or SORT UNMAP with no recursive # children) and the WAIT lines associated with that db call. Example input # file content: # # WAIT #2: nam='db file sequential read' ela= 0 p1=2 p2=3240 p3=1 WAIT # #2: nam='db file sequential read' ela= 0 p1=2 p2=3239 p3=1 FETCH # #2:c=213,e=998,p=2039,cr=100550,cu=5,mis=0,r=0,dep=0,og=4,tim=85264276 use strict; use warnings; my $cid; # cursor id my %ela; # $ela{event} contains sum of ela statistics for event my $sum_ela = 0; # sum of all ela times across events my $r = 0; # response time for database call my $action = "(?:PARSE EXEC FETCHUNMAPSORT UNMAP)"; while (<>) { if (/^WAIT #(\d+): nam='([^']*)' ela=\s*(\d+)/i) { $ela{$2} += $3; $sum_ela += $3; } elsif (/^$action #(\d+):c=(\d+),e=(\d+)/i) { $ela{"total CPU"} += $2; $r = $3; } if (!defined $cid) { $cid = $1; } else { die "can't mix data across cursor ids $cid and $1" if $1 != $cid; } } $ela{"unaccounted-for"} = $r - ($ela{"total CPU"} + $sum_ela); printf "%9s %6s %-40s\n", "Duration", "Pct", "Oracle kernel event"; printf "%8s- %5s- %-40s\n", "-"x8, "-"x5, "-"x40; printf "%8.2fs %5.1f%% %-40s\n", $ela{$_}/100, $ela{$_}/$r*100, $_ for sort { $ela{$b} <=> $ela{$a} } keys %ela; printf "%8s- %5s- %-40s\n", "-"x8, "-"x5, "-"x40; printf "%8.2fs %5.1f%% %-40s\n", $r/100, 100, "Total response time"; 
 file.trc      # Requires input of Oracle extended SQL trace data (level 8 or level 12) # that has been pre-filtered to contain only a single database call (that # is, a single PARSE, EXEC, FETCH, UNMAP, or SORT UNMAP with no recursive # children) and the WAIT lines associated with that db call. Example input # file content: # #   WAIT #2: nam='db file sequential read' ela= 0 p1=2 p2=3240 p3=1 WAIT #   #2: nam='db file sequential read' ela= 0 p1=2 p2=3239 p3=1 FETCH #   #2:c=213,e=998,p=2039,cr=100550,cu=5,mis=0,r=0,dep=0,og=4,tim=85264276      use strict; use warnings; my $cid;            # cursor id my %ela;            # $ela{event} contains sum of ela statistics for event my $sum_ela = 0;    # sum of all ela times across events my $r = 0;          # response time for database call my $action = "(?:PARSEEXECFETCHUNMAPSORT UNMAP)"; while (<>) {     if (/^WAIT #(\d+): nam='([^']*)' ela=\s*(\d+)/i) {         $ela{} += ;         $sum_ela += ;     }     elsif (/^$action #(\d+):c=(\d+),e=(\d+)/i) {         $ela{"total CPU"} += ;         $r = ;     }     if (!defined $cid) {         $cid = ;     } else {         die "can't mix data across cursor ids $cid and " if  != $cid;     } } $ela{"unaccounted-for"} = $r - ($ela{"total CPU"} + $sum_ela); printf "%9s  %6s  %-40s\n", "Duration", "Pct", "Oracle kernel event"; printf "%8s-  %5s-  %-40s\n", "-"x8, "-"x5, "-"x40; printf "%8.2fs  %5.1f%%  %-40s\n", $ela{$_}/100, $ela{$_}/$r*100, $_ for sort { $ela{$b}  <=> $ela{$a} } keys %ela; printf "%8s-  %5s-  %-40s\n", "-"x8, "-"x5, "-"x40; printf "%8.2fs  %5.1f%%  %-40s\n", $r/100, 100, "Total response time";

Note the row labeled "unaccounted-for" in our resource profile. Consider how it was computed. The total elapsed time ”in fact the response time ”for the fetch call is simply the value of e for the fetch. The raw trace data account for this response time in two ways:

The total CPU time component of the fetch call's response time is recorded as the c statistic on the FETCH line itself.
The system-call time components of the response time are recorded as ela statistics on all of the WAIT lines associated with the fetch.

The "unaccounted-for" duration is thus the leftover amount D (delta) expressed in the following formula:

How Oracle response time accounting has evolved since Oracle Version 6 is an interesting story. In Version 6, Oracle's SQL trace facility printed database call response times ( e ) and CPU consumptions ( c ) to the trace file, but that was the only response time data that the Oracle kernel published. The first Oracle response time model was simple. It was "response time equals CPU consumption plus some unidentified other stuff," or:

e = c + D

This model is effective when D is small, but it is not reliable for diagnosing many types of response time problems that occur when D is large. In the Version 6 days, most analysts were taught to assume that large values of D were attributable to time consumed by operating system read calls. This assumption is often incorrect (as was the case in the resource profile shown previously), but it has helped analysts solve many application performance problems. One reason for the model's success in spite of its over-simplicity is that so many Oracle application problems are caused by fetch calls that access the database buffer cache excessively. These cases create small D values for which the e = c + D model works just fine.

Oracle kernel developers were among the first to encounter the most serious inadequacies of the model. The range of potential root causes for large D values was so large that some important high-end response time problems simply could not be solved without more operational data. Oracle's extended SQL trace data, introduced to the general market in 1992 with release 7.0.12, is an elegant solution to the problem. Extended SQL trace data include those WAIT lines that tell us how much time the Oracle kernel spends "waiting" for the execution of key events. The new, significantly improved response time model made possible by the new extended SQL trace feature of Oracle release 7.0.12 is the one that we use today:

As it happens, extended SQL trace data provide significantly more diagnostic power than most analysts have ever believed. Of the few analysts who even realize that the gap D exists, some deem the existence of the gap a deficiency of extended SQL trace data that renders the data unreliable. On the contrary, as you shall see, there is good information buried in the value of D . There are several contributory causes of non-zero D values, as I explain in Chapter 7. Understanding these causes helps you exploit the full diagnostic power of Oracle's extended SQL trace data.

Top