7.7 Quantization Error

A few years ago, a friend argued that Oracle's extended SQL trace capability was a performance diagnostic dead-end. His argument was that in an age of 1-GHz CPUs, the one-centisecond resolution of the then-contemporary Oracle8i kernel was practically useless. However, in many hundreds of projects in the field, the extended SQL trace feature has performed with practically flawless reliability, even the "old one-centisecond stuff" from Oracle8i. The reliability of extended SQL trace is difficult to understand unless you understand the properties of measurement resolution and quantization error.

7.7.1 Measurement Resolution

When I was a kid, one of the everyday benefits of growing up in the Space Age was the advent of the digital alarm clock. Digital clocks are fantastically easy to read, even for little boys who don't yet know how to "tell the time." It's hard to go wrong when you can see those big red numbers showing "7:29". With an analog clock, a kid sometimes has a hard time knowing whether it's five o'clock or seven o'clock, but the digital difference between 5 and 7 is unmistakable.

But there's a problem with digital clocks. When you glance at a digital clock that shows "7:29", how close is it to 7:30? The problem with digital clocks is that you can't know. With a digital clock, you can't tell whether it's 7:29:00 or 7:29:59 or anything in-between. With an analog clock, even one without a second hand, you can guess about fractions of minutes by looking at how close the minute hand is to the next minute.

All times measured on digital computers derive ultimately from interval timers, which behave like the good old digital clocks at our bedsides. An interval timer is a device that ticks at fixed time intervals. Times collected from interval timers can exhibit interesting idiosyncrasies. Before you can make reliable decisions based upon Oracle timing data, you need to understand the limitations of interval timers.

An interval timer's resolution is the elapsed duration between adjacent clock ticks. Timer resolution is the inverse of timer frequency. So a timer that ticks at 1 GHz (approximately 10^9 ticks per second) has a resolution of approximately 1/10^9 seconds per tick, or about 1 nanosecond (ns) per tick. The larger a timer's resolution, the less detail it can reveal about the duration of a measured event. But for some timers (especially ones involving software), making the resolution too small can increase system overhead so much that you alter the performance behavior of the event you're trying to measure.

Heisenberg Uncertainty and Computer Performance Analysis

The problem of measuring computer event durations with a discrete clock is analogous to the famous uncertainty principle of quantum physics. The uncertainty principle, formulated by Werner Heisenberg in 1926, holds that the uncertainty in the position of a particle times the uncertainty in its velocity times the mass of the particle can never be smaller than a certain quantity, which is known as Planck's constant [Hawking (1988) 55]. Hence, for very small particles, it is impossible to know precisely both the particle's position and its velocity.

Similarly, it is difficult to measure some things very precisely in a computing system, especially when using software clocks. A smaller resolution yields a more accurate measurement, but using a smaller resolution on a clock implemented with software can have a debilitating performance impact upon the application you're trying to measure. You'll see an example later in this chapter when I discuss the resolution of the getrusage system function.

In addition to the influences of resolution upon computer application timings, the effects of measurement intrusion of course influence the user program's execution time as well. The total impact of such unintended influences of instrumentation upon an application's performance creates what Oracle performance analysts might refer to as a "Heisenberg-like effect."

As a timing statistic passes upward from hardware through various layers of application software, each application layer can either degrade its resolution or leave its resolution unaltered. For example:

  • The resolution of the result of the gettimeofday system call, by POSIX standard, is one microsecond. However, many Intel Pentium CPUs contain a hardware time stamp counter that provides resolution of one nanosecond. The Linux gettimeofday call, for example, converts a value from nanoseconds (10^-9 seconds) to microseconds (10^-6 seconds) by performing an integer division of the nanosecond value by 1,000, effectively discarding the final three digits of information.

  • The resolution of the e statistic on Oracle8i is one centisecond. However, most modern operating systems provide gettimeofday information with microsecond accuracy. The Oracle8i kernel converts a value from microseconds (10^-6 seconds) to centiseconds (10^-2 seconds) by performing an integer division of the microsecond value by 10,000, effectively discarding the final four digits of information [Wood (2003)]. A small sketch of this truncation arithmetic follows this list.
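
To make the effect of these conversions concrete, here is a minimal C sketch of the truncation arithmetic. It is an illustration only (not Oracle's or Linux's actual code), and the starting nanosecond value is hypothetical; the point is simply that each integer division throws low-order digits away:

    #include <stdio.h>

    int main(void)
    {
        long long ns = 1234567891;   /* a hypothetical counter-derived reading, in nanoseconds */

        long long us = ns / 1000;    /* ns -> us: the final three digits are discarded          */
        long long cs = us / 10000;   /* us -> cs: four more digits are discarded (Oracle8i)     */

        printf("%lld ns -> %lld us -> %lld cs\n", ns, us, cs);
        /* prints: 1234567891 ns -> 1234567 us -> 123 cs */
        return 0;
    }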

Thus, each e and ela statistic emitted by an Oracle8i kernel actually represents a system characteristic whose actual value cannot be known exactly, but which resides within a known range of values. Such a range of values is shown in Table 7-1. For example, the Oracle8i statistic e=2 can refer to any actual elapsed duration e_a in the range:

2.000 000 cs ≤ e_a ≤ 2.999 999 cs
Table 7-1. A single e or ela statistic in Oracle8i represents a range of possible actual timing values

  e, ela statistic (cs)   Minimum possible gettimeofday value (cs)   Maximum possible gettimeofday value (cs)
  0                       0.000 000                                  0.999 999
  1                       1.000 000                                  1.999 999
  2                       2.000 000                                  2.999 999
  3                       3.000 000                                  3.999 999
  ...                     ...                                        ...

On the Importance of Testing Your System

Please test conjectures about how Oracle computes times on your system by using a tool like strace. One trace file in our possession, generated by Oracle release 9.2.0.1.0 running on a Compaq OSF1 host, reveals a c resolution of 3,333.3 µs, an e resolution of 1,000 µs, and an ela resolution of 10,000 µs. These data of course make us wonder whether the e and ela times on that platform are really coming from the same gettimeofday system call. (If the e and ela values on this platform had come from the same system call, then why would the resulting values have different apparent resolutions?) With a system call trace, it would be a very easy question to answer. Without one, we can only guess.

7.7.2 Definition of Quantization Error

Quantization error is the quantity E defined as the difference between an event's actual duration e_a and its measured duration e_m. That is:

E = e_m - e_a

Let's revisit the execution of Example 7-1 superimposed upon a timeline, as shown in Figure 7-5. In this drawing, each tick mark represents one clock tick on an interval timer like the one provided by gettimeofday. The value of the timer was t_0 = 1492 when do_something began. The value of the timer was t_1 = 1498 when do_something ended. Thus the measured duration of do_something was e_m = t_1 - t_0 = 6. However, if you physically measure the length of the duration e_a in the drawing, you'll find that the actual duration of do_something is e_a = 5.875 ticks. You can confirm the actual duration by literally measuring the height of e_a in the drawing, but it is not possible for an application to "know" the value of e_a by using only the interval timer whose ticks are shown in Figure 7-5. The quantization error is E = e_m - e_a = 0.125 ticks, or about 2.1 percent of the 5.875-tick actual duration.
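
The arithmetic behind Figure 7-5 is easy to reproduce in a few lines of C. The sketch below is a simulation, not a real timer; the fractional start and end times are hypothetical values chosen to reproduce the figure's 5.875-tick actual duration. The interval timer can report only whole ticks, so the measured duration e_m differs from the actual duration e_a by the quantization error E:

    #include <math.h>
    #include <stdio.h>

    /* An interval timer can report only the whole ticks that have elapsed. */
    static long ticks(double t) { return (long)floor(t); }

    int main(void)
    {
        double t0 = 1492.552, t1 = 1498.427;  /* hypothetical actual start/end times, in ticks */

        double ea = t1 - t0;                  /* actual duration: 5.875 ticks                  */
        long   em = ticks(t1) - ticks(t0);    /* measured duration: 1498 - 1492 = 6 ticks      */
        double E  = em - ea;                  /* quantization error: 0.125 ticks               */

        printf("ea=%.3f  em=%ld  E=%.3f (%.1f%% of ea)\n", ea, em, E, 100.0 * E / ea);
        return 0;
    }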

Figure 7-5. An interval timer is accurate for measuring durations of events that span many clock ticks
figs/oop_0705.gif

Now, consider the case when the duration of do_something is much closer to the timer resolution, as shown in Figure 7-6. In the left-hand case, the duration of do_something spans one clock tick, so it has a measured duration of e_m = 1. However, the event's actual duration is only e_a = 0.25. (You can confirm this by measuring the actual duration in the drawing.) The quantization error in this case is E = 0.75, which is a whopping 300% of the 0.25-tick actual duration. In the right-hand case, the execution of do_something spans no clock ticks, so it has a measured duration of e_m = 0; however, its actual duration is e_a = 0.9375. The quantization error here is E = -0.9375, which is -100% of the 0.9375-tick actual duration.

Figure 7-6. An interval timer is not accurate for measuring durations of events that span zero or only a few clock ticks
figs/oop_0706.gif

To describe quantization error in plain language:

Any duration measured with an interval timer is accurate only to within one unit of timer resolution.

More formally, the difference between any two digital clock measurements is an elapsed duration with quantization error whose exact value cannot be known, but which ranges anywhere from almost -1 clock tick to almost +1 clock tick. If we use the notation r_x to denote the resolution of some timer called x, then we have:

x_m - r_x < x_a < x_m + r_x

The quantization error E inherent in any digital measurement is a uniformly distributed random variable (see Chapter 11) with range -r_x < E < r_x, where r_x denotes the resolution of the interval timer.

Whenever you see an elapsed duration printed by the Oracle kernel (or any other software), you must think in terms of the measurement resolution. For example, if you see the statistic e=4 in an Oracle8i trace file, you need to understand that this statistic does not mean that the actual elapsed duration of something was 4 cs. Rather, it means that if the underlying timer resolution is 1 cs or better, then the actual elapsed duration of something was between 3 cs and 5 cs. That is as much detail as you can know.

For small numbers of statistic values, this imprecision can lead to ironic results. For example, you can't actually even make accurate comparisons of event durations for events whose measured durations are approximately equal. Figure 7-7 shows a few ironic cases. Imagine that the timer shown here is ticking in 1-centisecond intervals. This is behavior that is equivalent to the Oracle8i practice of truncating digits of timing precision past the 0.01-second position. Observe in this figure that event A consumed more actual elapsed duration than event B, but B has a longer measured duration; C took longer than D, but D has a longer measured duration. In general, any event with a measured duration of n + 1 may have an actual duration that is longer than, equal to, or even shorter than another event having a measured duration of n. You cannot know which relationship is true.

Figure 7-7. Any duration measured with an interval timer is accurate only to within one unit of timer resolution. Notice that events measured as n clock ticks in duration can actually be longer than events measured as n + 1 ticks in duration
figs/oop_0707.gif

An interval timer can give accuracy only to ±1 clock tick, but in practical application, this restriction does not diminish the usefulness of interval timers. Positive and negative quantization errors tend to cancel each other over large samples. For example, the sum of the quantization errors in Figure 7-6 is:

E_1 + E_2 = 0.75 + (-0.9375) = -0.1875

Even though the individual errors were proportionally large, the magnitude of the net error is much smaller, only about 16% of the sum of the two actual event durations. In several hundred SQL trace files collected at www.hotsos.com from hundreds of different Oracle sites, we have found that positive and negative quantization errors throughout a trace file with hundreds of lines tend to cancel each other out. Errors commonly converge to magnitudes smaller than ±10% of the total response time measured in a trace file.
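
You can convince yourself of this cancellation effect with a simple simulation. The following C sketch is illustrative only: it generates n events with random actual durations, "measures" each one with a 1-tick-resolution interval timer, and compares the total measured duration to the total actual duration. Individual errors lie anywhere in (-1, +1) tick, but the relative error of the total is typically small:

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        const int n = 1000;                 /* number of simulated timed events */
        double total_actual = 0.0, total_measured = 0.0;
        double clock = 0.0;                 /* a free-running clock, in ticks   */

        srand(42);
        for (int i = 0; i < n; i++) {
            double ea = 3.0 * rand() / RAND_MAX;            /* actual duration: 0..3 ticks       */
            double t0 = clock, t1 = clock + ea;
            long   em = (long)floor(t1) - (long)floor(t0);  /* what the interval timer reports   */
            total_actual   += ea;
            total_measured += em;
            clock = t1 + 10.0 * rand() / RAND_MAX;          /* idle time between events          */
        }
        printf("actual=%.1f  measured=%.1f  error=%+.1f (%.2f%%)\n",
               total_actual, total_measured, total_measured - total_actual,
               100.0 * (total_measured - total_actual) / total_actual);
        return 0;
    }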

7.7.3 Complications in Measuring CPU Consumption

You may have noticed by now that your gettimeofday system call has much better resolution than getrusage . Although the pseudocode in Example 7-3 makes it look like gettimeofday and getrusage do almost the same thing, the two functions work in profoundly different ways. As a result, the two functions produce results with profoundly different accuracies.

7.7.3.1 How gettimeofday works

The operation of gettimeofday is much easier to understand than the operation of getrusage . I'll use Linux on Intel Pentium processors as a model for explaining. As I mentioned previously, the Intel Pentium processor has a hardware time stamp counter (TSC) register that is updated on every hardware clock tick. For example, a 1-GHz CPU will update its TSC approximately a billion times per second [Bovet and Cesati (2001) 139-141]. By counting the number of ticks that have occurred on the TSC since the last time a user set the time with the date command, the Linux kernel can determine how many clock ticks have elapsed since the Epoch. The result returned by gettimeofday is the result of truncating this figure to microsecond resolution (to maintain the POSIX-compliant behavior of the gettimeofday function).
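
If you want to see the microsecond resolution of gettimeofday on your own system, a few lines of C suffice. This sketch simply takes two readings around an arbitrary event (here, a short sleep, chosen only for illustration) and reports the elapsed duration in microseconds, in essentially the same way an e or ela duration is derived from two timer readings:

    #include <stdio.h>
    #include <sys/time.h>
    #include <unistd.h>

    int main(void)
    {
        struct timeval t0, t1;

        gettimeofday(&t0, NULL);
        usleep(12500);                      /* some event to be timed (~12.5 ms of sleep) */
        gettimeofday(&t1, NULL);

        long elapsed_us = (t1.tv_sec  - t0.tv_sec) * 1000000L
                        + (t1.tv_usec - t0.tv_usec);
        printf("elapsed: %ld microseconds\n", elapsed_us);
        return 0;
    }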

7.7.3.2 How getrusage works

There are two ways in which an operating system can account for how much time a process spends consuming user-mode or kernel-mode CPU capacity:

Polling (sampling)

The operating system could be instrumented with extra code so that, at fixed intervals, each running process could update its own rusage table. At each interval, each running process could update its own CPU usage statistic with the estimate that it has consumed the entire interval's worth of CPU capacity in whatever state the process is presently executing.

Event-based instrumentation

The operating system could be instrumented with extra code so that every time a process transitions to either user running or kernel running state, it could issue a high-resolution timer call. Every time a process transitions out of that state, it could issue another timer call and publish the microseconds' worth of difference between the two calls to the process' rusage accounting structure.

Most operating systems use polling, at least by default. For example, Linux updates several attributes for each process, including the CPU capacity consumed thus far by the process, upon every clock interrupt [Bovet and Cesati (2001) 144-145]. Some operating systems do provide event-based instrumentation. Sun Solaris, for example, provides this feature under the name microstate accounting [Cockroft (1998)].
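
In rough terms, the polling approach amounts to something like the following C-style sketch executed at each clock interrupt. The structure and function names are invented for illustration, and real kernels do considerably more work per interrupt; the point is that whichever process happens to be on the CPU when the interrupt fires is charged a full tick, regardless of how much of that tick it actually used:

    /* Simplified sketch of per-tick CPU accounting by polling (illustration only). */
    struct proc {
        long utime_ticks;   /* ticks charged while in user mode   */
        long stime_ticks;   /* ticks charged while in kernel mode */
    };

    void clock_interrupt_handler(struct proc *current, int user_mode)
    {
        /* Charge the entire tick to whatever process is running right now,
         * in whatever mode it happens to be executing. */
        if (user_mode)
            current->utime_ticks++;
        else
            current->stime_ticks++;
    }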

With microstate accounting, quantization error is limited to one unit of timer resolution per state switch. With a high-resolution timer (like gettimeofday), the total quantization error on CPU statistics obtained from microstate accounting can be quite small. However, the additional accuracy comes at the cost of incrementally more measurement intrusion effect. With polling, however, quantization error can be significantly worse, as you'll see in a moment.

Regardless of how the resource usage information is obtained, an operating system makes this information available to any process that wants it via a system call like getrusage. POSIX specifies that getrusage must use microseconds as its unit of measure, but, for systems that obtain rusage information by polling, the true resolution of the returned data depends upon the clock interrupt frequency.

The clock interrupt frequency for most systems is 100 interrupts per second, or one interrupt every centisecond (operating systems texts often speak in terms of milliseconds; 1 cs = 10 ms = 0.010 s). The clock interrupt frequency is a configurable parameter on many systems, but most system managers leave it set to 100 interrupts per second. Asking a system to service interrupts more frequently than 100 times per second would give better time measurement resolution, but at the cost of degraded performance. Servicing interrupts even only ten times more frequently would intensify the kernel-mode CPU overhead consumed by the operating system scheduler by a factor of ten. It's generally not a good tradeoff.

If your operating system is POSIX-compliant, the following Perl program will reveal its operating system scheduler resolution [Chiesa (1996)]:

 $ cat clkres.pl
 #!/usr/bin/perl
 use strict;
 use warnings;
 use POSIX qw(sysconf _SC_CLK_TCK);
 my $freq = sysconf(_SC_CLK_TCK);
 my $f = log($freq) / log(10);
 printf "getrusage resolution: %.${f}f seconds\n", 1/$freq;
 $ perl clkres.pl
 getrusage resolution: 0.01 seconds

With a 1-cs clock resolution, getrusage may return microsecond data, but those microsecond values will never contain valid detail smaller than 1/100th (0.01) of a second.
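
The corresponding C call looks like the sketch below. Although the ru_utime and ru_stime fields are expressed in seconds and microseconds, on a system that polls at 100 interrupts per second their values advance only in whole-centisecond steps (the CPU-burning loop is just a hypothetical workload so there is something to report):

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rusage ru;

        /* Burn a little user-mode CPU so the counters move. */
        volatile double x = 0.0;
        for (long i = 0; i < 50 * 1000 * 1000; i++)
            x += i * 0.5;

        if (getrusage(RUSAGE_SELF, &ru) == 0)
            printf("user CPU: %ld.%06ld s, system CPU: %ld.%06ld s\n",
                   (long)ru.ru_utime.tv_sec, (long)ru.ru_utime.tv_usec,
                   (long)ru.ru_stime.tv_sec, (long)ru.ru_stime.tv_usec);
        return 0;
    }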

The reason I've explained all this is that the quantization error of the Oracle c statistic is fundamentally different from the quantization error of an e or ela statistic. Recall when I wrote:

Any duration measured with an interval timer is accurate only to within one unit of timer resolution.

The problem with Oracle's c statistic is that the statistic returned by getrusage is not really a duration. That is, a CPU consumption "duration" returned by getrusage is not a statistic obtained by taking the difference of a pair of interval timer measurements. Rather:

  • On systems with microstate accounting activated, CPU consumption is computed as potentially very many short durations added together.

  • On systems that poll for rusage information, CPU consumption is an estimate of duration obtained by a process of polling.

Hence, in either circumstance, the quantization error inherent in an Oracle c statistic can be much worse than just one clock tick. The problem exists even on systems that use microstate accounting. It's worse on systems that don't.

Figure 7-8 depicts an example of a standard polling-based situation in which the errors in user-mode CPU time attribution add up to cause an over-attribution of time to a database call's response time. The sequence diagram in this figure depicts both the user-mode CPU time and the system call time consumed by a database call. The CPU axis shows clock interrupts scheduled one cs apart. Because the drawing is so small, I could not show the 10,000 clock ticks on the non-CPU time consumer axis that occur between every pair of ticks on the CPU axis.

Figure 7-8. The way getrusage polls for CPU consumption can cause over-attributions of response time to an individual database call
figs/oop_0708.gif

In response to the actions depicted in Figure 7-8, I would expect an Oracle9i kernel to emit the trace data shown in Example 7-5. I computed the expected e and ela statistics by measuring the durations of the time segments on the sys call axis. Because of the fine resolution of the gettimeofday clock with which e and ela durations are measured, the quantization error in my e and ela measurements is negligible.

Example 7-5. The Oracle9i timing statistics that would be generated by the events depicted in Figure 7-8
 WAIT #1: ...ela= 6250
 WAIT #1: ...ela= 6875
 WAIT #1: ...ela= 32500
 WAIT #1: ...ela= 6250
 FETCH #1:c=60000,e=72500,...

The actual amount of CPU capacity consumed by the database call was 2.5 cs, which I computed by measuring durations physically in the picture. However, getrusage obtains its CPU consumption statistic from a process's resource usage structure, which is updated by polling at every clock interrupt. At every interrupt, the operating system's process scheduler tallies one full centisecond (10,000 µs) of CPU consumption to whatever process is running at the time. Thus, getrusage will report that the database call in Figure 7-8 consumed six full centiseconds' worth of CPU time. You can verify the result by looking at Figure 7-8 and simply counting the number of clock ticks that are spanned by CPU consumption.

It all makes sense in terms of the picture, but look at the unaccounted-for time that results:

unaccounted-for time = e - (c + Σ ela) = 72,500 µs - (60,000 µs + 51,875 µs) = -39,375 µs

Negative unaccounted-for time means that there is a negative amount of "missing time" in the trace data. In other words, there is an over-attribution of 39,375 µs to the database call. It's an alarmingly large-looking number, but remember, it's only about 4 cs. The actual amount of user-mode CPU that was consumed during the call was only 25,000 µs (which, again, I figured out by cheating: by measuring the physical lengths of durations depicted in Figure 7-8).

7.7.4 Detection of Quantization Error

Quantization error E = e_m - e_a is the difference between an event's actual duration e_a and its measured duration e_m. You cannot know an event's actual duration; therefore, you cannot detect quantization error by inspecting an individual statistic. However, you can prove the existence of quantization error by examining groups of related statistics. You've already seen an example in which quantization error was detectable. In Example 7-5, we could detect the existence of quantization error by noticing that:

e ≠ c + Σ ela    (that is, 72,500 µs ≠ 60,000 µs + 51,875 µs = 111,875 µs)

It is easy to detect the existence of quantization error by inspecting a database call and the wait events executed by that action on a low-load system, where other influences that might disrupt the e ≈ c + Σ ela relationship are minimized.
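
The detection arithmetic is simple enough to express as a small function. The following C sketch is an illustration, not a trace-file parser; the statistic values in main() are taken from Example 7-5, all in microseconds. It computes the unaccounted-for time e - (c + Σ ela) for one database call:

    #include <stdio.h>

    /* Unaccounted-for time for one database call: e - (c + sum of ela values).
     * A positive result is "missing" time; a negative result is an over-attribution. */
    long unaccounted_us(long e, long c, const long *ela, int n_ela)
    {
        long sum_ela = 0;
        for (int i = 0; i < n_ela; i++)
            sum_ela += ela[i];
        return e - (c + sum_ela);
    }

    int main(void)
    {
        long ela[] = { 6250, 6875, 32500, 6250 };   /* the ela values from Example 7-5 */
        printf("unaccounted-for: %ld us\n", unaccounted_us(72500, 60000, ela, 4));
        /* prints: unaccounted-for: -39375 us */
        return 0;
    }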

The following Oracle8i trace file excerpt shows the effect of quantization error:

 WAIT #103: nam='db file sequential read' ela= 0 p1=1 p2=3051 p3=1
 WAIT #103: nam='db file sequential read' ela= 0 p1=1 p2=6517 p3=1
 WAIT #103: nam='db file sequential read' ela= 0 p1=1 p2=5347 p3=1
 FETCH #103:c=0,e=1,p=3,cr=15,cu=0,mis=0,r=1,dep=2,og=4,tim=116694745

This fetch call motivated exactly three wait events. We know that the c, e, and ela times shown here should be related by the approximation:

e ≈ c + Σ ela

On a low-load system, the amount by which the two sides of this approximation are unequal is an indication of the total quantization error present in the five measurements (one c value, one e value, and three ela values):

E = e - (c + Σ ela) = 1 cs - (0 + 0 + 0 + 0) cs = 1 cs

Given that individual gettimeofday calls account for only a few microseconds of measurement intrusion error on most systems, quantization error is the prominent factor contributing to the 1-centisecond (cs) "gap" in the trace data.

The following Oracle8i trace file excerpt shows the simplest possible over-counting of elapsed time, resulting in a negative amount of unaccounted-for time:

 WAIT #96: nam='db file sequential read' ela= 0 p1=1 p2=1691 p3=1
 FETCH #96:c=1,e=0,p=1,cr=4,cu=0,mis=0,r=1,dep=1,og=4,tim=116694789

Here, we have E = -1 cs:

E = e - (c + Σ ela) = 0 cs - (1 cs + 0 cs) = -1 cs

In this case of a "negative gap" like the one shown here, we cannot appeal to the effects of measurement intrusion for explanation; the measurement intrusion effect can create only positive unaccounted-for time. It might have been possible that a CPU consumption double-count had taken place; however, this isn't the case here, because the value ela= 0 means that no CPU time was counted in the wait event at all. In this case, quantization error has had a dominating influence, resulting in the over-attribution of time within the fetch.

Although Oracle9i uses improved output resolution in its timing statistics, Oracle9i is by no means immune to the effects of quantization error, as shown in the following trace file excerpt with E > 0:

 WAIT #5: nam='db file sequential read' ela= 11597 p1=1 p2=42463 p3=1
 FETCH #5:c=0,e=12237,p=1,cr=3,cu=0,mis=0,r=1,dep=2,og=4,tim=1023745094799915

In this example, we have E = 640 µs:

E = e - (c + Σ ela) = 12,237 µs - (0 µs + 11,597 µs) = 640 µs

Some of this error is certainly quantization error (it's impossible that the total CPU consumption of this fetch was actually zero). A few microseconds are the result of measurement intrusion error.

Finally, here is an example of an E < 0 quantization error in Oracle9i trace data:

 WAIT #34: nam='db file sequential read' ela= 16493 p1=1 p2=33254 p3=1
 WAIT #34: nam='db file sequential read' ela= 11889 p1=2 p2=89061 p3=1
 FETCH #34:c=10000,e=29598,p=2,cr=5,cu=0,mis=0,r=1,dep=3,og=4,tim=1017039276445157

In this case, we have E = -8784 µs:

E = e - (c + Σ ela) = 29,598 µs - (10,000 µs + 16,493 µs + 11,889 µs) = -8,784 µs

It is possible that some CPU consumption double-counting has occurred in this case. It is also likely that the effect of quantization error is a dominant contributor to the attribution of time to the fetch call. The 8,784-µs over-attribution is evidence that the actual total CPU consumption of the database call was probably only about (10,000 - 8,784) µs = 1,216 µs.

7.7.5 Bounds of Quantization Error

The amount of quantization error present in Oracle's timing statistics cannot be measured directly. However, the statistical properties of quantization error can be analyzed in extended SQL trace data. First, there's a limit to how much quantization error there can be in a given set of trace data. It is easy to imagine the maximum quantization error that a set of elapsed durations like Oracle's e and ela statistics might contribute. The worst total quantization error for a sequence of e and ela statistics occurs when all the individual quantization errors are at their maximum magnitude and the signs of the quantization errors all line up.

Figure 7-9 exhibits the type of behavior that I'm describing. This drawing depicts eight very-short-duration system calls that happen to all cross an interval timer's clock ticks. The actual duration of each event is practically zero, but the measured duration of each event is one clock tick. The total actual duration of the system calls shown is practically zero, but the total measured duration is 8 clock ticks. For this set of n = 8 system calls, the total quantization error is essentially n·r_x, where r_x is, as described previously, the resolution of the interval timer upon which the x characteristic is measured.

Figure 7-9. A worst-case type scenario for the accumulation of quantization error for a sequence of measured durations
figs/oop_0709.gif

It shouldn't take you long to notice that the situation in Figure 7-9 is horribly contrived to suit my purpose of illustrating a point. For things to work out this way in reality is extremely unlikely. The probability that n quantization errors will all have the same sign is only 0.5^n. The probability of having n = 8 consecutive negative quantization errors is only 0.00390625 (that's only about four chances in a thousand). There's less than one chance in 10^80 that n = 265 statistics will all have quantization errors with the same sign.

For long lists of elapsed duration statistics, it is virtually impossible for all the quantization errors to "point in the same direction." Yet, my contrivance in Figure 7-9 goes even further. It assumes that the magnitude of each quantization error is maximized. The odds of this happening are even more staggeringly slim than for the signs to line up. For example, the probability that the magnitude of each of n given quantization error values exceeds 0.9 is only (1 - 0.9)^n. The odds of having each of n = 265 quantization error magnitudes exceed 0.9 are one in 10^265.

What Does "One Chance in Ten to [Some Large Power]" Mean?

To put the probability "one chance in 10^80" into perspective, realize that scientists estimate that there are only about 10^80 atoms in the observable universe (source: http://www.sunspot.noao.edu/sunspot/pr/answerbook/universe.html, http://www.sciencenet.org.uk/database/Physics/0107/p01539d.html, and others). This means that if you could print 265 uniformly distributed random numbers between -1 and +1 on every atom in our universe, you should expect that only one such atom would have all 265 numbers on it with all the same sign.

The other probability, "one chance in 10^265," is even more mind-boggling to imagine. To do it, imagine nesting universes three levels deep. That is, imagine that every one of the 10^80 atoms in our universe is itself a universe with 10^80 universes in it, and that each of those universes contains 10^80 atoms. At that point, you'd have enough atoms to imagine one occurrence of a "one chance in 10^240" atom. Even in universes nested three levels deep, the odds of finding an atom with all 265 of its random numbers exceeding 0.9 in magnitude would still be only one chance in 10,000,000,000,000,000,000,000,000.

For n quantization errors to all have the same sign and all have magnitudes greater than m , the probability is the astronomically unlikely product of both probabilities I've described:

P(n quantization error values are all greater than m or all less than -m) = (0.5)^n (1 - m)^n

Quantization errors for elapsed durations (like Oracle e and ela statistics) are random numbers in the range:

-r_x < E < r_x

where r_x is the resolution of the interval timer from which the x statistic (where x is either e or ela) is obtained.

Because negative and positive quantization errors occur with equal probability, the average quantization error for a given set of statistics tends toward zero, even for large trace files. Using the central limit theorem developed by Pierre Simon de Laplace in 1810, you can even predict the probability that quantization errors for Oracle e and ela statistics will exceed a specified threshold for a trace file containing a given number of statistics.
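
To illustrate the kind of prediction the central limit theorem permits for the e and ela statistics alone, here is a sketch under simplifying assumptions (each quantization error treated as an independent uniform random variable on (-r, r), with no contribution from c; the 1-cs resolution, the 10,000-statistic count, and the 1-second threshold are hypothetical inputs). The sum of n such errors is approximately normal with standard deviation r·sqrt(n/3), so the probability that the total error exceeds a threshold t is roughly 2(1 - Φ(t/σ)):

    #include <math.h>
    #include <stdio.h>

    /* Standard normal cumulative distribution function. */
    static double Phi(double z) { return 0.5 * (1.0 + erf(z / sqrt(2.0))); }

    int main(void)
    {
        double r = 0.01;      /* timer resolution: 1 cs, in seconds                */
        int    n = 10000;     /* number of e and ela statistics in the trace file  */
        double t = 1.0;       /* threshold of interest: 1 second of total error    */

        /* Sum of n independent uniform(-r, r) errors: variance = n * r*r / 3. */
        double sigma = r * sqrt(n / 3.0);
        double p = 2.0 * (1.0 - Phi(t / sigma));

        printf("sigma = %.4f s, P(|total error| > %.2f s) = %.3g\n", sigma, t, p);
        return 0;
    }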

I've begun work to compute the probability that a trace file's total quantization error (including the error contributed by c statistics) will exceed a given threshold; however, I have not yet completed that research. The problem in front of me is to calculate the distribution of the quantization error produced by c , which, as I've said already, is complicated by the nature of how c is tallied by polling. I intend to document my research in this area in a future project.

Happily, there are several pieces of good news about quantization error that make not yet knowing how to quantify it quite bearable:

  • In the many hundreds of Oracle trace files that we have analyzed at www.hotsos.com, it has been extremely uncommon for a properly collected (see Chapter 6) file's total unaccounted-for duration to exceed about 10% of total response time.

  • In spite of the possibilities afforded by both quantization error and CPU consumption double-counting, it is apparently extremely rare for a trace file to contain negative unaccounted-for time whose magnitude exceeds about 10% of total response time.

  • In cases where unaccounted-for time accounts for more than 25% of a properly collected trace file's response time, the unaccounted-for time is almost always caused by one of the two remaining phenomena that I'll discuss in the following sections.

  • The presence of quantization error has not yet hindered our ability to properly diagnose the root causes of performance problems by analyzing only Oracle extended SQL trace data, even in Oracle8i trace files in which all statistics are reported with only one-centisecond resolution.

  • Quantization error becomes even more of a non-issue in Oracle9i with the improvement in statistical resolution.

Sometimes, the effect of quantization error can cause loss of faith in the validity of Oracle's trace data. Perhaps nothing can be more damaging to your morale in the face of a tough problem than to gather the suspicion that the data you're counting on might be lying to you. A firm understanding of the effects of quantization error is possibly your most important tool in keeping your faith.


   