7.5 Measurement Intrusion Effect

Any software application that attempts to measure the elapsed durations of its own subroutines is susceptible to a type of error called measurement intrusion effect [Malony et al. (1992)]. Measurement intrusion effect occurs because the execution duration of a measured subroutine differs from the execution duration of the same subroutine when it is not being measured. In recent years, I have not had reason to suspect that measurement intrusion effect has meaningfully influenced any Oracle response time measurement I've analyzed. However, understanding the effect has helped me fend off illegitimate arguments against the reliability of Oracle operational timing data.

To understand measurement intrusion, imagine the following problem. You have a program called U, which looks like this:

program U {
    # uninstrumented
    do_something;
}

Your goal is to find out how much time is consumed by the subroutine called do_something. So you instrument your program U, resulting in a new program called I:

program I {
    # instrumented
    e0 = gettimeofday;  # instrumentation
    do_something;
    e1 = gettimeofday;  # instrumentation
    printf("e=%.6f sec\n", (e1-e0)/1E6);
}

You would expect this new program I to print the execution duration of do_something. But the value it prints is only an approximation of do_something's runtime duration. The value being printed, e1 - e0 converted to seconds, contains not just the duration of do_something, but the duration of one gettimeofday call as well. The picture in Figure 7-3 shows why.

Figure 7-3. The elapsed time e1 - e0 is only an approximation of the duration of do_something; the duration also includes the total execution duration (shaded area) of one gettimeofday call
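By way of illustration, here is a runnable Perl rendition of program I. It is only a sketch of mine, not from the Oracle kernel: the do_something body (a simple busy loop) is a hypothetical stand-in for whatever subroutine you want to measure, and Time::HiRes supplies the gettimeofday calls.

 #!/usr/bin/perl
 # A runnable sketch of program I. The do_something body is a
 # hypothetical stand-in; replace it with the subroutine you care about.
 use strict;
 use warnings;
 use Time::HiRes qw(gettimeofday tv_interval);

 sub do_something {
     my $x = 0;
     $x += sqrt($_) for 1 .. 100_000;   # arbitrary work to be timed
     return $x;
 }

 my $e0 = [gettimeofday];   # instrumentation
 do_something();
 my $e1 = [gettimeofday];   # instrumentation
 printf "e=%.6f sec\n", tv_interval($e0, $e1);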

The impact of measurement intrusion effect upon program U is the following:

  • Execution time of I includes two gettimeofday code paths more than the execution time of U.

  • The measured duration of do_something in I includes one full gettimeofday code path more than do_something actually consumes.

This impact is minimal for applications in which the duration of one gettimeofday call is small relative to the duration of whatever do_something-like subroutine you are measuring. However, on systems with inefficient gettimeofday implementations (I believe that HP-UX versions prior to release 10 could be characterized this way), the effect could be meaningful.

Measurement intrusion effect is a type of systematic error. A systematic error is the result of an experimental "mistake" that is consistent across measurements [Lilja (2000)]. The consistency of measurement intrusion makes it possible to compute its influence upon your data. For example, to quantify the Oracle kernel's measurement intrusion effect introduced by gettimeofday calls, you need two pieces of data:

  • The number of timer calls that the Oracle kernel makes for a given operation.

  • The expected duration of a single timer call.

Once you know the frequency and average duration of your Oracle kernel's timer calls, you have everything you need to quantify their measurement intrusion effect. Measurement intrusion is probably one reason for the missing time that you will encounter when performing an Oracle9i clock-walk (Chapter 5).
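The arithmetic is simple enough to express in a few lines of Perl. The following sketch uses hypothetical input values: the call count would come from a tool such as strace, and the average latency from Example 7-4; none of the numbers below comes from a real Oracle trace.

 #!/usr/bin/perl
 # Sketch: estimate measurement intrusion as the product of the number
 # of timer calls and the expected duration of one timer call.
 # All input values below are hypothetical, for illustration only.
 use strict;
 use warnings;

 my $timer_calls = 10_000;        # call count, e.g., from strace output
 my $avg_latency = 0.000002269;   # seconds per call, e.g., from Example 7-4
 my $measured_e  = 3.2;           # measured elapsed duration in seconds

 my $intrusion = $timer_calls * $avg_latency;
 printf "estimated intrusion: %.6f seconds (%.3f%% of measured time)\n",
     $intrusion, 100 * $intrusion / $measured_e;
 printf "corrected duration:  %.6f seconds\n", $measured_e - $intrusion;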

Finding these two pieces of data is not difficult. You can use the strace tool for your platform to find out how many timer calls your Oracle kernel makes for a given set of database operations. To compute the expected duration of one timer call, you can use a program like the one shown in Example 7-4. This code measures the distance between adjacent gettimeofday calls and then computes their average duration over a sample size of your choosing.

Example 7-4. Measuring the measurement intrusion effect of calls to gettimeofday
 #!/usr/bin/perl
 # $Header: /home/cvs/cvm-book1/measurement0intrusion/mef.pl,v 1.4 2003/03/19 04:38:48 cvm Exp $
 # Cary Millsap (cary.millsap@hotsos.com)
 # Copyright (c) 2003 by Hotsos Enterprises, Ltd. All rights reserved.

 use strict;
 use warnings;
 use Time::HiRes qw(gettimeofday);

 sub fnum($;$$) {
     # return string representation of numeric value in
     # %.${precision}f format with specified separators
     my ($text, $precision, $separator) = @_;
     $precision = 0   unless defined $precision;
     $separator = "," unless defined $separator;
     $text = reverse sprintf "%.${precision}f", $text;
     $text =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$separator/g;
     return scalar reverse $text;
 }

 my ($min, $max) = (100, 0);
 my $sum = 0;
 print "How many iterations? ";
 my $n = <>;
 print "Enter 'y' if you want to see all the data: ";
 my $all = <>;
 for (1 .. $n) {
     my ($s0, $m0) = gettimeofday;
     my ($s1, $m1) = gettimeofday;
     my $sec = ($s1 - $s0) + ($m1 - $m0)/1E6;
     printf "%0.6f\n", $sec if $all =~ /y/i;
     $min = $sec if $sec < $min;
     $max = $sec if $sec > $max;
     $sum += $sec;
 }
 printf "gettimeofday latency for %s samples\n", fnum($n);
 printf "\t%0.6f    seconds minimum\n", $min;
 printf "\t%0.6f    seconds maximum\n", $max;
 printf "\t%0.9f seconds average\n", $sum/$n;

On my Linux research system (800MHz Intel Pentium), this code reveals typical gettimeofday latencies of about 2 μs:

 Linux$ mef
 How many iterations? 1000000
 Enter 'y' if you want to see all the data: n
 gettimeofday latency for 1,000,000 samples
         0.000001    seconds minimum
         0.000376    seconds maximum
         0.000002269 seconds average

Measurement intrusion effect depends greatly upon the operating system implementation. For example, on my Microsoft Windows 2000 laptop computer (also an 800MHz Intel Pentium), gettimeofday causes more than 2.5 times as much measurement intrusion effect as on the Linux system, with an average of almost 6 μs:

 Win2k$ perl mef.pl
 How many iterations? 1000000
 Enter 'y' if you want to see all the data: n
 gettimeofday latency for 1,000,000 samples
         0.000000    seconds minimum
         0.040000    seconds maximum
         0.000005740 seconds average

By experimenting with system calls in this manner, you can begin to understand some of the constraints under which the kernel developers at Oracle Corporation must work. Measurement intrusion effect is why developers tend to create timing instrumentation only for events whose durations are long relative to the duration of the measurement intrusion. The tradeoff is to provide valuable timing information without debilitating the performance of the application being measured.
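As an illustration of that tradeoff, here is a sketch of my own (not Oracle kernel code) of what a timing wrapper might look like in Perl. Wrapping an event in a pair of gettimeofday calls is worthwhile only when the event's expected duration dwarfs the roughly 2-6 μs cost of the timer calls themselves; the "read and sort" event below is hypothetical.

 #!/usr/bin/perl
 # Sketch (illustrative only): a timing wrapper for coarse-grained events.
 use strict;
 use warnings;
 use Time::HiRes qw(gettimeofday tv_interval);

 sub timed {
     my ($name, $code) = @_;
     my $t0 = [gettimeofday];   # instrumentation
     $code->();                 # the event being measured
     printf "event '%s': ela=%.6f sec\n", $name, tv_interval($t0);
 }

 # A hypothetical coarse-grained event; ~2-6 us of intrusion is negligible.
 timed("read and sort", sub {
     my @x = sort { $a <=> $b } map { rand } 1 .. 50_000;
 });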


   