Section 7.3. How Software Measures Itself | Optimizing Oracle Performance

7.3 How Software Measures Itself

Finding out how the Oracle kernel measures itself is not a terribly difficult task. This section is based on studies of Oracle8 i and Oracle9 i kernels running on Linux systems. The Oracle software for your host operating system might use different system calls than our system uses. To find out, you should see how your Oracle system behaves by using a software tool that traces system calls for a specified process. For example, Linux provides a tool called strace to trace system calls. Other operating systems use different names for the tool. There is truss for Sun Solaris ( truss is actually the original system call tracing tool for Unix), sctrace for IBM AIX, tusc for HP-UX, and strace for Microsoft Windows . I shall use the Linux name strace to refer generically to the collection of tools that trace system calls.

At the time of this writing, it is possible to find strace tools for several operating systems at http://www.pugcentral.org/howto/truss.htm.

The strace tool is easy to use. For example, you can observe directly what the Oracle kernel is doing, right when each action takes place, by executing a command like the following:

 $  strace -p 12417  read(7,

In this example, strace shows that a Linux program with process ID 12417 (which happens to have been an Oracle kernel process on my system) has issued a read call and is awaiting fulfillment of that call (hence the absence of the right parenthesis in the output shown here).

It is especially instructive to view strace output and Oracle SQL trace output simultaneously in two windows, so that you can observe exactly when lines are emitted to both output streams. The write calls that the Oracle kernel uses to emit its trace data of course appear in the strace output exactly when you would expect them to. The appearance of these calls makes it easy to understand when Oracle kernel actions produce trace data. In Oracle9 i for Linux (and Oracle9 i for some other operating systems), it is especially easy to correlate strace output and SQL trace output, because values returned by gettimeofday appear directly in the trace file as tim values. By using strace and SQL trace simultaneously in this manner, you can positively confirm or refute whether your Oracle kernel behaves like the pseudocode that you will see in the following sections.

Using strace will introduce significant measurement intrusion effect into the performance of the program you're tracing. I discuss the performance effects of measurement intrusion later in this chapter.

7.3.1 Elapsed Time

The Oracle kernel derives all of its timing statistics from the results of system calls issued upon the host operating system. Example 7-1 shows how a program like the Oracle kernel computes the durations of its own actions.

Example 7-1. How software measures its own response time

 t0 = gettimeofday;   # mark the time immediately before doing something do_something; t1 = gettimeofday;   # mark the time immediately after doing it t = t1 - t0;         # t is the approximate duration of do_something

The gettimeofday function is an operating system call found on POSIX-compliant systems. You can learn by viewing your operating system's documentation that gettimeofday provides a C language data structure to its caller that contains the number of seconds and microseconds that have elapsed since the Epoch, which is 00:00:00 Coordinated Universal Time (UTC), January 1, 1970.

Documentation for such system calls is usually available with your operating system. For example, on Unix systems, you can view the gettimeofday documentation by typing man gettimeofday at the Unix prompt. Or you can visit http://www.unix-systems.org/single_unix_specification/ to view the POSIX definition.

Note that in my pseudocode, I've hidden many mechanical details that I find distracting. For example, gettimeofday doesn't really return a time; it returns 0 for success and -1 for failure. It writes the "returned" time as a two-element structure (a seconds part and a microseconds part) in a location referenced by the caller's first argument in the gettimeofday call. I believe that showing all this detail in my pseudocode would serve only to complicate my descriptions unnecessarily.

Imagine the execution of Example 7-1 on a timeline, as shown in Figure 7-2. In the drawing, the value of the gettimeofday clock is t = 1492 when a function called do_something begins. The value of the gettimeofday clock is t ₁ = 1498 when do_something ends. Thus the measured duration of do_something is t = t ₁ - t = 6 clock ticks .

I've used the time values 1492 through 1499 to keep our discussion simple. These values of course do not resemble the actual second and microsecond values that gettimeofday would return in the twenty-first century. Consider the values I'll discuss in this book to be just the final few digits of an actual clock.

Figure 7-2. The function called do_something begins after clock tick 1492 and ends after clock tick 1498, resulting in a measured response time of 6 clock ticks

The Oracle kernel reports on two types of elapsed duration: the e statistic denotes the elapsed duration of a single database call, and the ela statistic denotes the elapsed duration of an instrumented sequence of instructions (often a system call) executed by an Oracle kernel process. The kernel performs these computations by executing code that is structured roughly like the pseudocode shown in Example 7-2. Notice that the kernel uses the method shown in Example 7-1 as the basic building block for constructing the e and ela metrics.

Example 7-2. Pseudocode showing how the Oracle kernel measures its own run times

 procedure dbcall {     e0 = gettimeofday;      # mark the wall time     ...                     # execute the db call (may call wevent)     e1 = gettimeofday;      # mark the wall time     e = e1 - e0;            # elapsed duration of dbcall     print(TRC, ...);        # emit PARSE, EXEC, FETCH, etc. line }     procedure wevent {     ela0 = gettimeofday;    # mark the wall time     ...                     # execute the wait event here     ela1 = gettimeofday;    # mark the wall time     ela = ela1 - ela0;      # ela is the duration of the wait event     print(TRC, "WAIT...");  # emit WAIT line }

7.3.2 CPU Consumption

The Oracle kernel reports not only the elapsed duration e for each database call and ela for each system call, but also the amount of total CPU capacity c consumed by each database call. In the context of the process state transition diagram shown in Figure 7-1, the c statistic is defined as the approximate amount of time that a process has spent in the following states:

user running

kernel running

On POSIX-compliant operating systems, the Oracle kernel obtains CPU usage information from a function called getrusage on Linux and many other operating systems, or a similar function called times on HP-UX and a few other systems. Although the specifications of these two system calls vary significantly, I will use the name getrusage to refer generically to either function. Each function provides its caller with a variety of statistics about a process, including data structures representing the following four characteristics:

Approximate time spent by the process in user running state
Approximate time spent by the process in kernel running state
Approximate time spent by the process's children in user running state
Approximate time spent by the process's children in kernel running state

Each of these amounts is expressed in microseconds, regardless of whether the data are accurate to that degree of precision.

You'll see shortly that, although by POSIX standard, getrusage returns information expressed in microseconds, rarely does the information contain detail at sub-centisecond resolution.

The Oracle kernel performs c , e , and ela computations by executing code that is structured roughly like the pseudocode shown in Example 7-3. Notice that this example builds upon Example 7-2 by including executions of the getrusage system call and the subsequent manipulation of the results. In a method analogous to the gettimeofday calculations, the Oracle kernel marks the amount of user-mode CPU time consumed by the process at the beginning of the database call, and then again at the end. The difference between the two marks ( c0 and c1 ) is the approximate amount of user-mode CPU capacity that was consumed by the database call. Shortly I'll fill you in on exactly how approximate the amount is.

Example 7-3. Pseudocode showing how the Oracle kernel measures its own run times and CPU consumption

 procedure dbcall {     e0 = gettimeofday;          # mark the wall time     c0 = getrusage;             # obtain resource usage statistics     ...                         # execute the db call (may call wevent)     c1 = getrusage;             # obtain resource usage statistics     e1 = gettimeofday;          # mark the wall time     e = e1 - e0;                # elapsed duration of dbcall     c = (c1.utime + c1.stime)       - (c0.utime + c0.stime);  # total CPU time consumed by dbcall     print(TRC, ...);            # emit PARSE, EXEC, FETCH, etc. line }     procedure wevent {     ela0 = gettimeofday;        # mark the wall time     ...                         # execute the wait event here     ela1 = gettimeofday;        # mark the wall time     ela = ela1 - ela0;          # ela is the duration of the wait event     print(TRC, "WAIT...");      # emit WAIT line }

Top