To understand a fourth cause of unaccounted-for time in a properly collected Oracle trace file, let's perform a brief thought experiment. Imagine a program P that consumes exactly ten seconds of user-mode CPU time and then produces output on your terminal. Imagine that you were to run this program in a loop on a system with one CPU. If you were the only user connected to the system, you should expect a response time of ten seconds for each execution of P.
If you were to observe the CPU utilization of this single-CPU system during your single-user repetitive execution of P, you would probably notice your CPU running at 100% utilization for the duration of your job. But what if you were to add a second instance of the P loop on the single-CPU system? In any ten-second elapsed time interval, there is only ten seconds' worth of CPU capacity available on the single-CPU machine. We thus cannot expect to accomplish two complete executions of a program that consumes ten seconds of CPU capacity in one ten-second interval. We would expect the response time of each P to increase to roughly 20 seconds. That's how long it would take for one CPU to provide ten seconds' worth of CPU capacity to each of two competing processes, if its capacity were dispensed fairly and in small time slices to the two competing processes.
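To put a number on that expectation: under fair time-slicing, a simple back-of-envelope model (my sketch, not a formula from the text) says that n identical CPU-bound processes sharing one CPU should each see a response time of roughly n times their CPU requirement:

    R ≈ n × c        (e.g., n = 2 and c = 10s gives R ≈ 20s)

This ignores scheduler overhead, which is negligible relative to ten-second CPU bursts.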
Let's assume that we had instrumented our code in a manner similar to how the Oracle kernel does it, as I've shown in Example 7-6.
    e0 = gettimeofday;
    c0 = getrusage;
    P;                 # remember, P makes no system calls
    c1 = getrusage;
    e1 = gettimeofday;
    e = e1 - e0;
    c = (c1.stime + c1.utime) - (c0.stime + c0.utime);
    printf "e=%.0fs, c=%.0fs\n", e, c;
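If you'd like to try this yourself, here is a minimal runnable C rendering of the pseudocode above. The do_p() function and its loop bound are my stand-ins for the hypothetical program P; they are not part of the original example.

    #include <stdio.h>
    #include <sys/time.h>
    #include <sys/resource.h>

    static double tv_seconds(struct timeval t) {
        return t.tv_sec + t.tv_usec / 1e6;
    }

    /* Hypothetical stand-in for P: burn user-mode CPU making no
       system calls.  The loop bound is arbitrary; tune it so the
       burn takes about ten seconds on your hardware. */
    static void do_p(void) {
        volatile double x = 0.0;
        for (long i = 0; i < 2000000000L; i++)
            x += 1.0;
    }

    int main(void) {
        struct timeval e0, e1;
        struct rusage  c0, c1;

        gettimeofday(&e0, NULL);
        getrusage(RUSAGE_SELF, &c0);
        do_p();
        getrusage(RUSAGE_SELF, &c1);
        gettimeofday(&e1, NULL);

        /* e is wall-clock elapsed time; c is system-mode plus
           user-mode CPU time attributed to this process. */
        double e = tv_seconds(e1) - tv_seconds(e0);
        double c = (tv_seconds(c1.ru_stime) + tv_seconds(c1.ru_utime))
                 - (tv_seconds(c0.ru_stime) + tv_seconds(c0.ru_utime));
        printf("e=%.0fs, c=%.0fs\n", e, c);
        return 0;
    }

Compile it, run one copy, and then run two copies concurrently on a single-CPU machine; you should see approximately the first two rows of Table 7-2.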
Then we should expect approximately the timing output shown in Table 7-2 for each given concurrency level on a single-CPU system. You should expect our program P to consume the same amount of total CPU capacity regardless of how busy the system is. But of course, since the CPU capacity of the system is shared more and more thinly across users as the concurrency level increases, you should expect the program to execute for longer and longer elapsed times before it can obtain the ten seconds of CPU time that it needs.
Table 7-2. Timing output of P at each concurrency level on a single-CPU system

| Number of users running P concurrently | Timing output |
| --- | --- |
| 1 | e=10s, c=10s |
| 2 | e=20s, c=10s |
| 3 | e=30s, c=10s |
| 4 | e=40s, c=10s |
The table shows what we expected to see, but notice that we have now created a "problem with missing time" in some of our measurements. Remember our performance model: the elapsed time of a program is supposed to approximately equal the time spent consuming CPU capacity plus the time spent executing instrumented "wait events," as in:

    e ≈ c + Σ ela

However, even in the two-user case, this works out to:

    20 ≈ 10 + 0
We can use the substitution Σ ela = 0 because we know our program executes no instrumented "wait events" whatsoever. All our program does is consume some CPU capacity and then print the result. (And even the printf statement can be eliminated as a suspect, because the call to it occurs outside the domain of the timer calls.) As you can plainly see, c + Σ ela = 10 is a really lousy approximation of e = 20. Where did the "missing time" go? In our Table 7-2 cases, the problem just keeps getting worse as user concurrency increases. Where have we gone wrong in our instrumentation of P?
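To quantify the gap (my arithmetic, extrapolated from Table 7-2): the missing time is the difference between elapsed time and accounted-for time, and at concurrency level n it is

    e − (c + Σ ela) = 10n − (10 + 0) = 10(n − 1) seconds

that is, 10 seconds missing at two users, 20 seconds at three, and 30 seconds at four.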
The answer is easy to understand after looking again at Figure 7-1. Recall that even when a process is executing happily along in user running state, a system clock interrupt occurs on most systems every 1/100th of a second. This regularly scheduled interrupt transitions each running process into the kernel running state to service the interrupt. Once in kernel running state, the process saves its current context and then executes the scheduler subroutine (see Section 7.1.2 earlier in this chapter). If there is a process in the ready to run state, then the system's scheduling policy may dictate that the running process must be preempted and the ready to run process be scheduled and given an opportunity to consume CPU capacity itself.
When this happens, note what happens to our originally running process. When it is interrupted, it is transitioned immediately to kernel running state. Note in particular that the process does not get the chance to execute any application code to see what time it is when this happens. When the process is preempted, it is transitioned to ready to run state, where it waits until it is scheduled. When it is finally scheduled (perhaps a mere 10 milliseconds later), it spends enough time in kernel running state to reinstate its context, and then it returns to user running state, right where it left off.
How did the preemption activity affect the timing data for the process? The CPU time spent in kernel running state while the scheduler was preparing for the process's preemption counts as CPU time consumed by the process. But time spent in ready to run state does not count as CPU time consumed by the process. However, when the process had completed its work on P, the difference e = e1 - e0 of course included all time spent in all states of the process state transition diagram. The net effect is that all the time spent in ready to run state continues to tally on the e clock, but it is not accounted for as CPU capacity consumption, or as anything else the application is aware of, for that matter. It's as if the process had been conked on the head and then awakened, with no way to account for what happened to the time it spent unconscious.
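Although the preempted process cannot observe the time it spent "unconscious," Linux and BSD-derived systems do count how many times it happened. Here is a minimal C sketch (my illustration, not from the original text) using the ru_nivcsw field of struct rusage, which counts involuntary context switches, that is, preemptions:

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void) {
        struct rusage r0, r1;
        getrusage(RUSAGE_SELF, &r0);

        volatile double x = 0.0;               /* arbitrary user-mode CPU burn */
        for (long i = 0; i < 2000000000L; i++)
            x += 1.0;

        getrusage(RUSAGE_SELF, &r1);
        /* Each involuntary context switch is one trip through
           kernel running -> ready to run -> back to running. */
        printf("involuntary context switches: %ld\n",
               r1.ru_nivcsw - r0.ru_nivcsw);
        return 0;
    }

Run it alone and then alongside another CPU-bound process: the preemption count should jump in the contended case, even though neither e nor c reveals where the extra elapsed time went.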
This is what happens to each process as more concurrent processes are added to the mix shown in Table 7-2. Of course, the more processes there are waiting in ready to run state, the longer each process must wait for its turn on the CPU. The longer the wait, the longer the program's elapsed time. For three and four users, the unaccounted-for time increases proportionally. It's simply a problem of a constant-sized pie (CPU capacity) being divvied up among more and more pie-eaters (users running P). There's really nothing in need of "repair" in the instrumentation of P. What you need is to understand how to estimate how much of the unaccounted-for time is likely to be due to time spent not executing.
The existence and exact size of this gap are of immense value to the Oracle performance analyst. The size of the gap permits you to use extended SQL trace data to identify when performance problems are caused by excessive swapping or by time spent in the CPU run queue, as you shall see in one of the case studies in Chapter 12.