12.2 Case 2: Large CPU Service Duration

One of our first www.hotsos.com customers was an Oracle Financials and Manufacturing Applications site. The customer was experiencing poor response times in several different programs, some stock and some custom. This customer engaged us not to fix performance problems, but to teach their staff how to do it. Job one was to teach the customer's new performance analyst how to collect and use good diagnostic data. Job two was to hack our own pre-beta Sparky and Hotsos Profiler software into usable shape so the customer could use it after we had gone home. It was a nice experience for us. The new performance analyst was an applications administrator who had never actually done much performance improvement work before this project.

Over the course of a couple of months, our contact with the new performance analyst dwindled from daily phone calls to weekly emails. One day we received a call just to say hello and to brag a little bit about one of his day's accomplishments. This is the story of that accomplishment.

12.2.1 Targeting

Over the previous few weeks, our friend had done an excellent job of working down his company's list of targeted slow user actions. He explained that he had actually reached the point where the conspicuous absence of performance complaints left him with more free time at work than he was accustomed to having. So, with some of his free time, he had decided to investigate why a particular batch job had always taken so long. (Remember, the new performance analyst had worked closely with the applications before this performance improvement project, so he knew first-hand how long this thing took.) So he traced the program. Example 12-3 shows the resource profile for the trace file.

Example 12-3. Resource profile for Oracle Purchasing program
 Response Time Component                Duration        # Calls     Dur/Call
 ----------------------------- ----------------- -------------- ------------
 CPU service                     1,527.5s  60.8%        158,257    0.009652s
 db file sequential read           432.0s  17.2%         62,495    0.006913s
 unaccounted-for                   209.6s   8.3%
 global cache lock s to x           99.9s   4.0%          3,434    0.029083s
 global cache lock open s           85.9s   3.4%          3,507    0.024502s
 global cache lock open x           57.9s   2.3%          1,930    0.029990s
 latch free                         26.8s   1.1%          1,010    0.026505s
 SQL*Net message from client        19.1s   0.8%          6,714    0.002846s
 write complete waits               11.1s   0.4%            155    0.071806s
 enqueue                            11.1s   0.4%            330    0.033606s
 row cache lock                     11.1s   0.4%            485    0.022887s
 log file switch completion          7.3s   0.3%             15    0.487333s
 log file sync                       3.3s   0.1%             39    0.084872s
 wait for DLM latch                  3.0s   0.1%             91    0.032418s
 global cache lock busy              1.5s   0.1%             11    0.139091s
 DFS lock handle                     1.4s   0.1%             43    0.032558s
 global cache lock null to x         0.9s   0.0%              8    0.112500s
 rdbms ipc reply                     0.6s   0.0%              7    0.081429s
 global cache lock null to s         0.4s   0.0%              7    0.060000s
 library cache pin                   0.1s   0.0%              7    0.015714s
 SQL*Net message to client           0.0s   0.0%          6,714    0.000003s
 file open                           0.0s   0.0%             13    0.000000s
 SQL*Net more data from client       0.0s   0.0%              2    0.000000s
 ----------------------------- ----------------- -------------- ------------
 Total                           2,510.5s 100.0%

As you can see, CPU service and database file reading dominate the profile with almost 80% of the total response time. Roughly 10% more of the response time is consumed by global cache lock operations required by Oracle Parallel Server, and the final 10% was distributed over a few percentage points of unaccounted-for time and lots of inconsequential events.

There's no way to tell, by looking only at the resource profile, whether the CPU service consumption shown here is excessive, but making a material impact upon the almost 42-minute response time will certainly require reducing the duration of the CPU service component. The first question you answer in a case like this is, "Which SQL is responsible for this CPU service consumption?" The Hotsos Profiler makes this task particularly easy by providing a section in its output that lists the top five SQL statements contributing to each response time component, as shown in Example 12-4.

Example 12-4. The Hotsos Profiler identifies the contribution to CPU service duration by SQL statement
 SQL Statement Id             Duration
 ------------------ ------------------
 704365403            1,066.4s   69.8%
 3277176312             371.9s   24.3%
 1107640601               8.5s    0.6%
 3705838826               6.5s    0.4%
 529440951                6.0s    0.4%
 111 others              68.7s    4.5%
 ------------------ ------------------
 Total                1,527.5s  100.0%

Two SQL statements completely dominate the session's consumption of CPU capacity. In the Hotsos Profiler output, each statement ID is a hyperlink that takes you to the data shown in Example 12-5. With Oracle's tkprof utility, you can accomplish the same task by specifying the sort order sort=prscpu,execpu,fchcpu. With this sort order, the SQL statement that you're searching for will show up at the top of the output.

Example 12-5. SQL text and performance statistics for statement 704365403, the top contributor to the session's CPU service consumption
 Statement Text
 update po_requisitions_interface set requisition_header_id=:b0
  where (req_number_segment1=:b1 and request_id=:b2)

 Statement Cumulative Database Call Statistics

         Cursor  Action  ------ Response Time -------         LIO        PIO
 Action   Count   Rows    Elapsed       CPU    Other      Blocks     Blocks
 ------- ------ ------  --------- --------- --------  ---------- ----------
 Parse        0      0        0.0       0.0      0.0           0          0
 Execute  1,166      0    1,455.0   1,066.4    388.6   8,216,887      3,547
 Fetch        0      0        0.0       0.0      0.0           0          0
 ------- ------ ------  --------- --------- --------  ---------- ----------
 Total    1,166      0    1,455.0   1,066.4    388.6   8,216,887      3,547

 Per Exe      1      0        1.3       0.9      0.3       7,047          3
 Per Row  1,166      1    1,455.0   1,066.4    388.6   8,216,887      3,547

The information in Example 12-5 is quite revealing. Here are some interesting observations:

  • The statement that contributes the most CPU time to the session's response time is a very simple UPDATE that is executed 1,166 times.

  • However, 1,166 executions of this UPDATE statement never processed a single row.

  • Each execution required an average of 7,047 LIO operations (that's 8,216,887 LIOs divided by 1,166 executions) to determine that no rows matched the statement's simple WHERE clause predicate.

  • The database buffer cache hit ratio for this statement is very "good." It is:

    (8,216,887 - 3,547) / 8,216,887 ≈ 0.9996, or about a 99.96% buffer cache hit ratio

    Ironically, one of the reasons that this statement never percolated to the top of the system's performance analysts' priority list may have been that its cache hit ratio was so good that the system's performance monitoring tools regarded the statement as exemplary. (The sketch following this list recomputes these figures.)
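
The arithmetic behind these figures is easy to reproduce. The following one-row query is a small worked check that recomputes both the cache hit ratio and the LIO count per execution directly from the Example 12-5 statistics; it introduces nothing beyond those published numbers:

 -- Recompute the Example 12-5 figures: buffer cache hit ratio and LIOs per execution
 select round((8216887 - 3547) / 8216887, 4) as buffer_cache_hit_ratio,  -- approximately 0.9996
        round(8216887 / 1166)                as lio_per_execution        -- approximately 7,047
   from dual;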

12.2.2 Diagnosis and Response

In Chapter 11, I describe a simple rule of thumb about LIO call counts: if a SQL statement requires more than about ten LIO operations per row returned per table in the FROM clause, then the statement is probably doing too many LIOs. Well, this UPDATE statement isn't a query with a FROM clause, but nevertheless, it does execute much of the same Oracle kernel code path as would the following query:

 select requisition_header_id
   from po_requisitions_interface
  where (req_number_segment1=:b1 and request_id=:b2)

How many LIOs should be required to determine that this query returns no rows? My estimate is fewer than ten. Here's why: if a composite index existed upon the two columns REQ_NUMBER_SEGMENT1 and REQUEST_ID, then the Oracle kernel should be able to determine that the query returns no rows by simply plunging the index from root to leaf. The number of LIO operations required to execute this plunge is the height of the index. The height of an index is its BLEVEL value (for example, from DBA_INDEXES, for index segments you have analyzed) plus one. The most enormous indexes I've ever heard of have heights of seven or less. Therefore, you should expect that, with a composite index upon REQ_NUMBER_SEGMENT1 and REQUEST_ID, the number of LIO operations per execution will be seven or fewer.
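
If you want to verify an index's height yourself, a query along the following lines works. It is only a sketch: it assumes access to DBA_INDEXES (use ALL_INDEXES otherwise), and BLEVEL is populated only for indexes whose statistics have been gathered:

 -- Index height = BLEVEL + 1; BLEVEL is populated only after the index is analyzed
 select index_name, blevel, blevel + 1 as height
   from dba_indexes
  where table_name = 'PO_REQUISITIONS_INTERFACE';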

Remember, a database call's CPU consumption is usually proportional to the number of LIO operations it performs. Therefore, if you can reduce a call's number of LIO operations from 7,047 to just 7, then you can expect to reduce the database call's total CPU consumption by a similar factor of roughly 1,000. You can thus expect the LIO reduction to drop the total CPU consumption for executions of this UPDATE statement from 1,066.4 seconds to roughly 1 second. This expected improvement of roughly 1,000 seconds is a big enough chunk of response time reduction that it's worth testing the result at this point. The recommended performance improvement activity is to create a composite index upon the two columns REQ_NUMBER_SEGMENT1 and REQUEST_ID.
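
The repair itself is a single DDL statement along the lines of the following sketch. The index name here is hypothetical; at an Oracle Applications site, a change like this should of course go through your normal naming, testing, and change-control procedures:

 -- A sketch of the recommended repair; the index name is hypothetical
 create index po_requisitions_interface_nx1
     on po_requisitions_interface (req_number_segment1, request_id);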

12.2.3 Results

The total program response time actually dropped by far more than the 1,000-second savings that I forecast. The overachievement came from collateral benefits, including:

  • The second largest SQL contributor to the session's CPU service consumption, statement 3277176312, used the exact same WHERE clause as statement 704365403 used. The index creation thus had a tremendous performance improving effect upon both of the top contributors to the session's total response time.

  • LIO reduction reduces total session workload, not just in the CPU service category but in other categories as well. Notably, if you can eliminate many of a session's database buffer visits, then you will usually eliminate many of the session's motives for performing disk read operations. Eliminating LIO operations almost always produces the collateral benefit of reduced PIO call counts. Reducing LIO call counts can also reduce waits for global cache lock ... events, latch free events, and others.

Creating a new index also creates the risk of collateral damage, however. In this case, the risk of query damage was minimized because the base table was an interface table that was referenced by only a few SQL statements in the application. To be completely thorough when you create a new index (or drop an old one), you should re-check all of your application's execution plans to ensure that any plan changes introduced by the schema change are not harmful. (The Project Laredo tool described at http://www.hotsos.com is one way to accomplish this.)
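
For an individual statement, re-checking the plan after the schema change can be as simple as the following sketch. It assumes Oracle 9i Release 2 or later for DBMS_XPLAN; on earlier releases, you would query PLAN_TABLE directly instead:

 -- Confirm the post-change plan for the statement from Example 12-5
 explain plan for
   update po_requisitions_interface
      set requisition_header_id = :b0
    where (req_number_segment1 = :b1 and request_id = :b2);

 select * from table(dbms_xplan.display);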

12.2.4 Lessons Learned

This case highlights several important ideas:

  • SQL optimization is often simpler than you might expect. The key is in knowing which SQL you need to optimize.

  • The collateral benefit of LIO call reduction is extremely powerful.

  • Creating or dropping an index provides opportunity for both collateral benefit and collateral damage. Mitigating your risk requires analysis of all the potential execution plan changes that the index manipulation might inspire.

  • A SQL statement's database buffer cache hit ratio is not a valid measure of its efficiency.


   