Regularly, we get nice letters from people who are interested in letting us know that our method and tools are helping. This case is the result of one such letter from a friend in Iceland who reduced the response time of a query from 6.5 hours to 10.9 seconds ”and fixed a previously undiscovered functional bug ”by adding four bytes to its SQL text. This is the story of how Method R helped him identify the SQL statement that was causing the problem. In the end, he improved the response time of an important batch job from nearly eight hours to just one hour .
Application targeting in this case was typical. The system owner was a bank. One of the application's batch jobs was taking so long to run that it was unable to finish in its assigned batch window. The job started at 11:00 p.m. each night, and it would sometimes run until noon the next day. The bank was limiting the number of accounts to update so that the batch job would finish before opening hours. So our friend targeted this batch job for the collection of extended SQL trace data. Example 12-11 shows the resource profile for a run that consumed nearly eight hours of run time.
Response Time Component Duration # Calls Dur/Call ----------------------------- ----------------- -------------- ------------ db file scattered read 19,051.1s 68.4% 1,828,249 0.010420s CPU service 6,889.3s 24.7% 959,148 0.007183s db file sequential read 1,892.7s 6.8% 406,417 0.004657s latch free 29.0s 0.1% 1,071 0.027106s log file switch completion 1.6s 0.0% 14 0.112143s SQL*Net message from client 0.3s 0.0% 10 0.034000s log buffer space 0.1s 0.0% 1 0.100000s log file sync 0.1s 0.0% 4 0.022500s file open 0.1s 0.0% 54 0.001296s buffer busy waits 0.0s 0.0% 14 0.002143s undo segment extension 0.0s 0.0% 2,111 0.000014s SQL*Net message to client 0.0s 0.0% 10 0.000000s ----------------------------- ----------------- -------------- ------------ Total 27,864.4s 100.0%
The top lines of Example 12-11 bear the distinct signature of inefficient SQL: lots of file reading and CPU capacity consumption. The next task is to determine which SQL statements are consuming so many resources. In this case, executing tkprof with the option sort =prsdsk,exedsk,fchdsk will produce a report with the SQL having the largest number of PIO blocks at the top.
|
The analyst in this story knew which SQL statement contributed the most to the db file scattered read problem, because he used the Hotsos Profiler contribution table shown in Example 12-12.
SQL Statement Id Duration ------------------ ------------------ 1163242303 19,028.9s 99.9% 1626975503 6.7s 0.0% 808413641 5.2s 0.0% 3187134541 3.7s 0.0% 1594054818 2.4s 0.0% 8 others 4.3s 0.0% ------------------------------------- Total 19,051.1s 100.0%
The hunt thus progressed quickly to the performance analysis of the SQL statement 1163242303 (this is the statement's hv value in the PARSING IN CURSOR section of the raw trace data). Example 12-13 shows the text and performance statistics for this statement.
|
Statement Text SELECT EIGANDI,INNLENT_ERLENT,VERDTRYGGING,SKULDFLOKKUN,VBRTEGUND,FLOKKUR FROM V_SKULDABREF_AVOXTUN WHERE EIGANDI = :b1 AND INNLENT_ERLENT = :b2 AND ((RAFVAETT = :b3 ) OR ((RAFVAETT = :b4 ) AND (INNLAUSN IS NULL OR INNLAUSN > :b5 ) AND (VIDMIDDAGS <= :b6 ))) GROUP BY EIGANDI,INNLENT_ERLENT,VERDTRYGGING,SKULDFLOKKUN,VBRTEGUND,FLOKKUR ORDER BY EIGANDI,INNLENT_ERLENT,VERDTRYGGING,SKULDFLOKKUN,VBRTEGUND,FLOKKUR Statement Cumulative Database Call Statistics Cursor Action ------ Response Time ------- LIO PIO Action Count Rows Elapsed CPU Other Blocks Blocks ------- ------ ------ --------- --------- -------- ---------- ---------- Parse 3,739 0 1.9 0.7 1.2 147 17 Execute 3,739 0 1.7 1.6 0.2 0 0 Fetch 4,212 473 23,466.4 4,135.6 19,330.8 36,566,201 36,550,345 ------- ------ ------ --------- --------- -------- ---------- ---------- Total 11,690 473 23,470.0 4,137.9 19,332.1 36,566,348 36,550,452 Per Exe 1 0 6.3 1.1 5.2 9,780 9,092 Per Row 8 1 49.6 8.8 40.9 77,307 71,868
The SQL text for this statement is not too difficult to understand. In spite of the Icelandic object names , the statement is simply a query from a single object. There's apparently not even a join. However, the statement's execution plan obtained from the trace file's STAT lines and shown in Example 12-14 reveal a different story.
Something a little more complicated than a simple one-table query is going on here. In fact, V_SKULDABREF_AVOXTUN is a view. The raw trace file confirms it. The parse call for the SELECT shown in Example 12-13 required a recursive parse, execute, and fetch calls against VIEW$ , in the same manner as the query from DBA_OBJECTS that I describe in Chapter 5. The definition of the view V_SKULDABREF_AVOXTUN becomes the next target of our attention. One thing that an Oracle extended SQL trace file does not contain is the definition of each view accessed by SQL identified within the trace file. However, because the trace file makes it clear that V_SKULDABREF_AVOXTUN is a view, it's a simple enough matter to query DBA_VIEWS to determine the definition. Example 12-15 shows that definition.
Rows Row Source Operation (Object Id) --------------- ---------------------------------------------------------- 473 SORT ORDER BY 473 SORT GROUP BY 974 VIEW V_SKULDABREF_AVOXTUN 974 SORT UNIQUE 974 UNION-ALL 686 HASH JOIN 886 TABLE ACCESS FULL SKULDABREF_AVOXTUN(21435) 103,247 TABLE ACCESS FULL VBR_FLOKKAR(19409) 288 FILTER 288 HASH JOIN 940 TABLE ACCESS BY INDEX ROWID BREF(19460) 2,649 INDEX RANGE SCAN(20593) 10,744 TABLE ACCESS FULL VBR_FLOKKAR(19409)
CREATE OR REPLACE VIEW v_skuldabref_avoxtun ( eigandi, innlent_erlent, verdtrygging, skuldflokkun, vbrtegund, flokkur, rafvaett, ostadladur, brefnumer, vidmiddags, innlausn, nafnverd, kaupkrafa, ees, vextir ) AS select s.eigandi, s.innlent_erlent, s.verdtrygging, s.skuldflokkun, s.vbrtegund, s.flokkur, 'N' rafvaett, nvl(f.ostadlad,'N') ostadladur, s.brefnumer, s.vidmiddags, s.innlausn, s.nafnverd, s.kaupkrafa, s.ees, null vextir from fjastofn.vbr_flokkar f, fja_pfm.skuldabref_avoxtun s where f.audkenni=s.flokkur and f.rafvaett is null union select s.eig eigandi, 'I', Fja_pfm.Ymis_Foll.TegundTryggingar(f.visitala) verdtrygging, f.skuldaranumer skuldflokkun, f.vbrtegund vbrtegund, s.aud flokkur, 'J' rafvaett, 'N' ostadladur, 'x' brefnumer, to_date(null) vidmiddags, to_date(null) innlausn, s.nav nafnverd, 0 kaupkrafa, '+' ees, null vextir from fjastofn.vbr_flokkar f, fja_pfm.bref s where f.audkenni=s.aud and f.rafvaett is not null /
The analyst discussed the view definition with a developer who understood its business purpose. From that discussion, the analyst and developer were able to prove that the view definition was flawed. Using a UNION of the two SELECT statements in the view definition instead of a UNION ALL was causing two problems:
It caused a very costly but unneeded SORT UNIQUE row source operation. (The new data that the Oracle release 9.2 kernel emits in the STAT lines would better highlight the enormous cost.)
It caused a bug as well, because the UNION erroneously eliminated rows that the application users needed.
Example 12-16 shows the execution plan of the statement shown in Example 12-13 after the view definition was corrected by inserting the bytes ALL into the view definition.
Rows Row Source Operation (Object Id) --------------- ---------------------------------------------------------- 473 SORT ORDER BY 473 SORT GROUP BY 974 VIEW V_SKULDABREF_AVOXTUN 974 UNION-ALL 686 HASH JOIN 886 INDEX RANGE SCAN(21436) 103,314 TABLE ACCESS FULL VBR_FLOKKAR(19409) 288 FILTER 288 HASH JOIN 940 INDEX RANGE SCAN(29887) 10,744 TABLE ACCESS FULL VBR_FLOKKAR(19409)
Example 12-17 shows the more exciting news. The query that had previously consumed 23,470.0 seconds of response time now consumes only 10.9 seconds. It produces the same (actually better) application output; it just consumes about 6.5 fewer hours to do it.
Statement Text SELECT EIGANDI,INNLENT_ERLENT,VERDTRYGGING,SKULDFLOKKUN,VBRTEGUND,FLOKKUR FROM V_SKULDABREF_AVOXTUN WHERE EIGANDI = :b1 AND INNLENT_ERLENT = :b2 AND ((RAFVAETT = :b3 ) OR ((RAFVAETT = :b4 ) AND (INNLAUSN IS NULL OR INNLAUSN > :b5 ) AND (VIDMIDDAGS <= :b6 ))) GROUP BY EIGANDI,INNLENT_ERLENT,VERDTRYGGING,SKULDFLOKKUN,VBRTEGUND,FLOKKUR ORDER BY EIGANDI,INNLENT_ERLENT,VERDTRYGGING,SKULDFLOKKUN,VBRTEGUND,FLOKKUR Statement Cumulative Database Call Statistics Cursor Action ------ Response Time ------- LIO PIO Action Count Rows Elapsed CPU Other Blocks Blocks ------- ------ ------ --------- --------- -------- ---------- ---------- Parse 3,722 0 2.0 0.6 1.4 44 1 Execute 3,722 0 1.3 1.4 -0.1 14 0 Fetch 4,195 473 7.6 2.8 4.8 44,764 792 ------- ------ ------ --------- --------- -------- ---------- ---------- Total 11,639 473 10.9 4.8 6.2 44,822 793 Per Exe 1 0 0.0 0.0 0.0 12 0 Per Row 8 1 0.0 0.0 0.0 95 2
The results were stunning. Example 12-18 shows the resource profile for the job after optimization. Total response time for the eight-hour batch job dropped to slightly more than one hour.
Response Time Component Duration # Calls Dur/Call ----------------------------- ----------------- -------------- ------------ CPU service 2,684.7s 73.9% 953,452 0.002816s db file sequential read 847.6s 23.3% 77,944 0.010874s unaccounted-for 93.2s 2.6% db file scattered read 5.8s 0.2% 295 0.019627s log file switch completion 1.6s 0.0% 7 0.234286s latch free 1.0s 0.0% 362 0.002873s file open 0.1s 0.0% 49 0.002041s log file sync 0.1s 0.0% 7 0.011429s buffer busy waits 0.0s 0.0% 1 0.010000s SQL*Net message from client 0.0s 0.0% 10 0.001000s SQL*Net message to client 0.0s 0.0% 10 0.000000s ----------------------------- ----------------- -------------- ------------ Total 3,634.1s 100.0%
Before correcting the view definition for V_SKULDABREF_AVOXTUN , the bank had restricted the number of accounts upon which they would allow the job to run. Otherwise, the job's execution would violate the morning's online window by several hours. After making the correction, they can feed virtually any parameters they want to the query and it will not take considerably longer to run. Since optimization, the batch job has never even gotten close to the eight-hour runtime that the "before" job required.
This case illustrates the following points:
The resource profile pattern of large db file ... and CPU service durations usually indicates the use of inefficient SQL somewhere within the user action. To fix the problem, you have to find that SQL.
The extended SQL trace file contained exactly the information that the analyst needed to optimize the targeting process for finding the root cause of his batch job's performance problem. Even though the view definition was not present in the trace data, the trace file contained information that focused our attention upon the view definition as the source of the performance problem.
Sometimes, the scrutiny of performance analysis exposes functional bugs . In this case, the performance analyst was able to determine that not only was a specific query doing more work than it should have, it was actually returning an incomplete result set in some circumstances.
Top |