19.2 Significant Events | Oracle High Performance Tuning for 9i and 10g

< Day Day Up >

Firstly, let's simply pick out significant events from the database I have running.

COL event FORMAT a30; COL waits FORMAT 9999990; COL timeouts FORMAT 99999990; COL average FORMAT 99999990; SELECT event "Event", time_waited "Total Time", total_waits     "Waits"          ,average_wait "Average", total_timeouts "Timeouts" FROM v$system_event WHERE IdleEvent(event) IS NOT NULL ORDER BY event;

Following is the list of significant performance affecting events from my database containing the highly concurrent active Accounts schema.

Tip

The most important factor is the total time spent waiting (Total Time) not the number of waits (Waits).

Event                       Total Time   Waits    Average   Timeouts -------------------------   ----------   ------   -------   -------- LGWR wait for redo copy     29257        14557      2       5854 async disk IO               0            54         0       0 buffer busy waits           426039       131236     3       61 buffer deadlock             50977        8944       6       8944 control file heartbeat      411          1          411     1 control file parallel       7349         8888       1       0    write control file sequential     6335         3632       2       0    read db file parallel write      30626        5694       5       2815 db file scattered read      74492        28648      3       0 db file sequential read     467185       80872      6       0 db file single write        27           54         0       0 direct path read            58           108        1       0 direct path write           14           78         0       0 enqueue                     93255        9865       9       63 instance state change       0            2          0       0 latch free                  110683       29690      4       3254 library cache load lock     303          39         8       0 library cache pin           1894         186        10      0 log file parallel write     34751        21291      2       19359 log file sequential read    42           58         1       0 log file single write       22           58         0       0 log file switch             1189         79         15      0    completion log file sync               150289       15409      10      140 process startup             348          7          50      2 refresh controlfile         24           1          24      0 command row cache lock              19           14         1       0

Tip

A number of new events shown below help to divide up latch and enqueue wait events.

latch activity latch free latch: In memory undo latch latch: KCL gc element parent lat latch: cache buffer handles latch: cache buffers chains latch: cache buffers lru chain latch: checkpoint queue latch latch: enqueue hash chains latch: gcs resource hash latch: ges resource hash list latch: latch wait list latch: library cache latch: library cache lock latch: library cache pin latch: messages latch: parallel query alloc buff latch: redo allocation latch: redo copy latch: row cache objects latch: session allocation latch: shared pool latch: undo global data latch: virtual circuit queues

It makes perfect logical sense that when assessing the impact of a wait event it should be weighed against all other wait events. It is of course true that some wait events are completely debilitating to the database no matter what their percentage of occurrence. One example of this is the occurrence of buffer deadlock events in my previous V$SYSTEM_EVENT query listing. Deadlocks are always a serious problem but are fortunately very unusual. These deadlocks occur because I am trying to stress out my highly active concurrent database deliberately. I am using triggers and hot block table updates on summary columns. I get deadlocks. What a surprise.

Tip

Deadlocks are rare and are usually a result of coding design issues and not high load! However, database performance is often highly dependent on the way in which applications are built.

There are two main approaches when examining events:

Total Wait Time. Some types of events may not occur frequently but their wait times could be significant such that those total wait times exceed the total wait times of more frequently occurring events. Obviously this approach is relative between different events and their possible performance impact.
Polling Over a Period of Time. In this chapter I have used temporary tables and some very simple stored procedures to poll real-time views over a period of time. This type of statistics collection can be executed easily using STATSPACK. The reason for including this level of detail in this book is to promote understanding. Once a reasonable level of understanding is gained, using STATSPACK will be a matter of reading the manual for correct syntax. This book is intended as a performance tuning teaching guide and not a database administration reference manual.

Before we examine significant wait events in detail we need another query. I want to know what the total wait time for each event is, as a percentage measured against every other event. A query like this can give me a better idea, not a solution. Remember we are looking for symptoms of problems not solutions. Here is the query.

COL percentage FORMAT 9999999990; SELECT event "Event", total_waits "Waits", time_waited     "Total Time"          ,TO_CHAR(                (time_waited /                      (SELECT SUM(time_waited) FROM                          v$system_event                      WHERE IdleEvent(event) IS NOT NULL)          )*100, 990.99) "Percentage" FROM v$system_event WHERE IdleEvent(event) IS NOT NULL     ORDER BY event;

Here is the result. Generally I would want to examine in detail anything greater than 1%. More potentially catastrophic wait events such as deadlocks I would examine if greater than zero. Some of the more interesting events are highlighted.

Event                           Waits   Total Time   Percent -----------------------------   ------   ---------   ------- LGWR wait for redo copy         14591    29377       0.37 async disk IO                   54       0           0.00 buffer busy waits               131301   426285      5.40 buffer deadlock                 8948     51000       0.65 control file heartbeat          1        411         0.01 control file parallel write     8892     7360        0.09 control file sequential read    3635     6374        0.08 db file parallel write          5702     30652       0.39 db file scattered read          28648    74492       0.94 db file sequential read         81015    468348      5.93 db file single write            54       27          0.00 direct path read                108      58          0.00 direct path write               78       14          0.00 enqueue                         9872     93266       1.18 instance state change           2        0           0.00 latch free                      29734    110921      1.41 library cache load lock         39       303         0.00 library cache pin               186      1894        0.02 log file parallel write         21323    34836       0.44 log file sequential read        58       42          0.00 log file single write           58       22          0.00 log file switch completion      79       1189        0.02 log file sync                   15437    150731      1.91 process startup                 7        348         0.00 refresh controlfile command     1        24          0.00 row cache lock                  14       19          0.00

What are some significant database events?

Buffer Busy Waits. Waiting for a database buffer cache block to be available; the block is being read in or changed.
Datafile Scattered and Sequential Reads. Reads of datafile physical blocks on disk, where scattered means potentially random (full physical scans) in scattered noncontiguous parts of the buffer and sequential means more focused (index reads).
Direct Path Read and Writes. Passing of sorting and hash joins to temporary sort space on disk.
Free Buffer Waits. No free or clean blocks available in the database buffer cache and waiting for writing to disk of database buffer cache dirty blocks.
Row Cache Lock Waits. Metadata cache waits on schema object definitions.
Library Cache Waits. The library cache contains parsed SQL code, PL/SQL blocks, and optimized query plans.
Redo Log Waits. Redo log wait events.
Rollback and Automated Undo Waits. Wait events can help determine sizes and numbers of rollback segments. Automatic undo is easier to manage but its reliability may be in question due its very recent introduction into Oracle Database. However, I have so far heard only good reports about automated undo.
Enqueue Waits. Locking in the database either of datafile tables and rows or latches busy in buffers.
Latch Free Waits. Waiting to access an area of a buffer; a latch required to access that buffer area is busy.

Now let's examine these significant events in turn.

19.2.1 Buffer Busy Waits

A buffer busy wait occurs when more than one process is attempting to access a single database buffer cache block at once. In other words, the block is not in memory but another process is already reading it into memory or the block is already in memory but currently being changed by another process.

Obviously it is a major problem if many DML statements are attempting to alter something in a block at the same time. Let's query the V$WAITSTAT view to see exactly which types of buffers are causing a problem, and examine a few others as well.

Tip

The V$WAITSTAT performance view shows a breakdown of buffer busy wait events by class of buffer.

SELECT * FROM v$waitstat:     CLASS                   COUNT      TIME ---------------------   ------    ------- data block              141860     474325 sort block                   0          0 save undo block              0          0 segment header             157        195 save undo header             0          0 free list                    0          0 extent map                   0          0 1st level bmb                0          0 2nd level bmb                0          0 3rd level bmb                0          0 bitmap block                 0          0 bitmap index block           0          0 file header block            2         34 unused                       0          0 system undo header           0          0 system undo block            0          0 undo header                614       1487 undo block                  27        103

Data Block. Indicates database buffer cache dirty writes or an inability to read from disk due to contention with other competing requests.
Sort Block. Indicates there is not enough sort space or a temporary sort is either nonexistent on disk or is not defaulted for the current user.
Segment Header. There is contention for the first block of a table or an index. A segment can be a table or index, amongst other database objects.
Free List. There is too much free space maintained for each block.
Extent Map. Extents may be poorly sized if there is too much recursion. Perhaps extent sizes are all different or they are too small. This is likely to be less common using locally managed tablespaces as opposed to dictionary-managed tablespaces.

Tip
Dictionary-managed tablespaces will be deprecated in a future version of Oracle Database.
Bitmap Block. Bitmap contention.
File Header Block. Datafile contention.

Undo Header and Block. For manual rollback there are not enough rollback segments, they are too small or there are inconsistent sizes across all rollback segments. If automated undo is used then perhaps change the UNDO_RETENTION parameter.

Tip

Manual rollback is deprecated and replaced by automatic undo (automatic rollback). However, many Oracle9i and even Oracle8i databases are still in use. Automated undo is recommended for use with Oracle9i Database.

It appears that we have problems with data blocks (tables and indexes), segment headers, and rollback. My database uses both manual rollback and automated undo at different points in time in order to demonstrate both. We can use the DBA_EXTENTS, V$SESSION_WAIT, and V$SESSION views to find objects causing problems. This query might help in this respect.

SELECT de.segment_name, sw.event FROM dba_extents de, (       SELECT sw.event, sw.p1 AS file_id, sw.p2 AS block_id       FROM v$session_wait sw       WHERE IdleEvent(event) IS NOT NULL       AND sw.sid IN             (SELECT sid FROM v$session WHERE username =                'ACCOUNTS')       AND sw.event != 'null event' ) sw WHERE de.owner = 'ACCOUNTS' AND de.file_id = sw.file_id AND de.block_id = sw.block_id;

This previous query could be extremely slow. Any queries on the DBA_EXTENTS view in my database are incredibly slow. My physical storage structure is poorly organized and my database is extremely busy. As a result I decided to do some manual statistics collection. What I did was to create a temporary table and insert the results of the subquery semi-join between the V$SESSION and V$SESSION_WAIT views. I also know the username I am looking at, namely the Accounts schema. Something like this:

Tip

This type of analysis is easy with STATSPACK and even easier using the Oracle Database Wait Event Interface, which will be covered in Chapter 22.

DROP TABLE tmp; CREATE TABLE tmp (event VARCHAR2(64), file_id NUMBER,     block_id NUMBER); BEGIN       FOR counter IN 1.1000 LOOP             INSERT INTO tmp               SELECT sw.event AS event                 ,sw.p1 AS file_id, sw.p2 AS block_id               FROM v$session_wait sw               WHERE IdleEvent(event) IS NOT NULL               AND sw.sid IN               , (SELECT sid FROM v$session WHERE username =                  'ACCOUNTS')               AND sw.event != 'null event';              COMMIT;       END LOOP; END; /

The execution of this anonymous procedure generated over a thousand rows into my temporary table. I then did this.

SELECT COUNT(event), event FROM tmp GROUP BY event;     COUNT(EVENT)    EVENT ------------    ---------------           90    buffer busy waits           12    buffer deadlock          688    db file sequential read           97    enqueue           48    latch free           91    log file sync

Now since we have 90 buffer busy waits we have a problem. Use a query like this to find out which tables and indexes are affected.

SELECT DISTINCT(segment_name) FROM dba_extents WHERE (file_id, block_id) IN   (SELECT file_id, block_id FROM tmp WHERE event = 'buffer      busy waits') AND owner = 'ACCOUNTS';

Then I joined the temporary table and the DBA_EXTENTS view and got some interesting results. The query returned over 3,000 rows. Ouch! Well I know there are problems in my database, serious problems. That is the way I intended it to be. In fact I know which tables are probably causing the problems with sequence number generated primary keys and locking of index blocks. The GeneralLedger and StockMovement tables are prime culprits for this type of activity because they are most heavily added to using triggers. "Yukkity-yuk!" The triggers do what I want them to do. They make my coding easier and they cause performance problems.

Causes of Buffer Busy Waits

Looking back at the query on V$WAITSTAT my Accounts schema highly active concurrent database is having problems with data blocks, segment headers, undo headers, undo blocks and file header blocks; just one or two problems.

Data Block. These errors usually occur when there are index problems resulting in too many full table scans, or mass data insertions adding multiple rows to the same block. My database uses sequence number primary keys and is doing a lot of insertion activity. The larger more active insertion tables should probably use reverse key indexes on the primary keys to avoid index hot block contention.

Segment Header. This occurs generally with lack of block concurrency. Better settings for FREELISTS and INITRANS might help. Once again tablespaces are all locally managed, allowing for better automated management of physical space than for tablespaces in my Accounts schema database. This database is simply very busy. Changing tablespaces to automatic segment space management would help by better managing freespace and concurrency.

Tip

Setting values for PCTUSED, FREELIST, and FREELIST_ GROUPS in database objects such as tables will be ignored if the containing tablespace is set to automatic segment space management. Automated segment space management is reputed to realize up to 35% performance improvement, specifically in Oracle RAC environments.

Undo Header. Rollback header problems hint at creating more rollback segments if using manual rollback.
Undo Block. This error is generally touted as indicating a need to make existing rollback segments larger. I do not agree with this approach. An OLTP database requires as many rollback segments as possible, small and sized the same. Perhaps extent size increments could even be substantially decreased assuming transactions are all small.

Decreasing Buffer Busy Waits

Let's take another look at all events.

COL percentage FORMAT 9999999990; SELECT event "Event", total_waits "Waits", time_waited     "Total  ,TO_CHAR(    (time_waited /      (SELECT SUM(time_waited) FROM v$system_event    WHERE IdleEvent(event) IS NOT NULL) )*100, 990.99) "Percentage"     FROM v$system_event WHERE IdleEvent(event) IS NOT NULL ORDER BY event;     Event                            Waits   Total Time   Percent -----------------------------   ------   ----------   ------- LGWR wait for redo copy            100           84      0.00 buffer busy waits               114429      1721602      8.68 buffer deadlock                     31          116      0.00 control file heartbeat               1          410      0.00 control file parallel write       2568        40352      0.20 control file sequential read      1163        11180      0.06 db file parallel write            3618        42457      0.21 db file scattered read          114041      3148406     15.87 db file sequential read         718549     14837478     74.78 direct path read                    31            3      0.00 direct path write                    9            0      0.00 enqueue                            343        24617      0.12 latch free                         632         6341      0.03 library cache load lock              8           76      0.00 library cache pin                   86         1508      0.01 log file parallel write           5314         4155      0.02 log file sequential read            24           41      0.00 log file single write               24           17      0.00 log file sync                     1651         1810      0.01 process startup                      1           82      0.00 refresh controlfile command          1            6      0.00 row cache lock                       9          106      0.00

And let's look at buffer busy waits specifically.

COL event FORMAT a30; COL percentage FORMAT 9999999990;        SELECT event "Event" ,ROUND((time_waited /       (SELECT SUM(time_waited) FROM v$system_event WHERE IdleEvent(event) IS NOT NULL))*100, 0) "%" FROM v$system_event WHERE event = 'buffer busy waits';

My current percentage of buffer busy waits is 9%.

Event                 % ------------------   --- buffer busy waits     9

Now let's make a few changes to try to reduce those buffer busy wait events. Increase the database buffer cache size to reduce buffer contention.

ALTER SYSTEM SET db_cache_size = 64M;

Reduce multiple block read counts into the database buffer cache to reduce the amount read at once.

ALTER SYSTEM SET db_file_multiblock_read_count = 1;

Some of the larger tables are subject to heavy sequential primary key sequence insertion activity and are subject to potential concurrency problems. I do get segment header contention but no free list contention. Using reverse key indexes on these primary keys may reduce segment header contention.

ALTER INDEX XPK_GENERALLEDGER REBUILD REVERSE; ALTER INDEX XPK_STOCKMOVEMENT REBUILD REVERSE;

Using automated segment space management may help with percentage used space and freelist space management. I use the CREATE TABLESPACE command to create 2K block size table-spaces for static tables and indexes in the Accounts schema. I create tablespaces with automated segment space management. As already stated using automated segment space management could help concurrency. Note that I also place the DATA2K and INDX2K new tablespaces on two new drives, separate to that of currently used tablespaces and separately to each other. Additionally I set the DB_2K_CACHE_SIZE parameter and move the static tables into the new tablespaces.

ALTER SYSTEM SET db_2k_cache_size = 8M; CREATE TABLESPACE DATA2K       DATAFILE 'e:\oracle\oradata\test\data2k01.dbf'       SIZE 2M AUTOEXTEND ON EXTENT MANAGEMENT LOCAL       SEGMENT SPACE MANAGEMENT AUTO; CREATE TABLESPACE INDX2K       DATAFILE 'f:\oracle\oradata\test\indx2k01.dbf'       SIZE 2M AUTOEXTEND ON EXTENT MANAGEMENT LOCAL       SEGMENT SPACE MANAGEMENT AUTO; ALTER TABLE type MOVE TABLESPACE DATA2K; ALTER TABLE subtype MOVE TABLESPACE DATA2K; ALTER TABLE coa MOVE TABLESPACE DATA2K; ALTER TABLE category MOVE TABLESPACE DATA2K; ALTER TABLE stock MOVE TABLESPACE DATA2K; ALTER INDEX XAK_CATEG_TEXT REBUILD ONLINE TABLESPACE INDX2K; ALTER INDEX XPK_CATEGORY REBUILD ONLINE TABLESPACE INDX2K; ALTER INDEX XFK_COA_SUBTYPE REBUILD ONLINE TABLESPACE INDX2K; ALTER INDEX XFK_COA_TYPE REBUILD ONLINE TABLESPACE INDX2K; ALTER INDEX XPK_COA REBUILD ONLINE TABLESPACE INDX2K; ALTER INDEX XAK_STOCK_TEXT REBUILD ONLINE TABLESPACE INDX2K; ALTER INDEX XFK_S_CATEGORY REBUILD ONLINE TABLESPACE INDX2K; ALTER INDEX XPK_STOCK REBUILD ONLINE TABLESPACE INDX2K; ALTER INDEX XPK_SUBTYPE REBUILD ONLINE TABLESPACE INDX2K; ALTER INDEX XPK_TYPE REBUILD ONLINE TABLESPACE INDX2K;

Now let's take another look at buffer busy waits after allowing the database to execute my high concurrency processing for over an hour with the new tablespaces and parameter settings.

COL event FORMAT a30; COL percentage FORMAT 9999999990; SELECT event "Event" ,ROUND((time_waited /       (SELECT SUM(time_waited) FROM v$system_event       WHERE IdleEvent(event) IS NOT NULL))*100, 0) "%" FROM v$system_event WHERE event = 'buffer busy waits';

The result is totally opposite to that expected. Additionally the percentage of buffer busy wait events is climbing over time. Drilling down into the Oracle Database Wait Event Interface might help to resolve this issue more easily, or perhaps even show that there is no problem. My buffer busy waits are now 45%. I expected less than 9%.

Event                    % -------------------   ----- buffer busy waits       45

Let's take a look at all events again.

COL percentage FORMAT 9999999990; SELECT event "Event", total_waits "Waits", time_waited     "Total"  ,TO_CHAR(    (time_waited /      (SELECT SUM(time_waited) FROM v$system_event    WHERE IdleEvent(event) IS NOT NULL)  )*100, 990.99) "Percentage" FROM v$system_event WHERE IdleEvent(event) IS NOT NULL     ORDER BY event;     Event                        Waits   Total Time   Percent -------------------------   ------   ----------   ------- LGWR wait for redo copy         38           24      0.00 buffer busy waits           456308      4672151     45.07 buffer deadlock                 13           12      0.00 control file heartbeat           1          410      0.00 control file parallel    write                      1429        14323      0.14 control file sequential    read                        712         4260      0.04 db file parallel write        1610        16889      0.16 db file sequential read     409662      5630885     54.32 direct path read                37            0      0.00 direct path write               11            1      0.00 enqueue                        360        21290      0.21 instance state change            1            0      0.00 latch free                    1295          827      0.01 library cache load lock         26          270      0.00 library cache pin               68          919      0.01 log file parallel write       2944         2187      0.02 log file sequential    read                         24           47      0.00 log file single write           24           17      0.00 log file sync                 1000         1553      0.01 process startup                  1           23      0.00 refresh controlfile    command                       1            4      0.00 row cache lock                  16           11      0.00

In the previous query result the db file sequential read event is higher and db file scattered read events are nonexistent. It is possible that higher buffer busy wait events are caused by hot blocks on static tables in the new 2K-sized tablespaces, although this should be less likely because the block size is smaller.

Obviously all these changes did not help so I will change them all back. Be careful with any of these types of changes. It is possible that placing the much larger Customer and Supplier tables into the 2K block size tablespaces may help performance. The static data I did place into the 2K block size tablespaces are small enough tables as to probably be negligible. Additionally there is so much small transactional activity in my database that management of the extra table-spaces and the extra database buffer cache area caused more problems than were solved.

Tip

With respect to the futility of tuning of ratios and specifically the database buffer cache hit ratio, note how increasing the size of the database buffer cache and even splitting that cache into two had no useful effect whatsoever. Quite often a high database buffer cache hit ratio is as a result of a very much oversized database buffer cache. This is usually done in an attempt to "tune" the database buffer cache hit ratio. There is no sense in tuning the "untunable".

ALTER SYSTEM SET db_cache_size = 32M; ALTER SYSTEM SET db_file_multiblock_read_count = 4; ALTER INDEX XPK_GENERALLEDGER REBUILD NOREVERSE; ALTER INDEX XPK_STOCKMOVEMENT REBUILD NOREVERSE;     ALTER TABLE type MOVE TABLESPACE DATA; ALTER TABLE subtype MOVE TABLESPACE DATA; ALTER TABLE coa MOVE TABLESPACE DATA; ALTER TABLE category MOVE TABLESPACE DATA; ALTER TABLE stock MOVE TABLESPACE DATA;     ALTER INDEX XAK_CATEG_TEXT REBUILD ONLINE TABLESPACE    INDX; ALTER INDEX XPK_CATEGORY REBUILD ONLINE TABLESPACE    INDX; ALTER INDEX XFK_COA_SUBTYPE REBUILD ONLINE TABLESPACE     INDX; ALTER INDEX XFK_COA_TYPE REBUILD ONLINE TABLESPACE INDX; ALTER INDEX XPK_COA REBUILD ONLINE TABLESPACE INDX; ALTER INDEX XAK_STOCK_TEXT REBUILD ONLINE TABLESPACE    INDX; ALTER INDEX XFK_S_CATEGORY REBUILD ONLINE TABLESPACE    INDX; ALTER INDEX XPK_STOCK REBUILD ONLINE TABLESPACE INDX; ALTER INDEX XPK_SUBTYPE REBUILD ONLINE TABLESPACE INDX; ALTER INDEX XPK_TYPE REBUILD ONLINE TABLESPACE INDX;     DROP TABLESPACE data2k INCLUDING CONTENTS; DROP TABLESPACE indx2k INCLUDING CONTENTS;     ALTER SYSTEM SET db_2k_cache_size = 0;

19.2.2 Datafile Scattered and Sequential Reads

db file scattered read. The simplest explanation for this event is that it occurs during full physical scans, full table, or fast full index scans; there are large amounts of I/O, reading more than one block at once, effectively scattering blocks all over the database buffer cache, leading to fragmentation of the buffer.
db file sequential read. This event usually, but not always, occurs during a single block read of data from disk. The event is either an index to table ROWID pointer access or perhaps a very small, single block table read.

Tip
Other activities causing these events are usually negligible with respect to performance, especially in a highly active database.

Unfortunately there is a lot of conflicting documentation and opinion as to exactly what can be done to reduce the occurrence of these events, and even whether their frequency should be reduced. Mostly requirements between full and precise scanning are application dependent. In other words, OLTP databases with small transactions should have mostly sequential reads. Data warehouses should most often use scattered reads. This is not an absolute for every application and every database. In other words, scattered reads are not necessarily a sign of poor performance in an OLTP database.

Additionally high amounts of full physical scanning can indicate possible fragmentation or perhaps a multiple block scanning setting inappropriate for the application (the DB_FILE_MULTIBLOCK_ READ_COUNT parameter), amongst other possible causes.

Once again full physical scans are not necessarily a bad thing. For instance fast, very costly disk RAID arrays may sometimes read multiple blocks faster than single blocks and thus more full scans will result. Obviously this is relative to how much data is read and how much is required. In other words, full physical scans are still healthier in data warehouses or when reporting. The threshold for when full physical scans become less efficient may simply be higher with more expensive equipment.

Once again isolate specific culprit objects by joining V$SESSION, V$SESSION_WAIT, and DBA_EXTENTS. In general, use of random and sequential access is often application dependent. Properly tuned OLTP database SQL code will often execute more sequential access on indexes than otherwise. Significant scattered read access is often due to full table scans on smaller tables and index fast full scans, both of which can and do have a positive impact on performance. My Accounts schema is executing heavy sequential index access, appropriate to a small-sized transaction, highly active OLTP database.

My Accounts schema database is more or less properly tuned with respect to sequential and scattered reads. Current values are as shown.

COL event FORMAT a30; COL percentage FORMAT 9999999990; SELECT event "Event"       ,ROUND(             (time_waited /                   (SELECT SUM(time_waited) FROM                       v$system_event                   WHERE IdleEvent(event) IS NOT NULL)       )*100, 0) "%" FROM v$system_event WHERE event IN       ('db file sequential read','db file scattered          read');

Multiple configuration changes have altered the statistics in my database somewhat as can been seen. These values are fairly good and consistent with my application requirements.

Event                         % --------------------------   ---- db file sequential read       76 db file scattered read        11

Let's make a very simple change, encouraging the Optimizer to perform more full physical scans. Physical scans are beneficial to larger transactions such as when batch processing or in a data warehouse.

ALTER SYSTEM SET db_file_multiblock_read_count = 128;

Sequential reads have increased and scattered reads have decreased, once again not as expected. We could temporarily disable some key indexes and have the desired effect. However, it is interesting that full physical scan rates have actually been reduced and not increased.

Event                         % --------------------------   ---- db file sequential read       84 db file scattered read         8

Both I/O read and full physical scan activity in my database is minimal due to the nature of my high concurrency coding. Decreasing the database buffer cache size at the same time might have the desired ill effect, increasing the likelihood of full table scans. Let's make the database buffer cache seriously small.

ALTER SYSTEM SET db_cache_size=4M;

And let's check the same events once again after a few minutes of execution. Once again there is little response and what there is is the opposite. If by now I have not pointed out the danger of randomly changing parameters according to published documentation then I am wasting my time. Most of the reason why I am not getting the expected results is because my application code simply does not need the configuration changes made in this chapter so far. My data model design is efficient and my application concurrency code has an absolutely minimal footprint on hardware resources. The value of data model and SQL code tuning cannot be stressed more vehemently!

Event                         % --------------------------   ---- db file sequential read      85 db file scattered read        6

19.2.3 Direct Path Reads and Writes

direct path read.
direct path write.

Tip
Both these events involve asynchronous I/O, available in Windows but not always on Unix.

These two events occur when sorting passes to disk or hash joins are not executed entirely in memory. SORT_AREA_SIZE and HASH_AREA_SIZE parameters could be increased. Increasing these parameters may use up too much memory in a large concurrent shared server environment. Increasing these parameters could cause more problems than are resolved. In short it is better to tune the root cause of a problem, in this case likely to be SQL code, rather than allow for poorly tuned SQL code statements to co-exist with oversized buffer configuration values.

19.2.4 Free Buffer Waits

The free buffer waits event occurs when there are no free clean blocks available in the database buffer cache. In other words, there are too many dirty (changed) blocks in the database buffer cache, waiting to be written to disk. The reason why this event happens is simple to explain. There is too much going on. You could get a bigger machine, faster disks or even create more database writer processes. The truth is that free buffer wait events often occur when dirty buffers are not written out to disk fast enough. The reason is one or a combination of the previously mentioned factors.

Free buffer wait events can even sometimes occur when the database buffer cache is too small, where the percentage of dirty blocks is continuously high and waiting to be written to disk. If the database buffer cache is too small high DML activity and latch wait problems could cause the database writer processes to be unable to keep up.

Once again to reiterate on a previously mentioned premise, do not simply resize buffers up to enormous amounts of memory. The general purpose of buffer caches is as to behave as fast temporary transfer mechanisms or to enhance high sharing and concurrency capability. The database buffer cache is a fast temporary transfer mechanism catering to vast differences in speed between CPU processing speed and disk I/O performance. The shared pool allows for very high concurrency or sharing of previously obtained objects and executed code. Sizing the database buffer cache and the shared pool to fill all available RAM based on high buffer cache hit ratios is not a solution: it does not solve the underlying problems. Low and high cache hit ratios generally indicate problems on a deeper level, not resolved by simply bumping up those caches to ridiculous sizes.

Tip

Tuning of cache ratios is often a fruitless exercise but they might indicate that a problem exists, though not necessarily. Do not try to tune ratios.

19.2.5 Row Cache Lock Waits

Row cache lock waits are usually SYSTEM tablespace metadata update locks or latch waits. The Accounts schema does have a small number of row cache lock waits. The number is below 1% and probably negligible. The event is called a row cache lock wait.

19.2.6 Library Cache Waits

The library cache contains parsed SQL code, PL/SQL blocks, and optimized query plans. Three types of wait events occur in the library cache.

Library Cache Lock. Loading and locking an object into memory.
Library Cache Load Lock. Cannot obtain a lock to load an object into memory.
Library Cache Pin. An object is pinned into memory.

Both row cache lock and library cache wait events occur in the shared pool. In trying to create a problem I will make the following changes in the configuration parameter file and bounce my database. Not setting the SHARED_POOL_RESERVED_SIZE parameter will force Oracle Database to automatically set that parameter to 5% of the SHARED_POOL_SIZE parameter.

shared_pool_reserved_size = 0 shared_pool_size = 4M

Tip

These parameters can be changed online using the ALTER SYSTEM command. I did and my database effectively halted when decreasing in size. Obviously what is being used must be cleared. However, I also executed the ALTER SYSTEM FLUSH SHARED_POOL command and still got the same response.

COL event FORMAT a30; COL percentage FORMAT 9999999990; SELECT event "Event"       ,ROUND(             (time_waited /                   (SELECT SUM(time_waited) FROM                      v$system_event                   WHERE IdleEvent(event) IS NOT NULL)       )*100, 2) "%" FROM v$system_event WHERE event LIKE 'library cache%' OR event LIKE 'row    cache%';

This time I get an expected result. Since my shared pool is too small I found a marked increase in both row cache and library cache areas. I also got a few very nasty ORA-00600 internal error codes in my alert log due to a serious shortage of shared pool space. Don't try this one at home! The point is made though. The smaller the shared pool is, the more shared pool type wait events will occur. On the contrary, do not size the shared pool up and up simply to cope with application SQL code and data model design issues, as you will eventually encounter scalability issues, probably at a point where you have enough paying customers to cause you a serious business problem, when you can least afford lack of scalability.

Event                           % --------------------------   ------ row cache lock                  .38 library cache pin             16.72 library cache load lock        1.59

19.2.7 Redo Log Waits

Redo log waits are affected by the redo log buffer, redo log files, and archive log files. Many things affect the efficiency of logging in Oracle Database. Performance is more likely to be adversely affected by a log buffer which is too small rather than too large. The most profound effect on log performance is usually disk I/O speed and physical architecture, during log buffer flushing to disk and copying of redo log files to archive log files.

When examining redo log waits and problems look for the redo log buffer running out of space, too much time taken to write redo logs or copying of archives, and perhaps even too much redo log file switching.

Redo Entries. How often redo entries are copied into the buffer.
Redo Writes. How often buffer entries are written to redo log files.
Redo Buffer Allocation Retries. This event indicates failure to allocate space in the redo log buffer either because of slow log writer processing speed or a log switch. Thus the log buffer could be too small or redo logs could be too small. However, this event can also indicate poorly performing disk I/O where log files are stored.
Redo Log Space Requests. Records requests for the buffer to write out to a redo log file typically during a log switch.

Tip
The redo log space requests event has been misinterpreted in the past and does not indicate requests for space in the log buffer.
Redo Sync Writes. A COMMIT command will require the database writer process to write dirty blocks to disk. Any required redo log entries must be flushed from the log buffer to redo log files prior to database writer activity, for the sake of recoverability. High frequency of this event could indicate high COMMIT rates, which may or may not be a problem, but are usually caused by application requirements or poor SQL coding.

A redo buffer allocation retries event occurs if the log buffer is too small. However, I have never found it useful, even on the largest databases, to make a log buffer larger than 1 Mb and rarely above 512K. Log buffer size is entirely dependent on transaction size. I have seen log buffers in excess of 500 Mb and this caused other problems, prior to tuning of course. The redo log buffer does not have to be equivalent to redo log file sizes. Additionally writing a large redo log file to disk can take a lot of time. Redo log buffer and file events are found by querying the V$SYSSTAT view.

COL name FORMAT a32; SELECT value, name FROM v$sysstat WHERE name LIKE '%redo%' ORDER BY 1 DESC;     VALUE       NAME --------    ----------------------- 86290896    redo size   927748    redo wastage   293852    redo entries   175867    redo blocks written    25776    redo write time    23132    redo synch time     3920    redo writer latching time     3569    redo writes     1870    redo synch writes       50    redo buffer allocation retries        0    redo log space requests        0    redo log space wait time        0    redo log switch interrupts        0    redo ordering marks

There is no sense in having a huge redo log buffer. The redo log buffer is cyclic in nature. This means that redo log entries can be written to one part of the buffer as other entries in another part of the buffer are flushed to disk. Contention can occur sometimes when the redo log buffer is too small. However, depending on database activity, configuration and application behavior, a large redo log buffer could possibly cause problems by having to flush large chunks of data to disk. Once again a memory cache buffer is a high-speed bridge between CPU and disk storage, not a temporary storage area. The purpose of buffers is twofold: firstly, to manage speed differences between CPU and disk storage and secondly, to take advantage of CPU speed over that of disk storage.

Tip

Most redo log waits are as a direct result of poor I/O response time or contention, rarely log buffer size unless the LOG_BUFFER parameter has been tampered with.

Do not always assume that the redo log buffer is the source of redo log and archive problems. One very easy method of assessing redo log problems is by checking time stamps for both redo logs and archive logs on disk. If the dates are all the same then redo log files might be too small. Obviously this task can be accomplished using various Oracle Database views. Personally I prefer to simply examine file timestamps in the operating system. Automated scripting for this type of functionality is also faster built at the operating-system level. Checking timestamps on archive logs can be done with the following query.

SELECT * FROM (  SELECT TO_CHAR(completion_time,'MM/DD/YYYY HH24:MI:SS')      "Timestamp" ,name "Archive Log" FROM v$archived_log ORDER BY 1 DESC) WHERE ROWNUM < 10;

My timestamps are very close together and thus redo log file sizes in my database are probably too small. However, my database is not archived and there are no archive logs being written. As far as my database is concerned there is nothing to tune.

Timestamp               Archive Log --------------------    -------------------------------------- 04/02/2003 20:12:55     F:\ORACLE\ORADATA\TEST\ARCHIVE\                            ARC_1350.ARC 04/02/2003 20:12:54     F:\ORACLE\ORADATA\TEST\ARCHIVE\                            ARC_1349.ARC 04/02/2003 20:12:53     F:\ORACLE\ORADATA\TEST\ARCHIVE\                            ARC_1347.ARC 04/02/2003 20:12:53     F:\ORACLE\ORADATA\TEST\ARCHIVE\                            ARC_1348.ARC 04/02/2003 20:12:52     F:\ORACLE\ORADATA\TEST\ARCHIVE\                            ARC_1346.ARC 04/02/2003 20:12:51     F:\ORACLE\ORADATA\TEST\ARCHIVE\                            ARC_1345.ARC 04/02/2003 20:12:50     F:\ORACLE\ORADATA\TEST\ARCHIVE\                            ARC_1343.ARC 04/02/2003 20:12:50     F:\ORACLE\ORADATA\TEST\ARCHIVE\                            ARC_1344.ARC 04/02/2003 20:12:49     F:\ORACLE\ORADATA\TEST\ARCHIVE\                            ARC_1342.ARC

If the database is periodically halting then archiving could be temporarily halting the database, waiting for redo log files to be copied to archive log files. A situation is possible where all redo files contain nonarchived redo log entries. The redo log files are cyclic just like the redo log buffer. If all redo log files are waiting for copying to archive log files then no logging activity can occur until at least one redo log file is archived. The database will wait for archiving to catch up. In this case perhaps create more redo log files or another archive process. The speed of disk storage and network communication to the disks containing redo log and archive log files could also be an issue. For instance, standby database log file transfers should have a good network connection.

Three other interesting facts about redo log files:

Duplexed Redo Logs. Multiple redo log file copies or duplexed redo log files are not an issue for performance, especially on a multiple CPU platform. Redo log files will be written in parallel. Redo log files are I/O-bound and not CPU-bound. This means that the I/O disk structure is so much more important to redo logs as to make CPU use more or less negligible.
Log Appending. Redo and archive log files are written sequentially by appending and are best placed onto sequential access disk storage RAID array structures.
Alternating Log Files. In extreme circumstances alternate redo log files can be split onto separate disk storage areas since they may be open at the same time where the first redo log file is being archived and the second flushed from the redo log buffer.

Tuning redo and archive details is not all about the redo log buffer. "Guesstimate" changes to the redo log buffer will probably not solve performance problems and could make them worse.

19.2.8 Rollback and Undo Waits

Rollback segment and automated undo wait statistics can be obtained from the V$WAITSTAT performance view as shown. My Accounts schema database uses both manual rollback segments and automated undo, optionally with either switched online at any one time.

SELECT class "Event", count, time FROM v$waitstat       WHERE class LIKE '%undo%' ORDER BY class;     Event                 COUNT   TIME -------------------   -----   ---- save undo block           0      0 save undo header          0      0 system undo block         0      0 system undo header        0      0 undo block               16     63 undo header             128    355

Manual Rollback Waits

Comparing rollback wait events and database buffer cache activity is a little vague at best. If there are a large amount of rollback or automated undo wait events in relation to general database activity a number of things could be inferred. The obvious answer is that an application is performing a lot of rollback activity. A high incidence of ROLLBACK command instigated rollback activity is probably abnormal. Application rollback activity is much more likely to be caused by heavy DML activity resulting in heavy rollback use obtaining consistent point-in-time snapshot views of data. This is normal. Increasing commit frequency at the application level can help to alleviate rollback use but can cause other problems as well.

Comparing rollback wait events to database buffer cache events should produce a very low ratio. Perhaps a high ratio implies that rollback space is being waited for. If the percentage of rollback space wait events to database buffer cache is high then perhaps there are either not enough rollback segments or those rollback segments are not large enough. This query shows a negligible ratio of well below 1% using manual rollback segments.

SELECT (       (SELECT SUM(count) FROM v$waitstat       WHERE class IN ('undo header', 'undo block'))       /       (SELECT SUM(value) FROM v$sysstat       WHERE name IN ('db block gets', 'consistent gets')) )*100 "Rollback vs DBCache" FROM DUAL;     Rollback vs DBCache ----------------------          .004774797

High rollback activity may not necessarily indicate a problem. High rollback rates could be an application requirement. However, transaction failures would be an issue. Investigate applications.

Note

Oracle Database 10 Grid Manual rollback is deprecated.

Automated Undo Waits

Executing the same query using automated undo rollback, with the database under very heavy contention, gives slightly different, and perhaps slightly worse performance. The difference in performance is 0.3%. This is such a low value as to be completely negligible.

SELECT (       (SELECT SUM(count) FROM v$waitstat       WHERE class IN ('undo header', 'undo block'))       /       (SELECT SUM(value) FROM v$sysstat       WHERE name IN ('db block gets', 'consistent gets')) )*100 "Rollback vs DBCache" FROM DUAL;     Rollback vs DBCache ----------------------          .007855806

Tip

It is quite possible that sometimes a database administrator could tune manual rollback segments to perform better than automated undo segments. This point is probably a matter of opinion amongst database administrators.

Note

Oracle Database 10 Grid Manual rollback is deprecated.

On the other hand, my database shows wait statistics considerably higher using automated undo as opposed to using manual rollback, as shown in the previous query of the V$WAITSTAT performance view. This could have something to do with my geriatric hardware.

SELECT class "Event", count, time FROM v$waitstat WHERE class LIKE '%undo%' ORDER BY class;     Event                  COUNT   TIME --------------------   -----   ---- save undo block            0      0 save undo header           0      0 system undo block          0      0 system undo header         3    124 undo block               206    748 undo header              179   2111

Automated undo has a parameter called UNDO_RETENTION. This parameter allows retention of committed rollback database changes for a period of time. The more potentially unnecessary data retained, the more out of control automated undo could become. Defaults using the Oracle Database Configuration Assistant appear to be 900 for an OLTP database creation and 10,800 for a data warehouse database creation. It might be best not to oversize this parameter. The V$UNDOSTAT performance view is used for automated undo and provides probably more useful information about automated undo than the V$ROLLSTAT view provides about manual rollback.

A lot of experimentation, particularly with manual rollback segments, is in place in other chapters in this book. In general, a highly concurrent small transaction load OLTP database will require as many small manual rollback segments as possible. In the case of automated undo a lower value for the UNDO_RETENTION parameter is appropriate to small transactions. The opposite is true for large transactions when executing batch processing or in a data warehouse.

19.2.9 Enqueue Waits

This type of wait is an event caused by a lock on a row or a buffer latch wait. The wait event occurs when another request is waiting for a lock held by another session process. To find locks we use the V$SESSION_WAIT and V$LOCK views. My database has an enqueue locking value of over 1% so it possibly an issue. Additionally there are deadlocks, exemplifying this point further.

Tip

Deadlocks are usually caused by poor application coding such as use of triggers. This is most certainly the case in my database.

Oracle Database has many different types of enqueue events. These are some of the more significant and common ones.

TM. DML enqueue sometimes due to lack of indexing on foreign keys or just lack of indexing in general.
TX. Transactional enqueue or exclusive lock on a row or a table. Hopefully a row and not a table.
SQ. Sequence numbers.

This query is executed against the V$LOCK performance view on my database in the Accounts schema, monitored over about 30 s. These numbers can be written off as normal.

SELECT type "Lock", COUNT(type) "Locks" FROM v$lock GROUP     BY type;     Lo     Locks ----   ------ SQ         10 TX         80

Many older relational databases implemented what was called incremental locking. Incremental locking successively locked larger objects or amounts of data as the amount of resources to maintain current locking grew. Oracle Database does not escalate locking even if contention can appear to cause this type of behavior. Block-level contention can occur in Oracle Database when enough concurrent DML row locking activity occurs in a single block, which can be counteracted by changing block level concurrency storage parameter settings. Certain types of DML activity can cause different types of locking. In general, Oracle Database locking will occur in different ways as follows:

Row Share. Read share lock on a single row.
Row Exclusive. Write lock on a row excluding write access to other sessions.
Table Share. As with rows but for an entire table.
Table Exclusive. As with rows but for an entire table.

Removing locks from a database instance can be an administration issue when serious. Resolving locking issues in terms of tuning out the locks is more of an SQL code and data model issue and possibly the result of poor application coding and design.

Enqueue numbers can be altered using the ENQUEUE_ RESOURCES parameter. Resource limitations can be monitored using the V$RESOURCE_LIMITS performance view.

COL resource FORMAT a24; SELECT RESOURCE_NAME "Resource", INITIAL_ALLOCATION "Initial" ,CURRENT_UTILIZATION "Current", MAX_UTILIZATION "Max" ,LIMIT_VALUE "Limit" FROM v$resource_limit WHERE resource_name NOT LIKE('g%') ORDER BY 1;     Resource                   Initial   Current   Max   Limit ----------------------   ---------   -------   ---   --------- branches                       247         0     0   UNLIMITED cmtcallbk                      247         0     0   UNLIMITED dml_locks                      988        71   134   UNLIMITED enqueue_locks                 2890        29    31   2890 enqueue_resources             1208        43    43   UNLIMITED max_rollback_segments           50        17    17   50 max_shared_servers               5         2     2   5 parallel_max_servers            11         0     0   11 processes                      200        26    29   200 sessions                       225        32    36   225 sort_segment_locks       UNLIMITED         0     1   UNLIMITED temporary_table_locks    UNLIMITED         0     0   UNLIMITED transactions                   247        10    14   UNLIMITED

In my database enqueue wait events occur at a steady rate of around 1%.

COL event FORMAT a30; COL percentage FORMAT 9999999990; SELECT event "Event"       ,ROUND(             (time_waited /                   (SELECT SUM(time_waited) FROM                       v$system_event                   WHERE IdleEvent(event) IS NOT NULL)       )*100, 2) "%" FROM v$system_event WHERE event = 'enqueue';     Event        % -------   ---- enqueue   1.02

Let's try to produce some enqueue wait events. Firstly, I drop all my Accounts schema foreign key indexes using a command like the following, which generates DDL commands. I have scripts to regenerate all my foreign key indexes.

SELECT 'DROP INDEX'||index_name|| ';'       FROM user_indexes WHERE index_name LIKE 'XFK_%';

Now let's check the types of locks occurring.

SELECT type "Lock", COUNT(type) "Locks" FROM v$lock GROUP BY type;

There are now a lot of TM and quite a few TX enqueue lock events since I have removed all my foreign key indexes.

Lo    Locks ---   ----- CF        1 JQ       24 MR       10 RT        1 TM       52 TS        1 TX        9 XR        1

Now I am checking all wait events again.

COL percentage FORMAT 9999999990; SELECT event "Event", total_waits "Waits", time_waited    "Total" ,TO_CHAR(   (time_waited /     (SELECT SUM(time_waited) FROM v$system_event    WHERE IdleEvent(event) IS NOT NULL)  )*100, 990.99) "Percentage" FROM v$system_event WHERE IdleEvent(event) IS NOT NULL    ORDER BY event;

As expected, enqueue wait events have increased slightly. Notice how db file scattered read events are dramatically increased, most likely due to full physical table scanning of tables during Referential Integrity checks on foreign keys with no indexes, and otherwise where the Optimizer utilizes foreign key indexes in SQL code.

Event                          Waits     Total   Percent ---------------------------   ------   -------   ------- LGWR wait for redo copy           16        28      0.00 buffer busy waits              58396    238368      9.40 buffer deadlock                    7        38      0.00 control file heartbeat             1       410      0.02 control file parallel write      369      3467      0.14 control file sequential read     283      1104      0.04 db file parallel write            86       755      0.03 db file scattered read        102816   1774183     69.94 db file sequential read        32830    426158     16.80 direct path read                  31         2      0.00 direct path write                  9         0      0.00 enqueue                          304     80489      3.17 latch free                       683      9212      0.36 library cache load lock            5        23      0.00 library cache pin                118      1748      0.07 log file parallel write          376       257      0.01 log file sequential read          24        40      0.00 log file single write             24        17      0.00 log file sync                     74       284      0.01 refresh controlfile command        1         6      0.00 row cache lock                     3        46      0.00

19.2.10 Latch Free Waits

Latches are a special type of locking mechanism, protecting buffers in memory. Latches are used to lock blocks in buffers to ensure that multiple sessions or processes do not alter the same space in memory at the same time. Unlike a lock waiting for lock release, a latch can spin or sleep and then try to access the same buffer area once again. Latches protect buffers from corruption by changes from different processes at the same time.

My database shows a latch free wait value of less than 2%. This is insignificant due to the nature of the wait event. It is normal for a database to have to wait for access to parts of Oracle Database buffer caches. The buffer caches should be a great deal smaller than physical areas in the datafiles. Sizing buffers too small can cause buffer contention or competition for memory access due to not enough available space in memory for cache buffers. On the contrary, sizing buffers too large can possibly cause the same problems. Managing of data in buffers is much more complex than managing of data in disk storage.

Latch wait events can be found by querying the V$SESSION_ WAIT performance view: drill down using the various latch V$ performance views. Latches are a major topic all by themselves. Therefore, this is a good point to end this general discussion of events and wait events and pass the examination of latches onto the next chapter. The next chapter will look into latches in detail, plus some special wait events applicable to latches, and how to handle those events.

< Day Day Up >