Buffer Busy Waits | Oracle Wait Interface: A Practical Guide to Performance Diagnostics & Tuning (Osborne ORACLE Press Series)

Until Oracle9 i Database, the buffer busy waits event parameters are file#, block#, and ID (reason code). In Oracle Database 10 g Release 1, the first two parameters remain the same, but the third parameter gives the block class#. This wait event falls under the Concurrency wait class. Keep the following key thoughts in mind when dealing with the buffer busy waits event.

The Oracle session is waiting to pin a buffer. A buffer must be pinned before it can be read or modified. Only one process can pin a buffer at any one time.
buffer busy waits indicate read/read, read/write, or write/write contention .
The appropriate action to take depends on the reason encoded in the P3 parameter.

A session that reads or modifies a buffer in the SGA must first acquire the cache buffers chains latch and traverse the buffer chain until it finds the necessary buffer header. Then it must acquire a buffer lock or a pin on the buffer header in shared or exclusive mode, depending on the operation it intends to perform. Once the buffer header is pinned, the session releases the cache buffers chains latch and performs the intended operation on the buffer itself. If a pin cannot be obtained, the session waits on the buffer busy waits wait event. This wait event does not apply to read or write operations that are performed in sessions ‚ private PGAs.

Common Causes, Diagnosis, and Actions

The following four pieces of information are essential for diagnosing the buffer busy waits problem when it is the leading bottleneck that slows a process down.

The primary reason code that represents why a process fails to get a buffer pin.
The class of block that the buffer busy waits wait event is for.
The SQL statement that is associated with the buffer busy waits event.
The segment that the buffer belongs to.

Of the four, the first two are the most important as they reveal the reason a process fails to get a buffer pin and what class of block it is. Your corrective action depends on what they are, not on the number of buffer busy waits that a segment has encountered as tracked by the V$SEGMENT_STATISTICS in the Oracle9 i Database. Shortly, we will show you how to correct the buffer busy waits problem, but first let us talk about the reason code.

A session may fail to get a pin on a buffer header for various reasons. The reasons are represented by a set of codes. Up to Oracle9 i Database, a process that waits on the buffer busy waits event publishes the reason code in the P3 parameter of the wait event. Oracle uses a four-digit code in versions up to Oracle 8.0.6 and a three-digit code starting in Oracle 8.1.6. The Oracle Metalink note #34405.1 provides a table of reference. The codes are also listed in Chapter 3. Although there are at least 10 different reasons why a buffer busy waits wait event can occur, codes 130 and 220 (equivalent to codes 1013 and 1016 in Oracle8.0) are most common. Basically, a code number that is less than 200 means the wait is I/O related .

Reason code 130 means there are multiple sessions concurrently requesting the same data block that is not already in the buffer cache and has to be read from disk. This is typical in applications that spawn multiple concurrent threads or sessions, and each one executes the same query that goes after the same data set. In this case, you can check the sessions ‚ logon time in the V$SESSION view, and chances are you will find them only a few seconds apart. When multiple sessions request the same data block that is not in the buffer cache, Oracle is smart enough to prevent every session from making the same operating system I/O call. Otherwise , this can severely increase the number of system I/Os. Instead, Oracle allows only one of the sessions to perform the actual I/O, while others wait for the block to be brought into the buffer cache. The other sessions wait for the block on the buffer busy waits event and the session that performs the I/O waits on the db file sequential read (or the db file scattered read ) wait event. You will notice that the buffer busy waits and the db file sequential read events share the same P1 (file#) and P2 (block#) values.

Reason code 220 indicates there are multiple sessions trying to concurrently modify different rows within the same block that is in the buffer cache. This symptom is typical in applications with high DML concurrency. Unfortunately, a block can be pinned by only one process at any one time. The other concurrent processes must wait on the buffer busy waits wait event until the first change is complete. This is a good thing; otherwise the block will be corrupted.

Sadly, the Oracle Database 10 g Release 1 drops the reason code. The P3TEXT of the buffer busy waits event is changed from ID to CLASS#. (Not to be confused with the WAIT_CLASS#, which represents the wait event class or category.) This CLASS# refers to the class of blocks in the V$WAITSTAT view as shown in the following listing. Class 1 corresponds to the ‚“data block, ‚½ class 2 corresponds to the ‚“ sort block, ‚½ class 3 corresponds to the ‚“save undo block, ‚½ and so on. The last one, class 18, corresponds to the ‚“undo block. ‚½ Without the reason code, it is very important that you also capture the SQL statement that is associated with the buffer busy waits event so that you know what the context is. In other words, if the CLASS# is 1 (data block) and the SQL statement is a query, then this is similar to reason code 130. However, if the SQL statement is a DML, then this is similar to reason code 220.

Note ‚

In Oracle Database 10 g, the buffer busy wait codes 110 and 120 are replaced by another wait event, namely read by another session. The parameters are the same: file#, block#, class#.

 select * from v$waitstat; 
 
 CLASS                   COUNT       TIME 
 ------------------ ---------- ---------- 
 data block           41476005    4743636 
 sort block                  0          0 
 save undo block             0          0 
 segment header          49514       3471 
 save undo header            0          0 
 free list                  97         10 
 extent map                  0          0 
 1st level bmb               0          0 
 2nd level bmb               0          0 
 3rd level bmb               0          0 
 bitmap block                0          0 
 bitmap index block      34033       4276 
 file header block       13584       2379 
 unused                      0          0 
 system undo header          0          0 
 system undo block           0          0 
 undo header            309220       8021 
 undo block            5112402     435459

If a session spent the majority of its processing time waiting on the buffer busy waits event, you must find out what class of block it waited on the most, as this will help you to formulate the right solution. For example, if the buffer busy waits contention is mostly on a table ‚ s segment header block (class #4), then adjusting the table ‚ s PCTFREE will not resolve the problem. Prior to Oracle Database 10 g , the only way to determine the class of block was by translating the P1 and P2 parameters of the buffer busy waits event. Only a limited number of block classes can be determined in such a manner. Perhaps this is why Oracle makes it easy for DBAs by changing the P3 definition of the buffer busy waits event in Oracle Database 10 g . The following query identifies several classes of blocks in versions prior to Oracle Database 10 g . First, it checks to see if the P1 and P2 parameters combination translates to a segment header. If not, it checks to see if they translate to a FREELIST GROUPS block. Failing this last check, the combination may be a data, index, or rollback block.

 select Segment Header class, 
 a.segment_type, a.segment_name, a.partition_name 
 from   dba_segments a, v$session_wait b 
 where  a.header_file  = b.p1 
 and    a.header_block = b.p2 
 and    b.event        = buffer busy waits 
 union 
 select Freelist Groups class, 
 a.segment_type, a.segment_name, a.partition_name 
 from   dba_segments a, v$session_wait b 
 where  b.p2 between a.header_block + 1 and (a.header_block + a.freelist_groups) 
 and    a.header_file     = b.p1 
 and    a.freelist_groups > 1 
 and    b.event           = buffer busy waits 
 union 
 select a.segment_type   block class, 
 a.segment_type, a.segment_name, a.partition_name 
 from   dba_extents a, v$session_wait b 
 where  b.p2 between a.block_id and a.block_id + a.blocks - 1 
 and    a.file_id  = b.p1 
 and    b.event    = buffer busy waits 
 and    not exists (select 1 
 from   dba_segments 
 where  header_file  = b.p1 
 and    header_block = b.p2);

Now we will discuss the appropriate method for treating buffer busy waits problems based on the reason code and the class of block.

Contention for Data Blocks (Class #1) with Reason Code 130

If the majority of the buffer busy waits wait events are centered on data blocks (class #1) and the reason code is 130, this shows the application runs multiple sessions that query the same data set at the same time. (You will only know this if you query the V$SESSION_WAIT view repeatedly or trace the session with the 10046 event or use the data sampling methods discussed in Chapter 4.) This is an application issue. There are three things you can do to minimize this problem:

Reduce the level of concurrency or change the way the work is partitioned between the parallel threads.
Optimize the SQL statement to reduce the number of physical and logical reads.
Increase the number of FREELISTS and FREELIST GROUPS.

From our experience, it is very difficult to get the application to reduce the level of concurrency. It may not be a good idea because it limits scalability. However, there are differences between scalability and a blind attempt by the application to improve performance by spawning multiple sessions. So far, SQL tuning has worked wonderfully to reduce the occurrences of buffer busy waits . Check the SQL execution plan and optimize the SQL statement to use the most effective join method and access paths that reduce the number of physical and logical reads.

The following statistics from the V$SYSTEM_EVENT view are taken from a 29- hour -old Oracle 8.1.7.4 instance that is infested by the buffer busy waits wait event, primarily due to reason code 130. The database is a data warehouse, and the application opens multiple concurrent sessions that query the database using parallel query slaves. As you can see, processes wasted about 364 hours (131187473 centiseconds) cumulatively on the buffer busy waits wait event. The application may have been better off not using parallel query because slave processes that belong to different sessions compete for the same blocks. The application may also have been better off releasing the queries in stages instead of all at the same time. If you have a similar situation, you can show your statistics to the application group to communicate the problem and your solution.

 select * from v$system_event where event = buffer busy waits; 
 
 EVENT               TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT 
 ------------------- ----------- -------------- ----------- ------------ 
 buffer busy waits     174531409            683   131187473   .751655383

You may also see secondary buffer busy waits contention on data blocks when there are not enough FREELISTS. This is especially true when multiple sessions are concurrently inserting into the same table and the table has only one FREELISTS and FREELIST GROUPS. In this case, multiple sessions are directed to the same data block for record insertions, which creates the buffer busy waits contention. When a session spends a lot of time waiting on the buffer busy waits event, and the SQL statement that is associated with the event is an INSERT statement, you should check how many FREELISTS that table has. Of course, insufficient FREELISTS primarily cause buffer busy waits contention on the segment header ‚ class #4, which we will discuss shortly.

Contention for Data Blocks (Class #1) with Reason Code 220

If the majority of the buffer busy waits wait events are centered on data blocks and the reason code is 220, this indicates there are multiple sessions performing DML on the same object at the same time. In addition, if the database block size is large (for example, 16K and above), it can only intensify this symptom as larger blocks generally contain more rows per block. There are three things you can do to minimize this problem:

Reduce the level of concurrency or change the portioning method.
Reduce the number of rows in the block.
Rebuild the object in another tablespace with a smaller block size (Oracle9 i Database and above).

Again, as mentioned earlier, it may not be practical to limit scalability by reducing the level of concurrency.

If the data blocks belong to tables or indexes, then consider rebuilding the objects to reduce the number of rows per block and spread the data over a larger number of blocks. For example, you can rebuild a table or an index with a higher PCTFREE. In some cases, we have rebuilt indexes with PCTFREE as high as 50 percent. The downside to this is that index range scans and index fast full scans will be slower. You can also alter the table to minimize the number of rows per block with the ALTER TABLE table_name MINIMIZE RECORDS_PER_BLOCK command. Starting in Oracle9 i Database, you can move or rebuild the object in another tablespace with a smaller block size. While these actions can minimize the buffer busy waits problem, they will definitely increase full table scans time and disk space utilization. As the saying goes, there is no such thing as a free lunch .

Contention for Data Segment Headers (Class #4)

If the majority of the buffer busy waits wait events are centered on data segment headers (that is, the table or index segment header and not the undo segment header), this usually means some tables or indexes in the database have high segment header activities. Processes visit segment headers for two main reasons ‚ to get or modify the process FREELISTS information and to extend the high watermark (HWM). There are three things you can do to minimize this problem:

Increase the number of process FREELISTS and FREELIST GROUPS of the identified object.
Make sure the gap between PCTFREE and PCTUSED is not too small.
Make sure the next extent size is not too small.

The first step is to identify the segment name and type. They can be identified from the P1 and P2 parameters of the buffer busy waits wait event. Then you can alter the object to increase the number of process FREELISTS. If time and opportunity permit, you should also increase the number of FREELIST GROUPS by recreating the object. You should create all data segments with a minimum of two FREELIST GROUPS, by default ‚ even in a single-instance database. Contrary to some teachings, FREELIST GROUPS are not exclusively for Oracle Real Application Clusters (RAC) databases. It costs you one Oracle block per FREELIST GROUPS, but the benefits are well worth it. Table 6-6 shows how the FREELIST GROUPS helps minimize buffer busy waits contention on the segment header in a single-instance database. The Oracle instance is recycled prior to each test and the same number of sessions are used to perform each load. In fact, everything is the same except for the number of FREELIST and FREELIST GROUPS.

Table 6-6: FREELIST GROUPS ‚ Effects On Segment Header Contention
V$WAITSTAT	Test #1 Freelists = 1 Freelist Groups = 1		Test #2 Freelists = 12 Freelist Groups = 1		Test #3 Freelists = 4 Freelist Groups = 3
Class	Count	Time	Count	Time	Count	Time
data block	656432	534104	9299	6986	3045	1447
sort block
save undo block
segment header	384272	150444	72337	110850	79	6
save undo header
free list					8400	8503
extent map
1st level bmb
2nd level bmb
3rd level bmb
bitmap block
bitmap index block
file header block	1
Unused
system undo header
system undo block
undo header	2388	36	166	18	155	51
undo block

If you do not want to mess with FREELISTS and FREELIST GROUPS, you can rely on the Automatic Segment Space Management (ASSM) feature to scatter the incoming data from the insert statements. The caveat is that this is an Oracle9 i Database feature. As with any new feature, you should first research the known bugs on the Oracle Metalink site. ASSM is not the silver bullet. There will still be buffer busy waits contention. In fact, the ‚“1st level bmb, ‚½ ‚“2nd level bmb, ‚½ and ‚“3rd level bmb ‚½ classes are related to ASSM.

Next, make sure the gap between the PCTFREE and PCTUSED of the table is not too small in order to minimize block cycling to the FREELISTS. Query the appropriate DBA views (DBA_TABLES or DBA_TAB_PARTITIONS) for the information.

Finally, you should check the next extent size of the identified segment. A high insert rate combined with a small next extent size can cause frequent insertion of new entries into the extent map located in the segment header. Consider altering or rebuilding the object with a larger next extent size. If the object resides in a locally managed tablespace, consider moving the object into a reasonable uniform-size locally managed tablespace.

Contention for Undo Segment Headers (Class #17)

If the majority of the buffer busy waits wait events are centered on undo segment headers, this indicates there are either too few rollback segments in the database or their extent sizes are too small, causing frequent updates to the segment headers. If you use the system-managed undo introduced in Oracle9 i Database, you shouldn ‚ t have to deal with this problem as Oracle will create additional undo segments according to demand. However, if you are still using the old rollback segments, then the following applies.

You can create additional private rollback segments and bring them online to reduce the number of transactions per rollback segment. Don ‚ t forget to modify the ROLLBACK_SEGMENTS parameter in the INIT.ORA file accordingly . If you use public rollback segments, you can lower the value of the initialization parameter TRANSACTIONS_PER_ROLLBACK_SEGMENT, which influences the number of public rollback segments that will come online at startup. Don ‚ t let this parameter confuse you; it does not limit the number of concurrent transactions that can use a rollback segment. Oracle determines the minimum number of public rollback segments acquired at startup by TRANSACTIONS / TRANSACTIONS_PER_ROLLBACK_SEGMENT. So, a smaller divisor allows Oracle to bring up more public rollback segments.

Contention for Undo Blocks (Class #18)

If the majority of the buffer busy waits wait events are centered on undo blocks, this usually means there are multiple concurrent sessions querying data that is being updated at the same time. Essentially the query sessions are fighting for the read consistent images of the data blocks. This is an application issue and there is nothing amiss in the database. The problem should go away when the application can run the query and DML at different times.

System-Level Diagnosis

Oracle maintains a number of instance-level statistics on buffer busy waits . These statistics can give you a rough idea of what you are dealing with, but the information may not be specific enough for you to formulate a corrective action. They are mentioned here for the sake of completeness.

The view X$KCBWAIT (kernel cache buffer wait) is the base view for the V$WAITSTAT view, which keeps track of buffer busy waits contentions by block class. The class with the highest count deserves your attention, but unless you are also monitoring the buffer busy waits symptom at a lower level, you don ‚ t have a clear direction to proceed. Let ‚ s say the data block class has the highest count of all. Which segment was affected most, and why were the sessions unable to pin the buffers? Did they fail to get the pins while attempting to read or change the blocks? Unfortunately, Oracle does not keep track of buffer busy waits by block class and reason code. We hope someday Oracle will provide a matrix of buffer busy waits by SQL statement, segment name, and block class. Oracle Database 10 g keeps a history of the V$WAITSTAT view. The data can be seen through the DBA_HIST_WAITSTAT view. You can export the data into Microsoft Excel or some other charting tool for trending analysis. You should order the data based on SNAP_ID and CLASS as follows :

 select * 
 from   dba_hist_waitstat 
 order by snap_id, class;

The view X$KCBFWAIT ( kernel cache buffer file wait) keeps track of buffer busy waits contentions by database file. You can discover which data file has the most waits using the following query. Unless you have only one segment per data file, it is hard to pinpoint which segment in the data file suffered the most waits.

 select b.file_id, b.file_name, a.count 
 from   x$kcbfwait a, dba_data_files b 
 where  a.indx   = b.file_id  1 
 and    a.count  > 0 
 order by a.count; 
 
 FILE_ID FILE_NAME                                   COUNT 
 ---------- -------------------------------------- ---------- 
 26 /dev/vgEMCp105/rPOM1P_slice2k_174         3643362 
 27 /dev/vgEMCp105/rPOM1P_slice2k_175         3814756 
 25 /dev/vgEMCp105/rPOM1P_slice2k_173         4296088 
 24 /dev/vgEMCp105/rPOM1P_slice2k_172         5191989 
 . . .

The view X$KCBWDS keeps track of buffer busy waits by cache buffers lru chain latch that protects the LRU and LRUW working sets, as discussed in the ‚“What Does the Latch Free Wait Event Tell You? ‚½ section earlier in this chapter.

 select set_id, dbwr_num, blk_size, bbwait 
 from   x$kcbwds 
 where  bbwait > 0; 
 
 SET_ID   DBWR_NUM   BLK_SIZE     BBWAIT 
 ---------- ---------- ---------- ---------- 
 17          0       8192    9045337 
 18          1       8192    9019887 
 19          2       8192    9034296 
 20          3       8192    9052245 
 21          0       8192    9029914 
 22          1       8192    9036767 
 23          2       8192    9045665 
 24          3       8192    9031014

The view X$KSOLSFTS is the base view for the V$SEGMENT_STATISTICS view, which keeps track of buffer busy waits by segments. This only tells you which segment has experienced the most buffer busy waits . There is an interesting column in the X$KSOLSFTS view that is not exposed in the V$SEGMENT_STATISTICS view. The column is FTS_STMP, which records the last time the value (FTS_STAVAL) was updated for the particular segment. In other words, you can know the last time a buffer busy waits occurred on a particular segment.

 select * 
 from   v$segment_statistics 
 where  statistic_name = buffer busy waits 
 order by value;

The V$SYSTEM_EVENT view keeps track of the instance-wide total waits and time waited on the buffer busy waits wait event. Always query this view in the order of TIME_WAITED to see where the buffer busy waits is in relation to other wait events. If it is not in the top-five list of non-idle events, you shouldn ‚ t have to worry about it.

You might think that the numbers in the preceding views would agree with each other ‚ the TOTAL_WAITS of the buffer busy waits wait event in the V$SYSTEM_EVENT view should equal the SUM(COUNT) in the V$WAITSTAT view, and the SUM(COUNT) in the X$KCBFWAIT view, and the SUM(VALUE) in the V$SEGMENT_STATISTICS view. However, they do not as we prove in the following listing. Don ‚ t be surprised if you see a large discrepancy between them. This happens for three main reasons. One, the underlying memory structures are not protected by latches, and so in a busy system, simultaneous updates to the memory structures can corrupt the count. Two, the buffer busy waits are called from various places within the Oracle kernel, and in some places the code always adds a constant high number (for example, 100) instead of adding the actual number of waits that occurred. And three, there are no consistent reads done for these memory structures.

 select  a.value A, b.value B, c.value C, d.value D 
 from   (select sum(bbwait) AS value from x$kcbwds) a, 
 (select sum(count)  AS value from v$waitstat) b, 
 (select sum(value) AS value 
 from   v$segment_statistics 
 where  statistic_name = buffer busy waits) c, 
 (select total_waits AS value 
 from   v$system_event 
 where  event = buffer busy waits) d; 
 
 A          B          C          D 
 ---------- ---------- ---------- ---------- 
 72749463   72668122   65914197   72751265