Global Cache Waits


The most common wait events in the RAC environment related to the global cache are global cache cr request, global cache busy, and enqueue. We will discuss these wait events in detail and explain how to investigate the problem if excessive wait times are observed for them.

global cache cr request

When a process requires one or more blocks, Oracle first checks whether they are present in its local cache. A simple hashing algorithm based on the DBA (Data Block Address) is used to traverse the cache buffers chains, and a suitable lock is found if the block is in the hash bucket. When a session requests blocks that are not found in its local cache, it asks the resource master to grant shared access to those blocks. If the blocks are in a remote cache, they are transferred over the interconnect to the local cache. The time waited to get the blocks from the remote cache is accounted in the global cache cr request wait event.
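For example, you can check whether a particular block is already in the local cache, and in what state, by querying V$BH with the two components of the DBA. The file and block numbers below are placeholders; substitute those of the block you are investigating:

```sql
-- Check whether a specific block is cached locally, and in what mode.
-- file# and block# together identify the DBA of the block.
select status, dirty, lock_element_addr
from   v$bh
where  file#  = 4
  and  block# = 123;
```

If the query returns no rows, the block is not in the local cache and must be served either from a remote cache or from disk.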

Note

This event is known as gc cr request in Oracle Database 10g; the "global cache" prefix was shortened to just "gc".

The time it takes to get the buffer from the remote instance to the local instance depends on whether the buffer is held in shared or exclusive mode. If the buffer is in shared mode, the remote instance clones the buffer in its buffer cache and ships the clone to the local cache. Based on the fairness value on that buffer, a lock downgrade may also happen if the number of CR copies served exceeds the _FAIRNESS_THRESHOLD value. If the buffer is in exclusive mode (XCUR), a PI (past image) has to be built and the block shipped to the requesting buffer cache. The statistics are incremented according to whether a CUR block or a CR block is shipped.

Typically, global cache cr request waits are followed by db file sequential/scattered read waits: during a scan, a few blocks may be found in a remote buffer cache while the rest must be read from disk.

Normally, the requesting process waits up to 100 cs (1 second) and then retries, either reading the block from disk or continuing to wait for the buffer from the remote cache, depending on the status of the lock. Excessive waits for global cache cr request may be an indication of a slow interconnect. The private, high-speed interconnect should be used for cache transfer between instances, and the public network should be used for client/server traffic. In some cases RAC may not pick the private interconnect, and the Cache Fusion traffic may be routed through the public network; in this case you will see a huge number of waits for global cache cr request. You can use the oradebug ipc command to verify whether the private network is being used for cache transfer between instances.

Finding the Interconnect Used for Cache Transfer

The following procedure can be used to find the interconnect used for Cache Fusion:

 SQL> oradebug setmypid 
Statement processed.
SQL> oradebug ipc
Information written to trace file.
SQL> oradebug tracefile_name
/oracle/app/oracle/product/9.2.0/admin/V920/udump/v9201_ora_16418.trc
SQL>

The trace file will contain the details of the IPC information along with the interconnect details:

 SKGXPCTX: 0xad95d70 ctx 
admono 0xbcfc2b9 admport:
SSKGXPT 0xad95e58 flags info for network 0
socket no 8 IP 192.168.0.5 UDP 38206
sflags SSKGXPT_UP
info for network 1
socket no 0 IP 0.0.0.0 UDP 0
sflags SSKGXPT_DOWN
active 0 actcnt 1
context timestamp 0

From the preceding trace file, you can see that the private network 192.168.0.5 is used for the Cache Fusion traffic and that the protocol used is UDP.
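In recent releases the same information is externalized in the data dictionary, so a quick query can confirm which network Cache Fusion is using without generating a trace file. The following assumes the V$CLUSTER_INTERCONNECTS view, which is available from Oracle Database 10g (the underlying X$KSXPIA structure exists in Oracle9i):

```sql
-- Shows the interconnect(s) in use, whether each is public,
-- and where the configuration came from (e.g., the
-- CLUSTER_INTERCONNECTS parameter or the cluster software).
select name, ip_address, is_public, source
from   v$cluster_interconnects;
```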

Table 8-1: RAC Network Protocols

OS/Hardware          Network Protocol
-----------          ----------------
Sun                  RSM (Remote Shared Memory), FireLink
HP PA-RISC/IA        HMP, HyperFabric
HP Tru64             RDG (Reliable DataGram), Memory Channel
AIX                  UDP, High Performance Switch, FDDI
Linux                UDP, Gigabit Ethernet
VMS                  TCP/UDP, Ethernet

Note

If, for some reason, the right interconnect is not picked by the Oracle kernel, the CLUSTER_INTERCONNECTS parameter can be used to specify the interconnect for Cache Fusion. However, this limits the failover capability during interconnect failures.
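The parameter takes the IP address of the private interface and must be set for each instance. A sketch, reusing the private address from the earlier trace and a hypothetical instance name; substitute your own values:

```sql
-- Force Cache Fusion traffic onto the named private network
-- for this instance. The parameter is not dynamic, so it must
-- go to the spfile and takes effect on the next restart.
alter system set cluster_interconnects = '192.168.0.5'
  scope = spfile sid = 'V9201';
```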

Most hardware/OS vendors use proprietary high-speed protocols for the private network. Table 8-1 gives a short description of the network protocols used on the different hardware/OS platforms. In addition to the listed cluster servers, Veritas Cluster Server uses its own protocol, LLT (Low Latency Transport), for the Cache Fusion traffic and cluster-wide traffic.

Relinking Oracle Binaries to Use the Right Interconnect Protocol

By default, a generic protocol (UDP on most platforms) is used for inter-instance communication. To change the communication protocol to use a vendor's high-speed interconnect, you need to relink the Oracle binaries and bind them to the high-speed interconnect card. For example, on a default installation on HP-UX, UDP is used for the cluster interconnect. During instance startup, the interconnect protocol information is written to the alert.log:

 cluster interconnect IPC version:Oracle UDP/IP 
IPC Vendor 1 proto 2 Version 1.0

To make use of the high-speed interconnect (HMP in this HP-UX example), you need to shut down all Oracle services and relink the binaries using the following command:

 $ make -f ins_rdbms.mk rac_on ipc_hms ioracle 

If you want to revert to UDP, you can relink the binaries with the UDP protocol:

 $ make -f ins_rdbms.mk rac_on ipc_udp ioracle 

The alert log can be checked to verify whether the right interconnect is used. The following is reported in the alert log when UDP is the interconnect protocol:

 cluster interconnect IPC version:Oracle UDP/IP 
IPC Vendor 1 proto 2 Version 1.0

Global Cache Wait Process

When a session wants a CR buffer, it submits the request after checking the status of the buffers and the state of the lock element for those buffers, and then sleeps until the request is complete. This sleep is recorded as a global cache cr request wait event. The receipt of the buffers wakes up the session, and the time slept is recorded as the wait time for this event.
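While a session is sleeping in this wait, the block it is waiting for is visible in V$SESSION_WAIT. A sketch, assuming the usual parameter meanings for this event (P1 = file number, P2 = block number):

```sql
-- Sessions currently sleeping on CR block transfers, with the
-- file (p1) and block (p2) they are waiting for.
select sid, p1 file#, p2 block#, seconds_in_wait
from   v$session_wait
where  event = 'global cache cr request';
```

Repeatedly seeing the same file#/block# here points at contention for a specific hot block rather than a generally slow interconnect.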

In general, high waits for global cache cr request can be caused by:

  • A slow interconnect or insufficient bandwidth between the instances. This can be addressed by relinking the binaries with the right protocol or by providing a faster interconnect.

  • Heavy contention for a particular block, which can cause delays in lock element down-converts.

  • Heavy load on the system (CPU) or scheduling delays for the LMS processes. Normally the number of LMS processes is derived from the number of CPUs in the system, with a minimum of two. Increasing the number of LMS processes, or raising their priority, can help them get more CPU time and thus minimize the waits. The init.ora parameter _LM_LMS can be used to set the number of LMS processes.

  • A slow disk subsystem, or long-running transactions that leave a large number of blocks uncommitted (that is, held in XCUR mode).
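Before adjusting the number of LMS processes, you can confirm how many are currently running from V$BGPROCESS:

```sql
-- List the running LMS (Global Cache Service) background processes.
-- A nonzero paddr indicates the process slot is in use.
select name, description
from   v$bgprocess
where  name like 'LMS%'
  and  paddr <> hextoraw('00');
```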

Global Cache Statistics

The statistics related to the global cache are available from the V$SYSSTAT view. The following SQL provides the details of the global cache statistics:

 REM -- In Oracle10g global cache waits are known as gc waits 
REM -- Replace global cache% with gc% for Oracle10g

select name,value
from v$sysstat
where name like '%global cache%';

NAME                                                          VALUE
--------------------------------------------------------- ---------
global cache gets                                            115332
global cache get time                                          7638
global cache converts                                         55504
global cache convert time                                     14151
global cache cr blocks received                               62499
global cache cr block receive time                           143703
global cache current blocks received                         126763
global cache current block receive time                       20597
global cache cr blocks served                                 79348
global cache cr block build time                                266
global cache cr block flush time                             135985
global cache cr block send time                                 649
global cache current blocks served                            55756
global cache current block pin time                             811
global cache current block flush time                           159
global cache current block send time                            389
global cache freelist waits                                       0
global cache defers                                              72
global cache convert timeouts                                     0
global cache blocks lost                                        772
global cache claim blocks lost                                    0
global cache blocks corrupt                                       0
global cache prepare failures                                     0
global cache skip prepare failures                             4544

The average latency for constructing a CR copy can be obtained from the following formula. The times in V$SYSSTAT are recorded in centiseconds, so multiplying by 10 yields milliseconds:

 Latency for CR block (ms) =
(global cache cr block receive time * 10) / global cache cr blocks received
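The same calculation can be done directly against V$SYSSTAT (in Oracle Database 10g, substitute the gc statistic names):

```sql
-- Average CR block receive latency in milliseconds.
-- Receive time is in centiseconds, hence the * 10;
-- nullif guards against division by zero on an idle system.
select (rt.value * 10) / nullif(rb.value, 0) "avg cr latency (ms)"
from   v$sysstat rt, v$sysstat rb
where  rt.name = 'global cache cr block receive time'
and    rb.name = 'global cache cr blocks received';
```

With the sample figures shown earlier (143703 * 10 / 62499), the average latency works out to roughly 23 ms, which would be high for a dedicated private interconnect.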

Reducing the PI and CR Buffer Copies in the Buffer Cache

In the RAC buffer cache, a new buffer class called PI (past image) is created whenever a dirty buffer is sent to a remote cache. This allows the local buffer cache to keep a consistent version of the buffer. When the buffer is globally dirty, it is flushed from the global cache. Sometimes, because of high volatility of the buffer cache, the CR and PI buffers may flood the buffer cache, which may increase the global cache waits. The V$BH view can be used to monitor the buffer cache. The following SQL shows the distribution of buffers inside the buffer cache:

 select status,count(*) 
from v$bh
group by status;

STATU   COUNT(*)
----- ----------
cr            76
pi             3
scur        1461
xcur        2358
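In a RAC environment the same distribution can be checked cluster-wide through GV$BH:

```sql
-- Buffer status distribution per instance across the cluster.
select inst_id, status, count(*)
from   gv$bh
group  by inst_id, status
order  by inst_id, status;
```

A disproportionately large cr or pi count on one instance points to that instance serving (or holding past images for) blocks that are hot elsewhere in the cluster.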

Setting FAST_START_MTTR_TARGET to a nonzero value enables incremental checkpointing, which reduces the number of PI buffers. Having many PI and CR images shrinks the number of usable buffers and may increase physical reads. The value for this parameter is specified in seconds.
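For example, to enable incremental checkpointing with a 300-second recovery target (the value here is purely illustrative; choose one that matches your recovery requirements):

```sql
-- Target instance recovery time of 300 seconds; enables
-- incremental checkpointing, which helps age out PI buffers.
alter system set fast_start_mttr_target = 300;

-- Verify the effective target against the current estimate.
select target_mttr, estimated_mttr
from   v$instance_recovery;
```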

global cache busy

In a cluster environment, each instance typically masters its own set of data, but there is always the chance that one instance requires data cached in another, and the buffers cached in the remote instance may be readily available or locally busy. Depending on the state of the buffer in the remote instance, the pin time increases in proportion to the time it takes to service the block request.

If the buffer is busy at the cache level, you can use the following SQL to identify the operations that are keeping the buffer busy. Based on the operations, you can reduce the contention for those buffers. The same query can also be used to identify buffer busy waits in single-instance database environments.

 select wh.kcbwhdes "module", 
sw.why0 "calls",
sw.why2 "waits",
sw.other_wait "caused waits"
from x$kcbwh wh,
x$kcbsw sw
where wh.indx = sw.indx
and sw.other_wait > 0
order by sw.other_wait;

MODULE                    CALLS      WAITS CAUSED_WAITS
-------------------- ---------- ---------- ------------
kdiwh06: kdifbk          113508          0            1
ktewh25: kteinicnt       112436          0            2
kduwh01: kdusru            4502          0            3
kdiwh18: kdifind           1874          0            3
kddwh03: kddlkr          270333          0            4
kdswh01: kdstgr          139727          0           41
Note

In Oracle8i Database, the column OTHER_WAIT is named OTHER WAIT (notice the space rather than the underscore).

The preceding output is from a relatively quiet system that does not have many waits for global buffers. Table 8-2 lists the most commonly seen buffer busy operations and the functions causing them.

Table 8-2: Common Operations Causing buffer busy Waits

Module     Operation
------     ---------
kdifbk     Fetches the single index row matching the argument key.
kdusru     Updates a single row piece.
kdifind    Finds the appropriate index block in which to store the key.
kdstgr     Performs full table scan get row; rows are accessed by a full table scan. Check the number of FULL table scans.
kdsgrp     Performs get row piece. Typically, row pieces are involved only in the case of chained or migrated rows; row chaining has to be analyzed and fixed.
kdiixs     Performs an index range scan.
kdifxs     Fetches the next or previous row in an index scan.
ktugct     Performs block cleanout.




Oracle Wait Interface: A Practical Guide to Performance Diagnostics & Tuning (Osborne Oracle Press Series)
ISBN: 007222729X
Year: 2004
Pages: 114