14.5 Database tuning | Oracle Real Application Clusters

< Day Day Up >

In the previous section the discussion was around instance tuning, where basically tuning of the memory parameters and instance-specific areas was the goal. In this section we will look at tuning some of the physical characteristics, such as the database, the storage, and the segments. Memory management is to ensure that Oracle efficiently utilizes data that is in memory. The tuning of the physical layer is equally important, because the data is retrieved from disk before being loaded into memory. If the physical characteristics of the database did not help in providing an efficient data retrieval mechanism, there could be considerable performance issues across all instances participating in the clustered database configuration.

Statistics play a very important part in tuning this tier of the database. Statistical data is important for all areas of tuning; however, in the case of physical attributes, unless there is physical evidence that there are performance issues with a specific area, it is difficult to fix.

Similar to the instance tuning, there are parameters and views that would help tune the database layer. The STATSPACK utility that was discussed in Chapter 13 is a great source that would help in tuning the physical attributes of the database.

DB_BLOCK_SIZE

While tuning the database, the first and foremost important consideration is selecting the appropriate DB_BLOCK_SIZE. Fortunately, with Oracle 9i, database administrators have the opportunity to have multiple blocks. The primary block size, which is also called the default DB_BLOCK_SIZE cannot be changed once it is determined. Oracle allows for creation of four additional block sizes per database implementation: 4, 8, 16, and 32 KB.

The DB_BLOCK_SIZE determines the amount of data that could be retrieved per block retrieval. A large block size would be helpful for a decision support systems (DSS)/data warehouse implementation, where data is retrieved in large volumes, mostly through full table scans. A smaller block size of, say, 8 KB would be good enough for an online transaction processing (OLTP) application, where data is retrieved mostly in the form of singleton selections based on more specific selection criteria.

Block sizes also affect the segment-level performance. Segment-level performance issues can be obtained by collecting a STATSPACK snapshot at level 7 or through Oracle-provided views such as V$SEGMENT_ STATISTICS and V$SEGSTAT.

The common indication that there could be potential problems at the segment level is the ''buffer busy waits'' event noticed in the STATSPACK report. The other areas reported in the STATSPACK report that related to the segment-level statistics is the logical Reads and physical reads:

Top 5 Logical Reads per Segment for DB: PRODDB Instance: RAC1 Snaps: -> End Segment Logical Reads Threshold: 10000                                    Subobject Obj.   Logical Owner  Tablespace  Object Name        Name   Type     Reads   %Total ------ ----------  -------------  ---------- ----- --------- ------- SOWNER COMP_DATA_  COMPANY       _DATA_P002  TABLE   937,312   24.31 SOWNER COMP_DATA_  COMPANY       _DATA_P001  TABLE   427,264   11.08 SOWNER INDX_TBS30  USPRL_INDX3               INDEX   260,272    6.75 SOWNER DATA_TBS30  REGION_DETAIL _DATA_P001  TABLE   226,784    5.88 SOWNER COMP_DATA_  COMPANY       _DATA_P003  TABLE   176,016    4.56 --------------------------------------------------------------------

Top 5 Buf. Busy Waits per Segment for DB: PRODDB Instance: RAC1 Snaps: -> End Segment Buffer Busy Waits Threshold: 100                                    Subobject Obj.   Logical Owner  Tablespace  Object Name        Name   Type     Reads   %Total ------ ----------  -------------  ---------- ----- --------- ------- SOWNER COMP_DATA_  COMPANY       _DATA_P002  TABLE   253,560   46.31 SOWNER COMP_DATA_  COMPANY       _DATA_P001  TABLE    99,660   18.20 SOWNER COMP_DATA_  COMPANY       _DATA_P003  TABLE    84,533   15.44 SOWNER USPRL_DATA  USER_PROFILE  _DATA_P002  TABLE    70,094   12.80 SOWNER USPRL_DATA  USER_PROFILE  _DATA_P001  TABLE    21,919    4.00 --------------------------------------------------------------------

Top 5 Buf. Busy Waits per Segment for DB:  PRODDB Instance:RAC1 Snaps:  -> End Segment Buffer Busy Waits Threshold: 100                                    Subobject Obj.   Logical Owner  Tablespace  Object Name        Name   Type     Reads  %Total ------ ----------  -------------  ---------- ----- --------- ------ SOWNER COMP_DATA_  COMPANY       _DATA_P001  TABLE      358   34.82 SOWNER USERS       PK_USPRL                  INDEX      243   23.64 SOWNER COMP_DATA_  COMPANY       _DATA_P003  TABLE      207   20.14 SOWNER INDX_TBS20  USPRL_INDX1   _INDX_P001  INDEX       64    6.23 SOWNER INDX_TBS30  USRLI_INDX1   _INDX_P001  INDEX       39    3.79 -------------------------------------------------------------------

Prior to Oracle 9i, to get to the bottom of the issue that cased these wait events, one had to hop through various tables and layers of information.

Another method to obtain this information is to directly query certain new dynamic views:

V$SEGMENT_STATISTICS

Starting with Oracle 9i Release 2, Oracle provides real-time segment-level statistical information. This view helps the database administrator to drill down into certain wait events noticed at the system level and associate them with a specific table or index.

Like other statistics gathering options, segment-level information is only provided if the STATISTICS_LEVEL parameter and the TIMED_ STATISTICS parameter has been enabled.

Oracle collects the following statistical information at the segment level:

SELECT STATISTIC_NAME FROM V$SEGSTAT_NAME; STATISTIC_NAME --------------------------------------- ITL waits buffer busy waits db block changes global cache cr blocks served global cache current blocks served logical reads physical reads physical reads direct physical writes physical writes direct row lock waits 11 rows selected.

From the above list the ''buffer busy waits'' are normally of concern to most database administrators. To track down all the objects that have had ''buffer busy waits'' since the instance startup, the following query will be helpful:

COL OBJECT_NAME FORMAT A15 COL TABLESPACE_NAME FORMAT A20 COL OBJECT_TYPE FORMAT A25 COL STATISTIC_NAME FORMAT A20 COL VALUE FORMAT 9999999 SELECT       OBJECT_NAME,       TABLESPACE_NAME,       OBJECT_TYPE,       VALUE FROM V$SEGMENT_STATISTICS WHERE STATISTIC_NAME ='buffer busy waits' AND VALUE >0 ORDER BY VALUE /

Note

The definition and cause of this event, and its remedy, are discussed later in this chapter.

V$SEGSTAT

This is another view that provides segment-level statistics on a real-time basis. This view is also new in Oracle 9i Release 2.

The information contained in this view is similar to what is contained in the V$SEGMENT_STATISTICS view. However, when using the V$SEGSTAT view, the values in the various columns will have to be joined with other object views to get the actual names of objects.

For example, to get the tablespace name associated with the data in the V$SEGSTAT view, the TS# column needs to be joined with the V$TABLESPACE view.

COL NAME FORMAT A25 COL STATISTIC_NAME FORMAT A25 COL VALUE FORMAT 999999999999 SELECT       NAME,       STATISTIC_NAME,       VALUE FROM V$SEGSTAT VSS, V$TABLESPACE VTS WHERE VSS.TS# = VTS.TS# AND VALUE > 0 ORDER BY VALUE

The outcome of segment-level analysis is to eliminate the problem encountered by frequent waits at the segment level. The two parameters, among others, that would be worth investigating are the INITTRANS and the MAXTRANS parameters at the table and index definitions.

To confirm this finding, another wait event that could be examined is the ''enqueue'' wait. To determine the specific type of enqueue that is being waited on, the following query and a snapshot of the data on some of the enqueue types should help us drill down further.

SELECT       EQ_TYPE,       TOTAL_REQ#,       TOTAL_WAIT#,       FAILED_REQ#,       CUM_WAIT_TIME FROM V$ENQUEUE_STAT ORDER BY TOTAL_WAIT# / EQ  TOTAL_REQ  TOTAL_WAIT  FAILED_REQ  CUM_WAIT_TIME --  ---------  ----------  ----------  ------------- SQ       2017          27           0         409205 MR      34979          34           0            150 TS         88          53           1            205 HW      35475         230           0           2807 IR       1515         389           0          59010 JQ        984         969          10           3574 PS       6443        2808        1068         245641 TT     103161        3180           0          24616 US       4227        3677           0          19460 TA       4877        4754           0          51993 FB       9926        5118           0           8412 EQ  TOTAL_REQ  TOTAL_WAIT  FAILED_REQ CUM_WAIT_TIME --  ---------  ----------  ---------- ------------- CF     682320        5649           0          43608 TM    2048657       10556           1         567503 TX    1914970       28237           0         145181

From the above query and the data, it is obvious that the TX (transaction) and TM (DML enqueue) have encountered the highest numbers of requests (TOTAL_REQ) and waits (TOTAL_WAIT). The TX enqueue confirms our earlier finding that the INITRANS and the MAXTRANS parameters may require some adjustment. Another enqueue that has a high cumulative wait time (CUM_WAIT_TIME) is the SQ (sequence number) enqueue.

This is usually an application-related issue pertaining to row locking. However, under RAC, processing can magnify the effect of TX enqueue waits. Performance bottlenecks on leaf blocks of right-growing indexes may also appear as TX enqueue waits while index block splits are in progress.

TX enqueue performance issues along with ''buffer busy waits'' can be reduced by setting the value of the INITRANS parameter to be equal to the number of CPUs per node multiplied by the number of nodes in the cluster multiplied by 0.75, or to a higher value based on the number of transactions interfering with the objects in question. However, setting the INITRANS or MAXTRANS parameters to a value greater than 99 could cause performance problems. Setting a higher value causes overallocation of space for the transaction block header, which in turn causes less space to be available for the data layer variable header and causes more I/O operation.

DML_LOCKS

Under OPS, it was recommended that this parameter be set to a value of 0 to help improve performance. While this was true, when DML_LOCKS was set to 0, Oracle did not permit any DDL operations on the objects and hence caused severe maintenance concerns. When this parameter was set to 0, the database administrators had to shut down the instance, modify this parameter, make the changes, and set the DML_LOCKS back to its original value and bounce the instance again.

One of the reasons for TM enqueues that were noticed in the earlier example was too few locks. If the DML_LOCKS parameter is not set to 0 (which is not recommended because of maintenance-related difficulties) the number of locks is calculated based on the number of sessions, number of transactions, and the average number of objects that it would access directly or indirectly.

For example, if the instance is configured for 400 sessions, and if each session can execute a query that joins four tables, then each table on average will have at least one index that is accessed by the query. The data related to the table could be residing in at least one data partition and the index in at least one index partition, which means

400 Sessions x 8 Object (4 Tables + 4 Indexes) = 3200 3200 x 2 Partitions (1 Table partition + 1 Index partition) 6400

The DML_LOCKS parameter in this situation should be set to 6400 or higher.

ENQUEUE_RESOURCES

Another related parameter that helps enqueue activity is the ENQUEUE_RESOURCES. This parameter sets the number of resources that are concurrently locked. While DML_LOCKS allows a certain number of locks per session, the ENQUEUE_RESOURCE parameter allows the setting of these locks. Both these parameters go hand in hand.

At database startup time, Oracle allocates the number of enqueues specified by the ENQUEUE_RESOURCE parameter. The default value for this parameter is derived from the SESSIONS parameter and should be greater than DML_LOCKS ) 20. For 4 to 10 sessions, the default value is the number of database files ) (( SESSIONS_3) x 5) + 20. For more than 10 sessions it is the number of database files ) ((SESSIONS -10) x 2) + 55.

The usage of ENQUEUE_RESOURCES can be obtained by monitoring the V$RESOURCE_LIMIT view.

V$RESOURCE_LIMIT

This view provides visibility to some of the threshold settings and the current consumption values for some of the critical resources.

Some resources, those used by the distributed lock manager (DLM) for example, have a soft limit, which is the initial allocation, and a hard limit, which is theoretically infinite (although in practice it is limited by SGA size). During SGA reservation/initialization, a place is reserved in SGA for the INITIAL_ALLOCATION of resources, but if this allocation is exceeded, additional resources are allocated up to the value indicated in the LIMIT_VALUE against a specific resource. Certain resources such as the ges_ress, ges_locks, dml_locks, etc., have a maximum LIMIT_VALUE defined as UNLIMITED (UNLIMITED = 4,294,967,295). The following query generates the output containing the current utilization of resources.

COL CU FORMAT 9999999 COL MU FORMAT 9999999 COL IA FORMAT A10 COL LV FORMAT A10 SELECT       RESOURCE_NAME,       CURRENT_UTILIZATION CU,       MAX_UTILIZATION MU,       INITIAL_ALLOCATION IA,       LIMIT_VALUE LV FROM V$RESOURCE_LIMIT / RESOURCE_NAME             CU     MU       IA         LV ------------------    ------ ------ --------- --------- processes                 28    175       300       300 sessions                  21    186       335       335 enqueue_locks            268    307      4564      4564 enqueue_resources        268    315      3532 UNLIMITED ges_procs                 27    173       301       301 ges_ress               18194  24068     13085 UNLIMITED ges_locks              18534  26493     16625 UNLIMITED ges_cache_ress            55   1146         0 UNLIMITED ges_reg_msgs              75   1825       830 UNLIMITED ges_big_msgs              27    576       830 UNLIMITED ges_rsv_msgs               0      0       600       600 RESOURCE_NAME             CU     MU        IA        LV --------------------- ------ ------ --------- --------- gcs_resources          58457 102224    134675    134675 gcs_shadows            10824  62624    134675    134675 dml_locks                0      200      3000 UNLIMITED temporary_table_locks    0        1 UNLIMITED UNLIMITED transactions            11       45       368 UNLIMITED branches                 0        1       368 UNLIMITED cmtcallbk                0        1       368 UNLIMITED sort_segment_locks       0       53 UNLIMITED UNLIMITED max_rollback_segments   11       27        74        74 max_shared_servers       0        0        20        20 parallel_max_servers     9       11        41        41 22 rows selected

The CURRENT_UTILIZATION (CU) column indicates whether the value indicated in the INITIAL_ALLOCATION (IA) column has been exceeded. When the initial allocation value is exceeded, the additional required resources are allocated from the shared pool, where they compete for space with other resources.

V$ROWCACHE

This dynamic view is used to measure the caching behavior of the dictionary cache. The dictionary cache is part of the shared pool and does not have any tunable parameters other than the SHARED_POOL initialization parameter itself. This means that if the shared pool has not been sized correctly there is a direct impact on the dictionary cache.

COLUMN PARAMETER FORMAT A21 COLUMN PCT_SUCC_GETS FORMAT 999.9 COLUMN UPDATES FORMAT 999,999,999 SELECT PARAMETER      , SUM(GETS)      , SUM(GETMISSES)      , 100*SUM(GETS - GETMISSES) / SUM(GETS) PCT_      SUCC_GETS      , SUM(MODIFICATIONS) UPDATES FROM V$ROWCACHE WHERE GETS > 0 GROUP BY PARAMETER; PARAMETER           SUM(GETS) SUM(GETMISSES) PCT_SUCC_GETS  UPDATES -------------------  --------  -------------  ------------ -------- dc_constraints            462            214          53.7      456 dc_files              5008999           1193         100.0       51 dc_free_extents             9              4          55.6        9 dc_global_oids          78108            303          99.6        0 dc_histogram_defs     3111931         182217          94.1    2,487 dc_object_ids          945364           5088          99.5      396 dc_objects             134090           8963          93.3    8,498 dc_profiles             24694              5         100.0        0 dc_rollback_segments  1759820            236         100.0      107 dc_segments            116673           1725          98.5    1,049 dc_sequences             2742            829          69.8    2,742 PARAMETER           SUM(GETS) SUM(GETMISSES) PCT_SUCC_GETS  UPDATES ------------------- ---------  ------------- ------------- --------- dc_table_scns              10             10             0        0 dc_tablespace_quotas     6990            530          92.4    4,272 dc_tablespaces        4002024            534         100.0       34 dc_used_extents             3              3             0        3 dc_user_grants         233901             83         100.0        0 dc_usernames            99176             66          99.9        0 dc_users             15175456            107         100.0        2 18 rows selected.

In the output above, the various values provided are indications of inefficient use of the dictionary cache. For example, the high gets on the dc_users parameter indicate that there is frequent log-in and log-out activity, which could be reduced by having the application perform on a single sign-on option. This means that the application could have several user accounts, but the application itself has only one user sign-on password.

Applications not using locally managed tablespaces could experience high dc_used_extents, dc_segments, and dc_free_extents counts. This is due to frequent access to the dictionary to update the latest extent information. Using LMT reduces this activity because all tablespace management is performed locally at the tablespace level.

A common parameter that should be of concern in a RAC environment is the dc_sequences. This indicates that there are not enough cache sizes defined for the various sequences used by the application. Tuning the cache sizes on the sequences should help reduce the gets on this parameter.

< Day Day Up >