Database Configuration Parameter Tuning and Monitoring


The database configuration parameters affect the database itself, as well as every application that accesses it. The parameters listed in Table 8.3 have the most significant impact on the performance of the database and its applications.

Table 8.3. Database Configuration Parameter Descriptions

  Parameter          Description
  -----------------  ------------------------------------------
  BUFFPAGE           Default Buffer Pool Size
  LOGBUFSZ           Log Buffer Size
  APPLHEAPSZ         Application Heap Size
  SORTHEAP           Sorting
  SHEAPTHRES_SHR     Sorting
  LOCKLIST           Locking
  MAXLOCKS           Locking
  NUM_IOCLEANERS     Number of Asynchronous Page Cleaners
  NUM_IOSERVERS      Number of I/O Servers
  MINCOMMIT          Number of Commits to Group
  CATALOGCACHE_SZ    Catalog Cache Size
  CHNGPGS_THRESH     Changed Pages Threshold
  AVG_APPLS          Average Number of Concurrent Applications

Default Buffer Pool Size (BUFFPAGE)

Background

As discussed previously, a DB2 buffer pool is an area of memory into which database pages containing table rows or index keys are read and manipulated. The purpose of the buffer pool is to improve database system performance by caching frequently accessed pages in memory, eliminating the I/O needed to retrieve them, because data can be accessed much faster from memory than from disk. Therefore, the fewer times the database manager needs to read from or write to disk, the better the performance. The configuration of the buffer pool (or buffer pools) is one of the most important tuning areas because it is here that most of the data manipulation takes place for applications connected to the database.

When a buffer pool is created or altered, its size can be explicitly specified, or it can be set to use the default buffer pool size. To use the default buffer pool size, as specified by the BUFFPAGE database configuration parameter, the size of the buffer pool must be set to -1 when the buffer pool is created or altered.

If the database has a single large buffer pool, the default buffer pool size can be used. For a database with multiple buffer pools, the buffer pools will need to be sized independently, and attempting to use one size for all of the buffer pools will normally lead to suboptimal performance.

Configuring

To increase the default buffer pool size and set the default buffer pool (IBMDEFAULTBP) to use it, use the following commands:

  update db cfg for <dbname> using BUFFPAGE bigger_value
  alter bufferpool IBMDEFAULTBP size -1
Monitoring

Monitoring of the effectiveness of the database buffer pool or buffer pools is covered in detail in the Monitoring Buffer Pool Activity section of Chapter 2, Data Manipulation, so it will not be repeated here.

Automatic

No

Online

Yes

Log Buffer Size (LOGBUFSZ)

Background

Just as buffer pools help to improve database performance, the log buffer is used to improve logging performance by providing an area of memory for the DB2 engine to write log records to instead of writing directly to disk. DB2 uses a dual buffering technique so that, as it is asynchronously writing a log buffer to disk, DB2 can continue logging to the other log buffer. The log buffers are written to disk when any of the following conditions occur:

  • A transaction commits (or MINCOMMIT transactions commit).

  • The log buffer is full.

  • One second has elapsed since the last log buffer flush.

Configuring

Buffering of the log records results in more efficient log file I/O because the log records are written to disk less frequently, and multiple log records can be written with each write request. The default size of the log buffer is normally too small for most update workloads. When increasing the value of the log buffer, it is also important to consider the size of the database heap (DBHEAP) because the log buffer area is allocated from within the database heap.

Typically, the log buffer should be increased to a minimum value of 256 pages, as follows

  update database cfg for <dbname> using LOGBUFSZ 256  

to help improve overall performance, especially for online workloads.

Monitoring

The database logging activity and I/O effectiveness can be examined using a database snapshot. To capture a database snapshot and extract the information related to the logging activity, use the following:

  get snapshot for database on <database_name> | grep -i "Log space"

  or

  SELECT sec_log_used_top, tot_log_used_top, total_log_used, total_log_available
  FROM TABLE(SNAPSHOT_DATABASE('SAMPLE',-1)) as SNAPSHOT_DATABASE

The output of the get snapshot command would look like the following:

  Log space available to the database (Bytes)       = 4549916
  Log space used by the database (Bytes)            = 550084
  Maximum secondary log space used (Bytes)          = 0
  Maximum total log space used (Bytes)              = 550084

The SQL statement above would produce the following output:

  SEC_LOG_USED_TOP  TOT_LOG_USED_TOP  TOTAL_LOG_USED  TOTAL_LOG_AVAILABLE
  ----------------  ----------------  --------------  -------------------
                 0            550084          550084              4549916

However, when examining the log space, it is important to ensure that there are enough primary log files configured to handle normal processing. Secondary logs should be reserved for exceptional cases where the amount of logging is heavier than normal, e.g., nightly batch processing. It is also important that there be sufficient log space available at all times to ensure that there is no danger of encountering a log full condition.

To determine the current amount of log space available (CLSA), use the following formula:

  CLSA = Log space available to the database - Log space used by the database  

or the following SQL statement:

  SELECT total_log_available - total_log_used as "CurrentLogSpaceAvail"
  FROM TABLE(SNAPSHOT_DATABASE('SAMPLE',-1)) as SNAPSHOT_DATABASE

To determine the minimum amount of log space available (MLSA), use the following formula:

  MLSA = Log space available to the database - Maximum total log space used

or the following SQL statement:

  SELECT total_log_available - tot_log_used_top as "MinLogSpaceAvail"
  FROM TABLE(SNAPSHOT_DATABASE('SAMPLE',-1)) as SNAPSHOT_DATABASE
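The two log-space formulas above can be combined into a small Python helper (a sketch; the function name is illustrative, and the inputs are the byte values reported by the database snapshot):

```python
def log_space_metrics(total_log_available, total_log_used, tot_log_used_top):
    """Return (current, minimum) log space available, in bytes.

    current: space left right now
    minimum: space left at the historical peak of log usage
    """
    current = total_log_available - total_log_used
    minimum = total_log_available - tot_log_used_top
    return current, minimum

# Values from the sample database snapshot shown earlier:
current, minimum = log_space_metrics(4549916, 550084, 550084)
```

If the minimum value approaches zero, the log configuration has little headroom and a log full condition becomes a real risk.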

To capture a database snapshot and extract the information related to the I/O effectiveness of the database logging, use the following:

  get snapshot for database on <database_name> | grep -i "Log pages"

  or

  SELECT log_reads, log_writes FROM TABLE(SNAPSHOT_DATABASE('SAMPLE',-1))
  as SNAPSHOT_DATABASE

We can use the database snapshot to determine whether the LOGBUFSZ parameter is optimal by looking at the lines shown in the following example:

  Log pages read         = 0
  Log pages written      = 12644

The ratio of log pages read to log pages written should be as small as possible. Ideally, there should not be any log pages read, and the number of log pages written will depend on the insert/update/delete workload on the database server. When there is a significant number of log pages read, it is an indication that the size of the log buffer should be increased.
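As a sketch, this rule of thumb can be coded as a simple check (the 1% threshold is an assumption for illustration, not a DB2-documented cutoff):

```python
def log_buffer_adequate(log_pages_read, log_pages_written, max_read_ratio=0.01):
    """Return True when the fraction of log pages read back is negligible.

    A non-trivial read ratio suggests LOGBUFSZ should be increased.
    """
    if log_pages_written == 0:
        return True  # no write activity: nothing to judge
    return (log_pages_read / log_pages_written) <= max_read_ratio

# Snapshot values from the example above: ideal, no log pages read.
assert log_buffer_adequate(0, 12644)
# A hypothetical workload reading back ~4% of written pages: grow LOGBUFSZ.
assert not log_buffer_adequate(500, 12644)
```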

Automatic

No

Online

No

Application Heap Size (APPLHEAPSZ)

Background

The application heap size (APPLHEAPSZ) defines the number of private memory pages available to be used by a DB2 UDB instance on behalf of each DB2 agent and/or subagent. This heap is used to store a copy of the currently executing sections of the access plan for the application associated with the DB2 agent or subagent.

NOTE

If the database is partitioned, the executing sections of the SQL statements for the agents and subagents will be stored in the application control heap (APP_CTL_HEAP_SZ), not in the application heap.


The application heap is allocated when an agent or subagent is initialized for an application. The amount of memory allocated will be only what is needed to process the request that has been given to the DB2 agent or subagent. When a DB2 agent or subagent requires additional application heap to be able to process larger SQL statements, DB2 will allocate additional memory, up to the maximum specified by the application heap size.

Configuring

For a database with small, relatively simple SQL, the default application heap size is normally adequate. However, for large, complex SQL, the default may not be large enough, and applications may run out of application heap and encounter errors. In DB2 UDB Version 7, the default value of the application heap was normally too low. However, in DB2 UDB Version 8, the default value has been increased to 256 pages.

If an application does encounter errors running out of application heap, the size of the application heap can be increased as follows:

  update database cfg for <database_name> using applheapsz 1024  
Monitoring

It is not possible to directly monitor the size of the application heap allocated within each DB2 agent or subagent. Therefore, in most cases, it is best to increase the application heap size, as above, whenever an application receives an error indicating that there is not enough storage in the application heap. Because DB2 allocates only what is required, do not increase the size of the application heap by small amounts. If an error is encountered, double the size of the application heap and test to see whether the error has been eliminated.

Automatic

No

Online

No

Sorting (SORTHEAP, SHEAPTHRES_SHR)

Background

The sort heap (SORTHEAP) size specifies the maximum number of private memory pages to be used for private sorts or the maximum number of shared memory pages to be used for shared sorts. If the DB2 optimizer chooses a private sort, the sort heap affects agent private memory; if it chooses a shared sort, the sort heap affects database-level shared memory. Each sort operation has a separate sort heap, allocated as needed by DB2, in which the underlying data is sorted. Normally, DB2 allocates a full sort heap; however, if directed by the optimizer, a smaller amount of memory than specified by the sort heap size may be allocated, based on the information provided by the optimizer and the database statistics.

The sort heap threshold is an instance-level configuration parameter, as described previously. Private and shared sorts use memory from two different memory areas at the operating system level. The total size of the shared sort memory area is allocated and statically predetermined at the time of the first connection to (or activation of) a database based on the value of the shared sort heap threshold (SHEAPTHRES_SHR) parameter. The total size of the private sort memory area is allocated as needed and is not restricted in size. Because of the fundamental differences between shared and private memory, the sort heap threshold (SHEAPTHRES and SHEAPTHRES_SHR) parameters are applied differently for private and shared sorts:

  • For private sorts, the sort heap threshold parameter is an instance-wide soft limit on the total amount of memory that can be used by private sorts at any given time. When the total usage of private sort memory for a DB2 instance reaches this limit, the memory allocated for new private sort requests will be reduced by a factor of one half.

  • For shared sorts, the sort heap threshold parameter is a database-wide hard limit on the total amount of memory that can be used by shared sorts at any given time. When the total usage of shared sort memory for a database reaches this limit, no further shared sort memory requests will be allowed until one of the currently executing shared sorts completes.

Configuring

The default value for SHEAPTHRES is quite low, especially for decision support databases. It is good practice to increase the value for SHEAPTHRES significantly because it can have a very dramatic effect on performance. However, increasing the SHEAPTHRES blindly can mask a real problem in the system or applications.

NOTE

By default, SHEAPTHRES_SHR is equal to SHEAPTHRES.


When determining an appropriate value for the SHEAPTHRES and SHEAPTHRES_SHR parameters, consider the following:

  • Hash joins and dynamic bitmaps used for index ANDing and star joins use sort heap memory. Increase the size of the SORTHEAP, SHEAPTHRES, and SHEAPTHRES_SHR when these techniques are used.

  • Increase the SHEAPTHRES when large private sorts are frequently required.

  • Increase the SHEAPTHRES_SHR when large shared sorts are frequently required.

  • The values of SHEAPTHRES and SHEAPTHRES_SHR need to be based on the value of the SORTHEAP, as well as on the average number of applications executing in the database.

    • For example, if database monitoring shows that, on average, there are 12 applications concurrently executing in DB2, setting the SHEAPTHRES to 12 to 15 times the SORTHEAP would be a good starting point.

    • Because shared sorts are less common than private sorts, setting the SHEAPTHRES_SHR to 5 times the SORTHEAP would be a good starting point.
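The starting points above can be expressed as a small sizing helper (a sketch; the multipliers come from the rules of thumb in the text, and the function names are illustrative):

```python
def sheapthres_start(sortheap, avg_concurrent_apps):
    """Starting range for SHEAPTHRES: roughly avg_apps x SORTHEAP,
    with ~25% headroom (matching the 12-to-15x guidance for 12 apps)."""
    low = avg_concurrent_apps * sortheap
    high = int(low * 1.25)
    return low, high

def sheapthres_shr_start(sortheap):
    """Starting point for SHEAPTHRES_SHR: 5 x SORTHEAP, per the text."""
    return 5 * sortheap

# With SORTHEAP = 6000 pages and 12 concurrent applications:
low, high = sheapthres_start(6000, 12)   # 72000 to 90000 pages
shr = sheapthres_shr_start(6000)         # 30000 pages
```

These values line up with the sample commands that follow; they are starting points to refine through monitoring, not final settings.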

To set the SHEAPTHRES configuration parameter, use the following command:

  update dbm cfg using sheapthres 80000  

To set the SHEAPTHRES_SHR configuration parameter, use the following command:

  update db cfg for <database_name> using sheapthres_shr 30000  

Once the database is operational, monitor the system to determine whether the sort heap threshold is being reached. However, sorts may also be overflowed without hitting the sort heap threshold if they require more memory than the configured sort heap size. These sort overflows may be eliminated by increasing the size of the sort heap, as follows:

  update database cfg for <database_name> using sortheap 6000  
Monitoring

Due to the impact of sorting on database performance, DB2 monitors a number of metrics related to sort activity.

Sorts that are started after the sort heap threshold has been reached may not get an entire SORTHEAP allocated and, therefore, have a much higher chance of overflowing. The number of sorts started after the SHEAPTHRES has been reached is reported in the post threshold sorts database monitor element.

In addition to the number of piped sorts requested and accepted, the number of overflowed sorts is also available in the snapshot information. These are related to the sort heap, as well as the sort heap threshold, and they will be discussed in more detail when the sort heap is discussed later in this chapter.

To determine whether the sort heap threshold is being reached, take a database manager snapshot, using the following command:

  get snapshot for database manager | grep -i "Post threshold sorts"

This will capture a database manager snapshot and extract the monitor element concerned with the number of post threshold sorts. This would produce output like the following:

  Post threshold sorts                 =  16  

This information can also be captured using an SQL statement, as follows:

  SELECT post_threshold_sorts FROM TABLE(SNAPSHOT_DBM(-1)) as SNAPSHOT_DBM  

This would produce output like the following:

  POST_THRESHOLD_SORTS   --------------------   16  

If this value is excessive, the size of the sort heaps and/or sort heap threshold should be examined to determine whether they are sufficient. In addition, the applications should be examined to ensure that they are using appropriate indexes.

If the allocation of one more sort heap equals or exceeds the sort heap threshold, a piped sort cannot be performed, and any request for a piped sort will be rejected. A sort heap will be allocated to handle the sorting of the data, but it will be of a reduced size.

The percentage of piped sort requests that have been serviced by the database manager can be calculated by using the formula:

  Percent Piped Sorts = (piped_sorts_accepted / piped_sorts_requested) * 100%  

If this percentage of piped sorts is low, the sort performance could be improved by increasing the sort heap threshold. The number of piped sort requests that have been rejected by DB2 can be calculated using the following formula:

  Piped Sorts Rejected = piped_sorts_requested - piped_sorts_accepted  

A high number of rejected piped sort requests may indicate that either the value of the sort heap or the sort heap threshold is too small to support the current workload.

Another indicator of sort performance is the percentage of post threshold sorts. DB2 allocates sort heaps at the beginning of sorts and at the beginning of any sort merge phase. If at any time during a sort a request to allocate a sort heap would exceed the SHEAPTHRES, the sort would be classified as a post threshold sort. The percentage of post threshold sorts is calculated using the following formula:

  Percent Post Threshold Sorts = (post_threshold_sorts / sum of total_sorts) * 100%  
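The piped-sort and post-threshold formulas above can be bundled into one helper (a sketch using the snapshot element names as arguments; the sample values are hypothetical):

```python
def piped_sort_metrics(piped_sorts_requested, piped_sorts_accepted):
    """Return (percent of piped sorts accepted, number rejected)."""
    if piped_sorts_requested == 0:
        return 100.0, 0
    pct = (piped_sorts_accepted / piped_sorts_requested) * 100
    rejected = piped_sorts_requested - piped_sorts_accepted
    return pct, rejected

def pct_post_threshold_sorts(post_threshold_sorts, total_sorts):
    """Percentage of sorts started after SHEAPTHRES was reached."""
    return (post_threshold_sorts / total_sorts) * 100 if total_sorts else 0.0

# Hypothetical snapshot values:
pct, rejected = piped_sort_metrics(200, 180)   # 90.0% accepted, 20 rejected
post_pct = pct_post_threshold_sorts(16, 800)   # 2.0%
```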

The total amount of private sort heap that is currently allocated can be monitored using the following:

  get snapshot for database manager | grep -i "Private Sort heap allocated"

  or

  SELECT sort_heap_allocated FROM TABLE(SNAPSHOT_DBM(-1)) as SNAPSHOT_DBM

For an online database, it is also important to examine the number of sort operations per transaction (SPT), as well as the percentage of overflowed sorts (POS), as follows:

  SPT = (Total Sorts) / (Commit statements attempted + Rollback statements attempted)
  POS = (Sort overflows * 100) / (Total sorts)

Because online databases need to provide near-instantaneous results, the number of sort operations per transaction should be as low as possible. An average of three or more sorts per transaction indicates far too much sorting. If the percentage of overflowed sorts is greater than 3%, there may be serious, unexpectedly large sorts occurring. When this happens, increasing the SORTHEAP and/or SHEAPTHRES will likely only mask the underlying performance problem, not fix it. The best action to help correct these sorting issues is to capture dynamic SQL snapshots, look for poorly performing SQL statements, and add proper indexes as required.
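The per-transaction checks above can be sketched as follows (the alert thresholds of three sorts per transaction and 3% overflowed sorts come from the text):

```python
def sort_health(total_sorts, commits_attempted, rollbacks_attempted,
                sort_overflows):
    """Return (sorts per transaction, percent overflowed, alert flag)."""
    transactions = commits_attempted + rollbacks_attempted
    spt = total_sorts / transactions if transactions else 0.0
    pos = (sort_overflows * 100) / total_sorts if total_sorts else 0.0
    return spt, pos, (spt >= 3 or pos > 3)

# Hypothetical snapshot values: 900 sorts over 300 transactions.
spt, pos, alert = sort_health(900, 250, 50, 18)
```

An alert here points at the SQL, not just the configuration: capture dynamic SQL snapshots and look for missing indexes before growing SORTHEAP.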

See Appendix B for a sample script to parse the output of a dynamic SQL snapshot and output the information in ASCII-delimited format so that it can be read into a spreadsheet to be examined.

Automatic

No

Online

Yes

Locking (LOCKLIST, MAXLOCKS, LOCKTIMEOUT, DLCHKTIME)

Background

The lock list is an area of memory used to store all locks that are currently active within a database. The maximum storage for lock list database configuration parameter (LOCKLIST) indicates the amount of storage that is allocated to the lock list for each database. Each database has its own lock list that contains information about the locks held by all applications concurrently connected to the database. Locking is the mechanism that DB2 UDB uses to control concurrent access to data in the database by multiple applications. Within a DB2 database, locks can be obtained on both rows and tables but not on pages.

The amount of space required for a lock depends on whether DB2 is installed in 32-bit mode or 64-bit mode and on whether it is the first lock on a database object or a subsequent lock on a database object.

When DB2 is installed in 32-bit mode, each lock will require either 36 or 72 bytes of space in the database lock list, depending on whether other locks are held on the object:

  • 72 bytes are required to hold a lock on an object that has no other locks held on it.

  • 36 bytes are required to record a lock on an object that has an existing lock held on it.

When DB2 is installed in 64-bit mode, each lock will require either 56 or 112 bytes of space in the database lock list, depending on whether other locks are held on the object:

  • 112 bytes are required to hold a lock on an object that has no other locks held on it.

  • 56 bytes are required to record a lock on an object that has an existing lock held on it.

The maximum percentage of the lock list before escalation (MAXLOCKS) database configuration parameter defines the percentage of the lock list that can be held by an application before DB2 UDB will perform lock escalation for the application. Lock escalation is the process of replacing a number of row locks with a single table lock, thereby reducing the number of locks in the database's lock list and making space available for new locks to be obtained. When the number of locks held by any one application reaches this percentage of the total lock list size, DB2 will perform lock escalation as follows:

  1. Examine the lock list for the identified application to determine which object has the most row-level locks.

  2. Request an equivalent table-level lock.

  3. Release the row-level locks once the table lock is granted.

If, after replacing the row locks with a single table lock, the MAXLOCKS value is no longer exceeded for the application, lock escalation will stop. If not, lock escalation will continue on other database tables until the percentage of the lock list held by the application is below the MAXLOCKS configuration parameter.
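The escalation procedure described above can be sketched as a simplified simulation. This counts locks rather than lock list bytes and ignores lock modes, which is a deliberate simplification for illustration:

```python
def escalate(row_locks_by_table, allowed_locks):
    """Simulate MAXLOCKS-driven escalation for one application.

    row_locks_by_table: mapping of table name -> row-lock count.
    allowed_locks: the application's share of the lock list (simplified
    here to a lock count rather than a byte limit).
    Returns (remaining row locks, tables escalated to table locks).
    """
    locks = dict(row_locks_by_table)
    table_locks = []
    # Keep escalating until the application is back under its limit.
    while locks and sum(locks.values()) + len(table_locks) > allowed_locks:
        # 1. Find the table with the most row-level locks.
        hottest = max(locks, key=locks.get)
        # 2-3. Replace its row locks with a single table lock.
        del locks[hottest]
        table_locks.append(hottest)
    return locks, table_locks

# 65 row locks against a limit of 20: only T1 needs to be escalated.
remaining, escalated = escalate({"T1": 50, "T2": 10, "T3": 5}, 20)
```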

NOTE

The MAXLOCKS configuration parameter multiplied by the MAXAPPLS configuration parameter cannot be less than 100.


Lock escalation will also occur if the lock list becomes full for any reason. When the lock list is full, DB2 will perform lock escalation as follows:

  1. Examine the lock list to determine which application has the most locks.

  2. Examine the lock list for the identified applications to determine which object has the most row-level locks.

  3. Request an equivalent table-level lock.

  4. Release the row-level locks once the table lock is granted.

If, after replacing the row locks with a single table lock, the lock list is still full (because other applications can still obtain locks while the escalation is occurring), lock escalation will continue following the same procedure as above. If the lock escalation causes space to become available in the lock list, lock escalation will stop.

Although the escalation process itself does not take much time, locking entire tables (versus locking individual rows) can cause a decrease in concurrency, and overall database performance may be impacted for subsequent accesses against the affected tables. Lock escalation can also cause deadlocks as the row locks are being converted to the table lock.

Deadlocks

When multiple applications are working with data in the same database, there are opportunities for a deadlock to occur between two or more applications, especially if lock escalations are occurring. A deadlock is created when one application is waiting for a lock that is held by a second application, while that application is waiting for a lock held by the first. Each waiting application holds a lock that another application needs, and this mutual waiting leads to a deadlock condition. In this case, the applications could potentially wait forever, because neither will release its locks until its transaction completes.

Because applications do not voluntarily release locks on data that they need, DB2 UDB uses a deadlock detector process to detect and break deadlocks so that application processing can continue. The deadlock detector is started at the interval defined by the deadlock check time (DLCHKTIME) configuration parameter; it reads the database lock list and examines the information about the DB2 agents that are waiting on locks. If a deadlock condition is detected, the deadlock detector arbitrarily selects one of the applications involved in the deadlock as the victim. The victim application will be rolled back by DB2, and its locks will be released so that the other applications can continue. The victim application will receive an SQL0911N error (SQLCODE -911) with a reason code of 2.

In Version 8, DB2 has enhanced the diagnostics available when a deadlock occurs, and the deadlock can now be debugged without the need to reproduce the condition with event monitors or snapshot monitoring turned on. When the diagnostic level is set to 4, a record will be written to the diagnostic log, indicating which application caused the deadlock condition, as well as the SQL statement it was executing. An example of the diagnostic record is below:

  Request for lock "REC: (2, 13) RID 0000000B" in mode "..U" failed due to deadlock
  Application caused the lock wait is "*LOCAL.DB2.00F888145716"
  Statement: 7570 6461 7465 2074 3120 7365 7420 6331    update t1 set c1
             3d32 3120 7768 6572 6520 6332 3d39         =21 where c2=9

Based on the above diagnostic entry, we can determine that:

  • DB2 was attempting to acquire a record lock (REC) to execute this statement.

  • The lock mode requested was Update (U).

  • The application that caused the error had an application ID of "*LOCAL.DB2.00F888145716".

  • The statement that caused the error was "update t1 set c1=21 where c2=9".

The LOCKTIMEOUT configuration parameter can be used to prevent applications from waiting indefinitely for locks. By default, the lock timeout is set to -1, which tells DB2 to wait indefinitely for locks. To time out the locks after a set period of time, the lock timeout parameter can be set to a specific value.

Configuring

The size of the lock list can be estimated as follows:

The minimum lock list size would occur if all (or almost all) of the locks exist on objects that already have locks. The maximum lock list size would occur if all locks are unique and exist on objects without any other locks. Therefore, the lower bound for the lock list would be calculated as follows:

  MinLockList = (ALA * LS1 * MAXAPPLS) / 4096

  Where:
  ALA = Average # of locks per application
  LS1 = Lock size for a subsequent lock on an object (either 36 or 56 bytes,
        depending on whether DB2 is running in 32-bit or 64-bit mode)

The upper bound for the lock list would be calculated as follows:

  MaxLockList = (ALA * LS2 * MAXAPPLS) / 4096

  Where:
  ALA = Average # of locks per application
  LS2 = Lock size for the first lock on an object (either 72 or 112 bytes,
        depending on whether DB2 is running in 32-bit or 64-bit mode)

Then attempt to estimate the percentage of unique locks versus subsequent locks on the same object and use this percentage to choose a value between these extremes as the initial size of the lock list.
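Under those formulas, the bounds and an interpolated starting size can be computed as follows (a sketch; LOCKLIST is expressed in 4 KB pages, and the sample workload numbers are hypothetical):

```python
PAGE_SIZE = 4096  # LOCKLIST is allocated in 4 KB pages

def lock_list_bounds(avg_locks_per_app, maxappls, bits=32):
    """Return (MinLockList, MaxLockList) in pages for 32- or 64-bit DB2."""
    ls1, ls2 = (36, 72) if bits == 32 else (56, 112)
    min_pages = (avg_locks_per_app * ls1 * maxappls) / PAGE_SIZE
    max_pages = (avg_locks_per_app * ls2 * maxappls) / PAGE_SIZE
    return min_pages, max_pages

def locklist_estimate(avg_locks_per_app, maxappls, unique_fraction, bits=32):
    """Interpolate between the bounds by the estimated fraction of
    first (unique) locks versus subsequent locks on the same objects."""
    low, high = lock_list_bounds(avg_locks_per_app, maxappls, bits)
    return low + (high - low) * unique_fraction

# Hypothetical workload: 100 locks per application, MAXAPPLS = 40, 32-bit.
low, high = lock_list_bounds(100, 40)    # about 35.2 to 70.3 pages
size = locklist_estimate(100, 40, 0.5)   # about 52.7 pages
```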

The default value for MAXLOCKS is 10%. This is normally too small and can cause unnecessary lock escalation. A value of 25 to 30% normally strikes a better balance between overall concurrency and minimizing lock escalations.

To configure the size of the lock list to be 500 pages and increase the MAXLOCKS to 30%, use the following commands:

  update db cfg for <database_name> using LOCKLIST 500   update db cfg for <database_name> using MAXLOCKS 30  

The default deadlock check time is 10 seconds (10,000 milliseconds), and the default lock timeout is indefinite (-1). To increase the deadlock check time to 30 seconds and enable locks to time out after waiting for 5 seconds, use the following commands:

  update db cfg for <database_name> using DLCHKTIME 30000   update db cfg for <database_name> using LOCKTIMEOUT 5  
Monitoring

Lock escalation will cause more table locks and fewer row-level locks, thus reducing concurrency within the database. In addition to reduced concurrency, lock escalation can also cause deadlocks to occur, which will result in transactions being rolled back. Due to the importance of locking on overall database performance, there is a database monitor specifically for locking. It is important to note that, because the lock snapshots capture timestamp information within the monitor elements, they are the most expensive (i.e., they have the largest overhead) of the database monitors.

As with other monitor information, the data can be captured using a database monitor snapshot or an SQL statement. However, it is important to note that the SQL snapshot acquires a lock while capturing the lock snapshot information, so in reality, the number of "real" locks on the system is one less than the number reported by the SQL statement. The following command will capture a database snapshot and return only the locking-related information for the database:

  get snapshot for database on <database_name> | grep -i 'Lock'

The following SQL statement will provide the same information as the above command:

  SELECT LOCKS_HELD, LOCK_WAITS, LOCK_WAIT_TIME,
  LOCK_LIST_IN_USE, DEADLOCKS, LOCK_ESCALS, X_LOCK_ESCALS,
  LOCKS_WAITING, LOCK_TIMEOUTS, INT_DEADLOCK_ROLLBACKS
  FROM TABLE(SNAPSHOT_DATABASE('SAMPLE',-1)) as SNAPSHOT_DATABASE;

If the above command and SQL statement were executed at the same time, the output of the command would look like the following:

  Locks held currently                      = 12
  Lock waits                                = 0
  Time database waited on locks (ms)        = 0
  Lock list memory in use (Bytes)           = 2080
  Deadlocks detected                        = 1
  Lock escalations                          = 0
  Exclusive lock escalations                = 0
  Agents currently waiting on locks         = 0
  Lock Timeouts                             = 0
  Internal rollbacks due to deadlock        = 1

The output of the SQL statement would look like the following:

  LOCKS_HELD  LOCK_WAITS  LOCK_WAIT_TIME  LOCK_LIST_IN_USE  DEADLOCKS
  ----------  ----------  --------------  ----------------  ---------
          13           0               0              2116          1

  LOCK_ESCALS  X_LOCK_ESCALS  LOCKS_WAITING  LOCK_TIMEOUTS  INT_DEADLOCK_ROLLBACKS
  -----------  -------------  -------------  -------------  ----------------------
            0              0              0              0                       1

  1 record(s) selected.

Notice that there is an extra lock reported in the output of the SQL statement, as well as 36 extra bytes of lock list usage. This information was captured on a 32-bit DB2 instance, where a lock uses 36 bytes of the lock list if there are already other locks on the same object.

Locks Held

In general, the number of locks held should be as low as possible because applications holding locks for long periods of time will hurt application concurrency. For a database with a large number of applications, there will normally be some locks held at any given time, especially for an online system with inserts, updates, or deletions.

When examining the number of locks held, it is important also to examine the number of currently connected applications. For a system with 100 connected users, 100 locks is not excessive. However, if there are only five applications connected to the database, each application is holding 20 locks on average, which can be excessive.

The average number of locks per application (ALA) is calculated using:

  ALA = Locks held currently / Applications connected currently  

or

  SELECT (real(LOCKS_HELD) / real(APPLS_CUR_CONS))
  as AverageLocksPerApp
  FROM TABLE(SNAPSHOT_DATABASE('SAMPLE',-1)) as SNAPSHOT_DATABASE;
Lock wait time

The time that the database has waited on locks should be as low as possible because when DB2 is waiting for a lock, the applications are also waiting. A single lock wait may not seem too bad if we look only at the number of lock waits. However, if the application waited 5 minutes for the lock, it is very important to examine the reason for this and attempt to eliminate the source of the lock wait.

Lock list memory in use

If the amount of lock list in use is approaching the size of the lock list, this is an indication that lock escalation may be imminent. For normal operations, the lock list should not be more than 60% used. This leaves the remaining 40% of the lock list for exception processing. The percentage of the lock list used (PLU) is calculated using the following formula:

  PLU = (Lock list memory in use / LOCKLIST) * 100%  
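As a worked sketch (not DB2 syntax), the PLU calculation can be expressed in Python. The snapshot values below are hypothetical, and the conversion assumes LOCKLIST is configured in 4-KB pages while the snapshot reports lock list memory in use in bytes:

```python
def pct_lock_list_used(lock_list_memory_in_use: int, locklist_pages: int,
                       page_size: int = 4096) -> float:
    """Percentage of the lock list used (PLU).

    LOCKLIST is configured in 4-KB pages, while the snapshot reports
    lock list memory in use in bytes, so convert before dividing.
    """
    return (lock_list_memory_in_use / (locklist_pages * page_size)) * 100.0

# Hypothetical snapshot values: 120 KB in use out of a 100-page (400 KB) lock list
plu = pct_lock_list_used(lock_list_memory_in_use=122880, locklist_pages=100)
print(round(plu, 1))  # 30.0 -- well under the 60% warning level
```

A value approaching 60% would suggest increasing LOCKLIST (or lowering MAXLOCKS) before escalation occurs.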
Deadlocks detected

As discussed previously, when a deadlock is encountered, one of the applications involved in the deadlock will be rolled back and will need to redo the entire transaction. All deadlocks should be investigated and eliminated, if possible.

Lock escalations/exclusive lock escalations

Lock escalations cause decreased application concurrency and in many cases, the application may encounter lock waits and/or deadlocks while performing the lock escalation. These locking issues are even more likely with exclusive lock escalations because a number of exclusive row locks are converted to an exclusive table lock, and no other applications can access any row in the entire table.

Lock timeouts

The number of lock timeouts needs to be analyzed in conjunction with the setting for the LOCKTIMEOUT database configuration parameter. If the LOCKTIMEOUT parameter is configured to a low value to cause locks to time out quickly, the number of lock timeouts will normally be higher. If the LOCKTIMEOUT parameter is configured to a relatively high value, lock timeouts should not occur frequently, and any lock timeout that is encountered should be examined.

In Version 8, DB2 has enhanced the diagnostics available when a lock timeout occurs, and the lock timeout can now be debugged without the need to reproduce the condition with event monitors or snapshot monitoring turned on. When the diagnostic level is set to 4, a record will be written to the diagnostic log, indicating which application caused the lock timeout condition, as well as the SQL statement it was executing. An example of the diagnostic record is below:

  Request for lock "TAB: (2, 13)" in mode ".IX" timed out
  Application caused the lock wait is "*LOCAL.DB2.007340152709"
  Statement: 7570 6461 7465 2074 3120 7365 7420 6331     update t1 set c1
             3d63 312b 3531 3231 30                      =c1+51210

Based on the above diagnostic entry, we can determine that:

  • DB2 was attempting to acquire a table lock (TAB) to execute this statement.

  • The lock mode requested was Intent-eXclusive (IX).

  • The application that caused the error had an application ID of "*LOCAL.DB2.007340152709".

  • The statement that caused the error was "update t1 set c1=c1+51210".

Automatic

No

Online

Yes

Number of Asynchronous Page Cleaners (NUM_IOCLEANERS)

Background

When a DB2 agent, acting on behalf of an application, needs to access table or index pages, it will first look for the pages in the database buffer pool area. If the page cannot be found in the buffer pool area, it will be read from disk into the buffer pool. If the buffer pool is full, DB2 must select a "victim" page to be overwritten in the buffer pool. If the victim page is dirty (i.e., it was changed since it was read into the buffer pool and has not been written to disk), it must first be written to disk before it can be overwritten. During the write operation, the application must wait. To reduce the likelihood that the victim page will be dirty, DB2 uses page cleaners to asynchronously write dirty pages to disk before they are chosen as victims.

NOTE

An additional benefit of the page cleaners writing the dirty pages to disk is the reduction in the amount of work required in the event that there is a problem with DB2 and a database restart/recovery is required.


The number of asynchronous page cleaners database configuration parameter (NUM_IOCLEANERS) specifies the number of asynchronous page cleaner processes that can be run for the database. If this parameter is set to zero (0), there will be no page cleaners available for the database and as a result, the database agents will perform all of the page writes from the buffer pool to disk synchronously.

If the applications for a database consist primarily of transactions that update data, an increase in the number of cleaners will help improve the performance and will also reduce the recovery time from soft failures, such as power outages, because the contents of the database on disk will be more up to date at any given time.

How the Page Cleaners Are Triggered

The page cleaners can be triggered in three different ways.

Dirty page threshold

When a page in the buffer pool is changed, it is added to the buffer pool's dirty list. At this time, DB2 checks to see whether this addition to the dirty list exceeds the changed page threshold (a.k.a. the dirty page threshold) for the buffer pool. If the changed page threshold is exceeded, the page cleaners will be triggered.

The changed page threshold database configuration parameter (CHNGPGS_THRESH) represents the percentage of the buffer pool that can be dirty before the page cleaners are triggered.

LSN gap

When transactions are occurring against the database, they will be logged. To reduce the amount of work required in the event of a problem, DB2 will trigger the page cleaners as it writes to the log file(s).

The percentage of the log file reclaimed before the soft checkpoint database configuration parameter SOFTMAX represents the percentage of a log file that is written before the page cleaners are triggered.

Dirty page steals

When an agent requests a page that must be read from disk and DB2 chooses the victim page, if the page is dirty, the page must first be written to disk before it can be used to read the new page that the agent has requested. After a number of dirty victim pages have been selected, DB2 will automatically trigger the page cleaners to write the dirty pages to disk.

How the Page Cleaners Work

When the page cleaners are triggered, all of the page cleaners are triggered at the same time. They will each gather up to 400 pages from the dirty lists for the database buffer pools. The pages from the dirty list will then be written to disk, one page at a time, until the page cleaner has processed its assigned dirty pages. Once it has written all of the pages, it will check to see whether there are more pages to be written or whether there have been any new triggers. If so, it will gather a new list of pages to process; if not, it will wait for the next page cleaner trigger.

Configuring

Because all page cleaners are started whenever a page cleaner trigger is hit, having too many page cleaners can overwhelm the run queue on the server and cause a significant performance impact on the system. Therefore, as a rule of thumb, set the number of page cleaners equal to the number of CPUs in the database server.

For example, for a server with 16 CPUs, the following command will set the number of asynchronous page cleaners to 16:

  update db cfg for <database_name> using NUM_IOCLEANERS 16  
Monitoring

When examining the effectiveness of the asynchronous page cleaners, it is important to examine the ratio of asynchronous data and index page writes. The percentage of asynchronous data (PADW) and index page writes (PAIX) can be calculated using the following formulas:

  PADW = (Asynchronous pool data page writes / Buffer pool data writes) * 100%
  PAIX = (Asynchronous pool index page writes / Buffer pool index writes) * 100%

The number of page cleaners could potentially be reduced if:

  • PADW is close to 100%

  • PAIX is close to 100%
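As a quick sketch, the two ratios can be computed in Python; the counter values below are sample figures, not from any particular snapshot:

```python
def async_write_pct(async_writes: int, total_writes: int) -> float:
    """Percentage of buffer pool page writes performed by the page cleaners."""
    return (async_writes / total_writes) * 100.0

# Sample (hypothetical) snapshot counters
padw = async_write_pct(async_writes=276, total_writes=288)  # data page writes
paix = async_write_pct(async_writes=239, total_writes=255)  # index page writes
print(round(padw, 1), round(paix, 1))  # 95.8 93.7
```

Ratios this close to 100% indicate the agents are almost never forced to write pages synchronously, so the cleaner count could potentially be reduced.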

It is also important to understand which of the three I/O cleaner triggers is causing the page cleaners to be activated and to write the dirty pages from the buffer pools to disk. This information is available in the database snapshot information or through an SQL table function. To take a database snapshot and extract the entries that describe the page cleaner triggers in the database snapshot, use the following command:

  get snapshot for database on <database_name> | grep -i 'cleaner triggers'

The SQL table function that will return the page cleaner triggers would look like the following:

  SELECT DB_NAME,
     POOL_LSN_GAP_CLNS,
     POOL_DRTY_PG_STEAL_CLNS,
     POOL_DRTY_PG_THRSH_CLNS
  FROM TABLE(SNAPSHOT_DATABASE('SAMPLE',-1))
     as SNAPSHOT_DATABASE

The output of the get snapshot command would look like the following:

  LSN Gap cleaner triggers                        = 142
  Dirty page steal cleaner triggers               = 2
  Dirty page threshold cleaner triggers           = 396

The output of the SQL function would look like the following:

  DB_NAME  POOL_LSN_GAP_CLNS   POOL_DRTY_PG_STEAL_CLNS   POOL_DRTY_PG_THRSH_CLNS
  -------  -----------------   -----------------------   -----------------------
  SAMPLE                 142                         2                       396

  1 record(s) selected.

In this case, the page cleaners were triggered by the "good" triggers (changed page threshold and/or LSN gap) well over 99% of the time. As was explained earlier, a dirty page steal trigger occurs only after a number of pages have been synchronously written to disk and their associated clients forced to wait. If the number of "bad" page cleaner triggers (i.e., dirty page steal triggers) is more than a couple of percentage points of the total number of triggers, the values set for CHNGPGS_THRESH and SOFTMAX, as well as the number of page cleaners, should be examined.

The percentage of bad page cleaner triggers (PBPCT) is calculated as follows:

  PBPCT = ((Dirty page steal cleaner triggers) / (Dirty page steal cleaner triggers +
          Dirty page threshold cleaner triggers + LSN gap cleaner triggers)) * 100%

Based on the snapshot information above, the PBPCT equals:

  PBPCT = ((2) / (142 + 396 + 2)) * 100%
  PBPCT = 0.37%

This ratio is very good and indicates that the system is primarily writing dirty pages to disk using the asynchronous page cleaners, and applications are not waiting for synchronous page writes. However, based on the following snapshot information for the page cleaner triggers, the PBPCT is much higher.

  DB_NAME  POOL_LSN_GAP_CLNS   POOL_DRTY_PG_STEAL_CLNS   POOL_DRTY_PG_THRSH_CLNS
  -------  -----------------   -----------------------   -----------------------
  SAMPLE                  17                      2034                      1192

  1 record(s) selected.

The PBPCT equals:

  PBPCT = ((2034) / (17 + 1192 + 2034)) * 100%
  PBPCT = 62.7%
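The PBPCT arithmetic for both snapshots can be sketched in Python:

```python
def pct_bad_triggers(lsn_gap: int, steal: int, threshold: int) -> float:
    """Percentage of page cleaner triggers that were 'bad' dirty page steals."""
    return (steal / (lsn_gap + steal + threshold)) * 100.0

# First snapshot: almost all triggers were "good"
print(round(pct_bad_triggers(lsn_gap=142, steal=2, threshold=396), 2))    # 0.37
# Second snapshot: most triggers were dirty page steals
print(round(pct_bad_triggers(lsn_gap=17, steal=2034, threshold=1192), 1)) # 62.7
```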

In this case, the asynchronous page cleaners are rarely being triggered by the pool LSN gap trigger. This indicates that the database configuration parameter SOFTMAX may be set too high. To determine the value of the SOFTMAX configuration variable, use the command:

  get db cfg for sample | grep -i softmax

This returns the following:

  Percent log file reclaimed before soft chckpt (SOFTMAX) = 100  

In this case, the page cleaners are being triggered each time a log file is filled. Because this value is not abnormally high, next examine the log file size by using the command:

  get db cfg for sample | grep -i logfilsiz

This returns the following:

  Log file size (4KB)                   (LOGFILSIZ) = 250000  

The log file size for this database is 250,000 4-KB pages, or 1 GB. Therefore, the page cleaners are being triggered only after 1 GB of log information has been written. If the log file size cannot be reduced, the SOFTMAX configuration parameter can be reduced to cause the page cleaners to be triggered more frequently. To update the SOFTMAX configuration parameter to cause the page cleaners to trigger after 10% of a log has been written, use the following command:

  update db cfg for sample using softmax 10  

If the log files do not need to be this large and can be reduced, the log file size can be changed to 250 4K pages or 1 MB using the following command:

  update db cfg for sample using logfilsiz 250  
Asynchronous Pages per Write

When the page cleaners are triggered, it is important that they be writing to disk as efficiently as possible. Having the page cleaners triggered too infrequently and writing a large number of pages to disk will cause the system to slow down. Likewise, having the page cleaners triggered frequently but writing a small number of pages to disk is also inefficient.

The number of pages written per page cleaner trigger is not captured in any of the DB2 snapshots. However, the average number of pages written per asynchronous write request can be calculated using the database and buffer pool snapshot information. The average pages per asynchronous write (APPAW) can be calculated using the formula:

  APPAW = ((Asynchronous pool data page writes + Asynchronous pool index page writes) /
          (Dirty page steal cleaner triggers + Dirty page threshold cleaner triggers +
          LSN gap cleaner triggers))

Based on the following information from the database and buffer pool snapshots:

  LSN Gap cleaner triggers                            = 142
  Dirty page steal cleaner triggers                   = 2
  Dirty page threshold cleaner triggers               = 396
  Asynchronous pool data page writes                  = 167660
  Asynchronous pool index page writes                 = 178944

the APPAW would be:

  APPAW = (167660 + 178944) / (142 + 2 + 396)
  APPAW = 641.9

In this case, the page cleaners wrote an average of 641.9 pages, or 2.5 MB, each time they were triggered. This value needs to be examined in the context of the size of the buffer pool that is being examined. For a 1-GB buffer pool, this is a small value, and perhaps the page cleaners are being triggered too aggressively. For a 100-MB buffer pool, this value is much more reasonable.
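The APPAW arithmetic above, including the conversion from 4-KB pages to megabytes, can be sketched as:

```python
def avg_pages_per_async_write(async_data_writes: int, async_index_writes: int,
                              lsn_gap: int, steal: int, threshold: int) -> float:
    """Average number of pages written per page cleaner trigger (APPAW)."""
    return (async_data_writes + async_index_writes) / (lsn_gap + steal + threshold)

appaw = avg_pages_per_async_write(167660, 178944,
                                  lsn_gap=142, steal=2, threshold=396)
print(round(appaw, 1))                         # 641.9 pages per trigger
print(round(appaw * 4096 / (1024 * 1024), 1))  # 2.5 (MB per trigger, 4-KB pages)
```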

Automatic

No

Online

No

Number of I/O Servers (NUM_IOSERVERS)

Background

When a DB2 agent acting on behalf of an application needs to access table or index pages, it will first look for the pages in the database buffer pool area. If the page cannot be found in the buffer pool area, it will be read from disk into the buffer pool. I/O is very expensive and in this case, the agent cannot do anything but wait for the read request to finish before it can access the page. These page reads are typically done one page at a time and if the application also needs to access subsequent pages within the table or index, this is not an efficient method for reading the pages into the buffer pool.

In many situations, DB2 UDB can anticipate the pages that will be requested by an application and read them into the buffer pool before the agent actually attempts to access them. This is referred to as prefetching . Prefetching can improve the database performance because the pages will be found in the buffer pool when the agent accesses them, reducing or eliminating the time the application must wait for the page to be read from disk into the buffer pool. This is more relevant to DSS-type workloads that scan large indexes and tables than it is for OLTP-type workloads that involve less scanning and more random insert/update/delete activity.

Prefetching can be enabled by the DB2 optimizer when it is building the access plan for a statement and determines that it will be scanning a large portion of a table or index. It can also be enabled or triggered when DB2 is executing an access plan and detects that it has read a number of pages in sequence and will likely continue to do so. This is known as sequential detection and can be enabled or disabled using the database configuration parameter SEQDETECT.

Configuring

If the prefetch size is set to the standard multiple of the table space extent size (i.e., prefetch size = extent size x number of containers) for all of the table spaces in the database, and all of the table spaces are being scanned at the same time, the number of prefetchers should be equal to the number of disks belonging to the database. However, if one or more of the table spaces has been set up with more aggressive prefetching (i.e., the prefetch size is a larger multiple of the extent size) and/or some of the table spaces are not being scanned at the same time as the others, the calculation becomes more complicated. To determine the number of prefetchers required in this case:

  • Determine the table spaces that will potentially be scanned at the same time.

  • For each of these table spaces, determine the number of prefetchers required to service a scan of it (based on the formulas above).

  • Sum these values to determine the total number of prefetchers required.

The number of prefetchers for the database can be set using the following command:

  update db cfg for <database_name> using NUM_IOSERVERS 64  
Monitoring

An important aspect of the prefetching performance that can be analyzed using the snapshot information is the amount of synchronous versus asynchronous I/O. The percentage of asynchronous read requests (or asynchronous read ratio, ARR) is calculated using the following formula:

  ARR = ((Asynchronous data reads + Asynchronous index reads) /
        (Data logical reads + Index logical reads)) * 100%

The ARR can also be calculated by using the SQL table function as follows:

  select BP_NAME,
     (INT(((FLOAT(pool_Async_data_Reads + pool_async_index_Reads)) /
           (FLOAT(Pool_Index_L_Reads + Pool_data_L_Reads))) * 100))
     AS Asynch_Read_Ratio
  FROM TABLE(SNAPSHOT_BP('SAMPLE',-1)) as SNAPSHOT_BP;

For the following buffer pool snapshot:

  Bufferpool Snapshot

  Bufferpool name                           = IBMDEFAULTBP
  Database name                             = SAMPLE
  Database path                             = /v1/db2/NODE0000/SQL00001/
  Input database alias                      = SAMPLE
  Buffer pool data logical reads            = 523956
  Buffer pool data physical reads           = 33542
  Buffer pool data writes                   = 288
  Buffer pool index logical reads           = 257949
  Buffer pool index physical reads          = 11323
  Total buffer pool read time (ms)          = 12012
  Total buffer pool write time (ms)         = 720
  Asynchronous pool data page reads         = 5227
  Asynchronous pool data page writes        = 276
  Buffer pool index writes                  = 255
  Asynchronous pool index page reads        = 451
  Asynchronous pool index page writes       = 239
  Total elapsed asynchronous read time      = 819
  Total elapsed asynchronous write time     = 663
  Asynchronous read requests                = 3553
  Direct reads                              = 69664
  Direct writes                             = 16902
  Direct read requests                      = 2780
  Direct write requests                     = 411
  Direct reads elapsed time (ms)            = 4830
  Direct write elapsed time (ms)            = 979
  Database files closed                     = 17
  Data pages copied to extended storage     = 0
  Index pages copied to extended storage    = 0
  Data pages copied from extended storage   = 0
  Index pages copied from extended storage  = 0
  Unread prefetch pages                     = 0
  Vectored IOs                              = 0
  Pages from vectored IOs                   = 0
  Block IOs                                 = 0
  Pages from block IOs                      = 0
  Physical page maps                        = 0

the ARR would be:

  ARR = ((5227 + 451) / (523956 + 257949)) * 100%
  ARR = 0.73%

This is a very small value and would indicate that there is very little prefetch activity occurring for this database. This could be due to a number of reasons, such as:

  1. The workload is reading and writing single rows, so it cannot take advantage of prefetching.

  2. There are too few prefetchers configured for the database.

  3. The table spaces in the database are set up with only one container each, so that prefetching cannot take place.

For a system with multiple buffer pools, it is normally a good idea to separate tables with a high percentage of asynchronous reads from those with a low percentage of asynchronous reads. The ARR can also be examined for each table space to help separate the table spaces with high and low ARRs. For the following table space snapshot information, we see that there are four table spaces with different access patterns:

  Tablespace name                             = TBSPC1
   Buffer pool data logical reads             = 1200
   Asynchronous pool data page reads          = 32
   Buffer pool index logical reads            = 3400
   Asynchronous pool index page reads         = 128

  Tablespace name                             = TBSPC2
   Buffer pool data logical reads             = 15000
   Asynchronous pool data page reads          = 14000
   Buffer pool index logical reads            = 90000
   Asynchronous pool index page reads         = 86000

  Tablespace name                             = TBSPC3
   Buffer pool data logical reads             = 9000
   Asynchronous pool data page reads          = 8600
   Buffer pool index logical reads            = 6250
   Asynchronous pool index page reads         = 5975

  Tablespace name                             = TBSPC4
   Buffer pool data logical reads             = 7200
   Asynchronous pool data page reads          = 1400
   Buffer pool index logical reads            = 800
   Asynchronous pool index page reads         = 770

In this case, the ARRs would be:

  TBSPC1               3.5%
  TBSPC2               95.2%
  TBSPC3               95.6%
  TBSPC4               27.1%

Because the table spaces TBSPC1 and TBSPC4 both have low ARRs, they should not be placed in the same buffer pool as table space TBSPC2 or TBSPC3. Because the table spaces TBSPC2 and TBSPC3 both have a high ARR, they could be placed in the same buffer pool. However, DB2 places a limit on the number of pages that can be prefetched into a buffer pool before they are accessed by a db2 agent, so having two table spaces with a high ARR in the same buffer pool may have an adverse effect. It may be more optimal to place the table spaces TBSPC2 and TBSPC3 in their own buffer pools.
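A sketch of the per-table-space ARR calculation, run here in Python rather than via the snapshot monitor, using the snapshot values above:

```python
def async_read_ratio(async_data: int, async_index: int,
                     data_logical: int, index_logical: int) -> float:
    """Percentage of logical reads whose pages were brought in by prefetchers."""
    return ((async_data + async_index) / (data_logical + index_logical)) * 100.0

# (async data reads, async index reads, data logical reads, index logical reads)
tablespaces = {
    "TBSPC1": (32, 128, 1200, 3400),
    "TBSPC2": (14000, 86000, 15000, 90000),
    "TBSPC3": (8600, 5975, 9000, 6250),
    "TBSPC4": (1400, 770, 7200, 800),
}
for name, counters in tablespaces.items():
    print(f"{name}  {async_read_ratio(*counters):.1f}%")
```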

Physical Read Rate

It is also important to examine the rate at which DB2 is reading pages from disk. This should be calculated for all table spaces and when compared, will show whether the I/O is spread evenly across all table spaces or whether the workload on certain table spaces is causing more I/O than in other table spaces.

The rate at which pages are read from disk (or page read rate, PRR) is calculated using the following formula:

  PRR = (Data physical reads + Index physical reads) /
        (Time since monitor switches reset or activated)

Based on the above buffer pool snapshot, the PRR would be:

  PRR = (33542 + 11323) / (23.53 seconds)
  PRR = 1906.7 reads per second
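The same calculation in Python, using the physical read counters from the buffer pool snapshot above:

```python
def page_read_rate(data_physical: int, index_physical: int,
                   elapsed_seconds: float) -> float:
    """Physical page reads per second since the monitor switches were reset."""
    return (data_physical + index_physical) / elapsed_seconds

print(round(page_read_rate(33542, 11323, elapsed_seconds=23.53), 1))  # 1906.7
```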

Examining the table space snapshot for the table spaces using the identified buffer pool may provide additional information to help determine which table space(s) are being read most often. Any table space(s) with a significantly higher I/O rate than the other table spaces can be examined to determine whether the performance could be improved by assigning the table space to its own buffer pool or by adding containers to the table space to improve the I/O bandwidth.

Read Time

For every millisecond that a db2 agent spends waiting for a page to be read into the buffer pool, the application is also waiting. The database snapshots do not provide information on the amount of time taken by each read request. However, they do provide enough information to calculate the average time taken per read request. The average read time (ART) is calculated using the following formula:

  ART = (Total buffer pool read time) / (Data physical reads + Index physical reads)  

The ART can also be calculated using the SQL table function, as follows:

  select BP_NAME,
     (INT(((FLOAT(pool_read_time)) /
           (FLOAT(Pool_Index_p_Reads + Pool_data_p_Reads))) * 100))
     AS Avg_Read_Time_in_ms
  FROM TABLE(SNAPSHOT_BP('SAMPLE',-1))
     as SNAPSHOT_BP;

The SQL table function can also be used to calculate the buffer pool, data, and index hit ratios, as well as the ARR and ART in one SQL statement. This can be done using the following statement:

  select BP_NAME,
     (INT((1 - ((FLOAT(pool_Index_P_Reads + pool_data_P_Reads)) /
                (FLOAT(Pool_Index_L_Reads + Pool_data_L_Reads)))) * 100))
        AS BPool_Hit_Ratio,
     (INT((1 - ((FLOAT(pool_Data_P_Reads)) /
                (FLOAT(Pool_Data_L_Reads)))) * 100))
        AS Data_Hit_Ratio,
     (INT((1 - ((FLOAT(pool_Index_P_Reads)) /
                (FLOAT(Pool_Index_L_Reads)))) * 100))
        AS Index_Hit_Ratio,
     (INT(((FLOAT(pool_Async_data_Reads + pool_async_index_Reads)) /
           (FLOAT(Pool_Index_L_Reads + Pool_data_L_Reads))) * 100))
        AS Asynch_Read_Ratio,
     (INT(((FLOAT(pool_read_time)) /
           (FLOAT(Pool_Index_p_Reads + Pool_data_p_Reads))) * 100))
        AS Avg_Read_Time_in_ms
  FROM TABLE(SNAPSHOT_BP('SAMPLE',-1)) as SNAPSHOT_BP;

The output of this statement looks like the following:

  BP_NAME      BPOOL_HIT_RATIO DATA_HIT_RATIO INDEX_HIT_RATIO ASYNCH_READ_RATIO AVG_READ_TIME_IN_MS
  ------------ --------------- -------------- --------------- ----------------- -------------------
  IBMDEFAULTBP              69             78              63                 0                 362

  1 record(s) selected.
Automatic

No

Online

No

Number of Commits to Group (MINCOMMIT)

Background

The number of commits to group database configuration parameter (MINCOMMIT) allows the writing of the database log buffer to disk to be delayed until a minimum number of commits has been performed. This delay can help reduce the database manager overhead associated with writing log buffers to disk. It can result in improved performance when there are multiple applications running against a database and many commits are requested by the applications within a very short time frame.

NOTE

The grouping of commits will occur only when the value of this parameter is greater than one and when the number of applications connected to the database is greater than or equal to the value of this parameter.


When commit grouping is being performed, application commit requests are held until either the number of commit requests equals the value of this parameter or one second has elapsed.

Configuring

The MINCOMMIT value can have a significant impact on performance, both positive and negative, and must be set correctly. The default value for MINCOMMIT is 1, and for online applications with many short transactions, this should not be changed unless a large number of applications execute against the database at the same time; otherwise, the response time of very short transactions may increase dramatically.

When there is a large number of concurrent applications with short transactions and the log buffer is being flushed too frequently by application commits, the MINCOMMIT parameter can be increased. The number of transactions per second can be determined using the database snapshots, and the MINCOMMIT parameter adjusted accordingly. The number of commits to group can be increased by using the following command:

  update db cfg for <database_name> using MINCOMMIT 5  

If the MINCOMMIT database configuration parameter is increased, the log buffer (LOGBUFSZ) may also need to be increased to avoid having a full log buffer force a flush to disk during these heavy load periods. In this case, the size of the log buffer can be calculated by using the following formula:

  LOGBUFSZ = MINCOMMIT * (log space used, on average, by a transaction)  

In a decision support/data warehousing system, leave MINCOMMIT set to 1. Setting MINCOMMIT greater than 1 can greatly increase processing time if there are not enough concurrent transactions per second to warrant it.

Monitoring

The database monitor can be used to determine the average number of transactions that have been performed over a period of time, as follows:

  get snapshot for database on <database_name>  

The important elements of the snapshot that should be analyzed are below:

  Last reset timestamp                      = 09-12-2002 14:51:43.786876
  Snapshot timestamp                        = 09-12-2002 14:56:27.787088
  Commit statements attempted               = 1011
  Rollback statements attempted             = 10
  Log space used by the database (Bytes)    = 3990
  Log pages written                         = 23

The number of transactions per second can be calculated by using the following formula:

  TPS = ((Commit statements attempted + Rollback statements attempted) /
        (Snapshot timestamp - Last reset timestamp))

An estimation of the number of log buffer flushes can be made using the database snapshot information. If the log buffer is being flushed every second, the number of log flushes would equal:

  NumLF = (Snapshot timestamp - Last reset timestamp) in seconds

If the log buffer is being flushed only once it is filled, the number of log flushes would equal:

  NumLF = Log pages written / LOGBUFSZ  

The actual number of log buffer flushes will have been between these two calculated values. The log space used per transaction can then be calculated using the following formula:

  LSPT = (Log space used by the database) /
         (Commit statements attempted + Rollback statements attempted)
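Putting the snapshot values above through these formulas in Python (the elapsed time is derived from the two timestamps):

```python
from datetime import datetime

# Values from the database snapshot above
last_reset = datetime(2002, 9, 12, 14, 51, 43, 786876)
snapshot = datetime(2002, 9, 12, 14, 56, 27, 787088)
commits, rollbacks = 1011, 10
log_space_used = 3990  # bytes

elapsed_s = (snapshot - last_reset).total_seconds()
tps = (commits + rollbacks) / elapsed_s        # transactions per second
lspt = log_space_used / (commits + rollbacks)  # log space used per transaction

print(round(elapsed_s), round(tps, 1), round(lspt, 1))  # 284 3.6 3.9
```

At roughly 3.6 transactions per second, this workload would not benefit from raising MINCOMMIT above 1.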
Automatic

No

Online

Yes

Catalog Cache Size (CATALOGCACHE_SZ)

Background

The catalog cache is allocated out of the database shared memory and is used to cache system catalog information. In DB2 UDB Version 7, the catalog cache existed only on the catalog partition, so that an application needing information from the database system catalog always had to make a call to the catalog partition to retrieve the information. In a partitioned database in DB2 UDB Version 8, there is one catalog cache for each database partition. Caching catalog information at individual partitions allows the database manager to reduce its internal overhead by eliminating the need to access the system catalogs (and/or the catalog node in a partitioned database environment) to obtain information that has previously been retrieved. The catalog cache is used to store:

  • SYSTABLES information (including packed descriptors)

  • Authorization information, including SYSDBAUTH information and execute privileges for routines

  • SYSROUTINES information

The use of the catalog cache can help improve the overall performance of:

  • Binding packages and compiling SQL statements

  • Operations that involve checking database-level privileges

  • Operations that involve checking execute privileges for routines

  • Applications that are connected to noncatalog partitions in a partitioned database environment

Configuring

If the size of the catalog cache is set to the default value of -1 in a partitioned database, the value used to calculate the size of the catalog cache will be four times the value specified for the MAXAPPLS configuration parameter, with a minimum value of eight.

For a database with a large number of objects and/or users, the default size of the catalog cache may not be sufficient, especially in a partitioned database. The size of the catalog cache can be configured using the following command:

  update db cfg for <database_name> using CATALOGCACHE_SZ 32  
Monitoring

When preparing execution strategies for SQL statements, DB2 first checks the catalog cache to learn about the definition of the database, table spaces, tables, indexes, and views. If all the required information is available in the cache, DB2 can avoid disk I/Os and access to the catalog partition and, therefore, shorten plan preparation times. Having a high catalog cache hit ratio (95% or better) is key for online database applications.

Keep increasing the CATALOGCACHE_SZ until you reach 95%. Issue the command db2 get snapshot for database on DBNAME and compute the hit ratio using the following formula:

  (1 - (Catalog cache inserts / Catalog cache lookups)) * 100

You should also increase the CATALOGCACHE_SZ if the value of catalog cache overflows is greater than zero. In addition, if the value of catalog cache heap full is greater than zero, both DBHEAP and CATALOGCACHE_SZ should be proportionally increased.
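As a sketch, the hit ratio formula and the 95% rule of thumb can be checked in Python; the snapshot counters here are hypothetical:

```python
def catalog_cache_hit_ratio(inserts: int, lookups: int) -> float:
    """Catalog cache hit ratio: percentage of lookups satisfied from the cache."""
    return (1 - (inserts / lookups)) * 100.0

# Hypothetical snapshot counters
ratio = catalog_cache_hit_ratio(inserts=45, lookups=1500)
print(round(ratio, 1), ratio >= 95.0)  # 97.0 True
```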

Automatic

No

Online

Yes

Changed Pages Threshold (CHNGPGS_THRESH)

Background

As discussed previously, the changed pages threshold is one of the triggers of the asynchronous page cleaners within a database. The changed pages threshold specifies the level (percentage) of changed pages within the database buffer pool(s) at which the asynchronous page cleaners will be started, if they are not currently active. When the page cleaners are triggered, all of the page cleaners are triggered at the same time. They will each gather up to 400 pages from the dirty lists for the database buffer pools. The pages from the dirty list will then be written to disk one page at a time until the page cleaner has processed its assigned dirty pages. Once it has written all of the pages, it will check to see whether there are more pages to be written or whether there have been any new triggers. If so, it will gather a new list of pages to process; if not, it will wait for the next page cleaner trigger.

Configuring

The default setting for the changed pages threshold is too high for most online databases. At the default of 60%, it is rarely the trigger for the page cleaners; when it is, the amount of data to be written to disk can be quite high, causing a significant slowdown of the system as the page cleaners are triggered and flush the dirty pages. With the changed pages threshold set to 20% or 30%, the amount of data written to disk per trigger is much smaller, the impact on the system is much lower, and it is much more likely that the changed pages threshold will be the trigger for the page cleaners.

To update the database configuration to set the changed pages threshold to 25%, use the following command:

  update db cfg for <database_name> using CHNGPGS_THRESH 25  
Monitoring

When analyzing the page cleaner activity, it is important to understand which of the three I/O cleaner triggers is causing the page cleaners to be activated and write the dirty pages from the buffer pools to disk. This information is available in the database snapshot information or through an SQL table function. To take a database snapshot and extract the entries that describe the page cleaner triggers in the database snapshot, use the following command:

  get snapshot for database on <database_name> | grep -i 'cleaner triggers'  

The SQL table function that will return the page cleaner triggers would look like the following:

  SELECT DB_NAME,
         POOL_LSN_GAP_CLNS,
         POOL_DRTY_PG_STEAL_CLNS,
         POOL_DRTY_PG_THRSH_CLNS
  FROM TABLE(SNAPSHOT_DATABASE('SAMPLE',-1)) AS SNAPSHOT_DATABASE

The output of the get snapshot command would look like the following:

  LSN Gap cleaner triggers                        = 142
  Dirty page steal cleaner triggers               = 2
  Dirty page threshold cleaner triggers           = 396

The output of the SQL function would look like the following:

  DB_NAME  POOL_LSN_GAP_CLNS  POOL_DRTY_PG_STEAL_CLNS  POOL_DRTY_PG_THRSH_CLNS
  -------  -----------------  -----------------------  -----------------------
  SAMPLE                 142                        2                      396

  1 record(s) selected.

In this case, the page cleaners were triggered by the "good" triggers (changed page threshold and/or LSN gap) well over 99% of the time. As was explained earlier, a dirty page steal trigger is done only after a number of pages have been synchronously written to disk and their associated clients forced to wait. If the number of "bad" page cleaner triggers (i.e., dirty page steal triggers) is more than a couple of percentage points of the total number of triggers, the values set for changed page threshold and softmax, as well as the number of page cleaners, should be examined.

The percentage of bad page cleaner triggers (PBPCT) is calculated as follows:

  PBPCT = ((Dirty page steal cleaner triggers) / (Dirty page steal cleaner
  triggers + Dirty page threshold cleaner triggers + LSN gap cleaner
  triggers)) * 100%

Based on the snapshot information above, the PBPCT equals:

  PBPCT = ((2) / (142 + 396 + 2)) * 100%
  PBPCT = 0.37%

This ratio is very good and indicates that the system is primarily writing dirty pages to disk using the asynchronous page cleaners, and applications are not waiting for synchronous page writes. However, based on the following snapshot information for the page cleaner triggers, the PBPCT is much higher.

  DB_NAME  POOL_LSN_GAP_CLNS  POOL_DRTY_PG_STEAL_CLNS  POOL_DRTY_PG_THRSH_CLNS
  -------  -----------------  -----------------------  -----------------------
  SAMPLE                  17                     2034                     1192

  1 record(s) selected.

The PBPCT equals:

  PBPCT = ((2034) / (17 + 1192 + 2034)) * 100%
  PBPCT = 62.7%
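The PBPCT calculation above can be captured in a small helper, applied here to the two sets of trigger counts shown in the snapshots. This is a sketch of the formula from the text; the function name is illustrative.

```python
def pbpct(lsn_gap, dirty_page_steal, dirty_page_threshold):
    """Percentage of 'bad' (dirty page steal) page cleaner triggers
    out of all page cleaner triggers."""
    total = lsn_gap + dirty_page_steal + dirty_page_threshold
    return dirty_page_steal / total * 100

# First snapshot: almost all triggers are "good"
print(round(pbpct(142, 2, 396), 2))     # → 0.37

# Second snapshot: dirty page steal triggers dominate
print(round(pbpct(17, 2034, 1192), 1))  # → 62.7
```

A result in the second range signals that applications are frequently waiting on synchronous page writes, so CHNGPGS_THRESH, softmax, and NUM_IOCLEANERS should be reexamined.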
Automatic

No

Online

No

Average Number of Active Applications (AVG_APPLS)

Background

The average number of active applications is used by the DB2 optimizer to estimate the amount of resources, particularly buffer pool space, that will be available to run the access plan for the statement being optimized.

Configuring

When setting this parameter, use the database monitor to capture the average number of concurrently executing applications over a period of time and, using a sampling technique, calculate an average. To update the database configuration to set the average number of active applications based on the snapshot information, use the following command:

  update db cfg for <database_name> using AVG_APPLS 16  
Monitoring

To take a database snapshot and extract the entry that gives the number of concurrently executing applications, use the following command:

  get snapshot for database on <database_name> | grep -i 'Appls. executing in db manager currently'  

The SQL table function that will also return the number of concurrently executing applications would look like the following:

  SELECT APPLS_IN_DB2
  FROM TABLE(SNAPSHOT_DATABASE('SAMPLE',-1)) AS SNAPSHOT_DATABASE

The output of the get snapshot command would look like the following:

  Appls. executing in db manager currently   = 12  

The output of the SQL function would look like the following:

  APPLS_IN_DB2
  ------------
            12

  1 record(s) selected.

This information should be captured over a period of time, at average as well as peak workloads, and averaged to determine an optimal value for this parameter.
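The sampling-and-averaging approach described above can be sketched as follows. The sample values are hypothetical readings of "Appls. executing in db manager currently" collected from periodic snapshots.

```python
# Hypothetical samples of concurrently executing applications,
# gathered from database snapshots at average and peak workloads
samples = [12, 9, 15, 22, 14, 10]

# Round to a whole number for use with: update db cfg ... using AVG_APPLS <n>
avg_appls = round(sum(samples) / len(samples))
print(avg_appls)  # → 14
```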

Automatic

No

Online

Yes



Advanced DBA Certification Guide and Reference for DB2 Universal Database v8 for Linux, UNIX, and Windows
ISBN: 0130463884
Year: 2003
Pages: 121
