Example 5: Periodic Noticeable Slowdown of Database Response Time


Example 5: Periodic Noticeable Slowdown of Database Response Time

Problem Description

CustomerC is running an eCommerce site with the backed data store on DB2. At various times throughout the day, the database response time for application requests becomes very long.

Problem Analysis and Resolution

Step 1: Examine the Database and Database Manager Configuration Parameters
Database manager configuration

The three parameters in the database manager configuration that stand out are highlighted in bold below. In this case, the application is mainly an online application; therefore, intra-partition parallelism should be disabled. Because there will be a large number of concurrent applications, the sort heap threshold (SHEAPTHRES) will need to be high; however, the existing value is sufficient.

  Node type = Database Server with local clients   Database manager configuration release level            = 0x0a00   CPU speed (millisec/instruction)             (CPUSPEED) = 4.000000e-05   Max number of concurrently active databases     (NUMDB) = 8   Data Links support                          (DATALINKS) = NO   Federated Database System Support           (FEDERATED) = NO   Transaction processor monitor name        (TP_MON_NAME) =   Default charge-back account           (DFT_ACCOUNT_STR) =   Java Development Kit installation path       (JDK_PATH) = /wsdb/v81/bldsupp/AIX/jdk1.3.0   Diagnostic error capture level              (DIAGLEVEL) = 3   Notify Level                              (NOTIFYLEVEL) = 3   Diagnostic data directory path               (DIAGPATH) =   Default database monitor switches   Buffer pool                           (DFT_MON_BUFPOOL) = OFF   Lock                                     (DFT_MON_LOCK) = OFF   Sort                                     (DFT_MON_SORT) = OFF   Statement                                (DFT_MON_STMT) = OFF   Table                                   (DFT_MON_TABLE) = OFF   Timestamp                           (DFT_MON_TIMESTAMP) = ON   Unit of work                              (DFT_MON_UOW) = OFF   Monitor health of instance and databases   (HEALTH_MON) = OFF   SYSADM group name                        (SYSADM_GROUP) = BUILD   SYSCTRL group name                      (SYSCTRL_GROUP) =   SYSMAINT group name                    (SYSMAINT_GROUP) =   Database manager authentication        (AUTHENTICATION) = SERVER   Cataloging allowed without authority   (CATALOG_NOAUTH) = YES   Trust all clients                      (TRUST_ALLCLNTS) = YES   Trusted client authentication          (TRUST_CLNTAUTH) = CLIENT   Use SNA authentication                   (USE_SNA_AUTH) = NO   Bypass federated authentication            (FED_NOAUTH) = NO   Default database path                       (DFTDBPATH) = /home/dsnow   Database monitor heap size (4KB)          (MON_HEAP_SZ) = 90   Java Virtual Machine heap size (4KB)     (JAVA_HEAP_SZ) = 1024   Audit buffer size (4KB)                  (AUDIT_BUF_SZ) = 0   Size of instance shared memory (4KB)  (INSTANCE_MEMORY) = AUTOMATIC   Backup buffer default size (4KB)            (BACKBUFSZ) = 1024   Restore buffer default size (4KB)           (RESTBUFSZ) = 1024    Sort heap threshold (4KB)                  (SHEAPTHRES) = 80000    Directory cache support                     (DIR_CACHE) = YES   Application support layer heap size (4KB)   (ASLHEAPSZ) = 15   Max requester I/O block size (bytes)         (RQRIOBLK) = 32767   Query heap size (4KB)                   (QUERY_HEAP_SZ) = 1000   DRDA services heap size (4KB)            (DRDA_HEAP_SZ) = 128   Priority of agents                           (AGENTPRI) = SYSTEM   Max number of existing agents               (MAXAGENTS) = 200   Agent pool size                        (NUM_POOLAGENTS) = 100(calculated)   Initial number of agents in pool       (NUM_INITAGENTS) = 0   Max number of coordinating agents     (MAX_COORDAGENTS) = MAXAGENTS   Max no. of concurrent coordinating agents  (MAXCAGENTS) = MAX_COORDAGENTS   Max number of client connections      (MAX_CONNECTIONS) = MAX_COORDAGENTS   Keep fenced process                        (KEEPFENCED) = YES   Number of pooled fenced processes         (FENCED_POOL) = MAX_COORDAGENTS   Initialize fenced process with JVM     (INITFENCED_JVM) = NO   Initial number of fenced processes     (NUM_INITFENCED) = 0   Index re-creation time                       (INDEXREC) = RESTART   Transaction manager database name         (TM_DATABASE) = 1ST_CONN   Transaction resync interval (sec)     (RESYNC_INTERVAL) = 180   SPM name                                     (SPM_NAME) =   SPM log size                          (SPM_LOG_FILE_SZ) = 256   SPM resync agent limit                 (SPM_MAX_RESYNC) = 20   SPM log path                             (SPM_LOG_PATH) =   TCP/IP Service name                          (SVCENAME) =   Discovery mode                               (DISCOVER) = SEARCH   Discovery communication protocols       (DISCOVER_COMM) =   Discover server instance                (DISCOVER_INST) = ENABLE    Maximum query degree of parallelism   (MAX_QUERYDEGREE) = ANY     Enable intra-partition parallelism     (INTRA_PARALLEL) = NO    No. of int. communication buffers(4KB)(FCM_NUM_BUFFERS) = 512  
Database configuration for database sample

The three parameters in the database manager configuration that stand out are highlighted in bold below.

  • Since the applications will be performing a lot of inserts /updates/deletes, the log buffer size (LOGBUFSZ) will be important. However, the log buffer is sized adequately for this workload.

  • Because this is an online application, the sort list heap (SORTHEAP) should not be too large. When it is too large, the DB2 optimizer will tend to favor sorts over index scans , and with a large number of concurrent applications, sorting will be detrimental to performance.

  • In this case, the default buffer pool size (BUFFPAGE) is 1,000 4-KB pages. To determine the real size of the buffer pool, use the following statement:

      select * from syscat.bufferpools  
    • In this case, there is only one buffer pool (IBMDEFAULTBP), and it has a size of 675,000 4-KB pages.

    • This is adequately sized for this workload.

  Database configuration release level                    = 0x0a00   Database release level                                  = 0x0a00   Database territory                                      = US   Database code page                                      = 819   Database code set                                       = ISO8859-1   Database country/region code                            = 1   Dynamic SQL Query management           (DYN_QUERY_MGMT) = DISABLE   Discovery support for this database       (DISCOVER_DB) = ENABLE   Default query optimization class         (DFT_QUERYOPT) = 5   Degree of parallelism                      (DFT_DEGREE) = 1   Continue upon arithmetic exceptions   (DFT_SQLMATHWARN) = NO   Default refresh age                   (DFT_REFRESH_AGE) = 0   Number of frequent values retained     (NUM_FREQVALUES) = 10   Number of quantiles retained            (NUM_QUANTILES) = 20   Backup pending                                          = NO   Database is consistent                                  = NO   Rollforward pending                                     = NO   Restore pending                                         = NO    Multi-page file allocation enabled                      = NO    Log retain for recovery status                          = NO   User exit for logging status                            = NO   Data Links Token Expiry Interval (sec)      (DL_EXPINT) = 60   Data Links Write Token Init Expiry Intvl(DL_WT_IEXPINT) = 60   Data Links Number of Copies             (DL_NUM_COPIES) = 1   Data Links Time after Drop (days)        (DL_TIME_DROP) = 1   Data Links Token in Uppercase                (DL_UPPER) = NO   Data Links Token Algorithm                   (DL_TOKEN) = MAC0   Database heap (4KB)                            (DBHEAP) = 1200   Size of database shared memory (4KB)  (DATABASE_MEMORY) = AUTOMATIC   Catalog cache size (4KB)              (CATALOGCACHE_SZ) = (MAXAPPLS*4)    Log buffer size (4KB)                        (LOGBUFSZ) = 256    Utilities heap size (4KB)                (UTIL_HEAP_SZ) = 5000    Buffer pool size (pages)                     (BUFFPAGE) = 1000    Extended storage segments size (4KB)    (ESTORE_SEG_SZ) = 16000   Number of extended storage segments   (NUM_ESTORE_SEGS) = 0   Max storage for lock list (4KB)              (LOCKLIST) = 100   Max size of appl. group mem set (4KB) (APPGROUP_MEM_SZ) = 20000   Percent of mem for appl. group heap   (GROUPHEAP_RATIO) = 70   Max appl. control heap size (4KB)     (APP_CTL_HEAP_SZ) = 128   Sort heap thres for shared sorts (4KB) (SHEAPTHRES_SHR) = (SHEAPTHRES)    Sort list heap (4KB)                         (SORTHEAP) = 4096    SQL statement heap (4KB)                     (STMTHEAP) = 4096   Default application heap (4KB)             (APPLHEAPSZ) = 256   Package cache size (4KB)                   (PCKCACHESZ) = (MAXAPPLS*8)   Statistics heap size (4KB)               (STAT_HEAP_SZ) = 4384   Interval for checking deadlock (ms)         (DLCHKTIME) = 10000   Percent. of lock lists per application       (MAXLOCKS) = 10   Lock timeout (sec)                        (LOCKTIMEOUT) = -1    Changed pages threshold                (CHNGPGS_THRESH) = 60     Number of asynchronous page cleaners   (NUM_IOCLEANERS) = 10     Number of I/O servers                   (NUM_IOSERVERS) = 40    Index sort flag                             (INDEXSORT) = YES   Sequential detect flag                      (SEQDETECT) = YES   Default prefetch size (pages)         (DFT_PREFETCH_SZ) = 32   Track modified pages                         (TRACKMOD) = OFF   Default number of containers                            = 1   Default tablespace extentsize (pages)   (DFT_EXTENT_SZ) = 32   Max number of active applications            (MAXAPPLS) = AUTOMATIC   Average number of active applications       (AVG_APPLS) = 1   Max DB files open per application            (MAXFILOP) = 64   Log file size (4KB)                         (LOGFILSIZ) = 100000   Number of primary log files                (LOGPRIMARY) = 32   Number of secondary log files               (LOGSECOND) = 8   Changed path to log files                  (NEWLOGPATH) =   Path to log files                                       = /databases/sample/logs   Overflow log path                     (OVERFLOWLOGPATH) =   Mirror log path                         (MIRRORLOGPATH) =   First active log file                                   =   Block log on disk full                (BLK_LOG_DSK_FUL) = NO   Percent of max active log space by transaction(MAX_LOG) = 0   Num. of active log files for 1 active UOW(NUM_LOG_SPAN) = 0   Group commit count                          (MINCOMMIT) = 1   Percent log file reclaimed before soft chckpt (SOFTMAX) = 100   Log retain for recovery enabled             (LOGRETAIN) = OFF   User exit for logging enabled                (USEREXIT) = OFF   Auto restart enabled                      (AUTORESTART) = ON   Index re-creation time                       (INDEXREC) = SYSTEM (RESTART)   Default number of loadrec sessions    (DFT_LOADREC_SES) = 1   Number of database backups to retain   (NUM_DB_BACKUPS) = 12   Recovery history retention (days)     (REC_HIS_RETENTN) = 366   TSM management class                    (TSM_MGMTCLASS) =   TSM node name                            (TSM_NODENAME) =   TSM owner                                   (TSM_OWNER) =   TSM password                             (TSM_PASSWORD) =  
Step 2: Check the System for I/O Bottlenecks, Excess Paging, or Other Processes/Applications Using Excess System Resources

This requires operating system tools such as vmstat, iostat, ps, and/or top to capture the memory, I/O, and process level information. It is important to capture the snapshots from these tools over a period of time, not taking just one snapshot, especially because the first line of the output of the vmstat and iostat tools contains average information since the server was started and is not really useful in analyzing a problem.

Step 3: Make Changes Based on Examination of Configuration Information

While the system is experiencing the slowdown, capture iostat output as follows :

  iostat 5 > iostat.out  

Next, examine the iostat.out file. Below is an entry from the iostat.out file.

  Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn   hdisk0           2.6      13.2       7.3         19        21   hdisk1           8.3      99.2      37.3          3       208   hdisk2           8.3      99.2      37.3          3       208   hdisk3           8.3      99.2      37.3          3       208   hdisk4           8.3      99.2      37.3          3       208   hdisk5           8.3      99.2      37.3          3       208   hdisk6           8.3      99.2      37.3          3       208   hdisk7           8.3      99.2      37.3          3       208   cd0              0.0       0.0       0.0          0         0  

The other entries were very similar to the above, so they are not shown here. In this case, the I/O activity is all writes and is spread across all of the disks on which the database resides, i.e., hdisk1 through hdisk7.

Next, run the top or ps aux command to determine which processes are currently executing that could be causing the I/O. The output of the ps aux command is below:

  USER         PID %CPU %MEM   SZ  RSS    TTY STAT    STIME  TIME COMMAND   inst1      40970 10.0  2.0   40 23888      - A      Oct 03 572:36 db2pclnr   inst1      32776 10.0  2.0   40 23888      - A      Oct 03 570:33 db2pclnr   inst1      24582 10.0  2.0   40 23888      - A      Oct 03 576:07 db2pclnr   inst1      16388 10.0  2.0   40 23888      - A      Oct 03 538:13 db2pclnr   inst1     122974 10.0  0.0  320   324      - A      Oct 03 548:32 db2pclnr   inst1     294988 10.0  0.0  700   720      - A      Oct 03 513:02 db2pclnr   inst1      23109 10.0  2.0   52 23900      - A      Oct 03 522:04 db2pclnr   inst1     401514 10.0  2.0  184 24032      - A      Oct 03 509:06 db2pclnr   inst1      81940 10.0  2.0  116 23964      - A      Oct 03 511:07 db2pclnr   inst1     262242 10.0  0.0  936   940      - A      Oct 03 511:01 db2pclnr  

The ps aux command shows that there are 10 db2pclnr processes running, using all of the available CPU cycles. The db2pclnr process is the I/O cleaner or page cleaner process that asynchronously flushes dirty pages from the buffer pools to disk.

NOTE

The ps aux command gives more information than just db2pclnr processes; however, only the relevant information is shown above.


Step 4: Determine Which I/O Cleaner Triggers Caused the Buffer Pool Flush

To determine which of the triggers for the page cleaners actually caused the I/O cleaners to start, a database snapshot is required. To capture the database snapshot, use the following command:

  get snapshot for database on sample > sample.db.snap  

To take a snapshot and extract just the lines with the page cleaner trigger information, use the following command:

  get snapshot for database on sample  grep i 'triggers'  

The output from the above command looks like the following:

  LSN Gap cleaner triggers                 = 0   Dirty page steal cleaner triggers        = 0   Dirty page threshold cleaner triggers    = 1  

Therefore, the I/O cleaners were triggered by the changed pages threshold (CHNGPGS_THRESH) database configuration parameter. Looking back at the database configuration information above, the CHNGPGS_THRESH configuration parameter is set to 60, as shown below:

  Changed pages threshold                (CHNGPGS_THRESH) = 60  

As discussed previously, there is only one buffer pool (IBMDEFAULTBP), and it has a size of 675,000 4-KB pages. Therefore, when the CHNGPGS_THRESH triggers the I/O cleaners, there are 675,000 x .6, or 405,000 dirty pages that need to be flushed to disk. When the I/O cleaners are triggered, all of the configured page cleaners (10, in this case) are started at the same time. Therefore, there will be 10 db2pclnr processes working to write 1.6 GB of data to disk.

Step 5: Reduce the Changed Pages Threshold Parameter Setting

To reduce the impact of the page cleaners when they are triggered, they need to be triggered more often but with fewer dirty pages to write to disk. This can be done by:

  • Reducing the changed pages threshold (CHNGPGS_THRESH)

  • Reducing the percent log file reclaimed before soft checkpoint (SOFTMAX)

Update the database configuration to reduce the CHNGPGS_THRESH and SOFTMAX, as follows:

  update db cfg for sample using chngpgs_thresh 20   update db cfg for sample using softmax 50  

This will cause the I/O cleaners to be triggered when half a log file is filled or when 20% of the buffer pool is occupied by dirty pages. This will trigger the I/O cleaners more frequently but with less work to perform when they are triggered.

Step 6: Retest

After reducing the CHNGPGS_THRESH and SOFTMAX, the Web servers were reconnected to the database, and the performance slowdown did not reoccur.

The page cleaner trigger information from the snapshots was recaptured and looked like the following:

  LSN Gap cleaner triggers                    = 0   Dirty page steal cleaner triggers           = 0   Dirty page threshold cleaner triggers       = 4  


Advanced DBA Certification Guide and Reference for DB2 UDB v8 for Linux, Unix and Windows
Advanced DBA Certification Guide and Reference for DB2 Universal Database v8 for Linux, UNIX, and Windows
ISBN: 0130463884
EAN: 2147483647
Year: 2003
Pages: 121

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net