Example 2: Unable to Get Desired Throughput


Problem Description

CustomerB was in the midst of implementing a new application, performing stress tests before going into production. While performing the stress test, the database server could not ramp up to the desired number of client applications while achieving the required response times.

Problem Analysis and Resolution

Step 1: Examine the Database and Database Manager Configuration Parameters
Database manager configuration

The three parameters in the database manager configuration that stand out are highlighted in bold below. In this case, the application is mainly an online application; therefore, intra-partition parallelism should be disabled. Because there will be a large number of concurrent applications, the sort heap threshold (SHEAPTHRES) will need to be high; however, the existing value is sufficient.

  Node type = Database Server with local and remote clients   Database manager configuration release level            = 0x0900   CPU speed (millisec/instruction)             (CPUSPEED) = 8.580921e-07   Max number of concurrently active databases     (NUMDB) = 8   Data Links support                          (DATALINKS) = NO   Federated Database System Support           (FEDERATED) = YES   Transaction processor monitor name        (TP_MON_NAME) =   Default charge-back account           (DFT_ACCOUNT_STR) =   Java Development Kit 1.1 installation path   (JDK_PATH) =   Diagnostic error capture level              (DIAGLEVEL) = 3   Notify Level                              (NOTIFYLEVEL) = 3   Diagnostic data directory path               (DIAGPATH) = /products/db2/fsprdi/sqllib/db2dump   Default database monitor switches   Buffer pool                         (DFT_MON_BUFPOOL) = OFF   Lock                                   (DFT_MON_LOCK) = OFF   Sort                                   (DFT_MON_SORT) = OFF   Statement                              (DFT_MON_STMT) = OFF   Table                                 (DFT_MON_TABLE) = OFF   Unit of work                            (DFT_MON_UOW) = OFF   Monitor health of instance and databases   (HEALTH_MON) = OFF   SYSADM group name                        (SYSADM_GROUP) = DBA   SYSCTRL group name                      (SYSCTRL_GROUP) = DBASCTL   SYSMAINT group name                    (SYSMAINT_GROUP) = DBAMNT   Database manager authentication        (AUTHENTICATION) = SERVER   Cataloging allowed without authority   (CATALOG_NOAUTH) = NO   Trust all clients                      (TRUST_ALLCLNTS) = YES   Trusted client authentication          (TRUST_CLNTAUTH) = CLIENT   Use SNA authentication                   (USE_SNA_AUTH) = NO   Bypass federated authentication            (FED_NOAUTH) = NO   Default database path                       (DFTDBPATH) = /products/db2/fsprdi   Database monitor heap size (4KB)          (MON_HEAP_SZ) = 56   UDF shared memory set size (4KB)           (UDF_MEM_SZ) = 256   Java Virtual Machine heap size (4KB)     (JAVA_HEAP_SZ) = 2048   Audit buffer size (4KB)                  (AUDIT_BUF_SZ) = 0   Backup buffer default size (4KB)            (BACKBUFSZ) = 1024   Restore buffer default size (4KB)           (RESTBUFSZ) = 1024    Sort heap threshold (4KB)                  (SHEAPTHRES) = 100000    Directory cache support                     (DIR_CACHE) = YES   Application support layer heap size (4KB)   (ASLHEAPSZ) = 15   Max requester I/O block size (bytes)         (RQRIOBLK) = 32767   Query heap size (4KB)                   (QUERY_HEAP_SZ) = 16000   DRDA services heap size (4KB)            (DRDA_HEAP_SZ) = 128   Priority of agents                           (AGENTPRI) = SYSTEM   Max number of existing agents               (MAXAGENTS) = 200   Agent pool size                        (NUM_POOLAGENTS) = 4 (calculated)   Initial number of agents in pool       (NUM_INITAGENTS) = 0   Max number of coordinating agents     (MAX_COORDAGENTS) = MAXAGENTS   Max no. of concurrent coordinating agents  (MAXCAGENTS) = MAX_COORDAGENTS   Max number of logical agents          (MAX_LOGICAGENTS) = MAX_COORDAGENTS   Keep fenced process                        (KEEPFENCED) = YES   Number of pooled fenced processes         (FENCED_POOL) = MAX_COORDAGENTS   Initialize DARI process with JVM       (INITFENCED_JVM) = NO   Initial number of fenced DARI process  (NUM_INITFENCED) = 0   Index re-creation time                       (INDEXREC) = RESTART   Transaction manager database name         (TM_DATABASE) = 1ST_CONN   Transaction resync interval (sec)     (RESYNC_INTERVAL) = 180   SPM name                                     (SPM_NAME) = it_ibm60   SPM log size                          (SPM_LOG_FILE_SZ) = 256   SPM resync agent limit                 (SPM_MAX_RESYNC) = 20   SPM log path                             (SPM_LOG_PATH) =   TCP/IP Service name                          (SVCENAME) = 50000   Discovery mode                               (DISCOVER) = SEARCH   Discovery communication protocols       (DISCOVER_COMM) = TCPIP   Discover server instance                (DISCOVER_INST) = ENABLE    Maximum query degree of parallelism   (MAX_QUERYDEGREE) = ANY     Enable intra-partition parallelism     (INTRA_PARALLEL) = NO    No. of int. communication buffers(4KB)(FCM_NUM_BUFFERS) = 1024   Node connection elapse time (sec)         (CONN_ELAPSE) = 10   Max number of node connection retries (MAX_CONNRETRIES) = 5   Max time difference between nodes (min) (MAX_TIME_DIFF) = 60   db2start/db2stop timeout (min)        (START_STOP_TIME) = 10  
Database configuration for database sample

The three parameters in the database manager configuration that stand out are highlighted in bold below.

  • Because the applications will be performing a lot of inserts /updates/deletes, the log buffer size (LOGBUFSZ) will be important. However, the log buffer is sized adequately for this workload.

  • Because this is an online application, the sort list heap (SORTHEAP) should not be too large. When it is too large, the DB2 optimizer will tend to favor sorts over index scans , and with a large number of concurrent applications, sorting will be detrimental to performance.

  • In this case, the default buffer pool size (BUFFPAGE) is 1,000 4-KB pages. To determine the real size of the buffer pool, use the following statement:

      select * from syscat.bufferpools  
    • In this case, there is only one buffer pool (IBMDEFAULTBP), and it has a size of 5,000 4-KB pages.

    • This is a rather small buffer pool; therefore, increasing the size of the buffer pool likely will help performance.

  Database configuration release level                    = 0x0a00   Database release level                                  = 0x0a00   Database territory                                      = US   Database code page                                      = 819   Database code set                                       = ISO8859-1   Database country/region code                            = 1   Dynamic SQL Query management           (DYN_QUERY_MGMT) = DISABLE   Discovery support for this database       (DISCOVER_DB) = ENABLE   Default query optimization class         (DFT_QUERYOPT) = 5   Degree of parallelism                      (DFT_DEGREE) = 1   Continue upon arithmetic exceptions   (DFT_SQLMATHWARN) = NO   Default refresh age                   (DFT_REFRESH_AGE) = 0   Number of frequent values retained     (NUM_FREQVALUES) = 10   Number of quantiles retained            (NUM_QUANTILES) = 20   Backup pending                                          = NO   Database is consistent                                  = NO   Rollforward pending                                     = NO   Restore pending                                         = NO    Multi-page file allocation enabled                      = NO    Log retain for recovery status                          = NO   User exit for logging status                            = NO   Data Links Token Expiry Interval (sec)      (DL_EXPINT) = 60   Data Links Write Token Init Expiry Intvl(DL_WT_IEXPINT) = 60   Data Links Number of Copies             (DL_NUM_COPIES) = 1   Data Links Time after Drop (days)        (DL_TIME_DROP) = 1   Data Links Token in Uppercase                (DL_UPPER) = NO   Data Links Token Algorithm                   (DL_TOKEN) = MAC0   Database heap (4KB)                            (DBHEAP) = 1200   Size of database shared memory (4KB)  (DATABASE_MEMORY) = AUTOMATIC   Catalog cache size (4KB)              (CATALOGCACHE_SZ) = (MAXAPPLS*4)    Log buffer size (4KB)                        (LOGBUFSZ) = 256    Utilities heap size (4KB)                (UTIL_HEAP_SZ) = 5000    Buffer pool size (pages)                     (BUFFPAGE) = 1000    Extended storage segments size (4KB)    (ESTORE_SEG_SZ) = 16000   Number of extended storage segments   (NUM_ESTORE_SEGS) = 0   Max storage for lock list (4KB)              (LOCKLIST) = 100   Max size of appl. group mem set (4KB) (APPGROUP_MEM_SZ) = 20000   Percent of mem for appl. group heap   (GROUPHEAP_RATIO) = 70   Max appl. control heap size (4KB)     (APP_CTL_HEAP_SZ) = 128   Sort heap thres for shared sorts (4KB) (SHEAPTHRES_SHR) = (SHEAPTHRES)    Sort list heap (4KB)                         (SORTHEAP) = 1024    SQL statement heap (4KB)                     (STMTHEAP) = 4096   Default application heap (4KB)             (APPLHEAPSZ) = 256   Package cache size (4KB)                   (PCKCACHESZ) = (MAXAPPLS*8)   Statistics heap size (4KB)               (STAT_HEAP_SZ) = 4384   Interval for checking deadlock (ms)         (DLCHKTIME) = 10000   Percent. of lock lists per application       (MAXLOCKS) = 10   Lock timeout (sec)                        (LOCKTIMEOUT) = -1    Changed pages threshold                (CHNGPGS_THRESH) = 60     Number of asynchronous page cleaners   (NUM_IOCLEANERS) = 10     Number of I/O servers                   (NUM_IOSERVERS) = 10    Index sort flag                             (INDEXSORT) = YES   Sequential detect flag                      (SEQDETECT) = YES   Default prefetch size (pages)         (DFT_PREFETCH_SZ) = 32   Track modified pages                         (TRACKMOD) = OFF   Default number of containers                            = 1   Default tablespace extentsize (pages)   (DFT_EXTENT_SZ) = 32   Max number of active applications            (MAXAPPLS) = AUTOMATIC   Average number of active applications       (AVG_APPLS) = 1   Max DB files open per application            (MAXFILOP) = 64   Log file size (4KB)                         (LOGFILSIZ) = 1000   Number of primary log files                (LOGPRIMARY) = 3   Number of secondary log files               (LOGSECOND) = 2   Changed path to log files                  (NEWLOGPATH) =   Path to log files                                       = /databases/sample/logs   Overflow log path                     (OVERFLOWLOGPATH) =   Mirror log path                         (MIRRORLOGPATH) =   First active log file                                   =   Block log on disk full                (BLK_LOG_DSK_FUL) = NO   Percent of max active log space by transaction(MAX_LOG) = 0   Num. of active log files for 1 active UOW(NUM_LOG_SPAN) = 0   Group commit count                          (MINCOMMIT) = 1   Percent log file reclaimed before soft chckpt (SOFTMAX) = 100   Log retain for recovery enabled             (LOGRETAIN) = OFF   User exit for logging enabled                (USEREXIT) = OFF   Auto restart enabled                      (AUTORESTART) = ON   Index re-creation time                       (INDEXREC) = SYSTEM (RESTART)   Default number of loadrec sessions    (DFT_LOADREC_SES) = 1   Number of database backups to retain   (NUM_DB_BACKUPS) = 12   Recovery history retention (days)     (REC_HIS_RETENTN) = 366   TSM management class                    (TSM_MGMTCLASS) =   TSM node name                            (TSM_NODENAME) =   TSM owner                                   (TSM_OWNER) =   TSM password                             (TSM_PASSWORD) =  
Step 2: Make Changes Based on the Examination of Configuration Information

The size of the buffer pool was increased as follows :

  alter bufferpool IBMDEFAULTBP immediate size 500000  
Step 3: Retest

After increasing the size of the buffer pool, the response time improved, and more concurrent users were able to connect to the database and run the applications; however, the system was still unable to attain the required number of applications.

Step 4: Check System for I/O Bottlenecks, Excess Paging, or Other Processes/Applications Using Excess System Resources

This requires operating system tools such as vmstat, iostat, and/or top to capture the memory, I/O, and process level information. It is important to capture the snapshots from these tools over a period of time, not taking just one snapshot, especially because the first line of the output of the vmstat and iostat tools contains average information since the server was started and is not really useful in analyzing a problem.

While the applications are running, and particularly when the system begins to stop responding, capture iostat output, as follows:

  iostat 5 > iostat.out  

Next, examine the iostat.out file, below:

  Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn   Hdisk0           2.6      13.2       7.3         19         0   Hdisk1           1.8       9.2       7.3         19         0   Hdisk2           1.8       9.2       7.3         19         0   Hdisk3           1.8       9.2       7.3         19         0   Hdisk4           1.8       9.2       7.3         19         0   Hdisk5          42.8     213.2      107.3       237       286   Hdisk6           1.8       9.2       7.3         19         0   Hdisk7           1.8       9.2       7.3         19         0   Hdisk8           1.8       9.2       7.3         19         0   Hdisk9           1.8       9.2       7.3         19         0   cd0              0.0       0.0       0.0          0         0   Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn   Hdisk0           2.6      13.2       7.3         19         0   Hdisk1           1.8       9.2       7.3         19         0   Hdisk2           1.8       9.2       7.3         19         0   Hdisk3           1.8       9.2       7.3         19         0   Hdisk4           1.8       9.2       7.3         19         0   Hdisk5          47.1     234.7      111.3       261       292   Hdisk6           1.8       9.2       7.3         19         0   Hdisk7           1.8       9.2       7.3         19         0   Hdisk8           1.8       9.2       7.3         19         0   Hdisk9           1.8       9.2       7.3         19         0   cd0              0.0       0.0       0.0          0         0   Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn   Hdisk0           2.6      13.2       7.3         19         0   Hdisk1           1.8       9.2       7.3         19         0   Hdisk2           1.8       9.2       7.3         19         0   Hdisk3           1.8       9.2       7.3         19         0   Hdisk4           1.8       9.2       7.3         19         0   Hdisk5          43.3     214.6      109.8       283       239   Hdisk6           1.8       9.2       7.3         19         0   Hdisk7           1.8       9.2       7.3         19         0   Hdisk8           1.8       9.2       7.3         19         0   Hdisk9           1.8       9.2       7.3         19         0   cd0              0.0       0.0       0.0          0         0  

The iostat output shows that one disk is much busier than all of the other disks; therefore, it is important to understand what part of the database is physically stored on Hdisk5.

In this case, the disks are not striped volumes , they are just a bunch of disks (JBOD); therefore, use the operation system tools to determine what file system was created on Hdisk5, then analyze the database to determine what part of the database is stored on that file system.

Step 5: Determine What Is on Hdisk5

The operating system tools indicate that the file system /tablespaces/sample/temp was created on Hdisk5. Based on the name of the file system, it appears to be where the temporary table space was created. This can be verified as follows:

Determine the table space ID for the temporary table space:

  select tbspaceid, tbspace from syscat.tablespaces where datatype='T'  

The output of the statement above is:

  TBSPACEID   TBSPACE   ----------- -----------   1 TEMPSPACE1   1 record(s) selected.  

Determine the container definitions for the temporary table space as follows:

  list tablespace containers for 1  

The output of the statement above is:

  Tablespace Containers for Tablespace 1   Container ID                         = 0   Name                                 = /tablespaces/sample/temp   Type                                 = Path  

Therefore, the disk with the most activity is the disk where the temporary table space has been placed.

Step 6: Determine and Implement a More Optimal Database Layout

Based on the iostat output, there are 10 physical disks in the server. To eliminate the I/O bottleneck and spread the I/O across as many disks as possible, the physical design of the database needed to be changed. Instead of isolating each of the table spaces to its own physical disk, each table space would be created with eight containers, with one container on each of the disks Hdisk2 through Hdisk9. In this case, because it was an online system, one disk was set aside for the database logs and another for the operating system paging space.

To change the physical layout of the database, an offline backup was taken, the database was dropped, and a redirected restore was performed. During the redirected restore, the table space definitions were changed.

Step 7: Retest

After changing the table space definitions, the response time improved even more, and the desired number of concurrent applications were able to run successfully.

Step 8: Check the System to Ensure Elimination of the I/O Bottleneck

While the applications are running, capture iostat output as follows:

  iostat 5 > iostat.out  

Next examine the iostat.out file, below:

  Disks:        % tm_act     Kbps      tps    Kb_read   Kb_wrtn   Hdisk0           2.6      13.2       7.3         19         0   Hdisk1           4.8      12.2      17.3          0        78   Hdisk2          12.1      29.2      27.2        109       134   Hdisk3          12.5      29.4      27.1        127       125   Hdisk4          12.2      29.1      27.5        113       130   Hdisk5          12.3      29.3      27.4        116       124   Hdisk6          12.4      29.3      27.4        124       126   Hdisk7          12.3      29.1      27.2        103       131   Hdisk8          12.4      29.4      27.3        108       132   Hdisk9          12.2      29.1      27.1        110       130   cd0              0.0       0.0       0.0          0         0  

The iostat output now shows that the I/O is spread evenly across the data disks, and none of the disks is busy more than 15% of the time. This leaves room for the I/O subsystem to handle additional requests if needed.



Advanced DBA Certification Guide and Reference for DB2 UDB v8 for Linux, Unix and Windows
Advanced DBA Certification Guide and Reference for DB2 Universal Database v8 for Linux, UNIX, and Windows
ISBN: 0130463884
EAN: 2147483647
Year: 2003
Pages: 121

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net