The system catalog tables contain information about columns, tables, and indexes: the number of rows in a table, the use of space by a table or index, and the number of values in a column. However, this information is not kept current but instead has to be generated by a utility called RUNSTATS. The statistics collected by the RUNSTATS utility can be used in two ways: to display the physical organization of the data and to provide information that the DB2 optimizer needs to select the best access path for executing SQL statements.
To have efficient access paths to data, current statistics must exist that reflect the state of your tables, columns, and indexes. Whenever a dynamic SQL statement is issued, the DB2 optimizer reads the system catalog tables to review the available indexes, the size of each table, the characteristics of a column, and other information to select the best access path for executing the query. If the statistics do not reflect the current state of the tables, the DB2 optimizer will not have the correct information to make the best choice in selecting an access path to execute your query.
This issue becomes more crucial as the complexity of the SQL statements increases. When only one table is accessed without indexes, fewer choices are available to the optimizer. However, when the SQL statement involves several tables, each with one or more indexes, the number of choices available to the optimizer increases dramatically.
Choosing the correct access path can reduce the response time considerably by reducing the amount of I/O needed to correctly retrieve that data. Depending on the size of the tables, the indexes available, and other considerations, the selected access path can affect the response time, which varies from minutes to hours. You may also want to consider the physical and logical design of your database.
The next step in improving performance involves the use of the RUNSTATS utility. It is recommended that you execute RUNSTATS on a frequent basis on tables that have a large number of updates, inserts, or deletes. For tables with a great deal of insert or delete activity, you may decide to run statistics after a fixed period of time or after the insert or delete activity.
DB2 allows you to reorganize and use the RUNSTATS utility on the system catalog tables. This important feature can improve the access plans generated when querying the system catalog tables. DB2 may access these tables when you issue an SQL statement, even though you are referencing only user tables. Therefore, it is very important to have current statistics on the system catalog tables.
It is recommended that you use the RUNSTATS utility after REORG on a table.
RUNSTATS is critical to a properly tuned DB2 environment. The utility gathers statistics about DB2 table spaces and indexes. It can also be embedded in other utilities, such as REORG and LOAD, enabling you to collect statistics during the execution of a given utility. The RUNSTATS utility can be executed on a table space and its indexes, for each object independently, or even for a specific column.
RUNSTATS should initially be executed on all columns after the data is loaded and reorganized. RUNSTATS execution should be done after
The statistics should be run before any binding/rebinding of packages or plans and before any performance tuning that would require monitoring the statistics in the catalog. Using SHRLEVEL CHANGE with RUNSTATS will not lock or interfere with other processes. This access means that you can execute RUNSTATS as often as necessary.
An option allows RUNSTATS to report on the statistics gathered, using the REPORT NO|YES parameter. The following UPDATE options can control what and whether RUNSTATS updates the DB2 catalog.
Following is an example of executing the RUNSTATS utility and updating the catalog with only the statistics that are collected for access-path selection. The REPORT option will route the collected statistics to the SYSPRINT output DD statement.
RUNSTATS TABLESPACE DB2CERT.CERTTS REPORT YES UPDATE ACCESSPATH
The RUNSTATS sampling feature allows you to choose the percentage of nonindexed column statistics gathered and will help RUNSTATS to execute faster. The sampling technique can affect the optimizer's choice in access-path selection because the sampling must be representative of the data. In the absence of true representation, it would assume a linear distribution of data, which would affect the filter factor and costing done by the optimizer. Following is an example of using a sampling of 25 percent:
RUNSTATS TABLESPACE DB2CERT.CERTTS TABLE(ALL)SAMPLE 25 INDEX(ALL) SHRLEVEL CHANGE
Key-correlation statistics enable DB2 to gather statistics about how one column's value is related to the value of another column. Without these statistics, only the FIRSTKEYCARD (number of distinct values in the first key column) and FULLKEYCARD (number of distinct values of the first key) columns are updated with limited information, and the correlation is on columns for FULLKEYCARD only. Without key-correlation statistics, no second- or third- key cardinality, and multikey cardinality is considered independently, often leading to inaccurate estimation of filter factors and inaccurate estimation of join size, join sequencing, and join methods, which can result in inefficient access-path selection.
Key-correlation statistics are collected by RUNSTATS with minimal additional overhead and can provide CPU and elapsed-time reductions through improved cost and resources estimations. These key-correlation statistics play a major role in access-path selection by providing the optimizer columns with information on multicolumn cardinalities and multicolumn frequent values.
This feature enables you to specify the number of columns (NUMCOLS) on which to collect statistics and to specify the number of values (COUNT). These keywords are used in the RUNSTATS utility. The KEYCARD parameter indicates that cardinalities for each column, concatenated with all previous key columns, are to be collected. This will provide more information for the optimizer when the related columns are used in a compound WHERE clause.
Using this RUNSTATS feature enables you to build the frequency values for critical concatenated key columns, such as the first and second columns, or maybe the first, second, and third columns.
The following example shows how to update statistics on frequently occurring values. For a given index, we can use the KEYCARD option to indicate that the utility is to collect cardinality statistics for the index. For a three-column index, the utility will collect cardinality statistics for the first column, the first and second, and the first, second, and third. By using NUMCOLS, the utility will also collect the 10 most frequently occurring values for the first group and then the 15 most frequently occurring values for the second group:
RUNSTATS INDEX (DBA1.CERTIX) KEYCARD FREQVAL NUMCOLS 1 COUNT 10 FREQVAL NUMCOLS 2 COUNT 15
The next example shows how to update statistics for a group of columns. The COLGROUP keyword is used to group the columns; the distribution statistics for the group will be stored in SYSCOLDIST:
RUNSTATS TABLESPACE (DBA1.CERTTS) TABLE (DBA1.CANDIDATE) COLGROUP (CITY, STATE)
If the COLGROUP and FREQVAL options are used together, the frequency-distribution statistics for a specific group of nonindex columns can be collected. The COUNT keyword specifies the number of frequently occurring values to be collected from the specified column group:
RUNSTATS TABLESPACE (DBA1.CERTTS) TABLE (DBA1.CANDIDATE) COLGROUP (CITY, STATE) FREQVAL COUNT 10
Collection of data-correlation information and skewed data distributions can help with better access-path selection and may help reduce RID list size.
These SYSCOLDIST and SYSCOLDISTSTATS values for frequency-distribution statistics are stored catalog table and columns as follows:
The COLCARDF column in SYSCOLUMNS contains the number of distinct values for a single column.
RUNSTATS on the DB2 Catalog
The RUNSTATS utility needs to be executed on the DB2 catalog, depending on the amount of DDL, DML, and other activities that will insert and delete rows in DB2 catalog tables. DB2 will not be able to appropriately optimize queries against the catalog without having current statistics on the table spaces and index spaces. The same principle applies here as with DB2 user-defined objects; a current view of all DB2 objects is also needed to determine the need for and frequency of reorganization of DB2 catalog table spaces and index spaces.
SQL Cache Invalidation
The only way to invalidate statements in the dynamic SQL cache is to execute the RUNSTATS utility on the objects that the queries are dependent on. If statistics are updated, affected dynamic SQL statements that are cached are invalidated so they can be reprepared. Cache invalidation is at table space and index space levels. This can be done using a RUNSTATS REPORT YES; however, this can also take a fair amount of time. Optionally, RUNSTATS can be run with REPORT NO and UPDATE NONE. This way, users can invalidate dynamic SQL cache statements without the overhead of collecting the stats, generating a report, or updating catalog tables.
RUNSTATS TABLESPACE DB1.CERTTS REPORT NO UPDATE NONE
DB2 provides the ability to keep a history of statistics for better proactive performance-analysis capabilities. This helps to better monitor objects in terms of growth over time, along with other information to determine whether objects need to change. The keyword HISTORY in the RUNSTATS, REORG, LOAD, and REBUILD utilities is used to specify that historical statistics are to be collected:
REORG INDEX.....HISTORY RUNSTATS TABLESPACE.....HISTORY
The historical statistics are kept in the following nine DB2 catalog tables:
As the statistics age and are no longer needed, they can be deleted with the MODIFY STATISTICS utility.
In order to avoid having to run a separate RUNSTATS job, inline statistics can be collected when running REORG, LOAD, or REBUILD INDEX. Establishing inline statistics is faster than running RUNSTATS after REORG, LOAD, or REBUILD INDEX, because the inline statistics are established as a byproduct by separate subtasks of these utilities. An option to request a statistics report and/or update the DB2 catalog for the objects involved is available. For REORG, you can request the statistics for the table space being reorganized and/or the indexes associated with the tables of the table space or for the indexes being reorged.
For the LOAD utility, you can request inline stats only for REPLACE or RESUME and for REBUILD INDEX only for indexes being built.
The inline statistics can be requested by using the STATISTICS clause in the REORG, LOAD, or REBUILD utilities. The same parameters as for the RUNSTATS utility can be used for inline statistics.
Inline statistics cannot be collected for the following:
Inline statistics established during LOAD may include the information about rows that have been discarded. If only a few rows have been discarded, this is not a concern, but if a large number of rows have been discarded, the inline stats may be very inaccurate, and it may be better to execute a separate RUNSTATS utility.
DB2 can collect statistics on table spaces and index spaces and then periodically write this information to two user-defined tables. The real-time statistics can be used by user-written queries/programs, a DB2-supplied stored procedure, or Control Center to make decisions about object maintenance.
DB2 is always collecting statistics for database objects. The statistics are kept in virtual storage and are calculated and updated asynchronously on externalization. In order to externalize them, the environment must be properly set up. A new set of DB2 objects must be created so that DB2 can write out the statistics. SDSNSAMP(DSNTESS) contains the information necessary to set up these objects.
Two tables, with appropriate indexes, must be created to hold the statistics:
These tables are kept in a database named DSNRTSDB, which must be started in order to externalize the statistics that are being held in virtual storage. DB2 will then populate the tables with either one row per table space or index space or one row per partition. For tables in a data sharing environment, each member will write its own statistics to the real-time statistics (RTS) tables.
Some of the important statistics that are collected for table spaces are
Some statistics that may help determine when a reorg is needed are space allocated, extents, number of inserts/updates/deletes (singleton or mass) since the last REORG or LOAD REPLACE; number of unclustered inserts, number of disorganized LOBs, and number of overflow records created since the last REORG.
Other statistics help determine when RUNSTATS should be executed: number of inserts/updates/deletes (singleton and mass) since the last RUNSTATS execution. Statistics collected to help with COPY determination are distinct updated pages and changes since the last COPY execution and the RBA/LRSN of first update since last COPY.
Statistics are also gathered on indexes. Basic index statistics are total number of unique of duplicate entries, number of levels and active pages, space allocated, and extents. A statistic that helps to determine when a reorg is needed is the time when the last REBUILD, REORG, or LOAD REPLACE occurred. Statistics can be collected on the number of updates/deletes (real or pseudo, singleton or mass)/inserts (random and those that were after the highest key) since the last REORG or REBUILD execution. These statistics are, of course, very helpful for determining how the data physically looks after certain processes, such as batch inserts, have occurred, so we can take appropriate actions, if necessary.
Externalizing and Using Real-Time Statistics
Various events can trigger the externalization of the statistics. DSNZPARM STATSINSTdefault, 30 minutesis used to control the externalization of the statistics at a subsystem level. Several processes will have an effect on the real-time statistics: SQL, utilities, and the dropping/creating of objects.
Once externalized, queries can then be written against the tables. For example, a query against the TABLESPACESTATS table can be written to identify when a table space needs to be copied because more than 30 percent of the pages have changed since the last image copy was taken:
SELECT NAME FROM SYSIBM.SYSTABLESPACESTATS WHERE DBNAME = 'DB1' and ((COPYUPDATEDPAGES*100)/NACTIVE)>30
This query compares the last RUNSTATS timestamp to that of the last REORG execution on the same object to determine when RUNSTATS is needed. If the date of the last REORG is more recent than the last RUNSTATS, it may be time to execute RUNSTATS:
SELECT NAME FROM SYSIBM.SYSTABLESPACESTATS WHERE DBNAME = 'DB1' and (JULIAN_DAY(REORGLASTTIME)>JULIAN_DAY(STATSLASTTIME))
The next example may be useful if you want to monitor the number of records that were inserted since the last REORG or LOAD REPLACE and that are not well clustered with respect to the clustering index. Ideally, well clustered means that the record was inserted into a page that was within 16 pages of the ideal candidate page, determined by the clustering index. The SYSTABLESPACESTATS table value REORGUNCLUSTINS can be used to determine whether you need to run REORG after a series of inserts:
SELECT NAME FROM SYSIBM.SYSTABLESPACESTATS WHERE DBNAME = 'DB1' and ((REORGUNCLUSTINS*100)/TOTALROWS)>10
A DB2-supplied stored procedure can help with this process and, possibly, even work toward automating the whole determination/utility execution process. This stored procedure, DSNACCOR, is a sample procedure that will query the RTS tables and determine which objects need to be reorganized, image copied, updated with current statistics, have taken too many extents, and those that may in a restricted status. DSNACCOR creates and uses its own declared temporary tables and must run in a WLM address space. The output of the stored procedure provides recommendations by using a predetermined set of criteria in formulas that use the RTS and user input for their calculations. DSNACCOR can make recommendations for everything (COPY, REORG, RUNSTATS, EXTENTS, RESTRICT) or for one or more of your choice and for specific object types (table spaces and/or indexes).
STOSPACE collects space information from the VSAM catalogs for storage groups and related table spaces and indexes. This utility then updates the DB2 catalog with information about the amount of space being used. This utility is executed against a storage group. (For more information about storage groups, refer to Chapter 4.)
An example of STOSPACE syntax follows:
STOSPACE STOGROUP CERTSTG