Because DB2 maintains the SYSIBM and SYSCAT system catalog tables and views, they are read-only. However, you can update the SYSSTAT views. This is a special set of views that you can use to update database statistics. The information contained in these views affects how the DB2 optimizer chooses access plans when executing a query. For planning purposes, you can change the statistical information in the SYSSTAT tables so that they do not reflect the actual state of tables and indexes. This lets you: You must have explicit DBADM authority for the database to modify statistics for tables and indexes and their components. That is, your user ID must have DBADM authority in the SYSCAT.DBAUTH table. Table D.2 provides information about the system catalog tables that contain catalog statistics and the RUNSTATS options that collect specific statistics. Table D.2. Table Statistics (SYSCAT.TABLES and SYSSTAT.TABLES) | | RUNSTATS Option |
---|
Statistic | Description | Table | Indexes |
---|
FPAGES | Number of pages being used by a table. | Yes | Yes | NPAGES | Number of pages containing rows. | Yes | Yes | OVERFLOW | Number of rows that overflow. | Yes | No | CARD | Number of rows in a table (cardinality). | Yes | Yes[1] | ACTIVE_BLOCKS | For MDC tables, the total number of occupied blocks. | Yes | No |
[1] If the table does not have any indexes defined and you request statistics for indexes, no new CARD statistics are updated. The previous CARD statistics are retained. The Yes or No in the RUNSTATS Option column indicates whether you need to execute the RUNSTATS command on the table, the indexes, or both to collect the statistics specified in the Statistic column. For example, if you want to collect the statistics on FPAGES, you must execute the RUNSTATS command on both the table and indexes. On the other hand, if you want to collect the statistics for OVERFLOW, you need to execute the RUNSTATS command on the table only. Let's walk you through a what-if scenario: If the EMPLOYEE table had a lot more rows that it has now, which access plan would the DB2 optimizer choose? First, you collect the statistics on the EMPLOYEE table using the RUNSTATS command: RUNSTATS ON TABLE SYLVIAQ.EMPLOYEE After it is completed, obtain the statistics by querying the SYSSTAT.TABLES view: SELECT * FROM SYSSTAT.TABLES WHERE TABSCHEMA = 'SYLVIAQ' AND TABNAME = 'EMPLOYEE' Figure D.8 shows the output. Figure D.8. Statistics for the EMPLOYEE table 
In Figure D.8, the CARD column indicates that the EMPLOYEE table currently has 32 rows. To update the statistics for the EMPLOYEE table to reflect a bigger table, issue: UPDATE SYSSTAT.TABLES SET CARD = 10000, NPAGES = 1000, FPAGES = 1000, OVERFLOW = 2 WHERE TABSCHEMA = 'SYLVIAQ' AND TABNAME = 'EMPLOYEE' After this is completed, you can run your query against the EMPLOYEE table and get the access plan in text-based format using the db2exfmt command, or in graphic format using the Visual Explain GUI tool (see Chapter 4, Using the DB2 Tools). You must be careful when manually updating catalog statistics: arbitrary changes can seriously affect the performance of subsequent queries. You can use any of the following methods to revert your changes back. ROLLBACK the unit of work in which the changes have been made (assuming the unit of work has not been committed). Use the RUNSTATS utility to recalculate and refresh the catalog statistics. Update the catalog statistics to indicate that statistics have not been gathered. (For example, setting column NPAGES to 1 indicates that the number-of-pages statistic has not been collected.) Replace the catalog statistics with the data they contained before you made any changes. This method is possible only if you used the db2look command to capture the statistics before you made any changes. In some cases, the optimizer may determine that some particular statistical value or combination of values is not valid, and it will use the default values and issue a warning. Such circumstances are rare, however, since most of the validation is done when updating the statistics. Tables D.3 through D.8 briefly describe the rest of the updatable SYSSTAT views. Table D.3. Column Statistics (SYSCAT.COLUMNS and SYSSTAT.COLUMNS) | | RUNSTATS Option |
---|
Statistic | Description | Table | Indexes |
---|
COLCARD | Column cardinality. | Yes | Yes[1] | AVGCOLLEN | Average length of a column. | Yes | Yes[1] | HIGH2KEY | Second highest value in a column. | Yes | Yes[1] | LOW2KEY | Second lowest value in a column. | Yes | Yes[1] | NUMNULLS | Number of NULLs in a column. | Yes | Yes[1] | SUB_COUNT | Average number of subelements. | Yes | No[2] | SUB_DELIM_LENGTH | Average length of each delimiter separating each subelement. | Yes | No[2] |
[1] Column statistics are gathered for the first column in the index key.
[2] These statistics provide information about data in columns that contain a series of subfields or subelements that are delimited by blanks. The SUB_COUNT and SUB_DELIM_LENGTH statistics are collected only for single-byte character set string columns of type CHAR, VARCHAR, GRAPHIC, and VARGRAPHIC. Table D.4. Multicolumn Statistics (SYSCAT.COLGROUPS and SYSSTAT.COLGROUPS) | | RUNSTATS Option |
---|
Statistic | Description | Table | Indexes |
---|
COLGROUPCARD | Cardinality of the column group. | Yes | No |
The multicolumn distribution statistics listed in Tables D.5 and D.6 are not collected by RUNSTATS. You can update them manually, however. Table D.5. Multicolumn Distribution Statistics (SYSCAT.COLGROUPDIST and SYSSTAT.COLGROUPDIST) | | RUNSTATS Option |
---|
Statistic | Description | Table | Indexes |
---|
TYPE | F = Frequency value. Q = Quantile value. | Yes | No | ORDINAL | Ordinal number of the column in the group. | Yes | No | SEQNO | Sequence number n that represents the nth TYPE value. | Yes | No | COLVALUE | The data value as a character literal or a null value. | Yes | No |
Table D.6. Multicolumn Distribution Statistics 2 (SYSCAT.COLGROUPDISTCOUNTS and SYSSTAT.COLGROUPDISTCOUNTS) | | RUNSTATS Option |
---|
Statistic | Description | Table | Indexes |
---|
TYPE | F = Frequency value. Q = Quantile value. | Yes | No | SEQNO | Sequence number n that represents the nth TYPE value. | Yes | No | VALCOUNT | If TYPE = F, VALCOUNT is the number of occurrences of COLVALUEs for the column group identified by this SEQNO. If TYPE = Q, VALCOUNT is the number of rows whose value is less than or equal to COLVALUEs for the column group with this SEQNO. | Yes | No | DISTCOUNT | If TYPE = Q, this column contains the number of distinct values that are less than or equal to COLVALUEs for the column group with this SEQNO. Null if unavailable. | Yes | No |
Table D.7. Index Statistics (SYSCAT.INDEXES and SYSSTAT.INDEXES) | | RUNSTATS Option |
---|
Statistic | Description | Table | Indexes |
---|
NLEAF | Number of index leaf pages. | No | Yes | NLEVELS | Number of index levels. | No | Yes | CLUSTERRATIO | Degree of clustering of table data. | No | Yes[2] | CLUSTERFACTOR | Finer degree of clustering. | No | See [1], [2] | DENSITY | Ratio (percentage) of SEQUENTIAL_ PAGES to the number of pages in the range of pages occupied by the index.[3] | No | Yes | FIRSTKEYCARD | Number of distinct values in the first column of the index. | No | Yes | FIRST2KEYCARD | Number of distinct values in the first two columns of the index. | No | Yes | FIRST3KEYCARD | Number of distinct values in the first three columns of the index. | No | Ye | FIRST4KEYCARD | Number of distinct values in the first four columns of the index. | No | Yes | FULLKEYCARD | Number of distinct values in all columns of the index, excluding any key value in a type-2 index for which all record identifiers (RIDs) are marked deleted. | No | Yes | PAGE_FETCH_PAIRS | Page fetch estimates for different buffer sizes. | No | See [1], [2] | SEQUENTIAL_PAGES | Number of leaf pages located on disk in index key order, with few or no large gaps between them. | No | Yes | AVERAGE_SEQUENCE_PAGES | Average number of index pages accessible in sequence. This is the number of index pages that the prefetchers can detect as being in sequence. | No | Yes | AVERAGE_RANDOM_PAGES | Average number of random index pages between sequential page accesses. | No | Yes | AVERAGE_SEQUENCE_GAP | Gap between sequences. | No | Yes | AVERAGE_SEQUENCE_FETCH_PAGES | Average number of table pages accessible in sequence. This is the number of table pages that the prefetchers can detect as being in sequence when they fetch table rows using the index. | No | Yes[4] | AVERAGE_RANDOM_FETCH_PAGES | Average number of random table pages between sequential page accesses when fetching table rows using the index. | No | Yes[4] | AVERAGE_SEQUENCE_FETCH_GAP | Gap between sequences when fetching table rows using the index. | No | Yes[4] | NUMRIDS | Number of record identifiers (RIDs) in the index, including deleted RIDs in type-2 indexes. | No | Yes | NUMRIDS_DELETED | Total number of RIDs marked deleted in the index, except RIDs on leaf pages on which all record identifiers are marked deleted. | No | Yes | NUM_EMPTY_LEAFS | Total number of leaf pages on which all record identifiers are marked deleted. | No | Yes |
[2] CLUSTERFACTOR and PAGE_FETCH_PAIRS are not collected with the DETAILED clause unless the table is of a respectable size. If the table is greater than about 25 pages, then CLUSTERFACTOR and PAGE_FETCH_ PAIRS statistics are collected. In this case, CLUSTERRATIO is 1 (not collected). If the table is a relatively small table, only CLUSTERRATIO is filled in by RUNSTATS while CLUSTERFACTOR and PAGE_FETCH_PAIRS are not. If the DETAILED clause is not specified, only the CLUSTERRATIO statistic is collected.
[1] Detailed index statistics are gathered by specifying the DETAILED clause on the RUNSTATS command.
[3] This statistic measures the percentage of pages in the file containing the index that belongs to that table. For a table having only one defined index, DENSITY should normally be 100. DENSITY is used by the optimizer to estimate how many irrelevant pages from other indexes might be read, on average, if the index pages were prefetched.
[4] These statistics cannot be computed when this table is in a DMS table space. Column distribution statistics (listed in Table D.8) are gathered by specifying the WITH DISTRIBUTION clause on the RUNSTATS command. Note that distribution statistics cannot be gathered unless there is a sufficient lack of uniformity in the column values. Table D.8. Column Distribution Statistics (SYSCAT.COLDIST and SYSSTAT.COLDIST) | | RUNSTATS Option |
---|
Statistic | Description | Table | Indexes |
---|
DISTCOUNT | If TYPE = Q, the number of distinct values that are less than or equal to COLVALUE statistics. | Distribution[1] | No | TYPE | Indicator of whether row provides frequent-value or quantile statistics. | Distribution | No | SEQNO | Frequency ranking of a sequence number to help uniquely identify the row in the table. | Distribution | No | COLVALUE | Data value for which frequency or quantile statistics is collected. | Distribution | No | VALCOUNT | Frequency with which the data value occurs in columns. For quantiles, the number of values is less than or equal to the data value (COLVALUE). | Distribution | No |
[1] DISTCOUNT is collected only for columns that are the first key column in an index. Table D.9. Function Statistics (SYSCAT.FUNCTIONS and SYSSTAT.FUNCTIONS)Statistic | Description |
---|
IOS_PER_INVOC | Estimated number of read/write requests executed each time a function is executed. | INSTS_PER_INVOC | Estimated number of machine instructions executed each time a function is executed. | IOS_PER_ARGBYTE | Estimated number of read/write requests executed per input argument byte. | INSTS_PER_ARGBYTES | Estimated number of machine instructions executed per input argument byte. | PERCENT_ARGBYTES | Estimated average percent of input argument bytes that the function will actually process. | INITIAL_IOS | Estimated number of read/write requests executed only the first/last time the function is invoked. | INITIAL_INSTS | Estimated number of machine instructions executed only the first/last time the function is invoked. | CARDINALITY | Estimated number of rows generated by a table function. |
|