How to Use the SYSSTAT Tables to Perform What-if Modeling and Analysis

Because DB2 maintains the SYSIBM and SYSCAT system catalog tables and views, they are read-only. However, you can update the SYSSTAT views. This is a special set of views that you can use to update database statistics. The information contained in these views affects how the DB2 optimizer chooses access plans when executing a query.

For planning purposes, you can change the statistical information in the SYSSTAT tables so that they do not reflect the actual state of tables and indexes. This lets you:

Model query performance on a development system using production system statistics.
Perform what-if query performance analysis by examining various possible changes to the query access plan.

You must have explicit DBADM authority for the database to modify statistics for tables and indexes and their components. That is, your user ID must have DBADM authority in the SYSCAT.DBAUTH table.

Table D.2 provides information about the system catalog tables that contain catalog statistics and the RUNSTATS options that collect specific statistics.

Table D.2. Table Statistics (SYSCAT.TABLES and SYSSTAT.TABLES)
		RUNSTATS Option
Statistic	Description	Table	Indexes
FPAGES	Number of pages being used by a table.	Yes	Yes
NPAGES	Number of pages containing rows.	Yes	Yes
OVERFLOW	Number of rows that overflow.	Yes	No
CARD	Number of rows in a table (cardinality).	Yes	Yes^[1]
ACTIVE_BLOCKS	For MDC tables, the total number of occupied blocks.	Yes	No

^[1] If the table does not have any indexes defined and you request statistics for indexes, no new CARD statistics are updated. The previous CARD statistics are retained.

The Yes or No in the RUNSTATS Option column indicates whether you need to execute the RUNSTATS command on the table, the indexes, or both to collect the statistics specified in the Statistic column. For example, if you want to collect the statistics on FPAGES, you must execute the RUNSTATS command on both the table and indexes. On the other hand, if you want to collect the statistics for OVERFLOW, you need to execute the RUNSTATS command on the table only.

Let's walk you through a what-if scenario: If the EMPLOYEE table had a lot more rows that it has now, which access plan would the DB2 optimizer choose?

First, you collect the statistics on the EMPLOYEE table using the RUNSTATS command:

 RUNSTATS ON TABLE SYLVIAQ.EMPLOYEE

After it is completed, obtain the statistics by querying the SYSSTAT.TABLES view:

 SELECT * FROM SYSSTAT.TABLES WHERE TABSCHEMA = 'SYLVIAQ' AND TABNAME = 'EMPLOYEE'

Figure D.8 shows the output.

Figure D.8. Statistics for the EMPLOYEE table

In Figure D.8, the CARD column indicates that the EMPLOYEE table currently has 32 rows. To update the statistics for the EMPLOYEE table to reflect a bigger table, issue:

 UPDATE SYSSTAT.TABLES    SET CARD   = 10000,        NPAGES = 1000,        FPAGES = 1000,        OVERFLOW = 2    WHERE TABSCHEMA = 'SYLVIAQ' AND TABNAME = 'EMPLOYEE'

After this is completed, you can run your query against the EMPLOYEE table and get the access plan in text-based format using the db2exfmt command, or in graphic format using the Visual Explain GUI tool (see Chapter 4, Using the DB2 Tools).

You must be careful when manually updating catalog statistics: arbitrary changes can seriously affect the performance of subsequent queries. You can use any of the following methods to revert your changes back.

ROLLBACK the unit of work in which the changes have been made (assuming the unit of work has not been committed).
Use the RUNSTATS utility to recalculate and refresh the catalog statistics.
Update the catalog statistics to indicate that statistics have not been gathered. (For example, setting column NPAGES to 1 indicates that the number-of-pages statistic has not been collected.)
Replace the catalog statistics with the data they contained before you made any changes. This method is possible only if you used the db2look command to capture the statistics before you made any changes.

In some cases, the optimizer may determine that some particular statistical value or combination of values is not valid, and it will use the default values and issue a warning. Such circumstances are rare, however, since most of the validation is done when updating the statistics.

Tables D.3 through D.8 briefly describe the rest of the updatable SYSSTAT views.

Table D.3. Column Statistics (SYSCAT.COLUMNS and SYSSTAT.COLUMNS)
		RUNSTATS Option
Statistic	Description	Table	Indexes
COLCARD	Column cardinality.	Yes	Yes^[1]
AVGCOLLEN	Average length of a column.	Yes	Yes^[1]
HIGH2KEY	Second highest value in a column.	Yes	Yes^[1]
LOW2KEY	Second lowest value in a column.	Yes	Yes^[1]
NUMNULLS	Number of NULLs in a column.	Yes	Yes^[1]
SUB_COUNT	Average number of subelements.	Yes	No^[2]
SUB_DELIM_LENGTH	Average length of each delimiter separating each subelement.	Yes	No^[2]

^[1] Column statistics are gathered for the first column in the index key.

^[2] These statistics provide information about data in columns that contain a series of subfields or subelements that are delimited by blanks. The SUB_COUNT and SUB_DELIM_LENGTH statistics are collected only for single-byte character set string columns of type CHAR, VARCHAR, GRAPHIC, and VARGRAPHIC.

Table D.4. Multicolumn Statistics (SYSCAT.COLGROUPS and SYSSTAT.COLGROUPS)
		RUNSTATS Option
Statistic	Description	Table	Indexes
COLGROUPCARD	Cardinality of the column group.	Yes	No

The multicolumn distribution statistics listed in Tables D.5 and D.6 are not collected by RUNSTATS. You can update them manually, however.

Table D.5. Multicolumn Distribution Statistics (SYSCAT.COLGROUPDIST and SYSSTAT.COLGROUPDIST)
		RUNSTATS Option
Statistic	Description	Table	Indexes
TYPE	F = Frequency value. Q = Quantile value.	Yes	No
ORDINAL	Ordinal number of the column in the group.	Yes	No
SEQNO	Sequence number n that represents the nth TYPE value.	Yes	No
COLVALUE	The data value as a character literal or a null value.	Yes	No

Table D.6. Multicolumn Distribution Statistics 2 (SYSCAT.COLGROUPDISTCOUNTS and SYSSTAT.COLGROUPDISTCOUNTS)
		RUNSTATS Option
Statistic	Description	Table	Indexes
TYPE	F = Frequency value. Q = Quantile value.	Yes	No
SEQNO	Sequence number n that represents the nth TYPE value.	Yes	No
VALCOUNT	If TYPE = F, VALCOUNT is the number of occurrences of COLVALUEs for the column group identified by this SEQNO. If TYPE = Q, VALCOUNT is the number of rows whose value is less than or equal to COLVALUEs for the column group with this SEQNO.	Yes	No
DISTCOUNT	If TYPE = Q, this column contains the number of distinct values that are less than or equal to COLVALUEs for the column group with this SEQNO. Null if unavailable.	Yes	No

Table D.7. Index Statistics (SYSCAT.INDEXES and SYSSTAT.INDEXES)
		RUNSTATS Option
Statistic	Description	Table	Indexes
NLEAF	Number of index leaf pages.	No	Yes
NLEVELS	Number of index levels.	No	Yes
CLUSTERRATIO	Degree of clustering of table data.	No	Yes^[2]
CLUSTERFACTOR	Finer degree of clustering.	No	See ^[1], ^[2]
DENSITY	Ratio (percentage) of SEQUENTIAL_ PAGES to the number of pages in the range of pages occupied by the index.^[3]	No	Yes
FIRSTKEYCARD	Number of distinct values in the first column of the index.	No	Yes
FIRST2KEYCARD	Number of distinct values in the first two columns of the index.	No	Yes
FIRST3KEYCARD	Number of distinct values in the first three columns of the index.	No	Ye
FIRST4KEYCARD	Number of distinct values in the first four columns of the index.	No	Yes
FULLKEYCARD	Number of distinct values in all columns of the index, excluding any key value in a type-2 index for which all record identifiers (RIDs) are marked deleted.	No	Yes
PAGE_FETCH_PAIRS	Page fetch estimates for different buffer sizes.	No	See ^[1], ^[2]
SEQUENTIAL_PAGES	Number of leaf pages located on disk in index key order, with few or no large gaps between them.	No	Yes
AVERAGE_SEQUENCE_PAGES	Average number of index pages accessible in sequence. This is the number of index pages that the prefetchers can detect as being in sequence.	No	Yes
AVERAGE_RANDOM_PAGES	Average number of random index pages between sequential page accesses.	No	Yes
AVERAGE_SEQUENCE_GAP	Gap between sequences.	No	Yes
AVERAGE_SEQUENCE_FETCH_PAGES	Average number of table pages accessible in sequence. This is the number of table pages that the prefetchers can detect as being in sequence when they fetch table rows using the index.	No	Yes^[4]
AVERAGE_RANDOM_FETCH_PAGES	Average number of random table pages between sequential page accesses when fetching table rows using the index.	No	Yes^[4]
AVERAGE_SEQUENCE_FETCH_GAP	Gap between sequences when fetching table rows using the index.	No	Yes^[4]
NUMRIDS	Number of record identifiers (RIDs) in the index, including deleted RIDs in type-2 indexes.	No	Yes
NUMRIDS_DELETED	Total number of RIDs marked deleted in the index, except RIDs on leaf pages on which all record identifiers are marked deleted.	No	Yes
NUM_EMPTY_LEAFS	Total number of leaf pages on which all record identifiers are marked deleted.	No	Yes

^[2] CLUSTERFACTOR and PAGE_FETCH_PAIRS are not collected with the DETAILED clause unless the table is of a respectable size. If the table is greater than about 25 pages, then CLUSTERFACTOR and PAGE_FETCH_ PAIRS statistics are collected. In this case, CLUSTERRATIO is 1 (not collected). If the table is a relatively small table, only CLUSTERRATIO is filled in by RUNSTATS while CLUSTERFACTOR and PAGE_FETCH_PAIRS are not. If the DETAILED clause is not specified, only the CLUSTERRATIO statistic is collected.

^[1] Detailed index statistics are gathered by specifying the DETAILED clause on the RUNSTATS command.

^[3] This statistic measures the percentage of pages in the file containing the index that belongs to that table. For a table having only one defined index, DENSITY should normally be 100. DENSITY is used by the optimizer to estimate how many irrelevant pages from other indexes might be read, on average, if the index pages were prefetched.

^[4] These statistics cannot be computed when this table is in a DMS table space.

Column distribution statistics (listed in Table D.8) are gathered by specifying the WITH DISTRIBUTION clause on the RUNSTATS command. Note that distribution statistics cannot be gathered unless there is a sufficient lack of uniformity in the column values.

Table D.8. Column Distribution Statistics (SYSCAT.COLDIST and SYSSTAT.COLDIST)
		RUNSTATS Option
Statistic	Description	Table	Indexes
DISTCOUNT	If TYPE = Q, the number of distinct values that are less than or equal to COLVALUE statistics.	Distribution^[1]	No
TYPE	Indicator of whether row provides frequent-value or quantile statistics.	Distribution	No
SEQNO	Frequency ranking of a sequence number to help uniquely identify the row in the table.	Distribution	No
COLVALUE	Data value for which frequency or quantile statistics is collected.	Distribution	No
VALCOUNT	Frequency with which the data value occurs in columns. For quantiles, the number of values is less than or equal to the data value (COLVALUE).	Distribution	No

^[1] DISTCOUNT is collected only for columns that are the first key column in an index.

Table D.9. Function Statistics (SYSCAT.FUNCTIONS and SYSSTAT.FUNCTIONS)
Statistic	Description
IOS_PER_INVOC	Estimated number of read/write requests executed each time a function is executed.
INSTS_PER_INVOC	Estimated number of machine instructions executed each time a function is executed.
IOS_PER_ARGBYTE	Estimated number of read/write requests executed per input argument byte.
INSTS_PER_ARGBYTES	Estimated number of machine instructions executed per input argument byte.
PERCENT_ARGBYTES	Estimated average percent of input argument bytes that the function will actually process.
INITIAL_IOS	Estimated number of read/write requests executed only the first/last time the function is invoked.
INITIAL_INSTS	Estimated number of machine instructions executed only the first/last time the function is invoked.
CARDINALITY	Estimated number of rows generated by a table function.

Table D.2. Table Statistics (SYSCAT.TABLES and SYSSTAT.TABLES)

Figure D.8. Statistics for the EMPLOYEE table

Table D.3. Column Statistics (SYSCAT.COLUMNS and SYSSTAT.COLUMNS)

Table D.4. Multicolumn Statistics (SYSCAT.COLGROUPS and SYSSTAT.COLGROUPS)

Table D.5. Multicolumn Distribution Statistics (SYSCAT.COLGROUPDIST and SYSSTAT.COLGROUPDIST)

Table D.6. Multicolumn Distribution Statistics 2 (SYSCAT.COLGROUPDISTCOUNTS and SYSSTAT.COLGROUPDISTCOUNTS)

Table D.7. Index Statistics (SYSCAT.INDEXES and SYSSTAT.INDEXES)

Table D.8. Column Distribution Statistics (SYSCAT.COLDIST and SYSSTAT.COLDIST)

Table D.9. Function Statistics (SYSCAT.FUNCTIONS and SYSSTAT.FUNCTIONS)