How to Use the SYSSTAT Tables to Perform What-if Modeling and Analysis


Because DB2 maintains the SYSIBM and SYSCAT system catalog tables and views, they are read-only. However, you can update the SYSSTAT views. This is a special set of views that you can use to update database statistics. The information contained in these views affects how the DB2 optimizer chooses access plans when executing a query.

For planning purposes, you can change the statistical information in the SYSSTAT tables so that they do not reflect the actual state of tables and indexes. This lets you:

  • Model query performance on a development system using production system statistics.

  • Perform what-if query performance analysis by examining various possible changes to the query access plan.

You must have explicit DBADM authority for the database to modify statistics for tables and indexes and their components. That is, your user ID must have DBADM authority in the SYSCAT.DBAUTH table.

Table D.2 provides information about the system catalog tables that contain catalog statistics and the RUNSTATS options that collect specific statistics.

Table D.2. Table Statistics (SYSCAT.TABLES and SYSSTAT.TABLES)
  

RUNSTATS Option

Statistic

Description

Table

Indexes

FPAGES

Number of pages being used by a table.

Yes

Yes

NPAGES

Number of pages containing rows.

Yes

Yes

OVERFLOW

Number of rows that overflow.

Yes

No

CARD

Number of rows in a table (cardinality).

Yes

Yes[1]

ACTIVE_BLOCKS

For MDC tables, the total number of occupied blocks.

Yes

No


[1] If the table does not have any indexes defined and you request statistics for indexes, no new CARD statistics are updated. The previous CARD statistics are retained.

The Yes or No in the RUNSTATS Option column indicates whether you need to execute the RUNSTATS command on the table, the indexes, or both to collect the statistics specified in the Statistic column. For example, if you want to collect the statistics on FPAGES, you must execute the RUNSTATS command on both the table and indexes. On the other hand, if you want to collect the statistics for OVERFLOW, you need to execute the RUNSTATS command on the table only.

Let's walk you through a what-if scenario: If the EMPLOYEE table had a lot more rows that it has now, which access plan would the DB2 optimizer choose?

First, you collect the statistics on the EMPLOYEE table using the RUNSTATS command:

 RUNSTATS ON TABLE SYLVIAQ.EMPLOYEE 

After it is completed, obtain the statistics by querying the SYSSTAT.TABLES view:

 SELECT * FROM SYSSTAT.TABLES WHERE TABSCHEMA = 'SYLVIAQ' AND TABNAME = 'EMPLOYEE' 

Figure D.8 shows the output.

Figure D.8. Statistics for the EMPLOYEE table


In Figure D.8, the CARD column indicates that the EMPLOYEE table currently has 32 rows. To update the statistics for the EMPLOYEE table to reflect a bigger table, issue:

 UPDATE SYSSTAT.TABLES    SET CARD   = 10000,        NPAGES = 1000,        FPAGES = 1000,        OVERFLOW = 2    WHERE TABSCHEMA = 'SYLVIAQ' AND TABNAME = 'EMPLOYEE' 

After this is completed, you can run your query against the EMPLOYEE table and get the access plan in text-based format using the db2exfmt command, or in graphic format using the Visual Explain GUI tool (see Chapter 4, Using the DB2 Tools).

You must be careful when manually updating catalog statistics: arbitrary changes can seriously affect the performance of subsequent queries. You can use any of the following methods to revert your changes back.

  • ROLLBACK the unit of work in which the changes have been made (assuming the unit of work has not been committed).

  • Use the RUNSTATS utility to recalculate and refresh the catalog statistics.

  • Update the catalog statistics to indicate that statistics have not been gathered. (For example, setting column NPAGES to 1 indicates that the number-of-pages statistic has not been collected.)

  • Replace the catalog statistics with the data they contained before you made any changes. This method is possible only if you used the db2look command to capture the statistics before you made any changes.

In some cases, the optimizer may determine that some particular statistical value or combination of values is not valid, and it will use the default values and issue a warning. Such circumstances are rare, however, since most of the validation is done when updating the statistics.

Tables D.3 through D.8 briefly describe the rest of the updatable SYSSTAT views.

Table D.3. Column Statistics (SYSCAT.COLUMNS and SYSSTAT.COLUMNS)
  

RUNSTATS Option

Statistic

Description

Table

Indexes

COLCARD

Column cardinality.

Yes

Yes[1]

AVGCOLLEN

Average length of a column.

Yes

Yes[1]

HIGH2KEY

Second highest value in a column.

Yes

Yes[1]

LOW2KEY

Second lowest value in a column.

Yes

Yes[1]

NUMNULLS

Number of NULLs in a column.

Yes

Yes[1]

SUB_COUNT

Average number of subelements.

Yes

No[2]

SUB_DELIM_LENGTH

Average length of each delimiter separating each subelement.

Yes

No[2]


[1] Column statistics are gathered for the first column in the index key.

[2] These statistics provide information about data in columns that contain a series of subfields or subelements that are delimited by blanks. The SUB_COUNT and SUB_DELIM_LENGTH statistics are collected only for single-byte character set string columns of type CHAR, VARCHAR, GRAPHIC, and VARGRAPHIC.

Table D.4. Multicolumn Statistics (SYSCAT.COLGROUPS and SYSSTAT.COLGROUPS)
  

RUNSTATS Option

Statistic

Description

Table

Indexes

COLGROUPCARD

Cardinality of the column group.

Yes

No


The multicolumn distribution statistics listed in Tables D.5 and D.6 are not collected by RUNSTATS. You can update them manually, however.

Table D.5. Multicolumn Distribution Statistics (SYSCAT.COLGROUPDIST and SYSSTAT.COLGROUPDIST)
  

RUNSTATS Option

Statistic

Description

Table

Indexes

TYPE

F = Frequency value.

Q = Quantile value.

Yes

No

ORDINAL

Ordinal number of the column in the group.

Yes

No

SEQNO

Sequence number n that represents the nth TYPE value.

Yes

No

COLVALUE

The data value as a character literal or a null value.

Yes

No


Table D.6. Multicolumn Distribution Statistics 2 (SYSCAT.COLGROUPDISTCOUNTS and SYSSTAT.COLGROUPDISTCOUNTS)
  

RUNSTATS Option

Statistic

Description

Table

Indexes

TYPE

F = Frequency value.

Q = Quantile value.

Yes

No

SEQNO

Sequence number n that represents the nth TYPE value.

Yes

No

VALCOUNT

If TYPE = F, VALCOUNT is the number of occurrences of COLVALUEs for the column group identified by this SEQNO.

If TYPE = Q, VALCOUNT is the number of rows whose value is less than or equal to COLVALUEs for the column group with this SEQNO.

Yes

No

DISTCOUNT

If TYPE = Q, this column contains the number of distinct values that are less than or equal to COLVALUEs for the column group with this SEQNO. Null if unavailable.

Yes

No


Table D.7. Index Statistics (SYSCAT.INDEXES and SYSSTAT.INDEXES)
  

RUNSTATS Option

Statistic

Description

Table

Indexes

NLEAF

Number of index leaf pages.

No

Yes

NLEVELS

Number of index levels.

No

Yes

CLUSTERRATIO

Degree of clustering of table data.

No

Yes[2]

CLUSTERFACTOR

Finer degree of clustering.

No

See [1], [2]

DENSITY

Ratio (percentage) of SEQUENTIAL_ PAGES to the number of pages in the range of pages occupied by the index.[3]

No

Yes

FIRSTKEYCARD

Number of distinct values in the first column of the index.

No

Yes

FIRST2KEYCARD

Number of distinct values in the first two columns of the index.

No

Yes

FIRST3KEYCARD

Number of distinct values in the first three columns of the index.

No

Ye

FIRST4KEYCARD

Number of distinct values in the first four columns of the index.

No

Yes

FULLKEYCARD

Number of distinct values in all columns of the index, excluding any key value in a type-2 index for which all record identifiers (RIDs) are marked deleted.

No

Yes

PAGE_FETCH_PAIRS

Page fetch estimates for different buffer sizes.

No

See [1], [2]

SEQUENTIAL_PAGES

Number of leaf pages located on disk in index key order, with few or no large gaps between them.

No

Yes

AVERAGE_SEQUENCE_PAGES

Average number of index pages accessible in sequence. This is the number of index pages that the prefetchers can detect as being in sequence.

No

Yes

AVERAGE_RANDOM_PAGES

Average number of random index pages between sequential page accesses.

No

Yes

AVERAGE_SEQUENCE_GAP

Gap between sequences.

No

Yes

AVERAGE_SEQUENCE_FETCH_PAGES

Average number of table pages accessible in sequence. This is the number of table pages that the prefetchers can detect as being in sequence when they fetch table rows using the index.

No

Yes[4]

AVERAGE_RANDOM_FETCH_PAGES

Average number of random table pages between sequential page accesses when fetching table rows using the index.

No

Yes[4]

AVERAGE_SEQUENCE_FETCH_GAP

Gap between sequences when fetching table rows using the index.

No

Yes[4]

NUMRIDS

Number of record identifiers (RIDs) in the index, including deleted RIDs in type-2 indexes.

No

Yes

NUMRIDS_DELETED

Total number of RIDs marked deleted in the index, except RIDs on leaf pages on which all record identifiers are marked deleted.

No

Yes

NUM_EMPTY_LEAFS

Total number of leaf pages on which all record identifiers are marked deleted.

No

Yes


[2] CLUSTERFACTOR and PAGE_FETCH_PAIRS are not collected with the DETAILED clause unless the table is of a respectable size. If the table is greater than about 25 pages, then CLUSTERFACTOR and PAGE_FETCH_ PAIRS statistics are collected. In this case, CLUSTERRATIO is 1 (not collected). If the table is a relatively small table, only CLUSTERRATIO is filled in by RUNSTATS while CLUSTERFACTOR and PAGE_FETCH_PAIRS are not. If the DETAILED clause is not specified, only the CLUSTERRATIO statistic is collected.

[1] Detailed index statistics are gathered by specifying the DETAILED clause on the RUNSTATS command.

[3] This statistic measures the percentage of pages in the file containing the index that belongs to that table. For a table having only one defined index, DENSITY should normally be 100. DENSITY is used by the optimizer to estimate how many irrelevant pages from other indexes might be read, on average, if the index pages were prefetched.

[4] These statistics cannot be computed when this table is in a DMS table space.

Column distribution statistics (listed in Table D.8) are gathered by specifying the WITH DISTRIBUTION clause on the RUNSTATS command. Note that distribution statistics cannot be gathered unless there is a sufficient lack of uniformity in the column values.

Table D.8. Column Distribution Statistics (SYSCAT.COLDIST and SYSSTAT.COLDIST)
  

RUNSTATS Option

Statistic

Description

Table

Indexes

DISTCOUNT

If TYPE = Q, the number of distinct values that are less than or equal to COLVALUE statistics.

Distribution[1]

No

TYPE

Indicator of whether row provides frequent-value or quantile statistics.

Distribution

No

SEQNO

Frequency ranking of a sequence number to help uniquely identify the row in the table.

Distribution

No

COLVALUE

Data value for which frequency or quantile statistics is collected.

Distribution

No

VALCOUNT

Frequency with which the data value occurs in columns. For quantiles, the number of values is less than or equal to the data value (COLVALUE).

Distribution

No


[1] DISTCOUNT is collected only for columns that are the first key column in an index.

Table D.9. Function Statistics (SYSCAT.FUNCTIONS and SYSSTAT.FUNCTIONS)

Statistic

Description

IOS_PER_INVOC

Estimated number of read/write requests executed each time a function is executed.

INSTS_PER_INVOC

Estimated number of machine instructions executed each time a function is executed.

IOS_PER_ARGBYTE

Estimated number of read/write requests executed per input argument byte.

INSTS_PER_ARGBYTES

Estimated number of machine instructions executed per input argument byte.

PERCENT_ARGBYTES

Estimated average percent of input argument bytes that the function will actually process.

INITIAL_IOS

Estimated number of read/write requests executed only the first/last time the function is invoked.

INITIAL_INSTS

Estimated number of machine instructions executed only the first/last time the function is invoked.

CARDINALITY

Estimated number of rows generated by a table function.




Understanding DB2(R. Learning Visually with Examples)
Understanding DB2: Learning Visually with Examples (2nd Edition)
ISBN: 0131580183
EAN: 2147483647
Year: 2004
Pages: 313

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net