MDC Performance Guidelines | Advanced DBA Certification Guide and Reference for DB2 Universal Database v8 for Linux, UNIX, and Windows

The following section will describe tips and techniques for understanding the performance characteristics of workloads using MDC tables that are different from regular table design choices. These performance aspects consist of the following:

Query processing and validating query plans
Choosing RID indexes to complement the dimension block indexes
Performance tuning parameters for queries, inserts , and load
Limitations and troubleshooting tips

Query Processing Overview

In general, the query workload should take advantage of the multi-dimensional clustering by using block indexes effectively. Block indexes can be used for index scans , index ANDing, index ORing, nested loop joins, star joins, hash joins, and group by clauses.

Block Index Scans

If the table PARTS is created with the dimensions SHIPPEDDATE and GENPART, consider queries with predicates in the WHERE clause of the form:

SHIPPEDDATE = '2002-11-11'
SHIPPEDDATE > '2002-11-11'
SHIPPEDDATE < '2002-11-11'
SHIPPEDDATE BETWEEN '2002-11-01' and '2002-11-30'

All of these predicates are perfect candidates for using the block index defined on the SHIPPEDDATE column. The block index scans should provide performance benefits because the units of I/O and processing are in blocks, or extents, and the size of these indexes is usually much more compact than corresponding RID indexes. In addition, only the first page of the block will need to be checked for valid data, not the entire block.

For the statement:

  select SHIPPEDDATE from parts where SHIPPEDDATE < '2002-11-05'

the access plan looks like the following:

  Access Plan:   -----------   Total Cost:   438.915   Query Degree:  1   Rows   RETURN   (   1)   Cost   I/O     14   FETCH   (   2)   438.915   112   /----+----\   3.5            16   IXSCAN     TABLE: USER1   (   3)    PARTS   0.0061235       16   INDEX: SYSIBM   SQL0211171657071

In this example, the block index is scanned to determine the block(s) that contain the qualifying rows, then the blocks of the table are scanned to read the qualifying rows of data.

NOTE

Notice that the index name above is SQL0211171657071 because it was system generated when the table was created. For ease of examining EXPLAIN information, etc., it is normally a good idea to rename these indexes to make the name more easily understood / remembered .

Block Index ANDing

For an MDC table that contains two or more dimensions, the query workload should contain query predicates that are selective on two or more attributes. For example, the PARTS table has dimensions on SHIPPEDDATE, GENPART, and DISCOUNT. If a query contains predicates of the form:

SHIPPEDDATE > '2002-11-01' AND DISCOUNT > 0.3

PARTNUM between 100 and 200 AND DISCOUNT = 0.4

then access plans can combine the block indexes, using index ANDing.

NOTE

It is possible for an index scan on one dimension to be selected as well.

Block Index ORing

For the PARTS table defined with three dimensions on SHIPPEDDATE, GENPART, and DISCOUNT, queries containing predicates of the form shown below are candidates for Block Index ORing:

SHIPPEDDATE > '2002-11-01' OR DISCOUNT = 0.3

DISCOUNT IN (0.1, 0.2, 0.3)

Combining Dimension Block Indexes and Record Indexes

MDC tables also allow all of the flexibilities available to regular tables. Regular indexes (RID indexes) can also be created on MDC tables to complement the dimension block indexes. These indexes can be combined with block indexes to perform index ANDing and ORing. For example, if the PARTS table has dimensions on SHIPPEDDATE and DISCOUNT, and a RID index on PARTNUM, then predicates such as the following can use index ANDing and ORing techniques combining block and RID indexes:

SHIPPEDDATE > '2002-11-01' AND PARTNUM=101
DISCOUNT < 0.1 OR PARTNUM between 100 and 200

The following are some guidelines for defining RID indexes to complement block indexes:

Define RID indexes on attributes or sets of attributes that have not been chosen as dimensions.
If the dimension for the PARTS table had been defined on only the SHIPPEDDATE column, then PARTNUM and DISCOUNT are candidates for defining RID indexes.
RID indexes can be defined on attributes with large column cardinalities that appear in equality predicates.
If the SHIPMODE column on the PARTS table has a small number of distinct values, it would not be a good candidate for a dimension. However, if many queries use the SHIPMODE as a predicate (i.e., SHIPMODE='Ground'), then a RID index would be useful.
In some cases, a RID index on a rolled-up dimension column can also benefit the query workload. For example, if the PARTNUM column is rolled up to GENPART, as above, a RID index can be defined on PARTNUM for queries with equality predicates such as 'PARTNUM = 1200', whereas queries with range predicates such as 'PARTNUM > 1000' can use the dimension index.

Index-Only Access Restrictions on Block Indexes

Because dimension block indexes can occasionally point to empty blocks, it is not possible to just look at the index key entries for clauses such as EXISTS, MIN, MAX, or DISTINCT. The query plan for these types of clauses on an MDC table case will read the data in the block to verify the presence of a row before proceeding. This can result in degradation in the query performance.

If such queries are likely to occur in the query workload, then RID indexes should be created on the referenced columns to take advantage of index-only access.

Monotonicity

The derivation of block-level range predicates from column-level range predicates can only be done for columns that have been generated using a monotonic expression.

The MONTH function is not monotonic because the range of dates between Sept 2002 and Mar 2003, although increasing, correspond to the month values between 09 and 03, which is decreasing . However, the expression integer(DATE)/100 generates values 200209 and 200303, which increase as the date increases ; therefore, this expression is monotonic.

If the optimizer determines that the expression involved in a generated column is not monotonic, it will not be able to generate corresponding predicates on the generated column for range predicates on the base column, but it will still be able to generate predicates on the generated column for equality and IN predicates.

NOTE

Monotonic means that an increasing range of values on the base column corresponds to a range of values on the generated column that is never decreasing.

NOTE

In some cases, the optimizer will not be able to determine definitively whether an expression is monotonic, and in these cases, it will have to assume that the expression is not monotonic.