Index Guidelines

< Day Day Up >

Consider the following guidelines when planning the index strategy for your DB2 databases and applications.

Use Workload As a Basis for Indexing

The basic trick to creating the appropriate indexes is to analyze your workload and build indexes to match. Indexing one table at a time is unlikely to produce a batch of indexes that will match your actual workload. The trick is to choose a workload to optimize. Choose an important workload, such as month-end processing or another peak processing time. You might also choose a workload for a specific application or group of applications.

After you choose the workload, you will need to examine the queries in the workload, documenting each table that is accessed and the frequency of each query run during the workload. Favor building indexes to optimize the queries that run the most often. For example, it is more beneficial to create an index to optimize a query that runs 1,000 times a day, than for a query that runs 3 times a day.

Then, factor in the relative priority of each query. You might favor building indexes to optimize very important queries over other queries that might run more frequently. For example, a query ( Q1 ) that is only run 10 times a day might be more important than another query ( Q2 ) that runs 120 times a day. Why? Reasons might include the following:

Q1 is more important to the business than Q2 because it brings in more revenue, satisfies more customers, or reduces spending.
Q1 is used by an important business user , such as the CEO or perhaps your most important customers.
Q1 is required for regulatory or legal reasons.
The users of Q1 are willing to spend more to optimize their query than the users of Q2 ; of course, approach this scenario with caution, because financial clout should not be the only consideration for DB2 query optimization.

When you have sufficiently ranked each query in a particular workload you can begin to build indexes for the most important queries in the workload. At least the top 30% most critical processes should have optimal indexes built for them, taking into consideration the predicate filtering and sequencing required.

Be sure to consider each workload profile and its performance requirements as you embark on an index creation strategy.

Be Aware of Data Modification

Remember, indexing optimizes the performance of data access but degrades the performance of data modification. Every INSERT or DELETE performed on a table will also insert or delete the data from every index defined on the table. Additionally, an UPDATE to a column in a table will cause DB2 to update any index that references that column.

In general, keep adding indexes until the impact on data modification causes your applications to be able to meet the necessary performance and service levels. When modification performance becomes unacceptable, remove the last index (or indexes) created until the performance of modification becomes tolerable.

Uniqueness Recommendations

You can enforce the uniqueness of a column or a group of columns by creating a unique index on those columns . You can have more than one unique index per table.

It usually is preferable to enforce the uniqueness of columns by creating unique indexes, thereby allowing the DBMS to do the work. The alternative is to code uniqueness logic in an application program to do the same work that DB2 does automatically.

CAUTION

Remember: If security is liberal for application tables, ad hoc SQL users can modify table data without the application program, and thereby insert or update columns that should be unique to non-unique values. However, this cannot happen if a unique index is defined on the columns.

Create a Unique Index for Each Primary Key

Every primary key explicitly defined for a table must be associated with a corresponding unique index. If you do not create a unique index for a primary key, an incomplete key is defined for the table, making the table inaccessible.

Use `WHERE NOT NULL` to Allow Multiple Nulls in a `UNIQUE` Index

Specify the UNIQUE WHERE NOT NULL clause to enable multiple nulls to exist in a unique index. This is useful when an index contains at least one nullable column, but all non-null entries must be unique.

Create Indexes for Foreign Keys

Unless an index already exists for access reasons or the table is too small to be indexed, create an index for each foreign key defined for a table. Because DB2's referential integrity feature accesses data defined as a foreign key behind the scenes, it's a good idea to enhance the efficiency of this access by creating indexes.

When to Avoid Indexing

There are only a few situations when you should consider not defining indexes for a table. Consider avoiding indexing when the table is very small, that is, less than 100 or so pages. However, there are scenarios where even a small table can benefit from being indexed (for example, for uniqueness or for specific, high-performance access requirements).

Another scenario where indexing might not be advantageous is when the table has heavy insert and delete activity but is relatively small, that is, less than 200 or so pages.

A table also should not be indexed if it always is accessed with a scan ”in other words, if there is no conditional predicate access to the table.

When to Avoid Placing Columns in an Index

Sometimes you should not define indexes for columns. If the column is updated frequently and the table is less than 200 or so pages, consider avoiding placing the column in an index.

Avoid defining an index for a column if an index on the column exists that would make the new index redundant. For example, if an index exists on COL1 , COL2 in TABLE1 , a second index on COL1 only is redundant. An index on COL2 alone is not redundant because it is not the first column in the index.

When to Specify Extra Index Columns

When the column or columns to be indexed contain non-unique data, consider adding an extra column to increase the cardinality of the index. This reduces the index RID list and avoids chaining ”an inefficient method of processing index entries. Uniqueness can be gauged by determining the cardinality for the columns in the index. The cardinality for the columns is nothing more than the number of distinct values stored in the columns. If this number is small (for example, less than 10% of the total number of rows for the table), consider adding extra columns to the index. (A column's cardinality can be found in the DB2 Catalog using queries presented in Part IV, "DB2 Performance Monitoring.")

Control the Length of Index Keys

DB2 V8 permits indexes to be defined with keys of up to 2,000 bytes. The previous limit for index key length was 255.

CAUTION

The maximum size of a partitioning key is still 255 whether partitioning is index-controlled or table-controlled.

Of course, just because DB2 permits greater key lengths does not mean that you should pad more columns into your indexes. For example, just blindly adding every column to every index to support more index-only access is generally not a wise idea. Be sure that there is a reasoned, practical need for each column defined to an index.

CAUTION

To calculate the actual physical limit for index key length you must take nullable columns into account. Subtract 1 from the maximum length for every column that is nullable. So, if an index contains 5 columns, 3 of which can be set to null, the total length of the 5 columns can be no greater than 1997 (2000 “3 = 1997).

Indexing Large and Small Tables

For tables over 100 (or so) pages, it is best to define at least one index. If the table is very large (over 10,000 pages), try to limit the indexes to those that are absolutely necessary for adequate performance. When a large table has multiple indexes, data modification performance can suffer. When large tables lack indexes, however, access efficiency will suffer. This fragile balance must be monitored closely. In most situations, more indexes are better than fewer indexes because most applications are query-intensive rather than update- intensive .

For tables containing a small number of pages (up to 100 or so pages) consider limiting indexes to those required for uniqueness and perhaps to support common join criterion. Add indexes also when the performance of queries that access the table suffers. Test the performance of the query after the index is created, though, to ensure that the index helps. When you index a small table, increased I/O (due to index accesses) may cause performance to suffer when compared to a complete scan of all the data in the table.

Promote Index-Only Access

When an index contains all of the columns being selected in a query, DB2 can choose to use index-only access. With index-only access, DB2 will read all of the data required from the index without having to access the table space. Index-only access can reduce the number of I/Os required to satisfy a query. For example, consider the following query:

 SELECT  DEPTNAME, MGRNO FROM    DSN8810.DEPT WHERE   DEPTNO > 'D00';

Now, also consider that there is only one index on this table ”a two-column index on DEPTNO and MGRNO . DB2 will most likely choose to use this index to satisfy the DEPTNO > 'D00' predicate. But, if we add the DEPTNAME column to this index, DB2 can conceivably use this index for index-only access because all of the required columns are stored in the index.

So, it can be worthwhile to extend an index with an additional column or two to encourage index-only access. However, this practice should be deployed with care; you do not want to overload every index, because it will become unwieldy to manage.

When deciding whether to extend an index to encourage index-only access, be sure to consider the following factors:

Adding extra columns to an index will increase the size of the index requiring additional storage.
Adding extra columns to an index might increase the number of levels in the index, especially if the table is very large. Additional levels can degrade performance.
The sequencing of columns in a multi-column index is important and can significantly impact performance if chosen improperly.
Use caution when choosing the indexes to be overloaded; consider overloading indexes for index-only access only for the most important or performance-critical queries in your applications.

Multi-column Indexes

If a table has only multi-column indexes, try to specify the high-level column in the WHERE clause of your query. This action results in an index scan with at least one matching column.

A multi-column index can be used to scan data to satisfy a query in which the high-level column is not specified (but another column in the index is specified). However, a non-matching index scan of this sort is not as efficient as a matching index scan.

Consider Several Indexes Instead of a Multi-column Index

Because DB2 can utilize multiple indexes in an access path for a single SQL statement, multiple indexes might be more efficient (from a global perspective) than a single multi-column index. If access to the columns varies from query to query, multiple indexes might provide better overall performance for all your queries, at the expense of an individual query.

If you feel that multiple indexes might be of benefit for your specific situation, test their effectiveness first in a test environment by

Dropping the multi-column index
Creating a single index for each of the columns in the multi-column index
Updating DB2 Catalog statistics to indicate production volume
Running EXPLAIN on all the affected queries and analyzing the results

CAUTION

It is not common for the DB2 optimizer to choose multiple-index access in practice. Use caution before dropping a multi-column index in favor of multiple indexes because, after optimization, DB2 might choose to use only one of the indexes instead of a multiple index strategy.

Multi-Index Access

DB2 can use more than one index to satisfy a data retrieval request. For example, consider two indexes on the DSN8810.DEPT table: one index for DEPTNO and another index for ADMRDEPT . If you executed the following query, DB2 could use both of these indexes to satisfy the request:

 SELECT   DEPTNO, DEPTNAME, MGRNO FROM     DSN8810.DEPT WHERE    DEPTNO > 'D00' AND      ADMRDEPT = 'D01';

If multi-index access is used, the index on DEPTNO is used to retrieve all departments with a DEPTNO greater than 'D00' , and the index on ADMRDEPT is used to retrieve only rows containing 'D01' . Then these rows are intersected and the correct result is returned.

An alternative to the multi-index access just described is a single multi-column index. If you create one index for the combination of columns ADMRDEPT and DEPTNO , DB2 could use this index, as well. When deciding whether to use multiple indexes or multi-column indexes, consider the following guidelines:

Multi-index access is usually less efficient than access by a single multi-column index.
Many multi-column indexes require more DASD than multiple single-column indexes.
Consider the access criteria for all applications that will be querying the table that must be indexed. If the indexing needs are light, a series of multi-column indexes is usually the best solution. If the indexing needs are heavy and many combinations and permutations of columns are necessary to support the access criteria, multiple single-column indexes could be a better solution.
Sometimes one multi-column index can fit the needs of many different access criteria. For example, suppose that the DSN8810.EMP table (see Appendix A, "DB2 Sample Tables") has three access needs, as follows :

LASTNAME only

LASTNAME and FIRSTNME

LASTNAME , FIRSTNME , and BONUS

One index on the concatenation of the LASTNAME , FIRSTNME , and BONUS columns would efficiently handle the access needs for this table. When only LASTNAME is required, only the first column of the index is used. When both LASTNAME and FIRSTNME are specified in a query, only the first two columns are used. Finally, if all three columns are specified in a query, the index uses all three columns.

With index screening, DB2 also could use the same three column index to satisfy a query specifying only LASTNAME and BONUS . A matching index scan would be performed on LASTNAME , and then DB2 could screen the index for the BONUS values.

Of course, you might need to create indexes for one (or both) of the other two indexes if they are needed to maintain uniqueness.
Consider the tradeoff of DASD versus performance, and weigh the access criteria to determine the best indexing scenario for your implementation.

Specify Appropriate Index Parameters

The first design decision to be made when defining an indexing strategy for a table is to choose a useful clustering strategy. Clustering reduces I/O. The DB2 optimizer usually tries to use an index on a clustered column before using other indexes. Choose your clustering index wisely; in general, use the index accessed for scanning most often, because clustering improves sequential access more than random access.

Specify index-free space the same as the table space free space. The same reason for the free space in the table space applies to the free space in the index. Remember that index row sizes are smaller than table row sizes, so plan accordingly when calculating free space. Also, as PCTFREE increases, the frequency of page splitting decreases and the efficiency of index updates increases .

When an index page is completely filled and a new entry must be inserted, DB2 splits the index leaf page involved in two, moving half the data to a new page. Splits can cause DB2 to lock at many levels of the index, possibly causing splits all the way back to the root page. This splitting activity is inefficient and should be avoided by prudent use of free space and frequent index reorganizations. DB2 also uses a free page for splits if one is available within 64 pages of the original page being split. Use the suggestions in Table 6.1 as a rough guideline for specifying PCTFREE and FREEPAGE based on insert and update frequency. Of course, these are very rough guidelines and your free space allocations will vary according to the volatility of your data and the frequency of reorganization.

Table 6.1. Index Free Space Allocation Chart

Type of Index Processing	`FREEPAGE`	`PCTFREE`
Read only
Less than 20% of volume inserted or updated between `REORG` s		10 to 20
Twenty to 60% of volume inserted or updated between `REORG` s	63	20 to 30
Greater than 60% of volume inserted or updated between `REORG` s	15	20 to 30

Additionally, refer to the VCAT versus STOGROUP considerations presented in Chapter 5. The same considerations that apply to table space allocation also apply to index allocation.

Create Indexes Before Loading Tables

The LOAD utility updates indexes efficiently. Usually, the LOAD utility is more efficient than building indexes for tables that already contain data. The data being loaded should be sorted into the order of the clustering index before execution.

Consider Deferring Index Creation

The DEFER option on the CREATE INDEX statement allows the index to be created but not populated . The RECOVER INDEX utility can then be executed to populate the index. This will speed the index creation process because REBUILD INDEX usually populates index entries faster than CREATE INDEX .

Creating a STOGROUP -defined index with DEFER YES causes the underlying VSAM data set for the index to be allocated.

Additionally, the DB2 catalog is updated to record that the index exists. But, if the table being indexed currently contains data, DB2 will turn on the recover pending flag for the index space and issue a +610 SQLCODE . Subsequent execution of RECOVER INDEX will turn off the recover pending flag and populate the index.

Consider Deferring Index Data Set Definition

The DEFINE parameter can be used to control when the underlying data set(s) for the index space are created. DEFINE YES , which is the default, indicates that the data sets are created when the index is created. DEFINE NO indicates that data set creation will not occur until data is inserted in to the index. The DEFINE parameter should be used only with STORGROUP -defined indexes; it will be ignored if specified for VCAT -defined indexes.

Specifying DEFINE NO can be useful to minimize the number of data sets where indexes are being created on empty tables that will remain empty for some time.

Let DB2 Tell You What Indexes to Create

Consider using CREATE INDEX with the DEFER YES option to create many different indexes for new applications. The indexes will be recorded in the DB2 catalog, but will not be populated. Then, update the statistics in the DB2 catalog to indicate anticipated production volumes and run EXPLAIN on yourperformance-sensitive queries.

Use REBUILD INDEX to populate the indexes that were used and drop the indexes that were not used. In this way DB2 can help you choose which indexes will be useful.

Store Index and Table Space Data Sets Separately

You should assign indexes to different STOGROUP s or different volumes than the table spaces containing the tables to which the indexes apply. This reduces head contention and increases I/O efficiency. This is especially important for tables and indexes involved in parallel queries.

Consider Separate Index Buffer Pools

Consider placing critical indexes in a different buffer pool from your table spaces. For more in-depth buffer pool consideration, see Chapter 28, "Tuning DB2's Components."

`PRIQTY` and `SECQTY`

If you are defining indexes using the STOGROUP method, you must specify primary and secondary space allocations. The primary allocation is the amount of physical storage allocated when the index is created. As the amount of data in the index grows, secondary allocations of storage are taken. Use the guidelines specified for table space space allocations to guide your index space allocation efforts.

The default values for index PRIQTY and SECQTY are the same as the 4KB page size defaults for table space PRIQTY and SECQTY .

Use `PIECESIZE` to Explicitly Define Index Data Set Size

Consider using the PIECESIZE clause to specify the largest data set size for a non-partitioned index.

The creation of non-partitioning indexes (NPIs) on tables in a partitioned table space has been one of the most vexing problems facing DBAs. Partitioned table spaces tend to be large and by their very design will span multiple underlying data sets. The partitioning index that defines the partitioning key and key ranges also spans multiple data sets. There can be only one partitioning index per partitioned table space. What happens when you need to define more than one index on a table in a partitioned table space?

Well, in the old days (pre-V5), the DBA could not control the creation of the underlying data set(s) used for NPIs. As of V5, the PIECESIZE clause of the CREATE INDEX statement can be used during index creation to break an NPI into several data sets (or pieces ). More accurately, the PIECESIZE clause specifies the largest data set size for a non-partitioned index. PIECESIZE can be specified in kilobytes, megabytes, or gigabytes. For example, the following statement will limit the size of individual data sets for the XACT2 index to 256 megabytes:

 CREATE TYPE 2 UNIQUE INDEX DSN8710.XACT2   ON DSN8710.ACT (ACTKWD ASC)   USING STOGROUP DSN8G710         PRIQTY 65536K         SECQTY 8192K         ERASE NO   BUFFERPOOL BP0   CLOSE NO   PIECESIZE 256M;

Basically, PIECESIZE is used to enable NPIs to be created on very large partitioned table spaces. It breaks apart the NPI into separate pieces that can be somewhat managed individually. Without PIECESIZE , NPIs would be quite difficult to manage and administer. Keep in mind, though, that PIECESIZE does not magically partition an NPI based on the partitioning scheme of the table space. This is a common misperception of the PIECESIZE clause. So, if you have a partitioned table space with four partitions and then create an NPI with four pieces, the data in the NPI pieces will not match up with the data in the four partitions.

When using PIECESIZE , more data sets will be created, and therefore you can obtain greater control over data set placement. Placing the pieces on separate disk devices can help to reduce I/O contention for SQL operations that access NPIs during read or update processing. The elapsed time improvement might be even greater when multiple tasks are accessing the NPI.

Separating the NPI into pieces allows for better performance of INSERT , UPDATE , and DELETE processes by eliminating bottlenecks that can be caused by using only one data set for the index. The use of pieces also improves concurrency and performance of heavy INSERT , UPDATE , and DELETE processing against any size partitioned table space with NPIs.

Keep in mind that PIECESIZE is only a specification of the maximum amount of data that a piece (that is, a data set) can hold and not the actual allocation of storage, so PIECESIZE has no effect on primary and secondary space allocation. Each data set will max out at the PIECESIZE value, so specifying PRIQTY greater than PIECESIZE will waste space.

CAUTION

Avoid setting the PIECESIZE too small. A new data set is allocated each time the PIECESIZE threshold is reached. DB2 increments the A001 component of the data set name each time. This makes the physical limit 999 data sets ( A001 through A999 ). If PIECESIZE is set too small, the data set name can limit the overall size of the table space. Ideally, the value of your primary quantity and secondary quantities should be evenly divisible into PIECESIZE to avoid wasting space.

To choose a PIECESIZE value, divide the overall size of the entire NPI by the number of data sets that you want to have. For example, for an NPI that is 8 megabytes, you can arrive at four data sets for the NPI by specifying PIECESIZE 2M . Of course, if your NPI grows over eight megabytes in total you will get additional data sets. Keep in mind that 32 pieces is the limit if the underlying table space is not defined with DSSIZE 4G or greater. The limit is 254 pieces if the table space is defined as DSSIZE 4G or greater.

Index Image Copies

As of DB2 V6 and later, it is possible to use the COPY utility to make backup image copies of index data sets. You also can use the RECOVER utility on index image copies to recover indexes. To use COPY on indexes, the COPY parameter must be set to YES . The default value for the COPY parameter is NO .

NOTE

The REBUILD utility can be used to rebuild indexes from the underlying data in the table. REBUILD can be executed on any index regardless of the value of the COPY parameter.

Indexing Auxiliary Tables

Only one index can be specified on an auxiliary table. The index cannot specify any columns. The default key for an index on an auxiliary table is implicitly defined as a ROWID , which is a unique 19-byte, DB2-generated value. For more information on auxiliary tables consult Chapter 9, "Large Objects and Object/Relational Databases."

Type 2 Indexes Are Required

Prior to DB2 V6, there were two types of indexes available to DB2: Type 1 and Type 2. Type 2 indexes were introduced with DB2 Version 4; Type 1 indexes had been available since Version 1 of DB2. However, as of DB2 V6, Type 1 indexes are obsolete and no longer supported. Type 2 indexes are the only type of index that can be defined. Type 2 indexes provide the following benefits over Type 1 indexes:

Eliminate index locking (the predominant cause of contention in most pre-V4 DB2 applications).
Do not use index subpages.
Are the only type supported for ASCII encoded tables.
Many newer DB2 features cannot be used unless Type 2 indexes are used; these features include row level locking, data sharing, full partition independence, uncommitted reads, UNIQUE WHERE NOT NULL , and CPU and Sysplex parallelism.

NOTE

The TYPE 2 clause can be explicitly specified in the CREATE INDEX statement. However, if it is not specified, DB2 will create a Type 2 index anyway. As of DB2 V6, it does not matter whether TYPE 2 is explicitly specified in the CREATE INDEX statement; Type 2 indexes are the only indexes that will be created by DB2. Furthermore, as of DB2 V8, DB2 will not work with Type 1 indexes, only Type 2 indexes.

< Day Day Up >

Use Workload As a Basis for Indexing

Be Aware of Data Modification

Uniqueness Recommendations

Create a Unique Index for Each Primary Key

Use WHERE NOT NULL to Allow Multiple Nulls in a UNIQUE Index

Create Indexes for Foreign Keys

When to Avoid Indexing

When to Avoid Placing Columns in an Index

When to Specify Extra Index Columns

Control the Length of Index Keys

Indexing Large and Small Tables

Promote Index-Only Access

Multi-column Indexes

Consider Several Indexes Instead of a Multi-column Index

Multi-Index Access

Specify Appropriate Index Parameters

Table 6.1. Index Free Space Allocation Chart

Create Indexes Before Loading Tables

Consider Deferring Index Creation

Consider Deferring Index Data Set Definition

Let DB2 Tell You What Indexes to Create

Store Index and Table Space Data Sets Separately

Consider Separate Index Buffer Pools

PRIQTY and SECQTY

Use PIECESIZE to Explicitly Define Index Data Set Size

Index Image Copies

Indexing Auxiliary Tables

Type 2 Indexes Are Required

Use `WHERE NOT NULL` to Allow Multiple Nulls in a `UNIQUE` Index

`PRIQTY` and `SECQTY`

Use `PIECESIZE` to Explicitly Define Index Data Set Size