A Closer Look At Indexes | DB2 9 Fundamentals: Certification Study Guide

In Chapter 4, "Working with Databases and Database Objects," we saw that an index is an object that contains an ordered set of pointers that refer to rows in a base table. Each index is based upon one or more columns in the base table they refer to (known as keys), yet they are stored as separate entities.

While some indexes are created implicitly to provide support for a table's definition (for example, to provide support for a primary key), indexes are typically created explicitly, using tools available with DB2. One way to explicitly create an index is by executing the CREATE INDEX SQL statement. The basic syntax for this statement is:

 CREATE <UNIQUE> INDEX [IndexName] ON [TableName]([PriColumnName] <ASC | DESC>, ...) <INCLUDE ([SecColumnName], ...)> <CLUSTER> <DISALLOW REVERSE SCANS | ALLOW REVERSE SCANS>

where:

IndexName	Identifies the name that is to be assigned to the index to be created.
TableName	Identifies the name assigned to the base table with which the index to be created is to be associated.
PriColumnName	Identifies one or more primary columns that are to be part of the index's key. (The combined values of each primary column specified will be used to enforce data uniqueness in the associated base table.)
SecColumnName	Identifies one or more secondary columns whose values are to be stored with the values of the primary columns specified, but are not to be used to enforce data uniqueness.

If the UNIQUE clause is specified when the CREATE INDEX statement is executed, rows in the table associated with the index to be created must not have two or more occurrences of the same values in the set of columns that make up the index key. If the base table for which the index is to be created contains data, this uniqueness is checked when the DB2 Database Manager attempts to create the index specified. If records with duplicate values for the index key are found, the index will not be created; if no duplicates are found, the index is created, and uniqueness is enforced each time an insert or update operation is performed against the table. Any time the uniqueness of the index key is compromised, the insert or update operation will fail and an error will be generated.

Therefore, if you wanted to create an index for a base table named EMPLOYEES such that the index key consists of a column named EMPNO and all employee numbers entered into the EMPNO column are guaranteed to be unique, you could do so by executing a CREATE INDEX statement that looks something like this:

 CREATE UNIQUE INDEX empno_indx ON employees (empno)

If an index is created for an empty table, that index will not have any entries stored in it until the table the index is associated with is populated. On the other hand, if an index is created for a table that already contains data, index entries will be generated for the existing data and added to the index as soon as it is created.

Any number of indexes can be created for a table, using a wide variety of combinations of columns. However, each index comes at a price in both storage requirements and performance: Each index replicates its key values, and this replication requires additional storage space. And because each modification to a table results in a similar modification to all indexes defined on the table, performance can decrease when insert, update, and delete operations are performed. In fact, if a large number of indexes are created for a table that is modified frequently, overall performance will decrease, rather than increase. Tables that are used for data mining, business intelligence, business warehousing, and other applications that execute many (and often complex) queries while rarely modifying data are prime targets for multiple indexes. On the other hand, tables that are used in on-line transactional processing (OLTP) environments, or other environments where data throughput is high, should use indexes sparingly.

Clustering Indexes

A clustering index is a special index that, when used, informs the DB2 Database Manager to always try to store records on a page that contains other records that have similar index key values. (If no space is available on that page, the DB2 Database Manager will attempt to store the record in a page that is nearby.) A clustering index usually increases performance by decreasing the amount of I/O required to access data: this results in fewer page fetches, since like data values are stored on the same physical page. (Only one index in a table can be a clustering index.)

When a logical set of rows are physically stored close together, a read operation on the set of rows will require less I/O, because adjacent rows are more likely to be found within the same extent (remember, data pages are written in batches called extents) instead of being widely distributed across multiple extents. And because similar key values are placed on the same data page whenever possible, often only a portion of a table will need to be read in response to a query. A clustering index is most useful for columns that have range predicates because it allows better sequential access of data in the base table.

A clustering index is created by specifying the CLUSTER option with the CREATE INDEX SQL statement. Thus, if you wanted to create a clustering index for a base table named EMPLOYEES such that the index key consists of a column named EMPNO and all employee numbers entered into the EMPNO column are guaranteed to be unique, you could do so by executing a CREATE INDEX statement that looks something like this:

 CREATE UNIQUE INDEX empno_cindx ON employees (empno)  CLUSTER

When creating a clustering index, the PCTFREE option of the CREATE INDEX SQL statement can be used to control how much space is reserved for future insert and update operations. Specify a higher PCTFREE value (the default is 10 percent) at index creation time to reduce the likelihood of index page splits occurring when records are inserted into the index.

Tip

Over time, update operations can cause rows to change page locations, thereby reducing the degree of clustering that exists between an index and its data pages. Reorganizing a table (with the REORG utility) using the appropriate index will return the specified index to its original level of clustering.

Multidimensional Clustering (MDC) Indexes

Multidimensional clustering (MDC) provides a way to cluster data along multiple dimensions automatically. Such clustering results in significant improvement in query performance, as well as significant reduction in the overhead of data maintenance operations, such as table/index reorganization, and index maintenance operations during insert, update, and delete operations. Multidimensional clustering is primarily intended for data warehousing, OLTP, and large database environments.

Earlier, we saw that when a clustering index is used, the DB2 Database Manager maintains the physical order of data on pages in the key order of the index, as records are inserted and updated in the table. With good clustering, only a portion of the table needs to be accessed in response to a query, and when the pages are stored sequentially, more efficient prefetching can be performed. With MDC, these benefits are extended to multiple keys (or dimensions); MDC allows a table to be physically clustered on more than one key (or dimension) simultaneously. Not only will queries access only those pages that contain records with the correct dimension values, these qualifying pages will be grouped by extents. Furthermore, although a table with a clustering index can become unclustered over time as space fills up in the table, an MDC table is able to maintain its clustering over all dimensions automatically and continuously, thus eliminating the need to reorganize a table in order to restore the original level of clustering used.