Section 7.10. Multidimensional Clustering Tables and Block Indexes


7.10. Multidimensional Clustering Tables and Block Indexes

Multidimensional clustering (MDC), as its name implies, allows for clustering of the physical data pages in multiple dimensions. For example, using the sales table illustrated previously in Figure 7.15, you can cluster the data based on two dimensions: sales_person and year columns. This method of clustering has several benefits over clustering indexes.

  • With MDC, the data pages are physically clustered by several dimensions simultaneously. With clustering indexes, only one cluster index is allowed per table; the other indexes in the table are unclustered.

  • MDC guarantees clustering over time even though frequent INSERT operations are performed. Thus, less maintenance and overhead is required. With clustering indexes, this is not the case. As data pages are filled up, a clustered INSERT operation may encounter that the row to insert does not fit the right page to maintain the clustering of the data. In such cases, the row may end up on a page that is not close to the other related rows. Clustering indexes require an administrator to perform periodic table reorganizations to recluster the table and set up pages with additional free space to accommodate future clustered INSERT requests.

  • MDC uses block indexes, indexes that point to an entire block of pages. These are smaller indexes than regular and clustering indexes, which point to a single record.

NOTE

MDC is primarily intended for data warehousing environments; however, it can also work in online transaction processing (OLTP) environments.


7.10.1. MDC Tables

Let's redefine our sales table as an MDC table, using dimensions sales_person and year.

 CREATE TABLE sales (        sales_person     VARCHAR(30)  NOT NULL,        region           CHAR(5)      NOT NULL,        number_of_sales  INT          NOT NULL,        year             INT        ) ORGANIZE BY DIMENSIONS (sales_person, year) 

DB2 places records that have the same sales_person and year values in physical locations that are close together as they are inserted into the table. These locations are called blocks. A block can be treated as an extent. The size of an extent can be defined in the CREATE TABLESPACE statement. The minimum size for a block is two pages, like extents.

Figure 7.18 illustrates the contents of the sales table using the new MDC definition. For simplicity, in this example a block can hold only two records.

Figure 7.18. The sales table defined as an MDC table


The figure shows this MDC table is physically organized such that records having the same year and sales_person values are grouped together into separate blocks. For example, all records in block 0 have sales_person = John and year = 2000. All records in block 4 have sales_person = Mary and year = 2002.

When a block is filled, DB2 will allocate a new block or reuse an old block for the new records inserted. In Figure 7.18, block 4 was filled, and thus block 5 had to be created.

Blocks that have the same dimension values are grouped into cells. Each cell represents a unique combination of the dimension values. If there are X different values for sales_person, and Y different values for year, there are X*Y number of cells. In Figure 7.18, you see the table sales has three values for dimension sales_person, namely, John, Mary, and Sam. It also has three values for dimension year, namely, 2000, 2001, and 2002. Therefore, nine cells are illustrated, one for each combination.

A cell contains only the necessary blocks to store the records that have the dimension values of that cell. If there are no records (as in the case of cell 4 in Figure 7.18), no blocks will be allocated.

In Figure 7.18, we also illustrate the concept of a slice. A slice consists of all the cells that belong to a specific value of a dimension. Figure 7.18 highlights two out of six slices, one for dimension year with a value of 2000 and the other for dimension sales_person with a value of Mary.

7.10.2. Block Indexes

Block indexes are pointers to a block, not a single record. A block index points to the beginning of each block, which has a unique block ID (BID). MDC tables use only block indexes. Figure 7.19 shows a comparison between a regular index and a block index.

Figure 7.19. A regular index versus a block index


A block index has the following advantages over a regular index.

  • Block indexes are significantly smaller than regular indexes because they point to a block rather than a record. The reduced size makes index scans much faster.

  • Less maintenance overhead is associated with block indexes. They only need to be updated when adding the first record to a block and removing the last record from a block.

  • Prefetching is done in blocks, thus the amount of I/O is reduced.

An MDC table defined with even a single dimension can benefit from block indexes and can be a viable alternative to a regular table using a clustering index.

When an MDC table is created, a dimension block index is created for each specified dimension. For our sales table, two dimension block indexes are created, one for the sales_person dimension and one for the year dimension, as illustrated in Figure 7.20.

Figure 7.20. Block indexes for sales_person and year


A query requesting records that have sales_person = John can use the sales_person block index to quickly access all three blocks (block 0, 1, and 2) that satisfy this criteria. Another query can use the year block index to independently access all blocks that have year = 2002 (blocks 2, 4, 5, and 8).

In addition to the dimension block indexes, a composite block index is also created during MDC table creation. A composite block index contains all columns across all dimensions and is used to maintain the clustering of data over INSERT and UPDATE activity. If a single dimension block index already contains all the dimension key columns, a composite block index is not created.

7.10.3. The Block Map

A block map is an array containing an entry for each block of an MDC table. The entry indicates whether or not a block is in use. Each block has a unique identifier (BID) and also an IN_USE status bit. When a DELETE operation removes the last record in a block, DB2 frees the block by changing its IN_USE status bit and removing its BID from all block indexes. When new records are inserted and they can no longer fit into existing blocks, DB2 first scans for free blocks, looking for ones without the IN_USE bit set. If a free block is found, DB2 reuses it, updates its IN_USE bit, and adds its BID to block indexes.

Reusing free blocks greatly reduces fragmentation and in turn minimizes the need to reorganize the MDC table even though pages within the blocks may be fragmented.

7.10.4. Choosing Dimensions for MDC Tables

Choosing the right dimensions for an MDC table is crucial for obtaining the maximum advantages MDC can provide. You should consider the following:

  • Choose columns with the lowest cardinality.

    One of the advantages of using block indexes is that they point to a block rather than a record; therefore, there are fewer pointers to traverse. If each block contains only one record, the block index essentially becomes a regular index. You should try to minimize the number of blocks by increasing the number of records they can contain. You can achieve this by choosing columns with the lowest cardinality, that is, the lowest number of distinct values. For example, a column like region, with possible values of North, South, East, and West, is a good choice. A column like employee_id, which uniquely identifies each employee of a company that has 100,000 employees, is definitely a bad choice.

  • Choose the correct block size (extent size).

    MDC tables allocate space in blocks. The entire block is allocated even if only one record is inserted. For example, if your block can hold 100 pages, and on average only 10 records are inserted per block (assuming only one record can fit in a page), then 90% of the space is wasted. Thus, make sure you choose the correct block size.

  • Choose the right number of dimensions.

    The higher the number of dimensions, the more possible combinations you can have, and therefore the higher the number of possible cells. If there are many cells, each cell will likely contain only a few records, and if that is the case, the block size needs to be set to a small number.

NOTE

The Design Advisor tool can make recommendations on what dimensions to choose for a given table.




Understanding DB2(R. Learning Visually with Examples)
Understanding DB2: Learning Visually with Examples (2nd Edition)
ISBN: 0131580183
EAN: 2147483647
Year: 2004
Pages: 313

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net