7.10. Multidimensional Clustering Tables and Block IndexesMultidimensional clustering (MDC), as its name implies, allows for clustering of the physical data pages in multiple dimensions. For example, using the sales table illustrated previously in Figure 7.15, you can cluster the data based on two dimensions: sales_person and year columns. This method of clustering has several benefits over clustering indexes.
NOTE MDC is primarily intended for data warehousing environments; however, it can also work in online transaction processing (OLTP) environments. 7.10.1. MDC TablesLet's redefine our sales table as an MDC table, using dimensions sales_person and year. CREATE TABLE sales ( sales_person VARCHAR(30) NOT NULL, region CHAR(5) NOT NULL, number_of_sales INT NOT NULL, year INT ) ORGANIZE BY DIMENSIONS (sales_person, year) DB2 places records that have the same sales_person and year values in physical locations that are close together as they are inserted into the table. These locations are called blocks. A block can be treated as an extent. The size of an extent can be defined in the CREATE TABLESPACE statement. The minimum size for a block is two pages, like extents. Figure 7.18 illustrates the contents of the sales table using the new MDC definition. For simplicity, in this example a block can hold only two records. Figure 7.18. The sales table defined as an MDC tableThe figure shows this MDC table is physically organized such that records having the same year and sales_person values are grouped together into separate blocks. For example, all records in block 0 have sales_person = John and year = 2000. All records in block 4 have sales_person = Mary and year = 2002. When a block is filled, DB2 will allocate a new block or reuse an old block for the new records inserted. In Figure 7.18, block 4 was filled, and thus block 5 had to be created. Blocks that have the same dimension values are grouped into cells. Each cell represents a unique combination of the dimension values. If there are X different values for sales_person, and Y different values for year, there are X*Y number of cells. In Figure 7.18, you see the table sales has three values for dimension sales_person, namely, John, Mary, and Sam. It also has three values for dimension year, namely, 2000, 2001, and 2002. Therefore, nine cells are illustrated, one for each combination. A cell contains only the necessary blocks to store the records that have the dimension values of that cell. If there are no records (as in the case of cell 4 in Figure 7.18), no blocks will be allocated. In Figure 7.18, we also illustrate the concept of a slice. A slice consists of all the cells that belong to a specific value of a dimension. Figure 7.18 highlights two out of six slices, one for dimension year with a value of 2000 and the other for dimension sales_person with a value of Mary. 7.10.2. Block IndexesBlock indexes are pointers to a block, not a single record. A block index points to the beginning of each block, which has a unique block ID (BID). MDC tables use only block indexes. Figure 7.19 shows a comparison between a regular index and a block index. Figure 7.19. A regular index versus a block indexA block index has the following advantages over a regular index.
An MDC table defined with even a single dimension can benefit from block indexes and can be a viable alternative to a regular table using a clustering index. When an MDC table is created, a dimension block index is created for each specified dimension. For our sales table, two dimension block indexes are created, one for the sales_person dimension and one for the year dimension, as illustrated in Figure 7.20. Figure 7.20. Block indexes for sales_person and yearA query requesting records that have sales_person = John can use the sales_person block index to quickly access all three blocks (block 0, 1, and 2) that satisfy this criteria. Another query can use the year block index to independently access all blocks that have year = 2002 (blocks 2, 4, 5, and 8). In addition to the dimension block indexes, a composite block index is also created during MDC table creation. A composite block index contains all columns across all dimensions and is used to maintain the clustering of data over INSERT and UPDATE activity. If a single dimension block index already contains all the dimension key columns, a composite block index is not created. 7.10.3. The Block MapA block map is an array containing an entry for each block of an MDC table. The entry indicates whether or not a block is in use. Each block has a unique identifier (BID) and also an IN_USE status bit. When a DELETE operation removes the last record in a block, DB2 frees the block by changing its IN_USE status bit and removing its BID from all block indexes. When new records are inserted and they can no longer fit into existing blocks, DB2 first scans for free blocks, looking for ones without the IN_USE bit set. If a free block is found, DB2 reuses it, updates its IN_USE bit, and adds its BID to block indexes. Reusing free blocks greatly reduces fragmentation and in turn minimizes the need to reorganize the MDC table even though pages within the blocks may be fragmented. 7.10.4. Choosing Dimensions for MDC TablesChoosing the right dimensions for an MDC table is crucial for obtaining the maximum advantages MDC can provide. You should consider the following:
NOTE The Design Advisor tool can make recommendations on what dimensions to choose for a given table. |