Reorganizing the warehouse | Oracle9iR2 Data Warehousing

7.5 Reorganizing the warehouse

Reorganizing a database, irrespective of whether it is a data warehouse or a database used for transaction processing-style systems, is not a task to be undertaken lightly. -Unfortunately, in a data warehouse, the time required to reorganize can become a serious issue due to the high data volumes involved. Therefore, it shouldn't be necessary to reorganize the database entirely, but minor changes may be necessary.

7.5.1 Why reorganize?

Reorganizations can occur for a variety of reasons, such as:

The business needs change
A flaw in the database design
A regular archiving of data
The government changes the rules
The integration of another company's computer system following an acquisition

A change in the business requirements from the system is almost impossible to predict, and one solution to the problem may be to create a data mart rather than restructure the entire data warehouse.

A fundamental flaw in the database design can be overcome by carefully reviewing the database design and, if necessary, asking for an external review by an experienced data warehouse designer. One of the problems with data warehouses is that what may seem a good design at the outset may prove unsuitable in the long term due to the large volumes of data involved. Therefore, try to minimize the likelihood of this occurring by using techniques such as partitioning.

Some reorganization may be planned. If you want to keep your data warehouse somewhat constant in size and not let it grow indefinitely, you may decide to keep only a few years of data. When new data is added, old data is archived and removed. This is called a rolling window operation and is discussed later in this section.

7.5.2 Partition maintenance

Some of the structural changes required by the data warehouse can be achieved using partition operations. In this section, we will look at some of the most common partition maintenance operations, including the following:

Adding and dropping partitions
Exchanging a partition
Splitting and merging partitions
Coalescing a partition
Truncating a partition
Moving a partition

Adding and dropping partitions to perform rolling window operations

When you first create the warehouse, you may have little or no historic data to populate it. Therefore, for the first 18 to 24 months, you happily add new data every month until the required system limits are met. Then, when the data warehouse is full, every month you back up the old data and drop it to make room for new data. Without partitions you would have to scan the data and manually delete it, which is very time consuming. The faster alternative to this problem is to drop the partition containing the old data and create a new partition for the new data, as shown in Figure 7.42.

click to expand
Figure 7.42: Partition maintenance.

This technique is applicable only if the data is partitioned on a date. Therefore, if you decide upon another partitioning scheme, such as a code, then this type of maintenance operation would not be possible.

The SQL commands to perform the tasks in Figure 7.42 are shown below. First, the old partition containing the data for sales for the month of May 1997 is dropped.

 SQL> ALTER TABLE easydw.purchases      DROP PARTITION purchases_may97;

Next, the new partition for the data for sales for the month of April 2002 is created. The first step is to create the tablespace where the data will reside.

 SQL> CREATE TABLESPACE purchases_apr2002      datafile 'E:\EASYDW\PURCHASESapr2002.f'      size 5m reuse autoextend on default storage      (initial 16k next 16k pctincrease 0 maxextents unlimited);

The next step is to alter the table definition to reference the new tablespace that was just created. The PURCHASES_APR2002 partition is added to the table.

 SQL> ALTER TABLE easydw.purchases      ADD PARTITION purchases_apr2002        values less than (TO_DATE('30-04-2002', 'DD-MM-YYYY'))        pctfree 0 pctused 99        storage (initial 64k next 16k pctincrease 0)        tablespace purchases_apr2002 ;

Exchanging a partition

After a new partition is added to a table, data can be loaded into it. Various techniques to load data were described in Chapter 5. If the data is already in a table in the database, the fastest way to move the data into the new partition is using exchange partition, which is used to move data from a nonpartitioned table into a partitioned table. Exchange partition can also be used to convert a partition into a nonpartitioned table or between various types of partitioned tables. The following code segment shows moving the data from the APR_ORDERS table into the PURCHASES_APR2002 partition of the EASYDW.PURCHASES table.

 SQL> ALTER TABLE easydw.purchases      EXCHANGE PARTITION purchases_apr2002 WITH TABLE apr_orders;

Merging partitions

You can merge two partitions into one or split a partition if a partition becomes too large. Data can either be merged into a new partition or into an existing partition.

One technique often used by designers is to keep the first six months' data in monthly partitions, and then after that, store the data in quarterly partitions. Using the MERGE PARTITION option the data can be easily moved to the new partition, as illustrated in the following code segment.

If we wanted to combine the data for January through March 2002 into a partition for the quarter, we could merge the partitions. First, a new tablespace is created to store the Q1 purchases.

 SQL> CREATE TABLESPACE purchases_q1_2002        DATAFILE 'c:\ora9ir2\oradata\orcl\PURCHASESQ12002.f' SIZE 5M        REUSE AUTOEXTEND ON        DEFAULT STORAGE         (INITIAL 64K NEXT 64K PCTINCREASE 0 MAXEXTENTS UNLIMITED);

The partitions are merged two at a time. If a table is partitioned by range, only adjacent partitions can be merged. We can merge January and February or February and March but cannot merge January and March.

The new partition inherits the upper bound of the two merged partitions. Therefore, the two partitions with the highest ranges need to be merged first (February and March) into the Q1 partition. The following statement merges the partitions and stores them in the newly created tablespace. The PURCHASES_Q12002 partition is automatically added to the purchases table. The PURCHASES_FEB2002 and PURCHASES_MAR2002 are automatically dropped from the purchases table.

 SQL> ALTER TABLE purchases        MERGE PARTITIONS purchases_feb2002,purchases_mar2002        INTO PARTITION purchases_q12002        TABLESPACE purchases_q1_2002

Next, the partition with the lowest range, PURCHASES_JAN2002, is merged into the PURCHASES_Q12002 partition. This statement shows merging data into an existing partition.

 SQL>ALTER TABLE purchases       MERGE PARTITIONS purchases_jan2002,purchases_q12002       INTO PARTITION purchases_q12002       TABLESPACE purchases_q1_2002

After this operation, the purchases table contains one partition, as shown in Figure 7.43. The high value for the purchases_q12002 table is March 31, 2002.

click to expand
Figure 7.43: Merge partition.

When merging partitions, both the data and the indexes are merged. The index partitions for January, February, and March were automatically dropped and were replaced by a new index partition for Q1. In the following query, there is a new index partition, PURCHASES_Q12002, for the PURCHASE_PRODUCT_INDEX and the PURCHASE_TIME_INDEX.

 SQL> SELECT INDEX_NAME, PARTITION_NAME, STATUS FROM USER_IND_PARTITIONS; INDEX_NAME              PARTITION_NAME      STATUS ----------------------- ------------------- ------- PURCHASE_PRODUCT_INDEX  PURCHASES_Q12002    UNUSABLE PURCHASE_TIME_INDEX     PURCHASES_Q12002    UNUSABLE

The new index partitions are unusable and must be rebuilt, as shown in the following example.

 SQL> ALTER INDEX purchase_product_index REBUILD PARTITION purchases_q12002 SQL> ALTER INDEX purchase_time_index REBUILD PARTITION purchases_q12002

Splitting partitions

If a partition becomes too big, it may need to be split to help maintenance operations complete in a shorter period of time or to spread the I/O across more devices. A partition can be split into two new partitions. If the PURCHASES table was originally partitioned by quarter, and sales significantly exceeded expectations, resulting in a very large partition, the partition could be split up into three monthly partitions. Tablespaces are first created for PURCHASES_JAN2002, PURCHASES_FEB2002, and PURCHASES_MAR2002.

In the following example, all rows with PURCHASE_DATE less than or equal to January 31, 2002, will be split into the PURCHASES_JAN2002 partition. The remaining rows will remain in the PURCHASES_Q12002 partition. The January purchases will be stored in the PURCHASES_JAN2002 tablespace.

 SQL> ALTER TABLE purchases      SPLIT PARTITION purchases_q12002      AT (TO_DATE('31-JAN-2002','dd-mon-yyyy'))      INTO (PARTITION purchases_jan2002            TABLESPACE purchases_jan2002,            PARTITION purchases_q12002)

After the split operation, there are two partitions. Next, the remaining rows in the partition, PURCHASE_Q12002 are split into the February and March partitions.

 SQL> ALTER TABLE purchases      SPLIT PARTITION purchases_q12002      AT (TO_DATE('28-FEB-2002','dd-mon-yyyy'))      INTO (PARTITION purchases_feb2002            TABLESPACE purchases_feb2002,            PARTITION purchases_mar2002            TABLESPACE purchases_mar2002)

The data has been repartitioned, and the PURCHASES_Q12002 partition automatically dropped, as can be seen in Figure 7.44.

click to expand
Figure 7.44: Split partition.

Any partitions of the local indexes corresponding to the PURCHASES_Q12002 partition have been dropped. In their place are new local index partitions for the new table partitions.

 SQL> SELECT INDEX_NAME, PARTITION_NAME, STATUS      FROM USER_IND_PARTITIONS; INDEX_NAME               PARTITION_NAME      STATUS ------------------------ ------------------- ------ PURCHASE_PRODUCT_INDEX   PURCHASES_JAN2002   USABLE PURCHASE_PRODUCT_INDEX   PURCHASES_FEB2002   UNUSABLE PURCHASE_PRODUCT_INDEX   PURCHASES_MAR2002   USABLE PURCHASE_TIME_INDEX      PURCHASES_JAN2002   USABLE PURCHASE_TIME_INDEX      PURCHASES_FEB2002   UNUSABLE PURCHASE_TIME_INDEX      PURCHASES_MAR2002   USABLE

Any unusable indexes must be rebuilt.

 SQL> ALTER INDEX PURCHASE_PRODUCT_INDEX      REBUILD PARTITION purchases_feb2002 SQL> ALTER INDEX purchase_time_index      REBUILD PARTITION purchases_feb2002

Coalescing hash partitions

Range and list partitions can be merged, but hash partitions cannot; they must be coalesced. Rather than determining what partition a row is stored in by comparing the value of the partitioning key with the table's partitioning criteria, as is done for range or list partitioning, the partition is determined by applying a hash function.

Merging partitions, in effect, reduces the number of partitions by one. When the number of partitions changes, the hash function must be reapplied to redistribute the data. Coalescing the partitions does this.

The following example shows the creation of a hash-partitioned table with ten partitions.

 CREATE TABLE easydw.hash_purchases  (product_id            varchar2(8),   time_key              date,   customer_id           varchar2(10),   purchase_date         date,   purchase_time         number(4,0),   purchase_price        number(6,2),   shipping_charge       number(5,2),   today_special_offer   varchar2(1)) PARTITION BY HASH(product_id) PARTITIONS 10;

To reduce the number of partitions by one, issue the following command.

 SQL> ALTER TABLE easydw.hash_purchases COALESCE PARTITION;

This, in effect, drops a partition.

Likewise, a hash-partitioned table cannot be split when it becomes too big. To increase the number of partitions in a hash-partitioned table, alter the table and add a partition to it.

Truncating partitions

Sometimes we need to remove all the rows in a partition. For example, we may only keep 18 months' worth of data, and, once a month, we need to remove that old data. Rather than delete each row individually, we can use the TRUNCATE PARTITION option, which rapidly removes the data.

To remove all the rows from a partition, but not the partition itself, use TRUNCATE partition. This is much faster than deleting each row in the partition individually. Any local indexes for the partition, such as the EASYDW.PURCHASE_TIME_INDEX, are truncated also. Prior to truncating the partition, there are 7,498 rows in the PURCHASES_JAN2002 partition.

 SQL> SELECT COUNT(*) FROM purchases      WHERE time_key BETWEEN TO_DATE('01-JAN-2002') AND                             TO_DATE('31-JAN-2002')   COUNT(*)   --------       7498 SQL> ALTER TABLE PURCHASES      TRUNCATE PARTITION purchases_jan2002;

After truncating the partition, all rows have been deleted.

 SQL> SELECT COUNT(*) FROM purchases      WHERE time_key BETWEEN TO_DATE('01-JAN-2002') AND                             TO_DATE('31-JAN-2002')   COUNT(*)   ----------            0

Moving partitions

A partition can be moved from one tablespace to another. If the January purchases partition had been created in EASYDW_DEFAULT or some other tablespace, and had to be moved to the PURCHASES_JAN2002 tablespace, the following command could be used. By specifying NOLOGGING, the operation will not be logged, resulting in better performance. After performing NOLOGGING operations, don't forget to take a backup, since you will not be able to recover in the event of a media failure.

 SQL> ALTER TABLE purchases MOVE PARTITION purchases_jan2002      TABLESPACE purchases_jan2002 NOLOGGING

Partitions facilitate management

Most of the management operations that can be performed at a table level can also be performed on an individual partition or subpartition of a table. Partitions or subpartitions can be backed up, exported, restored, and recovered without affecting the availability of the other partitions or subpartitions. To best manage each partition or subpartition independently, they should each be stored in their own tablespace. Each tablespace should be stored on one or more separate storage devices.

Summary Management and the query optimizer make use of the fact that the data is partitioned in choosing the optimal strategy for executing a query or refreshing a materialized view. Partition change tracking (PCT) for materialized views, which was discussed in Chapter 4, keeps track of what partitions have been updated after a partition maintenance operation and recomputes only that portion of the materialized view when it is refreshed. Partition change tracking also increases the query rewrite capabilities of the materialized view by rewriting queries to use the partitions of the materialized view that are not stale. If a table is partitioned, the query optimizer can use a technique called partition elimination, or partition pruning, to determine if a certain query can be answered by reading only specific partitions.

Hint:

The examples shown in this section have performed partition maintenance operations on a table, but don't forget that partitions are also used on indexes, so you will have to create and maintain the corresponding index partitions. Also, in these examples we've used partitions, but subpartitions can be used as well.

7.5.3 Index changes

Probably one of the aspects of the design that will be changed is the indexes. New ones will be created, and existing ones will be modified. In a data warehouse, there is a temptation to create more indexes, because of the lack of updates to the system. However, you should consider the impact all of these indexes will have on the data load time.

If you need to rebuild an index for any reason, then it is suggested that you use the ALTER INDEX REBUILD statement, which should offer better performance than dropping and recreating the index.

Many table maintenance operations on partitioned tables invalidate global indexes, and they are marked UNUSABLE. You must then rebuild the entire global index or, if partitioned, all of its partitions. To avoid this in Oracle 9i Release 2, you can include UPDATE GLOBAL INDEXES in the ALTER TABLE statement for the maintenance operation. Specifying this clause tells Oracle to update the global index at the time it executes the maintenance operation DDL statement.

Hint:

By partitioning the data, you can perform maintenance on specific index partitions rather than on the entire index.