Chapter 17: Degenerate Dimensions


This chapter teaches you a technique for consolidating dimensions called the degenerate dimension. The technique reduces the number of dimensions and simplifies a dimensional data warehouse schema. A simpler schema is easier to understand than a complex one and delivers faster query performance.

You degenerate a dimension when the dimension does not have any data needed by the data warehouse user. You relocate the data from the degenerated dimension into the fact table and remove the degenerated dimension.

Degenerating the Order Dimension

In this section I explain how to degenerate the order dimension, including revising the schema and the regular population script. The first thing you do with the degenerate dimension technique is identify any column that is never used for data analysis.

For example, the order_number column in the order dimension is potentially such a column. However, your users might still need the order number if they want to see the details of a transaction. Therefore, before you degenerate the order dimension, you have to relocate order numbers to the sales_order_fact table. Figure 17.1 shows the schema after the relocation.

image from book
Figure 17.1: The schema with the order dimension degenerated

To degenerate the order_dim table, do the following four steps in sequence:

  1. Add the order_number column to the sales_order_fact table.

  2. Move the existing order numbers from the order_dim table to sales_prder_fact table.

  3. Remove the order_sk column in the sales_order_fact table.

  4. Remove the order_dim table.

The script in Listing 17.1 does all of the necessary steps.

Listing 17.1: Degenerating order dimension

image from book
 /*****************************************************************/ /*                                                               */ /* degenerate.sql                                                */ /*                                                               */ /*****************************************************************/ /* default to dw database                                        */ USE dw; /* adding order_number column                                    */ ALTER TABLE sales_order_fact ADD order_number INT AFTER receive_date_sk ; /* loading existing order_number                                 */ UPDATE sales_order_fact a, order_dim b SET a.order_number = b.order_number WHERE a.order_sk = b.order_sk ; /* removing order_sk column                                      */ ALTER TABLE sales_order_fact   DROP order_sk ; /* removing the order_dim table                                  */ DROP TABLE order_dim ; /* end of script                                                 */
image from book

Run the script in Listing 17.1 by calling the script file name.

 mysql> \. c:\mysql\scripts\degenerate.sql

You should see the following on the console.

 Database changed Query OK, 49 rows affected (0.34 sec) Records: 49  Duplicates: 0  Warnings: 0 Query OK, 49 rows affected (0.06 sec) Rows matched: 49  Changed: 49  Warnings: 0 Query OK, 49 rows affected (0.32 sec) Records: 49  Duplicates: 0  Warnings: 0 Query OK, 0 rows affected (0.04 sec)

The message indicates that you had 49 orders in the order dimensions truncated. This is the number of order numbers that moved to the fact table.

Confirm that the order_number column was added to the sales_fact_table by typing the following statement.

 mysql> desc sales_order_fact;

The result is given below.

 +------------------------+-------------+----+---+---------+-------+ | Field                  |Type         |Null|Key| Default | Extra | +------------------------+-------------+----+---+---------+-------+ |customer_sk             |int(11)      | YES|   | NULL    |       | |product_sk              |int(11)      | YES|   | NULL    |       | |order_date_sk           |int(11)      | YES|   | NULL    |       | |allocate_date_sk        |int(11)      | YES|   | NULL    |       | |packing_date_sk         |int(11)      | YES|   | NULL    |       | |ship_date_sk            |int(11)      | YES|   | NULL    |       | |receive_date_sk         |int(11)      | YES|   | NULL    |       | |order_number            |int(11)      | YES|   | NULL    |       | |request_delivery_date_sk|int (11)     | YES|   | NULL    |       | |order_amount            |decimal 10,2)| YES|   | NULL    |       | |order_quantity          |int(11)      | YES|   | NULL    |       | |allocate_quantity       |int (11)     | YES|   | NULL    |       | |packing_quantity        |int(11)      | YES|   | NULL    |       | |ship_quantity           |int (11)     | YES|   | NULL    |       | |receive_quantity        |int (11)     | YES|   | NULL    |       | +------------------------+-------------+----+---+-_-------+-------+ 15 rows in set (0.00 sec)

You can confirm that the 49 order numbers from order_dim have been relocated to the sales_order_fact table by using this statement.

 mysql> select count(0) from sales_order_fact where order_number IS       NOT NULL;

The result of the query is given below.

 +----------+ | count(0) | +----------+ |       49 | +----------+ 1 row in set (0.00 sec)

You should also confirm that the order_sk column was removed from the sales_order_fact table using this statement.

 mysql> desc sales_order_fact;

You will see the following on your console.

 +------------------------+-------------+------+---+-------+-------+ |Field                   |Type         | Null |Key|Default| Extra | +------------------------+-------------+------+---+-------+-------+ |customer_sk             |int(11)      | YES  |   | NULL  |       | |product_sk              |int(11)      | YES  |   | NULL  |       | |order_date_sk           |int(11)      | YES  |   | NULL  |       | |allocate_date_sk        |int(11)      | YES  |   | NULL  |       | |packing_date_sk         |int(11)      | YES  |   | NULL  |       | |ship_date_sk            |int(11)      | YES  |   | NULL  |       | |receive_date_sk         |int(11)      | YES  |   | NULL  |       | |order_number            |int(11)      | YES  |   | NULL  |       | |request_delivery_date_sk|int(11)      | YES  |   | NULL  |       | |order_amount            |decimal(10,2)| YES  |   | NULL  |       | |order_quantity          |int(11)      | YES  |   | NULL  |       | |allocate_quantity       |int(11)      | YES  |   | NULL  |       | |packing_quantity        |int(11)      | YES  |   | NULL  |       | |ship_quantity           |int(11)      | YES  |   | NULL  |       | |receive_quantity        |int(11)      | YES  |   | NULL  |       | +------------------------+-------------+------+---+-------+-------+ 15 rows in set (0.00 sec)

Finally, make sure that the order_dim table was removed by using this command.

 mysql> show tables;

Here is the list of tables in the dw database.

 +----------------------------+ | Tables_in_dw               | +----------------------------+ | allocate_date_dim          | | campaign_session_stg       | | customer_dim               | | customer_stg               | | date_dim                   | | month_dim                  | | month_end_sales_order_fact | | order_date_dim             | | pa_customer_dim            | | packing_date_dim           | | product_dim                | | product_stg                | | promo_schedule_stg         | | receive_date_dim           | | request_delivery_date_dim  | | sales_order_fact           | | ship_date_dim              | +----------------------------+ 17 rows in set (0.00 sec)



Dimensional Data Warehousing with MySQL. A Tutorial
Dimensional Data Warehousing with MySQL: A Tutorial
ISBN: 0975212826
EAN: 2147483647
Year: 2004
Pages: 149

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net