Chapter 13: Dimension Role Playing


This chapter teaches you dimension role-playing, a technique you use when a fact needs a dimension more than once. For example, if the sales order fact has more than one date, say an order date and a request delivery date, you need to use the date dimension more than once.

In this chapter you also learn two types of dimension role playing implementations, table alias and database view. Both types use MySQL functions. The table alias type uses the dimension more than once in an SQL statement by assigning an alias for each use. As for the database view type, you create as many views as the number of roles you need the dimension on the fact.

Adding Request Delivery Dates

In this section I show you how to add a request delivery date, revise the data warehouse regular population, and test the revised regular population script. This preparation is for showing the implementation of date dimension role-playing in the next section.

The first thing you need to do is add a request_delivery_date column to the sales_order_fact table. The schema now looks like that in Figure 13.1.

image from book
Figure 13.1: Adding request_delivery_date to sales_order_fact

You can use the script in Listing 13.1 to add the request_delivery_date column.

Listing 13.1: Adding the request_delivery_date_sk column

image from book
 /**********************************************************************/ /*                                                                    */ /* request_delivery_date_sk.sql                                       */ /*                                                                    */ /**********************************************************************/ USE dw; ALTER TABLE sales_order_fact ADD request_delivery_date_sk INT AFTER order_date_sk ; /* end of script                                                      */
image from book

You run the script in Listing 13.1 using this command.

 mysql> \. c:\mysql\scripts\request_delivery_date_sk.sql

MySQL will indicate that there are 44 rows affected by the command.

 Database changed Query OK, 44 rows affected (0.39 sec) Records: 44  Duplicates: 0  Warnings: 0

Since the structure of the sales_order_fact table has changed, you also need to update the script for regularly populating this table. The new script is shown in Listing 13.2.

Listing 13.2: The revised daily DW regular population

image from book
 /**********************************************************************/ /*                                                                    */ /* dw_regular_13.sql                                                  */ /*                                                                    */ /**********************************************************************/ USE dw; /* CUSTOMER_DIM POPULATION                                            */ TRUNCATE customer_stg; LOAD DATA INFILE 'customer.csv' INTO TABLE customer_stg FIELDS TERMINATED BY ' , ' OPTIONALLY ENCLOSED BY "" LINES TERMINATED BY '\r\n' IGNORE 1 LINES ( customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , shipping_address , shipping_zip_code , shipping_city , shipping_state ) ; /* SCD 2 ON ADDRESSES                                                 */ UPDATE   customer_dim a , customer_stg b SET   a.expiry_date - SUBDATE (CURRENT_DATE, 1) WHERE   a.customer_number = b.customer_number AND (   a.customer_street_address <> b.customer_street_address      OR a.customer_city <> b.customer_city      OR a.customer_zip_code <> b.customer_zip_code      OR a.customer_state <> b.customer_state      OR a.shipping_address <> b.shipping_address      OR a.shipping_city <> b.shipping_city      OR a.shipping_zip_code <> b.shipping_zip_code      OR a.shipping_state <> b.shipping_state      OR a.shipping_address IS NULL      OR a.shipping_city IS NULL      OR a.shipping_zip_code IS NULL      OR a.shipping_state IS NULL) AND expiry_date - '9999-12-31' ; INSERT INTO customer_dim SELECT   NULL , b.customer_number , b.customer_name , b.customer_street_address , b.customer_zip_code , b.customer_city , b.customer_state , b.shipping_address , b.shipping_zip_code , b.shipping_city , b.shipping_state , CURRENT_DATE , '9999-12-31' FROM   customer_dim a , customer_stg b WHERE     a.customer_number = b.customer_number AND (   a.customer_street_address <> b.customer_street_address      OR a.customer_city <> b.customer_city      OR a.customer_zip_code <> b.customer_zip_code      OR a.customer_state <> b.customer_state      OR a.shipping_address <> b.shipping_address      OR a.shipping_city <> b.shipping_city      OR a.shipping_zip_code <> b.shipping_zip_code      OR a.shipping_state <> b.shipping_state      OR a.shipping_address IS NULL      OR a.shipping_city IS NULL      OR a.shipping_zip_code IS NULL      OR a.shipping_state IS NULL) AND EXISTS ( SELECT * FROM customer_dim x WHERE b.customer_number = x.customer_number AND a.expiry_date = SUBDATE (CURRENT_DATE, 1)) AND NOT EXISTS ( SELECT * FROM customer_dim y WHERE     b.customer_number = y.customer_number       AND y.expiry_date - '9999-12-31') ; /* END OF SCD 2                                                       */ /* SCD 1 ON NAME                                                      */ UPDATE customer_dim a, customer_stg b SET a.customer_name = b.customer_name WHERE     a.customer_number = b.customer_number       AND a.expiry_date = '9999-12-31'       AND a.customer_name <> b.customer_name ; /* ADD NEW CUSTOMER                                                   */ INSERT INTO customer_dim SELECT   NULL , customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , shipping_address , shipping_zip_code , shipping_city , shipping_state , CURRENT_DATE , '9999-12-31' FROM customer_stg WHERE customer_number NOT IN( SELECT a.customer_number FROM   customer_dim a , customer_stg b WHERE b.customer_number = a.customer_number ) ; /* RE-BUILD PA CUSTOMER DIMENSION                                     */ TRUNCATE pa_customer_dim; INSERT INTO pa_customer_dim SELECT   customer_sk , customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , shipping_address , shipping_zip_code , shipping_city , shipping_state , effective_date , expiry_date FROM customer_dim WHERE customer_state = 'PA' ; /* END OF CUSTOMER_DIM POPULATION                                     */ /* PRODUCT_DIM POPULATION                                             */ TRUNCATE product_stg ; LOAD DATA INFILE 'product.txt' INTO TABLE product_stg FIELDS TERMINATED BY '' OPTIONALLY ENCLOSED BY '' LINES TERMINATED BY '\r\n' IGNORE 1 LINES ( product_code , product_name , product_category ) ; /* SCD2 ON PRODUCT NAME AND GROUP                                     */ UPDATE   product_dim a , product_stg b SET   expiry_date = SUBDATE (CURRENT_DATE, 1) WHERE     a.product_code = b.product_code AND (   a.product_name <> b.product_name      OR a.product_category <> b.product_category      AND expiry_date = '9999-12-31' ; INSERT INTO product_dim SELECT   NULL , b.product_code , b.product_name , b.product_category , CURRENT_DATE , '9999-12-31' FROM   product_dim a , product_stg b WHERE     a.product_code = b.product_code AND (   a.product_name <> b.product_name      OR a.product_category <> b.product_category ) AND EXISTS ( SELECT * FROM product_dim x WHERE     b.product_code = x.product_code              AND a.expiry_date = SUBDATE (CURRENT_DATE, 1)) AND NOT EXISTS ( SELECT * FROM product_dim y WHERE    b.product_code = y.product_code              AND y.expiry_date = '9999-12-31') ; /* END OF SCD 2                                                       */ /* ADD NEW PRODUCT                                                    */ INSERT INTO product_dim SELECT   NULL , product_code , product_name , product_category , CURRENT_DATE , '9999-12-31' FROM product_stg WHERE product_code NOT IN( SELECT y.product_code FROM product_dim x, product_stg y WHERE x.product_code = y.product_code ) ; /* END OF PRODUCT_DIM POPULATION                                      */ /* ORDER_DIM POPULATION                                               */ INSERT INTO order_dim (   order_sk , order_number , effective_date , expiry_date ) SELECT   NULL , order_number , order_date , '9999-12-31' FROM source.sales_order WHERE entry_date = CURRENT_DATE ; /* END OF ORDER_DIM POPULATION                                        */ /* SALES_ORDER_FACT POPULATION                                        */ INSERT INTO sales_order_fact SELECT   order_sk , customer_sk , product_sk , e.date_sk , f.date_sk , order_amount , order_quantity FROM   source.sales_order a , order_dim b , customer_dim c , product_dim d , date_dim e , date_dim f WHERE     a.order_number = b.order_number AND a.customer_number = c.customer_number AND a.order_date >= c.effective_date AND a.order_date <= c.expiry_date AND a.product_code = d.product_code AND a.order_date >= d.effective_date AND a.order_date <= d.expiry_date AND a.order_date = e.date AND a.request_delivery_date = f.date AND a.entry_date = CURRENT_DATE ; /* end of script                                                      */
image from book

Before you can test the revised regular population for the sales_order_fact table, you need to modify the sales_order table in the source database. To be precise, you need to add a request_delivery_date column to this table. The script in Listing 13.3 can help you achieve this.

Listing 13.3: Adding the request_delivery_date column to the sales_order table

image from book
 /*********************************************************************/ /*                                                                   */ /* request_delivery_date.sql                                         */ /*                                                                   */ /*********************************************************************/ USE source; ALTER TABLE sales_order   ADD request_delivery_date DATE AFTER order_date ; /* end of script                                                      */
image from book

Run the script in Listing 13.3 using this command.

 mysql> \. c:\mysql\scripts\request_delivery_date.sql

MySQL will indicate that there are 46 rows affected.

 Database changed Query OK, 46 rows affected (0.41 sec) Records: 46  Duplicates: 0  Warnings: 0

In addition to changing the sales_order table structure, you also need to add more sales orders to the sales_order table using the script in Listing 13.4.

Listing 13.4: Adding three sales orders with request delivery dates

image from book
 /**********************************************************************/ /*                                                                    */ /* request_delivery_date_source.sql                                   */ /*                                                                    */ /**********************************************************************/ USE source; INSERT INTO sales_order VALUES   (47, 1, 1, '2007-03-04', '2007-03-30', '2007-03-04', 7500, 75) , (48, 2, 2, '2007-03-04', '2007-03-30', '2007-03-04', 1000, 10) , (49, 3, 3, '2007-03-04', '2007-03-30', '2007-03-04', 1000, 10) ; /* end of script                                                     */
image from book

Now run the script in Listing 13.4.

 mysql> \. c:\mysql\scripts\request_delivery_date_source.sql

Finally, set your MySQL date to March 4, 2007 (the order date of the three recently added sales orders), and run the dw_regular_13.sql script in Listing 13.2.

 mysql> \. c:\mysql\scripts\dw_regular_13.sql

Here is how the response should look like.

 Database changed Query OK, 9 rows affected (0.07 sec) Query OK, 9 rows affected (0.05 sec) Records: 9  Deleted: 0  Skipped: 0  Warnings: 0 Query OK, 0 rows affected (0.00 sec) Rows matched: 0  Changed: 0  Warnings: 0 Query OK, 0 rows affected (0.00 sec) Records: 0  Duplicates: 0  Warnings: 0 Query OK, 0 rows affected (0.00 sec) Rows matched: 0  Changed: 0  Warnings: 0 Query OK, 0 rows affected (0.00 sec) Records: 0  Duplicates: 0  Warnings: 0 Query OK, 19 rows affected (0.08 sec) Query OK, 19 rows affected (0.06 sec) Records: 19  Duplicates: 0  Warnings: 0 Query OK, 4 rows affected (0.05 sec) Query OK, 4 rows affected (0.06 sec) Records: 4  Deleted: 0  Skipped: 0  Warnings: 0 Query OK, 0 rows affected (0.00 sec) Rows matched: 0  Changed: 0  Warnings: 0 Query OK, 0 rows affected (0.00 sec) Records: 0  Duplicates: 0  Warnings: 0 Query OK, 0 rows affected (0.00 sec) Records: 0  Duplicates: 0  Warnings: 0 Query OK, 3 rows affected (0.07 sec) Records: 3  Duplicates: 0  Warnings: 0 Query OK, 3 rows affected (0.12 sec) Records: 3  Duplicates: 0  Warnings: 0

If you query the sales_order_fact table, you will learn that the three new sales orders have values for the request_delivery_date_sk column, whereas the older records don't.

Here is a command for querying the sales_order_face table.

 mysql> select a.order_sk, request_delivery_date_sk     -> from sales_order_fact a, date_dim b     -> where a.order_date_sk = b.date_sk;

Here is the result of the query.

 +----------+--------------------------+ | order_sk | request_delivery_date_sk | +----------+--------------------------+ |        1 |                     NULL | |        2 |                     NULL | |        3 |                     NULL | |        5 |                     NULL | |        4 |                     NULL | |        6 |                     NULL | |        7 |                     NULL | |        8 |                     NULL | |        9 |                     NULL | |       10 |                     NULL | |       11 |                     NULL | |       12 |                     NULL | |       13 |                     NULL | |       14 |                     NULL | |       15 |                     NULL | |       16 |                     NULL | |       17 |                     NULL | |       18 |                     NULL | |       19 |                     NULL | |       20 |                     NULL | |       21 |                     NULL | |       22 |                     NULL | |       23 |                     NULL | |       24 |                     NULL | |       25 |                     NULL | |       26 |                     NULL | |       27 |                     NULL | |       28 |                     NULL | |       29 |                     NULL | |       30 |                     NULL | |       31 |                     NULL | |       32 |                     NULL | |       33 |                     NULL | |       34 |                     NULL | |       35 |                     NULL | |       36 |                     NULL | |       37 |                     NULL | |       38 |                     NULL | |       39 |                     NULL | |       40 |                     NULL | |       41 |                     NULL | |       42 |                     NULL | |       43 |                     NULL | |       44 |                     NULL | |       45 |                      760 | |       46 |                      760 | |       47 |                      760 | +----------+--------------------------+ 47 rows in set (0.01 sec) 

You can verify that the date for the request_delivery_date_sk 760 is March 30, 2007; using this query.

 mysql> select date_sk, date from date_dim where date_sk = 760;

The response shows that 760 has a value of 2007-03-30.

 +---------+------------+ | date_sk | date       | +---------+------------+ |     760 | 2007-03-30 | +---------+------------+ 1 row in set (0.00 sec)

Now that you have revised the schema and the regular population script, you're ready to implement the date dimension role-playing. The next two sections show how you can use each type of dimension role playing implementation types, table alias and database view.



Dimensional Data Warehousing with MySQL. A Tutorial
Dimensional Data Warehousing with MySQL: A Tutorial
ISBN: 0975212826
EAN: 2147483647
Year: 2004
Pages: 149

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net