In this chapter you learned two types of subset dimensions. The month subset dimension is an example of a roll-up dimension, which is a higher level dimension populated from its more detailed base dimension. The PA customer is a specific subset dimension;
In the
This chapter teaches you dimension role-playing, a technique you use when a fact needs a dimension more than once. For example, if the sales order fact has more than one date, say an order date and a request delivery date, you need to use the date dimension more than once.
In this chapter you also learn two types of dimension role playing
In this section I show you how to add a request delivery date,
The first thing you need to do is add a request_delivery_date column to the sales_order_fact table. The schema now looks like that in Figure 13.1.
Figure 13.1:
Adding request_delivery_date to sales_order_fact
You can use the script in Listing 13.1 to add the request_delivery_date column.
Listing 13.1: Adding the request_delivery_date_sk column
|
|
/**********************************************************************/ /* */ /* request_delivery_date_sk.sql */ /* */ /**********************************************************************/ USE dw; ALTER TABLE sales_order_fact ADD request_delivery_date_sk INT AFTER order_date_sk ; /* end of script */
|
|
You run the script in Listing 13.1 using this command.
mysql> \. c:\mysql\scripts\request_delivery_date_sk.sql
MySQL will
Database changed Query OK, 44 rows affected (0.39 sec) Records: 44 Duplicates: 0 Warnings: 0
Since the structure of the sales_order_fact table has changed, you also need to update the script for regularly populating this table. The new script is shown in Listing 13.2.
Listing 13.2: The revised daily DW regular population
|
|
/**********************************************************************/ /* */ /* dw_regular_13.sql */ /* */ /**********************************************************************/ USE dw; /* CUSTOMER_DIM POPULATION */ TRUNCATE customer_stg; LOAD DATA INFILE 'customer.csv' INTO TABLE customer_stg FIELDS TERMINATED BY ' , ' OPTIONALLY ENCLOSED BY "" LINES TERMINATED BY '\r\n' IGNORE 1 LINES ( customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , shipping_address , shipping_zip_code , shipping_city , shipping_state ) ; /* SCD 2 ON ADDRESSES */ UPDATE customer_dim a , customer_stg b SET a.expiry_date - SUBDATE (CURRENT_DATE, 1) WHERE a.customer_number = b.customer_number AND ( a.customer_street_address <> b.customer_street_address OR a.customer_city <> b.customer_city OR a.customer_zip_code <> b.customer_zip_code OR a.customer_state <> b.customer_state OR a.shipping_address <> b.shipping_address OR a.shipping_city <> b.shipping_city OR a.shipping_zip_code <> b.shipping_zip_code OR a.shipping_state <> b.shipping_state OR a.shipping_address IS NULL OR a.shipping_city IS NULL OR a.shipping_zip_code IS NULL OR a.shipping_state IS NULL) AND expiry_date - '9999-12-31' ; INSERT INTO customer_dim SELECT NULL , b.customer_number , b.customer_name , b.customer_street_address , b.customer_zip_code , b.customer_city , b.customer_state , b.shipping_address , b.shipping_zip_code , b.shipping_city , b.shipping_state , CURRENT_DATE , '9999-12-31' FROM customer_dim a , customer_stg b WHERE a.customer_number = b.customer_number AND ( a.customer_street_address <> b.customer_street_address OR a.customer_city <> b.customer_city OR a.customer_zip_code <> b.customer_zip_code OR a.customer_state <> b.customer_state OR a.shipping_address <> b.shipping_address OR a.shipping_city <> b.shipping_city OR a.shipping_zip_code <> b.shipping_zip_code OR a.shipping_state <> b.shipping_state OR a.shipping_address IS NULL OR a.shipping_city IS NULL OR a.shipping_zip_code IS NULL OR a.shipping_state IS NULL) AND EXISTS ( SELECT * FROM customer_dim x WHERE b.customer_number = x.customer_number AND a.expiry_date = SUBDATE (CURRENT_DATE, 1)) AND NOT EXISTS ( SELECT * FROM customer_dim y WHERE b.customer_number = y.customer_number AND y.expiry_date - '9999-12-31') ; /* END OF SCD 2 */ /* SCD 1 ON NAME */ UPDATE customer_dim a, customer_stg b SET a.customer_name = b.customer_name WHERE a.customer_number = b.customer_number AND a.expiry_date = '9999-12-31' AND a.customer_name <> b.customer_name ; /* ADD NEW CUSTOMER */ INSERT INTO customer_dim SELECT NULL , customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , shipping_address , shipping_zip_code , shipping_city , shipping_state , CURRENT_DATE , '9999-12-31' FROM customer_stg WHERE customer_number NOT IN( SELECT a.customer_number FROM customer_dim a , customer_stg b WHERE b.customer_number = a.customer_number ) ; /* RE-BUILD PA CUSTOMER DIMENSION */ TRUNCATE pa_customer_dim; INSERT INTO pa_customer_dim SELECT customer_sk , customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , shipping_address , shipping_zip_code , shipping_city , shipping_state , effective_date , expiry_date FROM customer_dim WHERE customer_state = 'PA' ; /* END OF CUSTOMER_DIM POPULATION */ /* PRODUCT_DIM POPULATION */ TRUNCATE product_stg ; LOAD DATA INFILE 'product.txt' INTO TABLE product_stg FIELDS TERMINATED BY '' OPTIONALLY ENCLOSED BY '' LINES TERMINATED BY '\r\n' IGNORE 1 LINES ( product_code , product_name , product_category ) ; /* SCD2 ON PRODUCTNAME AND GROUP */ UPDATE product_dim a , product_stg b SET expiry_date = SUBDATE (CURRENT_DATE, 1) WHERE a.product_code = b.product_code AND ( a.product_name <> b.product_name OR a.product_category <> b.product_category AND expiry_date = '9999-12-31' ; INSERT INTO product_dim SELECT NULL , b.product_code , b.product_name , b.product_category , CURRENT_DATE , '9999-12-31' FROM product_dim a , product_stg b WHERE a.product_code = b.product_code AND ( a.product_name <> b.product_name OR a.product_category <> b.product_category ) AND EXISTS ( SELECT * FROM product_dim x WHERE b.product_code = x.product_code AND a.expiry_date = SUBDATE (CURRENT_DATE, 1)) AND NOT EXISTS ( SELECT * FROM product_dim y WHERE b.product_code = y.product_code AND y.expiry_date = '9999-12-31') ; /* END OF SCD 2 */ /* ADD NEW PRODUCT */ INSERT INTO product_dim SELECT NULL , product_code , product_name , product_category , CURRENT_DATE , '9999-12-31' FROM product_stg WHERE product_code NOT IN( SELECT y.product_code FROM product_dim x, product_stg y WHERE x.product_code = y.product_code ) ; /* END OF PRODUCT_DIM POPULATION */ /* ORDER_DIM POPULATION */ INSERT INTO order_dim ( order_sk , order_number , effective_date , expiry_date ) SELECT NULL , order_number , order_date , '9999-12-31' FROM source.sales_order WHERE entry_date = CURRENT_DATE ; /* END OF ORDER_DIM POPULATION */ /* SALES_ORDER_FACT POPULATION */ INSERT INTO sales_order_fact SELECT order_sk , customer_sk , product_sk , e.date_sk , f.date_sk , order_amount , order_quantity FROM source.sales_order a , order_dim b , customer_dim c , product_dim d , date_dim e , date_dim f WHERE a.order_number = b.order_number AND a.customer_number = c.customer_number AND a.order_date >= c.effective_date AND a.order_date <= c.expiry_date AND a.product_code = d.product_code AND a.order_date >= d.effective_date AND a.order_date <= d.expiry_date AND a.order_date = e.date AND a.request_delivery_date = f.date AND a.entry_date = CURRENT_DATE ; /* end of script */
|
|
Before you can test the revised regular population for the sales_order_fact table, you need to modify the sales_order table in the source database. To be precise, you need to add a request_delivery_date column to this table. The script in Listing 13.3 can help you achieve this.
Listing 13.3: Adding the request_delivery_date column to the sales_order table
|
|
/*********************************************************************/ /* */ /* request_delivery_date.sql */ /* */ /*********************************************************************/ USE source; ALTER TABLE sales_order ADD request_delivery_date DATE AFTER order_date ; /* end of script */
|
|
Run the script in Listing 13.3 using this command.
mysql> \. c:\mysql\scripts\request_delivery_date.sql
MySQL will indicate that there are 46 rows affected.
Database changed Query OK, 46 rows affected (0.41 sec) Records: 46 Duplicates: 0 Warnings: 0
In addition to changing the sales_order table structure, you also need to add more sales orders to the sales_order table using the script in Listing 13.4.
Listing 13.4: Adding three sales orders with request delivery dates
|
|
/**********************************************************************/ /* */ /* request_delivery_date_source.sql */ /* */ /**********************************************************************/ USE source; INSERT INTO sales_order VALUES (47, 1, 1, '2007-03-04', '2007-03-30', '2007-03-04', 7500, 75) , (48, 2, 2, '2007-03-04', '2007-03-30', '2007-03-04', 1000, 10) , (49, 3, 3, '2007-03-04', '2007-03-30', '2007-03-04', 1000, 10) ; /* end of script */
|
|
Now run the script in Listing 13.4.
mysql> \. c:\mysql\scripts\request_delivery_date_source.sql
Finally, set your MySQL date to March 4, 2007 (the order date of the three recently added sales orders), and run the dw_regular_13.sql script in Listing 13.2.
mysql> \. c:\mysql\scripts\dw_regular_13.sql
Here is how the response should look like.
Database changed Query OK, 9 rows affected (0.07 sec) Query OK, 9 rows affected (0.05 sec) Records: 9 Deleted: 0 Skipped: 0 Warnings: 0 Query OK, 0 rows affected (0.00 sec) Rows matched: 0 Changed: 0 Warnings: 0 Query OK, 0 rows affected (0.00 sec) Records: 0 Duplicates: 0 Warnings: 0 Query OK, 0 rows affected (0.00 sec) Rows matched: 0 Changed: 0 Warnings: 0 Query OK, 0 rows affected (0.00 sec) Records: 0 Duplicates: 0 Warnings: 0 Query OK, 19 rows affected (0.08 sec) Query OK, 19 rows affected (0.06 sec) Records: 19 Duplicates: 0 Warnings: 0 Query OK, 4 rows affected (0.05 sec) Query OK, 4 rows affected (0.06 sec) Records: 4 Deleted: 0 Skipped: 0 Warnings: 0 Query OK, 0 rows affected (0.00 sec) Rows matched: 0 Changed: 0 Warnings: 0 Query OK, 0 rows affected (0.00 sec) Records: 0 Duplicates: 0 Warnings: 0 Query OK, 0 rows affected (0.00 sec) Records: 0 Duplicates: 0 Warnings: 0 Query OK, 3 rows affected (0.07 sec) Records: 3 Duplicates: 0 Warnings: 0 Query OK, 3 rows affected (0.12 sec) Records: 3 Duplicates: 0 Warnings: 0
If you query the sales_order_fact table, you will learn that the three new sales orders have values for the request_delivery_date_sk column, whereas the older records don't.
Here is a command for querying the sales_order_face table.
mysql> select a.order_sk, request_delivery_date_sk -> from sales_order_fact a, date_dim b -> where a.order_date_sk = b.date_sk;
Here is the result of the query.
+----------+--------------------------+ order_sk request_delivery_date_sk +----------+--------------------------+ 1 NULL 2 NULL 3 NULL 5 NULL 4 NULL 6 NULL 7 NULL 8 NULL 9 NULL 10 NULL 11 NULL 12 NULL 13 NULL 14 NULL 15 NULL 16 NULL 17 NULL 18 NULL 19 NULL 20 NULL 21 NULL 22 NULL 23 NULL 24 NULL 25 NULL 26 NULL 27 NULL 28 NULL 29 NULL 30 NULL 31 NULL 32 NULL 33 NULL 34 NULL 35 NULL 36 NULL 37 NULL 38 NULL 39 NULL 40 NULL 41 NULL 42 NULL 43 NULL 44 NULL 45 760 46 760 47 760 +----------+--------------------------+ 47 rows in set (0.01 sec)
You can verify that the date for the request_delivery_date_sk 760 is March 30, 2007; using this query.
mysql> select date_sk, date from date_dim where date_sk = 760;
The response shows that 760 has a value of 2007-03-30.
+---------+------------+ date_sk date +---------+------------+ 760 2007-03-30 +---------+------------+ 1 row in set (0.00 sec)
Now that you have revised the schema and the regular population script, you're ready to implement the date dimension role-playing. The next two sections show how you can use each type of dimension role playing implementation types, table alias and database view.