Flylib.com

Books Software

 
 
 

Dimensional Data Warehousing with MySQL: A Tutorial - page 67


Summary

In this chapter you learned two types of subset dimensions. The month subset dimension is an example of a roll-up dimension, which is a higher level dimension populated from its more detailed base dimension. The PA customer is a specific subset dimension; populated from its base dimension by selecting PA customers only.

In the next chapter, you learn another technique of reusing an existing dimension, called role-playing dimension



Chapter 13: Dimension Role Playing

This chapter teaches you dimension role-playing, a technique you use when a fact needs a dimension more than once. For example, if the sales order fact has more than one date, say an order date and a request delivery date, you need to use the date dimension more than once.

In this chapter you also learn two types of dimension role playing implementations , table alias and database view. Both types use MySQL functions. The table alias type uses the dimension more than once in an SQL statement by assigning an alias for each use. As for the database view type, you create as many views as the number of roles you need the dimension on the fact.

Adding Request Delivery Dates

In this section I show you how to add a request delivery date, revise the data warehouse regular population, and test the revised regular population script. This preparation is for showing the implementation of date dimension role-playing in the next section.

The first thing you need to do is add a request_delivery_date column to the sales_order_fact table. The schema now looks like that in Figure 13.1.

image from book
Figure 13.1: Adding request_delivery_date to sales_order_fact

You can use the script in Listing 13.1 to add the request_delivery_date column.

Listing 13.1: Adding the request_delivery_date_sk column

image from book
/**********************************************************************/ /* */ /* request_delivery_date_sk.sql */ /* */ /**********************************************************************/ USE dw; ALTER TABLE sales_order_fact ADD request_delivery_date_sk INT AFTER order_date_sk ; /* end of script */
image from book

You run the script in Listing 13.1 using this command.

mysql> \. c:\mysql\scripts\request_delivery_date_sk.sql

MySQL will indicate that there are 44 rows affected by the command.

Database changed Query OK, 44 rows affected (0.39 sec) Records: 44 Duplicates: 0 Warnings: 0

Since the structure of the sales_order_fact table has changed, you also need to update the script for regularly populating this table. The new script is shown in Listing 13.2.

Listing 13.2: The revised daily DW regular population

image from book
/**********************************************************************/ /* */ /* dw_regular_13.sql */ /* */ /**********************************************************************/ USE dw; /* CUSTOMER_DIM POPULATION */ TRUNCATE customer_stg; LOAD DATA INFILE 'customer.csv' INTO TABLE customer_stg FIELDS TERMINATED BY ' , ' OPTIONALLY ENCLOSED BY "" LINES TERMINATED BY '\r\n' IGNORE 1 LINES ( customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , shipping_address , shipping_zip_code , shipping_city , shipping_state ) ; /* SCD 2 ON ADDRESSES */ UPDATE customer_dim a , customer_stg b SET a.expiry_date - SUBDATE (CURRENT_DATE, 1) WHERE a.customer_number = b.customer_number AND ( a.customer_street_address <> b.customer_street_address OR a.customer_city <> b.customer_city OR a.customer_zip_code <> b.customer_zip_code OR a.customer_state <> b.customer_state OR a.shipping_address <> b.shipping_address OR a.shipping_city <> b.shipping_city OR a.shipping_zip_code <> b.shipping_zip_code OR a.shipping_state <> b.shipping_state OR a.shipping_address IS NULL OR a.shipping_city IS NULL OR a.shipping_zip_code IS NULL OR a.shipping_state IS NULL) AND expiry_date - '9999-12-31' ; INSERT INTO customer_dim SELECT NULL , b.customer_number , b.customer_name , b.customer_street_address , b.customer_zip_code , b.customer_city , b.customer_state , b.shipping_address , b.shipping_zip_code , b.shipping_city , b.shipping_state , CURRENT_DATE , '9999-12-31' FROM customer_dim a , customer_stg b WHERE a.customer_number = b.customer_number AND ( a.customer_street_address <> b.customer_street_address OR a.customer_city <> b.customer_city OR a.customer_zip_code <> b.customer_zip_code OR a.customer_state <> b.customer_state OR a.shipping_address <> b.shipping_address OR a.shipping_city <> b.shipping_city OR a.shipping_zip_code <> b.shipping_zip_code OR a.shipping_state <> b.shipping_state OR a.shipping_address IS NULL OR a.shipping_city IS NULL OR a.shipping_zip_code IS NULL OR a.shipping_state IS NULL) AND EXISTS ( SELECT * FROM customer_dim x WHERE b.customer_number = x.customer_number AND a.expiry_date = SUBDATE (CURRENT_DATE, 1)) AND NOT EXISTS ( SELECT * FROM customer_dim y WHERE b.customer_number = y.customer_number AND y.expiry_date - '9999-12-31') ; /* END OF SCD 2 */ /* SCD 1 ON NAME */ UPDATE customer_dim a, customer_stg b SET a.customer_name = b.customer_name WHERE a.customer_number = b.customer_number AND a.expiry_date = '9999-12-31' AND a.customer_name <> b.customer_name ; /* ADD NEW CUSTOMER */ INSERT INTO customer_dim SELECT NULL , customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , shipping_address , shipping_zip_code , shipping_city , shipping_state , CURRENT_DATE , '9999-12-31' FROM customer_stg WHERE customer_number NOT IN( SELECT a.customer_number FROM customer_dim a , customer_stg b WHERE b.customer_number = a.customer_number ) ; /* RE-BUILD PA CUSTOMER DIMENSION */ TRUNCATE pa_customer_dim; INSERT INTO pa_customer_dim SELECT customer_sk , customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , shipping_address , shipping_zip_code , shipping_city , shipping_state , effective_date , expiry_date FROM customer_dim WHERE customer_state = 'PA' ; /* END OF CUSTOMER_DIM POPULATION */ /* PRODUCT_DIM POPULATION */ TRUNCATE product_stg ; LOAD DATA INFILE 'product.txt' INTO TABLE product_stg FIELDS TERMINATED BY '' OPTIONALLY ENCLOSED BY '' LINES TERMINATED BY '\r\n' IGNORE 1 LINES ( product_code , product_name , product_category ) ; /* SCD2 ON PRODUCT

NAME

AND GROUP */ UPDATE product_dim a , product_stg b SET expiry_date = SUBDATE (CURRENT_DATE, 1) WHERE a.product_code = b.product_code AND ( a.product_name <> b.product_name OR a.product_category <> b.product_category AND expiry_date = '9999-12-31' ; INSERT INTO product_dim SELECT NULL , b.product_code , b.product_name , b.product_category , CURRENT_DATE , '9999-12-31' FROM product_dim a , product_stg b WHERE a.product_code = b.product_code AND ( a.product_name <> b.product_name OR a.product_category <> b.product_category ) AND EXISTS ( SELECT * FROM product_dim x WHERE b.product_code = x.product_code AND a.expiry_date = SUBDATE (CURRENT_DATE, 1)) AND NOT EXISTS ( SELECT * FROM product_dim y WHERE b.product_code = y.product_code AND y.expiry_date = '9999-12-31') ; /* END OF SCD 2 */ /* ADD NEW PRODUCT */ INSERT INTO product_dim SELECT NULL , product_code , product_name , product_category , CURRENT_DATE , '9999-12-31' FROM product_stg WHERE product_code NOT IN( SELECT y.product_code FROM product_dim x, product_stg y WHERE x.product_code = y.product_code ) ; /* END OF PRODUCT_DIM POPULATION */ /* ORDER_DIM POPULATION */ INSERT INTO order_dim ( order_sk , order_number , effective_date , expiry_date ) SELECT NULL , order_number , order_date , '9999-12-31' FROM source.sales_order WHERE entry_date = CURRENT_DATE ; /* END OF ORDER_DIM POPULATION */ /* SALES_ORDER_FACT POPULATION */ INSERT INTO sales_order_fact SELECT order_sk , customer_sk , product_sk , e.date_sk , f.date_sk , order_amount , order_quantity FROM source.sales_order a , order_dim b , customer_dim c , product_dim d , date_dim e , date_dim f WHERE a.order_number = b.order_number AND a.customer_number = c.customer_number AND a.order_date >= c.effective_date AND a.order_date <= c.expiry_date AND a.product_code = d.product_code AND a.order_date >= d.effective_date AND a.order_date <= d.expiry_date AND a.order_date = e.date AND a.request_delivery_date = f.date AND a.entry_date = CURRENT_DATE ; /* end of script */
image from book

Before you can test the revised regular population for the sales_order_fact table, you need to modify the sales_order table in the source database. To be precise, you need to add a request_delivery_date column to this table. The script in Listing 13.3 can help you achieve this.

Listing 13.3: Adding the request_delivery_date column to the sales_order table

image from book
{% if main.adsdop %}{% include 'adsenceinline.tpl' %}{% endif %}
/*********************************************************************/ /* */ /* request_delivery_date.sql */ /* */ /*********************************************************************/ USE source; ALTER TABLE sales_order ADD request_delivery_date DATE AFTER order_date ; /* end of script */
image from book

Run the script in Listing 13.3 using this command.

mysql> \. c:\mysql\scripts\request_delivery_date.sql

MySQL will indicate that there are 46 rows affected.

Database changed Query OK, 46 rows affected (0.41 sec) Records: 46 Duplicates: 0 Warnings: 0

In addition to changing the sales_order table structure, you also need to add more sales orders to the sales_order table using the script in Listing 13.4.

Listing 13.4: Adding three sales orders with request delivery dates

image from book
/**********************************************************************/ /* */ /* request_delivery_date_source.sql */ /* */ /**********************************************************************/ USE source; INSERT INTO sales_order VALUES (47, 1, 1, '2007-03-04', '2007-03-30', '2007-03-04', 7500, 75) , (48, 2, 2, '2007-03-04', '2007-03-30', '2007-03-04', 1000, 10) , (49, 3, 3, '2007-03-04', '2007-03-30', '2007-03-04', 1000, 10) ; /* end of script */
image from book

Now run the script in Listing 13.4.

mysql> \. c:\mysql\scripts\request_delivery_date_source.sql

Finally, set your MySQL date to March 4, 2007 (the order date of the three recently added sales orders), and run the dw_regular_13.sql script in Listing 13.2.

mysql> \. c:\mysql\scripts\dw_regular_13.sql

Here is how the response should look like.

Database changed Query OK, 9 rows affected (0.07 sec) Query OK, 9 rows affected (0.05 sec) Records: 9 Deleted: 0 Skipped: 0 Warnings: 0 Query OK, 0 rows affected (0.00 sec) Rows matched: 0 Changed: 0 Warnings: 0 Query OK, 0 rows affected (0.00 sec) Records: 0 Duplicates: 0 Warnings: 0 Query OK, 0 rows affected (0.00 sec) Rows matched: 0 Changed: 0 Warnings: 0 Query OK, 0 rows affected (0.00 sec) Records: 0 Duplicates: 0 Warnings: 0 Query OK, 19 rows affected (0.08 sec) Query OK, 19 rows affected (0.06 sec) Records: 19 Duplicates: 0 Warnings: 0 Query OK, 4 rows affected (0.05 sec) Query OK, 4 rows affected (0.06 sec) Records: 4 Deleted: 0 Skipped: 0 Warnings: 0 Query OK, 0 rows affected (0.00 sec) Rows matched: 0 Changed: 0 Warnings: 0 Query OK, 0 rows affected (0.00 sec) Records: 0 Duplicates: 0 Warnings: 0 Query OK, 0 rows affected (0.00 sec) Records: 0 Duplicates: 0 Warnings: 0 Query OK, 3 rows affected (0.07 sec) Records: 3 Duplicates: 0 Warnings: 0 Query OK, 3 rows affected (0.12 sec) Records: 3 Duplicates: 0 Warnings: 0

If you query the sales_order_fact table, you will learn that the three new sales orders have values for the request_delivery_date_sk column, whereas the older records don't.

Here is a command for querying the sales_order_face table.

mysql> select a.order_sk, request_delivery_date_sk -> from sales_order_fact a, date_dim b -> where a.order_date_sk = b.date_sk;

Here is the result of the query.

+----------+--------------------------+  order_sk  request_delivery_date_sk  +----------+--------------------------+  1  NULL   2  NULL   3  NULL   5  NULL   4  NULL   6  NULL   7  NULL   8  NULL   9  NULL   10  NULL   11  NULL   12  NULL   13  NULL   14  NULL   15  NULL   16  NULL   17  NULL   18  NULL   19  NULL   20  NULL   21  NULL   22  NULL   23  NULL   24  NULL   25  NULL   26  NULL   27  NULL   28  NULL   29  NULL   30  NULL   31  NULL   32  NULL   33  NULL   34  NULL   35  NULL   36  NULL   37  NULL   38  NULL   39  NULL   40  NULL   41  NULL   42  NULL   43  NULL   44  NULL   45  760   46  760   47  760  +----------+--------------------------+ 47 rows in set (0.01 sec)

You can verify that the date for the request_delivery_date_sk 760 is March 30, 2007; using this query.

mysql> select date_sk, date from date_dim where date_sk = 760;

The response shows that 760 has a value of 2007-03-30.

+---------+------------+  date_sk  date  +---------+------------+  760  2007-03-30  +---------+------------+ 1 row in set (0.00 sec)

Now that you have revised the schema and the regular population script, you're ready to implement the date dimension role-playing. The next two sections show how you can use each type of dimension role playing implementation types, table alias and database view.