Chapter 22: Late Arrival Facts


A fact is late if it is loaded at a later date than its effective date. Take as an example a sales order that is entered into the source data later than its order date. When the sales order is then loaded into its fact table, the loading date takes place after its order date and the sales order is therefore a late arrival fact.

Late arrival facts affect the population of periodic snapshot fact tables, such as the month_end_sales_order_fact table discussed in Chapter 14, “Snapshots.” For instance, if the month end snapshot sales order amount of March 2007 has been calculated and stored in the month_end_sales_order_fact table, and a late sales order with a March 10 order date is loaded, the March 2007 snapshot amount must be re-calculated to account for the late fact.

This chapter teaches you how to handle late arrival fact.

Handling Late Arrival Facts

In this section I explain how to handle late arrival sales orders when populating the month_end_sales_order_fact table.

First of all, in order for you to know if a sales order is late, you need to load the entry date from the sales order source to the sales_order_fact table. Since you don’t have a column for the entry date, you need to add a new one to the fact table. Similar to adding the request delivery date in Chapter 13, “Dimension Role Playing,” you apply the dimension role playing technique to add the entry date. Therefore, add a date surrogate key column in the sales order fact table named entry_date_sk and create a database view from the date dimension table called entry_date_dim. The script in Listing 22.1 creates the entry_date_dim view and add the entry_date_sk surrogate key column in the sales order fact table.

Note 

You might want to review the dimension role-playing technique in Chapter 13 to understand why you need to add the surrogate key column and create a database view.

Listing 22.1: Adding the entry_date column

image from book
 /****************************************************************/ /*                                                              */ /* entry_date.sql                                               */ /*                                                              */ /****************************************************************/ USE dw; CREATE VIEW entry_date_dim (   entry_date_sk , entry_date , month_name , month , quarter , year , promo_ind , effective_date , expiry_date ) AS SELECT   date_sk , date , month_name , month , quarter , year , promo_ind , effective_date , expiry_date FROM date_dim ; ALTER TABLE sales_order_fact ADD entry_date_sk INT AFTER receive_date_sk ; /* end of script                                                */
image from book

To run the script in Listing 22.1, type in the following command into the MySQL console.

 mysql> \. c:\mysql\scripts\entry_date.sql

You will see the following as the response.

 Database changed Query OK, 0 rows affected (0.02 sec) Query OK, 59 rows affected (0.49 sec) Records: 59  Duplicates: 0  Warnings: 0

After creating the entry_date_dim view and adding the entry_date_sk column to the sales_order_fact table, you now need to revise the data warehouse regular population script to include the entry date. Listing 22.2 shows the revised regular population script. Note that the sales_order source already contains entry dates, but we did not previously load it into the data warehouse.

Listing 22.2: The revised daily DW regular population

image from book
 /***********************************************************************/ /*                                                                     */ /* dw_regular_22.sql                                                   */ /*                                                                     */ /***********************************************************************/ USE dw; /* CUSTOMER_DIM POPULATION                                             */ TRUNCATE customer_stg; LOAD DATA INFILE 'customer.csv' INTO TABLE customer_stg FIELDS TERMINATED BY ', ' OPTIONALLY ENCLOSED BY "" LINES TERMINATED BY '\r\n' IGNORE 1 LINES ( customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , shipping_address , shipping_zip_code , shipping_city , shipping_state ) ; /* SCD 2 ON ADDRESSES                                                  */ UPDATE   customer_dim a , customer_stg b SET   a.expiry_date = SUBDATE (CURRENT_DATE, 1) WHERE   a.customer_number = b.customer_number AND (   a.customer_street_address <> b.customer_street_address      OR a.customer_city <> b.customer_city      OR a.customer_zip_code <> b.customer_zip_code      OR a.customer_state <> b.customer_state      OR a.shipping_address <> b.shipping_address      OR a.shipping_city <> b.shipping_city      OR a.shipping_zip_code <> b.shipping_zip_code      OR a.shipping_state <> b.shipping_state      OR a.shipping_address IS NULL      OR a.shipping_city IS NULL      OR a.shipping_zip_code IS NULL      OR a.shipping_state IS NULL) AND expiry_date = '9999-12-31' ; INSERT INTO customer_dim SELECT   NULL , b.customer_number , b.customer_name , b.customer_street_address , b.customer_zip_code , b.customer_city , b.customer_state , b.shipping_address , b.shipping_zip_code , b.shipping_city , b.shipping_state , CURRENT_DATE , '9999-12-31' FROM   customer_dim a , customer_stg b WHERE     a.customer_number = b.customer_number AND (   a.customer_street_address <> b.customer_street_address      OR a.customer_city <> b.customer_city      OR a.customer_zip_code <> b.customer_zip_code      OR a.customer_state <> b.customer_state      OR a.shipping_address <> b.shipping_address      OR a.shipping_city <> b.shipping_city      OR a.shipping_zip_code <> b.shipping_zip_code      OR a.shipping_state <> b.shipping_state      OR a.shipping_address IS NULL      OR a.shipping_city IS NULL      OR a.shipping_zip_code IS NULL      OR a.shipping_state IS NULL) AND EXISTS ( SELECT * FROM customer_dim x WHERE b.customer_number = x.customer_number AND a.expiry_date = SUBDATE (CURRENT_DATE, 1)) AND NOT EXISTS ( SELECT * FROM customer_dim y WHERE     b.customer_number = y.customer_number       AND y.expiry_date = '9999-12-31') ; /* END OF SCD 2                                                        */ /* SCD 1 ON NAME                                                       */ UPDATE customer_dim a, customer_stg b SET a.customer_name = b.customer_name WHERE     a.customer_number = b.customer_number       AND a.expiry_date = '9999-12-31'       AND a.customer_name <> b.customer_name ; /* ADD NEW CUSTOMER */ INSERT INTO customer_dim SELECT   NULL , customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , shipping_address , shipping_zip_code , shipping_city , shipping_state , CURRENT_DATE , '9999-12-31' FROM customer_stg WHERE customer_number NOT IN( SELECT a.customer_number FROM   customer_dim a , customer_stg b WHERE b.customer_number = a.customer_number ) ; /* RE-BUILD PA CUSTOMER DIMENSION                                      */ TRUNCATE pa_customer_dim; INSERT INTO pa_customer_dim SELECT   customer_sk , customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , shipping_address , shipping_zip_code , shipping_city , shipping_state , effective_date , expiry_date FROM customer_dim WHERE customer_state = 'PA' ; /* END OF CUSTOMER_DIM POPULATION                                      */ /* PRODUCT_DIM POPULATION TRUNCATE product_stg ; LOAD DATA INFILE 'product.txt' INTO TABLE product_stg FIELDS TERMINATED BY '' OPTIONALLY ENCLOSED BY '' LINES TERMINATED BY '\r\n' IGNORE 1 LINES ( product_code , product_name , product_category ) ; /* SCD2 ON PRODUCT NAME AND GROUP                                      */ UPDATE   product_dim a , product_stg b SET   expiry_date = SUBDATE (CURRENT_DATE, 1) WHERE     a.product_code = b.product_code AND (   a.product_name <> b.product_name      OR a.product_category <> b.product_category )      AND expiry_date = '9999-12-31' ; INSERT INTO product_dim SELECT   NULL , b.product_code , b.product_name , b.product_category , CURRENT_DATE , '9999-12-31' FROM   product_dim a , product_stg b WHERE     a.product_code = b.product_code AND (   a.product_name <> b.product_name      OR a.product_category <> b.product_category ) AND EXISTS ( SELECT * FROM product_dim x WHERE     b.product_code = x.product_code       AND a.expiry_date = SUBDATE (CURRENT_DATE, 1) AND NOT EXISTS ( SELECT * FROM product_dim y WHERE    b.product_code = y.product_code      AND y.expiry_date = '9999-12-31') ; /* END OF SCD 2                                                        */ /* ADD NEW PRODUCT                                                     */ INSERT INTO product_dim SELECT   NULL , product_code , product_name , product_category , CURRENT_DATE , '9999-12-31' FROM product_stg WHERE product_code NOT IN( SELECT y.product_code FROM product_dim x, product_stg y WHERE x.product_code = y.product_code ) ; /* END OF PRODUCT_DIM POPULATION                                       */ /* PRODUCT_COUNT_FACT POPULATION                                       */ TRUNCATE product_count_fact ; INSERT INTO product_count_fact (product_sk, product_launch_date_sk) SELECT   a.product_sk , b.date_sk FROM   product_dim a , date_dim b WHERE a.effective_date = b.date GROUP BY product_code HAVING COUNT (product_code) = 1 ; /* for products that have been updated by SCD2                         */ INSERT INTO product_count_fact (product_sk, product_launch_date_sk) SELECT   a.product_sk , b.date_sk FROM   product_dim a , date_dim b WHERE a.effective_date = b.date GROUP BY product_code HAVING COUNT (product_code) > 1 ; /* END OF PRODUCT_COUNT_FACT POPULATION                                */ /* INSERT NEW ORDERS                                                   */ INSERT INTO sales_order_fact SELECT   b.customer_sk , c.product_sk , f.sales_order_attribute_sk , d.order_date_sk , NULL , NULL , NULL , NULL , g.entry_date_sk , a.order_number , e.request_delivery_date_sk , order_amount , quantity , NULL , NULL , NULL , NULL FROM   source.sales_order a , customer_dim b , product_dim c , order_date_dim d , request_delivery_date_dim e , sales_order_attribute_dim f , entry_date_dim g WHERE       order_status = 'N' AND a.entry_date = CURRENT_DATE AND a.customer_number = b.customer_number AND a.status_date >= b.effective_date AND a.status_date <= b.expiry_date AND a.product_code = c.product_code AND a.status_date >= c.effective_date AND a.status_date <= c.expiry_date AND a.status_date = d.order_date AND a.entry_date = g.entry_date AND a.request_delivery_date = e.request_delivery_date AND a.verification_ind = f.verification_ind AND a.credit_check_flag = f.credit_check_flag AND a.new_customer_ind = f.new_customer_ind AND a.web_order_flag = f.web_order_flag AND a.status_date >= f.effective_date AND a.status_date <= f.expiry_date ; UPDATE   sales_order_fact a , source.sales_order b , allocate_date_dim c SET   a.allocate_date_sk = c.allocate_date_sk , a.allocate_quantity = b.quantity WHERE     order_status = 'A' AND b.entry_date = CURRENT_DATE AND b.order_number = a.order_number AND c.allocate date = b.status date ; UPDATE   sales_order_fact a , source.sales_order b , packing_date_dim d SET   a.packing_date_sk = d.packing_date_sk , a.packing_quantity = b.quantity WHERE     order_status = 'p' AND b.entry_date = CURRENT_DATE AND b.order_number = a.order_number AND d.packing_date = b.status_date ; UPDATE   sales_order_fact a , source.sales_order b , ship_date_dim e SET   a.ship_date_sk = e.ship_date_sk , a.ship_quantity = b.quantity WHERE     order_status = 'S' AND b.entry_date - CURRENT_DATE AND b.order_number = a.order_number AND e.ship_date = b.status_date ; UPDATE   sales_order_fact a , source.sales_order b , receive_date_dim f SET   a.receive_date_sk = f.receive_date_sk , a.receive_quantity = b.quantity WHERE     order_status = 'R' AND b.entry_date = CURRENT_DATE AND b.order_number = a.order_number AND f.receive_date = b.status_date ; /* end of script                                                       */
image from book

You must also revise the month end population script. The revised script, presented in listing 22.3, has three parts. The first part handles sales orders that are not late. The second part adds incoming sales amounts to the existing rows that have the same product and month as the incoming sales orders. The third part adds new rows for the incoming sales orders.

Listing 22.3: Revised Month End Sales Order population

image from book
 /**********************************************************************/ /*                                                                    */ /* month_end_sales_order_22.sql                                       */ /*                                                                    */ /**********************************************************************/ USE dw; /* normal (order_date = entry_date)                                   */ INSERT INTO month_end_sales_order_fact SELECT   d.month_sk , a.product_sk , SUM (order_amount) , SUM (order_quantity) FROM   sales_order_fact a , order_date_dim b , entry_date_dim c , month_dim d WHERE     a.order_date_sk = b.order_date_sk AND a.entry_date_sk = c.entry_date_sk AND b.order_date = c.entry_date AND c.month - MONTH (CURRENT_DATE) AND c.year - YEAR (CURRENT_DATE) AND b.month = d.month AND b.year = d.year GROUP BY d.month, d.year, product_sk ; /* late arrival, amount & quantity already exist for the past months */ UPDATE    month_end_sales_order_fact a , (SELECT   y.month , y.year , w.product_sk , SUM (order_amount) order_amount , SUM (order_quantity) order_quantity FROM   sales_order_fact x , order_date_dim y , entry_date_dim z , product_dim w WHERE     x.order_date_sk = y.order_date_sk AND x.entry_date_sk = z.entry_date_sk AND order_date <> entry_date AND MONTH (entry_date) = MONTH (CURRENT_DATE) AND YEAR (entry_date) = YEAR (CURRENT_DATE) AND x.product_sk = w.product_sk GROUP BY y.month, y.year, product_sk) b , month_dim c SET month_order_amount = month_order_amount + b.order_amount , month_order_quantity = month_order_quantity + b.order_quantity WHERE     a.month_order_sk = c.month_sk AND b.month = c.month AND b.year = c.year AND a.product_sk = b.product_sk ; /* late arrival but amount & quantity not exist for the past months */ INSERT INTO month_end_sales_order_fact SELECT   d.month_sk , a.product_sk , SUM (order_amount) , SUM (order_quantity) FROM   sales_order_fact a , order_date_dim b , entry_date_dim c , month_dim d WHERE     a.order date sk = b.order date sk AND a.entry_date_sk = c.entry_date_sk AND b.order_date <> c.entry_date AND c.month = MONTH (CURRENT_DATE) AND c.year = YEAR (CURRENT_DATE) AND b.month = d.month AND b.year = d.year AND NOT EXISTS (SELECT * FROM   month_end_sales_order_fact p , sales_order_fact q , month_dim s  WHERE     p.month_order_sk = s.month_sk AND s.month = d.month AND s.year = d.year AND p.product_sk = a.product_sk ) GROUP BY   d.month , d.year , a.product_sk ; /* end of script                                                      */
image from book



Dimensional Data Warehousing with MySQL. A Tutorial
Dimensional Data Warehousing with MySQL: A Tutorial
ISBN: 0975212826
EAN: 2147483647
Year: 2004
Pages: 149

Similar book on Amazon
The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition)
The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition)
The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleanin
The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleanin
Successful Business Intelligence: Secrets to Making BI a Killer App
Successful Business Intelligence: Secrets to Making BI a Killer App

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net