Now that you have identified your data sources, it’s time to write a script that can be used for initial population. Assuming your data warehouse will start operation on March 1, 2007 and your user wants to load two years of historical data, you need to load the source data dated from March 1, 2005 to February 28, 2007.
The script in Listing 7.1 serves as an example. Note that:
The customer dimension's and the product dimension's effective date is March 1, 2007. No sales order loaded has an earlier date, meaning no earlier customer and product dimensions are required.
The effective date of the order dimension is, of course, the order date.
The surrogate key columns of the sales order fact table are populated from the dimension surrogate keys.
You need to pre-populate the date_dim table separately using pre-population with dates from March 1, 2005 to, say, December 31, 2010.
Listing 7.1: DW initial population
/*****************************************************************/ /* */ /* dw_initial.sql */ /* */ /*****************************************************************/ USE dw; LOAD DATA INFILE 'customer.csv' INTO TABLE customer_stg FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES ( customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state ) ; INSERT INTO customer_dim SELECT NULL , customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , '2005-03-01' , '9999-12-31' FROM customer_stg ; LOAD DATA INFILE 'product.txt' INTO TABLE product_stg FIELDS TERMINATED BY '' OPTIONALLY ENCLOSED BY '' LINES TERMINATED BY '\r\n' IGNORE 1 LINES ( product_code , product_name , product_category ) ; INSERT INTO product_dim SELECT NULL , product_code , product_name , product_category , '2005-03-01' , '9999-12-31' FROM product_stg ; INSERT INTO order_dim SELECT NULL , order_number , order_date , '9999-12-31' FROM source.sales_order WHERE order_date >= '2005-03-01' AND order_date < '2007-02-28' ; INSERT INTO sales_order_fact SELECT order_sk , customer_sk , product_sk , date_sk , order_amount FROM source.sales_order a , order_dim b , customer_dim c , product_dim d , date_dim e WHERE a.order_number = b.order_number AND a.customer_number = c.customer_number AND a.product_code = d.product_code AND a.order_date = e.date AND order_date >= '2005-03-01' AND order_date < '2007-02-28' ; /* end of script */