Before you run the initial population script in Listing 7.1, you need to perform the following steps.
Clearing all the tables, including the staging tables
Pre-populating the date dimension
Preparing customer and product test data
Creating test data in the source database
These steps are discussed in the subsections below.
To clear all tables, use the script in Listing 7.2.
Listing 7.2: Script for truncating the tables
/*****************************************************************/ /* */ /* truncate_tables.sql */ /* */ /*****************************************************************/ USE dw; TRUNCATE customer_dim; TRUNCATE product_dim; TRUNCATE order_dim; TRUNCATE date_dim; TRUNCATE sales_order_fact; TRUNCATE customer_stg; TRUNCATE product_stg; USE source; TRUNCATE sales_order; /* end of script */
You run the script in Listing 7.2 by invoking this command.
mysql> \. c:\mysql\scripts\truncate_tables.sql
MySQL will respond by telling you how many records are deleted from each table.
Database changed Query OK, 7 rows affected (0.17 sec) Query OK, 4 rows affected (0.05 sec) Query OK, 20 rows affected (0.06 sec) Query OK, 5 rows affected (0.06 sec) Query OK, 20 rows affected (0.05 sec) Query OK, 7 rows affected (0.06 sec) Query OK, 3 rows affected (0.06 sec) Database changed Query OK, 8 rows affected (0.07 sec)
Once all the tables are empty, you can pre-populate the date dimension for the period of March 1, 2005 to December 31, 2010 by executing the pre_populate_date stored procedure discussed in Chapter 6, “Populating the Date Dimension.” Make sure that the current database is dw by calling:
mysql> use dw;
Then, call the stored procedure, passing the start date and the end date as arguments.
mysql> call pre_populate_date ('2005-03-01', '2010-12-31');
The next step after pre-populating the date_dim table is to prepare the customer and product test data. I've prepared the customer data in a CSV file and the product data in a fixed-width text file.
Here is the customer source data.
CUSTOMER NO,CUSTOMER NAME,STREET ADDRESS,ZIP CODE,CITY,STATE 1,Really Large Customers, 7500 Louise Dr.,17050, Mechanicsburg,PA 2,Small Stores, 2500 Woodland St.,17055, Pittsburgh,PA 3,Medium Retailers,1111 Ritter Rd.,17055,Pittsburgh,PA 4,Good Companies,9500 Scott St.,17050,Mechanicsburg,PA 5,Wonderful Shops,3333 Rossmoyne Rd.,17050,Mechanicsburg,PA 6,Loyal Clients,7070 Ritter Rd.,17055,Pittsburgh,PA 7,Distinguished Partners,9999 Scott St.,17050,Mechanicsburg,PA
And here is the product data.
PRODUCT CODE,PRODUCT NAME,PRODUCT GROUP 1 Hard Disk Drive Storage 2 Floppy Drive Storage 3 LCD Panel Monitor
The last set of test data you need is sales orders. The script in Listing 7.3 can be used to insert 21 sales orders into the sales_order table in the source database.
Listing 7.3: Sales orders for testing initial population
/*****************************************************************/ /* */ /* sales_order_initial.sql */ /* */ /*****************************************************************/ USE source; INSERT INTO sales_order VALUES (1, 1, 1, '2005-02-01', '2005-02-01', 1000) , (2, 2, 2, '2005-02-10', '2005-02-10', 1000) , (3, 3, 3, '2005-03-01', '2005-03-01', 4000) , (4, 4/ 1, '2005-04-15', '2005-04-15', 4000) , (5, 5, 2, '2005-05-20', '2005-05-20', 6000) , (6, 6, 3, '2005-07-30', '2005-07-30', 6000) , (7, 7, 1, '2005-09-01', '2005-09-01', 8000) , (8, 1, 2, '2005-11-10', '2005-11-10', 8000) , (9, 2, 3, '2006-01-05', '2006-01-05', 1000) , (10, 3, 1, '2006-02-10', '2006-02-10', 1000) , (11, 4, 2, '2006-03-15', '2006-03-15', 2000) , (12, 5, 3, '2006-04-20', '2006-04-20', 2500) , (13, 6, 1, '2006-05-30', '2006-05-30', 3000) , (14, 7, 2, '2006-06-01', '2006-06-01', 3500) , (15, 1, 3, '2006-07-15', '2006-07-15', 4000) , (16, 2, 1, '2006-08-30', '2006-08-30', 4500) , (17, 3, 2, '2006-09-05', '2006-09-05', 1000) , (18, 4, 3, '2006-10-05', '2006-10-05', 1000) , (19, 5, 1, '2007-01-10', '2007-01-10', 4000) , (20, 6, 2, '2007-02-20', '2007-02-20', 4000) , (21, 7, 3, '2007-02-28', '2007-02-28', 4000) ; /* end of script */
Now, run the sales_order_initial.sql script by using this command.
mysql> \. c:\mysql\scripts\sales_order_initial.sql
You’ll see that 21 new rows were inserted.
Database changed Query OK, 21 rows affected (0.05 sec) Records: 21 Duplicates: 0 Warnings: 0
Note | The historical data to load is from March 1, 2005. Therefore, the first two orders will not be loaded. |
Now that the test data is ready, it's time to perform the initial population. First of all, however, you need to set your MySQL date to 28 February, 2007, one day before the start of your data warehouse operation. You run the initial population at the end of the day, when no more data gets in and the source data is ready.
You are now ready to test the initial population. Run the dw_initial.sql population script in Listing 7.1 by using this command.
mysql> \. c:\mysql\scripts\dw_initial.sql
On your console, you should see the following messages.
Database changed Query OK, 7 rows affected (0.06 sec) Records: 7 Deleted: 0 Skipped: 0 Warnings: 0 Query OK, 7 rows affected (0.06 sec) Records: 7 Duplicates: 0 Warnings: 0 Query OK, 3 rows affected (0.06 sec) Records: 3 Deleted: 0 Skipped: 0 Warnings: 0 Query OK, 3 rows affected (0.07 sec) Records: 3 Duplicates: 0 Warnings: 0 Query OK, 19 rows affected (0.06 sec) Records: 19 Duplicates: 0 Warnings: 0 Query OK, 19 rows affected (0.10 sec) Records: 19 Duplicates: 0 Warnings: 0
You can use the query in Listing 7.4 to check that nineteen sales orders have been loaded correctly.
Listing 7.4: Query to confirm the sales orders are loaded correctly
/*****************************************************************/ /* */ /* confirm_initial_population.sql */ /* */ /*****************************************************************/ USE dw; SELECT order_number on , customer_name , product_name , date , order_amount amount FROM sales_order_fact a , customer_dim b , product_dim c , order_dim d , date_dim e WHERE a.customer_sk = b.customer_sk AND a.product_sk = c.product_sk AND a.order_sk = d.order_sk AND a.order date sk = e.date_sk ; /* end of script */
Run the query in Listing 7.4 by using this command.
mysql> \. c:\mysql\scripts\confirm_initial_population.sql
The result should be as follows.
Database changed +--+------------------------+-----------------+-----------+---------+ |no| customer_name | product _name | date | amount | +--+------------------------+-----------------+-----------+---------+ | 3| Medium Retailers | LCD Panel | 2005-03-01| 4000.00 | | 4| Good Companies | Hard Disk Drive | 2005-04-15| 4000.00 | | 5| Wonderful Shops | Floppy Drive | 2005-05-20| 6000.00 | | 6| Loyal Clients LCD | Panel | 2005-07-30| 6000.00 | | 7| Distinguished Partners | Hard Disk Drive | 2005-09-01| 8000.00 | | 8| Really Large Customers | Floppy Drive | 2005-11-10| 8000.00 | | 9| Small Stores | LCD Panel | 2006-01-05| 1000.00 | |10| Medium Retailers | Hard Disk Drive | 2006-02-10| 1000.00 | |11| Good Companies | Floppy Drive | 2006-03-15| 2000.00 | |12| Wonderful Shops | LCD Panel | 2006-04-20| 2500.00 | |13| Loyal Clients | Hard Disk Drive | 2006-05-30| 3000.00 | |14| Distinguished Partners | Floppy Drive | 2006-06-01| 3500.00 | |15| Really Large Customers | LCD Panel | 2006-07-15| 4000.00 | |16| Small Stores | Hard Disk Drive | 2006-08-30| 4500.00 | |17| Medium Retailers | Floppy Drive | 2006-09-05| 1000.00 | |18| Good Companies | LCD Panel | 2006-10-05| 1000.00 | |19| Wonderful Shops | Hard Disk Drive | 2007-01-10| 4000.00 | |20| Loyal Clients | Floppy Drive | 2007-02-20| 4000.00 | |21| Distinguished Partners | LCD Panel | 2007-02-28| 4000.00 | +--+------------------------+-----------------+-----------+---------+ 19 rows in set (0.00 sec)