Chapter 23: Dimension Consolidation


When the number of dimensions in your data warehouse increases, you might find some common data in more than one dimension. For example, the zip code, city, and state are in the customer and shipping addresses of the customer dimension as well as in the factory dimensions. This chapter explains how you can consolidate the zip code from the three dimensions into a new zip code dimension.

Revising the Data Warehouse Schema

To consolidate dimensions, you need to change the data warehouse schema. The revised schema is shown in Figure 23.1. A new zip_code_dim table has been added, and the structures of the sales_order_fact and production_fact tables have been changed. Note that only the tables related to the zip code are shown.

image from book
Figure 23.1: The schema after zip_code_dim is added

The zip_code_dim table is related to the two fact tables. The relationships replace those from the customer and factory dimensions. You need two relationships to the sales_order_fact table, one for the customer address and one for the shipping address. There is only one relationship to the production_fact table, therefore only the factory zip code surrogate key was added in this fact table.

The script in Listing 23.1 can be used to create the zip_code_dim table.

Listing 23.1: Creating the zip_code_dim table

image from book
 /****************************************************************/ /*                                                              */ /* zip_code_dim.sql                                             */ /*                                                              */ /****************************************************************/ /* default to dw                                                */ USE dw; CREATE TABLE zip_code_dim (   zip_code_sk INT NOT NULL AUTO_INCREMENT PRIMARY KEY , zip_code INT (5) , city CHAR (30) , state CHAR (2) , effective_date DATE , expiry_date DATE ) ; /* end of script                                                */
image from book

Run the script in Listing 23.1 to create the zip_code_dim table:

 mysql> \. c:\mysql\scripts\zip_code_dim.sql

Then, you need to populate the zip_code_dim table. I have provided the necessary zip code information in the zip_code.csv file below.

 ZIP CODE,CITY,STATE 17050,PITTSBURGH,PA 17051,MC VEYTOWN,PA 17052,MAPLETON DEPOT,PA 17053,MARYSVILLE,PA 17054,MATTAWANA,PA 17055,MECHANICSBURG,PA 44102,CLEVELAND,OH

As usual, you load data from a CSV file to a staging table. The script in Listing 23.2 can be used to create a staging table called zip_code_stg.

Listing 23.2: Creating the zip_code_stg table

image from book
 /****************************************************************/ /*                                                              */ /* zip_code_stg.sql                                             */ /*                                                              */ /****************************************************************/ /* default to dw                                                */ USE dw; CREATE TABLE zip_code_stg (   zip_code INT (5) , city CHAR (30) , state CHAR (2) ) ; /* end of script                                                */
image from book

Run the script in Listing 23.2 to create the staging table.

 mysql> \. c:\mysql\scripts\zip_code_stg.sql

Assuming the zip_code.csv file contains the complete postal codes and these codes never change, you need to load the zip code file to the data warehouse once only. This is a type of pre-population called one-time population.

The script in Listing 23.3 contains the script for loading the zip code data into the zip_code_stg staging table.

Listing 23.3: Zip_code_dim population

image from book
 /****************************************************************/ /*                                                              */ /* zip_code_population.sql                                      */ /*                                                              */ /****************************************************************/ /* default to dw                                                */ USE dw; TRUNCATE zip_code_stg; LOAD DATA INFILE 'zip_code.csv' INTO TABLE zip_code_stg FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '' LINES TERMINATED BY '\r\n' IGNORE 1 LINES ( zip_code , city state        ) ; INSERT INTO zip_code_dim SELECT   NULL , zip_code , city , state , '0000-00-00' , '9999-12-31' FROM zip_code_stg ; /* end of script                                                 */
image from book

Run the script in Listing 23.3 to populate the zip_code_dim table.

 mysql> \. c:\mysql\scripts\zip_code_population.sql

Here is the message that you will see as the response to the query.

 Database changed Query OK, 1 row affected (0.09 sec) Query OK, 7 rows affected (0.08 sec) Records: 7  Deleted: 0  Skipped: 0  Warnings: 0 Query OK, 7 rows affected (0.07 sec) Records: 7  Duplicates: 0  Warnings: 0 

You can query the zip_code_dim table to confirm correct population using this statement.

 mysql> select zip_code_sk sk, zip_code zip, city, state from        zip_code_dim;

The query result is as follows.

 +-----+-------+----------------+-------+ | sk  | zip   | city           | state | +-----+-------+----------------+-------+ |  1  | 17050 | PITTSBURGH     | PA    | |  2  | 17051 | MC VEYTOWN     | PA    | |  3  | 17052 | MAPLETON DEPOT | PA    | |  4  | 17053 | MARYSVILLE     | PA    | |  5  | 17054 | MATTAWANA      | PA    | |  6  | 17055 | MECHANICSBURG  | PA    | |  7  | 44102 | CLEVELAND      | OH    | +-----+-------+----------------+-------+ 7 rows in set (0.41 sec)

Now you need to revise the sales_order_fact table and the other tables. Here are the five steps you need to do to update the database structure.

  1. Create a customer_zip_code_dim view and a shipping_zip_code_dim based on the zip_code_dim table.

  2. Add the customer_zip_code_sk and shipping_zip_code_sk columns to the sales_order_fact table.

  3. Initially load the two zip code surrogate key columns based on the existing customer and shipping zip codes.

  4. Remove the customer and shipping codes as well as their cities and states from the customer_dim table

  5. Remove the customer city, state, and zip code from the pa_customer_dim table

The script in Listing 23.4 implements the above five revisions.

Listing 23.4: The revised sales_order_fact script

image from book
 /********************************************************************/ /*                                                                  */ /* sales_order_fact_23.sql                                          */ /*                                                                  */ /********************************************************************/ USE dw; CREATE VIEW customer_zip_code_dim ( customer_zip_code_sk , customer_zip_code , customer_city , customer_state , effective_date , expiry_date ) AS SELECT   zip_code_sk , zip_code , city , state , effective_date , expiry_date FROM zip_code_dim ; CREATE VIEW shipping_zip_code_dim ( shipping_zip_code_sk , shipping_zip_code , shipping_city , shipping_state , effective_date , expiry_date ) AS SELECT   zip_code_sk , zip_code , city , state , effective_date , expiry_date FROM zip_code_dim ; ALTER TABLE sales_order_fact   ADD customer_zip_code_sk INT AFTER customer_sk , ADD shipping_zip_code_sk INT AFTER customer_zip_code_sk ; UPDATE sales_order_fact a, customer_dim b, customer_zip_code_dim c SET   a.customer_zip_code_sk = c.customer_zip_code_sk WHERE   a.customer_sk = b.customer_sk AND b.customer_zip_code = c.customer_zip_code ; UPDATE   sales_order_fact a , customer_dim b , shipping_zip_code_dim c SET   a.shipping_zip_code_sk = c.shipping_zip_code_sk WHERE   a.customer_sk = b.customer_sk AND b.shipping_zip_code = c.shipping_zip_code ; ALTER TABLE customer_dim   DROP customer_zip_code , DROP customer_city , DROP customer_state , DROP shipping_zip_code , DROP shipping_city , DROP shipping_state ; ALTER TABLE pa_customer_dim   DROP customer_zip_code , DROP customer_city , DROP customer_state , DROP shipping_zip_code , DROP shipping_city , DROP shipping_state ; /* end of script                                                    */
image from book



Dimensional Data Warehousing with MySQL. A Tutorial
Dimensional Data Warehousing with MySQL: A Tutorial
ISBN: 0975212826
EAN: 2147483647
Year: 2004
Pages: 149

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net