Slowly Changing Dimension Type 1 (SCD1)


You use SCD1 if you do not need to maintain dimension history. When source data changes, you update the existing data in the corresponding dimension table.

As an example, I will apply SCD1 to the customer_dim table created and populated in Chapter 1. Recall that the table has six records as shown in Table 2.1.

Table 2.1: The customer_dim table before changes are applied
Open table as spreadsheet

customer _number

customer _name

customer _street _address

customer _zip _code

customer _city

customer _state

1

Big Customers

7500 Louise Dr.

17050

Mechanicsburg

PA

2

Small Stores

2500 Woodland St.

17055

Pittsburgh

PA

3

Medium Retailers

1111 Ritter Rd.

17055

Pittsburgh

PA

4

Good Companies

9500 Scott St

17050

Mechanicsburg

PA

5

Wonderful Shops

3333 Rossmoyne Rd.

17050

Mechanicsburg

PA

6

Loyal Clients

7070 Ritter Rd.

17055

Pittsburgh

PA

Suppose customer details have changed and now the content of the customer table in the source is as presented in Table 2.2.

Table 2.2: The revised customer details
Open table as spreadsheet

customer _number

customer _name

customer _street _address

customer _zip _code

customer _city

customer _state

1

Really Large Customers

7500 Louise Dr.

17050

Mechanicsburg

PA

2

Small Stores

2500 Woodland St.

17055

Pittsburgh

PA

3

Medium Retailers

1111 Ritter Rd.

17055

Pittsburgh

PA

4

Good Companies

9500 Scott St.

17050

Mechanicsburg

PA

5

Wonderful Shops

3333 Rossmoyne Rd.

17050

Mechanicsburg

PA

6

Loyal Clients

7070 Ritter Rd.

17055

Pittsburgh

PA

7

Distinguished Partners

9999 Scott St.

17050

Mechanicsburg

PA

As you can see, the name of the first customer has changed and there is now one more customer, customer number 7.

You can apply SCD1 to the customer_dim table in the data warehouse by running the script in Listing 2.1, which assumes that the new customer information has been uploaded to a staging (temporary) table named customer_stg.

Listing 2.1: Applying SCD1 to the customer names in customer_dim

image from book
 / *****************************************************************/ /*                                                                */ /* scd1.sql                                                       */ /*                                                                */ / *****************************************************************/ /* default database to dw                                         */ USE dw; /* update existing customers                                      */ UPDATE customer_dim a, customer_stg b SET a.customer_name = b.customer_name WHERE a.customer_number = b.customer_number   AND a.expiry_date = '9999-12-31'   AND a.customer_name <> b.customer_name ; /* add new customers                                               */ INSERT INTO customer_dim SELECT   NULL , customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , CURRENT_DATE , '9999-12-31' FROM customer_stg WHERE customer_number NOT IN (   SELECT b.customer_number   FROM customer_dim a, customer_stg b   WHERE a.customer_number = b.customer_number ) ; /* end of script                                                   */
image from book

The script in Listing 2.1 contains two SQL statements, an Update statement and an Insert statement. The Update statement copies the value of the customer_name column in the staging table to the customer_name column in the customer_dim table. The Insert statement inserts the record in the staging table that is not yet present in the customer_dim table. Running the script updates the name of the first customer and inserts the seventh customer in the staging table to the customer_dim table.

Before you run the script file, however, note that we don’t have a staging table yet, but we have the content of the current customer table (from the source database) in a customer.csv file. As such, applying SCD1 to the customer_dim table consists of two steps:

  1. Creating the staging table customer_stg and uploading the content of the customer.csv file into it.

  2. Running the script in Listing 2.1.

These steps are explained in the subsections below.

Creating and Loading the Customer Staging Table

The customer.csv file contains the current customer information and can be found in the zip file accompanying this book. Its content is as follows.

 CUSTOMER NO, CUSTOMER NAME,STREET ADDRESS, ZIP CODE,CITY,STATE 1,Really Large Customers, 7500 Louise Dr., 17050, Mechanicsburg, PA 2,Small Stores, 2500 Woodland St., 17055, Pittsburgh, PA 3,Medium Retailers, 1111 Ritter Rd., 17055, Pittsburgh, PA 4,Good Companies, 9500 Scott St., 17050, Mechanicsburg, PA 5,Wonderful Shops, 3333 Rossmoyne Rd., 17050, Mechanicsburg, PA 6,Loyal Clients, 7070 Ritter Rd., 17055, Pittsburgh, PA 7,Distinguished Partners, 9999 Scott St., 17050, Mechanicsburg, PA

The script in Listing 2.2 creates the customer_stg table in the data warehouse database (dw), and loads the customer.csv source file to the customer_stg table.

Listing 2.2: Creating and loading the customer_stg table

image from book
 /****************************************************************/ /*                                                              */ /* create_customer_stg.sql                                      */ /*                                                              */ /****************************************************************/ /* default database to dw                                       */ USE dw; /* create customer_stg table                                    */ CREATE TABLE customer_stg (customer_number INT customer_name CHAR (30) customer_street_address CHAR (30) customer_zip_code INT (5) customer_city CHAR (30) customer_state CHAR (2) ) ; /* clean up customer_stg table and load customer.csv            */ TRUNCATE customer_stg; LOAD DATA INFILE 'customer.csv' INTO TABLE customer_stg FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY "" LINES TERMINATED BY '\r\n' IGNORE 1 LINES ( customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state ) ; /* end of script                                                */
image from book

Run the script in Listing 2.2 by using this command:

 mysql> \. c:\mysql\scripts\create_customer_stg.sql

Applying SCD1

Now you’re ready to run the SCD1 script in Listing 2.1. Before you do that, set your MySQL date to February 2, 2007 (a date later than the one you set in Chapter 1) to help you easily identify the newly added customer). After you set the date, run the scd1.sql script:

 mysql> \. c:\mysql\scripts\scd1.sql

Now confirm that the script in Listing 2.1 was executed successfully by querying the customer_dim table:

 mysql> select * from customer_dim \G

You should see the following result.

 *************************** 1. row ***************************             customer_sk: 1         customer_number: 1           customer_name: Really Large Customers customer_street_address: 7500 Louise Dr.       customer_zip_code: 17050           customer_city: Mechanicsburg          customer_state: PA          effective_date: 2007-02-01             expiry_date: 9999-12-31 *************************** 2. row ***************************             customer_sk: 2         customer_number: 2           customer_name: Small Stores customer_street_address: 2500 Woodland St.       customer_zip_code: 17055           customer_city: Pittsburgh          customer_state: PA          effective_date: 2007-02-01             expiry_date: 9999-12-31 *************************** 3. row ***************************             customer_sk: 3         customer_number: 3           customer_name: Medium Retailers customer_street_address: 1111 Ritter Rd.       customer_zip_code: 17055           customer_city: Pittsburgh          customer_state: PA          effective_date: 2007-02-01             expiry_date: 9999-12-31 *************************** 4. row ***************************             customer_sk: 4         customer_number: 4           customer_name: Good Companies customer_street_address: 9500 Scott St.       customer_zip_code: 17050           customer_city: Mechanicsburg          customer_state: PA          effective_date: 2007-02-01             expiry_date: 9999-12-31 *************************** 5. row ***************************             customer_sk: 5         customer_number: 5           customer_name: Wonderful Shops customer_street_address: 3333 Rossmoyne Rd.       customer_zip_code: 17050           customer_city: Mechanicsburg          customer_state: PA          effective_date: 2007-02-01             expiry_date: 9999-12-31 *************************** 6. row ***************************             customer_sk: 6         customer_number: 6           customer_name: Loyal Clients customer_street_address: 7070 Ritter Rd.       customer_zip_code: 17055           customer_city: Pittsburgh          customer_state: PA          effective_date: 2007-02-01             expiry_date: 9999-12-31 *************************** 7. row ***************************             customer_sk: 7         customer_number: 7           customer_name: Distinguished Partners customer_street_address: 9999 Scott St.       customer_zip_code: 17050           customer_city: Mechanicsburg          customer_state: PA          effective_date: 2007-02-02             expiry_date: 9999-12-31 7 rows in set (0.00 sec)

Analyzing the Result

The query’s output shows that:

  • The name of first customer has changed to ‘Really Large Customers’. This is the only name for the first customer, which means that no history is maintained for this customer. The effective and expiry dates of the first six customers do not change; only the name of the first customer changed.

  • Customer 7 is added with an effective date of 2007–02–02 (the date you ran the script).



Dimensional Data Warehousing with MySQL. A Tutorial
Dimensional Data Warehousing with MySQL: A Tutorial
ISBN: 0975212826
EAN: 2147483647
Year: 2004
Pages: 149

Similar book on Amazon
The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition)
The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling (Second Edition)
The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleanin
The Data Warehouse ETL Toolkit: Practical Techniques for Extracting, Cleanin
Successful Business Intelligence: Secrets to Making BI a Killer App
Successful Business Intelligence: Secrets to Making BI a Killer App

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net