You use SCD1 if you do not need to maintain dimension history. When source data changes, you update the existing data in the corresponding dimension table.
As an example, I will apply SCD1 to the customer_dim table created and populated in Chapter 1. Recall that the table has six records as shown in Table 2.1.
customer _number | customer _name | customer _street _address | customer _zip _code | customer _city | customer _state |
---|---|---|---|---|---|
1 | Big Customers | 7500 Louise Dr. | 17050 | Mechanicsburg | PA |
2 | Small Stores | 2500 Woodland St. | 17055 | Pittsburgh | PA |
3 | Medium Retailers | 1111 Ritter Rd. | 17055 | Pittsburgh | PA |
4 | Good Companies | 9500 Scott St | 17050 | Mechanicsburg | PA |
5 | Wonderful Shops | 3333 Rossmoyne Rd. | 17050 | Mechanicsburg | PA |
6 | Loyal Clients | 7070 Ritter Rd. | 17055 | Pittsburgh | PA |
Suppose customer details have changed and now the content of the customer table in the source is as presented in Table 2.2.
customer _number | customer _name | customer _street _address | customer _zip _code | customer _city | customer _state |
---|---|---|---|---|---|
1 | Really Large Customers | 7500 Louise Dr. | 17050 | Mechanicsburg | PA |
2 | Small Stores | 2500 Woodland St. | 17055 | Pittsburgh | PA |
3 | Medium Retailers | 1111 Ritter Rd. | 17055 | Pittsburgh | PA |
4 | Good Companies | 9500 Scott St. | 17050 | Mechanicsburg | PA |
5 | Wonderful Shops | 3333 Rossmoyne Rd. | 17050 | Mechanicsburg | PA |
6 | Loyal Clients | 7070 Ritter Rd. | 17055 | Pittsburgh | PA |
7 | Distinguished Partners | 9999 Scott St. | 17050 | Mechanicsburg | PA |
As you can see, the name of the first customer has changed and there is now one more customer, customer number 7.
You can apply SCD1 to the customer_dim table in the data warehouse by running the script in Listing 2.1, which assumes that the new customer information has been uploaded to a staging (temporary) table named customer_stg.
Listing 2.1: Applying SCD1 to the customer names in customer_dim
/ *****************************************************************/ /* */ /* scd1.sql */ /* */ / *****************************************************************/ /* default database to dw */ USE dw; /* update existing customers */ UPDATE customer_dim a, customer_stg b SET a.customer_name = b.customer_name WHERE a.customer_number = b.customer_number AND a.expiry_date = '9999-12-31' AND a.customer_name <> b.customer_name ; /* add new customers */ INSERT INTO customer_dim SELECT NULL , customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state , CURRENT_DATE , '9999-12-31' FROM customer_stg WHERE customer_number NOT IN ( SELECT b.customer_number FROM customer_dim a, customer_stg b WHERE a.customer_number = b.customer_number ) ; /* end of script */
The script in Listing 2.1 contains two SQL statements, an Update statement and an Insert statement. The Update statement copies the value of the customer_name column in the staging table to the customer_name column in the customer_dim table. The Insert statement inserts the record in the staging table that is not yet present in the customer_dim table. Running the script updates the name of the first customer and inserts the seventh customer in the staging table to the customer_dim table.
Before you run the script file, however, note that we don’t have a staging table yet, but we have the content of the current customer table (from the source database) in a customer.csv file. As such, applying SCD1 to the customer_dim table consists of two steps:
Creating the staging table customer_stg and uploading the content of the customer.csv file into it.
Running the script in Listing 2.1.
These steps are explained in the subsections below.
The customer.csv file contains the current customer information and can be found in the zip file accompanying this book. Its content is as follows.
CUSTOMER NO, CUSTOMER NAME,STREET ADDRESS, ZIP CODE,CITY,STATE 1,Really Large Customers, 7500 Louise Dr., 17050, Mechanicsburg, PA 2,Small Stores, 2500 Woodland St., 17055, Pittsburgh, PA 3,Medium Retailers, 1111 Ritter Rd., 17055, Pittsburgh, PA 4,Good Companies, 9500 Scott St., 17050, Mechanicsburg, PA 5,Wonderful Shops, 3333 Rossmoyne Rd., 17050, Mechanicsburg, PA 6,Loyal Clients, 7070 Ritter Rd., 17055, Pittsburgh, PA 7,Distinguished Partners, 9999 Scott St., 17050, Mechanicsburg, PA
The script in Listing 2.2 creates the customer_stg table in the data warehouse database (dw), and loads the customer.csv source file to the customer_stg table.
Listing 2.2: Creating and loading the customer_stg table
/****************************************************************/ /* */ /* create_customer_stg.sql */ /* */ /****************************************************************/ /* default database to dw */ USE dw; /* create customer_stg table */ CREATE TABLE customer_stg (customer_number INT customer_name CHAR (30) customer_street_address CHAR (30) customer_zip_code INT (5) customer_city CHAR (30) customer_state CHAR (2) ) ; /* clean up customer_stg table and load customer.csv */ TRUNCATE customer_stg; LOAD DATA INFILE 'customer.csv' INTO TABLE customer_stg FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY "" LINES TERMINATED BY '\r\n' IGNORE 1 LINES ( customer_number , customer_name , customer_street_address , customer_zip_code , customer_city , customer_state ) ; /* end of script */
Run the script in Listing 2.2 by using this command:
mysql> \. c:\mysql\scripts\create_customer_stg.sql
Now you’re ready to run the SCD1 script in Listing 2.1. Before you do that, set your MySQL date to February 2, 2007 (a date later than the one you set in Chapter 1) to help you easily identify the newly added customer). After you set the date, run the scd1.sql script:
mysql> \. c:\mysql\scripts\scd1.sql
Now confirm that the script in Listing 2.1 was executed successfully by querying the customer_dim table:
mysql> select * from customer_dim \G
You should see the following result.
*************************** 1. row *************************** customer_sk: 1 customer_number: 1 customer_name: Really Large Customers customer_street_address: 7500 Louise Dr. customer_zip_code: 17050 customer_city: Mechanicsburg customer_state: PA effective_date: 2007-02-01 expiry_date: 9999-12-31 *************************** 2. row *************************** customer_sk: 2 customer_number: 2 customer_name: Small Stores customer_street_address: 2500 Woodland St. customer_zip_code: 17055 customer_city: Pittsburgh customer_state: PA effective_date: 2007-02-01 expiry_date: 9999-12-31 *************************** 3. row *************************** customer_sk: 3 customer_number: 3 customer_name: Medium Retailers customer_street_address: 1111 Ritter Rd. customer_zip_code: 17055 customer_city: Pittsburgh customer_state: PA effective_date: 2007-02-01 expiry_date: 9999-12-31 *************************** 4. row *************************** customer_sk: 4 customer_number: 4 customer_name: Good Companies customer_street_address: 9500 Scott St. customer_zip_code: 17050 customer_city: Mechanicsburg customer_state: PA effective_date: 2007-02-01 expiry_date: 9999-12-31 *************************** 5. row *************************** customer_sk: 5 customer_number: 5 customer_name: Wonderful Shops customer_street_address: 3333 Rossmoyne Rd. customer_zip_code: 17050 customer_city: Mechanicsburg customer_state: PA effective_date: 2007-02-01 expiry_date: 9999-12-31 *************************** 6. row *************************** customer_sk: 6 customer_number: 6 customer_name: Loyal Clients customer_street_address: 7070 Ritter Rd. customer_zip_code: 17055 customer_city: Pittsburgh customer_state: PA effective_date: 2007-02-01 expiry_date: 9999-12-31 *************************** 7. row *************************** customer_sk: 7 customer_number: 7 customer_name: Distinguished Partners customer_street_address: 9999 Scott St. customer_zip_code: 17050 customer_city: Mechanicsburg customer_state: PA effective_date: 2007-02-02 expiry_date: 9999-12-31 7 rows in set (0.00 sec)
The query’s output shows that:
The name of first customer has changed to ‘Really Large Customers’. This is the only name for the first customer, which means that no history is maintained for this customer. The effective and expiry dates of the first six customers do not change; only the name of the first customer changed.
Customer 7 is added with an effective date of 2007–02–02 (the date you ran the script).