Example: Using Slowly Changing Dimensions

 < Day Day Up > 



This example shows you how to use slowly changing dimensions to capture a history of changes to the data in a table. The history of changes enables analysis.

The example joins two source tables with the SQL Join transformation and loads a target table using the SCD Type 2 Loader transformation. The target table, ORGANIZATION_DIM, enables the analysis of trends in hiring, promotion, and salary for the employees in a company.

Historical information is maintained in the target table by retaining outdated rows alongside the current row for each record. When an existing record is updated, the existing row is closed out (no longer updated) and the new current row is added.

In this example, the SCD Type 2 Loader detects changes between existing target rows and incoming source rows. When a change is detected, the existing target row receives a new end date/time value that closes out that row. Then the new row is written into the target using the start and end date/time values from the source. The new row becomes the current row for that record.

Preparation

Two source tables, ORGANIZATION and STAFF, have been loaded from transactional data into an enterprise data warehouse. These two sources will be joined using the SQL Join transformation template. The output of the SQL Join transformation becomes the input to the SCD Type 2 Loader transformation, which loads the target table ORGANIZATION_DIM.

ORGANIZATION_DIM has been created with the Target Designer. Initially, the table contains a combination of columns from the ORGANIZATION and STAFF tables. These columns will be changed in the course of this example to arrive at the final configuration.

Metadata has been created for all four tables. All four tables are available for checkout from the foundation repository.

Create and Populate the Job

Follow these steps to create the Load the Organization Dimension job and to populate that job with transformations and tables:

  1. Start SAS ETL Studio and connect to the appropriate repository on the metadata server.

  2. In SAS ETL Studio, in the Shortcuts pane, click Process Designer to start the New Job Wizard.

  3. In the New Job Wizard, type the job name Load the Organization Dimension and click Finish. An empty Process Designer window is displayed.

  4. In the tree view, click the Process Library tab, then expand the Data Transforms folder.

  5. In the Data Transforms folder, click and drag SQL Join into the Process Designer window. The transform appears in the job with two source drop areas and one target drop area.

  6. In the Data Transforms folder, click and drag SCD Type 2 Loader into the Process Designer. Release the mouse button when the cursor is in the target drop area of the SQL Join transformation.

    click to expand
    Display 10.14: Transformations Added to the Job

  7. In the tree view, click the Inventory tab and expand the Tables folder.

  8. In the Tables folder, select STAFF, ORGANIZATION, and ORGANIZATION_DIM. Right-click and select Change Management Check Out. A check mark appears in the icons for all four tables.

  9. In the tree view, select the Project tab to continue work with the checked-out tables.

  10. In the Project tree, click and drag STAFF into one of the two source drop areas of the SQL Join transformation.

    click to expand
    Display 10.15: Data in the STAFF Source Table

  11. In the Project tree, click and drag ORGANIZATION into the second source drop area of the SQL Join transformation.

  12. In the Project tree, click and drag ORGANIZATION_DIM into the target drop area of the SCD Type 2 Loader. The job is now fully populated with tables and transformations.

    click to expand
    Display 10.16: Data in the ORGANIZATION Source Table

    click to expand
    Display 10.17: Fully Populated Job

Configure ORGANIZATION_DIM

Follow these steps to configure the target table ORGANIZATION_DIM.

  1. In the Process Designer, double-click ORGANIZATION_DIM to display its properties window.

  2. In the properties window, click the Columns tab.

  3. In the Columns tab, delete, rename, and rearrange columns to create the planned configuration.

    click to expand
    Display 10.18: Configured Columns in ORGANIZATION_DIM

  4. ORGANIZATION_DIM needs two new columns to meet business requirements. To add the first column, click Term_Date and click New. A new untitled column appears beneath Term_Date.

  5. Replace the default name of the new column with the name Load_Time. This column will contain the date and time that each row was physically loaded into ORGANIZATION_DIM. This data will be provided by the SCD Type 2 Loader.

  6. In the properties window of ORGANIZATION_DIM, in the row for Load_Time, double-click the Type column and select Numeric.

  7. In the row for Load_Time, double-click the Format column and type DATETIME20., which is a numeric format.

  8. To add the other new column, click Employee_ID and click New to display a new untitled column. Replace the default column name with the name Gen_Emp_ID. This column will be added to the primary key of ORGANIZATION_DIM to isolate the dimension table from possible duplication of values in the business key column Employee_ID. Later in this example, the SCD Type 2 Loader will be configured to generate key numbers.

  9. Press the TAB key twice, then double click and select the Numeric data type.

  10. Press the TAB key once, then click and type 12., which is a numeric format.

    click to expand
    Display 10.19: Configured Target Table ORGANIZATION_DIM

  11. Click OK to save changes and close the properties window.

Configure the SCD Type 2 Loader

Follow these steps to specify change tracking columns, specify the business key column, add new columns for load time and generated key:

  1. In the Process Designer, double-click the SCD Type 2 Loader to display its properties window.

  2. In the properties window, click the Change Tracking tab.

  3. In the Change Tracking tab, click and hold on Employee_ID. Drag down to select the new begin date/time column Job_Start_Date.

  4. Click, hold, and drag on Load Time. Select the new end date/time column Job_End_Date.

    click to expand
    Display 10.20: Change Tracking Columns in the SCD Type 2 Loader

  5. Click Apply and click the Business Key tab.

  6. In the Business Key tab, click New to display the ORGANIZATION_DIM Columns window.

  7. In the Columns window, click Employee_ID, then click OK to return to the Business Key tab.

    click to expand
    Display 10.21: Business Key Specified in the SCD Type 2 Loader

  8. In the Business Key tab, click Apply to save changes and click the Options tab.

  9. In the Options tab, click in the field to the right of Load Time Column. Type the column name Load_Time.

  10. Click Apply, then click the Generated Key tab.

  11. In the Generated Key tab, click the down arrow to open the pull-down menu and select the column Gen_Emp_ID.

    click to expand
    Display 10.22: Generated Key Definition in the SCD Type 2 Loader

  12. Click Apply, then click the Mapping tab.

  13. In the Mapping tab, click and drag between columns to create new mappings, as shown in the following display.

    click to expand
    Display 10.23: Mapped Columns in the SCD Type 2 Loader

  14. Click OK to save changes and close the properties window.

Run the Job and View the Results

The job is now fully configured and is ready to run. In the Process Designer, right-click and select Save. Then right-click and select Submit.

If job execution terminates due to errors, click the Log tab, locate the error, resolve the error in the Process Editor, and submit the job again.

To view the results of the job, click the Process Editor tab, right-click ORGANIZATION_DIM, and select View Data.

click to expand
Display 10.24: Data in ORGANIZATION_DIM

Check In the Job

To check in a job in the Project tree:

  1. In the Project tree, select the repository icon.

  2. On the SAS ETL Studio desktop, select Project Check In Repository from the menu bar. All of the objects in the project repository are checked in to the change-managed repository.



 < Day Day Up > 



SAS Institute - SAS 9.1.3 ETL Studio. User's Guide
SAS 9.1.3 ETL Studio: Users Guide
ISBN: 1590476352
EAN: 2147483647
Year: 2004
Pages: 127
Authors: SAS Institute

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net