Lesson 1: Planning and Creating DTS Workflows | Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit

Transforming data with DTS involves planning and designing the transformation and creating and executing a DTS package. This lesson takes you through this process. The DTS Designer allows you to define complex workflows with multiple tasks based on conditional precedence constraints. In this lesson, you will learn how tasks, connections, data transformations, and workflows are represented and manipulated in the DTS Designer. You will also learn how to schedule packages for execution and how to execute them manually.

After this lesson, you will be able to:

Describe the design and planning steps taken before using DTS

Describe the data transformation process used by DTS

Use DTS Designer to create, edit, and save complex data transformations

Execute DTS packages and schedule DTS packages for execution

Estimated lesson time: 120 minutes

Restructuring and Mapping Data

Data transformation involves formatting and modifying data that is extracted from the data source into merged or derived values that are more useful at the destination. New values can easily be calculated from one or more columns in the source rowset, and a single source column can be decomposed into multiple destination columns.

Mapping Data Types

DTS allows you to specify the attributes of the destination columns and to indicate how the source columns are mapped to the destination columns. Transformation flags specify whether data of one type in the source can be converted to another type in the destination. For example, you can allow data type promotion, such as converting 16-bit integers to 32-bit integers, or data type demotion, such as converting 32-bit integers to 16-bit integers (data may be lost in this case). You can also require an exact match between source and destination data types.

Each database defines its own data types as well as its column and object naming conventions. DTS attempts to define the best possible data type matches between a source and a destination. However, you can override DTS mappings, specifying different destination data type, size, precision, and scale properties.

Merging and Separating Data

While creating your data warehouse databases, you will frequently need to combine data from multiple sources into a single table. At other times you may need to divide a data source into two or more separate tables in your data warehouse database especially when pulling data from nonrelational sources. You can merge and separate data in several ways.

At the File Level

You can combine information from multiple sources and place it into a single table, or you can place information from a single source into multiple tables. Heterogeneous joins are an example of combining multiple sources into a single result set that is saved at the destination. Heterogeneous joins are fully supported in SQL Server; they enable you to retrieve data from different databases located on different servers in the same SQL query. The source databases for a heterogeneous join can be any database for which you have an ODBC driver or OLE DB provider.

At the Column Level

You can combine information from multiple columns and place it into a single column, or you can place information from a single column into multiple columns. For example, you can summarize monthly sales totals for each product, or you can decompose a phone number that is stored in one column in order to store the area code in one column and the phone number in another.

Creating a DTS Package

When you use DTS to transfer and transform data, you create DTS packages that describe all of the work to be performed as part of the transformation process. You can create the DTS package interactively, using the DTS wizards or DTS Designer, or programmatically, using a language that supports OLE Automation, such as Visual Basic.

Saving a DTS Package

Saving the DTS package allows you to modify it, reuse it, or schedule it for later execution. If you do not save the DTS package, it executes immediately. You must save the DTS package if you want to schedule it for later execution.

You can save a DTS package in one of three locations:

To SQL Server

Packages saved to SQL Server are referred to as local packages and are stored in the msdb database. Local packages are the most efficient and are available to other SQL Servers. They are listed in the Local Packages node under Data Transformation Services in the console tree.

To Microsoft Repository

The Microsoft Repository is a database that stores descriptive information about software components and their relationships. The items in the repository are available to repository tools through a set of published COM interfaces and information models that define database schema and data transformations through shared metadata. This repository makes it possible for tool vendors to build tools that interact with databases and software components without advance knowledge of the databases or software components. Impromptu and PowerPlay from Cognos are two examples of third party tools that currently support Microsoft Repository. You can find examples of how tool vendors are supporting the Microsoft Repository at msdn.microsoft.com/repository.

Saving a DTS package in the Microsoft Repository makes package metadata reusable and available to other applications. Using the Microsoft Repository also enables you to track data lineage at the package and row levels of a table, which allows you to determine the source of each piece of data and the transformations that are applied to it. Tracking data lineage is important for data warehousing, making the Microsoft Repository the preferred location for saving DTS packages used for transferring data to a data warehouse.

When you save a package to the Microsoft Repository, the package is stored in the msdb database, but package metadata can be imported into the repository. Packages stored in this way are listed in the Repository Packages node under Data Transformation Services in the console tree. After you import the package metadata into the repository, you can view it using the Metadata Browser in the Metadata node under Data Transformation Services in the console tree. The Metadata Browser lists information about each version of a package and the executions of that package.

To File

Saving a DTS package in a COM-structured storage file makes it easy to distribute the DTS package using e-mail or network file servers. DTS packages saved as files do not appear in the SQL Server Enterprise Manager console tree. To open a package file for editing, right-click Data Transformation Services in the console tree, point to All Tasks, and click Open Package.

Implementing Package Security

You can password protect DTS packages that are saved to SQL Server or to COM-structured storage files in order to protect sensitive information. You cannot password protect packages that are stored to the Microsoft Repository because information in the repository is intended to be used by any repository tool. Currently, the repository does not provide security, but it may in future releases. When a DTS package is password protected, it is automatically encrypted. All of the package s collections and properties are encrypted, except for the package name, description, ID, version, and creation date.

DTS packages support two levels of security: owner password and user password.

Owner Password

This level of security provides complete access to all collections and properties. By default, DTS packages without owner passwords are not encrypted and can be read by any user with access to the package. If you specify an owner password for a package, you cannot view or edit the package without the owner password.

User Password

This level of security allows a user to execute but not edit or view the package definition. If a user password is specified, an owner password must be supplied.

Package Versioning

A package has two IDs: a package ID and a version ID. The package ID is generated when the package is created. The version ID is assigned when the package is created, and it is given a new value every time that the package is updated or saved. Package versioning makes it possible for SQL Server to track which version of a package was used to perform a specific transformation.

Defining Workflows

With DTS, you can define a workflow that controls the execution sequence of each step. Control of flow logic and conditional processing is achieved using precedence constraints. DTS tasks can also be prioritized. This section discusses use of DTS Designer to edit and customize packages.

Defining Transformation Steps

A DTS package is composed of one or more steps, which you can define. A step defines a unit of work that is performed as part of the transformation process. A step can

Execute a SQL statement.

Move and transform homogeneous or heterogeneous data from an OLE DB source to an OLE DB destination, using the DTS Data Pump.

Execute a JScript, PerlScript, or VBScript script. These scripts can perform any operation that their scripting language supports, allowing the implementation of any complex procedural logic that is required. ActiveX scripts can also access and manipulate data by using ActiveX Data Objects (ADO) or any other COM Automation components.

Launch an external program.

Retrieve and execute other DTS packages.

For example, you can create a DTS package that summarizes sales information for each product for a given month. The DTS package drops (if the table already exists) and creates the table at the destination, gets data from the source connection (OLE DB), processes the transformation (summarization), and finally sends the data to the destination connection (OLE DB).

Steps, Tasks, and Precedence Constraints

Workflows control the flow of execution for the package. A package is made up of data connections (sources and destinations) and tasks that are performed using those connections. The action of executing a task is controlled by a step. A step can be subject to one or more precedence constraints. A step with no precedence constraints executes immediately. If a step has precedence constraints, it cannot execute until all of its precedence constraints have been satisfied.

Note the following about working with steps and precedence constraints in DTS Designer:

Steps are represented by either task icons or a solid data transformation arrow between two data connection icons as shown in Figure 8.1.

click to view at full size

Figure 8.1 Steps are represented by task icons and data transformation arrows

Precedence constraints are represented by dashed blue (Completion), green (Success), or red (Failure) arrows as shown in Figure 8.2.

A precedence constraint arrow points from the step that must complete, succeed, or fail to satisfy the precedence constraint; this is called the source step. If the source step is a data transformation, the arrow points from the destination data connection of the data transformation.

A precedence constraint arrow points to the step that is subject to the precedence constraint; this is called the destination step. If the destination step is a data transformation, the arrow points to the source data connection of the data transformation.

click to view at full size

Figure 8.2 Precedence constraint arrows point from a source step to a destination step

To make your DTS Designer diagrams easier to read, you can add the same connection to the diagram more than once. When you do so, specify it as an existing connection. If a connection is involved in more than one transformation, it is recommended that you add the connection once for each transformation. If you do not, the diagram will have a single data connection icon with a large number of arrows (both data transformations and precedence constraints) pointing to and from it.

Figure 8.3 shows a DTS package with two data connections. In this example, there are two data transformations. One of the data transformations must complete before the other one can start, but it is not possible to tell from the diagram which data transformation must complete first.

Figure 8.3 Example of a difficult to read DTS diagram

Figure 8.4 shows a DTS package that performs exactly the same functions as the package in Figure 8.3. However, the two data connections have each been added to the diagram a second time. Now it is perfectly clear from the diagram which data transformation must complete first.

click to view at full size

Figure 8.4 Example of adding data connections to make the DTS diagram easy to read

TIP
Two items in DTS Designer can easily be confused.
The terms source and destination are used to refer to both data connections and steps. When working with a source or a destination, always make sure that you know whether it is a source or destination connection or a source or destination step.

In addition, arrows are used to represent both precedence constraints and data transformations. When working with an arrow in a DTS diagram, make sure that you know whether it is a dashed constraint or a solid data transformation arrow.

Workflow Properties

Steps are defined using the Workflow Properties dialog box. To open this dialog box, right-click on the destination step (either the task icon or the data transformation arrow if the step is a data transformation). Then select Workflow Properties from the context menu. (You may need to point to Workflow to get to the Workflow Properties option.)

The Precedence tab of the Workflow Properties dialog box allows you to add precedence constraints to or remove them from the step. The order of precedence constraints in the list is not important. To add precedence constraints, you can also select the source and destination steps, right-click the destination step, and then click the desired precedence constraint on the Workflow pop-up menu. The Options tab of the Workflow Properties dialog box allows you to set workflow properties such as task priority and transaction management for the task of the destination step.

Precedence Constraint Types

The type of precedence constraint you select determines the requirement for executing a step:

Success indicates that the source step must complete successfully before the destination step begins executing.

Failure indicates that the source step must complete with an indication of failure before the destination step executes.

Completion indicates that the source step must simply complete (with failure or success) before the destination step executes.

NOTE
It is important to keep in mind that success and failure are determined by the execution of the step, not by the results returned by a step. A select statement that returns no data can still complete successfully.

Precedence constraints create a finish/start relationship between the step being executed (the source step) and the step that will be executed next (the destination step). Given two steps A and B, a precedence constraint says:

Step B cannot start until Step A finishes

not

If Step A finishes, Step B will start

This relationship is important when multiple steps are involved, because a step may have many precedence constraints that all must be satisfied before it can execute. A common example of this concept is an automated load of a data mart database (from a data warehouse database) that is dependent upon all steps that load the data warehouse database completing successfully first.

Controlling Step Execution

Under the control of precedence constraints, steps execute in sequence, in parallel, or in a combination of these:

Some steps must execute in a certain sequence.

For example, data can be loaded into a table (Step B) only after the table has been successfully created (Step A).

Multiple steps execute in parallel to improve performance.

For example, a package can load data from multiple heterogeneous data sources into separate tables simultaneously.

Steps can use a combination of sequential and parallel execution.

For example, a package can load data from several heterogeneous files into a set of tables. The loading of data can be done in parallel, but only after the creation of the tables.

Figure 8.5 shows a package with six steps that demonstrate a combination of parallel and sequential execution. Steps A, B, and C do not have precedence constraints and execute immediately. Step D waits for Step A to complete successfully before executing. Step E waits for Step D to complete with a failure before executing. Step F waits for Step C to complete (with success or failure) before executing. Steps A, D, and E execute sequentially. Steps C and F execute sequentially. Steps A (and its sequential steps), B, and C (and its sequential step) execute in parallel.

click to view at full size

Figure 8.5 A package with steps that execute both in parallel and sequentially

Figure 8.6 shows how one step can have multiple precedence constraints. In this case, Step C can execute only when both Step A and Step B have successfully completed.

Figure 8.6 A step with multiple precedence constraints

Conditional Processing

Conditional processing using basic IF-THEN-ELSE logic allows a DTS package to respond to run-time conditions that vary. To implement conditional processing, use a combination of steps with Success and Failure precedence constraints, as shown in Figure 8.7. In the figure, Step B executes only if Step A completes successfully. Step C executes only if Step A fails. Step C would typically send a notification such as an e-mail message or would take corrective action that is in turn the condition for another precedence constraint that then allows the original step to be repeated.

Figure 8.7 Conditional processing using precedence constraints

Specifying Task Priority

By taking advantage of the Windows NT threading model, DTS allows you to specify the priority of tasks. By default, each thread within the DTS package executes at the same priority as the DTS package. However, if some operations are more time-critical than others, you can assign an execution priority to each step. You can specify each step to execute at Idle, Normal, or High priority.

Text Annotations

You can also add text annotations (labels) to your DTS package to make the package more readable in the DTS Designer.

Transforming Data with a Lookup Query

You can use a lookup query instead of a complicated CASE statement in an ActiveX script to transform codes into text or to map values from one format to another. As shown in Figure 8.8, a lookup query uses a value in the source data to retrieve related data from another query.

click to view at full size

Figure 8.8 Using a lookup query in a data transformation

When to Use a Lookup Query

You can use a lookup query when you want to

Validate imported data against information stored in other tables

Use a lookup table to transform values from the source and then load the mapped value from the lookup table into the data warehouse

For example, suppose that your destination table must include the full name for a customer s state; however, the data for state in the source table contains the postal abbreviation (for example, NY rather than New York). A lookup table in a database on a different server contains the name of each state and its postal abbreviation.

You can write a transformation to

Read the source data for the state column

Query the lookup table for the full name

Write the results to the appropriate column in the destination table

You should use a lookup query when

It is not practical to use a distributed query to join imported data with a remote lookup table.

You have a large number of unique values to transform. In this case, it is not practical to use an ActiveX script containing a CASE statement with a large number of branches. In addition, maintaining a lookup table is easier than hard coding an ActiveX script.

Implementing a Lookup Query

When defining a lookup query as part of a DTS data transformation, follow these guidelines:

Set up connections to the source, destination, and lookup tables. You must create separate connections for each.

Create a data transformation and specify the source and destination tables.

Add a lookup definition, including the lookup query.

The lookup query will be a parameterized query of the form:

Syntax

 SELECT column_name  FROM lookup_table  WHERE lookup_key = ?

The column_name is the column that you want to retrieve from the lookup_table, and the lookup_key is the column on which you are searching in the lookup table. The question mark serves as a placeholder for the data column that you want to import and transform. Lookup queries also can contain multiple keys and parameters.

Map the source and destination columns involved in the lookup with an ActiveX script. The ActiveX script calls the lookup query for every row in the source table.

Executing and Scheduling a DTS Package

Each DTS package is self-contained after you create it. A package is a complete description of all of the work to be performed as part of the transformation process. This section describes how to execute and schedule a DTS package that you have created.

Executing a DTS Package

After you save a DTS package, you have three options for retrieving and executing it. These three options are as follows:

Using SQL Server Enterprise Manager

Using the dtsrun command prompt utility

Programmatically using the DTS object model

The example given here shows the dtsrun command prompt utility being used to execute a DTS package that creates and populates a summary table in the Northwind database on the SQL Server named SQLSERVER. The /U option specifies the sa login. If the specified login has a password, it must be specified with the /P option. Note that the name of the package is enclosed in double quotes; this is necessary if the name contains spaces. Remember that the DTS package is a complete description of all of the work to be performed as part of the transformation process.

dtsrun /SSQLSERVER /Usa /N"Northwind Product Totals"

Scheduling a DTS Package

You can schedule a saved DTS package for execution at a specific time, such as daily at midnight, or at recurring intervals, such as on the first or last day of the month or weekly on Sunday at 6:00 a.m.

You can schedule a DTS package for execution in the following ways:

Using the DTS Import or DTS Export Wizards when you save the DTS package to the SQL Server msdb database.

Using SQL Server Enterprise Manager to create a SQL Server job that executes the DTS package by running the dtsrun command prompt utility. Do this manually for packages saved as files. For local packages and repository packages, right-click the package and select Schedule Package from the context menu.

Exercise 1: Creating a DTS Package

In the following series of exercises, you will use the DTS Designer to create a package that loads the Northwind_Mart database with data from the Northwind database.

In this exercise you will create the package, add destination and source data connections, and add a SQL task to clear the data warehouse tables.

Creating a package and adding connections

In the console tree in SQL Server Enterprise Manager, expand your server.

Expand Data Transformation Services.

Right-click Repository Packages, and then select New Package. The DTS Package Designer is displayed.

To create the first destination data connection, drag the Microsoft OLE DB Provider for SQL Server icon to the center of the workspace (the icon can be found on the Data tool palette on the left side of the DTS Designer window). The Connection Properties dialog box appears.

In the New Connection field, type Northwind_OLAP, and select Use Windows NT authentication. Click the Refresh button to refresh the list of databases. From the Database list, select Northwind_Mart.

Click OK.

To create the second destination data connection, drag the Microsoft OLE DB Provider for SQL Server icon to the right of the Northwind_OLAP connection. The Connection Properties dialog box appears.

In the Connection Properties dialog box, select Existing Connection, and in the Existing Connection list, select Northwind_OLAP.

Click OK.

At this point, your workspace has two destination data connections. Now you will make a connection that will be the source for each table that you want to populate.

To create the source data connection, drag the Microsoft OLE DB Provider for SQL Server icon to the workspace and place it above the Northwind_OLAP connections. The Connection Properties dialog box is displayed.

In the New Connection field, enter NW-Product, and select Use Windows NT authentication. From the Database list, select Northwind.

Click OK.

To complete the remaining connections, repeat Steps 10 through 12 for six more connections. Name them NW-Customer, NW-Employee, NW-Shipper, NW-Time, NW-Supplier, and NW-Sales_Fact.

To save the package to the Microsoft Repository

On the toolbar, click the Save button. The Save DTS Package dialog box is displayed.

In the Package Name text box, type Northwind Mart Full Load. Select SQL Server Repository in the Location drop-down list.

Click OK to save the package.

NOTE
When working in production we recommend that you save the package at regular intervals.

To add a SQL task

Drag the Execute SQL Task icon to the workspace to create a task (the icon can be found under the Tasks tool palette on the left side of the DTS Designer window). The Execute SQL Properties dialog box is displayed.

In the Description field, type Clear Dimension Tables, and then select Northwind_OLAP from the Existing Connection drop-down box.

In the SQL Statement field, type the following:

 TRUNCATE TABLE Sales_Fact  DELETE FROM Employee_Dim  DELETE FROM Customer_Dim  DELETE FROM Time_Dim  DELETE FROM Product_Dim  DELETE FROM Shipper_Dim

NOTE
These Transact-SQL statements will remove all data from the online analytical processing (OLAP) database. You must delete all data from the fact table first to avoid referential integrity errors. You cannot use the TRUNCATE TABLE statement with the dimension tables because they are referenced by the foreign key in the Sales_Fact table. You could add code here to drop the constraints before truncating the tables and then add them back after truncating the tables.

Click OK.

Exercise 2: Adding Workflows

In this exercise, you will connect each of the Northwind dimension table connections to the Northwind_OLAP connection located in the center of the work space, and you will connect the NW-Sales_Fact connection to the second Northwind_OLAP connection located on the right of the work space. You will also add precedence constraints that ensure that data is not loaded into the tables until after they have been cleared and to ensure that data is not loaded into the Sales Fact table until after data has been loaded into the dimension tables.

To add data transformations between connections

In this procedure, you will add data transformations that cause data to be transferred from one connection to another.

Click the NW-Product connection to select it.

While pressing the Ctrl key, click the Northwind_OLAP connection in the center of the workspace.

Both icons should now be selected.

Right-click the Northwind_OLAP icon, select WorkFlow, and choose Transform Data. This will create a data transformation from the NW-Product connection to the Northwind_OLAP connection. The data transformation is shown as a solid arrow.

Repeat the above steps for the connections NW-Time, NW-Customer, NW-Shipper, and NW-Employee. Do not add a data transformation for the NW-Supplier connection.

Create one more data transformation, from the NW-Sales_Fact connection to the second Northwind_OLAP connection, located on the right of the workspace.

To add precedence constraints

In this procedure, you will add precedence constraints that ensure that the dimension tables are not populated with data until the SQL Script task has completed successfully.

Click the Clear Dimension Tables task to select it.

While pressing the Ctrl key, click the NW-Product connection.

Both icons should now be selected.

Right-click the NW-Product icon, select WorkFlow, and choose On Success.

This will create a precedence constraint from the Clear Dimension Tables task to the NW-Product connection. The precedence constraint is shown as a dashed green arrow.

NOTE
If you right-click the Clear Dimension Tables SQL Task icon instead of the connection icon, then select On Success, an error is generated. The error says that defining precedences between the selected items is not valid. This is because you cannot define a precedence constraint from the connection to the task; it must be defined from the task to the connection. If you right-click the connection, the error will not be generated.

Repeat the above steps for the connections NW-Time, NW-Customer, NW-Shipper, and NW-Employee. Do not add a precedence constraint for the NW-Supplier connection.

Next, you need to set the precedence constraint for the fact table. The Sales Fact table must not be populated with data until all of the dimension tables have been populated.

Click the Northwind_OLAP connection (in the center of the work space) to select it.

While pressing the Ctrl key, click the NW-Sales_Fact connection.

Both icons should now be selected.

Right-click the NW-Sales_Fact icon, select WorkFlow, and choose On Success. This will create a precedence constraint from the first Northwind_OLAP connection to the NW-Sales_Fact connection. The precedence constraint is shown as a dashed green arrow.

Exercise 3: Transforming Data

In this exercise, you will edit each of the data transformations that you added in the previous exercise. You will create source queries, map sources to destination columns, and specify that data be copied or transformed by an ActiveX script.

To transform the data: Time dimension

Double-click the arrow between the NW-Time and Northwind_OLAP icons.

This will bring up the Data Transformation Properties dialog box. In the Description field on the Source tab, enter Load Time Dimension, and then select SQL Query.

In this procedure, you will use queries that are already built. Using Notepad, open C:\SQLDW\Exercise\Ch08\DTSLabSQL.sql. Highlight the query for the Time dimension. Copy the query and paste it into the SQL Query text box on the Data Transformation Properties dialog box. If this file is not available, type in the following Transact-SQL statement:

 SELECT DISTINCT  S.ShippedDate AS TheDate,   DateName(dw, S.ShippedDate) AS DayOfWeek,   DatePart(mm, S.ShippedDate) AS [Month],   DatePart(yy, S.ShippedDate) AS [Year],   DatePart(qq, S.ShippedDate) AS [Quarter],  DatePart(dy, S.ShippedDate) AS DayOfYear,   'N' AS Holiday,   case DatePart(dw, S.ShippedDate)   when (1) then 'Y'   when (7) then 'Y'   else 'N'   end AS Weekend,   DateName(month, S.ShippedDate) +   '_' + DateName(year,S.ShippedDate) AS YearMonth,  DatePart(wk, S.ShippedDate) AS WeekOfYear   FROM Orders S  WHERE S.ShippedDate IS NOT NULL

Click the Destination tab, and in the Table name list box select [Northwind_Mart].[dbo].[Time_Dim].

Click the Transformations tab.

Note that the columns are mapped for you. The mappings are incorrect because your destination table has a key field that is used for uniqueness. You need to correct the column mappings so that the data is inserted into the correct field.

To quickly unmap the columns, select the first field in the Source table list, and then select the last field in the Source table list while holding down the Shift key. Now, do the same for the Destination table list, and then click Delete. All of the column mapping arrows will be removed.

Select Copy Column from the New transformation drop-down list. This indicates that the column mappings you create will copy data from the source column to the destination.

Click TheDate in the Source table list, and then, while pressing the Shift key, click WeekOfYear in the Source table list. Now click TheDate in the Destination table list; then, while pressing the Shift key, click WeekOfYear in the Destination table list.

Click New to create all of the column mappings.

Click OK to accept the data transformation properties for the Time dimension.

To transform the data: Customer dimension

Double-click the arrow between the NW-Customer and Northwind_OLAP icons.

This will bring up the Data Transformation Properties dialog box. In the Description field, enter Load Customer Dimension, and in the Table name list box, select [Northwind].[dbo].[Customers].

Click the Destination tab, and in the Table name list box, select [Northwind_Mart].[dbo].[Customer_Dim].

Click the Transformations tab.

To quickly unmap the columns, select the first field in the Source table list, and then select the last field in the Source table list while holding down the Shift key. Now do the same for the Destination table list, and then click Delete. All of the column mapping arrows will be removed.

Select Copy Column from the New transformation drop-down list. This indicates that the column mappings you create will copy data from the source column to the destination.

Map each column, except Region, in the Source table list to the corresponding column in the Destination table list by selecting the matching fields from each list and then clicking New.

Select ActiveX Script from the New transformation drop-down list. This indicates that the column mapping you create for the Region column will use an ActiveX script to copy the contents from the source to the destination.

Select the Region field from each list, and then click New.

The ActiveX Script Transformation Properties dialog box is displayed.

Add code that will scrub the data. If a null value is stored in the source column, you need to give it a consistent value so that it can be represented in the data. Delete the default script that appears in the code editor in the ActiveX Script Transformation Properties dialog. Using Notepad, open C:\SQLDW\Exercise\Ch08\DTSLabSQL.sql. Highlight the ActiveX script for the Customer dimension. Copy the script and paste it into the code editor on the ActiveX Script Transformation Properties. If this file is not available, type in the following ActiveX script:

 Function Main()  If IsNull(DTSSource("Region")) Then    DTSDestination("Region") = "Other"  Else    DTSDestination("Region") = DTSSource("Region")  End If  Main = DTSTransformStat_OK  End Function

NOTE
The AutoGen button automatically regenerates the default code that was created when the column mapping was made. If you have transformation script code you want to keep, do not click the Auto Gen button.

TIP
The Test button tests the code. Try clicking the test button. The script executes against the data and generates the list of regions. After the test successfully executes, click the View Results button and you can see the list of Regions that have been generated by the script.

Click OK to accept the code changes.

Click OK to accept the data transformation properties for the Customer dimension.

To transform the data: Product dimension

Double-click the arrow between the NW-Product and Northwind_OLAP icons.

This will bring up the Data Transformation Properties dialog box. In the Description field, enter Load Product Dimension, and then select SQL Query.

Click the Build Query button.

You are now interacting with the query-building tools that will allow you to build a query from the tables in the Northwind database.

From the table listing in the left-hand pane, drag the Products and Categories tables into the Diagram pane.

TIP
If the Diagram pane is not visible, click the Show/Hide Diagram Pane button on the toolbar.

At this point, you should see two tables in the diagram pane with the appropriate relationship set between them. You will now select the fields that will make up your Product dimension.

In the Products table, check the boxes next to the ProductID, ProductName, SupplierID, and UnitPrice columns.

In the Categories table, check the box next to the CategoryName column.

You will now configure the fields so that they will transform correctly during the load.

Click the Show/Hide Grid Pane button on the toolbar to show the Grid pane. You can resize the pane by dragging its border.

In the Grid pane, make the following change:

In the Alias column for UnitPrice column of the Products table, type ListUnitPrice.

To review the results, click the Execute button on the toolbar.

Click OK to save the query and return to the Data Transformation Properties dialog box.

Click the Destination tab, and in the Table name list box select [Northwind_Mart].[dbo].[Product_Dim].

Click the Transformations tab.

To quickly unmap the columns, select the first field in the Source table list, and then select the last field in the Source table list while holding down the Shift key. Now do the same for the Destination table list, and then click Delete. All of the column mapping arrows will be removed.

Select Copy Column from the New transformation drop-down list. This indicates that the column mappings you create will copy data from the source column to the destination.

Click ProductID in the Source table list and in the Destination table list, then click New.

Repeat step 15 for the other matching columns, ProductName, ListUnitPrice, and CategoryName.

TIP
When fields in both lists are in the same order, you can quickly map those columns by selecting the fields in each list and then clicking New. Mapping a group of columns together is called a many-to-many column mapping and makes the data transformation more efficient. When you do this, ensure that the fields are in the same order, or values will be copied to the wrong destination columns.

You will use a lookup query to populate the SupplierName column in the destination table. Before you can create the transformation, you need to add the lookup query. To do this, click the Advanced tab of the Data Transformation Properties dialog box.

Click the Lookups button. The Data Transformation Lookups dialog box is displayed.

Click the Add button to add a new lookup query.

In the Name field, type SupplierLookup. In the Connection drop-down list, select NW-Supplier.

Click the Query ellipsis ( ) button as shown in Figure 8.9. The Data Transformation Services Query Designer window is displayed.

click to view at full size

Figure 8.9 The Data Transformation Lookups dialog box showing the Query ellipsis button

From the table listing in the left-hand pane, drag the Suppliers table into the Diagram pane.

In the Suppliers table, check the boxes next to the SupplierID and CompanyName columns.

Click the Show/Hide Grid Pane button on the toolbar to show the Grid pane.

You can resize the pane by dragging its border.

In the Grid pane, make the following change: Uncheck the Output checkbox for the SupplierID column and type a question mark (?) in the Criteria field for the SupplierID column. The query in the SQL pane should now be as follows, see Figure 8.10:

 SELECT CompanyName  FROM Suppliers  WHERE (SupplierID = ?)

click to view at full size

Figure 8.10 The completed query in the Data Transformation Services Query Designer

Click OK twice to close the Data Transformation Services Query Designer and the Data Transformation Lookups dialog boxes.

Click the Transformations tab to return to specifying column transformations.

Select ActiveX Script from the New transformation drop-down list. This indicates that the column mapping you create will use an ActiveX script to transform data as it is copied from the source column to the destination.

Click SupplierID in the Source table list and SupplierName in the Destination table list, and then click New.

Add code that will use the lookup query to look up the value for the SupplierName column. Delete the default script that appears in the code editor in the ActiveX Script Transformation Properties dialog. Using Notepad, open C:\SQLDW\Exercise\Ch08\DTSLabSQL.sql. Highlight the ActiveX script for the Product dimension. Copy the script and paste it into the code editor on the ActiveX Script Transformation Properties. If this file is not available, type in the following ActiveX script:

 Function Main()  DTSDestination("SupplierName") = _  DTSLookups("SupplierLookup").Execute(DTSSource("SupplierID").Value)  Main = DTSTransformStat_OK  End Function

TIP
Do not try to test this code; the test does not work for code that uses a lookup query.

Click OK twice to close the ActiveX Script Transformation Properties and Data Transformation Properties dialog boxes.

To transform data: Employee dimension

Double-click the arrow between the NW-Employee and Northwind_OLAP icons.

The Data Transformation Properties dialog box pops up. In the Description field, enter Load Employee Dimension, and then in the Table name list box, select [Northwind].[dbo].[Employees] from the list.

Click the Destination tab, and in the Table name list box, select [Northwind_Mart].[dbo].[Employee_Dim] from the list.

Click the Transformations tab.

Note that the columns are mapped for you. The mappings are incorrect because your destination table has a key field that is used for uniqueness. You will need to correct the column mappings so that the data is inserted into the correct field.

To quickly unmap the columns, select the first field in the Source table list and then select the last field in the Source table list while holding down the Shift key. Now, do the same for the Destination table list, and then click Delete. All of the column mapping arrows will be removed.

Select Copy Column from the New transformation drop-down list. This indicates that the column mappings you create will copy data from the source column to the destination.

Select the EmployeeID and HireDate columns in the Source table and Destination table lists, and then click New.

Select ActiveX Script from the New transformation drop-down list. This indicates that the column mapping you create for the EmployeeName column will use an ActiveX script to copy the contents from the source to the destination.

Select the FirstName and LastName columns in the Source table list, select EmployeeName in the Destination table list, and then click New.

The ActiveX Script Transformation Properties dialog box is displayed.

Add code that will concatenate the values of the FirstName and LastName columns to create the value for the EmployeeName column. Delete the default script that appears in the code editor in the ActiveX Script Transformation Properties dialog. Using Notepad, open C:\SQLDW\Exercise\Ch08\DTSLabSQL.sql. Highlight the ActiveX script for the Employee dimension. Copy the script and paste it into the code editor on the ActiveX Script Transformation Properties. If this file is not available, type in the following ActiveX script:

 Function Main()  DTSDestination("EmployeeName") = DTSSource("FirstName") & _  " " & DTSSource("LastName")  Main = DTSTransformStat_OK  End Function

Click OK to accept the code changes.

Click OK to accept the data transformation properties for the Employee dimension.

To transform data: Shipper dimension

Double-click the arrow between the NW-Shipper and Northwind_OLAP icons.

The Data Transformation Properties dialog box will appear. In the Description field enter Load Shipper Dimension, and in the Table name list box select [Northwind].[dbo].[Shippers].

Click the Destination tab, and in the Table name list box select [Northwind_Mart].[dbo].[Shipper_Dim].

Click the Transformations tab.

Note that the columns are mapped for you. The mappings are incorrect because your destination table has a key field that is used for uniqueness. Correct the column mappings so that the data is inserted into the correct field.

To quickly unmap the columns, select the first field in the Source table list and then select the last field in the Source table list while holding down the Shift key. Now, do the same for the Destination table list, and then click Delete. All of the column mapping arrows will be removed.

Select Copy Column from the New transformation drop-down list. This indicates that the column mappings you create will copy data from the source column to the destination.

Select the ShipperID and CompanyName columns in the Source table list and the ShipperID and ShipperName columns in the Destination table list, and then click New.

NOTE
DTS will transform data between fields that do not have the same column names.

Click OK to accept the data transformation properties for the Supplier dimension.

To transform data: Sales Fact

Double-click the arrow between the NW-Sales_Fact and Northwind_OLAP icons at the lower right of the workspace.

This will bring up the Data Transformation Properties dialog box. In the Description field, enter Load Sales Fact Table, and then select SQL Query.

In this procedure, you will use queries that are already built. Using Notepad, open C:\SQLDW\Exercise\Ch08\DTSLabSQL.sql. Highlight the query for the Sales Fact table. Copy the query and paste it into the SQL Query text box on the Data Transformation Properties dialog box. If this file is not available, type in the following Transact-SQL statement:

 SELECT      Northwind_Mart.dbo.Time_Dim.TimeKey,      Northwind_Mart.dbo.Customer_Dim.CustomerKey,      Northwind_Mart.dbo.Shipper_Dim.ShipperKey,      Northwind_Mart.dbo.Product_Dim.ProductKey,      Northwind_Mart.dbo.Employee_Dim.EmployeeKey,            Northwind.dbo.Orders.RequiredDate,            Orders.Freight * [Order Details].Quantity /                     (SELECT SUM(Quantity)            FROM [Order Details] od            WHERE od.OrderID = Orders.OrderID) AS LineItemFreight,      [Order Details].UnitPrice * [Order Details].Quantity AS LineItemTotal,      [Order Details].Quantity AS LineItemQuantity,      [Order Details].Discount * [Order Details].UnitPrice *           [Order Details].Quantity AS LineItemDiscount  FROM Orders   INNER JOIN [Order Details]       ON Orders.OrderID = [Order Details].OrderID   INNER JOIN Northwind_Mart.dbo.Product_Dim       ON [Order Details].ProductID =        Northwind_Mart.dbo.Product_Dim.ProductID   INNER JOIN Northwind_Mart.dbo.Customer_Dim       ON Orders.CustomerID =        Northwind_Mart.dbo.Customer_Dim.CustomerID   INNER JOIN Northwind_Mart.dbo.Time_Dim       ON Orders.ShippedDate = Northwind_Mart.dbo.Time_Dim.TheDate   INNER JOIN Northwind_Mart.dbo.Shipper_Dim       ON Orders.ShipVia = Northwind_Mart.dbo.Shipper_Dim.ShipperID   INNER JOIN Northwind_Mart.dbo.Employee_Dim       ON Orders.EmployeeID =        Northwind_Mart.dbo.Employee_Dim.EmployeeID   WHERE (Orders.ShippedDate IS NOT NULL)

Click the Destination tab, and in the Table name list box, select [Northwind_Mart].[dbo].[Sales_Fact].

Click the Transformations tab and note that the columns are correctly mapped for you.

Click OK to accept the data transformation properties for the Sales Fact table.

Exercise 4: Saving and Executing the Package

In this exercise, you will set properties of the package to enable saving data lineage and database metadata to the Microsoft Repository. Then you will save the package to the Microsoft Repository, execute the package to load the data warehouse, and view the data transformation metadata in the Metadata Browser.

To set package properties

Ensure that none of the items in the DTS Designer workspace is selected by clicking any open area in the workspace. On the toolbar, click the Properties button.

The DTS Package Properties dialog box should be displayed. If another property s dialog box is displayed, repeat this step, making sure that none of the items in the workspace is selected.

Click the Advanced tab as shown in Figure 8.11. In the Lineage section, check the Show lineage variables as source columns and Write lineage to repository check boxes.

This causes DTS to make the lineage ID available as columns in the data transformations. Each time that the DTS package is executed, a new lineage ID is generated. You can create columns in your destination tables to record these values, making it possible to track the lineage of every row in your destination tables. You will not create columns in the Northwind_Mart tables to record this lineage at the row level, but by checking these check boxes, you will still be able to track each execution of the package.

click to view at full size

Figure 8.11 The Advanced tab in the DTS Package Properties dialog box

Click the Options button in the Scanning Options section. The Scanning Options dialog box is displayed.

Check the Resolve package references to scanned catalog metadata check box.

Select the Scan all referenced catalogs into repository and Scan catalog if not already present in repository options. See Figure 8.12.

This causes DTS to populate the repository with the metadata for each of the catalogs (databases) that is referenced by the package. The metadata for the tables in the Northwind and Northwind_Mart databases used by this package will then be available in the repository. It is now possible for repository tools to retrieve information about the databases and the data lineage from the repository.

Figure 8.12 The Scanning Options dialog box

Click OK twice to close the Scanning Options and DTS Package Properties dialog boxes.

To execute the package

On the toolbar, click the Save button to save the package.

To execute the package manually, click the Execute button on the toolbar.

During the execution, you will see the status of execution displayed as Execution progress and Execution status indicators.

Once complete, the Package Execution Results dialog box that indicates that the execution was successful will be displayed. Click OK to close this dialog box.

Click Done to close the Executing DTS Package dialog box.

Repeat Steps 2, 3, 4 and 5 two more times. This generates more data lineage information for you to view later.

Exercise 5: Querying the Northwind_Mart Database

In this exercise, you will customize the complex query used to retrieve summary data from the Northwind transaction processing database in a previous chapter to retrieve the same data from the Northwind_Mart database. This demonstrates how the denormalized structure reduces the complexity and problems in retrieving summary data from an OLAP-driven star schema, because it is more efficient for querying and reporting due to the lower number of joins.

Query: Show the total number of units sold of each product, where the required date of the order is earlier than today.

To query the Northwind data mart

Open the C:\SQLDW\Exercise\Ch08\OLAPSelect.sql script file in SQL Server Query Analyzer. The query is as follows:

 SELECT           Product_Dim.ProductName,     Product_Dim.CategoryName,           Product_Dim.SupplierName,           SUM(Sales_Fact.LineItemQuantity) AS [Total Units Sold]  FROM Sales_Fact   INNER JOIN Product_Dim           ON Sales_Fact.ProductKey = Product_Dim.ProductKey           GROUP BY     Product_Dim.ProductName,     Product_Dim.CategoryName,           Product_Dim.SupplierName,           Sales_Fact.RequiredDate  HAVING (Sales_Fact.RequiredDate < getdate())

To make sure that the query is run against the Northwind_Mart database, select Northwind_Mart in the DB drop-down list.

Execute the query.

On the Query menu, select Show Execution Plan to show the actual steps that the query optimizer takes to retrieve the data from the database.

Execute the query again, and then click the Execution Plan tab in the results pane.

Notice the reduced number of joins, steps, and tables that are scanned to return the data requested by the query.

Exercise 6: Browsing the Microsoft Repository

In this exercise, you will browse information about your databases and DTS package in the Microsoft Repository.

To browse the Microsoft Repository

In the console tree of the SQL Server Enterprise Manager, expand your server, and then expand Data Transformation Services.

Click Metadata to display the Metadata Browser in the details pane. You may need to right-click Metadata, and then click Refresh on the pop-up menu to update information in the Metadata Browser.

In the Metadata Browser, you will see the Northwind_Mart and Northwind databases. These entries are the metadata that has been collected (scanned) from the databases into the Microsoft Repository database (stored in msdb). Any repository tool could browse this information. Click the + next to each of the databases and the subsequent levels to expand the metadata hierarchy for each database.

Click Package at the top of the Metadata Browser in the details pane.

In the Package Browser, you will see the Northwind Mart Full Load package. Click the + next to the Northwind Mart Full Load package. The versioned on <date> <time> entry shows when you saved the package. Each time you update a package, a new version is saved.

Click the + next to the versioned <date> <time> entry. Three entries showing when the package has been executed will be listed. This is the high-level lineage that gives you a full history of each execution of the package.

Click one of the executed on <date> <time> entries. Details about the package execution are displayed.

TIP
You can edit the package Description and Comments. To save the changes, click Update.

Notice the Lineage Long and Lineage Short entries below. These values are available to data transformations. In a data transformation, you can store one or both of these values to the rows in your destination tables. This allows you to locate the lineage information for any row.

Lesson Summary

DTS packages are saved data transformation definitions that are processed by SQL Server Data Transformation Services. DTS packages can be saved in files, in SQL Server, or in the Microsoft Repository.

A DTS package is composed of one or more steps, which you can define. A step is a unit of work with many possible actions. With DTS, you can control the execution sequence of each step. Precedence constraints and conditional processing allow you to prioritize steps.