Creating a Data Load Package in DTS Designer


The next step in building your data movement application is to load historical data into the dimension and fact tables that will constitute the data warehouse. Some of the information required to populate data warehouse tables can simply be copied from the production database or from historical data archived to delimited text files. Other information, such as certain dimension data, will frequently have to be imported from secondary data stores such as Excel spreadsheets. Still other information for dimension and fact tables will have to be derived from existing tables using ActiveX transformations and table lookups. In addition, to ensure that you have clean data, you need to perform error checking and data validation as part of the data load process. DTS can perform all these tasks for you. You have already learned how to perform basic data imports and transformations. In later chapters, you will learn how to perform the more complex tasks required for this application.

In the procedures in this section, you will create a package that contains the necessary tasks to load data into the CustomerDim, ProductDim, and SalesFact tables. The data for the CustomerDim and ProductDim tables will be loaded directly from tables in the SBS_OLTP database. The data for the SalesFact table will first be loaded into the SalesStage table from delimited text files containing sales data that has been extracted from the SBS_OLTP database for 1996 and 1997. (This simulates a data feed from a heterogeneous data store.) This data will then be loaded into the SalesFact table using a multiple table join with the CustomerDim, ProductDim, and TimeDim tables. The multiple table join is needed to link each row in the SalesFact table with a row in each of these three tables.

Important  

In a production environment, the data warehouse tables and their staging tables are generally placed on a computer separate from the one used for the production database (for performance reasons). You should design your data movement application to minimize the effect of data extraction on the production system. In general, this means performing all data transformations on either the data warehouse computer or a staging computer, and extracting data from the production system using a method that consumes the fewest resources there. Typically, such a method means copying the data unchanged directly to staging tables, or to text files that are then loaded into staging tables.

Create a new package and then save it

In this procedure, you will create a new package for the connection objects, tasks, and precedence constraints required to load the historical data, and then you will save it.

  1. In the console tree, right-click Data Transformation Services in your local SQL Server instance and then click New Package.

  2. On the toolbar, click Save.

  3. Type LoadHistoricalData in the Package Name box, type mypassword in the Owner Password box, select Structured Storage File in the location list, type C:\Microsoft Press\SQL DTS SBS\DataMovementApplication\LoadHistoricalData.dts in the File Name box, and then click OK to save this package.

  4. In the Password box, type mypassword again to confirm the owner password and then click OK.

  5. Click OK to acknowledge that the package cannot be executed without the owner password.

Your first step in configuring the LoadHistoricalData package will be to create connection objects for the tasks that you will add to this package.

Create connection objects to the SBS_OLTP database

In this procedure, you will create two connection objects to the SBS_OLTP database. These connection objects will be used by two separate tasks that extract data from the Products and Customers tables. Because there are no dependencies between these tasks, performance will be enhanced if these tasks execute in parallel. To enable parallel processing of these tasks, you will create two separate connection objects to the SBS_OLTP database.

  1. On the Connection menu, click Microsoft OLE DB Provider For SQL Server.

  2. Type ProductsSource in the New Connection text box, ensure that (local) appears in the Server list, ensure that Windows Authentication is selected, select SBS_OLTP in the Database list, and then click OK.

  3. On the Connection menu, click Microsoft OLE DB Provider For SQL Server.

  4. Type CustomersSource in the New Connection text box, ensure that (local) appears in the Server list, ensure that Windows Authentication is selected, select SBS_OLTP in the Database list, and then click OK.

Create connection objects to the SBS_OLAP database

In this procedure, you will create six connection objects to the SBS_OLAP database. The first three connection objects will be used initially by three separate tasks that load data into the ProductDim, CustomerDim, and SalesFact tables. This package will also contain a task that loads data into the SalesStage table, but since the data that will populate the SalesStage table is being extracted from two text files, you will create two separate connection objects to the SBS_OLAP database to enable the loading of this data in parallel.

Finally, you will create an additional connection object that will be used for two separate tasks that do not execute at the same time. A Transform Data task will use this connection object to extract data from the SalesStage table and load the SalesFact table. Another task will use the same connection object to delete all data from the ProductDim, CustomerDim, SalesStage, and SalesFact tables. This task will enable you to re-execute this package multiple times during development without duplicating data. These two tasks can share one connection object because they will never be running at the same time.

  1. On the Connection menu, click Microsoft OLE DB Provider For SQL Server.

  2. Type ProductsDestination in the New Connection text box, ensure that (local) appears in the Server list, ensure that Windows Authentication is selected, select SBS_OLAP in the Database list, and then click OK.

  3. On the Connection menu, click Microsoft OLE DB Provider For SQL Server.

  4. Type CustomersDestination in the New Connection text box, ensure that (local) appears in the Server list, ensure that Windows Authentication is selected, select SBS_OLAP in the Database list, and then click OK.

  5. On the Connection menu, click Microsoft OLE DB Provider For SQL Server.

  6. Type SalesFactDataDestination in the New Connection text box, ensure that (local) appears in the Server list, ensure that Windows Authentication is selected, select SBS_OLAP in the Database list, and then click OK.

  7. On the Connection menu, click Microsoft OLE DB Provider For SQL Server.

  8. Type 1996DataDestination in the New Connection text box, ensure that (local) appears in the Server list, ensure that Windows Authentication is selected, select SBS_OLAP in the Database list, and then click OK.

  9. On the Connection menu, click Microsoft OLE DB Provider For SQL Server.

  10. Type 1997DataDestination in the New Connection text box, ensure that (local) appears in the Server list, ensure that Windows Authentication is selected, select SBS_OLAP in the Database list, and then click OK.

  11. On the Connection menu, click Microsoft OLE DB Provider For SQL Server.

  12. Type SBS_OLAPAdditionalConnection in the New Connection text box, ensure that (local) appears in the Server list, ensure that Windows Authentication is selected, select SBS_OLAP in the Database list, and then click OK.

    Tip  

    When you create multiple objects on the design sheet, these objects might appear one on top of the other. Use your cursor to drag each icon to a separate location on the design sheet to ensure each is visible. You will also find it useful to arrange the icons to represent the flow of data between the connection objects.

  13. On the toolbar, click Save.

Now that you have created all the required connection objects, you are ready to create the tasks to load the historical data into the SBS_OLAP database.

Create the Load CustomerDim Table Transform Data task

In this procedure, you will create a Transform Data task that uses a query to extract data from the Customers and Orders tables in the SBS_OLTP database. This task will then insert these results into the CustomerDim table in the SBS_OLAP database.

  1. On the Task menu, click Transform Data Task.

    You are prompted to choose the connection object that this Transform Data task will use to obtain the source data for the transformation. In this procedure, you will query the SBS_OLTP database using the dedicated connection object you previously created for this purpose.

  2. Click the CustomersSource connection object.

    You are prompted to choose the connection object that this Transform Data task will use as the destination for the extracted data. In this procedure, you will copy the results of the query to the SBS_OLAP database using the dedicated connection object you previously created for this purpose.

  3. Click the CustomersDestination connection object.

    An undefined Transform Data task appears on the design sheet between these two connection objects, represented by a black directional arrow.

  4. Double-click this new Transform Data task to begin defining its properties.

  5. In the Description box, type Load CustomerDim Table, click SQL Query, and then click Browse to load a query from an existing script file to use for this Transform Data task.

  6. Navigate to C:\Microsoft Press\SQL DTS SBS\Ch2\ChapterFiles in the Look In list, and then double-click LoadCustomers.sql.

    This query performs an INNER JOIN between the Customers table and the Orders table in the SBS_OLTP database to retrieve a distinct list of all customers that ordered products between July 1, 1996, and December 31, 1997. This task will use the result of this query to populate the CustomerDim table in the SBS_OLAP database.

  7. Click the Destination tab and then select [SBS_OLAP].[dbo].[CustomerDim] in the Table Name list.

    The schema for the CustomerDim table is displayed.

  8. Click the Transformations tab.

    DTS attempts to map the columns generated by the source query to the columns in the destination table, beginning with the first column in each. However, the mapping is inappropriate for this transformation. The first column in the destination table is defined using the IDENTITY property (which generates an integer surrogate key) and should not have any column mapped to it.

    The first column in the source query should be mapped to the second column in the destination table, the second column in the source query should be mapped to the third column in the destination table, and so on. In addition, notice that DTS has configured each column transformation to use separate COM objects. As discussed previously, when transforming many columns, DTS executes faster if a single COM object is used rather than a separate COM object for each transformation. In this procedure, you will delete all existing transformations and then create the appropriate transformations using a single COM object.

  9. Click Delete All, click Select All, and then click New to specify a transformation type.

    The predefined transformation types enable you to copy columns unchanged, convert dates from a datetime format into a different format, convert strings in a number of ways, extract a substring, read a file, or write a file. You can also use an ActiveX script to create a custom transformation. (You will begin learning about using ActiveX transformations in Chapter 4.)

    Because you will not be changing any data as it is copied from the source query to the destination table, you will use the Copy Column transformation option.

  10. Click Copy Column and then click OK.

  11. In the Name box, type Copy Column and then click the Source Columns tab.

    The list of columns from the source query appears in the Available Columns list as well as in the Selected Columns list because you clicked Select All for this transformation. In this procedure, you will map all the columns generated by the source query to columns in the destination table.

  12. Click the Destination Columns tab.

    The list of all available columns in the destination table appears in the Available Columns list as well as in the Selected Columns list because you clicked Select All for this transformation. In this procedure, you will map all the columns in the destination table except for the CustomerKey column.

  13. In the Selected Columns list, click the CustomerKey column, click the left angle bracket (<) to remove this column, and then click OK.

    The Transform Data Task Properties dialog box reappears displaying the new transformation on the Transformations tab. All the column transformations use only a single COM object, and the first column in the destination table is not mapped to any column in the source query.

  14. Click OK to save this transformation.

You have successfully created your first Transform Data task using DTS Designer.
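The text of LoadCustomers.sql is supplied with the book's companion files rather than reproduced here, but a query of the shape it describes (a distinct list of customers with orders in the date range) would look something like the following sketch. The column names are hypothetical Northwind-style names, not taken from the actual script:

```sql
-- Hypothetical sketch of a LoadCustomers-style query; the actual column
-- list in LoadCustomers.sql may differ.
SELECT DISTINCT c.CustomerID,
                c.CompanyName,
                c.City,
                c.Country
FROM   Customers AS c
       INNER JOIN Orders AS o
               ON o.CustomerID = c.CustomerID
WHERE  o.OrderDate >= '19960701'    -- July 1, 1996
  AND  o.OrderDate <  '19980101';   -- through December 31, 1997
```

Note the half-open date range: comparing with `< '19980101'` rather than `<= '19971231'` avoids missing rows whose datetime values carry a time-of-day component.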

Create the Load ProductDim Table Transform Data task

In this procedure, you will create a Transform Data task that extracts data from the Products and Categories tables in the SBS_OLTP database using a source query and inserts the results into the ProductDim table in the SBS_OLAP database.

  1. On the Task menu, click Transform Data Task.

  2. Click the ProductsSource connection object.

  3. Click the ProductsDestination connection object.

  4. Double-click this new Transform Data task to begin defining its properties.

  5. In the Description box, type Load ProductDim Table, click SQL Query, and then click Browse.

  6. Navigate to C:\Microsoft Press\SQL DTS SBS\Ch2\ChapterFiles in the Look In list, and then double-click LoadProducts.sql.

  7. Click the Destination tab and then select [SBS_OLAP].[dbo].[ProductDim] in the Table Name list.

  8. Click the Transformations tab.

    The automatic mapping needs to be fixed for this task as well.

  9. Click Delete All, click Select All, and then click New.

  10. Click Copy Column and then click OK.

  11. In the Name box, type Copy Column and then click the Source Columns tab.

  12. Ensure that the Selected Columns list includes all the columns in the Available Columns list and then click Destination Columns.

  13. In the Selected Columns list, click the ProductKey column, click < to remove this column, and then click OK.

  14. Review the mappings and then click OK.

You have successfully created your second Transform Data task. You are now ready to extract data from the delimited text files containing the 1996 and 1997 sales data.

Create the Load 1996 Sales Data and the Load 1997 Sales Data tasks

In this procedure, you will create the two Bulk Insert tasks that load data from delimited text files into the SalesStage table in the SBS_OLAP database. Two separate tasks are used to enable the table to be loaded in parallel from two separate text files. The Bulk Insert task does not require a separate connection object for its source data. You actually create the connection to the text file as part of the configuration of the Bulk Insert task.

Important  

The Bulk Insert task is the fastest way to import data from delimited text files into SQL Server using DTS because it simply generates an OLE DB command object containing a Transact-SQL BULK INSERT statement, which runs in the same process space as SQL Server. The Transform Data task with the Fastload option uses the IRowsetFastLoad OLE DB interface from inside the DTS process space, which means that there is context switching between DTS and SQL Server. However, the Bulk Insert task works only with delimited text files as the data source and SQL Server as the destination, whereas the Transform Data task works with a wide range of data sources and destinations.

On a related note, the Execute SQL task is faster than the Transform Data task for the same reason (namely that no context switching is required).
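Under the covers, the Bulk Insert task you configure in the steps below issues a statement roughly equivalent to the following Transact-SQL. This is a sketch of what the task generates from its property settings; the exact options DTS emits may differ:

```sql
-- Approximate Transact-SQL equivalent of the Load 1996 Sales Data task.
-- FORMATFILE maps text-file fields to SalesStage columns; BATCHSIZE
-- matches the Insert Batch Size of 500 set on the Options tab.
BULK INSERT [SBS_OLAP].[dbo].[SalesStage]
FROM 'C:\Microsoft Press\SQL DTS SBS\DataMovementApplication\1996SalesData.txt'
WITH (
    FORMATFILE = 'C:\Microsoft Press\SQL DTS SBS\DataMovementApplication\SalesData.fmt',
    BATCHSIZE  = 500
);
```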

  1. On the Task menu, click Bulk Insert Task.

  2. In the Description box, type Load 1996 Sales Data.

  3. In the Existing Connection list, select 1996DataDestination.

  4. In the Destination Table list, select [SBS_OLAP].[dbo].[SalesStage].

  5. In the Source Data File box, type C:\Microsoft Press\SQL DTS SBS\DataMovementApplication\1996SalesData.txt.

  6. Click Use Format File and then type C:\Microsoft Press\SQL DTS SBS\DataMovementApplication\SalesData.fmt in the Select Format File box.

  7. Click the Options tab, select the Insert Batch Size check box, type 500 in the list box, and then click OK.

    Tip  

    When you use the Bulk Insert task, DTS provides a variety of data insertion options that let you control how the insert behaves. Specifying a batch size allows you to control the number of records committed at one time at the destination (an error in a batch rolls back that batch, not the entire insert). To maximize performance, constraints on the destination table are not checked unless you select the Check Constraints check box. If the data in the data source has been sorted on a particular column, identifying this to DTS (by selecting the Sorted Data check box and specifying the sort column) can increase insert performance when the destination table has an index on the same column. Specifying a table lock can increase performance, but at the expense of other users if they will be accessing the destination table during the insert operation. Selecting the Enable Identity Insert check box allows explicit values to be inserted into an identity column. Selecting the Keep Nulls check box retains nulls from the source data even when a default value has been defined on the column in the destination table.

  8. On the Task menu, click Bulk Insert Task.

  9. In the Description box, type Load 1997 Sales Data.

  10. In the Existing Connection list, select 1997DataDestination.

  11. In the Destination Table list, select [SBS_OLAP].[dbo].[SalesStage].

  12. In the Source Data File box, type C:\Microsoft Press\SQL DTS SBS\DataMovementApplication\1997SalesData.txt.

  13. Click Use Format File. In the Select Format File box, type C:\Microsoft Press\SQL DTS SBS\DataMovementApplication\SalesData.fmt.

  14. Click the Options tab, select the Insert Batch Size check box, type 500 in the list box, and then click OK.

You have successfully created two Bulk Insert tasks that will load the SalesStage table in parallel from structured text files. You are now ready to create the Transform Data task that loads the SalesFact table using the data in the SalesStage, TimeDim, CustomerDim, and ProductDim tables.
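The SalesData.fmt file referenced above is a bcp-style format file whose contents are not reproduced in the text. For orientation, a format file for a hypothetical three-column staging feed would look something like this: a version line, a field count, then one line per field giving the field order, host data type, prefix length, maximum data length, terminator, destination column order, destination column name, and collation. The column names here are illustrative only:

```
8.0
3
1  SQLCHAR  0  10  ","      1  OrderDate    ""
2  SQLCHAR  0  10  ","      2  CustomerID   ""
3  SQLCHAR  0  10  "\r\n"   3  ProductID    ""
```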

Create the Load Sales Fact Table task

In this procedure, you will create a Transform Data task that extracts sales data from the SalesStage table, joins that data with key values from the TimeDim, ProductDim, and CustomerDim tables, and then inserts the result into the SalesFact table.

  1. On the Task menu, click Transform Data Task.

  2. Click the SBS_OLAPAdditionalConnection connection object to select it as the data source.

  3. Click the SalesFactDataDestination connection object to select it as the data destination.

  4. Double-click this new Transform Data task to begin defining its properties.

  5. In the Description box, type Load SalesFact Table, click SQL Query, and then click Browse.

  6. Navigate to C:\Microsoft Press\SQL DTS SBS\Ch2\ChapterFiles in the Look In list, and then double-click LoadSalesFact.sql.

    This SQL query performs inner joins among the TimeDim, ProductDim, CustomerDim, and SalesStage tables to generate the result set that DTS will use to populate the SalesFact table.

  7. Click the Destination tab and then select [SBS_OLAP].[dbo].[SalesFact] in the Table Name list.

  8. Click the Transformations tab and review the auto-mapped transformations.

    The auto-mapping is correct, and, since only five columns are involved, you will not change this transformation to use a single COM object.

  9. Click OK to save this transformation task.

    Tip  

    Because this Transform Data task actually makes no transformations, the data source is a SQL Server database, and the data destination is a SQL Server database on the same instance, this task could also have been configured as an Execute SQL task. Furthermore, an Execute SQL task will generally perform better than a Transform Data task. However, as you will discover in Chapter 8, the Transform Data task has some benefits in terms of error handling and execution reporting.

You have now added each of the Transform Data tasks required by this package. However, before you add the precedence constraints required by this package, you will add an Execute SQL task that enables this package to be executed repeatedly without duplicating data in the SBS_OLAP database.
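The LoadSalesFact.sql script itself ships with the companion files; a query of the shape it describes — resolving each staged sales row to the surrogate keys in the three dimension tables — would look something like the following sketch. The join and measure column names are hypothetical, not taken from the actual script:

```sql
-- Hypothetical sketch of a LoadSalesFact-style query; the join columns
-- and measure names in LoadSalesFact.sql may differ.
SELECT t.TimeKey,
       p.ProductKey,
       c.CustomerKey,
       s.Quantity,
       s.SalesAmount
FROM   SalesStage AS s
       INNER JOIN TimeDim     AS t ON t.CalendarDate = s.OrderDate
       INNER JOIN ProductDim  AS p ON p.ProductID    = s.ProductID
       INNER JOIN CustomerDim AS c ON c.CustomerID   = s.CustomerID;
```

Because the inner joins return the dimension tables' surrogate keys rather than the staged business keys, each fact row lands in SalesFact already linked to one row in each dimension table.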

Create the Truncate Tables task

In this procedure, you will create an Execute SQL task that truncates all data in the dimension and fact tables before data is loaded into them. Without this task, each time you executed this package, you would reload the same data (generating duplicate data). In addition, the TRUNCATE TABLE statement resets the IDENTITY seed in each table.

  1. On the Task menu, click Execute SQL Task.

  2. In the Description box, type Truncate Data .

  3. In the Existing Connection list, click SBS_OLAPAdditionalConnection.

    Any connection object that defines a connection to the SBS_OLAP database could be used because this task will become the first task that executes in this package and it will complete before any other task requires the use of this connection object.

  4. Click Browse.

  5. Navigate to C:\Microsoft Press\SQL DTS SBS\Ch2\ChapterFiles in the Look In list and then double-click TruncateTables.sql.

    This SQL query truncates the ProductDim, CustomerDim, SalesStage, and SalesFact tables. To do so, it first drops the foreign key constraints on the SalesFact table using an ALTER TABLE statement. After the TRUNCATE TABLE statements have completed, the script adds these foreign key constraints back to the SalesFact table.

  6. Click OK to save the configuration of this Execute SQL task.

You have successfully added the final task required by this package. You are now ready to add the precedence constraints necessary to enable the tasks in this package to execute in the proper order.
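TRUNCATE TABLE cannot be used on a table that is referenced by a foreign key, which is why the TruncateTables.sql script must drop the SalesFact constraints first. A script following that pattern looks roughly like this sketch, in which the constraint and key column names are illustrative rather than taken from the book's script:

```sql
-- Hypothetical sketch of a TruncateTables-style script; constraint and
-- column names are illustrative only.
ALTER TABLE dbo.SalesFact DROP CONSTRAINT FK_SalesFact_CustomerDim;
-- ...drop the remaining foreign keys on SalesFact the same way...

TRUNCATE TABLE dbo.SalesFact;   -- also resets any IDENTITY seed
TRUNCATE TABLE dbo.SalesStage;
TRUNCATE TABLE dbo.CustomerDim;
TRUNCATE TABLE dbo.ProductDim;

ALTER TABLE dbo.SalesFact
    ADD CONSTRAINT FK_SalesFact_CustomerDim
    FOREIGN KEY (CustomerKey) REFERENCES dbo.CustomerDim (CustomerKey);
-- ...re-add the remaining foreign keys...
```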

Create the precedence constraints

In this procedure, you will use On Success constraints to ensure that the first step that executes is the Truncate Data step, and that the Load SalesFact Table step executes after all other steps have completed successfully. The intermediate steps will execute in parallel.

  1. On the design sheet, click the Truncate Data step, and then hold down the Ctrl key while you click the CustomersSource connection object.

  2. Right-click the CustomersSource connection object, point to Workflow, and then click On Success.

  3. On the design sheet, click the Truncate Data step, and then hold down the Ctrl key while you click the ProductsSource connection object.

  4. Right-click the ProductsSource connection object, point to Workflow, and then click On Success.

  5. On the design sheet, click the Truncate Data step, and then hold down the Ctrl key while you click the Load 1996 Sales Data step.

  6. Right-click the Load 1996 Sales Data step, point to Workflow, and then click On Success.

  7. On the design sheet, click the Truncate Data step, and then hold down the Ctrl key while you click the Load 1997 Sales Data step.

  8. Right-click the Load 1997 Sales Data step, point to Workflow, and then click On Success.

  9. On the design sheet, click the CustomersDestination connection object and hold down the Ctrl key while you click the SBS_OLAPAdditionalConnection connection object.

  10. Right-click the SBS_OLAPAdditionalConnection connection object, point to Workflow, and then click On Success.

  11. On the design sheet, click the ProductsDestination connection object, and then hold down the Ctrl key while you click the SBS_OLAPAdditionalConnection connection object.

  12. Right-click the SBS_OLAPAdditionalConnection connection object, point to Workflow, and then click On Success.

  13. On the design sheet, click the Load 1996 Sales Data step, and then hold down the Ctrl key while you click the SBS_OLAPAdditionalConnection connection object.

  14. Right-click the SBS_OLAPAdditionalConnection connection object, point to Workflow, and then click On Success.

  15. On the design sheet, click the Load 1997 Sales Data step, and then hold down the Ctrl key while you click the SBS_OLAPAdditionalConnection connection object.

  16. Right-click the SBS_OLAPAdditionalConnection connection object, point to Workflow, and then click On Success.

  17. On the toolbar, click Save.

You have now completed the package that populates the CustomerDim and ProductDim tables, loads historical sales data into the SalesStage table, and then populates the SalesFact table by joining the data in the first three tables with the data in the TimeDim table. You are now ready to execute this package.

Executing the LoadHistoricalData package

In this procedure you will execute the LoadHistoricalData package.

  1. On the toolbar, click Execute.

  2. Click OK.

    This package added 88 rows to the CustomerDim table, 77 rows to the ProductDim table, and 1464 rows to the SalesFact table.

  3. Close SQL Server Enterprise Manager.

You have successfully executed the LoadHistoricalData package. Before you learn how to add additional functionality to the data movement application that you are building, you will learn about saving, securing, executing, and scheduling packages.




Microsoft SQL Server 2000 DTS Step by Step
ISBN: 0735619166
Year: 2003
Authors: Carl Rabeler