Transforming data with DTS involves planning and designing the transformation and creating and executing a DTS package. This lesson takes you through this process.
After this lesson, you will be able to
- Describe the design and planning steps taken before using DTS
- Describe the data transformation process used by DTS
- Use DTS Designer to create, edit, and save complex data transformations
Estimated lesson time: 105 minutes
Data transformation involves formatting and modifying data that is extracted from the data source into merged or derived values that are more useful at the destination. New values can easily be calculated from one or more columns in the source rowset, and a single source column can be decomposed into multiple destination columns.
DTS allows you to specify the attributes of the destination columns and to indicate how the source columns are mapped to the destination columns. Transformation flags specify whether data of one type in the source can be converted to another type in the destination. For example, you can allow data type promotion, such as converting 16-bit integers to 32-bit integers, or data type demotion, such as converting 32-bit integers to 16-bit integers (data may be lost in this case). You can also require an exact match between source and destination data types.
Each database defines its own data types as well as its column and object naming conventions. DTS attempts to define the best possible data type matches between a source and destination. However, you can override DTS mappings and specify different destination data type, size, precision, and scale properties.
You can merge and separate data in several ways:
At the File Level
You can combine information from multiple sources and place it into a single table, or you can take information from a single source and place it into multiple tables. Heterogeneous joins are an example of combining multiple sources into a single result set that is saved at the destination.
At the Column Level
You can combine information from multiple columns and place it into a single column, or you can take information from a single column and place it into multiple columns. For example, you can summarize monthly sales totals for each product, or you can decompose a phone number that is stored in one column in order to store the area code in one column and the phone number in another.
A DTS package is composed of one or more steps, which you can define. A step defines a unit of work that is performed as part of the transformation process. A step can
For example, you can create a DTS package that summarizes sales information for each product for a given month. The DTS package drops (if the table already exists) and creates the table on the destination, gets data from the source connection (OLE DB), processes the transformation (summarization), and finally sends the data to the destination connection (OLE DB).
When you use DTS to transfer and transform data, you create DTS packages that describe all of the work to be performed as part of the transformation process. You can create the DTS package interactively, using the DTS wizards or DTS Designer, or programmatically, using a language that supports OLE Automation, such as Microsoft Visual Basic.
Saving the DTS package allows you to modify it, reuse it, or schedule it for later execution. If you do not save the DTS package, it executes immediately. You must save the DTS package if you want to schedule it for later execution.
You can save a DTS package that you create in one of three ways:
Saving a DTS package in the Microsoft Repository makes package metadata reusable and available to other applications. Using the Microsoft Repository also enables you to track data lineage at the package and row level of a table, which allows you to determine the source of each piece of data and the transformations that are applied to it.
When you save a package to the Microsoft Repository, the package is stored in the msdb database, but package metadata can be imported into the repository. Packages stored in this way are listed in the Repository Packages node under Data Transformation Services in the console tree. After you import the package metadata into the repository, you can view it using the Metadata Browser in the Metadata node under Data Transformation Services in the console tree.
You can encrypt DTS packages that are saved to SQL Server or to COM-structured storage files in order to protect sensitive user names and password information. When a DTS package is encrypted, all of its collections and properties are encrypted, except for the package name, description, ID, version, and creation date.
DTS packages provide two levels of security: owner password and operator password.
Owner Password
The owner password level of security provides complete access to all collections and properties. By default, DTS packages without owner passwords are not encrypted and can be read by any user with access to the package.
Operator Password
The operator password level of security allows a user to execute but not edit or view the package definition. If an operator password is specified, an owner password must be supplied.
With DTS, you can define a workflow that controls the execution sequence of each step. Control of flow logic and conditional processing is achieved using precedence constraints. DTS tasks can also be prioritized. This section discusses using DTS Designer to edit and customize packages.
Workflows control the flow of execution for the package. A package is made up of data connections (sources and destinations) and tasks that are performed using those connections. The action of executing a task is controlled by a step. A step can be subject to one or more precedence constraints. A step with no precedence constraints executes immediately. If a step has precedence constraints, it cannot execute until all of its precedence constraints have been satisfied.
In DTS Designer, steps are represented by task icons and a solid Data Transformation arrow between two data connection icons. Precedence constraints are represented by dashed blue (Completion), green (Success), or red (Failure) arrows. An arrow points to the step that is subject to the precedence constraint; this is called the destination step. If the destination step is a data transformation, the arrow points to the source data connection of the step. An arrow points from the step that must be completed to satisfy the precedence constraint; this is called the source step. If the source step is a data transformation, the arrow points from the destination data connection of the step.
To make your DTS Designer diagrams easier to read, you can add the same connection to the diagram more than once. When you do so, specify it as an existing connection. If a connection is involved in more than one transformation, it is recommended that you add the connection once for each transformation. If you do not, the diagram will have a single data connection icon with a large number of arrows (both data transformations and precedence constraints) pointing to and from it.
NOTE
The terms source and destination are used to refer to both data connections and steps in DTS Designer. When working with a source or a destination, always make sure that you know whether it is a source or destination connection or a source or destination step.In addition, arrows are used to represent both precedence constraints and data transformations in DTS Designer. When working with an arrow in a DTS diagram, make sure that you know whether it is a constraint or a Data Transformation arrow.
Steps are defined using the Workflow Properties dialog box. To open this dialog box, right-click on the destination step (either the task icon or the Data Transformation arrow if the step is a data transformation step). Then select Workflow Properties (you may need to point to Workflow to get to the Workflow Properties option) from the context menu. You can also open the Workflow Properties dialog box by right-clicking a precedence constraint arrow, but this is not recommended, as it is confusing and does not allow access to the General tab.
The Precedence tab of the Workflow Properties dialog box allows you to add precedence constraints to or remove them from the step. The order of precedence constraints in the list is not important. You can also add precedence constraints from the Workflow menu or from the toolbar in DTS Designer, but this is not recommended, as it is easy to confuse the destination and source steps using this method. The General tab of the Workflow Properties dialog box allows you to set workflow properties such as task priority and transaction management for the task of the destination step.
The type of precedence constraint you select determines the requirement for executing a step:
Precedence constraints create a finish-start relationship between the step being executed (the source step) and the step that will be executed (the destination step). Given two steps A and B, a precedence constraint says
Step B cannot start until step A finishes |
not
If step A finishes, step B will start |
This is important when multiple steps are involved, as a step may have many precedence constraints that all must be satisfied before it can execute.
Under the control of precedence constraints, steps execute in sequence, in parallel, or in a combination of these:
For example, data can be loaded into a table (step B) only after the table has been successfully created (step A).
For example, a package can load data from Oracle and DB2 into separate tables simultaneously.
For example, a package can load data from several different heterogeneous files into a set of tables. The loading of data can be done in parallel, but only after the creation of the tables.
Figure 6.2 shows a package with six steps that demonstrate a combination of parallel and sequential execution. Steps A, B, and C have no precedence constraints and execute immediately. Step D waits for step A to complete successfully before executing. Step E waits for step D to complete with a failure before executing. Step F waits for step C to complete (with success or failure) before executing. Steps A, D, and E execute sequentially. Steps C and F execute sequentially. Steps A (and its sequential steps), B, and C (and its sequential step) execute in parallel.
Figure 6.2 A package with steps that execute both in parallel and sequentially
Figure 6.3 shows how one step can have multiple precedence constraints. In this case, step C can execute only when both step A and step B have successfully completed.
Figure 6.3 A step with multiple precedence constraints
Conditional processing using basic IF-THEN-ELSE logic allows a DTS package to respond to run-time conditions that vary. To implement conditional processing, use a combination of steps with Success and Failure precedence constraints, as shown in Figure 6.4. In the figure, step B executes only if step A completes successfully. Step C will execute if step A fails. Step C would typically send a notification such as an e-mail message or would take corrective action that is in turn the condition for another precedence constraint that then allows the original step to be repeated.
Figure 6.4 Conditional processing using precedence constraints
You can specify the priority of tasks. By default, each thread within the DTS package executes at the same priority as the DTS package. However, if some operations are more time-critical than others, you can assign an execution priority to each step. You can specify each step to execute at Idle, Normal, or High priority.
In this exercise, you will create a package using DTS Designer. The package will transfer some of the columns from the Products table in an Access database to a new table in a SQL Server database. The package will use a custom data transformation to look up and then convert data values from the Access table to new values in the SQL Server table. You will use an ActiveX script to convert the values.
|
CREATE TABLE StudyNwind..ProductsCopy ( ProductID int NOT NULL , ProductName nvarchar (40) NOT NULL , QuantityPerUnit nvarchar (20) NULL , UnitPrice money NULL , IsAnimal char (3) NULL ) |
SELECT CategoryName FROM Categories WHERE (CategoryID = ?) |
Function Main() Select Case _ DTSLookups("myLookup").Execute(DTSSource("CategoryID").Value) Case "Dairy Products", "Meat/Products", "Seafood" DTSDestination("IsAnimal") = "Yes" Case Else DTSDestination("IsAnimal") = "No" End Select Main = DTSTransformStat_OK End Function |
TIP
To edit individual column transformations, you need to right-click on the lines pointing from the Source Table list to the Destination Table list.
In this exercise you will execute the package you have just created and then view the data in the new table.
Each DTS package is self-contained after you create it. A package is a complete description of all of the work to be performed as part of the transformation process. This section describes how to execute and schedule a DTS package that you have created.
After you save a DTS package, you can retrieve and execute it, using SQL Server Enterprise Manager or the dtsrun command-prompt utility.
The example given here shows the dtsrun command-prompt utility being used to execute a DTS package that creates and populates a summary table in the StudyNwind database on the SQL Server named SQLSERVER. The /U option specifies the sa login. If the specified login has a password, it must be specified with the /P option. Note that the name of the package is enclosed in double quotes; this is necessary if the name contains spaces. Remember that the DTS package is a complete description of all of the work to be performed as part of the transformation process.
dtsrun /SSQLSERVER /Usa /N"StudyNwind Product Totals" |
You can schedule a saved DTS package for execution at a specific time, such as daily at midnight, or at recurring intervals, such as on the first or last day of the month or weekly on Sunday at 6 a.m.
You can schedule a DTS package for execution in the following ways:
In this exercise, you will use the DTS Import Wizard to create a copy of the entire Northwind database.
A DTS package is composed of one or more steps, which you can define. A step is a unit of work with many possible actions. It can perform a number of actions, including but not limited to executing SQL statements and launching external programs.
With DTS you can control the execution sequence of each step. Precedence constraints and conditional processing allow you to prioritize steps.