Integration Services packages are organized collections of DTP and DTR components. The package is the unit of execution for a Integration Services transfer operation. In other words, to use Integration Services to perform data transfers and ETL operations, you must create a package that contains all of the DTP components that define the source target and destinations for the data and the transformations that will take place, as well as the DTR components that define the work flow or sequence of actions that will be performed by the Integration Services package. Once the package has been created, you execute the package to perform the data transfer. You can see an overview of a sample Integration Services package shown in Figure 9-4.
Figure 9-4: Integration Services package overview
Integration Services packages can be created using the set of tools provided with SQL Server 2005, or they can be created programmatically using the DTR API. Integration Services packages can be saved either in SQL Server’s msdb database or as XML files that reside in the file systems. You can get a preview of the Integration Services design tools in the section “Integration Services Tools” that is presented later in this chapter.
Not surprisingly, the new Integration Services architecture adds some significant new capabilities to Integration Services that make it a more robust enterprise data transformation tool. Next, let’s look at some of the major enhancements that this new version of Integration Services brings to the table.
One of the major limitations of the earlier versions of Integration Services was the fact that packages were essentially tied to a given source and destination. That meant that it was really difficult to take a given package and easily reuse it on a different server, even if that server supported the same databases. While it was possible to use global variables and other complex workarounds to get around this limitation, these workarounds weren’t the most graceful or robust solutions to the problem. The new SQL Server 2005 Integration Services enables one package to handle multiple source and destination endpoints. You can use this feature to set package properties at run time from parameters files, the Registry, or XML documents.
The original Integration Services also didn’t support any built-in logging capabilities. Logging the operation of packages is particularly useful for auditing and troubleshooting. While you could add them yourself, this required development work on your part. The new SQL Server 2005 Integration Services provides built-in logging options for packages, tasks, and transformations.
Another great new feature in SQL Server 2005’s Integration Services is support for checkpoints and restarts within Integration Services packages. This new feature enables different checkpoints to be associated with multiple steps in complex Integration Services packages. Thus if a package fails that’s using checkpoints, the entire package doesn’t need to be rerun from the very beginning. Instead, the package can be restarted at the first unsuccessful checkpoint. For long-running packages, this can be a huge time saver, as the entire package doesn’t need to be reprocessed. Instead, the Integration Services package can resume processing where it left off. When combined with event handling, this can be a powerful tool for automatically managing your Integration Services jobs.
Support for variables is another new feature found in SQL Server 2005 Integration Services. With SQL Server 7 and SQL Server 2000, DTS packages supported variables for use in transformation scripts but didn’t support package-scoped variables. This lack of global variables made it difficult to reuse DTS packages on multiple databases and on multiple servers. To be sure, there were workarounds, but since DTS packages were not designed to be used in this fashion, these workarounds were somewhat complex and not particularly robust. With SQL Server 2005 the new Integration Services design provides support for package, task, and event-level variables, greatly increasing the flexibility of Integration Services packages and also significantly facilitating the reuse of Integration Services packages. As their name implies, package-scoped variables can be seen by all of the components in the Integration Services package. Task-scoped variables can be seen and accessed by the components of that particular task but not by any other components that are part of that Integration Services package. One example usage of a package-scoped Integration Services variable might be to contain the name of the current SQL Server system, allowing the package to be easily run on another system by substituting that system’s server name into the Integration Services variable at run time.
All Integration Services packages have a number of built-in system variables that represent different aspects of a Integration Services package. Table 9-1 lists the new Integration Services System variables.
System Variable | Description |
---|---|
BreakpointTargetDescription | Contains a description of the break point |
Cancel | Indicates that execution should be canceled |
CountDone | Contains the transfer progress counter |
CreationDate | Contains the package creation date |
CreatorComputerName | Contains the name of the computer used to create the package |
CreatorName | Contains the name of the user who created the package |
CustomEventDescription | Describes a custom event |
CustomEventInfo | Contains custom event information |
CustomEventName | Contains a custom event name |
CustomEventValue | Contains the custom event value |
CustonEventGUID | Contains the custom event's GUID |
ErrorCode | Contains an error code |
ErrorDescription | Contains an error description |
ExecutionInstanceGuid | Contains a GUID identifying the current package instance |
ExecutionStatus | Contains the package's execution status |
LocaleId | Contains the Locale identification code |
MachineName | Contains the current machine name |
MaxCount | Contains the maximum number of items to be transferred |
PackageId | Contains the current package identification |
PackageName | Contains the current package name |
PercentComplete | Contains the transfer progress status |
ProgressCountHigh | Contains the transfer progress high count |
ProgressCountLow | Contains the transfer progress low count |
ProgressDescription | Contains a transfer progress description |
ProgressEvent | Contains a transfer progress event identification |
Propagate | Indicates if an event can be propagated |
SourceDescription | Contains an event source description |
SourceID | Contains an event source identification |
SourceName | Contains an event source name |
StartTime | Contains the package's execution start time |
TaskID | Contains a task identification |
TaskName | Contains a task name |
UserName | Contains the name of the user running the package |
VersionBuild | Contains the package's build number |
VersionComment | Contains the package's description |
VersionGUID | Contains the package's GUID |
VersionMajor | Contains the package's major version number |
VersionMinor | Contains the package's minor version number |
SQL Server 2005’s Integration Services also supports complex flow control. Within each Integration Services package you can specify the path that will be taken if a given operation succeeds or fails. For instance, if an operation succeeds you can set up your Integration Services package to proceed along with the next subsequent action. Otherwise, if the operation fails you can take an alternative action. More information about the new error flow feature is presented in the following section. You can also set up the control flow in such a way that multiple tasks can be executed in parallel, or you can force tasks to execute sequentially by specifying that the next task will not be executed until the current task has completed. You can also use a new Integration Services construct known as a container to group together related Integration Services tasks. Each container can have its own internal control flow and variables. There are also multiple looping structures that enable you to set up repeated actions. There’s a For Each Loop container that is capable of iterating over a group of objects, taking action on those objects on every iterations. In addition, there’s a For Loop container that can evaluate an expression and conditionally perform repeated actions. More information about all of the available tasks is presented later in this chapter.
One of the big new features with SQL Server 2005 Integration Services is its ability to support error flows. The new Error Flows feature essentially enables you to add error handling to your Integration Services packages. With the new Error flows feature, when a Integration Services transform encounters rows that produce error conditions, instead of halting the process with an error, Integration Services can route the problem row according to the error flow that has been set up. For instance, the error flow may indicate that the row is simply written out to a log file, or it can also direct the package into a task that performs much more sophisticated error routines that can even manipulate the data and place the row back into the pipeline for reprocessing.
Similar to the way that SQL Server 2000 DTS worked, where DTS had both a wizard interface that was primarily designed to execute ad hoc data transfers and a DTS Designer that was used to build more complex DTS packages, SQL Server 2005’s Integration Services supports both an immediate mode and a project mode. To use Integration Services in the immediate mode, you can run the Integration Services Import/Export Wizard from the menu. The Integration Services Import/Export Wizard can be used to build, execute, and optionally save Integration Services packages that perform simple transfers. SQL Server 2005’s new Integration Services Import/Export Wizard looks and acts much like the version found in SQL Server 2000.
While immediate mode is useful for quick one-time data transfer operations, the project mode is useful for building more sophisticated Integration Services packages using the Business Intelligence Development Studio. The Business Intelligence Development Studio contains an all-new Integration Services Designer that supports an entirely new set of Data Flow, Control Flow, and Event handlers that can be used to build Integration Services packages. The new Integration Services Designer also provides full support for debugging Integration Services packages. You can see the new Integration Services Designer in more detail in the section “Integration Services Tools” later in this chapter.
Using a method much like the digital signing feature found that’s available for Microsoft .NET applications, Integration Services packages can now be signed as well. This enables you to verify the authenticity of a package when it is executed. Packages can be digitally signed during the design process using the Integration Services Designer. Once a package had been digitally signed, that package is read-only and can no longer be modified.
The DTP uses data adapters to connect data source and destination endpoints. As their names suggest, source data adapters connect to the source of the data and provide the input for Integration Services packages. Destination data adapters connect to the data target and output the data. SQL Server 2005’s provides a number of source and destination data adapters. Table 9-2 lists the set of built-in data adapters that are supplied with SQL Server 2005.
Data Adapter | Description |
---|---|
Flat File Destination Adapter | A file system adapter that writes text-delimited data to a file |
Flat File Source Adapter | A file system adapter that reads text-delimited data from a file |
OLE DB Destination Adapter | An OLE DB provider that writes data to an OLE consumer |
OLE DB Source Adapter | An OLE DB consumer that reads data from an OLE DB provider |
Raw File Destination Adapter | A file system adapter that writes data to a file |
Raw File Source Adapter | A file system adapter that reads data from a file |
SQL Server Destination Adapter | A SQL Server adapter that's used to write data to a table or view |
Web Service Source Adapter | A Web Service adapter that reads data from an XML web service |
Containers are a new construct that Microsoft has added to SQL Server 2005’s Integration Services. The primary purpose for Integration Services containers is to add structure and flow control to your Integration Services packages. Containers group together related tasks and are designed to be used to execute repeated tasks or to provide scope for variables. SQL Server 2005 Integration Services supports the types of containers shown in Table 9-3.
Container | Description |
---|---|
Package Container | A collection of package elements |
Foreach Loop Container | Provides iterative control flow in a package |
For Loop Container | Provides support for repeated actions in a package |
Sequence Container | Groups related tasks and containers in package |
TaskHost Container | Provides services to a task |
Container Properties | Holds values that are common to the container |
Container Collections | A collection of containers |
The Integration Services tasks are the most basic elements of the Integration Services package. The Integration Services task essentially defines an action that will be performed. These actions range from copying files, executing T-SQL statements, and running scripts, to performing FTP transfers and running data mining models. Multiple related tasks can be grouped into containers. Table 9-4 presents the tasks that are included in SQL Server 2005’s Integration Services.
Task | Description |
---|---|
ActiveX Script Task | Executes an ActiveX script that performs a specified action |
Analysis Services Execute DDL Task | Executes T-SQL DDL statements |
Analysis Services Processing Task | Processes Analysis Services objects |
Bulk Insert Task | Inserts data from a text file into a table |
Data Flow Task | Copies and transforms data between data sources |
Data Mining Query Task | Executes data mining queries |
Execute Package Task | Executes other packages |
Execute Process Task | Executes a program or script |
Execute SQL Task | Executes T-SQL statements |
File System Task | Executes actions on the file system |
File Transfer Protocol Task | Executes FTP data transfers |
Message Queue Task | Sends and receives messages from MSMQ data queues |
Script Task | Executes scripts written in VB.NET using the Microsoft Visual Studio for Applications (VSA) environment. |
Send Mail Task | Sends an e-mail message |
XML Task | Accesses data in XML documents |
Integration Services transformations control what happens to the data as it’s moved from the source data adapter to the destination data adapter. SQL Server 2005 supports both a number of built-in transformations and user-defined custom transformations. You can create custom transformations using the API provided by the DTP object model. SQL Server 2005’s Integration Services provides an extensive list of built-in standard transformations, as shown in Table 9-5.
Transformation | Description |
---|---|
Aggregate Transformation | Performs aggregations |
Allocation Transformation | Spreads the value of an input column across multiple output columns |
Character Map Transformation | Applies string functions to character data |
Conditional Split Transformation | Evaluates data and routes it to different outputs |
Copy/Map Transformation | Creates new output columns by copying input columns |
Data Mining Model Accuracy Transformation | Calculates the accuracy of data mining models |
Data Mining Model Training Transformation | Trains data mining models |
Data Mining Query Transformation | Runs data mining prediction queries |
Data Conversion Transformation | Converts the data type of an input column to a different output data type |
Derived Column Transformation | Creates an output column from the results of expressions |
Dimension Processing Transformation | Processes OLAP cube dimensions |
File Extractor Transformation | Reads data from a package's data flow and writes that data to file |
File Injector Transformation | Reads data from a file and adds that data to a package's data flow |
Fuzzy Grouping Transformation | Standardizes values in input column data |
Fuzzy Lookup Transformation | Looks up values in a reference table using fuzzy matching |
Logged Lineage Transformation | Provides environment information to the package's data flow |
Lookup Transformation | Looks up values in a reference table using exact matching |
Merge Transformation | Merges two sorted datasets |
Merge Join Transformation | Joins two datasets using a FULL, LEFT, or INNER join |
Multicast Transformation | Distributes input data to multiple outputs |
Partition Processing Transformation | Processes OLAP partitions |
Pivot Transformation | Pivots the input data according to an input column value |
Row Count Transformation | Counts the input rows and stores the count in a variable |
Sampling Transformation | Creates a representative sampling of the input dataset |
Script Transformation | Executes a script to transform the input |
Slowly Changing Dimension Transformation | Coordinates updating and insert rows into OLAP dimensions |
Sort Transformation | Sorts input data and copies the sorted data to the transformation output |
Surrogate Key Transformation | Provides additional custom properties to the Integration Services package |
Union All Transformation | Merges multiple datasets |
UnPivot Transformation | Unpivots input data according to an input column value |
The ability to raise and handle events is another new feature found SQL Server 2005 Integration Services. Event handling enables Integration Services packages to respond to events that are raised at run time by containers and tasks. Events can be fired by the Integration Services package elements to signal a number of different states, including error conditions, when a task has started, when a task completes, or a change in variable status. Table 9-6 lists the Integration Services Event Handlers.
Event Handler | Description |
---|---|
OnCustomerEvent | Raised on demand by a task or package |
OnError | Raised by a task or container on an error |
OnExecStatusChanged | Raised by a task or container when its execution status changes |
OnPostExecute | Raised by a task or container after it has run |
OnPostValidate | Raised by a task after it has been validated |
OnPreExecute | Raised by a task or container before it has run |
OnPreValidate | Raised by a task before it has been validated |
OnProgress | Raised by a task or container when a specified progress metric has been met |
OnQueryCancel | Raised by a task or container to determine if it [ok] should stop running |
OnTaskFailed | Raised by a task when it fails |
OnVariableValueChanged | Optionally raised by a variable when its value changes |
OnWarning | Raised by a task or container when a warning is generated |
Log providers are another completely new feature that Microsoft has added to Integration Services packages. As their name suggests, Integration Services log providers enable you to add logging to your packages, containers, and tasks. Logging is used to record error information or other important run-time or status information. Table 9-7 lists the log providers that Microsoft ships with SQL Server 2005.
Log Provider | Description |
---|---|
The Text File log provider | Writes log entries to ASCII text files using a comma-separated value (CSV) format using a default file extension of .log |
The SQL Profiler log provider | Writes SQL trace data to a trace file using a default file extension of .trc |
The SQL Server log provider | Writes log entries to the sysdtslog90 table in a SQL Server database |
The Windows Event log provider | Writes entries to the Windows Application |
The XML File log provider | Writes log files to an XML file using a default file extension of .xml |