The Data-Processing Tasks

The tasks in this group transform and/or move data in some way. The data-processing tasks are special and deserve their own grouping if for no other reason than to distinguish the Data Flow Task, which is arguably the most important feature in this release of Integration Services. Several chapters of this book focus exclusively on the Data Flow Task, so it is not covered in detail here. See Chapters 19 through 23 for more information about the Data Flow Task.

Bulk Insert Task

The Bulk Insert Task is provided as a way to quickly load large amounts of data from flat files into SQL Server. The Bulk Insert Task provides a simple way to insert data using the BULK INSERT T-SQL command. The SSIS Bulk Insert Task is provided as a backward-compatibility feature to ease migration into SSIS from DTS. It will likely be deprecated in coming versions.

No Longer Preferred

In DTS, using the Bulk Insert Task was the preferred way to do large bulk loads because it takes advantage of the performance gains using the T-SQL BULK INSERT command. With Integration Services, however, it is recommended to use the Data Flow Task. The Data Flow Task has numerous advantages over the Bulk Insert Task, which has rigid limitations on how it can consume and insert data. The Data Flow Task can bulk load data with comparable performance, especially if using the SQL Server Destination Adapter, while providing much more flexibility.

The SQL Server Destination Adapter and Bulk Insert

What's the connection? The SQL Server Destination Adapter uses the same interface into SQL Server as the Bulk Insert Task. So, although you get the same performance as you do with the Bulk Insert Task, with the Data Flow Task and the SQL Server Destination Adapter, you also get a whole lot more flexibility.

The following is a list of considerations to be aware of when using the Bulk Insert Task versus the Data Flow Task:

The Bulk Insert Task can only transfer data from text CSV or similarly formatted flat files, whereas the Data Flow Task can transfer data from many source systems.
Only members of the sysadmin or bulkadmin fixed server roles can run a package containing the Bulk Insert Task. Because the SQL Server Destination Adapter also uses the bulk insert API, its use is constrained by the same requirements. Other adapters also have similar security constraints but can output to multiple target systems.
The Bulk Insert Task can only insert data into SQL Server tables, whereas the Data Flow Task can output to multiple target systems.
For maximum control, you should use format files. (You can generate format files using the BCP utility.) The Data Flow Task does not use format files.
The Bulk Insert Task does not have metadata or format discovery features like the Data Flow Task.
The Bulk Insert Task has some limitations when importing empty double quoted fields, whereas the Data Flow Task does not.
Only views in which all columns refer to the same base table can be bulk loaded. The Data Flow Task has no such limitations.

The Bulk Insert Task Inside

The Bulk Insert Task is a wrapper task around the T-SQL Bulk Insert statement. The task generates the Bulk Insert statement from the properties set on the task and then sends it to the server where it will be executed. Therefore, all the limitations that apply to the Bulk Insert T-SQL statement also apply to the Bulk Insert Task.

Because the Data Flow Task replaces the features provided by the Bulk Insert Task, it is recommended to use the Data Flow Task for bulk-loading scenarios. The Data Flow Task is easier to set up, just as performant for straight bulk inserts, and more flexible.

XML Task

The XML Task provides a way to perform various operations on XML and XML documents. The XML Task can validate the XML against an XML Schema Definition (XSD) or Document Type Definition (DTD), apply an XSL transformation, retrieve element or attribute values with XPath, merge two different XML documents together, show differences between XML documents, and perform patch operations.

The XML Task is installed with the rest of Integration Services and does not have any special requirements or external dependencies to properly function.

Setting Up the XML Task

The sample package XML.dtsx in the S08-StockTasks solution performs an XSL transformation on a package you select or configure. It finds the executables, connection managers, and variables in the package and generates a very simple HTML report with a list of these package contents. The XSLT file itself is quite simple and should serve, with a little effort, as a starting point for anyone wanting to build an autodocumentation or metadata feature for their packages. The package shows how to set up the XML Task to perform an XSLT operation. If placed in a Foreach Loop with a ForeachFile Enumerator pointing to your package store, it would be possible to build reports for all of your packages with one package run.

Figure 8.28 shows the XML Task from the sample package with the OperationType "XSLT" selected. The XML Task uses three connection managers. The first is the file connection manager named PackageToReportOn. As the name implies, it points to the folder where the package is stored that will be processed with the XSLT. The second connection manager is the file connection manager named ReportHTMLFile, which points to the temporary location to store the resulting HTML report. The third connection manager is the file connection manager with the name XSLT File, which points to the location of the XSLT to apply to the package to generate the report.

Figure 8.28. The XML Task performs multiple XML functions

Caution

Make sure you've defined the SSISSAMPLESWORKING environment variable to point to a temporary folder for the resulting HTML file. The package uses property expressions to build the fully qualified file path.

Task-Specific Peculiarities or Nonstandard Settings

The key setting for the XML Task is the OperationType. In fact, the visible property set changes quite drastically depending on which operation type you select. The other settings that can be a little confusing are the SecondOperandType and SecondOperand. These two properties express the information needed for all operations. For example, for the Merge operation, the second operand is the file to be merged with the primary file. For the XSLT operation, the second operand is the XSLT file, and so on. Table 8.9 lists the second operands and their function.

Table 8.9. Second Operands and Their Function
Operation Type	Second Operand Function
Merge	Provides the XML file or text to be merged with the primary XML (specified by the Source property)
Diff	Provides the XML file or text to be compared with the primary XML
Patch	Provides the Diffgram document to be merged with the primary XML
XSLT	Provides the XSLT document to be used to transform the primary XML
XPath	Provides the XPath query to be used to retrieve nodes or text from the primary XML
Validate	Provides the DTD or XSD to be used to validate the primary XML

The XML Task is fairly flexible and can be used in many ways to process or retrieve information from XML files or text. The XML can be stored in a variable, in a file, or directly entered into the source property.

Tip

A common complaint about the XML Task is that it cannot "pipeline" operations or perform multiple serial operations on XML. But, this is not true. Actually, this is one of the reasons the XML Task supports storing the XML in a variable. To efficiently apply multiple operations to one XML document, you can simply output the XML text to a variable and then use the same variable as the input for the next XML Task.

Data Flow Task

The Data Flow Task supports one to many simultaneous and heterogeneous source connections to data from different source systems and media types. This data can be routed through one or more transformations executing in parallel out to zero to many different destination systems and media types. The Data Flow Task is the heart of Integration Services, where the bulk of the data processing, integration, and transformation happens.

The Data Flow Task is extremely important and several chapters are dedicated to describing it. Like the Script Task, this chapter included a Data Flow Task entry for completeness and consistency; however, the Data Flow Task is quite complex and too important to attempt coverage here. For detailed coverage of the Data Flow Task, see Chapters 18 through 23.