The tasks in this group transform and/or move data in some way. The data-processing tasks are special and deserve their own grouping, if for no other reason than to distinguish the Data Flow Task, which is arguably the most important feature in this release of Integration Services. Several chapters of this book focus exclusively on the Data Flow Task, so it is not covered in detail here. See Chapters 19 through 23 for more information about the Data Flow Task.

Bulk Insert Task

The Bulk Insert Task provides a way to quickly load large amounts of data from flat files into SQL Server, using the BULK INSERT T-SQL command. The task is provided primarily as a backward-compatibility feature to ease migration from DTS into SSIS, and it will likely be deprecated in coming versions.

No Longer Preferred

In DTS, the Bulk Insert Task was the preferred way to perform large bulk loads because it takes advantage of the performance of the T-SQL BULK INSERT command. In Integration Services, however, the Data Flow Task is recommended instead. The Data Flow Task has numerous advantages over the Bulk Insert Task, which has rigid limitations on how it can consume and insert data. The Data Flow Task can bulk load data with comparable performance, especially when using the SQL Server Destination Adapter, while providing much more flexibility.

The SQL Server Destination Adapter and Bulk Insert

What's the connection? The SQL Server Destination Adapter uses the same interface into SQL Server as the Bulk Insert Task. So, although you get the same performance as you do with the Bulk Insert Task, the Data Flow Task with the SQL Server Destination Adapter also gives you much more flexibility. The following is a list of considerations to be aware of when using the Bulk Insert Task versus the Data Flow Task:
Because the Data Flow Task replaces the features provided by the Bulk Insert Task, it is recommended to use the Data Flow Task for bulk-loading scenarios. The Data Flow Task is easier to set up, just as performant for straight bulk inserts, and more flexible.
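To make the mechanism concrete, the following is a minimal sketch of the kind of BULK INSERT statement the Bulk Insert Task issues against SQL Server. The table name, file path, and delimiters are hypothetical examples, not values from the sample packages; the statement is built as a string here (shown via a small Python helper) rather than executed against a server.

```python
# Sketch of the kind of T-SQL the Bulk Insert Task issues under the covers.
# Table name, file path, and terminators are hypothetical examples.
def build_bulk_insert(table, data_file, field_term=",", row_term="\\n"):
    """Build a BULK INSERT statement for a delimited flat file."""
    return (
        f"BULK INSERT {table} "
        f"FROM '{data_file}' "
        f"WITH (FIELDTERMINATOR = '{field_term}', "
        f"ROWTERMINATOR = '{row_term}', TABLOCK)"
    )

stmt = build_bulk_insert("dbo.StockPrices", r"C:\Samples\StockData.csv")
print(stmt)
```

Note that this path bypasses row-by-row processing entirely, which is why it is fast but also why it is rigid: the flat file must already match the destination table's shape, with no opportunity to transform rows in flight.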
XML Task

The XML Task provides a way to perform various operations on XML documents. The XML Task can validate XML against an XML Schema Definition (XSD) or Document Type Definition (DTD), apply an XSL transformation, retrieve element or attribute values with XPath, merge two XML documents, show differences between XML documents, and perform patch operations. The XML Task is installed with the rest of Integration Services and has no special requirements or external dependencies.

Setting Up the XML Task

The sample package XML.dtsx in the S08-StockTasks solution performs an XSL transformation on a package you select or configure. It finds the executables, connection managers, and variables in the package and generates a very simple HTML report listing these package contents. The XSLT file itself is quite simple and should serve, with a little effort, as a starting point for anyone wanting to build an autodocumentation or metadata feature for their packages. The package shows how to set up the XML Task to perform an XSLT operation. If placed in a Foreach Loop with a Foreach File Enumerator pointing to your package store, it could build reports for all of your packages in a single package run.

Figure 8.28 shows the XML Task from the sample package with the OperationType "XSLT" selected. The XML Task uses three connection managers. The first is the file connection manager named PackageToReportOn; as the name implies, it points to the folder where the package to be processed with the XSLT is stored. The second is the file connection manager named ReportHTMLFile, which points to the temporary location where the resulting HTML report is stored. The third is the file connection manager named XSLT File, which points to the location of the XSLT to apply to the package to generate the report.

Figure 8.28. The XML Task performs multiple XML functions

Caution: Make sure you've defined the SSISSAMPLESWORKING environment variable to point to a temporary folder for the resulting HTML file. The package uses property expressions to build the fully qualified file path.

Task-Specific Peculiarities or Nonstandard Settings

The key setting for the XML Task is the OperationType. In fact, the visible property set changes quite drastically depending on which operation type you select. The other settings that can be a little confusing are the SecondOperandType and SecondOperand. These two properties express the additional information each operation needs. For example, for the Merge operation, the second operand is the file to be merged with the primary file; for the XSLT operation, the second operand is the XSLT file; and so on. Table 8.9 lists the second operands and their functions.
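To illustrate what the sample report does, the following sketch performs the equivalent extraction in Python rather than XSLT: it pulls the executables, connection managers, and variables out of a package document and emits a simple HTML listing. The XML below is a simplified, hypothetical stand-in for a real .dtsx file (which uses the DTS namespace and different element names), so treat this as a sketch of the idea, not the actual package format.

```python
# Emulates the sample XSLT: list a package's executables, connection
# managers, and variables as a simple HTML report.
# The XML structure below is a simplified, hypothetical stand-in for .dtsx.
import xml.etree.ElementTree as ET

PACKAGE_XML = """
<Package Name="XMLSample">
  <Executables>
    <Executable Name="XML Task"/>
    <Executable Name="Data Flow Task"/>
  </Executables>
  <ConnectionManagers>
    <ConnectionManager Name="PackageToReportOn"/>
    <ConnectionManager Name="ReportHTMLFile"/>
    <ConnectionManager Name="XSLT File"/>
  </ConnectionManagers>
  <Variables>
    <Variable Name="ReportPath"/>
  </Variables>
</Package>
"""

def package_report(xml_text):
    """Render an HTML report of a package's contents."""
    root = ET.fromstring(xml_text)
    html = [f"<html><body><h1>{root.get('Name')}</h1>"]
    for section, tag in [("Executables", "Executable"),
                         ("Connection Managers", "ConnectionManager"),
                         ("Variables", "Variable")]:
        html.append(f"<h2>{section}</h2><ul>")
        # iter(tag) finds matching elements at any depth, like an XPath //tag
        html += [f"<li>{el.get('Name')}</li>" for el in root.iter(tag)]
        html.append("</ul>")
    html.append("</body></html>")
    return "".join(html)

print(package_report(PACKAGE_XML))
```

Run inside a loop over a folder of package files, the same extraction logic would produce one report per package, mirroring the Foreach Loop arrangement described above.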
The XML Task is fairly flexible and can be used in many ways to process or retrieve information from XML files or text. The XML can be stored in a variable, stored in a file, or entered directly into the source property.

Tip: A common complaint about the XML Task is that it cannot "pipeline" operations or perform multiple serial operations on XML. This is not true; in fact, this is one of the reasons the XML Task supports storing XML in a variable. To efficiently apply multiple operations to one XML document, simply output the XML text to a variable and then use the same variable as the input for the next XML Task.

Data Flow Task

The Data Flow Task supports one or more simultaneous, heterogeneous source connections to data from different source systems and media types. This data can be routed through one or more transformations executing in parallel, and out to zero or more destination systems and media types. The Data Flow Task is the heart of Integration Services, where the bulk of the data processing, integration, and transformation happens. The Data Flow Task is extremely important, and several chapters are dedicated to describing it. Like the Script Task, it is included in this chapter for completeness and consistency; however, the Data Flow Task is quite complex and too important to attempt coverage here. For detailed coverage of the Data Flow Task, see Chapters 18 through 23.
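As a closing illustration of the variable-chaining tip above, the following sketch chains two XML operations the way two XML Tasks would: the first operation writes its result into a variable, and the second operation reads that same variable as its input. The documents and element names are hypothetical, and the Merge and XPath operations are approximated with Python's standard library rather than the task's own engine.

```python
# The "pipeline via a variable" pattern: operation 1 writes its XML result
# to a variable; operation 2 reads that variable as its input, just as two
# chained XML Tasks would. Documents and element names are hypothetical.
import xml.etree.ElementTree as ET

def merge(primary_xml, second_xml):
    """Operation 1: merge the second document under the primary's root."""
    root = ET.fromstring(primary_xml)
    root.append(ET.fromstring(second_xml))
    return ET.tostring(root, encoding="unicode")

def xpath_value(xml_text, path):
    """Operation 2: retrieve an element value, as the XPath operation does."""
    return ET.fromstring(xml_text).findtext(path)

# The merge result lands in a "variable" that feeds the next operation.
merged_variable = merge("<Stocks><Stock>ACME</Stock></Stocks>",
                        "<Stock>Contoso</Stock>")
second_value = xpath_value(merged_variable, "Stock[2]")
print(second_value)  # the element added by the merge step
```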