The multiphase data pump is the engine that handles the movement and transformation of data for the Transform Data and Data Driven Query tasks . When you modified the default transformation in the Transform Data task in the UpdateCustomerDim package to call the CountryCodes lookup query and when you modified the default transformation in the DDQ task in the UpdateProductDim package to call either the insert or update query, you were actually accessing the row transform phase of the multiphase data pump. The row transform phase is the only phase exposed by default in DTS Designer. In this chapter you will learn how to expose the additional phases and learn a little bit about each phase. In later chapters, you will extend the data movement application by using ActiveX transformation scripts to access the data pump at different phases.
The multiphase data pump actually has six phases that are entry points for customization of a data movement application. The data pump has three phases that allow you to interact with the data, one of which is the row transform phase. You can also access the data after each batch completes and again after all batches have completed. In addition to these phases that allow you to manipulate the data, there is one phase that allows you to access the data pump before any source rows are fetched and two phases that allow you to access the data pump after the data has been processed (but before the next task starts). These phases can be accessed by ActiveX scripts and custom COM objects. The following list describes each of these phases.
Pre-Source phase This phase occurs before the first row of data is fetched from the data source and is executed only once unless you use an ActiveX script to create a loop. You can write a pre-source data pump function for a transformation that writes header rows containing meta data to a table or a file; truncates tables; drops indexes; or initializes objects, connections, and memory for use in later phases of the data pump. This phase does not allow access to the source data but does allow access to the data destination. You actually accessed this phase in Chapter 3 when you submitted global variable values to the PopulateTimeDimension package using the DTSRun command. You will continue to explore accessing this phase in Chapter 6 and Chapter 7.
Row Transform phase This phase is the default data pump phase and occurs once for each row being transformed. All transformations use this phase to move data between the source and destination. This phase is available through the Transformations mapping tab of the Transform Data task or the Data Driven Query task. This phase is also available through an ActiveX Scripting task or a custom task. In an ActiveX scripting task, the Main function placeholder in the transformation script is the default entry point for writing a transformation script for this phase. You accessed this phase using an ActiveX transformation script when you modified the default transformation script to call the lookup query in the Transform Data task and the insert or update query in the DDQ task. This phase allows read access to the source data and meta data and write access to the destination data.
Post Row Transform phase This phase occurs once for each row being transformed, regardless of the success or failure of the row transformation. This phase has three subphases:
Transform Failure subphase This subphase occurs whenever an error is detected in the Row Transform phase for a particular row, indicated by the return of DTSTransformStat_Error or DTSTransformStat_ExceptionRow . These errors are typically caused by data conversion errors. You can write a function that handles transformation errors (such as type mismatches ). You can override the value that generated the error and continue with the execution, or you can abort the execution and report a failure. You can also call this subphase and insert code that tests for failure conditions that DTS does not automatically detect at this phase. Failure conditions include null conditions in the source data that will violate a null constraint at the data destination. If you do not write a function to handle a transform failure, execution will continue and DTS will attempt to insert the row containing the transformation error.
Insert Success subphase This subphase occurs when the current row is successfully inserted into the destination rowset. The data is not actually inserted into the data destination at this point. You can write a function to keep count of all rows that were successfully transformed and added to the destination rowset.
Insert Failure subphase This subphase occurs when the current row cannot be inserted into the destination rowset. You can write a function to keep a count of all rows that were not successfully transformed.
On Batch Complete phase This phase occurs once for each batch. By default, all rows are processed in one batch. You can define a batch size for the Transform Data task on the Options tab. You must set a batch size for the DDQ and other tasks programmatically. When this phase occurs, all rows in the destination rowset buffer are written to the destination table. You can write a function programmatically to audit the state of large, multi-batch load jobs. If an error occurs during the writing of this rowset, the number of rows written to the destination table depends upon whether the package is configured to use transactions. Transactions are discussed in Chapter 5.
Post Source Data phase This phase occurs after all batches have been processed (but before the final batch is committed). During this phase, you have full access to the data. For example, you could write a function that adds a footer row to the data destination.
On Pump Complete phase This phase occurs after all other phases are complete. During this phase, you do not have access to the destination data. During this phase, you can free up resources (such as closing a global ADO connection) and commit data held in global variables throughout the lifetime of the data pump (such as a value for total rows processed).
Figure 4-1 shows the data pump phases and how they map to the data flow.
By exposing these entry points to the multiphase data pump, DTS enables you to add great flexibility to the data movement application. You will learn to access these functions throughout the remainder of this book. However, because these entry points are not exposed by default, you must activate the data pump feature in SQL Server Enterprise Manager before you can access them directly in DTS Designer.
In this procedure, you will activate the multiphase data pump feature in DTS Designer and then review the data pump phases as they appear in DTS Designer.
In the SQL Server Enterprise Manager console tree, right-click Data Transformation Services and then click Properties.
Select the Show Multi-Phase Pump In DTS Designer check box. You could also select the Turn On Cache check box and clear the Enable Save To Meta Data Services check box at this point if you so desire .
In the Package Properties dialog box, you can also turn on caching, which can reduce the time required to load large packages in DTS Designer. When caching is not turned on, DTS Designer queries the Windows registry each time a package is opened to determine what scripting languages, custom transformations, OLE-DB Providers, and custom tasks are registered on the computer. If you turn on caching, you must click Refresh to update the cache after changing one of these objects.
In the SQL Server Enterprise Manager console tree, right-click Data Transformation Services and then click Open Package.
Navigate to C:\Microsoft Press\SQL DTS SBS\Ch1\WorkingFolder in the Look In list, and then double-click SQL_DTS_SBS_1.1.dts.
Double-click SQL_DTS_SBS_1.1 in the Select Package dialog box, type mypassword in the Password box, and then click OK.
Double-click the Transform Data step and then click the Transformations tab.
The phases filter list allows you to view, edit, or create transformations at different phases. The only transformation in this package is the DirectCopyXform transformation in the row transform phase.
Click Delete All, click Select All, and then click New.
Click ActiveX Script, click OK, and then click Properties.
Click the Phases tab.
Notice that only the Row Transform phase is selected by default. In this procedure, you will select all the phases. Then you will generate default code to demonstrate where to add custom code to access each phase with an ActiveX transformation script.
Select all the check boxes on the Phases tab, and then click Auto Gen. to add a function placeholder, and the DTSTransformstat_OK constant as an exit value, for each phase.
DTSTransformstat_OK indicates to the DTS task that it can continue with the next task. You can add ActiveX script within the function blocks of any phase of the multiphase data pump to enhance the functionality of your data movement application.
Click Cancel, click No, click Cancel again, click Cancel once more, and then close the SQL_DTS_SBS_1.1 package in DTS Designer.
You have now activated the multiphase data pump feature in DTS Designer. You will use the functionality exposed in this exercise throughout the remainder of this book.