Data Blocking


Some transformations need partial or complete sets of data to perform their functions. When data in the pipeline is held up by a transformation operation, it blocks the flow of data to subsequent components downstream. As you know, transformation components that depend on each other are synchronous while those that have no direct dependency are asynchronous. Depending on whether the first transformation must process all or part of the data, to pass the data to the next transformation, it will be either a blocking or partially blocking transformation.

Blocking Transformations

Transformations that require all input rows to be read before they can process the data are blocking transformations. These include:

  • Aggregation.

  • Sort.

The aggregation transformation must read all records to perform a mathematical algorithm and yield a single calculated value. The sort transformation must read records into a separate buffer in the sorted order, which cannot be completed until the last read has been added to the sort stack. Only after these operations have completed will the results flow down the data pipeline to the next transformations or data sources.

Partially Blocking Transformations

Partially blocking transformations are often used to combine data sets. They tend to have multiple data inputs with a resulting output that might have a greater number, a smaller number, or the same number of records than the total number of input records. The transformation(s) on the receiving end of a partially blocking transaction can work with subsets of the total input stream. Because the number of input records will likely not match the number of output records, these transformations are also called asynchronous transformations. Examples of partially blocking transformation components available in SSIS include the following:

  • Merge

  • Merge Join

  • Pivot

  • Unpivot

  • Union All

  • Term Lookup

  • Data Mining Query

With partially blocking transformations, the output of the transformation is copied into a new buffer, and a new thread is introduced into the data flow.

Row Transformations

This category of transformations processes data one row at a time. Row transformations are typically considered to be partially blocking because rows are available to downstream components as soon as they flow from the transformation component.

Row transformations include the following:

  • Derived Column

  • Data Conversion

  • Copy Column

  • Character Map

  • OLE DB Command

  • Script Command

These types of transformations are called row transformations because they process data in the pipeline one row at a time. These are also known as synchronous transformations because rows flow to and from the pipeline without completely blocking downstream components. A row transformation also reuses buffers rather than requiring data to be copied to new buffers that would require more memory and processing. It can add or modify columns but will not add additional rows to the pipeline.




Microsoft SQL Server 2005 Integration Services Step by Step
MicrosoftВ® SQL Server(TM) 2005 Integration Services Step by Step
ISBN: 0735624054
EAN: 2147483647
Year: 2007
Pages: 152

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net