SSIS Transformations


Most transformations have one input, one regular output, and one error output. Some transformations have multiple inputs to consolidate data from separate pipelines, and some transformations have multiple outputs to separate data into separate pipelines. Not all transformations have error output.

The types of transformations in the pipeline affect the data flow engine's management of memory buffers. Row transformations, such as Data Conversion or Derived Column, are very efficient at reusing existing buffers. Multiple rows are received into the buffer from an upstream component in batches, data in the buffer is manipulated on a row-by-row basis, and then these rows in the buffer are operated on by the next component. This process continues until all rows from the source have been transformed.

Other transformation types must create new buffers to store output before allowing the next component access to the data and, consequently, place the greatest demands on system resources. Transformations operating on rowsets, such as Aggregate or Sort, must read all input before creating any output. Transformations that combine input, such as Merge or Union All, are somewhere in between the other transformation types, reading some input before creating output, but they do copy that output to a different buffer and thus have greater resource requirements. The following section describes the transformations that SSIS includes.

Row Transformations

The following transformations update column values and create new columns. The transformation is applied to each row in the transformation input.

  • Character Map This transformation maps characters in a string into different characters-for example, a lowercase-character mapping changes ASCII 64 (A) into ASCII 97 (a). It overwrites the column with the new string or adds a column to the output row.

    • Full-width <-> half-width characters

    • Hiragana <-> Katakana characters

    • Simplified Chinese <-> Traditional Chinese

    • Lowercase

    • Uppercase

    • Reverse byte order (order of the bytes in each character reversed)

    • Apply linguistic casing (simple case mapping for Turkic and other locales)

  • Copy Column This transformation creates new columns in the output row from existing columns in the input row for subsequent transformation.

  • Data Conversion This transformation applies data conversion functions to change data type, column length, or precision and scale to an input column and adds a column with the new data to the output row.

  • Derived Column This transformation applies expressions to an input column to create a new value for a new column in the output row or as a replacement value for the existing column.

  • Script Component This transformation runs custom code against data in the pipeline.

  • OLE DB Command This transformation runs a SQL statement (optionally parameterized) for each row in the pipeline.

Rowset Transformations

The following transformations create new rowsets. The rowset can include aggregate and sorted values, sample rowsets, or pivoted and unpivoted rowsets.

  • Aggregate This transformation applies an aggregate function (Sum, Minimum, Maximum, and so on) to grouped records and produces new output records from aggregated results. Output records contain only the columns used for grouping and the aggregated values.

  • Sort This transformation applies an ascending or descending sort to one or more columns and optionally removes rows with duplicate values in the sort columns.

  • Percentage Sampling This transformation creates a random sample set of output rows by selecting a specified percentage of input rows, commonly used for data mining.

  • Row Sampling This transformation creates a random sample set of output rows by selecting a specified number of input rows, primarily used for testing packages with a subset of representative data.

  • Pivot This transformation converts rows into columns. For example, suppose your input rows contain CustomerName, Product, and Quantity. If you pivot on Product, the resulting output will have CustomerName and one column for each product. The output product columns will contain the values that were in the input Quantity column.

  • Unpivot This transformation converts columns into rows. For example, suppose your input has a CustomerName column and a column for each product your company sells. The product columns contain the quantity that the customer purchased. If you unpivot on the product columns, your output row will have CustomerName, Product, and Quantity rows. The Product column will contain the names of the product columns in the input rows.

Split and Join Transformations

The following transformations distribute rows to different outputs, create copies of the transformation inputs, join multiple inputs into one output, and perform lookup operations.

  • Conditional Split This transformation separates input rows into separate output pipelines based on a Boolean expression configured for each output. A single input row is passed to the first output row for which the condition is true or to a default output defined for rows that meet no conditions.

  • Multicast This transformation copies all input rows to two or more outputs.

  • Merge In this transformation, input rows are interleaved. Merges combine two sorted datasets with the same column structure into a single output. The output will be sorted in the same order as the two inputs.

  • Merge Join In this transformation, input rows are combined side by side. Merge Join combines two sorted datasets, using a FULL, LEFT, or INNER join.

  • Union All In this transformation, input rows are stacked. Union All combines two or more datasets with the same column structure into a single output.

  • Lookup This transformation joins input rows with columns in a reference dataset (from a table, view, or SQL statement) to add one or more columns to the output row.

Data Quality Transformations

The following transformations perform data quality operations such as standardizing values and looking up values in a reference table.

  • Fuzzy Lookup This transformation finds close or exact matches between two or more columns (DT_WSTR and DT_STR data types only) in the input row and a row in a reference table and adds selected columns from the matched row and columns for fuzzy matching metrics.

  • Fuzzy Grouping This transformation finds close or exact matches between input rows based on one or more columns and adds columns to the output, identifying matches and similarity scores.

Data-Mining Transformations

The following transformations perform data-mining operations:

  • Data-Mining Query Uses data from the input rows to execute prediction queries on a SQL Server Analysis Services data-mining model and creates output rows from the query result set.

  • Term Extraction Extracts nouns or noun phrases (or both) from an input column containing English text, places the extracted terms in a column in the output row, and adds another column for the score. Multiple output rows per input row are created when multiple terms are extracted from an input column.

  • Term Lookup Matches text in an input column with a reference table, counts the number of occurrences of the term in the dataset, and adds a term and frequency column to the output row. As with Term Extraction, multiple output rows per input row can be created.

Other Transformations

SSIS includes the following transformations to export and import data, add audit information, count rows, and work slowly with changing dimensions.

  • Export Column This transformation reads a file name from an input column and inserts data from another column in the same row into the specified file.

  • Import Column This transformation reads a file name from an input column and inserts data from the specified file into a new output column for the same row.

  • Audit This transformation adds the value of a system variable, such as MachineName or ExecutionInstanceGUID, to a new output column.

  • Row Count This transformation counts the number of rows currently in the data flow and stores the value in a variable.

  • Slowly Changing Dimension This transformation manages inserts and updates in a data warehouse for Type 1 and Type 2 changes.

Synchronous and Asynchronous Transformations

The following transformations determine how rows are processed.

  • Synchronous transformations These transformations process data row by row. They manipulate data that already exists in a memory buffer. Transformations that use data already contained in the row are the fastest (for instance, Derived Column). Transformations that process data row by row but must interact with an external data source will be slower (Lookup, Object Linking and Embedding (OLE) DB Command).

  • Asynchronous transformations These transformations cause SSIS to copy and manipulate data from one memory buffer to another. Asynchronous transformations potentially have more or fewer output rows than input rows.

  • Partially blocking transformations These transformations join datasets (Merge) or create a dataset that is dissimilar to the source dataset. (Pivot output has different columns from its input.) Data flows through these transformations more slowly than it flows through non-blocking transformations.

  • Blocking transformations These transformations cannot send a row on to the next transformation until all of the input rows have been read.




Microsoft SQL Server 2005 Integration Services Step by Step
MicrosoftВ® SQL Server(TM) 2005 Integration Services Step by Step
ISBN: 0735624054
EAN: 2147483647
Year: 2007
Pages: 152

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net