Writing a Transform


Transforms are last because in many ways they are no more than source and destination components bound together into one component. This is especially true of transforms with asynchronous outputs. Think of the input of a transform as a destination adapter and the output as a source adapter and you'll understand. So, what is left to talk about? Although sources and destinations only convert data to and from various media, they don't actually transform it. What transforms do that sources and destinations don't is actually modify the data. That's what this section focuses on. How transforms perform the actual work they do. The Data Profiler sample is an excellent example because it is simple, easy to understand, and built to be extended. There are currently only a few profiling options, but by simply adding a new profile type class, you can add new profiling operations to the transform.

Types of Transforms

You will remember that there are two general types of transforms: those with synchronous outputs and those with asynchronous outputs. Transforms with synchronous outputs do not handle the PrimeOutput method and it will never get called even if they do. Sync outputs with nonzero exclusion groups get their rows as directed by the DirectRow method called by the transform. Synchronous outputs with exclusion group of zero automatically receive all rows from their synchronous input. Implementing ProcessInput for transforms with synchronous outputs is quite simple and virtually identical between the two types of transforms. Because transforms with synchronous outputs do not have any behavior or interesting challenges beyond those found in transforms with asynchronous outputs, this section only describes transforms with asynchronous outputs and specifically focuses on the DataProfiler transform.

Setting Up the Transform

The process for setting up a transform is identical to any of the other components with the exceptions previously noted.

Setting Up the Attributes

The following are the attributes for the DataProfiler transform. Notice that the ComponentType parameter is set to ComponentType.Transform:

[DtsPipelineComponent(     DisplayName = "Data Profiler",     Description = "Computes profile metrics for input data",     IconResource = "SSIS2K5.Samples.Transforms.DataProfiler.Component.ico",     ComponentType = ComponentType.Transform )] 


The ProvideComponentProperties Method

Because the set of output columns for the DataProfiler is fixed, the ProvideComponentProperties method statically defines the output columns and there is little need for the ReinitializeMetaData method. No matter what the shape of the input columns, the output columns and their types will always be fixed. The following code is repeated for each output column and adds the Column, Type, ProfileType, and ProfileValue columns to the output.

IDTSOutputColumn90 outputColumn = outputColumnCollection.New(); outputColumn.Name = "Column"; outputColumn.SetDataTypeProperties(DataType.DT_WSTR, 50, 0, 0, 0); 


The PreExecute Method

In the PreExecute method, the component examines each of the columns, creates column profilers for each column, and determines which of the column profilers can profile the column. Note that it is easy to add a new profile type by simply creating a new profile class.

foreach(IDTSInputColumn90 inputColumn in inputColumnCollection) {     // For each column profiler...     for (int j = 0; j < columnProfilerCount; j++)     {        ColumnProfilerBase columnProfiler = null;     // Create column profiler object of the desired type     switch(j)     {        case 0:                   columnProfiler = new ColumnProfilerMin();           break;           case 1:              columnProfiler = new ColumnProfilerMax();           break;           case 2:              columnProfiler = new ColumnProfilerAvgStrLen();           break;           case 3:              columnProfiler = new ColumnProfilerPercentageNulls();           break;       }       // Give column profiler the information about the input column       columnProfiler.ColumnIndex = BufferManager.FindColumnByLineageID(           input.Buffer,           inputColumn.LineageID);           columnProfiler.ColumnName = inputColumn.Name;           // And tell the details of that column (type, length, etc)           BufferManager.GetColumnInfo(              input.Buffer,              columnProfiler.ColumnIndex,              ref columnProfiler.ColumnInfo);              // The column profiler might not be able to profile this column,              // so ask beforehand.              if (!columnProfiler.CanProfile())              {                  // If we can't profile this column, null out the column profiler                  // object and our ProcessInput method will ignore this element.                  columnProfiler = null;              }              else              {                  columnProfiler.Prepare();              }              // Save the new object in the array of column profilers              columnProfilers.Add(columnProfiler);     } } 


The PrimeOutput Method The

PrimeOutput method is about as simple as it gets. It stores the output buffer reference for use in the ProcessInput method. The Data Flow Task guarantees that PrimeOutput will be called before ProcessInput.

Tip

Remember also, that this is different from synchronous outputs on source adapters that do not return from the PrimeOutput method until the source has completed.


public override void PrimeOutput(int outputs, int[] outputIDs, PipelineBuffer[] buffers) {     outputBuffer = buffers[0]; } 


The ProcessInput Method

The ProcessInput method loops through each of the rows performing the profile operation. When the EndOfRowset is reached, each of the profiling classes is asked for their final profile values and the output buffer receives a number of rows based on the number of columns multiplied times the number of profile types. At last, the code calls SetEndOfRowset to indicate that the transform is done.

 public override void ProcessInput(int inputID, PipelineBuffer buffer) {     // Check if this is all the data     if (!buffer.EndOfRowset)     {        // While there are rows, profile columns.        while (buffer.NextRow())        {            // Iterate through the profilers and apply the profiles            for (int i = 0; i < columnProfilers.Count; i++)            {               ColumnProfilerBase columnProfiler =               (ColumnProfilerBase)columnProfilers[i];               if (columnProfiler != null)               {                   columnProfiler.Profile(buffer);               }            }        }     }     else     {         // When we reach end of rowset, output final profile values.        for (int i = 0; i < columnProfilers.Count; i++)        {            ColumnProfilerBase columnProfiler =            (ColumnProfilerBase) columnProfilers[i];            if (columnProfiler != null)            {               outputBuffer.AddRow();               outputBuffer[columnIndex] = columnProfiler.ColumnName;               outputBuffer[typeIndex] = columnProfiler.ColumnInfo.DataType.ToString();               outputBuffer[profileTypeIndex] = columnProfiler.GetProfileType();               outputBuffer[profileValueIndex] = columnProfiler.GetProfileValue();            }        }        outputBuffer.SetEndOfRowset();        outputBuffer = null;     } } 


Performing the Transformation

The actual work of transforming the data is done by each of the profiler classes. The following code is the definition for the average string length profile.

namespace SSIS2K5.Samples.Transforms.DataProfiler {        class ColumnProfilerAvgStrLen : SSIS2K5.Samples.Transforms.DataProfiler.     ColumnProfilerBase      {           private double rows = 0;           private double length = 0;           public override string GetProfileType() { return "AVGSTRLEN"; } 


Only profile string type columns.

          public override bool CanProfile()           {               return ColumnInfo.DataType == DataType.DT_WSTR                       || ColumnInfo.DataType == DataType.DT_STR;           } 


The profile method is called for every row and simply stores the number of rows and the length.

public override void Profile(PipelineBuffer buffer) {      rows++;      object o = buffer[ColumnIndex];      if (o != null) length += ((string) o).Length; } 


When the last row is processed, the resulting average row length is the length of all rows divided by the number of rows.

          public override string GetProfileValue() { return Math.Ceiling(length/rows) ToString(); }      }; } 




Microsoft SQL Server 2005 Integration Services
Microsoft SQL Server 2005 Integration Services
ISBN: 0672327813
EAN: 2147483647
Year: 2006
Pages: 200
Authors: Kirk Haselden

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net