Creating a Transformation


SSIS doesn't provide a lot of functionality in the box for dealing with BLOBs in the data flow. Yet, I've got some emails that are huge because they're either forum threads or loaded with Hypertext Markup Language (HTML) or from newsgroup posts. What I'd like to do is shorten the email body so that I only get the first few lines, which should be enough to capture the essence of the body. It would be nice to be able to change the length of the body retained based on a value in a variable. It would also be nice to replace any empty column values with a string value from a variable so, for example, instead of being completely empty, the CC field might contain the string "<Empty>. This is much easier to deal with than a NULL. Having the replacement value in a variable makes it easy to change later as well.

Note

You can also replace NULL values using a Derived Column transform, but then that wouldn't be as much fun.


Creating the Component

Creating the transform is similar to creating the destination adapter. You have to select the input columns, and so on, but you also need to create outputs like the source. Finally, there are two distinct types of transforms as discussed previouslysynchronous and asynchronous. First, you'll create a synchronous transform and then you'll learn how to create an asynchronous transform.

Setting Up the Input Columns

Start by creating the input columns:

1.

Drop another Script Component onto the designer and select the Transform option.

2.

Delete the path from the source to the destination you just created.

3.

Create a new path from the source to the transform.

4.

Create a new path from the transform to the destination.

5.

Open the Script Transformation Editor.

6.

Select the Input Columns node in the left pane.

7.

Select the check boxes in all the available input columns.

8.

In the grid below, set the usage type for the MessageBody column to ReadWrite. Best practices dictate that you should only set a column to be ReadWrite if you plan to modify it in the transform. Figure 21.4 shows the correct settings.

Figure 21.4. Setting up the transform input columns


Setting the Synchronous Input ID

You need to tell the Script Component that it is a synchronous transform. The way you do this is by explicitly linking the output with the input. As mentioned earlier, all objects in the data flow have IDs, including inputs. So, the way you link the output with the input is to specify in the SynchronousInputID property of the output the ID of the associated input. The following steps show you how to do this:

1.

Double-click on the Script Component.

2.

Select the Inputs and Outputs node.

3.

Select the Input node.

4.

In the property grid, look at the ID property. In the sample project, the ID is 517.

5.

Select the Output node.

6.

In the property grid, set the SynchronousInputID to the value you found for the input ID. Figure 21.5 shows the SynchronousInputID settings.

Figure 21.5. Setting up the SynchronousInputID


Setting Up the Variables

Before adding the script, you need to create a couple of variables the Script transform will use as parameters. The first variable is the maximum body length. It's an integer. The second is the default string value you'll use to replace the empty string, which is a string type.

Tip

In the sample, these variables are created at data flow scope because they really have no use elsewhere in the package. Also, if the data flow ever needs to be copied and pasted to another package, the variables will go with the data flow, leaving it valid without the need for creating variables again. It's always a good idea to create variables at the most local scope possible. This helps avoid variable confusion and makes it easier to manage and update packages.


To make the variables accessible within the script, you need to add them to one of the reserve lists. Because you don't need to change their values, you can place their names in the ReadOnlyVariables property.

Tip

If you do need to change their values, you should keep two things in mind. You must add the variable name to the ReadWriteVariables property and you can only change the value in the PostExecute method. Even better, you can use the VariableDispenser object to lock the variables directly in the code. This approach is more work, but has the virtue of only locking the variable for as long as it is absolutely needed.


Figure 21.6 shows the Script Transformation Editor with the ReadOnlyVariables correctly set.

Figure 21.6. Adding the variables to the read list


Adding the Script

Next, you need to set up the script. Because this is a synchronous transform, you only need to modify the rows that are passed to the ProcessInputRow method. You don't need to add new rows. Essentially, the Script Component passes each row in succession, allowing the script to modify it in place. Because the Script Component already creates the template ProcessInputRow function for you, you need only add the processing code.

The section of code that reads the message body and truncates it is as follows:

If Row.MessageBody.Length > Me.Variables.BODYMAXLEN Then        Dim dBytes As Byte() = Row.MessageBody.GetBlobData(0, Me.Variables.BODYMAXLEN)        Row.MessageBody.AddBlobData(dBytes) End If 


The code tests if the length of the message body is greater than the maximum body length. If it is, it retrieves the message body BLOB, but only the maximum body length bytes. Then, it overwrites the message body BLOB with the shorter value.

The following code, repeated for each string column, checks if the given column is empty. If it is, it replaces it with the string in the DEFAULTSTRINGVALUE variable.

If Row.Sender Is Nothing Then    Row.Sender = Me.Variables.DEFAULTSTRINGVALUE End If 




Microsoft SQL Server 2005 Integration Services
Microsoft SQL Server 2005 Integration Services
ISBN: 0672327813
EAN: 2147483647
Year: 2006
Pages: 200
Authors: Kirk Haselden

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net