Implementing the Script Component


Let's assume that you are going to import customer data from a CSV file into a DimCustomer table in the is2005sbsDW database. While you're importing the data, you need to validate each customer's e-mail address and store the validation result in an additional column in the DimCustomer table. The Script transformation component is often useful for validating data in Data Flow.

Reviewing a Sample Project

First, let's look through the sample project you will work with in the following procedure.

Start BIDS and open the solution file saved under C:\Documents and Settings\<username>\My Documents\Microsoft Press\is2005SbS\Chap06_02-InitialCode \Chapter06_02.sln. In Solution Explorer, double-click image from book Package.dtsx. Your screen should like this:

image from book

The main stream of the package is quite simple: Truncate the destination table and import the CSV file. Executing SQL Task - Clear DimCustomer truncates the DimCustomer table before the CSV file is loaded into DimCustomer. Data Flow - Import Customer CSV is a Data Flow task to import a CSV file into a DimCustomer table.

Double-click the Data Flow - Import Customer CSV to see the inside of the task. Your screen will be switched to the Data Flow tab like this:

image from book

Flat File Source - Customer CSV is mapped to a CSV file that holds source customer data. The file name and the schema of the CSV file are defined in the Customer CSV File connection, which you can see in the connection managers. OLE DB Destination - DimCustomer is mapped to the DimCustomer table in the is2005sbsDW database.

Before executing the package, you need to copy a source CSV file into the root folder of the drive C. Start Windows Explorer, copy image from book Customers.csv, saved under the C:\Documents and Settings\<username>\My Documents\Microsoft Press\is2005SbS\Chap06_02-InitialCode \Data folder, to the root folder of the drive C. Open the CSV file in Microsoft Office Excel and review the customer data in the file.

Go back to BIDS. Press F5 to execute the package. All the items on the Control Flow tab and Data Flow tab turn green, and you should be able to see the row count in the Data Flow tab.

Before you modify the package, you need to add a column to the DimCustomer table to hold the validation result. The new column is called IsValidEmail, and the data type is Boolean.

Implement Validation Using the Transformation Script Component
  1. Start SSMS and connect to localhost.

  2. In Object Explorer, select the is2005sbsDW database in the Databases folder and click the New Query button on the Standard toolbar.

  3. In the Query Editor, type the following SQL code:

     ALTER TABLE is2005sbsDW.dbo.DimCustomer ADD IsValidEmail bit NULL DEFAULT 0 

  4. Press F5 to execute the query. Ensure that the Command(s) Completed Successfully message is displayed on the Messages tab.

    Now you are ready to modify the package. You will add a new Script component in the Data Flow - Import Customer CSV and implement validation code in the new component.

  5. In BIDS, click the Data Flow tab.

  6. In the Toolbox, expand the Data Flow Transformations group, and then drag Script Component to the design surface of the Data Flow tab. The following dialog box appears on the screen:

    image from book

    In this dialog box, you will define which type of Script component you are going to use. As you learned at the beginning of this section, the Script component can be used as a source, destination, and transformation component. In this procedure, select Transformation.

  7. Select Transformation and click OK.

  8. Select the green arrow between Flat File Source - Customer CSV and OLE DB Destination - DimCustomer and press Delete to remove the path.

  9. Select Flat File Source - Customer CSV, and then drag the green arrow to the new Script component.

  10. Double-click the new Script component. The Script Transformation Editor dialog box appears.

  11. On the Input Columns page, select the check box next to EmailAddress in the Available Input Columns table. Notice that the Input Column, the Output Alias, and Usage Type fields are automatically completed in the following list. This step enables you to declare which input fields you would like to access in your code. Your screen looks like this:

    image from book

  12. Click the Inputs and Outputs page.

  13. In the tree view in the middle of the dialog box, expand Output 0 and select Output Columns. Click Add Column.

  14. Rename the new column IsValidEmail.

  15. In Data Type Properties, change the DataType property to Boolean [DT_BOOL]. You screen looks like this:

    image from book

  16. Click the Script page. In the right-side pane, change the Name property to Script Component - Validate Email.

  17. Click Design Script. The VSA IDE appears on the screen.

  18. In the Code Editor, insert the following code in your script:

     Imports System Imports System.Data Imports System.Math Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper Imports Microsoft.SqlServer.Dts.Runtime.Wrapper Imports System.Text.RegularExpressions Public Class ScriptMain     Inherits UserComponent     Public Overrides Sub Input0_ProcessInputRow(ByVal Row As Input0Buffer)         '         ' Add your code here         '         Const EMAIL_PATTERN As String = _             "^\w+([-+.]\w+)*@\w+([-.]\w+)*\.\w+([-.]\w+)*$"         If Row.EmailAddress_IsNull Then         Row.IsValidEmail = False             Exit Sub         End If         Row.IsValidEmail = Regex.IsMatch(Row.EmailAddress, _             EMAIL_PATTERN)     End Sub End Class 

    The Input0_ProcessInputRow method will be invoked when SSIS inserts each row into this Script transformation component. For example, if an upstream component inserts five rows into this component, SSIS will call this method five times.

    The Row parameter object contains all the information for each row's inputs and outputs, which you defined in the previous steps. Each column is defined as a member property of the Input0Buffer class. If you type Row in the Code Editor and type a dot (.), you can browse through the properties that you can access in the IntelliSense window.

    The preceding code validates a given e-mail address by using Regular Expressions. Regular Expressions is a string pattern matching functionality, often used to examine whether the given text is following certain patterns. Regular Expressions is also useful for extracting certain patterns of string from the given text. In this code, EMAIL_PATTERN holds the regular expressions for the generic e-mail address. The Regex.IsMatch method returns True if the e-mail address matches the e-mail address pattern; otherwise, it returns False. For more information about Regular Expressions, refer to the "NET Framework Regular Expressions" section on the MSDN Web site at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpguide/html/cpcon-COMRegularExpressions.asp.

  19. Close VSA and the Script Transformation Editor dialog box.

  20. Select the Script Component - Validate Email component and drag the green arrow to OLE DB Destination - DimCustomer.

  21. Double-click OLE DB Destination - DimCustomer. The OLE DB Destination Editor dialog box appears. On the Mappings page, scroll down the grid and find IsValidEmail under Destination Column. Under Input Column should be <Ignore>. Change it to IsValidEmail, which you created in the previous steps.

  22. Click OK. Your screen now looks like this:

    image from book

  23. If you have not copied Customer.csv to the root folder of the drive C in the previous step, start Windows Explorer, copy image from book Customers.csv, saved in the C:\Documents and Settings\<username>\My Documents\Microsoft Press\is2005SbS\Chap06_02-InitialCode\Data folder to root folder of the drive C.

  24. Press F5 to execute the package. Make sure all components turn green and that the row count is displayed in the Data Flow.

  25. Go back to SSMS, expand databases is2005sbsDW and Tables, and right-click the dbo.DimCustomer table. Select Open Table. Verify that the IsValidEmail column is all True.

  26. Let's make sure validation code is working as you expect. Start Notepad, open C:\Customers.csv, and modify the e-mail address of the first customer (jon24@adventure-works.com) to an improper address, such as ABC.

  27. Save your changes.

  28. Go back to BIDS and press Shift+Ctrl+F5 to restart the package.

  29. Go back to SSMS and select Execute SQL on the Query Designer menu to refresh the result grid. Ensure that the first customer's IsValidEmail value changed to False. This indicates that your validation code is working as you expect.

  30. In Object Explorer, select the is2005sbsDW database and click New Query on the Standard toolbar.

  31. In the Query Editor, type the following SQL code:

     ALTER TABLE is2005sbsDW.dbo.DimCustomer DROP COLUMN IsValidEmail 

  32. Press F5 to execute the query. Make sure that the message Command(s) Completed Successfully appears on the Messages tab.




Microsoft SQL Server 2005 Integration Services Step by Step
MicrosoftВ® SQL Server(TM) 2005 Integration Services Step by Step
ISBN: 0735624054
EAN: 2147483647
Year: 2007
Pages: 152

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net