Control Flow Elements


Microsoft SQL Server Integration Services (SSIS) is a tool to move data between different systems; it's also an effective and complete tool to extract, transform, process, and clean data and perform maintenance. In general, SSIS is a set of components to create complex solutions organized in a controlled work flow.

When you create a package, you use a set of control flow items and maintenance tasks in the control flow pane. Then you need to specify how they will be related to each other. Figure 5-1 is an example of this basic architecture.

image from book
Figure 5-1: Package control flow architecture

A package consists of a control flow and, optionally, one or more data flows. SSIS provides three different types of control flow elements:

  • Containers that provide structures in packages

  • Tasks that provide functionality

  • Precedence constraints that connect the executables, containers, and tasks into an ordered control flow

Tasks are control flow elements that define units of work to be performed in a package control flow. An SSIS package is composed of one or more tasks. If the package contains more than one task, they are connected and sequenced in the control flow by precedence constraints. A control flow can consist of more than one level of nested containers. For instance, in this chapter, you will create a container that contains a Foreach Loop container to perform a specific task.

Control Flow Components

The SSIS Designer provides a toolbox that contains all the control flow items and maintenance plan tasks. You can find more specific and detailed information about each component in Microsoft SQL Server 2005 Books Online at http://msdn2.microsoft.com/en-us/library/ms130214.aspx. The following is a list of the control flow components available in SSIS:

  • Data flow tasks

  • SQL Server tasks

  • Data preparation tasks

  • Workflow tasks

  • Scripting tasks

  • Analysis Services tasks

  • Maintenance tasks

  • Containers

This list represents the typical control flow components. However, SSIS gives developers the ability to write custom tasks by using a programming language such as Microsoft Visual Basic.NET or C#.NET.

Now you will learn how to use some of these components in a workflow. The first exercise will show you the DelayValidation property.

Note 

DelayValidation is an important property used to delay the validation of an SSIS object until run time. This property exists on all SSIS components.

When this property is set to False, which is its default value, it can help you find problems prior to execution. These errors or warnings can be real or potential issues. Thus, when you know that they are related to objects that haven't been created but will be created later as part of the process, you can ignore them by setting the DelayValidation property to True.

For example, it is common to create a complex extract, transform, and load (ETL) process with several packages that need to run in sequence. They could consume considerable time and fail after several hours of execution because one of the packages needs objects that haven't been created yet. Even if you tested this ETL process in the development environment and did not encounter errors, it can fail in the production environment. This occurs because, in development, you have all the objects you need, but in the production environment, you don't necessarily have all the objects. They are created when the process runs.

In this scenario, you probably want to use the DelayValidation property. Setting this property to True in the objects that could potentially have issues allows you to skip pre-execution validation, and your package can start running without errors. In the following procedure, you'll set the DelayValidation property.

Note 

At execution, SSIS checks or validates all objects that were not validated in the pre-execution phase.

Work with Tasks and DelayValidation
  1. Open Microsoft SQL Server Management Studio, connect to your Database Engine server, and verify that the Employee table is deleted from the QuickStartODS database.

  2. Navigate to C:\Documents and Settings\<username>\My Documents\Microsoft Press\is2005sbs\Chap05 and double-click image from book Chap05.sln to open the solution in Microsoft Visual Studio.

  3. Notice that the Error List pane is empty.

    image from book

  4. In Solution Explorer, in the SSIS Package folder, double-click image from book Tasks.dtsx.

  5. In the Control Flow pane, right-click an open area and click Properties to display the Properties pane.

  6. In the Execution group, verify that DelayValidation is set to False.

  7. Click the Execute SQL Task - Create Employee Table task and notice that the Properties pane changes. In the Execution group, verify that DelayValidation is False.

  8. Do the same to Data Flow Task - Employee Excel to QSODS. Verify that DelayValidation is False.

    You can also see that the Error List pane shows two errors. In the last sentences of their descriptions, they say "Invalid object name 'dbo.Employee'" and "Check that the object exists in the database." In fact, the object, which is the Employee table, hasn't been created yet.

  9. Your screen should look similar to this:

    image from book

  10. Right-click image from book Tasks.dtsx and choose Execute Package. A Package Validation Error is displayed.

    Note 

    It is a valid error because the destination table has not been created yet.

  11. Your screen should look similar to this:

    image from book

  12. Click OK.

  13. Select the Tasks Package in the Properties pane and change the DelayValidation setting to True for the package.

    image from book

  14. Execute the package again and verify that it finishes successfully.

  15. In SQL Server Management Studio (SSMS), verify that the Employee table has been created with data in the QuickStartODS database.

  16. On the Debug menu, click the Stop Debugging button to stop the executed package, and then close Visual Studio.

Using Containers

SSIS contains four types of containers. They are objects that give structure to packages and additional functionality to tasks. They support repeating control flows in packages, and they group tasks and containers into meaningful units of work. Containers can include other containers in addition to tasks. The following is a list of container types available in SSIS.

  • Foreach Loop container Defines a control flow repeatedly by using an enumerator. The Foreach Loop container repeats the control flow for each member of a specified enumerator. For example, if you want to read all the files that have the *.log file name extension located in a specific folder, use the Foreach file enumerator.

    image from book

  • For Loop container Evaluates an expression and repeats its workflow until the expression evaluates to False. For example, you can define an initial expression as @iCounter=0, an evaluation expression as @iCounter<5, and the iteration expression as @iCounter=@iCounter + 1, and the task will run five times. The expressions must be valid SSIS expressions. You can build nested loops and implement complex looping because For Loop containers can include other For Loop containers.

    image from book

  • Sequence container Groups tasks and containers into control flows that are subsets of the package control flow. That is, sequence containers group the package into multiple separate control flows. It gives you such benefits as the ability to disable groups for debugging, to manage properties on multiple tasks instead of on just individual tasks, and to provide scope for variables that are used by components in the sequence container.

    image from book

  • Task Host container The nested-container architecture of Integration Services enables all containers to include other containers except for the Task Host container, which encapsulates only a single task. The Task Host container services a single task.

    image from book

Note 

Packages and Event Handlers are also types of containers.

The next exercise uses sequence containers to link groups of tasks. The objective is to test whether the output file already exists. A script task is used to verify whether the file exists. If it does, then only a portion of the package needs to execute, but if the file does not exist, the entire package needs to run.

Note 

The next exercise uses the solution described in Chapter 4, "Using Data Flow Transformations," as the source.

Add Sequence Containers
  1. Navigate to C:\Documents and Settings\<username>\My Documents\Microsoft Press\is2005sbs\Chap05\Chap05(b) and double-click image from book Chap05(b).sln to open the solution in Visual Studio.

  2. Open the image from book NewProducts.dtsx package.

  3. On the Control Flow page, drag a Sequence Container object from the Control Flow Items group Toolbox to the grid.

  4. In the Properties window, in the Name box, change the name of the sequence container to Sequence Container - File Doesn't Exist.

    Tip 

    Alternatively, you can right-click the sequence container and choose Rename.

  5. Drag existing tasks into the sequence container.

    To do this, you must select and drag Execute SQL Task - Delete From ProductNames and Data Flow Task - Import Products together rather than individually. Your screen should look similar to this:

    image from book

    Now you need a way to test whether the output file is found and, if not found, to jump to the sequence container. The first step is to create a variable for the package.

Test Whether the Output File is Found
  1. Right-click the background of the Control Flow pane and choose Variables.

    This brings up a window that you can dock on the side.

  2. Click the Add Variable button in the Variables toolbar.

  3. In the Name column, change the name to FileFound. In the Data Type column, change the type to Boolean.

    Note 

    The scope for a variable is the container that is active when you create the variable. In this case, it is the package.

    Your screen should look similar to this:

    image from book

    Next, assign a value to the variable. You can use a script task to do this.

Assign a Value to the Variable
  1. Drag Script Task from the Control Flow Items group Toolbox to the grid.

  2. Double-click Script Task to open the Script Task Editor.

  3. On the General page, in the Name property box, change the Name property to Script Task - Check Products List.

  4. On the Script page, notice that the default language is Visual Basic .NET. Click the Design Script button to add the following non-bold code to the Main() subroutine. You can also copy the script from the Script file in the Data folder.

     Public Sub Main()     Try         Dim myVariable as Variables         Dts.VariableDispenser.LockOneForWrite("FileFound", myVariable)         myVariable("FileFound").Value =         System.IO.File.Exists(Dts.Connections(_         "ProductsDestination").ConnectionString)         Dts.TaskResult = Dts.Results.Success     Catch     Dts.TaskResult = Dts.Results.Failure     End Try     Dts.TaskResult = Dts.Results.Success End Sub 

    The VariableDispenser belongs to the task, not to the package. If you have multiple variables to acquire, you add each one independently to the VariableDispenser, using LockForRead and LockForWrite and only the variable name as an argument. This does not actually lock the variable but puts it in a list to lock. Once all the variables are in the dispenser, you call GetVariables to put the array of variables into the local variable (myVariable in this example). If you need only one variable, the LockOne functions are a shortcut.

    Use System.IO to test for the existence of the output file. Rather than hard-code the path and filename, use the connection. The FileFound variable will contain the result of the test. Whether the file is found or not, the task will return Success.

    You don't need to release the variable at the end of the function. It will be done automatically at the end of the package. Your code should look like this:

    image from book

  5. Close the Design Script window and click OK to close the Script Task Editor.

    The next step is to create a placeholder for the tasks to be performed if the file is found.

Create a Placeholder
  1. On the Control Flow page, drag Sequence Container from the Control Flow Items group Toolbox to the grid.

  2. Right-click the name of the container and rename the sequence container Sequence Container - File Exists.

  3. Using Windows Explorer, navigate to the C:\Documents and Settings\<username>\My Documents\Microsoft Press\is2005sbs\Chap05\Chap05(b)\Data\ folder.

  4. Open the image from book FuzzyInputA.txt file to review its contents.

    image from book

  5. On the Control Flow page, drag a Data Flow Task from the Control Flow Items group Toolbox to the Sequence Container - File Exists container.

  6. Right-click the new task, select Rename, and change the name of the task to Data Flow Task - Lookup Names.

  7. Double-click Data Flow Task - Lookup Names to open the Data Flow page.

  8. Drag Flat File Source from the Data Flow Sources group Toolbox to the grid. Rename this task Flat File Source - FuzzyInput.

  9. Double-click the Flat File Source - FuzzyInput data adapter, and then click the New button to open the Flat File Connection Manager Editor to create a new connection manager.

  10. Enter a name for the connection manager: Product Input Names.

  11. Use the Browse button to navigate to the source text file: C:\Documents and Settings \<username>\My Documents\Microsoft Press\is2005sbs\Chap05\Chap05(b)\Data \FuzzyInputA.txt.

  12. Click Advanced to edit the following properties:

    1. Name: ProductName

    2. OutputColumnWidth: 50

  13. Click OK.

    Note 

    Fuzzy Lookup gives a warning if the column width of the source is different from the reference column.

  14. In the Flat File Source Editor, click Columns. This will set the mapping between the External Column and the Output Column. Click OK.

Adding a Fuzzy Lookup Transformation

The Fuzzy Lookup transformation performs data cleaning tasks. Fuzzy Lookup enables you to match input records with clean, standardized records in a reference table. This transformation finds the closest match and indicates the quality of the match. The reference data must be a table in a SQL Server 2005 database. The match between the value in an input column and the value in the reference table can be an exact match or an approximate match. However, if you want an exact match, it is better to use a Lookup transformation instead. The transformation requires at least one column match to be configured for fuzzy matching.

Because Fuzzy Lookup transformation needs a reference table, it could sometimes consume time and resources, especially in large environments. Fuzzy Lookup uses temporal index tables to store the data for easy access and to enhance performance, which can consume a large amount of memory. You can make adjustments in the advance properties to optimize the performance within your specific environment. It is good practice to use non-production servers to run large and complex fuzzy transformation tasks.

Note 

Columns that participate in the join between the input and the reference table must have compatible data types.

Now you're ready to match the list of names in the input file to the master list of new product names.

Add a Fuzzy Lookup Transformation
  1. Drag a Fuzzy Lookup transformation from the Data Flow Transformations group Toolbox to the grid. Rename it Fuzzy Lookup - Product Names.

  2. Connect the output from Flat File Source - FuzzyInput to Fuzzy Lookup - Product Names by clicking and dragging the green arrow.

  3. Double-click the Fuzzy Lookup task to open the Fuzzy Lookup Transformation Editor and set the properties to Connection manager: localhost.QuickStartODS.

  4. Generate a new index, using Reference Table [dbo].[ProductNames].

    image from book

    Tip 

    Fuzzy Lookup creates a sophisticated index on the lookup table. It can create the index each time, or it can store the index in the database. For a large lookup table that doesn't change frequently, you should store the indexes.

  5. Click the Columns tab, and then create a join between ProductName in the Available Input Columns and ProductName in the Available Lookup Columns by clicking and dragging ProductName from the Input Columns to ProductName in the Lookup Columns.

  6. Check the Pass Through checkbox in the Available Input Columns table so that the source file name is included in the output file.

  7. Select the ProductName column as the Lookup Column.

  8. Click OK.

    image from book

    Now you are ready to put the results of the lookup into a database table.

Add a SQL Server Destination
  1. Drag SQL Server Destination from the Data Flow Destinations group Toolbox to the grid. Rename it SQL Server Destination - MatchedNames.

  2. Drag an output arrow from the Fuzzy Lookup transformation to the SQL Server destination.

  3. Double-click the SQL Server Destination - MatchedNames data adapter.

  4. Select the connection manager: localhost.QuickStartODS.

  5. Click the New button to create a new table.

  6. Replace the default table name SQL Server Destination - MatchedNames with Matched-Names.

  7. Click OK.

    Note 

    Notice the extra columns created to hold statistics for the transformation.

    Your screen should look similar to this:

    image from book

  8. Click Mappings in the SQL Destination Editor to set the correct mappings between the input and destination columns.

  9. Click OK.

    Now you should have a screen that looks like this:

    image from book

    If you run the task a second time, the same rows will be added twice to the Matched-Names table. You can add a step that clears out the destination table before executing the lookup.

Add an Execute SQL Task
  1. On the Control Flow page, add a new Execute SQL Task to the File Exists container. Double-click the new task and set the following properties:

    • Name: Execute SQL Task - Clear Matched Names

    • Connection: localhost.QuickStartODS

    • SQL Statement: Delete from MatchedNames

  2. Click OK.

Adding a Foreach Loop Container

The next step is to loop through multiple input files. In this demonstration, you will use the default enumerator Foreach File. The Foreach Loop container provides several other enumerators that you can review by clicking the list box.

Add a Foreach Loop Container to Sequence Container - File Exists
  1. Drag Foreach Loop Container from the Control Flow Items group Toolbox into the Sequence Container - File Exists container.

  2. Double-click the new container to open the editor, and then click Collection to change the following properties:

    • Folder: C:\Documents and Settings\<username>\My Documents\Microsoft Press\is2005sbs\Chap05\Chap05(b)\Data

    • Files: FuzzyInput*.txt

      image from book

    Now you will add a variable that will change the connection string for each iteration of the loop so that each input file in the specified folder will be read.

  3. Click Variable Mappings in the Foreach Loop editor.

  4. In the Variable drop-down list box, select <New Variable>.

  5. Change the Container property to the name of the package, NewProducts (at the top of the hierarchy), to set the scope of the variable to the entire package.

    image from book

  6. Change the variable's Name property to InputFile.

  7. Click OK twice.

  8. Click the Product Input Names connection to display its Properties pane.

  9. Click the ellipsis in the Expressions property.

  10. In the Property drop-down list, select ConnectionString and add the following expression: @[User::InputFile]. (This expression references the user variable called InputFile.)

  11. Click OK.

    image from book

  12. Clear Precedence Constraint between Execute SQL Task - Clear Matched Names and Data Flow Task - Lookup Names.

  13. Drag Data Flow Task - Lookup Names into the Foreach Loop container. Now the Sequence Container - File Exists should look like this:

    image from book

Applying Precedence Constraints

Precedence constraints connect executables into a control flow. An executable is an SSIS object in a control flow. For example, an executable could be a For Loop, a Foreach Loop or Sequence container, a task, or an event handler.

Precedence constraints specify conditions that determine whether executables run. Basically, you can control the sequence or execution order based on restrictions, expressions, or both. The constraint values are Success, Failure, or Completion. The expression has to be a Boolean expression, and you can combine expressions and constraints, for instance, Expression-and-Constraint or Expression-or-Constraint. When you have a task that has more than one precedence executable or precedence task, you can decide to evaluate each precedent constraint to True, or you can choose to evaluate only one constraint to True. You can set all these options in the Precedence Constraint Editor.

In the next steps, you will learn how to use and set constraints, use Evaluation Operation and Expressions in constraints, and, finally, set logical AND and OR statements to set multiple constraints.

Apply Precedence Constraints
  1. Drag the output arrow from Script Task - Check Products List to Sequence Container - File Doesn't Exist.

  2. Double-click the constraint to open the Precedence Constraint Editor and modify the following options:

    • Evaluation Operation: Expression and Constraint

    • Expression: @FileFound == False

    Your screen should look similar to this:

    image from book

    Note 

    Use the @ symbol in front of a variable to reference it in an expression.

  3. Click OK.

  4. Drag the output arrow from Script Task - Check Products List to Sequence Container - File Exists.

  5. Double-click the constraint to edit the Precedence Constraint options:

    • Evaluation Operation: Expression and Constraint

    • Expression: @FileFound == True (It means that the file exists.)

  6. Click OK.

  7. Drag the green arrow from the Execute SQL Task - Clear Matched Names task to the Foreach Loop container.

  8. Delete the image from book ProductsDestination.txt file from C:\Documents and Settings\<user-name>\My Documents\MicrosoftPress\is2005sbs\Chap05\Chap05(b)\Data.

  9. Right-click the image from book NewProducts.dtsx package and choose Execute Package. Only the File Doesn't Exist branch should execute.

    image from book

  10. Stop debugging from the Debug menu, and then re-execute the package. Only the File Exists branch should execute.

    image from book

  11. Stop debugging.

    Now only one branch will be taken. However, even if the File Doesn't Exist branch runs, the File Exists branch will still need to run. You can add a Precedence Constraint between the two containers to trigger the execution of the File Exists branch.

  12. Add a Precedent Constraint from the Sequence Container - File Doesn't Exist container to the Sequence Container - File Exists container.

  13. Select the logical OR constraint option for this new constraint.

    Caution 

    Because there are multiple constraints on the Sequence Container - File Exists container, you need to determine whether both constraints (logical AND) or just one constraint (logical OR) needs to be satisfied to execute the tasks within the container. In this case, only one constraint will be executed, never both, so you need to use the OR constraint option.

    Your screen should look similar to this:

    image from book

    Note 

    The precedence constraint with a logical OR condition appears as a dotted green arrow.

  14. Delete the image from book ProductsDestination.txt file.

  15. Execute the package. Both branches should execute.

  16. Stop debugging the package and close the solution.

    image from book




Microsoft SQL Server 2005 Integration Services Step by Step
MicrosoftВ® SQL Server(TM) 2005 Integration Services Step by Step
ISBN: 0735624054
EAN: 2147483647
Year: 2007
Pages: 152

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net