Understanding the SSIS Data Flow and Control Flow


From an architectural perspective, the difference between the Integration Services Data Flow and Control Flow is important. One way to illustrate the distinction is to look at how each one handles its components. In the Control Flow, the task is the smallest unit of work, and a task must finish (with success, failure, or simply completion) before the subsequent tasks are handled. In the Data Flow, the transformation is the basic component; however, a transformation functions very differently from a task. Instead of one transformation necessarily waiting for the transformations connected to it to finish before it can do its work, the transformations work together to process and manage data.
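The way a finished task gates the next one can be sketched in a few lines of Python. This is only an illustration of the success, failure, and completion outcomes described above, not the Integration Services object model; the names Outcome, Constraint, constraint_satisfied, and run_chain, and the task names in the usage section, are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"


class Constraint(Enum):
    SUCCESS = "success"        # next task runs only if the previous one succeeded
    FAILURE = "failure"        # next task runs only if the previous one failed
    COMPLETION = "completion"  # next task runs either way, once the previous one finished


def constraint_satisfied(constraint, outcome):
    """Return True when a finished task's outcome allows the next task to start."""
    if constraint is Constraint.COMPLETION:
        return True
    return constraint.value == outcome.value


def run_chain(tasks):
    """Run (name, callable, constraint_to_next) entries strictly one after another.

    Each callable returns True for success and False for failure. The chain
    stops at the first constraint that the finished task's outcome does not satisfy.
    """
    executed = []
    for name, work, constraint_to_next in tasks:
        outcome = Outcome.SUCCESS if work() else Outcome.FAILURE
        executed.append((name, outcome.value))
        if not constraint_satisfied(constraint_to_next, outcome):
            break
    return executed


if __name__ == "__main__":
    # Hypothetical three-step chain; the second step fails on purpose.
    chain = [
        ("truncate_staging", lambda: True, Constraint.SUCCESS),
        ("load_staging", lambda: False, Constraint.SUCCESS),
        ("send_notification", lambda: True, Constraint.COMPLETION),
    ]
    print(run_chain(chain))
```

In this sketch the third step never runs, because the step before it failed and the connector leading to the third step requires success. Changing that connector to Constraint.COMPLETION would let the third step run regardless of the outcome.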

Comparing and Contrasting the Data Flow and Control Flow

Although the Control Flow looks very similar to the Data Flow with processing objects (tasks and transformations) and green and red connectors that bridge them, there is a world of difference between them. The Control Flow, for example, does not manage or pass data between components; rather it functions as a task coordinator with isolated units of work. Here are some of the Control Flow concepts:

  • Workflow orchestration

  • Process-oriented

  • Serial or parallel task execution

  • Synchronous processing

As highlighted, Control Flow tasks can be designed to execute both serially and in parallel; in fact, more often than not a package will have aspects of both. A Control Flow task can branch off into multiple tasks that run in parallel, or it can lead to a single next step that runs serially after it. To show this, Figure 10-1 is a very simple Control Flow process where the tasks are connected in a linear fashion. The execution of this package shows that the components are serialized: only a single task is executing at a time.

Figure 10-1
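The serial and parallel behavior of the Control Flow can be approximated with a small scheduler. The following is a minimal sketch under stated assumptions, not how the Integration Services runtime is implemented: the function run_control_flow, its parameters, and the task names are hypothetical, and Python threads stand in for concurrently running tasks. A task starts only after every task it depends on has finished, and the function reports the peak number of tasks running at once.

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_control_flow(tasks, predecessors, max_workers=4):
    """Run tasks so that each starts only after all of its predecessors finish.

    tasks:        dict mapping task name -> zero-argument callable
    predecessors: dict mapping task name -> set of task names it waits on
    Returns the peak number of tasks that were running at the same moment.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def tracked(work):
        # Count the task as running for exactly as long as its work takes.
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            work()
        finally:
            with lock:
                running -= 1

    finished = set()
    pending = {}  # future -> task name
    remaining = set(tasks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while remaining or pending:
            # Start every task whose predecessors have all finished.
            ready = [t for t in remaining if predecessors.get(t, set()) <= finished]
            for name in ready:
                remaining.discard(name)
                pending[pool.submit(tracked, tasks[name])] = name
            if not pending:
                raise ValueError("cycle or unknown predecessor in control flow")
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any error from the task
                finished.add(pending.pop(future))
    return peak


if __name__ == "__main__":
    def short_task():
        time.sleep(0.05)

    # Linear chain, as in a package whose tasks are connected one after another.
    linear = {"step_1": short_task, "step_2": short_task, "step_3": short_task}
    print(run_control_flow(linear, {"step_2": {"step_1"}, "step_3": {"step_2"}}))

    # One task that branches into two tasks, which then join into a final task.
    branched = {"start": short_task, "left": short_task,
                "right": short_task, "finish": short_task}
    print(run_control_flow(
        branched,
        {"left": {"start"}, "right": {"start"}, "finish": {"left", "right"}},
    ))
```

For the linear chain the peak is always 1, because each task is submitted only after the previous one has finished. For the branched layout the two middle tasks are submitted together, so they are free to overlap.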

The Data Flow, on the other hand, can branch, split, and merge, providing parallel processing, but the concept is different from the Control Flow. Even though there may be a set of connected linear transformations, you cannot necessarily call the process serial, because the transformations in most cases will be running at the same time, handling subsets of the data in parallel. Here are some of the unique aspects of the Data Flow:

  • Information-oriented

  • Data correlation and transformation

  • Coordinated processing

  • Streaming in nature

  • Sources and destinations

Similar to the Control Flow shown above, Figure 10-2 models a simple Data Flow where the components are connected one after the other. The difference between the Data Flow in Figure 10-2 and the Control Flow in Figure 10-1 is that, in the Control Flow, only a single task is executing at a time in the linear flow. In the Data Flow, however, all the transformations are doing work at the same time. In other words, the first batch of data flowing in from the source may be in the final destination step (Currency Rate Destination), while at the same time data is still flowing in from the source.

Figure 10-2

Multiple components are running at the same time because the Data Flow transformations are working together in a coordinated streaming fashion and the data is being transformed in groups as it is passed down from the source to the subsequent transformations.
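The streaming behavior can be imitated with Python generators. This is a rough analogy, assuming a hypothetical source, a single conversion transformation, and a destination; the function names, the batch size, and the conversion rate are all made up for illustration. Generators interleave their work rather than truly running at the same time, so the sketch only approximates the behavior described above, but it does show the first batch reaching the destination while the source still has data to emit.

```python
def source(rows, batch_size, log):
    """Yield rows in fixed-size batches, standing in for groups of data leaving a source."""
    for start in range(0, len(rows), batch_size):
        batch_no = start // batch_size + 1
        log.append(f"source emits batch {batch_no}")
        yield batch_no, rows[start:start + batch_size]


def convert(batches, rate, log):
    """Transformation: apply a (hypothetical) conversion rate to every row in a batch."""
    for batch_no, batch in batches:
        log.append(f"transform handles batch {batch_no}")
        yield batch_no, [round(amount * rate, 2) for amount in batch]


def destination(batches, log):
    """Destination: collect the transformed rows as they arrive."""
    loaded = []
    for batch_no, batch in batches:
        log.append(f"destination writes batch {batch_no}")
        loaded.extend(batch)
    return loaded


if __name__ == "__main__":
    events = []
    amounts = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
    result = destination(convert(source(amounts, 2, events), 1.5, events), events)
    print(result)
    for event in events:
        print(event)
```

The event log shows batch 1 being written by the destination before the source has emitted batch 2, which is the key contrast with a Control Flow task that must finish all of its work before the next task starts.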

SSIS Package Execution Times from Package Start to Package Finish

Since a Data Flow is merely a type of Control Flow task, and there can be more than one Data Flow embedded in a package, the total time it takes to execute a package is measured from the start of the first Control Flow task or tasks through the completion of the last task being executed, regardless of whether the components executing are Data Flow transformations or Control Flow tasks. This may sound obvious, but it is worth mentioning: when designing a package, maximizing parallel processing where appropriate (with due regard to your server resources) will help optimize the flow and reduce the overall processing time.

The package in Figure 10-3 has several tasks executing a variety of processes and using precedence constraints in a way that demonstrates parallel execution of tasks.

Figure 10-3

Because the Control Flow has been designed with parallelization, the overlap in tasks allows the execution of the package to complete faster than it would if the steps were executed in a serial manner as in Figure 10-1.
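The saving from parallel design can be estimated with simple arithmetic. The sketch below is an assumption-laden model, not a measurement: the task names and durations are invented, and parallel_duration assumes the server can run every ready task at once, which ignores the server resource limits mentioned earlier. Serial time is the sum of all task durations, while parallel time is the length of the longest chain of dependent tasks.

```python
from functools import lru_cache


def serial_duration(durations):
    """Total elapsed time when every task runs one after another."""
    return sum(durations.values())


def parallel_duration(durations, predecessors):
    """Elapsed time when each task starts as soon as its predecessors finish.

    Assumes enough server resources to run all ready tasks at the same time.
    """
    @lru_cache(maxsize=None)
    def finish(task):
        # A task starts when its slowest predecessor finishes (or at time 0).
        start = max((finish(p) for p in predecessors.get(task, ())), default=0)
        return start + durations[task]

    return max(finish(task) for task in durations)


if __name__ == "__main__":
    # Hypothetical package: durations in minutes.
    durations = {
        "truncate_staging": 2,
        "load_customers": 10,
        "load_orders": 8,
        "load_products": 6,
        "rebuild_indexes": 4,
    }
    predecessors = {
        "load_customers": ("truncate_staging",),
        "load_orders": ("truncate_staging",),
        "load_products": ("truncate_staging",),
        "rebuild_indexes": ("load_customers", "load_orders", "load_products"),
    }
    print(serial_duration(durations))                  # 30
    print(parallel_duration(durations, predecessors))  # 16
```

In this made-up package the three loads overlap, so the elapsed time is governed by the slowest load (2 + 10 + 4 = 16 minutes) rather than by the sum of all five tasks (30 minutes).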



Professional SQL Server 2005 Integration Services