Data Flow Versus Workflow: Fundamental Differences


In many ways, the data flow in the Data Flow Task looks like workflow in the runtime. Boxes represent functionality. Lines represent relationships between the boxes. Components in the Data Flow Task use connections, log data to log providers, and so forth. But, when you get past simple visual comparisons at the boxes and lines level, you'll see they are quite different. This is why Microsoft makes the distinction in the designer and object models. Following are the main differences between data flow and workflow:

  • Execution model

  • Connecting lines

  • Functional gamut

Execution Model

Data flow and workflow components execute differently. The execution time of a given task is a fraction of the total execution time of the package it runs within. The IS runtime starts a task and waits for it to complete before scheduling the tasks constrained by that task. So, for the simple package in Figure 6.3, the Data Flow Task won't execute until the Preparation SQL Task has finished successfully. The total package execution duration can be closely approximated by adding together the execution durations of the individual tasks. The execution time for workflow boils down to the following formula: E = e + s, where E is the total time it takes to execute a package, e is the sum of the execution durations of all tasks on one logical thread, and s is the package startup and shutdown duration. Of course, this is complicated when more than one logical thread executes in parallel. Then, e is the execution duration of the longest logical thread. The point is that execution duration for workflow is an additive operation derived from combining the execution durations of the tasks. In addition, tasks on the same logical thread don't run at the same time.

Figure 6.3. Workflow execution duration is the sum of task duration
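
The formula E = e + s can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything the IS runtime exposes: the function name, the thread names, and every duration below are invented for the example.

```python
from typing import Dict, List


def estimate_workflow_duration(threads: Dict[str, List[float]],
                               startup_shutdown: float) -> float:
    """Return E = e + s for a package.

    threads maps a (hypothetical) logical thread name to the durations,
    in seconds, of the tasks that run one after another on that thread.
    e is taken from the longest logical thread, as the text describes.
    """
    # Tasks on one logical thread run back to back, so their durations add.
    per_thread = {name: sum(durations) for name, durations in threads.items()}
    # Parallel logical threads overlap, so only the longest one counts.
    longest = max(per_thread.values(), default=0.0)
    return longest + startup_shutdown


# Invented numbers: a preparation task followed by a data flow task on one
# thread, and a single unrelated task on a second, parallel thread.
example = {
    "thread_a": [12.0, 45.0],
    "thread_b": [20.0],
}
total = estimate_workflow_duration(example, startup_shutdown=3.0)
print(total)
```

With these made-up inputs, thread_a dominates, so the estimate is its 57 seconds plus the 3 seconds of startup and shutdown.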


Estimating the total execution duration of a Data Flow Task is a little more complicated, but conceptually it is driven by the amount of data running through the Data Flow Task and the types of transforms in the Data Flow Task. A logical data flow thread is what you get if, graphically, you were to draw a single line from a source adapter, through each succeeding transform, until you get to the destination adapter. The significant difference is that it's possible for all adapters and transforms on one logical thread to be active at the same time.
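
As a loose analogy only, chained Python generators show what it means for every stage on one logical thread to be active at once: the destination starts receiving rows long before the source has produced its last one. The stage names and the row values are invented, and generators interleave on a single thread rather than running in parallel, so this sketch says nothing about how the Data Flow Task is actually implemented.

```python
from typing import Iterable, Iterator, List

events: List[str] = []  # records the order in which the stages do work


def source(rows: Iterable[int]) -> Iterator[int]:
    for row in rows:
        events.append(f"source emits {row}")
        yield row


def transform(rows: Iterable[int]) -> Iterator[int]:
    for row in rows:
        events.append(f"transform handles {row}")
        yield row * 10


def destination(rows: Iterable[int]) -> List[int]:
    written = []
    for row in rows:
        events.append(f"destination writes {row}")
        written.append(row)
    return written


# One "logical thread": source -> transform -> destination.
written = destination(transform(source([1, 2, 3])))
for line in events:
    print(line)
```

The event log interleaves the three stages row by row. A workflow, by contrast, would finish one task completely before the next one starts.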

You wouldn't use these methods for estimating package or Data Flow Task execution durations. There are better ways to do that, such as running the data flow with a subset of the data and then multiplying that execution duration by the ratio of the full data set to the subset. The point is that tasks on the same logical thread do not run at the same time. Transforms, by contrast, can run simultaneously even when they are on the same logical thread.
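
The subset-and-scale approach amounts to simple proportional arithmetic. The sketch below assumes that execution time grows linearly with row count, which is a simplification, and the function name and all figures are made up for illustration.

```python
def scale_estimate(sample_seconds: float, sample_rows: int,
                   full_rows: int) -> float:
    """Extrapolate a full-run duration from a timed run over a subset.

    Assumes duration is proportional to row count, which real data
    flows only approximate.
    """
    if sample_rows <= 0:
        raise ValueError("sample_rows must be positive")
    return sample_seconds * (full_rows / sample_rows)


# Invented figures: a 100,000-row sample took 30 seconds, and the full
# data set holds 2,000,000 rows.
estimate = scale_estimate(sample_seconds=30.0,
                          sample_rows=100_000,
                          full_rows=2_000_000)
print(estimate)
```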

Connecting Lines

Tasks have precedence constraints. Transforms have paths. Those connecting lines look a lot alike, but they're very different. Precedence constraints, as described previously, do little more than determine whether the next task can execute. Paths actually represent the outputs of transforms and the metadata associated with those outputs. Precedence constraints control task execution; paths show where the data flows.
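
One way to picture the difference is as two small data structures. The classes below are hypothetical and are not the IS object model; the field names, the outcome strings, and the column list are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PrecedenceConstraint:
    """Hypothetical: answers one yes/no question about the next task."""
    from_task: str
    to_task: str
    required_outcome: str = "success"

    def allows(self, outcome: str) -> bool:
        # The constraint carries no data; it only gates execution.
        return outcome == self.required_outcome


@dataclass
class Path:
    """Hypothetical: an output of one component plus its column metadata."""
    from_component: str
    to_component: str
    columns: List[Tuple[str, str]] = field(default_factory=list)


constraint = PrecedenceConstraint("Preparation SQL Task", "Data Flow Task")
path = Path("Source Adapter", "Destination Adapter",
            [("CustomerID", "int"), ("Name", "string")])

print(constraint.allows("success"))
print([name for name, _ in path.columns])
```

The constraint reduces to a boolean decision, while the path describes what the data looks like as it moves between components.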

Functional Gamut

Functional gamut refers to the breadth of functionality a component can provide. The functional gamut of tasks is virtually unlimited. Tasks can be written to do just about anything any piece of software can do. Witness the stock tasks that ship with IS, which range from the Script Task, which in itself is the key to unlocking the full power of .NET software development, to the Data Flow Task. The Windows Management Instrumentation (WMI) and Web Services Tasks are other good examples of the range of tasks you can write that use various emerging technologies.

Transforms, on the other hand, are very specific to processing data and are strictly bound to the architecture of the Data Flow Task. This isn't a deficit in transforms; it merely shows that tasks and transforms are quite different. The beauty of these differences is that both are supported within one framework and both are equally extensible through pluggable architectures.



Microsoft SQL Server 2005 Integration Services
ISBN: 0672327813
Year: 2006
Pages: 200
Authors: Kirk Haselden