The SSIS Architecture


With DTS in SQL Server 2000, control flow and data flow were tightly coupled, making it difficult to create and manage complex packages. The new SSIS architecture breaks this tight coupling and separates control flow from data flow. You will see this change when you start designing packages in SSIS designer inside Business Intelligence Development Studio. Control Flow and Data Flow are two tabs or screens in the SSIS package designer. The other change in the SSIS architecture is the division of package creation, storage, execution, and data flow into four core components: clients, the SSIS service, the runtime engine, and data flow engine.

Before further exploring these four core components, here are some SSIS terms that you should become familiar with:

  • Task A package contains one or more tasks, which perform various data manipulation and management functions.

  • Container Tasks are grouped by entities known as containers. Containers support iterative control flow in packages and group tasks into meaningful units of work. The SSIS architecture is based on containers. The package container organizes containers, tasks, connections, transformations, variables, configurations, and precedence constraints into an executable unit of work. Containers, except for the TaskHost container, can hold other containers, and they provide scope for variables, transactions, and events.

  • Precedence constraint Precedence constraints organize the tasks and containers in a package workflow and specify the conditions for execution of the tasks.

  • Data Flow task Tasks that extract data from different sources, provide the facility to transform, cleanse and modify data, and store data in the appropriate destination are known as Data Flow tasks.

  • Package A collection of different tasks and containers, connections, precedence constraints controlling data flow, variables, and different configurations is combined together into a unit of work called a package. The different components of SSIS manage, store, and execute this unit of work.

Figure 12.1. An SSIS package is a collection of components such as tasks, containers, connections, variables, and configurations.


The following sections look at the core SSIS components in more detail.

SSIS Clients

The SSIS clients include the built-in tools, wizards, and command-line utilities, as well the custom applications that you create for designing and creation of packages. The SSIS designer graphical tool allows you to create SSIS packages without writing a single line of code. The designer contains a collection of built-in tasks to choose from, or you can program a custom task by using .NET managed or native code. In addition to the designer, SSIS contains several wizards and command prompt utilities that assist in configuring the packages by creating configuration files, deploying packages, and creating package dependencies. These are discussed later in this chapter.

SSIS Service

SSIS has separated the management and design of packages for better manageability into a Windows service (MsDtsSrvr.exe). This service provides management support for SSIS package storage and execution. This service is off by default, and the startup type is set as disabled but is automatically started when the first DTS package runs. You can manage this service by using the SQL Server Configuration Manager tool. In addition to monitoring local and remote running packages, this service also controls the storage of packages. Packages can be stored either in the sysdtspackages90 table in the msdb SQL Server system database or as XML files (.dtsx) on the file system.

Note

Business Intelligence Development Studio allows you to save SSIS packages to disk files only as .dtsx XML-formatted files. You can export the packages to the msdb database by using SQL Server Management Studio.


The SSIS Runtime Engine

The DTS runtime engine executes packages and provides support for logging, debugging, configuration, connections, and transactions. It provides support for the following:

  • Execution of each task specified by the workflow and consideration of different properties, such as concurrent execution of multiple tasks.

  • Handling of the scope and namespace of the variables passed between tasks, containers, or packages.

  • Logging of events raised during package execution. Table 12.1 lists the SSIS events raised during package execution.

    Table 12.1. SSIS Events

    Event Handler

    Description

    OnError

    This event is raised when an error occurs.

    OnExecStatusChanged

    This event is raised when its execution status changes.

    OnInformation

    This event is used for reporting purposes during the validation and execution of an executable. The report does not contain errors or warnings.

    OnPostExecute

    This event is raised by an executable immediately after it completes.

    OnPostValidate

    This event is raised by an executable when its validation completes.

    OnPreExecute

    This event is raised by an executable immediately before it runs.

    OnPreValidate

    This event is raised by an executable when its validation begins.

    OnProgress

    When the executable makes measurable progress, it raises this event.

    OnQueryCancel

    This event is raised by an executable to determine whether it should stop running.

    OnTaskFailed

    This event is raised by a task when it fails.

    OnVariableValueChanged

    This event is raised when the value of a variable is changed by the executable that defines this variable. This event is not raised if you set the RaiseChangeEvent property for the variable to False.

    OnWarning

    This event is raised by an executable when a warning occurs.


  • Debugging of packages by storing information about enabled breakpoints.

The runtime engine is also responsible for establishing the connection to any external data source used by the package.

The Data Flow Engine

The SSIS data flow engine manages the tasks that move data between the source and destination. SSIS uses in-memory buffers for better performance. The Data Flow task provides and manages these buffers, calls the appropriate data flow components, and provides transformations to modify and cleanse data. The Data Flow task is a set of connected data flow components consisting of data adapters and pipelines. These are the different data flow components (see Figure 12.2):

  • Source adapters to extract the data Source adapters are usually the first components in a data flow and do not usually have an input but can have one or more outputs.

  • Transformations These are the pipeline processes that are used to modify and cleanse data. They have an input and output to receive data from upstream and send processed data to downstream. Transformations can be synchronous or asynchronous. For synchronous transformation task, every input has an associated output; in an asynchronous task, there can be multiple outputs for a single input.

  • Destination adapters These are used to connect and store data in the data sources. They have at least one input and are usually the last component in the graph.

  • Pipeline paths The components are connected together by pipeline paths. Paths connect the output of one component to the input of the second. They are different from precedence constraints because they only connect the components and do not constrain the next data flow component.

Figure 12.2. The Data Flow task is a set of connected data flow components consisting of source adapters, transforms, destination adapters, and pipelines.


As shown in Figure 12.2, the data flow task can contain one or more source adapters to read the data, one or more transformations to process and cleanse the data, and one or more destinations to store the data.

The data flow engine manages interdependency between the different components of a data flow task and also supports parallel execution of the non-connected tasks. The data flow engine supports merging, splitting, and multicasting of data in a data flow and can also be used to redirect an error to a destination output.




Microsoft SQL Server 2005(c) Changing the Paradigm
Microsoft SQL Server 2005: Changing the Paradigm (SQL Server 2005 Public Beta Edition)
ISBN: 0672327783
EAN: 2147483647
Year: 2005
Pages: 150

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net