Package Execution Principles | Expert SQL Server 2005 Integration Services (Programmer to Programmer)

Scalability is also a matter of designing a package execution strategy that minimizes resource impact (memory, processor, and I/O capabilities) on the servers that need resources for other purposes or have little resources, and leveraging the servers that have resource availability or that can handle the impact load from package execution. When it comes to where a package should be executed, there is no absolute answer. However, there are general principles that can direct one architecture design over another.

Package Storage Location versus Execution Location

First of all, when it comes to running a package, there is a difference between where a package is run and where that package is stored. A package can be stored as a file and put in a file system folder, or a package can be loaded into the MSDB system database in SQL Server 2005. Either way, when the package is executed, the storage location is merely where the metadata of that package lives. The package is loaded from that source location through an execution method and run on the machine where the execution is kicked off. In other words, if you are running a package through the command line DTExec, the package will run on the machine where DTExec is called, not the place where the SSIS package is stored. Figure 12-16 shows the storage location server on the left and the execution server on the right. The package is executed on the server on the right, where the resources are available.

image from book
Figure 12-16: Package storage and execution location

When it comes to knowing on which servers to install the Integration Services service, it’s simply a matter of identifying the servers that will be running the packages, not necessarily the servers that are storing the packages, unless they will also be running packages.

SSIS packages can be executed in several ways, the most common being the command line executable DTExec or through SQLAgent. Wherever the command line is executed or the SQLAgent job is started, that is where the package will be loaded and run. The same principle applies to executing a package through the debugger in BIDS, or running a package with DTExecUI. When you run the package through these UI mechanisms, the local resources on the machine (perhaps a development machine or laptop) will be used to handle the execution. The following sections look at the impact of a package that contains at least one data flow on the source, destination, and execution machines.

Execute SQL Task and Bulk Insert Task Execution

Although a package may be run on any machine with the SSIS service (or really, that has the SSIS executables and DLLs), this isn’t necessarily the place where all the work is being done in the package. For example, if you run an Execute SQL Task in a package and kick the package off on a different server than where the Execute SQL Task connection is defined, then the SQL statement is run where the connection is configured, not on the SSIS execution machine. To be sure, the workflow coordination will still be handled on your SSIS execution machine, but the actual SQL code would be run on a different machine.

For the Execute SQL Task and Bulk Insert Task, the SQL code or BCP command is executed on the machine that the connection specifies. This is different from the Data Flow Task, which runs on the machine where the package is executed.

Package Execution and the Data Flow

For your packages that have data flows (which is probably most of your packages), you should understand what happens to the data based on where you execute that package (with the embedded data flow). Additionally, understanding where the data flow execution impact will be dictates where you decide to run your packages.

The data flow impact on the package execution server involves the resources needed to manage the data buffers, the data conversion requirements as data is imported from sources, the memory involved in the lookup cache, the temporary memory and processor utilization required for the Sort and Aggregate transformations, and so on. Essentially, any transformation logic contained in the data flows will be handled on the server where the package is executed.

The following examples are common configurations for where data is sourced, the destination location, and where packages are executed. Obviously, data flows can be varied and complex with multiple sources and destinations each, so this simplification provides the framework with single-source locations and single-destination locations.

Packages Executed on the Source or Destination Servers

The most common example is when a package (that contains a data flow) is executed on either the source or destination server, assuming they are separate.

Source Server Package Execution

Figure 12-17 shows the data path and impact on the environment when the package is executed on the machine where the source data is located.

image from book
Figure 12-17: Package execution on source server

The source server will both provide the extracted data and handle the data flow transformation logic, and the destination server will require any data load overhead such as disk I/O for files or database inserts or index reorganization.

The benefits of this approach include the following:

There is decreased impact on the destination server, where potential users are querying.
Data flow buffers are loaded rapidly, given that the location of the source files and package execution is local and involves no network I/O.
The impact on the destination server is limited, which is useful for destination servers that have 24/7 use or the SSIS process runs often.

Following are the drawbacks of this approach:

The impact on the source server’s resources, which may affect applications and users on the source server
Potential reduced performance of the data flow destination adapter and the inability to use the SQL Destination adapter, which requires the package be executed on the same server as the package

Destination Server Package Execution

Similar to the impact when a package is run on a source server, running a package on the destination server, as Figure 12-18 demonstrates, will have similar benefits and drawbacks, just reversed.

image from book
Figure 12-18: Package execution on destination server

Following are benefits of executing a package on a destination server:

Limited impact on your source server if it is running other critical tasks
Potential performance benefits for data inserts, especially since the SQL Destination component can now be used
Licensing consolidation if your destination server is also running SQL Server 2005

One drawback of this approach is that it has a heavy impact on the destination server, which may affect users querying the server.

This approach is very useful if you have users querying and using the destination during the day, and your SSIS processing requirements can be handled through nightly processes.

Stand-alone SSIS Servers

An alternate execution option is to run your SSIS packages on a second or third server.

In Figure 12-19, an SSIS package is executed on a second server, and in this diagram, both the source and destination are on the same machine.

image from book
Figure 12-19: Package execution on secondary server

As you can see, this is not the ideal scenario, since the data would need to travel over the network to the second server, be processed, and then travel back to the same original machine, creating potentially high network utilization, depending on the volume. However, it would reduce the resource impact on the data source and destination server.

It makes more sense to use a stand-alone SSIS server if your sources and destinations are not on the same physical machines. Figure 12-20 highlights this architecture.

image from book
Figure 12-20: Package execution on tertiary server

In this case, the impact on both the source and destination machines is reduced because the SSIS server would handle the data flow transformation logic. This architecture also provides a viable SSIS application server approach, where the machine can handle all the SSIS processing packages no matter where the data is coming from and going to. The drawbacks to this approach are in the ability to optimize the source extraction and destination import, increased network I/O (since the data has to travel over the wire two times), as well as licensing.

Distributed Package Execution

No matter where a package is stored, it is loaded and run on the server that invokes the execution. This means that if you need to achieve distributed execution for your packages, you must have a mechanism for kicking off the packages on different servers. Distributed execution just means that you are implementing a scale-out approach to your SSIS architecture. This can be very valuable for achieving large scalability that can expand. It can also reduce hardware costs, considering that scale-up servers are very expensive.

A distributed execution needs a central server that houses the packages and has the application that coordinates the execution on the remote machines. Each remote server that will be performing the executions needs a scheduling agent that runs the packages. Figure 12-21 shows what this server architecture would look like.

image from book
Figure 12-21: Distributed package execution

This scheduling server can use a variety of mechanisms that actually kick off packages on other servers with associated applications, often called agents. Here are a few of the application and agent possibilities:

SQL Server Agent- The most common approach to distributed execution is using SQL Agent and either leveraging the master-child SQL Agent capabilities, or just calling jobs to kick off on remote servers with the sp_start_job stored procedure. SQL Agent jobs are asynchronous, so you would lose the ability to know when the remote job is complete, but it makes for an easy-to-implement solution.
SQL Procedures with the Execute SQL Task- When you run the system extended procedure, xp_cmdshell, you are able to kick off an executable on the server where the stored procedure is run. By using this procedure to call a command line with DTExec, you could have a master SSIS package on the central server that is running several Execute SQL Tasks with connections to remote SQL Servers. Each Execute SQL Task could run the xp_cmdshell, and the package would actually run on the remote server. The xp_cmdshell is synchronous, so the Execute SQL Task would wait until the child package was finished executing on the remote server before completing. This would enable control flow constraints to be used. Security must be evaluated with this choice.
Application Programming- Packages can be run programmatically by using the load and execute methods in the object model. Therefore, it is possible to create applications on the central server and remote server that communicate and coordinate package execution, such as leveraging .NET remoting technologies.
Queuing Technologies- MSMQ or SQL Server 2005 Service Broker can also be a mechanism to execute packages on remote servers. Capturing both messages that define the package to execute and the execution details can inform agents that run the packages.