Turnkey Settings for Speedup

In this section, you'll find a collection of straightforward methods and hints for data flow speedup. The Data Flow Task provides a number of settings and options you can use to eliminate unnecessary work or to diminish the amount of data that must flow through it. Some of these methods involve simple property settings; others are a bit more complex. In addition, you can do a number of things in the environment outside the Data Flow Task to speed up data flow, which are mentioned in this section as well. Although these methods are presented as a series of "do's and don'ts," understand that not every tip applies to every situation and that the items presented here are only guidelines, not hard rules that must always be followed. Knowing when to apply which rules is a matter of experience grown from testing and measuring the results.

Eliminating Unnecessary Work

Often, you might find that the best data flow optimization you can achieve is to simply eliminate work the Data Flow Task must do. Eliminating work is almost always preferable to making the Data Flow Task faster because you get a 100% speedup for every bit of work you eliminate. The following tips describe ways to eliminate work from the Data Flow Task.

Optimize the Sources

Most of the optimizations for eliminating unnecessary work have something to do with eliminating the data coming from the sources. Following are a few ways to eliminate unnecessary data from entering into the data flow.

Remove Unneeded Columns

Unneeded columns are columns that never get referenced in the data flow. The Execution Engine emits warnings for unused columns, so they are easy to identify. This makes the buffer rows narrower. The narrower the row, the more rows that can fit into one buffer and the more efficiently the rows can be processed. Binary Large Objects (BLOBs) are particularly burdensome to the Data Flow Task and should be eliminated if at all possible. Use the queries in the source adapters to eliminate unnecessary columns.

Use a SQL Select Statement to Retrieve Data from a View

Avoid using the Table or view access mode in the OLE DB Source Adapter. It is not as performant as using a SELECT statement because the adapter opens a rowset based on the table or view. Then, it calls OpenRowset in the validation phase to retrieve column metadata and later in the execution phase to read out the data.

Testing has shown that using a SELECT statement can be at least an order of magnitude faster because the adapter issues the specified command directly through the provider and fetches the data using sp_prepare without executing the command, avoiding the extra round-trip and a possibly inappropriate cached query plan.

Optimize Source Queries

Using traditional query optimization techniques, optimize the source adapter SQL query. SSIS doesn't optimize the query on your behalf, but passes it on verbatim.

Use Flat File and Other File Sources

Retrieving data from file sources presents its own set of performance challenges because the data is typically in some format that requires conversion. For example, the Jet Provider only supports a limited set of data types when reading Excel files and flat-file data is always of type String until converted. The following sections discuss a few hints for how to eliminate unnecessary data flow work.

Combine Adjacent Unneeded Flat-File Columns

In fixed-width flat files, combine unused columns into one. This eliminates the parsing necessary for the unused columns.

Leave Unneeded Flat-File Columns as Strings

Flat-file columns are all strings when first read from the file. Converting strings to dates, integers, or even Unicode strings is costly. Leave columns as strings unless a conversion is absolutely necessary.

Eliminate Hidden Operations

For the most part, the Data Flow Task is explicit and obvious about what it is doing. For example, in the Derived Column transformation, you must cast column and variable values explicitly. However, there are some components that perform hidden or automatic conversions. For example, the Flat File Source Adapter attempts to convert external column types to their associated output column types. Use the Advanced Editor to explore each column type so that you know where such conversions occur.

Only Parse or Convert Columns When Necessary

Every row with a type conversion costs CPU cycles. Eliminate unnecessary conversions. Reorganize the data flow to eliminate the Type Conversion transform, if possible. Even better, if possible, modify the source column data type to match the type needed in the data flow. Also, if the conversion need not be locale-sensitive, use FastParse.

Use the `FastParse` Option in Flat File Source Adapter

FastParse is a set of optimized parsing routines that replaces some of the SSIS locale-specific parsing functions. FastParse only supports the more common date format representations and does not use locale when parsing. See Books Online for other FastParse limitations.

To turn on FastParse, follow these steps:

1.	Open the Flat File Source Adapter or the Data Conversion transform in the Advanced Editor.
2.	Select the Input and Output Properties tab.
3.	Open the Flat File Source Output node in the tree on the left.
4.	Open the Output Columns.
5.	Select one of the columns.
6.	Note the `FastParse` custom property in the property grid.

Eliminate Unneeded Logging

Logging is useful for debugging and troubleshooting. When developing packages, you should pull out the stops. But, when deploying completed packages to production, be mindful and careful about the log entries you leave enabled and the log provider you use. Notably, OnPipelineRowsSent is somewhat verbose. If there is any concern about the write time for logging, you might want to consider turning off logging or at least eliminating the chatty log entries. This is something you need to balance against the need to monitor and troubleshoot packages. Turning off logging might be good for performance, but then you don't get any logs to diagnose when problems arise. You'll have to find the right balance. Logging generally doesn't kill performance. You just need to be aware of the potential performance problems.

Optimizing the Data Flow

The following are some suggestions for tuning the data flow.

Use Indexes

This should probably go without saying, but when accessing data, make sure you use indexes to speed up the process. For example, in certain cases, the Lookup transform can benefit from having an index on the reference table.

Perform Set-based Operations

When possible, perform set-based operations in SQL Server. For example, SQL Server can generally sort faster than the Sort transform, especially if the table being sorted is indexed. Set-based operations, such as joins, unions, and selects with ORDER BY and GROUP BY, tend to be faster on the server.

Be Mindful of Transforms with Internal File IO

In Chapter 20, "The Stock Data Flow Components", some transforms have comments to the effect that they perform internal file Input/Output. For example, the Raw Source and Destination, Import/Export Column transforms, Flat File Source and Destination Adapters, and Excel File Source and Destination Adapters are all directly impacted by the performance of the file system. File IO isn't always a bottleneck, but when combined with low memory conditions, causing spooling, or with other disk-intense operations, it can significantly impact performance. Components that read and write to disk should be scrutinized carefully and, if possible, configured to read and write to dedicated hard drives. Look for ways to optimize the performance of the hard drives using RAID, defragmentation, and/or correct partitioning.

Move to a 64-Bit Platform

As you can see, the Data Flow Task can be very memory intensive. With a few blocking transforms and a large source dataset, it's easily possible to run into the memory limitations of 32-bit systems. The benefit of moving to a 64-bit platform is simplemore memory.

Monitor Memory-Intensive Transforms

If your package is memory bound, look for ways to eliminate the memory-intensive transforms or shift them to another package. Some transforms, such as the Aggregate, Lookup, and Sort transforms, use a lot of memory. The Sort transform, for example, holds all buffers until the last buffer and then releases the sorted rows. If memory runs low, these transforms might spool to disk, causing expensive hard page faults.

Monitor Other Memory-Intensive Applications

When running on the same machine as other memory-intensive applications, the data flow can become memory starved, even if there is plenty of memory on the machine. This is typically true when running packages on the same machine with SQL Server. SQL Server is aggressive about using memory. You can use the sp_configure system stored procedure to instruct SQL Server to limit its memory usage.

Eliminate Unnecessary Operations

If you don't need to process a column, keep the column in the original format in which you find it. For example, don't convert strings to dates if you never do any date-related transformations.

Pare Down the Lookup Reference Data

The default lookup query for the Lookup transform is

'SELECT * FROM'

Select the option to use the results of a query for the reference data. Generally, the reference data should only contain the key and the desired lookup column. So, for a dimension table lookup, that would be the natural key and the surrogate key.

Use Lookup Partial or Full Cache Mode

Depending on the requirements and the data, you should choose one of these two modes to speed up the Lookup transform. Partial cache mode is useful when the incoming data is repetitive and only references a small percentage of the total reference table. Full cache mode is useful when the reference table is relatively small and the incoming data references the full spectrum of reference table rows.

Consider Transforms with Asynchronous Outputs

As the mantra goes, "Asynchronous transforms aren't good because they copy memory." In practice, however, it's not always so conclusive. Recall the earlier discussion about asynchronous outputs being the start of a new execution tree. Because introducing a new execution tree provides another worker thread to do work, in some cases, introducing a transform with an asynchronous output can actually benefit data flow performance. For example, introducing an asynchronous output into a data flow running on a machine with only one CPU will likely slow down the data flow because of the memory copying. However, introduce asynchronous outputs into a data flow with only two execution trees running on a machine with eight CPUs, and you'll likely see a performance gain because more CPUs can be enlisted in processing the data flow. There's no magic pill here. This is one of those cases where you have to set, test, and measure.

Eliminate Lookup, Aggregate, and Sort Transforms

Although performant for what they do, and important when absolutely necessary, these transforms invariably degrade data flow performance. If possible, eliminate them from your data flows. Sort and aggregate data at the source, and use the Merge Join transform instead of the Lookup transform, if possible.

Set `BufferTempStoragePath` and `BlobTempStoragePath` to Fast Drives

If the data flow does need to spool to disk, these two locations point to where the Execution Engine stores the memory. In truth, if you're swapping out to these locations, you should try to decrease the memory load and eliminate the swapping. However, that's not always possible, so both properties should point to fast drives that are in low demand. Ideally, BufferTempStoragePath and BlobTempStoragePath should be on separate spindles.

Increase `DefaultBufferMaxSize` and `DefaultBufferMaxRows`

Increasing the values for these two properties can boost performance by decreasing the number of buffers moving through the data flow. However, you should avoid increasing the values too much to the point where the Execution Engine starts swapping out buffers to disk. That would defeat the purpose.

Use Match Indexes for Repeat Data-Cleansing Sessions

On subsequent runs, the Fuzzy Lookup transform can either use an existing match index or create a new index. If the reference table is static, the package can avoid the potentially expensive process of rebuilding the index for repeat package runs. If you choose to use an existing index, the index is created the first time that the package runs. If multiple Fuzzy Lookup transformations use the same reference table, they can all use the same index. To reuse the index, the lookup operations must be identical; the lookup must use the same columns. You can name the index and select the connection to the SQL Server database that saves the index. Doing this can save substantial startup time that would be wasted rebuilding the index.

Implement Parallel Execution

Both the Execution Engine for the Data Flow Task and the Execution Engine for the Control Flow are multithreaded. The following settings can be used to take full advantage of Execution Engine threading.

Use the `EngineThreads` Property

There is a property on the Data Flow Task called EngineThreads, which controls the number of worker threads the Execution Engine uses. The default for this property is 5, which means that five worker threads and five source threads will be used. However, as you now know, by simply adding a few components, data flow thread requirements will quickly exceed the default. If the EngineThreads value is less than the number of worker threads, or execution trees, the Data Flow Task uses one thread for multiple execution trees. This isn't always a bad thing. You might notice very little difference in performance with different EngineThreads settings depending on the package and the environment. On the other hand, boosting the number of threads can drastically improve performance for some data flowsespecially on high-end, multiprocessor machines. The important point here is to be aware of how many threads the data flow naturally needs and try to keep the EngineThreads value reasonably close to it.

Set `MaxConcurrentExecutables`

Recall the description of this property from Chapter 7, "Grouping Control Flow with Containers". It is the number of threads the control flow or runtime execution engine will use. The default is 4 and the maximum is 128. If you set the value to -1, the number of threads used will be the number of processors on the machine plus 2. For hyperthreaded enabled CPUs, the number of processors used is the logical processor count. If there are multiple Data Flow Tasks in the Control Flow, for example 10, and MaxConcurrentExecutables is set to 4, only four of the Data Flow Tasks will execute simultaneously. Set, test, and measure various value combinations of this property and the EngineThreads property to determine the optimal setting for your packages.

Handle Errors

Instead of handling errors in the data flow, you can choose to ignore them and then use the Conditional Split transform to filter through them, moving the error rows to a different output where you can later analyze and process them in a separate process. Handling errors in the data flow can be expensive. If the number of errors is typically low, then there's little worry. However, if you have a lot of errors, for example, when using the Lookup transform, using the Conditional Split transform is the way to go.

Scrutinize Components with Row-Intensive Operations

All the components perform operations on a row-by-row basis, but there are some that perform expensive operations such as the OLE DB Command transform, which executes a query for every row in the buffer. Although useful and even indispensable at times, it should be avoided if at all possible. Other transforms with row-intensive operations are as follows:

Fuzzy Lookup
Data Conversion
Export/Import Column
Term Lookup

Insert Performance

Often the greatest data flow performance inhibitor is the sustained write speed of the destination hard drives. But other elements impact destinations as well. Depending on where you are inserting data, you can speed up the inserts in the following ways.

Use the SQL Server Destination Adapter

If running on the same machine as SQL Server, use the SQL Server Destination Adapter instead of OLE DB Destination Adapter. Tests show a marked performance gain with the SQL Server Destination Adapter over the OLE DB Destination Adapter, especially when loading into empty nonindexed tables. This is almost always a slam-dunk performance improvement.

Set the Commit Size

The Commit Size option allows you to set a larger buffer commit size for loading into SQL Server. This setting is only available in the OLE DB Destination Adapter when using the SQL Server OLEDB driver. A setting of zero indicates that the adapter should attempt to commit all rows in a single batch.

Turn on Table Lock

This option is available in the OLE DB Destination Editor. Selecting Table Lock also enables a fast load, which tells the adapter to use the IRowsetFastload bulk insert interface for loading.

Note

Fastload delivers much better performance; however, it does not provide as much information if there is an error. Generally, for development, you should turn it off and then turn it on when deploying to production.

Disable Constraints

This option is also available in the OLE DB Destination Editor by deselecting the Check Constraints option.

Use Minimal Logging

Using the Simple or Bulk-Logged Recovery model, bulk import operations are minimally logged, which can significantly decrease insert times. To learn more about minimal logging, search MSDN or Books Online for "minimally logged operations."

Disable Indexes

Depending on the amount of time you save on the load, it might benefit you to disable indexes. The choice depends on how much you gain versus how much you lose. If disabling the index saves you 20 minutes of load time and rebuilding the index takes 10, you've saved yourself 10 minutes. However, in practice, the index rebuild time often takes longer than the amount of time you save by disabling it.