Turnkey Settings for
Speedup
In this section, you'll find a collection of
straightforward methods and hints for data flow speedup. The Data
Flow Task provides a number of settings and options you can use to
eliminate unnecessary work or to diminish the amount of data that
must flow through it. Some of these methods involve simple property
settings; others are a bit more complex. In addition, you can do a
number of things in the environment outside the Data Flow Task to
speed up data flow, which are mentioned in this section as well.
Although these
methods
are presented as a series of "do's and
don'ts," understand that not every tip applies to every situation
and that the items presented here are only guidelines, not hard
rules that must always be followed. Knowing when to apply which
rules is a matter of experience grown from testing and measuring
the results.
Eliminating
Unnecessary Work
Often, you might find that the best data flow
optimization you can achieve is to simply eliminate work the Data
Flow Task must do. Eliminating work is almost always preferable to
making the Data Flow Task faster because you get a 100% speedup for
every bit of work you eliminate. The following tips describe ways
to eliminate work from the Data Flow Task.
Optimize the
Sources
Most of the optimizations for eliminating
unnecessary work have something to do with eliminating the data
coming from the sources. Following are a few ways to eliminate
unnecessary data from entering into the data flow.
Remove Unneeded
Columns
Unneeded columns are columns that never get
referenced in the data flow. The Execution Engine emits warnings
for unused columns, so they are easy to identify. This makes the
buffer rows narrower. The narrower the row, the more rows that can
fit into one buffer and the more
efficiently
the rows can be
processed
. Binary Large Objects (BLOBs) are particularly burdensome
to the Data Flow Task and should be eliminated if at all possible.
Use the queries in the source adapters to eliminate unnecessary
columns.
Use a SQL Select
Statement to Retrieve Data from a View
Avoid using the
Table
or view
access mode in the OLE DB Source Adapter. It is not
as performant as using a
SELECT
statement because the
adapter opens a rowset based on the table or view. Then, it calls
OpenRowset in the validation phase to retrieve column metadata and
later in the execution phase to read out the data.
Testing has shown that using a
SELECT
statement can be at least an order of magnitude faster because the
adapter issues the specified command directly through the provider
and fetches the data using sp_prepare without executing the
command, avoiding the extra round-trip and a possibly inappropriate
cached query plan.
Optimize Source
Queries
Using traditional query optimization techniques,
optimize the source adapter SQL query. SSIS doesn't optimize the
query on your
behalf
, but
passes
it on verbatim.
Use Flat File and
Other File Sources
Retrieving data from file sources
presents
its
own set of performance challenges because the data is typically in
some format that requires conversion. For example, the Jet Provider
only supports a limited set of data types when reading Excel files
and flat-file data is always of type
String
until
converted. The following sections discuss a few hints for how to
eliminate unnecessary data flow work.
Combine Adjacent
Unneeded Flat-File Columns
In fixed-width flat files, combine unused
columns into one. This eliminates the parsing necessary for the
unused columns.
Leave Unneeded
Flat-File Columns as Strings
Flat-file columns are all strings when first
read from the file. Converting strings to dates, integers, or even
Unicode strings is costly. Leave columns as strings unless a
conversion is
absolutely
necessary.
Eliminate Hidden
Operations
For the most part, the Data Flow Task is
explicit and obvious about what it is doing. For example, in the
Derived Column transformation, you must cast column and variable
values explicitly. However, there are some
components
that perform
hidden or automatic conversions. For example, the Flat File Source
Adapter attempts to convert external column types to their
associated output column types. Use the Advanced Editor to explore
each column type so that you know where such conversions occur.
Only Parse or
Convert Columns When Necessary
Every row with a type conversion costs CPU
cycles. Eliminate unnecessary conversions. Reorganize the data flow
to eliminate the Type Conversion transform, if possible. Even
better, if possible, modify the source column data type to match
the type needed in the data flow. Also, if the conversion need not
be
locale-sensitive
, use
FastParse
.
Use the
FastParse
Option in Flat File Source Adapter
FastParse
is a set of optimized parsing
routines that
replaces
some of the SSIS
locale-specific
parsing
functions.
FastParse
only supports the more common date
format representations and does not use locale when parsing. See
Books Online for other
FastParse
limitations.
To
turn
on
FastParse
, follow these
steps:
|
1.
|
Open
the Flat File Source Adapter or the Data
Conversion transform in the Advanced Editor.
|
|
2.
|
Select the Input and Output Properties
tab.
|
|
3.
|
Open the Flat File Source Output node in the
tree on the left.
|
|
4.
|
Open the Output Columns.
|
|
5.
|
Select one of the columns.
|
|
6.
|
Note the
FastParse
custom property in
the property grid.
|
Eliminate Unneeded
Logging
Logging is useful for debugging and
troubleshooting. When developing packages, you should pull out the
stops. But, when deploying completed packages to production, be
mindful and careful about the log entries you leave enabled and the
log provider you use. Notably,
OnPipelineRowsSent
is
somewhat verbose. If there is any concern about the write time for
logging, you might want to consider turning off logging or at least
eliminating the chatty log entries. This is something you need to
balance against the need to monitor and troubleshoot packages.
Turning off logging might be good for performance, but then you
don't get any logs to diagnose when problems arise. You'll have to
find the right balance. Logging
generally
doesn't kill performance.
You just need to be aware of the potential performance
problems.
Optimizing the
Data Flow
The following are some suggestions for tuning
the data flow.
Use Indexes
This should probably go without saying, but when
accessing data, make sure you use indexes to speed up the process.
For example, in certain cases, the Lookup transform can benefit
from having an index on the reference table.
Perform Set-based
Operations
When possible, perform set-based operations in
SQL Server. For example, SQL Server can generally sort faster than
the
Sort
transform,
especially
if the table being sorted is
indexed. Set-based operations, such as joins, unions, and selects
with
ORDER BY
and
GROUP BY
, tend to be faster on
the server.
Be Mindful of
Transforms with Internal File IO
In Chapter 20, "The Stock Data Flow Components",
some transforms have comments to the effect that they perform
internal file Input/Output. For example, the Raw Source and
Destination, Import/Export Column transforms, Flat File Source and
Destination Adapters, and Excel File Source and Destination
Adapters are all directly impacted by the performance of the file
system. File IO isn't always a bottleneck, but when combined with
low memory conditions,
causing
spooling, or with other disk-
intense
operations, it can significantly impact performance. Components
that read and write to disk should be scrutinized
carefully
and, if
possible, configured to read and write to dedicated hard drives.
Look for ways to optimize the performance of the hard
drives
using
RAID, defragmentation, and/or correct partitioning.
Move to a 64-Bit
Platform
As you can see, the Data Flow Task can be very
memory
intensive
. With a few blocking transforms and a large source
dataset, it's easily possible to run into the memory limitations of
32-bit systems. The benefit of moving to a 64-bit platform is
simplemore memory.
Monitor
Memory-Intensive Transforms
If your package is memory bound, look for ways
to eliminate the memory-intensive transforms or shift them to
another package. Some transforms, such as the Aggregate, Lookup,
and Sort transforms, use a lot of memory. The Sort transform, for
example, holds all buffers until the last buffer and then releases
the sorted rows. If memory runs low, these transforms might spool
to disk, causing expensive hard page faults.
Monitor Other
Memory-Intensive Applications
When running on the same machine as other
memory-intensive applications, the data flow can become memory
starved, even if there is plenty of memory on the machine. This is
typically true when running packages on the same machine with SQL
Server. SQL Server is
aggressive
about using memory. You can use
the
sp_configure
system stored procedure to instruct SQL
Server to limit its memory usage.
Eliminate
Unnecessary Operations
If you don't need to process a column, keep the
column in the original format in which you find it. For example,
don't convert strings to dates if you never do any date-
related
transformations.
Pare Down the
Lookup Reference Data
The default lookup query for the Lookup
transform is
'SELECT * FROM'
Select the option to use the results of a query
for the reference data. Generally, the reference data should only
contain the key and the desired lookup column. So, for a dimension
table lookup, that would be the natural key and the surrogate
key.
Use Lookup Partial
or Full Cache Mode
Depending on the requirements and the data, you
should choose one of these two modes to speed up the Lookup
transform. Partial cache mode is useful when the incoming data is
repetitive and only references a small percentage of the total
reference table. Full cache mode is useful when the reference table
is relatively small and the incoming data references the full
spectrum of reference table rows.
Consider
Transforms with Asynchronous Outputs
As the mantra goes, "Asynchronous transforms
aren't good because they copy memory." In practice, however, it's
not always so conclusive. Recall the earlier discussion about
asynchronous outputs being the start of a new execution tree.
Because introducing a new execution tree provides another worker
thread to do work, in some cases, introducing a transform with an
asynchronous output can actually benefit data flow performance. For
example, introducing an asynchronous output into a data flow
running on a machine with only one CPU will likely slow down the
data flow because of the memory copying. However, introduce
asynchronous outputs into a data flow with only two execution trees
running on a machine with eight CPUs, and you'll likely see a
performance gain because more CPUs can be enlisted in processing
the data flow. There's no magic pill here. This is one of those
cases where you have to set, test, and measure.
Eliminate Lookup,
Aggregate, and Sort Transforms
Although performant for what they do, and
important when absolutely necessary, these transforms invariably
degrade data flow performance. If possible, eliminate them from
your data flows. Sort and aggregate data at the source, and use the
Merge Join transform instead of the Lookup transform, if
possible.
Set
BufferTempStoragePath
and
BlobTempStoragePath
to
Fast Drives
If the data flow does need to spool to disk,
these two locations point to where the Execution Engine stores the
memory. In truth, if you're swapping out to these locations, you
should try to decrease the memory load and eliminate the swapping.
However, that's not always possible, so both properties should
point to fast drives that are in low demand. Ideally,
BufferTempStoragePath
and
BlobTempStoragePath
should be on separate spindles.
Increase
DefaultBufferMaxSize
and
DefaultBufferMaxRows
Increasing the values for these two properties
can boost performance by
decreasing
the number of buffers moving
through the data flow. However, you should avoid increasing the
values too much to the point where the Execution Engine starts
swapping out buffers to disk. That would defeat the purpose.
Use Match Indexes
for Repeat Data-Cleansing Sessions
On
subsequent
runs, the Fuzzy Lookup transform
can either use an existing match index or create a new index. If
the reference table is static, the package can avoid the
potentially
expensive process of rebuilding the index for repeat
package runs. If you choose to use an existing index, the index is
created the first time that the package runs. If multiple Fuzzy
Lookup transformations use the same reference table, they can all
use the same index. To reuse the index, the lookup operations must
be identical; the lookup must use the same columns. You can
name
the index and select the connection to the SQL Server database that
saves the index. Doing this can save substantial startup time that
would be
wasted
rebuilding the index.
Implement Parallel
Execution
Both the Execution Engine for the Data Flow Task
and the Execution Engine for the Control Flow are multithreaded.
The following settings can be used to take full advantage of
Execution Engine threading.
Use the
EngineThreads
Property
There is a property on the Data Flow Task called
EngineThreads
, which controls the number of worker threads
the Execution Engine uses. The default for this property is
5
, which means that five worker threads and five source
threads will be used. However, as you now know, by simply adding a
few components, data flow thread requirements will quickly exceed
the default. If the
EngineThreads
value is less than the
number of worker threads, or execution trees, the Data Flow Task
uses one thread for multiple execution trees. This isn't always a
bad thing. You might notice very little difference in performance
with different
EngineThreads
settings depending on the
package and the environment. On the other hand,
boosting
the number
of threads can drastically improve performance for some data
flowsespecially on high-end, multiprocessor machines. The important
point here is to be aware of how many threads the data flow
naturally needs and try to keep the
EngineThreads
value
reasonably close to it.
Set
MaxConcurrentExecutables
Recall the description of this property from
Chapter 7, "Grouping Control Flow with Containers". It is the
number of threads the control flow or runtime execution engine will
use. The default is
4
and the maximum is
128
. If
you set the value to
-1
, the number of threads used will
be the number of processors on the machine plus 2. For
hyperthreaded enabled CPUs, the number of processors used is the
logical processor count. If there are multiple Data Flow Tasks in
the Control Flow, for example 10, and
MaxConcurrentExecutables
is set to 4, only four of the
Data Flow Tasks will execute
simultaneously
. Set, test, and measure
various value combinations of this property and the
EngineThreads
property to determine the optimal setting
for your packages.
Handle Errors
Instead of handling errors in the data flow, you
can choose to ignore them and then use the Conditional Split
transform to filter through them, moving the error rows to a
different output where you can later analyze and process them in a
separate process. Handling errors in the data flow can be
expensive. If the number of errors is typically low, then there's
little worry. However, if you have a lot of errors, for example,
when using the Lookup transform, using the Conditional Split
transform is the way to go.
Scrutinize
Components with Row-Intensive Operations
All the components perform operations on a
row-by-row basis, but there are some that perform expensive
operations such as the OLE DB Command transform, which executes a
query for every row in the buffer. Although useful and even
indispensable
at times, it should be avoided if at all possible.
Other transforms with row-intensive operations are as
follows
:
-
Fuzzy Lookup
-
Data Conversion
-
Export/Import Column
-
Term
Lookup
Insert
Performance
Often the greatest data flow performance
inhibitor is the sustained write speed of the destination hard
drives. But other elements impact destinations as well. Depending
on where you are inserting data, you can speed up the
inserts
in
the following ways.
Use the SQL Server
Destination Adapter
If running on the same machine as SQL Server,
use the SQL Server Destination Adapter instead of OLE DB
Destination Adapter. Tests show a
marked
performance gain with the
SQL Server Destination Adapter over the OLE DB Destination Adapter,
especially when loading into empty nonindexed tables. This is
almost always a slam-dunk performance improvement.
Set the Commit
Size
The Commit Size option allows you to set a
larger buffer commit size for loading into SQL Server. This setting
is only available in the OLE DB Destination Adapter when using the
SQL Server OLEDB driver. A setting of zero indicates that the
adapter should attempt to commit all rows in a single batch.
Turn on Table
Lock
This option is available in the OLE DB
Destination Editor. Selecting Table Lock also enables a fast load,
which
tells
the adapter to use the
IRowsetFastload
bulk
insert interface for loading.
Note
Fastload delivers much better performance;
however, it does not provide as much information if there is an
error. Generally, for development, you should turn it off and then
turn it on when deploying to production.
Disable
Constraints
This option is also available in the OLE DB
Destination Editor by
deselecting
the Check Constraints option.
Use Minimal
Logging
Using the Simple or Bulk-Logged Recovery model,
bulk import operations are minimally logged, which can
significantly decrease insert times. To learn more about minimal
logging, search MSDN or Books Online for "
minimally
logged
operations."
Disable
Indexes
Depending on the amount of time you save on the
load, it might benefit you to disable indexes. The choice depends
on how much you gain versus how much you lose. If disabling the
index saves you 20 minutes of load time and rebuilding the index
takes 10, you've saved yourself 10 minutes. However, in practice,
the index rebuild time often takes longer than the amount of time
you save by disabling it.
|