Memory Buffer Architecture


A fundamental understanding of the memory management architecture in SSIS will help you make intelligent choices about how to help SSIS move data more efficiently through the pipeline. Some transformations require data to be copied or moved from one buffer to another, while others do not. Because some transformations depend on the output of others, this can create blocks and bottlenecks in the process. Some operations require that a preceding transformation completely finish its work before they can begin to process any rows; these are said to be blocking transformations (the Sort and Aggregate transformations are examples). Other transformations hold rows temporarily as they work, but rows can begin to flow to the next transformation before the entire input has been processed; in this case, the operation of the first transformation only partially blocks the second (the Merge Join transformation behaves this way). Finally, there can be data flows and transformations in the package that are not dependent on each other at all and do not block other transformations in the data flow.

Buffer Usage

Behind the scenes, the Data Pipeline engine uses a buffer-oriented architecture to load and manipulate data sets in memory efficiently. The benefit of this in-memory processing is that you do not need to copy and stage data physically at each step of the data integration. Rather, the Data Pipeline engine manipulates data as it is transferred from source to destination.

Buffers are based on design-time metadata. The width of a row determines the size of the buffer. Because buffer space for a data flow task is allocated at a specific size, the number of rows that will fit into a buffer is determined by the number of bytes required for each row. In other words, the smaller the rows, the more rows that can fit into the buffer, and the greater the efficiency of the overall process. Two factors affect row size:

  • Number of columns

  • Width of each column

Column width is determined by the data type and the number of bytes assigned to fixed-width string types.

Consider the following scenario. Say that a table contains twelve columns consisting of five Int types, three DateTime types, and four 100-byte strings, for a total of 1,072 bytes per row. Using the default buffer size of ten megabytes, this means that the maximum number of records the buffer can handle is less than ten thousand. If only three columns were actually needed for the transformation (perhaps one of each data type in this example), the number of rows the buffer could accommodate increases to nearly forty thousand.
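
To put numbers on this, rows per buffer is simply the buffer size divided by the row width. The T-SQL below is a minimal sketch of that arithmetic, assuming the data flow task's default DefaultBufferSize of 10,485,760 bytes (ten megabytes) and the 1,072-byte row width from the scenario; trimming the row to only the needed columns shrinks the row width and raises the row count accordingly.

    -- Rows per buffer = buffer size / row width (illustrative arithmetic only).
    DECLARE @BufferSizeBytes int;
    DECLARE @RowWidthBytes int;
    SET @BufferSizeBytes = 10485760;   -- default DefaultBufferSize (10 MB)
    SET @RowWidthBytes = 1072;         -- row width from the scenario above
    SELECT @BufferSizeBytes / @RowWidthBytes AS RowsPerBuffer;   -- about 9,781 rows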

As a rule, select only the columns you need from data sources and assign conservative data types whenever possible. This is a simple matter of specifying columns explicitly in a SQL Select statement. The default lookup query for a data source uses Select *, because at design time the package designer gives you all available columns to choose from for joins and pass-through. After you have identified the required columns, modify the query to select only the columns that are actually necessary to support the data flow operations.
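
For example, if only three columns are needed downstream, the trimmed source query might look like the following; the table and column names here are hypothetical and stand in for your own schema.

    -- Instead of the default Select * from the source table,
    -- name only the columns the data flow actually uses.
    SELECT CustomerKey, OrderDate, SalesAmount
    FROM dbo.FactSales;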

As this data flows through the pipeline, SSIS attempts to reuse existing buffers as much as possible when additional operations are performed, rather than copying data into new ones. How buffers are used and reused depends on the types of transformations that you use in the solution.



