Understanding Data Flow Sources, Transformations, and Destinations


SSIS is commonly used to move data from one location or file format to another. SSIS reads data using Data Flow sources, changes or processes the data using transformations if needed, and then writes the data to its new location or file format using Data Flow destinations. Packages aren’t limited to just a single source or destination; complex packages might have multiple sources and/or destinations.

Worth repeating is that the Data Flow tab shows the workflow within a single Data Flow task. Packages can have more than one Data Flow task, each with its own self-contained workflow.

Sources

Every source except the Raw File source is paired with a connection manager to retrieve data from a particular data store or file format. The OLE DB source is a catchall that can be used against any data store that has an OLE DB provider, so you’re not limited to just text and Excel files. Here is the complete list of sources (a sample source query follows the list):

  • DataReader source

  • Excel source

  • Flat File source

  • Raw File source

    Note 

    Raw File sources are often used to move data between packages because they have less overhead than writing the same data to a table, which would require SQL Server to log the transaction so that it could be rolled back if necessary.

  • OLE DB source

  • XML source
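
A typical OLE DB source, for instance, pairs a connection manager with a table name or a SQL command. The query below is only a sketch, assuming the AdventureWorks sample database that ships with SQL Server 2005:

    -- Hypothetical source query for an OLE DB source
    SELECT ProductID, Name, ListPrice
    FROM Production.Product
    WHERE ListPrice > 0;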

Transforms

Transforms are the power tools of SSIS. They let you slice, dice, and reshape data in many different ways with no coding; you just set a few properties. The following list contains notes about the use of each transform to help you retain their capabilities (it’s a lot to remember!), and a few of the most SQL-like transforms are sketched in T-SQL after the list.

  • Aggregate   Applies aggregate functions such as SUM and AVERAGE, as well as GROUP BY.

  • Audit   Includes data about the environment; the options are ExecutionInstanceGuid, PackageID, PackageName, VersionID, ExecutionStartTime, MachineName, UserName, TaskName, and TaskID.

  • Character Map   Applies string functions such as uppercase, lowercase, and conversion from things such as half-width to full-width characters.

  • Conditional Split   Sends different rows to different outputs depending on the conditions set. A common example would be clean data going directly to the final destination while rows with missing elements are diverted for additional processing.

  • Copy Column   Creates a copy of a column (this could be done inside a SELECT statement, assuming you didn’t need to apply a transform such as Character Map to the column before you made the copy).

  • Data Conversion   Converts a column to a different data type and copies it to a new column.

  • Data Mining Query   Performs prediction queries against a data mining model.

  • Derived Column   You used this in the previous example; it uses an expression to create a new value that can replace an existing one or be added as a new column.

  • Export Column   Takes data from a column and writes it to a file. An example would be an employees table that has a Photo column; you could use this transform to create those images on disk as separate files using a naming convention you specify. It’s worth giving this one a test run to better see what it does.

  • Fuzzy Grouping   Identifies rows that may be duplicates; a score is used so you can customize how aggressive you want to be about filtering out duplicates.

  • Fuzzy Lookup   Finds the closest lookup in a table; it’s similar to the Lookup transform, but it doesn’t require an exact match.

  • Import Column   Basically the opposite of the Export Column transform; it reads files from disk and places their contents into rows during processing.

  • Lookup   Takes a column (ZIP codes, for example), looks each value up in a reference table, and returns related information.

  • Merge   Combines two sorted data sets into one data set; the two data sets need to have matching column types. Think of it as a specialized variation of the Union All transform. It could be used to bring back together results that were split using a Conditional Split transform.

  • Merge Join   This is similar to the Merge transform, but it is used to apply the various join types to two different data sources.

  • Multicast   Sends all its output to multiple destinations. A simple example would be to use one output to log the state of the data at that point while another output sends the data on for additional processing.

  • OLE DB Command   Executes a SQL statement for each row processed.

  • Percentage Sampling   Often used in data mining; returns a random subset of the total rows.

  • Pivot   Performs a standard pivot table–type operation just like you would do in Excel.

  • Row Count   Puts the count of rows processed into a variable for later use.

  • Row Sampling   Similar to the Percentage Sampling transform, only you specify the number of rows rather than the percentage.

  • Script Component   Executes Visual Basic .NET code on the rows that pass through.

  • Slowly Changing Dimension   You definitely want to use the wizard to configure this one. This is used when processing data for a data warehouse–type scenario where you need to maintain the history of changes but the changes are not frequent.

  • Sort   Sorts data either ascending or descending.

  • Term Extraction   Works only with English; it looks for nouns and noun phrases and returns them as a column, giving you a way to identify rows that contain certain words.

  • Term Lookup   Extracts terms and ends up with a list of terms and their frequency; this can be configured to be case sensitive. It has a lot of rules you’ll want to look at before employing.

  • Union All   Works just like UNION ALL in T-SQL; combines two or more inputs that have matching column types.

  • Unpivot   The reverse of a Pivot transform.
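
A few of the most SQL-like transforms map directly onto familiar T-SQL. You configure the transforms in the designer rather than writing these statements, but the result sets are equivalent; all table and column names in the sketches below are hypothetical. The Aggregate transform, grouping by Region and applying SUM and AVERAGE, behaves like this:

    -- Rough T-SQL equivalent of an Aggregate transform
    SELECT Region,
           SUM(SaleAmount) AS TotalSales,
           AVG(SaleAmount) AS AverageSale
    FROM dbo.Sales
    GROUP BY Region;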
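
The Lookup transform joining pipeline rows to a ZIP code reference table is roughly an inner join; by default an unmatched row fails the component unless you redirect such rows elsewhere:

    -- Rough T-SQL equivalent of a Lookup transform
    SELECT c.CustomerID, c.ZipCode, z.City, z.StateCode
    FROM dbo.Customers AS c
    INNER JOIN dbo.ZipCodes AS z
        ON z.ZipCode = c.ZipCode;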
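
Union All corresponds directly to UNION ALL, while Merge additionally requires both inputs to arrive sorted and preserves that order, much like adding an ORDER BY to the combined result:

    -- Rough T-SQL equivalent of a Merge (or a sorted Union All)
    SELECT CustomerID, Amount FROM dbo.CleanRows
    UNION ALL
    SELECT CustomerID, Amount FROM dbo.RepairedRows
    ORDER BY CustomerID;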
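
And the Pivot transform produces the same shape of result as the PIVOT operator introduced in SQL Server 2005:

    -- Rough T-SQL equivalent of a Pivot transform
    SELECT Region, [2004], [2005]
    FROM (SELECT Region, SaleYear, Amount FROM dbo.Sales) AS src
    PIVOT (SUM(Amount) FOR SaleYear IN ([2004], [2005])) AS p;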

Destinations

Destinations are the endgame, where you write your final data to a file or database table. You have a rich set of options to work with; the most commonly used are the SQL Server and Flat File destinations. Like sources, destinations are paired with connection managers. The following are the available destinations:

  • Data Mining Model Training destination   Passes data through the algorithms that train the final model.

  • DataReader destination   Similar in spirit to the Recordset destination; this one exposes the data flow’s output through an ADO.NET DataReader so it can be consumed from code outside the package.

  • Dimension Processing destination   Used to load and process an Analysis Services dimension.

  • Excel destination   Loads data into an Excel worksheet.

  • Flat File destination   Creates a variety of text file formats.

  • OLE DB destination   Loads data into any system that has an OLE DB provider.

  • Partition Processing destination   Loads and processes an Analysis Services partition.

  • Raw File destination   We’ve talked about raw files before; this is where you would create one (as opposed to reading from one in a source).

  • Recordset destination   Creates an in-memory ADO (not ADO.NET!) recordset, which would be pushed into a variable for use in code later.

  • SQL Server destination   Offers the same performance as BULK INSERT in T-SQL (sketched after this list), but you can transform the data as you append it to a SQL Server table.

  • SQL Server Mobile destination   Writes to a device running Pocket PC or a similar small operating system with a limited version of SQL Server.
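
For comparison with the SQL Server destination, this is what the equivalent plain T-SQL bulk load looks like; the staging table, file path, and format options here are hypothetical. The destination gives you comparable speed while still letting transforms touch the rows first:

    -- A plain T-SQL bulk load, for comparison
    BULK INSERT dbo.SalesStaging
    FROM 'C:\Loads\sales.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');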


