Understanding Data Flow Sources, Transformations, and Destinations


SSIS is commonly used to move data from one location or file format to another. SSIS reads data using Data Flow sources, changes or processes the data using transformations if needed, and then writes the data to its new location or file format using Data Flow destinations. Packages aren’t limited to just a single source or destination; complex packages might have multiple sources and/or destinations.

Worth repeating is that the Data Flow tab shows the workflow within a single Data Flow task. Packages can have more than one Data Flow task, each with its own self-contained workflow.

Sources

Every source except the Raw File source is paired with a connection manager to retrieve data from a particular data store or file format. The OLE DB source is a catchall that can be used against any data store that has an OLE DB provider, so you’re not limited to just text and Excel files. Here is the complete list of sources (a sample source query follows the list):

  • DataReader source

  • Excel source

  • Flat File source

  • Raw File source

    Note 

    Raw File sources are often used to move data between packages because they have less overhead than writing the same data to a table, which would require SQL Server to log the transaction so that it could be rolled back if necessary.

  • OLE DB source

  • XML source
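
A typical OLE DB source, for instance, pairs a connection manager with a table name or a SQL command. The query below is only a sketch, assuming the AdventureWorks sample database that ships with SQL Server 2005:

    -- Hypothetical source query for an OLE DB source
    SELECT ProductID, Name, ListPrice
    FROM Production.Product
    WHERE ListPrice > 0;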

Transforms

Transforms are the power tools of SSIS. They let you slice, dice, and reshape data in many different ways with no coding; you just set a few properties. The following list contains notes about the use of each transform to help you retain their capabilities (it’s a lot to remember!), and a few of the most SQL-like transforms are sketched in T-SQL after the list.

  • Aggregate   Applies aggregate functions such as SUM and AVERAGE, as well as GROUP BY.

  • Audit   Includes data about the environment; the options are ExecutionInstanceGuid, PackageID, PackageName, VersionID, ExecutionStartTime, MachineName, UserName, TaskName, and TaskID.

  • Character Map   Applies string functions such as uppercase, lowercase, and conversion from things such as half-width to full-width characters.

  • Conditional Split   Sends different rows to different outputs depending on the conditions set. A common example would be clean data going directly to the final destination while rows with missing elements are diverted for additional processing.

  • Copy Column   Creates a copy of a column (this could be done inside a SELECT statement, assuming you didn’t need to apply a transform such as Character Map to the column before you made the copy).

  • Data Conversion   Converts a column to a different data type and copies it to a new column.

  • Data Mining Query   Performs prediction queries against a data mining model.

  • Derived Column   You used this in the previous example; it uses an expression to create a new value that can replace an existing one or be added as a new column.

  • Export Column   Takes data from a column and writes it to a file. An example would be an employees table that has a Photo column; you could use this transform to create those images on disk as separate files using a naming convention you specify. It’s worth giving this one a test run to better see what it does.

  • Fuzzy Grouping   Identifies rows that may be duplicates; a score is used so you can customize how aggressive you want to be about filtering out duplicates.

  • Fuzzy Lookup   Finds the closest lookup in a table; it’s similar to the Lookup transform, but it doesn’t require an exact match.

  • Import Column   Basically the opposite of the Export Column transform; it reads files from disk and places their contents into rows during processing.

  • Lookup   Takes a column (ZIP codes, for example), looks each value up in a reference table, and returns related information.

  • Merge   Combines two sorted data sets into one data set; the two data sets need to have matching column types. Think of it as a specialized variation of the Union All transform. It could be used to bring back together results that were split using a Conditional Split transform.

  • Merge Join   This is similar to the Merge transform, but it is used to apply the various join types to two different data sources.

  • Multicast   Sends all its output to multiple destinations. A simple example would be to use one output to log the state of the data at that point while another output sends the data on for additional processing.

  • OLE DB Command   Executes a SQL statement for each row processed.

  • Percentage Sampling   Often used in data mining; returns a random subset of the total rows.

  • Pivot   Performs a standard pivot table–type operation just like you would do in Excel.

  • Row Count   Puts the count of rows processed into a variable for later use.

  • Row Sampling   Similar to the Percentage Sampling transform, only you specify the number of rows rather than the percentage.

  • Script Component   Executes Visual Basic .NET code on the rows that pass through.

  • Slowly Changing Dimension   You definitely want to use the wizard to configure this one. This is used when processing data for a data warehouse–type scenario where you need to maintain the history of changes but the changes are not frequent.

  • Sort   Sorts data either ascending or descending.

  • Term Extraction   Works only with English; it looks for nouns and noun phrases and returns them as a column, giving you a way to identify rows that contain certain words.

  • Term Lookup   Extracts terms and ends up with a list of terms and their frequency; this can be configured to be case sensitive. It has a lot of rules you’ll want to look at before employing.

  • Union All   Works just like UNION ALL in T-SQL; combines two or more inputs that have matching column types.

  • Unpivot   The reverse of a Pivot transform.
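
A few of the most SQL-like transforms map directly onto familiar T-SQL. You configure the transforms in the designer rather than writing these statements, but the result sets are equivalent; all table and column names in the sketches below are hypothetical. The Aggregate transform, grouping by Region and applying SUM and AVERAGE, behaves like this:

    -- Rough T-SQL equivalent of an Aggregate transform
    SELECT Region,
           SUM(SaleAmount) AS TotalSales,
           AVG(SaleAmount) AS AverageSale
    FROM dbo.Sales
    GROUP BY Region;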
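
The Lookup transform joining pipeline rows to a ZIP code reference table is roughly an inner join; by default an unmatched row fails the component unless you redirect such rows elsewhere:

    -- Rough T-SQL equivalent of a Lookup transform
    SELECT c.CustomerID, c.ZipCode, z.City, z.StateCode
    FROM dbo.Customers AS c
    INNER JOIN dbo.ZipCodes AS z
        ON z.ZipCode = c.ZipCode;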
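
Union All corresponds directly to UNION ALL, while Merge additionally requires both inputs to arrive sorted and preserves that order, much like adding an ORDER BY to the combined result:

    -- Rough T-SQL equivalent of a Merge (or a sorted Union All)
    SELECT CustomerID, Amount FROM dbo.CleanRows
    UNION ALL
    SELECT CustomerID, Amount FROM dbo.RepairedRows
    ORDER BY CustomerID;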
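
And the Pivot transform produces the same shape of result as the PIVOT operator introduced in SQL Server 2005:

    -- Rough T-SQL equivalent of a Pivot transform
    SELECT Region, [2004], [2005]
    FROM (SELECT Region, SaleYear, Amount FROM dbo.Sales) AS src
    PIVOT (SUM(Amount) FOR SaleYear IN ([2004], [2005])) AS p;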

Destinations

Destinations are the endgame, where you write your final data to a file or database table. You have a rich set of options to work with; the most commonly used are the SQL Server and Flat File destinations. Like sources, destinations are paired with connection managers. The following are the available destinations:

  • Data Mining Model Training destination   Passes data through the algorithms that train the final model.

  • DataReader destination   Similar in spirit to the Recordset destination; this one exposes the data flow’s output through an ADO.NET DataReader so it can be consumed from code outside the package.

  • Dimension Processing destination   Used to load and process an Analysis Services dimension.

  • Excel destination   Loads data into an Excel worksheet.

  • Flat File destination   Creates a variety of text file formats.

  • OLE DB destination   Loads data into any system that has an OLE DB provider.

  • Partition Processing destination   Loads and processes an Analysis Services partition.

  • Raw File destination   We’ve talked about raw files before; this is where you would create one (as opposed to reading from one in a source).

  • Recordset destination   Creates an in-memory ADO (not ADO.NET!) recordset, which would be pushed into a variable for use in code later.

  • SQL Server destination   Offers the same performance as BULK INSERT in T-SQL (sketched after this list), but you can transform the data as you append it to a SQL Server table.

  • SQL Server Mobile destination   Writes to a device running Pocket PC or a similar small operating system with a limited version of SQL Server.
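
For comparison with the SQL Server destination, this is what the equivalent plain T-SQL bulk load looks like; the staging table, file path, and format options here are hypothetical. The destination gives you comparable speed while still letting transforms touch the rows first:

    -- A plain T-SQL bulk load, for comparison
    BULK INSERT dbo.SalesStaging
    FROM 'C:\Loads\sales.txt'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n');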


