Common Component Conventions | Microsoft SQL Server 2005 Integration Services

Although each of the stock components is unique, they also share some similarities, especially in the component designers. This section covers the component and designer conventions, which most of the stock components use to some extent.

Custom Component Editors and the Advanced Editor

You can edit data flow components in three ways. Like most objects in the designer, you can modify component properties in the property grid. You can get by with modifying certain properties with only the property grid, but it has limitations.

Another way to edit components in the designer is to use their custom editor. Some, but not all, stock components have a custom editor. Custom component editors only support a given component and provide features for supporting the specific functions and properties of that component.

The final way to modify components is to use the Advanced Editor, which is the default editor for some components with no custom editor and is available to edit most components even if they have custom editors. The Advanced Editor is a generic editor, but it also exposes some component properties that are not visible in any other location; specifically, the input and output properties of components.

You can open the editor or Advanced Editor by either right-clicking on the component and selecting the corresponding menu item for the editor you want to open, or you can click on the Show Editor or Show Advanced Editor links at the bottom of the properties grid. You must have the component selected in the designer for these links to appear. If the links don't appear, right-click on the margin just below the property grid and in the context menu that appears, make sure the Commands menu item is selected. Figure 20.1 shows the editor links.

Figure 20.1. The properties grid for the OLE DB Source Adapter with the editor links

Error Output

Components might provide a secondary output called, appropriately enough, an error output. Error outputs provide a way for a component to redirect rows with one or more columns containing errors. Errors occur when the transform encounters unexpected data. For example, if the transform is expecting a column to contain an integer but it actually contains a date, that would be considered an error. Also, the Data Flow Task treats potential truncations as a different class of errors so that you can handle both with different settings. Although errors are always bad, truncations might be OK or even desirable. Error outputs provide a way for components to divert error and truncation rows to a separate output, providing various options for handling the exceptional data.

Figure 20.2 shows the error row editor for the OLE DB Source Adapter.

Figure 20.2. Changing the error output options for multiple columns

Using the Error Output Editor, you can specify how the component will handle a row with an error. There are three options:

Ignore Ignores the error. The row is sent down normal output with the errant data possibly converted to NULL. This is rarely the right choice.
Redirect Pushes the row to the error output.
Fail Component Causes the component and, possibly, the data flow to fail.

Caution

When selecting multiple columns and choosing the error disposition, as shown in Figure 20.2, make sure you click the Apply button so that your change will be applied to all the selected columns.

The option you choose is, of course, dependent on your requirements and how important it is to you that your data be completely clean. Following are some examples of how people process rows with errors.

Ignore the rows Do this if the column with the error is insignificant and no other transforms in the data flow process the column.
Correcting the problem "in situ" You can add additional transformations on the error output to attempt to correct the problem with the rows and redirect the corrected rows back into the data flow.
Storing the rows You can send the error rows to a table or flat file for later analysis. You might find patterns that indicate there are issues with the way the data is being input into the source or processed upstream and attempt to correct the problem either at the source or through additional processing directly in the data flow.
Building reports on error rows You can run additional data flow processes on the error outputs and build reports to track the types of errors that are occurring.

Note

Notice that error output settings are found on the Error Output tab. Many of the components support error outputs, but not all of them. Some components have a separate Configure Error Outputs button instead of the tab. Some components have neither if they don't support error outputs.

Error Columns

Error outputs have two additional columns called ErrorCode and ErrorColumn. The ErrorCode column contains the error ID of the error and the ErrorColumn contains the column ID of the column that generated the error. If you create a Data Viewer on the error output, you can see the ErrorCode, ErrorColumn, and ErrorCode Description for each error row. Figure 20.3 shows the ErrorRows sample package in the S20-StockComponents sample solution.

Figure 20.3. Viewing the error output columns

The ErrorCode column gives you an ID that you can look up in the errors list. The ErrorColumn is the ID of the column that caused the error. To use this value, open the Advanced Editor for the component that caused the error and look at the output columns. In this case, the OutputColumn value was 8. Figure 20.4 shows that the column with the ID of 8 was "Column 1" on the Flat File Source Adapter called Read Dirty Data.

Figure 20.4. Finding the column that caused an error

Column Mapping

Another thing components have in common is the way they represent columns and column mappings. Figure 20.5 shows the editor for the Flat File Source Adapter. Because it is a source, there are no input columns to which you can map the output columns. You can choose to eliminate columns if you want. In this case, "Column 4" is deselected, making it unavailable to any of the downstream components.

Figure 20.5. Eliminating an input column

Figure 20.6 shows the Flat File Destination Editor with the three columns of the error output from the Flat File Source Adapter called Read Dirty Data in Figure 20.3. To remove one of the mappings, the lines between the columns, you click on it with the mouse and press the Delete key.

Figure 20.6. Selecting and deleting mappings

You can also right-click in any spot inside the Columns window to access the context menu shown previously. This is a quick way to select and delete all the mappings, which is useful when the metadata on either the input or output changes and you need to remap the columns.

Access Modes

Many components provide different ways to access a given resource. For example, the OLE DB Source Adapter provides four options for where to retrieve rows:

Table or View Specify the name directly in the designer.
Table Name or View Name Variable The table or view name is contained in a variable.
SQL Command Specify the SQL query for retrieving rows directly in the designer.
SQL Command from Variable The SQL query is stored in a variable; specify the name of the variable.

The XML Source Adapter provides three different ways to access XML data:

XML File Location Specify the filename.
XML File from Variable Specify the name of a variable that contains the name of the XML file.
XML Data from Variable Specify the name of a variable that contains the XML data.

These options for accessing resources are called Access modes and are common throughout components. They provide a way to flexibly and dynamically modify important component settings.

External Metadata

When a component is disconnected from its data source or destination, it has no way to validate against the actual source or destination metadata. Integration Services provides a way to cache the source or destination metadata at the component so that the component can still validate and map columns, even when the component is disconnected. When the component is connected to its data source or destination, it validates its components against the metadata in the source or destination. When disconnected, it validates against the External Metadata cache.

Component Groups

As the thickness of this chapter confirms, there are a lot of data flow components that ship with Integration Services; 45 by my count, depending on the edition you have installed. So, here, the components are organized by functional categories. Source adapters and destination adapters are obvious categories, and the transforms are further organized into three categories:

Flow Control transforms
Auditing transforms
Data Manipulation transforms

Although overlaps exist between the categories, and arguments could be made that a transform doesn't belong in one but should be in another, this categorization is purely for convenience of organization and shouldn't be viewed strictly.

Source Adapters

Source adapters retrieve data from the location and format in which it is stored and convert it to the common data flow buffer format, placing it in a data flow buffer.

Flow Control Transforms

Flow Control transforms direct rows down different paths, create new outputs or columns, merge multiple outputs, or otherwise direct data to different portions of the data flow based on predicates or structure.

Auditing Transforms

Auditing transforms generate metadata about the data flow that is consumed outside of the Data Flow Task. This information can be useful for diagnostic or lineage information.

Data Manipulation Transforms

These are the transforms that perform the heavy data transformation lifting. They vary in data processing extremes from the Character Map transform, which merely performs case conversions on strings, to the Aggregate transform, which consumes all rows on the input while potentially producing only one output row.

Destination Adapters

Destination adapters retrieve data from data flow buffers and insert it into the adapter-specific format and location.

Advanced Components

There are some components that ship with the Developer and Enterprise Editions of SQL Server only or are for specialized purposes, so they are discussed in their own chapter. You can find them in Chapter 22.

Fuzzy Grouping
Fuzzy Lookup
Term Extraction
Term Lookup

Component Properties

For each component, a table is provided that lists the properties in a condensed format for reference. These aren't properties on the component itself, but rather descriptive properties or classifications to help better characterize the components.

The properties and possible values contained in the tables are as follows:

Component Type Source, transform, or destination; the component type classification.
Has Custom Editor Does the component support a Custom Editor? If not, you need to use the Advanced Editor to modify the component.
Internal File I/O Component works directly with file system files. This is helpful when trying to understand the impact of the file system on the data flow.
Output Types Synchronous, Asynchronous. Remember the definitions from Chapter 19, "Introducing the Data Flow Task," a synchronous output sends a row out for every row on the component input and asynchronous outputs have a different buffer type than the component's input.
Threading Single, Multi. Has performance and scale implications.
Managed Yes/No. Can impact the type of connection manager to use.
Number Outputs The number of possible outputs. Useful for understanding how data flows through the component.
Number Inputs The number of possible inputs. Useful for understanding how data flows through the component.
Requires Connection Manager The component uses a connection manager. Useful for understanding configuration requirements.
Supports Error Routing The component supports an error output.
Constraints These are various constraints that could possibly impact the performance of the component. These are not universally applicable and should be viewed purely as performance hints.