Writing a Source Adapter


This section describes how to build a source adapter and walks through the implementation of the Image File Source Adapter example. Open the ImageFileSrc sample solution in the SAMPLES\SRC\SampleComponents\ImageFileSrc folder to follow along. The Image File Source Adapter is a bit unusual in that it does not rely on external metadata. The columns to retrieve are relatively static, having been defined in the EXIF standard. The component could conceivably open one of the source files, retrieve the supported EXIF fields for that file, and then configure the columns based on those fields; however, the next file might support a different set of fields. To keep things simple, this source supports only the most commonly available and interesting fields.

Setup and the Design Time Methods

The setup for the component is very similar to custom tasks, but this section covers some of the differences as well as how to implement some of the design time methods.

The DtsPipelineComponent Attribute

The DtsPipelineComponent attribute identifies the component to Integration Services and provides a number of important pieces of information.

[DtsPipelineComponent(
    ComponentType = ComponentType.SourceAdapter,
    IconResource = "SSIS2K5.Samples.Transforms.ImageFileSrc.component.ico",
    DisplayName = "ImageFile Source",
    Description = "Retrieves EXIF information from JPG files",
    CurrentVersion = 1)]


There are a few important points to notice about the attribute. ComponentType is required. Without it, Integration Services will not recognize the component. IconResource must be the fully qualified name of the icon. The DisplayName and Description are important and you must include one or the other. The CurrentVersion attribute is not required, but comes in handy when you ship the next version of the component. Attribute parameters not used are NoEditor, LocalizationType, RequiredProductLevel, ShapeProgID, and UITypeName, which are described in Chapter 24.

ProvideComponentProperties

This method cleans up any other possible settings and then sets up the connection, output, and ComponentInformation. It might not seem necessary to do the cleanup because the designer calls this method only when the component is first created. However, there are other clients besides the designer, for example, custom written code for generating packages. Those clients might call the component methods in a different order and even multiple times. It's important to guard against such clients and always implement these methods as though they were getting called in the wrong way at the wrong time by cleaning up, checking assumptions, and building from scratch.

public override void ProvideComponentProperties()
{
    // start out clean, remove anything put on by the base class
    RemoveAllInputsOutputsAndCustomProperties();


Setting Up the Runtime Connection

This section of code reserves a reference to a runtime connection. This confuses some people, so an explanation is in order. The Data Flow Task has a special collection called the RuntimeConnectionCollection. This is not a collection of connection managers. It is a collection of objects specific to the Data Flow Task, each of which indicates a component's need for a connection manager. By creating a RuntimeConnection as shown next, the component is telling the Data Flow Task that it needs to use a connection. The name can be any string and is used later to retrieve the RuntimeConnection object, which will contain a reference to a connection manager. Why the indirection? This design makes it possible to validate better at the Data Flow Task level and hides connection manager name changes from components.

    // Get the runtime connection collection.
    IDTSRuntimeConnectionCollection90 pIDTSRuntimeConnectionCollection =
        ComponentMetaData.RuntimeConnectionCollection;

    // See if there is already a runtime connection
    IDTSRuntimeConnection90 pIDTSRuntimeConnection;
    try
    {
        pIDTSRuntimeConnection = pIDTSRuntimeConnectionCollection[CONNECTION_NAME];
    }
    catch (Exception)
    {
        // Must not be there, make one
        pIDTSRuntimeConnection = pIDTSRuntimeConnectionCollection.New();
        pIDTSRuntimeConnection.Name = CONNECTION_NAME;
        pIDTSRuntimeConnection.Description = CONNECTION_DESC;
    }


Setting Up Component Outputs

This section of code sets up an output and declares that the component wants to use external metadata columns. External metadata columns have a number of benefits, including the ability to validate even when there is no connection available. In this case, the metadata is static, but the external metadata columns are still useful because the Advanced UI uses the external metadata columns to display the links in the mapping columns. Users indicate that they don't want a particular column by deleting the mapping, which also eliminates the output column.

    // Add an output
    IDTSOutput90 output = ComponentMetaData.OutputCollection.New();


The output.Name is the name that shows up in the Input and Output Properties tab in the Advanced UI.

    // Name the output
    output.Name = "Image Source Output";

    // Set external metadata used.
    ComponentMetaData.OutputCollection[0].ExternalMetadataColumnCollection.IsUsed = true;

    // We don't have connected metadata, just external metadata.
    // Don't provide connected validation.
    ComponentMetaData.ValidateExternalMetadata = false;


The component supports error dispositions. As a rule, components should support error dispositions because they make it easier to troubleshoot problems with the data. Setting UsesDispositions generally indicates that the component provides an error output. However, it doesn't have to do this and might simply provide better error handling. Without error dispositions, the component will fail the first time data fails to correctly insert into the buffer.

  ComponentMetaData.UsesDispositions = true; 


The ContactInfo information is useful when a package that references the component is loaded, but the component isn't installed. In that event, this information is displayed to the user in an error message when loading the package.

    // Get the assembly version and set that as our current version.
    SetComponentVersion();

    // Name the component and add component information
    ComponentMetaData.Name = "ImageFileSrc";
    ComponentMetaData.Description = "Sample Image File Source Adapter";
    ComponentMetaData.ContactInfo = ComponentMetaData.Description +
        ";Microsoft SQL Server Integration Services 2005, By Kirk Haselden;" +
        " (c) 2006 Kirk Haselden" + ComponentMetaData.Version.ToString();
}


ReinitializeMetadata

ReinitializeMetadata is called when the Refresh button in the Advanced UI is clicked or when the Validate method returns VS_NEEDSNEWMETADATA.

public override void ReinitializeMetaData()
{
    // Baseclass may have some work to do here
    base.ReinitializeMetaData();

    // Get the output
    IDTSOutput90 output = ComponentMetaData.OutputCollection[0];

    // Start from a clean slate
    output.OutputColumnCollection.RemoveAll();
    output.ExternalMetadataColumnCollection.RemoveAll();


This is where the column information is added. The ExternalMetadataColumn is added to the ExternalMetadataColumnCollection on the output it describes. Note that the column data is statically defined. This is not typical. Most sources reference an external source such as a table, flat file, or spreadsheet to determine the metadata for the columns. In this case, the data we want to expose is known beforehand and is statically defined in the ColumnInfos array. As a general rule, you should avoid removing columns in source adapters unless absolutely necessary because this causes downstream data flow to break when the lineage IDs change.

    m_columnInfos = new ColumnInfo[12]
    {
        new ColumnInfo("FILENAME", DataType.DT_STR, 1028),
        new ColumnInfo("MAKE", DataType.DT_STR, 30),
        new ColumnInfo("MODEL", DataType.DT_STR, 30),
        new ColumnInfo("SHUTTERSPEED", DataType.DT_STR, 10),
        new ColumnInfo("FSTOP", DataType.DT_STR, 10),
        new ColumnInfo("EXPOSUREPROGRAM", DataType.DT_STR, 20),
        new ColumnInfo("ISO", DataType.DT_STR, 10),
        new ColumnInfo("DATETIME", DataType.DT_STR, 40),
        new ColumnInfo("IMAGEWIDTH", DataType.DT_STR, 10),
        new ColumnInfo("IMAGELENGTH", DataType.DT_STR, 10),
        new ColumnInfo("BITSPERSAMPLE", DataType.DT_STR, 10),
        new ColumnInfo("COMPRESSION", DataType.DT_STR, 20),
    };
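The definition of ColumnInfo is not shown in this section. As a rough sketch, assuming only the members that the surrounding code touches (name, type, length, outputExists, and outputID), it might look something like the following. The DataType enum here is a cut-down stand-in so that the snippet compiles on its own; it is not the real SSIS enumeration, and the actual class in the sample project may differ.

```csharp
// Stand-in for the SSIS DataType enum so this sketch compiles by itself.
public enum DataType { DT_STR, DT_WSTR, DT_I4 }

// Hypothetical shape of the sample's ColumnInfo, inferred from how it is used.
public class ColumnInfo
{
    public string name;        // output column name, for example "MAKE"
    public DataType type;      // data type of the column
    public int length;         // maximum length for string columns
    public bool outputExists;  // filled in later, in PreExecute
    public int outputID;       // filled in later, in PreExecute

    public ColumnInfo(string name, DataType type, int length)
    {
        this.name = name;
        this.type = type;
        this.length = length;
    }
}
```

A class rather than a struct is assumed here so that a foreach over the array sees the flags that PreExecute sets on each element.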


The ReinitializeMetadata method uses the array of ColumnInfos to populate the output columns.

    // Add the EXIF column information
    for (int i = 0; i < m_columnInfos.Length; i++)
    {
        // Create a new output column
        IDTSOutputColumn90 column = output.OutputColumnCollection.NewAt(i);
        IDTSExternalMetadataColumn90 externalColumn =
            output.ExternalMetadataColumnCollection.NewAt(i);


Set the properties of each column. In this sample, all the columns are of type string so only the length and code page parameters are needed. The SetDataTypeProperties method sets the data type, length, precision, scale, and code page for the column simultaneously.

        // Set the column properties
        column.SetDataTypeProperties(m_columnInfos[i].type,
            m_columnInfos[i].length, 0, 0, 1252);
        column.Name = m_columnInfos[i].name;


Set the properties of each external metadata column. These should be the same as their corresponding columns.

        // Set the external column properties
        externalColumn.DataType = m_columnInfos[i].type;
        externalColumn.Length = m_columnInfos[i].length;
        externalColumn.Name = m_columnInfos[i].name;

        // Assign the output column to the external metadata
        column.ExternalMetadataColumnID = externalColumn.ID;


This component has only one output so this setting is redundant. However, if a component has more than one output that are in sync with the same input, the ExclusionGroup property should be set to a nonzero value and the DirectRow method should be used to tell the Execution Engine to which output the row should be sent. If the DirectRow method is not called to indicate the output to which the row should be sent, the row is not sent to an output. Outputs in sync with the same input and with exclusion group 0 are automatically forwarded to all outputs.

        // Exclusion group
        output.ExclusionGroup = 0;


This setting is also redundant because it is zero by default. But setting it makes it explicitly understood that the output is an asynchronous output. You will recall that an asynchronous output is the start of a new execution tree and consequently a new buffer type. Therefore, asynchronous outputs have no synchronous input ID value.

        // Synchronous input
        output.SynchronousInputID = 0;
    }
}


The Validate Method

The Validate method is where the component has the opportunity to sanity check some of its own settings. When validating, it's important to remember that not all settings come directly through a well-developed UI that checks for bounds violations or other invalid values. It's possible that someone modified the package XML directly or that the component was created by a different client application than the designer. Validation should be a little pessimistic and check all the assumptions. The sample Validate method for the Image File Source shows a few examples of the kind of checks validation should make.

public override DTSValidationStatus Validate()
{
    // Make sure base call is successful
    DTSValidationStatus status = base.Validate();
    if (status == DTSValidationStatus.VS_ISCORRUPT)
    {
        return status;
    }


If there are no external metadata columns, ReinitializeMetadata probably hasn't been called yet. This is probably the first time Validate is called and the component was probably just created. Short circuit the call here so that the Data Flow Task calls ReinitializeMetadata. Later, Validate will be called again and make it past this check.

    // If there are no external metadata columns, then return that we need some.
    // Short circuit the validation since there is no metadata to validate at this point.
    if (ComponentMetaData.OutputCollection[0].ExternalMetadataColumnCollection.Count == 0)
        return DTSValidationStatus.VS_NEEDSNEWMETADATA;


This is a source; there should be no inputs and only one output because the component doesn't support an error output.

    // should have no inputs
    if (ComponentMetaData.InputCollection.Count != 0)
        return DTSValidationStatus.VS_ISCORRUPT;

    // should have exactly one output
    if (ComponentMetaData.OutputCollection.Count != 1)
        return DTSValidationStatus.VS_ISCORRUPT;


The following code is an example of pessimistic validation. The component sets UsesDispositions to TRUE, but this code still checks it to ensure that the value hasn't been altered in some way.

    // See if the UsesDispositions is set.
    if (!ComponentMetaData.UsesDispositions)
    {
        bool bCancel;
        ComponentMetaData.FireError(99, "ImageFile Source",
            "Uses dispositions setting is incorrect", "", 0, out bCancel);
        return DTSValidationStatus.VS_ISCORRUPT;
    }

    return status;
}


AcquireConnections and ReleaseConnections

The AcquireConnections and ReleaseConnections methods are called more often than any others because they bracket the other calls, including Validate, ReinitializeMetadata, ProcessInput/PrimeOutput, and PreExecute/PostExecute. They do not bracket the PrepareForExecute and Cleanup calls.

AcquireConnections

Because getting a connection is such a commonly required step in a component, the AcquireConnections method has been provided. You could get away with not implementing this method by repeating the same code in every method that requires a connection, but then you'd have duplicated code in all of them and the Data Flow Task wouldn't be able to do some of its own validation on the connection managers. Implementing the AcquireConnections and ReleaseConnections methods correctly ensures that the component always has a valid and up-to-date connection available before the other methods are called. The up-to-date part is important because, if the component simply calls AcquireConnection on a connection manager once and stores the returned connection object as a member variable, numerous problems could result. For example, the connection might time out, or it might be changed by a property expression or a Foreach Loop. Implement AcquireConnections correctly and release the connection object in the ReleaseConnections method to minimize the chance of those problems.

// Called multiple times during design and once during execution
public override void AcquireConnections(object transaction)
{


Get the RuntimeConnection that was created in ProvideComponentProperties.

    IDTSRuntimeConnection90 conn;
    try
    {
        // get the runtime connection
        conn = ComponentMetaData.RuntimeConnectionCollection[CONNECTION_NAME];
    }
    catch (Exception)
    {
        bool bCancel;
        ComponentMetaData.FireError(1, "ImageFileSrc",
            "Could not find the runtime connection.",
            "", 0, out bCancel);
        // Rethrow without resetting the stack trace
        throw;
    }

    // Get the connection manager host from the connection
    ConnectionManagerHost cmh = (ConnectionManagerHost)conn.ConnectionManager;
    if (cmh == null)
    {
        bool bCancel;
        ComponentMetaData.FireError(2, "ImageFileSrc",
            "Could not get the runtime Connection Manager.",
            "", 0, out bCancel);
    }


The Image File Source only supports the Multifile type of connection manager. The component simply cannot function without one, so alert the user with an error if it isn't the correct type.

    // Make sure it is a multifile connection manager
    else if (cmh.CreationName != "MULTIFILE")
    {
        bool bCancel;
        ComponentMetaData.FireError(2, "ImageFileSrc",
            "Connection manager must be of type MULTIFILE",
            "", 0, out bCancel);
    }


The connection manager was the correct type, so store it in a member variable. Note that although it is stored there, it is updated every time AcquireConnections is called. This is different from getting a reference to the connection manager once and holding onto it for the life of the component.

    else
    {
        try
        {
            // Store the connection manager for validation and execution
            m_cm = conn.ConnectionManager;
        }
        catch (Exception)
        {
            bool bCancel;
            ComponentMetaData.FireError(10, "ImageFileSrc",
                "Could not get the runtime connection.",
                "", 0, out bCancel);
            // Rethrow without resetting the stack trace
            throw;
        }
    }
}


ReleaseConnections

Normally, the ReleaseConnections method is where you would release the connection object you retrieved from the connection manager. Because MultiFile Connection Managers only return strings containing a filename, there is nothing to release here.

// Release if you have one, multifile only returns strings
public override void ReleaseConnections()
{
    // Nothing to do here because we're using file connection managers
    base.ReleaseConnections();
}
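The acquire and release bracketing described in this section can be sketched without any Integration Services types. In the following illustration, FakeConnectionManager and ConnectionBracketSketch are hypothetical stand-ins, not part of the sample or of the SSIS API. The point is only that the component asks for a fresh connection on every acquire and drops it on every release, so a change made to the connection manager between calls is picked up.

```csharp
using System;

// Hypothetical stand-in for a connection manager whose target can change
// between calls (for example, via a property expression or a Foreach Loop).
public sealed class FakeConnectionManager
{
    public string ConnectionString { get; set; } = "";

    public object AcquireConnection(object transaction)
    {
        return ConnectionString;
    }
}

// Hypothetical component that holds a connection only between
// AcquireConnections and ReleaseConnections.
public sealed class ConnectionBracketSketch
{
    private readonly FakeConnectionManager _manager;
    private object _connection;

    public ConnectionBracketSketch(FakeConnectionManager manager)
    {
        _manager = manager;
    }

    public void AcquireConnections()
    {
        // Always ask again; never reuse a connection from an earlier call.
        _connection = _manager.AcquireConnection(null);
    }

    public void ReleaseConnections()
    {
        // Drop the reference so a stale connection cannot be used later.
        _connection = null;
    }

    public string CurrentTarget()
    {
        if (_connection == null)
            throw new InvalidOperationException("No connection has been acquired.");
        return _connection.ToString();
    }
}
```

Calling CurrentTarget after ReleaseConnections throws, which makes accidental use of a released connection visible immediately instead of silently reading stale data.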


General Runtime Approach

The general approach for this component is as follows:

  • Get columns: Establish the set of columns that will be populated with EXIF field information.

  • Store column ID: Set the column ID in the array of structures containing the column information based on the field metadata.

  • Loop through files: Retrieve the filename for each file returned from the MultiFile Connection Manager.

    • Call AcquireConnection: Retrieve the name of the file.

    • Read EXIF data: Create an ExifReader object to open the file and read the EXIF information.

    • Call AddRow: Add a new row to contain the EXIF information.

    • Insert data: Copy the EXIF information to the buffer.

    • Handle dispositions: If the column insert fails, determine the reason and warn the user.

  • Set EndOfRowset: When the MultiFile Connection Manager returns null, indicating that there are no more files to open, set EndOfRowset on the buffer.
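Before looking at the real code, the shape of the loop described by these steps can be sketched with stand-in types. Everything in the following snippet (FakeMultiFileConnection, FakeBuffer, RuntimeLoopSketch, and the readExif delegate) is hypothetical and exists only to make the control flow runnable outside Integration Services; the real component uses the pipeline buffer and the sample's own EXIF reader instead.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a MultiFile connection manager: hands out one
// file name per call and returns null once the list is exhausted.
public sealed class FakeMultiFileConnection
{
    private readonly Queue<string> _files;

    public FakeMultiFileConnection(IEnumerable<string> files)
    {
        _files = new Queue<string>(files);
    }

    public object AcquireConnection(object transaction)
    {
        return _files.Count > 0 ? _files.Dequeue() : null;
    }
}

// Hypothetical stand-in for the pipeline buffer: a list of rows plus a flag.
public sealed class FakeBuffer
{
    public List<Dictionary<string, string>> Rows { get; } =
        new List<Dictionary<string, string>>();
    public bool EndOfRowset { get; private set; }

    public void AddRow() { Rows.Add(new Dictionary<string, string>()); }
    public void SetString(string column, string value) { Rows[Rows.Count - 1][column] = value; }
    public void SetEndOfRowset() { EndOfRowset = true; }
}

public static class RuntimeLoopSketch
{
    public static void Fill(
        FakeMultiFileConnection cm,
        FakeBuffer buffer,
        Func<string, IDictionary<string, string>> readExif,
        string[] columns)
    {
        object conn;
        // Loop through files until the connection manager returns null.
        while ((conn = cm.AcquireConnection(null)) != null)
        {
            string fileName = conn.ToString();
            IDictionary<string, string> exif = readExif(fileName); // read EXIF data
            buffer.AddRow();                                       // one row per file
            buffer.SetString("FILENAME", fileName);
            foreach (string column in columns)
            {
                string value;
                if (exif.TryGetValue(column, out value))
                    buffer.SetString(column, value);               // insert data
            }
        }
        buffer.SetEndOfRowset();                                   // no more files
    }
}
```

The sketch leaves out disposition handling, which the real component performs around each column insert.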

Take a look at the code now and see how each of the steps is implemented.

Get Columns

// Called once right before executing.
public override void PreExecute()
{
    base.PreExecute();

    // Find out which output columns exist
    IDTSOutput90 output = ComponentMetaData.OutputCollection[0];
    int countColumns = output.OutputColumnCollection.Count;


Check whether each output column exists. This is how the component knows whether the user deleted the mapping between an external metadata column and its output column.

    // Iterate through the columns
    for (int iColumn = 0; iColumn < countColumns; iColumn++)
    {
        // Set the exists flag for the column info.
        m_columnInfos[iColumn].outputExists =
            (output.OutputColumnCollection[iColumn] != null);


Store Column ID

The ColumnInfos array stores the output ID of the given column. Later, the component uses the ID to access the column on the buffer.

        m_columnInfos[iColumn].outputID = iColumn;
    }
}


The processing of the output buffers happens in the PrimeOutput method. Note that this method is only called once. It's a blocking call and the component does not return from the call until it has added all the rows to the buffer. Also note that the component only processes one buffer. To the component, it appears as though there is one endless buffer available for filling.

// Called to provide a buffer to fill. Only called once per execution.
public override void PrimeOutput(int outputs, int[] outputIDs, PipelineBuffer[] buffers)
{


Get the buffer. In this case, there is only one because there is only one output.

    // Only have one output, so there is only one buffer
    PipelineBuffer buffer = buffers[0];

    // The name of the jpg file
    string fileName;


Loop Through Files

While the MultiFile Connection returns a valid filename, the component needs to do the following steps.

    // For each jpg file, get the EXIF data
    // MultiFile connection managers return null from
    // AcquireConnection when the file list is exhausted.
    object conn;


Call AcquireConnection

  while (null != (conn = m_cm.AcquireConnection(null))) 


Read EXIF Data

The MultiFile Connection Manager returns the filename as an object. Convert to a string.

        // Get the name of the file
        fileName = conn.ToString();


Create a new ExifReader object to open the file and read the EXIF information. The ExifReader is found in the EXIFReader.cs file in the sample project.

        // Open each file and read the EXIF information
        ExifReader reader = new ExifReader(fileName);


Call AddRow

This method adds a row to the buffer. If the buffer is full, the Execution Engine performs the buffer flip and then adds a row to the new buffer.

        // Add a new row to the buffer
        buffer.AddRow();


Insert Data

Insert data into the buffer using the SetString method. This code makes a big assumption to simplify the code. Most of the time, sources have columns of varying types. In these cases, the component needs to use the various type-specific insert methods, such as SetBoolean, SetByte, SetInt16, or SetDateTime.

        // Insert each column into the buffer.
        foreach (ColumnInfo ci in m_columnInfos)
        {
            try
            {
                // Only output to columns the user has chosen
                if (ci.outputExists)
                {
                    // We only have string types
                    // For other types, switch/case works.
                    buffer.SetString(ci.outputID, reader.GetProperty(ci.name));
                }
            }
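For a source whose columns are not all strings, the switch/case approach mentioned in the comment above might look roughly like the following sketch. ColumnKind, RecordingBuffer, and TypedInsertSketch are hypothetical names introduced only for illustration. RecordingBuffer borrows the setter names SetString, SetInt32, SetBoolean, and SetDateTime from the pipeline buffer, but it is a simple stand-in that records values in a dictionary, and parsing the raw text with the invariant culture is an assumption of this example.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical column kinds; a real component would switch on the SSIS data type.
public enum ColumnKind { Text, Int32, Boolean, DateTime }

// Stand-in buffer that records what each type-specific setter received.
public sealed class RecordingBuffer
{
    public Dictionary<int, object> Values { get; } = new Dictionary<int, object>();

    public void SetString(int column, string value) { Values[column] = value; }
    public void SetInt32(int column, int value) { Values[column] = value; }
    public void SetBoolean(int column, bool value) { Values[column] = value; }
    public void SetDateTime(int column, DateTime value) { Values[column] = value; }
}

public static class TypedInsertSketch
{
    // Converts the raw text and calls the setter that matches the column kind.
    public static void SetValue(RecordingBuffer buffer, int column, ColumnKind kind, string raw)
    {
        switch (kind)
        {
            case ColumnKind.Int32:
                buffer.SetInt32(column, int.Parse(raw, CultureInfo.InvariantCulture));
                break;
            case ColumnKind.Boolean:
                buffer.SetBoolean(column, bool.Parse(raw));
                break;
            case ColumnKind.DateTime:
                buffer.SetDateTime(column, DateTime.Parse(raw, CultureInfo.InvariantCulture));
                break;
            default:
                buffer.SetString(column, raw);
                break;
        }
    }
}
```

A conversion failure surfaces as a FormatException here; a real component would treat that as an error row and apply its error disposition.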


Handle Dispositions

In the ProvideComponentProperties method, the component sets the UsesDispositions flag to TRUE. If the component supports an error output, it directs the row to the error output. Otherwise, the component might simply handle the error rows in the exception handler.

            catch (Exception ex)
            {
                if (ex is DoesNotFitBufferException)
                {
                    DTSRowDisposition rd =
                        ComponentMetaData.OutputCollection[0].TruncationRowDisposition;
                    if (rd == DTSRowDisposition.RD_IgnoreFailure)
                    {
                        // Ignore the error and continue.
                        ComponentMetaData.FireWarning(100, "ImageFileSrc",
                            "The file property " + ci.name + " for file " +
                            reader.FileName + " was too long. Ignoring truncation.", "", 0);
                    }
                    else
                    {
                        bool bCancel = false;
                        ComponentMetaData.FireError(100, "ImageFileSrc",
                            "The file property " + ci.name + " for file " + reader.FileName +
                            " was too long.", "", 0, out bCancel);
                        // Rethrow without resetting the stack trace
                        throw;
                    }
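The sample distinguishes only between ignoring a truncation and failing. A component that also provides an error output would have a third choice, redirecting the row. The following sketch shows one way that three-way decision could be structured. RowDisposition is a stand-in enum whose member names are modeled on the SSIS DTSRowDisposition values, and DispositionSketch with its two lists is hypothetical; the lists merely record what a real component would report as warnings or send to the error output.

```csharp
using System;
using System.Collections.Generic;

// Stand-in enum; member names are modeled on DTSRowDisposition.
public enum RowDisposition { RD_IgnoreFailure, RD_FailComponent, RD_RedirectRow }

public static class DispositionSketch
{
    // Returns true if the row should stay on the normal output.
    public static bool HandleTruncation(
        RowDisposition disposition,
        string columnName,
        List<string> warnings,
        List<string> errorRows)
    {
        switch (disposition)
        {
            case RowDisposition.RD_IgnoreFailure:
                // Warn and keep the row.
                warnings.Add("Value for " + columnName + " was too long. Ignoring truncation.");
                return true;
            case RowDisposition.RD_RedirectRow:
                // Record the row for the error output instead of the normal output.
                errorRows.Add(columnName);
                return false;
            default:
                // RD_FailComponent and anything unexpected: stop processing.
                throw new InvalidOperationException(
                    "Value for " + columnName + " was too long.");
        }
    }
}
```

Treating unknown values the same as RD_FailComponent keeps the behavior pessimistic, in line with the validation advice earlier in this section.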


Set EndOfRowset

The SetEndOfRowset method tells the Execution Engine that the output is done generating rows. In general, if this method is not called, the Data Flow Task never terminates and continues waiting for more rows. For source adapters, however, the Execution Engine considers the source complete when the PrimeOutput call returns. If the source has not set the end of rowset before returning from PrimeOutput, the Execution Engine fires an error before terminating.

    // Tell the buffer that we're done now.
    buffer.SetEndOfRowset();


Source adapters are examples of pure asynchronous outputs. Although the metadata for asynchronous outputs on a transform might sometimes reflect the metadata on the transform's inputs, sources have no inputs and all the data comes from some source outside of the pipeline. The primary responsibilities of a source adapter, therefore, are as follows:

  • Describe the source data to the Data Flow Task

  • Retrieve data from some source medium

  • Convert the source data into buffer format

  • Insert the source data into the buffer

  • Tell the Execution Engine when the source data is exhausted



Microsoft SQL Server 2005 Integration Services
ISBN: 0672327813
EAN: 2147483647
Year: 2006
Pages: 200
Authors: Kirk Haselden
