The Pipeline Component Methods

Components are normally described as having two distinct phases: design-time and runtime. When you implement a component, you inherit from the base class, Microsoft.SqlServer.Dts.Pipeline.PipelineComponent, and provide your own functionality by overriding the base methods, some of which are primarily design-time, others runtime. If you are using native code, then the divide between the runtime and design-time is clearer because they are implemented on different interfaces. Commentary on the methods has been divided into these two sections, but there are some exceptions, notably the connection-related methods; a section on Connection Time–related methods is included later on.

Note

In programming terms, a class can inherit functionality from another class, termed the base class. If the base class provides a method, and the inheriting class wishes to change the functionality within this method, it can override the method. In effect, you replace the base method with your own. From within the overriding method, you can still access the base method, and call it explicitly if required, but any consumer of the new class will see only the overriding method.
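
The idea in this note can be sketched in a few lines. The real component model is .NET, so the Python below is only an illustration of overriding; BaseComponent and CustomComponent are hypothetical names, not part of any SSIS library.

```python
class BaseComponent:
    """Stand-in for a base class; this is not the real PipelineComponent."""

    def describe(self) -> str:
        return "base behaviour"


class CustomComponent(BaseComponent):
    def describe(self) -> str:
        # Call the base implementation explicitly, then extend its result.
        base_result = super().describe()
        return base_result + " plus custom behaviour"


component = CustomComponent()
result = component.describe()  # consumers only ever see the overriding method
```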

Design-Time

The following methods are explicitly implemented for design-time, overriding the PipelineComponent methods, although the base methods will usually still be called from within your overriding methods. Not all of the methods have been listed, because for some there is little more to say, and others have been grouped together according to their area of function. Refer to the SQL Server documentation for a complete list.

Some methods are described as verification methods, and these are a particularly interesting group. They provide minor functions, such as adding a column or setting a property value, and you could quite rightly think that there is little point in ever overriding them, as there isn't much to add to the base implementation. The reason to override them is verification: you add code that checks whether the operation about to take place within the base class is allowed, and only then call the base method. The following sections expand on the types of checks you can do, and if you want to build a robust component, these are well worth looking into.

Another very good reason to implement these methods as described is actually to reduce code. These methods will be used by both a custom user interface (UI) and the built-in component editor, or Advanced Editor. If you raise an error saying that a change is not allowed, then both user interfaces can capture this and provide feedback to the user. Although a custom UI would not be expected to offer controls to perform blatantly inappropriate actions, the Advanced Editor is designed to offer all functionality, so you are protecting the integrity of your component regardless of the method used.

ProvideComponentProperties

This method is provided so you can set up your component. It is called when a component is first added to the Data Flow, and it initializes the component. It does not perform any column-level activity, as this is left to ReinitializeMetaData; when this method is invoked, there are generally no inputs or outputs to be manipulated anyway. The kinds of setup you may want to perform here are listed below.

Remove existing settings, such as inputs and outputs. This allows the component to be rebuilt and can be useful when things go wrong.
Add inputs and outputs, ready for column work later on in the component lifetime. You may also define custom properties on them and specify related properties, such as linking them together for synchronous behavior.
Define the connection requirements. By adding an item to the RuntimeConnectionCollection, you have a placeholder prepared for the Connection Manager at runtime, as well as informing the designer of this requirement.
Define custom properties. The component may have custom properties that are configurable by a user in addition to those you get for free from Microsoft. These hold settings, other than the column-related ones, that affect the overall component operation or behavior.
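
The four setup steps can be modeled as follows. This is a minimal Python sketch of the sequence only, not the .NET API: ComponentMetadata, its fields, and the BatchSize property are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentMetadata:
    """Hypothetical stand-in for the design-time component metadata."""
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    runtime_connections: list = field(default_factory=list)
    custom_properties: dict = field(default_factory=dict)


def provide_component_properties(meta: ComponentMetadata) -> None:
    # 1. Remove existing settings so the component can be rebuilt cleanly.
    meta.inputs.clear()
    meta.outputs.clear()
    meta.runtime_connections.clear()
    meta.custom_properties.clear()

    # 2. Add one input and one output; pointing the output at the input
    #    marks the pair as synchronous in this model.
    meta.inputs.append({"id": 1, "name": "Input"})
    meta.outputs.append({"id": 2, "name": "Output", "synchronous_input_id": 1})

    # 3. Reserve a placeholder for the Connection Manager assigned later.
    meta.runtime_connections.append({"name": "Connection", "manager": None})

    # 4. Expose a user-configurable custom property with a default value.
    meta.custom_properties["BatchSize"] = 100


# A component that already carries stale settings is reset and rebuilt.
meta = ComponentMetadata(inputs=[{"id": 99, "name": "Stale"}])
provide_component_properties(meta)
```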

Validate

Validate is called numerous times during the lifetime of the component, both at design-time and at runtime, but the most interesting work is usually the result of a design-time call. As the name suggests, it validates that the content of the component is correct and will enable you to at least run the package. If the validation encounters a problem, then the return code used is important to determine any further actions, such as calling ReinitializeMetaData. The base class version of Validate performs its own checks in the component, and you will need to extend it further in order to cover your specific needs. Validate should not be used to change the component at all; it should only report the problems it finds.

ReinitializeMetaData

The ReinitializeMetaData method is where all the building work for your component is done. You add new columns, remove invalid columns, and generally build up the columns. It is called when the Validate method returns VS_NEEDSNEWMETADATA. It is also your opportunity in the component to do any repairs that need to be done, particularly around invalid columns as mentioned previously.
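
The division of labor between Validate and ReinitializeMetaData might look like this in outline. The Python below is a sketch under assumptions: SourceComponent and its attributes are hypothetical, and the enum only borrows the status names used in the text (with VS_ISVALID assumed as the success value).

```python
from enum import Enum


class ValidationStatus(Enum):
    # Names modeled on the status codes discussed in the text.
    VS_ISVALID = 0
    VS_NEEDSNEWMETADATA = 1


class SourceComponent:
    def __init__(self, external_columns):
        self.external_columns = list(external_columns)  # what the source offers
        self.output_columns = []                        # what the component exposes

    def validate(self) -> ValidationStatus:
        # Report only; never repair anything here.
        if self.output_columns != self.external_columns:
            return ValidationStatus.VS_NEEDSNEWMETADATA
        return ValidationStatus.VS_ISVALID

    def reinitialize_metadata(self) -> None:
        # Rebuild the column list: invalid columns are dropped, new ones added.
        self.output_columns = list(self.external_columns)


component = SourceComponent(["CustomerID", "Name"])
first = component.validate()
if first is ValidationStatus.VS_NEEDSNEWMETADATA:
    component.reinitialize_metadata()
second = component.validate()
```

Keeping validate free of side effects means the designer decides when the repair runs, based on the returned status.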

MapInputColumn and MapOutputColumn

These methods are used to create a relationship between an input/output column and an external metadata column. An external metadata column is an offline representation of an output or input column and can be used by downstream components to create an input. It allows you to validate and maintain columns even when the data source is not available. It is not required, but it makes the user experience better. If the component declares that it will be using External Metadata (IDTSComponentMetaData90.ValidateExternalMetadata), then the user in the advanced UI will see upstream columns to the left and the external columns on the right; if you are validating your component against an output, you will see the checked listbox of columns.

Input and Output Verification Methods

There are several methods you can use to deal with inputs and outputs. The three functions you may need to perform are adding, deleting, and setting a custom property. The method names clearly indicate their functions:

InsertInput
DeleteInput
SetInputProperty
InsertOutput
DeleteOutput
SetOutputProperty

For most components, the inputs and outputs will have been configured during ProvideComponentProperties, so unless you expect a user to add additional inputs and outputs and fully support this, you should override these methods and fire an error to prevent this. Similarly, unless you support additions, you would also want to deny deletions by overriding the corresponding methods. Properties can be checked for validity during the Set methods as well.
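
As a rough illustration of denying additions and deletions, consider the Python model below. It is not the .NET signature of these methods; FixedLayoutComponent, DesignTimeError, and the Timeout property are hypothetical names used only to show the pattern.

```python
class DesignTimeError(Exception):
    """Hypothetical error type that an editor could catch and display."""


class FixedLayoutComponent:
    """Component whose single input was created during setup."""

    def __init__(self):
        self.inputs = {1: "Input"}

    def insert_input(self, name: str) -> int:
        # Additional inputs are not supported, so refuse the request.
        raise DesignTimeError("Adding inputs is not supported.")

    def delete_input(self, input_id: int) -> None:
        # Deletions are denied for the same reason.
        raise DesignTimeError("Deleting inputs is not supported.")

    def set_input_property(self, input_id: int, name: str, value):
        # Properties can be checked for validity before they are accepted.
        if name == "Timeout" and (not isinstance(value, int) or value < 0):
            raise DesignTimeError("Timeout must be a non-negative integer.")
        return value


component = FixedLayoutComponent()
try:
    component.insert_input("Extra")
    insert_rejected = False
except DesignTimeError:
    insert_rejected = True

accepted_timeout = component.set_input_property(1, "Timeout", 30)
```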

Set Column Data Types

There are two methods used to set column data types: one for output columns and the other for external metadata columns. There is no input column equivalent, as the data types of input columns are determined by the upstream component.

SetOutputColumnDataTypeProperties
SetExternalMetadataColumnDataTypeProperties

These are verification methods that can be used to validate or prevent changes to a column. For example, in a source component, you would normally define the columns and their data types within ReinitializeMetaData. You could then override SetOutputColumnDataTypeProperties, and by comparing the method's supplied data types to the existing column, you could prevent data type changes but allow length changes.
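
The check described here, refusing a type change while accepting a length change, can be sketched as below. This is a simplified Python model: the real methods take more parameters than shown, and OutputColumn is a hypothetical stand-in.

```python
from dataclasses import dataclass


@dataclass
class OutputColumn:
    name: str
    data_type: str
    length: int


def set_output_column_data_type(column: OutputColumn, data_type: str, length: int) -> None:
    # Reject any attempt to change the type fixed when the columns were built.
    if data_type != column.data_type:
        raise ValueError(f"Data type of '{column.name}' cannot be changed.")
    # A length change on its own is allowed.
    column.length = length


column = OutputColumn("Name", "DT_WSTR", 50)
set_output_column_data_type(column, "DT_WSTR", 100)   # accepted
try:
    set_output_column_data_type(column, "DT_I4", 0)   # rejected
    type_change_rejected = False
except ValueError:
    type_change_rejected = True
```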

There is quite a complex relationship between all of the parameters for these methods; please refer to the SQL Server documentation when using these methods yourself.

PerformUpgrade

This method should allow you to take a new version of the component and update an existing version of the component on the destination machine.

RegisterEvents

This method allows you to register custom events in a pipeline component. You can therefore have an event fire on something happening at runtime in the package. This is then eligible to be logged in the package log.

RegisterLogEntries

This method decides which of the new custom events are going to be registered and selectable in the package log.

SetComponentProperty

In the ProvideComponentProperties method, you told the component about any custom properties that you would like to expose to the user of the component and perhaps allow them to set. This is a verification method, and here you can check what it is that the user has entered for which custom property on the component and ensure that the values are valid.

Set Column Properties

There are three column property methods, each allowing you to set a property for the relevant column type.

SetInputColumnProperty
SetOutputColumnProperty
SetExternalMetadataColumnProperty

These are all verification methods and should be used accordingly. For example, you may set a column property during ReinitializeMetaData, and to prevent a user interfering with this, you could examine the property name and throw an exception if it is a restricted property, in effect making it read-only.

Similarly, if several properties are used in conjunction with each other at runtime to provide direction on the operation to be performed, you could enumerate all column properties to ensure that those related properties exist and have suitable values. You could assign a default value if a value is not present or raise an exception depending on the exact situation.
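
Both checks, a read-only property and related properties with defaults, can be combined in one small model. The property names below (SourceColumnId, TrimMode, PadChar) are invented for illustration, and the Python function is a sketch rather than the real method signature.

```python
READ_ONLY = {"SourceColumnId"}                   # set by the component itself
DEFAULTS = {"TrimMode": "none", "PadChar": " "}  # related properties used together


def set_column_property(properties: dict, name: str, value) -> None:
    if name in READ_ONLY:
        raise PermissionError(f"Property '{name}' is read-only.")
    properties[name] = value
    # Make sure every related property exists, filling in defaults if not.
    for related, default in DEFAULTS.items():
        properties.setdefault(related, default)


props = {"SourceColumnId": 7}
set_column_property(props, "TrimMode", "both")

try:
    set_column_property(props, "SourceColumnId", 8)
    restricted_rejected = False
except PermissionError:
    restricted_rejected = True
```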

For an external metadata column, which will be mapped to an input or output column, any property set directly on this external metadata column can be cascaded down onto the corresponding Input or Output column through this overridden function.

SetUsageType

This method deals with the columns on inputs into the component. In a nutshell, you use it to select a column and to tell the component how you will treat each column. What you see coming into this method is the Virtual Input. What this means is that it is a representation of what is available for selection to be used by your component. These are the three possible usage types for a column:

DTSUsageType.UT_IGNORED: The column will not be used by the component. Setting this usage type removes the InputColumn from the InputColumnCollection. The other two usage types do the opposite: they add a reference to the InputColumn to the InputColumnCollection if it does not already exist, or change the Read/Write property of an existing one.
DTSUsageType.UT_READONLY: The column is read-only. The column is selected, and data can be read and used within the component but cannot be modified.
DTSUsageType.UT_READWRITE: The column is selected, and you can both read and write or change the data within your component.

This is another of the verification methods, and you should use it to ensure that the columns selected are valid. For example, the Reverse String sample shown below can operate only on string columns, so you must check that the data type of the input column is DT_STR for string or DT_WSTR for Unicode strings. Similarly, the component performs an in-place change, so the usage type must be read/write. Setting it to read-only would cause an exception during execution when you tried to write the changed data back to the pipeline buffer. Therefore you want to validate the columns as they are selected to ensure that they meet the requirements for your component design.
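
The two checks described above, string data type and read/write usage, can be modeled like this. The Python function is a simplified sketch that passes plain strings for the type and usage names; the real method works with the virtual input and a lineage ID instead.

```python
STRING_TYPES = {"DT_STR", "DT_WSTR"}


def set_usage_type(selected: dict, name: str, data_type: str, usage: str) -> None:
    """Validate a column selection before recording it in `selected`."""
    if usage == "UT_IGNORED":
        # Deselecting a column removes it from the selected columns.
        selected.pop(name, None)
        return
    if data_type not in STRING_TYPES:
        raise ValueError(f"Column '{name}' must be DT_STR or DT_WSTR.")
    if usage != "UT_READWRITE":
        raise ValueError(f"Column '{name}' must be selected as read/write.")
    selected[name] = usage


selected = {}
set_usage_type(selected, "Name", "DT_WSTR", "UT_READWRITE")   # accepted

rejected = []
for name, data_type, usage in [("Age", "DT_I4", "UT_READWRITE"),
                               ("City", "DT_STR", "UT_READONLY")]:
    try:
        set_usage_type(selected, name, data_type, usage)
    except ValueError:
        rejected.append(name)

after_select = dict(selected)
set_usage_type(selected, "Name", "DT_WSTR", "UT_IGNORED")     # deselected again
```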

On Path Attachment

There are three closely related path attachment methods, called when the named events occur, and the first two in particular can be used to improve the user experience:

OnInputPathAttached
OnOutputPathAttached
OnInputPathDetached

These methods exist to handle situations where the inputs or outputs are all identical and interchangeable. The Multicast transform is the classic example: when you attach a path to the dangling output, another dangling output is created; when you detach it, the output is deleted.

Runtime

Runtime, also known as execution-time, is when you actually work with the data, through the pipeline buffer, with columns and rows of data. The following methods are all about preparing the component, doing the job it was designed for, and then cleaning up afterward.

PrepareForExecute

This method is rather like the PreExecute method below and can be used for setting up anything in the component that you will need at runtime. The difference is that you do not have access to the Buffer Manager, so you cannot get your hands on the columns in either the output or the input at this stage. The distinction between the two is very fine apart from that, so usually you will end up using PreExecute exclusively, as you will need access to the Buffer Manager anyway.

PreExecute

PreExecute is called once and once only in the component, and Microsoft recommends that you do as much preparation as possible for the execution of your component in this method. Typically, you use it to enumerate the columns, reading off values and properties, calling methods to get more information, and generally gathering all the information you require in advance. This information is stored in member variables, which is faster than creating objects for every row during the real execution. This is the earliest point at which you can access the component's Buffer Manager, so you have the live context of columns, as opposed to the design-time representation. The live and design-time representations of columns may not match; the design-time representation may contain information that you do not need at runtime. Do your column preparation here because PreExecute is called only once per component execution, unlike some of the other runtime methods, which are called multiple times.
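
The benefit of preparing once and reusing the result per row can be shown with a small model. In this Python sketch the buffer layout is just a list of column names and a row is a list of values; ReverseTransform and its methods are hypothetical, loosely echoing the string-reversing example mentioned earlier.

```python
class ReverseTransform:
    def __init__(self, selected_columns):
        self.selected_columns = selected_columns
        self.column_indexes = []

    def pre_execute(self, buffer_layout):
        # Runs once: resolve column names to buffer positions up front.
        self.column_indexes = [buffer_layout.index(name)
                               for name in self.selected_columns]

    def process_row(self, row):
        # Runs per row: only cheap index lookups remain.
        for i in self.column_indexes:
            row[i] = row[i][::-1]


transform = ReverseTransform(["Name"])
transform.pre_execute(["ID", "Name", "City"])

rows = [[1, "Anna", "Oslo"], [2, "Bob", "Rome"]]
for row in rows:
    transform.process_row(row)
```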

PrimeOutput and ProcessInput

These two methods are dealt with together because they are so closely linked that to deal with them any other way would be disjointed. These two methods are essentially how the data flows through components. Sometimes you use only one of them, and sometimes you use both. There are some rules you can follow.

In a source adapter, the ProcessInput method is never called, and all of the work is done through PrimeOutput. In a destination adapter, it is the opposite way around. The PrimeOutput method is never called, and the whole of the work is done through the ProcessInput method.

Things are not quite that simple with a transform. There are two types of transforms, and the type of transform you are writing will dictate which method, or indeed methods, your component should implement.

Synchronous: PrimeOutput is not called and therefore all the work is done in the ProcessInput method. The buffer Lineage IDs remain the same. For a detailed explanation of buffers and Lineage IDs, please refer to Chapter 10.
Asynchronous: Both methods are called here. The only difference really between a synchronous and an asynchronous component is that the asynchronous component does not reuse the input buffer. The PrimeOutput method hands the ProcessInput method a buffer to fill with its data.
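
The buffer handling of the two transform types can be contrasted in a simplified model. Here a buffer is a plain Python list of row dictionaries and the function names are hypothetical; the sketch shows only that a synchronous transform reuses the input buffer while an asynchronous one fills a separate output buffer.

```python
def synchronous_upper(buffer):
    """ProcessInput-style work only: rows are changed in the input buffer."""
    for row in buffer:
        row["name"] = row["name"].upper()
    return buffer  # the same buffer object continues downstream


def asynchronous_count(input_buffer, output_buffer):
    """Input rows are consumed; new rows are written to a separate buffer."""
    counts = {}
    for row in input_buffer:
        counts[row["name"]] = counts.get(row["name"], 0) + 1
    for name, count in counts.items():
        output_buffer.append({"name": name, "count": count})


rows = [{"name": "a"}, {"name": "b"}, {"name": "a"}]
same = synchronous_upper(rows)

out = []  # stands in for the output buffer handed over by PrimeOutput
asynchronous_count(rows, out)
```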

PostExecute

This method is where you clean up anything that you started in PreExecute, although it is not limited to just that. After reading the description of the Cleanup method that follows, you may wonder about the difference between that method and this one. The answer is, for this release, nothing. If you want to think about this logically, then PostExecute is paired with PreExecute, and Cleanup with PrepareForExecute.

Cleanup

As the method name suggests, this is called as the very last thing your component will do, and it is your chance to clean up whatever resources may be left. However, it is rarely used, like PostExecute.

DescribeRedirectedErrorCode

If you are using an error output and directing rows down there in case of errors, then you should override this method to give more information about the error. When you direct a row to the error output, you specify an error code. This method will be called by the pipeline engine, passing in that error code, and it is expected to return a full error description string for the code specified. These two values are then included in the columns of the error output.
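
At its simplest, the method is a lookup from code to description. The Python sketch below assumes two made-up error codes and messages; a real component would define codes that match the errors it actually redirects.

```python
# Hypothetical error codes and messages, for illustration only.
ERROR_DESCRIPTIONS = {
    1: "Value could not be converted to the target data type.",
    2: "Value was truncated.",
}


def describe_redirected_error_code(error_code: int) -> str:
    # Fall back to a generic message for codes the component does not know.
    return ERROR_DESCRIPTIONS.get(error_code, f"Unknown error code {error_code}.")


description = describe_redirected_error_code(2)
fallback = describe_redirected_error_code(99)
```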

Connection Time

These two methods are called several times throughout the life cycle of a component, both at design-time and at runtime, and are used to manage connections within the component.

AcquireConnections

This method is called both at design-time and when the component executes. There is no explicit result, but the connection is normally validated and then cached in a member variable within the component for later use. At this stage, a connection should be open and ready to use.

ReleaseConnections

If you have any open connections, as set in the AcquireConnections method, then this is where they should be closed and released. If the connection was cached in a member variable, use that reference to issue any appropriate Close or Dispose methods. For some connections, such as a File Connection Manager, this may not be relevant as all that was returned was a file path string, but if you took this a stage further and opened a text stream or similar on the file, it should now be closed.
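
The acquire, cache, and release pattern can be sketched with any connection type. The Python below uses an in-memory SQLite database purely as a stand-in for a Connection Manager; ConnectedComponent and its method names are hypothetical.

```python
import sqlite3


class ConnectedComponent:
    def __init__(self, database: str):
        self.database = database
        self.connection = None  # member variable used as the cache

    def acquire_connections(self) -> None:
        # Open and sanity-check the connection, then cache it for later use.
        connection = sqlite3.connect(self.database)
        connection.execute("SELECT 1")
        self.connection = connection

    def release_connections(self) -> None:
        # Close whatever was cached and drop the reference.
        if self.connection is not None:
            self.connection.close()
            self.connection = None


component = ConnectedComponent(":memory:")
component.acquire_connections()
value = component.connection.execute("SELECT 1").fetchone()[0]
component.release_connections()
```

Guarding release_connections with a None check makes it safe to call even when no connection was acquired.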