The Task and the Taskhost Container

By now, you should have a pretty good idea how containers look and act. Each container shares some properties with the other container types, yet provides some unique behavior. Containers, including the Package, Event Handler, and Loop containers, are essentially derivations of the Sequence container; they all execute a series of tasks in some order dictated by precedence constraints. The Taskhost is somewhat different. It always contains exactly one task.

The Taskhost is a very dedicated sort of container. Taskhosts live and die with the tasks they contain. Every instance of a task has a Taskhost surrounding it, servicing it, protecting it, and controlling it. When the task is deleted, the Taskhost goes with it. Yet, when viewed within the designer, the task is all you see, whereas the Taskhost is mostly invisible.

Properties Collection

In the bowels of the Taskhost code is a bit of logic that interrogates the task to discover what properties the task has. The Taskhost then builds a collection of property objects for each of the properties on the task. The properties collection is called the properties provider. This is how the designer discovers properties on the task. It's how property expressions and configurations are able to modify the values of task properties. If you ever write code to embed Integration Services into your application, using the properties provider is the easiest way to set property values on tasks. Figure 7.6 is a conceptual illustration of the relationship between tasks and Taskhosts.

Figure 7.6. `Taskhosts` have property object collections
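As a rough illustration of the idea, the following Python sketch models a host that interrogates a task and builds a property collection from what it finds. It is a conceptual toy, not the Integration Services object model: the `SendMailTask` class, its two properties, and the `TaskHost` methods are all hypothetical names invented for this example.

```python
class SendMailTask:
    """Hypothetical task with two simple typed properties."""

    def __init__(self):
        self.subject = "Nightly load"
        self.retry_count = 3


class TaskHost:
    """Toy host that wraps exactly one task and exposes its properties."""

    def __init__(self, task):
        self._task = task
        # Interrogate the task once and record each public instance
        # attribute along with the name of its type.
        self.properties = {
            name: type(value).__name__
            for name, value in vars(task).items()
            if not name.startswith("_")
        }

    def get_value(self, name):
        if name not in self.properties:
            raise KeyError(name)
        return getattr(self._task, name)

    def set_value(self, name, value):
        if name not in self.properties:
            raise KeyError(name)
        setattr(self._task, name, value)


host = TaskHost(SendMailTask())
host.set_value("retry_count", 5)      # set a task property through the host
print(sorted(host.properties))
print(host.get_value("retry_count"))
```

The caller never touches the task directly. Everything goes through the host's collection, which is the same shape of indirection that lets a designer, a configuration, or a property expression work with any task without knowing its type.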

Persistence

Tasks can be written so that they save and load their own property values. However, for tasks with simple typed properties, the Taskhost can save and load the properties for the task, relieving the custom task writer from the chore of writing persistence code. Custom task writers need only create properties on their tasks and the Taskhost takes care of the rest through the properties provider and supporting persistence logic.

Package Paths and Configurations

When you create a configuration that changes the value of a property on a task, you're using the Taskhost. Each configuration has a value and a path, called a package path. The path is submitted to the package, which parses it up to the portion that points to a child container or property and then hands the remainder to that child. This process is repeated until the final object in the package path is found. If the package path points to a property on a task, the Taskhost sets the task property to the value in the configuration.
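The step-by-step resolution described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the path grammar handled here (backslash-separated container names ending in `.Properties[Name]`) is a stand-in rather than a complete description of real package paths, and the `Container` class, the `apply_configuration` function, and the `Staging`, `Load Table`, and `Timeout` names are hypothetical.

```python
import re

# One leading "\Name" segment followed by whatever is left of the path.
_SEGMENT = re.compile(r"\\(?P<child>[^\\.\[\]]+)(?P<rest>.*)")
# The final ".Properties[Name]" portion of a path.
_PROPERTY = re.compile(r"\.Properties\[(?P<name>[^\]]+)\]")


class Container:
    """Toy container holding named children and plain properties."""

    def __init__(self, name, children=(), **properties):
        self.name = name
        self.children = {child.name: child for child in children}
        self.properties = dict(properties)

    def apply(self, path, value):
        """Consume this container's part of the path, then delegate the rest."""
        match = _PROPERTY.fullmatch(path)
        if match:
            # The path ends here: set the property on this object.
            if match["name"] not in self.properties:
                raise KeyError(match["name"])
            self.properties[match["name"]] = value
            return
        match = _SEGMENT.fullmatch(path)
        if not match:
            raise ValueError(f"cannot parse path: {path!r}")
        # Hand the remainder of the path to the named child.
        self.children[match["child"]].apply(match["rest"], value)


def apply_configuration(package, package_path, value):
    """Strip the leading package segment and start resolution at the package."""
    prefix = "\\" + package.name
    if not package_path.startswith(prefix):
        raise ValueError("path does not start at the package")
    package.apply(package_path[len(prefix):], value)


load = Container("Load Table", Timeout=30)
staging = Container("Staging", children=[load])
package = Container("Package", children=[staging])

apply_configuration(package, r"\Package\Staging\Load Table.Properties[Timeout]", 120)
print(load.properties["Timeout"])
```

Each object only understands its own segment of the path, so no single piece of code needs to know the full shape of the package.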

Debugging Features

Many of the debugging features are implemented at the container level, as follows.

Breakpoints

The ability to add breakpoints to tasks is one of the more powerful features available in Integration Services and is made possible by the Taskhost. Tasks set up custom breakpoints when the runtime calls their Initialize method by adding them to a special breakpoints collection. During package execution, the Taskhost interacts with the task to indicate when a breakpoint is hit and when the task should continue to execute.

A set of default breakpoints is available on every task as well. The Taskhost exposes these breakpoints by default, even for custom tasks that don't support breakpoints. Figure 7.7 illustrates how the Taskhost provides breakpoints for these standard events, even when the task doesn't support them.

Figure 7.7. `Taskhosts` provide a standard set of breakpoints
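To make the division of labor concrete, here is a minimal Python sketch of a host that offers breakpoints on standard events around a task that knows nothing about breakpoints. It is a conceptual model only. The `TaskHost` class and its methods are hypothetical, and the two event names stand in for the full standard set.

```python
class TaskHost:
    """Toy host that offers breakpoints on standard events for any task."""

    STANDARD_EVENTS = ("OnPreExecute", "OnPostExecute")

    def __init__(self, task):
        self.task = task
        self.enabled = set()   # events the user has put a breakpoint on
        self.hits = []         # breakpoints hit during execution

    def set_breakpoint(self, event):
        if event not in self.STANDARD_EVENTS:
            raise ValueError(f"unknown event: {event}")
        self.enabled.add(event)

    def _raise_event(self, event, on_break):
        if event in self.enabled:
            self.hits.append(event)
            on_break(event)    # a debugger would pause here until resumed

    def execute(self, on_break=lambda event: None):
        self._raise_event("OnPreExecute", on_break)
        result = self.task()   # the task itself is unaware of breakpoints
        self._raise_event("OnPostExecute", on_break)
        return result


host = TaskHost(lambda: "Success")
host.set_breakpoint("OnPostExecute")
result = host.execute()
print(result, host.hits)
```

Because the host raises the events, the task needs no breakpoint code at all to get this behavior.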

Disabling Control Flow

Occasionally when you're debugging a package, it's nice to remove tasks and simplify and eliminate some of the noise from the package to better understand what's happening, or what's going wrong. But, chances are, you've put a lot of work into the settings for each task. Deleting them means you'd have to add and reconfigure them all over again. It would be nice if you could just simulate deleting the tasks until you can troubleshoot the package, and then just turn them all back on again. This is the idea behind disabling tasks. Figure 7.8 illustrates how the Taskhost implements disabling.

Figure 7.8. `Taskhosts` implement disabling of tasks
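The following Python sketch shows one way a host could implement disabling: the task and its settings stay in place, but the host never calls it. All names here are hypothetical, and reporting success for a skipped task is an assumption of this sketch, chosen so that the rest of the workflow continues.

```python
class TaskHost:
    """Toy host that can skip its task without discarding it."""

    def __init__(self, name, task, disabled=False):
        self.name = name
        self.task = task
        self.disabled = disabled

    def execute(self, trace):
        if self.disabled:
            # Assumption: a disabled task is skipped and reported as
            # successful so that downstream tasks still run.
            trace.append(f"{self.name}: skipped")
            return "Success"
        trace.append(f"{self.name}: executed")
        return self.task()


def run_in_order(hosts):
    """Run hosts one after another, stopping at the first failure."""
    trace = []
    for host in hosts:
        if host.execute(trace) != "Success":
            break
    return trace


hosts = [
    TaskHost("Extract", lambda: "Success"),
    TaskHost("Cleanup", lambda: "Failure", disabled=True),
    TaskHost("Load", lambda: "Success"),
]
trace = run_in_order(hosts)
print(trace)
```

Setting `disabled` back to `False` restores the task exactly as it was configured, which is the point of disabling rather than deleting.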

Custom Registration

Tasks can create their own custom breakpoints and Log events.

Breakpoints

As noted in the "Debugging Features" section, tasks can provide custom breakpoints. When a task is created, either by dropping it onto the designer or programmatically, the runtime gives a task that supports custom breakpoints a way to add them to the standard ones. Although supporting custom breakpoints is not required, for complex or long-running tasks they give package writers an easy way to peek into the behavior of the task as it runs. Custom breakpoints are covered in greater detail in Chapter 24, "Building Custom Tasks."

Log Events

The Taskhost provides default logging that logs the Execute, Validate, and other events. Tasks can also send information to the logs. Figure 7.9 shows the Configure SSIS Logs dialog box in the designer. Using this dialog box, package writers can filter events and the information they want to log for each event. Each item in the list (shown in Figure 7.9) is a log event.

Figure 7.9. Tasks can support custom log events

To make it possible for package writers to filter custom events, it's necessary for tasks to describe to the Taskhost what custom log events they will log. To do this, the Taskhost checks to see if the task supports custom logging and then retrieves from the task the names and descriptions of all the custom log events the task will log. In Figure 7.9, the custom log events are FTPOperation and FTPConnectingToServer.
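The exchange between host and task can be sketched as follows. This Python model is illustrative only: the `log_entry_infos` method name, the event descriptions, and the `TaskHost` class are hypothetical. Only the two event names, FTPOperation and FTPConnectingToServer, come from the figure.

```python
class FtpTask:
    """Hypothetical task that declares the custom log events it may write."""

    def log_entry_infos(self):
        # Name -> description; the descriptions are placeholders.
        return {
            "FTPConnectingToServer": "Written when the task connects to a server.",
            "FTPOperation": "Written when the task starts an FTP operation.",
        }

    def execute(self, log):
        log("FTPConnectingToServer", "connecting")
        log("FTPOperation", "receiving files")
        return "Success"


class TaskHost:
    """Toy host that merges default and custom log events and filters them."""

    DEFAULT_EVENTS = {
        "OnPreExecute": "Written before the task executes.",
        "OnPostExecute": "Written after the task executes.",
    }

    def __init__(self, task):
        self.task = task
        # Ask the task for its custom events only if it supports them.
        custom = task.log_entry_infos() if hasattr(task, "log_entry_infos") else {}
        self.available_events = {**self.DEFAULT_EVENTS, **custom}
        self.selected_events = set()   # what the package writer chose to log
        self.log = []

    def _write(self, event, message):
        if event in self.selected_events:
            self.log.append((event, message))

    def execute(self):
        self._write("OnPreExecute", "")
        result = self.task.execute(self._write)
        self._write("OnPostExecute", "")
        return result


host = TaskHost(FtpTask())
host.selected_events = {"FTPOperation"}
host.execute()
print(sorted(host.available_events))
print(host.log)
```

Because the task describes its events up front, the host can list them for filtering before the task ever runs.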

Custom Events

Like custom log events, tasks can also describe custom execution events they may raise. This makes it possible to create event handler containers for custom events in the same way that you can create event handlers for stock events like OnPreExecute and OnVariableValueChanged. In this way, it's actually possible to build Integration Services workflow that executes as a direct result of an arbitrary event raised from a custom task.

Contact Information and Graceful Load Failover

Graceful load failover and contact information are two features aimed at reducing the occurrence of loading problems or, at least, simplifying their resolution. Graceful load failover is what the object model does when a component that is referenced in a package is not installed on the machine. When the package attempts to load a component such as a transform or a task, it checks to see if it succeeded. If not, it captures the failure and emits an error but continues to load the remainder of the package.

Contact information is a feature that allows components to describe themselves. Look at the XML for any package with stock tasks in it and you'll see the contact information for the task as part of the task's XML. If the task fails to load, the Taskhost retrieves the contact information for the component and builds an error with the contact information embedded. The following is the contact information property node for the Execute SQL Task:

<DTS:Property DTS:Name="TaskContact">  Execute SQL Task; Microsoft Corporation; Microsoft SQL Server v9;  © 2004 Microsoft Corporation; All Rights Reserved;  http://www.microsoft.com/sql/support/default.asp;1 </DTS:Property>

Because of the graceful load failover feature, the package still successfully loads, but the Taskhost emits an error and creates a placeholder task in the place of the missing or uninstalled task. The designer shows the placeholder task in the same location in the package as where the uninstalled task was, and the Taskhost retrieves the contact information for the task and displays it as part of the error. This gives the package user a way to find out what task is missing and how to correct it. It also presents the user with an opportunity to correct the package by either deleting the task and repairing the package or reinstalling the missing task.

Note

Although contact information is simply a string and can contain any information you want to supply, Microsoft recommends that the following format be used for contact information:

Task Name;Company Name;Product Name;Copyright;Component Webpage

To see graceful load failover and contact information work, try this:

Build a package with an Execute SQL Task.
You don't need to modify it, just drag and drop it on the designer for a new package.
Save the package and close it.
Open the package by right-clicking on it in the Solution Explorer and selecting View Code.
Find the SQL Task node.
Change the creation name of the SQL Task node in some way to make it invalid. For example, you could add two periods where there should be one.
Save the package in the Code View.
Restart the designer.
Now, attempt to load the package into the designer again.

The package looks the same. The task is there, but you should also receive an error message telling you that there were errors loading the task and that the package didn't load successfully. What's happening under the covers? In this case, the Taskhost gets as much information about the task as possible and creates something that looks like the SQL Task. Try to run the package and it fails, but, at least, now with graceful load failover, you know that there was a SQL Task there and that it's invalid. The error looks similar to the following with some of the specifics such as package name and so on changed:

    Error 1 Error loading FactCheck.dtsx: Error loading a task. The contact information for the task is "Execute SQL Task; Microsoft Corporation; Microsoft SQL Server v9; © 2004 Microsoft Corporation; All Rights Reserved;http://www.microsoft.com/sql/support/default.asp;1". This happens when loading a task fails.

Note

Graceful load failover is only implemented for tasks and data flow components. Contact information is implemented for tasks, Foreach Enumerators, log providers, connection managers, and data flow components.
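The load-time behavior described in this section can be modeled in a few lines. The Python sketch below is a conceptual illustration, not the actual loader: the `INSTALLED_TASKS` registry, the creation name strings, the node dictionaries, and the class names are all hypothetical.

```python
class ExecuteSqlTask:
    def __init__(self, name):
        self.name = name


class PlaceholderTask:
    """Stands in for a task that could not be created."""

    def __init__(self, name, contact):
        self.name = name
        self.contact = contact


# Hypothetical registry mapping creation names to installed task classes.
INSTALLED_TASKS = {"ExecuteSQLTask": ExecuteSqlTask}


def load_task(node, errors):
    factory = INSTALLED_TASKS.get(node["creation_name"])
    if factory is None:
        # Capture the failure, report it with the contact information,
        # and keep going with a placeholder in the task's position.
        errors.append(
            "Error loading a task. The contact information for the task "
            f'is "{node["contact"]}".'
        )
        return PlaceholderTask(node["name"], node["contact"])
    return factory(node["name"])


def load_package(nodes):
    errors = []
    tasks = [load_task(node, errors) for node in nodes]
    return tasks, errors


nodes = [
    {"name": "Create Table", "creation_name": "ExecuteSQLTask",
     "contact": "Execute SQL Task; Microsoft Corporation"},
    {"name": "Insert Row", "creation_name": "Execute..SQLTask",   # invalid
     "contact": "Execute SQL Task; Microsoft Corporation"},
]
tasks, errors = load_package(nodes)
print([type(task).__name__ for task in tasks])
print(errors)
```

The loader returns every task position it was given, so the rest of the package remains visible and editable even though one task failed to load.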

Isolation

The Taskhost isolates the task. This means that it only provides as much information to the task as is absolutely necessary, but no more. This isolation keeps the task from doing things such as traversing the package object model and making changes to other objects and tasks. Likely, the benefits of isolating tasks are not immediately obvious to the new Integration Services user. After all, DTS allowed tasks access to the object model of the package. What's so bad about that?

The problems with that are legion. Tasks that modify the object model:

Cannot be migrated or upgraded
Cannot be sandboxed or otherwise executed in alternative locations
Can modify other tasks or objects and cause bugs that are difficult to debug and fix
Can break execution time behavior of the runtime
Can create race conditions that the runtime cannot detect or resolve
Can access variables unsafely causing inconsistent or incorrect values
Might cause crashes or other severe behavior due to invalid object access
Might cause memory corruption or leaks by breaking COM reference counting, apartment, or Managed/Native code interop rules

The point here is that the guiding philosophy from the start of Integration Services has been that tasks should not be in the business of controlling workflow or anything like it. Tasks do the work; the runtime provides management, scheduling, variable access, cancellation, and all the rest. Tasks should only be provided as much information as is needed for them to successfully execute and communicate their status to the outside world, the object model, and they do that through the Taskhost. Basically, the Taskhost provides a way to ensure that tasks "keep their hands to themselves" in the Integration Services runtime sandbox.
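One way to picture isolation is as a narrow facade that the host hands to the task in place of the package itself. The Python sketch below is a conceptual model with hypothetical names throughout (`TaskContext`, `read_variable`, `fire_information`, `SourcePath`). Python cannot truly enforce the boundary, so the sketch shows the shape of the design and not a security mechanism.

```python
class TaskContext:
    """Narrow facade handed to a task; it holds no reference to the package."""

    def __init__(self, variables, events):
        self._variables = dict(variables)   # a copy; writes never reach the package
        self._events = events

    def read_variable(self, name):
        return self._variables[name]

    def fire_information(self, message):
        self._events.append(message)


class TaskHost:
    """Toy host that gives its task only what the task needs to run."""

    def __init__(self, task):
        self.task = task

    def execute(self, package_variables, events):
        context = TaskContext(package_variables, events)
        return self.task(context)


def copy_file_task(context):
    # The task sees variables and an event sink, nothing else.
    source = context.read_variable("SourcePath")
    context.fire_information(f"copying {source}")
    return "Success"


events = []
result = TaskHost(copy_file_task).execute({"SourcePath": "in.csv"}, events)
print(result, events)
```

The task can do its work and report status, but it is given no handle through which to reach sibling tasks, precedence constraints, or the package.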

Sensitive Data Protection

Another service Taskhosts provide is sensitive data protection. Some tasks have properties that contain sensitive data such as passwords. Integration Services provides overarching security settings for protecting sensitive data and the Taskhost supports those settings. To enable this, tasks mark their sensitive properties with a Sensitive=1 attribute when saving. The Taskhost detects this setting and applies the appropriate action in memory, either stripping out the value or encrypting it based on the global security setting, and then writes it out to the package file.
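The save-time decision can be sketched as a small function. This Python model is illustrative only: the `save_properties` function, the `"strip"` and `"encrypt"` setting names, and the output dictionary layout are hypothetical, and `fake_encrypt` is a marker function that performs no real encryption.

```python
def save_properties(properties, sensitive, protection, encrypt=None):
    """Return what would be written out for each property.

    properties: name -> value as held by the task
    sensitive:  names the task has marked as sensitive
    protection: "strip" or "encrypt" (hypothetical setting names)
    encrypt:    callable used when protection is "encrypt"
    """
    saved = {}
    for name, value in properties.items():
        if name not in sensitive:
            saved[name] = {"value": value}
        elif protection == "strip":
            # Drop the value but keep the marker so loaders know it was there.
            saved[name] = {"value": "", "Sensitive": 1}
        elif protection == "encrypt":
            saved[name] = {"value": encrypt(value), "Sensitive": 1}
        else:
            raise ValueError(f"unknown protection setting: {protection}")
    return saved


def fake_encrypt(value):
    # NOT real encryption: a stand-in so the sketch stays self-contained.
    return "ENC(" + "*" * len(value) + ")"


props = {"Server": "ftp.example.com", "Password": "hunter2"}
stripped = save_properties(props, {"Password"}, "strip")
encrypted = save_properties(props, {"Password"}, "encrypt", fake_encrypt)
print(stripped["Password"])
print(encrypted["Password"])
```

The task only has to say which properties are sensitive. The policy for what happens to them lives in one place, outside the task.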

In summary, the Taskhost is a virtually invisible container that performs much of the work that makes tasks easy to develop, configure, and maintain. It provides default behavior for tasks; prevents tasks from doing harmful operations; and provides configurations, property expressions, properties discovery, and many other crucial features on behalf of tasks.

The Simple `Sequence` Container

The Sequence container is very simple and perhaps often overlooked. To be sure, not everyone or every situation requires its use, but the Sequence container has some valuable purposes. You can use the Sequence container to simply divide up packages into smaller, more comprehensible pieces. This was covered in the general container discussion previously, but applies specifically to the Sequence container because it does not offer any additional functionality.

So, let's look at a case in which the Sequence container can be useful. You might want to perform multiple, discrete operations, yet ensure that they either all succeed or all fail together. For example, you might want to execute multiple SQL scripts in sequence. The Sequence container is ideal for this because it, like other containers, allows you to create a distributed transaction with a lifetime that begins when the Sequence container begins executing and either commits or rolls back when the Sequence container returns. The Sequence container commits the transaction if it completes successfully and it rolls back the transaction if it fails execution.

The SequenceContainers package in the S07-Containers sample solution illustrates container transactions. Figure 7.10 shows the Sequence container from the sample package. The sample package creates a table and inserts some values into the new table. However, if any of the tasks fail, you want the actions from the previous tasks to roll back. With the Sequence container, this is simply a matter of configuring the TransactionOption property to have the value Required, which forces a transaction if one does not already exist on the parent container of the Sequence container.

Figure 7.10. `Sequence` containers support transactions

The package has a Sequence container with four Execute SQL Tasks. The first task creates a table named CompanyAddresses having a column called ID with the UNIQUE constraint. The next three tasks insert rows into the table. The last task, Insert Apple, attempts to insert a row for Apple with a nonunique ID. The TransactionOption property for the Sequence container is set to Required to ensure that if a transaction has not already been started in the parent of the Sequence container, it starts a new one. If you open the package and run it, the last task, the Insert Apple Execute SQL Task, should fail, causing the transaction for the entire Sequence container to roll back and undo the actions of the previous tasks, including the insertions and table creation. This sample demonstrates one of the powerful features of containers, called user-defined transaction scope. Try changing some of the settings for this package and see how it affects the transaction. You can also change the ID for the Apple company (in the task's query property) to a unique value and it correctly inserts all rows and commits the entire transaction.
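If you don't have the sample solution at hand, the same all-or-nothing behavior can be reproduced in miniature with Python's built-in `sqlite3` module. This is an analogy under stated assumptions, not the sample package: it uses a local SQLite transaction in place of a distributed one, the `run_sequence` function plays the role of the Sequence container, and the company names other than Apple, along with the column layout, are hypothetical.

```python
import sqlite3

STATEMENTS = [
    "CREATE TABLE CompanyAddresses (ID INTEGER UNIQUE, Company TEXT)",
    "INSERT INTO CompanyAddresses VALUES (1, 'Company A')",
    "INSERT INTO CompanyAddresses VALUES (2, 'Company B')",
    "INSERT INTO CompanyAddresses VALUES (2, 'Apple')",   # nonunique ID
]


def run_sequence(conn, statements):
    """Run all statements in one transaction; roll back everything on failure."""
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        return "Failure"
    conn.execute("COMMIT")
    return "Success"


def table_exists(conn, name):
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    return row is not None


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT/ROLLBACK above fully control the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)

failed = run_sequence(conn, STATEMENTS)
exists_after_failure = table_exists(conn, "CompanyAddresses")

# Give Apple a unique ID and run the whole sequence again.
fixed = STATEMENTS[:-1] + ["INSERT INTO CompanyAddresses VALUES (3, 'Apple')"]
succeeded = run_sequence(conn, fixed)
row_count = conn.execute("SELECT COUNT(*) FROM CompanyAddresses").fetchone()[0]

print(failed, exists_after_failure)
print(succeeded, row_count)
```

In the first run the duplicate ID fails the last insert, and the rollback undoes the earlier inserts and the table creation as well. In the second run every statement succeeds and the transaction commits with all three rows.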

Another use of the Sequence container is to build exceptional workflow. A common request is to conditionally execute a task in the workflow, but to continue executing the remainder of the workflow even if the conditional task doesn't execute. Because the Sequence container makes it simple to build smaller units of execution inside the package, this request is easy. Figure 7.11 shows the sample package after being executed. Notice that Task 2, the conditional task, didn't execute, but Task 3 executed anyway. This is because Task 3 executes dependent on the successful completion of the Sequence container, not the exceptional task. Although there are other solutions to this problem, they can be quite difficult to set up. This method is simple to set up and easy to understand.

Figure 7.11. `Sequence` containers create smaller units of execution

To see this package in action, open the ConditionalTask package in the S07-Containers sample solution. To see the package execute all three tasks, change the variable named EXECUTECONDITIONALTASK to be TRUE and run the package again.
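The control flow behind this pattern can be sketched in Python. The layout assumed here, with Task 1 and the conditional Task 2 inside the Sequence container and Task 3 following the container, is a guess at the sample package and not a description of it. The `run_package` function and its `execute_conditional_task` parameter are hypothetical stand-ins for the package and its variable.

```python
def run_package(execute_conditional_task):
    """Return the names of the tasks that ran, in order."""
    executed = []

    def task(name):
        executed.append(name)
        return True            # every task in this sketch succeeds

    def sequence():
        # Task 2 runs only if Task 1 succeeded AND the variable is true.
        ok = task("Task 1")
        if ok and execute_conditional_task:
            ok = task("Task 2")
        return ok              # the container succeeds even if Task 2 was skipped

    # Task 3 depends on the container's result, not on Task 2 itself.
    if sequence():
        task("Task 3")
    return executed


print(run_package(False))
print(run_package(True))
```

Because Task 3 is wired to the container, skipping Task 2 never leaves Task 3 waiting on a task that did not run.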

Although the Sequence container is not as powerful as the looping containers and doesn't get used as much as other containers, it has its place and value. Use the Sequence container to document and reduce complex packages and you'll start to really appreciate its simple elegance.