Chapter 7: Package Reliability | Expert SQL Server 2005 Integration Services (Programmer to Programmer)

What sets enterprise ETL solutions like SSIS apart from just writing your complete process in a TSQL script is how SSIS handles problems. This chapter walks you through several scenarios on how to trap an error that makes your system go bump in the night without having to receive that call or page. In SSIS, you can develop a package that can self-heal or, at worst, roll back to a predictable state where you can fix things in the morning. If you must get a 4 a.m. call, you will want to be able to fix the problem and re-execute the package without having to worry about your data being in an inconsistent state. This chapter walks you through a number of examples on how to stabilize and make your packages bulletproof.

Error and Event Handling

Event handling is, at its root, the main differentiating factor between SSIS and writing a 10-page TSQL script to transform your data. With event handlers, you can watch certain areas of the package or the complete package and react to errors. This section also shows you how to use event handlers in a few more advanced ways.

Types of Event Handlers

When you create an event handler, you must specify what type of event you want to trap. The main types of events to watch are going to be PreExecute, PostExecute, OnError, and OnWarning. The PreExecute event is executed any time a package, container, or task starts. If you were to have a package that had a sequence container with two tasks, you would see the PreExecute event executed four times in the packages execution: once for the package beginning, another time for the container starting, and once for each task. The PostExecute works much the same way, but is executed for every time a package, container, or task completes.

The OnWarning and OnError events are executed any time a warning or error is executed. In the same theoretical package mentioned earlier with a package, container, and two tasks, an OnError event handler would only be executed once if only a single task inside the container were to fail. You can also trap any type of informational event with the OnInformation event handler. This is generally going to be overkill, though, since the event is executed dozens of times (typically) for a small package and provides little value to someone looking at the log.

For auditing purposes, you may have the requirement to trap any time a variable is changed. You can do this by first creating an OnVariableValueChanged event handler. Then, you must set the RaiseChangedEvent property on each variable that you want to trap the event for to True. By default, this property is set to False. To change the property, select the variable, then go to the Properties window. This event handler comes in handy sometimes if your auditors require such detailed information.

Auditing Through Event Handlers

The most common use for event handlers is for auditing. Events can be scoped to the entire package, a container, or a granular and an individual task. Generally, you’re going to scope the event handler to the entire package, but occasionally you may have a sensitive part of your package that needs special attention. Using event handlers for auditing gives you a lot of power to consolidate your logging into a single table for your entire SSIS enterprise environment. For example, you could have all 2,000 packages in your environment writing into a central log table, and you could then write reports on the table to detect hung tasks or errors.

First, you’ll need to create a table to store your errors. The AdventureWorks databases will be a fine place to create this table for the purpose of this example, but ideally, if you consolidate your log for many packages, you’ll want to centralize this table into an optimized table in its own database. The last thing you want is your packages being bottlenecked by auditing processes.

The following table is a good start on an auditing table (you can download all the scripts from this chapter from the book’s web site at www.wrox.com). You’d ideally want this table as descriptive as possible to help an operations person diagnose a problem. This table may not have all the columns your operations DBAs need to diagnose an issue, but it’s a great down payment.

 CREATE TABLE [dbo].[SSISEnterpriseLog](      [LogID] [int] IDENTITY(1,1) NOT NULL,      [PackageName] [varchar](150) NULL,      [EventType] [varchar](15) NULL,      [ExecutionID] varchar(50) NULL,      [PackageID] varchar(50) NULL,      [SourceName] [varchar](150) NULL,      [SourceID] varchar(50) NULL,      [ErrorCode] [varchar](15) NULL,      [ErrorDescription] [varchar](1000) NULL,      [InteractiveMode] [bit] NULL,      [MachineName] [varchar](50) NULL,      [UserName] [varchar](50) NULL,      [EventDateTime] [datetime] NOT NULL CONSTRAINT [DF_SSISEnterpriseLog_EventDateTime]  DEFAULT (getdate()),  CONSTRAINT [PK_SSISEnterpriseLog] PRIMARY KEY CLUSTERED (      [LogID] ASC )WITH (PAD_INDEX  = OFF, IGNORE_DUP_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY]

This table stores the events that occur any time you’d like to trap a problem, and the table will be inserted using an Execute SQL Task in the Event Handler tab. Let’s create an event handler to simulate this example. After you’ve created the example SSISEnterpriseLog table in the AdventureWorks database, create a new SSIS package called EventHandler.dtsx. In the package, create a connection manager to the AdventureWorks database that holds the table you just created.

Next, go to the Event Handler tab and create a new OnError event handler that is scoped to your entire package. Do this by selecting OnError from the Event Handler drop-down box and EventHandler from the Executable drop-down box. Drag an Execute SQL Task onto the page and name it OnError. Open the task and connect it to the AdventureWorks connection manager. Go to the Expression page in the task and add a new expression by clicking the ellipse button. The expression should be on the SQLStatementSource and should read like the following code:

 "INSERT INTO [dbo].[SSISEnterpriseLog]            ([PackageName]            ,[EventType]            ,[ExecutionID]            ,[PackageID]            ,[SourceName]            ,[SourceID]            ,[ErrorCode]            ,[ErrorDescription]            ,[InteractiveMode]            ,[MachineName]            ,[UserName]            )      VALUES            ('"+ @[System::PackageName] +"'            ,'"+ @[System::TaskName] +"'            ,'"+ @[System::ExecutionInstanceGUID] +"'            ,'"+ @[System::PackageID] +"'            ,'"+ @[System::SourceName] +"' ,'"+ @[System::SourceID] +"'      ,"+ (DT_STR, 15 , 1252) @[System::ErrorCode] +"      ,'"+ @[System::ErrorDescription] +"'      ,'"+ (DT_WSTR, 6) @[System::InteractiveMode] + "' ,'"+ @[System::MachineName] + "' ,'"+ @[System::UserName] +"' )"

The challenge of this statement is managing the double and single quotation marks. Luckily, you can download this complete package at www.wiley.com to avoid the trial and error of having to write it from scratch. This INSERT statement will insert a row into the auditing table any time an error occurs. You could also have used a parameterized query in the Execute SQL Task, but it would ultimately have been more complex to manage the ordinal question marks representing all the parameters. Usually, with more complex queries as shown here, it’s easier to just create an expression to make the query dynamic.

The INSERT statement makes extensive use of system variables to determine what task has failed. The TaskName variable is used to read the task name of the event handler’s Execute SQL Task. So, in this case, this would return the word OnError. This is going to be why you’ll want to name your task based on what type of event handler it is. Then, the task will help you self-document the INSERT statement without having to go into each INSERT statement and configure it for the individual INSERT statement.

The SourceName variable stores what task, container, or package threw the error, and is mandatory information for any auditing table. The ErrorCode and ErrorDescription variables are only available for an OnError or OnWarning event, and would have to be removed from this expression in events such as the OnPostExecute. For event handlers other than the OnError and OnWarning, you would want your INSERT statement to read like this:

 "INSERT INTO [dbo].[SSISEnterpriseLog]            ([PackageName]            ,[EventType]            ,[ExecutionID]            ,[PackageID]            ,[SourceName]            ,[SourceID]            ,[ErrorCode]            ,[ErrorDescription]            ,[InteractiveMode]            ,[MachineName]            ,[UserName]            )      VALUES            ('"+ @[System::PackageName] +"'    ,'"+ @[System::TaskName] +"'            ,'"+ @[System::ExecutionInstanceGUID] +"'            ,'"+ @[System::PackageID] +"'            ,'"+ @[System::SourceName] +"' ,'"+ @[System::SourceID] +"'      ,NULL      ,NULL      ,'"+ (DT_WSTR, 6) @[System::InteractiveMode] + "' ,'"+ @[System::MachineName] + "' ,'"+ @[System::UserName] +"'  )"

The InteractiveMode system variable contains a Boolean value on whether the package was run interactively by a user. If it was run interactively, you’ll also see the name of the user who ran the package in the UserName variable. Lastly, you can see what machine name ran the package in the MachineName variable.

The next step would be to copy the task over to the PreExecute and PostExecute event handlers for the package. You then must rename the Execute SQL Task to the type of event handler so the INSERT statement would use the value of OnWarning, OnPostExecute, or OnPreExecute. To test the solution, drag a few tasks over in the control flow and execute the package.

Areas to Grow This Solution

The solution that is described in the previous section is a great starter kit for your own auditing solution. In Chapter 8, you’ll learn how to develop SSIS templates that can be used by everyone in your company. After you come up with an enterprise auditing solution as shown in this section, consider deploying it as a template to all of your developers so each one won’t have to reinvent the wheel.

There are some areas where the solution may need to be improved to make it functional for your company. For example, you’re currently constrained to 1,000 characters for the event description. If you want to fix that, you could just increase the table definition from a varchar(1000) to a larger field. A thousand characters is more than adequate to catch most (if not all) errors.

Another area that could be grown is where there is a lot of redundant data in the auditing table because of its denormalized form. To improve on this, you could create a schema in third normal form and perhaps use stored procedures from within the Execute SQL Task to insert into the table.

Disabling Event Handling

There is going to be overhead with doing event handling. In most cases, it won’t be noticeable, since the event is only executed any time a task fails in the OnError event handlers. If you have the requirement to use the OnInformation event, you may execute the event handler hundreds of times, and the overall package performance may suffer. To test that theory, you could temporarily disable the event handlers for the entire package, or for certain portions of the package.

To disable a single event handler, you can go to the individual event handler and go to the Properties window. Then, change the Disable flag to True. This property disables that single event handler from ever executing.

You can also disable the entire event handling structure at a package, container, or task level. To do this, go to the control flow and to the package Properties. You can then set the DisableEventHandlers to true. This will only disable package event handlers, but the ones that are specified to execute on an individual task will still execute. If you have event handlers on an individual container or task, you would have to set the DisableEventHandlers property to true on each of those tasks or containers, too.

Integrating Snapshots into SSIS

Another excellent use of event handlers is for increasing your package’s reliability. One way to do this is to integrate snapshots into your SSIS package. Snapshots are a SQL Server 2005 Enterprise Edition high-availability feature that takes a point-in-time, read-only image of your database. This image would include all tables, stored procedures, and data from the database. From a high-availability perspective, snapshots can act as a backup file, allowing you to quickly recover from a user error or data integrity issue. Because snapshots only hold the data that has changed since it was created, it will be an incredibly small file, and will be faster to create than a backup file.

Wouldn’t it be great if you could create a snapshot image of your database prior to a package running, and then, if any problem occurred, roll back from the snapshot? With the help of event handlers and a little code, that’s exactly what you can do. Snapshots don’t get rid of the need for backups, but in specialized situations like this, it can really help out. Let’s start by creating a new package for this example.

You will want the first step in this package to create the snapshot. You could, of course, place it in front of strategic areas instead of the first step, but for the purpose of this example, let’s protect the entire package with a snapshot. Drag over an Execute SQL Task to create the snapshot and name the task Create Snapshot. Point the task to a connection manager that uses the AdventureWorks database on your test machine. When you create a snapshot, the syntax looks much like a CREATE DATABASE Data Definition Language (DDL) statement, with the exception of the last line, where you specify that it will be a snapshot of a given database. Configure the query in the task to run the following statement to create the snapshot (note your directory structure may vary from this one):

 CREATE DATABASE AdventureWorks_PreETL_Snapshot ON ( NAME = AdventureWorks_Data, FILENAME = 'C:\Program Files\Microsoft SQL Server\MSSQL.1\MSSQL\Data\AdventureWorks_data_PreETL_Snapshot.ss' ) AS SNAPSHOT OF AdventureWorks;

Next, create one more task in the control flow to simulate a failure. Create an additional Execute SQL Task and call it Generate Error. Connect the task to the AdventureWorks connection manager and create an intentionally bad TSQL statement like this:

 SELECTTTTT BAD QUERY

It’s important to note that the Execute SQL Task that creates the snapshot should always be the first task in the control flow, and that there should be some sort of precedence constraint between it and the first true task of your package. This can be seen in the example shown in Figure 7-1.

image from book
Figure 7-1: Execute SQL Task creating the snapshot appears first in the control flow

Before leaving the control flow, create a new variable called FailureFlag that is a Boolean data type with the default value of 0. Creating intentionally bad code is always the easy part of a solution, but now comes the challenging part of the solution. First, you must create an OnError event handler that is scoped to the entire package. Drag a single Script Task onto the event handler and name the task Set FailureFlag.

In the Script page of the Script Task, set the ReadWriteVariables option to FailureFlag (it is case-sensitive). Then click Design Script. In the code window, you want to add code to set the FailureFlag variable by using code such as the following:

 Public Sub Main()    Dts.Variables("FailureFlag").Value = True    Dts.TaskResult = Dts.Results.Success End Sub

Any time a failure occurs anywhere in the package, this script will set the FailureFlag variable to true, and this variable will come into play later in another event handler. The next step is to create a PostExecute event handler that’s scoped to the entire package. After creating the event handler, drag a Script Task onto the design pane and rename the task Stub. This task will do absolutely nothing at all, other than to help you do conditional logic on the precedence constraint you are about to create.

You now need to create the logic that will restore the database from the snapshot in case of a failure. You’ll create the conditional logic in a moment, but in the meantime, drag an Execute SQL Task over to the design pane to perform the restore. Drag the precedence constraint out of the Stub Script Task and onto the newly created Execute SQL Task. Name the Execute SQL Task Rollback DB and configure it to use the AdventureWorks connection manager. Lastly, use the following query to roll back the database from the snapshot:

 use master GO RESTORE DATABASE AdventureWorks from DATABASE_SNAPSHOT = 'AdventureWorks_PreETL_Snapshot'; GO

The assumption of this script is that there are no users in the database at the time of the restoration. If this assumption is incorrect, you must write code to disconnect all the active users prior to the RESTORE command. Luckily, something like that has already been written in many locations and a similar script can be downloaded at www.wrox.com (it is a stored procedure called usp_killusers).

With the database now restored, you can dispose of the snapshot that was created earlier. To do this, create another Execute SQL Task and name it Drop Snapshot. Again, connect it to the AdventureWorks connection manager and configure it to run the following script to drop the snapshot by using the DROP DATABASE command:

 Use Master go DROP Database AdventureWorks_PreETL_Snapshot

You’re now ready to build the conditional logic to only execute the tasks if a failure had occurred. First, drag the precedence constraint from the Rollback DB Execute SQL Task to the Drop Snapshot Execute SQL Task that you just created. Also, drag over another precedence constraint between the Stub Script Task and the Drop Snapshot Execute SQL Task. You should now have two precedence constraints coming out of the Stub Script Task.

As it stands now, this series of tasks will execute every time a PostExecute event is triggered, which is every time any task, container, or package completes. You obviously can’t have the database restoring for all of these events, so you want to only selectively have the database restore if the package is completing and there was a failure.

To do this, double-click the precedence constraint connecting the Stub Task to the Rollback DB Task to edit the constraint. Change the Evaluation operation option to Expression and Constraint. Then type the following expression into the Expression text box:

 @FailureFlag == True && @SourceName == @PackageName

This expression evaluates whether the FailureFlag was set to True from some type of OnError event. It also ensures that the source that triggers the PostExecute event is the package’s completion and not just a task. The two && signs are a logical AND condition for the expression. Click OK to save this expression and note the FX box to the side of the precedence constraint.

You now need to build the opposite logic. If the package completes and no error occurs, you still need to gracefully drop the snapshot. Double-click on the precedence constraint connecting the Stub to the Drop Snapshot Task to edit the precedence constraint. Again, you will want to change the Evaluation operation option to Expression and Constraint, and type the following expression into the Expression text box:

 @FailureFlag == False && @SourceName ==@PackageName

This code will only execute the DROP SNAPSHOT if the OnError event was never executed and the package has finished running. Before exiting the Precedence Constraint Editor, change the Multiple constraint option to Logical OR, as shown in Figure 7-2. This will allow either the Stub Task or the Rollback DB Task to conditionally execute the Drop Snapshot Task. If you did not do this, the cleanup step would never execute, since both tasks would have to execute (which is impossible with the expression logic you put in place).

image from book
Figure 7-2: Changing the Multiple constraint option to Logical OR

Your final solution will look like Figure 7-3. If no problem occurs in the package, the Drop Snapshot Task will execute, cleaning up the earlier created snapshot. If a problem does occur, then both the Rollback DB and Drop Snapshot Tasks will execute sequentially. This solution can easily be built on to only watch certain tasks, or to disconnect the users prior to the rollback of the database. The assumption is made in this solution, though, that you have exclusive access to the target database. That may be a poor assumption and if so, you would want to use some of the other solutions mentioned later in this chapter (such as transactions). If you do have the luxury of being the only user making changes to the database (such as in a nightly ETL batch process), then the snapshot solution is an effective and quick way to ensure your data is in a consistent state if a problem occurs. If a problem does occur, you would simply fix the data quality, database space, or any other type of problem, and then rerun the package from the beginning. For more information on snapshots, refer to Books Online under the topic “Database Snapshots.”

image from book
Figure 7-3: Final solution