Managing the Solution | Practical Business Intelligence with SQL Server 2005

During development, you should be taking periodic checkpoints of your development so that you have a point to go back to should something go seriously wrong with the design or coding of one of your packages. When your packages are through the development stage, you will want to deploy them to the test and production environments and be able to configure them appropriately for that environment.

Protecting Your Source Code

The packages you create are XML files, and you can consider them to be source code for Integration Services packages. Throughout the development cycle, you will reach stages where a component is running well, perhaps not perfectly, but it is worth saving as a checkpoint before you continue to modify it. We highly recommend you use some form of source control, such as Visual Source Safe (VSS), to save copies of your source files at significant checkpoints. VSS 2005 integrates with BI Development Studio to perform automatic checkout of any package you modify to prevent others from overwriting your work, as well as saving a historical copy of what you started with.

Deployment

When you were developing your packages, you were executing them from your client in BI Development Studio. This was great for debugging, but now it's time to deploy the packages to a location more appropriate for a production environment, or at least to a test environment. You can deploy your packages to SQL Server or to a file system managed through SQL Server Management Studio. SQL Server is preferred if you are going to work in a clustered server environment.

Before you deploy your packages, you want to change some of the properties of some of the tasks in the package, such as the connection managers so that they point to the right servers.

Changing Settings for a Production Environment

Packages include a lot of information that may differ between development and production environments, such as the names of servers, database names, and input file paths. In Integration Services, you can specify which properties should be configured at runtime and where to obtain the values for those properties. The values can be stored in an environment variable, an XML file, or a table in SQL Server, or some combination of all three sources.

You specify a configuration file for a package by selecting Package Configurations from the SSIS menu. The dialog includes a wizard that walks you through the various options, such as selecting which properties of your package you would like to read from the configuration file at runtime. You can reuse the same configuration for multiple packages in your solution, meaning that a value for a property such as a ServerName can be specified in one place, but applied to all packages launched from the same location.

A common scenario is to store configuration settings in a table in SQL Server. A connection manager is used to specify which server to use, but one of our goals here is to be able to pick up different settings depending on the environmenthow do we change this connection manager to point to another server? This is where an environment variable or an XML file in a fixed location is used. An environment variable proves very useful if you cannot guarantee that the XML file will be in the same location in each environment. Here is one way to set this up.

First, you need to add a new connection manager to each package you want to configure. Use a consistent name for the connection manager throughout your solutions. We use one named ETL_Operations. It should point to a SQL Server database that will contain tables used for managing your ETL tools, auditing, and operational statistics, not one used for the data warehouse. We'll create a new database named the same as the connection manager, ETL_Operations.

Next, we'll create an XML configuration file that will contain a connection string that will specify where the ETL_Operations connection manager should point to. Using the Configuration Wizard, choose a configuration type of XML configuration file and specify a fully qualified file name to contain the configuration settings. This file will reside in the same location on every server where you want to use this configuration (for example, development and production environments). We used C:\SSISConfigurations\Manufacturing.dtsConfig. Figure 4-15 illustrates what this looks like in the wizard.

Figure 4-15. Setting the configuration pointer

Click Next. Now you can specify the properties of the ETL_Operations connection manager that will change as you move through different environments. In our example, it's just the server name. We consistently use the same database name and Windows authentication to keep things simple. Expand the Connection Managers node, and the ETL_Operations properties, and then check the ServerName property, as shown in Figure 4-16.

Figure 4-16. Specifying a connection property to be configured at runtime

Click Next. Give the configuration a name. We'll use ConfigurationReference. Click Finish to save the specifications into the package. The configuration now appears in the Package Organizer.

Now you can add the other properties you want to configure to the configuration table in SQL Server referenced by the ConfigurationReference configuration file we just created. Click Add to add another configuration, but this time choose a type of SQL Server. For the Connection, choose the ETL_Operations connection manager we created just for this purpose. We don't have a table yet, so click New. This will create a table called [SSIS Configuration]. We'll use this table from now on for all our configurations stored in SQL Server. For the Configuration filter setting, choose a name related to the application, not the environment. We'll use Manufacturing. This will distinguish our configurations from others that could be set up independently for other applications (see Figure 4-17).

Figure 4-17. Defining configurations stored in SQL Server

Now, just as you did for the ETL_Operations connection manager, select the ServerName property from all the connection managers you want to reconfigure at runtime. Click Next and provide a name for the configuration, such as Connections. You should now see two configurations in the Package Configuration Organizer. Leave them in this order, with the ConfigurationReference configuration before the Connections configuration, as shown in Figure 4-18.

Figure 4-18. Configuration Organizer

Reusing the Configuration Settings in Other Packages

Now that you've got the configurations set up for one package, you can easily reuse them in other packages that use the same connection managers. When you start the configuration wizard for a package, begin as you did for the first package by specifying the XML configuration file. Provide the same filename, and you will be prompted to either reuse the existing settings or to overwrite them. Choose to reuse existing settings. Do the same for the SQL Server configurations, specifying the same connection manager, table (choose from the drop-down list), and filter.

It is important to note that to reuse existing configurations, you must use the same names for the connection managers, including case, and you must have the same number of connection managers. If you open the configuration for a package that is missing a connection manager (even if it doesn't use it), that connection manager will be dropped from the configuration.

Using the Configurations in Multiple Environments

We have defined the configurations for one environment, but our goal was to be able to use them in several environments. This is easy to accomplish. First, copy the XML file to the new environment, putting it in exactly the same drive and directory. Edit the file with Notepad and change the server name to the new server name. Then use the Copy Database Wizard in SQL Server Management Studio to copy the ETL_Operations database to the new server. Open the [SSIS Configurations] table and change the ConfiguredValue column for each of your connection managers to point to the new server(s).

Deploying Packages to a New Server

After you have set up the package configuration, the next step is to move the packages to the targeted environment.

Deploying a Single Package

You can move the packages one at a time, as they are completed, using the File/Save copy as menu option. Choose SQL Server as the location and specify which SQL Server instance you want to use to store the package. At the Package path: prompt, click the dot (.); under Stored Packages, select the folder where you want to save the package (usually MSDB, but you can create your own), and type in the name you want to store the package under, without the .dtsx extension. Click OK to set the location and name, and OK again to complete the save.

Deploying All Packages in a Project

You can also move all the packages in a project to SQL Server or the file system on a server using the manifest and Deployment Wizard. The manifest is a file defining what is to be deployed. You need to enable a project setting to create a deployment manifest. This is done through the Project/Properties menu. On the Deployment Utility node of the Configuration Properties, set the CreateDeploymentUtility property to True. Doing so causes the manifest to be created when you build the project. To launch the Deployment Wizard, double-click the manifest, which is located in the \bin directory of your solution.

Creating Integration Services Subfolders on SQL Server

Using SQL Server Management Studio, you can organize your packages stored on SQL Server by connecting to an Integration Services service. (Note that you do not specify an instance name for Integration Services, just the server name.) You will see two nodes in the tree: Running Packages and Stored Packages. Open the Stored Packages node. If you have many packages, it is convenient to create subfolders under the MSDB folder. To create a subfolder, just right-click the parent folder and choose New folder. When you deploy a package, or choose File\Save a copy as, you can specify the path to the subfolder you want to store it in.

Security

Package security is managed by specific roles for Integration Services created and managed by the SQL Server engine. Unless a user is a member of one of these three roles, that user will not be able to work with packages stored in SQL Server, or even store a package there. The three roles are as follows:

db_dtsadmin. Can add, remove, and execute any package.
db_dtsltduser. Can add, remove, and execute its own packages.
db_dtsoperator. Can view and execute any package directly or as a job.

You assign Windows groups or users to these roles through the Security node of the msdb database in SQL Server Management Studio.

You can also change the roles that can run or replace a package. In SQL Server Management Studio, right-click a package and choose Package roles. You can set each of the read and write permissions to any SQL role defined for the msdb database.

Maintenance

You can also use the deployment utility to create a new installation routine when you have made changes after the packages have initially been installed, which will replace the existing packages in production. Because you probably don't want to overwrite any configuration changes that were made in production, you should set the AllowConfigurationChanges project property to False.

Operations

After you've completed your packages and they've been promoted to production, you are in a completely different environment from BI Development Studio. Your connection managers need to know where production servers are, you'll want to log progress information if a package fails, and you may want to be able to restart from a known checkpoint.

Running Packages in Production

After the packages have been deployed on the production server, you can use the Package Execution utility (available from the Integration Services folder on the Start menu) to select a package, set options and configurations, assign variable values, set connection properties, and, of course, run the package. You can also right-click a package and choose Run to execute a package. Both of these methods execute the package on the machine you are working on, not the server containing the packages.

Instead of running packages manually, in a production environment it is usually better if the packages are automatically executed at a certain time or after some event, such as every Sunday night at 10 p.m., with someone being notified only if they don't succeed. You can use the SQL Server Agent service introduced in Chapter 2, "Introduction to SQL Server 2005," to schedule packages; with the Agent, you can specify package logging options, select which configuration file to use, and notify an operator via e-mail or other device if the job fails.

Execution Location of Packages

Packages run through jobs started by the SQL Agent are executed on the server, as you would expect. You can run packages from a client machine using SQL Server Management Studio, but you need to be aware that the package will run on the client, not on the server that displays in the Object Explorer. If you use environment variables or XML files to set runtime configurations of packages, you must create those variables and XML files on the client, too.

Packages running on the client may increase the network traffic, or may execute slower than on the server, depending on the capacity of the client. Some packages may fail if the Integration Services runtime isn't installed on the client.

Running Packages within a Transaction

Many packages contain multiple steps that need to succeed or fail as a single unit. For example, in the Sales Territory dimension package that we created earlier, if the truncate step succeeds but the import from the source database fails for some reason (such as the source database not being available), we would be left with an empty Sales Territory table in the data warehouse.

Integration Services deals with this issue by allowing you to set up the whole package to run in a transaction so that it succeeds or fails as a unit, or to explicitly select which tasks need to share the same transaction. If you have packages that call other packages, they can also share the same transaction. If any task that is part of the transaction fails, all the preceding tasks in the transaction will be rolled back.

To set up a transaction for a package, right-click anywhere in the Control Flow view and select Properties. Change the TransactionOption property from Supported, which means that if the package is called from another package, it participates in the other package's transaction, to Required, which creates a new transaction for this package if one doesn't already exist. All the tasks that need to participate in the package's transaction also need to have their individual TransactionOption property set to Supported, which is the default setting for tasks.

Building Restartable Packages

Some packages include really long-running tasks such as downloading a file from an FTP server. If a subsequent task fails for some reason, such as loading the data into a table when the database server is not available, you might prefer not have to run the whole package again, but just the steps after the download.

Integration Services uses a concept called checkpoints to support this. When checkpoints are enabled for a package, a checkpoint file is created while the package is executing that keeps track of the last step that successfully executed. When a step fails and you have to restart the package, Integration Services can use the checkpoint file to restart from the point of failure rather than from the beginning.

To enable checkpoints, right-click anywhere in the Control Flow view and select Properties. Change the SaveCheckpoints property to True, specify a CheckpointFileName, and set the CheckpointUsage property to IfExists to make the package restart from the checkpoint if a checkpoint file exists.

Tip: Designing Packages for Restartability

A package can only be restarted from the task level; so if, for example, you have multiple steps within a single Data Flow task, you can only restart the package at the beginning of the data flow. If you split up complex Data Flow tasks into multiple simpler tasks, the package can be restarted at a more granular level.

Logging Package Information

It often proves useful for operations staff to have access to a detailed execution log in case a package fails and they need to discover the cause. You can set up logging for Integration Services packages either in the Visual Studio designer by selecting Logging from the Integration Services menu or when you execute the package using the Package Execution utility or SQL Server Agent.

Logs can be written to multiple destinations including a simple text file, a SQL Server trace file for integrated debugging of the database and Integration Services package, or the Windows event log. The level of detail that you want to include can also be specified, so that you can limit the amount of information that is included. We recommend logging these events: OnError, OnPostExecute, OnPrevalidate, OnTaskFailed.