Administering DTS | ppk on JavaScript, 1/e

Data Transformation Services (DTS) is a collection of objects that allow you to move data from any OLE DB source to any destination. I won't show you how to program a DTS package in this section. Instead, I'll focus on some of the administrative concerns with DTS, covering some of the primary components, including the following:

Tasks Objects that allow you to handle an individual piece of work, such as FTPing files or transforming data
Steps The component that defines the order in which steps execute
Connections Connections to any OLE DB data source
Package A collection of the tasks and connections available for execution
Global variables Allow you to pass variables (similar to stored procedure variables) between packages and tasks
DTS Designer The GUI you use to create packages

DTS was first released in SQL Server 7.0, and its capabilities have expanded tremendously in SQL Server 2000. In this release of DTS, you can perform more tasks, such as using FTP and sending messages to a Microsoft queue. Microsoft also made modifications to the tasks that were available in previous versions. For example, now you can pass global variables in and out of packages easily.

DTS Connections

One of the myths of DTS is that it's only built for SQL Server. Indeed, DTS is optimized for SQL Server because it's built in to Enterprise Manager, but it can be used by any relational database system that is OLE DB compliant. DTS is a great way to upgrade your system to SQL Server from another relational database management system (RDBMS) such as Oracle. DTS also provides a way to convert your data from SQL Server to another system such as DB2. There are even OLE DB drivers for host systems like VSAM.

As you begin to use other OLE DB providers, such as Oracle, Sybase, and DB2, keep in mind that most other OLE DB providers require some type of client software installed on the workstation that is connecting to the remote system. For example, if you are using the OLE DB provider for Oracle, you must install the SQL.NET client utilities. To use the OLE DB provider for DB2, you must have your DB2 administrator install DB2 packages on the system to support the OLE DB drivers. Table 4-3 provides a list of a few of the commonly used OLE DB providers and the software that installs them. Most of the providers mentioned in Table 4-3 are installed with SQL Server 7.0 as well.

Table 4-3: Where to Find OLE DB Providers
Type of Connection	Where to Find It
SQL Server	Installed with SQL Server
Microsoft Access	Installed with SQL Server
Microsoft Excel	Installed with SQL Server
Dbase	Installed with SQL Server
HTML File	Installed with SQL Server 2000 only
Paradox	Installed with SQL Server
Text File	Installed with SQL Server
Oracle	Installed with SQL Server; will also need Oracle connectivity components
Data Link (UDL)	Installed with SQL Server
DB2	Need host integration services or third party
Sybase	Need Sybase client software
VSAM	Need host integration services or third party

Some data sources do not yet have OLE DB data sources available, but may support ODBC. One of the OLE DB providers available to DTS is the OLE DB Provider for ODBC. With this provider, you can extend the amount of available data sources to any ODBC-compliant data source. For example, there is an OLE DB provider for Oracle as well as an ODBC driver. If a normal OLE DB provider is available for a data source, use it instead of the ODBC driver, because OLE DB is considerably faster than using the OLE DB Provider for ODBC.

Saving a Package

Once you've created a package, you can save it to any of four locations:

Locally in SQL Server
In Meta Data Services
As a COM-structured file
As a VB file

Each of these options has drawbacks as well as advantages.

Saving a Package Locally in SQL Server

Saving a package locally in SQL Server is the most common method. Packages saved here appear in Enterprise Manager, in the Data Transformation Services group under Local Packages.

These packages are stored in the msdb database in the sysdtspackages table. As you can imagine, your msdb database can rapidly grow quite large if you're actively using DTS and also saving packages locally.

Saving a Package in Meta Data Services

Saving your package in Meta Data Services allows you to scan a package for information. Simply put, it's a method of self-documenting your packages and the data they contain.

Although this is a nice feature, it has a major performance drawback. Packages saved here can take up to ten times longer to save and double the time to load. When a package takes double the time to load, it affects your total execution time. Packages that use Meta Data Services also are stored in the msdb database.

Saving a Package as a COM-Structured File

Saving your package as a COM-structured file is the fastest way to save and load a package. Packages saved here have a .dts file extension.

To open the package after it's saved in this form, right-click the Data Transformation Services group in Enterprise Manager, and choose Open Package.

Saving a Package as a VB File

Saving your package as a VB file means the package uses a .bas extension, and can therefore be opened in any text viewer or in Visual Basic. This is a great way to learn how to program in DTS and use the DTS object model.

Caution

This format means you can't edit the packages easily, because you lose the ability to edit the package in Designer. If you want to save a package as a VB file, also save a copy of the package in another format. Then the package can be updated in Designer.

Managing Package Versions

As you save packages, DTS uses a simple version-controlling system on all DTS formats, except for VB files. Right-click any package saved locally, or in Meta Data Services, and choose Versions. You can view the versions for a package, and roll back to previous versions. The version is automatically displayed when you open a COM-structured file.

Each time you save a package, a version is automatically added. The problem is that DTS does not use any type of archiving mechanism. If you have a 1MB package and save it six times, the package is 6MB in size.

Note

As you can imagine, if you are using DTS heavily, and saving your packages either locally or into Meta Data Services, your msdb database becomes much more important. Each time you make a major change in your package, create a backup of the msdb database. I also like to keep a .dts file backup just in case.

The DTS version-control system is primitive at best. I recommend that you use a traditional version-control system, such as Visual Source Safe, to protect your versions. The major advantage to saving a package as a COM-structured file is that you can check the files into Visual Source Safe and version-control them easily. This allows you to check files in and add more detailed notes to packages.

In the Trenches

A problem occurs when you save a package as a COM-structured file where DTS will accumulate versions of the package. DTS will add version after version to the package, but provides no mechanism to delete versions from a .dts file. You can overcome this problem by taking the following steps:

Open the correct version of the package in DTS Designer.
Open a file management program such as Windows Explorer and delete the .dts file.
In DTS Designer, select Save As under the Package menu.
Name the package, using the same name and the same location as you previously did.

Designer creates an instance of the package in memory and the package is in no way linked to the file until you save it again. It is always a good idea to create a backup of the .dts file before you delete the old versions.

Securing a Package

By default, anyone with a login on your SQL Server can create a package and save it to your SQL Server. This doesn't present a problem if users are saving packages as .dts files, but when people save them onto the SQL Server or Meta Data Services, your msdb database can grow out of control.

Apply user and owner passwords to protect your sensitive packages from being executed or viewed by users without the proper authority. This method is only available if you're saving a package locally or as a file. If you specify an owner password when saving a package, it prevents users from opening a package in Designer. A user password prevents a user from executing a package.

Any user with default security can save and view packages in SQL Server, unless of course the user is trying to save a package that has an owner password. Although you can trace which users are creating packages, it may be easier just to tighten the security bolts slightly and only allow certain users to create packages. If you don't want your users to be able to see the packages installed on your server, simply deny them access to the sp_enum_dtspackages stored procedure in the msdb database. Users will not see an error if they try to see the packages in Enterprise Manager, but the list will appear to be empty.

Tip

You may also want to tighten security to prevent users from adding a package on your production SQL Server. You can limit the users who can create packages in Enterprise Manager by denying access to the sp_add_dtspackage stored procedure in the msdb database.

In each of my environments, I like the DBA to have control of who can create, view, and execute DTS packages. To accomplish this, I create a role called DTSUsers in the msdb database. Then I remove access to the public role and grant access to the DTSUsers role as shown here:

USE MSDB Print 'Revoking access to create and read DTS packages' REVOKE EXECUTE on msdb..sp_enum_dtspackages  to Public REVOKE EXECUTE on msdb..sp_enum_dtspackagelog  to Public REVOKE EXECUTE on msdb..sp_enum_dtssteplog  to Public REVOKE EXECUTE on msdb..sp_enum_dtstasklog  to Public REVOKE EXECUTE on msdb..sp_get_dtspackage  to Public REVOKE EXECUTE on msdb..sp_get_dtsversion  to Public REVOKE EXECUTE on sp_add_dtspackage to Public REVOKE EXECUTE on sp_make_dtspackagename to Public REVOKE EXECUTE on sp_drop_dtspackage to Public REVOKE EXECUTE on sp_make_dtspackagename to Public REVOKE EXECUTE on sp_reassign_dtspackageowner to Public REVOKE EXECUTE on sp_get_dtspackage to Public REVOKE EXECUTE on sp_reassign_dtspackagecategory to Public REVOKE EXECUTE on sp_add_dtscategory to Public REVOKE EXECUTE on sp_make_dtspackagename to Public REVOKE EXECUTE on sp_drop_dtscategory to Public REVOKE EXECUTE on sp_modify_dtscategory to Public REVOKE EXECUTE on sp_enum_dtscategories to Public REVOKE EXECUTE on sp_log_dtspackage_begin to Public REVOKE EXECUTE on sp_log_dtspackage_end to Public REVOKE EXECUTE on sp_log_dtsstep_begin to Public REVOKE EXECUTE on sp_log_dtsstep_end to Public REVOKE EXECUTE on sp_dump_dtslog_all to Public REVOKE EXECUTE on sp_dump_dtspackagelog to Public REVOKE EXECUTE on sp_dump_dtssteplog to Public REVOKE EXECUTE on sp_dump_dtstasklog to Public REVOKE EXECUTE on sp_log_dtstask to Public

Caution

Test this lockdown script in all of your environments before deploying. You never know what types of dependencies you'll have in your application.

The following lines of code will create the role called DTSUsers and grant all
the access to it:

USE MSDB IF (SELECT COUNT(*) FROM SYSUSERS WHERE NAME = 'DTSUsers') = 0 BEGIN       EXEC SP_ADDROLE 'DTSUsers'       Print 'Adding DTSUsers Role' END GRANT EXECUTE on msdb..sp_enum_dtspackages  to DTSUsers GRANT EXECUTE on msdb..sp_enum_dtspackagelog  to DTSUsers GRANT EXECUTE on msdb..sp_enum_dtssteplog  to DTSUsers GRANT EXECUTE on msdb..sp_get_dtspackage  to DTSUsers GRANT EXECUTE on msdb..sp_get_dtsversion  to DTSUsers GRANT EXECUTE on sp_add_dtspackage to DTSUsers GRANT EXECUTE on sp_make_dtspackagename to DTSUsers GRANT EXECUTE on sp_drop_dtspackage to DTSUsers GRANT EXECUTE on sp_make_dtspackagename to DTSUsers GRANT EXECUTE on sp_reassign_dtspackageowner to DTSUsers GRANT EXECUTE on sp_get_dtspackage to DTSUsers GRANT EXECUTE on sp_reassign_dtspackagecategory to DTSUsers GRANT EXECUTE on sp_add_dtscategory to DTSUsers GRANT EXECUTE on sp_make_dtspackagename to DTSUsers GRANT EXECUTE on sp_drop_dtscategory to DTSUsers GRANT EXECUTE on sp_modify_dtscategory to DTSUsers GRANT EXECUTE on sp_enum_dtscategories to DTSUsers GRANT EXECUTE on sp_log_dtspackage_begin to DTSUsers GRANT EXECUTE on sp_log_dtspackage_end to DTSUsers GRANT EXECUTE on sp_log_dtsstep_begin to DTSUsers GRANT EXECUTE on sp_log_dtsstep_end to DTSUsers GRANT EXECUTE on sp_dump_dtslog_all to DTSUsers GRANT EXECUTE on sp_dump_dtspackagelog to DTSUsers GRANT EXECUTE on sp_dump_dtssteplog to DTSUsers GRANT EXECUTE on sp_dump_dtstasklog to DTSUsers GRANT EXECUTE on sp_log_dtstask to DTSUsers GO

Note

This code sample can be downloaded at http://www.sqlservercentral.com/experienceddba.

Make sure you don't make this type of change haphazardly. You'll need to consult whomever this will affect before running this script as you may have some upset users when they realize they can't create packages. When you'd like to grant a user rights to perform any type of DTS function, simply grant them access to the msdb database and the DTSUsers role. With COM-structured files, you can use Windows 2000/NT security to prevent users from reading files. When a user tries to open a DTS file without the appropriate rights, DTS sends an Access Denied error.

Executing Packages

To execute a package, choose Execute under the Package menu in DTS Designer, or right-click the saved package in Enterprise Manager and choose Execute. The package uses the resources of the workstation or server that executes it. This is a vital DTS point and is commonly misunderstood. If you're developing a package from your workstation to transfer a million records from one data source to another, DTS uses your machine to perform all the transformation. If your workstation isn't fast and powerful, the process could spend all of your system's resources.

Note

The workaround for this is to schedule the package as a one-time job, then execute the job. This ensures that you execute the package from your server, not the workstation.

Take Command

You can also execute a package from a command prompt using the DTSRUN utility. When you schedule a package to execute, SQL Server uses xp_cmdshell and the DTSRUN utility to execute the package. The following table describes some of the available parameters for DTSRUN.

Switch	Purpose
/?	Help
/S	Server name and instance
/U	User name to connect with
/P	Password for connecting user
/E	Use Windows authentication
/N	Name of package
/G	Package's GUID
/V	Version GUID
/M	Package password
/F	COM-structured filename
/R	Repository database
/A	Pass global variable in
/L	Package log file
/W	Write NT event on completion
/!D	Delete the package

For example, to execute a COM-structured package named package.dts, use the following syntax (at a minimum):

DTSRUN /F C:\Package.DTS

To execute a package that is stored locally on SQL Server, use this syntax:

DTSRUN /S ServerName /U username /P password /N PackageName

If you would like to pass in two global variables into the package, you could use syntax like the following:

DTSRun /S ServerName  /U username /P password /N PackageName /A "Variable1Char":"8"="140" /A "Variable2Char":"8"="Data2"

You can also schedule the package to execute as a job using the instructions I'll provide in a moment. After you schedule the package as a job, you can copy the information out of the job's step, which will have an encrypted password.

Managing Package Locations

Most DTS connections and objects look at the computer that is executing the package, not the server where the package is saved. For example, if you specify a connection to a flat file (D:\extract\flatfile.txt) that is located on your workstation, and execute the package, the package executes successfully. However, if you execute the package on the server, or if you schedule the package, the package fails. This is because the file and directory don't exist on the server or on other workstations, and DTS looks for the file on the computer that is executing the package.

The only exception to this rule is the Bulk Copy Task. The Bulk Copy Task only allows you to specify a file on your target server. For instance, if you have a connection to a SQL Server and try to add a Bulk Copy Task, DTS looks for the flat file on the SQL Server that you're trying to connect to.

If you have any custom scripting languages installed, such as PerlScript, and you use them in DTS, make sure that they are installed on any workstation or server that might execute participating packages.

Tip

I often design packages in PCAnywhere or Terminal Server to avoid running into package locality problems.

Using DTSRunUI

Learning all the switches that DTSRUN has to offer can be a daunting task. However, located in the depths of your tools directory you'll find DTSRunUI.exe, which is a hidden treasure. This small application allows you to quickly write a DTSRUN statement that can execute a package from the command line. DTSRunUI can also automate the process of passing parameters to the package.

By default, DTSRunUI.exe is located in the C:\Program Files\Microsoft SQL Servers\80\Tools\Binn folder. Open it and enter the server and login information. Then click '...' to find the packages on that server.

After you've identified your package, click Advanced to open the Advanced Options dialog box, where you can pass global variables values to the package. (If you've ever typed a long and complex DTSRUN command at the command line, you'll appreciate this feature.) Click Generate to generate the command-line execution command.

This tool can really help you create DTSRUN statements rapidly and schedule them. It also helps you avoid the chore of looking up the syntax on BOL or copying the syntax from your SQL Agent job. You can also schedule your package for later execution using DTSRunUI.

Note

You cannot execute VB stored files with this utility.

Scheduling a Package

The simplest way to schedule a package is to right-click a locally saved package and choose Schedule Package. The package's Scheduling Properties dialog box opens (you may be familiar with this dialog box from SQL Server Agent). The dialog box creates a SQL Server Agent job and creates the proper syntax in the step.

You can also schedule a package manually by creating a new SQL Server Agent job. When creating the step to execute the package in the job, use the operating system command (CmdExec) discussed earlier in the 'Jobs' section to execute the DTSRUN statement. Enter the DTSRUN command for the Command option in the Job Step tab.

Optimizing Packages

You have several features to help you optimize the performance of packages. In this section, I'll discuss some of the tricks you can use to make packages run better and faster.

In the Trenches

Apparently, many administrators have packages failing when they're executed from a job. I see many questions in newsgroups about scheduling a package. Administrators complain that a DTS package works fine when executed manually, but fails when executed from a job.

The most common reason for this failure is login credentials. When executing a package with SQL Server Agent, the package is executed with the user who starts SQL Server Agent. If you use Windows authentication for any of your connections inside your package, the login used to start Agent must also exist on the remote system.

Make sure the user has the proper rights to the UNC paths, shares, and files needed to execute the package. If you start Agent with the System account, keep in mind that the account doesn't have access to remote systems. Also, never use a mapped drive inside your package. Instead, use UNC paths to reach network resources. This is because the account profile that starts the package may not have the same drive mapped as you have.

Caching Registry Data

DTS has a well-known performance flaw: it reads the registry each time a package loads in order to determine if there are any new OLE DB data sources and tasks. If you have a system where those items don't change often, you can turn on caching to load these reads into memory. Once caching is turned on, the package load time (and therefore the overall execution time) is dramatically reduced.

To turn on caching, right-click the Data Transformation Services group and choose Properties. Select the Turn On Cache option to enable this feature.

Keep in mind that if you enable this option, you're only enabling it on the workstation or server where you're working. However, if you logged in through a remote management tool such as PCAnywhere, you can enable the option on the computer to which you've connected.

Enabling the Multiphase Data Pump

The Properties dialog box for DTS contains an option for the multiphase data pump. This is a new feature in DTS, and it allows you to separate a transformation into multiple phases. This means you can create error handling in your transformation to catch errors and automatically fix them. Or you can log the number of successful transformed records. By default, the multiphase data pump option is turned off for usability. Select the option Show Multi-phase Pump In DTS Designer to turn on the option in Designer.

Enabling Just-in-Time Debugging

The Turn On Just-In-Time Debugging option enables more sophisticated debugging tools when you're debugging ActiveX scripts inside a package. Though I say 'more sophisticated debugging tool,' it's still very weak, but this tool does give you added flexibility. This option uses the last installed script debugger on your system. On most developer machines, this is Visual Interdev, but it could also be the Script Debugger that comes with Internet Explorer.

Using the Bulk Insert Task

DTS was developed on a completely open API that allows developers to add customized logic into a package. When transforming data, the more logic that is installed into the process, the slower the package executes.

By default, DTS uses the copy column method with no logic for transforming data. The most optimized way of transforming data is with the Bulk Insert task. The Bulk Insert task uses the Bulk Insert syntax to import data, which does not allow you to add any type of logic to the transformation. In fact, you can't even change the mapping of the table using this task. The columns in the flat file must match exactly the columns in the relational system. Another limitation is that the Bulk Insert task only allows you to import data to a SQL Server system.

Using the Transform Data Task

If you want to import or export data from a foreign data source (heterogeneous), you should use the Transform Data task. By default, each column has its own COM object to transform its data. Each COM object is represented as an arrow in the Transformations tab of the task, as shown in Figure 4-3.

click to expand
Figure 4-3: Transformations tab with auto-mapping

As you can imagine, the more COM components you have, the slower the transformation, as each COM object is created and closed. You can optimize this process by having all the columns share one COM object. To do this, use the following steps:

Click the Delete All button in the Transformations tab of the Transform Data Task Properties dialog box.
Click the New button, and choose the Copy Column method of transformation.
Under the Source Columns tab, move all the columns to the right pane using the >> button. Perform the same step under the Destination Columns tab.
Click OK.

The result looks like Figure 4-4. Depending on the number of columns, you should see a dramatic improvement in performance.

click to expand
Figure 4-4: Transformations tab after optimization

You'll also find a large performance difference among the types of transformation methods you use in the Transform Data task. The fastest method inside this task is the Copy Columns method. If you have to perform custom transformations through VBScript, it will take about double the time for the same transformation. You can also use JScript at a slightly higher performance cost.

Note

You have the ability to add additional scripting languages such as PerlScript and REXX. If you do this, keep in mind that DTS may not be optimized for the language.

DTS Without SQL Server

A common myth about DTS is that you need SQL Server to execute DTS. In actuality, you can freely redistribute the DTS .dll and executable files, and the packages will execute independently without SQL Server. Table 4-4 shows a list of files you can distribute. If you choose to do this, you must distribute your packages as .dts files, and you need a license to edit the package in Designer. You also need a license if you connect to SQL Server in your package.

Table 4-4: List of Files to Make DTS Independent
File	Description
Axscphst.dll Axscphst.rll	Handles ActiveX scripting
Dtsffile.dll Dtsffile.rll	Flat file provider
Dtspkg.dll Dtspkg.rll	Handles the package
Dtspump.dll Dtspump.rll	Handles the data pump task
Dtsrun.exe Dtsrun.dll	Executes the package
Custtask.dll	Handles the custom task
Sqlwoa.dll SQLwid.dll	ANSI translation for SQL Server
SQLresld.dll	Loads resource files

Note

You will have to register each of these files on the server for this to work.

Troubleshooting DTS

If you execute a package that fails, you can double-click the red error sign to see details about why the step failed. In the status screen, you can see when a step began to execute and when it stopped. You can also see which steps never executed.

In DTS, a common need is to execute a package from within a package, either through an Execute Package task or an ActiveX Script task. You can execute a package from within a package to make a modular process. For example, if you have a standardized auditing process, you could create an auditing package and execute it from other packages. This way, you won't have to modify all of your packages when an auditing change is needed. One of the main problems with executing a package inside of a package is that you can't easily see the detailed errors that occur in the child packages.

To remedy this problem (as well as other problems), you can add extended error logging in the child or parent packages in order to debug problems. DTS ships with several logging facilities from which you can log to the following targets:

Application log in Windows NT or Windows 2000
Text error log
Exception files
Specialized DTS logging

You can access all but the exception file logging mechanism in DTS Designer. Right-click on a empty white space in Designer and select Package Properties. All of the logging options are under the Logging tab of the dialog box.

The primary way to log is by saving the execution information in SQL Server. To do this, simply check the Log Package Execution to SQL Server option, and specify a target server (see Figure 4-5).

click to expand
Figure 4-5: Use the Logging tab to configure logging

Note

You can also consolidate the logs from multiple servers to a single server.

Specify an error file to receive the execute information (it's a text file). The log file can have any extension (generally .txt or .log), and can grow fast since the status is appended to the log file each time the package is executed. Periodically delete the file to purge the old records.1`

You can also write the status of a package's execution in the Windows NT or Windows 2000 application log, by checking the Write Completion Status To Event Log option. Enabling this option writes the completion status and the details as an informational event in the application log. If the package fails, the event is still only written as an informational event, not a warning or error.

Tip

When trying to debug an ActiveX Script task, try to insert a message box using the following syntax: MSGBOX 'Content Here'. This will let you know if a certain piece of the code is being reached. With some ActiveX Script tasks, you may have to force the task to execute on the main package thread or it may hang. To do this, right-click on the task, select Workflow | Workflow Properties, then select Execute On Main Package Thread in the Option tab.

Other logging mechanisms are available in the Data Pump Properties screen in the Options tab. Exception files allow you to output any records that didn't transform properly to a delimited file. You can then correct the problem and rerun the transformation process.