Chapter 9: Managing SSIS


So, you have a set of packages and are ready to run the package in production. This chapter focuses on how to administer packages after they’re deployed to production. The discussion also examines how to create a stand-alone ETL server and some of the command line utilities you can use to make your job easier.

The Package Store

When you deploy your packages, they are stored into what is called the SSIS Package Store. The Package Store, in some cases, will actually physically store the package, as in the msdb database option. Or, in the case, of the file system, the Package Store just keeps a pointer to a specific top-level directory and enumerates through the packages stored underneath that directory. To connect to the Package Store, the SSIS Service must be running. This service is called SQL Server Integration Services, or MsDtsServer. There is only one instance of the service per machine, or per set of clustered machines.

You can configure the SSIS service in the Services applet in Control Panel image from book Administrative Tools. Double-click on the SQL Server Integration Services item. As you can see, the service is set to automatically start by default and starts under the NT AUTHORITY\NetworkService account. In the Recovery tab, you may decide that you want the service to automatically restart in the event of a failure-you can specify how to react if the service fails the first, second, and subsequent times. As you can see in Figure 9-1, the service has been changed to restart if a failure occurs twice. The failure count in this figure is also reset after two days.

image from book
Figure 9-1: Configuring the service to restart in the event of a failure

Although you can run and stop packages without the service, the service makes running packages more manageable. For example, anyone who wants to interrogate the service can find out which packages are running. It can also aid in the importing and exporting of packages into the Package Store. Other uses for the service will be discussed throughout this chapter, but one last great use for the service is to enable you to handle centralized management, enumeration, and monitoring of your packages throughout your enterprise.

The MsDtsServer service is configured through an XML file that is located, by default, in the following path:

 C:\Program Files\Microsoft SQL Server\90\DTS\Binn\MsDtsSrvr.ini.xml

This path will vary if you’re in a cluster. If you cannot find the path, go to the HKEY_LOCAL_MACHINE\ SOFTWARE\Microsoft\MSDTS\ServiceConfigFile Registry key in RegEdit. By default, the XML file should look like the following:

 <?xml version="1.0" encoding="utf-8" ?> - <DtsServiceConfiguration xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">   <StopExecutingPackagesOnShutdown>true</StopExecutingPackagesOnShutdown> - <TopLevelFolders> - <Folder xsi:type="SqlServerFolder">   <Name>MSDB</Name>   <ServerName>.</ServerName>   </Folder> - <Folder xsi:type="FileSystemFolder">   <Name>File System</Name>   <StorePath>..\Packages</StorePath>   </Folder>   </TopLevelFolders>   </DtsServiceConfiguration>

There really isn’t a lot to configure in this file, but there are some interesting uses for the file. The first configuration line tells the packages how to react if the service is stopped. By default, packages that the service is running will stop upon the service stopping or being failed over. You could also configure the packages to continue to run after the service is stopped until they complete by changing the StopExecutingPackagesOnShutDown property to False, as shown here:

 <StopExecutingPackagesOnShutdown>false</StopExecutingPackagesOnShutdown>

The next configuration sections are the most important. They specify which paths and servers the MsDtsServer service will read from. Whenever the service starts, it reads this file to determine where the packages are stored. In the default file, you will have a single entry for a SQL Server that looks like the following SqlServerFolder example:

 - <Folder xsi:type="SqlServerFolder">   <Name>MSDB</Name>   <ServerName>.</ServerName>   </Folder>

The <Name> line represents how the name will appear in Management Studio for this set of packages. The <ServerName> line represents where the connection will point to. There is a problem if your SQL Server is on a named instance where this file will still point to the default non-named instance (.). If you do have a named instance, simply replace the period with your instance name.

The next section shows you where your File System packages will be stored. The <StorePath> shows the folder where all packages will be enumerated from. The default path is C:\program files\ microsoft sql server\90\dts\Packages, which is represented as ..\Packages in the following default configuration. The .. part of the statement navigates one directory below the SSIS service file, and \Packages then traverses into the Packages folder.

 - <Folder xsi:type="FileSystemFolder">   <Name>File System</Name>   <StorePath>..\Packages</StorePath>   </Folder>

Everything in the Packages folder and below will be enumerated. You can create subdirectories under this folder and they will immediately show up in Management Studio. Each time you make a change to the MsDtsSrvr.ini.xml file, you must stop and start the MsDtsServer service in order for the changes to take effect.

Creating a Central SSIS Server

Many enterprise companies have so many packages that they decide to separate the SSIS service from SQL Server and place it on its own server. The advantages of this are that your SSIS packages will not suffocate the SQL Server’s memory during a large load, and you have a central spot to manage them.

The disadvantages of this are now you must license the server separately, and you add another layer of complexity while you’re debugging packages. When you do this, you create a fantastic way to easily scale packages by adding more memory to your central server, but you also create an added performance hit because all remote data must be copied over the network before entering the data flow buffer.

To create a centralized SSIS hub, you must only modify the MsDtsSrvr.ini.xml file and restart the service. The service can read a UNC path such as \\ServerName\Share and it can point to multiple remote servers. In the following example, the service will enumerate packages from three servers: one that is local and another two that are on a named instance. After restarting the service, you will see a total of six folders to expand in Management Studio. (The Management Studio aspect of SSIS is discussed in much more detail later in this chapter.)

 <?xml version="1.0" encoding="utf-8" ?> - <DtsServiceConfiguration xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">   <StopExecutingPackagesOnShutdown>true</StopExecutingPackagesOnShutdown> - <TopLevelFolders> - <Folder xsi:type="SqlServerFolder">   <Name>Server A MSDB</Name>   <ServerName>localhost</ServerName>   </Folder>   <Name>Server B MSDB</Name>   <ServerName>SQLServerB</ServerName>   </Folder>   <Name>Server C MSDB</Name>   <ServerName>SQLServerC\NamedInstance</ServerName>   </Folder> - <Folder xsi:type="FileSystemFolder">   <Name>Server A File System</Name>   <StorePath>P:\Packages</StorePath>   </Folder> - <Folder xsi:type="FileSystemFolder">   <Name>Server B File System</Name>   <StorePath>\\SQLServerB\Packages</StorePath>   </Folder> - <Folder xsi:type="FileSystemFolder">   <Name>Server C File System</Name>   <StorePath>\\SQLServerC\Packages</StorePath>   </Folder>   </TopLevelFolders>   </DtsServiceConfiguration>

Your next issue is how to schedule packages when using a centralized SSIS hub like this example. You can schedule your packages through SQL Server Agent, or through a scheduling system such as Task Scheduler from Windows. Since you’re already paying for a license of SQL Server, it’s better to install SQL Server on your server and use Agent since it gives you much more flexibility, as you will see later in this chapter. You can also store configuration tables and logging tables on this SQL Server to centralize its processing as well. Both scheduling mechanisms are examined later in this chapter.

Clustering SSIS

Making your SSIS service highly available can be done by clustering SSIS. Before proceeding, however, we need to clarify what a highly available SSIS service gives you. In the event of a server failure, the SSIS service will start on another server in the cluster, and packages can begin to be managed as if the failure never occurred. If there is a package running during the server failure, the package will not automatically begin running on the other server. So, the service is highly available, but the package itself will not continue to work uninterrupted.

The unfortunate news is that SSIS is not a clustered service by default. Even though it does not cluster in the main SQL Server setup, it can still be clustered manually through a series of relatively easy steps. This section walks you through those steps, but makes the assumption that you already know how to use Windows clustering and know the basic clustering architecture. Following are the high-level steps required to set up SSIS as a clustered service, which will then be explored in more detail:

  1. Install SSIS on all nodes that can use the service.

  2. Create a new cluster group (optional).

  3. If you created a new group, create a virtual IP, name, and drive as clustered resources.

  4. Copy the MsDtsSrvr.ini.xml to the clustered drive.

  5. Modify the MsDtsSrvr.ini.xml file to change the location of the packages.

  6. Change the Registry setting to point to the MsDtsSrvr.ini.xml file.

  7. Cluster the MsDtsServer service as a generic service.

Let’s begin by looking at a minor decision you’ll have to make prior to clustering. You can choose to cluster the MsDtsServer service in the main SQL Server cluster group for a given instance, or you can create its own cluster group. Although it’s easier to piggyback the main SQL Server service, you will find that it adds complexity to management.

Recall that you can only have a single SSIS service between the entire Windows cluster. If you have a four-instance SQL Server cluster, where would you place the SSIS service then? This is one reason why it makes the most sense to move the SSIS service into its own group. The main reason though is a manageability one. If you decide that you need to fail over the SSIS service to another node, you would have to fail over the SQL Server as well if they shared a cluster group, which would cause a data outage. Moving the SSIS service into its own cluster group ensures that only the SSIS service fails over and does not cause a wider outage.

Placing this in its own group comes at a price, though. The service will now need a virtual IP address, its own drive, and a name on the network. Once you have those requirements, you’re ready to cluster. If you decided not to place SSIS in its own group, then you would not need the drive, IP address, or name.

The first step in clustering is to install SSIS on all nodes in the Windows cluster. If you installed SSIS as part of your SQL Server install, you’ll see that SSIS installed only on the primary node. You’ll now need to install it manually on the other nodes in the cluster. Ensure that you make the installation simple, and install SSIS on the same folder on each node.

If you want to have the SSIS service in a different group from the database engine, you’ll first have to create a new group called SSIS in Cluster Administrator (it can be called something else). This group will need to be shared by whichever nodes that you would like to participate in the cluster. Then, add to the group a physical drive that is clustered, an IP address, and a network name. The IP address and network name are virtual names and IPs.

From whichever node owns the SSIS group, copy the MsDtsSrvr.ini.xml file to the clustered physical drive that’s in the SSIS cluster group. Generally, it’s best to create a directory called <Clustered Drive Letter>\SSISSetup in which to place the file. Make a note of wherever you placed the file for a later configuration step. You’ll also want to create a folder on the same clustered drive for your packages (called Packages) to be stored. This directory will store any packages and configuration files that will be stored on the file system instead of the msdb database.

Next, open regedit.exe or your favorite Registry editing tool and change the SOFTWARE\Microsoft\ MSDTS\ServiceConfigFile key to point to the new location (including the file name) for the MsDtsSrvr.ini.xml file. Be sure to back up the Registry before making this change.

After this change, you’re ready to cluster the MsDtsServer service. Open Cluster Administrator again and right-click on the SSIS cluster group (if you’re creating it in its own group) and select New image from book Resource. This opens the Resource Wizard, which clusters nearly any service in Windows. On the first screen, type Integration Services for the name of the clustered resource and select Generic Service. This name is a logical name that is only going to be meaningful to the administrator and you.

Next, on the Possible Owner screen, add any node that you want to potentially own the SSIS service. On the Dependencies page, add the group’s Network Name, IP Address, and Drive as dependencies. This ensures that the SSIS service will not come online prior to the name and drives being online. Also, if the drive were to fail, the SSIS service will also fail.

The next screen is the Generic Service Parameters screen, where you will want to type MsDtsServer for the service to cluster. The last screen in the wizard is the Registry Replication screen, where you will want to ensure that the SOFTWARE\Microsoft\MSDTS\ServiceConfigFile key is replicated. If a change is made to this Registry key, it will be replicated to all other nodes. After you finish the wizard, the SSIS service will be almost ready to come online and be clustered.

The final step is to move any packages that were stored on the file system over to the clustered drive in the Packages folder. The next time you open Management Studio, you should be able to see all the packages and folders. You’ll also need to edit the MsDtsSrvr.ini.xml file to change the SQL Server to point to SQL Server’s virtual name and not the physical name, which will allow failovers of the database engine. In the same file, you will need to change the path in the StorePath to point to the <Clustered Drive>:\Packages folder you created earlier as well. After this, you’re ready to bring the service online in Cluster Administrator.

It’s important to note now that your SSIS service is clustered, you will no longer connect to the physical machine name to manage the packages in Management Studio. Instead you will connect to the network name that you created in Cluster Administrator. If you added SSIS as a clustered resource in the same group as SQL Server, you would connect to the SQL Server’s virtual network name.

File System or the msdb Deployment

As discussed earlier, you can store your packages in two places: on the file system or in the msdb database. Each storage option has its own pros and cons, and which option you choose will be based on what is more important to you. These pros and cons are examined in much more depth in this section, but to summarize, the following table provides a high-level idea of which storage option is best based on what functionality you’re most interested in. Just because a given storage option is not checked does not mean that it doesn’t have that functionality. The ones checked are simply most optimized for the given functionality.

Open table as spreadsheet

Functionality

Best in File System

Best in msdb

Security

 

X

Backup and recovery

 

X

Deployment

X

 

Troubleshooting

X

 

Availability

X

 

Execution Speed

X

X

If security concerns you greatly, you may want to consider placing your packages in the msdb database. To secure your packages on the file system, you could have multiple layers of security by using the NT File System (NTFS) security on the folder on the file system where the packages are located. You could also then place a password on the packages to keep users who may have administrator rights to your machine from executing the package. This does add extra complexity to your package deployments in some cases.

If you store your packages in the msdb database, you can assign package roles to each package to designate who can see or execute the package. The packages can also be encrypted in the msdb database, which strengthens your security even more. This is examined in much more depth later in this chapter.

Backup and recovery is simpler when you store your packages in the msdb database. If you were to store your packages in the msdb database, then you must only wrap the msdb database into your regular maintenance plan to back up all the packages. As packages are added, they are wrapped into the maintenance plan. The problem with this is that you can’t restore a single package using this mechanism. You’d have to restore all the packages to a point in time, and that would also restore the jobs and history.

The other option is a file system backup, which would just use your favorite backup software to back up the folders and files. If you do this, you must rely on your Backup Operator to do this for you, which makes some people uneasy. At that moment you could restore individual packages to a point in time. In reality, you can just redeploy the packages from Source Safe if you can’t retrieve a backup file.

File system deployments are much simpler, but less sophisticated. To deploy packages onto the file system, you must only copy them into the directory for the package store. You can create subdirectories under the parent directory to subdivide it easily. You can also copy a single package over easily as well, in case you need to make a package change. To import a package into the package store using the msdb database, you must use Management Studio (or the DTUtil.exe command line tool) and import them package by package. To do a bulk migration, you could use the deployment utility discussed in Chapter 8.

Along the same lines as deployment is troubleshooting. If something were to go bump in the night, and you wanted to see if the packages in production were the same release as the packages you thought you had deployed, you must only copy the files down to your machine and perform a comparison using Source Safe or another similar tool. If the files were stored in the msdb database, you would have to right-click each package in Management Studio and select Export. If the same packages were stored in the file system, you must only copy the files to your machine or open BIDS on the remote machine.

Availability of your packages is always on the top of the list for DBAs. If you were to store the packages in the msdb database and the database engine were to go down, the packages would be unavailable. If they were stored in the file system, then your packages would be available for execution. Of course, if the database engine is down, then probably one of your key data sources would also be down at the same time.

The good news is no matter what storage option you choose, the performance will be the same. As you can see, there are many pros and cons to each storage option, and neither wins overwhelmingly. The main reason that you should generally choose to use the file system is for simplicity of deployment.



Expert SQL Server 2005 Integration Services
Expert SQL Server 2005 Integration Services (Programmer to Programmer)
ISBN: 0470134119
EAN: 2147483647
Year: 2004
Pages: 111

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net