The SSIS Toolkit


SSIS offers a user-friendly integrated environment called Business Intelligence Development Studio (BIDS), which can be used to create, debug, and deploy SSIS packages. This tool leverages the Visual Studio .NET GUI foundation to offer features such as tabbed windows, solution and project management, enhanced designer interfaces, debugging, SourceSafe integration, help integration, and so on. BIDS combines a graphical editor to design packages and several tools and templates to create and debug packages. Several built-in tasks are available in BIDS to allow you to create complex packages without writing a single line of code. The SQL Server Management Studio tool, on the other hand, offers an environment to manage the existing packages.

Let's now look at some built-in containers and tasks that you can use while designing SSIS packages.

Containers

Containers provide structure and meaning to a package by grouping various tasks and containers together. Containers can include other containers in addition to tasks. Packages use containers for purposes such as to group tasks and containers that must succeed or fail as a unit, to repeat tasks for each element in a collection (such as files in a folder), and to repeat tasks until a specified expression evaluates to false.

SSIS provides four types of containers for building packages: the Foreach Loop container, the For Loop container, the Sequence container, and the Task Host container. Let's look at each of these types in little more detail.

The Foreach Loop Container

The Foreach Loop container allows a package to loop repetitively, based on an enumerator. The following different enumerator types are available:

  • Foreach File Enumerator This type is used to traverse files and subfolders on the disk drive.

  • Foreach ADO Enumerator This type is used to traverse rows in an ADO rowset.

  • Foreach ADO.NET Schema Rowset Enumerator This type is used to enumerate the data source in a schema.

  • Foreach Item Enumerator This type is used to traverse items in a collection such as documents.

  • Foreach From Variable Enumerator This type is used to traverse the values of a variable.

  • Foreach Nodelist Enumerator This type is used to enumerate the resultset of an XML Path Language (XPath) expression.

  • Foreach SMO Enumerator This type is used to enumerate SQL Server Management Objects (SMO) objects such as tables and databases in a SQL Server instance.

The For Loop Container

The For Loop container evaluates a specified expression, and the repetitive flow is continued until the expression evaluates to false. The loop is defined by an initial expression that is optional, an evaluating expression that should result in false for the loop to stop, and an optional iterative expression to increment or decrement the loop counter.

As shown in Figure 12.3, the initial expression sets the variable @var to 0. The tasks in the For Loop container are executed in a loop until @var is less than 10. AssignExpression increments the @var variable each time the loop is run.

Figure 12.3. The For Loop container defines a repeating control flow in a package.


The Sequence Container

While designing SSIS packages, often it is necessary to run multiple tasks in parallel or to group together several tasks in a collection such that the collective property of all the tasks in the collection affects the next task downstream. Such tasks and containers can be grouped in the Sequence container. The Sequence container creates a subset of the package control flow to allow you to manage properties on multiple tasks by grouping them in a single container and allows debugging the tasks as a single unit of work. In addition, you can define scope on the variables over the entire Sequence container. An example of the Sequence container is shown in Figure 12.4.

Figure 12.4. Sequence containers group a package into multiple separate control flows, each containing one or more tasks and containers that run within the overall package control flow.


The Task Host Container

The Task Host container essentially encapsulates a single task. Setting the properties of the encapsulated task automatically sets the properties of the Task Host Container. This container is useful when you're creating SSIS tasks programmatically.

Tasks

Tasks define the units of work to be performed as part of a package for process communication and data flow. The tasks in SSIS can be divided into the following categories:

  • Control-flow tasks These tasks are used for process-oriented tasks such as communicating with other processes, preparing data for the data flow, working with SQL or Analysis Services objects, or extending package functionality through programming.

  • Data flow tasks These tasks are used for data control and manipulation (for example, extracting, transforming, and loading data from a source to a destination).

  • Database maintenance tasks These tasks are used to perform administrative and maintenance processes for SQL Server to ensure performance, high availability, and disaster recovery.

  • Custom tasks If the required functionality is not available in the three other kinds of tasks, you can extend the functionality by implementing custom tasks. The SSIS programming model exposes a set of APIs and base classes for implementing custom tasks. You can inherit from these classes and override the required methods to achieve the desired functionality.

Let's now look at each of these task types in a little more detail.

Control-Flow Tasks

The different control-flow tasks can be further divided into SQL Server tasks, process-oriented workflow tasks, data-oriented tasks, Analysis Services tasks, and scripting tasks.

SQL Server Tasks

SQL Server tasks allow access, configuration, and administration of SQL Server objects. SQL Server tasks include the following:

  • The Execute SQL task The Execute SQL task allows execution of Transact-SQL (TSQL) statements and stored procedures. In addition, it allows the running of parameterized queries. Input variables for stored procedures and queries are mapped to the value of ? at runtime. Parameters are order dependent. In the case of a stored procedure with several parameters, the SSIS engine follows the order in which the input variables are assigned and maps accordingly. The first input variable is mapped to the first ?, and so on. The return values can be a single row, multiple rows, or XML.

  • The Bulk Insert task This task allows you to copy data from flat files to SQL Server tables. It provides various properties of the bulk cinsert format file, such as format, batch size, and so on.

The other SQL Server tasksincluding the Transfer SQL Server Objects task, the Transfer Database task, the Transfer Error Messages task, the Transfer Jobs task, the Transfer Logins task, and the Transfer Master Stored Procedures taskallow you to transfer different SQL Server objects to different database instances.

Process-Oriented Workflow Tasks

The process-oriented workflow tasks include the following:

  • The Execute Package task This task allows the execution of other SSIS packages as part of the workflow.

  • The Execute DTS 2000 Package task This task allows the execution of DTS packages developed in SQL Server 2000 as is, without migration to SSIS. Not all the tasks available to the DTS packages in SQL Server 2000 are available in SSIS. In addition, the architecture and object model have changed tremendously. Hence, not all DTS tasks can be migrated to SQL Server 2005 SSIS. This is discussed in more detail in the section "Migrating DTS 2000 Packages to SSIS," later in this chapter.

    A DTS 2000 Execute package requires the DTS 2000 Runtime to be present. The DTS runtime is part of SQL Server 2000. However, if SQL Server 2000 is not installed on the server, you can install the DTS 2000 runtime while installing SQL Server 2005. During the SQL Server 2005 setup, you click the Advanced Options button and select for DTS 2000 to be installed. The connections associated with packages to be executed by the Execute DTS 2000 Package task are maintained in the SQL Server 2000 format and are not migrated. Figure 12.5 shows the Execute DTS 2000 Package Task Editor window.

    Figure 12.5. The Execute DTS 2000 Package task runs packages that were developed by using SQL Server 2000 tools.


  • The Execute Process task This task allows execution of applications and Windows batch files as part of the SSIS workflow.

  • The Message Queue task This task allows you to use message queuing to send and receive messages between SSIS packages or to send a message to an external application queue.

  • The Send Mail task As the name suggests, this task can be used to send email messages over SMTP. You can use this task to send an email message if tasks in the package workflow succeed or fail or in response to an event that the package raises at runtime. The task uses an SMTP connection manager to connect to a mail server.

  • The WMI Data Reader task This task runs Windows Management Instrumentation (WMI) Query Languages queries to retrieve information from windows event logs, application or performance information about the different server resources including hardware, and so on. The WMI Data Reader task uses a WMI connection manager to connect to the server from which it reads WMI information.

  • The WMI Event Watcher task The WMI Event Watcher task watches for specific configured WMI events. The alert events raised could be an application installation on a production server or degrading performance of hardware resources.

Data-Oriented Tasks

Data-oriented tasks include the File System task, the File Transfer Protocol task, the Web Services task, and the XML task. These tasks are used for copying and modifying files, downloading files from different locations, executing web methods, and working with XML documents.

Analysis Services Tasks

Analysis Services tasks include the Analysis Services Processing task, which allows processing of Analysis Services objects such as cubes, dimensions, and mining models; the Analysis Services Execute DDL task, which allows creating, dropping, and altering of mining models and multidimensional objects such as cubes and dimensions; and the Data Mining Query task, which allows running of prediction queries based on data mining models built in Analysis Services.

Scripting Tasks

Scripting tasks allow you to program code to extend the functionality of a package. The two tasks that allow you to program code for better functionality are the Script task and the ActiveX Script task. The ActiveX Script task allows you to code in scripting languages such as VBScript or JScript. To code in .NET languages such as Visual Basic. NET or C# .NET, SSIS has introduced a new task called the Script task, which allows you to perform complex tasks by writing VB .NET or C# code.

Data Flow Tasks

Data flow tasks control the flow of data from the source to the destination and allow you to cleanse and transform data and redirect portions of the data to different destinations.

The data flow tasks are divided into source adapters, transformations, and destination adapters.

Source Adapters

Source adapters allow you to connect to different data sources, using different connection managers, and to extract and read data. The different connections that you are allowed to connect to are OLE DB, flat files, Excel files, raw files, XML files, and data reader.

Transformations

Different transformation can be applied to cleanse, modify, and transform data. These are the different transformations:

  • The Conditional Split transformation This transformation can redirect data rows to different outputs based on certain conditions. Each input row is redirected to an output based on an expression.

  • The Multicast transformation The Multicast transformation is similar to the Conditional Split transformation but directs every row to every output. This task is useful when multiple sets of transformations need to be applied to the same data and logical copies of the data need to be created to be processed further, using different transformations.

  • The Derived transformation This transformation is useful for cleansing and modifying data because it replaces old columns or creates new column values by applying user-defined expression to the column values. You can also use it to create new columns based on expressions applied to variables.

  • The Data Conversion transformation This transformation converts the data type of a column to a different data type, so it is similar to the Cast/Convert function in T-SQL.

  • The Merge transformation The Merge transformation provides an output by merging two sorted datasets based on the values in their key columns, resulting in sorted merged output.

  • The Merge Join transformation Similarly to the Merge transformation, this transformation merges two datasets but joins them by using FULL, LEFT, or INNER joins.

  • The Aggregate transformation You can use this transformation to perform aggregation operations such as an average, a sum, or a count.

  • The Character Map transformation You can use this transformation to apply string functions against character data.

  • The Copy/Map transformation You can use this transformation to create new columns by copying input columns and adding the new columns to the transformation output.

  • The File Extractor transformation You can use this transformation to insert data from data flow into a file. This transformation is useful in moving data of specific formats to separate files.

  • The File Inserter transformation This transformation reads data from a file and adds it to a data flow. This transformation is useful for adding more information to processed data.

  • The OLE DB Command transformation This transformation executes a SQL command for each row in a data flow.

  • The Data Mining Model Training transformation This transformation trains data mining models on the data received from the data mining model algorithms.

  • The Data Mining Query transformation You can use this transformation to run data mining prediction queries.

  • The Dimension Processing transformation As the name suggests, you can use this transformation to process Analysis Services dimensions.

  • The Fuzzy Grouping transformation This transformation performs data cleansing tasks by identifying rows of data that are likely to be duplicates and choosing a canonical row of data to use in standardizing the data.

  • The Fuzzy Lookup transformation This transformation performs data cleansing tasks such as standardizing data, correcting data, and providing missing values by looking up values in a reference table, using a fuzzy match.

  • The Logged Lineage transformation This transformation allows extraction of environmental variable values and includes them as a part of the data flow.

  • The Lookup transformation You can use this transformation to perform lookups in a reference dataset, which can be an existing table or view, a new table, or the result of a SQL statement.

  • The Partition Processing transformation You can use this transformation to process Analysis Services partitions.

  • The Row Count transformation This transformation counts rows and stores the total in a variable.

  • The Row Sampling transformation You can use this transformation to create a sample dataset by selecting a specified number of the transformation input rows.

  • The Script Component This transformation uses script to extract, transform, or load data.

  • The Slowly Changing Dimension transformation You can use this transformation to coordinate the updating and inserting of records in data warehouse dimension tables.

  • The Sort transformation You can use this transformation to sort data in ascending or descending order.

  • The Union All transformation As the name suggests, this transformation merges multiple datasets.

  • The UnPivot transformation You can use this transformation to create a normalized version of an unnormalized table.

Destination Adapters

Destination adapters allow you to load processed data into different destinations. The different destination adapters available in the Toolbox are the Flat File destination adapter, the OLE DB Destination adapter, the Raw File destination adapter, the Recordset destination adapter, the SQL Mobile destination adapter, and the SQL Server destination adapter.

Database Maintenance Tasks

Maintenance tasks, such as backing up the database, monitoring index fragmentation, checking database integrity, and executing custom scripts to monitor database performance and high availability, are some of the common tasks that database administrators perform on a regular basis. SSIS offers several useful database maintenance tasks for DBAs to include in a maintenance plan without writing custom scripts or coding. The different database maintenance tasks available in the Toolbox include those described in the following sections.

The Backup Database Task

The Backup Database task allows you to back up a single or multiple databases. You can perform a full backup, a differential backup, or a transaction log backup based on the recovery model. In addition, this task also allows you to create a file and filegroup backup for a database with multiple filegroups and data files. The backup device can be specified or created in this task, and it contains all the properties associated with backup creation using T-SQL or SQL Server Management Studio.

The Reorganize Index Task

You use the Reorganize Index task to defragment fragmented indexes in a single or multiple databases. The task also includes an option to defrag large object data such as text, ntext, varchar(max), nvarchar(max), varbinary(max), or xml data. To configure defragmentation of large objects, you check Compact Large Objects in the task dialog box or set the CompactLargeObjects option to true in the task properties box. This option is off by default.

The Check Database Integrity Task

The Check Database Integrity task encapsulates the DBCC CHECKDB T-SQL command to check the allocation and structural integrity of all the objects in a single or multiple databases. To check the integrity of indexes, you select Include Indexes in the dialog or the IncludeIndexes custom property in the properties box.

The Rebuild Index Task

The Rebuild Index task allows you to rebuild indexes on SQL Server objects such as tables and views. The task contain different custom properties, such as allocation of free space, FillFactor percentage, PAD_INDEX to allocate free space specified by the fill factor, SORT_IN_TEMPDB to store intermediate results in TempDB, and IGNORE_DUP_KEY to allow multiple-row index operations. In addition, it provides the option to release table locks by setting the ONLINE option to ON.

Other database maintenance tasks available in the Toolbox include the following:

  • The History Cleanup task You use this task to delete history tables in the msdb database.

  • The Execute SQL Server Agent task You use this task to run SQL Server Agent jobs.

  • The Notify Operator task You use this task to send notification messages.

  • The Maintenance Cleanup task You use this task to remove old files related to maintenance plans.

  • The Notify Operator task You use this task to send notification messages via email, pager, or net send to SQL Server Agent operators.

  • The Shrink Database task You use this task to shrink the size of the database and logs to a user-defined size.

  • The Update Statistics task You use this task to update statistics by distributing the key values for one or more statistics groups.

Business Intelligence Development Studio

Business Intelligence Development Studio (BIDS) is an integrated environment for designing, building, and deploying business intelligence applications. BIDS integrates the features previously available in Analysis Manager and DTS in SQL Server 2000. BIDS contains windows described in the following sections.

Solution Explorer

The Solution Explorer provides a solution- and project-based organized view of packages, data sources, data source views, and other files. You can use it to view the various packages and access their properties. In addition, the Solution Explorer allows versioning of packages using Microsoft Visual SourceSafe.

The SSIS Designer Window

The SSIS designer window provides a nice graphical view of packages. It allows you to create, modify, debug, and save packages. The designer contains the following tabs:

  • Control Flow This tab contains the process-oriented tasks of the SSIS package, in the order of execution and the looping structures that allow repetitive execution of tasks.

  • Data Flow This tab contains the data-oriented part of the SSIS package and allows you to create packages that control the flow of data from the source to the destination.

  • Event Handlers This tab allows you to create custom event handlers to manage errors and other events raised during package execution. Error handlers can be created to send emails, retrieve system information, clean up after a particular task completion, and so on.

  • Package Explorer This tab displays package objects, including variables, executables, precedence constraints, event handlers, connection managers, and log providers, in a hierarchical view. The Package Explorer tab reflects the SSIS object hierarchy, with the package container being the top of the hierarchy. As this container is expanded, you can view and access the variables, executables, precedence constraints, event handlers, connection managers, and log providers associated with the package. The Package Explorer allows you to view and delete package objects.

  • Execution Results This tab displays the package execution results.

  • Connection Managers You can use the Connection Managers tab to create, edit, and delete various connection manager types, including OLE DB, ADO.NET, flat files, SMTP, WMI, and so on.

The Toolbox Window

The Toolbox contains all the built-in tasks, arranged in proper order and categorized by functionality. The Toolbox category and items change depending on the settings on the Control Flow and Data Flow tabs. For instance, if you are in the Control Flow tab, the Toolbox shows maintenance plan tasks and control flow items; however, if you are in the Data Flow tab, the Toolbox lists data flow source, transformation, and destination tasks.

The Properties Window

The Properties window contains the properties of each task selected. You can specify the custom properties associated with each task in this window.

The Debug Windows

The Debug windows give you the ability to set breakpoints and debug the SSIS package and watch the data as it flows through the package workflow. This provides tremendous value in catching and fixing errors and in troubleshooting SSIS packages. The following sections describe some of the debug windows available in BIDS.

The Breakpoint Window

A breakpoint suspends package execution at the point where the breakpoint is defined. Breakpoints can be set on the different tasks in the SSIS designer to suspend a package on the different event handlers specified in Table 12.1. You can customize breakpoint behavior by setting the hit count type. The hit count type can be specified as follows:

  • Always Execution is suspended when the breakpoint is hit.

  • Equals Execution is suspended when the breakpoints are equal to the hit count.

  • Greater or equal Execution is suspended when the breakpoint is equal to or greater than the hit count.

  • Multiple Execution is suspended when a multiple of the hit count occurs.

The enabled breakpoint can be viewed in the breakpoint window. It also displays the various supported breakpoints and allows you to enable or disable the breakpoints.

The Command Window

The Command window allows you to specify execution commands by bypassing the menu system.

The Immediate Window

The Immediate window allows you to debug, evaluate expressions, print variables, and so on.

Call Stack Window

The Call Stack window lists the SSIS containers that are currently running.

The Locals Window

The Locals window provides the status on the current executables and variables. It lists all the variables within the scope of the current context.

The Output Window

The Output window displays various status messages during the course of package compilation and execution. It displays validation errors when a package is opened, build errors when a package is compiled, and progress/status messages on execution of a package.

The Watch Window

The Watch window is used for viewing, editing, and saving variable values and expressions. It displays various variables and expressions with its associated data types and evaluated expressions.

Progress Reporting

SSIS uses two methods to report the progress of a package during the execution:

  • The Progress tab

  • Color-coding

The Progress tab displays the order of the task execution and the start and finish times of each task. In addition, it displays any errors encountered during the execution of the package.

Color-coding is also used to display the status of package execution. Depending on color, you can distinguish whether a package is running or whether it has successfully completed or failed. This color-coding is displayed only during package execution. Table 12.2 describes the color-coding.

Table 12.2. Execution Status Color-Coding

Color

Status

Gray

Waiting to run

Yellow

Executing

Green

Successful completion

Red

Completed with errors


SQL Server Management Studio

SQL Server Management Studio allows you to manage existing packages. It allows importing and exporting of packages saved from one kind of storage format to another. It allows storing of a package to either the file system in XML format, SQL Server, or the SSIS package store. Management Studio cannot be used for designing or debugging packages, but it can be used to execute a package, view running packages, and import and export packages. When the SSIS service is running, you can use Management Studio to connect to an SSIS instance in the Object Explorer window to see existing and running packages.

Command-Prompt Utilities

SQL Server 2005 includes several command-prompt utilities that you can use to manage and execute the SSIS packages. They include the following:

  • DTExecUI This utility allows execution of a package by opening a user interface to set the execution properties.

  • DTExec This utility allows execution of existing packages from the command line.

  • DTUtil This utility allows management of existing packages from the command line.

The following sections describe these utilities in more detail.

DTExecUI

You can invoke this utility from the command prompt by typing DTExecUI (see Figure 12.6). It opens up a user interface where you can set the properties to execute the package.

Figure 12.6. The DTExecUI utility can be used to run an SSIS package.


The DTExecUI utility allows execution of SSIS packages stored in SQL Server, the file system, or the SSIS package store. You can specify the configuration file to be loaded to set the object properties of the package during runtime, or you can set values of individual objects. In addition, you can specify the commands to load, and in what order, during runtime. You can set the connection properties if the package needs to connect to an external data source. You can also set different execution options, such as setting validation warnings or enabling package checkpoints. Log files can be specified to log the execution of a package or set package verification. Alternatively, you can choose to report execution information to the console, and you can specify the level of information to display.

DTExec

DTExec is similar to DTExecUI except that it does not contain a user interface. All the settings are passed as command-line parameters. Based on the exit code, the user can determine whether the command was executed successfully. Table 12.3 lists the return values for DTExec.

Table 12.3. Return Values for DTExec

Return Value

Description

0

Successful execution

2

Package failure

3

Package cancelled by the user

4

Unable to locate the requested package

5

Unable to load the requested package


DTUtil

The DTUtil command-line utility allows you to manage existing packages by allowing you to perform operations such as copying, deleting, moving, and signing a package. You can also use DTUtil to verify whether a package exists. Table 12.4 lists the return values for DTUtil.

Table 12.4. Return Values for DTUtil

Return Value

Description

0

Successful execution

1

Package failure

4

Unable to locate the requested package

5

Unable to load the requested package

6

Cannot resolve the command line because of errors





Microsoft SQL Server 2005(c) Changing the Paradigm
Microsoft SQL Server 2005: Changing the Paradigm (SQL Server 2005 Public Beta Edition)
ISBN: 0672327783
EAN: 2147483647
Year: 2005
Pages: 150

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net