Architecture | Wireless Java : Developing with Java 2, Micro Edition

SQL Server 2005 has truly evolved SSIS into a major player in the extraction, transformation, and loading (ETL) market. It was a complete code rewrite from SQL Server 2000 DTS. What's especially nice about SSIS is its price tag, which is free with the purchase of SQL Server. Other ETL tools can cost hundreds of thousands of dollars based on how you scale the software. The SSIS architecture has also expanded dramatically, as you can see in Figure 1-3. The SSIS architecture consists of four main components:

The SSIS Service
The SSIS runtime engine and the runtime executables
The SSIS data flow engine and the data flow components
The SSIS clients

image from book
Figure 1-3

The SSIS Service handles the operational aspects of SSIS. It is a Windows service that is installed when you install the SSIS component of SQL Server 2005, and it tracks the execution of packages (a collection of work items) and helps with the storage of the packages. Don't worry; you'll learn more about what packages are momentarily. The SSIS Service is turned off by default and is set to disabled. It only turns on when a package is executed for the first time. You don't need the SSIS service to run SSIS packages, but if the service is stopped, all the SSIS packages that are currently running will in turn stop.

The SSIS runtime engine and its complementary programs actually run your SSIS packages. The engine saves the layout of your packages and manages the logging, debugging, configuration, connections, and transactions. Additionally, it manages handling your events when one is raised in your package. The runtime executables provide the following functionality to a package that you'll explore in more detail later in this chapter:

Containers: Provide structure and scope to your package
Tasks: Provide the functionality to your package
Event Handlers: Respond to raised events in your package
Precedence Constraints: Provide ordinal relationship between various items in your package

In Chapter 3, you'll spend a lot of time in each of these architecture sections, but the vital ones are introduced here.

Packages

A core component of SSIS and DTS is the notion of a package. A package best parallels an executable program in Windows. Essentially, a package is a collection of tasks that execute in an orderly fashion. Precedence constraints help manage which order the tasks will execute in. A package can be saved onto a SQL Server, which in actuality is saved in the msdb database. It can also be saved as a .DTSX file, which is an XML-structured file much like .RDL files are to Reporting Services. Of course, there is much more to packages than that, but you'll explore the other elements of packages, like event handlers, later in this chapter.

Tasks

A task can best be described as an individual unit of work. They provide functionality to your package, in much the same way that a method does in a programming language. The following are some of the tasks available to you:

ActiveX Script Task: Executes an ActiveX script in your SSIS package. This task is mostly for legacy DTS packages.
Analysis Services Execute DDL Task: Executes a DDL task in Analysis Services. For example, this can create, drop, or alter a cube.
Analysis Services Processing Task: This task processes a SQL Server Analysis Services cube, dimension, or mining model.
Bulk Insert Task: Loads data into a table by using the BULK INSERT SQL command.
Data Flow Task: This very specialized task loads and transforms data into an OLE DB destination.
Data Mining Query Task: Allows you to run predictive queries against your Analysis Services data-mining models.
Execute DTS 2000 Package Task: Exposes legacy SQL Server 2000 DTS packages to your SSIS 2005 package.
Execute Package Task: Allows you to execute a package from within a package, making your SSIS packages modular.
Execute Process Task: Executes a program external to your package, such as one to split your extract file into many files before processing the individual files.
Execute SQL Task: Executes a SQL statement or stored procedure.
File System Task: This task can handle directory operations such as creating, renaming, or deleting a directory. It can also manage file operations such as moving, copying, or deleting files.
FTP Task: Sends or receives files from an FTP site.
Message Queue Task: Send or receives messages from a Microsoft Message Queue (MSMQ).
Script Task: Slightly more advanced than the ActiveX Script task. This task allows you to perform more intense scripting in the Visual Studio programming environment.
Send Mail Task: Send a mail message through SMTP.
Web Service Task: Executes a method on a Web service.
WMI Data Reader Task: This task can run WQL queries against the Windows Management Instrumentation. This allows you to read the event log, get a list of applications that are installed, or determine hardware that is installed, to name a few examples.
WMI Event Watcher Task: This task empowers SSIS to wait for and respond to certain WMI events that occur in the operating system.
XML Task: Parses or processes an XML file. It can merge, split, or reformat an XML file.

There is also an array of tasks that can be used to maintain your SQL Server environment. These tasks perform functions such as transferring your SQL Server databases, backing up your database, or shrinking the database. Each of the tasks available to you is described in Chapter 3 in much more detail, and those tasks will be used in many examples throughout the book. Tasks are extensible, and you can create your own tasks in a language like C# to perform tasks in your environment, such as reading data from your proprietary mainframe.

Data Source Elements

The main purpose of SSIS remains lifting data, transforming it, and writing it to a destination. Data sources are the connections that can be used for the source or destination to transform that data. A data source can be nearly any OLE-DB-compliant data source such as SQL Server, Oracle, DB2, or even nontraditional data sources such as Analysis Services and Outlook. The data sources can be localized to a single SSIS package or shared across multiple packages in BIDS.

A connection is defined in the Connection Manager. The Connection Manager dialog box may vary vastly based on the type of connection you're trying to configure. Figure 1-4 shows you what a typical connection to SQL Server would look like.

image from book
Figure 1-4

You can configure the connection completely offline, and the SSIS package will not use it until you begin to instantiate it in the package. The nice thing about this is that you can develop in an airport and then connect as needed.

Data Source Views

Data source views (DSVs) are a new concept to SQL Server 2005. This feature allows you to create a logical view of your business data. They are a collection of tables, views, stored procedures, and queries that can be shared across your project and leveraged in Analysis Services and Report Builder.

This is especially useful in large complex data models that are prevalent in ERP systems like Siebel or SAP. These systems have column names like ER328F2 to make the data model flexible to support nearly any environment. This complex model naming convention creates positions of people in companies who specialize in just reading the model for reports. The business user, though, would never know what a column like this means, so a DSV may map this column to an entity like LastPaymentDate. It also maps the relationships between the tables that may not necessarily exist in the physical model.

DSVs also allow you to segment a large data model into more bite-sized chunks. For example, your Siebel system may be segmented into a DSV called Accounting, Human Resource, and Inventory. One example called Human Resource can be seen in Figure 1-5. As you can see in this figure, a friendly name has been assigned to one column called Birth Date (previously named BirthDate without the space) in the Employee entity. While this is a simplistic example, it's especially useful for the ER328F2 column previously mentioned.

image from book
Figure 1-5

DSVs are deployed as a connection manager. There are a few key things to remember with data source views. Like data sources, DSVs allow you to define the connection logic once and reuse it across your SSIS packages. Unlike connections, though, DSVs are disconnected from the source connection and are not refreshed as the source structure changes. For example, if you change the Employee table in a connection to Resources, the DSV will not pick up the change. Where this type of caching is a huge benefit is in development. DSVs allow you to utilize cached metadata in development, even if you're in an airport, disconnected. It also speeds up package development. Since your DSV is most likely a subset of the actual data source, your SSIS connection dialog boxes will load much faster.