Tour of Integration Services


Certainly the most important Business Intelligence change from SQL Server 2000 to 2005 is found in the component responsible for the movement and manipulation of data. The 2000 platform used Data Transformation Services (DTS), while the 2005 platform uses Integration Services. The change in capabilities goes far beyond what a simple name change may imply, as Integration Services was a complete ground-up rewrite of DTS. In fact, not a single code line from DTS remains within Integration Services.

Integration Services is a solution that provides enterprise-level data integration and workflow solutions that have as their goal the extraction, transformation, and loading (ETL) of data from various sources to various destinations. Included in SSIS are a wide range of tools and wizards to assist in the creation of the workflow and dataflow activities that you need to manage in these complex data-movement solutions.

Integration Services Uses

Before discussing the detailed components within Integration Services, you need to understand some of the more common business uses that involve creating SSIS solutions.

One of the first scenarios involves combining data from different sources stored in different storage systems. In this scenario, SSIS would be responsible for connecting to each data source, extracting the data, and merging it into a single dataset. In today's information systems topology, this is becoming a very common scenario, as businesses' archive information is not needed for regular operations but is invaluable to analyze business trends or meet compliance requirements. Also, this scenario is found when different parts of a business use different storage technologies or different schemas to represent the same data. In these cases, SSIS is used to perform the homogenization of the information. SSIS seamlessly handles multiple divergent data sources and the transformations that can alter data types, split or merge columns, and look up descriptive information that becomes powerful assets for these situations.

Another common scenario is the population and maintenance of data warehouses and data marts. In these business uses, the data volumes tend to be exceptionally large and the window of time in which to perform the extraction, transformation, and loading of the data tends to be rather short. SSIS includes the ability to bulk load data directly from flat files in SQL Server and also has a destination component that can perform a bulk load into SQL Server. A key feature for large data volume and complex enrichment and transformation situations such as these is restartability. SSIS includes checkpoints to handle rerunning a package from a task or container within the controlflow so that you can elegantly handle various types of errors that may occur during these complex data loading scenarios. Often important in data warehouse loads is the ability to source a particular destination from many different tables or files. Often in the database world, we refer to this as denormalization, and SSIS packages can easily merge data into a single dataset and load the destination table in a single process without the need to stage or land the data at each step of the process. Lastly, we often require the management or partitioning of history within our data warehouses in order to review the state of activity as of a certain point in time. This history management creates complex updating scenarios, and SSIS handles this with the assistance of the Slowly Changing Dimension Wizard. This wizard dynamically creates and configures a set of data transformation tasks used to manage the insert and update of records, updating of related records, and adding new columns to tables to support this history management.

Often, businesses receive data from outside of their systems and need to perform data-quality routines to standardize and clean the data before loading it into their systems. This is commonly the case when different areas of the business use different standards and formats for the information or when the data is being purchased, as in the common case with address data. Sometimes the data formats are different because the platforms that they originate from differ from the intended destination. In these cases, SSIS includes a rich set of data-transformation tasks to perform a wide range of data-cleaning, converting, and enriching functions. You can replace values or get descriptions from code values by using exact or fuzzy lookups within SSIS. Identifying records that may be duplicates by using SSIS grouping transformations helps to successfully remove them before loading the destination.

The ability to dynamically adjust the data transformations being performed is a common scenario within businesses. Often, data needs to be handled differently based on certain values it may contain or even based upon the summary or count of values in a given set of records. SSIS includes a rich set of transformations that are useful for splitting or merging data based upon data values, applying different aggregations or calculations based on different parts of a dataset, and loading different parts of the data into different locations. SSIS containers specifically support evaluating expressions, enumerating across a set of information, and performing workflow tasks based on results of the data values.

Lastly, you commonly have operational administrative functions that require automation. SSIS includes a whole set of tasks devoted to these administrative functions. You can use tasks specifically designed to copy SQL Server objects or facilitate the bulk loading of data. Also, you have access in SSIS to a SQL Management Objects (SMO) enumerator to perform looping across our servers to perform administrative operations on each server in our environment. Additionally, you have the ability to schedule all of your SSIS packages and solutions using SQL Server Agent jobs.

Four Main Parts of Integration Services

Integration Services consists of four main parts, which are shown in Figure 6-1:

  • The SSIS Service

  • The SSIS runtime engine and the runtime components

  • The SSIS object model

  • The SSIS dataflow engine and the dataflow components

image from book
Figure 6-1

Integration Services Service

The component of architecture within Integration Services responsible for monitoring packages as they execute and managing the storage of packages is the SSIS Service.

Integration Services Runtime Engine and Runtime Components

The SSIS runtime engine is responsible for saving the layout and design of the packages, running the packages, and providing support for all additional package functionality such as transactions, break-points, configuration, connections, event handling, and logging. The specific executables that make up this engine include packages, containers, tasks, and event handlers.

Within the SSIS runtime engine, the basic organizational unit is the package, which subdivides into the functional units of controlflow and dataflow. Dataflow is specifically implemented within the controlflow through the use of the dataflow task. Additionally, your SSIS packages can be called from other packages to make use of previously developed functionality, which may yield very interesting and complex dependencies that will require your attention as an administrator.

Controlflow in SSIS is created using three types of objects: containers, tasks, and constraints. The containers structure package elements by grouping tasks. Often, these containers may be used to iterate through a dataset and perform operations on each element within an iteration (such as "For each"). The tasks enable you to develop sophisticated workflow actions such as executing SQL statements, sending mail, transferring files via FTP, or executing another package. Lastly, the constraints connect the package containers and tasks and enable you to specify conditions upon which the next steps of the workflow may be executed. Three default constraints are found within SSIS: success, completion, and failure.

Integration Services Object Model

The managed application programming interface (API) used to access SSIS tools, command-line utilities, and custom applications is the SSIS object model. While we will not go into much detail about this object model, it is very important to acknowledge it as a major component of Integration Services.

Integration Services Dataflow Engine and Dataflow Components

Within an SSIS package's controlflow, a dataflow task creates instances of the dataflow engine. This engine is responsible for providing the in-memory data movement from sources to destinations. Additionally, this engine performs the requested transformations to enrich the data for the purposes you specify. Three primary components that make up the dataflow engine include sources, transformations, and destinations. The sources provide connectivity to and extract data from a wide range of sources such as database tables or views, files, spreadsheets, and even XML files. The destinations permit the insert, update, and deletion of information on a similar wide range of destinations. Lastly, the transformations enable you to modify the source data before loading into a destination using capabilities such as lookups, merging, pivoting, splitting, converting, and deriving information.

Project Management and Change Control

Clearly one of the areas needing an entirely different mindset than the previous version of SQL Server involves how DBAs interact with the development team. The shared view of development by administrators and developers alike is enacted through the Business Intelligence Developer Studio (BIDS). With regard to Integration Services, the BIDS environment is how solutions and projects are created. Generally, the configuration of the BIDS solutions and projects will be handled by the developers; however; the administrators will be called upon to help configure various aspects of these solutions. The administration and management of Integration Services is primarily performed within SQL Server Management Studio. Often, moving the Integration Services solutions from environment to environment will mean changing dynamic information within the package and also setting up any information reference by the packages. Examples of these elements include Package Configuration settings, referenced XML or configuration files, and solution data sources.



Professional SQL Server 2005 Administration
Professional SQL Server 2005 Administration (Wrox Professional Guides)
ISBN: 0470055200
EAN: 2147483647
Year: 2004
Pages: 193

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net