In this introduction to Integration Services, you've learned how to copy and transform data in a number of different ways. Integration Services has many additional aspects, and we want to draw your attention to a few of them.
For purposes of clarity in introducing Integration Services, we have assumed in this chapter a rather ideal world where there are no errors in the data. This is rarely the case, and in Chapter 7 we discuss techniques for detecting and correcting bad data. All the tasks and transforms we used have an error output you can use to divert the process or data flow in case of an error.
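The idea of an error output can be sketched in plain Python: rows that fail a transform are diverted to a separate error stream instead of aborting the whole load. This is only a conceptual analogy (the function, column names, and sample data are invented), not Integration Services code.

```python
def split_error_output(rows, transform):
    """Apply a transform to each row; divert failing rows to an error stream."""
    good, errors = [], []
    for row in rows:
        try:
            good.append(transform(row))
        except (ValueError, KeyError) as exc:
            # Like an SSIS error output: keep the bad row and the reason
            errors.append({"row": row, "error": str(exc)})
    return good, errors

good, errors = split_error_output(
    [{"qty": "3"}, {"qty": "n/a"}],
    lambda r: {"qty": int(r["qty"])},
)
```

The bad row ends up in `errors` with its failure reason, where a later step can log it or route it to a repair process, much as Chapter 7 describes.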
Scaling Your Solution
The techniques shown in this chapter work well for moderate to high volumes of data. If you have a very large database, or very high rates of new data, you should read Chapter 11, "Very Large Data Warehouses," which addresses design and management issues for Integration Services packages, the data warehouse, and the online analytical processing (OLAP) databases.

Integration Services packages are typically stored in the msdb database in SQL Server, which means that if that SQL Server instance is clustered, the packages remain accessible even if a cluster node fails. Integration Services itself is not cluster aware, and doesn't need to be: it is simply a service available on any machine where it is installed. In a cluster, you can store Integration Services packages on only one instance of SQL Server. If your packages are stored on a clustered instance, be sure to reference them by the virtual server name, not the physical server name.
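As a concrete illustration, running a package stored in msdb with the dtexec command-line utility might look like the following; the package and server names are placeholders, and the point is that the `/SERVER` value is the cluster's virtual server name rather than a physical node name.

```shell
REM Run a package stored in msdb on a clustered SQL Server instance.
REM "DWVirtualServer" is the cluster's virtual server name (a placeholder here),
REM not the name of the physical node currently hosting the instance.
dtexec /SQL "LoadDimensions" /SERVER "DWVirtualServer"
```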
Our goal in this chapter was to familiarize you with the structure of Integration Services packages, not to explain every transform or task. Several other transformations were not covered here. One set controls how rows flow between source and destination: Union All merges multiple data sets that share the same set of columns, and Merge Join performs database-style joins between two data streams. You can also sort data with the Sort transformation, or use the Aggregate transformation to perform functions such as sum or count.
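The row-flow transformations just mentioned have close analogies in ordinary code. The following Python sketch mimics Union All, Sort, and Aggregate over some invented sales rows; it is a conceptual illustration, not how Integration Services executes these transforms.

```python
from itertools import groupby

us_sales = [{"region": "East", "amount": 10}, {"region": "West", "amount": 20}]
eu_sales = [{"region": "East", "amount": 5}]

# Union All: concatenate inputs that share the same set of columns.
union_all = us_sales + eu_sales

# Sort: Merge Join (and groupby below) both require key-ordered input.
ordered = sorted(union_all, key=lambda r: r["region"])

# Aggregate: group by region and sum the amounts, like a SUM operation.
totals = {
    region: sum(r["amount"] for r in rows)
    for region, rows in groupby(ordered, key=lambda r: r["region"])
}
```

Note that the inputs to a Merge Join transformation must be sorted on the join key, which is why the Sort step precedes the grouping here as well.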
Another set of transformations allows you to transform column values, such as converting data types using Data Conversion or performing string manipulations using the Character Map transformation. There are also transformations for working with Analysis Services objects, such as the Dimension Processing and Data Mining Model Training transformations.
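In spirit, a Data Conversion step casts a column to a new data type, and a Character Map step applies a string mapping such as uppercasing. A rough Python analogy, with invented column names and data:

```python
rows = [{"sku": "ab-1", "qty": "4"}, {"sku": "cd-2", "qty": "7"}]

converted = [
    {
        "sku": r["sku"].upper(),   # Character Map-style mapping: lowercase to uppercase
        "qty": int(r["qty"]),      # Data Conversion-style cast: string to integer
    }
    for r in rows
]
```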
The Script transform is extremely useful for creating complex transforms, because you can implement the logic in a full .NET programming language rather than being limited to the built-in transformations.
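A Script transform essentially runs your custom code against each row in the data flow. In Integration Services that code is written in a .NET language; the Python stand-in below just illustrates the per-row pattern, with made-up columns and derivation logic.

```python
def script_transform(row):
    """Custom per-row logic: split a raw name column and derive a flag column."""
    first, _, last = row["name"].partition(" ")
    return {**row, "first_name": first, "last_name": last,
            "is_big_spender": row["spend"] > 1000}

# The data flow engine would call this once per row; here we map over a list.
output = [script_transform(r)
          for r in [{"name": "Ada Lovelace", "spend": 1500}]]
```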
Finally, one important transformation called the Slowly Changing Dimension transformation is covered in detail in Chapter 8.
Control Flow Tasks
We have focused our attention mostly on the Data Flow task in this chapter because it is the most commonly used task. Other tasks include the Data Preparation tasks, which ready data for loading, such as copying or downloading files using FTP, receiving messages using MSMQ, and loading flat files quickly using Bulk Insert.
You can also add a great deal of flexibility to your packages: write custom .NET code with the Script task, run other packages or Windows applications with the Execute Package and Execute Process tasks, or even call Web services to perform a function or return data with the Web Service task.