Each data source will have its own intricacies based on how much effort the provider themselves invested in making it efficient. With data sources such as DB2, you have many options, just like in Oracle. Many customers have legacy data sources that lack OLE DB providers to assist you. In these cases, you can use ODBC and the Script source to read data, or the Script destination to write data into the database. This section covers a few other common data sources that we have found in the field, and some interesting alternatives you may not have known about. You may be interested in a fantastic site by the connectivity team that may help you connect to a data source that’s not listed in this section: http://ssis.wik.is.
If you are a consumer of DB2, there are several options you have. One of the easiest options you have is to use the Microsoft OLE DB Provider for DB2. This can be downloaded for free as part of the SQL Server 2005 Feature Pack if you have an Enterprise Edition license of SQL Server. To find the Feature Pack, search for “SQL Server 2005 Feature Pack” in your favorite search engine. You can also use drivers such as DB2 Connect from IBM and StarSQL (by StarQuest). Each of these providers has its own benefits and problems.
With any of these providers, it’s best to use the OLE DB source in the data flow instead of the Datareader (which is actually Ado.Net) source. The Microsoft Provider has an incredibly fast provider option that speeds up reading out of the data source tremendously. This option is an advanced option called Rowset Cache Size and, when set to a number such as 30, it will multithread the reads from the SSIS machine. The problem with the provider (as of this writing) is that it does not support large data types such as LOB, CLOB, BLOB, or their varchar and vargraphic equivalents. This support is coming, but providers such as StarSQL support it today. There are also options in the providers to turn datetime data types into varchar data types.
Writing data to DB2 can be extremely slow with the Microsoft provider. The fact is, Microsoft won’t be investing much R&D in efficiently writing data to a competing database platform. If this is important to you, then you should test other providers like StarSQL or the IBM driver. Persistent is also making a driver that will hook right into the data flow and will be very fast for writing data to DB2. The following table shows a list of some other drivers to consider.
Vendor | API | Version of DB2 | SSIS Natively Compatible |
---|---|---|---|
Microsoft | OLE DB | All DRDA compliant | Yes |
IBM | OLE DB | z/OS and UDB 7.0 and later, AIX 5.0 and later, AS400 4.5 and later | Yes |
IBM | ADO.NET | DB2 UDB 9.0 | No |
IBM | ODBC | z/OS and UDB 7.0 and later AIX 5.0 and later, AS400 4.5 and later | No |
Attunity | OLE DB | 6.1, 7.x, 8.0 on z/OS 7.x, 8.0 on UNIX 7.x, 8.0 on Windows | Yes |
Attunity | OLE DB | On AS/400 5.1 and later | Yes |
DataDirect | OLE DB | z/OS and UDB 7.0 and later AIX 5.x AS400 4.5 and later | No |
HIT | OLE DB | z/OS and UDB 8.0 and later AIX 5.x, AS400 4.5 and later | No |
DataDirect | ADO.NET | z/OS and UDB 7.0 and later AIX 5.x, AS400 4.5 and later | |
ETI | Bulk Load | 8.0 and later | Yes |
VSAM is a particularly nasty data format to deal with. There are two main providers that you can use for this format: OLE DB Provider for VSAM and a similar middleware provider that is made by Attunity. Each can be challenging to install and configure, but after they are configured, they will perform fairly fast for reading data out of VSAM.
In several clients’ performance tests that we performed, the Attunity provider far exceeded the performance of the Microsoft driver, and provided easy support of change data capture (CDC). This provides a great way to build a bridge to the mainframe to gradually remove your applications from it. The disadvantage to the Attunity provider is that it has a heavier footprint on the mainframe. With either provider, it’s important that you have very good indexes on the VSAM files so that you can retrieve data quickly. This especially applies to any time you want to retrieve data out of the VSAM file with a predicate (where clause).
For most people, however, the best option is to retrieve an extract from VSAM. If you choose this as a source, ensure that you allot enough time in your project plan for the mainframe group to produce these programs. It’s easy to take that group for granted, and before you know it, they want an additional month in your project plan.