Integrating with Non-SQL Server 2005 Components

Integrating with Non-SQL Server 2005 Components

You can build your entire DW/BI system using Microsofts technology. Microsoft has added some features to make a Microsoft-only stack work better together, but these are surprisingly few. If you have good reasons for using a different technology for one or more components, youll be in good company with a heterogeneous system.

In this section we outline a few of the issues you may encounter with developing and deploying a heterogeneous system.

Replacing the Relational Database

Many customers use Microsofts business intelligence technology with a non-Microsoft relational database. Among our consulting clients , Oracle is the most common relational database used with SQL Server, although weve seen plenty of IBMs DB2 and a smattering of other technologies.

All of SQL Servers business intelligence components communicate with each other, and the relational database, through a variety of data providers, including OLE DB, ODBC, or .NET providers. The popular databases including those mentioned previously all have OLE DB providers written either by Microsoft, by the relational database vendor, or by a third partysometimes all three. Any less common database technology is very likely to have an ODBC driver available. All of these options, combined with Microsofts OLE DB for ODBC and Flat Files providers, enable you to write and read data from a huge variety of sources.

Integration Services with Non-SQL Server RDBMS

Integration Services packages can read data from non-SQL Server sources very efficiently . Integration Services packages can write and update data to non-SQL Server sources. However, because the OLE DB interface does not address bulk loading, Integration Services writes data to the OLE DB destination row by row. This inefficient method of inserting data might be adequate for the daily incremental load process, but its usually too slow for the historical load. As we describe in Chapter 5, if you need to insert large sets of data into a non-SQL Server database from Integration Services, you should write the data to flat files and then launch your databases bulk loader from Integration Services. As time goes on, we can hope that someone (Microsoft, database companies, or third-party database utility companies) will write high-performance, database-specific Integration Services destination adapters, similar to the one that Microsoft has written for SQL Server.

Analysis Services with Non-SQL Server RDBMS

Its very common to build an Analysis Services OLAP database on top of a non-SQL Server database. Some observers estimate that nearly half of all Analysis Services 2000 implementations were built from non-SQL Server sources. With Analysis Services 2005, you can even build a cube from heterogeneous sources, with some data stored in, say, SQL Server and other data stored in Oracle. (Just how good an idea this is remains an open question!)

The most common stumbling block to populating an Analysis Services database from a non-SQL Server relational source is the performance of the OLE DB provider. As we mentioned previously, Microsoft doesnt write most of these providers. Some providers perform incredibly badly . If youre having performance problems loading your Analysis Services database, investigate alternative providers.

Some DW/BI systems that implemented Analysis Services 2000 on top of non-SQL Server databases found that the SQL issued by Analysis Services performed poorly in the source relational database. A different syntax, or perhaps a query hint, would have made the same SQL operation perform significantly better, but Analysis Services 2000 provided no mechanism for modifying the queries that populate the database. Analysis Services 2005 uses cartridges for different providers, so the SQL can be structured differently from one relational database to another. It would be unusual, but not impossible , for a DW/BI team to modify a cartridge for its own use. Microsoft expects cartridge development to be primarily the province of third-party tool and system developers.

There are a few Analysis Services features that work only with SQL Server source data. The most interesting is proactive caching, which we discuss in Chapter 17. You can use proactive caching with non-SQL source databases, but youll have fewer options in configuring this feature.

Reporting Services with Non-SQL Server RDBMS

You can use Reporting Services to create and publish reports from non-SQL databases. The same reporting portal can combine reports from multiple sources. In fact, a single report may contain data from multiple sources, although this generally works best as a summary report from one source that drills down (for example on customer key) to detailed data from a second source. The mechanism for this broad connectivity is the same set of OLE DB providers that we have discussed previously.

Reporting Services does require an instance of the SQL Server relational database to hold the report metadata. In many installations the report metadata catalog will require little permanent storage, as metadata catalogs are typically quite small. However, if you use Reporting Services to schedule and archive reports, the cached results are stored in this same report catalog database. These cached reports can be quite large. You may be surprised by how large this SQL Server database is growing.

Data Mining with Non-SQL Server RDBMS

You can use Microsofts data mining functionality to build models from a non-SQL Server relational database as easily as from SQL Server. There are no known issues.

Replacing Integration Services

As ETL tools have gained a strong foothold in data warehouse infrastructures over the last decade , many customers will be reluctant to migrate from their existing ETL tool to Integration Services. All the major ETL tools work cross-platform, and can populate a SQL Server database. Even if your ETL vendor has not released a version of their product thats specific to SQL Server 2005, your existing version should do a fine job.

There are a few places where Integration Services is extremely useful, even if youre relying on a third-party tool for the bulk of the ETL system. The tasks for processing Analysis Services data mining models and OLAP databases are very convenient . You may set up a small Integration Services package simply to perform these functions. Use your ETL tools execute external process function to launch DTExec and start this package after your load completes. Look to see if your ETL package can pick up status codes returned by DTExec, so you can handle errors gracefully and integrate processing metadata. Of course, you should first look within your existing ETL package to see if it has functionality to process Analysis Services objects. Over time, we expect to see other ETL tool vendors include this capability in their products.

Even if your ETL tool cant process Analysis Services directly, youre not required to use Integration Services. You can set up a script, like VBScript, to control processing.

A second use of Integration Services beyond its role in the ETL system, is for periodic database maintenance activities. This role uses Integration Services database maintenance tasks like Back up databases that work only with SQL Server. The classic way to set up database maintenance scripts is to write TSQL scripts, perhaps glued together with some operating system scripting. Alternatively, there are third-party database maintenance utilities. Integration Services is clearly superior to the hand-scripting method of database maintenance, and we recommend that you strongly consider using it for this purpose even if you use a third-party ETL tool.

Replacing Analysis Services OLAP, Data Mining, or Reporting Services

The SQL Server 2005 technologies that are closest to the business user Analysis Services OLAP and Data Mining, and Reporting Servicesare all easy to swap for non-Microsoft products. There are many third-party alternatives to much of Reporting Services functionality.

Most major OLAP technologies are cross-platform in terms of their data sources, and can work from a SQL Server relational database. You should be able to implement a non-Microsoft OLAP system like Hyperion or MicroStrategy on top of SQL Server 2005 without significant difficulty. These products can read data even from SQL Server partitioned tables without any changes to their code; querying partitioned tables is transparent to the client.

These non-Microsoft products may place different requirements on the underlying relational data model. All prefer or require dimensional structures, which the authors also strongly recommend for supporting Analysis Services. Some non-Microsoft tools are much happier with a snowflake design. Some are significantly less able to hold large data volumes or monster dimensions than Analysis Services, so you may need to implement summary cubes with drillthrough to relational detail. Nonetheless, you can expect no significant problems that are unique to pulling data from SQL Server 2005.

You should expect to find that non-Microsoft data mining tools can readily consume SQL Server data. If those data mining tools include built-in functionality for writing back to the relational database, for example to store results from training the model, you should investigate whether that writeback uses SQL Servers bulk loading API to write efficiently. This may be an important feature for performing data mining on large data volumes.

Using a Non-Microsoft Ad Hoc Query Tool

Its common to use a non-Microsoft ad hoc query tool to construct and execute queries against both the relational database and Analysis Services. Microsofts ad hoc query tool offerings to date are not best-of-breed , or even close, and many Microsoft-centric DW/BI teams have turned to third-party tools to meet the needs of their power analysts.

If you implement Reporting Services, particularly embedded within a BI portal, youll significantly reduce the number of business users who truly need ad hoc capabilities. As we discuss in Chapter 9, reports can be parameterized and include fairly sophisticated drill paths. These two features can meet the vast majority of business users ad hoc needs. Reporting Services Report Builder provides functionality that meets additional ad hoc requirements, although its not the very best analytic tool weve ever seen. True ad hoc analysis, where the power user is developing new calculations and exploring the data in new ways, is best performed using tools specifically designed for that kind of interaction. Microsoft currently doesnt offer a good general purpose ad hoc query tool for OLAP, although Microsoft Office Excel Pivot Tables, Web Components, and Data Analyzer offer some functionality. Microsoft is starting to address this glaring hole in its BI product suite. In 2006, the next version of Office is making some progress along the very long road to best-of-breed.

One of the most interesting features of Analysis Services 2005 is the ability to build an OLAP database on any data source, including normalized data. Combined with proactive caching, which may simplify maintenance, Microsoft delivers a lot of the power of the dimensional model without the cost of building a dimensional data warehouse. This is not a great foundation for an enterprise information architecture, but can provide quite significant cost savings in some scenarios. However, in order to reap those savings you need to access the data through the OLAP layer by using a query tool that speaks XML for Analysis. At the time of this writing, that means using either Reporting Services, the Microsoft Office products, or a non-Microsoft query tool designed for OLAP, as we discussed earlier. Any SQL-based query and reporting toolwhich includes the vast majority of query and reporting tools on the marketwill not work against this architecture.

This area is confusing for readers, in large part because there are so many query and reporting tools on the market, and most of them can access Microsoft data. Well finish this section by summarizing the two main approaches:

  • Implement Analysis Services as the primary presentation server, with most queries going to Analysis Services. Set up Reporting Services, preferably within a portal, to serve most queries and reports from Analysis Services. Other Reporting Services reports can source data from the relational databases. Report users cant tell the difference. Provide the power users with an advanced query and analysis client tool designed to work with Analysis Services, probably one sold by a non-Microsoft vendor.

  • Implement a primarily relational data warehouse, with little or no use of Analysis Services OLAP databases. Build the reporting application with Reporting Services or any other SQL-based reporting toolset. Use your choice of ad hoc query and analysis client tools to access the relational data warehouse and any cubes in your architecture.

The first architecture makes most sense to us in most cases. However, it puts you solidly in the Microsoft camp (and it limits your ad hoc tool choices) unless and until other vendors build query and reporting systems that can take full advantage of Analysis Services databases. Any investment you may have in a SQL-based reporting infrastructure is made obsolete.

If you have recently negotiated a big contract with a SQL-based query and reporting tool vendor, youll probably choose the second approach. An advantage is that youll be a bit more in the mainstream and less tied to the somewhat idiosyncratic Microsoft way of implementing business intelligence. If you believe, as we do, that a dimensional database server is a fundamentally better way to access data than direct queries using SQL, then by minimizing or eliminating Analysis Services role youre not maximizing the advantage from your DW/BI system.



Microsoft Data Warehouse Toolkit. With SQL Server 2005 and the Microsoft Business Intelligence Toolset
The MicrosoftВ Data Warehouse Toolkit: With SQL ServerВ 2005 and the MicrosoftВ Business Intelligence Toolset
ISBN: B000YIVXC2
EAN: N/A
Year: 2006
Pages: 125

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net