You have completed the first three chapters of the book where you learned the concepts of data warehousing, worked hands on with Analysis Services tools and finally learned the basics of the MDX language to retrieve data from Analysis Services. The next three chapters of the book guide you to use the product to design your cubes and dimensions. The traditional approach of designing your cubes and dimensions is based upon an existing single-source data set. You will be working with multiple relational data sources in the real world when you develop business intelligence applications. In this chapter you learn what data sources are and how they feed into the creation of Data Source Views (DSVs). These DSVs provide you a consolidated view on just the tables and joins on tables of interest across the one or more data sources you defined. The data source and DSVs literally form the foundation for subsequent construction of both dimensions and cubes. Note that more than one data source per project is supported as are multiple DSVs per project; you learn how this infrastructure plays out in this chapter.
In order to retrieve data from the source you need information about the source such as name of the source, type of method to retrieve the data, security permissions needed to retrieve the data etc. All this information is encapsulated into an object called Data Source in Analysis Services 2005. An Analysis Services database contains a collection object called Data Sources which stores all the data sources needed to build cubes and dimensions within that database. Analysis Services will be able to retrieve source data from data sources via the OLEDB interface or using the managed .Net provider interface.
In the simplest case you will have one data source that contains one fact table with some number of dimensions linked to it by joins; that data source is populated by data from an OLTP database and is called an Operational Data Store (ODS). Figure 4-1 shows a graphical representation of this data source usage. The ODS is a single entity storing data from various sources so that it can feed as a source of data for your data warehouse.
A variant on the data source usage, which is enabled by the UDM in Analysis Services 2005, is the ability to take data directly from the OLTP system as input to the decision support software. This is shown in Figure 4-2.
In Analysis Services 2000, certain limitations were associated with the use of data sources. Analysis Services 2000 only supported one fact table per cube. Therefore, only one data source could be used for specifying the fact table of a cube. You could still specify multiple data sources within Analysis Services 2000, because dimensions referenced did not have to be in the same data source as the fact table. A workaround addressing the single fact table constraint was to create a SQL view on the multiple fact tables to create what appeared to Analysis Services 2000 as a single fact table. A more common and straightforward solution adopted by many users was to have multiple cubes based on disparate data sources and combine them into a single cube that was called a virtual cube.
Analysis Services 2005 natively supports the capability of specifying multiple fact tables within a single cube. Each of these fact tables can be from a different data source. Analysis Services 2005 still provides you with the capability of creating what are essentially virtual cubes; this is accomplished using linked objects (discussed in Chapter 9). Since Analysis Services 2005 provides you with the capability of creating cubes from various data sources you need to be extremely careful about how you model your cube — that is, you must specify the right relationships (primary key and foreign key mappings) between tables from various data sources. In this way you can make sure your cube is designed to provide you the results you want.
Using the Analysis Services 2000 data source methods is like carving on a bar of soap with a butter knife: you could create a statue, but it might not win any awards for beauty. Conversely, the kind of power and flexibility in Analysis Services 2005 puts you in a position similar to that of carving a bar of soap with a razor blade. Carving with a razor blade, you can make a gorgeous and intricate statue, but if you're not careful, you could cut the heck out of your fingers. So, be careful and craft some beautiful dimensional schemas! To do so, keep your schemas as simple as possible relative to the flexibility requirements imposed by the application specification you're working with.
Strictly speaking, Analysis Services 2005 supports all data sources that expose a connectivity interface through OLE DB or .Net Managed Provider. The data sources need to support certain basic methods based on the interface that conform to the standard since Analysis Services 2005 uses those methods to retrieve schema (tables, relationships, columns within tables and data types) information. If the data source is a relational database then by default it uses the standard SQL to query the database. Analysis Services uses a cartridge mechanism that allows you to specify extensions and customization in SQL that are used by a specific relational database. This allows Analysis Services to efficiently query data from the relational database.
The major data sources for Analysis Services databases are the relational databases Microsoft SQL Server, IBM's DB2, Teradata and Oracle. Figure 4-3 shows various data sources supported by Analysis Services 2005 on one of the machines that has SQL Server 2005 installed. For a specific data source you need to install the client components of the data provider so that the OLE DB provider and/or .Net provider for that specific data source is installed on your machine. These client components should not only be supported on your development machine where you use BIDS to design your database but also on the server machine where an Analysis Services instance will be running. For relational databases DB2 and Oracle it is recommended you use the Micorosoft's OLE DB data provider for Oracle or DB2 instead of the OLE DB providers provided those databases. Please make sure appropriate connectivity components from Oracle and IBM's DB2 are installed on your machine in addition to the OLE DB providers from Microsoft.
In Chapter 2 you used the data source wizard to create a data source which included impersonation information. In addition to providing impersonation information you can optionally specify additional connection properties such as query time out for connection, isolation level, and maximum number of connections as shown in Figure 4-4. The isolation level property has two modes: Read Committed and Snapshot. By default the Read Committed is used for all the data sources. The Snapshot isolation mode which is supported by the relational data source SQL Server 2005 and Oracle are used to ensure that the data read by Analysis Services 2005 is consistent across multiple queries sent over a single connection. What this means is that if the data on the relational data source keeps changing and if multiple queries are sent by Analysis Services 2005 to SQL Server 2005 to query data then all the queries will be seeing consistent data seen by the first query. Any new data that arrived between the results of first query and Nth query sent over a specific connection will not be included in the results of the Nth query. All these connection properties will be stored and applied whenever a connection is established to that specific data source. The data source wizard also allows you to create data sources based on an existing data source connection already created so that a single connection is shared by Analysis Services for multiple databases. The wizard also allows you to establish connections to objects within the current Analysis Services project such as establishing an OLE DB connection to the cube being created in the project. Such a connection is typically useful while creating mining models (to be seen in Chapter 13) from cubes.
There are two types of data provider which most data sources support. OLE DB is a common set of interfaces implemented through COM components that help you to access data from data sources. Similar to OLE DB; a common interface is being exposed using managed code. Providers exposing the interface using .NET technology are called .NET providers. Analysis Services 2005 has the ability to use OLE DB or .NET providers to access data from data sources such as flat files to large scale databases such as SQL Server, Oracle, Teradata and DB2. Analysis Services retrieves data from the data sources using the chosen provider's (OLE DB or managed) interfaces for processing of Analysis Services objects. If any of the Analysis Services objects are defined as ROLAP then the provider is used to retrieve data during query time. Updating the data in the UDM is called writeback. Analysis Services also uses the provider interfaces to update the source data during writeback (you learn writeback in Chapter 12).
Microsoft has created the .NET Framework and programming languages that use the framework to run in a Common Language Runtime (CLR) environment. The relationship between the Microsoft languages and the CLR are analogous to that of Java the language and the Java Runtime (the virtual machine). The .NET Framework itself is a huge class library that exposes tons of functionality and does so in the context of managed code. The term managed refers to the fact that memory is managed by the CLR and not the coder. You can write your own managed provider for your data source, or you can leverage .NET providers that use the .NET Framework. With the installation of SQL Server 2005 you will have .NET providers to access data from Microsoft SQL Server and Oracle which is shown way back in Figure 4-3. If your relational data source has a .NET provider, you can install it and use that provider. In the Connection Manager page of the data source wizard, you can choose the .NET provider to connect to your data source.
OLE DB is an industry standard that provides a set of COM (Component Object Model) interfaces that allow clients to access data from various data stores. The OLE DB standard was created for client applications to have a uniform interface from which to access data. Such data can come from a wide variety of data sources using this interface, such as Microsoft Access, Microsoft Project, and various database management systems.
Microsoft provides a set of OLE DB data providers to access data from several widely used data sources. These OLE DB providers are delivered together in a package called MDAC (Microsoft Data Access Components). Even though the interfaces exposed by the providers are common, each provider is different in the sense they have specific optimizations relevant to the specific data source you perform on data retrieval. OLE DB exposes the COM interface and is written in unmanaged code (code that runs outside the .NET Framework CLR). The .NET Framework provides you with a way of having interoperability between managed and unmanaged code by which you can create a .NET wrapper that uses the unmanaged code but provides the functionality of a managed provider. There is such a wrapper on top of the OLE DB that is called the Unmanaged .NET Data Provider for the data sources that do not have a native implementation of the .NET data provider.
The Microsoft OLE DB provider for SQL Server has been the primary way of connecting to a Microsoft SQL Server to date. In the SQL Server 2005 release, this OLE DB provider has been repackaged and named SQL Native Client. SQL Native Client provides easy manageability of upgrades to the OLE DB provider. Analysis Services 2005 provides the capability of connecting to any data source that provides the OLE DB interface, including the Analysis Server OLE DB provider by which you can retrieve data from another Analysis Server.
Analysis Services 2000 supported connection to data sources through OLE DB providers only. Analysis Services 2005 has a much tighter integration with the .NET Framework and supports connections via OLE DB providers and .NET data providers. If you deployed the .NET Framework across your entire organization, we recommend you use the .NET providers to access data from relational data sources. You might encounter a certain amount of performance degradation by using the .NET provider; however, the uniformity, maintainability, inherent connection pooling capabilities, and security provided by .NET data providers are worth the risk of taking the hit on performance. If you are really concerned about performance, we recommend you use OLE DB providers for your data access.