One of the biggest challenges to a successful data warehouse implementation is making all of the diverse elements fit together. In the past, you had two options for data warehousing tools: best of breed from multiple vendors or a suite of products from a single vendor. The best-of-breed approach required purchasing tools from various vendors and configuring them to work together. For example, one vendor might produce a very good data migration tool and another might produce good desktop analysis tools. Whether the tools worked together was up to each vendor. The alternative "single-source" solution was a good idea for compatibility, but in some cases the suite of tools a single vendor offered did not perform all aspects of the data warehousing process well. Microsoft has taken an "all of the above" strategy by offering a suite of tools for data warehousing as well as a standard interface via which other vendors tools can be used. To do this, Microsoft and its data warehouse alliance partners have identified the major components of a data warehouse and created a framework that supports these major components.
After this lesson, you will be able to:
- Describe the Microsoft Data Warehousing Frameworks
- Use the approach that Microsoft recommends
- Identify the data store component of the Microsoft Data Warehousing Framework
Estimated lesson time: 25 minutes
The goal of the Data Warehousing Framework is to simplify the design, implementation, and management of data warehousing solutions. This framework has been designed to provide
Figure 5.1 shows how the Microsoft Data Warehousing Framework is the road map for product development and integration on the Microsoft platform.
Figure 5.1 Microsoft Data Warehousing Framework
The Data Warehousing Framework has been designed to provide an open architecture for easy extension by Microsoft customers and business partners using industry-standard technology. This architecture allows organizations to choose best-of-breed components and still be assured of integration.
Ease of use is a compelling reason for customers and independent software vendors to choose the Microsoft Data Warehousing Framework. Microsoft provides an object-oriented set of components that are designed to manage information in the distributed environment. Microsoft is also providing both entry-level and best-of-breed products to address the many steps in the data warehousing process.
The Data Warehousing Framework describes the relationships between the various components used in the process of building, using, and managing a data warehouse. The core of the Microsoft Data Warehousing Framework is the set of enabling technologies that include the data transport layer (OLE DB) and the integrated metadata repository. These two technologies allow for the interoperability of the many products and tools involved in the data warehousing process.
Building a data warehouse requires a set of tools for describing the logical and physical design of the data sources and their destinations in the data warehouse or data marts. Operational online transaction processing (OLTP) data must pass through a cleansing and transformation stage before being placed into the data warehouse or data marts in order to conform to the definitions laid out during the design stage. This data staging process is often many levels deep, especially with enterprise data warehouse architectures.
End-user tools, including desktop productivity products, specialized analysis products, and custom programs, are used to gain access to the information in the data warehouse. Ideally, user access is through a directory facility that enables users to search for appropriate and relevant data to resolve business questions; this facility provides a layer of security between the users and network systems.
Finally, a variety of tools, such as those for scheduling repeated tasks and managing multiserver networks, can come into play for the management of the data warehouse environment.
Microsoft Repository provides the integration point for the metadata shared by the various tools used in the data warehousing process. Shared metadata allows for the transparent integration of multiple tools from a variety of vendors, without the need for specialized interfaces between each of the products.
Similarly, OLE DB provides for standardized, high-performance access to a wide variety of data and allows for integration of multiple data types. OLE DB is described in detail in a later chapter.
Figure 5.2 illustrates which Microsoft tools to use for each of the corresponding data warehouse components. Microsoft provides a complete set of tools that satisfy the requirements for each major data warehousing component.
Figure 5.2 How Microsoft components fit the framework
Microsoft recognizes that in a customer s implementation, not all of these components will be Microsoft products. However, Microsoft does provide a tool or technology for each major data warehousing component:
Figure 5.3 depicts how the Microsoft components and the Data Warehousing Framework fit in the data warehousing life cycle.
Figure 5.3 How Microsoft components support the data warehousing life cycle
The process of creating a data warehouse can involve many systems, tools, and components. A number of years ago, Microsoft identified the simplification of that process as one of its priorities.
Nowadays, data warehouses are collecting data from all sorts of systems, not just relational databases. While ODBC proved great in standardizing the route to relational data, a new standard was needed to help with the multitude of data in other systems. From that need, OLE DB was born. Microsoft then took the concept further to permeate all areas of data warehousing and produce the Data Warehousing Framework. The framework is a blueprint for the development of open, interoperable tools and components for which the goal is to simplify the design, implementation, and management of data warehousing solutions.
The framework provides customizable and extensible components based on an open architecture.
The following topics provide an overview of the components that make up the Data Warehousing Framework and the associated Microsoft tools and technologies that help to facilitate the data warehousing process. You will learn more details about these components and how to use them in later chapters. The components that make up the framework are:
At the core of the Data Warehousing Framework is OLE DB, an object-based, system-level programming interface that helps you access information across a variety of data sources. OLE DB is the Microsoft interface for managing data across an organization. ODBC provides access to data in relational databases, whereas OLE DB allows access to both relational and non-relational data. OLE DB for OLAP is an extension to OLE DB that adds multidimensional capabilities for accessing data stored in multidimensional cubes.
The benefit of OLE DB for data warehousing is its ability to access a variety of sources, including
You cannot bring data directly into the data warehouse without first transforming the data. A data transformation describes the application of one or more operations to a piece of data, which helps bring forth a consistent, viable format for the data.
DTS is part of SQL Server and it provides a sophisticated set of tools for transferring and transforming data. DTS provides
SQL Server replication is a technology for distributing data across an enterprise. SQL Server replication allows you to move and synchronize copies of your data across databases. Many data warehousing environments require that you move data automatically from a data warehouse database to a data mart database or the other way round and SQL Server replication is ideally suited to doing this.
You have seen in previous chapters how business often looks at data in a multidimensional format. You have also seen how the most useful queries in an OLAP environment are those that examine trends by using aggregations. Figure 5.4 illustrates how OLAP Services support creating multidimensional aggregated structures and then querying them.
Figure 5.4 OLAP Services
OLAP Services include a middle-tier server component that processes relational data to create multidimensional cubes and a client-side data caching and calculation service (PivotTable Service) that gives users intuitive data analysis capabilities.
OLAP Services give you the flexibility to choose the storage type that best fits your organizational needs. The OLAP Services middle-tier server component supports the creation of cubed data in three storage architectures:
NOTE
The OLAP Services Server is a completely separate server from SQL Server. Although designed to work extremely efficiently with the powerful capabilities of Microsoft SQL Server 7, the OLAP Server can use any OLE DB, ODBC, or text data as the data warehouse source database.
OLAP Services come with the following wizards:
NOTE
The Partition wizard is available only when you are using the version of OLAP Services that comes with the Enterprise Edition of SQL Server 7. The Enterprise Edition of SQL Server 7 requires Windows NT Server, Enterprise Edition.
The PivotTable Service is the client-side complement to the OLAP Services Server components. The PivotTable Service gives access to OLAP Services, provides application developers with a way to get at OLAP data, and allows you to store cubes locally, which enables faster processing and mobile analysis and presentation in a disconnected environment.
Microsoft ActiveX Data Objects (ADO) and Microsoft ActiveX Data Objects (Multidimensional) (ADO MD) provide access to information supplied by OLE DB and OLE DB for OLAP providers, in a manner that is simple for developers to use and is supported by all major development tools in the market today.
OLE DB and OLE DB for OLAP are standards for data access. Although OLE DB is very powerful, it is also complex enough to require a language like C++ to properly access it.
ADO is an application-level programming interface to OLE DB. It is easy to use and supports COM automation interface. ADO MD is the counterpart to ADO for access to multidimensional data. It is an application-level programming interface to OLE DB for OLAP. It also supports the COM automation interface. ADO and ADO MD are discussed in more detail, including programming examples, in Chapter 12, "MDX Statements and ADO MD Objects."
Users of your data warehouse shouldn t have to know SQL syntax. Microsoft English Query allows end users to input queries in plain English instead of Transact-SQL. The value of this ability in a data warehousing environment is self-evident: It makes it much easier for end users to perform ad hoc querying.
The English Query engine analyzes general English semantics to create the proper SQL syntax for the query. Additionally, the English Query Engine utilizes application-specific domain files, which contain mappings of specific nouns and verbs to specific database structures and joins.
If the English Query engine determines that more information is needed to execute a query, the user is prompted for more information.
You can create the domain with a domain editor and then save the file as a domain project. This file is compiled and used in conjunction with the engine to parse English into Transact-SQL.
NOTE
The domains associated with English Query are different from, but analogous to, Windows NT domains. A general definition of a domain is "a collection of information." A Windows NT domain is a collection of information about security and resources. An English Query domain is a collection of information about a database, including the English terms users employ to describe it.
In the data warehouse environment, it is important to have an integrated approach to metadata rather than having multiple catalogs of metadata for each point solution. It is not sufficient to merely store the data; the needs of the analysis environment require that the architecture make available data about the data for example, comments and data lineage.
This component in the Data Warehousing Framework is Microsoft Repository, which allows for shared metadata across components of the framework. Metadata is information stored by various data warehousing tools. By sharing this information, the tools can be transparently integrated without the need for special interfaces.
Microsoft Repository is a database that stores descriptive data about the relationships between data warehouse components. It is a set of published COM interfaces based on the Open Information Model (OIM).
The openness of Microsoft Repository allows third-party vendors transparent access to build their tools around it and leverage the strength of sharing metadata in a central place using the same protocol. The ability to share and reuse meta-data is key to the interoperability of tools from multiple vendors.
Microsoft Repository allows data warehouse designers to store the origin of a piece of data in the repository. This ability allows you to implement a full audit trail, down to the row level, the date and time when the DTS package was executed, and who executed the package.
This lesson presented an overview of the Data Warehousing Framework. Before you develop a solution, it is important to have in-depth knowledge about the various components of the framework. While most components of the framework provide very distinct capabilities, in some situations you can choose from more than one component to achieve the same results for example, in some cases, data migration could be accomplished using either DTS or replication.