Lesson 1: Microsoft Data Warehousing Framework

One of the biggest challenges to a successful data warehouse implementation is making all of the diverse elements fit together. In the past, you had two options for data warehousing tools: best of breed from multiple vendors or a suite of products from a single vendor. The best-of-breed approach required purchasing tools from various vendors and configuring them to work together. For example, one vendor might produce a very good data migration tool and another might produce good desktop analysis tools. Whether the tools worked together was up to each vendor. The alternative "single-source" solution was a good idea for compatibility, but in some cases the suite of tools a single vendor offered did not perform all aspects of the data warehousing process well. Microsoft has taken an "all of the above" strategy by offering a suite of tools for data warehousing as well as a standard interface via which other vendors tools can be used. To do this, Microsoft and its data warehouse alliance partners have identified the major components of a data warehouse and created a framework that supports these major components.

After this lesson, you will be able to:

  • Describe the Microsoft Data Warehousing Frameworks
  • Use the approach that Microsoft recommends
  • Identify the data store component of the Microsoft Data Warehousing Framework

Estimated lesson time: 25 minutes

The goal of the Data Warehousing Framework is to simplify the design, implementation, and management of data warehousing solutions. This framework has been designed to provide

  • An open architecture that is integrated and extended easily by third-party vendors
  • Heterogeneous data import, export, validation, and cleansing services with optional data lineage (determining the source of any piece of data)
  • Integrated metadata (that is, data about the data) for warehouse design, data extraction/transformation, server management, and end-user analysis tools
  • Core management services for scheduling, storage management, performance monitoring, alerts/events, and notification

Figure 5.1 shows how the Microsoft Data Warehousing Framework is the road map for product development and integration on the Microsoft platform.

click to view at full size

Figure 5.1 Microsoft Data Warehousing Framework

The Data Warehousing Framework has been designed to provide an open architecture for easy extension by Microsoft customers and business partners using industry-standard technology. This architecture allows organizations to choose best-of-breed components and still be assured of integration.

Ease of use is a compelling reason for customers and independent software vendors to choose the Microsoft Data Warehousing Framework. Microsoft provides an object-oriented set of components that are designed to manage information in the distributed environment. Microsoft is also providing both entry-level and best-of-breed products to address the many steps in the data warehousing process.

Data Warehousing Framework Components

The Data Warehousing Framework describes the relationships between the various components used in the process of building, using, and managing a data warehouse. The core of the Microsoft Data Warehousing Framework is the set of enabling technologies that include the data transport layer (OLE DB) and the integrated metadata repository. These two technologies allow for the interoperability of the many products and tools involved in the data warehousing process.

Building a data warehouse requires a set of tools for describing the logical and physical design of the data sources and their destinations in the data warehouse or data marts. Operational online transaction processing (OLTP) data must pass through a cleansing and transformation stage before being placed into the data warehouse or data marts in order to conform to the definitions laid out during the design stage. This data staging process is often many levels deep, especially with enterprise data warehouse architectures.

End-user tools, including desktop productivity products, specialized analysis products, and custom programs, are used to gain access to the information in the data warehouse. Ideally, user access is through a directory facility that enables users to search for appropriate and relevant data to resolve business questions; this facility provides a layer of security between the users and network systems.

Finally, a variety of tools, such as those for scheduling repeated tasks and managing multiserver networks, can come into play for the management of the data warehouse environment.

Microsoft Repository provides the integration point for the metadata shared by the various tools used in the data warehousing process. Shared metadata allows for the transparent integration of multiple tools from a variety of vendors, without the need for specialized interfaces between each of the products.

Similarly, OLE DB provides for standardized, high-performance access to a wide variety of data and allows for integration of multiple data types. OLE DB is described in detail in a later chapter.

Microsoft Components

Figure 5.2 illustrates which Microsoft tools to use for each of the corresponding data warehouse components. Microsoft provides a complete set of tools that satisfy the requirements for each major data warehousing component.

click to view at full size

Figure 5.2 How Microsoft components fit the framework

Microsoft recognizes that in a customer s implementation, not all of these components will be Microsoft products. However, Microsoft does provide a tool or technology for each major data warehousing component:

  • Visual data tools allow you to design data warehouses and data marts and give you the ability to create the actual databases in SQL Server 7.0 environments. These tools include the Database Diagram in SQL Server Enterprise Manager, the Cube Editor in OLAP Manager, and the visual data tools in Visual Studio.
  • OLE DB and Open Database Connectivity (ODBC) drivers are available from Microsoft for most major operational databases and file systems, including Oracle, DB2, DB2/400, VSAM, and AS/400 physical files.
  • Data Transformation Services (DTS), which are included in SQL Server 7.0, provide the essential extraction, cleansing, and loading functionality necessary for the data transformation process.
  • Data marts can be built using SQL Server 7.0 and online analytical processing (OLAP) Services regardless of whether the underlying data is stored in SQL Server.
  • Most Microsoft end-user tools are data warehouse enabled.
  • Tools that have features that work with the data warehouse database include Microsoft Internet Explorer, Microsoft Access, and Microsoft English Query.
  • Tools that have the capability to work with the multidimensional structure of OLAP Services as well as the data warehouse database include Microsoft Excel and Microsoft development tools, such as Microsoft Visual Basic.
  • The Microsoft Repository is based on the industry-standard Open Information Model (OIM), allowing Microsoft and third parties to share information stored there.
  • SQL Server 7.0 management and tuning tools provide a complete set of database management and performance optimization tools. These tools include SQL Server Query Analyzer, SQL Server Profiler, and SQL Server Performance Monitor.

Data Warehousing Framework

Figure 5.3 depicts how the Microsoft components and the Data Warehousing Framework fit in the data warehousing life cycle.

click to view at full size

Figure 5.3 How Microsoft components support the data warehousing life cycle

The process of creating a data warehouse can involve many systems, tools, and components. A number of years ago, Microsoft identified the simplification of that process as one of its priorities.

Nowadays, data warehouses are collecting data from all sorts of systems, not just relational databases. While ODBC proved great in standardizing the route to relational data, a new standard was needed to help with the multitude of data in other systems. From that need, OLE DB was born. Microsoft then took the concept further to permeate all areas of data warehousing and produce the Data Warehousing Framework. The framework is a blueprint for the development of open, interoperable tools and components for which the goal is to simplify the design, implementation, and management of data warehousing solutions.

The framework provides customizable and extensible components based on an open architecture.

Data Warehousing Framework Components

The following topics provide an overview of the components that make up the Data Warehousing Framework and the associated Microsoft tools and technologies that help to facilitate the data warehousing process. You will learn more details about these components and how to use them in later chapters. The components that make up the framework are:

  • OLE DB, ODBC, and OLE DB for online analytical processing (OLAP)
  • Data Transformation Services and Replication
  • OLAP Services
  • PivotTable Service
  • ADO and ADO MD
  • Microsoft English Query
  • Microsoft Repository

OLE DB, ODBC, and OLE DB for OLAP

At the core of the Data Warehousing Framework is OLE DB, an object-based, system-level programming interface that helps you access information across a variety of data sources. OLE DB is the Microsoft interface for managing data across an organization. ODBC provides access to data in relational databases, whereas OLE DB allows access to both relational and non-relational data. OLE DB for OLAP is an extension to OLE DB that adds multidimensional capabilities for accessing data stored in multidimensional cubes.

Benefits of OLE DB

The benefit of OLE DB for data warehousing is its ability to access a variety of sources, including

  • Mainframe indexed sequential access method (ISAM)/virtual storage access method (VSAM) and hierarchical databases
  • E-mail stores such as Microsoft Exchange
  • Relational databases with native OLE DB providers
  • Any source accessible through ODBC (by using the OLE DB provider for ODBC drivers)

Data Transformation Services and Replication

You cannot bring data directly into the data warehouse without first transforming the data. A data transformation describes the application of one or more operations to a piece of data, which helps bring forth a consistent, viable format for the data.

DTS is part of SQL Server and it provides a sophisticated set of tools for transferring and transforming data. DTS provides

  • Importing and exporting of data from heterogeneous data sources using OLE DB
  • An open architecture so that third-party tool vendors can create tools that use or integrate with DTS
  • The sharing of metadata through the Microsoft Repository Open Information Model

SQL Server replication is a technology for distributing data across an enterprise. SQL Server replication allows you to move and synchronize copies of your data across databases. Many data warehousing environments require that you move data automatically from a data warehouse database to a data mart database or the other way round and SQL Server replication is ideally suited to doing this.

OLAP Services

You have seen in previous chapters how business often looks at data in a multidimensional format. You have also seen how the most useful queries in an OLAP environment are those that examine trends by using aggregations. Figure 5.4 illustrates how OLAP Services support creating multidimensional aggregated structures and then querying them.

click to view at full size

Figure 5.4 OLAP Services

OLAP Services include a middle-tier server component that processes relational data to create multidimensional cubes and a client-side data caching and calculation service (PivotTable Service) that gives users intuitive data analysis capabilities.

Role of the Middle-Tier Component

OLAP Services give you the flexibility to choose the storage type that best fits your organizational needs. The OLAP Services middle-tier server component supports the creation of cubed data in three storage architectures:

  • Multidimensional OLAP (MOLAP) Aggregates for the multiple dimensions along with detail data are stored in a multidimensional structure on disk.
  • Relational OLAP (ROLAP) Both aggregates and detail data are stored in tables in a relational database management system (RDBMS).
  • Hybrid OLAP (HOLAP) The detail data remains in the RDBMS, but the aggregates are stored in a multidimensional structure.

NOTE
The OLAP Services Server is a completely separate server from SQL Server. Although designed to work extremely efficiently with the powerful capabilities of Microsoft SQL Server 7, the OLAP Server can use any OLE DB, ODBC, or text data as the data warehouse source database.

OLAP Services Wizards

OLAP Services come with the following wizards:

  • The Dimension wizard To help you create shared dimensional data.
  • The Cube wizard Provides a simple interface for specifying the fact table, measures, and dimensions when you build your cube.
  • The Data Storage and Aggregation wizard To help you set options for optimal cube performance and data storage.
  • The Virtual Cube wizard To help you to set up a virtual cube.
  • The Partition wizard To help you to break out a slice of the cubed data for storage elsewhere, perhaps even using a different server. Reducing cube size enhances query performance and storage.

NOTE
The Partition wizard is available only when you are using the version of OLAP Services that comes with the Enterprise Edition of SQL Server 7. The Enterprise Edition of SQL Server 7 requires Windows NT Server, Enterprise Edition.

PivotTable Service

The PivotTable Service is the client-side complement to the OLAP Services Server components. The PivotTable Service gives access to OLAP Services, provides application developers with a way to get at OLAP data, and allows you to store cubes locally, which enables faster processing and mobile analysis and presentation in a disconnected environment.

ADO and ADO MD

Microsoft ActiveX Data Objects (ADO) and Microsoft ActiveX Data Objects (Multidimensional) (ADO MD) provide access to information supplied by OLE DB and OLE DB for OLAP providers, in a manner that is simple for developers to use and is supported by all major development tools in the market today.

OLE DB and OLE DB for OLAP are standards for data access. Although OLE DB is very powerful, it is also complex enough to require a language like C++ to properly access it.

ADO is an application-level programming interface to OLE DB. It is easy to use and supports COM automation interface. ADO MD is the counterpart to ADO for access to multidimensional data. It is an application-level programming interface to OLE DB for OLAP. It also supports the COM automation interface. ADO and ADO MD are discussed in more detail, including programming examples, in Chapter 12, "MDX Statements and ADO MD Objects."

Microsoft English Query

Users of your data warehouse shouldn t have to know SQL syntax. Microsoft English Query allows end users to input queries in plain English instead of Transact-SQL. The value of this ability in a data warehousing environment is self-evident: It makes it much easier for end users to perform ad hoc querying.

English Query Engine

The English Query engine analyzes general English semantics to create the proper SQL syntax for the query. Additionally, the English Query Engine utilizes application-specific domain files, which contain mappings of specific nouns and verbs to specific database structures and joins.

If the English Query engine determines that more information is needed to execute a query, the user is prompted for more information.

Domain

You can create the domain with a domain editor and then save the file as a domain project. This file is compiled and used in conjunction with the engine to parse English into Transact-SQL.

NOTE
The domains associated with English Query are different from, but analogous to, Windows NT domains. A general definition of a domain is "a collection of information." A Windows NT domain is a collection of information about security and resources. An English Query domain is a collection of information about a database, including the English terms users employ to describe it.

Microsoft Repository

In the data warehouse environment, it is important to have an integrated approach to metadata rather than having multiple catalogs of metadata for each point solution. It is not sufficient to merely store the data; the needs of the analysis environment require that the architecture make available data about the data for example, comments and data lineage.

This component in the Data Warehousing Framework is Microsoft Repository, which allows for shared metadata across components of the framework. Metadata is information stored by various data warehousing tools. By sharing this information, the tools can be transparently integrated without the need for special interfaces.

Openness

Microsoft Repository is a database that stores descriptive data about the relationships between data warehouse components. It is a set of published COM interfaces based on the Open Information Model (OIM).

Transparency for Third-Party Tool Integration

The openness of Microsoft Repository allows third-party vendors transparent access to build their tools around it and leverage the strength of sharing metadata in a central place using the same protocol. The ability to share and reuse meta-data is key to the interoperability of tools from multiple vendors.

Data Lineage

Microsoft Repository allows data warehouse designers to store the origin of a piece of data in the repository. This ability allows you to implement a full audit trail, down to the row level, the date and time when the DTS package was executed, and who executed the package.

Lesson Summary

This lesson presented an overview of the Data Warehousing Framework. Before you develop a solution, it is important to have in-depth knowledge about the various components of the framework. While most components of the framework provide very distinct capabilities, in some situations you can choose from more than one component to achieve the same results for example, in some cases, data migration could be accomplished using either DTS or replication.



Microsoft Corporation - Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit
Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit
ISBN: 0735606706
EAN: 2147483647
Year: 1999
Pages: 114

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net