Defining the Basic Terms

 <  Day Day Up  >  

Although data warehousing is a pervasive term , used throughout the IT industry, there is a lot of misunderstanding as to what a data warehouse actually is. This section will provide a good introductory treatment of data warehousing and the terminology used when discussing data warehouses.

What Is a Data Warehouse?

A data warehouse is best defined by the type and manner of data stored in it and the people who use that data. The data warehouse is designed for decision support providing easier access to data and reducing data contention . It is separated from the day-to-day OLTP applications that drive the core business. A data warehouse is typically read-only with the data organized according to the business rather than by computer processes. The data warehouse classifies information by subjects of interest to business analysts, such as customers, products, and accounts. Data in the warehouse is not updated; instead, it is inserted (or loaded) and then read multiple times.

Warehouse information is historical in nature, spanning transactions that have occurred over the course of many months and years . For this reason, warehouse data is usually summarized or aggregated to make it easier to scan and access. Redundant data can be included in the data warehouse to present the data in logical, easily understood groupings.

Data warehouses contain information that has been culled from operational systems, as well as possibly external data (such as third-party point-of-sale information). Data in the data warehouse is stored in a singular manner for the enterprise, even when the operational systems from which the data was obtained store it in many different ways. This fact is important because the analyst using the data warehouse must be able to focus on using the data instead of trying to figure out the data or question its integrity.

A typical query submitted to a data warehouse is: "What was the total revenue produced for the central region for product 'x' during the first quarter?"

To summarize, a data warehouse is a collection of data that is

  • Separate from operational systems

  • Accessible and available for queries

  • Subject-oriented by business

  • Integrated and consistently named and defined

  • Associated with defined periods of time

  • Static, or non-volatile, such that updates are not made

Operational Data Versus the Data Warehouse

The purpose and intent of a data warehouse differ substantially from operational databases supporting OLTP and production systems, such as order entry, shipping, and inventory control (see Table 45.1). Operational databases are typically used by clerical or line workers doing the day-to-day business of an organization. Additionally, operational data is atomic in nature, continually changes as updates are made, and reflects only the current value of the last transaction.

Table 45.1. Operational Data Versus Warehouse Data

Operational Data

Warehouse Data

Atomic

Summarized

Production Support

Analytical

Application-Oriented

Subject-Oriented

Current

Historical

Dynamic

Static


What Is a Data Mart?

The term data mart is used almost as often as the term data warehouse . But how is a data mart different from a data warehouse? A data mart is basically a departmental data warehouse defined for a single (or limited number of) subject area(s).

Data in data marts need not be represented in the corporate data warehouse, if one even exists. Breadth of data in both data marts and corporate data warehouses should be driven by the needs of the business. Therefore, unless the departmental data is required for enterprise-wide analysis, it may not exist in the corporate data warehouse.

A data mart is not necessarily smaller in size than an enterprise data warehouse. It may be smaller, but size is determined based on business needs. Departmental analysis at the business unit level may require more historical information than cross-department, enterprise-wide analysis.

What Is an Operational Data Store?

An Operational Data Store (ODS) provides a centralized view of near real-time data from operational systems. The ODS is optional in a data warehousing environment. If used, it is populated from multiple operational databases or may be used directly as the data store for multiple operational applications. The ODS can then be used as a staging area for data warehouse population (as shown in Figure 45.1).

Figure 45.1. The Operational Data Store.
graphics/45fig01.gif

An ODS is a collection of data that is

  • Used by operational systems

  • Subject-oriented by business

  • Integrated and consistently named and defined

  • Current, up-to-date (as opposed to historical)

  • At the detail level (as opposed to summarized)

  • Dynamic, or volatile, to support operational systems

What Is OLAP?

OLAP stands for On-Line Analytical Processing. OLAP technology is often used in conjunction with a data warehouse. OLAP technology enables high-level end users (analysts, managers executives, and so on) to derive intelligence from data through interactive and iterative access to multiple views of information (typically stored in a data warehouse).

OLAP uses a multidimensional view of detail, summary, and aggregate data to access information for further analysis. The key term here is multidimensional . A dimension is a structural attribute viewed as similar by the end user . For example, months, quarters , years, and so on make up a time dimension; likewise, all cities, regions , countries , and so on could comprise a geography dimension.

Simply stated, a dimension is a modifier of the basic fact that must be analyzed . Examples of facts include sales figures, expenses, and inventory on hand. Multiple dimensions affect the value of these facts. For example, sales differ by geography (for example, sales region), time (for example, first quarter), product (for example, widgets versus flanges), and any other number of factors.

OLAP is characterized by dynamic multidimensional analysis, enabling complex calculations applied across dimensions, across components of a dimension, and/or through hierarchies. Additionally, OLAP provides analysis and trending capabilities over time, subsetting of data, drill-down through varying levels of detail, reach-through to operational data, and methods for comparing different analytical views of data.

OLAP calculations are usually more complex than simple data summarization and aggregation. For example, the following is a typical OLAP query: "What would be the effect on net revenue if account maintenance fees for demand deposit accounts went up by 3% in conjunction with a customer affinity program that reduced the fee by 1% for every additional account held by the customer?" Answering this question is not simple.

The technology used to store the aggregate data on which OLAP operates can be relational or a proprietary multidimensional format. If the data is stored in a relational database, such as DB2, the term ROLAP , or Relational OLAP, is used; if a multidimensional database is deployed, such as Essbase (which IBM has licensed from Hyperion Software and delivered as the DB2 OLAP Server), the term MOLAP , or Multidimensional OLAP, is used.

This introduction covers the basics of OLAP but is necessarily brief. To cover OLAP in depth could take an entire book.

 <  Day Day Up  >  


DB2 Developers Guide
DB2 Developers Guide (5th Edition)
ISBN: 0672326132
EAN: 2147483647
Year: 2004
Pages: 388

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net