WHAT IS A DATA WAREHOUSE?

only for RuBoard - do not distribute or recompile

WHAT IS A DATA WAREHOUSE?

We will start by giving a definition of data warehousing, and then we'll go on to explore the definition by comparing and contrasting the features of a data warehouse with the features of operational systems that we have already described.

The accepted definition of a data warehouse (attributed to Bill Inmon, 1992) is a database that contains the following four characteristics:

  1. Subject oriented

  2. Nonvolatile

  3. Integrated

  4. Time variant

Subject oriented means that the data is organized around subjects (such as Sales) rather than operational applications (such as order processing). Operational databases are organized around business application; they are application oriented.

Recall the five queries that the directors have identified as examples of the types of questions they would like to ask of their data.

We concluded that they are concerned with sales of products over time. The subject area in our case study is clearly sales.

Nonvolatile means that the data, once placed in the warehouse, is not usually subject to change. Anyone who is using the database has confidence that a query will always produce the same result no matter how often it is run. Operational databases are extremely volatile in that they are constantly changing. A query is unlikely to produce the same result twice if it is accessing tables which are frequently updated.

Integrated means the data is consistent. For instance, dates are always stored in the same format. Integration is a problem for most organizations, particularly where there are many different types of technology in use. Some differences are quite fundamental, such as the character set. Most systems use the ASCII (American Standard Code for Information Interchange) character set, but some do not. IBM, which is one of the largest computer manufacturers in the world, bases all of its mainframe systems and many of its midrange systems on a totally different character set called EBCDIC (Extended Binary Coded Decimal Interchange Code). So the letter P has a decimal value of 80 in ASCII but is 215 in EBCDIC (the character with a value of 80 in EBCDIC is & ). The word Pool in ASCII translates to &??% in EBCDIC, and it's difficult to imagine anything less integrated than this.

Other differences are more subtle, such as dates. Most DBMSs have a Date data type (although the storage format is different from one DBMS to another), whereas access methods such as indexed sequential have no such facility.

Even more subtle differences occur within different applications within the same technology. This occurs where, for instance, one application designer decides to hold customer addresses as five columns of 25 characters each, whereas another might use a Varchar(100) format.

Before data is allowed to enter the data warehouse, it must be integrated. So integration is a process through which the data passes after it leaves the application database and before it enters the warehouse database.

Time variant means that historical data is recorded. Almost all queries executed against a data warehouse have some element of time associated within them. We have already established that most operational systems do not retain historical information. It is almost impossible to predict what will happen in the future without observing what happened in the past. A data warehouse helps to address this fundamental issue by adding a historical dimension to the data taken from the operational databases.

One of the most successful techniques in designing data warehouses is called dimensional analysis and dimensional modeling. We will now move on to examine this technique.

only for RuBoard - do not distribute or recompile


Designing a Data Warehouse . Supporting Customer Relationship Management
Designing A Data Warehouse: Supporting Customer Relationship Management
ISBN: 0130897124
EAN: 2147483647
Year: 2000
Pages: 96
Authors: Chris Todman

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net