Lesson 1: Fundamental Data Warehousing Concepts

This lesson discusses the concept of a data warehouse and introduces some of the terminology related to data warehousing.

After this lesson, you will be able to:

  • Describe the characteristics of a data warehouse and data mart
  • Differentiate between a data warehouse and a data mart
  • Identify common objectives of data warehousing

Estimated lesson time: 15 minutes

What Is a Data Warehouse?

Some people use the term data warehouse in a very general way. To them, any read-only collection of accumulated historical data is legitimately called a data warehouse. Others consider the term highly specific. To these people, certain methodologies must be adhered to, databases must be designed a certain way, and only specific front-end tools can be used to truly implement a data warehouse.

Knowing characteristics and benefits of data warehousing will enable you to formulate your own definition for your organization. To give you a high-level understanding of the term at this early stage, here is one functional definition of a data warehouse:

A data warehouse is a database specifically structured for query and analysis. A data warehouse typically contains data representing the business history of an organization. Data is usually less detailed and longer-lived than data from an online transaction processing (OLTP) system.

For example, a data warehouse may store daily order totals by customer over the past five years, whereas an OLTP system would store every order processed but retain those records for only a few months.

Even though the term may mean different things in different organizations and environments, some characteristics are common to all data warehouses:

  • Data is collected from other sources; for example, legacy systems or online transaction processing systems.
  • Data is made consistent prior to storage in the data warehouse.
  • Data is summarized. Data warehouses usually do not retain as much detail as transaction-oriented systems.
  • Data is longer-lived. Transaction systems may retain data only until processing is complete, whereas data warehouses may retain data for years.
  • Data is stored in a format that is convenient for querying and analysis.
  • Data is usually considered read only.

Several features that make up a data warehouse are discussed below.

Tool for Users

First and foremost, a data warehouse is a tool for users. A data warehouse allows users to examine historical data, analyze this data in a variety of ways, and make decisions based on it.

Data warehousing uses a variety of technologies, from software, such as database engines and query tools to hardware, such as symmetric multiprocessing (SMP) servers and Redundant Array of Inexpensive Disks (RAID) drives. All of this technology is useless, however, if users cannot quickly and easily retrieve answers to questions.

Collection of Data to Assist Decision Support and Analysis

A data warehouse cannot make decisions for humans. However, it can provide the information with which humans can make intelligent, informed decisions. A data warehouse normally includes data from across the enterprise, and this data is often summarized to a level that makes retrieving summarized data fast and efficient.

Data and Tools

A data warehouse and a data mart are databases. Data warehousing is the process that includes all the tools used to build the warehouse and the tools used to retrieve information and analyze it. One of the most important aspects of data warehousing is the software used to retrieve information. If analysts cannot easily use the tools, the data warehouse is of no use.

What Is a Data Mart?

A data mart is simply a smaller data warehouse. Usually the data in a mart is a subset of data that is found in an enterprisewide warehouse, as follows:

  • A data warehouse is for data throughout the enterprise.
  • A data mart is specific to a particular department.

Why Build Data Marts?

Data marts are not required. They can add time to the process by requiring you to update a number of marts regularly with new data. However, data marts also mean faster queries in each department because there is less data against which to query.

Dependent vs. Independent Data Marts

Data marts can be built in stages and by separate teams. The goal of the data warehousing team is to predefine the characteristics of dimensions that will be used throughout multiple data marts. Predefining these characteristics will create a conformity or dependency between the data marts and allow for an easy rollup into an enterprise-level data warehouse.

Independent data marts occur when multiple areas of the company build a data mart to satisfy their needs and their needs alone. In some situations, building separately may be a good solution if there is no intent to interact with other departments. However, this can prevent future integration and add development cost if there will be an interest in sharing information across departments. Industry experts have acknowledged that, a majority of the time, data marts must be rebuilt entirely according to a conformed (dependent) dimensional framework before they can be rolled up into a data warehouse successfully.

Lesson Summary

Both data warehouses and data marts describe the process of collecting data from other sources. Data is summarized into a read-only format that is convenient for querying and analysis. Whereas a data warehouse may encompass all systems and all departments or business processes within an enterprise, the scope of a data mart is usually smaller, encompassing a single department such as shipping or a single business process such as inventory. Both data warehouses and data marts have the common objective of better business decisions through better use of data.

NOTE
For consistency, the term data warehouse is used throughout this book. Most issues are equally applicable to data marts. In cases where the issues are not equally applicable, the differences are addressed in separate discussions of each environment.



Microsoft Corporation - Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit
Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit
ISBN: 0735606706
EAN: 2147483647
Year: 1999
Pages: 114

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net