Lesson 1: What Is a Data Warehouse?

Interoperability standards have only recently been adopted by the industry. Therefore, data warehousing is an industry that has never looked better, but it is still maturing. In this lesson, you will learn about the history of data warehousing and be introduced to online analytical processing (OLAP) and the characteristics of data accessed through OLAP systems.

After this lesson, you will be able to:

  • Understand the need for standard data warehouse tools
  • Describe OLAP and how it compares with OLTP
  • Discuss the characteristics of data in a warehouse

Estimated lesson time: 20 minutes

History of Corporate Data Analysis

The idea of a data warehouse is not new. Even in the days before computers, a company s data warehouse was its knowledge about itself and its customers. Perhaps a valued long-time employee held this knowledge. Or maybe the president kept a storage closet full of year-end reports going back through time. Data warehousing is a new term to describe an integral part of any business. The main difference is that now computers make possible types of data access and analysis that were previously very difficult or impossible.

Paper Reports

Historically, cost has limited the use of mainframes and minicomputers to tracking transactions, with only limited reporting for analysis. Instead of providing trend-based data, reports mirrored the current state of the data in these systems. Since storage costs were prohibitive, historical data was stored primarily on paper rather than electronically.

Spreadsheet Analysis

In order to analyze data from these systems, corporations began utilizing PC-based spreadsheet programs to allow the input and analysis of data from transactional systems. Spreadsheets became the tool of choice for collecting data and analyzing resulting trends. For example, a sales manager may rekey the end-of-year sales figures from reports into spreadsheets to provide historical sales trend analysis. Unfortunately, it took a lot of time to create these spreadsheets and maintain their consistency with data stored in transactional systems. Since such spreadsheets often required manual re-entry of data, this process was time consuming and prone to error.

Executive Information Systems

To simplify the process of collecting and analyzing data, corporate information services (IS) departments began creating spreadsheets overnight and giving their employees copies of these prebuilt views of the data. These were packaged and delivered to executives and managers as Executive Information Systems. Software development houses began building custom applications whose sole function was to collect data from transactional and operational systems and present the data as trends or summaries. Although this approach improved on spreadsheet analysis by automating the data migration, it sometimes resulted in redundant data extracts and inconsistent interpretation of the data. For example, the extract for regional sales might compute cost of goods differently than the extract for sales by product line. Although the figures are based on the same data, there is the potential for inconsistency.

Proprietary Data Analysis/Warehouse Tools

As these systems grew, software developers began creating specific, proprietary tools to provide efficient aggregation and presentation of the data pulled from the corporate systems. One of the pioneers of these systems, Arbor Software, coined the term online analytical processing (OLAP) to differentiate these systems from the online transaction processing (OLTP) systems run by most corporations. Unfortunately, each of these developers created its own proprietary algorithms and application programming interfaces (APIs) that differentiated its systems from others on the market. This practice made these systems very expensive, difficult to install and maintain, and incompatible with one another.

Industry-Standard Data Warehouse Tools

In 1996, Microsoft began working with leading data warehouse vendors to define a set of standards that encourage interoperability of analysis, data management, and storage tools from different vendors. This initiative led to the creation and adoption of OLE DB for OLAP, a standard way for tools from multiple vendors to interact. SQL Server 7 includes all of these standards in a single high-performance, low-cost product.

What Is Online Analytical Processing?

OLAP is a process completed by people using tools to facilitate the analysis of dimensionally structured business information. OLAP allows users to navigate data by drilling down to lower-level detail, drilling up to higher-level summaries, and pivoting. OLAP tools support some level of calculation against the data.

OLAP can provide different levels of analysis through complex querying capabilities, intense data comparisons, trending, mining, and reporting. SQL Server OLAP Services is the combination of data and tools used to make decisions.

OLAP is most often associated with the process of querying and reporting, which is the opposite of an online transaction processing system. The term OLAP was specifically created to contrast with online transaction processing. These systems are further defined below.

OLAP Systems

OLAP systems are streamlined for efficient and effective querying capabilities against enterprise-level data that represents the business processes that make up the company.

OLAP is a set of technologies that takes the data in a data warehouse and transforms that data into multidimensional structures, called cubes, to allow for better response to complex queries. OLAP gives a better response than an OLTP system or a standard data warehouse by storing the data in dimensions.

For example, you may want to know product sales by month for several regions. The three dimensions products, regions, and time are summarized values, which is one of the purposes of OLAP. Cube structures store these summarized values and allow for rapid responses to user queries. In addition, because these values are stored in dimensions, users can switch between different dimensions. Users can look at total sales by product by month by region, or switch the numbers to look at total dollar sales by month by region, totaling on product.

OLTP Systems

OLTP systems are designed for real-time data entry and editing.

An example of an OLTP system is an order-entry system. Orders come into a call system and must be recorded quickly. Changes can be made as customers add or remove items. Canceled orders are removed from the system. The shipping department updates orders with the order status as it moves through the distribution process.

If you decide to do all your analysis against the OLTP system, you will encounter a large amount of data that changes constantly. In addition, your queries may slow down the system, which is often unacceptable in a high-volume environment.

OLAP vs. OLTP

OLAP systems are built to be queried, but not to have constant inserts, updates, and deletes. To avoid slowing down the OLTP systems, move the data into a separate database, and then perform all queries against the OLAP system. Because you are not concerned with the speed of inserts, updates, and deletes, you can store the OLAP data in a denormalized format and heavily index the tables to maximize the speed of queries.

OLAP Data Characteristics

Data in a data warehouse has several attributes that differentiate it from data in a standard, online transaction processing system.

Consolidated and Consistent

The terms consolidated and consistent have particular meanings in a data warehouse. Consolidated means that the data is gathered from throughout the enterprise and stored in a central location.

Consistent means that all users will get the same results to the same question, even if it is posed at different times. For example, the answer to the question, "What were the total sales for January 1997?" will be consistent whether the question is posed in 1997 or 2002.

The data is internally consistent in the warehouse, no matter how it is stored at its sources.

That different users will get the same results to the same question is not surprising, but historical data in a warehouse should not change over time unless it is necessary to make a change to the data to fix a problem. Maintaining the internal consistency of data can be one of the most difficult goals to achieve in a warehouse.

Example

If you have offices in both the United States and Canada, the U.S. offices may record employee salaries in U.S. dollars while the Canadian office records salaries in Canadian dollars. However, if you want to compare wages in the United States and Canada over time, you will have to convert one company s currency to the other, taking into account the exchange rate changes.

Another common problem is that Boolean fields may be stored as 0 and 1 in one database, true and false in another, and 0 and 1 in a third. As data is moved over into a warehouse, these values must be formatted the same way so that a single query looking for a true or false value will be able to find the proper records.

Subject Oriented

Data in a warehouse should include only key business information. Often, data in sources throughout the enterprise includes information that is not of use to decision makers in the company. Only subject-oriented data should be moved into a warehouse. Once in the warehouse, the data should be organized based on subject.

Historical

Data warehouse data is historical, which means that it does not change over time unless a problem existed with the data at the source. Data in a warehouse represents a snapshot in time, so a warehouse is accurate only to a certain point in the past.

Data in a warehouse often covers a long period of time; OLTP systems have only current or very recent data. Data over a long period of time allows the analysis of trends over time, including seasonal and long-term trends.

Read Only

Because data in a warehouse is historical, it is read only. Data in a warehouse changes only if errors are found in the original source data because if data is updated after it is in a warehouse, consistency is compromised.

Because data in a warehouse will not be updated or deleted, the warehouse can be structured to allow maximum speed and flexibility for queries, such as an aggressive use of indexes.

Granular

Data in an OLTP system is stored with maximum detail. Data in a data warehouse does not usually need to be stored with maximum detail. Instead, you can handle a certain level of summarization, so the data is stored with more or less granularity. The key of data warehouse design is to identify the appropriate level of summary. You can always summarize up, but you cannot drill down through a summary without the lower-level data. The base data must be kept to reformat or readjust the history to fit new or changing information needs.

Lesson Summary

A history of corporate data analysis reveals that data warehousing is not entirely new. However, the use of industry-standard tools and technologies is relatively new. The scope and flexibility of those online analytical processing tools is great, making an organized disciplined approach more desirable than ever.

The characteristics of OLAP are quite different from those of OLTP; for example, OLAP benefits from plentiful indexes, whereas OLTP performance can suffer if there are too many indexes. OLAP data is consolidated and consistent, subject oriented, historical, read only, and granular. With their relatively simple structure, OLAP databases are easier to understand. In addition, they do not change frequently, and they are structured for easy navigation. OLTP databases, on the other hand, are difficult for users to understand because their structure is more complex. They also change frequently to support business operations.



Microsoft Corporation - Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit
Microsoft SQL Server 7 Data Warehousing Technical Support Training Kit
ISBN: 0735606706
EAN: 2147483647
Year: 1999
Pages: 114

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net