Introduction to Data Warehouses and OLAP Cubes

You will want to be familiar with what OLAP cubes are so that you can make full use of them. You can trace how operational data becomes an OLAP cube in this section.

Operational Data

A traditional Online Transaction Processing (OLTP) database such as Access or SQL Server will store data across several related tables. In the convenience store chain, the cash registers might store data in tables called Customers, Orders, Order Details, and Products. Figure 10.1 shows the table structure for the cash register.

Figure 10.1. Sales at the convenience store are likely stored across several tables in a relational database.

The cash register database is recording all sales, so you can keep a record of what you're selling, know when to replenish your store's inventory, and calculate the profit margin on all your items and the day's total sales. Of course, that database is going to keep on growing with each day's new sales.

Rather than leaving the data to grow on the cash registers, where it won't be of much use to the executives back at the home office, it is sent upstream (daily, hourly, or perhaps even as each sale happens) to the companywide sales database.

After just a few months of receiving data from your 250 stores, you've got a whole lot of data in that database in the home office. No single record makes much difference to you, really, because one record or sale by itself doesn't tell you where your business has been or where it is headed. You need summaries and other aggregations of all kinds if you want to pull meaningful information out of the raw data. However, you don't want to send in long-running queries to this database, because it's very busy receiving so much operational information from the stores. Trying to generate reports on all that data, or even a subset of it, would slow things down on the operations side.

The answer is to move the data from that companywide sales database into another database that is structured for easier and faster querying. This is known as a data warehouse.

Warehousing Your Data

The format of the data warehouse, although still a relational database, looks very different from the transaction database. As shown in Figure 10.2, sales are stored in a central fact table, with lookups out to other tables in order to provide product hierarchies, store hierarchies, and time hierarchies as shown in Figure 10.2.

Figure 10.2. The central fact table will contain hundreds of millions of records per year.

Each record in the fact table will have two categories of information. The first category is dimensional information, such as what time the sale was made, in what store, and what item was sold. The second category is measure information, examples of which include the price of the item, the cost of the item, and how many of that item were sold in that particular transaction.

Measure information is typically numeric. This is the data you can total and otherwise aggregate.

Dimension information provides meaning to the totals. You can slice your total sales along the time dimension and the product dimension to find out that between 3:00 a.m. and 5:00 a.m., stores in the Central Time Zone sell more laundry detergent than any other region does at that time of night.

Enter the Cube

A summary table is a prebuilt file that contains every possible aggregation of every measure, by each dimension. One way to prebuild all the possible summary tables is to create a multidimensional data structure (commonly called a cube). This is different from the usual relational database data structure, wherein each cell (the intersection of a field and record) holds one and only one piece of data. Any "cell" in a cube can have multiple pieces of data, or even none; further, each cell's data is aggregated on one or more dimensions. The OLAP server will handle converting the fact table in Figure 10.2 to the virtual cube shown in Figure 10.3.

Figure 10.3. An OLAP cube summarizes measures along various dimensions.

If this structure is called a cube, does that mean it is limited to three dimensions? People call it a cube to enable you to visualize the data in your mind. However, it is possible to have far more than three dimensions.

Cubes Offer Prebuilt Data Views

The bottom line is that when you have data in Excel, you might have to go through the work described in Chapter 2, "Creating a Basic Pivot Table," to groom and organize your data. However, when you have an OLAP cube, the data has already been groomed and organized for you.

An OLAP cube is essentially a data structure that already contains all the possible aggregations by all the available dimensions. Now all you have to do is point Excel's pivot table at that OLAP cube, and then you can use just about any neat pivot table trick discussed in this book to generate various reports and charts.

    Pivot Table Data Crunching
    Pivot Table Data Crunching for Microsoft Office Excel 2007
    ISBN: 0789736012
    EAN: 2147483647
    Year: 2003
    Pages: 140 © 2008-2017.
    If you may any questions please contact us: