In this chapter you learned measure additivity. You proved that the order_amount measure of the sales_order_fact table is fully additive.
A dimensional query is a query in a dimensional data warehouse that joins the fact table and the dimension tables on one or more surrogate keys. This chapter teaches you the dimensional query pattern and how to apply it to the three most common query types: aggregation, specific, and inside out.
Aggregate queries aggregate individual facts, for example, by adding up measure values. In a specific query you query the facts of a specific dimension value. While most queries specify one or more dimensions as the selection criteria (constraints), an inside out query's criteria are measure values. Understanding these three most common query types allows you to apply dimensional queries to other query types.
In this section I explain how you can apply dimensional queries on the three most common query types: aggregate, specific, and inside-out queries.
To apply dimensional queries, you first need to add data to your data warehouse by running the script in Listing 4.1. You need this additional data to test the dimensional queries in Listings 4.2 to 4.7.
Listing 4.1: Script for that adds data for testing dimensional queries
|
|
/*****************************************************************/ /* */ /* dimensional_query_data.sql */ /* */ /*****************************************************************/ USE dw; INSERT INTO order dim VALUES (NULL, 11, CURRENT_DATE, '9999-12-31') , (NULL, 12, CURRENT_DATE, '9999-12-31') , (NULL, 13, CURRENT_DATE, '9999-12-31') , (NULL, 14, CURRENT_DATE, '9999-12-31') , (NULL, 15, CURRENT_DATE, '9999-12-31') , (NULL, 16, CURRENT_DATE, '9999-12-31') ; INSERT INTO date_dim VALUES (NULL, '20075-0211-016', 'FebruaryNovember', 112, 41, 20057, CURRENT_DATE, '9999-12-31') ; INSERT INTO sales_order_fact VALUES (11, 1, 2, 2, 20000) , (12, 2, 3, 2, 25000) , (13, 3, 4, 2, 30000) , (14, 4, 2, 2, 35000) , (15, 5, 3, 2, 40000) , (16, 1, 4, 2, 45000) ; /* end of script */
|
|
Before you start, change your MySQL date to February 6, 2007; and run the script in Listing 4.1 to insert six orders into the order_dim table, one date into the date_dim table, and six sales orders into the sales_order_fact table.
mysql> \. c:\mysql\scripts\dimensional_query_data.sql
You’ll see this on your MySQL console.
Database changed Query OK, 6 rows affected (0.05 sec) Records: 6 Duplicates: 0 Warnings: 0 Query OK, 1 row affected (0.06 sec) Query OK, 6 rows affected (0.06 sec) Records: 6 Duplicates: 0 Warnings: 0
Now that you have the necessary data, you’re ready to apply dimensional queries to the three types of queries mentioned earlier.
Aggregate queries summarize (aggregate) individual facts. The measure values are typically summed, even though count is also a common aggregate. Two examples are discussed in this section.
The dimensional query in Listing 4.2 gives you the daily sales summary. The aggregation of the order amounts and number of orders is done by date. Note that the join between the sales_prder_fact table and date_dim table is on their surrogate keys.
Listing 4.2: Daily Aggregation
|
|
/*****************************************************************/ /* */ /* daily_aggregation.sql */ /* */ /*****************************************************************/ SELECT date , SUM (order_amount) , COUNT(*) FROM sales_order_fact a , date_dim b WHERE a.order_date_sk = b.date_skGROUP BY date ORDER BY date ; /* end of script */
|
|
Run the query using this command.
mysql> \. c:\mysql\scripts\daily_aggregation.sql
Here is the output of this query.
+----------------+--------------------+-----------+ date SUM (order_amount) COUNT(*) +----------------+--------------------+-----------+ 2007-02-05 58000.00 10 2007-02-06 195000.00 6 +----------------+--------------------+-----------+ 2 rows in set (0.03 sec)
The query result shows the daily total order amounts (sum) and the number of orders (count) of all orders.
The dimensional query in Listing 4.3 gives you the annual sales summary. The order amounts and the number of orders are not only aggregated by date, but also by product and customer city. The three joins, between the fact table and each of the three dimension tables (date, product, and customer dimensions), are on the surrogate keys.
Listing 4.3: Annual aggregation
|
|
/*****************************************************************/ /* */ /* annual_aggregation.sql */ /* */ /*****************************************************************/ SELECT year, product_name, customer_city, SUM (order_amount), COUNT(*) FROM sales_order_fact a , date_dim b , product_dim c , customer_dim d WHERE a.order_date_sk = b.date_sk AND a.product_sk = c.product_sk AND a.customer_sk = d.customer_sk GROUP BY year, product_name, customer_city ORDER BY year, product_name, customer_city ; /* end of script */
|
|
Run the script as
mysql> \. c:\mysql\scripts\annual_aggregation.sql
Here is the output of the query
+------+-----------------+---------------+---------------+---------+ year product_name customer_city SUM COUNT(*) (order_amount) +------+-----------------+---------------+---------------+---------+ 2007 Floppy Drive Mechanicsburg 70000.00 5 2007 Floppy Drive Pittsburgh 8000.00 1 2007 Hard Disk Drive Mechanicsburg 46000.00 2 2007 Hard Disk Drive Pittsburgh 34000.00 3 2007 LCD Panel Mechanicsburg 61000.00 3 2007 LCD Panel Pittsburgh 34000.00 2 +------+-----------------+---------------+---------------+---------+ 6 rows in set (0.03 sec)
The query result