Chapter 21: Factless Facts


This chapter teaches you how to deal with a requirement for a measure that is not available in the source data. For example, the product source data (the product.txt file you used in Chapter 2, “Dimension History” and Chapter 8, “Regular Population”) do not contain product count information. If you need to know the number of products in your system, clearly you cannot simply pull this out of your data warehouse.

The factless fact technique can help. By using this technique, you can count the number of products by keeping track of product launches. You can create a factless fact table that has only surrogate keys from the product (what you count) and the date (when the count occurs) dimensions. We call the table a factless table because the table itself does not have a measure.

This chapter shows you how to apply the factless fact technique in your data warehouse.

Product Launch Factless Facts

In this section, I explain how to implement a product launch factless fact, including adding and initially populating the factless product_count_fact table.

The data warehouse schema to track the number of product launches is shown in Figure 21.1. Only the tables related to the product_count_fact table are shown.

image from book
Figure 21.1: Fact-less fact table

Use the script in Listing 21.1 to create the factless fact table.

Listing 21.1: Creating the product_count_fact table

image from book
 /****************************************************************/ /*                                                              */ /* product_count.sql                                            */ /*                                                              */ /****************************************************************/ USE dw; CREATE TABLE product_count_fact (product_sk INT , product_launch_date_sk INT) ; /* end of script                                                */
image from book

Run the script by using the following command.

 mysql> \. c:\mysql\scripts\product_count.sql

Now you need to create a product launch date view on the date dimension. You do that by applying role-playing on the date dimension. (Dimension role-playing was discussed in Chapter 13, “Dimension Role Playing.”) The view only gets product effective dates, not all dates in the date dimension. The product_launch_date_dim dimension is a subset dimension. The script in Listing 21.2 creates the required product_launch_date view.

Listing 21.2: Creating the product_launch_date view

image from book
 /****************************************************************/ /*                                                              */ /* product_launch_date_dim.sql                                  */ /*                                                              */ /****************************************************************/ USE dw; CREATE VIEW product_launch_date_dim (   product_launch_date_sk , product_launch_date , month_name , month , quarter , year , promo_ind , effective_date , expiry_date ) AS SELECT DISTINCT   date_sk , date , month_name , month , quarter , year , promo_ind , b.effective_date , b.expiry_date FROM   product_dim a , date_dim b WHERE     a.effective_date = b.date ; /* end of script                                                */
image from book

Run the script to create the product_launch_date_dim view using this command.

 mysql> \. c:\mysql\scripts\product_launch_date_dim.sql

You can verify that the views were created successfully by issuing the show full tables command, like this.

 mysql> show full tables like 'product_%date_dim'

The query result is as follows.

 +----------------------------------+------------+ | Tables_in_dw (product_%date_dim) | Table_type | +----------------------------------+------------+ | product_launch_date_dim          | VIEW       | +----------------------------------+------------+ 1 row in set (0.00 sec)

Next, confirm that you have the product launch dates in the view.

 mysql> select * from product_launch_date_dim \G

The result of the query is as follows.

 *************************** 1. row *************************** product_launch_date_sk: 1    product_launch_date: 2005-03-01             month_name: March                  month: 3                quarter: 1                   year: 2005              promo_ind: NULL         effective_date: 0000-00-00            expiry_date: 9999-12-31 *************************** 2. row *************************** product_launch_date_sk: 731    product_launch_date: 2007-03-01             month_name: March                  month: 3                quarter: 1                  year: 2007             promo_ind: NULL        effective_date: 0000-00-00           expiry_date: 9999-12-31 2 rows in set (0.01 sec)

Note 

The query output has two product launch dates (March 1, 2005 and March 1, 2007), which is correct since the product’s effective date in the product_dim table is either one of these two dates.

You now need to initially populate the product_count_fact table. The script in Listing 21.4 loads the existing product launches to the product_count_fact table from the product_dim table. The script has two Insert statements. The first Insert adds the products that have not been updated. No SCD2 is applied.

The second Insert handles products that have been through SCD2. Note that the Select statement returns the earliest inserted rows when you do a GROUP BY, so it correctly picks up the effective date of a product when it is launched, not the product’s effective date of an SCD2 row.

Listing 21.4: The script for initially populating the product count

image from book
 /**********************************************************************/ /*                                                                    */ /* product_count_ini.sql                                              */ /*                                                                    */ /**********************************************************************/ USE dw; /* for new products                                                   */ INSERT INTO product_count_fact (product_sk, product_launch_date_sk) SELECT   a.product_sk , b.date_sk FROM   product_dim a , date_dim b WHERE a.effective_date = b.date GROUP BY product_code HAVING COUNT (product_code) = 1 ; /* for products that have been updated by SCD2                        */ INSERT INTO product_count_fact (product_sk, product_launch_date_sk) SELECT   a.product_sk , b.date_sk FROM   product_dim a , date_dim b WHERE a.effective_date = b.date GROUP BY product_code HAVING COUNT (product_code) > 1 /* end of script                                                      */
image from book

Run the script in Listing 21.4 using this command.

 mysql> \. c:\mysql\scripts\product_count_ini.sql

You will see this message on the console.

 Database changed Query OK, 3 rows affected (0.12 sec) Records: 3  Duplicates: 0  Warnings: 0 Query OK, 1 row affected (0.07 sec) Records: 1  Duplicates: 0  Warnings: 0

Then, confirm that the initial population was successful by querying the product_count_fact table using this statement.

 mysql> select * from product_count_fact;

Here is the query result.

 +------------+------------------------+ | product_sk | product_launch_date_sk | +------------+------------------------+ |          2 |                      1 | |          5 |                    731 | |          1 |                      1 | |          3 |                      1 | +------------+------------------------+ 4 rows in set (0.06 sec)



Dimensional Data Warehousing with MySQL. A Tutorial
Dimensional Data Warehousing with MySQL: A Tutorial
ISBN: 0975212826
EAN: 2147483647
Year: 2004
Pages: 149

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net