This chapter teaches you how to deal with a requirement for a measure that is not available in the source data. For example, the product source data (the product.txt file you used in Chapter 2, “Dimension History” and Chapter 8, “Regular Population”) do not contain product count information. If you need to know the number of products in your system, clearly you cannot simply pull this out of your data warehouse.
The factless fact technique can help. By using this technique, you can count the number of products by keeping track of product launches. You can create a factless fact table that has only surrogate keys from the product (what you count) and the date (when the count occurs) dimensions. We call the table a factless table because the table itself does not have a measure.
This chapter shows you how to apply the factless fact technique in your data warehouse.
In this section, I explain how to implement a product launch factless fact, including adding and initially populating the factless product_count_fact table.
The data warehouse schema to track the number of product launches is shown in Figure 21.1. Only the tables related to the product_count_fact table are shown.
Figure 21.1: Fact-less fact table
Use the script in Listing 21.1 to create the factless fact table.
Listing 21.1: Creating the product_count_fact table
/****************************************************************/ /* */ /* product_count.sql */ /* */ /****************************************************************/ USE dw; CREATE TABLE product_count_fact (product_sk INT , product_launch_date_sk INT) ; /* end of script */
Run the script by using the following command.
mysql> \. c:\mysql\scripts\product_count.sql
Now you need to create a product launch date view on the date dimension. You do that by applying role-playing on the date dimension. (Dimension role-playing was discussed in Chapter 13, “Dimension Role Playing.”) The view only gets product effective dates, not all dates in the date dimension. The product_launch_date_dim dimension is a subset dimension. The script in Listing 21.2 creates the required product_launch_date view.
Listing 21.2: Creating the product_launch_date view
/****************************************************************/ /* */ /* product_launch_date_dim.sql */ /* */ /****************************************************************/ USE dw; CREATE VIEW product_launch_date_dim ( product_launch_date_sk , product_launch_date , month_name , month , quarter , year , promo_ind , effective_date , expiry_date ) AS SELECT DISTINCT date_sk , date , month_name , month , quarter , year , promo_ind , b.effective_date , b.expiry_date FROM product_dim a , date_dim b WHERE a.effective_date = b.date ; /* end of script */
Run the script to create the product_launch_date_dim view using this command.
mysql> \. c:\mysql\scripts\product_launch_date_dim.sql
You can verify that the views were created successfully by issuing the show full tables command, like this.
mysql> show full tables like 'product_%date_dim'
The query result is as follows.
+----------------------------------+------------+ | Tables_in_dw (product_%date_dim) | Table_type | +----------------------------------+------------+ | product_launch_date_dim | VIEW | +----------------------------------+------------+ 1 row in set (0.00 sec)
Next, confirm that you have the product launch dates in the view.
mysql> select * from product_launch_date_dim \G
The result of the query is as follows.
*************************** 1. row *************************** product_launch_date_sk: 1 product_launch_date: 2005-03-01 month_name: March month: 3 quarter: 1 year: 2005 promo_ind: NULL effective_date: 0000-00-00 expiry_date: 9999-12-31 *************************** 2. row *************************** product_launch_date_sk: 731 product_launch_date: 2007-03-01 month_name: March month: 3 quarter: 1 year: 2007 promo_ind: NULL effective_date: 0000-00-00 expiry_date: 9999-12-31 2 rows in set (0.01 sec)
Note | The query output has two product launch dates (March 1, 2005 and March 1, 2007), which is correct since the product’s effective date in the product_dim table is either one of these two dates. |
You now need to initially populate the product_count_fact table. The script in Listing 21.4 loads the existing product launches to the product_count_fact table from the product_dim table. The script has two Insert statements. The first Insert adds the products that have not been updated. No SCD2 is applied.
The second Insert handles products that have been through SCD2. Note that the Select statement returns the earliest inserted rows when you do a GROUP BY, so it correctly picks up the effective date of a product when it is launched, not the product’s effective date of an SCD2 row.
Listing 21.4: The script for initially populating the product count
/**********************************************************************/ /* */ /* product_count_ini.sql */ /* */ /**********************************************************************/ USE dw; /* for new products */ INSERT INTO product_count_fact (product_sk, product_launch_date_sk) SELECT a.product_sk , b.date_sk FROM product_dim a , date_dim b WHERE a.effective_date = b.date GROUP BY product_code HAVING COUNT (product_code) = 1 ; /* for products that have been updated by SCD2 */ INSERT INTO product_count_fact (product_sk, product_launch_date_sk) SELECT a.product_sk , b.date_sk FROM product_dim a , date_dim b WHERE a.effective_date = b.date GROUP BY product_code HAVING COUNT (product_code) > 1 /* end of script */
Run the script in Listing 21.4 using this command.
mysql> \. c:\mysql\scripts\product_count_ini.sql
You will see this message on the console.
Database changed Query OK, 3 rows affected (0.12 sec) Records: 3 Duplicates: 0 Warnings: 0 Query OK, 1 row affected (0.07 sec) Records: 1 Duplicates: 0 Warnings: 0
Then, confirm that the initial population was successful by querying the product_count_fact table using this statement.
mysql> select * from product_count_fact;
Here is the query result.
+------------+------------------------+ | product_sk | product_launch_date_sk | +------------+------------------------+ | 2 | 1 | | 5 | 731 | | 1 | 1 | | 3 | 1 | +------------+------------------------+ 4 rows in set (0.06 sec)