Ragged Hierarchies


A hierarchy that does not have data in one or more level is called a ragged hierarchy. For example, if your users don’t have any campaign session for certain months, then the month dimension is said to have a ragged campaign hierarchy. In this section I explain the ragged hierarchy and how to apply it on campaign sessions.

Here’s a sample ragged campaign (in the ragged_campaign.csv file) that does not have January, April, September, October, November, and December 2006 campaign sessions.

 CAMPAIGN SESSION,MONTH,YEAR NULL,1,2006 2006 Early Spring Campaign,2,2006 2006 Early Spring Campaign,3,2006 NULL,4,2006 2006 Spring Campaign,5,2006 2006 Spring Campaign,6,2006 2006 Last Campaign,7,2006 2006 Last Campaign,8,2006 NULL,9,2006 NULL,10,2006 NULL,11,2006 NULL,12,2006

The script in Listing 16.6 loads the campaign session into the month_dim table.

Listing 16.6: Ragged campaign session

image from book
 /*****************************************************************/ /*                                                               */ /* ragged_campaign.sql                                           */ /*                                                               */ /*****************************************************************/ USE dw; TRUNCATE campaign_session_stg; LOAD DATA INFILE 'ragged_campaign.csv' INTO TABLE campaign_session_stg FIELDS TERMINATED BY ', ' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES (   campaign_session , month , year ) ; UPDATE   month_dim a , campaign_session_stg b SET a.campaign_session = b.campaign_session WHERE     a.month = b.month AND a.year = b.year AND b.campaign_session IS NOT NULL ; UPDATE   month_dim a , campaign_session_stg b SET a.campaign_session = a.month_name WHERE     a.month = b.month AND a.year = b.year AND b.campaign_session IS NULL ; /* end of script                                                 */
image from book

To see how it works, clean up the existing campaign session column (set its value to NULL) by running the script in Listing 16.7.

Listing 16.7: Nullifying the campaign_session column

image from book
 /*****************************************************************/ /*                                                               */ /* nullify_campaign_session.sql                                  */ /*                                                               */ /*****************************************************************/ USE dw; UPDATE month_dim SET campaign_session = NULL ; /* end of script                                                 */
image from book

Run the script by using this command:

 mysql> \. c:\mysql\scripts\nullify_campaign_session.sql

You should see a message similar to this on your console.

 Database changed Query OK, 12 rows affected (0.05 sec) Rows matched: 96 Changed: 12 Warnings: 0

Now, run the ragged_campaign.sql script in Listing 16.7 to load the campaign session CSV file by calling the script file name.

 mysql> \. c:\mysql\scripts\ragged_campaign.sql

Here is what should be printed on the console.

 Database changed Query OK, 12 rows affected (0.05 sec) Query OK, 12 rows affected (0.05 sec) Records: 12  Deleted: 0  Skipped: 0  Warnings: 0 Query OK, 6 rows affected (0.07 sec) Rows matched: 6  Changed: 6  Warnings: 0 Query OK, 6 rows affected (0.06 sec) Rows matched: 6  Changed: 6  Warnings: 0

To verify that the month_dim table was populated, query the table using this SQL statement.

 mysql> select month_sk, month_name, year, campaign_session     -> from month_dim     -> where year = 2006;

The correct result is as follows.

 +----------+------------+------+----------------------------+ | month_sk | month_name | year | campaign_session           | +----------+------------+------+----------------------------+ |       11 | January    | 2006 | January                    | |       12 | February   | 2006 | 2006 Early Spring Campaign | |       13 | March      | 2006 | 2006 Early Spring Campaign | |       14 | April      | 2006 | April                      | |       15 | May        | 2006 | 2006 Spring Campaign       | |       16 | June       | 2006 | 2006 Spring Campaign       | |       17 | July       | 2006 | 2006 Last Campaign         | |       18 | August     | 2006 | 2006 Last Campaign         | |       19 | September  | 2006 | September                  | |       20 | October    | 2006 | October                    | |       21 | November   | 2006 | November                   | |       22 | December   | 2006 | December                   |   +----------+------------+------+----------------------------+ 12 rows in set (0.00 sec)

You can then run the campaign_session_path.sql script using this command.

 mysql> \. c:\mysql\scripts\campaign_session_path.sql

Here is the query result.

 +------------+----------------------------+----------+------+ | pc         | time                       | amt      | qty  | +------------+----------------------------+----------+------+ | Monitor    | 2005                       | 44500.00 |  125 | | Monitor    | January                    |  1000.00 | NULL | | Monitor    | January                    |  1000.00 | NULL | | Monitor    | April                      |  2500.00 | NULL | | Monitor    | April                      |  2500.00 | NULL | | Monitor    | 2005 Last Campaign         |  4000.00 | NULL | | Monitor    | July                       |  4000.00 | NULL | | Monitor    | October                    |  5000.00 | NULL | | Monitor    | October                    |  5000.00 | NULL | | Monitor    | November                   | 32000.00 |  125 | | Monitor    | November                   | 32000.00 |  125 | | Peripheral | 2005                       | 25000.00 |   70 | | Peripheral | November                   | 25000.00 |   70 | | Peripheral | November                   | 25000.00 |   70 | | Storage    | 2005                       | 69000.00 |  285 | | Storage    | 2005 Early Spring Campaign |  3000.00 | NULL | | Storage    | February                   |  1000.00 | NULL | | Storage    | March                      |  2000.00 | NULL | | Storage    | 2005 Spring Campaign       |  6500.00 | NULL | | Storage    | May                        |  3000.00 | NULL | | Storage    | June                       |  3500.00 | NULL | | Storage    | 2005 Last Campaign         |  4500.00 | NULL | | Storage    | August                     |  4500.00 | NULL | | Storage    | September                  |  1000.00 | NULL | | Storage    | September                  |  1000.00 | NULL | | Storage    | October                    |  8000.00 | NULL | | Storage    | October                    |  8000.00 | NULL | | Storage    | November                   | 37500.00 |  200 | | Storage    | November                   | 37500.00 |  200 | | Storage    | December                   |  8500.00 |   85 | | Storage    | December                   |  8500.00 |   85 |  +------------+----------------------------+----------+------+  31 rows in set (0.00 sec)

The query output shows that the rows of year and month levels are the same as the output of the non-ragged hierarchy’s campaign path. However, the months with no campaign are listed right above the months themselves. This is to say, the months roll themselves up to their non-existing campaign levels. For example, January does not have a campaign, so you see two January rows for Monitor (2nd and 3rd rows). The third row is for the month, the second represents the non-existing campaign row. The sales order amount (the amt column in the query output) for the non-existing campaign is the same as the month amount.

February and March for Storage belong to the same campaign, namely ‘2005 Early Spring Campaign’. Therefore, both months have one row each that roll up to their campaign; the sum of their sales order amounts is the amount for the campaign.



Dimensional Data Warehousing with MySQL. A Tutorial
Dimensional Data Warehousing with MySQL: A Tutorial
ISBN: 0975212826
EAN: 2147483647
Year: 2004
Pages: 149

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net