In this chapter you learn how to handle non-straight data sources, which are data sources that you cannot load directly to the data warehouse because they have different granularities than the dimension tables. This chapter starts by presenting examples of non-straight sources. It then lets you try to handle non-straight sources in the data warehouse you've been building since Chapter 1, “Basic Components.”
Let me explain how you can handle non-straight sources by revisiting the campaign source data in Chapter 16, "Muti-Path and Ragged Hierarchies" here.
CAMPAIGN SESSION, MONTH, YEAR 2006 First Campaign, 1, 2006 2006 First Campaign, 2, 2006 2006 First Campaign, 3, 2006 2006 First Campaign, 4, 2006 2006 Second Campaign, 5, 2006 2006 Second Campaign, 6, 2006 2006 Second Campaign, 7, 2006 2006 Third Campaign, 8, 2006 2006 Last Campaign, 9, 2006 2006 Last Campaign, 10, 2006 2006 Last Campaign, 11, 2006 2006 Last Campaign, 12, 2006
The granularity of the campaign data source is month because every row has a month element. Also a campaign may last more than a month, as shown by the 2006 First Campaign that ran for four months. This means, the source data repeats this campaign four times. That is four rows.
Let’s say your user wants to simplify the campaign source data preparation and prepare a maximum of one row for a campaign, regardless how long it runs. The new data format will have to be changed to the following.
CAMPAIGN_SESSION, START_MONTH, START_YEAR, END_MONTH, END_YEAR 2006 First Campaign, 1, 2006, 4, 2006 2006 Second Campaign, 5, 2006, 7, 2006 2006 Third Campaign, 8, 2006, 8, 2006 2006 Last Campaign, 9, 2006, 12, 2006
The data source is now much simpler than before it was reformatted. Each campaign is only represented by one row.
This is how you handle non-straight sources, by changing their formats so that they can be loaded to their dimension tables.