A date dimension has a special role in dimensional data warehousing. First and foremost, a date dimension contains times and times are of utmost importance because one of the primary functions of a data warehouse is to store historical data. Therefore, the data in a data warehouse always has a time aspect. Another unique aspect of a date dimension is that you have the option to generate the dates to populate the date dimension from within the data warehouse.
Note | Some data warehouses require both dates and times. |
In your data warehouse, you start by creating a date_dim table that has a date or datetime column. The values in this table will then become a reference for other tables with date values. In our data warehouse, the date_dim table relates to the sales_order_fact table through the date_sk surrogate key. For instance, all sales order rows in the fact table with February 6, 2007 order dates have a value of 1, because in the date_dim table the date_sk value for February 6, 2007 is 1. In addition to the date itself, the date dimension has other data, such as the month name and the quarter. Having the surrogate key relationship makes this additional data available to the sales orders. This means you can query the sales orders by month name and by quarter.
This chapter teaches you the three most common techniques for populating the date dimension in a dimensional data warehouse. The three techniques are
Pre-population
One date everyday
Loading the date from the source data