I explain in this section what key aspects of source data you need to identify before you can begin writing the initial population script.
The first step is to identify what source data is needed and available for every fact and every dimension of the data warehouse. You also need to know the characteristics of the data source, such as its file type, record structure, and accessibility.
A sample document that summarizes the key information of the source data for the sales order data warehouse, including the file types, formats, and data warehouse target tables, is shown in Table 7.1. This type of table is often called the data source map because it maps every data from the source to the target The activity to produce the table is called data source mapping.
Note | The source data for customer and product dimensions map directly to their target data warehouse tables, the customer_dim and product_dim tables. On the other hand, the sales order transaction is the source for more than one data warehouse table. You learn a more complex source data in Chapter 20, “Non-Straight Sources.” |
Source Data | Data Type | File/Table Name | Target Table in Data Warehouse |
---|---|---|---|
Customer | CSV text file | customer.csv | customer_dim |
Product | Fixed-width text file | product.txt | product_dim |
Sales order transaction | MySQL | sales_order in source database | order_dim |
sales_order_fact | |||
date_dim, if you use “Loading the Date from Source” option, but you’ll use pre-population here |