Back in Chapter 2, we identified some key software architectural decisions the DBA must make. Principal among those were how many instances will comprise the data warehouse and will the data warehouse data be loaded in one step or two? In this chapter, we will focus on just the latter issue ”loading the data. When loading data into a data warehouse, there are two options (shown in Figure 6-1):
Figure 6-1. Common Data Warehouse ETL Architectures
We'll refer to these approaches as transform, then load and load, then transform, respectively. We reviewed the pros and cons of these approaches back in Chapter 2. Now, we'll examine optimally implementing them using Oracle 8i and 9i in the sections that follow. The implementation options are somewhat different since the database versions' features are different (i.e., newer Oracle versions tend to offer newer , better data warehousing solutions). We also reviewed back in Chapter 2 that data loading programs must be designed to utilize SMP/MPP multi-CPU architectures, otherwise CPU usage may not exceed 1/No. of CPUs. The Golden Rules are very simple:
Our goal will be to achieve a parallel loading architecture something like Figure 6-2. Figure 6-2. True Parallel ETL Processing via Forking
However, we must first examine two other criteria before we can settle on our final data loading architecture: upstream data sources and data transformation requirements. |