In our solution, we'll create a conformed data warehouse that can logically accommodate the information available in each of the sales systems. Our extraction, transformation, and loading (ETL) process will transform the data from each data source into the shape needed for the data warehouse. The ETL process can be run on a weekly (or more frequent) basis to keep the information current. The budgeting information will be integrated into the warehouse, eliminating the manual process of tracking sales against budgets. Users can continue to enter their budget figures in familiar Excel spreadsheets.
The high-level requirements to support the business objectives are as follows:
We will use SQL Server Integration Services (SSIS) to read data from our data sources, perform any data transformations we need, and then store the results in the data warehouse. Integration Services will also be used to control the sequence of events and processes, such as emptying tables, loading data in the proper order, and generating audit logs of the operations. The tasks and transforms necessary for each of these processes are stored in Integration Services packages. You develop packages using the graphical designer for SQL Server Integration Services projects in BI Development Studio, where you can also test and debug them. Once your packages are ready, you deploy them to the Integration Services server, where they can be executed directly or invoked by a job step in a SQL Server Agent job.
We will load the full set of dimension data into our warehouse on a daily basis. This data carries the business keys from the original companies. We will use our own surrogate keys to uniquely identify the products and regions and to create a uniform model across the business. One consequence of reloading the dimensions is that the surrogate keys can change, invalidating the facts previously loaded. For this example, we will reload all the sales for each company so that the new surrogate keys are reflected in the fact table. This approach works only if the volume of data is small enough that the processing fits within your time constraints. We show you how to properly update existing dimensions and incrementally load facts in Chapter 8, "Managing Changing Data."
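The surrogate-key assignment described above can be sketched in a few lines of Python. This is an illustration of the pattern only, not the SSIS implementation; the company names, product codes, and function names are hypothetical. Each reload assigns fresh surrogate keys in sequence, which is why the previously loaded facts become invalid and must be reloaded:

```python
def load_product_dimension(sources):
    """Assign a fresh surrogate key to each (company, business_key) pair.

    Returns the dimension rows and a lookup map used later to translate
    business keys in the incoming fact data.
    """
    dimension = []   # rows: (surrogate_key, company, business_key, name)
    key_map = {}     # (company, business_key) -> surrogate_key
    next_key = 1
    for company, products in sources.items():
        for business_key, name in products:
            key_map[(company, business_key)] = next_key
            dimension.append((next_key, company, business_key, name))
            next_key += 1
    return dimension, key_map

# Hypothetical source data: two companies with their own business keys.
sources = {
    "CompanyA": [("P-100", "Widget"), ("P-200", "Gadget")],
    "CompanyB": [("X9", "Widget")],  # same product, different business key
}
dimension, key_map = load_product_dimension(sources)
```

Because the keys are regenerated on every reload, the `key_map` is only valid for facts loaded in the same run; persisting the map (or updating dimensions in place) is what Chapter 8's incremental approach addresses.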
The cube data will remain available to users during the rebuild process through the built-in caching facilities of Analysis Services. During the loading of sales data, we will translate business keys into the new surrogate keys. Each quarter, we will load budget data from each company's Excel spreadsheets and perform the same business-key translations as we do for sales. This will allow us to directly compare sales and budgets. We'll use the capability of Integration Services to loop through the filenames in a directory to automate the loading of the budget spreadsheets. Figure 4-1 illustrates the flow of data in our ETL process.
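Two of the steps above can be sketched outside SSIS to show the intent: translating business keys to surrogate keys (applied identically to sales and budget rows, which is what makes them directly comparable) and enumerating budget workbooks in a folder, as an SSIS Foreach Loop container would. This is a minimal Python sketch under assumed names and sample data, not the package logic itself:

```python
from pathlib import Path

def translate_keys(rows, key_map):
    """Replace each (company, business_key) with the warehouse surrogate key.

    The same translation is applied to sales and budget rows so both fact
    tables share the dimension's surrogate keys.
    """
    return [(key_map[(company, bkey)], amount) for company, bkey, amount in rows]

def budget_workbooks(folder):
    """Enumerate budget spreadsheets, as an SSIS Foreach Loop over *.xls would."""
    return sorted(Path(folder).glob("*.xls"))

# Hypothetical key map and fact rows (not from the solution):
key_map = {("CompanyA", "P-100"): 1, ("CompanyB", "X9"): 2}
sales = [("CompanyA", "P-100", 500.0), ("CompanyB", "X9", 750.0)]
budgets = [("CompanyA", "P-100", 450.0)]

sales_facts = translate_keys(sales, key_map)
budget_facts = translate_keys(budgets, key_map)
```

Because both fact sets now carry surrogate key 1 for CompanyA's product P-100, a sales-versus-budget comparison is a simple join on that key.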
Figure 4-1. High-level architecture
The solution will deliver the following benefits to the client: