The goal throughout the rest of this book is to transform the OLTP-oriented Northwind database into a fully functional data warehouse. You will build a data warehouse that will service the needs of other activities performed within the enterprise, like querying and reporting on the data. These activities would not be as efficient if they were pulling directly from a highly normalized system such as those typically found in transaction processing applications.
In this lesson, you will be introduced to the Northwind database and review the life cycle for creating a data warehouse.
After this lesson, you will be able to:
- Describe the structure of the Northwind database
- Understand the life cycle of data in a warehouse
- Identify the components of the Data Warehousing Framework in the context of a data warehousing architecture
Estimated lesson time: 25 minutes
The Northwind sample database is installed with SQL Server. The database contains the sales data for a fictitious company called Northwind Traders, which imports and exports specialty foods from around the world.
This is a highly normalized relational database. The design complements the activities of a transaction processing system. Northwind Traders has a series of people who use a data entry screen to take orders for the products that they sell. Normalized databases are better suited for these activities because of the many data inserts and modifications.
In this exercise, you will review the relationship diagram of the Northwind database to get more information about the design of the database. In the exercise, you will use the Diagram Editor in SQL Server Enterprise Manager to view the diagram; there is a copy of the diagram in Appendix B, Database Schemas. The Northwind database is the starting point for our data warehousing process.
You will see the tables and their relationships in the Diagram Editor. This tool allows you to create tables and their relationships and is very efficient for building and maintaining the structure of your databases.
Notice the tables used to store the information on an order. These will allow you to answer the following questions for each order:
Who purchased the products ordered?
Who took the orders from the customers?
Who will ship the products ordered?
Notice the tables that reflect information for the products. The Product table contains attributes (table columns) that describe each product. Further related information about the supplier from which a product was purchased and the category to which it belongs is stored in the Suppliers and Categories tables. This design is a prime example of normalizing the structure to separate the data into smaller tables to describe different entities.
A closer look at the diagram will show that each order is actually stored in two tables. The header information, which is stored in the Orders table, explicitly describes the order. The detail information is stored in the Order Details table, which is able to store multiple products for each order. Each detail record is unique because a product and the order identification explicitly define it in the table.
Figure 2.1 shows the parts of the life cycle when a data warehouse is being built. These parts are described in the following topics.
Figure 2.1 The Data Warehousing Life Cycle
OLTP systems are the beginning point for data that will end up in a warehouse. In this sense, OLTP systems could be client/server databases, legacy mainframe systems, and stand-alone file-based PC applications. The focus of these systems is the details of business transactions.
Data must be collected from disparate sources. In some cases, it must be transformed from one format to another. It must then be moved into the warehouse database.
OLTP data is moved into a denormalized database, which is your data warehouse. The warehouse may then be split into data marts.
Cubes are built on top of the denormalized database to build an OLAP system; they give you better response times to queries.
A data warehouse does not require cubes or other OLAP tools to be considered a data warehouse. The warehouse is the centralized, denormalized database that holds enterprise data, and cubes are just structures built on top of the warehouse to speed queries. Cubes are explored in greater detail in Chapter 9: Microsoft SQL Server OLAP Services.
In this exercise, you will use OLAP Services to browse a prebuilt multidimensional cube that represents the sales data for a sample company named FoodMart. You will use the Cube browser to filter data and to drill down through the data to see the lower-level detail or drill up to see higher-level data. You should notice the differences between browsing data stored in a transaction processing system and browsing that in a data warehouse. This exercise gives you a first look at the Cube browser; you will work with the OLAP Manager and the Cube browser in detail in later chapters.
The Cube browser appears, displaying a grid made up of one dimension and the measures of your cube. The additional dimensions appear at the top of the browser.
In this procedure, you will replace one dimension in the grid with another.
Make sure that your pointer looks like this when you drop the dimension:
You will see that your filtered data now reflects figures for the first quarter of 1997 only.
The cube expands to include the Product Department level.
Data in a warehouse has a life cycle. It originates in other systems, usually OLTP systems. The data is transformed and imported into a data warehouse. Once it is there, it is possible to build cubes, which are precalculated multidimensional views of the data. Cubes provide speed and ease of access to querying tools and therefore to end users.