The Star Schema | Dimensional Data Warehousing with MySQL: A Tutorial

A good dimensional data warehouse has a simple database structure. Technically, a simple structure means faster queries. In a dimensional data warehouse, the implementation relational database has two types of tables, the fact table and the dimension table. The fact table consists of the facts (or measures) of the business. Dimension tables contain descriptions for querying the database.

Note

You will have better understanding of the fact table and the dimension table after reading the first two chapters of this book.

The tables in a data warehouse are related such that the schema looks like a star, hence the term star schema.

Note

In addition to the star structure, the snowflake structure could also be used in a data warehouse. However, it is more difficult to model than the star structure. As well, the snowflake structure is not that easy to understand and implement and the performance of its queries is slower than that of the star structure. These drawbacks make the snowflake structure unsuitable for dimensional data warehousing. This book covers the star structure only.

A star consists of a fact table surrounded by two or more dimension tables. A one-star structure has one fact table only. A multi-star structure has multiple fact tables, one for each star. In addition, dimension tables may be shared among multiple fact tables. This chapter covers the one-star structure; you learn the multi-star structure in Chapter 19, “Multi-Star Schemas.”

Figure 1.1 shows a one-star dimensional schema of the sales order data warehouse we will develop in this book.

image from book
Figure 1.1: A one-star dimensional schema

The fact table name is usually suffixed fact and the suffix dim (short for dimension) is normally added to the name of a dimension table. It should be clear that the schema in Figure 1.1 has one fact table (sales_order_fact) and four dimension tables (customer_dim, order_dim, product_dim, and date_dim). The fact table contains one or more measurable facts (a measurable fact is called a measure, for short). Dimension tables categorize measures.

Every dimension table has exactly one surrogate key column. A surrogate key column is suffixed sk. Every surrogate key column in a dimension table has a similarly-named column in the fact table to make querying the database easier. However, the sk-suffixed columns in the fact table are not referred to as surrogate key columns.

The lines connecting the sales_order_fact table to the four dimension tables in Figure 1.1 indicate the joins for querying the tables. These joins are on the dimension tables’ surrogate key columns.

When building a dimensional data warehouse, you have to generate the surrogate key values within your data warehouse yourself; they do not come from the data source. Surrogate key values are sequential numbers.

Note

The section “Surrogate Keys” later in this chapter explains surrogate keys in detail.

Now that you already know the definitions of the star schema and the fact and dimension tables, let’s look at an example. Let’s say we’re interested in order amounts and decide that the order_amount column in the sales_order_fact table in Figure 1.1 is the measure. Table 1.1 contains a sample row from the sales order fact table.

Table 1.1: A sample row from the fact table
Open table as spreadsheet
customer_sk	product_sk	date_sk	order_sk	order_amount
1	1	1	1	1000

Related dimensions that correspond to the data in Table 1.1 are given in Tables 1.2 to 1.5.

Table 1.2: The related row in the customer dim table
Open table as spreadsheet
customer_sk	customer_no	customer_name
1	1	Dons Limited

Table 1.3: The related row in the product_dim table
Open table as spreadsheet
product_sk	product_code	product_name
1	1	Cangcung Hard Disk

Table 1.4: The related row in the date dim table
Open table as spreadsheet
date_sk	date
1	2007–02–01

Table 1.5: The related row in the order_dim table
Open table as spreadsheet
order_sk	order_number
1	1

The row from the fact table specifies a sales order value of $1,000. This is the measure of the fact. The value 1 in the customer_sk column in the sales_order_fact table corresponds to the same value (in this case, 1) in the customer_sk column in the customer_dim table. This relationship reveals that the consumer that placed the order is Dons Limited. By using the value of the product_sk column in the sales_order_fact table, you can trace the product information in the product_dim table. By joining the sales_order_fact table and the date_dim table on the date_sk column, you’ll get the order date. And by joining the fact table and the order_dim table on the order_sk column, you obtain the order number.