Differences in Database Design Philosophies | Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications

There is a completely different design philosophy behind BI target databases as compared with operational databases. Table 8.1 summarizes the differences between these two types of databases.

Operational Databases

The intent of operational database design is to prevent the storage of the same data attributes in multiple places and thus to avoid the update anomalies caused by redundancy. In other words, from an operational perspective you want to avoid storing the same data in multiple columns in multiple tables so they do not get out of synch. Designing normalized database structures is key for developing relational databases in support of that intent. Normalization ensures that the data is created, stored, and modified in a consistent, nonredundant way.

Table 8.1. Operational Databases versus BI Target Databases

Operational Databases	BI Target Databases
Geared toward eliminating redundancy, coordinating updates, and repeating the same types of operations many times a day, every day (for example, airline reservations, deposits and withdrawals from bank accounts, hotel room reservations ).	Geared toward supporting a wide range of queries and reports. Queries and reports may vary from one business analyst to another or from one department to another. All of the queries and reports may not run on the same day and may not run every day (for example, quarterly trend analysis reports on regional sales, monthly order fulfillment report).
Most of the transactional systems require subsecond response time.	Although response time is important, subseconds cannot be expected. Typical response times are seconds, minutes, or hours.
Highly normalized to support consistent updates and maintenance of referential integrity.	Highly denormalized to provide quick retrieval of a wide range and a large amount of data. Data that belongs together from an analytical reporting perspective is usually stored together.
Store very little derived data. Data is usually derived dynamically when needed.	Store large amounts of derived data. This saves time for the queries and reports.
Do not store historical data. Historical records are archived.	Store large amounts of historical data, often at some level of summarization, but just as often at a detailed level.
Lightly summarized, mostly for reporting purposes.	Many levels of precalculated, summarized data, from lightly summarized to highly summarized.

Most operational systems are designed with a data-in philosophy (data entry), not a data-out philosophy (reporting and querying). The objective of a data-in philosophy is to make data entry as efficient as possible, running hundreds of thousands of transactions per day, while eliminating or minimizing redundancies in the data. Data redundancy leads to inconsistencies, and inconsistencies are often the reason for poor-quality data. Therefore, in trying to solve the enormous data quality and data redundancy problems in operational systems, the goal is to avoid redundancy (except for key redundancy, which is unavoidable). This goal is achieved through normalization.

While normalization works well for operational systems, the requirements for reporting are different from the requirements for data entry. Reporting uses data that has already been created, which means update anomalies cannot occur. While it is of great benefit that the data is consistent and nonredundant as a result of a normalized database design, that same design makes reporting difficult. For example, to create strategic trend analysis reports, many tables have to be accessed, and every row in those tables has to be read. This is not only complex but also extremely inefficient when run against a normalized database design because it requires scanning tables and performing large multi-table JOINs. For that reason, most BI target databases are based on a multidimensional design, in which the data for the strategic trend analysis reports is stored in a precalculated and presummarized way.

Figure 8.1 illustrates the general difference between an operational normalized design and a BI multidimensional design.

Figure 8.1. Operational normalized versus BI Multidimensional Designs

graphics/08fig01.gif

In this example, the operational database design shows an Order database where customers are associated with orders, and each order is composed of many line items. With each placed order, the line items have to be subtracted from a separate Inventory database. The BI target database design shows a database with summaries that are used to identify trends over time. In this design, the same data about orders, line items, and inventory may exist in multiple tables (Monthly Summary, Regional Summary, Product Summary), albeit summarized by different dimensions. While operational databases generally store granular (atomic) data, BI target databases, for the most part, store summarized data.

BI Target Databases

Contrary to the data-in philosophy (data entry) of operational systems, the data-out philosophy (reporting and querying) of BI applications includes the following design considerations.

BI target databases are designed for simplified, high-performance data retrieval, not for efficiency of data storage and maintenance (which are important design considerations for operational databases).
Eliminating or minimizing data redundancy is not a goal in designing BI target databases. If a choice must be made, data redundancy is favored over complexity, but the redundancy must be controlled. Redundant data must be consistent and reconcilable.
Basic assumptions for designing BI target databases are listed below.

- Data is stored in such a manner that it is readily accessible in ways that are of interest to the business people.

- The design is driven by access and usage.

- A normalized design is not necessarily intuitive for a business person and could therefore become quite complex.

- No BI data can be invented! All data in the BI target databases must exist in or be derivable from current internal or external operational data sources.

A key decision for all BI applications is whether or not, and at what level, to store summarized data in the BI target databases. The database administrator and the lead developer may decide to store both detailed data and summarized data, either together in the same BI target database or in different BI target databases. This database design decision must be based on access and usage requirements.