Logical Database Design


Because of the differences in intent and purpose between operational systems and BI applications, different database design techniques have been devised for BI target databases. These highly denormalized designs store aggregated and summarized data in a multidimensional fashion. Logical database designs are documented as physical data models with technical meta data.

Aggregation and summarization are probably the most significant contributors to good BI application performance. If most business analysts need to see their data summarized, these totals should be precalculated and stored for quick retrieval. It is important to discuss the level of granularity with the business representative, as well as with other business analysts who will be using the BI target databases, since they will expect the database design to allow them to drill down to a certain level of detail.

Multidimensional database designs support the quick retrieval of a wide range of data. Two popular multidimensional design techniques are the star schema and the snowflake schema, both described below.

The Star Schema

In a star schema, data is represented as an array of precalculated values, called facts, around which analysis is performed. These precalculated facts represent atomic operational data values that have been presummarized by certain dimensions, such as customer, product, and time. A dimension in a star schema is similar to an entity in a logical data model: it is a business object about which data is collected for business purposes.

The star schema mirrors the view of a business query. As the name implies, the star schema has a single object in the middle, called the fact table, which is connected in a radial fashion to a number of objects, called dimension tables. Figure 8.2 presents an example of a star schema.

Figure 8.2. Star Schema

graphics/08fig02.gif

A star schema has two, and only two, levels: the fact table and a series of single-level dimension tables. Fact tables have the following characteristics:

  • A fact table represents a critical business event (a business activity or transaction, such as a sale or a claim).

  • The facts are the quantifiable aspects of the business event; that is, they are columns in the fact table.

  • A fact table links to its related dimension tables (business objects, such as customer or product).

  • A fact table has a long composite key comprised of the primary keys of the related dimension tables (which are foreign keys in the fact table).

  • A number of highly redundant fact tables may exist for a given subject area. Each fact table could contain a different aggregation level of the same data. For example:

    - Sales facts by store by region by date

    - Sales facts by product by store by date

    - Sales facts by customer by region by date

  • Fact tables are long and narrow: the tables have an immense number of rows (long), but there are relatively few columns in the tables (narrow).

    Dimension tables have very different characteristics.

  • Dimension tables are business objects, which represent the different perspectives from which the facts in a fact table can be viewed and analyzed .

  • Dimension tables usually have a one-attribute primary key.

  • Dimension tables are denormalized, which means that data belonging together from a specific business perspective, such as a roll-up hierarchy, is grouped together into one table. This produces some redundant data values, which is acceptable in this design schema.

  • Dimension tables are short and wide: the tables have relatively few rows (short), but there are many columns in the tables (wide).

  • Whenever possible, dimension tables should be shared by the fact tables (conformed dimensions).

  • One dimension is always a time dimension with attributes describing the timestamp, such as calendar year, quarter, season , fiscal period, or accounting period. Some other examples of common dimension tables are customer, product, policy, sales representative, region, and store.

Most multidimensional DBMSs effectively deal with the optimization of large multi-table JOINs. One method for determining whether the DBMS is resolving the query efficiently is to look at the optimized plan for the query. For example:

  • If the fact table is the last table JOINed, this is an indicator of optimization. If the fact table appears to be somewhere in the middle, or even somewhere toward the beginning, the DBMS may not be resolving the JOIN optimally, unless it uses more sophisticated JOIN algorithms.

  • If the DBMS does not use Cartesian product JOINs, the DBMS may take the qualifying row keys and apply them against a composite fact table index, or it may apply them via an index intersection against multiple fact table single-column indices.

In either case, verify that your DBMS is executing multidimensional queries in the most efficient manner since your performance depends on it.

The star schema is the most popular database design schema for BI applications for a variety of reasons.

  • It yields the best performance for trend analysis queries and reports that include years of historical data.

  • It provides maximum flexibility for multidimensional data analysis.

  • It is supported by most of the relational DBMS vendors with modifications to their DBMS optimizer.

  • Its simplicity makes complex data analysis much less difficult than with a standard normalized design. It is much easier to ask questions such as the following:

    - Which insurance broker is giving us the most or the least lucrative business?

    - What are the most frequently occurring types of claims from this insurance broker?

    - When are these claims occurring?

The preceding questions are typical drill-down questions (asking for more detailed data) and typical roll-up questions (asking for more summarized data).

The Snowflake Schema

A snowflake schema is a variation of a star schema, except in a snowflake the points of the star radiate into more points, as shown in Figure 8.3.

Figure 8.3. Snowflake Schema

graphics/08fig03.gif

In snowflake schemas, the levels of the hierarchies in the dimension tables are normalized, thereby increasing the number of tables. Table 8.2 lists the advantages and disadvantages of snowflake schemas.

Table 8.2. Advantages and Disadvantages of Snowflake Schemas

Advantages

Disadvantages

  • The size of the dimension tables is reduced and data value redundancy is avoided because parent-child hierarchies are no longer collapsed .

  • The increased number of tables may adversely affect query performance because of the necessary additional JOINs.

  • Application flexibility is increased.

  • Database maintenance effort is increased because there are more tables to maintain.



Business Intelligence Roadmap
Business Intelligence Roadmap: The Complete Project Lifecycle for Decision-Support Applications
ISBN: 0201784203
EAN: 2147483647
Year: 2003
Pages: 202

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net