A data model is for a database designer what a box of colors is for a painter: it provides a means for drawing representations of reality. Indeed, it has been claimed that "data modeling is an art" (Hull, 1997), even if the product of this activity has the prosaic name of database scheme.
When a data model allows the designer to devise schemes that are easy to understand and can be used to build a physical database with any actual software system, it is called conceptual (Batini, Ceri, & Navathe, 1992). This name comes from the fact that a conceptual model tends to describe concepts of the real world, rather than the modalities for representing them in a computer.
Many conceptual data models exist with different features and expressive powers, mainly depending on the application domain for which they are conceived. As we have said in the introduction, in the context of data warehousing, it was soon realized that traditional conceptual models for database modelling, such as the entity-relationship model, do not provide a suitable means to describe the fundamental aspects of such applications. The crucial point is that in designing a data warehouse, there is the need to represent explicitly certain important characteristics of the information contained therein, which are not related to the abstract representation of real-world concepts, but rather to the final goal of the data warehouse: supporting data analysis oriented to decision making. More specifically, it is widely recognized that there are at least two specific notions that any conceptual data model for data warehousing should include in some form: the fact (or its usual representation, the data cube) and the dimension. A fact is an entity of an application that is the subject of decision-oriented analysis and is usually represented graphically by means of a data cube. A dimension corresponds to a perspective under which facts can be fruitfully analyzed. Thus, for instance, in a retail business, a fact is a sale and possible dimensions are the location of the sale, the type of product sold, and the time of the sale.
Practitioners usually tend to model these notions using structures that refer to the practical implementation of the application. Indeed, a widespread notation used in this context is the "star schema" (and variants thereof) (Kimball, 1996) in which facts and dimensions are simply relational tables connected in a specific way. An example is given in Figure 1. Clearly, this low-level point of view barely captures the essential aspects of the application. Conversely, in a conceptual model these concepts would be represented in abstract terms which is fundamental for concentration on the basic, multidimensional aspects that can be employed in data analysis, as opposed to getting distracted by the implementation details.
Figure 1: An Example of Star Schema
Before tackling in more detail the characteristics of conceptual models for multidimensional applications, it is worth making two general observations. First, we note that in contrast to other application domains, in this context not only at the physical (and logical) but also at the conceptual level, data representation is largely influenced by the way in which final users need to view the information. Second, we recall that conceptual data models are usually used in the preliminary phase of the design process to analyze the application in the best possible way, without implementation "contaminations." There are however further possible uses of multidimensional conceptual representations. First of all, they can be used for documentation purposes, as they are easily understood by non-specialists. They can also be used to describe in abstract terms the content of a data warehousing application already in existence. Finally, a conceptual scheme provides a description of the contents of the data warehouse which, leaving aside the implementation aspects, is useful as a reference for devising complex analytical queries.
Let us now investigate in more detail, but still informally, the fundamental ingredients of a conceptual data model for data warehousing. We start from the observation made above that the effectiveness of data warehousing modeling strictly depends on the ability to describe factual data according to appropriate dimensions, that is, "perspectives" under which data can be analyzed. For instance, in a data warehousing application for a retail company, it is useful to organize data along dimensions such as products commercialized by the company, stores selling these products, and days on which sales occur. To better support data analysis, it is useful to organize a dimension into a hierarchy of levels, obtained by grouping elements of the dimension according to the analysis needs. For instance, we might be interested in grouping products into brands and categories, and days into months and years. When the members of a level l can be grouped to members of another level l', it is often said that l rolls-up to l'. For instance, the level "product" rolls-up to the level "brand." A level usually has descriptive attributes (or simply descriptions) associated with it. For instance, descriptions of a store include its name, manager, and address.
Let us consider a more concrete example, which will be used as a simple case study throughout this chapter.
Example 1: The Toys4All company produces and sells a large number of products (mainly toys) in a chain of stores, over a wide territory.
A main business goal for this company could be to understand the impact of promotions on sales, that is, how promotions influence product sales and to what extent promotions are profitable. Another important business goal could be the analysis of the warehouse process, where inventory levels should be measured monthly, for each product and warehouse controlled by the company. It follows that possible dimensions of the Toys4All data warehouse application are Product, Store, Warehouse, Time, and Promotion. The Product dimension may be organized into levels such as item (whose members are products such as Disney's Dinosaur and Duplo Pooh), product-line (containing members like Mattel's Disney and Lego Duplo), brand (Mattel and Lego), category (Popular Characters and Blocks), and department (Action Figures and Blocks). The elements of the Time dimension describe days over a period of time; this dimension may be organized into the levels day, month, quarter, year, and season. A member of the level day might be February 27, 2001. Members of the level day can be grouped to members of the level month, but also to members of the level season (e.g., Carnival). Descriptions of the item level might be its name and code.
Traditionally, the entities of an application subject to decision-oriented analysis are called facts, and the specific and measurable aspects of a fact relevant for the analysis are known as measures. A collection of measures for the same fact can be nicely represented by means of a data cube (or hypercube) having a "physical" dimension for each "conceptual" dimension of measurement: a coordinate of the data cube specifies a combination of level members, and the corresponding cell contains the measure associated with such a combination.
Example 2: For the Toys4All company, a possible fact is the daily sale. This fact can be analyzed with respect to the day of the sale, the product sold, the store of the sale, and the promotion applied to the daily sale. The measurements made for each daily sale could include the number of units sold, the income, and the cost. Thus, a data cube Sales can be used to describe daily information about the items sold by the stores of the chain. An instance of this data cube can state the fact that on February 27, 2001 the store Colosseum has sold two pieces of Duplo Pooh, applying a Carnival 2001 Promotion, for a corresponding gross income of 19.98 Euros against a cost of 14.98 Euros.
In the warehouse process, measurable facts are the inventory levels, to be measured, for instance, monthly, for each product and warehouse. They can be modeled by means of a data cube Inventory. The measurements made for each monthly inventory could include the inventory level (the quantity in stock at the end of the month), the quantity shipped during the month, and the value at cost of the quantity in stock.
In the next section we will try to formalize the general notions discussed in this section.