

At the abstract level, considering its typical requirements, a multidimensional database (MDDB) is a data repository that provides an integrated environment for decision support queries that require complex aggregations on huge amounts of historical data (e.g., to identify trends in revenues). At the technical level, we assume for this chapter that an MDDB is a relational data warehouse, in which the information is organized following the starmodel. Its basic structure may be represented with the simple entityrelationship diagram depicted in Figure 1, in which all the D_{i} entities represent the dimensions of the MDDB, while the connecting relationship F is the fact table.
Figure 1: EntityRelationship Representation of an MDDB
Each dimension table D_{i} contains all the information that is specific only to the dimension itself (for example, product information), while the fact table F correlates all dimensions and contains information on the attributes of interest for the intersection of all the dimensions (for example, sales by product by day).
Multidimensional queries usually require the computation of aggregate functions on data grouped on an appropriate set of attributes. For multidimensional data processing, data is usually represented as a single (huge) fact table, on which all the interesting aggregates may be computed.
As shown in the practical example below, in addition to the fact table, MDDBs may have several dimensions, each of which is characterized by a considerable number of attributes, most of which may also be relevant for grouping computation.
We consider as a practical example the MDDB for a large grocery store chain, taken from Kimball (1996). The company manages a large number of stores, each of which is a supermarket selling a wide variety of different products (e.g., grocery, frozen foods, bakery, etc.).
We can identify the following dimensions:
Product, which can be characterized by more than 50 different attributes describing various features of the products, such as brand, category, diettype (diet vs. nondiet), package type, weight, case size. Products belong to a merchandise hierarchy, which includes subcategory and category. This means that all the products that belong to the same subcategory also have the same value for attribute category.
Store, which characterizes each point of sale with attributes such as complete store address and telephone, manager, description of store services, sizes of different departments. Stores are organized in a geographic hierarchy (zip code, county, state). The store dimension may have more than 20 attributes.
Time, which provides the appropriate detail to allow accurate analysis of the MDDB data, that is indeed a time series. Time is represented with the granularity of day; days may be described as day in the month, in the quarter, and in the year. Furthermore, the indication of which days are holidays, weekend days, days in which special events took place (and many more) can be given. The time dimension may have more than 15 attributes.
Promotion, which describes the characteristics of product promotions, such as the promotion type (ad, coupon, etc.) and several pieces of information on each promotion type, the promotion cost, and start/end date. Overall the promotion dimension is characterized by at least 10 attributes.
The fact table provides the sales information on which the actual financial analysis is performed. It includes the identifiers of all the dimensions and several attributes describing sales revenues (e.g., income, number of units sold, number of customer transactions performed, etc.). In this chapter we consider a subset of the attributes of each dimension as relevant attributes for grouping computations: we assume 15 attributes for dimensions Product and Store, nine attributes for Time, and 11 attributes for Promotion.
MDDBs have three features that make them an ideal environment for the adoption of materialized views.
MDDBs are implemented upon data warehouse architectures, where updates occur with a rigid low frequency (daily, weekly, or even monthly), in periods where the system load is low; thus, ratio f_{Q} / f_{U} is particularly high.
MDDBs have a data model that permits the easy adoption of incremental view maintenance techniques; often, materialized views can be maintained considering only the updates and the current state of the materialization, with no need to access base tables.
Queries in MDDBs can be characterized by a model that permits a precise and manageable definition of the view selection problem.
We now describe the multidimensional data model and the query model. We will then present the most important techniques for the maintenance of materialized views and the MDmat problem, that is the problem of chosing an optimal set of views to materialize.
We repeat for convenience some definitions which were previously introduced.
Definition 1: A Multidimensional Database is a collection of relations D_{1}, …, D_{n}, F, where 0
Each D_{i} is a dimension table , i.e., a relation characterized by an identifier d_{i} that uniquely identifies each tuple (d_{i} is the primary key of D_{i}).
F is a fact table , i.e., a relation connecting all dimension tables D_{1} , …, D_{n} ; the identifier of F is given by the foreign keys d_{1} , …, d_{n} of all the dimension tables it connects; the schema of F contains a set of additional attributes V (representing the values on which the aggregate functions are applied).
The dimension tables may contain hierarchies on the data.
Definition 2: Let D be a dimension table with identifier d. An attribute hierarchy on D is a set of functional dependencies FD_{D} = {fd_{0} , fd_{1} , …, fd_{n} }, where each fd_{i} is characterized by two sets of attributes A ^{l}_{i} ⊂ Attr(D) and A^{r}_{i} ⊆ Attr(D) (respectively called left side and right side of the dependency); the dependency is represented as fd_{i} : A ^{l}_{i} → A^{r}_{i}.
Each functional dependency fd_{i} is a constraint on the content of the dimension table D: for any pair of tuples t_{1}, t_{2} ∊ D, t_{1}[A^{l}_{i}]= t_{2}[A^{l}_{i}]⇒ t_{1}[A^{r}_{i}]= t_{2}[A^{r}_{i}]. A dependency fd_{0} with A ^{l}_{0} = {d} and A ^{r}_{0} = {Attr(D)− d} will always be present in FD_{D}. A given set of attributes A^{l}_{i} may appear only once as the left side of a dependency. Functional dependencies must be acyclic, i.e., the graph obtained by drawing an arc from a_{x} to a_{y}, if ∃fd_{i} ∊ FD_{D}  a_{x} ∊ A^{l}_{i} ∧ a_{y} ∊ A^{r}_{i}, must be acyclic.
Definition 3: An attribute hierarchy is treelike (FD ^{T} ) if all the attributes appear at most once in the right side of a functional dependency and the left side always contains a single attribute.
In this chapter we consider only treelike attribute hierarchies.
Definition 4: An MDDB attribute hierarchy FD_{DB} is the union of the attribute hierarchies FD_{Dj} of all the dimensions D_{j} appearing in the MDDB.
Business analysis queries usually require the computation of aggregate functions on data grouped on an appropriate set of attributes.
Example 1: Consider again the multidimensional database of our example, MDDB = {Product, Store, Time, Promotion, F}, where each dimension table has its corresponding attributes and with fact table F ={p, s, d, r, f}, having p, s, d (time dimension is represented with the granularity of day), and r as foreign keys of the dimension tables, and f representing the amount of sales. The following queries can be requested on the MDDB:
q_{1} = total sales per product
q_{2} = total sales per product and store
q_{3} = total sales per product and day
q_{4} = total sales per product, store and day
The queries we consider are selectjoingroupby queries, with some restrictions on the allowed selection and join predicates. In particular, selection predicates are simple comparison predicates between a dimension attribute and a constant value (i.e., are of the type attr <operator> constvalue). These predicates select a "slice" of data on which aggregates are actually computed. We envision a framework where the user provides a set of queries representative of the queries that the system must answer. For select queries we assume that only the attribute used is important and not the constants used in the comparison (which will generally change from query to query). To simplify the framework, we derive from a selection condition on an attribute a request for aggregating on that attribute. Consider a query q that returns the total sales of a particular store grouped by products. Query q can be answered by accessing the result of a query that returns the sales grouped by product and store, and selecting from them only the tuples relative to the specified store. In this way we can focus on queries computing aggregates. These considerations lead to the restriction that all the attributes appearing in a selection predicate must also appear as grouping attributes.
Join operations may be performed only between the fact table and any of its dimensions. Allowed predicates are equality predicates between a dimension identifier and the corresponding fact table foreign key, while join predicates involving nonkey attributes are disallowed.
We consider the standard SQL notion of groupby and aggregate function (considered functions are count, sum, avg, min, max). Grouping attributes may be drawn both from dimension and fact table attributes.
Definition 5: Given a query q, we define the query as characterized by the set of its groupby attributes A. We may represent the query as q ^{A}.
A fundamental characteristic of MDDB queries is that it is often convenient to reuse the results of queries to answer other queries. In Example 1, we can use the result of query q_{2} to answer query q_{1}, adding the sales across all stores to get the result. In a similar way, q_{1} can also be computed in terms of the answers to queries q_{3} and q_{4}, while q_{2} and q_{3} can be computed in terms of query q_{4}.
The reuse of queries is strictly related to the datacube (Gray et al., 1996) operator. This operator, receiving as input a table T, a set of aggregating attributes A, and an aggregate function f, computes the union of the results of the queries evaluating function f, having as grouping attributes all possible combinations of attributes in A.
Consider the situation where only the fact table F is present. Applying the data cube operator on the foreign keys of F, we obtain a data cube lattice.
Definition 6: Given an MDDB = {D_{1} , …, D_{n} , F}, the lattice Cubelattice of MDDB is the lattice of the set of all possible grouping queries that can be defined on the foreign keys of F. This lattice is characterized by the following elements:
an ordering relation defined as the comparison between the sets of grouping attributes (i.e., q^{A1} ≼ ” q^{A2} ↔ A_{1} ⊆ A_{2});
meet operator as union of the grouping attributes;
join operator as intersection of the grouping attributes;
the query grouping on all the foreign keys as top element;
the query computing the aggregate function on all the tuples of F as bottom element (empty set of grouping attributes).
Requested queries are associated with a subset of the views of the datacube lattice of MDDB. In Agrawal, Chaudhuri, & Narasayya (2000). the subset of views is considered which appears in the execution plans of representative queries.
Example 2: Consider again the MDDB of Example 1. Figure 2 represents the datacube lattice derived from the fact table F and shows the elements that have a query associated.
Figure 2: DataCube Lattice with Associated Queries
The presence of dimensions significantly increases the size of the problem. The first aspect is the growth in the number of potential grouping attributes, which exponentially increases the number of elements of the lattice. The second aspect is the presence of hierarchies, which permits removal of some elements from the lattice. In fact, consider a query grouping on a dimension key d_{i} and also on an attribute a_{j} of the same dimension D_{i}. Since there exists a functional dependency from d_{i} to a_{j}, a query grouping on {d_{i} , a_{j}} must produce the same result of the query grouping on {d_{i}}. This observation is the basis for the following generalization.
Definition 7: Let q_{i}^{Ai} and q_{j}^{Aj} be two queries and FD_{DB} the MDDB attribute hierarchy. The operator ancestor (represented by the symbol ⊕) is defined by the following algorithm:
Algorithm 1: Ancestor of two queries.
Algorithm 1 operates by building the union of the attributes characterizing the queries and eliminating all the elements for which there exists a functional dependency in FD_{DB}.
The result of applying the operator ⊕ to queries q_{x} and q_{y} is the smallest view that contains all the information necessary for answering q_{x} as well as q_{y}. If applied in a reflexive way (i.e., q_{x}^{Ax} ⊕ q_{x}^{Ax}), it eliminates all redundant attributes.
Example 3: Consider the Store dimension of the MDDB described earlier, with key s and restricted to a set of attributes {z, c, st, n} representing respectively zip code, county, state, and number of sale clerks. The dimension has the following treelike attribute hierarchy: { fd_{0} : s → {z, c, st, n}, fd_{1} : z → c, fd_{2} : c → st}.
The queries q ^{{n}} , q ^{{c}} , and q ^{{st}} can for example be computed from q ^{{c,n}} = q^{{n}} ⊕ q^{{c}} ⊕ q ^{{st}}.
Definition 8: The operator descendent (represented by symbol ⊖ ) is defined by the following algorithm:
Algorithm 2: Descendent of two queries.
Algorithm 2 first extends the parameters A_{x} and A_{y} with all the attributes that can possibly be derived from them. It then considers their intersection A_{z} and finally removes from it the right sides of dependencies whose left side is contained in A_{z}.
The descendent operator computes the "greatest" among the set of attributes characterizing the queries that can be computed by both q_{x} and q_{y}.
Definition 9: Let {D_{1} , …, D_{n} , F} be a multidimensional database and FD_{DB} the MDDB attribute hierarchy. Consider the set of queries characterized by all the combinations among the attributes of {D_{1} , …, D_{n} , F}, except the combinations that contain elements on both sides of a functional dependency fd_{i} ∊ FD_{DB} . This set of queries identifies a lattice where:
⊕ is the meet operation;
⊖ is the join operation;
the ordering relation is given by the following definition: q^{A1} ≼ q^{A2} ↔ (A_{1} ⊖ A_{2}) = A_{1} ↔ (A_{1} ⊕ A_{2}) =A_{2};
the query grouping on all the foreign keys of F, {d_{1} , …, d_{n}}, is the top element;
the query computing the aggregate function on all the tuples of F is the bottom element (empty set of grouping attributes).
We call this lattice the MDlattice (multidimensional lattice).
Comparing Definitions 6 and 9, it is easy to observe that without hierarchies the MDlattice is equivalent to the Cubelattice built on all the schema attributes.
Example 4: Consider the Store dimension of Example 3. The MDlattice for an MDDB where Store is the only dimension is represented in Figure 3.
Figure 3: MDLattice of the Store Dimension
An MDlattice defines all possible ways of computing queries that can be defined on an MDDB in terms of other queries defined also on MDDB. In fact, if a query q_{i} ≼ q_{j} then q_{i} can be answered using q_{j}. This property can be generalized.
Definition 10: Let q_{i} and q_{j} be two queries on an MDDB. Then, the least upper bound (l.u.b.) of q_{i} and q_{j} in the MDlattice is the query obtained as a result of evaluating q_{i} ⊕ q_{j} , and it represents a query (the most specialized) which can be used to answer both q_{i} and q_{j}.
We now distinguish between queries and views. In the following we will use the term query to refer to the representative queries provided by the user, while we will use the term view to refer to the elements of the lattices. When a query q computes the content of a view v of a lattice, we say that the query q is associated with view v. We also extend to views the notation for the queries and say that a view is characterized by a set of attributes A if A is the set of grouping attributes of v, represented as v^{A}.
Definition 11: A query q_{i} is said to depend on view v_{j} if the query can be answered using as only inputs the content of v_{j} together with any appropriate dimension D_{i}.
There is a strict relationship between dependence of a query on a view and the ordering relation on an MDlattice. In fact, if we consider a query q_{i} depending on view v_{j} and the view v_{i} associated with the query, it must happen that v_{i}≼ v_{j}.
An important relationship exists between the ordering of the elements in the lattice given by relation ≼ and the cardinalities (i.e., number of tuples) of the views. In fact, for each pair of views v_{i} , v_{j} ∊ MDlattice, if v_{j} ≺ v_{i}, v_{j} can be computed by v_{i} with a further grouping. Since grouping can only reduce the number of tuples, it follows that v_{j} ≤ v_{i}, where v represents the cardinality of v.
The number of views of an MDlattice depends on the number of attributes of the dimensions of the MDDB and on the number and structure of the attribute hierarchies.
In the absence of hierarchies on the dimensional tables, i.e., when each dimension has a hierarchy that contains only the functional dependency between the key of the dimension and all its attributes, the number of views of the MDlattice is given by the formula:
where i is the number of dimensions of MDDB and n_{i} is the number of nonkey attributes of each dimension D_{i}. In fact, the number of views of the MDlattice is obtained as the product of the number of views of the lattices that can be built on every dimension. Each dimension D_{i} generates a lattice which has 2^{ni} elements representing all the possible combinations of n_{i} attributes (exactly like a Cubelattice), plus one view which represents the view characterized by the key d_{i}.
In the presence of hierarchies, the number of views in the local lattices is reduced and the above formula for n_{total} constitutes an upper bound. A formal treatment of the cardinality of the MDlattice in the presence of hierarchies is beyond the scope of this chapter. We only provide as an example the determination of the number of elements for the MDlattice of the MDDB previously described.
Example 5: Consider the MDDB described earlier in this chapter. The number of views of the lattices that can be built on every dimension are: n_{Product} = 12,289; n_{Store} = 8,193; n_{Time} = 129; n_{Promotion} = 1025. The total number of views of the resulting MDlattice is then the product of all these values, obtaining  MDlattice  = 1.3313 · 10^{13}.

