

In the multidimensional model developed by Hurtado and the authors (Hurtado, Mendelzon, & Vaisman, 1999), dimensions are defined as nontemporal structures like the geography dimension of Figure 1. A dimension in this model has a schema, and instances associated to this schema which may evolve through time, although dimension versioning is not supported. In the Temporal Multidimensional Model we present in this section, we timestamp dimension elements at the schema and instance level in order to keep track of the updates that occur during the dimension's lifespan.
We will consider time as discrete; that is, a point in the timeline, called a time point, will correspond to an integer. We will also assume, except when noted, instant t0 to be the dimension's creation instant.
Temporal Dimensions. The following sets are defined: a set of level names L, where each level l ∊ L is associated with a set of values dom(l); a set of attribute names A, such that each attribute a ∊ A is associated with a set of values dom(a); a set of temporal dimension names TD; and a set of fact table names F.
Definition 1 (Temporal Dimension Schema): A temporal dimension schema is a tuple (dname, L, λ, ≼, A, >>, μ) where:
dname ∊ TD is the name of the temporal dimension.
μ is a level in the Time dimension. Intuitively, μ defines the granularity of the dimension dname. It represents transaction time.
L ⊆ L is a finite set of levels, which contains a distinguished level name All, s.t. dom(All) = {all}. This distinguished level is considered valid during the complete lifespan of the dimension.
"λ" is a function with signature dom(μ) → L, defining the instants when each level was part of the dimension.
"≼" is a function with signature dom(μ) → 2^{LxL}, such that for each t ∊ dom(μ), ≼, is a relation such that ≼*_{t}, the transitive and reflexive closure of ≼t, is a partial order, with a unique bottom level, 1_{inf} ∊ λ (t), and a unique top level, All, where, for every level and ≼ *_{t} All hold.
A is a finite set of attributes.
"≫" is a function with signature dom(μ) x A → L.For every level l ∊ λ (t), a ≫_{t} l means that if the function is applied to an attribute a, it returns the level l such that attribute a belongs(ed) to level l at time t.
The structure introduced by Definition 1 suggests that, given an appropriate language, it would be possible to query the schema of a dimension (using the functions ≼ and ≫). Note that l_{inf} is unique at any time instant t, although it may not be the case at different times. Also observe that taking a snapshot of the schema at a given time t, we get a nontemporal dimension schema.
Notation: In the figures of this chapter, a label t_{i} associated to an edge in a graph, will mean that the edge is valid for all t ≥ t_{i}, and a label t*_{i}, that the edge was valid for all t < t_{i}. If an edge has no label, it is considered valid for all of the dimension's lifespan. An interval [t_{i}, t_{j}) means that the edge is valid from t ≥ t_{i}, to t < t_{j}.
Definition 2 (Temporal Dimension Instance): A temporal dimension instance is a tuple (D, TRUP, TDESC), where D is a temporal dimension schema, and:
TRUP (temporal rollup) is a set of functions, satisfying the following conditions:
For every instant t ∊ dom(μ), and for each pair of levels such that l_{1} ≼tl_{2t}, there exists in TRUP a rollup function: dom (l_{1}) → dom(1_{2}). Thus, a function is defined for every snapshot taken at any instant t ∊ dom(μ).
For every instant t in the dimension's lifespan, and for every pair of paths τ_{1} and τ_{2} in the graph with nodes in λ(t) and edges in ≼_{t}, such that τ_{1} = < l_{1}, l_{2}, …, l_{k}, l_{n} >, and τ_{2} = < l_{1}, l'_{2}, …, l'_{k}, l_{n} >, we have
At every instant t of the dimension's lifespan, and for each triple of levels l_{1}, l_{2}, l_{3} ∊ λ(t) such that l_{1} ≼_{t} l_{2} and l_{2} ≼ _{t} l_{3}, ran
TDESC (temporal description) is a set of functions such that for every instant t ∊ dom(μ), and for each level l ∊ λ(t) and for each attribute a such that a >>_{t} l, there exists in TDESC a function with signature : dom(l) → dom(a).
We will call the second condition in Definition 2 snapshot consistency.
Example 1: Figure 3 depicts a schema and an instance for a dimension Store. Here, dname = Store, L = {storeId, storeType, city, region, All}, μ = day. Initially "≼" contains the following pairs: storeId ≼ city, city ≼ region, region ≼ All. The addition of a new level storeType above level storeId at time d_{1} will modify "≼" as follows: ≼ = {storeId ≼ city, city ≼ region, storeId storeType, region ≼ All, storeType All}.
Figure 3: The Temporal Dimension "Store"
Figure 3 also shows a temporal dimension instance for dimension Store. We see that the rollup functions with no label are valid for the whole lifespan of Store, while for example is valid from d_{1}on.
Definition 3 (Active Instance Set): Given a temporal dimension d, a level l ∊ L, and an instant t, the set of elements belonging to dom(l) at time t is called the Active Instance Set of l. We denote it Ainstset (l,t).
Example 2: In Figure 3, Ainstset (stereId, d) = {s_{l}, s_{2}, s_{3}, s_{4}}, for all d. However, deleting s_{1} at time d_{3} will yield Ainstset (storeId, d_{3})= {s_{2}, s_{3}, s_{4}}.
The previous definitions set the basis for a temporal data warehouse. We will introduce the idea through the following example.
Example 3: Figure 4 shows a sequence of updates to a temporal dimension Product. Initially, Product consisted only of level itemId, and the distinguished level All. After that, the brand was added to the dimension, although the initial state is not lost (Figure 4(b)). Later, the type of the item is inserted, with level name itemType. The company to which an item belongs is also added above level brand at time d_{5}, and level brand is deleted at time d_{9} (Figure 4(d)). Note that an edge is added from level itemId to level brand, valid from time d_{9}on.
Figure 4: A Series of Updates to Dimension "Product"
A slice or snapshot (in temporal database terminology) of a dimension, taken at a given instant t, defines the state of the dimension at that time. For instance, Figure 5 shows a snapshot of dimension Product at d_{4}.
Figure 5: A Snapshot at "d4"
Definition 4 (Temporal Fact Table): A temporal fact table schema is a tuple s = (fname, f, m, μ), where m is a level name, called the measure of the fact table μ, is a level in the Time dimension, and f is a function with signature dom(μ) → 2^{L}.
Given a temporal fact table schema (fname, f, m, μ), a set of levels L in the range of f, and a level μ in the Time dimension, a mapping from each level l_{i} ∊ (L ∪ μ) to dom(l_{i}) is called a point.
Given a temporal fact table schema s = (fname, f, m, μ), a temporal fact table instance over it is a partial function named f name which maps points of s to elements in dom(m).
Definition 5 (Temporal Base Fact Table): Given a set D of temporal dimensions, a temporal base fact table is a temporal fact table with schema (fname, f_{D}, m, μ), where f is a function with signature dom(μ) → 2^{L}, such that for each t ∊ dom(μ), every level in f_{D}(t) is a bottom level of the dimension it belongs to. Thus, a temporal base fact table is a temporal fact table such that its attributes are the bottom levels of each one of the dimensions in D.
Example 4: Given D = {Store, Product} where dimensions Store and Product are those of Figures 3 and 4 respectively, the Temporal Base Fact Table associated to D would have f_{D}(t) = {storeId, itemId}for all t. If updates occur such that, for instance, at time d12, brand becomes the bottom level of dimension Product, f_{D}(d12) = {storeId, brand}.
Definition 6 (Temporal Multidimensional Database):
A temporal multidimensional database schema, denoted B_{s}, is a pair (D_{s}, F_{s}), where D_{s} is a set of temporal dimension schemas, and F_{s} is a set of temporal fact table schemas.
A multidimensional database instance I(B), is a tuple (F_{I}, D_{I}), where D_{I} and F_{I} are dimension and fact table instances defined as above.
In temporal databases, in general, instances are represented using valid time, this is, the time when the fact being recorded became valid. On the contrary, a database schema holds within a time interval that can be different from the one within which this schema was valid in reality. Changes in the real world are not reflected in the database until the database schema is updated. Schema versioning, thus, is often related with transaction time. In our temporal multidimensional model, things are different than above: an update to a dimension schema modifies the structure as well as the instance of the dimension. Figure 3 shows a dimension where schema and instance updates occurred. When storeId becomes generalized to storeType, the associated rollups must correspond to the instants in which they actually hold. Thus, we consider that temporal dimensions are represented by valid time. It is straightforward to extend the model for supporting temporal dimensions with valid and transaction times.
Fact table instances can be represented in our model using valid and transaction times. In order to accomplish this, two time dimensions must be defined. Fact table schema versioning is represented using transaction time. When a bottom level of a dimension is deleted or a level is added below it, the schemas of the associated fact tables change and new versions are defined for them.
In the previous section we introduced a model accounting for temporal dimensions, i.e., dimensions that evolve across time, allowing us to keep track of the dimension's history. Let us call this dimension versioning. We will show that typical OLAP queries involving temporal dimensions admit different interpretations, which should be considered in order to give the user the correct answers to queries. Further, we will introduce the TOLAP query language, which supports the model presented in the previous section.
One might argue, at first sight, that a generic temporal query language like TSQL2 (Snodgrass, 1995) could be used instead of defining a specialpurpose one like TOLAP. There are two main reasons supporting the idea of introducing a new language. First, a language designed specifically for the multidimensional model makes typical OLAP queries much more concise and elegant. In a generic language, queries would have to be laboriously encoded using detailed knowledge of the lowlevel relational structures used to encode the dimensional data. Second, the bestknown temporal query languages such as TSQL2, support only a minimal level of schema versioning (Snodgrass, 1995, p.29).
Another alternative would have been adding temporal features to other languages with schema management features, such as HiLog (Chen, Kifer, & Warren, 1989) or SchemaLog (Lakshmanan, Sadri, & Subramanian, 1993). Again, using a language specifically designed for OLAP yields much simpler syntax and semantics, and just the highorder features that are needed to support schema evolution.
The following example will show that a temporal OLAP query language is needed in order to capture the particular characteristics of OLAP queries expressed over a set of temporal dimensions.
Let us consider again a retail data warehouse, with a set of temporal dimensions D = {Product, Store}, and a base fact table with schema (Sales, f, sales, day). Dimensions Store and Product are the ones in Figures 3 and 6, respectively. The dimension levels in the fact table are {itemId, storeId}, the bottom levels of the dimensions in D. Notice that in Product, item i_{1} has type ty_{1} until day d_{4}, and type ty_{2} since then. For the Time dimension we assume: (and the rollups from week to All). Finally, we have the following instance for the Sales fact table (day is displayed for the sake of clarity, but could have been omitted, like in TSQL2):
Figure 6: Dimension Product for the Running Example
Let us now suppose the query: "List the weekly total sum of sales, by city and item type." Two interpretations could be given to this query. The usual one would compute the sum of sales considering the type an item had when it was sold. In this case, for instance, item i_{1} would contribute to the aggregation in the following way: the first three tuples, with a total of 800, will add to the group {ty_{1}, c_{1}, w_{1}}, while the last one will contribute to {ty_{2}, c_{1}, w_{2}}.
itemId  storeId  day  sales 

i_{1}  s_{1}  d_{1}  600 
i_{2}  s_{2}  d_{1}  100 
i_{2}  s_{1}  d_{2}  100 
i_{3}  s_{2}  d_{2}  100 
i_{3}  s_{3}  d_{3}  100 
i_{3}  s_{4}  d_{4}  100 
i_{1}  s_{1}  d_{5}  100 
The second interpretation, the only one currently supported by nontemporal systems, would compute the sum of the sales considering that each sold item has the current type, regardless of the time the sale occurred. The result a user would get under this interpretation is given by the table below, and was computed in the following way: the rollup function for every occurrence of item i_{1} is set to:
The results will be given by the following table.
itemType  city  week  sales 

ty_{1}  c_{1}  w_{1}  800 
ty_{2}  c_{1}  w_{1}  100 
ty_{2}  c_{2}  w_{2}  200 
ty_{2}  c_{2}  w_{2}  100 
Thus, all the i_{1} tuples will contribute to type ty_{2}. For instance, the first tuple will now contribute to the group {ty_{2}, c_{1}, w_{1}}. The following table shows the result the user would obtain under the second interpretation.
We will first present TOLAP by means of examples, and then define its syntax and semantics. It is interesting to notice that the queries we will present operate at a high level of abstraction, without requiring lowlevel knowledge about the database design that underlies the dimensional model.
We will work with a set of dimensions D = {Store, Product}, where Store is the dimension in Figure 3 and Product is the dimension depicted in Figure 4; the base fact table Sales is the one introduced at the beginning of this section.
itemType  city  week  sales 

ty_{1}  c_{1}  w_{1}  200 
ty_{2}  c_{1}  w_{1}  700 
ty_{2}  c_{2}  w_{2}  200 
ty_{2}  c_{2}  w_{2}  100 
Simple Queries. We begin with queries not involving aggregates.
Example 5: A query returning the sales for stores in Buenos Aires, on a daily basis, will be expressed in TOLAP as
In TOLAP, the query above returns the tuples in Sales such that s rolled up to ‘Buenos Aires’ at the time of the sale, where s represents an element in the instance set of the lowest level of the dimension Store. The query is interpreted as follows: a tuple in Sales will be in the result if an item i was sold in store s on a day t if s was settled in Buenos Aires on that day (recall that the granularity of the fact table is day ).
We assume a fixed order of the attributes in the base fact tables. For instance, in the base fact table Sales, the first position from the left will always correspond to dimension Product.
Queries with Aggregates. In order to address queries involving aggregation, we adapted a nonrecursive data log with aggregate functions (Consens & Mendelzon, 1990) which, in turn, was based on the approach of Klug's relational calculus with aggregates (Klug, 1982).
Example 6: Consider the query: "list the total sales per item and region," where we want aggregates to be computed using temporally consistent values (i.e., a sale in a given store is credited to the region that corresponded to that store at the time of the sale).
This query is interpreted as follows: let us suppose a pair of tuples t_{1}: < i_{1}, TaosBranch, 100, 10/10/01 >, and t_{2}:< i_{1}, DallasBranch1, 100, 10/10/01 >, in fact table Sales; moreover, assume that the store branches identified by TaosBranch and DallasBranch1 were assigned to the southern region on October 10, 2001. As the expression s region:re will evaluate to true when variables s, t, and re become instantiated with values TaosBranch (or DallasBranch1), 10/10/01, and Southern, respectively, the tuple < i_{1}, Southern, 200 > will be in the result.
It is worth noting that in case the time granularities of the fact table Sales and the Store dimension differ (for instance, if the dimension's granularity was month), the user would not have to make the granularity conversion explicit, allowing a limited form of "schema independence." Later examples show the use of variables that range over level names, pushing this independence farther.
Example 7: We now introduce descriptive attributes of dimension levels. Suppose we want the total sales by store and brand, for stores with more than 90 employees. Assume that the level storied is described by an attribute nbrEmp (number of employees).
Metaqueries.TOLAP also allows querying the system's metadata, supporting queries with no fact table in the body of the rules. We call these queries metaqueries.
"Give me the time instants at which store s_{1} belonged to the Southern region," expressed as:
* Note that we must specify the name of the dimension in the atom Store: storeId: ‘s_{1},’ because there is no fact table in the body of the rule to which the dimension's bottom level could be bound.
"List the months when ‘Southern’ was not a valid region."
In the example above, variable X ranges over level names in the Store dimension. Variable x is bound by the values in the levels of the dimension. The domain of t is the lifespan of the dimension referenced in the metaquery. For every instantiation of the variables X,x, and t, which makes the body rule of the rule true, the time instant t is listed.
"Were products categorized by brands two years ago?"
Here again X ranges over level names. The expression above means that if any element, in any level in the Product dimension, is rolledup to an element in level brand at the required date, the answer to the query will be "yes."
"How were customers classified three years ago?"
Earlier, we showed that our model supports different versions of a dimension schema over time (in temporal database terminology this is called schema versioning). For instance, consider an alternative history for dimension Store in our running example, which is depicted in Figure 7. We can see that the bottom level of Store was initially city, and at time d_{5} storeId was inserted below it. Also assume that the fact table depicted below was in effect before d_{5}.
Figure 7: Data Warehouse Versioning
itemId  city  sales  day 

i_{1}  c_{1}  100  d_{1} 
i_{2}  c_{2}  100  d_{2} 
i_{1}  c_{3}  100  d_{3} 
i_{2}  c_{1}  100  d_{1} 
After the update, the sales in the following table occurred (notice the new structure of the fact table).
itemId  storeId  sales  day 

i_{1}  s_{4}  100  d_{6} 
i_{2}  s_{2}  100  d_{6} 
i_{1}  s_{3}  100  d_{7} 
i_{2}  s_{1}  100  d_{7} 
In TOLAP, an element not defined at a given instant will not contribute to the result. For instance, given the query "list the total sum of sales by brand and storeId," we have:
The expression storeId:st means that if an element in any level, which was once a component of a base fact table, rolledup to level storeId at time t, it contributes to the aggregation. Thus, the sales made before d_{6} will not contribute to the aggregation in the head (condition storeId:st will not be satisfied). Analogously, a query like "total sales by store and itemId" would return exactly the instance of the second fact table above. This query is expressed:
For the dimension Product, Figure 4 shows that at time d_{5}, level company was added above level brand. Given the query, "total sales by company and region," the sales occurring before d_{5} will not contribute to the aggregation, as company was not a level of the dimension at that time. The query reads in TOLAP:
Finally, at time d_{9}, level brand was deleted from Product. The query "total sales by brand and region" is expressed in TOLAP as:
Any sale that occurred after d_{9} will not be considered, as brand is not a level of the dimension anymore.
A TOLAP program allows computing the result of a rule and using it as a predicate in the body of another rule. The predicate's name is the name of the predicate in the head of the rule which computes it.
For instance, let us suppose the following query: "List the total sales by brand, for those brands that sold more than $100,000 in Buenos Aires." We want to answer this query precomputing a view holding the total sales in Buenos Aires, by brand.
Then, we compute the query, using predicate BASales
For each brand in dimension Product matching a brand in BASales, in the second rule, if the amount sold was greater than 100,000, the sales contributes to the aggregation.
We will formally define the syntax of a TOLAP rule. We will first give some definitions which will be used below, and then formalize the concepts previously introduced in the section, TOLAP by example.
Preliminary Definitions. Given a set T, and a discrete linear order <, with no endpoints, we define a pointbased temporal domain as the structure T_{P} = (T, <). Analogously, given T_{P} = (T, <), we define the set I(T) = {(a, b)  a ≤ b, a, b ∊ T ∪ {−∞, +∞}}. Let us denote by θ the set of the usual interval comparison operators. Then, T_{I} = (I(T), θ) is an intervalbased temporal domain corresponding to T_{P}. The rollup functions in TOLAP will be defined over T_{P}.
Atoms, Terms, Rules, and Programs. Assume B_{s} (D_{s}, F_{s}) and I (B) represent a multidimensional database schema and instance, respectively, as already defined. Let V_{L} and V_{D} be a set of level and data variables, respectively. We have also the sets C_{L} and C_{D} of level and data constants, respectively. Let P be a set of predicate names and F_{Ag} a set of aggregate function names.
Definition 7 (Terms):
A data term is either a variable in V_{D}or a constant in C_{D};
a rollup term is an expression of the form d:X:x, X:x, or x, where X is a level name variable in V_{L}or constant in C_{L}, x is a data term, and d is a constant in C_{L};
a descriptive term is an expression of the form x.a where x and a are data terms;
an aggregate term is an expression of the form f(d) such that f is a function name in F_{Ag} and d is a data term.
A term is a data, rollup, descriptive, or aggregate term.
Definition 8 (Atoms):
A fact atom is an expression of the form F(X_{1}, …, X_{n}, M, t), where F is a fact table name in F_{s}_{,}and X_{1}, …, X_{n}, M and t are data terms;
a rollup atom is an expression of the form , or X → Y, where X and Y are rollup terms, and t is a data term;
a descriptive atom is an expression of the form where x is a descriptive term, and y and t are data terms;
an aggregate atom is of the form Q(R,…,Z) s.t. Q ∊ P, and R,…,Z are data terms s.t. at least one is an aggregate term;
a constraint atom is an expression t_{1} θt_{2}, where t_{1} and t_{2} are data terms, and θ is one of {<, =};
if g : N ×… × N → N is a scalar function, g(n_{1}, …, n_{m}), where n_{i} are data terms, is a scalar atom;
an intentional (extensional) predicate atom is an expression of the form p (X, …, Z) where X,Y,Z are data terms, and p is an intentional (extensional) predicate name in P. In what follows, we will refer to these kinds of atoms simply as predicate atoms when possible.
An atom is a fact, rollup, descriptive, aggregate, constraint, scalar, or predicate atom. An expression !t_{1}, where t_{1} is an atom, is a negated atom.
Definition 9 (TOLAP Rules): A TOLAP rule is a formula of the form A_{1}, A_{2}, …, A_{n}, where A is an intentional (possibly aggregate) positive atom, and A_{i}, i = 1…n are nonaggregate atoms. A TOLAP rule Γ satisfies the following conditions:
if a variable appears in the head of the rule, it must also appear in its body;
for every level/data variable v, all the rollup terms in which v appears are associated to the same dimension; moreover, all the variables in a rollup atom belong to the same dimension;
if there is an aggregate atom Q(a_{l}, …, a_{n}) in the head of the rule, for all atoms in the body, of the form is a constant data term (thus, aggregation is performed over constant level names);
every variable x in a rollup atom of the form must appear in a fact atom in the body of the rule;
if x.a is in the body of Γ, at least one rollup term in the body is of the form d:X:x, X:x, or x;
every variable appearing in a predicate atom must appear in a fact or rollup atom in the body of the rule;
in a rollup term of the form , d is a constant data term;
if there is no fact atom in the body of Γ, all the rollup atoms must be of the form
In a negated fact atom, at least one of its terms must be a constant in C_{D};
if t_{1} θ t_{2} is a constraint atom, t_{1} and t_{2} are both variables in a fact, rollup, or descriptive atom in the body, or one of them is a constant in C_{D} and the other is a variable in a fact, rollup, or descriptive atom;
if the head of a rule is of the form Q(a_{l}, …, a_{n}), Q cannot appear in the body of the same rule. From the rules above, it follows that there exists a function, call it dim, that maps each data variable x to a unique dimension dim(x), and each level variable X to a unique dimension dim(X). Furthermore, there exists a function level, that maps each instant t to a unique level level(x,t) of the dimension dim(x), and to a unique level level(X,t) of the dimension dim(X);
the granularity of a fact table F must be finer than the finest granularity of any of its component dimensions involved in a rollup atom in the body of the rule. This rule prevents a fact from being counted more than once. For example, consider the query:
Assume that the fact table granularity is "year," and the granularity of dim(s)is "day." A fact in an instance of F could be of the form <i_{1}, s_{1}, 100, 1998> The dimension may have a pair of tuples <s_{1}, rs_{1}, 111998, 551998> and <s_{1}, rs_{2}, 561998, 12311998>. Thus, the fact would be counted twice.
Definition 10 (Mutual Recursion): Given a set R of TOLAP rules, a precedence graph G is built as follows: for each aggregate, predicate, or fact atom P, there is a node named P in G. If P and Q are nodes in G, add an edge from P to Q if there is a rule Γ in R such that P and Q occur in the body and the head of Γ, respectively. Following Abiteboul, Hull, & Vianu (1995), we say that two aggregate, fact, or predicate atoms R and S are mutually recursive if R and S participate in the same cycle in G.
Definition 11 (TOLAP Programs): A finite set of TOLAP rules which does not contain mutually recursive atoms is called a TOLAP program.
We will use pointbased semantics (Toman, 1995) for the rollup functions. This means, for instance, that in a dimension such as Store of our running example, a value for a rollup, say storeId:s‘i1’ city:c:‘c_{1},’ exists for each month in its validity interval.
Let us assume that for each dimension instance, we have a pair of relations, call them R_{D} and D_{D}, representing the sets TRUP and TDESC of Definition 2, respectively. The multidimensional database is defined over three different domains: D, N, T_{P}, where variables ranging over D belong to an uninterpreted sort, the ones ranging over N belong to an interpreted sort (), and the temporal variables range over TP, as previously defined. For each dimension d, we will consider that T_{P} is limited to the lifespan of d.
A valuation θ, for a TOLAP rule Γ, is a tuple (θ_{S}, θ_{I}), where θ_{S} is called a schema valuation, and θ_{I} is an instance valuation. Valuation θ_{S} maps the level and attribute variables in Γ to level and attribute names in B_{S}(D_{S}, F_{S}), while θ_{I} maps domain variables to values in I(B).
Definition 12 (Valuations): A schema valuation for a rule Γ, denoted θ_{S}(Γ) maps level and attribute variables in the atoms of Γ as follows:
given a rollup atom of the form maps d to a dimension name in D_{s}, t to a value w ∊ T_{P}, and X and Y to a pair of values (v, u) s.t. u holds in d;
if the rollup atom is of the form , θ_{S} maps t to a value ω ∊ T_{P}, and X and Y to a pair of values v, u s.t. v ≤ u holds in dim(X) ∊ D_{s};
if the rollup atom is of the form , maps t to a value ω ∊ T_{P}, and Yto a value u s.t. level(x,w) holds in dim(X) ∊ D_{s};
for a rollup atom of the form x → Y:y, θ_{S} maps Yto a dimension level u in dim(Y), s.t.l_{inf} ≤ * u holds in dim(X);
given a descriptive atom of the form maps t to a value ω ∊ T_{p}, and A to an attribute name u ∊A, s.t. u>>_{t} level (x,w) in dim(X) ∊ D_{s}.
Given a rule schema valuation θ_{S}(Γ) for a rule Γ, an instance valuation is a function θ_{I} s.t.:
θ_{I} maps the domain variables x and y in the rollup atoms defined above to values in R_{D} over levels defined by θ_{s};
θ_{I} maps variable x in the descriptive atom , to values in D_{D}, over levels defined by θ_{s}, and y to a value in D or N;
θ_{I} maps a fact atom F(x_{1}, …, x_{n} M, t) as follows: the rightmost term t in F is mapped to a value ω ∊ T_{P}; each domain variable x_{i} in F to a value in dom (level (x_{i}, w)); and the data term M in F to a value in N ( M is the measure of the fact table);
a constraint atom x {<, =} y evaluates to true whenever θ_{I}(x) {<, =} θ_{I}(y) is true. Recall that every variable in a constraint atom must appear in a rollup or fact atom in the body. Thus, either θ_{I} maps x and y in the way described above, or they are mapped to a value in D, N or T_{P};
a negated rollup atom is evaluated using the Close World Assumption. Thus, !(x Y:y)is true if, given a valuation θ s.t. θ(x) = u, θ(t) = ω, θ(Y) = l, and θ(y) = v, then there does not exist a rollup in dim(x)s.t. recall that every variable must be in a positive atom in the body of the rule;
a negated descriptive atom is treated as explained above for negated rollup atoms; thus, is true if, given a valuation θ s.t. θ(x) = u, θ(t) =, ω, θ (A) = a, and θ(y) = v, a description does not exist in dim(x)such that ξ[t]_{level(x, w)}^{a} (u) = v;
a negated constraint atom is evaluated in the standard way (i.e., θ(!x = b) =θ(x <> b));.
a negated fact atom !F(x, y, …, "a," m, t) is true if for a valuation θ_{I}(x) = u, θ_{I}(y) = v, …, the tuple <u, v……. "a," … > is not in F;
predicate atoms are valuated as in standard data log (Abiteboul, Hull, & Vianu, 1995).
Let AGG be the set of aggregate functions, with extension AGG = {MIN, MAX, COUNT, SUM, AVG}, and r a relation. The aggregate operation (Consens & Mendelzon, 1990) γf_{A(X)} (r) is the relation
over XA, s.t. XA ∊ schema (r), f ∊ AGG, and f_{A}(r) denotes the aggregation of the values in t ∊ [A], t ∊ r, using t.
We can now define the semantics of a TOLAP rule Γ of the form
as follows:
For each level or data variable v_{i} in the body of Γ, and for a valuation θ of the variables in the rule's body, we have:
Then Q = γ_{AGGm(a1, …, an)} (r_{Γ})
Queries expressed as TOLAP rules studied in the previous subsections always lead to finite answers. This follows from the syntactic restrictions and from the semantics already defined. The existence of functions dim(x) and level (x,t) makes every variable in a rule bounded by the active domain of some level in a dimension. This is analogous to the concept of range restricted variable defined by Abiteboul, Hull, & Vianu (1995). Thus, for instance, if a negated rollup atom appears in the body, safety is always granted. The same occurs with negated fact atoms. Also, limiting the temporal domain to the lifespan of the dimensions avoids infinite query answers. Thus, we conclude that TOLAP rules are safe.
Intuitively, it is not hard to see that TOLAP has at least the power of firstorder query languages with aggregation. However, in a sense it goes beyond this class. Note that in the temporal multidimensional data model, only the direct rollups are stored. Thus, to evaluate a rollup atom like d:X:x Y:y, we need to compute the transitive closure of the rollup functions in dimension d. It is a wellknown fact that this cannot be done in first order, even after adding aggregate functions (Libkin & Long, 1997). However, as long as the dimension schema is fixed, this computation can be done in first order, because for a fixed schema, the number of joins needed to transitively close the rollup functions is known in advance.
Observe that not only the structure of a dimension is subject to updates. There are common reallife situations in which an instance of a dimension may be modified in a nontrivial fashion. For instance, suppose we add the Geography dimension of Figure 1 to our running example data warehouse, and pose the following query: "total sales per item and region, using only the currently existing regions." In TOLAP we would write:
The second rollup atom filters out regions not valid today.
Suppose now the following constraint: if a region where a sale occurred does not exist today, we want in the result a descendant of such region, if it exists. This query cannot be expressed in TOLAP. To show this, suppose the Cuyo region is split into CuyoEast and CuyoWest. Later, CuyoEast is merged with another region, say Pampa. In the meantime, maybe some region could have been deleted. With the tools defined so far, we could not find the "descendants" of Cuyo, because this is a transitive closure problem even though the schema remains fixed. TOLAP can be extended in order to be able to express the class of queries exemplified above. This problem, however, is beyond the scope of this chapter.
We now present a preliminary TOLAP implementation. We describe the data structure supporting the system, and how TOLAP atoms, rules, and programs are translated to SQL statements. We also discuss different implementation alternatives.
A first TOLAP implementation was developed at the University of Buenos Aires (Vaisman, 2001). Vaisman and Mendelzon (2001) give details of the implementation, and discuss preliminary results using a reallife case study, which we will briefly present in the next section.
We will analyze two different data structures for representing temporal dimensions: a "fixed schema" versus a "nonfixed schema" approach. In both of them, a relation with schema (dimensionId, loLevel, toLevel, From, To) represents the structure of each dimension across time. Each tuple in this relation means that in the dimension identified by dimensionId, level loLevel rolled up to level toLevel between instants From and To. For example, the schema of the retail data warehouse will be represented by a relation with tuples like <D1, itemId, itemType, d_{1}, Now>, <D1, itemId, brand, d_{2}, Now>, and so on.
We briefly discuss how instances of the dimensions are stored under both kinds of representations.
Fixed Schema. In a fixedschema relational representation, a dimension instance is a relation with tuples of the form (loLevel, upLevel, toVal, upVal, From, To). Each tuple represents a rollup As an example, the instances of dimension Product in our retail data warehouse would be represented by a relation with tuples of the form <itemId, itemType, i_{1}, ty_{1}, d_{l}, d_{3}>, <itemId, brand, i_{1}, b_{1}, d_{1}, Now>, and so on. Thus, as new levels are added to a dimension, new tuples are inserted in the relation representing the dimension, but the relation's schema does not change. For instance, adding a new level company above brand just requires adding tuples like <brand, company, b_{1}, c1, d_{6}, Now> (of course, in the relation storing the schema, a tuple <D1, brand, company, d_{6}, Now> must be inserted).
This is probably the most natural representation, because it implements the relation R_{D} straightforwardly. However, translation to SQL would be awkward, and the resulting SQL queries will be far from optimal, because in order to compute aggregation over nonbottom dimension levels, the transitive closure of the rollup functions must be computed, requiring selfjoining temporal relations.
Nonfixed Schema. In this case there is also one relation for each dimension, but each dimension level is mapped to an attribute in this relation. A single tuple captures all the possible paths from an element in the bottom level, to the distinguished element "all." Two attributes, From and To, indicate, as usual in the temporal database field, the interval within which a tuple is valid. For example, a relation representing dimension Product will have schema: {itemId, itemType, brand, From, To}.
This structure requires more complicated algorithms for supporting dimension updates. For instance, adding a dimension level above a given one, or below the bottom level, induces a schema update of the relation representing the dimension instance: a new column will be added to this relation. If a level is deleted, no schema update occurs (i.e., the corresponding column is not dropped, in order to preserve the dimension's history. In the example above, a column company would be added. However, the translation process is simpler, and the subsequent query performance better, because computing the transitive closure of the rollup functions reduces to a single relation scan.
Fixed vs. Nonfixed Schemas. To make the ideas above more clear, let us show how a TOLAP query is translated to SQL. Later, we will give full details of this process.
Consider the query "total sales by company," posed to the earlier example of a data warehouse. This query reads in TOLAP:
Assuming that no schema update affected the fact table Sales, the SQL equivalent of the TOLAP query above in the fixed schema approach will look like ^{[1]}:
SELECT P1.upLevel,SUM(sales) FROM Sales S, Product P, Product P1, Time T WHERE S.itemId = P.loVal AND P.loLevel = 'itemId' AND P.upLevel = 'brand' AND P1.upLevel = 'company' AND P.upLevel = P1.loLevel AND S.day between P.From AND P.To AND S.day between P1.From AND P1.To GROUP BY P1.upLevel
In the nonfixed schema representation, the SQL equivalent for the query is:
SELECT P.company,SUM(sales) FROM Sales S, Product P WHERE S.itemId = P.itemId AND S.day between P.From AND P.To GROUP BY P.company
It is easy to see that the result is much more elegant and efficient in the second approach. The computation of the rollup from itemId and company is straightforward, while in the first approach, the selfjoin of the table Product is required.
These arguments lead to us conclude that the nonfixed schema approach would be the better choice for a TOLAP implementation.
The nonfixed schema approach discussed above was followed in the implementation of TOLAP. A parser performs a first pass over a rule, checking that the syntactic conditions of Definition 9 are met. In case an error is found, execution halts. In this first pass, the different atoms are identified and a symbol table built. The second pass translates each atom into an SQL statement and builds the equivalent SQL query.
We will show how a TOLAP rule of the form
is translated to SQL. The translation of atoms, rules, and programs will be explained in that order.^{[2]}
TOLAP Atoms. For each rollup atom like , bound to a variable in a fact table, a selection clause is built as follows:
F.i = Dim_{i}.bottom AND F.time BETWEEN Dim_{i}.From AND Dim_{i}.To
The expression Dim_{i} is the table representing the dimension D_{i}. The first conjunct joins this table with the fact table, on the attribute representing the bottom level of the dimension. The actual name of the column F.i is taken from the fact table's metadata. The second conjunct corresponds to the join between the fact table and the "Time" dimension.
Each time constant is translated as a selection clause. The second rollup atom in the rule above will be translated as:^{[3]}
F.j = Dim_{j}.bottom AND Dim_{j}.To = Now
Dim_{j} is the dimension such that variable y_{j} is bound to.
If a rollup atom of the form x → Y:x corresponds to a userdefined time dimension, the atom is translated as:
F.j = Dim_{j}.bottom
A rollup atom Dim:1: , not bound to any fact table, is translated as an EXISTS clause.
EXISTS SELECT * FROM Dim WHERE F.time between Dim.From AND Dim.To
The WHERE clause is omitted if the join with the time dimension is not required.
The rollup from the bottom levels of the dimensions to the levels 1_{i} and 1_{j}, corresponding to the variables in the head of the rule above (the aggregation levels), is computed as a projection and an aggregation in the way shown below. Thus, the SQL query generated by the TOLAP query of the beginning of this section, will look like:
SELECT Dim_{i}.l_{i}, Dim_{j}.l_{j},Ag(measure) FROM F_l, Dim_{i}, Dim_{j} WHERE F_l.i = Dim_{i}.bottom AND F_l.j = Dim_{j}.bottom AND F_l.time between Dim_{i}. From AND Dim_{j}.To AND Dim_{j}.To = Now AND EXISTS (SELECT * FROM Dim WHERE F_l.Time between Dim.From AND Dim.To) GROUP BY Dim_{i}.l_{i}, Dim_{j}.l_{j}
The term measure is the measure in the fact table, bound to variable m. The fact table subindex represents the version of the fact table. In this case there is only one version, as no schema update occurred. We will come back to this shortly.
A constraint atom is translated as a selection condition in the WHERE clause. If the constraint atom is negated, this condition is treated in the usual way (a NOT predicate is added).
A negated rollup atom is translated as a NOT EXISTS clause. Suppose that in the query above we add a negated atom as follows^{[4]}:
where ‘a’ represents a constant. This atom is converted into an SQL expression of the form:
NOT EXISTS( SELECT * FROM Dimj WHERE Dimj.To=Now AND Dimj.l_{l} ='a' AND F.j = Dim_{j}.bottom) where l_{l} is the attribute representing level l_{l}.
A predicate atom is translated as a table in the FROM clause, with the conditions which arise from the variables or constants in it.
TOLAP Rules. So far we tackled the problem of translating each atom in a TOLAP rule separately. The next step will be the study of the translation of the whole rule.
Above, we gave an example of a simple rule, assuming no schema update occurred in the fact table. However, we claimed that one of the main features of TOLAP is the ability to deal with different schema versions triggered by the insertion or deletion of a bottom level. Given a fact table F, versions F_{1}, F_{2}, and so on may be created, each one with a different schema.
Given a TOLAP query involving a fact table F, with versions F_{l} ,…, F_{n}, the SQL query Q equivalent to Γ will be such that
where Q_{i} are queries involving facts that occurred in the intervals I_{i} in which each F_i holds. If the query involves an aggregation function, call it f_{AGG}, one more aggregation must be performed, in order to consider duplicates in each subquery. Thus
TOLAP programs are treated analogously, as a series of TOLAP rules.
Join elimination. There are cases in which the join between dimensions and fact tables is not needed. This situation arises when a variable in the head of the rule belongs to a level which is the bottom level of a fact table in the body (or was the bottom level at least during some interval I_{i}). The current TOLAP implementation takes advantage of this fact, and does not generate the join.
Example 8:Let us consider the situation stated earlier in which the fact table Sales of our running example is split into Sales_{1}, and Sales_{2}, each one holding before and after an instant d_{5}, as a result of the insertion of level storeId below level city. In TOLAP, the query "total sales by itemId and city" reads:
Two SQL subqueries will be generated, one for each fact table.
SELECT itemId,city,SUM(sales) FROM ( SELECT itemId,city, SUM(sales) FROM Sales_{1} GROUP BY itemId,city UNION ALL SELECT itemId,city, SUM(sales) FROM Sales_{2}, Store WHERE Sales_{2}.Time between Store.From AND Store.To AND Sales_{2}.storeId=Store.storeId GROUP BY itemId,city) GROUP BY itemId,city
Notice that, as city and itemId were the bottom levels of Sales_{1}, no join is needed in the first subquery.
Subquery Pruning. If a TOLAP rule contains a constraint atom with a condition over time such that the lifespan of a version of the fact table does not intersect with the interval determined by the constraint, the subquery corresponding to that version of the fact table is not generated by the translator, as it will not yield any tuple in the output. For instance, in Example 8, adding the constraint t < d_{6} will prevent the first subquery from being generated. We call this step subquery pruning.
In order to illustrate the need for temporal management in OLAP discussed in this chapter, let us present a reallife case study, a medical center in Argentina (Vaisman & Mendelzon, 2001), where each patient receives different services, including radiographies, electrocardiograms, medicine, disposable material, and so on. These services are denoted "Procedures." Data was taken from different tables in the clinic's operational database. Figure 8 shows the final state of the dimensions. We built a temporal data warehouse using six months of data taken from medical procedures performed on patients at the center.
Figure 8: Dimensions in the Case Study
Dimension Procedure, with bottom level procedureld, and levels procedureType, subgroup, and group, describes the different procedures available to patients. Dimension Patient, with bottom level patientId, and levels yearOfBirth and gender, represents information about the person under treatment. Age intervals are represented by a dimension level called yearRange. Patients are also grouped according to their health insurance institution. Moreover, these institutions are further grouped into types (level institutionType), such as private institutions, labor unions, etc. Dimension Doctor gives information about the available doctors (identified by doctorId) and their specialties (a level above doctorId). There is also a userdefined dimension Time, with levels day, week, and month.
A fact table holds data about procedures delivered to patients by a certain doctor on a certain date. We call this fact table Services, and its schema contains the bottom levels of the dimensions above, plus the measure. The fact table schema is: Services (doctorlD, procedureld, patientlD, day, quantity, t), where t represents a systemmaintained time dimension, and quantity is the measure; attribute day represents a userdefined time dimension. For instance, a tuple <docl, 10.01.01, pat120, 10, 2, "10/10/2000"> means that doctor "doc1" prescribed two special radiographies to patient "pat120" on day "10" which corresponds to October 10, 2001. In this case study, the systemmaintained and userdefined times are synchronized, meaning that there is a correspondence between the values of attributes "day" and "t" (e.g., day "11" corresponds to October 11, 2000). This synchronization, however, is not mandatory in the model. Thus, both times can take independent values.
The scenario was prepared as follows. Dimension Procedure was created with bottom level procedureId. Subsequent operations generalized this level to level subgroup. Level procedureId was then generalized to procType. Finally, subgroup was generalized to group, and procType related to group. In the Patient dimension, the initial bottom level patientId represents information about the person under treatment. This level was later generalized to year0fBirth, gender, and institution, in that order. Levels yearOfBirth and institution were further generalized into yearRange and institutionType, respectively. For dimension Doctor, in order to show how schema versioning is handled in TOLAP, we assumed that although facts were recorded at the doctorId level, this level was temporarily deleted, originating the second version of the fact table, called Services_2. Thus, during a short interval, the dimension's bottom level was specialty, later specialized into level doctorId, which triggered the creation of the third version of the fact table, denoted Services_3.
Finally, a userdefined Time dimension was created, with granularity day, allowing expressing aggregates over time. The dimension's hierarchy is the following: day rollsup to week and month; week and month rollup to All. The three resulting fact table versions have the following schemas: (a) Services_1:{procId, patientId, doctorId, day, quantity, tmp}, where tmp represents the builtin time dimension, and quantity is the number of procedures delivered; (b) Services_2:{procId, patientId, specialty, day, quantity, tmp}; (c) Services_3:{procId, patientId, doctorId, day, quantity, tmp}. The fact tables were populated offline.
The following queries reflect TOLAP's expressive power. Suppose a user wants to analyze the doctors' workloads, in order to estimate future needs. The following query measures how the arrival of a new doctor influences the number of patients served by a doctor named Martinez: "list the total number of services delivered weekly by Dr. Martinez while Dr. Feinsilver was not working for the clinic."
Notice that the negated atom is not bound to the fact table. Also notice the use of the userdefined Time dimension.
The following query returns the number of services delivered by Dr. Martinez while both doctors were employed at the clinic.
The next query illustrates how to check patients who were served when they were affiliated to ‘MEDICUS’ and are currently affiliated to ‘OSDE’ (two private health services in Argentina).
The reader must notice that the queries above would require several lines of SQL code. Moreover, without an underlying temporal multidimensional data model, these queries cannot be expressed, unless an adhoc design of the data warehouse is carried out.
The TOLAP implementation includes a graphic environment developed taking advantage of the temporal multidimensional model. The graphic interface allows to:
browse dimensions across time, and watch how they were hierarchically organized throughout their lifespan; further, dimension instances could also be browsed;
perform dimension updates;
import rollup functions from text files;
browse different versions of a fact table;
send TOLAP programs to the system, and display their results without leaving the environment, including seeing the generated SQL query.
Figure 9 shows a typical system screen depicting a dimension Patient, taken from the case study presented above, as of December 13, 2000. The window on the left presents the dimension's structure. The arrows upon the window allow browsing forward and backward across time. This is not allowed if the system is in "initial" mode. The window on the right shows the dimension's instances. This window is synchronic with respect to the one on the left. Thus, the instances being displayed correspond to the structure on the left although they can vary as instance updates occur. However, while the screen of the left remains unchanged, several instance updates may occur, which can be displayed in the window on the right. The little box in the upper middle indicates the number of elements in the bottom level which are going to be displayed, allowing partial loading of the instance graph in main memory. This feature is crucial in making the tool usable in real applications, because loading the entire instance graph would be very expensive, even for not very large dimensions.
Figure 9: Browsing Dimension "Patient"
^{[1]}For the sake of clarity we do not show here how time granularity is handled in the translation; usually, this will depend on the underlying DBMS.
^{[2]}Notation: The dimensions' subindices represent their position in the fact table to which they are bound. In the implementation, constructions of the form x 4 Y:x, are replaced by x[t] + Y:x, in order to simplify parsing.
^{[3]}In an actual implementation, Now can be replaced by Sysdate() or any function returning the current time.
^{[4]}Actually, the parenthesis is not required.

