# OPERATORS FOR MULTIDIMENSIONAL AGGREGATE DATA WHICH EXTEND RELATIONAL ALGEBRA AND RELATIONAL CALCULUS

In the literature on aggregate multidimensional databases (initially called statistical databases), there are two approaches to defining operators that manipulate this multidimensional structure, referred to from now on as multidimensional aggregate data (MAD): the enlarged relational model and the multidimensional tabular model.

Historically, the first approach is the enlarged relational model, which is described in several papers. For example, Klug (1982) establishes the equivalence between relational algebra and relational calculus for query languages that have aggregate functions. He gives a precise definition of aggregate functions and extends the relational algebra and relational calculus in a general and natural fashion to include them. He also shows that the languages extended in this way have equivalent expressive power. Finally, he proposes a new operator, called aggregate formation, described at the beginning of Chapter 1.
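As an informal illustration of what an aggregate formation operator does, the following Python sketch partitions a relation on a list of attributes and applies an aggregate function to each partition. It is not Klug's formal definition: the function name `aggregate_formation`, its parameters, and the sample `employees` relation are assumptions made for this example. The relation is modeled as a list of dictionaries so that duplicate values inside a group are preserved before aggregation.

```python
from collections import defaultdict


def aggregate_formation(relation, group_attrs, agg_fn, agg_attr, result_name):
    """Partition `relation` on `group_attrs` and apply `agg_fn` to the
    `agg_attr` values of each partition (hypothetical helper)."""
    groups = defaultdict(list)
    for row in relation:
        key = tuple(row[a] for a in group_attrs)
        # A list keeps duplicates, so two equal salaries are both counted.
        groups[key].append(row[agg_attr])

    result = []
    for key, values in groups.items():
        out_row = dict(zip(group_attrs, key))
        out_row[result_name] = agg_fn(values)
        result.append(out_row)
    return result


# Illustrative data, not taken from the source.
employees = [
    {"dept": "sales", "name": "Ann", "salary": 100},
    {"dept": "sales", "name": "Bob", "salary": 100},
    {"dept": "ops", "name": "Cy", "salary": 90},
]

total_by_dept = aggregate_formation(employees, ["dept"], sum, "salary", "total")
# total_by_dept == [{"dept": "sales", "total": 200}, {"dept": "ops", "total": 90}]
```

Note that projecting the relation onto `salary` as a set before summing would collapse the two equal salaries in the sales department, which is why the sketch aggregates over the grouped tuples rather than over a set of values.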

In Su (1983) the author proposes the G-relation structure (G stands for ‘Generalized’) and a set of operators based on this structure that extend the classic relational operators. A G-relation is formally defined as follows: given a collection of concept types $A_1, \ldots, A_n, B_1, \ldots, B_m$, each of which contains a set of occurrences belonging to the same summary type, a G-relation type, denoted as $R(A_1, \ldots, A_n, B_1, \ldots, B_m)$ and defined on these $n+m$ sets, is a set of ordered $(n+m)$-tuples $\langle a_1, \ldots, a_n, b_1, \ldots, b_m \rangle$ such that $a_i \in A_i$ for $i = 1$ to $n$, and $b_j \in B_j$ for $j = 1$ to $m$. The $A_i$ are called identifying domains and the $B_j$ are called summary domains. A number of restructuring operations and revised relational algebra operations are defined and illustrated, such as aggregation (similar to summarization) and its inverse, disaggregation.
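The definition above can be pictured with a small Python sketch. It models a G-relation as a set of tuples whose first n positions come from identifying domains and whose last m positions come from summary domains, and it adds a simplified aggregation that sums the summary values over the identifying domains that are dropped. The class name `GRelation`, the function `aggregate`, the restriction to additive summary values, and the sample data are all assumptions made for illustration; Su's operators are richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GRelation:
    identifying: tuple  # names of the identifying domains
    summary: tuple      # names of the summary domains
    tuples: frozenset   # set of ordered (n+m)-tuples

    def __post_init__(self):
        width = len(self.identifying) + len(self.summary)
        for t in self.tuples:
            if len(t) != width:
                raise ValueError(f"tuple {t!r} does not have {width} components")


def aggregate(g, keep):
    """Keep only the identifying domains in `keep` and sum the summary
    values of the tuples that collapse together (assumes additive values)."""
    positions = [g.identifying.index(name) for name in keep]
    n = len(g.identifying)
    totals = {}
    for t in g.tuples:
        key = tuple(t[i] for i in positions)
        values = t[n:]
        previous = totals.get(key, (0,) * len(values))
        totals[key] = tuple(p + v for p, v in zip(previous, values))
    return GRelation(tuple(keep), g.summary,
                     frozenset(k + v for k, v in totals.items()))


# Illustrative data: population counts by state and sex.
population = GRelation(
    identifying=("state", "sex"),
    summary=("population",),
    tuples=frozenset({
        ("CA", "F", 20), ("CA", "M", 19),
        ("NY", "F", 10), ("NY", "M", 9),
    }),
)

by_state = aggregate(population, ["state"])
# by_state.tuples == frozenset({("CA", 39), ("NY", 19)})
```

Disaggregation, the inverse operation, is not sketched here because it needs additional information (how a total is distributed over the finer categories) that the aggregated relation alone does not carry.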

In Ghosh (1996) the author discusses an extension of relational algebra that executes a set of operations on a table (or on two tables, for joins) and still yields a table as a result. He also explains that it is possible to define the classic relational operations of projection, selection, ordering, union, intersection, join, and outer join on the domains of a relation R and, at the same time, the numerical operations of sum, subtraction, multiplication, division, vector dot product, restricted dot product, vector product, and restricted vector product on the rows of the relation.

In several papers, G. Ozsoyoglu and Z.M. Ozsoyoglu study an extension of relational algebra for summary tables and a query language for manipulating these data. In particular, they propose this extension in Ozsoyoglu & Ozsoyoglu (1983) and Ozsoyoglu, Ozsoyoglu, & Mata (1985), and complete it in Ozsoyoglu, Ozsoyoglu, & Matos (1987), where an extended relational algebra (ERA) containing aggregate functions and set-valued attributes is proposed. A rigorous formalization of this algebra is presented and discussed, together with some operators that differ from the traditional relational ones. In particular, starting from literals and relations, the following operators are proposed: projection; cross-product (the Cartesian product without the ordering constraint); restriction; set union; set difference; aggregate formation (imported from Klug, 1982); pack (which groups all the instances of the category attribute specified by the operator into one unique set-value); unpack (the inverse of pack, which splits the unique set-value instance of the category attribute specified by the operator into all the instances previously grouped); construct (which constructs a single-column set-valued relation using a range of values to form tuple components or tuples of the relation explicitly); aggregation-by-template (which groups the tuples of the first relation, that is, the first operand, specifying both the attributes and the aggregation function, and which, by means of the construct operation, generates a new relation whose summary values are obtained by applying the aggregation function to the instances, single values or set-values, specified in the construct operator); θ-join; and selection. They also discuss and formally define the calculus objects with aggregate functions, considering calculus expressions constructed from variables, terms, formulas, and range formulas, and using alpha expressions recursively. Finally, they demonstrate the equivalence between algebra expressions and calculus objects in both directions.
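The pack and unpack operators can be illustrated with a short Python sketch. It is an informal reading of the descriptions above, not the formal ERA definitions: the function names `pack` and `unpack`, the representation of a relation as a list of dictionaries, and the sample `rows` are assumptions made for this example. Attribute values are assumed to be hashable.

```python
def pack(relation, attr):
    """Group the values of `attr` into one set-value for each combination
    of the remaining attributes (hypothetical helper)."""
    groups = {}
    for row in relation:
        # The remaining attributes identify the group the row belongs to.
        key = tuple(sorted((k, v) for k, v in row.items() if k != attr))
        groups.setdefault(key, set()).add(row[attr])

    packed = []
    for key, values in groups.items():
        new_row = dict(key)
        new_row[attr] = frozenset(values)
        packed.append(new_row)
    return packed


def unpack(relation, attr):
    """Inverse of pack: emit one row for each element of the set-value."""
    unpacked = []
    for row in relation:
        for value in sorted(row[attr]):
            new_row = dict(row)
            new_row[attr] = value
            unpacked.append(new_row)
    return unpacked


# Illustrative data, not taken from the source.
rows = [
    {"region": "north", "year": 1990},
    {"region": "north", "year": 1991},
    {"region": "south", "year": 1990},
]

packed = pack(rows, "year")
# packed holds one row per region, with a set-valued "year" attribute.
restored = unpack(packed, "year")
```

In this sketch, unpacking a packed relation yields the original rows again, which is the sense in which the two operators are inverses of each other.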

In Meo-Evoli, Ricci, & Shoshani (1992), operators corresponding to the relational algebra operators "select," "project," and "union," as well as "aggregate," were defined. They were labeled "S-select," "S-project," etc. (S stands for "statistical operator"). They have the following semantics in the context of an aggregate statistical object:

S-select: selects a subset of category values of a category attribute. This does not reduce the cardinality (the number of dimensions) of the multidimensional space, unless a single value is selected.

S-project: summarizes over all values of a dimension. This reduces the cardinality of the multidimensional space by one.

S-aggregation: summarizes over the values of the classification hierarchy. One can specify a summarization over one or more levels. This also does not reduce the cardinality of the multidimensional space.

S-union: combines multiple statistical objects that have overlapping (or partially overlapping) category values.
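The four operators can be sketched on a toy statistical object in Python. This is a minimal illustration under stated assumptions, not the definitions given by Meo-Evoli, Ricci, & Shoshani: the class `StatObject`, the function names, the use of a sum as the summary function, the `parent` mapping that stands in for one level of a classification hierarchy, and the rule that overlapping cells must agree in S-union are all choices made for this example.

```python
from collections import defaultdict


class StatObject:
    """Toy aggregate statistical object: named dimensions plus a mapping
    from tuples of category values to an additive summary value."""

    def __init__(self, dims, cells):
        self.dims = list(dims)
        self.cells = dict(cells)


def s_select(obj, dim, values):
    """Keep only the cells whose value on `dim` is in `values`."""
    i = obj.dims.index(dim)
    return StatObject(obj.dims,
                      {k: v for k, v in obj.cells.items() if k[i] in values})


def s_project(obj, dim):
    """Summarize over all values of `dim`, removing that dimension."""
    i = obj.dims.index(dim)
    out = defaultdict(int)
    for k, v in obj.cells.items():
        out[k[:i] + k[i + 1:]] += v
    return StatObject(obj.dims[:i] + obj.dims[i + 1:], out)


def s_aggregate(obj, dim, parent):
    """Roll the values of `dim` up one hierarchy level using `parent`."""
    i = obj.dims.index(dim)
    out = defaultdict(int)
    for k, v in obj.cells.items():
        out[k[:i] + (parent[k[i]],) + k[i + 1:]] += v
    return StatObject(obj.dims, out)


def s_union(a, b):
    """Combine two objects over the same dimensions; cells present in
    both must carry the same value (an assumption of this sketch)."""
    if a.dims != b.dims:
        raise ValueError("objects must share the same dimensions")
    out = dict(a.cells)
    for k, v in b.cells.items():
        if k in out and out[k] != v:
            raise ValueError(f"conflicting values for cell {k}")
        out[k] = v
    return StatObject(a.dims, out)


# Illustrative data: counts by city and sex.
census = StatObject(["city", "sex"], {
    ("Rome", "F"): 5, ("Rome", "M"): 4,
    ("Milan", "F"): 3, ("Milan", "M"): 2,
    ("Paris", "F"): 7, ("Paris", "M"): 6,
})

rome = s_select(census, "city", {"Rome"})            # still two dimensions
by_city = s_project(census, "sex")                   # one dimension fewer
by_country = s_aggregate(census, "city",
                         {"Rome": "Italy", "Milan": "Italy", "Paris": "France"})
rome_and_paris = s_union(rome, s_select(census, "city", {"Paris"}))
```

In the example, `by_city` has a single dimension, whereas `rome` and `by_country` keep both dimensions, which mirrors the remarks above about which operators reduce the multidimensional space.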

We will show that similar operators have been identified in the OLAP area.

Multidimensional Databases: Problems and Solutions
ISBN: 1591400538
Year: 2003
Pages: 150