Views in databases typically are virtual objects, i.e., their content is computed only when the database receives a request to access them. A typical strategy consists in rewriting queries that refer to views, in order to express them in terms of the concrete relations available on the system.
Example 2 A query like
select Product, TotalSales, Brand from TotalSales, Products where TotalSales.Product = Products.Code
might be transformed by the query optimizer into query:
select Product, sum(Amount) as TotalSales, Brand from Sales, Products where Sales.Product = Products.Code group by Product, Brand
An alternative strategy consists of explicitly storing the content of a view in the database, creating a materialized view. The advantage of materialized views is that access to the view will only require reading the blocks where the materialization is stored, without the need to compute the query from the base data. Since query computation is almost always more expensive than reading a materialization, this strategy is able to reduce the cost of query computation and to improve database throughput. For example, if view TOTALSALES is materialized, the query will be directly executed on the view and it will in general be more efficient, as it will require access to a smaller amount of data and will not need to compute the aggregation.
The main shortcoming of materialized views is that, to guarantee query correctness, their content must be equivalent to what can be obtained by executing the query associated with the view on the current database state. Then, every time a table used in the view computation is updated, materialized views have to be kept consistent, applying the effect of the base updates on the result of the query defining the view. In the example, the materialization of view TOTALSALES will have to be updated when tables SALES or PRODUCT are modified.
We formalize the benefit of materialized views with the following simple model. Let us suppose that for a view V, the cost of executing its query is QDB and the cost of accessing its materialization is QM. In general, it will be QDB > QM. We further represent the cost of storing the view as CM and suppose that the query is executed with frequency fQ. Updates on data that force a view recomputation occur with frequency fU. To compute the benefit of view materialization, it will be necessary to evaluate the costs incurred executing the query directly on base data and compare them with the costs incurred using the materialization. When the materialization is not used, the cost of computing the query qi with fQ frequency will be equal to fQ · QDB. When the materialization is present, we have to sum up the cost of storing the materialization, the cost of computing it, and the cost of updating it fU times; we obtain the formula CM + fU · QDB + fQ · QM. The benefit B is the difference between the two expressions:
With an immediate algebraic transformation on the formula, we derive that the view materialization will offer a positive benefit only when
The formula confirms the intuition that the benefit increases with query frequency and with the advantage that the view materialization offers to view computation; the benefit decreases with the update frequency and with the cost of view materialization. If we disregard the materialization cost CM, we obtain the simpler formula that identifies the situations when a view materialization is convenient:
The advantages of materialized views can be significantly increased by the use of incremental maintenance techniques. These techniques have been the subject of a cosiderable amount of research (Gupta & Mumick, 1999). Many solutions have been identified that permit the efficient computation of the update that has to be applied on a materialized view to make it consistent with the updates that occurred on the base data. In this way, when updates are applied on the base tables, it is not necessary to recompute all the materialized views that have been modified from scratch.
For instance, for view TOTALSALES a complete recomputation would probably be extremely expensive. A more efficient solution may propagate updates on base tables to the view, e.g., by requiring that every insertion into SALES would generate an increment of the sale amount of the tuple in TOTALSALES describing the product sales. In typical database configurations, the size of updates is considerably smaller than the size of tables and views, and the use of incremental maintenance techniques offers a considerable potential. Extending on the previous formalization, if CU is the cost of updating the view, the cost when using techniques for incremental maintenance of materialized views may be formalized as:
with CU being typically a small fraction of the cost QDB of computing the view from scratch.
In databases, there is a large experience in using cost models to optimize queries. The formula above constitutes a starting point for the implementation of view materialization solutions in a wide range of situations. What the formula does not show are the additional obstacles that make it difficult to exploit the potential of these solutions.
One of the obstacles to a wide adoption of these techniques is the great number of views that can help query execution. This aspect forces the system to do a preliminary and difficult choice of the views that may benefit the computation of queries. Indeed, the use of materialized views for query computation has a strong relationship with multi-query optimization techniques, where the goal is to build an efficient query execution plan for multiple queries reusing the same intermediate results in more than one query.
For instance, consider the two queries:
query 1: select Product, sum(Amount) from Sales, Products where Sales.Product = Products.Code and Brand = 'SuperMagic' group by Product query 2: select Brand, sum(Amount) from Sales, Products where Sales.Product = Products.Code group by Brand
The computation of both queries would benefit from the identification and precomputation of TotalSales. The availability of the materialization of TOTALSALES permits us to rewrite the above queries into:
query1-rewritten: select Product, TotalAmount from TotalSales, Products where Sales.Product = Products.Code and Brand = 'SuperMagic' query2-rewritten: select Brand, sum(TotalAmount) from TotalSales, Products where Sales.Product = Products.Code group by Brand
In typical database applications queries arrive in a continuous stream to the database; the adoption of multi-query optimization would require us to analyze if there are common subparts in the arriving queries that may be computed only once. There are many obstacles that limit the applicability of multi-query optimization, which has met until now very limited success in commercial systems: the analysis has to be done in a relatively short time, the search space is extremely wide, and there is no opportunity to store the intermediate results for longer periods, as updates on the underlying data make them obsolete. When some of these restrictions are removed, multi-query optimization may become an interesting technique.
For instance, Mistry et al. (2001) demonstrate heuristics that can be used for multi-query optimization, in a situation where queries are known up front and updates arrive with a known frequency. Similar restrictions also apply to multidimensional databases, which have been until now the most interesting environment for the adoption of specific multi-query optimization and materialized views. The goal of this chapter is to explore the adoption of materialized views in this specific environment. The next section recalls some basic models and definitions. We then present the problem of incrementally maintaining a set of materialised views on a MDDB, and describe the problem of chosing an optimal set of materializations and possible solutions. Following that is an overview of current commercial implementations of materialized views, and we end with some conclusions.