Chapter 13

Overview

Materialized views are a data warehousing/decision support system tool that can increase by many orders of magnitude the speed of queries that access a large number (maybe many hundreds of thousands or millions) of records. In basic terms, they allow a user to query potentially terabytes of detail data in seconds (or less). They accomplish this by transparently using pre-computed summarizations and joins of data. These pre-computed summaries would typically be very small compared to the original source data.

Say, for example, your company has a sales database loaded with the details of a million orders, and you want to get a breakdown of sales by region (a common enough query). Each and every record would be scanned, the data aggregated to the region level, and the calculation performed. Using a materialized view, we can store a summary of the sales data by region, and have the system maintain these summaries for us. If you have ten sales regions, this summary will have ten records, so instead of sifting through a million detail records, we'll query only ten. Furthermore, if someone asks a slightly different question, say for the sales in a specific region, then that query can also obtain the answer from the materialized view.

In this chapter you'll find out what materialized views are, what they can do and, most importantly, how they work - a lot of the 'magic' goes on behind the scenes. Having gone to the trouble of creating a materialized view, you'll find out how to make sure it is used by every query it is capable of answering (sometimes, you know Oracle could use the materialized view, but it does not, simply because it lacks important information). Specifically, we will:

  • Run through an example that will demonstrate the essential power of materialized views, and help you decide quickly whether it is a feature you might want to use.

  • Discuss the various parameters and privileges that must be set in order to use this feature.

  • Investigate, with examples, the use of constraints and dimensions to let the database know when to use a materialized view to answer a query in the most effective manner.

  • Look at how to use the DBMS_OLAP package to analyze your views.

  • Round off the chapter with two caveats you should be aware of when using materialized views.

A Brief History

Summary table management, another term for the materialized view, has actually been around for some time in tools such as Oracle Discoverer (an ad hoc query and reporting tool). Using Discoverer, an administrator would set up various summary tables in the database. The tool would then parse queries before sending them to Oracle. If it determined that a summary table existed that could answer the question more readily, it would rewrite the query to access the summary table, rather than the underlying table that was originally specified in the query, and submit it to Oracle for processing. This was great, as long as you used that tool to process your queries. If you ran a query in SQL*PLUS, or from your Java JDBC client, then the query rewrite would not (could not) take place. Furthermore, the synchronization between the details (original source data) and the summaries could not be performed or validated for you automatically, since the tool ran outside the database.

Additionally, since version 7.0, the Oracle database itself has implemented a feature with many of the characteristics of summary tables - the Snapshot. This feature was initially designed to support replication, but I would use it myself to 'pre-answer' large queries. So, I would have snapshots that did not use a database link to replicate data from database to database, but rather just summarized or pre-joined frequently accessed data. This was good, but without any query rewrite capability, it was still problematic. The application had to know to use the summary tables in the first place, and this made the application more complex to code and maintain. If I added a new summary, I would have to find the code that could make use of it, and rewrite that code.

In Oracle 8.1.5 (Enterprise and Personal Editions), Oracle took the query rewriting capabilities from tools like Discoverer, and the automated refresh and scheduling mechanisms from snapshots (which make the summary tables 'self-maintaining'), and combined them with the optimizer's ability to find the best plan out of many alternatives. The result was the materialized view.

With all of this functionality centralized in the database, now every application can take advantage of the automated query rewrite facility, regardless of whether access to the database is via SQL*PLUS, Oracle Forms, JDBC, ODBC, Pro*C, OCI, or some third party tool. Every Oracle 8i enterprise database can have summary table management. Also, since everything takes place inside the database, the details can be easily synchronized with the summaries, or at least the database knows when they aren't synchronized, and might bypass 'stale' summaries (you control its behavior in this case). By putting the functionality right in front of the data, anything that can access Oracle can take advantage of this functionality.

Note 

The same philosophy underpins features such as Fine Grained Access Control (see the section on Openness in Chapter 1 and also Chapter 21, which is dedicated to FGAC). The closer to the data these functions are, the broader the audience that can make use of them. If you put security outside the database, in an application perhaps, it protects the data only when that application is used, and hence the only permissible access to the data is via the application.

What you'll need to run the Examples

In order to run the examples in this chapter you'll need access to a Personal or Enterprise Edition of Oracle 8.1.5, or higher. This functionality is not provided in the Standard release. You will need a user account with the following privileges (at least):

  • GRANT CREATE SESSION

  • GRANT CREATE TABLE

  • GRANT CREATE MATERIALIZED VIEW

  • GRANT QUERY REWRITE

The first three privileges above may be granted to a role that you have been granted. The QUERY REWRITE privilege must be granted directly to you.

Additionally, you'll need access to a tablespace with about 30 - 50MB of free space.

Finally, you must be using the Cost-Based Optimizer (CBO) in order to make use of query rewrite. If you do not use the CBO, query rewrite will not take place. In these examples, our optimizer goal will be left at the default of CHOOSE; simply analyzing the tables will ensure we can take advantage of query rewrite.
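For instance, a quick way to confirm that a table has statistics (and hence that the CBO can kick in under the default CHOOSE goal) is to check USER_TABLES after analyzing. This is a minimal sketch; MY_TABLE is a hypothetical table name:

analyze table my_table compute statistics;

select table_name, num_rows, last_analyzed
  from user_tables
 where table_name = 'MY_TABLE';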

An Example

A quick example will demonstrate what a materialized view entails. The concept demonstrated below is that of reducing the execution time of a long running query transparently, by summarizing data in the database. A query against a large table will be transparently rewritten into a query against a very small table, without any loss of accuracy in the answer. We'll start with a large table that contains a list of owners of objects, and the objects they own. This table is based on the ALL_OBJECTS data dictionary view:

tkyte@TKYTE816> create table my_all_objects
  2  nologging
  3  as
  4  select * from all_objects
  5  union all
  6  select * from all_objects
  7  union all
  8  select * from all_objects
  9  /

Table created.

tkyte@TKYTE816> insert /*+ APPEND */ into my_all_objects
  2  select * from my_all_objects;

65742 rows created.

tkyte@TKYTE816> commit;

Commit complete.

tkyte@TKYTE816> insert /*+ APPEND */ into my_all_objects
  2  select * from my_all_objects;

131484 rows created.

tkyte@TKYTE816> commit;

Commit complete.

tkyte@TKYTE816> analyze table my_all_objects compute statistics;

Table analyzed.

On my system, I have the Java option installed so the MY_ALL_OBJECTS table has about 250000 rows in it after the above. You may have to adjust the number of times you UNION ALL and INSERT, in order to achieve the same effect. Now, we'll execute a query against this table that shows the number of objects owned by each user. Initially, this will require a full scan of the large table we have above:

tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> set timing on
tkyte@TKYTE816> select owner, count(*) from my_all_objects group by owner;

OWNER                            COUNT(*)
------------------------------ ----------
A                                      36
B                                      24
CTXSYS                               2220
DBSNMP                                 48
DEMO                                   60
DEMO11                                 36
DEMO_DDL                              108
MDSYS                                2112
MV_USER                                60
ORDPLUGINS                            312
ORDSYS                               2472
OUR_TYPES                              12
OUTLN                                  60
PERFSTAT                              636
PUBLIC                             117972
SCHEDULER                              36
SCOTT                                  84
SEAPARK                                36
SYS                                135648
SYSTEM                                624
TESTING                               276
TKYTE                                  12
TTS_USER                               48
TYPES                                  36

24 rows selected.

Elapsed: 00:00:03.35

tkyte@TKYTE816> set timing off
tkyte@TKYTE816> set autotrace traceonly
tkyte@TKYTE816> select owner, count(*) from my_all_objects group by owner;

24 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2525 Card=24 Bytes=120)
   1    0   SORT (GROUP BY) (Cost=2525 Card=24 Bytes=120)
   2    1     TABLE ACCESS (FULL) OF 'MY_ALL_OBJECTS' (Cost=547 Card=262968

Statistics
----------------------------------------------------------
          0  recursive calls
         27  db block gets
       3608  consistent gets
       3516  physical reads
          0  redo size
       1483  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         24  rows processed

In order to get the aggregate count, we must count 250000+ records on over 3600 blocks. Unfortunately, in our system we ask this question frequently, dozens of times every day. We are scanning almost 30MB of data. We could avoid counting the details each and every time by creating a materialized view of the data. The following demonstrates the basic steps needed to perform this operation. We'll discuss the GRANT and ALTER statements in more detail in the How Materialized Views Work section. In addition to the grants below, you might need the CREATE MATERIALIZED VIEW privilege as well, depending on what roles you have been granted and have enabled:

tkyte@TKYTE816> grant query rewrite to tkyte;

Grant succeeded.

tkyte@TKYTE816> alter session set query_rewrite_enabled=true;

Session altered.

tkyte@TKYTE816> alter session set query_rewrite_integrity=enforced;

Session altered.

tkyte@TKYTE816> create materialized view my_all_objects_aggs
  2  build immediate
  3  refresh on commit
  4  enable query rewrite
  5  as
  6  select owner, count(*)
  7    from my_all_objects
  8   group by owner
  9  /

Materialized view created.

tkyte@TKYTE816> analyze table my_all_objects_aggs compute statistics;

Table analyzed.

Basically, what we've done is pre-calculate the object count, and define this summary information as a materialized view. We have asked that the view be immediately built and populated with data. You'll notice that we have also specified REFRESH ON COMMIT and ENABLE QUERY REWRITE, but more on these in a moment. Also notice that we may have created a materialized view, but when we ANALYZE, we are analyzing a table. A materialized view creates a real table, and this table may be indexed, analyzed, and so on.
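For example, since the materialized view is backed by a real table, nothing stops us from indexing it like any other table. A small sketch (the index name is hypothetical):

create index my_all_objects_aggs_idx
    on my_all_objects_aggs(owner);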

First, let's see the view in action by issuing the same query again (the query that we used to define the view itself):

tkyte@TKYTE816> set timing on
tkyte@TKYTE816> select owner, count(*)
  2    from my_all_objects
  3   group by owner;

OWNER                            COUNT(*)
------------------------------ ----------
A                                      36
B                                      24
...
TYPES                                  36

24 rows selected.

Elapsed: 00:00:00.10

tkyte@TKYTE816> set timing off

tkyte@TKYTE816> set autotrace traceonly
tkyte@TKYTE816> select owner, count(*)
  2    from my_all_objects
  3   group by owner;

24 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=24 Bytes=216)
   1    0   TABLE ACCESS (FULL) OF 'MY_ALL_OBJECTS_AGGS' (Cost=1 Card=24 Bytes=216)

Statistics
----------------------------------------------------------
          0  recursive calls
         12  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       1483  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         24  rows processed

tkyte@TKYTE816> set autotrace off

From over 3,600 consistent gets (logical I/Os), to just 12. No physical I/O this time around as the data was found in the cache. Our buffer cache will be much more efficient now as it has less to cache. I could not even begin to cache the previous query's working set, but now I can. Notice how our query plan shows we are now doing a full scan of the MY_ALL_OBJECTS_AGGS table, even though we queried the detail table MY_ALL_OBJECTS. When the SELECT OWNER, COUNT(*)... query is issued, the database automatically directs it to our materialized view.

Let's take this a step further by adding a new row to the MY_ALL_OBJECTS table, and committing the change:

tkyte@TKYTE816> insert into my_all_objects
  2  ( owner, object_name, object_type, object_id )
  3  values
  4  ( 'New Owner', 'New Name', 'New Type', 1111111 );

1 row created.

tkyte@TKYTE816> commit;

Commit complete.

Now, we issue effectively the same query again, but this time we're just looking at our newly inserted row:

tkyte@TKYTE816> set timing on
tkyte@TKYTE816> select owner, count(*)
  2    from my_all_objects
  3   where owner = 'New Owner'
  4   group by owner;

OWNER                            COUNT(*)
------------------------------ ----------
New Owner                               1

Elapsed: 00:00:00.01

tkyte@TKYTE816> set timing off

tkyte@TKYTE816> set autotrace traceonly
tkyte@TKYTE816> select owner, count(*)
  2    from my_all_objects
  3   where owner = 'New Owner'
  4   group by owner;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=9)
   1    0   TABLE ACCESS (FULL) OF 'MY_ALL_OBJECTS_AGGS' (Cost=1 Card=1 Bytes=9)

Statistics
----------------------------------------------------------
          0  recursive calls
         12  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
        430  bytes sent via SQL*Net to client
        424  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

tkyte@TKYTE816> set autotrace off

The analysis shows that we scanned the materialized view and found the new row. By specifying REFRESH ON COMMIT in our original definition of the view, we requested that Oracle maintain synchronization between the view and the details - when we update the details, the summary will be maintained as well. It cannot maintain synchronization in every case of an arbitrary materialized view, but in the case of a single table summary (as we have) or joins with no aggregation, it can.

Now, one last query:

tkyte@TKYTE816> set timing on
tkyte@TKYTE816> select count(*)
  2    from my_all_objects
  3   where owner = 'New Owner';

  COUNT(*)
----------
         1

Elapsed: 00:00:00.00

tkyte@TKYTE816> set timing off

tkyte@TKYTE816> set autotrace traceonly
tkyte@TKYTE816> select count(*)
  2    from my_all_objects
  3   where owner = 'New Owner';

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=9)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'MY_ALL_OBJECTS_AGGS' (Cost=1 Card=1 Bytes=9)

Statistics
----------------------------------------------------------
          0  recursive calls
         12  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        367  bytes sent via SQL*Net to client
        424  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

tkyte@TKYTE816> set autotrace off

We can see that Oracle is smart enough to use the view even when the query appears to be slightly different. There was no GROUP BY clause here, yet the database recognized that the materialized view could still be used. This is what makes materialized views 'magical'. The end users do not have to be aware of these summary tables. The database will realize for us that the answer already exists and, as long as we enable query rewrite (which we did), will automatically rewrite the query to use them. This feature allows you to immediately impact existing applications, without changing a single query.
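If you want to watch the rewrite happen with your own eyes, one simple technique is to toggle QUERY_REWRITE_ENABLED in your session and compare the plans. A sketch, using the example objects above:

alter session set query_rewrite_enabled=false;
set autotrace traceonly explain
select owner, count(*) from my_all_objects group by owner;
-- the plan now shows a full scan of MY_ALL_OBJECTS again

alter session set query_rewrite_enabled=true;
select owner, count(*) from my_all_objects group by owner;
-- the plan is back to scanning MY_ALL_OBJECTS_AGGS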

Uses of Materialized Views

This is relatively straightforward, and is answered in a single word - performance. By calculating the answers to the really hard questions up front (and once only), we will greatly reduce the load on our machine. We will experience:

  • Fewer physical reads - there is less data to scan through.

  • Fewer writes - we will not be sorting and aggregating as frequently.

  • Decreased CPU consumption - we will not be calculating aggregates and functions on the data, as we will have already done that.

  • Markedly faster response times - our queries will return incredibly quickly when a summary is used, as opposed to the details. This will be a function of the amount of work we can avoid by using the materialized view, but many orders of magnitude is not out of the question.

Materialized views will increase your need for one resource - permanently allocated disk. We need extra storage space to accommodate the materialized views but, for the price of a little extra disk space, we can reap a lot of benefit.

Materialized views work best in a read-only, or read-intensive, environment. They are not designed for use in a high-end OLTP environment, since they add overhead to modifications performed on the base tables in order to capture the changes. There are also concurrency issues with regard to the REFRESH ON COMMIT option. Consider our summary example from before: any row inserted into, or deleted from, the base table must update one of the 24 rows in the summary table in order to maintain the count in real time. This means that, at most, 24 sessions can commit at the same time (assuming they each affect a different owner, that is). This does not preclude the use of materialized views in an OLTP environment. For example, if you use full refreshes on a recurring basis (during off-peak time), there will be no overhead added to the modifications, and there would be no concurrency issues. This would allow you to report on yesterday's activities, for example, and not query the live OLTP data for reports.
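For example, a nightly complete refresh of the summary could be scheduled with DBMS_JOB, so the OLTP workload never pays the ON COMMIT price. A sketch only, assuming a 2 AM window is acceptable; the 'C' method requests a complete refresh:

variable jobno number

begin
    -- run a complete refresh of the summary every night at 2 AM
    dbms_job.submit( :jobno,
                     'dbms_mview.refresh( ''MY_ALL_OBJECTS_AGGS'', ''C'' );',
                     trunc(sysdate)+1+2/24,
                     'trunc(sysdate)+1+2/24' );
    commit;
end;
/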

How Materialized Views Work

Materialized views may appear to be hard to work with at first. There will be cases where you create a materialized view, and you know that the view holds the answer to a certain question but, for some reason, Oracle does not. If you dig deep enough, you'll discover why, and it always comes back to the fact that Oracle is just a piece of software, and can only work with the information provided to it. The more metadata - the more pieces of information about the underlying data - you can give to Oracle, the better. These pieces of information are mundane things that you might not even think about in a data warehouse environment, such as NOT NULL constraints, primary keys, foreign keys, and so on. The metadata provided by these keys and constraints gives the optimizer more information, and hence more of a chance to rewrite a query.

Note 

Not only do keys and constraints, such as those listed above, preserve data integrity; they also add information about the data to the data dictionary, which can be used in query rewrites - hence the term metadata. See the 'Constraints' section for further information.

In the following sections we will look at what you have to do to set up materialized views, some examples of using them, and how adding more information - more metadata - to the database will make materialized views work more often.

Setting Up

There is one mandatory INIT.ORA parameter necessary for materialized views to function, and this is the COMPATIBLE parameter. The value of COMPATIBLE should be set to 8.1.0, or above, in order for query rewrites to be functional. If this value is not set appropriately, query rewrite will not be invoked.
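You can confirm the setting from SQL*PLUS (given access to V$PARAMETER) with something like:

select name, value
  from v$parameter
 where name = 'compatible';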

There are two other relevant parameters that may be set at either the system-level (via the INIT.ORA file), or the session-level (via the ALTER SESSION command). They are:

  • QUERY_REWRITE_ENABLED - unless this is set to TRUE, query rewrite will not take place. The default value is FALSE.

  • QUERY_REWRITE_INTEGRITY - this controls how Oracle rewrites queries. It may be set to ENFORCED (queries are rewritten using only constraints and rules that are enforced and guaranteed by Oracle), TRUSTED (queries are rewritten additionally using relationships you have told Oracle about, but that are not enforced by it), or STALE_TOLERATED (queries may be rewritten even when the materialized view is out of sync with the underlying detail data).

In the example above, you saw the ALTER SESSION statements that enable this query rewrite magic. Since the example used only relationships and objects enforced by Oracle, the query rewrite integrity could be set to the highest level - ENFORCED.

I also needed to grant the QUERY REWRITE privilege to myself. Now, the account I used happened to be a DBA account, which has QUERY REWRITE, so why did I need to grant the privilege directly to myself? The reason is that you cannot create compiled stored objects, such as materialized views, stored procedures, and triggers, that rely on privileges from a role (the DBA role in this case). See Chapter 23, Invoker and Definer Rights, for a full explanation of the use of roles and compiled stored objects. If you create a materialized view with QUERY REWRITE enabled, but do not have the QUERY REWRITE system privilege yourself, you will receive the following error:

create materialized view my_all_objects_aggs
*
ERROR at line 1:
ORA-01031: insufficient privileges
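To verify that the privilege was granted to you directly, and not merely through a role, you can query USER_SYS_PRIVS, which lists only direct grants:

select privilege, admin_option
  from user_sys_privs
 where privilege = 'QUERY REWRITE';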

Internal Mechanics

So, now that we can create a materialized view and show that it works, what are the steps Oracle will undertake to rewrite our queries? Normally, when QUERY_REWRITE_ENABLED is set to FALSE, Oracle will take your SQL as is, parse it, and optimize it. With query rewrites enabled, Oracle will insert an extra step into this process. After parsing, Oracle will attempt to rewrite the query to access some materialized view, instead of the actual table that it references. If it can perform a query rewrite, the rewritten query (or queries) is parsed and then optimized along with the original query. The query plan with the lowest cost from this set is chosen for execution. If it cannot rewrite the query, the original parsed query is optimized and executed as normal.
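For testing, the REWRITE and NOREWRITE hints let you force the issue for a single statement - handy when comparing the rewritten and unrewritten plans side by side. A sketch, using our earlier example:

-- bypass any materialized views for this one statement
select /*+ NOREWRITE */ owner, count(*)
  from my_all_objects
 group by owner;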

Query Rewrite

When query rewrite is enabled, Oracle will use the following steps to try to rewrite a query with a materialized view.

Full Exact Text Match

In this method, Oracle considers possible exact string matches in the set of available materialized views found in the data dictionary. In the example above, this is the method Oracle would have used for the very first query that used the materialized view. The algorithm used is 'friendlier' (more flexible) than a shared pool comparison (which demands an exact byte-for-byte match) as it ignores whitespace, case of characters, and other formatting.
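So, for example, both of the following statements would be full exact text matches against the view we defined earlier, despite the differences in case and whitespace:

select owner, count(*) from my_all_objects group by owner;

SELECT   OWNER,
         COUNT(*)
FROM     MY_ALL_OBJECTS
GROUP BY OWNER;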

Partial Text Match

Starting with the FROM clause, the optimizer compares the remaining text of the query against the remaining text of the materialized view's defining query. This allows for mismatches between items in the SELECT list. If the data you need can be generated from the materialized view (that is, if your SELECT list can be satisfied), Oracle will rewrite the query using the materialized view. The query SELECT LOWER(OWNER) FROM MY_ALL_OBJECTS GROUP BY OWNER; would be an example of a partial text match.

General Query Rewrite Methods

These enable the use of a materialized view even if it contains only part of the data, more data than requested, or data that can be converted. The optimizer tests the materialized view's definition against the individual components of the query (SELECT, FROM, WHERE, GROUP BY) to find a match. The checks that Oracle performs against these components are:

  • Join compatibility - the joins in the query must be compatible with the joins in the materialized view (demonstrated below).

  • Data sufficiency - every column of data the query requires must be obtainable from the materialized view.

  • Grouping compatibility - discussed in its own section below.

  • Aggregate compatibility - discussed in its own section below.

We can look at an example of join compatibility using MY_ALL_OBJECTS and the following tables:

tkyte@TKYTE816> create table t1 ( owner varchar2(30), flag char(1) );

Table created.

tkyte@TKYTE816> create table t2 ( object_type varchar2(30), flag char(1) );

Table created.

The following query is join-compatible with the materialized view - the query can and will be rewritten using the materialized view:

tkyte@TKYTE816> select a.owner, count(*), b.owner
  2    from my_all_objects a, t1 b
  3   where a.owner = b.owner
  4     and b.flag is not null
  5   group by a.owner, b.owner
  6  /

The database can see that using our materialized view, or the actual base table would result in the same answer. However, the following query, while similar, is not join-compatible:

tkyte@TKYTE816> select a.owner, count(*), b.object_type
  2    from my_all_objects a, t2 b
  3   where a.object_type = b.object_type
  4     and b.flag is not null
  5   group by a.owner, b.object_type
  6  /

The OBJECT_TYPE column is not in our materialized view, so Oracle cannot rewrite this query using the materialized view.

Grouping Compatibility

This is required if both the materialized view and the query contain a GROUP BY clause. If the materialized view is grouped at the same level, or is grouped at a higher level of detail than is needed, the query will be rewritten to use the materialized view. The query SELECT COUNT(*) FROM MY_ALL_OBJECTS GROUP BY 1; applied against our first example would be a case where the materialized view is grouped at a higher level of detail than is needed. The database can rewrite this query to use our materialized view even though the grouping for the query is not the same as the grouping for the materialized view.

Aggregate Compatibility

This is required if both query and materialized view contain aggregates. It will ensure that the materialized view can satisfy the required aggregates. It can perform some interesting rewrites in some cases. For example, it will recognize that AVG(X) is the same as SUM(X)/COUNT(X), so a query that requires AVG(X) can be satisfied by a materialized view with the SUM and COUNT.
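As a sketch of that AVG rewrite: a view that stores only the SUM and COUNT (the view name below is hypothetical) can answer an AVG query, since Oracle knows AVG(X) = SUM(X)/COUNT(X):

create materialized view my_all_objects_stats
enable query rewrite
as
select owner,
       sum(object_id)   sum_object_id,
       count(object_id) cnt_object_id
  from my_all_objects
 group by owner
/

-- this can be rewritten as SUM_OBJECT_ID / CNT_OBJECT_ID from the view
select owner, avg(object_id)
  from my_all_objects
 group by owner;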

In many cases, simple application of the above rules will allow Oracle to rewrite a query to use a materialized view. In other cases (as we will see in an example below), the database will need a little more help from you. You will need to give it additional information in order for it to recognize that it can use a materialized view to answer a question.

Making sure your View gets used

In this section, we'll look at ways to do this, first by using constraints to help us make use of a query rewrite, and then by using dimensions, which are a means of describing complex relationships - hierarchies of data.

Constraints

I've been asked in the past, 'Why should I use a primary key? Why not just use a unique index?' Well, the answer is that you could, but doesn't the fact you used a primary key say something over and above just using a unique index? In fact it does - it can say a lot. The same goes for the use of foreign keys, NOT NULL constraints and others. Not only do they protect the data, but they also add information about the data into the data dictionary. Using this additional information, Oracle is able to perform a query rewrite more often, in many complex cases.

Consider the following small example. We will copy the EMP and DEPT tables from the SCOTT schema, and create a materialized view that pre-joins the tables together. This materialized view differs from our first example in that it is a REFRESH ON DEMAND materialized view. This means that in order for changes to be applied to it, we will have to refresh it manually:

tkyte@TKYTE816> create table emp as select * from scott.emp;

Table created.

tkyte@TKYTE816> create table dept as select * from scott.dept;

Table created.

tkyte@TKYTE816> alter session set query_rewrite_enabled=true;

Session altered.

tkyte@TKYTE816> alter session set query_rewrite_integrity=enforced;

Session altered.

tkyte@TKYTE816> create materialized view emp_dept
  2  build immediate
  3  refresh on demand
  4  enable query rewrite
  5  as
  6  select dept.deptno, dept.dname, count (*)
  7    from emp, dept
  8   where emp.deptno = dept.deptno
  9   group by dept.deptno, dept.dname
 10  /

Materialized view created.

tkyte@TKYTE816> alter session set optimizer_goal=all_rows;

Session altered.

Since the underlying tables and the resulting materialized view are very small, we force the use of the cost-based optimizer using the ALTER SESSION command, instead of analyzing tables as we would normally. If Oracle knew how small these tables were, it would not do some of the optimizations we would like. Using the 'default' statistics, it will behave as if the tables were fairly large.

Now, we have withheld a lot of information from Oracle here. It does not understand the relationship between EMP and DEPT, does not know which columns are primary keys, and so on. Now, let's run a query, and see what happens:

tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select count(*) from emp;

  COUNT(*)
----------
        14

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'EMP' (Cost=1 Card=82)

The query has been directed at the underlying EMP table. Now, you and I know that the COUNT(*) query could easily, and more efficiently (especially if the number of employees in each department was large, and there were lots of departments), have been answered from the materialized view. There, we have all of the information we need to get the count of employees. We know this because we are aware of things about the data that we kept from Oracle:

  • DEPTNO is the primary key of DEPT - each EMP row will join to at most one DEPT row.

  • DEPTNO in EMP is a foreign key to DEPTNO in DEPT - every DEPTNO value in EMP has a matching row in DEPT.

  • DEPTNO in EMP is NOT NULL - together with the foreign key, this guarantees every EMP row actually joins to a DEPT row.

These three facts imply that if we join EMP to DEPT, each EMP row will be observed in the resultset at least once and at most once. Since we never told Oracle these facts, it was not able to make use of the materialized view. So, let's make Oracle aware of them:

tkyte@TKYTE816> alter table dept
  2  add constraint dept_pk primary key(deptno);

Table altered.

tkyte@TKYTE816> alter table emp
  2  add constraint emp_fk_dept
  3  foreign key(deptno) references dept(deptno);

Table altered.

tkyte@TKYTE816> alter table emp modify deptno not null;

Table altered.

tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select count(*) from emp;

  COUNT(*)
----------
        14

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1 Bytes=13)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'EMP_DEPT' (Cost=1 Card=82 Bytes=1066)

Now Oracle is able to rewrite the query using the EMP_DEPT materialized view. Any time that you know Oracle could use a materialized view, but it is not doing so (and you have verified that you can use materialized views in general), take a closer look at the data and ask yourself 'What piece of information have I withheld from Oracle?' Nine times out of ten, you'll find a missing piece of metadata that, when included, allows Oracle to rewrite the query.

So, what happens if this is a true data warehouse, and there are tens of millions of records in the above tables? You don't really want the additional effort of verifying a foreign key relationship - you already did that in your data scrubbing routine, didn't you? In this case, you can create a non-validated constraint, one that is used to inform the database about a relationship, but it has not been validated by the database itself. Let's look at the above example again, but this time we'll simulate the loading of data into an existing data warehouse (our example above is our data warehouse). We'll drop our constraints, load the data, refresh the materialized views, and add our constraints back. We'll start with dropping the constraints:

tkyte@TKYTE816> alter table emp drop constraint emp_fk_dept;

Table altered.

tkyte@TKYTE816> alter table dept drop constraint dept_pk;

Table altered.

tkyte@TKYTE816> alter table emp modify deptno null;

Table altered.

Now, in order to simulate our load, I will insert a single new row into EMP (not much of a load, I know, but enough for demonstration purposes). Then, we will refresh our materialized view and tell Oracle to consider it as FRESH:

tkyte@TKYTE816> insert into emp (empno,deptno) values ( 1, 1 );

1 row created.

tkyte@TKYTE816> exec dbms_mview.refresh( 'EMP_DEPT' );

PL/SQL procedure successfully completed.

tkyte@TKYTE816> alter materialized view emp_dept consider fresh;

Materialized view altered.

Now we tell Oracle about the relationships between EMP and DEPT:

tkyte@TKYTE816> alter table dept
  2  add constraint dept_pk primary key(deptno)
  3  rely enable NOVALIDATE
  4  /

Table altered.

tkyte@TKYTE816> alter table emp
  2  add constraint emp_fk_dept
  3  foreign key(deptno) references dept(deptno)
  4  rely enable NOVALIDATE
  5  /

Table altered.

tkyte@TKYTE816> alter table emp modify deptno not null NOVALIDATE;

Table altered.

So here we have told Oracle that there is a foreign key from EMP to DEPT as before. However this time, because we will have scrubbed our data prior to loading it into the warehouse, we tell Oracle not to perform any validating checks. The NOVALIDATE option bypasses the checking of existing data we loaded, and RELY tells Oracle to trust the integrity of the data. Basically, we have told Oracle to trust that if it joins EMP to DEPT by DEPTNO, every row in EMP will be retrieved at least once, and at most once.
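You can see how these declarations were recorded in the dictionary; USER_CONSTRAINTS carries STATUS, VALIDATED, and RELY columns reflecting exactly what we asked for:

select constraint_name, constraint_type, status, validated, rely
  from user_constraints
 where table_name in ( 'EMP', 'DEPT' );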

In fact, in this case, we have 'lied' to the database. We have inserted a row into EMP that has no corresponding row in DEPT. We are now ready to query:

tkyte@TKYTE816> alter session set query_rewrite_integrity=enforced;

Session altered.

tkyte@TKYTE816> select count(*) from emp;

  COUNT(*)
----------
        15

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'EMP' (Cost=1 Card=164)

Since we set QUERY_REWRITE_INTEGRITY=ENFORCED Oracle did not rewrite the query to use the materialized view. We must go down a level in query integrity. We need Oracle to 'trust' us:

tkyte@TKYTE816> alter session set query_rewrite_integrity=trusted;

Session altered.

tkyte@TKYTE816> select count(*) from emp;

  COUNT(*)
----------
        14

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1 Bytes=13)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'EMP_DEPT' (Cost=1 Card=82 Bytes=1066)

In this case, Oracle did in fact rewrite the query, but the side effect is that our newly inserted row has not been counted. The 'wrong' answer is returned because the 'fact' that each row in EMP should be preserved in a join to DEPT is not a fact, given the data we loaded. When the materialized view refreshed, it did not get the newly added EMP row. The data we told Oracle to rely on was not reliable. This demonstration highlights two important points:

  • You can use RELY and NOVALIDATE constraints, together with the TRUSTED integrity level, to gain the benefits of query rewrite in a data warehouse without paying the price of validating the constraints after every load.

  • You must be sure your data really is clean. If you tell Oracle to trust relationships that do not actually hold, rewritten queries can silently return wrong answers - garbage in, garbage out.

Dimensions

Use of dimensions is another method by which we can give even more information to Oracle. Suppose we have a table of details giving a transaction date and a customer ID. The transaction date points to another table that gives full details of what month the transaction date was in, what quarter of your fiscal year it represents, what fiscal year it was in, and so on. Now, suppose you created a materialized view that stored aggregated sales information at the monthly level. Can Oracle use that view to answer a query for sales data for a particular quarter or year? Well, we know that transaction date implies month, month implies quarter, and quarter implies year, so the answer is that it can, but Oracle doesn't know this (yet), so although it can, it won't.

Using a database object called a DIMENSION, we can alert Oracle to these facts so that it will use them to rewrite queries in more cases. A dimension declares a parent/child relationship between pairs of columns. We can use it to describe to Oracle that, within a row of a table, the MONTH column implies the value you'll find in the QTR column, the QTR column implies the value you'll find in the YEAR column, and so on. Using a dimension, we can have a materialized view that has fewer details than the detailed records (summarized to the monthly level perhaps). This may still be at a higher level of aggregation than the query requests (the query wants data by quarter, say), but Oracle will recognize that it can use the materialized view to get the answer.

Here is a simple example. We will set up a SALES table to store the transaction date, a customer ID, and the total number of sales. This table will have about 350000 rows in it. Another table, TIME_HIERARCHY, will store the mapping of transaction date to month, to quarter, to year. If we join the two together, we can obtain aggregate sales by month, quarter, year, and so on. Likewise, if we had a table that mapped a customer ID to a zip code, and the zip code to a region, we could easily join this table to SALES, and aggregate by zip code or region.

In a conventional database schema (one without materialized views and other structures) these operations would succeed, but they would be very slow. For every row in the sales data, we would have to perform an indexed read into the lookup table to convert either the transaction date or customer ID into some other value (a NESTED LOOP JOIN) in order to group by this other value. Enter the materialized view. We can store a summarized rollup of the details, perhaps at the monthly level for the transaction date, and at the zip code level for the customer information. Now, rolling up to quarters or by region becomes a very fast operation.

We'll start by creating the SALES table, and load it with some random test data, generated using the ALL_OBJECTS view.

tkyte@TKYTE816> create table sales
  2  (trans_date date, cust_id int, sales_amount number );

Table created.

tkyte@TKYTE816> insert /*+ APPEND */ into sales
  2  select trunc(sysdate,'year')+mod(rownum,366) TRANS_DATE,
  3         mod(rownum,100) CUST_ID,
  4         abs(dbms_random.random)/100 SALES_AMOUNT
  5    from all_objects
  6  /

21921 rows created.

tkyte@TKYTE816> commit;

Commit complete.

These details will represent one year's worth of data. I set up the TRANS_DATE to simply be the first day of this year plus a number between 0 and 365, and the CUST_ID to be a number between 0 and 99. The SALES_AMOUNT is some typically 'large' number (it was a really good year).

My ALL_OBJECTS view contains about 22000 rows, so after four inserts that consecutively double the size of the table, we'll have about 350000 records. I am using the /*+ APPEND */ hint simply to avoid the redo log that would otherwise be generated by these large inserts:

tkyte@TKYTE816> begin
  2      for i in 1 .. 4
  3      loop
  4          insert /*+ APPEND */ into sales
  5          select trans_date, cust_id, abs(dbms_random.random)/100
  6            from sales;
  7          commit;
  8      end loop;
  9  end;
 10  /

PL/SQL procedure successfully completed.

tkyte@TKYTE816> select count(*) from sales;

  COUNT(*)
----------
    350736

Now we need to set up our TIME_HIERARCHY table, to roll up the date field by month, year, quarter, and so on:

tkyte@TKYTE816> create table time_hierarchy
  2  (day primary key, mmyyyy, mon_yyyy, qtr_yyyy, yyyy)
  3  organization index
  4  as
  5  select distinct
  6     trans_date    DAY,
  7     cast (to_char(trans_date,'mmyyyy') as number) MMYYYY,
  8     to_char(trans_date,'mon-yyyy') MON_YYYY,
  9     'Q' || ceil( to_char(trans_date,'mm')/3) || ' FY'
 10         || to_char(trans_date,'yyyy') QTR_YYYY,
 11     cast( to_char( trans_date, 'yyyy' ) as number ) YYYY
 12    from sales
 13  /

Table created.

In this case, it was simple enough. We generated:

  • DAY - the transaction date itself, which is the primary key.

  • MMYYYY - the month and year, stored as a number.

  • MON_YYYY - the month and year, stored as a string.

  • QTR_YYYY - the quarter and year.

  • YYYY - the year, stored as a number.

In general, the computations required to create this table could be much more complex. For example, fiscal year quarters, typically, are not so easily computed, and neither are fiscal years. They do not generally follow the calendar year.
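As a hedged illustration only (assuming a fiscal year that begins on April 1), one common trick is to shift the date before formatting it, so that TO_CHAR yields the fiscal year and quarter directly:

select trans_date,
       to_char( add_months( trans_date, 9 ), 'yyyy' ) fiscal_year,
       'Q' || ceil( to_char( add_months( trans_date, 9 ), 'mm' ) / 3 ) fiscal_qtr
  from sales
 where rownum <= 5;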

We'll now create the materialized view, SALES_MV. The summary we are creating rolls the data up from individual days to months. We would expect our materialized view to have about 1/30 the number of rows of our SALES table, if the data was evenly distributed:

tkyte@TKYTE816> analyze table sales compute statistics;

Table analyzed.

tkyte@TKYTE816> analyze table time_hierarchy compute statistics;

Table analyzed.

tkyte@TKYTE816> create materialized view sales_mv
  2  build immediate
  3  refresh on demand
  4  enable query rewrite
  5  as
  6  select sales.cust_id, sum(sales.sales_amount) sales_amount,
  7         time_hierarchy.mmyyyy
  8    from sales, time_hierarchy
  9   where sales.trans_date = time_hierarchy.day
 10   group by sales.cust_id, time_hierarchy.mmyyyy
 11  /

Materialized view created.

tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select time_hierarchy.mmyyyy, sum(sales_amount)
  2    from sales, time_hierarchy
  3   where sales.trans_date = time_hierarchy.day
  4   group by time_hierarchy.mmyyyy
  5  /

    MMYYYY SUM(SALES_AMOUNT)
---------- -----------------
     12001        3.2177E+11
     12002        1.0200E+10
     22001        2.8848E+11
     32001        3.1944E+11
     42001        3.1012E+11
     52001        3.2066E+11
     62001        3.0794E+11
     72001        3.1796E+11
     82001        3.2176E+11
     92001        3.0859E+11
    102001        3.1868E+11
    112001        3.0763E+11
    122001        3.1305E+11

13 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=327 Bytes=8502)
   1    0   SORT (GROUP BY) (Cost=4 Card=327 Bytes=8502)
   2    1     TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=1 Card=327 Bytes

So far, so good - Oracle rewrote the query to use the view, SALES_MV. However, let's see what happens if we issue a query that calls for a higher level of aggregation:

tkyte@TKYTE816> set timing on
tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select time_hierarchy.qtr_yyyy, sum(sales_amount)
  2   from sales, time_hierarchy
  3  where sales.trans_date = time_hierarchy.day
  4  group by time_hierarchy.qtr_yyyy
  5  /

QTR_YYYY                                         SUM(SALES_AMOUNT)
------------------------------------------------ -----------------
Q1 FY2001                                               9.2969E+11
Q1 FY2002                                               1.0200E+10
Q2 FY2001                                               9.3872E+11
Q3 FY2001                                               9.4832E+11
Q4 FY2001                                               9.3936E+11

Elapsed: 00:00:05.58

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8289 Card=5 Bytes=14
   1    0   SORT (GROUP BY) (Cost=8289 Card=5 Bytes=145)
   2    1     NESTED LOOPS (Cost=169 Card=350736 Bytes=10171344)
   3    2       TABLE ACCESS (FULL) OF 'SALES' (Cost=169 Card=350736 B
   4    2       INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_30180' (UNIQUE)

Statistics
----------------------------------------------------------
          0  recursive calls
         15  db block gets
     351853  consistent gets
...

We see that Oracle doesn't have the knowledge we have. It does not yet know that it could have used the materialized view to answer this particular query, so it used the original SALES table instead, and had to do a lot of work to get the answer. The same thing would happen if we requested data aggregated by fiscal year.

So, let's use a DIMENSION to alert Oracle to the fact that the materialized view would be useful in answering this question. First, we'll create the DIMENSION:

tkyte@TKYTE816> create dimension time_hierarchy_dim
  2          level day      is time_hierarchy.day
  3          level mmyyyy   is time_hierarchy.mmyyyy
  4          level qtr_yyyy is time_hierarchy.qtr_yyyy
  5          level yyyy     is time_hierarchy.yyyy
  6  hierarchy time_rollup
  7  (
  8   day child of
  9   mmyyyy child of
 10   qtr_yyyy child of
 11   yyyy
 12  )
 13  attribute mmyyyy
 14  determines mon_yyyy;

Dimension created.

This tells Oracle that the DAY column of the TIME_HIERARCHY table implies MMYYYY, which in turn implies QTR_YYYY. Finally, QTR_YYYY implies YYYY. Also stated is the fact that MMYYYY and MON_YYYY are synonymous - there is a one-to-one mapping between the two. So, any time Oracle sees MON_YYYY used, it understands it as if MMYYYY was used. Now that Oracle has a greater understanding of the relationships between the data we can see a marked improvement in our query response times:

tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select time_hierarchy.qtr_yyyy, sum(sales_amount)
  2   from sales, time_hierarchy
  3  where sales.trans_date = time_hierarchy.day
  4  group by time_hierarchy.qtr_yyyy
  5  /

QTR_YYYY                                         SUM(SALES_AMOUNT)
------------------------------------------------ -----------------
Q1 FY2001                                               9.2969E+11
Q1 FY2002                                               1.0200E+10
Q2 FY2001                                               9.3872E+11
Q3 FY2001                                               9.4832E+11
Q4 FY2001                                               9.3936E+11

Elapsed: 00:00:00.20

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=5 Bytes=195)
   1    0   SORT (GROUP BY) (Cost=7 Card=5 Bytes=195)
   2    1     HASH JOIN (Cost=6 Card=150 Bytes=5850)
   3    2       VIEW (Cost=4 Card=46 Bytes=598)
   4    3         SORT (UNIQUE) (Cost=4 Card=46 Bytes=598)
   5    4           INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_30180' (UNI
   6    2       TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=1 Card=327 Byt

Statistics
----------------------------------------------------------
          0  recursive calls
         16  db block gets
         12  consistent gets
...

Well, we went from more than 350000 logical reads to 12 - not too bad at all. If you run this example, you'll be able to see the difference. The first query took a while (about six seconds), the answer to the second query was on the screen before our hands left the Enter key (about a fifth of a second).
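Note too that, because of the ATTRIBUTE MMYYYY DETERMINES MON_YYYY clause, a query phrased in terms of MON_YYYY - a column that is not even stored in SALES_MV - should also be rewritten against the view. A sketch:

select time_hierarchy.mon_yyyy, sum(sales_amount)
  from sales, time_hierarchy
 where sales.trans_date = time_hierarchy.day
 group by time_hierarchy.mon_yyyy
/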

We can use this DIMENSION feature many times on the same base fact table. Consider if we assign a ZIP_CODE and REGION attribute to every customer in our database:

tkyte@TKYTE816> create table customer_hierarchy
  2  ( cust_id primary key, zip_code, region )
  3  organization index
  4  as
  5  select cust_id,
  6     mod( rownum, 6 ) || to_char(mod( rownum, 1000 ), 'fm0000') zip_code,
  7     mod( rownum, 6 ) region
  8    from ( select distinct cust_id from sales)
  9  /

Table created.

tkyte@TKYTE816> analyze table customer_hierarchy compute statistics;

Table analyzed.

Next, we recreate our materialized view to be a summary that shows us SALES_AMOUNT by ZIP_CODE and MMYYYY:

tkyte@TKYTE816> drop materialized view sales_mv;

Materialized view dropped.

tkyte@TKYTE816> create materialized view sales_mv
  2  build immediate
  3  refresh on demand
  4  enable query rewrite
  5  as
  6  select customer_hierarchy.zip_code,
  7         time_hierarchy.mmyyyy,
  8         sum(sales.sales_amount) sales_amount
  9    from sales, time_hierarchy, customer_hierarchy
 10   where sales.trans_date = time_hierarchy.day
 11     and sales.cust_id = customer_hierarchy.cust_id
 12   group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy
 13  /

Materialized view created.

Now we'll execute a query that shows sales by ZIP_CODE and MMYYYY, and demonstrates that the materialized view is used as expected:

tkyte@TKYTE816> set autotrace traceonly
tkyte@TKYTE816> select customer_hierarchy.zip_code,
  2         time_hierarchy.mmyyyy,
  3         sum(sales.sales_amount) sales_amount
  4    from sales, time_hierarchy, customer_hierarchy
  5   where sales.trans_date = time_hierarchy.day
  6     and sales.cust_id = customer_hierarchy.cust_id
  7   group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy
  8  /

1250 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=409 Bytes=204
   1    0   TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=1 Card=409 Bytes=2

Statistics
----------------------------------------------------------
         28  recursive calls
         12  db block gets
        120  consistent gets
...

However, when we ask for information at a different level of aggregation (rolling MMYYYY up to YYYY and ZIP_CODE up to REGION), we see that Oracle does not recognize that it can use the materialized view:

tkyte@TKYTE816> select customer_hierarchy.region,
  2         time_hierarchy.yyyy,
  3         sum(sales.sales_amount) sales_amount
  4    from sales, time_hierarchy, customer_hierarchy
  5   where sales.trans_date = time_hierarchy.day
  6     and sales.cust_id = customer_hierarchy.cust_id
  7   group by customer_hierarchy.region, time_hierarchy.yyyy
  8  /

9 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8289 Card=9 Bytes=26
   1    0   SORT (GROUP BY) (Cost=8289 Card=9 Bytes=261)
   2    1     NESTED LOOPS (Cost=169 Card=350736 Bytes=10171344)
   3    2       NESTED LOOPS (Cost=169 Card=350736 Bytes=6663984)
   4    3         TABLE ACCESS (FULL) OF 'SALES' (Cost=169 Card=350736
   5    3         INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_30185' (UNIQUE)
   6    2       INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_30180' (UNIQUE)

Statistics
----------------------------------------------------------
          0  recursive calls
         15  db block gets
     702589  consistent gets
...

Oracle understands the time dimension we set up, but it does not yet have any information about how CUST_ID, ZIP_CODE, and REGION relate to each other in our CUSTOMER_HIERARCHY table. What we do to correct this is to rebuild our dimension with two hierarchies in it - one describing the TIME_HIERARCHY and the other describing the CUSTOMER_HIERARCHY table:

tkyte@TKYTE816> drop dimension time_hierarchy_dim
  2  /

Dimension dropped.

tkyte@TKYTE816> create dimension sales_dimension
  2      level cust_id   is customer_hierarchy.cust_id
  3      level zip_code  is customer_hierarchy.zip_code
  4      level region    is customer_hierarchy.region
  5      level day       is time_hierarchy.day
  6      level mmyyyy    is time_hierarchy.mmyyyy
  7      level qtr_yyyy  is time_hierarchy.qtr_yyyy
  8      level yyyy      is time_hierarchy.yyyy
  9  hierarchy cust_rollup
 10  (
 11      cust_id child of
 12      zip_code child of
 13      region
 14  )
 15  hierarchy time_rollup
 16  (
 17      day child of
 18      mmyyyy child of
 19      qtr_yyyy child of
 20      yyyy
 21  )
 22  attribute mmyyyy
 23  determines mon_yyyy;

Dimension created.

We dropped the original time hierarchy and created a new, more descriptive one explaining all of the relevant relationships. Now Oracle will understand that the SALES_MV we created is able to answer many more questions. For example, if we ask our 'REGION by YYYY' question again:

tkyte@TKYTE816> select customer_hierarchy.region,
  2         time_hierarchy.yyyy,
  3         sum(sales.sales_amount) sales_amount
  4    from sales, time_hierarchy, customer_hierarchy
  5   where sales.trans_date = time_hierarchy.day
  6     and sales.cust_id = customer_hierarchy.cust_id
  7   group by customer_hierarchy.region, time_hierarchy.yyyy
  8  /

    REGION       YYYY SALES_AMOUNT
---------- ---------- ------------
         0       2001   5.9598E+11
         0       2002   3123737106
         1       2001   6.3789E+11
         2       2001   6.3903E+11
         2       2002   3538489159
         3       2001   6.4069E+11
         4       2001   6.3885E+11
         4       2002   3537548948
         5       2001   6.0365E+11

9 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=11 Card=9 Bytes=576)
   1    0   SORT (GROUP BY) (Cost=11 Card=9 Bytes=576)
   2    1     HASH JOIN (Cost=9 Card=78 Bytes=4992)
   3    2       HASH JOIN (Cost=6 Card=78 Bytes=4446)
   4    3         VIEW (Cost=3 Card=19 Bytes=133)
   5    4           SORT (UNIQUE) (Cost=3 Card=19 Bytes=133)
   6    5             INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_30180' (U
   7    3         TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=1 Card=409 B
   8    2       VIEW (Cost=3 Card=100 Bytes=700)
   9    8         SORT (UNIQUE) (Cost=3 Card=100 Bytes=700)
  10    9           INDEX (FULL SCAN) OF 'SYS_IOT_TOP_30185' (UNIQUE)

Statistics
----------------------------------------------------------
          0  recursive calls
         16  db block gets
         14  consistent gets
...

Oracle was able to make use of both hierarchies in the dimension here, and can now make use of the materialized view. Due to the dimensions we created, it performed simple lookups to convert the CUST_ID column into REGION (since CUST_ID implies ZIP_CODE implies REGION), the MMYYYY column into QTR_YYYY, and answered our question almost instantly. Here, we reduced the number of logical I/Os from more than 700,000 to 16. When you consider that the size of the SALES table will only grow over time, and the size of the SALES_MV will grow much more slowly (180 records or so per month), we can see that this query will scale very well.

DBMS_OLAP

The last piece of the materialized view puzzle is the DBMS_OLAP package. This package is used for the following purposes:

  • Estimating the size of a materialized view, in terms of the number of rows and bytes of storage.

  • Validating that your dimension objects are correct, given the data in your tables.

  • Recommending additional materialized views, and identifying existing ones to drop.

  • Evaluating the utilization of existing materialized views.

Unfortunately, the utilization routines are beyond the scope of what I can cover in one chapter; they involve setting up Oracle Trace and the Enterprise Manager Performance Pack. We shall take a look at the other three, however.

In order to use the DBMS_OLAP package, you must have external procedures set up, as most of the DBMS_OLAP code is actually stored in a C library. See Chapter 18, C-Based External Procedures, for set-up instructions if you receive an error such as the following:

ERROR at line 1:
ORA-28575: unable to open RPC connection to external procedure agent
ORA-06512: at "SYS.DBMS_SUMADV", line 6
ORA-06512: at "SYS.DBMS_SUMMARY", line 559
ORA-06512: at line 1

Estimating Size

The ESTIMATE_SUMMARY_SIZE routine will report the estimated number of rows and bytes of storage that the materialized view will consume. Since hindsight is 20/20, we can ask DBMS_OLAP to estimate a figure and then compare that figure to what we get in reality.

In order to run this procedure, you will need to ensure that you have a PLAN_TABLE installed in your schema. You will find the CREATE TABLE statement in the [ORACLE_HOME]/rdbms/admin directory on your database server, in a file named utlxplan.sql. Executing this script will create the PLAN_TABLE for you. This table is used by the EXPLAIN PLAN facility, which DBMS_OLAP in turn uses to estimate the size of the materialized view. With this table in place, we can use the built-in ESTIMATE_SUMMARY_SIZE routine to get an estimate of the number of rows and bytes a materialized view would need if we were to build it. I start with a DELETE STATISTICS on our SALES_MV materialized view: normally, DBMS_OLAP would not have an existing materialized view to inspect, so we have to hide ours from it (otherwise DBMS_OLAP will simply get the exact answer from the data dictionary):

tkyte@TKYTE816> analyze table sales_mv DELETE statistics;

Table analyzed.

tkyte@TKYTE816> declare
  2      num_rows number;
  3      num_bytes number;
  4  begin
  5      dbms_olap.estimate_summary_size
  6      ( 'SALES_MV_ESTIMATE',
  7         'select customer_hierarchy.zip_code,
  8                 time_hierarchy.mmyyyy,
  9                 sum(sales.sales_amount) sales_amount
 10            from sales, time_hierarchy, customer_hierarchy
 11           where sales.trans_date = time_hierarchy.day
 12             and sales.cust_id = customer_hierarchy.cust_id
 13           group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy',
 14        num_rows,
 15        num_bytes );
 16
 17      dbms_output.put_line( num_rows || ' rows' );
 18      dbms_output.put_line( num_bytes || ' bytes' );
 19  end;
 20  /
409 rows
36401 bytes

PL/SQL procedure successfully completed.

The first parameter to this routine is the name under which the plan will be stored in the plan table. The name itself is not especially relevant, except that you will want to DELETE FROM PLAN_TABLE WHERE STATEMENT_ID = 'SALES_MV_ESTIMATE' after you are done (a clean-up sketch follows below). The second parameter is the query that would be used to instantiate the materialized view; DBMS_OLAP analyzes this query, using the statistics on the underlying tables, to guess the size of the object. The remaining two parameters are the outputs from DBMS_OLAP - the estimated row and byte counts, with values of 409 and 36401, respectively. Now let's compute the true values:

tkyte@TKYTE816> analyze table sales_mv COMPUTE statistics;

Table analyzed.

tkyte@TKYTE816> select count(*) from sales_mv;

  COUNT(*)
----------
      1250

tkyte@TKYTE816> select blocks * 8 * 1024
  2    from user_tables
  3   where table_name = 'SALES_MV'
  4  /

BLOCKS*8*1024
-------------
        40960

So, the ESTIMATE_SUMMARY_SIZE routine did very well on the size of the table, but underestimated the number of rows. This is typical of anything that 'estimates' - it will get some things right, and miss on others. I would use this routine for a rough 'best guess' as to the size of an object.
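As promised above, we finish with the plan table clean-up - a minimal sketch:

delete from plan_table
 where statement_id = 'SALES_MV_ESTIMATE';

commit;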

Dimension Validation

This routine takes a given dimension and checks that the hierarchies you have defined in it are valid. In the schema we built earlier, for example, it would verify that a CUST_ID implied a ZIP_CODE, which implied a REGION. To see this routine at work, we'll build a 'bad' example to work from. We'll start by creating a table that has a row for every day of this year, with the day, month, and year as attributes:

tkyte@TKYTE816> create table time_rollup
  2  ( day      date,
  3    mon      number,
  4    year     number
  5  )
  6  /

Table created.

tkyte@TKYTE816> insert into time_rollup
  2  select dt, to_char(dt,'mm'), to_char(dt,'yyyy')
  3    from ( select trunc(sysdate,'year')+rownum-1 dt
  4             from all_objects where rownum < 366 )
  5  /

365 rows created.

So, here we have set up a time roll-up similar to our previous example. This time, however, I did not preserve the year in the month attribute - just the two digits that represent the month. Watch what happens when we add one more row to this table:

tkyte@TKYTE816> insert into time_rollup values
  2  ( add_months(sysdate,12),
  3    to_char(add_months(sysdate,12),'mm'),
  4    to_char(add_months(sysdate,12),'yyyy') );

1 row created.

We can see the problem: we will be saying that DAY implies MON, and MON implies YEAR, yet that is no longer true - we now have one MON value that implies two different years. DBMS_OLAP will do a sanity check for us and show us our error.
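In fact, before involving DBMS_OLAP at all, an ordinary query can expose the offending month (this query is my own illustration, not part of the DBMS_OLAP API):

select mon, count(distinct year) years
  from time_rollup
 group by mon
having count(distinct year) > 1;

First, we'll set up the dimension: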

tkyte@TKYTE816> create dimension time_rollup_dim
  2     level day is time_rollup.day
  3     level mon is time_rollup.mon
  4     level year is time_rollup.year
  5  hierarchy time_rollup
  6  (
  7          day child of mon child of year
  8  )
  9  /

Dimension created.

And then validate it:

tkyte@TKYTE816> exec dbms_olap.validate_dimension( 'time_rollup_dim', user, false, false );

PL/SQL procedure successfully completed.

It looks like it succeeded, but we really have to check the table it creates and populates for us:

tkyte@TKYTE816> select * from mview$_exceptions;

OWNER TABLE_NAME   DIMENSION_NAME   RELATIONSHI BAD_ROWID
----- ------------ ---------------- ----------- ------------------
TKYTE TIME_ROLLUP  TIME_ROLLUP_DIM  CHILD OF    AAAGkxAAGAAAAcKAA7
TKYTE TIME_ROLLUP  TIME_ROLLUP_DIM  CHILD OF    AAAGkxAAGAAAAcKAA8
TKYTE TIME_ROLLUP  TIME_ROLLUP_DIM  CHILD OF    AAAGkxAAGAAAAcKAA9
...

32 rows selected.

If we look at the rows that MVIEW$_EXCEPTIONS points us to, we'll find that they are the rows for the month of MARCH (I ran this in March). Specifically:

tkyte@TKYTE816> select * from time_rollup
  2  where rowid in ( select bad_rowid from mview$_exceptions );

DAY              MON       YEAR
--------- ---------- ----------
01-MAR-01          3       2001
02-MAR-01          3       2001
03-MAR-01          3       2001
04-MAR-01          3       2001
...
30-MAR-01          3       2001
31-MAR-01          3       2001
26-MAR-02          3       2002

32 rows selected.

The problem is evident at this point: MON does not imply YEAR, so the dimension is invalid. It would be unsafe to use this dimension, as the wrong answer would result.

It is recommended that your dimensions be validated after they are modified to ensure the integrity of the results you receive from the materialized views that make use of them.
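The repair in this case is the approach the earlier TIME_HIERARCHY example used: make the month attribute include the year, so that each MON value belongs to exactly one YEAR. A hypothetical sketch (the MMYYYY encoding is my choice):

-- rebuild MON so it is unique within a year; for example,
-- March 2001 becomes 32001 and December 2001 becomes 122001
update time_rollup
   set mon = to_char(day, 'mmyyyy');

commit;

After a change like this, re-running VALIDATE_DIMENSION should leave MVIEW$_EXCEPTIONS empty.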

Recommending Materialized Views

One of the more interesting uses of the DBMS_OLAP package is to have it tell you what materialized views you should consider creating. The RECOMMEND routines do just that.

There are two versions of this routine:

  • RECOMMEND_MV, which bases its recommendations solely on the structural information in the data dictionary - the primary keys, foreign keys, and dimensions you have defined.

  • RECOMMEND_MV_W, which goes further and uses workload information captured by Oracle Trace, so that it recommends only materialized views your actual queries would benefit from.

As a simple example, we'll ask DBMS_OLAP to take a look at our existing 'fact' table, SALES.

Note 

A fact table is a table in a star schema that, quite simply, contains facts. The SALES table we used above is a fact table. A fact table typically has two types of columns: columns that contain facts (values, like the SALES_AMOUNT in our SALES table), and columns that are foreign keys to dimension tables (like TRANS_DATE in our SALES table).

Let's see what DBMS_OLAP has to say. Before we can do that, we'll need to add the foreign keys. The RECOMMEND routine won't look at the DIMENSION to see what can be done - it needs to see the foreign keys to determine the relationships between tables:

tkyte@TKYTE816> alter table sales add constraint t_fk_time
  2  foreign key( trans_date) references time_hierarchy
  3  /

Table altered.

tkyte@TKYTE816> alter table sales add constraint t_fk_cust
  2  foreign key( cust_id) references customer_hierarchy
  3  /

Table altered.

Once we've done that, we are ready to look at our fact table, SALES:

tkyte@TKYTE816> exec dbms_olap.recommend_mv( 'SALES', 10000000000, '' );

PL/SQL procedure successfully completed.

Here we asked RECOMMEND_MV to:

  1. Look at the table SALES.

  2. Consider a large amount of space to be used for the materialized views (we just passed a really big number).

  3. Not retain any particular existing materialized view (we passed '' as the list of views to KEEP).

Next, we can either query the tables it populates directly or, more conveniently, use a sample routine shipped in the demo directory to print the contents. To install the sample routine and run the report:

tkyte@TKYTE816> @C:\oracle\RDBMS\demo\sadvdemo

Package created.

Package body created.

Package created.

Package body created.

tkyte@TKYTE816> exec demo_sumadv.prettyprint_recommendations
Recommendation Number = 1
Recommended Action is CREATE new summary:
SELECT CUSTOMER_HIERARCHY.CUST_ID, CUSTOMER_HIERARCHY.ZIP_CODE,
CUSTOMER_HIERARCHY.REGION , COUNT(*), SUM(SALES.SALES_AMOUNT),
COUNT(SALES.SALES_AMOUNT)
FROM TKYTE.SALES, TKYTE.CUSTOMER_HIERARCHY
WHERE SALES.CUST_ID = CUSTOMER_HIERARCHY.CUST_ID
GROUP BY CUSTOMER_HIERARCHY.CUST_ID, CUSTOMER_HIERARCHY.ZIP_CODE,
CUSTOMER_HIERARCHY.REGION
Storage in bytes is 2100
Percent performance gain is 43.2371266138567
Benefit-to-cost ratio is .0205891079113603

Recommendation Number = 2
...

PL/SQL procedure successfully completed.

DBMS_OLAP looked at the dimensions and existing materialized views, and is now making suggestions for additional materialized views that may usefully be created, given the meta data (primary keys, foreign keys, and dimensions) we have entered into the database.

If we used Oracle Trace, we could go a step further with this recommendation process. Oracle Trace is capable of capturing the actual queries asked of the system and recording details about them. DBMS_OLAP uses this captured workload to make even more focused recommendations - based not only on what is possible, but on the questions you actually ask of the data. Materialized views that are possible, but that your workload would never touch, are not recommended, while those that the queries your system actually executes would exploit, are.

Caveats

There are a few considerations to be aware of with regard to using materialized views. We will briefly cover some of them here.

Materialized Views are Not Designed for OLTP Systems

As mentioned above, materialized views typically add overhead to individual transactions and, if created with REFRESH ON COMMIT, will introduce contention. The overhead arises from the need to track the changes made by a transaction - these changes will be maintained either in the session state or in log tables. In a high-end OLTP system, this overhead is not desirable. The concurrency issue comes into play with a REFRESH ON COMMIT materialized view because many rows in the detail fact table point to a single row in the summary table. An update to any one of perhaps thousands of detail records must modify that single summary row, which naturally inhibits concurrency in a high-update situation.

This does not preclude the use of materialized views with OLTP, in particular materialized views that are REFRESHed ON DEMAND, with a full refresh. A full refresh does not add the overhead of tracking transaction-level changes. Rather, at some point in time, the defining query for the materialized view is executed, and the results simply replace the existing materialized view. Since this is done on demand (or on a timed basis), the refresh may be scheduled for a time when the load is light. The resulting materialized view is especially relevant for reporting purposes - your OLTP data can be transformed using SQL into something that is easy and fast to query, every night. The next day, your online reports of yesterday's activities run as fast as possible, and easily co-exist with your OLTP system.
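As a concrete illustration, here is a minimal sketch of such a nightly reporting summary. The view name and defining query are hypothetical; the 'C' argument to DBMS_MVIEW.REFRESH requests a complete refresh:

create materialized view daily_sales_summary
build immediate
refresh complete on demand
enable query rewrite
as
select cust_id, trunc(trans_date) trans_date,
       sum(sales_amount) sales_amount
  from sales
 group by cust_id, trunc(trans_date);

-- run from a scheduled job at night, when the load is light
begin
    dbms_mview.refresh( 'DAILY_SALES_SUMMARY', 'C' );
end;
/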

Query Rewrite Integrity

As we discussed earlier, query rewrite integrity has three modes:

  • ENFORCED - queries will be rewritten using only materialized views that Oracle knows to be accurate, based on enforced, validated constraints. You will always get the correct answer.

  • TRUSTED - Oracle will also make use of relationships it has been told about but has not itself validated, such as dimensions and constraints marked RELY.

  • STALE_TOLERATED - queries may be rewritten to use materialized views even when Oracle knows the underlying detail data has changed since the view was last refreshed.

You must understand the ramifications of using each of these modes. ENFORCED will give you the right answer every time, at the expense of not using some materialized views that would speed up the query. TRUSTED may, if something Oracle has been asked to 'trust' turns out to be false, give a result that differs from what querying the original source data would have returned. We saw an example of this early on with the EMP_DEPT materialized view. STALE_TOLERATED should be used in reporting systems where an answer that was correct a short while ago is acceptable. If up-to-the-minute information is mandatory, STALE_TOLERATED should not be used.
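The integrity mode is set at the session (or system) level; a quick sketch of the three settings:

-- the default: rewrite only when correctness is guaranteed
alter session set query_rewrite_integrity = enforced;

-- also trust dimensions and RELY constraints
alter session set query_rewrite_integrity = trusted;

-- also allow rewrite against stale materialized views
alter session set query_rewrite_integrity = stale_tolerated;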

Summary

Materialized views are a powerful data warehouse/decision support system feature. A single materialized view may be used by many different, but related, queries. Best of all, it is 100 percent transparent to the application and the end user. You do not have to teach people what summaries are available - you inform Oracle of what is possible via constraints, referential integrity, and dimensions, and it does the rest for you.

Materialized views are the natural evolution, and merging, of features found in the database and in decision support tools. No longer are summary table management features limited to tools such as Oracle's Discoverer. Now every client, from the lowly SQL*PLUS to your custom-developed applications to off-the-shelf reporting tools, can take advantage of the fact that the answer has already been stored for them.

Add to all of this the DBMS_OLAP package. It will not only estimate how much additional storage you need to support a materialized view, but can also watch how your existing views are used and, based on this, recommend dropping some and creating others - to the point of supplying you with the query it feels you should use in the materialized view.

All in all, materialized views in a read-only/read-intensive environment will definitely pay you back for the additional storage through reduced query response time and reduced resources needed to actually process the queries.


