Materialized views are a data warehousing/decision support system tool that can increase by many orders of magnitude the speed of queries that access a large number (maybe many hundreds of thousands or millions) of records. In basic terms, they allow a user to query potentially terabytes of detail data in seconds (or less). They accomplish this by transparently using pre-computed summarizations and joins of data. These pre-computed summaries would typically be very small compared to the original source data.
Say, for example, your company has a sales database loaded with the details of a million orders, and you want to get a breakdown of sales by region (a common enough query). Each and every record would be scanned, the data aggregated to the region level, and the calculation performed. Using a materialized view, we can store a summary of the sales data by region, and have the system maintain these summaries for us. If you have ten sales regions, this summary will have ten records, so instead of sifting through a million detail records, we'll query only ten. Furthermore, if someone asks a slightly different question, say for the sales in a specific region, then that query can also obtain the answer from the materialized view.
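A minimal sketch of what such a summary might look like (the SALES table and its columns here are hypothetical, invented purely for illustration — they are not used elsewhere in this chapter):

```sql
-- Hypothetical detail table: one row per order.
create table sales
( region   varchar2(30),
  order_id number,
  amount   number
);

-- Pre-compute the breakdown by region. With ten regions,
-- this summary holds just ten rows instead of a million.
create materialized view sales_by_region
build immediate
refresh on commit
enable query rewrite
as
select region, sum(amount) total_sales, count(*) order_count
from sales
group by region
/
```

A query such as SELECT REGION, SUM(AMOUNT) FROM SALES GROUP BY REGION could then be transparently rewritten against the ten-row summary, as could a query for a single region.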
In this chapter you'll find out what materialized views are, what they can do and, most importantly, how they work - a lot of the 'magic' goes on behind the scenes. Having gone to the trouble of creating it, you'll find out how to make sure that your materialized view is used by all queries to which the view is capable of providing the answer (sometimes, you know Oracle could use the materialized view, but it is not able to do so simply because it lacks important information). Specifically, we will:
Run through an example that will demonstrate the essential power of materialized views, and help you decide quickly whether it is a feature you might want to use.
Discuss the various parameters and privileges that must be set in order to use this feature.
Investigate, with examples, the use of constraints and dimensions to let the database know when to use a materialized view to answer a query in the most effective manner.
Look at how to use the DBMS_OLAP package to analyze your views.
Round off the chapter with two caveats you should be aware of when using materialized views.
Summary table management, another term for the materialized view, has actually been around for some time in tools such as Oracle Discoverer (an ad-hoc query and reporting tool). Using Discoverer, an administrator would set up various summary tables in the database. The tool would then parse queries before sending them to Oracle. If it determined that some summary table existed that could answer the question more readily, it would rewrite the query to access the summary table, rather than the underlying table originally specified in the query, and submit it to Oracle for processing. This was great, as long as you used that tool to process your queries. If you ran a query in SQL*PLUS, or from your Java JDBC client, then the query rewrite would not (could not) take place. Furthermore, the synchronization between the details (original source data) and the summaries could not be performed or validated for you automatically, since the tool ran outside the database.
Also, since version 7.0, the Oracle database itself has implemented a feature with many of the characteristics of summary tables - the snapshot. This feature was initially designed to support replication, but I would use it myself to 'pre-answer' large queries. So, I would have snapshots that did not use a database link to replicate data from database to database, but rather just summarized or pre-joined frequently accessed data. This was good, but without any query rewrite capability, it was still problematic. The application had to know to use the summary tables in the first place, and this made the application more complex to code and maintain. If I added a new summary, I would have to find the code that could make use of it, and rewrite that code.
In Oracle 8.1.5 (Enterprise and Personal Editions) Oracle took the query rewriting capabilities from tools like Discoverer, the automated refresh and scheduling mechanisms from snapshots (that makes the summary tables 'self maintaining'), and combined these with the optimizer's ability to find the best plan out of many alternatives. This produced the materialized view.
With all of this functionality centralized in the database, now every application can take advantage of the automated query rewrite facility, regardless of whether access to the database is via SQL*PLUS, Oracle Forms, JDBC, ODBC, Pro*C, OCI, or some third party tool. Every Oracle 8i enterprise database can have summary table management. Also, since everything takes place inside the database, the details can be easily synchronized with the summaries, or at least the database knows when they aren't synchronized, and might bypass 'stale' summaries (you control its behavior in this case). By putting the functionality right in front of the data, anything that can access Oracle can take advantage of this functionality.
Note | The same philosophy underpins features such as Fine Grained Access Control (see the section on Openness in Chapter 1 and also Chapter 21, which is dedicated to FGAC). The closer to the data these functions are, the broader the audience that can appreciate them. If you put security outside the database, in an application perhaps, only people who use the application can make use of it (unless the only access to the data is via the application). |
In order to run the examples in this chapter you'll need access to a Personal or Enterprise Edition of Oracle 8.1.5, or higher. This functionality is not provided in the Standard release. You will need a user account with the following privileges (at least):
GRANT CREATE SESSION
GRANT CREATE TABLE
GRANT CREATE MATERIALIZED VIEW
GRANT QUERY REWRITE
The first three privileges above may be granted to a role that you have been granted. The QUERY REWRITE privilege must be granted directly to you.
Additionally, you'll need access to a tablespace with about 30 - 50MB of free space.
Finally, you must be using the Cost-Based Optimizer (CBO) in order to make use of query rewrite. If you do not use the CBO, query rewrite will not take place. In these examples, our optimizer goal will be left at the default of CHOOSE; simply analyzing the tables will ensure we can take advantage of query rewrite.
A quick example will demonstrate what a materialized view entails. The concept demonstrated below is that of reducing the execution time of a long running query transparently, by summarizing data in the database. A query against a large table will be transparently rewritten into a query against a very small table, without any loss of accuracy in the answer. We'll start with a large table that contains a list of owners of objects, and the objects they own. This table is based on the ALL_OBJECTS data dictionary view:
tkyte@TKYTE816> create table my_all_objects
  2  nologging
  3  as
  4  select * from all_objects
  5  union all
  6  select * from all_objects
  7  union all
  8  select * from all_objects
  9  /

Table created.

tkyte@TKYTE816> insert /*+ APPEND */ into my_all_objects
  2  select * from my_all_objects;

65742 rows created.

tkyte@TKYTE816> commit;

Commit complete.

tkyte@TKYTE816> insert /*+ APPEND */ into my_all_objects
  2  select * from my_all_objects;

131484 rows created.

tkyte@TKYTE816> commit;

Commit complete.

tkyte@TKYTE816> analyze table my_all_objects compute statistics;

Table analyzed.
On my system, I have the Java option installed so the MY_ALL_OBJECTS table has about 250000 rows in it after the above. You may have to adjust the number of times you UNION ALL and INSERT, in order to achieve the same effect. Now, we'll execute a query against this table that shows the number of objects owned by each user. Initially, this will require a full scan of the large table we have above:
tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> set timing on
tkyte@TKYTE816> select owner, count(*) from my_all_objects group by owner;

OWNER                            COUNT(*)
------------------------------ ----------
A                                      36
B                                      24
CTXSYS                               2220
DBSNMP                                 48
DEMO                                   60
DEMO11                                 36
DEMO_DDL                              108
MDSYS                                2112
MV_USER                                60
ORDPLUGINS                            312
ORDSYS                               2472
OUR_TYPES                              12
OUTLN                                  60
PERFSTAT                              636
PUBLIC                             117972
SCHEDULER                              36
SCOTT                                  84
SEAPARK                                36
SYS                                135648
SYSTEM                                624
TESTING                               276
TKYTE                                  12
TTS_USER                               48
TYPES                                  36

24 rows selected.

Elapsed: 00:00:03.35

tkyte@TKYTE816> set timing off
tkyte@TKYTE816> set autotrace traceonly
tkyte@TKYTE816> select owner, count(*) from my_all_objects group by owner;

24 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=2525 Card=24 Bytes=120)
   1    0   SORT (GROUP BY) (Cost=2525 Card=24 Bytes=120)
   2    1     TABLE ACCESS (FULL) OF 'MY_ALL_OBJECTS' (Cost=547 Card=262968

Statistics
----------------------------------------------------------
          0  recursive calls
         27  db block gets
       3608  consistent gets
       3516  physical reads
          0  redo size
       1483  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
         24  rows processed
In order to get the aggregate count, we must count 250000+ records on over 3600 blocks. Unfortunately, in our system we ask this question frequently, dozens of times every day. We are scanning almost 30MB of data. We could avoid counting the details each and every time by creating a materialized view of the data. The following demonstrates the basic steps needed to perform this operation. We'll discuss the GRANT and ALTER statements in more detail in the How Materialized Views Work section. In addition to the grants below, you might need the CREATE MATERIALIZED VIEW privilege as well, depending on what roles you have been granted and have enabled:
tkyte@TKYTE816> grant query rewrite to tkyte;

Grant succeeded.

tkyte@TKYTE816> alter session set query_rewrite_enabled=true;

Session altered.

tkyte@TKYTE816> alter session set query_rewrite_integrity=enforced;

Session altered.

tkyte@TKYTE816> create materialized view my_all_objects_aggs
  2  build immediate
  3  refresh on commit
  4  enable query rewrite
  5  as
  6  select owner, count(*)
  7  from my_all_objects
  8  group by owner
  9  /

Materialized view created.

tkyte@TKYTE816> analyze table my_all_objects_aggs compute statistics;

Table analyzed.
Basically, what we've done is pre-calculate the object count, and define this summary information as a materialized view. We have asked that the view be immediately built and populated with data. You'll notice that we have also specified REFRESH ON COMMIT and ENABLE QUERY REWRITE, but more on these in a moment. Also notice that we may have created a materialized view, but when we ANALYZE, we are analyzing a table. A materialized view creates a real table, and this table may be indexed, analyzed, and so on.
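Since the materialized view is backed by a real table, we can manage it like any other table. For instance (a sketch - the index name here is our own invention), we could index the summary to speed up lookups of a single owner:

```sql
-- The materialized view created a real table, MY_ALL_OBJECTS_AGGS;
-- it can be indexed just like any other table.
create index my_all_objects_aggs_idx
on my_all_objects_aggs(owner)
/
```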
First, let's see the view in action by issuing the same query again (the query that we used to define the view itself):
tkyte@TKYTE816> set timing on
tkyte@TKYTE816> select owner, count(*)
  2  from my_all_objects
  3  group by owner;

OWNER                            COUNT(*)
------------------------------ ----------
A                                      36
B                                      24
...
TYPES                                  36

24 rows selected.

Elapsed: 00:00:00.10

tkyte@TKYTE816> set timing off
tkyte@TKYTE816> set autotrace traceonly
tkyte@TKYTE816> select owner, count(*)
  2  from my_all_objects
  3  group by owner;

24 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=24 Bytes=216)
   1    0   TABLE ACCESS (FULL) OF 'MY_ALL_OBJECTS_AGGS' (Cost=1 Card=24 Bytes=216)

Statistics
----------------------------------------------------------
          0  recursive calls
         12  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
       1483  bytes sent via SQL*Net to client
        535  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
         24  rows processed

tkyte@TKYTE816> set autotrace off
From over 3,600 consistent gets (logical I/Os), to just 12. No physical I/O this time around as the data was found in the cache. Our buffer cache will be much more efficient now as it has less to cache. I could not even begin to cache the previous query's working set, but now I can. Notice how our query plan shows we are now doing a full scan of the MY_ALL_OBJECTS_AGGS table, even though we queried the detail table MY_ALL_OBJECTS. When the SELECT OWNER, COUNT(*)... query is issued, the database automatically directs it to our materialized view.
Let's take this a step further by adding a new row to the MY_ALL_OBJECTS table, and committing the change:
tkyte@TKYTE816> insert into my_all_objects
  2  ( owner, object_name, object_type, object_id )
  3  values
  4  ( 'New Owner', 'New Name', 'New Type', 1111111 );

1 row created.

tkyte@TKYTE816> commit;

Commit complete.
Now, we issue effectively the same query again, but this time we're just looking at our newly inserted row:
tkyte@TKYTE816> set timing on
tkyte@TKYTE816> select owner, count(*)
  2  from my_all_objects
  3  where owner = 'New Owner'
  4  group by owner;

OWNER                            COUNT(*)
------------------------------ ----------
New Owner                               1

Elapsed: 00:00:00.01

tkyte@TKYTE816> set timing off
tkyte@TKYTE816> set autotrace traceonly
tkyte@TKYTE816> select owner, count(*)
  2  from my_all_objects
  3  where owner = 'New Owner'
  4  group by owner;

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=9)
   1    0   TABLE ACCESS (FULL) OF 'MY_ALL_OBJECTS_AGGS' (Cost=1 Card=1 Bytes=9)

Statistics
----------------------------------------------------------
          0  recursive calls
         12  db block gets
          6  consistent gets
          0  physical reads
          0  redo size
        430  bytes sent via SQL*Net to client
        424  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

tkyte@TKYTE816> set autotrace off
The analysis shows that we scanned the materialized view and found the new row. By specifying REFRESH ON COMMIT in our original definition of the view, we requested that Oracle maintain synchronization between the view and the details - when we update the details, the summary will be maintained as well. It cannot maintain synchronization in every case of an arbitrary materialized view, but in the case of a single table summary (as we have) or joins with no aggregation, it can.
Now, one last query:
tkyte@TKYTE816> set timing on
tkyte@TKYTE816> select count(*)
  2  from my_all_objects
  3  where owner = 'New Owner';

  COUNT(*)
----------
         1

Elapsed: 00:00:00.00

tkyte@TKYTE816> set timing off
tkyte@TKYTE816> set autotrace traceonly
tkyte@TKYTE816> select count(*)
  2  from my_all_objects
  3  where owner = 'New Owner';

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=1 Bytes=9)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'MY_ALL_OBJECTS_AGGS' (Cost=1 Card=1 Bytes=9)

Statistics
----------------------------------------------------------
          0  recursive calls
         12  db block gets
          5  consistent gets
          0  physical reads
          0  redo size
        367  bytes sent via SQL*Net to client
        424  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          1  rows processed

tkyte@TKYTE816> set autotrace off
We can see that Oracle is smart enough to use the view even when the query appears to be slightly different. There was no GROUP BY clause here, yet the database recognized that the materialized view could still be used. This is what makes materialized views 'magical'. The end users do not have to be aware of these summary tables. The database will realize for us that the answer already exists and, as long as we enable query rewrite (which we did), will automatically rewrite the query to use them. This feature allows you to immediately impact existing applications, without changing a single query.
This is relatively straightforward and is answered in a single word - performance. By calculating the answers to the really hard questions up front (and once only), we will greatly reduce the load on our machine. We will experience:
Fewer physical reads - There is less data to scan through.
Fewer writes - We will not be sorting/aggregating as frequently.
Decreased CPU consumption - We will not be calculating aggregates and functions on the data, as we will have already done that.
Markedly faster response times - Our queries will return incredibly quickly when a summary is used, as opposed to the details. This will be a function of the amount of work we can avoid by using the materialized view, but many orders of magnitude is not out of the question.
Materialized views will increase your need for one resource - more permanently allocated disk. We need extra storage space to accommodate the materialized views, of course, but for the price of a little extra disk space, we can reap a lot of benefit.
Materialized views work best in a read-only, or read-intensive environment. They are not designed for use in a high-end OLTP environment, as they add overhead to modifications performed on the base tables in order to capture the changes. There are also concurrency issues with regard to the REFRESH ON COMMIT option. Consider our summary example from before. Any row that is inserted or deleted from the base table will have to update one of 24 rows in the summary table in order to maintain the count in real time. What this means is that, at most, you can have 24 people committing at the same time (assuming they each affect a different owner, that is). This does not preclude the use of materialized views in an OLTP environment. For example, if you use full refreshes on a recurring basis (during off-peak time), there will be no overhead added to the modifications, and no concurrency issues. This would allow you to report on yesterday's activities, for example, and not query the live OLTP data for reports.
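A sketch of that recurring, off-peak approach (the view and job details here are our own invention, assuming the hypothetical SALES detail table from earlier; the DBMS_JOB interval syntax is standard for 8i):

```sql
-- Refresh on demand, so day-time modifications to SALES pay no penalty:
create materialized view sales_summary
build immediate
refresh complete on demand
enable query rewrite
as
select region, sum(amount) total_sales
from sales
group by region
/

-- Schedule a nightly complete refresh ('C') via DBMS_JOB:
variable n number
begin
   dbms_job.submit( :n,
                    'dbms_mview.refresh( ''SALES_SUMMARY'', ''C'' );',
                    trunc(sysdate)+1+2/24,        -- first run: 02:00 tomorrow
                    'trunc(sysdate)+1+2/24' );    -- and every day thereafter
   commit;
end;
/
```

Queries rewritten against SALES_SUMMARY would then report on the data as of the last nightly refresh, not the live detail.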
Materialized views may appear to be hard to work with at first. There will be cases where you create a materialized view, and you know that the view holds the answer to a certain question but, for some reason, Oracle does not. If you dig deep enough, you'll discover why, and it always comes back to the fact that Oracle is just a piece of software, and can only work with the information provided to it. The more meta data provided, the more pieces of information about the underlying data you can give to Oracle, the better. These pieces of information are mundane things that you might not even think about in a data warehouse environment, such as NOT NULL constraints, primary keys, foreign keys, and so on. The meta data provided by these keys and constraints gives the optimizer more information, and hence, more of a chance.
Note | Not only do keys and constraints, such as those listed above, preserve data integrity, they also add information about the data into the data dictionary, which can be used in query rewrites - hence the term meta data. See the 'Constraints' section for further information. |
In the following sections we will look at what you have to do to set up materialized views, some examples of using them, and how adding more information, more meta data, to the database will make materialized views work more often.
There is one mandatory INIT.ORA parameter necessary for materialized views to function, and this is the COMPATIBLE parameter. The value of COMPATIBLE should be set to 8.1.0, or above, in order for query rewrites to be functional. If this value is not set appropriately, query rewrite will not be invoked.
There are two other relevant parameters that may be set at either the system-level (via the INIT.ORA file), or the session-level (via the ALTER SESSION command). They are:
QUERY_REWRITE_ENABLED - Unless the value of this parameter is set to TRUE, query rewrites will not take place. The default value is FALSE.
QUERY_REWRITE_INTEGRITY - This parameter controls how Oracle rewrites queries and may be set to one of three values:
ENFORCED - Queries will be rewritten using only constraints and rules that are enforced and guaranteed by Oracle. There are mechanisms by which we can tell Oracle about other inferred relationships, and this would allow for more queries to be rewritten, but since Oracle does not enforce those relationships, it would not make use of these facts at this level.
TRUSTED - Queries will be rewritten using the constraints that are enforced by Oracle, as well as any relationships existing in the data that we have told Oracle about, but are not enforced by the database. For example, in our initial example we could have created the physical table MY_ALL_OBJECTS_AGGS manually using a parallel, no-logging CREATE TABLE AS SELECT (to speed up the building of the summary table). We could have then created the materialized view, instructing it to use this pre-built table instead of creating the summary table itself. If we wish Oracle to use this pre-built table during a subsequent query rewrite, we must specify a value of TRUSTED. This is because we want Oracle to 'trust' that we have supplied the correct data in the pre-built table - Oracle does not enforce that the data in this table is correct.
STALE_TOLERATED - Queries will be rewritten to use materialized views even if Oracle knows the data contained in the materialized view is 'stale' (out-of-sync with the details). This might be useful in an environment where the summary tables are refreshed on a recurring basis, not on commit, and a slightly out-of-sync answer is acceptable.
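The pre-built table approach mentioned under TRUSTED can be sketched as follows (the names mirror the earlier example purely for illustration; this assumes the materialized view of that name does not already exist):

```sql
-- Build the summary ourselves first - perhaps nologging, perhaps in
-- parallel - which can be much faster for very large detail tables:
create table my_all_objects_aggs
nologging
as
select owner, count(*) cnt
from my_all_objects
group by owner
/

-- Layer the materialized view on top of the existing table:
create materialized view my_all_objects_aggs
on prebuilt table
refresh on demand
enable query rewrite
as
select owner, count(*) cnt
from my_all_objects
group by owner
/

-- Oracle never verified the pre-built data, so rewrites against it
-- require the session (or system) to trust us:
alter session set query_rewrite_integrity=trusted;
```

Under ENFORCED, queries against MY_ALL_OBJECTS would not be rewritten to use this view; under TRUSTED, they can be.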
In the example above, you saw the ALTER SESSION statements that enable this query rewrite magic. Since the example used only relationships and objects enforced by Oracle, the query rewrite integrity could be set to the highest level - ENFORCED.
I also needed to grant QUERY REWRITE privileges to myself. Now, the account I used happened to be a DBA account, which has QUERY REWRITE, so why did I need to grant the privilege directly to myself? The reason is that you cannot create compiled stored objects, such as materialized views, stored procedures, and triggers, that rely on privileges from a role (the DBA role in this case). See Chapter 23, Invoker and Definer Rights, for a full explanation of the use of roles and compiled stored objects. If you create a materialized view with QUERY REWRITE enabled, but do not have the QUERY REWRITE system privilege yourself, you will receive the following error:
create materialized view my_all_objects_aggs
*
ERROR at line 1:
ORA-01031: insufficient privileges
So, now that we can create a materialized view and show that it works, what are the steps Oracle will undertake to rewrite our queries? Normally, when QUERY_REWRITE_ENABLED is set to FALSE, Oracle will take your SQL as is, parse it, and optimize it. With query rewrites enabled, Oracle will insert an extra step into this process. After parsing, Oracle will attempt to rewrite the query to access some materialized view, instead of the actual table that it references. If it can perform a query rewrite, the rewritten query (or queries) is parsed and then optimized along with the original query. The query plan with the lowest cost from this set is chosen for execution. If it cannot rewrite the query, the original parsed query is optimized and executed as normal.
When query rewrite is enabled, Oracle will use the following steps to try and rewrite a query with a materialized view.
The first method Oracle tries is a full text match - it considers possible exact string matches in the set of available materialized views found in the data dictionary. In the example above, this is the method Oracle would have used for the very first query that used the materialized view. The algorithm used is 'friendlier' (more flexible) than a shared pool comparison (which demands an exact byte-for-byte match) as it ignores whitespace, case of characters, and other formatting.
Starting with the FROM clause, the optimizer compares the remaining text of the materialized view's defining query. This allows for mismatches between items in the SELECT list. If the data you need can be generated from the materialized view (if your SELECT list can be satisfied) Oracle will rewrite the query using the materialized view. The query SELECT LOWER(OWNER) FROM MY_ALL_OBJECTS GROUP BY OWNER; would be an example of a partial text match.
The general query rewrite methods enable the use of a materialized view even if it contains only part of the data, more data than requested, or data that can be converted. The optimizer tests the materialized view's definition against the individual components of the query (SELECT, FROM, WHERE, GROUP BY) to find a match. The checks that Oracle performs against these components are:
Data sufficiency - Can the required data be obtained from a given materialized view? If you ask for column X, and column X is not in the materialized view and, furthermore, it is not retrievable via some join with the materialized view, then Oracle will not rewrite the query to use that view. For example, the query SELECT DISTINCT OWNER FROM MY_ALL_OBJECTS, using our previous example, can be rewritten using our materialized view - the OWNER column is available. The query SELECT DISTINCT OBJECT_TYPE FROM MY_ALL_OBJECTS cannot be satisfied using the materialized view, as the view does not have sufficient data.
Join compatibility - Ensures that any JOIN required by the submitted query can be satisfied by the materialized view.
We can look at an example of join compatibility using MY_ALL_OBJECTS and the following tables:
tkyte@TKYTE816> create table t1 ( owner varchar2(30), flag char(1) );

Table created.

tkyte@TKYTE816> create table t2 ( object_type varchar2(30), flag char(1) );

Table created.
The following query is join-compatible with the materialized view - the query can and will be rewritten using the materialized view:
tkyte@TKYTE816> select a.owner, count(*), b.owner
  2  from my_all_objects a, t1 b
  3  where a.owner = b.owner
  4  and b.flag is not null
  5  group by a.owner, b.owner
  6  /
The database can see that using our materialized view, or the actual base table would result in the same answer. However, the following query, while similar, is not join-compatible:
tkyte@TKYTE816> select a.owner, count(*), b.object_type
  2  from my_all_objects a, t2 b
  3  where a.object_type = b.object_type
  4  and b.flag is not null
  5  group by a.owner, b.object_type
  6  /
The OBJECT_TYPE column is not in our materialized view, so Oracle cannot rewrite this query using the materialized view.
Grouping compatibility checking is required if both the materialized view and the query contain a GROUP BY clause. If the materialized view is grouped at the same level, or at a higher level of detail than is needed, the query will be rewritten to use the materialized view. The query SELECT COUNT(*) FROM MY_ALL_OBJECTS GROUP BY 1; applied against our first example would be a case where the materialized view is grouped at a higher level of detail than is needed. The database can rewrite this query to use our materialized view even though the grouping for the query is not the same as the grouping for the materialized view.
Aggregate computability checking is required if both the query and the materialized view contain aggregates. It ensures that the materialized view can satisfy the required aggregates, and it can perform some interesting rewrites in certain cases. For example, it will recognize that AVG(X) is the same as SUM(X)/COUNT(X), so a query that requires AVG(X) can be satisfied by a materialized view that stores the SUM and COUNT.
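A sketch of aggregate computability in action (the view name here is hypothetical, but the tables are those from our first example):

```sql
-- The materialized view stores SUM and COUNT, but no AVG:
create materialized view my_all_objects_sums
build immediate
refresh on commit
enable query rewrite
as
select owner, sum(object_id) sum_id, count(object_id) cnt_id
from my_all_objects
group by owner
/

-- A query asking for AVG can nevertheless be rewritten, since the
-- optimizer knows AVG(object_id) = SUM(object_id)/COUNT(object_id):
select owner, avg(object_id)
from my_all_objects
group by owner
/
```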
In many cases, simple application of the above rules will allow Oracle to rewrite a query to use a materialized view. In other cases (as we will see in an example below), the database will need a little more help from you. You will need to give it additional information in order for it to recognize that it can use a materialized view to answer a question.
In this section, we'll look at ways to do this, first by using constraints to help us make use of a query rewrite, and then by using dimensions, which are a means of describing complex relationships - hierarchies of data.
I've been asked in the past, 'Why should I use a primary key? Why not just use a unique index?' Well, the answer is that you could, but doesn't the fact you used a primary key say something over and above just using a unique index? In fact it does - it can say a lot. The same goes for the use of foreign keys, NOT NULL constraints and others. Not only do they protect the data, but they also add information about the data into the data dictionary. Using this additional information, Oracle is able to perform a query rewrite more often, in many complex cases.
Consider the following small example. We will copy the EMP and DEPT tables from the SCOTT schema, and create a materialized view that pre-joins the tables together. This materialized view differs from our first example in that it is a REFRESH ON DEMAND materialized view. This means that in order for changes to be applied to it, we will have to refresh it manually:
tkyte@TKYTE816> create table emp as select * from scott.emp;

Table created.

tkyte@TKYTE816> create table dept as select * from scott.dept;

Table created.

tkyte@TKYTE816> alter session set query_rewrite_enabled=true;

Session altered.

tkyte@TKYTE816> alter session set query_rewrite_integrity=enforced;

Session altered.

tkyte@TKYTE816> create materialized view emp_dept
  2  build immediate
  3  refresh on demand
  4  enable query rewrite
  5  as
  6  select dept.deptno, dept.dname, count (*)
  7  from emp, dept
  8  where emp.deptno = dept.deptno
  9  group by dept.deptno, dept.dname
 10  /

Materialized view created.

tkyte@TKYTE816> alter session set optimizer_goal=all_rows;

Session altered.
Since the underlying tables and the resulting materialized view are very small, we force the use of the cost-based optimizer using the ALTER SESSION command, instead of analyzing tables as we would normally. If Oracle knew how small these tables were, it would not do some of the optimizations we would like. Using the 'default' statistics, it will behave as if the tables were fairly large.
Now, we have withheld a lot of information from Oracle here. It does not understand the relationship between EMP and DEPT, does not know which columns are primary keys, and so on. Now, let's run a query, and see what happens:
tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select count(*) from emp;

  COUNT(*)
----------
        14

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'EMP' (Cost=1 Card=82)
The query has been directed at the underlying EMP table. Now, you and I know that the COUNT(*) query could easily, and more efficiently (especially if the number of employees in each department was large, and there were lots of departments), have been answered from the materialized view. There, we have all of the information we need to get the count of employees. We know this because we are aware of things about the data that we kept from Oracle:
DEPTNO is the primary key of DEPT - This means that each EMP record will join to, at most, one DEPT record.
DEPTNO in EMP is a foreign key to DEPTNO in DEPT - If the DEPTNO in EMP is not a Null value, then it will be joined to a row in DEPT (we won't lose any non-Null EMP records during a join).
DEPTNO in EMP is NOT NULL - This coupled with the foreign key constraint tells us we won't lose any EMP records.
These three facts imply that if we join EMP to DEPT, each EMP row will be observed in the resultset at least once and at most once. Since we never told Oracle these facts, it was not able to make use of the materialized view. So, let's make Oracle aware of them:
tkyte@TKYTE816> alter table dept
  2  add constraint dept_pk primary key(deptno);

Table altered.

tkyte@TKYTE816> alter table emp
  2  add constraint emp_fk_dept
  3  foreign key(deptno) references dept(deptno);

Table altered.

tkyte@TKYTE816> alter table emp modify deptno not null;

Table altered.

tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select count(*) from emp;

  COUNT(*)
----------
        14

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1 Bytes=13)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'EMP_DEPT' (Cost=1 Card=82 Bytes=1066)
Now Oracle is able to rewrite the query using the EMP_DEPT materialized view. Any time that you know Oracle could use a materialized view, but it is not doing so (and you have verified you can use materialized views in general), take a closer look at the data and ask yourself 'What piece of information have I withheld from Oracle?' Nine times out of ten, you'll find a missing piece of meta data that, when included, allows Oracle to rewrite the query.
So, what happens if this is a true data warehouse, and there are tens of millions of records in the above tables? You don't really want the additional effort of verifying a foreign key relationship - you already did that in your data scrubbing routine, didn't you? In this case, you can create a non-validated constraint - one that informs the database about a relationship, but that has not been verified by the database itself. Let's look at the above example again, but this time we'll simulate the loading of data into an existing data warehouse (the example above is our data warehouse). We'll drop our constraints, load the data, refresh the materialized views, and add our constraints back. We'll start with dropping the constraints:
tkyte@TKYTE816> alter table emp drop constraint emp_fk_dept;
Table altered.

tkyte@TKYTE816> alter table dept drop constraint dept_pk;
Table altered.

tkyte@TKYTE816> alter table emp modify deptno null;
Table altered.
Now, in order to simulate our load, I will insert a single new row into EMP (not much of a load, I know, but enough for demonstration purposes). Then, we will refresh our materialized view and tell Oracle to consider it as FRESH:
tkyte@TKYTE816> insert into emp (empno,deptno) values ( 1, 1 );
1 row created.

tkyte@TKYTE816> exec dbms_mview.refresh( 'EMP_DEPT' );
PL/SQL procedure successfully completed.

tkyte@TKYTE816> alter materialized view emp_dept consider fresh;
Materialized view altered.
Now we tell Oracle about the relationships between EMP and DEPT:
tkyte@TKYTE816> alter table dept
  2  add constraint dept_pk primary key(deptno)
  3  rely enable NOVALIDATE
  4  /
Table altered.

tkyte@TKYTE816> alter table emp
  2  add constraint emp_fk_dept
  3  foreign key(deptno) references dept(deptno)
  4  rely enable NOVALIDATE
  5  /
Table altered.

tkyte@TKYTE816> alter table emp modify deptno not null NOVALIDATE;
Table altered.
So here we have told Oracle that there is a foreign key from EMP to DEPT as before. However this time, because we will have scrubbed our data prior to loading it into the warehouse, we tell Oracle not to perform any validating checks. The NOVALIDATE option bypasses the checking of existing data we loaded, and RELY tells Oracle to trust the integrity of the data. Basically, we have told Oracle to trust that if it joins EMP to DEPT by DEPTNO, every row in EMP will be retrieved at least once, and at most once.
In fact, in this case, we have 'lied' to the database. We have inserted a row into EMP that has no corresponding row in DEPT. We are now ready to query:
tkyte@TKYTE816> alter session set query_rewrite_integrity=enforced;
Session altered.

tkyte@TKYTE816> select count(*) from emp;

  COUNT(*)
----------
        15

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'EMP' (Cost=1 Card=164)
Since we set QUERY_REWRITE_INTEGRITY=ENFORCED, Oracle did not rewrite the query to use the materialized view. We must go down a level in query integrity (the valid settings are ENFORCED, TRUSTED, and STALE_TOLERATED). We need Oracle to 'trust' us:
tkyte@TKYTE816> alter session set query_rewrite_integrity=trusted;
Session altered.

tkyte@TKYTE816> select count(*) from emp;

  COUNT(*)
----------
        14

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=1 Bytes=13)
   1    0   SORT (AGGREGATE)
   2    1     TABLE ACCESS (FULL) OF 'EMP_DEPT' (Cost=1 Card=82 Bytes=1066)
In this case, Oracle did in fact rewrite the query, but the side effect is that our newly inserted row has not been counted. The 'wrong' answer is returned because the 'fact' that each row in EMP should be preserved in a join to DEPT is not a fact, given the data we loaded. When the materialized view refreshed, it did not get the newly added EMP row. The data we told Oracle to rely on was not reliable. This demonstration highlights two important points:
You can use materialized views in a large data warehouse very efficiently without having to perform lots of extra, typically redundant, verifications of the data.
BUT, you had better be 100 percent sure that your data is scrubbed if you ask Oracle to rely on it.
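If, at some later point, a quiet maintenance window becomes available and you would like the database to verify the relationship after all, the RELY constraints can be validated in place rather than dropped and re-added. This is a sketch, assuming the constraint names used in the example above:

```sql
-- Ask Oracle to check the existing data against the constraints.
-- These statements will fail if any row (such as our orphaned EMP
-- row) violates a constraint - which is exactly the point: the
-- database tells us our 'scrubbed' data was not actually clean.
alter table dept modify constraint dept_pk validate;
alter table emp modify constraint emp_fk_dept validate;
```

Run against the data loaded above, the second statement would raise an error, exposing the EMP row with no parent in DEPT.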
Use of dimensions is another method by which we can give even more information to Oracle. Suppose we have a table of detail records, each carrying a transaction date and a customer ID. The transaction date points to another table that gives full details of the month the transaction date was in, the quarter of your fiscal year it represents, the fiscal year it was in, and so on. Now, suppose you created a materialized view that stored sales information aggregated at the monthly level. Can Oracle use that view to answer a query for sales data for a particular quarter or year? We know that transaction date implies month, month implies quarter, and quarter implies year, so the answer is that it can - but Oracle doesn't know this (yet), so although it can, it won't.
Using a database object called a DIMENSION, we can alert Oracle to these facts so that it will use them to rewrite queries in more cases. A dimension declares a parent/child relationship between pairs of columns. We can use it to describe to Oracle that, within a row of a table, the MONTH column implies the value you'll find in the QTR column, the QTR column implies the value you'll find in the YEAR column, and so on. Using a dimension, we can have a materialized view that has fewer details than the detailed records (summarized to the monthly level perhaps). This may still be at a higher level of aggregation than the query requests (the query wants data by quarter, say), but Oracle will recognize that it can use the materialized view to get the answer.
Here is a simple example. We will set up a SALES table to store the transaction date, a customer ID, and the total number of sales. This table will have about 350000 rows in it. Another table, TIME_HIERARCHY, will store the mapping of transaction date to month, to quarter, to year. If we join the two together, we can obtain aggregate sales by month, quarter, year, and so on. Likewise, if we had a table that mapped a customer ID to a zip code, and the zip code to a region, we could easily join this table to SALES, and aggregate by zip code or region.
In a conventional database schema (one without materialized views and other structures) these operations would succeed, but they would be very slow. For every row in the sales data, we would have to perform an indexed read into the lookup table to convert either the transaction date or customer ID into some other value (a NESTED LOOP JOIN) in order to group by this other value. Enter the materialized view. We can store a summarized rollup of the details, perhaps at the monthly level for the transaction date, and at the zip code level for the customer information. Now, rolling up to quarters or by region becomes a very fast operation.
We'll start by creating the SALES table, and load it with some random test data, generated using the ALL_OBJECTS view.
tkyte@TKYTE816> create table sales
  2  (trans_date date, cust_id int, sales_amount number );
Table created.

tkyte@TKYTE816> insert /*+ APPEND */ into sales
  2  select trunc(sysdate,'year')+mod(rownum,366) TRANS_DATE,
  3         mod(rownum,100) CUST_ID,
  4         abs(dbms_random.random)/100 SALES_AMOUNT
  5  from all_objects
  6  /
21921 rows created.

tkyte@TKYTE816> commit;
Commit complete.
These details will represent one year's worth of data. I set up the TRANS_DATE to simply be the first day of this year plus a number between 0 and 365. The CUST_ID is a number between 0 and 99. The sales amount is some typically 'large' number (it was a really good year).
My ALL_OBJECTS view contains about 22000 rows, so after four inserts that consecutively double the size of the table, we'll have about 350000 records. I am using the /*+ APPEND */ hint simply to avoid the redo log that would otherwise be generated by these large inserts:
tkyte@TKYTE816> begin
  2      for i in 1 .. 4
  3      loop
  4          insert /*+ APPEND */ into sales
  5          select trans_date, cust_id, abs(dbms_random.random)/100
  6          from sales;
  7          commit;
  8      end loop;
  9  end;
 10  /
PL/SQL procedure successfully completed.

tkyte@TKYTE816> select count(*) from sales;

  COUNT(*)
----------
    350736
Now we need to set up our TIME_HIERARCHY table, to roll up the date field by month, year, quarter, and so on:
tkyte@TKYTE816> create table time_hierarchy
  2  (day primary key, mmyyyy, mon_yyyy, qtr_yyyy, yyyy)
  3  organization index
  4  as
  5  select distinct
  6    trans_date DAY,
  7    cast (to_char(trans_date,'mmyyyy') as number) MMYYYY,
  8    to_char(trans_date,'mon-yyyy') MON_YYYY,
  9    'Q' || ceil( to_char(trans_date,'mm')/3) || ' FY'
 10       || to_char(trans_date,'yyyy') QTR_YYYY,
 11    cast( to_char( trans_date, 'yyyy' ) as number ) YYYY
 12  from sales
 13  /
Table created.
In this case, it was simple enough. We generated:
MMYYYY - The month, including the year
MON_YYYY - Same as above but we 'spelled' out the month
QTR_YYYY - The quarter of the year, including the year
YYYY - The year itself
In general, the computations required to create this table could be much more complex. Fiscal quarters, for example, are typically not so easily computed, and neither are fiscal years; they do not generally follow the calendar year.
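As a sketch of what a less friendly calendar might look like, suppose (hypothetically) our fiscal year began on the 1st of April, and was labeled by the calendar year in which it begins. Shifting each date back three months lets us reuse the same calendar-quarter arithmetic we used above:

```sql
-- Hypothetical fiscal calendar: the FY starts April 1st and is named
-- for the calendar year in which it begins. ADD_MONTHS(d,-3) maps
-- Apr-Jun to Q1, Jul-Sep to Q2, Oct-Dec to Q3, and Jan-Mar to Q4 of
-- the prior fiscal year.
select trans_date,
       'Q' || ceil( to_char( add_months( trans_date, -3 ), 'mm' ) / 3 )
           || ' FY' || to_char( add_months( trans_date, -3 ), 'yyyy' ) fiscal_qtr
  from sales;
```

Real fiscal calendars (4-4-5 weeks, leap adjustments, and so on) can be messier still, which is why this table is often built by a procedure rather than a single query.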
We'll now create the materialized view, SALES_MV. The summary we are creating rolls the data up from individual days to months. We would expect our materialized view to have about 1/30 the number of rows of our SALES table, if the data was evenly distributed:
tkyte@TKYTE816> analyze table sales compute statistics;
Table analyzed.

tkyte@TKYTE816> analyze table time_hierarchy compute statistics;
Table analyzed.

tkyte@TKYTE816> create materialized view sales_mv
  2  build immediate
  3  refresh on demand
  4  enable query rewrite
  5  as
  6  select sales.cust_id, sum(sales.sales_amount) sales_amount,
  7         time_hierarchy.mmyyyy
  8  from sales, time_hierarchy
  9  where sales.trans_date = time_hierarchy.day
 10  group by sales.cust_id, time_hierarchy.mmyyyy
 11  /
Materialized view created.

tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select time_hierarchy.mmyyyy, sum(sales_amount)
  2  from sales, time_hierarchy
  3  where sales.trans_date = time_hierarchy.day
  4  group by time_hierarchy.mmyyyy
  5  /

    MMYYYY SUM(SALES_AMOUNT)
---------- -----------------
     12001        3.2177E+11
     12002        1.0200E+10
     22001        2.8848E+11
     32001        3.1944E+11
     42001        3.1012E+11
     52001        3.2066E+11
     62001        3.0794E+11
     72001        3.1796E+11
     82001        3.2176E+11
     92001        3.0859E+11
    102001        3.1868E+11
    112001        3.0763E+11
    122001        3.1305E+11

13 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=4 Card=327 Bytes=8502)
   1    0   SORT (GROUP BY) (Cost=4 Card=327 Bytes=8502)
   2    1     TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=1 Card=327 Bytes
So far, so good - Oracle rewrote the query to use the view, SALES_MV. However, let's see what happens if we issue a query that calls for a higher level of aggregation:
tkyte@TKYTE816> set timing on
tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select time_hierarchy.qtr_yyyy, sum(sales_amount)
  2  from sales, time_hierarchy
  3  where sales.trans_date = time_hierarchy.day
  4  group by time_hierarchy.qtr_yyyy
  5  /

QTR_YYYY                                         SUM(SALES_AMOUNT)
------------------------------------------------ -----------------
Q1 FY2001                                               9.2969E+11
Q1 FY2002                                               1.0200E+10
Q2 FY2001                                               9.3872E+11
Q3 FY2001                                               9.4832E+11
Q4 FY2001                                               9.3936E+11

Elapsed: 00:00:05.58

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8289 Card=5 Bytes=145)
   1    0   SORT (GROUP BY) (Cost=8289 Card=5 Bytes=145)
   2    1     NESTED LOOPS (Cost=169 Card=350736 Bytes=10171344)
   3    2       TABLE ACCESS (FULL) OF 'SALES' (Cost=169 Card=350736 B
   4    2       INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_30180' (UNIQUE)

Statistics
----------------------------------------------------------
          0  recursive calls
         15  db block gets
     351853  consistent gets
...
We see that Oracle doesn't have the knowledge we have. It does not yet know that it could have used the materialized view to answer this particular query, so it used the original SALES table instead, and had to do a lot of work to get the answer. The same thing would happen if we requested data aggregated by fiscal year.
So, let's use a DIMENSION to alert Oracle to the fact that the materialized view would be useful in answering this question. First, we'll create the DIMENSION:
tkyte@TKYTE816> create dimension time_hierarchy_dim
  2  level day      is time_hierarchy.day
  3  level mmyyyy   is time_hierarchy.mmyyyy
  4  level qtr_yyyy is time_hierarchy.qtr_yyyy
  5  level yyyy     is time_hierarchy.yyyy
  6  hierarchy time_rollup
  7  (
  8     day child of
  9     mmyyyy child of
 10     qtr_yyyy child of
 11     yyyy
 12  )
 13  attribute mmyyyy
 14  determines mon_yyyy;
Dimension created.
This tells Oracle that the DAY column of the TIME_HIERARCHY table implies MMYYYY, which in turn implies QTR_YYYY. Finally, QTR_YYYY implies YYYY. Also stated is the fact that MMYYYY and MON_YYYY are synonymous - there is a one-to-one mapping between the two. So, any time Oracle sees MON_YYYY used, it understands it as if MMYYYY was used. Now that Oracle has a greater understanding of the relationships between the data we can see a marked improvement in our query response times:
tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select time_hierarchy.qtr_yyyy, sum(sales_amount)
  2  from sales, time_hierarchy
  3  where sales.trans_date = time_hierarchy.day
  4  group by time_hierarchy.qtr_yyyy
  5  /

QTR_YYYY                                        SUM(SALES_AMOUNT)
----------------------------------------------- -----------------
Q1 FY2001                                              9.2969E+11
Q1 FY2002                                              1.0200E+10
Q2 FY2001                                              9.3872E+11
Q3 FY2001                                              9.4832E+11
Q4 FY2001                                              9.3936E+11

Elapsed: 00:00:00.20

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=7 Card=5 Bytes=195)
   1    0   SORT (GROUP BY) (Cost=7 Card=5 Bytes=195)
   2    1     HASH JOIN (Cost=6 Card=150 Bytes=5850)
   3    2       VIEW (Cost=4 Card=46 Bytes=598)
   4    3         SORT (UNIQUE) (Cost=4 Card=46 Bytes=598)
   5    4           INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_30180' (UNI
   6    2       TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=1 Card=327 Byt

Statistics
----------------------------------------------------------
          0  recursive calls
         16  db block gets
         12  consistent gets
...
Well, we went from more than 350000 logical reads to 12 - not too bad at all. If you run this example, you'll be able to see the difference. The first query took a while (about six seconds), the answer to the second query was on the screen before our hands left the Enter key (about a fifth of a second).
We can use this DIMENSION feature many times on the same base fact table. Consider if we assign a ZIP_CODE and REGION attribute to every customer in our database:
tkyte@TKYTE816> create table customer_hierarchy
  2  ( cust_id primary key, zip_code, region )
  3  organization index
  4  as
  5  select cust_id,
  6         mod( rownum, 6 ) || to_char(mod( rownum, 1000 ), 'fm0000') zip_code,
  7         mod( rownum, 6 ) region
  8  from ( select distinct cust_id from sales)
  9  /
Table created.

tkyte@TKYTE816> analyze table customer_hierarchy compute statistics;
Table analyzed.
Next, we recreate our materialized view to be a summary that shows us SALES_AMOUNT by ZIP_CODE and MMYYYY:
tkyte@TKYTE816> drop materialized view sales_mv;
Materialized view dropped.

tkyte@TKYTE816> create materialized view sales_mv
  2  build immediate
  3  refresh on demand
  4  enable query rewrite
  5  as
  6  select customer_hierarchy.zip_code,
  7         time_hierarchy.mmyyyy,
  8         sum(sales.sales_amount) sales_amount
  9  from sales, time_hierarchy, customer_hierarchy
 10  where sales.trans_date = time_hierarchy.day
 11  and sales.cust_id = customer_hierarchy.cust_id
 12  group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy
 13  /
Materialized view created.
We'll try to execute a query now, which will show us sales by ZIP_CODE and MMYYYY, and demonstrate that I used the materialized view as expected:
tkyte@TKYTE816> set autotrace on
tkyte@TKYTE816> select customer_hierarchy.zip_code,
  2         time_hierarchy.mmyyyy,
  3         sum(sales.sales_amount) sales_amount
  4  from sales, time_hierarchy, customer_hierarchy
  5  where sales.trans_date = time_hierarchy.day
  6  and sales.cust_id = customer_hierarchy.cust_id
  7  group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy
  8  /
1250 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=1 Card=409 Bytes=204
   1    0   TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=1 Card=409 Bytes=2

Statistics
----------------------------------------------------------
         28  recursive calls
         12  db block gets
        120  consistent gets
...
However, when we ask for information at a different level of aggregation (rolling MMYYYY up to YYYY and ZIP_CODE up to REGION), we see that Oracle does not recognize that it can use the materialized view:
tkyte@TKYTE816> select customer_hierarchy.region,
  2         time_hierarchy.yyyy,
  3         sum(sales.sales_amount) sales_amount
  4  from sales, time_hierarchy, customer_hierarchy
  5  where sales.trans_date = time_hierarchy.day
  6  and sales.cust_id = customer_hierarchy.cust_id
  7  group by customer_hierarchy.region, time_hierarchy.yyyy
  8  /
9 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=8289 Card=9 Bytes=26
   1    0   SORT (GROUP BY) (Cost=8289 Card=9 Bytes=261)
   2    1     NESTED LOOPS (Cost=169 Card=350736 Bytes=10171344)
   3    2       NESTED LOOPS (Cost=169 Card=350736 Bytes=6663984)
   4    3         TABLE ACCESS (FULL) OF 'SALES' (Cost=169 Card=350736
   5    3         INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_30185' (UNIQUE)
   6    2       INDEX (UNIQUE SCAN) OF 'SYS_IOT_TOP_30180' (UNIQUE)

Statistics
----------------------------------------------------------
          0  recursive calls
         15  db block gets
     702589  consistent gets
...
Oracle understands the time dimension we set up, but it does not yet have any information about how CUST_ID, ZIP_CODE, and REGION relate to each other in our CUSTOMER_HIERARCHY table. What we do to correct this is to rebuild our dimension with two hierarchies in it - one describing the TIME_HIERARCHY and the other describing the CUSTOMER_HIERARCHY table:
tkyte@TKYTE816> drop dimension time_hierarchy_dim
  2  /
Dimension dropped.

tkyte@TKYTE816> create dimension sales_dimension
  2  level cust_id  is customer_hierarchy.cust_id
  3  level zip_code is customer_hierarchy.zip_code
  4  level region   is customer_hierarchy.region
  5  level day      is time_hierarchy.day
  6  level mmyyyy   is time_hierarchy.mmyyyy
  7  level qtr_yyyy is time_hierarchy.qtr_yyyy
  8  level yyyy     is time_hierarchy.yyyy
  9  hierarchy cust_rollup
 10  (
 11     cust_id child of
 12     zip_code child of
 13     region
 14  )
 15  hierarchy time_rollup
 16  (
 17     day child of
 18     mmyyyy child of
 19     qtr_yyyy child of
 20     yyyy
 21  )
 22  attribute mmyyyy
 23  determines mon_yyyy;
Dimension created.
We dropped the original time hierarchy and created a new, more descriptive one explaining all of the relevant relationships. Now Oracle will understand that the SALES_MV we created is able to answer many more questions. For example, if we ask our 'REGION by YYYY' question again:
tkyte@TKYTE816> select customer_hierarchy.region,
  2         time_hierarchy.yyyy,
  3         sum(sales.sales_amount) sales_amount
  4  from sales, time_hierarchy, customer_hierarchy
  5  where sales.trans_date = time_hierarchy.day
  6  and sales.cust_id = customer_hierarchy.cust_id
  7  group by customer_hierarchy.region, time_hierarchy.yyyy
  8  /

    REGION       YYYY SALES_AMOUNT
---------- ---------- ------------
         0       2001   5.9598E+11
         0       2002   3123737106
         1       2001   6.3789E+11
         2       2001   6.3903E+11
         2       2002   3538489159
         3       2001   6.4069E+11
         4       2001   6.3885E+11
         4       2002   3537548948
         5       2001   6.0365E+11

9 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=11 Card=9 Bytes=576)
   1    0   SORT (GROUP BY) (Cost=11 Card=9 Bytes=576)
   2    1     HASH JOIN (Cost=9 Card=78 Bytes=4992)
   3    2       HASH JOIN (Cost=6 Card=78 Bytes=4446)
   4    3         VIEW (Cost=3 Card=19 Bytes=133)
   5    4           SORT (UNIQUE) (Cost=3 Card=19 Bytes=133)
   6    5             INDEX (FAST FULL SCAN) OF 'SYS_IOT_TOP_30180' (U
   7    3         TABLE ACCESS (FULL) OF 'SALES_MV' (Cost=1 Card=409 B
   8    2       VIEW (Cost=3 Card=100 Bytes=700)
   9    8         SORT (UNIQUE) (Cost=3 Card=100 Bytes=700)
  10    9           INDEX (FULL SCAN) OF 'SYS_IOT_TOP_30185' (UNIQUE)

Statistics
----------------------------------------------------------
          0  recursive calls
         16  db block gets
         14  consistent gets
...
Oracle was able to make use of both hierarchies in the dimension here, and can now make use of the materialized view. Due to the dimensions we created, it performed simple lookups to convert the CUST_ID column into REGION (since CUST_ID implies ZIP_CODE implies REGION), the MMYYYY column into QTR_YYYY, and answered our question almost instantly. Here, we reduced the number of logical I/Os from more than 700,000 to 16. When you consider that the size of the SALES table will only grow over time, and the size of the SALES_MV will grow much more slowly (180 records or so per month), we can see that this query will scale very well.
The last piece to the materialized view puzzle is the DBMS_OLAP package. This package is used for the following purposes:
To estimate the size of a materialized view, in terms of the number of rows and bytes of storage.
To validate that your dimension objects are correct, given the primary/foreign key relationships you have set up.
To recommend additional materialized views, and to name views that should be dropped, based on either actual utilization and structure, or just structure alone.
To evaluate the utilization of a materialized view, using the procedures provided, which report on the actual usefulness of your materialized views - whether they are actually being used or not.
Unfortunately, the utilization routines are beyond the scope of what I can cover in one chapter; they involve setting up Oracle Trace and the Enterprise Manager Performance Pack. We shall, however, take a look at the other three.
In order to use the DBMS_OLAP package, you must have external procedures set up, as most of the DBMS_OLAP code is actually stored in a C library. See Chapter 18, C-Based External Procedures, for set-up instructions if you receive an error such as the following:
ERROR at line 1:
ORA-28575: unable to open RPC connection to external procedure agent
ORA-06512: at "SYS.DBMS_SUMADV", line 6
ORA-06512: at "SYS.DBMS_SUMMARY", line 559
ORA-06512: at line 1
The ESTIMATE_SUMMARY_SIZE routine will report the estimated number of rows and bytes of storage that the materialized view will consume. Since hindsight is 20/20, we can ask DBMS_OLAP to estimate a figure and then compare that figure to what we get in reality.
In order to run this procedure, you will need to ensure that you have a PLAN_TABLE installed in your schema. You will find the CREATE TABLE statement in the [ORACLE_HOME]/rdbms/admin directory on your database server, in a file named utlxplan.sql. If you execute this script, it will create the PLAN_TABLE for you. This table is used by the EXPLAIN PLAN facility that in turn is used by DBMS_OLAP to estimate the size of the materialized view. With this table in place, we can use the built-in ESTIMATE_SUMMARY_SIZE routine to get an estimate of the number of rows/bytes a materialized view would need if we were to build it. I start with a DELETE STATISTICS on our SALES_MV materialized view. DBMS_OLAP would not normally have access to a materialized view to actually see what the sizes are, so we have to hide it (otherwise DBMS_OLAP will get the exact answer from the data dictionary):
tkyte@TKYTE816> analyze table sales_mv DELETE statistics;
Table analyzed.

tkyte@TKYTE816> declare
  2      num_rows number;
  3      num_bytes number;
  4  begin
  5      dbms_olap.estimate_summary_size
  6      ( 'SALES_MV_ESTIMATE',
  7        'select customer_hierarchy.zip_code,
  8             time_hierarchy.mmyyyy,
  9             sum(sales.sales_amount) sales_amount
 10          from sales, time_hierarchy, customer_hierarchy
 11          where sales.trans_date = time_hierarchy.day
 12          and sales.cust_id = customer_hierarchy.cust_id
 13          group by customer_hierarchy.zip_code, time_hierarchy.mmyyyy',
 14        num_rows,
 15        num_bytes );
 16
 17      dbms_output.put_line( num_rows || ' rows' );
 18      dbms_output.put_line( num_bytes || ' bytes' );
 19  end;
 20  /
409 rows
36401 bytes

PL/SQL procedure successfully completed.
The first parameter to this routine is the name of the plan to be stored in the plan table. This name is not especially relevant, except that you will want to DELETE FROM PLAN_TABLE WHERE STATEMENT_ID = 'SALES_MV_ESTIMATE' after you are done. The second parameter is the query that will be used to instantiate the materialized view. DBMS_OLAP will analyze this query, using the statistics on all of the underlying tables, to guess the size of this object. The remaining two parameters are the outputs from DBMS_OLAP - the row and byte count estimates, with values of 409 and 36401, respectively. Now let's calculate the true values:
tkyte@TKYTE816> analyze table sales_mv COMPUTE statistics;
Table analyzed.

tkyte@TKYTE816> select count(*) from sales_mv;

  COUNT(*)
----------
      1250

tkyte@TKYTE816> select blocks * 8 * 1024
  2  from user_tables
  3  where table_name = 'SALES_MV'
  4  /

BLOCKS*8*1024
-------------
        40960
So, the ESTIMATE_SUMMARY_SIZE routine did very well on the size of the table, but underestimated the number of rows. This is typical of anything that 'estimates' - it will get some things right, and miss on others. I would use this routine for a rough 'best guess' as to the size of an object.
The VALIDATE_DIMENSION routine takes any given dimension, and checks that the hierarchies you have defined are valid. For example, in our example earlier, it would ensure that a CUST_ID implied a ZIP_CODE, which implied a REGION. To see this routine at work, we'll create a 'bad' example to work from. We'll start by creating a table that has a row for every day of this year, with the day, month, and year as attributes:
tkyte@TKYTE816> create table time_rollup
  2  ( day   date,
  3    mon   number,
  4    year  number
  5  )
  6  /
Table created.

tkyte@TKYTE816> insert into time_rollup
  2  select dt, to_char(dt,'mm'), to_char(dt,'yyyy')
  3  from ( select trunc(sysdate,'year')+rownum-1 dt
  4         from all_objects where rownum < 366 )
  5  /
365 rows created.
So, here we have set up a time roll-up similar to our previous example. This time however, I did not preserve the year in the month attribute, just the two digits that represent the month. If we add one more row to this table:
tkyte@TKYTE816> insert into time_rollup values
  2  ( add_months(sysdate,12),
  3    to_char(add_months(sysdate,12),'mm'),
  4    to_char(add_months(sysdate,12),'yyyy') );
1 row created.
We can see the problem. We will be saying that DAY implies MONTH, and MONTH implies YEAR, but it is not true in this case. We'll have one month that implies one of two different years. DBMS_OLAP will do a sanity check for us to show us our error. First we'll set up the dimension:
tkyte@TKYTE816> create dimension time_rollup_dim
  2  level day  is time_rollup.day
  3  level mon  is time_rollup.mon
  4  level year is time_rollup.year
  5  hierarchy time_rollup
  6  (
  7    day child of mon child of year
  8  )
  9  /
Dimension created.
And then validate it:
tkyte@TKYTE816> exec dbms_olap.validate_dimension( 'time_rollup_dim', user, false, false );
PL/SQL procedure successfully completed.
It looks like it succeeded, but we really have to check the table it creates and populates for us:
tkyte@TKYTE816> select * from mview$_exceptions;

OWNER TABLE_NAME  DIMENSION_NAME  RELATIONSHI BAD_ROWID
----- ----------- --------------- ----------- ------------------
TKYTE TIME_ROLLUP TIME_ROLLUP_DIM CHILD OF    AAAGkxAAGAAAAcKAA7
TKYTE TIME_ROLLUP TIME_ROLLUP_DIM CHILD OF    AAAGkxAAGAAAAcKAA8
TKYTE TIME_ROLLUP TIME_ROLLUP_DIM CHILD OF    AAAGkxAAGAAAAcKAA9
...

32 rows selected.
If we look at the rows that MVIEW$_EXCEPTIONS points us to, we'll find that they are the rows for the month of MARCH (I ran this in March). Specifically:
tkyte@TKYTE816> select * from time_rollup
  2  where rowid in ( select bad_rowid from mview$_exceptions );

DAY              MON       YEAR
--------- ---------- ----------
01-MAR-01          3       2001
02-MAR-01          3       2001
03-MAR-01          3       2001
04-MAR-01          3       2001
...
30-MAR-01          3       2001
31-MAR-01          3       2001
26-MAR-02          3       2002

32 rows selected.
The problem is evident at this point: MON does not imply YEAR, so the dimension is invalid. It would be unsafe to use this dimension, as the wrong answer could result.
It is recommended that your dimensions be validated after they are modified to ensure the integrity of the results you receive from the materialized views that make use of them.
One of the more interesting uses of the DBMS_OLAP package is to have it tell you what materialized views you should consider creating. The RECOMMEND routines do just that.
There are two versions of this routine:
RECOMMEND_MV looks at the structure of the table, the foreign keys that are in place, existing materialized views, statistics on everything, and then develops a list of prioritized recommendations.
RECOMMEND_MV_W goes one step further. If you are using Oracle Trace and the Enterprise Manager Performance Packs, it will look at the queries the system processes, and recommend materialized views based on that real-life information.
As a simple example, we'll ask DBMS_OLAP to take a look at our existing 'fact' table, SALES.
Note: A fact table is a table in a star schema that, quite simply, contains facts. The SALES table we used above is a fact table. A fact table typically has two types of columns: columns that contain facts (values like SALES_AMOUNT in our sales table), and columns that are foreign keys to dimension tables (like TRANS_DATE in our sales table).
Let's see what DBMS_OLAP has to say. Before we can do that, we'll need to add the foreign keys. The RECOMMEND routine won't look at the DIMENSION to see what can be done - it needs to see the foreign keys to determine the relationships between tables:
tkyte@TKYTE816> alter table sales add constraint t_fk_time
  2  foreign key( trans_date) references time_hierarchy
  3  /
Table altered.

tkyte@TKYTE816> alter table sales add constraint t_fk_cust
  2  foreign key( cust_id) references customer_hierarchy
  3  /
Table altered.
Once we've done that, we are ready to look at our fact table, SALES:
tkyte@TKYTE816> exec dbms_olap.recommend_mv( 'SALES', 10000000000, '' );
PL/SQL procedure successfully completed.
Here we asked RECOMMEND_MV to:
Look at the table SALES.
Consider a large amount of space to be used for the materialized views (we just passed a really big number).
Not feel that it needs to keep any particular materialized view (we passed '' as the list of views to KEEP).
Next, we can either query the tables it populates directly or, more conveniently, use a sample routine to print the contents. To install the sample routine and run the report:
tkyte@TKYTE816> @C:\oracle\RDBMS\demo\sadvdemo
Package created.
Package body created.
Package created.
Package body created.

tkyte@TKYTE816> exec demo_sumadv.prettyprint_recommendations
Recommendation Number = 1
Recommended Action is CREATE new summary:
SELECT CUSTOMER_HIERARCHY.CUST_ID, CUSTOMER_HIERARCHY.ZIP_CODE,
CUSTOMER_HIERARCHY.REGION , COUNT(*), SUM(SALES.SALES_AMOUNT),
COUNT(SALES.SALES_AMOUNT)
FROM TKYTE.SALES, TKYTE.CUSTOMER_HIERARCHY
WHERE SALES.CUST_ID = CUSTOMER_HIERARCHY.CUST_ID
GROUP BY CUSTOMER_HIERARCHY.CUST_ID, CUSTOMER_HIERARCHY.ZIP_CODE,
CUSTOMER_HIERARCHY.REGION
Storage in bytes is 2100
Percent performance gain is 43.2371266138567
Benefit-to-cost ratio is .0205891079113603

Recommendation Number = 2
...

PL/SQL procedure successfully completed.
DBMS_OLAP looked at the dimensions and existing materialized views, and is now making suggestions for additional materialized views that may usefully be created, given the meta data (primary keys, foreign keys, and dimensions) we have entered into the database.
If we used Oracle Trace, we could go a step further with this recommendation process. Oracle Trace can capture the actual queries asked of the system and record details about them. DBMS_OLAP then uses those details to make even more focused recommendations, based not only on what is possible, but on the types of questions you actually ask of the data. Materialized views that are possible, but that your workload would never use, are not recommended; those that are possible and would be used by queries your system actually executes are.
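As a sketch, once an Oracle Trace workload has been collected and formatted, the workload-aware variant of the routine can be run against the same fact table. We are assuming here that RECOMMEND_MV_W takes the same arguments as RECOMMEND_MV; check the DBMS_OLAP documentation for your release:

```sql
-- Workload-based recommendations (sketch): same fact table, same storage
-- budget and KEEP list as before, but now driven by the captured workload.
exec dbms_olap.recommend_mv_w( 'SALES', 10000000000, '' );
```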
There are a few considerations to be aware of with regard to using materialized views. We will briefly cover some of them here.
As mentioned above, materialized views typically add overhead to individual transactions and, if created with REFRESH ON COMMIT, will introduce contention. The overhead arises from the need to track the changes made by a transaction - these changes will be maintained either in the session state or in log tables. In a high-end OLTP system, this overhead is not desirable. The concurrency issue comes into play with a REFRESH ON COMMIT materialized view, because many rows in the detail fact table point to a single row in the summary table. An update to any one of perhaps thousands of detail records must modify that single summary row. This will naturally inhibit concurrency in a high-update situation.
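The change tracking described above is concrete: a REFRESH ON COMMIT (fast refresh) materialized view requires a materialized view log on the detail table, and maintaining that log is the per-transaction overhead. A minimal sketch against the SALES table from our example (the column list is ours) might look like:

```sql
-- Every insert/update/delete on SALES now also writes to this log --
-- this is the overhead each OLTP transaction would pay.
create materialized view log on sales
with rowid ( cust_id, sales_amount )
including new values
/
```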
This does not preclude the use of materialized views with OLTP, in particular materialized views that are REFRESHed ON DEMAND, with a full refresh. A full refresh does not add the overhead of tracking transaction-level changes. Rather, at some point in time, the defining query for the materialized view is executed, and the results simply replace the existing materialized view. Since this is done on demand (or on a timed basis), the refresh may be scheduled for a time when the load is light. The resulting materialized view is especially relevant for reporting purposes - your OLTP data can be transformed using SQL into something that is easy and fast to query, every night. The next day, your online reports of yesterday's activities run as fast as possible, and easily co-exist with your OLTP system.
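As a minimal sketch of this approach, using the SALES and CUSTOMER_HIERARCHY tables from the example above (the view name is ours), we might define a complete-refresh, on-demand materialized view and rebuild it from a nightly job:

```sql
-- No per-transaction tracking: the view is simply rebuilt on request.
create materialized view sales_by_region
refresh complete on demand
as
select customer_hierarchy.region, sum(sales.sales_amount) total_sales
  from sales, customer_hierarchy
 where sales.cust_id = customer_hierarchy.cust_id
 group by customer_hierarchy.region
/

-- Later, at a quiet time (for example from a scheduled nightly job):
exec dbms_mview.refresh( 'SALES_BY_REGION', 'C' );  -- 'C' = complete refresh
```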
As we discussed above, the QUERY_REWRITE_INTEGRITY parameter has three modes:
ENFORCED - Oracle will use a materialized view only if there is no chance of getting incorrect or stale data.
TRUSTED - Oracle will use a materialized view even if some of the constraints it relies on have not been validated or enforced by Oracle. This is typical in a data warehouse environment, where many constraints may be declared but not enforced by Oracle.
STALE_TOLERATED - Oracle will use a materialized view even if it knows the data it is derived from has changed. This is typical in a reporting environment such as the one described in the preceding caveat.
You must understand the ramifications of using each of these modes. ENFORCED will give you the right answer every time, at the expense of not using some materialized views that would speed up the query response time. TRUSTED, if what Oracle has been asked to 'trust' turns out to be false, may give a result that would not be achieved if the original source data had been queried instead. We saw an example of this early on with the EMP_DEPT materialized view. STALE_TOLERATED should be used in reporting systems where getting a value that existed a while ago is acceptable. If up-to-the-minute information is mandatory, STALE_TOLERATED should not be used.
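The mode can be set per session, which makes it easy to experiment with the trade-offs described above (it can also be set system-wide in the init.ora file):

```sql
-- Only rewrites that are provably correct (the safe default):
alter session set query_rewrite_integrity = enforced;

-- Believe declared-but-unvalidated constraints and dimensions:
alter session set query_rewrite_integrity = trusted;

-- Reporting systems where slightly old answers are acceptable:
alter session set query_rewrite_integrity = stale_tolerated;
```

Note that query rewrite must also be enabled (QUERY_REWRITE_ENABLED) for any of these modes to have an effect.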
Materialized views are a powerful data warehouse/decision support system feature. A single materialized view may be used by many different, but related, queries. Best of all, it is 100 percent transparent to the application, and the end user. You do not have to teach people what summaries are available - you inform Oracle what is possible via constraints, referential integrity, and dimensions. It does the rest for you.
Materialized views are the natural evolution, and merging, of features found in the database and in decision support tools. No longer are the summary table management features of Oracle Discoverer (and similar tools) limited to those environments. Now every client, from SQL*Plus to your custom-developed applications to off-the-shelf reporting tools, can take advantage of the fact that the answer has already been stored.
Add to all this the DBMS_OLAP tool. It will not only estimate how much additional storage you need to support a materialized view, it can also watch how your existing materialized views are used. Based on this, it will recommend dropping some and creating others - to the point of supplying you with the query it feels you should use in the materialized view.
All in all, materialized views in a read-only/read-intensive environment will definitely pay you back for the additional storage through reduced query response time and reduced resources needed to actually process the queries.