Star Transformation Questions


Now that we've seen the various star schema explain plans and their performance results, let's examine star transformation in a little more detail. Over the past few years , when presenting and speaking on data warehouses, I've been asked quite a few questions regarding star transformation queries. Often I find it's really just a matter of getting comfortable with these radically different concepts for handling large fact tables. Here are some of the most frequently asked questions:

Q : Does the Oracle version matter?

Yes. As I've said numerous times up to this point, you must use Oracle 8i or 9i for best results, although Oracle 8.0 introduced many of the necessary features and may well work for small to mid- sized data warehouses. But, ORA-00600 errors start showing up for both bitmap indexes and partitions as the row counts exceed half a billion.

Q : Does the fact table size matter?

No. The star transformation will work the same if you have a million or a billion rows. The explain plan will be identical and the performance will beat all other alternatives. I've gotten the exact same results using 1/1000 of my UNIX production data on my notebook running Windows.

Q : Will star transformation work when dimension tables are huge?

Yes, kind of. I've had people say they have dimensions with hundreds of millions of rows and they wonder whether star transformation will still work for them. It might, but I really think they have underlying business analysis and dimensional modeling problems. For example, CUSTOMER is often mistakenly viewed as a dimension table for a fact such as CONTRACTS. But really, DEMOGRAPHIC should be the dimension for CONTRACTS.

Q : Will star transformation work for pre-canned reports as well?

Yes. Often, business intelligence users will save and re-run their various reports. Moreover, the data warehouse may also have pre-canned reports written by the IS staff, which may embody those reports everyone needs on a regular basis. It does not matter. The star transformation plan will apply and work for both types of reports.

Q : Will star transformation explain plans always be as easily recognizable as in Figure 5-3 (i.e., contain the phrase transformation)?

No. You should not count on seeing the phase transformation within the explain plan. What's most important is the basic structure of the explain plan, which should look something like this (simplified a bit for readability and to be generically more accurate across the various Oracle versions):

  • Hash join

    • Table access by index ROWID for fact

      • Bitmap AND

        • Bitmap MERGE

          • Table access by index ROWID for Dimension #1

            • Bitmap "AND"

              • Bitmap index scan

              • Bitmap index scan

              • Repeats

          • Bitmap index range scan for fact

        • Bitmap MERGE

          • Table access by index ROWID for Dimension #2

            • Bitmap "AND"

              • Bitmap index scan

              • Bitmap index scan

              • Repeats

          • Bitmap index range scan for fact

        • Repeats

Q : Isn't parallel query with full table scans better?

No. Why would you want to scan a billion rows in parallel when all you need is a few hundred thousand on which to run a calculation? This is a prime example of people reading an Oracle quote like "It's better to do a full table scan in parallel than to traverse an index b-tree" and applying it with reckless abandon. Use common sense. Use the star transformation. Learn to apply blanket Oracle quotes with caution.

Q : But isn't parallel query with star transformation better?

Yes, sometimes. If you have a fact table that is a billion rows and neither partitioned nor parallel, star transformation will still run lightning-fast. Partitioning may shave off a few seconds or minutes. And, parallel query may shave off even a few more seconds or minutes. But note that neither will provide an order of magnitude improvement. The star transformation plan will be the same, with just some minor differences in the columns for object node, operation in/out, partition start, and partition stop. That's it. The moral of the story is that if you can get it right for the non-parallel and non-partitioned world, you can improve on it with these features as well. But they are merely icing on the cake. Also remember what I said about overloading your parallel processor machine by making your fact tables overly parallel. Unless you have more CPUs than concurrent users, you probably don't want parallel. I've been to too many sites with fewer than 32 CPUs using parallel and suffering. If you have less than 16 CPUs, forget parallel query, unless you only have two concurrent users.

Q : What about fact table bitmap indexes and low cardinality?

This is my biggest data warehousing pet peeve question to date. Again, people are reading blanket Oracle documentation and not seeing what's being said. According to Oracle: "The advantages of using bitmap indexes are greatest for low cardinality columns in which the number of distinct values is small compared with the number of rows in the table." Most people seem to ignore the phrase "compared with the number of rows in the table." If the POS_DAY fact table has a billion rows and creates a bitmap index on PRODUCT_ID, that is low cardinality. Yes, PRODUCT has 200,000 rows. But that's small in comparison to a billion. Yet I get this question at least twice at every data warehousing presentation. (I'll cover bitmap index design for successful star transformation queries in much greater detail in a later section of this chapter.)

Q : We want to run a 24x7 data warehouse; how do we update bitmap indexes?

You can't. Bitmap indexes are a requirement for star transformation. And bitmap indexes do not update well; they generally corrupt and slow the load down by a factor of ten or more. I'll go back to Chapter 1 and ask: How can you require 24x7? A true data warehouse is for executives and senior managers; these guys work normal business hours of 9 “5, and most are in the same location, headquarters. I've yet to meet anyone doing a truly strategic data warehouse and require more than 7x16. And I've talked to the DBAs of many Fortune 500 companies doing data warehouses.

Q : I cannot get star transformation explain plans. Why?

Simple; you did not adhere to the prerequisites. Remember, you cannot get a star transformation explain plan without proper INIT.ORA settings, bitmap indexes, and cost-based optimization (all of which will be covered in the next few sections). I've yet to find anyone whose star schema data warehouse is not a fit. Yet I hear people saying that their data is special and thus they cannot do star transformations. That's a lame excuse because nobody really has such specialized situations. It reminds me of programmers who blame compiler bugs every time their programs don't work.



Oracle DBA Guide to Data Warehousing and Star Schemas
Oracle DBA Guide to Data Warehousing and Star Schemas
ISBN: 0130325848
EAN: 2147483647
Year: 2003
Pages: 79
Authors: Bert Scalzo

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net