Section 5.9. Pre-Joining Tables


5.9. Pre-Joining Tables

We have seen that physically grouping rows together is of most benefit when performing range scans, where we are obviously interested in a succession of logically adjacent rows. But our discussion so far has been with regard to retrieving data from only one table. Unless the database design is very, very, very bad, most queries will involve far more than one table. It may therefore seem somewhat questionable if we group all the data from one table into one physical location, only to have to complete the retrieval by visiting several randomly scattered locations for data from a second and subsequent tables. We need some method to group data from at least two tables into the same physical location on disk.

The answer lies in pre-joined tables, a technique that is supported by some database systems. Pre-joining is not the same as summary tables or materialized views, which are themselves nothing other than redundant data, pre-digested results that are updated more or less automatically.

Pre-joined tables are tables that are physically stored together, based on some criterion that will usually be the join condition. (Oracle calls such a set of pre-joined tables a cluster, which has nothing to do with either index clustering, as defined earlier in this chapter, nor with the MySQL clusters of databases, which are multiple servers accessing the same set of tables.)

When tables are pre-joined, the basic unit of storage (a page or a block), normally devoted to the data from a single table, holds data from two or more tables, brought together on the basis of a common join key. This arrangement may be very efficient for one specific join. But it often proves to be a disaster for everything else. Here's a review of some of the disadvantages of pre-joining tables:

  • Once the data from two or more tables starts to be shared within one page (or block), the amount of data from one table that can be held in one database page obviously falls, as the page is now sharing its fixed space between two or more tables. Consequently, there is a net increase in the number of pages needed to hold all the data from that one table. More I/O is required than previously if a full table scan has to be performed.

  • Not only is data being shared across additional pages, but the effective size of those pages has been reduced from what was obviously judged to be the optimum at database creation time, and so overflow and chaining start to become significant problems. When this happens, the number of successive accesses required to reach the actual data also increases.

  • Moreover, as anybody who has ever shared an apartment will know, one person often expands space occupancy at the expense of the other. Database tables are just the same! If you want to address this problem by allocating strictly identical storage to each table per page in the cluster, the result is frequently storage waste and the use of even more pages.

This particular type of storage should be used extremely sparingly to solve very specific issues, and then only by database administrators. Developers should forget about this technique.

Pre-joining tables is a very specialized tactic to facilitate queries, but is often done to the detriment of just about every other database activity.




The Art of SQL
The Art of SQL
ISBN: 0596008945
EAN: 2147483647
Year: N/A
Pages: 143

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net