8.1 The Case for Nested Loops

Throughout this book, I have asserted that nested-loops joins on the join keys offer more robust execution plans than hash or sort-merge joins. Let's examine why. Consider a two-table query diagram, as shown in Figure 8-1.

Figure 8-1. A prototype two-table query

Sight-unseen, it is possible to say a surprising amount about this query, to a high probability, if you already know that it is a functionally useful business query that has come to you for tuning:
These assertions lead to the conclusion that the product of the two filter ratios (Fa x Fb) must be quite small and will likely grow smaller with time. Therefore, at least one of Fa and Fb must also be quite small. In practice, this is almost always achieved with one of the filter ratios being much smaller than the other, and the smaller filter ratio is almost always the one that grows steadily smaller with time. Generally, the smaller of these filter ratios justifies indexed access in preference to a full table scan.

If the best (smallest) filter ratio is Fb, and it is low enough to justify indexed access to table B, then nested loops from B will generally reach the same fraction (Fb) of rows in table A. (A given fraction of master records points to the same fraction of details.) This fraction will also be low enough to justify indexed access (through the foreign key, in this case) to table A, at lower cost than a full table scan. Since Fb < Fa under this assumption, nested loops minimize the number of rows touched, and therefore the number of physical and logical I/Os against table A, compared to an execution plan that drives directly through the index for the filter on A. Either a hash or a sort-merge join to table A would require a more expensive full table scan or an index range scan on the less selective filter for table A. Since the index blocks for the join are much better cached than the large table A itself, they are inexpensive to read compared to table blocks. Therefore, when the best filter ratio is Fb, nested loops minimize the cost of reading table A.

When the best filter is on the detail table (table A, in this case), the same argument holds at the limit where Jd = 1. When Jd > 1, larger values of Jd tend to favor hash joins. However, unless Jd is quite large, this factor does not overcome the usually strong advantage of driving to every table from the most selective filter.
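The row-count arithmetic behind this argument can be made concrete with a toy model. The sketch below is not from the book; all table sizes and filter ratios are illustrative assumptions chosen so that Fb < Fa, with Jd near 1. It counts the rows each plan must touch: nested loops driven from the better filter on B, versus a plan that drives through the weaker filter on A (as a hash or sort-merge join would).

```python
# Toy cost model: rows touched under two plans for a two-table join.
# All numbers are illustrative assumptions, not the author's figures.

rows_a = 1_000_000   # detail table A
rows_b = 1_000_000   # master table B (same order of magnitude, Jd ~ 1)
fa = 0.10            # filter ratio on A (the weaker filter)
fb = 0.001           # filter ratio on B (the stronger filter, Fb < Fa)

# Nested loops driven from B: read the fraction Fb of B, then the
# matching fraction Fb of A through the foreign-key index.
nested_loop_rows = rows_b * fb + rows_a * fb

# Driving from the filter on A instead (as a hash or sort-merge plan
# would): read the fraction Fa of A, plus the filtered fraction of B.
filter_on_a_rows = rows_a * fa + rows_b * fb

print(nested_loop_rows)   # 2000.0
print(filter_on_a_rows)   # 101000.0
```

Under these assumed ratios, driving nested loops from the more selective filter touches roughly fifty times fewer rows, which is the sense in which the text says nested loops minimize I/O against table A.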
When Jd is large, it implies that the master table B is much smaller than the detail table A, and B will consequently be much better cached and less expensive to reach, regardless of the join method. I already discussed this case at length in Section 6.6.5 in Chapter 6, so I won't repeat it all here. The key point is that, even in the cases in which hash joins improve a given join cost the most, they usually reduce only a comparatively minor component of the query cost: the cost of reaching a much smaller, better-cached master table. To achieve even this small benefit, the database must place the joined-to rowset in memory or, worse, store the hashed set temporarily on disk.
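The "minor component" claim can also be sketched numerically. In the toy model below (all numbers are illustrative assumptions, not the author's), Jd is large, so B has Jd times fewer rows than A. Even if we charge the hash join the full cost of reading B once to build its hash table, that B-side work is a small share of the total row visits, which are dominated by the detail table A either way.

```python
# Toy model: with a large detail-per-master ratio Jd, the master table B
# is tiny, so the B-side cost any join method pays is a minor share of
# the query. Illustrative assumptions throughout.

rows_a = 10_000_000          # detail table A
jd = 1_000                   # detail rows per master row (large Jd)
rows_b = rows_a // jd        # master table B: only 10,000 rows
fa = 0.01                    # best filter is on the detail table A

rows_read_a = int(rows_a * fa)   # A-rows read under either join method
hash_build_b = rows_b            # hash join reads all of B once

# Share of total row visits spent on the B side of the join.
b_share = hash_build_b / (rows_read_a + hash_build_b)
print(rows_read_a, hash_build_b, round(b_share, 3))
```

Even granting the hash join its best case, the B side accounts for under a tenth of the row visits here, so improving it cannot change the overall query cost much, which matches the argument in the text.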