In one sense, this section belongs in Chapter 6 or Chapter 7, because in many applications outer joins are as common as inner joins. Indeed, those chapters already cover some of the issues surrounding outer joins. However, some of the questions about outer joins are logically removed from the rest of the tuning problem, and the answers to these questions are clearer if you understand the arguments of Chapter 8. Therefore, I complete the discussion of outer joins here. Outer joins are notorious for creating performance problems, but they do not have to be a problem when handled correctly. For the most part, the perceived performance problems with outer joins are simply myths, based on either misunderstanding or problems that were fixed long ago. Properly implemented, outer joins are essentially as efficient to execute as inner joins. What's more, in manual optimization problems, they are actually easier to handle than inner joins.
I already discussed several special issues with outer joins in Chapter 7, under sections named accordingly :
These cases all require special measures already described in Chapter 7. Here, I extend the rules given in Chapter 7 to further optimize the simplest, most common case: simple, unfiltered, downward outer joins to tables, either nodes that are leaf nodes of the query diagram or nodes that lead to further outer joins. Figure 9-1 illustrates this case with 16 outer joins. Figure 9-1. A query with 16 unfiltered outer joinsI call these unfiltered, downward outer joins to simple tables normal outer joins. Normal outer joins have a special and useful property, from the point of view of optimization:
This property makes consideration of normal outer joins especially simple; normal outer joins have no effect at all on the running rowcount (the number of rows reached at any point in the join order). Therefore, they are irrelevant to the optimization of the rest of the query. The main justification for performing downward joins before upward joins does not apply; normal outer joins, unlike inner joins, cannot reduce the rowcount going into later, more expensive upward joins. Since they have no effect on the rest of the optimization problem, you should simply choose a point in the join order where the cost of the normal outer join itself is minimum. This turns out to be the point where the running rowcount is minimum, after the first point where the outer-joined table is reachable through the join tree. 9.1.1 Steps for Normal Outer Join Order OptimizationThe properties of normal outer joins lead to a new set of steps specific to optimizing queries with these joins:
9.1.2 ExampleThese steps require some elaboration. To apply Steps 1 and 2 to Figure 9-1, consider Figure 9-2. Figure 9-2. The inner query diagram for the earlier queryThe initially complex-looking 22-way join is reduced in the inner query diagram in black to a simple 6-way join, and it's an easy optimization problem to find the best inner-join order of (C3, D1, B3, A1, M, A3) . Now, consider Step 3. The driving table is C3 , so the subset s_0 contains any normal outer-joined tables reachable strictly through downward joins hanging below C3 , which would be the set {D2, D3} . The first upward join is to B3 , so s_1 is {C2, C4} , the set of nodes (excluding s_0 nodes) reachable through downward joins from B3 . The second upward join is to A1 , so s_2 is {B1, B2, C1} . The final upward join is to M , so s_3 holds the rest of the normal outer-joined tables: {A2, B4, B5, C5, C6, D4, D5, D6 , D7} . Figure 9-3 shows this division of the outer-joined tables into subsets. Figure 9-3. The outer-joined-table subsets for the earlier queryNow, consider Step 4. Since you need only comparative (or relative) running rowcounts, let the initial number of rows in C3 be whatever it would need to be to make r_0 a convenient round number ”say, 10. From there, work out the list of values for r_n :
Finally, apply Step 5. The minimum for all relative rowcounts is r_1 , so the subsets s_0 and s_1 , which are eligible to join before the query reaches A1 , should both join at that point in the join order just before the join to A1 . The minimum after that point is r_3 , so the subsets s_2 and s_3 should both join at the end of the join order. The only further restriction is that the subsets join from the top down, so that lower tables are reached through the join tree. For example, you cannot join to C1 before you join to B1 , nor to D7 before joining to both B5 and C6 . Since I have labeled master tables by level, D s below C s, C s below B s, B s below A s, you can assure top-down joins by just putting each subset in alphabetical order. For example, a complete join order, consistent with the requirements, would be: (C3, D1, B3, {D2, D3}, {C2, C4}, A1 , M, A3, {B1, B2, C1}, {A2, B4, B5, C5, C6, D4, D5, D6, D7}) , where curly brackets are left in place to show the subsets. |