Chapter 5. Joins | SQL Performance Tuning

There was once a simple rule for joinsIndex all the columns . As DBMSs have grown more sophisticated, and different join plans have become part of every DBMS's repertoire , the ways to optimize have become more varied and more subtle. Now, to get the most out of your DBMS, you need to know what the special situations are, and what the DBMS is getting up to.

The first thing to keep in mind is, when your work requires you to join information from multiple tables, you don't want to do the joins yourself. What you do want is to provide the optimum conditions so that your DBMS will perform the best join on its ownbecause it makes a big difference.

For example, if two tables, called Table1 and Table2 , both have 1,000 rows and this SQL statement is executed:

 SELECT * FROM Table1, Table2    WHERE Table1.column1 = Table2.column1

then, effectively, the DBMS is supposed to do this:

Process the FROM clause, which saysMake a temporary table that is a cartesian join of all the rows in both Table1 and Table2 . This step results in a 1,000,000-row temporary table. (A cartesian join, or cartesian product , of two sets, say A and B , is the set of all ordered pairs {a , b} where a is a member of set A and b is a member of set B . In database terms, a cartesian product joins all rows in Table1 with all rows in Table2 . Thus if Table1 has the values {T_a1 , T_b1} and Table2 has the values {T_a2 , T_b2} then the cartesian product is {(T_a1 , T_a2) (T_a1 , T_b2) (T_b1 , T_a2) (T_b1 , T_b2)} . Cartesian products are useful for explanation, but when we see an operation that "goes cartesian" we usually criticize the optimizer.)
Then process the WHERE clause, which saysFilter the rows in the temporary table that have the same value in the columns named column1 .

The temporary table's size is the product of the sizes of the two original tables, which means the processing time goes up geometrically if Table1 and Table2 get bigger (this is known as a cartesian explosion ). The DBMS must use a strategy that comes up with the same results as if there was such an explosion, but without the horrific response times that the formal requirement implies.

Fortunately, three such strategies, or join plans, exist. In order of importance, they are: nested-loop joins, sort -merge joins, and hash joins. In this chapter, we'll show you how these join plans work, and how you can make a big difference to the response time your DBMS can produce when using them. We'll also talk about another strategy you can use to improve join performance: avoiding the join altogether by using a join index. Because of the geometric effects of joins, we are no longer talking about shaving a few percentage points off of a timing. Getting joins to work just right can result in queries working several times faster than they would if you were ignorant of their workings. In our examples, we'll mostly use what's called "old style join syntax" as we expect that's what most readers will be familiar with. But we'll also examine "ANSI style join syntax" and show you how it compares with the old style.