Tuning Anti-Joins Using Subqueries

With an anti-join, we retrieve all rows from one table for which there is no matching row in another table. There are a number of ways of expressing anti-joins in MySQL.

Perhaps the most natural way of writing an anti-join is to express it as a NOT IN subquery. For instance, Example 21-7 returns all of the customers who are not employees.

Example 21-7. Example of an anti-join using NOT IN

SELECT count(*)
 FROM customers
 WHERE (contact_surname,contact_firstname, date_of_birth)
 NOT IN (SELECT surname,firstname, date_of_birth
 FROM employees)

Short Explain
-------------
1 PRIMARY select(ALL) on customers using no key
 Using where
2 DEPENDENT SUBQUERY select(ALL) on employees using no key
 Using where

Another way to express this query is to use a NOT EXISTS subquery. Just as MySQL will rewrite IN subqueries to use the EXISTS clause, so too will MySQL rewrite a NOT IN subquery as a NOT EXISTS. So, from MySQL's perspective, Example 21-7 and Example 21-8 are equivalent.

Example 21-8. Example of an anti-join using NOT EXISTS

SELECT count(*)
 FROM customers
 WHERE NOT EXISTS (SELECT *
 FROM employees
 WHERE surname=customers.contact_surname
 AND firstname=customers.contact_firstname
 AND date_of_birth=customers.date_of_birth)

Short Explain
-------------
1 PRIMARY select(ALL) on customers using no key
 Using where
2 DEPENDENT SUBQUERY select(ALL) on employees using no key
 Using where

A third but somewhat less natural way to express this query is to use a LEFT JOIN. This is a join in which all rows from the first table are returned even if there is no matching row in the second table. NULLs are returned for columns from the second table that do not have a matching row.

In Example 21-9 we join customers to employees and return NULL values for all of the employees who are not also customers. We can use this characteristic to eliminate the customers who are not also employees by testing for a NULL in a normally NOT NULL customer column.

Example 21-9. Example of an anti-join using LEFT JOIN

SELECT count(*)
 FROM customers
 LEFT JOIN employees
 ON (customers.contact_surname=employees.surname
 and customers.contact_firstname=employees.firstname
 and customers.date_of_birth=employees.date_of_birth)
 WHERE employees.surname IS NULL

Short Explain
-------------
1 SIMPLE select(ALL) on customers using no key

1 SIMPLE select(ALL) on employees using no key
 Using where; Not exists

21.2.1. Optimizing an Anti-Join

The guidelines for optimizing anti-joins using subqueries or left joins are identical to the guidelines for optimizing normal subqueries or joins. Scalability and good performance will be achieved only if we create an index to optimize the subquery or the join. For the previous examples, this would mean creating an index on customer names as follows:[*]

[*] It might occur to you that creating an index on customers would produce a better join than the index on employees. However, LEFT JOINs can only be performed with the table that will return all rows as the first table in the jointhis means that the join order can only be customers to employees, and therefore the index to support the join must be on employees.

 CREATE INDEX i_customers_name ON employees(surname,firstname,date_of_birth);

Figure 21-6 shows the massive performance improvements that result when we create a supporting index for an anti-join.

Figure 21-6. Comparison of anti-join techniques

Figure 21-6 also shows a substantial performance advantage for the NOT IN subquery over NOT EXISTS or LEFT JOIN when there is no index to support the anti-join. We noted earlier that MySQL rewrites the NOT IN-based statement to a NOT EXISTS, so it is at first surprising that there should be a performance difference. However, examination of the NOT IN rewrite reveals a number of undocumented compiler directives within the rewritten SQL that appear to give NOT IN a substantial performance advantage in the absence of an index.

Not only is the LEFT JOIN technique slower than NOT IN or NOT EXISTS, but it degrades much faster as the quantity of data to be processed increases. Figure 21-7 shows that the LEFT JOIN version of the anti-join degrades much more rapidly as the size of the tables being joined increasesthis is the opposite of the effect shown for normal subqueries, where the join solution was found to be more scalable than the subquery solution (refer to Figure 21-3).

To optimize an anti-join, create indexes to support the subquery or right hand table of a LEFT JOIN. If you cannot support the subquery with an index, use NOT IN in preference to NOT EXISTS or LEFT JOIN.


Part I: Stored Programming Fundamentals

Introduction to MySQL Stored Programs

MySQL Stored Programming Tutorial

Language Fundamentals

Blocks, Conditional Statements, and Iterative Programming

Using SQL in Stored Programming

Error Handling

Part II: Stored Program Construction

Creating and Maintaining Stored Programs

Transaction Management

MySQL Built-in Functions

Stored Functions

Triggers

Part III: Using MySQL Stored Programs in Applications

Using MySQL Stored Programs in Applications

Using MySQL Stored Programs with PHP

Using MySQL Stored Programs with Java

Using MySQL Stored Programs with Perl

Using MySQL Stored Programs with Python

Using MySQL Stored Programs with .NET

Part IV: Optimizing Stored Programs

Stored Program Security

Tuning Stored Programs and Their SQL

Basic SQL Tuning

Advanced SQL Tuning

Optimizing Stored Program Code

Best Practices in MySQL Stored Program Development



MySQL Stored Procedure Programming
MySQL Stored Procedure Programming
ISBN: 0596100892
EAN: 2147483647
Year: 2004
Pages: 208

Flylib.com © 2008-2020.
If you may any questions please contact us: flylib@qtcs.net