Sometimes you want to query one table, use that result to query another table, and then use that result to query yet another table. It's tempting to do this as three separate queries, but the right solution is to chain them yourself into one SQL statement.
Consider a database where employees have job titles, and job titles have ranks, and a rank has a salary, as shown in Table 2-3, Table 2-4, and Table 2-5.
Employee | Title |
---|---|
Gordon Russell | Lecturer |
Andrew Cumming | Teaching fellow |
Jim Smith | Technician |
Title | Rank |
---|---|
Lecturer | LECT1 |
Teaching fellow | LECT2 |
Technician | TECH1 |
Rank | Payment |
---|---|
LECT1 | 2000.00 |
LECT2 | 3000.00 |
TECH1 | 5000.00 |
TECH2 | 6000.00 |
Determining how much to pay Andrew Cumming would require three steps. First, you'd need to determine Andrew's title:
mysql> SELECT title FROM jobs WHERE employee = 'Andrew Cumming'; +-----------------+ | title | +-----------------+ | Teaching Fellow | +-----------------+
Next, you'd need to determine the pay rank for a teaching fellow:
mysql> SELECT rank FROM ranks WHERE title = 'Teaching Fellow'; +-------+ | rank | +-------+ | LECT2 | +-------+
Finally, you'd need to look up the salary for someone at the LECT2 pay grade:
mysql> SELECT payment FROM salary WHERE rank = 'LECT2'; +---------+ | payment | +---------+ | 3000.00 | +---------+
That's not efficient, because you'd need to pass three different queries to the database and process the results in between. If a table is updated during this process the answer might be wrong, or the query might even return an error. Combining queries can make people nervous. Nervous programmers often use subqueries:
mysql> SELECT payment FROM salary WHERE rank = -> (SELECT rank FROM ranks WHERE title = -> (SELECT title FROM jobs WHERE employee = 'Andrew Cumming')); +---------+ | payment | +---------+ | 3000.00 | +---------+
The preceding code is good in that you have reduced the problem to a single query and thus removed many of the overhead problems, but subquery statements can be slow. When your subquery statements contain no aggregate functions (such as MAX( )), chances are you don't need a subqueryyou need a JOIN. If you have a working subquery arrangement, follow these steps to make it a JOIN:
Here's an intermediate stage:
SELECT payment FROM salary,ranks,jobs WHERE salary.rank = (Select ranks.rank from grades AND ranks.title = (Select jobs.title from jobs AND jobs.employee = 'Andrew Cumming'))
Ultimately, you get this:
SELECT payment FROM salary,ranks,jobs WHERE salary.rank = ranks.rank AND ranks.title = jobs.title AND jobs.employee = 'Andrew Cumming'
Another approach is to take the conditions inside the subquery statements and make them JOIN ON conditions:
SELECT payment FROM salary JOIN ranks ON (salary.rank = ranks.rank) JOIN jobs ON (ranks.title = jobs.title) WHERE jobs.employee = 'Andrew Cumming'
2.4.1. Looking for What's Not There
Often programmers can handle this join-instead-of-subquery approach for inclusive matches, but it starts to get a little shakier with exclusive matches (looking for things that do not exist). For instance, how do you determine whether there are any ranks not currently allocated to a title? The brute force approach is to query the database for all ranks from the salary table, and then query each one in the ranks table. Needless to say, such an approach would result in bad performance. The next general approach is to use a subquery with NOT IN, but this too may not perform well:
mysql> SELECT salary.rank FROM salary -> WHERE rank NOT IN (SELECT rank FROM ranks); +-------+ | rank | +-------+ | TECH2 | +-------+
These may be performance losers because the subquery will likely be executed first, creating an intermediate temporary table in the database. This temporary table is then used to solve the outer query. However, in creating the temporary table, all indexes that may have existed on salary won't be used, and the database will have to perform a full scan on the temporary table.
The query is the opposite of the nested subquery problem considered earlier, because now you want to find nonmatching rows between tables. Strangely, trying the earlier technique but using != rather than = results in a huge mess of results which don't mean anything. Instead, you need to rely on OUTER JOIN. Put all tables required into a single FROM clause using an OUTER JOIN to link the tables. You are looking for things in salary which are not in ranks, and with OUTER JOIN the rows that don't match will have NULL values for ranks.rank:
mysql> SELECT salary.rank -> FROM salary LEFT OUTER JOIN ranks ON (salary.rank = ranks.rank) -> WHERE ranks.rank IS NULL; +-------+ | rank | +-------+ | TECH2 | +-------+
You also can use this technique to eliminate EXISTS and NOT EXISTS. Without the subquery, the optimizer finds it much easier to use your indexes.
SQL Fundamentals
Joins, Unions, and Views
Text Handling
Date Handling
Number Crunching
Online Applications
Organizing Data
Storing Small Amounts of Data
Locking and Performance
Reporting
Users and Administration
Wider Access
Index