Recipe 12.2. Finding Rows with No Match in Another Table


Problem

You want to find rows in one table that have no match in another. Or you want to produce a list on the basis of a join between tables, and you want the list to include an entry for every row in the first table, even when there are no matches in the second table.

Solution

Use an outer joina LEFT JOIN or a RIGHT JOIN.

Discussion

Section 12.1 focused on inner joins, which are joins that find matches between two tables. However, the answers to some questions require determining which rows do not have a match (or, stated another way, which rows have values that are missing from the other table). For example, you might want to know which artists in the artist table you don't yet have any paintings by. The same kind of question occurs in other contexts. Some examples:

  • You're working in sales. You have a list of potential customers, and another list of people who have placed orders. To focus your efforts on people who are not yet actual customers, you want to find people in the first list who are not in the second.

  • You have one list of baseball players, and another list of players who have hit home runs, and you want to know which players in the first list have not hit a home run. The answer is determined by finding those players in the first list who are not in the second.

For these types of questions, it's necessary to use an outer join. Like an inner join, an outer join can find matches between tables. But unlike an inner join, an outer join can also determine which rows in one table have no match in another. Two types of outer join are LEFT JOIN and RIGHT JOIN.

To see why outer joins are useful, let's consider the problem of determining which artists in the artist table are missing from the painting table. At present, the tables are small, so it's easy to examine them visually:

mysql> SELECT * FROM artist ORDER BY a_id; +------+----------+ | a_id | name     | +------+----------+ |    1 | Da Vinci | |    2 | Monet    | |    3 | Van Gogh | |    4 | Picasso  | |    5 | Renoir   | +------+----------+ mysql> SELECT * FROM painting ORDER BY a_id, p_id; +------+------+-------------------+-------+-------+ | a_id | p_id | title             | state | price | +------+------+-------------------+-------+-------+ |    1 |    1 | The Last Supper   | IN    |    34 | |    1 |    2 | The Mona Lisa     | MI    |    87 | |    3 |    3 | Starry Night      | KY    |    48 | |    3 |    4 | The Potato Eaters | KY    |    67 | |    3 |    5 | The Rocks         | IA    |    33 | |    5 |    6 | Les Deux Soeurs   | NE    |    64 | +------+------+-------------------+-------+-------+ 

By looking at the tables, you can see that you have no paintings by Monet or Picasso (there are no painting rows with an a_id value of 2 or 4). But as you acquire more paintings and the tables get larger, it won't be so easy to eyeball them and answer the question by inspection. Can you answer it using SQL? Sure, although first attempts at a solution generally look something like the following statement, which uses a not-equal condition to look for mismatches between the two tables:

mysql> SELECT * FROM artist INNER JOIN painting     -> ON artist.a_id != painting.a_id; +------+----------+------+------+-------------------+-------+-------+ | a_id | name     | a_id | p_id | title             | state | price | +------+----------+------+------+-------------------+-------+-------+ |    2 | Monet    |    1 |    1 | The Last Supper   | IN    |    34 | |    3 | Van Gogh |    1 |    1 | The Last Supper   | IN    |    34 | |    4 | Picasso  |    1 |    1 | The Last Supper   | IN    |    34 | |    5 | Renoir   |    1 |    1 | The Last Supper   | IN    |    34 | |    2 | Monet    |    1 |    2 | The Mona Lisa     | MI    |    87 | |    3 | Van Gogh |    1 |    2 | The Mona Lisa     | MI    |    87 | |    4 | Picasso  |    1 |    2 | The Mona Lisa     | MI    |    87 | |    5 | Renoir   |    1 |    2 | The Mona Lisa     | MI    |    87 | |    1 | Da Vinci |    3 |    3 | Starry Night      | KY    |    48 | |    2 | Monet    |    3 |    3 | Starry Night      | KY    |    48 | |    4 | Picasso  |    3 |    3 | Starry Night      | KY    |    48 | |    5 | Renoir   |    3 |    3 | Starry Night      | KY    |    48 | |    1 | Da Vinci |    3 |    4 | The Potato Eaters | KY    |    67 | |    2 | Monet    |    3 |    4 | The Potato Eaters | KY    |    67 | |    4 | Picasso  |    3 |    4 | The Potato Eaters | KY    |    67 | |    5 | Renoir   |    3 |    4 | The Potato Eaters | KY    |    67 | |    1 | Da Vinci |    3 |    5 | The Rocks         | IA    |    33 | |    2 | Monet    |    3 |    5 | The Rocks         | IA    |    33 | |    4 | Picasso  |    3 |    5 | The Rocks         | IA    |    33 | |    5 | Renoir   |    3 |    5 | The Rocks         | IA    |    33 | |    1 | Da Vinci |    5 |    6 | Les Deux Soeurs   | NE    |    64 | |    2 | Monet    |    5 |    6 | Les Deux Soeurs   | NE    |    64 | |    3 | Van Gogh |    5 |    6 | Les Deux Soeurs   | NE    |    64 | |    4 | Picasso  |    5 |    6 | Les Deux Soeurs   | NE    |    64 | +------+----------+------+------+-------------------+-------+-------+ 

That output obviously is not correct. (For example, it falsely indicates that each painting was painted by several different artists.) The problem is that the statement produces a list of all combinations of values from the two tables in which the artist ID values aren't the same, whereas what you really need is a list of values in artist that aren't present at all in painting. The trouble here is that an inner join can only produce results based on combinations of values that are present in both tables. It can't tell you anything about values that are missing from one of them.

When faced with the problem of finding values in one table that have no match in (or that are missing from) another table, you should get in the habit of thinking, "Aha, that's a LEFT JOIN problem." A LEFT JOIN is one type of outer join: it's similar to an inner join in that it attempts to match rows in the first (left) table with the rows in the second (right) table. But in addition, if a left table row has no match in the right table, a LEFT JOIN still produces a rowone in which all the columns from the right table are set to NULL. This means you can find values that are missing from the right table by looking for NULL. It's easier to understand how this happens by working in stages. Begin with an inner join that displays matching rows:

mysql> SELECT * FROM artist INNER JOIN painting     -> ON artist.a_id = painting.a_id; +------+----------+------+------+-------------------+-------+-------+ | a_id | name     | a_id | p_id | title             | state | price | +------+----------+------+------+-------------------+-------+-------+ |    1 | Da Vinci |    1 |    1 | The Last Supper   | IN    |    34 | |    1 | Da Vinci |    1 |    2 | The Mona Lisa     | MI    |    87 | |    3 | Van Gogh |    3 |    3 | Starry Night      | KY    |    48 | |    3 | Van Gogh |    3 |    4 | The Potato Eaters | KY    |    67 | |    3 | Van Gogh |    3 |    5 | The Rocks         | IA    |    33 | |    5 | Renoir   |    5 |    6 | Les Deux Soeurs   | NE    |    64 | +------+----------+------+------+-------------------+-------+-------+ 

In this output, the first a_id column comes from the artist table and the second one comes from the painting table.

Now compare that result with the output you get from a LEFT JOIN. A LEFT JOIN is written much like an INNER JOIN:

mysql> SELECT * FROM artist LEFT JOIN painting     -> ON artist.a_id = painting.a_id; +------+----------+------+------+-------------------+-------+-------+ | a_id | name     | a_id | p_id | title             | state | price | +------+----------+------+------+-------------------+-------+-------+ |    1 | Da Vinci |    1 |    1 | The Last Supper   | IN    |    34 | |    1 | Da Vinci |    1 |    2 | The Mona Lisa     | MI    |    87 | |    2 | Monet    | NULL | NULL | NULL              | NULL  |  NULL | |    3 | Van Gogh |    3 |    3 | Starry Night      | KY    |    48 | |    3 | Van Gogh |    3 |    4 | The Potato Eaters | KY    |    67 | |    3 | Van Gogh |    3 |    5 | The Rocks         | IA    |    33 | |    4 | Picasso  | NULL | NULL | NULL              | NULL  |  NULL | |    5 | Renoir   |    5 |    6 | Les Deux Soeurs   | NE    |    64 | +------+----------+------+------+-------------------+-------+-------+ 

The output is similar to that from the inner join, except that the LEFT JOIN also produces at least one output row for every artist row, including those that have no painting table match. For those output rows, all the columns from painting are set to NULL. These are rows that the inner join does not produce.

Next, to restrict the output only to the nonmatched artist rows, add a WHERE clause that looks for NULL values in any painting column that cannot otherwise contain NULL. This filters out the rows that the inner join produces, leaving those produced only by the outer join:

mysql> SELECT * FROM artist LEFT JOIN painting     -> ON artist.a_id = painting.a_id     -> WHERE painting.a_id IS NULL; +------+---------+------+------+-------+-------+-------+ | a_id | name    | a_id | p_id | title | state | price | +------+---------+------+------+-------+-------+-------+ |    2 | Monet   | NULL | NULL | NULL  | NULL  |  NULL | |    4 | Picasso | NULL | NULL | NULL  | NULL  |  NULL | +------+---------+------+------+-------+-------+-------+ 

Finally, to show only the artist table values that are missing from the painting table, shorten the output column list to include only columns from the artist table. The result is that the LEFT JOIN lists those left-table rows containing a_id values that are not present in the right table:

mysql> SELECT artist.* FROM artist LEFT JOIN painting     -> ON artist.a_id = painting.a_id     -> WHERE painting.a_id IS NULL; +------+---------+ | a_id | name    | +------+---------+ |    2 | Monet   | |    4 | Picasso | +------+---------+ 

A similar kind of operation can be used to report each left-table value along with an indicator as to whether it's present in the right table. To do this, perform a LEFT JOIN that counts the number of times each left-table value occurs in the right table. A count of zero indicates that the value is not present. The following statement lists each artist from the artist table and shows whether you have any paintings by the artist:

mysql> SELECT artist.name,     -> IF(COUNT(painting.a_id)>0,'yes','no') AS 'in collection'     -> FROM artist LEFT JOIN painting ON artist.a_id = painting.a_id     -> GROUP BY artist.name; +----------+---------------+ | name     | in collection | +----------+---------------+ | Da Vinci | yes           | | Monet    | no            | | Picasso  | no            | | Renoir   | yes           | | Van Gogh | yes           | +----------+---------------+ 

A RIGHT JOIN is another kind of outer join. It is like LEFT JOIN but reverses the roles of the left and right tables. Semantically, RIGHT JOIN forces the matching process to produce a row from each table in the right table, even in the absence of a corresponding row in the left table. Syntactically, tbl1 LEFT JOIN tbl2 is equivalent to tbl2 RIGHT JOIN tbl1. This means that you would rewrite the preceding LEFT JOIN as follows to convert it to a RIGHT JOIN that produces the same results:

mysql> SELECT artist.name,     -> IF(COUNT(painting.a_id)>0,'yes','no') AS 'in collection'     -> FROM painting RIGHT JOIN artist ON artist.a_id = painting.a_id     -> GROUP BY artist.name; +----------+---------------+ | name     | in collection | +----------+---------------+ | Da Vinci | yes           | | Monet    | no            | | Picasso  | no            | | Renoir   | yes           | | Van Gogh | yes           | +----------+---------------+ 

Elsewhere in this book, I'll generally refer only to LEFT JOIN for brevity, but such references apply to RIGHT JOIN as well if you reverse the roles of the tables.

Other Ways to Write LEFT JOIN and RIGHT JOIN Queries

As with INNER JOIN, if the names of the columns to be matched in an outer join are the same in both tables and you are comparing them with the = operator, you can use a USING clause rather than ON. For example, the following two statements are equivalent:

SELECT * FROM t1 LEFT JOIN t2 ON t1.n = t2.n; SELECT * FROM t1 LEFT JOIN t2 USING (n); 

As are these:

SELECT * FROM t1 RIGHT JOIN t2 ON t1.n = t2.n; SELECT * FROM t1 RIGHT JOIN t2 USING (n); 

In the special case that you want to base the comparison on every column that appears in both tables, you can use NATURAL LEFT JOIN or NATURAL RIGHT JOIN and omit the ON or USING clause:

SELECT * FROM t1 NATURAL LEFT JOIN t2; SELECT * FROM t1 NATURAL RIGHT JOIN t2; 


See Also

As shown in this section, LEFT JOIN is useful for finding values with no match in another table or for showing whether each value is matched. LEFT JOIN may also be used to produce a summary that includes all items in a list, even those for which there's nothing to summarize. This is very common for characterizing the relationship between a master table and a detail table. For example, a LEFT JOIN can produce "total sales per customer" reports that list all customers, even those who haven't bought anything during the summary period. (See Section 12.4 for information about master-detail lists.)

You can also use LEFT JOIN to perform consistency checking when you receive two datafiles that are supposed to be related, and you want to determine whether they really are. (That is, you want to check the integrity of their relationship.) Import each file into a MySQL table, and then run a couple of LEFT JOIN statements to determine whether there are unattached rows in one table or the otherthat is, rows that have no match in the other table. Section 12.13 discusses how to identify (and optionally delete) these unattached rows.




MySQL Cookbook
MySQL Cookbook
ISBN: 059652708X
EAN: 2147483647
Year: 2004
Pages: 375
Authors: Paul DuBois

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net