Recipe 12.2. Finding Rows with No Match in Another Table

Recipe 12.2. Finding Rows with No Match in Another Table

Problem

You want to find rows in one table that have no match in another. Or you want to produce a list on the basis of a join between tables, and you want the list to include an entry for every row in the first table, even when there are no matches in the second table.

Solution

Use an outer joina LEFT JOIN or a RIGHT JOIN .

Discussion

Section 12.1 focused on inner joins, which are joins that find matches between two tables. However, the answers to some questions require determining which rows do not have a match (or, stated another way, which rows have values that are missing from the other table). For example, you might want to know which artists in the artist table you don't yet have any paintings by. The same kind of question occurs in other contexts. Some examples:

• You're working in sales. You have a list of potential customers, and another list of people who have placed orders. To focus your efforts on people who are not yet actual customers, you want to find people in the first list who are not in the second.

• You have one list of baseball players, and another list of players who have hit home runs, and you want to know which players in the first list have not hit a home run. The answer is determined by finding those players in the first list who are not in the second.

For these types of questions, it's necessary to use an outer join . Like an inner join, an outer join can find matches between tables. But unlike an inner join, an outer join can also determine which rows in one table have no match in another. Two types of outer join are LEFT JOIN and RIGHT JOIN .

To see why outer joins are useful, let's consider the problem of determining which artists in the artist table are missing from the painting table. At present, the tables are small, so it's easy to examine them visually:

```mysql>

SELECT * FROM artist ORDER BY a_id;

+------+----------+
a_id  name
+------+----------+
1  Da Vinci
2  Monet
3  Van Gogh
4  Picasso
5  Renoir
+------+----------+
mysql>

SELECT * FROM painting ORDER BY a_id, p_id;

+------+------+-------------------+-------+-------+
a_id  p_id  title              state  price
+------+------+-------------------+-------+-------+
1     1  The Last Supper    IN        34
1     2  The Mona Lisa      MI        87
3     3  Starry Night       KY        48
3     4  The Potato Eaters  KY        67
3     5  The Rocks          IA        33
5     6  Les Deux Soeurs    NE        64
+------+------+-------------------+-------+-------+
```

By looking at the tables, you can see that you have no paintings by Monet or Picasso (there are no painting rows with an a_id value of 2 or 4). But as you acquire more paintings and the tables get larger, it won't be so easy to eyeball them and answer the question by inspection. Can you answer it using SQL? Sure, although first attempts at a solution generally look something like the following statement, which uses a not-equal condition to look for mismatches between the two tables:

```mysql>

SELECT * FROM artist INNER JOIN painting

->

ON artist.a_id != painting.a_id;

+------+----------+------+------+-------------------+-------+-------+
a_id  name      a_id  p_id  title              state  price
+------+----------+------+------+-------------------+-------+-------+
2  Monet        1     1  The Last Supper    IN        34
3  Van Gogh     1     1  The Last Supper    IN        34
4  Picasso      1     1  The Last Supper    IN        34
5  Renoir       1     1  The Last Supper    IN        34
2  Monet        1     2  The Mona Lisa      MI        87
3  Van Gogh     1     2  The Mona Lisa      MI        87
4  Picasso      1     2  The Mona Lisa      MI        87
5  Renoir       1     2  The Mona Lisa      MI        87
1  Da Vinci     3     3  Starry Night       KY        48
2  Monet        3     3  Starry Night       KY        48
4  Picasso      3     3  Starry Night       KY        48
5  Renoir       3     3  Starry Night       KY        48
1  Da Vinci     3     4  The Potato Eaters  KY        67
2  Monet        3     4  The Potato Eaters  KY        67
4  Picasso      3     4  The Potato Eaters  KY        67
5  Renoir       3     4  The Potato Eaters  KY        67
1  Da Vinci     3     5  The Rocks          IA        33
2  Monet        3     5  The Rocks          IA        33
4  Picasso      3     5  The Rocks          IA        33
5  Renoir       3     5  The Rocks          IA        33
1  Da Vinci     5     6  Les Deux Soeurs    NE        64
2  Monet        5     6  Les Deux Soeurs    NE        64
3  Van Gogh     5     6  Les Deux Soeurs    NE        64
4  Picasso      5     6  Les Deux Soeurs    NE        64
+------+----------+------+------+-------------------+-------+-------+
```

That output obviously is not correct. (For example, it falsely indicates that each painting was painted by several different artists.) The problem is that the statement produces a list of all combinations of values from the two tables in which the artist ID values aren't the same, whereas what you really need is a list of values in artist that aren't present at all in painting . The trouble here is that an inner join can only produce results based on combinations of values that are present in both tables. It can't tell you anything about values that are missing from one of them.

When faced with the problem of finding values in one table that have no match in (or that are missing from) another table, you should get in the habit of thinking, "Aha, that's a LEFT JOIN problem." A LEFT JOIN is one type of outer join: it's similar to an inner join in that it attempts to match rows in the first (left) table with the rows in the second (right) table. But in addition, if a left table row has no match in the right table, a LEFT JOIN still produces a rowone in which all the columns from the right table are set to NULL . This means you can find values that are missing from the right table by looking for NULL . It's easier to understand how this happens by working in stages. Begin with an inner join that displays matching rows:

```mysql>

SELECT * FROM artist INNER JOIN painting

->

ON artist.a_id = painting.a_id;

+------+----------+------+------+-------------------+-------+-------+
a_id  name      a_id  p_id  title              state  price
+------+----------+------+------+-------------------+-------+-------+
1  Da Vinci     1     1  The Last Supper    IN        34
1  Da Vinci     1     2  The Mona Lisa      MI        87
3  Van Gogh     3     3  Starry Night       KY        48
3  Van Gogh     3     4  The Potato Eaters  KY        67
3  Van Gogh     3     5  The Rocks          IA        33
5  Renoir       5     6  Les Deux Soeurs    NE        64
+------+----------+------+------+-------------------+-------+-------+
```

In this output, the first a_id column comes from the artist table and the second one comes from the painting table.

Now compare that result with the output you get from a LEFT JOIN . A LEFT JOIN is written much like an INNER JOIN :

```mysql>

SELECT * FROM artist LEFT JOIN painting

->

ON artist.a_id = painting.a_id;

+------+----------+------+------+-------------------+-------+-------+
a_id  name      a_id  p_id  title              state  price
+------+----------+------+------+-------------------+-------+-------+
1  Da Vinci     1     1  The Last Supper    IN        34
1  Da Vinci     1     2  The Mona Lisa      MI        87
2  Monet     NULL  NULL  NULL               NULL    NULL
3  Van Gogh     3     3  Starry Night       KY        48
3  Van Gogh     3     4  The Potato Eaters  KY        67
3  Van Gogh     3     5  The Rocks          IA        33
4  Picasso   NULL  NULL  NULL               NULL    NULL
5  Renoir       5     6  Les Deux Soeurs    NE        64
+------+----------+------+------+-------------------+-------+-------+
```

The output is similar to that from the inner join, except that the LEFT JOIN also produces at least one output row for every artist row, including those that have no painting table match. For those output rows, all the columns from painting are set to NULL . These are rows that the inner join does not produce.

Next, to restrict the output only to the nonmatched artist rows, add a WHERE clause that looks for NULL values in any painting column that cannot otherwise contain NULL . This filters out the rows that the inner join produces, leaving those produced only by the outer join:

```mysql>

SELECT * FROM artist LEFT JOIN painting

->

ON artist.a_id = painting.a_id

->

WHERE painting.a_id IS NULL;

+------+---------+------+------+-------+-------+-------+
a_id  name     a_id  p_id  title  state  price
+------+---------+------+------+-------+-------+-------+
2  Monet    NULL  NULL  NULL   NULL    NULL
4  Picasso  NULL  NULL  NULL   NULL    NULL
+------+---------+------+------+-------+-------+-------+
```

Finally, to show only the artist table values that are missing from the painting table, shorten the output column list to include only columns from the artist table. The result is that the LEFT JOIN lists those left-table rows containing a_id values that are not present in the right table:

```mysql>

SELECT artist.* FROM artist LEFT JOIN painting

->

ON artist.a_id = painting.a_id

->

WHERE painting.a_id IS NULL;

+------+---------+
a_id  name
+------+---------+
2  Monet
4  Picasso
+------+---------+
```

A similar kind of operation can be used to report each left-table value along with an indicator as to whether it's present in the right table. To do this, perform a LEFT JOIN that counts the number of times each left-table value occurs in the right table. A count of zero indicates that the value is not present. The following statement lists each artist from the artist table and shows whether you have any paintings by the artist:

```mysql>

SELECT artist.name,

->

IF(COUNT(painting.a_id)>0,'yes','no') AS 'in collection'

->

FROM artist LEFT JOIN painting ON artist.a_id = painting.a_id

->

GROUP BY artist.name;

+----------+---------------+
name      in collection
+----------+---------------+
Da Vinci  yes
Monet     no
Picasso   no
Renoir    yes
Van Gogh  yes
+----------+---------------+
```

A RIGHT JOIN is another kind of outer join. It is like LEFT JOIN but reverses the roles of the left and right tables. Semantically, RIGHT JOIN forces the matching process to produce a row from each table in the right table, even in the absence of a corresponding row in the left table. Syntactically, tbl1 LEFT JOIN tbl2 is equivalent to tbl2 RIGHT JOIN tbl1 . This means that you would rewrite the preceding LEFT JOIN as follows to convert it to a RIGHT JOIN that produces the same results:

```mysql>

SELECT artist.name,

->

IF(COUNT(painting.a_id)>0,'yes','no') AS 'in collection'

->

FROM painting RIGHT JOIN artist ON artist.a_id = painting.a_id

->

GROUP BY artist.name;

+----------+---------------+
name      in collection
+----------+---------------+
Da Vinci  yes
Monet     no
Picasso   no
Renoir    yes
Van Gogh  yes
+----------+---------------+
```

Elsewhere in this book, I'll generally refer only to LEFT JOIN for brevity, but such references apply to RIGHT JOIN as well if you reverse the roles of the tables.

Other Ways to Write LEFT JOIN and RIGHT JOIN Queries

As with INNER JOIN , if the names of the columns to be matched in an outer join are the same in both tables and you are comparing them with the = operator, you can use a USING clause rather than ON . For example, the following two statements are equivalent:

```SELECT * FROM t1 LEFT JOIN t2 ON t1.n = t2.n;
SELECT * FROM t1 LEFT JOIN t2 USING (n);
```

As are these:

```SELECT * FROM t1 RIGHT JOIN t2 ON t1.n = t2.n;
SELECT * FROM t1 RIGHT JOIN t2 USING (n);
```

In the special case that you want to base the comparison on every column that appears in both tables, you can use NATURAL LEFT JOIN or NATURAL RIGHT JOIN and omit the ON or USING clause:

```SELECT * FROM t1 NATURAL LEFT JOIN t2;
SELECT * FROM t1 NATURAL RIGHT JOIN t2;
```