Recipe 12.3. Comparing a Table to Itself


Problem

You want to compare rows in a table to other rows in the same table. For example, you want to find all paintings in your collection by the artist who painted The Potato Eaters. Or you want to know which states listed in the states table joined the Union in the same year as New York. Or you want to know which states did not join the Union in the same year as any other state.

Solution

Problems that require comparing a table to itself involve an operation known as a self-join. It's performed much like other joins, except that you must always use table aliases so that you can refer to the same table different ways within the statement.

Discussion

A special case of joining one table to another occurs when both tables are the same. This is called a self-join. Although many people find the idea confusing or strange to think about at first, it's perfectly legal. It's likely that you'll find yourself using self-joins quite often because they are so important.

A tip-off that you need a self-join is when you want to know which pairs of elements in a table satisfy some condition. For example, suppose that your favorite painting is The Potato Eaters, and you want to identify all the items in your collection that were done by the artist who painted it. Do so as follows:

  1. Identify the row in the painting table that contains the title The Potato Eaters, so that you can refer to its a_id value.

  2. Use the a_id value to match other rows in the table that have the same a_id value.

  3. Display the titles from those matching rows.

The artist ID and painting titles that we begin with look like this:

mysql> SELECT a_id, title FROM painting ORDER BY a_id; +------+-------------------+ | a_id | title             | +------+-------------------+ |    1 | The Last Supper   | |    1 | The Mona Lisa     | |    3 | Starry Night      | |    3 | The Potato Eaters | |    3 | The Rocks         | |    5 | Les Deux Soeurs   | +------+-------------------+ 

A two-step method for picking out the right titles without a join is to look up the artist's ID with one statement and then use the ID in a second statement to select rows that match it:

mysql> SELECT @id := a_id FROM painting WHERE title = 'The Potato Eaters'; +-------------+ | @id := a_id | +-------------+ |           3 | +-------------+ mysql> SELECT title FROM painting WHERE a_id = @id; +-------------------+ | title             | +-------------------+ | Starry Night      | | The Potato Eaters | | The Rocks         | +-------------------+ 

A different solution that requires only a single statement is to use a self-join. The trick to this lies in figuring out the proper notation to use. First attempts at writing a statement that joins a table to itself often look something like this:

mysql> SELECT title     -> FROM painting INNER JOIN painting     -> ON a_id = a_id;     -> WHERE title = 'The Potato Eaters'; ERROR 1066 (42000): Not unique table/alias: 'painting' 

The problem with that statement is that the column references are ambiguous. MySQL can't tell which instance of the painting table any given column name refers to. The solution is to give at least one instance of the table an alias so that you can distinguish column references by using different table qualifiers. The following statement shows how to do this, using the aliases p1 and p2 to refer to the painting table different ways:

mysql> SELECT p2.title     -> FROM painting AS p1 INNER JOIN painting AS p2     -> ON p1.a_id = p2.a_id     -> WHERE p1.title = 'The Potato Eaters'; +-------------------+ | title             | +-------------------+ | Starry Night      | | The Potato Eaters | | The Rocks         | +-------------------+ 

The statement output illustrates something typical of self-joins: when you begin with a reference value in one table instance (The Potato Eaters) to find matching rows in a second table instance (paintings by the same artist), the output includes the reference value. That makes sense: after all, the reference matches itself. If you want to find only other paintings by the same artist, explicitly exclude the reference value from the output:

mysql> SELECT p2.title     -> FROM painting AS p1 INNER JOIN painting AS p2     -> ON p1.a_id = p2.a_id     -> WHERE p1.title = 'The Potato Eaters' AND p2.title != 'The Potato Eaters'; +--------------+ | title        | +--------------+ | Starry Night | | The Rocks    | +--------------+ 

A more general way to exclude the reference value without naming it literally is to specify that you don't want output rows to have the same title as the reference, whatever that title happens to be:

mysql> SELECT p2.title     -> FROM painting AS p1 INNER JOIN painting AS p2     -> ON p1.a_id = p2.a_id     -> WHERE p1.title = 'The Potato Eaters' AND p2.title != p1.title +--------------+ | title        | +--------------+ | Starry Night | | The Rocks    | +--------------+ 

The preceding statements use comparisons of ID values to match rows in the two table instances, but any kind of value can be used. For example, to use the states table to answer the question "Which states joined the Union in the same year as New York?," perform a temporal pairwise comparison based on the year part of the dates in the table's statehood column:

mysql> SELECT s2.name, s2.statehood     -> FROM states AS s1 INNER JOIN states AS s2     -> ON YEAR(s1.statehood) = YEAR(s2.statehood)     -> WHERE s1.name = 'New York'     -> ORDER BY s2.name; +----------------+------------+ | name           | statehood  | +----------------+------------+ | Connecticut    | 1788-01-09 | | Georgia        | 1788-01-02 | | Maryland       | 1788-04-28 | | Massachusetts  | 1788-02-06 | | New Hampshire  | 1788-06-21 | | New York       | 1788-07-26 | | South Carolina | 1788-05-23 | | Virginia       | 1788-06-25 | +----------------+------------+ 

Here again, the reference value (New York) appears in the output. If you want to prevent that, add to the ON expression a term that explicitly excludes the reference:

mysql> SELECT s2.name, s2.statehood     -> FROM states AS s1 INNER JOIN states AS s2     -> ON YEAR(s1.statehood) = YEAR(s2.statehood) AND s1.name != s2.name     -> WHERE s1.name = 'New York'     -> ORDER BY s2.name; +----------------+------------+ | name           | statehood  | +----------------+------------+ | Connecticut    | 1788-01-09 | | Georgia        | 1788-01-02 | | Maryland       | 1788-04-28 | | Massachusetts  | 1788-02-06 | | New Hampshire  | 1788-06-21 | | South Carolina | 1788-05-23 | | Virginia       | 1788-06-25 | +----------------+------------+ 

Like the problem of finding other paintings by the painter of The Potato Eaters, the statehood problem can be solved by using a user-defined variable and two statements. That will always be true when you're seeking matches for just one particular row in your table. Other problems require finding matches for several rows, in which case the two-statement method will not work. Suppose that you want to find each pair of states that joined the Union in the same year. In this case, the output potentially can include any pair of rows from the states table. There is no fixed reference value, so you cannot store the reference in a variable. A self-join is perfect for this problem:

mysql> SELECT YEAR(s1.statehood) AS year,     -> s1.name AS name1, s1.statehood AS statehood1,     -> s2.name AS name2, s2.statehood AS statehood2     -> FROM states AS s1 INNER JOIN states AS s2     -> ON YEAR(s1.statehood) = YEAR(s2.statehood) AND s1.name != s2.name     -> ORDER BY year, s1.name, s2.name; +------+----------------+------------+----------------+------------+ | year | name1          | statehood1 | name2          | statehood2 | +------+----------------+------------+----------------+------------+ | 1787 | Delaware       | 1787-12-07 | New Jersey     | 1787-12-18 | | 1787 | Delaware       | 1787-12-07 | Pennsylvania   | 1787-12-12 | | 1787 | New Jersey     | 1787-12-18 | Delaware       | 1787-12-07 | | 1787 | New Jersey     | 1787-12-18 | Pennsylvania   | 1787-12-12 | | 1787 | Pennsylvania   | 1787-12-12 | Delaware       | 1787-12-07 | | 1787 | Pennsylvania   | 1787-12-12 | New Jersey     | 1787-12-18 | ... | 1912 | Arizona        | 1912-02-14 | New Mexico     | 1912-01-06 | | 1912 | New Mexico     | 1912-01-06 | Arizona        | 1912-02-14 | | 1959 | Alaska         | 1959-01-03 | Hawaii         | 1959-08-21 | | 1959 | Hawaii         | 1959-08-21 | Alaska         | 1959-01-03 | +------+----------------+------------+----------------+------------+ 

The condition in the ON clause that requires state pair names not to be identical eliminates the trivially duplicate rows showing that each state joined the Union in the same year as itself. But you'll notice that each remaining pair of states still appears twice. For example, there is one row that lists Delaware and New Jersey, and another that lists New Jersey and Delaware. This is often the case with self-joins: they produce pairs of rows that contain the same values, but for which the values are not in the same order. For techniques that eliminate these "near duplicates" from the query result, see Section 14.5.

Some self-join problems are of the "Which values are not matched by other rows in the table?" variety. An instance of this is the question "Which states did not join the Union in the same year as any other state?" Finding these states is a "nonmatch" problem, which is the type of problem that typically involves a LEFT JOIN. In this case, the solution uses a LEFT JOIN of the states table to itself:

mysql> SELECT s1.name, s1.statehood     -> FROM states AS s1 LEFT JOIN states AS s2     -> ON YEAR(s1.statehood) = YEAR(s2.statehood) AND s1.name != s2.name     -> WHERE s2.name IS NULL     -> ORDER BY s1.name; +----------------+------------+ | name           | statehood  | +----------------+------------+ | Alabama        | 1819-12-14 | | Arkansas       | 1836-06-15 | | California     | 1850-09-09 | | Colorado       | 1876-08-01 | | Illinois       | 1818-12-03 | | Indiana        | 1816-12-11 | | Iowa           | 1846-12-28 | | Kansas         | 1861-01-29 | | Kentucky       | 1792-06-01 | | Louisiana      | 1812-04-30 | | Maine          | 1820-03-15 | | Michigan       | 1837-01-26 | | Minnesota      | 1858-05-11 | | Mississippi    | 1817-12-10 | | Missouri       | 1821-08-10 | | Nebraska       | 1867-03-01 | | Nevada         | 1864-10-31 | | North Carolina | 1789-11-21 | | Ohio           | 1803-03-01 | | Oklahoma       | 1907-11-16 | | Oregon         | 1859-02-14 | | Rhode Island   | 1790-05-29 | | Tennessee      | 1796-06-01 | | Utah           | 1896-01-04 | | Vermont        | 1791-03-04 | | West Virginia  | 1863-06-20 | | Wisconsin      | 1848-05-29 | +----------------+------------+ 

For each row in the states table, the statement selects rows in which the state has a statehood value in the same year, not including that state itself. For rows having no such match, the LEFT JOIN forces the output to contain a row anyway, with all the s2 columns set to NULL. Those rows identify the states with no other state that joined the Union in the same year.




MySQL Cookbook
MySQL Cookbook
ISBN: 059652708X
EAN: 2147483647
Year: 2004
Pages: 375
Authors: Paul DuBois

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net