Recipe 12.16. Referring to Join Output Column Names in Programs

Problem

You need to process the result of a join from within a program, but the column names in the result set aren't unique.

Solution

Revise the query using column aliases so that each column has a unique name, or refer to the columns by position.

Discussion

Joins typically retrieve columns from related tables, so it's not unusual for columns selected from different tables to have the same names. Consider the following join that shows the items in your art collection (originally seen in Section 12.1). For each painting, it displays artist name, painting title, the state in which you acquired the item, and how much it cost:

mysql> SELECT artist.name, painting.title, states.name, painting.price     -> FROM artist INNER JOIN painting INNER JOIN states     -> ON artist.a_id = painting.a_id AND painting.state = states.abbrev; +----------+-------------------+----------+-------+ | name     | title             | name     | price | +----------+-------------------+----------+-------+ | Da Vinci | The Last Supper   | Indiana  |    34 | | Da Vinci | The Mona Lisa     | Michigan |    87 | | Van Gogh | Starry Night      | Kentucky |    48 | | Van Gogh | The Potato Eaters | Kentucky |    67 | | Van Gogh | The Rocks         | Iowa     |    33 | | Renoir   | Les Deux Soeurs   | Nebraska |    64 | +----------+-------------------+----------+-------+

The statement is written using table qualifiers for each output column. Nevertheless, the column names in the output are not distinct because MySQL doesn't include table names in the column headings. If you're processing the result of the join from within one of your own programs and fetching rows into a data structure that references column values by name, nonunique column names can cause some values to become inaccessible. The following Perl script fragment illustrates the difficulty:

$stmt = qq{   SELECT artist.name, painting.title, states.name, painting.price   FROM artist INNER JOIN painting INNER JOIN states   ON artist.a_id = painting.a_id AND painting.state = states.abbrev }; $sth = $dbh->prepare ($stmt); $sth->execute (); # Determine the number of columns in result set rows two ways: # - Check the NUM_OF_FIELDS statement handle attribute # - Fetch a row into a hash and see how many keys the hash contains $count1 = $sth->{NUM_OF_FIELDS}; $ref = $sth->fetchrow_hashref (); $count2 = keys (%{$ref}); print "The statement is: $stmt\n"; print "According to NUM_OF_FIELDS, the result set has $count1 columns\n"; print "The column names are: " . join sort (",", @{$sth->{NAME}})) . "\n"; print "According to the row hash size, the result set has $count2 columns\n"; print "The column names are: " . join sort (",", @{$sth->{NAME}})) . "\n";

The script issues the statement and then determines the number of columns in the result, first by checking the NUM_OF_FIELDS attribute and then by fetching a row into a hash and counting the number of hash keys. Executing this script results in the following output:

According to NUM_OF_FIELDS, the result set has 4 columns The column names are: name,name,title,price According to the row hash size, the result set has 3 columns The column names are: name,price,title

There is a problem here: the column counts don't match. The second count is 3 (not 4) because the nonunique column names cause multiple column values to be mapped onto the same hash element. As a result of these hash-key collisions, some of the values are lost. To solve this problem, make the column names unique by supplying aliases. For example, the statement can be rewritten like this:

SELECT   artist.name AS painter, painting.title,   states.name AS state, painting.price FROM artist INNER JOIN painting INNER JOIN states   ON artist.a_id = painting.a_id AND painting.state = states.abbrev

If you make that change and rerun the script, its output becomes:

According to NUM_OF_FIELDS, the result set has 4 columns The column names are: painter,price,state,title According to the row hash size, the result set has 4 columns The column names are: painter,price,state,title

Now the two column counts are the same, which indicates that no values are lost when fetching into a hash.

Another way to address the problem that requires no column renaming is to fetch the row into something other than a hash. For example, you can fetch the row into an array and refer to the columns by ordinal position within the array:

while (my @val = $sth->fetchrow_array ()) {   print "painter: $val[0], title: $val[1], "       . "state: $val[2], price: $val[3]\n"; }

The name-clash problem just described might have different solutions in other languages. For example, the problem doesn't occur in quite the same way in Python scripts that use the MySQLdb module. Suppose that you retrieve a row using a dictionary, Python's analog to a Perl hash (Section 2.4). In this case, MySQLdb notices clashing column names and places them in the dictionary using a key consisting of the table name and column name. Thus, for the following statement, the dictionary keys would be name, title, states.name, and price:

SELECT artist.name, painting.title, states.name, painting.price FROM artist INNER JOIN painting INNER JOIN states   ON artist.a_id = painting.a_id AND painting.state = states.abbrev

That means column values won't get lost. Unfortunately, it's still necessary to be aware of nonunique names. If you try to refer to column values using just the column names, you won't get the results you expect for those names that are reported with a leading table name. This problem does not occur if you use aliases to make each column name unique because the dictionary entries will have the names that you assign.