Updating One Table Based on Values in Another

12.19.1 Problem

You need to update existing records in one table based on the contents of records in another table, but MySQL doesn't yet allow join syntax in the WHERE clause of UPDATE statements. So you have no way to associate the two tables.

12.19.2 Solution

Create a new table that is populated from the result of a join between the original table and the table containing the new information. Then replace the original table with the new one. Or write a program that selects information from the related table and issues the queries necessary to update the original table. Or use mysql to generate and execute the queries.

12.19.3 Discussion

Sometimes when updating records in one table, it's necessary to refer to records in another table. Recall that the states table used in several earlier recipes contains rows that look like this:

mysql> SELECT * FROM states;
+----------------+--------+------------+----------+
| name | abbrev | statehood | pop |
+----------------+--------+------------+----------+
| Alaska | AK | 1959-01-03 | 550043 |
| Alabama | AL | 1819-12-14 | 4040587 |
| Arkansas | AR | 1836-06-15 | 2350725 |
| Arizona | AZ | 1912-02-14 | 3665228 |
...

Now suppose that you want to add some new columns to this table, using information from another table, city, that contains information about each state's capital city and largest (most populous) city:

mysql> SELECT * FROM city;
+----------------+----------------+----------------+
| state | capital | largest |
+----------------+----------------+----------------+
| Alabama | Montgomery | Birmingham |
| Alaska | Juneau | Anchorage |
| Arizona | Phoenix | Phoenix |
| Arkansas | Little Rock | Little Rock |
...

It would be easy enough to add new columns named capital and largest to the states table structure using an ALTER TABLE statement. But then how would you modify the rows to fill in the new columns with the appropriate values? The most convenient way to do this would be to run an UPDATE query that uses join syntax in the WHERE clause:

UPDATE states,city
SET states.capital = city.capital, states.largest = city.largest
WHERE states.name = city.state;

That doesn't work, because MySQL does not yet support this syntax. Another solution would be to use a subselect in the WHERE clause, but subselects are not scheduled for inclusion until MySQL 4.1. What are the alternatives? Clearly, you don't want to update each row by hand. That's unacceptably tediousand silly, too, given that the new information is already stored in the city table. The states and city tables contain a common key (state names), so let's use that information to relate the two tables and perform the update. There are a few techniques you can use to achieve the same result as a multiple-table update:

Create a new table that is like the original states table, but includes the additional columns to be added from the related table, city. Populate the new table using the result of a join between the states and city tables, then replace the original states table with the new one.
Write a program that uses the information from the city table to generate and execute UPDATE statements that update the states table one state at a time.
Use mysql to generate the UPDATE statements.

12.19.4 Performing a Related-Table Update Using Table Replacement

The table-replacement approach works as follows. To extend the states table with the capital and largest columns from the city table, create a tmp table that is like the states table but adds capital and largest columns:

CREATE TABLE tmp
(
 name VARCHAR(30) NOT NULL, # state name
 abbrev CHAR(2) NOT NULL, # 2-char abbreviation
 statehood DATE, # date of entry into the Union
 pop BIGINT, # population as of 4/1990
 capital VARCHAR(30), # capital city
 largest VARCHAR(30), # most populous city
 PRIMARY KEY (abbrev)
);

Then populate tmp using the result of a join between states and city that matches up rows in the two tables using state names:

INSERT INTO tmp (name, abbrev, statehood, pop, capital, largest)
SELECT
 states.name, states.abbrev, states.statehood, states.pop,
 city.capital, city.largest
FROM
 states LEFT JOIN city ON states.name = city.state;

The query uses a LEFT JOIN for a reason. Suppose the city table is incomplete and doesn't contain a row for every state. In that case, a regular join will fail to produce an output row for any states that are missing from the city table, and the resulting tmp table will be missing records for those states, even though they are present in the states table. Not good! The LEFT JOIN ensures that the SELECT produces output for every row in the states table, whether or not it's matched by a city table row. Any state that is missing in the city table would end up with NULL values in the tmp table for the capital and largest columns, but that's appropriate when you don't know the city namesand generating an incomplete row certainly is preferable to losing the row entirely.

The resulting tmp table is like the original one, but has two new columns, capital and largest. You can examine it to see this. After verifying that you're satisfied with the tmp table, use it to replace the original states table:

DROP TABLE states;
ALTER TABLE tmp RENAME TO states;

If you want to make sure there is no time, however brief, during which the states table is unavailable, perform the replacement like this instead:

RENAME TABLE states TO states_old, tmp TO states;
DROP TABLE states_old;

12.19.5 Performing a Related-Table Update by Writing a Program

The table-replacement technique is efficient because it lets the server do all the work. On the other hand, it is most appropriate when you're updating all or most of the rows in the table. If you're updating just a few rows, it may be less work to update the table "in place" for just those rows that need it. Also, table replacement requires more than twice the space of the original states table while you're carrying out the update procedure. If you have a huge table to update, you may not want to use all that space.

A second technique for updating a table based on a related table is to read the information from the related table and use it to generate UPDATE statements. For example, to update states with the information stored in the city table, read the city names and use them to create and issue a series of queries like this:

UPDATE states SET capital = 'Montgomery', largest = 'Birmingham'
WHERE name = 'Alabama';
UPDATE states SET capital = 'Juneau', largest = 'Anchorage'
WHERE name = 'Alaska';
UPDATE states SET capital = 'Phoenix', largest = 'Phoenix'
WHERE name = 'Arizona';
UPDATE states SET capital = 'Little Rock', largest = 'Little Rock'
WHERE name = 'Arkansas';
...

To carry out this procedure, first alter the states table so that it includes the new columns:[3]

[3] If you've already modified states using the table-replacement procedure, first restore the table to its original structure by dropping the capital and largest columns:

ALTER TABLE states ADD capital VARCHAR(30), ADD largest VARCHAR(30);

Next, write a program that reads the city table and uses its contents to produce UPDATE statements that modify the states table. Here is an example script, update_cities.pl, that does so:

#! /usr/bin/perl -w
# update_cities.pl - update states table capital and largest city columns,
# using contents of city table. This assumes that the states table has
# been modified to include columns named capital and largest.

use strict;
use lib qw(/usr/local/apache/lib/perl);
use Cookbook;

my $dbh = Cookbook::connect ( );

my $sth = $dbh->prepare ("SELECT state, capital, largest FROM city");
$sth->execute ( );
while (my ($state, $capital, $largest) = $sth->fetchrow_array ( ))
{
 $dbh->do ("UPDATE states SET capital = ?, largest = ? WHERE name = ?",
 undef, $capital, $largest, $state);
}

$dbh->disconnect ( );

exit (0);

The script has all the table and column names built in to it, which makes it very special purpose. You could generalize this procedure by writing a function that accepts parameters indicating the table names, the columns to use for matching records in the two tables, and the columns to use for updating the rows. The update_related.pl script in the joins directory of the recipes distribution shows one way to do this.

12.19.6 Performing a Related-Table Update Using mysql

If your data values don't require any special handling for internal quotes or other special characters, you can use mysql to generate and process the UPDATE statements. This is similar to the technique shown in Recipe 12.18 for using mysql to simulate a subselect.

Put the following statement in a file, update_cities.sql:

SELECT CONCAT('UPDATE states SET capital = '',capital,
 '', largest = '',largest,'' WHERE name = '',state,'';')
FROM city;

The query reads the rows of the city table and uses them to generate statements that update states. Execute the query and save the result in tmp:

% mysql -N cookbook < update_cities.sql > tmp

tmp will contain statements that look like the queries generated by the update_cities.pl script. Assuming that you're added the capital and largest columns to the states table, you can execute these statements as follows to update the table:

% mysql cookbook < tmp

Using the mysql Client Program

Writing MySQL-Based Programs

Record Selection Techniques

Working with Strings

Working with Dates and Times

Sorting Query Results

Generating Summaries

Modifying Tables with ALTER TABLE

Obtaining and Using Metadata

Importing and Exporting Data

Generating and Using Sequences

Using Multiple Tables

Statistical Techniques

Handling Duplicates

Performing Transactions

Introduction to MySQL on the Web

Incorporating Query Resultsinto Web Pages

Processing Web Input with MySQL

Using MySQL-Based Web Session Management

Appendix A. Obtaining MySQL Software

Appendix B. JSP and Tomcat Primer

Appendix C. References