ProblemYou've created a table with a unique index to prevent duplicate values in the indexed column or columns. But this results in an error if you attempt to insert a duplicate row, and you want to avoid having to deal with such errors. SolutionOne approach is to just ignore the error. Another is to use an INSERT IGNORE, REPLACE, or INSERT ... ON DUPLICATE KEY UPDATE statement, each of which modifies MySQL's duplicate-handling behavior. For bulk-loading operations, LOAD DATA has modifiers that enable you to specify how to handle duplicates. DiscussionBy default, MySQL generates an error when you insert a row that duplicates an existing unique key value. Suppose that the person table has the following structure, with a unique index on the last_name and first_name columns: CREATE TABLE person ( last_name CHAR(20) NOT NULL, first_name CHAR(20) NOT NULL, address CHAR(40), PRIMARY KEY (last_name, first_name) ); An attempt to insert a row with duplicate values in the indexed columns results in an error: mysql> INSERT INTO person (last_name, first_name) -> VALUES('X1','Y1'); Query OK, 1 row affected (0.00 sec) mysql> INSERT INTO person (last_name, first_name) -> VALUES('X1','Y1'); ERROR 1062 (23000): Duplicate entry 'X1-Y1' for key 1 If you're issuing the statements from the mysql program interactively, you can simply say, "Okay, that didn't work," ignore the error, and continue. But if you write a program to insert the rows, an error may terminate the program. One way to avoid this is to modify the program's error-handling behavior to trap the error and then ignore it. See Section 2.2 for information about error-handling techniques. If you want to prevent the error from occurring in the first place, you might consider using a two-query method to solve the duplicate-row problem:
But that doesn't really work: another client might insert the same row after the SELECT and before the INSERT, in which case the error would still occur for your INSERT. To make sure that doesn't happen, you could use a transaction or lock the tables, but then you've gone from two statements to four. MySQL provides three single-query solutions to the problem of handling duplicate rows. Choose from among them according to the duplicate-handling behavior you want to affect:
INSERT IGNORE is more efficient than REPLACE because it doesn't actually insert duplicates. Thus, it's most applicable when you just want to make sure a copy of a given row is present in a table. REPLACE, on the other hand, is often more appropriate for tables in which other nonkey columns need to be replaced. INSERT ... ON DUPLICATE KEY UPDATE is appropriate when you must insert a record if it doesn't exist, but just update some of its columns if the new record is a duplicate in the indexed columns. Suppose that you're maintaining a table named passtbl for a web application that contains email addresses and password hash values, and that is keyed by email address: CREATE TABLE passtbl ( email VARCHAR(60) NOT NULL, password VARBINARY(60) NOT NULL, PRIMARY KEY (email) ); How do you create new rows for new users, but change passwords of existing rows for existing users? A typical algorithm for handling row maintenance might look like this:
These steps must be performed within a transaction or with the tables locked to prevent other users from changing the tables while you're using them. In MySQL, you can use REPLACE to simplify both cases to the same single-statement operation: REPLACE INTO passtbl (email,password) VALUES(address,hash_value); If no row with the given email address exists, MySQL creates a new one. If a row does exist, MySQL replaces it; in effect, this updates the password column of the row associated with the address. INSERT IGNORE and REPLACE are useful when you know exactly what values should be stored in the table when you attempt to insert a row. That's not always the case. For example, you might want to insert a row if it doesn't exist, but update only certain parts of it otherwise. This commonly occurs when you use a table for counting. Suppose that you're recording votes for candidates in polls, using the following table: CREATE TABLE poll_vote ( poll_id INT UNSIGNED NOT NULL AUTO_INCREMENT, candidate_id INT UNSIGNED, vote_count INT UNSIGNED, PRIMARY KEY (poll_id, candidate_id) ); The primary key is the combination of poll and candidate number. The table should be used like this:
Neither INSERT IGNORE nor REPLACE are appropriate here because for all votes except the first, you don't know what the vote count should be. INSERT ... ON DUPLICATE KEY UPDATE works better here. The following example shows how it works, beginning with an empty table: mysql> SELECT * FROM poll_vote; Empty set (0.01 sec) mysql> INSERT INTO poll_vote (poll_id,candidate_id,vote_count) VALUES(14,2,1) -> ON DUPLICATE KEY UPDATE vote_count = vote_count + 1; Query OK, 1 row affected (0.01 sec) mysql> SELECT * FROM poll_vote; +---------+--------------+------------+ | poll_id | candidate_id | vote_count | +---------+--------------+------------+ | 14 | 2 | 1 | +---------+--------------+------------+ 1 row in set (0.01 sec) mysql> INSERT INTO poll_vote (poll_id,candidate_id,vote_count) VALUES(14,2,1) -> ON DUPLICATE KEY UPDATE vote_count = vote_count + 1; Query OK, 2 rows affected (0.00 sec) mysql> SELECT * FROM poll_vote; +---------+--------------+------------+ | poll_id | candidate_id | vote_count | +---------+--------------+------------+ | 14 | 2 | 2 | +---------+--------------+------------+ 1 row in set (0.00 sec) For the first INSERT, no row for the candidate exists, so the row is inserted. For the second INSERT, the row already exists, so MySQL just updates the vote count. With INSERT ... ON DUPLICATE KEY UPDATE, there is no need to check whether the row exists because MySQL does that for you. The row count indicates what action the INSERT statement performs: 1 for a new row and 2 for an update to an existing row. The techniques just described have the benefit of eliminating overhead that might otherwise be required for a transaction. But this benefit comes at the price of portability because they all involve MySQL-specific syntax. If portability is a high priority, you might prefer to stick with a transactional approach. See AlsoFor bulk record-loading operations in which you use the LOAD DATA statement to load a set of rows from a file into a table, duplicate-row handling can be controlled using the statement's IGNORE and REPLACE modifiers. These produce behavior analogous to that of the INSERT IGNORE and REPLACE statements. See Section 10.7 for more information. The use of INSERT ... ON DUPLICATE KEY UPDATE for initializing and updating counts is further demonstrated in Recipes Section 11.14 and Section 19.12. |