Loading Data Efficiently


Working with Sequences

Many applications need to generate unique numbers for identification purposes. The requirement for unique values occurs in a number of contexts: membership numbers, sample or lot numbering, customer IDs, bug report or trouble ticket tags, and so forth.

MySQL's mechanism for providing unique numbers is through the AUTO_INCREMENT column attribute, which enables you to generate sequential numbers automatically. However, AUTO_INCREMENT columns are handled somewhat differently by the various storage engines that MySQL supports, so it's important to understand not only the general concepts underlying the AUTO_INCREMENT mechanism, but also the differences between storage engines. This section describes how AUTO_INCREMENT columns work in general and for specific storage engines so that you can use them effectively without running into the traps that sometimes surprise people. It also describes how you can generate a sequence without using an AUTO_INCREMENT column.

General AUTO_INCREMENT Concepts

AUTO_INCREMENT columns must be defined according to the following conditions:

  • There can be only one column per table with the AUTO_INCREMENT attribute and it should have an integer data type.

  • The column must be indexed. It is most common to use a PRIMARY KEY or UNIQUE index, but it is allowable to use a non-unique index.

  • The column must have a NOT NULL constraint. MySQL makes the column NOT NULL even if you don't explicitly declare it that way.

Once created, an AUTO_INCREMENT column behaves like this:

  • Inserting NULL into an AUTO_INCREMENT column causes MySQL to generate the next sequence number automatically and insert that value into the column. AUTO_INCREMENT sequences normally begin at 1 and increase monotonically, so successive records inserted into a table get sequence values of 1, 2, 3, and so forth. Under some circumstances and depending on the storage engine, it may be possible to set or reset the next sequence number explicitly or to reuse values deleted from the top end of the sequence.

  • The value of the most recently generated sequence number can be obtained by calling the LAST_INSERT_ID() function, as long as you call it during the same connection that was used to generate the number. This enables you to reference the AUTO_INCREMENT value in subsequent statements even without knowing what the value is. LAST_INSERT_ID() returns 0 if no AUTO_INCREMENT value has been generated during the current connection.

    LAST_INSERT_ID() is tied only to AUTO_INCREMENT values generated during the current connection to the server. In particular, it is not affected by AUTO_INCREMENT activity associated with other clients. You can generate a sequence number, and then call LAST_INSERT_ID() to retrieve it later, even if other clients have generated their own sequence values in the meantime.

    For a multiple-row INSERT that generates several AUTO_INCREMENT values, LAST_INSERT_ID() returns the first one.

  • Inserting a row without specifying an explicit value for the AUTO_INCREMENT column is the same as inserting NULL into the column. If ai_col is an AUTO_INCREMENT column, these statements are equivalent:

     INSERT INTO t (ai_col,name) VALUES(NULL,'abc'); INSERT INTO t (name) VALUES('abc'); 

  • By default, inserting 0 into an AUTO_INCREMENT column has the same effect as inserting NULL. If you enable the NO_AUTO_VALUE_ON_ZERO SQL mode, inserting a 0 results in a 0 being stored, not the next sequence value.

  • If you insert a record and specify a non-NULL, non-zero value for an AUTO_INCREMENT column that has a unique index, one of two things will happen. If a record already exists with that value, a duplicate-key error occurs. If a record does not exist with that value, the record is inserted with the AUTO_INCREMENT column set to the given value. If the value is larger than the current next sequence number, the sequence is reset to continue with the next value after that for subsequent rows. In other words, you can "bump up" the counter by inserting a record with a sequence value greater than the current counter value.

    Bumping up the counter can result in gaps in the sequence, but you also can exploit this behavior to generate a sequence that begins at a value higher than 1. Suppose that you create a table with an AUTO_INCREMENT column, but you want the sequence to begin at 1000 rather than at 1. To achieve this, insert a "fake" record with a value of 999 in the AUTO_INCREMENT column. Records inserted subsequently are assigned sequence numbers beginning with 1000, after which you can delete the fake record.

    Why might you want to begin a sequence with a value higher than 1? One reason is to make sequence numbers all have the same number of digits. If you're generating customer ID numbers, and you expect never to have more than a million customers, you could begin the series at 1,000,000. You'll be able to add well over a million customer records before the digit count for customer ID values changes. Other reasons for not beginning a sequence at 1 might have nothing to do with technical considerations. For example, if you were assigning membership numbers, you might want to begin a sequence at a number higher than 1 to forestall political squabbling over who gets to be member number 1, by making sure there isn't any such number. Hey, it happens. Sad, but true.

  • For some storage engines, values deleted from the top of a sequence are reused. In this case, if you delete the record containing the largest value in an AUTO_INCREMENT column, that value is reused the next time you generate a new value. An implication of this property is that if you delete all the records in the table, all values are reused and the sequence starts over beginning at 1.

  • If you use UPDATE to set an AUTO_INCREMENT column to a value that already exists in another row, a duplicate-key error occurs if the column has a unique index. If you update the column to a value larger than any existing column value, the sequence continues with the next number after that for subsequent records. If you update the column to 0, it is set to 0 (this is true regardless of whether NO_AUTO_VALUE_ON_ZERO is enabled).

  • If you use REPLACE to update a record based on the value of the AUTO_INCREMENT column, the AUTO_INCREMENT value does not change. If you use REPLACE to update a record based on the value of some other PRIMARY KEY or UNIQUE index, the AUTO_INCREMENT column is updated with a new sequence number if you set it to NULL, or if you set it to 0 and NO_AUTO_VALUE_ON_ZERO is not enabled.

AUTO_INCREMENT Handling Per Storage Engine

The general AUTO_INCREMENT characteristics just described form the basis for understanding sequence behavior specific to other storage engines. Most engines implement behavior that for the most part is similar to that just described, so keep the preceding discussion in mind as you read on.

AUTO_INCREMENT for MyISAM Tables

MyISAM tables offer the most flexibility for sequence handling. The MyISAM storage engine has the following AUTO_INCREMENT characteristics:

  • MyISAM sequences normally are monotonic. The values in an automatically generated series are strictly increasing and are not reused if you delete records. If the maximum value is 143 and you delete the record containing that value, MySQL still generates the next value as 144. There are two exceptions to this behavior. First, if you empty a table with trUNCATE TABLE, the counter may be reset to begin at 1. Second, values deleted from the top of a sequence are reused if you use a composite index to generate multiple sequences within a table. (This technique is discussed shortly.)

  • MyISAM sequences begin at 1 by default, but it is possible to start the sequence at a higher value. With MyISAM tables, you can specify the initial value explicitly by using an AUTO_INCREMENT = n option in the CREATE TABLE statement. The following example creates a MyISAM table with an AUTO_INCREMENT column named seq that begins at 1,000,000:

     CREATE TABLE mytbl (     seq INT UNSIGNED NOT NULL AUTO_INCREMENT,     PRIMARY KEY (seq) ) ENGINE = MYISAM AUTO_INCREMENT = 1000000; 

    A table can have only one AUTO_INCREMENT column, so there is never any ambiguity about the column to which the terminating AUTO_INCREMENT = n option applies, even if the table has multiple columns.

  • You can change the current sequence counter for an existing MyISAM table with ALTER TABLE. If the sequence currently stands at 1000, the following statement causes the next number generated to be 2000:

     ALTER TABLE mytbl AUTO_INCREMENT = 2000; 

    If you want to reuse values that have been deleted from the top of the sequence, you can do that, too. The following statement will set the counter down as far as possible, causing the next number to be one larger than the current maximum sequence value:

     ALTER TABLE mytbl AUTO_INCREMENT = 1; 

    You cannot use the AUTO_INCREMENT option to set the current counter lower than the current maximum value in the table. If an AUTO_INCREMENT column contains the values 1 and 10, using AUTO_INCREMENT = 5 sets the counter so that the next automatic value is 11.

  • If you use INSERT DELAYED, the AUTO_INCREMENT value is not generated until the record actually is inserted. In this case, LAST_INSERT_ID() cannot be relied on to return the sequence value.

The MyISAM storage engine supports the use of composite (multiple-column) indexes for creating multiple independent sequences within the same table. To use this feature, create a multiple-column PRIMARY KEY or UNIQUE index that includes an AUTO_INCREMENT column as its final column. For each distinct key in the leftmost column or columns of the index, the AUTO_INCREMENT column will generate a separate sequence of values. For example, you might use a table named bugs for tracking bug reports of several software projects, where the table is defined as follows:

 CREATE TABLE bugs (     proj_name   VARCHAR(20) NOT NULL,     bug_id      INT UNSIGNED NOT NULL AUTO_INCREMENT,     description VARCHAR(100),     PRIMARY KEY (proj_name, bug_id) ) ENGINE = MYISAM; 

Here, the proj_name column identifies the project name and the description column contains the bug description. The bug_id column is an AUTO_INCREMENT column; by creating an index that ties it to the proj_name column, you can generate an independent series of sequence numbers for each project. Suppose that you enter the following records into the table to register three bugs for SuperBrowser and two for SpamSquisher:

 mysql> INSERT INTO bugs (proj_name,description)     -> VALUES('SuperBrowser','crashes when displaying complex tables'); mysql> INSERT INTO bugs (proj_name,description)     -> VALUES('SuperBrowser','image scaling does not work'); mysql> INSERT INTO bugs (proj_name,description)     -> VALUES('SpamSquisher','fails to block known blacklisted domains'); mysql> INSERT INTO bugs (proj_name,description)     -> VALUES('SpamSquisher','fails to respect whitelist addresses'); mysql> INSERT INTO bugs (proj_name,description)     -> VALUES('SuperBrowser','background patterns not displayed'); 

The resulting table contents are as follows:

 mysql> SELECT * FROM bugs ORDER BY proj_name, bug_id; +--------------+--------+------------------------------------------+ | proj_name    | bug_id | description                              | +--------------+--------+------------------------------------------+ | SpamSquisher |      1 | fails to block known blacklisted domains | | SpamSquisher |      2 | fails to respect whitelist addresses     | | SuperBrowser |      1 | crashes when displaying complex tables   | | SuperBrowser |      2 | image scaling does not work              | | SuperBrowser |      3 | background patterns not displayed        | +--------------+--------+------------------------------------------+ 

The table numbers the bug_id values for each project separately, regardless of the order in which records are entered for projects. You need not enter all records for one project before you enter records for another.

If you use a composite index to create multiple sequences, values deleted from the top of each individual sequence are reused. This contrasts with the usual MyISAM behavior of not reusing values.

AUTO_INCREMENT for MEMORY Tables

The MEMORY storage engine has the following AUTO_INCREMENT characteristics:

  • The initial sequence value can be set with an AUTO_INCREMENT = n table option in the CREATE TABLE statement, and can be modified after table creation time using that option with ALTER TABLE.

  • Values that are deleted from the top of the sequence are not reused. Exception: If you empty the table with trUNCATE TABLE, the sequence may be reset to begin at 1.

  • Composite indexes cannot be used to generate multiple independent sequences within a table.

  • The MEMORY storage engine does not support AUTO_INCREMENT prior to MySQL 4.1.

  • If you use INSERT DELAYED, the AUTO_INCREMENT value is not generated until the record actually is inserted. In this case, LAST_INSERT_ID() cannot be relied on to return the sequence value.

AUTO_INCREMENT for InnoDB Tables

The InnoDB storage engine has the following AUTO_INCREMENT characteristics:

  • Before MySQL 5.0.3, the initial sequence value cannot be set with an AUTO_INCREMENT = n table option in the CREATE TABLE statement, nor can it be modified using that option with ALTER TABLE.

  • Values that are deleted from the top of the sequence normally are not reused. Exception: As of MySQL 5.0.3, if you empty the table with trUNCATE TABLE, the sequence may be reset to begin at 1. Reuse can occur under the following conditions as well. For an InnoDB table, the first time that you generate a sequence value for an AUTO_INCREMENT column, InnoDB uses one greater than the current maximum value in the column (or 1 if the table is empty). InnoDB maintains this counter in memory for use in generating subsequent values; it is not stored in the table itself. This means that if you delete values from the top of the sequence and then restart the server, the deleted values are reused. Restarting the server also cancels the effect of using an AUTO_INCREMENT table option in a CREATE TABLE or ALTER TABLE statement.

  • Gaps in a sequence can occur if transactions that generate AUTO_INCREMENT values are rolled back.

  • Composite indexes cannot be used to generate multiple independent sequences within a table.

AUTO_INCREMENT for BDB Tables

The BDB storage engine has the following AUTO_INCREMENT characteristics:

  • The initial sequence value cannot be set with an AUTO_INCREMENT = n table option in the CREATE TABLE statement, nor can it be modified using that option with ALTER TABLE.

  • Values that are deleted from the top of the sequence are reused.

  • Composite indexes can be used to generate multiple independent sequences within a table. The AUTO_INCREMENT column must be the final column named in the index. Values deleted from the top of each individual sequence are reused, just as for single-column sequences.

Issues to Consider with AUTO_INCREMENT Columns

You should keep the following points in mind to avoid being surprised when you use AUTO_INCREMENT columns:

  • Although it is common to use the term "AUTO_INCREMENT column," AUTO_INCREMENT is not a data type; it's a data type attribute. Furthermore, AUTO_INCREMENT is an attribute intended for use only with integer types. Older versions of MySQL are lax in enforcing this constraint and will let you define a data type such as CHAR with the AUTO_INCREMENT attribute. However, only the integer types work correctly as AUTO_INCREMENT columns.

  • The primary purpose of the AUTO_INCREMENT mechanism is to allow you to generate a sequence of positive integers. The use of non-positive numbers in an AUTO_INCREMENT column is unsupported. Consequently, you may as well define AUTO_INCREMENT columns to be UNSIGNED. Using UNSIGNED also has the advantage of giving you twice as many sequence numbers before you hit the upper end of the data type's range.

  • Don't be fooled into thinking that adding AUTO_INCREMENT to a column definition is a magic way of getting an unlimited sequence of numbers. It's not; AUTO_INCREMENT sequences are always bound by the range of the underlying data type. For example, if you use a TINYINT column, the maximum sequence number is 127. When you reach that limit, your application begins to fail with duplicate-key errors. If you use TINYINT UNSIGNED instead, you reach the limit at 255.

  • Clearing a table's contents entirely with trUNCATE TABLE may reset a sequence to begin again at 1, even for storage engines that normally to not reuse AUTO_INCREMENT values. The sequence reset occurs due to the way that MySQL attempts to optimize a complete table erasure operation: When possible, it tosses the data rows and indexes and re-creates the table from scratch rather than deleting rows one at a time. This causes sequence number information to be lost. If you want to delete all records but preserve the sequence information, you can suppress this optimization by using DELETE with a WHERE clause that is always true, to force MySQL to evaluate the condition for each row and thus to delete every row individually:

     DELETE FROM tbl_name WHERE 1; 

Tips for Working with AUTO_INCREMENT Columns

This section describes some techniques that are useful when working with AUTO_INCREMENT columns.

Adding a Sequence Number Column to a Table

Suppose that you create a table and put some information into it:

 mysql> CREATE TABLE t (c CHAR(10)); mysql> INSERT INTO t VALUES('a'),('b'),('c'); mysql> SELECT * FROM t; +------+ | c    | +------+ | a    | | b    | | c    | +------+ 

Then you decide that you want to include a sequence number column in the table. To do this, issue an ALTER TABLE statement to add an AUTO_INCREMENT column, using the same kind of type definition that you'd use with CREATE TABLE:

 mysql> ALTER TABLE t ADD i INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY; mysql> SELECT * FROM t; +------+---+ | c    | i | +------+---+ | a    | 1 | | b    | 2 | | c    | 3 | +------+---+ 

Note how MySQL assigns sequence values to the AUTO_INCREMENT column automatically. You need not do so yourself.

Resequencing an Existing Column

If a table already has an AUTO_INCREMENT column, but you want to renumber it to eliminate gaps in the sequence that may have resulted from row deletions, the easiest way to do it is to drop the column and then add it again. When MySQL adds the column, it assigns new sequence numbers automatically.

Suppose that a table t looks like this, where i is the AUTO_INCREMENT column:

 mysql> CREATE TABLE t (c CHAR(10), i INT UNSIGNED AUTO_INCREMENT     -> NOT NULL PRIMARY KEY); mysql> INSERT INTO t (c)     -> VALUES('a'),('b'),('c'),('d'),('e'),('f'),('g'),('h'),('i'),('j'),('k'); mysql> DELETE FROM t WHERE c IN('a','d','f','g','j'); mysql> SELECT * FROM t; +------+----+ | c    | i  | +------+----+ | b    |  2 | | c    |  3 | | e    |  5 | | h    |  8 | | i    |  9 | | k    | 11 | +------+----+ 

The following ALTER TABLE statement drops the column and then adds it again, renumbering the column in the process:

 mysql> ALTER TABLE t     -> DROP i,     -> ADD i INT UNSIGNED NOT NULL AUTO_INCREMENT,     -> AUTO_INCREMENT = 1; mysql> SELECT * FROM t; +------+---+ | c    | i | +------+---+ | b    | 1 | | c    | 2 | | e    | 3 | | h    | 4 | | i    | 5 | | k    | 6 | +------+---+ 

The AUTO_INCREMENT = 1 clause resets the sequence to begin again at 1. For a MyISAM or MEMORY table (or InnoDB table, as of MySQL 5.0.3), you can use a value other than 1 to begin the sequence at a different value. For other storage engines, just omit the AUTO_INCREMENT clause, because they do not allow the initial value to be specified this way. The sequence will begin at 1.

Note that although it's easy to resequence a column, and the question, "How do you do it?" is a common one, there is usually very little need to do so. MySQL doesn't care whether a sequence has holes in it, nor do you gain any performance efficiencies by resequencing. In addition, if you have records in another table that refer to the values in the AUTO_INCREMENT column, resequencing the column destroys the correspondence between tables.

Generating Sequences Without AUTO_INCREMENT

MySQL supports a method for generating sequence numbers that doesn't use an AUTO_INCREMENT column at all. Instead, it uses an alternative form of the LAST_INSERT_ID() function that takes an argument. If you insert or update a column using LAST_INSERT_ID(expr), the next call to LAST_INSERT_ID() with no argument returns the value of expr. In other words, MySQL treats expr as though it had been generated as an AUTO_INCREMENT value. This allows you to create a sequence number and then retrieve it later in your session, confident that the value will not have been affected by the activity of other clients.

One way to use this strategy is to create a single-row table containing a value that you update each time you want the next value in the sequence. For example, you can create and initialize the table like this:

 CREATE TABLE seq_table (seq INT UNSIGNED NOT NULL); INSERT INTO seq_table VALUES(0); 

Those statements set up seq_table with a single row containing a seq value of 0. To use the table, generate the next sequence number and retrieve it like this:

 UPDATE seq_table SET seq = LAST_INSERT_ID(seq+1); SELECT LAST_INSERT_ID(); 

The UPDATE statement retrieves the current value of the seq column and increments it by 1 to produce the next value in the sequence. Generating the new value using LAST_INSERT_ID(seq+1) causes it to be treated like an AUTO_INCREMENT value, which allows it to be retrieved by calling LAST_INSERT_ID() without an argument. LAST_INSERT_ID() is client-specific, so you get the correct value even if other clients have generated other sequence numbers in the interval between the UPDATE and the SELECT.

Other uses for this method are to generate sequence values that increment by a value other than 1, or that are negative. For example, this statement can be executed repeatedly to generate a sequence of numbers that increase by 100 each time:

 UPDATE seq_table SET seq = LAST_INSERT_ID(seq+100); 

Repeating the following statement generates a sequence of decreasing numbers:

 UPDATE seq_table SET seq = LAST_INSERT_ID(seq-1); 

You also can use this technique to generate a sequence that begins at an arbitrary value, by setting the seq column to an appropriate initial value.

The preceding discussion describes how to set up a counter using a table with a single row. That's okay for a single counter. If you want several counters, add another column to the table to server as a counter identifier, and insert a row into the table for each counter. Suppose that you have a Web site and you want to put some "this page has been accessed n times" counters in several pages. Create a table with two columns. One column holds a name that uniquely identifies each counter. The other holds the current counter value. You can still use the LAST_INSERT_ID() function, but you determine which row it applies to by using the counter name. For example, you can create such a table with the following statement:

 CREATE TABLE counter (     name  VARCHAR(255) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,     value INT UNSIGNED,     PRIMARY KEY (name) ); 

The name column is a string so that you can name a counter whatever you want, and it's defined as a PRIMARY KEY to prevent duplicate names. This assumes that applications using the table agree on the names they'll be using. For Web counters, uniqueness of counter names is ensured simply by using the pathname of each page within the document tree as its counter name. The name column has a binary collation to cause pathname values to be treated as case sensitive. (If your system has pathnames that are not case sensitive, use a collation that is not case sensitive.)

To use the counter table, insert a row corresponding to each page for which you need a counter. For example, to set up a new counter for the site's home page, do this:

 INSERT INTO counter (name,value) VALUES('index.html',0); 

That initializes a counter named 'index.html' with a value of zero. To generate the next sequence value for the page, use its pathname to look up the correct counter value and increment it with LAST_INSERT_ID(expr), and then retrieve the value with LAST_INSERT_ID():

 UPDATE counter SET value = LAST_INSERT_ID(value+1) WHERE name = 'index.html'; SELECT LAST_INSERT_ID(); 

An alternative approach is to increment the counter without using LAST_INSERT_ID(), like this:

 UPDATE counter SET value = value+1 WHERE name = 'index.html'; SELECT value FROM counter WHERE name = 'index.html'; 

However, that doesn't work correctly if another client increments the counter after you issue the UPDATE and before you issue the SELECT. You could solve that problem by putting LOCK TABLES and UNLOCK TABLES around the two statements. Or you could create the table as an InnoDB or BDB table and update the table within a transaction. Either method blocks other clients while you're using the counter, but the LAST_INSERT_ID() method accomplishes the same thing more easily. Because its value is client-specific, you always get the value you inserted, not the one from some other client, and you don't have to complicate the code with transactions or locks to keep other clients out.



MySQL The definitive guide to using, programming, and administering MySQL 4. 1 and 5. 0
Mysql: the Definitive Guide to Using, Programming, and Administering Mysql 4.1 and 5.0
ISBN: B003A1PKHY
EAN: N/A
Year: 2004
Pages: 190
Authors: Paul Dubois

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net