Improving Data Modification

< Day Day Up >

In addition to internal storage engine features such as InnoDB's insert buffering capability, MySQL gives you a surprising amount of control over performance for INSERT, REPLACE, UPDATE, and DELETE operations.

Each storage engine's buffering mechanisms are explored a little later. For now, this section looks at some examples of how you can develop your applications to improve the speed of your data modifications.

First, you might decide at the server, connection, or statement level that you want these types of data modifications to take less precedence than query operations. To set this behavior as the default for your system, include --low-priority-updates when you launch mysqld. If you want this trait enforced by connection, issue a SET LOW_PRIORITY_UPDATES = 1 request. Finally, you can make this request by statement by including LOW_PRIORITY within your SQL syntax. After being set, all searches against the relevant table(s) must complete before making any modifications.

Improving `INSERT`

The following sections look at several options to boost performance for data-loading operations. Before getting started, note that Chapter 15, "Improving Import and Export Operations," is dedicated to import and export operations a little later in the book.

`INSERT LOW_PRIORITY`

You reviewed SELECT HIGH_PRIORITY earlier in this chapter. Recall that adding HIGH_PRIORITY to your queries forces MySQL to move them ahead of other processes that modify data.

On the other hand, when it comes to inserting information into a table, you just saw that you can elect to request that these inserts take a lower precedence than operations that are already reading from the table by simply adding LOW_PRIORITY to your INSERT statement. This causes the insert operation to wait until these reads are completed, including any reads that may have begun while the old reads are finishing.

Clearly, this can have a negative impact on an application that sees a mixture of relatively infrequent inserts with constant searches. Here is yet another example of when replication can add great value to your environment: You could set up a master to receive new data inserts, while directing queries against a replicated, more static copy of the master's database. Finally, note that the INSERT HIGH_PRIORITY option doesn't really increase INSERT priority per se; instead, it merely overrides the --low-priority-updates server option should it be set.

`INSERT` Versus `INSERT DELAYED`

Because the MyISAM engine only supports table-level locks, there might be situations in which an INSERT operation needs to wait if there are other users accessing the table at that moment. This is less of a problem than it used to be because MySQL now supports parallel SELECT and INSERT operations if there are no empty blocks in the middle of the table (that is, new rows are able to be stored in blocks at the end of the table).

However, for those situations in which there are empty blocks in the middle of your MyISAM table, you can instruct MySQL to queue your inserts via the INSERT DELAYED statement. From your database application's perspective, it's as if the data has been written into the table, when it, in fact, has entered a memory queue for writing as soon as the table is ready. This queue is shared by multiple clients, which also helps to improve performance.

For example, we created a MyISAM table, filled it with millions of rows of random data, and then deleted significant quantities of data from the middle of the table. After that was complete, we ran a simple INSERT DELAYED statement, which was accepted immediately by MySQL. Figure 8.8 shows what the MySQL Administrator reported after the insert operation was accepted.

Figure 8.8. A delayed `INSERT` operation.

As you can see, MySQL has allocated a thread (PID 10) and is waiting until the query (PID 9) completes before actually writing the row to the table.

This convenience comes at a cost: MySQL must allocate some additional memory and threads to support this functionality. You are also unable to determine the last value of any columns that have the AUTO_INCREMENT attribute because the rows haven't really been written yet. There is also a risk of lost data if the server is abruptly shut down before the operation truly completes.

In summary, consider using INSERT DELAYED in situations when you

Are using the MyISAM engine
Have a heavily accessed table, with a mixture of INSERT, UPDATE, and DELETE operations
Can't afford to have your client applications block until their INSERT operations complete
Are confident that your mysqld process is not subject to undue interruptions or crashes, or if you can reconstruct and retry inserting information should the server crash

`INSERT` Versus `INSERT ... ON DUPLICATE KEY UPDATE` Versus `REPLACE`

If you are building a MySQL-based application that performs both INSERT and UPDATE operations, there will likely be situations in which you want the flexibility to insert a row if it doesn't yet exist (that is, there is no row with the same primary key in the table) but update the row if it already does (that is, a row with this primary key already present).

Beginning with version 4.1, MySQL enhanced the INSERT statement with ON DUPLICATE KEY UPDATE. Some readers might have heard of this functionality described as "UPSERT," which is an amalgamation of UPDATE and INSERT.

This new capability can help streamline your application as well as have a helpful impact on performance, especially for large tables that see frequent insert and update operations. Take a look at the following example.

Suppose that you create a table to track the current mileage totals for all customers:

 CREATE TABLE customer_mileage (     customer_id INT PRIMARY KEY,     current_mileage INT,     ...     ... );

After this table is created, several different applications repeatedly access the table. Some create new customer_mileage records (including current_mileage), whereas others merely add some additional miles to the current_mileage column for existing customers.

If you erroneously issue an INSERT statement against this table when there is already an existing record, you receive an error, which you'll have to manage:

 INSERT INTO customer_mileage VALUES (179939,5634); ERROR 1062 (23000): Duplicate entry '179939' for key 1

However, the new INSERT functionality lets you write a single, flexible SQL statement that handles both INSERT and UPDATE operations:

 INSERT INTO customer_mileage VALUES (179939,5634) ON DUPLICATE KEY UPDATE current_mileage = current_mileage + 5634;

Note that if there is no row already present with this primary key, MySQL reports that one row is affected. However, if there is a row already resident, MySQL reports that two rows are affected.

Finally, to take advantage of this capability, your table needs either a primary key or at least one unique index.

Multirow Inserts

MySQL supports INSERT statements that create multiple rows. This can have a notable effect on performance, especially when inserting large quantities of information. Consider the following example.

High-Hat Airways has entered into a partnership with KramAir, one of Southeast Asia's most successful airlines. Part of this partnership includes mutual frequent flyer credit for all flights flown on either airline. Each month, KramAir sends a number of large files containing relevant details to ensure these credits. One of these files contains data that needs to be loaded into your flights table, which is defined as follows:

 CREATE TABLE flights (     flight_id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,     flight_number SMALLINT UNSIGNED NOT NULL,     flight_date DATE NOT NULL,     flight_departure_city CHAR(3),     flight_arrival_city CHAR(3) );

Because KramAir is also a happy MySQL customer, it's easy for them to generate a multirow INSERT as part of their mysqldump operation. Chapter 15 explores mysqldump.

Without the multirow INSERT syntax, your SQL operations would look like this:

 INSERT INTO flights VALUES (1,2358,'2006-03-09','SYD','KUL'); INSERT INTO flights VALUES (2,109,'2006-03-09','SIN','HAN'); INSERT INTO flights VALUES (3,9893,'2006-03-09','HKG','NRT'); INSERT INTO flights VALUES (4,642,'2006-03-09','ACK','KUL'); INSERT INTO flights VALUES (5,8845,'2006-03-09','BGK','SIN'); ...

The multirow INSERT syntax is much more compressed (if a bit harder to read):


 INSERT INTO `flights` VALUES (1,2358,'2006-03-09','SYD','KUL'),(2,109,'2006-03-09','SIN' ,'HAN'),(3,9893,'2006-03-09','HKG','NRT'),(4,642,'2006-03-09','ACK','KUL '),(5,8845,'2006-03-09','BGK','SIN')...

To gauge the potential performance gains from introducing multirow inserts, we created a sample data set of 600,000 rows to be loaded into the flights table.

First, we created this table using the InnoDB storage engine. Next, we included a LOCK TABLE directive at the start of the load. Finally, we tested the difference between using a dedicated INSERT statement for each row as well as multirow INSERT statements.

To ensure that buffering was not a factor, after each test we dropped the table, stopped and restarted the engine, and re-created the table.

The results were dramatic: The time to complete the load when using single row INSERT statements was approximately thirtyfold longer than when rows were loaded in multiples.

Next, we conducted the same set of tests with the MyISAM storage engine. The results were quite a bit less dramatic, but still substantial: Single row INSERT operations took eight times longer than corresponding multirow INSERT statements.

Disabling Indexes

If you are using the MyISAM storage engine, you have the option of disabling index updates when loading large amounts of data into the table by running myisamchk with the --keys-used parameter. If you are exporting data via mysqldump, you can also specify --disable-keys when launching the utility. This is discussed in more detail in Chapters 7 and 15.

Improving `DELETE`

Most transactional applications generally don't delete large amounts of data in one operation; they typically remove a relatively small quantity of data in the context of a brief transaction. However, there are many other situations in which large-scale deletions happen. MySQL offers a number of performance-enhancing options that you can use to speed up these operations. The following sections review what these options are and how you can take advantage of them. Server parameters that affect deletion are not covered now because Chapters 10, 11, and 12 provide a significant amount of coverage.

`DELETE LOW_PRIORITY`

Add the LOW_PRIORITY option to your DELETE statement if you don't mind delaying your delete operation until all other active readers of the table have completed their access to the table. Note that after it starts, the DELETE operation consumes the same amount of resources even if you specified LOW_PRIORITY. Of course, this statement does not improve delete performance per se, but does reduce its impact on other activities.

`DELETE QUICK`

If you are 1) using the MyISAM storage engine and 2) attempting to delete a large amount of data, try including the QUICK option with your DELETE statement:

 DELETE QUICK FROM transactions WHERE transaction_id BETWEEN 100000 AND 300000;

Choosing this option means that MySQL skips some internal index housekeeping steps when processing the delete. However, if these values are unlikely to be reinserted later, this increases the likelihood of table and index fragmentation because MySQL now bypasses the normal steps of merging that index leaves upon deletion. To reclaim this space, you eventually want to defragment the table and indexes via either the OPTIMIZE TABLE or myisamchk command.

To test the performance implications of DELETE versus DELETE QUICK, we first created a sample table and filled it with 10 million rows of random data:

 CREATE TABLE delete_test (     col1 INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,     col2 CHAR(10) NOT NULL,     col3 VARCHAR(200) NOT NULL,     col4 BIGINT NOT NULL,     INDEX (col3),     INDEX (col4) ) ENGINE = MYISAM;

Next, we ran some DELETE tests. All tests deleted the same amount of data, using the primary key as a lookup. The only difference was that half the operations used DELETE alone, whereas the other half used DELETE QUICK.

Unfortunately, the results were inconclusive at best. In a significant number of situations, DELETE QUICK slightly degraded performance rather than enhancing it. Because the prime motivation for using DELETE QUICK is to dramatically improve performance, these tests indicate that it does not add tremendous value.

Nevertheless, it is possible that your specific table and index structure might benefit from employing DELETE QUICK; this is when experimentation on a test server that replicates your environment is invaluable.

`DELETE`, `ORDER BY`, and `LIMIT`

First introduced in version 4.0, the combination of DELETE, ORDER BY, and LIMIT gives you significant flexibility when deleting from very large tables. You can construct a delete operation that first sorts the potential deleted rows by criteria that you specify, and then deletes a limited number of rows to keep performance within acceptable parameters.

For example, using the TRansaction_detail table described earlier in this chapter, suppose that you want to delete the oldest 1,000 rows. Prior to this new combination of DELETE options, you would have needed to somehow figure out what the time stamp was for the oldest 1,000 rows and then include that value in our WHERE clause.

However, you can now write much simpler SQL:

 DELETE FROM transaction_detail ORDER BY transaction_date LIMIT 1000;

It goes without saying, of course, that you will want your ORDER BY clause to make use of an index.

`DELETE` Versus `trUNCATE TABLE` Versus `DROP TABLE`

MySQL offers a choice of several commands when you need to quickly remove large quantities of data from an existing table. Your actual choice depends on your storage engine as well as your particular processing need. Take a look at a few scenarios:

You are using the InnoDB storage engine and need to delete all rows in a large table. In this case, it is much faster to issue a DROP TABLE command and then re-create the table via CREATE TABLE, assuming you have sufficient permission. Otherwise, running a full DELETE on the table can generate tremendous transaction load on the database engine.
You are using the MyISAM storage engine, and need to delete all rows in a large table. MySQL offers the trUNCATE TABLE command, which implicitly drops and then re-creates the table.
You are using any storage engine, and need to delete the vast majority (90%+) of rows in a table. In this situation, you might get better performance by first extracting the rows that you want to keep, placing them in a temporary table, and then dropping/re-creating the large table. After this is done, you can then reload the remaining rows back into the newly re-created table.

< Day Day Up >

Improving INSERT

INSERT LOW_PRIORITY

INSERT Versus INSERT DELAYED

Figure 8.8. A delayed INSERT operation.

INSERT Versus INSERT ... ON DUPLICATE KEY UPDATE Versus REPLACE