Optimal Transactions | MySQL Database Design and Tuning

< Day Day Up >

Transactions are a fundamental component in ensuring that data alterations are managed in a secure, consistent way. This section investigates how to apply solid optimization techniques to your transactions, combining their data integrity features with good performance.

As stated before, the BDB, NDB, and InnoDB storage engines all offer transactions. However, because this book does not explicitly cover the BDB engine, and because the NDB engine is covered in Chapter 17, "Optimal Clustering," this section exclusively focuses on InnoDB instead.

This section begins by clarifying a number of key transaction terms and concepts, and then moves on to exploring their costs and benefits. Finally, this section provides a collection of tips to help get the most speed out of your transactional operations.

Key Transaction Terms and Concepts

Before you can begin to choose an optimal transaction strategy for your server or application, you must first understand how transactions work, both standalone as well as in concert with other transactions. This section explores isolation levels, providing examples of each setting along with output from SHOW INNODB STATUS that you can use to comprehend what is happening on your MySQL server. A little later on, this chapter makes recommendations about selecting the correct isolation level.

Isolation Levels

Because modern relational database management systems are designed to support thousands of concurrent users, there must be some rules in place so that these simultaneous constituents can work without descending into chaos. These rules have been encapsulated into isolation levels, which dictate the concurrency behavior that each transaction undertakes, both solely as well as when encountering potential conflict.

MySQL offers all four of the isolation levels specified by the SQL standard. They include the following:

READ UNCOMMITTED Also referred to as "dirty read," this isolation level uses no locks when querying information with plain SELECT statements; on the other hand, UPDATE and DELETE statements employ locks. The scope of locking is dependent on the search conditions provided with the statement. If a unique search condition is passed in, InnoDB only locks the index records in question, and no others. On the other hand, if a range is passed into the statement, InnoDB is forced to apply additional locks to the gaps between records, reducing concurrency in the process.
In the case of plain SELECT statements, any locks held by other active threads are disregarded and the data is retrieved anyway. Although fast and simple, this introduces the possibility of making a decision based on "phantom rows" (that is, information that can be rolled back to a different state than it was when originally retrieved by the READ UNCOMMITTED query).
For example, assume that there are only two users (Bud and Lou) working on a MySQL server. Bud begins a transaction and inserts a row into the "vip" table shown earlier in this chapter. Lou has set a very permissive isolation level of READ UNCOMMITTED. Lou queries the table, and sees the row that Bud has just inserted, even though it is not yet committed. In fact, Bud is preoccupied at the moment, and has walked away from his desk, leaving the transaction open.
At this moment, if Lou ran SHOW INNODB STATUS, you would see these transaction details about Bud and Lou's sessions:
```
 ---TRANSACTION 0 1061815303, ACTIVE 9 sec, process no 20247, OS thread id 2108386224 MySQL thread id 8, query id 668 Client292 LCostello show innodb status ---TRANSACTION 0 1061815302, ACTIVE 17 sec, process no 20247, OS thread id 2108185520 1 lock struct(s), heap size 320, undo log entries 1 MySQL thread id 7, query id 634 Client142 BAbbott 
```
Perhaps Lou will make a decision about the row that Bud has inserted but not yet committed. If he does, there is a good chance that this decision will be wrong because Bud might decide to roll back the transaction.
As it turns out, Lou decides to update the row that he sees in the table. He issues an UPDATE statement, but is surprised to see that the statement appears to take a very long time to execute. If he were to run SHOW INNODB STATUS again at that moment, he would likely be alarmed at what he sees:
```
 ---TRANSACTION 0 1061815879, ACTIVE 13 sec, process no 20846, OS thread id 21089 88336 starting index read mysql tables in use 1, locked 1 LOCK WAIT 2 lock struct(s), heap size 320 MySQL thread id 14, query id 4425 Client292 LCostello Updating update vip set vip_level = 'Gold' where last_name = 'Fields' ------- TRX HAS BEEN WAITING 13 SEC FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 0 page no 213046 n bits 80 index `last_name` of table `hig h_hat/vip` trx id 0 1061815879 lock_mode X waiting Record lock, heap no 12 PHYSICAL RECORD: n_fields 2; 1-byte offs TRUE; info bits  0  0: len 6; hex 4669656c6473; asc Fields;; 1: len 4; hex 8000000b; asc     ;; 
```
This output tells you that poor Lou won't be able to make his change until after Bud commits the transaction. Unfortunately, Bud has left the building, so no transaction commit will be forthcoming. In fact, several moments later Lou receives the following error:
```
 ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction 
```
Even with this permissive isolation level, Lou still received an error because data altering statements like he was trying to run still attempt to acquire locks, and open transactions can block these locks.
READ COMMITTED This isolation level is more restrictive than READ UNCOMMITTED, preventing others from seeing phantom rows. However, other data-modifying operations are treated the same as with READ UNCOMMITTED.
Continuing the preceding example, Lou has learned from the error of his ways, and has decided to use the READ COMMITTED transaction isolation level so as to never see a phantom row again.
As part of month-end processing, Bud runs a lengthy transaction that must update all rows in the table. At that moment, Lou needs to insert a single row into the table. Unfortunately, once again Lou is blocked from completing his work. Consulting SHOW INNODB STATUS, he sees the following:
```
 ---TRANSACTION 0 1061815882, ACTIVE 6 sec, process no 20846, OS thread id 210898 8336 inserting mysql tables in use 1, locked 1 LOCK WAIT 2 lock struct(s), heap size 320 MySQL thread id 14, query id 4909 Client292 LCostello update insert into vip (last_name, vip_level) values ('Hardy','Gold') ------- TRX HAS BEEN WAITING 6 SEC FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 0 page no 213044 n bits 80 index `PRIMARY` of table `high_ hat/vip` trx id 0 1061815882 lock_mode X insert intention waiting Record lock, heap no 1 PHYSICAL RECORD: n_fields 1; 1-byte offs TRUE; info bits 0  0: len 9; hex 73757072656d756d00; asc supremum ;; ------------------ ---TRANSACTION 0 1061815881, ACTIVE 31 sec, process no 20846, OS thread id 21087 87632 2 lock struct(s), heap size 320, undo log entries 6 MySQL thread id 15, query id 4922 Client142 BAbbott 
```
Just as Lou is about to pick up the phone to ask Bud to finish his work, he receives another error from MySQL:
```
 ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction 
```
What happened this time? Because Bud was running a broad UPDATE process, InnoDB was forced to set locks to prevent other users from inserting information into the spaces between rows in the table. Lou's transaction fit that profile, so it had to wait until the transaction completed. Unfortunately, it ran out of time yet again because Bud's transaction took too long.
REPEATABLE READ Even more protective than COMMITTED READ, this isolation level ensures that all SELECT statements within a transaction see consistent results, even if the underlying data has changed in the interim.
For example, suppose that Lou is running a simple query within a transaction, and is looking at a series of rows. A few moments later, Bud starts a transaction, makes a change to a particular row, and then commits his transaction. What will Lou see? Until he closes his transaction and then refreshes the data, it appears as if Bud never made his change. Known as a "consistent read," this uses InnoDB's multiversioning capabilities to provide a point-in-time view of information. If Lou were to run SHOW INNODB STATUS while his original transaction was still open, he would see the following entry of interest:
```
 ---TRANSACTION 0 1061815884, ACTIVE 26 sec, process no 20846, OS thread id 21089 88336 MySQL thread id 14, query id 5299 Client292 LCostello Trx read view will not see trx with id >= 0 1061815885, sees < 0 1061815885 
```
The last line is interesting: It provides insight into which transactions will be "invisible" to Lou until he closes his active transaction.

SERIALIZABLE As the most protective isolation level, SERIALIZABLE blocks other threads from modifying information that has merely been viewed with a simple SELECT statement.

For example, Bud has now decided that he doesn't want anyone to alter any of his data while he's reading it, even if he won't be making any changes at all. By setting his isolation level to SERIALIZABLE, InnoDB causes any other database access threads that attempt to modify any of Bud's already-read data to pause. They won't be able to proceed until they either time out or Bud completes his transaction. Viewing SHOW INNODB STATUS points this out:

 ---TRANSACTION 0 1061815888, ACTIVE 4 sec, process no 20846, OS thread id 210898 8336 starting index read mysql tables in use 1, locked 1 LOCK WAIT 3 lock struct(s), heap size 320 MySQL thread id 14, query id 5603 Client292 LCostello Updating update vip set vip_level = 'Tin' where last_name = 'Fish' ------- TRX HAS BEEN WAITING 4 SEC FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 0 page no 213044 n bits 80 index `PRIMARY` of table `high_ hat/vip` trx id 0 1061815888 lock_mode X locks rec but not gap waiting Record lock, heap no 7 PHYSICAL RECORD: n_fields 5; 1-byte offs TRUE; info bits 0  0: len 4; hex 80000006; asc     ;; 1: len 6; hex 00003f4a0626; asc   ?J &;; 2: len 7; hex 800005c0020084; asc        ;; 3: len 4; hex 46697368; asc Fish;; 4: l en 1; hex 02; asc  ;;

Setting Transaction Isolation Levels

You can set a serverwide default transaction isolation level policy by providing a value for the --TRansaction-isolation option in your MySQL configuration file. Unless specified otherwise, the default value is REPEATABLE READ.

You can also use the SET TRANSACTION ISOLATION LEVEL statement to control this behavior globally or for a session. Remember that any active transactions don't have their isolation levels changed by this statement; only new transactions are affected.

You can discover the global and session-level isolation levels currently in effect in several ways:

 mysql> SELECT @@global.tx_isolation; +-----------------------+ | @@global.tx_isolation | +-----------------------+ | REPEATABLE-READ       | +-----------------------+ 1 row in set (0.00 sec) mysql> SELECT @@tx_isolation; +----------------+ | @@tx_isolation | +----------------+ | SERIALIZABLE   | +----------------+ 1 row in set (0.00 sec) mysql> SHOW VARIABLES LIKE 'TX%'; +---------------+--------------+ | Variable_name | Value        | +---------------+--------------+ | tx_isolation  | SERIALIZABLE | +---------------+--------------+ 1 row in set (0.00 sec)

Cost of Transactions

It's important to note that in InnoDB, all activity happens within a transaction regardless of whether the developer or administrator overtly requests one. The AUTOCOMMIT option controls the mechanism by which these transactions are concluded. When set to 1, every database-affecting operation can be considered as occurring within its own transaction. On the other hand, setting AUTOCOMMIT to 0 means that a transaction remains open until either a COMMIT or ROLLBACK statement is sent by the client. Most applications that contain transaction-concluding logic (that is, whether to commit or roll back the transaction) elect to set AUTOCOMMIT to 0.

Naturally, transactions involve some extra costs. These costs chiefly involve additional writes to the binary log, as well as added memory consumption to keep track of the internal locks that InnoDB uses to maintain transactional integrity. Although neither of these costs is terribly burdensome, they do warrant consideration before opting for transactional support for a given table. This topic is explored in more detail in the next section.

Transaction Performance Suggestions

Before choosing whether to require that a table use transactions, it's useful to take a number of factors into account. Then, if you decide to stipulate that a given set of tables use the InnoDB storage engine, you can take several actions to improve overall transaction response. This section examines these decision criteria and suggestions.

Choosing the Right Storage Engine

Regardless of your AUTOCOMMIT configuration, InnoDB's transactional support incurs extra overhead. For this reason, it's smart to only use InnoDB as the storage engine for those tables that require the extra capabilities offered by transactions. Otherwise, using an engine like MyISAM is a better choice.

Even if a table requires transactional support, consider using a nontransactional table to hold historical or other nondynamic information.

Choosing the Right Isolation Level

Each of MySQL's four transaction isolation levels is appropriate in certain circumstances. This section describes the scenarios in which each isolation level is suitable:

READ UNCOMMITTED As you saw earlier, this isolation level uses the fewest number of locks, but opens up the possibility of seeing "phantom rows." Although potentially dangerous in highly dynamic, multiuser environments, it can safely be used in single-user situations, or when the table's data is static.
READ COMMITTED Applications that would benefit from this isolation level are those that cannot tolerate phantom rows but do not need the more protective capabilities of REPEATABLE READ.
REPEATABLE READ As MySQL's default, this isolation level is suitable for applications that require assistance in avoiding phantom rows, protection from other users altering information that is currently being modified by the current application, and the data reliability and point-in-time view provided by its consistent read guarantees.
Dozens of types of applications fit this profile; a good example is a program that computes a result after examining a set of detail records, updates these detail records, and then revises a header record with this information. In this case, it's vital that the detail records remain consistent until they and their associated header record are updated and the transaction is committed.
SERIALIZABLE This highly restrictive isolation level should only be used for those transactions that cannot tolerate any changes to underlying data, even if the transaction is simply reading information without making any of its own alterations.
Extending the example previously described, suppose that this application also examines detail data from another, secondary table via a simple SELECT statement, but that this newly read data is not touched in any way. Still, the developer does not want anyone to alter any rows in this secondary table that have been evaluated in computing the result for updating the header, even if those rows themselves are not updated. In this case, SERIALIZABLE would be the right approach, but it should be used with extreme caution: If the transaction examines many rows and is lengthy, there is a good chance other users and processes will encounter delays until this transaction closes and releases its locks.

Keeping Transactions Brief

Transactions do not improve with age; in most cases, they should be kept as brief as possible. There are a number of dangers introduced by lengthy transactions:

Degraded concurrency Combining a restrictive isolation level with a long transaction is a recipe for concurrency problems: You just saw how one inattentive user (Bud) kept his transactions open too long and caused another user (Lou) to make all kinds of incorrect decisions and encounter timeouts.
Resource consumption Although InnoDB is very efficient in its transaction- supporting resource allocation, a long-running transaction consumes these resources for greater amounts of time, reducing their availability for other users.
Rollback segment issues Chapter 12 discusses how InnoDB uses insert and update undo logs in its rollback segment to help ensure consistent views of information, even if transactions are under way. If a transaction runs too long, InnoDB is unable to free up these resources in the rollback segment, which could cause tablespace issues or other performance-degrading problems.
Deadlocks Over time, InnoDB has increasingly improved its ability to avoid deadlocks, and manage them when they do occur. Developers can do their part by keeping transactions as short as possible. In addition, for those sites using AUTOCOMMMIT = 1, consider setting innodb_table_locks to 0 to help reduce the frequency of deadlocks.
Rollback problems Although InnoDB uses good buffering techniques when handling big insert operations, a rollback of the same transaction does not benefit from this caching. This means that a rollback can take a very long time, even if the insert operation was proceeding rapidly. By definition, a short transaction has fewer modifications that might need to be rolled back.

Now that you've read the sorrowful litany of difficulties that can be spawned by lengthy transactions, what can you do to reduce the frequency of their occurrences?

Use the right isolation level As you saw earlier in this chapter, there are dramatic resource and concurrency differences among the four isolation levels. These differences can also affect transaction duration because a transaction that requires very restrictive concurrency might find itself pausing until other resources are released. Try to gain a good understanding of these isolation levels, and use the most permissive one you can without compromising your data's integrity.
If you are new to either MySQL or transactions, you will reap large rewards from the relatively small investment of time it takes to experiment with multiple sessions, sample tables, and different transaction isolation levels.
Don't include user interface communication within the transaction One way to guarantee all kinds of transaction and other problems is to allow users to walk away from their computers while a transaction is waiting for information from them. Always prompt your users for all necessary input to complete, and only then start (and quickly finish) your transaction.
Use DROP TABLE or TRUNCATE TABLE rather than DELETE Chapter 8, "Advanced SQL Tips," reviews a variety of SQL performance-enhancing tips. One important suggestion contrasts the speed of a brute-force DROP TABLE/trUNCATE TABLE compared with the more nuanced (and consequently slower) DELETE statement. These faster statements also consume much fewer resources, and greatly reduce transaction time.

< Day Day Up >