What Are Transactions? | MySQL Tutorial

The first thing to address in this chapter is what exactly is a transaction? In the context of a database management system, a transaction is a sequence of related instructions that must be treated as one indivisible unit. That is, all the work in the transaction must be done, or all of it must be left undone. This concept is known as atomicity. A transaction is atomic because it cannot be broken down into parts ”it all gets processed or it all gets ignored.

This has particular implications when considering concurrent access by multiple users, processes, or threads and also for recovery. Modern computers are often accessed by more than one person at a time. They usually have more than one program running at one time, and they often run programs such as Web server software that create many processes or threads that run at the same time. Each of these users, programs, processes, or threads may need access to the database server.

Multiple threads must not interfere with each other when running concurrently. If an error occurs, the database must honor transactions when recovering. This means returning the database to the state it was in before the error occurred or finishing the whole transaction. It is generally better to lose an entire transaction of related changes than to recover to a state partway through a sequence of updates that might result in the database being in an inconsistent state.

This concept can be expressed more formally , but an example is probably the easiest way to come to grips with it. Consider a very simple (but potentially very important) database that records bank account details. Each account has, at minimum, a unique identifier and a balance.

We can create this table as follows :

 create table account (number int not null auto_increment primary key,   balance float) type = InnoDB;

It probably is not a shock that we can create two new accounts with these queries:

 insert into account (balance) values (0.0); insert into account (balance) values (1000.0); insert into account (balance) values (2000.0);

There is not very much that can go wrong here, and the result should be this data:

 +--------+---------+  number  balance  +--------+---------+       1        0        2     1000        3     2000  +--------+---------+

As long as we are not making assumptions about the account numbers generated by the auto-increment column, it does not matter in what order these queries are eventually run. It does not matter if we are running them from three different interactive clients simultaneously and cannot be sure which will be processed first.

This is often not true for more complex queries or sets of queries. Consider the following pair of statements intended to deposit $500 into account 2:

 # first check balance select balance from account where number = 2; # query gives us a result of 00 # now store updated balance update account set balance = 1500 where number = 2;

These queries are related. They need to be run together. If other clients can update the balance of this account between our balance check and our balance update, we may not get the result we expected.

If two clients were running pairs of queries like this at the same time, our final result would depend on timing. If we were attempting to deposit $500 with these queries and another client was attempting to deposit $100 with the following pair, the end result could be a balance of $1100 or $1500 ” neither of which is the right result.

 # first check balance select balance from account where number = 2; # query gives us a result of 00 # now store updated balance update account set balance = 1100 where number = 2;

This is obviously not desirable, but in this case the problem is easily solved . Making our updates relative rather than absolute will make them into single, indivisible units and will solve the problem. The following query will run correctly, regardless of what other queries are running at the same time:

 update account set balance = balance + 500 where number = 2;

A single update statement in MySQL is always atomic. It cannot be interrupted by another query or half succeed. It will complete or will completely fail on an error.

More complex scenarios are harder to address. Consider the following pair of queries intended to transfer $1000 from account 2 to account 1:

 update account set balance = balance - 1000 where number = 2; update account set balance = balance + 1000 where number = 1;

Our updates are both relative, but it is important that these two queries be run together for sensible results. The total amount of money in the system should be the same after the queries as before. Money should be moved from account to account, not created or destroyed . If a power failure happened between running the first query and running the second query, our data would no longer be consistent.

In simple cases, a workaround is to collapse the queries into one SQL statement. In this case we could write this:

 update account as source, account as dest   set source.balance = source.balance  1000,     dest.balance = dest.balance + 1000   where source.number = 2 and dest.number = 1;

By using two aliases to the account table ( source and dest ), we have ensured that this is one atomic update that will either succeed or fail. We do not need to worry about the server stopping between parts of our operation.

In this case the only casualty is readability. The combined query is harder to read and debug than our first attempt.

In many cases, it may not be possible to collapse all the related queries into one like this. The solution in those cases, and anytime you want more readable code, is to use MySQL's transaction syntax. By marking a set of statements as a transaction, you inform the database that they are a related, indivisible set. They should be treated as an atomic unit and either all succeed or all have no effect. You can run these two queries as a single transaction using the following SQL statements:

 start transaction; update account set balance = balance - 1000 where number = 2; update account set balance = balance + 1000 where number = 1; commit;

An important property of transactions is that they are not visible to other sessions until they are complete and committed. No other thread can read inconsistent data from the table(s) while you are in the process of updating it.

Another benefit of transactions is that partially performed transactions can be undone. As long as we attempt to roll back the transaction before we have committed it, then any changes made by queries that are part of the transaction will be undone.

In the case of our transfer example, if we added a SELECT statement to check that we were not removing more money from the source account than it contained, we could use the keyword ROLLBACK if we wanted to cancel the whole transaction. The syntax would look like this:

 start transaction; update account set balance = balance - 1000 where number = 2; update account set balance = balance + 1000 where number = 1; select balance from account where number = 2; # select tells us that account #2 has a negative balance! # we'd better abort rollback;

Calling ROLLBACK aborts the transaction and undoes any changes it would have made. A transaction that was rolled back instead of committed leaves no trace in the data. Because partial results were never visible to other sessions, it is exactly as though it never happened.