Multiversion Concurrency Control

I l @ ve RuBoard

PostgreSQL makes use of Multiversion Concurrency Control (MVCC) to maintain data consistency. Understanding, at least in concept, how MVCC works can be beneficial to a PostgreSQL administrator or developer.

Most popular RDBMSs make use of table or row locks to maintain database consistency. Typically, these locks occur at the physical level of the file. These locks are used to prohibit two or more instances from writing to the same row (or table) concurrently.

PostgreSQL uses a more advanced method for ensuring database integrity. In MVCC, each transaction sees a version of the database as it existed at some near point in the past. It is import to distinguish that the transaction is not seeing the actual data, just a previous version of that data.

This prevents the current transaction from having to deal with the database arriving at an inconsistent state due to other concurrent database transactions. In essence, once a transaction is started, that transaction is an island unto itself. The underlying data structures are isolated from other transactions' manipulations. Once that transaction has ended, the changes it made to its specific version of the database are merged back into the actual data structures.

There are three types of concurrency problems that any RDBMS has to deal with:

Dirty reads. A transaction reads data written by another uncommitted process.
Phantom reads. A transaction re-executes a query because the underlying data has changed in such a way as to make a different result set occur.
Nonrepeat reads. A transaction rereads data and finds data that has changed due to another transaction having been committed since the first read occurred.

PostgreSQL provides for READ COMMITTED and SERIALIZABLE isolation levels of protection that offer the following:

Level	Dirty Read	Phantom Read	Nonrepeat Read
`READ COMMITTED`	Not possible	Possible	Possible
`SERIALIZABLE`	Not possible	Not possible	Not possible

`READ COMMITTED` Level

This is the default method of isolation protection in PostgreSQL. READ COMMITTED level prevents queries from seeing data changes after the transaction has been started. However, the transaction will see previous changes it made to the table while in process.

The crucial point in understanding READ COMMITTED isolation is what happens during an UPDATE , DELETE , or SELECT FOR UPDATE command being run by another transaction. Instances such as these can cause only partial isolation support. The following steps outline how such a scenario can occur:

Transaction A will wait for Transaction B to complete.
If Transaction B issues a ROLLBACK , then Transaction A will continue as usual.
If Transaction B completes with a COMMIT , then Transaction A will re-execute its query to make certain that no criterion has changed that would result in it not needing to be run (for example, if the row was deleted by Transaction B).
If the row still matches the criterion, then the update will continue. (Note: See the paragraph following this list.)
The row is then doubly updated, and other waiting statements in Transaction A will continue to execute.

The important point to notice is what happens in step 4. At this point, Transaction A has a new version of the database it is using. This occurred when Transaction A re-executed its query. At that point, it was using a new version of the database as its baseline. Therefore, subsequent statements in Transaction A will operate on the changes made by Transaction B. So, in this way, the transaction isolation is only partial. It is possible for transactions to "seep" into each other in specific cases like these.

`SERIALIZABLE` Level

This isolation level differs from READ COMMITTED in that each transaction must occur in a scheduled serial manner. No transactions can occur that would result in one transaction acting on another transaction's modifications.

This is enforced by strict transaction scheduling. This scheduling mandates that if one transaction completes successfully and thereby contaminates another transaction's read buffer, then the second transaction issues a ROLLBACK automatically.

The practical effect of this is that the database system must be constructed in such a way as to expect transaction failures and retry them afterward. On a heavily used system, this could mean that a significant percentage of transactions would fail due to this strict scheduling. Such a burden could make the system much slower than would occur under a straight READ COMMITTED database.

In the majority of cases, the READ COMMITTED isolation level is appropriate. However, there might be some queries that mandate such a rigorous approach to ensure data validity.

I l @ ve RuBoard

READ COMMITTED Level

SERIALIZABLE Level

`READ COMMITTED` Level

`SERIALIZABLE` Level