Multiversion Concurrency Control

I l @ ve RuBoard

PostgreSQL makes use of Multiversion Concurrency Control (MVCC) to maintain data consistency. Understanding, at least in concept, how MVCC works can be beneficial to a PostgreSQL administrator or developer.

Most popular RDBMSs make use of table or row locks to maintain database consistency. Typically, these locks occur at the physical level of the file. These locks are used to prohibit two or more instances from writing to the same row (or table) concurrently.

PostgreSQL uses a more advanced method for ensuring database integrity. In MVCC, each transaction sees a version of the database as it existed at some near point in the past. It is import to distinguish that the transaction is not seeing the actual data, just a previous version of that data.

This prevents the current transaction from having to deal with the database arriving at an inconsistent state due to other concurrent database transactions. In essence, once a transaction is started, that transaction is an island unto itself. The underlying data structures are isolated from other transactions' manipulations. Once that transaction has ended, the changes it made to its specific version of the database are merged back into the actual data structures.

There are three types of concurrency problems that any RDBMS has to deal with:

  • Dirty reads. A transaction reads data written by another uncommitted process.

  • Phantom reads. A transaction re-executes a query because the underlying data has changed in such a way as to make a different result set occur.

  • Nonrepeat reads. A transaction rereads data and finds data that has changed due to another transaction having been committed since the first read occurred.

PostgreSQL provides for READ COMMITTED and SERIALIZABLE isolation levels of protection that offer the following:

Level

Dirty Read

Phantom Read

Nonrepeat Read

READ COMMITTED

Not possible

Possible

Possible

SERIALIZABLE

Not possible

Not possible

Not possible

READ COMMITTED Level

This is the default method of isolation protection in PostgreSQL. READ COMMITTED level prevents queries from seeing data changes after the transaction has been started. However, the transaction will see previous changes it made to the table while in process.

The crucial point in understanding READ COMMITTED isolation is what happens during an UPDATE , DELETE , or SELECT FOR UPDATE command being run by another transaction. Instances such as these can cause only partial isolation support. The following steps outline how such a scenario can occur:

  1. Transaction A will wait for Transaction B to complete.

  2. If Transaction B issues a ROLLBACK , then Transaction A will continue as usual.

  3. If Transaction B completes with a COMMIT , then Transaction A will re-execute its query to make certain that no criterion has changed that would result in it not needing to be run (for example, if the row was deleted by Transaction B).

  4. If the row still matches the criterion, then the update will continue. (Note: See the paragraph following this list.)

  5. The row is then doubly updated, and other waiting statements in Transaction A will continue to execute.

The important point to notice is what happens in step 4. At this point, Transaction A has a new version of the database it is using. This occurred when Transaction A re-executed its query. At that point, it was using a new version of the database as its baseline. Therefore, subsequent statements in Transaction A will operate on the changes made by Transaction B. So, in this way, the transaction isolation is only partial. It is possible for transactions to "seep" into each other in specific cases like these.

SERIALIZABLE Level

This isolation level differs from READ COMMITTED in that each transaction must occur in a scheduled serial manner. No transactions can occur that would result in one transaction acting on another transaction's modifications.

This is enforced by strict transaction scheduling. This scheduling mandates that if one transaction completes successfully and thereby contaminates another transaction's read buffer, then the second transaction issues a ROLLBACK automatically.

The practical effect of this is that the database system must be constructed in such a way as to expect transaction failures and retry them afterward. On a heavily used system, this could mean that a significant percentage of transactions would fail due to this strict scheduling. Such a burden could make the system much slower than would occur under a straight READ COMMITTED database.

In the majority of cases, the READ COMMITTED isolation level is appropriate. However, there might be some queries that mandate such a rigorous approach to ensure data validity.

I l @ ve RuBoard


PostgreSQL Essential Reference
PostgreSQL Essential Reference
ISBN: 0735711216
EAN: 2147483647
Year: 2001
Pages: 118
Authors: Barry Stinson

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net