Transactions and Transaction Boundaries | DB2 9 Fundamentals: Certification Study Guide

A transaction (also known as a unit of work) is a sequence of one or more SQL operations grouped together as a single unit, usually within an application process. Such a unit is called atomic (from the Greek word meaning "not able to be cut," because it is indivisible-either all of its work is carried out, or none of its work is carried out. A given transaction can perform any number of SQL operations-from a single operation to many hundreds or even thousands, depending on what is considered a "single step" within your business logic. (It is important to note that the longer a transaction is, the more database concurrency decreases and the more resource locks are acquired; this is usually considered a sign of a poorly written application.)

The initiation and termination of a single transaction defines points of data consistency within a database (we'll take a closer look at data consistency in Chapter 7, "Data Concurrency"); either the effects of all operations performed within a transaction are applied to the database and made permanent (committed), or the effects of all operations performed are backed out (rolled back) and the database is returned to the state it was in before the transaction was initiated. (Any data pages that were copied to a buffer pool on behalf of a transaction will remain in the buffer pool until their storage space is needed-at that time, they will be removed.)

In most cases, transactions are initiated the first time an executable SQL statement is executed after a connection to a database has been made or immediately after a pre-existing transaction has been terminated. Once initiated, transactions can be implicitly terminated using a feature known as "automatic commit" (in this case, each executable SQL statement is treated as a single transaction, and any changes made by that statement are applied to the database if the statement executes successfully or discarded if the statement fails), or they can be explicitly terminated by executing the COMMIT or the ROLLBACK SQL statement. The basic syntax for these two statements is:

 COMMIT <WORK>

and

 ROLLBACK <WORK>

When the COMMIT statement is used to terminate a transaction, all changes made to the database since the transaction began are made permanent. On the other hand, when the ROLLBACK statement is used, all changes made are backed out and the database is returned to the state it was in just before the transaction began. Figure 5-12 shows the effects of a transaction that was terminated with a COMMIT statement; Figure 5-13 shows the effects of a transaction that was terminated with a ROLLBACK statement.

image from book
Figure 5-12: Terminating a transaction with the COMMIT SQL statement.

image from book
Figure 5-13: Terminating a transaction with the ROLLBACK SQL statement.

It is important to remember that commit and rollback operations only have an effect on changes that have been made within the transaction they terminate. So in order to evaluate the effects of a series of transactions, you must be able to identify where each transaction begins, as well as when and how each transaction is terminated. Figure 5-14 shows how the effects of a series of transactions can be evaluated.

image from book
Figure 5-14: Evaluating the effects of a series of transactions.

Changes made by a transaction that have not been committed are usually inaccessible to other users and applications (there are exceptions, which we will look at in Chapter 7, "Data Concurrency," when we look at the Uncommitted Read isolation level), and can be backed out with a rollback operation. However, once changes made by a transaction have been committed, they become accessible to all other users and/or applications and can only be removed by executing new UPDATE or DELETE SQL statements (from a new transaction). So what happens if a system failure occurs before a transaction's changes can be committed? If the user/application has been disconnected (for example, because of a network failure), the DB2 Database Manager backs out all uncommitted changes (by replaying information stored in the transaction log files), and the database is returned to the state it was in just before the unexpectedly terminated transaction began. On the other hand, if the database or the DB2 Database Manager is terminated (for example, because of a hard disk failure or a loss of power), the next time the database is restarted (which will take place automatically the next time a user attempts to connect to the database if the database configuration parameter autorestart has been set accordingly), the DB2 Database Manager will try to roll back all open transactions it finds in the database's transaction log files. Only after this succeeds will the database be placed online again (i.e., made accessible to users and applications).

Transaction Management and Savepoints

Often, it is desirable to limit the amount of work performed within a single transaction so that locks acquired on behalf of the transaction are released in a timely manner. (When locks are held by one transaction, other transactions may be forced to wait for those locks to be freed before they can continue.) Additionally, if a large number of changes are made within a single transaction, it can take a considerable amount of time to back those changes out if the transaction is rolled back. However, using several small transactions to perform a single large task has its drawbacks as well. For one thing, the opportunity for data inconsistency to occur will be increased if business rules have to cross several transaction boundaries. Furthermore, each time a COMMIT statement is used to terminate a transaction, the DB2 Database Manager must perform extra work to commit the current transaction and start a new one. (Another drawback of having multiple commit points for a particular operation is that portions of an operation might be committed and therefore be visible to other applications before the operation is completed.)

To get around these issues, DB2 uses a mechanism known as a savepoint to allow an application to break the work being performed by a single large transaction into one or more subsets. By using savepoints, an application avoids the exposure to "dirty data" that might occur when multiple commits are performed, yet it provides granular control over an operation-you can use as many savepoints as you want within a single transaction as long as the savepoints used are not nested.

Savepoints are created by executing the SAVEPOINT SQL statement. The basic syntax for this statement is:

 SAVEPOINT [SavepointName] <UNIQUE> ON ROLLBACK RETAIN CURSORS <ON ROLLBACK RETAIN LOCKS>

where:

SavepointName

Identifies the name that is to be assigned to the savepoint to be created.

If the UNIQUE option is specified when the SAVEPOINT statement is executed, the name assigned to the savepoint created will be unique and cannot be reused by the application that created it as long as the savepoint is active.

Thus, if you wanted to create a savepoint named MY_SP, you could do so by executing a SAVEPOINT statement that looks like this:

 SAVEPOINT my_sp ON ROLLBACK RETAIN CURSORS

Once created, a savepoint can be used in conjunction with a special form of the ROLLBACK SQL statement to return a database to the state it was in at the point in time a particular savepoint was created. The syntax for this form of the ROLLBACK statement is:

 ROLLBACK <WORK> TO SAVEPOINT <[SavepointName]>

where:

SavepointName

Identifies the name assigned to the savepoint that indicates the point in time that operations performed against the database are to be rolled back (backed out) to.

When a savepoint is no longer needed, it can be released by executing the RELEASE SAVEPOINT SQL statement. The syntax for this statement is:

 RELEASE <TO> SAVEPOINT <[SavepointName]>

where:

SavepointName

Identifies the name assigned to the savepoint that is to be released.

Thus, suppose a table named EMPLOYEES has the following characteristics:

Open table as spreadsheet

Column Name	Data Type
EMPID	INTEGER
NAME	CHAR(20)
AGE	INTEGER

Now, suppose it was populated as follows:

Open table as spreadsheet

EMPID	NAME	AGE
1	MICK JAGGER	63
2	KEITH RICHARDS	63
3	RONNIE WOOD	59
4	CHARLIE WATTS	65

If the following sequence of SQL operations were to be performed within a single transaction, in the order shown:

 UPDATE employees SET empid = 3 WHERE empid = 2; SAVEPOINT s1 ON ROLLBACK RETAIN CURSORS; UPDATE employees SET empid = 5 WHERE empid = 3; SAVEPOINT s2 ON ROLLBACK RETAIN CURSORS; INSERT INTO employees VALUES (6, 'BILL WYMAN', 70); ROLLBACK TO SAVEPOINT s1; UPDATE employees SET empid = 2 WHERE empid = 4;

Then the records stored in the EMPLOYEES table would look like this:

Open table as spreadsheet

EMPID	NAME	AGE
1	MICK JAGGER	63
2	KEITH RICHARDS	63
3	RONNIE WOOD	59
4	CHARLIE WATTS	65

In this example, only the effects of the first and last update operation were retained; all other changes were removed when the ROLLBACK TO SAVEPOINT SQL statement was executed.

Once a savepoint is created, all subsequent SQL statements executed are associated with that savepoint until it is released - either explicitly by calling the RELEASE SAVEPOINT statement or implicitly by ending the transaction or unit of work that the savepoint was created in. In addition, when you issue a ROLLBACK TO SAVEPOINT SQL statement, the corresponding savepoint is not automatically released as soon as the rollback operation is completed. Instead, you can issue multiple ROLLBACK TO SAVEPOINT statements for a given transaction, and each time a ROLLBACK TO SAVEPOINT statement is executed, the database will be returned to the state it was in at the time the savepoint was created. (If multiple savepoints have been created, it is possible to rollback to any savepoint available; you are not required to rollback successively to every savepoint, in the opposite order in which they were created, to return the database to the state it was in when an earlier savepoint was created.)