Transactions | Microsoft SQL Server 2005 Stored Procedure Programming in T-SQL & .NET

Even from the very name of the Transact-SQL language, you can conclude that transactions play a major role in SQL Server. They are an important mechanism for enforcing the consistency and integrity of the database.

A transaction is the smallest unit of work in SQL Server. To qualify a unit of work as a transaction, it must satisfy the following four criteria, often referred to as the ACID test:

Atomicity All data changes must be completed successfully, or none of them will be written permanently to the database.
Consistency After a transaction, the database must be left in a consistent state. All rules must be applied during processing to ensure data integrity. All constraints must be satisfied. All internal data structures must be left in an acceptable state.
Isolation Changes to the database made by a transaction must not be visible to other transactions until the transaction is complete. Before the transaction is committed, other transactions should see the data only in the state it was in before the transaction.
Durability Once a transaction is completed, changes must not revert even in the case of a system failure.

Autocommit Transactions

In fact, every Transact-SQL statement is a transaction. When it is executed, it either finishes successfully or is completely abandoned. To illustrate this, let's try to delete all records from the EqType table. Take a look at the following diagram:

image from book

A foreign key relationship exists between the EqType and Equipment tables. The foreign key prevents the deletion of records in the EqType table that are referenced by records in the Equipment table.

Let's try to delete them anyway. You can see the result of such an attempt in Figure 5-5.

image from book
Figure 5-5: Complete failure of attempt to delete records

Two Select statements that will count the number of records in EqType are placed around the Delete statement. As expected, the Delete statement is aborted because of the foreign key. The count of records before and after the Delete statement is the same, which confirms that all changes made by the Delete statement were canceled. So the database remains in the state that it was in before the change was initiated.

If there were no errors, SQL Server would automatically commit the transaction (that is, it would record all changes) to the database. This kind of behavior is called autocommit.

In this case, SQL Server deleted records one after the other from the EqType table until it encountered a record that could not be deleted because of the foreign key relationship, at which point the operation was canceled.

Explicit Transactions

The most popular and obvious way to use transactions is to give explicit commands to start or finish the transaction. Transactions started in this way are called explicit transactions. You can group Transact-SQL statements into a single transaction using the following statements:

Begin Transaction
Rollback Transaction
Commit Transaction

If anything goes wrong with any of the grouped statements, all changes need to be aborted. The process of reversing changes is called rollback in SQL Server terminology. If everything is in order with all statements within a single transaction, all changes are recorded together in the database. In SQL Server terminology, these changes are committed to the database.

I will demonstrate the use of these processes on the ap_LeaseSchedule_Clear stored procedure. Its main purpose is to set monthly lease amounts to zero for each asset associated with an expired lease schedule. It also sets the total of the lease amounts to zero. These two operations must be performed simultaneously to preserve the integrity of the database.

      Create Procedure ap_LeaseShedule_Clear      -- Set value of Lease of all equipment      -- associated with expired Lease Schedule to 0.      -- Set total amount of Lease Schedule to 0.           @intLeaseScheduleId int      As      Begin Transaction      -- Set value of Lease of all equipment      -- associated with expired Lease Schedule to 0      Update dbo.Inventory      Set Lease = 0      Where LeaseScheduleId = @intLeaseScheduleld      If @@Error <> 0 goto PROBLEM      -- Set total amount of Lease Schedule to 0      Update dbo.LeaseSchedule      Set PeriodicTotalAmount = 0      Where ScheduleId = @intLeaseScheduleld      If @@Error <> 0 goto PROBLEM      Commit Transaction      Return 0      PROBLEM:      Print ' Unable to eliminate lease amounts from the database!'      Rollback Transaction      Return 1

Before the real processing starts, the Begin Transaction statement notifies SQL Server to treat all of the following actions as a single transaction. It is followed by two Update statements. If no errors occur during the updates, all changes are committed to the database when SQL Server processes the Commit Transaction statement, and finally the stored procedure finishes. If an error occurs during the updates, it is detected by If statements and execution is continued from the PROBLEM label. After displaying a message to the user, SQL Server rolls back any changes that occurred during processing.

We will review more complex transactions (including nested transactions) and ways to process errors in the next chapter.

Implicit Transactions

The third transaction mode is called the implicit transaction. To use this mode, you must set the Set Implicit_Transactions On statement for the connection. Any of the following statements will serve as an implicit start to a transaction:

Alter Table	Create	Delete	Drop
Fetch	Grant	Insert	Open
Revoke	Select	Truncate Table	Update

To finish the transaction, you must use the Commit Transaction or Rollback Transaction statement. After that, any of the preceding commands will start a new implicit transaction.

Transaction Processing Architecture

An explanation of how transactions are implemented in Microsoft SQL Server will give you some insight into many processes.

Every change to the database is recorded in a transaction log before it is written to the appropriate tables. Transaction logs are implemented in separate files (or sets of files) with the extension .Idf. All modifications are written to this file chronologically. The records in this transaction log can later be used to roll back the transaction (thus providing atomicity) or to commit the changes to the database (thus providing durability). Two types of records can be stored in transaction logs:

Logical operations performed (for instance, Insert, Update, Delete, and start of transaction)
Before and after images of the changed data (that is, copies of data before and after the change is made)

Note

The transaction log does not record queries that are executed against the database (since they do not modify its content).

The transaction log mechanism helps to resolve data integrity issues such as:

If a client application loses its connection before a transaction is finished, SQL Server will detect a problem and roll back changes to ensure consistency.
If the machine loses power during processing, SQL Server will recover the database when services are restored. All transactions that were recorded in the transaction log in an undivided manner (that is, as part of a complete transaction set) are rolled forward (written to data tables) as if nothing unusual has happened. All transactions that were not completed before the problem occurred are rolled back (deleted) from the database.

Note

The transaction log also plays an important role in the implementation of backups in SQL Server. When a user starts a full backup, SQL Server records a complete snapshot of the data tables in backup files. At that point, SQL Server marks the current position in the transaction log and continues to record all changes to the database in the transaction log. Transactions logged during the process are also recorded as part of the full backup. When the backup is complete, SQL Server makes another mark in the transaction log. At the time of the next backup, a transaction log backup will suffice. To restore the database, an administrator first uses the full backup and then one or more transaction log backups that have been run since the full backup. SQL Server runs through the transaction log and applies changes to the data tables.

Nested Transactions

SQL Server allows you to nest transactions. Basically, this feature means that a new transaction can start even though the previous one is not complete:

      Begin transaction      . . .           Begin transaction           . . .           Commit transaction      . . .      Commit transaction

Usually this situation occurs when one stored procedure containing a transaction calls another stored procedure that also contains a transaction. In the following example, ap_Order_Complete_l completes an order by setting its completion date and changing the status of the order, and then looping through associated order items and calling ap_OrderItem_Complete_l to complete each of them. ap_Order!tem_Complete_l sets the completion date of an order item to the last ChargeLog date associated with that Orderltem. Both of these procedures contain a transaction.

      Create Procedure ap_Order_Complete_l      -- Complete all orderltems and then complete order          @intOrderId int,          @dtsCompletionDate smalldatetime      As      set nocount on      Declare @intErrorCode int,              @i int,              @intCountOrderltems int,              @intOrderltemId int      Select @intErrorCode = @@Error      If @intErrorCode = 0         Begin Transaction      -- complete order      If @intErrorCode = 0      Begin           Update dbo. [Order]           Set CompletionDate = @dtsCompletionDate,               OrderStatusId = 4 -- completed           Where OrderId = @intOrderId           Select @intErrorCode = @@Error      End      -- loop through Orderltems and complete them      If @intErrorCode = 0      Begin           Create Table #OrderItems(                 id int identity(1,1),                 OrderItemId int)          Select @intErrorCode = @@Error      End      -- collect orderltemIds      If @intErrorCode = 0      Begin           Insert Into #OrderItems(OrderltemId)                Select ItemId                From dbo.Orderltem                Where OrderId = @intOrderId                Select @intErrorCode = @@Error      End      If @intErrorCode = 0      Begin           Select @intCountOrderltems = Max (Id),                  @i = 1           From #OrderItems           Select @intErrorCode = @@Error      End      while @intErrorCode = 0 and @i <= @intCountOrderltems      Begin           If @intErrorCode = 0           Begin                Select @intOrderltemId = OrderltemId                From #OrderItems                Where id = @i               Select @intErrorCode = @@Error           End           If @intErrorCode = 0                 Exec @intErrorCode = dbo.ap_OrderItem_Complete_1 @intOrderItemId           If @intErrorCode = 0                Set @i = @i + 1      End      If OintErrorCode = 0 and @@trancount > 0             Commit Transaction      Else             Rollback Transaction      return @intErrorCode      Go      create Procedure dbo.ap_OrderItem_Complete_1      -- Set CompletionDate of Orderltem to date      --of last ChargeLog record associated with OrderItem.          @intOrderItemId int      As      set nocount on      Declare @intErrorCode int      Select @intErrorCode = @@Error      If @intErrorCode = 0           Begin Transaction      -- Set CompletionDate of OrderItem to date      --of last ChargeLog record associated with OrderItem.      If @intErrorCode = 0      Begin           update dbo.OrderItem           Set CompletionDate = (Select Max(ChargeDate)                                from dbo.ChargeLog                                where ItemId = @intOrderItemId)           Where ItemId = @intOrderItemId           Select @intErrorCode = @@Error      End      If @intErrorCode = 0      Begin           exec @intErrorCode = dbo .ap_NotifyAccounting @intOrderItemId      End      If @intErrorCode = 0 and @@trancount > 0           Commit Transaction      Else           Rollback Transaction      Return @intErrorCode

In the case of nested transactions, no Commit statements except the outer one will save changes to the database. Only after the last transaction is committed will all changes to the database become permanent. Up to that point, it is still possible to roll back all changes.

The interesting question is how SQL Server knows which transaction is the last one. It keeps the number of opened transactions in the @@trancount global variable for each user connection. When SQL Server encounters a Begin Transaction statement, it increments the value of the @@trancount, and when SQL Server encounters a Commit Transaction statement, it decrements the value of the @@trancount. Therefore, the only effect of a nested (internal) Commit Transaction statement is a change to the @@trancount value. Only the outer Commit Transaction statement (when @@trancount = 1) stores changes in data tables rather than in the transaction log.

The following is a purely academic example that does not perform any real processing, but it demonstrates the effect of nested transactions on the @@trancount global variable:

      print 'Trancount = ' + Convert(varchar(4), @@trancount)      BEGIN TRANSACTION          print 'Trancount = ' + Convert(varchar(4), @@trancount)          BEGIN TRANSACTION          print 'Trancount = ' + Convert(varchar(4), @@trancount)          COMMIT TRANSACTION          print 'Trancount = ' + Convert(varchar(4), @@trancount)     COMMIT TRANSACTION     print 'Trancount = ' + Convert(varchar(4), @@trancount)

Each transactional statement will increment and decrement the ©©trancount:

      Trancount = 0      Trancount = 1      Trancount = 2      Trancount = 1      Trancount = 0

An interesting inconsistency to observe is in the behavior of the Rollback Transaction statement. No matter how many transaction levels deep execution extends, the Rollback Transaction statement will cancel all changes caused by all transactions (and bring the @@trancount value down to zero). In fact, if you execute an additional Rollback Transaction statement after the first one, SQL Server will report an error.

      print 'Trancount = ' + Convert(varchar(4), @@trancount)      BEGIN TRANSACTION          print 'Trancount = ' + Convert(varchar(4), @@trancount)          BEGIN TRANSACTION          print 'Trancount = ' + Convert(varchar(4), @@trancount)          ROLLBACK TRANSACTION          print 'Trancount = ' + Convert(varchar(4), @@trancount)      ROLLBACK TRANSACTION      print 'Trancount = ' + Convert(varchar(4), @@trancount)

The following is the result of this example:

      Trancount = 0      Trancount = 1      Trancount = 2      Trancount = 0      Server: Msg 3903, Level 16, State 1, Line 8      The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.      Trancount = 0

Tip

I have to admit that I had many problems with this issue at one time. Be careful.

To prevent this error, you need to test for the value of the @@trancount variable before you execute the Rollback Transaction statement. A simple way to test for this value works something like this:

      if @@trancount > 0           Rollback Transaction

You will find a much better solution in Chapter 6.

Named Transactions

Transaction statements can be named. The name must be a valid SQL Server identifier (that is, no more than 128 characters), but SQL Server will read only the first 32 characters:

      Begin Tran[saction][transaction_name @transaction_name_variable]      Commit Tran[saction][transaction_name @transaction_name_variable]      Rollback [Tran[saction][transaction_name @transaction_name_variable]]

I know that this sounds like a perfect tool for resolving some issues with nested transactions. Unfortunately, in nested transactions, only the names of outer transactions are recorded by SQL Server. If you try to roll back any of the inner transactions, errors occur. The following listing is an academic demonstration of such an attempt:

      BEGIN TRANSACTION tl          BEGIN TRANSACTION t2          ROLLBACK TRANSACTION t2      ROLLBACK TRANSACTION tl      SQL Server will return an error:      Server: Msg 6401, Level 16, State 1, Line 3      Cannot roll back t2. No transaction or savepoint of that name was found.

Tip

You can see that you need to know the name of the outer transaction that has called all other stored procedures/transactions. This is not a practical requirement, especially when your stored procedure will he called from more than one stored procedure. Therefore, I recommend that you do not use transaction names.

Savepoints

SQL Server contains a mechanism for rolling back only part of a transaction. This statement may seem to contradict the basic idea of a SQL Server transaction as I have explained it, but it can be justified in some cases. Microsoft recommends savepoints be used if it is more expensive to check whether the change will be valid in advance (for example, because of a slow connection) and when the operation has a high probability of success. For example, assume that you are trying to reserve a set of plane tickets (or to get a set of some other resources) using different companies (distributed database system).

Each leg of a journey has to be booked separately. If the reservation fails, you will roll back just that leg of the journey, not all the reservations that you already successfully made. Only in the case that it is impossible to find any alternative for the remaining part of the journey will you roll back the complete transaction.

To mark a savepoint in a transaction, use the following statement:

      Save Tran[saction] {savepoint_name|@savepoint_variable}

The savepoint's name is also a SQL Server identifier, but SQL Server reads only the first 32 characters.

To roll back part of the transaction, you must use the savepoint name or variable:

      Rollback Tran[saction]{savepoint_name @savepoint_variable}

Note

Rollback Transaction statements without a savepoint will roll back the complete transaction.

Savepoints do not save anything to the database. They just mark the point to which you can roll back a transaction. Resources (like locks) also stay in place after a Save Transaction statement. They are released only when a transaction has been completed or canceled.

The following procedures are designed to store an order and a set of order items in a database. The ap_ScrapOrderItem_Save stored procedure uses savepoints to roll back the insertion of a particular item.

      Create Procedure dbo.ap_ScrapOrder_Save      -- save order information.           @dtsOrderDate smalldatetime,           @intRequestedById int,           @dtsTargetDate smalldatetime,           @chvNote varchar(200),           @insOrderTypeId smallint,           @inyOrderStatusId tinyint      As           Set nocount on           Insert dbo. [Order] (OrderDate,    RequestedById,                                TargetDate,   Note,                                OrderTypeId,  OrderStatusId)          Values (@dtsOrderDate,   @intRequestedById,                  @dtsTargetDate,  @chvNote,                  @insOrderTypeld, @inyOrderStatusId)      Return @@identity      Go      Create Procedure dbo.ap_ScrapOrderItem_Save      -- Saves order item.      -- If error occurs, this item will be rolled back,      -- but other items will be saved.      -- demonstration of use of Save Transaction      -- must be called from sp or batch that initiates transaction           @intOrderId int,           @intlnventoryId int,           @intOrderItemId int OUTPUT      As           Set nocount on           Declare  @intErrorCode int,                    @chvlnventoryId varchar(10)           -- name the transaction savepoint           Set OchvInventoryId = Convert(varchar, OintInventoryId)           Save Transaction @chvInventoryId           -- Set value of Lease of all equipment associated           -- with expired Lease Schedule to 0           Insert dbo.OrderItem (OrderId, Inventoryld)           Values (OintOrderId, @intInventoryId)           Select @intOrderItemId = @@identity,                  @intErrorCode = @@Error           If @intErrorCode <> 0           Begin                Rollback Transaction @chvInventoryId                Return @intErrorCode           End      Return 0      Go

Let's assume that the caller is some external application that is trying to fulfill an order by adding line item by line item. If one line item fails, the application will detect an error, roll back to the last savepoint, and try to add some other line item.

The stored procedures are designed in such a manner that a transaction must be initiated by the caller. You can test the stored procedures by using the following batch:

      Declare  @intOrderId int,               @intOrderItemId int      Begin Tran      Exec @intOrderId = dbo.ap_ScrapOrder_Save                @dtsOrderDate = '1/10/2003',                @intRequestedById = 1,                @dtsTargetDate = '1/1/2004',                @chvNote = NULL,                @insOrderTypeId =3, -- scrap                @inyOrderStatusId = 1 -- ordered      Exec dbo.ap_ScrapOrderItem_Save                @intOrderId,                5,                @intOrderItemId OUTPUT      Exec dbo.ap_ScrapOrderItem_Save                @intOrderId,                6,                @intOrderItemId OUTPUT      Exec dbo.ap_ScrapOrderItem_Save                @intOrderId,                8,                @intOrderItemId OUTPUT      Commit Tran

In nested transaction statements, transaction names are ignored or can cause errors. If you are using transactions in stored procedures, which could be called from within other transactions, do not use transaction names. In the previous example, although stored procedures with transaction names are called from a batch (it could have been implemented as a stored procedure), the transaction itself was not nested.

Locking

Let me remind you of the requirements represented by the so-called ACID test. The isolation requirement means that changes to the database made by a transaction are not visible to other transactions that are themselves in an intermediate state at the time of that transaction's completion, and that before the transaction is committed, other transactions can see data only in the state it was in before the transaction.

To satisfy the isolation requirement, SQL Server uses locks. A lock is a restriction placed on the use of a resource in a multi-user environment. It prevents other users (that is, processes) from accessing or modifying data in the resource. SQL Server automatically acquires and releases locks on resources in accordance with the actions a user performs. For example, while the user is updating a table, nobody else can modify (and in some cases, even see) records that are already updated. As soon as all updates connected to the user action are completed, the locks are released and the records become accessible.

There is just one problem with this process. Other users have to wait for the resource to become available again—they are blocked. Such blocking can lead to performance problems or even cause a process to fail. The use of locking is a trade-off between data integrity and performance. SQL Server is intelligent enough to handle most problems, and it does a great job of preventing problems. It is also possible to control locking using transaction isolation levels and optimizer (lock) hints, both of which are described in the next section. Locks can have different levels of granularity. They can be acquired on

Rows
Pages
Keys
Ranges of keys
Indexes
Tables
Databases

SQL Server automatically acquires a lock of the appropriate granularity on a resource. If SQL Server determines during execution that a lock is no longer adequate, it dynamically changes the lock's granularity.

Locks are acquired by connection. Even if two connections are initiated from the same application, one can block the other.

The type of lock acquired by SQL Server depends on the effect that the change statement will have on the resource. For example, different locks are applied for the Select statement and the Update statement. There are five lock types:

Shared (read) locks Usually acquired for operations that do not modify data (that is, read operations). Another transaction can also acquire a nonexclusive lock on the same record, and thus the lock is shared. The shared lock is released when the transaction moves on to read another record.
Exclusive (write) locks Acquired for statements that modify data (such as Insert, Update, and Delete). Only one exclusive lock on a resource can be held at a time. An exclusive lock can be acquired only after other locks on the resource (including shared locks) are released.
Update locks Resemble shared locks more than they do exclusive locks. They are used to notify SQL Server that a transaction will later modify a resource. They prevent other transactions from acquiring exclusive locks. Update locks can coexist with shared locks. Just before the resource is modified, SQL Server promotes the update lock to an exclusive lock.
Intent locks Set on an object of higher granularity to notify SQL Server that a process has placed a lock of lower granularity inside the object. For example, if a transaction places a lock on a page in a table, it will also place an intent lock on the table. The intent lock means that SQL Server does not have to scan the whole table to find out if a process has placed a lock on some page or record inside, in order to place a table lock for another transaction. In fact, there are three different types of intent locks: IS (intent share), IX (intent exclusive), and SIX (shared with intent exclusive).
Schema locks Prevent the dropping or modifying of a table or index while it is in use. There are two types of schema locks. Sch-S (schema stability) locks prevent table or index drops. Sch-M (schema modification) locks ensure that other transactions cannot access the resource while it is being modified.

Transaction Isolation Levels and Hints

You can change the default behavior of SQL Server using transaction isolation levels or lock hints. Transaction isolation levels set locking at the connection level, and lock hints set locking at the statement level. SQL Server can work on five different transaction isolation levels:

Serializable The highest level in which transactions are completely isolated. The system behaves as though the transactions are occurring one after another. SQL Server will hold locks on both data and key records until the end of the transaction. This may lead to some performance issues.
Repeatable Read Forces SQL Server to place shared locks on data records and hold them until the transaction is completed. Unfortunately, it allows phantoms, which occur when a transaction reads a range of records. There is no guarantee that some other concurrent transaction will not add records that fall in the range or modify keys of records so that they fall out of the range. If the uncommitted transaction repeats the read, the result will be inconsistent.
Read Committed The default level in SQL Server. SQL Server places shared locks while reading. It allows phantoms and nonrepeatable reads. There is no guarantee that the value of the record that a transaction reads multiple times during execution will stay consistent. Some other transaction could change it.
Read Uncommitted The lowest level of isolation in SQL Server. It ensures that physically corrupt data is not read. SQL Server will not place shared locks, and it will ignore exclusive locks. You will have the fewest performance issues when using this level, but you will also likely have many data integrity problems. It allows phantoms, nonrepeatable reads, and dirty reads (everybody can see the content of the changed record, even if a transaction is not yet committed and could potentially be rolled back).
Snapshot A new level of isolation introduced in SQL Server 2005. It is designed to support databases that use row versioning to reduce blocking of readers and writers. The engine stores versions of rows before they were modified in tempdb and serves them to readers until the transaction is completed. This reduces blocking time but increases the need for processing power and memory. It could lead to concurrency conflicts. Readers do not issue shared locks.

Note

There is one database setting that changes the behavior of SQL Server by using the same row versioning mechanism as Snapshot isolation level. It is also introduced in SQL Server 2005. In some ways it is like Snapshot and in some ways like Read Committed isolation level. Like Snapshot, it is based on row versioning. Since this is a database setting, it will preserve earlier versions of all rows changed in the database (not just rows that are changed in a session set to Snapshot isolation level). Like with Read Committed, SQL Server can return to readers consistent versions of rows (before the transaction). But since SQL Server can return to readers earlier versions of all rows that are still being changed or locked in transactions, readers do not have to wait for transactions to be completed and therefore do not have to issue read locks.

The isolation level is specified in the Set Transaction Isolation Level statement. For example:

      Set Transaction Isolation Level Repeatable Read

Locking hints change the behavior of the locking manager as it processes a single Transact-SQL statement. They overwrite behavior set by the transaction isolation level. The following table describes hints that can be used to control locking:

Hints	Description
Holdlock or Serializable	Holds a shared lock until a transaction is completed. The lock will not be released when the resource is no longer needed, but rather when the transaction is completed.
Nolock	This hint applies only to Select statements. SQL Server will not place shared locks and it will ignore exclusive locks.
Updlock	Uses update instead of shared locks while reading a table.
Rowlock	Specifies the granularity of locks at the row level.
Paglock	Specifies the granularity of locks at the page level.
Tablock	Specifies the granularity of locks at the table level.
Tablockx	Specifies the granularity of locks at the table level and the type of lock to be exclusive.
Readcommitted	Equivalent to the default isolation level (Read Committed).
Readpast	This hint is applicable only in Select statements working under the Read Committed isolation level. Result sets created with this hint will not contain records locked by other transactions.
Readuncommitted	Equivalent to the Read Uncommitted isolation level.
Repeatableread	Equivalent to the Repeatable Read isolation level.

Locking hints can be used in Select, Insert, Update, or Delete statements. They are set after the table reference in SQL statements (for example, in the From clause of a Select statement or in the Insert clause of an Insert statement). Their scope is just the table that they are used for. For example, the following command will hold a lock until the transaction is completed:

      Select *      From Inventory With (HOLDLOCK)      Where InventoryId = OintInventoryId

Nobody will be able to change data records that were read and keys that match the criteria of this table until the transaction is completed. Therefore, this table cannot have phantoms, nonrepeatable reads, or dirty reads.

The next example demonstrates the use of hints in an Update statement and the use of more than one hint in a statement:

      Update Inventory With (TABLOCKX, HOLDLOCK)      Set StatusId = 4      Where StatusId = OintStatusId

The complete table will be locked for the duration of the transaction.

Distributed Transactions

Microsoft Distributed Transaction Coordinator (MSDTC) is a component that allows you to span transactions over two or more servers while maintaining transactional integrity.

Servers in this scenario are called resource managers, and MSDTC performs the function of transaction manager. In fact, all those resource managers do not even have to be Microsoft servers; they just have to be compatible with MSDTC. For example, it is possible to execute a single transaction against databases on Microsoft SQL Server and Oracle.

When transactions are distributed over different resource managers, different mechanisms have to be applied by the transaction coordinator to compensate for problems that might occur in such an environment. A typical problem is network failure. For example, everything might be executed properly by each individual resource manager, but if the transaction coordinator is not informed due to a network failure, the result is the same as if one of the resource managers had failed, and the transaction will be rolled back.

The mechanism for dealing with such problems is called the two-phase commit (2PC). As the name implies, it consists of two phases:

Prepare phase Starts when a transaction manager receives a request to execute a transaction. It notifies the resource managers and informs them of the work that needs to be done. The resource managers perform all changes and even write everything from the transaction login memory to the disk. When everything is completed, each resource manager sends a status message indicating success or failure to the transaction manager.
Commit phase Starts when the transaction manager receives messages from resource managers. If the resource managers successfully complete the preparation phase, the transaction manager sends a Commit command to the resource managers. Each of them makes the changes permanently to the database and reports the success of the operation to the transaction manager. If any of the resource managers reports failure during the preparation phase, the transaction manager will send a Rollback command to all resource managers.

From a developer's point of view, distributed transactions are very similar to regular transactions. The major difference is that you need to use the following statement to start the transaction:

      Begin Distributed Tran[saction]    [transaction_name]

Distributed transactions can also be started implicitly, by executing a query or stored procedure that will be run against distributed servers.

Transactions are completed with regular Commit or Rollback statements. The following stored procedure updates two tables in a local database and then updates information in a remote database using a remote stored procedure:

      Alter Procedure ap_LeaseShedule_Clear_distributed      -- Set value of Lease of all equipment associated to 0      -- Set total amount of Lease Schedule to 0.      -- notify lease company that lease schedule is completed           @intLeaseScheduleId int      As           Declare @chvLeaseNumber varchar(50),                   @intError int           -- Verify that lease has expired           If GetDate@ < (Select EndDate                             From dbo.LeaseSchedule                             Where ScheduleId = OintLeaseScheduleId)              Raiserror ('Specified lease schedule has not expired yet!', 16,1)           If @@Error <> 0           Begin                Print 'Unable to eliminate lease amounts from the database!'                Return 50000           End           -- get lease number           Select @chvLeaseNumber = Lease.LeaseNumber           From dbo.Lease Lease           Inner Join dbo.LeaseSchedule LeaseSchedule           On Lease.LeaseId = LeaseSchedule.LeaseId           Where (LeaseSchedule.ScheduleId = OintLeaseScheduleId)           Begin Distributed Transaction           -- Set value of Lease of all equipment associated to 0           Update dbo.Inventory           Set Lease = 0           Where LeaseScheduleId = OintLeaseScheduleId          If @@Error <> 0 Goto PROBLEM          -- Set total amount of Lease Schedule to 0          Update LeaseSchedule          Set PeriodicTotalAmount = 0          Where ScheduleId = OintLeaseScheduleId          If @@Error <> 0 Goto PROBLEM          -- notify lease vendor      Exec SintError = lease_srvr.LeaseShedules..prLeaseScheduleComplete                          @chvLeaseNumber, @intLeaseScheduleId           If @intError <> 0 GoTo PROBLEM           Commit Transaction           Return 0      PROBLEM:           print 'Unable to complete lease schedule!'           Rollback Transaction      Return 50000

Apart from a reference to the remote stored procedure, the only thing that needed to be done was to use the Distributed keyword to start the transaction. Everything else was managed by MSDTC.

Typical Transaction-related Problems

Transactions are a powerful weapon in the hands of a programmer, but improper use can cause substantial damage. I will try to forewarn you of some typical problems.

A Never-ending Story

The worst thing that you can do is to explicitly open a transaction and then forget to close it. All changes sent to the database through that connection will become part of that transaction; resources normally released at the end of a transaction are held indefinitely; other users cannot access resources; and eventually, your server chokes.

Spanning a Transaction over Batches

A transaction can span batches. SQL Server counts transactions over the connection, so it is "legal" to issue two batches like this over one connection:

      Begin Transaction      update dbo.Inventory      set Lease = 0      where LeaseScheduleId = 141      Go      update dbo.LeaseSchedule      Set PeriodicTotalAmount = 0      where ScheduleId = 141      Commit Transaction      Go

However, I cannot think of any justification for doing so, and you significantly increase the probability of error. For example, you could easily forget to finish the transaction.

Tip

There are some cases in which it is justified for a transaction to span hatches; for example, when a DDL statement must he in a separate hatch.

Rollback Before Begin

Sometimes you might set your error handling so that all errors that occur in a stored procedure are treated in the same way. Naturally, you will include a statement to roll back the transaction. If an error occurs before the transaction starts, the stored procedure will jump to the error handling code and another error will occur:

      Create Procedure dbo.ap_LeaseShedule_Clear_l      -- Set value of Lease of all equipment associated      -- with expired Lease Schedule to 0      -- Set total amount of Lease Schedule to 0.           @intLeaseScheduleId int      As           -- Verify that lease has expired           If GetDate() < (select EndDate                          from dbo.LeaseSchedule                          where ScheduleId = @intLeaseScheduleId)         Raiserror ('Specified lease schedule has not expired yet!', 16,1)        -- If error occurs here,        -- server will execute Rollback before transaction is started!        if @@Error <> 0 goto PROBLEM        Begin Transaction        -- Set value of Lease of all equipment associated        -- with expired Lease Schedule to 0        update dbo.Inventory        set Lease = 0        where LeaseScheduleId = @intLeaseScheduleId        if @@Error <> 0 goto PROBLEM        -- Set total amount of Lease Schedule to 0        update dbo.LeaseSchedule        Set PeriodicTotalAmount = 0        where ScheduleId = @intLeaseScheduleId        if @@Error <> 0 goto PROBLEM        commit transaction        return 0      PROBLEM:      print 'Unable to eliminate lease amounts from the database!'             rollback transaction      return 1

Multiple Rollbacks

Unlike Commit statements, only one Rollback statement is required to close a set of nested transactions. In fact, if more than one Rollback statement is executed, SQL Server will raise another error.

Long Transactions

SQL Server places locks on data that has been modified by a transaction, to prevent other users from further changing the data until the transaction is committed. This feature can lead to problems if a transaction takes "too long" to complete.

Note

There is no exact definition of "too long." The longer a transaction works, the greater the likelihood that problems will occur.

Some of the problems that might occur if a long transaction is present in the database include the following:

Other users are blocked. They will not be able to access and modify data.
The transaction log fills up. (SQL Server 2000 and SQL Server 7.0 can be configured to automatically increase the size of the transaction log, but you could fill your disk as well.)
Most of the time, transaction log work is performed in memory. If all available memory is used before the transaction is complete, SQL Server will start saving changes to disk, thus reducing the overall performance of the server.

Tip

You should be particularly aware of concurrency problems because tbey are the problems most likely to bappen. Wbile you are developing applications, you will probably work alone (or in a small group) on the server, but the situation will cbange drastically wben you place 50,250, or 5000 concurrent users on the production server.