What Is a Transaction?

A typical command in an OLTP application conducts several read and write operations to perform a single unit of work. In essence, the command executes a transaction. A transaction can be defined as a related set of operations that read and modify data on behalf of a single client. As you'll see, the requirements are quite high in an OLTP application that runs many transactions per second.

In this chapter, I'll use a sample application that lets clients submit sales orders to purchase products from the Animal Market database. The database schema for this application is shown in Figure 8-1. To keep things simple, each order is submitted on behalf of a single customer for a specific quantity of one type of product. An example is an order in which Bob buys five dogs for $100 ($20 apiece). The application must perform some validation and make the following updates to properly record the purchase:

  • Subtract the purchased quantity from the product's inventory level in the Products table
  • Add the purchase price to the customer's account balance in the Customers table
  • Add a new order record to the Orders table

Figure 8-1 The Animal Market application is based on this database schema.

In a real-world application, you'd want to allow a customer to buy multiple products in a single order. However, I've avoided such a design because it would require a fourth table to track order details. My simpler database design will allow you to focus more on important OLTP concepts and less on relational database theory.

Here are the SQL statements you need to submit to the DBMS to record the purchase:

 -- Decrement inventory.
 UPDATE Products
 SET Quantity = Quantity - 5
 WHERE Product = 'Dog'

 -- Charge customer.
 UPDATE Customers
 SET AccountBalance = AccountBalance + 100
 WHERE Customer = 'Bob'

 -- Add new order record.
 INSERT Orders(Customer, Product, Quantity, Price)
 VALUES('Bob', 'Dog', 5, 100)

You might also want to perform some validation checks before writing a sales order. Here's an example of writing a batch of Transact-SQL statements that performs a validation before attempting an update. The batch raises an error if the requested quantity isn't in stock.

 DECLARE @Quantity int

 SELECT @Quantity = Quantity
 FROM Products
 WHERE Product = 'Dog'

 -- Raise user-defined error 50001
 -- if fewer than five dogs are in stock.
 IF (@Quantity < 5)
 BEGIN
     RAISERROR(50001, 16, 1)
     RETURN
 END

 UPDATE Products
 SET Quantity = @Quantity - 5
 WHERE Product = 'Dog'

This example leverages SQL Server's ability to raise custom error messages back to the client. A call to RAISERROR can use custom error messages that have been added to an application with the system-provided stored procedure sp_addmessage. You should know how to properly raise errors from SQL statements and how to catch them from a Microsoft Visual Basic component.
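For example, registering the user-defined message 50001 used in the batch above might look something like the following sketch; the message text shown here is only an illustration, not the exact wording used by the sample application.

 -- Register user-defined error 50001 with SQL Server (run once per server).
 -- The message text below is illustrative.
 EXEC sp_addmessage
     @msgnum   = 50001,
     @severity = 16,
     @msgtext  = 'The requested quantity is not in stock.'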

A real-world sales order will most likely have other validation checks in addition to the one shown above. However, I'll leave the rest to your imagination. At this point, I've shown enough SQL so that we can begin our discussion of writing transactions.

Transactional Systems

Writing an OLTP application without transaction support from your DBMS or from the underlying platform would be incredibly difficult because so many things can go wrong during a transaction. Users often submit requests that violate business rules. For example, what if an order request asks for a product that isn't in inventory? What if a customer doesn't have the available credit to make the purchase? System failures can occur as well. What if the transaction log has filled up, leaving the database in a read-only condition? What if the DBMS computer is unreachable due to a system crash or a network failure? All these scenarios prevent the user from successfully completing a transaction.

If you didn't get any assistance from your DBMS, you'd have to write lots of contingency code to deal with just about every possible failure scenario. If a transaction successfully completes the first update but can't complete the second, the first operation must be undone. For example, let's say you remove five dogs from inventory, only to discover that the customer doesn't have enough money to buy them. You have to submit a compensating UPDATE statement against the Products table to replace the inventory you just removed. Or consider a more complex transaction, in which failure occurs on the tenth operation. You have to undo nine updates in order to return the database to its previous state.
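To make the problem concrete, here's a sketch of the kind of compensating statement you'd have to submit by hand to put the five dogs back, assuming no other client has touched the row in the meantime:

 -- Compensating update: manually undo the earlier inventory decrement.
 -- Safe only if no other client has modified this row in between.
 UPDATE Products
 SET Quantity = Quantity + 5
 WHERE Product = 'Dog'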

Another problem with rolling back write operations in the manner that I've just described is that it doesn't even take concurrency into account. If a large number of clients were bombarding your application with many requests per second, you'd have to write all sorts of locking logic to deal with consistency and synchronization. For instance, look at the following SQL statement:

 UPDATE Table1
 SET Field1 = Field1 * 2
 WHERE ID = 'ID28'

The statement seems simple enough: It doubles the value of Field1. But how can you undo this update? If you can guarantee that no other transaction has changed the value of Field1, you can undo it with the following statement:

 UPDATE Table1
 SET Field1 = Field1 * 0.5
 WHERE ID = 'ID28'

However, if another transaction changes the value in between the submission of your initial SQL statement and your compensating undo statement, the data will be left in an inconsistent state. To guard against this, you would have to implement some kind of locking mechanism, preventing other transactions from changing this field value in between your two statements.

Fortunately, you don't have to write your application in this manner. OLTP application programmers can leverage transaction services provided by the underlying infrastructure. DBMSs such as SQL Server, Oracle, and DB2 provide extensive support for transaction processing. Other types of data sources, such as mainframe applications and message queues, often provide their own form of transaction support as well.

This built-in support makes writing a transaction much simpler. You start a transaction, conduct a series of read and write operations, and finish by committing or rolling back the transaction. The underlying system handles most of the difficult work of dealing with rollback and locking.
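Here's a minimal Transact-SQL sketch of that pattern, wrapping the sales order statements from earlier in an explicit transaction. Checking @@ERROR after each statement is just one simple way to decide between committing and rolling back:

 BEGIN TRANSACTION

 -- Decrement inventory.
 UPDATE Products
 SET Quantity = Quantity - 5
 WHERE Product = 'Dog'
 IF (@@ERROR <> 0) GOTO UndoWork

 -- Charge customer.
 UPDATE Customers
 SET AccountBalance = AccountBalance + 100
 WHERE Customer = 'Bob'
 IF (@@ERROR <> 0) GOTO UndoWork

 -- Add new order record.
 INSERT Orders(Customer, Product, Quantity, Price)
 VALUES('Bob', 'Dog', 5, 100)
 IF (@@ERROR <> 0) GOTO UndoWork

 -- All three operations succeeded; make the changes permanent.
 COMMIT TRANSACTION
 RETURN

 UndoWork:
 -- Something failed; the DBMS undoes every change made inside the transaction.
 ROLLBACK TRANSACTION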

A Flashback Through the ACID Rules

If you've read anything about the theory behind transaction processing, you've no doubt heard of the ACID rules. A transaction must be atomic, consistent, isolated, and durable. When you write a transaction for an OLTP system, all participants should adhere to these rules. You've probably seen these rules countless times, but just to be redundant over and over again, let's review them once more.

  • Atomic A transaction must be an all-or-nothing proposition. Everything must be successfully updated or nothing should be updated.
  • Consistent Individual operations within a transaction can leave data in such a state that it violates the system's integrity constraints, but before a transaction is released, the system's data as a whole must be returned to a valid state.
  • Isolated The system must isolate, or hide, the uncommitted changes of each transaction from all other transactions (typically through locking).
  • Durable When a transaction is committed, the data sources involved must place all changes in stable storage, and these changes must be recoverable in the event of a system failure.

Take a moment to consider what the third ACID rule really means. The system must isolate a transaction's uncommitted changes from other transactions because uncommitted changes might be rolled back. Let's look at an example of a system that doesn't provide isolation. When transaction A makes a change, an interval of time passes before the change is committed. If transaction B can see and use the uncommitted changes of transaction A, problems can occur. What happens if transaction A is rolled back? Transaction B can be infected with data that never really existed. An uncommitted read is also known as a dirty read. As you can see, to run transactions concurrently, a system must isolate uncommitted changes to enforce data consistency.

As a programmer, you rely on a transactional system to provide the required levels of isolation through locking. A DBMS such as SQL Server incorporates a built-in transaction manager and a lock manager, as shown in Figure 8-2. The transaction manager is responsible for enforcing the ACID rules. It calls on its friend the lock manager to provide each transaction with its required level of isolation.

If transaction A modifies a record, all other transactions that attempt to read this record will block until transaction A is released. If transaction A is committed, the lock manager then allows transaction B to read the record in its new state. If transaction A is rolled back instead, transaction B reads the record in its initial state. In either case, the lock manager prevents transaction B from seeing uncommitted changes. Transaction B must wait until transaction A has been released.

Figure 8-2 The lock manager of a DBMS enforces synchronization through a locking policy.

At this point, it's important to make a few observations. Data consistency requires isolation and isolation requires locking. Locking decreases concurrency because it results in blocking. Blocking reduces an application's overall responsiveness and throughput. From these facts, we can formulate two important rules for optimizing an OLTP application:

  • Don't acquire unnecessary locks on data items.
  • Try to minimize the time that any lock is held.

Locking and Granularity

Most lock managers use more than one level of granularity when they place locks on data. Coarse-grain locks (a table lock, for example) isolate larger amounts of data. Finer-grain locks (such as a record lock) isolate smaller amounts of data. If a transaction attempts to modify every record in a table, the lock manager can perform its work faster by acquiring and releasing a single table lock rather than acquiring and releasing many individual record locks. This is especially true for a table that has 100,000 rows.

However, a table lock has a definite downside—it seriously affects concurrency. Once transaction A acquires an exclusive table lock, transaction B must wait to access any record in the table. This is true even if the two transactions aren't interested in the same records. With record-level locks, a transaction locks only the data that it's actually using. Record-level locks are preferable when a transaction is modifying a single record in a large table. Finer-grain locks have much less impact on concurrency.

SQL Server adds one additional level of locking, known as a page lock. A page lock is somewhere between a record lock and a table lock in terms of granularity. SQL Server stores records in terms of pages. A standard page size helps SQL Server optimize disk I/O when moving data back and forth between memory and storage. It also provides a locking granularity that allows for faster performance than record-level locking but isn't as restrictive as table-level locking.

SQL Server 6 and earlier versions support only page-level and table-level locking. SQL Server 6.5 introduced a scheme in which you can use record-level locks at the last page in a table when you insert rows. However, this feature is off by default; you must explicitly enable it on a table-by-table basis. Version 7 is the first version of SQL Server to offer true record-level locking.

SQL Server 7 has a built-in optimizer that determines the level of locking for each read operation and write operation. If you're running an INSERT or an UPDATE that involves a single row, the optimizer typically uses record-level locking. If you have an update that involves many rows, however, the locking optimizer might escalate the level to page locks or possibly a table lock.

In most cases, you can rely on SQL Server's optimizer to choose the locking level for you. However, in some situations you might want to explicitly tell SQL Server to use row-level locking or a table lock. SQL Server includes a set of locking hints that influence locking granularity. You can place a hint in your SQL code like this:

 UPDATE Products WITH (ROWLOCK)
 SET Quantity = Quantity - 1
 WHERE Product = 'Dog'
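By the same token, if you know that an update will touch most of the rows in a table, you can hint a coarser lock up front rather than waiting for the optimizer to escalate. Here's a sketch; the across-the-board restocking logic is purely illustrative:

 -- Ask for a table lock up front when updating most or all rows.
 UPDATE Products WITH (TABLOCK)
 SET Quantity = Quantity + 10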

Write Locks vs. Read Locks

Most lock managers use two lock types: write locks and read locks. A lock manager uses write locks (also called exclusive locks) on data items to isolate the uncommitted changes of a transaction. The lock manager places read locks (also called shared locks) on data items when they're being read.

A write lock conflicts with other write locks and with read locks. A transaction that has a write lock blocks all other transactions from reading or writing to the data item in question. The data item remains locked until the transaction is committed or rolled back. This makes sense because the system must isolate uncommitted changes to ensure data consistency. However, this isolation has a price: The blocking reduces overall system concurrency and throughput.

Read locks don't conflict with other read locks. Many transactions can acquire read locks on the same data item concurrently. However, a transaction can't acquire a write lock on a data item that has outstanding read locks. This behavior ensures that a transaction doesn't overwrite a data item while another transaction is reading it. Table 8-1 summarizes how the lock manager handles lock requests when locks are already in place.

Table 8-1 How the Lock Manager Handles Lock Requests

                          No Lock Held      Read Lock Held     Write Lock Held
 Read Lock Requested      Lock acquired     Lock acquired      Request blocked
 Write Lock Requested     Lock acquired     Request blocked    Request blocked

Isolation Levels

It's not all that difficult to determine how a lock manager uses write locks. When a transaction makes a change, the data item in question is exclusively locked from all other transactions. The system knows to hold this exclusive lock to the very end of the transaction. However, the lock manager's timing for acquiring and releasing read locks isn't as obvious. When a transaction reads a data item, how long should the system hold a read lock? Should the read lock be held for the entire transaction or just for the duration of the read operation?

You can adjust the amount of time a read lock is held by tuning the transaction's isolation level. While longer read locks help enforce data consistency and transaction isolation, they also degrade concurrency and throughput. In some situations, releasing read locks before the end of a transaction can provide an important optimization. Here are the four isolation levels supported by SQL Server:

  • Read Uncommitted The transaction can read any data items whether or not they have outstanding write locks. Reading a data item with an outstanding write lock is known as a dirty read. Also, the transaction doesn't acquire read locks of its own. This isolation level provides the fastest access but is vulnerable to inconsistent reads.
  • Read Committed Transactions can read only data items that have been committed. A transaction running at this isolation level must wait until another transaction's write lock is released before accessing a data item. Read locks are held for the duration of read operations but are released before the end of the transaction. This is the default isolation level used by both SQL Server and ActiveX Data Objects (ADO) when you run a transaction without COM+.
  • Repeatable Read This level is like Read Committed except that all read locks are held to the end of the transaction. Any data item that's read in any stage of a transaction can be read later with the same results; that is, all data read by the transaction remains in the same state until the transaction is committed or rolled back.
  • Serializable This level is like Repeatable Read with one extra qualification: A query that runs multiple times in a serializable transaction must always have the same results. Phantom data items can't be inserted by other transactions until the transaction is complete. For example, if a transaction running at this level runs a query to determine the number of records in a table, no other transaction can insert a row in the table until the transaction is completed. The lock manager typically enforces this isolation level through the use of a table lock or an index range lock. This is the default isolation level used by SQL Server when it is called by an object running in a COM+ transaction.

Every level except Serializable compromises isolation to improve concurrency and throughput. When you run transactions with an isolation level of Serializable, the transaction manager favors data consistency and isolation at the expense of concurrency and throughput. A transaction is said to be serializable if it isn't affected by other transactions. When every transaction is serializable, it has the same effect on an application as running the transactions one at a time. Running all your operations at the Serializable level is always best for data consistency—but worst for concurrency. Later in the chapter, you'll see there are ways to run different operations inside the same transaction at different levels of isolation.
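If you're running a transaction directly against SQL Server (rather than letting COM+ dictate the level), you can choose an isolation level explicitly with a SET statement at the top of the batch. Here's a minimal sketch that uses Repeatable Read:

 -- Raise the isolation level from the default of Read Committed.
 SET TRANSACTION ISOLATION LEVEL REPEATABLE READ

 BEGIN TRANSACTION

 -- The read lock acquired here is held to the end of the transaction,
 -- so no other transaction can change the quantity before the update runs.
 SELECT Quantity
 FROM Products
 WHERE Product = 'Dog'

 UPDATE Products
 SET Quantity = Quantity - 5
 WHERE Product = 'Dog'

 COMMIT TRANSACTION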


