No matter what concurrency model you're working with, an understanding of transactions is crucial. A transaction is the basic unit of work in SQL Server. Typically, it consists of several SQL commands that read and update the database, but the update is not considered final until a COMMIT command is issued (at least for an explicit transaction). In general, when I talk about a modification operation or a read operation, I am talking about the transaction that performs the data modification or the read, which is not necessarily a single SQL statement. When I say that writers will block readers, I mean that as long as the transaction that performed the write operation is active, no other process can read the modified data. The mechanics of transaction control, along with discussions of the ANSI transaction properties and transaction isolation levels, can be found in Inside Microsoft SQL Server 2005: T-SQL Programming. I will go into detail about the transaction isolation levels, and the types of locks in the various isolation levels, which will overlap somewhat with the T-SQL programming volume. I will not, however, address the programming aspects of working with transactions in this volume, other than to describe the difference between an implicit and an explicit transaction. An implicit transaction is any individual INSERT, UPDATE, or DELETE statement. You can also consider SELECT statements to be implicit transactions, although there are no records for SELECT statements written to the transaction log. No matter how many rows are affected, the statement must exhibit all the ACID properties of a transaction, which I'll tell you about in the next section. An explicit transaction is one whose beginning is marked with a BEGIN TRAN statement and whose end is marked by a COMMIT TRAN or ROLLBACK TRAN. Most of the examples I present use explicit transactions because it is the only way to show SQL Server state in the middle of a transaction. For example, many types of locks are held for only the duration of the transaction. I can begin a transaction, perform some operations, look around in the metadata to see what locks are being held, and then end the transaction. When the transaction ends, the locks are released; I can no longer look at them. ACID PropertiesTransaction processing guarantees the consistency and recoverability of SQL Server databases. It ensures that all transactions are performed as a single unit of workeven in the presence of a hardware or general system failure. Such transactions are referred to as having the ACID properties: atomicity, consistency, isolation, and durability. In addition to guaranteeing that explicit multi-statement transactions maintain the ACID properties, SQL Server guarantees that an implicit transaction also maintains the ACID properties. Here's an example in pseudocode of an explicit ACID transaction. BEGIN TRANSACTION DEBIT_CREDIT Debit savings account $1000 Credit checking account $1000 COMMIT TRANSACTION DEBIT_CREDIT Now let's take a closer look at each of the ACID properties. AtomicitySQL Server guarantees the atomicity of its transactions. Atomicity means that each transaction is treated as all or nothingit either commits or aborts. If a transaction commits, all its effects remain. If it aborts, all its effects are undone. In the preceding DEBIT_CREDIT example, if the savings account debit is reflected in the database but the checking account credit isn't, funds will essentially disappear from the databasethat is, funds will be subtracted from the savings account but never added to the checking account. If the reverse occurs (if the checking account is credited and the savings account is not debited), the customer's checking account will mysteriously increase in value without a corresponding customer cash deposit or account transfer. Because of SQL Server's atomicity feature, both the debit and credit must be completed or else neither event is completed. ConsistencyThe consistency property ensures that a transaction won't allow the system to arrive at an incorrect logical statethe data must always be logically correct. Constraints and rules are honored even in the event of a system failure. In the DEBIT_CREDIT example, the logical rule is that money can't be created or destroyed: a corresponding, counterbalancing entry must be made for each entry. (Consistency is implied by, and in most situations redundant with, atomicity, isolation, and durability.) IsolationIsolation separates concurrent transactions from the updates of other incomplete transactions. In the DEBIT_CREDIT example, another transaction can't see the work in progress while the transaction is being carried out. For example, if another transaction reads the balance of the savings account after the debit occurs, and then the DEBIT_CREDIT transaction is aborted, the other transaction will be working from a balance that never logically existed. SQL Server accomplishes isolation among transactions automatically. It locks data or creates row versions to allow multiple concurrent users to work with data while preventing side effects that can distort the results and make them different from what would be expected if users were to serialize their requests (that is, if requests were queued and serviced one at a time). This serializability feature is one of the isolation levels that SQL Server supports. SQL Server supports multiple isolation levels so that you can choose the appropriate tradeoff between how much data to lock, how long to hold locks, and whether to allow users access to prior versions of row data. This tradeoff is known as concurrency vs. consistency. DurabilityAfter a transaction commits, SQL Server's durability property ensures that the effects of the transaction persist even if a system failure occurs. If a system failure occurs while a transaction is in progress, the transaction is completely undone, leaving no partial effects on the data. For example, if a power outage occurs in the midst of a transaction before the transaction is committed, the entire transaction is rolled back to when the system was restarted. If the power fails immediately after the acknowledgment of the commit is sent to the calling application, the transaction is guaranteed to exist in the database. Write-ahead logging and automatic rollback and roll-forward of transactions during the recovery phase of SQL Server startup ensure durability. Transactions always support all four of the ACID properties. A transaction might exhibit several additional behaviors as well. Some people call these behaviors "dependency problems" or "consistency problems," but I don't necessarily think of them as problems. They are merely possible behaviors, and you can determine which of these behaviors you want to allow and which you want to avoid. Your choice of isolation level determines which of these behaviors is allowed.
The behavior of your transactions depends on the isolation level. As mentioned earlier, you can decide which of the four behaviors described previously to allow by setting an appropriate isolation level using the command SET TRANSACTION ISOLATION LEVEL <isolation_level>. Your concurrency model (optimistic or pessimistic) determines how the isolation level is implementedor, more specifically, how SQL Server guarantees that the behaviors you don't want will not occur. Isolation LevelsSQL Server 2005 supports five isolation levels that control the behavior of your read operations. Three of them are available only with pessimistic concurrency, one is available only with optimistic concurrency, and one is available with either. We'll look at these levels now, but a complete understanding of isolation levels also requires an understanding of locking and row versioning. In my descriptions of the isolation levels, I'll mention the locks or row versions that support that level, but keep in mind that locking and row versioning will be discussed in detail later in the chapter. Uncommitted ReadIn Uncommitted Read isolation, all the behaviors described previously except lost updates are possible. Your queries can read uncommitted data, and both non-repeatable reads and phantoms are possible. Uncommitted read is implemented by allowing your read operations to not take any locks, and because SQL Server isn't trying to acquire locks, it won't be blocked by conflicting locks acquired by other processes. Your process will be able to read data that another process has modified but not yet committed. Although this scenario isn't usually the ideal option, with Uncommitted Read you can't get stuck waiting for a lock, and your read operations don't acquire any locks that might affect other processes that are reading or writing data. When using Uncommitted Read, you give up the assurance of strongly consistent data in favor of high concurrency in the system without users locking each other out. So when should you choose Uncommitted Read? Clearly, you don't want to use it for financial transactions in which every number must balance. But it might be fine for certain decision-support analysesfor example, when you look at sales trendsfor which complete precision isn't necessary and the trade-off in higher concurrency makes it worthwhile. Read Uncommitted isolation is a pessimistic solution to the problem of too much blocking activity because it just ignores the locks and does not provide you with transactional consistency. Read CommittedSQL Server 2005 supports two varieties of Read Committed isolation, which is the default isolation level. This isolation level can be either optimistic or pessimistic, depending on the database setting READ_COMMITTED_SNAPSHOT. Because the default for the database option is off, the default for this isolation level is to use pessimistic concurrency control. Unless indicated otherwise, when I refer to the Read Committed isolation level, I will be referring to both variations of this isolation level. I'll refer to the pessimistic implementation as Read Committed (locking), and I'll refer to the optimistic implementation as Read Committed (snapshot). Read Committed isolation ensures that an operation never reads data that another application has changed but not yet committed. (That is, it never reads data that logically never existed.) With Read Committed (locking), if another transaction is updating data and consequently has exclusive locks on data rows, your transaction must wait for those locks to be released before you can use that data (whether you're reading or modifying). Also, your transaction must put share locks (at a minimum) on the data that will be visited, which means that data might be unavailable to others to use. A share lock doesn't prevent others from reading the data, but it makes them wait to update the data. By default, share locks can be released after the data has been processedthey don't have to be held for the duration of the transaction, or even for the duration of the statement. (That is, if shared row locks are acquired, each row lock can be released as soon as the row is processed, even though the statement might need to process many more rows.) Read Committed (snapshot) also ensures that an operation never reads uncommitted data, but not by forcing other processes to wait. In Read Committed (snapshot), every time a row is updated, SQL Server generates a version of the changed row with its previous committed values. The data being changed is still locked, but other processes can see the previous versions of the data as it was before the update operation began. Repeatable ReadRepeatable Read is a pessimistic isolation level. It adds to the properties of Committed Read by ensuring that if a transaction revisits data or a query is reissued, the data will not have changed. In other words, issuing the same query twice within a transaction will not pick up any changes to data values made by another user's transaction. However, the Repeatable Read isolation level does allow phantom rows to appear. Preventing non-repeatable reads is a desirable safeguard in some cases. But there's no free lunch. The cost of this extra safeguard is that all the shared locks in a transaction must be held until the completion (COMMIT or ROLLBACK) of the transaction. (Exclusive locks must always be held until the end of a transaction, no matter what the isolation level or concurrency model, so that a transaction can be rolled back if necessary. If the locks were released sooner, it might be impossible to undo the work because other concurrent transactions might have used the same data and changed the value.) No other user can modify the data visited by your transaction as long as your transaction is open. Obviously, this can seriously reduce concurrency and degrade performance. If transactions are not kept short or if applications are not written to be aware of such potential lock contention issues, SQL Server can appear to "hang" when a process is waiting for locks to be released. Note
SnapshotSnapshot isolation is an optimistic isolation level. Like Read Committed (snapshot), it allows processes to read older versions of committed data if the current version is locked. The difference between Snapshot and Read Committed (snapshot) has to do with how old the older versions have to be. (We'll see the details in the section on row versioning.) Although the behaviors prevented by Snapshot isolation are the same as those prevented by Serializable, Snapshot is not truly a serializable isolation level. With Snapshot isolation, it is possible to have two transactions executing simultaneously that give us a result that is not possible in any serial execution. Table 8-1 shows an example of two simultaneous transactions. If they run in parallel, they will end up switching the price of two books in the titles table in the pubs database. However, there is no serial execution that would end up switching the values, whether we run Transaction 1 and then Transaction 2, or run Transaction 2 and then Transaction 1. Either serial order ends up with the two books having the same price.
SerializableSerializable is also a pessimistic isolation level. The Serializable isolation level adds to the properties of Repeatable Read by ensuring that if a query is reissued, rows will not have been added in the interim. In other words, phantoms will not appear if the same query is issued twice within a transaction. Serializable is therefore the strongest of the pessimistic isolation levels because it prevents all the possible undesirable behaviors discussed earlierthat is, it does not allow uncommitted reads, non-repeatable reads, or phantoms, and it also guarantees that your transactions can be run serially. Preventing phantoms is another desirable safeguard. But once again, there's no free lunch. The cost of this extra safeguard is similar to that of Repeatable Readall the shared locks in a transaction must be held until completion of the transaction. In addition, enforcing the Serializable isolation level requires that you not only lock data that has been read, but also lock data that does not exist! For example, suppose that within a transaction we issue a SELECT statement to read all the customers whose ZIP Code is between 98000 and 98100, and on first execution no rows satisfy that condition. To enforce the Serializable isolation level, we must lock that range of potential rows with ZIP Codes between 98000 and 98100 so that if the same query is reissued, there will still be no rows that satisfy the condition. SQL Server handles this situation by using a special kind of lock called a key-range lock. Key-range locks require that there be an index on the column that defines the range of values. (In this example, that would be the column containing the ZIP Code.) If there is no index on that column, Serializable isolation requires a table lock. I'll discuss the different types of locks in detail in the section on locking. The Serializable level gets its name from the fact that running multiple serializable transactions at the same time is the equivalent of running them one at a timethat is, serially. For example, suppose that transactions A, B, and C run simultaneously at the Serializable level and each tries to update the same range of data. If the order in which the transactions acquire locks on the range of data is B, C, and then A, the result obtained by running all three simultaneously is the same as if they were run sequentially in the order B, C, and then A. Serializable does not imply that the order is known in advance. The order is considered a chance event. Even on a single-user system, the order of transactions hitting the queue would be essentially random. If the batch order is important to your application, you should implement it as a pure batch system. Serializable means only that there should be a way to run the transactions serially to get the same result you get when you run them simultaneously. In the upcoming discussion of the Snapshot isolation level, I'll show you a case where transactions are not serializable. More Info
Table 8-2 summarizes the behaviors that are possible in each isolation level and notes the concurrency control model that is used to implement each level. You can see that Read Committed and Read Committed (snapshot) are identical in the behaviors they allow, but the behaviors are implemented differentlyone is pessimistic (locking), and one is optimistic (row versioning). Serializable and Snapshot also have the same No values for all the behaviors, but one is pessimistic and one is optimistic.
|