Chapter 17: Working with Transactional Data


As powerful as SQL Server 2005 is, most applications work just fine with its default values, and often there is no need to override or assume control of its innermost data-processing functionality. But there are also many times that applications are required to have specialized and critical control over sophisticated, yet still standards-compliant, RDBMS features: locking, transactions, and so on.

No doubt many of you will require such access to functionality and control from the get-go, especially if you are supporting applications and solutions that are not run-of-the-mill. While e-commerce applications, .NET applications, transaction processing, and other distributed solutions are no longer exceptional situations (thanks to the Internet), they are still fairly new territory in database programming, especially for desktop or PC solution developers who have finally decided that they need to be liberated. This chapter uncovers what you need to know to tame SQL Server's innermost data processing services.

We will also use this chapter to talk about cursors, functions, and a number of advanced programming concepts.

Locking

SQL Server, like all DBMSs, supports locking and, for the most part, implicitly and automatically manages it for you. You can create applications that work with SQL Server and never have to worry about locking data. But in many respects that's like driving a motor car and thinking you don't need to know what goes on in the engine, until you break down in the middle of Alligator Alley without a fan belt or a fuel line and Skeeter and his tow truck are 100 miles away.

Why Do We Need Locking?

SQL Server 2005 is a multiuser database management system. Today you can have one person accessing data, and tomorrow you can have one million people all accessing the data concurrently. Some of those people may only need to read the data, and if they are allowed to, then they can all share the data or the document. But what if more than one person has a contribution to make? What if more than one person needs to modify the record? What if one user decides to modify a record you are working with and another comes along with information that requires the record to be deleted? Now you have a problem akin to two buyers fighting over an item at a garage sale, each pulling on the item until it breaks in half. When more than one person accesses a record for modification, the record or row may break. In database lingo, the data can no longer be guaranteed to be reliable; data integrity is violated, and the data is no longer trustworthy.

I love that old saying that “there is nothing more constant than change.” Change is happening all the time. But change often needs to be managed, or the transition from one state to the next may become corrupt and the outcome of the change sequence compromised. In high-end multiuser systems, the concurrency problem is greatly exacerbated.

My first encounter with locking in computer systems takes me back 20 years to when I was a junior reporter working for a daily newspaper. We had this huge computer system called the Atex system, and I would type my news stories into it. Once an article was finished, I would enter the send command and off it would go to the news desk. After that, I no longer had write access to the news story. I could still read it, and every time I accessed it I would see that it was being edited or updated and so on. If I tried to change something, the system would tell me that I no longer had write access to the file because it was being updated by the copy editor. The only way I could get write access was to be part of the copy editor group.

One day I managed to hack into the system (I made a friend in the computer room) and reprogrammed my access (so started my sojourn into computer systems and software development). I was then able to get write permission to the file in the copyediting stage, even before the chief copy editor had finished with the file. So I would go in and change my byline from “By Daily Mail Reporter” to “By Jeffrey Shapiro.” When the chief copy editor found out, he had me “sentenced” to spend six months as a junior subeditor, and my shift started at midnight.

Today you can experience the same thing in your average word processor. It is possible to lock the document so that someone else can read it but not update it. And you can even control the access (read or write) to the file.

In SQL Server databases you cannot afford to be working with data that suddenly becomes updated under your nose. If you receive a message like “the record you are reading no longer exists in the database,” then you know that the record was not locked and someone deleted it (or a serious lock or access violation has occurred). The following list explores four specific problems that occur in weak locking situations:

  • Lost update   The lost update occurs when two or more users open a record or document at the same time. One user updates the record and moves on, but a second or even a third user updates the record before the first user has finished with it. When the first user finally commits the update, the changes made in the interim are lost. Getting back to my copyediting experience: I would change my byline before the copy editor was finished, only to find later that he had overwritten my update with the copy still sitting in his terminal’s session, and my change was lost. (A sketch after this list shows one way to prevent the lost update.)

  • Uncommitted dependency   The uncommitted dependency occurs when you allow a record to be read or accessed while another user is still working with it. In database jargon, we say that the later session is able to obtain a dirty read on the data, meaning it can read the unfinished data and make some decision concerning it. To speed up data processing in certain situations, it might make sense to allow a dirty read. When I was building computer telephony systems, for example, a dirty read was a necessity on certain records because the overhead of locking was intolerable: if a user who was setting up his or her telephone extension locked the record, it would have been unacceptable for an inbound caller to have to wait on the phone until the user was done.

    In that situation the dirty read was acceptable, but it would not be in a financial analysis system in which a session holds critical information that must be written to a record. There, a lock would be required to prevent the dirty read so that no other session could make decisions on the data until the update was complete. If you were about to issue an urgent sell order, you would not want other sessions to keep buying the shares in the time it takes to click the Commit button that updates the record. (A sketch after this list demonstrates the dirty read in T-SQL.)

  • Nonrepeatable read   The nonrepeatable read occurs when a session performs multiple reads of the same data for a specific purpose, but each time it reads the data before its processing is finished, the values it depends on for its analysis have changed. This is also known as an inconsistent analysis problem. In data centers, reports and processing can be so complex that the analysis takes hours, even a whole night or a weekend. The processor needs to lock the data against updates so that no values can change before it has completed its task.

  • Expropriated data   Expropriated data, often termed phantom data or phantom reads, can cause severe integrity problems for databases, even system crashes and failures. If locking is not enforced and access to a record is not denied for the most critical of applications, it is likely in a multiuser environment that a record you have opened in your application will be moved or deleted before you are done with it. Now you have data that you can no longer save, because you are in edit mode and the data you need to update no longer exists or has been moved to another location. I call this “expropriated data” because you cannot update or delete a record that no longer exists. This is one of the first issues I had to deal with when developing voice mail systems: a caller would try to leave a message for a user whom the administrator had already deleted from the system.
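To make the lost update concrete, here is a minimal T-SQL sketch of one common prevention technique: reading the row with an update lock so that a competing writer must wait its turn. The dbo.Accounts table and its columns are hypothetical, invented purely for illustration.

-- Hypothetical dbo.Accounts table with AccountID and Balance columns.
BEGIN TRANSACTION;

-- Read the row with an update lock (UPDLOCK) so that no other
-- session can acquire a conflicting update lock until we commit.
SELECT Balance
FROM dbo.Accounts WITH (UPDLOCK)
WHERE AccountID = 1;

-- ...the application computes the new balance here...

UPDATE dbo.Accounts
SET Balance = Balance - 100
WHERE AccountID = 1;

COMMIT TRANSACTION;

A second session running the same pattern blocks on its SELECT until the first session commits, so neither update can be silently overwritten.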
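The dirty read is just as easy to demonstrate. The following sketch, again against the hypothetical dbo.Accounts table, requires two separate connections:

-- Session 1: update a row inside a transaction but do not commit.
BEGIN TRANSACTION;
UPDATE dbo.Accounts
SET Balance = Balance - 100
WHERE AccountID = 1;

-- Session 2 (a separate connection): allow dirty reads.
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT Balance
FROM dbo.Accounts
WHERE AccountID = 1;   -- sees the uncommitted deduction

-- Session 1: change of heart. Session 2 has now made a decision
-- based on data that officially never existed.
ROLLBACK TRANSACTION;

Under the default READ COMMITTED level (with the new READ_COMMITTED_SNAPSHOT database option off), Session 2’s SELECT would instead block until Session 1 committed or rolled back.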

In situations where these problems cannot be tolerated, you have to turn to locking. And if the level of automated or default locking applied by SQL Server is insufficient or does not meet the solution requirements, you need to manually manage the locking process. Let’s now turn to SQL Server’s locking features.
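A quick way to see those features at work is to query the sys.dm_tran_locks dynamic management view, new in SQL Server 2005 (earlier versions expose similar information through sp_lock). A minimal sketch:

-- List the locks currently held or requested, one row per lock.
SELECT request_session_id,
       resource_type,    -- e.g., KEY, PAGE, OBJECT, DATABASE
       request_mode,     -- e.g., S (shared), U (update), X (exclusive)
       request_status    -- GRANT, WAIT, or CONVERT
FROM sys.dm_tran_locks
ORDER BY request_session_id;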

Isolation

Even though more than one transaction may be executing in SQL Server at any given time, the transactions execute in complete ignorance of each other; the primary benefit of such concurrency is performance. This transaction “traffic” is desirable because the very purpose of a client/server database management system is to allow more than one user to have access to the database at the same time. But the isolation has a downside: it means that two or more transactions without knowledge of each other can collide when they are after the same data. A real-world analogy often amused my fellow New York University students and me in Washington Square Park. People would throw a chunk of bread into a crowd of pigeons and have a laugh at the ones that collided. When two pigeons grab the same piece of food, one of them, usually the weaker, has to yield. If neither yields, you have a deadlock.
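When SQL Server detects a deadlock between two sessions, it chooses one as the victim, rolls its transaction back, and raises error 1205, which the application should catch and retry. A session that can tolerate losing may volunteer for that role, as in this small sketch:

-- Declare this session a willing deadlock victim; if it deadlocks
-- with a NORMAL-priority session, this one is rolled back and
-- receives error 1205, which the application should catch and retry.
SET DEADLOCK_PRIORITY LOW;

BEGIN TRANSACTION;
-- ...work that might deadlock with another session...
COMMIT TRANSACTION;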

Table 17–1: Isolation Levels

Isolation Level     Dirty Read   Nonrepeatable Read   Phantom Read
Read Uncommitted    Possible     Possible             Possible
Read Committed      No           Possible             Possible
Repeatable Read     No           No                   Possible
Serializable        No           No                   No

So data integrity is assured by the ability of the DBMS to isolate transactions from each other to prevent these collisions; the degree of isolation applied is referred to as the transaction isolation level. The only way to absolutely guarantee that data being used by one transaction cannot be interfered with by another transaction is to give only one transaction access to the data. Then, when that transaction is done with the data, the next transaction in line gets its chance. This is known as serialization. However, if all transactions were simply made to wait in a long line, then a transaction that only needed to read data would have to wait unnecessarily for the transactions ahead of it to complete. This would adversely affect the concurrency of data access and result in a slow system exhibiting poor performance.

It is thus possible to relax the serialization of transactions by adjusting their isolation level, depending on the nature of the transactions using the data. The result is an isolation scale, with lower isolation and higher data integrity risk at one end and higher isolation and lower data integrity risk at the other.

SQL Server follows the isolation level recommendations specified by the SQL-92 standard. These are defined in Table 17–1.

What the table demonstrates is that a transaction given the serializable isolation level is assured that no other transaction is reading or accessing its data. Lesser isolation levels relax the serialization of the transactions; transaction performance increases, but data integrity is at greater risk.
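A transaction claims its place on that scale with the SET TRANSACTION ISOLATION LEVEL statement. The following sketch, reusing the hypothetical dbo.Accounts table from earlier, shows REPEATABLE READ defending against the nonrepeatable read described earlier in the chapter:

SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRANSACTION;

-- The shared locks taken by this read are held until commit.
SELECT Balance FROM dbo.Accounts WHERE AccountID = 1;

-- ...lengthy analysis; no other session can change the row...

-- The second read is guaranteed to return the same value.
SELECT Balance FROM dbo.Accounts WHERE AccountID = 1;

COMMIT TRANSACTION;

REPEATABLE READ still permits phantoms (new qualifying rows appearing between the two reads); to shut those out as well, specify SERIALIZABLE.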



