Lesson 1: Transaction and Locking Architecture


SQL Server uses transactions to process a set of Transact-SQL statements as a unit. As a transaction is executed, locks are used to prevent other users from accessing the data affected by that transaction. To support transactional processing, SQL Server contains a number of architectural components, including transaction logs, concurrency control, locks, and support for distributed queries. This lesson discusses each of these components and provides an overview of how they support transactions in SQL Server.


After this lesson, you will be able to:

  • Identify and describe the architectural components in SQL Server that support transactions.

Estimated lesson time: 30 minutes


Transaction Log Architecture

Every SQL Server database has a transaction log that records all transactions and the database modifications made by each transaction. This record of transactions and their modifications supports three operations:

  • Recovery of individual transactions.  If an application issues a ROLLBACK statement or if SQL Server detects an error (such as the loss of communication with a client), the log records are used to roll back any modifications made during an incomplete transaction.
  • Recovery of all incomplete transactions when SQL Server is started.  If a server running SQL Server fails, the databases might be left in a state where some modifications were never written from the buffer cache to the data files, and there might be some modifications from incomplete transactions in the data files. When an instance of SQL Server is started, it runs a recovery of each database. Every modification recorded in the log that was not written to the data files is rolled forward. Every incomplete transaction found in the transaction log is then rolled back to ensure that the integrity of the database is preserved.
  • Rolling a restored database forward to the point of failure.  After the loss of a database, as is possible if a hard drive fails on a server that does not have a Redundant Array of Independent Disks (RAID), you can restore the database to the point of failure. You first restore the last full or differential database backup and then restore the sequence of transaction log backups to the point of failure. As you restore each log backup, SQL Server reapplies all of the modifications recorded in the log to roll forward all of the transactions. When the last log backup is restored, SQL Server then uses the log information to roll back all transactions that were not complete at that point.
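As a sketch of the third operation, a point-of-failure restore might be sequenced as follows. The database name (SalesDB) and the backup file names are hypothetical; every log backup except the last is restored WITH NORECOVERY so that further log backups can still be applied:

    -- Restore the last full database backup, leaving the database
    -- ready to accept log backups.
    RESTORE DATABASE SalesDB
        FROM DISK = 'C:\Backups\SalesDB_Full.bak'
        WITH NORECOVERY
    GO
    -- Roll forward the first log backup in the sequence.
    RESTORE LOG SalesDB
        FROM DISK = 'C:\Backups\SalesDB_Log1.trn'
        WITH NORECOVERY
    GO
    -- Roll forward the final log backup; WITH RECOVERY then rolls back
    -- any transactions that were incomplete at the point of failure.
    RESTORE LOG SalesDB
        FROM DISK = 'C:\Backups\SalesDB_Log2.trn'
        WITH RECOVERY
    GO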

The transaction log is not implemented as a table but as a separate file or set of files in the database. The log cache is managed separately from the buffer cache for data pages, resulting in simple, fast, and robust code within the database engine. The format of log records and pages is not constrained to follow the format of data pages.

You can implement the transaction log on several files. You can also define the files to autogrow as required, which reduces the potential of running out of space in the transaction log and reduces administrative overhead. The mechanism for truncating unused parts of the log is quick and has a minimal effect on transaction throughput.
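As a sketch, the following CREATE DATABASE statement places the transaction log on two files that autogrow in 5-MB increments; all names, paths, and sizes here are purely illustrative:

    CREATE DATABASE SalesDB
    ON
       (NAME = SalesDB_Data,
        FILENAME = 'C:\Data\SalesDB_Data.mdf',
        SIZE = 50MB)
    LOG ON
       -- Two log files, each starting at 10 MB and growing in 5-MB steps.
       (NAME = SalesDB_Log1,
        FILENAME = 'C:\Data\SalesDB_Log1.ldf',
        SIZE = 10MB,
        FILEGROWTH = 5MB),
       (NAME = SalesDB_Log2,
        FILENAME = 'C:\Data\SalesDB_Log2.ldf',
        SIZE = 10MB,
        FILEGROWTH = 5MB)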

Write-Ahead Transaction Log

SQL Server 2000, like many relational databases, uses a write-ahead log. A write-ahead log ensures that no data modifications are written to disk before the associated log record.

SQL Server maintains a buffer cache into which it reads data pages when data must be retrieved. Data modifications are not made directly to disk but are instead made to the copy of the page in the buffer cache. The modification is not written to disk until either the database is checkpointed or the modified page must be written to disk so that the buffer can be used to hold a new page. Writing a modified data page from the buffer cache to disk is called flushing the page. A page modified in the cache but not yet written to disk is called a dirty page.

At the time a modification is made to a page in the buffer, a log record is built into the log cache and records the modification. This log record must be written to disk before the associated dirty page is flushed from the buffer cache to disk. If the dirty page were flushed before the log record, it would create a modification on disk that could not be rolled back if the server failed before the log record was written to disk. SQL Server has logic that prevents a dirty page from being flushed before the associated log record. Because log records are always written ahead of the associated data pages, the log is called a write-ahead log.

Transaction Log Logical Architecture

The SQL Server transaction log operates logically as if it is a serial string of log records. A log sequence number (LSN) identifies each log record. Each new log record is written to the logical end of the log with an LSN higher than the LSN of the record before it.

Log records for data modifications record either the logical operation performed or before-and-after images of the modified data. A before image is a copy of the data before the operation is performed; an after image is a copy of the data after the operation has been performed.

Many types of operations are recorded in the transaction log:

  • The start and end of each transaction
  • Every data modification (insert, update, or delete)
  • Every extent allocation or deallocation
  • The creation or dropping of a table or index

Log records are stored in a serial sequence as they are created. Each log record is stamped with the ID of the transaction to which it belongs. For each transaction, all log records associated with the transaction are singly linked in a chain using backward pointers that speed the rollback of the transaction.

Rollback statements are also logged. Each transaction reserves space in the transaction log to ensure that enough log space exists to support a rollback if an error is encountered. This reserved space is freed when the transaction is completed. The amount of space reserved depends on the operations performed in the transaction but is generally equal to the amount of space used to log each operation.

Checkpoints and the Active Portion of the Log

Checkpoints minimize the portion of the log that must be processed during the full recovery of a database. During a full recovery, two types of actions must be performed:

  • The log might contain records of modifications that were not flushed to disk before the system stopped. These modifications must be rolled forward.
  • All of the modifications associated with incomplete transactions (transactions for which there is no COMMIT or ROLLBACK log record) must be rolled back.

Checkpoints flush dirty data and log pages from the buffer cache of the current database, minimizing the number of modifications that have to be rolled forward during a recovery. A checkpoint writes to the log file a record marking the start of the checkpoint and stores information recorded for the checkpoint in a chain of checkpoint log records. The LSN of the start of this chain is written to the database boot page.

Checkpoints occur for the following events:

  • When a CHECKPOINT statement is executed
  • When an ALTER DATABASE statement is used to change a database option
  • When an instance of SQL Server is stopped by executing a SHUTDOWN statement or by using the SQL Server Service Control Manager to stop the service running an instance of the database engine
  • Periodically, when an instance of SQL Server generates automatic checkpoints in each database to reduce the amount of time that the instance would need to recover the database

SQL Server 2000 always generates automatic checkpoints. The interval between automatic checkpoints is based on the number of records in the log, not on the amount of time. The time interval between automatic checkpoints can vary greatly. The time interval can be long if few modifications are made in the database. Automatic checkpoints occur frequently if a considerable amount of data is modified.
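A checkpoint can also be requested manually, and the frequency of automatic checkpoints can be influenced through the server's recovery interval option. The following sketch shows both; the five-minute value is purely illustrative:

    -- Force a checkpoint in the current database.
    CHECKPOINT
    GO
    -- recovery interval is an advanced option, so expose it first.
    EXEC sp_configure 'show advanced options', 1
    RECONFIGURE
    GO
    -- Target a maximum recovery time of roughly 5 minutes per database.
    EXEC sp_configure 'recovery interval', 5
    RECONFIGURE
    GO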

Truncating the Transaction Log

If log records were never deleted from the transaction log, the logical log would grow until it filled all of the available space on the disks that hold the physical log files. At some point in time, old log records no longer necessary for recovering or restoring a database must be deleted to make way for new log records. The process of deleting these log records to reduce the size of the logical log is called truncating the log.

The active portion of the transaction log can never be truncated. The active portion is needed to recover the database at any time: it contains the log images required to roll back all incomplete transactions, and those images must always be present in the database because they are needed to recover the database if the server fails and is restarted. The record at the start of the active portion of the log is identified by the minimum recovery log sequence number (MinLSN).

Log truncation occurs at the completion of any BACKUP LOG statement and occurs every time a checkpoint is processed, provided the database is using the simple recovery model.
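For example, under the full recovery model the log is typically truncated by backing it up; the database and file names in this sketch are hypothetical:

    -- Back up (and thereby truncate) the inactive portion of the log.
    BACKUP LOG SalesDB
       TO DISK = 'C:\Backups\SalesDB_Log.trn'
    GO
    -- Alternatively, discard the inactive log records without backing
    -- them up. This breaks the log backup chain, so a full or
    -- differential backup should follow.
    BACKUP LOG SalesDB WITH TRUNCATE_ONLY
    GO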

Transaction Log Physical Architecture

The transaction log in a database maps over one or more physical files. Conceptually, the log file is a serial string of log records. Physically, the sequence of log records must be stored efficiently in the set of physical files that implement the transaction log.

SQL Server 2000 segments each physical log file internally into a number of virtual log files. Virtual log files have no fixed size, and there is no fixed number of virtual log files for a physical log file. SQL Server chooses the size of the virtual log files dynamically while creating or extending log files and tries to maintain a small number of virtual files. The size of the virtual files created when a log file is extended is based on the size of the existing log and the size of the new increment. The size and number of virtual log files cannot be configured or set by administrators; they are determined dynamically by the SQL Server code.

The only time virtual log files affect system performance is when log files are defined with small size and growth_increment values. If these log files grow to a large size through many small increments, they will have many virtual log files, which can slow down recovery. The recommendation is to define log files with a size value close to the final size needed and with a relatively large growth_increment value.
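To see how many virtual log files a log currently contains, you can use DBCC LOGINFO, an undocumented but commonly used command that returns one row per virtual log file; the database name here is hypothetical:

    USE SalesDB
    GO
    -- One row per virtual log file; a large row count suggests the log
    -- grew through many small increments.
    DBCC LOGINFO
    GO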

Shrinking the Transaction Log

The physical size of the log file is reduced when a DBCC SHRINKDATABASE statement is executed, when a DBCC SHRINKFILE statement referencing a log file is executed, or when an autoshrink operation occurs.

Shrinking a log depends first on truncating the log. Log truncation does not reduce the size of a physical log file; instead, it reduces the size of the logical log and marks as inactive the virtual logs that do not hold any part of the logical log. A log shrink operation removes enough inactive virtual logs to reduce the log file to the requested size.

The unit of size reduction is a virtual log. For example, if you have a 600 MB log file that has been divided into six 100 MB virtual logs, the size of the log file can only be reduced in 100 MB increments. The file size can be reduced to sizes such as 500 MB or 400 MB, but it cannot be reduced to sizes such as 433 MB or 525 MB.

Virtual logs that hold part of the logical log cannot be freed. If all the virtual logs in a log file hold parts of the logical log, the file cannot be shrunk until a truncation marks one or more of the virtual logs at the end of the physical log as inactive.

When any file is shrunk, the space freed must come from the end of the file. When a transaction log file is shrunk, enough virtual logs from the end of the file are freed to reduce the log to the size that the user requested. The target_size specified by the user is rounded to the next-highest virtual log boundary. For example, if a user specifies a target_size of 325 MB for our sample 600 MB file with 100 MB virtual log files, the last two virtual log files are removed. The new file size is 400 MB.

In SQL Server 2000, a DBCC SHRINKDATABASE or DBCC SHRINKFILE operation attempts to shrink the physical log file to the requested size immediately (subject to rounding) if the following conditions are met:

  • If no part of the logical log is in the virtual logs beyond the target_size mark, the virtual logs after the target_size mark are freed, and the successful DBCC statement is completed with no messages.
  • If part of the logical log is in the virtual logs beyond the target_size mark, SQL Server 2000 frees as much space as possible and issues an informational message. The message tells you what actions you need to perform to get the logical log out of the virtual logs at the end of the file. After you perform this action, you can then reissue the DBCC statement to free the remaining space.

In the following statement, DBCC SHRINKFILE is used to reduce the TestDB_Log file in the TestDB database to 1 MB:

    USE TestDB
    GO
    DBCC SHRINKFILE (TestDB_Log, 1)
    GO

Concurrency Architecture

When many people attempt to modify data in a database at the same time, a system of controls must be implemented so that modifications made by one person do not adversely affect those of another person. This process is referred to as concurrency control.

Two classifications exist for instituting concurrency control:

  • Pessimistic concurrency control.  A system of locks prevents users from modifying data in a way that affects other users. After a user performs an action that causes a lock to be applied, other users cannot perform actions that would conflict with the lock until the owner releases it. This approach is called pessimistic control because it assumes that conflicting modifications are likely; it is mainly used in environments where there is high contention for data.
  • Optimistic concurrency control.  In optimistic concurrency control, users do not lock data when they read it. When an update is performed, the system checks to see whether another user changed the data after it was read. If another user updated the data, an error occurs, and the user who receives the error typically rolls back the transaction and starts again. This approach is called optimistic because it assumes that conflicting modifications are unlikely; it is mainly used in environments where there is low contention for data.

SQL Server 2000 supports a wide range of optimistic and pessimistic concurrency control mechanisms. Users specify the type of concurrency control by selecting a transaction isolation level for a connection and by setting concurrency options on cursors. These attributes can be defined either through Transact-SQL statements or through the properties and attributes of database application programming interfaces (APIs) such as ADO, OLE DB, and ODBC.
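The following sketch shows one mechanism of each kind: an isolation level set for the connection (pessimistic) and a cursor declared with the OPTIMISTIC concurrency option. The cursor example assumes the pubs sample database:

    -- Pessimistic: hold shared locks until the transaction completes.
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
    GO

    -- Optimistic: the cursor takes no locks on reads; conflicting
    -- updates are detected when a positioned update is attempted.
    USE pubs
    GO
    DECLARE titles_cursor CURSOR SCROLL OPTIMISTIC
    FOR SELECT title_id, price FROM titles
    OPEN titles_cursor
    -- FETCH and positioned UPDATE statements would go here.
    CLOSE titles_cursor
    DEALLOCATE titles_cursor
    GO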

Locking Architecture

A lock is an object that software uses to indicate that a user has some dependency on a resource. The software does not allow other users to perform operations on the resource that would adversely affect the dependencies of the user who owns the lock. Locks are managed internally by system software and are acquired and released based on actions that the user takes.

SQL Server 2000 uses locks to implement pessimistic concurrency control among multiple users who are performing modifications in a database at the same time. By default, SQL Server manages both transactions and locks on a per-connection basis. For example, if an application opens two SQL Server connections, locks acquired by one connection cannot be shared with the other connection. Neither connection can acquire locks that would conflict with locks held by the other connection. Only bound connections are not affected by this rule.
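Bound connections share a transaction and lock space through the sp_getbindtoken and sp_bindsession system stored procedures, roughly as sketched below; the token passed to the second connection is a placeholder:

    -- Connection 1: obtain a bind token for the current transaction space.
    DECLARE @token varchar(255)
    EXEC sp_getbindtoken @token OUTPUT
    SELECT @token AS BindToken   -- hand this value to connection 2

    -- Connection 2: join connection 1's transaction and lock space.
    -- EXEC sp_bindsession '<bind token from connection 1>'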

SQL Server locks are applied at various levels of granularity in the database. Locks can be acquired on rows, pages, keys, ranges of keys, indexes, tables, or databases. SQL Server dynamically determines the appropriate level at which to place locks for each Transact-SQL statement. The level at which locks are acquired can vary for different objects referenced by the same query. For example, one table might be very small and have a table lock applied, while another larger table might have row locks applied. The level at which locks are applied does not have to be specified by users and needs no configuration by administrators. Each instance of SQL Server ensures that locks granted at one level of granularity respect locks granted at another level.

Several lock modes exist: shared, update, exclusive, intent, and schema. The lock mode indicates the level of dependency that the connection has on the locked object. SQL Server controls how the lock modes interact. For example, an exclusive lock cannot be obtained if other connections hold shared locks on the resource.
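You can observe the locks currently held, including their modes, with the sp_lock system stored procedure:

    -- One row per held or requested lock, showing the session, the
    -- resource, and the mode (for example S, U, X, IS, IX, Sch-S).
    EXEC sp_lock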

Locks are held for the length of time needed to protect the resource at the level requested.

If a connection attempts to acquire a lock that conflicts with a lock held by another connection, the connection attempting to acquire the lock is blocked until one of the following events occurs:

  • The conflicting lock is freed and the connection acquires the lock that it requested.
  • The timeout interval for the connection expires. By default, there is no timeout interval, but some applications set a timeout interval to prevent an indefinite wait.
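In Transact-SQL, the timeout interval is set per connection with SET LOCK_TIMEOUT; the five-second value in this sketch is illustrative:

    -- Wait at most 5,000 ms for a conflicting lock; -1 (the default)
    -- waits indefinitely.
    SET LOCK_TIMEOUT 5000
    SELECT @@LOCK_TIMEOUT AS CurrentLockTimeoutMs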

If several connections become blocked while waiting for conflicting locks on a single resource, the locks are granted on a first-come, first-served basis as the preceding connections free their locks.

SQL Server has an algorithm to detect deadlocks, a condition in which two connections have blocked each other. If an instance of SQL Server detects a deadlock, it will terminate one transaction, allowing the other to continue.
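A session can also volunteer to be the preferred deadlock victim:

    -- If this session is involved in a deadlock, prefer terminating it.
    SET DEADLOCK_PRIORITY LOW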

SQL Server might dynamically escalate or de-escalate the granularity or type of locks. For example, if an update acquires a large number of row locks and has locked a significant percentage of a table, the row locks are escalated to a table lock. If a table lock is acquired, the row locks are released. SQL Server 2000 rarely needs to escalate locks; the query optimizer usually chooses the correct lock granularity at the time the execution plan is compiled.

Distributed Transaction Architecture

Distributed transactions are transactions that involve resources from two or more sources. SQL Server 2000 supports distributed transactions, allowing users to create transactions that update multiple SQL Server databases and other sources of data.

There are several ways in which applications can include SQL Server 2000 in a distributed transaction:

  • If an application has a local transaction and issues a distributed query, the local transaction is escalated to a distributed transaction.
  • A BEGIN DISTRIBUTED TRANSACTION statement can be issued.
  • If an application has a local transaction and the option REMOTE_PROC_TRANSACTIONS is set to ON, calling a remote stored procedure escalates the local transaction to a distributed transaction.
  • Applications using the Microsoft OLE DB Provider for SQL Server or the SQL Server ODBC driver can use OLE DB methods or ODBC functions to have a SQL Server connection join a distributed transaction that the application started.
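As an illustration of the second method, the following sketch starts an explicit distributed transaction that updates a local table and a table on a linked server; the names RemoteServer, RemoteDB, and Accounts are all hypothetical:

    BEGIN DISTRIBUTED TRANSACTION
    -- Debit an account in the local database.
    UPDATE Accounts
       SET Balance = Balance - 100
     WHERE AccountID = 1
    -- Credit the matching account on the linked server.
    UPDATE RemoteServer.RemoteDB.dbo.Accounts
       SET Balance = Balance + 100
     WHERE AccountID = 1
    COMMIT TRANSACTION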

Exercise 1:  Accessing and Modifying the Transaction Log

In this exercise, you will view the properties of the BookShopDB database and change the size of the transaction log. You will then use a Transact-SQL statement to reduce the size of the log file. To perform this exercise, you should be logged into your Windows 2000 Server computer as Administrator.

To view the properties of the BookShopDB database and change the size of the transaction log

  1. Open Enterprise Manager.
  2. Expand the console tree until the BookShopDB database is displayed.
  3. Right-click the BookShopDB node and click Properties.

The BookShopDB Properties dialog box appears.

  4. Click the Transaction Log tab.

Notice that the space allocated for the transaction log is 1 MB and that the file is set to automatically grow by 10 percent.

  5. In the Space Allocated (MB) box of the BookShopDB_Log row, change 1 to 3.
  6. Click OK.

The transaction log is allocated 3 MB of space.

  7. Leave Enterprise Manager open for a later procedure.

To shrink the transaction log

  1. Open Query Analyzer and connect to your local server.
  2. In the Editor pane of the Query window, enter the following Transact-SQL code:
    USE BookShopDB
    GO
    DBCC SHRINKFILE (BookShopDB_Log, 1)
    GO

In this statement, you are shrinking the size of the BookShopDB_Log file in the BookShopDB database from 3 MB to 1 MB.

  3. Execute the Transact-SQL statement.

Information about the log file is displayed in the Grids tab of the Results pane.

  4. Close Query Analyzer.

To view the properties of the BookShopDB database and the transaction log

  1. In Enterprise Manager, right-click the BookShopDB node and click Properties.

The BookShopDB Properties dialog box appears.

  2. Click the Transaction Log tab.

Notice that the space allocated for the transaction log is once again 1 MB.

  3. Close Enterprise Manager.

Lesson Summary

To support transactional processing, SQL Server contains a number of architectural components, including transaction logs, concurrency control, locks, and support for distributed queries. Every SQL Server database has a transaction log that records all transactions and the database modifications made by each transaction. This record of transactions and their modifications supports three operations: recovery of individual transactions, recovery of all incomplete transactions when SQL Server is started, and rolling a restored database forward to the point of failure. SQL Server 2000, like many relational databases, uses a write-ahead log to ensure that no data modifications are written to disk before the associated log record. The SQL Server transaction log operates logically as if it is a serial string of log records. The physical size of the log file is reduced when a DBCC SHRINKDATABASE statement is executed, when a DBCC SHRINKFILE statement referencing a log file is executed, or when an autoshrink operation occurs. In SQL Server, a system of controls is implemented so that modifications made by one person do not adversely affect those of another. SQL Server 2000 uses locks to implement pessimistic concurrency control among multiple users who are performing modifications in a database at the same time. SQL Server locks are applied at various levels of granularity in the database. Distributed transactions are transactions that involve resources from two or more sources. SQL Server 2000 supports distributed transactions, enabling users to create transactions that update multiple SQL Server databases and other sources of data.


