The SQL Server Engine

Figure 3-1 shows the general architecture of SQL Server. For simplicity, I've made some minor omissions and simplifications and ignored certain "helper" modules.

Now let's look in detail at the major modules.

The Net-Library

The Net-Library (often called Net-Lib, but in this book I'll use Net-Library) abstraction layer enables SQL Server to read from and write to many different network protocols, and each such protocol (such as TCP/IP sockets) can have a specific driver. The Net-Library layer makes it relatively easy to support many different network protocols without having to change the core server code.

A Net-Library is basically a driver that's specific to a particular network interprocess communication (IPC) mechanism. (Be careful not to confuse driver with device driver.) All code in SQL Server, including Net-Library code, makes calls only to the Microsoft Win32 subsystem. SQL Server uses a common internal interface between Microsoft Open Data Services (ODS)—which manages its use of the network—and each Net-Library. If your development project needs to support a new and different network protocol, you can handle all network-specific issues by simply writing a new Net-Library. In addition, you can load multiple Net-Libraries simultaneously, one for each network IPC mechanism in use.

SQL Server uses the Net-Library abstraction layer on both the server and client machines, making it possible to support several clients simultaneously on different networks. Microsoft Windows NT/2000 and Windows 98 support the simultaneous use of multiple protocol stacks. Net-Libraries are paired. For example, if a client application is using a Named Pipes Net-Library, SQL Server must also be listening on a Named Pipes Net-Library. The client application determines which Net-Library is actually used for the communication, and you can control the client application's choice by using a tool called the Client Network Utility. You can easily configure SQL Server to listen on multiple Net-Libraries by using the Server Network Utility, which is available under Programs\Microsoft SQL Server on the Start menu.

SQL Server 2000 has two primary Net-Libraries: Super Socket and Shared Memory. TCP/IP, Named Pipes, IPX/SPX, and so on are referred to as secondary Net-Libraries. The OLE DB Provider for SQL Server, SQL Server ODBC driver, DB-Library, and the database engine communicate directly with these two primary network libraries. Intercomputer connections communicate through the Super Socket Net-Library. Local connections between an application and a SQL Server instance on the same computer use the Shared Memory Net-Library if Shared Memory support has been enabled (which it is, by default). SQL Server 2000 supports the Shared Memory Net-Library on all Windows platforms.

Figure 3-1. The major components of the SQL Server architecture.

The Super Socket Net-Library has two components:

  • Communication path If the client is configured to communicate over a TCP/IP Sockets or NWLink IPX/SPX connection, the Super Socket Net-Library directly calls the Windows Sockets 2 API to handle the communication between the application and the SQL Server instance.

    If the client is configured to communicate over a Named Pipes, Multiprotocol, AppleTalk, or Banyan VINES connection, a subcomponent of the Super Socket Net-Library called the Net-Library router loads the secondary Net-Library for the chosen protocol and routes all Net-Library calls to it.

  • Encryption layer The encryption is implemented using the Secure Sockets Layer (SSL) API. The level of encryption, 40-bit or 128-bit, depends on the Windows version running on the application computer and on the computer running SQL Server. Enabling encryption can slow network performance not only because of the extra work of encrypting and decrypting communication between the client and the server but also because an extra roundtrip is required between the client and the server every time a connection is made.

    Shared Memory Net-Library communication is inherently secure without the need for encryption. The Shared Memory Net-Library never participates in intercomputer communication. The area of memory shared between the application process and the database engine process cannot be accessed from any other Windows process.

For compatibility with earlier versions of SQL Server, the Multiprotocol Net-Library continues to support its own encryption. This encryption is specified independently of the SSL encryption and is implemented by calling the Windows RPC encryption API.

Figure 3-2 (taken from SQL Server Books Online) shows SQL Server 2000's Net-Library architecture.

The distinction between the IPC mechanisms and the underlying network protocols is important. IPC mechanisms used by SQL Server include Named Pipes, RPC, SPX, and Windows Sockets. Network protocols used include TCP/IP, NetBEUI, Shared Memory, NWLink IPX/SPX, Banyan VINES SPP, and AppleTalk ADSP. Two Net-Libraries, Multiprotocol and Named Pipes, can be used simultaneously over multiple network protocols (NetBEUI, NWLink IPX/SPX, and TCP/IP). You can have multiple network protocols in your environment and still use only one Net-Library.

SQL Server 2000 running on Windows NT/2000 supports impersonation of security contexts to provide an integrated logon authentication capability called Windows Authentication. Windows Authentication operates over network protocols that support authenticated connections between clients and servers. Such connections are referred to as trusted connections and are supported by SQL Server 2000 using any available Net-Library. Instead of requiring a separate user ID/password logon each time a connection to SQL Server is requested, SQL Server can impersonate the security context of the user running the application that requests the connection. If that user has sufficient privileges (or is part of a Windows NT/2000 domain group that does), the connection is established. Note that Windows Authentication is not available when SQL Server is running on Windows 98. When you connect to SQL Server running on Windows 98, you must specify a SQL Server logon ID and password.

Figure 3-2. SQL Server 2000's Net-Library Architecture.

Which Net-Library Is Fastest?


Strictly speaking, the TCP/IP Sockets Net-Library is the fastest Net-Library. In a pure network test that does nothing except throw packets back and forth between Net-Library pairs, the TCP/IP Sockets Net-Library is perhaps 30 percent faster than the slowest Net-Library. But in LAN environments and applications, the speed of the Net-Library probably makes little difference because the network interface is generally not a limiting factor in a well-designed application.

On a LAN, however, turning on encryption does cause a performance hit. But again, most applications probably wouldn't notice the difference. Your best bet is to choose the Net-Library that matches your network protocols and provides the services you need in terms of unified logon, encryption, and dynamic name resolution. (I'll explain these choices further in Chapter 4.)

Open Data Services

Open Data Services (ODS) functions as the client manager for SQL Server; it's basically an interface between server Net-Libraries and server-based applications, including SQL Server. ODS manages the network: it listens for new connections, cleans up failed connections, acknowledges "attentions" (cancellations of commands), coordinates threading services to SQL Server, and returns result sets, messages, and status values back to the client.

SQL Server clients and the server speak a private protocol known as Tabular Data Stream (TDS). TDS is a self-describing data stream. In other words, it contains tokens that describe column names, datatypes, events (such as cancellations), and status values in the "conversation" between client and server. The server notifies the client that it is sending a result set, indicates the number of columns and datatypes of the result set, and so on—all encoded in TDS. Neither clients nor servers write directly to TDS. Instead, the open interfaces of OLE DB, ODBC, and DB-Library at the client emit TDS using a client implementation of the Net-Library.

ODS accepts new connections, and if a client unexpectedly disconnects (for example, if a user reboots the client computer instead of cleanly terminating the application), ODS automatically frees resources such as locks held by that client.

You can use the ODS open interface to extend the functionality of SQL Server by writing extended stored procedures. The code for the ODS API itself was formerly stored outside of SQL Server as part of the DLL opends60.dll. In SQL Server 2000, to enhance the performance of SQL Server's own internal mechanisms that need to use presupplied extended procedures, the code that makes up ODS is internal to the SQL Server engine. Any extended stored procedure DLLs that you create must link against opends60.dll, which contains stubs to access the real ODS routines internal to SQL Server.
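
As a brief sketch of how an extended stored procedure is exposed to Transact-SQL, the following example registers and calls a hypothetical DLL (the DLL and procedure names are placeholders; the DLL itself would be written in a language such as C and linked against opends60.dll as just described):

    -- Register a hypothetical extended stored procedure housed in xp_sample.dll
    EXEC sp_addextendedproc 'xp_sample_hello', 'xp_sample.dll'

    -- Call it like any other stored procedure in the master database
    EXEC master..xp_sample_hello

    -- Remove the registration when it is no longer needed
    EXEC sp_dropextendedproc 'xp_sample_hello'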

ODS input and output buffers

After SQL Server puts result sets into a network output buffer (write buffer) that's equal in size to the configured packet size, the Net-Library dispatches the buffer to the client. The first packet is sent as soon as the network output buffer is full or, if an entire result set fits in one packet, when the batch is completed. (A batch is one or more commands sent to SQL Server to be parsed and executed together. For example, if you're using OSQL.EXE, a batch is the collection of all the commands that appear before a specific GO command.) In some exceptional operations (such as one that provides progress information for database dumping or provides DBCC messages), the output buffer is flushed and sent even before it is full or before the batch completes.

SQL Server has two input buffers (read buffers) and one output buffer per client. Double-buffering is needed for the reads because while SQL Server reads a stream of data from the client connection, it must also look for a possible attention. (This allows that "Query That Ate Cleveland" to be canceled directly from the issuer. Although the ability to cancel a request is extremely important, it's relatively unusual among client/server products.) Attentions can be thought of as "out-of-band" data, although they can be sent with network protocols that do not explicitly have an out-of-band channel. The SQL Server development team experimented with double-buffering and asynchronous techniques for the output buffers, but these didn't improve performance substantially. The single network output buffer works nicely. Even though the writes are not posted asynchronously, SQL Server doesn't need to bypass the operating system caching for these as it does for writes to disk.

Because the operating system provides caching of network writes, write operations appear to complete immediately with no significant latency. If, however, several writes are issued to the same client and the client is not currently reading data from the network, the network cache eventually becomes full and the write blocks. This is essentially a throttle. As long as the client application is processing results, SQL Server has a few buffers queued up and ready for the client connection to process. But if the client's queue is already stacked up with results and is not processing them, SQL Server stalls sending them and the network write operation to that connection has to wait. Since the server has only one output buffer per client, data cannot be sent to that client connection until it reads information off the network to free up room for the write to complete. (Writes to other client connections are not held up, however; only those for the laggard client are affected.)

Stalled network writes can also affect locks. For example, if READ COMMITTED isolation is in effect (the default), a share lock can normally be released after SQL Server has completed its scan of that page of data. (Exclusive locks used for changing data must always be held until the end of the transaction to ensure that the changes can be rolled back.) However, if the scan finds more qualifying data and the output buffer is not free, the scan stalls. When the previous network write completes, the output buffer becomes available and the scan resumes. But as I stated earlier, that write won't complete until the client connection "drains" (reads) some data to free up some room in the pipe (the virtual circuit between SQL Server and client connection).

If a client connection delays processing results that are sent to it, concurrency issues can result because locks are held longer than they otherwise would be. A kind of chain reaction occurs: if the client connection has not read several outstanding network packets, further writing of the output buffer at the SQL Server side must wait because the pipe is full. Since the output buffer is not available, the scan for data might also be suspended because no space is available to add qualifying rows. Since the scan is held up, any lock on the data cannot be released. In short, if a client application does not process results in a timely manner, database concurrency can suffer.

The size of the network output buffer can also affect the speed at which the client receives the first result set. As mentioned earlier, the output buffer is sent when the batch, not simply the command, is done, even if the buffer is not full. If two queries exist in the same batch and the first query has only a small amount of data, its results are not sent back to the client until the second query is done or has supplied enough data to fill the output buffer. If both queries are fast, waiting for both queries to finish is not a problem. But suppose the first query is fast and the second is slow. And suppose the first query returns 1000 bytes of data. If the network packet size is 4096 bytes, the first result set must wait in the output buffer for the second query to fill it. The obvious solution here is to either make the first command its own batch or make the network packet size smaller. The first solution is probably best in this case, since it is typically difficult to fine-tune your application to determine the best buffer size for each command. But this doesn't mean that each command should be its own batch. Quite the contrary. In fact, under normal circumstances, grouping multiple commands into a single batch is most efficient and is recommended because it reduces the amount of handshaking that must occur between client and server.
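
As a minimal sketch of that advice, the following batches (using the pubs sample tables) separate a quick lookup from a longer-running query so that the first result set is flushed to the client as soon as its batch completes:

    -- Batch 1: the fast query returns its small result set immediately
    SELECT au_lname, au_fname
    FROM pubs.dbo.authors
    WHERE au_id = '172-32-1176'
    GO
    -- Batch 2: the slower query runs in its own batch
    SELECT t.title, s.qty
    FROM pubs.dbo.titles AS t
    JOIN pubs.dbo.sales AS s ON t.title_id = s.title_id
    GO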

ODS default Net-Libraries

On the server side, ODS provides functionality that mirrors that of ODBC, OLE DB, or DB-Library at the client. Calls exist for an ODS server application to describe and send result sets, to convert values between datatypes, to assume the security context associated with the specific connection being managed, and to raise errors and messages to the client application.

ODS uses an event-driven programming model. Requests from servers and clients trigger events that your server application must respond to. Using the ODS API, you create a custom routine, called an event handler, for each possible type of event. Essentially, the ODS library drives a server application by calling its custom event handlers in response to incoming requests.

ODS server applications respond to the following events:

  • Connect events When a connect event occurs, SQL Server initiates a security check to determine whether a connection is allowed. Other ODS applications, such as a gateway to DB/2, have their own logon handlers that determine whether connections are allowed. Events also exist that close a connection, allowing the proper connection cleanup to occur.
  • Language events When a client sends a command string, such as an SQL statement, SQL Server passes this command along to the command parser. A different ODS application, such as a gateway, would install its own handler that accepts and is responsible for execution of the command.
  • Remote stored procedure events These events occur each time a client or SQL Server directs a remote stored procedure request to ODS for processing.

ODS also generates events based on certain client activities and application activities. These events allow an ODS server application to respond to changes to the status of the client connection or of the ODS server application.

In addition to handling connections, ODS manages threads (and fibers) for SQL Server. It takes care of thread creation and termination and makes the threads available to the User Mode Scheduler (UMS). Since ODS is an open interface with a full programming API and toolkit, independent software vendors (ISVs) writing server applications with ODS get the same benefits that SQL Server derives from this component, including SMP-capable thread management and pooling, as well as network handling for multiple simultaneous networks. This multithreaded operation enables ODS server applications to maintain a high level of performance and availability and to transparently use multiple processors under Windows NT/2000 because the operating system can schedule any thread on any available processor.

The Relational Engine and the Storage Engine

The SQL Server database engine is made up of two main components, the relational engine and the storage engine. These two pieces are clearly separated, and their primary method of communication with each other is through OLE DB. The relational engine comprises all the components necessary to parse and optimize any query. It also manages the execution of queries as it requests data from the storage engine in terms of OLE DB row sets and then processes the row sets returned. (Row set is the OLE DB term for a result set.) The storage engine comprises the components needed to actually access and modify data on disk.

The command parser

The command parser handles language events raised by ODS. It checks for proper syntax and translates Transact-SQL commands into an internal format that can be operated on. This internal format is known as a query tree. If the parser doesn't recognize the syntax, a syntax error is immediately raised and identifies where the error occurred. However, non-syntax error messages cannot be explicit about the exact source line that caused the error. Because only the command parser can access the source of the statement, the statement is no longer available in source format when the command is actually executed.

The optimizer

The optimizer takes the query tree from the command parser and prepares it for execution. This module compiles an entire command batch, optimizes queries, and checks security. The query optimization and compilation result in an execution plan.

The first step in producing such a plan is to normalize each query, which potentially breaks down a single query into multiple, fine-grained queries. After the optimizer normalizes a query, it optimizes it, which means that the optimizer determines a plan for executing that query. Query optimization is cost-based; the optimizer chooses the plan that it determines would cost the least based on internal metrics that include estimated memory requirements, estimated CPU utilization, and the estimated number of required I/Os. The optimizer considers the type of statement requested, checks the amount of data in the various tables affected, looks at the indexes available for each table, and then looks at a sampling of the data values kept for each index or column referenced in the query. The sampling of the data values is called distribution statistics and will be discussed in detail in Chapter 15. Based on the available information, the optimizer considers the various access methods and join strategies it could use to resolve a query and chooses the most cost-effective plan. The optimizer also decides which indexes, if any, should be used for each table in the query, and, in the case of a multitable query, the order in which the tables should be accessed and the join strategy to use.
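
If you want to look at the distribution statistics yourself, the following is a minimal sketch against the pubs sample database, assuming the titleind index on the titles table:

    USE pubs
    -- Display the density and histogram information kept for an index
    DBCC SHOW_STATISTICS ('titles', 'titleind')
    -- Rebuild the statistics for all indexes on the table
    UPDATE STATISTICS titles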

The optimizer also uses pruning heuristics to ensure that more time isn't spent optimizing a query than it would take to simply choose a plan and execute it. The optimizer doesn't necessarily do exhaustive optimization. Some products consider every possible plan and then choose the most cost-effective one. The advantage of this exhaustive optimization is that the syntax chosen for a query would theoretically never cause a performance difference, no matter what syntax the user employed. But if you deal with an involved query, it could take much longer to estimate the cost of every conceivable plan than it would to accept a good plan, even if not the best one, and execute it.

After normalization and optimization are completed, the normalized tree produced by those processes is compiled into the execution plan, which is actually a data structure. Each command included in it specifies exactly which table will be affected, which indexes will be used (if any), which security checks must be made, and which criteria (such as equality to a specified value) must evaluate to TRUE for selection. This execution plan might be considerably more complex than is immediately apparent. In addition to the actual commands, the execution plan includes all the steps necessary to ensure that constraints are checked. Steps for calling a trigger are a bit different from those for verifying constraints. If a trigger is included for the action being taken, a call to the procedure that comprises the trigger is appended. If the trigger is an instead-of trigger, the call to the trigger's plan replaces the actual data modification command. For after triggers, the trigger's plan is branched to right after the plan for the modification statement that fired the trigger, before that modification is committed. The specific steps for the trigger are not compiled into the execution plan, unlike those for constraint verification.

A simple request to insert one row into a table with multiple constraints can result in an execution plan that requires many other tables to also be accessed or expressions to be evaluated. The existence of a trigger can also cause many additional steps to be executed. The step that carries out the actual INSERT statement might be just a small part of the total execution plan necessary to ensure that all actions and constraints associated with adding a row are carried out.

For more details on query processing and optimization, see Chapter 15.

The SQL manager

The SQL manager is responsible for everything having to do with managing stored procedures and their plans. It determines when a stored procedure needs recompilation based on changes in the underlying objects' schemas, and it manages the caching of procedure plans so that other processes can reuse them.

The SQL manager also handles autoparameterization of queries. In SQL Server 2000, certain kinds of ad hoc queries are treated as if they were parameterized stored procedures, and query plans are generated and saved for them. This can happen if a query uses a simple equality comparison against a constant, as in the following statement:

 SELECT * FROM pubs.dbo.titles WHERE type = 'business' 

This query can be parameterized as if it were a stored procedure with a parameter for the value of type:

 SELECT * FROM pubs.dbo.titles WHERE type = @param 

A subsequent query, differing only in the actual value used for the value of type, can use the same query plan that was generated for the original query. I'll discuss the details of autoparameterization in Chapter 15.
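
An application can also parameterize queries explicitly rather than relying on autoparameterization; one common approach is sp_executesql, sketched here with the same titles query (the parameter name @ptype is just an illustration):

    -- Explicitly parameterized query; the cached plan can be reused for other @ptype values
    EXEC sp_executesql
        N'SELECT * FROM pubs.dbo.titles WHERE type = @ptype',
        N'@ptype varchar(12)',
        @ptype = 'business'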

The expression manager

The expression manager handles computation, comparison, and data movement. Suppose your query contains an expression like this one:

 SELECT @myqty = qty * 10 FROM mytable 

The expression manager copies the value of qty from the row set returned by the storage engine, multiplies it by 10, and stores the result in @myqty.

The query executor

The query executor runs the execution plan that the optimizer produced, acting as a dispatcher for all the commands in the execution plan. This module loops through each command step of the execution plan until the batch is complete. Most of the commands require interaction with the storage engine to modify or retrieve data and to manage transactions and locking.

Communication between the relational engine and the storage engine

The relational engine uses OLE DB for most of its communication with the storage engine. The following description of that communication is adapted from the section titled "Database Engine Components" in SQL Server Books Online. It describes how a SELECT statement that processes data from local tables only is processed:

  1. The relational engine compiles the SELECT statement into an optimized execution plan. The execution plan defines a series of operations against simple OLE DB row sets from the individual tables or indexes referenced in the SELECT statement. The row sets requested by the relational engine return the amount of data needed from a table or index to perform one of the operations used to build the SELECT result set. For example, this SELECT statement requires a table scan if it references a table with no indexes:

     SELECT * FROM Northwind.dbo.ScanTable 

    The relational engine implements the table scan by requesting one row set containing all the rows from ScanTable. This next SELECT statement needs only information available in an index:

     SELECT DISTINCT LastName
     FROM Northwind.dbo.Employees

    The relational engine implements the index scan by requesting one row set containing the leaf rows from the index that was built on the LastName column. The following SELECT statement needs information from two indexes:

     SELECT CompanyName, OrderID, ShippedDate
     FROM Northwind.dbo.Customers AS Cst
     JOIN Northwind.dbo.Orders AS Ord
         ON (Cst.CustomerID = Ord.CustomerID)

    The relational engine requests two row sets: one for the nonclustered index on Customers and the other for the clustered index on Orders.

  2. The relational engine uses the OLE DB API to request that the storage engine open the row sets.
  3. As the relational engine works through the steps of the execution plan and needs data, it uses OLE DB to fetch the individual rows from the row sets it asked the storage engine to open. The storage engine transfers the data from the data buffers to the relational engine.
  4. The relational engine combines the data from the storage engine row sets into the final result set transmitted back to the user.

Not all communication between the relational engine and the storage engine uses OLE DB. Some commands cannot be expressed in terms of OLE DB row sets. The most obvious and common example is when the relational engine processes data definition language (DDL) requests to create a table or other SQL Server object.

The Access Methods Manager

When SQL Server needs to locate data, it calls the access methods manager. The access methods manager sets up and requests scans of data pages and index pages and prepares the OLE DB row sets to return to the relational engine. It contains services to open a table, retrieve qualified data, and update data. The access methods manager doesn't actually retrieve the pages; it makes the request to the buffer manager, which ultimately serves up the page already in its cache or reads it to cache from disk. When the scan is started, a look-ahead mechanism qualifies the rows or index entries on a page. The retrieving of rows that meet specified criteria is known as a qualified retrieval. The access methods manager is employed not only for queries (selects) but also for qualified updates and deletes (for example, UPDATE with a WHERE clause).

A session opens a table, requests and evaluates a range of rows against the conditions in the WHERE clause, and then closes the table. A session descriptor data structure (SDES) keeps track of the current row and the search conditions for the object being operated on (which is identified by the object descriptor data structure, or DES).

The Row Operations Manager and the Index Manager

You can consider the row operations manager and the index manager as components of the access methods manager because they carry out the actual method of access. Each is responsible for manipulating and maintaining its respective on-disk data structures, namely rows of data or B-tree indexes. They understand and manipulate information on data and index pages.

The row operations manager

The row operations manager retrieves, modifies, and performs operations on individual rows. (For more information about on-disk structures, see Chapter 5; for more on data page and row format, see Chapter 6.) It performs an operation within a row, such as "retrieve column 2" or "write this value to column 3." As a result of the work performed by the access methods manager, as well as by the lock manager and transaction manager, which will be discussed shortly, the row will have been found and will be appropriately locked and part of a transaction. After formatting or modifying a row in memory, the row operations manager inserts or deletes a row.

The row operations manager also handles updates. SQL Server 2000 offers three methods for handling updates. All three are direct, which means that there's no need for two passes through the transaction log, as was the case with deferred updates in versions of SQL Server prior to SQL Server 7. SQL Server 2000 has no concept of a deferred data modification operation.

SQL Server 2000 has three update modes:

  • In-place mode This mode is used to update a heap or clustered index when none of the clustering keys change. The update can be done in place, and the new data is written to the same slot on the data page.
  • Split mode This mode is used to update nonunique indexes when the index keys change. The update is split into two operations—a delete followed by an insert—and these operations are performed independently of each other.
  • Split with collapse mode This mode is used to update a unique index when the index keys change. After the update is rewritten as delete and insert operations, if the same index key is both deleted and then reinserted with a new value, the delete and insert are "collapsed" into a single update operation.

In Chapter 9, we'll look at examples of each type of update.

The index manager

The index manager maintains and supports searches on B-trees, which are used for SQL Server indexes. An index is structured as a tree, with a root page and intermediate-level and lower-level pages (or branches). A B-tree groups records that have similar index keys, thereby allowing fast access to data by searching on a key value. The B-tree's core feature is its ability to balance the index tree. (B stands for balanced.) Branches of the index tree are spliced together or split apart as necessary so that the search for any given record always traverses the same number of levels and thus requires the same number of page accesses.

The traversal begins at the root page, progresses to intermediate index levels, and finally moves to bottom-level pages called leaf pages. The index is used to find the correct leaf page. On a qualified retrieval or delete, the correct leaf page is the lowest page of the tree at which one or more rows with the specified key or keys reside. SQL Server supports both clustered and nonclustered indexes. In a nonclustered index, shown in Figure 3-3, the leaf level of the tree (the leaf pages of the index) contains every key value in the index along with a bookmark for each key value. The bookmark indicates where to find the referenced data and can have one of two forms, depending on whether the base table has a clustered index. If the base table has no clustered index, the table is referred to as a heap. The bookmarks in nonclustered index leaf pages for a heap are pointers to the actual records in which the data can be found, and these pointers consist of a row ID (RID), which is a file number, a page number, and a row number on the page. If the base table has a clustered index, the bookmark in any nonclustered index leaf page contains the clustered index key value for the row.

After reaching the leaf level in a nonclustered index, you can find the exact location of the data, although you still must separately retrieve the page on which that data resides. Because you can access the data directly, you don't need to scan all the data pages to find a qualifying row. Better yet, in a clustered index, shown in Figure 3-4, the leaf level actually contains the full data rows, not simply the index keys. A clustered index keeps the data in a table logically ordered around the key of the clustered index, and the leaf pages of a clustered index are in fact the data pages of the table. All the data pages of a table with a clustered index are linked together in a doubly linked list. Following the pages, and the rows on those pages, from the first page to the last page provides the logical order to the data.

Because data can be ordered in only one way, only one clustered index can exist per table. This makes the selection of the appropriate key value on which to cluster data an important performance consideration.

You can also use indexes to ensure the uniqueness of a particular key value. In fact, the PRIMARY KEY and UNIQUE constraints on a column work by creating a unique index on the column's values. The optimizer can use the knowledge that an index is unique in formulating an effective query plan. Internally, SQL Server always ensures that clustered indexes are unique by adding a 4-byte uniqueifier to clustered index key values that occur more than once. This uniqueifier becomes part of the key and is used in all levels of the clustered index and in references to the clustered index key through all nonclustered indexes.
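
As a small illustration of that last point, the hypothetical table below acquires a unique clustered index automatically when its PRIMARY KEY constraint is created:

    -- Hypothetical table; the PRIMARY KEY constraint is enforced by a unique clustered index
    CREATE TABLE dbo.OrderDemo
    (
        OrderID  int NOT NULL CONSTRAINT PK_OrderDemo PRIMARY KEY CLUSTERED,
        Quantity int NOT NULL
    )

    -- sp_helpindex lists the unique clustered index that was created to enforce the constraint
    EXEC sp_helpindex 'OrderDemo'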

Figure 3-3. A nonclustered index with the leaf level containing bookmarks—either a RID or a clustered index key value.

Since SQL Server maintains ordering in index leaf levels, you do not need to unload and reload data to maintain clustering properties as data is added and moved. SQL Server always inserts rows into the correct page in clustered sequence. For a clustered index, the correct leaf page is the data page in which a row is inserted. For a nonclustered index, the correct leaf page is the one into which SQL Server inserts a row containing the key value (and bookmark) for the newly inserted row. If data is updated and the key values of an index change, or if the row is moved to a different page, SQL Server's transaction control ensures that all affected indexes are modified to reflect these changes. With transaction control, index operations are performed as atomic operations. The operations are logged and fully recovered in the event of a system failure.

For more details on index structures and maintenance, see Chapter 7.

Figure 3-4. A clustered index with the data located at the leaf level.

Locking and Index Pages


As you'll see later, in the section on the lock manager, pages of an index use a slightly different locking mechanism than regular data pages. A lightweight lock called a latch is used to lock upper levels of indexes. Latches are not involved in deadlock detection because SQL Server 2000 uses "deadlock-proof" algorithms for index maintenance.

You can customize the locking strategy for indexes on a per-table or per-index basis. You can use the system stored procedure sp_indexoption to enable or disable page or row locks for any particular index or, by specifying a table name, for every index on that table. The settable options are DisAllowPageLocks and DisAllowRowLocks. If both of these options are set to TRUE for a particular index, only table-level locks are applied.
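
For example, the following sketch (run in the pubs sample database) disallows page locks on every index of the titles table and row locks on one specific index:

    USE pubs
    -- Disallow page locks on all indexes of the titles table
    EXEC sp_indexoption 'titles', 'DisAllowPageLocks', TRUE
    -- Disallow row locks on a single index
    EXEC sp_indexoption 'titles.titleind', 'DisAllowRowLocks', TRUE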

The Page Manager and the Text Manager

The page manager and the text manager cooperate to manage a collection of pages as named databases. Each database is a collection of 8-KB disk pages, which are spread across one or more physical files. (In Chapter 5, you'll find more details about the physical organization of databases.)

SQL Server uses eight types of disk pages: data pages, text/image pages, index pages, Page Free Space (PFS) pages, Global Allocation Map (GAM and SGAM) pages, Index Allocation Map (IAM) pages, Bulk Changed Map pages, and Differential Changed Map pages. All user data, except for the text, ntext, and image datatypes, is stored on data pages. These three datatypes, which are used for storing large objects (up to 2 GB each of text or binary data), can use a separate collection of pages, so the data is not typically stored on regular data pages with the rest of the rows. Instead, a pointer on the regular data page identifies the starting page and offset of the text/image data. However, in SQL Server 2000, large object data that contains only a few bytes can optionally be stored in the data row itself. Index pages store the B-trees that allow fast access to data. PFS pages keep track of which pages in a database are available to hold new data. Allocation pages (GAMs, SGAMs, and IAMs) keep track of the other pages. They contain no database rows and are used only internally. Bulk Changed Map pages and Differential Changed Map pages are used to make backup and recovery more efficient. I'll cover all the types of pages extensively in Chapters 5, 6, and 8.

The page manager allocates and deallocates all types of disk pages, organizing extents of eight pages each. An extent can be either a uniform extent, for which all eight pages are allocated to the same object (table or index), or a mixed extent, which can contain pages from multiple objects. If an object uses fewer than eight pages, the page manager allocates new pages for that object from mixed extents. When the size of the object exceeds eight pages, the page manager allocates new space for that object in units of entire uniform extents. This optimization prevents the overhead of allocation from being incurred every time a new page is required for a large table; this overhead is incurred only every eighth time. Perhaps most important, this optimization forces data of the same table to be contiguous, for the most part. At the same time, the ability to use mixed extents keeps SQL Server from wasting too much space if a database contains many small tables.

To determine how contiguous a table's data is, you use the DBCC SHOWCONTIG command. A table with a lot of allocation and deallocation can get fairly fragmented, and rebuilding the clustered index (which also rebuilds the table) or running DBCC INDEXDEFRAG can improve performance, especially when a table is accessed frequently using ordered index scans. I'll discuss these two commands in detail in Chapter 8.
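
Here is a minimal sketch of both commands against the pubs sample database; the fragmentation numbers reported will depend on the table's history of inserts and deletes:

    USE pubs
    -- Report fragmentation information for the titles table
    DBCC SHOWCONTIG ('titles')
    -- Defragment the leaf level of one index without taking the table offline
    DBCC INDEXDEFRAG (pubs, 'titles', 'titleind')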

The Transaction Manager

A core feature of SQL Server is its ability to ensure that transactions follow the ACID properties (discussed in Chapter 2). Transactions must be atomic—that is, all or nothing. If a transaction has been committed, it must be recoverable by SQL Server no matter what—even if a total system failure occurs one millisecond after the commit was acknowledged. In SQL Server, if work was in progress and a system failure occurred before the transaction was committed, all the work is rolled back to the state that existed before the transaction began. Write-ahead logging makes it possible to always roll back work in progress or roll forward committed work that has not yet been applied to the data pages. Write-ahead logging ensures that a transaction's changes—the "before and after" images of data—are captured on disk in the transaction log before a transaction is acknowledged as committed. Writes to the transaction log are always synchronous—that is, SQL Server must wait for them to complete. Writes to the data pages can be asynchronous because all the effects can be reconstructed from the log if necessary. The transaction manager coordinates logging, recovery, and buffer management. These topics are discussed later in this chapter; at this point, we'll just look at transactions themselves.

The transaction manager delineates the boundaries of statements that must be grouped together to form an operation. It handles transactions that cross databases within the same SQL Server, and it allows nested transaction sequences. (However, nested transactions simply execute in the context of the first-level transaction; no special action occurs when they are committed. And a rollback specified in a lower level of a nested transaction undoes the entire transaction.) For a distributed transaction to another SQL Server (or to any other resource manager), the transaction manager coordinates with the Microsoft Distributed Transaction Coordinator (MS DTC) service using operating system remote procedure calls. The transaction manager marks savepoints, which let you designate points within a transaction at which work can be partially rolled back or undone.
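
A quick sketch of a savepoint, using the pubs sample data: the partial rollback undoes only the work performed after the savepoint was marked.

    BEGIN TRAN
        UPDATE pubs.dbo.titles SET price = price * 1.10 WHERE type = 'business'

        SAVE TRAN PriceBump

        UPDATE pubs.dbo.titles SET price = price * 2.00 WHERE type = 'mod_cook'

        -- Undo only the second update; the first one remains part of the transaction
        ROLLBACK TRAN PriceBump
    COMMIT TRAN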

The transaction manager also coordinates with the lock manager regarding when locks can be released, based on the isolation level in effect. The isolation level in which your transaction runs determines how sensitive your application is to changes made by others and consequently how long your transaction must hold locks to protect against those changes. Four isolation-level semantics are available in SQL Server 2000: Uncommitted Read (also called "dirty read"), Committed Read, Repeatable Read, and Serializable.

The behavior of your transactions depends on the isolation level. We'll look at these levels now, but a complete understanding of isolation levels also requires an understanding of locking because the topics are so closely related. The next section gives an overview of locking; you'll find more detailed information in Chapter 14.
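
You select the isolation level for a connection with the SET TRANSACTION ISOLATION LEVEL statement; a minimal sketch of the four settings:

    -- The chosen level stays in effect for the connection until it is changed again
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED    -- Uncommitted Read ("dirty read")
    -- SET TRANSACTION ISOLATION LEVEL READ COMMITTED   -- Committed Read, the default
    -- SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
    -- SET TRANSACTION ISOLATION LEVEL SERIALIZABLE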

Uncommitted Read

Uncommitted Read, or dirty read (not to be confused with "dirty page," which I'll discuss later), lets your transaction read any data that is currently on a data page, whether or not that data has been committed. For example, another user might have a transaction in progress that has updated data, and even though it's holding exclusive locks on the data, your transaction can read it anyway. The other user might then decide to roll back his or her transaction, so logically those changes were never made. If the system were a single-user system in which everyone queued up to access it one at a time, the changes would never have been visible to other users. In a multiuser system, however, your transaction might read those changes and take action based on them. Although this scenario isn't desirable, with Uncommitted Read you can't get stuck waiting for a lock, nor do your reads issue share locks (described in the next section) that might affect others.

When using Uncommitted Read, you give up the assurance of strongly consistent data in favor of high concurrency in the system without users locking each other out. So when should you choose Uncommitted Read? Clearly, you don't want to use it for financial transactions in which every number must balance. But it might be fine for certain decision-support analyses—for example, when you look at sales trends—for which complete precision isn't necessary and the tradeoff in higher concurrency makes it worthwhile.

Committed Read

Committed Read is SQL Server's default isolation level. It ensures that an operation never reads data that another application has changed but not yet committed. (That is, it never reads data that logically never existed.) With Committed Read, if a transaction is updating data and consequently has exclusive locks on data rows, your transaction must wait for those locks to be released before you can use that data (whether you're reading or modifying). Also, your transaction must put share locks (at a minimum) on the data that will be visited, which means that data might be unavailable to others to use. A share lock doesn't prevent others from reading the data, but it makes them wait to update the data. Share locks can be released after the data has been sent to the calling client—they don't have to be held for the duration of the transaction.

NOTE


Although a transaction can never read uncommitted data when running with Committed Read isolation, if the transaction subsequently revisits the same data, that data might have changed or new rows might suddenly appear that meet the criteria of the original query. If data values have changed, we call that a nonrepeatable read. New rows that appear are called phantoms.

Repeatable Read

The Repeatable Read isolation level adds to the properties of Committed Read by ensuring that if a transaction revisits data or if a query is reissued, the data will not have changed. In other words, issuing the same query twice within a transaction will not pick up any changes to data values made by another user's transaction. However, Repeatable Read isolation level does allow phantom rows to appear.

Preventing nonrepeatable reads from appearing is a desirable safeguard. But there's no free lunch. The cost of this extra safeguard is that all the shared locks in a transaction must be held until the completion (COMMIT or ROLLBACK) of the transaction. (Exclusive locks must always be held until the end of a transaction, no matter what the isolation level, so that a transaction can be rolled back if necessary. If the locks were released sooner, it might be impossible to undo the work.) No other user can modify the data visited by your transaction as long as your transaction is outstanding. Obviously, this can seriously reduce concurrency and degrade performance. If transactions are not kept short or if applications are not written to be aware of such potential lock contention issues, SQL Server can appear to "hang" when it's simply waiting for locks to be released.

NOTE


You can control how long SQL Server waits for a lock to be released by using the session option LOCK_TIMEOUT. I'll discuss this option, as well as many other ways of customizing locking behavior, in Chapter 16.
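
As a minimal sketch of that option, the following session waits at most five seconds for any blocked lock request before the statement fails with a lock timeout error:

    -- The timeout is specified in milliseconds; -1 (the default) means wait indefinitely
    SET LOCK_TIMEOUT 5000
    SELECT * FROM pubs.dbo.titles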

Serializable

The Serializable isolation level adds to the properties of Repeatable Read by ensuring that if a query is reissued, rows will not have been added in the interim. In other words, phantoms will not appear if the same query is issued twice within a transaction. More precisely, Repeatable Read and Serializable affect sensitivity to another connection's changes, whether or not the user ID of the other connection is the same. Every connection within SQL Server has its own transaction and lock space. I use the term "user" loosely so as not to obscure the central concept.

Preventing phantoms from appearing is another desirable safeguard. But once again, there's no free lunch. The cost of this extra safeguard is similar to that of Repeatable Read—all the shared locks in a transaction must be held until completion of the transaction. In addition, enforcing the Serializable isolation level requires that you not only lock data that has been read, but also lock data that does not exist! For example, suppose that within a transaction we issue a SELECT statement to read all the customers whose zip code is between 98000 and 98100, and on first execution no rows satisfy that condition. To enforce the Serializable isolation level, we must lock that "range" of potential rows with zip codes between 98000 and 98100 so that if the same query is reissued, there will still be no rows that satisfy the condition. SQL Server handles this by using a special kind of lock called a key-range lock, which I'll discuss in Chapter 14. The Serializable level gets its name from the fact that running multiple serializable transactions at the same time is the equivalent of running them one at a time—that is, serially. For example, suppose transactions A, B, and C run simultaneously at the Serializable level and each tries to update the same range of data. If the order in which the transactions acquire locks on the range of data is B, C, and A, the result obtained by running all three simultaneously is the same as if they were run one at a time in the order B, C, and A. Serializable does not imply that the order is known in advance. The order is considered a chance event. Even on a single-user system, the order of transactions hitting the queue would be essentially random. If the batch order is important to your application, you should implement it as a pure batch system.
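
Here is a sketch of the zip code scenario, borrowing the Northwind Customers table and its PostalCode column as a stand-in. While this transaction remains open at the Serializable level, key-range locks prevent other connections from inserting rows that would fall into the selected range.

    SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
    BEGIN TRAN
        SELECT CompanyName, PostalCode
        FROM Northwind.dbo.Customers
        WHERE PostalCode BETWEEN '98000' AND '98100'

        -- A second execution of the same query inside this transaction
        -- cannot see phantom rows inserted by other connections
    COMMIT TRAN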

SEE ALSO


For an interesting critique of the formal definitions of the ANSI isolation levels, see the companion CD, which contains a technical report called "A Critique of ANSI SQL Isolation Levels," published by the Microsoft Research Center.

The tough part of transaction management, of course, is dealing with rollback/rollforward and recovery operations. I'll return to the topic of transaction management and recovery a bit later. But first I'll discuss locking and logging further.

The Lock Manager

Locking is a crucial function of a multiuser database system such as SQL Server. Recall from Chapter 2 that SQL Server lets you manage multiple users simultaneously and ensures that the transactions observe the properties of the chosen isolation level. At the highest level, Serializable, SQL Server must make the multiuser system perform like a single-user system—as though every user is queued up to use the system alone with no other user activity. Locking guards data and the internal resources that make it possible for many users to simultaneously access the database and not be severely affected by others' use.

The lock manager acquires and releases various types of locks, such as share locks for reading, exclusive locks for writing, intent locks to signal a potential "plan" to perform some operation, extent locks for space allocation, and so on. It manages compatibility between the lock types, resolves deadlocks, and escalates locks if needed. The lock manager controls table, page, and row locks as well as system data locks. (System data, such as page headers and indexes, are private to the database system.)

The lock manager provides two separate locking systems. The first enables row locks, page locks, and table locks for all fully shared data tables, data pages and rows, text pages, and leaf-level index pages and index rows. The second locking system is used internally only for restricted system data; it protects root and intermediate index pages while indexes are being traversed. This internal mechanism uses latches, a lightweight, short-term variation of a lock for protecting data that does not need to be locked for the duration of a transaction. Full-blown locks would slow the system down. In addition to protecting upper levels of indexes, latches protect rows while they are being transferred from the storage engine to the relational engine. If you examine locks by using the sp_lock stored procedure or a similar mechanism that gets its information from the syslockinfo system table, you won't see or be aware of latches; you'll see only the locks for fully shared data. However, counters are available in the System Monitor to monitor latch requests, acquisitions, and releases. These will be discussed in Chapter 17.
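
For example, the sp_lock procedure mentioned above can be run with no arguments or restricted to a particular server process ID (the spid value 52 below is just an illustration):

    -- List the locks currently held by all connections
    EXEC sp_lock
    -- List only the locks held by one server process ID
    EXEC sp_lock 52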

Locking is an important aspect of SQL Server. Many developers are keenly interested in locking because of its potential effect on application performance. Chapter 14 is devoted to the subject, so I won't go into it further here.

Other Managers

Also included in the storage engine are managers for controlling utilities such as bulk load, DBCC commands, backup and restore operations, and the Virtual Device Interface (VDI). VDI allows ISVs to write their own backup and restore utilities and to access the SQL Server data structures directly, without going through the relational engine. There is a manager to control sorting operations and one to physically manage the files and backup devices on disk.


