Reading and Writing the Data Pages | Microsoft SQL Server 2005: The Complete Reference: Full Coverage of all New and Improved Features

In previous versions of SQL Server, versions 6.5 and earlier, and other legacy DBMS products, the storage functionality and the relational functionality were part and parcel of a single unit. Since version 7.0 the two areas of functionality are encapsulated in separate engines: the storage engine and the relational engine. Both engines include a number of submodules responsible for the reading and writing of data. For the most part, you do not need to concern yourself with the core architecture; however, an understanding of how SQL Server accesses data will help you decide if you need to improve performance with additional or fewer indexes, or if you need to rewrite queries and so forth.

The role played by the storage engine was described in Chapter 2, but it works closely with the relational engine, across the OLE DB “bridge” that is the means by which the two components interface. This is illustrated in Figure 3–1. All read requests are generated by the relational engine, which makes use of specialized access algorithms (encapsulated in access objects) to read data in tables, with or without indexes or other objects.

image from book
Figure 3–1: The relational engine and the storage engine are separate components of SQL Server 2005 (all editions)

Access methods are not part of any standard, such as SQL. The language is too abstract for that (although I will discuss Data Definition Language and the syntax for creating and working with indexes in Chapter 12). And access methods are proprietary; each vendor is free to do something special.

The relational engine determines the access method it needs to use to obtain data. These methods might involve local table scans, remote table scans, foreign tables scans, native files reads, index scans, keyed reads, and so on. The combination of these methods determines the general pattern of reads used by SQL Server. SQL Server can make use of a number of different access methods, including the Indexed Sequential Access Method (ISAM), which is used by desktop database systems such as Microsoft Access and Microsoft FoxPro. The database architecture discussed in Chapter 2 makes the reading of data as fast and efficient as possible.

How Data Is Saved

SQL Server works with a singly linked list that contains the addresses of free buffer pages. As soon as a data-reading thread requires a buffer page, it uses the first page in the free buffer list. We say that data pages are dirty when they contain data that has not been written to disk. Each buffer page contains a reference counter and an indicator that specifies whether the data page contains dirty data. Every time a query references a buffer page, the reference counter is incremented.

Each instance of SQL Server maintains a lazywriter thread that has the honor of scanning through the buffer cache. Its first job upon system startup is checking the size of the free buffer list. If the free buffer list is below a certain preset point, which depends on the size of the cache, more free pages are added. The buffer pages are scanned frequently by the lazywriter thread, and at each scan the reference counter in the buffer page header is divided by 4. The remainder of the division is discarded. Finally when the reference counter is reduced to 0, the dirty page indicator is checked. If the indicator specifies that the page is dirty, the lazywriter schedules an event to save the data to disk. The writer event coincides with a free buffer page value that is predetermined internally.

The log files come into the picture because data in the buffer cache is first written to the transaction log as the transaction progresses. The purpose of writing to the log first is to allow SQL Server to roll back or roll forward the data before it is committed to the table. Then, in the event of a problem that causes suspect or no data to be written to the table on the physical disk, the commit or bad data can be rolled back. After the data is written to disk and the commit is sound, or if the data page did not contain dirty data, it is freed. Associations between the buffer pages and the physical data pages are removed. Finally the buffer page is returned to the free list. The size of the free buffer list is internally decided by SQL Server and cannot be configured by the DBA.

The writing of the transaction logs takes place outside of the regimen scheduled by the threads working on the buffer and data pages. The commit directive forces all pending log transactions to be written to disk.