SQL Server I0 Architecture | Microsoft SQL Server 2005: The Complete Reference: Full Coverage of all New and Improved Features

SQL Server I/0 Architecture

The I/O architecture, which defines how data is read from and written to the databases, is the architecture underpinning all database management systems. It is also one of the key components of these products and one of the key architectures that sets them apart from one other. The system that has the fastest, leanest, and most innovative I/O architecture wins the transaction-load war and ultimately market share, as discussed earlier on the subject of TPC benchmarks.

SQL Server 2005 has lofty goals set for it. It is currently cleaning up. On the one hand, it must meet the needs of small businesses, while on the other, it competes with the heavyweights of the world. I believe Microsoft met the challenge, because SQL Server can scale from a single-user solution to an Internet deployment supporting thousands of users. Let’s take a closer look at the I/O architecture that achieves this feat.

There are two locations from which an instance of SQL Server can read (and write) data: from memory and from the physical hard disk. It does not take much effort to imagine what would become of a hard disk if all the data required by clients was constantly read from the database-a barrage of continuous transactions might, literally, set the server on fire. So SQL Server, like all DBMS systems, stores a virtual copy of its most accessed data, and most recently accessed data, in memory In other words, the read or write does not require the movement of physical parts.

When SQL Server starts and data is requested, it is loaded into virtual memory and occupies spaces allocated for it in the so-called buffer cache we discussed earlier. As you learned in Chapter 2, the data is stored in 8KB data pages on disk. The pages are made contiguous to form 64KB extents. Accordingly, the virtual data pages in the buffer cache are also 8KB in size.

Now, a virtual read (sometimes referred to as a “logical read,” a term I have a problem with) occurs every time a client requests data that is stored in the buffer cache. If the data required is not in the buffer cache, SQL Server has to get the data from the hard disk, a physical read. Conversely, a virtual write takes places when data is updated or changed in memory. By virtual write, we mean that the data is updated in the cache as opposed to being updated on the disk. However, the data is not persistent, and if the electrical power that provides life to the cache is terminated, data loss ensues. From time to time, data must then be physically written to disk. But performance takes precedence over persistence, and the DBMS pushes the virtual management envelope by reading and writing to data more than once before it is saved to the disks.

The DBMS thus has to maintain a suitable buffer so that perf ormance is maintained and the server is not forced to write and read excessively to and from the hard disk. However, the larger the buffer cache used by SQL Server, the less memory resources are made available to other applications. A point of diminishing returns arrives when the server saves on physical I/O but loses through the excessive paging that occurs to support other applications.

SQL Server pushes the so-called I/O envelope by using two Windows Server 2003 I/O features: Scatter-Gather I/O and Asynchronous I/O. Scatter-Gather I/O has often been cited as a killer feature of Windows Server 2003, but it was introduced to the operating system with Windows NT 4.0 SP2 many years ago. No doubt it was a target for enhancement in Windows Server 2003, and it is now built into the operating system. Essentially, this technology is responsible for moving data from disconnected areas of RAM to contiguous blocks of data on a hard disk. While you, the DBA or system administrator, do not have any control over the operating system’s use of this technology, systems and software have to be specifically written to take advantage of it. SQL Server 2005 is one such system-actually, Scatter-Gather I/O was originally created for SQL Server.

Note

This is not exactly the same thing that the VMM does, because the shunting around of data by the VMM occurs without the cognizance of the application.

The performance boost for SQL Server is more obvious when the DBMS reads data from the hard disk. For example: If SQL Server reads in a 64KB extent, it does not have to reserve a 64KB area in the buffer cache for the data. It can scatter the eight buffer pages directly into the buffer cache and later gather the eight pages by maintaining the cache addresses of the eight buffer pages.

Another I/O feature SQL Server makes heavy use of is Asynchronous I/O. If I/O were synchronous, every time SQL Server spawned a thread to read or write data, the entire system would have to wait for the I/O operation to complete before regaining control and moving on with the next process. But under Asynchronous I/O, SQL Server can carry on with other tasks while checking back periodically on the status of the I/O thread.

Using Asynchronous I/O functionality, SQL Server can spawn multiple concurrent I/O operations against files. You can throttle the maximum number of I/O operations by setting the max async io configuration option. The default is 32 threads of I/O execution for the 32-bit systems and 64 threads for the 64-bit systems. Only the most demanding of applications will require you to set more than the default. This is now a good point to launch into the subject of SQL Server’s thread and task architecture.