The Physical Database Architecture | Microsoft SQL Server 2005: The Complete Reference: Full Coverage of all New and Improved Features

SQL Server 2005’s physical architecture, its internal level, is highly complex. The architecture was extensively overhauled in version 8.0 (SQL Server 2000), and this version builds on those foundations. Earlier versions of SQL Server were very different on the inside, so if you are here from version 7.0, by-passing version 8.0 on the way over to 9.0, you need to listen up.

Pages and Extents

Pages and extents are to SQL Server what cells and combs are to a beehive. Pages contain the rows, which contain the data. Collections of pages make up extents. Extent collections are essentially what makes up your database. This next section explores pages and extents in more detail.

Pages

Looking at the internal level, we find that the fundamental base data storage object is the page. Each page is only 8KB in size, which means that a SQL Server 2005 database can store 128 pages in a megabyte.

Pages are managed and controlled in a similar fashion to other Windows OS objects. First, the page is identified with a globally unique identifier or object ID. Second, each page contains a 96-byte header that is used to store pertinent information about the page, including its type, available space, and so on. Table 2–1 lists the six types of pages that are used by the SQL Server 2005 database.

Table 2–1: Page Types Used by SQL Server Databases
Page Type	Contents
Data	Pages that contain the data rows with all data, except text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data, when “text in row” is set to ON.
Index	All index entries.
Text/Image	The large object data types of text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml data. Also contains the variable length columns when the data row exceeds 8KB of varchar, nvarchar, varbinary, and sql_variant.
Global Allocation Map (GAM), Shared Global Allocation Map (SGAM)	Indicates if extents are allocated.
Page Free Space	Information about page allocation and free space available on pages.
Index Allocation Map (IAM)	Information about extents used by a table or index per allocation unit.
Bulk Changed Map	Information about extents modified by bulk operations since the last BACKUP LOG statement per allocation unit.
Differential Changed Map	Information about extents that have changed since the last BACKUP DATABASE statement per allocation unit.

Data pages store all the data in the rows except, as indicated, large amounts of data for the types text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xml. The “(max)” prefixed types and “xml” are new to SQL Server 2005. The aforementioned types are large extents of character and binary information, which are stored in separate pages (more about the large object data types later). The Text/Image data pages are also used for varchar, nvarchar, varbinary and sql_variant when the data row exceeds 8KB, as indicated in Table 2–1.

The layout of a data page is illustrated in Figure 2–9, which illustrates that the data is laid out serially on the pages in rows. The rows start from the top of the page, immediately after the header. Each page contains a region that stores the row offsets for the data pages. These row offsets are stored at the end of the data pages. The row offsets contain one entry for each row on the page. The row offsets also indicate how far the first byte of the row is from the start of the page. The row offsets are started at the end of the page-in other words, the first row offset is the last offset, stored at the end of the page. Each row on a page can hold up to 8,060 bytes of data. And the rows cannot span to other pages.

image from book
Figure 2–9: A SQL Server data page

Extents

Extents are used to allocate space to tables and indexes. Each extent contains up to eight contiguous pages, which means that each extent is a maximum of 64KB in size. The database can thus maintain 16 extents per available MB of storage space.

There are two types of extents in SQL Server 2005, uniform extents and mixed extents. Uniform extents are the property or ownership of a single object. In other words, the object that owns the uniform extent also owns all eight pages in the extent. The mixed extent can be shared by up to eight objects. Each object is allocated a page. The object retains ownership of the page until its data requirement exceeds the limits of the page, after which it is assigned to a uniform extent.

Database Files and Filegroups

As discussed earlier, SQL Server databases are made up of a collection of three files: the primary data files, the secondary data files, and the log files. The data pages described previously are stored in these database files. Transaction logs are of course stored in the log files.

SQL Server gives these files the .mdf, .ndf, and .ldf extensions respectively but does not enforce these extensions for its own functionality. It is not recommended that you change them, however, especially if your backup mechanisms and scripts are associated with these file extensions.

When you create database files, either interactively with Management Studio or in T-SQL code, you can specify that the database files grow and shrink automatically (autoshrink). You need to specify a growth increment/decrement by which the file will grow or shrink as needed. This is illustrated in Figure 2–10. The file can also be allowed to grow until it has used all the space available to it on the system. You can also specify a maximum size for a database file.

image from book
Figure 2–10: Configuring a database interactively

Filegroups Filegroups are a useful management feature in SQL Server that let you assign database files to a group object. You can improve system performance by collecting files into groups and placing them onto specific disk drives and storage arrays. You can create filegroups for each disk drive and then assign the tables, indexes, or large object data types to these filegroups.

Log files cannot be part of a filegroup because they are maintained separately from data files. Also, you cannot assign files to more than one group. The following is a list of the three types of filegroups:

Primary The primary filegroup contains primary data. The system or built-in files are gathered up into the primary group.
User-defined The user-defined files groups can be created in T-SQL code using the FILEGROUP keyword in CREATE DATABASE and ALTER DATABASE statements.
Default The default filegroup is where all the files that have not been assigned to either primary or user-defined groups are kept. If you create a database and do not specify a group for it, it will automatically be assigned to the default group. Members of the db_owner fixed database role can manually move a file between groups. If you do not define a default group, the primary group assumes the role of default.

Filegroups are not an essential element in SQL Server 2005, but they can greatly ease the administrative burden. For example, you can create a backup regimen that backs up certain files or filegroups, instead of backing up the entire database. We will return to the filegroups in Chapters 6 and 7. Figure 2–11 illustrates working with filegroups interactively using the Management Studio.

image from book
Figure 2–11: Configuring filegroups

Space Allocation and Reuse

SQL Server’s database architecture has been designed to respond to frequent changes in the size and disposition of its databases. The net result of the wizardry in its internal mechanisms is a faster and more responsive system. This is especially important when you are working with very large databases (VLDBs), critical OLTP systems, or a large number of users. What goes on at the lower levels is not something a user can see (only feel), but both database developers and DBAs alike can benefit from knowledge of the architecture and how it operates.

At work under the “hood” is a data usage and space tracking system that appears to be relatively simple, which is its beauty. Coupled with the page-extent architecture, the system is able to easily monitor database allocation and space occupancy with little overhead. On close inspection of the pages, we find that the space allocation information is confined to only a few pages, which are densely packed. This means that you need less physical memory to hold the free space information, which translates into less use of virtual memory and thus less hard disk reads.

I alluded earlier to several types of allocation maps used by SQL Server to store the allocation of extents. Two are key to the recording of the extent allocations. These are the Global Allocation Maps (GAMs) and the Shared Global Allocation Maps (SGAMs), which are derived from the GAM bitmap.

The GAM records information that describes what extents have been allocated. Each GAM covers a whopping 64,000 extents, which translates into 4GB of data. The GAM, a bitmap, provides one bit for each extent it covers. If a bit in the map is set to 0, it means that the extent related to the bit is being used. If the bit is set to 1, it means the extent is free.

The SGAM bitmaps provide information about which extents are being used as mixed extents. The reverse order of the bits is applied in the SGAM bitmap. If the GAM bit is set to 1, then the SGAM bit is set to 0. If the SGAM bit is 0, it means the extent is either free or not in use as a uniform or full mixed extent. In other words, it could be a mixed extent whose pages are currently in use. Table 2–2 further illustrates the bitmap information for current use of an extent.

Table 2–2: The Bit States for Reporting the Current State of an Extent
State of Extent	GAM Bit State	SGAM Bit State
Free, or not in use	1	0
Uniform extent, or a full mixed extent	0	0
Mixed extent with free pages	0	1

Microsoft’s use of bitmaps to manage usage is at once simple and ingenious because it allows for an elegant means of managing usage of databases using a tried-and-tested bitmapping technique. The algorithm is simple and thus virtually bulletproof. All the server is required to do is search the bitmaps for free extents and then change the state of the GAM bit and its alter ego, the SGAM bit. To free an extent, SQL Server ensures the GAM bit is set to 1 and the SGAM bit is set to 0.

Another use of bitmaps is demonstrated with the Page Free Space (PFS) pages that record whether an individual page has been allocated and how much free space is in the page. The bitmap records whether the page is empty, up to 50 percent full, up to 80 percent full, up to 95 percent full, or between 96 and 100 percent full. SQL Server uses this information when it needs to find free space for a row, or when it has to allocate a new page.

Closer inspection is required to fully appreciate the architecture at work here. Figure 2–12 provides a conceptual view of the data file and the order and sequence of pages in the data file. The sequence is not accidental. It allows SQL Server to search the data files because it always knows where to find the GAM page, which is page 2 in a zero-based collection (it is actually the third page in the data file; the file header at page 0 is the first page).

image from book
Figure 2–12: The order of pages in the data file

Space Used by Indexes Indexes can consume a ton of space in databases. Index Allocation Map (IAM) bitmaps are used by SQL Server to keep track of the extents in databases that indexes use. Like GAMs and SGAMs, IAMs are stored in the respective databases that contain the objects in question. These IAM pages are allocated as needed and are stored at random in the file. SQL Server uses IAM pages to allocate extents to objects in the same fashion it allocates pages and extents to table data.

How Tables and Indexes Are Organized in SQL Server

The SQL Server table pages we described earlier are organized according to a system of linked or unlinked lists. When a clustered index is defined for a table or a view, the data rows are stored in the pages in the order based on the clustered index key. The data pages are linked by a doubly linked list, and the index is implemented as a B-tree index structure that provides a fast retrieval mechanism based on the clustered index key

Data rows that are not stored in any order are called heaped tables and do not require their data pages to be sorted in any order either. And the data pages are not linked in a linked list. Rows in heaped tables are found by scanning the IAM pages that manage the space allocated to a heap. SQL Server essentially scans the IAMs serially to find the extents holding pages for heap data. The table scan on a heap takes place as follows:

SQL Server reads the first IAM in the filegroup and scans all extents defined by the IAM.
SQL Server repeats the process for each IAM in the file.
SQL Server repeats the preceding steps for every file in the database or filegroup until the last IAM for the heap has been processed.

A single instance of SQL Server can support up to 249 nonclustered indexes on each table or indexed view. Nonclustered indexes also have a B-tree index structure similar to the one established for clustered indexes. However, nonclustered indexes have no control over the order of the data rows. In other words, the only time that data pages are sorted according to a system or order is when clustered indexes are established for the tables.

Every table, index, and indexed view is represented by a row inserted in the sysindexes table in its respective database. The row is uniquely identified by the combination of the object identifier (id) column and the index identifier (indid). The allocation of the tables, indexes, and indexed views is managed by a chain of IAM pages. A FirstIAM column in sysindexes points to the first IAM page in the chain of IAM pages managing the space allocated to the table, indexed view, or index.

Heaps are also allocated rows in the sysindexes table but have an index identifier of 0. When dealing with a heap, SQL Server uses the IAM pages to find the heap table pages in the data page collection because they are not linked.

Clustered indexes, tables, and views each have a representative row in sysindexes that has an index identifier or indid of 1. The root column points to the top of the clustered index B-tree to allow the server to use the index B-tree to locate data pages.

Nonclustered indexes, tables, and views also have rows in sysindexes. The values, however, of the indid range from 2 to 250. The root column thus points to the top of the nonclustered B-tree.

Tables that also have a text, ntext, or image column also have rows in sysindexes with an indid of 255. The FirstIAM column points to the chain of IAM pages that manage the text, ntext, image, nvarchar(max), varchar(max), varbinary(max), and xmlpages.

Transaction Logs

Transaction logs can be studied as components, at both the conceptual and internal levels of the database; from a physical perspective as well as a logical one. The subject is important enough to warrant introduction here under its own heading. Transaction logs are the footprints of SQL Server transactions. Not only can they trace transaction activity, but also they provide a near-perfect mechanism for recovery and redundancy of the data stored in tables. The transaction logs are capable of the following functions:

They can be used to recover individual transactions. These are known as rollbacks in database server parlance. If a transaction’s data is determined to be “questionable,” the transaction that led to the insertion of the questionable data can be rolled back. The questionable data is removed from the database. The ROLLBACK statement is akin to the instant replay you see on television or in films.
They can be used to recover incomplete transactions. Incomplete transactions may occur when SQL Server crashes in the middle of a transaction…which should never happen. The incomplete transaction is recovered when SQL Server restarts. When SQL Server restarts, it runs a recovery of each database. If certain transactions are found not to have completed (but they are sound in the log), they are rolled forward. At the same time, any transactions found to have caused questionable data are rolled back.
They can be used to bring a database current from the last full or differential backup to the last transaction in the transaction log. It may, for example, happen that tables are damaged and you will need to restore a database from the last backup. The transaction log closes the “gap” between the last backup and the last transaction in the log. Any transactions that are not in the restored database are rolled forward and installed to the database.

The following list describes some of the operations stored in the transaction logs:

The start and end of every transaction
All INSERT, UPDATE, and DELETE transactions, including any transaction performed by system processes
All extent allocations and deallocations
The creation or termination of tables and indexes

Transaction Log Architecture

SQL Server transaction logs use a write-ahead log mechanism. This means that SQL Server ensures that no data updates are written to disk until they have been written to the log. The log record must in fact be written to disk before the data pages in the buffer are written to disk. In other words, SQL Server can ensure that a dirty page is not flushed before the log record associated with the dirty page is flushed (see the section “How Data Is Saved” in Chapter 3).

Note

Bulk insert and copy operations can cause a transaction log to fill up quickly. When you need to insert huge amounts of data for which you don’t need transaction logs, nonlogged operations are supported. This is discussed in Chapters 16 and 17.

Transaction logs are not tables but exist as separate files or collections of files in the database. The layout or structure of the data pages has no bearing on the log files. The log cache is also managed separately from the data cache. The log cache is, however, controlled by the database engine, but the code that is executed is not related to any data page functionality.

Log files are implemented over several files as needed. Like databases, they can be configured to “autogrow” as needed by the logging engine. This prevents you from running out of space in the transaction log (although truncation plays an important part here too). However, you will still need to make sure you do not run out of hard disk space. Chapters 7 and 8 also look at the truncation of transaction logs, which is an important subject for DBAs, because if stale log records are not deleted from the transaction log, your log files will just keep growing until your server runs out of hard disk space.