Storage Engine and IO Changes | Microsoft SQL Server 2005: Changing the Paradigm (SQL Server 2005 Public Beta Edition)

Storage Engine and I/O Changes

The storage engine is an integral component of the SQL Server architecture, and it is responsible for managing database files, building and reading physical pages, controlling concurrency, handling logging and recovery, carrying out physical I/O, handling table and index traversal, and so on. SQL Server 2005 introduces several enhancements to the storage engine and physical I/O. Here is an overview of these enhancements:

Instant file initialization On Windows XP and Windows Server 2003, SQL Server 2005 can instantly initialize data files, without filling the reclaimed disk space with zeros. This can be a huge time saver when you're creating very large databases (for instance, while creating databases for a restore operation). This feature is discussed in detail in Chapter 8, "Reliability and High Availability in the Database Engine."
Data file autogrow In SQL Server 2000, the default FILEGROWTH setting for data files was 10%. SQL Server 2005 changes this to 1MB. The log file FILEGROWTH setting of 10% remains unchanged.
Efficient read-ahead The read-ahead algorithm has been updated to dynamically detect whether read-ahead is required, despite bad estimates from the optimizer.
Multi-path I/O (MPIO) SQL Server 2005 leverages the Windows Server 2003 MPIO capability, which allows multiple host bus adapters (HBAs) to be used for concurrent data access. MPIO enables applications to diversify requests to storage in order to circumvent a single point of failure. MPIO dynamically balances the I/O workload.
Allocation units (AUs) The SQL Server 2000 storage engine made use of index application map (IAM) pages to find out about the extents (that is, groups of eight pages) used by a heap (that is, a group of tables with no clustered index) or an index. A heap or an index had at least one IAM for each file on which it had allocated extents. The storage engine could read the IAM chain to build a sorted list of the disk addresses that needed to be read. There was a single IAM chain per heap/index in SQL Server 2000.
In SQL Server 2005, each heap or index can have multiple IAM chains. This change has been made to support large object (LOB) data types, such as varchar(max), and also to support row overflow data exceeding 8KB. SQL Server 2005 relaxes the restriction of a maximum of 8060 bytes per row for columns containing varchar, nvarchar, varbinary, sql_variant, or CLR user-defined type columns. Therefore, the following batch would fail in SQL Server 2000, but it succeeds in SQL Server 2005:
```
USE tempdb; CREATE TABLE dbo.tblTest(    c1 varchar(4000),    c2 varchar(4000),    c3 varchar(4000)); GO INSERT INTO dbo.tblTest VALUES    (REPLICATE('*', 4000), REPLICATE('*', 4000), REPLICATE('*', 4000)); GO
```
In other words, a SQL Server 2005 table row can contain more than 8,060 bytes. To support this and to support LOB types, SQL Server 2005 introduces two additional IAM chains. SQL Server also introduces a new term called allocation unit that refers to a set of pages owned by an IAM chain. These are the three types of allocation units:
- An allocation unit of type IN_ROW_DATA is used to manage data or index rows that contain all data except LOB data.
- An allocation unit of type LOB_DATA is used to manage LOB data of types text, ntext, image, xml, varchar(max), nvarchar(max), varbinary(max), or CLR user-defined data types.
- An allocation unit of type ROW_OVERFLOW_DATA is used to manage variable-length data stored in varchar, nvarchar, varbinary, or sql_variant columns that exceed the 8,060-byte row size limit.
You can use the sys.allocation_units and sys.system_internals_allocation_units catalog views to obtain information about each allocation unit in a database. Note that sys.system_internals_allocation_units is for internal use only and may change in the future.
Table and index partitioning You can significantly enhance the manageability and scalability of large tables and indexes by horizontally partitioning the data or index into multiple partitions. This feature supersedes the partitioned views functionality available in earlier releases.
Partitioning enables index and table data to be spread across multiple filegroups in partitions. A partition function defines how the rows of a table or an index are mapped to a set of partitions based on the values of certain columns, referred to as partitioning columns. A partition scheme maps each partition specified by the partition function to a filegroup. This lets you develop archiving strategies that enable tables to be scaled across filegroups and, therefore, physical devices. In some situations, partitioning may also improve performance. Partitioned tables and indexes are supported in the Enterprise and Developer Editions of SQL Server 2005 only.
SQL Server 2005 introduces new DDL statements, such as CREATE PARTITION FUNCTION and CREATE PARTITION SCHEME, and enhances existing DDL statements for creating tables and indexes, to allow you to set up table/index partitioning. A complete description of partitioning is beyond the scope of this chapter. Refer to SQL Server 2005 Books Online and whitepapers on the Microsoft website for more details.