The Silent Killer: IO Problems | Professional SQL Server 2005 Administration (Wrox Professional Guides)

The Silent Killer: I/O Problems

Many times, customers complain about their SQL Server performance and point to the database because the processors aren't that busy. After a discussion and a little elbow grease, frequently the culprit is an I/O bottleneck. The confusion comes from the fact that disk I/O is inversely proportional to CPU. In other words, over time, the processors are waiting for outstanding data requests that are queued on an overburdened disk subsystem. This section is about laying out the SQL Server shell or container on disk and configuring it properly to maximize the exploitation hardware resources. Scaling any database is a balancing act based on moving the bottleneck to the least affecting resource.

SQL Server I/O Process Model

Windows Server 2003 and SQL Server 2005 storage engine work together to mask the high cost of a disk I/O request. The Windows 2003 I/O Manager handles all I/O operations. The I/O Manager fulfills all I/O (read or write) requests by means of scatter-gather or asynchronous methods. For examples of scatter-gather or asynchronous methods, refer to Books Online under "I/O Architecture."

The SQL Server storage engine manages when, how, and the number of disk I/O operations performed. However, the Windows operating system (I/O Manager Subsystem) performs the underlying I/O operations and provides the interface to the physical media. That is why we always recommend running SQL Server 2005 on the latest version of Windows. There are behavioral differences in Windows Server 2003, Windows Server 2003 with SP1, and Windows Server 2003 R2 that SQL Server 2005 can take advantage of. See the http://www.microsoft.com/windowsserver2003/ site under the "Compare the Editions of Windows Server 2003" heading for more specifics. Definitely have a look at the Symmetric Multiprocessing (SMP) under the "Hardware Specifications" section.

The job of the storage engine is to manage or mitigate as much of the cost of these I/O operations as possible. For instance, the storage engine allocates much of its virtual space to a buffer cache. This cache is managed via cost-based analysis to ensure that memory is optimized to efficiently use its space for content. Data that is frequently updated or requested is maintained in memory. This benefits the user's request by performing a logical I/O and avoiding expensive physical I/O requests.

Database File Placement

SQL Server stores its database on the operating system files (physical disks or LUNS). The database is made up of three file types: a primary data file (MDF), one or more secondary data file(s) (NDF), and a transaction log (LDF).

Database (file) location is critical to the performance of the DBMS and its corresponding I/O. Using a fast and independent I/O subsystem for database primary data files is the right thing to do. As described in Chapter 11, available disk space does not equate better performance. Rather, the more physical spindles there are, including LUNS, the better your system will perform. Data can be stored according to usage across data files and filegroups that span many physical disks. A filegroup is a collection of data files used in managing database primary data-file placement.

Read-only filegroups are a new SQL Server 2005 feature. They allow for reference data that rarely gets updated (or archived data) to be placed in a read-only filegroup. This filegroup, along with additional filegroups, can be part of an updatable database.

Note

In SQL Server 2005, the use of MDF, NDF, and LDF file extensions is now optional.

There are functional changes to tempdb to take into consideration when doing your primary data-file placement strategy.

Tempdb considerations

When SQL Server is restarted, tempdb is the only database that returns to the default size of 8MB, and it will continue to grow based on requirements. During the autogrow operation worker, threads can lock database resources during the database-growth operation, affecting server concurrency. To avoid timeouts, the autogrow operation should be limited to under two minutes.

In SQL Server 2005, tempdb has taken on support for a whole new set of features. tempdb still consists of a primary data and log file, but it also has two version stores. A version store is a collection of data pages that hold data rows required to support a particular feature. A "common" version store is used by numerous features:

Optimizing Bulk Import Performance
Using Common Table Expressions
WITH common_table_expression (Transact-SQL)
About Choosing a Cursor Type
Database Mail
DBCC CHECKDB (Transact-SQL)
Optimizing DBCC CHECKDB Performance
Understanding Event Notifications
tempdb and Index Creation
Special Guidelines for Partitioned Indexes
Disk Space Requirements for Index DDL Operations
Index Disk Space Example
How Online Index Operations Work
Using Large-Value Data Types
Using Multiple Active Result Sets (MARS)
Using Query Notifications
Execution Plan Caching and Reuse
Understanding Row Versioning Isolation Levels
Row Versioning Resource Usage

There is also a dedicated "online-index-build" version store for the online index building process.

Placing the tempdb database on an isolated and fast I/O subsystem to ensure good performance is a good place to start. Unfortunately, it probably will not be enough. As we just mentioned, the tempdb database now supports user objects (like temporary tables), internal objects (intermediate sort results), and functionality-based requirements on the version stores. There has been a ton of work performed on tempdb internals to improve scalability.

Note

Consider reading BOL under "Capacity Planning for tempdb" for additional information and functionality details regarding tempdb usage.

We recommend that you do some type of capacity planning for tempdb to ensure that it's properly sized and can handle the needs of your enterprise. At a minimum, we recommend the following:

Take into consideration the size of your existing tempdb.
Monitor tempdb while running your largest-known affecting process.
Rebuild the index of your largest table online while monitoring tempdb. Don't be surprised if this number turns out to be two times the table size, as this process now takes place in tempdb.

The SQL Team recommended the following query to monitor tempDB at PASS last year, and we think you should use it. This query identifies and expresses tempdb space used, in kilobytes, by internal objects, free space, version store, and user objects.

 select sum(user_object_reserved_page_count)*8 as user_objects_kb,    sum(internal_object_reserved_page_count)*8 as internal_objects_kb,    sum(version_store_reserved_page_count)*8 as version_store_kb,    sum(unallocated_extent_page_count)*8 as freespace_kb from sys.dm_db_file_space_usage where database_id = 2

The output of this query looks like this:

 user_objects_kb      internal_objects_kb  version_store_kb     freespace_kb -------------------- -------------------- -------------------- -------------------- 256                  640                  0                    6208

Note

If any of these stores run out of space, tempdb operations will cease.

Taking into consideration the preceding results, the following steps are recommend:

Avoid autogrow. Preallocate space for tempdb files based on the results of your testing, but leave autogrow enabled in case tempdb runs out of space.
Create one database file per system CPU or processor core (all equal in size).
Set tempdb to simple recovery model (allows for space recovery).
Set autogrow to 10 percent.
Place tempdb on its own fast and independent I/O subsystem.
Create alarms that monitor the environment by using SQL Server agent or Microsoft Operations Manager with SQL Pack to ensure that you never get error 1101 or 1105 (tempdb is full). This is crucial because the server stops processing inserts. Right-click SQL Server Agent in the SQL Server Management Studio and fill in the screen, as shown in Figure 12-2.
Use instant file initialization. If you are not running the SQL Server (MSSQLSERVER) Service account with admin privileges, make sure that the SE_MANAGE_VOLUME_NAME permissions have been assigned to the services account. This feature can take a 15-minute file-initialization process down to about a second for the same process.

image from book
Figure 12-2

Another great tool is the sys.dm_db_task_space_usage DMV, which provides insight into tempdb's space consumption on a per-task basis. Keep in mind that once the task is complete, the counters reset to zero.

In addition, you should monitor the disk per Avg. Sec/ Reads; numbers consistently above 30MS are a serious I/O bottleneck issue. If you have very large tempdb requirements, look at Q-917047, "Microsoft SQL Server I/O subsystem requirements for tempdb database" or in BOL for "Optimizing tempdb Performance." We are not trying to scare anyone with SQL Server 2005 tempdb requirements, but we are trying to convey the fact that you may have to put some serious work into tempdb capacity planning.

Note

A concurrency enhancement for tempdb is available via trace flag database T1118. See Q328551 for additional information.