I/O

I/O encompasses both network I/O and disk I/O. In most cases with SQL Server, you are primarily concerned with disk I/O, as that's where the data lives. However, you also need to understand the effect that poor network I/O can have as a bottleneck to performance.

Configuring I/O for a server storage system is perhaps the place where you have the most options, and it can have the largest impact on the performance of your SQL Server system. When you turn off your computer, the only thing that exists is the data stored on your hard drive. When you turn the power on, the processor starts running, the OS is loaded, and SQL Server is started; all this happens by reading data and code from the storage subsystem.

This basic concept is true for everything that happens on a computer. Everything starts its life on disk, has to be read from disk into memory, and passes from there through the various processor caches before it reaches the processor and can be used as either code or data. Any results the processor arrives at have to be written back to disk so that they persist across any system event, such as a shutdown, failure, or maintenance.

SQL Server 2005 is very sensitive to disk performance, more so than many applications, because of the manner in which it manages large amounts of data in user databases. Many applications have the luxury of being able to load all their data from disk into memory and then being able to run for long periods of time without having to access the disk again. SQL Server strives for that model, as it is by far the fastest way to get anything done. Unfortunately, when the requested operation requires more data than can fit into memory, SQL Server has to do some shuffling around to keep going as fast as it can, and it has to start writing data back to disk so it can use that memory to generate some new results.

At some point in its life, every piece of data that SQL Server uses has to come from disk, and any changes must eventually be written back to disk.

Network

Referring back to Figure 11-2, you can see that the network is a key component in any SQL Server system. The network is the link over which SQL Server receives all its requests to do something, and by which it sends all its results back to the client. In most cases, today's high-speed networks provide enough capacity to enable a SQL Server system to use all its other resources (CPU, memory, disk) to their maximum before the network becomes a bottleneck.

In some systems, the amount of work each request generates on the SQL Server is small relative to the number of requests being sent to the server, or to the amount of data being returned to the client. In either of these cases, the network can be a bottleneck. Network bottlenecks can occur anywhere in the network. They can occur on the NIC of the client, where the client is an application server sending the database server hundreds of thousands of requests per second. They can also occur in the network fabric between the server's NIC and the client (application server, Web server, or the user's workstation). This network fabric can consist of many pieces of network infrastructure, from the simplest system in which two machines are connected over a basic local area network to the most complex interconnected systems spanning the Internet or a global corporate WAN. In these larger, more complex systems, much of the network is beyond your control and may introduce bandwidth or latency issues outside acceptable limits. In these cases you can do little more than investigate, document, and report your findings.

The parts of networking that we are going to examine here are those that you may have direct control over, and all are on the SQL Server system. We are making the assumption that the remainder of the network fabric is up to the job of supporting the number of requests we receive, and of passing the results back to the client in a timely manner.

It's beyond the scope of this book to go into the details of monitoring and tuning the network outside of the SQL Server system.

Disks

The other piece of I/O is disk I/O. With earlier versions of SQL Server, disks were pretty simple, leaving you with limited options; in most cases you had at most a couple of disks to deal with. Large enterprise systems have long had the option of using SAN storage, while medium to large business systems have been able to use external disk subsystems with some form of RAID, most likely over a SCSI interface, enabling you to build disk subsystems with hundreds if not thousands of disks.

Take a moment to consider some of the basic physics involved in disk performance. It's important to understand the fundamental differences between different types of disks, as they explain differences in performance. This in turn helps you make an informed decision about what kind of disks to use. The following table demonstrates some fundamental disk information.


Disk Rotational Speed   Rotational Latency   Track-to-Track Latency   Seek Time   Data Transfer Rate   Transfer Time for 8KB   Total Latency
5,400 rpm               5.5 ms               6.5 ms                   12 ms       90MB/sec             88 µs                   12.1 ms
7,200 rpm               4.1 ms               6.5 ms                   10.7 ms     120MB/sec            66 µs                   10.8 ms
10,000 rpm              3 ms                 1.5 ms                   4.5 ms      166MB/sec            48 µs                   4.6 ms
15,000 rpm              2 ms                 1.5 ms                   3.5 ms      250MB/sec            32 µs                   3.5 ms

Rotational latency is the time it takes the disk to make a half-rotation. This figure is given for just half a rotation rather than a full rotation because on average the disk makes only a half rotation to find the sector you want to access. It is calculated quite simply: 60 divided by the rotational speed in rpm gives the time for one full revolution in seconds, and half of that is the average rotational latency.

Track-to-track latency is the time it takes the disk head to move from one track to another. In the table, the figures for the 5,400 and 7,200 rpm disks are considerably slower than those for the 10,000 and 15,000 rpm disks. This indicates that the slower disks are more likely to be ATA disks rather than SCSI disks. ATA disks have considerably slower internals than SCSI disks, which accounts for the considerable difference in price and performance between the two disk types.

Seek time is the magic number disk manufacturers publish, and it's their measure of rotational latency plus track-to-track latency. In the preceding table it is calculated as the sum of rotational latency and track-to-track latency.

Data transfer rate is the average rate of throughput. This is usually calculated as the amount of data that can be read from a single track in one revolution; for a modern disk with a data density of around 1MB per track, spinning at 7,200 rpm (120 revolutions per second), that equates to around 1MB per track × 120 revolutions per second = 120MB/sec of data that can be read from the disk.

Transfer time for 8KB is the time it takes to transfer a given amount of data at that transfer rate. To see how long it takes to transfer an 8KB block, you simply divide the amount of data you want to move by the transfer rate: 8KB ÷ 120MB/sec ≈ 66 microseconds (µs).

Total latency for a given amount of data is the sum of rotational latency, track-to-track latency, and disk transfer time.
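
To make these figures concrete, here is a quick back-of-the-envelope sketch (written in Python purely for illustration) that reproduces the 7,200 rpm row of the table; the track-to-track latency and transfer rate are simply the table's assumed values.

    def rotational_latency_ms(rpm):
        # On average the platter turns half a revolution before the sector arrives.
        revolutions_per_sec = rpm / 60.0
        return 0.5 / revolutions_per_sec * 1000.0

    def transfer_time_us(block_kb, rate_mb_per_sec):
        # Time to move block_kb of data at the sustained transfer rate, in microseconds.
        return block_kb / (rate_mb_per_sec * 1024.0) * 1000000.0

    # Assumed figures for the 7,200 rpm row of the table.
    track_to_track_ms = 6.5
    rate_mb_per_sec = 120

    rot_ms = rotational_latency_ms(7200)              # ~4.2 ms
    xfer_us = transfer_time_us(8, rate_mb_per_sec)    # ~65 us
    total_ms = rot_ms + track_to_track_ms + xfer_us / 1000.0
    print("rotational %.1f ms, transfer %.0f us, total %.1f ms" % (rot_ms, xfer_us, total_ms))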

In the table you can clearly see that for a single 8KB transfer, the largest amount of time is spent moving the head to the right position over the disk's surface. Once there, reading the data is a tiny percentage of the time.

Latency limits the disk's ability to service random read requests. Sequential read requests will have an initial latency, and are then limited by the disk's ability to read the data from the disk surface and get each I/O request back through the many layers, before reading the next sector from the disk's surface.

Write requests will often be buffered in the disk's write cache, which is bad for SQL Server because the write is acknowledged before the data is actually on disk. You should turn disk write caching off unless you are using a storage array with a battery-backed cache that can guarantee the cache will be written to disk in the event of a power failure.

Throughput in MB/sec is a measure of how many bytes the disk can transfer to or from its surface in a second. This is usually quoted as a theoretical number based on the disk's bus.

Throughput in IOs/sec is a measure of how many I/Os the disk can service per second. Most often this is quoted as a theoretical number based on the disk's bus.
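
One rough way to tie these two throughput measures back to the latency table is to treat the total latency as the cost of one random I/O; the sketch below does just that. It is an assumption-laden estimate, not a vendor figure, and it ignores queuing and caching.

    # Estimate a single disk's random 8KB I/O ceiling from the total latency figures above.
    total_latency_ms = {"5,400 rpm": 12.1, "7,200 rpm": 10.8, "10,000 rpm": 4.6, "15,000 rpm": 3.5}

    for disk, latency_ms in total_latency_ms.items():
        iops = 1000.0 / latency_ms    # one I/O completes per "total latency" interval
        print("%s: roughly %.0f random 8KB IOs/sec" % (disk, iops))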

Storage Design

After all that talk about the different pieces of a storage system, it's time to get serious about figuring out how best to configure your storage system. This is a pretty difficult thing to do, so we start with a set of guidelines and then go into the details of how you can figure this out for yourself.

No Silver Bullet in Storage Design

Simply put, there is no single simple way to configure storage that's going to suit every purpose. SQL Server systems can be required to do dramatically different things, and each implementation could require a radically different storage configuration to suit its peculiar I/O requirements.

Use the Vendors' Expertise

The vendors of each piece of hardware should be the people you turn to for expertise in how to configure their hardware. They may not necessarily know how to configure it best for SQL Server, however, so this is where you have to be able to convey SQL Server's requirements in a manner the hardware vendor might understand. This is best done by being able to quote specific figures for reads versus writes, sequential versus random I/O, block sizes, I/Os per second, MB/sec of throughput, and minimum and maximum latency. This information will help the vendor provide you with the optimal settings for their piece of the hardware, be it disks, an array controller, fiber, networking, or some other piece of the storage stack.

Every System Is Different

Each SQL Server system may have very different I/O requirements. Understand this and don't try to use a cookie-cutter approach to I/O configuration (unless you have already done the work to determine that you really do have SQL Server systems with exactly the same I/O requirements).

Simple Is Better

It is an age-old engineering concept that simpler solutions are easier to design, easier to build, easier to understand, and hence easier to maintain. In most cases, this holds true for I/O design as well. The simpler solutions invariably work faster, are more robust and reliable, and take less maintenance than more complex designs. Unless you have a very compelling, specific reason for using a complex storage design, keep it simple.

More Disks

More disks are invariably faster than fewer disks. For example, if you have to build a 1TB volume, it's going to deliver higher performance if it's built from a lot of small disks (ten 100GB disks) versus a few larger disks (two 500GB disks). This is true for various reasons. First, smaller disks are usually faster than larger disks. Second, you stand a better chance of being able to utilize more spindles to spread read and write traffic over multiple disks when you have more disks in the array. This gives you throughput that in some cases is the sum of the individual disks' throughput. For example, if the 100GB disks and the 500GB disks all delivered the same throughput of, say, 20MB/sec, then the two 500GB disks would sum to just 40MB/sec, whereas the ten smaller disks would sum to 200MB/sec, or five times more.
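
The arithmetic behind this example is trivial, but it is worth sketching because it generalizes to any layout you might be comparing; the 20MB/sec per-disk figure is the assumption from the paragraph above.

    # Compare two ways of building the same 1TB volume, assuming every disk delivers
    # roughly the same sustained throughput (the 20MB/sec figure from the example).
    def layout(disk_count, disk_gb, per_disk_mb_sec=20):
        return {"capacity_gb": disk_count * disk_gb,
                "throughput_mb_sec": disk_count * per_disk_mb_sec}

    print(layout(2, 500))    # {'capacity_gb': 1000, 'throughput_mb_sec': 40}
    print(layout(10, 100))   # {'capacity_gb': 1000, 'throughput_mb_sec': 200}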

Faster Disks

Not surprisingly, faster disks are better for performance than slower disks. However, this doesn't just mean rotational speed, which is just one factor of overall disk speed. What you are looking for is some indicator of the disk's ability to handle the I/O characteristics of the workload you are specifically interested in. Unfortunately, disk manufacturers rarely, if ever, provide any information other than rotational speed and theoretical disk bus speeds. For example, you will often see a 10K or 15K rpm SCSI disk rated as being able to deliver a maximum throughput of 300MB/sec because it's on a SCSI 320 bus. However, if you hook up that disk and start running some tests using a tool such as SQLIO, chances are good that for small-block-size, non-queued random reads you will be lucky to get much more than 2–4MB/sec from that disk. Even for the fastest I/O type, large-block sequential I/O, you will be lucky to get more than 60–70MB/sec. That's a long way from 300MB/sec.

Test

Testing is an absolutely essential part of any configuration, optimization, or performance tuning exercise. Too often we have spoken with customers who have been convinced that black is white based on absolutely nothing more than a gut feeling, or some half-truth overheard in a corridor conversation.

Until you have some test results in your hand, you don't truly know what the I/O system is doing, so forget all that speculation, which is really nothing more than poorly informed guesswork, and start testing your I/O systems to determine what's really going on.

Monitor

Once you have set up the system and tested it, you need to keep monitoring it to ensure that you are aware of any changes to the workload or to I/O system performance as soon as they start. If you have a policy of monitoring, you will also build a history of system performance that can be invaluable for future performance investigations. Trying to track down the origins of a slow-moving trend can be very hard without a solid history of monitoring data. See Chapter 14 for more specifics on what and how to monitor.

Designing a Storage System

Now let's run through the steps you need to take when designing a storage system. The following sections introduce each of the different parts of a storage system, providing some guidance on key factors, and offering recommendations where they are appropriate.

The first questions you have to answer when designing a new storage system are about the disks. How many disks do you need? What size should they be? How fast should they be?

Space

The first thing you really need to determine is how much space you need. Once you have sized the data, you need to factor in index space, growth, room for backups, and recovery. All of these factors will increase the amount of space required.
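
A simple sizing sketch along these lines is shown below. Every multiplier in it is a made-up placeholder; replace them with figures measured from your own data, indexes, growth history, and backup retention policy.

    # Hypothetical space estimate -- all factors here are illustrative assumptions.
    raw_data_gb = 200          # size of the data itself
    index_factor = 1.3         # assume indexes add about 30 percent
    growth_factor = 1.5        # assume 50 percent growth over the planning period
    backup_copies = 2          # assume two full backups kept on disk for fast restores

    data_and_index_gb = raw_data_gb * index_factor
    with_growth_gb = data_and_index_gb * growth_factor
    backup_gb = with_growth_gb * backup_copies
    print("database %.0fGB, backups %.0fGB, total %.0fGB"
          % (with_growth_gb, backup_gb, with_growth_gb + backup_gb))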

How Many Disks

The number of disks you need is going to be driven by the amount of space you need, the performance you want, and how robust the storage needs to be. As an example, here are several alternative ways to build a 1TB disk subsystem:

  • Two 500GB SATA disks with a basic SATA controller: This is the simplest and cheapest option. Use two of the latest 500GB disks. However, this is also the slowest option, and the least robust.

  • Four 500GB SATA disks with a RAID 1 controller: This is twice the cost of the previous system, and isn't any faster, but it provides a robust solution using a mirroring controller to provide redundancy.

  • Four 250GB SATA disks with a RAID 0 controller: This system is a non-robust step to improve performance. You've increased the number of disks by reducing their size. Using a striping controller, you can get some performance improvement by writing and reading from more disks at the same time.

  • Eight 125GB SCSI disks with a RAID 0 controller: Now you're starting to get into a more serious performance configuration. This uses more, smaller disks, which will provide a significant performance improvement.

With all of these configurations you need to be aware of other bottlenecks in the I/O path. The first thing to consider is the PC's bus bandwidth, and then the disk adapter's bus bandwidth. If your server is using a basic PCI bus, then this will likely be limited to around 200MB/sec, so anytime you need to do large sequential I/O to more than two to four disks, you may be limited by the PCI bus. Next is the disk bus. For example, with a SCSI 320 bus, you can max out the bandwidth with around six disks at about 260MB/sec. If you need a system able to deliver higher than 250MB/sec, you should consider using an adapter with multiple channels.
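
To see which limit you hit first as you add disks, here is a small sketch using the ballpark numbers from this paragraph; the per-disk sequential rate, the PCI figure, and the SCSI 320 figure are all rough assumptions, not measurements.

    # Which limit is reached first: the disks, the PCI bus, or the SCSI channel?
    per_disk_seq_mb = 60      # assumed large-block sequential throughput per disk
    pci_bus_mb = 200          # assumed practical limit of a basic PCI bus
    scsi_channel_mb = 260     # assumed practical limit of one SCSI 320 channel

    for disks in (2, 4, 6, 8):
        raw = disks * per_disk_seq_mb
        delivered = min(raw, pci_bus_mb, scsi_channel_mb)
        if delivered == raw:
            limiter = "the disks themselves"
        elif delivered == pci_bus_mb:
            limiter = "the PCI bus"
        else:
            limiter = "the SCSI channel"
        print("%d disks: %dMB/sec raw, about %dMB/sec delivered (limited by %s)"
              % (disks, raw, delivered, limiter))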

Cost

Much as we would all like to have an unlimited budget, that's unlikely to be the case in most environments, so the ideal design will have to be modified to bring it in line with the available budget. This often results in less than optimal designs, but it's a necessary factor when designing any system.

Desired I/O Characteristics

The first things you need to consider here are the I/O characteristics of the operations you are going to perform in SQL Server. This can be a complex issue to resolve, and in many cases the easiest way to determine this is through testing and monitoring of an existing system so that you can identify the exact types and volume of I/O that each major feature or function requires.

If you don't have an existing system, then you may be able to use the information in the following table as a guideline to get you started.


Operation                      Random/Sequential   Read/Write   Size Range
CREATE DATABASE                Sequential          Write        512KB (only the log file is initialized in SQL Server 2005)
Backup                         Sequential          Read/Write   Multiple of 64K (up to 4MB)
Restore                        Sequential          Read/Write   Multiple of 64K (up to 4MB)
DBCC CHECKDB                   Sequential          Read         8K–64K
DBCC DBREINDEX (Read Phase)    Sequential          Read         (see Read Ahead)
DBCC DBREINDEX (Write Phase)   Sequential          Write        Any multiple of 8K, up to 128K
DBCC SHOWCONTIG                Sequential          Read         8K–64K

It is not necessary for all of the space you need to have the same storage characteristics. The SQL data files, log files, and tempdb should go onto the fastest, most robust storage you can afford, while space for backups, recovery, and other repeatable tasks can be slower and less robust.

In some cases, where your database is a copy of some other master system, you might decide to build the whole system on cheap, non-robust storage. In the event of a failure, you could rebuild the whole system from the master system. However, if your copy has an SLA that requires minimal downtime and doesn't allow for a full rebuild, even though the data is only a copy, you have to implement some form of robust storage so that you never have to take the unacceptable hit of rebuilding from the master.

RAID

As part of the "How many disks do you need?" question, you must consider the RAID level you require, as this will influence the total number of disks required to build a storage system of a certain size and with the I/O characteristics you require:

  • Availability: The first factor when thinking about RAID is the level of availability you need from the storage.

  • Cost: An important part of any system is meeting any cost requirements. It's no good specifying the latest greatest high-performance system if it costs 10, 100, or 1,000 times your budget.

  • Space: Another major factor in combination with cost is how much space you need to provide.

  • Performance: The performance of the storage is another major factor that should help you determine what level of RAID you should choose.

Mirror with Stripe

This is the fastest, most robust option, but it also costs the most to implement. It may be overkill in some cases. Backups, data load, and read-only copies may not require this level of protection.

Striping with Parity

Striping with parity is a more cost-effective option that provides a degree of robustness with some increase in performance. The downside with RAID 5 is that it's a bit of a compromise all round, and in the event of a disk failure, system performance can be seriously degraded while the array is rebuilding onto the new disk.
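
To see how the choice of RAID level drives the disk count, here is a quick sketch for an assumed 1TB usable requirement built from 250GB disks; both numbers are illustrative, and the RAID 5 calculation assumes a single array with one parity disk.

    import math

    usable_gb = 1000    # assumed usable space requirement
    disk_gb = 250       # assumed individual disk size

    # Mirroring with striping (RAID 10): half the raw capacity is usable.
    raid10_disks = math.ceil(usable_gb / disk_gb) * 2
    # Striping with parity (RAID 5): one disk's worth of capacity goes to parity.
    raid5_disks = math.ceil(usable_gb / disk_gb) + 1

    print("RAID 10 needs %d disks, RAID 5 needs %d disks" % (raid10_disks, raid5_disks))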

RAID-Level Recommendations

You need to use the fastest, most robust storage for SQL data files. The recommendation is to use striping with mirroring.

You also need to use the fastest, most robust storage for SQL log files. As for SQL data files, the recommendation is to use striping with mirroring.

If you know your application is going to make extensive use of tempdb, use the fastest, most robust storage for tempdb. This might seem a little strange because the data in tempdb is always transitory, but the requirement for robustness comes from the need to keep the system running, not from a concern about losing data. If the rest of the system is using robust storage but tempdb isn't, then a single disk failure will prevent everything from working.

The operating system and SQL files can live on a simple mirror, although in many cases the time to rebuild the OS and SQL may be within acceptable downtime, so then a single disk will suffice.

For critical systems, the OS and SQL files should be on a mirrored disk array, but it only needs to be a single mirrored pair. OS and SQL files don't have high I/O requirements; they are typically read once when the application is loaded, and then not touched until new code paths may be used, and then a few more 4K random reads are issued. These files just don't require high-performance storage.

Isolation

Isolation is needed at several levels. You want to isolate the different types of I/O SQL Server generates so that you can optimize each storage system. You don't want the heavy random I/O generated by a high-volume OLTP system interfering with the highly sequential write access to the log file.

At the same time, on a shared storage system such as a SAN, you want to ensure that your storage is isolated from the I/O generated by other systems using the same shared storage. Isolation is primarily concerned with the sharing of disks that results from virtualization on a SAN system, but the principle can also apply to other parts of the storage subsystem, specifically the ports, fiber, and switches that make up the SAN fabric.

Separate SQL Data from the Log

I/O to SQL data files is very different in nature from I/O to the SQL log file. SQL data traffic is random in nature with relatively larger block sizes, occasionally becoming large sequential I/O for large table scans. SQL log traffic is sequential in nature, and is predominantly write until a checkpoint occurs, when you will see some read activity.

Because of this, it's important to separate data files and log files onto separate disks. Doing this enables the heads on the log disks to track sequentially, matching the log write activity. The heads on the data disks will do a lot of seeking as they move to the next random disk block, but this won't impact log performance, nor will the log writes be further randomized by having to intersperse random data reads and writes with sequential log writes.

Group Similar I/O Workloads

Going one step further, if you have multiple databases, or SQL instances, or tables within your database with very different access patterns, then wherever possible you should try to group similar I/O patterns together on the same set of disks. Identify read-only data access, write-only data access, and frequent read/write accesses, and then separate each group onto its own set of spindles.

Using tempdb

SQL Server 2005 makes much more extensive use of tempdb than previous versions, so it's very important to know how often you need to use this database. See Chapter 14 for more details on monitoring. If after monitoring you find that your system uses tempdb extensively, you should consider placing it on separate disks.

Another option considered by some users is placing tempdb on a RAM disk, or other high-performance solid state disk. Because of the lightning-fast response times of these kinds of disks, this can provide a considerable performance boost for systems that make extensive use of tempdb.

Note

Although it's OK to place tempdb on volatile disks such as RAM disks, it's not OK to put any other SQL files on these kinds of volatile storage. SQL Server has very specific requirements for its storage to ensure the integrity of data.

tempdb is discussed in more detail in Chapter 2 and Chapter 12.

Large Storage System Considerations: SAN Systems

More and more SQL Server systems are using storage provided by an external storage array of some kind. Frequently, these large external systems are called a SAN system, but they could be NAS or some other storage array technology such as iSCSI. This terminology doesn't refer to the storage but to the technology used to connect your SQL Server to the box of disks on a network of some kind.

In the case of a SAN system, the network is a dedicated storage network, frequently built using fiber. For an iSCSI system, the network is an IP network (the IP part of TCP/IP) built with compatible network cards but using a private network dedicated to storage traffic. For a NAS system, the network is not dedicated; it is shared with all your other network traffic.

Any discussion on SQL storage configuration has to include information on the concepts involved in configuring an external storage array.

Disk Virtualization

SAN systems have many challenges to face when you are placing large numbers of fast, high-capacity disks in a single box. One of these problems is how to provide the fastest storage to the largest number of people. You know that to increase disk performance, you need to stripe across as many spindles as possible, but many users of the SAN may require as little as 250GB or less of storage. If your SAN system is filled with 76GB disks, you could deliver that 250GB using three or four disks. Unfortunately, delivering 250GB from just four disks isn't going to provide much performance. Step back and consider that the SAN itself may have as many as 140 of these 76GB disks. If you could take a small piece of a larger number of disks, you could build the same 250GB chunk of storage, but it would be much faster.

Now consider the situation in which you can take a 2GB chunk from each of those 140 76GB disks. When you combine all those 2GB chunks together, you end up with 280GB of raw disk space. After subtracting some system overhead, that would probably end up at around 250GB. Sure, you wasted a little bit of space for the overhead of managing 140 disks, but now you have a 250GB chunk of storage that has the potential to deliver I/O at the rate of the sum of 140 disks; if each disk were capable of 50MB/sec for sequential 64K reads, you would have a combined I/O throughput of around 7,000MB/sec (7GB/sec). In practice, you won't get anywhere near that theoretical I/O rate, but you will see considerably higher I/O rates than you would if you just combined three or four disks together.

Disk virtualization is a technique that SAN vendors use for doing this. All the large SAN vendors have their own unique method, but they are all based around the same basic concept: Slice each disk up into uniform slices, recombine these small slices into larger chunks of storage, and present these virtualized chunks to each application.

  1. Start with a single disk, and create multiple slices on the disk.

  2. Do the same thing across multiple disks.

  3. Group a collection of these disk slices together using some level of RAID, and present it to the server as a logical unit number (LUN).
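
The arithmetic in the 140-disk example above is easy to check with a short sketch; the slice size, disk count, and per-disk sequential rate are the assumptions from the text.

    # Capacity and theoretical throughput of a LUN built from thin slices of many disks.
    disks = 140
    slice_gb = 2
    per_disk_seq_mb = 50      # assumed 64K sequential read rate per spindle

    raw_gb = disks * slice_gb                       # 280GB before any overhead
    theoretical_mb_sec = disks * per_disk_seq_mb    # 7,000MB/sec -- a ceiling, not a promise
    print("raw capacity %dGB, theoretical throughput %dMB/sec" % (raw_gb, theoretical_mb_sec))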

LUNs

When considering the LUNs that the storage is going to present to the OS, you have to answer two questions. How big should each LUN be, and how many should you have? The starting point for answering these questions is to determine how much you need. If you have differing storage requirements, you also need to know how much of each different type you require. Different types of storage might range from high speed, high reliability for data and log; low speed, high reliability for archive data; high speed, low reliability for backup staging (before writing to tape); and low speed, low reliability for "other" storage that doesn't have tight performance criteria or present reliability concerns.

LUN Size

The next factor to be considered is LUN size. For a SAN-based system, or a large local storage array, this equates to how big you make each chunk of storage presented to the operating system. These LUNs are the first point at which the OS sees the storage; given all the virtualization work that has already been done, the OS has no way of knowing how many disks, or how much of each disk, each LUN represents.

A number of factors can influence your decision regarding LUN size. You need to consider how the storage is going to be mounted in the OS. If you are going to mount each LUN as its own volume, then you can make them a little larger, although you need to remain far below the single-volume limit of 2TB. You want to stay well below this limit because of backup, restore, and startup times. On startup, the OS runs a basic check of each volume; essentially, it runs Check Disk (CHKDSK) to validate each volume. If a volume is very large, the time taken to run CHKDSK can become long, in some cases very long, which can have a large impact on system startup time. For large enterprise servers with less than one server restart scheduled per year, this isn't a major problem, except when it's time to set everything up: installing the OS and drivers and configuring the server may require many reboots. If each reboot takes one or two hours, restarting the server eight or more times becomes a two- or three-day labor rather than something you do while getting another coffee refill.

In most cases, the storage array vendor will have specific recommendations regarding how big each LUN should be, based upon their extensive experience with their storage systems. Be cautious here, because much of the storage vendor's experience will have come from storage arrays that are not running SQL Server, whose I/O requirements are completely different. Make sure they understand the unique nature of the storage I/O characteristics you have for your system.

Number of LUNs

If there are no clear criteria that set an ideal size for each LUN, then there may be factors that dictate a specific number of LUNs. The simplest way to determine the number of LUNs you need is to divide the total storage volume required by the size of LUN you want, which gives you the ideal number of LUNs. That's not a very informed way of determining the number, however, and in many ways the calculation should start from the number of LUNs. The starting point for this is the storage vendor's recommendations. They should be able to provide guidance on how best to configure their storage, but their expertise might not include experience with SQL Server.
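
The simple division described above looks like this in practice; both inputs are assumptions you would replace with your own capacity plan and the LUN size agreed with your storage vendor.

    import math

    required_gb = 2400     # assumed total capacity needed for this class of storage
    lun_size_gb = 500      # assumed LUN size recommended by the vendor

    lun_count = math.ceil(required_gb / lun_size_gb)
    print("%d LUNs of %dGB each (%dGB presented in total)"
          % (lun_count, lun_size_gb, lun_count * lun_size_gb))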

Server Configuration

After spending a lot of time configuring the storage, you still have to configure the server. The main configuration options on the server are related to the number, placement, and configuration of the host bus adapter (HBA), and then you're into the details of the operating system, and more specifically the device manager and the file system.

Disk Adapters

The disk adapter is the interface card that you plug into the PCI bus inside your server to enable the PCI bus to connect to the ATA, SCSI, iSCSI, or fiber channel cabling required to connect to the disks. Several factors should be considered with disk adapters.

Number of Adapters

When determining the number of disk adapters, you first have to consider how many disks you need to support. On some interfaces, such as ATA and SCSI, there are physical limits to the number of disks allowed per bus. This is two disks per ATA bus (one master, one slave) and either eight or sixteen per SCSI bus (depending on which version of SCSI is in use). If you need an ATA-based system with eight disks, you need to have enough disk adapters to provide four ATA buses. If you need a SCSI-based system with 32 disks, and you are using a version of SCSI that supports 16 disks per bus, you would need two buses.

One adapter can provide more than one bus. You might find an ATA adapter that implements two or maybe four ATA buses, so your eight-disk system might only need a single adapter. In the case of SCSI adapters, many are multi-bus, so you should be able to deliver two SCSI buses from a single adapter card.
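
That bookkeeping is simple enough to sketch; the per-bus disk limits are the ones quoted above, and the buses-per-adapter figures are hypothetical cards used only for illustration.

    import math

    def adapters_needed(disk_count, disks_per_bus, buses_per_adapter):
        buses = math.ceil(disk_count / disks_per_bus)
        return math.ceil(buses / buses_per_adapter)

    # Eight ATA disks, two per bus, on a hypothetical four-bus ATA adapter.
    print(adapters_needed(8, 2, 4))     # 1 adapter
    # Thirty-two SCSI disks, sixteen per bus, on a hypothetical dual-bus SCSI adapter.
    print(adapters_needed(32, 16, 2))   # 1 adapter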

Placement

In larger servers, the physical placement of the adapter card in the PCI slots can have an impact on performance. There are two ways placement can affect performance. On a large system, the physical distance between the furthest PCI slot and the CPU can increase latency and reduce overall bandwidth.

On some systems the PCI slots are not all of the same type. It is increasingly common for systems to have a few PCI-X or fast 64-bit PCI slots with the remainder being slower slots. Placing a high-speed 64-bit, PCI-X, or PCI-Express disk adapter into a slow-speed PCI slot (some won't physically fit, but some will) can force the whole bus to run at a considerably slower speed than expected.

Firmware

Most disk adapters have firmware that can be upgraded. Even when you first purchase a new disk adapter, you should check the vendor's Web site to make sure you have the very latest firmware version. Vendors will change their firmware to fix bugs, improve performance, and match specific drivers, so you must always confirm that you have the latest firmware to ensure optimal performance.

Drivers

Even though the disk adapter probably comes with drivers, there is a very good chance that they are outdated by the time you purchase the adapter, so even before installing anything you should check the vendor's Web site for the very latest signed drivers.

In most cases, you should only ever run the latest signed driver, because this is the one that has been through full certification testing. Sometimes the vendor's Web site will offer a new version of a driver that fixes a specific problem you are encountering, but which hasn't been signed yet. In this very specific case, it's acceptable to take the risk of using the unsigned driver, provided you understand that you may encounter system crashes as a result of unfound bugs. If this happens, then the unsigned driver should be removed or rolled back immediately while further troubleshooting is performed to determine the true cause of the system crash.

Configuration

Many disk adapters have options that can be set to change the way they work in different environments, or for different types of disks to which they may be attached. For ATA and SCSI adapters, this configuration is usually pretty minimal, unless the adapter includes RAID, which is another topic altogether. In most cases the configuration involves fiber channel options on HBAs. iSCSI adapters that are network interface cards (NICs) also have configuration options, but these are the same settings you may have used to configure any IP-based network adapter, and were covered earlier in the material on NICs.

Partitioning

After installing the disks, you need to configure the disks themselves, and the first step with a new disk is to consider the disk partitions. What kind of partition do you want to create on the disk? There are two main options: an MBR partition or a GPT partition. Unless you have a specific requirement that forces the use of GPT, you should use an MBR partition.

Mounting Volumes

After you have partitioned the disks, you have to decide what kind of volumes you want. The choice here is between using a basic volume or a dynamic volume. We recommend using basic volumes, unless you need specific features that only a dynamic volume can provide. If you are running in a clustered environment, there are limitations on using dynamic volumes. This may result in having to resize your LUNs, or using an alternate option such as mount points.

Start Sector Alignment

Start sector alignment is a topic that frequently comes up for discussion. When dealing with storage arrays, the value you use for start sector alignment is driven by the cache line size. You need to ensure that the start sector on the partition is aligned with the start of the cache line on the SAN. If you're not familiar with the intricacies of cache lines, ask your SAN vendor.

The value to use for start sector alignment will come from the storage vendor. After obtaining a value, check to see whether the vendor specified it in sectors (each sector is 512 bytes) or in KB. The start sector offset is set using either diskpar or diskpart. Note the difference: one has a T at the end, the other doesn't. diskpar is the older tool; diskpart started shipping with Windows Server 2003 SP1. diskpar takes the disk offset as a number of sectors, whereas diskpart takes its offset as a number of KB. A value of 64 (sectors) in diskpar results in a start sector offset of 32KB; using 32 (KB) with diskpart's align option will also give you a 32KB offset.
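
Because the two tools take their offsets in different units, it's easy to mix them up; the conversion is just a matter of multiplying sectors by 512 bytes, as this small sketch shows. The 32KB target is only an example; use whatever value your storage vendor gives you.

    # Convert a desired partition offset between diskpar units (sectors) and diskpart units (KB).
    sector_bytes = 512
    desired_offset_kb = 32     # example only -- take the real value from your storage vendor

    sectors_for_diskpar = desired_offset_kb * 1024 // sector_bytes    # 64 sectors
    kb_for_diskpart = desired_offset_kb                               # 32 (KB)
    print("diskpar offset: %d sectors; diskpart align value: %dKB"
          % (sectors_for_diskpar, kb_for_diskpart))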

File Systems - NTFS versus FAT

This shouldn't even be an issue anymore: Always use NTFS. The reliability offered by NTFS, and the speed and functional improvements of NTFS over FAT, make it the only real choice for your file system.

NTFS Allocation Unit Size

Another topic that frequently comes up is the NTFS allocation unit size, also known as the NTFS cluster size. In testing with SQL Server 2005, we found that changing the NTFS cluster size has no impact on SQL performance. This is due to the small number and large size of the files that SQL Server uses. The recommendation that results from this is to continue to use an allocation unit size of 64K. Whatever you decide to use, test to confirm that you are getting the I/O characteristics you expected.

Fragmentation

Any discussion on disks would not be complete without a discussion of fragmentation. Fragmentation can occur in several forms with SQL Server:

  • Internal fragmentation occurs when data gets old, i.e., it has been subject to many inserts, updates, and deletes. This is covered in Chapter 16.

  • External fragmentation, which we are interested in here, can take two forms:

    • Classic file fragmentation occurs when a file is created and the file system doesn't have enough contiguous disk space to create the file in a single fragment. You end up with a single file spread across multiple file fragments.

    • Autogrow fragmentation is the fragmentation that occurs when you enable autogrow and the database continuously grows in small increments. Each growth allocates another chunk of disk space for the data or log file, and these chunks may or may not suffer from classic file fragmentation as well, but the file system ends up managing many small allocations, which creates additional overhead. In some cases the number of fragments can be in the thousands.

One important point to consider here is that SQL files don't become more fragmented once they have been created. If files are created when there isn't enough contiguous free space, they are created in multiple fragments. If the disk is defragmented (and the OS has enough space to fully defragment all files) right after the files are created, then the files are no longer fragmented, and won't ever become fragmented.

The ideal scenario is that you have dedicated disks for your SQL files, and can size each file correctly, create the files, and disable autogrow. In this situation, you start with clean disks, create one or two files that aren't fragmented, and they stay that way forever. That way, you only need to deal with internal fragmentation.

The next scenario is when you start from the previous situation but allow autogrow at some tiny size or percentage, so you end up with hundreds or thousands of small growth increments. In this case, those increments may or may not be fragmented on disk, depending on how much contiguous free space is available when each autogrow operation occurs. Your only solution to remove the fragmentation is to schedule server downtime to rebuild each database using a few large files, sized correctly for the expected database growth, and then disable autogrow. Doing this will resolve any disk fragmentation that has occurred, and by disabling autogrow you will prevent external fragmentation from recurring.

The worst case is that you don't have dedicated disks, you used the default database sizes, and you enabled autogrow. Now you may have several problems to resolve. Your SQL files are competing for I/O capacity with the OS and anything else running on the server; until you add dedicated disks for SQL, this won't be resolved. In addition, you may also end up with a lot of file fragments, as each autogrow operation allocates another chunk of the data or log file, and each new chunk might be fragmented over the disk surface. As more growth occurs and the disk fills up, the chance of creating fragmented files increases.

The best way to avoid problems is as follows:

  1. Install the OS.

  2. Defragment the disk.

  3. Install any applications (SQL Server).

  4. Defragment the disk.

  5. Create data and log files at maximum size.

  6. Check for fragmentation and defragment if necessary.

  7. Disable autogrow.

  8. Routinely defragment the disk to clean up fragmentation caused by other applications. This preserves the free space should you ever need to add more SQL data or log files.

In most cases, the operating system's disk defragmenter does a great job and is all you need. In some cases, however, you may need to consider purchasing a third-party disk defragmentation utility.


