File Placement and Protection | Microsoft SQL Server 2000 High Availability

Another crucial factor in database availability and performance is where you place the data, log, and index files used by SQL Server, and what level of RAID you employ for performance and availability. This is another difficult piece in the disk puzzle, because it goes hand-in-hand with designing your storage at the physical hardware level. You cannot do one without the other. Especially at this stage, what you do not know about your application will hurt you. The physical implementation of your logical design, not unlike turning a logical schema into actual column names and data types, is a difficult process. They are both closely tied together, so one cannot be done in isolation from the other. For example, the logical design at some level has to take into account the physical characteristics of any proposed disk subsystem.

Note	In a failover cluster, all system databases are placed on the shared disk array. No databases are local to either node, otherwise a failover would be impossible .

On the other hand, the SQL Server 2000 binaries for both clustered and nonclustered systems are installed to local, internal drives in the system itself. This is a change from SQL Server 7.0. As in a cluster, you needed to put the binaries on the shared cluster disk. You should standardize where on each server SQL Server will be installed so that any DBA or administrator will know where SQL Server resides. In a clustered system, these binaries must exist in the same location.

System Databases and Full-Text Indexes

For file placement of the system databases, it is easier to give general recommendations for both clustered and nonclustered systems. The only difference for a failover cluster versus a stand-alone SQL Server is that the system databases must all reside on the shared disk array in a cluster ”they cannot be local to a server node. The following are the two main system databases for which you need to be concerned about file placement:

Tempdb This system database is at the top of the list. Tempdb is often forgotten in the configuration equation, but can be a pivotal linchpin in the success of your SQL Server implementation because it serves as the scratch pad for the system. Tempdb is used to store all temporary tables (the ones that have a # prefix) and temporary stored procedures, and it is also used by queries for things like joins (which can create temporary work tables) and sorts. Tempdb is re-created every time SQL Server is restarted, so it is not a database that needs to factor into a backup and restore strategy. Tempdb, because it is write- intensive , would be better off placed on a RAID 1 LUN of its own if it is heavily used. RAID 5 might not buy you much with tempdb unless you could either give tempdb its own control channel or more working spindles. Combining tempdb with user databases is not a good idea if each individual database is heavily used.

Tip

Remember, tempdb also has a log file. Although it is rare, there are extreme cases in which it is possible to cause the performance problems for the tempdb log with enough transactional load. This would then require you to move this log to a separate disk or disk stripe in the same way you would for user data. Again, do not configure tempdb this way unless you have to.

Distribution The distribution database is created when replication is installed. With merge and snapshot replication, the distribution database only stores status. However, with transactional replication, it is much more heavily used and must factor into any planning and performance. Hotspotting can potentially occur with RAID 5 if your disks are not quick enough to handle a heavy transactional volume with replication because of the overhead incurred to calculate and write the parity bit, so a striped mirror or mirrored stripe would probably help the distribution database. If your distribution database is located on the same server as your publication database and you have performance problems, you might try moving the distribution functionality to another server before you alter the disk subsystem.

Full-text indexes are also I/O intensive. In a cluster, these also obviously need to be on the shared disk array. In a nonclustered environment, you should not put the full-text index on the same drive or LUN as your log or data. Unless you have numerous spindles, your write performance suffers with RAID 5, so use some other form of RAID to increase your write and read performance.

Database backups are covered in depth in Chapter 9, Backup and Restore, but they should be taken into account at the time of disk configuration. It might be cheaper and quicker to back up to disk, but if you are placing backups onto disk, or doing your primary backup to disk first, you need to account for that in your performance and storage requirements. You do not want to be performing a backup to a database and placing its resulting file on the same LUN or disk as the existing data file, as that reduces both your disk performance and available space.

For the system databases (master, model, tempdb, and so on), there is generally much less I/O than there is in a user database. Make sure that you protect them with at least RAID 1 for recovery and availability. Msdb is very important because it contains some settings that would be harder to restore without a good backup or having it available. Master is also important, but it can be rebuilt to an initial state with the REBUILDM command-line utility. You would still need a backup with your settings to restore over that, but there is no corresponding REBUILDM-like tool for msdb.

User Databases

Where to place the user databases is always the million-dollar question, with no one- size -fits-all answer. Like all of high availability, it depends, based on all of the factors presented in this chapter.

Databases, the Quorum, and Failover Clustering

As noted in Chapter 3, An Introduction to Microsoft High Availability Technologies, only one instance of SQL Server can access a given LUN (which might even have multiple logical drive letters configured on it), so it cannot be shared among instances. Similarly, the quorum, which is used by a server cluster (but not a majority node set server cluster ”for more information, see Chapter 5), must be on its own disk because the base cluster resources need this disk. The minimum size recommendation for the quorum is 500 MB, and it might be larger if you keep the default configuration for the clustered MS DTC, which logs to the quorum disk. Again, MS DTC configuration is covered in Chapter 5. All of these factors affect any placement of databases and files in the cluster, so it must be planned for up front.

Files and Filegroups

One of the biggest questions is when to use files and filegroups. Each database for SQL Server 2000 has a primary data file, and others, known as secondary data files, can also be used. In addition to the data files, there are files used for the transaction log. The primary data file has an extension of .mdf, secondary data files have an extension of .ndf, and log files have an extension of .ldf. Filegroups are exactly what they sound like: a group of files, but they are files that are grouped together. You cannot have files or filegroups used by more than one database. If you create a database named MyDatabase, only that database can use its corresponding .mdf and .ldf files.

Although there is a potential performance benefit to using multiple files and filegroups, the main reason you would consider implementing files and filegroups would be for easier administration. First and foremost, when you use multiple files for a database, whether log or data, they are filled proportionally. If you add a new file into the filegroup, it might show heavy I/O until it catches up with the other files. Second, files and filegroups allow you to deal with smaller portions of your database when doing maintenance, such as backups and restores , making maintenance windows easier to deal with. The success of files and filegroups is not to split up I/O bandwidth but to combine I/O bandwidth (that is, placing a file on each disk stripe for each filegroup). Scan density is king. SQL Server 2000 bases the degree of parallelism on the system load at the time the query execution plan is built. Readahead is based on the size of the buffer pool and the overall capacity of the I/O subsystem, so the number of files in a filegroup does not influence this.

However, as with any technology, there are always catches. Probably the biggest one for files and filegroups is that some maintenance within SQL Server 2000, such as DBCC INDEXDEFRAG, only works on a per-file basis. If you have an index that spans multiple files, you might encounter some fragmentation. DBCC DBREINDEX spans multiple files, but that is a trade-off for the potential performance impact and possible blocking due to employing it.

Caution

Do not use files and filegroups just for the sake of using them. If your database is small enough to easily be backed up and restored by your current strategy, files and filegroups might complicate things for you from an administrative perspective. In this case, if you decide to employ files and filegroups, it would generally be a performance improvement decision, which is not the usual case for implementing them ”it is usually an administrative decision.