Understanding Filegroups | Microsoft SQL Server 2005: The Complete Reference: Full Coverage of all New and Improved Features

Databases and their contents, including tables, indexes, and various programmatic elements, are stored in files. These files are kept in filegroups. A filegroup is another form of container in SQL Server. To refresh your memory with respect to file location flexibility, you can store your data files in various locations and then use filegroups to manage them as a unit. For example, you could locate the primary file papazulu.mdf in the volume it was created on and then store the secondary data files (papazulu1.mdf, papazulu2.mdf, and papazulu3.mdf), if you need them, on a number of different drives (even locating each new file on its own hard disk). Then create a filegroup (called pzulugroup1) and assign all the papazulu secondary database files to this filegroup. When it comes to backup, you can then back up the filegroups instead of the three or four separate files. (The script to back up a filegroup is covered in Chapter 7.)

One of the best features of the filegroup, however, is the flexibility you get in managing your database objects. For example, if you have a huge table that is the constant target of queries and modifications, it would make sense to allocate it to a filegroup on a faster disk. Then most of the I/O for the database can be targeted to the faster disk.

Spreading database files around like this can help tremendously with performance because each file can be dedicated to its own hard disk. This might become a little expensive if you are using a RAID system for redundancy (RAID 5 will require at least three disks, of course). On the other hand, RAID configuration aside, if you are working with really large databases, your primary and secondary files might grow to sizes that bump up against the limits of your Windows Server file system file size. For more on RAID, see Chapter 9.

SQL Server fills all files in the filegroup on a proportional basis so that all files in the filegroup become full at the same time. When all the files in the filegroup are full, SQL Server will then expand all the files in the group one at a time on a round-robin scheduling basis if the database is set to grow automatically (discussed later in the section CREATE DATABASE).

There are four important rules to remember about filegroups; they may or may not count in your f favor or figure in your database plan:

A filegroup cannot be used by more than one database. You cannot take papazulu and assign it to the filegroup for the tangocharlie database. Likewise, files can only be members of one filegroup. The CREATE DATABASE and ALTER DATABASE statements have no facility for such multiallocation or designation, in any event.
Transaction logs are not part of filegroups, although they can be located where the database plan takes them.
The primary file is assigned to the primary filegroup, and the primary filegroup cannot be changed or altered.
If you do not specify alternative filegroups for the secondary data files, all files are then placed into the primary filegroup. The disadvantage is that if the primary filegroup is filled up, then no further data can be stored in the catalog.

The Default Filegroup

As you discovered when you created your first database, it contains a primary filegroup, which becomes the default filegroup before any new filegroups are created, including any user-defined filegroups you specify. Later you can change the default filegroup using the ALTER DATABASE statement we will be discussing shortly.

The primary filegroup can fill up if autogrow for your database is turned off and the disks holding the primary group can run out of space. You can turn autogrow back on, but until you have worked out how to best reduce the size of the files, you may have no choice but to add a larger hard disk and then move the files onto this disk.

The reason for having a default filegroup is so that any object created without specifying a filegroup is assigned to the default filegroup. A problem may arise at a later date, however, in which you will not be able to add to the database because you are low on hard disk space and your primary default filegroup is full. You may then have to create a new filegroup and make the new group the default. In other words, you create a new filegroup so that new user-defined objects, such as tables and views, do not compete with the system objects and tables for data space.

Setting Filegroups Read-Only

SQL Server lets you mark a filegroup as read-only. This means that you cannot modify or add to the files in the filegroup in any way. Marking a filegroup read-only can come in handy. For example, your turnkey application can include data in the filegroup that cannot be removed, and you can store key configuration tables in the read-only file to prevent users from adding data to them (and this is the only filegroup you can then move to a compression-enabled disk).

Relational OLAP (ROLAP) and Hybrid OLAP (HOLAP) databases are also good candidates for read-only filegroups. By setting their filegroups read-only, you provide an additional safeguard against anyone writing to the databases. As you know, OLAP databases or warehouse data, once declared historical or temporal, should not be modified in any way that could render analysis questionable. In the event you need to add data and refresh a cube, you can easily change the file to read/write.

As mentioned earlier, you can place read-only filegroups, or an entire read-only database, on a compressed volume.

Filegroup Tips

When creating and working with your database, consider the following recommendations for your database files and filegroups:

When creating your filegroups, create a secondary filegroup and make it the default filegroup. This will leave the primary filegroup dedicated to the system objects, free from interference from user-defined objects.
Place the filegroups across as many physical disks (or logical disks for RAID sets) as you can so that you can spread the work across several disks. There is nothing worse than to have one disk in an array or cluster that is doing all the work, while the others sit idle.
Try to locate tables that are used in complex and frequently accessed JOINs in separate filegroups that are then located on separate disks. You will see a marked performance improvement over a single-disk implementation because two or more disks are involved in the processing of the JOIN (in parallel) instead of one. Remember again that if you are using RAID, you will have to create several logical disks (each containing at least three disks for a RAID 5 configuration).
Do not place transaction logs on the same physical disk as the filegroups, if you can help it. If the disk fails, you lose not only the database but also the transaction log. Recovery will thus be slow and painful, and if you are not doing frequent online backups, you could lose a lot of data.
You can obtain a report on your filegroups at any time when you execute the system stored procedure sp_helpfilegroup. See Appendix for information about the result set returned from this proc.