Section 19.3. The DBA s View | Backup & Recovery: Inexpensive Backup Solutions for Open Systems

19.3. The DBA's View

These elements of SQL Server are important to know if you plan to talk to a DBA about your server.

19.3.1. Database Files

Like most other databases, SQL Server uses datafiles to store its data. There are always a minimum of two datafiles associated with a database: one for the database data itself and one for the logfiles. It is a generally accepted practice to locate these files on different disks or partitions to help with performance and with disaster recovery. An individual datafile can be associated with only one database.

There are three types of files in SQL Server: primary datafiles, secondary datafiles, and logfiles.

Primary datafiles: Every database has at least one primary datafile. This file stores data and holds information about other datafiles. The default extension for primary datafiles is .mdf.
Secondary datafiles: Any datafile that is not a primary datafile and not a logfile is considered a secondary datafile. These files store additional data. The default extension for these files is .ndf.
Logfiles: Logfiles don't directly store any data; instead, they hold all the log information for the database. Like primary datafiles, there must be at least one logfile for each database. It is possible, and encouraged, to have multiple logfiles and spread them across different disks. The default extension for logfiles is .ldf.

The physical location of the files is not very restrictive; they can be kept on any FAT or NTFS partition. NTFS is preferred because it's better integrated with later versions of Windows and has enhanced security. The files are stored in the primary file of the database as well as in the master database itself.

Each database file can expand as more data storage is needed. The amount of growth is specified in increments, either as a percentage or as a fixed size amount. Growth continues to occur until the maximum file size is reached. If this size is not specified, growth can occur only until the file size reaches 16 TB or until the disk is filled, whichever happens first. With more than one datafile, growth won't occur until all datafiles have been filled; growth then occurs in a round robin format starting with the first file. There is a theoretical database (not file size) limit of 1,048,516 TB.

Like databases, logfiles have various states. These states are online, offline, restoring, recovery pending, suspect, and defunct.

19.3.2. Filegroups

A set of related files can be grouped together in a filegroup. A filegroup makes it easier to manage the storage resources of the files it contains and also allows for easier administration of those files.

There are two types of filegroups in SQL Server 2005: primary and user defined.

Primary: The primary filegroup contains the primary datafile and any other datafile not specifically assigned to another filegroup. The primary filegroup houses all pages for the system tables.
User defined: User-defined filegroups are specified when the filegroup keyword is used in a create database or alter database command.

Keep in mind these rules for files and filegroups:

Logfiles are never part of a filegroup; they are always managed independently from datafiles.
Files can't belong to more than one filegroup.
Certain database objects (tables, indexes, large objects) can be specified to use a particular filegroup.
One filegroup will always be designated as the default.
When creating a table or index, if no filegroup is specified, the default is used.
Only one filegroup at a time can be the default.
Data that resides in partitioned objects can be located in different filegroups.

19.3.3. Transaction Log

The transaction log keeps a record of every transaction (insert, update, delete, page allocation/deallocation, begin/end transaction statements) that happens in a database. The operation is recorded here before the true change is ever made to the database itself. This type of write ahead logging (WAL) guarantees that no data changes are committed to the disk before the record is written to the logfile, which provides a redundant architecture by which problems can easily be recovered from, and transactions can be reverted or rolled back.

SQL Server has a logical transaction log and a physical transaction log. The physical log is the actual file on the disk. The logical log is the representation to the database engine of that log and occupies space in the physical log. The logical log can be thought of as a series of virtual logfiles of no fixed size. Space within the logical logfile is continually reused, in a circular manner, until the logfile fills. In order to prevent this amount of data from growing too large, the transaction log is periodically truncated. This truncation generally happens automatically during a backup but can also be done manually.

Keep in mind that manual truncation breaks the log backup chain.

It is important to truncate this log to prevent it from filling up. If a transaction log fills, the database switches to read-only mode and does not allow updates. Truncation never reduces the size of the physical logfile, only the logical. To reduce the size of the physical file, special commands must be issued, which are discussed later in this section. Logfiles can also grow indefinitely if the FILEGROWTH setting is enabled. This allows a transaction logfile that has filled to grow by a specified growth increment. Of course, the growth of the physical logfile is limited to the amount of storage you have available.

If the transaction log does fill up, there are several options to truncate it:

Doing a physical backup of the transaction log
If there is free disk space, increasing the size of the logfile
If there is not free disk space, freeing up disk space or moving the log to a drive with free space
Adding a new logfile to a different disk
Truncating after a system or explicit checkpoint

19.3.3.1. Monitoring logfile size with dbcc

The size of the transaction logfiles can be monitored with dbcc. To view the current usage, use the dbcc sqlperf command:

dbcc sqlperf (logspace) Database Name               Log Size (MB)     Log Space Used (%)     Status -----------------------     -------------     ------------------     ----------- master                      0.4921875         81.74603               0 tempdb                      0.4921875         63.39286               0 model                       0.4921875         73.01588               0 msdb                        0.4921875         86.50793               0 ReportServer                0.7421875         38.68421               0 ReportServerTempDB          0.7421875         36.77632               0 Inventory                   0.9921875         45.22638               0

19.3.3.2. Reducing the size of the physical log

Sometimes it may be necessary to reduce the space of the physical logfile. This can be accomplished with the dbcc shrinkfile command:

dbcc SHRINKFILE (Inventory_Log,1) DbId   FileId      CurrentSize MinimumSize UsedPages   EstimatedPages ------ ----------- ----------- ----------- ----------- -------------- 7      2           128         128         128         128

19.3.4. Pages

SQL Server uses the concept of pages for data storage. SQL Server storage in datafiles is logically divided into pages, and these pages are numbered contiguously starting at 0. Disk access is managed at the page level; any read or write is done a whole page at a time.

There are 128 pages per megabyte, and each page is 8 KB. Each page starts with a 96-byte header used to hold information about the page. This information includes the page number, the allocation ID of the owning object, and the type and amount of free space.

There are page types for different storage types: data, indexes, text or image data, global, shared global allocation map, free space, index allocation map, bulk change map, and differential change map.

Rows of data are entered into the page starting after the header, in sequence. The row offset table starts at the end of the page and contains one entry for each row on that page. Each entry details how far the first byte of the row is from the beginning of the page. These entries are in reverse order of the rows on the page that they track.

19.3.5. Extents

Extents are important because they govern the way space is managed within the database. A group of 8 physically contiguous pages makes up an extent, and there are 16 extents per megabyte. By grouping pages in extents, the pages can be more efficiently managed.

There are two types of extents, uniform and mixed. The pages within uniform extents are owned by only one object and can be used only by that object. The pages within mixed extents can be shared by up to eight objects, meaning that each of the eight pages within the extent can have a different owner.

When creating a new table or index, space is allocated from pages in mixed extents. As the table grows past its initial size of eight pages, it then switches to uniform extents.

19.3.6. Partitions

Partitions segment data in a more logical and manageable manner. An additional benefit, and a substantial case for partitioning, is the potential for performance gain.

By default, tables are allocated to just one partition. Tables or indexes arranged in the default single-partition scheme are essentially the same as tables or indexes in earlier versions of SQL Server. In a table or index that spans multiple partitions, the data is partitioned horizontally. This means that groups of rows are mapped into individual partitions based on the column they are a part of. Note that partitions can be a part of more than one filegroup across the same database. Regardless of how many partitions or filegroups a table or index spans, when they are queried or updated, the object is treated as a single logical entity.

Within partitions in SQL Server 2005, data rows are arranged in one of two ways, either by clusters or heaps. Cluster data has a clustered index. This is a b-tree index that arranges the data rows based on the order in the clustered index key. Heap data has no clustered indexes. Here, data is stored in no particular order or format.

19.3.7. Table and Index Specifics

Data in tables is stored in fixed-size pages of 8 KB. This data is stored in rows, and these rows generally don't span more than one page. The exception is large datatypes such as (n)text, image, varchar(max), and xml. Large datatypes can span multiple, noncontiguous pages.

Tables span one or more partitions and, in each partition, data rows are organized in either a heap or clustered index structure. Depending on the type of data, pages of the heap or clustered index are managed in one or more allocation units. Partitions in this instance are not partitions as you would normally expect in disk storage; here they are simply a way for the user to define the way the data is organized. By default, a table or index is only associated with one partition. Keep in mind that individual partitions must exist in a single filegroup.

19.3.8. Snapshot Backups (2005)

New in SQL Server 2005 is the concept of snapshot backups and restore. Utilizing this technology requires software and/or hardware from third-party vendors, so we cover its benefits only briefly here because this book focuses on inexpensive backup methods.

Snapshot backups allow for very fast backups with almost no impact to the performance of the server. The major benefit of snapshot backups is that the restore process can happen equally as fast. Other benefits include creating copies that can be used for other purposes such as reporting purposes.

These types of backups are equivalent to regular backups and can be intermixed with regular backups. For instance, you can use snapshot backups in a restore process where the first full backup is from a tape. Snapshot backups are supported only with full, partial, and file backups, and partially with differential backups.