Backing Up and Restoring a Database | Inside Microsoft SQL Server 2000

As you're probably aware by now, this book is not intended as a how-to book for database administrators. The Bibliography lists several excellent books that can teach you the mechanics of actually making database backups and restoring and can suggest best practices for setting up a backup-and-restore plan for your organization. Nevertheless, I'd like to discuss some important issues relating to backup and restore processes to help you understand why one backup plan might be better suited to your needs than another.

Types of Backups

No matter how much fault tolerance you have implemented on your database system, it is no replacement for regular backups. Backups can provide a solution to accidental or malicious data modifications, programming errors, and natural disasters (if you store backups in a remote location). If you choose to provide the fastest possible speed for your data files at the cost of fault tolerance, backups provide insurance in case your data files are damaged.

The process of re-creating a database from backups is called restoring. The degree to which you can restore the lost data depends on the type of backup. There are three main types of backups in SQL Server 2000, and a couple of additional variations on those types:

Full backup A full database backup basically copies all the pages from a database onto a backup device, which can be a local or network disk file, a local tape drive, or even a named pipe.
Differential backup A differential backup copies only the extents that were changed since the last full backup was made. The extents are copied onto a specified backup device. SQL Server can quickly tell which extents need to be backed up by examining the bits on the DCM pages for each data file in the database. Each time a full backup is made, all the bits are cleared to 0. When any page in an extent is changed, its corresponding bit in the DCM page is changed to 1.
Log backup In most cases, a log backup copies all the log records that have been written to the transaction log since the last full or log backup was made. However, the exact behavior of the BACKUP LOG command depends on your database's recovery mode setting. I'll discuss recovery modes shortly.

NOTE
For full details on the mechanics of defining backup devices, making backups, or scheduling backups to occur at regular intervals, consult SQL Server Books Online or one of the SQL Server administration books listed in the Bibliography.

A full backup can be made while your SQL Server is in use. This is considered a "fuzzy" backup—that is, it is not an exact image of the state of the database at any particular point in time. The backup threads just copy extents, and if other processes need to make changes to those extents while the backup is in progress, they can do so.

To maintain consistency for either a full or a differential backup, SQL Server records the current log sequence number (LSN) at the time the backup starts and then again at the time the backup ends. This allows the backup to also capture the relevant parts of the log. The relevant part starts with the oldest open transaction at the time of the first recorded LSN and ends with the second recorded LSN.

As mentioned previously, what gets recorded with a log backup depends on the recovery model you are using. So before I talk about log backup in detail, I'll tell you about recovery models.

Recovery Models

As I told you in the section on database options, three values can be set for the RECOVERY option: FULL, BULK_LOGGED, or SIMPLE. The value you choose determines the speed and size of your transaction log backups as well as the degree to which you are at risk for loss of committed transactions in case of media failure.

FULL recovery model

The FULL recovery model provides the least risk of losing work in the case of a damaged data file. If a database is in this mode, all operations are fully logged, which means that in addition to logging every row added with the INSERT operation, removed with the DELETE operation, or changed with the UPDATE operation, SQL Server also writes to the transaction log in its entirety every row inserted using a bcp or BULK INSERT operation. If you experience a media failure for a database file and need to recover a database that was in FULL recovery mode, and you've been making regular transaction log backups preceded by a full database backup, you can restore to any specified point in time up to the time of the last log backup. In addition, if your log file is available after the failure of a data file, you can restore up to the last transaction committed before the failure. SQL Server 2000 also supports a feature called log marks, which allows you to place reference points in the transaction log. If your database is in FULL recovery mode, you can choose to recover to one of these log marks. I'll talk a bit more about log marks in Chapter 12.

In FULL recovery mode, SQL Server will also fully log CREATE INDEX operations. In SQL Server 2000, when you restore from a transaction log backup that includes index creations, the recovery operation is much faster because the index does not have to be rebuilt—all the index pages have been captured as part of the database backup. In previous versions, SQL Server logged only the fact that an index had been built, so when you restored from a log backup, the entire index would have to be built all over again!

So, FULL recovery mode sounds great, right? As always, there's a tradeoff. The biggest tradeoff is that the size of your transaction log files can be enormous, and therefore it can take substantially longer to make log backups than with any previous release.

BULK_LOGGED recovery model

The BULK_LOGGED recovery model allows you to completely restore a database in case of media failure and also gives you the best performance and least log space usage for certain bulk operations. These bulk operations include BULK INSERT, bcp, CREATE INDEX, SELECT INTO, WRITETEXT, and UPDATETEXT. In FULL recovery mode, these operations are fully logged, but in BULK_LOGGED recovery mode, they are only minimally logged.

When you execute one of these bulk operations, SQL Server logs only the fact that the operation occurred. However, the operation is fully recoverable because SQL Server keeps track of what extents were actually modified by the bulk operation. Every data file in a SQL Server 2000 database now has an additional allocation page called a BCM page, which is managed much like the GAM and SGAM pages that I discussed earlier in the chapter. Each bit on a BCM page represents an extent, and if the bit is 1 it means that this extent has been changed by a minimally logged bulk operation since the last full database backup. A BCM page is located at the 8th page of every data file, and every 511,230 pages thereafter. All the bits on a BCM page are reset to 0 every time a full database backup or a log backup occurs.

Because of the ability to minimally log bulk operations, the operations themselves can be carried out much faster than in FULL recovery mode. There is a little overhead to setting the bits in the appropriate BCM page, but compared to the cost of logging each individual change to a data or index row, the cost of flipping bits is almost negligible.

If your database is in BULK_LOGGED mode and you have not actually performed any bulk operations, you can restore your database to any point in time or to a named log mark because the log will contain a full sequential record of all changes to your database.

The tradeoff comes during the backing up of the log. In addition to copying the contents of the transaction log to the backup media, SQL Server scans the BCM pages and backs up all the modified extents along with the transaction log itself. The log file itself stays small, but the backup of the log can be many times larger. So the log backup takes more time and might take up a lot more space than in FULL recovery mode. The time it takes to restore a log backup made in BULK_LOGGED recovery mode is similar to the time it takes to restore a log backup made in FULL recovery mode. The operations don't have to be redone; all the information necessary to recover all data and index structures is available in the log backup.

SIMPLE recovery model

The SIMPLE recovery model offers the simplest backup-and-restore strategy. Your transaction log is truncated at regular, frequent intervals. Therefore, only full database backups and differential backups are allowed. You get an error if you try to back up the log while in SIMPLE recovery mode. Because the log is not needed for backup purposes, sections of it can be reused as soon as all the transactions it contains are committed or rolled back, and the transactions are no longer needed for recovery from server or transaction failure.

Converting from SQL Server 7

Microsoft intended these recovery models to replace the select into/bulkcopy and trunc. log on chkpt. database options. Earlier versions of SQL Server required that the select into/bulkcopy option be set in order to perform a SELECT INTO or bulk copy operation. The trunc. log on chkpt. option forced your transaction log to be truncated every time a checkpoint occurred in the database. This option was recommended only for test or development systems, not for production servers. You can still set these options using the sp_dboption procedure, but not using the ALTER DATABASE command. However, in SQL Server 2000, changing either of these options using sp_dboption also changes your recovery mode, and changing your recovery mode changes the value of one or both of these options, as you'll see below. The recommended method for changing your database recovery mode is to use the ALTER DATABASE command:

 ALTER DATABASE <database_name> SET RECOVERY [FULL | BULK_LOGGED | SIMPLE]

To see what mode your database is in, you can use the DATABASEPROPERTYEX() property function:

 SELECT DATABASEPROPERTYEX('<database_name>', 'recovery')

As I just mentioned, you can change the recovery mode by changing the database options. For example, if your database is in FULL recovery mode and you change the select into/bulkcopy option to true, your database recovery mode automatically changes to BULK_LOGGED. Conversely, if you force the database back into FULL mode using ALTER DATABASE, the value of the select into/bulkcopy option changes automatically. In fact, sysdatabases doesn't record any special information for the recovery mode. The recovery mode is determined by the status bits for these two database options. If bit 3 in sysdatabases.status is set, the database has select into/bulkcopy enabled, and if bit 4 is set, the database has trunc. log on chkpt. enabled. Table 5-4 shows the relationship between the database options and the new recovery modes.

Table 5-4. The relationship between SQL Server 7 database options and recovery modes.

If trunc. log on chkpt. is:	And select into/bulkcopy is:	The recovery mode is:
FALSE	FALSE	FULL
FALSE	TRUE	BULK_LOGGED
TRUE	FALSE	SIMPLE
TRUE	TRUE	SIMPLE

If you're using SQL Server 2000 Standard or Enterprise Edition, the model database starts in FULL recovery mode, so all your new databases will also be in FULL mode. If you're using SQL Server Personal Edition or the Microsoft SQL Server Desktop Engine, the model database starts in SIMPLE recovery mode. You can change the mode of the model database or any other user database by using the ALTER DATABASE command.

The new recovery model offers you two major benefits over previous versions. First, you can always perform a SELECT INTO operation without having to worry about what options you've set. Prior to SQL Server 2000, you could only run the SELECT INTO or minimally logged bulk copy operation if you had set the specific database option to true, and only a database owner could change that option. That sometimes meant a restriction on what non-DBO developers could accomplish.

Second, you can freely switch between the FULL and BULK_LOGGED modes without worrying about your backup scripts failing. Prior to SQL Server 2000, once you performed a SELECT INTO or a bulk copy, you could no longer back up your transaction log. So if you had automatic log backup scripts scheduled to run at regular intervals, these would break and generate an error. This can no longer happen. You can run SELECT INTO or bulk copy in any recovery mode, and you can back up the log in either FULL or BULK_LOGGED mode.

In addition, you can easily switch between FULL and BULK_LOGGED modes if you usually operate in FULL mode but occasionally need to perform a bulk operation quickly. You can change to BULK_LOGGED and pay the price later when you back up the log; the backup will simply take longer and be larger.

You can't easily switch to and from SIMPLE mode. When you use the ALTER DATABASE command to change from SIMPLE to FULL or BULK_LOGGED, you must first make a complete database backup in order for the change in behavior to be complete. Remember that SIMPLE recovery mode is comparable to the database option that truncates the log at regular intervals. The truncation option isn't recommended for production databases, where you need maximum transaction recoverability. The only time that SIMPLE mode is really useful is in test and development situations or for small databases that are primarily read-only. I suggest that you use FULL or BULK_LOGGED for your production databases and that you switch between those modes whenever you need to.

Choosing a Backup Type

If you're responsible for creating the backup plan for your data, you'll not only need to choose a recovery mode but also decide what kind of backup to make. I mentioned the three main types: full, differential, and log. In fact, you can use all three types together. To accomplish any type of full restoration of a database, you must occasionally make a full database backup. In addition, you can also make differential or log backups. Here are some facts to help you decide between these last two:

A differential backup:

Is faster if your environment includes a lot of changes to the same data. It will back up only the most recent change, whereas a log backup will capture every individual update.
Captures the entire B-tree structures for new indexes, whereas a log backup captures each individual step in building the index.
Is cumulative. When you recover from a media failure, only the most recent differential backup needs to be restored because it will contain all the changes since the last full database backup.

A log backup:

Allows you to restore to any point in time because it is a sequential record of all changes.
Can be made after a failure of the database media, as long as the log is available. This will allow you to recover right up to the point of the failure. The last log backup (called the tail of the log) must specify the WITH NO_TRUNCATE option in the BACKUP LOG command if the database itself is unavailable.
Is sequential and discrete. Each log backup contains completely different log records. When you use a log backup to restore a database after a media failure, all log backups must be applied in the order that they were made.

Restoring a Database

How often you make each type of backup determines two things: how fast you can restore a database and how much control you have over which transactions are restored. Consider the schedule in Figure 5-5, which shows a database fully backed up on Sunday. The log is backed up daily, and a differential backup is made on Tuesday and Thursday. A drive failure occurs on Friday. If the failure does not include the log files or if you have mirrored them using RAID 1, you should back up the tail of the log with the NO_TRUNCATE option.

WARNING
If you are operating in BULK_LOGGED recovery mode, backing up the log also backs up any data that was changed with a BULK_LOGGED operation, so you might need to have more than just the log file available to back up the tail of the log. You'll also need to have available any filegroups containing data inserted with a bulk copy or SELECT INTO command.

click to view at full size.

Figure 5-5. Combined usage of log and differential backups reduces total restore time.

To restore this database after a failure, you must start by restoring the full backup made on Sunday. This does two things: it copies all the data, log, and index pages from the backup media to the database files, and it applies all the transactions in the log. You must determine whether incomplete transactions are rolled back. You can opt to recover the database by using the WITH RECOVERY option of the RESTORE command. This will roll back any incomplete transactions and open the database for use. No further restoring can be done. If you choose not to roll back incomplete transactions by specifying the WITH NORECOVERY option, the database will be left in an inconsistent state and will not be usable.

If you choose WITH NORECOVERY, you can then apply the next backup. In the scenario depicted in Figure 5-5, you would restore the differential backup made on Thursday, which would copy all the changed extents back into the data files. The differential backup also contains the log records spanning the time the differential backup was being made, so again you have to decide whether to recover the database. Complete transactions are always rolled forward, but you determine whether incomplete transactions are rolled back.

After the last differential backup is restored, you must restore, in sequence, all the log backups made after the last differential backup was made. This includes the tail of the log backed up after the failure if you were able to make this last backup.

NOTE
The recovery done during a restore operation works almost exactly the same way as restart recovery, which I described in Chapter 3. There is an analysis pass to determine how much work might need to be done, a roll-forward pass to redo completed transactions and return the database to the state it was in when the backup was complete, and a rollback pass to undo incomplete transactions. The big difference between restore recovery and restart recovery is that with restore recovery you have control over when the rollback pass is done. It should not be done until all the rolling forward from all the backups has been applied. Only then should you roll back any transactions that are still not complete.

Backing up and restoring files and filegroups

SQL Server 2000 allows you to back up individual files or filegroups. This can be useful in environments with extremely large databases. You can choose to back up just one file or filegroup each day, so the entire database does not have to be backed up as often. It also can be useful when you have an isolated media failure on just a single drive and you think that restoring the entire database would take too long.

Here are a few details you should keep in mind when backing up and restoring files and filegroups:

Individual files and filegroups can be backed up only when your database is in FULL or BULK_LOGGED recovery mode because you must apply log backups after you restore a file or filegroup and you can't make log backups in SIMPLE mode.
Unlike differential and full database backups, a backup of a file or filegroup does not back up any portion of the transaction log.
You can restore individual file or filegroup backups from a full database backup.
Immediately before restoring an individual file or filegroup, you must back up the transaction log. You must have an unbroken chain of log backups from the time the file or filegroup backup was made.
After restoring a file or filegroup backup, you must restore all the transaction logs made between the time you backed up the file or filegroup and the time you restored it. This guarantees that the restored files are in sync with the rest of the database.
For example, suppose you backed up Filegroup FG1 at 10 A.M. Monday. The database is still in use, and changes happen to data on FG1 and transactions are processed that change data in both FG1 and other filegroups. You back up the log at 4 P.M. More transactions are processed that change data in both FG1 and other filegroups. At 6 P.M., a media failure occurs and you lose one or more of the files making up FG1.

To restore, you must first back up the tail of the log containing all changes that occurred between 4 P.M. and 6 P.M. You can then restore FG1 using the RESTORE DATABASE command, specifying just filegroup FG1. Your database will not be in a consistent state because the restored FG1 will have changes only through 10 A.M. but the rest of the database will have changes through 6 P.M. However, SQL Server knows when the last change was made to the database because each page in a database stores the LSN of the last log record that changed that page. When restoring a filegroup, SQL Server makes a note of the maximum LSN in the database. You must restore log backups until the log reaches at least the maximum LSN in the database, and you will not reach that point until you apply the 6 P.M. log backup.

Partial restore

SQL Server 2000 lets you do a partial restore of a database in emergency situations. Although the description and the syntax seem similar to file and filegroup backups, there is a big difference. With file and filegroup backups, you start with a complete database and replace one or more files or filegroups with previously backed up versions. With a partial database restore, you don't start with a full database. You restore individual filegroups, which must include the primary filegroup containing all the system tables, to a new location. Any filegroups you don't restore no longer exist and are treated as OFFLINE when you attempt to reference data stored on them. You can then restore log backups or differential backups to bring the data in those filegroups to a later point in time. This allows you the option of recovering the data from a subset of tables after an accidental deletion or modification of table data. You can use the partially restored database to extract the data from the lost tables and copy it back into your original database.

Restoring with standby

In normal recovery operations, you have the choice of either running recovery to roll back incomplete transactions or not running recovery. If you run recovery, no further log backups can be restored and the database is fully usable. If you don't run recovery, the database is inconsistent and SQL Server won't let you use it at all. You have to choose one or the other because of the way log backups are made.

For example, in SQL Server 2000, log backups do not overlap—each log backup starts where the previous one ended. Consider a transaction that makes hundreds of updates to a single table. If you back up the log in the middle of the updating and again after the updating is finished, the first log backup will have the beginning of the transaction and some of the updates and the second log backup will have the remainder of the updates and the commit. Suppose you then need to restore these log backups after restoring the full database. If, after restoring the first log backup, you run recovery, the first part of the transaction is rolled back. If you then try to restore the second log backup, it will start in the middle of a transaction and SQL Server won't know what the beginning of the transaction did. You certainly can't recover transactions from this point because their operations might depend on this update that you've lost part of. So, SQL Server will not allow any more restoring to be done. The alternative is to not run recovery to roll back the first part of the transaction, but instead to leave the transaction incomplete. SQL Server will know that the database is inconsistent and will not allow any users into the database until you finally run recovery on it.

What if you want to combine the two approaches? It would be nice to be able to restore one log backup and look at the data before restoring more log backups, particularly if you're trying to do a point-in-time recovery, but you won't know what the right point is. SQL Server provides an option called STANDBY that allows you to recover the database and still restore more log backups. If you restore a log backup and specify WITH STANDBY = '<some filename>', SQL Server will roll back incomplete transactions but keep track of the rolled-back work in the specified file, which is known as a standby file. The next restore operation will first read the contents of the standby file and redo the operations that were rolled back, and then it will restore the next log. If that restore also specifies WITH STANDBY, incomplete transactions will again be rolled back but a record of those rolled back transactions will be saved. Keep in mind that you can't modify any data if you've restored WITH STANDBY (SQL Server will generate an error message if you try), but you can read the data and continue to restore more logs. The final log must be restored WITH RECOVERY (and no standby file will be kept) to make the database fully usable.