Chapter 9, Database Environment Basics for Recovery, provided the foundation to help you prepare your backup and restore strategy. This chapter guides you through that process.
On many production servers, the backup and restore strategy is to perform only periodic (such as weekly) full database backups with transaction log backups between the full database backups (with a frequency of, say, every two hours). Some companies add occasional differential database backups (for example, nightly except on the day when full backups occur). The basic strategy of performing full database backups and transaction log backups is adequate to minimize data loss.
However, if you want to achieve high availability, you can minimize downtime by restoring a differential database backup instead of all of the logs over the same period. Restoring a differential backup is significantly faster because it contains extents that contain the completed results of all modifications that have occurred without having to read them from the log. In some cases, however, the size of the differential backup might approach the size of a full backup. If this is your scenario, then you are better off performing a full database backup so that you have less to restore (just the full backup and the appropriate transaction logs). Regardless of whether or not you add differential database backups to your strategy, you must use a full backup as the starting point for a restore.
If you have a very large database (VLDB), which is usually measured in hundreds of gigabytes or even terabytes, you must consider how long your VLDB takes to back up. More important, you also have to consider the time it will take to restore the database. Are these times acceptable based on your business requirements?
Always use recovery-oriented planning when determining your backup strategy, no matter what size your databases are. Do not base your strategy on how long your backups take to run; base it on how long it will take to recover from a data-loss event. Unless your strategy uses storage-assisted backups such as split-mirror , a restore that incorporates a full database backup as part of the strategy is likely to take a long time, possibly multiple hours or even days depending on the size of your database. Even if a time span of hours or days of downtime is acceptable (if so, chances are this book is meaningless to you), how much data can you afford to lose? What is your site redundancy strategy? Do you use log shipping or copy all your log backups to a secondary site?
Microsoft SQL Server temporarily prevents certain operations such as a transaction log backup while a full database backup or a differential database backup is running. You could encounter a significant amount of data loss if you had site failure during the full or differential database backup. This scenario might sound unlikely , but when your full database backup time is measured in hours, your work loss exposure and risk for data loss increase dramatically. If your database backup takes eight hours to perform, your secondary site could be as much as eight hours behind at the time of a failure. This situation could result in eight hours of data loss if the backup was not complete or had not copied to the secondary location. Losing eight hours of data is unacceptable in most cases.
What if you could allow transaction log backups to occur while you were performing a different type of backup? Then it would be possible to continue to send transaction log backups to the secondary site even while your large backups are occurring. You might choose not to perform a full database backup regularly if you back up transaction logs on a frequent basis. (Every minute or two is a common frequency for minimizing data loss. However, you are limited by the speed of your systems as well as your workload if you try to perform transaction log backups that frequently; it might not be possible.) By using a file/filegroup backup strategy, you can completely avoid performing a full database backup, and log backups will never be paused . This strategy allows a secondary site to always be as close as possible to the primary site and minimizes the potential data loss.
Some basic backup and restore strategies have few restrictions. A full database backup has no real restrictions; it can be performed anytime . However, the more advanced file and filegroup backup strategies have some basic requirements you need to understand. Primarily, you must perform transaction log backups, a crucial component of restore, on a regular basis. When you perform transaction log backups, you minimize the risk of data loss and take advantage of features such as decreased total downtime in the event of isolated failure by using file/filegroup strategies. To be able to perform transaction log backups, first you must set the database recovery model to either Full or Bulk- Logged. If you set the recovery model to Simple, you cannot back up the transaction log, and you will not be able to use the file/filegroup backup strategy.
Transaction log backups are required for recovery if you use a file/filegroup backup strategy.
One optional configuration decision you can make for your databases is that when you are creating your database objects, place them strategically within the database itself. In most databases and especially in a VLDB, your data will likely vary in the way it is used. For example, some data will be predominantly read-only and some will be predominantly read/write. In some tables, new rows might be read/write to accommodate frequent inserts , whereas old rows (historical data used mainly for analysis) would be read-only. For other tables, the distribution might vary based on corporate policy. For example, if price changes to products in a catalog are infrequent, price information would be predominantly read-only until price increases were periodically propagated as changes to the database.
After reviewing the overall use of your data, you can determine table use and strategically place read-only tables (or tables with isolated or batch modifications) in their own filegroup and read/write tables on another. Then you not only save time during a restore, but you also save money and time during backups. Although the read-only portion of your database might be large, it does not need to be backed up as often as the read/write portion.
There are two important factors to be aware of:
Even though a filegroup is read-only, you still have to restore all the log files until the end of the transaction log backup files. See the topic How to restore files and filegroups ( Transact -SQL) in SQL Server Books Online.
You can back up all the filegroups immediately followed by a transaction log backup and get the equivalent of a full database backup. In this case, you have all the data in one backup set and will need only to restore log backups after that point. The strategy of using file backups to allow for continuous frequent log backups is therefore not as cumbersome as it might seem. You restore the main data backup, then all the log backups they have taken while the data backup was in progress. However, be aware that the first log backup to restore after the data backup might have been taken before the data backup.
Additionally, partitioning your larger tables to split read activity from write activity can improve maintenance performance. Operations such as rebuilding an index take less time on physically smaller tables. For example, consider a typical sales table that holds the online sales information for a large company. Sales for the current month are read and write (which requires frequent index maintenance). In contrast, sales from the previous month or quarter are read-only because they are now used only for analysis, meaning maintenance occurs only once a month when the data is moved to the read-only portion of the database.
For tables with different usages, consider using separate filegroups for each type of data. For example, creating four separate filegroups ”one for read- only data, one for read/write data, one for text and image data, and one for extremely large tables ”can help when you are determining your backup strategies. Not only will you reduce maintenance times for each of the smaller tables (in other words, data partitions), but you will also add numerous backup and, especially, restore strategy options that can reduce downtime and backup costs.
Do not partition databases into multiple files or filegroups in a production environment without first testing to see if you will actually benefit from implementing files and filegroups. Although there are good arguments for moving things into their own files and filegroups, as noted earlier, doing this can ultimately add to your administrative overhead and, in turn , maintenance. The maintenance might run more quickly in some cases, but you might have more tasks to perform and more objects and tasks to manage. Ultimately, you will be making some trade-off. You want to create an entire SQL Server solution ” including a backup and restore strategy ”that you can maintain and perform day in and day out.
Finally, remember to back up all of your operating system files, application files, registry entries, and so on. In most cases, it is faster to restore the base state of your servers than it is to install again from scratch from CDs or DVDs. Although a database administrator (DBA) is not usually responsible for backing up these files, a DBA must coordinate with those people who are responsible for doing this to ensure that their systems are fully protected.
Before you can devise a backup strategy, take into account your company s archival scheme, if one exists. An archival strategy is also known as a retention period for your backup files. A retention period involves the following:
How long you will keep a backup, whether it is a SQL Server database, the operating system, or any other application data. The amount of time a file is retained is usually dictated by a corporate policy, but is also governed by cost, space, and other logistics.
How long you will keep media active in the backup mix, otherwise known as rotation . Using the same tape repeatedly is not the proper strategy because you are just overwriting your backups onto the same tape or disk without being able to go back to an older backup should that backup process fail. Media also fails, which is another reason for rotation. Do not let your media be a huge point of failure. Along with rotating your media, label each tape or disk with the contents and backup date so you are not playing the Guess the Proper Backup Tape game in a potential disaster recovery or restore situation.
The location of backup storage. Whether it is on a tape or disk, the media the backups are stored on take up physical room. Do you store the media on site? off site? If your corporate disaster recovery strategy states that you must be able to go back a week, it might make sense to keep a week (or even two) on site, but anything older than that should be stored in a secure, climate-controlled off-site facility. Also, you might be at risk if your datacenter is damaged and it not only housed your systems, but your complete corporate archive of backup tapes as well. Remember that backup tapes take up physical storage space and have a cost associated with them, so you might not be able to keep every backup since the inception of your company. Many companies implement an archival and rotation schedule of somewhere between 30 days and 90 days, unless they are regulated by some external auditing agency or specific industry rules (for example, a financial company) and are mandated to retain large amounts of archived backups.
Off-site location access. If you store your backups at an off-site location, you need to have physical access to the location at all hours, so choose your storage facility wisely.
|On the CD|| |
Use the file SQL_DB_Backup_Info.xls to record your backup strategy. It has two main tabs that you can copy and modify. One is for an entire instance of SQL Server so you can record the databases, and one is per application. This strategy allows you to have two views to understand the scope of your backup strategy better.
The most effective backup approach combines two strategies: one strategy is based on full database backups (the full database “based backup strategy), and the other is based on file/filegroup backups (the file-based backup strategy). Both strategies can use differential backups, but all strategies involve the use of transaction log backups.
This strategy is the most common strategy. It is generally based on backup types that have been available since the introduction of SQL Server, yet if the environment is right it might also utilize one of the newer backup types to improve recovery time. This strategy is easier to administer than the file-based backup strategy and offers up-to-the-minute recovery and point-in-time recovery. You can change between the Full and Bulk-Logged recovery models without breaking the continuity of the log. Remember to also review the recommended practices for batch processing, because using the Bulk-Logged recovery model will significantly reduce the work-loss exposure created during the changes in recovery models.
The typical strategy consists of automated backups of these types:
Full database backups
Differential database backups
Transaction log backups
Their frequency is dictated by your database but usually follows something like this:
Full database backups weekly (although less frequent full backups are an option, as these might be quite large and expensive to manage).
Differential database backups nightly (except on the night of the full database backup).
Transaction log backups ”consistent and automated, as well as two special cases. The frequency dictates the potential work-loss exposure in terms of minutes and the transaction log size. The smaller the interval, the smaller the log will be, which is, of course, dependent on database activity.
Automated Hourly, every 30 minutes, every 5 minutes, every 1 minute, and so on.
Special Case 1 (Proactive) When the transaction log percent log used reaches a higher than normal level, back it up proactively before it fills up.
Special Case 2 (Reactive) When the transaction log fills up, back it up.
Consider the following example using full and transaction log backups, as well as differentials, because performing full database backups weekly leads to a large number of logs being created hourly. In a week, if you performed a transaction log backup every hour , you would have nearly 170 transaction log backups (24 7 = 168, but because you perform one full backup, no transaction logs are generated during the process).
Restoring all of these transaction logs from transaction log backups takes time and is cumbersome. You want a streamlined process, not one in which you need to write close to 180 Transact-SQL statements. So although it takes significantly less time than the original modification took to apply all of the operations in the transaction logs, it could still take a lot of time to implement (again, no computations , no functions, just data is applied ”but for every log row). Instead of having to build a version of a database by reapplying changes row by row, you can get to a specific point in time much more quickly by using a differential backup.
In this scenario full database backups are represented as F 1 and F 2 , differential database backups are represented as D 1 , D 2 , D 3 , and D 4 , and 20 transaction log backups are shown as l 1 through l 20 . At point in time x you decide to recover the database.
In this example, only automated backups are used to describe recovery. When disaster recovery using backups is discussed, you will learn how (and when) to access a final backup, which includes the changes since the database was taken offline. In this case a 28th backup would be necessary. This backup is called the tail of the transaction log and it provides up-to-the-minute recovery. This part of your recovery process is the most important, and the procedure has not yet been discussed. However, all of the sample scripts include the code for backing up the 28th backup ”backing up the tail of the log.
This example shows a total of 27 backups. With this backup strategy, you have multiple potential recovery paths and more efficient recovery. The most optimal recovery path is to use the last full database backup, the last differential backup after that full database backup, and then all of the transaction log backups after the differential backup to restore the database up-to-the-minute when it is possible.
F 2 D 2 l19 l20
This strategy is optimal because you can recover all of the activity that occurred between the full database backup at F 2 and the differential backup at D 2 with only a single restore. This restore is a copy of how the pages looked at the point in time when the differential backup was made, and it includes all changes since the full backup. Remember, each differential database backup includes all changes since the last full backup. Each differential could get larger and larger, so you will still want to periodically perform a full database backup. However, differentials offer the ability to perform a potentially smaller backup (maybe a fraction of the entire database) more frequently and not burden the system by performing full database backups. But what if the last differential backup was bad? Do you have any other options? You could use the differential backup prior to that as shown here:
F 2 D 1 l16 l17 l18 l19 l20
If that differential backup were bad as well, then you could use only the transaction logs and still recover:
F 2 l13 l14 l15 l16 l17 l18 l19 l20
If the last full database backup were bad, you have other options. If the backup at F 2 is bad, go back to F 1 . However, when you go back to a previous full database backup, you must remember that the only differentials you can apply are those that apply to that version of the full database backup. For the full database backup performed at F 1 you can apply any differentials performed after that full backup (not another) and before the next full database backup (F 2 in this case). If F 1 were required for the restore then the most optimal restore sequence starting with F1 would be the following:
F 1 D 3 l10 l11 l12 l13 l14 l15 l16 l17 l18 l19 l20
You could then repeat the same process described previously and return to an earlier differential if the differential database backup D 3 were bad. A complete list of all of your remaining options follows:
F 1 D 2 l7 l8 l9 l10 l11 l12 l13 l14 l15 l16 l17 l18 l19 l20
F 1 D 1 l4 l5 l6 l7 l8 l9 l10 l11 l12 l13 l14 l15 l16 l17 l18 l19 l20 F 1 l1 l2 l3 l4 l5 l6 l7 l8 l9 l10 l11 l12 l13 l14 l15 l16 l17 l18 l19 l20
In fact, even if this full database backup at F 1 were bad, you could return to the previous full database backup, assuming you kept it, and still roll forward to the last transaction log backup. Or you could return to the previous full database backup ”or the one before that, or the one before that ” as long as you have the entire sequence of log backups to apply .
Never truncate the transaction log; it breaks the log backup sequence.
As part of an overall cost savings analysis, a company determines that by implementing a differential backup strategy its backup needs can be covered completely and the company can save money at the same time by implementing a differential backup strategy. Because transaction log backups are critical, the company decides that transaction log backups should be performed to disk. Additionally, it augments the backup strategy with a log shipping site at a remote location. The current full database backup strategy costs $44,380 annually. The breakdown of the costs associated with it is as follows:
Tape costs Each Linear Tape-Open (LTO) tape costs $70, and each full database backup requires 3 LTO tapes. Full database backups are performed daily, requiring 21 LTO tapes per week. Tapes are reused every 12 weeks (allowing the company to keep each month of history and then reuse after roughly 3 months). Proactively, tapes are cycled out after 24 weeks, allowing an archive off site of roughly 6 months. The total cost per year is $35,280. (All price quotations in this book are in U.S. dollars.)
Labor costs A typical operator who is paid $50 an hour takes half an hour a day to locate, mount, dismount, label, and file each backup. Each week this process requires a total of 3.5 hours, and this specific task takes a total of 182 hours per year. The total labor cost per year is $9,100.
However, the database is over 1 terabyte (TB) and only 2 percent of the data changes daily. If 1 TB of data can fit on 3 tapes, several database differential backups can easily fit on the same tapes. In fact, numerous differentials can even fit on a single tape. The first day the backup will be 20 GB (the total backup space needed 20 GB); the second day it will be 40 GB (if appended, the total backup space needed is 60 GB); the third day it will be 60 GB (again, if appended, the total backup space needed is 120 GB); the fourth day it will be 80 GB (if appended, the total backup space needed is 200 GB); and so on.
Effectively, you could fit an entire week s worth of differential database backups on one set of tapes; however, the decision is made to rotate tapes after three differential database backups and store them on only a single tape (instead of using a stripe set). This strategy yields a significant savings in both tape media and operational costs for a total of $10,350 annually. The breakdown of the costs associated with it is as follows:
Tape costs Instead of performing full database backups daily, a full database backup is performed once per week and still requires 3 LTO tapes. Differential database backups are performed six nights per week; however, only three differential database backups are stored on each backup set. Because the size is expected to be less than 120 GB, only a single tape is used to store the differential database backups. The archiving process does not change, but instead of requiring 21 LTO tapes per week, this process requires only 5. The rotational practices are the same. The total cost per year is $8,400.
Labor costs A typical operator who is paid $50 per hour takes .5 hours per day to locate, mount, dismount, label, and file each full database backup. Each week, this new process requires a total of 45 minutes. A total of 39 hours per year is spent on this specific task. The total labor cost per year is $1,950.
The total saving for using differential backups is $34,030 ($44,380 “ $10,350). The differential database backup saves 75 percent of the cost of the full database strategy. Additionally, because the differential database backs up only a maximum of 12 percent of the database (2 percent per day over the six days until a full database backup), the type of backup affects the production workload significantly less than the full database backup. Furthermore, the transaction logs are paused for a far shorter period of time, keeping the log shipping secondary more up to date.
This example shows the cost savings of using different backup strategies. Not all databases can benefit in this way. Carefully weigh your needs and resources against your required recovery times and database type.
This strategy was introduced and possible in Microsoft SQL Server 7.0, but there were limitations using SQL Server 7.0 that made it less robust than the implementation in SQL Server 2000. The most significant difference is that in SQL Server 7.0, files could not be backed up individually if they were part of a filegroup or if there were data dependencies within other filegroups. If they were, SQL Server required that all files or filegroups be backed up at the same time. In SQL Server 2000, you can back up any file or filegroup at any time, with the only restriction being that your database be in the Full or Bulk-Logged recovery model (which all production databases should be running in anyway).
The typical strategy consists of automated backups of these types:
Full file/filegroup backups
Differential file/filegroup backups
Transaction log backups
Many applications that tend to get very large are usually vendor- neutral. As such, they do not usually support database proprietary extensions such as filegroups, even though they are transparent to the application. So before you start breaking up the database, make sure your software vendor will support you if you implement this file structure.
In the file-based backup strategy, you can completely customize the frequency and granularity of the backups ”a huge benefit for VLDB ”in which different portions of the data have different uses. You should customize the granularity based on the way the data is distributed within the database (such as targeting specific types of data to specific locations within a database). Moreover, you should determine the frequency of each backup based on the type of data that that is stored there.
Determining which type of user data should go into your filegroups requires some strategizing. As a simple start, the Primary filegroup should contain only the system tables, and you need only one transaction log file. As discussed earlier, you need only one transaction log file because frequent log backups minimize the space required to hold the changes, and you do not gain performance when you have more than one transaction log. For example, if your transaction log is backed up every minute, the size of the transaction log is the size to hold one minute s worth of log entries.
Long-running transactions might require a larger log. Make sure you have tested the sizing of your transaction log against all of the types of activities that must occur within your database.
You should store user-defined data in nonprimary filegroups. In general, you should rarely split individual tables and their indexes into separate filegroups. However, you might consider this approach if you have a single large table that you want to place directly into a filegroup for better scan performance and more backup options. Remember, do not implement it without testing properly.
You cannot restore filegroup backups to another location without restoring the entire database. However, if you are using a full database backup, you can restore filegroups by using the WITH PARTIAL option on a restore. This approach is beneficial when recovering from user error. When recovering from hardware failure, and recovering in place, restoring a filegroup from a filegroup backup is faster than restoring from a full database backup. For more information, see the topic Partial Database Restore Operations in SQL Server Books Online.
Here are a few recommendations for filegroup usage:
Read-only tables A great benefit if the VLDB has a large portion of read-only data. For this portion of the database, you can significantly decrease the frequency of full filegroup backups and then only perform occasional filegroup differential backups. If the recovery plan includes only recovering the damaged file or filegroup and not an entire database, frequent file differential backups will minimize the cost of rolling the database forward.
Read-write tables Can often make up the smaller percentage in VLDBs, yet still be critical to the database. The read-write tables need more frequent backups, yet they are only a minor fraction of the total size of the database. By placing them in their own filegroups, you can then set the frequency of backup of that filegroup.
Text or image data Often in VLDBs, binary large objects (BLOBs) can take up a large amount of read-only space. As with read-only tables, you can benefit greatly by backing these up less frequently if all in the filegroup are read-only or with differentials if little changes.
Large tables Benefits are few if you want to have better control over the frequency of the large table backups. If you choose to put a table into its own filegroup, file-based backups help only in being able to choose a different granularity of backup. For example, you might back up less frequently because of so few modifications. Unfortunately, you cannot perform a partial database restore from file and filegroup backups unless you are restoring because of hardware or file corruption. When an isolated failure occurs, SQL Server allows the recovery to include only the affected part of the database in place.
Once the file or filegroup has been restored, transaction log backups can be used to bring that file up to the point in time when the database became suspect. If you are trying to recover from an accidentally dropped table and return a part of the database to an earlier point in time, then this approach has no benefits. However, you can restore the entire database to an alternate location and then choose a more manual recovery process to bring an earlier version of the table into the database.
In fact, the file-based backup strategy provides some key benefits because it is so flexible, yet it is also complex in design, administration, and recovery. The most important part is making sure you back up every file at some point. You must achieve a complete backup set similar to what the full database backup provides (all data files must have been backed up) so that you can re- create the database structure completely, if necessary (for example, if you need to recover the database from the ground up). You can create this backup set (from which you can build the database framework, if needed) by either backing up the files individually, backing up the filegroups individually, or backing up some combination of the two ”as long as all files are backed up at some point.
Both strategies use differential backups as well as transaction log backups, which provides some significant benefits, such as the following:
Up-to-the-minute recovery If the log is accessible and no Bulk- Logged operations have occurred since the last log backup, you will be able to back up the tail of the transaction log.
Built-in redundancy If a full backup (file/filegroup or database) is damaged because of inaccessible media or for some other reason, you can use the previous full (file/filegroup or database) backup. Similarly, if a differential (file/filegroup or database) backup is damaged because of inaccessible media or for some other reason, you can use the previous differential (file/filegroup or database) backup.
Although this strategy is the most common, you must be aware of its pros and cons. Better to know now your potential for downtime, data loss, or both during planning rather than at the time of a disaster!
The pros include the following:
Administration is simple.
When you combine it with other strategies (such as log shipping), this strategy is especially good at achieving minimal downtime and data loss in the event of a failure.
You can perform partial restores of the database to another location (which can be helpful in some recovery strategies).
The cons include the following:
Log backups are paused during the full backup (which can affect some of the other strategies, such as log shipping).
Log backups are paused during the differential backup. (Although this process should take less time than the full database backup, it can also affect other strategies.)
Significant data loss could occur if a disaster were to strike during a long-running full backup.
Full database backups can be quite large, and repetitively backing up predominantly read-only data is a waste of time and resources (disk or tape).
Although this strategy is not widely used, you should consider it if you have a VLDB. This strategy requires design techniques that focus on physical placement of objects to best use the strategy. However, any database can use some of the basic principles of this strategy by having multiple files. This strategy is best for situations in which log shipping is used and recovery is being optimized for hardware failure. If human error is the main purpose for choosing this strategy, you must be aware of the restrictions!
The pros include the following:
You can back up portions of the database on a more granular level and at different intervals.
You can perform log backups without delay or interruption to other dependent technologies (such as log shipping).
You can restore a portion of the database more quickly in the event of an isolated hardware failure.
The cons include the following:
Administration is complex.
You cannot perform partial restores of the database to another location. Only the full database “based backup strategy allows you to perform partial database restores from full database backups. The database must use a structural design that supports files and filegroups; then on restore, the full database backup must be read to find the file/filegroup that is being restored. However, using a log shipping secondary that has a load delay can resolve this problem, especially in a VLDB where loading the database ”even partially ” can be extremely time-consuming . This recovery option can be important in disasters involving human error.