Section 15.9. Backing Up an RDBMS


15.9. Backing Up an RDBMS

Protecting an RDBMS is very complex. There are several storage elements, including datafiles, rollback logs, transaction logs, and the master database. How do you get all of the data to a secondary storage medium if it's changing all the time?

This chapter focuses primarily on how to back up the data in the database. We assume that you are also backing up other important information, such as the operating system, the database application, and the schema of the database.


15.9.1. Physical and Logical Backups

There are two primary methods of backing up an RDBMS database: physical backups and logical backups.

A physical backup physically backs up the datafiles. This is also referred to as a database backup. There are two types of physical backups, a cold backup and a hot backup. A cold backup is done by shutting down the database prior to doing the backup. This is often the simplest way to do a database backup, especially if your database's datafiles reside in the filesystem. All you have to do is shut down the database and run your normal filesystem backup utility. Unfortunately, this method may require your database to be shut down for a long time. That is why more and more environments are performing hot backups, which are done while the database is online. These types of backup, of course, require a lot more work behind the scenes, because you are trying to copy the datafiles while the database is writing to them. You'll also need a backup utility that understands the internal structure of the database.

There are several ways to do physical backups:

  • With most databases that use cooked files as datafiles, you can simply shut down the database and do a full system backup. Since everything exists as a regular filesystem file, everything will get backed up. If you need to restore, you can then restore the entire database from this backup. You then need to replay the transaction logs against the old database files. Many databases don't allow you to replay transaction logs against a backup that was made this way. For those databases (which include DB2, Sybase, SQL Server, and Sybase), this method will not work.

  • If your datafiles are raw partitions in Unix, you can shut down the database and back them up if you have a utility or script to do so. For example, in Unix you use dd. This method is quite a bit more complex than the first, though, because you need to know which devices to run dd against.

  • The next method of backing up a database is to back it up live to disk or tape using a utility provided for that purpose. Oracle uses rman for this purpose, DB2 uses the db2 backup command, Sybase and SQL Server provide the dump utility, and you can use ntbackup with Exchange.

  • Another method of backing up databases is to use a utility that sends one or more streams of data to a commercial storage manager (i.e., backup software). This is the cleanest method if you can afford it (it can cost several thousand dollars per system). Each of the major database vendors provides an API that third-party storage managers can write to that allows them to back up data from a database.

  • A few commercial utilities provide functionality that is different from that of the previously listed options. The most popular of these is SQL Backtrack, which is now sold by BMC software. These utilities wrap around some of the native utilities and do not use the vendor-supplied API. They also enable you to send a data stream to commercial backup products, and some products have interfaces to the utilities. Still other options include interfaces to products that do not use the newer interfaces. The validity of backup and recovery programs that do not use the vendor-supplied API is left as a decision for the reader.

A logical backup copies, or exports, data objects (usually tables) but does not record the data's location. A logical backup can restore a deleted table without having to restore all the datafiles in which it resides. It also can be used to move a table from one database to another. These options are made possible by the definition of a logical backup: it backs up the data and not the data's location. Therefore, it can be restored into any location. Logical backups, however, do not have the ability to do a point-in-time recovery. They also can introduce referential integrity problems, because you could load a table that requires information from another table that is not present. The biggest problem with exports, though, is that they almost always need to be done with the database offline.

Logical backups are actually much simpler than physical backups. Each database has an export utility that creates a logical backup of one or more database objects to a file. Some of the commercial utilities also allow you to integrate logical and physical backups into your backup system.

Raw Devices Versus Cooked Files

This is an often-debated topic in DBA circles. Many DBAs want to use raw devices because they believe they are faster than using filesystem files. Some DBAs prefer to use filesystem (cooked) files due to their ease of administration.

The historical reasons for using raw partitions were increased performance and data integrity. A database spends a lot of time trying to ensure all the data is in its proper state. It logs every changed page and knows the exact condition of every page at any point in time. It assumes that when it tells the OS to write a page to the datafile, the OS does so. However, many filesystems cache writes to increase performance. This means that the database thinks that a page has been modified on disk when it really hasn't. This is a bad thing.

Modern databases resolve this problem by opening a cooked file in O_SYNC mode, which automatically flushes any changes from the OS buffer to the disk. This means that this historical argument is no longer true with most databases. Therefore, the only remaining reason for using raw partitions as datafiles is performance. Many claim a 5 to 15 percent performance increase over filesystem files. Your mileage may vary; these tests were performed on a closed track with a professional driver; please do try this at home. Find out what works best for you.


15.9.2. Get Every Instance

Chapter 2 includes a section called "Are You Backing Up What You Think You're Backing Up?" It talks about how your backup programs should be written so that everything in your system is automatically discovered and backed up. If you add a new filesystem, you should not then have to edit your backup scripts to get it backed up. This goes double for databases because they often are added and deleted much more frequently than filesystems.

You need some way to ensure that every database instance on every server is being backed up. Commercial backup products can automatically ask the operating system which drives and filesystems it has. Wouldn't it be nice if you could do that with databases? Let's start with Oracle and Sybase. Sybase has the interfaces file that lists every server on each system. If an instance is not listed in this file, users cannot connect to it. Oracle has the oratab file that accomplishes the same task, but its use is not mandatory, as Sybase's interfaces file is. Programs can ask the Windows registry about which Exchange or SQL Server databases are running on a given server. DB2 DBAs can query its catalog. Unfortunately, Informix does not have this type of functionality unless you create it yourself.

Since the files described here are not always used and yet should be, I'd like to emphasize what I just said: you need a centralized file that lists all the instances on the server. You then should start with that file to determine what instances are on a given serverinstances that need to be backed up. Enforce the use of the oratab file by writing startup scripts that start up only instances that are listed in those files. A wayward (or busy) DBA still can create an instance without putting it in this file and can even get it running. However, if you reboot the box enough times, the DBA will be sure to put it in the startup file so he doesn't have to manually start it every time! For Informix, create a file similar to oratab and write startup scripts that only start up databases listed in that file.

15.9.3. Transaction Log Dumps Are Not Incremental Backups

This important topic often is misunderstood. The confusion comes from Sybase and SQL Server documentation that often refers to a transaction log dump as an incremental backup. They are not the same thing!

What is the difference between the two? An incremental backup is a special backup that contains only the changed pages (blocks) since the last full (or higher level incremental) backup. A transaction log dump is a backup of all the transactions that have occurred since the last transaction log dump. They may sound similar, but they're not. The latter is much more difficult to manage and much slower to read.

Perhaps the best way to illustrate this would be to discuss Oracle's rman program, which has both incremental and transaction log backups. Suppose that you created a full backup on Friday. During the week, you did not perform any full backups and ran only redo log (transaction log) backups. Now suppose that it is Thursday, and you need to restore your database. You would have your full backup from Friday and your redo log backups from each day. To restore, you need to read your full backup and then read each of the redo logfiles in order. Assuming you made one backup volume per day, you would need seven volumes.

Now assume the same scenario as the preceding, except that you also ran an incremental backup every night. If you have to restore the database on Thursday, you would need the full backup from Friday, the latest incremental backup volume (presumably Wednesday's), and the redo logs for Thursday; in other words, three volumes instead of seven. That is because the incremental backup contains all changes since the full backup.

Besides the difference in complexity, reading an incremental backup also is much quicker than reading a transaction log backup. Ask anyone who has rolled through several days' worth of transaction logs. In one benchmark that I performed, reading two weeks of transaction logs took 36 hours. Reading an incremental backup covering the same time period took only one hour. The reason for this is simple. A given page may be changed several times; replaying the transaction log also changes it several times. Loading a true incremental backup changes it only once, to its last value.

DB2 has incremental backups, which back up all changes since the last full backup, and delta backups, which back up only changes since the last backup of any kind. They also have transaction log backups. Oracle and Informix support multiple levels in their backups (e.g., level 0, level 1, etc.).

15.9.4. Do It Yourself: Creating Your Own Backup Utility

You don't have to use a high-priced commercial utility to back up your databases. Doing so certainly can make your backups more automated or centrally controlled, but since most commercial utilities can cost thousands of dollars per system, many people are using homegrown systems.

15.9.4.1. Intermediary disk

This is one of the most popular ways to do homegrown database backups. It's fast, clean, and easy. The basic idea uses a script that backs up the database to disk. That backup is treated as a regular file and backed up by the nightly filesystem backup. You can even save the amount of disk space needed by compressing the files after they're backed up. If you're really pressed for space, you can use named pipes to compress the backup as it's being written. In that case, you would need a backup disk that is only one-third to one-half the size of your original database disk (depending on the compression rate you get). Unless you have a very large database, this probably is cheaper than buying a commercial utility to perform this task. Each of the vendor-specific database backup chapters in this book explain how to do this.

15.9.4.2. Dedicated tape drive

You can use homegrown backup scripts to back up to a dedicated tape drive. This is a little more complicated and requires somewhat more work on your part. Depending on the size of your database, this may be more or less expensive than backing up to disk, but it definitely will be slower. It is also more complex, because you must keep track of each volume and label it in such a way that you know which database was backed up to it. (If you back up to disk, this can be done by naming the backup file the same name as the database.)

15.9.4.3. Shell/batch scripts

I assume that you are doing the preceding backups using some sort of shell or batch script. Scripts are much better than having a simple cron or at entry that says: back up databaseA to deviceB. Scripts can do lots of error checking and can be told to do things such as notify the DBA if something is wrong.

15.9.5. Calling a Professional

This is one of the biggest growth markets within the backup product industry. Most commercial filesystem backup products now have interfaces to automatically back up your database to volumes that are managed by their product. It's really a beautiful thing, but it does come at a price!

Each major database vendorincluding all those covered in this bookhas a backup utility API that commercial backup products can program to that allows them to back up that database. (The commercial backup product is often called a storage manager.) On a high level, these backup utilities all work in essentially the same way. The database vendor's utility generates one or more backup streams via an API that storage managers can talk to. The companies that produce the storage managers then can write a utility that talks to their storage manager on one side, and the database backup utility's API on the other side. Although the database backup utilities come bundled with the database products, the commercial backup products' utilities cost several thousand dollars.

These backup utilities sometimes will not function without a storage manager. For example, Oracle's rman and Informix's onbar must have a storage manager to talk to in order to back up to tape, although rman can back up to a disk without a storage manager. In contrast, Sybase's and SQL Server's dump utility, DB2's db2 backup, and Exchange's ntbackup can back up to tape or disk without any storage manager present. Since Oracle and Informix didn't want to force their customers to buy a storage manager or the interface to their backup utility, they came up with a compromise. Both vendors bundled a free, stripped-down version of a storage manager that gives you enough functionality to be able to use rman and onbar to do backups. Figure 15-5 uses Oracle and this storage manager to illustrate the different pieces of the backup puzzle. Oracle uses rman to interface with the database on one side and the storage manager's database module on the other side. The storage manager communicates with the backup media on one side and the database module on the other side. The storage manager then uses the database module to interface between the storage manager and rman. The backup data flows from the Oracle database, through rman, tHRough the database module, through the storage manager, and to the backup media. Restores obviously flow in the opposite direction.

Figure 15-5. A typical commercial database backup utility setup


Utilities for each of the big databases have their own backup and recovery history.

15.9.5.1. DB2

db2 backup has a long history and has always kept backup and recovery in mind. You can restore databases created on different types of platforms, and the restore utility can also be used to restore backup images that were produced on a previous version of DB2 (up to two versions earlier) as long as the word size (32- or 64-bit) is the same. You can even back up a database created on Linux for zSeries (mainframe) and restore it to an AIX-based DB2 UDB server. DB2 has also recently added the recovery command, which helps simplify restores.

15.9.5.2. Exchange

Exchange is a special-purpose database and does not have a built-in backup command. However, there is an API that ntbackup interfaces with, so you can use ntbackup to back up and recover a live Exchange database. You can also back up and recover at the storage group level. If you want to do mailbox-level backups, you need to use other utilities.

15.9.5.3. Informix

Informix backups have historically been done with tbtape (now called ontape), which is a standalone backup command designed to back up to tape. ontape is simple, has incremental capabilities, also can back up to disk, and backs up the database live. Some of these features, which always were assumed by Informix users to be present in other database systems, are just now appearing in other products. Informix also has onbar, which is designed specifically to send a stream of backup data to a commercial product. Whichever command you use, you can recover individual dbspaces.

15.9.5.4. MySQL

How you back up a MySQL database depends on which storage engine you use. Each major storage engine has its own backup and recovery method.

15.9.5.5. Oracle

Oracle databases can be backed up with the Recovery Manager, or rman. rman has come a long way since its introduction and is a much easier way to back up and restore your database than what Oracle now calls user-managed backups.

15.9.5.6. PostgreSQL

There are two main ways to back up PostgreSQL databases. The first is equivalent to the user-managed hot backups in Oracle, where you run a script that puts the database in backup mode, then copies its files. The second is a command (pg_dump or pg_dumpall) that dumps the entire database into a large text or binary file.

15.9.5.7. Sybase

Sybase has come a long way in the backup arena. You can now recover an individual tablespace or device. Unfortunately, since many people use only the default segment or filegroup, this means most DBAs have to restore the entire database or nothing at all. Creating additional segments or filegroups could help with this issue.

15.9.5.8. SQL Server

The SQL Server backup utility provides basic backup and restore capabilities. It enables you to do incremental, differential, full, and transaction log backups without third-party tools.




Backup & Recovery
Backup & Recovery: Inexpensive Backup Solutions for Open Systems
ISBN: 0596102461
EAN: 2147483647
Year: 2006
Pages: 237

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net