15.9. Backing Up an RDBMSProtecting an RDBMS is very complex. There are several storage elements, including datafiles, rollback logs, transaction logs, and the master database. How do you get all of the data to a secondary storage medium if it's changing all the time?
15.9.1. Physical and Logical BackupsThere are two primary methods of backing up an RDBMS database: physical backups and logical backups. A physical backup physically backs up the datafiles. This is also referred to as a database backup. There are two types of physical backups, a cold backup and a hot backup. A cold backup is done by shutting down the database prior to doing the backup. This is often the simplest way to do a database backup, especially if your database's datafiles reside in the filesystem. All you have to do is shut down the database and run your normal filesystem backup utility. Unfortunately, this method may require your database to be shut down for a long time. That is why more and more environments are performing hot backups, which are done while the database is online. These types of backup, of course, require a lot more work behind the scenes, because you are trying to copy the datafiles while the database is writing to them. You'll also need a backup utility that understands the internal structure of the database. There are several ways to do physical backups:
A logical backup copies, or exports, data objects (usually tables) but does not record the data's location. A logical backup can restore a deleted table without having to restore all the datafiles in which it resides. It also can be used to move a table from one database to another. These options are made possible by the definition of a logical backup: it backs up the data and not the data's location. Therefore, it can be restored into any location. Logical backups, however, do not have the ability to do a point-in-time recovery. They also can introduce referential integrity problems, because you could load a table that requires information from another table that is not present. The biggest problem with exports, though, is that they almost always need to be done with the database offline. Logical backups are actually much simpler than physical backups. Each database has an export utility that creates a logical backup of one or more database objects to a file. Some of the commercial utilities also allow you to integrate logical and physical backups into your backup system.
15.9.2. Get Every InstanceChapter 2 includes a section called "Are You Backing Up What You Think You're Backing Up?" It talks about how your backup programs should be written so that everything in your system is automatically discovered and backed up. If you add a new filesystem, you should not then have to edit your backup scripts to get it backed up. This goes double for databases because they often are added and deleted much more frequently than filesystems. You need some way to ensure that every database instance on every server is being backed up. Commercial backup products can automatically ask the operating system which drives and filesystems it has. Wouldn't it be nice if you could do that with databases? Let's start with Oracle and Sybase. Sybase has the interfaces file that lists every server on each system. If an instance is not listed in this file, users cannot connect to it. Oracle has the oratab file that accomplishes the same task, but its use is not mandatory, as Sybase's interfaces file is. Programs can ask the Windows registry about which Exchange or SQL Server databases are running on a given server. DB2 DBAs can query its catalog. Unfortunately, Informix does not have this type of functionality unless you create it yourself. Since the files described here are not always used and yet should be, I'd like to emphasize what I just said: you need a centralized file that lists all the instances on the server. You then should start with that file to determine what instances are on a given serverinstances that need to be backed up. Enforce the use of the oratab file by writing startup scripts that start up only instances that are listed in those files. A wayward (or busy) DBA still can create an instance without putting it in this file and can even get it running. However, if you reboot the box enough times, the DBA will be sure to put it in the startup file so he doesn't have to manually start it every time! For Informix, create a file similar to oratab and write startup scripts that only start up databases listed in that file. 15.9.3. Transaction Log Dumps Are Not Incremental BackupsThis important topic often is misunderstood. The confusion comes from Sybase and SQL Server documentation that often refers to a transaction log dump as an incremental backup. They are not the same thing! What is the difference between the two? An incremental backup is a special backup that contains only the changed pages (blocks) since the last full (or higher level incremental) backup. A transaction log dump is a backup of all the transactions that have occurred since the last transaction log dump. They may sound similar, but they're not. The latter is much more difficult to manage and much slower to read. Perhaps the best way to illustrate this would be to discuss Oracle's rman program, which has both incremental and transaction log backups. Suppose that you created a full backup on Friday. During the week, you did not perform any full backups and ran only redo log (transaction log) backups. Now suppose that it is Thursday, and you need to restore your database. You would have your full backup from Friday and your redo log backups from each day. To restore, you need to read your full backup and then read each of the redo logfiles in order. Assuming you made one backup volume per day, you would need seven volumes. Now assume the same scenario as the preceding, except that you also ran an incremental backup every night. If you have to restore the database on Thursday, you would need the full backup from Friday, the latest incremental backup volume (presumably Wednesday's), and the redo logs for Thursday; in other words, three volumes instead of seven. That is because the incremental backup contains all changes since the full backup. Besides the difference in complexity, reading an incremental backup also is much quicker than reading a transaction log backup. Ask anyone who has rolled through several days' worth of transaction logs. In one benchmark that I performed, reading two weeks of transaction logs took 36 hours. Reading an incremental backup covering the same time period took only one hour. The reason for this is simple. A given page may be changed several times; replaying the transaction log also changes it several times. Loading a true incremental backup changes it only once, to its last value. DB2 has incremental backups, which back up all changes since the last full backup, and delta backups, which back up only changes since the last backup of any kind. They also have transaction log backups. Oracle and Informix support multiple levels in their backups (e.g., level 0, level 1, etc.). 15.9.4. Do It Yourself: Creating Your Own Backup UtilityYou don't have to use a high-priced commercial utility to back up your databases. Doing so certainly can make your backups more automated or centrally controlled, but since most commercial utilities can cost thousands of dollars per system, many people are using homegrown systems. 15.9.4.1. Intermediary diskThis is one of the most popular ways to do homegrown database backups. It's fast, clean, and easy. The basic idea uses a script that backs up the database to disk. That backup is treated as a regular file and backed up by the nightly filesystem backup. You can even save the amount of disk space needed by compressing the files after they're backed up. If you're really pressed for space, you can use named pipes to compress the backup as it's being written. In that case, you would need a backup disk that is only one-third to one-half the size of your original database disk (depending on the compression rate you get). Unless you have a very large database, this probably is cheaper than buying a commercial utility to perform this task. Each of the vendor-specific database backup chapters in this book explain how to do this. 15.9.4.2. Dedicated tape driveYou can use homegrown backup scripts to back up to a dedicated tape drive. This is a little more complicated and requires somewhat more work on your part. Depending on the size of your database, this may be more or less expensive than backing up to disk, but it definitely will be slower. It is also more complex, because you must keep track of each volume and label it in such a way that you know which database was backed up to it. (If you back up to disk, this can be done by naming the backup file the same name as the database.) 15.9.4.3. Shell/batch scriptsI assume that you are doing the preceding backups using some sort of shell or batch script. Scripts are much better than having a simple cron or at entry that says: back up databaseA to deviceB. Scripts can do lots of error checking and can be told to do things such as notify the DBA if something is wrong. 15.9.5. Calling a ProfessionalThis is one of the biggest growth markets within the backup product industry. Most commercial filesystem backup products now have interfaces to automatically back up your database to volumes that are managed by their product. It's really a beautiful thing, but it does come at a price! Each major database vendorincluding all those covered in this bookhas a backup utility API that commercial backup products can program to that allows them to back up that database. (The commercial backup product is often called a storage manager.) On a high level, these backup utilities all work in essentially the same way. The database vendor's utility generates one or more backup streams via an API that storage managers can talk to. The companies that produce the storage managers then can write a utility that talks to their storage manager on one side, and the database backup utility's API on the other side. Although the database backup utilities come bundled with the database products, the commercial backup products' utilities cost several thousand dollars. These backup utilities sometimes will not function without a storage manager. For example, Oracle's rman and Informix's onbar must have a storage manager to talk to in order to back up to tape, although rman can back up to a disk without a storage manager. In contrast, Sybase's and SQL Server's dump utility, DB2's db2 backup, and Exchange's ntbackup can back up to tape or disk without any storage manager present. Since Oracle and Informix didn't want to force their customers to buy a storage manager or the interface to their backup utility, they came up with a compromise. Both vendors bundled a free, stripped-down version of a storage manager that gives you enough functionality to be able to use rman and onbar to do backups. Figure 15-5 uses Oracle and this storage manager to illustrate the different pieces of the backup puzzle. Oracle uses rman to interface with the database on one side and the storage manager's database module on the other side. The storage manager communicates with the backup media on one side and the database module on the other side. The storage manager then uses the database module to interface between the storage manager and rman. The backup data flows from the Oracle database, through rman, tHRough the database module, through the storage manager, and to the backup media. Restores obviously flow in the opposite direction. Figure 15-5. A typical commercial database backup utility setupUtilities for each of the big databases have their own backup and recovery history. 15.9.5.1. DB2db2 backup has a long history and has always kept backup and recovery in mind. You can restore databases created on different types of platforms, and the restore utility can also be used to restore backup images that were produced on a previous version of DB2 (up to two versions earlier) as long as the word size (32- or 64-bit) is the same. You can even back up a database created on Linux for zSeries (mainframe) and restore it to an AIX-based DB2 UDB server. DB2 has also recently added the recovery command, which helps simplify restores. 15.9.5.2. ExchangeExchange is a special-purpose database and does not have a built-in backup command. However, there is an API that ntbackup interfaces with, so you can use ntbackup to back up and recover a live Exchange database. You can also back up and recover at the storage group level. If you want to do mailbox-level backups, you need to use other utilities. 15.9.5.3. InformixInformix backups have historically been done with tbtape (now called ontape), which is a standalone backup command designed to back up to tape. ontape is simple, has incremental capabilities, also can back up to disk, and backs up the database live. Some of these features, which always were assumed by Informix users to be present in other database systems, are just now appearing in other products. Informix also has onbar, which is designed specifically to send a stream of backup data to a commercial product. Whichever command you use, you can recover individual dbspaces. 15.9.5.4. MySQLHow you back up a MySQL database depends on which storage engine you use. Each major storage engine has its own backup and recovery method. 15.9.5.5. OracleOracle databases can be backed up with the Recovery Manager, or rman. rman has come a long way since its introduction and is a much easier way to back up and restore your database than what Oracle now calls user-managed backups. 15.9.5.6. PostgreSQLThere are two main ways to back up PostgreSQL databases. The first is equivalent to the user-managed hot backups in Oracle, where you run a script that puts the database in backup mode, then copies its files. The second is a command (pg_dump or pg_dumpall) that dumps the entire database into a large text or binary file. 15.9.5.7. SybaseSybase has come a long way in the backup arena. You can now recover an individual tablespace or device. Unfortunately, since many people use only the default segment or filegroup, this means most DBAs have to restore the entire database or nothing at all. Creating additional segments or filegroups could help with this issue. 15.9.5.8. SQL ServerThe SQL Server backup utility provides basic backup and restore capabilities. It enables you to do incremental, differential, full, and transaction log backups without third-party tools. |