Section 21.3. Point-in-Time Recovery | Backup & Recovery: Inexpensive Backup Solutions for Open Systems

21.3. Point-in-Time Recovery

Point-in-time, or PIT, recovery in PostgreSQL is very different from either pg_dump or pg_dumpall. However, if you're familiar with how to do hot backups in Oracle without using rman, this is very similar. It's also not that complicated.

21.3.1. Creating a Backup to Use with Point-in-Time Recovery

If you're going to do PIT recovery, you have to first enable WAL archiving. If you haven't done that, review the write ahead log section to see how to do so.

Once you've enabled WAL archiving, the process is simple:

Tell PostgreSQL that you're going to start an external backup (select pg_start_backup).
Perform the actual backup.
Tell PostgreSQL that you're finished with the external backup (select pg_stop_backup).

Let's look at that in more detail. The first thing you have to do is to tell PostgreSQL that you're going to start a hot backup. Connect to any database in the cluster as superuser, and issue the following SQL command:

> select pg_start_backup('label');

For label, use any string that you want to identify the backup, such as the full path of where you're going to put the backup. PostgreSQL then puts a file called backup_label in the cluster directory. This file contains information about the backup such as start/stop time of the backup and the WAL segments written while backup was being done. You have now told PostgreSQL that you plan to perform an external backup. You can then copy the files underneath the cluster directory any way you wish, while probably excluding the pg_xlog directory to save some space. Methods can include everything from creating a tar archive on disk to running a commercial backup utility to back it up to tape or disk.

Once the backup is done, you need to tell PostgreSQL that you're done. Connect to any database in the cluster as superuser, and issue this SQL command:

> select pg_stop_backup;

The write ahead log is used during recovery to correct any inconsistencies created by backing up the datafiles/pagefiles while they were being used. It is therefore essential that you have every WAL segment that was active from the time you selected pg_start_backup and pg_stop_backup. If you don't have every one of these WAL segments, you cannot use that backup.

The final WAL segment is archived as soon as you generate enough changes to fill it. However, this could be several hours or several days, depending on the level of change within your database. If you lost your entire database directory (including the pg_xlog subdirectory), you couldn't use the backup you just created because the last WAL segment it needs would be lost. It would be nice if you could force PostgreSQL to switch WAL segments, but this is not available as of this writing. If you could do that, the current WAL segment would be archived immediately, and you would be sure that you could use this current backup.

Suppose that you do find yourself in a recovery scenario in which you cannot use the most recent backup because you lost the disk drive where the database and the pg_xlog directory resided. What would you do? The option that results in the least amount of data loss would be to recover from the previous backup (which you hopefully kept) and use the archived transaction logs to roll forward to the latest log you have available.

If you'd like to see which WAL segments are needed for the hot backup, you can examine the .backup file that is created in the WAL archive directory. The filename would look something like this:

000000010000001A00000012.00535CD8.backup

In this example, 000000010000001A00000012 was the current WAL log when the backup was started (pg_start_backup), and 00535CD8 is the checkpoint record. If you examine this file, you'll see something like the following:

START WAL LOCATION: 1A/12535CD8 (file 000000010000001A00000012) STOP WAL LOCATION: 1A/131C407C (file 000000010000001A00000013) CHECKPOINT LOCATION: 1A/12535CD8 START TIME: 2006-07-11 11:59:02 BST LABEL: DBBACK-FILE-20060614-132.tar.gz STOP TIME: 2006-07-11 12:04:21 BST

To be able to restore with this backup, you absolutely must have the file specified in START WAL LOCATION (e.g., 000000010000001A00000012), the file specified in STOP WAL LOCATION (e.g., 00000001000000-1A00000013), and any files in between. (In this example, there are no files in between the start and stop WAL segments.)

Once you've done all this, your backup is done.

21.3.2. Restoring from a Point-in-Time Backup

The steps to restoring from a PIT backup are relatively simple. Yes, they're more complicated than a pg_restore or psql command, but there are two advantages to restoring this way. The first is that you can perform a multithreaded restore, greatly increasing the speed of a large restore. The second advantage is that you can recover right up to the point of failure. If those advantages aren't important to you, then use one of the other methods. Here's the procedure:

Stop the postmaster if it's running.
If you've got the space to do so, and if it's still available, you might want to back up the whole cluster data directory (and any other place you put tablespaces) to some alternate location. At a minimum, you should copy the contents of the pg_xlog subdirectory because it may contain unarchived WAL files.
Clean out the cluster data directory and any other directories into which you will be copying data, including pg_xlog.
Restore the database files from the backup taken earlier, with the appropriate ownership and permissions. Make sure that you don't overwrite any symbolic links that you created.
If your restore copied any files into pg_xlog, remove them; these are from the backed up database and are now obsolete. If you didn't back up pg_xlog, be sure to recreate it and the pg_xlog/archive_status subdirectory as well.
Copy (not move) back into pg_xlog any unarchived WAL files that you saved in step 2, or that you saved immediately after the backup.
Create a file recovery.conf in the cluster data directory. This file contains a restore_command that is essentially the opposite of the archive_command that you used for archiving WAL segment files. For example, suppose you use this archive_command during normal operations:

% archive_command = 'cp -i %p /mnt/server/archivedir/%f'

The reverse of that would be the restore_command:

% restore_command = 'cp /mnt/server/archivedir/%f %p'

Put the appropriate command into the recovery.conf file.
It might also be a good idea to modify pg_hba.conf so that ordinary users can't connect as soon as the recovery is complete. Once you've verified that the restore worked, you can undo this change and allow users access to the database.
Start the postmaster. It automatically goes into recovery mode and processes the appropriate WAL files. Once it's done, it renames recovery.conf to recovery.done and then commences normal database operations.
Check the database to make sure all is well. If not, return to step 1. If so, return everything to normal. If you denied users access in step 8, allow them to connect by restoring pg_hba.conf to its normal status and doing a pg_ctl reload.

Please note that this method is an all-or-nothing method. You cannot use the PIT method to restore an individual table, tablespace, or database. You also cannot restore to a point in time before the backup was taken.

BackupCentral.com has a wiki page for every chapter in this book. Read or contribute updated information about this chapter at http://www.backupcentral.com.