Common Errors | PostgreSQL Developer's Handbook (2001)

Mastering backups and recovery is perhaps one of the most important areas of database administration. However, maintaining restorable copies can, in many cases, be very difficult, especially on systems that have to perform a lot of INSERT and UPDATE operations.

In general, you can make six mistakes when working with a database server:

You have no backup at all. We don't have to mention that this can lead to some real trouble.
You create a backup but you have never tested the recovery process. This is perhaps one of the worst errors you can make when working with backups. You create backups and feel secure that nothing can happen to you because you have copies of your data. If something goes wrong, however, you might be in deep trouble if you don't know how recovery works. If you have never tried it, you will feel insecure during the recovery process, which can cost a lot of time and lead to errors. Make sure that you know perfectly well how to restore a backup. This is an extremely important point that most people forget about.
You create a backup but nobody but you knows about it. Machines are generally more reliable than humans. You should take this into consideration when designing database systems and thinking of backup strategies. In many companies we dealt with in the past few years, we saw something extremely dangerous: The person who was responsible for the backup or the entire IT system knew everything, but was the only one who did. What happens if that person is on holiday or decides to leave the company? Many companies could be in deep trouble when facing such a situation. No matter how redundant and reliable your IT system is, if no one knows how it works and what has to be done in case of failure, you will be in deep trouble. Redundant knowledge among the people working with a system is at least as important as redundant hardware. Documentation of all crucial processes in an IT system has to be available, and more than just one person must know how things work. Many companies try to save money by neglecting documentation, but recovery processes are, in most cases, more expensive and more dangerous than writing a short piece of documentation. Finally, don't forget that documentation must also be maintained.
You create a backup on the same volume as the original data. In the last few years, we have also seen some other horror scenarios. We have found backup systems where the backup was actually on the same volume as the original data. Assume a situation where your hard disk has stopped working. If you can still read the backup, you are very lucky. In most cases this won't be possible. You have to use separate volumes for your backup and the original data or you will face trouble when you try to recover.
The backup and the original data are in the same room. In case of fire, it is essential to have one version of the data stored in another location; otherwise, both the data and the backup might be destroyed. In that case, the data is lost and there is no way to restore it.
You have only the most recent backup. If you have only the most recent backup available, you might have a problem. In most cases, you don't know that something has gone wrong on the system, so the most recent backup will also contain errors and recovery will be difficult.
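
The last point can be made concrete with a small sketch that keeps several generations of a backup instead of only the newest one. It assumes, purely for illustration, that backup files are named backup_YYYY-MM-DD.sql, so that sorting the names also sorts them by date; the function name prune_backups and the naming scheme are hypothetical and not taken from any particular backup tool.

```shell
#!/bin/sh
# prune_backups DIR KEEP
# Deletes all files named backup_*.sql in DIR except the KEEP newest ones.
# Assumption: the ISO date in the file name makes name order equal date order.
prune_backups() {
    dir=$1
    keep=$2
    # Reverse sort puts the newest name first; tail skips the first KEEP names
    # and passes the remaining (older) ones on for deletion.
    ls -1 "$dir" | grep '^backup_.*\.sql$' | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r name; do
        rm -f -- "$dir/$name"
    done
}
```

Calling prune_backups with a directory and the number 7, for example, would leave the seven most recent dated files in that directory. How many generations are worth keeping depends on how long an error can go unnoticed on your system.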

Full Backup Versus Incremental Backup

Many companies rely on redundant storage systems; they are a good choice for protecting your data, but they are not enough. Backup does not mean buying a lot of additional and expensive hardware. Real backup starts in the brain of every person involved in the process of data processing and backup. Everybody has to take care of data security and documentation to guarantee a fully working IT environment.

Most backup systems available on the net support two types of backups:

Full backup. Full backup means that all the data is saved at once. This is the most comfortable way to provide backups that can easily be recovered. Depending on the amount of data, however, full backups can sometimes be a problem. When dealing with a lot of data, a full backup might not fit on a backup medium or copying all data might take too long.
Incremental backup. Incremental backup is usually used in combination with full backup. In an incremental backup, only data that is newer than the last backup is saved. The advantage of this method is that significantly less data has to be saved. The problem is that the recovery process can be complicated and annoying (depending on the backup software you use). On systems where full backups are not possible because the amount of data is too large, some sort of incremental backup has to be used. Many people use full backups periodically and run incremental backups more frequently. With databases, incremental backups are difficult because backing up is not done on the row level.
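
The idea behind an incremental backup can be sketched at the level of ordinary files. The following is only an illustration under stated assumptions: it works on plain files, not on the contents of a database, and the function name list_changed_since and the use of a "stamp" file to remember the time of the last backup are hypothetical choices made for this example.

```shell
#!/bin/sh
# list_changed_since DIR STAMP
# Prints every regular file below DIR whose modification time is newer than
# that of the file STAMP. STAMP stands for "time of the last backup".
list_changed_since() {
    find "$1" -type f -newer "$2" | sort
}
```

An incremental run would save only the files printed by this function and then update the stamp file with touch; a full run would ignore the stamp and save everything. Recovery then needs the last full backup plus every incremental backup made since, which is why restoring can become tedious.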

Starting Backups Using cron

On Unix systems, backups are usually started by cron, a daemon used to execute scheduled commands. cron is flexible and easy to configure. The most important implementation of the cron daemon was written by Paul Vixie and is therefore also called "Vixie Cron"; this implementation is currently used by all major Linux distributors.

If you want to add a job to your cron, you can do this by typing crontab -e in your favorite Unix shell. crontab starts the default editor of your system, which, in many cases, is ed. ed is a line-oriented text editor some of you might know from ancient Unix times. On some systems, the default editor is vi, one of the most powerful and widespread editors on Unix systems. If you don't want to use ed or vi to configure cron, you can set the environment variable EDITOR or VISUAL to your favorite editor. This can be done with the following commands (for Bourne Shell users):

 export EDITOR=vim
 export VISUAL=vim

vim is a text editor that is upward-compatible with vi. On Linux systems, vi is usually an alias for vim. vim has some nice advantages over vi, such as syntax highlighting.

You can use cron to define all the times at which you want to start a backup. Check out your man pages for a complete reference on cron and crontab.

Here is an example of an entry in crontab:

 0 3 * * *       /full_path_to_script/myscript.pl

myscript.pl is started at 3 a.m. every day. Every entry in crontab consists of six fields. The first five fields define the times at which cron has to start the program named in field six.

The first field defines the minute to start the program, field two defines the hour, field three is responsible for the day of the month, field four defines the month, and field five can be used to restrict the day of the week the process has to be started. A field might contain an ordinary value or an asterisk (*), which matches every possible value of that field. You can see in the example that the script is started every day at three o'clock; this means every day of the month, every month, and every day of the week, but only at three o'clock and zero minutes.
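
To make the field layout easier to see, the following sketch splits a crontab entry into its six fields and labels them. It is only an illustration: the function name describe_cron_entry is hypothetical, and the sketch does not validate the values or handle the extended syntax (lists, ranges, steps) described in the crontab man page.

```shell
#!/bin/sh
# describe_cron_entry ENTRY
# Splits a crontab line into the five time fields and the command, then
# prints them with labels. Everything after the fifth field is the command.
describe_cron_entry() {
    read -r minute hour day_of_month month day_of_week command <<EOF
$1
EOF
    printf 'minute=%s hour=%s day_of_month=%s month=%s day_of_week=%s command=%s\n' \
        "$minute" "$hour" "$day_of_month" "$month" "$day_of_week" "$command"
}

describe_cron_entry '0 3 * * *       /full_path_to_script/myscript.pl'
# prints: minute=0 hour=3 day_of_month=* month=* day_of_week=* command=/full_path_to_script/myscript.pl
```

Because the command is simply the rest of the line, it may itself contain spaces and arguments.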

Backup Hardware

A lot of backup hardware is available. Whether you need to back up a few files or an entire mainframe, the hardware industry provides nearly everything you can imagine to back up your data.

The dinosaurs in the backup business are tapes. Some of you might already have dealt with DDSx tapes or IBM tapes, such as 3480 or 3490. Most people feel insecure when working with streamers, because tapes have to be accessed differently than CDs, for instance. Unix systems offer a powerful set of commands to handle tapes, but for many users, it's difficult to get used to these tools.

Some companies offer jukeboxes, which are cartridges that contain multiple tapes (for example, 6 tapes). You don't have to change tapes every day, and you can use cron to switch tapes automatically. If you have to use multiple tapes to back up your data, you can also use robots. Some very powerful solutions are currently available to manage dozens of backup tapes automatically. Robots are commonly used in mainframe environments to avoid changing tapes manually.

Backup servers are, in many cases, the most comfortable way to back up data. Backup servers are sometimes also cheaper than expensive streamer hardware. A backup server is nothing more than a machine used to receive copied data. Recovery from a backup server is usually easy because you merely copy the data over the network back to the machine that has crashed; this can usually be done quickly.