There are two ways to back up your PostgreSQL database. The first method is to create an archive containing the filesystem files that comprise your database. The second method is to create a SQL script that describes how to re-create the data in your database.
In the first method, you use an archiving tool, such as tar, cpio, or backup, to back up all the files in your database cluster. There are a number of disadvantages to this method. First, your entire database cluster must be shut down to ensure that all buffers have been flushed to disk. Second, a filesystem archive is often larger than the equivalent script because the archive contains indexes and partially filled pages that do not need to be archived. Finally, it is not possible to restore a single database or table from a filesystem archive. There is, however, at least one advantage to using a filesystem archive: you may already have a backup scheme in place that backs up file systems, and including your database cluster in that scheme is probably pretty easy.
The second (and usually preferred) method is to create a SQL script that can reconstruct the contents of your database from scratch. Then, when you need to restore data from an archive, you simply run the script.
PostgreSQL provides two utilities that you can use to create archive scripts: pg_dump and pg_dumpall.
Using pg_dump
The pg_dump program creates a SQL script that re-creates the data and metadata in your database. Before I get into too many details, it might help to see the kind of script that pg_dump will create[7]:
[7] I've changed the formatting of this script slightly so that it fits on a printed page.
In this example, I've asked pg_dump to produce a script that re-creates a single table (-t customers) using INSERT commands rather than COPY commands (--inserts).
$ pg_dump --inserts -t customers movies
--
-- Selected TOC Entries:
--
\connect - bruce
--
-- TOC Entry ID 2 (OID 518934)
--
-- Name: customers Type: TABLE Owner: bruce
--
CREATE TABLE "customers" (
        "id" integer,
        "customer_name" character varying(50),
        "phone" character(8),
        "birth_date" date
);
--
-- TOC Entry ID 3 (OID 518934)
--
-- Name: customers Type: ACL Owner:
--
REVOKE ALL on "customers" from PUBLIC;
GRANT ALL on "customers" to "bruce";
GRANT ALL on "customers" to "sheila";
--
-- Data for TOC Entry ID 4 (OID 518934)
--
-- Name: customers Type: TABLE DATA Owner: bruce
--
INSERT INTO "customers" VALUES (1,'Jones, Henry','555-1212','1970-10-10');
INSERT INTO "customers" VALUES (2,'Rubin, William','555-2211','1972-07-10');
INSERT INTO "customers" VALUES (3,'Panky, Henry','555-1221','1968-01-21');
INSERT INTO "customers" VALUES (4,'Wonderland, Alice N.','555-1122','1969-03-05');
INSERT INTO "customers" VALUES (7,'Grumby, Jonas',NULL,'1984-02-21');
INSERT INTO "customers" VALUES (8,'Haywood, Rosemary','666-1212','1965-02-03');
If we feed this script back into psql (or some other client application), psql will connect to the database as user bruce, CREATE the customers table, assign the proper privileges to the table, and INSERT all the rows that had been committed at the time that we started the original pg_dump command. You can see that this script contains everything that we need to re-create the customers table starting from an empty database. If we had defined triggers, sequences, or indexes for the customers table, the code necessary to re-create those objects would appear in the script as well.
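For example, assuming you redirect the pg_dump output to a file (the file name customers.sql and the database name movies_restored are made up for this sketch), restoring the table into an empty database looks like this:

$ pg_dump --inserts -t customers movies > customers.sql
$ createdb movies_restored
$ psql -f customers.sql movies_restored

psql executes the commands in the script in order, so the table, its privileges, and its rows come back exactly as they were when the dump started.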
Now let's look at some of the command-line options for pg_dump. Start with pg_dump --help:
$ pg_dump --help
pg_dump dumps a database as a text file or to other formats.

Usage:
  pg_dump [options] dbname

Options:
  -a, --data-only          dump only the data, not the schema
  -b, --blobs              include large objects in dump
  -c, --clean              clean (drop) schema prior to create
  -C, --create             include commands to create database in dump
  -d, --inserts            dump data as INSERT, rather than COPY, commands
  -D, --column-inserts     dump data as INSERT commands with column names
  -f, --file=FILENAME      output file name
  -F, --format {c|t|p}     output file format (custom, tar, plain text)
  -h, --host=HOSTNAME      database server host name
  -i, --ignore-version     proceed even when server version mismatches
                           pg_dump version
  -n, --no-quotes          suppress most quotes around identifiers
  -N, --quotes             enable most quotes around identifiers
  -o, --oids               include oids in dump
  -O, --no-owner           do not output \connect commands in plain
                           text format
  -p, --port=PORT          database server port number
  -R, --no-reconnect       disable ALL reconnections to the database in
                           plain text format
  -s, --schema-only        dump only the schema, no data
  -S, --superuser=NAME     specify the superuser user name to use in
                           plain text format
  -t, --table=TABLE        dump this table only (* for all)
  -U, --username=NAME      connect as specified database user
  -v, --verbose            verbose mode
  -W, --password           force password prompt (should happen automatically)
  -x, --no-privileges      do not dump privileges (grant/revoke)
  -X use-set-session-authorization, --use-set-session-authorization
                           output SET SESSION AUTHORIZATION commands rather
                           than \connect commands
  -Z, --compress {0-9}     compression level for compressed formats

If no database name is supplied, then the PGDATABASE environment
variable value is used.

Report bugs to <pgsql-bugs@postgresql.org>.
The most basic form for the pg_dump command is
pg_dump database
In this form, pg_dump archives all objects in the given database. You can see that pg_dump understands quite a number of command-line options. I'll explain the most useful options here.
If you use large-objects in your database, you may want to include the --blobs (or -b) option so that large-objects are written to the resulting script. Needless to say, archiving large-objects increases the size of your archive.
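Because a plain text script can't hold large-objects, you combine --blobs with the custom format (described later in this section). A sketch, using an arbitrary archive file name:

$ pg_dump --blobs --format=c movies > movies.bak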
You also might want to include either --clean (-c) or --create (-C) when you are using pg_dump for backup purposes. The --clean flag tells pg_dump to DROP an object before it CREATEs the object; this reduces the number of errors you might see when you restore from the script. The second option, --create, tells pg_dump to include a CREATE DATABASE statement in the resulting archive. If you want to archive and restore an entire database, use the --create option when you create the archive and drop the database before you restore.
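For example, a full archive-and-restore cycle using --create might look like the following sketch (movies.sql is an arbitrary file name, and I'm assuming you can run the script while connected to template1):

$ pg_dump --create movies > movies.sql
$ dropdb movies
$ psql -f movies.sql template1

Because the script contains a CREATE DATABASE command, you run it while connected to some other database; the script creates movies, reconnects to it, and then restores the schema and data.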
In the previous example, I included the --inserts flag. This flag and the related --column-inserts flag affect how pg_dump populates each table in your database. If you don't include either flag, pg_dump will emit COPY commands to put data back into each table. If you use the --inserts flag, pg_dump will emit INSERT commands rather than COPY commands. If you use the --column-inserts flag, pg_dump will build INSERT commands that include column lists, such as
INSERT INTO "customers" ("id","customer_name","phone","birth_date") VALUES (1,'Jones, Henry','555-1212','1970-10-10');
A restore that executes COPY commands runs more quickly than one that executes INSERT commands, so you should usually omit both flags (--inserts and --column-inserts). You might want to build INSERT commands if you intend to use the resulting script for some other purpose, such as copying data into an Oracle, Sybase, or SQL Server database.
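For comparison, here is roughly what the data section of the script looks like when you omit both flags and let pg_dump emit COPY commands (the column values are separated by tab characters):

COPY "customers" FROM stdin;
1	Jones, Henry	555-1212	1970-10-10
2	Rubin, William	555-2211	1972-07-10
...
\.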
Because pg_dump is a client application, you don't have to be logged in to the server to create an archive script. A few of the pg_dump options (--port, --host, and --username) control how pg_dump will connect to your database.
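For example, to archive the movies database from a server running on another host (the host name here is made up), you might run:

$ pg_dump --host db1.example.com --port 5432 --username bruce movies > movies.sql

If the server requires a password, pg_dump will prompt you for it (you can force the prompt with --password).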
One of the problems that you may encounter when you run pg_dump is that it can produce scripts that are too large to store as a single file. Many operating systems impose a maximum file size of 2GB or 4GB. If you are archiving a large database, the resulting script can easily exceed the file size limit, even though no single table would (remember, each table is stored in its own file).
There are two (related) solutions to this problem. First, you can decrease the size of the archive script by compressing it. The pg_dump program writes the archive script to its standard output so you can pipe the script into a compression program:
$ pg_dump movies | gzip -9 > movies.gz
or
$ pg_dump movies | bzip2 -9 > movies.bz2
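To restore from one of these compressed scripts, decompress it and pipe the result back into psql:

$ gunzip -c movies.gz | psql -d movies

or

$ bunzip2 -c movies.bz2 | psql -d movies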
You also can compress the archive script by telling pg_dump to create the archive in custom format. The custom format is compressed and is organized so that the pg_restore program (described a bit later) can avoid problems caused by order of execution. To choose the custom format, include the --format=c flag:
$ pg_dump --format=c movies > movies.bak
Using the custom format means that your archive script will be compressed (thus taking less space and possibly fitting within the operating system imposed file size limit). However, you can't restore a custom-format script using psql; you must use pg_restore. That's not a problem per se; it's just something to be aware of.
Unfortunately, compressing the archive script is not really a solution; it simply delays the inevitable because even in compressed form, you may still exceed your OS file size limit. You may need to split the archive script into smaller pieces. Fortunately, the split command (a Unix/Linux/Cygwin utility) makes this easy. You can dump an entire database into a collection of smaller archive scripts (20MB each) with the following command:
$ pg_dump movies | split --bytes=20m - movies.bak.
This command causes pg_dump to produce a single script, but when you pipe the script to split, it will split the script into 20MB chunks. The end result is a collection of one or more files with names such as movies.bak.aa, movies.bak.ab, ... movies.bak.zz. When you want to restore data from these archives, you can concatenate them using the cat command:
$ cat movies.bak.* | psql -d movies
See the PostgreSQL Reference Manual for complete details on the pg_dump command.
Using pg_dumpall
The pg_dump command can archive individual tables or all the tables in a single database, but it cannot archive multiple databases. To archive an entire cluster, use the pg_dumpall command. pg_dumpall is similar to pg_dump in that it creates SQL scripts that can be used to re-create a database cluster.
pg_dumpall is actually a wrapper that invokes pg_dump for each database in your cluster. That means that pg_dumpall supports the same set of command-line options as pg_dump. Well, almost: pg_dumpall silently ignores any attempt to produce a custom- or tar-format script. pg_dumpall can produce archive scripts only in plain text format. This introduces two problems. First, you cannot compress the archive script by selecting the custom format; you must pipe the script to an external compression program instead. Second, you cannot archive large-objects using pg_dumpall (pg_dump can archive large-objects only when using the custom format, which you can't use with pg_dumpall).
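So a cluster-wide backup and restore typically looks something like this sketch (the file name is arbitrary, and I'm assuming the restore runs against template1, since the script reconnects to each database as it re-creates it):

$ pg_dumpall | gzip -9 > cluster.sql.gz
$ gunzip -c cluster.sql.gz | psql -d template1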
Using pg_restore
When you create an archive script using pg_dump or pg_dumpall, you can restore the archive using pg_restore. The pg_restore command processes the given archive and produces a sequence of SQL commands that re-create the archived data. Note that pg_restore cannot process plain text archive scripts (such as those produced by pg_dumpall); you must produce the archive using the --format=c or --format=t options. If you want to restore a plain text archive script, simply pipe it into psql.
A typical invocation of pg_restore might look like this:
$ pg_restore --clean -d movies movies.bak
The --clean flag tells pg_restore to drop each database object before it is restored. The -d movies option tells pg_restore to connect to the movies database before processing the archive: All SQL commands built from the archive are executed within the given database. If you don't supply a database name, pg_restore writes the generated SQL commands to the standard output stream; this can be useful if you want to clone a database.
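For example, a minimal cloning sketch (movies_clone is a made-up name) might look like this:

$ createdb movies_clone
$ pg_restore movies.bak | psql -d movies_clone

Because no -d option is given, pg_restore writes the SQL commands to its standard output, and psql executes them in the new database.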
Like pg_dump, pg_restore can be used from a remote host. That means that you can provide the hostname, username, and password on the pg_restore command line.
The pg_restore program allows you to restore specific database objects (tables, functions, and so on); see the PostgreSQL Reference Manual for more details.
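For example, to restore only the customers table from the custom-format archive created earlier (a sketch that reuses the movies.bak file name):

$ pg_restore --table=customers -d movies movies.bak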