How to Configure PostgreSQL

This chapter is dedicated to all database administrators and to those who want to become PostgreSQL administrators. Database administration can be either a simple or a difficult task, depending on the demands your database has to face and the level of security you have to provide.

PostgreSQL is indeed a database system that can easily be administered as long as the database administrator (DBA) understands the basic concepts of databases, backup strategies, and security issues. Security demands are always increasing and it is necessary to have a high standard for security, even if your database is only used in a protected environment. But let's get to some practical stuff.

Configuring the Postmaster

The heart of every PostgreSQL database server is the so called "postmaster, the supervisor daemon of your database. The postmaster has a lot of parameters that can be configured to make your database run better and faster.

By using

  postmaster --help

we can get a little help about working with the postmaster.

On Unix systems, the postmaster is usually started when the default runlevel starts its processes, but it can also be started manually. No matter how the postmaster is started, you can use the flags just shown. Let's take a closer look at some flags.

-B Defines the number of shared memory buffers PostgreSQL uses for the backend processes. One shared memory buffer is usually 8KB, but the size can be redefined in src/include/config.h (see BLCKSZ ).
-D Specifies the directory PostgreSQL will use for the data. -D defines the root directory of a cluster of databases. If the database directories are not defined via the shell, PostgreSQL will use the environment variable $PGDATA instead. If you want to set this variable, check out your Unix manual and look for the export command (Bourne Shell).
-N Sets the maximum number of backend processes that are allowed to run on the machine. The default value is set to 32 but can also be changed in src/include/config.h .
-S Starts the postmaster in silent mode, which means that all messages the postmaster normally produces ”no matter whether on standard output or standard error ”are redirected to /dev/null . This is extremely bad for troubleshooting because you will loose important information.
-d Defines the debugging level your PostgreSQL server will use. The debugging level can be set from 1 to 5. The higher the debugging level is, the more information will be displayed. In most cases, it is useful to redirect the output to a file because this is the only way to get a complete history of what is going on with your PostgreSQL server.
-i Allows remote users to connect to your PostgreSQL server. Without -i set, only local Unix domain socket connections can be established.
-l Provides secure connections in combination with the -i flag. This works only when SSL has been enabled at compile time.
-o Passes options to the backend processes. This can be very useful when tuning or debugging your server. Many users consider -o more comfortable than editing files.
-p Defines the port your server will use. If no port is defined, PostgreSQL will use the value of the environment variable $PGPORT . Normally, PostgreSQL uses 5432 as the default port (the value that is usually compiled into the binaries).

Note

All changes that are done in src/include/config.h require a recompile of PostgreSQL.

If you want to start PostgreSQL using an init script, the parameters passed to the postmaster have to be added to the init scripts in /etc/rc.d/init.d for users of RedHat Linux or /etc/init.d for users of Debian Linux. Just go through the shell script and add the desired flags to the section where pg_ctl is started. init scripts are very hardware dependent, so there is more than just one way of how things can be done.

PostgreSQL 7.1 ”The `pg_options` File

PostgreSQL databases version 7.1 use a file called pg_options to configure runtime parameters. The pg_options file is used to configure a variety of parameters. It can usually be found in $PGDATA/data . All values defined in pg_options can be changed at runtime. You don't have to restart PostgreSQL because the parameters in pg_options are re-read when a new backend starts, becase everytime the backend is started a connection to the PostgreSQL database is established.

This feature is extremely useful because it avoids unpleasant downtimes.

Note

Queries that are already running are not affected by the changes.

The syntax of the pg_options file is extremely simple, a shown in the following:

 verbose=2 query=2 syslog=0

First verbose is set to two which makes PostgreSQL display some runtime information on the screen. The seond line tells PostgreSQL to print information about queries. Syslog is turned off so that all information is sent to stdout and stderr . Adding an option can easily be done by adding it to the file. The following is a complete list of all currently supported options:

all all is the global trace flag. It can be set to (trace messages enabled individually), 1 (enable all trace messages), or -1 (disable all trace messages). Tracing means that the internals of the database can be monitored .
verbose The verbose flag is used to define the amount of message the database provides. means that no messages are displayed. 1 provides more messages, and 2 displays as many messages as possible.
query This flag is used to define the way queries are traced. means that no query is displayed, 1 writes one line per query, and 4 prints the full query.
plan Prints the query plan.
parse Prints the output of the parser.
rewritten Prints rewritten queries.
pretty_plan Pretty-print query plan.
pretty_parse Prints pretty-print parser output.
pretty_rewritten Prints pretty-print rewritten queries.
parsestats Prints the statistics of the parser.
plannerstats Prints the statistics of the planner.
executorstats Prints the statistics of the executor.
locks Used to trace locking.
userlocks Used to trace user locks.
spinlocks Used to trace spin locks.
notify Used to trace notify functions.
lock_debug_oidmin Minimum relation oid traced by locks.
lock_debug_relid oid , if not zero, of relation traced by locks.
deadlock_timeout Deadlock check timer.
syslog Defines the behavior of syslog . prints all messages to standard output or standard error, 1 prints all messages to standard output, standard error, and syslog . 2 sends the messages to syslog only.
hostlookup Enables hostname lookups in ps_status .
showportnumber Shows port numbers in ps_status .
nofsync Disables fsync on a per-backend basis.

The nofsync flag may be especially interesting for you, and it is worth going into a little more into detail. int fsync(int fd) can be found in the unistd.h C library and is used to flush all data to disk. This is useful because it protects your data from being lost when the machine crashes. But fsync can also cost a lot of performance when dealing with huge amounts of data. We will focus on the performance aspect of fsync in Chapter 8, called "Performance Tuning."

PostgreSQL 7.1 ” `postgreSQL.conf`

Since PostgreSQL 7.1, all runtime parameters can be found in postgresql.conf . Like its predecessor, the file can be found in $PGDATA/data . Let's take a look at the default config file:

 # # PostgreSQL configuration file # ----------------------------- # # This file consists of lines of the form # #   name = value # # (The `=' is optional.) White space is collapsed, comments are # introduced by `#' anywhere on a line. The complete list of option # names and allowed values can be found in the PostgreSQL # documentation. Examples are: #log_connections = on #fsync = off #max_connections = 64 # Any option can also be given as a command line switch to the # postmaster, e.g., 'postmaster -c log_connections=on'. Some options # can be set at run-time with the 'SET' SQL command. #======================================================================== # #    Connection Parameters # #tcpip_socket = false #ssl = false #max_connections = 32 # 1-1024 #port = 5432 #hostname_lookup = false #show_source_port = false #unix_socket_directory = '' #unix_socket_group = '' #unix_socket_permissions = 0777 #virtual_host = '' #krb_server_keyfile = '' # #    Performance # #sort_mem = 512 #shared_buffers = 2*max_connections # min 16 #fsync = true # #    Optimizer Parameters # #enable_seqscan = true #enable_indexscan = true #enable_tidscan = true #enable_sort = true #enable_nestloop = true #enable_mergejoin = true #enable_hashjoin = true #ksqo = false #geqo = true #effective_cache_size = 1000  # default in 8k pages #random_page_cost = 4 #cpu_tuple_cost = 0.01 #cpu_index_tuple_cost = 0.001 #cpu_operator_cost = 0.0025 #geqo_selection_bias = 2.0 # range 1.5-2.0 # #    GEQO Optimizer Parameters # #geqo_threshold = 11 #geqo_pool_size = 0  #default based in tables, range 128-1024 #geqo_effort = 1 #geqo_generations = 0 #geqo_random_seed = -1 # auto-compute seed # #    Inheritance # #sql_inheritance = true # #    Deadlock # #deadlock_timeout = 1000 # #    Expression Depth Limitation # #max_expr_depth = 10000 # min 10 # #    Write-ahead log (WAL) # #wal_buffers = 8 # min 4 #wal_files = 0 # range 0-64 #wal_sync_method = fsync # fsync or fdatasync or open_sync or open_datasync # Note: default wal_sync_method varies across platforms #wal_debug = 0 # range 0-16 #commit_delay = 0 # range 0-100000 #commit_siblings = 5 # range 1-1000 #checkpoint_segments = 3 # in logfile segments (16MB each), min 1 #checkpoint_timeout = 300 # in seconds, range 30-3600 # #    Debug display # #silent_mode = false #log_connections = false #log_timestamp = false #log_pid = false #debug_level = 0 # range 0-16 #debug_print_query = false #debug_print_parse = false #debug_print_rewritten = false #debug_print_plan = false #debug_pretty_print = false #ifdef USE_ASSERT_CHECKING #debug_assertions = true #endif # #    Syslog # #ifdef ENABLE_SYSLOG #syslog = 0 # range 0-2 #syslog_facility = 'LOCAL0' #syslog_ident = 'postgres' #endif # #    Statistics # #show_parser_stats = false #show_planner_stats = false #show_executor_stats = false #show_query_stats = false #ifdef BTREE_BUILD_STATS #show_btree_build_stats = false #endif # #    Lock Tracing # #trace_notify = false #ifdef LOCK_DEBUG #trace_locks = false #trace_userlocks = false #trace_spinlocks = false #debug_deadlocks = false #trace_lock_oidmin = 16384 #trace_lock_table = 0 #endif

We will take a look at the most important parameters that affect your daily work with PostgreSQL.

log_connections Prints one line for every connection established to the server.
fsync Tells the database whether to turn fsync() on or off. If fsync() is on, the database makes sure that the data is physically written to disk.
max_connections By default, PostgreSQL allows 32 simultaneous connections. This value can be changed by setting max_connections to the desired value. By default, there is a compile limit of 1024 connections, which can only be changed by recompiling the server.
tcpip_socket If this flag is turned on, PostgreSQL accepts TCP/IP connections and not only Unix domain socket connections.
ssl Enables secure connections via SSL.
port Defines the TCP-port to which the server is listening. This value can only be set when the server starts; you can't change it for individual backend processes.
hostname_lookup If you want hostnames instead of IP addresses to be logged, hostname_lookup has to be turned on. Don't forget that DNS lookups can take some time.
show_source_port With this flag enabled, the database shows the outgoing port of the host connecting to PostgreSQL.
unix_socket_directory Defines the directory of the Unix domain socket the postmaster is listening to for connections from client applications. The default value is usually set to /tmp .
unix_socket_group Defines the group owner of the Unix domain socket. By default, it is an empty string, which means that the default group of the current user is taken.
unix_socket_permissions The value must be defined by using the default Unix syntax for setting file system permissions. For example, 0777 means that everybody can connect.
virtual_host Works pretty much the same like Apache's virtual_host directly. By default, PostgreSQL listens to all hostnames assigned to the machine.
krb_server_keyfile Defines the location of the Kerberos key file.
sort_mem If PostgreSQL has to sort data, it is usually done in memory as long as the size of the sort buffer is not exceeded. If that happens, the database uses temporary files for sorting. Compared to sorting in memory, this is ghastly slow, but if the amount of data that has to be processed is extremely huge, it cannot be done in any other way. Keep in mind that if many concurrent sorts are running, the amount of memory defined by sort_mem is used multiple times. The size of the sort buffer is defined in kilobytes.
shared_buffers Sets the shared memory buffers the database will use. A buffer is usually 8KB; the default value is set to 32.
enable_seqscan If you want to avoid the planner using sequential scans , you can unset the enable_seqscan flag. It is not possible to turn sequential scans off completely, but their use can be reduced to the minimum.
enable_indexscan Defines whether the database use indexscans.
enable_tidscan Defines whether the planner uses TID scan plan types.
enable_sort If you want to influence the number of sort operations, you can unset the enable_sort flag. Sorting can be turned off completely because the planner cannot always find a workaround.
enable_nestloop Enables the use of nested loops .
enable_mergejoin Enables the use of merge joins.
enable_hashjoins Enables the use of has joins.
ksqo The Key Set Query Optimizer (KSQO) makes the planner convert queries with many ANDs and ORs in the WHERE clause to queries using UNION . This is usually done when working with Microsoft Access.
geqo Defines whether generic query optimization is used. Generic query optimization means that query planning is done without exhaustive searching. The database uses on by default.
effective_cache_size Defines the value the optimizer assumes for the size of the disk cache.
random_page_cost In many cases, PostgreSQL cannot do sequential reading. Jumping from block to block takes longer than sequential reading. random_page_cost can be used to tell the query optimizer how much longer random reading takes (for example, 4 times). This is a floating point value and allows very fine tuning.
cpu_tuple_cost Defines the value the optimizer will assume to process a tuple. It is measured as a fraction of the cost of a sequential page fetch.
cpu_index_tuple_cost Similar to cpu_tuple_cost but sets the time for processing a tuple using an index.
cpu_operator_cost Sets the time to process an operator in a WHERE clause. It is also measured as a fraction of the cost of a sequential scan.
sql_inheritance Defines the behavior of the database when dealing with inheritance. As described in the section called "Inheritance" in Chapter 8), PostgreSQL treats inherited tables slightly different. If you find the old way better, set this value to off .
deadlock_timeout In a few cases, the database has to face a deadlock. deadlock_timeout defines the time to wait when waiting for a lock before checking whether the conditions for a deadlock are fulfilled. The time is measured in milliseconds .
max_expr_depth Tells the database how many expressions can be nested within each other. If this is set to a very high value, the backend can crash because of buffer overflows.
wal_buffers Defines the number of disk page buffers stored in shared memory for WAL log. WAL is short for Write Ahead Log.
wal_files PostgreSQL creates log files in advance. wal_files defines how many of these files are created.
silent_mode This has actually the same effect as the -S flag of the postmaster. No standard error and no standard output is produced.
log_timestamp Adds a timestamp to each entry in the logfile. The default value is set to off , but we strongly recommend that you turn it on , because it allows you to find out what kind of action was performed by the database.
log_pid The process ID can be added to every entry in the logfile by using the log_pid flag.
debug_level The higher this value is, the more debugging information will be displayed. The default value is set to . During development, it is useful to turn the debug level up and then down again for production.
syslog syslog is a powerful logging tool supported by all current Linux distributions. If this value is set to , syslog is turned off completely. 1 means that messages are written to standard output and syslog . If you use 2 , syslog only is used.

postgresql.conf is simply a wonderful thing. All major configuration tasks can be done within just one file, and this makes PostgreSQL administration even more simple.

Configuring the Postmaster

PostgreSQL 7.1 ”The pg_options File

PostgreSQL 7.1 ” postgreSQL.conf

PostgreSQL 7.1 ”The `pg_options` File

PostgreSQL 7.1 ” `postgreSQL.conf`