How to Configure PostgreSQL
This chapter is dedicated to all database administrators and to those who want to become PostgreSQL administrators. Database administration can be either a simple or a difficult task, depending on the demands your database has to face and the level of security you have to provide.
PostgreSQL is indeed a database system that can easily be administered as long as the database administrator (DBA) understands the basic concepts of databases, backup strategies, and security issues. Security demands are always increasing and it is necessary to have a high standard for security, even if your database is only used in a protected environment. But let's get to some practical stuff.
The heart of every PostgreSQL database server is the so called
the supervisor daemon of your database. The postmaster has a lot of parameters that can be configured to make your database run better and faster.
we can get a little help about working with the postmaster.
On Unix systems, the postmaster is usually started when the default runlevel starts its processes, but it can also be started manually. No matter how the postmaster is started, you can use the flags just shown. Let's take a closer look at some flags.
Defines the number of shared memory buffers PostgreSQL uses for the backend processes. One shared memory buffer is usually 8KB, but the
can be redefined in
Specifies the directory PostgreSQL will use for the data.
defines the root directory of a cluster of databases. If the database directories are not defined via the shell, PostgreSQL will use the environment variable
instead. If you want to set this variable, check out your Unix manual and look for the
command (Bourne Shell).
Sets the maximum number of backend processes that are allowed to run on the machine. The default value is set to 32 but can also be changed in
Starts the postmaster in silent mode, which means that all messages the postmaster normally produces ”no matter whether on standard output or standard error ”are redirected to
. This is extremely bad for troubleshooting because you will loose important information.
Defines the debugging level your PostgreSQL server will use. The debugging level can be set from 1 to 5. The higher the debugging level is, the more information will be displayed. In most cases, it is useful to redirect the output to a file because this is the only way to get a complete history of what is going on with your PostgreSQL server.
Allows remote users to connect to your PostgreSQL server. Without
set, only local Unix domain socket connections can be established.
Provides secure connections in combination with the
flag. This works only when SSL has been enabled at compile time.
Passes options to the backend processes. This can be very useful when tuning or debugging your server. Many users consider
more comfortable than editing files.
Defines the port your server will use. If no port is defined, PostgreSQL will use the value of the environment variable
. Normally, PostgreSQL uses 5432 as the default port (the value that is usually compiled into the binaries).
All changes that are done in
require a recompile of PostgreSQL.
If you want to start PostgreSQL using an
script, the parameters passed to the postmaster have to be added to the
for users of RedHat Linux or
for users of Debian Linux. Just go through the shell script and add the desired flags to the section where
scripts are very hardware dependent, so there is more than just one way of how things can be done.
PostgreSQL 7.1 ”The
PostgreSQL databases version 7.1 use a file called
to configure runtime parameters. The
file is used to configure a variety of parameters. It can usually be found in
. All values defined in
can be changed at runtime. You don't have to restart PostgreSQL because the parameters in
are re-read when a new backend starts, becase everytime the backend is started a connection to the PostgreSQL database is established.
This feature is extremely useful because it avoids unpleasant downtimes.
Queries that are already running are not affected by the changes.
The syntax of the
file is extremely simple, a shown in the following:
is set to two which makes PostgreSQL display some runtime information on the screen. The seond line
PostgreSQL to print information about queries. Syslog is turned off so that all information is sent to
. Adding an option can easily be done by adding it to the file. The following is a complete list of all currently supported options:
is the global trace flag. It can be set to
(trace messages enabled individually),
(enable all trace messages), or
(disable all trace messages). Tracing means that the internals of the database can be
flag is used to define the amount of message the database provides.
means that no messages are displayed.
provides more messages, and
displays as many messages as possible.
This flag is used to define the way queries are traced.
means that no query is displayed,
one line per query, and
prints the full query.
Prints the query plan.
Prints the output of the parser.
Prints rewritten queries.
Pretty-print query plan.
Prints pretty-print parser output.
Prints pretty-print rewritten queries.
Prints the statistics of the parser.
Prints the statistics of the planner.
Prints the statistics of the executor.
Used to trace locking.
Used to trace
Used to trace spin locks.
Used to trace notify functions.
traced by locks.
, if not zero, of relation traced by locks.
Deadlock check timer.
Defines the behavior of
prints all messages to standard output or standard error,
prints all messages to standard output, standard error, and
sends the messages to
Enables hostname lookups in
on a per-backend basis.
flag may be
interesting for you, and it is worth going into a little more into detail.
int fsync(int fd)
can be found in the
C library and is used to flush all data to disk. This is useful because it protects your data from being lost when the machine crashes. But
can also cost a lot of performance when dealing with huge amounts of data. We will focus on the performance aspect of
in Chapter 8, called "Performance Tuning."
PostgreSQL 7.1 ”
Since PostgreSQL 7.1, all runtime parameters can be found in
. Like its predecessor, the file can be found in
. Let's take a look at the default
# PostgreSQL configuration file
# This file consists of lines of the form
# name = value
# (The `=' is optional.) White space is collapsed, comments are
# introduced by `#' anywhere on a line. The complete list of option
# names and allowed values can be found in the PostgreSQL
# documentation. Examples are:
#log_connections = on
#fsync = off
#max_connections = 64
# Any option can also be given as a command line switch to the
# postmaster, e.g., 'postmaster -c log_connections=on'. Some options
# can be set at run-time with the 'SET' SQL command.
# Connection Parameters
#tcpip_socket = false
#ssl = false
#max_connections = 32 # 1-1024
#port = 5432
#hostname_lookup = false
#show_source_port = false
#unix_socket_directory = ''
#unix_socket_group = ''
#unix_socket_permissions = 0777
#virtual_host = ''
#krb_server_keyfile = ''
#sort_mem = 512
#shared_buffers = 2*max_connections # min 16
#fsync = true
# Optimizer Parameters
#enable_seqscan = true
#enable_indexscan = true
#enable_tidscan = true
#enable_sort = true
#enable_nestloop = true
#enable_mergejoin = true
#enable_hashjoin = true
#ksqo = false
#geqo = true
#effective_cache_size = 1000 # default in 8k pages
#random_page_cost = 4
#cpu_tuple_cost = 0.01
#cpu_index_tuple_cost = 0.001
#cpu_operator_cost = 0.0025
#geqo_selection_bias = 2.0 # range 1.5-2.0
# GEQO Optimizer Parameters
#geqo_threshold = 11
#geqo_pool_size = 0 #default based in tables, range 128-1024
#geqo_effort = 1
#geqo_generations = 0
#geqo_random_seed = -1 # auto-compute seed
#sql_inheritance = true
#deadlock_timeout = 1000
# Expression Depth Limitation
#max_expr_depth = 10000 # min 10
# Write-ahead log (WAL)
#wal_buffers = 8 # min 4
#wal_files = 0 # range 0-64
#wal_sync_method = fsync # fsync or fdatasync or open_sync or open_datasync
# Note: default wal_sync_method varies across platforms
#wal_debug = 0 # range 0-16
#commit_delay = 0 # range 0-100000
#commit_siblings = 5 # range 1-1000
#checkpoint_segments = 3 # in logfile segments (16MB each), min 1
#checkpoint_timeout = 300 # in seconds, range 30-3600
# Debug display
#silent_mode = false
#log_connections = false
#log_timestamp = false
#log_pid = false
#debug_level = 0 # range 0-16
#debug_print_query = false
#debug_print_parse = false
#debug_print_rewritten = false
#debug_print_plan = false
#debug_pretty_print = false
#debug_assertions = true
#syslog = 0 # range 0-2
#syslog_facility = 'LOCAL0'
#syslog_ident = 'postgres'
#show_parser_stats = false
#show_planner_stats = false
#show_executor_stats = false
#show_query_stats = false
#show_btree_build_stats = false
# Lock Tracing
#trace_notify = false
#trace_locks = false
#trace_userlocks = false
#trace_spinlocks = false
#debug_deadlocks = false
#trace_lock_oidmin = 16384
#trace_lock_table = 0
We will take a look at the most important parameters that affect your daily work with PostgreSQL.
Prints one line for every connection established to the server.
Tells the database whether to
on or off. If
is on, the database makes sure that the data is physically written to disk.
By default, PostgreSQL allows 32 simultaneous connections. This value can be changed by setting
to the desired value. By default, there is a compile limit of 1024 connections, which can only be changed by recompiling the server.
If this flag is turned on, PostgreSQL accepts TCP/IP connections and not only Unix domain socket connections.
Enables secure connections via SSL.
Defines the TCP-port to which the server is listening. This value can only be set when the server starts; you can't change it for individual backend processes.
If you want hostnames instead of IP addresses to be logged,
has to be turned on. Don't forget that DNS lookups can take some time.
With this flag enabled, the database shows the outgoing port of the host connecting to PostgreSQL.
Defines the directory of the Unix domain socket the postmaster is listening to for connections from client applications. The default value is usually set to
Defines the group owner of the Unix domain socket. By default, it is an empty string, which means that the default
of the current user is taken.
The value must be defined by using the default Unix syntax for setting file system permissions. For example,
means that everybody can connect.
Works pretty much the same like Apache's
directly. By default, PostgreSQL listens to all hostnames assigned to the machine.
Defines the location of the Kerberos key file.
If PostgreSQL has to sort data, it is usually done in memory as long as the size of the
buffer is not exceeded. If that happens, the database uses temporary files for sorting. Compared to sorting in memory, this is ghastly slow, but if the amount of data that has to be
is extremely huge, it cannot be done in any other way. Keep in mind that if many concurrent sorts are running, the amount of memory defined by
is used multiple times. The size of the sort buffer is defined in kilobytes.
Sets the shared memory buffers the database will use. A buffer is usually 8KB; the default value is set to 32.
If you want to avoid the planner using sequential
, you can unset the
flag. It is not possible to turn sequential scans off completely, but their use can be reduced to the minimum.
Defines whether the database use indexscans.
Defines whether the planner uses TID scan plan types.
If you want to influence the number of sort operations, you can unset the
flag. Sorting can be turned off completely because the planner cannot always find a workaround.
Enables the use of nested
Enables the use of merge joins.
Enables the use of
The Key Set Query Optimizer (KSQO) makes the planner convert queries with many ANDs and ORs in the
clause to queries using
. This is usually done when working with Microsoft Access.
Defines whether generic query optimization is used. Generic query optimization means that query planning is done without exhaustive searching. The database uses
Defines the value the optimizer assumes for the size of the disk cache.
In many cases, PostgreSQL cannot do sequential reading. Jumping from block to block takes longer than sequential reading.
can be used to tell the query optimizer how much longer random reading takes (for example, 4 times). This is a floating point value and allows very fine tuning.
Defines the value the optimizer will assume to process a tuple. It is measured as a fraction of the cost of a sequential page fetch.
but sets the time for processing a tuple using an index.
Sets the time to process an operator in a
clause. It is also measured as a fraction of the cost of a sequential scan.
Defines the behavior of the database when dealing with inheritance. As described in the section called "Inheritance" in Chapter 8), PostgreSQL treats inherited tables slightly different. If you find the old way better, set this value to
In a few cases, the database has to face a deadlock.
defines the time to wait when waiting for a lock before checking whether the conditions for a deadlock are fulfilled. The time is measured in
Tells the database how many expressions can be nested within each other. If this is set to a very high value, the backend can crash because of buffer overflows.
Defines the number of disk page buffers stored in shared memory for WAL log. WAL is short for Write Ahead Log.
PostgreSQL creates log files in advance.
defines how many of these files are created.
This has actually the same effect as the
flag of the postmaster. No standard error and no standard output is produced.
Adds a timestamp to each entry in the logfile. The default value is set to
, but we strongly recommend that you turn it
, because it allows you to find out what kind of action was performed by the database.
The process ID can be added to every entry in the logfile by using the
The higher this value is, the more debugging information will be displayed. The default value is set to
. During development, it is useful to turn the debug level up and then down again for production.
is a powerful logging tool supported by all current Linux distributions. If this value is set to
is turned off completely.
means that messages are written to standard output and
. If you use
only is used.
is simply a wonderful thing. All major configuration
can be done within just one file, and this makes PostgreSQL administration even more simple.