10.3 Configuring PostgreSQL | PHP and PostgreSQL. Advanced Web Programming2002

Because PostgreSQL is a flexible and powerful database server, many parameters can be configured and modified to achieve a maximum amount of performance and to make PostgreSQL behave the way you want it to.

In general, two files are important for configuring your database. pg_hba.conf contains the network settings and postgresql.conf contains almost all other parameters.

In this section you will find out how PostgreSQL can be configured and what settings you can make to achieve your target.

10.3.1 Runtime Parameters

PostgreSQL's runtime parameters can be found in postgresql.conf, which is located in your database cluster. The syntax of the file is easy and you will learn about the most important entries in this section. The entries will be discussed in the same order they occur in the default configuration file of PostgreSQL. This way it should be an easy task to find information about a certain setting quickly:

tcpip_socket In the default configuration PostgreSQL only accepts Unix sockets. If this flag is set to true, PostgreSQL will also accept connections via TCP/IP.
ssl To turn SSL (Secure Socket Layer) on, this flag has to be set to true.
max_connections This setting defines how many simultaneous connections PostgreSQL will accept. The default value is 32. The compiled-in upper limit is 1024. If the number of connections is increased, it is also necessary to change the memory settings of PostgreSQL such as the number of shared buffers; otherwise, PostgreSQL will not start up.
port In the default configuration PostgreSQL listens to port 5432 TCP. If you want to define another port, you can do it using the port flag.
hostname_lookup Normally only IP addresses are listed in the log file. To resolve this hostname_lookup must be set to true.
show_source_port This flag shows the outgoing port number of the host trying to establish a connection to PostgreSQL in the connection log messages. You could trace back the port number to find out what user initiated the connection.
unix_socket_directory This flag defines the directory the postmaster is listening to. The flag can be changed at compile time. The default directory is /tmp.
unix_socket_group A socket can be assigned to a group. Using an empty string means that the default group of the user is used.
unix_socket_permissions Unix sockets use the same permissions as any other file on a Unix system. The default setting is 0777, which means that everybody can connect to the database.
virtual_host By default PostgreSQL listens to all names assigned to a server. Using virtual_host allows you to tell PostgreSQL to listen to only one hostname.
krb_server_keyfile Sets the location of the Kerberos server key file.
shared_buffers Defines the number of shared buffers used by PostgreSQL. By default the size of one buffer is exactly the size of one page, which means 8192 bytes. You have to use at least twice the number of shared buffers as the maximum number of connections; otherwise, the postmaster will not start up.
max_fsm_relations Sets the maximum number of tables for which free space will be tracked in the shared free-space map. The default value is 100.
max_fsm_pages Sets the maximum number of disk pages for which free space will be tracked in the shared free-space map.
max_locks_per_transaction Defines the maximum number of objects that can be locked at a time.
wal_buffers Defines the number of pages buffers in shared memory for Write Ahead Logs (WAL).
sort_mem This flag defines how much data can be sorted in memory before moving to temporary files. sort_mem does not define the total amount of memory used for sorting, but it defines the amount of memory used per sort process. If you have more than just one concurrent sort process, you might need the amount of memory more than just once.
vacuum_mem With VACUUM, memory is allocated to keep track of the tuples reordered by the command. vacuum_mem defines the amount of memory VACUUM is allowed to allocate.
wal_files Defines the number of log files created in advance.
wal_fsync_method wal_fsync_method can have different values depending on the desired behavior of the database. FSYNC makes sure that the data is flushed to disk after every commit operation. FDATASYNC calls the fdatasync function after every commit. OPEN_SYNC writes WAL files with open() option O_SYNC. OPEN_DATASYNC writes WAL files with open() option O_DSYNC. All options are available on every operating system and should be defined in postgresql.conf.
wal_debug Defines the debug level for Write Ahead Log (WAL).
commit_delay This delay tells PostgreSQL to wait before using fsync to flush the WAL buffers to disk. In case of delay it is possible to flush buffers and WAL buffers with just one fsync, which is faster. The delay is defined in microseconds.
commit_siblings Sets the minimum number of concurrent transactions before using commit_delay.
checkpoint_segments Defines the maximum distance between two WAL checkpoints in the log segment. By default one log segment is 16MB.
checkpoint_timeout Sets the maximum time (in seconds) between two WAL checkpoints.
fsync To gain speed, the operating system buffers and caches data in memory before writing it to disk. In other words, if PostgreSQL writes data, this does not mean that it is written to disk immediately. In case of disaster, this can be a problem because the image of data the database is working on is not identical to the data on disk. Therefore the fsync flag can be used to tell the operating system to write the data to disk immediately. This will lead to a lower performance of the system, but it will help you to protect your data in case of disaster.
enable_seqscan By default this flag is set to off. If you want to force PostgreSQL to use an index, and if an index exists on a column, this flag has to be set to off. If no index on a column has been defined, PostgreSQL will perform a sequential scan, so you don't have to worry that you cannot look for columns on which no index has been defined.
enable_indexscan Just like sequential scans, index scans can be turned off.
enable_tidscan Tells PostgreSQL whether or not it is allowed to use TID scan plan types. Normally this is used for debugging purposes.
enable_sort To avoid as many sort operations as possible, this flag can be set to off. In some cases sorts can be avoided by using workarounds. Turning sorts off does not mean that absolutely no sorts are performed.
enable_nestloop Nested loops are loops within loops. Usually nested loops occur when two tables are joined without using an index. Nested loops are often a hint that a query has not been optimized properly.
enable_mergejoin and enable_hashjoin PostgreSQL provides various ways of joining tables. This flag is used to turn hash-joins and merge-joins on and off.
ksqo The Key Set Query Optimizer is often used in combination with products like Microsoft Access. The idea is that queries containing complex OR and AND conjunctions are substituted for queries using UNION. This will not always return the same result as if ksqo was turned off.
geqo In complex queries, it is possible in most cases to achieve faster results by using a genetic algorithm instead of exhaustive searching.
effective_cache_size Sets an estimation of the disk cache that will be used by the kernel. The value is defined in 8k blocks.
random_page_cost Sets the estimation made for loading one block from the disk.
cpu_tuple_cost Sets the costs of processing one tuple.
cpu_index_tuple_cost Sets the costs for processing one tuple using an index.
cpu_operator_cost Sets the costs for processing one tuple in the WHERE clause.
geqo_effort geqo_generations, geqo_pool_size, geqo_random_seed, geqo_selection_bias These flags are tuning parameters for geqo. With the help of these settings, it is possible to define how the genetic algorithm works. These parameters should only be changed if you know perfectly well how things work internally; otherwise, the optimizer might produce bad results.
silent_mode Makes the postmaster run silently, which means that no messages are sent to stdout and stderr.
log_connections Displays information about every connection that is established to the server successfully.
log_timestamp To make PostgreSQL add a timestamp to the logfile, this flag has to be enabled. Setting this flag to true is important because this way it is possible to debug your applications easily.
log_pid In addition to the timestamp it is possible to log the process id of the back end generating the logging entry.
debug_level The standard debugging level can be set to values ranging from 0 to 16. The higher the debugging level is, the more information will be generated. In real-world applications, it is not useful to use a higher debugging level than 3 or 4.
DEBUG_PRINT_PARSE DEBUG_PRINT_PLAN, DEBUG_PRINT_REWRITTEN, DEBUG_PRINT_QUERY, DEBUG_PRETTY_PRINT With the help of these parameters, it is possible to send a variety of values to the logfile.
debug_assertions If you are facing strange problems, assertion checks can be turned on to analyze strange behavior. This flag can only be used if --enable-cassert has been enabled at compile time.
syslog PostgreSQL supports syslog as a logging mechanism. To use syslog, --enable-syslog must be enabled at compile time.
syslog_facility With the help of this flag you can define the syslog facility you want to use.
syslog_ident If syslog is turned on, this option determines the program name used to identify PostgreSQL messages in syslog log messages.
show_query_stats show_parser_stats, show_planner_stats, show_executor_stats, show_btree_build_stats To log information about the various components of the database such as the parser or the planner, these flags can be set to on.
stats_start_collector PostgreSQL provides a subprocess for generating statistics. By default this feature is turned on, but it can be turned off by setting stats_start_collector to off.
stats_reset_on_server_start If this flag is turned on, the statistics are dropped every time the server is started.
trace_notify Generates a lot of debugging output for the LISTEN and NOTIFY command.
dynamic_library_path To tell PostgreSQL where dynamically loaded objects can be found, you can define a path using dynamic_library_path.
australian_timezones If this parameter is set to true, CST, EST, and SAT are interpreted as Australian time zones rather than as North American Central/Eastern time zones.
authentication_timeout An authentication process has to be ready after a certain time. If the time is exceeded, PostgreSQL breaks the connection to avoid hangups.
default_transaction_isolation PostgreSQL 7.2 supports two of four transaction isolation levels proposed by the ANSI SQL standard. With the help of default_transaction_isolation, it is possible to tell PostgreSQL which transaction isolation level is the default one.
max_expr_depth Sets the maximum depth of an expression accepted by the parser. This way "endless" expressions can be avoided.
max_files_per_process Sets the maximum number of files a process is allowed to open.
password_encryption Passwords can be stored in an encrypted or an unencrypted way. This flag sets the default value if neither ENCRYPTED nor UNENCRYPTED is passed to the CREATE USER command.
sql_inheritance To turn inheritance on or off, this flag can be used. If inheritance is turned off, every table will be treated as an independent table.
transform_null_equals By default expr = NULL is not treated as expr IS NULL. To change the behavior, this flag can be set to on.

If you are running a small database system, most of the default settings will satisfy your demands. In huge installations or complex operations that have to be performed by the database, we recommend that you take a closer look at these settings to get the maximum performance out of your system.

10.3.2 Network Security

So far, you have learned about the runtime parameters of PostgreSQL. In this section you will learn to configure PostgreSQL for use in network environments.

The most important configuration file for running PostgreSQL in a network is called pg_hba.conf. This file is used to restrict or allow access to the database via a TCP/IP network. In addition to allowing and restricting, you can define which computers on the network have to use which kind of user authentication (if any).

The format of the pg_hba.conf file is easy take a look at the syntax overview:

 host  DBNAME  IP_ADDRESS  ADDRESS_MASK  AUTH_TYPE  [AUTH_ARGUMENT] hostssl  DBNAME  IP_ADDRESS  ADDRESS_MASK  AUTH_TYPE  [AUTH_ARGUMENT] local  DBNAME  AUTH_TYPE  [AUTH_ARGUMENT]

A pg_hba.conf file can contain three different kinds of records. host records are used to restrict or allow the access of remote machines connecting to PostgreSQL via TCP/IP. First, the name of the database you want to restrict or allow has to be defined. If you want to define a setting affecting all databases, you can use the keyword all. In the next field the IP address of a host or an IP range has to be defined. In the fourth column, a netmask has to be defined to tell PostgreSQL how big a network is. AUTH_TYPE tells PostgreSQL what kind of authentication has to be used. In the last column, arguments can be passed to the database.

Let's take a closer look at the kinds of user authentication that are supported by PostgreSQL:

trust No user authentication is required. Every valid user will be accepted.
password A user has to use a password to access the database. The password is sent to the server as plain text and won't be encrypted. A name of a file can be passed to PostgreSQL as AUTH_ARGUMENT. If a file is used, the users and passwords are taken from this file and all entries in pg_shadow will be overwritten.
md5 In contrast to password, encrypted passwords are sent over the network. md5 can use usernames stored in secondary password files but not secondary passwords.
crypt This option is equal to md5 but it is used in combination with clients prior to 7.2.
ident Ident servers send the name a user is logged in on a machine to PostgreSQL. This way it is not necessary to pass the user to the database because it is automatically determined by ident.
krb4 Kerberos 4 authentication.
krb5 Kerberos 5 authentication.
reject Rejects the establishment of a connection to the database.
pam Authentication is done by PAM (Pluggable Authentication Modules). If you want to use this feature, PostgreSQL must be compiled using --with-pam.

The same ways of authentication can be used in combination with hostssl instead of host.

local records are equal to host and hostssl records, but the IP address and the netmask do not have to be defined because PostgreSQL uses Unix sockets instead of TCP/IP.

Now that you have seen which methods and flags are provided by PostgreSQL, it is time to have a look at some examples:

 local      all                                          trust host       all         127.0.0.1     255.255.255.255    trust

In the default configuration the local machine can access all databases without user authentication. local means that all connections are established via Unix sockets and not via TCP/IP. In the second line you can see that local connections using the loopback device are allowed as well. If you want to allow remote users to connect to the database, some lines have to be added:

 local      all                                             trust host       all         127.0.0.1        255.255.255.255    trust host       phpbook     212.186.25.254   255.255.255.255    password host       phpbook     192.168.1.0      255.255.255.0      crypt

In this example two lines have been added to the configuration. The third line tells PostgreSQL to allow users at 212.186.25.254 to connect to the database called phpbook using a password. In addition, the entire network 192.168.1.x can access phpbook as well using passwords. In this case the passwords will be sent to the server in an encrypted way.

Before you try to connect to the database, you can define the password of the user called hs:

 phpbook=> ALTER USER hs UNENCRYPTED PASSWORD 'mypasswd'; ALTER USER

Let's try to connect to the database located on 212.186.25.254:

 [postgres@athlon pgdb]$ psql -h 212.186.25.254 -d phpbook psql: could not connect to server: Connection refused         Is the server running on host 212.186.25.254 and accepting         TCP/IP connections on port 5432?

Although you have changed pg_hba.conf, you cannot connect to the database. To solve the problem, you can restart the server. In the first step, the server is shut down:

 [postgres@athlon pgdb]$ pg_ctl -D /var/pgdb/ stop waiting for postmaster to shut down......done postmaster successfully shut down

If no errors occurred, PostgreSQL can be started:

 [postgres@athlon pgdb]$ pg_ctl -D /var/pgdb/ -o "-i" start postmaster successfully started DEBUG:  database system was shut down at 2001-12-09 01:12:14 CET DEBUG:  checkpoint record is at 0/FE5174 DEBUG:  redo record is at 0/FE5174; undo record is at 0/0; shutdown TRUE DEBUG:  next transaction id: 3300; next oid: 26077 DEBUG:  database system is ready

Don't forget the -i flag to make PostgreSQL listen to TCP/IP connections.

When starting the server, PostgreSQL reads the configuration file and uses the new settings. Therefore you can connect to the database now:

 [postgres@athlon pgdb]$ psql -h 212.186.25.254 -d phpbook -U hs Password: Welcome to psql, the PostgreSQL interactive terminal. Type:  \copyright for distribution terms        \h for help with SQL commands        \? for help on internal slash commands        \g or terminate with semicolon to execute query        \q to quit phpbook=>

A password must be passed to PostgreSQL, and if it is correct, you will be in the interactive shell.

When I am teaching, people are always asking me about the behavior of PostgreSQL when the pg_hba.conf file is not definite. Because this is an important point, I have decided to include a practical example here:

 local      all                                             trust host       all         127.0.0.1        255.255.255.255    trust host       phpbook     212.186.25.254   255.255.255.255    password host       phpbook     212.186.25.254   255.255.255.255    reject

In the third line, it says that phpbook on 212.186.25.254 can be accessed. In the fourth line, the exact opposite is defined. What is PostgreSQL going to do?

After restarting the server, you can easily find the answer:

 [postgres@athlon postgres]$ psql -h 212.186.25.254 -d phpbook -U hs Password: Welcome to psql, the PostgreSQL interactive terminal. Type:  \copyright for distribution terms        \h for help with SQL commands        \? for help on internal slash commands        \g or terminate with semicolon to execute query        \q to quit phpbook=>

You can connect to the database just as you have done before: PostgreSQL takes the settings in the third line because the first line is the first one to configure 212.186.25.254.

If the third and the fourth line are switched, the situation will be different:

 local      all                                             trust host       all         127.0.0.1        255.255.255.255    trust host       phpbook     212.186.25.254   255.255.255.255    reject host       phpbook     212.186.25.254   255.255.255.255    password

After restarting the server, PostgreSQL will tell you that you are not allowed to access the database because the first line defining 212.186.25.254 contains reject:

 [postgres@athlon postgres]$ psql -h 212.186.25.254 -d phpbook -U hs psql: FATAL 1:  No pg_hba.conf entry for host 212.186.25.254, user hs, database phpbook

It is necessary to know how PostgreSQL treats ambiguous configuration files because otherwise the configuration you have built might not be enough. If you are in doubt, we always recommend that you check your configuration using an IP spoofer.

10.3.3 Advanced Authentication and SSL

Configuring business-critical and huge systems demands additional features. Business-critical applications especially demand a high level of security, so it can be useful to work with things like Secure Socket Layer (SSL) to transmit data over a network in a secure way.

In addition to secure transmission, a more complex and more flexible authentication system can be used. This section will provide a brief overview of the most important ways of authenticating.

10.3.3.1 Using SSL

To achieve a higher level of security, PostgreSQL can be used in combination with OpenSSL, which is natively supported by PostgreSQL. For using SSL some basic steps have to be taken. The first thing to do is to add the appropriate entries to pg_hba.conf:

 local      all                                             trust host       all         127.0.0.1        255.255.255.255    trust hostssl    phpbook     212.186.25.254   255.255.255.255    password [postgres@athlon tmp]$ pg_ctl -D /var/pgdb/ -o "-i -l" start postmaster successfully started /usr/local/postgresql/bin/postmaster: failed to load server certificate (/var/pgdb//server.crt): No such file or directory

To use PostgreSQL with SSL keys, certificates have to be generated. The algorithm used by OpenSSL is based on a public and a private key. In this section you will not learn about how to use SSL in detail because this is beyond the scope of this book. The only thing you will see is how to generate a self-signed certificate fast. Let's see how it works:

 [postgres@athlon tmp]$ openssl req -new -text -out cert.req Using configuration from /usr/share/ssl/openssl.cnf Generating a 1024 bit RSA private key .++++++ ..++++++ writing new private key to 'privkey.pem' Enter PEM pass phrase: Verifying password - Enter PEM pass phrase: ----- You are about to be asked to enter information that will be incorporated into your certificate request. What you are about to enter is what is called a Distinguished Name or a DN. There are quite a few fields but you can leave some blank For some fields there will be a default value, If you enter '.', the field will be left blank. ----- Country Name (2 letter code) [AU]:AT State or Province Name (full name) [Some-State]:Vienna Locality Name (eg, city) []:Vienna Organization Name (eg, company) [Internet Widgits Pty Ltd]:www.cybertec.at Organizational Unit Name (eg, section) []:Marketing Common Name (eg, your name or your server's hostname) []:athlon Email Address []:hs@cybertec.at Please enter the following 'extra' attributes to be sent with your certificate request A challenge password []:iloveyou An optional company name []:

Just enter the command at the beginning of the listing and answer all questions you are asked. The key will be passphrase-protected. To remove the passphrase, which is necessary to start the postmaster automatically, run the following command:

 [postgres@athlon tmp]$ openssl rsa -in privkey.pem -out cert.*pem* read RSA key Enter PEM pass phrase: writing RSA key

In the next step, the certificate has to be turned into a self-signed certificate:

 [postgres@athlon pgdb]$ openssl req -x509 -in cert.req -text -key cert.pem -out cert.cert Using configuration from /usr/share/ssl/openssl.cnf

Finally, the only thing left to do is to copy the key to the directory where the postmaster will look for it. In this scenario the directory will be the root directory of the database cluster the postmaster has been started for:

 [postgres@athlon pgdb]$ cp cert.pem /var/pgdb/server.key [postgres@athlon pgdb]$ cp cert.cert /var/pgdb/server.crt

Now that the certificates are ready, you can start the postmaster:

 [postgres@athlon pgdb]$ pg_ctl -D /var/pgdb/ -o "-i -l" start postmaster successfully started DEBUG:  database system was shut down at 2001-12-09 17:20:47 CET DEBUG:  checkpoint record is at 0/16FC92C DEBUG:  redo record is at 0/16FC92C; undo record is at 0/0; shutdown TRUE DEBUG:  next transaction id: 45517; next oid: 36086 DEBUG:  database system is ready

No errors occurred, and you can start working with PostgreSQL safely via SSL:

 [postgres@athlon pgdb]$ psql -d phpbook -h 212.186.25.254 -U hs Password: Welcome to psql, the PostgreSQL interactive terminal. Type:  \copyright for distribution terms        \h for help with SQL commands        \? for help on internal slash commands        \g or terminate with semicolon to execute query        \q to quit SSL connection (cipher: DES-CBC3-SHA, bits: 168) phpbook=>

When starting the front end, PostgreSQL will mention that the connection has been established via SSL.

10.3.3.2 Kerberos Authentication

Kerberos is a standard way of performing secure authentication across a public network. Kerberos is a protocol designed for authentication. Nowadays it is used by many commercial products as well as open source products. In PostgreSQL, Kerberos 4 and Kerberos 5 are supported, but only one version can be enabled at a time.

Networks are insecure and it is not recommended to transmit information as plain text without using encryption. Some people try to secure their networks using firewalls, but in many cases the potential threat comes from both inside and from somewhere outside. Therefore those parts that are considered to be secure are sometimes more insecure than others. This problem is addressed by Kerberos whose protocol makes heavy use of cryptography.

Kerberos can be used as a security solution for entire companies and also works reliably in combination with PostgreSQL.