The PostgreSQL File Structure | The Database Hackers Handbook: Defending Database Servers

PostgreSQL is typically installed to /usr/local/pgsql or /var/lib/pgsql on Unix systems and C:\Program Files\PostgreSQL\<version number> under Windows. The file structure is as follows :

 /var/lib/pgsql/data    pg_hba.conf    pg_ident.conf    PG_VERSION    postgresql.conf    postmaster.opts    postmaster.pid     +---base    +---1           1247           ...           17187           pg_internal.init           PG_VERSION               \---17229            1247            ...            17187            pg_internal.init            PG_VERSION                        +---global        1213        ...        16758        config_exec_params        pgstat.stat        pg_control        pg_pwd         +---pg_clog        0000         +---pg_log        postgresql-2004-12-05_000000.log         +---pg_subtrans        0000         +---pg_tblspc \---pg_xlog        000000010000000000000000             \---archive_status

Configuration files and the databases themselves are stored in the data directory, which is assigned to the environment variable $PGDATA. The $PGDATA directory contains three subdirectories. The base subdirectory contains the databases, represented as directories named after their object identifier (OID). The pg_database table holds the OID to database name mapping.

In addition to the databases created by the user , there are also a number of template databases. The CREATE DATABASE command takes an optional TEMPLATE parameter that specifies the template on which to base the new database. This enables "ready made" databases to be created that contain specific tables, views, data types, and functions defined in the template. On a default configuration, there are two templates: template0 and template1. Modifying template0 is not recommended because this is the template on which a new database is created if the TEMPLATE parameter is not supplied to CREATE DATABASE.

The global subdirectory contains OIDs that correspond to tables containing data that is not unique to a particular database. This includes the pg_group, pg_pwd, and pg_shadow tables, which hold username, group , and password information. The final subdirectory, pg_xlog, contains the transaction log.

The $PGDATA/postgreSQL.conf file contains the runtime configurations for the database. This includes basic connectivity options that specify whether the postmaster should use Unix sockets and/or TCP/IP sockets. It also includes various operational and optimizing parameters such as table scanning methods used in evaluating execution plans.

The $PGDATA/pg_hba.conf file consists of a set of records that permit and deny access to the database based on fields that describe the connection-type, the network properties, the database name, the username, and the authentication-type.

The $PGDATA/pg_ident.conf file maps operating system usernames to PostgreSQL usernames. This file is consulted when there is an access record that specifies the ident authentication type.

The $PGDATA/postmaster.opts file contains the default parameters used by the postmaster on startup. These parameters can be overridden by environment variables and flags passed to the postmaster command if it is launched directly (as opposed to launching it via a helper utility such as pg_ctl).

The $PGDATA/postmaster.pid exists while the postmaster is running and contains the process ID of the postmaster. This file was introduced in version 7.0.

Finally, the $PGDATA/PG_VERSION file contains the PostgreSQL version number.

File permissions in the $PGDATA directory should be such that only the operating system database user can read or write the configuration files or any of the database tables.

Protocols

PostgreSQL uses TCP and/or Unix domain sockets as its underlying transport, depending on configuration specified in postgresql.conf (located in $PGDATA). The Unix domain socket name is typically /tmp/.s.PGSQL.5432 and is created with default permissions allowing read/write for all users. This means that any local user can connect to the socket. Postgresql.conf contains the following options to create the socket in a more restrictive fashion:

unix_socket_directory: This is used to restrict where the socket is created. If this option is not specified, the default is /tmp.
unix_socket_group: This sets the group owner of the Unix domain socket. The owning user of the socket is always the user that starts the server. In combination with the unix_socket_permissions option this can be used as an additional access control mechanism for this socket type. By default this is the empty string, which uses the default group for the current user. This option can be set only at server start.
unix_socket_permissions: The default permissions are 777 (read/write/execute for all users). Reasonable alternatives are 0770 (only user and group) and 700 (only user). There is little point in revoking read or execute permissions for a Unix domain socket because write permission is the key attribute as far as security is concerned .

PostgreSQL is not network-enabled by default. To enable it on PostgreSQL prior to version 8.0 tcpip_socket must be set to true. On version 8.0, the listen_addresses parameter must contain the IP interfaces to listen on. When configured for network access the postmaster typically listens on port 5432 TCP. TCP communications may be encrypted using secure sockets. SSL can be enabled by setting SSL to true.

Authentication

PostgreSQL supports a number of different authentication models. The pg_hba.conf file is of key importance. It consists of a set of records that are matched against in the order that they appear in the file. Once a record is matched, parsing of the pg_hba.conf file ceases. Records take one of seven possible forms:

 # local      DATABASE  USER  METHOD  [OPTION] # host       DATABASE  USER  IP-ADDRESS  IP-MASK   METHOD  [OPTION] # hostssl    DATABASE  USER  IP-ADDRESS  IP-MASK   METHOD  [OPTION] # hostnossl  DATABASE  USER  IP-ADDRESS  IP-MASK   METHOD  [OPTION] # host       DATABASE  USER  IP-ADDRESS/CIDR-MASK  METHOD  [OPTION] # hostssl    DATABASE  USER  IP-ADDRESS/CIDR-MASK  METHOD  [OPTION] # hostnossl  DATABASE  USER  IP-ADDRESS/CIDR-MASK  METHOD  [OPTION]

Each record begins with a token specifying connection type. This can be one of the following:

local: This connection type matches Unix-domain sockets. It does not match TCP connections from localhost.
host: This connection type matches the client IP address of a TCP connection.
hostssl: This connection type matches a client IP address of a TCP connection that is over SSL.
hostnossl: This connection type matches a client IP address of a TCP connection that is not over SSL.

The next token in the record specifies the database that this connection wishes to access. Should both the connection and authentication succeed, the client will be able to access only the database that was specified during the connection startup. This means that should access to another database be required the client will have to disconnect and reconnect . There are a number of key words for this token:

All: Matches all databases.
Sameuser: Matches if the requested database is of the same name as the presented username.
Samegroup: Matches if the requested database is of the same name as the group that the presented username is in.

The next token specifies the username. Again, the keyword "all" matches all usernames. The next token depends on whether the connection type is local or host. If it is set to local, the authentication method follows. If it is set to host, the address range for the current record is specified. This takes the form of an IP address followed by the network mask (for example, 10.0.0.1 255.255.255.0) or the IP address/mask in CIDR notation (for example, 10.0.0.1/24).

The final required token specifies the authentication method. There are currently nine possibilities:

trust: This signifies that the connection should be allowed unconditionally, without requiring a password. This option essentially specifies no security.
Reject: The connection is unconditionally rejected.
Ident: This method relies on the client to authenticate the user. The postmaster connects to the identd daemon running on the client system in order to determine the client username. It then consults the ident map (specified as an option after the ident authentication type) or takes the client username as the PostgreSQL username if the sameuser keyword is specified.
password: This method states that the user must supply a password, which is sent in plaintext over the wire unless the connection type has been set to SSL. The username and password are matched against those in the pg_shadow table unless a file has been specified as an option after the password authentication type. If so, the file specified is assumed to contain flat text containing usernames and passwords.
crypt: This method also states that the user must supply a password, however; the postmaster sends the client a 2-byte random salt with which to encrypt the password (via the standard Unix crypt() function).
md5: This method was introduced as of version 7.2. The postmaster sends the client a random byte salt to the client. The client computes an md5 digest over the string formed by concatenating the username and password. The salt is appended to this digest, and the resulting string is hashed again. The second digest is sent to the server.
krb4 and krb5: These methods make use of Kerberos, a secure authentication service that uses symmetric cryptography. Architecturally, a Kerberos server (the Key Distribution Center) performs centralized key management and administrative functions such as granting tickets to users. Tickets permit access to Kerberos-enabled services. When compiled with Kerberos support, PostgreSQL acts like a standard Kerberos service.
pam: This method invokes the Pluggable Authentication Method (PAM). PAM is a system of libraries that handle the authentication tasks of applications on the system. The library provides an interface that applications defer to in order to perform standard authentication tasks . The principal feature of PAM is that the nature of the authentication is dynamically configurable. This allows the system administrator to choose how PostgreSQL will authenticate the user.

It is worth considering the preceding authentication types and the environments in which they might be found.

The trust method is fundamentally insecure because no verification of the user's identity takes place. It therefore provides no accountability. Its insecurity can be mitigated somewhat by setting the connection type to local, or to hostssl, enabling client certificates and restricting connections to certain IP addresses. Nevertheless, it is not recommended to use this authentication type. If the connection type is local, local users' identities are not verified . This is clearly insufficient for a multi-user system although might conceivably be found on a single-user system or where the permissions on the local socket are restrictive enough to allow access to a particular trusted group or user. Thus a record such as "local all all trust" might be found on single-user systems, databases in development environments, or on systems that are considered to be on secure networks.

Security Considerations

The information returned by this protocol is at most as trustworthy as the host providing it OR the organization operating the host. For example, a PC in an open lab has few if any controls on it to prevent a user from having this protocol return any identifier the user wants. Likewise, if the host has been compromised the information returned may be completely erroneous and misleading.

The Identification Protocol is not intended as an authorization or access control protocol. At best, it provides some additional auditing information with respect to TCP connections. At worst, it can provide misleading, incorrect, or maliciously incorrect information.

The use of the information returned by this protocol for other than auditing is strongly discouraged. Specifically, using Identification Protocol information to make access control decisions ” either as the primary method (i.e., no other checks) or as an adjunct to other methods may result in a weakening of normal host security.

The reject authentication method is used to explicitly deny certain kinds of connection types, usernames, or database names. The function that parses pg_hba.conf will reject the client connection if it does not find a matching record or it encounters a matching reject record. The reject record is useful for administrators that want to mark certain hosts , usernames, or database names as exceptions to other records. Thus a record such as "host all guest 10.0.0.1 255.255.255.0 reject" might be followed by "host all sameuser 10.0.0.1 255.255.255.0 password." This would deny access from the 10.0.0.1/24 subnet if the client supplies a username of "guest," but would permit all other usernames to attempt to authenticate. This is a contrived example; a valid use of the reject authentication type within an organization may be to explicitly prevent access to certain departmental databases based on IP range while allowing access to all other databases.

The ident authentication method uses the Identification Protocol described in RFC 1413. Its purpose is to map source-destination port pairs to a particular username. The ident authentication type provides little improvement on the trust method, since ultimately the postmaster relies on the client to provide correct information. The ident RFC contains a security caveat (see the "Security Considerations" sidebar).

"The Identification Protocol is not intended as an authorization or access control protocol," yet, somewhat bizarrely, ident is considered an authentication type in PostgreSQL. If an attacker is able to control the system that the postmaster queries, the attacker can return a username of his choosing. Therefore, a record of the type "host sameuser 10.0.0.1 0.0.0.0 ident sameuser" might be encountered in a closed environment where the integrity of the system with IP address 10.0.0.1 is trusted implicitly. This record would grant a user access to the database with a corresponding username. If a filename was specified instead of the sameuser directive, the postmaster would use this file to map the operating system username to a PostgreSQL username. Users would be granted access to the database with the same name as their PostgreSQL username.

The password authentication method is insufficient unless the connection-type is local or hostssl; otherwise the user's password will appear in clear text on the wire. If the connection type is set to host or hostnossl, the md5 and crypt authentications methods are recommended to mitigate against sniffing attacks. Thus for a system on which SSL is not enforced, a record of "host sameuser all 10.0.0.1 255.255.255.0 md5" might be encountered.

The various connection types and authentication methods are revisited in the following chapter when considering network-based attacks.

The System Catalogs

PostgreSQL stores metadata, such as information about tables and columns in system catalogs. PostgreSQL represents the system catalogs as ordinary tables that can be selected from, updated, and dropped like any other table (depending on privilege). Most system catalogs are copied from the template database during database creation and are thereafter database-specific ; however, a few catalogs are physically shared across all databases in a cluster. Table 24-1 lists the system catalogs.

Table 24-1: Complete List of System Catalogs
Catalog Name	Purpose
pg_aggregate	Aggregate functions
pg_am	Index access methods
pg_amop	Access method operators
pg_amproc	Access method support procedures
pg_attrdef	Column default values
pg_attribute	Table columns (attributes)
pg_cast	Casts (data type conversions)
pg_class	Tables, indexes, sequences (relations)
pg_constraint	Check constraints, unique constraints, primary key constraints, foreign key constraints
pg_conversion	Encoding conversion information
pg_database	Databases within this database cluster
pg_depend	Dependencies between database objects
pg_description	Descriptions or comments on database objects
pg_group	Groups of database users
pg_index	Additional index information
pg_inherits	Table inheritance hierarchy
pg_language	Languages for writing functions
pg_largeobject	Large objects
pg_listener	Asynchronous notification support
pg_namespace	Schemas
pg_opclass	Index access method operator classes
pg_operator	Operators
pg_proc	Functions and procedures
pg_rewrite	Query rewrite rules
pg_shadow	Database users
pg_statistic	Planner statistics
pg_trigger	Triggers
pg_type	Data types

The following system catalogs are likely to be of interest to an attacker:

pg_database: This catalog stores information about the available databases. There is one copy of the pg_database catalog per cluster. To list database names and their corresponding OIDs, execute
```
 SELECT datname, oid FROM pg_database     datname      oid -------      --- test        17257 template1       1 template0   17229 
```

pg_class: This catalog holds table, column, index, and view information together with their relevant access privileges. Each database contains its own pg_class catalog. To list user-defined tables, execute

 SELECT n.nspname, c.relname FROM pg_class c, pg_namespace n WHERE c.relnamespace=n.oid AND c.relkind = 'r'                   -- not indices, views, etc AND n.nspname not like 'pg\%'       -- not catalogs AND n.nspname != 'information_schema' -- not information_schema ORDER BY nspname, relname;     nspname   relname -------   ------- public    testtable1 public    testtable2

pg_group: This catalog defines groups and group membership. There is one copy of the pg_group catalog per cluster.
pg_language: This catalog contains details of the languages that have been registered, allowing creation and execution of stored procedures. Each database contains it own pg_language table.
pg_proc: This catalog stores information and access privileges for compiled functions and procedures. In the case of compiled functions, the prosrc and probin columns store the link symbol (essentially the function name) and the name of the shared object that contains the function. For procedures the prosrc column stores the procedure's source code.
pg_largeobject: This catalog is responsible for holding the data making up "large objects." A large object is identified by an OID assigned when it is created. Large objects are broken into segments small enough to be stored as rows in pg_largeobject. Each database contains its own pg_largeobject catalog. Large objects are manipulated via lo_creat, lo_unlink, lo_import, and lo_export. Data held in large objects can be scanned for substrings using the position() function:
```
 -- Locate sequence of bytes 'Test' -- Nonzero result indicates position in BYTEA array SELECT position('Test' IN pg_largeobject.data);     Position -------- 0 0 0 0 59 
```
pg_shadow: This catalog contains information about database users, including password. There is one copy of this catalog per cluster.
pg_trigger: This catalog stores triggers on tables. Each database contains its own copy of this catalog.

Examining Users and Groups

PostgreSQL does not use operating system credentials to authenticate users. By default, PostgreSQL usernames and passwords are stored in the pg_shadow system catalog table, which only the superuser can access. The following query can be used to display usernames and passwords:

 SELECT usename, usesuper, passwd FROM pg_shadow     usename usesuper passwd ------- -------- ------ test       f     md51fb0e331c05a52d5eb847d6fc018320d postgres   t     md5835dff2469b4e8c396b3c4cabde06282

Ordinary users have access to the pg_user view, which replaces the password column with ********. All users can therefore obtain usernames, account expiry information, and privilege levels, that is, whether a particular user is a superuser, whether they can create databases, and whether they can update system catalogs.

 SELECT usename, usecreatedb, usesuper, usecatupd, valuntil FROM pg_user     usename usecreatedb usesuper usecatupd valuntil ------- ----------- -------- --------- -------- test        f          f         f     infinity postgres    t          t         t       ""

Group information can be obtained through querying pg_group. The grolist column returns the list of usernames belonging to the particular group.

 SELECT groname, grolist FROM pg_group;     groname     grolist -------     ------- testgroup    {100}

Passwords are stored as md5 hashes by default, preceded by the characters md5. When a user is created, the superuser can explicitly override this to store the user's password in plaintext:

 CREATE USER test2 WITH UNENCRYPTED PASSWORD 'letmein'      SELECT usename, passwd FROM pg_shadow where substring(passwd from 1 for 3) <> 'md5'     usename passwd ------- ------ test2   letmein

Prior to PostgreSQL 7.2, when md5 support was added, all passwords were stored in plaintext. Although storing passwords this way deviates from best practice, it is not necessarily a security risk with correct permissions applied to the PostgreSQL directories and files. Only the operating system database user should have read/write access to PostgreSQL files; attacks that allow low-privileged users to interact with the filesystem in order to gain access to plaintext passwords (or even hashes) cannot be executed on PostgreSQL because these potentially dangerous functions are accessible to database superusers only.

Stored Procedures

PostgreSQL has extensible support for procedural languages. Before a function can be defined, the target language must be "installed" into each database where it is to be used. Languages installed in the database templates are automatically available in all subsequently created databases. The PostgreSQL manual documents PL/pgSQL; other languages implementations include PL/pgPerl, PL/pgPython, PL/pgTcl, PL/pgPHP, and PLJava.

The languages that a database supports can be determined by querying the pg_language table. All users have access to this table. The lanpltrusted column stores whether the language is trusted. Trusted languages can be used by any database user whereas untrusted languages can only be used by superusers. Languages are marked trusted if they provide a restrictive set of functions such that the caller cannot manipulate the filesystem, the network, or any other resource that may impact security. pgSQL, PL/Tcl, PL/Perl, and PL/Python are known to be trusted; the languages PL/TclU and PL/PerlU are designed to provide unlimited functionality and are therefore not marked as trusted (the "U" denotes "untrusted").

 SELECT lanname, lanpltrusted FROM pg_language;     lanname  lanpltrusted -------  ------------ internal    f c           f plpgsql     t sql         t plperl      t pltcl       t

The functions implemented in a particular language can be determined by querying the pg_proc table:

 SELECT proname from pg_proc p1, pg_language p2 WHERE p1.prolang = p2.oid AND p2.lanname = 'plpgsql';     proname ------- test_stored_proc1 test_stored_proc2

Furthermore, source code for each procedural function can be retrieved as follows:

 SELECT prosrc FROM pg_proc WHERE proname = 'test_stored_proc1';     prosrc ------ BEGIN RAISE NOTICE 'This is a test'; RETURN 1; END;

PostgreSQL functions execute under the caller's user privilege, unless the function was created with the SECURITY DEFINER attribute, in which case they will run under the context of the function creator. Such setuid functions can be determined by querying the prosecdef column of the pg_language table. PostgreSQL does not install any stored procedures by default, unlike other DBMSes such as Oracle.