System Performance | PostgreSQL Developer's Handbook2001

Tuning a system is one of the most important tasks a system administrator has to face. In this section, you will learn how to speed up your system and how to improve performance.

The Impact of I/O on System Performance and CPUs

On the hardware side, I/O and CPU power are the most important factors. Of course, hardware is not the only thing affecting the performance of your system, but it is a major component. Let's take a look at what can be done to tweak your hardware.

`CLUSTER`

Without a doubt, the performance of the I/O subsystem of a server is one of the most important things when working with databases. Tuning the I/O system plays a major part if you want to speed up your application.

Tuning can be done in many ways. On the one hand, the database provides some very useful tools to speed up I/O. On the other hand, performance gains can be achieved by tuning the file system.

Let's start with PostgreSQL's onboard tools first. The most important command when tuning I/O is CLUSTER . PostgreSQL stores indexes in separate files. When many INSERT , UPDATE , or DELETE statements are performed, the values of an index are not stored sequentially in the file. This does not mean that the index is somehow broken or that the result of queries will be wrong. It simply means that the overhead of reading the file increases because the database has to jump around inside the file to read the right piece of data. When the CLUSTER command is called, the database creates a temporary file where the values are stored in the same order as in the index. The temporary file will be used as the new index file as soon as the operation is ready. This will lead to a significant gain of performance, especially when many duplicate values are stored in the index.

Look at the syntax of the CLUSTER command:

 performance=#  \h CLUSTER  Command:     CLUSTER Description: Gives storage clustering advice to the server Syntax: CLUSTER indexname ON tablename

As you can see, the syntax is really easy. The only two things you have to define are the name of the index and the table for which it is defined.

We recommend that you run CLUSTER when the load on your system is low.

Tuning the File System

The file system is a very crucial point of the operation system. Today, a file system is more than just an indexed container for the files on your operating system.

Advanced file systems provide quite a lot of tuning parameters that may affect speed in many ways. Tuning a file system is somewhat trial and error. Simply try some settings and find out which one is the fastest for your system. But speed is not the only thing; tuning has always something to do with security. It is of no use to gain speed when the security of the data on your system decreases significantly. Security is very often much more important than speed, especially on high-availability systems. CPU clockspeeds are constantly increasing, and so do new challenges concerning security.

We will try to give you an overview of how tuning can be done. We will use the ext2 (second extended file system) for this purpose.

The ext2 file system is one of the most reliable file systems available. It is the standard file system on Linux platforms. ext2 has no journaling features, but developers ( open -source people and companies such as SGI) around the globe are working hard to provide journaling for Linux too.

The most important commands when tuning ext2 file systems are mke2fs and tune2fs , which are used to adjust tunable file system parameters such as maximum-mount-count and so on.

mke2fs is used to create an ext2 file system and supports a huge number of parameters that can be defined. Most of these parameters won't affect the speed of your system, but let's get to the parameters that do.

One of these parameters is the block size of the file system. A file system usually reads an entire block of a predefined size and not single bytes, because this is much faster and works much more efficiently . The size of the blocks affects the speed of your system. On a system with lots of small files, a smaller blocksize may be more useful because the overhead for small files can be reduced. When usually working with large files, the situation may be different. Depending on the application you want to build, the right block size may vary. Currently, ext2 supports three different possible block sizes ”1024, 2048, and 4096 bytes.

Another important flag is the -R flag that is used to pass options for RAID systems to the file system. To find out which options are currently supported by your version of the file system, check the documentation.

Another important parameter when tuning file systems is the size of a file system. The smaller a file system, the faster data can be accessed by the system. When building databases, you should think a moment about the amount of data that has to be stored. If only a few records have to be stored, it is useful to use a smaller partition for storing the data. This can lead to performance gains, especially for I/O-dependent queries.

The impact of tuning the file system on the performance of the system may be rather small, but after you have found the right setting for your system, it can be a simple way to get a little more power of your system.

`fsync`

The fsync flag has tremendous impact on the overall performance of your system. It can be useful to set the fsync flag to off temporarily, especially when building up huge databases. We don't recommend that you turn fsync off completely because the security of your data may suffer.

We would like to explain in a few words what fsync is and what it is good for. On Unix systems, data is not necessarily written to disk when an application writes data into the file. The data is cached by the operating system to achieve higher performance by reducing the number of disk accesses .

fsync is a system call that flushes the data to disk. This costs performance but provides higher security in case of a failure because the data in the file is always recent.

On PostgreSQL systems version 7.1, fsync can be turned on and off in pg_options . Since 7.1, most configuration parameters can be found in postgresql.conf . Another way to turn fsync off is to use the -F flag when starting the postmaster .

PostgreSQL on SMP Machines

It sometimes happens that one CPU is simply not enough for your PostgreSQL server. In this case, you can use machines with multiple CPUs. When running PostgreSQL on an SMP machines, the performance of the system will raise significantly when you are running many queries at once ”you won't gain speed (at least no significant gain) on an SMP machines when running only one query at a time.

If your database is very huge (5GB perhaps), we recommend that you use 64-bit CPUs for your PostgreSQL box, such as Digital Alpha, Sun Ultra Sparc 64, IBM 64bit CPUs, SGI 64 bit CPUs, HPUX machines, or 64-bit machines by AMD or Intel. Simply recompile your PostgreSQL server and you will see that the performance for huge databases increase.

The Impact of Memory on System Performance and Restricted Access to System Resources

In general, databases are very memory consuming applications. Depending on the type of application and the type of queries you want to run on your PostgreSQL server, the amount of needed memory may vary. As a rule of thumb, it can be said that the more memory a system has, the better for the performance of your system.

PostgreSQL's memory management can be configured by the user . This is very useful because you can try set the parameters of the system to your needs; this can lead to a significant increase in performance.

One of the most memory consuming tasks is sorting. Some components of SQL, such as ORDER BY , require internal sorting. Consequently, PostgreSQL uses a so-called sort buffer. The size of the sort buffer can be configured in postgresql.conf . If the amount of memory used for sorting is higher than the size of the sort buffer, PostgreSQL will use temporary files to perform the sort operation. Check the following example to see how temporary files are used by the database. Therefore, you can enter the directory where the database cluster your postmaster is working on is located:

 bash-2.04$  ls -l *sort*  -rw-------    1 postgres postgres 131211264 Apr 25 19:27 pg_sorttemp12516.0

To change the size of the default sort buffer, set sort_mem in postgresql.conf to the desired value that is measured in kilobytes.

We recommend that you use sort buffers that are not too small because temporary files are much slower than internal sorting.

In general, it can be said that it is more efficient to let the database instead of the operating system manage the system resources.

But how can you find out how much memory is used by a query? Simply use top , as , and sort the list by memory use (Shift+M). If you want to find the memory consumption of the entire system, you can use free .

If you have many applications running with different user IDs on your system, it might be useful to restrict resources for certain purposes. Imagine a situation where PostgreSQL is one of two database servers running on the system. In this case, you can define the maximum amount of memory or CPU time that can be used by the user (for example, user postgres ). Things like these can be defined on most Unix systems with the help of the ulimit command. The syntax of the command is very easy:

 ulimit: usage: ulimit [-SHacdflmnpstuv] [limit]

Unfortunately, man pages are only available for the C-function ( long ulimit(int cmd, long newlimit) ) and not for the shell command. For that reason, we have included a full overview of ulimit on Linux systems:

 -S Change and report the soft limit associated with a resource. -H Change and report the hard limit associated with a resource. -a All current limits are reported. -c The maximum size of core files created. -d The maximum size of a process's data segment. -f The maximum size of files created by the shell. -l The maximum size that may be locked into memory. -m The maximum resident set size. -n The maximum number of open file descriptors. -p The pipe buffer size. -s The maximum stack size. -t The maximum amount of cpu time in seconds. -u The maximum number of processes available to a single user. -v The maximum amount of virtual memory available to the process.

Restricting access to resources by using ulimit is also a very comfortable way to raise the level of security on your system. "Evil" people can easily make DOS (Denial of Service) attacks by allocating all the memory available on the server if you don't set ulimits to restrict the memory a user is allowed to use.

If you want to learn more about the current setting of your system, simply use ulimit in your Bash Shell.