Generally speaking, there is no surefire method for obtaining optimal performance from a database system. However, there are guidelines that can assist administrators in implementing successful performance-tuning strategies.

Hardware Considerations

If you notice that your database system consistently runs at a high CPU load or performs an excessive amount of hard-disk paging, it might be necessary to upgrade your hardware. These are the four biggest hardware issues related to database performance:
Tuning SQL Code

Although it is common to blame hardware for database sluggishness, more often it is the underlying SQL code that can be tuned to improve performance. Some general rules can be followed when tuning SQL database code:
Buffer Size and Other Considerations

PostgreSQL ships with default settings for buffer size, simultaneous connections, and sort memory. Usually these settings are fine for a standalone database; they are deliberately set low so that an idle server makes as little impact on the system as possible. For larger, dedicated servers holding hundreds or thousands of megabytes of data, these settings will need to be adjusted. It is often assumed that raising these values will automatically improve performance, but as a rule you should not exceed roughly 20% of your system limits with any of these settings. It is important to leave sufficient RAM for kernel needs; in particular, enough memory must remain available to handle network connections, manage virtual memory, and control scheduling and process management. Without such tolerances, the performance and responsiveness of the system will suffer.

There are three crucial run-time settings that affect database performance: shared buffers, sort memory, and simultaneous connections.

Shared Buffers

The shared buffer option ( -B ) determines how much RAM is made available to all of the server processes. Minimally, it should be set to at least twice the number of simultaneous connections allowed. Shared buffers can be set either in the postgresql.conf file or as a direct command-line option to the postmaster back end. By default, many PostgreSQL installations ship with a preset value of 64 for this setting. Each buffer consumes 8KB of system RAM, so the default dedicates 512KB of RAM to shared buffers. If you are setting up a dedicated database that is expected to handle very large datasets or numerous simultaneous connections, this might need to be set as high as 15% of system RAM. On a machine with 512MB of RAM, that works out to a shared buffer setting of roughly 9,000.
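As an illustration, the 512MB example above could be expressed in postgresql.conf (a sketch; the value is this chapter's example figure for a hypothetical dedicated server, not a universal recommendation):

```
# postgresql.conf -- hypothetical dedicated server with 512MB of RAM
# 9,000 buffers x 8KB = ~70MB, roughly 14% of system RAM
shared_buffers = 9000
```

The same setting could instead be passed on the command line when starting the server, as postmaster -B 9000.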
Ideally, buffer space should be large enough to hold the most commonly accessed table completely in memory, yet small enough to avoid swap (page-in) activity from the kernel.

Sort Memory

The postgres back end (which typically is invoked only by the postmaster process) has a setting ( -S ) that determines how much memory is made available to query sorts. This value determines how much physical RAM may be used before sorts or hash-related operations resort to disk space. The setting is declared in KB, with the standard default being 512. For complex queries, many sorts and hashes might run in parallel, and each one is allowed to use this much memory before swapping to hard disk begins. This is an important point to stress: if you blindly set this value to 4,096, every sort in every complex query would be allowed to take as much as 4MB of RAM. Depending on your machine's available resources, this might cause the kernel's virtual memory subsystem to swap that memory out, which is usually far slower than simply allowing PostgreSQL to create temporary files in the first place.

Simultaneous Connections

There is a postmaster option ( -N ) that sets the number of concurrent connections PostgreSQL will accept. By default, this is 32, but it can be set as high as 1,024 connections. (Remember that shared buffers need to be set to at least twice this number.) Also remember that PostgreSQL is not multithreaded (yet); every connection therefore spawns a new process. On UNIX systems this poses no significant problem, but on NT it can often become an issue.

Optimizing Queries with EXPLAIN

The EXPLAIN command describes the query plan chosen for the supplied query. It returns the following information:

- The estimated start-up cost
- The estimated total cost
- The estimated number of rows returned
- The estimated average width, in bytes, of the returned rows
The cost units mentioned previously are not measures of actual time; they indicate how many disk page fetches would be needed to complete the request. For example:

EXPLAIN SELECT * FROM authors;
NOTICE: QUERY PLAN

Seq Scan on authors  (cost=0.00..92.10 rows=5510 width=20)

The preceding EXPLAIN statement lists the following estimations:

- Estimated start-up cost (0.00)
- Estimated total cost (92.10)
- Estimated number of rows returned (5,510)
- Estimated average row width, in bytes (20)
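As a rough sanity check on that total cost, the planner of this era priced a sequential scan at one unit per disk page read plus a small CPU cost per row (0.01 by default). The sketch below assumes that model; the page count it derives is an inference from the numbers shown, not something stated in the text:

```python
# Back out the implied page count from the EXPLAIN example above,
# assuming: total_cost = disk_pages + cpu_tuple_cost * rows
# (cpu_tuple_cost = 0.01 is an assumed planner default of the era).
rows = 5510
total_cost = 92.10             # total cost reported by EXPLAIN
cpu_tuple_cost = 0.01
pages = round(total_cost - rows * cpu_tuple_cost)
print(pages)  # → 37 (implied disk pages in the authors table)
```

If the model holds, the authors table occupies about 37 disk pages, and most of the estimated cost comes from per-row CPU work rather than I/O.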
Modifying the query produces different results:

EXPLAIN SELECT * FROM authors WHERE age<10000;
NOTICE: QUERY PLAN

Seq Scan on authors  (cost=0.00..102.50 rows=5510 width=20)

In this example, the total cost increases slightly. It is interesting to note that although there is an index on age on this table, the query planner still uses a sequential scan. This is because the search criterion is so broad that an index scan would be of no benefit. (Obviously, all values in the age column are less than 10,000.) If you constrain the search criterion slightly more, you can see some changes:

EXPLAIN SELECT * FROM authors WHERE age<75;
NOTICE: QUERY PLAN

Seq Scan on authors  (cost=0.00..102.50 rows=5332 width=20)

This is still a sequential scan, although the estimated number of rows returned is now lower. Further constraints produce more dramatic results:

EXPLAIN SELECT * FROM authors WHERE age<30;
NOTICE: QUERY PLAN

Index Scan using age_idx on authors  (cost=0.00..32.20 rows=991 width=20)

A number of things are interesting about this result. First, the criterion is finally selective enough to make the query planner use the age_idx index. Second, both the total cost and the estimated number of rows are dramatically reduced. Finally, try the following:

EXPLAIN SELECT * FROM authors WHERE age=27;
NOTICE: QUERY PLAN

Index Scan using age_idx on authors  (cost=0.00..3.80 rows=71 width=20)

You can see the tremendous reduction in estimated cost achieved by such a limited criterion. Using EXPLAIN on more complex queries can sometimes illuminate potential problems with the underlying database structure.
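When you suspect the planner is wrongly preferring a sequential scan, you can temporarily disable sequential scans and compare the two plans. This is a sketch against the chapter's example table; enable_seqscan is a planner toggle intended for testing, not something to leave off in production:

```sql
-- Compare the planner's choice against a forced index plan (testing only).
SET enable_seqscan TO off;
EXPLAIN SELECT * FROM authors WHERE age < 10000;
-- If the forced index plan shows a higher estimated cost, the planner's
-- original choice of a sequential scan was the right one.
SET enable_seqscan TO on;
```

This technique confirms whether the planner skipped an index because it genuinely costs more, or because the statistics are stale (in which case running VACUUM ANALYZE is the usual remedy).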
EXPLAIN SELECT * FROM authors, payroll WHERE authors.name=payroll.name;
NOTICE: QUERY PLAN

Merge Join  (cost=69.83..425.08 rows=85134 width=36)
  ->  Index Scan using name_idx on authors  (cost=0.00..273.80 rows=5510 width=20)
  ->  Sort  (cost=69.83..69.83 rows=1000 width=16)
        ->  Seq Scan on payroll  (cost=0.00..20.00 rows=1000 width=16)

This output reveals some interesting facts about the underlying database structure. The authors table obviously has an index on name, but the payroll table is resorting to a sequential scan and a sort to match fields. After investigating, you find that the payroll table does not, in fact, have an appropriate index for this join. After an index is created, you get the following results:

CREATE INDEX pr_name_idx ON payroll(name);
EXPLAIN SELECT * FROM authors, payroll WHERE authors.name=payroll.name;
NOTICE: QUERY PLAN

Merge Join  (cost=0.00..350.08 rows=44134 width=36)
  ->  Index Scan using name_idx on authors  (cost=0.00..273.86 rows=5510 width=20)
  ->  Index Scan using pr_name_idx on payroll  (cost=0.00..29.50 rows=500 width=16)

By adding an index to the payroll table, you have reduced the estimated query cost by roughly 18% (from 425.08 to 350.08). Running EXPLAIN on your queries is a good way to uncover hidden bottlenecks that impact system performance. In fact, for non-hardware-related problems, the EXPLAIN command is probably the single best tool a DBA can use to solve performance problems. EXPLAIN provides the information necessary to intelligently allocate system resources, such as shared buffers, and to optimize queries and indexes for greater performance. One of the best ways to use the EXPLAIN command is as a benchmark generation tool: when changes are made to table schema, indexes, hardware, or the operating system, a valid comparison can be made to determine how much those changes affected system performance.
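The improvement can be quantified directly from the two total-cost figures shown in the EXPLAIN output (a quick calculation using only the numbers above):

```python
# Percentage reduction in estimated cost after adding pr_name_idx,
# using the total cost figures from the two EXPLAIN outputs above.
cost_before = 425.08   # merge join cost without the payroll index
cost_after = 350.08    # merge join cost with pr_name_idx in place
reduction = (cost_before - cost_after) / cost_before * 100
print(f"{reduction:.1f}%")  # → 17.6%
```

Recording such before-and-after figures is exactly the benchmarking use of EXPLAIN that this section recommends.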