8.8 Basic Optimizations Using EXPLAIN

8.8 Basic Optimizations Using `EXPLAIN`

After you have written the queries needed by your application, you will soon find out that some sort of tuning might be necessary. The more complex your queries are, the more likely it is that a lot of performance can be gained by changing some simple parameters.

8.8.1 Using `EXPLAIN`

To find out what the database does internally, PostgreSQL provides a command called EXPLAIN:

 phpbook=# \h EXPLAIN Command:     EXPLAIN Description: show the execution plan of a statement Syntax: EXPLAIN [ ANALYZE ] [ VERBOSE ] query

As you can see, the syntax of the command is easy.

With the help of EXPLAIN it is possible to find out how PostgreSQL processes a query internally. EXPLAIN displays the so-called execution plan on the screen. In this execution plan you can see every step performed by PostgreSQL.

Take a look at a simple join operation:

 SELECT company.*, employee.name, employee.salary         FROM company, employee         WHERE company.name=employee.company_name;

The query returns exactly one record:

  id |                 name                 |  city  | name |  salary ----+--------------------------------------+--------+------+----------   1 | Cybertec Geschwinde and Schoenig OEG | Vienna | Epi  | 52000.00 (1 row)

In the next step EXPLAIN can be used to display the internal execution plan of the query:

 phpbook=# EXPLAIN SELECT company.*, employee.name, employee.salary FROM company, employee WHERE company.name=employee.company_name; NOTICE:  QUERY PLAN: Merge Join  (cost=69.83..139.33 rows=1000 width=147)   ->  Index Scan using company_name_key on company  (cost=0.00..52.00 rows=1000 width=68)   ->  Sort  (cost=69.83..69.83 rows=1000 width=79)         ->  Seq Scan on employee  (cost=0.00..20.00 rows=1000 width=79) EXPLAIN

You have to read the execution plan from right to left. In this example this means that PostgreSQL starts with scanning employee and continues with sorting and scanning the primary key of company. After that the results of the sort operation and the index scan are merged.

When using EXPLAIN, PostgreSQL does not execute the query only the execution plan is displayed. This is an important point because you don't have to take care of what the query you are analyzing is doing.

To get a little bit more information about the execution plan, you can also use EXPLAIN ANALYSE. This feature has been introduced with PostgreSQL 7.2 and is a significant improvement over previous versions of PostgreSQL:

 phpbook=# EXPLAIN ANALYZE SELECT company.*, employee.name, employee.salary FROM company, employee WHERE company.name=employee.company_name; NOTICE:  QUERY PLAN: Merge Join  (cost=69.83..139.33 rows=1000 width=147) (actual time=0.24..0.26 rows=1 loops=1)   ->  Index Scan using company_name_key on company  (cost=0.00..52.00 rows=1000 width=68) (actual time=0.05..0.06 rows=1 loops=1)   ->  Sort  (cost=69.83..69.83 rows=1000 width=79) (actual time=0.12..0.12 rows=1 loops=1)        ->  Seq Scan on employee  (cost=0.00..20.00 rows=1000 width=79) (actual time=0.02..0.02 rows=1 loops=1) Total runtime: 0.47 msec

As you can see, PostgreSQL also displays information about the time it might take to perform the query. This is extremely helpful when working with more complex queries because you get an idea of how long things will take to be executed.

An additional feature provided by EXPLAIN is the keyword VERBOSE. VERBOSE displays detailed information about what is going on inside the database when executing a query. The next listing shows the detailed execution plans of a simple query. (We have decided not to use a more complex query here because the result would be far too long.)

 phpbook=# EXPLAIN VERBOSE SELECT 1+1; NOTICE:  QUERY DUMP: { RESULT :startup_cost 0.00 :total_cost 0.01 :rows 1 :width 0 :qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 23 :restypmod -1 :resname ?column? :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { CONST :consttype 23 :constlen 4 :constbyval true :constisnull false : constvalue  4 [ 2 0 0 0 ] }}) :qpqual <> :lefttree <> :righttree <> :extprm () :locprm () :initplan <> :nprm 0  :resconstantqual <>} NOTICE:  QUERY PLAN: Result  (cost=0.00..0.01 rows=1 width=0) EXPLAIN

Even a simple operation such as 1+1 has a complex-looking execution plan. However, if you take a closer look at the listing, you will discover that most of the data returned by PostgreSQL consists of attributes (for example, ?column?, which defines the name of the column displayed in the result).

8.8.2 Optimizing a Query

After analyzing a query using EXPLAIN, you can try to optimize the execution time of a query by giving PostgreSQL's planner some hints what to do best. The planner is responsible for generating the execution plan of a query. This execution plan needs to be analyzed and bottlenecks have to be found in order to speed up your applications.

To see what can be done to gain speed, you can generate some records. We always recommend that you test your data structures with many records because this is the only way to detect bottlenecks. If you have only a few records, it is not possible to tune your database efficiently because it is difficult to measure the time gained by a certain change. Therefore it is useful to generate some dummy records and to test your data structures with these records. In many cases this will help you to test your system under real-world load. To generate some data you can build a small PHP script:

 <?php         $fp = fopen("/tmp/data.sql", "w+");         if      (!$fp)         {                 echo "cannot open file\n";                 exit;         }         fwrite($fp,"COPY data FROM stdin;\n");         for     ($i = 1; $i < 250000; $i++)         {                 $string = "$i   ".($i*2)."      ".cos($i)."\n";                 $status = fwrite($fp, $string);         }         fwrite($fp,"\.\n");         fclose($fp); ?>

First a file is opened for writing. If the file has been opened successfully, you can continue with a COPY command. In this case data will be inserted into a table called data this table can be created after generating the data.

After generating the header of the COPY command, many records are generated. Every record consists of three columns storing numbers and is written into the file using fwrite.

Finally, the file is closed.

To execute the file, you can use PHP from the command line as shown in the next listing:

 bash-2.04$ php makedata.php X-Powered-By: PHP/4.0.4pl1 Content-type: text/html

In most default installations, the maximum time the PHP script is allowed to run is set to 30 seconds if no data is displayed on the screen, this has to be taken into consideration.

After generating the data, you can create the table for storing the records:

 CREATE TABLE data (id int4, two int4, three numeric(10, 9));

As you can see, the table contains three records:

 phpbook=# \d data              Table "data"  Attribute |     Type      | Modifier -----------+---------------+----------  id        | integer       |  two       | integer       |  three     | numeric(10,9) |

Your PHP script has generated a COPY command and the data has been added to the table by the command. With the help of the Unix command head, you can list the first records in the file you have just created:

 bash-2.04$ head data.sql COPY data FROM stdin; 1       2       0.54030230586814 2       4       -0.41614683654714 3       6       -0.98999249660045 4       8       -0.65364362086361 5       10      0.28366218546323 6       12      0.96017028665037 7       14      0.7539022543433 8       16      -0.14550003380861 9       18      -0.91113026188468

The data has been generated successfully. To retrieve the last few lines from the file, you can use tail:

 bash-2.04$ tail data.sql -n3 249998  499996  -0.8685165807682 249999  499998  -0.88634511580316 \.

The last line of the COPY command has to be \. this way PostgreSQL will know that no more data is going to come.

To insert the data into the database, use the following command:

 psql phpbook < data.sql

Now that the data is in the table, you can try to send a simple query to the server. In this example the query is sent to the server using a shell command. This is necessary to measure the time the query takes:

 bash-2.04$ time psql -d phpbook -c "SELECT * FROM data WHERE id=200000"    id   |  two   |    three --------+--------+-------------  200000 | 400000 | 0.997444047 (1 row) real    0m1.601s user    0m0.020s sys     0m0.010s

The query takes more than one and a half seconds, which is far too long. If you use EXPLAIN to analyze the execution plan, you will soon find the reason for the disaster: PostgreSQL has to perform a sequential scan, which means that the entire table has to be read from the beginning to the end which takes a long time. To get around the problem, you can define an index:

 phpbook=# CREATE INDEX idx_data_id ON data (id); CREATE

Now the query should run significantly faster:

 bash-2.04$ time psql -d phpbook -c "SELECT * FROM data WHERE id=200000"    id   |  two   |    three --------+--------+-------------  200000 | 400000 | 0.997444047 (1 row) real    0m0.087s user    0m0.040s sys     0m0.000s

The performance gain is impressive because PostgreSQL can use an index now. Take a look at the execution plan of the query:

 phpbook=# EXPLAIN SELECT * FROM data; NOTICE:  QUERY PLAN: Index Scan using idx_data_id on data  (cost=0.00..4.68 rows=1 width=32) EXPLAIN phpbook=# EXPLAIN SELECT * FROM data where id = 200000; NOTICE:  QUERY PLAN: Index Scan using idx_data_id on data  (cost=0.00..4638.01 rows=2500 width=20) EXPLAIN

If the numbers in parentheses do not match the numbers you have, don't worry these values are taken from internal statistics and might vary.