Using PostgreSQL and gnuplot | PostgreSQL Developer's Handbook2001

In this section, you learn the fundamentals of gnuplot and, after some simple examples, move to PostgreSQL and see how it can be combined with gnuplot to generate database-driven graphics. To start gnuplot, use the following:

[hs@ duron hs]$ gnuplot -background white -background white makes sure that the background of the graphs we want to generate is white. After starting gnuplot, we will be in an interactive shell.

Now we can draw graphs by using the plot command:

 gnuplot>  plot tanh(x) - cos(x)

We want to see the graph of tanh(x) - cos(x) ( tanh is tangens hyperbolicus; cos is cosinus). Figure 20.1 shows the result.

Figure 20.1. A simple mathematical function.

graphics/20fig01.gif

gnuplot finds the best interval for the graph so that only the "useful" part is displayed.

gnuplot provides a large number of functions; you explore the most important ones in this section.

Let's get back to PostgreSQL. We have compiled a small table storing incomes for certain years . We will use this table for the next example:

 mygnuplot=#  SELECT * FROM income;  year  income ------+--------  1997   24000  1998   26300  1999   26000  2000   29000  2001   32100  2002   32000  2003   32700 (7 rows)

The data shown does not have the right format to be used for gnuplot directly; we have to use some parameters when retrieving the data from the table. In this example, we want only the first column to be displayed, so we select only the first column from the table:

 [hs@duron hs]$  psql -c "SELECT income FROM income" -A -F " " -t mygnuplot  24000 26300 26000 29000 32100 32000 32700

The data shown already fits our demands. Let's look at the command-line parameters: -A tells PostgreSQL to use the unaligned table output mode. -F defines the field separator ”this option is useful only when multiple columns are returned by the query. -t makes sure that the header is silently omitted.

Now that we have written a shell command, which returns the data in the way we need it, we have to find a way to pass it to gnuplot in order to draw the graph. Luckily, gnuplot supports some easy methods to read the data.

The following shows how we compute the result of the shell command we have shown previously and draw a graph:

 gnuplot>  plot "< psql -c 'SELECT income FROM income' -A -F ' ' -t mygnuplot "

The command between the two double quotes is evaluated, and the result is returned to gnuplot, which displays the graph on the screen (see Figure 20.2).

Figure 20.2. gnuplot displays a window containing the graph.

graphics/20fig02.gif

We can see an array of points representing the values in the database.

Up to now, we have created all graphics with the help of gnuplot's shell. This might be a bit of a problem for you when building huge applications. Let's see how the graphics shown previously can be created with the help of one simple Bourne shell command:

 [hs@duron hs]$  echo "plot \ "< psql -c 'SELECT income FROM income' -A -F ' ' -t   mygnuplot \ " "  gnuplot -background white -persist

We simply pipe the command to gnuplot. All double quotes passed to the program have to be masqueraded using a backslash so that the shell does not mix up the double quotes used for the echo command with the double quotes used for gnuplot.

The -persist flag is necessary; otherwise , gnuplot will close the X windows when the execution of the script is ready. Using -persist makes sure that the window is still alive after gnuplot has terminated .

Having the result displayed in an X window might be nice, but in some cases it is necessary to store the result generated by gnuplot in a file that you can use for further transformation. The following includes a file that contains a short gnuplot script.

First we set the terminal to pbm , which means that a pbm file is generated instead of X11 output. Then we include the command we showed previously:

 [hs@duron gnuplot]$  cat config.plot  set terminal pbm color plot "< psql -c 'SELECT income FROM income' -A -F ' ' -t mygnuplot "

We start gnuplot with config.plot as the first parameter and redirect the output to a file called graph.pbm :

 [hs@duron gnuplot]$ g  nuplot config.plot  > graph.pbm

We can use that file now. If pbm is not our desired file format, we can easily convert the file to jgp or any other format by using a program such as mogrify :

 [hs@duron gnuplot]$  mogrify -format jpg *.pbm

Now we want to change the labels of the graphics we have just generated. The first thing we want to do is to eliminate the ugly string containing psql on the upper edge of the graph. This can be done by using set nokey . The title of the graph should be Income table , and the axis should be labeled with year and income in USD per year . Up to now we have also seen that gnuplot finds the right scale for the graph by itself. In some cases, this might not be what we need, so we define the range of values displayed manually using xrange and yrange .

The following is a file that does exactly what we need:

 set nokey set title "Income table" set xlabel "year" set ylabel "income in USD per year" xmax=6 ymin=0 ymax=40000 set xrange [0 : xmax] set yrange [ymin : ymax] plot "< psql -c 'SELECT income FROM income' -A -F ' ' -t mygnuplot "

We can generate the graph using a simple shell command:

 [hs@duron gnuplot]$  gnuplot -persist -background white config.plot

The result will be the graph shown in Figure 20.3.

Figure 20.3. Redesigning the graph.

graphics/20fig03.gif

Often it is necessary to plot two independent data sources in one graph ” especially when you want to compare data. The following script shows how you can plot two data sources (they contain the same data) in two different ways:

 set nokey set grid set title "Income table" set xlabel "year" set ylabel "income in USD per year" xmax=6 ymin=0 ymax=40000 set yrange [ymin : ymax] plot "< psql -c 'SELECT year, income FROM income' -A -F ' ' -t mygnuplot" \                 with lines linewidth 3, \         "< psql -c 'SELECT year, income FROM income' -A -F ' ' -t mygnuplot" \                 with boxes linewidth 3

We want a grid to be displayed in the background of our graph. The last four lines of code are responsible for plotting. The first data source is displayed as a line, and linewidth is set to 3 . The second data source is displayed using boxes. gnuplot makes sure that the two components of the graph are displayed in different colors (see Figure 20.4).

Figure 20.4. Plotting two data sources.

graphics/20fig04.gif

The script plots the graph as we expected it to be, but there are still two problems that have to be solved . We have plotted two data sources, and therefore we have queried the database twice. This leads to two problems: It might happen that the data in the database changes during the execution of the first query, so the result of the second query might differ from the result of the first query. Another problem is that executing two queries takes much longer than executing just one.

Our two problems can be solved easily. To show you how this can be done, we have compiled a table containing the income of males and females:

 mygnuplot=#  SELECT * FROM income;  year  male   female ------+-------+--------  1997  24000   19200  1998  26300   21400  1999  26000   21800  2000  29000   21000  2001  32100   26500  2002  32000   26700  2003  33700   28200 (7 rows)

We use a simple Makefile to generate the result:

 [hs@duron gnuplot]$  cat Makefile  x       :       config.plot         psql -c 'SELECT year, male, female FROM income' -t -A -F ' ' \                 mygnuplot > file.data         gnuplot -background white -persist config.plot         rm -f file.data

Let's analyze the Makefile first. We select all data from table income and store the result of the query in file.data . Then we start gnuplot and pass the name of the configuration script to it.

Here is the configuration script:

 set nokey set grid set time set title "Income table" set xlabel "year" set ylabel "income in USD per year" set yrange[0 : ] plot 'file.data' using 1:2 with lines linewidth 2, \         'file.data' using 1:3 with lines linewidth 2

The time the image was created is displayed by using set time . We don't know the highest income in the database, so we set the lower limit for yrange " to and do not specify the upper value. gnuplot will make sure that a suitable border for the upper limit will be used. In the next step, we tell gnuplot to use the first and the second column in file.data as the data source for plotting the first line. The second line will be plotted using the first and the third column. Using one plot command is enough, because every additional graph is simply added to the list (see Figure 20.5).

Figure 20.5. Plotting the data in an external file.

graphics/20fig05.gif

The previous examples showed basic tasks that can be accomplished with gnuplot and PostgreSQL. Sometimes it might be necessary to add explanations to the graph so that the reader can easily understand what you want to say. Two components for adding explanations to a graph are essential: arrows and labels.

In the next example, we added two labels and one arrow to the scenery :

 set nokey set grid set time set arrow from 1999, 15000 to 2000, 20000 set label "significant changes" at 1998.35, 13000 set label "(c) SAMS" at 2002.3, 2200 set format y "%g$" set title "Income table" set xlabel "year" set ylabel "income" set yrange[0 : ] plot 'file.data' using 1:2 with lines linewidth 2, \         'file.data' using 1:3 with lines linewidth 2

First we add the arrow to the graph. As you can see, the coordinates can simply be defined. Then we add the labels to the scenery. The first component that we define for the label is the text we want to be displayed. Then we tell gnuplot where to place it, as we did for the arrow. We start gnuplot with the file shown.

The result can be seen in Figure 20.6.

Figure 20.6. A plot including arrows and comments.

graphics/20fig06.gif

For many applications, it is necessary to produce three-dimensional plots. Because gnuplot is a highly developed software, it is also capable of generating three-dimensional graphics.

To show you how such plots can be created, we compiled a table that contains two additional columns, which store the number of people we used to compute the average income ( wf is an abbreviation for workforce ):

 mygnuplot=#  SELECT * FROM income;  year  avg_male  wf_male  avg_female  wf_female ------+----------+---------+------------+-----------  1997     24000      732       19200        932  1998     26300     1412       21400       1054  1999     26000     1930       21800       1320  2000     29000     2065       21000       1150  2001     32100     2163       26500       1259  2002     32000     2254       26700       1292  2003     33700     2620       28200       1721 (7 rows)

The following is the config file for our three-dimensional plot:

 set nokey set time set format y "%g$" set title "Income table" set xlabel "year" set ylabel "income" splot 'file.data' using 1:2:3 with lines linewidth 2, \         'file.data' using 1:4:5 with lines linewidth 2

Here, we have to use splot instead of plot . Because we have one additional dimension, we have to define three columns containing the data ”we select the numbers of the columns according to the order they can be found in the input file.

The screenshot of the plot is shown in Figure 20.7.

Figure 20.7. A simple 3-D plot.

graphics/20fig07.gif