Adding Functions and an Overview of Programming Conventions

The more complex a database system is, the more important is the existence of rules and conventions to make the code clearer and easier to understand. In the case of PostgreSQL, some basic rules have to be taken into consideration when implementing additional functions. These are primarily important for making the system easier to understand as a whole. Before you get to these calling conventions, let's look at how functions can be implemented.

Writing Simple SQL Functions

Because PostgreSQL is a very flexible software, it is an easy task to add functions to the database. We have already shown how adding functions works with the help of many programming languages (C, PL/pgSQL, PL/Tcl, and so on). Now we will show that it is also possible to implement additional functionalities simply by using "ordinary" SQL code. As you might expect, using SQL for additional functions is just as easy as using any other programming language.

The advantage of functions implemented in pure SQL is that they are extremely portable, because no support for other programming languages is required. Imagine a PostgreSQL system where no support for, let's say, PL/Perl has been implemented. Adding PL/Perl functions to the system requires a recompile of the server, and adding a SQL function can be done without changing anything.

Let's imagine a table called mynumber containing a few values:

 mydata=#  SELECT * FROM mynumber;  a   b ----+----  22   6   3  12  55  51 (3 rows)

Here is a simple function implemented in SQL:

  CREATE FUNCTION delete_tab() RETURNS int4 AS '   DELETE FROM mynumber;   SELECT 1 AS deleted;'   LANGUAGE 'sql';

We simply remove all records from mynumber and return 1 . Let's have a look at the function in action:

 mydata=#  SELECT delete_tab();  delete_tab ------------           1 (1 row) mydata=#  SELECT * FROM mynumber;  a  b ---+--- (0 rows)

First we select all records from the table to see what the database contains. Then we call delete_tab . 1 is returned and we query the table again. Because the function has been executed successfully, the table is empty now.

Writing C Functions

Implementing really fast PostgreSQL extensions can hardly be achieved by using anything other than C. Because PostgreSQL is entirely written in C, this seems logical. Writing C code is most likely not the fastest way of implementing a feature, but the performance of your functions will not suffer from a lot of overhead that other programming languages (for example Perl or Tcl) will cause.

Normally, user -defined functions are added to PostgreSQL by using a loadable object (shared library). Shared libraries are loaded at runtime (when the function is called the first time) and stay in memory for the rest of the session. Knowing this is extremely important for debugging purposes. If you want to test your extensions with the help of psql, it is necessary to reconnect to the database after recompiling and adding the module to your database. Otherwise , the old object will still be in memory.

Tip

You should always use the full path to the shared library.

Additional functions are not inserted and compiled by PostgreSQL automatically. To add a function to a database, you have to use the CREATE FUNCTION command.

PostgreSQL's C Datatypes

PostgreSQL's C interface supports a large number of datatypes you will need to add additional functions to your PostgreSQL server. The following table lists all datatypes available in C:

SQL type	C Type	Defined In
abstime	AbsoluteTime	utils /nabstime.h
bool	bool	include/c.h
box	(BOX *)	utils/geo-decls.h
bytea	(bytea *)	include/postgres.h
"char"	char	N/A
cid	CID	include/postgres.h
datetime	(DateTime *)	include/c.h or include/postgres.h
int2	int2 or int16	include/postgres.h
int2vector	(int2vector *)	include/postgres.h
int4	int4 or int32	include/postgres.h
float4	(float4 *)	include/c.h or include/postgres.h
float8	(float8 *)	include/c.h or include/postgres.h
lseg	(LSEG *)	include/geo-decls.h
name	(Name)	include/postgres.h
oid	oid	include/postgres.h
oidvector	(oidvector *)	include/postgres.h
path	(PATH *)	utils/geo-decls.h
point	(POINT *)	utils/geo-decls.h
regproc	regproc or REGPROC	include/postgres.h
reltime	RelativeTime	utils/nabstime.h
text	(text *)	include/postgres.h
tid	ItemPointer	storage/itemptr.h
timespan	(TimeSpan *)	include/c.h or include/postgres.h
tinterval	TimeInterval	utils/nabstime.h

Calling Conventions for Functions Written in C

When implementing huge software projects such as PostgreSQL, it is necessary to introduce some sort of convention to make the code clearer and to make sure that programmers can easily keep track of the code and the entire project. Because PostgreSQL is a huge project, conventions for C functions have been introduced.

Two calling conventions are available. Version-0 Calling Convention is the old-style convention, which is still supported for compatibility reasons. A detailed discussion of the old-style convention is far beyond the scope of this book.

Version-1 Calling Convention is based on macros to avoid a lot of complexity for passing arguments and returning results. New-style functions are always defined like this:

 Datum name_of_function(PG_FUNCTION_ARGS)

For dynamically loaded functions, it is necessary to add the macro call to the same source file:

 PG_FUNCTION_INFO_V1(name_of_function);

For internal functions, this is not necessary, because PostgreSQL assumes that all functions are new-style. To retrieve arguments from PG_FUNCTIONS_ARGS , the PG_RETURN_xxx() macro is called, which corresponds to the argument's datatype. The data is returned with the help of a PG_RETURN_xxx() macro call, which also has to correspond with the return value's datatype.

To make the explanation a little clearer, let's look at a simple example. To divide floats, a function called float48div has been implemented in the following:

 $SRC_ROOT/src/backend/utils/adt/float.c

Here is the function:

 Datum float48div(PG_FUNCTION_ARGS) {         float4          arg1 = PG_GETARG_FLOAT4(0);         float8          arg2 = PG_GETARG_FLOAT8(1);         float8          result;         if (arg2 == 0.0)                 elog(ERROR, "float48div: divide by zero");         result = arg1 / arg2;         CheckFloat8Val(result);         PG_RETURN_FLOAT8(result); }

PG_FUNCTION_ARGS contains the arguments passed to the function. To extract the first argument from the input, PG_GETARG_FLOAT4 is called. The second argument is retrieved by using PG_GETARG_FLOAT8 . The return value is defined as float8 . In the next step, the function checks whether a division by zero has to be done. If not, the result is computed and returned using the PG_RETURN_FLOAT8 macro.

Another good example is the time_interval function, which can be found in the following:

 $SRC_ROOT/src/backend/utils/adt/date.c

Let's have a look at the code:

 Datum time_interval(PG_FUNCTION_ARGS) {         TimeADT         time = PG_GETARG_TIMEADT(0);         Interval   *result;         result = (Interval *) palloc(sizeof(Interval));         result->time = time;         result->month = 0;         PG_RETURN_INTERVAL_P(result); }

The input parameter is extracted by using a PG_GETARG_xxx macro again. This time we have to use PG_GETARG_TIMEADT , because this is the macro that corresponds to the required datatype. The macro that is used to return the result of the function also matches the datatype of the return value.

A very important point, when dealing with the previous function, is how the memory required for the return value is allocated: palloc is used instead of malloc .

Rules for Writing C Code

When extending your PostgreSQL server with C functions, some basic rules have to be taken into consideration.

Usually, the header files can be found in /usr/include/pgsql . To find out where the header files are installed on your system, you can use the pg_config command:

 [hs@duron Members]$  pg_config  --includedir  /usr/include/pgsql

All header files you will need to build PostgreSQL applications can be found in that directory.

When allocating memory, use palloc and pfree instead of malloc and mfree , as we showed in the previous section. The advantage of PostgreSQL's onboard memory management tools is that the allocated memory is freed automatically at the end of a transaction. This should help prevent memory leaks.

Your structures have to be "zeroed" using memset and bzero ; otherwise, you might run into trouble, because some functions (hash access method, hash join, and sort algorithm) use the raw bits contained in your structure. Because of alignment padding, initializing the fields of your structure will not be enough.

Include postgres.h before fmgr.h and any other header file; this should be done for compatibility reasons. c.h , elog.h , and palloc.h will automatically be included by postgres.h .

Make sure that your names do not conflict with names in PostgreSQL's executables; otherwise, you will face problems.

Compiling and linking your object files will need special flags so that they can be loaded dynamically. Check out the docs of the compiler on your system to find out more.