Returning Complete Rows from an Extension Function

If you've read through the first few sections in this chapter, you know how to write an extension function that returns a single scalar value (that's what the filesize() function does). You also know how to return a set of scalar values (that's what the filelist() function does). In this section, I'll show you how to return a set of rows (or, as the PostgreSQL developers prefer to call them, tuples).

To illustrate the sequence that you must follow to return multiple tuples from an extension function, I'll create a new function, fileinfo(), that combines filesize() and filelist(). You call fileinfo() with the name of a directory and it returns a SETOF tuples. Each tuple contains three columns: a filename, the size of the file (or NULL if the size is not known), and the file type (or NULL if the type is not known). When you've finished, you can call the fileinfo() function like this:

movies=# SELECT * FROM fileinfo( '/dev' );
 filename | filesize | filetype
---------------+------------+----------
 . | 9380 | d
 .. | 4096 | d
 adsp | 0 | c
 agpgart | 0 | c
 arpd | 0 | c
 audio | 0 | c
 cdrom | 0 | b
 console | 0 | c
 core | 1073156096 | -
 cpu | 360 | d
...

To start, you must define a data type that describes each row returned by the fileinfo() function:

movies=# CREATE TYPE _fileinfo AS ( filename TEXT, filesize INTEGER, filetype CHAR(1));
CREATE TYPE

I'll create a few helper functions that will simplify the fileinfo() function. Listing 6.6 shows the getFileInfo(), getFileType(), and text2cstring() functions:

Listing 6.6. fileinfo.c (Part 1)

 1 /*
 2 ** Filename: fileinfo.c
 3 */
 4 #include "postgres.h"
 5 #include "funcapi.h"
 6
 7 #include 
 8 #include 
 9
10 typedef struct
11 {
12 struct dirent ** dir_ctx_entries;
13 char * dir_ctx_name;
14 } dir_ctx;
15
16 static bool getFileInfo(struct stat * buf, char * dirName, char * fileName)
17 {
18 char * pathName = (char *) palloc(strlen(dirName)+1+strlen(fileName)+1);
19
20 strcpy( pathName, dirName );
21 strcat( pathName, "/" );
22 strcat( pathName, fileName );
23
24 if( stat( pathName, buf ) == 0 )
25 return( true );
26 else
27 return( false );
28 }
29
30 static char getFileType( mode_t mode )
31 {
32 if( S_ISREG(mode))
33 return( '-' );
34 if( S_ISDIR(mode))
35 return( 'd' );
36 if( S_ISCHR(mode))
37 return( 'c' );
38 if( S_ISBLK(mode))
39 return( 'b' );
40 if( S_ISFIFO(mode))
41 return( 'p' );
42 if( S_ISLNK(mode))
43 return( 'l' );
44 if( S_ISSOCK(mode))
45 return( 's' );
46
47 return( '?' );
48
49 }
50
51 static char * text2cstring( text * src )
52 {
53 int len = VARSIZE( src ) - VARHDRSZ;
54 char * dst = (char *)palloc( len+1 );
55
56 memcpy( dst, src->vl_dat, len );
57 dst[len] = '';
58
59 return( dst );
60 }
61

The getFileInfo() helper function (lines 16 through 28) calls stat() to retrieve metadata that describes the given file. The caller provides three parameters: the address of a struct stat structure that getFileInfo() fills in, the name of the directory where the target file resides, and the name of the target file itself. If the stat() function succeeds, getFileInfo() returns true and the caller can find the metadata for the file in the struct stat structure. If the stat() function fails, getFileInfo() returns false.

The second helper function, getFileType(), translates a mode_t (returned by the stat() function) into a single character that represents a file type. getFileType() returns one of the following values:

d (directory)
c (character device)
b (block device)
p (named pipe)
l (symbolic link)
s (socket)
? (unknown)
- (a "regular" filethat is, not one of the above)

The last helper function is text2cstring() (see lines 51 through 60). This function converts a TEXT value into a dynamically allocated, null-terminated string. The fileinfo() function (which I'll describe next) calls text2cstring() to convert its TEXT argument into the form expected by scandir().

The fileinfo() function is shown in Listing 6.7:

Listing 6.7. fileinfo.c (Part 2)

 62 PG_FUNCTION_INFO_V1(fileinfo);
 63
 64 Datum fileinfo(PG_FUNCTION_ARGS)
 65 {
 66 char * start = text2cstring( PG_GETARG_TEXT_P(0));
 67 dir_ctx * ctx;
 68 FuncCallContext * srf;
 69
 70 if( SRF_IS_FIRSTCALL())
 71 {
 72 TupleDesc tupdesc;
 73 MemoryContext oldContext;
 74
 75 srf = SRF_FIRSTCALL_INIT();
 76
 77 oldContext = MemoryContextSwitchTo( srf->multi_call_memory_ctx );
 78
 79 ctx = (dir_ctx *) palloc( sizeof( dir_ctx ));
 80
 81 tupdesc = RelationNameGetTupleDesc( "_fileinfo" );
 82
 83 srf->user_fctx = ctx;
 84 srf->max_calls = scandir( start, &ctx->dir_ctx_entries, NULL, alphasort );
 85 srf->attinmeta = TupleDescGetAttInMetadata( tupdesc );
 86
 87 ctx->dir_ctx_name = start;
 88
 89 MemoryContextSwitchTo( oldContext );
 90
 91 }
 92
 93 srf = SRF_PERCALL_SETUP();
 94 ctx = (dir_ctx *)srf->user_fctx;
 95
 96 if( srf->max_calls == -1 )
 97 SRF_RETURN_DONE( srf );
 98
 99 if( srf->call_cntr < srf->max_calls )
100 {
101 struct dirent * entry;
102 char * values[3];
103 struct stat statBuf;
104 char fileSizeStr[10+1] = {0};
105 char fileTypeStr[1+1] = {0};
106 HeapTuple tuple;
107
108 entry = ctx->dir_ctx_entries[srf->call_cntr];
109 values[0] = entry->d_name;
110
111 if( getFileInfo( &statBuf, ctx->dir_ctx_name, entry->d_name ))
112 {
113 snprintf( fileSizeStr, sizeof( fileSizeStr ), "%d", statBuf.st_size );
114 fileTypeStr[0] = getFileType( statBuf.st_mode );
115
116 values[1] = fileSizeStr;
117 values[2] = fileTypeStr;
118 }
119 else
120 {
121 values[1] = NULL;
122 values[2] = NULL;
123 }
124
125 tuple = BuildTupleFromCStrings( srf->attinmeta, values );
126
127 SRF_RETURN_NEXT( srf, HeapTupleGetDatum( tuple ));
128 }
129 else
130 {
131 SRF_RETURN_DONE( srf );
132 }
133 }

The fileinfo() function calls scandir() to generate an array that contains the names of all files in the given directory and then calls getFileInfo() (which in turn calls stat()) to retrieve the metadata for each file. The server calls fileinfo() until it stops returning values. Each invocation returns a single tuple (of type _fileinfo) that contains a filename and the size and type of that file.

fileinfo() starts by converting its argument from a TEXT value into a null-terminated string (the scandir() function that fileinfo() calls at line 84 requires a null-terminated string). At line 70, fileinfo() calls the SRF_IS_FIRSTCALL() macro to decide whether it should create and initialize a new context structure or use a structure created by a prior invocation.

The fileinfo() function has to do a little more memory-management work than the other functions you've seen in this chapter. The earlier functions allocated memory from the srf->multi_call_memory_ctx (or fmgr_info>fn_mcxt) pool. fileinfo() also allocates memory from that pool, but fileinfo() calls other PostgreSQL functions that allocate memory as well. For example, at line 81, you see a call to RelationNameGetTupleDesc(). That function allocates memory using the server's palloc() function. You must ensure that RelationNameGetTupleDesc() (and any function called by RelationNameGetTupleDesc()) allocates memory from the correct MemoryContext. Each MemoryContext has its own lifetime (or scope). If RelationNameGetTupleDesc() allocates memory from a MemoryContext with a lifetime that's too short (that is, a lifetime that ends before that last call to fileinfo()), you'll find that the data created by RelationNameGetTupleDesc() is de-allocated out from under you. On the other hand, if RelationNameGetTupleDesc() allocates memory from a MemoryContext with a lifetime that's too long, you'll create a memory leak. Take a look at line 81. Notice that you call RelationNameGetTupleDesc() with a single argument (the name of tuple type). Since you can't pass a MemoryContext to RelationNameGetTupleDesc(), how do you tell that function which MemoryContext to use? The answer is deceptively simple. Look closely at the call to palloc() at line 79. palloc() is the most commonly used memory allocation function in the PostgreSQL server. palloc() allocates memory from the MemoryContext pointed to by the CurrentMemoryContext global variable. If you want to talk RelationNameGetTupleDesc() into using a specific MemoryContext, you have to point CurrentMemoryContext to that context. That's what the code at line 77 does. Call MemoryContextSwitchTo() whenever you need to change the lifetime of data allocated by palloc(). Notice that the call at line 77 selects srf->multi_call_memory_ctx (which is a MemoryContext that survives as long as fileinfo() has more tuples to return). After MemoryContextSwitchTo() returns, palloc() will allocate memory from that MemoryContext until somebody calls MemoryContextSwitchTo() again. MemoryContextSwitchTo() switches to a new MemoryContext and returns the previous value. You should restore the original MemoryContext when you're finished with the new one (see line 89).

Once the correct MemoryContext is in place, fileinfo() allocates a new dir_ctx context structure (see line 79). Next, fileinfo() calls the RelationNameGetTupleDesc() function to retrieve the TupleDesc that defines the _fileinfo type (remember, the _fileinfo type describes the layout of the tuples returned by fileinfo(); you created the _fileinfo type earlier with a CREATE TYPE command). A TupleDesc is a structure that describes the shape of a tuple. It contains (among other things) the number of columns in the tuple and a description of each column. You don't have to peek inside of a TupleDesc (unless you want to) but fileinfo() needs the descriptor to build a return value.

After retrieving the tuple descriptor, fileinfo() records the address of its new context structure so it can find the structure in future invocations (line 83). Next, fileinfo() calls the scandir() function to generate an array that contains the name of each file in the given directory (start). scandir() records the address of the array in ctx->dir_ctx_entries.

When the fileinfo() function returns a value to the caller, it does so by building a tuple out of a collection of null-terminated strings; each string corresponds to one of the columns in the tuple. The TupleDesc that fileinfo() retrieved at line 81 doesn't contain quite enough information to convert C strings into a tuple. Fortunately, PostgreSQL provides a function that translates a TupleDesc into a new structure that contains all of the data you'll need: TupleDescGetAttInMetaData(). The code at line 85 calls this function and stores the address of the resulting structure in srf->attinmeta (which the PostgreSQL developers conveniently included for just this purpose). A little later, fileinfo() will use the new structure to create the return tuple.

The initialization phase completes by storing a copy of the directory name (line 87) and restoring the MemoryContext that was in place when fileinfo() was first called (line 89).

The code at lines 93 through 99 should be familiar by nowsee the previous section ("The PostgreSQL SRF Interface") if you need a refresher. Every time fileinfo() is called, it calls the SRF_PERCALL_SETUP() macro to find the appropriate FuncCallContext structure and then extracts the address of the dir_ctx structure created by the initial invocation.

I mentioned earlier that fileinfo() creates a return tuple out of a collection of null-terminated strings. Each tuple contains three columns: a filename, the size of the file, and the file type. Accordingly, fileinfo() creates three null-terminated strings (one for each column). The values[] array (see line 102) contains a pointer to each null-terminated string. After filling in values[], fileinfo() will call BuildTupleFromCStrings() to convert the strings into a tuple.

The first null-terminated string (values[0]) contains the name of one file found in the dir_ctx_entries[] array. The assignment statement at line 109 copies the address of the file name into values[0].

The other null-terminated strings (values[1] and values[2]) contain the file size and the file type (respectively). To find the size of the file, fileinfo() calls the getFileInfo() function you saw earlier. If successful, getFileInfo() returns true and fills in the statBuf structure with (among other things), the file size and type. After converting the file size into a null-terminated string (line 113) and translating the file mode into a human-readable file type (line 114), fileinfo() fills in the rest of the values[] array.

If getFileInfo() fails for some reason, the return tuple should contain a NULL filesize and a NULL filetype to indicate that those values are "unknown." Setting a column to NULL is easy Just set the corresponding entry in the values[] array to NULL (see lines 121 and 122).

By the time it reaches line 125, fileinfo() has gathered all of the information it needs to create the return tuple. The values[] array contains three string pointers (or one string pointer and two NULL's). To convert the null-terminated strings into a tuple, fileinfo() calls PostgreSQL's BuildTupleFromCStrings(). That function uses the tuple description produced by the earlier call to TupleDescGetAttInMetadata() and the pointers in values[] to create a tuple in the form expected by the PostgreSQL server. fileinfo() returns the tuple to the server by invoking the SRF_RETURN_NEXT() macro that I described earlier (see "The PostgreSQL SRF Interface" for more information).

When fileinfo() has finished processing all of the file names found in dir_ctx_entries[], it invokes the SRF_RETURN_DONE() macro instead to tell the server that it has finished building the result set.

If you want to try this function yourself, compile and install it (as described earlier) and execute the following command to tell the PostgreSQL server how to find and invoke the function:

CREATE OR REPLACE FUNCTION fileinfo(TEXT)
 RETURNS SETOF _fileinfo
 AS 'filelist.so','fileinfo' LANGUAGE C
 STRICT;

If you are rewarded with a message that states type _fileinfo is not yet defined, you forgot to execute the CREATE TYPE command that I mentioned at the beginning of this section.

You can call the fileinfo() function in any context where you would normally SELECT from a table. For example, to find the names of all files in the /dev directory:

movies=# SELECT * FROM fileinfo( '/dev' );
 filename | filesize | filetype
---------------+------------+----------
 . | 9380 | d
 .. | 4096 | d
 adsp | 0 | c
 agpgart | 0 | c
 arpd | 0 | c
 audio | 0 | c
 cdrom | 0 | b
 console | 0 | c
 core | 1073156096 | -
 cpu | 360 | d
...

One of the cool things about PostgreSQL functions is that you can mix functions that are written in different languages. For example, you can call fileinfo() (which is written in C) from a function written in PL/pgSQL (one of PostgreSQL's procedural language). In fact, Listing 6.8 shows a PL/pgSQL function that returns a SETOF _fileinfo tuples (just like the fileinfo() function). This function calls fileinfo() to recursively descend through an entire directory tree, returning one tuple for each file (and subdirectory) that it finds.

Listing 6.8. dirtree.sql

 1 -- File: dirtree.sql
 2
 3 CREATE OR REPLACE FUNCTION dirtree( TEXT ) RETURNS SETOF _fileinfo AS $$
 4 DECLARE
 5 file _fileinfo%rowtype;
 6 child _fileinfo%rowtype;
 7 BEGIN
 8
 9 FOR file IN SELECT * FROM fileinfo( $1 ) LOOP
10 IF file.filename != '.' and file.filename != '..' THEN
11 file.filename = $1 || '/' || file.filename;
12
13 IF file.filetype = 'd' THEN
14 FOR child in SELECT * FROM dirtree( file.filename ) LOOP
15 RETURN NEXT child;
16 END LOOP;
17 END IF;
18 RETURN NEXT file;
19 END IF;
20 END LOOP;
21
22 RETURN;
23
24 END
25 $$ LANGUAGE 'PLPGSQL';

Don't worry if you don't understand the dirtree() function yet. I'll describe the PL/pgSQL language in full detail in Chapter 7, "PL/pgSQL." The important thing to note here is that dirtree(), a function written in PL/pgSQL can call fileinfo(), a function written in C. Adding useful extension functions to PostgreSQL is not too difficult (assuming that you are comfortable working in C). Now that you understand the mechanism for creating new functions, I'd like to turn your attention to the process of creating a new data type. When you add a new data type to PostgreSQL, you must create a few supporting extension functions, so be sure you understand the material covered so far.

Part I: General PostgreSQL Use

Introduction to PostgreSQL and SQL

Working with Data in PostgreSQL

PostgreSQL SQL Syntax and Use

Performance

Part II: Programming with PostgreSQL

Introduction to PostgreSQL Programming

Extending PostgreSQL

PL/pgSQL

The PostgreSQL C APIlibpq

A Simpler C APIlibpgeasy

The New PostgreSQL C++ APIlibpqxx

Embedding SQL Commands in C Programsecpg

Using PostgreSQL from an ODBC Client Application

Using PostgreSQL from a Java Client Application

Using PostgreSQL with Perl

Using PostgreSQL with PHP

Using PostgreSQL with Tcl and Tcl/Tk

Using PostgreSQL with Python

Npgsql: The .NET Data Provider

Other Useful Programming Tools

Part III: PostgreSQL Administration

Introduction to PostgreSQL Administration

PostgreSQL Administration

Internationalization and Localization

Security

Replicating PostgreSQL Data with Slony

Contributed Modules

Index