The New PostgreSQL C++ APIlibpqxx | Part II: Programming with PostgreSQL

Table of contents:

Client 3Processing Queries

libpqxx defines a small group of interrelated classes you use to execute commands and retrieve results. You already know about the connection class, which represents a connection to a server. In a libpqxx application, all commands are executed using an object derived from class transaction_base, which represents a server-side transaction. When you execute a command, the result set (that is, the data returned by the server) is stored in an object of class resultlibpqxx creates a result object each time you execute a command. Inside a result object you'll find a collection of zero or more objects of class tuple. A tuple represents a single row in the result set and is a collection of one or more objects of type field. Each field represents a single data value. You can access the metadata returned by a command through the result, tuple, and field classes as well. In this section, we'll show you how all the pieces fit together.

Working with Transactions

To execute a query or object>> object>other command in a libpqxx, you must use a transaction<> object (or an object derived from TRansaction<>). The TRansaction<> constructor expects either one or two arguments:

transaction( connection_base & conn, const std::string & transactionName );
transaction( connection_base & conn );

Each transaction<> is attached to a single connection objectthe transaction<> uses the connection to send commands to the server and to retrieve results from the server. If you provide a transactionName[1] when you call the constructor, error messages thrown by the transaction include the name you provide; that makes it a little easier to figure out where an exception is coming from in a complex application.

[1] The libpqxx documentation states that a TRansaction<> name must start with a letter and contain only letters and digits.

The TRansaction<> type is a template class, parameterized on the transaction isolation level you want to enforce. Because the PostgreSQL server offers only two transaction isolation levels, you can choose from read_committed or serializable (see the section titled "Transaction Processing" in Chapter 3, "PostgreSQL SQL Syntax and Use," for more information about isolation levels). The default isolation level is read_committed so all three of the following definitions are equivalent:

transaction myTransaction( myConnection );
transaction<> myTransaction( myConnection );
work myTransaction( myConnection );

libpqxx defines a class named work as a convenient synonym for TRansaction.

After you have a transaction<> object, call the exec() member function to execute a command on the server. transaction<> defines three flavors of the exec() member function:

result exec( const char queryText[], const std::string & description );
result exec( const std::string queryText, const std::string & description );
result exec(const std::stringstream queryText, const std::string & description);

You can execute a command using whatever object>> object>form of query string you happen to have (a null-terminated C-style string, an std::string, or an std::stringstream). The description argument is optionalif you provide a description, libpqxx includes it in some (but not all) error messages generated by the command.

To execute a command on the server, simply call exec() with the text of the command and capture the return value in a result object:

...
 std::string myCommand( "SELECT * FROM customers" );

 try
 {
 result myResult = myTransaction.exec( myCommand. "Get Customer List" );
 }
 catch( runtime_error & e )
 {
 cerr << "Error detected: << e.what();
 }
 catch( exception & e )
 {
 cerr << e.what();
 }
 catch( ... )
 {
 cerr << "Unknown exception caught" << endl;
 }
...

The call to exec() blocks until the command completes and the entire result set has been received (or until an exception is thrown). libpqxx does not provide a nonblocking exec() variant at this time.

Use sqlesc() to Quote Strings That You Get from Outside Sources

Let's say you're writing object>> object>an interactive program that any customer can use to find his current account balance. You prompt the customer for his name and then create and execute a query similar to the following:

SELECT balance FROM customers WHERE customer_name = 'name';

That looks innocuous enough. But what happens if a customer types in something like this:

Panky, Henry'; DELETE FROM customers; COMMIT;

Now you append this suspicious name to the end of your SELECT command and you end up with:

SELECT balance FROM c3ustomers
 WHERE name = 'Panky, Henry'; DELETE FROM customers; COMMIT;'

That's probably not what you want. How do you prevent that kind of attack? You simply run any string you get from the outside world through the sqlesc() function before using it in a command. sqlesc() makes outside strings safe. In this example, sqlesc() would have changed name into

Panky, Henry'';DELETE FROM customer; COMMIT;

Notice that the single quotation mark in the original name has been doubledthat makes it safe to use inside of a query. When the server sees the properly quoted string, it thinks you're looking for a customer whose name happens to include a single quote and simply returns an empty result set.

Completing a Transaction

Because a libpqxx object>> object>application executes all commands within the context of a transaction<>, you must complete the TRansaction<> by calling either TRansaction<>::commit() or TRansaction::abort(). It's important to remember that if you don't explicitly commit a transaction by calling transaction<>::commit(), the transaction is automatically aborted when the TRansaction<> object is destroyed.

When you complete a transaction, the result set data and metadata are still available, but be aware that a result set that comes from an aborted transaction should be treated with suspicion.

Other Transaction Types (nontransaction and robusttransaction)

The transaction<> template class is one of three transaction-like classes defined by libpqxx. The robusttransaction<>> class class>has the same interface as transaction<>you use a robusttransaction<> object in exactly the same way that you use a transaction<> object. The difference between the two classes becomes apparent when you lose the connection to a server at exactly the wrong time. Regardless of which transaction type you use, libpqxx tries to reconnect to a server if your connection breaks downyou don't have to do anything because it happens automatically. What happens if your connection drops immediately after you execute a COMMIT, but before the server has a chance to send you an acknowledgement? Did the COMMIT execute? Without some extra work, you have no way to know. A robusttransaction takes care of that extra work for you. When you start a robusttransaction, libpqxx writes a record into a special log table in the server. When you COMMIT a robusttransaction the log record is committed as well. If the connection drops out before the COMMIT is acknowledged, robusttransaction simply looks for the log record (after reconnecting). If the log record is there, the COMMIT succeeded; if it's not there, the entire transaction aborted and you can execute the transaction again. Or better yet, you can use a TRansactor object to automate the whole process (see "Client 4Working with transactors" later in this chapter for more details). If you are executing read-only queries, you can use the class nontransaction instead of transaction<>. When you execute commands through a nontransaction object, you don't need to commit or abort the transaction when you're finished because each command is automatically object>> object> committed as soon as it completes.

Working with Result Sets

The TRansaction<>::exec() functions all return an object of type class result. A result is a very rich, STL-compatible container that holds both the data and metadata returned by a command. Because result is a standard container, it provides the usual container-related member functions such as size(), which returns the number of rows in the result set; empty(), which returns TRUE if the size() is zero; and clear(), which discards the result set.

If a result is a container, what exactly does it contain? A result object contains all the tuples (that is, rows) returned by a command. A tuple is a container, tooit contains all the fields in a single row of the result set. So, a result is a collection of tuples and a tuple is a collection of fields.

Working with Tuples (Rows)

The result class offers three methods for getting at the tuples inside. First, you can use the operator[] function to retrieve a tuple by its zero-based index:

...
 result myResult = myTransaction.exec( myCommand, "Get Customer List" );

 tuple firstRow = myResult[0];
 tuple secondRow = myResult[1];
...

Like any container class, calling result::operator[]() with an out-of-range index results in undefined, but generally nasty, behavior (I'll show you how to find the number of rows contained inside a result in a moment). Second, you can use the at() member function:

...
 result myResult = myTransaction.exec( myCommand, "Get Customer List" );

 try
 {
 tuple firstRow = myResult.at( 0 );
 tuple secondRow = myResult.at( 1 );
 }
 catch( out_of_range e )
 {
 // Handle out_of_range error here
 }
...

If you call result::at() with an out-of-range index, it throws an out_of_range exception. Finally, you can use the result::begin() and result::end() member functions to process tuples using an iterator:

...
 result myResult = myTransaction.exec( myCommand, "Get Customer List" );

 result::const_iterator I;

 for( I = myResult.begin(); I != myResult.end(); ++I )
 {
 const tuple tup = *I;

 processTuple( tup );
 }
...

Calling result::begin() gives you an iterator that points to the first tuple in the result set. result::end() returns an iterator that points just past the last tuple in the result set. An iterator acts suspiciously like a pointer. If you de-reference an iterator with the * operatorfor example, *myIteratoryou get a copy of the tuple to which the iterator currently points. If you de-reference an iterator with the ->for example, myIterator->memberyou access a member of the object to which the iterator currently points.

Because a result::const_iterator is an STL-compatible iterator, you can use it with most of the STL algorithms. I say "most" because libpqxx provides a const iterator, but not a non-const iterator. That means you can't use libpqxx iterators with algorithms that modify the object pointed to; that is, you can't use algorithms like sort(), transform(), fill(), replace_if(), remove(), and random_shuffle(). You can use a const_iterator with nonmodifying algorithms.

Most, if not all, of the STL algorithms expect at least two iterators that define a range of elements within the collection. For example, one of the most useful STL algorithms, for_each(), expects two iterators and a functor[2]. for_each() applies the functor to each element in the range between the first and second iterator. Consider the following code snippet:

[2] A functor is a function pointer or an object that defines the operator() member function.

...
 result myResult = myTransaction.exec( myCommand, "Get Customer List" );
 for_each( myResult.begin(), myResult.end(), printTuple );
...

The call to myResult.begin() returns an iterator that points to the first tuple in myResult. The call to myResult.end() returns an iterator that points just past the last tuple in myResult. The two iterators define a rangein this case, the range consists of all elements (tuples) in the collection (myResult). for_each() walks through the range and calls printTuple()[3] for each tuple it encounters.

[3] There are two ways you could define the printTuple() function: printTuple( tuple ) or printTuple( const tuple & ). In the first case, for_each() makes a copy of every tuple it finds in the range. In the second case, for_each() simply passes a reference to each tuple (which is a bit less expensive). Note that if you try to define both variants, your compiler will complain about an ambiguous overloaded function.

Of course, you can fine-tune the range seen by for_each() if you want to process a subset of the tuples in a result set. Because an iterator acts like a pointer, you can perform pointer arithmetic with iterators. For example, to advance an iterator by two elements, simply add 2 to the iterator, like this:

...
 result::const_iterator i = myResult.begin();

 i = i + 2;
...

To back up by two elements (that is, to move an iterator two elements toward the beginning of the collection), just subtract 2 from the iterator:

...
 result::const_iterator i = myResult.end();

 i = i - 2;
...

You can subtract one iterator from another to find the distance (the number of tuples) between the two. For example, to find the number of tuples in a result set, you could use the following code:

int tupleCount = myResult.end() - myResult.beg();

You could compute tupleCount that way, but we'll show you a much more intuitive method in a moment.

libpqxx also defines a reverse iterator (result::const_reverse_iterator) that walks through a result set starting with the last tuple and moving toward the first tuple. To print a result set in reverse order, use the following:

...
 result myResult = myTransaction.exec( myCommand, "Get Customer List" );

 for_each( myResult.rbegin(), myResult.rend(), printTuple );
...

When you call rbegin() (as opposed to begin()), you get an iterator that points just past the last tuple in the result set. Calling rend(), as opposed to end(), returns an iterator that points to the first tuple in the result set. When you increment a reverse iterator, you move toward the beginning of the collection (when you increment a forward iterator, you move towards the end).

Avoiding the find() and count() Algorithms

You might be tempted to use the find(), find_if(), count(), and count_if() STL algorithms to search through the tuples in a result. If possible, you should avoid using those functionsif you need to know whether a particular value exists within the result set, refine your WHERE clause and ask the server to find the value you're interested in. If you ask the server to find the value(s) you are interested in or to count data values, you won't suffer the performance hit required to send a large result set over the client/server connection.

Working with Fields (Columns)

Now that you know how to access the rows in a result set, it's time to turn your attention to the columns (libpqxx calls them fields) within a tuple. First off, you can find out how many fields are found within a tuple by calling the tuple::size() member function:

...
 const result::tuple & myTuple = myResult[0];

 cout << "This tuple holds " << myTuple.size() << " columns" << endl;
...

Note that tuple::size() returns the same number of fields for every tuple in a result regardless of whether the tuple contains NULL values.

There are two mechanisms and six member functions you can use to get a field value from a tuple. To access a field by its zero-based column number, you can call the tuple::operator[]() member function or tuple::at(). Like other STL containers, tuple::operator[]() with an invalid index causes undefined behavior (but you can bet you won't like the behavior). tuple::at tHRows an out_of_range exception instead. The following calls are equivalent (assuming that index is valid):

...
 result::field myField;

 myField = myTuple[index];
 myField = myTuple.at( index );
...

Of course, you should call tuple::at() within a try/catch block because it can throw an exception.

If you know the name of a field in the result set, you can access that field by name. tuple overloads both the [] operator and the at() member function to retrieve a field by its name. You can specify a field name as a std::string or as a C-style null-terminated string. To access a field by name using a C-style string, use code similar to the following:

...
 result::field myField;

 myField = myTuple[ "customer_name"];
 myField = myTuple.at( "customer_name" );
...

To access a field, by name, using a std::string:

...
 result::field myField;
 std::string fieldName( "customer_name" );

 myField = myTuple[ fieldName ];
 myField = myTuple.at( fieldName );
...

As you might expect, calling operator[]() with an invalid field name results in undefined behavior. However, if you call at() with an invalid name, you get an invalid_argument exception instead of the usual out_of_range exception.

All together, tuple defines six member functions that return a field:

field operator[]( size_type index) const throw()
field operator[]( const char fieldName[] ) const
field operator[]( const std::string & fieldName ) const
field at( size_type index ) const throw( std::out_of_range )
field at( const char fieldName[] ) const throw()
field at( const std::string & fieldName ) const throw()

You might have noticed that a tuplewhich is an STL-compatible container in most other respectsdoes not provide an iterator you can use to walk through the fields within the tuple. The author of libpqxx might add a column iterator at a later day, but for now you'll have to use operator()[] or at() (or write your own iterators).

Because result defines an operator[]() function that returns a tuple and tuple defines an operator[]() function that returns a field, you can treat a result as a two-dimensional array. For example, to process every field in every tuple in a result, you could use a code snippet similar to the following:

...
 for( row = 0; row < myResult.size(); ++row )
 for( col = 0; col < myResult[0].size(); ++col )
 processField( myResult[row][col] );
...

A field object can tell you a lot about itself. If you want to know the name of a field, call the field::name() member function (which returns a C-style null-terminated string):

...
 for( col = 0; col < myResult[0].size(); ++col )
 cout << "Column " << col <<
 " is named " << myResult[0][col].name() << endl;
...

To retrieve the data type of a field, call the field::type() member function:

oid columnType = myField.type();

Notice that field::type() returns an OID. The OID you get back from field::type() corresponds to an entry in the server's pg_type table. To map an OID back into a type name, simply execute the following query:

SELECT typname FROM pg_type WHERE oid = value;

Plugging in the return value from field::type() in place of value. libpqxx does not currently provide a way to retrieve the type modifiers associated with a field. That means, for example, that you can't query a DECIMAL field for its precision or scale.

You can call the field::is_null() function (which returns a bool) to determine whether a given field is NULL. is_null() returns true if the field is NULL and returns false if the field is non-NULL.

Finally, you can find the size of a field's value (in bytes) by calling field::size().

Which brings us around to two important questions: How do you get to the value of a field, and how is that value represented? All field values are stored internally as C-style null-terminated strings, which we'll refer to as "c_str() form". To get to the null-terminated string, call the field::c_str() member functionc_str() returns a pointer to the first character in the string. Here's a code snippet that prints every column in every row returned by a query[4]:

[4] libpqxx defines an operator<< function that makes this code snippet even easier: You can omit the call to c_str() when you write a field to a stream. cout << myResult[row][col] will do the job nicely.

...
 for( row = 0; row < myResult.size(); ++row )
 {
 for( col = 0; col < myResult[0].size(); ++col )
 cout << myResult[row][col].c_str() << ' ';
 cout << endl;
 }
...

The field class defines a number of template functions you can use to convert values from c_str() form into some other data typenamely, the type of the template parameter. The conversion functions can seem a little mysterious at first, but after you've seen a few examples, they make perfect sense. The template bool field::to( T & obj ) conversion function (or field::to() for short) is the easiest to understand. This function converts a field value from its normal c_str() form into some other type (T). field::to() returns true if the conversion succeeded, returns false if the field is NULL, or throws an std::domain_error if the conversion fails (a conversion might fail if, for example, you try to convert a string that contains letters into an int). Here's an example:

...
 const result::field & myField = myResult[0]["balance"];
 float fldValue( 0.0F );

 try
 {
 if( myField.to( fldValue ))
 cout << "Customer balance: " << fldValue << endl;
 else
 cout << "Customer balance: NULL" << endl;
 }
 catch( std::domain_error )
 {
 cerr << "Can't convert customer_balance into float form" << endl;
 }
....

Look closely at the call to myField.to( fldValue ). If the conversion succeeds, fldValue contains the value of the field, converted from c_str() form to float form. If the value of myField happens to be NULL, fldValue remains untouchedthat is, fldValue retains its initial value of 0.0F. If the conversion fails, field::to() throws an exception.

If you prefer, you can use the second variant of field::to() to provide an explicit value to use in place of NULL; the first form of field::to() leaves the destination object untouched if the field contains a NULL value. The second form of field::to() expects two arguments: a reference to the destination object and a default value. Here's the same code snippet you just saw, rewritten to use the second form of field::to():

...
 const result::field & myField = myResult[0]["balance"];
 float fldValue;

 try
 {
 myField.to( fldValue, float( 0.0F )))

 cout << "Customer balance: " << fldValue << endl;
 }
 catch( std::domain_error )
 {
 cerr << "Can't convert customer_balance into float form" << endl;
 }
....

Notice that you don't have to check the return code from myField::to()if the function returns (that is, if it doesn't throw an exception) fldValue returns a predictable value. If myField contains NULL, fldValue is explicitly set to 0.0F. The data type of the default value must match the data type of the destination objectin this example, the default value and destination object are both of type float.

field:to() knows how to convert a c_str() value into any of the following data types:

long
unsigned long
int
unsigned int
short
unsigned short
float
double
long double
bool
std::string
std::stringstream

The field class defines two more conversion (template) functions that you might find useful. The template T field::as( const T & defaultValue ) function (or field::as( defaultValue ) for short) converts a field value from c_str() form into type T, substituting defaultValue if the field happens to contain NULL. Like field::to(), field::as( defaultValue ) throws an std::domain_error if the conversion fails. field::as( defaultValue ) can convert into the same set of data types that field::to() knows how to deal with (long, unsigned long, int, and so on). So what's the difference between field::to() and field::as( defaultValue )? When you call field::to(), it converts the field value into an object (you provide a reference to the target object) and field::to() returns a Boolean value to indicate whether the field contains a NULL value. When you call field::as( defaultValue ), it creates an object of the desired type and returns a copy of that object. Here's a snippet that shows field::as( defaultValue ) in action:

...
 const result::field & myField = myResult[0]["balance"];

 try
 {
 cout << "New customer balance: " << myField.as( float(0.0F))+10.0F << endl;
 }
 catch( std::domain_error )
 {
 cerr << "Can't convert customer_balance into float form" << endl;
 }
....

Notice that you don't have to create a temporary float object here; field::as( defaultValue ) does that for you. Using field::as( defaultValue ) instead of field::to() can make your code more understandable.

You might be wondering how field::as( defaultValue ) decides which type of object to produce (C++ compilers can't infer the data type based on the return value). The answer is devilishly simple: field::as( defaultValue ) produces an object whose type matches the type of defaultValue, which brings us around to the last conversion function defined by class field. The second variant of field::as() is similar to field::as( defaultValue ) except that it doesn't let you specify a defaultValue. When you call the second variant, it throws an std::domain_error exception if the field contains a NULL value. In addition, it throws an std::domain_error if the conversion fails. If you can't define a defaultValue when you call this variant of field::as(), how does the function know what kind of object to return? You have to tell itand if you're not a hard-core C++ hacker, the syntax looks a little strange. Here's the same code snippet you saw earlier, this time rewritten to use the second form of field::as():

...
 const result::field & myField = myResult[0]["balance"];

 try
 {
 cout << "New customer balance: " << myField.as() + 10.0F << endl;
 }
 catch( std::domain_error )
 {
 cerr << "Can't convert customer_balance into float (or balance is NULL)"
 << endl;
 }
....

A call to this variant of field::as() is written myField.as(), where typename indicates the kind of object you want.

A Few Words About Qualified Data Type Names

All the libpqxx data types are defined in the pqxx namespace to help segregate libpqxx names from other libraries. That's good, but it can result in a lot of typing if you include pqxx:: at the beginning of every data type name. Instead, you can add a using namespace pqxx; directive near the top of your C++ source file (just after the #include directives). Likewise, libpqxx defines a few classes inside the result class. For example, the complete name of the tuple data type is really pqxx::result::tuple. Again, this involves a lot of extra typing. Make your source code a little easier to read and define a few new type names to make it easier to deal with libpqxx's well-structured nesting:

[View full width]

typedef pqxx::result::tuple tuple; // a synonym for 'pqxx:

:result::tuple' typedef pqxx::result::field field; // a synonym for 'pqxx:

:result::field' typedef pqxx::result::const_iterator result_iterator; // You get

the idea...

Summary of Metadata Stored in result, tuple, and field Objects

That's a lot of information to digestthe result, tuple, and field classes are very rich. Here's a quick summary of the metadata and data-access functions you'll find in these three classes.

A result set is represented by an object of class result. You get a result object when you call transaction<>::exec() to execute a command on the server.

A result is a collection of zero or more rows, each represented by an object of class tuple. You can access the tuples within a result by calling result::at() or result::operator[](), as shown here:

const tuple result::operator[]( size_type rowNumber ) throw();
const tuple result::at( size_type rowNumber ) throw( std::out_of_range );

Or, you can call result::begin() or result::rbegin() to obtain an iterator (or reverse iterator) you can use to walk through the tuples in a result. The prototypes for the iterator factories are

const_iterator result::begin();
const_iterator result::end();
const_reverse_iterator result::rbegin();
const_reverse_iterator result::rend();

A result object contains a wealth of metadata as well. To determine whether the command returned any tuples, call result::empty(). To find the number of tuples in the result set, call result::size(), and to find the number of columns, call result::columns(). The prototypes for these three functions are shown here:

size_type result::size();
bool result::empty();
tuple::size_type result::columns() throw();

You can also retrieve the name and data type of each column within a result using these functions:

const char * result::column_name( tuple::size_type columnNumber );
oid result::column_type( tuple::size_type columnNumber );
oid result::column_type( const std::string & columnName );
oid result::column_type( const char columnName[]);

Notice that you can retrieve the data type of a column by column name or column number.

You can find the column number of a column, if you know its name, by calling either of these two functions:

tuple::size_type result::column_number( const char columnName[]);
tuple::size_type result::column_number( const std::string & columnName);

Finally, the result class provides two functions you can call after you execute a command other than SELECT. The result::affected_rows() function returns the number of rows affected by an INSERT, UPDATE, or DELETE command. If you've just executed an INSERT command, you can call result::inserted_oid() to find the object id of the new row.

size_type result::affected_rows();
oid result::inserted_oid();

result::inserted_oid() returns the constant InvalidOid if you've just executed a command other than INSERT, if you've added more than one row, or if you've added a row to a table that does not store OIDs.

A tuple is a collection of one or more values, each represented by an object of class field. You can access the fields within a tuple by calling field::at() or field::operator[](), as shown here:

field tuple::operator[]( size_type columnNumber ) throw();
field tuple::operator[]( const char columnName[] );
field tuple::operator[]( const std::string & columnName );
field tuple::at( size_type columnNumber ) throw( std::out_of_range );
field tuple::at( const char columnName[] );
field tuple::at( const std::string & columnName );

You can use tuple::at() and tuple::operator[]() to find a column by its zero-based columnNumber or its columnName. If you try to find a column by name and the result set contains two or more columns with that name, the results are unpredictable (currently, you'll find the first column with the given name, but that behavior is undocumented and might change in the future).

If you want to retrieve the row number of a tuple (that is, the offset from the first tuple), call tuple::rownumber(). To find the number of columns in a tuple, use the tuple::size() function. These two functions are defined as follows:

result::size_type tuple::rownumber() throw();
size_type tuple::size() throw();

Like the result class, tuple provides a number of ways to find the column number and data type of a given column:

size_type tuple::column_number( const std::string & columnName );
size_type tuple::column_number( const char columnName[] );
oid tuple::column_type( size_type columnNumber );
oid tuple::column_type( const std::string & columnName );
oid tuple::column_type( const char columnName[] );

Unlike the result class, tuple does not provide a function that returns a column's name given its column numberthat seems like an oversight and might be corrected in a future version.

Finally, an object of type field represents the value of one column in one row of a result set. The field class defines four functions that expose the metadata associated with a value:

const char * field::name();
oid field::type();
bool field::is_null();
size_type field::size();

To access the value stored within a field as a C-style null-terminated string, call the field::c_str() function. To convert a field value into a different data type, call any of the field::to() or field::as() functions. The prototypes for these functions are shown here:

 const char *field::c_str();
template bool field::to( T & destination );
template bool field::to( T &destination, const T & defaultValue );
t emplate T field::as( const T & defaultValue );
template T field::as();

The Relationship Between libpqxx and libpq

Like most of the client-side APIs, libpqxx is a wrapper around the libpq interface, which is often used directly by PostgreSQL applications written in C. libpqxx is a lightweight wrapper. When you make a copy of a result object, you aren't duplicating the entire result set of a command; you're duplicating a pointer to libpq's copy of the result set. When you make a copy of a tuple, you're not duplicating all the values within that tupleyou're duplicating a pointer to the result set plus a row number. The same holds true for a field object: It's simply a pointer (of some sort) into the libpq result set. That means that libpqxx is an inexpensive wrapper. libpqxx uses a reference-counting scheme to ensure that the libpq result set is not discarded until all the pointers have been destroyed.

Now that you know how to access all the data and metadata stored inside a result and how to execute a command on the server, it's time to put everything together into a simple client application. Listings 10.610.9 show a simple client that uses libpqxx to execute a hard-wired query and print the results.

Listing 10.6. client3.cc (Part 1)

 1 /* client3.cc */
 2
 3 #include 
 4 #include 
 5 #include 
 6
 7 using namespace std;
 8 using namespace pqxx;
 9
10 typedef pqxx::result::tuple tuple;
11 typedef vector widthVector;
12
13 // Forward function declarations
14 static void getColumnWidths(const result & res, widthVector & widths);
15 static void printColumnHdrs(const result & res, const widthVector & widths);
16 static void printColumnVals(const result & res, const widthVector & widths);
17
18 int main( int argc, char * argv[] )
19 {
20 try
21 {
22 connection myConnection( argc > 1 ? argv[1] : "" );
23 transaction<> myTransaction( myConnection, "Get Customer List" );
24 const result & customerList =
25 myTransaction.exec( "SELECT * FROM customers" );
26 widthVector columnWidths( customerList.columns());
27
28 getColumnWidths( customerList, columnWidths );
29 printColumnHdrs( customerList, columnWidths );
30 printColumnVals( customerList, columnWidths );
31
32 myTransaction.commit();
33
34 }
35 catch( ... )
36 {
37 cerr << "Unknown exception caught" << endl;
38 exit( EXIT_FAILURE );
39 }
40
41 exit( EXIT_SUCCESS );
42 }
43

Lines 35 #include the header files required for this application. The only unusual header here is you need that header to use the stream manipulators you'll use to set column widths and justifications. Line 10 defines a shorthand synonym for result::tuple, and line 11 defines a class named widthVector you'll use to collect the maximum required width for each column.

Next you see the forward declarations for the three workhorse functions (in lines 1416). getColumnWidths() walks through a result set, finds the longest value in each column, and stores the required column widths in a widthVector. The printColumnHdrs() function uses the column widths computed by getColumnWidths() and the column names stored in the result to format and print column headers. printColumnVals() prints each value in every tuple in the result set (again, using the column widths computed by getColumnWidths() to format the values into tabular form).

Lines 2143 show the main() function for this application. client3 starts by creating a connection (myConnection) to a server. If you invoke client3 with any command-line arguments, the first argument is assumed to be a connection string and myConnection uses the properties in that string to find the server. If you invoke client3 without any command-line arguments, myConnection uses the built-in default connection properties and environment variables listed in Table 5.2 to connect to the server. If the connection attempt fails, myConnection throws an exception and the program jumps to the catch() clause at lines 3539. (Note: we've omitted most of the error handling code in the application for the sake of clarity.)

When the connection is up and running, client3 creates a transaction<> object named myTRansaction. mytransaction is assigned a descriptive name of "Get Customer List" so any error messages that show up will be a little easier to track down. At line 24 you see a call to myTRansaction.exec()exec() sends the text of the query (SELECT * FROM customers) to the server and waits for the entire result set to accumulate within the customerList object.

When mytransaction.exec() returns, the result set (customerList) is ready for use. At line 26, client3 creates a widthVector, named columnWidths, that holds the maximum width of each column in the result set. Notice that you can create a widthVector with exactly the right number of elements by initializing it with the number of columns returned by customerList.columns().

At lines 2830, client3 calls three helper functions (which I'll show you in a moment) to compute the width of each column, print column headers, and print the data values stored in result. The call to myTRansaction.commit() (see line 32) completes the transaction and commits any changes to the database.

Listing 10.7. client3.cc (Part 2)

44 static void getColumnWidths( const result & res, widthVector & widths )
45 {
46 result::size_type row;
47 tuple::size_type col;
48
49 // Compute width of widest value in each column
50 for( row = 0; row < res.size(); ++row )
51 for( col = 0; col < res.columns(); ++col )
52 if( res[row][col].size() > widths[col] )
53 widths[col] = res[row][col].size();
54
55 // Make room for any column name which happens to be
56 // longer than the widest value in that column
57 for( col = 0; col < res.columns(); ++col )
58 if( strlen( res.column_name( col )) > widths[col] )
59 widths[col] = strlen( res.column_name( col ));
60 }
61

Listing 10.7 shows the getColumnWidths() function. This function expects a (const) reference to a result object and a reference to a widthVector. It finds the longest value in each column in the result and stores the maximum length in the corresponding element in the widthVector. You'll need this vector to print column names and data values in tabular form.

Lines 50 and 51 traverse every row and column in the result set. You could use any of the tuple enumeration methods here: result::at(), result::operator[](), or a pair of iterators. Lines 52 and 53 hunt down the longest string in a given column and store the width in widths[col]. When the loop (lines 5053) completes, widths[n] holds the length of the longest string found in column n.

Lines 5559 walk through the column names stored in the result. If any given column name is longer than the longest value in that column, the widths[] element corresponding to that column is adjusted to reflect the length of the column name. When this loop completes, widths[n] holds the length of the longest string found in column n or the length of the column name, whichever is longer.

Listing 10.8. client3.cc (Part 3)

62 static void printColumnHdrs(const result & res, const widthVector & widths)
63 {
64
65 std::ostream out( cout.rdbuf()); // Construct a new stream so we can
66 // change the format flags and
67 // options without mucking up cout
68
69 for( tuple::size_type col = 0; col < res.columns(); ++col )
70 out << setw( widths[ col ] ) << left << res.column_name( col ) << " ";
71
72 out << endl;
73
74 for( tuple::size_type col = 0; col < res.columns(); ++col )
75 out << setw( widths[ col ] ) << setfill( '-' ) << "" << " ";
76
77 out << endl;
78
79 }
80

Listing 10.8 shows the printColumnHdrs() function. This function prints the name of each column in the given result (res) and then, on the next line, prints a line of separator characters (dashes) to show the width of each column. Here's a sample of the output generated by printColumnHdrs():

id customer_name phone birth_date balance
-- -------------------- -------- ---------- -------

printColumnHdrs() does its work by changing the formatting options of an output stream using stream manipulators. To avoid making permanent changes to the output stream, printColumnHdrs() clones std::cout (see line 65) and applies the necessary changes to the clone. That way, the formatting options for std::cout remain unchanged.

The loop at lines 69 and 70 prints the name of each column found in res. Before writing the column name (res.column_name( col )), printColumnHdrs() calls the setw() stream manipulator to set the field width to match the column width computed earlier by getColumnWidths(). The left manipulator tells the output stream to left-justify the column nameif the column name happens to be shorter than the column width, the name appears at the left side of the column.

The loop at lines 74 and 75 prints a sequence of dashes under each column name. Again, the setw() manipulator adjusts the field width for each column. The setfill() manipulator sets the stream's fill character to -. The fill character is used to pad the next value to the field width defined by setw(). Next, printColumnHdrs() writes an empty string to the output stream; because the empty string has a length of zero, you end up with a string of fill characters of just the right length.

Listing 10.9. client3.cc (Part 4)

 81 static void printColumnVals(const result & res, const widthVector & widths)
 82 {
 83 result::const_iterator i;
 84
 85 for( i = res.begin(); i != res.end(); ++i )
 86 {
 87 const tuple & tup = *i;
 88
 89 for( tuple::size_type col = 0; col < tup.size(); ++col )
 90 {
 91 // Note: because of a bug in some STL implementations,
 92 // setw() doesn't work as expected when followed
 93 // by a std::string so we convert to a const char *
 94 // instead
 95 cout << setw( widths[ col ] ) << tup[col].c_str() << " ";
 96 }
 97
 98 cout << endl;
 99 }
100 }

The printColumnVals() function in Listing 10.9 completes client3. printColumnVals() uses a result::const_iterator to walk through the tuples in the given result set (res). Take a close look at line 87notice that when you de-reference a result::const_iterator, you get a tuple (in this case, you don't need a copy of the tuple so printColumnVals() captures a tuple reference instead). Like printColumnHdrs(), this function changes the field width of the output stream (cout) just before printing each column value. You might be thinking that you could rewrite line 95 to look like this:

cout << setw( widths[ col [ ) << tup[col] << " ";

That should work because libpqxx defines an operator<< function that writes a field directly onto a stream. But, as the code comment says, a bug in some STL implementations (most notably, GNU's libstdc++ library) makes it impossible to set a stream's field width before printing an std::string.

Working with Large-Objects

Most of the tables you create are defined in terms of simple data types. You already know that PostgreSQL provides numeric, textual, date-time, geometric, and logical data types. But what data type should you use to store photographs? Or MP3 audio files?

One answer is a large-object (you might also see the term binary-large-object [BLOB]). A large-object is just an entry in the pg_largeobject system table. PostgreSQL provides a few predefined functions that make working large-objects easy. When you store a large-object in a PostgreSQL database, it's saved in one or more rows in the pg_largeobject table. Each row in pg_largeobject stores a 2KB chunk of your object. How do you store a large-object (say, a photograph) on your own table? You don't. You add a pointer to a large-object insteadthat is, you store the large-object's OID in your table.

A second alternative is the BYTEA data type. A column of type BYTEA can store an arbitrarily sized string of octets (also known as bytes). The BYTEA data type is similar to the VARCHAR data type, but some important differences exist. First, a VARCHAR value cannot hold a character whose value is 0I'm not talking about the character 0 whose value is actually 48 (see http://www.asciitable.com); I mean the character often called NULL. A BYTEA value can hold any 8-bit character. Second, a VARCHAR value is defined in terms of some specific character set (usually US ASCII). This means the collation sequence used when you compare two VARCHAR values might be based on something other than just the numeric value of each byte. When you compare two BYTEA values, the relationship between the two values is determined by comparing the numeric value of each character.

Whether you choose to use the large-object interface or the BYTEA data type depends mostly on how large your data is and what you need to do with it. A BYTEA column can hold up to 1GB; a large-object can hold values larger than 1GB. PostgreSQL provides a few functions that make loading binary data from an external file into a large-object easy. Loading external data into a BYTEA column isn't quite so easy, though. When you insert data into a BYTEA column, you must translate the data into a quoted (also called escaped) form (see Chapter 2, "Working with Data in PostgreSQL"). When you SELECT data from a BYTEA column, it comes back in quoted form, and that's not always easy to work with because you have to parse through the result and unquote it yourself. When you retrieve data from a large-object, you get the same binary data you put into it, but you have to get at the data using some special functions, described in this section.

For more information on the BYTEA data type, refer to Chapter 2. This section describes how to work with large-objects using libpqxx.

Let's say you want to add a picture to the tapes table and for each tape, you want to store a photograph of the box that was shipped with the tape. Currently, the tapes table looks like this:

CREATE TABLE tapes
(
 tape_id character(8),
 title character varying(80)
);

Because you aren't actually storing a photograph in this table (remember that large-objects are stored in the pg_largeobject table), you add a large-object identifier instead. A large-object identifier has a data type of OID. Here's what the new tapes table looks like after adding the row reference:

CREATE TABLE tapes
(
 tape_id character(8),
 title character varying(80),
 photo_id oid
);

It's important to remember that the photo_id column doesn't actually hold a photograph; it holds the address of a row in the pg_largeobjects table.

To store a photo in PostgreSQL, you might use the lo_import() function. lo_import() takes a filename as an argument and returns an OID as a result. Here's an example:

INSERT INTO tapes VALUES
(
 'AA-55892',
 'Casablanca',
 lo_import('/tmp/casablanca.jpg' )
);

The call to lo_import() opens the /tmp/Casablanca.jpg file, imports the contents of that file into the pg_largeobjects table, and returns the OID of the new large-object. You insert the OID into the photo_id column.

After you have a photo in your database, what can you do with it? It doesn't make a lot of sense to SELECT the photo from a text-based client because you would just see a lot of binary garbage.

You could use the lo_export() function to copy a photo back out to the filesystem. For example

SELECT lo_export( photo_id, '/tmp/casa2.jpg' )
WHERE tape_id = 'AA-5892';

libpqxx defines four interrelated classes that let you manipulate large-objects from within your own client applications:

largeobject A largeobject object stores a large-object ID (or loid)
largeobjectaccess Provides low-level access to a large-object
olostream An STL-compatible ostream you can use to write objects into a large-object
ilostream An STL-compatible istream you can use to read object from a large-object

You can create a large-object (that is, create an empty entry in the pg_largeobjects table) using an object of type largeobject or largeobjectaccess. Each class defines a constructor that will create an empty large-object:

largeobject( dbtransaction & transaction );
largeobjectaccess( dbtransaction & transaction,
 openmode mode = ios::in | ios::out );

The mode argument determines what you can do with the largeobjectaccess object. If mode is std::ios::in, you can read data from the large-object into other objects. If mode is std::ios::in, you can write data into the large-object. If mode is std::ios::in | std::ios::out (which is the default value), you can read data from or write data to the large-object.

You can also create a large-object and fill it with a copy of an existing file (such as a photograph or MP3 file) by calling a different constructor:

largeobject( dbtransaction & transaction, std::string fileName );
largeobjectaccess( dbtransaction & transaction, std::string fileName,
 openmode mode = ios::in | ios::out );

When you call one of these constructors, it creates a new large-object, opens the given file, and copies the content of the file into the new large-object. If libpqxx can't open the file you've named, the constructor throws a runtime_error exception.

If you've retrieved a large-object ID from the server, you can convert it into a largeobject object using this constructor:

largeobject( oid largeObjectID );

Or you can convert it into a largeobjectaccess object using this constructor:

largeobjectaccess( dbtransaction & transaction, oid largeObjectID );

After you have an object of type largeobject (or largeobjectaccess), you can retrieve the loid (large-object ID) with the id() member function:

oid largeobject::id();
oid largeobjectaccess::id();

In most cases, you'll store the loid in some other table (the tapes.photo_id column, for example). But you might want to write a large-object into a file. You can do that with the to_file() member functions:

void largeobject::to_file( dbtransaction & transaction, std::string fileName );
void largeobjectaccess::to_file( std::string fileName );

If libpqxx can't write to the file you've named, the constructor throws a runtime_error exception.

At this point, you're probably wondering about the distinction between largeobject and largeobjectaccess. They both provide access to server-side large-objects; they can both import an existing (client-side) file into a large-object; and they can both export a large-object to a client-side file. In fact, that's just about all they have in common. Think of a largeobject as a way to work with large-object identifiers. The largeobject class defines operators you can use to compare two large-object IDs (operator==(), operator!=(), operator<(), and so on). A largeobjectaccess object, on the other hand, lets you manipulate the content of a large-object.

The largeobjectaccess class defines three member functions that write a chunk of data to a large-object:

off_type cwrite( const char data[], size_type byteCount ) throw();
void write( const char data[], size_type byteCount );
void write( const std::string & data );

The cwrite() function writes, at most, byteCount bytes worth of data to the large-object, returning the number of bytes actually written, or -1 if an error occurs. The write() functions copy data into the large-object as well, but they throw a runtime_error exception if something goes wrong.

largeobjectaccess defines two member functions you can use to read data from a large-object:

off_type cread( char destination[], size_type byteCount ) throw();
size_type read( char destination[], size_type byteCount );

cread() reads, at most, byteCount bytes worth of data from the large-object into the buffer pointed to by destination, returning the number of bytes read, or -1 if an error occurs. The read() function does the same except that it throws a runtime_error exception if an error occurs.

Finally, largeobjectaccess provides two member functions you can use to "move around" inside of a large-object:

pos_type cseek( off_type offset, seekdir origin ) throw();
size_type seek ( size_type offset, seekdir origin );

To understand the cseek() and seek() functions, you have to know that a largeob jectaccess object maintains a pointer within the data stored in a large-object. This pointer always starts out by pointing to the first byte in the large-object. When you read data from a large-object, you start reading at the pointer and the pointer moves to a position just past the last byte that you've read. When you write data to a large-object, the data is written starting at the pointer and the pointer moves to a position just past the last byte you write. The cseek() and seek() functions move the pointer. The first argument (offset) specifies how many bytes you want to move. The second argument (origin) specifies the starting point. If origin is ios::cur, the pointer moves by offset bytes; if offset is positive, the pointer moves toward the end of the large-object; and if offset is negative, the pointer moves toward the beginning of the large-object. If origin is ios::beg, the pointer is positioned exactly offset bytes into the large-object (if you want to move to the beginning of the object, call seek( 0, ios::beg )). If origin is ios::end, the pointer is positioned exactly offset bytes before the end of the large-object (if you want to move to the end of the object, call seek( 0, ios::end )).

We said earlier that largeobjectaccess provides low-level access to a large-object. We said that because, when you use a largeobjectaccess, you're dealing with a sequence of raw bytes. libpqxx provides two more classes that provide high-level (that is, object-oriented) access: olostream and ilostream.

Both olostream and ilostream are derived from std::iostream. Therefore, you can use an olostream or ilostream anywhere you can use an std::iostream. To create an olostream or ilostream, you can call any of the following constructors:

olostream( dbtransaction & transaction, largeobject & obj );
olostream( dbtransaction & transaction, oid objectID );
ilostream( dbtransaction & transaction, largeobject & obj );
ilostream( dbtransaction & transaction, oid objectID );

An olostream is a stream you can use to append a C++ object onto a large-object. An ilostream is a stream you can use to create a C++ object from the content of a large-object.

To write an object to an olostream, use the << operator, just like you would with a standard ostream:

...
 myStream << "This string is written to the large object";
 myStream << 10.0F;
 myStream << int( 42 );
...

To read an object from an ilostream, use the >> operator, just like you would with a standard istream:

...
 std::string myString;
 float myFloat;
 int myInt;

 myStream >> myString;
 myStream >> myFloat;
 myStream >> myInt;
...

LISTEN/NOTIFY

Sometimes you want a client application to wait for some server-side event to occur before proceeding. You might, for example, need to write a queuing system that writes a work order into a PostgreSQL table and then expects a client application to carry out that work order. The most obvious way to write a client of this sort is to put your client application to sleep for a few seconds or minutes and then, when your application awakens, check for a new record in the work order table. If the record exists, go do your work and then repeat the whole cycle.

There are two problems with this approach, however. First, your client application can't be very responsive. When a new work order is added, it can take a few seconds or minutes for your client to notice (it's fast asleep after all). Second, your client application might spend a lot of time searching for work orders that don't exist.

PostgreSQL offers a solution to this problem: the LISTEN/NOTIFY mechanism (not to be confused with the notice mechanism you read about earlier in this chapter). A PostgreSQL server can signal client applications that some event has occurred by executing a NOTIFY eventName command. All client applications that are listening for that event are notified that the event has occurred. You get to choose your own event names (something like workOrderReceived might be a good choice). To inform the server that you are interested in an event, execute a LISTEN eventName command; to tell the server you are no longer interested in an event, simply use UNLISTEN eventName.

A libpqxx application can put itself to sleep until an event is signaled.

To tell libpqxx that you are interested in an event, you create an object derived from pqxx::triggerwhich is not related to a server-side procedural-language triggerand attach that object to a connection. A trigger-derived object is very simple to build: You need only one constructor and an operator() function (of course, you'll need a destructor for nontrivial classes).

Here's a simple TRigger that simply displays a message when its event is signaled:

...
class myTrigger : public trigger
{

public:

 explicit myTrigger( connection & conn, string eventName )
 : trigger( conn, eventName )
 {
 }

 virtual void operator()(int backendPid )
 {
 cout << "Notification: " << name() << " received from process "
 << backendPid << endl;
 }

};
...

The constructor is trivialit simply initializes the parent class with a connection and eventName. When libpqxx is notified that event eventName has occurred, it calls the operator() function, passing it the process ID of the server process that generated the event. If you're only interested in events generated by your server process, compare backendPid with the value returned by conn.backendpid().

To create a mytrigger object, just give it a reference to the connection you're using and the name of the event you're interested in, like this:

...
connection conn( argv[1] );
myTrigger trigger( conn, "workOrderReceived" );
...

This code snippet creates a trigger object that fires whenever a workOrderReceived event is signaled.

Here's the final piece of the puzzle. To put your application to sleep until an event occursspecifically, an event you're interested injust call connection::await_notification(). await_notification() won't return until an interesting event is signaled by the server. If you prefer, you can call an alternative form of await_notification():

int connection::await_notification( );
int connection::await_notification( long seconds, long microseconds );

The second form waits no longer than the specified period of time. Both forms return the number of notifications (events) received.

set_variable() and get_variable()

The PostgreSQL server keeps track of a number of configuration parameters that affect the way the server works (see the section titled "Configuring Your PostgreSQL Runtime Environment" in Chapter 21, "PostgreSQL Administration," for a complete list). For example, the search_path variable determines how objects, such as tables, functions, and so on, are located if you don't provide a fully qualified name. search_path defines the list of schemas to search and the order in which they are searched.

The connection and transaction<> classes both provide member functions you can and should use to modify and interrogate these runtime variables. The set_variable() function changes the value of a variable, and get_variable() returns the current value of a variable. The prototype for each function is shown here:

void connection::set_variable(std::string variableName, std::string newValue);
void transaction<>::set_variable(std::string variableName, std::string newValue);
std::string connection::get_variable(std::string variableName);
std::string transaction<>::get_variable(std::string variableName);

In each case, the function throws an sql_error exception if variableName refers to a nonexistent variable. There's an important difference between connection::set_variable() and transaction<>::set_variable(): When you modify a variable with transaction<>::set_variable(), the variable is restored to its original value if your transaction aborts. If, on the other hand, the transaction commits, the variable retains its new value. Variables you modify with connection::set_variable() retain their values for the life of the connection object. If the physical connection to the server is lost and reestablished behind the scenes, connection::set_variable() restores the values you've given to each variable. That's why you should use set_connection() function(s) to modify runtime variables instead of executing SET commands. If you modify a variable with a SET command, libpqxx won't notice and can't restore the desired value if the connection is lost and reestablished.

Part I: General PostgreSQL Use

Introduction to PostgreSQL and SQL

Working with Data in PostgreSQL

PostgreSQL SQL Syntax and Use

Performance

Part II: Programming with PostgreSQL

Introduction to PostgreSQL Programming

Extending PostgreSQL

PL/pgSQL

The PostgreSQL C APIlibpq

A Simpler C APIlibpgeasy

The New PostgreSQL C++ APIlibpqxx

Embedding SQL Commands in C Programsecpg

Using PostgreSQL from an ODBC Client Application

Using PostgreSQL from a Java Client Application

Using PostgreSQL with Perl

Using PostgreSQL with PHP

Using PostgreSQL with Tcl and Tcl/Tk

Using PostgreSQL with Python

Npgsql: The .NET Data Provider

Other Useful Programming Tools

Part III: PostgreSQL Administration

Introduction to PostgreSQL Administration

PostgreSQL Administration

Internationalization and Localization

Security

Replicating PostgreSQL Data with Slony

Contributed Modules

Index