The libpgtcl Large-Object API | PostgreSQL (2nd Edition)

The libpgtcl library provides a number of procedures that you can use to interact with PostgreSQL large-objects. A large-object is a value that is stored indirectly. When you create a column that will contain a large-object, the column should be of type OID (object-id). When you import a large-object into your database, the bits that make up the object are stored in the pg_largeobject system table and a reference is stored in your table. Large-objects are typically used to hold images, sound files, or large pieces of unstructured data.

There are two ways to create a large-object. First, you can create a large-object using the pg_lo_creat procedure. pg_lo_creat creates a new (empty) entry in the pg_largeobject table and returns the OID of that entry. After you have an empty large-object, you can write data into it using pg_lo_write .

Second, you can import an existing file (such as a JPEG-encoded photograph) into a database using pg_lo_import . The pg_lo_import manual page says that pg_lo_import requires two parameters (a connection handle and a filename) and returns nothing. That documentation is incorrect: pg_lo_import returns the OID of the new large-object.

Here is a code snippet that shows how to use the pg_lo_import procedure:

 ... pg_result [pg_exec $conn "BEGIN WORK"] -clear # Start a transaction set large_object_oid [pg_lo_import $conn "/images/happyface.jpg"] pg_result [pg_exec $conn "COMMIT WORK"] -clear ...

Note that you must call pg_lo_import within a transaction block. In fact, all large-object operations must occur within a transaction block.

The inverse of pg_lo_import is pg_lo_export . pg_lo_export copies a large-object into a file:

 ... pg_result [pg_exec $conn "BEGIN WORK"] -clear # Start a transaction pg_log_export $conn $large_object_oid "/images/jocularface.jpg" pg_result [pg_exec $conn "COMMIT WORK"] -clear ...

Like pg_lo_import , pg_lo_export must be called within a transaction block. You can also read the contents of a large-object using libpgtcl. To start with, you must open the desired large-object using pg_lo_open :

 ... set fd [pg_lo_open $conn $large_object_oid "rw"] ...

When you call pg_lo_open , you provide a connection handle, the OID of the large-object that you want, and an access mode. libpgtcl is a little fickle when it comes to large-object access modes; pg_lo_open expects "r" , "w" , or "rw" ; but pg_lo_create expects "INV_READ" , "INV_WRITE" , or "INV_READINV_WRITE" . The value returned by pg_lo_open is a large-object handle; and after you have one of those, you can read from, write to, or move around in the large-object.

First, let's talk about positioning within a large-object. Large-objects can be, well, large. Your application may not need to read (or write) an entire large-object all at once; for really big large-objects, you may want to work with small chunks . To make this possible, libpgtcl lets you seek your large-object handle to the part that you are interested in; then, you can read or write from there.

The pg_lo_lseek procedure is modeled after the Unix lseek () function. pg_lo_lseek requires three parameters:

 pg_lo_lseek  connection-handle large-object-handle offset starting-point

The connection-handle and large-object-handle parameters are self-explanatory. offset specifies the number of bytes you want to move. starting-point specifies which position you want to move from . SEEK_CUR means that you want to move offset bytes relative to the current position. SEEK_SET means that you want to move offset bytes relative to the start of the object. SEEK_END will position your offset bytes from the end of the object.

If you specify a starting-point of SEEK_CUR or SEEK_END , offset can be either positive or negative (a negative offset moves you toward the beginning of the object). With SEEK_SET , offset should always be a non-negative number. A starting-point of SEEK_SET and an offset of 0 position you to the beginning of the object. A starting-point of SEEK_END and an offset of 0 position you to the end of the object. If you specify a starting-point of SEEK_CUR and an offset of 0, your position within the object remains unchanged.

The pg_lo_tell procedure returns your current position within an object. pg_lo_tell requires two parameters:

 set current_offset [pg_lo_tell  connection-handle large-object-handle  ]

You can determine the number of bytes in a large-object by seeking to the end of the object and then finding the offset:

 ... pg_lo_lseek $conn $object_handle 0 SEEK_CUR set object_size [pg_lo_tell $conn $object_handle] ...

After you have established a position within a large-object, you can read from or write to the object. To write (or modify) data in a large-object, use the pg_lo_write procedure:

 pg_lo_write  connection-handle large-object-handle string length

For example, if you want to append a file onto an existing large-object, you would write code similar to this:

 ... pg_exec $conn "BEGIN" set fd [open "/images/sadface.jpg"] set object_handle [pg_lo_open $conn $large_object_oid "rw"] pg_lo_lseek $conn $object_handle 0 SEEK_END while { [eof $fd] != 1 } {   set val [read $fd 1000]   pg_lo_write $conn $object_handle val [string length $val] } close $fd pg_lo_close $object_handle pg_exec $conn "COMMIT" ...

After opening the file and the large-object, seek to the end of the large-object and then copy from the file handle to the large-object handle, 1000 bytes at a time. We've also called pg_lo_close to close the large-object handle.

When you write to a large-object, you can create holes in the data. For example, if you start out with an empty large-object and then seek 100 bytes into it before calling pg_lo_write , you are creating a 100-byte hole at the beginning of the large-object. Holes are treated as if they contain zeroes. In other words, when you read back this particular large-object, the first 100 bytes will contain nothing but zeroes ^[5] .

^[5] In case you are wondering, PostgreSQL stores each large-object in a collection of blocks. Each block is typically 2048 bytes long. When you create a hole in a large-object, PostgreSQL will store the minimal number of blocks required to hold the object. If a block within a large-object contains nothing but a hole, it will not take up any physical space in the pg_largeobject table.

You can also read from a large-object in a piece-by-piece manner using pg_lo_lseek and pg_lo_read :

 ... pg_exec $conn "BEGIN" set object_handle [pg_lo_open $conn $large_object_oid "r"] pg_lo_lseek $conn $object_handle 0 SEEK_END set len [pg_tell $conn $object_handle] pg_lo_lseek $conn $object_handle 0 SEEK_SET pg_lo_read $conn $object_handle img $len image create photo my_photo my_photo put $img -format gif pg_lo_close $object_handle pg_exec $conn "COMMIT" ...

As before, you must start a transaction block before using any of the large-object procedures. After opening the large-object (using pg_lo_open ), compute the size of the object. The easiest way to find the size of an existing large-object is to seek to the end and then use pg_lo_tell to find the offset of the last byte. After you know the size, you can read the entire object into a string variable using pg_lo_read . In the preceding example, we read the entire large-object in one call to pg_lo_read , but that is not strictly necessary. You can use pg_lo_lseek to move around within the large-object before you read (or write).

One important point here: When you call pg_lo_read (or pg_lo_write ), your position within the object is advanced by the number of bytes read (or written).

The pg_lo_read procedure requires four parameters:

 pg_lo_read  connection-handle object-handle varname length

The connection-handle and object-handle parameters should be familiar by now. The varname parameter should contain the name of a variable ”be careful with this parameter: You don't want to pass the contents of a variable; you want to pass the name. So, the following example will usually be incorrect:

 pg_lo_read $conn $object_handle $img $len

This is likely to be wrong because you are passing the contents of the $img variable, not the name. You most likely want ^[6]

^[6] The only time you would want to pass the value of a variable (as the third parameter) would be when one variable holds the name of another.

 pg_lo_read $conn $object_handle img $len

There is one more large-object procedure that you might need to know about. If you want to remove a large-object from your database, use the pg_lo_unlink procedure:

 pg_unlink $conn $large_object_id