Section 4.3. Object Databases | Advanced Perl Programming

4.3. Object Databases

While the methods we've seen in the previous section work very well for storing and retrieving individual objects, there are times when we want to deal with a massive collection of data with the same degree of efficiency. For instance, our CD collection may run to thousands of objects, while a simple query applicationfor example, to determine which artist recorded a particular trackwould only use one or two of them. In this case, we don't want to load up the whole object store into memory before we run the query.

In fact, what we could really deal with is the kind of fast, efficient indexing and querying that is the hallmark of traditional relational databases such as Oracle or MySQL, but which dealt with objects in the same way as Pixie. We want an object database.

4.3.1. Object Database Pitfalls

There are not many object databases on CPAN, and with good reason: writing object databases is incredibly difficult.

First, you need to worry about how to pick apart individual objects and store them separately, so that you don't end up with the pruning problem.

Second, you have to work out a decent way to index and query objects. Indexing and querying database rows in general is pretty easy, but objects? This is currently one of the areas that holds Pixie back from being an object database.

Allied with that, you need to work out how you're going to map the properties of your object to storage in a sensible way to allow such indexing; serialization-based solutions don't care about what's inside an object, they just write the whole thing into a string.

Fortunately, you don't really have to worry about these things; you can just use some of the existing solutions.

4.3.2. Tangram

Jean-Louis Leroy's Tangram is a mature and flexible but complex solution to mapping Perl objects onto database rows. Tangram is very explicit in terms of what the user must do to make it work. Except when it comes to filters, which we'll look at in a moment, Tangram is very short on DWIM.

For instance, Tangram relies on the user to provide a lot of class information, which it uses to decide how to map the objects onto the database. This gives you much more flexibility about how the database is laid out, but if you don't particularly care about that, it requires you to do a lot of tedious scaffolding work.

To get Tangram up and running on our CD database, we must first define the schema as a Perl data structure. This tells Tangram the classes we're interested in persisting, as well as which attributes to save and what data types they're going to be. Here's the schema for our classes:

     use Tangram;     use Tangram::TimePiece;     use DBI;     use CD;     our $schema = Tangram::Relational->schema({         classes => [             CD => {                 fields => {                     string => [ qw(title) ],                     timepiece => [ qw(publishdate) ],                     iarray  => {                         songs => {                             class => 'CD::Song',                             aggreg => 1,                             back => 'cd',                         },                     },                 },            },            'CD::Song' => {                fields => {                    string => [ qw(name) ],                }            },            'CD::Artist' => {                abstract => 1,                fields => {                    string => [ qw(name popularity) ],                    iset => {                        cds => {                            class => 'CD',                            aggreg => 1,                            back => 'artist'                        },                    },                },            },            'CD::Person' => {                bases  => [ "CD::Artist" ],                fields => {                    string => [ qw(gender haircolor) ],                    timepiece => [ qw(birthdate) ],                },            },            'CD::Band' => {                bases  => [ "CD::Artist" ],                fields => {                    timepiece => [ qw(creationdate enddate) ],                    set => {                        members => {                            class => 'CD::Person',                            table => "artistgroup",                        },                    },                },            },         ]});            $dbh = DBI->connect($data_source,$user,$password);            Tangram::Relational->deploy($schema, $dbh);            $dbh->disconnect();

With the schema built and deployed, we can store, retrieve, and search for objects via Tangram::Storage objects, and for so-called remote objects, which represent a class of objects of a particular type in storage.

4.3.2.1 Tangram CRUD: create, read, update, delete

We can create and insert objects, like so:

     my ($cd, @songs, $band, @people);     my $tom = CD::Band->new         ({ name => "Tom Waits",           popularity => "1",           cds => Set::Object->new           (            $cd =            CD->new({title => "Rain Dogs",                     songs => [                   @songs = map {CD::Song->new({ name => $_ })}                   "Singapore", "Clap Hands", "Cemetary Polka", ...                              ],                   }),           ),         });     # stick it in     my $storage = Tangram::Storage->connect($schema, $data_source, $username, $password);     my $oid = $storage->insert($tom);     my $id = $storage->export_object($tom);

Later, we can retrieve objects either by their object ID, or by class and ID:

     # Object ID     $band = $storage->load($oid);     # Class and ID - polymorphic select     $band = $storage->import_object("CD::Artist", $id);

The import_object method is polymorphic, meaning that it can load the CD::Artist object with ID $id, even though that object is actually a CD::Band object.

However, selecting by storage ID is not enough to get us by. We also need to be able to query objects based on some specification of which objects we want.

With Tangram, you first fetch a remote object, representing a database-side object. In its blank state, this remote object could represent any object in the database of that type. You then write expressions that refer to a subset of those objects with regular Perl operators:

     my $r_artist = $storage->remote("CD::Artist");     my @artists = $storage->select         ( $r_artist,           $r_artist->{name} eq "Tom Waits" );     my $r_cd = $storage->remote("CD");

It may look like that second parameter to select is going to return a single (false) value and the select isn't going to work; however, Tangram is more magical than that. First, the remote object doesn't represent a single artistit represents all the possible artists. Second, $r_artist->{name} returns an overloaded object, and just as we saw in the first chapter, we can use overloading to determine how objects behave in the presence of operators like eq. Here, the Tangram::Storage class overloads all the comparison operators to return Tangram::Filter objects; these objects store up all the comparisons and use them to represent a WHERE statement in the SQL select.

Tangram's query filters are extremely expressive:

     my $join = ($r_cd->{artist} eq $r_artist);     my $query =         ( $r_artist->{name}->upper(  )->like(uc("%beat%"))           | $r_cd->{title}->upper(  )->like(uc("%beat%")) );     my $filter = $join & $query;     my $cursor = $storage->cursor ( $r_cd, $filter );     my @cds=(  );     while ( my $cd = $cursor->current ) {         print("found cd = " ,$cd->title,               ", artist = ", $cd->artist->name, "\n");         $cursor->next;     }

Note that in the above example, we built the query keeping join conditions and query fragments seperate, combining them to pass to the Tangram::Storage function. Tangram uses a single & for AND and a single | for OR (see Tangram::Expr). We also used a Tangram::Cursor to iterate over the returned results, rather than slurping them all in at once. Finally, the CD::Artist object corresponding to each CD object is fetched via a back-reference.

A back-reference is an example of a third method of traversing a Tangram stored object structure: through the relationships of the object. Tangram ships with seven types of object relationship classes: many-to-one relationships (references), one-to-many relationships (intrusive or foreign key relationships, with three variants: Sets, Arrays, and Hashes), as well as many-to-many relationships (relationships connected via a link tableagain with three variants of Set, Array, and Hash).

So, once we have the @artists, we can retrieve the associated information just by following the Perl object structure. This is implemented via on-demand storage references.

     @cds = $artists[0]->cds->members;  # Set::Object     my @tracks = @{ $cds->[0]->songs };   # Array

So, we've covered create and readwhat about updates? Updates are performed by $storage->update:

     my ($pfloyd) = $storage->select         ( $r_artist,           $r_artist->{name} eq "Pink Floyd" );     $cd;     $pfloyd->cds->insert         ($cd=          CD->new({ title => "The Dark Side of The Moon",                    publishdate => Time::Piece->strptime("2000-04-06", "%y-%m-%d"),                    songs => [ map { CD::Song->new({ name => $_ }) }                               "Speak To Me/Breathe", "On The Run",                             "Time", "The Great Gig in the Sky",                               "Money", "Us And Them",                               "Any Colour You Like", "Brain Damage",                             "Eclipse",                           ],                  })         );     $pfloyd->popularity("legendary");     $storage->update($pfloyd);     $storage->id($cd);

So far we've demonstrated three points about Tangram's update facilities. The final aspect of Tangram's CRUDdeleting objectsis done with $storage->erase( ):

         my (@gonners) = $storage->select             ($r_artist,              $r_artist->{popularity} eq "one hit wonder");         $storage->erase(@gonners);

Tangram has excellent transaction support, mature object caching abilities, functions to deal with short-term dirty read problems, and the orthogonal ability to perform schema migration using two database handles. Its debugging output, selected with the environment variable TANGRAM_TRACE or by setting the Perl variable $Tangram::TRACE to a filehandle, provides a clear picture of what queries are being run by your program.

Its major downsides are that it does not support partially reading objects (only complete rows), it cannot easily be queried with raw SQL expressions, and it does not deal with indexing (the assumption being that the database administrator can set up appropriate indexes, or that creating such indexes happens independently of the normal schema deployment).