Recipe 14.5 Sorting Large DBM Files

14.5.1 Problem

You want to process a large dataset you'd like to commit to a DBM file in a particular order.

14.5.2 Solution

Use the DB_File's B-tree bindings and supply a comparison function of your own devising:

use DB_File; # specify the Perl sub to do key comparison using the # exported $DB_BTREE hash reference $DB_BTREE->{'compare'} = sub {     my ($key1, $key2) = @_ ;     "\L$key1" cmp "\L$key2" ; }; tie(%hash, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE)     or die "can't tie $filename: $!";

14.5.3 Description

An annoyance of hashes, whether in memory or as DBM files, is that they do not maintain proper ordering. The CPAN module Tie::IxHash can make a regular hash in memory maintain its insertion order, but that doesn't help for DBM databases or arbitrary sorting criteria.

The DB_File module supports a nice solution to this using a B-tree implementation. One advantage of a B-tree over a regular DBM hash is its ordering. When the user defines a comparison function, all calls to keys, values, and each are automatically ordered. For example, Example 14-3 is a program that maintains a hash whose keys will always be sorted case-insensitively.

Example 14-3. sortdemo
  #!/usr/bin/perl   #    sortdemo - show auto dbm sorting   use strict;   use DB_File;      $DB_BTREE->{'compare'} = sub {       my ($key1, $key2) = @_ ;       "\L$key1" cmp "\L$key2" ;   };      my %hash;   my $filename = '/tmp/sorthash.db';   tie(%hash, "DB_File", $filename, O_RDWR|O_CREAT, 0666, $DB_BTREE)       or die "can't tie $filename: $!";      my $i = 0;   for my $word (qw(Can't you go camp down by Gibraltar)) {       $hash{$word} = ++$i;   }      while (my($word, $number) = each %hash) {       printf "%-12s %d\n", $word, $number;   }

By default, the entries in a B-tree DB_File database are stored alphabetically. Here, though, we provide a case-insensitive comparison function, so using each to fetch all the keys would show:

by           6 camp         4 Can't        1 down         5 Gibraltar    7 go           3 you          2

This sorting property on hashes is so convenient that it's worth using even without a permanent database. If you pass undef where the filename is expected on the tie, DB_File will create a file in /tmp and then immediately unlink it, giving an anonymous database:

tie(%hash, "DB_File", undef, O_RDWR|O_CREAT, 0666, $DB_BTREE)         or die "can't tie: $!";

Remember these two things if you supply a comparison for your BTREE database. One, the new compare function must be specified when you create the database. Two, you cannot change the ordering once the database has been created; you must use the same compare function every time you access the database.

Using BTREE databases under DB_File also permits duplicate or partial keys. See its documentation for examples.

14.5.4 See Also

Recipe 5.7



Perl Cookbook
Perl Cookbook, Second Edition
ISBN: 0596003137
EAN: 2147483647
Year: 2003
Pages: 501

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net