Hack 19. Treat a File As an Array


Pretend a big stream of data on disk is a nice, malleable Perl data structure.

One of the big disappointments in programming is realizing that, although you can think of a text file as a long list of properly terminated lines, to the computer, it's just a big blob of ones and zeroes. If all you need to do is read the lines of a file and process them in order, you're fine. If you have a big file that you can't load into memory and can't process each line in order...well, good luck.

Fortunately, Mark Jason Dominus's Tie::File module exists, and is even in the core as of Perl 5.8.0. What good is it?

The Hack

Imagine you have a million-line CSV file of inventory data from a customer that's just not quite right. You can't import it into a spreadsheet, because that's too much data. You need to do some processing, inserting some lines and rearranging others. Importing the data into a little SQLite database won't work either because trying to get the queries right is too troublesome.

Tie::File won't help you write the rules for transforming lines, but it will take the pain out of manipulating the lines of a file. Just:

    use Tie::File;

    tie my @csv_lines, 'Tie::File', 'big_file.csv'
        or die "Cannot open big_file.csv: $!\n";
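Once the array is tied, ordinary element access reads and writes the file. Here is a minimal sketch of that, assuming a small temporary file stands in for big_file.csv; the sample rows and variable names are made up for illustration.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical stand-in for big_file.csv, so the example is self-contained.
my ($fh, $filename) = tempfile( SUFFIX => '.csv', UNLINK => 1 );
print {$fh} "1,widget,9.99,Acme\n2,gadget,19.99,Acme\n3,gizmo,4.50,Initech\n";
close $fh or die "Cannot close $filename: $!\n";

tie my @csv_lines, 'Tie::File', $filename
    or die "Cannot open $filename: $!\n";

# Each element is one line of the file, without its trailing newline.
my $count  = @csv_lines;       # number of lines in the file
my $second = $csv_lines[1];    # random access by line number

# Assigning to an element rewrites that line on disk.
$csv_lines[0] = '1,widget,10.99,Acme';

# Inserting a line moves the rest of the file down.
splice @csv_lines, 1, 0, '1a,widget cover,2.00,Acme';

# Untie when finished so that any pending writes reach the file.
untie @csv_lines;
```

By default, Tie::File removes the line ending when you read a record and puts it back when you write one, so there is no chomp in sight.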

Running the Hack

Suppose that your big CSV file contains a list of products and operations. That is, each line is either a list of product data (product id, name, price, supplier, et cetera) or some operation to perform on the previous n products. Operations take the form opname:number. Obviously the file would be easier to process if the operations appeared before the data on which to operate, but you can't always change customer data formats to something sane. In fact, this might be the easiest way to clean the data for other processes.
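To make the format concrete, here is a small sketch that scans some invented sample lines and reports which product lines each operation covers. The product data and the operation names (discount, reorder) are assumptions for illustration, not part of any real customer format.

```perl
use strict;
use warnings;

# Hypothetical sample: three product lines, an operation on those three,
# then two more product lines and an operation on those two.
my @sample = (
    '1001,widget,9.99,Acme',
    '1002,gadget,19.99,Acme',
    '1003,gizmo,4.50,Initech',
    'discount:3',
    '1004,doohickey,1.25,Initech',
    '1005,thingamajig,7.00,Acme',
    'reorder:2',
);

my @found;
for my $i ( 0 .. $#sample ) {
    # Product lines have a comma after the first field, so only lines of
    # the form opname:number match.
    next unless my ($op, $num) = $sample[$i] =~ /^(\w+):(\d+)/;

    # The operation covers the $num lines immediately before it.
    push @found, { op => $op, first => $i - $num, last => $i - 1 };
}

printf "%s applies to lines %d..%d\n", @{$_}{qw( op first last )}
    for @found;
```

Nothing here modifies the list, so a plain range loop is safe.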

Tie::File makes this almost trivial:

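    my $i = 0;

    while ( $i <= $#csv_lines ) {
        # Operation lines look like opname:number; anything else is data.
        my ($op, $num) = $csv_lines[ $i ] =~ /^(\w+):(\d+)/;
        my $op_sub     = defined $op && __PACKAGE__->can( 'op_' . $op );

        unless ( $op_sub ) { $i++; next }

        my $start      = $i - $num;
        my $end        = $i - 1;
        my @lines      = @csv_lines[ $start .. $end ];
        my @newlines   = $op_sub->( @lines );

        # Replace the product lines and the operation line itself.
        splice @csv_lines, $start, $num + 1, @newlines;

        # The splice changed the array's length; resume after the new lines.
        $i = $start + @newlines;
    }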

Okay, there is a bit of cleverness in finding the right range of lines to modify, but consider how much trickier the code would have to be to do this while looping through the file a line at a time.
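The loop relies on subroutines named op_ plus the operation name, which the hack leaves up to you. Here is a minimal sketch of how that dispatch behaves, using two made-up handlers (op_reverse and op_drop) and a plain in-memory list in place of the tied array.

```perl
use strict;
use warnings;

# Hypothetical handlers: each receives the affected product lines and
# returns the lines that should replace them.
sub op_reverse { return reverse @_ }
sub op_drop    { return }            # empty list: delete the lines

my @lines = ( '1,widget,9.99', '2,gadget,19.99', '3,gizmo,4.50' );

# An operation line such as reverse:3 names a handler and a line count.
my ($op, $num) = 'reverse:3' =~ /^(\w+):(\d+)/;

# can() returns a code reference if op_reverse exists in this package.
my $op_sub   = __PACKAGE__->can( 'op_' . $op );
my @newlines = $op_sub ? $op_sub->( @lines[ -$num .. -1 ] ) : @lines;

# A handler may return fewer lines than it was given.
my @dropped  = __PACKAGE__->can( 'op_drop' )->( @lines );

# An unknown operation has no handler, so can() returns a false value.
my $missing  = __PACKAGE__->can( 'op_nosuchop' );
```

Because the handler is an ordinary code reference, it receives only the lines, with no class name as a first argument. Supporting a new operation means writing one more op_ subroutine.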

Of course, you can use all of the standard array manipulation operations (push, pop, shift, unshift, and splice) as necessary.
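Here is a brief sketch of those operations on a tied file, again using a temporary file with invented contents so that it runs on its own.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Hypothetical sample file: a stale header followed by two records.
my ($fh, $filename) = tempfile( UNLINK => 1 );
print {$fh} "stale header\nalpha\nbeta\n";
close $fh or die "Cannot close $filename: $!\n";

tie my @lines, 'Tie::File', $filename
    or die "Cannot open $filename: $!\n";

my $old_header = shift @lines;     # remove the first line of the file
unshift @lines, 'id,name';         # add a new first line
push @lines, 'gamma', 'delta';     # append two lines to the end
my $last = pop @lines;             # remove and return the final line

untie @lines;
```

Changes near the end of the file are cheap. Changes near the start of a large file are likely to be slow, because everything after the change has to move, so batch them where you can.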



Perl Hacks: Tips & Tools for Programming, Debugging, and Surviving
ISBN: 0596526741
Year: 2004
Pages: 141
