Chapter 3. Data Munging
Hacks 19-27
Perl has always been in love with data. No matter where you find it, Perl happily processes and
Perl can be gentle, too. A little subtlety, a little style and finesse, and you can write
Sure, slinging data between sources sounds about as glamorous as slinging hash at the local diner, but it doesn't have to be that way. Here are several ideas to munge that yummy data with all of the
Hack 19. Treat a File As an Array
Pretend a big stream of data on disk is a nice, malleable Perl data structure.
One of the big disappointments in programming is
Fortunately, Mark Jason Dominus's Tie::File module exists, and is even in the core distribution as of Perl 5.8.
The Hack
Imagine you have a million-line CSV file of inventory data from a customer that's just not quite right. You can't import it into a spreadsheet, because that's too much data. You need to do some processing, inserting some lines and rearranging others. Importing the data into a little SQLite database won't work either, because trying to get the queries right is too painful.
Tie::File won't help you write the rules for transforming lines, but it will take the pain out of manipulating the lines of a file. Just:
use Tie::File;

tie my @csv_lines, 'Tie::File', 'big_file.csv'
    or die "Cannot open big_file.csv: $!\n";
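As a rough sketch of what the tie buys you, here is the same idea against a small hypothetical CSV file in a temporary location (created with the core File::Temp module) rather than big_file.csv. Each element of the tied array is one line without its trailing newline, and assigning to an element rewrites that line in the file itself.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Hypothetical three-line CSV file standing in for big_file.csv
my ($fh, $path) = tempfile( UNLINK => 1 );
print {$fh} "id,name,qty\n", "1,widget,10\n", "2,gadget,5\n";
close $fh;

tie my @csv_lines, 'Tie::File', $path
    or die "Cannot open $path: $!\n";

# Elements come back without the record separator (autochomp is on by default)
print "line count: ", scalar @csv_lines, "\n";   # line count: 3
print "header: $csv_lines[0]\n";                 # header: id,name,qty

# Assigning to an element rewrites that line on disk
$csv_lines[2] = '2,gadget,7';

untie @csv_lines;    # flushes any pending writes and closes the file

# Read the file back the ordinary way to show the change landed
open my $in, '<', $path or die "Cannot reopen $path: $!\n";
print <$in>;
close $in;
```

There is no explicit read or write loop: Tie::File finds line boundaries and shuffles the rest of the file when a line changes length.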
Running the Hack
Suppose that your big CSV file contains a list of products and operations. That is, each line is either a list of product data (product id,
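The product fields are not spelled out here, but the regular expression in the processing loop suggests that an operation line is a word, a colon, and a count (for example, reverse:2) that applies to the lines just before it. A minimal sketch of that classification follows; the classify_line helper and the sample lines are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns ('op', NAME, COUNT) for an operation line,
# or ('data') for anything else, using the same pattern as the loop
sub classify_line
{
    my ($line) = @_;
    my ($op, $num) = $line =~ /^(\w+):(\d+)/;
    return defined $op ? ( 'op', $op, $num ) : ( 'data' );
}

# Hypothetical sample: two product lines, then an operation on both of them
for my $line ( '1001,widget,10', '1002,gadget,5', 'reverse:2' )
{
    my ($kind, @details) = classify_line( $line );
    my $msg = $kind eq 'op'
        ? "operation '$details[0]' on the previous $details[1] line(s)"
        : "data: $line";
    print "$msg\n";
}
```

A comma-separated product line never matches, because the pattern requires a colon immediately after the leading run of word characters.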
Tie::File makes this almost trivial:
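my $i = 0;
while ( $i <= $#csv_lines )
{
    my ($op, $num) = $csv_lines[ $i ] =~ /^(\w+):(\d+)/;
    my $op_sub     = defined $op ? __PACKAGE__->can( 'op_' . $op ) : undef;

    # skip data lines, unknown operations, and counts that reach before line 0
    if ( !$op_sub or $num > $i ) { $i++; next }

    my $start    = $i - $num;
    my $end      = $i - 1;
    my @lines    = @csv_lines[ $start .. $end ];
    my @newlines = $op_sub->( @lines );

    # replace the operand lines and the operation line itself
    splice @csv_lines, $start, $num + 1, @newlines;

    # resume after the replacement lines; later indices shift if the count changed
    $i = $start + @newlines;
}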
Okay, there is a bit of
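To see the whole thing run end to end, here is a small self-contained sketch. The op_reverse and op_total operations, the field layout, and the sample lines are all hypothetical. The loop follows the same approach, written as a while loop that resumes scanning right after the replacement lines, since an operation that returns a different number of lines than it consumed shifts every later index.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Hypothetical operations: each receives the operand lines and returns replacements
sub op_reverse { return reverse @_ }    # keeps the line count

sub op_total                            # collapses the lines into one summary line
{
    my $sum = 0;
    $sum += ( split /,/ )[2] for @_;    # assumes the third field is a quantity
    return "TOTAL,,$sum";
}

# Hypothetical input: data lines interleaved with "name:count" operation lines
my ($fh, $path) = tempfile( UNLINK => 1 );
print {$fh} "$_\n" for '1,widget,10', '2,gadget,5', 'reverse:2',
                       '3,gizmo,4', '4,doohickey,6', 'total:2', '5,sprocket,1';
close $fh;

tie my @csv_lines, 'Tie::File', $path
    or die "Cannot open $path: $!\n";

my $i = 0;
while ( $i <= $#csv_lines )
{
    my ($op, $num) = $csv_lines[ $i ] =~ /^(\w+):(\d+)/;
    my $op_sub = defined $op ? __PACKAGE__->can( 'op_' . $op ) : undef;

    # data lines, unknown operations, and counts reaching past line 0 are skipped
    if ( !$op_sub or $num > $i ) { $i++; next }

    my $start    = $i - $num;
    my @newlines = $op_sub->( @csv_lines[ $start .. $i - 1 ] );

    # replace the operand lines and the operation line in one splice
    splice @csv_lines, $start, $num + 1, @newlines;

    # continue after the replacement lines, since the file may have shrunk or grown
    $i = $start + @newlines;
}

untie @csv_lines;
```

After the run, the file should hold 2,gadget,5, 1,widget,10, TOTAL,,10 and 5,sprocket,1, one per line: reverse:2 swapped the first two lines and disappeared, and total:2 collapsed the gizmo and doohickey lines into a single summary line.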
Of course, you can use all of the standard array manipulation operations (push, pop, shift, unshift, and splice) as necessary.
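A short sketch of those operations on a hypothetical four-line file; every call changes the file on disk, not an in-memory copy.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use Tie::File;

# Hypothetical four-line file
my ($fh, $path) = tempfile( UNLINK => 1 );
print {$fh} "$_\n" for 'header', 'a', 'b', 'c';
close $fh;

tie my @lines, 'Tie::File', $path
    or die "Cannot open $path: $!\n";

my $header = shift @lines;          # removes "header" from the top of the file
unshift @lines, 'HEADER';           # puts a new first line in its place
push @lines, 'd';                   # appends a line at the end
my $last = pop @lines;              # removes it again; $last is 'd'
splice @lines, 1, 1, 'a1', 'a2';    # replaces the line at index 1 ('a') with two lines

untie @lines;
```

Afterwards the file reads HEADER, a1, a2, b and c, one per line.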