Item 26: Pass references instead of copies.


Two disadvantages of the "plain old" method of subroutine argument passing are that (1) even though you can modify its elements, you can't modify an array or hash argument itself, and (2) copying an array or hash into @_ takes time. Both of these disadvantages can be overcome with references (see Item 30).
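The first disadvantage can be seen in a minimal sketch. The subroutine names here (double_elements, append_flat, append_by_ref) are hypothetical, chosen only for illustration:

```perl
use strict;
use warnings;

# Elements of @_ are aliases to the caller's values, so they can be changed...
sub double_elements {
    $_ *= 2 for @_;
    return;
}

# ...but pushing onto @_ only changes the subroutine's own argument list.
sub append_flat {
    push @_, 99;
    return;
}

# With a reference, the caller's array itself can be modified.
sub append_by_ref {
    my $array_ref = shift;
    push @$array_ref, 99;
    return;
}

my @data = (1, 2, 3);
double_elements(@data);   # @data is now (2, 4, 6)
append_flat(@data);       # @data is still (2, 4, 6)
append_by_ref(\@data);    # @data is now (2, 4, 6, 99)
print "@data\n";          # prints "2 4 6 99"
```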

Passing arrays and hashes by reference is straightforward:

Print out the contents of an array, with each element prefixed by an index.

    sub print_em {
        my $array_ref = shift;        # Argument is an array ref.
        my $i;
        foreach (@$array_ref) {       # Loop through elements in the array
            print ++$i, ": $_\n";     # and print them.
        }
        return;
    }

Find all the elements in a hash that aren't in an array.

    sub minus {
        # Arguments are a hash ref and an array ref.
        my %hash = %{shift()};        # Make a copy of the hash.
        my $array_ref = shift;

        # Loop through the array and delete hash elements.
        foreach (@$array_ref) {
            delete $hash{$_};
        }
        \%hash;                       # Return ref to the hash we just created.
    }

    %h = (
        H => 1, He => 2, Li => 3, Be => 4
    );
    $h_r = minus \%h, [qw(He Li)];
    print join " ", %$h_r;

This prints the following (the order of the pairs may vary, because hashes are unordered):

    H 1 Be 4

The second example also returns a reference to a hash. It would be considerably less efficient to return %hash itself. Returning "just" the hash variable actually results in %hash being unwound into a list of key-value pairs, which are then used to construct an entirely new hash. This is much less efficient than returning a reference to the hash that already exists, and which will otherwise be destroyed when the subroutine exits. Think of it as recycling.
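The difference between the two kinds of return can be sketched as follows. The helpers build_copy and build_ref are hypothetical names used only for this illustration:

```perl
use strict;
use warnings;

sub build_copy {
    my %hash = (H => 1, He => 2);
    return %hash;          # flattened into a list of key-value pairs
}

sub build_ref {
    my %hash = (H => 1, He => 2);
    return \%hash;         # the hash itself survives; no pairs are copied
}

my %copy = build_copy();   # a new hash is built from the returned list
my $ref  = build_ref();    # refers to the hash created inside build_ref

print "$copy{He} $ref->{He}\n";   # prints "2 2"
print scalar(keys %$ref), "\n";   # prints "2"
```

The caller of build_ref pays for one extra dereference on each access, but the subroutine avoids rebuilding the hash on every return.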

In the days before references, programmers sometimes resorted to passing typeglobs (see Item 57) when it was necessary to pass an array or hash by reference. Here's an example of using typeglobs to construct a subroutine that takes two arrays by reference (using Perl 5 syntax):

Take two arguments the old-fashioned (and inefficient) way. Note that the arguments are passed as typeglobs (*a, *b), not references (\@a, \@b).

    sub two_arrays {
        local *a1 = shift;            # Create a private a1 and a2.
        local *a2 = shift;
        print "a1[1] is $a1[1]\n";
        print "a2[1] is $a2[1]\n";
    }

    @a = 1..3;
    @b = 4..6;
    two_arrays *a, *b;

When run, prints:

    a1[1] is 2
    a2[1] is 5

There is no reason to write code like this any more, but if you deal with a lot of legacy code, you may run into something like it.
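For comparison, here is a minimal sketch of the same subroutine written with references. The name two_arrays_by_ref is hypothetical, and the sketch returns its messages instead of printing them directly so that they are easy to check:

```perl
use strict;
use warnings;

sub two_arrays_by_ref {
    my ($a1, $a2) = @_;    # both arguments are array refs
    return ("a1[1] is $a1->[1]", "a2[1] is $a2->[1]");
}

my @first  = 1 .. 3;
my @second = 4 .. 6;
print "$_\n" for two_arrays_by_ref(\@first, \@second);
# prints "a1[1] is 2" and then "a2[1] is 5"
```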

Using local * on reference arguments

Subroutines that take arguments by reference for speed sometimes lose some of their speed advantage as they continually dereference those arguments. The syntax also becomes distracting and hard to follow. Here's a subroutine that takes two arrays and returns a list made up of the largest elements from the arrays compared pairwise:

Return maximum elements from two arrays, comparing pairwise.

    sub max_v {
        my ($a, $b) = @_;                     # Args are array refs.
        my $n = @$a > @$b ? @$a : @$b;        # $n has count of elements.
        my @result;
        for (my $i = 0; $i < $n; $i++) {      # Compare pairs from @$a and @$b.
            push @result, $$a[$i] > $$b[$i] ?
                $$a[$i] : $$b[$i];
        }
        @result;
    }

Those doubled dollar signs aren't very pretty, are they? One way to get around this problem is to alias variables to the arrays. Assigning a reference to a typeglob has the effect of creating an aliased variable of the type appropriate to the reference:

Return maximum elements from two arrays, comparing pairwise (improved version).

    sub max_v_local {
        local (*a, *b) = @_;                  # Alias the two array ref arguments.
        my $n = @a > @b ? @a : @b;            # Now we can write @a and @b instead.
        my @result;                           # @a and @b are local to this subroutine.
        for (my $i = 0; $i < $n; $i++) {
            push @result, $a[$i] > $b[$i] ?
                $a[$i] : $b[$i];
        }
        @result;
    }

This subroutine is somewhat easier to read once you get past the strange-looking first assignment. It will probably execute faster than the first version. When I tested this example, I saw about a 10 percent speed increase: not enormous, but significant.
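That the typeglob assignment creates an alias rather than a copy can be seen in a smaller sketch. The names @alias and double_in_place are hypothetical, and the package variable is declared with our only so that the code compiles under use strict:

```perl
use strict;
use warnings;

our @alias;    # package array that the typeglob assignment will alias

sub double_in_place {
    local *alias = shift;    # @alias now names the caller's array
    $_ *= 2 for @alias;      # changes are made to the caller's elements
    return scalar @alias;
}

my @nums = (1, 2, 3);
double_in_place(\@nums);
print "@nums\n";             # prints "2 4 6"
```

Because the glob is localized, the previous meaning of @alias is restored when the subroutine returns.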

Passing filehandles

Passing filehandles and dirhandles is a somewhat awkward matter in Perl. In the years B.R. (before references), programmers had to use typeglobs to pass filehandles and dirhandles. Once references were introduced, the FileHandle and DirHandle modules improved the situation somewhat. It also became possible to pass references to typeglobs, which is more efficient than passing "bare" typeglobs. The recently introduced IO module and the so-called *FOO{BAR} (typeglob subscript) syntax have added still more options. Let's look at all of them in brief. The filehandle typeglob looks like this (Perl 5 syntax again):

    sub fh_by_typeglob {
        local *FH = shift;
        print FH "your message here\n";
    }

    open FILE, ">temp.txt" or die $!;
    fh_by_typeglob *FILE;

This trusty old mechanism is still in widespread use, because it is well known and it is relatively efficient: an extra symbol table lookup or two is insignificant if most of what's going on is I/O, and it does take time to load a module like FileHandle.

The FileHandle module creates objects that can be treated like ordinary scalars:

    use FileHandle;

    sub fh_by_FileHandle {
        my $fh = shift;
        print $fh "your message here\n";
    }

    $file = new FileHandle "temp.txt", "w";
    die "couldn't open: $!" unless $file;
    fh_by_FileHandle $file;

The syntax for passing typeglobs by reference shouldn't look all that surprising, but notice that you don't have to dereference a globref to use it as a filehandle:

    sub fh_by_globref {
        my $fh = shift;
        print $fh "your message here\n";    # Typeglob ref is OK in the filehandle slot.
    }

    open FILE, ">temp.txt" or die $!;
    fh_by_globref \*FILE;
 

Recent versions of Perl now include the IO classes, which are intended to eventually replace FileHandle, DirHandle, and other earlier I/O classes. IO::File works very much like FileHandle:

    use IO::File;

    sub fh_by_IOFile {
        my $fh = shift;
        print $fh "your message here\n";
    }

    $file = new IO::File "temp.txt", "w";
    die "couldn't open: $!" unless $file;
    fh_by_IOFile $file;

And, finally, recent versions of Perl have a new kind of reference: the ioref. An ioref is a reference to a structure internal to Perl that describes a filehandle and/or dirhandle. You can create an ioref with the *FOO{BAR} syntax and then use it like a filehandle or an object from IO::File:

    sub fh_by_ioref {
        my $fh = shift;
        print $fh "your message here\n";    # Ioref OK in filehandle slot.
    }

    open FILE, ">temp.txt" or die $!;
    fh_by_ioref *FILE{IO};                  # Create an ioref from a filehandle.

By now you're probably wondering which of these methods you should use. Unfortunately, this is one of those cases where I can't offer you any firm guidance. I will make a few suggestions, though:

  • IO::File is the wave of the future; use it if your version of Perl supports it and you don't have a specific reason to avoid it.

  • Creating an ioref from a plain old filehandle using the *FOO{BAR} syntax is probably the most efficient method available in recent versions of Perl.

  • Lots of people are familiar with passing filehandles via typeglobs, and it's not particularly inefficient.
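One option that the text above does not cover, and which depends on a newer Perl than it describes: from Perl 5.6 on, open can place a filehandle in a lexical scalar, and that scalar is passed like any other argument. The following is only a sketch under that assumption. The name write_message is hypothetical, and the in-memory file (Perl 5.8 and later) is used just so the example leaves nothing on disk:

```perl
use strict;
use warnings;

sub write_message {
    my $fh = shift;                       # a lexical filehandle, passed as a scalar
    print {$fh} "your message here\n";
    return;
}

# Open an in-memory file backed by $buffer instead of a file on disk.
my $buffer = '';
open my $out, '>', \$buffer or die "couldn't open in-memory file: $!";
write_message($out);
close $out or die "couldn't close: $!";

print $buffer;                            # prints "your message here"
```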

The situation for dirhandles is similar. You can use the DirHandle module, iorefs (e.g., $ioref = *DIRH{IO}), or a dirhandle typeglob. You can also use the newer IO::Dir class. Which one you choose will depend on the circumstances; there are no hard and fast rules. However, there will not be too many occasions when you will actually want to pass dirhandles to subroutines. Usually, the pathname is more appropriate, because you can't (not yet, anyway) use a dirhandle in place of a directory pathname:

Recursing through a directory tree

This subroutine counts all the normal files and directories in a directory tree.

    sub count_recurse {
        local *DIRH;                          # local not required, but cleaner.

        # Args are refs to scalar counts, directory name.
        my ($file_ct_ref, $dir_ct_ref,
            $dir_name) = @_;
        $$dir_ct_ref++;                       # Count this directory.

        # Read directory.
        opendir DIRH, $dir_name or
            die "couldn't open $dir_name: $!";
        my @dir = readdir DIRH;
        closedir DIRH;                        # Close DIRH before recursing!

        # Loop through filenames and test them.
        for $file (@dir) {
            next if $file eq '.' or $file eq '..';
            if (-f "$dir_name/$file") {
                $$file_ct_ref++;              # Count a file.
                next;
            }
            # -d _ would be considerably more efficient (see Item 56).
            next unless -d "$dir_name/$file";
            next if -l "$dir_name/$file";     # Skip symlinks.
            count_recurse($file_ct_ref,       # Then recurse.
                $dir_ct_ref, "$dir_name/$file");
        }
    }

    # Demo it on ".". The counts are passed by reference.
    $file_ct = $dir_ct = 0;
    count_recurse \$file_ct, \$dir_ct, ".";
    print "$file_ct files, $dir_ct dirs\n";

Looks like a lot of work, doesn't it? If you are trying to traverse a directory tree, you may not need to go to all this trouble; check out the File::Find module first.
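As a rough sketch of what that might look like, here is the same count written with File::Find. The name count_tree is hypothetical, and the sketch assumes the default behavior of find, which does not follow symbolic links:

```perl
use strict;
use warnings;
use File::Find;

sub count_tree {
    my ($top) = @_;
    my ($file_ct, $dir_ct) = (0, 0);
    find(
        sub {
            # Inside the callback, $_ is the current name and the
            # current directory is the one that contains it.
            if (-f $_) {
                $file_ct++;                    # count a plain file
            }
            elsif (-d _ and not -l $_) {
                $dir_ct++;                     # count a directory, skipping symlinks
            }
        },
        $top
    );
    return ($file_ct, $dir_ct);
}

my ($file_ct, $dir_ct) = count_tree(".");
print "$file_ct files, $dir_ct dirs\n";
```

The starting directory is itself passed to the callback, so it is counted, just as count_recurse counts the directory it is given.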



Effective Perl Programming. Writing Better Programs with Perl
ISBN: 0201419750
Year: 1996
Pages: 116
