Hack 95. Modify Semantics with a Source Filter


Tweak Perl's behavior at the syntactic level.

In addition to adding new syntax [Hack #94], source code filters can change the behavior of existing Perl constructs. For example, a common complaint about Perl is that you cannot indent a heredoc properly. Instead you have to write something messed-up like:

sub usage {
    if ($::VERBOSE)
    {
        print <<"END_USAGE";
Usage: $0 [options] <infile> <outfile>
Options:
    -z       Zero tolerance on formatting errors
    -o       Output overview only
    -d       Debugging mode
END_USAGE
    }
}

rather than something tidily indented like:

sub usage {
    if ($::VERBOSE)
    {
        print <<"END_USAGE";
            Usage: $0 [options] <infile> <outfile>
            Options:
                -z       Zero tolerance on formatting errors
                -o       Output overview only
                -d       Debugging mode
            END_USAGE
    }
}

Except, of course, you can have your heredoc and indent it too. You just need to filter out the unacceptable indentation before the code reaches the compiler. This is another job for source filters.

The Hack

Suppose that you could use the starting column of a heredoc's terminator to indicate the left margin of each line of the preceding heredoc content. In other words, what if you could indent every line in the heredoc by the same amount as the final terminator marker? If that were the case, then the previous example would work as expected, printing:

$ ksv -z filename
Usage: ksv [options] <infile> <outfile>
Options:
    -z       Zero tolerance on formatting errors
    -o       Output overview only
    -d       Debugging mode

with the start of each line hard against the left margin.
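The margin rule itself takes only a few lines of plain Perl, independent of any source filtering. This is a minimal sketch: the helper name `strip_margin` and its calling convention (terminator line first, then the body lines) are hypothetical and not part of the module that follows.

```perl
use strict;
use warnings;

# Hypothetical helper: remove from each body line the same leading
# whitespace that precedes the heredoc's terminator marker.
sub strip_margin {
    my ($terminator_line, @body) = @_;

    # The left margin is whatever horizontal whitespace precedes the terminator.
    my ($margin) = $terminator_line =~ /^([^\S\n]*)/;

    # Strip exactly that prefix (quoted literally) from every line that has it.
    s/^\Q$margin\E// for @body;
    return join '', @body;
}

my @body = (
    "            Usage: ksv [options] <infile> <outfile>\n",
    "            Options:\n",
    "                -z       Zero tolerance on formatting errors\n",
);
print strip_margin("            END_USAGE\n", @body);
```

Lines indented further than the terminator keep their extra indentation, which is what preserves the nesting of the option list.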

To make that happen in real life, you need a source filter that recognizes indented heredocs and rewrites them as unindented heredocs before they reach the compiler. Here's a module that provides just that:

package Heredoc::Indenting;
use Filter::Simple;

FILTER {
    # Find all instances of...
    1 while
        s{ <<                     #     Heredoc marker
           ( ['"]             )   # $1: Quote for terminator
           ( (?:\\\1|[^\n])*? )   # $2: Terminator specification
             \1                   #     Matching closing quote
           ( [^\n]*  \n       )   # $3: The rest of the statement line
           ( .*? \n           )   # $4: The heredoc contents
           ( [^\S\n]*         )   # $5: Any whitespace indent before...
             \2 \n                #     ...the terminator itself
        }
        # ... and replace it with the same heredoc, with its terminator
        # outdented and the heredoc contents passed through a subroutine
        # that removes the indent from each line...
        {Heredoc::Indenting::outdent(q{$1$2$1}, '$5',<< $1$2$1)\n$4$2\n$3}xms;
};

use Carp;

# Remove indentations from a string...
sub outdent {
    my ($name, $indentation, $string) = @_;

    # Complain if any line doesn't have the specified indentation...
    if ($string =~ m/^((?:.*\n)*?)(?!$indentation)(.*\S.*)\n/m)
    {
        my ($good_lines, $bad_line) = ($1, $2);
        my $bad_line_pos = 1 + ($good_lines =~ tr/\n/\n/);
        croak "Negative indentation on line $bad_line_pos ",
              "of <<$name heredoc specified";
    }

    # Otherwise remove the indentations from each line...
    $string =~ s/^$indentation//gm;
    return $string;
}

1;

The FILTER {...} block tells Filter::Simple how to filter any code that uses the Heredoc::Indenting module. The source code arrives in the $_ variable, and the block then uses a repeated regex substitution to replace each indented heredoc with a regular left-justified heredoc.

The regex is complex because it has to break a heredoc up into: introducer, quoted terminator specification, remainder of statement, heredoc contents, terminator indent, and terminator. The replacement is complex too, as it reorders those components as: outdenter function, introducer, quoted terminator specification, heredoc contents, terminator, and remainder of statement.
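To watch the reordering happen, the following sketch applies the module's pattern and replacement once to an ordinary string holding some source code, standing in for the $_ that FILTER receives. Running the substitution by hand like this, rather than through a real source filter, is a simplification for illustration.

```perl
use strict;
use warnings;

# Stand-in for the source text that the FILTER block receives in $_.
my $source = <<'SOURCE';
        print <<"END_USAGE";
            Usage: $0 [options]
            END_USAGE
SOURCE

# The module's substitution, applied a single time to the string.
$source =~ s{ <<                     #     Heredoc marker
              ( ['"]             )   # $1: Quote for terminator
              ( (?:\\\1|[^\n])*? )   # $2: Terminator specification
                \1                   #     Matching closing quote
              ( [^\n]*  \n       )   # $3: The rest of the statement line
              ( .*? \n           )   # $4: The heredoc contents
              ( [^\S\n]*         )   # $5: Any whitespace indent before...
                \2 \n                #     ...the terminator itself
            }
            {Heredoc::Indenting::outdent(q{$1$2$1}, '$5',<< $1$2$1)\n$4$2\n$3}xms;

print $source;
```

In the output the terminator END_USAGE sits in column zero, the heredoc body is still indented (that is removed later, when outdent runs), and the ; from the original statement line has moved below the terminator.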

This reordering also explains why the FILTER block uses 1 while s/.../.../ instead of s/.../.../g. The /g flag doesn't allow for overlapping matches: each new match starts after the end of the previous one, so the substitution would skip over the rewritten remainder-of-statement component. That remainder might contain another indented heredoc, however, which would then be left unprocessed. In contrast, the 1 while... form rematches the partially rewritten source code from the start, so it correctly handles multiple heredocs on the same line.
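The difference between the two loop styles is easy to check on a string containing two indented heredocs introduced on the same line. This sketch reuses the module's pattern and replacement; the wrapper subroutines rewrite_global and rewrite_repeatedly are hypothetical names used only for the comparison.

```perl
use strict;
use warnings;

# The module's heredoc pattern, compiled once so both variants share it.
my $HEREDOC = qr{
    <<                       #     Heredoc marker
    ( ['"]             )     # $1: Quote for terminator
    ( (?:\\\1|[^\n])*? )     # $2: Terminator specification
      \1                     #     Matching closing quote
    ( [^\n]* \n        )     # $3: The rest of the statement line
    ( .*? \n           )     # $4: The heredoc contents
    ( [^\S\n]*         )     # $5: Any whitespace indent before...
      \2 \n                  #     ...the terminator itself
}xms;

# Variant 1: a single /g pass over the source.
sub rewrite_global {
    my ($src) = @_;
    $src =~ s{$HEREDOC}{Heredoc::Indenting::outdent(q{$1$2$1}, '$5',<< $1$2$1)\n$4$2\n$3}g;
    return $src;
}

# Variant 2: "1 while", rescanning from the start after every change.
sub rewrite_repeatedly {
    my ($src) = @_;
    1 while $src =~ s{$HEREDOC}{Heredoc::Indenting::outdent(q{$1$2$1}, '$5',<< $1$2$1)\n$4$2\n$3};
    return $src;
}

# Two indented heredocs introduced on the same statement line.
my $src = qq{print <<"A", <<"B";\n    a-line\n    A\n    b-line\n    B\n};

print rewrite_global($src), "----\n", rewrite_repeatedly($src);
```

In the /g output, <<"B" survives untouched, because it was part of the $3 text consumed by the first match. The 1 while version rescans, finds it on the second pass, and rewrites it too.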

There's a cunning layout trick used here. Because each heredoc is rewritten as a (modified) heredoc, on the second iteration of the 1 while, the first heredoc it will find is the one it just rewrote, so the substitution is in danger of reprocessing and re-reprocessing and re-re-reprocessing that very first heredoc ad infinitum. To avoid that, the module requires that indented heredocs have no space between their << introducer and their terminator specification, like so:

print <<"END_USAGE";

Then it carefully rewrites each heredoc so that it does have a space between those two components:

{Heredoc::Indenting::outdent(q{$1$2$1}, '$5',<< $1$2$1)\n$4$2\n$3}xms;
#                                              ^
#                                              |

That way, the next time the iterated substitution matches against the source code, it will ignore any already-rewritten heredocs and move on to the first unrewritten one instead.
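One way to convince yourself that the inserted space is what ends the loop is to run the rewrite with and without it and count the passes. The count_rewrites helper and its iteration cap are assumptions added for this sketch, so that the non-terminating case can be observed safely.

```perl
use strict;
use warnings;

# Compact form of the module's heredoc pattern (same captures $1..$5).
my $HEREDOC = qr{ << (['"]) ((?:\\\1|[^\n])*?) \1 ([^\n]*\n) (.*?\n) ([^\S\n]*) \2 \n }xms;

# Count successful rewrites, stopping at $cap so that a rewrite that
# keeps matching its own output is detected instead of looping forever.
sub count_rewrites {
    my ($src, $sep, $cap) = @_;
    my $count = 0;
    while ($count < $cap
           && $src =~ s{$HEREDOC}{outdent(q{$1$2$1}, '$5',<<$sep$1$2$1)\n$4$2\n$3}) {
        $count++;
    }
    return $count;
}

my $src = qq{print <<"A";\n    text\n    A\n};

printf "with space:    %d rewrite(s)\n", count_rewrites($src, ' ', 10);
printf "without space: %d rewrite(s)\n", count_rewrites($src, '', 10);
```

Without the space, each pass finds the heredoc it has just produced and wraps it in yet another call, so only the cap stops it.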

Each heredoc is rewritten to pass through the Heredoc::Indenting::outdent( ) subroutine at runtime. This subroutine removes the specified indentation (passed as its second argument) from the heredoc text, checking for invalid indentations as it does so.
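Because outdent( ) is ordinary Perl, it can be exercised directly. The sketch below copies its logic into the main package (instead of calling it as Heredoc::Indenting::outdent) and feeds it one correctly indented string and one containing a line that starts to the left of the terminator.

```perl
use strict;
use warnings;
use Carp qw(croak);

# The outdent() logic from Heredoc::Indenting, copied into main for testing.
sub outdent {
    my ($name, $indentation, $string) = @_;

    # Complain if any non-blank line lacks the specified indentation...
    if ($string =~ m/^((?:.*\n)*?)(?!$indentation)(.*\S.*)\n/m) {
        my ($good_lines, $bad_line) = ($1, $2);
        my $bad_line_pos = 1 + ($good_lines =~ tr/\n/\n/);
        croak "Negative indentation on line $bad_line_pos ",
              "of <<$name heredoc specified";
    }

    # Otherwise remove the indentation from each line...
    $string =~ s/^$indentation//gm;
    return $string;
}

# Blank lines are allowed to be shorter than the indentation.
my $text = outdent(q{"END"}, '    ', "    Usage:\n        -z  zero tolerance\n\n");
print $text;

# A non-blank line indented less than the terminator is rejected.
my $ok = eval { outdent(q{"END"}, '    ', "    fine\n  too far left\n"); 1 };
print "Error: $@" unless $ok;
```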

Hacking the Hack

As an alternative, the FILTER block itself could run the heredoc contents through outdent( ) as it rewrites them. To do that, the second half of the substitution would look instead like:

{"<< $1$2$1\n" . Heredoc::Indenting::outdent($1.$2.$1, $5, $4) . "$2\n$3"}exms;

with the /e flag allowing you to specify the replacement as an expression to be evaluated, rather than as a simple string.
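The sketch below runs this /e variant over a small source string. To keep it short, the outdent here is a hypothetical pared-down stand-in that strips the indentation but omits the module's negative-indentation check.

```perl
use strict;
use warnings;

# Pared-down stand-in for the module's outdent(): strips the indentation
# but skips the negative-indentation check, to keep the sketch short.
sub outdent {
    my ($name, $indentation, $string) = @_;
    $string =~ s/^\Q$indentation\E//gm;
    return $string;
}

my $source = qq{print <<"END";\n    Usage: ksv\n        -z\n    END\n};

# With /e the replacement is Perl code, so outdent() runs now, during
# filtering, and its result is spliced back in as a plain heredoc body.
1 while $source =~ s{
    << ( ['"] ) ( (?:\\\1|[^\n])*? ) \1
    ( [^\n]* \n ) ( .*? \n ) ( [^\S\n]* ) \2 \n
}{"<< $1$2$1\n" . outdent($1.$2.$1, $5, $4) . "$2\n$3"}exms;

print $source;
```

The result is an ordinary heredoc whose body is already flush left, so no outdent call survives into the compiled program.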

The advantage of this second version of the filter is that each heredoc is outdented only once, at compile time during the original source filtering, rather than every time perl encounters the heredoc at runtime. The disadvantage is that Perl will report any errors during the outdenting as occurring at the use Heredoc::Indenting line, rather than at the position of the offending heredoc in the source code. Although that's entirely accurate (the errors really are occurring while the filtering module loads), it's not very useful to users of the module, who really want to know where their heredocs are broken, not where your module detected the breakage.



Perl Hacks
Perl Hacks: Tips & Tools for Programming, Debugging, and Surviving
ISBN: 0596526741
EAN: 2147483647
Year: 2004
Pages: 141
