Section 10.1. Obfuscation | Advanced Perl Programming

10.1. Obfuscation

Detractors of Perl will invariably say something about it looking like line noise; they point to wonderfully obvious, but not necessarily friendly, constructions like @{$_[0]||[ ]} as examples of how ugly Perl can turn out. (However, put the same detractors in front of a COBOL program and they'll probably complain about it being too verboseyou can't please some people.) At any rate, the reputation Perl has achieved for being incomprehensible is largely due to the recreational activities of the Obfuscated Perl Contest^[*] and Perl Golf competitions. (Thanks, guys!)

^[*] The Obfuscated Perl Contest was run by The Perl Journal from 1996 until 2000 and took its inspiration from the International Obfuscated C Contest (http://www.iocc.org/), although for some reason most people don't think of C as looking like line noise.

But obfuscation is not only about producing code that looks like line noise. Obfuscations are really an outlet for creative impulses thatas professional programmerswe can't always use in our day jobs. Working on an obfuscation, we exchange the various operational constraints of work for another set of constraints: aestethics, cleverness, shortness, etc. The fact that the program prints "Just another Perl hacker" or does whatever the rules for the golf hole said is only a side effect.

As an example, we're going to take on a challenge posed on the Belfast Perl Mongers mailing list: write a program to solve the game of Boggle?^[*] as quickly as possible. I wrote a pretty quick algorithm but decided to obfuscate it before entering. Here's the idea I had: first, give the dice coordinates from (1,1) to (4,4). Now we construct a matrix of each die's neighbor:

^[*] Boggle™ is a word game that uses a grid of lettered dice; you have to find find words by tracing paths through adjacent letters, without using the same die twice. So in the grid on this page, you can start at p and trace out peril, but you can't have piper because that would require going over the p twice.

     my @neighbors=     (undef,[undef,[[1,2],[2,1],[2,2]],[[1,1],[1,3],[2,1],[2,2],[2,3]],[[1,     2],[1,4],[2,2],[2,3],[2,4]],[[1,3],[2,3],[2,4]]],[undef,[[1,1],[1,2],[     2,2],[3,1],[3,2]],[[1,1],[1,2],[1,3],[2,1],[2,3],[3,1],[3,2],[3,3]],[[     1,2],[1,3],[1,4],[2,2],[2,4],[3,2],[3,3],[3,4]],[[1,3],[1,4],[2,3],[3,     3],[3,4]]],[undef,[[2,1],[2,2],[3,2],[4,1],[4,2]],[[2,1],[2,2],[2,3],[     3,1],[3,3],[4,1],[4,2],[4,3]],[[2,2],[2,3],[2,4],[3,2],[3,4],[4,2],[4,     3],[4,4]],[[2,3],[2,4],[3,3],[4,3],[4,4]]],[undef,[[3,1],[3,2],[4,2]],     [[3,1],[3,2],[3,3],[4,1],[4,3]],[[3,2],[3,3],[3,4],[4,2],[4,4]],[[3,3]

This tells us that, for instance, the top left die (1,1) has neighbors (1,2), (2,1), and (2,2). Naturally, I used a small Perl program to pre-compute this for speed.

Next, we need to read the board from standard input and also keep two pieces of information about itfirst, we want to know what letters we have on the board in total, and also we want to have a hash that looks up dice by the letter on them. That's to say, given the following board:

     iane     nvpo     oire     ewlg

we can look up "e" and get (1,4), (3,4), and (4.1).

     for my $line (1..4) {       chop(my $row = <>);       my @row = split //, $row;       $has.=$row;       push @{$where{ $row[$_] }}, [$line, $_] for 1..4;     }

(We used one-based arrays to simplify the generation of the @neighbors variabledetecting that $neighbors[0] is undef is easier than having to check that we don't accidentally get an inaccurate answer by looking at $neighbors[-1].)

Next, we read in a word list and determine whether we can find the word in the Boggle grid. You might think it would be better to look through the Boggle grid and see what words you can make, as a human would, but this turns out to involve nearly half a million possibilities, whereas we can get any word list down smaller than that.

The key to this is in noting that we're only interested in words made up of letters that can actually be found on our grid; that's the $has variable:

     while (<>) {         chomp;         next unless /^[$has]{3,9}$/o;         my @stuff = split //, (my $word=$_);

We then locate the first letter of the word in our %where hash, giving us all possible starting points. The path subroutine takes a position, a hash of the positions already visited (so we don't go over the same letter twice) and the letters left to find, and returns a true value if there's a path that traces out the letters:

         for (@{$where{shift @stuff}}) {          print $word            if path($_, { $_->[0] . $_->[1] => 1 }, @stuff);         }

As you might be able to guess, path is a recursive subroutine; it looks at the available positions of the next letter to be found, checks that they're neighbors of the current position and that they're not in the history, and then does the same for the next letter to find.

     sub path {         my ($pos, $history, @left) = @_;         my @neigh = @{ $neighbors[ $pos->[0] ][ $pos->[1] ] };         for my $newpos (@{ $where{ shift @left } }) {             next if (!scalar grep { $newpos->[0] =  = $_->[0] &&                                     $newpos->[1] =  = $_->[1] } @neigh)                  || $history->{ $newpos->[0] . $newpos->[1] }++;             return !@left || path($newpos, $history, @left);             }     }

OK, we've spent a long time explaining the algorithm. Let's go around obfuscating it. First, we look at variable and subroutine names. You'll get no points for having descriptive naming, and brevity is a strong element of obfuscation. Let's replace @neighbors with @n, $history with $h, $nextpos with $n (note the useful distinction between @n and $n), and so on. We also get rid of obvious extraneous whitespace to give us:

     #!/usr/bin/perl -l     my@n=     (undef,[undef,[[1,2],[2,1],[2,2]],[[1,1],[1,3],[2,1],[2,2],[2,3]],[[1,     2],[1,4],[2,2],[2,3],[2,4]],[[1,3],[2,3],[2,4]]],[undef,[[1,1],[1,2],[     2,2],[3,1],[3,2]],[[1,1],[1,2],[1,3],[2,1],[2,3],[3,1],[3,2],[3,3]],[[     1,2],[1,3],[1,4],[2,2],[2,4],[3,2],[3,3],[3,4]],[[1,3],[1,4],[2,3],[3,     3],[3,4]]],[undef,[[2,1],[2,2],[3,2],[4,1],[4,2]],[[2,1],[2,2],[2,3],[     3,1],[3,3],[4,1],[4,2],[4,3]],[[2,2],[2,3],[2,4],[3,2],[3,4],[4,2],[4,     3],[4,4]],[[2,3],[2,4],[3,3],[4,3],[4,4]]],[undef,[[3,1],[3,2],[4,2]],     [[3,1],[3,2],[3,3],[4,1],[4,3]],[[3,2],[3,3],[3,4],[4,2],[4,4]],[[3,3]     ,[3,4],[4,3]]]);     for my $l(1..4){chop(my $r = <>); my @r = split //, $r; $h.=$r;     push @{$w{$r[$_] }}, [$l, $_] for 1..4;}     while (<>) {chomp; next unless /^[$h]{3,9}$/o; my @s = split //, (my     $w=$_);p($_,{$_->[0].$_->[1] => 1},@s) and print $w for @{$w{shift @s}};}     sub p { my ($p, $h, @l) = @_; my @n2 = @{$n[$p->[0]][$p->[1]]};for my     $n (@{$w{shift @l}}) {         next if (!scalar grep {$n->[0]=  =$_->[0]&&$n->[1]=  =$_->[1]} @n2)     ||$h->{$n->[0].$n->[1]}++; return !@l||p($n,$h,@l);}}

Well, this is still just about comprehensible, so we need to add a few more touches. For starters, Perl is amazingly tolerant of whitespace. Not only can we get rid of more whitespace where it's not needed (such as, for instance, all the whitespace in my @r = split //, $r) but we can add whitespace where it might not be expectedfor instance, a pleasing addition is a newline between the $ and r of a variable. We can also trim the code by replacing those undefs with zeros, and swapping a few control structures aroundX&&next instead of next if X, for instance. And, of course, no obfuscated program is complete without being formed into a nice regular shape:

     my@n=(0,[0,[[2,2],[2,1],[1,2]],[[2,3],[2,2],[2,1],[1,3]     ,[1,1]],[[2,4],[2,3],[2,2],[1,4],[1,2]],[[2,4],[2,3],[1     ,3]]],[0,[[3,2],[3,1],[2,2],[1,2],[1,1],[1,2]],[[3,3],[     3,2],[3,1],[2,3],[2,1],[1,3],[1,2],[1,3]],[[3,4],[3,3],     [3,2],[2,4],[2,2],[1,4],[1,3],[1,2]],[[3,4],[3,3],[2,3]     ,[1,4]]],[0,[[4,2],[4,1],[3,2],[2,2],[2,1],[2,2]],[[4,3     ],[4,2],[4,1],[3,3],[3,1],[2,3],[2,2],[2,3]],[[4,4],[4,     3],[4,2],[3,4],[3,2],[2,4],[2,3],[2,4]],[[4,4],[4,3],[3     ,3],[2,4]]],[0,[[4,2],[3,2],[3,1],[3,2]],[[4,3],[4,1],[     3,3],[3,2],[3,3]],[[4,4],[4,2],[3,4],[3,3],[3,4]],[[4,3     ],[3,4]]]);my@b=[  ];my($h,%w);for my $l(0,1,2,3){chop(my     $r=<>);my@r=split//,$r;push@b,[0,@r];push@{$w{$r[$_]}},     [$l,$_]for 1..4;$h.=$r}while (<>){/^[$h]{3,9}$/o||next;     chomp;my@s=split//,(my$w=$_);p($_,{$_->[0].$_->[1]=>1},     @s)&&print$w."\n"for@{$w{shift@s}}}sub p{my($p,$h,@l)=@     _;my@z=@{$n[$p->[0]][$p->[1]]};for my$n(@{$w{shift@l}})     {next if(!scalar grep{$n->[0]=  =$_->[0]&&$n->[1]=  =$_->[1     ]}@z)||$h->{$n->[0].$n->[1]}++;return!@l||p($n,$h,@l)}}

It was at this point that we tripped over a Perl 5.6.0 parser buffer overflow:

         syntax error at boggle.pl.old line 14, near "5{"         Unrecognized character \x02 at boggle.pl.old line 14.

which suggested that this was obfuscated enough.

But I am but an amateur at the obfuscation game; for instance, it's possible to determine what my algorithm does by inspectionit doesn't do anything interesting like rewrite itself on the fly or redefine subroutines at runtime. I don't make use of the interesting properties of Perl's special variables, such as the wonderful discovery that $} (and hence @}, %}, and *}) is a legal variable name that happens to be unused by the system.

For a piece of code that uses all these tricks and more, check out Damian Conway's SelfGOL. This program, if we can call it such, is just more than 1,000 bytes of pure, unadulterated evil. Instead of entering a different program for each category of the Obfuscated Perl Contest, Damian entered "SelfGOL" to all four.

SelfGOL can reproduce itself; it can turn other programs into a quine; it can display a scrolling banner; it plays the Game of Life; and it contains no (ordinary) loops, goto statements, or if statements. Control flow is done, well, interestingly. We reproduce it here in its entirety without comment, as it takes Damian three to seven hours to explain it:

     #!/usr/bin/perl -s     $;=$/;seek+DATA,undef$/,!$s;$_=<DATA>;$s&&print||(*{q;::\;     ;}=sub{$d=$d-1?$d:$0;s;';\t#$d#;,$_})&&$g&&do{$y=($x||=20)*($y||8);sub     i{sleep&f}sub'p{print$;x$=,join$;,$b=~/.{$x}/g,$;}sub'f{pop||1}sub'n{substr($b,     &f%$y,3)=~tr,O,O,}sub'g{@_[~~@_]=@_;--($f=&f);$m=substr($b,&f,1);($w,$w,$m,O)     [n($f-$x)+n($x+$f)-(${m}eq+O=>)+n$f]||$w}$w="\40";$b=join'',@ARGV?<>:$_,$w     x$y;$b=~s).)$&=~/\w/?O:$w)gse;substr($b,$y)=q++;$g='$i=0;$i?$b:$c=$b;     substr+$c,$i,1,g$i;$g=~s?\d+?($&+1)%$y?e;$i-$y+1?eval$g:do{$b=$c;p;i}';     sub'e{eval$g;&e};e}||eval||die+No.$;     _ _DATA_ _     $d&&do{{$^W=$|;*_=sub{$=+s=#([A-z])(.*)#=#$+$1#=g}}     @s=(q[$_=sprintf+pop@s,@s],";\n"->($_=q[     $d&&do{{$^W=$|;*_=sub{$=+s=#([A-z])(.*)#=#$+$1#=g}}'     @s=(q[%s],q[%s])x2;%s;print"\n"x&_,$_;i;eval};     ]))x2;$_=sprintf+pop@s,@s;print"\n"x&_,$_;i;eval};$/=$y;$"=",";print     q<#!/usr/bin/perl -sw     !$s?{do{>.($_=<>).q<}:do{@s=(q[printf+pop@s,@s],q[#!/usr/bin/perl -sw     !$s?{do{>.(s$%$%%$g,y=[=  =  =  =y=]=  =||&d,$_).q<}:do{@s=(q[%s],q[%s])x2;%s}     ])x2;printf+pop@s,@s}     >

These are its command-line switches:

     % selfgol -g           # play the Game of Life     % selfgol -s           # reproduce     % selfgol -d           # display banner     % selfgol -f script.pl # convert script.pl into a quine

We cannot leave the topic of obfuscation without mentioning B::Deparse. More than a few people have come up with the bright idea of writing auto-obfuscation programs in order to safeguard their code against prying eyes for commercial reasons; still others produce horrifyingly obfuscated code that you would prefer to understand before running. Deparse helps with both of these. For instance, Mark-Jason Dominus's entry in the 5th Obfuscated Perl Contest looks like this:

     @P=split//,".URRUU\c8R";@d=split//,"\nrekcah xinU / lreP rehtona tsuJ";sub p{     @p{"r$p","u$p"}=(P,P);pipe"r$p","u$p";++$p;($q*=2)+=$f=!fork;map{$P=$P[$f|ord     ($p{$_})&6];$p{$_}=/ ^$P/ix?$P:close$_}keys%p}p;p;p;p;p;map{$p{$_}=~/^[P.]/&&     close$_}%p;wait until$?;map{/^r/&&<$_>}%p;$_=$d[$q];sleep rand(2)if/\S/;print

If you want to deconstruct this, you could do worse than starting with the following:

     % perl -MO=Deparse mjd-japh     @P = split(??, '.URRUUxR', 0);     @d = split(??, "\nrekcah xinU / lreP rehtona tsuJ", 0);     sub p {         @p{"r$p", "u$p"} = ('P', 'P');         pipe *{"r$p"}, *{"u$p"};         ++$p;         ($q *= 2) += $f = !fork;         map {$P = $P[$f | ord $p{$_} & 6];         $p{$_} = / ^$P/xi ? $P : close *$_;} keys %p;     }     p ;     p ;     p ;     p ;     p ;     map {close *$_ if $p{$_} =~ /^[P.]/;} %p;     wait until $?;     map {<$_> if /^r/;} %p;     $_ = $d[$q];     sleep rand 2 if /\S/;     print $_;

B::Deparse is a module designed to work with the Perl compiler, O.pm. If we tell O to use the Deparse backend, (use O 'Deparse', or perl -MO=Deparse as it's more commonly spelled) instead of spitting out C or Perl bytecode, it spits out Perl as parsed by the Perl parser and then unparsed again.

If you need another hint, the -p option to Deparse can be used to add additional parentheses:

     % perl -MO=Deparse,-p mjd-japh     ...         (($q *= 2) += ($f = (!fork)));         map({($P = $P[($f | (ord($p{$_}) & 6))]);     ..