2.2. Parse::Yapp

If you're more familiar with tools like yacc, you may prefer to use François Désarménien's Parse::Yapp module. This is more or less a straight port of yacc to Perl.
For instance, let's use Parse::Yapp to implement the calculator in Chapter 3 of lex & yacc (O'Reilly). This is a very simple calculator with a symbol table, so you can say things like this:

    a = 25
    b = 30
    a + b
    = 55

Here's their grammar:

    %{
    double vbltable[26];
    %}

    %union {
        double dval;
        int vblno;
    }

    %token <vblno> NAME
    %token <dval> NUMBER
    %left '-' '+'
    %left '*' '/'
    %nonassoc UMINUS

    %type <dval> expression
    %%
    statement_list: statement '\n'
        |   statement_list statement '\n'
        ;

    statement: NAME '=' expression { vbltable[$1] = $3; }
        |   expression { printf("= %g\n", $1); }
        ;

    expression: expression '+' expression { $$ = $1 + $3; }
        |   expression '-' expression { $$ = $1 - $3; }
        |   expression '*' expression { $$ = $1 * $3; }
        |   expression '/' expression
                {   if ($3 == 0.0)
                        yyerror("divide by zero");
                    else
                        $$ = $1 / $3;
                }
        |   '-' expression %prec UMINUS { $$ = -$2; }
        |   '(' expression ')' { $$ = $2; }
        |   NUMBER
        |   NAME { $$ = vbltable[$1]; }
        ;
    %%

Converting the grammar is very straightforward; the only serious change we need to consider is how to implement the symbol table. We know that Perl's internal symbol tables are just hashes, so that's good enough for us. The other changes are just cosmetic, and we end up with a Parse::Yapp grammar like this:

    %{
    my %symtab;
    %}

    %token NAME
    %token NUMBER
    %left '-' '+'
    %left '*' '/'
    %nonassoc UMINUS

    %%
    statement_list: statement '\n'
        |   statement_list statement '\n'
        ;

    statement: NAME '=' expression { $symtab{$_[1]} = $_[3]; }
        |   expression { print "= ", $_[1], "\n"; }
        ;

    expression: expression '+' expression { $_[1] + $_[3] }
        |   expression '-' expression { $_[1] - $_[3] }
        |   expression '*' expression { $_[1] * $_[3] }
        |   expression '/' expression
                {   if ($_[3] == 0)
                        { $_[0]->YYError("divide by zero") }
                    else
                        { $_[1] / $_[3] }
                }
        |   '-' expression %prec UMINUS { -$_[2] }
        |   '(' expression ')' { $_[2] }
        |   NUMBER
        |   NAME { $symtab{$_[1]} }
        ;
    %%

As you can see, we've declared a hash %symtab to hold the values of the names. Also, notice that the yacc variables $1, $2, etc. become real subroutine parameters in the @_ array: $_[1], $_[2], and so on. There is no $$ to assign to; instead, the value of the last expression evaluated in an action becomes the value of the rule.

Next we need to produce a lexer that feeds tokens to the parser. Parse::Yapp expects a subroutine to take input from the data store of the parser object. The Parse::Yapp object is passed in as the first parameter to the lexer, and so the data store ends up looking like $_[0]->YYData->{DATA}. The lexing subroutine should modify this data store to remove the current token, and then return a two-element list.
The list should consist of the token type followed by the token data. For instance, in our calculator example, we need to tokenize 12345 as ("NUMBER", 12345). Operators, parentheses, the equals sign, and the newline should be returned as themselves, and names of variables need to be returned as ("NAME", "whatever"). At the end of the input, we need to return an empty string and undef: ('', undef).

Here's a reasonably simple Perl routine that does all of that:

    sub lex {
        # print " Lexer called to handle (".$_[0]->YYData->{DATA}.")\n";
        $_[0]->YYData->{DATA} =~ s/^ +//;
        return ('', undef) unless length $_[0]->YYData->{DATA};

        $_[0]->YYData->{DATA} =~ s/^(\d+)// and return ("NUMBER", $1);
        $_[0]->YYData->{DATA} =~ s/^([\n=+\(\)\-\/*])// and return ($1, $1);
        $_[0]->YYData->{DATA} =~ s/^(\w+)// and return ("NAME", $1);

        die "Unknown token (".$_[0]->YYData->{DATA}.")\n";
    }

Now that we have our grammar and our lexer, we need to run the grammar through the command-line utility yapp to turn it into a usable Perl module. If all is well, this should be a silent process:

    % yapp Calc.yapp
    %

and we should have a new file Calc.pm ready for use.
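To see what the parser will receive, the lexing rules above can be exercised on their own, without a generated parser. The following is only a sketch: FakeParser and tokens_for are hypothetical names invented for the illustration, and FakeParser provides nothing beyond the YYData method that the lexer relies on.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser object: the lexer only needs
# a YYData method that returns a hash reference.
package FakeParser;
sub new    { return bless { data => {} }, shift }
sub YYData { return $_[0]{data} }

package main;

# Same rules as the lexer in the text, written against a reference
# to the buffer to cut down on repetition.
sub lex {
    my $buf = \$_[0]->YYData->{DATA};
    ${$buf} =~ s/^ +//;
    return ('', undef) unless length ${$buf};
    ${$buf} =~ s/^(\d+)//             and return ("NUMBER", $1);
    ${$buf} =~ s/^([\n=+\(\)\-\/*])// and return ($1, $1);
    ${$buf} =~ s/^(\w+)//             and return ("NAME", $1);
    die "Unknown token (${$buf})\n";
}

# Collect every [type, value] pair the lexer yields for one input string.
sub tokens_for {
    my ($input) = @_;
    my $p = FakeParser->new;
    $p->YYData->{DATA} = $input;
    my @tokens;
    while (1) {
        my ($type, $value) = lex($p);
        last if $type eq '';          # ('', undef) marks end of input
        push @tokens, [$type, $value];
    }
    return @tokens;
}

for my $t (tokens_for("a = 2+4\n")) {
    # Show the newline token as \n so the listing stays readable.
    my ($type, $value) = map { $_ eq "\n" ? '\n' : $_ } @$t;
    print "[$type, $value]\n";
}
```

Because the \d+ rule is tried before the \w+ rule, a run of digits is always reported as a NUMBER, even when letters follow it.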
We can now put it all together: our parser, the lexer, and some code to drive them.

    sub lex {
        $_[0]->YYData->{DATA} =~ s/^ +//;
        return ('', undef) unless length $_[0]->YYData->{DATA};

        $_[0]->YYData->{DATA} =~ s/^(\d+)// and return ("NUMBER", $1);
        $_[0]->YYData->{DATA} =~ s/^([\n=+\(\)\-\/*])// and return ($1, $1);
        $_[0]->YYData->{DATA} =~ s/^(\w+)// and return ("NAME", $1);

        die "Unknown token (".$_[0]->YYData->{DATA}.")\n";
    }

    use Calc;

    my $p = Calc->new( );
    undef $/;
    $p->YYData->{DATA} = <STDIN>;
    $p->YYParse(YYlex => \&lex);

This will take a stream of commands on standard input, run the calculations, and print the results, like this:

    % perl calc
    a = 2+4
    b = a * 20
    b + 15
    ^D
    = 135

For most parsing applications, this is all we need. However, in the case of a calculator, you hardly want to put all the calculations in first and get all the answers out at the end. It needs to be more interactive.

What we need to do is modify the lexer so that it can take data from standard input, using the YYData area as a buffer.

    sub lex {
        $_[0]->YYData->{DATA} =~ s/^ +//;
        unless (length $_[0]->YYData->{DATA}) {
            return ('', undef) if eof STDIN;
            $_[0]->YYData->{DATA} = <STDIN>;
            $_[0]->YYData->{DATA} =~ s/^ +//;
        }

        $_[0]->YYData->{DATA} =~ s/^(\d+)// and return ("NUMBER", $1);
        $_[0]->YYData->{DATA} =~ s/^([\n=+\(\)\-\/*])// and return ($1, $1);
        $_[0]->YYData->{DATA} =~ s/^(\w+)// and return ("NAME", $1);

        die "Unknown token (".$_[0]->YYData->{DATA}.")\n";
    }

This time, we check to see if the buffer's empty, and instead of giving up, we get another line from standard input. If we can't read from that, then we give up. Now we can intersperse results with commands, giving a much more calculator-like feel to the application.
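The refill-on-empty behavior can be tried without a terminal by pointing the lexer at an in-memory filehandle. What follows is a sketch under stated assumptions rather than the chapter's own code: FakeParser, all_tokens, and the FH key in the data store are hypothetical names, and the handle is read through FH instead of STDIN so that the example is self-contained. It also refills in a loop rather than once, so a line containing only spaces is skipped.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser object (only YYData is needed).
package FakeParser;
sub new    { return bless { data => {} }, shift }
sub YYData { return $_[0]{data} }

package main;

# Line-at-a-time lexer: when the buffer runs dry, read one more line
# from the handle stored under the (hypothetical) FH key.
sub lex {
    my $data = $_[0]->YYData;
    $data->{DATA} = '' unless defined $data->{DATA};
    $data->{DATA} =~ s/^ +//;
    until (length $data->{DATA}) {
        return ('', undef) if eof($data->{FH});   # no more input at all
        $data->{DATA} = readline($data->{FH});    # refill the buffer
        $data->{DATA} =~ s/^ +//;
    }
    $data->{DATA} =~ s/^(\d+)//             and return ("NUMBER", $1);
    $data->{DATA} =~ s/^([\n=+\(\)\-\/*])// and return ($1, $1);
    $data->{DATA} =~ s/^(\w+)//             and return ("NAME", $1);
    die "Unknown token ($data->{DATA})\n";
}

# Return the sequence of token types produced for a block of text.
sub all_tokens {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    my $p = FakeParser->new;
    $p->YYData->{FH}   = $fh;
    $p->YYData->{DATA} = '';
    my @types;
    while (1) {
        my ($type) = lex($p);
        last if $type eq '';
        push @types, $type eq "\n" ? 'NEWLINE' : $type;
    }
    return @types;
}

print join(' ', all_tokens("a = 2+4\nb = a * 20\n")), "\n";
```

Each call hands back one token, and a new line is read only after the previous one has been consumed completely, which is what lets the real calculator print a result as soon as a statement's newline has been parsed.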