6.3 Eliminating Left Recursion | Building Parsers With Javaв„ў

	Building Parsers with Java By Steven John Metsker
	Table of Contents

	Chapter 6. Transforming a Grammar

Content

Grammar G2 in the preceding section describes the Minimath language, but it translates to code that will not work. Once again, the grammar is

 // Grammar G2  e = e '' Num  Num; // suffers from left recursion!

The problem with G2 is that e depends directly on e . When e tries to parse text, it starts by trying its first alternative. Thus, trying e means trying e , which means trying e . This loops indefinitely.

It does not help to reverse the alternation , writing

 // Grammar G2b  e = Num  e '' Num; // suffers too!

Here, when e tries its second alternative, it will try to match e , which has two alternatives. When e matches its second alternative, it will try to match e again. This cycling continues until the Java virtual machine runs out of stack space. In other words, it loops indefinitely. For example, compiling and running the following class will hang or crash, depending on your compiler and virtual machine implementation:

 package sjm.examples.minimath;  import sjm.parse.*; import sjm.parse.tokens.*; /**  * This class uses a problematic grammar for Minimath. For  * a better grammar, see class <code>MinimathCompute</code>.  * Here, the grammar is:  *  * <blockquote><pre>  *     e = Num  e '-' Num;  * </pre></blockquote>  *  * Writing a parser directly from this grammar will show  * that left recursion will hang a parser.  */ public class MiniLeftRecursion { public static void main(String args[]) {     Alternation e = new Alternation();     Num n = new Num();     Sequence s = new Sequence();     s.add(e);     s.add(new Symbol('-').discard());     s.add(n);     e.add(n);     e.add(s);     // now hang (or crash)     e.completeMatch(new TokenAssembly("25 - 16 - 9")); } }

It may be tempting to "fix" the grammar by switching the parsers around the minus sign, giving

 e = Num  e  Num; // suffers from incorrect associativity!

This grammar for Minimath is insufficient because it subtracts numbers in the wrong order. You must eliminate left recursion without discarding the proper associativity of a grammar.

To solve the problem of left recursion, it helps to think of the grammar as generating the language rather than recognizing it. For example, given an e in grammar G2 , you can generate e “ Num . If you use the symbol " => " to mean "generates," you can apply G2 to write

 e => e  Num => e  Num - Num => e  Num  Num  Num  Num

This series of generations applies the e “ Num side of the alternation in grammar G2 . You can ultimately apply the other side, namely that e = Num , to get

 e => ... => e - Num  Num  Num  Num           => Num - Num  Num  Num  Num

By looking at the way e generates language elements, it becomes apparent that you could write the grammar as

 e = Num ('-' Num)*;

Equivalently, you can write this as

 e = Num m*;  m = '-' Num;

This arrives at the first grammar we used for Minimath. This grammar avoids left recursion, and it associates the subtraction in the right order. The class MinimathCompute uses a translation of this grammar, augmenting it with assemblers to generate a correct Minimath result:

 package sjm.examples.minimath;  import sjm.parse.*; import sjm.parse.tokens.*; import sjm.examples.arithmetic.*; /**  * This class provides a parser that recognizes minimal  * arithmetic expressions, specifically allowing only the  * '-' operator. The rules of the Minimath language are:  *  * <blockquote><pre>  *     e = Num m*;  *     m = '-' Num;  * </pre></blockquote>  *  * ...  */ public class MinimathCompute { public static void main(String args[]) {     Sequence e = new Sequence();     Num n = new Num();     n.setAssembler(new NumAssembler());     e.add(n);     Sequence m = new Sequence();     m.add(new Symbol('-').discard());     m.add(n);     m.setAssembler(new MinusAssembler());     e.add(new Repetition(m));     TokenAssembly t = new TokenAssembly("25 - 16 - 9");     Assembly out = e.completeMatch(t);     System.out.println(out.pop()); } }

Running this class prints the conventional value of 25 - 16 - 9 :

0.0

6.3.1 An Algorithm

You can always eliminate left recursion without changing associativity when you have a grammar of this form:

 e = e f  g; // Grammar with left recursion

Such a grammar suffers from left recursion, but clearly its meaning is, "Substitute e f for e as many times as you like; then substitute g for e ." For example:

 e => e f => e f f => e f f f => e f f f f => g f f f f

You can achieve this same pattern with this grammar:

 e = g f*; // Equivalent grammar without left recursion

If your grammar suffers from left recursion, use these steps to transform it to an equivalent grammar that does not have left recursion. This algorithm works with most left-recursive grammars encountered in practice. If, however, your grammar suffers from indirect left recursion, you need a more advanced algorithm. See Compilers [Aho et al.] for an algorithm that works with any grammar.

Top