7.2 An Arithmetic Grammar | Building Parsers With Javaв„ў

	Building Parsers with Java By Steven John Metsker
	Table of Contents

	Chapter 7. Parsing Arithmetic

Content

To observe four levels of precedence, an arithmetic grammar needs to have four subparsers. Each of these subparsers handles operators of a common precedence, delegating interpretation of other operators to other subparsers. You can use the words expression , term , factor , and phrase to denote the four subparsers.

To begin, you can define expression as observing the lowest level of precedence.

 expression = expression '+' term                expression '' term               term;

This rule suffers from left recursion, but it properly recognizes all pluses and minuses to the left of a term before adding or subtracting the term . To remove the left recursion, it helps to abbreviate the rule

 e = e '+' t  e '-' t  t;

and look at what an e might generate. For example:

 e => e '+' t    => e '-' t '+' t   => e '+' t '-' t '+' t   ...   => e ('+' t  '-' t)*   => t ('+' t  '-' t)*

As a pattern, e generates another e followed by 0 or more ( '+' t ) or ( '-' t ) sequences, until you generate a plain t , using the third alternate choice for e . You can describe this same pattern, avoiding left recursion, as

 e = t ('+' t  '-' t)*;

Or, unabbreviated :

 expression = term ('+' term  '-' term)*;

Using the same logic, you can define term to recognize multiplication and division as follows :

 term = factor ('*' factor  '/' factor)*;

To recognize exponents, associating them to the right, you can define factor as

 factor = phrase '^' factor  phrase;

Notice that this definition requires exponentiation of phrase to occur only after recognition of the factor to the right of the exponent sign. Thus, if a factor contains another exponent sign, the definition calls for a parser to recognize it first, and exponentiation associates to the right. There is no left recursion to remove in this rule, so it is fine as written.

The last rule the grammar needs is for phrase , which defines the highest level of precedence:

 phrase = '(' expression ')'  Num;

Putting the rules together, the grammar is

 expression = term ('+' term  '-' term)*;  term       = factor ('*' factor  '/' factor)*; factor     = phrase '^' factor  phrase; phrase     = '(' expression ')'  Num;

This grammar recognizes properly formed arithmetic expressions. The next step in designing an arithmetic parser is to determine how to use assemblers to evaluate expressions as the parser recognizes them.

Top