3.4 Grammars: A Shorthand for Parsers

Building Parsers with Java
By Steven John Metsker

Table of Contents
Chapter 3.  Building a Parser


A grammar is a collection of related parser definitions in which the definitions follow a standard shorthand. A goal of the design phase in software construction is to illustrate in compact form the important features that will appear in Java code. Consider again the code that builds a parser to recognize a description of a good cup of coffee:

 package sjm.examples.introduction;  import sjm.parse.*; import sjm.parse.tokens.*; /**  * Show how to create a composite parser.  */ public class ShowComposite { public static void main(String[] args) {     Alternation adjective = new Alternation();     adjective.add(new Literal("steaming"));     adjective.add(new Literal("hot"));     Sequence good = new Sequence();     good.add(new Repetition(adjective));     good.add(new Literal("coffee"));     String s = "hot hot steaming hot coffee";     Assembly a = new TokenAssembly(s);     System.out.println(good.bestMatch(a)); } } 

You can more simply describe the good parser with the following shorthand, or grammar:

 good      = adjective* "coffee";  adjective = "steaming"  "hot"; 

This shorthand relies on a few conventions, including showing literal values in quotes, showing alternation with a bar, and showing repetition with an asterisk. The point of using a grammar is that it is far more manageable than its corresponding Java code.

3.4.1 Standard Grammar Shorthand

This book observes the following conventions for writing a grammar, which is a compact definition of a parser.

  1. Show the definition of a parser by giving its name , an equal sign, a subparser expression, and a semicolon. For example, the rule

     adjective = "steaming"  "hot"; 

    defines the makeup of an adjective .

  2. Reference other subparsers as words that begin with a lowercase letter. For example, the rule

     good = adjective* "coffee"; 

    defines a good parser by referring to the adjective subparser.

  3. Show a specific string to match by writing it in quotes. For example, the rule

     adjective = "steaming"  "hot"; 

    uses quotes to show that "steaming" and "hot" are specific strings to match.

  4. Show a specific single character to match by writing it in single quotes, such as '-' in the Minimath rule:

     m = '-' Num; 

    Note that a tokenizer will return many, but perhaps not all, arithmetic operators, Boolean operators, and punctuation marks as separate symbol tokens, rather than as parts of words. Section 9.6, "Tokenizer Lookup Tables," describes which characters the Tokenizer class treats (by default) as individual symbols.

  5. Show repetition by using an asterisk (" * "). Consider the good grammar:

     good      = adjective* "coffee";  adjective = "steaming"  "hot"; 

    The rule for good uses an asterisk to describe a string that begins with 0 or more adjectives and ends with "coffee" .

  6. Imply sequence by writing subparsers next to each other. For example, the good grammar implies that good is a sequence of a-repetition-of-adjectives followed by the word "coffee" .

  7. Show alternation by using a vertical bar. For example, the adjective rule declares that both "steaming" and "hot" are suitable adjectives.

  8. Indicate precedence by using parentheses. For example, if you wanted to show the good grammar on one line, you could write

     good = ("hot"  "steaming")* "coffee"; 
  9. Show terminals as words that begin with a capital letter, such as Num , as in

     phrase = '(' expression ')'  Num; 

    Section 2.5, "Terminal Parsers," shows the terminals available in sjm.parse.tokens . Chapter 11, "Extending the Parser Toolkit," describes how to add new types of terminals.

  10. Parameterize subparsers if it makes your grammar a better design. (This is an optional element of grammar design. It works only if you will code your subparsers as methods of a class. See Section 3.6, "Translating a Grammar to Code.") For example, consider a grammar for a small set of markup tags:

     tag      = nameTag  roastTag  priceTag;  nameTag  = '<' "name" '>'; roastTag = '<' "roast" '>'; priceTag = '<' "price" '>'; 

    To avoid repeating the pattern of writing angle braces around a literal value, you can use a parameterized rule:

     tag       = braces("name")  braces("roast")               braces("price"); braces(p) = '<' p '>'; 

    The parameter to braces is a parser, specifically a CaselessLiteral for "name" , "roast" , or "price" .

3.4.2 Top-Down Grammar Design

When you design a parser for a language, you can use either a top-down or a bottom-up approach to design. A bottom-up approach includes designing small parts of your parser that you know you will need. If you understand the goal of your parser, you may be able to design your assemblers, which can help guide the decomposition of the language you intend to recognize. When you reach the point of designing the grammar itself, you will most likely find that a top-down approach to grammar writing is natural and intuitive.

A top-down design approach begins by breaking the design problem into components . When you are using a top-down approach to write a parser, you state the design problem as, "Can I decompose this design into parts?" You can also state the design challenge as, "Can I represent the parser I want as a composition?" As it happens, parsers are always either terminals or composites of other parsers. The fact that parsers are composites makes the top-down design approach natural. Here is an effective algorithm for designing a new parser:

  1. Define the parser you want as a composite of subparsers.

  2. Repeat step 1 until every subparser is defined or is a terminal.

This algorithm creates a grammar that will often be sufficient for direct translation to Java code. Otherwise, you must transform the grammar before implementing it, a topic described in Section 3.6, "Translating a Grammar to Code." Before looking at grammar transformation, it is useful to walk through an example of applying the design algorithm to a small example.


Building Parsers with Java
Building Parsers With Javaв„ў
ISBN: 0201719622
EAN: 2147483647
Year: 2000
Pages: 169

Similar book on Amazon

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net