3.6 Translating a Grammar to Code


 
Building Parsers with Java
By Steven John Metsker

Table of Contents
Chapter 3.  Building a Parser

    Content

You can write the code of a parser directly from its grammar. You apply each principle of grammar translation in turn until the grammar becomes a set of Java statements that define a parser. The following principles apply:

  • Treat quoted strings as CaselessLiteral objects.

  • Create Sequence objects for sequences.

  • Create Alternation objects for alternations .

  • Translate Terminal references to objects.

  • Create a subparser for each rule.

  • Declare each subparser, or arrange subparsers as methods .

  • Add a start() method.

3.6.1 Translate Quoted Strings

Treat each quoted word, such as "pick" , as a CaselessLiteral . For example, translate

 pickCommand  = "pick" "carrier" "from" location; 

to

 pickCommand = new CaselessLiteral("pick")                new CaselessLiteral("carrier")               new CaselessLiteral("from")               location; 

This translation immediately begins to look like Java code, although it is not yet compilable. When all translations are complete, the result will be compilable code.

3.6.2 Translate Sequences

When you write a grammar, you imply sequences simply by showing two subparsers next to each other. For example,

 placeCommand = "place" "carrier" "at" location; 

implies

 placeCommand  = new Sequence();  placeCommand.add(new CaselessLiteral("place")); placeCommand.add(new CaselessLiteral("carrier")); placeCommand.add(new CaselessLiteral("at")); placeCommand.add(location); 

Note that this is still not valid Java code, although you are approaching that goal. Specifically, you have not yet declared the type of placeCommand , nor have you established a strategy for referring to other subparsers. These translations follow shortly.

3.6.3 Translate Alternations

A vertical bar in a subparser definition means that the subparsers on either side of the bar may produce a successful match. When a series of vertical bars appears, you can create a single Alternation object. For example, translate

 command = pickCommand  placeCommand  scanCommand; 

to

 command = new Alternation();  command.add(pickCommand); command.add(placeCommand); command.add(scanCommand); 

This code looks almost like compilable Java, but you still have to translate references to subparsers.

3.6.4 Translate Terminals

Many of the terminals in the track robot language are literals, such as "place" and "carrier" . The only other terminal is Word in the location definition. To translate the grammar to Java code, replace each such terminal with a new terminal object of the specified type:

 location = new Word(); 

3.6.5 Create a Subparser for Each Rule

The translation steps given so far leave each subparser as a word that begins with a lowercase letter, such as pickCommand . There are two strategies for translating subparser definitions into Java code. You can declare each subparser as an appropriate kind of parser, or you can arrange the subparsers as a class's methods.

3.6.6 Option 1: Declare Each Subparser

For a small language, you can make each subparser a separate variable. The track robot command language is a little too large for this approach. To illustrate , here is RobotMonolithic.java :

 package sjm.examples.robot;  import sjm.parse.*; import sjm.parse.tokens.*; /**  * Show how to create a parser and use it in a single  * method.  */ public class RobotMonolithic { public static void main(String[] args) {     Alternation command = new Alternation();     Sequence pickCommand = new Sequence();     Sequence placeCommand = new Sequence();     Sequence scanCommand = new Sequence();     Word location = new Word();     command.add(pickCommand);     command.add(placeCommand);     command.add(scanCommand);     pickCommand.add(new CaselessLiteral("pick"));     pickCommand.add(new CaselessLiteral("carrier"));     pickCommand.add(new CaselessLiteral("from"));     pickCommand.add(location);     placeCommand.add(new CaselessLiteral("place"));     placeCommand.add(new CaselessLiteral("carrier"));     placeCommand.add(new CaselessLiteral("at"));     placeCommand.add(location);     scanCommand.add(new CaselessLiteral("scan"));     scanCommand.add(location);     String s = "pick carrier from DB101_IN";     System.out.println(         command.bestMatch(new TokenAssembly(s))); } } 

All the subparser declarations appear at the top of the code. The assignment statements build the command object into a parser for the track robot command language. Running this class prints the following:

 [pick, carrier, from, DB101_IN]  pick/carrier/from/DB101_IN^ 

The output shows that the command parser can completely parse at least one sample element of the language.

3.6.7 Option 2: Arrange Subparsers as Methods

For readability, you can create a method for each subparser of a grammar. For example, you can lift out the preceding code that creates the command object and place it in a method called command() . You can reapply this strategy, creating a method for each subparser. Because all the subparsers except command are useful only in constructing the command subparser, it is a good idea to make them protected and not public. Subparsers such as location are not intended for public use but might be overridden in a subclass.

Refactoring the RobotMonolithic class to apply the strategy of making subparsers methods results in the following:

 package sjm.examples.robot;  import sjm.parse.*; import sjm.parse.tokens.*; /**  * Provide an example of a class that affords a parser for  * the "robot" command language. This class is a refactored  * version of the <code>RobotMonolithic</code> class, with  * one method for each subparser in the robot language.  */ public class RobotRefactored { public Parser command() {     Alternation a = new Alternation();     a.add(pickCommand());     a.add(placeCommand());     a.add(scanCommand());     return a; } protected Parser pickCommand() {     Sequence s = new Sequence();     s.add(new CaselessLiteral("pick"));     s.add(new CaselessLiteral("carrier"));     s.add(new CaselessLiteral("from"));     s.add(location());     return s; } protected Parser placeCommand() {     Sequence s = new Sequence();     s.add(new CaselessLiteral("place"));     s.add(new CaselessLiteral("carrier"));     s.add(new CaselessLiteral("at"));     s.add(location());     return s; } protected Parser scanCommand() {     Sequence s = new Sequence();     s.add(new CaselessLiteral("scan"));     s.add(location());     return s; } protected Parser location() {     return new Word(); } } 

Here's a class that uses the refactored parser class:

 package sjm.examples.robot;  import sjm.parse.*; import sjm.parse.tokens.*; /**  * Show how to use a parser class that arranges its  * subparsers as methods.  */ public class ShowRobotRefactored { public static void main(String[] args) {     Parser p = new RobotRefactored().command();     String s = "place carrier at WB500_IN";     System.out.println(p.bestMatch(new TokenAssembly(s))); } } 

Running this class prints the following:

 [place, carrier, at, WB500_IN]  place/carrier/at/WB500_IN^ 

The class RobotRefactored is a refactoring of RobotMonolithic , with the subparsers arranged as a coordinated set of methods. This approach can lead to an infinite loop if rules in your grammar refer to each other in a cycle. Fortunately, you can eliminate such loops by using lazy initialization; Section 6.5 explains how. Many grammars, including the track robot grammar, are small and acyclic , so we defer this topic for now.

In the refactoring, I changed the variable names . This is an esthetic choice. You can decide whether you think it is easier to read and understand this:

 protected Parser scanCommand() {      Sequence s = new Sequence();     s.add(new CaselessLiteral("scan"));     s.add(location());     return s; } 

or this:

 protected Parser scanCommand() {      Sequence scanCommand = new Sequence();     scanCommand.add(new CaselessLiteral("scan"));     scanCommand.add(location());     return scanCommand; } 

3.6.8 Add a Start Method

A user of your class needs to be able to tell which subparser is the "primary" parser, the one that matches a useful language. To tell your prospective user which parser to use, introduce a start() method that returns the primary parser. For example, you could add the following method to RobotRefactored to make it easy for a user of the class to find the primary parser:

 /**   * Returns a parser that will recognize a command for a  * track robot, and build a corresponding command object.  */ public static Parser start() {     return new RobotParser().command(); } 

Making this method static allows a user to simply call start() as a class method.


   
Top


Building Parsers with Java
Building Parsers With Javaв„ў
ISBN: 0201719622
EAN: 2147483647
Year: 2000
Pages: 169

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net