11.2 New Terminals


 
Building Parsers with Java
By Steven John Metsker

Chapter 11. Extending the Parser Toolkit


Terminals use some judgment about whether the next element in an assembly qualifies as the type of element sought. You can refine this judgment by introducing new subclasses of Terminal. For example, consider a language that distinguishes known values from unknown values by case: known values begin with a lowercase letter, and unknown values begin with an uppercase letter. In this language, a structure might appear that looks like this:

 member(X, [republican, democrat]) 

This structure might imply that the unknown X can take on either of the known values in the list. A partial grammar for this language might look something like this:

    //...
    term     = variable | known;
    variable = UppercaseWord;
    known    = LowercaseWord;

Here is a sample program that depends on UppercaseWord and LowercaseWord to detect the difference between variables and known values:

    package sjm.examples.mechanics;

    import sjm.parse.*;
    import sjm.parse.tokens.*;

    /**
     * Show the use of new subclasses of <code>Terminal</code>.
     */
    public class ShowNewTerminals {

        public static void main(String[] args) {

            /*  term     = variable | known;
             *  variable = UppercaseWord;
             *  known    = LowercaseWord;
             */
            Parser variable = new UppercaseWord();
            Parser known    = new LowercaseWord();

            Parser term = new Alternation()
                .add(variable)
                .add(known);

            // anonymous Assembler subclasses note element type
            variable.setAssembler(
                new Assembler() {
                    public void workOn(Assembly a) {
                        Object o = a.pop();
                        a.push("VAR(" + o + ")");
                    }
                });

            known.setAssembler(
                new Assembler() {
                    public void workOn(Assembly a) {
                        Object o = a.pop();
                        a.push("KNOWN(" + o + ")");
                    }
                });

            // term* matching against knowns and variables:
            System.out.println(
                new Repetition(term).bestMatch(
                    new TokenAssembly(
                        "member X republican democrat")));
        }
    }

This sample program uses anonymous subclasses of Assembler to suggest how, in practice, you can react differently to different types of matches. Running this class prints the following:

 [KNOWN(member), VAR(X), KNOWN(republican), KNOWN(democrat)]  member/X/republican/democrat^ 

In practice, your assemblers will perform some useful function. For example, you might have Variable objects and KnownValue objects, and your assemblers might create these and pass them to the overall target of the assembly. The point here is that you can add new types of terminals to trigger these different types of actions.
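The idea of building domain objects in assemblers can be sketched without the sjm classes. The following is a minimal, self-contained illustration rather than the book's code: Variable, KnownValue, and TermBuilder are hypothetical names, and a plain Deque stands in for an assembly's stack.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch: none of these types belong to the sjm packages.
public class TermBuilder {

    /** Hypothetical domain object for an unknown such as X. */
    public static final class Variable {
        final String name;
        Variable(String name) { this.name = name; }
        @Override public String toString() { return "Variable(" + name + ")"; }
    }

    /** Hypothetical domain object for a known value such as democrat. */
    public static final class KnownValue {
        final String value;
        KnownValue(String value) { this.value = value; }
        @Override public String toString() { return "KnownValue(" + value + ")"; }
    }

    /**
     * Mimics what the two assemblers would do: pop the matched word
     * from the stack and push a domain object in its place.
     */
    static void workOn(Deque<Object> stack) {
        String word = (String) stack.pop();
        boolean upper = Character.isUpperCase(word.charAt(0));
        stack.push(upper ? new Variable(word) : new KnownValue(word));
    }

    /** Builds one domain object per whitespace-separated word, in input order. */
    public static List<Object> build(String text) {
        Deque<Object> stack = new ArrayDeque<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;           // blank input yields no words
            }
            stack.push(word);       // a terminal pushes the matched word
            workOn(stack);          // an assembler replaces it with an object
        }
        // The deque iterates most-recent-first, so reverse to get input order.
        List<Object> result = new ArrayList<>(stack);
        Collections.reverse(result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(build("member X republican democrat"));
    }
}
```

In a real parser the case test would live in the terminals, as in ShowNewTerminals, and each assembler would only construct its own kind of object; the sketch folds both steps into workOn() to stay short.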

Implementations of LowercaseWord and UppercaseWord (in package sjm.examples.mechanics) require only overriding qualifies() from Word. Here is the code for LowercaseWord.qualifies():

    /**
     * Returns true if an assembly's next element is a
     * lowercase word.
     */
    protected boolean qualifies(Object o) {
        Token t = (Token) o;
        if (!t.isWord()) {
            return false;
        }
        String word = t.sval();
        return word.length() > 0 &&
            Character.isLowerCase(word.charAt(0));
    }
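Only the lowercase check is shown here. A plausible uppercase counterpart would test the first character with Character.isUpperCase() instead. The sketch below isolates both checks as static methods on plain strings so that it runs without the sjm classes; CaseChecks and its method names are hypothetical, and the actual UppercaseWord source in sjm.examples.mechanics may differ.

```java
// Hypothetical stand-alone helper; not part of the sjm packages.
public class CaseChecks {

    /** The same test as LowercaseWord.qualifies(), applied to a plain string. */
    public static boolean startsLowercase(String word) {
        return word.length() > 0
            && Character.isLowerCase(word.charAt(0));
    }

    /** Assumed mirror image of the test above, for an uppercase terminal. */
    public static boolean startsUppercase(String word) {
        return word.length() > 0
            && Character.isUpperCase(word.charAt(0));
    }

    public static void main(String[] args) {
        System.out.println(startsLowercase("democrat")); // true
        System.out.println(startsUppercase("X"));        // true
        // A word that starts with a caseless character passes neither check.
        System.out.println(startsLowercase("_tmp") || startsUppercase("_tmp")); // false
    }
}
```

The two checks are not complements of each other. If the tokenizer ever produced a word that begins with a caseless character, that word would match neither terminal under a grammar such as term = variable | known.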

In this example, you could take a different design approach and push the job of distinguishing lowercase and uppercase words down to the tokenizer. You could create new token types and modify the tokenizer class to separate lowercase tokens from uppercase tokens. For distinguishing lowercase from uppercase words, introducing new Terminal subclasses is simple and effective. However, there are times when you will want to modify your tokenizer, particularly when there is potential overlap and ambiguity surrounding token types.

In an earlier example (in Chapter 4, "Testing a Parser"), the word "cups" was seen by the parser both as a regular word and as a reserved word. In this case, it would be helpful to have the tokenizer make the distinction between reserved and nonreserved words.
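One way to picture moving this decision into the tokenizer is to have it consult a set of reserved words and tag each word with a type before the parser sees it. The sketch below is self-contained and does not use the sjm Tokenizer; ReservedWordTagger, Kind, and Tagged are hypothetical names, splitting on whitespace is a simplifying assumption, and the reserved-word list is purely illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of tokenizer-level classification; not sjm code.
public class ReservedWordTagger {

    /** Hypothetical token types: the tokenizer decides which one applies. */
    public enum Kind { RESERVED, WORD }

    /** A word paired with the type the tagger assigned to it. */
    public static final class Tagged {
        final Kind kind;
        final String text;
        Tagged(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
        @Override public String toString() { return kind + ":" + text; }
    }

    private final Set<String> reserved;

    public ReservedWordTagger(String... reservedWords) {
        this.reserved = new HashSet<>(Arrays.asList(reservedWords));
    }

    /** Splits on whitespace and tags each word exactly once. */
    public List<Tagged> tag(String text) {
        List<Tagged> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;   // blank input yields no tokens
            }
            Kind kind = reserved.contains(word) ? Kind.RESERVED : Kind.WORD;
            tokens.add(new Tagged(kind, word));
        }
        return tokens;
    }

    public static void main(String[] args) {
        ReservedWordTagger tagger = new ReservedWordTagger("cups");
        System.out.println(tagger.tag("two cups of coffee"));
    }
}
```

Because each word receives exactly one tag, a parser built on such tokens could use one terminal that accepts only RESERVED tokens and another that accepts only WORD tokens, so "cups" would no longer qualify as both.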


   
ISBN: 0201719622
Year: 2000
Pages: 169
