8.4 Regular Expression Assemblers


 
Building Parsers with Java
By Steven  John  Metsker

Table of Contents
Chapter  8.   Parsing Regular Expressions

    Content

You need assemblers for each of the operators a user can specify, and an assembler to handle characters such as a and b . Figure 8.2 shows these assemblers as part of the regular package.

Figure 8.2. The regular expression package. This package contains classes that collaborate to create a new regular expression parser from text.

graphics/08fig02.gif

You must plug in the operator assemblers where each operator appears in the grammar. To see where the assemblers belong, it helps to rewrite the grammar from this:

 expression    = term ('' term)*;  term          = factor factor*; factor        = phrase  (phrase '*'); phrase        = letterOrDigit  '(' expression ')'; letterOrDigit = Letter  Digit; 

to this:

 expression    = term orTerm*;  term          = factor nextFactor*; orTerm        = '' term; factor        = phrase  phraseStar; nextFactor    = factor; phrase        = letterOrDigit  '(' expression ')'; phraseStar    = phrase '*'; letterOrDigit = Letter  Digit; 

This version of the grammar lets you specify the placement of the assemblers, as Figure 8.3 shows.

Figure 8.3. Regular expression assembler placement. This table shows the assembler employed by each subparser for the regular expression grammar.

graphics/08fig03.gif

A key design principle of the regular expression parser is that each of its subparsers leaves a parser on the stack. For example, after orTerm matches, there will be two parsers on the stack: one for each of the two term s seen. The OrAssembler object pops these two parsers and pushes a new parser that is an Alternation of the parsers that were on the stack. Similarly, AndAssembly pushes a Sequence of two popped parsers, and StarAssembly pushes a repetition of one popped parser.

8.4.1 Assembler Code

Your user will supply strings such as "(ab)*" , which contain letters and the punctuation of your metalanguage . Each letter in such a string becomes a specific character that your parser will look for when it matches against a string such as "aabbaab" . The Letter and Digit parsers simply stack whatever they see; that is, these terminal parsers stack Character objects. A CharAssembler object converts each such character into a SpecificChar parser. Here is the code for CharAssembler :

 package sjm.examples.regular;  import sjm.parse.*; import sjm.parse.chars.*; /**  * Pop a <code>Character</code> from the stack and push a  * <code>SpecificChar</code> parser in its place.  */ public class CharAssembler extends Assembler { public void workOn(Assembly a) {     a.push(new SpecificChar((Character) a.pop())); } } 

This class expects the stack to have a Character at its top. Your parser fulfills the expectation: Whenever it sees a letter character, it stacks the character and uses CharAssembler to convert the character into a SpecificChar parser. For example, when your parser sees an a , it stacks it and then converts it into a parser that matches the character a .

When your parser sees an " * ", it wants to form a repetition of whatever is at the top of the stack. The StarAssembler class performs that work.

 package sjm.examples.regular;  import sjm.parse.*; /**  * Pop a parser from the stack and push a new <code>  * Repetition</code> of it.  */ public class StarAssembler extends Assembler { public void workOn(Assembly a) {     a.push(new Repetition((Parser) a.pop())); } } 

When your parser sees two "factors" in sequence, it creates a Sequence of them. For example, your parser might be parsing the regular expression "(ab)c" . Your parser must place the results of parsing "(ab)" on the stack and the results of parsing "c" on the stack first. After parsing both of these factors, your parser asks an AndAssembler to create a new sequence.

 package sjm.examples.regular;  import sjm.parse.*; /**  * Pop two parsers from the stack and push a new <code>  * Sequence</code> of them.  */ public class AndAssembler extends Assembler { public void workOn(Assembly a) {     Object top = a.pop();     Sequence s = new Sequence();     s.add((Parser) a.pop());     s.add((Parser) top);     a.push(s); } } 

Like AndAssembler , the OrAssembler class expects two parsers to be at the top of an assembly's stack. This class pops them, composes an Alternation of them, and pushes back the Alternation . Here is the code:

 package sjm.examples.regular;  import sjm.parse.*; /**  * Pop two parsers from the stack and push a new <code>  * Alternation</code> of them.  */ public class OrAssembler extends Assembler { public void workOn(Assembly a) {     Object top = a.pop();     Alternation alt = new Alternation();     alt.add((Parser) a.pop());     alt.add((Parser) top);     a.push(alt); } } 

   
Top


Building Parsers with Java
Building Parsers With Javaв„ў
ISBN: 0201719622
EAN: 2147483647
Year: 2000
Pages: 169

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net