9.11 Token Strings


 
Building Parsers with Java
By Steven John Metsker

Chapter 9. Advanced Tokenizing


The package sjm.parse.tokens uses a TokenString class to hold the results of tokenizing a string. A TokenString is similar to a String, but it contains a series of tokens rather than a series of characters. Like String objects, TokenString objects are immutable: they cannot change after they are created. Figure 9.17 shows the TokenString class.

Figure 9.17. The TokenString class. A TokenString is essentially an array of Tokens. Like String, TokenString is immutable, so there is never a need to copy a TokenString.
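
To make the immutability point concrete, here is a minimal sketch of an immutable token sequence. It is not the sjm.parse.tokens source: the class name SimpleTokenString and its methods are hypothetical, and tokens are plain strings rather than Token objects. The sketch copies the incoming array once, on construction, and never exposes it, which is why references to the object can be shared freely afterward.

```java
import java.util.Arrays;

/**
 * Illustrative sketch only: an immutable sequence of tokens.
 * SimpleTokenString is a hypothetical name, and tokens are plain
 * strings here rather than sjm.parse.tokens.Token objects.
 */
final class SimpleTokenString {
    private final String[] tokens;

    SimpleTokenString(String[] tokens) {
        // Defensive copy: later changes to the caller's array
        // cannot alter this object.
        this.tokens = Arrays.copyOf(tokens, tokens.length);
    }

    /** Returns the number of tokens held. */
    int length() {
        return tokens.length;
    }

    /** Returns the token at the given index; the array itself is never exposed. */
    String tokenAt(int i) {
        return tokens[i];
    }

    @Override
    public String toString() {
        return String.join(" ", tokens);
    }
}

class SimpleTokenStringDemo {
    public static void main(String[] args) {
        String[] raw = {"I", "came"};
        SimpleTokenString ts = new SimpleTokenString(raw);
        raw[0] = "You";                // does not affect ts
        SimpleTokenString shared = ts; // sharing the reference is safe; no copy needed
        System.out.println(shared + " has " + shared.length() + " tokens");
    }
}
```

Because no method modifies the internal array, two holders of the same reference can never interfere with each other, which is the sense in which copying is unnecessary.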

The TokenAssembly class hides the fact that it relies on class TokenString. The TokenStringSource class, on the other hand, returns TokenString objects. If you use TokenStringSource to break up an input stream, you must understand the collaboration of several token-related classes. This example shows a collaboration of instances of these classes:

    package sjm.examples.tokens;

    import sjm.parse.*;
    import sjm.parse.tokens.*;

    /**
     * This class shows a collaboration of objects from classes
     * <code>Tokenizer</code>, <code>TokenStringSource</code>,
     * <code>TokenString</code>, <code>TokenAssembly</code>.
     */
    public class ShowTokenString {

        public static void main(String[] args) {

            // a parser that counts words
            Parser w = new Word().discard();
            w.setAssembler(new Assembler() {
                public void workOn(Assembly a) {
                    if (a.stackIsEmpty()) {
                        // first word in this token string
                        a.push(Integer.valueOf(1));
                    } else {
                        // replace the running count with count + 1
                        Integer i = (Integer) a.pop();
                        a.push(Integer.valueOf(i.intValue() + 1));
                    }
                }
            });

            // a repetition of the word counter
            Parser p = new Repetition(w);

            // consume token strings separated by semicolons
            String s = "I came; I saw; I left in peace;";
            Tokenizer t = new Tokenizer(s);
            TokenStringSource tss = new TokenStringSource(t, ";");

            // count the words in each token string
            while (tss.hasMoreTokenStrings()) {
                TokenString ts = tss.nextTokenString();
                TokenAssembly ta = new TokenAssembly(ts);
                Assembly a = p.completeMatch(ta);
                System.out.println(
                    ts + " (" + a.pop() + " words)");
            }
        }
    }

Running this class shows the word count of each semicolon-delimited section:

    I came (2 words)
    I saw (2 words)
    I left in peace (4 words)

In this example,

  • A Tokenizer object breaks the input into tokens.

  • A TokenStringSource object divides the tokens into TokenString objects.

  • A TokenString variable holds a succession of TokenString objects from the input.

  • A TokenAssembly variable holds a TokenAssembly object that wraps around a TokenString object.
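
The division of labor in the list above can be sketched without the sjm library. The following is an assumption-laden illustration, not the TokenStringSource implementation: the class DelimitedTokenGroups and its split method are hypothetical, tokens are plain strings, and the handling of trailing tokens or back-to-back delimiters is a choice made for this sketch rather than documented behavior of the real class.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Illustrative sketch only: groups a flat token list into
 * delimiter-separated runs. Tokens are plain strings and all
 * names here are hypothetical, not part of sjm.parse.tokens.
 */
class DelimitedTokenGroups {

    /** Splits tokens into groups, consuming each delimiter token. */
    static List<List<String>> split(List<String> tokens, String delimiter) {
        List<List<String>> groups = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (String token : tokens) {
            if (token.equals(delimiter)) {
                // a delimiter closes the current group
                groups.add(current);
                current = new ArrayList<>();
            } else {
                current.add(token);
            }
        }
        if (!current.isEmpty()) {
            // trailing tokens with no closing delimiter (sketch-only choice)
            groups.add(current);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Tokens as a tokenizer might produce them for
        // "I came; I saw; I left in peace;"
        List<String> tokens = Arrays.asList(
            "I", "came", ";", "I", "saw", ";",
            "I", "left", "in", "peace", ";");
        for (List<String> group : split(tokens, ";")) {
            System.out.println(
                String.join(" ", group) + " (" + group.size() + " words)");
        }
    }
}
```

Here each inner list plays the role of one TokenString, and the loop in main stands in for the parser that counts words in each one.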


   
Building Parsers With Java™
ISBN: 0201719622
Year: 2000
Pages: 169
