Building Parsers with Java
By Steven John Metsker

Chapter 9. Advanced Tokenizing

9.8 Setting a Tokenizer's Source

Most of the examples in this chapter create a tokenizer and use it once. Because this is a common practice, the Tokenizer class lets you pass a string to tokenize into its constructor. For example, the following line creates a Tokenizer object and gives it the string to tokenize:

 Tokenizer t = new Tokenizer(">give 2receive"); 

You can also create a tokenizer without a string and then set the string later. This approach lets you create a customized tokenizer and reuse it for many strings. For example, the CoffeeParser class in Chapter 5, "Parsing Data Languages," creates a special tokenizer that allows spaces to appear inside words. Here is the tokenizer() method of CoffeeParser:

    /**
     * Returns a tokenizer that allows spaces to appear inside
     * the "words" that identify a coffee's name.
     */
    public static Tokenizer tokenizer() {
        Tokenizer t = new Tokenizer();
        t.wordState().setWordChars(' ', ' ', true);
        return t;
    }

This code creates a default tokenizer and asks its word state to allow spaces. A calling method can retrieve this tokenizer, set the tokenizer's string to be a string that describes a type of coffee, and then feed the tokenizer to a parser. Here, again, is ShowCoffee.java:

    package sjm.examples.coffee;

    import java.io.*;
    import sjm.parse.*;
    import sjm.parse.tokens.*;

    /**
     * Show the recognition of a list of types of coffee,
     * reading from a file.
     */
    public class ShowCoffee {

    public static void main(String args[]) throws Exception {
        InputStream is =
            ClassLoader.getSystemResourceAsStream("coffee.txt");
        BufferedReader r =
            new BufferedReader(new InputStreamReader(is));

        Tokenizer t = CoffeeParser.tokenizer();
        Parser p = CoffeeParser.start();

        while (true) {
            String s = r.readLine();
            if (s == null) {
                break;
            }
            t.setString(s);
            Assembly in = new TokenAssembly(t);
            Assembly out = p.bestMatch(in);
            System.out.println(out.getTarget());
        }
    }
    }

The main() method of ShowCoffee retrieves a tokenizer from the CoffeeParser class's static tokenizer() method and retrieves the parser itself. The method reuses these two objects to parse each line of input. For each coffee line, the method passes the string to the tokenizer, forms a TokenAssembly object from the tokenizer, and uses the parser to match the token assembly.
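The configure-once, reuse-per-line pattern can be sketched without the sjm library. The MiniTokenizer class below is a hypothetical stand-in written only for illustration; it is not the sjm Tokenizer and mimics just two ideas from the text: a setting that lets spaces appear inside words, and a setString() method that swaps in a new source while keeping that setting. The sample input lines are made up.

```java
import java.util.ArrayList;
import java.util.List;

class ReuseSketch {

    /** Hypothetical stand-in for a configurable tokenizer; NOT the sjm Tokenizer. */
    static class MiniTokenizer {
        private boolean spacesInWords;
        private String source = "";
        private int pos;

        /** Configuration that survives across calls to setString(). */
        void allowSpacesInWords(boolean allow) {
            spacesInWords = allow;
        }

        /** Replaces the source text and rewinds to its start. */
        void setString(String s) {
            source = s;
            pos = 0;
        }

        private boolean isWordChar(char c) {
            return Character.isLetterOrDigit(c) || (spacesInWords && c == ' ');
        }

        /** Returns the next word or one-character symbol, or null at end of input. */
        String nextToken() {
            // Skip whitespace that separates tokens.
            while (pos < source.length() && Character.isWhitespace(source.charAt(pos))) {
                pos++;
            }
            if (pos >= source.length()) {
                return null;
            }
            char c = source.charAt(pos);
            if (!isWordChar(c)) {
                pos++;
                return String.valueOf(c);
            }
            int start = pos;
            while (pos < source.length() && isWordChar(source.charAt(pos))) {
                pos++;
            }
            // Trim any trailing space picked up before a symbol.
            return source.substring(start, pos).trim();
        }
    }

    /** Points the given tokenizer at a new line and collects its tokens. */
    static List<String> tokens(MiniTokenizer t, String line) {
        t.setString(line);
        List<String> out = new ArrayList<>();
        for (String tok = t.nextToken(); tok != null; tok = t.nextToken()) {
            out.add(tok);
        }
        return out;
    }

    public static void main(String[] args) {
        MiniTokenizer t = new MiniTokenizer();
        t.allowSpacesInWords(true);                  // configure once
        String[] lines = { "House Blend, Dark Roast", "Decaf, Light" };  // made-up input
        for (String line : lines) {
            System.out.println(tokens(t, line));     // reuse for every line
        }
    }
}
```

Because the setting lives in the tokenizer object rather than in the source string, the loop only has to hand over each new line, which is the same division of labor that ShowCoffee relies on.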

You can also set a tokenizer's source by passing it a PushbackReader object to read from. Java character streams follow the decorator pattern [Gamma et al.], meaning that you can wrap one reader around another. For example, you can construct a PushbackReader object from a FileReader object:

    package sjm.examples.tokens;

    import java.io.*;
    import sjm.parse.tokens.*;

    /**
     * This class shows that you can supply your own reader to
     * a tokenizer.
     */
    public class ShowSuppliedReader {

    public static void main(String[] args) throws IOException {
        String s = "Let's file this away.";
        FileWriter fw = new FileWriter("temp.txt");
        fw.write(s);
        fw.close();

        FileReader fr = new FileReader("temp.txt");
        PushbackReader pr = new PushbackReader(fr, 4);

        Tokenizer t = new Tokenizer();
        t.setReader(pr);

        while (true) {
            Token tok = t.nextToken();
            if (tok.equals(Token.EOF)) {
                break;
            }
            System.out.println(tok);
        }
    }
    }

Running this class prints the following:

    Let's
    file
    this
    away
    .

The main() method in ShowSuppliedReader creates a FileReader, wraps a PushbackReader around it, and provides this reader to the tokenizer. The code supplies the reader with an ample (four-character) pushback buffer, which lets the reader handle multicharacter symbols.
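To see why the size of the pushback buffer matters, consider a minimal sketch that uses only java.io. It does not reproduce how the sjm tokenizer recognizes symbols; the matchSymbol() and attempt() helpers are hypothetical names introduced here. The sketch wraps a PushbackReader around a StringReader, reads ahead to test for the two-character symbol "<=", and pushes the characters back when the test fails. With the default one-character buffer, pushing back two characters fails with an IOException.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

class PushbackSketch {

    /**
     * Consumes symbol from r if the upcoming characters spell it.
     * Otherwise pushes back every character it read and returns false.
     */
    static boolean matchSymbol(PushbackReader r, String symbol) throws IOException {
        char[] buf = new char[symbol.length()];
        int n = 0;
        while (n < buf.length) {
            int c = r.read();
            if (c < 0) {
                break;              // end of input before the symbol was complete
            }
            buf[n++] = (char) c;
        }
        if (n == buf.length && symbol.equals(new String(buf))) {
            return true;
        }
        r.unread(buf, 0, n);        // needs room for n characters in the pushback buffer
        return false;
    }

    /** Runs one match attempt and reports what happened, for demonstration. */
    static String attempt(String input, String symbol, int pushbackSize) {
        try (PushbackReader r = new PushbackReader(new StringReader(input), pushbackSize)) {
            if (matchSymbol(r, symbol)) {
                return "matched";
            }
            return "no match, next char is '" + (char) r.read() + "'";
        } catch (IOException e) {
            return "pushback failed";
        }
    }

    public static void main(String[] args) {
        System.out.println(attempt("<= 3", "<=", 4));   // symbol present
        System.out.println(attempt("<< 3", "<=", 4));   // lookahead fails, characters restored
        System.out.println(attempt("<< 3", "<=", 1));   // buffer too small to restore two characters
    }
}
```

A buffer at least as long as the longest symbol you expect to test is enough for this kind of lookahead, so four characters leaves comfortable room for two- and three-character symbols.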

The design of the Tokenizer class allows you to construct a Tokenizer object without having a string or reader to read from. This allows you to create a customized tokenizer that is independent of any particular string.


   
Building Parsers With Java™
ISBN: 0201719622
EAN: 2147483647
Year: 2000
Pages: 169
