9.4 A Token Class


 
Building Parsers with Java
By Steven  John  Metsker

Table of Contents
Chapter  9.   Advanced Tokenizing

    Content

A token is a receptacle that holds a number or a morsel of text. A tokenizer is an object whose job is to define precisely how to build a number and to define where morsels of text begin and end. For example, a typical tokenizer (such as a StreamTokenizer object) will view the string ">give 2receive" as containing four tokens: a symbol, a word, a number, and another word. A tokenizer class might not create special objects for each token. In particular, the StreamTokenizer class in java.io holds token information in its own state and does not create instances of a Token class. The tokenizer in sjm.parse.tokens uses instances of a Token class to hold the results of tokenizing a string.

The Token class does not determine what is or is not a token. Instances of class Token are containers that hold the results of whatever the tokenizer decides is a token. The Token class does introduce a constraint that a token is definable in terms of a token type and an associated String or double value. Figure 9.1 shows the Token class in sjm.parse.tokens .

Figure 9.1. The Token class. A Token is a receptacle for the results of reading a typically small amount of text, such as a word or a number.

graphics/09fig01.gif

Class Token defines the following constants as token types:

 TT_EOF  TT_NUMBER TT_WORD TT_SYMBOL TT_QUOTED 

These types help a parser to distinguish between common types of language elements. For example, although words as well as symbols are strings of one or more characters , most programming languages allow words, but not symbols, as variable names . By default, the Tokenizer class in sjm.parse.tokens tokenizes the string ">give 2receive" as these four tokens:

 new Token(Token.TT_SYMBOL,">",        0)  new Token(Token.TT_WORD,   "give",     0) new Token(Token.TT_NUMBER, "",       2.0) new Token(Token.TT_WORD,   "receive",  0) 

To demonstrate this, the following class creates these four tokens individually and compares them to the results of tokenizing the string:

 package sjm.examples.tokens;  import java.io.*; import sjm.parse.tokens.*; /**  * This class shows some aspects of default tokenization.  */ public class ShowDefaultTokenization { public static void main(String args[]) throws IOException {     Tokenizer t = new Tokenizer(">give 2receive");     Token manual[] = new Token[] {         new Token(Token.TT_SYMBOL,">",        0),         new Token(Token.TT_WORD,   "give",     0),         new Token(Token.TT_NUMBER, "",       2.0),         new Token(Token.TT_WORD,   "receive",  0)};     for (int i = 0; i < 4; i++) {         Token tok = t.nextToken();         if (tok.equals(manual[i])) {             System.out.print("ok! ");         }     } } } 

This code prints:

 ok! ok! ok! ok! 

This code uses the Token.equals() method, shown in Figure 9.2 along with other methods of Token . The value() method returns either a String or a Double to represent the token. The various is methods verify the token's type.

Figure 9.2. Other methods in class Token . Class Token includes methods that return or compare a token's value, and methods that check a token's type.

graphics/09fig02.gif


   
Top


Building Parsers with Java
Building Parsers With Javaв„ў
ISBN: 0201719622
EAN: 2147483647
Year: 2000
Pages: 169

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net