A token is a receptacle that holds a number or a morsel of text. A tokenizer is an object whose job is to define precisely how to build a number and to define where morsels of text begin and end. For example, a typical tokenizer (such as a StreamTokenizer object) will view the string ">give 2receive" as containing four tokens: a symbol, a word, a number, and another word. A tokenizer class might not create special objects for each token. In particular, the StreamTokenizer class in java.io holds token information in its own state and does not create instances of a Token class. The tokenizer in sjm.parse.tokens uses instances of a Token class to hold the results of tokenizing a string. The Token class does not determine what is or is not a token. Instances of class Token are containers that hold the results of whatever the tokenizer decides is a token. The Token class does introduce a constraint that a token is definable in terms of a token type and an associated String or double value. Figure 9.1 shows the Token class in sjm.parse.tokens . Figure 9.1. The Token class. A Token is a receptacle for the results of reading a typically small amount of text, such as a word or a number. Class Token defines the following constants as token types: TT_EOF TT_NUMBER TT_WORD TT_SYMBOL TT_QUOTED These types help a parser to distinguish between common types of language elements. For example, although words as well as symbols are strings of one or more characters , most programming languages allow words, but not symbols, as variable names . By default, the Tokenizer class in sjm.parse.tokens tokenizes the string ">give 2receive" as these four tokens: new Token(Token.TT_SYMBOL,">", 0) new Token(Token.TT_WORD, "give", 0) new Token(Token.TT_NUMBER, "", 2.0) new Token(Token.TT_WORD, "receive", 0) To demonstrate this, the following class creates these four tokens individually and compares them to the results of tokenizing the string: package sjm.examples.tokens; import java.io.*; import sjm.parse.tokens.*; /** * This class shows some aspects of default tokenization. */ public class ShowDefaultTokenization { public static void main(String args[]) throws IOException { Tokenizer t = new Tokenizer(">give 2receive"); Token manual[] = new Token[] { new Token(Token.TT_SYMBOL,">", 0), new Token(Token.TT_WORD, "give", 0), new Token(Token.TT_NUMBER, "", 2.0), new Token(Token.TT_WORD, "receive", 0)}; for (int i = 0; i < 4; i++) { Token tok = t.nextToken(); if (tok.equals(manual[i])) { System.out.print("ok! "); } } } } This code prints: ok! ok! ok! ok! This code uses the Token.equals() method, shown in Figure 9.2 along with other methods of Token . The value() method returns either a String or a Double to represent the token. The various is methods verify the token's type. Figure 9.2. Other methods in class Token . Class Token includes methods that return or compare a token's value, and methods that check a token's type. |