3.2 Deciding to Tokenize

Building Parsers with Java
By Steven John Metsker

Chapter 3. Building a Parser


An early design decision is whether to treat your language as a pattern of characters or as a pattern of tokens. You will usually not want a tokenizer for a language that lets a user specify patterns of characters to match against, such as a regular expression language. Chapter 8, "Parsing Regular Expressions," gives an example of parsing without using a tokenizer.

Tokens are composed of characters, so every language that is a pattern of tokens is also a pattern of characters. Theoretically, then, tokenizers are never necessary. However, it is usually practical to tokenize text and to specify a grammar for a language in terms of token terminals. Consider a robot control language that allows this command:

 move robot 7.1 meters from base 

If you do not plan to tokenize, your parser must recognize every character, including the whitespace between words. You must also gather characters into words correctly and build the number value (here, 7.1) yourself. All of this is work that a tokenizer will happily perform for you. Chapter 9, "Advanced Tokenizing," discusses how to customize a tokenizer. While you are learning to design new languages, you may want to limit yourself to languages that can benefit from the default behavior of class Tokenizer in package sjm.parse.tokens.
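To make the division of labor concrete, here is a minimal sketch of what a tokenizer hands to a parser for the robot command. It is an illustration only: it uses the standard library class java.io.StreamTokenizer as a stand-in, not the Tokenizer class in sjm.parse.tokens, and the class name RobotCommandTokens and the "word:"/"number:" labels are invented for this example.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the book's sjm packages.
class RobotCommandTokens {

    // Breaks a command into labeled tokens using java.io.StreamTokenizer
    // (assumption: a standard-library stand-in for the book's Tokenizer).
    static List<String> tokenize(String command) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(command));
        List<String> tokens = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    // Characters have already been gathered into a word.
                    tokens.add("word:" + st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    // The numeric value has already been built as a double.
                    tokens.add("number:" + st.nval);
                } else {
                    // Any other token is reported by its character code.
                    tokens.add("symbol:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("move robot 7.1 meters from base")) {
            System.out.println(token);
        }
    }
}
```

Running main prints word:move, word:robot, number:7.1, word:meters, word:from, and word:base, one per line. The whitespace never reaches the caller, so a grammar written against these tokens needs only word and number terminals.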


ISBN: 0201719622
Year: 2000
Pages: 169
