| Building Parsers with Java |
By Steven John Metsker
|Table of Contents|
This book does not assume that you understand compilers and language theory. But if you are well grounded in these topics, you may want to know where this book sits within established theory. This section explains the type of parsers that this book covers and describes how this book differs from others in terms of conventions regarding grammars and abstract syntax trees.
All parsers in this book are nondeterministic recursive-descent parsers. If you are interested in learning about other types of parsers, the classic source on this topic is Compilers: Principles, Techniques, and Tools [Aho et al.]. The choice of nondeterministic recursive-descent parsers springs from two objectives. The first is to empower a developer of a new little language to easily transition from language design to the implementation of a parser. The second objective is to answer the Extreme Programming question, "What is the simplest thing that could possibly work?" [Beck, page 30].
To simplify the coding of a parser from its design, a parsing technique should let a developer translate a grammar directly into a parser. The sequences, alternations , and repetitions in a grammar must correspond directly to instances of Sequence , Alternation , and Repetition classes. Furthermore, the developer should face few restrictions in the allowable attributes of an input grammar. Nondeterministic recursive-descent parsing provides a comparatively simple approach to meeting these objectives.
Nondeterminism is a central problem of parser construction; parsers do not always know which path to take as they recognize text. Nondeterministic recursive-descent parsing solves this problem by using sets to allow all possible parser paths to proceed. This approach used to be too slow, but modern computers make it sufficient for small languages, including all the languages in this book. An advantage of this approach is that the parsers accept any context-free grammar as long as the developer removes left recursion, by using a technique explained in this book. Nondeterministic recursive-descent parsers provide a broadly applicable and simply implemented approach to empowering developers of new languages.
The conventions in this book also differ from some conventions for writing grammars. Specifically, grammars in this book use class names to represent terminals and use semicolons to mark the end of rules. These standards support the simplicity of the translation from grammar to code.
Finally, this book is unusual in the little treatment it gives to abstract syntax trees (ASTs). It is common practice to parse input, create an AST, and then walk the tree. This book argues that it is more effective to build a target object as a parse completes, working on the result as each grammar rule succeeds. Most of the examples in this book build a useful result while parsing an input string, but none of the examples constructs an AST.