This book explains how to write parsers for new computer languages that you create. Each chapter focuses on background, techniques, or applications. Chapters on background give you the tools to build parsers. Chapters on techniques show you how to apply the tools. Chapters on applications explain how to create a specific parser for a particular type of language. Figure 1.1 shows the role of each chapter.
Figure 1.1. Each chapter in this book focuses on either background that supports later chapters, techniques that apply across parsers, or applications of parsers to a specific language type.
The structure of the book is fairly linear, with each chapter dependent only on preceding chapters. For example, Chapter 5, "Parsing Data Languages," depends primarily on background and techniques from Chapters 2 and 3. You can skip the middle chapters of the book (on tokenizing, mechanics, and new types) if you want to get right to the chapters on advanced languages.
Chapter 2, "The Elements of a Parser," explains what a parser is, introduces the building blocks of applied parsers, and shows how to compose new parsers from existing ones.
Chapter 3, "Building a Parser," explains the steps in designing and coding a working parser.
Chapter 4, "Testing a Parser," explains how to test the features of a new language and how to use random testing to detect ambiguity and other potential problems.
Chapter 5, "Parsing Data Languages," shows how to create a parser to read elements of a data language. A data language is a set of strings that describe objects following a local convention. This chapter also explains that, given the opportunity, you should consider migrating data-oriented languages to XML.
Chapter 6, "Transforming a Grammar," explains how to ensure the correct behavior of operators in a language and how to avoid looping in a parser, which can follow from loops in a grammar.
Chapter 7, "Parsing Arithmetic," develops an arithmetic parser. Arithmetic usually appears as part of a larger language, and the ideas in this chapter reappear in the chapters on query, logic, and imperative languages. To focus on the correct interpretation of arithmetic, this chapter develops an independent parser.
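As a rough illustration of what "correct interpretation of arithmetic" involves, the following sketch evaluates expressions so that multiplication binds more tightly than addition and subtraction associates to the left. It is a hand-written recursive-descent evaluator, not the parser toolkit this book develops; the class name ArithmeticSketch and its grammar are assumptions made for this example only. Each grammar rule repeats with a loop rather than referring back to itself on the left, which is one common way to avoid the looping problem that Chapter 6 addresses.

```java
// A minimal sketch, not the book's parser: a recursive-descent evaluator
// for +, -, *, / and parentheses. All names here are hypothetical.
class ArithmeticSketch {
    private final String text;
    private int pos = 0;

    private ArithmeticSketch(String text) {
        this.text = text;
    }

    // Evaluate a complete expression; reject trailing input.
    static double evaluate(String text) {
        ArithmeticSketch p = new ArithmeticSketch(text);
        double value = p.expression();
        p.skipSpaces();
        if (p.pos != p.text.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + p.pos);
        }
        return value;
    }

    // expression = term (('+' | '-') term)*
    // The loop accumulates left to right, so 25 - 16 - 9 means (25 - 16) - 9.
    private double expression() {
        double value = term();
        while (true) {
            if (accept('+')) value += term();
            else if (accept('-')) value -= term();
            else return value;
        }
    }

    // term = factor (('*' | '/') factor)*
    // Terms are parsed below expressions, so * and / bind tighter than + and -.
    private double term() {
        double value = factor();
        while (true) {
            if (accept('*')) value *= factor();
            else if (accept('/')) value /= factor();
            else return value;
        }
    }

    // factor = number | '(' expression ')'
    private double factor() {
        if (accept('(')) {
            double value = expression();
            if (!accept(')')) {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            return value;
        }
        skipSpaces();
        int start = pos;
        while (pos < text.length()
                && (Character.isDigit(text.charAt(pos)) || text.charAt(pos) == '.')) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at position " + pos);
        }
        return Double.parseDouble(text.substring(start, pos));
    }

    // Consume the given character if it is next (after whitespace).
    private boolean accept(char c) {
        skipSpaces();
        if (pos < text.length() && text.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private void skipSpaces() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) {
            pos++;
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 + 3 * 4"));
        System.out.println(evaluate("25 - 16 - 9"));
    }
}
```

A grammar that got precedence wrong would read "2 + 3 * 4" as 20 rather than 14, and one that got associativity wrong would read "25 - 16 - 9" as 18 rather than 0.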
Chapter 8, "Parsing Regular Expressions," develops a regular expression parser. A regular expression is a string that uses symbols to describe a pattern of characters. For example, "~.txt" might represent all file names that end with .txt. This chapter explains how to read a string such as "~.txt" and create a parser that will recognize all the strings the given pattern describes.
Chapter 9, "Advanced Tokenizing," describes the tokenizers that come with Java and the customizable tokenizer used in this book. Tokenizing a string means breaking the string into logical nuggets, something that lets you define a parser in terms of the nuggets instead of individual characters. The default operation of the tokenizer used in this book is sufficient for many languages, so customizing a tokenizer is an advanced topic.
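To make the idea of "logical nuggets" concrete, here is a small sketch that uses java.io.StreamTokenizer, one of the tokenizers that ships with Java, to break a string into words, numbers, and symbols. It does not show the customizable tokenizer used in this book; the class name TokenizerSketch and the "word:", "number:", and "symbol:" labels are assumptions made for this example only.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// A minimal sketch using the standard StreamTokenizer with its default settings.
// The class name and the token labels are hypothetical.
class TokenizerSketch {

    // Return each token as a labeled string, in the order it appears.
    static List<String> tokenize(String text) {
        StreamTokenizer t = new StreamTokenizer(new StringReader(text));
        List<String> tokens = new ArrayList<>();
        try {
            while (t.nextToken() != StreamTokenizer.TT_EOF) {
                switch (t.ttype) {
                    case StreamTokenizer.TT_WORD:
                        tokens.add("word:" + t.sval);
                        break;
                    case StreamTokenizer.TT_NUMBER:
                        // StreamTokenizer reports every number as a double.
                        tokens.add("number:" + t.nval);
                        break;
                    default:
                        // Any other token is reported by its character code.
                        // (Quoted strings also land here under the default
                        // settings; this sketch does not treat them specially.)
                        tokens.add("symbol:" + (char) t.ttype);
                }
            }
        } catch (IOException e) {
            // Reading from a StringReader should not fail; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("price = 42 + tax"));
    }
}
```

A parser built on these tokens can ask for "a word" or "a number" rather than inspecting characters one at a time, which is the simplification that tokenizing is meant to provide.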
Chapter 10, "Matching Mechanics," explains how the fundamental types of parsers in this book match text.
Chapter 11, "Extending the Parser Toolkit," explains how to extend a parser toolkit, introducing new types of terminals or completely new parser types.
Chapter 12, "Engines," introduces a logic engine used in later chapters to build a logic language and a query language.
Chapter 13, "Logic Programming," explains how to program in logic, which means programming with facts and rules.
Chapter 14, "Parsing a Logic Language," explains how to construct a parser for a logic language. Chapter 13 explains logic programming, giving examples in the Logikus programming language; Chapter 14 explains how to construct a Logikus parser.
Chapter 15, "Parsing a Query Language," describes how to construct a parser for a query language. A query language parser translates textual queries into calls to an engine. The engine proves the query against a source of rules and data and returns successful proofs as the result of the query.
Chapter 16, "Parsing an Imperative Language," shows how to create a parser for an imperative language. An imperative language parser translates a textual script into a composition of commands that direct a sequence of actions.
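The phrase "a composition of commands" can be pictured with a small sketch in which simple commands are combined into larger ones and then executed in order. This is only an illustration of the general idea, not the command classes developed in Chapter 16; the names CommandSketch, Command, print, sequence, repeat, and run are all assumptions made for this example, and the step that parses a textual script into such an object is omitted.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of composing commands. All names here are hypothetical.
class CommandSketch {

    // A command is any action that can be executed against a shared log.
    interface Command {
        void execute(List<String> log);
    }

    // A leaf command that records a single message.
    static Command print(String message) {
        return log -> log.add(message);
    }

    // A composite command that runs its children in order.
    static Command sequence(Command... commands) {
        return log -> {
            for (Command c : commands) {
                c.execute(log);
            }
        };
    }

    // A composite command that repeats its body a fixed number of times.
    static Command repeat(int times, Command body) {
        return log -> {
            for (int i = 0; i < times; i++) {
                body.execute(log);
            }
        };
    }

    // Execute a program and return everything it logged.
    static List<String> run(Command program) {
        List<String> log = new ArrayList<>();
        program.execute(log);
        return log;
    }

    public static void main(String[] args) {
        // The object a parser might build for a script such as:
        //   print start; repeat 2 { print step }; print done
        Command program = sequence(print("start"), repeat(2, print("step")), print("done"));
        System.out.println(run(program));
    }
}
```

In this picture, the parser's job is to read the script text and build the composed command; running the script is then a single call to execute on the outermost command.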
Chapter 17, "Directions," points out areas for further reading and programming.
Appendix A, "UML Twice Distilled," explains the features of the Unified Modeling Language that this book applies.