8.2 Building a Regular Expression Parser

Building Parsers with Java
By Steven John Metsker

Chapter 8. Parsing Regular Expressions


Metalanguages that allow a user to specify character patterns, using symbols such as "|" and "~", are typically called regular expressions. There is no standard for which symbols belong in this type of metalanguage, although the language Perl probably offers the most ambitious support for matching regular expressions. In the expression language you provide to your user, you have complete freedom in the symbols you provide and the meaning you assign to those symbols.

This section shows how to create a basic regular expression recognizer. This metalanguage will allow "|" to mean alternation, "*" to mean repetition, and simple juxtaposition (or "nextness") to mean sequence. Individual characters such as a and b simply stand for themselves. For example, a* means zero or more a characters.
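One way to pin down what the three operators mean is to treat each pattern as the set of strings it matches. The sketch below assumes that view. Its names (`RegexMeaning`, `lit`, `alt`, `seq`, `star`, `MAX_LEN`) are hypothetical. It also truncates every language at `MAX_LEN` characters, because repetition denotes an infinite set.

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: a pattern's meaning as the set of strings it matches,
// truncated at MAX_LEN characters so that repetition stays finite.
class RegexMeaning {
    static final int MAX_LEN = 3;

    // A single character stands for itself: the language { "c" }.
    static Set<String> lit(char c) {
        Set<String> s = new TreeSet<>();
        s.add(String.valueOf(c));
        return s;
    }

    // Alternation (a|b): the union of both languages.
    static Set<String> alt(Set<String> x, Set<String> y) {
        Set<String> s = new TreeSet<>(x);
        s.addAll(y);
        return s;
    }

    // Sequence (juxtaposition, ab): each string of x followed by each string of y.
    static Set<String> seq(Set<String> x, Set<String> y) {
        Set<String> s = new TreeSet<>();
        for (String u : x)
            for (String v : y)
                if (u.length() + v.length() <= MAX_LEN) s.add(u + v);
        return s;
    }

    // Repetition (a*): "", then x, then xx, ... up to MAX_LEN characters.
    static Set<String> star(Set<String> x) {
        Set<String> result = new TreeSet<>();
        result.add("");
        Set<String> frontier = new TreeSet<>(result);
        while (!frontier.isEmpty()) {
            Set<String> next = new TreeSet<>();
            for (String u : frontier)
                for (String v : x) {
                    String w = u + v;
                    // Only strings not seen before are extended further.
                    if (w.length() <= MAX_LEN && result.add(w)) next.add(w);
                }
            frontier = next;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(star(lit('a')));          // [, a, aa, aaa]
        System.out.println(alt(lit('a'), lit('b'))); // [a, b]
        System.out.println(seq(lit('a'), lit('b'))); // [ab]
    }
}
```

Listing members is only a teaching device here. A real recognizer answers "is this string in the language?" without enumerating the language.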

You can certainly extend this metalanguage. For example, you might want to allow 9 to mean any digit, so that "(999)999-9999" would match most U.S. phone numbers. How you craft your metalanguage depends on what patterns you want to provide your user.
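The "9 means any digit" extension can be illustrated in isolation. The sketch below assumes that every other pattern character, parentheses and hyphen included, matches only itself. The class `DigitWildcard` and its `matches` method are hypothetical names.

```java
// Hypothetical sketch: '9' in the pattern accepts any digit 0-9;
// every other pattern character must match the input exactly.
class DigitWildcard {
    static boolean matches(String pattern, String input) {
        if (pattern.length() != input.length()) return false;
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            char c = input.charAt(i);
            boolean ok = (p == '9') ? (c >= '0' && c <= '9') : (p == c);
            if (!ok) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        String phone = "(999)999-9999";
        System.out.println(matches(phone, "(202)555-0143")); // true
        System.out.println(matches(phone, "(202)555-014"));  // false: one digit short
        System.out.println(matches(phone, "202-555-0143"));  // false: no parentheses
    }
}
```

This design choice has a cost. The pattern can no longer express a literal 9. A metalanguage that also uses parentheses for grouping would need an escape convention before "(999)" could mean literal parentheses.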

To build a simple regular expression parser, you need only three types of terminal: Letter, Digit, and SpecificChar. The Letter class matches any single letter; Digit matches any character from 0 to 9; and SpecificChar matches a specified character. Figure 8.1 shows these classes, along with Char, which is not needed in this section. The subclasses of Terminal in Figure 8.1 are members of the package sjm.parse.chars.

Figure 8.1. Character terminals. Shown here are subclasses of Terminal that work with CharacterAssembly objects.
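The three terminals described above each reduce to a test on a single character. The classes below are simplified, hypothetical stand-ins: `CharTerm`, `LetterTerm`, `DigitTerm`, `SpecificCharTerm`, and the method `accepts`. They are not the sjm.parse.chars classes, whose interfaces are not reproduced here.

```java
import java.util.List;

// Simplified stand-ins for the character terminals: each one decides
// whether a single character qualifies. Not the sjm.parse.chars API.
abstract class CharTerm {
    abstract boolean accepts(char c);
}

// Any single letter (Character.isLetter also accepts non-ASCII letters).
class LetterTerm extends CharTerm {
    boolean accepts(char c) { return Character.isLetter(c); }
    public String toString() { return "Letter"; }
}

// Any character from 0 to 9.
class DigitTerm extends CharTerm {
    boolean accepts(char c) { return c >= '0' && c <= '9'; }
    public String toString() { return "Digit"; }
}

// Exactly one specified character, such as '*' or '|'.
class SpecificCharTerm extends CharTerm {
    private final char expected;
    SpecificCharTerm(char expected) { this.expected = expected; }
    boolean accepts(char c) { return c == expected; }
    public String toString() { return "'" + expected + "'"; }
}

class CharTermDemo {
    public static void main(String[] args) {
        List<CharTerm> terms = List.of(new LetterTerm(), new DigitTerm(), new SpecificCharTerm('*'));
        for (char c : "a7*".toCharArray())
            for (CharTerm t : terms)
                if (t.accepts(c)) System.out.println(c + " -> " + t);
        // prints: a -> Letter, 7 -> Digit, * -> '*' (one per line)
    }
}
```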


The aim of a regular expression parser is to create a new parser. For example, given a*, your parser first recognizes the a and creates a parser that recognizes only the language { "a" }. Then your parser recognizes the "*" and creates a Repetition parser, passing it the a parser. This parser matches { "", "a", "aa", ... }. As another example, given the string "(a|b)*Z", you want to create a parser that will match any string that starts with any mix of a's and b's (possibly none) and ends with Z.
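The paragraph above describes building parsers while the input is being recognized. Recognizing a produces a parser for { "a" }, and recognizing "*" wraps the most recent parser in a repetition. A minimal sketch of that idea keeps finished parsers on a stack and handles only single characters, postfix "*", and juxtaposition (no alternation or parentheses).

The classes `Pat`, `Ch`, `Rep`, `Seq`, and `StackBuilder` are hypothetical, not the book's Repetition or SpecificChar. Each parser here returns the set of positions where a match starting at `pos` could end.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// A "parser" maps a start position in s to every position where a match can end.
abstract class Pat {
    abstract Set<Integer> ends(String s, int pos);

    // The whole string matches if some match starting at 0 ends at its length.
    boolean matches(String s) { return ends(s, 0).contains(s.length()); }
}

// Recognizes only the language { "c" }.
class Ch extends Pat {
    private final char c;
    Ch(char c) { this.c = c; }
    Set<Integer> ends(String s, int pos) {
        Set<Integer> r = new HashSet<>();
        if (pos < s.length() && s.charAt(pos) == c) r.add(pos + 1);
        return r;
    }
}

// Zero or more repetitions of inner: { "", x, xx, ... }.
class Rep extends Pat {
    private final Pat inner;
    Rep(Pat inner) { this.inner = inner; }
    Set<Integer> ends(String s, int pos) {
        Set<Integer> seen = new HashSet<>();
        Deque<Integer> todo = new ArrayDeque<>();
        seen.add(pos);            // zero repetitions end where they start
        todo.push(pos);
        while (!todo.isEmpty())
            for (int e : inner.ends(s, todo.pop()))
                if (seen.add(e)) todo.push(e);   // explore each new end point once
        return seen;
    }
}

// first followed by second.
class Seq extends Pat {
    private final Pat first, second;
    Seq(Pat first, Pat second) { this.first = first; this.second = second; }
    Set<Integer> ends(String s, int pos) {
        Set<Integer> r = new HashSet<>();
        for (int mid : first.ends(s, pos)) r.addAll(second.ends(s, mid));
        return r;
    }
}

class StackBuilder {
    static Pat build(String pattern) {
        Deque<Pat> stack = new ArrayDeque<>();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                if (stack.isEmpty()) throw new IllegalArgumentException("'*' has nothing to repeat");
                stack.push(new Rep(stack.pop()));   // wrap the most recently built parser
            } else {
                stack.push(new Ch(c));              // a character stands for itself
            }
        }
        if (stack.isEmpty()) throw new IllegalArgumentException("empty pattern");
        // Juxtaposition: join the stacked parsers into a sequence, oldest first.
        Iterator<Pat> it = stack.descendingIterator();
        Pat result = it.next();
        while (it.hasNext()) result = new Seq(result, it.next());
        return result;
    }

    public static void main(String[] args) {
        Pat aStar = build("a*");
        System.out.println(aStar.matches(""));    // true
        System.out.println(aStar.matches("aaa")); // true
        System.out.println(aStar.matches("ab"));  // false
    }
}
```

Each parser returns every possible end position rather than committing to one. Repetition therefore never has to backtrack, and a pattern such as ab*c still finds the c after consuming the b's.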

The tasks for building a regular expression parser are as follows:

  • Write a grammar.

  • Write the assemblers that will build a parser from a user's input.

  • Generate the parser from the grammar, plugging in the assemblers.
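The three tasks can be compressed into one hand-written sketch. A grammar, assumed here rather than taken from the book, drives a recursive-descent recognizer. After each recognition step, a small "assembler" pops finished parsers off a stack and pushes the parser that combines them.

`RxParser` and `RegexBuilder` are hypothetical names. `Character.isLetterOrDigit` approximates the Letter and Digit terminals. The book itself generates the parser from its grammar with the sjm.parse classes rather than writing the recursion by hand.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// A parser maps a start position in s to every position where a match can end.
interface RxParser {
    Set<Integer> ends(String s, int pos);

    // The whole string matches if some match from position 0 ends at its length.
    default boolean matches(String s) { return ends(s, 0).contains(s.length()); }
}

// Assumed grammar for this sketch:
//   expression = term ('|' term)*
//   term       = factor factor*
//   factor     = phrase '*'*
//   phrase     = letterOrDigit | '(' expression ')'
// Each method recognizes one rule; the assembler step after a recognition
// pops finished parsers off the stack and pushes the parser combining them.
class RegexBuilder {
    private final String src;
    private final Deque<RxParser> stack = new ArrayDeque<>();
    private int i = 0;

    private RegexBuilder(String src) { this.src = src; }

    static RxParser build(String pattern) {
        RegexBuilder b = new RegexBuilder(pattern);
        b.expression();
        if (b.i != pattern.length())
            throw new IllegalArgumentException("unexpected '" + pattern.charAt(b.i) + "' at " + b.i);
        return b.stack.pop();
    }

    private boolean at(char c) { return i < src.length() && src.charAt(i) == c; }
    private boolean eat(char c) { if (at(c)) { i++; return true; } return false; }
    private boolean atSymbol() { return i < src.length() && Character.isLetterOrDigit(src.charAt(i)); }

    private void expression() {
        term();
        while (eat('|')) {
            term();
            RxParser right = stack.pop(), left = stack.pop();   // alternation assembler
            stack.push((s, p) -> {
                Set<Integer> r = new HashSet<>(left.ends(s, p));
                r.addAll(right.ends(s, p));
                return r;
            });
        }
    }

    private void term() {
        factor();
        while (atSymbol() || at('(')) {
            factor();
            RxParser second = stack.pop(), first = stack.pop(); // sequence assembler
            stack.push((s, p) -> {
                Set<Integer> r = new HashSet<>();
                for (int mid : first.ends(s, p)) r.addAll(second.ends(s, mid));
                return r;
            });
        }
    }

    private void factor() {
        phrase();
        while (eat('*')) {
            RxParser inner = stack.pop();                       // repetition assembler
            stack.push((s, p) -> {
                Set<Integer> seen = new HashSet<>();            // zero repetitions end at p
                Deque<Integer> todo = new ArrayDeque<>();
                seen.add(p);
                todo.push(p);
                while (!todo.isEmpty())
                    for (int e : inner.ends(s, todo.pop()))
                        if (seen.add(e)) todo.push(e);
                return seen;
            });
        }
    }

    private void phrase() {
        if (eat('(')) {
            expression();
            if (!eat(')')) throw new IllegalArgumentException("missing ')' at " + i);
        } else if (atSymbol()) {
            char c = src.charAt(i++);
            stack.push((s, p) -> {                              // character assembler
                Set<Integer> r = new HashSet<>();
                if (p < s.length() && s.charAt(p) == c) r.add(p + 1);
                return r;
            });
        } else {
            throw new IllegalArgumentException("expected a letter, digit or '(' at " + i);
        }
    }
}
```

For example, `RegexBuilder.build("(a|b)*Z")` yields a parser that accepts "abbaZ" and "Z" but rejects "abc". `build("(ab)*Z")` accepts "ababZ" but not "baZ".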


ISBN: 0201719622
Year: 2000
Pages: 169