4.2 Random Testing

Building Parsers with Java
By Steven  John  Metsker

Table of Contents
Chapter  4.   Testing a Parser


In addition to testing whether your language supports the features you intend, it is a good idea to conduct random tests by running random language elements through your parser. Figure 4.1 shows the abstract class ParserTester , from package sjm.parse . This class accepts a parser in its constructor and tests the parser in response to a test() message.

Figure 4.1. The ParserTester class. This class tests the parser that it accepts in its constructor.


The ParserTester class is abstract because its assembly() method is abstract. Subclasses implement this method to treat the string as a sequence of either tokens or characters . The setLogTestStrings() method determines whether the tester object prints out each random test string. The test() method applies a parser against many test inputs. This method creates a new assembly for each test string and sets that assembly's target to the value of freshTarget() . The ParserTester class implements this method to return null . Subclasses can override this to insert a new target object for each test assembly.

Figure 4.2 shows the two primary subclasses of ParserTester . These classes implement the assembly() method, which determines whether to create a CharacterAssembly or a TokenAssembly from a given string.

Figure 4.2. Concrete subclasses of ParserTester . Class CharacterTester from sjm.parse.chars and class TokenTester from sjm.parse.tokens determine how to create an assembly from a random string.


In response to a test() message, the tester classes generate a large number of random language elements and ask the parser to test them. For example, here is a program that tests an arithmetic parser:

 package sjm.examples.tests;  import sjm.parse.*; import sjm.parse.tokens.*; import sjm.examples.arithmetic.*; /** * Test the <code>ArithmeticParser</code> class. */ public class ShowArithmeticTest { public static void main(String[] args) {     new TokenTester(ArithmeticParser.start()).test(); } } 

Running this class prints a series of random tests, such as the following:

 Testing depth 2...      Testing string 66.1 / 81.0 + 63.1 ^ 50.1 ^ 61.1 ^ 45.0                     ^ 4.3 ^ 96.3     Testing string ( 90.1 ^ 9.5 ^ 89.6 ^ 84.2 ^ 33.9 ^ 87.9                     ^ 28.5 ^ 43.5 ) ^ 98.4 / 92.3 + 85.8 ^                     45.8 ^ 82.1     Testing string ( 7.5 ) ^ 15.0 / 62.2 * 22.9 ^ 39.9 * 57.7                    ^ 15.4 + 16.1 ^ 17.4     Testing string 55.8 ^ 87.5 ^ 89.0 ^ 93.3 / 70.4 / 53.6 /                    20.8 + 12.4     Testing string 75.5 ^ 53.1 * 40.1 ^ 79.4 + 26.1 ^ 13.6 -                    50.0 ^ 28.3 ^ 56.9 ^ 8.1 ... 

These tests will continue for some time, and for the given parser they will find no errors. The depth of the test relates to how many parser expansions the test allows before it tries to resolve the test by preferring expansions to terminals over nonterminals . The class shown will generate hundreds of test strings, confirming that the parser ArithmeticParser.start() can parse each result.

The tester classes rely on support from the Parser hierarchy for these random expansions. Each subclass of Parser must be able to provide a random element of the parser's language. For example, the class Word produces a random word, and the class Literal always returns its literal value. The classes Sequence , Alternation , and Repetition are subtler, and a detailed explanation of their design is outside the scope of this book. The code for these classes is available on the CD that comes with this book.

4.2.1 Ambiguity Testing

A grammar is ambiguous if a parser for the grammar allows more than one way to completely match an assembly. This is usually an error because a language user normally expects a single interpretation of a given string. Unfortunately, "there exists no general method for determining whether an arbitrary [grammar] is ambiguous or not" [Slonneger and Kurtz]. In practice, random testing provides an effective approach to discovering ambiguity in a grammar. The tester classes in the ParserTester hierarchy will find ambiguity in a parser, although this is not guaranteed to work because the tests are random.

A classic case of ambiguity is the dangling else problem. Consider an imperative language that includes statements such as this one:

 if (overdueDays > 90)      if (balance >= 1000)         callCustomer(); else sendBill(); 

To a human reader, there may be some question as to whether the else clause belongs to the first or second if . Java associates the else with the nearest preceding if , although the indentation in this example implies otherwise . Although a person reading this code may incorrectly predict how Java will interpret the code, Java will find exactly one interpretation. It is also possible that a parser will find two ways to parse this code, giving one parse with the else associated with the first if and another parse with it associated with the second. Such a parser comes from an ambiguous grammar.

The following ambiguous grammar comprehends the preceding if statement:

 statement    = "if" comparison statement optionalElse                   callCustomer  sendBill; comparison   = '(' expression operator expression '); expression   = Word  Num; operator     = '<'  '>'  '='  "<=" ">="  "!="; optionalElse = "else" statement  Empty; callCustomer = "callCustomer" '(' ')' ';'; sendBill     = "sendBill" '(' ')' ';'; 

You can generate a parser class from this grammar: the class Dangle in sjm.examples.tests contains this code. To parse an if statement, you can apply the parser object


This parser sometimes finds more than one way to parse an input string. You can verify this by generating and parsing random elements of the language. Here is a program that tests the dangling else grammar:

 package sjm.examples.tests;  import sjm.parse.*; import sjm.parse.tokens.*; /** * Test the statement parser from class <code>Dangle</code> */ public class ShowDangleTest { public static void main(String[] args) {     Parser p = new Dangle().statement();     TokenTester tt = new TokenTester(p);     tt.setLogTestStrings(false);     tt.test(); } } 

This class creates a TokenTester object with the Dangle.statement() parser. The code asks the tester not to print the test strings and launches the test. A sample run of the program produces

 Testing depth 2...  Problem found for string:     if ( mnyawp > zzprr ) if ( 53.6 >= olzu ) sendBill ( ) ;     else callCustomer ( ) ; The parser found 2 ways to parse this string. 

A random test of the parser will quickly determine that the grammar is ambiguous, but it will not show how the parses of a given input differ . The package sjm.examples.pretty includes a PrettyParser class that resets the assemblers in a parser composite to show the order in which a parser parses input text. This reformats, or pretty prints the input string into a parse tree that indicates the order of the parse.

For example, the following program prints the two parses that a dangling else grammar produces:

 package sjm.examples.pretty;  import java.util.*; import sjm.parse.*; import sjm.parse.tokens.*; import sjm.examples.tests.*; /**  * Show that the <code>Dangle.statement()</code> parser  * is ambiguous.  */ public class ShowDangle { public static void main(String[] args) {     String s;     s  = "if (overdueDays > 90)    \n";     s += "    if (balance >= 1000) \n";     s += "        callCustomer();  \n";     s += "else sendBill();";     TokenAssembly ta = new TokenAssembly(s);     PrettyParser p = new PrettyParser(Dangle.statement());     Vector out = p.parseTrees(ta);     Enumeration e = out.elements();     while (e.hasMoreElements()) {         System.out.println("The input parses as:");         System.out.println("---------------------------");         System.out.println(e.nextElement());     } } } 

The main() method creates a token assembly from a sample string and creates a PrettyParser object from the Dangle.statement() parser. The code asks the PrettyParser object to produce a collection of parse trees for the input token assembly. The program displays the following:

 The input parses as:  ---------------------------     if         (             overdueDays             >             90.0         )             if                 (                     balance                     >=                     1000.0                 )                     callCustomer                     (                     )                     ;             else                     sendBill                     (                     )                     ; The input parses as: ---------------------------     if         (             overdueDays             >             90.0         )             if                 (                     balance                     >=                     1000.0                 )                     callCustomer                     (                     )                     ;     else             sendBill             (             )             ; 

The TokenTester class detects a problem with the dangling else grammar, and the PrettyParser class shows the different paths the grammar takes for a given input. Still, the problem remains that the grammar is ambiguous, and neither the TokenTester class nor the PrettyParser class shows how to remove this ambiguity. A simple solution is to require an if statement to conclude with the word endif . The book Compilers [Aho et al.] provides other cures for several forms of ambiguity, including curing the dangling else problem by rewriting the grammar as follows :

 statement = matched  unmatched;  matched   = "if" expression "then" matched "else" matched             other; unmatched = "if" expression "then" statement            "if" expression "then" matched "else" unmatched; 

If you need to remove ambiguity from a grammar, you may be able to follow this pattern to solve the problem.

4.2.2 Terminal Ambiguity

In addition to the classic dangling else case and other cases when ambiguity is inherent in a grammar, a common case of ambiguity occurs when two terminal parsers recognize a given language element. Consider a grammar for queries about relative volumes :

 query  = (Word  volume)* '?';  volume = "cups"  "gallon"  "liter"; 

The idea of this grammar is that we can interpret user queries such as this one:

 "How many cups are in a gallon?" 

To answer such questions you could discard meaningless words and use assemblers to manipulate recognizable units. You first need to address the ambiguity that occurs when you translate this grammar to code. Here is a parser class that implements the grammar:

 package sjm.examples.tests;  import sjm.parse.*; import sjm.parse.tokens.*;  /**   * This class provides an ambiguous parser in its <code>   * query</code> method, ....   */ public class VolumeQuery {  /*   * Return a parser that recognizes the grammar:   *   *     query = (Word  volume)* '?';   */ public static Parser query() {     Parser a = new Alternation()         .add(new Word())         .add(volume());     Parser s = new Sequence()         .add(new Repetition(a))         .add(new Symbol('?'));     return s; } /*  * Return a parser that recognizes the grammar:  *  *     volume = "cups"  "gallon"  "liter";  */ public static Parser volume() {     Parser a = new Alternation()         .add(new Literal("cups"))         .add(new Literal("gallon"))         .add(new Literal("liter"));     return a; } } 

The problem with this class is that the volume words, such as "cups" , qualify as both the literal values in the volume() parser and as Word values in the query() parser. You can see the ambiguity by sending the query parser to a tester:

 package sjm.examples.tests;  import sjm.parse.*; import sjm.parse.tokens.*; /**  * Test the query parser from class <code>VolumeQuery  * </code>.  */ public class ShowVolumeTest { public static void main(String[] args) {     Parser p = VolumeQuery.query();     TokenTester tt = new TokenTester(p);     tt.test(); } } 

A sample run of this class prints the following:

 Testing depth 2...      Testing string glmtt qbz ?     Testing string wqrab gallon eeoaqr ? Problem found for string:     wqrab gallon eeoaqr ? The parser found 2 ways to parse this string. 

A random test of this parser cannot go far. When it uses the volume rule it produces an element that qualifies both as a volume and as a meaningless word. To remove the ambiguity from this parser, you can create a new terminal type to distinguish reserved words from nonreserved words. Chapter 11, "Extending the Parser Toolkit," explains how to implement such a solution.


Building Parsers with Java
Building Parsers With Javaв„ў
ISBN: 0201719622
EAN: 2147483647
Year: 2000
Pages: 169

Similar book on Amazon

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net