4.2 IMPLEMENTATION | Algorithms for Compiler Design (Electrical and Computer Engineering Series)

4.1 TOP-DOWN PARSING

Top-down parsing attempts to find the left-most derivations for an input string w , which is equivalent to constructing a parse tree for the input string w that starts from the root and creates the nodes of the parse tree in a predefined order. The reason that top-down parsing seeks the left-most derivations for an input string w and not the right-most derivations is that the input string w is scanned by the parser from left to right, one symbol/token at a time, and the left-most derivations generate the leaves of the parse tree in left-to-right order, which matches the input scan order.

Since top-down parsing attempts to find the left-most derivations for an input string w , a top-down parser may require backtracking (i.e., repeated scanning of the input); because in the attempt to obtain the left-most derivation of the input string w , a parser may encounter a situation in which a nonterminal A is required to be derived next , and there are multiple A - productions , such as A ’ ± ₁ ± ₂ ± _n . In such a situation, deciding which A -production to use for the derivation of A is a problem. Therefore, the parser will select one of the A -productions to derive A , and if this derivation finally leads to the derivation of w , then the parser announces the successful completion of parsing. Otherwise, the parser resets the input pointer to where it was when the nonterminal A was derived, and it tries another A -production. The parser will continue this until it either announces the successful completion of the parsing or reports failure after trying all of the alternatives. For example, consider the top-down parser for the following grammar:

Let the input string be w = acb . The parser initially creates a tree consisting of a single node, labeled S , and the input pointer points to a , the first symbol of input string w . The parser then uses the S -production S ’ aAb to expand the tree as shown in Figure 4.1.

Figure 4.1: Parser uses the S-production to expand the parse tree.

The left-most leaf, labeled a , matches the first input symbol of w . Hence, the parser will now advance the input pointer to c , the second symbol of string w , and consider the next leaf labeled A . It will then expand A , using the first alternative for A in order to obtain the tree shown in Figure 4.2.

Figure 4.2: Parser uses the first alternative for A in order to expand the tree.

The parser now has the match for the second input symbol. So, it advances the pointer to b , the third symbol of w , and compares it to the label of the next leaf. If the label does not match d , it reports failure and goes back (backtracks) to A , as shown in Figure 4.3. The parser will also reset the input pointer to the second input symbol ”the position it had when the parser encountered A ”and it will try a second alternative to A in order to obtain the tree. If the leaf c matches the second symbol, and if the next leaf b matches the third symbol of w , then the parser will halt and announce the successful completion of parsing.

Figure 4.3: If the parser fails to match a leaf, the point of failure, d, reroutes (backtracks) the pointer to alternative paths from A.