14.3 Classifying errors by linguistic formalism


14.3 Classifying errors by linguistic formalism

14.3.1 Chomsky hierarchy

A grammar is a set of rules that define how to generate valid sentences in a language. Noam Chomsky defined a hierarchy of increasingly complex grammars and corresponding methods for recognizing the sentences that those grammars generate. Grammars are defined by four sets:

  1. Terminals are the strings of characters used in the language.

  2. Nonterminals are names to associate with sequences of strings and other nonterminals.

  3. The start symbol is the name given to the nonterminal that includes all other nonterminals.

  4. The rules define the legitimate sequences for the language.

    Rules have left- and right-hand sides. Rules determine possible substitutions.

14.3.1.1 The Chomsky hierarchy

Type

Grammar

Recognizer

3

Regular

Finite-state automaton

2

Context-free

Push-down automaton

1

Context-sensitive

Turing machine

0

Phrase-structure

None

14.3.1.2 Regular grammars

It is possible to describe the valid words in most high-level programming languages using regular grammars. The phase of a compiler that determines the valid sequences of strings or words is the lexical analyzer (or lexer). Lexical analyzers for languages that can be analyzed with regular grammars are usually created with special tools that generate the analyzer. Handwritten lexical analyzers are sometimes preferred, either for performance reasons or because the language contains some features that make it impractical to use a generated lexical analyzer.

14.3.1.3 Context-free grammars

It is possible to describe the valid sentences in most high-level programming languages using context-free grammars. The phase of a compiler that determines the valid sequences of strings or words is the syntactic analyzer (or parser). Syntactic analyzers for languages that can be analyzed with context-free grammars are usually created with special tools that generate the analyzer. Handwritten syntactic analyzers are sometimes preferred, either for performance reasons or because the language contains some features that make it impractical to use a generated syntactic analyzer.

14.3.1.4 Context-sensitive grammars

Context-sensitive grammars can be used to describe valid programs in many languages. It is difficult to write them and computationally expensive to use them.

Because of the drawbacks of context-sensitive grammars, most modern compilers use a context-free grammar to describe the valid syntax of a procedure, augmented with various techniques to handle the context-sensitive issues.

To verify the context-sensitive aspects of a program, some compilers use a hand-coded program that checks the results of performing the syntactic analysis. Other compilers use a combination of a hand-coded program and an attribute grammar. The phase of a compiler that determines the valid procedures is the semantic analyzer.

The semantic analyzer performs context-sensitive analysis by referring to additional data structures as it examines its representation of the program:

  • Symbol tables

  • Control-flow information

  • Data-flow information

Each of these data structures can represent information for a single procedure or for the entire program.

14.3.1.5 Phrase-structure grammar

Writing these grammars is very difficult. Executing tools that parse these grammars is very intensive computationally. We know of no practical use for them at the present time.




Debugging by Thinking. A Multidisciplinary Approach
Debugging by Thinking: A Multidisciplinary Approach (HP Technologies)
ISBN: 1555583075
EAN: 2147483647
Year: 2002
Pages: 172

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net