A grammar is a set of rules that define how to generate valid sentences in a language. Noam Chomsky defined a hierarchy of increasingly complex grammars and corresponding methods for recognizing the sentences that those grammars generate. A grammar is defined by four components:
Terminals are the strings of characters used in the language.
Nonterminals are names associated with sequences of terminals and other nonterminals.
The start symbol is the nonterminal from which every derivation of a valid sentence begins.
The rules define the legitimate sequences for the language.
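Each rule has a left-hand side and a right-hand side and states that the left-hand side may be replaced by the right-hand side. The rules therefore determine the possible substitutions.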
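The four components, and the idea of a rule as a substitution, can be sketched in Python. This is a minimal illustration under stated assumptions, not a general grammar library: the `Grammar` class, the `apply_rule` and `derive` helpers, and the toy English-like grammar are all hypothetical, and each step simply replaces the leftmost occurrence of the chosen rule's left-hand side.

```python
from __future__ import annotations

from dataclasses import dataclass

Rule = tuple[str, tuple[str, ...]]  # (left-hand side, right-hand side)


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset[str]
    nonterminals: frozenset[str]
    start: str               # the single start symbol
    rules: tuple[Rule, ...]  # the productions


def apply_rule(form: list[str], rule: Rule) -> list[str]:
    """Substitute the right-hand side for the leftmost occurrence of the left-hand side."""
    lhs, rhs = rule
    i = form.index(lhs)  # raises ValueError if the left-hand side is absent
    return form[:i] + list(rhs) + form[i + 1:]


def derive(grammar: Grammar, rule_indices: list[int]) -> list[str]:
    """Start from the start symbol and apply the chosen rules in order."""
    form = [grammar.start]
    for index in rule_indices:
        form = apply_rule(form, grammar.rules[index])
    return form


# Hypothetical toy grammar, invented for illustration.
toy = Grammar(
    terminals=frozenset({"the", "dog", "barks"}),
    nonterminals=frozenset({"Sentence", "NounPhrase", "Noun", "VerbPhrase"}),
    start="Sentence",
    rules=(
        ("Sentence", ("NounPhrase", "VerbPhrase")),
        ("NounPhrase", ("the", "Noun")),
        ("Noun", ("dog",)),
        ("VerbPhrase", ("barks",)),
    ),
)

sentence = derive(toy, [0, 1, 2, 3])
print(sentence)  # ['the', 'dog', 'barks']
```

The derivation stops once every symbol in the sequence is a terminal; at that point the sequence is a sentence of the language.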
It is possible to describe the valid words in most high-level programming languages using regular grammars. The phase of a compiler that determines the valid words, that is, the valid sequences of characters, is the lexical analyzer (or lexer). Lexical analyzers for languages that can be analyzed with regular grammars are usually created with tools that generate the analyzer from a description of the grammar. Handwritten lexical analyzers are sometimes preferred, either for performance reasons or because the language has features that make a generated lexical analyzer impractical.
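Because each class of word can be described by a regular expression, a small handwritten lexer can be sketched with Python's standard `re` module. The token classes (`NUMBER`, `IDENT`, `OP`) and the `tokenize` function are hypothetical, chosen for a toy language; this is not the output of a lexer-generator tool. The combined pattern tries the alternatives in the order listed, and the catch-all `MISMATCH` class turns any unexpected character into an error.

```python
import re

# Each token class is a regular expression; order matters in the alternation.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[+\-*/=()]"),
    ("SKIP", r"\s+"),       # whitespace between words is discarded
    ("MISMATCH", r"."),     # any other character is a lexical error
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text: str) -> list[tuple[str, str]]:
    """Split text into (kind, value) pairs, raising SyntaxError on invalid characters."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup  # name of the group that matched
        value = match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError(f"unexpected character {value!r} at offset {match.start()}")
        tokens.append((kind, value))
    return tokens


print(tokenize("x = 42 + y1"))
# [('IDENT', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('IDENT', 'y1')]
```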
It is possible to describe the valid sentences in most high-level programming languages using context-free grammars. The phase of a compiler that determines the valid sentences, that is, the valid sequences of words, is the syntactic analyzer (or parser). Syntactic analyzers for languages that can be analyzed with context-free grammars are usually created with tools that generate the analyzer from the grammar. Handwritten syntactic analyzers are sometimes preferred, either for performance reasons or because the language has features that make a generated syntactic analyzer impractical.
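Handwritten syntactic analyzers are commonly written as recursive-descent parsers, with one function per nonterminal. The sketch below assumes a hypothetical toy expression grammar (given in the docstring) and takes a list of already-separated words, such as a lexer might produce; the `Parser` class and its nested-tuple parse tree are illustrative, not a production design.

```python
class Parser:
    """Recursive-descent parser for the hypothetical context-free grammar:
         expr   -> term (("+" | "-") term)*
         term   -> factor (("*" | "/") factor)*
         factor -> NUMBER | "(" expr ")"
    """

    def __init__(self, tokens: list[str]):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        """Return the current word, or None at the end of input."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token: str) -> None:
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def parse(self):
        tree = self.expr()
        if self.peek() is not None:  # leftover words mean the sentence is invalid
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return tree

    def expr(self):
        node = self.term()
        while self.peek() in ("+", "-"):  # loop gives left associativity
            op = self.peek()
            self.pos += 1
            node = (op, node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.peek()
            self.pos += 1
            node = (op, node, self.factor())
        return node

    def factor(self):
        token = self.peek()
        if token is not None and token.isdigit():
            self.pos += 1
            return int(token)
        if token == "(":
            self.pos += 1
            node = self.expr()
            self.expect(")")
            return node
        raise SyntaxError(f"unexpected token {token!r}")


print(Parser(["2", "+", "3", "*", "(", "4", "-", "1", ")"]).parse())
# ('+', 2, ('*', 3, ('-', 4, 1)))
```

Because `term` is called from inside `expr`, multiplication binds more tightly than addition without any extra precedence machinery.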
Context-sensitive grammars can be used to describe the valid programs of many languages, but they are difficult to write and computationally expensive to use.
Because of the drawbacks of context-sensitive grammars, most modern compilers use a context-free grammar to describe the valid syntax of a procedure, augmented with various techniques to handle the context-sensitive issues.
To verify the context-sensitive aspects of a program, some compilers use a hand-coded program that checks the results of performing the syntactic analysis. Other compilers use a combination of a hand-coded program and an attribute grammar. The phase of a compiler that determines the valid procedures is the semantic analyzer.
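As an example of a hand-coded check applied to the results of syntactic analysis, consider the rule that a name must be declared before it is used, a constraint commonly cited as beyond what a context-free grammar can express. The sketch assumes a hypothetical parser output: a list of `("declare", name)` and `("use", name)` pairs in source order; `check_declared_before_use` is an illustrative name.

```python
def check_declared_before_use(statements: list[tuple[str, str]]) -> list[str]:
    """Return error messages for names used before declaration or declared twice.

    `statements` is a hypothetical parser output: ("declare", name) and
    ("use", name) pairs, in source order.
    """
    declared: set[str] = set()
    errors: list[str] = []
    for index, (kind, name) in enumerate(statements):
        if kind == "declare":
            if name in declared:
                errors.append(f"statement {index}: '{name}' declared twice")
            declared.add(name)
        elif kind == "use":
            if name not in declared:
                errors.append(f"statement {index}: '{name}' used before declaration")
        else:
            raise ValueError(f"unknown statement kind {kind!r}")
    return errors


program = [("declare", "x"), ("use", "x"), ("use", "y"), ("declare", "x")]
for message in check_declared_before_use(program):
    print(message)
# statement 2: 'y' used before declaration
# statement 3: 'x' declared twice
```

The parser accepts all four statements as syntactically valid; only this second pass, which remembers earlier declarations, can reject the program.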
The semantic analyzer performs context-sensitive analysis by referring to additional data structures as it examines its representation of the program:
Each of these data structures can represent information for a single procedure or for the entire program.
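A symbol table is a common example of such a data structure. The sketch below uses a hypothetical `SymbolTable` class that records a type name for each identifier: a table with no parent holds program-wide information, and a child table holds information for a single procedure, with lookups falling back from the procedure's table to the program's.

```python
from typing import Optional


class SymbolTable:
    """Maps identifiers to type names, with an optional enclosing table.

    A table with no parent holds program-wide information; a child table
    holds information for a single procedure.
    """

    def __init__(self, scope_name: str, parent: Optional["SymbolTable"] = None):
        self.scope_name = scope_name
        self.parent = parent
        self.entries: dict[str, str] = {}

    def declare(self, name: str, type_name: str) -> None:
        """Record a name in this table, rejecting duplicates within the same table."""
        if name in self.entries:
            raise ValueError(f"{name!r} already declared in {self.scope_name}")
        self.entries[name] = type_name

    def lookup(self, name: str) -> Optional[str]:
        """Search this table, then each enclosing table; return None if not found."""
        table: Optional[SymbolTable] = self
        while table is not None:
            if name in table.entries:
                return table.entries[name]
            table = table.parent
        return None


program = SymbolTable("program")
program.declare("count", "int")
main_proc = SymbolTable("main", parent=program)
main_proc.declare("i", "int")
print(main_proc.lookup("i"), main_proc.lookup("count"), program.lookup("i"))
# int int None
```

Keeping one table per procedure lets a procedure declare a name that shadows a program-wide one without disturbing the program-wide entry.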
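Unrestricted grammars, the most general level of Chomsky's hierarchy, are very difficult to write, and recognizing the sentences they generate is extremely expensive computationally; in general it is not even decidable. We know of no practical use for them at the present time.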