Given a language, define a represention for its grammar along with an interpreter that uses the representation to interpret sentences in the language.
If a particular kind of problem occurs often enough, then it might be worthwhile to express instances of the problem as sentences in a simple language. Then you can build an interpreter that solves the problem by interpreting these sentences.
For example, searching for strings that match a pattern is a common problem. Regular expressions are a standard language for specifying patterns of strings. Rather than building custom algorithms to match each pattern against strings, search algorithms could interpret a regular expression that specifies a set of strings to match.
The Interpreter pattern describes how to define a grammar for simple languages, represent sentences in the language, and interpret these sentences. In this example, the pattern describes how to define a grammar for regular expressions, represent a particular regular expression, and how to interpret that regular expression.
Suppose the following grammar defines the regular expressions:
expression ::= literal | alternation | sequence | repetition | '(' expression ')' alternation ::= expression '|' expression sequence ::= expression '&' expression repetition ::= expression '*' literal ::= 'a' | 'b' | 'c' | ... { 'a' | 'b' | 'c' | ... }*
The symbol expression is the start symbol, and literal is a terminal symbol defining simple words.
The Interpreter pattern uses a class to represent each grammar rule. Symbols on the right-hand side of the rule are instance variables of these classes. The grammar above is represented by five classes: an abstract class RegularExpression and its four subclasses LiteralExpression, AlternationExpression, SequenceExpression, and RepetitionExpression. The last three classes define variables that hold subexpressions.
Every regular expression defined by this grammar is represented by an abstract syntax tree made up of instances of these classes. For example, the abstract syntax tree
represents the regular expression
raining & (dogs | cats) *
We can create an interpreter for these regular expressions by defining the Interpret operation on each subclass of RegularExpression. Interpret takes as an argument the context in which to interpret the expression. The context contains the input string and information on how much of it has been matched so far. Each subclass of RegularExpression implements Interpret to match the next part of the input string based on the current context. For example,
and so on.
Use the Interpreter pattern when there is a language to interpret, and you can represent statements in the language as abstract syntax trees. The Interpreter pattern works best when
The Interpreter pattern has the following benefits and liabilities:
The Interpreter and Composite patterns share many implementation issues. The following issues are specific to Interpreter:
Terminal nodes generally don't store information about their position in the abstract syntax tree. Parent nodes pass them whatever context they need during interpretation. Hence there is a distinction between shared (intrinsic) state and passed-in (extrinsic) state, and the Flyweight (195) pattern applies.
For example, each instance of LiteralExpression for dog receives a context containing the substring matched so far. And every such LiteralExpression does the same thing in its Interpret operationit checks whether the next part of the input contains a dogno matter where the instance appears in the tree.
Here are two examples. The first is a complete example in Smalltalk for checking whether a sequence matches a regular expression. The second is a C++ program for evaluating Boolean expressions.
The regular expression matcher tests whether a string is in the language defined by the regular expression. The regular expression is defined by the following grammar:
expression ::= literal | alternation | sequence | repetition | '(' expression ')' alternation ::= expression '|' expression sequence ::= expression '&' expression repetition ::= expression 'repeat' literal ::= 'a' | 'b' | 'c' | ... { 'a' | 'b' | 'c' | ... }*
This grammar is a slight modification of the Motivation example. We changed the concrete syntax of regular expressions a little, because symbol "*" can't be a postfix operation in Smalltalk. So we use repeat instead. For example, the regular expression
(('dog ' | 'cat ') repeat & 'weather')
matches the input string "dog dog cat weather".
To implement the matcher, we define the five classes described on . The class SequenceExpression has instance variables expression1 and expression2 for its children in the abstract syntax tree. AlternationExpression stores its alternatives in the instance variables alternative1 and alternative2, while RepetitionExpression holds the expression it repeats in its repetition instance variable. LiteralExpression has a components instance variable that holds a list of objects (probably characters). These represent the literal string that must match the input sequence.
The match: operation implements an interpreter for the regular expression. Each of the classes defining the abstract syntax tree implements this operation. It takes inputState as an argument representing the current state of the matching process, having read part of the input string.
This current state is characterized by a set of input streams representing the set of inputs that the regular expression could have accepted so far. (This is roughly equivalent to recording all states that the equivalent finite state automata would be in, having recognized the input stream to this point).
The current state is most important to the repeat operation. For example, if the regular expression were
'a' repeat
then the interpreter could match "a", "aa", "aaa", and so on. If it were
'a' repeat & 'bc'
then it could match "abc", "aabc", "aaabc", and so on. But if the regular expression were
'a' repeat & 'abc'
then matching the input "aabc" against the subexpression "'a' repeat" would yield two input streams, one having matched one character of the input, and the other having matched two characters. Only the stream that has accepted one character will match the remaining "abc".
Now we consider the definitions of match: for each class defining the regular expression. The definition for SequenceExpression matches each of its subexpressions in sequence. Usually it will eliminate input streams from its inputState.
match: inputState ^ expression2 match: (expression1 match: inputState).
An AlternationExpression will return a state that consists of the union of states from either alternative. The definition of match: for AlternationExpression is
match: inputState | finalState | finalState := alternative1 match: inputState. finalState addAll: (alternative2 match: inputState). ^ finalState
The match: operation for RepetitionExpression tries to find as many states that could match as possible:
match: inputState | aState finalState | aState := inputState. finalState := inputState copy. [aState isEmpty] whileFalse: [aState := repetition match: aState. finalState addAll: aState]. ^ finalState
Its output state usually contains more states than its input state, because a RepetitionExpression can match one, two, or many occurrences of repetition on the input state. The output states represent all these possibilities, allowing subsequent elements of the regular expression to decide which state is the correct one.
Finally, the definition of match: for LiteralExpression tries to match its components against each possible input stream. It keeps only those input streams that have a match:
match: inputState | finalState tStream | finalState := Set new. inputState do: [:stream | tStream := stream copy. (tStream nextAvailable: components size ) = components ifTrue: [finalState add: tStream] ]. ^ finalState
The nextAvailable: message advances the input stream. This is the only match: operation that advances the stream. Notice how the state that's returned contains a copy of the input stream, thereby ensuring that matching a literal never changes the input stream. This is important because each alternative of an AlternationExpression should see identical copies of the input stream.
Now that we've defined the classes that make up an abstract syntax tree, we can describe how to build it. Rather than write a parser for regular expressions, we'll define some operations on the RegularExpression classes so that evaluating a Smalltalk expression will produce an abstract syntax tree for the corresponding regular expression. That lets us use the built-in Smalltalk compiler as if it were a parser for regular expressions.
To build the abstract syntax tree, we'll need to define "|", "repeat", and "&" as operations on RegularExpression. These operations are defined in class RegularExpression like this:
& aNode ^ SequenceExpression new expression1: self expression2: aNode asRExp repeat ^ RepetitionExpression new repetition: self | aNode ^ AlternationExpression new alternative1: self alternative2: aNode asRExp asRExp ^ self
The asRExp operation will convert literals into RegularExpressions. These operations are defined in class String:
& aNode ^ SequenceExpression new expression1: self asRExp expression2: aNode asRExp repeat ^ RepetitionExpression new repetition: self | aNode ^ AlternationExpression new alternative1: self asRExp alternative2: aNode asRExp asRExp ^ LiteralExpression new components: self
If we defined these operations higher up in the class hierarchy (SequenceableCollection in Smalltalk-80, IndexedCollection in Smalltalk/V), then they would also be defined for classes such as Array and OrderedCollection. This would let regular expressions match sequences of any kind of object.
The second example is a system for manipulating and evaluating Boolean expressions implemented in C++. The terminal symbols in this language are Boolean variables, that is, the constants true and false. Nonterminal symbols represent expressions containing the operators and, or, and not. The grammar is defined as follows1:
BooleanExp ::= VariableExp | Constant | OrExp | AndExp | NotExp | '(' BooleanExp ')' AndExp ::= BooleanExp 'and' BooleanExp OrExp ::= BooleanExp 'or' BooleanExp NotExp ::= 'not' BooleanExp Constant ::= 'true' | 'false' VariableExp ::= 'A' | 'B' | ... | 'X' | 'Y' | 'Z'
We define two operations on Boolean expressions. The first, Evaluate, evaluates a Boolean expression in a context that assigns a true or false value to each variable. The second operation, Replace, produces a new Boolean expression by replacing a variable with an expression. Replace shows how the Interpreter pattern can be used for more than just evaluating expressions. In this case, it manipulates the expression itself.
We give details of just the BooleanExp, VariableExp, and AndExp classes here. Classes OrExp and NotExp are similar to AndExp. The Constant class represents the Boolean constants.
BooleanExp defines the interface for all classes that define a Boolean expression:
class BooleanExp { public: BooleanExp(); virtual ~BooleanExp(); virtual bool Evaluate(Context&) = 0; virtual BooleanExp* Replace(const char*, BooleanExp&) = 0; virtual BooleanExp* Copy() const = 0; };
The class Context defines a mapping from variables to Boolean values, which we represent with the C++ constants true and false. Context has the following interface:
class Context { public: bool Lookup(const char*) const; void Assign(VariableExp*, bool); };
A VariableExp represents a named variable:
class VariableExp : public BooleanExp { public: VariableExp(const char*); virtual ~VariableExp(); virtual bool Evaluate(Context&); virtual BooleanExp* Replace(const char*, BooleanExp&); virtual BooleanExp* Copy() const; private: char* _name; };
The constructor takes the variable's name as an argument:
VariableExp::VariableExp (const char* name) { _name = strdup(name); }
Evaluating a variable returns its value in the current context.
bool VariableExp::Evaluate (Context& aContext) { return aContext.Lookup(_name); }
Copying a variable returns a new VariableExp:
BooleanExp* VariableExp::Copy () const { return new VariableExp(_name); }
To replace a variable with an expression, we check to see if the variable has the same name as the one it is passed as an argument:
BooleanExp* VariableExp::Replace ( const char* name, BooleanExp& exp ) { if (strcmp(name, _name) == 0) { return exp.Copy(); } else { return new VariableExp(_name); } }
An AndExp represents an expression made by ANDing two Boolean expressions together.
class AndExp : public BooleanExp { public: AndExp(BooleanExp*, BooleanExp*); virtual ~ AndExp(); virtual bool Evaluate(Context&); virtual BooleanExp* Replace(const char*, BooleanExp&); virtual BooleanExp* Copy() const; private: BooleanExp* _operand1; BooleanExp* _operand2; }; AndExp::AndExp (BooleanExp* op1, BooleanExp* op2) { _operand1 = op1; _operand2 = op2; }
Evaluating an AndExp evaluates its operands and returns the logical "and" of the results.
bool AndExp::Evaluate (Context& aContext) { return _operand1->Evaluate(aContext) && _operand2->Evaluate(aContext); }
An AndExp implements Copy and Replace by making recursive calls on its operands:
BooleanExp* AndExp::Copy () const { return new AndExp(_operand1->Copy(), _operand2->Copy()); } BooleanExp* AndExp::Replace (const char* name, BooleanExp& exp) { return new AndExp( _operand1->Replace(name, exp), _operand2->Replace(name, exp) ); }
Now we can define the Boolean expression
(true and x) or (y and (not x))
and evaluate it for a given assignment of true or false to the variables x and y:
BooleanExp* expression; Context context; VariableExp* x = new VariableExp("X"); VariableExp* y = new VariableExp("Y"); expression = new OrExp( new AndExp(new Constant(true), x), new AndExp(y, new NotExp(x)) ); context.Assign(x, false); context.Assign(y, true); bool result = expression->Evaluate(context);
The expression evaluates to true for this assignment to x and y. We can evaluate the expression with a different assignment to the variables simply by changing the context.
Finally, we can replace the variable y with a new expression and then reevaluate it:
VariableExp* z = new VariableExp("Z"); NotExp not_z(z); BooleanExp* replacement = expression->Replace("Y", not_z); context.Assign(z, true); result = replacement->Evaluate(context);
This example illustrates an important point about the Interpreter pattern: many kinds of operations can "interpret" a sentence. Of the three operations defined for BooleanExp, Evaluate fits our idea of what an interpreter should do most closelythat is, it interprets a program or expression and returns a simple result.
However, Replace can be viewed as an interpreter as well. It's an interpreter whose context is the name of the variable being replaced along with the expression that replaces it, and whose result is a new expression. Even Copy can be thought of as an interpreter with an empty context. It may seem a little strange to consider Replace and Copy to be interpreters, because these are just basic operations on trees. The examples in Visitor illustrate how all three operations can be refactored into a separate "interpreter" visitor, thus showing that the similarity is deep.
The Interpreter pattern is more than just an operation distributed over a class hierarchy that uses the Composite pattern. We consider Evaluate an interpreter because we think of the BooleanExp class hierarchy as representing a language. Given a similar class hierarchy for representing automotive part assemblies, it's unlikely we'd consider operations like Weight and Copy as interpreters even though they are distributed over a class hierarchy that uses the Composite patternwe just don't think of automotive parts as a language. It's a matter of perspective; if we started publishing grammars of automotive parts, then we could consider operations on those parts to be ways of interpreting the language.
The Interpreter pattern is widely used in compilers implemented with object-oriented languages, as the Smalltalk compilers are. SPECTalk uses the pattern to interpret descriptions of input file formats [Sza92]. The QOCA constraint-solving toolkit uses it to evaluate constraints [HHMV92].
Considered in its most general form (i.e., an operation distributed over a class hierarchy based on the Composite pattern), nearly every use of the Composite pattern will also contain the Interpreter pattern. But the Interpreter pattern should be reserved for those cases in which you want to think of the class hierarchy as defining a language.
Composite : The abstract syntax tree is an instance of the Composite pattern.
Flyweight shows how to share terminal symbols within the abstract syntax tree.
Iterator : The interpreter can use an Iterator to traverse the structure.
Visitor can be used to maintain the behavior in each node in the abstract syntax tree in one class.
1For simplicity, we ignore operator precedence and assume it's the responsibility of whichever object constructs the syntax tree.