Chapter 7: An ML Implementation of the Lambda-Calculus

< Free Open Study >

In this chapter we construct an interpreter for the untyped lambda-calculus, based on the interpreter for arithmetic expressions in Chapter 4 and on the treatment of variable binding and substitution in Chapter 6.

An executable evaluator for untyped lambda-terms can be obtained by a straightforward translation of the foregoing definitions into OCaml. As in Chapter 4, we show just the core algorithms, ignoring issues of lexical analysis, parsing, printing, and so forth.

7.1 Terms and Contexts

We can obtain a datatype representing abstract syntax trees for terms by directly transliterating Definition 6.1.2:

   type term =       TmVar of int     | TmAbs of term     | TmApp of term * term

The representation of a variable is a number-its de Bruijn index. The representation of an abstraction carries just a subterm for the abstraction's body. An application carries the two subterms being applied.

The definition actually used in our implementation, however, will carry a little bit more information. First, as before, it is useful to annotate every term with an element of the type info recording the file position where that term was originally found, so that error printing routines can direct the user (or even the user's text editor, automatically) to the precise point where the error occurred. ^[1]

   type term =       TmVar of info * int     | TmAbs of info * term     | TmApp of info * term * term

Second, for purposes of debugging, it is helpful to carry an extra number on each variable node, as a consistency check. The convention will be that this second number will always contain the total length of the context in which the variable occurs.

   type term =       TmVar of info * int * int     | TmAbs of info * term     | TmApp of info * term * term

Whenever a variable is printed, we will verify that this number corresponds to the actual size of the current context; if it does not, then a shift operation has been forgotten someplace.

One last refinement also concerns printing. Although terms are represented internally using de Bruijn indices, this is obviously not how they should be presented to the user: we should convert from the ordinary representation to nameless terms during parsing, and convert back to ordinary form during printing. There is nothing very hard about this, but we should not do it completely naively (for example, generating completely fresh symbols for the names of variables), since then the names of the bound variables in the terms that are printed would have nothing to do with the names in the original program. This can be fixed by annotating each abstraction with a string to be used as a hint for the name of the bound variable.

   type term =       TmVar of info * int * int     | TmAbs of info * string * term     | TmApp of info * term * term

The basic operations on terms (substitution in particular) do not do anything fancy with these strings: they are simply carried along in their original form, with no checks for name clashes, capture, etc. When the printing routine needs to generate a fresh name for a bound variable, it tries first to use the supplied hint; if this turns out to clash with a name already used in the current context, it tries similar names, adding primes until it finds one that is not currently being used. This ensures that the printed term will be similar to what the user expects, modulo a few primes.

The printing routine itself looks like this:

   let rec printtm ctx t = match t with       TmAbs(fi,x,t1) →         let (ctx',x') = pickfreshname ctx x in         pr "(lambda "; pr x'; pr ". "; printtm ctx' t1; pr ")"     | TmApp(fi, t1, t2) →         pr "("; printtm ctx t1; pr " "; printtm ctx t2; pr ")"     | TmVar(fi,x,n) →         if ctxlength ctx = n then           pr (index2name fi ctx x)         else           pr "[bad index]"

It uses the datatype context,

   type context = (string * binding) list

which is just a list of strings and associated bindings. For the moment, the bindings themselves are completely trivial

   type binding = NameBind

carrying no interesting information. Later on (in Chapter 10), we'll introduce other clauses of the binding type that will keep track of the type assumptions associated with variables and other similar information.

The printing function also relies on several lower-level functions: pr sends a string to the standard output stream; ctxlength returns the length of a context; index2name looks up the string name of a variable from its index. The most interesting one is pickfreshname, which takes a context ctx and a string hint x, finds a name x′ similar to x such that x′ is not already listed in ctx, adds x′ to ctx to form a new context ctx′, and returns both ctx′ and x′ as a pair.

The actual printing function found in the untyped implementation on the book's web site is somewhat more complicated than this one, taking into account two additional issues. First, it leaves out as many parentheses as possible, following the conventions that application associates to the left and the bodies of abstractions extend as far to the right as possible. Second, it generates formatting instructions for a low-level pretty printing module (the OCaml Format library) that makes decisions about line breaking and indentation.

^[1]The system studied in most of this chapter is the pure untyped lambda-calculus (Figure 5-3). The associated implementation is untyped. The fulluntyped implementation includes extensions such as numbers and booleans.

< Free Open Study >