5.3 Formalities

< Free Open Study >

For the rest of the chapter, we consider the syntax and operational semantics of the lambda-calculus in more detail. Most of the structure we need is closely analogous to what we saw in Chapter 3 (to avoid repeating that structure verbatim, we address here just the pure lambda-calculus, unadorned with booleans or numbers). However, the operation of substituting a term for a variable involves some surprising subtleties.

Syntax

As in Chapter 3, the abstract grammar defining terms (on page 53) should be read as shorthand for an inductively defined set of abstract syntax trees.

5.3.1 Definition [Terms]

Let V be a countable set of variable names. The set of terms is the smallest set T such that

x Î T for every x Î V;
if t₁ Î T and x Î V, then λx.t₁ Î T;
if t₁ Î T and t₂ Î T, then t₁ t₂ Î T.

The size of a term t can be defined exactly as we did for arithmetic expressions in Definition 3.3.2. More interestingly, we can give a simple inductive definition of the set of variables appearing free in a lambda-term.

5.3.2 Definition

The set of free variables of a term t, written FV(t), is defined as follows:

FV(x)

=

{x}

FV(λx,t₁)

=

FV(t₁) \ {x}

FV(t₁ t₂)

=

FV(t₁) È FV(t₂)

5.3.3 Exercise [⋆⋆]

Give a careful proof that |FV(t)| ≤ size(t) for every term t.

Substitution

The operation of substitution turns out to be quite tricky, when examined in detail. In this book, we will actually use two different definitions, each optimized for a different purpose. The first, introduced in this section, is compact and intuitive, and works well for examples and in mathematical definitions and proofs. The second, developed in Chapter 6, is notationally heavier, depending on an alternative "de Bruijn presentation" of terms in which named variables are replaced by numeric indices, but is more convenient for the concrete ML implementations discussed in later chapters.

It is instructive to arrive at a definition of substitution via a couple of wrong attempts. First, let's try the most naive possible recursive definition. (Formally, we are defining a function [x ↦ s] by induction over its argument t.)

[x ↦ s]x	=	s
[x ↦ s]y	=	y	if x ≠ y
[x ↦ s](λy.t₁)	=	λy.[x ↦ s]t₁
[x ↦ s](t₁ t₂)	=	([x ↦ s]t₁) ([x ↦ s]t₂)

This definition works fine for most examples. For instance, it gives

   [x ↦ (λz. z w)](λy.x) = λy.z. z w,

which matches our intuitions about how substitution should behave. However, if we are unlucky with our choice of bound variable names, the definition breaks down. For example:

   [x ↦ y](λx.x) = λx.y

This conflicts with the basic intuition about functional abstractions that the names of bound variables do not matter-the identity function is exactly the same whether we write it λx.x or λy.y or λfranz.franz. If these do not behave exactly the same under substitution, then they will not behave the same under reduction either, which seems wrong.

Clearly, the first mistake that we've made in the naive definition of substitution is that we have not distinguished between free occurrences of a variable x in a term t (which should get replaced during substitution) and bound ones, which should not. When we reach an abstraction binding the name x inside of t, the substitution operation should stop. This leads to the next attempt:

This is better, but still not quite right. For example, consider what happens when we substitute the term z for the variable x in the term λz.x:

   [x ↦ z](λz.x) = λz.z

This time, we have made essentially the opposite mistake: we've turned the constant function λz.x into the identity function! Again, this occurred only because we happened to choose z as the name of the bound variable in the constant function, so something is clearly still wrong.

This phenomenon of free variables in a term s becoming bound when s is naively substituted into a term t is called variable capture. To avoid it, we need to make sure that the bound variable names of t are kept distinct from the free variable names of s. A substitution operation that does this correctly is called capture-avoiding substitution. (This is almost always what is meant by the unqualified term "substitution.") We can achieve the desired eMect by adding another side condition to the second clause of the abstraction case:

click to expand

Now we are almost there: this definition of substitution does the right thing when it does anything at all. The problem now is that our last fix has changed substitution into a partial operation. For example, the new definition does not give any result at all for [x ↦ y z](λy. x y): the bound variable y of the term being substituted into is not equal to x, but it does appear free in (y z), so none of the clauses of the definition apply.

One common fix for this last problem in the type systems and lambda-calculus literature is to work with terms "up to renaming of bound variables." (Church used the term alpha-conversion for the operation of consistently renaming a bound variable in a term. This terminology is still common-we could just as well say that we are working with terms "up to alpha-conversion.")

5.3.4 Convention

Terms that differ only in the names of bound variables are interchangeable in all contexts.

What this means in practice is that the name of any λ-bound variable can be changed to another name (consistently making the same change in the body of the λ), at any point where this is convenient. For example, if we want to calculate [x ↦ y z](λy. xy), we first rewrite (λy. x y) as, say, (λw. x w). We then calculate [x ↦ y z](λw. x w), giving (λw. y z w).

This convention renders the substitution operation "as good as total," since whenever we find ourselves about to apply it to arguments for which it is undefined, we can rename as necessary, so that the side conditions are satisfied. Indeed, having adopted this convention, we can formulate the definition of substitution a little more tersely. The first clause for abstractions can be dropped, since we can always assume (renaming if necessary) that the bound variable y is different from both x and the free variables of s. This yields the final form of the definition.

5.3.5 Definition [Substitution]

[x ↦ s]x	=	s
[x ↦ s]y	=	y	if y ≠ x
[x ↦ s](λy.t₁)	=	λy. [x ↦ s]t₁	if y ≠ x and y ∉ FV(s)
[x ↦ s](t₁ t₂)	=	[x ↦ s]t₁ [x ↦ s]t₂

Operational Semantics

The operational semantics of lambda-terms is summarized in Figure 5-3. The set of values here is more interesting than we saw in the case of arithmetic expressions. Since (call-by-value) evaluation stops when it reaches a lambda, values can be arbitrary lambda-terms.

Figure 5-3: Untyped Lambda-Calculus (λ)

The evaluation relation appears in the right-hand column of the figure. As in evaluation for arithmetic expressions, there are two sorts of rules: the computation rule E-APPABS and the congruence rules E-APP1 and E-APP2.

Notice how the choice of metavariables in these rules helps control the order of evaluation. Since v₂ ranges only over values, the left-hand side of rule E-APPABS can match any application whose right-hand side is a value. Similarly, rule E-APP1 applies to any application whose left-hand side is not a value, since t₁ can match any term whatsoever, but the premise further requires that t₁ can take a step. E-APP2, on the other hand, cannot fire until the left-hand side is a value so that it can be bound to the value-metavariable v. Taken together, these rules completely determine the order of evaluation for an application t₁ t₂: we first use E-APP1 to reduce t₁ to a value, then use E-APP2 to reduce t₂ to a value, and finally use E-APPABS to perform the application itself.

5.3.6 Exercise [⋆⋆]

Adapt these rules to describe the other three strategies for evaluation-full beta-reduction, normal-order, and lazy evaluation.

Note that, in the pure lambda-calculus, lambda-abstractions are the only possible values, so if we reach a state where E-APP1 has succeeded in reducing t₁ to a value, then this value must be a lambda-abstraction. This observation fails, of course, when we add other constructs such as primitive booleans to the language, since these introduce forms of values other than abstractions.

5.3.7 Exercise [⋆⋆ ↛]

Exercise 3.5.16 gave an alternative presentation of the operational semantics of booleans and arithmetic expressions in which stuck terms are defined to evaluate to a special constant wrong. Extend this semantics to λNB.

5.3.8 Exercise [⋆⋆]

Exercise 4.2.2 introduced a "big-step" style of evaluation for arithmetic expressions, where the basic evaluation relation is "term t evaluates to final result v." Show how to formulate the evaluation rules for lambda-terms in the big-step style.

< Free Open Study >

FV(x)	=	{x}
FV(λx,t₁)	=	FV(t₁) \ {x}
FV(t₁ t₂)	=	FV(t₁) È FV(t₂)