Chapter 3: Untyped Arithmetic Expressions | Types and Programming Languages

< Free Open Study >

To talk rigorously about type systems and their properties, we need to start by dealing formally with some more basic aspects of programming languages. In particular, we need clear, precise, and mathematically tractable tools for expressing and reasoning about the syntax and semantics of programs.

This chapter and the next develop the required tools for a small language of numbers and booleans. This language is so trivial as to be almost beneath consideration, but it serves as a straightforward vehicle for the introduction of several fundamental concepts-abstract syntax, inductive definitions and proofs, evaluation, and the modeling of run-time errors. Chapters 5 through 7 elaborate the same story for a much more powerful language, the untyped lambda-calculus, where we must also deal with name binding and substitution. Looking further ahead, Chapter 8 commences the study of type systems proper, returning to the simple language of the present chapter and using it to introduce basic concepts of static typing. Chapter 9 extends these concepts to the lambda-calculus.

3.1 Introduction

The language used in this chapter contains just a handful of syntactic forms: the boolean constants true and false, conditional expressions, the numeric constant 0, the arithmetic operators succ (successor) and pred (predecessor), and a testing operation iszero that returns true when it is applied to 0 and false when it is applied to some other number. These forms can be summarized compactly by the following grammar. ^[1]

  t  ::=         true         false         if t then t else t         0         succ t         pred t         iszero t

terms:

constant true

constant false

conditional

constant zero

successor

predecessor

zero test

The conventions used in this grammar (and throughout the book) are close to those of standard BNF (cf. Aho, Sethi, and Ullman, 1986). The first line (t ::=) declares that we are defining the set of terms, and that we are going to use the letter t to range over terms. Each line that follows gives one alternative syntactic form for terms. At every point where the symbol t appears, we may substitute any term. The italicized phrases on the right are just comments.

The symbol t in the right-hand sides of the rules of this grammar is called a metavariable. It is a variable in the sense that it is a place-holder for some particular term, and "meta" in the sense that it is not a variable of the object language-the simple programming language whose syntax we are currently describing-but rather of the metalanguage-the notation in which the description is given. (In fact, the present object language doesn't even have variables; we'll introduce them in Chapter 5.) The prefix meta- comes from meta-mathematics, the subfield of logic whose subject matter is the mathematical properties of systems for mathematical and logical reasoning (which includes programming languages). This field also gives us the term metatheory, meaning the collection of true statements that we can make about some particular logical system (or programming language)-and, by extension, the study of such statements. Thus, phrases like "metatheory of subtyping" in this book can be understood as, "the formal study of the properties of systems with subtyping."

Throughout the book, we use the metavariable t, as well as nearby letters such as s, u, and r and variants such as t₁ and s′, to stand for terms of whatever object language we are discussing at the moment; other letters will be introduced as we go along, standing for expressions drawn from other syntactic categories. A complete summary of metavariable conventions can be found in Appendix B.

For the moment, the words term and expression are used interchangeably. Starting in Chapter 8, when we begin discussing calculi with additional syntactic categories such as types, we will use expression for all sorts of syntactic phrases (including term expressions, type expressions, kind expressions, etc.), reserving term for the more specialized sense of phrases representing computations (i.e., phrases that can be substituted for the metavariable t).

A program in the present language is just a term built from the forms given by the grammar above. Here are some examples of programs, along with the results of evaluating them:

   if false then 0 else 1; ▸ 1   iszero (pred (succ 0)); ▸ true

Throughout the book, the symbol ▸ is used to display the results of evaluating examples. (For brevity, results will be elided when they are obvious or unimportant.) During typesetting, examples are automatically processed by the implementation corresponding to the formal system in under discussion (arith here); the displayed responses are the implementation's actual output.

In examples, compound arguments to succ, pred, and iszero are enclosed in parentheses for readability.^[2] Parentheses are not mentioned in the grammar of terms, which defines only their abstract syntax. Of course, the presence or absence of parentheses makes little difference in the extremely simple language that we are dealing with at the moment: parentheses are usually used to resolve ambiguities in the grammar, but this grammar is does not have any ambiguities-each sequence of tokens can be parsed as a term in at most one way. We will return to the discussion of parentheses and abstract syntax in Chapter 5 (p. 52).

For brevity in examples, we use standard arabic numerals for numbers, which are represented formally as nested applications of succ to 0. For example, succ(succ(succ(0))) is written as 3.

The results of evaluation are terms of a particularly simple form: they will always be either boolean constants or numbers (nested applications of zero or more instances of succ to 0). Such terms are called values, and they will play a special role in our formalization of the evaluation order of terms.

Notice that the syntax of terms permits the formation of some dubious-looking terms like succ true and if 0 then 0 else 0. We shall have more to say about such terms later-indeed, in a sense they are precisely what makes this tiny language interesting for our purposes, since they are examples of the sorts of nonsensical programs we will want a type system to exclude.

^[1]The system studied in this chapter is the untyped calculus of booleans and numbers (Figure 3-2, on page 41). The associated OCaml implementation, called arith in the web repository, is described in Chapter 4. Instructions for downloading and building this checker can be found at http://www.cis.upenn.edu/~bcpierce/tapl.

^[2]In fact, the implementation used to process the examples in this chapter (called arith on the book's web site) actually requires parentheses around compound arguments to succ, pred, and iszero, even though they can be parsed unambiguously without parentheses. This is for consistency with later calculi, which use similar-looking syntax for function application.

< Free Open Study >