Chapter 1: Introduction | Types and Programming Languages

< Free Open Study >

1.1 Types in Computer Science

Modern software engineering recognizes a broad range of formal methods for helping ensure that a system behaves correctly with respect to some specification, implicit or explicit, of its desired behavior. On one end of the spectrum are powerful frameworks such as Hoare logic, algebraic specification languages, modal logics, and denotational semantics. These can be used to express very general correctness properties but are often cumbersome to use and demand a good deal of sophistication on the part of programmers. At the other end are techniques of much more modest power—modest enough that automatic checkers can be built into compilers, linkers, or program analyzers and thus be applied even by programmers unfamiliar with the underlying theories. One well-known instance of this sort of lightweight formal methods is model checkers, tools that search for errors in finite-state systems such as chip designs or communication protocols. Another that is growing in popularity is run-time monitoring, a collection of techniques that allow a system to detect, dynamically, when one of its components is not behaving according to specification. But by far the most popular and best established lightweight formal methods are type systems, the central focus of this book.

As with many terms shared by large communities, it is difficult to define "type system" in a way that covers its informal usage by programming language designers and implementors but is still specific enough to have any bite. One plausible definition is this:

A type system is a tractable syntactic method for proving the absence of certain program behaviors by classifying phrases according to the kinds of values they compute.

A number of points deserve comment. First, this definition identifies type systems as tools for reasoning about programs. This wording reflects the orientation of this book toward the type systems found in programming languages. More generally, the term type systems (or type theory) refers to a much broader field of study in logic, mathematics, and philosophy. Type systems in this sense were first formalized in the early 1900s as ways of avoiding the logical paradoxes, such as Russell's (Russell, 1902), that threatened the foundations of mathematics. During the twentieth century, types have become standard tools in logic, especially in proof theory (see Gandy, 1976 and Hindley, 1997), and have permeated the language of philosophy and science. Major landmarks in this area include Russell's original ramified theory of types (Whitehead and Russell, 1910), Ramsey's simple theory of types (1925)—the basis of Church's simply typed lambda-calculus (1940)—Martin-Löf's constructive type theory (1973, 1984), and Berardi, Terlouw, and Barendregt's pure type systems (Berardi, 1988; Terlouw, 1989; Barendregt, 1992).

Even within computer science, there are two major branches to the study of type systems. The more practical, which concerns applications to programming languages, is the main focus of this book. The more abstract focuses on connections between various "pure typed lambda-calculi" and varieties of logic, via the Curry-Howard correspondence (§9.4). Similar concepts, notations, and techniques are used by both communities, but with some important differences in orientation. For example, research on typed lambda-calculi is usually concerned with systems in which every well-typed computation is guaranteed to terminate, whereas most programming languages sacrifice this property for the sake of features like recursive function definitions.

Another important element in the above definition is its emphasis on classification of terms—syntactic phrases—according to the properties of the values that they will compute when executed. A type system can be regarded as calculating a kind of static approximation to the run-time behaviors of the terms in a program. (Moreover, the types assigned to terms are generally calculated compositionally, with the type of an expression depending only on the types of its subexpressions.)

The word "static" is sometimes added explicitly—we speak of a "statically typed programming language," for example—to distinguish the sorts of compile-time analyses we are considering here from the dynamic or latent typing found in languages such as Scheme (Sussman and Steele, 1975; Kelsey, Clinger, and Rees, 1998; Dybvig, 1996), where run-time type tags are used to distinguish different kinds of structures in the heap. Terms like "dynamically typed" are arguably misnomers and should probably be replaced by "dynamically checked," but the usage is standard.

Being static, type systems are necessarily also conservative: they can categorically prove the absence of some bad program behaviors, but they cannot prove their presence, and hence they must also sometimes reject programs that actually behave well at run time. For example, a program like

   if <complex test> then 5 else <type error>

will be rejected as ill-typed, even if it happens that the <complex test> will always evaluate to true, because a static analysis cannot determine that this is the case. The tension between conservativity and expressiveness is a fundamental fact of life in the design of type systems. The desire to allow more programs to be typed—by assigning more accurate types to their parts—is the main force driving research in the field.

A related point is that the relatively straightforward analyses embodied in most type systems are not capable of proscribing arbitrary undesired program behaviors; they can only guarantee that well-typed programs are free from certain kinds of misbehavior. For example, most type systems can check statically that the arguments to primitive arithmetic operations are always numbers, that the receiver object in a method invocation always provides the requested method, etc., but not that the second argument to the division operation is non-zero, or that array accesses are always within bounds.

The bad behaviors that can be eliminated by the type system in a given language are often called run-time type errors. It is important to keep in mind that this set of behaviors is a per-language choice: although there is substantial overlap between the behaviors considered to be run-time type errors in different languages, in principle each type system comes with a definition of the behaviors it aims to prevent. The safety (or soundness) of each type system must be judged with respect to its own set of run-time errors.

The sorts of bad behaviors detected by type analysis are not restricted to low-level faults like invoking non-existent methods: type systems are also used to enforce higher-level modularity properties and to protect the integrity of user-defined abstractions. Violations of information hiding, such as directly accessing the fields of a data value whose representation is supposed to be abstract, are run-time type errors in exactly the same way as, for example, treating an integer as a pointer and using it to crash the machine.

Typecheckers are typically built into compilers or linkers. This implies that they must be able to do their job automatically, with no manual intervention or interaction with the programmer—i.e., they must embody computationally tractable analyses. However, there is still plenty of room for requiring guidance from the programmer, in the form of explicit type annotations in programs. Usually, these annotations are kept fairly light, to make programs easier to write and read. But, in principle, a full proof that the program meets some arbitrary specification could be encoded in type annotations; in this case, the typechecker would effectively become a proof checker. Technologies like Extended Static Checking (Detlefs, Leino, Nelson, and Saxe, 1998) are working to settle this territory between type systems and full-scale program verification methods, implementing fully automatic checks for some broad classes of correctness properties that rely only on "reasonably light" program annotations to guide their work.

By the same token, we are most interested in methods that are not just automatable in principle, but that actually come with efficient algorithms for checking types. However, exactly what counts as efficient is a matter of debate. Even widely used type systems like that of ML (Damas and Milner, 1982) may exhibit huge typechecking times in pathological cases (Henglein and Mairson, 1991). There are even languages with typechecking or type reconstruction problems that are undecidable, but for which algorithms are available that halt quickly "in most cases of practical interest" (e.g. Pierce and Turner, 2000; Nadathur and Miller, 1988; Pfenning, 1994).

< Free Open Study >