15.7 Intersection and Union Types

< Free Open Study >

A powerful refinement of the subtype relation can be obtained by adding an intersection operator to the language of types. Intersection types were invented by Coppo, Dezani, Sallé, and Pottinger (Coppo and Dezani-Ciancaglini, 1978; Coppo, Dezani-Ciancaglini, and Sallé, 1979; Pottinger, 1980). Accessible introductions can be found in Reynolds (1988, 1998b), Hindley (1992), and Pierce (1991b).

The inhabitants of the intersection type T₁ ∧ T₂ are terms belonging to both S and T-that is, T₁ ∧ T₂ is an order-theoretic meet (greatest lower bound) of T₁ and T₂. This intuition is captured by three new subtyping rules.

One additional rule allows a natural interaction between intersection and arrow types.

The intuition behind this rule is that, if we know a term has the function types S→T₁ and S→T₂, then we can certainly pass it an S and expect to get back both a T₁ and a T₂.

The power of intersection types is illustrated by the fact that, in a call-by-name variant of the simply typed lambda-calculus with subtyping and intersections, the set of untyped lambda-terms that can be assigned types is exactly the set of normalizing terms-i.e., a term is typable iff its evaluation terminates! This immediately implies that the type reconstruction problem (see Chapter 22) for calculi with intersections is undecidable.

More pragmatically, the interest of intersection types is that they support a form of finitary overloading. For example, we might assign the type (Nat→Nat→Nat) ∧ (Float→Float→Float) to an addition operator that can be used on both natural numbers and floats (using tag bits in the runtime representation of its arguments, for example, to select the correct instruction).

Unfortunately, the power of intersection types raises some difficult pragmatic issues for language designers. So far, only one full-scale language, Forsythe (Reynolds, 1988), has included intersections in their most general form. A restricted form known as refinement types may prove more manageable (Freeman and Pfenning, 1991; Pfenning, 1993b; Davies, 1997).

The dual notion of union types, T₁ V T₂, also turns out to be quite useful. Unlike sum and variant types (which, confusingly, are sometimes also called "unions"), T₁ V T₂ denotes the ordinary union of the set of values belonging to T₁ and the set of values belonging to T₂, with no added tag to identify the origin of a given element. Thus, Nat V Nat is actually just another name for Nat. Non-disjoint union types have long played an important role in program analysis (Palsberg and Pavlopoulou, 1998), but have featured in few programming languages (notably Algol 68; cf. van Wijngaarden et al., 1975); recently, though, they are increasingly being applied in the context of type systems for processing of "semistructured" database formats such as XML (Buneman and Pierce, 1998; Hosoya, Vouillon, and Pierce, 2001).

The main formal difference between disjoint and non-disjoint union types is that the latter lack any kind of case construct: if we know only that a value v has type T₁ V T₂, then the only operations we can safely perform on v are ones that make sense for both T₁ and T₂. (For example, if T₁ and T₂ are records, it makes sense to project v on their common fields.) The untagged union type in C is a source of type safety violations precisely because it ignores this restriction, allowing any operation on an element of T₁ V T₂ that makes sense for either T₁ or T₂.

< Free Open Study >