11.10 Variants

< Free Open Study >

Binary sums generalize to labeled variants just as products generalize to labeled records. Instead of T₁+T₂, we write <l₁:T₁, l₂:T₂>, where l₁ and l₂ are field labels. Instead of inl t as T₁+T₂, we write <l₁=t> as <l₁:T₁, l₂:T₂>. And instead of labeling the branches of the case with inl and inr, we use the same labels as the corresponding sum type. With these generalizations, the getAddr example from the previous section becomes:

   Addr = <physical:PhysicalAddr, virtual:VirtualAddr>;   a = <physical=pa> as Addr; ▸ a : Addr   getName = λa:Addr.     case a of       <physical=x> ⇒ x.firstlast     | <virtual=y> ⇒ y.name; ▸  getName : Addr → String

The formal definition of variants is given in Figure 11-11. Note that, as with records in §11.8, the order of labels in a variant type is significant here.

Figure 11-11: Variants

Options

One very useful idiom involving variants is optional values. For example, an element of the type

   OptionalNat = <none:Unit, some:Nat>;

is either the trivial unit value with the tag none or else a number with the tag some—in other words, the type OptionalNat is isomorphic to Nat extended with an additional distinguished value none. For example, the type

   Table = Nat→OptionalNat;

represents finite mappings from numbers to numbers: the domain of such a mapping is the set of inputs for which the result is <some=n> for some n. The empty table

   emptyTable = λn:Nat. <none=unit> as OptionalNat; ▸ emptyTable : Table

is a constant function that returns none for every input. The constructor

   extendTable =     λt:Table. λm:Nat. λv:Nat.       λn:Nat.         if equal n m then <some=v> as OptionalNat         else t n; ▸ extendTable : Table → Nat → Nat → Table

takes a table and adds (or overwrites) an entry mapping the input m to the output <some=v>. (The equal function is defined in the solution to Exercise 11.11.1 on page 510.)

We can use the result that we get back from a Table lookup by wrapping a case around it. For example, if t is our table and we want to look up its entry for 5, we might write

   x = case t(5) of         <none=u> ⇒ 999       | <some=v> ⇒ v;

providing 999 as the default value of x in case t is undefined on 5.

Many languages provide built-in support for options. OCaml, for example, predefines a type constructor option, and many functions in typical OCaml programs yield options. Also, the null value in languages like C, C++, and Java is actually an option in disguise. A variable of type T in these languages (where T is a "reference type"—i.e., something allocated in the heap) can actually contain either the special value null or else a pointer to a T value. That is, the type of such a variable is really Ref(Option(T)), where Option(T) = <none:Unit,some:T>. Chapter 13 discusses the Ref constructor in detail.

Enumerations

Two "degenerate cases" of variant types are useful enough to deserve special mention: enumerated types and single-field variants.

An enumerated type (or enumeration) is a variant type in which the field type associated with each label is Unit. For example, a type representing the days of the working week might be defined as:

   Weekday = <monday:Unit, tuesday:Unit, wednesday:Unit,              thursday:Unit, friday:Unit>;

The elements of this type are terms like <monday=unit> as Weekday. Indeed, since the type Unit has only unit as a member, the type Weekday is inhabited by precisely five values, corresponding one-for-one with the days of the week. The case construct can be used to define computations on enumerations.

   nextBusinessDay = λw:Weekday.     case w of <monday=x>    ⇒ <tuesday=unit> as Weekday             | <thursday=x>  ⇒ <wednesday=unit> as Weekday             | <wednesday=x> ⇒ <thursday=unit> as Weekday             | <tuesday=x>   ⇒ <friday=unit> as Weekday             | <friday=x>    ⇒ <monday=unit> as Weekday;

Obviously, the concrete syntax we are using here is not well tuned for making such programs easy to write or read. some languages (beginning with Pascal) provide special syntax for declaring and using enumerations. Others—such as ML, cf. page 141—make enumerations a special case of the variants.

Single-Field Variants

The other interesting special case is variant types with just a single label l:

   V = <l:T>;

Such a type might not seem very useful at first glance: after all, the elements of V will be in one-to-one correspondence with the elements of the field type T, since every member of V has precisely the form <l=t> for some t : T. What's important, though, is that the usual operations on T cannot be applied to elements of V without first unpackaging them: a V cannot be accidentally mistaken for a T.

For example, suppose we are writing a program to do financial calculations in multiple currencies. Such a program might include functions for converting between dollars and euros. If both are represented as Floats, then these functions might look like this:

   dollars2euros = λd:Float. timesfloat d 1.1325; ▸ dollars2euros : Float → Float   euros2dollars = λe:Float. timesfloat e 0.883; ▸ euros2dollars : Float → Float

(where timesfloat : Float→Float→Float multiplies floating-point numbers). If we then start with a dollar amount

   mybankbalance = 39.50;

we can convert it to euros and then back to dollars like this:

   euros2dollars (dollars2euros mybankbalance); ▸ 39.49990125 : Float

All this makes perfect sense. But we can just as easily perform manipulations that make no sense at all. For example, we can convert my bank balance to euros twice:

   dollars2euros (dollars2euros mybankbalance); ▸ 50.660971875 : Float

Since all our amounts are represented simply as floats, there is no way that the type system can help prevent this sort of nonsense. However, if we define dollars and euros as different variant types (whose underlying representations are floats)

   DollarAmount = <dollars:Float>;   EuroAmount = <euros:Float>;

then we can define safe versions of the conversion functions that will only accept amounts in the correct currency:

   dollars2euros =     λd:DollarAmount.       case d of <dollars=x> ⇒         <euros = timesfloat x 1.1325> as EuroAmount; ▸ dollars2euros : DollarAmount → EuroAmount   euros2dollars =     λe:EuroAmount.       case e of <euros=x> ⇒         <dollars = timesfloat x 0.883> as DollarAmount; ▸ euros2dollars : EuroAmount → DollarAmount

Now the typechecker can track the currencies used in our calculations and remind us how to interpret the final results:

   mybankbalance = <dollars=39.50> as DollarAmount;   euros2dollars (dollars2euros mybankbalance); ▸ <dollars=39.49990125> as DollarAmount : DollarAmount

Moreover, if we write a nonsensical double-conversion, the types will fail to match and our program will (correctly) be rejected:

   dollars2euros (dollars2euros mybankbalance); ▸ Error: parameter type mismatch

Variants vs. Datatypes

A variant type T of the form <l_i:T_i^iÎl..n> is roughly analogous to the ML datatype defined by:^[8]

   type T = l₁ of T₁          | l₂ of T₂          | ...          | l_n of T_n

But there are several differences worth noticing.

One trivial but potentially confusing point is that the capitalization conventions for identifiers that we are assuming here are different from those of OCaml. In OCaml, types must begin with lowercase letters and datatype constructors (labels, in our terminology) with capital letters, so, strictly speaking, the datatype declaration above should be written like this:
```
   type t = L₁ of t₁ | ... | L_n of t_n 
```
To avoid confusion between terms t and types T, we'll ignore OCaml's conventions for the rest of this discussion and use ours instead.
The most interesting difference is that OCaml does not require a type annotation when a constructor l_i is used to inject an element of T_i into the datatype T: we simply write l_i(t). The way OCaml gets away with this (and retains unique typing) is that the datatype T must be declared before it can be used. Moreover, the labels in T cannot be used by any other datatype declared in the same scope. So, when the typechecker sees l_i(t), it knows that the annotation can only be T. In effect, the annotation is "hidden" in the label itself.

This trick eliminates a lot of silly annotations, but it does lead to a certain amount of grumbling among users, since it means that labels cannot be shared between different datatypes—at least, not within the same module. In Chapter 15 we will see another way of omitting annotations that avoids this drawback.
Another convenient trick used by OCaml is that, when the type associated with a label in a datatype definition is just Unit, it can be omitted altogether. This permits enumerations to be defined by writing
```
   type Weekday = monday | tuesday | wednesday | thursday | friday 
```
for example, rather than:
```
   type Weekday = monday of Unit                | tuesday of Unit                | wednesday of Unit                | thursday of Unit                | friday of Unit 
```
Similarly, the label monday all by itself (rather than monday applied to the trivial value unit) is considered to be a value of type Weekday.
Finally, OCaml datatypes actually bundle variant types together with several additional features that we will be examining, individually, in later chapters.
- A datatype definition may be recursive—i.e., the type being defined is allowed to appear in the body of the definition. For example, in the standard definition of lists of Nats, the value tagged with cons is a pair whose second element is a NatList.
```
   type NatList = nil                | cons of Nat * NatList 
```
- An OCaml datatype can be [parametric data type]parameterizedparametric!data type on a type variable, as in the general definition of the List datatype:
```
   type 'a List = nil                 | cons of 'a * 'a List 
```
  Type-theoretically, List can be viewed as a kind of function—called a type operator—that maps each choice of ′a to a concrete datatype... Nat to NatList, etc. Type operators are the subject of Chapter 29.

Variants as Disjoint Unions

Sum and variant types are sometimes called disjoint unions. The type T₁+T₂ is a "union" of T₁ and T₂ in the sense that its elements include all the elements from T₁ and T₂. This union is disjoint because the sets of elements of T₁ or T₂ are tagged with inl or inr, respectively, before they are combined, so that it is always clear whether a given element of the union comes from T₁ or T₂. The phrase union type is also used to refer to untagged (non-disjoint) union types, described in §15.7.

Type Dynamic

Even in statically typed languages, there is often the need to deal with data whose type cannot be determined at compile time. This occurs in particular when the lifetime of the data spans multiple machines or many runs of the compiler—when, for example, the data is stored in an external file system or database, or communicated across a network. To handle such situations safely, many languages offer facilities for inspecting the types of values at run time.

One attractive way of accomplishing this is to add a type Dynamic whose values are pairs of a value v and a type tag T where v has type T. Instances of Dynamic are built with an explicit tagging construct and inspected with a type safe typecase construct. In effect, Dynamic can be thought of as an infinite disjoint union, whose labels are types. See Gordon (circa 1980), Mycroft (1983), Abadi, Cardelli, Pierce, and Plotkin (1991b), Leroy and Mauny (1991), Abadi, Cardelli, Pierce, and Rémy (1995), and Henglein (1994).

^[8]This section uses OCaml's concrete syntax for datatypes, for consistency with implementation chapters elsewhere in the book, but they originated in early dialects of ML and can be found, in essentially the same form, in Standard ML as well as in ML relatives such as Haskell. Datatypes and pattern matching are arguably one of the most useful advantages of these languages for day to day programming.

< Free Open Study >