What is a programming language? One way to think about it is as a specific syntax with a set of keywords that can be used to define data and express operations on that data. While language syntaxes differ, the underlying abstractions of most popular languages today are very similar. All of them support various data types such as integers and strings, all allow packaging code into methods, and all provide a way to group data and methods into classes. When a new programming language is defined, the usual approach is to define underlying abstractions such as these (key aspects of the language's semantics) concomitantly with the language's syntax.
Yet there are other possibilities. Suppose you choose to define the core abstractions for a programming model without mapping them to any particular syntax. If the abstractions were general enough, they could then be used in many different programming languages. Rather than inextricably mingling syntax and semantics, these two things could be kept separate, allowing different languages to be used with the same set of underlying abstractions. This is exactly what's done in the CLR's Common Type System (CTS). The CTS specifies no particular syntax or keywords, but instead defines a common set of types that can be used with many different language syntaxes. Each language has its own syntax, but if that language is built on the CLR, it will use at least some of the types defined by the CTS.
Types are fundamental to any programming language. One simple but concrete way to think of a type is as a set of rules for interpreting the value stored in some memory location, such as the value of a variable. If that variable has an integer type, for example, the bits stored in it are interpreted as an integer. If the variable has a string type, the bits stored in it are interpreted as characters. To a compiler, of course, a type is more than this. Compilers must also understand the rules that define what kinds of values are allowed for each type and what kinds of operations are legal on these values. Among other things, this knowledge allows a compiler to determine whether a value of a particular type is being used correctly.
The set of types defined by the CTS is at the core of the CLR. Programming languages built on the CLR expose these types in a language-dependent way. (For examples of this, see the descriptions of C# and VB in the next chapter.) While the creator of a CLR-based language is free to implement only a subset of the types defined by the CTS and even to add types of his own to his language, most languages built on the CLR make extensive use of the CTS-defined types.
Introducing the Common Type System
A substantial subset of the types defined by the CTS is shown in Figure 2-1. The first thing to note is that every type inherits either directly or indirectly from a type called Object. (All of these types are actually contained in the System namespace, as mentioned in Chapter 1, so the complete name for this most fundamental type is System.Object.) The second thing to note is that every type defined by the CTS is either a reference type or a value type. As their names suggest, an instance of a reference type always contains a reference to a value of that type, while an instance of a value type contains the value itself. The reference types shown in the figure inherit directly from Object, while all value types inherit from a type called ValueType, which in turn inherits from Object.
Figure 2-1. The CTS defines reference and value types, all of which inherit from a common Object type.
Value types tend to be simple. The types in this category include Byte, Char, signed and unsigned integers of various lengths, single- and double-precision floating point, Decimal, Boolean, and more. Reference types, by contrast, are typically more complex. As shown in the figure, for instance, Class, Interface, Array, and String are reference types. Yet to understand the difference between value types and reference types (a fundamental distinction in the CTS), you must first understand how memory is allocated for instances of each type. In managed code, values can have their memory allocated in one of two main ways, both managed by the CLR: on the stack or on the heap. Variables allocated on the managed stack are typically created when a method is called or when a running method creates them. In either case, the memory used by stack variables is automatically freed when the method in which they were created returns. Values allocated on the managed heap, however, don't have their memory freed when the method that created them ends. Instead, their memory is freed by a process called garbage collection, a topic that's described in more detail later in this chapter.
A basic difference between value types and reference types is that a standalone instance of a value type is allocated on the stack, while an instance of a reference type has only a reference to its actual value allocated on the stack. The value itself is allocated on the heap. Figure 2-2 shows an abstract picture of how this looks. In the case shown here, three instances of value types (Int16, Char, and Int32) have been created on the managed stack, while one instance of the reference type String exists on the managed heap. Note that even the reference type instance has an entry on the stack (a reference to the memory on the heap), but the instance's contents are stored on the heap[1]. Understanding the distinction between value types and reference types is essential to understanding the CTS and, ultimately, the types used by CLR-based languages.
Figure 2-2. Instances of value types are allocated on the managed stack, while instances of reference types are allocated on the managed heap.
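The practical consequence of this distinction shows up when a value is copied. Because this chapter contains no code, the sketch below uses Java as a rough analogy rather than a CLR language: Java's primitive int behaves like a CTS value type (the variable holds the value itself), while a Java class instance behaves like a CTS reference type (the variable holds only a reference). Java has no user-defined value types, so the sketch is limited to a primitive; the class name Counter is hypothetical.

```java
// Java analogy (not CLR code): int stands in for a value type such as Int32,
// and the hypothetical Counter class stands in for a reference type.
public class ValueVsReference {

    // Instances of a class are accessed through references.
    static class Counter {
        int count;
        Counter(int count) { this.count = count; }
    }

    // Copying a primitive copies the value itself.
    static int copyPrimitive() {
        int a = 169;
        int b = a;      // b receives its own copy of the value
        b = 200;        // changing b leaves a untouched
        return a;       // still 169
    }

    // Copying a reference copies only the reference, so both
    // variables end up referring to the same object.
    static int copyReference() {
        Counter a = new Counter(169);
        Counter b = a;  // b refers to the same object as a
        b.count = 200;  // the change is visible through a as well
        return a.count; // now 200
    }

    public static void main(String[] args) {
        System.out.println(copyPrimitive()); // 169
        System.out.println(copyReference()); // 200
    }
}
```

The sketch illustrates copy semantics only; it makes no claim about how the Java runtime lays out memory.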
A Closer Look at CTS Types
The CTS defines a large set of types. As already described, the most fundamental of these is Object, from which every CTS type inherits directly or indirectly. In the object-oriented world of the CLR, having a common base for all types is useful. For one thing, since everything inherits from the same root type, a variable of type Object can potentially refer to any value. Object also implements several methods, and since every CTS type inherits from Object, these methods can be called on an instance of any type. Among the methods Object provides are Equals, which determines whether two objects are equal, and GetType, which returns the type of the object it's called on.
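Java has a similar single root, java.lang.Object, whose equals and getClass methods play roughly the roles of Equals and GetType. The following Java sketch (an analogy, not CLR code; the Point class and describe helper are hypothetical) shows that any object can be stored in an Object variable and that the root's methods can be called on it.

```java
// Java analogy: java.lang.Object is the common base of every class, much as
// System.Object is the common base of every CTS type.
public class CommonRoot {

    // Point does not override equals(), so it inherits Object's
    // identity-based version.
    static class Point {
        final int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    // Any object can be passed as an Object; getClass() reports its runtime type.
    static String describe(Object o) {
        return o.getClass().getSimpleName();
    }

    public static void main(String[] args) {
        Point p = new Point(1, 2);
        Point q = new Point(1, 2);
        Object anything = p;                    // stored through the root type
        System.out.println(p.equals(q));        // false: two distinct objects
        System.out.println(p.equals(anything)); // true: the same object
        System.out.println(describe(p));        // Point
        System.out.println(describe("text"));   // String
    }
}
```

A type can override equals to compare contents instead of identity, just as a CTS type can override Equals.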
Value Types
All value types inherit from ValueType. Like Object, ValueType provides an Equals method (in fact, it overrides the method defined in Object). Value types cannot act as a parent type for inheritance, however, so it's not possible to, say, define a new type that inherits from Int32. In the jargon of the CLR, value types are said to be sealed.
Many of the value types defined by the CTS were shown in Figure 2-1. Defined a bit more completely, those types are as follows:
Reference Types
Compared with most value types, the reference types defined by the CTS are relatively complicated. Before describing some of the more important reference types, it's useful to look first at a few elements, officially known as type members, that are common to several types (including both reference and value types). Those elements are as follows:
Type members can be assigned various characteristics. For example, methods, events, and properties can be labeled as abstract, which means that no implementation is supplied; as final, which means that the method, event, or property can't be overridden; or as virtual, which means that exactly which implementation is used can be determined at runtime rather than at compilation. Methods, events, properties, and fields can all be defined as static, which means they are associated with the type itself rather than with any particular instance of that type. (This allows a static method to be invoked on a class without first creating an instance of that class.) Members can also be assigned different accessibilities. For example, a private method can be accessed only from within the type in which it's defined or from another type nested in that type. A method whose accessibility is family, however, can be accessed from within the type in which it's defined and from types that inherit from that type. For even broader use, a method whose accessibility is public can be accessed from any other type.
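Most CLR languages expose these characteristics as keywords. As a rough analogy, the Java sketch below uses abstract, final, static, private, protected, and public on hypothetical Shape and Square classes. The mapping is approximate: Java instance methods are virtual by default, Java's protected also grants access within the same package (so it is broader than the CTS family accessibility), and Java has no direct counterpart to CTS events and properties.

```java
// Hypothetical classes; the Java keywords only approximate the CTS terms.
abstract class Shape {
    // static: associated with the type itself, not with any instance
    static int created = 0;

    // private: accessible only within Shape (and types nested in it)
    private final String name;

    Shape(String name) {
        this.name = name;
        created++;
    }

    // abstract: no implementation is supplied here
    abstract double area();

    // final: subclasses cannot override this method
    final String label() {
        return name + " with area " + area();
    }

    // protected: accessible from subclasses; unlike CTS "family"
    // accessibility, Java also grants access to the same package
    protected String kind() {
        return "shape";
    }
}

class Square extends Shape {
    private final double side;

    Square(double side) {
        super("square");
        this.side = side;
    }

    // Java instance methods are virtual by default, so this override
    // is selected at runtime from the object's actual type.
    @Override
    double area() {
        return side * side;
    }

    @Override
    protected String kind() {
        return "square";
    }
}

// public: accessible from any other type
public class MemberModifiers {
    public static void main(String[] args) {
        Shape s = new Square(3.0);         // declared as Shape, actually a Square
        System.out.println(s.area());      // 9.0, dispatched to Square.area()
        System.out.println(s.kind());      // square
        System.out.println(s.label());     // square with area 9.0
        System.out.println(Shape.created); // 1, read without any instance
    }
}
```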
Given this basic understanding of type members, we can now look at reference types themselves. Among the most important are the following:
As the next chapter shows, CLR-based programming languages such as C# and VB construct their own type system on top of the CTS types. Despite their different representations, however, the semantics of these types are essentially the same in C#, VB, and many other CLR-based languages. In fact, providing this foundation of common programming language types is one of the CLR's most important roles.
Converting Value Types to Reference Types: Boxing
There are cases when an instance of a value type needs to be treated as an instance of a reference type. For example, suppose you'd like to pass an instance of a value type as a parameter to some method, but that parameter is defined to be a reference to a value rather than the value itself. For situations like this, a value type instance can be converted into a reference type instance through a process called boxing.
When a value type instance is boxed, storage is allocated on the heap, and the instance's value is copied into that space. A reference to this storage is placed on the stack, as shown in Figure 2-3. The boxed value is an object, a reference type, that contains the contents of the value type instance. In the figure, the Int32 value 169 shown in Figure 2-2 has been converted to a value of type Object, and its contents have been placed on the heap. A boxed value type instance can also be converted back to its original form, a process called unboxing.
Figure 2-3. Boxing converts a value type instance into an instance of an analogous reference type.
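Java performs a comparable conversion, called autoboxing, and it too is usually implicit. The sketch below is an analogy rather than CLR code (the box and unbox helpers are hypothetical): assigning an int to an Object variable makes the Java compiler wrap the value in an Integer object, effectively a call to Integer.valueOf. As with CLR boxing, the box holds a copy, so later changes to the original variable do not affect it.

```java
// Java analogy for boxing and unboxing; not CLR code.
public class BoxingDemo {

    static Object box(int value) {
        Object boxed = value;   // autoboxing: compiled as Integer.valueOf(value)
        return boxed;
    }

    static int unbox(Object boxed) {
        return (Integer) boxed; // cast to Integer, then auto-unbox to int
    }

    public static void main(String[] args) {
        int original = 169;
        Object boxed = box(original);
        original = 0;                                         // the box keeps its own copy
        System.out.println(unbox(boxed));                     // 169
        System.out.println(boxed.getClass().getSimpleName()); // Integer
    }
}
```

One difference from the CLR: Java boxes each primitive into a distinct wrapper class (Integer for int), whereas the text describes an Int32 value being converted to a value of type Object.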
Languages built on the CLR commonly hide the process of boxing, so developers may not need to request this transformation explicitly. Still, boxing has performance implications (it takes time to do, and accessing a boxed value is a bit slower than accessing an unboxed one), and boxed values behave somewhat differently than unboxed values. Even though the process usually happens silently, it's worth knowing what's going on.
The Common Language Specification
The CTS defines a large and fairly complex set of types. Not all of them make sense for all languages. Yet one of the key goals of the CLR is to allow creating code in one language, then calling that code from another. Unless both languages support the same types in the same way, doing this is problematic. Still, requiring every language to implement every CTS type would be burdensome to language developers. The solution to this conundrum is a compromise called the Common Language Specification (CLS). The CLS defines a (large) subset of the CTS that a language must obey if it wishes to interoperate with other CLS-compliant languages. For example, the CLS requires support for most CTS value types, including Boolean, Byte, Char, Decimal, Int16, Int32, Int64, Single, Double, and more. It does not require support, however, for UInt16, UInt32, or UInt64. Similarly, a CTS array is allowed to have its lower bound set at an arbitrary value, while a CLS-compliant array must have a lower bound of zero. There are many more restrictions in the CLS, all of them defined with the same end in mind: allowing effective interoperability among code written in CLR-based languages.
One important thing to note about the rules laid down by the CLS is that they apply only to externally visible aspects of a type. A language is free to do anything it wants within its own world, but whatever it exposes to the outside world (and thus potentially to other languages) is constrained by the CLS. Given the goal of cross-language interoperability, this distinction makes perfect sense.