Type System Modifications | The Visual Basic .NET Programming Language


In addition to extensions to the Visual Basic type system, a number of modifications to the existing type system were required by the move to the CLR. For the most part, this was because many of the CLR types that corresponded to COM types (such as arrays and Variant) used different underlying storage.

Arrays

Arrays on the CLR are much more sophisticated than COM arrays (COM arrays are also called "SafeArrays"). There are two principal differences that caused changes from Visual Basic 6.0.

The first difference is that all arrays in the CLR encode their rank within their type. So a two-dimensional array of Integer is considered to be a different type from a one-dimensional array of Integer in the CLR. COM SafeArrays, however, encode their rank as an attribute of an array instance and not the array type. So a two-dimensional array of Integer is actually the same type as a one-dimensional array of Integer in COM; the instances just happen to have different ranks. Because of this difference between COM and the CLR, code from previous versions of Visual Basic that declares and uses arrays cannot always be ported unchanged. In most cases, the differences between COM and the CLR don't cause serious problems: most variables are only ever assigned arrays of one particular rank, and that rank can usually be inferred easily from the surrounding code. For example, the following Visual Basic 6.0 code:

 Sub Main()
   Dim x() As Integer
   ReDim x(10, 10)
   x(1, 1) = 10
 End Sub

can pretty easily be rewritten as follows.

 Module Test
   Sub Main()
     Dim x(,) As Integer  ' 2-dimensional array
     ReDim x(10, 10)
     x(1, 1) = 10
   End Sub
 End Module
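As a small sketch of rank being part of the CLR array type, the following compares the runtime types of a one-dimensional and a two-dimensional Integer array. The module name RankDemo and the helper SameArrayType are illustrative names, not part of the original text.

```vbnet
Imports System

Module RankDemo
    ' Hypothetical helper: True when both arrays have the same CLR type.
    Public Function SameArrayType(ByVal a As Array, ByVal b As Array) As Boolean
        Return a.GetType() Is b.GetType()
    End Function

    Sub Main()
        Dim oneD(10) As Integer        ' one-dimensional, indexes 0 to 10
        Dim twoD(10, 10) As Integer    ' two-dimensional, a distinct type

        Console.WriteLine(oneD.Rank)                    ' 1
        Console.WriteLine(twoD.Rank)                    ' 2
        Console.WriteLine(SameArrayType(oneD, twoD))    ' False
    End Sub
End Module
```

Because the two declarations have different types, a variable declared as Integer() cannot later be given a two-dimensional array with ReDim, which is why the declaration in the ported code has to state the rank.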

This difference between array types can cause problems, however, when you are trying to interoperate with COM. When you are transferring an array from the CLR to COM, it is impossible to know what rank array a COM method expects, because there is no way to inspect the API's code. Currently, the CLR treats all COM array parameters as one-dimensional arrays of the specified type. This takes care of the most common situations, but is not perfect. At this point, there is no simple way to specify the rank of an array parameter of an imported COM method.

The other difference between SafeArrays and CLR arrays is more problematic. A SafeArray can contain a set of bounds for each dimension of the array, but those bounds are not considered to be part of the actual type, just as the rank of the array is not considered to be part of the type. In the CLR, however, the lower bounds of each dimension of the array are allowed to be (but not required to be) part of the type itself. So while there is, for example, a one-dimensional Integer array type that can have any lower bound, there are also one-dimensional Integer array types that have a fixed lower bound of zero, of one, of two, and so on. While any array of the particular rank and type can be assigned to the first kind of array type (i.e., the one without any fixed lower bound), only an array that has matching lower bounds can be assigned to the other kinds of array types (for example, a one-dimensional Integer array with a lower bound of 5 cannot be assigned to a variable whose type is a one-dimensional Integer array with a lower bound of 6).
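The distinction can be observed from code through the Array.CreateInstance overload that takes lower bounds. This is only a sketch: the module name LowerBoundDemo and the helper MakeArray are illustrative, and it covers just the one-dimensional case.

```vbnet
Imports System

Module LowerBoundDemo
    ' Hypothetical helper: builds a one-dimensional Integer array whose
    ' first index is lowerBound instead of zero.
    Public Function MakeArray(ByVal length As Integer, ByVal lowerBound As Integer) As Array
        Return Array.CreateInstance(GetType(Integer), _
                                    New Integer() {length}, _
                                    New Integer() {lowerBound})
    End Function

    Sub Main()
        Dim fromFive As Array = MakeArray(3, 5)    ' valid indexes: 5, 6, 7
        fromFive.SetValue(42, 5)

        Console.WriteLine(fromFive.GetLowerBound(0))    ' 5
        Console.WriteLine(fromFive.GetUpperBound(0))    ' 7

        ' Integer() is the zero-based array type. The instance created
        ' above has a different CLR type, so the comparison fails.
        Console.WriteLine(fromFive.GetType() Is GetType(Integer()))    ' False
    End Sub
End Module
```

The array is held in a variable of type Array because it is not an Integer(); attempting to cast it to Integer() would fail at run time.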

Overall, this whole question would be academic were it not for one fact: the Common Language Specification (CLS) decrees that all CLS-compliant languages must be able to use array types that have fixed lower bounds of zero. This is because languages such as C# and C++ do not allow the lower bounds of their arrays to be anything but zero. They could, conceivably, still use the array types that do not fix the lower bounds to represent an array that has a fixed lower bound of zero, but to do so would mean that several code optimizations based on knowing the lower bound of each dimension of an array would be lost. This was unacceptable to the designers of those languages.

A simple solution to this problem would be to say that Visual Basic .NET should use the array types that do not fix the lower bounds and just throw exceptions if you tried to pass one with nonzero lower bounds to a CLS-compliant method. The problem is that array types can make up part of the signatures of methods. So when you are overriding a method defined in C# that takes an array, the signature of the VB method has to match completely, including the fact that the array type has a fixed lower bound of zero. The result of this is that if Visual Basic .NET wishes to have nonzero lower bound arrays in the language, it must distinguish between array types that can have nonzero lower bounds and array types that have their lower bounds fixed at zero.

After much deliberation, it was decided that having two separate types of arrays in the language would be too confusing for most users. In many cases, it would be unclear which type should be used when declaring a method. Even worse, when errors occurred, it would be difficult to concisely explain exactly what was wrong. Arrays declared in Visual Basic .NET therefore always have a lower bound of zero. However, the removal of nonzero lower bound arrays is a significant issue for many users who routinely declared arrays with a lower bound other than zero.

Variant and Object

Previous versions of Visual Basic could be used in a loosely typed way through use of the Variant data type. A Variant is a structure that is a union of all the different COM types: Integer, Double, BSTR, IDispatch, and so on. This allows a Variant to contain any value expressible in Visual Basic. As a result, if you declared all variables as type Variant (implicitly or explicitly), Visual Basic became an almost "typeless" language.

This same scheme could not be used in the CLR, because the runtime does not allow reference types to be unioned with other types in a structure. This is because the CLR manages references to values on the heap and tracks them for garbage collection. To allow an Integer to be unioned with a reference to the heap would mean that the CLR would not be able to reliably know whether a particular instance of the type contained a reference or an Integer. This would cause safety and security issues, as well as making garbage collection impossible. It would have been possible for the CLR to instead expose a Variant type that it natively understood; in other words, the CLR would ensure that the "type" field of the Variant structure was always consistent with the "value" field. However, this was not ideal, because of the existence of the CLR Object type.

The CLR type system was designed as a single-root type system. What this means is that every type in the type system ultimately derives from Object. For reference types, this is no different from COM: in COM, every coclass implemented the IUnknown interface, which was equivalent to the Object type in Visual Basic 6.0. For value types (i.e., structures and primitive types), though, this is a big change. It is accomplished by something of a trick, as discussed in Chapter 9. Each value type can exist in one of two forms: its boxed form and its unboxed form. An unboxed value type is just a regular value type: it exists on the stack, in an array, or within another type, and has no identity in and of itself. Unboxed value types are not managed by the CLR, because they are always part of something else. A boxed value type is the tricky part. When a value type is boxed, a copy of the value is made on the heap, and a reference to the heap location is returned. From this point on, the boxed value type can be treated just as if it were a reference type. Essentially, a value type is boxed when it is cast to Object, and unboxed when it is cast from Object.
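A short sketch of the copy that boxing makes, using illustrative names (BoxingDemo and BoxedCopy are not from the original text):

```vbnet
Imports System

Module BoxingDemo
    ' Hypothetical helper: boxes a value, changes the original variable,
    ' and returns what the box still holds.
    Public Function BoxedCopy(ByVal original As Integer) As Integer
        Dim boxed As Object = original       ' boxing: the value is copied to the heap
        original = original + 1              ' changes only the local variable
        Return DirectCast(boxed, Integer)    ' unboxing: the value is copied back out
    End Function

    Sub Main()
        Console.WriteLine(BoxedCopy(10))     ' 10
    End Sub
End Module
```

The box keeps the value it was given at the moment of the cast; later changes to the original variable do not reach it.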

The problem may be clearer at this point. Because every type in the CLR type system can be cast to Object, Object is in many ways functionally equivalent to Variant. But even though Object and Variant are mostly equivalent, they are not exactly equivalent. Assigning a value to a Variant variable only copies the value into the Variant itself, no matter how many values are assigned to the variable. Assigning a value type to an Object variable boxes it: space is allocated on the heap and the value is copied into that space. Each time you assign a new value to the Object variable, another allocation occurs on the heap. Thus, there is more overhead to using Object than Variant.

Ultimately, we had a choice to make. We could keep Variant and live with the confusion over whether to use Object or Variant when a "universal" type was needed, or just use Object exclusively and suffer some performance penalty. After much analysis, we determined that in most cases the performance of Object would not appreciably affect applications. Additionally, we were concerned that even within Microsoft itself, the teams designing various parts of the .NET Framework would not be able to decide between using Variant and Object consistently in their APIs. It became clear that using Object exclusively was the best option.

A side effect of this decision relates to how ByRef parameters of type Object behave. The COM Variant type can not only store a value itself, it can also store a pointer to a value. This means that when a method in Visual Basic 6.0 calls a method declared with a ByRef Variant parameter, the argument passes a pointer to itself, allowing the called method to modify the original argument storage. Because of the complexity that would have been involved in allowing it, the CLR does not allow arbitrary pointers to storage locations to be passed around, unless the code is declared to be "unsafe." Visual Basic .NET does not support writing unsafe code, so there was no simple way to simulate the ByRef Variant behavior on the CLR.

There is a way to simulate the ByRef Variant behavior on the CLR in a safe way by using the RefAny type, but this type had some significant limitations and greatly complicated the implementation of the language. A simpler mechanism that we ultimately settled on was to instead pass values to ByRef Object parameters (the CLR equivalent to ByRef Variant parameters) using copy-in/copy-out semantics rather than true byref semantics, as covered in Chapter 10. Briefly, this means that instead of passing a pointer to the storage of the argument, a temporary Object variable is allocated, the argument is copied into it, and a pointer to the temporary is passed to the method. Then, when the method returns, the value in the Object variable is cast back to the argument type and assigned back to the argument source. This means that changes to the parameter will not be reflected in the original argument location until the method returns. In practice, this distinction should rarely be noticed. For example, the following code:

 Module Test
   Public g As Integer = 10

   Sub ChangeValue(ByRef o As Object)
     o = 20
     Console.WriteLine(g & " " & o)
   End Sub

   Sub Main()
     ChangeValue(g)
   End Sub
 End Module

will print "10 20" because it is equivalent to the following.

 Module Test
   Public g As Integer = 10

   Sub ChangeValue(ByRef o As Object)
     o = 20
     Console.WriteLine(g & " " & o)
   End Sub

   Sub Main()
     Dim Temp As Object
     Temp = g
     ChangeValue(Temp)
     g = Temp
   End Sub
 End Module
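For contrast, here is a sketch that places a ByRef Integer parameter next to a ByRef Object parameter. When the parameter type matches the argument type exactly, a reference to the argument itself is passed, so the change is visible immediately. The names ByRefDemo, ChangeInteger, and ChangeObject are illustrative, and the sketch assumes Option Strict is off, since the copy back from Object to Integer is a narrowing conversion.

```vbnet
Option Strict Off
Imports System

Module ByRefDemo
    Public g As Integer = 10

    ' Parameter type matches the argument type: true byref, no temporary.
    Public Function ChangeInteger(ByRef i As Integer) As Integer
        i = 20
        Return g    ' value of g as seen while the method is still running
    End Function

    ' Parameter is Object: an Integer argument is copied in and copied out.
    Public Function ChangeObject(ByRef o As Object) As Integer
        o = 20
        Return g    ' g has not been updated yet at this point
    End Function

    Sub Main()
        g = 10
        Console.WriteLine(ChangeInteger(g))    ' 20
        g = 10
        Console.WriteLine(ChangeObject(g))     ' 10
        Console.WriteLine(g)                   ' 20, copied back on return
    End Sub
End Module
```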

Structures

User-defined types in Visual Basic 6.0 were renamed as structures and had their functionality significantly expanded in Visual Basic .NET. The name was changed both because structures can do more in the CLR and because the keyword Type conflicted with the type name System.Type. Because System.Type is the foundation of reflection (the .NET Framework system of runtime type inspection), keeping the keyword Type would have meant that writing code that uses reflection would require frequent use of escaping (i.e., [Type] instead of Type).

In Visual Basic .NET, structures are user-defined value types. Since there is no distinction in terms of functionality between value types and reference types (only a distinction in terms of storage and lifetime), structures can now have methods, events, properties, and so on, and their data members can now have accessibility modifiers placed on them. In Visual Basic .NET, the choice between a class and a structure boils down to only storage considerations and assignment behavior.
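The following sketch shows a structure with a private data member, a property, and a method, and contrasts its assignment behavior with that of a class. The type names Point and PointClass are invented for the illustration.

```vbnet
Imports System

Module StructureDemo
    Structure Point
        Private _x As Integer    ' accessibility modifier on a data member

        Public Sub New(ByVal x As Integer)
            _x = x
        End Sub

        Public Property X() As Integer
            Get
                Return _x
            End Get
            Set(ByVal value As Integer)
                _x = value
            End Set
        End Property

        Public Sub MoveBy(ByVal delta As Integer)
            _x += delta
        End Sub
    End Structure

    Class PointClass
        Public X As Integer
    End Class

    Sub Main()
        Dim a As New Point(1)
        Dim b As Point = a          ' assignment copies the whole value
        b.MoveBy(5)
        Console.WriteLine(a.X)      ' 1
        Console.WriteLine(b.X)      ' 6

        Dim c As New PointClass()
        c.X = 1
        Dim d As PointClass = c     ' assignment copies only the reference
        d.X = 6
        Console.WriteLine(c.X)      ' 6
    End Sub
End Module
```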

Date, Currency, and Decimal

A few other data type changes are worth noting. First, COM has both a Currency and a Decimal type. Because the Decimal type has a greater range and precision than Currency (i.e., it can store all the values that Currency can), we decided to replace the Currency type entirely with Decimal.

The other change was more subtle. In COM, a Date variable is stored using a double-precision floating-point number. The number to the left of the decimal point represents the number of days since December 30, 1899. The number to the right of the decimal point represents the time of day as a fraction of a day. (This is a slightly simplified description of the actual representation, but should give the general idea.) Because the Date data type is represented as a floating-point number, the data type has the same limitations in terms of range and exactness that the underlying data type does.

The CLR DateTime structure, in contrast, is stored using a 64-bit integer representing the number of 100-nanosecond intervals (ticks) since midnight, January 1, 1. This is a more precise representation and can represent a greater range of dates. This has two impacts on the Visual Basic language, however. First, it means that Double values can no longer implicitly be converted to and from Date values. Second, transferring dates from COM to the CLR and vice versa involves a translation. By and large, this is mostly a question of whether a DateTime value will fit into the range of the COM date, but there is one special situation that can cause problems.
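Since the implicit conversion is gone, code that needs the COM-style number has to ask for it explicitly. The sketch below uses the DateTime.FromOADate and ToOADate methods for this; the module name DateDemo and the helper ComToClr are illustrative.

```vbnet
Imports System

Module DateDemo
    ' Hypothetical helper: converts a COM-style date number to a Date.
    Public Function ComToClr(ByVal comDate As Double) As Date
        Return DateTime.FromOADate(comDate)
    End Function

    Sub Main()
        ' 1.5 is one and a half days after midnight, December 30, 1899.
        Dim fromCom As Date = ComToClr(1.5)
        Console.WriteLine(fromCom = New DateTime(1899, 12, 31, 12, 0, 0))    ' True

        ' Going the other way is explicit as well.
        Dim d As Date = #1/1/2000#
        Dim asDouble As Double = d.ToOADate()
        Console.WriteLine(ComToClr(asDouble) = d)                            ' True

        ' DateTime counts ticks from midnight, January 1, 1.
        Console.WriteLine(New DateTime(1, 1, 1).Ticks)                       ' 0
        Console.WriteLine(TimeSpan.TicksPerSecond)                           ' 10000000
    End Sub
End Module
```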

In previous versions of Visual Basic, there was no way to express just a time value (i.e., a time of day not tied to a particular day). To work around this, Visual Basic would treat the time of day on the first day representable (December 30, 1899) as just a time value. In other words, printing the date constant #12/30/1899 1:43PM# would just print 1:43 PM. December 30, 1899, is also the zero day for the COM Date type; in other words, a Date variable that has no value assigned to it will by default represent midnight, December 30, 1899.

To preserve this behavior, Visual Basic .NET treats the zero day of the DateTime type (January 1, 1) as the "time only" date. In the case of code migrated from previous versions of Visual Basic, this preserves the semantics of the code. However, it does mean that transferring dates back and forth with COM is slightly complicated. Basically, any time during the zero date of DateTime is translated to the same time on the zero date of the COM Date, even though the two dates are different. Although this can cause some slightly anomalous results, by and large it produces the "expected" behavior.
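A sketch of the "time only" convention as it appears in code. A date literal written without a date part lands on the DateTime zero day, and DateTime.ToOADate maps such a value onto the COM zero day. The names TimeOnlyDemo and IsTimeOnly are illustrative, and the sketch shows only the library conversion method, not the COM interop marshaling path itself.

```vbnet
Imports System

Module TimeOnlyDemo
    ' Hypothetical helper: True when a Date carries only a time of day,
    ' that is, when its date part is the DateTime zero day.
    Public Function IsTimeOnly(ByVal value As Date) As Boolean
        Return value.Date = DateTime.MinValue
    End Function

    Sub Main()
        Dim t As Date = #1:43:00 PM#          ' no date part given
        Console.WriteLine(IsTimeOnly(t))      ' True
        Console.WriteLine(t.Year)             ' 1

        ' The COM representation keeps the time of day but places it on
        ' the COM zero day, so the resulting number is between 0 and 1.
        Dim comValue As Double = t.ToOADate()
        Console.WriteLine(comValue > 0.0 AndAlso comValue < 1.0)    ' True
    End Sub
End Module
```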
