Type System Modifications

In addition to the extensions to the Visual Basic type system, a number of modifications to the existing type system were necessitated by the move to the .NET platform. For the most part, they were required because many of the .NET types that corresponded to COM types (such as arrays and variants) utilized different underlying storage.

Arrays

Arrays on the .NET platform are much more sophisticated than Safearrays are in COM. There are two principal differences that caused changes from Visual Basic 6.0.

The first difference is that all arrays in .NET encode their rank within their type. Because COM Safearrays encoded their rank as an attribute of the array instance rather than within the array type, code from previous versions of Visual Basic that declares arrays cannot always be ported unchanged. In most cases, however, this is not a serious problem: most variables are assigned only arrays of one particular rank, and the rank can usually be inferred easily from the surrounding code. See, for example, Listings A.5 and A.6.

Listing A.5 `module1.bas` (VB6)

    Sub Main()
        Dim x() As Integer      ' rank is not part of the declaration
        ReDim x(10, 10)
        x(1, 1) = 10
    End Sub

Listing A.6 `test.vb` (VB.NET)

    Module Test
        Sub Main()
            Dim x(,) As Integer   ' 2-dimensional array
            ReDim x(10, 10)
            x(1, 1) = 10
        End Sub
    End Module
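As a minimal sketch of what "rank within the type" means at run time, the following compares a one-dimensional and a two-dimensional Integer array. The module and variable names are illustrative and are not part of the original listings.

```vbnet
Imports System

Module RankDemo
    Sub Main()
        Dim oneD(2) As Integer       ' 3 elements, rank 1
        Dim twoD(2, 2) As Integer    ' 3 x 3 elements, rank 2

        ' The rank is part of the runtime type, so the two types differ.
        Console.WriteLine(oneD.GetType().ToString())   ' System.Int32[]
        Console.WriteLine(twoD.GetType().ToString())   ' System.Int32[,]
        Console.WriteLine(twoD.Rank)                   ' 2

        ' oneD = twoD   ' would not compile: the ranks do not match
    End Sub
End Module
```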

This difference in array types can cause problems, however, when trying to interoperate with COM methods. In that case, it is impossible to know what rank of array a COM method expects because there is no way to inspect the API's code. Currently, the .NET platform treats all COM array parameters as one-dimensional arrays of the specified type. This takes care of the most common situations, but is not perfect. At this point, there is no simple way to specify the rank of an array parameter of an imported COM method.

The other difference between Safearrays and .NET arrays is more problematic. A Safearray can contain a set of bounds for each dimension of the array, but those bounds are not considered part of the actual type, just as the rank of the array is not. In .NET, however, the lower bound of each dimension is allowed (but not required) to be part of the type itself. So while there is, for example, a one-dimensional Integer array type that can have any lower bound for its one dimension, there are also one-dimensional Integer array types whose one dimension has a fixed lower bound of zero, of one, of two, and so on. Any array of the right rank and element type can be assigned to the first kind of array type (the one without fixed lower bounds), but only an array with matching lower bounds can be assigned to the other kinds. For example, a one-dimensional Integer array with a lower bound of 5 cannot be assigned to a variable whose type is a one-dimensional Integer array with a lower bound of 6.
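The distinction can still be observed from Visual Basic .NET through the Framework's `Array.CreateInstance` method, even though the language itself only declares zero-based arrays. The sketch below (names are illustrative) creates a one-dimensional Integer array whose lower bound is 5 and shows that it cannot be cast to `Integer()`, the zero-based one-dimensional array type.

```vbnet
Imports System

Module LowerBoundDemo
    Sub Main()
        ' Array.CreateInstance(elementType, lengths, lowerBounds):
        ' one dimension, 5 elements, valid indexes 5 through 9.
        Dim a As Array = Array.CreateInstance(GetType(Integer), _
                                              New Integer() {5}, _
                                              New Integer() {5})
        a.SetValue(42, 5)
        Console.WriteLine(a.GetLowerBound(0))   ' 5
        Console.WriteLine(a.GetUpperBound(0))   ' 9

        Try
            ' Integer() has its lower bound fixed at zero, so the cast fails.
            Dim zeroBased As Integer() = DirectCast(a, Integer())
            Console.WriteLine(zeroBased.Length)
        Catch ex As InvalidCastException
            Console.WriteLine("Not assignable to Integer()")
        End Try
    End Sub
End Module
```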

Overall, this whole question would be academic if not for one fact: The Common Language Specification (CLS) decrees that all CLS-compliant languages must be able to use array types that have fixed lower bounds of zero. This is because languages such as C# and C++ do not allow the lower bounds of their arrays to be anything but zero. They could, conceivably, still use the array types that do not fix the lower bounds to represent an array that has a fixed lower bound of zero, but to do so would mean that several code optimizations based on knowing the lower bound of each dimension of an array would be lost. This was unacceptable to the designers of those languages.

A simple solution to this problem would be to say that Visual Basic .NET should use the array types that do not fix the lower bounds and just throw exceptions if you tried to pass one with nonzero lower bounds to a CLS-compliant method. The problem is that array types can make up part of the signatures of methods. So, when overriding a method defined in C# that takes an array, the signature of the Visual Basic method has to match completely, including the fact that the array type has a fixed lower bound of zero. The result is that if Visual Basic .NET wishes to have nonzero-lower-bound arrays in the language, it must distinguish between array types that can have nonzero lower bounds and array types that have their lower bounds fixed at zero.

After much deliberation, it was decided that having two separate types of arrays in the language would be too confusing for most users. In many cases, it would be unclear which type should be used when declaring a method. Even worse, when errors occurred, it would be difficult to concisely explain exactly what was wrong. However, the removal of nonzero-lower-bound arrays is a significant issue for many users who routinely declared arrays with lower bounds other than zero. More consideration will be given to this problem in future versions.

Variant and `Object` Types

Previous versions of Visual Basic could be used in a typeless way because of the variant data type. A variant was a structure that contained a union of all the different types that could be contained in a COM variable: Integer, Double, BSTR, IDispatch, and so on. This allowed a variant to contain any value that another variable could contain. As a result, if you declared all variables to be of type variant (implicitly or explicitly), Visual Basic became a typeless language. A number of helper routines in COM also allowed converting a variant from one type of value to another.

This same scheme could not be straightforwardly used in .NET because the runtime environment does not allow reference types to be unioned together with other types in a structure. The .NET runtime manages references to values on the heap and tracks them for garbage collection purposes. To allow an Integer to be unioned with a reference type in a structure would mean that the runtime environment would not be able to reliably know whether a particular instance of the structure contained a reference or an Integer in the field. This would cause safety and security issues and make garbage collection impossible. It would have been possible for the .NET runtime environment to instead expose a variant type that it natively understood; that is, the runtime environment would ensure that the "type" field of the variant structure was always consistent with the "value" field. However, this was not ideal because of the existence of the .NET Object type.

The .NET runtime type system was designed as a single-root type system. What this means is that every type in the type system ultimately derives from Object. For reference types, this is no different than COM: in COM, every coclass implemented the IUnknown interface, which was equivalent to the Object type in Visual Basic 6.0. For value types (i.e., structures and primitive types), though, this is a big change. It is accomplished by something of a trick. Each value type can exist in one of two forms: its boxed form or its unboxed form. An unboxed value type is just a regular value type; it exists on the stack, in an array, or within another type and has no identity in and of itself. Unboxed value types are not managed by the .NET runtime environment because they are always part of something else. A boxed value type is the tricky part. When a value type is boxed, a copy of the value is made on the system heap and a reference to the heap location is returned. From this point on, the boxed value type can be treated just as if it were a reference type. Essentially, a value type is boxed when it is cast to Object and unboxed when it is cast from Object.
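A short sketch of boxing and unboxing, with illustrative names: because boxing copies the value to the heap, later changes to the original variable do not affect the boxed copy, and boxing the same variable twice yields two distinct heap objects.

```vbnet
Imports System

Module BoxingDemo
    Sub Main()
        Dim i As Integer = 42
        Dim o As Object = i          ' boxes: copies 42 to the heap
        i = 100                      ' changes only the unboxed variable
        Dim j As Integer = CInt(o)   ' unboxes: copies the value back out
        Console.WriteLine(j)         ' 42

        ' Each boxing operation allocates its own heap object.
        Dim p As Object = i
        Dim q As Object = i
        Console.WriteLine(Object.ReferenceEquals(p, q))   ' False
    End Sub
End Module
```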

The problem may be becoming clearer at this point. Because every type in the .NET type system can be cast to Object, an Object is functionally equivalent to a variant. It is worth noting that even though Object and variant are functionally equivalent, they are two very different types. Assigning a value to a variant variable only copies the value into the variant itself, no matter how many values are assigned to the variable. Assigning a value to an Object variable allocates space on the heap and then copies the value into that space. Each time you assign a new value to the Object variable, an allocation occurs on the heap. Thus, there is more overhead when using Object than when using variant.

Ultimately, we had a choice to make: We could keep variant and live with the confusion over whether to use Object or variant when a "universal" type was needed, or we could just use Object exclusively and suffer some performance penalty. After much analysis, we determined that in most cases the performance of Object would not appreciably affect applications. Additionally, we were concerned that even within Microsoft itself the teams designing various parts of the .NET Framework would not be able to decide between using variant and Object consistently. It became clear that using Object exclusively was the best option.

A side effect of this decision relates to how ByRef parameters of type Object behave. The COM variant type can not only store a value, but also store a pointer to a value. This means that when code in Visual Basic 6.0 calls a method declared with a ByRef variant parameter, the variant that is passed holds a pointer to the argument, allowing the called method to modify the original argument storage. Because of the complexity that would have been involved in allowing it, the .NET runtime environment does not allow arbitrary pointers to storage to be passed around unless the code is declared to be "unsafe." Visual Basic .NET does not support writing unsafe code, so there was no simple way to simulate the ByRef variant behavior on the .NET runtime.

There is a way to simulate the ByRef variant behavior on .NET in a safe way by using the RefAny type, but this type has some significant limitations and would have greatly complicated the implementation of the language. The simpler mechanism that we ultimately settled on was passing values to ByRef Object parameters (the .NET equivalent of ByRef variant parameters) using copy-in/copy-out semantics rather than true reference semantics. Thus, instead of passing a pointer to the storage of the argument, a temporary Object variable is allocated, the argument is copied into it, and a pointer to the temporary variable is passed to the method. Then, when the method returns, the value in the Object variable is cast back to the argument type and assigned back to the argument source. This means that changes to the parameter will not be reflected in the original argument location until the method returns. In practice, this distinction should rarely be noticed.

As an example, consider Listing A.7.

Listing A.7 `test.vb` (VB.NET)

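    Option Strict Off   ' the listing relies on implicit Integer/Object conversions

    Imports System

    Module Test
        Public g As Integer = 10

        Sub ChangeValue(ByRef o As Object)
            o = 20
            Console.WriteLine(g & " " & o)   ' g has not been updated yet
        End Sub

        Sub Main()
            ChangeValue(g)   ' g is copied in, and copied back out on return
        End Sub
    End Module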

Listing A.7 will print "10 20" because it is equivalent to Listing A.8.

Listing A.8 `test.vb` (VB.NET)

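    Option Strict Off   ' the listing relies on implicit Integer/Object conversions

    Imports System

    Module Test
        Public g As Integer = 10

        Sub ChangeValue(ByRef o As Object)
            o = 20
            Console.WriteLine(g & " " & o)
        End Sub

        Sub Main()
            Dim Temp As Object
            Temp = g             ' copy in
            ChangeValue(Temp)
            g = Temp             ' copy out
        End Sub
    End Module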
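For contrast, here is a sketch of the case where the parameter type matches the argument type, so no temporary is needed and true reference semantics apply. `ChangeInteger` is a hypothetical variant of `ChangeValue` introduced only for this illustration.

```vbnet
Imports System

Module ByRefIntegerDemo
    Public g As Integer = 10

    ' Hypothetical counterpart of ChangeValue that takes ByRef Integer.
    Sub ChangeInteger(ByRef i As Integer)
        i = 20
        ' i refers directly to g's storage, so g has already changed here.
        Console.WriteLine("{0} {1}", g, i)
    End Sub

    Sub Main()
        ChangeInteger(g)   ' prints "20 20"
    End Sub
End Module
```

The only difference from the ByRef Object case is when the change to `g` becomes visible: immediately here, and only after the call returns when a temporary Object is involved.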

Structures

Structures, also called user-defined types in Visual Basic 6.0, were renamed and had their functionality significantly expanded in Visual Basic .NET. The name was changed both because the functionality was expanded and because the keyword Type conflicts with the type name System.Type. System.Type is the fundamental type in reflection (the .NET system of runtime type inspection), so keeping the keyword Type would have meant that writing code that utilizes reflection would require frequent use of bracketing.

In Visual Basic .NET, structures are just user-defined value types. As there is no distinction made in terms of functionality between value types and reference types (just a distinction in terms of storage and lifetime), structures can now have methods, events, properties, and so on, and their data members can now have accessibility modifiers placed on them. The choice between a class and a structure boils down more to storage considerations and value versus reference semantics than to functionality.
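A minimal sketch of such a structure, using a hypothetical `Point` type: it has private data members, a constructor, a property, and a method, yet assignment still copies the value rather than sharing a reference.

```vbnet
Imports System

Structure Point
    Private _x As Integer
    Private _y As Integer

    Public Sub New(ByVal initialX As Integer, ByVal initialY As Integer)
        _x = initialX
        _y = initialY
    End Sub

    Public Property X() As Integer
        Get
            Return _x
        End Get
        Set(ByVal value As Integer)
            _x = value
        End Set
    End Property

    Public Function Describe() As String
        Return "(" & _x.ToString() & ", " & _y.ToString() & ")"
    End Function
End Structure

Module StructureDemo
    Sub Main()
        Dim a As New Point(1, 2)
        Dim b As Point = a      ' value semantics: b is an independent copy
        b.X = 99
        Console.WriteLine(a.Describe())   ' (1, 2)
        Console.WriteLine(b.Describe())   ' (99, 2)
    End Sub
End Module
```

Had `Point` been declared as a class, `a` and `b` would refer to the same object and both lines would print (99, 2).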

`Date`, `Currency`, and `Decimal` Types

There were a few other data type changes that are worth noting. First, COM has both a Currency type and a Decimal type. Because Decimal has a greater range and precision than Currency (i.e., it can store all the values Currency can), we decided to replace the Currency type with the Decimal type.

The other change was more subtle. In COM, a Date variable is stored using a double-precision floating-point number. The number to the left of the decimal point represents the number of days since December 30, 1899. The number to the right of the decimal point represents the time of day as a fraction of a 24-hour day (so .5 is noon). This is a simplified description of the actual representation, but should give the general idea. Because the Date data type is represented as a floating-point number, the data type has the same limitations in terms of range and exactness that the underlying data type does.

The .NET DateTime structure, in contrast, is stored using a signed 64-bit integer representing the number of 100-nanosecond intervals (ticks) since midnight, January 1, 1 A.D. This is a more precise representation and can represent a greater range of dates. It has a dual impact on the Visual Basic language, however. First, it means that Double values can no longer implicitly be converted to and from Date values. Second, transferring dates from COM to .NET, and vice versa, involves a translation. By and large, this is mostly a question of whether the .NET date will fit into the range of the COM date, but there is one special situation that can cause problems.
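The two representations can be compared with the Framework's `DateTime.FromOADate` and `DateTime.ToOADate` methods, which convert between a DateTime and the COM (OLE Automation) double. The following is only a sketch with illustrative names; it also shows that `DateTime.Ticks` counts 100-nanosecond intervals from January 1, 1 A.D.

```vbnet
Imports System

Module DateRepresentationDemo
    Sub Main()
        ' COM dates count days from December 30, 1899.
        Dim comZero As Date = DateTime.FromOADate(0.0)
        Console.WriteLine(comZero = #12/30/1899#)                   ' True

        ' 1.5 is one day plus half a day: noon on December 31, 1899.
        Dim noonNextDay As Date = DateTime.FromOADate(1.5)
        Console.WriteLine(noonNextDay = #12/31/1899 12:00:00 PM#)   ' True

        ' DateTime counts 100-nanosecond ticks from January 1, 1 A.D.
        Console.WriteLine(DateTime.MinValue.Ticks)    ' 0
        Console.WriteLine(TimeSpan.TicksPerSecond)    ' 10000000

        ' Converting a Date to a Double is an explicit call.
        Dim millennium As Date = #1/1/2000#
        Console.WriteLine(millennium.ToOADate())      ' 36526
    End Sub
End Module
```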

In previous versions of Visual Basic, there was no way to express just a time value (i.e., a time of day not tied to a particular day). To work around this, Visual Basic would treat the time of day on the first day representable (December 30, 1899) as just a time value. In other words, printing the date constant #12/30/1899 1:43PM# would just output 1:43 PM. December 30, 1899, is also the zero day for the COM Date type; in other words, a Date variable that has no value assigned to it will by default represent midnight, December 30, 1899.

To preserve this behavior, Visual Basic .NET treats the zero day of the DateTime type (January 1, 1 A.D.) as the "time only" date. In the case of code migrated from previous versions of Visual Basic, this preserves the semantics of the code. However, it does mean that interoperation with COM is slightly more complicated. Basically, any time during the zero date of DateTime is translated to the same time on the zero date of COM Date, even though the two dates are different. Although this can cause some slightly anomalous results, by and large it produces the "expected" behavior.
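The translation can be observed with `DateTime.ToOADate`, which, as far as I am aware, applies this rule in the Framework itself. The sketch below (illustrative names) converts noon on DateTime's zero day to a COM date and back; the result lands on December 30, 1899, which is the kind of slight anomaly mentioned above.

```vbnet
Imports System

Module TimeOnlyDemo
    Sub Main()
        ' A "time only" value: noon on DateTime's zero day (January 1, 1 A.D.).
        Dim timeOnly As New DateTime(1, 1, 1, 12, 0, 0)

        ' Translated to the same time on COM's zero day: half a day past zero.
        Dim oa As Double = timeOnly.ToOADate()
        Console.WriteLine(oa)   ' 0.5 (decimal separator depends on culture)

        ' Converting back yields December 30, 1899, not January 1, 1 A.D.
        Dim roundTrip As Date = DateTime.FromOADate(oa)
        Console.WriteLine(roundTrip = #12/30/1899 12:00:00 PM#)   ' True
        Console.WriteLine(roundTrip = timeOnly)                   ' False
    End Sub
End Module
```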
