Using Variants Instead of Simple Data Types

In this section I'll discuss the pros and cons of using Variants in place of simple data types such as Integer, Long, Double, and String. This is an unorthodox practice—the standard approach is to avoid the use of Variants for a number of reasons. We'll look at the counterarguments first.

Performance Doesn't Matter

Every journal article on optimizing Visual Basic includes a mention of how Variants are slower than underlying first-class data types. This should come as no surprise. For example, when iterating through a sequence with a Variant of subtype Integer, the interpreted or compiled code must decode the structure of the Variant every time the code wants to use its integer value, instead of accessing an integer value directly. There is bound to be an overhead to doing this.

Plenty of authors have made a comparison using a Variant as a counter in a For loop, and yes, a Variant Integer takes about 50 percent more time than an Integer when used as a loop counter. This margin decreases as the data type gets more complex, so a Variant Double is about the same as a Double, whereas, surprisingly, a Variant Currency is quicker than a Currency. If you are compiling to native code, the proportions can be much greater in certain cases.

Is this significant? Almost always it is not. The amount of time that would be saved by not using Variants would be dwarfed by the amount of time spent in loading and unloading forms and controls, painting the screen, talking to databases, and so on. Of course, this depends on the details of your own application, but in most cases it is highly unlikely that converting local variables from Variants to Integers and Strings will speed up your code noticeably.

When optimizing, you benefit by looking at the bigger picture. If your program is too slow, you should reassess the whole architecture of your system, concentrating in particular on the database and network aspects. Then look at user interface and algorithms. If your program is still so locally computation-intensive and time-critical that you think significant time can be saved by using Integers rather than Variants, you should be considering writing the critical portion in C++ and placing this in a DLL.

Taking a historical perspective, machines continue to grow orders of magnitude faster, which allows software to take more liberties with performance. Nowadays, it is better to concentrate on writing your code so that it works, is robust, and is extensible. If you need to sacrifice efficiency in order to do this, so be it—your code will still run fast enough anyway.

Memory Doesn't Matter

A common argument against Variants is that they take up more memory than do other data types. In place of an Integer, which normally takes just 2 bytes of memory, a Variant of 16 bytes is taking eight times more space. The ratio is less, of course, for other underlying types, but the Variant always contains some wasted space.

The question is, as with the issue of performance in the previous section, how significant is this? Again I think not very. If your program has some extremely large arrays—say, tens of thousands of integers—an argument could be made to allow Integers to be used. But they are the exception. All your normal variables in any given program are going to make no perceptible difference whether they are Variants or not.

I'm not saying that using Variants improves performance or memory. It doesn't. What I'm saying is that the effect Variants have is not a big deal—at least, not a big enough deal to outweigh the reasons for using them.

Type Safety

A more complex argument is the belief that Variants are poor programming style—that they represent an unwelcome return to the sort of dumb macro languages that encouraged sloppy, buggy programming.

The argument maintains that restricting variables to a specific type allows various logic errors to be trapped at compile time, an obviously good thing. Variants, in theory, take away this ability.

To understand this issue fully we must first look at the way non-Variant variables behave. In the following pages I have split this behavior into four key parts of the language, and have contrasted how Variants behave compared to simple data types in each of these four cases:

Assignment
Function Calls
Operators and Expressions
Visual Basic Functions

Case 1: Assignment between incompatible variables

Consider the following code fragment (Example A):

Dim i As Integer, s As String s = "Hello" i = s

What happens? Well, it depends on which version of Visual Basic you run. In pre-OLE versions of Visual Basic you got a Type mismatch error at compile time. In Visual Basic 6, there are no errors at compile time, but you get the Type mismatch trappable error 13 at run time when the program encounters the i = s line of code.

NOTE

Visual Basic 4 was rewritten using the OLE architecture; thus, versions 3 and earlier are "pre-OLE."

The difference is that the error occurs at run time instead of being trapped when you compile. Instead of you finding the error, your users do. This is a bad thing.

The situation is further complicated because it is not the fact that s is a String and i is an Integer that causes the problem. It is the actual value of s that determines whether the assignment can take place.

This code succeeds, with i set to 1234 (Example B):

Dim i As Integer, s As String s = "1234" i = s

This code in Example C does not succeed (you might have thought that i would be set to 0, but this is not the case):

Dim i as Integer, s As String s = "" i = s

These examples demonstrate why you get the error only at run time. At compile time the compiler cannot know what the value of s will be, and it is the value of s that decides whether an error occurs.

The behavior is exactly the same with this piece of code (Example D):

Dim i As Integer, s As String s = "" i = CInt(s)

As in Example C, a type mismatch error will occur. In fact, Example C is exactly the same as Example D. In Example C, a hidden call to the CInt function takes place. The rules that determine whether CInt will succeed are the same as the rules that determine whether the plain i = s will succeed. This is known as implicit type conversion, although some call it "evil" type coercion.

The conversion functions CInt, CLng, and so on, are called implicitly whenever there is an assignment between variables of different data types. The actual functions are implemented within the system library file OLEAUT32.DLL. If you look at the exported functions in this DLL, you'll see a mass of conversion functions. For example, you'll see VarDecFromCy to convert a Currency to a Decimal, or VarBstrFromR8 to convert a string from an 8-byte Real, such as a Double. The code in this OLE DLL function determines the rules of the conversion within Visual Basic.

If the CInt function had worked the same way as Val does, the programming world would've been spared a few bugs (Example E).

Dim i As Integer, s As String s = "" i = Val(s)

This example succeeds because Val has been defined to return 0 when passed the empty string. The OLE conversion functions, being outside the mandate of Visual Basic itself, simply have different rules (Examples F and G).

Dim i As Integer, s As String s = "1,234" i = Val(s) Dim i As Integer, s As String s = "1,234" i = CInt(s)

Examples F and G also yield different results. In Example F, i becomes 1, but in Example G, i becomes 1234. In this case the OLE conversion functions are more powerful in that they can cope with the thousands separator. Further, they also take account of the locale, or regional settings. Should your machine's regional settings be changed to German standard, Example G will yield 1 again, not 1234, because in German the comma is used as the decimal point rather than as a thousands separator. This can have both good and bad side effects.

These code fragments, on the other hand, succeed in all versions of Visual Basic (Examples H and I):

Dim i As Variant, s As Variant s = "Hello" i = s  Dim i As Variant, s As Variant s = "1234" i = s

In both the above cases, i is still a string, but why should that matter? By using Variants throughout our code, we eliminate the possibility of type mismatches during assignment. In this sense, using Variants can be even safer than using simple data types, because they reduce the number of run-time errors. Let's look now at another fundamental part of the syntax and again contrast how Variants behave compared to simple data types.

LOCALE EFFECTS
Suppose you were writing a little calculator program, where the user types a number into a text box and the program displays the square of this number as the contents of the text box change.
Private Sub Text1_Change()     If IsNumeric(Text1.Text) Then         Label1.Caption = Text1.Text * Text1.Text     Else         Label1.Caption = ""     End If End Sub 
Note that the IsNumeric test verifies that it is safe to multiply the contents of the two text boxes without fear of type mismatch problems. Suppose "1,000" was typed into the text box—the label underneath would show 1,000,000 or 1, depending on the regional settings. On the one hand, it's good that you get this international behavior without performing any extra coding, but it could also be a problem if the user was not conforming to the regional setting in question. Further, to prevent this problem, if a number is to be written to a database or file, it should be written as a number without formatting, in case it is read at a later date on a machine where the settings are different.
Also, you should also avoid writing any code yourself that parses numeric strings. For example, if you were trying to locate the decimal point in a number using string functions, you might have a problem:
InStr(53.6, ".") 
This line of code will return 3 on English/American settings, but 0 on German settings.
Note, finally, that Visual Basic itself does not adhere to this convention in its own source code. The number 53.6 means the same whatever the regional settings. We all take this for granted, of course.

Case 2: Function parameters and return types

With pre-OLE versions of Visual Basic you get a Parameter Type Mismatch error at compile time, but in Visual Basic 4, 5, and 6 the situation is the same as in the previous example—a run-time type mismatch, depending on the value in s, and whether the implicit CInt could work.

The problem is that you might reasonably expect that after assigning 6.4 to x in the procedure subByRef, which is declared in the parameter list as a Variant, Debug.Print would show 6.4. But instead it shows only 6.

Now no run-time errors or compile-time type mismatch errors occur. Of course, it's not necessarily so obvious by looking at the declaration what the parameters mean, but then that's what the parameter name is for.

Returning to our survey of how Variants behave compared to simple data types, we now look at expressions involving Variants.

Case 3: Operators

I have already suggested, for the purposes of assignment and function parameters and return values, that using Variants cuts down on problematic run-time errors. Does this also apply to the use of Visual Basic's own built-in functions and operators? The answer is, "It depends on the operator or function involved."

Arithmetic operators All the arithmetic operators (such as +, -, *, \, /, and ^) evaluate their parameters at run time and throw the ubiquitous type mismatch error if the parameters do not apply. With arithmetic operators, there is neither an advantage nor a disadvantage to using Variants instead of simple data types; in either case, it's the value, not the data type, that determines whether the operation can take place. In Example A, we get type mismatch errors on both lines:

A lot of implicit type conversion is going on here. The parameters of "-" are converted at run time to Doubles before being supplied to the subtraction operator itself. CDbl("Fred") does not work, so both lines in Example A fail. CDbl("123") does work, so the subtraction succeeds in both lines of Example B.

There is one slight difference between v and s after the assignments in Example B: s is a string of length 1 containing the value 0, while v is a Variant of subtype Double containing the value 0. The subtraction operator is defined as returning a Double, so 0 is returned in both assignments. This is fine for v - v, which becomes a Variant of subtype Double, with value 0. On the other hand, s is a string, so CStr is called to convert the Double value to 0.

All other arithmetic operators behave in a similar way to subtraction, with the exception of +.

Performance Doesn't Matter

Memory Doesn't Matter

Type Safety

Case 1: Assignment between incompatible variables

Case 2: Function parameters and return types

Case 3: Operators

Case 4: Visual Basic's own functions

Flexibility

Defensive Coding

Using the Variant as a General Numeric Data Type