Predefined Data Types | Pro Visual C++ 2005 for C# Developers

Now that you have seen how to declare variables and constants, this section takes a closer look at the data types available in C#. As you will see, C# is a lot fussier about the types available and their definitions than some other languages are.

Value Types and Reference Types

Before examining the data types in C#, it is important to understand that C# distinguishes between two categories of data type:

Value types
Reference types

The next few sections look in detail at the syntax for value and reference types. Conceptually, the difference is that a value type stores its value directly, whereas a reference type stores a reference to the value. Compared to other languages, value types in C# are basically the same thing as simple types (integer, float, but not pointers or references) in Visual Basic or C++. Reference types are the same as reference types in Visual Basic, or are similar to types accessed through pointers in C++.

These types are stored in different places in memory: broadly speaking, value types are stored in an area known as the stack, and reference types are stored in an area known as the managed heap. It is important to be aware of whether a type is a value type or a reference type because of the different effect that assignment has. For example, int is a value type, which means that the following statement will result in two locations in memory storing the value 20:

 // i and j are both of type int
 i = 20;
 j = i;

However, consider the following code. For this code, assume you have defined a class called Vector. Assume that Vector is a reference type and has an int member variable called Value:

 Vector x, y;
 x = new Vector();
 x.Value = 30;   // Value is a field defined in Vector class
 y = x;
 Console.WriteLine(y.Value);
 y.Value = 50;
 Console.WriteLine(x.Value);

The crucial point to understand is that after executing this code, there is only one Vector object around. x and y both point to the memory location that contains this object. Because x and y are variables of a reference type, declaring each variable simply reserves a reference — it doesn't instantiate an object of the given type. This is the same as declaring a pointer in C++ or an object reference in Visual Basic. In neither case does an object actually get created. In order to create an object you have to use the new keyword, as shown. Because x and y refer to the same object, changes made to x will affect y and vice versa. Hence the code will display 30 then 50.
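The difference between the two kinds of assignment can be checked with a small, self-contained program. This is only a sketch: the nested Vector class is a hypothetical stand-in for the one the text describes, and the method names are invented for illustration.

```csharp
using System;

class ReferenceVersusValueDemo
{
    // Hypothetical stand-in for the Vector class described in the text.
    class Vector
    {
        public int Value;
    }

    // Copying an int produces a second, independent value.
    public static bool IntCopyIsIndependent()
    {
        int i = 20;
        int j = i;   // j receives its own copy of the value 20
        j = 99;      // changing j does not touch i
        return i == 20 && j == 99;
    }

    // Copying a reference produces a second name for the same object.
    public static bool ReferenceCopyIsShared()
    {
        Vector x = new Vector();
        x.Value = 30;
        Vector y = x;   // y now refers to the same object as x
        y.Value = 50;   // the single shared object is modified
        return x.Value == 50 && object.ReferenceEquals(x, y);
    }

    static void Main()
    {
        Console.WriteLine(IntCopyIsIndependent());    // prints True
        Console.WriteLine(ReferenceCopyIsShared());   // prints True
    }
}
```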

Note

C++ developers should note that this syntax is like a reference, not a pointer. We use the . notation, not ->, to access object members. Syntactically, C# references look more like C++ reference variables. However, behind the superficial syntax, the real similarity is with C++ pointers.

If a variable is a reference, it is possible to indicate that it does not refer to any object by setting its value to null:

 y = null;

This is just the same as setting a reference to null in Java, a pointer to NULL in C++, or an object reference in Visual Basic to Nothing. If a reference is set to null, then clearly it is not possible to call any non-static member functions or fields against it; doing so would cause an exception to be thrown at runtime.
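As a minimal sketch of what happens at runtime, the following program writes to a field through a null reference and catches the resulting NullReferenceException. The Vector class and the method name are hypothetical and exist only for this illustration.

```csharp
using System;

class NullReferenceDemo
{
    // Hypothetical stand-in for the Vector class described in the text.
    class Vector
    {
        public int Value;
    }

    // Returns true if using a field through a null reference throws.
    public static bool FieldAccessThroughNullThrows()
    {
        Vector y = null;   // y refers to no object
        try
        {
            y.Value = 1;   // there is no object to write to
            return false;
        }
        catch (NullReferenceException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(FieldAccessThroughNullThrows());   // prints True
    }
}
```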

In languages like C++, the developer could choose whether a given value was to be accessed directly or via a pointer. Visual Basic was more restrictive, taking the view that COM objects were reference types and simple types were always value types. C# is similar to Visual Basic in this regard: whether a variable is a value or reference is determined solely by its data type, so int, for example, is always a value type. It is not possible to declare an int variable as a reference (although Chapter 5, "Operators and Casts," which covers boxing, shows that it is possible to wrap value types in references of type object).

In C#, basic data types like bool and long are value types. This means that if you declare a bool variable and assign it the value of another bool variable, you will have two separate bool values in memory. Later, if you change the value of the original bool variable, the value of the second bool variable does not change. These types are copied by value.

In contrast, most of the more complex C# data types, including classes that you yourself declare, are reference types. They are allocated upon the heap, have lifetimes that can span multiple function calls, and can be accessed through one or several aliases. The Common Language Runtime (CLR) implements an elaborate algorithm to track which objects are still reachable through reference variables, and which have been orphaned. Periodically, the CLR will destroy orphaned objects and reclaim the memory that they once occupied. This is done by the garbage collector.

C# has been designed this way because high performance is best served by keeping primitive types (like int and bool) as value types, while having larger types that contain many fields (as is usually the case with classes) as reference types. If you want to define your own type as a value type, you should declare it as a struct.
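To see what declaring a type as a struct changes, the sketch below defines two otherwise identical types, one as a struct and one as a class, and copies each. Both type names and both method names are hypothetical and do not come from the text.

```csharp
using System;

class StructVersusClassDemo
{
    // Hypothetical types: same field, different category.
    struct PointValue { public int X; }       // value type
    class PointReference { public int X; }    // reference type

    // Assigning a struct copies the whole value.
    public static bool StructCopyIsIndependent()
    {
        PointValue a = new PointValue();
        a.X = 1;
        PointValue b = a;   // b is a separate copy
        b.X = 2;
        return a.X == 1 && b.X == 2;
    }

    // Assigning a class instance copies only the reference.
    public static bool ClassCopyIsShared()
    {
        PointReference a = new PointReference();
        a.X = 1;
        PointReference b = a;   // b refers to the same object
        b.X = 2;
        return a.X == 2 && b.X == 2;
    }

    static void Main()
    {
        Console.WriteLine(StructCopyIsIndependent());   // prints True
        Console.WriteLine(ClassCopyIsShared());         // prints True
    }
}
```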

CTS Types

As mentioned in Chapter 1, ".NET Architecture," the basic predefined types recognized by C# are not intrinsic to the language but are part of the .NET Framework. For example, when you declare an int in C#, what you are actually declaring is an instance of a .NET struct, System.Int32. This may sound like an esoteric point, but it has a profound significance: it means that you are able to treat all the primitive data types syntactically as if they were classes that supported certain methods. For example, to convert an int i to a string, you can write:

 string s = i.ToString();

It should be emphasized that, behind this syntactical convenience, the types really are stored as primitive types, so there is absolutely no performance cost associated with the idea that the primitive types are notionally represented by .NET structs.
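A short sketch can confirm that int and System.Int32 are the same type and that methods can be called on an int directly. The class and method names here are invented for the example.

```csharp
using System;

class CtsAliasDemo
{
    // The C# keyword int and the .NET struct System.Int32 name the same type.
    public static bool IntIsInt32()
    {
        return typeof(int) == typeof(System.Int32);
    }

    // Methods can be called on an int as if it were an object.
    public static string Describe(int i)
    {
        return i.ToString();
    }

    static void Main()
    {
        int i = 42;
        Console.WriteLine(IntIsInt32());           // prints True
        Console.WriteLine(i.GetType().FullName);   // prints System.Int32
        Console.WriteLine(Describe(i));            // prints 42
    }
}
```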

The following sections review the types that are recognized as built-in types in C#. Each type is listed, along with its definition and the name of the corresponding .NET type (CTS type). C# has 15 predefined types: 13 value types and 2 reference types (string and object).

Predefined Value Types

The built-in value types represent primitives, such as integer and floating-point numbers, character, and Boolean types.

Integer types

C# supports eight predefined integer types:

Name	CTS Type	Description	Range (min:max)
sbyte	System.SByte	8-bit signed integer	-128:127 (-2⁷:2⁷-1)
short	System.Int16	16-bit signed integer	-32,768:32,767 (-2¹⁵:2¹⁵-1)
int	System.Int32	32-bit signed integer	-2,147,483,648:2,147,483,647 (-2³¹:2³¹-1)
long	System.Int64	64-bit signed integer	-9,223,372,036,854,775,808: 9,223,372,036,854,775,807 (-2⁶³:2⁶³-1)
byte	System.Byte	8-bit unsigned integer	0:255 (0:2⁸-1)
ushort	System.UInt16	16-bit unsigned integer	0:65,535 (0:2¹⁶-1)
uint	System.UInt32	32-bit unsigned integer	0:4,294,967,295 (0:2³²-1)
ulong	System.UInt64	64-bit unsigned integer	0:18,446,744,073,709,551,615 (0:2⁶⁴-1)

Future versions of Windows will target 64-bit processors, which can move bits into and out of memory in larger chunks to achieve faster processing times. Consequently, C# supports a rich palette of signed and unsigned integer types ranging in size from 8 to 64 bits.

Many of these type names will be new to Visual Basic developers. C++ and Java developers should be careful: some of the names of C# types are the same as C++ and Java types, but the types have different definitions. For example, in C#, an int is always a 32-bit signed integer. In C++ an int is a signed integer, but the number of bits is platform-dependent (32 bits on Windows). In C#, all data types have been defined in a platform-independent manner to allow for the possible future porting of C# and .NET to other platforms.

A byte is the standard 8-bit type for values in the range 0 to 255 inclusive. Be aware that, in keeping with its emphasis on type safety, C# regards the byte type and the char type as completely distinct, and any programmatic conversions between the two must be explicitly requested. Also be aware that unlike the other types in the integer family, a byte type is by default unsigned. Its signed version bears the special name sbyte.

With .NET, a short is no longer quite so short; it is now 16 bits long. The int type is 32 bits long. The long type reserves 64 bits for values. All integer-type variables can be assigned values in decimal or in hex notation. The latter require the 0x prefix:

 long x = 0x12ab;

If there is any ambiguity about whether an integer is int, uint, long, or ulong, it will default to an int. To specify which of the other integer types the value should take, you can append one of the following characters to the number:

 uint ui = 1234U;
 long l = 1234L;
 ulong ul = 1234UL;

You can also use lowercase u and l, although the latter could be confused with the integer 1 (one).
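One way to check which type a literal takes is to box it and ask for its runtime type. The following sketch does this for each suffix and also prints the hex value used earlier; the helper TypeNameOf is a hypothetical name introduced for the example.

```csharp
using System;

class IntegerLiteralDemo
{
    // Boxes the argument and reports the name of its .NET type.
    public static string TypeNameOf(object value)
    {
        return value.GetType().Name;
    }

    // Hex notation is just another way of writing the same integer.
    public static long HexValue()
    {
        long x = 0x12ab;
        return x;
    }

    static void Main()
    {
        Console.WriteLine(TypeNameOf(1234));     // prints Int32
        Console.WriteLine(TypeNameOf(1234U));    // prints UInt32
        Console.WriteLine(TypeNameOf(1234L));    // prints Int64
        Console.WriteLine(TypeNameOf(1234UL));   // prints UInt64
        Console.WriteLine(HexValue());           // prints 4779
    }
}
```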

Floating-point types

Although C# provides a plethora of integer data types, it supports floating-point types as well. They will be familiar to C and C++ programmers:

Name	CTS Type	Description	Significant Figures	Range (approximate)
float	System.Single	32-bit single-precision floating point	7	1.5 × 10⁻⁴⁵ to 3.4 × 10³⁸
double	System.Double	64-bit double-precision floating point	15/16	5.0 × 10⁻³²⁴ to 1.7 × 10³⁰⁸

The float data type is for smaller floating-point values, for which less precision is required. The double data type is bulkier than the float data type, but offers twice the precision (15 digits).

If you hard-code in a non-integer number (such as 12.3) in your code, the compiler will normally assume you want the number interpreted as a double. If you want to specify that the value is a float, you append the character F (or f) to it:

 float f = 12.3F;
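The effect of the F suffix, and the difference in precision between the two types, can be sketched as follows. The class and method names are hypothetical; the comparison relies on 12.3 not being exactly representable in binary, so the float and double approximations differ.

```csharp
using System;

class FloatDoubleLiteralDemo
{
    // Boxes the argument and reports the name of its .NET type.
    public static string TypeNameOf(object value)
    {
        return value.GetType().Name;
    }

    // The float and double approximations of 12.3 are not the same number.
    public static bool FloatEqualsDouble()
    {
        float f = 12.3F;
        double d = 12.3;
        return f == d;   // f is widened to double before comparing
    }

    static void Main()
    {
        // float bad = 12.3;   // would not compile: 12.3 is a double literal
        Console.WriteLine(TypeNameOf(12.3));       // prints Double
        Console.WriteLine(TypeNameOf(12.3F));      // prints Single
        Console.WriteLine(FloatEqualsDouble());    // prints False
    }
}
```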

The decimal type

In addition, there is a decimal type representing higher precision floating-point numbers:

Name	CTS Type	Description	Significant Figures	Range (approximate)
decimal	System.Decimal	128-bit high precision decimal notation	28	1.0 × 10⁻²⁸ to 7.9 × 10²⁸

One of the great things about the CTS and C# is the provision of a dedicated decimal type for financial calculations. How you use the 28 digits that the decimal type provides is up to you. In other words, you can track smaller dollar amounts with greater accuracy for cents, or larger dollar amounts with more rounding in the fractional area. You should bear in mind, however, that decimal is not implemented under the hood as a primitive type, so using decimal will have a performance impact on your calculations.

To specify that your number is of a decimal type rather than a double, float, or an integer, you can append the M (or m) character to the value as shown in the following example:

 decimal d = 12.30M;
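A common illustration of why decimal suits financial work is adding 0.1 and 0.2. The sketch below, with hypothetical class and method names, compares the result in double and in decimal arithmetic; decimal stores these values exactly, whereas double stores binary approximations.

```csharp
using System;

class DecimalPrecisionDemo
{
    // In binary floating point, 0.1 + 0.2 is not exactly 0.3.
    public static bool DoubleSumIsExact()
    {
        double a = 0.1;
        double b = 0.2;
        return a + b == 0.3;
    }

    // In decimal arithmetic, the same sum is exact.
    public static bool DecimalSumIsExact()
    {
        decimal a = 0.1M;
        decimal b = 0.2M;
        return a + b == 0.3M;
    }

    static void Main()
    {
        Console.WriteLine(DoubleSumIsExact());    // prints False
        Console.WriteLine(DecimalSumIsExact());   // prints True
    }
}
```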

The Boolean type

The C# bool type is used to contain Boolean values of either true or false:

Name	CTS Type	Values
bool	System.Boolean	true or false

You cannot implicitly convert bool values to and from integer values. If a variable (or a function return type) is declared as a bool, you can only use values of true and false. You will get an error if you try to use zero for false and a non-zero value for true.
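Where C or C++ code would rely on an integer being treated as a condition, C# code has to spell out the comparison. A minimal sketch, using hypothetical helper names:

```csharp
using System;

class BoolConversionDemo
{
    // An explicit comparison turns an integer into a bool.
    public static bool IsNonZero(int n)
    {
        return n != 0;
    }

    // The conditional operator turns a bool into an integer.
    public static int ToInt(bool flag)
    {
        return flag ? 1 : 0;
    }

    static void Main()
    {
        // bool b = 1;        // would not compile: no conversion from int to bool
        // if (5) { }         // would not compile: the condition must be a bool
        Console.WriteLine(IsNonZero(5));   // prints True
        Console.WriteLine(IsNonZero(0));   // prints False
        Console.WriteLine(ToInt(true));    // prints 1
    }
}
```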

The character type

For storing the value of a single character, C# supports the char data type:

Name	CTS Type	Values
char	System.Char	Represents a single 16-bit (Unicode) character

Although this data type has a superficial resemblance to the char type provided by C and C++, there is a significant difference. C++ char represents an 8-bit character, whereas a C# char contains 16 bits. This is part of the reason that implicit conversions between the char type and the 8-bit byte type are not permitted.

Although 8 bits may be enough to encode every character in the English language and the digits 0–9, they aren't enough to encode every character in more expansive symbol systems (such as Chinese). In a gesture toward universality, the computer industry is moving away from the 8-bit character set and toward the 16-bit Unicode scheme, of which the ASCII encoding is a subset.

Literals of type char are signified by being enclosed in single quotes, for example 'A'. If you try to enclose a character in double quotes, the compiler will treat this as a string and report an error.

As well as representing chars as character literals, you can represent them with 4-digit hex Unicode values (for example '\u0041'), as integer values with a cast (for example, (char)65), or as hexadecimal values ('\x0041'). They can also be represented by an escape sequence:

Escape Sequence	Character
\'	Single quote
\"	Double quote
\\	Backslash
\0	Null
\a	Alert
\b	Backspace
\f	Form feed
\n	Newline
\r	Carriage return
\t	Tab character
\v	Vertical tab

C++ developers should note that because C# has a native string type, you don't need to represent strings as arrays of chars.
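The different ways of writing a char, and the explicit casts needed to move between char and byte, can be sketched like this. The class and method names are hypothetical.

```csharp
using System;

class CharLiteralDemo
{
    // Four ways of writing the same character, capital A.
    public static bool AllSpellingsAreSameChar()
    {
        char literal = 'A';
        char unicodeEscape = '\u0041';
        char fromInt = (char)65;
        char hexEscape = '\x0041';
        return literal == unicodeEscape
            && literal == fromInt
            && literal == hexEscape;
    }

    // Conversions between char and byte must be requested with a cast.
    public static byte CharToByte(char c)
    {
        return (byte)c;
    }

    public static char ByteToChar(byte b)
    {
        return (char)b;
    }

    static void Main()
    {
        Console.WriteLine(AllSpellingsAreSameChar());   // prints True
        Console.WriteLine(CharToByte('A'));             // prints 65
        Console.WriteLine(ByteToChar(66));              // prints B
    }
}
```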

Predefined Reference Types

C# supports two predefined reference types:

Name	CTS Type	Description
object	System.Object	The root type, from which all other types in the CTS derive (including value types)
string	System.String	Unicode character string

The object type

Many programming languages and class hierarchies provide a root type, from which all other objects in the hierarchy derive. C# and .NET are no exception. In C#, the object type is the ultimate parent type from which all other intrinsic and user-defined types derive. This is a key feature of C#, which distinguishes it from both Visual Basic and C++, although its behavior here is very similar to Java. All types implicitly derive ultimately from the System.Object class. This means that you can use the object type for two purposes:

You can use an object reference to bind to an object of any particular subtype. For example, in Chapter 5, "Operators and Casts," you see how you can use the object type to box a value object on the stack to move it to the heap. object references are also useful in reflection, when code must manipulate objects whose specific types are unknown. This is similar to the role played by a void pointer in C++ or by a Variant data type in VB.
The object type implements a number of basic, general-purpose methods, which include Equals(), GetHashCode(), GetType(), and ToString(). Responsible user-defined classes may need to provide replacement implementations of some of these methods using an object-oriented technique known as overriding, which is discussed in Chapter 4, "Inheritance." When you override ToString(), for example, you equip your class with a method for intelligently providing a string representation of itself. If you don't provide your own implementations for these methods in your classes, the compiler will pick up the implementations in object, which may or may not be correct or sensible in the context of your classes.

The object type is examined in more detail in subsequent chapters.
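Both uses of object can be sketched in a few lines: an object reference bound to values of different types, and a class that overrides ToString(). The Money and Plain classes and the method names below are hypothetical examples, not types from the text.

```csharp
using System;

class ObjectTypeDemo
{
    // Hypothetical class that overrides ToString().
    class Money
    {
        public int Cents;

        public override string ToString()
        {
            return "Money(" + Cents + " cents)";
        }
    }

    // Hypothetical class that keeps the implementation inherited from object.
    class Plain
    {
    }

    // An object reference can bind to an int; the runtime type is still Int32.
    public static string BoxedTypeName()
    {
        object boxed = 42;
        return boxed.GetType().Name;
    }

    // Calls the overridden ToString() through an object reference.
    public static string DescribeMoney(int cents)
    {
        Money m = new Money();
        m.Cents = cents;
        object o = m;
        return o.ToString();
    }

    static void Main()
    {
        Console.WriteLine(BoxedTypeName());      // prints Int32
        Console.WriteLine(DescribeMoney(250));   // prints Money(250 cents)
        Console.WriteLine(new Plain());          // inherited ToString() prints the type name
    }
}
```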

The string type

Veterans of C and C++ probably have battle scars from wrestling with C-style strings. A C or C++ string was nothing more than an array of characters, so the client programmer had to do a lot of work just to copy one string to another or to concatenate two strings. In fact, for a generation of C++ programmers, implementing a string class that wrapped up the messy details of these operations was a rite of passage requiring many hours of teeth gnashing and head scratching. Visual Basic programmers had a somewhat easier life, with a string type, while Java people had it even better, with a String class that is in many ways very similar to C# string.

C# recognizes the string keyword, which under the hood is translated to the .NET class, System.String. With it, operations like string concatenation and string copying are a snap:

 string str1 = "Hello ";
 string str2 = "World";
 string str3 = str1 + str2; // string concatenation

Despite this style of assignment, string is a reference type. Behind the scenes, a string object is allocated on the heap, not the stack, and when you assign one string variable to another string, you get two references to the same string in memory. However, with string there are some differences from the usual behavior for reference types. For example, should you then make changes to one of these strings, note that this will create an entirely new string object, leaving the other string unchanged. Consider the following code:

 using System;
 class StringExample
 {
     public static int Main()
     {
         string s1 = "a string";
         string s2 = s1;
         Console.WriteLine("s1 is " + s1);
         Console.WriteLine("s2 is " + s2);
         s1 = "another string";
         Console.WriteLine("s1 is now " + s1);
         Console.WriteLine("s2 is now " + s2);
         return 0;
     }
 }

The output from this is:

s1 is a string
s2 is a string
s1 is now another string
s2 is now a string

Changing the value of s1 had no effect on s2, which may seem surprising for a reference type. What's happening here is that when s1 is initialized with the value a string, a new string object is allocated on the heap. When s2 is initialized, the reference points to this same object, so s2 also has the value a string. However, when you now change the value of s1, instead of replacing the original value, a new object will be allocated on the heap for the new value. The s2 variable will still point to the original object, so its value is unchanged. Under the hood, this happens because string objects are immutable: assigning to s1 makes it refer to a different object rather than altering the existing one, and operations that appear to modify a string return a new string instead. The string class also overloads operators such as == so that they compare contents rather than references; operator overloading is explored in Chapter 5, "Operators and Casts." In general, the string class has been implemented so that its semantics follow what you would normally intuitively expect for a string.
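The behavior described above can be checked step by step. In this sketch, with hypothetical class and method names, object.ReferenceEquals is used to show that the two variables start out sharing one object, and ToUpperInvariant stands in for any operation that appears to change a string.

```csharp
using System;

class StringReferenceDemo
{
    // After s2 = s1, both variables refer to the same string object.
    public static bool SharesObjectAfterAssignment()
    {
        string s1 = "a string";
        string s2 = s1;
        return object.ReferenceEquals(s1, s2);
    }

    // Giving s1 a new value leaves the object that s2 refers to untouched.
    public static string OtherVariableAfterReassignment()
    {
        string s1 = "a string";
        string s2 = s1;
        s1 = "another string";
        return s2;
    }

    // A method that seems to modify a string returns a new one instead.
    public static bool OriginalSurvivesToUpper()
    {
        string s1 = "a string";
        string upper = s1.ToUpperInvariant();
        return s1 == "a string" && upper == "A STRING";
    }

    static void Main()
    {
        Console.WriteLine(SharesObjectAfterAssignment());      // prints True
        Console.WriteLine(OtherVariableAfterReassignment());   // prints a string
        Console.WriteLine(OriginalSurvivesToUpper());          // prints True
    }
}
```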

String literals are enclosed in double quotes ("..."); if you attempt to enclose a string in single quotes, the compiler will take the value as a char and report an error. C# strings can contain the same Unicode and hexadecimal escape sequences as chars. Because these escape sequences start with a backslash, you can't use this character unescaped in a string. Instead, you need to escape it with two backslashes (\\):

 string filepath = "C:\\ProCSharp\\First.cs";

Even if you are confident you can remember to do this all the time, it can prove annoying typing all those double backslashes. Fortunately, C# gives you an alternative. You can prefix a string literal with the at character (@) and all the characters in it will be treated at face value; they won't be interpreted as escape sequences:

 string filepath = @"C:\ProCSharp\First.cs";

This even allows you to include line breaks in your string literals:

 string jabberwocky = @"'Twas brillig and the slithy toves
 Did gyre and gimble in the wabe.";

Then the value of jabberwocky would be this:

'Twas brillig and the slithy toves
Did gyre and gimble in the wabe.
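The relationship between escaped and @-prefixed literals can be verified with a short sketch. The class and method names are hypothetical; the file path is the one used in the text.

```csharp
using System;

class VerbatimStringDemo
{
    // Both spellings produce the same string value.
    public static bool EscapedEqualsVerbatim()
    {
        string escaped = "C:\\ProCSharp\\First.cs";
        string verbatim = @"C:\ProCSharp\First.cs";
        return escaped == verbatim;
    }

    // In an @-prefixed literal, \n is two characters: a backslash and the letter n.
    public static int LengthOfVerbatimBackslashN()
    {
        return @"\n".Length;
    }

    // A line break typed inside an @-prefixed literal becomes part of the string.
    public static bool VerbatimKeepsLineBreak()
    {
        string twoLines = @"first line
second line";
        return twoLines.IndexOf('\n') >= 0;
    }

    static void Main()
    {
        Console.WriteLine(EscapedEqualsVerbatim());        // prints True
        Console.WriteLine(LengthOfVerbatimBackslashN());   // prints 2
        Console.WriteLine(VerbatimKeepsLineBreak());       // prints True
    }
}
```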