Predefined Data Types

Chapter 2 - C# Basics

by Simon Robinson et al.
Wrox Press 2002

Now that we have seen how to declare variables and constants, we shall take a closer look at the data types available in C#. As we will see, C# is a lot fussier about the types available and their definitions than some other languages are.

Value Types and Reference Types

Before examining the data types in C#, it is important to understand that C# distinguishes between two categories of data type:

Value types
Reference types

We will look in detail at the syntax for value and reference types over the next few sections. Conceptually, the difference is that a value type stores its value directly, while a reference type stores a reference to the value. Compared with other languages, value types in C# are essentially the same as the simple types (integer, float, but not pointers or references) in VB or C++. Reference types are the same as reference types in VB, and are similar to types accessed through pointers in C++.

These types are stored in different places in memory: value types in an area known as the stack, while reference types are stored in an area known as the managed heap. It is important to be aware of whether a type is a value type or a reference type because of the different effect that assignment has. For example, int is a value type, which means that the following statement will result in two locations in memory storing the value 20:

   // i and j are both of type int
   i = 20;
   j = i;

However, consider the following code. This code uses a class, MathTest, which we will define in an example to be introduced later. For now, we need to know only that MathTest is a reference type:

   // x and y are MathTest references
   x = new MathTest();
   x.value = 30;   // value is a field defined in MathTest sample
   y = x;
   Console.WriteLine(y.value);
   y.value = 50;
   Console.WriteLine(x.value);

The crucial point to understand is that after executing this code, there is only one MathTest object around. x and y both point to the memory location that contains this object. Since x and y are variables of a reference type, declaring each variable simply reserves a reference; it doesn't instantiate an object of the given type. This is the same as declaring a pointer in C++ or an object reference in VB; in neither case does an object actually get created. In order to create an object we have to use the new keyword, as shown. Since x and y refer to the same object, changes made through x will affect y and vice versa. Hence the above code will display 30 then 50.
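The two fragments above can be combined into one small program to run. This is only a sketch: the MathTest class here is a minimal stand-in with a single public field, assumed for illustration, and is not the MathTest sample defined later in the chapter. The AssignmentDemo class name is likewise hypothetical.

```csharp
using System;

// Hypothetical stand-in for the MathTest class mentioned above:
// a reference type with one public field.
class MathTest
{
    public int value;
}

class AssignmentDemo
{
    public static void Main()
    {
        // Value type: j receives its own copy of the number 20.
        int i = 20;
        int j = i;
        i = 99;                       // changing i does not affect j
        Console.WriteLine(j);         // 20

        // Reference type: x and y refer to the same object.
        MathTest x = new MathTest();
        x.value = 30;
        MathTest y = x;
        Console.WriteLine(y.value);   // 30
        y.value = 50;
        Console.WriteLine(x.value);   // 50
        Console.WriteLine(object.ReferenceEquals(x, y));   // True
    }
}
```

The last line uses object.ReferenceEquals() to confirm that x and y hold the same reference rather than two equal copies.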

C++ developers should note that this syntax is like a reference, not a pointer. We use the . notation, not ->, to access object members. Syntactically, C# references look more like C++ reference variables. However, behind the superficial syntax, the real similarity is with C++ pointers.

If a variable is a reference, it is possible to indicate that it does not refer to any object by setting its value to null:

   y = null;

This is just the same as setting a reference to null in Java, a pointer to NULL in C++, or an object reference in VB to Nothing. If a reference is set to null, then clearly it is not possible to call any non-static member functions or access any fields through it; doing so will cause an exception to be thrown at run time.
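As a rough illustration of that run-time behavior, the sketch below accesses a field through a null reference and catches the resulting System.NullReferenceException. The Counter class and its count field are hypothetical names introduced only for this example.

```csharp
using System;

// Hypothetical reference type used only for this illustration.
class Counter
{
    public int count;
}

class NullDemo
{
    public static void Main()
    {
        Counter c = new Counter();
        c = null;                       // c no longer refers to any object

        try
        {
            c.count = 1;                // member access through a null reference
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("c is null, so c.count cannot be accessed");
        }

        // Testing against null first avoids the exception.
        if (c == null)
        {
            Console.WriteLine("nothing to update");
        }
    }
}
```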

In languages like C++, the developer could choose whether a given value was to be accessed directly or via a pointer. VB was more restrictive, taking the view that COM objects were reference types and simple types were always value types. C# takes a similar view to VB: whether a variable is value or reference is determined solely by its data type, so int for example is always a value type. It is not possible to declare an int variable as a reference (although later on when we cover boxing we will see it is possible to wrap value types in references of type object).

In C#, basic data types like bool and long are value types. This means that if we declare a bool variable and assign it the value of another bool variable, we will have two separate bool values in memory. Later, if we change the value of the original bool variable, the value of the second bool variable does not change. These types are copied by value.

In contrast, most of the more complex C# data types, including all classes that we ourselves declare, are reference types. They are allocated upon the heap, have lifetimes that can span multiple function calls, and can be accessed via one or several aliases. The CLR implements an elaborate algorithm to track which reference variables are still reachable, and which have been orphaned. Periodically, the CLR will 'clean house', destroying orphaned objects and returning the memory that they once occupied back to the operating system. This is done by the garbage collector.

C# has been designed this way because high performance is best served by keeping primitive types (like int and bool) as value types, while having larger types that contain many fields (as is usually the case with classes) as reference types. If you wish to define your own type as a value type, you should declare it as a struct.
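To make the struct-versus-class distinction concrete, here is a minimal sketch using two hypothetical types, PointValue and PointRef, that differ only in being declared as a struct or a class.

```csharp
using System;

// Hypothetical types for illustration: same field, different category.
struct PointValue { public int X; }
class PointRef { public int X; }

class StructVsClassDemo
{
    public static void Main()
    {
        PointValue a = new PointValue();
        a.X = 1;
        PointValue b = a;         // copies the whole struct
        b.X = 2;
        Console.WriteLine(a.X);   // 1: a is unaffected

        PointRef c = new PointRef();
        c.X = 1;
        PointRef d = c;           // copies only the reference
        d.X = 2;
        Console.WriteLine(c.X);   // 2: c and d refer to the same object
    }
}
```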

CTS Types

As we pointed out in Chapter 1, the basic predefined types recognized by C# are not intrinsic to the language but part of the .NET Framework. For example, when you declare an int in C#, what you are actually declaring is an instance of a .NET struct, System.Int32. This may sound like an esoteric point, but it has a profound significance: it means that you are able to treat all the primitive data types syntactically as if they were classes that supported certain methods. For example, to convert an int i to a string you can write:

   string s = i.ToString();

It should be emphasized that, because of the way the .NET Framework implements these types, this syntactical convenience is achieved without any performance costs. In terms of performance, you really are dealing with primitive data types.
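One way to convince ourselves that int and System.Int32 are the same type is to compare them at run time. The following is a small sketch; the CtsAliasDemo class name is an arbitrary choice for the example.

```csharp
using System;

class CtsAliasDemo
{
    public static void Main()
    {
        int i = 42;
        System.Int32 k = i;      // same type, spelled with its .NET name

        Console.WriteLine(typeof(int) == typeof(System.Int32));  // True
        Console.WriteLine(i.GetType().FullName);                 // System.Int32
        Console.WriteLine(k.ToString());                         // 42
        Console.WriteLine(int.MaxValue);                         // 2147483647
    }
}
```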

Let's now review the types defined in C#. We will list each type, along with its definition and the name of the corresponding .NET type (CTS type). C# has fifteen predefined types: thirteen value types and two reference types (string and object).

Predefined Value Types

The built-in value types represent primitives, such as integer and floating-point numbers, character, and Boolean types.

Integer Types

C# supports eight predefined integer types:

Name	CTS Type	Description	Range (min:max)
sbyte	System.SByte	8-bit signed integer	-128:127 (-2⁷:2⁷-1)
short	System.Int16	16-bit signed integer	-32,768:32,767 (-2¹⁵:2¹⁵-1)
int	System.Int32	32-bit signed integer	-2,147,483,648:2,147,483,647 (-2³¹:2³¹-1)
long	System.Int64	64-bit signed integer	-9,223,372,036,854,775,808:9,223,372,036,854,775,807 (-2⁶³:2⁶³-1)
byte	System.Byte	8-bit unsigned integer	0:255 (0:2⁸-1)
ushort	System.UInt16	16-bit unsigned integer	0:65,535 (0:2¹⁶-1)
uint	System.UInt32	32-bit unsigned integer	0:4,294,967,295 (0:2³²-1)
ulong	System.UInt64	64-bit unsigned integer	0:18,446,744,073,709,551,615 (0:2⁶⁴-1)

Future versions of Windows will target 64-bit processors, which can move bits into and out of memory in larger chunks to achieve faster processing times. Consequently, C# provides a rich palette of signed and unsigned integer types ranging in size from 8 to 64 bits.

VB developers will of course find many of these type names to be new. C++ and Java developers should be careful; some of the names of C# types are the same as C++ and Java types, but the types nevertheless have different definitions. For example, in C#, an int is always a 32-bit signed integer. In C++ an int is a signed integer, but the number of bits is platform-dependent (32 bits on Windows). In C#, all data types have been defined in a platform-independent manner in order to allow for the possible future porting of C# and .NET to other platforms.

A byte is the standard 8-bit type for values in the range 0 to 255 inclusive. Be aware that, in keeping with its emphasis on type safety, C# regards the byte type and the char type as completely distinct, and any programmatic conversions between the two must be explicitly requested. Also be aware that unlike the other types in the integer family, a byte type is by default unsigned. Its signed version bears the special name sbyte.

With .NET, a short is no longer quite so short; it is now 16 bits long. Even larger, the int type is 32 bits long. Now huge, the long type reserves 64 bits for values! All integer-type variables can be assigned values in decimal or in hex notation. The latter require the 0x prefix:

   long x = 0x12ab;

If there is any ambiguity about whether an integer is int, uint, long, or ulong, it will default to an int. In order to specify which of the other integer types the value should take, you can append one of the following characters to the number:

   uint ui = 1234U;
   long l = 1234L;
   ulong ul = 1234UL;

We can also use lower case u and l, although the latter could be confused with the integer 1.
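The effect of the suffixes can be checked by asking each literal for its run-time type. This is a small sketch with an arbitrary class name. The last line also shows a detail worth knowing: a literal with no suffix that is too large for an int takes the first of uint, long, or ulong that can hold it.

```csharp
using System;

class IntegerLiteralDemo
{
    public static void Main()
    {
        Console.WriteLine((1234).GetType());     // System.Int32
        Console.WriteLine((1234U).GetType());    // System.UInt32
        Console.WriteLine((1234L).GetType());    // System.Int64
        Console.WriteLine((1234UL).GetType());   // System.UInt64

        long x = 0x12ab;         // hexadecimal literal
        Console.WriteLine(x);    // 4779

        // Too large for int, so the literal is typed as uint.
        Console.WriteLine((3000000000).GetType());   // System.UInt32
    }
}
```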

Floating Point Types

Although C# provides a plethora of integer data types, it supports floating-point types as well. They will be familiar to C and C++ programmers:

Name	CTS Type	Description	Significant Figures	Range (approximate)
float	System.Single	32-bit single-precision floating-point	7	1.5 × 10⁻⁴⁵ to 3.4 × 10³⁸
double	System.Double	64-bit double-precision floating-point	15/16	5.0 × 10⁻³²⁴ to 1.7 × 10³⁰⁸

The float data type is for smaller floating-point values, for which less precision is required. The double data type is bulkier than the float data type, but offers twice the precision (15 digits).

If you hard-code a non-integer number (such as 12.3) in your code, the compiler will normally assume you want the number interpreted as a double. If we want to specify that the value is a float, we append the character F (or f) to it:

   float f = 12.3F;
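The following sketch shows how the compiler types these literals and why the suffix matters. The class name is arbitrary, and the commented-out line is included only to show what the compiler rejects.

```csharp
using System;

class FloatLiteralDemo
{
    public static void Main()
    {
        Console.WriteLine((12.3).GetType());     // System.Double
        Console.WriteLine((12.3F).GetType());    // System.Single

        // float wrong = 12.3;   // compile-time error: a double literal needs a cast or the F suffix
        float f = 12.3F;
        Console.WriteLine(f);

        Console.WriteLine(sizeof(float));        // 4 (bytes)
        Console.WriteLine(sizeof(double));       // 8 (bytes)

        // The same decimal fraction is rounded differently at each precision.
        Console.WriteLine(0.1F == 0.1);          // False
    }
}
```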

Decimal Type

In addition, there is a decimal type representing higher precision floating-point numbers:

Name	CTS Type	Description	Significant Figures	Range (approximate)
decimal	System.Decimal	128-bit high precision decimal notation	28	1.0 × 10⁻²⁸ to 7.9 × 10²⁸

One of the great things about the CTS and C# is the provision of a dedicated decimal type for financial calculations. How you use the 28 digits that the decimal type provides is up to you. In other words, you can track smaller dollar amounts with greater accuracy for cents, or larger dollar amounts with more rounding in the fractional area.

To specify that our number is of a decimal type rather than a double, float, or an integer, we can append the M (or m) character to the value, like so:

   decimal d = 12.30M;
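A short comparison suggests why decimal suits financial calculations: sums of values such as 0.1 and 0.2 are exact in decimal but only approximate in double. This is a sketch with an arbitrary class name.

```csharp
using System;

class DecimalDemo
{
    public static void Main()
    {
        double approx = 0.1 + 0.2;
        decimal exact = 0.1M + 0.2M;

        Console.WriteLine(approx == 0.3);    // False: binary floating point cannot represent 0.1 exactly
        Console.WriteLine(exact == 0.3M);    // True: decimal stores base-10 digits

        // decimal wrong = 12.30;   // compile-time error: a double literal needs the M suffix
        decimal d = 12.30M;
        Console.WriteLine(d.GetType());      // System.Decimal
    }
}
```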

Boolean Type

The C# bool type is used to contain Boolean values of either true or false:

Name	CTS Type	Values
bool	System.Boolean	true or false

We cannot implicitly convert bool values to and from integer values. If a variable (or a function return type) is declared as a bool, then we can only use values of true and false. We will get an error if we try to use zero for false and a non-zero value for true.
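In practice this means writing the comparison out explicitly, as in the sketch below. The variable names are arbitrary, and the commented-out lines show forms that C and C++ accept but the C# compiler rejects.

```csharp
using System;

class BoolDemo
{
    public static void Main()
    {
        int count = 3;

        // bool flag = count;        // compile-time error: no implicit int-to-bool conversion
        // if (count) { }            // compile-time error for the same reason

        bool hasItems = count != 0;  // state the comparison explicitly
        Console.WriteLine(hasItems); // True

        int asNumber = hasItems ? 1 : 0;   // go the other way with the conditional operator
        Console.WriteLine(asNumber);       // 1
    }
}
```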

Character Type

For storing the value of a single character, C# supports the char data type:

Name	CTS Type	Values
char	System.Char	Represents a single 16-bit (Unicode) character

Although this data type has a superficial resemblance to the char type provided by C and C++, there is a significant difference. C++ char represents an 8-bit character, whereas a C# char contains 16 bits. This is part of the reason that implicit conversions between the char type and the 8-bit byte type are not permitted.

Although 8 bits may be enough to encode every character in the English language and the digits 0-9, they aren't enough to encode every character in more expansive symbol systems (such as Chinese). In a gesture towards universality, the computer industry is moving away from the 8-bit character set and towards the 16-bit Unicode scheme, of which the ASCII encoding is a subset.

Literals of type char are signified by being enclosed in single quotes, for example 'A'. If we try to enclose a character in double quotes, the compiler will treat this as a string, and report an error.

As well as representing chars as character literals, we can represent them with 4-digit hex Unicode values (for example '\u0041'), as integer values with a cast (for example, (char)65), or as hexadecimal values ('\x0041'). They can also be represented by an escape sequence:

Escape Sequence	Character
\'	Single quote
\"	Double quote
\\	Backslash
\0	Null
\a	Alert
\b	Backspace
\f	Form feed
\n	Newline
\r	Carriage return
\t	Tab character
\v	Vertical tab
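The different ways of writing a char described above all produce the same value, which the following sketch checks. It also shows the explicit casts needed to move between char and byte. The class and variable names are arbitrary.

```csharp
using System;

class CharDemo
{
    public static void Main()
    {
        char a1 = 'A';
        char a2 = '\u0041';     // 4-digit hex Unicode value
        char a3 = (char)65;     // integer value with a cast
        char a4 = '\x0041';     // hexadecimal escape

        Console.WriteLine(a1 == a2 && a2 == a3 && a3 == a4);   // True

        // Conversions between char and byte must be requested explicitly.
        byte b = (byte)a1;
        char c = (char)b;
        Console.WriteLine(b);   // 65
        Console.WriteLine(c);   // A

        // Escape sequences inside a string literal: a tab and a double quote.
        Console.WriteLine("Tab:\there, quote:\"");
    }
}
```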

C++ developers should note that because C# has a native string type, we don't need to represent strings as arrays of chars.

Predefined Reference Types

C# supports two predefined reference types:

Name	CTS Type	Description
object	System.Object	The root type, from which all other types in the CTS derive (including value types)
string	System.String	Unicode character string

The object Type

Many programming languages and class hierarchies provide a root type, from which all other objects in the hierarchy derive. C# and .NET are no exception. In C#, the object type is the ultimate parent type from which all other intrinsic and user-defined types derive. This is a key feature of C#, which distinguishes it from both VB and C++, although its behavior here is very similar to Java. All types implicitly derive ultimately from the System.Object class. This means that we can use the object type for two purposes.

We can use an object reference to bind to an object of any particular sub-type. For example, in the next chapter we'll see how we can use the object type to box a value object on the stack to move it to the heap. Object references are also useful in reflection, when code must manipulate objects whose specific types are unknown. This is similar to the role played by a void pointer in C++ or by a Variant data type in VB.
The object type implements a number of basic, general-purpose methods, which include Equals(), GetHashCode(), GetType(), and ToString(). Responsible user-defined classes may need to provide replacement implementations of some of these methods using an object-oriented technique known as overriding, which we will discuss in Chapter 3. When we override ToString(), for example, we equip our class with a method for intelligently providing a string representation of itself. If we don't provide our own implementations for these methods in our classes, the compiler will pick up the implementations in object, which may or may not be correct or sensible in the context of our classes.
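Both purposes can be seen in a few lines. The sketch below uses a hypothetical Money class, invented for this example, which overrides ToString(), and then holds instances of several different types in object references.

```csharp
using System;

// Hypothetical class for illustration only.
class Money
{
    public int Cents;

    // Replaces the default object.ToString(), which would just return the type name.
    public override string ToString()
    {
        return "Money(" + Cents + ")";
    }
}

class ObjectDemo
{
    public static void Main()
    {
        object o1 = new Money();       // an object reference can refer to any type
        object o2 = "some text";
        object o3 = 42;                // a value type, boxed into an object

        Console.WriteLine(o1.ToString());            // Money(0)
        Console.WriteLine(o2.GetType());             // System.String
        Console.WriteLine(o3.GetType());             // System.Int32
        Console.WriteLine(new object().ToString());  // System.Object
    }
}
```

Note that the call o1.ToString() reaches the override in Money even though the variable is declared as object.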

We'll examine the object type in more detail in subsequent chapters.

The string Type

Veterans of C and C++ probably have battle scars from wrestling with C-style strings. A C or C++ string was nothing more than an array of characters, so the client programmer had to do a lot of work just to copy one string to another or to concatenate two strings. In fact, for a generation of C++ programmers, implementing a string class that wrapped up the messy details of these operations was a rite of passage requiring many hours of teeth gnashing and head scratching, while chasing memory leaks and faulty overloaded operators. VB programmers had a somewhat easier life, with a string type, while Java people had it even better, with a String class that is in many ways very similar to C# string.

C# provides its own string type. With it, operations like string concatenation and string copying are a snap:

   string str1 = "Hello ";
   string str2 = "World";
   string str3 = str1 + str2; // string concatenation

Despite this style of assignment, the CTS System.String class is a reference type. Behind the scenes, a string object is allocated on the heap, not the stack, and when we assign one string variable to another string, we get two references to the same string in memory. However, should we then make changes to one of these strings, note that this will create an entirely new string object, leaving the other string unchanged. Consider the code:

   using System;

   class StringExample
   {
      public static int Main()
      {
         string s1 = "a string";
         string s2 = s1;
         Console.WriteLine("s1 is " + s1);
         Console.WriteLine("s2 is " + s2);
         s1 = "another string";
         Console.WriteLine("s1 is now " + s1);
         Console.WriteLine("s2 is now " + s2);
         return 0;
      }
   }

The output from this is:

   s1 is a string
   s2 is a string
   s1 is now another string
   s2 is now a string

In other words, changing the value of s1 had no effect on s2, contrary to what we'd expect with a reference type! What's happening here is that when s1 is initialized with the value "a string", a new string object is allocated on the heap. When s2 is initialized, the reference points to this same object, so s2 also has the value "a string". However, when we now change the value of s1, instead of replacing the original value, a new object will be allocated on the heap for the new value. Our s2 variable will still point to the original object, so its value is unchanged.
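We can observe the same behavior directly by comparing references rather than values. This is a sketch using object.ReferenceEquals(); the class and variable names are arbitrary.

```csharp
using System;

class StringReferenceDemo
{
    public static void Main()
    {
        string s1 = "a string";
        string s2 = s1;
        Console.WriteLine(object.ReferenceEquals(s1, s2));   // True: one object, two references

        s1 = "another string";     // s1 now refers to a different object
        Console.WriteLine(object.ReferenceEquals(s1, s2));   // False
        Console.WriteLine(s2);                               // a string

        // Methods such as ToUpper() return a new string rather than changing the original.
        string upper = s2.ToUpper();
        Console.WriteLine(s2);      // a string
        Console.WriteLine(upper);   // A STRING
    }
}
```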

String literals are enclosed in double quotes ("..."); if we attempt to enclose a string in single quotes, the compiler will take the value as a char, and report an error. C# strings can contain the same Unicode and hexadecimal escape sequences as chars. Since these escape sequences start with a backslash, we can't use this character unescaped in a string. Instead, we need to escape it with two backslashes, \\:

   string filepath = "C:\\ProCSharp\\First.cs";

Even if you are confident you can remember to do this all the time, it can prove annoying typing out all those double backslashes. Fortunately, C# gives us an alternative. We can prefix a string literal with the at character, @, and all the characters in it will be treated at face value; they won't be interpreted as escape sequences:

   string filepath = @"C:\ProCSharp\First.cs";

This even allows us to include line breaks in our string literals:

   string Jabberwocky = @"'Twas brillig and the slithy toves
   Did gyre and gimble in the wabe.";

Then the value of Jabberwocky would be this:

   'Twas brillig and the slithy toves
   Did gyre and gimble in the wabe.
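Returning to the file path example, a quick check confirms that the escaped and the @-prefixed forms produce exactly the same string. This is a sketch with arbitrary names. It also shows one rule not covered above: inside an @-prefixed literal, a double quote is written as two double quotes.

```csharp
using System;

class VerbatimStringDemo
{
    public static void Main()
    {
        string escaped  = "C:\\ProCSharp\\First.cs";
        string verbatim = @"C:\ProCSharp\First.cs";

        Console.WriteLine(escaped == verbatim);   // True
        Console.WriteLine(verbatim.Length);       // 21

        // Inside a verbatim string, a double quote is written as two double quotes.
        string quoted = @"He said ""hello""";
        Console.WriteLine(quoted);                // He said "hello"
    }
}
```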