Primitives | Professional .NET Framework 2.0 (Programmer to Programmer)

Primitives (objects, strings, and numbers, both simple scalars and floating point numbers) are the closest thing to raw values that you will find on the platform. Most have native keywords in managed languages and IL that make representing and working with them much simpler. Many are value types rather than complex data structures. They are also treated specially by the runtime: common operations on them are often handled by IL instructions instead of requiring calls to library-provided methods. All other types are merely a set of abstractions that build on top of these fundamental data structures, grouping and naming them in meaningful ways and providing operations to easily work with them. You saw how to do that in Chapter 2.

Before exploring each of the primitives in-depth, take a look at the following table. It outlines mappings between the primitive types in the BCL and their corresponding language keywords in C#, VB, C++, and IL. This table should be useful as a quick reference throughout the book, mainly so that we can have general discussions without catering to everybody's language preference. (I can see it now: the Int32 type, a.k.a. int in C#, a.k.a. Integer in VB, a.k.a.…) In most cases, this book uses either the C# or IL keywords rather than the BCL type names because they are more familiar to most readers.

Each of the primitive types discussed lives in the mscorlib.dll assembly and System namespace. All but Object and String are value types:

Type	C#	VB	C++	IL
Boolean	bool	Boolean	bool	bool
Byte	byte	Byte	unsigned char	unsigned int8
Char	char	Char	wchar_t	char
DateTime	n/a	Date	n/a	n/a
Decimal	decimal	Decimal	n/a	n/a
Double	double	Double	double	float64
Int16	short	Short	short	int16
Int32	int	Integer	int	int32
Int64	long	Long	__int64	int64
IntPtr	n/a	n/a	n/a	native int
Object	object	Object	n/a	object
SByte	sbyte	n/a	signed char	int8
Single	float	Single	float	float32
String	string	String	n/a	string
UInt16	ushort	n/a	unsigned short	uint16
UInt32	uint	n/a	unsigned int	uint32
UInt64	ulong	n/a	unsigned __int64	uint64
UIntPtr	n/a	n/a	n/a	native unsigned int

Next we'll discuss how you can use each of these primitives.

Object

Object unifies the .NET Framework type system as the implicit root of the entire class hierarchy. Any reference type you build will derive from the Object class unless you indicate otherwise. And even if you say otherwise, at some point in the hierarchy you will end up back at Object. Value types derive from ValueType implicitly, which itself is a direct subclass of Object. This type hierarchy was illustrated in Chapter 2, Figure 2-1. You might want to refer back to that for a quick refresher.

Because of this unification, an Object reference at runtime can point to any given instance of an arbitrary type. This makes it straightforward to generically deal with any instance at runtime, regardless of its type. Boxing and unboxing of course coerce the representation of values such that they can be referred to by an ordinary object reference. And because Object is at the root of every managed type's ancestry, there are a few public and protected methods that end up inherited and accessible on each object at runtime. Most of these are virtual and therefore can be overridden by custom types:
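As a small illustration of this unification, the following sketch passes values of unrelated types through a single object parameter and shows a box and unbox round trip. The ObjectUnificationDemo class and its Describe helper are hypothetical names introduced only for this example.

```csharp
using System;

static class ObjectUnificationDemo
{
    // Hypothetical helper: accepts any instance through an object reference.
    public static string Describe(object o)
    {
        return o == null ? "null" : o.GetType().Name + ": " + o.ToString();
    }

    static void Main()
    {
        object boxed = 42;          // boxing: the int value is copied into a heap object
        int unboxed = (int)boxed;   // unboxing: the value is copied back out

        Console.WriteLine(Describe(boxed));     // Int32: 42
        Console.WriteLine(Describe("hello"));   // String: hello
        Console.WriteLine(unboxed + 1);         // 43
    }
}
```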

    public class Object
    {
        // Default constructor:
        public Object();

        // Instance methods:
        public virtual bool Equals(object obj);
        protected virtual void Finalize();
        public virtual int GetHashCode();
        public extern Type GetType();
        protected extern object MemberwiseClone();
        public virtual string ToString();
    }

Let's take a look at them.

Equality Methods

The instance method Equals returns true or false to indicate whether obj is equal to the target object. The default implementation provided by Object evaluates reference equality. That is, two object references are considered equal only if they point to the same object on the heap. This is a virtual method, however, meaning that subclasses are free to override it to do whatever they'd like. Thus, the meaning of equality is subject to vary in practice. Regardless of the implementation, the following properties must hold:

Equals is reflexive. That is, a.Equals(a) must be true.
Equals is symmetric. That is, if a.Equals(b) then b.Equals(a) must also be true.
Equals is transitive. That is, if a.Equals(b) and b.Equals(c), then a.Equals(c) must also be true.
Equals is consistent. That is, if a.Equals(b) is true and no state changes between invocations, additional calls to a.Equals(b) should also return true.
Equals handles null appropriately. That is, a.Equals(null) must return false (and not fail with a NullReferenceException, for instance).

Many Framework classes are written with the assumption that these are in fact true for all Equals methods. Some types override Equals to perform value equality checks rather than the default of reference equality. Framework Design Guidelines (see "Further Reading") suggest using this technique sparingly, for example on value types only. Value equality simply means that the contents of two objects are memberwise-compared. For example:

    class Person
    {
        public string Name;
        public int Age;

        public Person(string name, int age)
        {
            Name = name;
            Age = age;
        }

        public override bool Equals(object obj)
        {
            // Simple checks...
            if (obj == this)
                return true;
            if (obj == null || !obj.GetType().Equals(typeof(Person)))
                return false;

            // Now check for member-wise equality:
            Person p = (Person)obj;
            return this.Name == p.Name && this.Age == p.Age;
        }
    }

In the override of Equals(object) above, we first check to see if the reference passed in points to the object currently being invoked; this is a very cheap thing to verify (which is why we do it first) and will catch simple cases where the method is invoked passing itself as the argument, for example a.Equals(a). Next, we check if the parameter is either null or of a different type; either condition indicates that the objects are not equal. This will catch some subtle inconsistencies and violations of the rules outlined above. Lastly, we compare the contents of the instances and return true only if each member is equal.

Two instances created with the same state would now be considered equal (e.g., new Person("Bob", 55).Equals(new Person("Bob", 55)) == true), even though they in fact represent two distinct objects on the heap. This is not the case with Object's default implementation.

When performing a value equality check, you will have to make the choice about whether to do a deep or shallow equality check. That is, do you consider two objects to be equal if all fields are reference equal? Or only if they are value equal? It usually makes sense to do a deep check (by calling Equals instead of == on each instance field), but whatever you do be sure to remain consistent and careful to document it so callers know what to expect.
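The difference between the two choices can be sketched as follows. Address, Customer, ShallowEquals, and DeepEquals are hypothetical names made up for this illustration; a real type would normally pick one policy and implement it in its Equals override.

```csharp
using System;

class Address
{
    public string City;
    public Address(string city) { City = city; }

    // Value equality on the single field.
    public override bool Equals(object obj)
    {
        Address other = obj as Address;
        return other != null && City == other.City;
    }

    public override int GetHashCode()
    {
        return City == null ? 0 : City.GetHashCode();
    }
}

class Customer
{
    public Address Home;
    public Customer(Address home) { Home = home; }

    // Shallow check: the fields must refer to the very same object.
    public bool ShallowEquals(Customer other)
    {
        return other != null && Object.ReferenceEquals(Home, other.Home);
    }

    // Deep check: the fields are compared with Equals (null-safe static helper).
    public bool DeepEquals(Customer other)
    {
        return other != null && Object.Equals(Home, other.Home);
    }
}

static class DeepVsShallowDemo
{
    static void Main()
    {
        Customer a = new Customer(new Address("Seattle"));
        Customer b = new Customer(new Address("Seattle"));
        Console.WriteLine(a.ShallowEquals(b)); // False: two distinct Address objects
        Console.WriteLine(a.DeepEquals(b));    // True: the addresses are value equal
    }
}
```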

Value Type Equality

The type from which all value types implicitly derive, System.ValueType, supplies a custom implementation of the Equals method. This implementation checks for value equality. Specifically, it returns false if the two instances being compared are not of the same type or if any instance fields are not equal. It checks field equality using a deep check, that is, by calls to Equals(object) on each field. Unfortunately, this implementation is horribly inefficient. Sometimes this isn't a concern, but if you intend to invoke Equals on a large quantity of value types in a tight loop, for example, it probably should be. This is for two reasons. First, Equals(object) takes an object as its parameter. This means that you must first box the value being passed to Equals. Second, the implementation of ValueType.Equals uses reflection — a metadata-based approach — to retrieve field values instead of directly referencing them in IL. This slows execution down considerably.

Implementing your own version is boilerplate, but avoids these problems. Thus, it's advisable to override Equals on any value type you create. Consider this value type Car:

    struct Car
    {
        public string Make;
        public string Model;
        public uint Year;

        public Car(string make, string model, uint year)
        {
            // ...
        }
    }

All you must do from here is to create a new Equals overload that takes a Car, and to override the default implementation inherited from ValueType:

    public bool Equals(Car c)
    {
        return c.Make == this.Make &&
            c.Model == this.Model &&
            c.Year == this.Year;
    }

    public override bool Equals(object obj)
    {
        if (obj is Car)
            return Equals((Car)obj);
        return false;
    }

This performs much more acceptably than ValueType's default implementation and avoids boxing entirely when the compiler knows statically that two things are Cars:

    Car c1 = new Car("BMW", "330Ci", 2001);
    Car c2 = new Car("Audi", "S4", 2005);
    bool isEqual = c1.Equals(c2); // No boxing required for this call...

In some simple performance tests, this runs over four times faster than the default.

Static Equality Helpers

A static bool Equals(object objA, object objB) method is available that also returns true if objA and objB are considered equal. This method first checks for reference equality; if the result is true, Equals returns true. This catches the case when both objA and objB are null. If false, it checks to see if only one object is null, in which case it returns false. Otherwise, it will return the result of invoking objA.Equals(objB). In other words, the method returns true if (objA == objB) || (objA != null && objB != null && objA.Equals(objB)). Because of the built-in null checking, it's often more readable than writing explicit null checks at the call site, for example, with the instance Equals method:

    Object a = /*...*/;
    Object b = /*...*/;
    bool isEqual = false;

    // With instance Equals, null check is required:
    if (a != null)
        isEqual = a.Equals(b);
    else
        isEqual = (a == b);

    // With static Equals, it's not:
    isEqual = Object.Equals(a, b);

If you need to check for reference equality only, you can use the static bool ReferenceEquals(object objA, object objB) method. There is a subtle difference between this method and simply comparing two objects using the equals operator (i.e., == in C#). Any type is free to overload the == operator if it chooses, in which case the C# compiler will bind to it when somebody writes ==. You can force a check for reference equality by casting each instance to object and then comparing the two object references. For example, imagine that the author of MyType overloaded the op_Equality (==) operator to give it value equality semantics:

    MyType a = new MyType("Joe");
    MyType b = new MyType("Joe");
    bool isValueEqual = (a == b);

In this case, isValueEqual would be true. But in either of the following cases, the comparison yields false, because a and b refer to two distinct objects:

    bool isRefEqual = Object.ReferenceEquals(a, b);
    bool isObjEqual = ((object)a == (object)b);
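To make the difference concrete, here is a minimal sketch of a MyType whose == operator compares by value. The private name field and the operator bodies are assumptions made for illustration; the text does not show how MyType is actually written.

```csharp
using System;

class MyType
{
    private readonly string name;   // assumed state, for illustration only

    public MyType(string name) { this.name = name; }

    // Value-equality overload of ==; ReferenceEquals avoids calling ourselves recursively.
    public static bool operator ==(MyType x, MyType y)
    {
        if (Object.ReferenceEquals(x, y)) return true;
        if (Object.ReferenceEquals(x, null) || Object.ReferenceEquals(y, null)) return false;
        return x.name == y.name;
    }

    public static bool operator !=(MyType x, MyType y) { return !(x == y); }

    // Keep Equals and GetHashCode consistent with the operator.
    public override bool Equals(object obj) { return this == (obj as MyType); }
    public override int GetHashCode() { return name == null ? 0 : name.GetHashCode(); }
}

static class OperatorEqualityDemo
{
    static void Main()
    {
        MyType a = new MyType("Joe");
        MyType b = new MyType("Joe");

        Console.WriteLine(a == b);                        // True: binds to the overloaded operator
        Console.WriteLine(Object.ReferenceEquals(a, b));  // False: two distinct objects
        Console.WriteLine((object)a == (object)b);        // False: reference comparison on object
    }
}
```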

Hash-Codes

GetHashCode and Equals go hand in hand. In fact, the C# compiler will generate a warning when you override Equals without also overriding GetHashCode. Any two objects a and b for which a.Equals(b) returns true must also return the same hash-code integer for GetHashCode. If you're providing your own implementation of Equals, there is no magic that makes this property hold true; you must manually override GetHashCode, too. Hash codes are used for efficiently storing instances in data structures such as hash-table dictionaries, for example (see Chapter 6 for more information on such data structures). To ensure compatibility with the algorithms these types use, you must supply your own implementation that follows the guidance outlined below. Dictionary<TKey,TValue>, for example, won't work correctly with your types otherwise.

Hash codes do not have to be unique for a single object. Two objects, a and b, for which a.Equals(b) returns false can return the same value for GetHashCode. To improve the performance of data structures that rely on this information, however, you should strive to create an even distribution of hash codes over the set of all possible instances of your type. This is difficult to achieve in practice, but a little effort to distribute the range of hash-code values often goes a long way.
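One common way to satisfy these rules is to combine the hash codes of exactly the fields that Equals compares. The sketch below does this for a variant of the Person class with read-only fields; the multiply-by-31 mixing scheme and the HashCodeDemo class are assumptions chosen for illustration, not something the text prescribes.

```csharp
using System;
using System.Collections.Generic;

class Person
{
    public readonly string Name;
    public readonly int Age;

    public Person(string name, int age) { Name = name; Age = age; }

    public override bool Equals(object obj)
    {
        if (Object.ReferenceEquals(obj, this)) return true;
        if (obj == null || obj.GetType() != typeof(Person)) return false;
        Person p = (Person)obj;
        return Name == p.Name && Age == p.Age;
    }

    // Uses the same fields as Equals, so equal objects always hash alike.
    public override int GetHashCode()
    {
        unchecked   // let the arithmetic wrap instead of throwing on overflow
        {
            int hash = 17;
            hash = hash * 31 + (Name == null ? 0 : Name.GetHashCode());
            hash = hash * 31 + Age;
            return hash;
        }
    }
}

static class HashCodeDemo
{
    // Stores a value under one Person and retrieves it with an equal but distinct instance.
    public static string Lookup()
    {
        Dictionary<Person, string> roles = new Dictionary<Person, string>();
        roles[new Person("Bob", 55)] = "admin";

        string role;
        return roles.TryGetValue(new Person("Bob", 55), out role) ? role : null;
    }

    static void Main()
    {
        Console.WriteLine(Lookup()); // admin
    }
}
```

Without the GetHashCode override, the second Person would almost certainly land in a different bucket and the lookup would fail even though Equals returns true.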

Finalizers

When an object is garbage collected, a set of cleanup actions sometimes has to be taken to ensure that unmanaged resources (such as HANDLEs, void*s, etc.) are relinquished back to the system. (Note: value types are not allocated on the managed heap and cannot be finalized. Thus, the following discussion only applies to reference types.) We discussed this process in more detail in Chapter 3, including some more general details about the CLR's GC.

The virtual method Finalize exists to give you a last chance to perform resource cleanup before an object is garbage collected and gone forever. This is called an object's finalizer, and any object whose type overrides this method is referred to as a finalizable object. The GC will invoke an object's finalizer before reclaiming its heap-allocated memory. Except in the event of a critical system failure or rude AppDomain shutdown, for example, a finalizer will always get a chance to execute. We discuss critical finalization in Chapter 11, which can be used for reliability-stringent code that must guarantee that this occurs.

Finalization is nondeterministic, meaning that no guarantees are made about when it runs. Finalize is also commonly referred to as a destructor, a direct result of the syntax C# chose, which mirrors that of C++ destructors (i.e., ~<ClassName>()). This is at best a (horrible) misnomer, because C++ destructors are entirely deterministic. The IDisposable pattern provides the equivalent to a C++ destructor. In fact, C++/CLI emits classic C++ destructors as Dispose methods beginning in 2.0.

The Disposable Pattern

The IDisposable interface contains a single Dispose method. You should always supply a Dispose method on any type that holds on to unmanaged resources to provide callers with a standardized, deterministic way to initiate cleanup. Any object that stores an IDisposable field of any sort should also implement Dispose, the implementation for which just calls Dispose on all of its owned fields. Standardizing on this gives people an easy way to discover when cleanup is necessary ("oh, look — it implements IDisposable … I should probably call Dispose when I'm done using my instance") and enables constructs such as C#'s using statement to build on top of the pattern.

The full pattern for IDisposable falls into two categories. First, there is the simple pattern. This is used for classes that hold references to other IDisposable objects. Just write a Dispose method:

    class MyClass : IDisposable
    {
        private Stream myStream = /*...*/;
        // ...
        public void Dispose()
        {
            Stream s = myStream;
            if (s != null)
                ((IDisposable)s).Dispose();
        }
    }

The more complex pattern occurs if you hold true unmanaged resources. In this case, your type needs both a finalizer and a Dispose method. The pattern is to use a protected void Dispose(bool) method to contain the common logic between the two; the finalizer just calls Dispose(false), and the Dispose() method calls Dispose(true) and suppresses finalization:

    class MyClass : IDisposable
    {
        private IntPtr myHandle = /*...*/;
        // ...
        ~MyClass()
        {
            Dispose(false);
        }

        public void Dispose()
        {
            Dispose(true);
            GC.SuppressFinalize(this);
        }

        protected void Dispose(bool disposing)
        {
            IntPtr h = myHandle;
            if (h != IntPtr.Zero)
            {
                CloseHandle(h);          // assumed to be a P/Invoke declaration (not shown)
                myHandle = IntPtr.Zero;  // clear the field, not the local, so a second call is a no-op
            }
        }
    }

Notice that our Dispose() method calls GC.SuppressFinalize. This unregisters our object for finalization; it is no longer necessary because the call to Dispose released its resources. The Dispose(bool) method can use the disposing parameter for any logic that needs to know whether it was called explicitly (disposing is true) or is running on the finalizer thread (disposing is false). Generally, you need to be cautious about interacting with the outside world inside of a finalizer because other state might be in the finalization queue with you (and indeed already finalized).
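A sketch of how the disposing flag is typically used follows. ResourceHolder, its IsDisposed property, and the stand-in handle value are hypothetical; a real type would release an actual operating system resource where the comment indicates.

```csharp
using System;
using System.IO;

class ResourceHolder : IDisposable
{
    private Stream managed = new MemoryStream();   // managed, disposable state
    private IntPtr handle = new IntPtr(1);         // stand-in for a real OS handle

    public bool IsDisposed
    {
        get { return handle == IntPtr.Zero; }
    }

    ~ResourceHolder()
    {
        Dispose(false);   // finalizer path: other managed objects may already be finalized
    }

    public void Dispose()
    {
        Dispose(true);    // explicit path: safe to touch managed state
        GC.SuppressFinalize(this);
    }

    protected virtual void Dispose(bool disposing)
    {
        if (disposing)
        {
            // Only reach for other managed objects when called explicitly.
            if (managed != null)
            {
                managed.Dispose();
                managed = null;
            }
        }

        // Runs on both paths. A real type would call its native release
        // function here before clearing the field.
        handle = IntPtr.Zero;
    }
}

static class DisposingFlagDemo
{
    static void Main()
    {
        ResourceHolder r = new ResourceHolder();
        r.Dispose();
        r.Dispose();                       // a second call is harmless
        Console.WriteLine(r.IsDisposed);   // True
    }
}
```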

C# offers syntactic sugar — the using statement — which is a straightforward way to wrap an IDisposable object to ensure eager cleanup:

    using (MyClass mc = /*...*/)
    {
        // Use 'mc'.
    }

This snippet compiles into the equivalent of the following C#:

    MyClass mc = /*...*/;
    try
    {
        // Use 'mc'.
    }
    finally
    {
        if (mc != null)
            mc.Dispose();
    }

Note that you can actually have many disposable objects in the same using block, for example:

    using (MyClass mc = /*...*/)
    using (MyOtherDisposableClass modc = /*...*/)
    {
        // Use 'mc' and 'modc'.
    }

You must be somewhat cautious when calling Dispose on objects, however. If you don't own the lifetime of a disposable object — for example, you found its reference embedded in another related object (e.g., passed as an argument) — and end up calling its Dispose method, other code might try to use it after it's been disposed. This could cause unexpected program behavior and (if all goes well) will cause an ObjectDisposedException to be thrown.
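The hazard can be reproduced with a MemoryStream. In this sketch the hypothetical ReadFirstByte helper wrongly disposes a stream that its caller still owns, and the caller's next read fails.

```csharp
using System;
using System.IO;

static class DisposeOwnershipDemo
{
    // Hypothetical helper that disposes a stream it was merely handed.
    static void ReadFirstByte(Stream s)
    {
        using (s)
        {
            s.ReadByte();
        }
    }

    // Returns true if using the stream after the helper ran throws ObjectDisposedException.
    public static bool ThrowsAfterDispose()
    {
        MemoryStream ms = new MemoryStream(new byte[] { 1, 2, 3 });
        ReadFirstByte(ms);   // the owner does not expect this to close the stream

        try
        {
            ms.ReadByte();
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine(ThrowsAfterDispose()); // True
    }
}
```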

Type Identity

The GetType method returns a Type representing the runtime type of the target object. Object provides this implementation — it is nonvirtual. Chapter 14 discusses some of the more interesting things you can do with a Type object. For example, you can do things like inspect various properties:

    string s = "A string instance...";
    Type t = s.GetType();
    Console.WriteLine(t.Name);               // "String"
    Console.WriteLine(t.Namespace);          // "System"
    Console.WriteLine(t.IsPublic);           // "True"
    Console.WriteLine(t == typeof(string));  // "True"

ToString Method

The purpose of the ToString instance method is to convert an arbitrary object to a logical string representation. Overrides return a String containing some relevant representation of the instance, typically information about its identity and current state. This method unifies operations like concatenating strings with objects, printing objects to output, and so forth; any code that needs a string can just call ToString.

The default implementation of ToString just returns the full string name of an object's runtime type. We can of course provide an explicit override for ToString, however, for example:

    class Person
    {
        public string Name;
        public int Age;

        public override string ToString()
        {
            return String.Format("{0}[Name={1}, Age={2}]",
                base.ToString(), Name, Age);
        }
    }

An instance of Person with values "Jamie" for Name and 16 for Age, for example, would return the following string in response to a call to ToString: "Person[Name=Jamie, Age=16]".

Numbers

Numeric primitives are of great importance for even the simplest programming tasks. Whether you are incrementing a counter, manipulating and totaling prices of commercial goods, or even writing a loop to iterate through an array, you're likely to run into one of these guys. There are two general categories of numbers available in the platform: integers (or whole numbers) and floating point numbers (fractions or decimals). The former category includes numbers such as 10, -53, and 0x8, while the latter covers numbers like 31.101099, -8.0, and 3.2e-11, for example.

Figure 5-1 depicts the hierarchy of CTS numerical types.

Figure 5-1: Numeric type hierarchy.

Each of these types is detailed in the upcoming sections.

Integers

Integers (a.k.a. scalars, integrals) are numbers that do not contain decimal or fractional parts; each type is a value type. These numbers include positive, negative, and zero. Two primary sets of integer types are available: signed and unsigned. Signed integers are able to represent both positive and negative numbers, while unsigned integers can store only zero and positive numbers. A 32-bit unsigned integer uses the same number of bits as a 32-bit signed integer but can use the spare bit (ordinarily used for "negative" indication) to accommodate roughly twice the positive range.

The following table depicts the ranges of all integer types. Each of these types offers MinValue and MaxValue constants, in case you need to access this information via the type system:

Type	Signed	Size	Value Range
Byte	No	8 bits	0 to 255
Char	No	16 bits	0 to 65,535
Int16	Yes	16 bits	-32,768 to 32,767
Int32	Yes	32 bits	-2,147,483,648 to 2,147,483,647
Int64	Yes	64 bits	-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
SByte	Yes	8 bits	-128 to 127
UInt16	No	16 bits	0 to 65,535
UInt32	No	32 bits	0 to 4,294,967,295
UInt64	No	64 bits	0 to 18,446,744,073,709,551,615
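The limits in the table can be read back through the MinValue and MaxValue members. The IntegerRangeDemo class below is a hypothetical name used only for this short sketch.

```csharp
using System;

static class IntegerRangeDemo
{
    // The unsigned 32-bit maximum is twice the signed maximum, plus one.
    public static bool UnsignedDoublesPositiveRange()
    {
        return UInt32.MaxValue == 2L * Int32.MaxValue + 1;
    }

    static void Main()
    {
        Console.WriteLine(Byte.MinValue + " to " + Byte.MaxValue);     // 0 to 255
        Console.WriteLine(SByte.MinValue + " to " + SByte.MaxValue);   // -128 to 127
        Console.WriteLine(Int16.MinValue + " to " + Int16.MaxValue);   // -32768 to 32767
        Console.WriteLine(UInt32.MaxValue);                            // 4294967295
        Console.WriteLine((int)Char.MaxValue);                         // 65535
        Console.WriteLine(UnsignedDoublesPositiveRange());             // True
    }
}
```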

16-Bit, 32-Bit, and 64-Bit Integers

The three sizes of integers, 16 (WORD), 32 (DWORD), and 64 bit (quad-WORD), enable you to choose the amount of memory storage that suits your circumstance best. In performance critical situations, when you are building reusable types or low-level infrastructure, it is usually profitable to think about the legal ranges of each field and variable you choose, and to choose the size based on that data. Choosing between signed and unsigned is often simpler; for example, if you're using a counter that begins at 0 and only increments, supporting negative values is wasteful. An unsigned integer will increase your upper limit without the expense of more storage.

Note

Note that APIs with unsigned integers are not CLS compliant. That is, a CLS language needn't support creating and consuming unsigned integers. Thus, if you use them in your public APIs, you can limit the extent to which you can interoperate with other languages. C# and C++/CLI, for example, have no problem consuming them.

Most languages offer literal representations for these types. In C#, the literal representation of integers can take decimal or hexadecimal form. For example, the integer fifty-nine can be represented as both 59 and 0x3B. You can also use a suffix to indicate precisely what type should be used. L indicates long (Int64), U indicates unsigned int (UInt32), and UL indicates an unsigned long (UInt64):

    int v1 = -3329;
    uint v2 = 21U;
    long v3 = 53228910L;
    long v4 = 0x23A83L;
    ulong v5 = 3992018273647UL;

8-Bit Bytes

An 8-bit Byte is the smallest unit of allocation that the CLR supports. Bytes are often used when reading raw binary files, for example. Unlike the other integers, Byte's default representation is unsigned. This is because signed bytes are fairly uncommon. Most bytes are used to encode 256 unique values, from 0 to 255, rather than negative numbers. For those rare occasions where a signed byte is needed, the SByte type is available.

16-Bit Characters

Char is also an unsigned 16-bit number, similar to UInt16, but is treated as a wchar_t (Unicode character) by the runtime. It is the building block for string support in the Framework. Chapter 8 discusses encodings, including Unicode, in greater detail. Languages ordinarily support literal representations of characters. In C#, this is done by enclosing an individual character in single quotes. For example, 'a', '9', and '!' are all valid Char literals. There is also a set of escape sequences that you may use to represent single characters that are difficult to represent literally:

Escape	Character #	Description
\'	\u0027 (39)	Single quotation mark - i.e., '
\"	\u0022 (34)	Double quotation mark - i.e., "
\\	\u005C (92)	Backslash
\0	\u0000 (0)	Null
\a	\u0007 (7)	Bell
\b	\u0008 (8)	Backspace
\t	\u0009 (9)	Tab
\v	\u000B (11)	Vertical tab
\f	\u000C (12)	Form feed
\n	\u000A (10)	New line
\r	\u000D (13)	Carriage return

Other characters might be difficult to create literals for, particularly those higher up in the Unicode code-point list. In C# you can specify characters using their code-point, that is, using the escape sequence \u[nnnn] or \xn[n][n][n]. \u is followed by a four-digit hexadecimal number, while \x supports a variable length set of digits, between 1 and 4. This number is the character's Unicode code-point. For example, '\u0061' is the character 'a', which can also be represented as '\x61'. Because you can explicitly cast from the integral types to a Char, you can achieve similar results with a cast, for example (char)97.
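The equivalent spellings described above can be checked directly. CharLiteralDemo is a hypothetical class name used for this sketch.

```csharp
using System;

static class CharLiteralDemo
{
    // Four ways of writing the same character, 'a' (code point 0x61, decimal 97).
    public static bool AllSameLetter()
    {
        char a1 = 'a';
        char a2 = '\u0061';
        char a3 = '\x61';
        char a4 = (char)97;
        return a1 == a2 && a2 == a3 && a3 == a4;
    }

    static void Main()
    {
        Console.WriteLine(AllSameLetter());   // True
        Console.WriteLine((int)'\t');         // 9
        Console.WriteLine((int)'\\');         // 92
        Console.WriteLine((int)'\'');         // 39
    }
}
```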

The Char type itself has some interesting static helper methods that enable you to check such things as whether a given character IsDigit, IsLetter, IsLower, IsPunctuation, and so forth. These all return a Boolean value to indicate whether the character fits into a certain class. These predicate methods are Unicode-sensitive, so for example IsDigit will return correct answers when working with non-English digit characters in the Unicode character set.
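A few of these predicates in action, including a digit from outside the ASCII range. The sample characters are chosen for illustration: U+0663 is the Arabic-Indic digit three and U+00E9 is a lowercase e with an acute accent.

```csharp
using System;

static class CharPredicateDemo
{
    static void Main()
    {
        Console.WriteLine(Char.IsDigit('7'));         // True
        Console.WriteLine(Char.IsDigit('\u0663'));    // True: a non-ASCII decimal digit
        Console.WriteLine(Char.IsLetter('\u00E9'));   // True
        Console.WriteLine(Char.IsLower('X'));         // False
        Console.WriteLine(Char.IsPunctuation('!'));   // True
    }
}
```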

Floating Points and Decimals

Floating point types are capable of representing decimal or fractional numbers, referred to as real numbers in mathematics. Both Single and Double are value types that store their data using a standard IEEE 754 binary floating point representation, while Decimal uses a proprietary base-10 representation. Decimal's higher precision means that it is useful for calculations that require certain rounding and preciseness guarantees, such as monetary and/or financial calculations. The table below shows the ranges:

Type	Size	Value Range	Precision
Single	32 bits	1.5 × 10^-45 to 3.4 × 10^38	7 digits
Double	64 bits	5.0 × 10^-324 to 1.7 × 10^308	15–16 digits
Decimal	128 bits	-79,228,162,514,264,337,593,543,950,335 to 79,228,162,514,264,337,593,543,950,335	28–29 digits

Single and Double are 32 (DWORD) and 64 bits (quad-WORD), respectively, again leaving a choice to be made depending on your storage and range requirements. And much like the above types, languages offer their own literal syntax. In C#, this is expressed as a whole part (one or more digits) followed by a decimal point and the fractional part (again one or more digits). An exponent may also be specified immediately following the digits using the character E and an exponent number. Singles, or floats, must be followed by the specifier F; otherwise, a Double is assumed as the default. You can also optionally specify a D following a number to explicitly indicate that it is a Double, although this isn't necessary:

    Single v1 = 75.0F;
    Single v2 = 2.4e2F;
    Double v3 = 32.3;
    Double v4 = 152993812554.2329830129D;
    Double v5 = 2.5e20D;

Floating point numbers use positive infinity and negative infinity to represent overflow in the positive and negative directions, respectively (instead of "wrapping around" like scalars do). The static values PositiveInfinity and NegativeInfinity are available as members on both types. Operations that involve infinite numbers treat these values as extremely large numbers of the given sign. For example, PositiveInfinity * 2 is PositiveInfinity, and NegativeInfinity * NegativeInfinity is PositiveInfinity (any negative number squared is positive). Invalid mathematical operations, such as multiplying 0 by PositiveInfinity, will yield the not-a-number value, also defined on the floating point types as the static member NaN. Mathematical operations using at least one NaN will result in NaN.
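These rules are easy to observe with Double. InfinityDemo and its Overflow helper are hypothetical names for this sketch; values are held in variables so that the arithmetic happens at run time.

```csharp
using System;

static class InfinityDemo
{
    // Doubling the largest finite Double overflows to positive infinity.
    public static double Overflow()
    {
        double big = Double.MaxValue;
        return big * 2;
    }

    static void Main()
    {
        double posInf = Double.PositiveInfinity;
        double negInf = Double.NegativeInfinity;
        double zero = 0.0;
        double nan = zero * posInf;   // an invalid operation yields NaN

        Console.WriteLine(Double.IsPositiveInfinity(Overflow()));        // True
        Console.WriteLine(Double.IsPositiveInfinity(posInf * 2));        // True
        Console.WriteLine(Double.IsPositiveInfinity(negInf * negInf));   // True
        Console.WriteLine(Double.IsNaN(nan));                            // True
        Console.WriteLine(Double.IsNaN(nan + 1));                        // True: NaN propagates
        Console.WriteLine(nan == nan);                                   // False: NaN never compares equal
    }
}
```

Because NaN does not compare equal to anything, including itself, use Double.IsNaN rather than == to test for it.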

Binary floating point numbers are actually stored as a set of three components, not an absolute number. These components are a sign, a mantissa, and an exponent. When the value of a floating point number is required for display or mathematical purposes, for example, it is calculated based on these three independent values. This method of storage allows floating point numbers to represent a wider range of numbers, albeit with some loss in precision; in other words, floating point numbers are imprecise. This lack of precision can result in some surprising behavior, such as the well-known problem that storing 0.1 exactly is impossible using binary floating points. In addition to this, taking different steps that are mathematically equivalent in theory might in practice result in different numbers; for example, (a * b) + (a * c) versus a * (b + c) can differ in the last digits when the operands are not exactly representable. Most computer hardware supports native calculations of numbers stored in this format to improve performance, in fact often using higher precision than what is requested by these types.
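The 0.1 problem shows up as soon as two such values are added and compared. The sketch below contrasts Double with Decimal; FloatingPointDemo and its method names are hypothetical.

```csharp
using System;

static class FloatingPointDemo
{
    // 0.1 and 0.2 have no exact binary representation, so their sum is not exactly 0.3.
    public static bool BinarySumIsExact()
    {
        double a = 0.1;
        double b = 0.2;
        double expected = 0.3;
        return a + b == expected;
    }

    // The same arithmetic in base 10 is exact.
    public static bool DecimalSumIsExact()
    {
        decimal a = 0.1M;
        decimal b = 0.2M;
        return a + b == 0.3M;
    }

    static void Main()
    {
        Console.WriteLine(BinarySumIsExact());    // False
        Console.WriteLine(DecimalSumIsExact());   // True
    }
}
```

When comparing binary floating point results, testing whether the difference falls within a small tolerance is usually more robust than using ==.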

Decimals

The Decimal value type works around the impreciseness of floating point numbers, for situations where precise rounding and truncation guarantees must be made. A great example of this is monetary transactions, for example in banking programs. When dealing with money, any loss in precision is entirely unacceptable. Literal support for Decimal is provided in C# and accepts literals using the same syntax as Single- and Double-precision floats (shown above), albeit with a trailing M. For example, 398.22M is interpreted as a Decimal in C#.

Decimal is not a primitive in the strict sense of the word. It is indeed a low-level, reusable BCL type, but it doesn't have special IL support as the above types do. Adding one decimal with another actually ends up as a method call to the decimal's op_Addition method, for example. This sacrifices some level of performance, but it is an amount that is typically insignificant for managed applications.

These are some of the guarantees that Decimals provide for you:

Preservation of zeroes: For example, 9.56 + 2.44 will often be represented as 12.0 in traditional floating point data types. Decimal preserves the trailing zeroes, that is, 12.00. This is sometimes an important feature for financial operations and/or situations in which high-precision values are required.
Well-defined rounding guarantees: Decimal by default uses the round-half-even algorithm, also called banker's rounding. That is, a number is rounded to the nearest digit; if it is exactly halfway between two numbers, it is rounded to the nearest even digit. The Round(decimal d, int decimals) method uses this mechanism by default, and other overloads let you specify a different midpoint technique. Decimal also offers Ceiling, Floor, and Truncate methods to do similarly powerful rounding operations.

Trying to store numbers with too many significant digits will result in truncation of the excess digits.
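Both guarantees can be seen in a few lines. DecimalDemo is a hypothetical class name; the invariant culture is used only so that the printed decimal separator does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;

static class DecimalDemo
{
    // Formats the sum with the invariant culture; the trailing zeroes are preserved.
    public static string SumText()
    {
        decimal sum = 9.56M + 2.44M;
        return sum.ToString(CultureInfo.InvariantCulture);
    }

    static void Main()
    {
        Console.WriteLine(SumText());                                            // 12.00

        // Default midpoint handling is round-half-even (banker's rounding).
        Console.WriteLine(Decimal.Round(2.5M));                                  // 2
        Console.WriteLine(Decimal.Round(3.5M));                                  // 4
        Console.WriteLine(Decimal.Round(2.5M, MidpointRounding.AwayFromZero));   // 3

        Console.WriteLine(Decimal.Truncate(-7.9M));                              // -7
        Console.WriteLine(Decimal.Floor(-7.9M));                                 // -8
    }
}
```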

Boolean

A Boolean is a value type capable of representing two distinct values, true or false (i.e., 1 and 0 in the IL), and is the simplest possible type: it has a range of two. It is used in logical operations and is ordinarily used for control flow purposes. Boolean has literal support as the true and false keywords in C#. The Boolean type itself offers two static read-only fields: TrueString and FalseString. Each represents the ToString representation of true and false values, that is, "True" and "False", respectively.

Strings

The CLR has a first class notion of a string in its type system. This is conceptually just a char[], in other words an array of 16-bit Unicode characters. But the String class itself comes with many methods to help with common tasks. String is a reference type, unlike many of the other value type primitives we've discussed thus far. Strings are also immutable (read-only), so you don't have to worry about the contents of a string changing "underneath you" once you have a reference to one.

The easiest way to get an instance of a String is to use your favorite language's literal support. C# (and nearly every other language on the planet) does this with a sequence of characters surrounded by double quotation marks. You can also construct one using a char[], char*, or a variety of other things using one of the String type's constructors. You'll notice that there isn't any empty constructor; if you need to create an empty string, simply use the literal "" or the static member String.Empty:

 string s1 = "I am a string, and an amazing one at that.";
 string s2 = String.Empty;
 string s3 = "";
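For the constructor route mentioned above, a brief sketch follows. It uses only the char[]-based and repeated-character constructors; the class and variable names are illustrative:

```csharp
using System;

public static class StringConstructionDemo
{
    public static void Main()
    {
        char[] letters = { 'H', 'e', 'l', 'l', 'o' };

        string whole = new String(letters);       // "Hello"
        string part = new String(letters, 1, 3);  // "ell" (start index 1, length 3)
        string dashes = new String('-', 5);       // "-----"

        Console.WriteLine(whole);
        Console.WriteLine(part);
        Console.WriteLine(dashes);

        // The empty literal and String.Empty compare as equal.
        Console.WriteLine(String.Empty == "");    // True
        Console.WriteLine(String.Empty.Length);   // 0
    }
}
```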

There are quite a few methods on the String class, many of which you'll need to use in everyday programming tasks. Let's take a look at some of them.

Concatenating

Concatenating strings together is the action of combining multiple strings into one. Most languages support native operations that provide shortcuts to do this. For example, C# uses the + character, while VB uses &. The following example allocates two strings referenced by s and w; it then creates an entirely new string on the heap to contain the combination of the two and stores a reference to it in sw:

 string s = "Hello, ";
 string w = "World";
 string sw = s + w;
 Console.WriteLine(sw);

Executing this code prints "Hello, World" to the console. (Note that a smart compiler will often optimize this case: it can calculate the combined string statically at compile time, eliminating the need to do any dynamic allocation at runtime. This isn't always the case, for example with nonliteral strings.)

Because each string is immutable, a new string object must be allocated for each concatenation. The above syntax is really just a shortcut to the Concat method on String. Using this mechanism to concatenate large numbers of strings can degrade performance. Each operation will result in yet another String instance, some of which are only used momentarily. It isn't uncommon to find a large number of garbage strings as a result of this programming idiom. Consider using the StringBuilder type for this purpose, as detailed later in this chapter.
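To make the cost concrete, here is a rough sketch contrasting the two approaches. The method names JoinWithConcat and JoinWithBuilder are invented for this example; both produce the same text, but the first allocates a fresh string on every iteration while the second appends into one growable buffer:

```csharp
using System;
using System.Text;

public static class ConcatDemo
{
    // Each += allocates a brand-new String; the previous one becomes garbage.
    public static string JoinWithConcat(int count)
    {
        string result = "";
        for (int i = 0; i < count; i++)
        {
            result += i;
        }
        return result;
    }

    // StringBuilder appends into an internal buffer and builds one final String.
    public static string JoinWithBuilder(int count)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++)
        {
            sb.Append(i);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(JoinWithConcat(5));  // 01234
        Console.WriteLine(JoinWithBuilder(5)); // 01234
    }
}
```

For a handful of concatenations the difference is negligible; the builder tends to pay off inside loops or when the number of pieces is large or unknown.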

Thanks to the ToString method, you can typically concatenate any old object with a string. C# permits you to do this and inserts the call to ToString silently for you:

 int number = 10;
 string msg = "Jim's age is " + number + ".";
 msg += " Isn't this great?";
 Console.WriteLine(msg);

This code writes the words "Jim's age is 10. Isn't this great?" to the console.

In the C# language, every + found after the first string in an expression is interpreted as a string concatenation operation. Thus, if you want to actually perform any mathematical addition operation with two numbers in conjunction with string concatenation, you'll need to enclose it within parentheses:

 String s1 = 10 + 5 + ": Two plus three is " + 2 + 3;
 String s2 = 10 + 5 + ": Two plus three is " + (2 + 3);

The first of these two strings probably isn't what the author intended; it prints out "15: Two plus three is 23". Notice that the first + operation is interpreted as addition and not concatenation. Only after a string is seen in the expression will string concatenation be assumed. The second actually performs the math at the end of the expression, outputting "15: Two plus three is 5".

Formatting

Rather than appending strings to each other, it's often more convenient to use the static String.Format method. This is a C-style printf-like mechanism to replace special tokens in the format string argument with the objects in the object[] argument. A large variety of formatting syntaxes is available to control formatting of arguments, including the ability to create your own. We will only cover rudimentary format specifiers here; please refer to the sections on date and number formatting below for more details.

The format string can contain any number of special slots, which will be replaced with the actual values passed in the object[]. Each slot is indicated using curly braces containing a positional number, that is, {n}, where n is a 0-based numerical index mapping the token to an element in the array. For example:

 int age = 25;
 string name = "Jim";
 Console.WriteLine(String.Format("{0} is {1} years old.", name, age));

Executing this code prints "Jim is 25 years old." You can repeat any index more than once, which repeats the same input argument multiple times:

 string name = "Hello";
 Console.WriteLine(String.Format("{0} there. I said {0}! {0}???", name));

This code prints out the string "Hello there. I said Hello! Hello???" Note that the Console.WriteLine overloads (i.e., the Write and WriteLine methods on System.IO.TextWriter) take advantage of String.Format, enabling you to use these methods as you would with Format:

 int age = 50;
 string name = "Alan";
 Console.WriteLine("{0} is {1} years old.", name, age);

A formatting string can also include a so-called formatting specifier for customization of the resulting text. This is done by using {n:formatSpecifier} instead of just {n}. This enables you to take advantage of features of the IFormattable interface. Numbers and dates, for example, have custom formatting packages that enable you to specify the way they are to be converted into text. This snippet of code prints an integer using its hexadecimal format:

 Console.WriteLine("{0:X}", 29);

The result is "1D" printed to the console. More information on this is presented later in this chapter.

Accessing a String's Contents

As mentioned above, a string is really just a convenient wrapper on top of a char[]. Sometimes you need to deal with a string character by character. There are a few ways to go about this. First and foremost, String's Length property returns an int representing the number of characters the target string contains. String also provides a character-based indexer, meaning that you can treat any string as though it were just an array of characters. Both of these add up to an easy way of walking through a string's contents one character at a time:

 string s = /*...*/;
 for (int i = 0; i < s.Length; i++)
 {
     char c = s[i];
     // Do something w/ 'c'...
 }

Actually, because String implements the System.Collections.IEnumerable and System.Collections.Generic.IEnumerable<char> interfaces (more on these interfaces can be found in Chapter 6), you can simply use the C# foreach statement to walk through the contents of a string:

 string s = /*...*/;
 foreach (char c in s)
 {
     // Do something w/ 'c'...
 }

If you have an existing character array that you need to turn into a string, the String type offers constructors to work with char[]s and char*s. There are times during which you need to do the reverse, in other words to extract a raw char[] from a string. ToCharArray does that, offering a no-argument overload and one that accepts startIndex and length parameters to control more closely how the characters are extracted:

 string s = "Hello, World.";
 char[] c1 = s.ToCharArray();
 char[] c2 = s.ToCharArray(0, 5);

In the above code, c1 equals the character array { 'H', 'e', 'l', 'l', 'o', ',', ' ', 'W', 'o', 'r', 'l', 'd', '.' }, and c2 will be { 'H', 'e', 'l', 'l', 'o' }.

Comparison

To test equality with another string instance, you can use either the equality operator offered by your language of choice (i.e., == in C#, = in VB) or the Equals method on String. They do the same thing. You'll likely also run into situations where you need to do a case-insensitive comparison or check for equality. You can do this with the bool Equals(String value, StringComparison comparisonType) overload, by supplying StringComparison.OrdinalIgnoreCase as the comparisonType argument. For example, consider the following set of string comparisons:

 bool b1 = "hello" == "hello";      // True
 bool b2 = "hello" == "hi";         // False
 bool b3 = "hello".Equals("hello"); // True
 bool b4 = "hello".Equals("hi");    // False
 bool b5 = "HoWdY".Equals("howdy"); // False
 bool b6 = "HoWdY".Equals("howdy",  // True
     StringComparison.OrdinalIgnoreCase);

In this example, the first four checks are fairly obvious and result in the expected behavior. The fifth simply shows that string equality is usually case sensitive; the last shows how to do a case-insensitive comparison. Notice that we used OrdinalIgnoreCase instead of CurrentCultureIgnoreCase. The reason behind this is quite complex and has to do with culture-specific sort orders. Refer to Chapter 8 for more details on this and/or the "Further Reading" section.

It's often useful to check if a string either begins or ends with another string. For this, String provides two methods: StartsWith and EndsWith. Both likewise offer case-insensitive variants. For example, given a string that represents an absolute path to a file on disk, you may want to check certain of its properties. Consider the following code snippet:

 // Note: the strings below are prefixed with '@', which makes them verbatim
 // string literals. This is a C# language feature that avoids interpreting
 // backslashes (\) as escape characters. This is convenient for paths.
 string path = @"C:\Program Files\My Application\SomeFile.XML";
 // StartsWith:
 bool b1 = path.StartsWith(@"C:\"); // True
 bool b2 = path.StartsWith(@"c:\"); // False
 bool b3 = path.StartsWith(@"c:\", true, null); // True
 // EndsWith:
 bool b4 = path.EndsWith(".XML"); // True
 bool b5 = path.EndsWith(".xml"); // False
 bool b6 = path.EndsWith(".xml", true, null); // True

Lastly, String also implements the IComparable interface with an int CompareTo(object) method and a few static Compare convenience methods, allowing you to do culture-sensitive, ordinal, and case-insensitive comparisons of strings. These operations help facilitate collation and ordering, and are often used in common operations such as sorting. All of these methods return an integer less than 0 to indicate the first string (or the object being invoked) is ordered before the second string (or the argument); 0 means that they are equal, and greater than 0 indicates that the first string should be ordered after the second. As noted above, these Compare methods run the risk of touching on some tricky internationalization issues. To avoid these, use ordinal comparisons whenever possible.
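A short sketch of the ordinal comparison methods and their sign convention follows. Only the sign of each result is meaningful, so the example prints Math.Sign of it; the class name and sample data are illustrative:

```csharp
using System;

public static class CompareDemo
{
    public static void Main()
    {
        // Negative: "apple" orders before "banana".
        int before = String.Compare("apple", "banana", StringComparison.Ordinal);
        // Zero: equal once case is ignored.
        int same = String.Compare("apple", "APPLE", StringComparison.OrdinalIgnoreCase);
        // Positive: "b" orders after "a".
        int after = String.CompareOrdinal("b", "a");

        Console.WriteLine(Math.Sign(before)); // -1
        Console.WriteLine(Math.Sign(same));   // 0
        Console.WriteLine(Math.Sign(after));  // 1

        // The same ordering drives sorting; StringComparer wraps it as a comparer.
        string[] names = { "delta", "Alpha", "charlie", "Bravo" };
        Array.Sort(names, StringComparer.OrdinalIgnoreCase);
        Console.WriteLine(String.Join(", ", names)); // Alpha, Bravo, charlie, delta
    }
}
```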

Creating Modified Strings from Other Strings

There are several instance methods that take an existing string and copy its contents into a new string instance, performing some interesting translation along the way. Clearly, these methods don't actually change the string on which they are called — remember, CLR strings are immutable — they simply return another string that contains the requested modifications.

Converting to Upper- and Lowercase

To create a string with all of its characters changed to uppercase, use the ToUpper method. Similarly, ToLower returns a new string with all lowercase letters:

 string s = "My little String.";
 Console.WriteLine(s);
 Console.WriteLine(s.ToUpper());
 Console.WriteLine(s.ToLower());
 Console.WriteLine(s);

This will print out the following to the console:

 My little String.
 MY LITTLE STRING.
 my little string.
 My little String.

Changing or Removing Contents

Another common operation is to replace occurrences of a word or character within a string with something else. This is much like search-and-replace in your favorite text editor. Two overloads of Replace are available: one takes two chars, while the other takes two strings. The two parameters represent the item to look for and the item to replace occurrences with, respectively. This method replaces every occurrence of the specified item within the string:

 string s = "If you want to be cool, you'll need to listen carefully...";
 Console.WriteLine(s.Replace('c', 'k'));
 Console.WriteLine(s.Replace("you", "we"));

This code prints out the following:

 If you want to be kool, you'll need to listen karefully...
 If we want to be cool, we'll need to listen carefully...

Sometimes you need to remove an entire section of a string in one fell swoop. To do this, you can use the Remove method. It supplies two overloads: one takes only a single integer startIndex and returns a string with every character from startIndex (inclusive) to the end removed; the other takes two integers, startIndex and count, representing the index at which to begin removal and the number of contiguous characters to remove:

 string s = "I am not happy today!";
 Console.WriteLine(s);
 Console.WriteLine(s.Remove(4));
 Console.WriteLine(s.Remove(5, 4));

This code results in the following output:

 I am not happy today!
 I am
 I am happy today!

Trimming

Trim removes sequences of characters at either end of your string. The no-argument overload will strip off whitespace characters from both the beginning and end of the string, returning the result:

 string s = "    My real string is surrounded by whitespace!    ";
 Console.WriteLine(s);
 Console.WriteLine(s.Trim());

This code prints out the following:

     My real string is surrounded by whitespace!    
 My real string is surrounded by whitespace!

String Trim(params Char[] trimChars) is similar but allows you to pass in a custom set of characters to trim off instead of the default of trimming only whitespace. An entire sequence of contiguous characters that match at least one entry in this array will be removed both from the start and end of the string. The TrimStart and TrimEnd methods are similar but will trim only the beginning and end of the string, respectively:

 string s = "__...,Howdy there, pardner!,..._";
 Console.WriteLine(s);
 char[] trimChars = new char[] { '.', ',', '_' };
 Console.WriteLine(s.Trim(trimChars));
 Console.WriteLine(s.TrimStart(trimChars));
 Console.WriteLine(s.TrimEnd(trimChars));

Executing this snippet of code writes this to the console:

 __...,Howdy there, pardner!,..._
 Howdy there, pardner!
 Howdy there, pardner!,..._
 __...,Howdy there, pardner!

Note that these methods take trimChars as a params array. As a result, you can just pass in a sequence of char arguments rather than manually constructing and passing an array.
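For instance, the params form might look like the following sketch, which reuses the sample string from above (the class name is illustrative):

```csharp
using System;

public static class TrimParamsDemo
{
    public static void Main()
    {
        string s = "__...,Howdy there, pardner!,..._";

        // The compiler packs the char arguments into a char[] for us.
        Console.WriteLine(s.Trim('.', ',', '_')); // Howdy there, pardner!

        // Only leading underscores are stripped here.
        Console.WriteLine(s.TrimStart('_'));      // ...,Howdy there, pardner!,..._

        // Trailing '_' and '.' are stripped; trimming stops at the ','.
        Console.WriteLine(s.TrimEnd('_', '.'));   // __...,Howdy there, pardner!,
    }
}
```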

Padding

You can pad your strings with a specific character using the PadLeft and PadRight methods. Each has two overloads: one simply takes an integer totalWidth representing the desired length of the target string, padding included; the other takes totalWidth and a character paddingChar that indicates the character to pad with. If paddingChar is not specified, a space character is used. Imagine you have a string that you'd like to pad with '.' characters such that it is 20 characters long:

 string s = "Pad me, please";
 Console.WriteLine(s.PadRight(20, '.'));

The output of this example will be "Pad me, please......" Listing 5-1 demonstrates how you might use PadLeft and PadRight to generate justified strings for formatting purposes.

Listing 5-1: Justified printing using padding

 enum PrintJustification
 {
     Left,
     Center,
     Right
 }
 void PrintJustified(string s, int width, PrintJustification just)
 {
     int diff = width - s.Length;
     if (diff > 0)
     {
         switch (just)
         {
             case PrintJustification.Left:
                 Console.Write(s.PadRight(width));
                 break;
             case PrintJustification.Right:
                 Console.Write(s.PadLeft(width));
                 break;
             case PrintJustification.Center:
                 // Pad half of the difference on the left, the rest on the right.
                 s = s.PadLeft(s.Length + (diff / 2));
                 s = s.PadRight(width);
                 Console.Write(s);
                 break;
         }
     }
     else
     {
         // The string already fills (or exceeds) the width; print it unpadded.
         Console.Write(s);
     }
 }

Extracting Substrings

You could use some of the character array operations outlined above to walk through a section of a string, accumulate a sequence of chars, and extract a substring from it. The Substring method offers you this exact functionality. Two overloads are available: one takes an integer startIndex representing the position at which to start extracting, while the other also takes an integer length representing the length of the string to extract. If length isn't supplied, the method extracts the remainder of the string:

 string s = "My nifty little Stringy-String.";
 Console.WriteLine(s.Substring(9, 6));
 Console.WriteLine(s.Substring(16));

This prints out "little" and "Stringy-String.", respectively.

Splitting

You can split a string delimited by specific characters using the Split method. It takes either a char[] or string[] containing possible delimiters and returns a string[] containing contiguous strings between any delimiters found in the input string. It offers an overload that takes a params array for the delimiter characters:

 string s = "Joe Duffy|Microsoft|Program Manager|CLR Team";
 string[] pieces = s.Split('|');
 foreach (string piece in pieces)
 {
     Console.WriteLine(piece);
 }

This piece of code will write out the following output:

 Joe Duffy
 Microsoft
 Program Manager
 CLR Team

You can pass a StringSplitOptions.RemoveEmptyEntries enumeration value to eliminate empty entries, that is, when two delimiters are found back to back with no text between them.
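The effect of RemoveEmptyEntries can be sketched like this; the input string and class name are made up for the example:

```csharp
using System;

public static class SplitOptionsDemo
{
    public static void Main()
    {
        string s = "a,,b,;c";
        char[] delimiters = new char[] { ',', ';' };

        // Back-to-back delimiters produce empty entries: "a", "", "b", "", "c".
        string[] all = s.Split(delimiters);
        Console.WriteLine(all.Length); // 5

        // With RemoveEmptyEntries only "a", "b", and "c" remain.
        string[] nonEmpty = s.Split(delimiters, StringSplitOptions.RemoveEmptyEntries);
        Console.WriteLine(nonEmpty.Length);             // 3
        Console.WriteLine(String.Join("|", nonEmpty));  // a|b|c
    }
}
```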

Merging

You can combine an array of strings together into a single piece of text delimited by a string of your choosing with the Join method:

 string[] s = new string[] { "Joe Duffy", "Microsoft",
     "Program Manager", "CLR Team" };
 string joined = String.Join("|", s);

After executing this code, the string array will be joined and delimited by "|", resulting in the text "Joe Duffy|Microsoft|Program Manager|CLR Team".

Searching

There are several methods defined on String that enable you to search for occurrences of a string within another. The most straightforward is the Contains method, which returns true if the supplied string was found anywhere within the target string and false otherwise.

The IndexOf operation returns an int representing the starting index of the first occurrence of a given string or char. LastIndexOf returns the starting index of the last occurrence, that is, the first match found when searching from the end of the string. IndexOf searches from left to right, while LastIndexOf searches from right to left. Both methods offer overloads enabling you to control at what position the search is to begin and/or the maximum number of characters to examine. These methods return -1 to indicate that no match was found. Similarly, the IndexOfAny and LastIndexOfAny methods accept char[]s and search for the first or last occurrence, respectively, of any character in the array. Much like IndexOf and LastIndexOf, these offer parameters to control where to begin the search and how many characters to examine.
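The search methods can be sketched together as follows. The sample string is arbitrary and the class name is illustrative; the string-based searches pass StringComparison.Ordinal so that the results do not depend on the current culture:

```csharp
using System;

public static class SearchDemo
{
    public static void Main()
    {
        string s = "Hello, World.";
        char[] punctuation = new char[] { ',', '.' };

        Console.WriteLine(s.Contains("World"));                         // True
        Console.WriteLine(s.IndexOf('o'));                              // 4
        Console.WriteLine(s.LastIndexOf('o'));                          // 8
        Console.WriteLine(s.IndexOf('o', 5));                           // 8 (search starts at index 5)
        Console.WriteLine(s.IndexOf("World", StringComparison.Ordinal)); // 7
        Console.WriteLine(s.IndexOf("xyz", StringComparison.Ordinal));   // -1 (not found)
        Console.WriteLine(s.IndexOfAny(punctuation));                   // 5
        Console.WriteLine(s.LastIndexOfAny(punctuation));               // 12
    }
}
```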

IntPtr

The IntPtr type is a machine-sized pointer type. On 32-bit architectures, it is capable of holding 32 bits (i.e., it is essentially an int); on 64-bit architectures, it is capable of holding 64 bits (i.e., a long). IntPtrs are most often used to refer to a void* or OS HANDLE to an unmanaged resource. The type provides constructors that accept an int, long, or void*, each of which can be used to later dereference the resource with which the IntPtr is associated. This type is immutable. Chapter 11 discusses interoperating with unmanaged code further, including an overview of IntPtr and its safer counterpart, SafeHandle.
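A tiny sketch of the type in isolation follows. The value 42 is an arbitrary number rather than a real handle, and the class name is illustrative:

```csharp
using System;

public static class IntPtrDemo
{
    public static void Main()
    {
        // Size in bytes of a pointer in the current process: 4 (32-bit) or 8 (64-bit).
        Console.WriteLine(IntPtr.Size);

        // Wrap an arbitrary integer; a real program would receive this from unmanaged code.
        IntPtr value = new IntPtr(42);
        Console.WriteLine(value.ToInt64());        // 42
        Console.WriteLine(value == IntPtr.Zero);   // False

        // IntPtr.Zero is the conventional "null" pointer or handle.
        Console.WriteLine(IntPtr.Zero.ToInt32());  // 0
    }
}
```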

Dates and Times

The .NET Framework has a rich set of types to work with dates and times, also providing the ability to compute and represent time intervals. Mathematical operations on dates are provided that take into account calendars, end-of-year conditions, leap years, and daylight saving time, for example. Time zone support enables developers to present dates and times in a localized fashion.

DateTime

The DateTime value type enables you to capture a point in time, with both a date and time component. Values are actually stored and computed using a very small time interval called a tick, each of which represents the passing of 100 nanoseconds. A DateTime stores the number of ticks that have passed since midnight, January 1, 0001, and uses this for all date and time computations. This is an implementation detail, and normally you don't even need to be aware of this.

DateTime is capable of storing dates from midnight, January 1, 0001 through 11:59:59 PM, December 31, 9999 (i.e., 23:59:59 in 24-hour format) in the Gregorian calendar. These are the tick values 0L through 3155378975999999999L, respectively. You should also be aware that two primary "time zones" are used when dealing with dates and times: Local time and Coordinated Universal Time (UTC). For practical purposes, UTC is equivalent to Greenwich Mean Time with no offset (GMT+00:00). By using the information stored in the operating system, DateTime is able to obtain your current time zone to perform offset computations for conversions between Local and UTC. The local time zone data is also available through the TimeZone.CurrentTimeZone static member.
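The tick range quoted above can be checked directly, as in this minimal sketch (the class name is illustrative):

```csharp
using System;

public static class TickRangeDemo
{
    public static void Main()
    {
        Console.WriteLine(DateTime.MinValue.Ticks); // 0
        Console.WriteLine(DateTime.MaxValue.Ticks); // 3155378975999999999

        // One tick is 100 nanoseconds, so there are ten million ticks per second.
        Console.WriteLine(TimeSpan.TicksPerSecond); // 10000000

        // A DateTime built from that many ticks is one second past midnight, January 1, 0001.
        DateTime oneSecondIn = new DateTime(TimeSpan.TicksPerSecond);
        Console.WriteLine(oneSecondIn.Year);   // 1
        Console.WriteLine(oneSecondIn.Second); // 1
    }
}
```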

DateTime is inherently culture-aware when it comes to formatting dates and times, but not when dealing with different calendar or time zone systems. It relies on the System.Globalization.Calendar type to provide calendar information, for which there are many implementations in the Framework. Operations performed directly against a DateTime instance use the calendar and time zone that your computer has been configured to use. You can initialize your own TimeZone objects; unfortunately, the Framework doesn't ship with a standard set of prebuilt instances.

Creating a DateTime

You can obtain a snapshot of an immediate point in time by accessing the static DateTime.Now or DateTime.UtcNow property. Both return a DateTime instance initialized to the current date and time, the former of which is in Local time, while the latter is in UTC. You can determine what time zone kind a date is represented in by inspecting its Kind property. It returns a DateTimeKind enumeration value of either Local, Utc, or Unspecified. For example:

 DateTime localNow = DateTime.Now;
 Console.WriteLine("{0} - {1} ({2})", localNow, localNow.Kind,
     TimeZone.CurrentTimeZone.StandardName);
 DateTime utcNow = DateTime.UtcNow;
 Console.WriteLine("{0} - {1}", utcNow, utcNow.Kind);

Sample output for this snippet is:

 10/19/2004 10:54:58 PM - Local (Pacific Standard Time)
 10/20/2004 5:54:58 AM - Utc

A large number of constructors are available with which to instantiate a DateTime representing a precise point in time. You can supply a long representing the ticks for the target DateTime, or if you prefer, you can deal in terms of years, months, days, hours, minutes, seconds, and milliseconds. Constructors are available that enable you to specify as little as just years, months, and days, or as many as all of these. For example:

 DateTime dt1 = new DateTime(2004, 10, 19);
 DateTime dt2 = new DateTime(2004, 10, 19, 22, 47, 35);
 DateTime dt3 = new DateTime(2004, 10, 19, 22, 47, 35, 259);

These represent the date and time October 19, 2004 12:00:00.000 AM (Midnight), October 19, 2004 10:47:35.000 PM, and October 19, 2004 10:47:35.259 PM, respectively. Notice that if you do not provide any time information, it defaults to 12:00:00.000 AM, that is, midnight.

By default, each constructor generates a DateTime with the DateTimeKind.Unspecified time zone kind. Most constructor styles offer overloads that permit you to specify a custom DateTimeKind and/or a Calendar for globalization.
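As a sketch of the overloads that accept an explicit kind (the class name is illustrative), note that the kind only labels the value; the date and time fields, and therefore the ticks, are the same in both instances:

```csharp
using System;

public static class DateTimeKindDemo
{
    public static void Main()
    {
        DateTime utc = new DateTime(2004, 10, 19, 22, 47, 35, DateTimeKind.Utc);
        DateTime local = new DateTime(2004, 10, 19, 22, 47, 35, DateTimeKind.Local);

        Console.WriteLine(utc.Kind);   // Utc
        Console.WriteLine(local.Kind); // Local

        // Identical date and time fields yield identical tick counts.
        Console.WriteLine(utc.Ticks == local.Ticks); // True
    }
}
```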

Properties of DateTime

Given a DateTime instance, you can retrieve any of its date- or time-based properties. Some of these even use the calendar to compute interesting information, such as day of the week, for example:

Property	Type	Description
Day	Int32	The day of the month
DayOfWeek	DayOfWeek	An enumeration value representing the day of the week (Sunday–Saturday)
DayOfYear	Int32	The number of days since and including January 1
Hour	Int32	The 24-hour-based hour of the day (0–23)
Millisecond	Int32	The millisecond-part of the time
Minute	Int32	The minute-part of the time
Month	Int32	The month of the year (1–12)
Second	Int32	The second-part of the time
Year	Int32	The four-digit year in which this date and time fall

This snippet illustrates what these properties might return for a given DateTime:

 DateTime dt1 = new DateTime(2004, 10, 19, 22, 47, 35, 259);
 Console.WriteLine("Day: {0}", dt1.Day);
 Console.WriteLine("DayOfWeek: {0}", dt1.DayOfWeek);
 Console.WriteLine("DayOfYear: {0}", dt1.DayOfYear);
 Console.WriteLine("Hour: {0}", dt1.Hour);
 Console.WriteLine("Millisecond: {0}", dt1.Millisecond);
 Console.WriteLine("Minute: {0}", dt1.Minute);
 Console.WriteLine("Month: {0}", dt1.Month);
 Console.WriteLine("Second: {0}", dt1.Second);
 Console.WriteLine("Year: {0}", dt1.Year);

This code produces the following output:

 Day: 19
 DayOfWeek: Tuesday
 DayOfYear: 293
 Hour: 22
 Millisecond: 259
 Minute: 47
 Month: 10
 Second: 35
 Year: 2004

There are also a few static members available that use calendar information to calculate other non-instance information. DaysInMonth(int year, int month) returns an integer indicating the number of days in the given month and year. IsLeapYear(int year) returns a Boolean to indicate if the provided year is a leap year or not.
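A quick sketch of both static members, using a few well-known years (the class name is illustrative):

```csharp
using System;

public static class CalendarHelpersDemo
{
    public static void Main()
    {
        // February has 29 days in a leap year and 28 otherwise.
        Console.WriteLine(DateTime.DaysInMonth(2004, 2)); // 29
        Console.WriteLine(DateTime.DaysInMonth(2005, 2)); // 28
        Console.WriteLine(DateTime.DaysInMonth(2004, 10)); // 31

        // Century years are leap years only when divisible by 400.
        Console.WriteLine(DateTime.IsLeapYear(2004)); // True
        Console.WriteLine(DateTime.IsLeapYear(1900)); // False
        Console.WriteLine(DateTime.IsLeapYear(2000)); // True
    }
}
```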

Converting Between Local and UTC Time Zones

Dates are represented as either Local or UTC time. Converting between them is a common requirement, for example when serializing and deserializing dates to and from disk. Going from UTC to Local with ToLocalTime simply uses your local computer's time zone information to compute the appropriate offset, adding the offset to a UTC date. Similarly, ToUniversalTime does the reverse; it takes a DateTime and subtracts the offset to get back to UTC. Invoking ToLocalTime on a DateTimeKind.Local instance results in no change to the date; the same is true for a DateTimeKind.Utc date and ToUniversalTime. The static SpecifyKind method enables you to change a DateTime's Kind without modifying the underlying date and time it represents. It simply returns a new DateTime instance with the same value as the target and with the specified Kind value.
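These conversions can be sketched as follows. The printed local time depends on the time zone of the machine running the code, so no output is shown for it; the class and variable names are illustrative:

```csharp
using System;

public static class TimeZoneConversionDemo
{
    public static void Main()
    {
        DateTime utc = new DateTime(2004, 10, 20, 5, 54, 58, DateTimeKind.Utc);

        // UTC -> Local applies the machine's time zone offset.
        DateTime local = utc.ToLocalTime();
        Console.WriteLine(local.Kind); // Local

        // Local -> UTC removes the offset again, returning to the original instant.
        DateTime roundTripped = local.ToUniversalTime();
        Console.WriteLine(roundTripped == utc); // True

        // Converting a UTC value to UTC leaves it unchanged.
        Console.WriteLine(utc.ToUniversalTime() == utc); // True

        // SpecifyKind relabels a value without shifting its date and time.
        DateTime unspecified = new DateTime(2004, 10, 19, 22, 47, 35);
        DateTime relabeled = DateTime.SpecifyKind(unspecified, DateTimeKind.Utc);
        Console.WriteLine(relabeled.Kind);                       // Utc
        Console.WriteLine(relabeled.Ticks == unspecified.Ticks); // True
    }
}
```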

It is a common mistake to store DateTime values in Local format. You should almost always serialize dates using a neutral time zone to avoid conversion problems later on. For example, if your application is distributed across the globe and your date wasn't stored in UTC, you'll need to determine the offset for the time zone in which it was saved, convert it back to UTC, and then go back to the (now) Local time zone. Clearly this is less than straightforward. UTC solves this problem. This suggestion applies not only when saving dates to a database but also when serializing objects or raw data to disk. In fact, it is generally less error prone to deal only with UTC dates even internally to your program, converting and formatting them as appropriate only when they are presented to the end user.

Time Spans

TimeSpan is a value type that represents an arbitrary interval of time, regardless of the specific start and end point in time. For example, you can create an interval of "3 days, 4 hours, and 23 minutes," which can then be applied to concrete instances of DateTimes later on. TimeSpan also supports negative time intervals. A TimeSpan can be used to modify DateTime instances with the DateTime.Add method.

TimeSpan's constructor offers an overload that uses ticks, and a few that take combinations of days, hours, minutes, seconds, and milliseconds. There is also a set of properties that returns the values for different granularities for the units of time that a given TimeSpan represents. You can add or subtract two TimeSpans to generate a new one using the Add and Subtract methods, or the supplied operator overloads (+ and -). TimeSpan, like DateTime, is immutable; thus, any such operations return new instances containing the requested changes, leaving the target of your method call unchanged.
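A brief sketch of constructing an interval, applying it to a DateTime, and combining intervals follows. The specific interval of 3 days, 4 hours, and 23 minutes and the class name are chosen only for illustration:

```csharp
using System;

public static class TimeSpanDemo
{
    public static void Main()
    {
        // Days, hours, minutes, seconds.
        TimeSpan span = new TimeSpan(3, 4, 23, 0);

        // Applying the interval to a point in time yields a new DateTime.
        DateTime start = new DateTime(2004, 10, 19);
        DateTime end = start.Add(span);
        Console.WriteLine(end.Day);    // 22
        Console.WriteLine(end.Hour);   // 4
        Console.WriteLine(end.Minute); // 23

        // Subtracting two DateTimes gives the interval back.
        Console.WriteLine((end - start) == span); // True

        // Adding 20 hours rolls over into a fourth day; span itself is unchanged.
        TimeSpan sum = span + TimeSpan.FromHours(20);
        Console.WriteLine(sum.Days);    // 4
        Console.WriteLine(sum.Hours);   // 0
        Console.WriteLine(sum.Minutes); // 23
        Console.WriteLine(span.Days);   // 3
    }
}
```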

Note

You will also notice that TimeSpan does not handle calendar-based intervals such as months or years. While this would certainly be useful in some circumstances, implementing this correctly is quite problematic. For example, if you wanted to represent an interval of "one month," how many days would that be? Well, the answer is that it depends! Some months have 30 days, while others have 29, 31, or even 28. Unfortunately, TimeSpan doesn't have any knowledge of points in time or calendar information — it simply represents generic intervals.

Because TimeSpan can represent negative durations, you might need to calculate its absolute duration, regardless of sign. The Duration method calculates and returns a new TimeSpan that will always represent a positive interval. The Negate method returns an interval with a flipped sign, so that positive TimeSpans become negative and negative ones become positive.
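Both methods can be seen in this short sketch, which produces a negative interval by subtracting a later date from an earlier one (the class name is illustrative):

```csharp
using System;

public static class DurationDemo
{
    public static void Main()
    {
        TimeSpan oneDay = TimeSpan.FromDays(1);

        // Earlier minus later gives a negative interval of one day.
        TimeSpan negative = new DateTime(2004, 1, 1) - new DateTime(2004, 1, 2);
        Console.WriteLine(negative < TimeSpan.Zero);       // True

        // Duration always returns the positive magnitude.
        Console.WriteLine(negative.Duration() == oneDay);  // True
        Console.WriteLine(oneDay.Duration() == oneDay);    // True

        // Negate flips the sign in either direction.
        Console.WriteLine(negative.Negate() == oneDay);    // True
        Console.WriteLine(oneDay.Negate() == negative);    // True
    }
}
```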