Generics | Professional .NET Framework 2.0 (Programmer to Programmer)

As of the CLR 2.0, a new feature called generics permits programmers to employ parametric polymorphism in the code they write. This term simply means that code can be written to deal generally with instances whose types aren't known when the code itself is compiled, yet can be instantiated by users of that code to work with precise types. The feature is a close cousin to C++ templates, with deceptively similar syntax, but is not quite as powerful, cumbersome, or problematic.

Basics and Terminology

The basic idea behind generics is that types (classes, structs, delegates) or methods are defined in terms of any number of formal type parameters, for which real types are substituted inside the definitions. The number of parameters a type or method accepts is called its type arity. A type with an arity of one or more is called a generic type, and a method with an arity of one or more is called a generic method. A consumer of a generic type or method supplies the actual type arguments when he or she wishes to use it. This activity is called type instantiation. Notice that this is quite like instantiating a new object instance through a constructor.

For example, a type might accept a single type parameter T and use T inside its definition:

    class Foo<T>
    {
        T data;
        public Foo(T data)
        {
            this.data = data;
        }
        public U Convert<U>(Converter<T,U> converter)
        {
            return converter(data);
        }
    }

The Converter type is a delegate defined in the System namespace whose signature is:

 public delegate TOutput Converter<TInput, TOutput>(TInput input);

The type Foo has a single type parameter T and uses it to define the type of the instance field data. It has an arity of 1. The type of data won't be known until a caller actually constructs the type Foo<T> with a type argument for T. This argument could be int, string, or any other type you wish, including your own custom-defined ones. Because of this, you can't do much with something typed as T. Statically, you can perform only the operations defined on Object, since that's the common base type for all types. You may of course pass it to methods expecting an Object and also dynamically introspect it using the Reflection feature. (Reflection is described in Chapter 14.)

Notice too that T is used as the parameter type for the constructor and also in another generic type, Converter<T, U>, accepted as input by the Convert method. This demonstrates working with and passing around instances of T in a type-safe fashion even though we don't yet know the type that T will have at runtime. The method Convert accepts its own type parameter U. It has access to both T and U in its definition, accepts a Converter<T, U> parameter, and returns a U.
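The restriction to Object's members can be seen in a small sketch. The Describer class and its Describe method are hypothetical names invented for this illustration; they are not part of the Foo<T> example.

```csharp
using System;

// Hypothetical helper, invented for illustration only.
static class Describer
{
    // T is unconstrained, so only members declared on System.Object
    // (ToString, Equals, GetHashCode, GetType) are available on 'item'.
    public static string Describe<T>(T item)
    {
        // item.Length or item.CompareTo(...) would not compile here,
        // even if every caller happened to pass a string.
        return typeof(T).Name + ": " + item.ToString();
    }
}
```

Calling Describer.Describe<int>(42) yields "Int32: 42", and Describer.Describe<string>("abc") yields "String: abc". Passing a null reference would fail at the ToString call, which this sketch does not guard against.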

Instantiation

A user of the Foo<T> type might write the following code:

    Foo<int> f = new Foo<int>(2005);
    string s = f.Convert<string>(delegate(int i) { return i.ToString(); });
    Console.WriteLine(s);

This instantiates the type Foo<T> with an argument type int. At that point, you can conceptually replace each occurrence of T with int in the above class definition. In other words, the field data is now typed as int, its constructor accepts an int, and the Convert method takes a Converter<int,U>, where U is still unknown. Foo<int> is the instantiated type of the Foo<T> generic type and is fully constructed because we've supplied arguments for all of its parameters. We then create an instance of that type passing 2005 as the constructor parameter.

The code then goes ahead and instantiates the Convert<U> method with an argument type string. Again, you can conceptually replace each occurrence of U with string in the method definition and body for Convert. In other words, it returns a string and accepts as input a Converter<int,string>. We then pass in an anonymous delegate that converts an int to a string simply by calling its ToString method. The result is that s contains the string "2005", and we print it to the console. Of course, for any instance of a Foo<T>, a different type for T can be supplied, and for every invocation of the method Convert<U> a different type for U can be supplied.
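To make that last point concrete, here is a brief sketch that reuses the Foo<T> class with other type arguments. The InstantiationDemo class and its two methods are hypothetical names chosen for this illustration.

```csharp
using System;

class Foo<T>
{
    T data;

    public Foo(T data)
    {
        this.data = data;
    }

    public U Convert<U>(Converter<T, U> converter)
    {
        return converter(data);
    }
}

// Hypothetical demo class, invented for illustration only.
static class InstantiationDemo
{
    // Here T = string and U = int.
    public static int TextLength(string text)
    {
        Foo<string> f = new Foo<string>(text);
        return f.Convert<int>(delegate(string s) { return s.Length; });
    }

    // Here T = int and U = bool.
    public static bool IsEven(int number)
    {
        Foo<int> f = new Foo<int>(number);
        return f.Convert<bool>(delegate(int i) { return i % 2 == 0; });
    }
}
```

InstantiationDemo.TextLength("2005") returns 4, and InstantiationDemo.IsEven(2005) returns false. The same Foo<T> definition serves both calls; only the type arguments differ.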

Other Language Support

Of course, these examples have been very C#-centric. I noted at the beginning of this chapter that they would be. But generics is a feature that has been carefully woven throughout the entire type system, and both VB and C++/CLI support generics in the language syntax. The above class Foo<T> might be written as follows in VB, for example, using its Of T syntax:

    Class Foo(Of T)
        Private data As T
        Public Sub New(data As T)
            Me.data = data
        End Sub
        Public Function Convert(Of U)(converter As Converter(Of T, U)) As U
            Return converter(Me.data)
        End Function
    End Class

And textual IL represents generics using its own syntax. Some of the following might seem foreign, as we haven't yet discussed IL in depth (that's next chapter). But nonetheless, you should just pay attention to the syntactical differences for representing type parameters and using them in the type and method definitions for the time being:

    .class auto ansi nested private beforefieldinit Foo`1<T>
        extends [mscorlib]System.Object
    {
        .field private !T data
        .method public hidebysig specialname rtspecialname
                instance void .ctor(!T data) cil managed
        {
            .maxstack 8
            ldarg.0
            call instance void [mscorlib]System.Object::.ctor()
            ldarg.0
            ldarg.1
            stfld !0 class Foo`1<!T>::data
            ret
        }
        .method public hidebysig instance !!U
                Convert<U>(class [mscorlib]System.Converter`2<!T,!!U> convert)
                cil managed
        {
            .maxstack  2
            .locals init ([0] !!U $0000)
            ldarg.1
            ldarg.0
            ldfld !0 class Foo`1<!T>::data
            callvirt instance !1 class
                [mscorlib]System.Converter`2<!T,!!U>::Invoke(!0)
            stloc.0
            ldloc.0
            ret
        }
    }

Notice that the type itself is named Foo`1<T>. The `1 represents the arity of the type, and the definition uses !T to refer to the T parameter throughout. The method is quite similar, although its name carries no arity marker and it refers to its own type parameters with a double bang, for example !!U.

Generics are admittedly a difficult concept to get your head around at first, but once you do, they are quite powerful. The best way to get started on the right foot is to look at an example.

An Example: Collections

Collections are the canonical example for illustrating the benefits of generics. Given the maturity of the C++ Standard Template Library (STL), which is built on templates, that is no wonder; there are plenty of great APIs there to borrow from. There is also a more fundamental reason: collections without generics are painful, and collections with generics are beautiful and simply feel natural to work with.

In version 1.x of the Framework, most developers used the System.Collections.ArrayList type to store a collection of objects or values. It has a number of convenient methods to add, locate, remove, and enumerate the contents, among others. Looking at the ArrayList public surface area reveals a number of methods that deal with items of type System.Object, for example:

    public class ArrayList : IList, ICollection, IEnumerable, ICloneable
    {
        public virtual int Add(object value);
        public virtual bool Contains(object value);
        public object[] ToArray();
        public object this[int index] { get; set; }
        // And so forth...
    }

ArrayList's contents are typed as object so that any object or value can be stored inside of it. Other collections, such as Stack and Queue also follow this pattern. But some drawbacks to this approach become apparent after working with these types for a brief amount of time.

No Type Safety

First and foremost, most collections aren't meant to contain instances of any arbitrary types. You'll probably want a list of customers, a list of strings, or a list of some common base type. Seldom will it be a big bag of stuff that is operated on solely through the object interface. This means that you have to cast whenever you extract something from it:

    ArrayList listOfStrings = new ArrayList();
    listOfStrings.Add("some string");
    // ...
    string contents = (string)listOfStrings[0]; // must cast

But that's probably the least of your worries.

Remember, an instance of an ArrayList says nothing about the nature of its contents. In any single list, you can store anything you want inside of it. Only when you take items out of it will you realize that there is a problem. Consider this code:

    ArrayList listOfStrings = new ArrayList();
    listOfStrings.Add("one");
    listOfStrings.Add(2); // Whoops!
    listOfStrings.Add("three");

This snippet compiles just fine, although we accidentally added the int 2 instead of the string "two" to the list. There is nothing anywhere (other than perhaps the variable name, which can of course be aliased) that states we intended to allow only strings, and certainly nothing that enforces it. You can expect that somebody might write the following code somewhere:

    foreach (string s in listOfStrings)
    {
        // Do something with the string 's'...
    }

At that point, the program will observe an unexpected InvalidCastException from the foreach line. Why? It happens because somebody added an int, which clearly cannot be cast to a string.
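The failure can be reproduced with a few lines. In this sketch, the UntypedListDemo class and its method name are hypothetical; the exception type raised by the failed cast is System.InvalidCastException.

```csharp
using System;
using System.Collections;

// Hypothetical demo class, invented for illustration only.
static class UntypedListDemo
{
    // Returns how many items were read before the cast failed,
    // or -1 if the whole list was read without a failure.
    public static int ReadUntilFailure()
    {
        ArrayList listOfStrings = new ArrayList();
        listOfStrings.Add("one");
        listOfStrings.Add(2);       // compiles: the int is boxed as an object
        listOfStrings.Add("three");

        int read = 0;
        try
        {
            // foreach inserts a cast from object to string for every item.
            foreach (string s in listOfStrings)
            {
                Console.WriteLine(s);
                read++;
            }
        }
        catch (InvalidCastException)
        {
            return read;
        }
        return -1;
    }
}
```

ReadUntilFailure prints "one" and then returns 1, because the cast fails on the second item. Nothing at the Add call site hinted at the problem.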

Wouldn't it be nice if we could actually restrict the list to only strings? Many people work around this problem by authoring their own custom collections, writing strongly typed methods (e.g., Add, Remove, and so forth) that deal with the correct type only, and overriding or hiding the object-based overloads to do a dynamic type check. This is called a strongly typed collection.

The System.Collections.Specialized.StringCollection is a reusable strongly typed collection for string. If the example above used StringCollection instead of ArrayList, the code wouldn't even compile if we tried adding an int. There are drawbacks even with this approach, however. If you're accessing methods through the IList interface — which StringCollection supports and is still typed as object — you won't get compiler support. Instead, the Add method will detect an incompatible type and throw at runtime. This is admittedly nicer than throwing when taking items out of the list, but still not perfect.
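A hand-written strongly typed collection in the 1.x style might look roughly like the following. This is a minimal sketch, assuming you build on System.Collections.CollectionBase; the StringList class is a hypothetical name and is not how StringCollection itself is implemented.

```csharp
using System;
using System.Collections;

// Hypothetical strongly typed collection, invented for illustration only.
class StringList : CollectionBase
{
    // Strongly typed entry point: the compiler rejects non-string arguments.
    public int Add(string value)
    {
        return List.Add(value);
    }

    // Strongly typed indexer: the cast is hidden from callers.
    public string this[int index]
    {
        get { return (string)List[index]; }
        set { List[index] = value; }
    }

    // Called by CollectionBase before any item is stored, including
    // items that arrive through the object-typed IList interface.
    protected override void OnValidate(object value)
    {
        if (!(value is string))
            throw new ArgumentException("Only strings may be stored.", "value");
    }
}
```

With this type, list.Add(2) does not compile, while ((IList)list).Add(2) compiles but throws an ArgumentException at run time. That is the same trade-off described above: the failure moves from the reader of the list to the writer, but it is still a runtime failure.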

Boxing and Unboxing Costs

Subtler problems are present with the ArrayList, too. A major one is that creating lists of value types requires that you box values before putting them into the list and unbox them as you take them out. For operations over large lists, the cost of boxing and unboxing can easily dominate the entire computation. Consider the following example, which generates a list of 10,000,000 random integers and then walks through them to perform an addition reduction:

    ArrayList listOfInts = GenerateRandomInts(10000000);
    long sum = 0;
    foreach (int x in listOfInts)
        sum += x;
    // ...

When I profile this piece of code, boxing and unboxing consume roughly 74% of the execution time! Again, by creating your own strongly typed collection for ints, you can eliminate the boxing and unboxing costs (assuming that you call directly through the collection methods rather than through, say, IList).

A Solution: Enter Generics

Creating and maintaining your own collection type is costly, has little to do with application logic, and is so common that it often resulted in a proliferation of so-called strongly typed collections inside a single application. Let's face it: it's no fun.

Generics solves this problem for collections with a new generic type, System.Collections.Generic.List<T>, which has an arity of 1 and a type parameter T. The type argument specified at instantiation time represents the type of the instances the list is meant to hold. For example, if you want a "list of strings," you can express just that by typing your variable as List<string>. Similarly, specifying a List<int> ensures that the list holds only ints and avoids the costs of boxing and unboxing altogether, because the CLR generates code that works directly with ints. Also note that a list can hold items whose types are polymorphically compatible with the type argument. For example, given a type hierarchy in which A is the base class and both B and C derive from A, a List<A> can store items of type A, B, and C.

The List<T> type definition looks much like ArrayList, but uses T instead of object, for example:

    public class List<T> : IList<T>, ICollection<T>, IEnumerable<T>,
        IList, ICollection, IEnumerable
    {
        public void Add(T item);
        public bool Contains(T item);
        public T[] ToArray();
        public T this[int index] { get; set; }
        // And so forth...
    }

Now the original program can be rewritten as:

    List<string> listOfStrings = new List<string>();
    listOfStrings.Add("one");
    listOfStrings.Add(2); // The compiler will issue an error here
    listOfStrings.Add("three");

The compiler won't even permit you to add something of the wrong type to listOfStrings now, and your foreach statement can be sure that it won't encounter an InvalidCastException as it takes items out of the list. There is much more to generics than this, of course, which is the topic of this section. Similarly, there is much more to collections, detailed coverage of which has been deferred to Chapter 6.
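The summation from the boxing discussion can be written against both collection types for a rough comparison. This is only a sketch: the BoxingComparison class, its methods, and the use of Stopwatch for timing are assumptions of this illustration and are not the profiling setup behind the figure quoted earlier. Timings will vary by machine and runtime, so none are shown.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical comparison harness, invented for illustration only.
static class BoxingComparison
{
    // Each element is unboxed from object to int as it is read.
    public static long SumArrayList(ArrayList values)
    {
        long sum = 0;
        foreach (int x in values)
            sum += x;
        return sum;
    }

    // Elements are stored and read as ints; no boxing is involved.
    public static long SumList(List<int> values)
    {
        long sum = 0;
        foreach (int x in values)
            sum += x;
        return sum;
    }

    public static void Run(int count)
    {
        Random random = new Random(42);
        ArrayList boxed = new ArrayList(count);
        List<int> unboxed = new List<int>(count);
        for (int i = 0; i < count; i++)
        {
            int value = random.Next(100);
            boxed.Add(value);    // boxes the int
            unboxed.Add(value);  // stores the int directly
        }

        Stopwatch watch = Stopwatch.StartNew();
        long first = SumArrayList(boxed);
        watch.Stop();
        Console.WriteLine("ArrayList: sum={0}, {1} ms", first, watch.ElapsedMilliseconds);

        watch = Stopwatch.StartNew();
        long second = SumList(unboxed);
        watch.Stop();
        Console.WriteLine("List<int>: sum={0}, {1} ms", second, watch.ElapsedMilliseconds);
    }
}
```

Calling BoxingComparison.Run(10000000) from your Main method prints both sums, which should be equal, along with the elapsed time for each loop.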

Construction: From Open to Closed

We touched briefly in the opening paragraphs of this section on the idea of instantiating generic types and methods, but we did not specify precisely the various ways to do so. A generic type that has not been supplied any arguments for its type parameters is called an open type, because it is open to accepting more arguments, while one that has been supplied all of its type arguments is called a constructed type (sometimes called a closed type). A type can also be somewhere between open and constructed, in which case it is called an open constructed type. You can create instances only of a constructed type, that is, once all type arguments have been supplied; you cannot create instances of open or open constructed types.

Let's see how you can end up with an open constructed type. One thing that hasn't been explicitly noted yet is that when deriving from a generic type, the subclass may specify one or more of the generic type parameters of its base class. Consider this generic type with an arity of 3:

 class MyBaseType<A, B, C> {}

Of course, to instantiate a new MyBaseType, the client would need to supply arguments for A, B, and C, creating a constructed type. But a subclass of MyBaseType can specify arguments for as many of the parameters as it wishes, from 0 to 3, for example:

    class MyDerivedType1<A, B, C> : MyBaseType<A, B, C> {}
    class MyDerivedType2<A, B> : MyBaseType<A, B, int> {}
    class MyDerivedType3<B> : MyBaseType<string, B, int> {}
    class MyDerivedType4 : MyBaseType<string, object, int> {}

Without a user having to supply any type arguments whatsoever, MyDerivedType1 is an open type, MyDerivedType4 is a constructed type, and the other two are open constructed types. They have at least one type argument supplied yet still at least one type parameter that must be supplied before they are fully constructed.
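Reflection offers one way to observe these states at runtime. The sketch below is an approximation: the ConstructionDemo class and its Describe method are hypothetical, and mapping the Type.ContainsGenericParameters and Type.IsGenericTypeDefinition properties onto the three terms used here is an assumption of this illustration, since reflection uses its own vocabulary.

```csharp
using System;

class MyBaseType<A, B, C> {}
class MyDerivedType3<B> : MyBaseType<string, B, int> {}
class MyDerivedType4 : MyBaseType<string, object, int> {}

// Hypothetical helper, invented for illustration only.
static class ConstructionDemo
{
    public static string Describe(Type type)
    {
        // No unbound type parameters remain: instances can be created.
        if (!type.ContainsGenericParameters)
            return "constructed";

        // The bare definition, with no type arguments supplied at all.
        if (type.IsGenericTypeDefinition)
            return "open";

        // Some arguments supplied, some parameters still unbound.
        return "open constructed";
    }
}
```

Describe(typeof(MyBaseType<,,>)) returns "open". Describe(typeof(MyDerivedType3<>).BaseType) returns "open constructed", because that base type is MyBaseType<string, B, int> with B still unbound. Describe(typeof(MyDerivedType4)) returns "constructed".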

Methods have the same open and closed designations but cannot take on the open constructed form. A generic method can either be fully constructed or not, and may not be somewhere in between. You may not, for example, override a virtual generic method and supply generic arguments.

Generic Type Storage: Statics and Inner Types

Data that belongs to a type — including static fields and inner types, for example — are unique for each instantiation of a generic type. This means that given a type such as the following:

    class Foo<T>
    {
        public static int staticData;
    }

Each unique instantiation of Foo<T> will have its own copy of staticData. In other words, Foo<int>.staticData is an entirely different field location than Foo<string>.staticData, and so forth. If staticData were typed as T, it would be clear why.

Similarly, each instantiation of a generic type manufactures unique inner types:

    class Foo<T>
    {
        enum MyEnum
        {
            One, Two, Three
        }
    }

It turns out that Foo<int>.MyEnum and Foo<string>.MyEnum are two completely separate (and incompatible) types! Again, this shouldn't really be surprising but often is.
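Both behaviors are easy to confirm with a short sketch. The Tally<T> type and the StaticsDemo class are hypothetical names invented for this illustration.

```csharp
using System;

// Hypothetical generic type, invented for illustration only.
class Tally<T>
{
    public static int Hits;           // one copy per instantiation of Tally<T>

    public enum Kind { One, Two }     // one distinct enum type per instantiation

    public static void Hit()
    {
        Hits++;
    }
}

// Hypothetical demo class, invented for illustration only.
static class StaticsDemo
{
    public static string CountHits()
    {
        Tally<int>.Hits = 0;
        Tally<string>.Hits = 0;

        Tally<int>.Hit();
        Tally<int>.Hit();
        Tally<string>.Hit();

        // The two counters were incremented independently.
        return Tally<int>.Hits + "/" + Tally<string>.Hits;
    }

    public static bool NestedEnumsAreSameType()
    {
        return typeof(Tally<int>.Kind) == typeof(Tally<string>.Kind);
    }
}
```

StaticsDemo.CountHits() returns "2/1", and StaticsDemo.NestedEnumsAreSameType() returns false.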

Some Words of Caution

Before you jump the gun and start sprinkling generics throughout all of your applications, you should consider the impacts on usability and maintainability. Here are some high-level points to keep in mind:

Many users have a difficult time with the generics syntax. If you understood the above without having to reread any sections, it's likely that you're familiar with one of generics' close cousins, such as C++ templates or Eiffel's or Java 5's generics system. Most people don't latch on so quickly. It's likely that a large portion of the .NET Framework developer base will not have read about or used generics in production even years after their release.
Choosing the correct naming for generic type parameters can also make a large difference in terms of usability. You'll notice many types using the traditional single-letter convention, starting with T and using the next letters in the alphabet for additional parameters. List<T> uses this very convention. But where the parameter's meaning isn't completely obvious, providing a more descriptive name (for example, System.EventHandler<TEventArgs>) can substantially improve usability. The convention is to prefix a descriptive type parameter name with a T.
Generic types and methods with high arity are difficult to work with. Some languages (e.g., C#) will infer generic type arguments from the types of ordinary arguments, which can help to ease the burden, but in general high arity is best avoided. It's very easy to forget the order in which type parameters appear, which causes problems when writing the code and even worse ones when maintaining it.
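The naming and inference points can be illustrated together. In this sketch, the InferenceDemo class and its Pair and Swap methods are hypothetical; KeyValuePair<TKey, TValue> is the Framework type from System.Collections.Generic.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class, invented for illustration only.
static class InferenceDemo
{
    // Descriptive, T-prefixed names make the role of each parameter clear.
    public static KeyValuePair<TKey, TValue> Pair<TKey, TValue>(TKey key, TValue value)
    {
        return new KeyValuePair<TKey, TValue>(key, value);
    }

    // With a single, obvious parameter, plain T is sufficient.
    public static void Swap<T>(ref T left, ref T right)
    {
        T temp = left;
        left = right;
        right = temp;
    }
}
```

A caller can write InferenceDemo.Pair("answer", 42) and let the C# compiler infer TKey as string and TValue as int, which is equivalent to writing InferenceDemo.Pair<string, int>("answer", 42). Likewise, InferenceDemo.Swap(ref a, ref b) needs no explicit type argument when a and b are both ints.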

There are also some performance considerations. We saw above that in situations that would otherwise require boxing and unboxing of values, generics can yield performance benefits. However, you pay a cost in generated code size (i.e., working set) when a single generic type has a large number of unique instantiations, especially over value type arguments. The reason is that specialized code is needed to work with the different type arguments. This is discussed in further detail in the context of the Just in Time (JIT) Compiler in Chapter 3.

Constraints

We've spoken about generics without introducing the notion of constraints thus far. But constraints are very powerful, enabling you to constrain the argument for a given type parameter using criteria and to perform operations inside type and method definitions using that type that are statically type-safe given those constraints. You can make assumptions about the type argument, and the runtime will guarantee that they are true. Without constraints, all you can do with something typed as a type parameter is to treat it like an Object and to pass it to other things with the same type parameter.

Constraining on Type

There are two ways to constrain type parameters. The first is to define that a type parameter must be polymorphically compatible with a specific type, meaning that it inherits from (or is) a common base type or implements a specific interface. You can think of a type parameter without constraints as being implicitly constrained to System.Object. This constraint enables you to choose any arbitrary nonsealed base class or an interface instead. C# makes this easy with special syntax, for example:

    class Foo<T> where T : IComparable<T>
    {
        public void Bar(T x, T y)
        {
            int comparison = x.CompareTo(y);
            // ...
        }
    }

The where T : <type> syntax specifies the constraint type, in this case declaring that any argument for T must be a type that implements IComparable<T>. Notice that inside the type's definition we can now invoke IComparable<T> operations on instances typed as T. This would be true of members on a class as well, that is, if we had constrained to a base class. The same syntax may be applied to generic methods:

    class Foo
    {
        public void Bar<T>(T x, T y) where T : IComparable<T>
        {
            int comparison = x.CompareTo(y);
            // ...
        }
    }

These examples actually show off a bit more of the power of generics — that is, the fact that the type parameter is in scope in the constraint itself, enabling you to define the constraint in terms of the runtime type argument. In other words, the constraint mentions T in that it specifies T must implement IComparable<T>. This has the potential to be confusing for newcomers to generics but is quite expressive indeed. You can of course use plain-old base types and interfaces too:

    class Foo<T> where T : IEnumerable {}
    class Foo<T> where T : Exception {}
    // And so forth...

Again, this can be applied to both generic type and method type parameters.
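As a brief worked sketch of both kinds of type constraint, consider the following. The ConstraintDemo class and its Max and Summarize methods are hypothetical names invented for this illustration.

```csharp
using System;

// Hypothetical helper class, invented for illustration only.
static class ConstraintDemo
{
    // Interface constraint: CompareTo is available because every
    // argument for T must implement IComparable<T>.
    public static T Max<T>(T x, T y) where T : IComparable<T>
    {
        return x.CompareTo(y) >= 0 ? x : y;
    }

    // Base class constraint: members of Exception, such as Message,
    // are available on anything typed as TException.
    public static string Summarize<TException>(TException error)
        where TException : Exception
    {
        return error.GetType().Name + ": " + error.Message;
    }
}
```

ConstraintDemo.Max(3, 7) returns 7, and ConstraintDemo.Summarize(new InvalidOperationException("boom")) returns "InvalidOperationException: boom". A call such as ConstraintDemo.Max(new object(), new object()) is rejected by the compiler, because object does not implement IComparable<object>. Note that Max does not guard against a null first argument.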

Special Runtime Constraints

The second way to constrain type parameters is to use one of the special constraints offered by the CLR. There are three. Two indicate whether the type argument is a reference or value type (class and struct) and use the same syntax as above, differing only in that the keyword class or struct takes the place of the type name, for example:

    class OnlyRefTypes<T> where T : class {}
    class OnlyValTypes<T> where T : struct {}

One interesting thing to note is that both the class and struct constraints intentionally exclude the special type System.Nullable<T>. This is because Nullable is somewhere in between a reference and value type in the runtime, and neither constraint was deemed appropriate by the designers. Thus, it is not valid to supply a Nullable<T> type argument for a type parameter constrained to either class or struct.
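The following sketch shows what each of these two constraints permits inside generic code. The SpecialConstraintDemo class and its methods are hypothetical names invented for this illustration; the calls shown in comments are ones the C# compiler rejects.

```csharp
using System;

// Hypothetical helper class, invented for illustration only.
static class SpecialConstraintDemo
{
    // class constraint: T is a reference type, so comparing against
    // null is legal inside the method.
    public static T FirstNonNull<T>(T first, T second) where T : class
    {
        return first != null ? first : second;
    }

    // struct constraint: T is a non-nullable value type, so T?
    // (that is, Nullable<T>) can be formed from it.
    public static T? ToNullable<T>(T value, bool hasValue) where T : struct
    {
        return hasValue ? (T?)value : null;
    }

    // Neither of these compiles, because int? (Nullable<int>)
    // satisfies neither constraint:
    //
    //   FirstNonNull<int?>(1, 2);
    //   ToNullable<int?>(1, true);
}
```

SpecialConstraintDemo.FirstNonNull<string>(null, "b") returns "b", and SpecialConstraintDemo.ToNullable<int>(5, false) returns a Nullable<int> with no value.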

Lastly, you may constrain a type parameter to accept only arguments that have a default (public, parameterless) constructor. This enables the generic code to create instances of them using the default constructor. For example:

    class Foo
    {
        public void Bar<T>() where T : new()
        {
            T t = new T(); // This is possible only because of T : new()
            // ...
        }
    }

The emitted IL code uses the Activator.CreateInstance API to generate an instance of T, binding to the default constructor at runtime. This API is also used for reflection- and COM-based instantiation. It utilizes dynamic information available within internal CLR data structures to construct a new instance for you. This is mostly transparent, although if the constructor throws an exception, you will notice the call to CreateInstance in the call-stack.
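A typical use of the new() constraint is a small factory. The Factory class and its CreateMany method below are hypothetical names invented for this sketch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical factory, invented for illustration only.
static class Factory
{
    // new() guarantees that every argument for T has a public
    // parameterless constructor, so 'new T()' is legal here.
    public static List<T> CreateMany<T>(int count) where T : new()
    {
        List<T> items = new List<T>(count);
        for (int i = 0; i < count; i++)
            items.Add(new T());
        return items;
    }
}
```

Factory.CreateMany<System.Text.StringBuilder>(3) returns a list of three distinct StringBuilder instances. Value types also satisfy the constraint, so Factory.CreateMany<int>(2) returns a list containing two zeros. By contrast, Factory.CreateMany<string>(1) does not compile, because string has no public parameterless constructor.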