Chapter 2: C# Language Features | Introducing Microsoft® LINQ

A full knowledge of the C# 3.0 language enhancements is not necessary to use Language Integrated Query (LINQ). For example, none of the new language features require a modification of the common language runtime (CLR). LINQ relies on new compilers (C# 3.0 or Microsoft Visual Basic 9.0), and these compilers generate intermediate code that works well on Microsoft .NET 2.0, given that you have the LINQ libraries available.

However, in this chapter, we provide short descriptions of C# features (ranging from C# 1.x to C# 3.0) that you need to clearly understand to work with LINQ most effectively. If you decide to skip this chapter, you can come back to it later when you want to understand what is really going on within LINQ syntax.

C# 2.0 Revisited

C# 2.0 improved the original C# language in many ways. For example, the introduction of generics enabled developers to use C# to define methods and classes having one or more type parameters. Generics are a fundamental pillar of LINQ.

In this section, we will describe several C# 2.0 features that are important to LINQ: generics, anonymous methods (which are the basis of lambda expressions in C# 3.0), the yield keyword, and the IEnumerable interface. You need to understand these concepts well to best understand LINQ.

Generics

Many programming languages handle variables and objects by defining specific types and strict rules about converting between types. Code that is written in a strongly typed language lacks something in terms of generalization, however. Consider the following code:

    int Min( int a, int b ) {
        if (a < b) return a;
        else return b;
    }

To use this code, we need a different version of Min for each type of parameter we want to compare. Developers who are accustomed to using objects as placeholders for a generic type (which is common with collections) might be tempted to write a single Min function such as this:

    object Min( object a, object b ) {
        if (a < b) return a;
        else return b;
    }

Unfortunately, the less than operator (<) is not defined for the generic object type. We need to use a common (or “generic”) interface to do that:

    IComparable Min( IComparable a, IComparable b ) {
        if (a.CompareTo( b ) < 0) return a;
        else return b;
    }

However, even if we solve this problem, we are faced with a bigger issue: the indeterminate result type of the Min function. A caller of Min that passes two integers must cast the result from IComparable back to int. The cast is checked at run time, so it can raise an exception if the types do not match, and it always carries a CPU cost because the integers are boxed on the way in and unboxed on the way out:

 int a = 5, b = 10; int c = (int) Min( a, b );

C# 2.0 solved this problem with generics. The basic principle of generics is that type resolution is moved from the C# compiler to the jitter. Here is the generic version of the Min function:

    T Min<T>( T a, T b ) where T : IComparable<T> {
        if (a.CompareTo( b ) < 0) return a;
        else return b;
    }

Note

The jitter is the run-time compiler that is part of the .NET runtime. It translates intermediate language (IL) code to machine code. When you compile .NET source code, the compiler generates an executable image containing IL code, which is compiled in machine code instructions by the jitter at some point before the first execution.

Moving type resolution to the jitter is a good compromise: the jitter can generate many versions of the same code, one for each type that is used. This approach is similar to a macro expansion, but it differs in the optimizations used to avoid code proliferation: all versions of a generic function that use reference types as type arguments share the same compiled code, while each caller still sees a distinct, strongly typed version.

With generics, instead of this:

 int a = 5, b = 10; int c = (int) Min( a, b );

you can write code such as this:

 int a = 5, b = 10; int c = Min<int>( a, b );

The cast for Min results has disappeared, and the code will run faster. Moreover, the compiler can infer the generic T type of the Min function from the parameters, and we can write this simpler form:

 int a = 5, b = 10; int c = Min( a, b );

Type Inference

Type inference is a key feature. It allows you to write more abstract code, making the compiler handle details about types. Nevertheless, the C# implementation of type inference does not remove type safety and can intercept wrong code (for example, a call that uses incompatible types) at compile time.
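The following is a minimal sketch that puts the generic Min function and type inference together in one compilable program. The class name GenericMinSketch and the sample values are ours, chosen only for illustration. The commented-out line shows the kind of call that the compiler rejects.

```csharp
using System;

public static class GenericMinSketch {
    // The generic Min from the text: T must be comparable to itself.
    public static T Min<T>( T a, T b ) where T : IComparable<T> {
        if (a.CompareTo( b ) < 0) return a;
        else return b;
    }

    public static void Main() {
        int i = Min( 5, 10 );                // T is inferred as int
        string s = Min( "pear", "apple" );   // T is inferred as string
        int j = Min<int>( 7, 3 );            // T is given explicitly
        Console.WriteLine( i );              // prints 5
        Console.WriteLine( s );              // prints apple
        Console.WriteLine( j );              // prints 3

        // The next line would not compile, because no single T
        // fits both an int and a string argument:
        // int bad = Min( 5, "apple" );
    }
}
```

No cast appears at any call site, and the mismatch in the last line is reported when the code is compiled, not when it runs.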

Generics can also be used with type declarations (as classes and interfaces) and not only to define generic methods. As we said earlier, a detailed explanation of generics is not the goal of this book, but we want to emphasize that you have to be comfortable with generics to work well with LINQ.

Delegates

A delegate is a class that encapsulates one or more methods. Internally, one delegate stores a list of method pointers, each of which can be paired with a reference to an instance of the class containing an instance method.

A delegate can contain a list of several methods, but our attention in this section is on delegates that contain only one method. From an abstract point of view, a delegate of this type is like a “code container.” The code in that container is not modifiable, but it can be moved along a call stack or stored in a variable until its use is no longer necessary. It stores a context of execution (the object instance), extending the lifetime of that object for as long as the delegate is valid.

The syntax evolution of delegates is the foundation for anonymous methods, which we will cover in the next section. The declaration of a delegate actually defines a type that will be used to create instances of the delegate itself. The delegate declaration requires a complete method signature. In the code in Listing 2-1, we declare three different types: each one can be instantiated only with references to methods with the same signatures.

Listing 2-1: Delegate declaration

    delegate void SimpleDelegate();
    delegate int ReturnValueDelegate();
    delegate void TwoParamsDelegate( string name, int age );

Delegates are a typed and safe form of old-style C function pointers. With C# 1.x, a delegate instance can be created only through an explicit object creation, such as those shown in Listing 2-2.

Listing 2-2: Delegate instantiation (C# 1.x)

    public class DemoDelegate {
        void MethodA() { … }
        int MethodB() { … }
        void MethodC( string x, int y ) { … }
        void CreateInstance() {
            SimpleDelegate a = new SimpleDelegate( MethodA );
            ReturnValueDelegate b = new ReturnValueDelegate ( MethodB );
            TwoParamsDelegate c = new TwoParamsDelegate( MethodC );
            // …
        }
    }

The original syntax for creating a delegate instance is tedious: you always have to spell out the name of the delegate class, even when the context allows only one type. Because the context admits no other type, however, the delegate type can be safely inferred from the context of an expression.

C# 2.0 is aware of this capability and allows you to skip part of the syntax. The previous delegate instances we have shown can be created without the new keyword. You only need to specify the method name. The compiler infers the delegate type from the assignment. If you are assigning a SimpleDelegate type variable, the new SimpleDelegate code is automatically generated by the C# compiler, and the same is true for any delegate type. The code for C# 2.0 shown in Listing 2-3 produces the same compiled IL code as the C# 1.x sample code.

Listing 2-3: Delegate instantiation (C# 2.0)

    public class DemoDelegate {
        void MethodA() { … }
        int MethodB() { … }
        void MethodC( string x, int y ) { … }
        void CreateInstance() {
            SimpleDelegate a = MethodA;
            ReturnValueDelegate b = MethodB;
            TwoParamsDelegate c = MethodC;
            // …
        }
        // …
    }

You can also define a generic delegate type, which is useful when a delegate is defined in a generic class and is an important capability for many LINQ features.
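As an illustration of a generic delegate type, here is a small sketch. The delegate name Transformer, the helper methods, and the class name are hypothetical and are not taken from any library. The point is only that one delegate declaration can describe many concrete signatures.

```csharp
using System;

// Hypothetical generic delegate: takes a TArg and returns a TResult.
public delegate TResult Transformer<TArg, TResult>( TArg input );

public static class GenericDelegateSketch {
    public static int Length( string s ) { return s.Length; }
    public static string Stars( int n ) { return new string( '*', n ); }

    // A generic method that accepts any matching Transformer instance.
    public static TResult Apply<TArg, TResult>(
            Transformer<TArg, TResult> transformer, TArg value ) {
        return transformer( value );
    }

    public static void Main() {
        // The same delegate declaration is closed over different types.
        Transformer<string, int> measure = Length;
        Transformer<int, string> draw = Stars;

        Console.WriteLine( Apply( measure, "LINQ" ) );   // prints 4
        Console.WriteLine( Apply( draw, 3 ) );           // prints ***
    }
}
```

In the calls to Apply, the compiler infers TArg and TResult from the delegate argument, so the type arguments do not have to be repeated.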

The common use for a delegate is to inject some code into an existing method. In Listing 2-4, we assume that Repeat10Times is an existing method that we do not want to change.

Listing 2-4: Common use for a delegate

    public class Writer {
        public string Text;
        public int Counter;
        public void Dump() {
            Console.WriteLine( Text );
            Counter++;
        }
    }

    public class DemoDelegate {
        void Repeat10Times( SimpleDelegate someWork ) {
            for (int i = 0; i < 10; i++) someWork();
        }
        void Run1() {
            Writer writer = new Writer();
            writer.Text = "C# chapter";
            this.Repeat10Times( writer.Dump );
            Console.WriteLine( writer.Counter );
        }
        // …
    }

The existing callback is defined as SimpleDelegate, but we want to pass a string to the injected method and we want to count how many times the method is called. We define the Writer class, which contains instance data that acts as a sort of parameter for the Dump method. As you can see, we need to define a separate class just to put together code and data that we want to use. A simpler way to code a similar pattern is to use the anonymous method syntax.

Anonymous Methods

In the previous section, we illustrated a common use for a delegate. C# 2.0 established a way to write the code shown in Listing 2-4 more concisely by using an anonymous method. Listing 2-5 shows an example.

Listing 2-5: Using an anonymous method

    public class DemoDelegate {
        void Repeat10Times( SimpleDelegate someWork ) {
            for (int i = 0; i < 10; i++) someWork();
        }
        void Run2() {
            int counter = 0;
            this.Repeat10Times( delegate {
                Console.WriteLine( "C# chapter" );
                counter++;
            } );
            Console.WriteLine( counter );
        }
        // …
    }

In this code, we no longer declare the Writer class. The compiler does this for us automatically with a hidden and automatically generated class name. Instead, we define a method inside the Repeat10Times call, which might seem as though we are really passing a piece of code as a parameter. Nevertheless, the compiler converts this code into a pattern similar to the common delegate example with an explicit Writer class. The only evidence for this conversion in our source code is the delegate keyword before the code block. This syntax is called an anonymous method.
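To make the conversion more concrete, the sketch below places the anonymous method next to a hand-written approximation of what the compiler produces. The class name Run2Closure and the method name Body are our own stand-ins: the real generated class has a hidden, compiler-chosen name, and its exact shape is an implementation detail. What matters is that the local variable counter becomes a field of an object, so the delegate and the enclosing method share it.

```csharp
using System;

public delegate void SimpleDelegate();

// Hand-written stand-in for the hidden class the compiler generates.
// The captured local variable becomes a field of this class.
public class Run2Closure {
    public int counter;
    public void Body() {
        Console.WriteLine( "C# chapter" );
        counter++;
    }
}

public static class ClosureSketch {
    public static void Repeat10Times( SimpleDelegate someWork ) {
        for (int i = 0; i < 10; i++) someWork();
    }

    // Version written with an anonymous method, as in the listing above.
    public static int RunAnonymous() {
        int counter = 0;
        Repeat10Times( delegate {
            Console.WriteLine( "C# chapter" );
            counter++;
        } );
        return counter;
    }

    // Approximate manual expansion of the same method.
    public static int RunExpanded() {
        Run2Closure closure = new Run2Closure();
        closure.counter = 0;
        Repeat10Times( closure.Body );
        return closure.counter;
    }

    public static void Main() {
        Console.WriteLine( RunAnonymous() );   // prints 10 after ten text lines
        Console.WriteLine( RunExpanded() );    // prints 10 after ten text lines
    }
}
```

Both methods return 10, which shows that the anonymous method updates the same counter that the enclosing method reads afterward.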

Note

Remember that you cannot pass code into a variable. You can only pass a pointer to some code. Repeat this to yourself a couple of times before going on.

The delegate keyword for anonymous methods precedes the code block. When we have a method signature for a delegate that contains one or more parameters, this syntax allows us to define the names of the parameters for the delegate. The code in Listing 2-6 defines an anonymous method for the TwoParamsDelegate delegate type.

Listing 2-6: Parameters for an anonymous method

    public class DemoDelegate {
        void Repeat10Times( TwoParamsDelegate callback ) {
            for (int i = 0; i < 10; i++) callback( "Linq book", i );
        }
        void Run3() {
            Repeat10Times( delegate( string text, int age ) {
                Console.WriteLine( "{0} {1}", text, age );
            } );
        }
        // …
    }

We are now passing two implicit parameters to the delegate inside the Repeat10Times method. Think about it: if you were to remove the declaration for the text and age parameters, the delegate block would generate two errors of undefined names.

Important

You will (indirectly) use delegates and anonymous methods in C# 3.0, and for this reason, it is important to understand the concepts behind them. Only in this way can you master this higher level of abstraction that hides growing complexity.

Enumerators and Yield

C# 1.x defines two interfaces to support enumeration. The namespace System.Collections contains these declarations, shown in Listing 2-7.

Listing 2-7: IEnumerator and IEnumerable declarations

    public interface IEnumerator {
        bool MoveNext();
        object Current { get; }
        void Reset();
    }

    public interface IEnumerable {
        IEnumerator GetEnumerator();
    }

An object that implements IEnumerable can be enumerated through an object that implements IEnumerator. The enumeration can be performed by calling the MoveNext method until it returns false.

The code in Listing 2-8 defines a class that can be enumerated in this way. As you can see, the CountdownEnumerator class is more complex, and it implements the enumeration logic in a single place. In this sample, the enumerator does not really enumerate anything but simply returns descending numbers starting from the StartCountdown number defined in the Countdown class (which is also the enumerated class).

Listing 2-8: Enumerable class

    public class Countdown : IEnumerable {
        public int StartCountdown;
        public IEnumerator GetEnumerator() {
            return new CountdownEnumerator( this );
        }
    }

    public class CountdownEnumerator : IEnumerator {
        private int _counter;
        private Countdown _countdown;
        public CountdownEnumerator( Countdown countdown ) {
            _countdown = countdown;
            Reset();
        }
        public bool MoveNext() {
            if (_counter > 0) {
                _counter--;
                return true;
            }
            else {
                return false;
            }
        }
        public void Reset() {
            _counter = _countdown.StartCountdown;
        }
        public object Current {
            get {
                return _counter;
            }
        }
    }

The real enumeration happens only when the CountdownEnumerator is used by a code block. For example, one possible use is shown in Listing 2-9.

Listing 2-9: Sample enumeration code

    public class DemoEnumerator {
        public static void DemoCountdown() {
            Countdown countdown = new Countdown();
            countdown.StartCountdown = 5;
            IEnumerator i = countdown.GetEnumerator();
            while (i.MoveNext()) {
                int n = (int) i.Current;
                Console.WriteLine( n );
            }
            i.Reset();
            while (i.MoveNext()) {
                int n = (int) i.Current;
                Console.WriteLine( "{0} BIS", n );
            }
        }
        // …
    }

The GetEnumerator call provides the enumerator object. We make two loops on it just to show the use of the Reset method. We need to cast the Current return value to int because we are using the nongeneric version of the enumerator interfaces.

Note

C# 2.0 introduced enumeration support through generics. The namespace System.Collections.Generic contains generic IEnumerable<T> and IEnumerator<T> declarations. These interfaces eliminate the need to convert data in and out from an object type. This capability is important when enumerating value types because there are no more box or unbox operations that might affect performance.

Since C# 1.x, enumeration code can be simplified by using the foreach statement. The code in Listing 2-10 produces a result equivalent to the previous example.

Listing 2-10: Enumeration using a foreach statement

    public class DemoEnumeration {
        public static void DemoCountdownForeach() {
            Countdown countdown = new Countdown();
            countdown.StartCountdown = 5;
            foreach (int n in countdown) {
                Console.WriteLine( n );
            }
            foreach (int n in countdown) {
                Console.WriteLine( "{0} BIS", n );
            }
        }
        // …
    }

Using foreach, the compiler generates an initial call to GetEnumerator and a call to MoveNext before each iteration of the loop. The real difference is that the code generated by foreach never calls the Reset method: because the sample contains two foreach loops, two instances of CountdownEnumerator objects are created instead of one.
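The sketch below shows roughly how a foreach loop over a nongeneric IEnumerable can be written out by hand. It is an approximation for illustration, not the exact code the compiler emits, and the class and method names are ours. Note that the expansion never calls Reset, and that it disposes the enumerator when the enumerator happens to implement IDisposable.

```csharp
using System;
using System.Collections;

public static class ForeachExpansionSketch {
    // The loop as you would normally write it.
    public static int SumWithForeach( IEnumerable source ) {
        int sum = 0;
        foreach (int n in source) {
            sum += n;
        }
        return sum;
    }

    // Approximate manual expansion of the same loop.
    public static int SumExpanded( IEnumerable source ) {
        int sum = 0;
        IEnumerator e = source.GetEnumerator();   // one call per loop
        try {
            while (e.MoveNext()) {                // one call per iteration
                int n = (int) e.Current;          // cast from object
                sum += n;
            }
        }
        finally {
            // Reset is never called; the enumerator is only disposed.
            IDisposable d = e as IDisposable;
            if (d != null) d.Dispose();
        }
        return sum;
    }

    public static void Main() {
        int[] data = { 4, 3, 2, 1, 0 };
        Console.WriteLine( SumWithForeach( data ) );   // prints 10
        Console.WriteLine( SumExpanded( data ) );      // prints 10
    }
}
```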

Note

The foreach statement can also be used with classes that do not expose an IEnumerable interface but that have a public GetEnumerator method.
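A short sketch of this pattern-based behavior follows. The types ThreeTwoOne and ThreeTwoOneEnumerator are hypothetical and implement no enumeration interface at all. They only expose a public GetEnumerator method whose result has a MoveNext method and a Current property.

```csharp
using System;

// Hypothetical type: does not implement IEnumerable.
public class ThreeTwoOne {
    public ThreeTwoOneEnumerator GetEnumerator() {
        return new ThreeTwoOneEnumerator();
    }
}

// Hypothetical type: does not implement IEnumerator.
public class ThreeTwoOneEnumerator {
    private int _next = 4;
    public bool MoveNext() {
        _next--;
        return _next > 0;
    }
    // Strongly typed Current: no cast from object is required.
    public int Current {
        get { return _next; }
    }
}

public static class PatternForeachSketch {
    public static int Sum() {
        int sum = 0;
        foreach (int n in new ThreeTwoOne()) {
            Console.WriteLine( n );   // prints 3, then 2, then 1
            sum += n;
        }
        return sum;
    }

    public static void Main() {
        Console.WriteLine( Sum() );   // prints 6
    }
}
```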

C# 2.0 introduced the yield statement through which the compiler automatically generates a class that implements the IEnumerator interface returned by the GetEnumerator method. The yield statement can be used only immediately before a return or break keyword. The code in Listing 2-11 generates a class equivalent to the previous CountdownEnumerator.

Listing 2-11: Enumeration using a yield statement

    public class CountdownYield : IEnumerable {
        public int StartCountdown;
        public IEnumerator GetEnumerator() {
            for (int i = StartCountdown - 1; i >= 0; i--) {
                yield return i;
            }
        }
    }

From a logical point of view, the yield return statement is equivalent to suspending execution, which is resumed at the next MoveNext call. Remember that the GetEnumerator method is called only once for the whole enumeration, and it returns a class that implements an IEnumerator interface. Only that class really implements the behavior defined in the method that contains the yield statement.
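The suspend-and-resume behavior can be observed with a small trace, sketched below. The names YieldTraceSketch, TwoValues, and Trace are ours. For brevity the iterator here returns IEnumerable<int> instead of an enumerator, which C# also allows for iterator methods. The log records the order in which the caller and the iterator body actually run.

```csharp
using System;
using System.Collections.Generic;

public static class YieldTraceSketch {
    // An iterator that records each step it executes.
    public static IEnumerable<int> TwoValues( List<string> log ) {
        log.Add( "start" );
        yield return 1;
        log.Add( "between" );
        yield return 2;
        log.Add( "end" );
    }

    public static List<string> Trace() {
        List<string> log = new List<string>();
        using (IEnumerator<int> e = TwoValues( log ).GetEnumerator()) {
            // No line of the iterator body has executed at this point.
            log.Add( "got enumerator" );
            while (e.MoveNext()) {
                log.Add( "value " + e.Current );
            }
        }
        return log;
    }

    public static void Main() {
        // Prints: got enumerator, start, value 1, between, value 2, end
        foreach (string line in Trace()) {
            Console.WriteLine( line );
        }
    }
}
```

The entry "got enumerator" comes before "start" because the iterator body begins to execute only at the first MoveNext call, and each later call resumes it just after the previous yield return.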

A method that contains yield statements is called an iterator. An iterator can include many yield statements. The code in Listing 2-12 is perfectly valid and is functionally equivalent to the previous CountdownYield class with a StartCountdown value of 5.

Listing 2-12: Multiple yield statements

    public class CountdownYieldMultiple : IEnumerable {
        public IEnumerator GetEnumerator() {
            yield return 4;
            yield return 3;
            yield return 2;
            yield return 1;
            yield return 0;
        }
    }

By using the generic version of IEnumerator, it is possible to define a strongly typed version of the CountdownYield class, shown in Listing 2-13.

Listing 2-13: Enumeration using yield (typed)

    public class CountdownYieldTypeSafe : IEnumerable<int> {
        public int StartCountdown;
        IEnumerator IEnumerable.GetEnumerator() {
            return this.GetEnumerator();
        }
        public IEnumerator<int> GetEnumerator() {
            for (int i = StartCountdown - 1; i >= 0; i--) {
                yield return i;
            }
        }
    }

The strongly typed version contains two GetEnumerator methods: one is an explicit interface implementation for compatibility with nongeneric code (returning IEnumerator), and the other is the strongly typed one (returning IEnumerator<int>).

The internal implementation of LINQ to Objects makes extensive use of enumerations and yield. Even if they work under the covers, keep their behavior in mind while you are debugging code.