Delegates | Professional .NET Framework 2.0 (Programmer to Programmer)

A delegate is a strongly typed function pointer in managed code. A delegate type defines an expected return type and parameter types; an instance of a delegate is a target object and bound method that matches those expected types. A key feature of many dynamic and functional languages is first-class functions, meaning that a function is just a value like anything else in the type system. You can pass functions around as arguments, return them from functions, store them in fields of data structures, and so forth. C offers function pointers for this purpose. Delegates are the managed-code equivalent of such features in other environments.

To work with a delegate, you must first define a delegate type. We discuss this in depth in Chapter 2. All delegate types derive from System.MulticastDelegate. There is also a System.Delegate type (which is MulticastDelegate's base type), but the Framework has evolved such that no delegates actually derive directly from it in practice. This is an artifact of legacy usage; version 1.0 of the product used to distinguish between the two types.

A delegate type represents the strongly typed signature for functions that instances of this delegate are able to refer to. This enables delegate-based method calls to sit nicely in the CLR's strongly typed type system. An instance of a delegate type is then formed over a target object and method. In the simple case, a delegate instance points to two things: a target object and a code pointer to the method to invoke. (For static methods, where there is no this pointer, null is used for the target.) A delegate can also refer to a chain (a linked list) of targets and code pointers, each of which gets called during an Invoke.
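The target/method pairing can be observed through the Target and Method properties that every delegate inherits from System.Delegate. The following is a minimal sketch; the Formatter delegate and the Greeter class are hypothetical names introduced only for illustration.

```csharp
using System;

// Hypothetical delegate type, used only for this illustration.
delegate string Formatter(int x);

// Hypothetical class with one instance method and one static method.
class Greeter
{
    public string Describe(int x) { return "instance:" + x; }
    public static string DescribeStatic(int x) { return "static:" + x; }
}

static class TargetAndMethodDemo
{
    static void Main()
    {
        Greeter g = new Greeter();
        Formatter bound = g.Describe;                // instance method: target is g
        Formatter unbound = Greeter.DescribeStatic;  // static method: no target

        Console.WriteLine(ReferenceEquals(bound.Target, g)); // True
        Console.WriteLine(bound.Method.Name);                // Describe
        Console.WriteLine(unbound.Target == null);           // True
        Console.WriteLine(unbound.Method.Name);              // DescribeStatic
    }
}
```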

Inside Delegates

Imagine that we have defined our own MyDelegate type like this in C#:

 delegate string MyDelegate(int x);

This represents a function pointer type that can refer to any method taking a single int argument and returning a string. The naming of the parameters is irrelevant. When you work with an instance of this delegate type, you'll declare variables of type MyDelegate. This type declaration is C# syntactic sugar. Behind the scenes, the compiler is generating a new type for you:

 private sealed class MyDelegate : MulticastDelegate
 {
     public extern MyDelegate(object @object, IntPtr method);
     public extern virtual string Invoke(int x);
     public extern virtual IAsyncResult BeginInvoke(int x,
         AsyncCallback callback, object @object);
     public extern virtual string EndInvoke(IAsyncResult result);
 }

Notice that all of the methods are marked extern (i.e., runtime in IL). This means that the implementations are provided internally to the CLR rather than in managed code. The various methods are also marked virtual so that the CLR can play a trick. It enables the CLR to capture any calls through the delegate, and uses a thunk to produce the right code to make the actual call to the underlying target method.

Now imagine that we have our own custom type, MyType, with a method MyFunc whose signature matches MyDelegate:

 class MyType
 {
     public string MyFunc(int foo)
     {
         return "MyFunc called with the value '" + foo + "' for foo";
     }
 }

This type isn't special at all. But MyType.MyFunc's signature matches MyDelegate's exactly. (You might notice that the parameters are not named identically. That is all right, since delegates only require that the expected types be found in the correct signature positions.) We will use this as an example of how to use delegates to call functions.

Once we've got a delegate type in metadata and a target function we'd like to call, we must form an instance of the delegate over a target. This constructs a new instance of the delegate type using the constructor MyDelegate(object, IntPtr). There is little magic here: the code simply passes the target as the first argument and a pointer to the code function as the second. C# has some nice syntax for this:

 MyType mt = new MyType();
 MyDelegate md = mt.MyFunc;

The absence of parentheses when accessing mt.MyFunc might look strange at first. This is really the only time you will ever dereference an object reference and access a method as though it were a property. If this were a static method, mt would be replaced by the class name (e.g., MyType.MyStaticFunc); again, there are no parentheses. The IL for this construct more accurately describes what is going on:

 newobj instance void MyType::.ctor()
 ldftn  instance string MyType::MyFunc(int32)
 newobj instance void MyDelegate::.ctor(object, native int)

Here, we just construct a new MyType instance and then issue a ldftn instruction for the MyType.MyFunc method token. For a virtual method, this would have been a ldvirtftn instruction, which requires the target object on the stack so that it can resolve the virtual call. Ld*ftn takes as input a methoddef or methodref, and leaves behind a pointer to code that, when invoked, results in the target method being called. Notice that the resulting code pointer is instance agnostic: when it is called, the this pointer (for instance methods) must be supplied. We then pass the newly constructed MyType as the target, along with the code pointer, to the MyDelegate constructor.

While the IL produced looks like a simple set of instructions, the CLR's verifier and JIT actually have intimate knowledge of this calling sequence. The verifier ensures that the sequence of operations is typesafe. In other words, it ensures that you only pass function pointers to methods whose signature is 100% type compatible with the delegate type.

In-Memory Representation

Figure 14-3 depicts the memory layout. We have a single instance of both MyDelegate and MyType on the left-hand side (which is on the GC heap; in other words, these are both managed objects). The delegate instance holds a reference to the target, which, in this case, is an instance of MyType. On the right-hand side, you can see some of the internal CLR data structures. We discussed these structures in more detail in Chapter 3. The method table represents the type identity for any object instance and contains pointers to the code for each of that type's methods. This is how the delegate locates the code address to which it must bind. The delegate points directly at the code for the method after binding; in this case, that's the MyType.MyFunc method. Note that for static methods, the target is unnecessary and, thus, would be null.

Figure 14-3: Delegates inside the CLR.

One word of caution related to GC eligibility: Because both the delegate and the target are GC heap-allocated objects, each is collected only once it becomes unreachable. The delegate holds a reference to its target, so as long as an instance of the delegate lives, its corresponding target will remain alive. Especially in situations where you use a chain of delegates (for example, with CLR events), it's very easy to forget about this. The result is that you will inadvertently keep an object alive until the event source goes away. We discuss events further below.

Invoking the Delegate

Once a delegate has been constructed, it may be passed around like any ordinary object. It can be stored away in a field, in a hash table, provided as an argument to a method, and so forth. Delegates truly are first class in the CTS. But more interestingly, a delegate can be applied. In other words, you can invoke the method over which the delegate has been formed. The target provided at delegate construction time is passed as the this pointer to the function, and any arguments supplied at invocation time are passed to the actual method's code in memory.

Notice that the MyDelegate type above has an Invoke method that has parameters matching the wrapped function's parameters and that returns the same type as the underlying function. You can call this method directly or alternatively you can use your favorite language's syntactic support for it. For example, C# enables you to call a delegate as though it were a function name:

 MyDelegate md = /*...*/;
 Console.WriteLine(md.Invoke(5));
 Console.WriteLine(md(5));

In C#, both of these lines are equivalent. In fact, the latter gets compiled down to the former in the resulting IL. Assuming that md refers to MyType.MyFunc (as demonstrated above), both of these lines would result in the string "MyFunc called with the value '5' for foo" being printed to the console window. This works via an internal implementation that the CLR provides for Delegate.Invoke methods. It does the minimal work possible to patch up register and stack state so that arguments can be passed seamlessly to the underlying target code address.

Dynamic Invocation

What you've seen thus far is the CLR's static type system support for delegates. Delegate binding had to be supplied up front in the metadata and invocations handled through a fast method on the delegate called Invoke. But what if you don't know your target up front? Keeping with the theme of dynamic programming in this chapter, delegates also expose a fully dynamic programming model. Using the static method Delegate.CreateDelegate and instance method Delegate.DynamicInvoke, you can avoid having to emit any delegate-related IL whatsoever.

CreateDelegate offers a number of overloads. Each one requires that you supply a delegate type as the first argument. The various overloads offer a mixture of the following bits of information:

The target object over which to form the delegate. This can be null if binding to a static method, or you can use one of the overloads that omit this parameter altogether.
The method against which to bind. This can be provided in the form of a MethodInfo, or a string. If you're using the string overload, you must supply either an object target (from which to infer type), a Type specifying the target type, or a fully qualified method name.

DynamicInvoke takes an object[] and returns object instead of perfectly matching the signature of the underlying method. There is a cost associated with the internal code having to unravel the object array's contents and construct the call frame to dispatch to the underlying method. Therefore, don't expect as high performance with dynamic invocation as you would see with static invocation.
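As a rough sketch of the dynamic model, the following binds to MyType.MyFunc by name and by MethodInfo, reusing the MyDelegate and MyType definitions from earlier in the chapter. The DynamicDelegateDemo class and its CallByName helper are hypothetical names introduced for illustration.

```csharp
using System;
using System.Reflection;

delegate string MyDelegate(int x);

class MyType
{
    public string MyFunc(int foo)
    {
        return "MyFunc called with the value '" + foo + "' for foo";
    }
}

static class DynamicDelegateDemo
{
    // Hypothetical helper: binds by method name and invokes dynamically.
    public static object CallByName(object target, string methodName, int arg)
    {
        // The target's type is inferred from the instance passed in.
        Delegate d = Delegate.CreateDelegate(typeof(MyDelegate), target, methodName);
        // DynamicInvoke takes an object[] and returns object.
        return d.DynamicInvoke(new object[] { arg });
    }

    static void Main()
    {
        MyType mt = new MyType();
        Console.WriteLine(CallByName(mt, "MyFunc", 5));

        // The same binding expressed through a MethodInfo. Casting the result
        // back to MyDelegate restores the fast, statically typed Invoke path.
        MethodInfo mi = typeof(MyType).GetMethod("MyFunc");
        MyDelegate md = (MyDelegate)Delegate.CreateDelegate(typeof(MyDelegate), mt, mi);
        Console.WriteLine(md(5));
    }
}
```

Both calls print the same string as the static example. Only the first pays the DynamicInvoke cost of unpacking the object array.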

Events and Delegate Chains

A single delegate instance can refer to multiple target/method pairs. If a delegate holds more than one pair internally, when it is invoked it walks the chain and invokes each in order. This is precisely how C# events work. In fact, an event is simply a protocol for accessing MulticastDelegates, something we discussed in more detail in Chapter 2 on the CTS. The internal data fields of a delegate are actually immutable after it's been created, so in order to create a list you "combine" two delegates using the Delegate.Combine static method. This combination can involve two standalone delegates, two chains of delegates, or one standalone delegate and one chain, for example. All delegates in a chain must be of the same exact type, and the return of Combine is of the same type:

 delegate void FuncVoid<T>(T t);

 FuncVoid<int> f1 = delegate(int x) { Console.WriteLine(x * x); };
 FuncVoid<int> f2 = delegate(int x) { Console.WriteLine(x * 2); };
 FuncVoid<int> f3 = (FuncVoid<int>)Delegate.Combine(f1, f2);
 f3(10);

This example creates two delegates, f1 and f2, each of which takes an integer as input. The first squares its input, while the second doubles it. Both print the result to the console. Using the Combine method, we then create a combined delegate, f3, which holds references to both f1 and f2. When we invoke f3, both of the underlying delegates are invoked in sequence. The result is that "100" and "20" are printed out, respectively. Note that the delegates can have return values. For example, consider what would happen if f1 and f2 returned the integer instead of printing it out to the console. In this case, invoking the combined delegate calls both delegates and returns only the last value in the chain.
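The return-value behavior can be sketched as follows. The IntFunc delegate and the ChainReturnDemo class are hypothetical names. The use of GetInvocationList to recover every result is an addition not discussed in the text; it is a standard member of System.Delegate.

```csharp
using System;

// Hypothetical delegate type that returns a value.
delegate int IntFunc(int x);

static class ChainReturnDemo
{
    static int Square(int x) { return x * x; }
    static int Twice(int x) { return x * 2; }

    static IntFunc BuildChain()
    {
        return (IntFunc)Delegate.Combine(new IntFunc(Square), new IntFunc(Twice));
    }

    // Invoking the chain runs both targets but yields only the last result.
    public static int InvokeCombined(int x)
    {
        return BuildChain()(x);
    }

    // Walking the invocation list by hand recovers every result.
    public static int[] InvokeEach(int x)
    {
        Delegate[] targets = BuildChain().GetInvocationList();
        int[] results = new int[targets.Length];
        for (int i = 0; i < targets.Length; i++)
            results[i] = ((IntFunc)targets[i])(x);
        return results;
    }

    static void Main()
    {
        Console.WriteLine(InvokeCombined(10));      // 20
        int[] all = InvokeEach(10);
        Console.WriteLine(all[0] + ", " + all[1]);  // 100, 20
    }
}
```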

Recall that we stated earlier that a delegate keeps its target alive while somebody holds a reference to the delegate itself. Thus, you have to take care, especially in the case of long-running event handlers, to remove delegates from a combination when you no longer need them. The static method Delegate.Remove takes as input two Delegates and returns a Delegate. The first argument is the chain, while the second is the delegate to remove from the chain. Remove stops once it has removed a single delegate. If you want to remove all instances of a delegate from a chain, use the RemoveAll method.
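A minimal sketch of the difference between Remove and RemoveAll follows. The Notify delegate, the RemoveDemo class, and its members are hypothetical names introduced for illustration.

```csharp
using System;

// Hypothetical delegate type with no parameters or return value.
delegate void Notify();

static class RemoveDemo
{
    static int total;
    static void AddOne() { total += 1; }
    static void AddTen() { total += 10; }

    // Invokes the chain (if there is one) and reports how much its targets added.
    static int Run(Delegate chain)
    {
        total = 0;
        if (chain != null) ((Notify)chain)();
        return total;
    }

    public static int[] Demo()
    {
        Notify one = AddOne;
        Notify ten = AddTen;
        Delegate chain = Delegate.Combine(one, ten, one);     // AddOne, AddTen, AddOne
        Delegate removedOnce = Delegate.Remove(chain, one);   // one occurrence removed
        Delegate removedAll = Delegate.RemoveAll(chain, one); // every occurrence removed
        Delegate empty = Delegate.Remove(removedAll, ten);    // nothing left: null
        return new int[] { Run(chain), Run(removedOnce), Run(removedAll), empty == null ? 0 : 1 };
    }

    static void Main()
    {
        int[] r = Demo();
        Console.WriteLine(r[0]); // 12
        Console.WriteLine(r[1]); // 11
        Console.WriteLine(r[2]); // 10
        Console.WriteLine(r[3]); // 0 (removing the last delegate yields null)
    }
}
```

Because delegates are immutable, each call returns a new chain and leaves the original untouched, which is why chain still invokes all three targets.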

Parameterized Delegates

As noted in previous chapters, delegates can be parameterized on type, using the generics feature in 2.0. This means that you can create general-purpose delegates and have users specify the type arguments when they need to bind to a specific target.

A great example of where this feature has been useful is the new generic Framework delegate type EventHandler<TEventArgs>. Before generics, everybody wanting to create their own event handler would have to create their own EventHandler type that took the correct EventArgs e parameter. This was just another type to worry about. For instance:

 delegate void MyEventHandler(object sender, MyEventArgs e);

 class MyEventArgs : EventArgs { /* ... */ }

 class MyType
 {
     public event MyEventHandler MyEvent;
 }

But now you can save some typing and maintenance by using the generic version:

 class MyEventArgs : EventArgs { /* ... */ }

 class MyType
 {
     public event EventHandler<MyEventArgs> MyEvent;
 }
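To show the generic event in use, here is a small sketch that subscribes, raises, and unsubscribes. The Message field, the Raise method, and the EventDemo class are assumptions added for illustration; the original leaves the body of MyEventArgs elided.

```csharp
using System;

class MyEventArgs : EventArgs
{
    // Hypothetical payload; the original elides the members of this class.
    public readonly string Message;
    public MyEventArgs(string message) { Message = message; }
}

class MyType
{
    public event EventHandler<MyEventArgs> MyEvent;

    // Hypothetical method that fires the event.
    public void Raise(string message)
    {
        // Copy the delegate first: it is immutable, so the copy cannot
        // change underneath us if a handler is removed concurrently.
        EventHandler<MyEventArgs> handler = MyEvent;
        if (handler != null)
            handler(this, new MyEventArgs(message));
    }
}

static class EventDemo
{
    static string last;
    static void OnMyEvent(object sender, MyEventArgs e) { last = e.Message; }

    static void Main()
    {
        MyType source = new MyType();
        source.MyEvent += OnMyEvent;   // adds a delegate to the event's chain
        source.Raise("first");
        Console.WriteLine(last);       // first

        source.MyEvent -= OnMyEvent;   // removes it again
        source.Raise("second");
        Console.WriteLine(last);       // first (no handler ran)
    }
}
```

Unsubscribing with -= is also what releases the event source's reference to the handler's target.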

Consider another example of where generic delegates help to save code and make life easier. Many people like the anonymity of C-style function pointers, for example, where you don't need to name the signature. The example above with MyDelegate might look as follows in C:

 #include <stdio.h>

 char* myFunc(int foo)
 {
     // ...
 }

 int main()
 {
     char* (*ftnptr)(int) = myFunc;
     printf("%s", ftnptr(10));
 }

Notice how we specify inline the types of return values and parameters rather than declaring a separate type elsewhere. It's very simple with parameterized delegates to create a set of all-purpose delegates, which can be formed over essentially any target signature. You can then specify the types at bind time and get an effect similar to C-style function pointers:

 delegate T Func<T>();
 delegate T Func<T, A>(A a);
 delegate T Func<T, A, B>(A a, B b);
 // ...
 delegate void FuncVoid();
 delegate void FuncVoid<A>(A a);
 delegate void FuncVoid<A, B>(A a, B b);
 // ...

Now we can use Func to form a delegate over our MyType.MyFunc method defined above:

 Func<string, int> f = new MyType().MyFunc;
 f(10);

The C# compiler is able to match the right-hand method (MyFunc) with the left-hand side after the generic arguments have been supplied. If you expand the type arguments T and A for Func<T, A> by hand, the type expression Func<string, int> above creates a delegate with the signature string Func(int a). The compiler can infer that these types match up correctly.

Covariant and Contravariant Delegates

I've made some simplifications on the rules for binding up until this point. I stated that the return value and parameter types of the target method must match the delegate exactly in order to form a delegate instance over a target. This is technically incorrect: The CLR 2.0 permits so-called covariant and contravariant delegates. These terms are well defined in the field of computer science (and derived from the field of mathematics) and are forms of type system polymorphism. Covariance means that a more derived type can be substituted where a lesser derived type was expected; contravariance is the opposite: a lesser derived type can be substituted where a further derived type was expected.

Covariant input is already permitted in terms of what arguments a user can supply to a method. If your method expects BaseClass and somebody gives you an instance of DerivedClass (which subclasses BaseClass), the runtime permits it. This is bread-and-butter object orientation and polymorphism. Similarly, output arguments can be contravariant because if the caller expects a lesser derived type, there is no harm in supplying an instance of a further derived type. Delegates permit you to use the same type system relationships to bind delegates to a target method.

Note

The topics of co- and contravariance get relatively complex quickly. Much of the literature says that contravariant input and covariant output is sound, which happens to be the exact opposite of what I just stated! But the literature is usually referring to the ability to override a method with a co-/contravariant signature, in which case it's true. Derived classes can safely relax the typing requirements around input and tighten them around output if they deem it appropriate. Calling through the base signature is still type-safe, albeit more strict than necessary. (The CLR doesn't support co- and contravariance when overriding methods.)

For delegates, however, we are looking at the problem from a different angle: we're simply saying that anybody who makes a call through the delegate might be subject to more specific input requirements and can expect less specific output. If you consider that calling a function through a delegate is similar to calling through a base class signature, this is the same underlying principle.

As an example, consider the following definitions:

 class A { /* ... */ }
 class B : A { /* ... */ }
 class C : B { /* ... */ }

 B Foo(B b) { return b; }

If we wanted to form a delegate over the method Foo, ordinarily we'd need a delegate that returned a B and expected a B as input. This would look like MyDelegate1 below:

 delegate B MyDelegate1(B b);
 delegate B MyDelegate2(C c);
 delegate A MyDelegate3(B b);
 delegate A MyDelegate4(C c);

But we can use covariance on the input to require that people calling through the delegate supply a more specific type than B. MyDelegate2 above demonstrates this. Alternatively, we could use contravariance to hide the fact that Foo returned a B and instead make it look as if it only returns an A, as is the case with MyDelegate3. Lastly, we could use both co- and contravariance simultaneously, as shown with MyDelegate4. All four of these delegate signatures will bind to the Foo method above.
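The four bindings can be checked with a short sketch that reuses the definitions above. The VarianceDemo class is a hypothetical wrapper, and Foo is made static here only so that it can be called from Main.

```csharp
using System;

class A { }
class B : A { }
class C : B { }

delegate B MyDelegate1(B b);
delegate B MyDelegate2(C c);
delegate A MyDelegate3(B b);
delegate A MyDelegate4(C c);

static class VarianceDemo
{
    public static B Foo(B b) { return b; }

    static void Main()
    {
        // All four delegate types bind to the same method.
        MyDelegate1 d1 = Foo;  // exact match
        MyDelegate2 d2 = Foo;  // callers must supply a C; Foo accepts any B
        MyDelegate3 d3 = Foo;  // callers see an A; Foo actually returns a B
        MyDelegate4 d4 = Foo;  // both relaxations at once

        C c = new C();
        Console.WriteLine(ReferenceEquals(d1(c), c)); // True
        Console.WriteLine(ReferenceEquals(d2(c), c)); // True
        Console.WriteLine(ReferenceEquals(d3(c), c)); // True

        A result = d4(c);
        Console.WriteLine(result.GetType().Name);     // C
    }
}
```

The static type seen through d3 and d4 is A, but the object that comes back is still the C instance passed in.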

Asynchronous Delegates

The asynchronous programming model (APM) is described in detail in Chapter 10, which discusses threading in general. Delegates conform to the APM by implementing a BeginInvoke and EndInvoke set of methods. This exposes a simple way to invoke any method in an asynchronous fashion, even if the API itself doesn't expose a way to do so. The System.IO.Stream types, for instance, inherently support the APM with things like Begin- and End-Read and -Write. But if an API doesn't support asynchronicity implicitly, all you need to do is form a delegate over it and invoke it using the BeginInvoke method.

Note

Note that I/O asynchronous calls are not just generic Begin/End calls. They actually make use of I/O Completion Ports for high performance and scalability, especially for the server. So, if I've led you to believe that an asynchronous delegate is identical to intrinsic asynchronous support in Framework APIs, I apologize: There can be significant differences. When an API offers inherent APM support, favor that (always) over cooking up a delegate and invoking it asynchronously.

The APM section in Chapter 10 describes all of these topics in more detail, but for completeness we'll discuss them at a high level here in the context of asynchronous delegates. The BeginInvoke method has a parameter for each parameter of the underlying delegate (just like Invoke) and adds two parameters: an AsyncCallback delegate, which gets invoked when the asynchronous operation completes, and an object, which is simply passed as the IAsyncResult.AsyncState property value to the callback function. The method returns an IAsyncResult, which can be used to monitor completion, wait on the WaitHandle, or complete the asynchronous call.

When the delegate has completed execution, you must call EndInvoke on the delegate, passing in the IAsyncResult. This cleans up the WaitHandle (if it was allocated), throws an exception if the delegate failed to execute correctly, and has a return type matching the underlying method's. It returns the value returned by delegate invocation. Calling EndInvoke can occur either from inside your callback or wherever you waited for completion.

Note

For people familiar with Asynchronous Procedure Calls (APCs) in Win32, there is a subtle difference in how callbacks occur there versus on the CLR. Namely, the thread that APCs are called back on is a specific thread, which has had an APC queued for it. But the CLR does not guarantee where a callback will execute. In practice, it will ordinarily be a ThreadPool thread, but it certainly does not get reentered on top of the initiating thread's stack as a result of entering an alertable wait state (as with APCs).

This code snippet demonstrates the three ways you might choose to call a delegate asynchronously:

 delegate int IntIntDelegate(int x);

 int Square(int x) { return x * x; }

 void AsyncDelegatesExample()
 {
     IntIntDelegate f = Square;

     /* Version 1: Spin wait (quick delegate method) */
     IAsyncResult ar1 = f.BeginInvoke(10, null, null);
     while (!ar1.IsCompleted)
     {
         // Do some expensive work while it executes...
     }
     Console.WriteLine(f.EndInvoke(ar1));

     /* Version 2: WaitHandle wait (longer delegate method) */
     IAsyncResult ar2 = f.BeginInvoke(20, null, null);
     // Do some work...
     ar2.AsyncWaitHandle.WaitOne();
     Console.WriteLine(f.EndInvoke(ar2));

     /* Version 3: Callback approach */
     IAsyncResult ar3 = f.BeginInvoke(30, AsyncDelegateCallback, f);
     // We return from the method (while the delegate executes)...
 }

 void AsyncDelegateCallback(IAsyncResult ar)
 {
     IntIntDelegate f = (IntIntDelegate)ar.AsyncState;
     Console.WriteLine(f.EndInvoke(ar));
 }

The first version uses a loop, checking IsCompleted each time around. It might be doing some expensive computation, responding to messages, or otherwise doing something useful in the body of the loop. The second version uses a WaitHandle.WaitOne blocking wait instead of looping. This puts the current thread into a blocked state until the delegate finishes executing. The third and last version uses a delegate to process the result of the delegate once it is complete, and performs the EndInvoke inside the callback itself. The section in Chapter 10 discusses the rationale for choosing one over the other.

Anonymous Methods (Language Feature)

The ability to declare anonymous methods is a feature of the C# 2.0 language, not of the CLR itself. But they are so useful and pervasive that it's worth a brief mention in this chapter. Due to the ease with which delegates permit you to pass method pointers as arguments to other methods, it's sometimes preferable to simply write your block of code inline rather than having to set up another method by hand. Anonymous delegates permit you to do this. It's purely syntactic sugar.

Consider a method that takes a delegate and applies it a number of times:

 delegate int IntIntDelegate(int x);

 void TransformUpTo(IntIntDelegate d, int max)
 {
     for (int i = 0; i <= max; i++)
         d(i);
 }

If we wanted to pass a function to TransformUpTo that squared the input, we'd have to first write an entirely separate method over which we'd form a delegate. However, in 2.0, we can use anonymous delegates to accomplish the same thing:

 TransformUpTo(delegate(int x) { return x * x; }, 10);

The C# compiler generates an anonymous method in your assembly that implements the functionality indicated inside the curly braces. The compiler is smart enough to deduce that the function returns an integer (because it's used in a context that expected a function returning an integer and because that's precisely the type of the expression in return x * x), and the parameter types are specified explicitly in the parentheses following the delegate keyword.

We won't spend too much time on this feature. But suffice it to say, it's very complex and very powerful. You can capture variables inside the delegate that are in the current lexical scope. This is called a closure in many other programming environments, because the method is bound to such free variables using a heap-allocated object. The compiler does a lot of work to ensure that this works correctly. Take a look at the IL that is generated if you'd like to appreciate the work it's doing for you:

 delegate void FooBar();

 void Foo()
 {
     int i = 0;
     Bar(delegate { i++; });
     Console.WriteLine(i);
 }

 void Bar(FooBar d)
 {
     d(); d(); d();
 }

It shouldn't come as a surprise that the output of calling Foo is 3. A local variable i is declared in Foo and set initially to 0. Then we create an anonymous delegate that, when invoked, increments i by one. We pass that delegate to the Bar function, which applies it three times.

But stop to think about what the compiler is doing here. The compiler notices that you've accessed the local variable i from inside your delegate and responds by hoisting its storage into a heap-allocated object shared by Foo and the object on which the delegate is placed. The type of this object is auto-generated by the C# compiler and never seen by your code.
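A hand-written approximation of that transformation might look like the sketch below. The FooClosure class and its Increment method are hypothetical stand-ins for the compiler-generated type, whose real name and exact shape differ and are not visible to your code.

```csharp
using System;

delegate void FooBar();

// Hypothetical stand-in for the auto-generated class: the captured local
// becomes a field, and the anonymous method body becomes an instance method.
sealed class FooClosure
{
    public int i;
    public void Increment() { i++; }
}

static class ClosureDemo
{
    static void Bar(FooBar d)
    {
        d(); d(); d();
    }

    public static int Foo()
    {
        FooClosure closure = new FooClosure();  // heap allocation for the "local"
        closure.i = 0;
        Bar(new FooBar(closure.Increment));     // the delegate's target is the closure object
        return closure.i;                       // every access to i goes through the object
    }

    static void Main()
    {
        Console.WriteLine(Foo()); // 3
    }
}
```

Written this way, the allocation and the extra indirection on each access to i are visible in the source.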

Note

This is quite nice of the compiler to do, but performance-conscious readers might worry about feature abuse. What appears to be local variable access turns out to actually involve an object allocation and at least two levels of indirection. Your concern would not be without justification.