Attributes | Advanced .NET Programming

Viewed from high-level languages, attributes tend to fall into two informal categories, which are often known as custom attributes and Microsoft attributes. The syntax in the high-level languages is identical, but attributes are typically viewed as custom attributes if they have been defined in some code that wasn't written by Microsoft! Since the Microsoft compilers cannot have any awareness of individual third-party attributes, the sole effect of such attributes is to cause metadata to be emitted to an assembly. On the other hand, Microsoft-defined attributes are normally assumed to have some other effect on the code - for example, the Conditional attribute might cause certain code not to be compiled.

This common view of attributes is not really accurate; it is more accurate to divide attributes into three categories:

Custom Attributes. These are attributes whose sole purpose is to cause metadata to be emitted in assemblies. Other managed code may of course read the metadata using reflection and change its behavior based on the presence of these attributes. All non-Microsoft attributes will fall into this category, but there are also some Microsoft-defined attributes that do so, such as STAThreadAttribute.
Distinguished Attributes. These exist in assemblies as metadata, but they are additionally recognized by the CLR itself, and the CLR will take some action if it sees these attributes.
CLS Attributes. These are similar to custom attributes, but are formally defined in the CLS. It is expected that certain developer tools will recognize these attributes.
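To make the first category concrete, here is a minimal C# sketch of a custom attribute whose only effect is to add metadata, which other code then reads through reflection. The names AuthorAttribute, Report, and AttributeDemo, and the author string, are hypothetical and exist only for this illustration.

```csharp
using System;

// Hypothetical custom attribute: the compiler simply records it as metadata.
[AttributeUsage(AttributeTargets.Class)]
public sealed class AuthorAttribute : Attribute
{
    public string Name { get; }

    public AuthorAttribute(string name)
    {
        Name = name;
    }
}

// Hypothetical type decorated with the attribute.
[Author("A. Developer")]
public class Report
{
}

public static class AttributeDemo
{
    // Returns the author recorded on a type, or null if the attribute is absent.
    public static string GetAuthor(Type type)
    {
        var attr = (AuthorAttribute)Attribute.GetCustomAttribute(type, typeof(AuthorAttribute));
        return attr?.Name;
    }

    public static void Main()
    {
        Console.WriteLine(GetAuthor(typeof(Report)) ?? "(none)");
        Console.WriteLine(GetAuthor(typeof(string)) ?? "(none)");
    }
}
```

Nothing happens at run time unless some code asks for the attribute, which is what distinguishes this category from the distinguished attributes.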

Although I have listed three categories of attributes, it's important to understand that as far as IL is concerned, there is no difference. There is only one type of attribute at the level of IL: the custom attribute. Every attribute in existence is a custom attribute, and every attribute is introduced in IL source code with the same syntax, using the .custom directive:

 .custom instance void [mscorlib]System.STAThreadAttribute::.ctor() = (01 00 00 00)

There is also another category, that of pseudo-custom attributes. Pseudo-custom attributes are not really attributes at all. They are certain pre-defined flags in the metadata. However, they are represented in high-level languages by attributes - a representation which is useful, though misleading.

Obviously, I can't present a full list of attributes since new attributes are certain to get added over time, but to give you an idea, here are some of the attributes that Microsoft has defined:

Category	Attributes
Custom	BrowsableAttribute, DefaultPropertyAttribute, SoapAttribute, EditorAttribute
Distinguished Custom	SecurityAttribute, ObsoleteAttribute, SerializableAttribute
CLS Custom	AttributeUsageAttribute, CLSCompliantAttribute, ObsoleteAttribute

The following table gives some of the pseudo-custom attributes, along with their corresponding flags in IL:

Attribute	Flag(s)
DllImport	pinvokeimpl
StructLayoutAttribute	auto/explicit/sequential
MarshalAsAttribute	ansi/unicode/autochar

If you are interested, the full list of distinguished, CLS, and pseudo-custom attributes as of version 1.0 can be found in the Partition II specifications. However, in most cases you don't need to know which category an attribute falls into in order to use it, since the syntax is the same for all categories in high-level languages, and for all categories except pseudo-custom attributes in IL assembly.

The concept of pseudo-custom attributes is quite cunning, since it gives Microsoft a way to define more flags or directives in IL in the future, if it so decides, and have support for these new directives automatically propagated to all high-level languages. Take the DllImport attribute as an example. We have already seen how you can mark a method as [DllImport] in your C#, VB, or C++ source code, and have this converted to a method marked pinvokeimpl in the emitted IL. In fact, the compilers themselves know nothing of this attribute - they simply pass it through their normal attribute syntax checks. However, the compilers internally call some code called the unmanaged metadata API to emit the metadata into the compiled assemblies. (If you're interested, this API is documented in the Tool Developers Guide subfolder of the Framework SDK.) The metadata API recognizes DllImportAttribute and knows to emit a pinvokeimpl flag instead of an attribute into the metadata when it encounters DllImport. You can probably see where this is heading. If Microsoft decides it wants to make some other IL flag or directive available to high-level languages, all it needs to do is define a corresponding pseudo-custom attribute and update the metadata API to recognize the new attribute; instantly, all high-level languages gain support for the directive via the attribute. Clever, huh?
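One way to observe the flag from managed code is to inspect a method's metadata flags through reflection. The following C# sketch assumes a class named PseudoAttributeDemo (hypothetical) and declares GetTickCount from kernel32.dll purely as a familiar example; the function is never called, so only metadata is examined.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

public static class PseudoAttributeDemo
{
    // Declared only so that its metadata can be inspected; it is never invoked here.
    [DllImport("kernel32.dll")]
    private static extern uint GetTickCount();

    // True if the method carries the pinvokeimpl flag in its metadata.
    public static bool IsPInvoke(MethodInfo method)
    {
        return (method.Attributes & MethodAttributes.PinvokeImpl) != 0;
    }

    // Looks up the imported method declared above.
    public static MethodInfo GetImportedMethod()
    {
        return typeof(PseudoAttributeDemo).GetMethod(
            "GetTickCount", BindingFlags.NonPublic | BindingFlags.Static);
    }

    public static void Main()
    {
        Console.WriteLine(IsPInvoke(GetImportedMethod()));
        Console.WriteLine(IsPInvoke(typeof(PseudoAttributeDemo).GetMethod("Main")));
    }
}
```

The attribute written in source shows up as a bit in the method's flags, while an ordinary method such as Main does not have that bit set.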

Let's quickly look in more detail at the IL syntax for defining custom attributes. As noted earlier, you can define attributes using the .custom directive. Here is an example of an attribute that has been applied to a method:

 .method public static void Main() cil managed
 {
    .entrypoint
    .custom instance void [mscorlib]System.STAThreadAttribute::.ctor() = (01 00 00 00)
    // ... rest of the method body ...
 }

The attribute in question here is the STAThread attribute, which indicates the COM threading model to be applied to a method if it calls into COM interop services. The .custom directive is followed by a token indicating the constructor of the attribute, which in turn tells the CLR what type the attribute is. The constructor token is followed by binary data, which will be embedded into the metadata, and which indicates the data that should be passed to the constructor if the attribute needs to be instantiated (this will occur if some other application or library uses reflection to instantiate the attribute). You may wonder why the above code shows four bytes being passed to a zero-parameter constructor. The answer is that the blob of data supplied with the .custom directive is set in a format defined in the Partition II document, and this format requires certain bytes that must always be present: a two-byte prolog (01 00) at the start, and a two-byte count of named arguments (00 00 here) at the end. The four bytes we see in this code are just those bytes.
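As a rough illustration of the blob layout, the C# sketch below builds the bytes by hand. It assumes the Partition II layout of a two-byte prolog (01 00), then the constructor arguments, then a two-byte count of named arguments. The class AttributeBlobDemo and its Encode method are hypothetical helpers, and the sketch deliberately handles only short string arguments and no named arguments.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class AttributeBlobDemo
{
    // Builds the blob for a constructor taking zero or more short string arguments
    // and no named arguments. Only strings under 128 UTF-8 bytes are handled,
    // so that the length prefix fits in a single byte.
    public static byte[] Encode(params string[] fixedArgs)
    {
        var blob = new List<byte> { 0x01, 0x00 };   // prolog: 0x0001, little-endian
        foreach (string arg in fixedArgs)
        {
            byte[] utf8 = Encoding.UTF8.GetBytes(arg);
            if (utf8.Length >= 128)
            {
                throw new ArgumentException("String too long for this sketch.");
            }
            blob.Add((byte)utf8.Length);            // one-byte length prefix
            blob.AddRange(utf8);                    // the string itself, as UTF-8
        }
        blob.Add(0x00);                             // count of named arguments: 0
        blob.Add(0x00);
        return blob.ToArray();
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Encode()));       // zero-parameter constructor
        Console.WriteLine(BitConverter.ToString(Encode("Hi")));   // one string parameter
    }
}
```

With no arguments, only the prolog and the named-argument count remain, which gives the same four bytes that appear after the .custom directive above.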

The position of the .custom directive tells ilasm.exe what item the attribute should be applied to. For items that have some scope that is marked by braces (such as types and methods), you can place the directive inside the scope. For other items, such as fields, which do not have any scope, the .custom directive should be placed immediately after the declaration of the item (the opposite position to high-level languages, which normally place attribute declarations immediately before the item to which the attribute should be applied).

There is an alternative syntax, in which the .custom directive is qualified by a token indicating the object to which it should be applied, and which therefore allows the directive to be placed anywhere in the file:

 .custom (method void NamespaceName.ClassName::Main()) instance void [mscorlib]System.STAThreadAttribute::.ctor() = (01 00 00 00)

This latter syntax corresponds better to the internal representation of the attribute in the actual assembly: in the binary assembly, attributes are listed in the metadata, along with the token of the object to which they refer. Being able to place attributes next to the item they decorate in IL assembly code is a convenience permitted by the ilasm.exe assembler.

Delegates and Events

Delegates have a very similar status in the CLR to enums: we saw earlier that to define an enum, you simply declare a type that is derived from System.Enum. The CLR will recognize from the base type that this is a special class, and will therefore process it in a special manner. The same thing happens with delegates. You just derive a class from System.MulticastDelegate, and the CLR will recognize that it's a delegate and treat it accordingly. This means in particular that:

The CLR imposes the following restrictions on your definition of the class: it is not permitted to contain any fields, and the only methods it is allowed to contain are the ones that you'd normally expect a delegate to have: Invoke(), the constructor, and optionally the BeginInvoke() and EndInvoke() methods. (BeginInvoke() and EndInvoke() are used for invoking delegates asynchronously. Not all delegates implement them, and we'll postpone discussion of these methods until Chapter 9, when we discuss threading.)
You are not permitted to supply implementations for any of these methods, because the CLR does that for you - and the CLR's implementation entails all sorts of internal hooks into the execution engine to make delegates work correctly. You do, however, have to declare these methods just in order to make sure the appropriate tokens get put in the metadata to refer to the methods, so that other code can call them.

To illustrate the principles, let's quickly show the code for a simple delegate. We'll assume we want a delegate that allows a string to be output in some manner (for example, to the console, to a file, or in a message box). This is the simplest possible definition in IL:

 .class public auto ansi sealed WriteTextMethod
        extends [mscorlib]System.MulticastDelegate
 {
    .method public specialname rtspecialname
            instance void .ctor(object, native int) runtime managed
    {
    }
    .method public virtual instance void Invoke(string text) runtime managed
    {
    }
 }

This delegate is called WriteTextMethod. As noted above, the fact that it derives from System.MulticastDelegate is sufficient to identify it as a delegate. The two methods we define in it are both marked runtime managed - a designation we've not seen before. managed of course means that it contains managed code; runtime indicates that the CLR knows about this method and will supply an implementation for us.

The signatures of the methods can best be understood by comparison with a high-level language. The above definition corresponds to this C# code:

 public delegate void WriteTextMethod(string text);

Recall that this definition requires a delegate instance to be created by passing in details of a method. The method details consist of an object reference (which is null for a static method), and details of the method in this object that is to be wrapped by the delegate. In C#, instantiating the delegate would look like this (for our example, we're using the delegate to wrap the Console.WriteLine() method, which has the correct signature):

 WriteTextMethod myDelegate = new WriteTextMethod(Console.WriteLine);

In C#, a lot of what's going on is hidden. However, the new keyword gives away the fact that a constructor is being called - in IL, the .ctor() method. As far as IL is concerned, we have to be really explicit about what we are passing in, so .ctor() takes two parameters, of types object and native int respectively. The object is of course the object reference. The native int is going to be a pointer to the entry point of the method the delegate will wrap. We've not seen native int used like this before, but it is legal IL - it's even verifiable provided we use the correct technique to obtain the function pointer, which we'll see soon.

The Invoke() method is of course used on the delegate to invoke the method that is wrapped by this instance of the delegate. It has to have the correct signature - and it's interesting to note that the signature of Invoke() is the only means the CLR has available to figure out what type of method this delegate can be used to invoke.
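The shape of the delegate class can be checked from C# by reflecting over a compiler-generated delegate. In this sketch, the class DelegateMetadataDemo and its helper IsRuntimeImplemented are hypothetical names; the delegate declaration mirrors the WriteTextMethod definition used in the text.

```csharp
using System;
using System.Reflection;

public delegate void WriteTextMethod(string text);

public static class DelegateMetadataDemo
{
    // True if the method is marked "runtime" in metadata, meaning that the CLR
    // supplies the implementation rather than an IL method body.
    public static bool IsRuntimeImplemented(MethodBase method)
    {
        MethodImplAttributes codeType =
            method.GetMethodImplementationFlags() & MethodImplAttributes.CodeTypeMask;
        return codeType == MethodImplAttributes.Runtime;
    }

    public static void Main()
    {
        Type t = typeof(WriteTextMethod);
        Console.WriteLine(t.BaseType);     // the class the delegate derives from
        Console.WriteLine(t.IsSealed);

        ConstructorInfo ctor = t.GetConstructors()[0];
        foreach (ParameterInfo p in ctor.GetParameters())
        {
            Console.WriteLine(p.ParameterType);   // the two constructor parameter types
        }

        Console.WriteLine(IsRuntimeImplemented(ctor));
        Console.WriteLine(IsRuntimeImplemented(t.GetMethod("Invoke")));
    }
}
```

The native int parameter of the IL constructor appears as System.IntPtr when viewed through reflection.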

Now let's see how a delegate is actually used. We'll code up some IL that is equivalent to this C# code:

 WriteTextMethod myDelegate = new WriteTextMethod(Console.WriteLine);
 myDelegate("Hello, World");

This code instantiates the delegate, and uses it to display Hello, World in the console window. First we have to instantiate the delegate:

 ldnull
 ldftn    void [mscorlib]System.Console::WriteLine(string)
 newobj   instance void DelegateDemo.WriteTextMethod::.ctor(object, native int)

There are a couple of new commands here. ldnull loads a null object reference onto the stack. It's identical in its behavior to ldc.i4.0, except that the zero on the stack is interpreted as an object reference instead of an int32. We want null on the stack for the first parameter to the constructor, since we are passing a static method to it. For an instance method, the ldnull would be replaced by an instruction to put an object reference onto the stack, such as ldloc.0 (if local variable 0 contains the object reference).

ldftn is another new command. It's the instruction that supplies the function pointer. It takes as an argument a token indicating a method, and places the address where the code for that method starts onto the stack.

Then, with the parameters on the stack, we can call the actual constructor, which will leave a reference to the newly created delegate on the stack.

Next we invoke the delegate. This code assumes that a reference to the delegate object is stored in local variable 0, and that the delegate is in the DelegateDemo namespace:

 ldloc.0
 ldstr    "Hello, World"
 callvirt instance void DelegateDemo.WriteTextMethod::Invoke(string)

We load the delegate reference (the this reference as far as the Invoke() method is concerned), then the one parameter that must be passed to the delegate, and finally use callvirt to invoke the method.
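The difference between wrapping a static method and an instance method, which in IL comes down to whether ldnull or an object reference is pushed before ldftn, can be seen from C# through the delegate's Target property. This is only a sketch: TextCollector, DelegateTargetDemo, and WriteViaDelegate are hypothetical names introduced for the illustration.

```csharp
using System;
using System.Text;

public delegate void WriteTextMethod(string text);

// Hypothetical class with an instance method matching the delegate signature.
public class TextCollector
{
    private readonly StringBuilder _buffer = new StringBuilder();

    public void Append(string text)
    {
        _buffer.Append(text);
    }

    public override string ToString()
    {
        return _buffer.ToString();
    }
}

public static class DelegateTargetDemo
{
    // Wraps an instance method in the delegate, invokes it, and returns what was collected.
    public static string WriteViaDelegate(string text)
    {
        var collector = new TextCollector();
        WriteTextMethod toBuffer = new WriteTextMethod(collector.Append);
        toBuffer(text);
        return collector.ToString();
    }

    public static void Main()
    {
        // Static method: the object reference held by the delegate is null.
        WriteTextMethod toConsole = new WriteTextMethod(Console.WriteLine);
        toConsole("Hello, World");
        Console.WriteLine(toConsole.Target == null);

        // Instance method: the delegate holds a reference to the target object.
        var collector = new TextCollector();
        WriteTextMethod toBuffer = new WriteTextMethod(collector.Append);
        Console.WriteLine(ReferenceEquals(toBuffer.Target, collector));

        Console.WriteLine(WriteViaDelegate("Hello, World"));
    }
}
```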

One interesting point to note from all this is the way that type safety works here: the standard phrase is that delegates provide a type-safe wrapper for function pointers. You might wonder how that type safety can be enforced - after all, what's to stop us just loading any old number onto the stack in place of null or the method address, passing it to the delegate, and as a result having some code that shouldn't be executed invoked through the delegate? The answer is that Microsoft has defined certain standard IL instruction sequences for use when invoking delegates. The above code shows one example of these sequences, all of which will pass the correct information to the delegate. The JIT compiler will recognize these sequences and accept them as verifiable, but the verification algorithm will reject any other sequence. Obviously, high-level language compilers will only emit known verifiable sequences to invoke delegates.

One last point: while we're on the subject of delegates, we will quickly mention events. Events are very similar to properties, in that the CLR has no intrinsic knowledge of or support for them. As far as IL is concerned, an event is declared with the .event directive. But all .event essentially does is the equivalent of attaching a flag to a specified existing delegate to indicate that high-level languages might wish to interpret that delegate as an event, and use any special event syntax they support in association with it. That's pretty much exactly analogous to the situation for properties and methods. Since there are no new .NET or CLR concepts to be understood by examining events, we won't consider events further.
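For readers who want to see the analogy with properties, the following C# sketch reflects over an event and lists the delegate type and the ordinary accessor methods that the metadata associates with it. Publisher, TextReady, and EventMetadataDemo are hypothetical names used only here.

```csharp
using System;
using System.Reflection;

public delegate void WriteTextMethod(string text);

// Hypothetical class exposing an event based on the delegate type.
public class Publisher
{
    public event WriteTextMethod TextReady;

    public void Raise(string text)
    {
        TextReady?.Invoke(text);
    }
}

public static class EventMetadataDemo
{
    public static void Main()
    {
        EventInfo evt = typeof(Publisher).GetEvent("TextReady");

        // The event is tied to a delegate type...
        Console.WriteLine(evt.EventHandlerType);

        // ...and to ordinary methods, much as a property is tied to get/set methods.
        Console.WriteLine(evt.GetAddMethod().Name);
        Console.WriteLine(evt.GetRemoveMethod().Name);
    }
}
```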