The CLR in Relation to .NET | C# and the .NET Framework

The block diagram in Figure 1.2.1 shows the CLR in relation to the general .NET framework.

Figure 1.2.1. The CLR and the .NET framework.

graphics/0102fig01.gif

The CLR in Detail

As a subcomponent of the .NET framework, the CLR itself is built up from individual parts. Figure 1.2.2 shows those pieces in relation to one another.

Figure 1.2.2. Detail of the CLR subsections.

graphics/0102fig02.gif

As you can see from Figure 1.2.2, the primary function of the CLR is as a class loader. It loads classes that you create or that are supplied as part of the base class library, prepares them for use, and then either executes them or assists in their design-time use.

Classes are loaded from assemblies, the .NET equivalent of the DLL or EXE. The assembly may sometimes contain native code but is most likely to contain classes that have been compiled to IL and the metadata associated with them. At design time, the CLR interfaces with tools such as Visual Studio.NET to provide the Rapid Application Development (RAD) experience that VB programmers have used for so long, but for all modules in all languages. At runtime the class loader executes the classes.

The CLR at Runtime

When the class loader opens an assembly at execution time, it has a number of important steps to take before the classes can actually be run. Figure 1.2.3 illustrates the execution model for the CLR.

Figure 1.2.3. The CLR execution model.

graphics/0102fig03.gif

The class loader uses the Code Manager to assign memory for the objects and data. The layout of the classes in memory is computed, and each method stored as intermediate language is given a stub that invokes the JIT compiler the first time the method is called.
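The stub mechanism can be pictured with a small C++ sketch. This is only an illustration of compile-on-first-call dispatch, not the CLR's actual implementation: `Method`, `jitStub`, `compiledBody`, and `makeMethod` are hypothetical names, and "compiling" is simulated by swapping a function pointer.

```cpp
#include <cassert>

struct Method;
using EntryPoint = int (*)(Method&, int);

// Hypothetical method slot: it starts out pointing at a stub and is
// patched to point at the "native" body after the first call.
struct Method {
    EntryPoint entry;   // current entry point (stub or compiled body)
    int compileCount;   // how many times the simulated compiler ran
};

// Stands in for the native code the JIT compiler would produce.
int compiledBody(Method&, int x) { return x * 2; }

// The stub: "compile" the method, patch the slot, then forward the call.
int jitStub(Method& method, int x) {
    ++method.compileCount;         // simulated compilation happens here
    method.entry = &compiledBody;  // later calls bypass the stub
    return method.entry(method, x);
}

// A freshly loaded method always begins life behind the stub.
Method makeMethod() { return Method{&jitStub, 0}; }

// Callers always go through the slot, never directly to the body.
int call(Method& method, int x) { return method.entry(method, x); }
```

Calling `call` twice on the same `Method` runs the stub only once; the second call goes straight to `compiledBody`, which is the cost model the text describes.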

NOTE

You will notice that in the model in Figure 1.2.3 there are two distinct types of intermediate language, MSIL and OptIL, that are compiled by their own JIT compilers. OptIL is a highly optimized subset of MSIL that is designed specifically for use by host systems that don't have the luxury of lots of memory or computational speed. Because of the extra work that the high-level language compiler puts into the optimization, OptIL is more suited to PDAs, intelligent phones, and the like. EconoJIT is the compact, portable compiler that turns OptIL into machine code.

Whenever a function first references a class stored in a different assembly or a data type that hasn't been used before, the class loader is invoked again to bring in the required objects. The process is probably best understood by looking at a flowchart. The Common Language Runtime's virtual execution system (VES) and the major control paths are shown in Figure 1.2.4.

Figure 1.2.4. The virtual execution system.

graphics/0102fig04.gif

Install Time Code Generation

A part of the VES not shown in Figure 1.2.4 is the install time code generation that sometimes takes place when assemblies are first put onto a host system. This will create native code in the assembly that reduces the startup time of a particular object. Generally, verification of this code is done at install time, but if a runtime version or security checks are needed, the code will be recompiled by the standard JITer in the usual manner.

Data Types Supported by the CLR

As was mentioned earlier, the .NET framework has a set of native data types that are used by all languages. These include integral types like 8-, 16-, 32-, and 64-bit integers and chars. The list also contains floating-point types and pointers to both managed and unmanaged memory.

All of the native data types may be used by code written in IL, and you can write IL by hand. IL is very similar to assembly language, so if you've ever rolled up your sleeves and played with a Z80 or 6502, you'll feel right at home with IL. Later in this section we will show you some handwritten IL and the tools that come with the .NET framework that can be used to assemble and disassemble it.

The basic types are detailed in Table 1.2.1.

Table 1.2.1. Native CLR Data Types

Type Name	Description
`I1`	8-bit 2's complement signed value
`U1`	8-bit unsigned binary value
`I2`	16-bit 2's complement signed value
`U2`	16-bit unsigned binary value
`I4`	32-bit 2's complement signed value
`U4`	32-bit unsigned binary value
`I8`	64-bit 2's complement signed value
`U8`	64-bit unsigned binary value
`R4`	32-bit IEEE 754 floating-point value
`R8`	64-bit IEEE 754 floating-point value
`I`	Natural size 2's complement signed value
`U`	Natural size unsigned binary value, also unmanaged pointer
`R4Result`	Natural size for result of a 32-bit floating point computation
`R8Result`	Natural size for result of a 64-bit floating point computation
`RPrecise`	Maximum-precision floating point value
`O`	Natural size object reference to managed memory
`&`	Natural size managed pointer (may point into managed memory)

Table 1.2.1 mentions "natural" sizes for data types. This refers to sizes that are dictated by the hardware. For example, a machine with a 16-bit bus probably has a 16-bit natural integer, and a machine with a 32-bit bus will have a correspondingly large integer. This can be a problem when programs are supposed to talk to software at the other end of an Internet connection. Without explicit instructions, it is difficult to know how big an integer is at either end, and type size mismatches and structure misalignments can occur. To address this and other similar problems, the CLR uses only a select few data types when actually evaluating and running software. Any native type that doesn't fit the few chosen sizes is carefully packed and unpacked to and from the more portable types. This happens behind the scenes, and you don't have to worry about it.
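The difference between natural and fixed sizes, and the idea of packing a natural-size value into a portable form, can be sketched in standard C++. This is an illustration only and says nothing about how the CLR does it internally; `naturalIntBytes`, `packNatural`, and `unpackNatural` are hypothetical helpers, and the 8-byte little-endian layout is an assumption chosen for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Fixed-width types (the I4 and I8 of Table 1.2.1) are the same size everywhere.
static_assert(sizeof(std::int32_t) == 4, "a 32-bit integer is always 4 bytes");
static_assert(sizeof(std::int64_t) == 8, "a 64-bit integer is always 8 bytes");

// A natural-size integer follows the hardware: typically 4 bytes on a
// 32-bit machine and 8 bytes on a 64-bit one.
std::size_t naturalIntBytes() { return sizeof(std::intptr_t); }

// Pack a natural-size value into an explicit 8-byte little-endian form,
// so both ends of a connection agree on its size and byte order.
std::array<std::uint8_t, 8> packNatural(std::intptr_t value) {
    // Widen with sign extension first, then reinterpret as unsigned bits.
    const std::uint64_t wide =
        static_cast<std::uint64_t>(static_cast<std::int64_t>(value));
    std::array<std::uint8_t, 8> bytes{};
    for (std::size_t i = 0; i < bytes.size(); ++i)
        bytes[i] = static_cast<std::uint8_t>((wide >> (8 * i)) & 0xFF);
    return bytes;
}

// Rebuild the natural-size value from the portable form.
std::intptr_t unpackNatural(const std::array<std::uint8_t, 8>& bytes) {
    std::uint64_t wide = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i)
        wide |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    return static_cast<std::intptr_t>(static_cast<std::int64_t>(wide));
}
```

A value packed on one machine and unpacked on another survives the trip as long as it fits the receiver's natural size; a real protocol would also have to decide what to do when it does not.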

Managed Data and Code

The CLR contains the Code Manager and the Garbage Collector (GC). These functional blocks are responsible for the memory management of all .NET native objects, including all the managed data types. The Code Manager allocates storage space; the Garbage Collector deletes it and compacts the heap to reduce fragmentation.

Because the GC is responsible for compacting the heap, data or objects used in the .NET managed space are often moved from place to place during their lifetimes. This is why, under normal circumstances, objects are always accessed by reference. In the C++ world, a reference is not very different from a pointer. The mechanism used to find the object referred to is the same: a pointer to a memory block somewhere. C++ references are the compiler's way of saying that it guarantees that you are pointing to a particular type. Under C++ it's possible to store a reference to an object, delete the object, and then use the stored, out-of-date reference to access the place where the object used to be. This is often the cause of disastrous failure in a C++ program.

Under the .NET system, however, this problem is eliminated. The runtime tracks every reference to an object, and the GC will never reclaim an object that is still reachable. Furthermore, when the GC tidies up the heap and moves the object in memory, all of the references to that object are automatically updated so that components that use the object get the right one when they access it next.
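The reference updating described above can be modeled with a toy heap in C++. This is a deliberately simplified sketch: the class `ToyHeap` and all of its members are hypothetical, objects are plain `int` values, liveness is signaled by an explicit `release` call rather than discovered by tracing, and references go through a lookup table, which is an assumption made to keep the example short and is not a claim about how the CLR represents references.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Marker for a reference whose object has been reclaimed.
constexpr std::size_t kDead = static_cast<std::size_t>(-1);

class ToyHeap {
public:
    using Ref = std::size_t;  // a reference is an index into the table

    // Place a new object at the end of the heap and hand back a reference.
    Ref allocate(int value) {
        slots_.push_back({value, true});
        table_.push_back(slots_.size() - 1);
        return table_.size() - 1;
    }

    // Mark the object as garbage; it is reclaimed by the next compact().
    // Using a reference after releasing it is a misuse of this toy model.
    void release(Ref r) { slots_[table_[r]].live = false; }

    int& get(Ref r) { return slots_[table_[r]].value; }
    std::size_t slotOf(Ref r) const { return table_[r]; }
    std::size_t size() const { return slots_.size(); }

    // Slide live objects toward the start of the heap, then fix up every
    // table entry so existing references still find their objects.
    void compact() {
        std::vector<std::size_t> newIndex(slots_.size(), kDead);
        std::size_t next = 0;
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (slots_[i].live) {
                slots_[next] = slots_[i];
                newIndex[i] = next++;
            }
        }
        slots_.resize(next);
        for (std::size_t& slot : table_)
            if (slot != kDead) slot = newIndex[slot];
    }

private:
    struct Slot { int value; bool live; };
    std::vector<Slot> slots_;         // the "heap"
    std::vector<std::size_t> table_;  // reference -> current slot
};
```

After a `compact()`, a surviving object may sit in a different slot, yet `get` on its original reference still returns the same value, which is the guarantee the text describes.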

Through the Code Manager, the CLR is responsible for the actual memory layout of objects and structures that it hosts. Usually this layout process is automatic, but when you need to do so you can specify the order, packing, and specific layout of memory in the metadata. This is made possible by the fact that the metadata is available to the programmer through a set of well-defined interfaces.

Unmanaged Code and Data Access

It might not be apparent how .NET type safety, security, and verification enable you to protect your investment in the C++ code that you have lavished care and attention on for so long. Microsoft was in exactly the same boat and, as a consequence, the .NET framework has excellent capabilities for reusing your legacy code.

There are three basic mechanisms for managed/unmanaged interoperation under .NET. COM Interop enables your COM objects to be used by the .NET framework as if they were .NET objects. The Platform Invoke (or P/Invoke) method lets managed objects call static entry points in DLLs, much as LoadLibrary and GetProcAddress do in unmanaged code. Finally, IJW (It Just Works) in its most basic form enables you to recompile your code and it just works. A more complex form enables you to enhance your code using managed extensions to C++. This lets you create full GC memory-managed objects that your old C++ source code uses.

COM Interop Via the CLR

You can see in Figure 1.2.2 that the CLR contains a COM marshaler. This functional block is responsible for the COM/.NET Interop. There are two scenarios that require COM Interop with the .NET framework. The first is when you want to access your old COM objects from new C# or VB code you've written. The second is when you want to implement a well-known interface and have your COM objects access it.

For both of these scenarios, the CLR plays a very dynamic role by creating a specialized wrapper at runtime and then marshaling data in and out of the COM object as needed. The diagram in Figure 1.2.5 shows a COM object being accessed by a .NET client object.

Figure 1.2.5. Managed code to COM Interop.

graphics/0102fig05.gif

To use COM objects with your .NET programs, you need to import the type library definition. VS.NET will do this for you automatically when you add a reference to the COM object in the IDE, in the same way that VB programmers have done for many years. You can also use the type library import utility (TLBImp) explicitly. TLBImp will refactor the COM method signatures into .NET method signatures for you. For example, the COM method signature

 HRESULT MyMethod(BSTR b, int n, [out, retval]int *retval);

is transformed to

 int MyMethod(String b, int n);

As a result, when you call the method you don't have to worry about interpreting HRESULTs; you can simply assign the integer return value to a variable in your code. In fact, the whole .NET-to-COM Interop picture is very easy to use. You don't have to worry about data conversions. As you can see from the example, the COM data types such as BSTR map to sensible equivalent .NET types. You don't have to manage the reference counting and there are no GUIDs. If you do get a failure HRESULT, the runtime wrapper generates an exception that you can catch.
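The reshaping of a COM-style signature, and the conversion of a failure code into an exception, can be sketched in portable C++. Everything here is a stand-in: `HResult`, `kOk`, `kFail`, `rawMyMethod`, `ComFailure`, and `myMethod` are hypothetical names defined locally rather than taken from the Windows headers, the body of `rawMyMethod` is invented for the example, and the real runtime wrapper is generated for you rather than written by hand.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Local stand-ins for the COM status type; negative values mean failure.
using HResult = std::int32_t;
constexpr HResult kOk = 0;
constexpr HResult kFail = -1;  // hypothetical failure code

// COM-style method: status in the return value, result via an out parameter.
// The behavior (string length plus n) is made up for illustration.
HResult rawMyMethod(const std::string& b, int n, int* retval) {
    if (retval == nullptr || n < 0) return kFail;
    *retval = static_cast<int>(b.size()) + n;
    return kOk;
}

// Exception that carries the original status code.
class ComFailure : public std::runtime_error {
public:
    explicit ComFailure(HResult hr)
        : std::runtime_error("COM call failed"), hr_(hr) {}
    HResult code() const { return hr_; }
private:
    HResult hr_;
};

// Wrapper in the reshaped style: the out parameter becomes the return
// value, and a failure status becomes an exception.
int myMethod(const std::string& b, int n) {
    int result = 0;
    const HResult hr = rawMyMethod(b, n, &result);
    if (hr < 0) throw ComFailure(hr);
    return result;
}
```

A caller of `myMethod` never inspects a status code; it either receives the integer or handles `ComFailure` in a `catch` block.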

When you want to call a .NET object from existing COM objects, the process that the CLR uses to facilitate the connection is similar to that shown in Figure 1.2.5. Figure 1.2.6 illustrates the COM-to-.NET connection.

Figure 1.2.6. Accessing a .NET object from COM.

graphics/0102fig06.gif

Your .NET objects can be used from the COM world, but COM clients can access only their public methods.

First, create a type library from your object's assembly using the .NET tool TlbExp. Next, register the assembly with the tool RegAsm. This is the only time you'll need to register anything under .NET; COM, running in the unmanaged portion of your host computer, needs to activate the object in the standard way. Finally, from COM you can use CoCreateInstance and QueryInterface, use AddRef and Release as normal, and use the HRESULTs that the wrapper provides.

Later, in Part 5.3, we have a chapter dedicated to COM Interop where we will go into more detail and show you examples.

Platform Invoke (P/Invoke) from the CLR

One of the most common requirements for interoperation will be the use of a DLL that already exists on the local system. For example, you might need to continue to use tools that you've invested in and that are available only as DLLs.

The .NET framework provides the P/Invoke mechanism for this purpose. The internal mechanism is very similar to COM Interop, but because the DLLs that you use are always going to be local (on the same machine), the system is simpler to use.

To use the P/Invoke mechanism, you need a wrapper that delegates calls from the CLR to the actual DLL function. You create these wrappers by using special attributes in your method definitions.

The code in the following example would allow you to call a function in a custom DLL.

MyDll.Dll contains a function int MyFunction(int x):

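 using System.Runtime.InteropServices;

 public class MyDllWrapper
 {
     // Binds the managed method to the exported DLL function of the same name
     [DllImport("MyDll.Dll", EntryPoint="MyFunction")]
     public static extern int MyFunction(int x);
 }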

You may now access the DLL using the C# commands:

 int r=MyDllWrapper.MyFunction(12);

In Chapter 1.4, "Working with Managed Extensions to C++," we will examine the P/Invoke mechanism in greater detail.

IJW ("It Just Works")

The simplest way to migrate your C++ applications to the .NET framework is to gather all the source code for your EXE or DLL, recompile it with the /clr compiler switch, and run it. Microsoft says "It Just Works." This is probably true if you've been programming in a reasonably modern way. If your code still contains good old K&R-style declarations, then it won't.

There are a few limitations to this process. You cannot recompile all your class libraries and then inherit from them in the managed world. You cannot easily pass managed code pointers to your functions and classes.

Both the IJW and P/Invoke methods prolong the life of your code and give you an opportunity to migrate to the .NET framework.

Managed Extensions to C++

When simply reusing your old code isn't enough, and you need a more integrated approach from a C++ programmer's perspective, managed extensions to C++ are an ideal solution or migration path.

The .NET framework adds some significant concepts to memory management and the use of data. These features are all available from the C# and VB languages but not natively to C++. The additions to the programmer's arsenal are discussed in the following sections.

Garbage Collected (GC) Classes

Garbage Collected (GC) classes are classes whose objects have all their memory management performed by the CLR. Once you stop using an object of a GC class, it is marked automatically for deletion and reclaimed by the garbage collector. A GC class may also have a delete method that can be called explicitly.

You can create GC classes by using the new __gc class modifier on your classes:

 __gc class MyClass
 {
 public:
     int m_x;
     int m_y;

     // Sets both coordinates in one call
     void SetXY(int x, int y)
     {
         m_x = x;
         m_y = y;
     }
 };

MyClass is now a Garbage Collected class.

CAUTION

If you use _asm or setjmp in your C++ methods, the compiler will issue warnings; attempting to run the code may result in failure if the method uses any managed types or managed code. Your GC classes may employ only single inheritance. They may not have a copy constructor and may not override operator & or operator new.

Value Classes

The new __value keyword enables you to create complex value types in your C++ code. For example, you may want to create a 3D_point structure with x, y, z coordinates to be used as a value type. These types are created on the stack but may be boxed, or wrapped in a sort of object skin, using the __box keyword to use them as managed objects. (There is an extensive discussion of boxing and unboxing in Part II of this book, "The C# Language.") When they are boxed, these value types can be used by .NET managed code, held in .NET collection classes, serialized simply with XML, and so on.

Classes and structs created with the __value keyword inherit from the .NET framework class System.ValueType and may override any of the methods that are needed from that object. Putting any other virtual methods on your value class is not allowed. For example, you can override System.ValueType.ToString to render a string that describes the object, in the same way that Int32.ToString() returns the number as a string, much like itoa(...). You may also wish to override System.ValueType.GetHashCode for use in a map collection.

These value classes act like C++ classes when used from C++ but can be used by the managed runtime, too. They cannot be allocated directly from the managed heap, but they can be used by managed objects when boxed.

There are some other rules that apply to value classes. You may derive the value class from one or more managed interfaces, but you cannot define a copy constructor for it. A value class cannot have the __abstract keyword as a modifier. Value classes are always sealed, which means that they cannot be derived from. You can't declare a pointer to a value class.

As with the __gc keyword, you should not use _asm or setjmp if the class uses any managed code, accepts managed types as parameters, or returns a managed type. It might compile, but it will probably fail to work correctly.

Properties

.NET C# classes can have properties with get and set accessors. The __property keyword allows you to create GC classes that support properties. A property looks like a data member in your class but is really a piece of code. The example below shows how a property would be created and used.

 #using "mscorlib.dll"

 __gc class My3DPoint
 {
     // members are private by default
     int m_x;
     int m_y;
     int m_z;
 public:
     __property int get_x() { return m_x; }
     __property void set_x(int value) { m_x = value; }
     __property int get_y() { return m_y; }
     __property void set_y(int value) { m_y = value; }
     __property int get_z() { return m_z; }
     __property void set_z(int value) { m_z = value; }
     // other 3D operations go here...
 };

 int main()
 {
     My3DPoint *pP = new My3DPoint();
     pP->x = 10;     // calls set_x(10);
     int X = pP->x;  // calls get_x();
     return 0;
 }

Pinning Values

The Garbage Collected heap will regularly move objects from place to place during the course of its operation. Whenever you need to pass a pointer to a managed object to an unmanaged C++ function, you can pin the object in place using the __pin keyword. The pinning operation forbids the GC from moving the object until it's unpinned.

Exceptions

All exceptions within the .NET framework are handled by the CLR. This provides a very consistent and powerful mechanism for trapping and handling errors wherever they may occur.

Exceptions in the CLR use exception objects, usually derived from the .NET framework class System.Exception, and one of four types of exception handlers. These handlers are:

A finally handler is executed whenever a block exits. This handler is called during the normal shutdown of an object as well as when a fault occurs.
A fault handler runs when a true exception occurs.
A type-filtered handler services exceptions from a specific class or its derived classes. For example: catch(MyExceptionType e)
A user-filtered handler can figure out whether the exception should be ignored, handled by the associated handler, or passed on to the next exception handler available.

Every method in every class in the .NET framework libraries or in your code has an exception handler table associated with it. Each entry in the table describes a protected block of code and the exception handler associated with it. There may be either no handler at all, a catch handler, a finally handler, or a fault handler for each table entry. When an exception occurs, the CLR searches the table of the method that threw the exception for a matching handler. If one is found, control is passed to the handler as usual; if not, the CLR proceeds through a stack walk to the calling method and so on, back up the chain of callers until a handler is found. If one is not found, the CLR aborts the application and produces a stack dump.
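The table lookup and stack walk can be simulated in a few lines of C++. This is a sketch under stated assumptions, not the CLR's data layout: `HandlerEntry`, `Frame`, `findCatch`, and `dispatch` are hypothetical names, exception types are compared as plain strings with no notion of derived classes, and finally and fault handlers are recorded in the table but not executed.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

enum class HandlerKind { Catch, Finally, Fault };

// One row of a method's exception handler table.
struct HandlerEntry {
    std::size_t tryStart;   // first protected instruction offset
    std::size_t tryEnd;     // one past the last protected offset
    HandlerKind kind;
    std::string catchType;  // only meaningful for Catch entries
};

// One active method on the call stack.
struct Frame {
    std::string method;
    std::size_t offset;               // where this method was executing
    std::vector<HandlerEntry> table;  // the method's handler table
};

// Search one method's table for a catch entry that protects the current
// offset and names the thrown type.
const HandlerEntry* findCatch(const Frame& frame, const std::string& thrownType) {
    for (const HandlerEntry& entry : frame.table) {
        const bool covers =
            frame.offset >= entry.tryStart && frame.offset < entry.tryEnd;
        if (covers && entry.kind == HandlerKind::Catch &&
            entry.catchType == thrownType)
            return &entry;
    }
    return nullptr;
}

// Walk from the throwing method (the back of the vector) toward the
// outermost caller. Returns the name of the method whose handler takes
// the exception, or an empty string if nothing handles it.
std::string dispatch(const std::vector<Frame>& stack, const std::string& thrownType) {
    for (auto it = stack.rbegin(); it != stack.rend(); ++it)
        if (findCatch(*it, thrownType) != nullptr) return it->method;
    return std::string();
}
```

An empty result from `dispatch` corresponds to the case in which the CLR aborts the application and produces a stack dump.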

As we progress through this book, examples will give more detailed information on how to create your own exception classes and handlers.

Debugging Support

The .NET framework has debugging built into it at a very low level. Unlike the often intrusive debugging schemes that C++ programmers are used to, the CLR, which runs all the code anyway, manages debugging itself and sends events to the connected debugger whenever needed.

The .NET SDK has its own debuggers, in addition to the one that's integrated with Visual Studio.NET. All debuggers operate through the public APIs provided by the framework.
