Managed and Unmanaged Code


First, let's establish a couple of definitions. Managed code is code that executes within the .NET environment and is managed by the common language runtime. Unmanaged code, also known as native code, is ordinary Windows code that executes outside .NET and has nothing but the operating system controlling it.

The fact that managed code is executed by the runtime has far-reaching consequences, particularly in the areas of security and access to resources. What matters most from the point of view of this chapter, however, is the effect the runtime has on memory management.

All memory in managed code is under the control of the common language runtime. Dynamically allocated objects are accessed using references in Microsoft Visual C# and Microsoft Visual Basic .NET, and by using pointers in managed C++. The runtime determines where blocks of memory are allocated and can move them around to manage memory efficiently. The runtime also knows when there are no more references to blocks of memory, and it can reclaim them when the need arises.

The .NET garbage collector runs when unused memory needs to be freed up; it can recognize objects that are not being used and reclaim their memory. The garbage collector is discussed in the 'Garbage Collection in .NET' section later in the chapter. For now, note that it's usually left up to the system to decide when to do a collection, although the programmer can force a collection by calling the System.GC.Collect method.
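For instance, a collection can be triggered explicitly like this (a minimal sketch; in practice you would rarely call it yourself):

using System;

class ForcedCollection
{
    static void Main()
    {
        // Normally the runtime decides when to collect; an explicit call
        // forces a collection immediately.
        GC.Collect();

        // An overload collects only up to the given generation
        // (generations are covered later in this chapter).
        GC.Collect(0);
    }
}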

Unmanaged code, on the other hand, runs outside the .NET environment. This means nothing is responsible for monitoring the execution of unmanaged code or managing its memory except the basic mechanisms provided by the operating system. For unmanaged code written in C or C++ (such as the Windows APIs), memory allocation and deallocation are under the control of the programmer. In contrast to .NET, a block of memory always has the same address, and it's up to the programmer to deallocate it when it's no longer needed.

Manual and Automatic Memory Management

What are the benefits and costs of the automatic memory management provided by a garbage collector compared with the manual method used in C and C++?

Automatic memory management has two main advantages. The first is that you won't get memory leaks, because unused memory will always be reclaimed when it's needed. Frequently in C/C++ programs, the programmer will forget to free up memory once it's no longer needed. This results in the program holding on to memory it no longer needs; that is, it has leaked memory. It's common to find C/C++ applications whose memory requirements steadily increase over time because of leaked memory.

The second advantage of automatic memory management is that memory will always be around as long as someone is still using it. If a programmer frees up dynamically allocated memory when it's still required, some other part of the code might try to use memory that has been freed and end up crashing the application. This won't happen if memory is only reclaimed when nobody is using it.

You might wonder why all languages don't use garbage collection. The answer is that manual memory allocation does have some advantages. The first advantage is that it's always possible to know exactly when an object has been destroyed. With garbage collection, you don't know when memory will be reclaimed. The second advantage is that you have control over the amount of memory being used by a process at any one time and can free up memory as soon as it's no longer needed.

Interoperating Between Managed and Unmanaged Code

Because of the differences between .NET code and unmanaged code, a lot of work needs to be done if you are to call unmanaged code from within .NET code.

The .NET Framework provides a set of fundamental data types that can be used by all .NET languages. To pass these data types between managed and unmanaged code, they need to be marshaled. Marshaling is the process whereby method parameters are passed across boundaries, whether between threads, between processes, or, as here, between managed and unmanaged code.

Marshaling is done automatically for you for the .NET value types and also for the string type. You'll see later in this chapter how Platform Invoke provides marshaling support and gives you a simple way to access unmanaged functions.
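As a brief preview (a minimal sketch rather than a complete treatment), the following declaration uses Platform Invoke to call the Win32 MessageBox function exported by user32.dll; the runtime marshals the string parameters automatically:

using System;
using System.Runtime.InteropServices;

class PlatformInvokePreview
{
    // Declare the unmanaged function exported by user32.dll.
    [DllImport("user32.dll", CharSet = CharSet.Unicode)]
    static extern int MessageBox(IntPtr hWnd, string text, string caption, uint type);

    static void Main()
    {
        // The string arguments are marshaled into the form the Win32 API expects.
        MessageBox(IntPtr.Zero, "Hello from managed code", "Platform Invoke", 0);
    }
}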

Unmanaged functions often need pointers to objects and data structures, and this has several consequences. The garbage collector compacts memory during a collection, which can cause objects to move around in memory. In unmanaged code, pointers are assumed to be fixed, so the runtime must be told not to move an object while it's being used by unmanaged code.
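One way to do this, sketched below, is to pin the object with a GCHandle; the C# fixed statement is an alternative for short-lived pointers. The unmanaged call itself is omitted here.

using System;
using System.Runtime.InteropServices;

class PinningSketch
{
    static void Main()
    {
        byte[] buffer = new byte[256];

        // Pin the array so the garbage collector cannot move it while
        // unmanaged code holds a pointer to it.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr ptr = handle.AddrOfPinnedObject();
            // ... pass ptr to an unmanaged function here ...
        }
        finally
        {
            // Free the handle so the object can be moved and collected again.
            handle.Free();
        }
    }
}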

You can define structures in .NET languages, but the runtime reserves the right to lay the structures out in memory in the most efficient way. In other words, you cannot rely on the members of a structure being laid out in memory in the same order as they were defined in the code. This does not matter to .NET client code because the runtime will always access the right member; unmanaged code, on the other hand, assumes the layout of an object in memory matches the definition. For this reason, you can tell .NET that the definition of a structure specifies its memory layout as well as the types of its members.
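As an illustration, the StructLayout attribute is used for this; the sketch below mirrors the Win32 SYSTEMTIME structure and asks the runtime to keep the members in their declared order:

using System.Runtime.InteropServices;

// LayoutKind.Sequential tells the runtime to lay the members out in the
// order they are declared, matching what unmanaged code expects.
[StructLayout(LayoutKind.Sequential)]
struct SystemTime
{
    public ushort Year;
    public ushort Month;
    public ushort DayOfWeek;
    public ushort Day;
    public ushort Hour;
    public ushort Minute;
    public ushort Second;
    public ushort Milliseconds;
}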

Garbage Collection in .NET

The .NET garbage collector uses references to keep track of allocated memory. When there are no longer any references to an object, the garbage collector marks the object as reclaimable. During a collection, the collector can return to the operating system the blocks of memory used by these reclaimable objects.

The .NET garbage collector uses the Win32 VirtualAlloc function to reserve a block of memory for its heap, which is commonly referred to as the managed heap. The garbage collector first reserves virtual memory, and then commits the memory as memory requirements grow. The garbage collector keeps track of the address at the end of the managed heap and allocates the next block of memory at this address. By this process, all .NET-managed memory allocations are placed in the managed heap one after another. This method of organization vastly improves allocation time because the garbage collector doesn't have to search through a list of memory blocks for an appropriately sized free block, as normal heap managers do.

Over time, holes begin to form in the managed heap as objects are deleted. When a garbage collection occurs, the collector compacts the heap, filling in holes by moving objects around. This has implications if pointers to managed memory are passed to unmanaged code because the garbage collector might end up moving, or even reclaiming, an object that is being used outside .NET. You'll see later in the chapter how to prevent this from happening.

Generations

In the past, one criticism of garbage collection mechanisms was that they interrupt the running of an application, often at the least convenient point in its execution. Rather than examining every object when a collection occurs, the .NET garbage collector improves performance by dividing objects into generations. The garbage collector currently uses three generations, numbered 0, 1, and 2. All newly allocated objects are placed in generation 0. When a collection occurs, objects in generation 0 are examined and unused objects are reclaimed. If this doesn't free enough memory, successively older generations can also be collected.

Objects in generation 0 that survive a managed heap compaction are promoted to generation 1; objects in generation 1 that survive a collection move into generation 2. This use of generations requires the collector to work only with a subset of allocated objects at any one time and therefore decreases the amount of work needed for a collection.
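You can watch promotion happen with GC.GetGeneration. The following sketch forces collections explicitly, which is something you would rarely do outside a demonstration; it typically prints 0, 1, and 2:

using System;

class GenerationDemo
{
    static void Main()
    {
        object obj = new object();
        Console.WriteLine(GC.GetGeneration(obj));   // newly allocated: generation 0

        GC.Collect();
        Console.WriteLine(GC.GetGeneration(obj));   // survived one collection: generation 1

        GC.Collect();
        Console.WriteLine(GC.GetGeneration(obj));   // survived another: generation 2
    }
}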

Each generation has a capacity, and a generation will be collected when it becomes full. In .NET Version 1.0, the capacities are 256 KB for generation 0, 2 MB for generation 1, and 10 MB for generation 2. Note that these aren't fixed, and the garbage collector can dynamically adjust these thresholds based on an application's patterns of allocations.

The Large Object Heap

All allocations of 85 KB or larger use a separate heap called the large object heap, which is independent of the main managed heap. Using a separate heap for larger objects makes garbage collection of the main managed heap more efficient, because collection requires moving memory and moving large blocks of memory is expensive.

In release 1.0 of .NET, the large object heap is never compacted, even when garbage collections occur. This means that if you allocate 5 MB of memory for an object, the large object heap will expand to be 5 MB in size. Even when the object is no longer referenced, the large object heap doesn't decommit the virtual memory and remains at 5 MB. If you allocate a smaller block later on, say 1 MB, the new block will be allocated within the 5 MB already reserved for the large object heap. In other words, the large object heap will always grow to hold all the current allocations, but it will never shrink.

Roots

The garbage collector has to know which blocks of memory it can collect. Some collectors use a flag on each allocated block to indicate whether or not the block can be collected. The .NET collector, on the other hand, maintains a tree of references that tracks the objects referenced by the application.

Every .NET application has a set of roots. Each root either refers to an object on the managed heap or is set to null. An application's roots include the following:

  • Global object pointers

  • Static object pointers

  • Local variables and reference object parameters

  • CPU registers

Figure 12-1 shows a tree of roots for an application. An object that is no longer referenced is not linked into the tree.

Figure 12-1: The application roots link all the memory used by an application.

An object is rooted if it can be reached, directly or through a chain of references, from at least one of the application's roots. Before it performs a garbage collection, the collector has to know which objects are still in use. To find these objects, it follows chains of references from the application's roots: every object referenced by a root can reference other objects; these in turn may reference further objects, and so on.

By following these reference chains, the collector can find all the objects that are reachable from the roots. Any objects that do not appear in this list are, by definition, not reachable from any live object. Because these objects cannot be referenced by any application code, it is safe for the collector to release the memory allocated to them. Figures 12-2 and 12-3 show how space allocated to objects that cannot be reached from the roots is reclaimed and how the heap is compacted during a collection.

Figure 12-2: Objects 2, 7, and 9 on the managed heap cannot be reached from the application's roots, so they are candidates for garbage collection.
Figure 12-3: After a collection, the heap has been compacted.

Many garbage collectors simply use a flag or a reference count to indicate whether a block of memory is being used. Although that can be more efficient, the .NET approach has an advantage in that it is completely accurate. By following the reference chains from the application roots, the collector ensures that no live objects are collected by mistake.

Objects can be strongly or weakly rooted. A strongly rooted object is one to which there is a live reference; the garbage collector will not collect strongly rooted objects. A weakly rooted object, also known as a weak reference, is one that the application can still access but that the garbage collector remains free to collect. This effectively means that weakly rooted objects can be used until they are collected by the garbage collector. Weakly rooted objects are useful for data structures that are large but relatively easy to construct. You can create such an object and use it; if the garbage collector needs to reclaim memory, it will collect the structure, and the next time you want to use it, you'll have to rebuild it.
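The WeakReference class provides this behavior. The sketch below caches a large array weakly and rebuilds it if the collector has reclaimed it; the array simply stands in for whatever expensive-but-reconstructible data you might cache:

using System;

class WeakCacheSketch
{
    static WeakReference cache;

    static int[] GetData()
    {
        // Target returns null once the garbage collector has reclaimed the array.
        int[] data = cache != null ? (int[])cache.Target : null;
        if (data == null)
        {
            data = BuildData();                  // rebuild after a collection
            cache = new WeakReference(data);     // hold it only weakly
        }
        return data;
    }

    static int[] BuildData()
    {
        return new int[100000];                  // stands in for expensive construction work
    }
}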

You can find more details of how .NET garbage collection works in the 'Garbage Collection' articles Jeffrey Richter wrote for MSDN Magazine, which you can find in the MSDN library at http://msdn.microsoft.com/msdnmag/find/tech.aspx?phrase=.net . You might also find useful information in the MSDN article 'Production Debugging for .NET Framework Applications,' which can be found at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/DBGch02.asp .

Finalization

If a type needs to free up resources, such as sockets or file handles, when instances are destroyed, the programmer can implement Finalize and Dispose methods for the type.

Note 

You need these methods only if a type holds unmanaged resources. If a type has references only to other managed types, cleanup will occur automatically.

If implemented for a type, the Finalize method is called by the garbage collector when an object is collected. You can't tell when this will happen, so Finalize should not be relied on if the timing of resource cleanup matters. When an object has a finalizer, its memory isn't freed until after the next collection, and all of its child objects are kept alive until that point. In other words, an object with a finalizer lives through one more collection than an object without one.

The Dispose method is part of the IDisposable interface. It provides a function that can be called explicitly by the programmer so that an object's resources can be freed at a known point. Resources held by the object will be freed during the call to Dispose rather than at the next collection.

When the garbage collector determines which objects aren't reachable from any of the program roots, it places those that have finalizers in the finalizer queue. A separate thread is used to walk down this queue, calling Finalize on the objects in turn. The programmer has no control over the order in which objects are placed in the queue or when the thread runs. This process has several possible consequences:

  • An object could remain in the finalizer queue for some time before its Finalize method is called.

  • Objects that need to free their resources in a particular sequence might not be able to do so. With this process, you have no idea when finalization will occur relative to other objects in the queue.

  • Small objects could hold pointers to large amounts of memory or other resources, and these might not be freed until the object is finalized.
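In test or diagnostic code you can force the queue to drain, although relying on this in production code is not recommended. A common sketch, assuming you simply want every pending finalizer to have run, looks like this:

using System;

class DrainFinalizerQueue
{
    static void Main()
    {
        // Force a collection, which places unreachable finalizable objects
        // on the finalizer queue...
        GC.Collect();

        // ...block until the finalizer thread has emptied the queue...
        GC.WaitForPendingFinalizers();

        // ...and collect again so the memory of the finalized objects is reclaimed.
        GC.Collect();
    }
}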

You should not call any methods on other objects in a finalizer, with the exception of base class methods, because you don't know whether the objects you're calling have themselves been finalized yet.

You will often find it far more efficient to implement a Dispose method and also provide a finalizer as a backup, in case Dispose does not get called. The following sample class definition shows how this can be implemented:

class Test : IDisposable
{
    private bool bDisposed = false;

    public void Dispose()
    {
        // Release the resources
        InternalDispose(true);
        // Suppress finalization for this object
        GC.SuppressFinalize(this);
    }

    protected virtual void InternalDispose(bool bFreeAll)
    {
        if (bFreeAll)
        {
            // Free managed resources
        }
        // Free unmanaged resources
        bDisposed = true;
    }

    // Destructor
    ~Test()
    {
        InternalDispose(false);
    }

    // Methods must check if the object has been disposed
    public void DoSomething()
    {
        if (bDisposed)
            throw new ObjectDisposedException("Test");
    }
}

This class shows a pattern that can be used to construct classes that can be disposed or finalized. The class implements IDisposable, which means it needs to provide a public Dispose method that takes no parameters. This method should not be declared virtual because derived classes should not be able to override it. Note how Dispose calls GC.SuppressFinalize. This method tells the runtime not to call the object's finalizer; it is called here because all resources have already been freed, so there is no need for further finalization.

The class also defines a destructor, which will be called when the object is finalized. Both the destructor and Dispose use the protected InternalDispose method to free up resources. InternalDispose takes a Boolean argument. When called by Dispose , the argument will be true and the method will release all resources held by the object whether they are managed or unmanaged. For managed resources, this will typically be accomplished by calling the Dispose method on member objects. If InternalDispose is called through the destructor when the object is being finalized, a false argument is passed and the method does not attempt to dispose of managed resources, leaving them to be finalized. Derived classes should implement their own InternalDispose to free their own resources, and they must call the base class method before returning.

Finally, all class methods must check whether the bDisposed flag has been set. If it has, the object is no longer valid for use and an ObjectDisposedException should be thrown. Note that the developer has to implement this check in every class method: there is no way for an object to throw an ObjectDisposedException automatically.
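Calling code would then typically dispose of the object explicitly, most conveniently with a C# using statement, which guarantees that Dispose is called even if an exception is thrown:

class Program
{
    static void Main()
    {
        // The using statement guarantees Dispose is called on the Test
        // object defined above, even if DoSomething throws an exception.
        using (Test t = new Test())
        {
            t.DoSomething();
        }   // t.Dispose() has run by this point
    }
}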

Destructors in Visual C# and Managed C++

You use the familiar C++ syntax to declare a destructor in both Visual C# and managed C++ classes. These destructors, however, do not operate in the same way as traditional C++ destructors.

You'll find that you cannot override the protected Finalize method in Visual C# and managed C++ classes. To provide a finalizer, you implement a destructor containing your finalization code. The compiler converts the destructor into an implementation of Finalize.
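For instance, a C# finalizer is written with destructor syntax like this; the compiler generates the corresponding Finalize override, including a call to the base class Finalize:

class Resource
{
    // C# destructor syntax; the compiler turns this into a protected
    // Finalize override that also calls the base class Finalize.
    ~Resource()
    {
        // Finalization code goes here, typically releasing unmanaged resources.
    }
}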

 
 

