.NET Memory Management | Introducing Microsoft .NET (Pro-Developer)

One of the main sources of nasty, difficult-to-find bugs in modern applications is incorrect use of manual memory management. Older languages such as C++ required programmers to manually delete objects that they had created, which led to two main problems. First, programmers would create an object and forget to delete it when they finished using it. These leaks eventually consumed a process’s entire memory space and caused it to crash. Second, programmers would manually delete an object but then mistakenly try to access its memory location later. Visual Basic would have detected the reference to invalid memory immediately, but C++ often doesn’t. Sometimes the transistors that had made up the deleted object’s memory would still contain plausible values, and the program would continue to run with corrupted data. These mistakes seem painfully obvious in the trivial examples discussed here, and it’s easy to say, “Well, just don’t do that, you doofus.” But in real programs, you often create an object in one part of the program and delete it in another, with complex logic intervening that deletes the object in some cases but not others. Both of these bugs are devilishly difficult to reproduce and harder still to track down. Programming discipline helps, of course, but we’d really like some way to keep our programmers thinking about our business logic, not about resource management. You can bet that Julia Child, the grande dame of TV chefs, hires someone to clean up her kitchen when she’s done with it so that she can concentrate on the parts of cooking that require her unique problem-domain expertise.

Manual memory management leads to costly, hard-to-find bugs.

Modern languages such as Visual Basic and Java don’t have this type of problem. These languages feature “fire-and-forget” automatic memory management, which is one of the main reasons that programmers select them for development. A Visual Basic 6.0 programmer doesn’t have to remember to delete the objects that she creates in almost all cases. (Remember that “almost”; it will figure into an important design decision later.) Visual Basic 6.0 counts the references to each object and automatically deletes the object and reclaims its memory when its count reaches zero. Her development environment provides her with an automatic scullery maid cleaning the used pots and pans out of her sink and placing them back on her shelves. Wish I could get the same thing for my real kitchen. Maybe if you tell all your friends to buy this book....

Automatic memory management and resource recovery of the type built into Visual Basic and Java is a very useful feature.

Microsoft has made automatic memory management part of the .NET common language runtime, which allows it to be used from any language. It’s conceptually simple, as shown in Figure 2-16.

Figure 2-16: Automatic memory management with garbage collection.

A programmer creates an object using the new operator and receives a reference to it. The common language runtime allocates that object’s memory from the managed heap, a portion of a process’s memory reserved by the runtime for this purpose. Every so often, a system thread examines all the objects in the managed heap to see which of them the program still holds outstanding references to. An object to which all references have disappeared is called garbage and is removed from the managed heap. The objects remaining in the managed heap are then compacted together, and the existing references in the program fixed to point to their new location. The entire operation is called garbage collection. It solves the aforementioned problems of manual memory management without you having to write any code. You can’t forget to delete an object because the system cleans up after you. And you can’t access a deleted object through an invalid reference because the object won’t be deleted as long as you hold any reference to it. Obviously, garbage collection is going to take more CPU cycles to run than just a standard in-out heap allocator, even though it is written to ensure that it doesn’t check an object twice or get caught in circular object references. As I said previously, I think this is a good investment of CPU cycles because it gets you faster development time with fewer bugs.

The common language runtime garbage collector makes automatic memory management available to any application.

This magical collection of garbage takes place when the garbage collector darn well feels like it. Apart from detecting no more available memory in the managed heap in response to an allocation request, no one really knows what the exact algorithm is for launching a garbage collection, and I wouldn’t be surprised to see it vary from one version to another of the released product. You can force a garbage collection manually by calling the function System.GC.Collect. You might want to make this call at logical points in your program; for example, to clear away the debris just after a user saves a file or perhaps to clear the decks just before starting a large operation. Most of the time you just let the garbage collector do its thing when it wants to.

The garbage collector runs when it feels like it, but you can force a garbage collection manually.
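Here is a minimal sketch of forcing a collection at a logical point in a program. The module and function names are my own inventions for illustration, and the call to GC.CollectionCount assumes version 2.0 or later of the .NET Framework. The byte counts it prints will vary from run to run, so treat them as a rough indication only.

```vbnet
Imports System

Public Module ForcedCollectionSketch

    ' Forces a full collection and returns how many generation-0
    ' collections the runtime performed during this call.
    Public Function ForceCollection() As Integer
        Dim before As Integer = GC.CollectionCount(0)

        ' Ask the runtime to collect every generation right now.
        GC.Collect()
        ' Block until the finalizer thread has emptied its queue.
        GC.WaitForPendingFinalizers()

        Return GC.CollectionCount(0) - before
    End Function

    Public Sub Main()
        ' Create some short-lived garbage, as a program might while saving a file.
        For i As Integer = 1 To 10000
            Dim scratch As New Object()
        Next

        Console.WriteLine("Managed heap before: " & GC.GetTotalMemory(False).ToString() & " bytes")
        Dim collections As Integer = ForceCollection()
        Console.WriteLine("Managed heap after:  " & GC.GetTotalMemory(False).ToString() & " bytes")
        Console.WriteLine("Collections forced:  " & collections.ToString())
    End Sub

End Module
```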

Automatic garbage collection looks great so far, but it leaves us with one gaping hole. What about the cleanup that an object needs to do when it gets destroyed? C++ applications usually cleaned up in an object’s destructor, and Visual Basic classes did the same thing in their Class_Terminate methods. This is a good location for cleanup code because a client can’t forget to call it, but how can we handle this with automatic garbage collection? First, let’s realize that the problem has gotten considerably smaller. The main cleanup task we performed in C++ destructors was to delete additional objects to which the destructing object held references, and now garbage collection takes care of that for us automatically. But occasionally we’ll need to do some cleanup that doesn’t involve local garbage-collected resources; for example, releasing a database connection or logging out from a remote system.

Before garbage collection, we often put cleanup code in an object’s destructor or Class_Terminate method.

The common language runtime garbage collection supports the notion of a finalizer, an object method that is called when the object is garbage collected. It is somewhat analogous to a C++ class destructor and also to the Visual Basic Class_Terminate method, both of which it replaces. However, a finalizer is significantly different from both of these other mechanisms in ways you may find unsettling. The universal runtime base class System.Object contains a method called Finalize, which we override as shown in Listing 2-10. When the object is garbage collected, the garbage collection thread detects the fact that our object has a Finalize method and calls it, thereby executing our cleanup code. Although early versions didn’t do it, the released version of .NET calls all outstanding finalizers automatically when an application shuts down.

The garbage collector supports an object finalizer method for necessary cleanup code.

Listing 2-10: Providing a Finalize function in an object.

    Protected Overrides Sub Finalize()
        ' Perform whatever finalization logic we need.
        MessageBox.Show("In Finalize, my number = " + _
            MyObjectNumber.ToString())
        ' Forward the call to our base class.
        MyBase.Finalize()
    End Sub

In C#, you supply a finalizer by writing what looks like an ordinary destructor, but under the hood your compiler is overriding the Finalize method and it behaves as a garbage-collected finalizer and not a deterministic destructor as in C++. This is the only case I’ve ever seen in which Visual Basic code provides a clearer view of what’s really going on behind the scenes than a C-family language does.

Note

Finalizers look simple, but their internal behavior is actually quite complex and it’s fairly easy to mess them up. If you are planning on using them, you MUST read Jeffrey Richter’s account of garbage collection in his book, Applied Microsoft .NET Framework Programming (Microsoft Press, 2002). The fact that it took him a whole chapter to describe it should tell you something about the internal complexity of garbage collection, even if, or perhaps because, its connection to your program is so simple.

Using a finalizer has some disadvantages as well. Obviously it consumes CPU cycles, so you shouldn’t use it if you have no cleanup to do. There is no way to guarantee the order in which the garbage collector calls the finalizers of garbage objects, so don’t depend on one object finalizing before or after another, regardless of the order in which the last reference to each of them disappeared. Finalizers are called on a separate garbage-collector thread within your application, so you can’t do any of your own serialization to enforce a calling order or you’ll break the whole garbage collection system in your process. Since your object became garbage, the objects that it holds might have become garbage too unless you’ve taken steps to prevent that from happening. Don’t plan on calling any other object in your application from your finalizer unless you’ve explicitly written code to ensure that someone is holding a reference to keep the other object from becoming garbage. Don’t plan on throwing exceptions (see the section about structured exception handling later in this chapter) from your finalizer; no one is listening to you any more, you garbage object, you. And make sure you catch any exceptions generated by your cleanup code so that you don’t disturb the garbage collector’s thread that calls your finalizer.

Using a finalizer can be trickier than it looks.
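The following sketch shows one way to write a finalizer that follows the advice above: it touches only its own state, catches whatever its cleanup code throws, and always forwards the call to its base class. RemoteSession, LogOut, and FinalizedCount are hypothetical names that I made up for the illustration, and the cleanup deliberately fails so that you can see the finalizer contain the error. The demo assumes that a forced collection finds the abandoned object, which is the usual behavior but not something the runtime promises.

```vbnet
Imports System
Imports System.Runtime.CompilerServices
Imports System.Threading

' Hypothetical wrapper around some non-memory resource.
Public Class RemoteSession

    ' Counts finalizer runs so the demo can observe them.
    Public Shared FinalizedCount As Integer = 0

    ' Stand-in for real cleanup such as logging out of a remote system.
    ' It throws on purpose to show that the finalizer contains the failure.
    Private Sub LogOut()
        Throw New InvalidOperationException("Remote system unavailable")
    End Sub

    Protected Overrides Sub Finalize()
        Try
            ' Touch only state this object owns; other managed objects
            ' it references may already have been finalized.
            LogOut()
        Catch ex As Exception
            ' Swallow the error: no one upstream is listening.
        Finally
            Interlocked.Increment(FinalizedCount)
            ' Always forward the call to our base class.
            MyBase.Finalize()
        End Try
    End Sub

End Class

Public Module FinalizerSketch

    ' Creates a session and drops the only reference to it. Keeping this
    ' in its own non-inlined method stops the caller from holding the object.
    <MethodImpl(MethodImplOptions.NoInlining)> _
    Private Sub CreateAndAbandon()
        Dim session As New RemoteSession()
    End Sub

    ' Returns the number of finalizers that have run so far.
    Public Function RunDemo() As Integer
        CreateAndAbandon()
        GC.Collect()
        GC.WaitForPendingFinalizers()
        Return RemoteSession.FinalizedCount
    End Function

    Public Sub Main()
        Console.WriteLine("Finalizers run: " & RunDemo().ToString())
    End Sub

End Module
```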

Finalizers are fine if we don’t care when our cleanup gets done, if “eventually, by the time you really need it, I promise” is soon enough. Sometimes this is OK, but it isn’t so good if the resources that a finalizer would recover are scarce in the running process—database connections, for example. Eventual recovery isn’t good enough; we need this object shredded NOW so that we can recover its expensive resources that the generic garbage collector doesn’t know about. We could force an immediate garbage collection, as discussed previously, but that requires examining the entire managed heap, which can be quite expensive even if there’s nothing else to clean up. Since we know exactly which object we want to dismantle, we’d like a way of cleaning up only that object, as Julia often wipes off her favorite paring knife without having to clean up her entire kitchen (including taking out the garbage). This operation goes by the grand name of deterministic finalization. Objects that want to support deterministic finalization do so by implementing an interface called IDisposable, which contains the single method, Dispose. In this method, you place whatever code you need to release your expensive resources. The client calls this method to tell the object to release those resources right now. For example, all Windows Forms objects that represent a window in the underlying operating system support this feature to enable quick recovery of the operating system window handle that they contain.

Sometimes you will see an object provide a different method name for deterministic finalization in order for the name to make sense to a developer who needs to figure out which method to call. For example, calling Dispose on a file object would make you think that you were shredding the file, so the developer of such an object will provide deterministic finalization through a method with the more logical name of Close.
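From the client’s side, deterministic finalization might look like the sketch below. ScarceConnection is a hypothetical class standing in for something like a database connection, not a real framework type. The Try...Finally block makes sure that Dispose gets called even if the work in between throws an exception.

```vbnet
Imports System

' Hypothetical class standing in for a scarce resource.
Public Class ScarceConnection
    Implements IDisposable

    Private _isOpen As Boolean = True

    Public ReadOnly Property IsOpen() As Boolean
        Get
            Return _isOpen
        End Get
    End Property

    Public Sub Dispose() Implements IDisposable.Dispose
        ' Release the expensive resource immediately.
        _isOpen = False
    End Sub

End Class

Public Module ClientSketch

    ' Uses the connection and guarantees that Dispose runs,
    ' even if the work in the Try block throws.
    Public Function UseConnection() As ScarceConnection
        Dim conn As New ScarceConnection()
        Try
            Console.WriteLine("Working, open = " & conn.IsOpen.ToString())
        Finally
            conn.Dispose()
        End Try
        Return conn
    End Function

    Public Sub Main()
        Dim conn As ScarceConnection = UseConnection()
        Console.WriteLine("After Dispose, open = " & conn.IsOpen.ToString())
    End Sub

End Module
```

Later versions of Visual Basic (2005 onward) provide a Using...End Using block that generates the equivalent Try...Finally for you.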

Deterministic finalization sounds like a good idea, but it has its own drawbacks. You can’t be sure that a client will remember to call your Dispose method, so you need to provide cleanup functionality in your finalizer as well. However, if your client does call Dispose, you probably don’t want the garbage collector to waste its time calling your object’s finalizer, as the cleanup should have already been done by the Dispose method. By calling the function System.GC.SuppressFinalize, you tell the garbage collector not to bother calling your finalizer even though you have one. A Visual Basic object also needs to expressly forward the Dispose call to its base class if the base class contains a Dispose method, as the call won’t otherwise get there and you will fail to release the base class’s expensive resources. (A C# destructor forwards the Finalize call to its base class automatically, but a C# Dispose method needs the same explicit forwarding.) A sample Dispose method is shown in Listing 2-11. This class is derived from System.Object, which doesn’t contain a Dispose method, so I’ve omitted the code that would forward that call.

An object that wants to provide a deterministic way for a client to release its resources exposes a method called Dispose.

Listing 2-11: Sample Dispose method for deterministic finalization.

    Public Class Class1
        Implements System.IDisposable

        Public Sub Dispose() Implements System.IDisposable.Dispose
            ' Do whatever logic we need to do to immediately free up
            ' our resources.
            MessageBox.Show("In Dispose(), my number = " + _
                MyObjectNumber.ToString())

            ' If our base class contained a Dispose method, we'd
            ' forward the call to it by uncommenting the following line.
            ' MyBase.Dispose()

            ' Mark our object as no longer needing finalization.
            System.GC.SuppressFinalize(Me)
        End Sub
    End Class
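Because a client might forget to call Dispose, the finalizer has to be able to do the same cleanup. One way to arrange that, sketched here with the hypothetical names ResourceHolder, ReleaseResources, and CleanupCount, is to put the cleanup in a single private routine that both paths call and that refuses to run twice. This is a deliberate simplification; the .NET design guidelines describe a fuller variant built around an overloaded Dispose(Boolean) method.

```vbnet
Imports System
Imports System.Threading

Public Class ResourceHolder
    Implements IDisposable

    ' Counts cleanups so the sketch can show that cleanup runs only once.
    Public Shared CleanupCount As Integer = 0

    Private _cleanedUp As Boolean = False

    ' Single cleanup routine shared by Dispose and Finalize.
    Private Sub ReleaseResources()
        If _cleanedUp Then Return
        _cleanedUp = True
        ' Real code would release its expensive resource here.
        Interlocked.Increment(CleanupCount)
    End Sub

    Public Sub Dispose() Implements IDisposable.Dispose
        ReleaseResources()
        ' Cleanup is done, so the finalizer no longer needs to run.
        GC.SuppressFinalize(Me)
    End Sub

    Protected Overrides Sub Finalize()
        Try
            ' Safety net for clients that never called Dispose.
            ReleaseResources()
        Finally
            MyBase.Finalize()
        End Try
    End Sub

End Class

Public Module DisposePatternSketch

    ' Disposes one holder twice and reports how many cleanups actually ran.
    Public Function CleanupsForOneObject() As Integer
        Dim before As Integer = ResourceHolder.CleanupCount
        Dim holder As New ResourceHolder()
        holder.Dispose()
        holder.Dispose()
        Return ResourceHolder.CleanupCount - before
    End Function

    Public Sub Main()
        Console.WriteLine("Cleanups run: " & CleanupsForOneObject().ToString())
    End Sub

End Module
```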

I’ve written a small sample program that illustrates the concepts of automatic memory management and garbage collection. You can download it from this book’s Web site. A picture of the client app is shown in Figure 2-17. Note that calling Dispose does not make an object garbage. In fact, by definition, you can’t call Dispose on an object that is garbage because then you wouldn’t have a reference with which to call Dispose. The object won’t become garbage until no more references to it exist, whenever that may be. I’d suggest that your object maintain an internal flag to remember when it has been disposed of and to respond to any other access after its disposal by throwing an exception.

You have to write code to handle the case in which a client accesses your object after calling Dispose on it.
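A minimal sketch of that internal flag follows. GuardedObject and ThrowIfDisposed are hypothetical names of my own; ObjectDisposedException is the framework’s standard exception for this situation. Every public member checks the flag before doing any work.

```vbnet
Imports System

Public Class GuardedObject
    Implements IDisposable

    ' Remembers whether Dispose has already been called.
    Private _disposed As Boolean = False
    Private _number As Integer

    Public Sub New(ByVal number As Integer)
        _number = number
    End Sub

    Public ReadOnly Property MyObjectNumber() As Integer
        Get
            ThrowIfDisposed()
            Return _number
        End Get
    End Property

    ' Rejects any access that arrives after disposal.
    Private Sub ThrowIfDisposed()
        If _disposed Then
            Throw New ObjectDisposedException("GuardedObject")
        End If
    End Sub

    Public Sub Dispose() Implements IDisposable.Dispose
        ' Real code would release its expensive resources here.
        _disposed = True
    End Sub

End Class

Public Module GuardSketch

    ' Returns True if using the object after Dispose raised
    ' ObjectDisposedException.
    Public Function AccessAfterDisposeThrows() As Boolean
        Dim obj As New GuardedObject(42)
        obj.Dispose()
        Try
            Dim n As Integer = obj.MyObjectNumber
            Return False
        Catch ex As ObjectDisposedException
            Return True
        End Try
    End Function

    Public Sub Main()
        Console.WriteLine("Access after Dispose throws: " & AccessAfterDisposeThrows().ToString())
    End Sub

End Module
```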

Figure 2-17: Memory management client application.

While automatic garbage collection makes the simple operations of allocating and freeing objects easier to write and harder to mess up than they were in C++, it makes deterministic finalization harder to write and easier to mess up than it was in Visual Basic 6. C++ programmers will probably consider this a great profit, while Visual Basic programmers, who are used to their automatic, almost foolproof behavior also being deterministic, may at first consider it a step back. The reason that Microsoft switched to garbage collection is that Visual Basic’s reference counting algorithm didn’t correctly handle the case of circular object references, as in the case where a child object holds a reference to its parent. Suppose object A creates object B, object B creates object C, and object C obtains and holds a reference to its parent, object B. Suppose that object A now releases object B. Object B won’t be destroyed now because object C still holds a reference to it, and C won’t let go until B lets go of it. Unless a programmer writes code to break the circular reference before A lets go, both B and C are leaked away, orphans with no references except their hold on each other, which keeps them both alive. The garbage collection algorithm will automatically detect and handle this circular reference case, while reference counting will not. After much discussion of alternatives and banging of heads against walls, Microsoft decided that foolproof, automatic leak prevention in all cases was more important than easy determinism. Some programmers will agree, others won’t, but the choice was carefully reasoned and not capricious. After an initial period of suspicion, I’m coming around to this way of thinking. I find that I don’t need deterministic finalization very often, and as a refugee from C++ memory leaks, I REALLY love the fire-and-forget nature of garbage collection.

Microsoft decided on garbage collection memory management to make it leak proof, even at the cost of easy determinism.
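The circular reference case can be sketched as follows, with the hypothetical Node class playing the part of objects B and C. A WeakReference lets the program watch object B without keeping it alive. The demo assumes that a forced collection reclaims the abandoned pair, which is the usual result but depends on the runtime and build settings.

```vbnet
Imports System
Imports System.Runtime.CompilerServices

' Hypothetical class whose instances can point at each other.
Public Class Node
    Public Parent As Node
    Public Child As Node
End Class

Public Module CycleSketch

    ' Builds B and C so that each references the other, then lets the only
    ' outside references go out of scope. The WeakReference that comes back
    ' tracks B without keeping it alive.
    <MethodImpl(MethodImplOptions.NoInlining)> _
    Private Function BuildAndAbandonCycle() As WeakReference
        Dim b As New Node()
        Dim c As New Node()
        b.Child = c
        c.Parent = b
        Return New WeakReference(b)
    End Function

    ' Returns True if the garbage collector reclaimed B even though
    ' C still held a reference to it.
    Public Function CycleWasCollected() As Boolean
        Dim weakB As WeakReference = BuildAndAbandonCycle()
        GC.Collect()
        GC.WaitForPendingFinalizers()
        Return Not weakB.IsAlive
    End Function

    Public Sub Main()
        Console.WriteLine("Cycle collected: " & CycleWasCollected().ToString())
    End Sub

End Module
```

Under reference counting, B and C would each keep the other’s count at one, and neither would ever be freed.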

Tips from the Trenches

The adoption of garbage collection has gone pretty much as I expected. My customers report that they absolutely love its automatic fire-and-forget nature, but they hate the deterministic finalization design pattern because clients often forget to call Dispose. Remember, you need deterministic finalization only in two cases: where your object is a wrapper for an expensive resource, or where you need to enforce object cleanup in a certain order. A somewhat hacky solution to the former case is to place a static counter in the class that wraps expensive resources, increment that counter in the class’s constructor, and, when it hits a certain value, reset it and force a garbage collection.
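That counter hack might be sketched like this. ExpensiveWrapper, the threshold of 10, and the helper function are all assumptions of mine for illustration, and GC.CollectionCount assumes version 2.0 or later of the .NET Framework. A real wrapper would also carry a finalizer that releases its resource, which is what the forced collection gives a chance to run.

```vbnet
Imports System
Imports System.Threading

Public Class ExpensiveWrapper

    ' Assumed threshold; tune it for the resource in question.
    Private Const CollectEvery As Integer = 10

    ' Number of wrappers created since the last forced collection.
    Private Shared _created As Integer = 0

    Public Sub New()
        ' When the counter hits the threshold, reset it and force a
        ' collection so that abandoned wrappers get finalized.
        If Interlocked.Increment(_created) >= CollectEvery Then
            Interlocked.Exchange(_created, 0)
            GC.Collect()
            GC.WaitForPendingFinalizers()
        End If
    End Sub

End Class

Public Module CounterSketch

    ' Creates the given number of wrappers and reports how many
    ' generation-0 collections happened while doing so.
    Public Function CollectionsWhileCreating(ByVal count As Integer) As Integer
        Dim before As Integer = GC.CollectionCount(0)
        For i As Integer = 1 To count
            Dim wrapper As New ExpensiveWrapper()
        Next
        Return GC.CollectionCount(0) - before
    End Function

    Public Sub Main()
        Console.WriteLine("Collections while creating 10 wrappers: " & _
            CollectionsWhileCreating(10).ToString())
    End Sub

End Module
```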