Take Out the Trash: Theory of Garbage Collection

Memory management used to be a painful exercise in application development, not to mention a source of security flaws. Applications with poor memory management leak memory, consuming ever more system resources until performance suffers or the application, and sometimes the whole system, fails. .NET solves this problem by using a generational garbage collector to manage all memory that the application allocates from the managed heap. This frees the developer from the tedious task of ensuring that allocated memory is properly released back to the system.

Allocation and Reachability

When an object is created, the runtime allocates memory for it from the managed heap or, if necessary, grows the heap to accommodate the allocation. Unlike COM, which tracks object lifetimes with reference counts, the .NET garbage collector does not count references to each object. Instead, when it performs a collection, it starts from the application's roots (static fields, local variables and parameters on the stack, and CPU registers) and follows references to find every object that can still be reached. Any object that cannot be reached from a root, including a group of objects that refer only to each other, is garbage, and the memory it occupies is reclaimed.
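Because the .NET GC decides what to collect by tracing reachability from roots rather than by counting references, objects that refer to each other are not a problem. The sketch below, in which the Node class and the method names are hypothetical, builds a two-object cycle that a pure reference-counting scheme would never free, drops every root to it, and then forces a collection with GC.Collect (used here only for demonstration). A WeakReference lets the code observe whether the object survived without keeping it alive.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical node type, used only for this illustration.
class Node
{
    public Node Other;
}

static class ReachabilityDemo
{
    // Builds two objects that reference each other and returns only a weak
    // reference to one of them, so no root keeps the pair alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference MakeCycle()
    {
        var a = new Node();
        var b = new Node();
        a.Other = b;
        b.Other = a; // with reference counting, neither count would ever drop to zero
        return new WeakReference(a);
    }

    public static void Main()
    {
        WeakReference weak = MakeCycle();
        GC.Collect(); // force a full collection; the cycle is unreachable from any root
        Console.WriteLine($"Cycle still alive: {weak.IsAlive}");
    }
}
```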

Generations

To provide efficient memory management, the garbage collector (GC) is based on the premise that newly created objects tend to have short lifetimes. Most new objects are therefore allocated in the lowest generation of the managed heap, generation 0. Figure 2.2 shows a simplified view of the managed heap.

Figure 2.2. Basic view of the managed heap and generations.

Objects that survive a collection are promoted one step up the generation ladder: from generation 0 to generation 1, and from generation 1 to generation 2, where they remain. Collecting a given generation also collects every generation below it. The lowest generation, generation 0, is collected most frequently, while generation 2 is collected least often.
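Promotion can be observed with GC.GetGeneration and GC.MaxGeneration. The sketch below (TrackGenerations is a hypothetical helper) keeps an object referenced while forcing collections; on a typical runtime its generation climbs from 0 toward GC.MaxGeneration, which is 2 in current versions of .NET, and then stays there.

```csharp
using System;

static class GenerationDemo
{
    // Records the generation of 'obj' before and after each of several
    // forced collections. 'obj' stays referenced, so it survives every one.
    public static int[] TrackGenerations(object obj, int collections)
    {
        var result = new int[collections + 1];
        result[0] = GC.GetGeneration(obj); // a freshly allocated small object is in generation 0
        for (int i = 1; i <= collections; i++)
        {
            GC.Collect();
            result[i] = GC.GetGeneration(obj);
        }
        GC.KeepAlive(obj);
        return result;
    }

    public static void Main()
    {
        Console.WriteLine($"Highest generation: {GC.MaxGeneration}");
        int[] gens = TrackGenerations(new object(), 3);
        Console.WriteLine(string.Join(" -> ", gens));
    }
}
```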

Collection

The GC implements both full and partial collections. In a full collection cycle, program execution is stopped. Stopping execution leaves the GC free to move objects in memory and fix up the references to them without affecting the executing program. During a full collection cycle, the GC identifies all live and dead objects in every generation. Live objects are promoted a generation (objects already in generation 2 stay there) and the memory occupied by dead objects is reclaimed. Because a full collection is very expensive, the GC also implements a partial collection algorithm to keep most collections fast.
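GC.Collect(int) forces a collection of generation 0 through the given generation, and GC.CollectionCount(int) reports how many collections each generation has undergone, so together they make the full/partial distinction visible. In the sketch below (CountDelta is a hypothetical helper), a full collection should increase the counter of every generation, because collecting a generation also collects the ones below it, while a generation 0 collection normally leaves the generation 2 counter unchanged.

```csharp
using System;

static class CollectionCountDemo
{
    // Forces a collection of generations 0 through 'generation' and returns
    // how much each per-generation collection counter changed.
    public static int[] CountDelta(int generation)
    {
        var before = new int[GC.MaxGeneration + 1];
        for (int g = 0; g <= GC.MaxGeneration; g++)
            before[g] = GC.CollectionCount(g);

        GC.Collect(generation);

        var delta = new int[GC.MaxGeneration + 1];
        for (int g = 0; g <= GC.MaxGeneration; g++)
            delta[g] = GC.CollectionCount(g) - before[g];
        return delta;
    }

    public static void Main()
    {
        Console.WriteLine($"Partial (generation 0): {string.Join(", ", CountDelta(0))}");
        Console.WriteLine($"Full (generation {GC.MaxGeneration}): {string.Join(", ", CountDelta(GC.MaxGeneration))}");
    }
}
```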

Partial Collection

Partial collection works under the premise that objects in generation 0 are more likely to be short-lived than those in generation 1, and so on up the generation ladder. The GC starts from the roots (references to objects held in static fields, on the stack, and in CPU registers) and determines which objects can be reached and which cannot. If the premise holds, most generation 0 objects are already dead by the time they are examined, so collecting generation 0 frequently reclaims a lot of memory for little work. Therefore, the GC collects generation 0 far more often than generations 1 and 2.
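The flip side of a partial collection is that garbage in an older generation waits for a collection that includes that generation. The sketch below (the method names are hypothetical) roots an object in a static field, lets it survive two forced collections so that it is normally promoted to generation 2, and then drops the root. A generation 0 collection normally leaves the object in place because that generation is not examined; the runtime may still choose to collect more than requested, so treat the middle result as illustrative. A full collection reclaims it.

```csharp
using System;

static class PartialCollectionDemo
{
    // A static field is a GC root: whatever it references stays reachable.
    static object s_root;

    // Allocates an object, keeps it rooted, and lets it survive two forced
    // collections so that it is normally promoted to generation 2.
    public static WeakReference PromoteToOldGeneration()
    {
        s_root = new byte[100];
        GC.Collect();
        GC.Collect();
        return new WeakReference(s_root);
    }

    // Removes the only root, turning the object into garbage.
    public static void ReleaseRoot() => s_root = null;

    public static void Main()
    {
        WeakReference weak = PromoteToOldGeneration();
        Console.WriteLine($"Generation: {GC.GetGeneration(weak)}");

        ReleaseRoot();
        GC.Collect(0); // partial collection: only generation 0 is examined
        Console.WriteLine($"Alive after generation 0 collection: {weak.IsAlive}");

        GC.Collect();  // full collection: every generation is examined
        Console.WriteLine($"Alive after full collection: {weak.IsAlive}");
    }
}
```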

Nondeterministic Finalization

A frequently heated discussion about the GC revolves around the topic of nondeterministic finalization. Essentially, nondeterministic finalization means that you cannot know exactly when an object is going to be reclaimed. At first sight, this might not seem like such an important topic. However, when you consider dealing with expensive resources such as database connections, socket connections, and graphics resources, this quickly becomes a big deal. Consider the code in Listing 2.2.

Listing 2.2. Potential Resource Leak in .NET

```csharp
using System.Data;
using System.Data.SqlClient;

public class MyDataLayer {
    private SqlConnection connection;

    public void DoDBWorkA() {
        if (connection == null || connection.State != ConnectionState.Open) {
            OpenConnection(); //opens connection to the database
        }
        //...rest of code...
    }

    public void DoDBWorkB() {
        if (connection == null || connection.State != ConnectionState.Open) {
            OpenConnection(); //opens connection to the database
        }
        //...rest of code...
    }
}

//elsewhere in code
MyDataLayer dataLayer = new MyDataLayer();
dataLayer.DoDBWorkA();
dataLayer.DoDBWorkB();
```

With nondeterministic finalization, you never know exactly when the SqlConnection will be disposed of and its underlying resources reclaimed. Until then, the underlying database connection remains in use and unavailable to other callers. This is not so much a resource leak as poor coding combined with a lack of understanding of how nondeterministic finalization works. For classes that hold expensive resources, such as a database connection, .NET provides the IDisposable interface and the design pattern built around it.
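The nondeterminism is easy to see with a finalizer. In the sketch below, Tracked is a hypothetical class whose finalizer (the ~Tracked method, which the C# compiler turns into an override of Object.Finalize) only counts how many times it has run. Dropping the last reference does not run the finalizer; it runs only after the GC has found the object unreachable and the finalizer thread has processed it. GC.Collect and GC.WaitForPendingFinalizers are used here only to force that moment for demonstration; production code should generally not force collections.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;

// Hypothetical class with a finalizer, used only to observe when it runs.
class Tracked
{
    public static int FinalizedCount;

    ~Tracked()
    {
        // Runs on the finalizer thread at a time chosen by the GC, not by your code.
        Interlocked.Increment(ref FinalizedCount);
    }
}

static class FinalizationDemo
{
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void CreateAndDrop()
    {
        new Tracked(); // unreachable as soon as this statement completes
    }

    // Returns how many Tracked finalizers ran during this call.
    public static int RunAndCollect()
    {
        int before = Tracked.FinalizedCount;
        CreateAndDrop();
        Console.WriteLine($"Finalized before GC: {Tracked.FinalizedCount - before}");
        GC.Collect();                  // finds the dead object and queues its finalizer
        GC.WaitForPendingFinalizers(); // blocks until the finalizer thread drains the queue
        return Tracked.FinalizedCount - before;
    }

    public static void Main() =>
        Console.WriteLine($"Finalized after GC: {RunAndCollect()}");
}
```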

NOTE

An interface defines a contract specifying methods and/or properties that a class agrees to implement. Although the introduction of an interface might seem a bit premature, it is necessary in order to understand how a developer can interact with the GC.

Using `IDisposable` to Create Well-Behaved Objects

By implementing IDisposable, a class says, "I use expensive resources that need to be released when you've finished with me, so call my Dispose method when you're done." Note that the GC itself never calls Dispose. If a class also wants a safety net for callers who forget, it must define a finalizer, which the GC runs at some nondeterministic point after the object becomes unreachable. To create a better .NET citizen, the MyDataLayer class has been updated to implement the IDisposable interface, and the calling code uses the C# using statement to ensure that the Dispose method is invoked when it's done with the object. Look at the updated code for the MyDataLayer class in Listing 2.3.

Listing 2.3. Creating a Better .NET Citizen

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

public class MyDataLayer : IDisposable {
    private SqlConnection connection;

    public void Dispose() {
        if (connection != null)
            connection.Dispose(); // closes the connection (returning it to the pool when pooling is on)
        connection = null;
    }

    public void DoDBWorkA() {
        if (connection == null || connection.State != ConnectionState.Open) {
            OpenConnection(); //opens connection to the database
        }
        //...rest of code...
    }

    public void DoDBWorkB() {
        if (connection == null || connection.State != ConnectionState.Open) {
            OpenConnection(); //opens connection to the database
        }
        //...rest of code...
    }

    //...rest of code for MyDataLayer
}

//Working with the MyDataLayer class
using (MyDataLayer dataLayer = new MyDataLayer()) {
    dataLayer.DoDBWorkA();
    dataLayer.DoDBWorkB();
}
```

There are a couple of key points to cover from Listing 2.3. First is the implementation of the IDisposable interface. By implementing it, the MyDataLayer class states that it holds one or more expensive resources that should be released as soon as possible. When you encounter a class that implements this interface, be sure to call its Dispose method when you're done with it. Second, the using statement ensures that the Dispose method is invoked automatically when execution leaves the using block.

The using statement calls the Dispose method of the MyDataLayer object whether the block completes normally or exits because of an exception. Disposing MyDataLayer in turn disposes the underlying SqlConnection, thereby returning that expensive resource to the connection pool.
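Listing 2.3 is sufficient when callers always use a using statement. The conventional .NET dispose pattern, sketched below with a hypothetical ResourceHolder class that owns a MemoryStream in place of MyDataLayer's SqlConnection, adds a protected Dispose(bool) method, makes repeated Dispose calls harmless, and calls GC.SuppressFinalize so the GC knows the finalizer no longer needs to run. The finalizer is included only to show where the safety net goes; the usual guidance is to add one only when a class directly owns an unmanaged resource such as an operating-system handle. The demo (also hypothetical) confirms that using calls Dispose even when the block exits through an exception.

```csharp
using System;
using System.IO;

// Hypothetical stand-in for MyDataLayer that follows the standard dispose pattern.
class ResourceHolder : IDisposable
{
    private readonly MemoryStream _stream = new MemoryStream();
    private bool _disposed;

    public bool IsDisposed => _disposed;

    public void Write(byte value)
    {
        if (_disposed) throw new ObjectDisposedException(nameof(ResourceHolder));
        _stream.WriteByte(value);
    }

    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this); // cleanup is done; skip the finalizer
    }

    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;     // calling Dispose more than once is harmless
        if (disposing)
        {
            // Called from Dispose(): safe to release other managed objects.
            _stream.Dispose();
        }
        // Unmanaged handles, if the class owned any, would be released here.
        _disposed = true;
    }

    ~ResourceHolder()
    {
        // Run by the GC only if Dispose was never called.
        Dispose(false);
    }
}

static class DisposeDemo
{
    public static ResourceHolder UseWithFailure()
    {
        var holder = new ResourceHolder();
        try
        {
            using (holder)
            {
                holder.Write(42);
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // By the time control reaches here, Dispose has already run.
        }
        return holder;
    }

    public static void Main()
    {
        ResourceHolder holder = UseWithFailure();
        Console.WriteLine($"Disposed after exception: {holder.IsDisposed}");
        holder.Dispose(); // second call is a no-op
    }
}
```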

There are other practices you should employ to write memory-efficient code in .NET. The following is a basic to-do list for efficient memory usage and faster generation 0 collections:

Limit object allocation.
Keep objects as small as necessary; don't create bloated classes.
Keep object references to a minimum.
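To make the first item concrete, the sketch below compares two ways of building the same 1,000-character string and measures allocations with GC.GetAllocatedBytesForCurrentThread, which assumes .NET Core 3.0 or later; the method names are hypothetical. Repeated string concatenation allocates a new, slightly longer string on every iteration, so total allocation grows roughly with the square of the length, while a pre-sized StringBuilder allocates one buffer plus the final string. Fewer allocations mean generation 0 fills more slowly and is collected less often.

```csharp
using System;
using System.Text;

static class AllocationDemo
{
    // Builds a string by repeated concatenation: every += allocates a new string.
    public static long BytesForConcat(int count)
    {
        long start = GC.GetAllocatedBytesForCurrentThread();
        string s = string.Empty;
        for (int i = 0; i < count; i++)
            s += "x";
        long used = GC.GetAllocatedBytesForCurrentThread() - start;
        GC.KeepAlive(s);
        return used;
    }

    // Builds the same string with one pre-sized StringBuilder buffer.
    public static long BytesForBuilder(int count)
    {
        long start = GC.GetAllocatedBytesForCurrentThread();
        var sb = new StringBuilder(count);
        for (int i = 0; i < count; i++)
            sb.Append('x');
        string s = sb.ToString();
        long used = GC.GetAllocatedBytesForCurrentThread() - start;
        GC.KeepAlive(s);
        return used;
    }

    public static void Main()
    {
        Console.WriteLine($"Concatenation: {BytesForConcat(1000)} bytes");
        Console.WriteLine($"StringBuilder: {BytesForBuilder(1000)} bytes");
    }
}
```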

That's the basic tour of the GC provided by .NET. For more detailed information, see the source code of Rotor, Microsoft's Shared Source CLI (SSCLI). Next, we'll look at the class framework provided by .NET.