Memory Management Under The Hood


One of the advantages of C# programming is that the programmer doesn't need to worry about detailed memory management; in particular, the garbage collector deals with memory cleanup on your behalf. The result is performance approaching that of a language such as C++, without the complexity of managing memory yourself as you must in C++. Even so, if you need to write efficient code, it still pays to understand what is going on behind the scenes. This section takes a look at what happens in the computer's memory when you allocate variables.

Note

The precise details of much of the content of this section are undocumented. You should interpret this section as a simplified guide to the general processes rather than as a statement of exact implementation.

Value Data Types

Windows uses a system known as virtual addressing, in which the mapping from the memory address seen by your program to the actual location in hardware memory is entirely managed by Windows. The result of this is that each process on a 32-bit processor sees 4GB of available memory, irrespective of how much hardware memory you actually have in your computer (on 64-bit processors this number will be greater). This 4GB of memory contains everything that is part of the program, including the executable code, any DLLs loaded by the code, and the contents of all variables used when the program runs. This 4GB of memory is known as the virtual address space or virtual memory. For convenience in this chapter it is referred to simply as memory.

Each memory location in the available 4GB is numbered starting from zero. To access a value stored at a particular location in memory, you need to supply the number that represents that memory location. In any compiled high-level language, including C#, Visual Basic, C++, and Java, the compiler converts human-readable variable names into memory addresses that the processor understands.

Somewhere inside a process's virtual memory is an area known as the stack. The stack stores value data types that are not members of objects. In addition, when you call a method, the stack is used to hold a copy of any parameters passed to the method. To understand how the stack works, you need to understand the importance of variable scope in C#. It is always the case that if a variable a goes into scope before variable b, then b will go out of scope first. Look at this code:

{
   int a;
   // do something

   {
      int b;
      // do something else
   }
}

First, a gets declared. Then, inside the inner code block, b gets declared. Then the inner code block terminates and b goes out of scope; then a goes out of scope. So, the lifetime of b is entirely contained within the lifetime of a. The idea that you always deallocate variables in the reverse order to how you allocate them is crucial to the way the stack works.

You don't know exactly where in the address space the stack is, and you don't need to know for C# development. A stack pointer (a processor register) marks the boundary of the used portion of the stack; the next free location is the address immediately below the one it holds. When your program first starts running, the stack pointer points just past the top of the block of memory reserved for the stack. The stack fills downward, from high memory addresses to low addresses, and as data is put on the stack, the stack pointer is adjusted accordingly, so it always points just past the next free location. This is illustrated in Figure 7-1, which shows a stack pointer with a value of 800000 (0xC3500 in hex); the next free location is the address 799999.

Figure 7-1

The following code instructs the compiler that you need space in memory to store an integer and a double, and these memory locations are referred to as nRacingCars and engineSize. The line that declares each variable indicates the point at which you will start requiring access to this variable, and the closing curly brace of the block in which the variables are declared identifies the point at which both variables go out of scope.

{
   int nRacingCars = 10;
   double engineSize = 3000.0;
   // do calculations
}

Assuming you use the stack shown in Figure 7-1, when the variable nRacingCars comes into scope and is assigned the value 10, the value 10 is placed in locations 799996 through 799999, the four bytes just below the location pointed to by the stack pointer. (Four bytes because that's how much memory is needed to store an int.) To accommodate this, 4 is subtracted from the value of the stack pointer, so it now points to the location 799996, just after the new first free location (799995).

The next line of code declares the variable engineSize (a double) and initializes it to the value 3000.0. A double occupies 8 bytes, so the value 3000.0 will be placed in locations 799988 through 799995 on the stack, and the stack pointer is decremented by 8, so that once again, it points just after the next free location on the stack.
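You can verify these sizes from C# itself; sizeof yields a compile-time constant for the built-in value types and does not require an unsafe block:

   Console.WriteLine(sizeof(int));      // prints 4 - the space occupied by nRacingCars
   Console.WriteLine(sizeof(double));   // prints 8 - the space occupied by engineSize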

When engineSize goes out of scope, the computer knows that it is no longer needed. Because variable lifetimes are always nested, you can guarantee that, whatever else has happened while engineSize was in scope, the stack pointer is now pointing to the location where engineSize is stored. To remove engineSize from the stack, the stack pointer is incremented by 8, so that it now points to the location immediately after the end of engineSize. At this point in the code, you are at the closing curly brace, so nRacingCars also goes out of scope and the stack pointer is incremented by 4. If another variable comes into scope after engineSize and nRacingCars have been removed from the stack, it will overwrite the memory descending from location 799999, where nRacingCars used to be stored.
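To summarize the arithmetic, the stack pointer moves as follows (all figures taken from the walkthrough above):

   start of the block:               stack pointer = 800000, next free byte = 799999
   int nRacingCars (4 bytes):        stack pointer = 799996, value stored at 799996-799999
   double engineSize (8 bytes):      stack pointer = 799988, value stored at 799988-799995
   engineSize goes out of scope:     stack pointer = 799996
   nRacingCars goes out of scope:    stack pointer = 800000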

If the compiler hits a line like int i, j, the order of coming into scope looks indeterminate. Both variables are declared at the same time and go out of scope at the same time. In this situation, it doesn't matter in what order the two variables are removed from memory. The compiler internally always ensures that the one that was put in memory first is removed last, thus preserving the rule about no crossover of variable lifetimes.

Reference Data Types

Although the stack gives very high performance, it is not flexible enough to be used for all variables. The requirement that the lifetimes of variables must be nested is too restrictive for many purposes. Often, you will want to use a method to allocate memory to store some data and be able to keep that data available long after that method has exited. This possibility exists whenever storage space is requested with the new operator — as is the case for all reference types. That's where the managed heap comes in.

If you have done any C++ coding that required low-level memory management, you will be familiar with the heap. The managed heap is not quite the same as the heap C++ uses; the managed heap works under the control of the garbage collector and provides significant benefits when compared to traditional heaps.

The managed heap (or heap for short) is just another area of memory from the process's available 4GB. The following code demonstrates how the heap works and how memory is allocated for reference data types:

void DoWork()
{
   Customer arabel;
   arabel = new Customer();
   Customer mrJones = new Nevermore60Customer();
}

This code assumes the existence of two classes, Customer and Nevermore60Customer. These classes are in fact taken from the Mortimer Phones examples in Appendix A (which is posted at www.wrox.com).
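If you don't have the Appendix A code to hand, stand-in definitions along the following lines are enough to follow the example. These are hypothetical skeletons rather than the actual Mortimer Phones classes; the only structural requirement is that Nevermore60Customer derives from Customer, so that an instance of it can be assigned to a Customer reference:

   class Customer
   {
      // placeholder fields - the real class defines its own instance data
      public string Name;
      public decimal Balance;
   }

   class Nevermore60Customer : Customer
   {
      // placeholder field specific to the derived class
      public uint HighCostMinutesUsed;
   }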

First, you declare a Customer reference called arabel. The space for this will be allocated on the stack, but remember that this is only a reference, not an actual Customer object. The arabel reference takes up 4 bytes, enough space to hold the address at which a Customer object will be stored. (You need 4 bytes to represent a memory address as an integer value between 0 and 4GB.)

The next line

arabel = new Customer();

does several things. First, it allocates memory on the heap to store a Customer object (a real object, not just an address). Then it sets the value of the variable arabel to the address of the memory it has allocated to the new Customer object. (It also calls the appropriate Customer() constructor to initialize the fields in the class instance, but we won't worry about that here.)

The Customer instance is not placed on the stack — it is placed on the heap. In this example, you don't know precisely how many bytes a Customer object occupies, but assume for the sake of argument it is 32. These 32 bytes contain the instance fields of Customer as well as some information that .NET uses to identify and manage its class instances.

To find a storage location on the heap for the new Customer object, the .NET runtime will look through the heap and grab the first contiguous, unused block of 32 bytes. Again for the sake of argument, assume that this happens to be at address 200000, and that the arabel reference occupies locations 799996 through 799999 on the stack. This means that before the new Customer object is instantiated, the memory contents will look similar to Figure 7-2.

Figure 7-2

After allocating the new Customer object, the contents of memory will look like Figure 7-3. Note that unlike the stack, memory in the heap is allocated upward, so the free space can be found above the used space.

Figure 7-3

The next line of code declares another Customer reference and instantiates a Nevermore60Customer object (Nevermore60Customer derives from Customer, which is why it can be assigned to a Customer reference). Here, space on the stack for the mrJones reference and space on the heap for the object are allocated by a single line of code:

Customer mrJones = new Nevermore60Customer();

This line allocates 4 bytes on the stack to hold the mrJones reference, stored at locations 799992 through 799995. The mrJones object is allocated space on the heap starting at location 200032.
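At this point the layout of memory is as follows (all figures from the walkthrough above):

   Stack:  799996-799999   arabel    (holds the heap address 200000)
           799992-799995   mrJones   (holds the heap address 200032)
   Heap:   200000-200031   the Customer object referenced by arabel
           200032 onward   the Nevermore60Customer object referenced by mrJones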

It is clear from the example that the process of setting up a reference variable is more complex than that for setting up a value variable, and there is a performance overhead. In fact, the process is somewhat oversimplified here, because the .NET runtime needs to maintain information about the state of the heap, and this information needs to be updated whenever new data is added to the heap. Despite these overheads, you now have a mechanism for allocating variables that is not constrained by the limitations of the stack. By assigning the value of one reference variable to another of the same type, you have two variables that reference the same object in memory. When a reference variable goes out of scope, it is removed from the stack as described in the previous section, but the data for a referenced object is still sitting on the heap. The data will remain on the heap until either the program terminates, or the garbage collector removes it, which will only happen when it is no longer referenced by any variables.
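For instance, here is a minimal sketch of that behavior, reusing the hypothetical Customer class sketched earlier:

   Customer arabel = new Customer();
   Customer secondRef = arabel;    // copies the 4-byte reference (the address), not the 32-byte object

   Console.WriteLine(object.ReferenceEquals(arabel, secondRef));   // True - one object, two references

   arabel = null;   // the Customer object is still reachable through secondRef,
                    // so the garbage collector cannot reclaim it yet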

That's the power of reference data types, and you will see this feature used extensively in C# code. It means that you have a high degree of control over the lifetime of your data, because it is guaranteed to exist in the heap as long as you are maintaining some reference to it.

Garbage Collection

The previous discussion and diagrams show the managed heap working very much like the stack, to the extent that successive objects are placed next to each other in memory. This means that you can work out where to place the next object by using a heap pointer that indicates the next free memory location, and which gets adjusted as you add more objects to the heap. However, things are complicated because the lives of the heap-based objects are not coupled to the scope of the individual stack-based variables that reference them.

When the garbage collector runs, it will remove all those objects from the heap that are no longer referenced. Immediately after it has done this, the heap will have objects scattered on it, mixed up with memory that has just been freed (see Figure 7-4).

Figure 7-4

If the managed heap stayed like this, allocating space for new objects would be an awkward process, with the runtime having to search through the heap for a block of memory big enough to store each new object. However, the garbage collector doesn't leave the heap in this state. As soon as the garbage collector has freed up all the objects it can, it compacts the heap by moving all remaining objects to form one contiguous block of memory. This means that the heap can continue working just like the stack as far as locating where to store new objects is concerned. Of course, when the objects are moved about, all the references to those objects need to be updated with the correct new addresses, but the garbage collector handles that too.

This action of compacting by the garbage collector is where the managed heap really works differently from old unmanaged heaps. With the managed heap, it is just a question of reading the value of the heap pointer, rather than iterating through a linked list of addresses to find somewhere to put the new data. For this reason, instantiating an object under .NET is much faster. Interestingly, accessing objects tends to be faster too, because the objects are compacted toward the same area of memory on the heap, resulting in less page swapping. Microsoft believes that these performance gains more than compensate for the performance penalty that you get whenever the garbage collector needs to do some work to compact the heap and change all those references to objects it has moved.

Note

Generally, the garbage collector runs when the .NET runtime determines that a garbage collection is required. You can force the garbage collector to run at a certain point in your code by calling System.GC.Collect(). The System.GC class is a .NET class that represents the garbage collector, and the Collect() method initiates a garbage collection. The GC class is intended for rare situations in which you know that it's a good time to call the garbage collector; for example, if your code has just released its references to a large number of objects. However, the logic of the garbage collector does not guarantee that all unreferenced objects will be removed from the heap in a single garbage collection pass.
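As a rough illustration (largeReportData is a hypothetical variable holding the only reference to a big object graph), a typical call sequence looks like this; both methods shown are real members of System.GC:

   largeReportData = null;          // release the last reference to the object graph

   GC.Collect();                    // ask the runtime to run a collection now
   GC.WaitForPendingFinalizers();   // wait for any finalizers queued by that collection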



