Understanding Memory

I'm going to start off with this topic because the memory used by an application can have a big impact on performance, and the details of how Windows manages memory are often misunderstood.

One of the first things that gets ingrained into developers who start working with pointers and memory allocation in Windows is that ever since 32-bit Windows emerged, every running process has had available a virtual address space of four gigabytes (it's more than that on a 64-bit machine, but the principles are unchanged, so I'll stick to a 32-bit analysis here). On Windows NT/2000/XP and so on, only the first 2GB of this are available for your application (3GB on some server platforms), the remainder being reserved for system use. On Windows 95/98/ME, the situation is more complicated, with several specific areas of memory being reserved.

Of course, very few machines have that much RAM available. For example, the machine I'm currently writing this chapter on has 256MB of RAM - less than a tenth of the address space seen by every application I run on it. The way that this discrepancy is resolved is of course through Windows's virtual memory system, in which the addresses that code in an application refers to (the values of pointers) are transparently mapped by Windows into pages of memory in RAM. A page here simply means a block of memory. The size of the page is system dependent, but on x86 machines it's 4KB, and to keep this discussion simple, I'm going to work with that page size.
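
Incidentally, if you want to check the page size on your own machine, the Win32 GetSystemInfo() function reports it. Here's a minimal P/Invoke sketch - the structure layout below follows the standard SYSTEM_INFO definition from the Platform SDK:

 using System;
 using System.Runtime.InteropServices;

 class PageSizeQuery
 {
    // Managed mirror of the Win32 SYSTEM_INFO structure
    [StructLayout(LayoutKind.Sequential)]
    struct SYSTEM_INFO
    {
       public ushort wProcessorArchitecture;
       public ushort wReserved;
       public uint dwPageSize;
       public IntPtr lpMinimumApplicationAddress;
       public IntPtr lpMaximumApplicationAddress;
       public IntPtr dwActiveProcessorMask;
       public uint dwNumberOfProcessors;
       public uint dwProcessorType;
       public uint dwAllocationGranularity;
       public ushort wProcessorLevel;
       public ushort wProcessorRevision;
    }

    [DllImport("kernel32.dll")]
    static extern void GetSystemInfo(out SYSTEM_INFO info);

    static void Main()
    {
       SYSTEM_INFO info;
       GetSystemInfo(out info);
       Console.WriteLine("Page size: {0} bytes", info.dwPageSize);   // 4096 on x86
    }
 }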

Although your application can see 4GB of virtual address space, only a tiny fraction of this is ever actually going to be used in most apps. The ranges of virtual memory that a process actually wants to use will be known to Windows (either because Windows reserved that memory when it loaded up the process, or because the process asked Windows for the memory) - and these ranges of memory will have been allocated in 4KB chunks to pages of physical memory. Whenever the code in the process refers to any address in memory, under the hood Windows will map this address onto the appropriate page in RAM, so that the correct memory is accessed. That's why the addresses seen by an app are known as virtual addresses.

Although I'll loosely refer to Windows mapping addresses onto an appropriate page, it's worth bearing in mind that this process of mapping virtual addresses is implemented directly by the hardware rather than by software on many machines, including x86-based machines.

To get a better feel for how this works, let's suppose that due to some dynamic memory allocation (for example, you just instantiated some new objects on the managed heap), your application (or strictly speaking, the garbage collector in this example) has asked the operating system for another thousand bytes of memory, starting at virtual address 0x10000e0c (I'll keep the numbers simple for the sake of argument). Suppose the situation before this request was as shown in the diagram:

[Diagram: the process's virtual address space, with the 4KB block at 0x10000000 mapped to a page of physical memory]

What the diagram shows is that the 4KB of memory starting at address 0x10000000 is already mapped to a page of memory. The final 500 bytes of this page (starting at address 0x10000e0c) weren't actually in use, but they got mapped anyway because allocated blocks always come in units of one page. This means that when the new request for memory gets submitted to Windows, the first 500 bytes of it can be satisfied from the existing page in RAM. The remaining 500 bytes, however, overrun that page, which means that in order to get those bytes of virtual address space mapped to some physical memory, Windows will have to locate another page somewhere else in RAM that's free to be allocated to the process. Windows will do this, giving your program another 4,096 bytes of memory - more than the process asked for, but that doesn't really matter.
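
To make the arithmetic explicit, here's a quick sketch of the page calculation for this example, assuming the 4KB page size we're working with:

 using System;

 class PageArithmetic
 {
    static void Main()
    {
       const uint pageSize = 0x1000;                      // 4KB pages
       uint requestStart = 0x10000e0c;                    // start of the requested block
       uint pageStart = requestStart & ~(pageSize - 1);   // 0x10000000 - the page it falls in
       uint bytesLeftInPage = pageStart + pageSize - requestStart;

       Console.WriteLine(bytesLeftInPage);                // 500 - satisfied by the existing page
       Console.WriteLine(1000 - bytesLeftInPage);         // 500 - requires a newly mapped page
    }
 }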

Note that a consequence of this is that consecutive locations in virtual memory that fall on page boundaries probably don't correspond to consecutive addresses in physical memory. That's not a problem as Windows always maps the virtual memory addresses to the correct physical memory - you never have to worry about that in your code.

We can now start introducing some terminology. Virtual addresses in your code that are currently mapped to pages of physical memory are said to be committed. Applications can also indicate whether they need pages to be committed for read-write access or for read-only access. Requesting a block for read-only access is normally done for pages containing executable code, and allows Windows to perform optimizations based on knowing that the contents of that page can never be modified. Virtual addresses that have not been allocated to any physical memory are said to be free. There's also a third status, reserved. Your program can reserve virtual address ranges, which means that those addresses are marked as in use (and therefore cannot be the target of dynamic memory allocations), but no physical memory is actually allocated. This happens, for example, when we know that a DLL is going to need to be loaded at a certain address, but it hasn't been loaded yet.

Your application can of course only access memory in the committed address ranges. The usual cause of a memory access violation is an attempt in executable code to access memory at a virtual address that has not been committed. Since no physical memory exists for that address, there is not much Windows can do other than raise an access violation. The actual process of committing, reserving, or freeing memory is handled by two API functions, VirtualAlloc() and VirtualFree(). These functions lie under the hood of many dynamic memory operations, such as the C++ new and delete operators, and you can invoke them directly if you want finer control over memory.
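
To give a feel for how these functions are used, here's a minimal sketch that reserves and then commits memory from C# via P/Invoke. The constants are the standard values from winnt.h; a real program would of course check the return values:

 using System;
 using System.Runtime.InteropServices;

 class VirtualAllocDemo
 {
    const uint MEM_COMMIT     = 0x1000;
    const uint MEM_RESERVE    = 0x2000;
    const uint MEM_RELEASE    = 0x8000;
    const uint PAGE_READWRITE = 0x04;

    [DllImport("kernel32.dll", SetLastError=true)]
    static extern IntPtr VirtualAlloc(IntPtr lpAddress, UIntPtr dwSize,
                                      uint flAllocationType, uint flProtect);

    [DllImport("kernel32.dll", SetLastError=true)]
    static extern bool VirtualFree(IntPtr lpAddress, UIntPtr dwSize, uint dwFreeType);

    static void Main()
    {
       // Reserve 1MB of virtual address space - no physical memory is used yet
       IntPtr reserved = VirtualAlloc(IntPtr.Zero, (UIntPtr)0x100000u,
                                      MEM_RESERVE, PAGE_READWRITE);

       // Commit the first 4KB page of the reserved range for read-write access
       IntPtr committed = VirtualAlloc(reserved, (UIntPtr)0x1000u,
                                       MEM_COMMIT, PAGE_READWRITE);

       // ... use the committed page ...

       // Release the whole region; with MEM_RELEASE the size must be zero
       VirtualFree(reserved, UIntPtr.Zero, MEM_RELEASE);
    }
 }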

The procedure I've just described is of course sufficient provided all the processes running on your system - as well as the Windows operating system itself - don't between them attempt to commit more memory than is physically available. When that limit is reached, some process will request memory and Windows won't be able to satisfy the request from RAM. In this case, Windows performs an operation known as swapping out: it identifies some pages of memory that have not been used for a while - pages which may belong to the same process or to some other process - and transfers them from RAM to the hard drive, where they are stored in a special file in the file system known as the page file. That frees up some pages in RAM, which can then be allocated to satisfy the request made by your process. If an application later references pages that have been swapped out to disk, Windows will automatically swap those pages back into RAM before they are accessed. Whenever this occurs, the application is said to incur a page fault.

Although I've talked about swapping in and out in terms of writing to the hard drive, the hard drive is not necessarily involved. In fact, Windows routinely swaps pages out of a process even if they are not immediately needed by anyone else - this is to ensure that no process hogs too much RAM, so that all the other processes can get a fair slice of memory according to their needs. When this happens, though, the swapped-out data isn't immediately transferred to disk - the relevant pages are simply marked as available to be grabbed if any process needs them. The data is only copied out to disk if some other application does actually take over that memory. Indeed, it's quite common for pages to be swapped out, left untouched in RAM, and then swapped back into the same process when that process tries to access them again. In that case, the performance hit from swapping the pages back in is negligible - it's just the time taken for Windows to update its internal tables of what memory is being used for what purposes. This kind of page fault is known as a soft page fault (as opposed to a hard page fault, when the data actually needs to be retrieved from the page file). Windows is quite clever when it comes to monitoring whether memory has been modified, and it's quite adept at leaving pages of data lying around as long as possible in order to minimize the number of times that data needs to be copied to or from disk.

At this point it's worth introducing a couple of other terms that it's useful to be familiar with. You've probably heard the term working set thrown around occasionally. Many developers assume that the working set roughly means how much memory the application is consuming. More precisely, the working set is the memory in RAM that is marked as currently belonging to that process. In other words, it's basically that part of the process's memory that has not been paged out.

Besides the memory that forms part of a process's virtual address space, there is also some system memory that is maintained by Windows on behalf of the process. This memory contains information that Windows needs in order to be able to execute the process. Some of this memory is so vital that it is never allowed to be swapped out - this memory is referred to as the non-paged pool. Other parts of this memory can be swapped out if needed, and that memory is referred to as the paged pool.
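
Incidentally, all of these quantities can be inspected from managed code through the System.Diagnostics.Process class. A minimal sketch - the properties shown report sizes in bytes, and correspond only roughly to the Task Manager columns we'll meet shortly:

 using System;
 using System.Diagnostics;

 class MemoryReport
 {
    static void Main()
    {
       Process me = Process.GetCurrentProcess();
       Console.WriteLine("Working set:     {0}", me.WorkingSet);
       Console.WriteLine("Private bytes:   {0}", me.PrivateMemorySize);
       Console.WriteLine("Paged pool:      {0}", me.PagedSystemMemorySize);
       Console.WriteLine("Non-paged pool:  {0}", me.NonpagedSystemMemorySize);
    }
 }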

The main significance of all this for performance is of course the time consumed by servicing page faults. Every hard page fault costs a process performance, so avoiding page faults is one of the big keys to keeping performance up. Unfortunately, because virtual memory management takes place invisibly to processes, you don't really have any low-level control over it. The main technique for avoiding page faults is to keep the amount of virtual memory needed by an application to the minimum possible. It also helps if variables that you access together are stored in the same page. That's not directly under your control either, but you can influence it: variables are more likely to get put on the same pages if dynamically allocated variables are allocated at the same time and if the heap is not too fragmented - the garbage collector is of course designed to take advantage of this optimization, as the sketch below illustrates.

Other things you can do are to buy more RAM, and not to run too many processes at the same time (though that's under the control of the user rather than the developer). In general, however, an excessive number of page faults can be a clue that your application is simply using too much memory. You can also get a very direct clue to this if you hear a lot of disk thrashing on your machine - a clear sign of excessive hard page faults.
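
Here's a purely hypothetical illustration of the locality point (the Node type below is invented for the example):

 // Node is a hypothetical type, purely for illustration
 class Node
 {
    public int Value;
    public Node Next;
 }

 class LocalityExample
 {
    static void Main()
    {
       // Because the garbage collector allocates consecutively from the top of
       // the heap, objects created together tend to end up adjacent in memory -
       // and therefore on the same pages
       Node[] nodes = new Node[1000];
       for (int i=0; i<nodes.Length; i++)
          nodes[i] = new Node();

       // Interleaving these allocations with many unrelated allocations would
       // scatter the nodes across more pages, raising the chance of extra page
       // faults when the structure is later traversed
    }
 }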

Bear in mind as well that I have simplified the above description considerably. Behind the scenes, Windows is running some fairly sophisticated algorithms that determine, based on how each application is using memory, which pages can be swapped in and out of a process, and how large a working set to maintain for each application. Windows also does some work predicting which areas of virtual memory are likely to be needed by a particular process, and swapping them back into RAM before they are required. The main way of doing this is that when a page is requested, neighboring pages (in the process's virtual address space) are automatically swapped in at the same time, on the assumption that pages tend to be accessed in clusters.

Another performance gain comes from shared memory. If two applications load the same DLL at the same address, and neither application is going to alter the contents of the pages into which the DLL is loaded, then there is no need for Windows to hold two copies of the same data. Instead, both applications will have the relevant pages of their virtual address spaces mapped to the same physical memory. This optimization is particularly important for managed code, since all managed programs need to load mscoree.dll, as well as a number of related DLLs that contain the code that runs the CLR. These files are quite large, and do hit performance at start-up time for the first managed program that is executed. However, you'll notice that any managed application typically starts up faster if there's another managed application already running. That's because DLLs such as mscoree.dll are already loaded for the first application, so the second managed application can simply share the pages that contain them - there's no need for it to load these files separately.

One of the benefits of Windows XP, incidentally, is that XP has considerably more sophisticated algorithms for managing page swapping and virtual memory. Among the new things that XP will do is monitor the pages used by an application at start-up, so that the next time the application runs, those pages can be loaded more quickly.

Assessing Memory Usage with the Task Manager

The Task Manager provides an extremely convenient and quick way to get information about the resources that a process is using, including its memory usage and the number of handles it is holding. Just about every developer will have brought up the Task Manager at some point in order to kill a process, or to make a quick check on how much CPU time or memory some process is currently taking up. However, it's very easy to get misled by the information concerning memory usage that is reported by Task Manager. In this section I'll show you the correct way to interpret the information.

Information about memory usage is displayed in the Processes and Performance tabs, and we'll look at both of these in turn.

The Processes Tab

The Processes tab really comes into its own when you customize the columns displayed.

You can do this by clicking on the View menu and choosing the Select Columns option. You'll be presented with a dialog box that invites you to modify which information is shown for each process. Showing the full set of columns, the Task Manager looks something like this:

[Screenshot: Task Manager Processes tab with the full set of columns displayed]

The Task Manager by default shows a column called Mem Usage - and of course many developers assume that this column tells you how much virtual memory the application requires. In fact, this column tells you the working set of the application; that is to say, the amount of space in RAM that is currently assigned to this process. As we saw earlier, this figure excludes all virtual memory that has been swapped out, including memory that the operating system has simply soft-swapped out in case any other applications need it. As a guide to the memory demands that this application is making on the system, this figure is next to useless. A far better indicator of this quantity is the VM Size column, which directly measures the amount of virtual address space that the application has committed. If you have a large memory leak, it is the VM Size which will steadily creep up. If you are interested in using the Task Manager to monitor how much memory an application is consuming, then I strongly recommend that you replace Mem Usage with VM Size in your choice of columns that the Task Manager displays. And personally, I'd argue that displaying Mem Usage by default was quite a bad design choice for the Task Manager.

The only potential problem with the VM Size column is that it does not take account of shared memory. If, for example, you are running several managed applications, then the memory taken by the CLR's own code (which is quite substantial) will show up separately in all these processes, even though only one copy of this code will be loaded into memory. (The Mem Usage column has the same problem, by the way.)

Another indicator that is worth watching is the Page Faults column, since this can tell you if performance is being hit by memory being swapped to and from disk too much. Bear in mind, however, that this figure measures all page faults - including soft page faults, which have a negligible impact on performance.
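
If you'd rather watch page fault rates programmatically than through the Task Manager, Windows exposes a Page Faults/sec counter in the standard Process performance counter category. Here's a sketch - the instance name "UseResources" is an assumption and must match the name of the process you want to watch, and like the Task Manager column, this counter lumps soft and hard faults together:

 using System;
 using System.Diagnostics;
 using System.Threading;

 class PageFaultMonitor
 {
    static void Main()
    {
       PerformanceCounter counter =
          new PerformanceCounter("Process", "Page Faults/sec", "UseResources");

       // Rate counters need two samples - the first NextValue() call returns 0
       counter.NextValue();
       Thread.Sleep(1000);
       Console.WriteLine("Page faults/sec: {0}", counter.NextValue());
    }
 }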

There are several other memory-related indicators which I normally find less useful but which may still be of relevance in some situations:

  • The Paged Pool and NP Pool columns directly measure the memory taken by the system paged and non-paged pools. Since this memory is allocated by the system and not under the control of your application, there isn't really anything you can do with this data.

  • The Peak Mem column indicates the highest figure for Mem Usage that has occurred since the process started.

  • Mem Delta indicates the change in the Mem Usage column between the last and the previous updates.

You can also get an indirect measure of resource usage by monitoring the columns related to handles: these columns are Handles, GDI Objects, and USER Objects. The GDI Objects column specifically indicates the objects allocated to your application that are maintained by the GDI graphics system (GDI has its own resource manager, so these objects are treated separately), while USER Objects indicates certain items related to the windowing system, such as windows, menus, cursors, and so on - user objects also have their own resource manager. You should be aware, however, that the GDI+ library that underpins many System.Drawing classes is independent of GDI, so GDI+ objects won't necessarily show up as GDI objects.

Monitoring how these columns change over time can give you a clue to any problems involving your application not freeing handles - in the case of managed code, that is most commonly caused by failing to call Dispose() on objects that you have finished with. By itself, this won't be much of a problem in most cases, but failing to call Dispose() might also be bloating your application's virtual memory requirements, and hence causing you extra page faults.

The Performance Tab

This tab gives similar information to some of the columns in the Processes tab, but aggregates the information over all running processes. It does not break down the data by process. Where this tab is useful, however, is that it presents a couple of graphs showing how the two main indicators have varied over time. These indicators are CPU usage (the percentage of time the CPU has spent actually running processes), and a quantity that is euphemistically called PF Usage in Windows XP, and Mem Usage in earlier versions of Windows. Both these names are misleading - this quantity appears to be the total virtual memory committed by all running processes, and therefore includes both memory currently held in RAM and data that has been paged out to the paging file.

[Screenshot: Task Manager Performance tab, showing the CPU Usage and PF Usage history graphs]

The main use of this tab is that if you are confident that no other running processes are going to significantly impact the graphs, you can use the graphs to get a feel for the way that a given process is using memory and CPU time, over a period of time. It's also useful on multi-processor machines, to check if all CPUs are actually being used.

As far as the data presented below the graphs is concerned, the Totals information should be obvious. The Commit Charge figure just shows the current value of the total virtual memory - the same as the current figure in the PF Usage History graph. The limit is the maximum that can be sustained between RAM and the paging file. If more than this is ever needed, Windows will automatically increase the size of the paging file - though that is usually an indication that some application is misbehaving. Physical Memory simply indicates how much of the RAM on your system is available for use. Kernel Memory indicates the sizes of the paged and non-paged pools we discussed earlier.

The UseResources Example

We're now going to look at an example application that uses a large amount of memory, so that we can illustrate how the Task Manager can be used to monitor the performance of this app.

The sample looks like this:

[Screenshot: the UseResources sample application]

It features buttons to allocate memory. Clicking these buttons will cause some large managed arrays to be created that occupy either 1 MB or 50 MB. This allocation is cumulative, so if you hit the Allocate 50MB button once and the Allocate 1 MB button six times, you'll have a total of 56 MB data allocated and referenced from the Form1 object. The Cleanup Arrays button causes all the references to this data to be removed, and a garbage collection to be started to free up the memory. The Empty Working Set button is somewhat different: clicking this button invokes a native API function, EmptyWorkingSet(), which causes all non-essential memory in this process to be removed from the working set.

Running this example can teach us quite a bit about memory usage of managed applications. But before we see it in action, let's have a quick look at the code behind it.

If you do download and run the UseResources example, do be careful how much memory you allocate. Clicking the Allocate 50MB button a few times can easily completely fill your RAM and paging file. If this happens, you'll get a dialog box warning you that the system is low on virtual memory, and Windows will then automatically and permanently increase the size of the paging file. Within moderation that's OK, but if your page file size gets too big, you might notice the corresponding reduction in free hard drive space. You might prefer to check the size of the page file before you run the application (it's in the Control Panel, under System), so that if it does grow, you can restore it to its original size after running UseResources.

Besides the Form1 class, the sample contains a class called MegaByteClass. The class doesn't do anything, except that each MegaByteClass instance occupies 1MB of memory:

 public class MegaByteClass
 {
    int [][] array = new int[100][];

    public MegaByteClass()
    {
       for (int i=0; i<100; i++)
          array[i] = new int[2600];
    }
 }

The int type occupies four bytes, so to occupy 1MB we need 262,144 ints - but since the exact quantities aren't too critical, I've called it 260,000. In this code I've broken that number down into a hundred arrays of 2,600 ints each. The reason for doing this (as opposed to having one single array) is the well-known large object memory leak in .NET version 1.0. If I simply declared one int[260000] array, then the array size would exceed 85,000 bytes, and the array would hence be placed on the special large object managed heap. Unfortunately, in .NET version 1.0, there is a bug in which memory allocated on this heap is not freed. The bug will almost certainly get fixed in future versions of .NET, but since I want this example to work on any version of .NET, I've instead set up a large number of smaller arrays with the same total size. Since we are now dealing with many smaller objects, these objects will be allocated on the normal managed heap, which does not have any known memory bugs.

Now we need an extra member field in the Form1 class:

 public class Form1 : System.Windows.Forms.Form
 {
    private ArrayList arrays = new ArrayList();

This ArrayList will hold the references to all the MegaByteClass objects we create. Extra memory is allocated using this method:

 public void AddArray()
 {
    arrays.Add(new MegaByteClass());
 }

Now for the button Click event handlers:

 private void btn1MB_Click(object sender, System.EventArgs e)
 {
    AddArray();
    this.statusBar.Text = arrays.Count.ToString() + " MB added";
 }

 private void btn50MB_Click(object sender, System.EventArgs e)
 {
    for (int i=0; i<50; i++)
    {
       AddArray();
       this.statusBar.Text = arrays.Count.ToString() + " MB added";
       this.statusBar.Refresh();
    }
 }

When allocating blocks of 50MB, the sample takes care to display progress information in the status bar, as it goes along. This is because allocating that much memory can take a while, so it's nice to keep the user updated about what's happening.

Cleaning up the memory looks like this:

 private void btnCleanupArrays_Click(object sender, System.EventArgs e)
 {
    arrays.Clear();
    this.statusBar.Text = "No arrays allocated";
    GC.Collect(GC.MaxGeneration);
 }

Finally, here's the code and associated DllImport declaration for when the user clicks on Empty Working Set:

 [DllImport("psapi.dll")] static extern int EmptyWorkingSet(IntPtr hProcess); private void btnEmptyWorkingSet_Click(object sender, System.EventArgs e) {    IntPtr hThisProcess = Process.GetCurrentProcess().Handle;    EmptyWorkingSet(hThisProcess); } 

The EmptyWorkingSet() API function takes the handle that identifies a process, and simply pages out any memory from this process that can be paged out. The data isn't of course actually moved to disk; the relevant pages are simply marked as no longer part of the application's working set. My reason for including this facility in the example is that it provides a very clear demonstration of just how meaningless the Mem Usage column in Task Manager is for most purposes. Calling EmptyWorkingSet() immediately makes your application appear to be occupying less memory, even though the committed virtual memory is unchanged.
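
Incidentally, if you'd rather not take a dependency on psapi.dll, the kernel32 function SetProcessWorkingSetSize() can achieve much the same effect - passing -1 for both the minimum and maximum sizes asks Windows to trim everything it can from the working set. A sketch:

 using System;
 using System.Diagnostics;
 using System.Runtime.InteropServices;

 class TrimWorkingSet
 {
    [DllImport("kernel32.dll", SetLastError=true)]
    static extern bool SetProcessWorkingSetSize(IntPtr hProcess,
                                                IntPtr dwMinimumWorkingSetSize,
                                                IntPtr dwMaximumWorkingSetSize);

    static void Main()
    {
       // (-1, -1) means "trim the working set as far as possible"
       SetProcessWorkingSetSize(Process.GetCurrentProcess().Handle,
                                (IntPtr)(-1), (IntPtr)(-1));
    }
 }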

Running the UseResources application gives this result on my machine when the application first starts up:

[Screenshot: Task Manager readings when UseResources first starts up]

Now you can do something very interesting by minimizing the application...

[Screenshot: Task Manager readings after minimizing the application]

...and restoring it again...

[Screenshot: Task Manager readings after restoring the application]

What's going on here? Well, when any managed application starts up, it brings in a lot of pages of data required to execute code that initializes the CLR and performs other start-up tasks. A lot of this code will never be executed again, but it remains in the process's working set until Windows decides that those pages have not been touched for a sufficiently long time that it may as well swap them out. Minimizing a form will provoke Windows into swapping out a large number of pages immediately, on the basis that if you are minimizing a form, that's usually a good indication that that process is unlikely to be doing anything more for a while. When you restore the application, Windows will find that some of the swapped out pages are actually needed, and will page fault them back into the working set. If those pages haven't been grabbed by any other app, these will of course be soft page faults that don't impact performance.

The same principles apply to unmanaged applications, except that unmanaged applications don't have the overhead of the CLR to bring into memory (though they may have other libraries such as the MFC or VB6 libraries). Notice that through all this, the VM size is virtually unchanged.

Now if I click the 1MB button five times to allocate 5MB of memory, we can see the virtual memory grow by 5MB:

[Screenshot: Task Manager readings after allocating 5MB]

On clicking the 50MB button four times, this happens:

[Screenshot: Task Manager readings after allocating a further 200MB]

As you can see, the virtual memory has grown roughly by the indicated 200 MB, but the working set hasn't grown by nearly as much. If you actually try this out, you may see the Mem Usage figure fluctuating; the allocated memory is always added immediately to the working set, but every so often Windows will decide that the working set is getting too big, and so will swap pages out. The less RAM you have or the more processes there are running on your system, the sooner this will start happening. Notice also the way that the number of page faults has shot up from about four thousand to over one hundred thousand now that so much more virtual memory is required.

Hitting the Cleanup Arrays button removes the added virtual memory:

[Screenshot: Task Manager readings after cleaning up the arrays]

Finally, if we hit the Empty Working Set button, the results look dramatic for the Mem Usage column (though virtual memory is, obviously, unchanged):

[Screenshot: Task Manager readings after emptying the working set]

However, this low memory usage is illusory. If you do anything that causes any code to be executed (for example something that forces a repaint, or you click on another button), most of those pages will be immediately brought back into the working set.


