When to Commit Physical Storage | Programming Applications for Microsoft Windows (Microsoft Programming Series)


Let's pretend you're implementing a spreadsheet application that supports 200 rows by 256 columns. For each cell, you need a CELLDATA structure that describes the contents of the cell. The easiest way for you to manipulate the two-dimensional matrix of cells would be to declare the following variable in your application:

 CELLDATA CellData[200][256];

If the size of a CELLDATA structure were 128 bytes, the two-dimensional matrix would require 6,553,600 (200 × 256 × 128) bytes of physical storage. That's a lot of physical storage to allocate from the paging file right up front for a spreadsheet, especially when you consider that most users put information into only a few spreadsheet cells, leaving the majority unused. The memory usage would be very inefficient.

So, historically, spreadsheets have been implemented using other data structure techniques, such as linked lists. With the linked-list approach, CELLDATA structures have to be created only for the cells in the spreadsheet that actually contain data. Since most cells in a spreadsheet go unused, this method saves a tremendous amount of storage. However, this technique makes it much more difficult to obtain the contents of a cell. If you want to know the contents of the cell in row 5, column 10, you must walk through linked lists in order to find the desired cell, which makes the linked-list method slower than the declared-matrix method.

Virtual memory offers us a compromise between declaring the two-dimensional matrix up front and implementing linked lists. With virtual memory, you get the fast, easy access offered by the declared-matrix technique combined with the superior storage savings offered by the linked-list technique.

For you to obtain the advantages of the virtual memory technique, your program needs to follow these steps:

1. Reserve a region large enough to contain the entire matrix of CELLDATA structures. Reserving a region uses no physical storage at all.

2. When the user enters data into a cell, locate the memory address in the reserved region where the CELLDATA structure should go. Of course, no physical storage is mapped to this address yet, so any attempt to access memory at this address will raise an access violation.

3. Commit only enough physical storage at the memory address located in step 2 to hold one CELLDATA structure. (You can tell the system to commit physical storage to specific parts of the reserved region; a region can contain both parts that are mapped to physical storage and parts that are not.)

4. Set the members of the new CELLDATA structure.

Now that physical storage is mapped to the proper location, your program can access the storage without raising an access violation. This virtual memory technique is excellent because physical storage is committed only as the user enters data into the spreadsheet's cells. Because most of the cells in a spreadsheet are empty, most of the reserved region will not have physical storage committed to it.
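The four steps can be sketched in C roughly as follows. This is a minimal sketch under stated assumptions, not the book's Spreadsheet sample: the CELLDATA layout (a 120-byte text buffer plus a double, 128 bytes in all) and the names ReserveSheet, CommitCell, and SetCell are hypothetical. On Windows it reserves with VirtualAlloc and MEM_RESERVE and commits with VirtualAlloc and MEM_COMMIT. So that the sketch also compiles on other systems, the non-Windows branch substitutes mmap with PROT_NONE for reserving and mprotect for committing; that branch is only a rough analogue, not Windows behavior.

```c
#ifndef _WIN32
#define _DEFAULT_SOURCE          /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#else
#include <sys/mman.h>
#include <unistd.h>
#endif

#define ROWS 200
#define COLS 256

/* Hypothetical 128-byte cell record; the real CELLDATA layout is not given. */
typedef struct {
    char   text[120];
    double value;
} CELLDATA;

/* Points at the reserved region; indexed as g_sheet[row][col]. */
static CELLDATA (*g_sheet)[COLS];

/* Step 1: reserve address space for the whole matrix (no storage yet). */
int ReserveSheet(void) {
    size_t cb = sizeof(CELLDATA) * ROWS * COLS;
#ifdef _WIN32
    g_sheet = VirtualAlloc(NULL, cb, MEM_RESERVE, PAGE_NOACCESS);
#else
    /* Rough POSIX analogue: inaccessible anonymous pages, not yet backed. */
    void *p = mmap(NULL, cb, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    g_sheet = (p == MAP_FAILED) ? NULL : p;
#endif
    return g_sheet != NULL;
}

/* Step 3: commit storage for one cell. The system works in whole pages,
   and pages that are already committed are left as they are. */
static int CommitCell(CELLDATA *pCell) {
#ifdef _WIN32
    return VirtualAlloc(pCell, sizeof(*pCell), MEM_COMMIT, PAGE_READWRITE) != NULL;
#else
    /* mprotect needs page-aligned bounds, so round out to whole pages. */
    uintptr_t page  = (uintptr_t)sysconf(_SC_PAGESIZE);
    uintptr_t start = (uintptr_t)pCell & ~(page - 1);
    uintptr_t end   = ((uintptr_t)(pCell + 1) + page - 1) & ~(page - 1);
    return mprotect((void *)start, end - start, PROT_READ | PROT_WRITE) == 0;
#endif
}

/* Steps 2 to 4 for one cell entry. Returns 1 on success, 0 if the commit fails. */
int SetCell(int row, int col, const char *text, double value) {
    assert(row >= 0 && row < ROWS && col >= 0 && col < COLS);
    CELLDATA *pCell = &g_sheet[row][col];   /* step 2: locate (no memory access) */
    if (!CommitCell(pCell))                 /* step 3: commit */
        return 0;
    strncpy(pCell->text, text, sizeof(pCell->text) - 1);  /* step 4: set */
    pCell->text[sizeof(pCell->text) - 1] = '\0';
    pCell->value = value;
    return 1;
}
```

Because SetCell asks for a commit every time a cell is set, this sketch follows the first of the methods described below; setting the same cell twice is harmless because already-committed pages are left as they are.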

The one problem with the virtual memory technique is that you must determine when physical storage needs to be committed. If the user enters data into a cell and then simply edits or changes that data, there is no need to commit physical storage—the storage for the cell's CELLDATA structure was committed the first time data was entered.

Also, the system always commits physical storage with page granularity. So when you attempt to commit physical storage for a single CELLDATA structure (as in step 3 above), the system actually commits a full page of storage. This is not as wasteful as it sounds: committing storage for a single CELLDATA structure has the effect of committing storage for other nearby CELLDATA structures. If the user then enters data into a neighboring cell, which is frequently the case, you might not need to commit additional physical storage.
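The page arithmetic behind this can be checked with a small, portable C sketch. It assumes, as the text does, a 128-byte CELLDATA structure, and additionally assumes a 4 KB page (the x86 page size; a Windows program would read the actual value from the dwPageSize member that GetSystemInfo fills in). The helper names CellOffset and CellPage are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 200
#define COLS 256
#define CELLDATA_SIZE     128u   /* assumed sizeof(CELLDATA), as in the text */
#define ASSUMED_PAGE_SIZE 4096u  /* assumed 4 KB page; real code would ask the system */

/* The page size is a multiple of the cell size, so no cell straddles two pages. */
_Static_assert(ASSUMED_PAGE_SIZE % CELLDATA_SIZE == 0, "cells must not straddle pages");

/* Byte offset of a cell from the start of the reserved region (row-major). */
size_t CellOffset(unsigned row, unsigned col) {
    assert(row < ROWS && col < COLS);
    return ((size_t)row * COLS + col) * CELLDATA_SIZE;
}

/* Zero-based index of the page that holds the cell. */
size_t CellPage(unsigned row, unsigned col) {
    return CellOffset(row, col) / ASSUMED_PAGE_SIZE;
}
```

Under these assumptions each page holds 32 cells, a row of 256 cells spans eight pages, and the whole matrix spans 1,600 pages. Entering data in cell (5, 10) commits the page that also holds cells (5, 0) through (5, 31), whereas cell (6, 10), directly below it, lies on a different page because the matrix is laid out row by row.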

There are four methods for determining whether to commit physical storage to a portion of a region:

Always attempt to commit physical storage. Instead of checking to see whether physical storage is mapped to a portion of the region, have your program call VirtualAlloc to commit storage every time it changes a CELLDATA structure. The system first checks to see whether storage has already been committed and, if so, does not commit additional physical storage. This approach is the easiest but has the disadvantage of making an additional function call every time a CELLDATA structure is altered, which makes your program run more slowly.

Determine (using the VirtualQuery function) whether physical storage has already been committed to the address range containing the CELLDATA structure. If it has, do nothing else; if it hasn't, call VirtualAlloc to commit the memory. This method is actually worse than the first one: it both increases the size of your code and slows down your program because of the additional call to VirtualQuery.

Keep a record of which pages have been committed and which haven't. Doing so makes your application run faster: you avoid unnecessary calls to VirtualAlloc, and your code can determine more quickly than the system can whether storage has already been committed. The disadvantage is that you must keep track of the page commit information somehow, which could be either very simple or very difficult depending on your specific situation.
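For the spreadsheet, such a record can be as simple as one bit per page of the reserved region. The following portable C sketch assumes a 128-byte CELLDATA structure and a 4 KB page; the names PageOfCell, IsPageCommitted, and MarkPageCommitted are hypothetical. It only tracks state. Committing the page is left to the caller (for example, with VirtualAlloc and MEM_COMMIT), and the bit should be set only after that call succeeds.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define ROWS 200
#define COLS 256
#define CELLDATA_SIZE     128u   /* assumed sizeof(CELLDATA) */
#define ASSUMED_PAGE_SIZE 4096u  /* assumed 4 KB page */
#define SHEET_PAGES (ROWS * COLS * CELLDATA_SIZE / ASSUMED_PAGE_SIZE)  /* 1,600 */

/* One bit per page of the reserved region; a set bit means "committed". */
static unsigned char g_committed[(SHEET_PAGES + CHAR_BIT - 1) / CHAR_BIT];

/* Zero-based index of the page that holds cell (row, col). */
size_t PageOfCell(unsigned row, unsigned col) {
    assert(row < ROWS && col < COLS);
    return ((size_t)row * COLS + col) * CELLDATA_SIZE / ASSUMED_PAGE_SIZE;
}

/* Returns 1 if the page has been recorded as committed, else 0. */
int IsPageCommitted(size_t page) {
    assert(page < SHEET_PAGES);
    return (g_committed[page / CHAR_BIT] >> (page % CHAR_BIT)) & 1;
}

/* Call only after the page has actually been committed (for example, by
   VirtualAlloc with MEM_COMMIT), so a failed commit is never recorded. */
void MarkPageCommitted(size_t page) {
    assert(page < SHEET_PAGES);
    g_committed[page / CHAR_BIT] |= (unsigned char)(1u << (page % CHAR_BIT));
}
```

For this spreadsheet the whole record costs 200 bytes (1,600 pages, one bit each), and the check is a shift and a mask instead of a system call. The sketch is not thread-safe; a multithreaded program would need to guard the bitmap.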

Use structured exception handling (SEH)—the best method. SEH is an operating system feature that causes the system to notify your application when certain situations occur. Essentially, you set up your application with an exception handler, and then, whenever an attempt is made to access uncommitted memory, the system notifies your application of the problem. Your application then commits the memory and tells the system to retry the instruction that caused the exception. This time the memory access succeeds, and the program continues running as though there had never been a problem. This method is the most advantageous because it requires the least amount of work from you (meaning less code) and because your program will run at full speed. A complete discussion of the SEH mechanism is saved for Chapters 23, 24, and 25. The Spreadsheet sample application in Chapter 25 illustrates exactly how to use virtual memory as I've just described.