A.5 A Quick Word on Virtual Memory

We couldn't go any further with our discussion on processors without mentioning virtual memory. Processes are given a view of memory that makes them think they have access to every addressable memory reference: In a 64-bit architecture, this means that a process appears to have access to 16EB. This isn't really the case. Each process uses Virtual Addresses to access its data and instructions. This is commonly known as a VAS: a Virtual Address Space. The operating system maintains a Global Address Space, which is essentially a mapping of all of the VASes available and which process is using which one. The advantage of using Virtual Addresses is that processes have no concept of the actual memory hierarchy used at a hardware level, allowing programs to be moved from machine to machine with no regard for the underlying hardware memory interface.

The operating system, i.e., the kernel, is the interface to the hardware and as such has to maintain some form of list of which Virtual Addresses actually relate to pages of data located in physical memory. This list of active pages is commonly referred to as the Page Directory. When a process is executing, it uses its Virtual Addresses to access data and instructions, and every memory reference requires the operating system to translate that Virtual Address into a real Physical Address. If the Page Directory itself is stored in main memory, every translation suffers the access-time issues we discussed previously for main memory. One solution is to maintain a cache of recent address translations in a special hardware cache known as the TLB (Translation Lookaside Buffer).

The use and success of a hardware TLB again highlights the importance of the principles of locality. If we are using a piece of code now, we will probably need it again soon: that's temporal locality. It is also likely that the next piece we need will be relatively close to where we are now: that's spatial locality. This all lends weight to the argument for a hardware TLB, which is reflected in the fact that many processors provide one. Being on-chip, as in the case of an L1 cache, a TLB is limited in size. If we can't find a translation in the TLB, we have suffered a TLB miss. This must be rectified by fetching the translation from the Page Directory in main memory, wasting overall execution time.

To alleviate this problem, we again need to look at our programs and how effectively they use the TLB. We saw earlier that main memory is a collection of pages: On HP machines, a page is 4KB in size. Programmers need to be very careful how they construct their data models and how subsequent data elements are referenced. If two frequently accessed data elements are more than 4KB apart, they will occupy two entries in the TLB. When accessing multiple pairs of such data elements, this can lead to the TLB becoming full; subsequent references then incur a TLB miss and the delays involved in updating the TLB appropriately. Matters are complicated by the fact that our programs make heavy use of features such as multi-dimensional arrays and, from C programming, structures: A structure can be thought of as a record in which we group together data elements that have some collective meaning, e.g., the name, address, and employee number would form an employee record or employee structure.
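To make the layout argument concrete, here is a minimal C sketch; the field sizes, record count, and page-aligned array base are illustrative assumptions, not HP-UX specifics. It counts how many records in an array of employee structures land entirely on one 4KB page, since every distinct page a record touches needs its own TLB entry:

    #include <stdio.h>

    #define PAGE_SIZE 4096UL     /* 4KB pages, as on HP machines */
    #define NRECORDS  10000UL    /* illustrative record count */

    /* An "employee structure" grouping related data elements, as in
     * the example above; the field sizes are assumptions. */
    struct employee {
        char name[64];
        char address[128];
        int  empno;
    };

    int main(void)
    {
        unsigned long i, one_page = 0;

        /* Assuming a page-aligned array base, count the records that
         * sit entirely within one 4KB page, so that touching all
         * three fields costs only a single TLB entry. */
        for (i = 0; i < NRECORDS; i++) {
            unsigned long first = (i * sizeof(struct employee)) / PAGE_SIZE;
            unsigned long last  = ((i + 1) * sizeof(struct employee) - 1) / PAGE_SIZE;
            if (first == last)
                one_page++;
        }
        printf("%lu of %lu records fit within a single 4KB page\n",
               one_page, NRECORDS);

        /* Storing names, addresses, and employee numbers in three
         * separate arrays instead would spread one employee's fields
         * across widely separated pages, needing up to three TLB
         * entries per record accessed. */
        return 0;
    }

Because the structure groups all of one employee's fields together, only the few records that happen to straddle a page boundary need a second TLB entry; splitting the same fields into three separate arrays would spread each employee's data across three pages and up to three TLB entries.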
We could ask our programmers to look again at their data model in light of what we now know about page size, cache size, and memory access times. This is a lengthy process and needs to be thought out very carefully. One other solution is to fundamentally change the way our machines view memory. With bigger pages, our big data structures fit in fewer pages. Fewer pages means fewer TLB references, which in turn means fewer chances of a TLB miss and the associated costs in memory accesses. For HP-UX, this has become an option with the advent of the PA-8000 processor. What we can do is change the effective page size on a program-by-program basis. This gives us the effect described above: fewer pages meaning fewer TLB references.

Some people think this is such an easy fix that we should apply it to all programs. This is not necessarily the case. If we have a small program, e.g., one of the many daemon processes we see on UNIX, and we set the page size to 1MB, that process will be allocated pages of memory in multiples of 1MB when it needs only a few KB. This can lead to a significant waste of memory. Utilizing the idea of Variable Page Sizes needs to be done judiciously and with an understanding of the underlying data access patterns. In Chapter 11, "Processes, Threads, and Bottlenecks," we discuss how we can instigate this change for a program, changing the size of text as well as data pages.
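The waste argument is simple arithmetic. The following C sketch (the 48KB daemon footprint and the candidate page sizes are made-up example values) rounds a process's memory requirement up to whole pages and reports the overhead at each page size:

    #include <stdio.h>

    int main(void)
    {
        unsigned long need_kb    = 48;              /* hypothetical daemon footprint */
        unsigned long sizes_kb[] = { 4, 64, 1024 }; /* candidate page sizes in KB */
        int i;

        for (i = 0; i < 3; i++) {
            unsigned long page  = sizes_kb[i];
            unsigned long pages = (need_kb + page - 1) / page;  /* round up */
            unsigned long waste = pages * page - need_kb;
            printf("%4luKB pages: %3lu page(s) = %4luKB allocated, %4luKB wasted\n",
                   page, pages, pages * page, waste);
        }
        return 0;
    }

With 4KB pages, the 48KB daemon wastes nothing; with 1MB pages, it is handed a full 1MB page and wastes 976KB. This is exactly why Variable Page Sizes must be applied judiciously rather than globally.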


