While it is true that virtual-to-physical translation takes place in the processor hardware, the table that supports it resides in the kernel's memory space. If the kernel wishes to move processes in and out of play, it must adjust entries in this table accordingly. Figure 6-3 shows how the hardware's page directory, or pdir, is known to the kernel as the hashtable.

Figure 6-3. How the Kernel Views the pdir

The processor populates the hashtable (htbl/pdir) with page data entries (pde), which contain a variety of status bits, the current virtual address, and the current physical address of a page (see Figure 6-4). The kernel references and maintains this structure as a table of hashed page data entries (hpde on narrow systems and hpde2_0 on wide systems).

Figure 6-4. The Hashtable

Because the virtual map is huge compared with the physical map, it would be impractical to make the table large enough for each virtual page to have its own pde. That's one of the reasons we use a hashtable. The trick is to size the table large enough to minimize the chance of two virtual addresses hashing to the same entry, yet small enough to avoid wasting physical memory.

The basic approach is to size the hashtable in relation to the number of translations the system is expected to manage at any one time. The simple answer would be to set it to the number of physical pages configured into the system, but this doesn't account for all the addresses that may need translations. Don't forget that PA-RISC processors use memory-mapped I/O, and these pseudo-memory pages may also require virtual translations.

The size is determined in the following way. The total number of pages in physical memory is stored in the kernel parameter npdir. Each I/O module (bus controllers, graphics devices, …) is polled to determine its mapped memory requirements, and this value is stored in the kernel parameter niopdir.
These two parameters are summed and stored in the parameter nhtbl. We aren't quite done yet. To simplify masking of the hashtable index, nhtbl is adjusted to a power of two. When the calculated nhtbl is an exact power of two, that value is used. In most cases, the calculated value falls somewhere between two powers of two. If it exceeds the lower power of two by less than 25% of the difference between them, the lower power of two is used. If it exceeds the lower power of two by more than 25% of that difference, the higher power of two is used. Because the difference between adjacent powers of two equals the lower one, the lower value is kept only when nhtbl is less than 1.25 times it.

Note the annotated study of the hpde and hpde2_0 kernel structures in Listings 6.1 and 6.2. In each listing, the four numeric columns printed by q4 give a field's byte offset, bit offset, size in whole bytes, and remaining size in bits. For example, 0 1 1 7 marks a field that starts at bit 1 of byte 0 and is 1 byte plus 7 bits (15 bits) wide.

Listing 6.1.

q4> fields struct hpde

The first word of the structure contains a valid bit, the 15 highest bits of the virtual page address, and 16 bits for the virtual space number:

0 0 0 1 u_int pde_valid
0 1 1 7 u_int pde_vpage
2 0 2 0 u_int pde_space

The second word contains status, protection, and access bits. The reference, accessed, reference trap, and executed bits indicate that the page is "in play" and facilitate a "stingy cache" algorithm:

4 0 0 1 u_int pde_ref
4 1 0 1 u_int pde_accessed
4 2 0 1 u_int pde_rtrap

The dirty bit is set to indicate that the page's content in cache has been modified and is not in sync with its memory-resident copy:
4 3 0 1 u_int pde_dirty
4 4 0 1 u_int pde_dbrk

The access rights occupy a 7-bit field with the following translation:

Value  Symbol(s)                              Meaning
0x00   PDE_AR_KR                              kernel read-only
0x10   PDE_AR_KRW                             kernel read-write
0x20   PDE_AR_KRX, PDE_AR_KXR                 kernel read-execute, kernel read-only
0x30   PDE_AR_KRWX, PDE_AR_KXRW               kernel read-write-execute
0x0f   PDE_AR_UR                              user read-only
0x7f   PDE_AR_UX                              user execute-only
0x1f   PDE_AR_URW                             user read-write
0x2f   PDE_AR_URX, PDE_AR_URXKR, PDE_AR_CWX   user read-execute, kernel read-only, and copy-on-write
0x3f   PDE_AR_URWX                            user read-write-execute
0x4c   PDE_AR_GATE                            promote on execute-only (gateway page)
0x73   PDE_AR_NOACC                           no access allowed

4 5 0 7 u_int pde_ar

Next, we can mark uncacheable pages:

5 4 0 1 u_int pde_uncache

The protection ID is a pseudorandom number between 0 and 32767. A bitmap (protid_map) keeps track of which values are in use. If a unique protection ID cannot be found, the kernel will panic!

5 5 2 2 u_int pde_protid

The executed bit is part of the stingy cache flush algorithm:

7 7 0 1 u_int pde_executed

The first bit of the third word indicates that an update to the pde data is in progress, which avoids the need for a spinlock during hashtable updates. The physical page number occupies a 20-bit field:

8 0 0 1 u_int pde_uip
8 7 2 4 u_int pde_phys

The modified bit tells the kernel that the page's contents have been modified since the page was last written to the backing store (swap space):

11 4 0 1 u_int pde_modified

The trickle and block bits facilitate operations on specific PA-RISC processors with hardware walkers and block TLB entries (available only on a limited number of older 32-bit processors):

11 5 0 1 u_int pde_ref_trickle
11 6 0 1 u_int pde_block_mapped

If the alias bit is set, this structure was created as a result of virtual page aliasing:

11 7 0 1 u_int pde_alias

The last word of the structure is a forward pointer to a sparse hpde entry, if one is required:

12 0 4 0 * pde_next

Listing 6.2.
q4> fields struct hpde2_0

On wide hardware, the hpde2_0 structure consists of four double words. Again, the first double word starts with a "valid" bit (actually an "invalid" bit). It is followed by the high 20 bits of the virtual page and 32 bits containing the virtual space number:

0 0 0 1 u_int pde_invalid
1 4 2 4 u_int pde_vpage
4 0 4 0 u_int pde_space

The second double word contains various status bits, the access rights, and the protection ID key:

8 2 0 1 u_int pde_rtrap
8 3 0 1 u_int pde_dirty
8 4 0 1 u_int pde_dbrk
8 5 0 7 u_int pde_ar
9 4 0 1 u_int pde_uncache

The PA-RISC 2.0 processor allows instruction ordering and branch prediction hints to be assigned to a page:

9 5 0 1 u_int pde_order
9 6 0 1 u_int pde_br_predict
10 2 0 1 u_int pde_ref_trickle
10 3 0 1 u_int pde_block_mapped
10 4 0 1 u_int pde_executed
10 5 0 1 u_int pde_ref
10 6 0 1 u_int pde_accessed
10 7 0 1 u_int pde_modified
11 7 0 1 u_int pde_uip
12 0 3 7 u_int pde_protid
15 7 0 1 u_int pde_os

The third double word begins with the virtual alias flag and has 52 bits for the physical page number. The first 25 of these are currently unused and are labeled pde_phys_u, for "unused":

16 0 0 1 u_int pde_alias
16 7 3 1 u_int pde_phys_u
20 0 3 3 u_int pde_phys

The next 5 bits are used in conjunction with performance-optimized page sizing:

23 3 0 5 u_int var_page

The fourth double word contains the pointer to the next sparse entry:

24 0 4 0 u_int unused_upper
28 0 4 0 u_int pde_next

Let's check how observant you were of the annotations. Did you notice the size (in bits) of the virtual page number stored in the hpde? It is only 15 bits wide, which presents a challenge, since there are 2^20 virtual pages per space. The annotations note that these are the high-order 15 bits of the virtual page number, so the five low-order bits are missing and 2^5 = 32 consecutive pages share each stored value. With this model, it seems that only one out of every 32 pages will be translatable by this table! There must be more to this story than meets the eye, and there is.
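To see the problem concretely, here is a small Python sketch that packs the first word of a narrow hpde according to Listing 6.1. It makes several assumptions. Pages are 4 KB, so a 32-bit space offset yields the 20-bit virtual page number cited above. The q4 columns are read as byte offset, bit offset, byte size, and bit size. pde_valid sits in the most significant bit. The helper names (vpn_of, hpde_word0) and the space number 0x42 are hypothetical. This is an illustration of the arithmetic, not kernel code.

```python
PAGE_SHIFT = 12                     # assumption: 4 KB pages
VPN_BITS = 20                       # 2^20 virtual pages per space
TAG_BITS = 15                       # width of pde_vpage in struct hpde
DROPPED_BITS = VPN_BITS - TAG_BITS  # low-order VPN bits the hpde cannot hold


def vpn_of(offset: int) -> int:
    """Virtual page number of a 32-bit offset within a space."""
    return (offset & 0xFFFFFFFF) >> PAGE_SHIFT


def hpde_word0(valid: bool, vpn: int, space: int) -> int:
    """Hypothetical packer for the first 32-bit word of a narrow hpde.

    Assumed layout, most significant bit first (from Listing 6.1):
    pde_valid (1 bit) | pde_vpage (15 bits) | pde_space (16 bits).
    """
    if not 0 <= vpn < (1 << VPN_BITS):
        raise ValueError("vpn out of range")
    if not 0 <= space < (1 << 16):
        raise ValueError("space out of range")
    tag = vpn >> DROPPED_BITS       # only the high-order 15 bits survive
    return (int(valid) << 31) | (tag << 16) | space


# Two adjacent pages in the same (hypothetical) space 0x42 ...
a = vpn_of(0x12345000)              # VPN 0x12345
b = vpn_of(0x12346000)              # VPN 0x12346
print(hex(hpde_word0(True, a, 0x42)))   # 0x891a0042
print(hex(hpde_word0(True, b, 0x42)))   # 0x891a0042, identical

# ... and every VPN in the space that collapses onto the same stored tag.
aliases = [v for v in range(1 << VPN_BITS)
           if v >> DROPPED_BITS == a >> DROPPED_BITS]
print(len(aliases), hex(aliases[0]), hex(aliases[-1]))  # 32 0x12340 0x1235f
```

The packed word alone cannot tell these 32 pages apart. The missing five bits must be recovered from somewhere other than pde_vpage.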
To understand the secret of the missing five virtual page address bits, we need to examine the hashing algorithm used by the hardware (and understood by the kernel).
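Before we do, it may help to pin down the hashtable sizing rule from the start of this section: add npdir and niopdir, then round the total to a power of two using the 25% rule. The Python sketch below is a hypothetical helper, not kernel code, and the name round_nhtbl is our own. The text doesn't say what happens when nhtbl lands exactly 25% above the lower power of two, so the sketch assumes the higher power wins that tie.

```python
def round_nhtbl(npdir: int, niopdir: int) -> int:
    """Hypothetical sketch: size the hashtable from npdir + niopdir.

    The sum is rounded to a power of two using the 25% rule. Assumption:
    a value exactly 25% above the lower power of two rounds up.
    """
    nhtbl = npdir + niopdir
    if nhtbl <= 0:
        raise ValueError("npdir + niopdir must be positive")

    lower = 1 << (nhtbl.bit_length() - 1)  # largest power of two <= nhtbl
    if nhtbl == lower:                     # already a power of two
        return nhtbl

    upper = lower << 1
    gap = upper - lower                    # equals `lower` for adjacent powers
    # Keep the lower power only if nhtbl exceeds it by less than 25% of
    # the gap; integer arithmetic avoids floating-point comparisons.
    if 4 * (nhtbl - lower) < gap:
        return lower
    return upper


# Hypothetical inputs: 1 GB of 4 KB pages (2^18) plus 4096 I/O pages.
print(round_nhtbl(262144, 4096))    # 262144: only ~1.6% above 2^18
print(round_nhtbl(200000, 10000))   # 262144: 210000 is ~60% above 2^17
```

The test 4 × (nhtbl − lower) < gap is the integer form of "nhtbl < 1.25 × lower". Keeping the comparison in integers avoids any floating-point edge cases near the threshold.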