13.3. Configuring for Multiple Page Sizes

Once we determine that our application warrants the use of large pages, we need a strategy for deciding which parts of the application to enhance with large pages. For example, should we attempt to enable large pages for the target process's heap, stack, text, or other mappings? The trapstat utility gives us some information about which types of mappings in our address space incur the TLB misses.

Instruction TLB (iTLB) misses most likely result from the process's text and library text, since instructions typically reside in these mappings. It is possible, however, for a program to execute code from other mappings; for example, the Java virtual machine compiles instructions on the fly into its heap and then executes them from there. For the vast majority of applications, though, a good first guess is that iTLB misses come from the text and library mappings.

Data TLB misses are likely to come from the mappings that hold the program's data: the writable heap, stack, and data segments, plus the read-only data within the text mapping.

The default page size (base page size) for the Solaris OS is 8 Kbytes on UltraSPARC and 4 Kbytes on Intel x86 microprocessors. Larger pages (4 Mbytes) are used by the Solaris kernel for its instruction and data sections; however, user applications requiring larger pages must explicitly request them.

In Solaris 2.6 through Solaris 8, larger page sizes are available only through a special form of System V shared memory called intimate shared memory (ISM), which we can use to optimize database performance. ISM is requested through the shmat(2) system call with the SHM_SHARE_MMU flag and is allocated as 4-Mbyte pages if possible. Databases such as Oracle, Informix, and Sybase request shared memory with this flag and typically perform as much as 10% to 20% better as a result of a reduced TLB miss rate.

Solaris 9 introduces a generic framework that allows user applications to request larger page sizes. At the same time, ISM was enhanced to take advantage of the other supported large page sizes (e.g., 64 Kbytes and 512 Kbytes). Unmodified applications can be directed to use larger page sizes by means of the ppgsz(1M) command and the libmpss.so library. Applications can also be modified to request larger page sizes themselves with the memcntl(2) system call.

The Solaris 9 large-page infrastructure allows larger pages to be requested for mappings of /dev/zero, that is, the heap, stack, and other anonymous mappings.

13.3.1. Enabling Large Pages

Solaris 9 provides a new framework, Multiple Page-Size Support (MPSS), which allows larger page sizes to be requested for user processes. The memcntl() system call specifies page-size advice for a given address range. A wrapper program, ppgsz, and an interposition library, libmpss.so, call memcntl() on behalf of the target process so that unmodified binaries can use larger page sizes.

13.3.2. Advising Page-Size Preferences with ppgsz(1M)

The ppgsz command is a wrapper that advises a preferred page size for the heap or stack of a target process. These page-size preferences are inherited across fork() but not across exec(). Thus, if the target program spawns another program (forks and then execs), the page sizes will not be inherited. If inheritance of page sizes is required, use the mpss.so.1 library instead.

For example, to start a target process with 4-Mbyte pages for its heap, we could use the ppgsz wrapper:

sol9# ppgsz -o heap=4M ./testprog &
sol9# pmap -sx `pgrep testprog`
2953:   ./testprog
 Address  Kbytes     RSS    Anon  Locked Pgsz Mode   Mapped File
00010000       8       8       -       -   8K r-x--  dev:277,83 ino:114875
00020000       8       8       8       -   8K rwx--  dev:277,83 ino:114875
00022000    3960    3960    3960       -   8K rwx--    [ heap ]
00400000  131072  131072  131072       -   4M rwx--    [ heap ]
FF280000     120     120       -       -   8K r-x--  libc.so.1
FF29E000     136     128       -       -    - r-x--  libc.so.1
FF2C0000      72      72       -       -   8K r-x--  libc.so.1
FF2D2000     192     192       -       -    - r-x--  libc.so.1
FF302000     112     112       -       -   8K r-x--  libc.so.1
FF31E000      48      32       -       -    - r-x--  libc.so.1
FF33A000      24      24      24       -   8K rwx--  libc.so.1
FF340000       8       8       8       -   8K rwx--  libc.so.1
FF390000       8       8       -       -   8K r-x--  libc_psr.so.1
FF3A0000       8       8       -       -   8K r-x--  libdl.so.1
FF3B0000       8       8       8       -   8K rwx--    [ anon ]
FF3C0000     152     152       -       -   8K r-x--  ld.so.1
FF3F6000       8       8       8       -   8K rwx--  ld.so.1
FFBFA000      24      24      24       -   8K rwx--    [ stack ]
-------- ------- ------- ------- -------
total Kb  135968  135944  135112       -


13.3.3. Interposing Shared Libraries with libmpss.so

The mpss.so.1 shared object in /usr/lib provides a means by which the preferred stack or heap page size can be selectively configured for launched processes and their descendants. The library has the advantage over the wrapper that page sizes are inherited across exec(). To enable mpss.so.1, ensure that the following string is present in the environment (see ld.so.1(1)), along with one or more MPSS (multiple page-size support) environment variables:

sol9# LD_PRELOAD=$LD_PRELOAD:mpss.so.1 


Once preloaded, the mpss.so.1 shared object reads the following environment variables to determine the preferred page sizes and the processes to which they apply:

MPSSHEAP=size
MPSSSTACK=size
      MPSSHEAP and MPSSSTACK specify the preferred page sizes
      for the heap and stack, respectively. The specified page
      size(s) are applied to all created processes.

MPSSCFGFILE=config-file
      config-file is a text file which contains one or more
      mpss configuration entries of the form:

      exec-name:heap-size:stack-size


For example, the following commands enable 4-Mbyte pages for the heap of all subsequently started processes:

sol9# export LD_PRELOAD=$LD_PRELOAD:mpss.so.1
sol9# export MPSSHEAP=4M
sol9# ./testprog


A configuration file restricts 4-Mbyte pages to matching processes. For example, the following commands enable 4-Mbyte pages for testprog alone:

sol9# cat >/usr/local/etc/mpss.cfg
testprog:4M:*
sol9# export LD_PRELOAD=$LD_PRELOAD:mpss.so.1
sol9# export MPSSCFGFILE=/usr/local/etc/mpss.cfg


See mpss.so.1(1) for all available configuration options.

13.3.4. Request Larger Page Sizes with the Compiler

The Sun Studio compilers provide options to cause the target application to request specific page sizes. The following options are supported for the compiler:

-xpagesize=n (SPARC). Sets the preferred page size for the stack and the heap. The value n must be one of the following: 8K, 64K, 512K, 4M, 32M, 256M, 2G, 16G, or default.

You must specify a valid page size for the Solaris Operating System on the target platform, as returned by getpagesizes(3C). If you do not specify a valid page size, the request is silently ignored at runtime. The Solaris OS offers no guarantee that the page-size request will be honored. You can use pmap(1) or meminfo(2) to determine the page size on the target platform.

The -xpagesize option has no effect unless you use it at both compile time and link time. Note: This feature is not available on the Solaris 7 and Solaris 8 operating environments. A program compiled with this option will not link on the Solaris 7 and Solaris 8 operating environments.

If you specify -xpagesize=default, the Solaris OS sets the page size. -xpagesize without an argument is equivalent to -xpagesize=default.

Compiling with this option has the same effect as setting the LD_PRELOAD environment variable to mpss.so.1 with the equivalent options, or running the Solaris 9 command ppgsz(1) with the equivalent options before running the program. See the Solaris 9 man pages for details.

This option is a macro for -xpagesize_heap and -xpagesize_stack. These two options accept the same arguments as -xpagesize: 8K, 64K, 512K, 4M, 32M, 256M, 2G, 16G, or default. You can set both to the same value by specifying -xpagesize, or you can specify them individually with different values.

-xpagesize_heap=n (SPARC). Sets the page size in memory for the heap. n can be 8K, 64K, 512K, 4M, 32M, 256M, 2G, 16G, or default. You must specify a valid page size for the Solaris OS on the target platform, as returned by getpagesizes(3C). If you do not specify a valid page size, the request is silently ignored at runtime.

You can use pmap(1) or meminfo(2) to determine the page size on the target platform. If you specify -xpagesize_heap=default, the Solaris OS sets the page size. -xpagesize_heap without an argument is equivalent to -xpagesize_heap=default.

Compiling with this option has the same effect as setting the LD_PRELOAD environment variable to mpss.so.1 with the equivalent options, or running the Solaris 9 command ppgsz(1) with the equivalent options before running the program. See the Solaris 9 man pages for details.

Note: This feature is not available on the Solaris 7 and Solaris 8 operating environments. A program compiled with this option will not link on the Solaris 7 and Solaris 8 operating environments.

-xpagesize_stack=n (SPARC). Sets the page size in memory for the stack. n can be 8K, 64K, 512K, 4M, 32M, 256M, 2G, 16G, or default. You must specify a valid page size for the Solaris OS on the target platform, as returned by getpagesizes(3C). If you do not specify a valid page size, the request is silently ignored at runtime. You can use pmap(1) or meminfo(2) to determine the page size on the target platform.

If you specify -xpagesize_stack=default, the Solaris OS sets the page size. -xpagesize_stack without an argument is equivalent to -xpagesize_stack=default.

Compiling with this option has the same effect as setting the LD_PRELOAD environment variable to mpss.so.1 with the equivalent options, or running the Solaris 9 command ppgsz(1) with the equivalent options before running the program. See the Solaris 9 man pages for details.

Note: This feature is not available on the Solaris 7 and Solaris 8 operating environments. A program compiled with this option will not link on the Solaris 7 and Solaris 8 operating environments.

13.3.5. Interfaces to Request Larger Page Sizes

The memcntl(2) interface has been enhanced to allow page-size requests to be made on behalf of a process. Thus, an application can request larger page sizes itself when appropriate. An application that wants a larger page size should request it through the existing memcntl() interface:

int memcntl(caddr_t addr, size_t len, int cmd, caddr_t arg, int attr, int mask);


With the cmd argument, we can now specify a new control operation, MC_HAT_ADVISE, for page-size operations. When cmd is set to MC_HAT_ADVISE, the arg argument (a caddr_t) is interpreted as a pointer to a new structure, memcntl_mha, as shown below. Currently, only three commands are supported; each command sets a preferred page size, and only one command can be specified at a time. mha_flags is reserved for future use and must always be set to zero.

struct memcntl_mha {
        uint_t  mha_cmd;        /* command(s) */
        uint_t  mha_flags;      /* flags */
        size_t  mha_pagesize;   /* preferred page size */
};


If mha_cmd is set to MHA_MAPSIZE_VA, the set-preferred-page-size operation is applied to the address range [addr, addr + len). mha_pagesize must be a supported page size, as returned by getpagesizes(3C), or zero to let the system select the page size. The address and length of the range must be aligned to the new preferred page size. The access protections within each new page-size region contained in the range must be uniform, or the operation will fail. The operation also fails if there are holes in the address range or if the mapping was created with MAP_NORESERVE. The address range can be contained inside a larger mapping or can span many mappings of varying sizes.

The memcntl() interface promotes or demotes the preferred page sizes for any MAP_PRIVATE /dev/zero mappings, provided that the constraints mentioned above are met. Two special objects in the user address space require special handling: the process's heap and the primary thread stack (not the stack for additional threads).

The heap consists of the last .bss segment, which is adjacent to the brk area, and the brk area itself. Figure 13.1 illustrates these mappings.

Figure 13.1. Process Address Space Mappings


For these two cases we have separate commands:

MHA_MAPSIZE_STACK   /* token for the process's main stack */
MHA_MAPSIZE_BSSBRK  /* token for the heap */


When MHA_MAPSIZE_STACK or MHA_MAPSIZE_BSSBRK is used, mha_pagesize must be a supported page size, as returned by getpagesizes(3C), or zero to let the system select the page size. The operation is then applied to the entire existing stack or heap mapping, and the advice is used for future page allocations. These commands may first adjust the existing stack or heap range to suit the new page size. This could involve creating new segments to pad out the base and length of the existing range to the new preferred page-size alignment.

Applications need to know which alignment to use for their memory requests, both to attain maximum performance (e.g., when using mmap() to create new mappings) and to avoid misaligned mprotect(), munmap(), and mmap() requests that could result in page demotion (larger pages being broken up into smaller pages).

Most applications that use mmap() pass NULL as its addr argument to let the OS manage their address space. If such applications also want to use large pages with memcntl(), they should request the desired minimum page-size alignment by means of a new flag, MAP_ALIGN. When MAP_ALIGN is specified, mmap() interprets the addr argument only as the required minimum alignment and is free to place the mapping in any hole in the user address space that satisfies that alignment. The alignment must be a power-of-two multiple of PAGESIZE, or zero to let the system choose the alignment. If MAP_ALIGN is specified together with MAP_FIXED, the request fails. mmap() also fails if the alignment request cannot be satisfied.

For reference, we provide the example below. This code fragment sets the preferred page size for the program's heap to 4 Mbytes. Note the use of memalign() to align the request on a 4-Mbyte boundary. Since the heap does not start on a 4-Mbyte boundary, the first few megabytes of the heap may reside on 8-Kbyte pages. If performance-sensitive data structures reside within this area, the program might not realize the full benefit of the larger page size. By allocating a 4-Mbyte-aligned area, we increase the chance that the subsequently allocated virtual addresses land on large pages.

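#include <sys/types.h>
#include <sys/mman.h>
#include <stdio.h>
#include <stdlib.h>

#define MEGABYTE ((size_t)(1024 * 1024))
#define FOUR_MEGABYTE ((size_t)4 * MEGABYTE)

int
main(int argc, char *argv[])
{
        struct memcntl_mha mha;
        char *my_memory;

        /* Set preferred page size to 4 Mbytes for the heap (.bss + brk area) */
        mha.mha_cmd = MHA_MAPSIZE_BSSBRK;
        mha.mha_flags = 0;              /* reserved, must be zero */
        mha.mha_pagesize = FOUR_MEGABYTE;
        if (memcntl(NULL, 0, MC_HAT_ADVISE, (char *)&mha, 0, 0) != 0)
                perror("memcntl");      /* advice only: continue on base pages */

        /* Ensure user memory starts on the first large page */
        my_memory = (char *)memalign(FOUR_MEGABYTE, (size_t)100 * MEGABYTE);
        if (my_memory == NULL) {
                perror("memalign");
                return (1);
        }

        /* ... place performance-sensitive data in my_memory ... */

        free(my_memory);
        return (0);
}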


13.3.6. CPU Specific Large Page Support

The TLB configurations are quite different across versions of UltraSPARC processors, but they share a few items in common. UltraSPARC I through IV support four page sizes: 8 Kbytes, 64 Kbytes, 512 Kbytes, and 4 Mbytes. In addition, there are separate TLBs for the instruction and data paths.

UltraSPARC I and II. The UltraSPARC I and II microprocessors (143 MHz to 480 MHz) have two TLBs, one for the instruction path and one for the data path. Each TLB is a 64-entry, fully associative TLB that supports all four page sizes. User applications can use any of the four page sizes.

750 MHz UltraSPARC III. The 750 MHz UltraSPARC III microprocessor has four TLBs: two for instruction and two for data. The instruction TLBs are implemented as a 16-entry, fully associative TLB that supports all four page sizes and a larger 128-entry TLB that supports only 8-Kbyte entries. The data TLBs are implemented as a 16-entry, fully associative TLB that supports all four page sizes and a larger 512-entry, two-way set-associative TLB that supports only 8-Kbyte entries.

The 16-entry DTLB has nine locked entries (locked by software for the Solaris kernel), leaving only seven slots for large page sizes. Thus, the use of large pages is typically not beneficial on 750 MHz UltraSPARC III systems.

900 MHz+ UltraSPARC III. The 900 MHz onwards UltraSPARC III microprocessors have five TLBs: two for instruction and three for data. The instruction TLBs are configured as a 16-entry, fully associative TLB that supports all four page sizes and a larger 128-entry TLB that supports only 8-Kbyte entries. The data TLBs are configured as a 16-entry, fully associative TLB that supports all four page sizes and two larger 512-entry, two-way set-associative TLBs that support one page size per process. The increased size of the data TLBs on a 900 MHz UltraSPARC III provides a large TLB spread (2 Gbytes when 4-Mbyte pages are used) and typically increases performance significantly for large memory applications.

The large data TLBs are configured automatically in accordance with the most common page sizes in a process's address space. A process using one large page size in addition to the base page size (8 Kbytes) will have one of its large TLBs automatically programmed to enable the large page size when eight or more pages are using the larger page size within the process (it is assumed that the smaller TLB is available if there are fewer than eight pages).

Since the large TLBs support all four page sizes, large pages can be used effectively on UltraSPARC III. However, since each large TLB can be configured for only one page size at a time per process, only two page sizes should be used concurrently. One of them should be the system's base page size (8 Kbytes) for mappings not using large pages (for example, program text and libraries), leaving just one larger page size available for the remaining mappings. The most common combination is 8 Kbytes and 4 Mbytes, which provides the greatest TLB spread for the large TLB.

x86. The implementation of Solaris on x86 processors provides support for 4-Kbyte pages only.

AMD 64/x64. The AMD Opteron processor supports both 4-Kbyte and 2-Mbyte page sizes.




Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture (2nd Edition)
ISBN: 0131482092
EAN: 2147483647
Year: 2004
Pages: 244
