13.2. Measuring Application Performance

Two types of page-size observability tools are available in Solaris: those that describe the page sizes in use by the system or application, and those that help determine whether using large pages will benefit performance. The pmap(1) and pagesize(1) commands and the getpagesize(3C) and meminfo(2) interfaces report the page sizes in use and the system's ability to support different TLB page sizes. The trapstat(1M) and cpustat(1M) commands can approximate the amount of time that our target application spends waiting for the platform to service TLB misses.

We can use two methods to approximate the amount of time spent servicing TLB misses: (1) we can observe the rate of TLB misses and then multiply that rate by the cost of a single TLB miss; or (2) if TLB misses are serviced by system software, we can directly measure the time spent in the TLB miss handlers. On Solaris 8, the cpustat(1M) command measures the rate of TLB misses, whereas Solaris 9 provides a new command, trapstat, which computes and displays the amount of time spent servicing TLB misses.

trapstat(1M). The Solaris trapstat command provides information about processor exceptions on UltraSPARC platforms. Since TLB misses are serviced in software on UltraSPARC microprocessors, trapstat can also provide statistics about TLB misses.

Using the trapstat command, we can observe the number of TLB misses and the amount of time spent servicing them. The -t and -T options provide information about TLB misses. We can then use the time spent servicing TLB misses to approximate the potential gains we could make by using a larger page size or by moving to a platform that uses a microprocessor with a larger TLB.

The -t option provides first-level summary statistics. The time spent servicing TLB misses is summarized in the lower-right corner; in this case, 46.2% of the total execution time is spent servicing misses. Miss detail is provided for TLB misses incurred in the data portion of the address space and for the instruction portion of the address space. Data is also provided for user- and kernel-mode misses (we are primarily interested in the user-mode misses, since our application likely runs in user mode).

sol9# trapstat -t 1 111
cpu m| itlb-miss %tim itsb-miss %tim | dtlb-miss %tim dtsb-miss %tim |%tim
-----+-------------------------------+-------------------------------+----
  0 u|         1  0.0         0  0.0 |   2171237 45.7         0  0.0 |45.7
  0 k|         2  0.0         0  0.0 |      3751  0.1         7  0.0 | 0.1
=====+===============================+===============================+====
 ttl |         3  0.0         0  0.0 |   2192238 46.2         7  0.0 |46.2


For further detail, use the -T option to provide a per-page-size breakdown. In this example, trapstat shows us that all of the misses occurred on 8-Kbyte pages.

sol9# trapstat -T 1 111
cpu m size| itlb-miss %tim itsb-miss %tim | dtlb-miss %tim dtsb-miss %tim |%tim
----------+-------------------------------+-------------------------------+----
  0 u   8k|        30  0.0         0  0.0 |   2170236 46.1         0  0.0 |46.1
  0 u  64k|         0  0.0         0  0.0 |         0  0.0         0  0.0 | 0.0
  0 u 512k|         0  0.0         0  0.0 |         0  0.0         0  0.0 | 0.0
  0 u   4m|         0  0.0         0  0.0 |         0  0.0         0  0.0 | 0.0
- - - - - + - - - - - - - - - - - - - - - + - - - - - - - - - - - - - - - + - -
  0 k   8k|         1  0.0         0  0.0 |      4174  0.1        10  0.0 | 0.1
  0 k  64k|         0  0.0         0  0.0 |         0  0.0         0  0.0 | 0.0
  0 k 512k|         0  0.0         0  0.0 |         0  0.0         0  0.0 | 0.0
  0 k   4m|         0  0.0         0  0.0 |         0  0.0         0  0.0 | 0.0
==========+===============================+===============================+====
      ttl |        31  0.0         0  0.0 |   2174410 46.2        10  0.0 |46.2
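When this breakdown has to be checked repeatedly, the per-page-size rows can be pulled out with a few lines of script. The following Python sketch is only an illustration: the function names are hypothetical, the sample is an excerpt of the output above, and the parser assumes the exact column layout shown here, which other trapstat versions may not share.

```python
SAMPLE = """\
cpu m size| itlb-miss %tim itsb-miss %tim | dtlb-miss %tim dtsb-miss %tim |%tim
----------+-------------------------------+-------------------------------+----
  0 u   8k|        30  0.0         0  0.0 |   2170236 46.1         0  0.0 |46.1
  0 u  64k|         0  0.0         0  0.0 |         0  0.0         0  0.0 | 0.0
  0 u 512k|         0  0.0         0  0.0 |         0  0.0         0  0.0 | 0.0
  0 u   4m|         0  0.0         0  0.0 |         0  0.0         0  0.0 | 0.0
- - - - - + - - - - - - - - - - - - - - - + - - - - - - - - - - - - - - - + - -
  0 k   8k|         1  0.0         0  0.0 |      4174  0.1        10  0.0 | 0.1
==========+===============================+===============================+====
      ttl |        31  0.0         0  0.0 |   2174410 46.2        10  0.0 |46.2
"""


def parse_trapstat_pagesize(text):
    """Return one dict per data row of `trapstat -T` style output."""
    rows = []
    for line in text.splitlines():
        parts = line.split("|")
        if len(parts) != 4:
            continue  # separator lines contain no '|' characters
        key = parts[0].split()
        # Data rows start with "cpu-id mode size", e.g. "0 u 8k".
        # The header and the "ttl" summary row fail this check.
        if len(key) != 3 or not key[0].isdigit():
            continue
        dtlb = parts[2].split()
        rows.append({
            "cpu": int(key[0]),
            "mode": key[1],            # "u" = user, "k" = kernel
            "size": key[2],            # page size label, e.g. "8k"
            "dtlb_miss": int(dtlb[0]),
            "dtlb_pct": float(dtlb[1]),
            "total_pct": float(parts[3]),
        })
    return rows


def user_dtlb_time_by_size(rows):
    """Map page size to the user-mode dTLB %tim value."""
    return {r["size"]: r["dtlb_pct"] for r in rows if r["mode"] == "u"}


if __name__ == "__main__":
    for size, pct in user_dtlb_time_by_size(parse_trapstat_pagesize(SAMPLE)).items():
        print(f"{size:>5}: {pct:5.1f}% of time in user dTLB misses")
```

A page size with a large user-mode dTLB percentage is the one worth replacing with a larger page size.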


We can conclude from this analysis that our application could potentially run almost twice as fast if we could eliminate the majority of the TLB misses. Our objective in using the mechanisms discussed below is to minimize the user-mode data TLB (dTLB) misses by instructing the application to use larger pages for its data segments. Typically, data misses are incurred in the program's heap or stack segments. We can use the Solaris multiple-page-size support commands to direct the application to use 4-Mbyte pages for its heap, stack, or anonymous memory mappings.
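The "almost twice as fast" estimate is simple arithmetic: if a fraction f of the run time goes to TLB miss handling and all of it could be eliminated, the remaining run time is 1 - f, so the best-case speedup is 1/(1 - f). A minimal Python sketch follows; the function name and the fraction_eliminated parameter are illustrative assumptions, and the model optimistically ignores any cost the larger pages themselves might add.

```python
def potential_speedup(tlb_time_fraction, fraction_eliminated=1.0):
    """Best-case speedup if part of the TLB miss time is eliminated.

    tlb_time_fraction: share of run time spent servicing TLB misses,
        for example the total %tim reported by trapstat divided by 100.
    fraction_eliminated: share of that miss time assumed to go away.
    """
    if not 0.0 <= tlb_time_fraction < 1.0:
        raise ValueError("tlb_time_fraction must be in [0, 1)")
    if not 0.0 <= fraction_eliminated <= 1.0:
        raise ValueError("fraction_eliminated must be in [0, 1]")
    remaining = 1.0 - tlb_time_fraction * fraction_eliminated
    return 1.0 / remaining


if __name__ == "__main__":
    # 46.2% is the total %tim from the trapstat example in this section.
    print(f"all misses removed:    {potential_speedup(0.462):.2f}x")
    print(f"90% of misses removed: {potential_speedup(0.462, 0.9):.2f}x")
```

With f = 0.462 the bound is 1/0.538, or roughly 1.86, which is where "almost twice" comes from.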

cpustat(1M). The cpustat command programs and reads the hardware counters in the microprocessor. These counters measure hardware events within the processor itself. Typically, two counters are available, and each can be programmed to count one of a larger set of events. The UltraSPARC III processors can count TLB miss events. Since Solaris 8 lacks trapstat, we can use the CPU counters to estimate the amount of time spent servicing TLB misses.

For example, the following cpustat command instructs the system to measure the number of DTLB miss events and the number of microprocessor cycles on each processor.

sol8# cpustat -c pic0=Cycle_cnt,pic1=DTLB_miss 1
   time cpu event      pic0       pic1
  1.006   0  tick 663839993    3540016
  2.006   0  tick 651943834    3514443
  3.006   0  tick 630482518    3398061
  4.006   0  tick 634483028    3418046
  5.006   0  tick 651910256    3511458
  6.006   0  tick 651432039    3510201
  7.006   0  tick 651512695    3512047
  8.006   0  tick 613888365    3309406
  9.006   0  tick 650806115    3510292


By default, the cpustat command reports only counts for user-mode processes. This cpustat output shows us that on processor 0, a user-mode process consumes approximately 650 million cycles per second and that about 3.5 million dTLB misses are serviced. Servicing an UltraSPARC TLB miss typically costs from about 50 cycles (if the TLB entry being loaded is found in the microprocessor's cache) to about 300 cycles (if a memory load is required to fetch the new TLB entry). We can, therefore, approximate that between 175 million (3.5 million x 50) and 1050 million (3.5 million x 300) cycles are spent servicing TLB misses per 1-second sample.

A quick check of the processor speed allows us to calculate the ratio of time spent servicing misses.

sol8# psrinfo -v
Status of processor 0 as of: 11/10/2002 20:14:09
  Processor has been on-line since 11/05/2002 20:59:17.
  The sparcv9 processor operates at 900 MHz,
        and has a sparcv9 floating point processor.


Our microprocessor is running at 900 MHz, providing 900 million cycles per second. Therefore, at least 175/900, or about 19%, of the time is spent servicing TLB misses. The actual figure could be larger if a large fraction of the TLB misses require memory loads.
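The same estimate can be scripted so that it is easy to repeat for other samples or clock rates. The Python sketch below averages the pic1 (DTLB_miss) column from the cpustat run shown earlier and applies the 50-cycle and 300-cycle costs quoted in the text. The function name is hypothetical, and clamping the upper bound to 1.0 is an assumption of this sketch, made because a fraction above 100% has no physical meaning.

```python
# pic1 (DTLB_miss) values from the nine 1-second cpustat samples.
DTLB_MISSES_PER_SEC = [
    3540016, 3514443, 3398061, 3418046, 3511458,
    3510201, 3512047, 3309406, 3510292,
]

CLOCK_HZ = 900_000_000  # 900 MHz, as reported by psrinfo -v


def tlb_time_fraction_bounds(misses_per_sec, clock_hz,
                             min_cost_cycles=50, max_cost_cycles=300):
    """Return (low, high) estimates of the share of time spent on TLB misses.

    low assumes every miss is satisfied from the processor cache,
    high assumes every miss needs a memory load.
    """
    low = misses_per_sec * min_cost_cycles / clock_hz
    high = misses_per_sec * max_cost_cycles / clock_hz
    # Assumption of this sketch: cap at 1.0, since more than 100% of the
    # available cycles cannot be spent in the miss handlers.
    return min(low, 1.0), min(high, 1.0)


if __name__ == "__main__":
    average = sum(DTLB_MISSES_PER_SEC) / len(DTLB_MISSES_PER_SEC)
    low, high = tlb_time_fraction_bounds(average, CLOCK_HZ)
    print(f"average dTLB misses per second: {average:,.0f}")
    print(f"time servicing misses: at least {low:.1%}, at most {high:.1%}")
```

The lower bound is the useful number here: it reproduces the roughly 19% figure, while the upper bound only says that the 300-cycle cost cannot apply to every miss.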

13.2.1. Determining Allocated Page Sizes

The pmap command allows us to query a target process for page-size information, and the meminfo system call gives a program a way to query the operating system about the page sizes backing its own address space.

pmap(1). The pmap command displays the page sizes of memory mappings within the address space of a process. The -s option (combined here with -x for extended output) directs pmap to show the page size for each mapping.

sol9# pmap -sx `pgrep testprog`
2909:   ./testprog
 Address  Kbytes     RSS    Anon  Locked Pgsz Mode   Mapped File
00010000       8       8       -       -   8K r-x--  dev:277,83 ino:114875
00020000       8       8       8       -   8K rwx--  dev:277,83 ino:114875
00022000  131088  131088  131088       -   8K rwx--    [ heap ]
FF280000     120     120       -       -   8K r-x--  libc.so.1
FF29E000     136     128       -       -    - r-x--  libc.so.1
FF2C0000      72      72       -       -   8K r-x--  libc.so.1
FF2D2000     192     192       -       -    - r-x--  libc.so.1
FF302000     112     112       -       -   8K r-x--  libc.so.1
FF31E000      48      32       -       -    - r-x--  libc.so.1
FF33A000      24      24      24       -   8K rwx--  libc.so.1
FF340000       8       8       8       -   8K rwx--  libc.so.1
FF390000       8       8       -       -   8K r-x--  libc_psr.so.1
FF3A0000       8       8       -       -   8K r-x--  libdl.so.1
FF3B0000       8       8       8       -   8K rwx--     [ anon ]
FF3C0000     152     152       -       -   8K r-x--  ld.so.1
FF3F6000       8       8       8       -   8K rwx--  ld.so.1
FFBFA000      24      24      24       -   8K rwx--    [ stack ]
-------- ------- ------- ------- -------
total Kb  132024  132000  131168       -


The pmap command shows us the MMU page size for each mapping. In this case, 8-Kbyte pages are used for every mapping that reports a page size. To demonstrate a larger page size, we can use the Solaris ppgsz command (we discuss ppgsz in more detail in a later section) to set the page size for the heap of our test program to 4 Mbytes.

sol9# ppgsz -o heap=4M ./testprog &
sol9# pmap -sx `pgrep testprog`
2953:   ./testprog
 Address  Kbytes     RSS    Anon  Locked Pgsz Mode    Mapped File
00010000       8       8       -       -   8K r-x--  dev:277,83 ino:114875
00020000       8       8       8       -   8K rwx--  dev:277,83 ino:114875
00022000    3960    3960    3960       -   8K rwx--     [ heap ]
00400000  131072  131072  131072       -   4M rwx--     [ heap ]
FF280000     120     120       -       -   8K r-x--  libc.so.1
FF29E000     136     128       -       -    - r-x--  libc.so.1
FF2C0000      72      72       -       -   8K r-x--  libc.so.1
FF2D2000     192     192       -       -    - r-x--  libc.so.1
FF302000     112     112       -       -   8K r-x--  libc.so.1
FF31E000      48      32       -       -    - r-x--  libc.so.1
FF33A000      24      24      24       -   8K rwx--  libc.so.1
FF340000       8       8       8       -   8K rwx--  libc.so.1
FF390000       8       8       -       -   8K r-x--  libc_psr.so.1
FF3A0000       8       8       -       -   8K r-x--  libdl.so.1
FF3B0000       8       8       8       -   8K rwx--    [ anon ]
FF3C0000     152     152       -       -   8K r-x--  ld.so.1
FF3F6000       8       8       8       -   8K rwx--  ld.so.1
FFBFA000      24      24      24       -   8K rwx--    [ stack ]
-------- ------- ------- ------- -------
total Kb  135968  135944  135112       -
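To see at a glance how much of the address space actually landed on large pages, the Kbytes column can be totaled per Pgsz value. The Python sketch below does this for an excerpt of the pmap -sx output above. The function name is hypothetical, and the parser assumes the column order shown here (address, Kbytes, RSS, Anon, Locked, Pgsz, Mode, mapped file); mappings whose page size is printed as "-" are grouped under that label.

```python
import re
from collections import defaultdict

# Excerpt of the `pmap -sx` output after running under ppgsz -o heap=4M.
SAMPLE = """\
 Address  Kbytes     RSS    Anon  Locked Pgsz Mode    Mapped File
00010000       8       8       -       -   8K r-x--  dev:277,83 ino:114875
00022000    3960    3960    3960       -   8K rwx--     [ heap ]
00400000  131072  131072  131072       -   4M rwx--     [ heap ]
FF29E000     136     128       -       -    - r-x--  libc.so.1
FFBFA000      24      24      24       -   8K rwx--    [ stack ]
-------- ------- ------- ------- -------
total Kb  135968  135944  135112       -
"""

# Mapping rows begin with a hexadecimal address (8 digits for 32-bit
# processes; up to 16 is allowed here for 64-bit ones).
_ADDRESS = re.compile(r"^[0-9A-Fa-f]{8,16}$")


def kbytes_by_page_size(pmap_output):
    """Sum the Kbytes column of each mapping, grouped by its Pgsz value."""
    totals = defaultdict(int)
    for line in pmap_output.splitlines():
        fields = line.split()
        # Skip the header, the dashed separator, and the "total Kb" line.
        if len(fields) < 7 or not _ADDRESS.match(fields[0]):
            continue
        kbytes, pgsz = int(fields[1]), fields[5]
        totals[pgsz] += kbytes
    return dict(totals)


if __name__ == "__main__":
    for pgsz, kbytes in kbytes_by_page_size(SAMPLE).items():
        print(f"{pgsz:>4}: {kbytes} Kbytes")
```

In this excerpt nearly all of the heap is on 4-Mbyte pages; the small 8-Kbyte heap segment is the part below the first 4-Mbyte boundary at 00400000.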


meminfo(2). The meminfo() system call enables a program to inquire about the physical pages backing its address space. This system call provides a programmatic way of determining the page sizes allocated within a process's address space. An array is filled with a description of each page that backs the mapping.

NAME
     meminfo - provide information about memory

SYNOPSIS
     #include <sys/types.h>
     #include <sys/mman.h>

     int meminfo(const uint64_t inaddr[], int addr_count,
         const uint_t info_req[], int info_count,
         uint64_t outdata[], uint_t validity[]);

DESCRIPTION
     The meminfo() function provides information about virtual
     and physical memory particular to the calling process. The
     user or developer of performance utilities can use this
     information to analyze system memory allocations and develop
     a better understanding of the factors affecting application
     performance.


13.2.2. Discovery of Supported Page Sizes

This section describes the pagesize command and the getpagesize() and getpagesizes() functions, which let us determine the page sizes supported by Solaris.

pagesize(1). The pagesize command displays the base page size (default page size) used by the Solaris Operating System on the given microprocessor. The default is currently 8 Kbytes on all UltraSPARC platforms and 4 Kbytes on x86/x64 platforms.

sol8# pagesize
8192


With the -a option, the pagesize command displays all of the page sizes available on the given microprocessor in Solaris. In this example, we can see that four page sizes are available on our UltraSPARC processor.

sol9# pagesize -a
8192
65536
524288
4194304
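The pagesize command prints sizes in bytes, whereas pmap and ppgsz use labels such as 8K and 4M. A small Python helper, offered only as an illustrative sketch with a hypothetical function name, converts between the two so the outputs can be compared directly:

```python
def format_page_size(nbytes):
    """Render a page size in bytes as a label such as '8K' or '4M'."""
    units = (("G", 1024 ** 3), ("M", 1024 ** 2), ("K", 1024))
    for suffix, factor in units:
        # Use the largest unit that divides the size exactly.
        if nbytes >= factor and nbytes % factor == 0:
            return f"{nbytes // factor}{suffix}"
    return str(nbytes)


if __name__ == "__main__":
    # Sizes reported by `pagesize -a` in the example above.
    for size in (8192, 65536, 524288, 4194304):
        print(f"{size:>8} bytes = {format_page_size(size)}")
```

The four sizes from the example correspond to the 8K, 64K, 512K, and 4M labels.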


getpagesize(3C). The getpagesize() function returns the base page size in bytes.
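For scripts rather than C programs, the same base page size is available without calling getpagesize() directly. The sketch below uses Python's standard os.sysconf() and resource.getpagesize() calls as stand-ins for the C function; the wrapper name base_page_size is an illustrative assumption, and the value returned depends on the platform the script runs on.

```python
import os
import resource


def base_page_size():
    """Return the system's base page size in bytes via sysconf."""
    return os.sysconf("SC_PAGESIZE")


if __name__ == "__main__":
    size = base_page_size()
    print(f"base page size: {size} bytes")
    # resource.getpagesize() is a second standard-library route
    # to the same information.
    print(f"resource.getpagesize(): {resource.getpagesize()} bytes")
```

Neither call reports the additional large page sizes; for those, the getpagesizes() function or pagesize -a is needed.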

getpagesizes(3C). The getpagesizes() function reports the available page sizes on the given microprocessor.

NAME
     getpagesizes - get system supported page-sizes

SYNOPSIS
     #include <sys/mman.h>

     int getpagesizes(size_t pagesize[], int nelem);

DESCRIPTION
     The getpagesizes() function returns either the number of
     different page sizes supported by the system or the actual
     sizes themselves. When called with nelem as 0 and pagesize
     as NULL, getpagesizes() returns the number of supported page
     sizes. Otherwise, up to nelem page-sizes are retrieved and
     assigned to successive elements of pagesize[]. The return
     value is the number of page sizes retrieved and set in
     pagesize[].





Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture (2nd Edition)
ISBN: 0131482092
Year: 2004
Pages: 244
