16.7. MPO APIs

Several APIs added to the Solaris OS help developers explore how MPO technology can improve an application's performance.

16.7.1. Informational

It is not always easy to identify potential memory-locality-related problems simply by studying an algorithm in isolation. Furthermore, autoparallelizing compilers can introduce memory locality problems that do not exist in the serial algorithm. The APIs in this section allow an application to determine dynamically how the kernel has assigned its threads to processors and its virtual memory to physical memory. Each API is described at a high level below; full details are in the man pages, starting with the Solaris 9 OS.

getcpuid(3C). This routine returns the ID of the CPU on which the calling thread was running when it made the call. Unless a thread is bound to a CPU, the kernel is free to schedule it on any CPU in the system (subject to lgroup policies). Hence, there is no guarantee that the thread is still running on that CPU by the time it uses the returned value.

lgroup_home(3C). This routine returns the ID of the home lgroup of the calling thread. A thread's home lgroup is much less transitory than its current CPU ID. Once a thread is assigned a home lgroup, that lgroup will not change unless the thread is explicitly bound to a CPU in a different lgroup or all the CPUs in the lgroup are taken offline. This permanence may not hold in future releases of Solaris: threads might eventually migrate from one lgroup to another in response to system utilization and migration policies.
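
The following is a minimal sketch, assuming Solaris 9 or later, that contrasts the transient CPU ID with the more stable home lgroup. Because the prototype of the home-lgroup routine named above is not given here, the sketch instead uses the liblgrp call lgrp_home(3LGRP) (link with -llgrp). The helper names where_now() and home_lgroup() are hypothetical. On systems other than Solaris both helpers compile to stubs that return -1.

```c
/*
 * Sketch (assumes Solaris 9+): current CPU vs. home lgroup.
 * The home lgroup comes from liblgrp's lgrp_home(); link with -llgrp.
 * Elsewhere both helpers are stubs that return -1.
 */
#if defined(__sun)
#include <sys/types.h>
#include <sys/processor.h>   /* getcpuid() */
#include <sys/procset.h>     /* P_LWPID, P_MYID */
#include <sys/lgrp_user.h>   /* lgrp_home() */
#endif

/* CPU the caller was on when it made the call; may already be stale. */
long where_now(void)
{
#if defined(__sun)
    return (long)getcpuid();
#else
    return -1;               /* stub: not available on this system */
#endif
}

/* Home lgroup of the calling LWP; changes far less often than the CPU. */
long home_lgroup(void)
{
#if defined(__sun)
    return (long)lgrp_home(P_LWPID, P_MYID);
#else
    return -1;               /* stub: not available on this system */
#endif
}
```

A thread that calls both helpers repeatedly would typically see where_now() vary between calls while home_lgroup() stays the same, which is the distinction drawn above.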

meminfo(2). The meminfo(2) system call allows us to query the operating system about both virtual and physical memory assigned to the calling process. Given a virtual address in the calling process's address space, this call can return the physical address, the lgroup to which that physical address belongs, and the size of the page. Given a physical address, the call can return the lgroup in which the memory exists.
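
A minimal sketch of a single-address query, assuming the Solaris meminfo(2) prototype and its MEMINFO_VLGRP and MEMINFO_VPAGESIZE request types. meminfo(2) sets bit 0 of each validity word if the address itself was valid and bit i+1 if info_req[i] was answered. A page that has never been touched has no physical memory behind it yet, so touch it before asking. The helper names meminfo_answered() and where_is() are hypothetical; the bit-decoding helper is portable, while the query itself is compiled only on Solaris.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Decode one meminfo(2) validity word: bit 0 means the address was
 * valid, bit (req_index + 1) means info_req[req_index] was answered.
 */
int meminfo_answered(unsigned int validity, int req_index)
{
    return (validity & 1u) && (validity & (1u << (req_index + 1)));
}

#if defined(__sun)
#include <sys/types.h>
#include <sys/mman.h>        /* meminfo(), MEMINFO_* */

/*
 * Look up the lgroup and page size backing one (already touched)
 * virtual address. Returns 0 on success, -1 otherwise.
 */
int where_is(const void *addr, uint64_t *lgrp, uint64_t *pagesize)
{
    uint64_t in = (uint64_t)(uintptr_t)addr;
    uint_t req[2] = { MEMINFO_VLGRP, MEMINFO_VPAGESIZE };
    uint64_t out[2];         /* one address x two requests */
    uint_t valid;

    if (meminfo(&in, 1, req, 2, out, &valid) != 0)
        return -1;
    if (!meminfo_answered(valid, 0) || !meminfo_answered(valid, 1))
        return -1;
    *lgrp = out[0];
    *pagesize = out[1];
    return 0;
}
#endif
```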

This call is useful for diagnostic and verification purposes. Knowing where a range of memory is physically stored can help explain why accesses to that memory take longer than expected. This information can then be used to decide whether, and where, calls to madvise(3C) (see the next section) might allow the kernel to make better decisions about where memory should be allocated. Once calls to madvise(3C) have been added, meminfo(2) can be used to verify that the kernel has changed its behavior as expected.

16.7.1.1. MPO Advice APIs

The goal of MPO technology in the kernel is to deliver good performance on servers with memory locality properties without requiring any changes to applications. However, some applications can perform better by giving the kernel hints that refine its default placement policies.

For example, in an application in which one thread allocates and initializes a large dataset in private memory, all of that memory will likely be located in a single lgroup. If the application then spawns many new threads to access that data, many of those threads are likely to run in other lgroups, so their accesses to the data are remote. Rather than making extensive modifications to such an application, we can use the madvise(3C) API, which provides a relatively easy way to improve its performance. While madvise(3C) is easy to use, its MADV_ACCESS flags carry some overhead. Consequently, in this example the best performance comes from simply having each thread initialize its own data. For some applications, then, optimal performance may require restructuring the application so that each thread initializes and uses a limited portion of the full dataset.
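
The restructuring described above can be sketched with POSIX threads: instead of one thread initializing the whole array, each thread initializes (and therefore first touches) the slice it will later use. This is a minimal sketch, assuming the default first-touch placement policy and that malloc() hands back pages that have not been touched yet, which is often true for large, fresh allocations but is not guaranteed. NTHREADS, struct slice, and the function names are hypothetical.

```c
#include <pthread.h>
#include <stdlib.h>

#define NTHREADS 4           /* hypothetical thread count */

struct slice {
    double *base;            /* start of this thread's portion */
    size_t n;                /* number of elements in the portion */
    double sum;              /* result computed by the thread */
};

static void *init_and_use(void *arg)
{
    struct slice *s = arg;

    /* First touch happens here, in the thread that will use the data. */
    for (size_t i = 0; i < s->n; i++)
        s->base[i] = 1.0;

    /* "Use" phase: the same thread reads back its own slice. */
    s->sum = 0.0;
    for (size_t i = 0; i < s->n; i++)
        s->sum += s->base[i];
    return NULL;
}

/* Returns the total over all slices, or -1.0 on failure. */
double parallel_first_touch(size_t total)
{
    double *data = malloc(total * sizeof *data);  /* not touched yet */
    pthread_t tid[NTHREADS];
    struct slice sl[NTHREADS];
    size_t chunk = total / NTHREADS;
    double sum = 0.0;
    int started = 0;

    if (data == NULL)
        return -1.0;
    for (int t = 0; t < NTHREADS; t++) {
        sl[t].base = data + t * chunk;
        /* The last thread also takes any remainder. */
        sl[t].n = (t == NTHREADS - 1) ? total - t * chunk : chunk;
        if (pthread_create(&tid[t], NULL, init_and_use, &sl[t]) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        sum += sl[t].sum;
    }
    free(data);
    return started == NTHREADS ? sum : -1.0;
}
```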

madvise(3C). This routine allows an application to provide the kernel with hints about how it expects a range of memory to be used. Specifically, it allows an application to indicate whether a range of memory will be used by many threads (MADV_ACCESS_MANY) or by the next thread that touches it (MADV_ACCESS_LWP).

The MADV_ACCESS_MANY hint may be used by an application that creates and initializes a large data structure in private memory and then creates multiple threads that all access that data structure. This behavior is typical of many autoparallelized applications. Because the data structure is created while the application has only a single thread, by default the kernel will attempt to allocate all of it on a single Uniboard. The hint prompts the kernel to allocate the data structure according to a random placement policy, which spreads it across lgroups and offers higher aggregate bandwidth to all of the application's threads.
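
A sketch of the MADV_ACCESS_MANY case, assuming the buffer comes from an anonymous mmap() so that it is page aligned, as madvise() requires. The hint is applied before the single creating thread initializes the data, so that the kernel can take it into account when the pages are first allocated. MADV_ACCESS_MANY is Solaris specific; the #ifdef skips the hint where it is not defined. alloc_shared_table() is a hypothetical name.

```c
#define _DEFAULT_SOURCE 1    /* exposes MAP_ANON on glibc; ignored elsewhere */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Allocate a table that many threads will later read, hint it, then
 * initialize it from the single creating thread.
 * Returns NULL if the mapping fails.
 */
void *alloc_shared_table(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);

    if (p == MAP_FAILED)
        return NULL;
#ifdef MADV_ACCESS_MANY
    /* Advice only: a failure here just leaves default placement. */
    (void)madvise(p, len, MADV_ACCESS_MANY);
#endif
    memset(p, 0, len);       /* first touch by the creating thread */
    return p;
}
```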

The MADV_ACCESS_LWP hint is most useful when an application changes how it expects a range of memory to be used. If, after the kernel receives this hint, the next thread to touch a page in the specified range is in a different lgroup from the memory, the kernel may migrate the page to that thread's lgroup. This can be useful for applications that have multiple phases, each with a distinctly different memory usage pattern. It can also be used by applications that allocate a large ISM segment in order to get large pages but do not intend to share those pages with other threads. Migrating memory can be time consuming, so use MADV_ACCESS_LWP and MADV_ACCESS_MANY with discretion.
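
Because madvise() requires a page-aligned start address, a phase-change hint on an arbitrary range must first be rounded out to page boundaries. This sketch assumes the page size is a power of two; page_span() and advise_next_toucher() are hypothetical names, and the MADV_ACCESS_LWP call is compiled only where that flag is defined.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Round [addr, addr + len) outward to page boundaries.
 * Assumes pagesz is a power of two.
 */
void page_span(uintptr_t addr, size_t len, size_t pagesz,
               uintptr_t *start, size_t *span)
{
    uintptr_t mask = ~(uintptr_t)(pagesz - 1);
    uintptr_t lo = addr & mask;
    uintptr_t hi = (addr + len + pagesz - 1) & mask;

    *start = lo;
    *span = (size_t)(hi - lo);
}

/*
 * Call between phases: ask the kernel to move each page in the range
 * toward the lgroup of the next thread that touches it.
 * Returns madvise()'s result, 0 where the flag does not exist,
 * or -1 if the page size is unavailable.
 */
int advise_next_toucher(void *buf, size_t len)
{
    uintptr_t start;
    size_t span;
    long pg = sysconf(_SC_PAGESIZE);

    if (pg <= 0)
        return -1;
    page_span((uintptr_t)buf, len, (size_t)pg, &start, &span);
#ifdef MADV_ACCESS_LWP
    return madvise((void *)start, span, MADV_ACCESS_LWP);
#else
    return 0;                /* hint not supported here: no-op */
#endif
}
```

Rounding outward means the advice also covers any other data that happens to share the first or last page of the range, which is worth keeping in mind when the hinted objects are small.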

madv.so.1. madv.so.1 is a shared object that interposes on memory allocation calls so that the hints described above can be applied without modifying the application's source code. This approach is less precise than calling madvise() directly, because the advice cannot be applied to specific address ranges, only to whole classes of memory such as the entire heap, ISM or Dynamic ISM (DISM) segments only, or private segments only. It is most useful for rapid prototyping and for tuning applications whose source code is not available.

16.7.1.2. Explicit Lgroup APIs

The lgroup APIs export the lgroup abstraction for applications to use for observability and performance tuning. A new library, called liblgrp, contains the new APIs. Applications can use the APIs to perform the following tasks:

  • Traverse the lgroup hierarchy

  • Discover the contents and characteristics of a given lgroup

  • Affect the thread and memory placement on lgroups

16.7.2. Verifying the Interface Version

The lgrp_version(3LGRP) function must be used to verify the presence of a supported lgroup interface before the lgroup API is used. The lgrp_version() function has the following syntax:

#include <sys/lgrp_user.h>

int lgrp_version(const int version);


The lgrp_version() function takes a version number for the lgroup interface as an argument and returns the lgroup interface version that the system supports. When the current implementation of the lgroup API supports the version number in the version argument, the lgrp_version() function returns that version number. Otherwise, the lgrp_version() function returns LGRP_VER_NONE.

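For example, the following function, called early in an application, exits if the running system does not support the lgroup interface version the application was built against:

#include <stdio.h>
#include <stdlib.h>
#include <sys/lgrp_user.h>

static void check_lgrp_version(void)
{
    if (lgrp_version(LGRP_VER_CURRENT) != LGRP_VER_CURRENT) {
        fprintf(stderr, "Built with unsupported lgroup interface %d\n",
            LGRP_VER_CURRENT);
        exit(1);
    }
}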


16.7.3. Initialization of the Locality Group Interface

Applications must call lgrp_init(3LGRP) before using the APIs that traverse the lgroup hierarchy and discover its contents. The call to lgrp_init() gives the application a consistent snapshot of the lgroup hierarchy. The application developer can specify whether the snapshot contains only the resources available to the calling thread (LGRP_VIEW_CALLER) or the resources available to the operating system in general (LGRP_VIEW_OS). The lgrp_init() function returns a cookie that is used for the following tasks:

  • Navigating the lgroup hierarchy

  • Determining the contents of an lgroup

  • Determining whether the snapshot is current

lgrp_init(). The lgrp_init() function initializes the lgroup interface and takes a snapshot of the lgroup hierarchy.

lgrp_fini(). The lgrp_fini(3LGRP) function ends the use of a given cookie and frees the corresponding lgroup hierarchy snapshot.
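
Putting the version check, lgrp_init(), a traversal, and lgrp_fini() together, the following sketch prints the lgroup hierarchy as seen by the caller. It assumes the Solaris 10 liblgrp interfaces lgrp_root(), lgrp_children(), lgrp_cpus() with LGRP_CONTENT_DIRECT, and lgrp_nlgrps(), and that lgrp_cpus() returns the CPU count when passed a NULL array (link with -llgrp). The limit of 64 children per lgroup and the function names walk() and print_lgroup_tree() are assumptions for illustration; on other systems the function is a stub that returns -1.

```c
#include <stdio.h>

#if defined(__sun)
#include <sys/lgrp_user.h>   /* liblgrp; link with -llgrp */

#define MAX_KIDS 64          /* assumption: enough children per lgroup */

/* Recursively print each lgroup with the CPUs it contains directly. */
static void walk(lgrp_cookie_t c, lgrp_id_t id, int depth)
{
    lgrp_id_t kids[MAX_KIDS];
    int ncpus = lgrp_cpus(c, id, NULL, 0, LGRP_CONTENT_DIRECT);
    int nkids = lgrp_children(c, id, kids, MAX_KIDS);

    printf("%*slgroup %d: %d CPU(s)\n", depth * 2, "", (int)id, ncpus);
    for (int i = 0; i < nkids && i < MAX_KIDS; i++)
        walk(c, kids[i], depth + 1);
}
#endif

/* Returns the number of lgroups in the snapshot, or -1 on error/stub. */
int print_lgroup_tree(void)
{
#if defined(__sun)
    lgrp_cookie_t c;
    int n;

    if (lgrp_version(LGRP_VER_CURRENT) != LGRP_VER_CURRENT)
        return -1;
    c = lgrp_init(LGRP_VIEW_CALLER);   /* snapshot of caller's view */
    if (c == LGRP_COOKIE_NONE)
        return -1;
    walk(c, lgrp_root(c), 0);
    n = lgrp_nlgrps(c);
    (void)lgrp_fini(c);                /* release the snapshot */
    return n;
#else
    return -1;
#endif
}
```

Passing LGRP_VIEW_OS instead of LGRP_VIEW_CALLER would show the resources available to the operating system rather than only those available to the calling thread.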




Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture (2nd Edition)
ISBN: 0131482092
Year: 2004
Pages: 244
