

2.5. Kernel Process Table

Every process occupies a slot in the kernel process table, which maintains a process structure (commonly abbreviated as proc structure) for the process. The process structure is relatively large, and contains all the information the kernel needs to manage the process and schedule the LWPs and kthreads for execution. As processes are created, kernel memory space for the process table is allocated dynamically by the kmem cache allocation and management routines.

The kernel process objects are allocated from object-specific kernel memory (kmem) caches. A process_cache, thread_cache, and lwp_cache are created and initialized at boot time, and kernel memory for processes, threads, and LWPs is managed through each object's respective kmem cache. Statistics on these caches can be observed with the mdb(1) kmem_cache and kmastat dcmds, as well as the kstat(1) command.

# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace ufs sd ip sctp usba fctl nca
nfs random sppp lofs crypto ptm ipc logindmux ]
> ::kmastat
cache                        buf    buf    buf    memory      alloc alloc
name                        size in use  total    in use    succeed  fail
------------------------- ------ ------ ------ --------- --------- -----
kmem_magazine_1               16   7982   8064    131072      7982      0
kmem_magazine_3               32   6790   6804    221184      8809      0
. . .
thread_cache                 848    180    198    180224     12923      0
lwp_cache                   1408    180    192    294912       919      0
. . .
process_cache               3120     50     63    200704      1509      0
. . .


The most commonly requested information for memory statistics is memory used or consumed, which can be determined for each cache from the memory in use column in the example above (the value is in bytes).

The kstats for each cache are observed with the kstat(1) command:

# kstat -n process_cache
module: unix                            instance: 0
name:   process_cache                   class:    kmem_cache
        align                           8
        alloc                           1515
        alloc_fail                      0
        buf_avail                       22
        buf_constructed                 14
        buf_inuse                       50
        buf_max                         72
        buf_size                        3120
        buf_total                       72
        chunk_size                      3120
        crtime                          246.452541137
        depot_alloc                     46
        depot_contention                0
        depot_free                      53
        empty_magazines                 3
        free                            1472
        full_magazines                  0
        hash_lookup_depth               0
        hash_rescale                    0
        hash_size                       64
        magazine_size                   3
        slab_alloc                      64
        slab_create                     8
        slab_destroy                    0
        slab_free                       0
        slab_size                       28672
        snaptime                        284376.59969931
        vmem_source                     23


The kstats maintained reflect the objects managed by the kmem allocator. See Section 11.2 for a description of the buf, depot, magazine, and slab objects that constitute a kmem cache. The same set of statistics is maintained for the thread_cache and lwp_cache. Actually, statistics are maintained for all kernel object kmem caches (try kstat -c kmem_cache on your Solaris 10 systems).
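The same counters can also be read programmatically with the libkstat(3LIB) interfaces. The following user-level sketch is an illustration rather than anything from the kernel sources; it looks up the unix:0:process_cache kstat and prints the buf_inuse and buf_total values (compile with -lkstat).

/*
 * Sketch: reading the process_cache kmem_cache kstat with libkstat(3LIB).
 * Compile with: cc -o pcache pcache.c -lkstat
 */
#include <stdio.h>
#include <kstat.h>

int
main(void)
{
        kstat_ctl_t *kc;
        kstat_t *ksp;
        kstat_named_t *inuse, *total;

        if ((kc = kstat_open()) == NULL) {
                perror("kstat_open");
                return (1);
        }
        ksp = kstat_lookup(kc, "unix", 0, "process_cache");
        if (ksp == NULL || kstat_read(kc, ksp, NULL) == -1) {
                perror("process_cache kstat");
                return (1);
        }
        inuse = kstat_data_lookup(ksp, "buf_inuse");
        total = kstat_data_lookup(ksp, "buf_total");
        if (inuse != NULL && total != NULL)
                (void) printf("proc structures in use: %llu of %llu\n",
                    (unsigned long long)inuse->value.ui64,
                    (unsigned long long)total->value.ui64);
        (void) kstat_close(kc);
        return (0);
}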

The fast, scalable kmem cache mechanism is a perfect fit for the kernel process objects. It quickly allocates and frees kernel memory as processes and threads are created and destroyed on a running system, and it reuses constructed (already initialized) object structures for fast instantiation when a new process, thread, or LWP is created.
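As a rough illustration of that mechanism, the fragment below sketches how an object cache is created with kmem_cache_create(9F) and how objects are then allocated and freed from it. The function names prefixed with example_ are hypothetical, and the NULL constructor/destructor arguments are a simplification of what the kernel actually registers for process_cache.

/*
 * Simplified sketch: creating an object cache and allocating from it.
 * The example_ function names are hypothetical; the real kernel wires
 * in its own constructor and destructor routines.
 */
#include <sys/kmem.h>
#include <sys/proc.h>

static kmem_cache_t *process_cache;

void
example_process_cache_init(void)
{
        process_cache = kmem_cache_create(
            "process_cache",            /* cache name, as seen in ::kmastat */
            sizeof (proc_t),            /* object (buf) size */
            0,                          /* default alignment */
            NULL, NULL, NULL,           /* constructor, destructor, reclaim */
            NULL, NULL, 0);             /* private, vmem source, flags */
}

static proc_t *
example_proc_alloc(void)
{
        /* hand back a ready-to-use proc structure from the cache */
        return (kmem_cache_alloc(process_cache, KM_SLEEP));
}

static void
example_proc_free(proc_t *p)
{
        /* return the object to the cache for reuse */
        kmem_cache_free(process_cache, p);
}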

2.5.1. Process Limits

At system boot time, the kernel initializes the process_cache to begin the allocation of kernel memory for storing the process table. Initially, space is allocated for one proc structure. The table itself is implemented as a doubly linked list, such that each proc structure contains pointers to the next and previous processes on the list.
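In the kernel source, the head of this list is the practive pointer, and the links are the p_next and p_prev fields of the proc structure. The fragment below is a simplified sketch of walking the active process list; the function name is hypothetical, and the list is traversed under pidlock as shown.

/*
 * Simplified sketch: walking the doubly linked list of active processes.
 * practive is the list head; p_next and p_prev are the forward and
 * backward links in each proc structure. The list is protected by pidlock.
 */
#include <sys/proc.h>

extern proc_t *practive;

static int
example_count_procs(void)
{
        proc_t *p;
        int count = 0;

        mutex_enter(&pidlock);
        for (p = practive; p != NULL; p = p->p_next)
                count++;
        mutex_exit(&pidlock);

        return (count);
}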

The maximum size of the process table is based on the amount of physical memory (RAM) in the system and is established at boot time. The system first sets an internal variable called maxusers (which has absolutely nothing to do with the maximum number of users the system will support), using the following code.

#define MIN_DEFAULT_MAXUSERS    8u
#define MAX_DEFAULT_MAXUSERS    2048u
#define MAX_MAXUSERS            4096u

        if (maxusers == 0) {
                pgcnt_t physmegs = physmem >> (20 - PAGESHIFT);
                pgcnt_t virtmegs = vmem_size(heap_arena, VMEM_FREE) >> 20;
                maxusers = MIN(MAX(MIN(physmegs, virtmegs),
                    MIN_DEFAULT_MAXUSERS), MAX_DEFAULT_MAXUSERS);
        }

        if (maxusers > MAX_MAXUSERS) {
                maxusers = MAX_MAXUSERS;
                cmn_err(CE_NOTE, "maxusers limited to %d", MAX_MAXUSERS);
        }
                                        See usr/src/uts/common/conf/param.c


The net effect of the code above is that maxusers is set according to memory size, with a ceiling value of MAX_MAXUSERS (4096). maxusers is subsequently used to set the kernel variables max_nprocs and maxuprc.

        /*
         * This allows platform-dependent code to constrain the maximum
         * number of processes allowed in case there are, e.g., VM limitations
         * with how many contexts are available.
         */
        if (max_nprocs == 0)
                max_nprocs = (10 + 16 * maxusers);
        if (platform_max_nprocs > 0 && max_nprocs > platform_max_nprocs)
                max_nprocs = platform_max_nprocs;
        if (max_nprocs > maxpid)
                max_nprocs = maxpid;

        if (maxuprc == 0)
                maxuprc = (max_nprocs - reserved_procs);
                                        See usr/src/uts/common/conf/param.c
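To make the arithmetic concrete, the short user-level sketch below plugs in the values from the example system used later in this section (maxusers of 2048, maxpid of 30000); the program is purely illustrative.

/*
 * Illustrative user-level sketch of the default sizing arithmetic,
 * assuming maxusers = 2048 and maxpid = 30000 (the values on the
 * example system shown later in this section).
 */
#include <stdio.h>

int
main(void)
{
        int maxusers = 2048;
        int maxpid = 30000;
        int reserved_procs = 5;
        int max_nprocs, maxuprc;

        max_nprocs = 10 + 16 * maxusers;        /* 32778 */
        if (max_nprocs > maxpid)
                max_nprocs = maxpid;            /* clamped to 30000 */
        maxuprc = max_nprocs - reserved_procs;  /* 29995 */

        (void) printf("max_nprocs = %d, maxuprc = %d\n", max_nprocs, maxuprc);
        return (0);
}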


The max_nprocs value is the maximum number of processes systemwide, and maxuprc determines the maximum number of processes a non-root user can have occupying a process table slot at any time. The system stores these values in the var structure, a data structure that holds generic system configuration information. There are three related values:

  • v_proc. Set equal to max_nprocs.

  • v_maxupttl. The maximum number of process slots that can be used by all non-root users on the system. It is set to max_nprocs minus some number of reserved process slots (currently reserved_procs is 5).

  • v_maxup. The maximum number of process slots a non-root user can occupy. It is set to the maxuprc value. Note that v_maxup (an individual non-root user) and v_maxupttl (total of all non-root users on the system) end up being set to the same value, which is max_nprocs minus 5, as the sketch following this list illustrates.
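The relationships among these fields can be summarized with the following fragment, a simplified sketch of the param.c-style initialization (not the verbatim kernel source).

/*
 * Simplified sketch (not verbatim kernel source) of how the tunables
 * are copied into the system var structure.
 */
v.v_proc = max_nprocs;                          /* systemwide process limit */
v.v_maxupttl = max_nprocs - reserved_procs;     /* all non-root users combined */
v.v_maxup = MIN(maxuprc, v.v_maxupttl);         /* per non-root user */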

You can use mdb(1) to examine the values of maxusers, max_nprocs, and maxuprc on a running system.

# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace ufs sd ip sctp usba fctl nca
nfs random sppp lofs crypto ptm ipc logindmux ]
> max_nprocs/D
max_nprocs:
max_nprocs:     30000
> maxuprc/D
maxuprc:
maxuprc:        29995
> maxusers/D
maxusers:
maxusers:       2048
>


You can also use mdb(1) to examine the system var structure.

> v::print "struct var"
{
    v_buf = 0x64
    v_call = 0
    v_proc = 0x7530
    v_maxupttl = 0x752b
    v_nglobpris = 0xaa
    v_maxsyspri = 0x63
    v_clist = 0
    v_maxup = 0x752b
    v_hbuf = 0x1000
    v_hmask = 0xfff
    v_pbuf = 0
    v_sptmap = 0
    v_maxpmem = 0
    v_autoup = 0x1e
    v_bufhwm = 0x14350
}
> 0x7530=d
                30000
>


Note that the values are displayed in base 16 (hex). You can convert to decimal right in mdb(1), as shown at the bottom of the example.

Finally, sar(1M) with the -v flag gives you the maximum process table size and the current number of processes on the system.

$ sar -v 1 1

SunOS pae1 5.10 Generic sun4u    02/24/2006

20:09:52  proc-sz    ov  inod-sz    ov  file-sz    ov   lock-sz
20:09:53  118/30000    0 21719/129797    0  556/556     0    0/0


Under the proc-sz column, the 118/30000 values represent the current number of processes (118) and the maximum number of processes (30,000).

The kernel does impose a maximum value in case max_nprocs is set in /etc/system to something beyond what is reasonable, even for a large system. The maximum is 30,000, which is determined by the MAXPID macro in the param.h header file (available in /usr/include/sys).

In the kernel fork code, the current number of processes is checked against the v_proc parameter. If the limit is reached, the system produces an "out of processes" message on the console and increments the proc table overflow counter maintained in the cpu_sysinfo structure. This value is reflected in the ov column to the right of proc-sz in the sar(1M) output. For non-root users, a check is made against the v_maxup parameter, and an "out of per-user processes for uid (UID)" message is logged. In both cases, the calling program gets a -1 return value from fork(2), signifying an error.
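Conceptually, the checks look like the following fragment, which is a simplified sketch rather than the verbatim fork code; names such as is_root, upcount, and uid stand in for the per-user accounting the kernel actually performs.

/*
 * Simplified sketch of the process-limit checks made in the fork path;
 * illustrative only, not the verbatim kernel source. is_root, upcount,
 * and uid stand in for the kernel's per-user process accounting.
 */
if (nprocs >= v.v_proc) {
        cmn_err(CE_WARN, "out of processes");   /* console message */
        CPU_STATS_ADDQ(CPU, sys, procovf, 1);   /* the sar -v 'ov' counter */
        return (-1);                            /* fork(2) fails in the caller */
}
if (!is_root && upcount >= v.v_maxup) {
        cmn_err(CE_WARN, "out of per-user processes for uid %d", (int)uid);
        return (-1);
}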

The kernel maintains the /var/adm/utmp and /var/adm/wtmp files for the storage of user information used by the who(1), write(1), and login(1) commands (the accounting software and commands use utmp and wtmp as well). The PID data is maintained in a signed short data type, which can hold values only up to 32,767.

2.5.2. Thread Limits

Now that we've examined the limits the kernel imposes on the number of processes systemwide, let's look at the limits on the maximum number of LWP/kthread pairs that can exist in the system at any one time.

Each LWP has a kernel stack, allocated out of the segkp kernel address space segment. The size of the kernel segkp segment and the space allocated for LWP kernel stacks can vary according to the hardware platform. The stack itself has a default size of 24 Kbytes, and the default segkp size on both UltraSPARC and x64 platforms is 2 Gbytes. Thus, there is space for roughly (2 GB ÷ 24 KB) 88,000 LWP stacks. This is a theoretical limit; other constraining factors, such as available physical memory, may well come into play before we reach 88,000 LWPs. Also, the segkp segment is used for other pageable components of the LWP, not just the stack. Even though segkp is a pageable kernel segment, the performance of a system actively paging LWP stacks in and out would likely be unacceptable.

You can determine the size of your system's segkp segment by using kstat(1).

sol10$ kstat -n segkp
module: vmem                            instance: 34
name:   segkp                           class:    vmem
        alloc                           586432
        contains                        0
        contains_search                 0
        crtime                          144.618836467
        fail                            0
        free                            586231
        lookup                          170
        mem_import                      0
        mem_inuse                       26345472
        mem_total                       2147483648
. . .


The mem_total field indicates 2 Gbytes for segkp on this system (26 Mbytes are actually being used, per the mem_inuse field).

The maximum number of user threads is constrained by the process's address space size for 32-bit binaries. Each user thread has a user stack, and the default stack size is 1 Mbyte for a 32-bit process. Since a 32-bit process has a maximum address space of 4 Gbytes (this varies slightly for different platforms), the maximum number of threads would equate to roughly (4GB ÷ 1MB) or 4,000 threads. In practice, the number is less since a process's address space is consumed by other segments (text, heap, etc.). For 64-bit processes, the default thread stack size is 2 Mbytes. The address space of a 64-bit process is large enough that limits imposed by available address space for thread stacks are virtually nonexistent. A 64-bit process tends to be constrained by other resource issues (available physical memory, LWP limits, etc.).
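Since the default stack size is what bounds the thread count in practice, applications that need a very large number of threads typically request a smaller stack per thread. The following user-level sketch uses the standard pthread_attr_setstacksize(3C) interface; the 64-Kbyte figure is only an example, and the chosen size must be at least PTHREAD_STACK_MIN and large enough for the thread's real needs.

/*
 * Sketch: creating a thread with an explicit (smaller) stack size so a
 * 32-bit process can hold more threads than the roughly 4,000 that the
 * default 1-Mbyte stacks would allow. The 64-Kbyte figure is only an
 * example; size the stack for the thread's actual requirements.
 */
#include <pthread.h>
#include <limits.h>
#include <stdio.h>

static void *
worker(void *arg)
{
        /* ... thread work ... */
        return (arg);
}

int
main(void)
{
        pthread_attr_t attr;
        pthread_t tid;
        size_t stacksize = 64 * 1024;

        if (stacksize < PTHREAD_STACK_MIN)
                stacksize = PTHREAD_STACK_MIN;

        (void) pthread_attr_init(&attr);
        (void) pthread_attr_setstacksize(&attr, stacksize);

        if (pthread_create(&tid, &attr, worker, NULL) != 0) {
                perror("pthread_create");
                return (1);
        }
        (void) pthread_join(tid, NULL);
        (void) pthread_attr_destroy(&attr);
        return (0);
}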



