
7.7. Kernel Interfaces for Resource Controls

The rctl subsystem provides a mechanism for kernel components to register their individual resource controls with the system as a whole, such that those controls can subscribe to specific actions while being associated with the various process-model entities provided by the kernel: process, task, project, and zone. (In principle, only minor modifications would be required to connect the resource control functionality to non-process-model entities associated with the system.)

Subsystems register their rctls with rctl_register(). A subsystem that also wishes to impose additional limits on a given rctl can do so once it holds the rctl handle. Each subsystem should store the handle to its rctl for direct access.

A primary dictionary, rctl_dict, is a hash that maps each rctl ID to the default control definition for the corresponding controlled resource-entity pair on the system. The ::rctl_dict mdb dcmd walks this dictionary.

> ::rctl_dict
ID NAME                             ADDR        TYPE GLOBAL_FLAGS
12 process.max-port-events          d9911c80 process 0x20100000
11 process.max-msg-messages         d9911cb8 process 0x20100000
10 process.max-msg-qbytes           d9911cf0 process 0x20400000
 9 process.max-sem-ops              d9911d28 process 0x20100000
 8 process.max-sem-nsems            d9911d60 process 0x20100000
 7 process.max-address-space        d9911d98 process 0x62400000
 6 process.max-file-descriptor      d9911dd0 process 0x60100000
 5 process.max-core-size            d9911e08 process 0x62400000
 4 process.max-stack-size           d9911e40 process 0x62400000
 3 process.max-data-size            d9911e78 process 0x62400000
 2 process.max-file-size            d9911eb0 process 0x68400000
 1 process.max-cpu-time             d9911ee8 process 0x55200000
27 task.max-cpu-time                d9911938    task 0x15a00000
26 task.max-lwps                    d9911970    task 0x00100000
23 project.max-contracts            d9911a18 project 0x20100000
22 project.max-device-locked-memory d9911a50 project 0xa0400000
21 project.max-port-ids             d9911a88 project 0xa0100000
20 project.max-shm-memory           d9911ac0 project 0xa0400000
19 project.max-shm-ids              d9911af8 project 0xa0100000
18 project.max-msg-ids              d9911b30 project 0xa0100000
17 project.max-sem-ids              d9911b68 project 0xa0100000
16 project.max-crypto-memory        d9911ba0 project 0xa0400000
15 project.max-tasks                d9911bd8 project 0x80100000
14 project.max-lwps                 d9911c10 project 0x80100000
13 project.cpu-shares               d9911c48 project 0x92100000
25 zone.max-lwps                    d99119a8    zone 0x80100000
24 zone.cpu-shares                  d99119e0    zone 0x92100000


A secondary dictionary, rctl_dict_by_name, is a hash that maps each rctl name to its resource control handle. Handles are allocated from the rctl_ids ID space. The handles are private and are not to be advertised to userland; all userland interactions are through the rctl names.
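To make the two lookups concrete, here is a minimal user-space sketch. It assumes nothing about the kernel's real hash functions or locking, and every m_* name is hypothetical. IDs are handed out sequentially to model the rctl_ids ID space. One table maps ID to entry (the role of rctl_dict), and a chained hash maps name to handle (the role of rctl_dict_by_name). Unlike rctl_register(), which panics on a duplicate name, the sketch returns -1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model of rctl_dict and rctl_dict_by_name. */
#define M_MAX_RCTLS 32
#define M_NBUCKETS  16

typedef int m_hndl_t;                 /* stands in for rctl_hndl_t */

typedef struct m_dict_entry {
	const char *name;                 /* e.g. "project.max-shm-memory" */
	m_hndl_t id;                      /* handle allocated at registration */
	struct m_dict_entry *name_next;   /* chain within a by-name bucket */
} m_dict_entry_t;

static m_dict_entry_t m_entries[M_MAX_RCTLS + 1];  /* storage, indexed by ID */
static m_dict_entry_t *m_by_id[M_MAX_RCTLS + 1];   /* ID -> entry */
static m_dict_entry_t *m_by_name[M_NBUCKETS];      /* name hash -> chain */
static m_hndl_t m_next_id = 1;                     /* models the ID space */

static unsigned m_hash_name(const char *s)
{
	unsigned h = 5381;

	while (*s != '\0')
		h = h * 33 + (unsigned char)*s++;
	return h % M_NBUCKETS;
}

/* Name -> handle; userland only ever names controls, never handles. */
static m_hndl_t m_lookup_by_name(const char *name)
{
	for (m_dict_entry_t *e = m_by_name[m_hash_name(name)]; e != NULL;
	    e = e->name_next)
		if (strcmp(e->name, name) == 0)
			return e->id;
	return -1;
}

/* ID -> entry, the direction a kernel handle is resolved in. */
static m_dict_entry_t *m_lookup_by_id(m_hndl_t id)
{
	return (id > 0 && id < m_next_id) ? m_by_id[id] : NULL;
}

/* Register a name; returns the new handle, or -1 if duplicate or full. */
static m_hndl_t m_register(const char *name)
{
	unsigned b = m_hash_name(name);
	m_dict_entry_t *e;

	if (m_lookup_by_name(name) != -1 || m_next_id > M_MAX_RCTLS)
		return -1;
	e = &m_entries[m_next_id];
	e->name = name;
	e->id = m_next_id;
	e->name_next = m_by_name[b];
	m_by_name[b] = e;
	m_by_id[m_next_id] = e;
	return m_next_id++;
}
```

Registering "project.max-shm-memory" and then "zone.cpu-shares" yields handles 1 and 2. A later lookup of either name returns the same handle.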

Entities inherit their rctls from their predecessor. Since projects have no ancestor, they inherit their rctls from the rctl dictionary for project rctls. It is expected that project controls will be set to their appropriate values shortly after project creation, presumably from a policy source such as the project database.
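The inheritance rule can be sketched as a copy-on-create step. This is a hypothetical user-space model, not the kernel's rctl_set code. Each m_set_t holds one enforced value per control, and the default values are made up. A new project starts from the dictionary defaults, and policy then overrides them. Every other entity copies its predecessor's set when it is created, so later changes to the predecessor do not reach existing children. That eager-copy behavior is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: one enforced value per registered control. */
#define M_NRCTLS 3

typedef struct m_set {
	uint64_t value[M_NRCTLS];
} m_set_t;

/* Made-up dictionary defaults for project-scoped controls. */
static const m_set_t m_project_defaults = { { 128, 256, 512 } };

/* A project has no ancestor, so it starts from the dictionary defaults. */
static void m_project_create(m_set_t *proj)
{
	*proj = m_project_defaults;
}

/* Policy (e.g. the project database) overrides a value after creation. */
static void m_set_value(m_set_t *s, int hndl, uint64_t v)
{
	assert(hndl >= 0 && hndl < M_NRCTLS);
	s->value[hndl] = v;
}

/* Any other entity copies its predecessor's set when it is created. */
static void m_inherit(m_set_t *child, const m_set_t *parent)
{
	*child = *parent;
}
```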

7.7.1. Data Structures

The rctl_set_t attached to each of the process-model entities is a simple hash table keyed on the rctl handle assigned at registration. The entries in the hash table are rctl_t structures. Figure 7.3 illustrates their relationship with the active control values on that resource and with the global state of the resource.

Figure 7.3. rctl Entry Relationships


That is, the rctl contains a back pointer to the global resource control state for this resource, which is also available in the rctl_dict hash table mentioned earlier. The rctl also contains two pointers to resource control values. values points to the entire sequence of control values. cursor points to the currently active control value, that is, the next value to be enforced. The value list itself is an open, doubly linked list. Its last non-NULL member is the system value for that resource, the theoretical or conventional maximum allowable value for the resource on this OS instance. The ::rctl_list dcmd can be used to walk a process's rctl list.
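The relationship among the rctl, its value list, and the cursor can be modeled in user space. This is a hypothetical sketch: m_rctl_t, m_val_t, and the helper names are invented. Values are kept in ascending order in an open doubly linked list, so the system value, being the maximum, ends up last. The enforced value is simply the value under the cursor. Resetting the cursor to the head on every insert is a simplification; the sketch does not model how the kernel moves the cursor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an rctl and its value list; names are illustrative. */
typedef enum { M_BASIC, M_PRIVILEGED, M_SYSTEM } m_priv_t;

typedef struct m_val {
	uint64_t value;
	m_priv_t priv;
	struct m_val *prev;
	struct m_val *next;          /* open list: head->prev, tail->next NULL */
} m_val_t;

typedef struct m_rctl {
	const char *dict_name;       /* stands in for the dictionary back pointer */
	m_val_t *values;             /* entire sequence, ascending by value */
	m_val_t *cursor;             /* currently active value: next to enforce */
} m_rctl_t;

/* Insert in ascending order; ties go before existing entries, so the
 * system value (the maximum) stays last. */
static void m_val_insert(m_rctl_t *r, m_val_t *v)
{
	m_val_t *prev = NULL;
	m_val_t *cur = r->values;

	while (cur != NULL && cur->value < v->value) {
		prev = cur;
		cur = cur->next;
	}
	v->prev = prev;
	v->next = cur;
	if (cur != NULL)
		cur->prev = v;
	if (prev != NULL)
		prev->next = v;
	else
		r->values = v;
	/* Simplification: the cursor restarts at the lowest value. */
	r->cursor = r->values;
}

/* The enforced value is whatever the cursor points at. */
static uint64_t m_enforced_value(const m_rctl_t *r)
{
	return r->cursor->value;
}
```

Inserting the system value 0x7fff and then the privileged value 0x200 for process.max-sem-nsems reproduces the ordering in the ::rctl_list output. The cursor sits on 0x200.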

> d847f420::rctl_list
d862f3f8          8 : process.max-sem-nsems
        (cur) 0x200               privileged     flags=<DENY>
              0x7fff                  system     flags=<DENY,MAX>
d862f3e0          1 : process.max-cpu-time
        (cur) 0xffffffffffffffff  privileged     flags=<SIGNAL,MAX>
              0xffffffffffffffff      system     flags=<MAX>
d862f3c8          9 : process.max-sem-ops
        (cur) 0x200               privileged     flags=<DENY>
              0x7fffffff              system     flags=<DENY,MAX>


7.7.2. Operations Vector

The ops vector contains methods that perform the action, get-usage, set, and test operations.

/*
 * Default resource control callback functions.
 */
typedef struct rctl_ops {
        /* take the action associated with an exceeded value */
        void            (*rco_action)(struct rctl *, struct proc *,
            rctl_entity_p_t *);
        /* report the current usage of the resource */
        rctl_qty_t      (*rco_get_usage)(struct rctl *, struct proc *);
        /* notify the subsystem of a new enforced value */
        int             (*rco_set)(struct rctl *, struct proc *,
            rctl_entity_p_t *, rctl_qty_t);
        /* nonzero if adding the increment would exceed the given value */
        int             (*rco_test)(struct rctl *, struct proc *,
            rctl_entity_p_t *, rctl_val_t *, rctl_qty_t, uint_t);
} rctl_ops_t;

#define RCTLOP_ACTION(r, p, e) (r->rc_dict_entry->rcd_ops->rco_action(r, p, e))
#define RCTLOP_GET_USAGE(r, p) (r->rc_dict_entry->rcd_ops->rco_get_usage(r, p))
#define RCTLOP_SET(r, p, e, v) (r->rc_dict_entry->rcd_ops->rco_set(r, p, e, v))
#define RCTLOP_TEST(r, p, e, v, i, f) \
        (r->rc_dict_entry->rcd_ops->rco_test(r, p, e, v, i, f))
                                                        See sys/rctl.h
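The RCTLOP_*() macros dispatch through the ops vector that hangs off the rctl's dictionary entry. The following hypothetical user-space sketch reproduces that indirection with a cut-down ops vector of three members rather than four; all m_* names are invented. It includes no-op defaults, in the spirit of rcop_no_action and rcop_no_usage, for a subsystem that supplies only a test callback.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down model of rctl_ops_t dispatch. */
typedef uint64_t m_qty_t;

struct m_rctl;

typedef struct m_ops {
	void    (*action)(struct m_rctl *);
	m_qty_t (*get_usage)(struct m_rctl *);
	int     (*test)(struct m_rctl *, m_qty_t limit, m_qty_t inc);
} m_ops_t;

typedef struct m_dict_entry {
	const m_ops_t *ops;             /* one ops vector per registered control */
} m_dict_entry_t;

typedef struct m_rctl {
	m_dict_entry_t *dict_entry;     /* plays the role of rc_dict_entry */
	m_qty_t usage;
	int actions_taken;
} m_rctl_t;

/* Dispatch through the dictionary entry, as the RCTLOP_*() macros do. */
#define M_OP_ACTION(r)         ((r)->dict_entry->ops->action(r))
#define M_OP_GET_USAGE(r)      ((r)->dict_entry->ops->get_usage(r))
#define M_OP_TEST(r, lim, inc) ((r)->dict_entry->ops->test((r), (lim), (inc)))

/* No-op defaults, in the spirit of rcop_no_action and rcop_no_usage. */
static void m_no_action(struct m_rctl *r) { (void)r; }
static m_qty_t m_no_usage(struct m_rctl *r) { (void)r; return 0; }

/* Subsystem callbacks: exceeded if usage + inc would pass the limit. */
static int m_counter_test(struct m_rctl *r, m_qty_t limit, m_qty_t inc)
{
	return r->usage + inc > limit;
}
static m_qty_t m_counter_usage(struct m_rctl *r) { return r->usage; }
static void m_counter_action(struct m_rctl *r) { r->actions_taken++; }

/* A fully populated vector, and one that supplies only a test callback. */
static const m_ops_t m_counter_ops =
    { m_counter_action, m_counter_usage, m_counter_test };
static const m_ops_t m_test_only_ops =
    { m_no_action, m_no_usage, m_counter_test };
```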


Subsystems publishing rctls need not provide instances of all of the functions specified by the ops vector. In particular, a subsystem can omit the callbacks that correspond to general rctl_*() entry points it never calls. Default callbacks such as rcop_no_action, rcop_no_usage, and rcop_no_set fill the unused slots. The callbacks align as follows.

rco_set()

You may wish to provide a set callback in two cases: when locking constraints prevent the subsystem from querying the resource control at the point where the value is needed, or when requesting the enforced value from the resource control is prohibitively expensive. For instance, the currently enforced file size limit is cached on the process in p_fsz_ctl to maintain read()/write() performance.

rco_test()

int rco_test(struct rctl *, struct proc *, rctl_entity_p_t *, rctl_val_t *, rctl_qty_t, uint_t);

You must provide a test callback if you are using the rctl_test() interface.

rco_action()

An action callback is optional; you may wish to provide one.
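The rco_set() rationale can be illustrated with a hypothetical sketch modeled on the p_fsz_ctl case. The m_proc_t type, its fsz_ctl field, and both functions are invented for illustration. The set callback copies each newly enforced value into the process. The hot path then reads a plain field instead of asking the resource control for its enforced value.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t m_qty_t;

/* Hypothetical process with a cached copy of its enforced file-size limit. */
typedef struct m_proc {
	m_qty_t fsz_ctl;              /* in the spirit of p_fsz_ctl */
} m_proc_t;

/*
 * Hypothetical set callback: whenever the enforced value changes, push it
 * into the process so that hot paths never consult the rctl itself.
 */
static int m_fsz_set(m_proc_t *p, m_qty_t new_value)
{
	p->fsz_ctl = new_value;
	return 0;
}

/* Hot-path check: one field read, written to avoid offset + len overflow. */
static int m_write_allowed(const m_proc_t *p, m_qty_t offset, m_qty_t len)
{
	return len <= p->fsz_ctl && offset <= p->fsz_ctl - len;
}
```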


7.7.3. Interface Overview

The commonly used resource control interfaces are listed in Table 7.8.

Table 7.8. Commonly Used rctl Kernel Interfaces

Interface

Description

rctl_register()

Registers a new rctl with the subsystem. New resource controls can be added to a running instance by loaded modules via registration. (The current implementation does not support unloadable modules; this functionality can be added if needed, through an activation/deactivation interface involving the manipulation of the ops vector for the resource control(s) needing to support unloading.)

rctl_test()

Increment the resource associated with the given handle; return an error if a limit is exceeded.

rctl_add_default_limit()

Add a new default limit for the resource control.

rctl_enforced_value()

Get the enforced value of the resource control.

rctl_action()

Ask the rctl subsystem to enforce the registered action for the rctl. Typically used for unmanaged resources where rctl_test() isn't used, such as the maximum file size.

rctl_add_legacy_limit()

Add a limit tied to a named legacy /etc/system tuneable (used for System V IPC tuneables, etc.).


7.7.4. Interface Definitions

rctl_hndl_t rctl_register(const char *, rctl_entity_t, int, rlim64_t,
                          rlim64_t, rctl_ops_t *)

Overview
rctl_register() performs a lookup in the dictionary of rctls active on the system. If an rctl of that name is absent, an entry is made in the dictionary. The rctl is returned with its reference count incremented by one. If the rctl name already exists, we panic. (Were the resource control system to support dynamic loading and unloading, which it is structured for, duplicate registration should lead to load failure instead of panicking.)

Each registered rctl must have an RCPRIV_SYSTEM limit defined. This limit contains the highest possible value for this quantity on the system. Furthermore, the registered control must provide infinite values for all applicable address space models supported by the operating system. Attempts to set resource control values beyond the system limit will fail.

Return values
The rctl's ID.

Caller's context
Caller must be in a context suitable for KM_SLEEP allocations.

int rctl_test(rctl_hndl_t, rctl_set_t *, struct proc *, rctl_qty_t, uint_t)

Overview
Increment the resource associated with the given handle. Return zero if the incremented value does not exceed the threshold for the current limit on the resource.

Return values
Actions taken, according to the rctl_test bitmask.

Caller's context
p_lock held by caller.

void rctl_add_default_limit(const char *name, rctl_qty_t value,
    rctl_priv_t privilege, uint_t action)

Overview
Create a default limit with the specified value, privilege, and action.

Return value
No value returned.

rlim64_t rctl_enforced_value(rctl_hndl_t, rctl_set_t *, struct proc *)

Overview
Given a process, get the next enforced value on the rctl of the specified handle.

Return value
The enforced value.

Caller's context
For controls on process collectives, p->p_lock must be held across the operation.

                                                        See os/rctl.c

int rctl_action(rctl_hndl_t, rctl_set_t *, struct proc *, uint_t)

Overview
Take the action associated with the enforced value (as defined by rctl_enforced_value()) being exceeded or encountered. Some circumstances may restrict the function to a subset of the available actions: when memory (for a sigqueue_t) cannot be allocated safely, or when the process cannot be guaranteed to persist for the duration of the function (an asynchronous action).

Return values
Actions taken, according to the rctl_test bitmask.

Caller's context
Must be safe to acquire rcs_lock.

                                                        See os/rctl.c
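The behavior described for rctl_test() can be approximated by the following simplified user-space model. It is a sketch, not os/rctl.c:

- the M_RCT_* bits are stand-ins modeled on the rctl_test bitmask;
- the value list is an array;
- overflow of usage + inc is ignored.

project_shmmax_test() in the next section only computes kpd_shmmax + inc. Consistent with that, the model does not itself commit the increment; the caller charges the usage after a result without the deny bit. Advancing the cursor past an exceeded non-deny value, so that its action fires only once, is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t m_qty_t;

/* Hypothetical action bits, modeled on (not identical to) RCT_DENY etc. */
#define M_RCT_DENY   0x1u
#define M_RCT_SIGNAL 0x2u

typedef struct m_val {
	m_qty_t value;
	unsigned actions;          /* M_RCT_DENY and/or M_RCT_SIGNAL */
} m_val_t;

typedef struct m_rctl {
	const m_val_t *vals;       /* ascending by value */
	size_t nvals;
	size_t cursor;             /* index of the next value to enforce */
	m_qty_t usage;             /* charged by the caller, not by the test */
} m_rctl_t;

/*
 * Simplified rctl_test(): compare usage + inc with each value from the
 * cursor onward, collect the actions of every value that would be
 * exceeded, and stop at the first deny. Overflow is ignored.
 */
static unsigned m_rctl_test(m_rctl_t *r, m_qty_t inc)
{
	unsigned ret = 0;
	m_qty_t v = r->usage + inc;

	for (size_t i = r->cursor; i < r->nvals && v > r->vals[i].value; i++) {
		ret |= r->vals[i].actions;
		if (r->vals[i].actions & M_RCT_DENY)
			return ret;        /* deny: cursor stays on this value */
		r->cursor = i + 1;     /* non-deny value passed: fire it once */
	}
	return ret;
}
```

With a signal value at 100 and a deny value at 200, a request that crosses 100 reports the signal and moves the cursor on. A later request that would cross 200 is denied.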


7.7.5. An Example Resource Control

You can see an example of a resource control in the System V shared memory implementation. System V shared memory enforces a maximum size, in bytes, for shared memory allocations. Historically, this was implemented as a systemwide parameter, tuneable only through /etc/system. In Solaris 10, this limit is implemented with a resource control within the scope of a project. This allows the limit to be set dynamically, unlike the set-and-reboot procedure required with the old model. It also allows the parameter to be set per project, which permits multiple application configurations within the same kernel instance. For example, two database applications might be run, each within its own project ID, fitting within its own administrable shared memory limits.

We register the shared memory resource controls in project.c by setting up a rctl ops template and then performing the registration with rctl_register().

/*ARGSUSED*/
static int
project_shmmax_test(struct rctl *rctl, struct proc *p, rctl_entity_p_t *e,
    rctl_val_t *rval, rctl_qty_t inc, uint_t flags)
{
        rctl_qty_t v;

        ASSERT(MUTEX_HELD(&p->p_lock));
        ASSERT(e->rcep_t == RCENTITY_PROJECT);
        /* exceeded if the project's current bytes plus inc pass this value */
        v = e->rcep_p.proj->kpj_data.kpd_shmmax + inc;
        if (v > rval->rcv_value)
                return (1);
        return (0);
}

/* defined after project_shmmax_test() so the initializer can refer to it */
static rctl_ops_t project_shmmax_ops = {
        rcop_no_action,
        rcop_no_usage,
        rcop_no_set,
        project_shmmax_test
};

void
project_init(void)
{
...
        rc_project_shmmax = rctl_register("project.max-shm-memory",
            RCENTITY_PROJECT, RCTL_GLOBAL_DENY_ALWAYS | RCTL_GLOBAL_NOBASIC |
            RCTL_GLOBAL_BYTES, UINT64_MAX, UINT64_MAX, &project_shmmax_ops);
        rctl_add_default_limit("project.max-shm-memory", qty,
            RCPRIV_PRIVILEGED, RCTL_LOCAL_DENY);
...
}
                                                        See common/os/project.c
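The accounting pattern around project_shmmax_test() can be sketched in user space. The m_project_t type and the alloc/free helpers are hypothetical. The test uses the same comparison as project_shmmax_test(). Charging the project only after a successful test is an assumption, based on the test reading kpd_shmmax without modifying it.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t m_qty_t;

/* Hypothetical stand-in for a project's kpj_data.kpd_shmmax accounting. */
typedef struct m_project {
	m_qty_t shm_bytes;       /* shared memory currently charged to the project */
	m_qty_t shm_limit;       /* enforced project.max-shm-memory value */
} m_project_t;

/* Same comparison as project_shmmax_test(): exceeded if usage + inc > value. */
static int m_shmmax_test(const m_project_t *pj, m_qty_t inc)
{
	m_qty_t v = pj->shm_bytes + inc;

	return v > pj->shm_limit;
}

/* Hypothetical allocation path: test first, charge only on success. */
static int m_shm_alloc(m_project_t *pj, m_qty_t size)
{
	if (m_shmmax_test(pj, size))
		return -1;           /* the shmget() path returns EINVAL here */
	pj->shm_bytes += size;
	return 0;
}

/* Hypothetical release path, run when a segment is destroyed. */
static void m_shm_free(m_project_t *pj, m_qty_t size)
{
	assert(pj->shm_bytes >= size);
	pj->shm_bytes -= size;
}
```

Using roughly the 246MB privileged value from the prctl output, a 200MB segment fits, but a further 50MB is refused until the first segment is released.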


Once registered, the IPC subsystem can update and check the resource against the limits administered by the users of the system.

sol10$ prctl -n project.max-shm-memory $$
process: 3053: ksh
NAME    PRIVILEGE       VALUE    FLAG   ACTION                       RECIPIENT
project.max-shm-memory
        privileged       246MB      -   deny                                 -
        system          16.0EB    max   deny                                 -


The resource limits are checked and enforced in the IPC implementation. Each call to shmget() uses rctl_test() to test the requested size against the project's available shared memory.

extern rctl_hndl_t rc_project_shmmax;

shmget()
{
...
                /*
                 * Check rsize and the per-project limit on shared
                 * memory.  Checking rsize handles both the size == 0
                 * case and the size > ULONG_MAX & PAGEMASK case (i.e.
                 * rounding up wraps a size_t).
                 */
                if (rsize == 0 || (rctl_test(rc_project_shmmax,
                    pp->p_task->tk_proj->kpj_rctls, pp, rsize,
                    RCA_SAFE) & RCT_DENY)) {
                        mutex_exit(&pp->p_lock);
                        mutex_exit(lock);
                        ipc_cleanup(shm_svc, (kipc_perm_t *)sp);
                        return (EINVAL);
                }
...
}
                                                        See common/os/shm.c
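The rsize guard in shmget() relies on page rounding wrapping to zero for oversized requests. The following hypothetical sketch assumes a 4 KB page and 64-bit sizes; the M_PAGE* macros and helper names are invented. It shows how a single rsize == 0 test rejects both an empty request and any size above UINT64_MAX & M_PAGEMASK before the limit is checked.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical page size; the kernel uses the platform's own value. */
#define M_PAGESIZE   ((uint64_t)4096)
#define M_PAGEOFFSET (M_PAGESIZE - 1)
#define M_PAGEMASK   (~M_PAGEOFFSET)

/* Round size up to whole pages; sizes above UINT64_MAX & M_PAGEMASK wrap to 0. */
static uint64_t m_round_to_pages(uint64_t size)
{
	return (size + M_PAGEOFFSET) & M_PAGEMASK;
}

/* Mirror of the shmget() guard: reject zero/wrapped sizes, then check the limit. */
static int m_shm_size_ok(uint64_t size, uint64_t used, uint64_t limit)
{
	uint64_t rsize = m_round_to_pages(size);

	if (rsize == 0)
		return 0;            /* size == 0, or rounding wrapped */
	return used + rsize <= limit;   /* overflow of used + rsize ignored */
}
```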





Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture (2nd Edition)
ISBN: 0131482092
Year: 2004
Pages: 244
