3.6. Scheduling Classes

Before diving into the specifics of dispatcher thread selection and operations, we need to discuss thread priorities and the individual scheduling classes implemented in the kernel. The core dispatcher code and scheduling-class specific code are tightly integrated, and a thorough explanation of the CPU and thread selection and scheduling process requires a background in the priority scheme and the functions managed by the scheduling-class specific code.

The dispatcher subsystem can be decomposed into the core dispatcher functions and the scheduling-class-specific functions. While the core dispatcher code and scheduling class functions are tightly integrated and are maintained in the same source directory (usr/src/uts/common/disp), the architecture allows a single instance of the dispatcher to support multiple scheduling classes. The different scheduling classes determine the priority range for threads and vary in terms of the algorithms applied to thread-specific functions.

Solaris provides six bundled scheduling classes:

  • Timeshare (TS). Priority adjustments are based on the time a thread spends waiting for processor resources or consuming processor resources. The thread's time quantum (the maximum amount of time the thread can execute on the processor) varies according to its priority.

  • Interactive (IA). The same as timeshare, with the addition of a mechanism that boosts the priority of a thread connected to the active window on a desktop. IA class threads exist only in a laptop/desktop environment when a window manager is started (you won't see IA class threads on a server).

  • Fair Share (FSS). Available processor cycles are divided into units called shares, and administrative tools allocate shares to processes using the Solaris projects and tasks framework. A thread in the FSS class has its priority adjusted according to its share allocation, recent utilization, and shares consumed by other threads in the FSS class.

  • Fixed Priority (FX). The assigned priority is not changed or adjusted by the kernel over the lifetime of the thread.

  • Real Time (RT). Real-time threads occupy the highest range of assignable priorities. Real-time scheduling provides the fastest possible dispatch latency (the elapsed time between an RT thread becoming runnable and getting scheduled onto a processor).

  • System (SYS). The kernel uses this class for the execution of operating system threads. The priority range occupied by the SYS class is higher than all other scheduling classes, with the exception of the real-time class.

The default scheduling class is the TS class, or the IA class for threads started under a window manager on desktops and laptops. User and administrative commands exist for placing threads in other classes. priocntl(1) can change the scheduling class and priority of a thread or process; note that improving priorities and using the RT class requires a privileged account. Using the FSS class requires a little more administrative work to do the share allocation. See System Administration Guide: Solaris Containers, Resource Management, and Solaris Zones (http://docs.sun.com) for specifics.

3.6.1. Scheduling Class Data

Each scheduling class has a unique data structure referenced through a kernel thread's t_cldata pointer. The structures are named xxproc, where xx is ts, rt, fss, fx, or ia. As an example, the tsproc_t structure is shown below. The class-specific structures for the other scheduling classes are similar in terms of their members and use.

/*
 * time-sharing class specific thread structure
 */
typedef struct tsproc {
        int             ts_timeleft;    /* time remaining in procs quantum */
        uint_t          ts_dispwait;    /* wall clock seconds since start */
                                        /*   of quantum (not reset upon preemption) */
        pri_t           ts_cpupri;      /* system controlled component of ts_umdpri */
        pri_t           ts_uprilim;     /* user priority limit */
        pri_t           ts_upri;        /* user priority */
        pri_t           ts_umdpri;      /* user mode priority within ts class */
        pri_t           ts_scpri;       /* remembered priority, for schedctl */
        char            ts_nice;        /* nice value for compatibility */
        char            ts_boost;       /* interactive priority offset */
        uchar_t         ts_flags;       /* flags defined below */
        kthread_t       *ts_tp;         /* pointer to thread */
        struct tsproc   *ts_next;       /* link to next tsproc on list */
        struct tsproc   *ts_prev;       /* link to previous tsproc on list */
} tsproc_t;
                                                        See usr/src/uts/common/sys/ts.h


The kernel maintains doubly linked lists of the class-specific structures (separate lists for each class), with the exception of IA class threads. Threads in the IA class link to a tsproc structure, and most of the class-supporting code for interactive threads is handled by the TS routines. IA threads are distinguished from TS threads by the TSIA flag in the ts_flags field.

Maintaining the linked lists for the class structures greatly simplifies the dispatcher-supporting code that updates different fields, such as time quantum, in the structures during the clock-driven dispatcher housekeeping functions.

For the TS/IA, FX, and FSS classes, the kernel builds an array of 16 xxproc structure pointers that anchor up to 16 doubly linked lists of the xxproc structures, systemwide. The code implements a hash function, based on the thread pointer, to determine which list to place a thread on, and each list is protected by its own kernel mutex, implemented as a listlock array, one for each class. Implementing multiple linked lists in this way makes for faster traversal of all the xxproc structures for a given scheduling class in a running system, and the use of a lock per list allows for concurrency; multiple kernel threads can traverse the lists. Here's the implementation for the FSS class.

/*
 * The fssproc_t structures are kept in an array of circular doubly linked
 * lists.  A hash on the thread pointer is used to determine which list each
 * thread should be placed in.  Each list has a dummy "head" which is never
 * removed, so the list is never empty.  fss_update traverses these lists to
 * update the priorities of threads that have been waiting on the run queue.
 */
#define FSS_LISTS               16 /* number of lists, must be power of 2 */
#define FSS_LIST_HASH(t)        (((uintptr_t)(t) >> 9) & (FSS_LISTS - 1))
#define FSS_LIST_NEXT(i)        (((i) + 1) & (FSS_LISTS - 1))
#define FSS_LIST_INSERT(fssproc)                                \
{                                                               \
        int index = FSS_LIST_HASH(fssproc->fss_tp);             \
        kmutex_t *lockp = &fss_listlock[index];                 \
        fssproc_t *headp = &fss_listhead[index];                \
. . .
#define FSS_LIST_DELETE(fssproc)                                \
{                                                               \
        int index = FSS_LIST_HASH(fssproc->fss_tp);             \
        kmutex_t *lockp = &fss_listlock[index];                 \
. . .
static fssproc_t fss_listhead[FSS_LISTS];
static kmutex_t fss_listlock[FSS_LISTS];
                                                       See usr/src/uts/common/disp/fss.c


The fss_listhead[] array represents the beginning of the 16 lists of fssproc_t structures, each with a corresponding lock in fss_listlock[]. The lists for the other classes are implemented in much the same fashion, with the exception of the RT list, which is implemented as a single list.

The kernel framework for scheduling classes begins with the sclass array of sclass_t structures.

extern struct sclass sclass[];  /* the class table */

typedef struct sclass {
        char            *cl_name;       /* class name */
        /* class specific initialization function */
        pri_t           (*cl_init)(id_t, int, classfuncs_t **);
        classfuncs_t    *cl_funcs;      /* pointer to classfuncs structure */
        krwlock_t       *cl_lock;       /* class structure read/write lock */
        int             cl_count;       /* # of threads trying to load class */
} sclass_t;
                                                     See usr/src/uts/common/sys/class.h


For each loaded scheduling class, the sclass array is initialized with the members listed above and indexed with the class ID (cid) kernel variable.

# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace uppc pcplusmp ufs ip sctp usba
uhci s1394 fctl nca lofs zfs random nfs audiosup cpc fcip crypto ptm sppp ipc ]
> ::class
SLOT NAME       INIT FCN                 CLASS FCN
   0 SYS        sys_init                 sys_classfuncs
   1 TS         ts_init                  ts_classfuncs
   2 FX         fx_init                  fx_classfuncs
   3 IA         ia_init                  ia_classfuncs
   4 RT         rt_init                  rt_classfuncs
   5            0                        0
   6            0                        0
. . .


The example above uses the mdb(1) class dcmd to dump the sclass array. The cid is displayed in the SLOT column. Note that the FSS class is not loaded in the example. The kernel loads required classes (SYS, TS) at boot time; other classes are loaded dynamically as needed (as a result of placing a thread in a particular class) or through administrative commands (modload(1M)). Part of the scheduling class loading and initializing process is the instantiation of the sclass_t object and its entry in the sclass array.

Part of each scheduling class is a set of pointers to the functions within the class, referenced with the cl_funcs pointer in the sclass_t. Scheduling class functions are subdivided into two categories: thread operations and class operations. As the names suggest, the thread operations are the class functions that act on a kernel thread, and the class operations are administrative and management functions.

typedef struct classfuncs {
        class_ops_t     sclass;
        thread_ops_t    thread;
} classfuncs_t;

typedef struct sclass {
        char            *cl_name;       /* class name */
        /* class specific initialization function */
        pri_t           (*cl_init)(id_t, int, classfuncs_t **);
        classfuncs_t    *cl_funcs;      /* pointer to classfuncs structure */
        krwlock_t       *cl_lock;       /* class structure read/write lock */
        int             cl_count;       /* # of threads trying to load class */
} sclass_t;
                                                     See usr/src/uts/common/sys/class.h


The class functions are embedded in a classfuncs_t object, which is referenced from the sclass_t and is also linked to kernel threads (based, of course, on the scheduling class of the thread). Figure 3.6 illustrates the big picture.

Figure 3.6. Scheduling Class Framework


For space and readability, the FSS class framework is shown separately in Figure 3.7. The framework is similar for FSS, with the addition of several FSS-specific objects linked off the fssproc_t. The FSS class is unique in that it implements a share-based scheduling policy that requires administrative input for share allocation and (optionally) processor sets. Additional support structures, the fssproj_t (project interface) and fsspset_t (processor set interface), are linked to the fssproc_t. There is also a fsszone_t to manage FSS threads running in zones.

Figure 3.7 shows three FSS class threads that are all part of the same project; each thread's fssproc_t references the same fssproj_t project structure. The kernel's internal project structure, kproject_t, maintains the share value allocated to the project and various project-level resource controls. Data on the CPU set allocated to the project is maintained in the fsspset_t, which links to a CPU partition structure (cpupart_t). The fsszone_t object is defined and instantiated by the kernel when a zone is created and shares are allocated. This behavior supports Solaris Zones and the ability to allocate a given number of CPU shares to a zone.

Figure 3.7. FSS Structure Framework


Getting back to Figure 3.6, the scheduling class operations vector (the function pointers in the classfuncs_t object) is at the center of the framework, referenced by the kernel through the system class array and by individual kernel threads through the thread's t_clfuncs pointer. The class and thread operations function prototypes can be found in the class.h header file.

typedef struct class_ops {
        int     (*cl_admin)(caddr_t, cred_t *);
        int     (*cl_getclinfo)(void *);
        int     (*cl_parmsin)(void *);
        int     (*cl_parmsout)(void *, pc_vaparms_t *);
        int     (*cl_vaparmsin)(void *, pc_vaparms_t *);
        int     (*cl_vaparmsout)(void *, pc_vaparms_t *);
        int     (*cl_getclpri)(pcpri_t *);
        int     (*cl_alloc)(void **, int);
        void    (*cl_free)(void *);
} class_ops_t;

typedef struct thread_ops {
        int     (*cl_enterclass)(kthread_id_t, id_t, void *, cred_t *, void *);
        void    (*cl_exitclass)(void *);
        int     (*cl_canexit)(kthread_id_t, cred_t *);
        int     (*cl_fork)(kthread_id_t, kthread_id_t, void *);
        void    (*cl_forkret)(kthread_id_t, kthread_id_t);
        void    (*cl_parmsget)(kthread_id_t, void *);
        int     (*cl_parmsset)(kthread_id_t, void *, id_t, cred_t *);
        void    (*cl_stop)(kthread_id_t, int, int);
        void    (*cl_exit)(kthread_id_t);
        void    (*cl_active)(kthread_id_t);
        void    (*cl_inactive)(kthread_id_t);
        pri_t   (*cl_swapin)(kthread_id_t, int);
        pri_t   (*cl_swapout)(kthread_id_t, int);
        void    (*cl_trapret)(kthread_id_t);
        void    (*cl_preempt)(kthread_id_t);
        void    (*cl_setrun)(kthread_id_t);
        void    (*cl_sleep)(kthread_id_t);
        void    (*cl_tick)(kthread_id_t);
        void    (*cl_wakeup)(kthread_id_t);
        int     (*cl_donice)(kthread_id_t, cred_t *, int, int *);
        pri_t   (*cl_globpri)(kthread_id_t);
        void    (*cl_set_process_group)(pid_t, pid_t, pid_t);
        void    (*cl_yield)(kthread_id_t);
} thread_ops_t;
                                                     See usr/src/uts/common/sys/class.h


The functions are described in the next section.

3.6.2. Scheduling Class Functions

Below is a complete list of the kernel scheduling-class-specific routines and a description of what they do. More details on many of the functions described below follow in the subsequent discussions on thread priorities and the dispatcher algorithms. The first nine functions fall into the class management category and, in general, support the priocntl(2) system call, which is invoked from the priocntl(1) and dispadmin(1M) commands. priocntl(2) can, of course, be called from an application program as well.

  • cl_admin. Retrieve or alter values in the dispatch table for the class.

  • cl_getclinfo. Get information about the scheduling class. Currently, only the max user priority (xx_maxupri) value is returned.

  • cl_parmsin. Validate user-supplied priority values to ensure that they fall within range. Also check permissions of caller to ensure that the requested operation is allowed. For the TS, IA, FX, and FSS classes, do a limit check against the max user priority (maxupri). For the RT class, the notion of a user priority does not exist, so make a range check against the max RT priority. The function supports the PC_SETPARMS command in priocntl(2).

  • cl_parmsout. Support PC_GETPARMS command in priocntl(2). Retrieve the class-specific scheduling parameters.

  • cl_vaparmsin, cl_vaparmsout. Variants of the parmsin/parmsout functions that take an additional argument with a variable parameter list.

  • cl_getclpri. Get class priority ranges. For each scheduling class, return the minimum (lowest) and maximum (highest) global priority.

  • cl_alloc, cl_free. Allocate or free a class-specific structure (xxproc_t).

The following functions support and manage threads.

  • cl_enterclass. Allocate the resources needed for a thread to enter a scheduling class (the xxproc_t structure). Initialize the fields and links. The cl_enterclass functions are discussed in their respective class sections.

  • cl_exitclass. Remove the class-specific data structure (xxproc_t) from the linked list and free it.

  • cl_canexit. For FSS class threads, ensure that the thread's credentials permit the thread to exit (requires the PRIV_PROC_PRIOCNTL privilege; see Chapter 5 and privileges(5)).

  • cl_fork. Process fork support code. Allocate a class-specific data structure (tsproc or rtproc), initialize it with values from the parent thread, and add it to the linked list. Called from the lwpcreate() and lwpfork() kernel functions as part of the fork(2) system call.

  • cl_forkret. Support a fork(2) system call. It is called from the kernel cfork() (common fork) code and is the last thing done before the fork(2) returns to the calling parent and the newly created child process. The xx_forkret functions resolve the run order of the parent and child, since it is desired that the child run first so the new object can be exec'd and can set up its own address space mappings to prevent the kernel from needlessly duplicating copy-on-write pages. The child is placed at the back of the dispatch queue and the parent gives up the processor.

  • cl_parmsget. Get the current user priority and max user priority for a thread.

  • cl_parmsset. Set the priority of a thread on the basis of passed input arguments. A user parameter data structure, xxparms, is defined for each scheduling class.

  • cl_stop. Prepare a thread for a transition to the stop state.

  • cl_exit. Handle an exiting thread. For FSS class threads, the project's framework needs to be updated, such as freeing shares that have been allocated to the exiting thread. For FX class threads, any registered callback functions are nulled and the callback list entry is deleted.

  • cl_active, cl_inactive. Track active projects in a processor set. These functions are implemented only by the FSS scheduler and are called when an FSS class thread sleeps or wakes up.

  • cl_swapin. Calculate the effective priority of a thread to determine the eligibility of its associated LWP for swapping in.

  • cl_swapout. Calculate the effective priority of a thread for swapping out its LWP. Called by the memory scheduler, sched(); the swap-out function is passed a pointer to a kthread and a flag to indicate whether the memory scheduler is in hardswap or softswap mode. Softswap means avefree < desfree (average free memory is less than desired free), so only threads sleeping longer than maxslp (20) seconds are marked for swap-out. Hardswap mode means that avefree has been less than minfree and desfree for an extended period of time (30 seconds), an average of two runnable threads are on the dispatch queues, and the paging (pagein + pageout) rate is high. (See Section 10.3.6.)

    The code is relatively simple: if in softswap mode, set the effective priority to 0. If in hardswap mode, calculate an effective priority in a similar fashion as for swap-in, such that threads with a small address space that have been in memory for a relatively long time are swapped out first. A time field, t_stime, in the kthread structure is set by the swapper when a thread is marked for swap-out, as well as for swap-in.

  • cl_trapret. Readjust the thread's priority. Trap return code, called on return to user mode from a system call or trap.

  • cl_preempt. Preempt a kernel thread and place it on a dispatch queue. Threads interrupted in kernel mode are given a SYS class priority so that they return to execution quickly. Preemption is discussed in Section 3.9.

  • cl_setrun. Set a kernel thread runnable, typically called when a thread is removed from a sleep queue. Place the thread on a dispatch queue. For most threads, readjust the global dispatch priority if the thread has been waiting (sleeping) an inordinate amount of time.

  • cl_sleep. Prepare a thread for sleep. Set the thread's priority on the basis of wait time or if a kernel priority is requested (the kernel thread's t_kpri_req flag). A kernel priority (SYS class priority) is set if the thread is holding an exclusive lock on a memory page or an RW write lock.

  • cl_tick. Process ticks for the thread. Called from the clock interrupt handler (see Section 19.1). Class-specific tick processing is discussed in the class-specific sections (beginning in Section 3.7.3.2).

  • cl_wakeup. Move a thread from a sleep to a dispatch queue and reset several thread and class structure values.

  • cl_donice. Adjust the priority according to the nice value for the target thread. Called when a nice(1) command is issued on the thread to alter the priority. nice(1) is not supported for RT and SYS class threads; the kernel functions for SYS and RT return an invalid operation error. The nice(1) command exists in Solaris for compatibility. Thread priority adjustments should be done with priocntl(1).

  • cl_globpri. Return the global dispatch priority that a thread would be assigned for a given user-mode priority. The calculation of the actual dispatch priority of a thread is based on several factors, including the notion of a user priority. See Section 3.7 for details.

  • cl_set_process_group. Establish the process group associated with the window session for IA class threads.

  • cl_yield. Cause a thread to surrender the processor. Called from the yield(2) system call. The kernel thread is placed at the back of a dispatch queue.

The dispatcher and the kernel-at-large call the appropriate routine for a specific scheduling class, using essentially the same method used in the VFS/Vnode subsystem. A set of macros resolve to the class-specific function by indexing through either the current kernel thread pointer or the system class array. Certain functions exist in support of setting up a thread for a scheduling class; as such, the links will not yet be in place in the thread to locate a function in the class operations array, so calls are resolved through the system class array.

#define CL_ENTERCLASS(t, cid, clparmsp, credp, bufp) \
        (sclass[cid].cl_funcs->thread.cl_enterclass) (t, cid, \
            (void *)clparmsp, credp, bufp)
#define CL_EXITCLASS(cid, clprocp)\
        (sclass[cid].cl_funcs->thread.cl_exitclass) ((void *)clprocp)
#define CL_CANEXIT(t, cr)       (*(t)->t_clfuncs->cl_canexit)(t, cr)
#define CL_FORK(tp, ct, bufp)   (*(tp)->t_clfuncs->cl_fork)(tp, ct, bufp)
#define CL_FORKRET(t, ct)       (*(t)->t_clfuncs->cl_forkret)(t, ct)
#define CL_GETCLINFO(clp, clinfop) \
        (*(clp)->cl_funcs->sclass.cl_getclinfo)((void *)clinfop)
. . .
                                                     See usr/src/uts/common/sys/class.h


CL_ENTERCLASS, for example, is entered through the system class array, indexed with the class ID (cid). CL_CANEXIT, CL_FORK, etc., are entered through the thread's t_clfuncs pointer. For a complete list of the class operations macros, see usr/src/uts/common/sys/class.h.

3.6.3. Scheduling Class Dispatcher Tables

Threads execute on a CPU until they block (sleep, that is, issue a blocking system call), are preempted (a higher-priority thread becomes runnable), or use up their time quantum. A time quantum is the maximum execution time allotted to a thread before it gets forced off the CPU and must wait for its turn to come around again. The allotted time quantum varies according to the scheduling class and, in some cases, the priority of the thread. Solaris maintains time quanta for each scheduling class in an object called a dispatch table. The rows and columns in a table vary across the different scheduling classes, but they all provide the user interface for adjusting time quanta.

You can examine the dispatch table for a given scheduling class by using dispadmin(1):

# dispadmin -g -c FSS
#
# Fair Share Scheduler Configuration
#
RES=1000
#
# Time Quantum
#
QUANTUM=110


The -c flag on the command line is followed by the scheduling class we're interested in, FSS in this example. The QUANTUM unit of time is based on a resolution value (reported as RES in the output). The unit of time is the reciprocal of the resolution; thus, a resolution value of 1000 equates to a unit of milliseconds (1/1000 = 0.001), meaning the time quantum shown is 110 milliseconds for FSS threads at any priority.

The FX and RT classes allocate different time quanta according to the priority of the thread:

# Real Time Dispatcher Configuration
RES=1000

# TIME QUANTUM                    PRIORITY
# (rt_quantum)                      LEVEL
       1000                    #        0
. . .
        800                    #       10
. . .
        600                    #       20
. . .
        400                    #       30
. . .
        200                    #       40
. . .
        100                    #       50
. . .
        100                    #       59


The RT table above lists quantum values for each of the 60 (0-59) possible priorities. Starting with a quantum of 1 second (1000 milliseconds) for the lowest-priority RT threads (priorities 0-9), the quantum is reduced as the priorities get better, providing a balance: Higher-priority threads can consume fewer CPU cycles, and lower-priority threads, which tend to wait longer for CPU time, get a larger time quantum. The dispatch table for the FX class is similar, in that the table has two columns assigning different time quanta for different priority threads; the actual time quantum values differ.

The SYS class is not implemented with a dispatch table, since SYS class threads are not subject to time limits when they execute. A SYS class thread runs until it completes, is preempted, or voluntarily releases the processor.

The TS/IA table has several additional columns for managing the priority of TS/IA class threads based on different events and conditions. The example below shows the default values for a selected group of timeshare/interactive priorities. In the interest of space and readability, we don't list all 60 (0-59) priorities, since we only need a representative sample for this discussion.

# Time Sharing Dispatcher Configuration
RES=1000

# ts_quantum  ts_tqexp  ts_slpret  ts_maxwait ts_lwait  PRIORITY LEVEL
       200         0        50           0        50        #     0
. . .
       160         0        51           0        51        #    10
. . .
       120        10        52           0        52        #    20
. . .
        80        20        53           0        53        #    30
. . .
        40        30        55           0        55        #    40
. . .
        20        49        59       32000        59        #    59


Each entry in the TS/IA dispatch table (each row) is defined by the tsdpent (timeshare dispatch entry) data structure.

/*
 * time-sharing dispatcher parameter table entry
 */
typedef struct tsdpent {
        pri_t   ts_globpri;     /* global (class independent) priority */
        int     ts_quantum;     /* time quantum given to procs at this level */
        pri_t   ts_tqexp;       /* ts_umdpri assigned when proc at this level */
                                /*   exceeds its time quantum */
        pri_t   ts_slpret;      /* ts_umdpri assigned when proc at this level */
                                /*   returns to user mode after sleeping */
        short   ts_maxwait;     /* bumped to ts_lwait if more than ts_maxwait */
                                /*   secs elapse before receiving full quantum */
        short   ts_lwait;       /* ts_umdpri assigned if ts_dispwait exceeds */
                                /*   ts_maxwait */
} tsdpent_t;
                                                        See usr/src/uts/common/sys/ts.h


RES and the PRIORITY LEVEL column are not defined in tsdpent. Those fields, along with the defined members of the structure, are described below.

  • RES (resolution value). Defines the unit of time for the ts_quantum column.

  • PRIORITY LEVEL. The class-dependent priority, not the systemwide global priority. The PRIORITY LEVEL column is derived as the row number in the dispatch table. Every row corresponds to a unique priority level within the TS/IA class, and each column in the row contains values that determine the priority adjustments made on the thread running at that particular priority. This is not the same as ts_globpri.

  • ts_globpri. The only table parameter (tsdpent structure member) that is not displayed in the output of the dispadmin(1M) command, and also the only value that is not tunable. ts_globpri is the class-independent global priority that corresponds to the timeshare priority (the column farthest to the right). Refer to Figure 3.8 for a list of global priorities when all the bundled scheduling classes are loaded. Since TS/IA is the lowest class, the kernel global priorities 0-59 correspond to the TS/IA class priorities 0-59.

    Figure 3.8. Dispatcher Global Priorities

  • ts_quantum. The time quantum; the amount of time that a thread at this priority is allowed to run before it must relinquish the processor, have its priority reset, and be assigned a new time quantum. Be aware that the ts_dptbl(4) man page, as well as other references, indicates that the value in the ts_quantum field is in ticks. A tick is a unit of time that can vary from platform to platform. In Solaris, there are 100 ticks per second, so a tick occurs every 10 milliseconds. The value in ts_quantum is in ticks only if RES is 100. If RES is any other value, including the default value of 1000, then ts_quantum represents some fraction of a second, the fractional value determined by the reciprocal value of RES. With a default value of RES = 1000, the reciprocal of 1000 is .001 (milliseconds).

We can change the RES value by using the -r flag with dispadmin(1M).

# dispadmin -g -c TS -r 100
# Time Sharing Dispatcher Configuration
RES=100

# ts_quantum  ts_tqexp  ts_slpret  ts_maxwait ts_lwait  PRIORITY LEVEL
        20         0        50           0        50        #     0
        20         0        50           0        50        #     1
. . .


This command causes the values in the ts_quantum column to change but does not change the actual quantum allocation. For example, at priority 0, instead of a quantum value of 200 with a RES of 1000, we have a quantum value of 20 with a RES of 100. The fractional unit is different: Instead of 200 milliseconds with a RES value of 1000, we get 20 hundredths-of-a-second, which is the same amount of time, just represented differently [20 x .010 = 200 x .001]. In general, it makes sense to simply leave the RES value at the default of 1000, which makes it easy to interpret the ts_quantum field as milliseconds.

  • ts_tqexp. Time quantum expired. The new priority a thread is set to when it has exceeded its time quantum. From the default values in the TS dispatch table, threads at priorities 0-10 have their priority set to 0 if they burn through their allotted time quantum. As another example, threads at priority 50 have a 40-millisecond time quantum and have their priority set to 40 if they use up their time.

  • ts_slpret. The sleep return priority value. A thread that has been sleeping has its priority set to this value when it is woken up. These are set such that the thread will be placed at a higher priority (in some cases, substantially higher) so that the thread gets some processor time after having slept (waited for an event, which typically is a disk or network I/O).

  • ts_maxwait, ts_lwait. These parameters compensate threads that have been preempted and have waited a relatively long time before using up their time quantum; it's a starvation avoidance mechanism that improves the priority of threads that have been sitting on a dispatch queue for an inordinate amount of time. ts_maxwait is the time threshold, and ts_lwait is the new priority for a thread that has waited longer than ts_maxwait.

    A thread's ts_dispwait variable is reset to zero when the thread is inserted on a dispatch queue, following a time-quantum expiration or a wakeup; note that preemption by a higher-priority thread does not result in ts_dispwait getting reset to zero. ts_dispwait is incremented once per second for every thread on a dispatch queue and sleep queue. When a thread's ts_dispwait exceeds ts_maxwait, the thread's priority is boosted to the corresponding priority value in the ts_lwait column.

    The priority boost for threads on sleep queues reflects a change that was introduced in Solaris 9, as a result of a thread starvation scenario that surfaced with certain workloads. The ts_dispwait field previously resulted only in a priority boost for threads in the TS_RUN state (runnable); threads on a sleep queue (TS_SLEEP state) did not get a priority change, so threads blocked on a synchronization object would continue to sleep with their priority unchanged. For certain types of synchronization, particularly where threads are woken one by one in priority order such as when acquiring an rwlock as a writer, threads that block at a low priority can be starved. For this reason, we added a change that bumps the priority of threads in sleep state as well as those in run state. This change is enabled with the ts_sleep_promote parameter, which is set to 1 by default.

    Interesting to note is that the default values in the TS/IA dispatch table inject a 0 value in ts_maxwait for every priority except the highest priority (59). So just one increment in the ts_dispwait field causes the thread priority to be readjusted to ts_lwait, except for priority 59 threads. The net effect is that all but the highest-priority (59) timeshare threads have their priority bumped to the 50-59 range (ts_lwait) every second.

    This process has the desirable effect of not penalizing a thread that is CPU bound for an extended period of time. Threads that are CPU intensive will, over time, end up in the low 0-9 priority range as they keep using up their time quantum, because of priority readjustments by ts_tqexp. Once a second, they could get bumped back up to the 50-59 range and will only migrate back down if they sustain their CPU-bound behavior.

    Priority 59 threads are handled differently. These threads are already at the maximum (best) priority for a timeshare thread, so there's no way to bump their priority with ts_maxwait and make it better. The ts_update() routine is the kernel code segment that increments the ts_dispwait value and readjusts thread priorities by means of ts_lwait. ts_update() reorders the linked list of threads on the dispatch queues after adjusting the priority. The reordering after the priority adjustment puts threads at the front of their new dispatch queue for that priority. The threads on the priority 59 linked list would end up reordered but still at the same priority.

You can apply user-supplied values to the dispatch tables by using the dispadmin(1M) command or by compiling a new /kernel/sched/TS_DPTBL loadable module and replacing the default module. The ts_dptbl(4) man page provides the source and the instructions for doing this. Either way, any changes to the dispatch tables should be done with extreme caution and tested extensively before going into production.




Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture (2nd Edition)
ISBN: 0131482092