Section 19.2. Callouts and Callout Tables



The Solaris kernel provides a callout facility for general-purpose, time-based event scheduling. A system callout table is initialized at boot time, and kernel routines can place functions on the callout table through the timeout(9F) interface. A callout table entry includes a function pointer, an optional argument, and a clock-tick value. With each clock interrupt, the tick value is tested, and the function is executed when the time interval expires. The timeout(9F) interface is part of the device driver interface (DDI) specification and is commonly used by device drivers. Other kernel facilities, such as the fsflush daemon, which sleeps at regular intervals, make use of callouts as well.

The kernel callout table is laid out as shown in Figure 19.2.

Figure 19.2. Solaris 10 Callout Tables


At boot time, the callout_table array is initialized with pointers to callout_table structures; the structures are also created at boot time. There are 16 callout tables, 8 for each of the two callout types, normal and real-time. Normal callouts are those callout entries created with a timeout(9F) call. The kernel also supports real-time callouts, created with the internal realtime_timeout() function. Real-time callouts are handled more expediently than normal callouts: they run through a soft interrupt mechanism, whereas normal callouts are subject to scheduling latency. Once the callout mechanism has executed the function placed on the callout queue, the callout entry is removed.

Each callout entry has a unique callout ID, c_xid, the extended callout ID. The callout ID contains the table ID, indicating which callout table the callout belongs to, a bit indicating whether this is a short-term or long-term callout, and a running counter.

The callout ID name space is partitioned into two pieces for short-term and long-term callouts. (A long-term callout is defined as a callout with a tick counter greater than 16,384, a value derived through testing and monitoring of real production systems.) This partitioning prevents collisions on the callout ID, which can result from the high volume of timeout(9F) calls typically generated by a running system. It's possible to run out of unique callout IDs, so IDs can be recycled. For short-term callouts, ID recycling is not a problem; a particular callout will likely have been removed from the callout table before its ID gets reused. A long-term callout, however, could still be on the table when its ID comes around again and so could collide with a new callout entry reusing that ID.

High-volume, short-term callout traffic is handled on a callout table with short-term callouts, and the relatively few long-term callouts are maintained on their own callout table. The callout table maintains a ct_short_id and ct_long_id, to determine if a callout table is supporting long-term or short-term callout entries.

The short and long IDs are set to an initial value at boot time in each callout table structure, with short IDs ranging from 0x10000000 to 0x1000000f and long IDs ranging from 0x30000000 to 0x3000000f. The other callout table structure fields set at boot time are the ct_type field (eight each of normal or real-time) and the ct_runtime and ct_curtime, both set to the current lbolt value when the initialization occurs.
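The boot-time ID values can be read as a small bit-field encoding. The following is a minimal sketch, not the kernel's actual definitions: the SK_ names and the bit assignments are assumptions inferred only from the initial values quoted above (0x10000000 through 0x1000000f for short IDs, 0x30000000 through 0x3000000f for long IDs), with the table ID taken to occupy the low 4 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout, inferred only from the boot-time values in the text. */
#define SK_TABLE_BITS    4u                            /* 16 tables fit in 4 bits */
#define SK_TABLE_MASK    ((1u << SK_TABLE_BITS) - 1u)
#define SK_COUNTER_HIGH  0x10000000u                   /* set in every initial ID */
#define SK_LONGTERM      0x20000000u                   /* assumed long-term bit */
#define SK_LONG_TICKS    16384L                        /* threshold from the text */

/* Initial short or long ID for one table, as it would be set at boot time. */
static uint32_t
sk_initial_id(unsigned table, int longterm)
{
    assert(table <= SK_TABLE_MASK);
    return (SK_COUNTER_HIGH | (longterm ? SK_LONGTERM : 0u) | table);
}

/* Recover the table ID from a callout ID. */
static unsigned
sk_id_table(uint32_t id)
{
    return (id & SK_TABLE_MASK);
}

/* Does this ID belong to the long-term half of the name space? */
static int
sk_id_is_longterm(uint32_t id)
{
    return ((id & SK_LONGTERM) != 0);
}

/* A timer is long-term when its tick count exceeds 16,384. */
static int
sk_delta_is_longterm(long ticks)
{
    return (ticks > SK_LONG_TICKS);
}
```

Under these assumptions, sk_initial_id(15, 1) yields 0x3000000f, the last long ID in the range given above, and sk_id_table() recovers table 15 from it.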

The callout entries, each represented by a callout structure, are linked to a callout table through the ct_idhash[] and ct_lbhash[] arrays, where each array element is either null or a pointer to a callout structure. Each callout entry is stored on both arrays; one hashes on the callout ID, the other hashes on the lbolt value. At initialization, the kernel also creates two callout threads with each callout table. The callout threads are signaled through a condition variable when the callout_schedule() function executes (called from the clock handler) if functions with expired timers need to execute.

As we alluded to, the insertion and removal of callout table entries by the timeout(9F) function is a regular and frequent occurrence on a running Solaris system. The algorithm for placing an entry on the callout queue goes as follows (the timeout(9F) flow):

  1. timeout(function_pointer, argument_pointer, time value (delta)) enters timeout_common(), with all the arguments passed to timeout(9F) along with an index into the callout_table array. The index derivation is based on the CPU cpu_seqid (sequential ID) field and on the callout type, where normal callouts are placed on tables indexed between array locations 0 through 7 (real-time callouts, 8 through 15).

    Basically, the algorithm causes callout entries to cycle through the eight table indexes for the given callout type as CPU IDs increment; the same CPU will reference the same index location every time.

  2. timeout_common() grabs a callout structure from the ct_freelist if one is available, or the kernel memory allocator allocates a new one.

  3. The c_func and c_arg fields are set in the callout structure, and the c_runtime field is set to the sum of the current lbolt value and the passed timer value.

  4. timeout_common() establishes the ID in the callout table structure, setting either the ct_short_id or ct_long_id (if the timer is larger than 16,384, it's a long ID).

    We saw earlier that the ID fields are initialized at boot time. As callout entries are added, the algorithm essentially counts up until it wraps around and starts over again. This process leads to the reuse problem we just discussed, which is why we have short-term and long-term IDs.

  5. The c_xid in the callout structure is set to the same ID value as the callout table ID.

  6. A callout entry (callout structure) is inserted into the callout table by adding the entry to both the ct_idhash[] and ct_lbhash[] arrays in the callout table.

  7. The algorithm derives the array index by hashing on the ID for ct_idhash[] placement and hashing on the c_runtime value set in the callout structure for the entry for ct_lbhash[]. If the array index already has a pointer, the algorithm links the callout structure by means of the next and prev pointers.

The callout entry is now established on the callout table, and timeout(9F) returns the ID to the calling function. The sequence of events for realtime_timeout() is the same.
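The steps above can be sketched as a small user-space model. This is an illustration under stated assumptions, not the kernel's timeout_common(): the sk_ names, the 64-bucket hash size, the hash functions, and the counter step of 1 << 4 (chosen so the low table-ID bits are preserved) are all hypothetical. Locking and ID wraparound are not modeled, and the index numbering follows the text (normal 0 through 7, real-time 8 through 15).

```c
#include <assert.h>
#include <stdlib.h>

#define SK_FANOUT      8         /* tables per callout type (from the text) */
#define SK_HASH_SIZE   64        /* assumed bucket count, a power of two */
#define SK_TABLE_BITS  4         /* assumed: low ID bits hold the table ID */
#define SK_LONG_TICKS  16384L    /* long-term threshold (from the text) */

enum sk_type { SK_NORMAL = 0, SK_REALTIME = 1 };

struct sk_callout {
    struct sk_callout *c_idnext, *c_idprev;   /* chain in ct_idhash[] */
    struct sk_callout *c_lbnext, *c_lbprev;   /* chain in ct_lbhash[] */
    void (*c_func)(void *);
    void *c_arg;
    long c_runtime;
    unsigned long c_xid;
};

struct sk_table {
    struct sk_callout *ct_idhash[SK_HASH_SIZE];
    struct sk_callout *ct_lbhash[SK_HASH_SIZE];
    struct sk_callout *ct_freelist;           /* linked through c_idnext */
    unsigned long ct_short_id, ct_long_id;
};

/* Step 1: the same CPU always maps to the same table for a given type. */
static int
sk_table_index(enum sk_type type, int cpu_seqid)
{
    return ((int)type * SK_FANOUT + (cpu_seqid & (SK_FANOUT - 1)));
}

static unsigned
sk_idhash(unsigned long id)
{
    return ((unsigned)(id >> SK_TABLE_BITS) & (SK_HASH_SIZE - 1));
}

static unsigned
sk_lbhash(long runtime)
{
    return ((unsigned)((unsigned long)runtime & (SK_HASH_SIZE - 1)));
}

/* Returns the new callout ID, or 0 if no memory is available (sketch only). */
static unsigned long
sk_timeout(struct sk_table *ct, long lbolt, void (*func)(void *), void *arg,
    long delta)
{
    struct sk_callout *cp = ct->ct_freelist;   /* step 2: try the free list */
    unsigned h;

    if (cp != NULL)
        ct->ct_freelist = cp->c_idnext;
    else if ((cp = malloc(sizeof (*cp))) == NULL)
        return (0);

    cp->c_func = func;                         /* step 3 */
    cp->c_arg = arg;
    cp->c_runtime = lbolt + delta;

    /* Steps 4 and 5: count up in the short or long ID and copy it to c_xid. */
    if (delta > SK_LONG_TICKS)
        cp->c_xid = (ct->ct_long_id += 1UL << SK_TABLE_BITS);
    else
        cp->c_xid = (ct->ct_short_id += 1UL << SK_TABLE_BITS);

    /* Steps 6 and 7: link at the head of both hash chains. */
    h = sk_idhash(cp->c_xid);
    cp->c_idprev = NULL;
    cp->c_idnext = ct->ct_idhash[h];
    if (cp->c_idnext != NULL)
        cp->c_idnext->c_idprev = cp;
    ct->ct_idhash[h] = cp;

    h = sk_lbhash(cp->c_runtime);
    cp->c_lbprev = NULL;
    cp->c_lbnext = ct->ct_lbhash[h];
    if (cp->c_lbnext != NULL)
        cp->c_lbnext->c_lbprev = cp;
    ct->ct_lbhash[h] = cp;

    return (cp->c_xid);
}
```

With a zeroed table whose ct_short_id starts at 0x10000000, the first short-term call in this model returns 0x10000010, and the entry is reachable both by its ID and by its expiry time.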

The work done when callout_schedule() is called from the clock interrupt handler essentially happens through multiple loops. The outer loop hits all the callout tables, and the inner loop hits the callout entries in the table.

  1. A local function variable set to the current lbolt value is used for entry to the inner loop, and the callout entries' c_runtime values determine whether the callouts are due for execution.

  2. If the callout is not due or is already running, the code moves on to the next entry. Otherwise, it's time for the function in the callout entry to run.

  3. For normal callout types, a condition variable signal is sent to wake up one of the callout threads to execute the function. For real-time callouts, the kernel softcall() function is invoked to generate a soft interrupt, which interrupts a processor, resulting in the function executing without going through the dispatcher.

  4. Once the callout table is processed in the inner loop, the outer loop moves the code on to the next callout table. A mutex lock (ct_lock) is acquired in the inner loop to prevent another processor from processing the same callout table at the same time. The mutex is released when the inner loop through the callout table is completed.

  5. The callout threads created at initialization (two per callout table) then loop, waiting for the ct_threadpool condition variable. They're signaled through the condition variable when a normal callout entry is due to execute (as above), at which point they call the callout_execute() function. callout_execute() is also invoked through the softcall() interrupt function to run a function placed on a callout table by realtime_timeout().

    To reiterate, a normal callout can be exposed to some additional latency for the callout threads to be scheduled once they are signaled by the condition variable. The softcall() method will force a processor into the callout_execute() function sooner through the interrupt facility.

  6. callout_execute() loops again through the callout table, testing the conditions for function execution. It's possible that another processor took care of things in the interim between function calls and lock releases, so the kernel tests the time values and running flag for the entries in the table before actually executing the function.

  7. Assuming that it is time to run, callout_execute() sets the CALLOUT_EXECUTING flag in the callout entry's c_xid field, and the function is invoked.

  8. The callout entry is then removed from the callout table, the callout structure is placed on the free list (ct_freelist), and a condition variable broadcast is issued if any threads are sleeping on the c_done condition variable. This condition variable is part of the callout entry and provides a way to notify waiters that a function placed on the callout table has executed.
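The two-phase check described above (a scheduling pass that decides whether anything is due, followed by an execution pass that re-tests each entry before running it) can be modeled in a few lines. This is a single-threaded sketch under assumptions: the sk_ names are hypothetical, a single pending list stands in for the ct_lbhash[] chains, the position of the executing flag bit is assumed, and the ct_lock mutex, callout threads, and condition variables are not modeled.

```c
#include <assert.h>
#include <stddef.h>

#define SK_EXECUTING 0x80000000UL   /* assumed position of the flag in c_xid */

struct sk_callout {
    struct sk_callout *c_next;
    void (*c_func)(void *);
    void *c_arg;
    long c_runtime;
    unsigned long c_xid;
};

struct sk_table {
    struct sk_callout *ct_pending;   /* stand-in for the ct_lbhash[] chains */
    struct sk_callout *ct_freelist;
};

/* Sample callback: bumps the int counter passed as its argument. */
static void
sk_bump(void *arg)
{
    ++*(int *)arg;
}

/* Scheduling pass: is any entry due and not already running? */
static int
sk_has_due(const struct sk_table *ct, long lbolt)
{
    const struct sk_callout *cp;

    for (cp = ct->ct_pending; cp != NULL; cp = cp->c_next) {
        if (cp->c_runtime <= lbolt && !(cp->c_xid & SK_EXECUTING))
            return (1);
    }
    return (0);
}

/* Execution pass: re-test each entry, run it, then recycle it. */
static int
sk_execute(struct sk_table *ct, long lbolt)
{
    struct sk_callout **pp = &ct->ct_pending;
    struct sk_callout *cp;
    int ran = 0;

    while ((cp = *pp) != NULL) {
        if (cp->c_runtime > lbolt || (cp->c_xid & SK_EXECUTING)) {
            pp = &cp->c_next;            /* not due, or already running */
            continue;
        }
        cp->c_xid |= SK_EXECUTING;       /* mark before invoking */
        cp->c_func(cp->c_arg);
        *pp = cp->c_next;                /* remove from the table */
        cp->c_xid &= ~SK_EXECUTING;
        cp->c_next = ct->ct_freelist;    /* place on the free list */
        ct->ct_freelist = cp;
        ran++;
    }
    return (ran);
}
```

In the kernel the re-test matters because another processor may have handled the entry between the signal and the execution pass; in this single-threaded model the same test simply skips entries that are not yet due.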

The kernel also provides an untimeout(9F) interface, which removes a callout. untimeout(9F) is passed the ID that timeout(9F) returned when the function was placed on the callout table. The entry is located by means of the ct_idhash[] array and removed, and the callout structure is added to the free list. Callout entries added by realtime_timeout() can also be removed with untimeout(9F); there is no separate function for the removal of real-time callouts.
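Removal by ID is the inverse of insertion on the ID hash. The following sketch models only that lookup and unlink; the sk_ names, hash function, and bucket count are hypothetical, the matching removal from ct_lbhash[] is omitted for brevity, and the return convention (ticks remaining, or -1 when the ID is not found) is this sketch's own choice.

```c
#include <assert.h>
#include <stddef.h>

#define SK_HASH_SIZE   64   /* assumed bucket count, a power of two */
#define SK_TABLE_BITS  4    /* assumed: low ID bits hold the table ID */

struct sk_callout {
    struct sk_callout *c_idnext, *c_idprev;   /* chain in ct_idhash[] */
    unsigned long c_xid;
    long c_runtime;
};

struct sk_table {
    struct sk_callout *ct_idhash[SK_HASH_SIZE];
    struct sk_callout *ct_freelist;           /* linked through c_idnext */
};

static unsigned
sk_idhash(unsigned long id)
{
    return ((unsigned)(id >> SK_TABLE_BITS) & (SK_HASH_SIZE - 1));
}

/* Link an entry at the head of its ID hash chain. */
static void
sk_id_insert(struct sk_table *ct, struct sk_callout *cp)
{
    unsigned h = sk_idhash(cp->c_xid);

    cp->c_idprev = NULL;
    cp->c_idnext = ct->ct_idhash[h];
    if (cp->c_idnext != NULL)
        cp->c_idnext->c_idprev = cp;
    ct->ct_idhash[h] = cp;
}

/* Remove the entry with this ID; return ticks remaining, or -1 if absent. */
static long
sk_untimeout(struct sk_table *ct, unsigned long id, long lbolt)
{
    unsigned h = sk_idhash(id);
    struct sk_callout *cp;
    long left;

    for (cp = ct->ct_idhash[h]; cp != NULL; cp = cp->c_idnext) {
        if (cp->c_xid != id)
            continue;
        /* Unlink from the doubly linked hash chain. */
        if (cp->c_idprev != NULL)
            cp->c_idprev->c_idnext = cp->c_idnext;
        else
            ct->ct_idhash[h] = cp->c_idnext;
        if (cp->c_idnext != NULL)
            cp->c_idnext->c_idprev = cp->c_idprev;
        /* Recycle the structure on the free list. */
        cp->c_idnext = ct->ct_freelist;
        ct->ct_freelist = cp;
        left = cp->c_runtime - lbolt;
        return (left > 0 ? left : 0);
    }
    return (-1);
}
```

Because the lookup goes through the ID hash rather than a scan of the whole table, the cost of a removal depends only on the length of one hash chain.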

You can examine the callout table on a running system with the callout dcmd in mdb.

# mdb -k
Loading modules: [ unix krtld genunix specfs dtrace ufs ip sctp usba s1394 fcp fctl nca lofs zpool random nfs audiosup sppp crypto logindmux ptm fcip md cpc ipc ]
> ::callout
FUNCTION                 ARGUMENT         ID                        TIME
setrun                   ffffffff8357a820 3fffffff27126120          3458ec80 (T+798)
setrun                   ffffffff816d2dc0 3fffffff27120340          3458e9a5 (T+67)
setrun                   ffffffff8337f7a0 3fffffff27120350          3458e9a1 (T+63)
setrun                   ffffffff83530100 3fffffff27120380          3458eb1b (T+441)
setrun                   ffffffff832cd280 3fffffff27120390          3458e976 (T+20)
setrun                   ffffffff8172cf00 3fffffff271203a0          3458ef4b (T+1513)
setrun                   ffffffff8358b200 3fffffff271203b0          3458eaf9 (T+407)
setrun                   ffffffff83634060 3fffffff27120420          3458eef0 (T+1422)


Some of the kernel functions that you will consistently find on the callout table of a running Solaris system include the following:

  • polltime. A real-time callout. Set from the poll(2) system call and based on the poll interval. polltime() wakes up a thread waiting on a poll event.

  • realitexpire. A real-time callout. Used in the real-time interval timer support when a timer is set. Callout ticks are derived from the timer value. realitexpire() generates the SIGALRM to the process.

  • setrun. A real-time callout. Placed on the callout queue by sleep/wakeup code (condition variables) to force a thread wakeup when the sleep event has a timeout value; for example, an aiowait(2) call can specify a maximum tick count to wait for the I/O to complete. aiowait(2) with a timeout specified uses a timed condition variable, which in turn places a setrun() event on the callout queue to force a thread wakeup if the time expires before the I/O has completed.

  • schedpaging. A normal callout. Part of the page-out subsystem in the VM system, used to manage the page-out rate.

  • mi_timer_fire. A normal callout. Part of the STREAMS-based TCP/IP protocol support. mi_timer_fire() generates regular message block processing through a STREAMS queue.

  • sigalarm2proc. A normal callout. The alarm(2) system call places sigalarm2proc on the callout queue to generate a SIGALRM when the timer expires.

  • ts_update. A normal callout. Checks a list of timeshare and interactive class threads and updates their priority as needed.

  • seg_pupdate. A normal callout. Used by the address space segment reclaim thread to find page-locked pages that have not been used in a while and reclaim them.

  • kmem_update. A normal callout. Performs low-level kernel memory allocator management.

This is by no means a complete list of all the kernel functions placed on the callout queue, and of course you will typically see several of the same functions on the callout queue at the same time, with different IDs and timeout values.




Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture (2nd Edition)
ISBN: 0131482092
Year: 2004
Pages: 244
