Section 20.2. Dynamic Task Queues | Solaris Internals: Solaris 10 and OpenSolaris Kernel Architecture (2nd Edition)

20.2. Dynamic Task Queues

Dynamic task queues first appeared in Solaris 9, as the first major revision to their introduction in Solaris 8.

20.2.1. Why a Dynamic Task Queue?

Suppose that two friends, Bob and Alice, are standing in a cafeteria line with Alice standing behind Bob. When the cashier checks Bob's tray, it turns out that Bob doesn't have enough money, so he wants to borrow from Alice. But Alice is not sure whether she has enough cash until she knows the cost of her lunch. This is a typical deadlock situationneither Bob nor Alice can make any forward progress because of waiting for the other. The same kind of deadlock may occur if two tasks A and B are placed on a queue that is served by a single thread and a resource dependency exists between A and B. One way to prevent such a deadlock is to guarantee that A and B are processed by two different threads; that way, when A stalls for B, the thread processing A will block until B makes enough progress to release the needed resource to B.

Dynamic task queues provide exactly such a deadlock-free way of scheduling potentially dependent tasks on the same queues. They guarantee that every task is processed by a separate thread. Since the number of tasks that can be scheduled at the same time is not known in advance, dynamic task queues maintain a dynamic thread pool that grows when the workload increases and shrinks when the workload dies off.

Dynamic task queues cannot (yet) be used through the DDI interfaces. Some kernel subsystems use the internal taskq calls directly to create and use dynamic task queues. The system also maintains one shared dynamic task queue, called system_taskq. You use it by specifying system_taskq as the taskq argument to the taskq_dispatch() function. We recommend that you also add TQ_NOSLEEP | TQ_NOQUEUE to the flags when using system_taskq.

20.2.2. Problems Addressed by Dynamic Task Queues

SMP scalability. In the prior implementation, each task queue uses a single list of entries and a single lock to protect it, so on a multiple-CPU SMP system, this behavior may become a performance bottleneck, especially for some "hot" (frequently accessed) task queue.
Thread usage. In many cases, programmers have no knowledge of how many worker threads they need to process task queues, so they usually choose some random number that seems reasonable, for example, number of processors. The number of worker threads servicing a task queue never changes during its lifetime and cannot adapt to a real task queue workload. Since all subsystems use their own private task queues, each with its own set of threads, the total thread pool may become very unbalancedsome threads will sleep most of the time while others will be busy most of the time.
Processing latency. A sequential taskq executes its tasks one by one, which means that the time between task dispatch and execution will be no less than the total time to process all preceding tasks in the list. This property may lead to a high processing latency and makes it more difficult to guarantee processing latency within certain boundaries (since both maximum number of tasks on the queue and individual processing time for each task need to be bound). What is worse, if some task sleeps or blocks, waiting for a resource, it blocks all tasks behind it. This cumulative behavior is very bad for latency-sensible applications. A dynamic taskq suffers the same problem when number of dispatched tasks is greater than the number of servicing threads.
Ordering constraints. Programmers should be careful to avoid dispatching dependent tasks on the same task queue. If some task depends on another one scheduled later, it may block forever because of a deadlock. This is especially a problem for a sequential task queue, but it can also happen for a dynamic one.
Blocking tasks. This is a more general issue than the previous one. Any blocking task or task that takes a very long time to complete potentially blocks or delays some or all tasks dispatched after it.
Priority dispatching. All worker threads have the same fixed priority specified at taskq creation, but there are no facilities to allow certain tasks to be scheduled with different priorities.

20.2.3. Task Pool Model

The design requirements for the new dynamic task queues specify that task queues should do the following:

Provide bounded scheduling latency.
Allow scheduling of dependent tasks in the same task queue.
Support a pool of threads that dynamically reflects system workload.
Scale well for any number of processors in the system, which means that it should not grab any global locks or write global data.
Allow scheduling of individual tasks with specified priorities.
Allow scheduling of tasks with different priorities.
Limit the total amount of servicing threads to prevent a task queue from exhausting system resources.
Be compatible with current uses of task queues.

Dependency and bounded latencies requirements 1 and 2 imply that execution of a task should not depend on execution of another one (unless some implicit dependencies are imposed by tasks themselves); this requirement can only be met if each of these tasks is processed by an independent thread. This immediately implies that the number of threads servicing a task queue should be at least the same as the number of pending tasks in this task queue and that each task should be assigned its own thread. This is the most important design decision that we made, and you can see that it logically follows the requirements. One interesting implication is that individual tasks can sleep without affecting execution of other tasks.

Obviously, we don't need to have more worker threads than pending tasks, so the number of pending jobs may also provide an upper boundary on the number of worker threads. But we do not use this requirement as a constraint, and we allow more threads. Being able to quickly use these extra threads for any new scheduled tasks more than compensates for the minor inefficiency stemming from these extra resources. Still, for efficiency we would like to keep extra threads to a minimum, so we set a hard limit for the maximum number of created threads.

The model with a worker thread assigned to each pending task also satisfies dynamic configuration requirement 3 since the total number of worker threads is based on the existing system workload (as defined by the number of outstanding tasks) and not by some arbitrary static decisions.

An interesting consequence of this design and the upper boundary requirement 6 is that if the threads' hard limit is set to M, we should not allow more than M outstanding tasks, and that means that a dispatch operation may fail even on a system with plenty of memory. This, in turn, means that users of task queues should be prepared to deal with dispatch fails and, possibly, provide backup mechanisms to schedule their tasks. We cannot do it transparently from users because of the ordering requirement 2, since a task queue has no knowledge of any ordering constraints. But programmers do have such knowledge, so if they want to schedule a task with no execution dependencies from other tasks, they can advise the dispatch operation that a task can safely be queued in a sequential task queue. In all other cases, the dispatch operation has no choice other than to fail and let the user decide the best backup method, for example, revert to using sequential task queues.

We use the term task pool to refer to the dynamic set of pending tasks, each of which has an assigned servicing thread. We chose this name to distinguish the dynamic set from the task queue since the former no longer operates like a queue.

The compatibility requirement means that the taskq_dispatch() function should work as expected for both standard task queues and the new task pool model. Since we also need to introduce priority dispatching (requirement 7), we extend our taskq API with a new function, taskq_pridispatch(), which includes an extra "priority" argument.

20.2.4. Interface Changes to Support Dynamic Task Queues

taskq_create(). For the new task pool model we use the existing API with some changes. First of all, we introduce a new taskq creation flag, TASKQ_DYNAMIC, which creates a taskq with a new semantics. The taskq created with this flag consists of a regular sequential task queue with a single worker thread (we call it "`backing queue") and a set of data structures needed to implement a dynamic pool. All servicing threads needed to process a task pool are created dynamically when tasks are dispatched. The TASKQ_DYNAMIC flag changes the meaning of the arguments passed to taskq_create(). With this flag set, nthreads means the maximum number of worker threads servicing the task pool (not counting a single task queue thread). The minalloc and maxalloc arguments are used in the usual way for the backing task queue. The TASKQ_DYNAMIC flag cannot be used in conjunction with TASKQ_CPR_SAFE, and TASKQ_PREPOPULATE flag prepopulates the backing task queue in the usual way.
taskq_dispatch(). The return type for the taskq_dispatch() function is changed from int to an opaque type taskqid_t.

typedef void *taskqid_t; taskqid_t taskq_dispatch(taskq_t *taskq, task_func_t f, void *a, uint_t flags);

This function returns NULL when the dispatch failed and some non-NULL value when the task was dispatched. The purpose of this extension is to allow cancellation of task queues in the future. The new set of dispatching flags is defined as follows:

TQ_NOQUEUEL: Do not queue a task if it can't be dispatched because available resources are lacking and return NULL. If this flag is not set and the task pool is exhausted, the task may be scheduled in a backing task queue. This flag should always be used when a task queue is used for tasks that may depend on each other for completion. Queuing dependent tasks may create deadlocks, as discussed earlier.

TQ_SLEEP: Do not wait for resources; may return NULL.

TQ_NOSLEEP: May block waiting for resources. May still fail for dynamic task queues if TQ_NOQUEUE is also specified.

taskq_wait() and taskq_lock(). We also decided to avoid supporting functions taskq_wait and taskq_lock for dynamic task queues. Support for such operations requires implementation to access some global (per-taskq) locks that we need to avoid for processor scalability. Instead, the following new functions are introduced to provide the functionality achieved by taskq_lock() in a more abstract way.
taskq_suspend(tq). Suspends any new dispatched tasks from being executed.
taskq_suspended(tq). Returns 1 if the task is in the suspended state, and 0 otherwise.
taskq_resume(tq). Resumes execution of tasks in the task queue.
taskq_member(tq, thread). Returns 1 if the thread is executing in the task queue context, and 0 otherwise. It is intended for ASSERTions checking that some piece of code is executed only by a task queue mechanism.