3.6. Keeping Track of Processes: Basic Scheduler Construction
Until this point, we kept the concepts of the states and the transitions process-centric. We have not spoken about how the transition is managed, nor have we spoken about the kernel infrastructure, which
performs
the running and stopping of processes. The scheduler handles all these details. Having finished the exploration of the process lifecycle, we now introduce the basics of the scheduler and how it
interacts
with the
do_fork()
function during process creation.
3.6.1. Basic Structure
The scheduler operates on a structure called a
run queue
. There is one run queue per CPU on the system. The
core
data structures within a run queue are two priority-ordered arrays. One of these contains active tasks and the other contains
expired
tasks
. In general, an active task runs for a set amount of time, the length of its
timeslice
or quantum, and is then inserted into the expired array to wait for more CPU time. When the active array is empty, the scheduler swaps the two arrays by exchanging the active and expired pointers. The scheduler then begins executing tasks on the new active array.
Figure 3.12 illustrates the priority arrays within the run queue. The definition of the priority array structure is as
follows
:
-----------------------------------------------------------------------
kernel/sched.c
192 struct prio_array {
193 int nr_active;
194 unsigned long bitmap[BITMAP_SIZE];
195 struct list_head queue[MAX_PRIO];
196 };
-----------------------------------------------------------------------
The fields of the
prio_array
struct are as follows:
-
nr_active
. A counter that keeps track of the number of tasks held in the priority array.
-
bitmap
. This keeps track of the priorities within the array. The actual length of
bitmap
depends on the
size
of unsigned longs on the system. It will always be enough to store
MAX_PRIO
bits, but it could be longer.
-
queue
. An array that stores lists of tasks. Each list holds tasks of a certain priority. Thus,
queue[0]
holds a list of all tasks of priority 0,
queue[1]
holds a list of all tasks of priority 1, and so on.
With this basic understanding of how a run queue is organized, we can now embark on following a task through the scheduler on a single CPU system.
3.6.2.
Waking
Up from Waiting or Activation
Recall that when a process calls
fork()
, a new process is made. As previously mentioned, the process calling
fork()
is called the parent, and the new process is called the child. The newly created process needs to be scheduled for access to the CPU. This occurs via the
do_fork()
function.
Two important lines deal with the scheduler in
do_fork()
related
to waking up processes.
copy_process()
, called on line 1184 of
linux/kernel/fork.c
, calls the function
sched_fork()
, which initializes the process for an
impending
insertion into the scheduler's run queue.
wake_up_forked_process()
, called on line 1222 of
linux/kernel/fork.c
, takes the
initialized
process and
inserts
it into the run queue. Initialization and insertion have been separated to allow for the new process to be
killed
, or
otherwise
terminated
, before being scheduled. The process will only be scheduled if it is created, initialized successfully, and has no pending signals.
3.6.2.1. sched_fork(): Scheduler Initialization for Newly Forked Process
The
sched_fork()
function performs the infrastructure setup the scheduler requires for a newly forked process:
-----------------------------------------------------------------------
kernel/sched.c
719 void sched_fork(task_t *p)
720 {
721 /*
722 * We mark the process as running here, but have not actually
723 * inserted it onto the runqueue yet. This guarantees that
724 * nobody will actually run it, and a signal or other external
725 * event cannot wake it up and insert it on the runqueue either.
726 */
727 p->state = TASK_RUNNING;
728 INIT_LIST_HEAD(&p->run_list);
729 p->array = NULL;
730 spin_lock_init(&p->switch_lock);
-----------------------------------------------------------------------
Line 727
The process is
marked
as running by setting the
state
attribute in the task structure to
TASK_RUNNING
to ensure that no event can insert it on the run queue and run the process before
do_fork()
and
copy_process()
have
verified
that the process was created properly. When that verification
passes
,
do_fork()
adds it to the run queue via
wake_up_forked_process()
.
Line 728730
The process'
run_list
field is initialized. When the process is activated, its
run_list
field is linked into the queue structure of a priority array in the run queue. The process'
array
field is set to
NULL
to represent that it is not part of either priority array on a run queue. The
next
block of
sched_fork()
, lines 731 to 739, deals with kernel preemption. (Refer to Chapter 7 for more information on preemption.)
-----------------------------------------------------------------------
kernel/sched.c
740 /*
741 * Share the timeslice between parent and child, thus the
742 * total amount of pending timeslices in the system doesn't change,
743 * resulting in more scheduling fairness.
744 */
745 local_irq_disable();
746 p->time_slice = (current->time_slice + 1) >> 1;
747 /*
748 * The remainder of the first timeslice might be recovered by
749 * the parent if the child exits early enough.
750 */
751 p->first_time_slice = 1;
752 current->time_slice >>= 1;
753 p->timestamp = sched_clock();
754 if (!current->time_slice) {
755 /*
756 * This case is rare, it happens when the parent has only
757 * a single jiffy left from its timeslice. Taking the
758 * runqueue lock is not a problem.
759 */
760 current->time_slice = 1;
761 preempt_disable();
762 scheduler_tick(0, 0);
763 local_irq_enable();
764 preempt_enable();
765 } else
766 local_irq_enable();
767 }
-----------------------------------------------------------------------
Lines 740753
After disabling local interrupts, we divide the parent's timeslice between the parent and the child using the shift operator. The new process' first timeslice is set to 1 because it hasn't been run yet and its timestamp is initialized to the current time in nanosec units.
Lines 754767
If the parent's timeslice is 1, the division results in the parent having 0 time left to run. Because the parent was the current process on the scheduler, we need the scheduler to choose a new process. This is done by calling
scheduler_tick()
(on line 762). Preemption is disabled to ensure that the scheduler chooses a new current process without being
interrupted
. Once all this is done, we enable preemption and restore local interrupts.
At this point, the newly created process has had its scheduler-specific
variables
initialized and has been given an initial timeslice of half the remaining timeslice of its parent. By forcing a process to sacrifice a portion of the CPU time it's been allocated and giving that time to its child, the kernel
prevents
processes from
seizing
large
chunks
of processor time. If processes were given a set amount of time, a malicious process could spawn many children and quickly become a CPU hog.
After a process has been successfully initialized, and that initialization verified,
do_fork()
calls
wake_up_forked_process()
:
-----------------------------------------------------------------------
kernel/sched.c
922 /*
923 * wake_up_forked_process - wake up a freshly forked process.
924 *
925 * This function will do some initial scheduler statistics housekeeping
926 * that must be done for every newly created process.
927 */
928 void fastcall wake_up_forked_process(task_t * p)
929 {
930 unsigned long flags;
931 runqueue_t *rq = task_rq_lock(current, &flags);
932
933 BUG_ON(p->state != TASK_RUNNING);
934
935 /*
936 * We decrease the sleep average of forking parents
937 * and children as well, to keep max-interactive tasks
938 * from forking tasks that are max-interactive.
939 */
940 current->sleep_avg = JIFFIES_TO_NS(CURRENT_BONUS(current) *
941 PARENT_PENALTY / 100 * MAX_SLEEP_AVG / MAX_BONUS);
942
943 p->sleep_avg = JIFFIES_TO_NS(CURRENT_BONUS(p) *
944 CHILD_PENALTY / 100 * MAX_SLEEP_AVG / MAX_BONUS);
945
946 p->interactive_credit = 0;
947
948 p->prio = effective_prio(p);
949 set_task_cpu(p, smp_processor_id());
950
951 if (unlikely(!current->array))
952 __activate_task(p, rq);
953 else {
954 p->prio = current->prio;
955 list_add_tail(&p->run_list, ¤t->run_list);
956 p->array = current->array;
957 p->array->nr_active++;
958 rq->nr_running++;
959 }
960 task_rq_unlock(rq, &flags);
961 }
-----------------------------------------------------------------------
Lines 930934
The first thing that the scheduler does is lock the run queue structure. Any modifications to the run queue must be made with the lock held. We also throw a bug notice if the process isn't marked as
TASK_RUNNING
, which it should be thanks to the initialization in
sched_fork()
(see Line 727 in
kernel/sched.c
shown previously).
Lines 940947
The scheduler calculates the sleep average of the parent and child processes. The sleep average is the value of how much time a process spends sleeping compared to how much time it spends running. It is incremented by the amount of time the process slept, and it is decremented on each timer tick while it's running. An interactive, or I/O bound, process spends most of its time waiting for input and normally has a high sleep average. A non-interactive, or CPU-bound, process spends most of its time using the CPU instead of waiting for I/O and has a low sleep average. Because users want to see results of their input, like keyboard strokes or mouse movements, interactive processes are given more scheduling advantages than non-interactive processes. Specifically, the scheduler reinserts an interactive process into the active priority array after its timeslice
expires
. To prevent an interactive process from creating a non-interactive child process and thereby seizing a disproportionate share of the CPU, these formulas are used to lower the parent and child sleep averages. If the newly forked process is interactive, it soon sleeps enough to regain any scheduling advantages it might have lost.
Line 948
The function
effective_prio()
modifies the process' static priority. It returns a priority between 100 and 139 (
MAX_RT_PRIO
to
MAX__PRIO-1
). The process' static priority can be modified by up to 5 in either direction based on its previous CPU usage and time spent sleeping, but it always remains in this range. From the command line, we talk about the
nice
value of a process, which can range from +19 to -20 (
lowest
to highest priority). A
nice
priority of 0 corresponds to a static priority of 120.
Line 749
The process has its CPU attribute set to the current CPU.
Lines 951960
The overview of this code block is that the new process, or child, copies the scheduling information from its parent, which is
current
, and then inserts itself into the run queue in the appropriate place. We have finished our modifications of the run queue, so we unlock it. The following paragraph and Figure 3.13 discuss this process in more detail.
The pointer
array
points to a priority array in the run queue. If the current process isn't pointing to a priority array, it means that the current process has finished or is asleep. In that case, the current process'
runlist
field is not in the queue of the run queue's priority array, which means that the
list_add_tail()
operation (on line 955) would fail. Instead, we insert the newly created process using
__activate_task()
, which adds the new process to the queue without referring to its parent.
In the normal case, when the current process is waiting for CPU time on a run queue, the process is added to the queue residing at slot
p->prio
in the priority array. The array that the process was added to has its process counter,
nr_active
, incremented and the run queue has its process counter,
nr_running
, incremented. Finally, we unlock the run queue lock.
The case where the current process doesn't point to a priority array on the run queue is useful in seeing how the scheduler
manages
the run queue and priority array attributes.
-----------------------------------------------------------------------
kernel/sched.c
366 static inline void __activate_task(task_t *p, runqueue_t *rq)
367 {
368 enqueue_task(p, rq->active);
369 rq->nr_running++;
370 }
-----------------------------------------------------------------------
__activate_task()
places the given process
p
on to the active priority array on the run queue
rq
and
increments
the run queue's
nr_running
field, which is the counter for total number of processes that are on the run queue.
-----------------------------------------------------------------------
kernel/sched.c
311 static void enqueue_task(struct task_struct *p, prio_array_t *array)
312 {
313 list_add_tail(&p->run_list, array->queue + p->prio);
314 __set_bit(p->prio, array->bitmap);
315 array->nr_active++;
316 p->array = array;
317 }
-----------------------------------------------------------------------
Lines 311312
enqueue_task()
takes a process
p
and places it on priority array
array
, while initializing aspects of the priority array.
Line 313
The process'
run_list
is added to the tail of the queue located at
p->prio
in the priority array.
Line 314
The priority array's bitmap at priority
p->prio
is set so when the scheduler runs, it can see that there is a process to run at priority
p->prio
.
Line 315
The priority array's process counter is incremented to reflect the addition of the new process.
Line 316
The process' array pointer is set to the priority array to which it was just added.
To recap, the act of adding a newly forked process is
fairly
straightforward, even though the code can be confusing because of similar
names
throughout the scheduler. A process is placed at the end of a list in a run queue's priority array at the slot specified by the process' priority. The process then records the location of the priority array and the list it's located in within its structure.
|