3.3. Process Creation: fork(), vfork(), and clone() System Calls

After the sample code is compiled into a file (in our case, an ELF executable[2]), we call it from the command line. Look at what happens when we press the Return key. We already mentioned that any given process is created by another process. The operating system provides the functionality to do this by means of the fork(), vfork(), and clone() system calls.
The C library provides three functions that issue these three system calls. The prototypes of fork() and vfork() are declared in <unistd.h>; clone() is declared in <sched.h>. Figure 3.9 shows how a process that calls fork() executes the system call sys_fork(). This figure describes how kernel code performs the actual process creation. In a similar manner, vfork() calls sys_vfork(), and clone() calls sys_clone().

Figure 3.9. Process Creation System Calls

All three of these system calls eventually call do_fork(), which is a kernel function that performs the bulk of the actions related to process creation. You might wonder why three different functions are available to create a process. Each function differs slightly in how it creates a process, and there are specific reasons why one would be chosen over another.

When we press Return at the shell prompt, the shell creates the new process that executes our program by means of a call to fork(). In fact, whenever we type a command (ls, for example) at the shell and press Return, the pseudocode of the shell at that moment looks something like this, shown here for our program foo:

    if( (pid = fork()) == 0 )
        execve("foo");
    else
        waitpid(pid);

We can now look at the functions and trace them down to the system call. Although our program calls fork(), it could just as easily have called vfork() or clone(), which is why we introduced all three functions in this section. The first function we look at is fork(). We delve through the calls fork(), sys_fork(), and do_fork(). We follow that with vfork() and finally look at clone(), tracing each down to the do_fork() call.

3.3.1. fork() Function

The fork() function returns twice: once in the parent and once in the child process. In the child process, fork() returns 0. In the parent, fork() returns the child's PID. When the fork() function is called, the function places the necessary information in the appropriate registers, including the index into the system call table where the pointer to the system call resides.
The processor we are running on determines the registers into which this information is placed. At this point, if you want to continue the sequential ordering of events, look at the "Interrupts" section in this chapter to see how sys_fork() is called. However, that detour is not necessary to understand how a new process gets created.

Let's now look at the sys_fork() function. This function does little else than call the do_fork() function. Notice that the sys_fork() function is architecture dependent because it accesses function parameters passed in through the system registers.

-----------------------------------------------------------------------
arch/i386/kernel/process.c
asmlinkage int sys_fork(struct pt_regs regs)
{
    return do_fork(SIGCHLD, regs.esp, &regs, 0, NULL, NULL);
}
-----------------------------------------------------------------------

-----------------------------------------------------------------------
arch/ppc/kernel/process.c
int sys_fork(int p1, int p2, int p3, int p4, int p5, int p6,
             struct pt_regs *regs)
{
    CHECK_FULL_REGS(regs);
    return do_fork(SIGCHLD, regs->gpr[1], regs, 0, NULL, NULL);
}
-----------------------------------------------------------------------

The two architectures take in different parameters to the system call. The structure pt_regs holds information such as the stack pointer. The fact that gpr[1] holds the stack pointer in PPC, whereas %esp[3] holds the stack pointer in x86, is known by convention.
3.3.2. vfork() Function

The vfork() function is similar to the fork() function with the exception that the parent process is blocked until the child calls exit() or exec().

-----------------------------------------------------------------------
arch/i386/kernel/process.c
asmlinkage int sys_vfork(struct pt_regs regs)
{
    return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs.esp, &regs, 0, NULL, NULL);
}
-----------------------------------------------------------------------

-----------------------------------------------------------------------
arch/ppc/kernel/process.c
int sys_vfork(int p1, int p2, int p3, int p4, int p5, int p6,
              struct pt_regs *regs)
{
    CHECK_FULL_REGS(regs);
    return do_fork(CLONE_VFORK | CLONE_VM | SIGCHLD, regs->gpr[1], regs, 0, NULL, NULL);
}
-----------------------------------------------------------------------

The only difference between the calls to do_fork() in sys_vfork() and sys_fork() is the set of flags that do_fork() is passed. The presence of these flags is used later to determine whether the added behavior just described (blocking the parent) is executed.

3.3.3. clone() Function

The clone() library function, unlike fork() and vfork(), takes a pointer to a function along with its argument. The child process created by do_fork() calls this function as soon as it gets created.
As Table 3.4 shows, the only difference between fork(), vfork(), and clone() is which flags are set in the subsequent calls to do_fork().
Finally, we get to do_fork(), which performs the real process creation. Recall that up to this point, we only have the parent executing the call to fork(), which then invokes the system call sys_fork(); we still do not have a new process. Our program foo still exists as an executable file on disk. It is not running or in memory.

3.3.4. do_fork() Function

We follow the kernel-side execution of do_fork() line by line as we describe the details behind the creation of a new process.
Lines 1178-1183
The code begins by verifying if the parent wants the new process ptraced. ptracing references are prevalent within functions dealing with processes. This book explains the ptrace references only at a high level. To determine whether a child can be traced, fork_traceflag() must verify the value of clone_flags. If CLONE_VFORK is set in clone_flags, if SIGCHLD is not to be caught by the parent, or if the current process also has PT_TRACE_FORK set, the child is traced, unless the CLONE_UNTRACED or CLONE_IDLETASK flags have also been set.

Line 1184
This line is where a new process is created and where the values in the registers are copied out. The copy_process() function performs the bulk of the new process space creation and descriptor field definition. However, the start of the new process does not take place until later. The details of copy_process() make more sense when the explanation is scheduler-centric. See the "Keeping Track of Processes: Basic Scheduler Construction" section in this chapter for more detail on what happens here.

-----------------------------------------------------------------------
kernel/fork.c
...
1189   pid = IS_ERR(p) ? PTR_ERR(p) : p->pid;
1190
1191   if (!IS_ERR(p)) {
1192     struct completion vfork;
1193
1194     if (clone_flags & CLONE_VFORK) {
1195       p->vfork_done = &vfork;
1196       init_completion(&vfork);
1197     }
1198
1199     if ((p->ptrace & PT_PTRACED) || (clone_flags & CLONE_STOPPED)) {
...
1203       sigaddset(&p->pending.signal, SIGSTOP);
1204       set_tsk_thread_flag(p, TIF_SIGPENDING);
1205     }
...
-----------------------------------------------------------------------

Line 1189
This is a check for pointer errors. If we find a pointer error, we return the pointer error without further ado.

Lines 1194-1197
At this point, check if do_fork() was called from vfork(). If it was, enable the wait queue involved with vfork().
Lines 1199-1205
If the parent is being traced or the clone is set to CLONE_STOPPED, the child is issued a SIGSTOP signal upon startup, thus starting in a stopped state.

-----------------------------------------------------------------------
kernel/fork.c
1207     if (!(clone_flags & CLONE_STOPPED)) {
...
1222       wake_up_forked_process(p);
1223     } else {
1224       int cpu = get_cpu();
1225
1226       p->state = TASK_STOPPED;
1227       if (!(clone_flags & CLONE_STOPPED))
1228         wake_up_forked_process(p); /* do this last */
1229       ++total_forks;
1230
1231       if (unlikely (trace)) {
1232         current->ptrace_message = pid;
1233         ptrace_notify ((trace << 8) | SIGTRAP);
1234       }
1235
1236       if (clone_flags & CLONE_VFORK) {
1237         wait_for_completion(&vfork);
1238         if (unlikely (current->ptrace & PT_TRACE_VFORK_DONE))
1239           ptrace_notify ((PTRACE_EVENT_VFORK_DONE << 8) | SIGTRAP);
1240       } else
...
1248         set_need_resched();
1249     }
1250     return pid;
1251   }
-----------------------------------------------------------------------

Lines 1226-1229
In this block, we set the state of the task to TASK_STOPPED. If the CLONE_STOPPED flag was not set in clone_flags, we wake up the child process; otherwise, we leave it waiting for its wakeup signal.

Lines 1231-1234
If ptracing has been enabled on the parent, we send a notification.

Lines 1236-1239
If this was originally a call to vfork(), this is where we set the parent to blocking and send a notification to the tracer if tracing is enabled. This is implemented by the parent being placed in a wait queue and remaining there in a TASK_UNINTERRUPTIBLE state until the child calls exit() or execve().

Line 1248
We set need_resched in the current task (the parent). This allows the child process to run first.