10.4. Process Primitives

Despite the relatively long discussion needed to describe a process, creating and destroying processes in Linux is straightforward.

10.4.1. Having Children

Linux has two system calls that create new processes: fork() and clone(). As mentioned earlier, clone() is used for creating threads and is discussed briefly later in this chapter. For now, we focus on fork(), which is the most popular method of process creation.

 #include <unistd.h>

 pid_t fork(void);

This system call has the unusual property of returning not once per invocation, but twice: once in the parent and once in the child. Note that we did not say "first in the parent"; writing code that makes any assumptions about the order in which the two processes execute is a very bad idea.

Each return from the fork() system call returns a different value. In the parent process, the system call returns the pid of the newly created child process; in the child process, the call returns 0.

The difference in return value is the only difference apparent to the processes. Both have the same memory image, credentials, open files,^[4] and signal handlers. Here is a simple example of a program that creates a child:

^[4] For details on how the parent's and child's open files relate to each other, see page 197.

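 #include <sys/types.h>
 #include <stdio.h>
 #include <stdlib.h>    /* exit() */
 #include <unistd.h>

 int main(void) {
     pid_t child;

     /* fork() returns 0 in the child and the child's pid in the parent */
     if (!(child = fork())) {
         printf("in child\n");
         exit(0);
     }

     printf("in parent -- child is %d\n", (int) child);

     return 0;
 }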

10.4.2. Watching Your Children Die

Collecting the exit status of a child is called waiting on the process. There are four ways this can be done, although only one of the calls is provided by the kernel. The other three methods are implemented in the standard C library. As the kernel system call takes four arguments, it is called wait4().

 pid_t wait4(pid_t pid, int *status, int options, struct rusage *rusage);

The first argument, pid, is the process whose exit code should be returned. It can take on a number of special values.

`< -1`	Waits for any child whose pgid is the same as the absolute value of `pid`.
`= -1`	Waits for any child to terminate.
`= 0`	Waits for a child in the same process group as the current process.^[5]
`> 0`	Waits for process `pid` to exit.

^[5] Process groups are described on pages 136-138.

The second parameter is a pointer to an integer that gets set to the exit status of the process that caused the wait4() to return (which we hereafter call the examined process). The format of the returned status is convoluted, and a set of macros is provided to make sense of it.

Three events cause wait4() to return the status of the examined process: The process could have exited, it could have been terminated by a kill() (sent a fatal signal), or it could have been stopped for some reason.^[6] You can find out which of these occurred through the following macros, each of which takes the returned status from wait4() as the sole parameter:

^[6] See Chapter 15 for reasons why this might happen.

 WIFEXITED(status)

Returns true if the process exited normally. A process exits normally when its main() function returns or the program calls exit(). If WIFEXITED() is true, WEXITSTATUS(status) returns the process's exit code.

 WIFSIGNALED(status)

Returns true if the process was terminated due to a signal (this is what happens for processes terminated with kill()). If this is the case, WTERMSIG(status) returns the signal number that terminated the process.

 WIFSTOPPED(status)

If the process has been stopped by a signal, WIFSTOPPED() returns true and WSTOPSIG(status) returns the signal that stopped the process. wait4() returns information on stopped processes only if WUNTRACED was specified as an option.

The options argument controls how the call behaves. WNOHANG causes the call to return immediately rather than block; if no processes are ready to report their status, the call returns 0 rather than a valid pid. WUNTRACED causes wait4() to return if an appropriate child has been stopped. See Chapter 15 for more information on stopped processes. Both of these behaviors may be specified by bitwise OR'ing the two values together.

The final parameter to wait4(), a pointer to a struct rusage, gets filled in with the resource usage of the examined process and all the examined process's children. See the discussion of getrusage() and RUSAGE_BOTH on pages 118-119 for more information on what this entails. If this parameter is NULL, no resource usage information is returned.
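As a minimal sketch of how these pieces fit together, the following fragment forks a child, reaps it with wait4(), and decodes the returned status with the macros described above. The helper names describe_status() and fork_and_wait4() are hypothetical and exist only for illustration; the _DEFAULT_SOURCE definition assumes glibc, where it keeps wait4() visible under strict compiler modes.

```c
#define _DEFAULT_SOURCE 1   /* assumption: glibc; exposes wait4() in strict modes */
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: describe a wait status. The exit code or signal
 * number is stored in *detail (-1 if the status is not recognized). */
const char *describe_status(int status, int *detail)
{
    if (WIFEXITED(status)) {
        *detail = WEXITSTATUS(status);
        return "exited";
    }
    if (WIFSIGNALED(status)) {
        *detail = WTERMSIG(status);
        return "killed by signal";
    }
    if (WIFSTOPPED(status)) {
        *detail = WSTOPSIG(status);
        return "stopped by signal";
    }
    *detail = -1;
    return "unknown";
}

/* Hypothetical helper: fork a child that exits with exit_code, then reap
 * it with wait4(). Returns the raw status, or -1 on error. usage may be
 * NULL if the resource usage is not wanted. */
int fork_and_wait4(int exit_code, struct rusage *usage)
{
    int status;
    pid_t child = fork();

    if (child < 0)
        return -1;                  /* fork() failed */
    if (child == 0)
        _exit(exit_code);           /* child: terminate immediately */

    /* parent: wait for exactly this child */
    if (wait4(child, &status, 0, usage) != child)
        return -1;
    return status;
}
```

Passing the result of fork_and_wait4(3, &usage) to describe_status() should report "exited" with a detail of 3, and usage then holds the resource usage of the examined process.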

There are three other interfaces to wait4(), all of which provide subsets of its functionality. Here is a summary of the alternative interfaces.

 pid_t wait(int *status)

The only parameter to wait() is a pointer to the location to store the terminated process's return code. This function always blocks until a child has terminated.

 pid_t waitpid(pid_t pid, int *status, int options)

The waitpid() function is similar to wait4(); the only difference is that it does not return resource usage information on the terminated process.

 pid_t wait3(int *status, int options, struct rusage *rusage)

This function is also similar to wait4(), but it does not allow the caller to specify which child should be checked.
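WNOHANG is most useful when a parent has other work to do while a child runs. The sketch below polls a child with waitpid() instead of blocking; the helper name wait_with_polling() and the 10 ms pause between polls are assumptions made for the example, not anything prescribed by the interface.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep() under strict compiler modes */
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: try to reap `child` without blocking in waitpid().
 * Polls up to max_polls times, pausing 10 ms between attempts.
 * Returns 0 and fills in *status once the child has been reaped,
 * 1 if the child was still running after the last poll, -1 on error. */
int wait_with_polling(pid_t child, int *status, int max_polls)
{
    const struct timespec delay = { 0, 10 * 1000 * 1000 };   /* 10 ms */

    for (int i = 0; i < max_polls; i++) {
        pid_t result = waitpid(child, status, WNOHANG);

        if (result == child)
            return 0;               /* status is now valid */
        if (result < 0)
            return -1;              /* for example, not our child */

        /* result == 0: nothing to report yet; do other work here */
        nanosleep(&delay, NULL);
    }
    return 1;
}
```

A real program would normally replace the nanosleep() call with useful work, or react to SIGCHLD rather than poll at all.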

10.4.3. Running New Programs

Although there are six ways to run one program from another, they all do about the same thing: replace the currently running program with another. Note the word replace: all traces of the currently running program disappear. If you want to have the original program stick around, you must create a new process with fork() and then execute the new program in the child process.

The six functions feature only slight differences in the interface. Only one of these functions, execve(), is actually a system call under Linux. The rest of the functions are implemented in user-space libraries and utilize execve() to execute the new program. Here are the prototypes of the exec() family of functions:

 int execl(const char * path, const char * arg0, ...);
 int execlp(const char * file, const char * arg0, ...);
 int execle(const char * path, const char * arg0, ...);
 int execv(const char * path, char * const argv[]);
 int execvp(const char * file, char * const argv[]);
 int execve(const char * path, char * const argv[], char * const envp[]);

As mentioned, all of these functions try to replace the current program with a new program. If they succeed, they never return (as the program that called them is no longer running). If they fail, they return -1 and the error code is stored in errno, as with any other system call. When a new program is run (or exec()ed) it gets passed an array of arguments (argv) and an array of environment variables (envp). Each element in envp is of the form VARIABLE=value.^[7]

^[7] This is the same format the command env uses to print the current environment variables settings, and the envp argument is of the same type as the environ global variable.

The primary difference between the various exec() functions is how the command-line arguments are passed to the new program. The execl family passes each element in argv (the command-line arguments) as a separate argument to the function, and NULL terminates the entire list. Traditionally, the first element in argv is the command used to invoke the new program. For example, the shell command /bin/cat /etc/passwd /etc/group normally results in the following exec call:

 execl("/bin/cat", "/bin/cat", "/etc/passwd", "/etc/group", NULL);

The first argument is the full path to the program being executed and the rest of the arguments get passed to the program as argv. The final parameter to execl() must be NULL; it indicates the end of the parameter list. If you omit the NULL, the function call is likely either to cause a segmentation fault or to fail with EINVAL. The environment passed to the new program is whatever is pointed to by the environ global variable, as mentioned on page 116.

The execv functions pass the command-line arguments as a C array of strings,^[8] which is the same format used to pass argv to the new program. The final entry in the argv array must be NULL to indicate the end of the array, and the first element (argv[0]) should contain the name of the program that was invoked. Our /bin/cat /etc/passwd /etc/group example would be coded using execv like this:

^[8] Technically, a pointer to a NULL-terminated array of pointers to '\0'-terminated arrays of characters. If this does not make sense, see [Kernighan, 1988].

 char * argv[] = { "/bin/cat", "/etc/passwd", "/etc/group", NULL };
 execv("/bin/cat", argv);

If you need to pass a specific environment to the new program, execle() and execve() are available. They are exactly like execl() and execv() but they take a pointer to the environment as their final argument. The environment is set up just like argv.

For example, here is one way to execute /usr/bin/env (which prints out the environment it was passed) with a small environment:

 char * newenv[] = { "PATH=/bin:/usr/bin",
                     "HOME=/home/sweethome", NULL };
 execle("/usr/bin/env", "/usr/bin/env", NULL, newenv);

Here is the same idea implemented with execve():

 char * argv[] = { "/usr/bin/env", NULL };
 char * newenv[] = { "PATH=/bin:/usr/bin",
                     "HOME=/home/sweethome", NULL };
 execve("/usr/bin/env", argv, newenv);

The final two functions, execlp() and execvp(), differ from execl() and execv() by searching the current path (set by the PATH environment variable) for the program to execute. The arguments to the program are not modified, however, so argv[0] does not contain the full path to the program being run. Here are modified versions of our first example that search for cat in the current PATH:

 execlp("cat", "cat", "/etc/passwd", "/etc/group", NULL);

 char * argv[] = { "cat", "/etc/passwd", "/etc/group", NULL };
 execvp("cat", argv);

If execl() or execv() were used instead, those code fragments would fail unless cat was located in the current directory.
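Because a successful exec() never returns, a program that wants to keep running must fork() first, exec() in the child, and wait in the parent. The following is a minimal sketch of that pattern; run_command() is a hypothetical helper, and the use of 127 to report a failed exec is an assumption borrowed from shell convention. The child calls _exit() rather than exit() so that it does not flush copies of the parent's standard I/O buffers.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] (searched for in PATH) in a child
 * process and wait for it. Returns the child's exit code, 127 if the
 * program could not be executed, or -1 on any other failure (including
 * the child being killed by a signal). */
int run_command(char *const argv[])
{
    int status;
    pid_t child = fork();

    if (child < 0)
        return -1;

    if (child == 0) {
        execvp(argv[0], argv);
        _exit(127);             /* reached only if execvp() failed */
    }

    if (waitpid(child, &status, 0) != child)
        return -1;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    return -1;
}
```

For example, passing the array { "sh", "-c", "exit 4", NULL } should make run_command() return 4.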

If you are trying to run a program with a specific environment while still searching the path, you need to search the path manually and use execle() or execve(), because none of the available exec() functions does quite what you want.
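One way to do the manual search is sketched below. The helpers find_in_path() and exec_with_env() are hypothetical, the 4096-byte buffer and the fallback path of /bin:/usr/bin are assumptions, and the access() check is only an approximation of what the kernel will accept (it also succeeds for directories). Some C libraries offer an extension for this job, such as execvpe() in glibc, but it is not part of the six functions discussed here.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: look for an executable called `name` in each
 * directory of the colon-separated list `path`. On success the full
 * path is written to out (outlen bytes) and 0 is returned; otherwise
 * -1 is returned. */
int find_in_path(const char *name, const char *path,
                 char *out, size_t outlen)
{
    while (path != NULL) {
        const char *end = strchr(path, ':');
        size_t len = end ? (size_t)(end - path) : strlen(path);
        int n;

        if (len == 0)       /* an empty entry means the current directory */
            n = snprintf(out, outlen, "./%s", name);
        else
            n = snprintf(out, outlen, "%.*s/%s", (int)len, path, name);

        /* skip candidates that did not fit in the buffer */
        if (n > 0 && (size_t)n < outlen && access(out, X_OK) == 0)
            return 0;

        path = end ? end + 1 : NULL;
    }
    return -1;
}

/* Hypothetical helper: behaves like execvp(), but passes envp to the
 * new program. Returns only on failure, with errno set. */
int exec_with_env(char *const argv[], char *const envp[])
{
    char full[4096];                        /* assumed to be large enough */
    const char *path = getenv("PATH");

    if (strchr(argv[0], '/') != NULL)       /* explicit path: no search */
        return execve(argv[0], argv, envp);

    if (find_in_path(argv[0], path ? path : "/bin:/usr/bin",
                     full, sizeof full) != 0) {
        errno = ENOENT;
        return -1;
    }
    return execve(full, argv, envp);
}
```

Note that the search uses the PATH of the calling process, not any PATH entry in the new environment; whether that is the right choice depends on the application.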

Signal handlers are preserved across the exec() functions in a slightly nonobvious way; the mechanism is described on page 205.

10.4.4. Faster Process Creation with `vfork()`

Normally processes that fork() immediately exec() another program (this is what shells do every time you type a command), making the full semantics of fork() more computationally expensive than is necessary. To help optimize this common case, vfork() is provided.

 #include <unistd.h>

 pid_t vfork(void);

Rather than creating an entirely new execution environment for the new process, vfork() creates a new process that shares the memory of the original process. The new process is expected to either _exit() or exec() another process very quickly, and the behavior is undefined if it modifies any memory, returns from the function the vfork() is contained in, or calls any new functions. In addition, the original process is suspended until the new one either terminates or calls an exec() function.^[9] Not all systems provide the memory-sharing and parent-suspending semantics of vfork(), however, and applications should never rely upon that behavior.

^[9] vfork() was motivated by older systems that needed to copy all of the memory used by the original process as part of the fork(). Modern operating systems use copy-on-write, which copies memory regions only as necessary, as discussed in most operating system texts [Vahalia, 1997] [Bach, 1986]. This facility makes fork() almost as fast as vfork(), and much easier to use.
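The restrictions on the child are easier to see in code. In this sketch the child does nothing but call an exec() function and, if that fails, _exit(); it assigns no variables and never returns from the function that called vfork(). The helper name run_shell_vfork() is hypothetical, and the _DEFAULT_SOURCE definition assumes glibc, where it keeps vfork() declared under strict compiler modes.

```c
#define _DEFAULT_SOURCE 1   /* assumption: glibc; keeps vfork() declared */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `command` through /bin/sh using vfork().
 * Returns the raw wait status, or -1 on error. */
int run_shell_vfork(const char *command)
{
    int status;
    pid_t child = vfork();

    if (child < 0)
        return -1;

    if (child == 0) {
        /* Child: modify nothing, just exec or terminate. */
        execl("/bin/sh", "sh", "-c", command, (char *) NULL);
        _exit(127);             /* exec failed; never use exit() here */
    }

    /* Parent: resumes once the child has exec()ed or terminated. */
    if (waitpid(child, &status, 0) != child)
        return -1;
    return status;
}
```

Replacing vfork() with fork() in this function leaves its behavior unchanged, which is one reason fork() is usually the safer choice.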

10.4.5. Killing Yourself

Processes terminate themselves by calling either exit() or _exit(). When a process's main() function returns, the standard C library calls exit() with the value returned from main() as the parameter.

 void exit(int exitCode);
 void _exit(int exitCode);

The two forms, exit() and _exit(), differ in that exit() is a function in the C library, while _exit() is a system call. The _exit() system call terminates the program immediately, and the exitCode is stored as the exit code of the process. When exit() is used, functions registered by atexit() are called before the library calls _exit(exitCode). Among other things, this allows the ANSI/ISO standard I/O library to flush all its buffers.

Registering functions to be run when exit() is used is done through the atexit() function.

 int atexit(void (*function)(void));

The only parameter passed to atexit() is a pointer to a function. When exit() is invoked, all the functions registered with atexit() are called in the opposite order from which they were registered. Note that if _exit() is used or the process is terminated due to a signal (see Chapter 12 for details on signals), functions registered via atexit() are not called.
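The ordering rule, and the difference between exit() and _exit(), can be observed with a small experiment. In this sketch a child registers two handlers that each write one letter to a pipe, then terminates one way or the other while the parent collects whatever was written. All of the names here (emit(), first_registered(), second_registered(), atexit_demo()) are hypothetical.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int out_fd = -1;     /* pipe end the handlers write to (child only) */

static void emit(char c)
{
    if (write(out_fd, &c, 1) != 1)
        _exit(99);
}

static void first_registered(void)  { emit('A'); }
static void second_registered(void) { emit('B'); }

/* Hypothetical demo: the child registers two handlers and terminates
 * with exit() if use_plain_exit is nonzero, or with _exit() otherwise.
 * The handlers' output is stored in buf as a '\0'-terminated string.
 * Returns the number of bytes captured, or -1 on error. */
ssize_t atexit_demo(int use_plain_exit, char *buf, size_t buflen)
{
    int fds[2];
    pid_t child;
    ssize_t total = 0, n;

    if (buflen == 0 || pipe(fds) < 0)
        return -1;

    child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (child == 0) {
        close(fds[0]);
        out_fd = fds[1];
        if (atexit(first_registered) != 0 || atexit(second_registered) != 0)
            _exit(98);
        if (use_plain_exit)
            exit(0);        /* runs second_registered(), then first_registered() */
        _exit(0);           /* runs neither handler */
    }

    close(fds[1]);          /* parent only reads */
    while ((size_t)total < buflen - 1 &&
           (n = read(fds[0], buf + total, buflen - 1 - (size_t)total)) > 0)
        total += n;
    buf[total] = '\0';
    close(fds[0]);
    waitpid(child, NULL, 0);
    return total;
}
```

With use_plain_exit set, the captured string should be "BA", the reverse of the registration order; with _exit() nothing is captured at all.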

10.4.6. Killing Others

Destroying other processes is almost as easy as creating a new one; just kill it:

 int kill(pid_t pid, int signum);

pid should be the pid of the process to kill, and signum describes how to kill it. There are two choices^[10] for how to kill a child. You can use SIGTERM to terminate the process gently. This means that the process can ask the kernel to tell it when someone is trying to kill it so that it can terminate gracefully (saving files, for example). The process may also ignore this type of request for it to terminate and, instead, continue running. Using SIGKILL for the signum parameter kills the process immediately, no questions asked. If signum is zero, then kill() checks to see if the process calling kill() has the proper permissions, and returns zero if so and nonzero if its permissions are insufficient. This provides a way for a process to check the validity of a pid.

^[10] This is a gross oversimplification. kill() actually sends a signal, and signals are a complicated topic in their own right. See Chapter 12 for a complete description of what signals are and how to use them.
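A common way to combine the two signals is to ask politely first and insist only if the request is ignored. The sketch below does this for a child process; terminate_child() is a hypothetical helper, and the 10 ms polling interval is an arbitrary assumption.

```c
#define _POSIX_C_SOURCE 200809L   /* for kill() and nanosleep() in strict modes */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: send SIGTERM to `child`, give it up to
 * grace_polls * 10 ms to terminate, then fall back to SIGKILL.
 * Returns the child's wait status, or -1 on error. */
int terminate_child(pid_t child, int grace_polls)
{
    const struct timespec delay = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    int status;

    if (kill(child, SIGTERM) < 0)
        return -1;

    for (int i = 0; i < grace_polls; i++) {
        pid_t result = waitpid(child, &status, WNOHANG);

        if (result == child)
            return status;          /* the child went quietly */
        if (result < 0)
            return -1;
        nanosleep(&delay, NULL);
    }

    /* The child ignored or is still handling SIGTERM; SIGKILL
     * cannot be caught or ignored. */
    if (kill(child, SIGKILL) < 0)
        return -1;
    if (waitpid(child, &status, 0) != child)
        return -1;
    return status;
}
```

The returned status can be examined with WIFSIGNALED() and WTERMSIG() to learn which of the two signals actually ended the process.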

The pid parameter can take on four types of values under Linux.

`pid > 0`	The signal is sent to the process whose pid is `pid`. If no process exists with that pid, `ESRCH` is returned.
`pid < -1`	The signal is sent to all the processes in the process group whose pgid is `-pid`. For example, `kill(-5316, SIGKILL)` immediately terminates all the processes in process group 5316. This ability is used by job control shells, as discussed in Chapter 15.
`pid = 0`	The signal is sent to all the processes in the current process's process group.
`pid = -1`	The signal is sent to all the processes on the system except the init process. This is used during system shutdown.

Processes can normally kill() only processes that share the same effective user ID as themselves. There are two exceptions to this rule. First, processes with an effective uid of 0 may kill() any process on the system. Second, any process can send a SIGCONT signal to any other process in the same session.^[11]

^[11] This is to allow job control shells to restart processes that have changed their effective user ID. See Chapter 15 for more information on job control.
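The signum of zero mentioned earlier gives a simple existence test for a pid. The sketch below treats EPERM as "the process exists but we may not signal it" and ESRCH as "no such process"; process_exists() is a hypothetical helper, and it deliberately rejects pids of zero or less because those values address process groups rather than a single process.

```c
#define _POSIX_C_SOURCE 200809L   /* for kill() under strict compiler modes */
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper: returns 1 if a process with this pid exists,
 * 0 if it does not, and -1 if pid is not a single-process pid or an
 * unexpected error occurred. */
int process_exists(pid_t pid)
{
    if (pid <= 0) {
        errno = EINVAL;
        return -1;
    }

    if (kill(pid, 0) == 0)
        return 1;           /* exists, and we are allowed to signal it */
    if (errno == EPERM)
        return 1;           /* exists, but belongs to someone else */
    if (errno == ESRCH)
        return 0;           /* no process has this pid */
    return -1;
}
```

The answer is only a snapshot: the process may exit, and its pid may even be reused, immediately after the call returns.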

10.4.7. Dumping Core

Although we just mentioned that passing SIGTERM and SIGKILL to kill() causes a process to terminate, you can use quite a few different values (Chapter 12 discusses all of them). Some of these, such as SIGABRT, cause the program to dump core before dying. A program's core dump contains a complete snapshot of the state of the program when it died.^[12] Most debuggers, including gdb, can analyze a core file and tell you what the program was doing when it died, as well as let you inspect the defunct process's memory image. Core dumps end up in the process's current working directory in a file called (simply enough) core.

^[12] A once-popular form of computer memory consists of small iron rings arranged in a matrix, with each ring held in place by two wires that are used to sense and set the magnetic polarity of the ring. Each of the rings is called a core, and the whole thing is core memory. So a core dump is a copy of the state of the system's memory (or core) at a given time.

When a process has violated some of the system's requirements (such as trying to access memory that it is not allowed to access), the kernel terminates the process by calling an internal version of kill() with a parameter that causes a core dump. The kernel may kill a process for several reasons, including arithmetic violations, such as division by zero; the program executing illegal instructions; and the program trying to access inaccessible regions of memory. This last case causes a segmentation fault, which results in the message segmentation fault (core dumped). If you do any reasonable amount of Linux programming, you are sure to tire of this message!

If a process's resource limit for core files is 0 (see page 120 for details on the core resource limit), no core file is generated.
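A process can set that limit for itself. As a rough sketch, the following child lowers its core resource limit to 0 and then calls abort(), which raises SIGABRT; the parent still sees a death by signal, but no core file should be written. The helper names disable_core_dumps() and abort_without_core() are hypothetical.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: set the soft core-file limit of the calling
 * process to 0. Returns 0 on success, -1 on error. */
int disable_core_dumps(void)
{
    struct rlimit limit;

    if (getrlimit(RLIMIT_CORE, &limit) < 0)
        return -1;
    limit.rlim_cur = 0;         /* lowering the soft limit is always allowed */
    return setrlimit(RLIMIT_CORE, &limit);
}

/* Hypothetical demo: a child disables core files and then aborts.
 * Returns the child's wait status, or -1 on error. */
int abort_without_core(void)
{
    int status;
    pid_t child = fork();

    if (child < 0)
        return -1;

    if (child == 0) {
        if (disable_core_dumps() < 0)
            _exit(1);
        abort();                /* terminates the child with SIGABRT */
    }

    if (waitpid(child, &status, 0) != child)
        return -1;
    return status;
}
```

For the returned status, WIFSIGNALED() should be true and WTERMSIG() should yield SIGABRT, exactly as if the core file had been written.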