exec's Minions

Processes generate child processes for a number of reasons. In a Linux environment, there are several long-lived processes that run continuously in the background and provide system services on demand. These processes, called daemon processes, frequently generate child processes to carry out the requested service. Some daemon processes commonly found in a Linux environment are lpd, the line printer daemon; xinetd, the extended Internet services daemon; and syslogd, the system logging daemon. Some problems (such as with databases) lend themselves to concurrent-type solutions that can be effected via multiple child processes executing the same code. More commonly, such as when the shell processes a command, a process creates a child process because it would like to transform the child process by changing the program code the child process is executing.

In Linux, any one of five library functions and one system call can be used to replace the current process image with a new image. [1] The library functions act as a front end to the system call. The library functions are discussed in the exec manual pages (Section 3), while the system call (execve) warrants its own manual page entry in Section 2. Any of these can be directly invoked by the programmer. For ease of comparison, the library functions and the system call are discussed as a group. The phrase exec call will reference this group.

[1] In some versions of UNIX, such as Solaris, all the exec calls are system calls and are grouped together as library functions and discussed in one section of the manual. Linux has a more historic approach to things.

It is important to remember that when a process issues any exec call, if the call is successful, the existing process is overlaid with a new set of program code. The text, data (initialized and uninitialized), and stack segment of the process are replaced and only the u (user) area of the process remains the same. The new program code (if a C/C++ binary) begins its execution at the function main. Since the system is now executing a different set of code for the same process, some things, by necessity, must change:

Signals that were specified as being caught by the process (i.e., associated with a signal-catching routine) are reset to their default action. This is necessary, as the addresses for the signal-catching routines are no longer valid.
In a similar vein, if the process was profiling (determining how much time is spent in individual routines), the profiling will be turned off in the overlaid process.
If the new program has its SUID or SGID bit set, the effective user ID (EUID) or effective group ID (EGID) of the process is set accordingly.

The program to be executed can be a script. In this case, the script should have its execute bit set and start with the line #! interpreter [ arg(s) ], where interpreter is a valid executable (but not another script). If successful, the exec calls do not return, as the initial calling image is lost when overlaid with a new image.

Before we delve into these calls, we should take a quick look at what normally transpires when a valid command is issued at the system (shell) level, as this process will reflect the functionality available in a program. If the command issued is

linux$ cat file.txt > file2.txt

the shell parses the command line and divides it into valid tokens (e.g., cat, file.txt, etc.). The shell (via a call to fork) then generates a child process. After the fork, the child process closes standard output and opens the file file2.txt, mapping it to standard output in the child process. Next, by calling execve, the child process overlays its current program code with the program code for the command (in this case, the code for cat). When the command is finished, the shell redisplays its prompt. Figure 3.2 shows the process creation and command execution sequence.

Figure 3.2. Process creation and command execution at the shell level.

While the command is executing, the shell, by default, waits in the background. As we will see, there is a wait system call that allows the shell or any other process to wait. Should the user place an & at the end of the command (to indicate to the shell that the command be placed in the background), the shell will not wait and will return immediately with its prompt. When the command is finished, it may perform a call to exit or return when in the function main. The integer value passed to these calls is made available to the parent process via an argument to the wait system call. When on the command line, the returned value is stored in a shell variable (named status in the C shell, and ? in the Bourne and BASH shells). If in the Bourne or BASH shell you issue the command

linux$ echo $?

the system will display the value returned by the last command executed. As the mapping of standard output to the file file2.txt was done in the child process and not in the shell, the I/O redirection has no further impact on ensuing command sequences.
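The hand-off of the exit value from child to parent can be sketched as follows. This is a minimal illustration; the helper name exit_status_of_child is hypothetical, and waitpid is used as one member of the wait family.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: fork a child that exits with `code`, then return
// the value the parent recovers through waitpid (or -1 on error).
int exit_status_of_child(int code) {
  pid_t pid = fork();
  if (pid == -1)
    return -1;                  // fork failed
  if (pid == 0)
    _exit(code);                // child: terminate at once with the given value
  int status = 0;
  if (waitpid(pid, &status, 0) == -1)
    return -1;
  // Only the low-order 8 bits of the exit value reach the parent.
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling exit_status_of_child(7) yields 7, which is the same value echo $? would show after a command that ended with exit(7).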

We should note that it is possible for a user at the command line to issue an exec call. The syntax would be

linux$ exec command [arguments]

However, most users would not do this. The current process (the shell) would be overlaid with the program code for the command. Once the command was finished, the user would be logged out, as the original shell process would no longer exist!

In a programming environment, the exec calls can be used to execute another program. The prototypes for the exec calls are listed in Table 3.1.

Table 3.1. The exec Call Prototypes.

#include <unistd.h>

extern char **environ;

int execl (const char *path, const char *arg, ...);
int execv (const char *path, char *const argv[]);
int execle(const char *path, const char *arg, ..., char *const envp[]);
int execve(const char *path, char *const argv[], char *const envp[]);
int execlp(const char *file, const char *arg, ...);
int execvp(const char *file, char *const argv[]);

The naming convention for these exec calls reflects their functionality. Each call starts with the letters exec. The next letter in the call name indicates if the call takes its arguments in a list format (i.e., literally specified as a series of arguments) or as a pointer to an array of arguments (analogous to the argv structure discussed earlier). The presence of the letter l indicates a list arrangement (a variable argument list; see the manual page on stdarg for details); v indicates the array or vector arrangement. The next letter of the call name (if present) is either an e or a p. The presence of an e indicates the programmers will construct (in the array/vector format) and pass their own environment variable list. The passed environment variable list will become the third argument to the function main (i.e., envp). As noted in the section on environment variables, envp is of limited practical value. When the programmer is responsible for the environment, the current environment variable list is not passed. The presence of a p indicates the current environment PATH variable should be used when searching for a file whose name does not contain a slash. [2] In the four calls where the PATH string is not used (execl, execv, execle, and execve), the path to the program to be executed must be fully specified.

[2] If the executable file is a script, the Bourne shell ( /bin/sh ) is invoked to execute the script. The shell is then passed the specified argument information.
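The difference between the list (l) and vector (v) forms can be seen by issuing the same request both ways. The sketch below is illustrative only: the helper names are hypothetical, and it assumes /bin/sh exists, using sh -c "exit 4" simply because its result is easy to observe.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Wait for child `pid` and return its exit status (or -1 on error).
static int wait_for(pid_t pid) {
  int status = 0;
  if (waitpid(pid, &status, 0) == -1)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

// List form: every argument is written out in the call itself.
int run_with_execl() {
  pid_t pid = fork();
  if (pid == -1)
    return -1;
  if (pid == 0) {
    execl("/bin/sh", "sh", "-c", "exit 4", (char *) NULL);
    _exit(127);                 // reached only if the exec failed
  }
  return wait_for(pid);
}

// Vector form: the same arguments are collected in an argv-like array.
int run_with_execv() {
  char arg0[] = "sh", arg1[] = "-c", arg2[] = "exit 4";
  char *const args[] = {arg0, arg1, arg2, (char *) NULL};
  pid_t pid = fork();
  if (pid == -1)
    return -1;
  if (pid == 0) {
    execv("/bin/sh", args);
    _exit(127);                 // reached only if the exec failed
  }
  return wait_for(pid);
}
```

Both helpers start the same program with the same arguments; only the way the argument list is handed to the exec call differs.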

The functionality of the exec calls is best summarized by Table 3.2.

Table 3.2. exec Call Functionality.

Library Call Name	Argument Format	Pass Current Set of Environment Variables?	Search of `PATH` Automatic?
`execl`	list	yes	no
`execv`	array	yes	no
`execle`	list	no	no
`execve`	array	no	no
`execlp`	list	yes	yes
`execvp`	array	yes	yes

Of the six variations, execlp and execvp calls are used most frequently (as automatic environment passing and path searching are usually desirable) and will be explained in detail.
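Before turning to those two calls, the effect of the e variants in Table 3.2 can be sketched briefly. The example below is a minimal illustration under assumptions: the helper name env_seen_by_child and the variable GREETING are hypothetical, and the env utility is assumed to live at /usr/bin/env. The child is given a one-entry environment via execle, and a pipe carries whatever env prints back to the parent.

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run /usr/bin/env with a hand-built environment and
// return whatever it prints (an empty string if anything goes wrong).
std::string env_seen_by_child() {
  int fds[2];
  if (pipe(fds) == -1)
    return "";
  pid_t pid = fork();
  if (pid == -1) {
    close(fds[0]);
    close(fds[1]);
    return "";
  }
  if (pid == 0) {                          // child process
    close(fds[0]);
    dup2(fds[1], STDOUT_FILENO);           // send standard output into the pipe
    close(fds[1]);
    char var[] = "GREETING=hello";         // hypothetical variable
    char *const envp[] = {var, (char *) NULL};
    // The list of arguments ends with a null pointer; envp comes after it.
    execle("/usr/bin/env", "env", (char *) NULL, envp);
    _exit(127);                            // reached only if the exec failed
  }
  close(fds[1]);                           // parent keeps only the read end
  std::string out;
  char buf[256];
  ssize_t n;
  while ((n = read(fds[0], buf, sizeof buf)) > 0)
    out.append(buf, static_cast<size_t>(n));
  close(fds[0]);
  waitpid(pid, nullptr, 0);
  return out;
}
```

Since the caller's own environment is not passed along, the child should report only the single variable placed in envp.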

3.3.1 execlp

The execlp library function (Table 3.3) is used when the number of arguments to be passed to the program to be executed is known in advance.

When using execlp, the initial argument, file, is a pointer to the name of the file that contains the program code to be executed. If this file reference contains a /, it is used as given (as an absolute path if it begins with a /) and the PATH string is not searched; in this circumstance, the p specification (execlp) is superfluous. If no / is found, each of the directories specified in the PATH variable will be, in turn, prepended to the file name specified, and the first valid program reference found will be the one executed. It is a good practice to fully specify the program to be executed in all situations to prevent a program with the same name, found in a prior PATH string directory, from being inadvertently executed. For the execlp call to be successful, the file referenced must be found and be marked as executable. If the call fails, it returns -1 and sets errno to indicate the error. As the overlaying of one process image with another is very complex, the possibilities for failure are numerous (as shown in Table 3.4).

Table 3.3. Summary of the execlp Library Function.

Include File(s)	`<unistd.h>` `extern char **environ;`		Manual Section	3
Summary	`int execlp(const char *file, const char *arg, ...);`
Return	Success	Failure	Sets `errno`
	Does not return	-1	Yes

Table 3.4. exec Error Messages.

#	Constant	`perror` Message	Explanation
1	EPERM	Operation not permitted	The process is being traced, the user is not the superuser, and the file has an SUID or SGID bit set. The file system is mounted nosuid, the user is not the superuser, and the file has an SUID or SGID bit set.
2	ENOENT	No such file or directory	One or more parts of the path to the new process file do not exist (or the path is NULL).
4	EINTR	Interrupted system call	Signal was caught during the system call.
5	EIO	Input/output error
7	E2BIG	Argument list too long	New process argument list plus exported shell variables exceed the system limits.
8	ENOEXEC	Exec format error	New process file is not in a recognized format.
11	EAGAIN	Resource temporarily unavailable	Total system memory while reading raw I/O is temporarily insufficient.
12	ENOMEM	Cannot allocate memory	New process memory requirements exceed system limits.
13	EACCES	Permission denied	Search permission denied on part of file path. The new file to process is not an ordinary file. No execute permission on the new file to process.
14	EFAULT	Bad address	`path` references an illegal address.
20	ENOTDIR	Not a directory	Part of the specified `path` is not a directory.
21	EISDIR	Is a directory	An ELF interpreter was a directory.
22	EINVAL	Invalid argument	An ELF executable had more than one interpreter.
24	EMFILE	Too many open files	Process has exceeded the maximum number of files open.
26	ETXTBSY	Text file busy	One or more processes have the executable open for writing.
36	ENAMETOOLONG	File name too long	The `path` value exceeds system path/file name length.
40	ELOOP	Too many levels of symbolic links	The `perror` message says it all.
67	ENOLINK	Link has been severed	The `path` value references a remote system that is no longer active.
72	EMULTIHOP	Multihop attempted	The `path` value requires multiple hops to remote systems, but file system does not allow it.
80	ELIBBAD	Accessing a corrupted shared library	An ELF interpreter was not in a recognized format.
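A failed exec call returns control to the caller, so the errno values in Table 3.4 can be inspected directly. The sketch below is a minimal illustration; the helper name errno_from_failed_exec is hypothetical, and it is only meaningful for file names that cannot be executed, since a successful call never returns.

```cpp
#include <cerrno>
#include <unistd.h>

// Hypothetical helper: attempt an exec that is expected to fail and return
// the errno value it left behind. If the exec succeeds, this function never
// returns, so only pass names that cannot be executed.
int errno_from_failed_exec(const char *file) {
  errno = 0;
  execlp(file, file, (char *) NULL);   // returns -1 only on failure
  return errno;
}
```

For a path that does not exist, the value returned is ENOENT, for which perror would print the "No such file or directory" message listed in the table.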

The ellipsis in the execlp function prototype can be thought of as argument 0 (arg0) through argument n (argn). These arguments are pointers to the null-terminated strings that would be normally passed by the system to the program if it were invoked on the command line. That is, argument 0, by convention, should be the name of the program that is executing. This is usually the same as the value in file, although the program referenced by file may include an absolute path, while the value in argument 0 most often would not. Argument 1 would be the first parameter to be passed to the program (which, using argv notation, would be argv[1]), argument 2 would be the second, and so on. The last argument to the execlp library call must be a NULL that is, for portability reasons, cast to a character pointer. Program 3.3, which invokes the cat utility program, demonstrates the use of the execlp library call.

Program 3.3 Using the execlp library call.

File : p3.3.cxx
/*
   Running the cat utility via an exec call
*/
#include <iostream>
#include <cstdio>
#include <unistd.h>
using namespace std;
int
main(int argc, char *argv[ ]){
  if (argc > 1) {
    execlp("/bin/cat", "cat", argv[1], (char *) NULL);
    perror("exec failure ");      // reached only if execlp fails
    return 1;
  }
  cerr << "Usage: " << *argv << " text_file" << endl;
  return 2;
}

When passed a text file name on the command line, this program displays the contents of the file to the screen. The program accomplishes this by overlaying its own process image with the program code for the cat utility program. The program passes the cat utility program the name (referenced by argv[1]) of the file to display. If the execlp call fails, the call to perror is made and the program exits and returns the value 1 to the system. If the call is successful, the perror and return statements are never reached, as they are replaced with the program code for the cat utility.

A sample run of the program is shown in Figure 3.3.

Figure 3.3 Output of Program 3.3.

linux$ p3.3 test.txt
This is a sample text
file for the program to
display!

EXERCISE

Harley wondered what value is used by the system to generate a system process table entry when the execlp call is issued. Is it the value referenced by file or the value referenced by arg0? Further, what happens if arg0 is set to an empty string (""), or if arg0 is omitted entirely (e.g., the file value is immediately followed with (char *)NULL)? Is it possible, in a case like this, for the value of argc to be 0? To test things she wrote, and compiled, the count.cxx program below. She then modified Program 3.3 to call her count executable by changing "/bin/cat" in line 11 of Program 3.3 to "./count". What did she find?

File : count.cxx
#include <iostream>
#include <cstdlib>
#include <unistd.h>
using namespace std;
int
main(int argc, char *argv[]){
  cerr << "argc = " << argc << endl;
  cerr << "Processes running" << endl;
  system("ps -f");                     // issue a shell ps cmd
  if ( argc > 1 ) {                    // value passed?
    int limit = atoi(argv[1]);         // convert to #
    for(int i=limit; i > 0; --i){      // count down; skipped if limit <= 0
      cerr << i << endl;
      sleep( 1 );
    }
  } else {
    cerr << "Nothing to count" << endl;
    return 2;
  }
  return 0;
}

3.3.2 execvp

If the number of arguments for the program to be executed is dynamic, then the execvp call can be used (Table 3.5). As with the execlp call, the initial argument to execvp is a pointer to the file that contains the program code to be executed. However, unlike execlp , there is only one additional argument that execvp requires. This second argument, defined as

char *const argv[ ]

specifies that a reference to an array of pointers to character strings should be passed. The format of this array parallels that of argv and, in many cases, is argv. If the reference is not the argv values for the current program, the programmer is responsible for constructing and initializing a new argv-like array. If this second approach is taken, the last element of the new argv array should contain a NULL address value. If execvp fails, it returns a value of -1 and sets the value in errno to indicate the source of the error (see Table 3.4).

Table 3.5. Summary of the execvp Library Function.

Include File(s)	`<unistd.h>` `extern char **environ;`		Manual Section	3
Summary	`int execvp(const char *file, char *const argv[]);`
Return	Success	Failure	Sets `errno`
	Does not return	-1	Yes

Program 3.4 makes use of the argv values for the current program.

Program 3.4 Using execvp with argv values.

File : p3.4.cxx
/*
   Using execvp to execute the contents of argv
*/
#include <iostream>
#include <cstdio>
#include <unistd.h>
using namespace std;
int
main(int argc, char *argv[ ]) {
  if ( argc > 1 ) {
    execvp(argv[1], &argv[1]);
    perror("exec failure");       // reached only if execvp fails
    return 1;
  }
  cerr << "Usage: " << *argv << " exe [arg(s)]" << endl;
  return 2;
}

The program will execute, via execvp, the program passed to it on the command line. The first argument to execvp, argv[1], is the reference to the program to execute.

The second argument, &argv[1], is the reference to the remainder of the command-line argv array. Notice that both of these references begin with the second element of argv (that is, argv[1]), as argv[0] is the name of the current program (e.g., p3.4). The output in Figure 3.4 shows that the program does work as expected.

Figure 3.4 Output of Program 3.4 when passed the cat command.

linux$ p3.4 cat test.txt
This is a sample text
file for a program to
display!

If we place additional information on the command line when running Program 3.4, we find the program will pass the information on, as demonstrated in Figure 3.5.

Figure 3.5 Output of Program 3.4 when passed the cat command with the -n option.

linux$ p3.4 cat -n test.txt
 1 This is a sample text
 2 file for a program to
 3 display!

If command-line argv values of the current program are not used with execvp , then the programmer must construct a new argv to be passed. An example of how this can be done is shown in Program 3.5.

Program 3.5 Using execvp with a programmer-generated argument list.

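File : p3.5.cxx
/*
   Generating our own argv type list for execvp
*/
#include <iostream>
#include <cstdio>
#include <unistd.h>
using namespace std;
int
main( ){
  // The casts are needed because string literals are const in standard C++
  char *new_argv[ ] = {(char *) "cat",
                       (char *) "test.txt",
                       (char *) 0
                      };
  execvp("/bin/cat", new_argv );
  perror("exec failure ");        // reached only if execvp fails
  return 1;
}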

When compiled and run as p3.5 , the output of this program will be the same as the output from the first run of Program 3.4.
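When the argument list is not known until run time, the argv-like array can be assembled from a container. The sketch below is one possible approach, not the only one: the helper name run_command is hypothetical, it assumes a C++11 compiler, and it runs the command in a child process so that the caller survives to collect the exit status.

```cpp
#include <string>
#include <vector>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: run the command held in `args` (args[0] is the
// program name) in a child process. Returns its exit status, or -1 on error.
int run_command(std::vector<std::string> args) {
  if (args.empty())
    return -1;
  std::vector<char *> argv_like;
  for (std::string &a : args)
    argv_like.push_back(&a[0]);     // pointer into each string's own buffer
  argv_like.push_back(nullptr);     // the array must end with a null pointer

  pid_t pid = fork();
  if (pid == -1)
    return -1;
  if (pid == 0) {
    execvp(argv_like[0], argv_like.data());
    _exit(127);                     // reached only if the exec failed
  }
  int status = 0;
  if (waitpid(pid, &status, 0) == -1)
    return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call such as run_command({"cat", "-n", "test.txt"}) would then correspond to the command line used in Figure 3.5.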
