Program Invocation

Program invocation is provided by a flexible programmatic API that's buttressed by even more accommodating programs, such as command shells. History has shown that it's quite easy to shoot yourself in the foot when attempting to run external programs. The following sections explain calling programs directly through the system call interface and calling programs indirectly through an intermediary, such as a command shell or library code.

Direct Invocation

Processes are a generic data structure that OSs use to represent the single execution of a program. So far, you've seen that new processes are created by copying an existing process with fork(). Now you see how a process can load and run a program.

A process typically runs a new program by calling one of the exec family of functions. On most UNIX systems, several variations of these functions are provided by the standard libraries, which all end up using one powerful system call, execve(), which has the following prototype:

int execve(const char *path, char *const argv[], char *const envp[]);

The first parameter, path, is a pathname that specifies the program to run. The second parameter, argv, is a pointer to command-line arguments for the program. The third argument, envp, is a pointer to environment variables for the program.

Note

The standard C libraries (libc) supplied with contemporary UNIX-based OSs provide a number of different functions to call a new program directly: execl(), execlp(), execle(), execv(), and execvp(). These functions provide slightly differing interfaces to the execve() system call, so when execve() is mentioned in this section, any of these functions should be considered to behave in the same manner.

The command-line arguments pointed to by argv are an array of pointers to character strings with a NULL pointer marking the end of the array. Each pointer in the array points to a different command-line argument for the program. By convention, the first argument, known as argument zero, or argv[0], contains the name of the program. This argument is controlled by the person who calls exec, so programs can't place any trust in it. The rest of the arguments are also C strings, and they can contain almost anything without a NUL byte. The environment argument, envp, points to a similarly constructed array of pointers to strings. Environment variables are explained in detail in "Environment Arrays" later in this chapter.
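To make the layout of the two arrays concrete, here is a minimal sketch. The helper names run_program() and demo_execve_arrays(), the GREETING variable, and the assumption that /bin/sh exists are illustrative and not part of the original text.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not from the original text): run `path` with the
 * given argument and environment arrays. Returns the child's exit status,
 * or -1 if the child could not be created or did not exit normally. */
int run_program(const char *path, char *const argv[], char *const envp[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execve(path, argv, envp);
        _exit(127);              /* reached only if execve() failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Illustrative call, assuming /bin/sh exists on the system. */
int demo_execve_arrays(void)
{
    /* argv[0] is only a conventional name chosen by the caller. */
    char *argv[] = { "sh", "-c", "test \"$GREETING\" = hello", NULL };
    /* Each environment entry is a "NAME=value" string; NULL ends the array. */
    char *envp[] = { "GREETING=hello", NULL };

    return run_program("/bin/sh", argv, envp);
}
```

The child sees only what the caller put in the two arrays: nothing in argv or envp is inherited implicitly when execve() is called this way.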

Dangerous execve() Variants

All exec functions are just variants of the execve() system call, so they should be regarded similarly in terms of process execution issues. Two of these variants, execvp() and execlp(), have an additional security concern. If either function is used with a filename that contains no slashes, it uses the PATH environment variable to resolve the location of the executable. (The PATH variable is discussed in "Common Environment Variables" later in this chapter.) So if either function is invoked with a bare filename rather than a full pathname, users can set PATH to point to an arbitrary location on the file system where they can create a program to run code of their choosing. The following code shows a vulnerable invocation:

int print_directory_listing(char *path)
{
    char *av[] = { "ls", "-l", path, NULL };
    int rc;

    rc = fork();
    if(rc < 0)
        return -1;

    if(rc == 0)
        execvp("ls", av);

    return 0;
}

If this process is running with special privileges, or if a remote user can set environment variables for a program containing this code, setting the PATH variable to something like PATH=/tmp runs the /tmp/ls file if it exists.

Both execvp() and execlp() have another behavioral quirk that might be exploitable in certain situations. Regardless of whether a full path is supplied in the filename argument, if the underlying call to execve() fails with the error code ENOEXEC (indicating an error loading the binary), these functions invoke the shell to try to run the file. This means all shell metacharacters and environment variables (discussed in more detail in "Indirect Invocation") come into play.
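The fallback can be pictured as the library rebuilding the argument array with the shell in front. The sketch below is a simplified model, not any particular libc's source; the helper name build_shell_fallback_argv() and the /bin/sh path are assumptions for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model of the ENOEXEC fallback: returns a newly allocated
 * array of the form { "/bin/sh", file, argv[1], ..., NULL }, or NULL if
 * allocation fails. The caller frees the array (not the strings). */
char **build_shell_fallback_argv(const char *file, char *const argv[])
{
    size_t argc = 0, i;
    char **out;

    while (argv[argc] != NULL)
        argc++;

    /* Slots needed: shell name, script path, argv[1..], terminating NULL. */
    out = malloc((argc + 3) * sizeof *out);
    if (out == NULL)
        return NULL;

    out[0] = "/bin/sh";
    out[1] = (char *)file;
    for (i = 1; i < argc; i++)
        out[i + 1] = argv[i];
    out[argc > 0 ? argc + 1 : 2] = NULL;
    return out;
}
```

In this model the file the caller meant to execute becomes an argument to the shell, so the shell interprets its contents.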

Auditing Tip

When auditing code that's running with special privileges or running remotely in a way that allows users to affect the environment, verify that any call to execvp() or execlp() is secure. Any situation in which full pathnames aren't specified, or the path for the program being run is in any way controlled by users, is potentially dangerous.
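A safer counterpart to the earlier listing function might look like the following sketch. It assumes ls is installed at /bin/ls and that a fixed minimal environment is acceptable. The function name and the environment contents are illustrative choices, not taken from the original text.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical safer variant: full pathname, no PATH search, fixed
 * environment. Returns the exit status of ls, or -1 on failure. */
int print_directory_listing_safe(char *path)
{
    /* "--" ends option parsing so a path starting with '-' is not read
     * as a switch (see "The Argument Array"). */
    char *av[] = { "ls", "-l", "--", path, NULL };
    char *ev[] = { "PATH=/bin:/usr/bin", NULL };
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* execve() never consults PATH and has no shell fallback. */
        execve("/bin/ls", av, ev);
        _exit(127);              /* reached only if execve() failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Unlike the vulnerable version, the child also exits if the exec fails instead of returning into the parent's code.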

The Argument Array

When a program is called directly, you need to know how the argument list is built. Most programs recognize option switches by a leading - character. Programs that fail to adequately sanitize user input supplied as arguments might therefore allow users to supply switches that the developer never intended.

David Sacerdote of Secure Networks Inc. (SNI) discovered a way to abuse additional command-line arguments in the vacation program (archived at http://insecure.org/sploits/vacation_program_hole.html), which can be used to automatically respond to incoming e-mails with a form letter saying the person is on vacation. The following code is responsible for sending the response message:

/*
 * sendmessage --
 *      exec sendmail to send the vacation file to sender
 */
void sendmessage(myname)
        char *myname;
{
        FILE *mfp, *sfp;
        int i;
        int pvect[2];
        char buf[MAXLINE];

        mfp = fopen(VMSG, "r");
        if (mfp == NULL) {
                syslog(LOG_NOTICE, "vacation: no ~%s/%s "
                       "file.\n", myname, VMSG);
                exit(1);
        }
        if (pipe(pvect) < 0) {
                syslog(LOG_ERR, "vacation: pipe: %s",
                    strerror(errno));
                exit(1);
        }
        i = vfork();
        if (i < 0) {
                syslog(LOG_ERR, "vacation: fork: %s",
                    strerror(errno));
                exit(1);
        }
        if (i == 0) {
                dup2(pvect[0], 0);
                close(pvect[0]);
                close(pvect[1]);
                fclose(mfp);
                execl(_PATH_SENDMAIL, "sendmail", "-f",
                    myname, from, NULL);
                syslog(LOG_ERR, "vacation: can't exec %s: %s",
                        _PATH_SENDMAIL, strerror(errno));
                _exit(1);
        }
        close(pvect[0]);
        sfp = fdopen(pvect[1], "w");
        fprintf(sfp, "To: %s\n", from);
        while (fgets(buf, sizeof buf, mfp))
                fputs(buf, sfp);
        fclose(mfp);
        fclose(sfp);
}

The vulnerability is that myname is taken verbatim from the originating e-mail address of the incoming message and used as a command-line argument when sendmail is run with the execl() function. If someone sends an e-mail to a person on vacation from the address -C/some/file/here, sendmail sees a command-line argument starting with -C. This argument typically specifies an alternative configuration file, and Sacerdote was able to leverage this to get sendmail to run arbitrary commands on behalf of the vacationing user.

Typically, when looking for vulnerabilities of this nature, you must examine what invoked applications do with command-line arguments. Most of the time, they parse option arguments by using the getopt() function. In this case, you need to be aware of these points:

If an option takes an argument, it can be specified in the same string or in separate strings. For example, if the argument -C takes a file parameter, the argv array can contain one entry with just the string -C followed by another entry containing the filename, or it can contain just one entry in the form -C/filename.
If an argument consisting of just two dashes is specified (--), any switches provided after that argument are ignored and treated as regular command-line arguments. For example, the command line ./program -f file -- -C file results in the -f switch being processed normally and the -C switch being ignored by getopt().

The first point gives attackers more of a chance to exploit a potential vulnerability. It might be useful when user input hasn't been filtered adequately, but users can specify only a single argument. A bug of this nature existed in old versions of the Linux kernel when it invoked the modprobe application to automatically load kernel modules on a user's behalf. The vulnerable code is shown in Listing 10-1.

Listing 10-1. Kernel Probe Vulnerability in Linux 2.2

static int exec_modprobe(void * module_name)
{
    static char * envp[] = { "HOME=/", "TERM=linux",
        "PATH=/sbin:/usr/sbin:/bin:/usr/bin", NULL };
    char *argv[] = { modprobe_path, "-s", "-k",
        (char*)module_name, NULL };
    int i;

    use_init_file_context();

    ...

    /* Allow execve args to be in kernel space. */
    set_fs(KERNEL_DS);

    /* Go, go, go... */
    if (execve(modprobe_path, argv, envp) < 0) {
        printk(KERN_ERR
              "kmod: failed to exec %s -s -k %s, errno ="
              " %d\n",
              modprobe_path, (char*) module_name, errno);
        return -errno;
    }
    return 0;
}

The Linux kernel would run modprobe in certain circumstances to locate a module for handling a user-specified device. Using the ping utility (a setuid program was required to trigger the vulnerable code path), users could specify a device name with a leading dash, which resulted in modprobe interpreting the value as an argument switch rather than a normal argument. Using the -C switch, local users could exploit this vulnerability to gain root privileges.

The second point listed previously gives developers an easy-to-use mechanism for avoiding security problems when building argument lists. The Linux kernel example in Listing 10-1 was fixed by inserting a -- argument (among other things) to prevent future attacks of this nature. When auditing code where a program builds an argument list and calls another program, keep in mind that getopt() interprets only the arguments preceding --.
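As a rough illustration of that mechanism (not the actual kernel patch), an argument list can be built so that the untrusted value always follows a -- entry. The helper name build_modprobe_argv() and its parameters are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: fills argv with
 *   { modprobe_path, "-s", "-k", "--", module_name, NULL }
 * Returns 0 on success, or -1 if the caller's array is too small. */
int build_modprobe_argv(char *modprobe_path, char *module_name,
                        char *argv[], size_t argv_len)
{
    if (argv_len < 6)
        return -1;

    argv[0] = modprobe_path;
    argv[1] = "-s";
    argv[2] = "-k";
    argv[3] = "--";          /* everything after this is a plain argument */
    argv[4] = module_name;   /* untrusted: may begin with '-' */
    argv[5] = NULL;
    return 0;
}
```

With this layout, a module name such as -C/tmp/evil.conf reaches the invoked program as an ordinary operand, provided that program parses its options with getopt() or follows the same -- convention.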

Indirect Invocation

Many libraries and language features allow developers to run a program or command by using a command subshell. Generally, these approaches aren't as safe as a straightforward execve(), because command shells are general-purpose applications that offer a lot of flexibility and potentially dangerous extraneous functionality. The issues outlined in this section apply to programs that use a command shell for various purposes and they also apply to shell scripts.

The library functions popen() and system() are the most popular C mechanisms for making use of a command subshell. Perl provides similar functionality through its flexible open() function as well as the system() function and backtick operators. Other languages also provide similar functionality; Python offers os.system(), os.popen(), and the subprocess module, and even Java has the Runtime.getRuntime().exec() method.

Metacharacters

A shell command line can contain a formidable number of metacharacters. Stripping them all out is difficult unless you use a white-list approach. Metacharacters can be useful to attackers in a number of ways, listed in Table 10-1.

Table 10-1. Metacharacter Uses
Metacharacter Type	Explanation
Command separators	Command separators might be used to specify more commands in a shell invocation than the developer intended.
File redirection	Redirection operators might be used to trick a program into reading or writing files (or sockets, pipes, and so on) from the system. This might allow users to see contents of files that they shouldn't be able to or even create new files.
Evaluation operators	Most shells provide evaluation operators that perform some statement or expression and return a result. If users can specify them, they might be able to run arbitrary commands on the system.
Variable definitions	By specifying new environment variables or being able to include previously defined ones, users might be able to adversely affect the way the shell performs certain functions. A good example is redefining the IFS environment variable (discussed later in "Common Environment Variables").

The subject of dealing with shell metacharacters (and associated data filters) was covered in depth in Chapter 8, "Strings and Metacharacters."
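A white-list filter of the kind mentioned above can be quite small. The following sketch accepts a value only if every character belongs to a fixed allowed set. The function name is_safe_shell_word() and the particular set chosen (ASCII letters, digits, dot, underscore, and hyphen) are assumptions for illustration, and a real application would tailor the set to its data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical white-list check: returns 1 if `s` is non-empty, does not
 * start with '-', and contains only characters from the allowed set;
 * returns 0 otherwise. */
int is_safe_shell_word(const char *s)
{
    static const char allowed[] =
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789._-";

    /* Reject a leading '-' so the value can't be taken for a switch. */
    if (s == NULL || s[0] == '\0' || s[0] == '-')
        return 0;

    /* strspn() gives the length of the leading run of allowed characters;
     * the string is acceptable only if that run reaches the terminator. */
    return s[strspn(s, allowed)] == '\0';
}
```

Because the check names what is permitted rather than what is forbidden, separators, redirection operators, evaluation operators, and whitespace are all rejected without being listed individually.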

Globbing

In addition to the standard metacharacters a typical shell processes, it also supports the use of special characters for file system access. These characters, called globbing characters, are wildcards that can be used to create a pattern template for locating files based on the specified criteria. Most people use simple globbing patterns on a daily basis, when performing commands such as this one:

ls *.c

The characters that glob() interprets are ., ?, *, [, ], {, and }. Globbing functionality is inherent in shell interpreters as well as a number of other places, such as FTP daemons. If programs aren't careful to filter out these characters, they might render themselves susceptible to files being accessed that weren't intended.
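The same expansion the shell performs is available to programs through the POSIX glob() function. The sketch below simply counts how many paths a pattern expands to, and the helper name count_glob_matches() is hypothetical.

```c
#include <glob.h>
#include <stddef.h>

/* Hypothetical helper: returns the number of paths `pattern` expands to,
 * 0 if nothing matches, or -1 on any other glob() error. */
long count_glob_matches(const char *pattern)
{
    glob_t g = {0};
    long n = -1;
    int rc = glob(pattern, 0, NULL, &g);

    if (rc == 0)
        n = (long)g.gl_pathc;    /* gl_pathv holds the matched paths */
    else if (rc == GLOB_NOMATCH)
        n = 0;

    globfree(&g);
    return n;
}
```

A pattern taken from user input, such as /*, can expand to far more files than the developer expected, which is why unfiltered globbing characters deserve attention during an audit.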

Globbing Security Problems

In many circumstances, such as FTP, users can take advantage of globbing without it representing a security threat. However, because of implementation problems within the glob() function in a number of libc implementations, users have been able to supply malformed pathnames that result in memory corruption vulnerabilities, both buffer overflows and double-frees. Anthony Osborne and John McDonald (one of this book's authors) published an advisory for Network Associates (NAI)'s Covert Labs that outlined multiple buffer overflows in several glob() implementations used in FTP daemons. The advisory is archived at www.securityfocus.com/advisories/3202.

Environment Issues

In addition to the problems with metacharacter and globbing character filters, an application is also at risk because of the shell's inherent interaction with its environment. Environment trust issues are covered in "Environment Arrays" later in this chapter, but they are mentioned here because shells tend to alter their behavior significantly based on certain environment variable settings. Depending on the shell, certain environment variables can be supplied that cause the shell to read arbitrary files on the file system and, in some cases, execute them. Most modern libc implementations filter out potentially dangerous environment variables (such as PATH, all the LD_* variables, and so on) when a setuid root process invokes a shell. However, this filtering is very basic and might not be sufficient in some cases. In fact, shell behavior can change dramatically in response to a wide variety of environment variables. For example, the sudo application was at one point vulnerable to attack when running shell scripts because of a feature in bash: certain versions of bash search for environment variables whose values begin with () and then create a function inside the running shell script with the contents of any matching environment variable. (The vulnerability is documented at www.courtesan.com/sudo/alerts/bash_functions.html.) Although this behavior might seem quirky, the point remains that shells frequently expand their functionality in response to certain environment variables. This rapid expansion, combined with each shell using slightly different environment variables to achieve similar goals, can make it hard for applications to protect themselves adequately. Most applications that filter environment variables take a black-list approach, removing known problem-prone variables, rather than a white-list approach, so you often find that unanticipated feature enhancements in shell implementations introduce the capability to exploit a script running with elevated privileges.
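For contrast with the black-list approach, a white-list environment filter might be sketched as follows. The function names, the allowed variable names (TERM, LANG, TZ), and the extra rejection of values beginning with () are illustrative assumptions, not a vetted policy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical check: returns 1 only if `entry` is "NAME=value" with NAME
 * on the allowed list and a value that doesn't start with "()". */
int is_allowed_env_entry(const char *entry)
{
    static const char *const allowed[] = { "TERM", "LANG", "TZ" };
    const char *eq;
    size_t name_len, i;

    if (entry == NULL || (eq = strchr(entry, '=')) == NULL)
        return 0;
    if (strncmp(eq + 1, "()", 2) == 0)   /* looks like a bash function */
        return 0;

    name_len = (size_t)(eq - entry);
    for (i = 0; i < sizeof allowed / sizeof allowed[0]; i++) {
        if (strlen(allowed[i]) == name_len &&
            strncmp(entry, allowed[i], name_len) == 0)
            return 1;
    }
    return 0;
}

/* Copies the allowed entries of `in` into `out` (capacity `out_len`,
 * including the terminating NULL). Returns the number of entries kept. */
size_t filter_env(char *const in[], char *out[], size_t out_len)
{
    size_t i, n = 0;

    if (out_len == 0)
        return 0;
    for (i = 0; in[i] != NULL && n + 1 < out_len; i++) {
        if (is_allowed_env_entry(in[i]))
            out[n++] = in[i];
    }
    out[n] = NULL;
    return n;
}
```

The resulting array can be passed as the envp argument of execve(). Variables that a future shell release starts honoring are excluded by default, because nothing outside the list is passed through.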

Setuid Shell Scripts

Running shell scripts with elevated privileges is always a bad idea. What makes it so dangerous is that the shell's flexibility can sometimes be used to trick the script into doing something it shouldn't. Using metacharacters and globbing, it might be possible to cause the script to run arbitrary commands with whatever privileges the shell script is running with.

An additional problem with running shell scripts is that they aren't directly invoked. The shell program is invoked with the shell script as an argument, in much the same way execvp() and execlp() work when ENOEXEC is returned. Because of this indirection, symlink attacks might also be possible.