

22.3. Common Security Holes

Now that we have looked at ways of reducing the potential impact of insecure code, we go over some of the most common programming mistakes that lead to security problems. While the rest of this chapter highlights some of the things to look out for, it is by no means a definitive list. Anyone writing programs that need to be secure needs to look beyond just this chapter for guidance.

22.3.1. Buffer Overflows

By far the most common programming mistake that leads to local and remote exploits is a buffer overflow. Here is an example of a program with an exploitable buffer overflow:

    /* bufferoverflow.c */

    #include <limits.h>
    #include <stdio.h>
    #include <string.h>

    int main(int argc, char ** argv) {
        char path[_POSIX_PATH_MAX];

        /* strlen() returns a size_t, which is printed with %zu */
        printf("copying string of length %zu\n", strlen(argv[1]));

        /* deliberately unchecked: this copy is the buffer overflow */
        strcpy(path, argv[1]);

        return 0;
    }

This looks pretty innocuous at first glance; after all, the program does not even really do anything. It does, however, copy a string provided by the user into a fixed space on the stack without making sure there is room on the stack for it. Try running this program with a single, long command-line argument (say, 300 characters). It causes a segmentation fault when the strcpy() writes beyond the space allocated for the path array.

To better understand how a program's stack space is allocated, take a look at Figure 22.1. On most systems, the processor stack grows down; that is, the earlier something is placed on the stack, the higher the logical memory address it gets. Above the first item on the stack is a protected region of memory; any attempt to access it is an error and causes a segmentation fault.

Figure 22.1. Memory Map of an Application's Stack

The next area on the stack contains local variables used by the code that starts the rest of the program. Here, we have called that function _main(), although it may actually get quite complex as it involves things like dynamic loading. When this startup code calls the main() routine for a program, it stores the address that the main() routine should return to after it is finished on the stack. When main() begins, it may need to store some of the microprocessor's registers on the stack so it can reuse those registers, and then it allocates space for its local variables.

Returning to our buffer overflow example, this means that the path variable gets allocated at the bottom of the stack. The byte path[0] is at the very bottom, the next byte is path[1], and so on. When our sample program writes more than _POSIX_PATH_MAX bytes into path, it starts to overwrite other items on the stack. If it keeps going, it tries to write past the top of the stack and causes the segmentation fault we saw.

The real problem occurs if the program writes past the return address on the stack, but does not cause a segmentation fault. That lets it change the return address from the function that is running to any arbitrary address in memory; when the function returns, it will go to this arbitrary address and continue execution at that point.

Exploits that take advantage of buffer overflows typically include some code in the array that is written to the stack, and they set the return address to that code. This technique allows the attacker to execute any arbitrary code with the permissions of the program being attacked. If that program is a network daemon running as root, it allows any remote user root access on the local system!

String handling is not the only place where buffer overflows occur (although it is probably the most common). Reading files is another common location. File formats often store the size of a data element followed by the data item itself. If the stored size is used to allocate a buffer, but the end of the data field is determined by some other means, then a buffer overflow could occur. This type of error has made it possible for Web sites to refer to files that have been corrupted in such a way that reading them causes a remote exploit.

Reading data over a network connection provides one more opportunity for buffer overflows. Many network protocols specify a maximum size for data fields. The BOOTP protocol,[5] for example, fixes all packet sizes at 300 bytes. However, there is nothing stopping another machine from sending a 350-byte BOOTP packet to the network. If some programs on the network are not written properly, they could try to copy that rogue 350-byte packet into space intended for a valid 300-byte BOOTP packet and cause a buffer overflow.

[5] BOOTP is the predecessor to DHCP, which allows machines to learn their IP addresses automatically when they enable their network interfaces.

Localization and translation are two other instigators of buffer overflows. When a program is written for the English language, there is no doubt that a string of 10 characters is long enough to hold the name of a month loaded from a table. When that program gets translated into Spanish, "September" becomes "Septiembre" and a buffer overflow could result. Whenever a program supports different languages and locations, most of the formerly static strings become dynamic, and internal string buffers need to take this into account.

It should be obvious by now that buffer overflows are critical security problems. They are easy to overlook when you are programming (after all, who worries about file names that are longer than _POSIX_PATH_MAX?) and easy to exploit.

There are a number of techniques to eliminate buffer overflows from code. Well-written programs use many of them to carefully allocate buffers of the right size.

The best way of allocating memory for objects is through malloc(), which avoids the problems incurred by overwriting the return address since malloc() does not allocate memory from the stack. Carefully using strlen() to calculate how large a buffer needs to be and dynamically allocating it on the program's heap provides good protection against overflows. Unfortunately, it also provides a good source of memory leaks as every malloc() needs a free(). Some good ways of tracking down memory leaks are discussed in Chapter 7, but even with these tools it can be difficult to know when to free the memory used by an object, especially if dynamic object allocation is being retrofitted into existing code. The alloca() function provides an alternative to malloc():

    #include <alloca.h>

    void * alloca(size_t size);

Like malloc(), alloca() allocates a region of memory size bytes long, and returns a pointer to the beginning of that region. Rather than using memory from the program's heap, it instead allocates memory from the bottom of the program's stack, the same place local variables are stored. Its primary advantage over local variables is that the number of bytes needed can be calculated programmatically rather than guessed; its advantage over malloc() is that the memory is automatically freed when the function returns. This makes alloca() an easy way to allocate memory that is needed only temporarily. As long as the size is calculated properly (do not forget the '\0' at the end of every C language string!), there will not be any buffer overflows.[6]

[6] alloca() is not a standard feature of the C language, but the gcc compiler provides alloca() on most operating systems it supports. In older versions of the gcc compiler (before version 3.3), alloca() did not always interact properly with dynamically sized arrays (another GNU extension), so consider using only one of the two.
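As a minimal sketch of the two allocation styles described above, the following helpers size their buffers from strlen() rather than guessing. The names copy_to_heap() and count_words() are hypothetical and exist only for illustration; the sketch assumes a system that provides <alloca.h>, such as Linux with the GNU C library.

```c
#include <alloca.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of src sized from strlen().
 * The caller must free() the result; NULL means malloc() failed. */
char *copy_to_heap(const char *src)
{
    size_t len = strlen(src) + 1;          /* + 1 for the trailing '\0' */
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, src, len);
    return copy;
}

/* Hypothetical helper: counts space-separated words. strtok() modifies its
 * argument, so it works on a temporary stack copy made with alloca(). */
size_t count_words(const char *src)
{
    size_t len = strlen(src) + 1;
    char *tmp = alloca(len);               /* released when we return */
    size_t n = 0;

    memcpy(tmp, src, len);
    for (char *w = strtok(tmp, " "); w != NULL; w = strtok(NULL, " "))
        n++;
    return n;
}
```

Because alloca() takes its memory from the stack, a very large size can exhaust the stack, so a sketch like count_words() is only reasonable when the length of the input has already been bounded.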

There are a couple of other functions that can make avoiding buffer overflows easier. The strncpy() and strncat() library routines limit how many bytes are written when copying strings around.

    #include <string.h>

    char * strncpy(char * dest, const char * src, size_t max);
    char * strncat(char * dest, const char * src, size_t max);

Both functions behave like their similarly named cousins, strcpy() and strcat(), but they copy at most max bytes from src. With strncpy(), if that limit is hit, the resulting string will not be '\0' terminated, so normal string functions will no longer work on it. (strncat() always writes a terminating '\0' after the bytes it appends, so it can write up to max + 1 bytes.) For this reason, it is normally a good idea to explicitly end the string after calling strncpy(), like this:

    strncpy(dest, src, sizeof(dest));
    dest[sizeof(dest) - 1] = '\0';

It is a very common mistake when using strncat() to pass the total size of dest as the max parameter. This leads to a potential buffer overflow as strncat() appends up to max bytes onto dest; it does not stop copying bytes when the total length of dest reaches max bytes.

While using these functions may make the program perform incorrectly if long strings are present (by making those strings get truncated), this technique prevents buffer overflows in static-sized buffers. In many cases, this is an acceptable trade-off (and does not make the program perform any worse than it would if the buffer overflow were allowed to occur).
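The following sketch wraps both calls so that the size arithmetic is done in one place. The helper names bounded_copy() and bounded_append() are hypothetical; both assume the caller passes the total size of the dest buffer, and bounded_append() assumes dest already holds a '\0' terminated string.

```c
#include <string.h>

/* Hypothetical helper: copies src into dest (destsize bytes in total),
 * truncating if needed and always leaving dest '\0' terminated. */
void bounded_copy(char *dest, size_t destsize, const char *src)
{
    if (destsize == 0)
        return;
    strncpy(dest, src, destsize - 1);
    dest[destsize - 1] = '\0';
}

/* Hypothetical helper: appends src to the string already in dest.
 * The limit passed to strncat() is the room that is left, minus one byte
 * for the '\0' it adds, not the total size of dest. */
void bounded_append(char *dest, size_t destsize, const char *src)
{
    size_t used = strlen(dest);

    if (used + 1 < destsize)
        strncat(dest, src, destsize - used - 1);
}
```

With an 8-byte buffer, copying "abc" and then appending "defghijkl" leaves "abcdefg" in the buffer: the string is truncated, but nothing is written past the end.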

While strncpy() solves the problem of copying a string into a static buffer without overflowing the buffer, the strdup() functions automatically allocate a buffer large enough to hold a string before copying the original string into it.

    #include <string.h>

    char * strdup(const char * src);
    char * strdupa(const char * src);
    char * strndup(const char * src, size_t max);
    char * strndupa(const char * src, size_t max);

The first of these, strdup(), copies the src string into a buffer allocated by malloc() and returns the buffer to the caller, while the second, strdupa(), allocates the buffer with alloca(). Both functions allocate a buffer just long enough to hold the string and the trailing '\0'.

The other two functions, strndup() and strndupa(), copy at most max bytes from src into the buffer (and allocate at most max + 1 bytes) along with a trailing '\0'. strndup() allocates the buffer with malloc() while strndupa() uses alloca().
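Here is a small sketch of the malloc()-based pair in use. The helper first_component() is hypothetical; it uses strndup() when only a prefix of the string is wanted and strdup() when the whole string is.

```c
#define _POSIX_C_SOURCE 200809L            /* for strdup() and strndup() */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a heap copy of the first component of path
 * (everything before the first '/'), or NULL if the allocation fails.
 * The caller must free() the result. */
char *first_component(const char *path)
{
    const char *slash = strchr(path, '/');

    if (slash == NULL)
        return strdup(path);               /* whole string plus '\0' */
    return strndup(path, (size_t)(slash - path));
}
```

In both branches the library picks the buffer size, so there is no fixed-size buffer to overflow; the cost is that the caller has to remember the matching free().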

Another function that often causes buffer overflows is sprintf(). Like strcat() and strcpy(), sprintf() has a variant that makes it easier to protect against overflows.

    #include <stdio.h>

    int snprintf(char * str, size_t max, const char * format, ...);

Trying to determine the size of a buffer required by sprintf() can be tricky, as it depends on items such as the magnitude of any numbers that are formatted (which may or may not need number signs), the formatting arguments that are used, and the length of any strings that are being used by the format. To make it easier to avoid buffer overflows, snprintf() fills in no more than max characters in str, including the terminating '\0'. Unlike strncpy(), snprintf() also terminates the string properly, omitting characters from the end of the formatted string if necessary. It returns the number of characters (not including the final '\0') that the final string would use if enough space were available, whether or not the string had to be truncated to max.[7] If the return value is less than max, then the function completed successfully; if it is the same or greater, then the max limit was encountered.

[7] On some obsolete versions of the C library, it will instead return -1 if the string does not fit; the old version of the C library is no longer maintained and secure programs will not use it, but the snprintf() man page demonstrates code that can handle both variants.

The vsprintf() function has similar problems, and vsnprintf() provides a way to overcome them.
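The return-value check described above can be sketched as follows. The helper build_path() is hypothetical; it treats both a negative return and a return of max or more as failure.

```c
#include <stdio.h>

/* Hypothetical helper: formats "<dir>/<name>" into buf.
 * Returns 0 on success, -1 if the result did not fit or formatting failed.
 * On truncation, buf still holds a '\0' terminated prefix. */
int build_path(char *buf, size_t bufsize, const char *dir, const char *name)
{
    int n = snprintf(buf, bufsize, "%s/%s", dir, name);

    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    return 0;
}
```

Reporting truncation to the caller, rather than silently using the shortened string, matters for values like filenames, where a truncated name refers to a different file.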

22.3.2. Parsing Filenames

It is quite common for privileged applications to provide access to files to untrusted users and let those users provide the filenames they would like to access. A web server is a good example of this; an HTTP URL contains a filename that the web server is requested to send to the remote (untrusted) user. The web server needs to make sure that the file it returns is one that it has been configured to send, and checking filenames for validity must be done carefully.

Imagine a web server that serves files from /home/httpd/html, and it does this by simply adding the filename from the URL it is asked to provide to the end of /home/httpd/html. This will serve up the right file, but it also allows remote users to see any file on the system the web server has access to by requesting a file like ../../../etc/passwd. Those .. directories need to be checked for explicitly and disallowed. The chroot() system call is a good way to make filename handling in programs simpler.
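One way to check for those .. directories is to examine each component of the requested name. The helper has_dotdot() below is a hypothetical sketch; it assumes the name has already been decoded (for example, any URL escapes have been expanded), and it does nothing about symbolic links inside the served directory.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if any '/'-separated component of path is
 * exactly "..", and 0 otherwise. */
int has_dotdot(const char *path)
{
    const char *p = path;

    while (*p != '\0') {
        size_t len = strcspn(p, "/");      /* length of this component */

        if (len == 2 && p[0] == '.' && p[1] == '.')
            return 1;
        p += len;
        while (*p == '/')                  /* skip the separator(s) */
            p++;
    }
    return 0;
}
```

Comparing whole components avoids rejecting legitimate names that merely contain two dots, such as a..b or ..hidden.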

If those filenames are passed to other programs, even more checking needs to be done. For example, if a leading - is used in the filename, it is quite likely that the other program will interpret that filename as a command-line option.

22.3.3. Environment Variables

Programs run with setuid or setgid capabilities need to be extremely careful with their environment settings as those variables are set by the person running the program, allowing an avenue for attack. The most obvious attack is through the PATH environment variable, which changes what directories execlp() and execvp() look for programs. If a privileged program runs other programs, it needs to make sure it runs the right ones! A user who can override a program's search path can easily compromise that program.

There are other environment variables that could be very dangerous; the LD_PRELOAD environment variable lets the user specify a library to load before the standard C library. This can be useful, but is very dangerous in privileged applications (the environment variable is ignored if the real and effective uids are not the same for exactly this reason).

If a program is localized, NLSPATH is also problematic. It lets a user switch the language catalog a program uses, which specifies how strings are translated. This means that, in translated programs, the user can specify the value for any translated string. The string can be made arbitrarily long, necessitating extreme vigilance in buffer allocation. Even more dangerously, if a format string for a function like printf() is translated, the format can change. This means that a string like Hello World, today is %s could become Hello World, today is %c%d%s. It is hard to tell what effect this type of change would have on a program's operation!

All of this means that the best solution for a setuid or setgid program's environment variables is to eliminate them. The clearenv()[8] function erases all values from the environment, leaving it empty. The program can then add back any environment variables it has to have with known values.

[8] Unfortunately, clearenv() has not been well standardized. It is included in recent versions of POSIX but was left out of the Single Unix Standard, and is not available on all Unix-type systems. If you need to support an operating system that does not include it, environ = NULL; should work just as well.
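A minimal sketch of this approach, assuming a C library that provides clearenv() (the GNU C library does), follows. The helper name sanitize_environment() and the values it restores are illustrative assumptions only; a real program would restore whatever variables it actually needs.

```c
#define _DEFAULT_SOURCE 1                  /* for clearenv() and setenv() */
#include <stdlib.h>

/* Hypothetical helper: empties the environment, then restores a small set
 * of variables with fixed values. Returns 0 on success, -1 on failure.
 * The values below are illustrative assumptions, not recommendations. */
int sanitize_environment(void)
{
    if (clearenv() != 0)
        return -1;
    if (setenv("PATH", "/bin:/usr/bin", 1) != 0)
        return -1;
    if (setenv("LANG", "C", 1) != 0)
        return -1;
    return 0;
}
```

Calling a function like this at the very start of main(), before any other library code runs, leaves no window in which a user-supplied value can be consulted.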

22.3.4. Running the Shell

Running the system shell from any program where security is a concern is a bad idea. It makes a couple of the problems that already have been discussed more difficult to protect against.

Every string passed to a shell needs to be very carefully validated. A '\n' or ; embedded in a string could cause the shell to see two commands instead of one, for example. If the string contains back tick characters (`) or the $() sequence, the shell runs another program to build the full command-line argument. Normal shell expansion also takes place, making environment variables and globbing available to attackers. The IFS environment variable lets characters other than space and tab separate fields when command lines are parsed by the shell, opening up new avenues of attack. Other special characters, like <, >, and |, provide even more ways to build up command lines that do not behave as a program intended.

Checking all of these possibilities is very difficult to get right. The best way, by far, of avoiding all of the possible attacks against a shell is to avoid running one in the first place. Functions like pipe(), fork(), exec(), and glob() make it reasonably easy to perform most tasks the shell is normally used for without opening the Pandora's box of shell command-line expansion.
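As a sketch of running a program without a shell, the hypothetical helper run_program() below uses fork() and execv() directly. Each argument reaches the new program exactly as given, so characters such as ; or $ have no special meaning. For brevity the sketch does not retry waitpid() if it is interrupted by a signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: runs the program at path with the given argument
 * vector and waits for it to finish. No shell is involved.
 * Returns the child's exit status, or -1 on error. */
int run_program(const char *path, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execv(path, argv);
        _exit(127);                        /* only reached if execv() failed */
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, calling run_program("/bin/echo", args) with args set to { "echo", "hello; echo injected", NULL } prints the second element as a single literal string; a shell given the same text would have run two commands. Using execv() with a full path, rather than execvp(), also keeps PATH out of the picture.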

22.3.5. Creating Temporary Files

It is quite common for programs to use temporary files; Linux even provides special directories (/tmp and /var/tmp) for this purpose. Unfortunately, using temporary files in a secure manner can be very tricky. The best way to use temporary files is to create them in a directory that can be accessed only by the program's effective uid; the home directory of that user would be a good choice. Although this approach makes using temporary files safe and easy, most programmers do not like it, as it clutters directories and the files will probably never get erased if the program fails unexpectedly.

Imagine a program that is run by the root user that creates a shell script in a temporary file and then runs that script. To let multiple copies of the program run at the same time, perhaps it includes the program's pid as part of the filename, and creates the file with code like this:

    char fn[200];
    int fd;

    sprintf(fn, "/tmp/myprogram.%d", getpid());
    fd = open(fn, O_CREAT | O_RDWR | O_TRUNC, 0600);

The program creates a unique filename and truncates whatever file used to be there before writing to it. While it may look reasonable at first glance, it is actually trivial to exploit. If the file the program tries to create already exists as a symbolic link, the open call follows that symbolic link and opens whatever file it points to. One exploit is to create symbolic links in /tmp using many (or all) possible pids that point to a file like /etc/passwd, which would cause the system's password file to be overwritten when this program is run, resulting in a denial-of-service attack.

A more dangerous attack is for those symbolic links to be pointed at a file the attacker owns (or, equivalently, for the attacker to create regular files in /tmp with all of the possible names). When the file is opened, the targeted file will be truncated, but between the time the file is opened and the time the program gets executed, the attacker (who still owns the file) can write anything they like into it (adding a line like chmod u+s /bin/sh would certainly be advantageous in a shell script running as root!), creating an easy attack. While this may seem difficult to time properly, these types of race conditions are often exploited, leading to security compromises. If the program was setuid instead of run as root, the exploit actually becomes much easier as the user can send SIGSTOP to the program right after it opens the file, and then send SIGCONT after exploiting this race condition.

Adding O_EXCL to the open() call prevents open() from opening a file that is a symbolic link as well as a file that already exists. In this particular case, a simple denial-of-service attack also exists, as the code will fail if the first filename tried exists, but this is easily remedied by placing the open() in a loop that tries different filenames until one works.
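The loop mentioned above might look like the following sketch. The helper create_exclusive() and its naming scheme (a prefix followed by a small counter) are hypothetical; the names it tries are still predictable, so this only illustrates the O_EXCL retry pattern.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>

/* Hypothetical helper: creates a new file named "<prefix>.<n>" for the
 * first n in 0..99 that does not already exist. O_EXCL makes open() fail
 * with EEXIST rather than follow a symbolic link or reuse an existing file.
 * On success the name is left in fn and the descriptor is returned;
 * -1 means no file could be created. */
int create_exclusive(const char *prefix, char *fn, size_t fnsize)
{
    for (int i = 0; i < 100; i++) {
        int n = snprintf(fn, fnsize, "%s.%d", prefix, i);
        int fd;

        if (n < 0 || (size_t)n >= fnsize)
            return -1;                     /* name would be truncated */
        fd = open(fn, O_CREAT | O_EXCL | O_RDWR, 0600);
        if (fd >= 0)
            return fd;
        if (errno != EEXIST)
            return -1;                     /* some other failure */
    }
    return -1;
}
```

O_TRUNC is no longer needed, since a file that open() has just created with O_EXCL is necessarily empty.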

A better way to create temporary files is by using POSIX's mkstemp() library function, which ensures that the file was created properly.[9]

[9] There are a few other library functions that deal with temporary files, such as tmpnam(), tempnam(), mktemp(), and tmpfile(). Unfortunately, using any of them is little help as they leave exploitable race conditions in programs that are not carefully implemented.

 int mkstemp(char * template); 

The template is a filename whose last six characters must be "XXXXXX". The last part is replaced with characters that make the filename unique in the file system; this approach allows mkstemp() to try different filenames until one is found that works. The template is updated to contain the filename that was used (allowing the program to remove the file) and a file descriptor referring to the temporary file is returned. If the function fails, -1 is returned.

Older versions of Linux's C library created the file with mode 0666 (world read/write), and depended on the program's umask to get the proper permissions on the file. More recent versions allow only the user to read and write from the file, but as POSIX does not specify either behavior, it is a good idea to explicitly set the process's umask (077 would be a good choice!) before calling mkstemp().
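Putting the umask advice together with mkstemp() gives a sketch like the one below. The helper names make_private_tempfile() and discard_tempfile() are hypothetical. The argument must be a writable character array ending in "XXXXXX", not a string literal, because mkstemp() modifies it.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: creates a temporary file from tmpl with the umask
 * set to 077, so the file is private whichever mode the C library asks for.
 * Returns the file descriptor, or -1 on failure. */
int make_private_tempfile(char *tmpl)
{
    mode_t old = umask(077);               /* no group or other access */
    int fd = mkstemp(tmpl);

    umask(old);                            /* restore the caller's umask */
    return fd;
}

/* Hypothetical helper: closes the descriptor and removes the file, using
 * the name that mkstemp() wrote back into the template. */
void discard_tempfile(int fd, const char *name)
{
    close(fd);
    unlink(name);
}
```

A caller would declare something like char name[] = "/tmp/myprogram.XXXXXX"; and pass name to make_private_tempfile(); the umask is a process-wide setting, so this sketch is only appropriate where no other thread is creating files at the same time.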

Linux, and a few other operating systems, provide mkdtemp() for creating temporary directories.

 char * mkdtemp(char * template); 

The template works the same as it does for mkstemp(), but the function returns a pointer to template on success and NULL if it fails.
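A short sketch of mkdtemp() in use follows. The helper make_work_dir() and the "work.XXXXXX" naming are hypothetical; the helper builds the template with snprintf() so that an overly long base directory is reported rather than truncated.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: creates a new directory named "<base>/work.XXXXXX"
 * (with the X's replaced) and leaves its name in buf.
 * Returns buf on success, or NULL on failure. */
char *make_work_dir(const char *base, char *buf, size_t bufsize)
{
    int n = snprintf(buf, bufsize, "%s/work.XXXXXX", base);

    if (n < 0 || (size_t)n >= bufsize)
        return NULL;                       /* template would be truncated */
    return mkdtemp(buf);
}
```

Files created inside the new directory can then use ordinary, fixed names, as long as the directory itself stays accessible only to the program's user.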

Most operating systems that provide mkdtemp() also provide a mktemp program that allows shell scripts to create both temporary files and directories in a secure manner.

There are other problems with temporary files that have not been covered here. They include race conditions added by temporary directories residing on networked (especially NFS) file systems as well as by programs that regularly remove old files from those directories, and the extreme care that needs to be taken to reopen temporary files after they have been created.

For details on these and other problems with temporary files, take a look at David A. Wheeler's HOWTO mentioned on page 531. If you need to do any of these things, it is probably a better idea just to figure out how to create the files in the effective user's home directory instead.

22.3.6. Race Conditions and Signal Handlers

Any time an attacker can cause a program to behave in an incorrect manner there is the potential for an exploit. Mistakes that seem as innocuous as freeing the same portion of memory twice have been successfully exploited in the past, highlighting the need for privileged programs to be very carefully written.

Race conditions, and the signal handlers that can easily cause race conditions, are a rich source of program bugs. Common mistakes in writing signal handlers include:

  • Performing dynamic memory allocation. Memory allocation functions are not reentrant, and should not be used from a signal handler.

  • Using functions other than those appearing in Table 12.2. Programs that call functions such as printf() from a signal handler have race conditions, as printf() has internal buffers and is not reentrant.

  • Not properly blocking other signals. Most signal handlers are not meant to be reentrant, but it is quite common for a signal handler that handles multiple signals to not automatically block those other signals. Using sigaction() makes it easy to get this right as long as the programmer is diligent.

  • Not blocking signals around areas of code that modify variables that a signal handler also accesses. (These code regions are often called critical regions.)
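The following sketch shows one way to avoid the mistakes listed above: the handler does nothing but set a flag, sigaction() is told to block the other handled signal while the handler runs, and the main program blocks both signals around the critical region that reads and clears the flag. The function and variable names are hypothetical, and the choice of SIGHUP and SIGTERM is only an example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* sig_atomic_t is the type C guarantees can be written safely from a
 * signal handler. */
static volatile sig_atomic_t got_signal = 0;

static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;                /* no malloc(), no printf() in here */
}

/* Hypothetical helper: installs on_signal() for SIGHUP and SIGTERM, with
 * each signal blocked while the handler for the other one runs.
 * Returns 0 on success, -1 on failure. */
int install_handlers(void)
{
    struct sigaction sa = { 0 };

    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    sigaddset(&sa.sa_mask, SIGHUP);
    sigaddset(&sa.sa_mask, SIGTERM);

    if (sigaction(SIGHUP, &sa, NULL) != 0 ||
        sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* Hypothetical helper: reads and clears the flag with both signals
 * blocked, so a signal cannot arrive between the read and the reset. */
int take_signal_flag(void)
{
    sigset_t block, old;
    int seen;

    sigemptyset(&block);
    sigaddset(&block, SIGHUP);
    sigaddset(&block, SIGTERM);
    sigprocmask(SIG_BLOCK, &block, &old);
    seen = got_signal;
    got_signal = 0;
    sigprocmask(SIG_SETMASK, &old, NULL);
    return seen;
}
```

The real work triggered by the signal is then done in the main program, which calls take_signal_flag() at a convenient point and is free to use any library function it likes.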

While signal-induced race conditions may not seem dangerous, networking code, setuid, and setgid programs can all have signals sent by untrusted users. Sending out-of-band data to a program can cause a SIGURG to be sent, while setuid and setgid programs can be sent signals by the user who runs them as the real uid of those processes does not change. Even if those programs change their real uids to prevent signals from being sent, when the user closes a terminal, all programs using that terminal are sent a SIGHUP.

22.3.7. Closing File Descriptors

On Linux and Unix systems, file descriptors are normally inherited across exec() system calls (and are always inherited across fork() and vfork()). In most cases, this behavior is not particularly desirable, as it is only stdin, stdout, and stderr that ought to be shared. To prevent programs that a privileged process runs from having access to files they should not have via an inherited file descriptor, it is important that programs carefully close all of the file descriptors to which the new program should not have access, which can be problematic if your program calls library functions which open files without closing them. One way of closing these file descriptors is to blindly close all of the file descriptors from descriptor number 3 (the one just after stderr) to an arbitrary, large value (say, 100 or 1024).[10] For most programs, this ensures that all of the proper file descriptors have been closed.[11]

[10] Linux allows programs to open a very large number of files. Processes that are run as root can open millions of files simultaneously, but most distributions set a resource limit on the number of files a user process can open. This resource limit also limits the maximum file descriptor that can be used with dup2(), providing a usable upper limit for closing file descriptors.

[11] Another way of closing all of the files a program has open is to walk through the process's /proc file system directory that lists all of the files it has open and close each of them. The directory /proc/PID/fd (where PID is the pid of the running process) contains a symbolic link for each file descriptor the process has open, and the name of each symbolic link is the file descriptor to which it corresponds. By reading the contents of the directory, the program can easily close all of the file descriptors it no longer needs.

A better approach is for programs to set the close-on-exec flag for every file they leave open for an extended period of time (including sockets and device files), which prevents any new programs that are run from having access to those files. For details on the close-on-exec flag, see page 196.
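Setting the flag takes two fcntl() calls, as in this sketch. The helper name set_cloexec() is hypothetical; it reads the current descriptor flags first so that any other flags are preserved.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>

/* Hypothetical helper: sets the close-on-exec flag on fd, keeping any
 * other descriptor flags. Returns 0 on success, -1 on failure. */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags < 0)
        return -1;
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0)
        return -1;
    return 0;
}
```

Calling set_cloexec() immediately after each open() or socket() keeps the window in which the descriptor could be inherited as small as possible.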


    Linux Application Development
    Linux Application Development (paperback) (2nd Edition)
    ISBN: 0321563220
    EAN: 2147483647
    Year: 2003
    Pages: 168
