The Stdio File Interface

The UNIX kernel provides an interface for manipulating files based on file descriptors. The C stdio system provides a slightly richer interface for file interaction, which is based on the FILE structure. It's implemented as an abstraction layer on top of the kernel's file descriptor interface. UNIX application code commonly uses stdio in lieu of the lower-level system call API because it automatically implements buffering and a few convenience functions for data formatting. The extra layer of abstraction doesn't change the basic problems discussed so far, but it adds a few scenarios in which vulnerabilities can be introduced.

A number of functions are provided to manipulate files by using these structures and to convert between file structures and file descriptors. A typical FILE structure contains a pointer to buffered file data (if it's a buffered stream), the file descriptor, and flags related to how the stream is opened. The glibc FILE structure is shown in the following code (slightly modified for brevity):

    struct _IO_FILE {
        int _flags;    /* High-order word is _IO_MAGIC; rest is flags. */
    #define _IO_file_flags _flags

        /* The following pointers correspond to the C++ streambuf protocol. */
        /* Note: Tk uses the _IO_read_ptr and _IO_read_end fields directly. */
        char* _IO_read_ptr;     /* Current read pointer */
        char* _IO_read_end;     /* End of get area. */
        char* _IO_read_base;    /* Start of putback+get area. */
        char* _IO_write_base;   /* Start of put area. */
        char* _IO_write_ptr;    /* Current put pointer. */
        char* _IO_write_end;    /* End of put area. */
        char* _IO_buf_base;     /* Start of reserve area. */
        char* _IO_buf_end;      /* End of reserve area. */

        /* The following fields are used to support backing up and undo. */
        char *_IO_save_base;    /* Pointer to start of non-current get area. */
        char *_IO_backup_base;  /* Pointer to first valid character of backup area */
        char *_IO_save_end;     /* Pointer to end of non-current get area. */

        int _fileno;
        ...
        _IO_lock_t *_lock;
    };

These structures can also be used for operating on other resources that can be represented by descriptors, such as sockets.
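The conversion between descriptors and streams can be sketched with the standard fdopen() and fileno() calls. The helper name write_line_fd() is hypothetical and exists only for illustration; it assumes a POSIX system and a descriptor that is open for writing.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>

/* Hypothetical helper: wrap an existing descriptor (file, pipe, or socket)
 * in a stdio stream, write one line to it, and close the stream. */
int write_line_fd(int fd, const char *line)
{
    FILE *fp = fdopen(fd, "w");
    int rc = 0;

    if (fp == NULL)
        return -1;

    /* fileno() maps the stream back to the descriptor it wraps. */
    if (fileno(fp) != fd || fputs(line, fp) == EOF || fputc('\n', fp) == EOF)
        rc = -1;

    /* fclose() flushes the buffer and also closes the underlying descriptor. */
    if (fclose(fp) == EOF)
        rc = -1;
    return rc;
}
```

Note that after fclose() the caller must not use or close fd again, because the stream owned it.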

Opening a File

The fopen() function is used for opening files. It takes a path argument as well as a string indicating the mode for opening the file. The prototype is as follows:

FILE *fopen(const char *path, const char *mode);

Programs that use fopen() are subject to the same potential problems as those that use open(); the specified path must be validated correctly if it contains user-malleable data, and code should be careful not to work in directories where malicious attackers have influence. fopen()'s mode argument is a textual representation of what access the program needs for the file. The modes are listed in Table 9-6.

Table 9-6. File Access Modes for fopen()
Mode String	Meaning
`r`	Open the file for read-only access
`r+`	Open the file for reading and writing. The file offset pointer is pointing to the beginning of the file, so a write to this file causes data already in the file to be overwritten.
`w`	Open the file for writing. If the file already exists, it's truncated to 0 bytes. If it doesn't exist, it's created.
`w+`	Identical to `"r+"` except that the file is truncated if it exists. Additionally, this mode creates the file if it doesn't exist, whereas `"r+"` doesn't.
`a`	Open in append mode; that is, the file is opened for writing. If the file already exists, the file offset pointer points to the end of the file so that writing to the stream doesn't overwrite data already in the file. If the file doesn't exist, it's created.
`a+`	Open in append mode for both reading and writing. The file offset points to the beginning of the file so that data can be read from it, but when data is written, it's appended to the file. If the file doesn't exist, it's created.

Of these six modes, only two don't implicitly create a new file. Therefore, it's very easy to create new files unintentionally with fopen(). Furthermore, because fopen() does not take a permissions bitmask argument, the default permissions of octal 0666 are applied (that is, everyone can read and write to the file). fopen() always further restricts file permissions based on the umask value of the current process. Because the umask is inherited from the parent process, users can set it to 0 before running a privileged application, so that any file the application creates with fopen() is writable by anyone. Therefore, careful attention should be paid to how fopen() is used in a privileged context, especially when it's using modes that result in file creation. Even when the application is creating a temporary file in a location that attackers can't generally control, modifying the umask and then writing malicious data to the file can often result in a compromise of the application.
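One way to avoid depending on the inherited umask is to create the file with open() and an explicit mode, and then wrap the descriptor with fdopen(). The following is a minimal sketch under that assumption; create_private_file() is a hypothetical name, and the 0600 mode is chosen only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: create a new file readable and writable only by
 * its owner, then hand back a stdio stream for it. */
FILE *create_private_file(const char *path)
{
    FILE *fp;
    /* O_EXCL makes the call fail if the path already exists. The umask can
     * only clear bits from the requested mode, so the result is never more
     * permissive than 0600. */
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, S_IRUSR | S_IWUSR);

    if (fd < 0)
        return NULL;

    fp = fdopen(fd, "w");
    if (fp == NULL)
        close(fd);      /* don't leak the descriptor on failure */
    return fp;
}
```

The design choice here is that the permissions are fixed by the program rather than left to the caller's environment.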

Note

Recent glibc fopen() implementations also allow developers to specify an 'x' in the mode string parameter. This causes fopen() to specify the O_EXCL flag to open(), thus ensuring that a new file is created.
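As a brief illustration of the 'x' flag, the sketch below uses the "wx" mode string, which is also part of the C11 standard. The wrapper name create_new_only() is hypothetical. Note that the file is still created with mode 0666 masked by the umask, so this addresses only the clobbering of existing files, not the permissions issue.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: open path for writing only if it does not exist yet.
 * If the file is already present, fopen() returns NULL and, on POSIX
 * systems, errno is set to EEXIST. */
FILE *create_new_only(const char *path)
{
    return fopen(path, "wx");
}
```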

Two other functions are provided for opening file streams: freopen() for reopening a previously opened file stream and fdopen() for creating a FILE structure for a preexisting file descriptor. The freopen() function is vulnerable to the same sort of problems related to file creation as fopen() is; however, fdopen() is not because all it does is create a FILE structure and associate it with a preexisting file descriptor.

Reading from a File

The fread() function can be used to read data from files in a manner similar to the way read() works, except it's intended to read a certain number of elements of a specific size. The prototype for fread() is as follows:

size_t fread(void *buffer, size_t size, size_t count, FILE *fp);

This function reads count elements (each of which is size bytes long) from the file pointed to by fp.

Note

Notice that fread() takes two parameters, indicating the size of an element and the number of elements to be read. Since these parameters will eventually be multiplied together, there is the potential for fread() to contain an integer overflow internally (glibc has this problem). In certain situations, such an overflow might create an opportunity for exploitation.
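A caller can guard against this class of problem by checking the multiplication itself before calling fread(). The following is a minimal sketch; fread_checked() and its bufsize parameter are hypothetical and not part of stdio.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical wrapper: refuse any request whose total size would wrap
 * around or would not fit in the destination buffer.
 * Returns the number of elements read, or 0 if the request is rejected. */
size_t fread_checked(void *buf, size_t bufsize,
                     size_t size, size_t count, FILE *fp)
{
    if (size == 0 || count == 0)
        return 0;
    if (count > SIZE_MAX / size)    /* size * count would overflow */
        return 0;
    if (size * count > bufsize)     /* request exceeds the buffer */
        return 0;
    return fread(buf, size, count, fp);
}
```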

Because many applications process files containing text data, the fgets() function is provided, which is used to read a single line of the input from the file. The function prototype looks like this:

char *fgets(char *buffer, int size, FILE *fp);

This function returns a pointer to the input buffer when it's able to read a line from the file successfully. It returns NULL if an error has occurred or, more commonly, if EOF was encountered. The fgets() function can be used in a manner that exposes the application to problems when parsing files. First, ignoring the return value can lead to problems, as you've seen in previous examples. When fgets() returns NULL, the contents of the destination buffer are unspecified, so a program that fails to check the return value of fgets() probably ends up processing uninitialized data in the destination buffer. An example of this mistake would look like this:

    int read_email(FILE *fp)
    {
        char user[1024], domain[1024];
        char buf[1024];
        char *ptr;
        int length;

        fgets(buf, sizeof(buf), fp);    /* return value is ignored */

        ptr = strchr(buf, '@');
        if(!ptr)
            return 1;

        *ptr++ = '\0';

        strcpy(user, buf);
        strcpy(domain, ptr);
        ...
    }

In the read_email() function, the fact that the return value of fgets() is ignored means the content of buf remains undefined if fgets() fails. The fgets() function guarantees NUL-termination only when it returns successfully, so buf, which is subsequently used as the source of both copies, might contain a string with no terminator inside its 1024 bytes (because it's uninitialized and fgets() hasn't done anything to it). Therefore, either of the calls to strcpy() can potentially overflow the user and domain stack buffers.
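For contrast, a version that checks the return value and bounds both copies might look like the following sketch. The name read_email_checked() and its output parameters are hypothetical; the original function keeps user and domain as locals.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical variant of read_email(): returns 0 on success, -1 if no
 * line could be read, no '@' is present, or a part does not fit. */
int read_email_checked(FILE *fp, char *user, size_t userlen,
                       char *domain, size_t domainlen)
{
    char buf[1024];
    char *at;

    /* Only touch buf if fgets() actually filled it in. */
    if (fgets(buf, sizeof(buf), fp) == NULL)
        return -1;

    buf[strcspn(buf, "\r\n")] = '\0';   /* strip the line terminator */

    at = strchr(buf, '@');
    if (at == NULL)
        return -1;
    *at++ = '\0';

    /* Verify both parts fit before copying. */
    if (strlen(buf) >= userlen || strlen(at) >= domainlen)
        return -1;

    strcpy(user, buf);
    strcpy(domain, at);
    return 0;
}
```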

Note

Saying that the buffer contents aren't touched by fgets() when an error is encountered is an oversimplification, and isn't true for all fgets() implementations. If the file finishes with a partial line, BSD implementations copy the partial line into the buffer and then return NULL, indicating an EOF was encountered. The buffer is not NUL-terminated in this case. Using this behavioral quirk might allow easier exploitation of bugs resulting from unchecked fgets() return values because the stack buffer can have user-controllable data from the file in it. The Linux glibc implementation does not exhibit the same behavior; it copies a partial line into the buffer, NUL-terminates it, and returns successfully; then it signals an error the next time fgets() is called.

Another potential misuse of fgets() happens when a privileged file containing some user-controlled data is incorrectly parsed. For example, say a file is being parsed to check user credentials. Each line contains a valid user in the system and has the format user:password:real name (not unlike the UNIX /etc/passwd file format). The following code authenticates users:

    struct entry {
        char user[256];
        char password[256];
        char name[1024];
    };

    int line_to_entry(char *line, struct entry *ent)
    {
        char *ptr, *nptr;

        ptr = strchr(line, ':');
        if(ptr == NULL || (ptr - line) >= sizeof(ent->user))
            return -1;
        *ptr++ = '\0';
        strcpy(ent->user, line);

        nptr = strchr(ptr, ':');
        if(nptr == NULL || (nptr - ptr) >= sizeof(ent->password))
            return -1;
        *nptr++ = '\0';
        strcpy(ent->password, ptr);

        if(strlen(nptr) >= sizeof(ent->name))
            return -1;
        strcpy(ent->name, nptr);

        return 0;
    }

    int auth_user(char *user, char *password)
    {
        FILE *fp;
        struct entry ent;
        char filedata[1024];

        fp = fopen("/data/users.pwd", "r");
        if(fp == NULL)
            return 0;

        /* flawed: a line longer than the buffer is split across calls */
        while(fgets(filedata, sizeof(filedata), fp) != NULL){
            if(line_to_entry(filedata, &ent) < 0){
                fclose(fp);
                return 0;
            }
            if(strcmp(user, ent.user) != 0)
                continue;
            if(strcmp(password, ent.password) != 0)
                break;      /* correct user, incorrect password */
            fclose(fp);
            return 1;       /* success! */
        }

        fclose(fp);
        return 0;
    }

This example runs through each username and password in the file attempting to authenticate a user. The problem is that the call to fgets() in auth_user() is potentially flawed. The fgets() function reads at most one byte fewer than the specified size (in this case, 1024 bytes), so if the line is longer, only the first 1023 bytes are returned in the first call to fgets(), and the rest of the line is returned in the next call. If attackers could specify a real name written to this file of 1024 bytes (or thereabouts), their username entry would be incorrectly parsed as two entries: the first 1023 bytes being one entry, and the remaining data in the line being a new entry. They could use this result to effectively authenticate themselves as any user they wanted (including adding new usernames to the database).
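One defensive approach is to treat a line that doesn't fit in the buffer as an error and discard the remainder, so the tail is never parsed as a separate entry. The following is a sketch of that idea; read_full_line() and its return-value convention are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one complete line into buf.
 * Returns 1 if a whole line was read (newline stripped), 0 at EOF,
 * and -1 if the line was too long; the rest of it is then discarded.
 * size must be at least 2 and no larger than INT_MAX. */
int read_full_line(char *buf, size_t size, FILE *fp)
{
    size_t len;
    int c;

    if (size < 2)
        return -1;
    if (fgets(buf, (int)size, fp) == NULL)
        return 0;

    len = strlen(buf);
    if (len > 0 && buf[len - 1] == '\n') {
        buf[len - 1] = '\0';
        return 1;
    }

    /* No newline was stored: the line either ended exactly at the buffer
     * limit, ended at EOF, or was truncated. Peek at the next character. */
    c = fgetc(fp);
    if (c == EOF || c == '\n')
        return 1;

    /* Truncated: drain the remainder so it is not read as a new line. */
    while (c != '\n' && c != EOF)
        c = fgetc(fp);
    return -1;
}
```

A caller such as auth_user() could then skip or reject any entry for which this function returns -1, rather than passing both fragments to the parser.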

Finally, the fscanf() function is used to read data of a specified format directly into variables, eliminating the need for application developers to interpret text data as integer values, strings, and so forth. As discussed in Chapter 8, "Strings and Metacharacters," it's easy for buffer overflows to occur when using this function to read in string values. To recap, here's a quick example:

    struct entry {
        char user[256];
        char password[256];
        char name[1024];
    };

    int line_to_entry(FILE *fp, struct entry *ent)
    {
        int rc;

        rc = fscanf(fp, "%s:%s:%s", ent->user,
            ent->password, ent->name);

        return (rc == 3) ? 0 : -1;
    }

This code is a slightly modified example of the fgets() vulnerability you saw previously. Notice how much work using fscanf() cuts out. The function in the example is vulnerable to simple buffer overflows, however, because there are no limits on how large the username, password, and real name entries can be. Using field-width qualifiers can help limit the length of strings being read in so that overflows don't occur.
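A width-limited version might look like the following sketch. It swaps %s for scansets such as %255[^:\n], because %s stops only at whitespace and would therefore consume the colons as well; that substitution, and the name read_entry_bounded(), are assumptions made for this illustration.

```c
#include <stdio.h>

struct entry {
    char user[256];
    char password[256];
    char name[1024];
};

/* Hypothetical bounded parser: each width is one less than the size of its
 * destination, leaving room for the terminating NUL that fscanf() adds. */
int read_entry_bounded(FILE *fp, struct entry *ent)
{
    /* The leading space skips the newline left over from the previous
     * entry, along with any other leading whitespace. */
    int rc = fscanf(fp, " %255[^:\n]:%255[^:\n]:%1023[^\n]",
                    ent->user, ent->password, ent->name);

    return (rc == 3) ? 0 : -1;
}
```

A field that exceeds its width makes the following literal ':' fail to match, so the function returns -1 instead of overflowing the buffer.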

Another important point about fscanf() is that the return value must be checked against the number of conversions the format string requests. As with fgets(), a failure to check the return value means the program might deal with potentially uninitialized variables. It's a little more common for the return value of fscanf() to go unchecked (or inadequately checked) than that of fgets(). Consider the following example:

    struct entry {
        char user[256];
        char password[256];
        char name[1024];
    };

    int line_to_entry(FILE *fp, struct entry *ent)
    {
        if(fscanf(fp, "%s:%s:%s", ent->user,
            ent->password, ent->name) < 0)
            return -1;

        return 0;
    }

This code checks only that fscanf() doesn't return a negative value, but this check is insufficient; if the code encounters a line from the file it's parsing that doesn't contain any separators (:), ent->password and ent->name are never populated, so referencing them would result in the program processing uninitialized data.
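The difference between a negative return value and a short conversion count can be seen with sscanf(), which follows the same return-value rules as fscanf(). The sketch below is illustrative only; count_fields() is a hypothetical name, and it uses width-limited scansets rather than %s so that the colons act as separators.

```c
#include <stdio.h>

/* Hypothetical demonstration: report how many of the three fields were
 * converted. A partially matching line yields 1 or 2, not a negative
 * value, so a "< 0" test would accept it. */
int count_fields(const char *line)
{
    char user[256], password[256], name[1024];

    return sscanf(line, "%255[^:]:%255[^:]:%1023[^\n]",
                  user, password, name);
}
```

Here count_fields("alice:secret:Alice A") yields 3 and count_fields("alice") yields 1; only an empty input produces EOF. Comparing the result against the expected count of 3 is the only check that rejects all of the incomplete cases.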

Note

You might wonder why the discussion on format string vulnerabilities in Chapter 8 mentioned the printf() family of functions but not scanf(). The reason is that the authors have never encountered code in which a user can control part of the format string to a scanf() function, and it seems unlikely that would happen. However, if a user could partially control a format string passed to scanf(), it would likely be exploitable (depending on certain conditions, such as what data is on the stack). Malicious users who supplied extraneous format specifiers could corrupt memory and probably gain complete control over the application.

Writing to a File

Each function described in the previous section has a counterpart that writes data into a file. There are more limitations on users' ability to adversely affect an application that's writing to a file because the data being manipulated is already in memory; the process of writing it into a file doesn't often have as many security implications as reading and operating on data (except, of course, if you have already caused the application to open a sensitive file). Having said that, there are definitely things that can go wrong.

The first problem associated with writing to files is using the printf() functions. Chapter 8 discussed format string vulnerabilities that could occur when users can partially control the format string argument. This class of vulnerabilities allows users to corrupt arbitrary locations in memory by specifying extraneous format specifiers and usually results in a complete compromise of the vulnerable program.
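As a quick reminder of the safe pattern, untrusted data should be passed as an argument to a constant format string and never as the format string itself. The function name below is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: write an untrusted string to a stream verbatim. */
int write_user_string(FILE *fp, const char *untrusted)
{
    /* Unsafe alternative: fprintf(fp, untrusted) would interpret any
     * conversion specifiers embedded in the data. */
    if (fprintf(fp, "%s\n", untrusted) < 0)
        return -1;
    return 0;
}
```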

Another problem with file output is inconsistencies in how the file should be formatted. If users can insert delimiters the application didn't adequately check for, that might allow malformed or additional entries to be inserted in the file. For example, the following code shows a privileged process charged with updating real name information in the system password file (/etc/passwd):

    int update_info(FILE *fp, struct passwd *pw)
    {
        if(fprintf(fp, "%s:%s:%lu:%lu:%s:%s:%s\n",
            pw->pw_name, pw->pw_passwd,
            (unsigned long)pw->pw_uid, (unsigned long)pw->pw_gid,
            pw->pw_gecos, pw->pw_dir, pw->pw_shell) < 0)
            return -1;

        return 0;
    }

This example is almost identical to the putpwent() implementation in glibc. Obviously, any program using this function would need to be careful; if the pw_gecos field, for example, is being updated and contains extra delimiters (in this case, : or \n), it could be used to insert arbitrary password entries in the passwd file. Specifically, if a pw_gecos field contains the string hi:/:/bin/sh\nnew::0:0:, this function would inadvertently create a username called new that has no password and root privileges!
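A caller could reject such input before writing the entry. The following sketch checks every string field for the two delimiters the file format relies on; both function names are hypothetical, and the field list assumes the common POSIX members of struct passwd.

```c
#include <pwd.h>
#include <string.h>

/* Hypothetical check: a field is acceptable only if it contains neither
 * the field separator nor the record separator. */
int field_is_safe(const char *field)
{
    return field != NULL && strpbrk(field, ":\n") == NULL;
}

/* Returns 1 if every string field can be written without creating extra
 * fields or extra entries, 0 otherwise. */
int passwd_fields_safe(const struct passwd *pw)
{
    return field_is_safe(pw->pw_name) &&
           field_is_safe(pw->pw_passwd) &&
           field_is_safe(pw->pw_gecos) &&
           field_is_safe(pw->pw_dir) &&
           field_is_safe(pw->pw_shell);
}
```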

You learn about more types of writing-related problems when rlimits are discussed in Chapter 10, "UNIX II: Processes."

Closing a File

Finally, when a program is done with a file stream, it can close it in much the same way close() is used on a file descriptor. Here's the prototype:

int fclose(FILE *stream);

Because the file API uses descriptors internally, failure to close a file that has been opened results in file descriptor leaks (covered in the "File Descriptors" section earlier in this chapter).

Additionally, most fclose() implementations free memory that's being used to buffer file data and might also free the FILE structure. For example, look at the glibc fclose() implementation:

    int
    _IO_new_fclose (fp)
         _IO_FILE *fp;
    {
        int status;

        CHECK_FILE(fp, EOF);
        ...
        if (fp != _IO_stdin && fp != _IO_stdout && fp != _IO_stderr)
        {
            fp->_IO_file_flags = 0;
            free(fp);
        }

        return status;
    }

Notice the call to free() that passes fp as a parameter. If a program calls fclose() twice on a FILE structure using this implementation, a double free() would occur, and the heap could potentially be corrupted. Other implementations (such as OpenBSD's) are a little more resistant to these problems; however, closing a file twice might still result in vulnerable situations related to a different file being closed unexpectedly.
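A common defensive idiom is to clear the stream pointer as soon as it has been closed, so a second close through the same variable becomes a no-op. The helper name close_stream() is hypothetical, and this sketch doesn't protect against other copies of the pointer or against concurrent closes from signal handlers or threads.

```c
#include <stdio.h>

/* Hypothetical helper: close *fpp at most once and reset it to NULL.
 * Returns the fclose() result, or 0 if there was nothing to close. */
int close_stream(FILE **fpp)
{
    int rc;

    if (fpp == NULL || *fpp == NULL)
        return 0;

    rc = fclose(*fpp);
    *fpp = NULL;    /* later calls through this pointer do nothing */
    return rc;
}
```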

Note

In the OpenBSD 3.6 fclose(), it might also be possible to trigger a double free() by closing a file twice, if the double fclose() was caused by a well-timed signal handler or competing thread.
