C String Handling | The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities

In C, there's no native type for strings; instead, strings are formed by constructing arrays of the char data type, with the NUL character (0x00) marking the end of a string (sometimes referred to as a NULL character or EOS). Representing a string in this manner means that the length of the string is not associated with the buffer that contains it, and it is often not known until runtime. These details require programmers to manage the string buffers manually, generally in one of two ways. They can estimate how much memory to reserve (by choosing a conservative maximum) for a statically sized array, or they can use memory allocation APIs available on the system to dynamically allocate memory at runtime when the amount of space required for a data block is known.

The second option seems more sensible, but it has some drawbacks. Far more processing overhead is involved when allocating memory dynamically, and programmers need to ensure that memory is freed correctly in each possible code path to avoid memory leaks. The C++ standard library provides a string class that abstracts the internals so that programmers don't need to deal explicitly with memory-sizing problems. The C++ string class is, therefore, a little safer and less likely to be exposed to vulnerabilities that occur when dealing with characters in C. However, programmers often need to convert between C strings and C++ string classes to use APIs that require C strings; so even a C++ program can be vulnerable to C string handling vulnerabilities. Most C string handling vulnerabilities are the result of the unsafe use of a handful of functions, which are covered in the following sections.

Unbounded String Functions

The first group consists of the conventionally unsafe string manipulation functions. The main problem with these functions is that they are unbounded; that is, the destination buffer's size isn't taken into account when performing a data copy. This means that if the string length of the source data supplied to these functions exceeds the destination buffer's size, a buffer overflow condition could be triggered, often resulting in exploitable memory corruption. Code auditors must systematically examine each appearance of these functions in a codebase to determine whether they are called in an unsafe manner. Simply put, code auditors must find out whether those functions can be reached when the destination buffer isn't large enough to contain the source content. By analyzing all the code paths that lead to these unsafe routines, you can find whether this problem exists and classify the call as safe or unsafe.

scanf()

The scanf() functions are used when reading in data from a file stream or string. Each data element specified in the format string is stored in a corresponding argument. When strings are specified in the format string (using the %s format specifier), the corresponding buffer needs to be large enough to contain the string read in from the data stream. The scanf() function is summarized in the following list:

Function int scanf(const char *format, ...);
API libc (UNIX and Windows)
Similar functions _tscanf, wscanf, sscanf, fscanf, fwscanf, _snscanf, _snwscanf
Purpose The scanf() function parses input according to the format specified in the format argument.

The following code shows an example of misusing scanf():

int read_ident(int sockfd)
{
    int sport, cport;
    char user[32], rtype[32], addinfo[32];
    char buffer[1024];

    if(read(sockfd, buffer, sizeof(buffer)) <= 0){
        perror("read: %m");
        return 1;
    }

    buffer[sizeof(buffer)-1] = '\0';

    sscanf(buffer, "%d:%d:%s:%s:%s", &sport, &cport, rtype,
           user, addinfo);
    ...
}

The code in this example reads an IDENT response (defined at www.ietf.org/rfc/rfc1413.txt) from a client. As you can see, up to 1024 bytes are read and then parsed into a series of integers and colon-separated strings. The user, rtype, and addinfo variables are only 32 bytes long, so if the client supplies a string of 32 bytes or more in any of those fields, the string and its NUL terminator no longer fit and a buffer overflow occurs.
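One way to bound this kind of parse is to give every string conversion a maximum field width. The following is a minimal sketch, not the fix used by any particular IDENT implementation; the struct ident_fields type and the parse_ident() name are hypothetical. It uses %[^:] rather than %s because %s stops only at whitespace, so it would not split the input on colons.

```c
#include <stdio.h>

/* Hypothetical record type; field sizes mirror the 32-byte buffers above. */
struct ident_fields {
    int sport, cport;
    char rtype[32], user[32], addinfo[32];
};

/* Returns 0 on success, -1 if the line does not yield all five fields. */
int parse_ident(const char *line, struct ident_fields *out)
{
    /* %31[^:] stores at most 31 non-colon characters plus the NUL;
       %31s stores at most 31 non-whitespace characters plus the NUL. */
    int n = sscanf(line, "%d:%d:%31[^:]:%31[^:]:%31s",
                   &out->sport, &out->cport,
                   out->rtype, out->user, out->addinfo);

    return n == 5 ? 0 : -1;
}
```

The width must be one less than the array size because the conversion always appends a NUL. An overlong field makes the following literal colon fail to match, so the conversion count falls short of five and the caller can reject the input.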

sprintf()

The sprintf() functions have accounted for many security vulnerabilities in the past. If the destination buffer supplied as the first parameter isn't large enough to handle the input data elements, a buffer overflow could occur. Buffer overflows happen primarily because of printing large strings (using the %s format specifier). Although less common, other format specifiers (such as %d or %f) can also result in buffer overflows. If users can partially or fully control the format argument, another type of bug, known as a "format string" vulnerability, could occur. These vulnerabilities are discussed in more detail later in this chapter in "C Format Strings." The sprintf() function is summarized in the following list:

Function int sprintf(char *str, const char *format, ...);
API libc (UNIX and Windows)
Similar functions _stprintf, _sprintf, _vsprintf, vsprintf, swprintf, vswprintf, _wsprintfA, _wsprintfW
Purpose The sprintf() functions print a formatted string to a destination buffer.

The following example is taken from the Apache JRUN module:

static void
WriteToLog(jrun_request *r, const char *szFormat, ...)
{
    server_rec *s = (server_rec *) r->context;
    va_list list;
    char szBuf[2048];

    strcpy(szBuf, r->stringRep);
    va_start (list, szFormat);
    vsprintf (strchr(szBuf,'\0'), szFormat, list);
    va_end (list);

#if MODULE_MAGIC_NUMBER > 19980401
    /* don't need to add newline - this function
       does it for us */
    ap_log_error(APLOG_MARK, APLOG_NOERRNO|APLOG_NOTICE, s, "%s", szBuf);
#else
    log_error(szBuf, s);
#endif

#ifdef WIN32
    strcat(szBuf, "\r\n");
    OutputDebugString(szBuf);
#endif
}

This example is a classic misuse of vsprintf(). The destination buffer's size isn't accounted for at all, so a buffer overflow occurs whenever the contents of r->stringRep plus the formatted output don't fit in the 2048 bytes of szBuf.
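A bounded variant of the same pattern can be sketched as follows. This is an illustration only, not the JRUN fix: format_log() and its parameters are hypothetical, and the code assumes a C99-conforming snprintf()/vsnprintf() that returns the length the output would have had.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes prefix followed by the formatted message
   into dst. Returns 0 if everything fit, -1 on truncation or error. */
int format_log(char *dst, size_t dstsize, const char *prefix,
               const char *fmt, ...)
{
    va_list ap;
    int n;
    size_t used;

    if (dstsize == 0)
        return -1;

    n = snprintf(dst, dstsize, "%s", prefix);
    if (n < 0 || (size_t)n >= dstsize)
        return -1;                      /* prefix alone did not fit */
    used = (size_t)n;

    va_start(ap, fmt);
    /* Only the space remaining after the prefix is offered here. */
    n = vsnprintf(dst + used, dstsize - used, fmt, ap);
    va_end(ap);

    if (n < 0 || (size_t)n >= dstsize - used)
        return -1;                      /* message was truncated */
    return 0;
}
```

The point of the sketch is that the size handed to vsnprintf() shrinks by whatever the prefix already consumed; passing the full buffer size at the offset pointer would reintroduce the overflow.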

Note

The _wsprintfA() and _wsprintfW() functions copy a maximum of 1024 characters into the destination buffer, as opposed to the other sprintf() functions, which copy as many as required.

strcpy()

The strcpy() family of functions is notorious for causing a large number of security vulnerabilities in many applications over the years. If the destination buffer can be smaller than the length of the source string, a buffer overflow could occur. The wcscpy() and _mbscpy() functions are similar to strcpy() except they deal with wide and multibyte characters and are common in Windows applications. The following list summarizes the strcpy() functions:

Function char *strcpy(char *dst, char *src)
API libc (UNIX and Windows)
Similar functions _tcscpy, lstrcpyA, wcscpy, _mbscpy
Purpose strcpy() copies the string located at src to the destination dst. It ceases copying when it encounters an end of string character (a NUL byte).

The following code is an example of misusing strcpy():

char *read_command(int sockfd)
{
    char username[32], buffer[1024];
    int n;

    if((n = read(sockfd, buffer, sizeof(buffer)-1)) <= 0)
        return NULL;

    buffer[n] = '\0';

    switch(buffer[0]){
        case 'U':
            strcpy(username, &buffer[1]);
            break;
        ...
    }
}

This code is an obvious misuse of strcpy(). The source buffer can easily contain a string longer than the destination buffer, so a buffer overflow might be triggered. Bugs of this nature were once very common, but they are less common now because developers are more aware of the misuses of strcpy(); however, they still occur, particularly in closed-source applications that aren't widely distributed.
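When auditing, the thing to look for is a length check that dominates the copy. A minimal sketch of such a check, using a hypothetical copy_string() helper that rejects oversized input instead of truncating it, might look like this:

```c
#include <string.h>

/* Hypothetical helper: copies src into dst only if it fits, NUL included.
   Returns 0 on success, -1 if src is too long for dst. */
int copy_string(char *dst, size_t dstsize, const char *src)
{
    size_t len = strlen(src);

    if (len >= dstsize)         /* no room for len bytes plus the NUL */
        return -1;

    memcpy(dst, src, len + 1);  /* the + 1 copies the terminator */
    return 0;
}
```

The comparison uses >= rather than > because a string of exactly dstsize characters leaves no room for the terminating NUL byte.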

strcat()

String concatenation is often used when building strings composed of several components (such as paths). When calling strcat(), the destination buffer (dst) must be large enough to hold the string already there, the concatenated string (src), plus the NUL terminator. The following list summarizes the strcat() function:

Function char *strcat (char *dst, char *src)
API libc (UNIX and Windows)
Similar functions _tcscat, wcscat, _mbscat
Purpose The strcat() functions are responsible for concatenating two strings together. The src string is appended to dst.

The following code shows an example of misusing strcat():

int process_email(char *email)
{
    char username[32], domain[128], *delim;
    int c;

    delim = strchr(email, '@');
    if(!delim)
        return -1;

    *delim++ = '\0';

    if(strlen(email) >= sizeof(username))
        return -1;

    strcpy(username, email);

    if(strlen(delim) >= sizeof(domain))
        return -1;

    strcpy(domain, delim);

    if(!strchr(delim, '.'))
        strcat(domain, default_domain);

    delim[-1] = '@';

    ... process domain ...

    return 0;
}

The code in this example performs several string copies, each preceded by a length check to ensure that the source string doesn't overflow the destination buffer. When a hostname is supplied without a trailing domain, however, a default string value is concatenated to the buffer in an unsafe manner (the strcat() call). This vulnerability occurs because no size check is done to ensure that the length of default_domain plus the length of delim is less than the length of the domain buffer.
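The missing check has to account for three things: the string already in the destination, the string being appended, and the NUL terminator. A minimal sketch, assuming the destination already holds a NUL-terminated string and using a hypothetical append_checked() helper, is shown here:

```c
#include <string.h>

/* Hypothetical helper: appends src to the string in dst only if the
   combined result, NUL included, fits in dstsize bytes.
   Assumes dst already holds a NUL-terminated string. */
int append_checked(char *dst, size_t dstsize, const char *src)
{
    size_t used = strlen(dst);
    size_t add = strlen(src);

    /* The first test keeps the subtraction from wrapping around. */
    if (used >= dstsize || add >= dstsize - used)
        return -1;

    memcpy(dst + used, src, add + 1);
    return 0;
}
```

Writing the comparison as add >= dstsize - used, after confirming used < dstsize, avoids the addition used + add, which could itself wrap for very large lengths.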

Bounded String Functions

The bounded string functions were designed to give programmers a safer alternative to the functions discussed in the previous section. These functions include a parameter to designate the length (or bounds) of the destination buffer. This length parameter makes it easier to use the bounded functions securely, but they are still susceptible to misuse in more subtle ways. For instance, it is important to double-check that the specified length is in fact the correct size of the resulting buffer. Although this check sounds obvious, length miscalculations or erroneous length parameters are frequent when using these functions. These are the conditions that might cause the length parameter to be incorrect:

Carelessness
Erroneous input
Length miscalculation
Arithmetic boundary conditions
Converted data types

This shouldn't be considered an exhaustive list of problems. However, it should emphasize the point that use of safe functions doesn't necessarily mean the code is secure.

snprintf()

The snprintf() function is a bounded sprintf() replacement; it accepts a maximum number of bytes that can be written to the output buffer. This function is summarized in the following list:

Function int snprintf(char *dst, size_t n, char *fmt, ...)
API libc (UNIX and Windows)
Similar functions _sntprintf, _snprintf, _snwprintf, vsnprintf, _vsnprintf, _vsnwprintf
Purpose snprintf() formats data according to format specifiers into a string, just like sprintf(), except it has a size parameter.

An interesting caveat of this function is that it works slightly differently on Windows and UNIX. On Windows OSs, if there's not enough room to fit all the data into the resulting buffer, a value of -1 is returned and NUL termination is not guaranteed. Conversely, UNIX implementations guarantee NUL termination no matter what and return the number of characters that would have been written had there been enough room. That is, if the resulting buffer isn't big enough to hold all the data, it's NUL-terminated, and a positive integer is returned that's larger than the supplied buffer size. This difference in behavior can cause bugs to occur in these situations:

A developer familiar with one OS is writing code for another and isn't aware of their differences.
An application is built to run on both Windows and UNIX, so the application works correctly on one OS but not the other.

Listing 8-1 is an example of a vulnerability resulting from assuming the UNIX behavior of vsnprintf() in a Windows application.

Listing 8-1. Different Behavior of vsnprintf() on Windows and UNIX

#define BUFSIZ 4096

int log(int fd, char *fmt, ...)
{
    char buffer[BUFSIZ];
    int n;
    va_list ap;

    va_start(ap, fmt);

    n = vsnprintf(buffer, sizeof(buffer), fmt, ap);

    if(n >= BUFSIZ - 2)
        buffer[sizeof(buffer)-2] = '\0';

    strcat(buffer, "\n");

    va_end(ap);

    write_log(fd, buffer, strlen(buffer));

    return 0;
}

The code in Listing 8-1 works fine on UNIX. It checks to ensure that at least two bytes still remain in the buffer to fit in the trailing newline character or it shortens the buffer so that the call to strcat() doesn't overflow. If the same code is run on Windows, however, it's a different story. If buffer is filled, n is set to -1, so the length check passes and the newline character is written outside the bounds of buffer.
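Code that has to behave the same under both conventions can avoid relying on the return value for safety at all. The following sketch is one possible approach, not the only one: format_line() is a hypothetical helper, and it assumes the platform exposes a function named vsnprintf(). It reserves the newline's byte up front and terminates the buffer unconditionally.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats a message into dst and appends '\n'.
   Returns 0 if the message fit, 1 if it was truncated, and -1 if dst
   cannot hold even the newline and the terminator. */
int format_line(char *dst, size_t dstsize, const char *fmt, ...)
{
    va_list ap;
    int n;
    size_t len;

    if (dstsize < 2)
        return -1;

    dst[0] = '\0';
    va_start(ap, fmt);
    /* Pass dstsize - 1 so one byte stays reserved for the newline. */
    n = vsnprintf(dst, dstsize - 1, fmt, ap);
    va_end(ap);

    /* Terminate unconditionally instead of trusting the return value. */
    dst[dstsize - 2] = '\0';

    len = strlen(dst);
    dst[len] = '\n';
    dst[len + 1] = '\0';

    /* A negative result and a result >= the size passed in both mean
       that the formatted output did not fit. */
    return (n < 0 || (size_t)n >= dstsize - 1) ? 1 : 0;
}
```

Because the terminator is forced at dstsize - 2, strlen() can never report more than dstsize - 2 characters, so the newline and the final NUL always land inside the buffer regardless of which return convention the library follows.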

strncpy()

The strncpy() function is a "secure" alternative to strcpy(); it accepts a maximum number of bytes to be copied into the destination. The following list summarizes the strncpy() function:

Function char *strncpy(char *dst, char *src, size_t n)
API libc (UNIX and Windows)
Similar functions _tcsncpy, _csncpy, wcsncpy, _mbsncpy
Purpose strncpy() copies the string located at src to the destination dst. It ceases copying when it encounters an end of string character (a NUL byte) or when n characters have been written to the destination buffer.

The strncpy() function does not guarantee NUL-termination of the destination string. If the source string is larger than the destination buffer, strncpy() copies as many bytes as indicated by the size parameter, and then ceases copying without NUL-terminating the buffer. This means any subsequent operations performed on the resulting string could produce unexpected results that can lead to a security vulnerability. Listing 8-2 shows an example of misusing strncpy().

Listing 8-2. Dangerous Use of strncpy()

int is_username_valid(char *username)
{
    char *delim;
    int c;

    delim = strchr(username, ':');
    if(delim){
        c = *delim;
        *delim = '\0';
    }

    ... do some processing on the username ...

    if(delim)
        *delim = c;

    return 1;
}

int authenticate(int sockfd)
{
    char user[1024], *buffer;
    size_t size;
    int n, cmd;

    cmd = read_integer(sockfd);
    size = read_integer(sockfd);

    if(size > MAX_PACKET)
        return -1;

    buffer = (char *)calloc(size+1, sizeof(char));
    if(!buffer)
        return -1;

    read_string(buffer, size);

    switch(cmd){
        case USERNAME:
            strncpy(user, buffer, sizeof(user));
            if(!is_username_valid(user))
                goto fail;
            break;
        ...
    }
}

The code copies data into a buffer by using strncpy() but fails to explicitly NUL-terminate the buffer afterward. The buffer is then passed as an argument to the is_username_valid() function, which performs a strchr() on it. The strchr() function searches for a specific character in a string (the : in this case). If strchr() finds the character, it returns a pointer to it; otherwise, it returns NULL. Because there's no NUL character in this buffer, strchr() might go past the end of the buffer and locate the character it's searching for in another variable or possibly in the program's control information (such as a frame pointer, return address on the stack, or a chunk header on the heap). This byte is then changed, thus potentially affecting the program's state in an unpredictable or unsafe manner.
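The usual idiom for closing this gap is to copy at most one byte less than the buffer size and then write the terminator explicitly. A minimal sketch, with copy_truncate() as a hypothetical helper name:

```c
#include <string.h>

/* Hypothetical helper: copies as much of src as fits and always
   NUL-terminates dst. Silently truncates overlong input. */
void copy_truncate(char *dst, size_t dstsize, const char *src)
{
    if (dstsize == 0)
        return;

    strncpy(dst, src, dstsize - 1);
    dst[dstsize - 1] = '\0';    /* strncpy() does not do this for us */
}
```

This guarantees termination but not completeness: the caller still has to decide whether a silently truncated string is acceptable in its context.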

The wcsncpy() function is the bounded alternative to wcscpy(). This function is susceptible to the same misuses as strncpy(). If the source string is larger than the destination buffer, no NUL terminator is appended to the resulting string. Additionally, when dealing with wide characters, application developers often make the mistake of supplying the destination buffer's size in bytes rather than specifying the number of wide characters that can fit into the destination buffer. This issue is discussed later in this chapter in "Windows Unicode Functions."

strncat()

The strncat() function, summarized in the following list, is intended to be a safe alternative to the strcat() function:

Function char *strncat(char *dst, char *src, size_t n)
API libc (UNIX and Windows)
Similar functions _tcsncat, wcsncat, _mbsncat
Purpose strncat() concatenates two strings together. The string src points to is appended to the string dst points to. It copies at most n bytes.

However, strncat() is nearly as dangerous as strcat(), in that it's quite easy to misuse. Specifically, the size parameter can be confusing: it indicates the amount of space left in the buffer. The first common mistake application developers make is supplying the size of the entire buffer instead of the size remaining in the buffer. This mistake is shown in the following example:

int copy_data(char *username)
{
    char buf[1024];

    strcpy(buf, "username is: ");
    strncat(buf, username, sizeof(buf));

    log("%s\n", buf);

    return 0;
}

This code incorrectly supplies the buffer's total size rather than the remaining size, thus allowing someone who can control the username argument to overflow the buffer.

A more subtle mistake can be made when using strncat(). As stated previously, the size argument represents how many bytes remain in the buffer. This statement was slightly oversimplified in that the size doesn't account for the trailing NUL byte, which is always added to the end of the string. Therefore, the size parameter needs to be the amount of space left in the buffer less one; otherwise, the NUL byte is written one byte past the end of the buffer. The following example shows how this mistake typically appears in application code:

int copy_data(char *username)
{
    char buf[1024];

    strcpy(buf, "username is: ");
    strncat(buf, username, sizeof(buf) - strlen(buf));

    log("%s\n", buf);

    return 0;
}

This code doesn't account for the trailing NUL byte, so it's an off-by-one vulnerability. Note that even when supplying the correct length parameter to strncat() (that is, sizeof(buf) - strlen(buf) - 1), an integer underflow could occur, also resulting in a buffer overflow.
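Both problems, the off-by-one and the possible underflow, can be addressed by validating the current length before subtracting. The following is a sketch under the assumption that buf already contains a NUL-terminated string; append_bounded() is a hypothetical name.

```c
#include <string.h>

/* Hypothetical helper: appends as much of src as fits in buf.
   Returns 0 on success (possibly truncated), -1 if buf is already full. */
int append_bounded(char *buf, size_t bufsize, const char *src)
{
    size_t used = strlen(buf);

    /* Without this test, bufsize - used - 1 wraps to a huge value
       when used == bufsize, or when bufsize is 0. */
    if (used >= bufsize)
        return -1;

    /* The - 1 leaves room for the NUL that strncat() always appends. */
    strncat(buf, src, bufsize - used - 1);
    return 0;
}
```

With an 8-byte buffer that already holds two characters, the call passes 5 as the limit, so strncat() writes five characters and then the NUL into the last byte.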

strlcpy()

The strlcpy() function is a BSD-specific extension to the libc string APIs. It attempts to address the shortcomings of the strncpy() function. Specifically, it guarantees NUL byte termination of the destination buffer. This function is one of the safest alternatives to strcpy() and strncpy(); however, it's not used a great deal for portability reasons. The following list summarizes the strlcpy() function:

Function size_t strlcpy(char *dst, char *src, size_t n)
API libc (BSD)
Similar functions None
Purpose strlcpy() acts exactly the same as strncpy() except it guarantees that the destination buffer is NUL-terminated. The length argument includes space for the NUL byte.

When auditing code that uses strlcpy(), be aware that the size returned is the length of the source string (not including the NUL byte), so the return value can be larger than the destination buffer's size. The following example shows some vulnerable code:

int qualify_username(char *username)
{
    char buf[1024];
    size_t length;

    length = strlcpy(buf, username, sizeof(buf));

    strncat(buf, "@127.0.0.1", sizeof(buf) - length);

    ... do more stuff ...
}

The length value returned from strlcpy() is used incorrectly in this code. If the username parameter to this function is longer than 1024 bytes, the strncat() size parameter underflows and allows data to be copied out of the buffer's bounds. Vulnerabilities such as this aren't common because the return value is usually discarded. However, ignoring the result of this function can result in data truncation.
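The return value is only safe to use in arithmetic after it has been compared against the buffer size. The sketch below illustrates that check. Because strlcpy() isn't available on every platform, it uses a local stand-in, sketch_strlcpy(), written to follow the semantics described above; both that function and the reworked qualify_username() signature are assumptions made for this illustration.

```c
#include <string.h>

/* Local stand-in with strlcpy()-style semantics: copies at most n - 1
   bytes, always terminates when n > 0, returns strlen(src). */
static size_t sketch_strlcpy(char *dst, const char *src, size_t n)
{
    size_t len = strlen(src);

    if (n != 0) {
        size_t copy = len < n - 1 ? len : n - 1;
        memcpy(dst, src, copy);
        dst[copy] = '\0';
    }
    return len;
}

/* Returns 0 on success, -1 if the qualified name does not fit. */
int qualify_username(char *out, size_t outsize, const char *username)
{
    static const char suffix[] = "@127.0.0.1";
    size_t length = sketch_strlcpy(out, username, outsize);

    if (length >= outsize)              /* username was truncated */
        return -1;

    /* sizeof(suffix) counts the NUL; length < outsize here, so the
       subtraction cannot wrap. */
    if (sizeof(suffix) > outsize - length)
        return -1;

    memcpy(out + length, suffix, sizeof(suffix));
    return 0;
}
```

The first comparison detects truncation and, as a side effect, guarantees that the later subtraction is performed on a value smaller than the buffer size.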

strlcat()

The strlcat() function, summarized in the following list, is another BSD-specific extension to the libc API that is intended to address the shortcomings of the strncat() function:

Function size_t strlcat(char *dst, char *src, size_t n)
API libc (BSD)
Similar functions None
Purpose strlcat() concatenates two strings together in much the same way as strncat().

The size parameter has been changed so that the function is simpler for developers to use. The size parameter for strlcat() is the total size of the destination buffer instead of the remaining space left in the buffer, as with strncat(). The strlcat() function guarantees NUL-termination of the destination buffer. Again, this function is one of the safest alternatives to strcat() and strncat(). Like strlcpy(), strlcat() returns the total number of bytes required to hold the resulting string. That is, it returns the string length of the destination buffer plus the string length of the source buffer. One exception is when the destination string is already longer than the n parameter, in which case the buffer is left untouched and the return value is n plus the string length of the source buffer.

Common Issues

Parsing text at the character level can be a complicated task. Small oversights made by application developers can result in buffer overflows, operating on uninitialized memory regions, or misinterpretations of the content. Code auditors need to focus on code regions that manipulate text, particularly write operations because careless writes pose the most immediate threat to application security. The following sections introduce fundamental concepts and provide some common examples of text processing issues.

Unbounded Copies

The easiest unbounded copies to spot are those that simply don't do any checking on the bounds of destination buffers, much like the vulnerable use of strcpy() in "Unbounded String Functions." Listing 8-3 shows an example.

Listing 8-3. Strcpy()-like Loop

if (recipient == NULL
    && Ustrcmp(errmess, "empty address") != 0)
  {
  uschar hname[64];
  uschar *t = h->text;
  uschar *tt = hname;
  uschar *verb = US"is";
  int len;

  while (*t != ':') *tt++ = *t++;
  *tt = 0;

Listing 8-3 shows a straightforward vulnerability. If the length of the source string is larger than the size of hname, a stack buffer overflow occurs when the while loop runs. It's a good idea to note functions that make blatantly unchecked copies like this and see whether they are ever called in a vulnerable manner.
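For comparison, a copy loop of this shape needs two extra conditions: one for the destination's capacity and one for the end of the source string. The following sketch shows both; copy_header_name() is a hypothetical function written for this illustration and is not taken from the program in Listing 8-3.

```c
#include <stddef.h>

/* Hypothetical helper: copies the text before the first ':' into dst.
   Returns 0 on success, -1 if there is no ':' or the name is too long. */
int copy_header_name(char *dst, size_t dstsize, const char *text)
{
    size_t i = 0;

    if (dstsize == 0)
        return -1;

    while (text[i] != ':') {
        /* Stop at the end of the source and keep one byte for the NUL. */
        if (text[i] == '\0' || i + 1 >= dstsize)
            return -1;
        dst[i] = text[i];
        i++;
    }
    dst[i] = '\0';
    return 0;
}
```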

Character Expansion

Character expansion occurs when software encodes special characters, resulting in a longer string than the original. This is common in metacharacter handling, as discussed over the course of this chapter, but it can also occur when raw data is formatted to make it human readable. Character expansion code may be vulnerable when the resulting expanded string is too large to fit in the destination buffer, as in the example in Listing 8-4.

Listing 8-4. Character Expansion Buffer Overflow

int write_log(int fd, char *data, size_t len)
{
    char buf[1024], *src, *dst;

    if(strlen(data) >= sizeof(buf))
        return -1;

    for(src = data, dst = buf; *src; src++){
        if(!isprint(*src)){
            sprintf(dst, "%02x", *src);
            dst += strlen(dst);
        } else
            *dst++ = *src;
    }

    *dst = '\0';

    ...
}

In Listing 8-4, you can see that if nonprintable characters are encountered, the sprintf() call writes a hexadecimal representation of the character to the destination buffer. Therefore, for each loop iteration, the program could write two output characters for every one input character. By supplying a large number of nonprintable characters, an attacker can cause an overflow to occur in the destination buffer.
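An expansion routine has to check the space remaining in the output on every iteration, not the length of the input once at the start. The sketch below shows one way to do that; escape_nonprintable() is a hypothetical function, and the cast to unsigned char is an addition that keeps isprint() and the %02x conversion from seeing negative values.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: copies src into dst, replacing each nonprintable
   byte with two hex digits. Returns 0 on success, -1 if dst is too small. */
int escape_nonprintable(char *dst, size_t dstsize, const char *src)
{
    size_t used = 0;

    if (dstsize == 0)
        return -1;

    for (; *src != '\0'; src++) {
        unsigned char ch = (unsigned char)*src;
        size_t need = isprint(ch) ? 1 : 2;

        /* Require room for this character plus the final NUL. */
        if (dstsize - used <= need)
            return -1;

        if (need == 1)
            dst[used] = (char)ch;
        else
            snprintf(dst + used, 3, "%02x", (unsigned int)ch);
        used += need;
    }
    dst[used] = '\0';
    return 0;
}
```

Since used is always smaller than dstsize when the subtraction runs, the remaining-space calculation cannot wrap around.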

Incrementing Pointers Incorrectly

Security vulnerabilities may occur when pointers can be incremented outside the bounds of the string being operated on. This problem happens primarily in one of the following two cases: when a string isn't NUL-terminated correctly; or when a NUL terminator can be skipped because of a processing error. You saw in Listing 8-2 that strncpy() can be the cause of a string not being NUL-terminated. Often when a string isn't terminated correctly, further processing on the string is quite dangerous. For example, consider a string being searched with the strchr() function for a particular separator. If the NUL terminator is missing, the search doesn't stop at the end of the user-supplied data as intended. The character being searched for may be located in uninitialized memory or adjacent variables, which is a potential vulnerability. The following example shows a similar situation:

int process_email(char *email)
{
    char buf[1024], *domain;

    strncpy(buf, email, sizeof(buf));

    domain = strchr(buf, '@');
    if(!domain)
        return -1;

    *domain++ = '\0';

    ...

    return 0;
}

The example neglects to NUL-terminate buf, so the subsequent character search might skip outside the buffer's bounds. Even worse, the character being searched for is changed to a NUL byte, so variables or program state could possibly be corrupted. Another interesting implication of neglecting to NUL-terminate a buffer is that a buffer overflow condition might be introduced if the programmer makes assumptions about the maximum length of the string in the buffer. The following code shows a common example of making this assumption:

int process_address(int sockfd)
{
    char username[256], domain[256], netbuf[256], *ptr;

    read_data(sockfd, netbuf, sizeof(netbuf));

    ptr = strchr(netbuf, ':');

    if(ptr)
        *ptr++ = '\0';

    strcpy(username, netbuf);

    if(ptr)
        strcpy(domain, ptr);

    ...
}

The process_address() function is written with the assumption that read_data() correctly NUL-terminates the netbuf character array. Therefore, the strcpy() operations following it should be safe. If the read_data() function doesn't properly terminate the buffer, however, the string that starts in netbuf can run longer than 256 bytes, depending on what's on the program stack after it. Therefore, the strcpy() operations could overflow the username buffer.

There's also the odd situation of code that's processing text strings failing to identify when it has encountered a NUL byte because of an oversight in the processing. This error might happen because the code searches for a particular character in a string but fails to check for a NUL byte, as shown in the following example:

// locate the domain in an e-mail address
for(ptr = src; *ptr != '@'; ptr++);

Notice that this loop is searching specifically for an @ character, but if none are in the string, the loop keeps incrementing past the end of the string until it finds one. There are also slight variations to this type of error, as in this example:

// locate the domain in an e-mail address
for(ptr = src; *ptr && *ptr != '@'; ptr++);

ptr++;

This second loop is formed more correctly and terminates when it encounters the @ symbol or a NUL byte. However, after the loop is completed, the programmer still made the assumption that it stopped because it found an @ symbol, not a NUL byte. Therefore, if the @ symbol is not found the pointer is incremented past the NUL byte.
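A version that handles both exit conditions has to test why the loop stopped before advancing the pointer. A minimal sketch, with find_domain() as a hypothetical helper name:

```c
#include <stddef.h>

/* Hypothetical helper: returns a pointer to the text after the first
   '@' in addr, or NULL if addr contains no '@'. */
const char *find_domain(const char *addr)
{
    const char *ptr;

    for (ptr = addr; *ptr != '\0' && *ptr != '@'; ptr++)
        ;

    if (*ptr != '@')        /* the loop stopped on the NUL terminator */
        return NULL;

    return ptr + 1;         /* safe: ptr points at '@', not at the NUL */
}
```

Note that the returned pointer may still point at the NUL terminator when the address ends with an @ character, so callers should be prepared for an empty domain.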

The third example of incrementing outside a buffer's bounds usually occurs when programmers make assumptions on the content of the buffer they're parsing. An attacker can use intentionally malformed data to take advantage of these assumptions and force the program into doing something it shouldn't. Say you have a string containing variables submitted by a form from a Web site, and you want to parse and store these variables. This process involves decoding hexadecimal sequences in the form %XY, where X and Y are hexadecimal characters (0-9, a-f, and A-F) representing a byte value. If the application fails to check whether either of the two characters following the % is a NUL terminator, the application might attempt to decode the hexadecimal sequence and then skip the NUL byte and continue processing on uninitialized memory. Listing 8-5 shows an example of this error.

Listing 8-5. Vulnerable Hex-Decoding Routine for URIs

/*
 * Decoding URI-encoded strings
 */
void
nmz_decode_uri(char *str)
{
    int i, j;

    for (i = j = 0; str[i]; i++, j++) {
        if (str[i] == '%') {
            str[j] = decode_uri_sub(str[i + 1], str[i + 2]);
            i += 2;
        } else if (str[i] == '+') {
            str[j] = ' ';
        } else {
            str[j] = str[i];
        }
    }
    str[j] = '\0';
}

This code contains a simple mistake in the line that calls decode_uri_sub(): The developer makes the assumption that two valid characters follow a % character, which also assumes that the string doesn't terminate in those two bytes. Strings can often have a more complicated structure than the developer expects, however. Because there are multiple state variables that affect how the parsing function interprets text, there are more possibilities to make a mistake such as this one. Listing 8-6 shows another example of this type of error. It's taken from the mod_dav Apache module and is used to parse certain HTTP headers.
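A decoder that validates both characters before consuming them cannot step over the terminator, because a NUL byte is never a valid hexadecimal digit. The following sketch illustrates the idea; decode_uri() and hex_value() are hypothetical functions written for this example and are not the corrected code from the program in Listing 8-5.

```c
#include <stddef.h>

/* Returns the value of a hexadecimal digit, or -1 for anything else,
   including the NUL terminator. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes str in place. Returns 0 on success, -1 on a malformed escape
   (in which case str is left partially decoded). */
int decode_uri(char *str)
{
    size_t i, j;

    for (i = j = 0; str[i] != '\0'; i++, j++) {
        if (str[i] == '%') {
            int hi = hex_value(str[i + 1]);
            /* Read str[i + 2] only after str[i + 1] proved valid. */
            int lo = hi < 0 ? -1 : hex_value(str[i + 2]);

            if (lo < 0)
                return -1;
            str[j] = (char)(hi * 16 + lo);
            i += 2;
        } else if (str[i] == '+') {
            str[j] = ' ';
        } else {
            str[j] = str[i];
        }
    }
    str[j] = '\0';
    return 0;
}
```

Checking the first digit before touching the second matters: if the string ends immediately after the %, str[i + 2] lies beyond the terminator and must not be read.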

Listing 8-6. If Header Processing Vulnerability in Apache's mod_dav Module

while (*list) {
    /* List is the entire production (in a URI scope) */

    switch (*list) {
    case '<':
        if ((state_token = dav_fetch_next_token(&list, '>'))
            == NULL) {
            /* ### add a description to this error */
            return dav_new_error(r->pool, HTTP_BAD_REQUEST,
                                 DAV_ERR_IF_PARSE, NULL);
        }

        if ((err = dav_add_if_state(r->pool, ih, state_token,
             dav_if_opaquelock, condition, locks_hooks))
            != NULL) {
            /* ### maybe add a higher level description */
            return err;
        }
        condition = DAV_IF_COND_NORMAL;
        break;

    case 'N':
        if (list[1] == 'o' && list[2] == 't') {
            if (condition != DAV_IF_COND_NORMAL) {
                return dav_new_error(r->pool, HTTP_BAD_REQUEST,
                                     DAV_ERR_IF_MULTIPLE_NOT,
                                     "Invalid \"If:\" header: "
                                     "Multiple \"not\" entries "
                                     "for the same state.");
            }
            condition = DAV_IF_COND_NOT;
        }
        list += 2;
        break;

    case ' ':
    case '\t':
        break;

    default:
        return dav_new_error(r->pool, HTTP_BAD_REQUEST,
                             DAV_ERR_IF_UNK_CHAR,
                             apr_psprintf(r->pool,
                                          "Invalid \"If:\" "
                                          "header: Unexpected "
                                          "character encountered "
                                          "(0x%02x, '%c').",
                                          *list, *list));
    }

    list++;
}
break;

This code fails to check for NUL terminators correctly when it encounters an N character. The N case should check for the presence of the word "Not" and then skip over it. However, the code skips over the next two characters anytime it encounters an N character. An attacker can specify a header string ending with an N character, meaning an N character followed by a NUL character. Processing will continue past the NUL character to data in memory adjacent to the string being parsed. The vulnerable code path is the N case, where the list += 2 statement runs whether or not the "ot" comparison succeeded.
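The general remedy is to advance the pointer only by the number of characters that were actually matched. A small sketch of that idea follows; skip_not() is a hypothetical helper and is not the change made in mod_dav.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if list begins with "Not", returns a pointer
   just past it; otherwise returns NULL and consumes nothing. */
const char *skip_not(const char *list)
{
    /* strncmp() stops comparing at a NUL byte, so a string that ends
       after "N" or "No" is never read past its terminator. */
    if (strncmp(list, "Not", 3) == 0)
        return list + 3;

    return NULL;
}
```

A caller would treat the NULL result as a parse error rather than stepping forward by a fixed amount.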

Simple Typos

Text-processing vulnerabilities can occur because of simple errors that almost defy classification. Character processing is easy to mess up, and the more complex the code is, the more likely it is that a developer will make mistakes. One occasional mistake is a simple pointer use error, which happens when a developer accidentally dereferences a pointer incorrectly or doesn't dereference a pointer when necessary. These mistakes are often the result of simple typos, and they are particularly common when dealing with multiple levels of indirection. Listing 8-7 shows an example of a failure to dereference a pointer in Apache's mod_mime module.

Listing 8-7. Text-Processing Error in Apache mod_mime

while (quoted && *cp != '\0') {
    if (is_qtext((int) *cp) > 0) {
        cp++;
    }
    else if (is_quoted_pair(cp) > 0) {
        cp += 2;
    }
    ...

This code block is in the analyze_ct() function, which is involved in parsing MIME (Multipurpose Internet Mail Extensions) content. If the is_quoted_pair() function returns a value greater than zero, the cp variable is incremented by two. The following code shows the definition of is_quoted_pair():

static int is_quoted_pair(char *s)
{
    int res = -1;
    int c;

    if (((s + 1) != NULL) && (*s == '\\')) {
        c = (int) *(s + 1);
        if (ap_isascii(c)) {
            res = 1;
        }
    }
    return (res);
}

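Notice that the function is intended to check for an escape sequence of a backslash (\) followed by a non-NUL byte. However, the programmer forgot to dereference (s + 1), so the check will never fail because the result of the comparison is always true. Because a NUL byte also passes the ap_isascii() test, a backslash at the very end of the string is accepted as a quoted pair, and the caller's cp += 2 then moves the pointer past the NUL terminator. This is a very subtle typo (just a missing * character), but it completely changes the meaning of the code, resulting in a potential vulnerability.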
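With the dereference in place, the check looks at the byte after the backslash instead of at the pointer value. The following sketch shows the intended logic; is_quoted_pair_checked() is a hypothetical name, the comparison against 0x80 stands in for ap_isascii(), and returning -1 on failure is an assumption made for this illustration.

```c
/* Hypothetical corrected check: returns 1 if s points at a backslash
   followed by a non-NUL ASCII byte, -1 otherwise. */
int is_quoted_pair_checked(const char *s)
{
    /* s[1] is read only when s[0] is a backslash, so the read never
       goes beyond the string's terminator. */
    if (s[0] == '\\' && s[1] != '\0' && (unsigned char)s[1] < 0x80)
        return 1;

    return -1;
}
```

With this version, a string that ends in a lone backslash is rejected, so a caller that advances by two characters on success cannot step over the NUL byte.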