Auditing Functions


Functions are a ubiquitous component of modern programs, regardless of the application's problem domain or programming language. Application programmers usually divide programs into functions to encapsulate functionality that can be reused in other places in the program and to organize the program into smaller pieces that are easier to conceptualize and manage. Object-oriented programming languages encourage creating member functions, which are organized around objects. As a code auditor, when you encounter a function call, it's important to be cognizant of that call's implications. Ask yourself: What program state changes because of that call? What things can possibly go wrong with that function? What role do arguments play in how that function operates? Naturally, you want to focus on arguments and aspects of the function that users can influence in some way. To formalize this process, look for these four main types of vulnerability patterns that can occur when a function call is made:

  • Return values are misinterpreted or ignored.

  • Arguments supplied are incorrectly formatted in some way.

  • Arguments get updated in an unexpected fashion.

  • Some unexpected global program state change occurs because of the function call.

The following sections explore these patterns and explain why they are potentially dangerous.

Function Audit Logs

Because functions are the natural mechanism by which programmers divide their programs into smaller, more manageable pieces, they provide a great way for code auditors to divide their analysis into manageable pieces. This section covers creating an audit log, where you can keep notes about locations in the program that could be useful in later analysis. This log is organized around functions and should contain notes on each function's purpose and side effects. Many code auditors use an informal process for keeping these kinds of notes, and the sample audit log used in this section synthesizes some of these informal approaches.

To start, list the basic components of an entry, as shown in Table 7-1, and then you can expand on the log as vulnerabilities related to function interaction are discussed.

Table 7-1. Sample Audit Log

Function prototype

int read_data(int sockfd, char **buffer, int *length)

Description

Reads data from the supplied socket and allocates a buffer for storage.

Location

src/net/read.c, line 29

Cross-references

process_request, src/net/process.c, line 400

process_login, src/net/process.c, line 932

Return value type

32-bit signed integer.

Return value meaning

Indicates error: 0 for success or -1 for error.

Error conditions

calloc() failure when allocating MAX_SIZE bytes.

If read returns less than or equal to 0.

Erroneous return values

When calloc() fails, the function returns NULL instead of -1.


While you don't need to understand the entire log yet, the following is a brief summary of each row that you can easily refer back to:

  • Function prototype: The complete function prototype.

  • Description: A brief description of what the function does.

  • Location: The location of the function definition (file and line number).

  • Cross-references: The locations that call this function (files and line numbers).

  • Return value type: The C type that is returned.

  • Return value meaning: The set of return values and the meaning they convey.

  • Error conditions: Conditions that might cause the function to return error values.

  • Erroneous return values: Return values that do not accurately represent the function's result, such as not returning an error value when a failure condition occurs.

Return Value Testing and Interpretation

Ignored or misinterpreted return values are the cause of many subtle vulnerabilities in applications. Essentially, each function in an application is a compartmentalized code fragment designed to perform one specific task. Because it does this in a "black box" fashion, details of the results of its operations are largely invisible to calling functions. Return values are used to indicate some sort of status to calling functions. Often this status indicates success or failure or provides a value that's the result of the function's task, whether it's an allocated memory region, an integer result from a mathematical operation, or simply a Boolean true or false to indicate whether a specific operation is allowed. In any case, the return value plays a major part in function calling, in that it communicates some result between two separate functional components. If a return value is misinterpreted or simply ignored, the program might take incorrect code paths as a result, which can have severe security implications. As you can see, a large part of the information in the audit log is related to the return value's meaning and how it's interpreted. The following sections explore the process a code auditor should go through when auditing function calls to determine whether a miscommunication can occur between two components and whether that miscommunication can affect the program's overall security.

Ignoring Return Values

Many functions indicate success or failure through a return value. Consequently, ignoring a return value could cause an error condition to go undetected. In this situation, a code auditor must determine the implications of a function's potential errors going undetected. The following simple example is quite common:

char *buf = (char *)malloc(len);

memcpy(buf, src, len);


Quite often, the malloc() function isn't checked for success or failure, as in the preceding code; the developer makes the assumption that it will succeed. The obvious implication in this example is that the application will crash if malloc() can be made to fail, as a failure would cause buf to be set to NULL, and the memcpy() would cause a NULL pointer dereference. Similarly, it's not uncommon for programmers to fail to check the return value of realloc(), as shown in Listing 7-25.
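The defensive version of this pattern is straightforward. The sketch below (with illustrative names, not taken from the text) checks the allocation before using it and propagates failure to the caller instead of assuming success:

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative: copy len bytes into a freshly allocated buffer,
 * returning NULL on allocation failure rather than crashing. */
char *copy_data(const char *src, size_t len)
{
    char *buf = malloc(len);

    if (buf == NULL)        /* allocation failure is handled, not assumed away */
        return NULL;

    memcpy(buf, src, len);
    return buf;
}
```

The caller, in turn, must test the returned pointer before dereferencing it; pushing the check up one level without performing it anywhere just moves the crash.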

Listing 7-25. Ignoring realloc() Return Value

struct databuf {
    char *data;
    size_t allocated_length;
    size_t used;
};

...

int append_data(struct databuf *buf, char *src, size_t len)
{
    size_t new_size = buf->used + len + EXTRA;

    if(new_size < len)
        return -1;

    if(new_size > buf->allocated_length)
    {
        buf->data = (char *)realloc(buf->data, new_size);
        buf->allocated_length = new_size;
    }

    memcpy(buf->data + buf->used, src, len);

    buf->used += len;

    return 0;
}

As you can see, the buf->data element can be reallocated, but the realloc() return value is never checked for failure. When the subsequent memcpy() is performed, there's a chance an exploitable memory corruption could occur. Why? Unlike the previous malloc() example, this code copies to an offset from the allocated buffer. If realloc() fails, buf->data is NULL, but the buf->used value added to it might be large enough to reach a valid writeable page in memory.
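A common repair, sketched here under the same structure as Listing 7-25 (EXTRA's real value isn't shown in the text, so a placeholder is used), is to store realloc()'s result in a temporary pointer and test it before overwriting buf->data:

```c
#include <stdlib.h>
#include <string.h>

#define EXTRA 1024  /* illustrative value; the original definition isn't shown */

struct databuf {
    char *data;
    size_t allocated_length;
    size_t used;
};

int append_data_checked(struct databuf *buf, const char *src, size_t len)
{
    size_t new_size = buf->used + len + EXTRA;

    if (new_size < len)                 /* integer overflow check */
        return -1;

    if (new_size > buf->allocated_length) {
        char *tmp = realloc(buf->data, new_size);

        if (tmp == NULL)                /* original buffer is still valid here */
            return -1;

        buf->data = tmp;
        buf->allocated_length = new_size;
    }

    memcpy(buf->data + buf->used, src, len);
    buf->used += len;
    return 0;
}
```

On failure, realloc() leaves the original block untouched, so assigning through a temporary both preserves the old data and prevents the NULL-plus-offset write described above.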

Ignoring more subtle failures that don't cause an immediate crash can lead to far more serious consequences. Paul Starzetz, an accomplished researcher, discovered a perfect example of a subtle failure in the Linux kernel's memory management code. The do_mremap() code is shown in Listing 7-26.

Listing 7-26. Linux do_mremap() Vulnerability

    /* new_addr is valid only if MREMAP_FIXED is
       specified */
    if (flags & MREMAP_FIXED) {
            if (new_addr & ~PAGE_MASK)
                    goto out;
            if (!(flags & MREMAP_MAYMOVE))
                    goto out;
            if (new_len > TASK_SIZE
                || new_addr > TASK_SIZE - new_len)
                    goto out;

            /* Check if the location you're moving into
             * overlaps the old location at all, and
             * fail if it does.
             */
            if ((new_addr <= addr)
                && (new_addr+new_len) > addr)
                    goto out;

            if ((addr <= new_addr)
                && (addr+old_len) > new_addr)
                    goto out;

            do_munmap(current->mm, new_addr, new_len);
    }

    /*
     * Always allow a shrinking remap: that just unmaps
     * the unnecessary pages.
     */
    ret = addr;
    if (old_len >= new_len) {
            do_munmap(current->mm, addr+new_len,
                      old_len - new_len);
            if (!(flags & MREMAP_FIXED)
                || (new_addr == addr))
                    goto out;
    }

The vulnerability in this code is that the do_munmap() function could be made to fail. A number of conditions can cause it to fail; the easiest is exhausting maximum resource limits when splitting an existing virtual memory area. If the do_munmap() function fails, it returns an error code, which do_mremap() completely ignores. The result of ignoring this return value is that the virtual memory area (VMA) structures used to represent page ranges for processes can be made inconsistent by having page table entries overlapped in two VMAs or totally unaccounted-for VMAs. Through a novel exploitation method using the page-caching system, arbitrary pages could be mapped erroneously into other processes, resulting in a privilege escalation condition. More information on this vulnerability is available at www.isec.pl/vulnerabilities/isec-0014-mremap-unmap.txt.

Generally speaking, if a function call returns a value, as opposed to returning nothing (such as a void function), a conditional statement should follow each function call to test for success or failure. Notable exceptions are cases in which the function terminates the application via a call to an exit routine or errors are handled by an exception mechanism in a separate block of code. If no check is made to test for success or failure of a function, the code auditor should take note of the location where the value is untested.

Taking this investigation a step further, the auditor can then ask what the implications are of ignoring this return value. The answer depends on what can possibly go wrong in the function. The best way to find out exactly what can go wrong is to examine the target function and locate each point at which the function can return. Usually, several error conditions exist that cause the function to return as well as one return at successful completion of its task. The most interesting cases for auditors to examine, naturally, are those in which errors do occur. After identifying all the ways in which the function might return, the auditor has a list of possible error conditions that can escape undetected in the application. After compiling this list, any conditions that are impossible for users to trigger can be classified as a lower risk, and auditors can focus on conditions users are able to trigger (even indirectly, such as a memory allocation failure). Listing 7-27 provides an opportunity to apply this investigation process to a simple code block.

Listing 7-27. Finding Return Values

int read_data(int sockfd, char **buffer, int *length)
{
    char *data;
    int n, size = MAX_SIZE;

    if(!(data = (char *)calloc(MAX_SIZE, sizeof(char))))
        return 1;

    if((n = read(sockfd, data, size)) <= 0)
        return 1;

    *length = n;
    *buffer = data;

    return 0;
}

Assume you have noticed a case in which the caller doesn't check the return value of this function, so you decide to investigate to see what can possibly go wrong. The function can return in three different ways: if the call to calloc() fails, if the call to read() fails, or if the function successfully returns. Obviously the most interesting cases are the two error conditions, which should be noted in your audit log. An error condition occurs when the call to calloc() fails because the memory of the process has been exhausted. (Causing the program to exhaust its memory is tricky, but it's certainly possible and worth considering.) An error condition can also occur when read() returns an error or zero to indicate the stream is closed, which is probably quite easy to trigger. The implications of ignoring the return value to this function depend on operations following the function call in the calling routine, but you can immediately deduce that they're probably quite serious. How do you know this? The buffer and length arguments are never initialized if the function fails, so if the caller fails to check for failure, most likely it continues processing under the assumption that the buffer contains a pointer to some valid memory region with bytes in it to process. Listing 7-28 shows an example of what this type of calling function might look like.

Listing 7-28. Ignoring Return Values

int process_request(int sockfd)
{
    char *request;
    int len, reqtype;

    read_data(sockfd, &request, &len);

    reqtype = get_token(request, len);

    ...
}

The code is written with the assumption that read_data() returned successfully and passes what should be a character buffer and the number of bytes in it to the function get_token(), presumably to get a keyword out of the request buffer to determine what type of request is being issued. Because read_data() isn't checked for success, it turns out that two uninitialized stack variables could be supplied to get_token(): request, which is expected to point to some allocated memory, and len, which is expected to indicate the number of bytes read off the network into request. Although the exact consequences of this error depend on how get_token() operates, you know from the discussion earlier in this chapter that processing uninitialized variables can have severe consequences, so ignoring the return value of read_data() probably has serious implications. These implications range from a best-case scenario of just crashing the application to a worst-case scenario of corrupting memory in an exploitable fashion. Pay close attention to how small differences in the caller could affect the significance of these errors. As an example, take a look at this slightly modified calling function:

int process_request(int sockfd)
{
    char *request = NULL;
    int len = 0, reqtype;

    read_data(sockfd, &request, &len);

    reqtype = get_token(request, len);

    ...
}


Here, you have the same function with one key difference: The stack variables passed to read_data() are initialized to zero. This small change in the code drastically affects the seriousness of ignoring the return value of read_data(). Now the worst thing that can happen is that the program can be made to crash unexpectedly, which although undesirable, isn't nearly as serious as the memory corruption that was possible in the function's original version. That being said, err on the side of caution when estimating the impact of return values, as crashing the application might not be the end of the story. The application might have signal handlers or exception handlers that perform some program maintenance before terminating the process, and they could provide some opportunity for exploitation (although probably not in this example).
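For contrast, here is a minimal sketch of a caller that handles the failure explicitly. read_data_stub() is a hypothetical stand-in for the chapter's read_data(), written only so the example is self-contained:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for read_data(): fails when the
 * descriptor is invalid, otherwise hands back a small buffer. */
static int read_data_stub(int sockfd, char **buffer, int *length)
{
    char *p;

    if (sockfd < 0)
        return -1;

    p = malloc(4);
    if (p == NULL)
        return -1;

    memcpy(p, "GET", 4);
    *buffer = p;
    *length = 3;
    return 0;
}

/* The caller bails out on failure, so request and len are
 * never used while uninitialized. */
int process_request_checked(int sockfd)
{
    char *request;
    int len;

    if (read_data_stub(sockfd, &request, &len) != 0)
        return -1;

    /* ... token processing would go here ... */
    free(request);
    return 0;
}
```

Checking the return value before touching the output arguments removes both the uninitialized-pointer and the NULL-pointer variants of the bug in one stroke.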

Misinterpreting Return Values

Another situation that could cause problems happens when a return value of a function call is tested or utilized, but the calling function misinterprets it. A return value could be misinterpreted in two ways: A programmer might simply misunderstand the meaning of the return value, or the return value might be involved in a type conversion that causes its intended meaning to change. You learned about type conversion problems in Chapter 6, so this section focuses mainly on errors related to the programmer misinterpreting a return value.

This type of programmer error might seem unlikely or uncommon, but it tends to occur quite often in production code, especially when a team of programmers is developing an application and using third-party code and libraries. Often developers might not fully understand the external code's correct use, the external code might change during the development process, or specifications and documentation for the external code could be incorrect. Programmers can also misuse well-known APIs, such as the language's runtime library, because of a lack of familiarity or simple carelessness. To understand this point, consider the following code:

#define SIZE(x, y) (sizeof(x) - ((y) - (x)))

char buf[1024], *ptr;

ptr = buf;
ptr += snprintf(ptr, SIZE(buf, ptr), "user: %s\n", username);
ptr += snprintf(ptr, SIZE(buf, ptr), "pass: %s\n", password);
...


This code contains a simple mistake. On UNIX machines, the snprintf() function typically returns the number of bytes it would have written to the destination, had there been enough room. Therefore, the first call to snprintf() might return a value larger than sizeof(buf) if the username variable is very long. The result is that the ptr variable is incremented outside the buffer's bounds, and the second call to snprintf() could corrupt memory due to an integer overflow in the SIZE macro: once ptr passes the end of buf, the subtraction produces a negative value that is converted to a huge unsigned size. Hence, the password written into the buffer could be arbitrarily large.
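One possible fix, assuming the same two-write pattern, is to clamp snprintf()'s return value to the space actually remaining before advancing an offset. The append_field() helper here is hypothetical, not code from the text:

```c
#include <stdio.h>
#include <string.h>

/* Illustrative: append "label: value\n" at buf+off, returning the
 * new offset, clamped so it can never pass the end of buf. */
size_t append_field(char *buf, size_t bufsize, size_t off,
                    const char *label, const char *value)
{
    int n;

    if (off >= bufsize)
        return off;                 /* no room left; drop the field */

    n = snprintf(buf + off, bufsize - off, "%s: %s\n", label, value);
    if (n < 0)
        return off;                 /* encoding error; leave offset alone */

    /* snprintf() returns the length it *wanted* to write; clamp it
     * to what actually fit. */
    if ((size_t)n > bufsize - off - 1)
        return bufsize - 1;         /* output was truncated */

    return off + (size_t)n;
}
```

Because the offset is clamped, a very long username merely truncates the output; it can no longer push subsequent writes outside the buffer.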

Vulnerabilities that arise from misinterpreting return values are often quite subtle and difficult to spot. The best way to go about finding these vulnerabilities is by taking this systematic approach when examining a function:

  1. Determine the intended meaning of the return value for the function. When the code is well commented or documented, the auditor might have a good idea of its meaning even before looking at the code; however, verifying that the function returns what the documenter says it does is still important.

  2. Look at each location in the application where the function is called and see what it does with the return value. Is it consistent with that return value's intended meaning?

The first step raises an interesting point: Occasionally, the fault of a misinterpreted return value isn't with the calling function, but with the called function. That is, sometimes the function returns a value that's outside the documented or specified range of expected return values, or it's within the range of valid values but is incorrect for the circumstance. This error is usually caused by a minor oversight on the application developer's part, but the consequences can be quite drastic. For example, take a look at Listing 7-29.

Listing 7-29. Unexpected Return Values

int authenticate(int sock, int auth_type, char *login)
{
    struct key *k;
    char *pass;

    switch(auth_type){
        case AUTH_USER:
            if(!(pass = read_string(sock)))
                return -1;

            return verify_password(login, pass);

        case AUTH_KEY:
            if(!(k = read_key(sock)))
                return 0;

            return verify_key(login, k);

        default:
            return 0;
    }
}

int check_credentials(int sock)
{
    int auth_type, authenticated = 0;

    auth_type = read_int(sock);

    authenticated = authenticate(sock, auth_type, login);

    if(!authenticated)
        die("couldn't authenticate %s\n", login);

    return 0;
}

Assume that the authenticate() function in Listing 7-29 is supposed to return 1 to indicate success or 0 to indicate failure. You can see, however, that a mistake was made because failure can cause the function to return -1 rather than 0. Because of the way the return value is checked (by testing it for zero or non-zero), this small logic flaw could allow users to log in even though their credentials are totally invalid! However, this program wouldn't be vulnerable if the return value check specifically tested for the value of 1, as in this example:

if(authenticated != 1)     .. error ..


Non-zero values represent true in a Boolean comparison, so it's easy to see how such a misunderstanding could happen. To spot these errors, auditors can use a process similar to the one for identifying the implications of ignored return values:

  1. Determine all the points in a function where it might return: Again, usually there are multiple points where it might return because of errors and one point at which it returns because of successful completion.

  2. Examine the value being returned: Is it within the range of expected return values? Is it appropriate for indicating the condition that caused the function to return?

If you find a spot where an incorrect value is returned from a function, you should take note of its location and then evaluate its significance based on how the return value is interpreted in every place where the function is called. Because this process is so similar to determining the implications of ignoring the current function's return value, both tasks can and should be integrated into one process to save time. For example, say you're auditing the following function:

int read_data(int sockfd, char **buffer, int *length)
{
    char *data;
    int n, size = MAX_SIZE;

    if(!(data = (char *)calloc(MAX_SIZE, sizeof(char))))
        return 0;

    if((n = read(sockfd, data, size)) <= 0)
        return -1;

    *length = n;
    *buffer = data;

    return 0;
}


The function audit logs presented earlier in this chapter provide an ideal way to capture all the important information about return values for the read_data() function presented here. Table 7-2 shows the rows of an audit log entry that capture all the relevant information on this function's expected return values.

Table 7-2. Return Values from Sample Audit Log

Return value type

32-bit signed integer

Return value meaning

Indicates error: 0 for success or -1 for error


The implications of incorrect return values or of a calling function ignoring return values aren't listed in the table, as those implications vary depending on the calling function. Auditors could track this information in notes they keep on the process_request() and process_login() functions. Keeping a log for every function in a large application would be quite tedious (not to mention time consuming), so you might choose to skip logging a function when either of two conditions holds: the function is never called in a context influenced by users who are potential attackers, such as configuration file utility functions, or the function is so small and simple that it's easy to remember how it operates.

Keeping these logs might seem excessive because after reading the code, you know all the information needed to audit a function's use; however, there are two compelling reasons for writing down this information:

  • Applications can be arbitrarily complex, and functions might be called in hundreds of different places, each with slightly differing sets of circumstances.

  • When the application is updated, it's helpful to have a set of notes you can refer to if you want to see whether the changes have an interesting impact. The small nuances of functions are easy to forget over time, and this way, you can refer to your notes without reading the application code again, or worse, assuming you know how the application works and missing new vulnerabilities.

The second way function return values can be misinterpreted is a type conversion that causes the return value's meaning to change. This misinterpretation is an extension of the first kind of misinterpretation: the calling function simply misunderstands the meaning of the value. You have already learned about type conversion issues in Chapter 6, so you don't need to revisit them. However, be aware that when a return value is tested and discarded or stored in a variable for later use, determining the type conversions that take place during each subsequent use of the value is essential. When the return value is tested and discarded, you need to consider the type conversion rules to verify that the value is being interpreted as intended. When the return value is stored, you should examine the type of variable it's stored in to ensure that it's consistent with the type of the function's return value.
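To make the conversion hazard concrete, here is a minimal sketch of my own (not from the text) in which storing a signed return value in an unsigned variable silently disables the error check:

```c
/* Stand-in for a function that returns a length, or -1 on error. */
static int get_length(int fail)
{
    return fail ? -1 : 128;
}

/* BAD: len is unsigned, so it can never be negative; the error
 * check below is dead code, and -1 silently becomes a huge length. */
unsigned int broken_check(int fail)
{
    unsigned int len = get_length(fail);

    if (len < 0)          /* always false after the implicit conversion */
        return 0;

    return len;
}
```

A compiler warning about a comparison that is always false is often the only hint; the audit log's return value type row is what lets you catch this mismatch on review.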

The return value log shown in Table 7-2 can help you discover vulnerabilities related to return value type conversions. In particular, the Return value type and Return value meaning rows serve as a brief summary of how the return value is intended to be used. So if a type conversion takes place, you can quickly see whether parts of the return value meaning could be lost or misinterpreted by a type conversion (such as negative values).

Function Side-Effects

Side-effects occur when a function alters the program state in addition to any values it returns. A function that does not generate any side-effects is considered referentially transparent; that is, the function call can be replaced directly with the return value. In contrast, a function that causes side-effects is considered referentially opaque. Function side effects are an essential part of most programming languages. They allow the programmer to alter elements of the program state or return additional pieces of data beyond what the return value can contain. In this section, you will explore the impact of two very specific function side effects: manipulating arguments passed by reference (value-result arguments) and manipulating globally scoped variables.
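As a quick illustration (my own, not from the text): the first function below is referentially transparent, and the second is referentially opaque because every call mutates global state:

```c
/* Referentially transparent: square(3) can always be replaced by 9. */
int square(int x)
{
    return x * x;
}

static int counter;

/* Referentially opaque: each call has the side effect of advancing
 * the global counter, so two calls are not interchangeable. */
int next_id(void)
{
    return ++counter;
}
```

When auditing, referentially opaque functions deserve the extra log entries, because their effects on the program aren't visible at the call site.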

Vulnerabilities resulting from manipulating pass-by-reference arguments can occur because the calling function's author neglects to account for the possibility of changes to the arguments, or because the function can be made to manipulate its arguments in an unanticipated or inconsistent fashion. One of the more common situations in which this bug can occur is when realloc() is used to resize a buffer passed as a pointer argument. The vulnerability usually occurs for one of two reasons: The calling function has a pointer that was not updated after a call to realloc(), or the new allocation size is incorrect because of a length miscalculation. Listing 7-30 shows an example of a function that reallocates a buffer passed by reference, resulting in the calling function referencing an outdated pointer.

Listing 7-30. Outdated Pointer Vulnerability

int buffer_append(struct data_buffer *buffer, char *data,
                  size_t n)
{
    if(buffer->size - buffer->used < n){
        if(!(buffer->data =
             realloc(buffer->data, buffer->size+n)))
            return -1;

        buffer->size = buffer->size+n;
    }

    memcpy(buffer->data + buffer->used, data, n);
    buffer->used += n;

    return 0;
}

int read_line(int sockfd, struct data_buffer *buffer)
{
    char data[1024], *ptr;
    int n, nl = 0;

    for(;;){
        n = read(sockfd, data, sizeof(data)-1);

        if(n <= 0)
            return 1;

        if((ptr = strchr(data, '\n'))){
            n = ptr - data;
            nl = 1;
        }

        data[n] = '\0';

        if(buffer_append(buffer, data, n) < 0)
            return -1;

        if(nl)
            break;
    }

    return 0;
}

int process_token_string(int sockfd)
{
    struct data_buffer *buffer;
    char *tokstart, *tokend;
    int i;

    buffer = buffer_allocate();

    if(!buffer)
        goto err;

    for(i = 0; i < 5; i++){
        if(read_line(sockfd, buffer) < 0)
            goto err;

        tokstart = strchr(buffer->data, ':');

        if(!tokstart)
            goto err;

        for(;;){
            tokend = strchr(tokstart+1, ':');

            if(tokend)
                break;

            if(read_line(sockfd, buffer) < 0)
                goto err;
        }

        *tokend = '\0';

        process_token(tokstart+1);

        buffer_clear(buffer);
    }

    return 0;

err:
    if(buffer)
        buffer_free(buffer);

    return 1;
}

The process_token_string() function reads five tokens that are delimited by a colon character and can expand to multiple lines. During token processing, the read_line() function is called to retrieve another line of data from the network. This function then calls buffer_append(), which reallocates the buffer when there's not enough room to store the newly read line. The problem is that when a reallocation occurs, the process_token_string() function might end up with two outdated pointers that referenced the original buffer: tokstart and tokend. Both of these outdated pointers are then manipulated, resulting in memory corruption.
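A hedged sketch of the usual repair: track offsets into the buffer rather than raw pointers, so a reallocation inside an append routine cannot leave the caller holding stale addresses. The structure mirrors Listing 7-30, but the names and simplified growth policy are illustrative:

```c
#include <stdlib.h>
#include <string.h>

struct data_buffer {
    char *data;
    size_t size;
    size_t used;
};

static int buffer_append(struct data_buffer *b, const char *src, size_t n)
{
    if (b->size - b->used < n) {
        char *tmp = realloc(b->data, b->used + n);

        if (tmp == NULL)
            return -1;

        b->data = tmp;
        b->size = b->used + n;
    }

    memcpy(b->data + b->used, src, n);
    b->used += n;
    return 0;
}

/* The caller remembers *where* the delimiter search starts, not its
 * address; b->data + tok_off is recomputed after every append, so
 * reallocation can't invalidate it. Returns the delimiter's offset
 * or -1 if it isn't present yet. */
int find_token(struct data_buffer *b, size_t tok_off, char delim)
{
    char *p = memchr(b->data + tok_off, delim, b->used - tok_off);

    return p ? (int)(p - b->data) : -1;
}
```

Storing offsets is usually the simplest fix; the alternative, re-deriving every cached pointer after each call that may reallocate, is easy to get wrong as the code evolves.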

As you can see, these outdated pointer bugs are generally spread out between several functions, making them much harder to find. So it helps to have a little more practice in identifying code paths vulnerable to these issues. Listing 7-31 shows another example of outdated pointer use due to buffer reallocation, this time taken from ProFTPD 1.2.7 through 1.2.9rc2.

Listing 7-31. Outdated Pointer Use in ProFTPD

static void _xlate_ascii_write(char **buf, unsigned int *buflen,
    unsigned int bufsize, unsigned int *expand)
{
  char *tmpbuf = *buf;
  unsigned int tmplen = *buflen;
  unsigned int lfcount = 0;
  int res = 0;

  register unsigned int i = 0;

  /* Make sure this is zero (could be a holdover from a
     previous call). */
  *expand = 0;

  /* First, determine how many bare LFs are present. */
  if (tmpbuf[0] == '\n')
    lfcount++;

  for (i = 1; i < tmplen; i++)
    if (tmpbuf[i] == '\n' && tmpbuf[i-1] != '\r')
      lfcount++;

The _xlate_ascii_write() function checks how many newline characters are in the file being transmitted. In ASCII FTP modes, each newline must be prepended with a carriage return, so the program developers want to allocate a buffer big enough for those extra carriage returns to compensate for ASCII file transfers. The buffer being reallocated is the destination buffer, the first argument to the _xlate_ascii_write() function. If a reallocation occurs, the destination buffer is updated, as shown in the following code:

if ((res = (bufsize - tmplen - lfcount)) < 0) {
  pool *copy_pool = make_sub_pool(session.xfer.p);
  char *copy_buf = pcalloc(copy_pool, tmplen);

  memmove(copy_buf, tmpbuf, tmplen);

  /* Allocate a new session.xfer.buf of the needed size. */
  session.xfer.bufsize = tmplen + lfcount;
  session.xfer.buf = pcalloc(session.xfer.p,
                             session.xfer.bufsize);

  ... do more stuff ...

  *buf = tmpbuf;
  *buflen = tmplen + (*expand);
}


The preceding code is fine, but look at the code that calls _xlate_ascii_write():

int data_xfer(char *cl_buf, int cl_size)
{
  char *buf = session.xfer.buf;
  int len = 0;
  int total = 0;

  ... does some stuff ...

      while (size) {
        char *wb = buf;
        unsigned int wsize = size, adjlen = 0;

        if (session.flags & (SF_ASCII|SF_ASCII_OVERRIDE))
          _xlate_ascii_write(&wb, &wsize, session.xfer.bufsize,
                             &adjlen);

        if(pr_netio_write(session.d->outstrm, wb, wsize) == -1)
          return -1;


The data_xfer() function has a loop for transferring a certain amount of data for each iteration. Each loop iteration, however, resets the input buffer to the original session.xfer.buf, which might have been reallocated in _xlate_ascii_write(). Furthermore, session.xfer.bufsize is passed as the length of the buffer, which _xlate_ascii_write() also might have updated. As a result, if _xlate_ascii_write() ever reallocates the buffer, any subsequent loop iterations use an outdated pointer with an invalid size!
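The general defense, sketched minimally here (the global names are illustrative, not ProFTPD's), is to re-read any globally stored buffer pointer and size on every loop iteration instead of caching them before the loop:

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative stand-in for a session-global transfer buffer. */
static struct { char *buf; unsigned int bufsize; } xfer;

/* A helper that may reallocate the global buffer, much as
 * _xlate_ascii_write() can. */
static int maybe_grow(unsigned int need)
{
    if (need > xfer.bufsize) {
        char *tmp = realloc(xfer.buf, need);

        if (tmp == NULL)
            return -1;

        xfer.buf = tmp;
        xfer.bufsize = need;
    }
    return 0;
}

int transfer_loop(const char *src, unsigned int total, unsigned int chunk)
{
    unsigned int done = 0;

    while (done < total) {
        unsigned int n = (total - done < chunk) ? total - done : chunk;

        if (maybe_grow(n) < 0)
            return -1;

        /* Re-read xfer.buf here, *after* any reallocation, instead of
         * using a pointer captured before the loop. */
        memcpy(xfer.buf, src + done, n);
        done += n;
    }
    return 0;
}
```

The key difference from data_xfer() is that no copy of the buffer pointer or its size survives across an iteration boundary where a reallocation could occur.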

The previous examples centered on reallocating memory blocks. Similar errors have been uncovered in a number of applications over the past few years. Sometimes unique situations that are less obvious crop up. The code in Listing 7-32 is taken from the prescan() function in Sendmail. The vulnerability involves updating an argument to prescan() (the delimptr argument) to point to invalid data when certain error conditions cause the function to terminate unexpectedly during a nested loop. This vulnerability revolves around the p variable being incremented as the prescan() function reads in a character.

Listing 7-32. Sendmail Return Value Update Vulnerability

/* read a new input character */
c = (*p++) & 0x00ff;
if (c == '\0')
{
    /* diagnose and patch up bad syntax */
    if (state == QST)
    {
        usrerr("553 Unbalanced '\"'");
        c = '"';
    }
    else if (cmntcnt > 0)
    {
        usrerr("553 Unbalanced '('");
        c = ')';
    }
    else if (anglecnt > 0)
    {
        c = '>';
        usrerr("553 Unbalanced '<'");
    }
    else
        break;
    p--;

When the end of the string is encountered, the break statement is executed and the inner loop is broken out of. A token is then written to the output avp token list, as shown in the following code:

    /* new token */
    if (tok != q)
    {
        /* see if there is room */
        if (q >= &pvpbuf[pvpbsize - 5])
            goto addrtoolong;
        *q++ = '\0';
        if (tTd(22, 36))
        {
            sm_dprintf("tok=");
            xputs(tok);
            sm_dprintf("\n");
        }
        if (avp >= &av[MAXATOM])
        {
            usrerr("553 5.1.0 prescan: too many tokens");
            goto returnnull;
        }
        if (q - tok > MAXNAME)
        {
            usrerr("553 5.1.0 prescan: token too long");
            goto returnnull;
        }
        *avp++ = tok;
    }
} while (c != '\0' && (c != delim || anglecnt > 0));


If an error condition is encountered (the token is too long or there are more than MAXATOM tokens), an error is indicated and the function returns. However, the delimptr argument is updated to point outside the bounds of the supplied string, as shown in this code:

returnnull:
    if (delimptr != NULL)
        *delimptr = p;
    CurEnv->e_to = saveto;
    return NULL;
}


When the error conditions shown earlier are triggered, the p variable points one byte past where the NUL byte was encountered, and delimptr is consequently updated to point to uninitialized stack data. Subsequent manipulation of this out-of-bounds pointer creates the possibility of exploitation.
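The shape of this bug can be reduced to a small, hypothetical sketch (the names are illustrative, not Sendmail's actual code): the scanning pointer advances one past each character it reads, so an exit path that skips the compensating decrement publishes an out-of-bounds pointer through the out-argument:

```c
#include <stddef.h>

/* Scans to the end of str; on return, *delimptr is meant to point at
 * the character that stopped the scan. Because p is advanced past the
 * byte just read and this exit path performs no p--, *delimptr ends up
 * one byte past the terminating NUL, outside the supplied string. */
const char *scan_token(const char *str, const char **delimptr)
{
    const char *p = str;
    char c;

    for (;;) {
        c = *p++;              /* p now points one past c */
        if (c == '\0')
            break;             /* exit path: no compensating p-- */
    }

    if (delimptr != NULL)
        *delimptr = p;         /* one byte past the terminating NUL */
    return NULL;
}
```

A caller that resumes parsing from *delimptr then reads memory beyond the string it supplied, which in prescan() meant uninitialized stack data.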

When reviewing an application, code auditors should make note of security-relevant functions that manipulate pass-by-reference arguments, as well as the specific manner in which they perform this manipulation. These kinds of argument manipulations often use opaque pointers with an associated set of manipulation functions. This type of manipulation is also an inherent part of C++ classes, as they implicitly pass a reference to the this pointer. However, C++ member functions can be harder to review due to the number of implicit functions that may be called and the fact that the code paths do not follow a more direct procedural structure. Regardless of the language though, the best way to determine the risk of a pass-by-reference manipulation is to follow this simple process:

  1. Find all locations in a function where pass-by-reference arguments are modified, particularly structure arguments, such as the buffer structure in Listing 7-25.

  2. Differentiate between mandatory modification and optional modification. Mandatory modification occurs every time the function is called; optional modification occurs when an abnormal situation arises. Programmers are more likely to overlook exceptional conditions related to optional modification.

  3. Examine how calling functions use the modified arguments after the function has returned.

In addition, note when arguments aren't updated when they should be. Recall the read_line() function used to illustrate return value testing (see Listing 7-30). When the data allocation or read function failed, arguments that were intended to be updated every time were left untouched. Also, pay close attention to what happens when functions return early because of some error: Are arguments that should be updated left stale for some reason? You might think that if the caller tests return values correctly, unmodified arguments wouldn't be an issue; however, there are definitely cases in which arguments are supposed to be updated even when errors occur (such as the Sendmail example shown in Listing 7-32). Therefore, even though the error might be detected correctly, the program is still vulnerable to misuse because arguments aren't updated correctly.

To help identify these issues with argument manipulation, use your function audit logs to identify where pass-by-reference arguments are modified in the function and any cases in which pass-by-reference arguments aren't modified. Then examine calling functions to determine the implications (if any) of these updates or lack of updates. To incorporate this check, you could add some rows to the audit log, as shown in Table 7-3.

Table 7-3. Rows to Add to the Function Audit Log

Mandatory modifications

  char **buffer (second argument): Updated with a data buffer that's allocated within the function.

  int *length (third argument): Updated with the number of bytes read into **buffer for processing.

Optional modifications

  None

Exceptions

  Neither argument is updated if the buffer allocation fails or the call to read() fails.
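A hypothetical read_line() matching these audit-log rows might look like the following sketch (the 1024-byte allocation and fgets()-based read are illustrative choices). Note that neither out-parameter is touched on either failure path:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* On success, *buffer and *length are mandatory modifications.
 * On failure, neither is updated, so a caller that ignores the
 * return value keeps operating on stale values. */
int read_line(FILE *fp, char **buffer, int *length)
{
    char *data = malloc(1024);
    if (data == NULL)
        return -1;                 /* *buffer, *length left unchanged */

    if (fgets(data, 1024, fp) == NULL) {
        free(data);
        return -1;                 /* same: no update on read failure */
    }

    *buffer = data;                /* mandatory modification */
    *length = (int)strlen(data);   /* mandatory modification */
    return 0;
}
```

The exceptions row of the audit log exists precisely to capture these two early returns, so that every caller can be checked for how it behaves when the updates don't happen.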


Auditing functions that modify global variables requires essentially the same thought processes as auditing functions that manipulate pass-by-reference arguments. The process involves auditing each function and enumerating the situations in which it modifies global variables. However, vulnerabilities introduced by modifying global variables might be more subtle because any number of different functions can make use of a global variable and, therefore, expect it to be in a particular state. This is especially true for code that can run at any point in the program, such as an exception handler or signal handler.

In practice, you can conduct this analysis along with argument manipulation analysis when you're creating function audit logs. You can place the notes about global variable modification in the rows for modifications. There may be a little more work in determining the implications of modifying global variables, however. To evaluate the risk of these variables being modified (or not modified when they should be), simply look at every instance in which the global variable is used. If you find a case in which a global variable is assumed to be initialized or updated in a certain way, attackers might be able to leverage the application when functions that are supposed to operate on the global variable don't or when functions modify it unexpectedly. In Listing 7-4, you saw an example of this kind of vulnerability in OpenSSH with the global buffer structure variables. In that code, the destruction functions called by fatal() make an assumption about their state being consistent.
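The hazard can be sketched with a small, hypothetical example (not the actual OpenSSH code): a cleanup routine that assumes a global buffer is consistent, and an update function that must preserve that invariant even on its error path:

```c
#include <stdlib.h>

/* Globals assumed consistent by every function that touches them:
 * g_buf is either NULL or a valid heap allocation of g_size bytes. */
static char *g_buf = NULL;
static size_t g_size = 0;

/* Error-path cleanup, analogous to the buffer destruction performed
 * by fatal() in OpenSSH: it trusts the invariant above. */
void fatal_cleanup(void)
{
    free(g_buf);       /* double-free if g_buf is stale */
    g_buf = NULL;
    g_size = 0;
}

/* Correct ordering: the globals are updated only once the new block
 * is known to be valid. Had grow() freed g_buf first and stored the
 * new pointer later, any error handler running in that window would
 * see the global in an inconsistent state. */
int grow(size_t n)
{
    char *nb = realloc(g_buf, n);
    if (nb == NULL) {
        fatal_cleanup();
        return -1;
    }
    g_buf = nb;
    g_size = n;
    return 0;
}
```

When you log a function like grow() in the audit log, the note to record is the window (if any) during which the global invariant doesn't hold and which error or signal paths could run inside it.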

In object-oriented programs, it can be much harder to determine whether global variables are susceptible to misuse because of unexpected modification. The difficulty arises because the order of execution of constituent member functions often isn't clear. In these cases, it is best to examine each function that makes use of the global variable and then attempt to come up with a situation in which a vulnerability could happen. For example, say you have two classes, C and D. C has member functions cX, cY, and cZ, and D has member functions dX, dY, and dZ. If you spot a potentially unexpected modification of a global variable in cX, and then see that global variable manipulated in dY and dZ, the challenge is to determine whether the cX function can be called in such a way that the global variable is updated in an unexpected fashion, and dY and dZ can operate on the global variable when it's in this inconsistent state.

Argument Meaning

Chapter 2 presented clarity as a design principle that affects the security of a system. Misleading or confusing function arguments provide a very immediate example of just this issue. Any confusion over the intended meaning of arguments can have serious security implications because the function doesn't perform as the developer expected. An argument's "intended meaning" generally means the data type the function expects for that argument and what the data stored in that argument is supposed to represent.

When auditing a function for vulnerabilities related to incorrect arguments being supplied, the process is as follows:

  1. List the type and intended meaning of each argument to a function.

  2. Examine all the calling functions to determine whether type conversions or incorrect arguments could be supplied.

The first thing to check for is type conversions. Type conversions actually occur often in arguments passed to a function, but most of the time they don't cause security-relevant problems. For example, integers are often passed to read() as the third argument, where they're converted to a size_t, but usually this conversion doesn't matter because the integer is a constant value. For each function call they analyze, code auditors should note any type conversions that do occur and how that argument is used in the function being audited. The conversion might become an issue if the interpretation of the argument can change based on a sign change. The issue might be significant if the argument's bit pattern changes during the type conversion (as in a sign extension) because the application developer probably didn't expect this type conversion.
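As a minimal sketch of the sign-change case (the helper name is hypothetical), consider a length argument that undergoes the same int-to-size_t conversion that occurs implicitly in a call such as read(fd, buf, len):

```c
#include <stddef.h>
#include <stdint.h>

/* A negative int converted to size_t sign-extends into a huge
 * unsigned value, so a bounds check the caller performed on the
 * signed value no longer means what it appears to mean. */
size_t as_size(int len)
{
    return (size_t)len;
}
```

A constant or validated non-negative length converts harmlessly; the conversion becomes security relevant only when an attacker can drive len negative, which is exactly the condition worth recording in the audit log.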

Next, examine the argument's intended meaning, which can usually be determined by observing the context in which it's used. If a function's interface is unclear or misleading, an application developer can easily misunderstand how to use the function correctly, which can introduce subtle vulnerabilities. Chapter 8, "Strings and Metacharacters," presents examples involving MultiByteToWideChar() and other similar functions that illustrate a common mistake made in code dealing with wide characters. Often, in these functions, length arguments indicate a destination buffer's size in wide characters, not in bytes. Confusing these two data sizes is an easy mistake to make, and the result of mixing them up is usually a buffer overflow.
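The wide-character confusion can be sketched with a hypothetical copy routine (the name and interface are illustrative, not the actual Windows API): its len argument is a count of wide characters, so a caller that passes sizeof(buf), a byte count, overstates the capacity by a factor of sizeof(wchar_t):

```c
#include <wchar.h>
#include <stddef.h>

/* Copies up to len - 1 wide characters and always NUL-terminates.
 * len is the destination capacity in WIDE CHARACTERS, not bytes. */
wchar_t *copy_wide(wchar_t *dest, const wchar_t *src, size_t len)
{
    size_t i;
    if (len == 0)
        return dest;
    for (i = 0; i + 1 < len && src[i] != L'\0'; i++)
        dest[i] = src[i];
    dest[i] = L'\0';
    return dest;
}
```

For a local array declared as wchar_t buf[4], the correct call is copy_wide(buf, src, 4); passing sizeof(buf) instead claims a capacity of 4 * sizeof(wchar_t) elements and overflows the buffer.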

So how do you find vulnerabilities of this nature? You need to understand exactly how the function works and what arguments are used for in the function. The general rule is this: The more difficult the function is to figure out, the more likely it is that it will be used incorrectly. As with the other elements of function auditing, making a log recording the meaning of different arguments is recommended. This log can be used with the argument modification log because similar information is being recorded; basically, you want to know what arguments are required, how they are used, and what happens to these arguments throughout the course of the function. Table 7-4 shows an example of a function arguments log.

Table 7-4. Function Argument Audit Log

Argument 1 prototype

  wchar_t *dest

Argument 1 meaning

  Destination buffer that data is copied into from the source buffer

Argument 2 prototype

  wchar_t *src

Argument 2 meaning

  Source buffer that wide characters are copied from

Argument 3 prototype

  size_t len

Argument 3 meaning

  Maximum size in wide characters of the destination buffer (doesn't include a NUL terminator)

Implications

  NUL termination is guaranteed.

  The len parameter doesn't include the NUL terminator character, so the NUL character can be written out of bounds if the supplied len is the exact size of the buffer divided by 2.

  The length parameter is in wide characters; callers might accidentally use sizeof(buf), resulting in an overflow.

  If 0 is supplied as len, it's decremented to -1, and an infinite copy occurs.

  If a length of -1 is supplied, it's artificially set to 256.


Table 7-4 lists a prototype and the intended meaning for each argument. Probably the most important part of the log is the implications list, which summarizes how application programmers could use the function incorrectly and notes any idiosyncrasies in the function that might cause exploitable conditions. After compiling this list, you can reference it at each location where the function is called and attempt to determine whether any conditions in the list can be true in the calling functions. In the sample function in Table 7-4, quite a few conditions result in the function doing something it shouldn't. It's an example of a function with an awkward interface, as it can be called incorrectly in so many ways that it would be quite easy for an application developer to misuse it.
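The len-of-zero implication, for instance, reduces to a one-line unsigned underflow (the helper shown is hypothetical, for illustration only):

```c
#include <stddef.h>
#include <stdint.h>

/* Reserving room for the terminator by decrementing an unsigned len
 * before use: a len of 0 wraps around to SIZE_MAX, producing the
 * "infinite copy" condition noted in the implications list. */
size_t usable_len(size_t len)
{
    return len - 1;    /* len == 0 underflows to SIZE_MAX */
}
```

At every call site, the auditor's question is whether an attacker can arrange for the length argument to be 0 (or -1) when the call is made.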

Ultimately, the trick to finding vulnerabilities related to misunderstanding function arguments is to be able to conceptualize a chunk of code in isolation. When you're attempting to understand how a function operates, carefully examine each condition that's directly influenced by the arguments and keep thinking about what boundary conditions might cause the function to be called incorrectly. This task takes a lot of practice, but the more time you spend doing it, the faster you can recognize potentially dangerous code constructs. Many functions perform similar operations (such as string copying and character expansion) and are, therefore, prone to similar misuses. As you gain experience auditing these functions, you can observe patterns common to exceptional conditions and, over time, become more efficient at recognizing problems. Spend some time ensuring that you account for all quirks of the function so that you're familiar with how it could be misused; you should be able to answer any questions about a function's quirks and log the answers so that the information is easily accessible later. The small details of what happens to an argument during function execution can present a whole range of opportunities for the function to be called incorrectly. Finally, be especially mindful of type conversions that happen with arguments, such as truncation when dealing with short integers, because they are susceptible to boundary issues (as discussed in Chapter 6).




The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities
ISBN: 0321444426
Year: 2004
Pages: 194