Auditing Variable Use | The Art of Software Security Assessment: Identifying and Preventing Software Vulnerabilities

Variables are objects used to store data elements that have some relevance to an application. They are given meaning by the way they're used: what's stored in them, what operations are performed on them, and what they represent. A large part of code auditing is based on understanding variables, their relationships to each other, and how an application can be affected adversely by unexpected manipulation of these relationships. This section discusses different techniques for recognizing variable and data structure misuse and presents several examples in popular applications to help reinforce the concepts.

Variable Relationships

Variables are related to each other if their values depend on each other in some fashion, or they are used together to represent some sort of application state. For example, a function might have one variable that points to a writeable location in a buffer and one variable that keeps track of the amount of space left in that buffer. These variables are related to each other, and their values should change in lockstep as the buffer is manipulated. The more variables used to represent state, the higher the chances that the variables can be manipulated to subvert the variable relationships, which can lead to an overall inconsistent state. As a code auditor, you must search for variables that are related to each other, determine their intended relationships, and then determine whether there's a way to desynchronize these variables from each other. This usually means finding a block of code that alters one variable in a fashion inconsistent with the other variables. Examples of this type of vulnerability can range from simple errors involving two variables in a loop to complicated ones involving many variables across multiple program modules that combine to represent complex state.

First, take a look at Listing 7-1, an example from the mod_dav Apache module. This code deals with CDATA XML elements.

Listing 7-1. Apache mod_dav CDATA Parsing Vulnerability

    cdata = s = apr_palloc(pool, len + 1);     for (scan = elem->first_cdata.first; scan != NULL;          scan = scan->next) {         tlen = strlen(scan->text);         memcpy(s, scan->text, tlen);         s += tlen;     }     for (child = elem->first_child; child != NULL;          child = child->next) {         for (scan = child->following_cdata.first;              scan != NULL;              scan = scan->next) {             tlen = strlen(scan->text);             memcpy(s, scan->text, tlen);             s += tlen;         }     }     *s = '\0';

In Listing 7-1, you can see that a data buffer, s (also set to cdata), is allocated via apr_palloc(), and then string data elements from two linked lists (elem->first_cdata.first and elem->first_child) are copied into the data buffer. The length of the cdata buffer, len, was calculated previously by two similar loops through the linked lists. At this point, you have two related variables you're interested in: a pointer to the buffer, cdata, and a variable representing the buffer's length, len. The preceding code is fine, but see what happens when mod_dav attempts to trim the buffer by pruning whitespace characters:

   if (strip_white) {        /* trim leading whitespace */        while (apr_isspace(*cdata)) /* assume: return false                                     * for '\0' */            ++cdata;        /* trim trailing whitespace */        while (len  > 0 && apr_isspace(cdata[len]))            continue;        cdata[len + 1] = '\0';    }    return cdata; }

The leading spaces are skipped by incrementing the cdata variable; however, the len variable doesn't get decremented to reflect the buffer's shrinking size. The relationship between the two variables has been rendered invalid. Therefore, when the trailing spaces are trimmed in the second while loop, the cdata[len] location can point outside the bounds of the buffer.

The previous example shows a reasonably straightforward error. Usually vulnerabilities of this nature are far more complicated because of several related variables combining to represent application state or complex code paths that allow more opportunities for variables to be desynchronized from one another. To see an example of these code paths, take a look at Listing 7-2, from the BIND 9.2.1 resolver code. This code has been shortened because it's quite long and rather difficult to follow.

Listing 7-2. Bind 9.2.1 Resolver Code gethostans() Vulnerability

static struct hostent * gethostans(struct irs_ho *this,        const u_char *ansbuf, int anslen,        const char *qname, int qtype,        int af, int size,    /* meaningless for addrinfo cases */        struct addrinfo **ret_aip, const struct addrinfo *pai) {     struct pvt *pvt = (struct pvt *)this->private;     int type, class, buflen, ancount, qdcount, n,         haveanswer, had_error;     int error = NETDB_SUCCESS, arcount;     int (*name_ok)(const char *);     const HEADER *hp;     const u_char *eom;     const u_char *eor;     const u_char *cp;     const char *tname;     const char *hname;     char *bp, **ap, **hap;     char tbuf[MAXDNAME+1];     struct addrinfo sentinel, *cur, ai;     const u_char *arp = NULL;     ...     eom = ansbuf + anslen;     ...     bp = pvt->hostbuf;     buflen = sizeof pvt->hostbuf;     cp = ansbuf + HFIXEDSZ;     ...     haveanswer = 0;     had_error = 0;     while (ancount > 0 && cp < eom && !had_error) {

Now look at the variables in play in the preceding code. Coming into this function, ansbuf is a pointer to a DNS response packet, and cp points to the first record in the packet after the DNS header. The pvt->hostbuf buffer holds hostnames read in from the DNS response. The buflen variable represents the amount of space left in the hostbuf buffer, and it's updated accordingly as the buffer is written into with each response from the packet. The bp variable holds the current write location in the hostname buffer. So every time bp is incremented to point further into the buffer, buflen should be decremented by the same amount. The while loop at the end iterates through each answer in the DNS response packet (as tracked by anscount), making sure it doesn't read past the end of the packet (stored in eom).

The following code handles extracting hostnames from a CNAME answer to a query. It's correct from a security perspective and should give you a little insight into the use of variables:

     ...      if ((qtype == T_A || qtype == T_AAAA ||           qtype == ns_t_a6 || qtype == T_ANY)          && type == T_CNAME) {          if (ap >= &pvt->host_aliases[MAXALIASES-1])              continue;          n = dn_expand(ansbuf, eor, cp, tbuf, sizeof tbuf);          if (n < 0 || !maybe_ok(pvt->res, tbuf, name_ok)) {              had_error++;              continue;          }          cp += n;          /* Store alias. */           *ap++ = bp;          ...          n = strlen(tbuf) + 1;    /* for the \0 */          if (n > buflen || n > MAXHOSTNAMELEN) {              had_error++;              continue;          }          strcpy(bp, tbuf);          pvt->host.h_name = bp;          hname = bp;          bp += n;          buflen -= n;          continue;

Basically, if the query is a request for an IP address (qtype==T_A), and the server responds with a CNAME, which is an alias, the program needs to record the alias into a list (pvt->host_aliases) and place the hostname into the pvt->hostbuf buffer. If there's room in the alias list, BIND uses dn_expand() to pull the hostname out of the packet into the temporary buffer tbuf. If this name is okay, the alias is stored in the hostname buffer. Note that the relationship highlighted earlier about bp and buflen moving in lockstep has been preserved. A code reviewer focusing on this relationship will see one case in which desynchronizing bp from buflen is possiblespecifically, when converting information related to A and AAAA records. The offending code is bolded in the following excerpt:

     case T_A:      case T_AAAA:      convertinfo:  /* convert addrinfo into hostent form */      ...         if (ret_aip) { /* need addrinfo. keep it. */             while (cur && cur->ai_next)                 cur = cur->ai_next;         } else if (cur->ai_next) { /* need hostent */             struct addrinfo *aip = cur->ai_next;             for (aip = cur->ai_next; aip;                  aip = aip->ai_next) {                 int m;                 m = add_hostent(pvt, bp, hap, aip);                 if (m < 0) {                     had_error++;                     break;                 }                 if (m == 0)                     continue;                 if (hap < &pvt->h_addr_ptrs[MAXADDRS-1])                     hap++;                 bp += m;             }             freeaddrinfo(cur->ai_next);             cur->ai_next = NULL;         }         cp += n;         break;     default:         abort();     }     if (!had_error)         haveanswer++; }

As you can see, the bp variable is updated without buflen being decremented, thus desynchronizing the two variables. This introduces the possibility for clients to send malformed DNS responses with multiple A and AAAA responses that aren't stored correctly; consequently, the pvt->hostbuf variable can be overflowed. This vulnerability has since been fixed by removing this variable relationship to ensure that another bug like this doesn't occur. Instead of having a buflen variable, a pointer variable, ep, is introduced that's statically set to the end of the buffer. Even though this variable is also related to bp, the relationship is safer, as ep never has to move and, therefore, can never be desynchronized. In a situation like this, you should try to identify parts of code where bp is incremented past ep and a subtraction of the two pointers (ep - bp) is converted to a large positive integer that is passed as a length argument.

The previous example demonstrated a length variable not being updated correctly to reflect the remaining space in a buffer. Despite the amount of code in this function, it's still a straightforward example, in that only two variables constituted the state you were attempting to desynchronize. Sometimes multiple variables interrelate to represent a more complicated state, as in Listing 7-3, which consists of code from Sendmail 8.11.x.

Listing 7-3. Sendmail crackaddr() Related Variables Vulnerability

char * crackaddr(addr)         register char *addr; {         register char *p;         register char c;         int cmtlev;         int realcmtlev;         int anglelev, realanglelev;         int copylev;         int bracklev;         bool qmode;         bool realqmode;         bool skipping;         bool putgmac = false;         bool quoteit = false;         bool gotangle = false;         bool gotcolon = false;         register char *bp;         char *buflim;         char *bufhead;         char *addrhead;         static char buf[MAXNAME + 1];     ...         bp = bufhead = buf;         buflim = &buf[sizeof buf - 7];         p = addrhead = addr;         copylev = anglelev = realanglelev = cmtlev =             realcmtlev = 0;         bracklev = 0;         qmode = realqmode = false;         while ((c = *p++) != '\0')         {                 /*                 **  If the buffer is overfull, go into a                 **  special "skipping" mode that tries to                 **  keep legal syntax but doesn't actually                 **  output things                 */                 skipping = bp >= buflim;

Listing 7-3 shows the initial setup of the crackaddr() function, which is used to check the syntax of a supplied e-mail address (as well as output it to the buf character array). Here, several variables combine to represent the function's state. (All variables ending in lev indicate some level of nested address components.) The skipping mode variable is used to indicate that no more output buffer space remains, and several other variables represent different aspects of the input string (and its validity). The following code shows a little more of the processing in this function.

      /* check for comments */          if (c == '(')          {                  cmtlev++;                  /* allow space for closing paren */                  if (!skipping)                  {                          buflim;                          realcmtlev++;                          if (copylev++ <= 0)                          {                                  if (bp != bufhead)                                          *bp++ = ' ';                                  *bp++ = c;                          }                  }         }         if (cmtlev > 0)         {                 if (c == ')')                 {                         cmtlev;                         copylev;                         if (!skipping)                         {                                 realcmtlev;                                 buflim++;                         }                 }                 continue;         }       ...         if (c == '>')         {                 if (anglelev > 0)                 {                         anglelev;                         if (!skipping)                         {                                 realanglelev;                                 buflim++;                         }                 }                 else if (!skipping)                 {                         /* syntax error: unmatched > */                         if (copylev > 0)                                 bp;                         quoteit = true;                         continue;                 }                 if (copylev++ <= 0)                         *bp++ = c;                 continue;         }

In some cases, the output buffer is written to without checking the skipping mode variable to make sure enough space is leftnamely, when dealing with the angle bracket character (>). After studying the code, you can see recurring patterns that users can supply to cause these characters to be written outside the buffer's bounds. Specifically, when an angle bracket character is supplied, it can be written to the output buffer despite skipping mode being on, as long as copylev is less than or equal to zero. When the angle character is written, copylev is incremented, so you need a way to decrement it back to zero. It turns out that you can decrement copylev by supplying a closed parenthesis character as long as cmtlev is greater than 0, which you can ensure by supplying an open parenthesis character first. Therefore, the pattern ()>()>()>... causes a number of > characters to be written outside the buffer's bounds. This bug has two root causes: There are places when characters can be written to an output buffer despite skipping mode being on, and the lev variables aren't incremented and decremented equally by characters enclosing an address component, such as (and), when skipping mode is on.

When you begin to examine a new function, it's a good idea to go through the code quickly and identify any relationships such as this one in the function. Then make one pass to see whether any variables can be desynchronized. A well-designed application tends to keep variable relationships to a minimum. Developers often conceal complex relationships in separate subsystems so that the internals aren't exposed to callers; concealing variables in this manner is known as data hiding and is generally considered good programming form. However, data hiding can make your job harder by spreading complex relationships across multiple files and functions. Examples of data hiding include private variables in a C++ class and the buffer management subsystem in OpenSSH. You see an example in the next section of a desynchronization vulnerability in this buffer management subsystem.

Structure and Object Mismanagement

Applications often use large structures to manage program and session state, and group related data elements. Indeed, the essence of object-oriented programming encourages this behavior, so code reviewers are often confronted with code that makes extensive use of opaque objects or structures, which are often manipulated through insufficiently documented interfaces. Code reviewers must familiarize themselves with these interfaces to learn the purpose of objects and their constituent members.

As discussed in the previous section, the more related variables there are in a part of an application, the higher the likelihood for an inconsistent state error. One goal of auditing object-oriented code should be to determine whether it's possible to desynchronize related structure members or leave them in an unexpected or inconsistent state to cause the application to perform some sort of unanticipated operation. For example, OpenSSH makes extensive use of dynamic resizable data buffers throughout the application. The routine responsible for initializing the buffer structure, buffer_init(), is shown in Listing 7-4.

Listing 7-4. OpenSSH 3.6.1 Buffer Corruption Vulnerability

/* Initializes the buffer structure. */ void buffer_init(Buffer *buffer) {    buffer->alloc = 4096;    buffer->buf = xmalloc(buffer->alloc);    buffer->offset = 0;    buffer->end = 0; }

From this, you can deduce that the buf and alloc variable share a relationship: The alloc member should always represent the amount of bytes allocated in the buffer. By examining the other buffer_* functions, you can deduce several more relationshipsnamely, that offset and end are offsets into a buffer, and both must be less than alloc, and offset should be less than end. If these relationships are not followed, the code might contain integer underflow problems. Therefore, when reviewing this application, you must determine whether any of these variable relationships can be violated, as the resulting inconsistent state could cause a buffer overflow.

In this code, two variables could become desynchronized in one instance: the buf and alloc variables. This problem occurs in buffer_append_space(), which is shown in the following code:

/*  * Appends space to the buffer, expanding the buffer if  * necessary. This does not actually copy the data into the  * buffer, but instead returns a pointer to the allocated  * region. */ void * buffer_append_space(Buffer *buffer, u_int len) {     void *p;     if (len > 0x100000)         fatal("buffer_append_space: len %u not supported", len);     /* If the buffer is empty, start using it from the beginning. */     if (buffer->offset == buffer->end) {         buffer->offset = 0;         buffer->end = 0;     } restart:     /* If there is enough space to store all data, store it        now. */     if (buffer->end + len < buffer->alloc) {         p = buffer->buf + buffer->end;         buffer->end += len;         return p;     }     /*      * If the buffer is quite empty, but all data is at      * the end, move the data to the beginning and retry.      */     if (buffer->offset > buffer->alloc / 2) {         memmove(buffer->buf, buffer->buf + buffer->offset,         buffer->end - buffer->offset);         buffer->end -= buffer->offset;         buffer->offset = 0;         goto restart;     }     /* Increase the size of the buffer and retry. */     buffer->alloc += len + 32768;     if (buffer->alloc > 0xa00000)         fatal("buffer_append_space: alloc %u not supported",                buffer->alloc);     buffer->buf = xrealloc(buffer->buf, buffer->alloc);     goto restart;     /* NOTREACHED */ }

The alloc variable is incremented by a certain amount, thus making it inconsistent with the amount of data that was allocated in buf. Afterward, buf is reallocated so that the structure is consistent when it's returned to the calling function, but the developer didn't consider the implications of the xrealloc() function failing or the length check of alloc against the constant value 0xa00000 failing. Both failures result in the fatal() function being called eventually. If the length check fails or xrealloc() fails, fatal() is called immediately. The xrealloc() implementation is shown in the following code:

void * xrealloc(void *ptr, size_t new_size) {     void *new_ptr;     if (new_size == 0)         fatal("xrealloc: zero size");     if (ptr == NULL)         new_ptr = malloc(new_size);     else         new_ptr = realloc(ptr, new_size);     if (new_ptr == NULL)         fatal("xrealloc: out of memory (new_size %lu bytes)",               (u_long) new_size);     return new_ptr; }

You can see that xrealloc() also calls fatal() upon failure. Further investigation reveals that the fatal() function cleans up several global variables, including buffers used for handling data input and output with the buffer_free() routine, which is shown here:

/* Frees any memory used for the buffer. */ void buffer_free(Buffer *buffer) {     memset(buffer->buf, 0, buffer->alloc);     xfree(buffer->buf); }

Therefore, if an allocation fails or the inbuilt size threshold is reached, and the buffer being resized is one of those global variables, the memset() function in buffer_free() writes a large amount of data past the end of the allocated buffer. Several other cleanup functions are subsequently called, allowing an opportunity for exploitation.

This example highlights how structure mismanagement bugs tend to be quite subtle, as the code to manage structures is spread out into several small functions that are individually quite simple. Therefore, any vulnerabilities tend to be a result of aggregate, emergent behavior occurring across multiple functions. One major problem area in this structure management code is low-level language issues, such as type conversion, negative values, arithmetic boundaries, and pointer arithmetic (discussed in Chapter 6, "C Language Issues"). The reason is that management code tends to perform a lot of length calculations and comparisons. Recall the OpenSSL example of dealing with arithmetic boundaries (see Listing 7-10). You were able to pass a negative value to the BUF_MEM_grow() function, which is responsible for buffer management in the OpenSSL libraries. Listing 7-5 shows the internals of how that function works.

Listing 7-5. OpenSSL BUF_MEM_grow() Signed Variable Desynchronization

typedef struct buf_mem_st         {         int length;     /* current number of bytes */         char *data;         int max;        /* size of buffer */         } BUF_MEM; ... int BUF_MEM_grow(BUF_MEM *str, int len)         {         char *ret;         unsigned int n;         if (str->length >= len)                 {                 str->length=len;                 return(len);                 }         if (str->max >= len)                 {                 memset(&str->data[str->length],0,                        len-str->length);                 str->length=len;                 return(len);                 }         n=(len+3)/3*4;         if (str->data == NULL)                 ret=OPENSSL_malloc(n);         else                 ret=OPENSSL_realloc(str->data,n);         if (ret == NULL)                 {                 BUFerr(BUF_F_BUF_MEM_GROW,ERR_R_MALLOC_FAILURE);                 len=0;                 }         else                 {                 str->data=ret;                 str->length=len;                 str->max=n;                 }         return(len);         }

As you can see, this structure represents lengths with signed integers. The code is quite dangerous in this context, as all comparisons in the function aren't taking negative values into account correctly. You can see that if this function receives a negative length value, the first comparison succeeds, and the program erroneously determines that enough free space already exists in the currently allocated buffer. Code reviewers must look for any place where a negative length could be supplied to this function because doing so would desynchronize data from length.

Naturally, when reviewing object-oriented code (such as C++ or Java applications), related variables are often sorted into classes. You have already looked at simple inconsistencies in objects related to uninitialized variables; however, a broader range of concerns stem from an object being left in an inconsistent state. The process for finding these vulnerabilities is similar to the OpenSSH example: Identify the manner in which variables relate to each other, and attempt to determine whether a code path exists in which these variables can be updated in an unexpected way. Implicit member functions are a major component of object-oriented languages, and code auditors will likely find more potential for subtle vulnerabilities caused by incorrect assumptions about the behavior of implicit member functions, such as overloaded operators.

Auditing Tip

Determine what each variable in the definition means and how each variable relates to the others. After you understand the relationships, check the member functions or interface functions to determine whether inconsistencies could occur in identified variable relationships. To do this, identify code paths in which one variable is updated and the other one isn't.

Variable Initialization

Occasionally, programmers make the mistake of reading a value from a variable before it has been initialized. This happens primarily in two circumstances:

The programmer intended for the variable to be initialized at the beginning of the function but forgot to specify an initializer during the declaration.
A code path exists where the variable is accidentally used without ever being initialized.

A variable initialization error results in uninitialized (and, therefore, undefined) data from a location in memory where the variable resides (typically, the program stack or heap) being interpreted and given meaning. In many cases, attackers can influence these memory areas and take advantage of the error to gain control of the process containing the offending code. In any event, unexpected data presents the opportunity to take unexpected code paths, which often has undesirable results. Listing 7-6 is a simple example.

Listing 7-6. Uninitialized Variable Usage

int login(char *login_string) {     char *user, *style, *ptr;     ptr = strchr(login_string, ':');     if(ptr){         *ptr = '\0';         user = strdup(login_string);         style = strdup(ptr+1);         *ptr = ':';     } else         user = strdup(login_string);     ...     if(style){         ...     } }

Listing 7-6 accepts a login string containing a username and an optional login style identified by a colon in the login string. The code later takes an alternative code path based on the supplied login style. The problem is that if no style is supplied, the style variable never gets initialized, and accesses to it read random data from the program stack. With careful manipulation, attackers could influence the values that uninitialized variables take. Attacking this vulnerability is possible, although quite complex; attackers need to work out the order in which functions are called and their relative stack depththat is, if function X calls function Y followed by function Z, the local variables from function Y are left on the stack in roughly the same place where the function Z allocates space for its local variables.

Most vulnerabilities of this nature occur when a function takes an abnormal code path. Functions that allocate a number of objects commonly have an epilogue that cleans up objects to avoid memory leaks when an error occurs. Consider the code in Listing 7-7.

Listing 7-7. Uninitialized Memory Buffer

int process_data(int sockfd) {     char *buf;     struct descriptor *desc;     ...     if(read_data(sockfd) < 0)         goto err;     ...    allocate buf and desc and process data normally ...     return 0; err:     if(buf)         free(buf);     if(desc)         free_descriptor(desc);     return 1; }

If an error occurs during the call to read_data(), the buffer buf isn't allocated, nor is struct descriptor *desc. However, they are still freed in the err condition, potentially creating an exploitable situation.

When auditing C++ code, pay close attention to member variables in objects, as unexpected code paths can leave objects in an inconsistent or partially uninitialized state. The best way to begin examining this code is usually by looking at constructor functions to see whether any constructors neglect to initialize certain elements of the object. Listing 7-8 shows a simple example.

Listing 7-8. Uninitialized Object Attributes

class netobj {     private:         char *data;         size_t datalen;    public:        netobj() { datalen = 0; }        ~netobj() { free(data); }        getdata() { return data; }        int setdata(char *d, int n) {            if(!(data = (char *)malloc(n)))                return -1;            memcpy(data, d, n);        }        ... } ... int get_object(int fd) {     char buf[1024];     netobj obj;     int n;     if((n = read(fd, buf, sizeof(buf))) < 0)         return -1;     obj.setdata(buf, n);     ...     return 0; }

The example has an obvious problem: The constructor never initializes its data member. Therefore, if the call to read() fails, the destructor is automatically called during the function epilogue. The default destructor for this object then calls the free() function on the data member, which is an arbitrary value because it was never initialized, as obj.setdata() was never called. This example illustrates an important point: Bugs of this nature occurring in C++ applications can be far more subtle, as many operations occur implicitly. Code reviewers need to be mindful of these implicit operations (such as constructor/destructor calls, overloaded operators, and so on) when evaluating a piece of code.

Auditing Tip

When variables are read, determine whether a code path exists in which the variable is not initialized with a value. Pay close attention to cleanup epilogues that are jumped to from multiple locations in a function, as they are the most likely places where vulnerabilities of this nature might occur. Also, watch out for functions that assume variables are initialized elsewhere in the program. When you find this type of code, attempt to determine whether there's a way to call functions making these assumptions at points when those assumptions are incorrect.

Arithmetic Boundaries

Arithmetic boundaries are presented at length in Chapter 6. However, when auditing variable use you will find it helpful to have a structured process for identifying these vulnerabilities. The following three steps provide a good plan of attack:

1.	Discover operations that, if a boundary condition could be triggered, would have security-related consequences (primarily length-based calculations and comparisons).
2.	Determine a set of values for each operand that trigger the relevant arithmetic boundary wrap.
3.	Determine whether this code path can be reached with values within the set determined in step 2.

The first step is usually simple. For auditors, it's common sense to determine whether an arithmetic operation would adversely affect an application. In some respects, any operation that can be undermined is detrimental to an application; however, problems should be considered in terms of severity, ranging from basic bugs to vulnerabilities that represent an imminent danger if exploited. You must also consider the context of the miscalculation. Depending on the nature of the application, an attacker might not be interested in a typical buffer length miscalculation. For example, a bank application that doesn't adequately handle negative values in transactions is potentially even more dangerous than a memory corruption vulnerability.

After problem areas have been identified, step 2 requires finding a problem domainthat is, a set of values that could trigger the arithmetic boundary conditions. For example, the following code line performs a length check before a data copy:

if (length + 32 > sizeof(buffer))

Assuming length is a 32-bit unsigned value, you can see that an integer wrap circumvents this check when length contains a value between 0xFFFFFFE0 and 0xFFFFFFFF. Calculations involving multiple variables often have problem domains that aren't a continuous set of values, as shown in the following expression:

if(length1 + length2 > sizeof(buffer))

In this example, the length check can be evaded as long as the sum of length1 and length2 overflow the zero boundary. It does not matter which variable takes a large value (or if both do), as long as both add up to a value larger than 0xFFFFFFFF. When assessing problems like these, you should record the location of the problem case, and then revisit it when you have some idea of the constraints placed on each variable.

Finally, in step 3, you need to determine whether the code path can be reached when variables contain values within the problem domain. You can perform this step in a fairly straightforward manner:

Identify the data type of the variable involved Identifying the data type allows you to define an initial set of values the variable can take. If a problem domain is 0xFFFFFFE0 through 0xFFFFFFFF, but the variable is a 16-bit unsigned integer, you can automatically discount the check because the variable can never take the values of the domain in question. (Note the use of unsigned: A signed 16-bit variable can take values in the problem domain if certain type conversions occur during the check.)
Determine the points at which the variable is assigned a value The next step is to identify where and how the variable is given a value. Pay attention to what default values the parameter can take, and note any special configurations that might make the application vulnerable. You also need to trace the values of other variables that are assigned to the suspect variable. If none of the assigned values can overlap with the problem domain, the operation can be marked as safe.
Determine the constraints on the variable from assignment until the vulnerable operation Now that you know an initial set of values and possible assignments, you must determine any restrictions placed on the variable in the vulnerable code path. Often the variable goes through a number of validation checks, which reduce the set of values the variable can take. You need to trace the variable through its use and determine what values are included in this reduced setknown as the validated domain. Any overlap between the problem domain and the validated domain represents vulnerability.
Determine supporting code path constraints In addition to the variable used in the vulnerable operation, other variables can play an important role in triggering the bug. You should record these additional variables and what values can lead to vulnerable code paths.

Now that you understand how to identify arithmetic boundary conditions, try applying the process to the vulnerable code path in Listings 7-9 and 7-10.

Listing 7-9. Arithmetic Vulnerability Example

#define BLOB_MAX    1024 unsigned char *read_blob(unsigned char *blob, size_t pktlen) {    int bloblen;    unsigned char *buffer;    bloblen = ntohl(blob);    if(bloblen + sizeof(long) > pktlen || bloblen > BLOB_MAX)        return NULL;    buffer = alloc(bloblen);    if(!buffer)        return NULL;    memcpy(buffer, blob+4, bloblen);    return buffer; }

For the purposes of this discussion, assume that the alloc() function in Listing 7-9 is vulnerable to an integer overflow condition, and you want to identify an exploitable path to this function. To do this, you must first determine how to evade the length comparison performed by the bolded code line. On the left side of the comparison, bloblen needs to take on a value that, after the addition of 4, is less than pktlen. Even though bloblen is signed, it's converted to an unsigned integer for the left side of this comparison. This leaves you with a small problem domain: 0xFFFFFFFC through 0xFFFFFFFF (-4 through -1). On the right side of the comparison, bloblen is treated as signed, so the problem domain is unchanged. To determine whether this function is vulnerable, you need to see how it's called, which is shown in Listing 7-10.

Note

The discussion of Listing 7-9 assumes a call to alloc() is vulnerable to an integer wrapping condition. In a real application, you would review alloc() and determine if this is the case, but it is a reasonable assumption. Custom allocation wrappers are often prone to a variety of arithmetic issues, as covered in "Auditing Memory Management," later in this chapter.

Listing 7-10. Arithmetic Vulnerability Example in the Parent Function

int process_packet(unsigned char *pkt, size_t pktlen) {    unsigned int length = 0;    int type = 0;    unsigned char *data;    type = pkt[0];    switch(type){        case TYPE_KEY:            length = ntohl(&pkt[1]);            if(length != RSA_KEY_SIZE)                return -1;            data = read_blob(&pkt[1], pktlen);            ...            break;        case TYPE_USER:            data = read_blob(&pkt[1], pktlen);            ...        default:            return -1; }

There are two calls to read_blob() in Listing 7-10. When type is TYPE_KEY, the length variable is checked against RSA_KEY_SIZE, and returns with an error if it doesn't match. This means the validated domain is only one valueRSA_KEY_SIZEand is unlikely to overlap the problem domain. Therefore, the call to read_blob() is safe in this location. When type is TYPE_USER, however, no such restrictions exist. Therefore, the validated domain is 0x00000000 through 0xFFFFFFFF, so there's an overlap! All values in the problem domain are within the validated domain, so you can say with confidence that this comparison can be evaded. These are the only constraints you have:

type == TYPE_USER
length (from the read_blob function) + sizeof(long) is less than pktlen (so you probably want pktlen to be larger than 4)

Type Confusion

The union-derived data type is used to store multiple data elements at the same location in memory. The intended purpose for this type of storage is that each of the data elements are mutually exclusive, and only one of them can be in use at a time. Union data types are most commonly used when structures or objects are required to represent multiple data types depending on an external condition, such as representing different opaque objects read off the network. Occasionally, application developers confuse what the data in a union represents. This can have disastrous consequences on an application, particularly when integer data types are confused with pointer data types, or complex structures of one type are confused with another. Although this mistake seems unlikely, it has shown up in at least one widely deployed application. Most vulnerabilities of this nature stem from misinterpreting a variable used to define what kind of data the structure contains. Listing 7-11 shows a brief example.

Listing 7-11. Type Confusion

struct object {     int type;     union {         int num;         char *str;         void *opaque;     } u; } struct object *object_read(int sockfd) {     int ret;     struct object *obj;     if(!(obj =         (struct object *)calloc(1, sizeof(struct object))))         die("calloc: %m");     obj->type = get_type(sockfd);     switch(obj->type & 0xFF){         case OBJ_NUM:             ret = read_number(sockfd, &(obj->u.num));             break;         case OBJ_STR:             ret = read_string(sockfd, &(obj->u.str));             break;         default:             ret = read_opaque(sockfd, &(obj->u.opaque));     }     if(ret < 0){         free(obj);         return NULL;     }     return obj; } int object_free(struct object *obj) {     if(!obj)         return -1;     switch(obj->type){         case OBJ_NUM:             break;         case OBJ_STR:             free_string(obj->u.str);             break;         default:             free_opaque(obj->u.opaque);     }     free(obj);     return 0; }

Listing 7-11 shows an interface for reading objects of some form off the network. Notice the small differences between the way objects are initialized and the way they are cleaned up. The type variable is a 32-bit integer read in from the network, yet only the lower 8 bits are examined during object initialization. When the object is cleaned up, all 32 bits are examined. Therefore, if a 32-bit integer type is supplied with the low bits equaling OBJ_NUM and the higher bits not all set to zero, a user-controlled integer is passed to the free_opaque() function and treated as a memory location, most likely resulting in a call to free() on an arbitrary memory location.

Lists and Tables

Linked lists and hash tables are often used in applications to keep a collection of data elements in a form that's easy to retrieve and manipulate. Some common errors are made when implementing routines that add and modify these data structures, and these mistakes can lead to inconsistencies in data structures. Attackers could take advantage of these inconsistencies to force an application into performing operations it wasn't intended to.

Linked lists are used frequently for storing related data elements that need to be looked up or modified later in the program. Linked lists can be singly linked or doubly linked. Singly linked lists are those in which elements contain a pointer to the next element in the list; doubly linked lists elements contain pointers to both the next and previous elements in the list. In addition, linked lists can be circular, meaning the last element of the list links back to the first element; for doubly linked lists, the previous pointer of the first element links back to the last element.

When auditing code that makes use of linked lists, you should examine how well the algorithm implements the list and how it deals with boundary conditions when accessing elements of these lists. Each of these points (discussed in the following sections) needs to be addressed:

Does the algorithm deal correctly with manipulating list elements when the list is empty?
What are the implications of duplicate elements?
Do previous and next pointers always get updated correctly?
Are data ranges accounted for correctly?

Manipulating List Elements in Empty Lists

Often, list structure members or global variables are used to point to the head of a list and potentially the tail of the list. If the code reviewer can find a case where these variables aren't updated correctly, there's the possibility for outdated elements or undefined data to be references as though they were part of the list. For example, consider the code in Listing 7-12.

Listing 7-12. Empty List Vulnerabilities

/* head and tail elements of a doubly linked, noncircular    list */ struct member *head, *tail; int delete_element(unsigned int key) {     struct member *tmp;     for(tmp = head; tmp; tmp = tmp->next){         if(tmp->key == key){            if(tmp->prev)                tmp->prev->next = tmp->next;            if(tmp->next)                tmp->next->prev = tmp->prev;           free(tmp);            return 1;        }    }    return 0; }

The deletion code in Listing 7-12 has an obvious omission: If the head or tail elements are deleted, this deletion isn't accounted for in the delete_element() function. Because the head and tail global variables aren't updated, the first or last element can be deleted with this function and then accessed by any code manipulating the head or tail pointers. Code that doesn't deal with head and tail elements correctly isn't common, but it can occur, particularly when list management is decentralized (that is, there's no clean interface for list management, so management happens haphazardly at different points in the code).

Some implementations initialize the list with blank head and/or tail elements, often called sentinel nodes (or sentinels). Sentinel nodes are used largely for convenience so that code doesn't need to specifically deal with instances of the list being empty, as sentinel nodes always have at least one element. If users can add data elements that appear to the program to be sentinels or cause sentinels to be deleted, the list management code might be susceptible to vulnerabilities stemming from code misinterpreting where the head or tail of the list is.

Duplicate Elements

Depending on the nature of the data being stored, duplicate elements can cause problems. Elements containing identical keys (data values used to characterize the structure as unique) could cause the two elements to get confused, resulting in the wrong element being selected from the list. This error might have interesting consequences; for example, sessions uniquely identified by a cookie could become confused if two or more clients supplied identical cookies. This confusion could lead to some sort of information leak, elevation of privilege, or other compromise.

Previous and Next Pointer Updates

Implementation flaws in deleting and inserting elements may prevent the previous and next pointers from being updated correctly. This is especially true if the program treats the current member as the head or tail of a list. Listing 7-13 shows a potential issue that occurs when updating list elements.

Listing 7-13. List Pointer Update Error

struct member *head, *tail; int delete_element(unsigned int key) {     struct member *tmp;     for(tmp = head; tmp; tmp = tmp->next){         if(tmp->key == key){             if(tmp->prev)                 tmp->prev->next = tmp->next;             if(tmp->next)                 tmp->next->prev = tmp->prev;             if(tmp == head)                 head = tmp->next;             else if(tmp == tail)                 tail = tmp->prev;             free(tmp);             return 1;        }    }    return 0; }

The code in Listing 7-13 has a small error when updating the head and tail elements. If only one element exists in the list, both the head and the tail element point to it, yet you can see in the code that an else statement is used when testing whether the element is the head or tail. Therefore, if a single element exists in the list and is deleted, the head element is updated correctly to be NULL; however, the tail element points to the outdated element.

Data Ranges

In ordered lists, the elements are sorted into some type of order based on a data member that distinguishes each list element. Often each data element in the list represents a range of values, such as part of an IP datagram in an IP fragment queue or memory ranges in kernel control structures for processes. The code used to implement this seemingly simple data structure can be quite complex, particularly when you have to take the following nuances of the data into account:

Can overlapping data ranges be supplied?
Can replacement data ranges (duplicate elements) be supplied?
Does old or new data take precedence?
What happens when 0 length data ranges are supplied?

These details, if not handled correctly, can result in logic flaws in processing data or inconsistencies in the list data structures. The most likely result of this oversight is an exploitable memory corruption condition. Listing 7-14 is code from the Linux kernelthe infamous teardrop bug. It shows how overlapping data ranges can be processed incorrectly, resulting in a vulnerability.

Listing 7-14. Linux Teardrop Vulnerability

    /*      *      We found where to put this one.      *      Check for overlap with preceding fragment,      *      and, if needed, align things so that any      *      overlaps are eliminated.      */     if (prev != NULL && offset < prev->end)     {             i = prev->end - offset;             offset += i;    /* ptr into datagram */             ptr += i;       /* ptr into fragment data */     }      ...     /* Fill in the structure. */     fp->offset = offset;     fp->end = end;      fp->len = end - offset;

This code processes incoming IP fragments to be placed into a queue with other fragments that are part of the same IP datagram. The offset variable represents the offset into the complete datagram where the current fragment begins. The end variable is the offset into the complete datagram where the current fragment ends, calculated by adding the starting offset of the fragment and its length. The IP code cycles through a list of fragments and breaks out when it finds the right place in the list to insert the incoming IP fragment. If there's any overlap between two fragments, the current fragment is shrunk so that only unaccounted for data ranges are added to the queue, and the overlapping data is discarded. An "overlap" in this situation means that two fragments or more partially or fully supply duplicate data ranges. For example, if one fragment supplies data from offset 1030, and another specifies 2040, they overlap because both fragments specify data from the offset 2030.

The vulnerability in this code occurs during the process of shrinking the current fragment; the code is written with the assumption that end is greater than or equal to prev->end. If this isn't the case, offset is incremented to become larger than end. As a result, the fp->len variable is set to a negative value, which is later used as an argument to memcpy(), resulting in a buffer overflow.

Hashing Algorithms

Hash tables are another popular data structure, typically used for speedy access to data elements. A hash table is often implemented as an array of linked lists, so the previous discussion on list management is relevant to hash tables as well. Hash tables use the list element as input to a hash function (hash functions are discussed in Chapter 2, "Design Review"). The resulting hash value is used as an index to an array. When dealing with hash tables, code auditors must address these additional questions:

Is the hashing algorithm susceptible to invalid results? Most hashing algorithms attempt to guarantee that the result lies within a certain range (the array size) by performing an operation and then using the modulus or and operator on the result. As discussed in Chapter 6, one potential attack vector is forcing the modulus operator to return negative results. This result would allow negative indexing into the array used to store elements. Additionally, code reviewers must evaluate the consequences if data elements can be influenced in such a way that many collisions could occur. Often this problem causes a slowdown in lookups, which can be a major problem if the application is time critical.
What are the implications of invalidating elements? Several algorithms that store many data elements can invalidate members based on certain conditions, such as timeouts or memory threshold limits reached. This pruning can sometimes have unexpected consequences. As with lists, code auditors must determine whether invalidated elements could be unlinked from the table incorrectly, resulting in the application potentially using outdated elements later. Invalidating elements in a predictable manner can have other interesting consequences, such as causing an application with several session data elements to delete valid sessions, resulting in a denial-of-service condition.