Type Conversion Vulnerabilities

Now that you have a solid grasp of C's type conversions, you can explore some of the exceptional circumstances they can create. Implicit type conversions can catch programmers off-guard in several situations. This section focuses on simple conversions between signed and unsigned types, sign extension, truncation, and the usual arithmetic conversions, focusing on comparisons.

Signed/Unsigned Conversions

Most security issues related to type conversions are the result of simple conversions between signed and unsigned integers. This discussion is limited to conversions that occur as a result of assignment, function calls, or typecasts.

For a quick recap of the simple conversion rules, when a signed variable is converted to an unsigned variable of the same size, the bit pattern is left alone, and the value changes correspondingly. The same thing occurs when an unsigned variable is converted to a signed variable. Technically, the unsigned-to-signed conversion is implementation defined, but in twos complement implementations, usually the bit pattern is left alone.

The most important situation in which this conversion becomes relevant is during function calls, as shown in this example:

int copy(char *dst, char *src, unsigned int len) {     while (len--)         *dst++ = *src++; }

The third argument is an unsigned int that represents the length of the memory section to copy. If you call this function and pass a signed int as the third argument, it's converted to an unsigned integer. For example, say you do this:

int f = -1; copy(mydst, mysrc, f);

The copy() function sees an extremely large positive len and most likely copies until it causes a segmentation fault. Most libc routines that take a size parameter have an argument of type size_t, which is an unsigned integer type that's the same width as pointer. This is why you must be careful never to let a negative length field make its way to a libc routine, such as snprintf(), strncpy(), memcpy(), read(), or strncat().

This situation occurs fairly often, particularly when signed integers are used for length values and the programmer doesn't consider the potential for a value less than 0. In this case, all values less than 0 have their value changed to a high positive number when they are converted to an unsigned type. Malicious users can often specify negative integers through various program interfaces and undermine an application's logic. This type of bug happens commonly when a maximum length check is performed on a user-supplied integer, but no check is made to see whether the integer is negative, as in Listing 6-7.

Listing 6-7. Signed Comparison Vulnerability Example

int read_user_data(int sockfd) {     int length, sockfd, n;     char buffer[1024];     length = get_user_length(sockfd);     if(length > 1024){         error("illegal input, not enough room in buffer\n");         return 1;     }     if(read(sockfd, buffer, length) < 0){         error("read: %m");         return 1;     }     return 0; }

In Listing 6-7, assume that the get_user_length() function reads a 32-bit integer from the network. If the length the user supplies is negative, the length check can be evaded, and the application can be compromised. A negative length is converted to a size_t type for the call to read(), which as you know, turns into a large unsigned value. A code reviewer should always consider the implications of negative values in signed types and see whether unexpected results can be produced that could lead to security exposures. In this case, a buffer overflow can be triggered because of the erroneous length check; consequently, the oversight is quite serious.

Auditing Tip: Signed/Unsigned Conversions

You want to look for situations in which a function takes a size_t or unsigned int length parameter, and the programmer passes in a signed integer that can be influenced by users. Good functions to look for include read(), recvfrom(), memcpy(), memset(), bcopy(), snprintf(), strncat(), strncpy(), and malloc(). If users can coerce the program into passing in a negative value, the function interprets it as a large value, which could lead to an exploitable condition.

Also, look for places where length parameters are read from the network directly or are specified by users via some input mechanism. If the length is interpreted as a signed variable in parts of the code, you should evaluate the impact of a user supplying a negative value.

As you review functions in an application, it's a good idea to note the data types of each function's arguments in your function audit log. This way, every time you audit a subsequent call to that function, you can simply compare the types and examine the type conversion tables in this chapter's "Type Conversions" section to predict exactly what's going to happen and the implications of that conversion. You learn more about analyzing functions and keeping logs of function prototypes and behavior in Chapter 7, "Program Building Blocks."

Sign Extension

Sign extension occurs when a smaller signed integer type is converted to a larger type, and the machine propagates the sign bit of the smaller type through the unused bits of the larger type. The intent of sign extension is that the conversion is value-preserving when going from a smaller signed type to a larger signed type.

As you know, sign extension can occur in several ways. First, if a simple conversion is made from a small signed type to a larger type, with a typecast, assignment, or function call, sign extension occurs. You also know that sign extension occurs if a signed type smaller than an integer is promoted via the integer promotions. Sign extension could also occur as a result of the usual arithmetic conversions applied after integer promotions because a signed integer type could be promoted to a larger type, such as long long.

Sign extension is a natural part of the language, and it's necessary for value-preserving promotions of integers. So why is it mentioned as a security issue? There are two reasons:

In certain cases, sign extension is a value-changing conversion that has an unexpected result.
Programmers consistently forget that the char and short types they use are signed!

To examine the first reason, if you recall from the conversion section, one of the more interesting findings was that sign extension is performed if a smaller signed type is converted into a larger unsigned type. Say a programmer does something like this:

char len; len=get_len_field(); snprintf(dst, len, "%s", src);

This code has disaster written all over it. If the result of get_len_field() is such that len has a value less than 0, that negative value is passed as the length argument to snprintf(). Say the programmer tries to fix this error and does the following:

char len; len=get_len_field(); snprintf(dst, (unsigned int)len, "%s", src);

This solution sort of makes sense. An unsigned integer can't be negative, right? Unfortunately, sign extension occurs during the conversion from char to unsigned int, so the attempt to get rid of characters less than 0 backfired. If len happens to be below 0, (unsigned int)len ends up with a large value.

This example might seem somewhat arbitrary, but it's similar to an actual bug the authors recently discovered in a client's code. The moral of the story is that you should always remember sign extension is applied when converting from a smaller signed type to a larger unsigned type.

Now for the second reasonprogrammers consistently forget that the char and short types they use are signed. This statement rings quite true, especially in network code that deals with signed integer lengths or code that processes binary or text data one character at a time. Take a look at a real-world vulnerability in the DNS packet-parsing code of l0pht's antisniff tool (http://packetstormsecurity.org/sniffers/antisniff/). It's an excellent bug for demonstrating some vulnerabilities that have been discussed. A buffer overflow was first discovered in the software involving the improper use of strncat(), and after that vulnerability was patched, researchers from TESO discovered that it was still vulnerable because of a sign-extension issue. The fix for the sign-extension issue wasn't correct, and yet another vulnerability was published. The following examples take you through the timeline of this vulnerability.

Listing 6-8 contains the slightly edited vulnerable code from version 1 of the antisniff research release, in the raw_watchdns.c file in the watch_dns_ptr() function.

Listing 6-8. Antisniff v1.0 Vulnerability

  char *indx;   int count;   char nameStr[MAX_LEN]; //256 ...   memset(nameStr, '\0', sizeof(nameStr)); ...   indx = (char *)(pkt + rr_offset);   count = (char)*indx;   while (count){     (char *)indx++;     strncat(nameStr, (char *)indx, count);     indx += count;     count = (char)*indx;     strncat(nameStr, ".",             sizeof(nameStr)  strlen(nameStr));   }   nameStr[strlen(nameStr)-1] = '\0';

Before you can understand this code, you need a bit of background. The purpose of the watch_dns_ptr() function is to extract the domain name from the packet and copy it into the nameStr string. The DNS domain names in DNS packets sort of resemble Pascal strings. Each label in the domain name is prefixed by a byte containing its length. The domain name ends when you reach a label of size 0. (The DNS compression scheme isn't relevant to this vulnerability.) Figure 6-8 shows what a DNS domain name looks like in a packet. There are three labelstest, jim, and comand a 0-length label specifying the end of the name.

Figure 6-8. Sample DNS domain name

The code starts by reading the first length byte from the packet and storing it in the integer count. This length byte is a signed character stored in an integer, so you should be able to put any value you like between -128 and 127 in count. Keep this in mind for later.

The while() loop keeps reading in labels and calling strncat() on them to the nameStr string. The first vulnerability that was published is no length check in this loop. If you just provide a long enough domain name in the packet, it could write past the bounds of nameStr[]. Listing 6-9 shows how this issue was fixed in version 1.1 of the research version.

Listing 6-9. Antisniff v1.1 Vulnerability

  char *indx;   int count;   char nameStr[MAX_LEN]; //256 ...   memset(nameStr, '\0', sizeof(nameStr)); ...   indx = (char *)(pkt + rr_offset);   count = (char)*indx;   while (count){     if (strlen(nameStr) + count < ( MAX_LEN - 1) ){       (char *)indx++;       strncat(nameStr, (char *)indx, count);       indx += count;       count = (char)*indx;       strncat(nameStr, ".",               sizeof(nameStr)  strlen(nameStr));     } else {       fprintf(stderr, "Alert! Someone is attempting "                       "to send LONG DNS packets\n");       count = 0;     }  }  nameStr[strlen(nameStr)-1] = '\0';

The code is basically the same, but length checks have been added to try to prevent the buffer from being overflowed. At the top of the loop, the program checks to make sure there's enough space in the buffer for count bytes before it does the string concatenation. Now examine this code with sign-extension vulnerabilities in mind. You know that count can be any value between -128 and 127, so what happens if you give a negative value for count? Look at the length check:

if (strlen(nameStr) + count < ( MAX_LEN - 1) ){

You know that strlen(nameStr) is going to return a size_t, which is effectively an unsigned int on a 32-bit system, and you know that count is an integer below 0. Say you've been through the loop once, and strlen(nameStr) is 5, and count is -1. For the addition, count is converted to an unsigned integer, and you have (5 + 4,294,967,295). This addition can easily cause a numeric overflow so that you end up with a small value, such as 4; 4 is less than (MAX_LEN - 1), which is 256. So far, so good. Next, you see that count (which you set to -1), is passed in as the length argument to strncat(). The strncat() function takes a size_t, so it interprets that as 4,294,967,295. Therefore, you win again because you can essentially append as much information as you want to the nameStr string.

Listing 6-10 shows how this vulnerability was fixed in version 1.1.1 of the research release.

Listing 6-10. Antisniff v1.1.1 Vulnerability

char *indx;   int count;   char nameStr[MAX_LEN]; //256 ...   memset(nameStr, '\0', sizeof(nameStr)); ...   indx = (char *)(pkt + rr_offset);   count = (char)*indx;   while (count){      /* typecast the strlen so we aren't dependent on         the call to be properly setting to unsigned. */      if ((unsigned int)strlen(nameStr) +         (unsigned int)count < ( MAX_LEN - 1) ){       (char *)indx++;       strncat(nameStr, (char *)indx, count);       indx += count;       count = (char)*indx;       strncat(nameStr, ".",               sizeof(nameStr)  strlen(nameStr));     } else {       fprintf(stderr, "Alert! Someone is attempting "                       "to send LONG DNS packets\n");       count = 0;     }  }  nameStr[strlen(nameStr)-1] = '\0';

This solution is basically the same code, except some typecasts have been added to the length check. Take a closer look:

if ((unsigned int)strlen(nameStr) +     (unsigned int)count < ( MAX_LEN - 1) ){

The result of strlen() is typecast to an unsigned int, which is superfluous because it's already a size_t. Then count is typecast to an unsigned int. This is also superfluous, as it's normally converted to an unsigned integer type by the addition operator. In essence, nothing has changed. You can still send a negative label length and bypass the length check! Listing 6-11 shows how this problem was fixed in version 1.1.2.

Listing 6-11. Antisniff v1.1.2 Vulnerability

  unsigned char *indx;   unsigned int count;   unsigned char nameStr[MAX_LEN]; //256 ...   memset(nameStr, '\0', sizeof(nameStr)); ...   indx = (char *)(pkt + rr_offset);   count = (char)*indx;   while (count){     if (strlen(nameStr) + count < ( MAX_LEN - 1) ){       indx++;       strncat(nameStr, indx, count);       indx += count;       count = *indx;       strncat(nameStr, ".",               sizeof(nameStr)  strlen(nameStr));   } else {     fprintf(stderr, "Alert! Someone is attempting "                     "to send LONG DNS packets\n");     count = 0;   }  }  nameStr[strlen(nameStr)-1] = '\0';

The developers have changed count, nameStr, and indx to be unsigned and changed back to the previous version's length check. So the sign extension you were taking advantage of now appears to be gone because the character pointer, indx, is now an unsigned type. However, take a closer look at this line:

count = (char)*indx;

This code line dereferences indx, which is an unsigned char pointer. This gives you an unsigned character, which is then explicitly converted into a signed char. You know the bit pattern won't change, so you're back to something with a range of -128 to 127. It's assigned to an unsigned int, but you know that converting from a smaller signed type to a larger unsigned type causes sign extension. So, because of the typecast to (char), you still can get a maliciously large count into the loop, but only for the first label. Now look at that length check with this in mind:

if (strlen(nameStr) + count < ( MAX_LEN - 1) ){

Unfortunately, strlen(nameStr) is 0 when you enter the loop for the first time. So the rather large value of count won't be less than (MAX_LEN - 1), and you get caught and kicked out of the loop. Close, but no cigar. Amusingly, if you do get kicked out on your first trip into the loop, the program does the following:

nameStr[strlen(nameStr)-1] = '\0';

Because strlen(nameStr) is 0, that means it writes a 0 at 1 byte behind the buffer, at nameStr[-1]. Now that you've seen the evolution of the fix from the vantage point of 20-20 hindsight, take a look at Listing 6-12, which is an example based on a short integer data type.

Listing 6-12. Sign Extension Vulnerability Example

 unsigned short read_length(int sockfd) {     unsigned short len;     if(full_read(sockfd, (void *)&len, 2) != 2)         die("could not read length!\n");     return ntohs(len); } int read_packet(int sockfd) {     struct header hdr;     short length;     char *buffer;     length = read_length(sockfd);     if(length > 1024){         error("read_packet: length too large: %d\n", length);         return 1;     }     buffer = (char *)malloc(length+1);     if((n = read(sockfd, buffer, length) < 0){         error("read: %m");         free(buffer);         return 1;     }     buffer[n] = '\0';     return 0; }

Several concepts you've explored in this chapter are in effect here. First, the result of the read_length() function, an unsigned short int, is converted into a signed short int and stored in length. In the following length check, both sides of the comparison are promoted to integers. If length is a negative number, it passes the check that tests whether it's greater than 1024. The next line adds 1 to length and passes it as the first argument to malloc(). The length parameter is again sign-extended because it's promoted to an integer for the addition. Therefore, if the specified length is 0xFFFF, it's sign-extended to 0xFFFFFFFF. The addition of this value plus 1 wraps around to 0, and malloc(0) potentially returns a small chunk of memory. Finally, the call to read() causes the third argument, the length parameter, to be converted directly from a signed short int to a size_t. Sign extension occurs because it's a case of a smaller signed type being converted to a larger unsigned type. Therefore, the call to read allows you to read a large number of bytes into the buffer, resulting in a potential buffer overflow.

Another quintessential example of a place where programmers forget whether small types are signed occurs with use of the ctype libc functions. Consider the toupper() function, which has the following prototype:

int toupper(int c);

The toupper() function works on most libc implementations by searching for the correct answer in a lookup table. Several libcs don't handle a negative argument correctly and index behind the table in memory. The following definition of toupper() isn't uncommon:

int toupper(int c) {     return _toupper_tab[c]; }

Say you do something like this:

void upperize(char *str) {   while (*str)   {     *str = toupper(*str);     str++;   } }

If you have a libc implementation that doesn't have a robust toupper() function, you could end up making some strange changes to your string. If one of the characters is -1, it gets converted to an integer with the value -1, and the toupper() function indexes behind its table in memory.

Take a look at a final real-world example of programmers not considering sign extension. Listing 6-13 is a Sendmail vulnerability that security researcher Michael Zalewski discovered (www.cert.org/advisories/CA-2003-12.html). It's from Sendmail version 8.12.3 in the prescan() function, which is primarily responsible for parsing e-mail addresses into tokens (from sendmail/parseaddr.c). The code has been edited for brevity.

Listing 6-13. Prescan Sign Extension Vulnerability in Sendmail

register char *p; register char *q; register int c; ... p = addr;     for (;;)     {         /* store away any old lookahead character */         if (c != NOCHAR && !bslashmode)         {             /* see if there is room */             if (q >= &pvpbuf[pvpbsize - 5])             {                 usrerr("553 5.1.1 Address too long");                 if (strlen(addr) > MAXNAME)                     addr[MAXNAME] = '\0'; returnnull:                 if (delimptr != NULL)                     *delimptr = p;                 CurEnv->e_to = saveto;                 return NULL;             }             /* squirrel it away */             *q++ = c;          }          /* read a new input character */          c = *p++;          ..          /* chew up special characters */          *q = '\0';          if (bslashmode)          {              bslashmode = false;              /* kludge \! for naive users */              if (cmntcnt > 0)              {                  c = NOCHAR;                  continue;              }              else if (c != '!' || state == QST)              {                  *q++ = '\\';                  continue;              }         }         if (c == '\\')             bslashmode = true;     }

The NOCHAR constant is defined as -1 and is meant to signify certain error conditions when characters are being processed. The p variable is processing a user-supplied address and exits the loop shown when a complete token has been read. There's a length check in the loop; however, it's examined only when two conditions are true: when c is not NOCHAR (that is, c != -1) and bslashmode is false. The problem is this line:

c = *p++;

Because of the sign extension of the character that p points to, users can specify the char 0xFF and have it extended to 0xFFFFFFFF, which is NOCHAR. If users supply a repeating pattern of 0x2F (backslash character) followed by 0xFF, the loop can run continuously without ever performing the length check at the top. This causes backslashes to be written continually into the destination buffer without checking whether enough room is left. Therefore, because of the character being sign-extended when stored in the variable c, an unexpected code path is triggered that results in a buffer overflow.

This vulnerability also reinforces another principle stated at the beginning of this chapter. Implicit actions performed by the compiler are subtle, and when reviewing source code, you need to examine the implications of type conversions and anticipate how the program will deal with unexpected values (in this case, the NOCHAR value, which users can specify because of the sign extension).

Sign extension seems as though it should be ubiquitous and mostly harmless in C code. However, programmers rarely intend for their smaller data types to be sign-extended when they are converted, and the presence of sign extension often indicates a bug. Sign extension is somewhat difficult to locate in C, but it shows up well in assembly code as the movsx instruction. Try to practice searching through assembly for sign-extension conversions and then relating them back to the source code, which is a useful technique.

As a brief demonstration, compare Listings 6-14 and 6-15.

Listing 6-14. Sign-Extension Example

unsigned int l; char c=5; l=c;

Listing 6-15. Zero-Extension Example

unsigned int l; unsigned char c=5; l=c;

Assuming the implementation calls for signed characters, you know that sign extension will occur in Listing 6-14 but not in Listing 6-15. Compare the generated assembly code, reproduced in Table 6-8.

Table 6-8. Sign Extension Versus Zero Extension in Assembly Code
Listing 6-14: Sign Extension		Listing 6-15: Zero Extension
`mov`	`[ebp+var_5], 5`	`mov`	`[ebp+var_5], 5`
`movsx`	`eax, [ebp+var_5]`	`xor`	`eax, eax`
		`mov`	`al, [ebp+var_5]`
`mov`	`[ebp+var_4], eax`	`mov`	`[ebp+var_4], eax`

You can see that in the sign-extension example, the movsx instruction is used. In the zero-extension example, the compiler first clears the register with xor eax, eax and then moves the character byte into that register.

Auditing Tip: Sign Extension

When looking for vulnerabilities related to sign extensions, you should focus on code that handles signed character values or pointers or signed short integer values or pointers. Typically, you can find them in string-handling code and network code that decodes packets with length elements. In general, you want to look for code that takes a character or short integer and uses it in a context that causes it to be converted to an integer. Remember that if you see a signed character or signed short converted to an unsigned integer, sign extension still occurs.

As mentioned previously, one effective way to find sign-extension vulnerabilities is to search the assembly code of the application binary for the movsx instruction. This technique can often help you cut through multiple layers of typedefs, macros, and type conversions when searching for potentially vulnerable locations in code.

Truncation

Truncation occurs when a larger type is converted into a smaller type. Note that the usual arithmetic conversions and the integral promotions never really call for a large type to be converted to a smaller type. Therefore, truncation can occur only as the result of an assignment, a typecast, or a function call involving a prototype. Here's a simple example of truncation:

int g = 0x12345678; short int h; h = g;

When g is assigned to h, the top 16 bits of the value are truncated, and h has a value of 0x5678. So if this data loss occurs in a situation the programmer didn't expect, it could certainly lead to security failures. Listing 6-16 is loosely based on a historic vulnerability in Network File System (NFS) that involves integer truncation.

Listing 6-16. Truncation Vulnerability Example in NFS

void assume_privs(unsigned short uid) {     seteuid(uid);     setuid(uid); } int become_user(int uid) {     if (uid == 0)         die("root isnt allowed");     assume_privs(uid); }

To be fair, this vulnerability is mostly known of anecdotally, and its existence hasn't been verified through source code. NFS forbids users from mounting a disk remotely with root privileges. Eventually, attackers figured out that they could specify a UID of 65536, which would pass the security checks that prevent root access. However, this UID would get assigned to an unsigned short integer and be truncated to a value of 0. Therefore, attackers could assume root's identity of UID 0 and bypass the protection.

Take a look at one more synthetic vulnerability in Listing 6-17 before looking at a real-world truncation issue.

Listing 6-17. Truncation Vulnerabilty Example

unsigned short int f; char mybuf[1024]; char *userstr=getuserstr(); f=strlen(userstr); if (f > sizeof(mybuf)-5)   die("string too long!"); strcpy(mybuf, userstr);

The result of the strlen() function, a size_t, is converted to an unsigned short. If a string is 66,000 characters long, truncation would occur and f would have the value 464. Therefore, the length check protecting strcpy() would be circumvented, and a buffer overflow would occur.

A show-stopping bug in most SSH daemons was caused by integer truncation. Ironically, the vulnerable code was in a function designed to address another security hole, the SSH insertion attack identified by CORE-SDI. Details on that attack are available at www1.corest.com/files/files/11/CRC32.pdf.

The essence of the attack is that attackers can use a clever known plain-text attack against the block cipher to insert small amounts of data of their choosing into the SSH stream. Normally, this attack would be prevented by message integrity checks, but SSH used CRC32, and the researchers at CORE-SDI figured out how to circumvent it in the context of the SSH protocol.

The responsibility of the function containing the truncation vulnerability is to determine whether an insertion attack is occurring. One property of these insertion attacks is a long sequence of similar bytes at the end of the packet, with the purpose of manipulating the CRC32 value so that it's correct. The defense that was engineered was to search for repeated blocks in the packet, and then do the CRC32 calculation up to the point of repeat to determine whether any manipulation was occurring. This method was easy for small packets, but it could have a performance impact on large sets of data. So, presumably to address the performance impact, a hashing scheme was used.

The function you're about to look at has two separate code paths. If the packet is below a certain size, it performs a direct analysis of the data. If it's above that size, it uses a hash table to make the analysis more efficient. It isn't necessary to understand the function to appreciate the vulnerability. If you're curious, however, you'll see that the simpler case for the smaller packets has roughly the algorithm described in Listing 6-18.

Listing 6-18. Detect_attack Small Packet Algorithm in SSH

 for c = each 8 byte block of the packet     if c is equal to the initialization vector block         check c for the attack.         If the check succeeds, return DETECTED.         If the check fails, you aren't under attack so return OK.     for d = each 8 byte block of the packet before c         If d is equal to c, check c for the attack.             If the check succeeds, return DETECTED.             If the check fails, break out of the d loop.     next d next c

The code goes through each 8-byte block of the packet, and if it sees an identical block in the packet before the current one, it does a check to see whether an attack is underway.

The hash-table-based path through the code is a little more complex. It has the same general algorithm, but instead of comparing a bunch of 8-byte blocks with each other, it takes a 32 bit hash of each block and compares them. The hash table is indexed by the 32-bit hash of the 8-byte block, modulo the hash table size, and the bucket contains the position of the block that last hashed to that bucket. The truncation problem happened in the construction and management of the hash table. Listing 6-19 contains the beginning of the code.

Listing 6-19. Detect_attack Truncation Vulnerability in SSH

/* Detect a crc32 compensation attack on a packet */ int detect_attack(unsigned char *buf, u_int32_t len,               unsigned char *IV) {     static u_int16_t *h = (u_int16_t *) NULL;     static u_int16_t n = HASH_MINSIZE / HASH_ENTRYSIZE;     register u_int32_t i, j;     u_int32_t l;     register unsigned char *c;     unsigned char *d;     if (len > (SSH_MAXBLOCKS * SSH_BLOCKSIZE) ||         len % SSH_BLOCKSIZE != 0) {         fatal("detect_attack: bad length %d", len);     }

First, the code checks whether the packet is overly long or isn't a multiple of 8 bytes. SSH_MAXBLOCKS is 32,768 and BLOCKSIZE is 8, so the packet can be as large as 262,144 bytes. In the following code, n starts out as HASH_MINSIZE / HASH_ENTRYSIZE, which is 8,192 / 2, or 4,096, and its purpose is to hold the number of entries in the hash table:

for (l = n; l < HASH_FACTOR(len / SSH_BLOCKSIZE); l = l << 2)        ;

The starting size of the hash table is 8,192 elements. This loop attempts to determine a good size for the hash table. It starts off with a guess of n, which is the current size, and it checks to see whether it's big enough for the packet. If it's not, it quadruples l by shifting it left twice. It decides whether the hash table is big enough by making sure there are 3/2 the number of hash table entries as there are 8-byte blocks in the packet. HASH_FACTOR is defined as ((x)*3/2). The following code is the interesting part:

if (h == NULL) {     debug("Installing crc compensation "           "attack detector.");     n = l;     h = (u_int16_t *) xmalloc(n * HASH_ENTRYSIZE); } else {     if (l > n) {         n = l;         h = (u_int16_t *)xrealloc(h, n * HASH_ENTRYSIZE);     } }

If h is NULL, that means it's your first time through this function and you need to allocate space for a new hash table. If you remember, l is the value calculated as the right size for the hash table, and n contains the number of entries in the hash table. If h isn't NULL, the hash table has already been allocated. However, if the hash table isn't currently big enough to agree with the newly calculated l, you go ahead and reallocate it.

You've looked at enough code so far to see the problem: n is an unsigned short int. If you send a packet that's big enough, l, an unsigned int, could end up with a value larger than 65,535, and when the assignment of l to n occurs, truncation could result. For example, assume you send a packet that's 262,144 bytes. It passes the first check, and then in the loop, l changes like so:

Iteration 1: l = 4096    l <  49152   l<<=4 Iteration 2: l = 16384   l <  49152   l<<=4 Iteration 3: l = 65536   l >= 49152

When l, with a value of 65,536, is assigned to n, the top 16 bits are truncated, and n ends up with a value of 0. On several modern OSs, a malloc() of 0 results in a valid pointer to a small object being returned, and the rest of the function's behavior is extremely suspect.

The next part of the function is the code that does the direct analysis, and because it doesn't use the hash table, it isn't of immediate interest:

    if (len <= HASH_MINBLOCKS) {         for (c = buf; c < buf + len; c += SSH_BLOCKSIZE) {             if (IV && (!CMP(c, IV))) {                 if ((check_crc(c, buf, len, IV)))                     return (DEATTACK_DETECTED);                 else                     break;             }             for (d = buf; d < c; d += SSH_BLOCKSIZE) {                 if (!CMP(c, d)) {                     if ((check_crc(c, buf, len, IV)))                     return (DEATTACK_DETECTED);                 else                     break;                }            }        }        return (DEATTACK_OK);     }

Next is the code that performs the hash-based detection routine. In the following code, keep in mind that n is going to be 0 and h is going to point to a small but valid object in the heap. With these values, it's possible to do some interesting things to the process's memory:

    memset(h, HASH_UNUSEDCHAR, n * HASH_ENTRYSIZE);     if (IV)         h[HASH(IV) & (n - 1)] = HASH_IV;     for (c = buf, j = 0; c < (buf + len); c += SSH_BLOCKSIZE, j++) {         for (i = HASH(c) & (n - 1); h[i] != HASH_UNUSED;              i = (i + 1) & (n - 1)) {             if (h[i] == HASH_IV) {                 if (!CMP(c, IV)) {                     if (check_crc(c, buf, len, IV))                         return (DEATTACK_DETECTED);                     else                         break;                  }              } else if (!CMP(c, buf + h[i] * SSH_BLOCKSIZE)) {                  if (check_crc(c, buf, len, IV))                      return (DEATTACK_DETECTED);                  else                      break;              }           }           h[i] = j;        }     return (DEATTACK_OK); }

If you don't see an immediate way to attack this loop, don't worry. (You are in good company, and also some critical macro definitions are missing.) This bug is extremely subtle, and the exploits for it are complex and clever. In fact, this vulnerability is unique from many perspectives. It reinforces the notion that secure programming is difficult, and everyone can make mistakes, as CORE-SDI is easily one of the world's most technically competent security companies. It also demonstrates that sometimes a simple black box test can uncover bugs that would be hard to find with a source audit; the discoverer, Michael Zalewski, located this vulnerability in a stunningly straightforward fashion (ssh -l long_user_name). Finally, it highlights a notable case in which writing an exploit can be more difficult than finding its root vulnerability.

Auditing Tip: Truncation

Truncation-related vulnerabilities are typically found where integer values are assigned to smaller data types, such as short integers or characters. To find truncation issues, look for locations where these shorter data types are used to track length values or to hold the result of a calculation. A good place to look for potential variables is in structure definitions, especially in network-oriented code.

Programmers often use a short or character data type just because the expected range of values for a variable maps to that data type nicely. Using these data types can often lead to unanticipated truncations, however.

Comparisons

You've already seen examples of signed comparisons against negative numbers in length checks and how they can lead to security exposures. Another potentially hazardous situation is comparing two integers that have different types. As you've learned, when a comparison is made, the compiler first performs integer promotions on the operands and then follows the usual arithmetic conversions on the operands so that a comparison can be made on compatible types. Because these promotions and conversions might result in value changes (because of sign change), the comparison might not be operating exactly as the programmer intended. Attackers can take advantage of these conversions to circumvent security checks and often compromise an application.

To see how comparisons can go wrong, take a look at Listing 6-20. This code reads a short integer from the network, which specifies the length of an incoming packet. The first half of the length check compares (length sizeof(short)) with 0 to make sure the specified length isn't less than sizeof(short). If it is, it could wrap around to a large integer when sizeof(short) is subtracted from it later in the read() statement.

Listing 6-20. Comparison Vulnerability Example

#define MAX_SIZE 1024 int read_packet(int sockfd) {     short length;     char buf[MAX_SIZE];     length = network_get_short(sockfd);     if(length  sizeof(short) <= 0 || length > MAX_SIZE){         error("bad length supplied\n");         return 1;     }     if(read(sockfd, buf, length sizeof(short)) < 0){         error("read: %m\n");         return 1;     }     return 0; }

The first check is actually incorrect. Note that the result type of the sizeof operator is a size_t, which is an unsigned integer type. So for the subtraction of (length - sizeof(short)), length is first promoted to a signed int as part of the integer promotions, and then converted to an unsigned integer type as part of the usual arithmetic conversions. The resulting type of the subtraction operation is an unsigned integer type. Consequently, the result of the subtraction can never be less than 0, and the check is effectively inoperative. Providing a value of 1 for length evades the very condition that the length check in the first half of the if statement is trying to protect against and triggers an integer underflow in the call to read().

More than one value can be supplied to evade both checks and trigger a buffer overflow. If length is a negative number, such as 0xFFFF, the first check still passes because the result type of the subtraction is always unsigned. The second check also passes (length > MAX_SIZE) because length is promoted to a signed int for the comparison and retains its negative value, which is less than MAX_SIZE (1024). This result demonstrates that the length variable is treated as unsigned in one case and signed in another case because of the other operands used in the comparison.

When dealing with data types smaller than int, integer promotions cause narrow values to become signed integers. This is a value-preserving promotion and not much of a problem in itself. However, sometimes comparisons can be promoted to a signed type unintentionally. Listing 6-21 illustrates this problem.

Listing 6-21. Signed Comparison Vulnerability

int read_data(int sockfd) {     char buf[1024];     unsigned short max = sizeof(buf);     short length;     length = get_network_short(sockfd);     if(length > max){         error("bad length: %d\n", length);         return 1;     }     if(read(sockfd, buf, length) < 0){         error("read: %m");         return 1;     }     ... process data ...     return 0; }

Listing 6-21 illustrates why you must be aware of the resulting data type used in a comparison. Both the max and length variables are short integers and, therefore, go through integer conversions; both get promoted to signed integers. This means any negative value supplied in length evades the length check against max. Because of data type conversions performed in a comparison, not only can sanity checks be evaded, but the entire comparison could be rendered useless because it's checking for an impossible condition. Consider Listing 6-22.

Listing 6-22. Unsigned Comparison Vulnerability

int get_int(char *data) {     unsigned int n = atoi(data);     if(n < 0 || n > 1024)         return 1;     return n; } int main(int argc, char **argv) {     unsigned long n;     char buf[1024];     if(argc < 2)        exit(0);     n = get_int(argv[1]);     if(n < 0){         fprintf(stderr, "illegal length specified\n");         exit(-1);     }     memset(buf, 'A', n);     return 0; }

Listing 6-22 checks the variable n to make sure it falls within the range of 0 to 1024. Because the variable n is unsigned, however, the check for less than 0 is impossible. An unsigned integer can never be less than 0 because every value that can be represented is positive. The potential vulnerability is somewhat subtle; if attackers provide an invalid integer as argv[1], get_int() returns a -1, which is converted to an unsigned long when assigned to n. Therefore, it would become a large value and end up causing memset() to crash the program.

Compilers can detect conditions that will never be true and issue a warning if certain flags are passed to it. See what happens when the preceding code is compiled with GCC:

[root@doppelganger root]# gcc -Wall -o example example.c [root@doppelganger root]# gcc -W -o example example.c example.c: In function 'get_int': example.c:10: warning: comparison of unsigned expression < 0 is always                        false example.c: In function 'main': example.c:25: warning: comparison of unsigned expression < 0 is always                        false [root@doppelganger root]#

Notice that the -Wall flag doesn't warn about this type of error as most developers would expect. To generate a warning for this type of bug, the -W flag must be used. If the code if(n < 0) is changed to if(n <= 0), a warning isn't generated because the condition is no longer impossible. Now take a look at a real-world example of a similar mistake. Listing 6-23 is taken from the PHP Apache module (4.3.4) when reading POST data.

Listing 6-23. Signed Comparison Example in PHP

/* {{{ sapi_apache_read_post  */ static int sapi_apache_read_post(char *buffer,                                  uint count_bytes TSRMLS_DC) {     uint total_read_bytes=0, read_bytes;     request_rec *r = (request_rec *) SG(server_context);     void (*handler)(int);     /*      * This handles the situation where the browser sends a      * Expect: 100-continue header and needs to receive      * confirmation from the server on whether or not it      * can send the rest of the request. RFC 2616      *      */     if (!SG(read_post_bytes) && !ap_should_client_block(r)) {         return total_read_bytes;     }     handler = signal(SIGPIPE, SIG_IGN);     while (total_read_bytes<count_bytes) {         /* start timeout timer */         hard_timeout("Read POST information", r);         read_bytes = get_client_block(r,                        buffer + total_read_bytes,                        count_bytes - total_read_bytes);         reset_timeout(r);         if (read_bytes<=0) {             break;         }         total_read_bytes += read_bytes;     }     signal(SIGPIPE, handler);     return total_read_bytes; }

The return value from get_client_block() is stored in the read_bytes variable and then compared to make sure a negative number wasn't returned. Because read_bytes is unsigned, this check doesn't detect errors from get_client_block() as intended. As it turns out, this bug isn't immediately exploitable in this function. Can you see why? The loop controlling the loop also has an unsigned comparison, so if total_read_bytes is decremented under 0, it underflows and, therefore, takes a value larger than count_bytes, thus exiting the loop.

Auditing Tip

Reviewing comparisons is essential to auditing C code. Pay particular attention to comparisons that protect allocation, array indexing, and copy operations. The best way to examine these comparisons is to go line by line and carefully study each relevant expression.

In general, you should keep track of each variable and its underlying data type. If you can trace the input to a function back to a source you're familiar with, you should have a good idea of the possible values each input variable can have. Proceed through each potentially interesting calculation or comparison, and keep track of potential values of the variables at different points in the function evaluation. You can use a process similar to the one outlined in the previous section on locating integer boundary condition issues.

When you evaluate a comparison, be sure to watch for unsigned integer values that cause their peer operands to be promoted to unsigned integers. sizeof and strlen () are classic examples of operands that cause this promotion.

Remember to keep an eye out for unsigned variables used in comparisons, like the following:

if (uvar < 0) ... if (uvar <= 0) ...

The first form typically causes the compiler to emit a warning, but the second form doesn't. If you see this pattern, it's a good indication something is probably wrong with that section of the code. You should do a careful line-by-line analysis of the surrounding functionality.