3.5 Preventing Integer Coercion and Wrap-Around Problems | Secure Programming Cookbook for C and C++: Recipes for Cryptography, Authentication, Input Validation & More

3.5.1 Problem

When using integer values, it is possible to make values go out of range in ways that are not obvious. In some cases, improperly validated integer values can lead to security problems, particularly when data gets truncated or when it is converted from a signed value to an unsigned value or vice versa. Unfortunately, such conversions often happen behind your back.

3.5.2 Solution

Unfortunately, integer coercion and wrap-around problems currently require you to be diligent.

Best practices for such problems require that you validate any coercion that takes place. To do this, you need to understand the semantics of the library functions you use well enough to know when they may implicitly cast data.

In addition, you should explicitly check for cases where integer data may wrap around. It is particularly important to perform wrap-around checks immediately before using data.

3.5.3 Discussion

Integer type problems are often quite subtle. As a result, they are very difficult to avoid and very difficult to catch unless you are exceedingly careful. There are several different ways that these problems can manifest themselves, but they always boil down to a type mismatch. In the following subsections, we'll illustrate the various classes of integer type errors with examples.

3.5.3.1 Signed-to-unsigned coercion

Many API functions take only positive values, and programmers often take advantage of that fact. For example, consider the following code excerpt:

if (x < MAX_SIZE) {
  if (!(ptr = (unsigned char *)malloc(x))) abort();
} else {
  /* Handle the error condition ... */
}

We might test against MAX_SIZE to protect against denial of service problems where an attacker causes us to allocate a large amount of memory. At first glance, the previous code seems to protect against that. Indeed, some people will worry about what happens in the case where someone tries to malloc( ) a negative number of bytes.

It turns out that malloc( )'s argument is of type size_t, which is an unsigned type. As a result, any negative numbers are converted to positive numbers. Therefore, we do not have to worry about allocating a negative number of bytes; it cannot happen.

However, the previous code may still not work correctly. The key to its correct operation is the data type of x. If x is some signed data type, such as an int, and is a negative value, it passes the x < MAX_SIZE test (assuming MAX_SIZE is itself a signed constant), and we end up allocating a large amount of data. For example, if an attacker manages to set x to -1, the call to malloc( ) will try to allocate 4,294,967,295 bytes on platforms where size_t is 32 bits, because the hexadecimal representation of that number (0xFFFFFFFF) is the same as that of a signed 32-bit -1.

There are a few ways to alleviate this particular problem:

You can make sure never to use signed data types. Unfortunately, that is not very practical, particularly when you are using API functions that take both signed and unsigned values. If you try to ensure that all your data is always unsigned, you might end up with an unsigned-to-signed conversion problem when you call a library function that takes a regular int instead of an unsigned int or a size_t.
You can check to make sure x is not negative while it is still signed. There is nothing wrong with this solution. Basically, you are always assuming the worst (that the data may be cast), and it might not be.
You can cast x to a size_t before you do your testing. This is a good strategy for those who prefer testing data as close as possible to the state in which it is going to be used, to prevent an unanticipated change in the meantime. Of course, the cast to an unsigned value might be unanticipated for the many programmers out there who do not know that size_t is not a signed data type. For those people, the second solution makes more sense.

No matter what solution you prefer, you will need to be diligent about conversions that might apply to your data when you perform your bounds checking.
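The second and third options above can be sketched as follows. This is only an illustration: the function names alloc_checked and alloc_checked_unsigned and the MAX_SIZE value of 4096 are assumptions made for the example, not part of any library.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_SIZE 4096  /* hypothetical limit chosen for illustration */

/* Option 2: reject negative values while x is still signed, before
 * malloc() converts it to size_t. Returns NULL if x is out of range
 * or the allocation fails. */
unsigned char *alloc_checked(int x) {
  if (x < 0 || x >= MAX_SIZE) return NULL;
  return (unsigned char *)malloc((size_t)x);
}

/* Option 3: convert first, then test the value malloc() will see. */
unsigned char *alloc_checked_unsigned(int x) {
  size_t n = (size_t)x;  /* a negative x becomes a very large value here */
  if (n >= MAX_SIZE) return NULL;
  return (unsigned char *)malloc(n);
}
```

Both variants reject -1. They differ only in whether the test is made before or after the conversion to size_t.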

3.5.3.2 Unsigned-to-signed coercion

Problems may also occur when an unsigned value gets converted to a signed value. For example, consider the following code:

#include <stdio.h>
#include <string.h>

int main(int argc, char *argv[]) {
  char         foo[] = "abcdefghij";
  char         *p = foo + 4;
  unsigned int x = 0xffffffff;

  if (p + x > p + strlen(p)) {
    printf("Buffer overflow!\n");
    return -1;
  }
  printf("%s\n", p + x);
  return 0;
}

The poor programmer who wrote this code is properly guarding against reading past the high end of p, but he probably did not realize that x is effectively treated as a signed offset. On a platform with 32-bit pointers, adding 0xffffffff is equivalent to adding -1, so the result of p + x will be the byte of memory immediately preceding the address to which p points.

While this code is a contrived example, this is still a very real problem. For example, say you have an array of fixed-size records. The program might wish to write arbitrary data into a record where the user supplies the record number, and the program might calculate the memory address of the item of interest dynamically by multiplying the record number by the size of a record, and then adding that to the address at which the records begin. Generally, programmers will make sure the item index is not too high, but they may not realize that the index might be too low!

In addition, it is good to remember that array accesses are rewritten as pointer arithmetic. For example, arr[x] can index memory before the start of your array if x is less than 0 once converted to a signed integer.
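A minimal sketch of the fixed-size record scenario, checking the index at both ends before it is used, might look like this. The record layout (RECORD_SIZE, NUM_RECORDS) and the function name write_record are hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <string.h>

#define RECORD_SIZE 32  /* hypothetical record layout */
#define NUM_RECORDS 8

static unsigned char records[NUM_RECORDS][RECORD_SIZE];

/* Copies len bytes into record number idx.
 * Returns 0 on success, -1 if the index or length is out of range. */
int write_record(long idx, const void *data, size_t len) {
  if (idx < 0 || idx >= NUM_RECORDS) return -1;  /* too low or too high */
  if (len > RECORD_SIZE) return -1;
  memcpy(records[idx], data, len);
  return 0;
}
```

Taking the index as a signed type and testing idx < 0 explicitly keeps the low-end check visible instead of relying on an implicit conversion.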

3.5.3.3 Size mismatches

You may also encounter problems when an integer type of one size gets converted to an integer type of another size. For example, suppose that you store an unsigned 64-bit quantity in x, then pass x to an operation that takes an unsigned 32-bit quantity. In C, the upper 32 bits will get truncated. Therefore, if you need to check for overflow, you had better do it before the cast happens!
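One way to perform that check before narrowing is sketched below. The helper name narrow_u64_to_u32 is an assumption made for this example. It relies on the fixed-width types from <stdint.h>, which require a C99 or later compiler.

```c
#include <stdint.h>

/* Stores x in *out and returns 0 if x fits in 32 bits. Otherwise
 * leaves *out untouched and returns -1. */
int narrow_u64_to_u32(uint64_t x, uint32_t *out) {
  if (x > UINT32_MAX) return -1;  /* upper 32 bits would be lost */
  *out = (uint32_t)x;
  return 0;
}
```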

Conversely, when there is an implicit coercion from a small value to a large value, remember that the sign bit will be extended if the source type is signed, which may not be intended. That is, when C converts a signed value to a different-sized signed value, it does not simply copy the same bits into the new type. When growing a number, C will make sure that it retains the same value it once had, even if the binary representation is different. When shrinking a value that does not fit in the smaller signed type, the result is implementation-defined. Most compilers simply discard the high-order bits, so both the magnitude and the sign may change.

For example, you might have a string declared as a char *, then want to treat the bytes as integers. Consider the following code:

#include <stdio.h>

int main(int argc, char *argv[]) {
  int x = 0;

  if (argc > 1) x += argv[1][0];
  printf("%d\n", x);
  return 0;
}

If argv[1][0] happens to be 0xFF, x will end up -1 instead of 255 on platforms where plain char is signed! Even if you declare x to be an unsigned int, you will still end up with x being 0xFFFFFFFF instead of the desired 0xFF, because C converts size before sign. That is, a char will get sign-extended into an int before being coerced into an unsigned int.
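A common way to avoid the sign extension is to convert the char to unsigned char before it is widened. The helper below is a minimal sketch, and the name byte_value is hypothetical.

```c
/* Returns the byte's value in the range 0..255 regardless of whether
 * plain char is signed on this platform. The cast to unsigned char
 * happens before the value is widened to int, so no sign extension
 * takes place. */
int byte_value(char c) {
  return (unsigned char)c;
}
```

In the example above, writing x += (unsigned char)argv[1][0]; has the same effect.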

3.5.3.4 Wrap-around

A very similar problem (with the same remediation strategy as those described in previous subsections) occurs when a variable wraps around. For example, when you add 1 to the maximum unsigned value, you will get zero. When you add 1 to the maximum signed value, you will typically get the minimum possible signed value on two's complement machines, although the C standard leaves signed overflow undefined, so you should not rely on that result.
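Because the wrap has already happened by the time you inspect the result, it is usually easier to test the operands before performing the arithmetic. The following sketch shows one way to do that for addition. The function names add_uint_checked and add_int_checked are assumptions made for this example.

```c
#include <limits.h>

/* Stores a + b in *sum and returns 0, or returns -1 without touching
 * *sum if the unsigned addition would wrap around. */
int add_uint_checked(unsigned int a, unsigned int b, unsigned int *sum) {
  if (a > UINT_MAX - b) return -1;
  *sum = a + b;
  return 0;
}

/* Same idea for signed values. The tests are arranged so that no
 * intermediate expression can itself overflow. */
int add_int_checked(int a, int b, int *sum) {
  if (b > 0 && a > INT_MAX - b) return -1;  /* would exceed INT_MAX */
  if (b < 0 && a < INT_MIN - b) return -1;  /* would drop below INT_MIN */
  *sum = a + b;
  return 0;
}
```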

This problem often crops up when using a high-precision clock. For example, some people use a 32-bit real-time clock, then check to see if one event occurs before another by testing the clock. Of course, if the clock rolls over (a millisecond clock that uses an unsigned 32-bit value will wrap around every 49.71 days or so), the result of your test is likely to be wrong!

In any case, you should be keeping track of wrap-arounds and taking appropriate measures when they occur. Often, when you're using a real-time clock, you can simply use a clock with more precision. For example, recent x86 chips offer the RDTSC instruction, which provides 64 bits of precision. (See Recipe 4.14.)
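When you are stuck with a 32-bit counter, one widely used alternative is to compare elapsed intervals rather than absolute timestamps, because unsigned subtraction is performed modulo 2^32 and so survives a single roll-over. The sketch below assumes a millisecond counter read into a uint32_t. The function name timeout_elapsed is hypothetical, and the approach only works if the real interval is shorter than one full wrap period.

```c
#include <stdint.h>

/* Returns nonzero if at least timeout_ms milliseconds have passed
 * between start_ms and now_ms, both read from a 32-bit millisecond
 * counter that may have wrapped at most once in between. */
int timeout_elapsed(uint32_t start_ms, uint32_t now_ms, uint32_t timeout_ms) {
  uint32_t elapsed = now_ms - start_ms;  /* modulo 2^32, well defined */
  return elapsed >= timeout_ms;
}
```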

3.5.4 See Also

Recipe 4.14