C is extremely flexible in handling the interaction of different data types. For example, with a few casts, you can easily multiply an unsigned character with a signed long integer, add it to a character pointer, and then pass the result to a function expecting a pointer to a structure. Programmers are used to this flexibility, so they tend to mix data types without worrying too much about what's going on behind the scenes.

To deal with this flexibility, when the compiler needs to convert an object of one type into another type, it performs what's known as a type conversion. There are two forms of type conversions: explicit type conversions, in which the programmer explicitly instructs the compiler to convert from one type to another by casting, and implicit type conversions, in which the compiler does "hidden" transformations of variables to make the program function as expected.

Note: You might see type conversions referred to as "type coercions" in programming-language literature; the terms are synonymous.

Often it's surprising when you first learn how many implicit conversions occur behind the scenes in a typical C program. These automatic type conversions, known collectively as the default type conversions, occur almost magically when a programmer performs seemingly straightforward tasks, such as making a function call or comparing two numbers.

The vulnerabilities resulting from type conversions are often fascinating, because they can be subtle and difficult to locate in source code, and they often lead to situations in which the patch for a critical remote vulnerability is as simple as changing a char to an unsigned char. The rules governing these conversions are deceptively subtle, and it's easy to believe you have a solid grasp of them and yet miss an important nuance that makes a world of difference when you analyze or write code.
Instead of jumping right into known vulnerability classes, first you look at how C compilers perform type conversions at a low level, and then you study the rules of C in detail to learn about all the situations in which conversions take place. This section is fairly long because you have to cover a lot of ground before you have the foundation to analyze C's type conversions with confidence. However, this aspect of the language is subtle enough that it's definitely worth taking the time to gain a solid understanding of the ground rules; you can leverage this understanding to find vulnerabilities that most programmers aren't aware of, even at a conceptual level.

Overview

When faced with the general problem of reconciling two different types, C goes to great lengths to avoid surprising programmers. The compilers follow a set of rules that attempt to encapsulate "common sense" about how to manage mixing different types, and more often than not, the result is a program that makes sense and simply does what the programmer intended. That said, applying these rules can often lead to surprising, unexpected behaviors. Moreover, as you might expect, these unexpected behaviors tend to have dire security consequences.

You start in the next section by exploring the conversion rules, the general rules C uses when converting between types. They dictate how a machine converts from one type to another type at the bit level. After you have a good grasp of how C converts between different types at the machine level, you examine how the compiler chooses which type conversions to apply in the context of C expressions, which involves three important concepts: simple conversions, integer promotions, and usual arithmetic conversions.

Note: Although non-integer types, such as floats and pointers, have some coverage, the primary focus of this discussion is on how C manipulates integers because these conversions are widely misunderstood and are critical for security analysis.
Conversion Rules

The following rules describe how C converts from one type to another, but they don't describe when conversions are performed or why they are performed.

Note: The following content is specific to twos complement implementations and represents a distilled and pragmatic version of the rules in the C specification.

Integer Types: Value Preservation

An important concept in integer type conversions is the notion of a value-preserving conversion. Basically, if the new type can represent all possible values of the old type, the conversion is said to be value-preserving. In this situation, there's no way the value can be lost or changed as a result of the conversion. For example, if an unsigned char is converted into an int, the conversion is value-preserving because an int can represent all of the values of an unsigned char. You can verify this by referring to Table 6-2 again. Assuming you're considering a twos complement machine, you know that an 8-bit unsigned char can represent any value between 0 and 255. You know that a 32-bit int can represent any value between -2147483648 and 2147483647. Therefore, there's no value the unsigned char can have that the int can't represent.

Correspondingly, in a value-changing conversion, the old type can contain values that can't be represented in the new type. For example, if you convert an int into an unsigned int, you have potentially created an intractable situation. The unsigned int, on a 32-bit machine, has a range of 0 to 4294967295, and the int has a range of -2147483648 to 2147483647. The unsigned int can't hold any of the negative values a signed int can represent.

According to the C standard, some of the value-changing conversions have implementation-defined results. This is true only for value-changing conversions that have a signed destination type; value-changing conversions to an unsigned type are defined and consistent across all implementations.
(If you recall from the boundary condition discussion, this is because unsigned arithmetic is defined as a modulus arithmetic system.) Twos complement machines follow the same basic behaviors, so you can explain how they perform value-changing conversions to signed destination types with a fair amount of confidence.

Integer Types: Widening

When you convert from a narrow type to a wider type, the machine typically copies the bit pattern from the old variable to the new variable, and then sets all the remaining high bits in the new variable to 0 or 1. If the source type is unsigned, the machine uses zero extension, in which it propagates the value 0 to all high bits in the new wider type. If the source type is signed, the machine uses sign extension, in which it propagates the sign bit from the source type to all unused bits in the destination type.

Warning: The widening procedure might have some unexpected implications: If a narrow signed type, such as signed char, is converted to a wider unsigned type, such as unsigned int, sign extension occurs.

Figure 6-1 shows a value-preserving conversion of an unsigned char with a value of 5 to a signed int.

Figure 6-1. Conversion of unsigned char to int (zero extension, big endian)
The character is placed into the integer, and the value is preserved. At the bit pattern level, this simply involved zero extension: clearing out the high bits and moving the least significant byte (LSB) into the new object's LSB.

Now consider a signed char being converted into an int. An int can represent all the values of a signed char, so this conversion is also value-preserving. Figure 6-2 shows what this conversion looks like at the bit level.

Figure 6-2. Conversion of signed char to integer (sign extension, big endian)
This situation is slightly different, as the value is the same, but the transformation is more involved. The bit representation of -5 in a signed char is 1111 1011. The bit representation of -5 in an int is 1111 1111 1111 1111 1111 1111 1111 1011. To do the conversion, the compiler generates assembly that performs sign extension. You can see in Figure 6-2 that the sign bit is set in the signed char, so to preserve the value -5, the sign bit has to be copied to the other 24 bits of the int.

The previous examples are value-preserving conversions. Now consider a value-changing widening conversion. Say you convert a signed char with a value of -5 to an unsigned int. Because the source type is signed, you perform sign extension on the signed char before placing it in the unsigned int (see Figure 6-3).

Figure 6-3. Conversion of signed char to unsigned integer (sign extension, big endian)
As mentioned previously, this result can be surprising to developers. You explore its security ramifications in "Sign Extension" later in this chapter. This conversion is value changing because an unsigned int can't represent values less than 0.

Integer Types: Narrowing

When converting from a wider type to a narrower type, the machine uses only one mechanism: truncation. The bits from the wider type that don't fit in the new narrower type are dropped. Figures 6-4 and 6-5 show two narrowing conversions. Note that all narrowing conversions are value-changing because you're losing precision.

Figure 6-4. Conversion of integer to unsigned short integer (truncation, big endian)
Figure 6-5. Conversion of integer to signed char (truncation, big endian)
Integer Types: Signed and Unsigned

One final type of integer conversion to consider: If a conversion occurs between a signed type and an unsigned type of the same width, nothing is changed in the bit pattern. This conversion is value-changing. For example, say you have the signed integer -1, which is represented in binary as 1111 1111 1111 1111 1111 1111 1111 1111. If you interpret this same bit pattern as an unsigned integer, you see a value of 4,294,967,295. The conversion is summarized in Figure 6-6. The conversion from unsigned int to int technically might be implementation defined, but it works in the same fashion: The bit pattern is left alone, and the value is interpreted in the context of the new type (see Figure 6-7).

Figure 6-6. Conversion of int to unsigned int (big endian)
Figure 6-7. Conversion of unsigned int to signed int (big endian)
Integer Type Conversion Summary

Here are some practical rules of thumb for integer type conversions:
Table 6-4 summarizes the processing that occurs when different integer types are converted in twos complement implementations of C. As you cover the information in the following sections, this table can serve as a useful reference for recalling how a conversion occurs.
Floating Point and Complex Types

Although vulnerabilities caused by the use of floating point arithmetic haven't been widely published, they are certainly possible. For example, subtle errors related to floating point type conversions or representation issues could surface in financial software. The discussion of floating point types in this chapter is fairly brief. For more information, refer to the C standards documents and the previously mentioned C programming references.

The C standard's rules for conversions between real floating types and integer types leave a lot of room for implementation-defined behaviors. In a conversion from a real type to an integer type, the fractional portion of the number is discarded. If the integer type can't represent the integer portion of the floating point number, the result is undefined. Similarly, a conversion from an integer type to a real type transfers the value over if possible. If the real type can't represent the integer's value but can come close, the compiler rounds the integer to the next highest or lowest number in an implementation-defined manner. If the integer is outside the range of the real type, the result is undefined.

Conversions between floating point types of different precision are handled with similar logic. Promotion causes no change in value. During a demotion that causes a change in value, the compiler is free to round numbers, if possible, in an implementation-defined manner. If rounding isn't possible because of the range of the target type, the result is undefined.

Other Types

There are myriad other types in C beyond integers and floats, including pointers, Booleans, structures, unions, functions, arrays, enums, and more. For the most part, conversion among these types isn't quite as critical from a security perspective, so they aren't extensively covered in this chapter. Pointer arithmetic is covered in "Pointer Arithmetic" later in this chapter.
Pointer type conversion depends largely on the underlying machine architecture, and many conversions are specified as implementation defined. Essentially, programmers are free to convert pointers into integers and back, and convert pointers from one type to another. The results are implementation defined, and programmers need to be cognizant of alignment restrictions and other low-level details.

Simple Conversions

Now that you have a good idea how C converts from one integer type to another, you can look at some situations where these type conversions occur. Simple conversions are C expressions that use straightforward applications of conversion rules.

Casts

As you know, typecasts are C's mechanism for letting programmers specify an explicit type conversion, as shown in this example:

    (unsigned char) bob

Whatever type bob happens to be, this expression converts it into an unsigned char type. The resulting type of the expression is unsigned char.

Assignments

Simple type conversion also occurs in the assignment operator. The compiler must convert the type of the right operand into the type of the left operand, as shown in this example:

    short int fred;
    int bob = -10;

    fred = bob;

For both the initialization and the assignment, the compiler must take the object in the right operand and convert it into the type of the left operand. The conversion rules tell you that conversion from the int bob to the short int fred results in truncation.

Function Calls: Prototypes

C has two styles of function declarations: the old K&R style, in which parameter types aren't specified in the function declaration, and the new ANSI style, in which the parameter types are part of the declaration. In the ANSI style, the use of function prototypes is still optional, but it's common.
With the ANSI style, you typically see something like this:

    int dostuff(int jim, unsigned char bob);

    void func(void)
    {
        char a=42;
        unsigned short b=43;
        long long int c;

        c=dostuff(a, b);
    }

The function declaration for dostuff() contains a prototype that tells the compiler the number of arguments and their types. The rule of thumb is that if the function has a prototype, the types are converted in a straightforward fashion using the rules documented previously. If the function doesn't have a prototype, something called the default argument promotions kicks in (explained in "Integer Promotions").

The previous example has a character (a) being converted into an int (jim), an unsigned short (b) being converted into an unsigned char (bob), and an int (the dostuff() function's return value) being converted into a long long int (c).

Function Calls: return

return does a conversion of its operand to the type specified in the enclosing function's definition. For example, the int a is converted into a char data type by return:

    char func(void)
    {
        int a=42;
        return a;
    }

Integer Promotions

Integer promotions specify how C takes a narrow integer data type, such as a char or short, and converts it to an int (or, in rare cases, to an unsigned int). This up-conversion, or promotion, is used for two different purposes:
Note: You might see the terms "integer promotions" and "integral promotions" used interchangeably in other literature, as they are synonymous.

There's a useful concept from the C standards: Each integer data type is assigned what's known as an integer conversion rank. These ranks order the integer data types by their width from lowest to highest. The signed and unsigned varieties of each type are assigned the same rank. The following abridged list sorts integer types by conversion rank from high to low. The C standard assigns ranks to other integer types, but this list should suffice for this discussion:
Basically, any place in C where you can use an int or unsigned int, you can also use any integer type with a lower integer conversion rank. This means you can use smaller types, such as chars and short ints, in the place of ints in C expressions. You can also use a bit field of type _Bool, int, signed int, or unsigned int. The bit fields aren't ascribed integer conversion ranks, but they are treated as narrower than their corresponding base type. This makes sense because a bit field of an int is usually smaller than an int, and at its widest, it's the same width as an int.

If you apply the integer promotions to a variable, what happens? First, if the variable isn't an integer type or a bit field, the promotions do nothing. Second, if the variable is an integer type, but its integer conversion rank is greater than or equal to that of an int, the promotions do nothing. Therefore, ints, unsigned ints, long ints, pointers, and floats don't get altered by the integer promotions.

So, the integer promotions are responsible for taking a narrower integer type or bit field and promoting it to an int or unsigned int. This is done in a straightforward fashion: If a value-preserving transformation to an int can be performed, it's done. Otherwise, a value-preserving conversion to an unsigned int is performed.

In practice, this means almost everything is converted to an int, as an int can hold the minimum and maximum values of all the smaller types. The only types that might be promoted to an unsigned int are unsigned int bit fields with 32 bits or perhaps some implementation-specific extended integer types.

Historical Note: The C89 standard made an important change to the C type conversion rules. In the K&R days of the C language, integer promotions were unsigned-preserving rather than value-preserving.
So with the current C rules, if a narrower, unsigned integer type, such as an unsigned char, is promoted to a wider, signed integer, such as an int, value preservation dictates that the new type is a signed integer. With the old rules, the promotion would preserve the unsigned-ness, so the resulting type would be an unsigned int. This changed the behavior of many signed/unsigned comparisons that involved promotions of types narrower than int.

Integer Promotions Summary

The basic rule of thumb is this: If an integer type is narrower than an int, integer promotions almost always convert it to an int. Table 6-5 summarizes the result of integer promotions on a few common types.
Integer Promotion Applications

Now that you understand integer promotions, the following sections examine where they are used in the C language.

Unary + Operator

The unary + operator performs integer promotions on its operand. For example, if the bob variable is of type char, the resulting type of the expression (+bob) is int, whereas the resulting type of the expression (bob) is char.

Unary - Operator

The unary - operator does integer promotion on its operand and then does a negation. Regardless of whether the operand is signed after the promotion, a twos complement negation is performed, which involves inverting the bits and adding 1.
Unary ~ Operator

The unary ~ operator does a ones complement of its operand after doing an integer promotion of its operand. This effectively performs the same operation on both signed and unsigned operands for twos complement implementations: It inverts the bits.

Bitwise Shift Operators

The bitwise shift operators >> and << shift the bit patterns of variables. The integer promotions are applied to both arguments of these operators, and the type of the result is the same as the promoted type of the left operand, as shown in this example:

    char a = 1;
    char c = 16;
    int bob;

    bob = a << c;

a is converted to an integer, and c is converted to an integer. The promoted type of the left operand is int, so the type of the result is an int. The integer representation of a is left-shifted by 16 bit positions.

Switch Statements

Integer promotions are used in switch statements. The general form of a switch statement is something like this:

    switch (controlling expression)
    {
        case (constant integer expression): body;
            break;
        default: body;
            break;
    }

The integer promotions are used in the following way: First, they are applied to the controlling expression, so that expression has a promoted type. Then, all the integer constants are converted to the type of the promoted control expression.

Function Invocations

Older C programs using the K&R semantics don't specify the data types of arguments in their function declarations. When a function is called without a prototype, the compiler has to do something called default argument promotions. Basically, integer promotions are applied to each function argument, and any arguments of the float type are converted to arguments of the double type. Consider the following example:

    #include <stdio.h>

    /* K&R-style definition: no prototype, so callers apply the
       default argument promotions. Not accepted by C23 compilers. */
    int jim(bob)
    char bob;
    {
        printf("bob=%d\n", bob);
        return 0;
    }

    int main(int argc, char **argv)
    {
        char a=5;
        jim(a);    /* a is promoted to int before the call */
        return 0;
    }

In this example, a copy of the value of a is passed to the jim() function. The char type is first run through the integer promotions and transformed into an integer.
This integer is what's passed to the jim() function. The code the compiler emits for the jim() function is expecting an integer argument, and it performs a direct conversion of that integer back into a char format for the bob variable.

Usual Arithmetic Conversions

In many situations, C is expected to take two operands of potentially divergent types and perform some arithmetic operation that involves both of them. The C standards spell out a general algorithm for reconciling two types into a compatible type for this purpose. This procedure is known as the usual arithmetic conversions. The goal of these conversions is to transform both operands into a common real type, which is used for the actual operation and then as the type of the result. These conversions apply only to the arithmetic types: integer and floating point types. The following sections tackle the conversion rules.

Rule 1: Floating Points Take Precedence

Floating point types take precedence over integer types, so if one of the arguments in an arithmetic expression is a floating point type, the other argument is converted to a floating point type. If one floating point argument is less precise than the other, the less precise argument is promoted to the type of the more precise argument.

Rule 2: Apply Integer Promotions

If you have two operands and neither is a float, you get into the rules for reconciling integers. First, integer promotions are performed on both operands. This is an extremely important piece of the puzzle! If you recall from the previous section, this means any integer type smaller than an int is converted into an int, and anything that's the same width as an int, larger than an int, or not an integer type is left alone. Here's a brief example:

    unsigned char jim = 255;
    unsigned char bob = 255;

    if ((jim + bob) > 300) do_something();

In this expression, the + operator causes the usual arithmetic conversions to be applied to its operands.
This means both jim and bob are promoted to ints, the addition takes place, and the resulting type of the expression is an int that holds the result of the addition (510). Therefore, do_something() is called, even though it looks like the addition could cause a numeric overflow. To summarize: Whenever there's arithmetic involving types narrower than an integer, the narrow types are promoted to integers behind the scenes. Here's another brief example:

    unsigned short a=1;

    if ((a-5) < 0) do_something();

Intuition would suggest that if you have an unsigned short with the value 1, and it's subtracted by 5, it underflows around 0 and ends up containing a large value. However, if you test this fragment, you see that do_something() is called because both operands of the subtraction operator are converted to ints before the comparison. So a is converted from an unsigned short to an int, and then an int with a value of 5 is subtracted from it. The resulting value is -4, which is a valid integer value, so the comparison is true. Note that if you did the following, do_something() wouldn't be called:

    unsigned short a=1;

    a=a-5;
    if (a < 0) do_something();

The integer promotion still occurs with the (a-5), but the resulting integer value of -4 is placed back into the unsigned short a. As you know, this causes a simple conversion from signed int to unsigned short, which causes truncation to occur, and a ends up with a large positive value. Therefore, the comparison doesn't succeed.

Rule 3: Same Type After Integer Promotions

If the two operands are of the same type after integer promotions are applied, you don't need any further conversions because the arithmetic should be straightforward to carry out at the machine level. This can happen if both operands have been promoted to an int by integer promotions, or if they just happen to be of the same type and weren't affected by integer promotions.
Rule 4: Same Sign, Different Types

If the two operands have different types after integer promotions are applied, but they share the same signed-ness, the narrower type is converted to the type of the wider type. In other words, if both operands are signed or both operands are unsigned, the type with the lesser integer conversion rank is converted to the type of the operand with the higher conversion rank.

Note that this rule has nothing to do with short integers or characters because they have already been converted to integers by integer promotions. This rule is more applicable to arithmetic involving types of larger sizes, such as long long int or long int. Here's a brief example:

    int jim =5;
    long int bob = 6;
    long long int fred;

    fred = (jim + bob);

Integer promotions don't change any types because they are of equal or higher width than the int type. So this rule mandates that the int jim be converted into a long int before the addition occurs. The resulting type, a long int, is converted into a long long int by the assignment to fred.

In the next section, you consider operands of different types, in which one is signed and the other is unsigned, which gets interesting from a security perspective.

Rule 5: Unsigned Type Wider Than or Same Width as Signed Type

The first rule for this situation is that if the unsigned operand is of greater integer conversion rank than the signed operand, or their ranks are equal, you convert the signed operand to the type of the unsigned operand. This behavior can be surprising, as it leads to situations like this:

    int jim = -5;

    if (jim < sizeof (int))
        do_something();

The comparison operator < causes the usual arithmetic conversions to be applied to both operands. Integer promotions are applied to jim and to sizeof(int), but they don't affect them. Then you continue into the usual arithmetic conversions and attempt to figure out which type should be the common type for the comparison.
In this case, jim is a signed integer, and sizeof (int) is a size_t, which is an unsigned integer type. Because the integer conversion rank of size_t is greater than or equal to that of int, the unsigned type takes precedence by this rule. Therefore, jim is converted to an unsigned integer type, the comparison fails, and do_something() isn't called. On a 32-bit system, the actual comparison is as follows:

    if (4294967291 < 4)
        do_something();

Rule 6: Signed Type Wider Than Unsigned Type, Value Preservation Possible

If the signed operand is of greater integer conversion rank than the unsigned operand, and a value-preserving conversion can be made from the unsigned integer type to the signed integer type, you choose to transform everything to the signed integer type, as in this example:

    long long int a=10;
    unsigned int b= 5;

    (a+b);

The signed argument, a long long int, can represent all the values of the unsigned argument, an unsigned int, so the compiler would convert both operands to the signed operand's type: long long int.

Rule 7: Signed Type Wider Than Unsigned Type, Value Preservation Impossible

There's one more rule: If the signed integer type has a greater integer conversion rank than the unsigned integer type, but all values of the unsigned integer type can't be held in the signed integer type, you have to do something a little strange. You take the type of the signed integer type, convert it to its corresponding unsigned integer type, and then convert both operands to use that type. Here's an example:

    unsigned int a = 10;
    long int b=20;

    (a+b);

For the purpose of this example, assume that on this machine, the long int size has the same width as the int size. The addition operator causes the usual arithmetic conversions to be applied. Integer promotions are applied, but they don't change the types. The signed type (long int) is of higher rank than the unsigned type (unsigned int).
The signed type (long int) can't hold all the values of the unsigned type (unsigned int), so you're left with the last rule. You take the type of the signed operand, which is a long int, convert it into its corresponding unsigned equivalent, unsigned long int, and then convert both operands to unsigned long int. The addition expression, therefore, has a resulting type of unsigned long int and a value of 30.

Summary of Arithmetic Conversions

The following is a summary of the usual arithmetic conversions. Table 6-6 demonstrates some sample applications of the usual arithmetic conversions.
Usual Arithmetic Conversion Applications

Now that you have a grasp of the usual arithmetic conversions, you can look at where these conversions are used.

Addition

Addition can occur between two arithmetic types as well as between a pointer type and an arithmetic type. Pointer arithmetic is explained in "Pointer Arithmetic," but for now, you just need to note that when both arguments are an arithmetic type, the compiler applies the usual arithmetic conversions to them.

Subtraction

There are three types of subtraction: subtraction between two arithmetic types, subtraction between a pointer and an arithmetic type, and subtraction between two pointer types. In subtraction between two arithmetic types, C applies the usual arithmetic conversions to both operands.

Multiplicative Operators

The operands to * and / must be an arithmetic type, and the arguments to % must be an integer type. The usual arithmetic conversions are applied to both operands of these operators.

Relational and Equality Operators

When two arithmetic operands are compared, the usual arithmetic conversions are applied to both operands. The resulting type is an int, and its value is 1 or 0, depending on the result of the test.

Binary Bitwise Operators

The binary bitwise operators &, ^, and | require integer operands. The usual arithmetic conversions are applied to both operands.

Question Mark Operator

From a type conversion perspective, the conditional operator is one of C's more interesting operators. Here's a short example of how it's commonly used:

    int a=1;
    unsigned int b=2;
    int choice=-1;
    ...
    result = choice ? a : b ;

In this example, the first operand, choice, is evaluated as a scalar. If it's nonzero, the result of the expression is the evaluation of the second operand, which is a. If it's zero, the result is the evaluation of the third operand, b.

The compiler has to know at compile time the result type of the conditional expression, which could be tricky in this situation.
What C does is determine which type would be the result of running the usual arithmetic conversions against the second and third arguments, and it makes that type the resulting type of the expression. So in the previous example, the expression results in an unsigned int, regardless of the value of choice.

Type Conversion Summary

Table 6-7 shows the details of some common type conversions.
Auditing Tip: Type Conversions

Even those who have studied conversions extensively might still be surprised at the way a compiler renders certain expressions into assembly. When you see code that strikes you as suspicious or potentially ambiguous, never hesitate to write a simple test program or study the generated assembly to verify your intuition.

If you do generate assembly to verify or explore the conversions discussed in this chapter, be aware that C compilers can optimize out certain conversions or use architectural tricks that might make the assembly appear incorrect or inconsistent. At a conceptual level, compilers are behaving as the C standard describes, and they ultimately generate code that follows the rules. However, the assembly might look inconsistent because of optimizations or even incorrect, as it might manipulate portions of registers that should be unused.