2.7 Sign Extension, Zero Extension, and Contraction



Many modern high-level programming languages allow you to use expressions involving integer objects of differing sizes. So what happens when the two operands in an expression are of different sizes? Some languages report an error; others automatically convert the operands to a common format. This conversion, however, is not free, so if you don't want your compiler going behind your back and automatically inserting conversions into your otherwise great code, you should be aware of how compilers deal with such expressions.

With the two's complement system, a single negative value will have different representations depending on the size of the representation. You cannot arbitrarily use an 8-bit signed value in an expression involving a 16-bit number; a conversion will be necessary. This conversion, and its converse (converting a 16-bit value to 8 bits), are the sign extension and contraction operations.

Consider the value -64. The 8-bit two's complement value for this number is $C0. The 16-bit equivalent of this number is $FFC0. Clearly, these are not the same bit pattern. Now consider the value +64. The 8- and 16-bit versions of this value are $40 and $0040, respectively. It should be obvious that extending the size of negative values is done differently than extending the size of non-negative values.

To sign extend a value from some number of bits to a greater number of bits is easy: just copy the sign bit into the additional HO bits in the new format. For example, to sign extend an 8-bit number to a 16-bit number, simply copy bit 7 of the 8-bit number into bits 8..15 of the 16-bit number. To sign extend a 16-bit number to a double word, simply copy bit 15 into bits 16..31 of the double word.
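The bit-copying rule just described can be sketched directly in C. This is only an illustration that operates on raw bit patterns held in unsigned fixed-width types; the function names `sign_extend_8_to_16`, `sign_extend_16_to_32`, and `check_sign_extension` are hypothetical and do not come from the original text.

```c
#include <assert.h>
#include <stdint.h>

/* Sign extend an 8-bit two's complement bit pattern to 16 bits by hand:
   copy bit 7 (the sign bit) into bits 8..15. */
uint16_t sign_extend_8_to_16(uint8_t value)
{
    uint16_t result = value;                  /* LO byte kept, HO byte is zero */
    if (value & 0x80u) {                      /* sign bit set: value is negative */
        result = (uint16_t)(result | 0xFF00u);  /* fill bits 8..15 with ones */
    }
    return result;
}

/* Sign extend a 16-bit bit pattern to 32 bits: copy bit 15 into bits 16..31. */
uint32_t sign_extend_16_to_32(uint16_t value)
{
    uint32_t result = value;                  /* LO word kept, HO word is zero */
    if (value & 0x8000u) {
        result |= 0xFFFF0000u;                /* fill bits 16..31 with ones */
    }
    return result;
}

/* A few self-checks using the -64 / +64 bit patterns discussed above. */
void check_sign_extension(void)
{
    assert(sign_extend_8_to_16(0xC0) == 0xFFC0);          /* -64 */
    assert(sign_extend_8_to_16(0x40) == 0x0040);          /* +64 */
    assert(sign_extend_16_to_32(0xFFC0) == 0xFFFFFFC0u);  /* -64 again */
}
```

In everyday C you would normally let the compiler do this through a signed conversion; the manual version is shown only to make the copied sign bit visible.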

You must use sign extension when manipulating signed values of varying lengths. For example, when adding a byte quantity to a word quantity, you will need to sign extend the byte to 16 bits before adding the two numbers. Other operations may require a sign extension to 32 bits.

Table 2-5: Sign Extension Examples

8 Bits    16 Bits    32 Bits       Binary (Two's Complement)
$80       $FF80      $FFFF_FF80    %1111_1111_1111_1111_1111_1111_1000_0000
$28       $0028      $0000_0028    %0000_0000_0000_0000_0000_0000_0010_1000
$9A       $FF9A      $FFFF_FF9A    %1111_1111_1111_1111_1111_1111_1001_1010
$7F       $007F      $0000_007F    %0000_0000_0000_0000_0000_0000_0111_1111
n/a       $1020      $0000_1020    %0000_0000_0000_0000_0001_0000_0010_0000
n/a       $8086      $FFFF_8086    %1111_1111_1111_1111_1000_0000_1000_0110

When processing unsigned binary numbers, zero extension lets you convert small unsigned values to larger unsigned values. Zero extension is very easy: just store a zero in the HO byte(s) of the larger operand. For example, to zero extend the 8-bit value $82 to 16 bits, you just insert a zero for the HO byte, yielding $0082.
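As a rough sketch of the same idea in C, converting an unsigned value to a wider unsigned type zero extends it. The helper names below are hypothetical, and the fixed-width types from `<stdint.h>` are an assumption made so that the sizes are unambiguous. The last check contrasts zero extension with sign extension of the same $82 bit pattern.

```c
#include <assert.h>
#include <stdint.h>

/* Zero extend 8 bits to 16 bits: the HO byte of the result is zero. */
uint16_t zero_extend_8_to_16(uint8_t value)
{
    return (uint16_t)value;
}

/* Zero extend 16 bits to 32 bits: the HO word of the result is zero. */
uint32_t zero_extend_16_to_32(uint16_t value)
{
    return (uint32_t)value;
}

void check_zero_extension(void)
{
    int8_t as_signed = -126;   /* same bit pattern as $82, read as signed */

    assert(zero_extend_8_to_16(0x82) == 0x0082);
    assert(zero_extend_16_to_32(0x8086) == 0x00008086u);

    /* Widening the signed value sign extends instead, giving $FF82. */
    assert((uint16_t)(int16_t)as_signed == 0xFF82u);
}
```

Which extension the compiler chooses depends on the signedness of the source type, not on the bit pattern itself.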

Table 2-6: Zero Extension Examples

8 Bits    16 Bits    32 Bits       Binary
$80       $0080      $0000_0080    %0000_0000_0000_0000_0000_0000_1000_0000
$28       $0028      $0000_0028    %0000_0000_0000_0000_0000_0000_0010_1000
$9A       $009A      $0000_009A    %0000_0000_0000_0000_0000_0000_1001_1010
$7F       $007F      $0000_007F    %0000_0000_0000_0000_0000_0000_0111_1111
n/a       $1020      $0000_1020    %0000_0000_0000_0000_0001_0000_0010_0000
n/a       $8086      $0000_8086    %0000_0000_0000_0000_1000_0000_1000_0110

Many high-level language compilers automatically handle sign and zero extension. The following examples in C demonstrate how this works:

    signed char sbyte;   // Chars in C are byte values.
    short int sword;     // Short integers in C are *usually* 16-bit values.
    long int sdword;     // Long integers in C are *usually* 32-bit values.
     . . .
    sword = sbyte;       // Automatically sign extends the 8-bit value to 16 bits.
    sdword = sbyte;      // Automatically sign extends the 8-bit value to 32 bits.
    sdword = sword;      // Automatically sign extends the 16-bit value to 32 bits.
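To see the effect of these implicit conversions on the bit patterns, the following sketch uses the fixed-width types from `<stdint.h>` (an assumption made here so that the sizes are exactly 8, 16, and 32 bits rather than "usually" so). The function name `check_implicit_sign_extension` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Widen -64 through plain assignments and inspect the resulting bits. */
void check_implicit_sign_extension(void)
{
    int8_t  sbyte  = -64;     /* 8-bit pattern $C0 */
    int16_t sword  = sbyte;   /* implicit conversion preserves the value */
    int32_t sdword = sbyte;

    /* The numeric value is unchanged by widening. */
    assert(sword == -64);
    assert(sdword == -64);

    /* The bit patterns show the sign bit copied into the new HO bits. */
    assert((uint16_t)sword  == 0xFFC0u);
    assert((uint32_t)sdword == 0xFFFFFFC0u);
}
```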

Some languages (such as Ada) may require an explicit cast from a smaller size to a larger size. You'll have to check the language reference manual for your particular language to see if this is necessary. The advantage of a language that requires you to provide an explicit conversion is that the compiler never does anything behind your back. If you fail to provide the conversion yourself, the compiler emits a diagnostic message so you'll be made aware that your program will need to do additional work.

The important thing to realize about sign and zero extension is that they aren't always free. Assigning a smaller integer to a larger integer may require more machine instructions (taking longer to execute) than moving data between two like- sized integer variables. Therefore, you should be careful about mixing variables of different sizes within the same arithmetic expression or assignment statement.

Sign contraction, converting a value with some number of bits to the same value with fewer bits, is a little more troublesome. Sign extension never fails. Given an m-bit signed value, you can always convert it to an n-bit number (where n > m) using sign extension. Unfortunately, given an n-bit number, you cannot always convert it to an m-bit number if m < n. For example, consider the value -448. As a 16-bit hexadecimal number, its representation is $FE40. Unfortunately, the magnitude of this number is too large to fit into eight bits, so you cannot sign contract it to eight bits.

To properly sign contract one value to another, you must look at the HO byte(s) that you want to discard. First, the HO bytes must all contain either zero or $FF. If you encounter any other values, you cannot sign contract the value. Second, the HO bit of your resulting value must match every bit you've removed from the number. Here are some examples of converting 16-bit values to 8-bit values:

    $FF80 (%1111_1111_1000_0000) can be sign contracted to $80 (%1000_0000).
    $0040 (%0000_0000_0100_0000) can be sign contracted to $40 (%0100_0000).
    $FE40 (%1111_1110_0100_0000) cannot be sign contracted to 8 bits.
    $0100 (%0000_0001_0000_0000) cannot be sign contracted to 8 bits.
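The two-part test described above (the discarded HO byte must be all zeros or all ones, and it must agree with the sign bit of what remains) might be sketched in C as follows. The function names `can_sign_contract_16_to_8` and `check_sign_contraction` are hypothetical, and the code works on raw 16-bit patterns rather than on signed values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return true if the 16-bit two's complement pattern can be sign
   contracted to 8 bits without changing its value. */
bool can_sign_contract_16_to_8(uint16_t value)
{
    uint8_t ho = (uint8_t)(value >> 8);     /* the byte to be discarded */
    uint8_t lo = (uint8_t)(value & 0xFFu);  /* the byte that remains */
    bool lo_negative = (lo & 0x80u) != 0;   /* sign bit of the result */

    if (ho == 0x00u) {
        return !lo_negative;   /* all discarded bits 0: result must be non-negative */
    }
    if (ho == 0xFFu) {
        return lo_negative;    /* all discarded bits 1: result must be negative */
    }
    return false;              /* any other HO byte cannot be discarded */
}

/* The four 16-bit examples from the text. */
void check_sign_contraction(void)
{
    assert(can_sign_contract_16_to_8(0xFF80));
    assert(can_sign_contract_16_to_8(0x0040));
    assert(!can_sign_contract_16_to_8(0xFE40));
    assert(!can_sign_contract_16_to_8(0x0100));
}
```

Note that $0080 (+128) and $FF7F (-129) are also rejected: their HO bytes look acceptable, but the sign bit of the remaining byte does not match the bits removed.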

Contraction is somewhat difficult in a high-level language. Some languages, like C, will simply store the LO portion of the expression into a smaller variable and throw away the HO component (at best, the C compiler may give you a warning during compilation about the loss of precision that may occur). You can often quiet the compiler, but it still doesn't check for invalid values. Typically, you'd use code like the following to sign contract a value in C:

    signed char sbyte;   // Chars in C are byte values.
    short int sword;     // Short integers in C are *usually* 16-bit values.
    long int sdword;     // Long integers in C are *usually* 32-bit values.
     . . .
    sbyte = (signed char) sword;
    sbyte = (signed char) sdword;
    sword = (short int) sdword;

The only safe solution in C is to compare the result of the expression to an upper and lower bounds value before attempting to store the value into a smaller variable. Unfortunately, such code tends to get unwieldy if you need to do this often. Here's what the preceding code might look like with these checks:

    if (sword >= -128 && sword <= 127)
    {
        sbyte = (signed char) sword;
    }
    else
    {
        // Report appropriate error.
    }

    // Another way, using assertions (requires #include <assert.h>):
    assert(sdword >= -128 && sdword <= 127);
    sbyte = (signed char) sdword;

    assert(sdword >= -32768 && sdword <= 32767);
    sword = (short int) sdword;

As you can plainly see, this code gets pretty ugly. In C/C++, you'd probably want to turn this into a macro ( #define ) or a function so your code would be a bit more readable.
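One possible shape for such a function is sketched below. The names `narrow_to_int8`, `narrow_to_int16`, and `check_narrowing` are hypothetical, and the use of the fixed-width types and limit macros from `<stdint.h>` is an assumption that replaces the hard-coded bounds with named constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Store value into *out only if it fits in 8 bits.
   Returns false, leaving *out untouched, when it does not fit. */
bool narrow_to_int8(int32_t value, int8_t *out)
{
    if (value < INT8_MIN || value > INT8_MAX) {
        return false;
    }
    *out = (int8_t)value;
    return true;
}

/* Same idea for a 16-bit destination. */
bool narrow_to_int16(int32_t value, int16_t *out)
{
    if (value < INT16_MIN || value > INT16_MAX) {
        return false;
    }
    *out = (int16_t)value;
    return true;
}

void check_narrowing(void)
{
    int8_t  sbyte = 0;
    int16_t sword = 0;

    assert(narrow_to_int8(-128, &sbyte) && sbyte == -128);
    assert(!narrow_to_int8(-448, &sbyte));   /* $FE40 does not fit in 8 bits */
    assert(sbyte == -128);                   /* destination left unchanged */
    assert(narrow_to_int16(-448, &sword) && sword == -448);
    assert(!narrow_to_int16(32768, &sword));
}
```

Returning a success flag rather than asserting leaves the choice of corrective action to the caller, at the cost of one extra test at each call site.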

Some high-level languages (such as Pascal and Delphi/Kylix) will automatically sign contract values for you and check the value to ensure it properly fits in the destination operand. [4] Such languages will raise some sort of exception (or stop the program) if a range violation occurs. Of course, if you want to take corrective action, you'll either need to write some exception handling code or resort to using an if statement sequence similar to the one in the C example just given.

[4] Borland's compilers require the use of a special compiler directive to activate this check. By default, the compiler does not do the bounds check.




Write Great Code. Understanding the Machine, Vol. 1
The Art of Assembly Language
ISBN: 1593270038
EAN: 2147483647
Year: 2003
Pages: 144
Authors: Randall Hyde
