3.7 Another Look at Roundoff Errors | Java Number Cruncher: The Java Programmers Guide to Numerical Computing

	Java Number Cruncher: The Java Programmer's Guide to Numerical Computing By Ronald Mak

	Chapter 3. The Floating-Point Standard

3.7 Another Look at Roundoff Errors

In Chapter 1, we saw that the float value of 1/3 printed as 0.33333334. What happens if we assign the float value to a double variable and then print the double variable's value? Will we get 0.3333333400000000? Actually, what we get is 0.3333333432674408. Where did the last eight digits come from? Are they random garbage? Program 3-4 attempts to find some answers. See Listing 3-4.

Listing 3-4 Roundoff errors of the number 1/3.

package numbercruncher.program3_4;

import numbercruncher.mathutils.IEEE754;

/**
 * PROGRAM 3-4: One Third
 *
 * Investigate the floating-point representation of 1/3.
 */
public class OneThird
{
    public static void main(String args[])
    {
        float  fThird     = 1/3f;
        double dConverted = fThird;
        double dThird     = 1/3d;

        System.out.println("          Float 1/3 = " + fThird);
        System.out.println("Converted to double = " + dConverted);
        System.out.println("         Double 1/3 = " + dThird);

        IEEE754 ieeeFThird     = new IEEE754(fThird);
        IEEE754 ieeeDConverted = new IEEE754(dConverted);
        IEEE754 ieeeDThird     = new IEEE754(dThird);

        ieeeFThird.print();
        ieeeDConverted.print();
        ieeeDThird.print();

        // Prepend the leading 0 bits of the converted 1/3.
        int    unbiased = ieeeDConverted.unbiasedExponent();
        String bits     = "1" + ieeeDConverted.fractionBits();
        while (++unbiased < 0) bits = "0" + bits;

        // Sum the indicated negative powers of 2.
        double sum   = 0;
        double power = 0.5;
        for (int i = 0; i < bits.length(); ++i) {
            if (bits.charAt(i) == '1') sum += power;
            power /= 2;
        }

        System.out.println();
        System.out.println("Converted 1/3 by summation = " + sum);
    }
}

Output:

          Float 1/3 = 0.33333334
Converted to double = 0.3333333432674408
         Double 1/3 = 0.3333333333333333
------------------------------
float value = 0.33333334
sign=0, exponent=01111101 (biased=125, normalized, unbiased=-2)
significand=1.01010101010101010101011
------------------------------
double value = 0.3333333432674408
sign=0, exponent=01111111101 (biased=1021, normalized, unbiased=-2)
significand=1.0101010101010101010101100000000000000000000000000000
------------------------------
double value = 0.3333333333333333
sign=0, exponent=01111111101 (biased=1021, normalized, unbiased=-2)
significand=1.0101010101010101010101010101010101010101010101010101

Converted 1/3 by summation = 0.3333333432674408

Let's examine the float value of 1/3. Its unbiased exponent value is -2, so we need to shift the implied point of its significand two places to the left, giving us the value 0.0101010101010101010101011 in base 2. We can verify it by doing base 2 division:

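[Figure: base-2 long division of 1 by 11 (that is, 1 divided by 3), which yields the repeating binary fraction 0.010101...]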

The IEEE 754 float representation of 1/3 rounded up the last bit from 0 to 1, thus introducing a very small positive error. We see this error as the final digit 4 (instead of 3) when we print out the float value.
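The same bit fields can also be read without the book's IEEE754 utility class. The following is a minimal sketch, not part of Program 3-4, that relies only on the standard Float.floatToIntBits method. The class name FloatBits and its two helper methods are illustrative names, and the exponent helper assumes a normalized value.

```java
public class FloatBits {
    /** Unbiased exponent of a normalized float (the float bias is 127). */
    static int unbiasedExponent(float f) {
        int bits = Float.floatToIntBits(f);
        return ((bits >>> 23) & 0xFF) - 127;
    }

    /** The 23 explicitly stored fraction bits as a string of 0s and 1s. */
    static String fractionBits(float f) {
        int fraction = Float.floatToIntBits(f) & 0x007FFFFF;
        String s = Integer.toBinaryString(fraction);
        // Left-pad with zeros to the full 23-bit width.
        while (s.length() < 23) s = "0" + s;
        return s;
    }

    public static void main(String[] args) {
        float fThird = 1/3f;
        System.out.println("unbiased exponent = " + unbiasedExponent(fThird));
        System.out.println("significand       = 1." + fractionBits(fThird));
    }
}
```

The final fraction bit is the one that was rounded up from 0 to 1.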

Program 3-4's output also shows what really happens when we convert the float value to a double. This widening operation is exact: It appends 29 (= 53 - 24) zero bits at the right. But when we converted that double value to a decimal number for printing, we didn't get eight decimal zeros at the end; instead, we got what appear to be garbage digits. For comparison, Program 3-4 also computes and displays the double value of 1/3.
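The claim that widening only appends zero bits can be checked directly on the raw bit patterns. This is a sketch under the assumption that the float is normalized (subnormal floats are renormalized when widened, so their fields do not line up this simply). WideningCheck and widensExactly are hypothetical names chosen for this illustration.

```java
public class WideningCheck {
    /**
     * Returns true if widening f to double kept the sign and unbiased
     * exponent and appended exactly 29 zero bits to the fraction.
     * Assumes f is a normalized float.
     */
    static boolean widensExactly(float f) {
        int  fBits = Float.floatToIntBits(f);
        long dBits = Double.doubleToLongBits((double) f);

        long fSign     = fBits >>> 31;
        long dSign     = dBits >>> 63;
        long fExponent = ((fBits >>> 23) & 0xFF) - 127;    // float bias
        long dExponent = ((dBits >>> 52) & 0x7FF) - 1023;  // double bias
        long fFraction = fBits & 0x007FFFFF;               // 23 bits
        long dFraction = dBits & 0x000FFFFFFFFFFFFFL;      // 52 bits

        return fSign == dSign
            && fExponent == dExponent
            && dFraction == (fFraction << 29);
    }

    public static void main(String[] args) {
        System.out.println("1/3f widens exactly: " + widensExactly(1/3f));
    }
}
```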

In fact, though, that "garbage" is quite valid, as the latter part of the program demonstrates. Using the binary representation of the converted value of 1/3, we add the indicated negative powers of 2. The printed sum matches what was printed for the converted value.
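Another way to see that the trailing digits are not garbage is to print the exact decimal expansion of the stored binary value. The sketch below, which is not part of Program 3-4, uses the standard BigDecimal(double) constructor, which converts a double's binary value exactly. ExactDecimal and exact are illustrative names.

```java
import java.math.BigDecimal;

public class ExactDecimal {
    /** Exact decimal expansion of the binary value stored in d. */
    static String exact(double d) {
        return new BigDecimal(d).toPlainString();
    }

    public static void main(String[] args) {
        float  fThird     = 1/3f;
        double dConverted = fThird;   // exact widening
        double dThird     = 1/3d;

        System.out.println("Converted to double, exactly = " + exact(dConverted));
        System.out.println("         Double 1/3, exactly = " + exact(dThird));
    }
}
```

The digits that println shows for the converted value are a rounded prefix of this exact expansion.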

As we saw in Chapter 1, a roundoff error occurs when an exact value, such as 1/3, lies between two representable floating-point values. How does Java decide which of the two floating-point values to choose? In the case of the float representation of 1/3, how did Java decide the last bit should be 1 instead of 0?

When an exact value lies between two representable floating-point values, Java picks the floating-point value that is closest to the exact value. If the exact value lies exactly halfway between two floating-point values, Java picks the floating-point value whose least significant (rightmost) bit is 0. This corresponds to the default rounding mode called round to nearest in the IEEE 754 standard.
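The halfway rule is easy to observe with integers just above 2^24, where adjacent float values are 2 apart, so every odd integer lies exactly halfway between two floats. The following sketch uses Java's int-to-float conversion, which rounds to nearest. NearestEvenDemo and toFloat are illustrative names.

```java
public class NearestEvenDemo {
    /** Converts an int to float; Java rounds to the nearest float value. */
    static float toFloat(int n) {
        return (float) n;
    }

    public static void main(String[] args) {
        int base = 1 << 24;   // 16777216; above this, floats are 2 apart

        // base+1 is halfway between base and base+2.
        // base+3 is halfway between base+2 and base+4.
        for (int n = base + 1; n <= base + 3; n += 2) {
            System.out.println(n + " -> " + (long) toFloat(n));
        }
    }
}
```

In each case the chosen neighbor is the one whose rightmost significand bit is 0, so one tie is resolved downward and the other upward.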

The IEEE 754 standard defines several other rounding modes for floating-point: round down, round up, and round toward zero. Once again, Java deviates from the standard. Java does not implement the nondefault rounding modes defined by the standard. If you want your Java program to use these other rounding modes, you can write methods that emulate the floating-point operations with the desired modes, or your program can invoke floating-point routines written in other languages.
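As one possible illustration of such an emulation, the sketch below computes a float quotient rounded down or rounded up. It is only a sketch under stated assumptions: both operands are finite, the divisor is positive, and the quotient does not overflow. It takes Java's round-to-nearest quotient, uses exact BigDecimal arithmetic to find out on which side of the true quotient that result lies, and then steps to the adjacent float with Math.nextDown or Math.nextUp if necessary. DirectedRounding, divideDown, and divideUp are hypothetical names, not part of any Java library.

```java
import java.math.BigDecimal;

public class DirectedRounding {
    /** Compares q*b with a exactly; the sign tells whether q is above or below a/b (b > 0). */
    private static int compareToExact(float q, float a, float b) {
        BigDecimal product = new BigDecimal(q).multiply(new BigDecimal(b));
        return product.compareTo(new BigDecimal(a));
    }

    /** a/b rounded down. Assumes finite a, b > 0, finite quotient. */
    static float divideDown(float a, float b) {
        float q = a / b;   // Java's round-to-nearest result
        // If q is above the exact quotient, step down to the next float.
        return compareToExact(q, a, b) > 0 ? Math.nextDown(q) : q;
    }

    /** a/b rounded up. Same assumptions as divideDown. */
    static float divideUp(float a, float b) {
        float q = a / b;
        // If q is below the exact quotient, step up to the next float.
        return compareToExact(q, a, b) < 0 ? Math.nextUp(q) : q;
    }

    public static void main(String[] args) {
        System.out.println("round to nearest: " + (1f / 3f));
        System.out.println("round down:       " + divideDown(1f, 3f));
        System.out.println("round up:         " + divideUp(1f, 3f));
    }
}
```

For 1/3, round to nearest already rounded up, so the round-up result coincides with it and only the round-down result differs, by one unit in the last place.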
