3.9 The Machine Epsilon

We conclude this chapter by computing approximations to the machine epsilon ε for types float and double. The machine epsilon ε is the largest positive value that, when added to 1, produces a sum that is equal to 1. In other words, we want, because of roundoff, 1 + ε = 1 to be true. Any larger value for ε would have 1 + ε > 1.

Listing 3-5a shows class Epsilon in package numbercruncher.mathutils. It computes the values of its two static class variables, floatEpsilon and doubleEpsilon, and it has methods to return each value.

Listing 3-5a Class Epsilon, which computes and returns the machine ε for the float and double types.

    package numbercruncher.mathutils;

    /**
     * Compute the machine epsilon for the float and double types,
     * the largest positive floating-point value that, when added to 1,
     * results in a value equal to 1 due to roundoff.
     */
    public final class Epsilon {
        private static final float  floatEpsilon;
        private static final double doubleEpsilon;

        static {
            // Loop to compute the float epsilon value:
            // keep halving until adding the value to 1 no longer changes 1.
            float fTemp = 0.5f;
            while (1 + fTemp > 1) fTemp /= 2;
            floatEpsilon = fTemp;

            // Loop to compute the double epsilon value.
            double dTemp = 0.5;
            while (1 + dTemp > 1) dTemp /= 2;
            doubleEpsilon = dTemp;
        }

        /**
         * Return the float epsilon value.
         * @return the value
         */
        public static float floatValue() { return floatEpsilon; }

        /**
         * Return the double epsilon value.
         * @return the value
         */
        public static double doubleValue() { return doubleEpsilon; }
    }

Program 3-5, shown in Listing 3-5b, imports both classes IEEE754 and Epsilon in order to decompose and print the machine ε values.

Listing 3-5b Printing the decomposed machine ε values.

    package numbercruncher.program3_5;

    import numbercruncher.mathutils.IEEE754;
    import numbercruncher.mathutils.Epsilon;

    /**
     * PROGRAM 3-5: Print Machine Epsilon
     *
     * Decompose and print the machine epsilon
     * for the float and double types.
     */
    public class PrintEpsilon {
        public static void main(String[] args) {
            (new IEEE754(Epsilon.floatValue())).print();
            (new IEEE754(Epsilon.doubleValue())).print();
        }
    }

Output:

    ------------------------------
    float value = 5.9604645E-8
    sign=0, exponent=01100111 (biased=103, normalized, unbiased=-24)
    significand=1.00000000000000000000000
    ------------------------------
    double value = 1.1102230246251565E-16
    sign=0, exponent=01111001010 (biased=970, normalized, unbiased=-53)
    significand=1.0000000000000000000000000000000000000000000000000000

In the float case, ε is 2^-24, so the exact sum 1 + ε is 1.000000000000000000000001 in binary (23 zeros and then a 1 after the binary point), which needs 25 significant bits.
Since the float significand has only 24 bits, the exact sum lies halfway between 1 and the next larger float value, and the round-to-nearest mode (with ties going to the even significand) rounds the sum down to 1. Any value larger than ε would result in a sum greater than 1. The double case is similar, except with a 53-bit significand. Table 3-4 summarizes the epsilon values.

One key point is that whenever we add two float numbers whose binary exponents differ by more than the bit size of the significand, the smaller addend is lost in the roundoff. We'll discuss this further in Chapter 4. In subsequent chapters, we can use ε (or multiples of ε) during computations as the upper bound on relative roundoff errors.

Table 3-4. float and double epsilon values, which are the largest floating-point values that, when added to 1, produce a sum that is still equal to 1.
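The rounding at 1 plus the machine epsilon, and the loss of a much smaller addend, can be checked directly. The following is a minimal sketch rather than part of the numbercruncher package: the class name EpsilonAbsorptionDemo and its helper methods are hypothetical, the halving loop is copied from Listing 3-5a so the example is self-contained, and it assumes Java's IEEE 754 round-to-nearest float arithmetic.

```java
final class EpsilonAbsorptionDemo {
    /** Halving loop from Listing 3-5a, repeated here so the sketch stands alone. */
    static float floatEpsilon() {
        float eps = 0.5f;
        while (1 + eps > 1) eps /= 2;
        return eps;
    }

    /** True if adding small to big leaves big unchanged (small is lost to roundoff). */
    static boolean isAbsorbed(float big, float small) {
        return big + small == big;
    }

    public static void main(String[] args) {
        float eps = floatEpsilon();

        // 1 + eps is an exact tie and rounds back down to 1.
        System.out.println(isAbsorbed(1f, eps));               // prints true

        // The next float above eps is enough to push the sum past 1.
        System.out.println(1f + Math.nextUp(eps) > 1f);        // prints true

        // 2^25 and 1 have binary exponents 25 apart, more than the
        // 24-bit float significand, so the 1 disappears entirely.
        float big = 33554432f;                                 // 2^25
        System.out.println(isAbsorbed(big, 1f));               // prints true
    }
}
```

The same experiment with double would need exponents more than 53 apart, for example 2^54 and 1.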
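The claim that the machine epsilon bounds the relative roundoff error of a single operation can also be tested numerically. The sketch below is an illustration under stated assumptions, not code from the book: the class RelativeRoundoffCheck and the method relativeErrorOfSum are hypothetical names, java.math.BigDecimal is used to obtain the exact sum of the two stored double values, and the inputs are assumed to be finite with a nonzero sum.

```java
import java.math.BigDecimal;
import java.math.MathContext;

final class RelativeRoundoffCheck {
    /** Halving loop from Listing 3-5a for the double type. */
    static double doubleEpsilon() {
        double eps = 0.5;
        while (1 + eps > 1) eps /= 2;
        return eps;
    }

    /**
     * Relative error of the double sum a + b, measured against the exact
     * sum of the two stored values. Assumes finite inputs and a nonzero sum.
     */
    static double relativeErrorOfSum(double a, double b) {
        // new BigDecimal(double) converts the stored binary value exactly.
        BigDecimal exact   = new BigDecimal(a).add(new BigDecimal(b));
        BigDecimal rounded = new BigDecimal(a + b);
        return rounded.subtract(exact).abs()
                      .divide(exact.abs(), MathContext.DECIMAL64)
                      .doubleValue();
    }

    public static void main(String[] args) {
        double eps    = doubleEpsilon();
        double relErr = relativeErrorOfSum(0.1, 0.2);

        System.out.println("epsilon        = " + eps);
        System.out.println("relative error = " + relErr);
        System.out.println(relErr <= eps);   // prints true
    }
}
```

A longer computation accumulates one such error per operation, which is why multiples of the machine epsilon, rather than the value itself, are the more realistic bound there.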