4.8. Case Studies | Introduction to Java Programming-Comprehensive Version (6th Edition)


4.7. Minimizing Numerical Errors

Numeric errors involving floating-point numbers are inevitable. This section discusses how to minimize such errors through an example.

Listing 4.5 presents an example that sums a series that starts with 0.01 and ends with 1.0. The numbers in the series increment by 0.01, as follows: 0.01 + 0.02 + 0.03 and so on. The output of the program appears in Figure 4.7.

Figure 4.7. The program uses a for loop to sum a series from 0.01 to 1.0 in increments of 0.01.


Listing 4.5. TestSum.java


 1  import javax.swing.JOptionPane;
 2
 3  public class TestSum {
 4    public static void main(String[] args) {
 5      // Initialize sum
 6      float sum = 0;
 7
 8      // Add 0.01, 0.02, ..., 0.99, 1 to sum
 9      for (float i = 0.01f; i <= 1.0f; i = i + 0.01f)
10        sum += i;
11
12      // Display result
13      JOptionPane.showMessageDialog(null, "The sum is " + sum);
14    }
15  }

The for loop (lines 9-10) repeatedly adds the control variable i to sum. This variable, which begins with 0.01, is incremented by 0.01 after each iteration. The loop terminates when i exceeds 1.0.

The for loop initial action can be any statement, but it is often used to initialize a control variable. From this example, you can see that a control variable can be a float type. In fact, it can be any data type.

The exact sum should be 50.50, but the answer is 50.499985. The result is not precise because computers use a fixed number of bits to represent floating-point numbers, and thus cannot represent some floating-point numbers exactly. If you change float in the program to double as follows, you should see a slight improvement in precision because a double variable takes sixty-four bits, whereas a float variable takes thirty-two bits.
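To see what "cannot represent exactly" means for the step value itself, the following is a minimal sketch that prints the exact binary value stored for 0.01 as a float and as a double. It relies on the standard java.math.BigDecimal(double) constructor, which converts the stored value without rounding. The class name ExactValueDemo and the helper exactValue are illustrative and not part of Listing 4.5.

```java
import java.math.BigDecimal;

class ExactValueDemo {
    // Returns the exact decimal expansion of the binary value held in d.
    // new BigDecimal(double) does not round, unlike Double.toString.
    static String exactValue(double d) {
        return new BigDecimal(d).toString();
    }

    public static void main(String[] args) {
        // A float is widened to double without changing its value,
        // so the first line shows what 0.01f really stores.
        System.out.println("float  0.01f = " + exactValue(0.01f));
        System.out.println("double 0.01  = " + exactValue(0.01));
        // 0.5 is a power of two, so it is stored exactly.
        System.out.println("double 0.5   = " + exactValue(0.5));
    }
}
```

Neither stored value equals 0.01; the double is simply much closer to it than the float, which is why switching types reduces the error but does not remove it.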

  // Initialize sum
  double sum = 0;

  // Add 0.01, 0.02, ..., 0.99, 1 to sum
  for (double i = 0.01; i <= 1.0; i = i + 0.01)
    sum += i;

However, you will be stunned to see that the result is actually 49.50000000000003. What went wrong? If you print out i for each iteration in the loop, you will see that the last i is slightly larger than 1 (not exactly 1). As a result, the condition i <= 1.0 fails and the last i is never added to sum. The fundamental problem is that floating-point numbers are represented by approximation, so small errors accumulate as i is incremented. There are two ways to fix the problem:

Minimizing errors by processing large numbers first.
Using an integer count to ensure that all the numbers are processed.
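The missed final term can be observed directly. The sketch below reruns the double loop and records how many terms were added and the value of i that ended the loop. The class name LoopTrace, the method run, and the terms counter are illustrative additions and not part of the original listing.

```java
class LoopTrace {
    // Runs the double loop from the text.
    // Returns {number of terms added, value of i that failed the test, sum}.
    static double[] run() {
        double sum = 0;
        int terms = 0;
        double i;
        for (i = 0.01; i <= 1.0; i = i + 0.01) {
            sum += i;
            terms++;
        }
        return new double[] {terms, i, sum};
    }

    public static void main(String[] args) {
        double[] r = run();
        System.out.println("terms added: " + (int) r[0]);
        System.out.println("i when the loop stopped: " + r[1]);
        System.out.println("sum: " + r[2]);
    }
}
```

If the explanation above holds, the loop adds one term fewer than the 100 intended, and the value of i that stops it is the slightly-too-large approximation of 1.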
To minimize errors, add numbers from 1.0, 0.99, down to 0.01, as follows:

  // Add 1, 0.99, ..., 0.01 to sum
  for (double i = 1.0; i >= 0.01; i = i - 0.01)
    sum += i;
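The descending loop still compares i against a floating-point bound, so it is worth checking how many terms it actually adds. The following is a minimal sketch for doing that; the class name DescendingSum and the terms counter are illustrative, and no particular output is claimed here. Note that if rounding did cause a term to be skipped in this direction, it would be the smallest one (about 0.01) rather than the largest (1.0).

```java
class DescendingSum {
    // Runs the descending loop from the text.
    // Returns {number of terms added, sum}.
    static double[] run() {
        double sum = 0;
        int terms = 0;
        for (double i = 1.0; i >= 0.01; i = i - 0.01) {
            sum += i;
            terms++;
        }
        return new double[] {terms, sum};
    }

    public static void main(String[] args) {
        double[] r = run();
        System.out.println("terms added: " + (int) r[0]);
        System.out.println("sum: " + r[1]);
    }
}
```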


To ensure that all the items are added to sum, use an integer variable to count the items. Here is the new loop:

  double currentValue = 0.01;

  for (int count = 0; count < 100; count++) {
    sum += currentValue;
    currentValue += 0.01;
  }

After this loop, sum is 50.50000000000003.
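The loop above is guaranteed to run exactly 100 times, but currentValue still accumulates rounding error because it is built up by repeated addition. One possible variation, which is not from the original text, is to derive each term directly from the integer counter so that no error carries over from one term to the next. The sketch below assumes this variation; the class name CountedSum and the method sumSeries are hypothetical.

```java
class CountedSum {
    // Sums 0.01, 0.02, ..., 1.00 using an integer counter.
    // Each term is computed as count / 100.0 instead of being
    // accumulated, so rounding error in one term does not affect the next.
    static double sumSeries() {
        double sum = 0;
        for (int count = 1; count <= 100; count++) {
            sum += count / 100.0;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("The sum is " + sumSeries());
    }
}
```

The result may still differ from 50.5 in the last few digits, since each individual term and each addition is rounded, but the integer counter ensures that all 100 terms are processed.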