12.3. Floating-Point Numbers


The main consideration in using floating-point numbers is that many fractional decimal numbers can't be represented accurately using the 1s and 0s available on a digital computer. Nonterminating decimals like 1/3 or 1/7 can usually be represented to only 7 or 15 digits of accuracy. In my version of Microsoft Visual Basic, a 32-bit floating-point representation of 1/3 equals 0.33333330. It's accurate to 7 digits. This is accurate enough for most purposes but inaccurate enough to trick you sometimes.


Following are a few specific guidelines for using floating-point numbers:

Avoid additions and subtractions on numbers that have greatly different magnitudes With a 32-bit floating-point variable, 1,000,000.00 + 0.1 probably produces an answer of 1,000,000.00 because 32 bits don't give you enough significant digits to encompass the range between 1,000,000 and 0.1. Likewise, 5,000,000.02 - 5,000,000.01 is probably 0.0.
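The loss is easy to demonstrate. The following sketch (not from the original text) uses Java floats; near 100,000,000 the gap between adjacent representable 32-bit values is 8, so an added 0.1 is rounded away entirely, and the two nearby literals in the subtraction both round to the same stored value:

```java
float big = 100000000.0f;             // 1.0E8; adjacent floats here are 8 apart
float sum = big + 0.1f;               // the 0.1 is lost in rounding
System.out.println( sum == big );     // prints "true"

float a = 5000000.02f;                // both literals round to the same
float b = 5000000.01f;                //    32-bit value, 5000000.0
System.out.println( a - b );          // prints "0.0"
```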

Cross-Reference

For algorithms books that describe ways to solve these problems, see "Additional Resources on Data Types" in Section 10.1.


Solutions? If you have to add a sequence of numbers that contains huge differences like this, sort the numbers first, and then add them starting with the smallest values. Likewise, if you need to sum an infinite series, start with the smallest term; essentially, sum the terms in reverse order. This doesn't eliminate round-off problems, but it minimizes them. Many algorithms books have suggestions for dealing with cases like this.
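The smallest-first technique can be sketched as follows. The array contents here are illustrative assumptions: one large value and 10,000 small ones, summed in both orders.

```java
import java.util.Arrays;

float[] values = new float[ 10001 ];
values[ 0 ] = 100000000.0f;                       // one large value ...
Arrays.fill( values, 1, values.length, 1.0f );    // ... and 10,000 small ones

// Largest first: each 1.0f vanishes against 1.0E8, whose neighboring
// representable floats are 8 apart, so the total never moves.
float largestFirst = 0.0f;
for ( float v : values ) {
   largestFirst += v;
}

// Smallest first: sort ascending so the small values accumulate into
// 10000.0f before they meet the large value.
Arrays.sort( values );
float smallestFirst = 0.0f;
for ( float v : values ) {
   smallestFirst += v;
}

System.out.println( largestFirst );    // prints "1.0E8"
System.out.println( smallestFirst );   // prints "1.0001E8"
```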

Avoid equality comparisons Floating-point numbers that should be equal are not always equal. The main problem is that two different paths to the same number don't always lead to the same number. For example, 0.1 added 10 times rarely equals 1.0. The following example shows two variables, nominal and sum, that should be equal but aren't.

1 is equal to 2 for sufficiently large values of 1.

Anonymous

Java Example of a Bad Comparison of Floating-Point Numbers
double nominal = 1.0;       <-- 1
double sum = 0.0;
for ( int i = 0; i < 10; i++ ) {
   sum += 0.1;       <-- 2
}
if ( nominal == sum ) {       <-- 3
   System.out.println( "Numbers are the same." );
}
else {
   System.out.println( "Numbers are different." );
}

(1) The variable nominal is a 64-bit real.

(2) sum is 10 * 0.1. It should be 1.0.

(3) Here's the bad comparison.

As you can probably guess, the output from this program is

Numbers are different.

The line-by-line values of sum in the for loop look like this:

0.1
0.2
0.30000000000000004
0.4
0.5
0.6
0.7
0.7999999999999999
0.8999999999999999
0.9999999999999999

Thus, it's a good idea to find an alternative to using an equality comparison for floating-point numbers. One effective approach is to determine a range of accuracy that is acceptable and then use a boolean function to determine whether the values are close enough. Typically, you'd write an Equals() function that returns true if the values are close enough and false otherwise. In Java, such a function would look like this:

Cross-Reference

This example is proof of the maxim that there's an exception to every rule. Variables in this realistic example have digits in their names. For the rule against using digits in variable names, see Section 11.7, "Kinds of Names to Avoid."


Java Example of a Routine to Compare Floating-Point Numbers
final double ACCEPTABLE_DELTA = 0.00001;

boolean Equals( double Term1, double Term2 ) {
   if ( Math.abs( Term1 - Term2 ) < ACCEPTABLE_DELTA ) {
      return true;
   }
   else {
      return false;
   }
}

If the code in the "bad comparison of floating-point numbers" example were converted so that this routine could be used for comparisons, the new comparison would look like this:

if ( Equals( nominal, sum ) ) ...

The output from the program when it uses this test is

Numbers are the same.

Depending on the demands of your application, it might be inappropriate to use a hard-coded value for ACCEPTABLE_DELTA. You might need to compute ACCEPTABLE_DELTA based on the size of the two numbers being compared.
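One common way to scale the delta is to make it relative to the magnitudes of the operands. This sketch is an assumption about how you might do it, not code from the original text, and the tolerance value is illustrative:

```java
final double RELATIVE_DELTA = 0.00001;

// Returns true when the difference is small relative to the larger of
// the two magnitudes, rather than small in absolute terms.
boolean EssentiallyEquals( double Term1, double Term2 ) {
   double scale = Math.max( Math.abs( Term1 ), Math.abs( Term2 ) );
   return Math.abs( Term1 - Term2 ) <= scale * RELATIVE_DELTA;
}
```

With this version, 1,000,000.00 and 1,000,000.01 compare as equal, while 0.00002 and 0.00001, which differ by the same absolute amount at a much smaller scale, do not.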

Anticipate rounding errors Rounding-error problems are no different from the problem of numbers with greatly different magnitudes. The same issue is involved, and many of the same techniques help to solve rounding problems. In addition, here are common specific solutions to rounding problems:

  • Change to a variable type that has greater precision. If you're using single-precision floating point, change to double-precision floating point, and so on.

  • Change to binary coded decimal (BCD) variables. The BCD scheme is typically slower and takes up more storage space, but it prevents many rounding errors. This is particularly valuable if the variables you're using represent dollars and cents or other quantities that must balance precisely.

    Cross-Reference

    Usually the performance impact of converting to BCD will be minimal. If you're concerned about the performance impact, see Section 25.6, "Summary of the Approach to Code Tuning."


  • Change from floating-point to integer variables. This is a roll-your-own approach to BCD variables. You will probably have to use 64-bit integers to get the precision you want. This technique requires you to keep track of the fractional part of your numbers yourself. Suppose you were originally keeping track of dollars using floating point with cents expressed as fractional parts of dollars. This is a normal way to handle dollars and cents. When you switch to integers, you have to keep track of cents using integers and of dollars using multiples of 100 cents. In other words, you multiply dollars by 100 and keep the cents in the 0-to-99 range of the variable. This might seem absurd at first glance, but it's an effective solution in terms of both speed and accuracy. You can make these manipulations easier by creating a DollarsAndCents class that hides the integer representation and supports the necessary numeric operations.
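On the BCD point above: the closest built-in analog in Java is java.math.BigDecimal, which stores exact decimal digits. Constructed from a String rather than a double, it sidesteps the repeated-0.1 problem shown earlier:

```java
import java.math.BigDecimal;

BigDecimal sum = BigDecimal.ZERO;
BigDecimal tenth = new BigDecimal( "0.1" );   // exact; new BigDecimal( 0.1 ) would not be
for ( int i = 0; i < 10; i++ ) {
   sum = sum.add( tenth );
}
System.out.println( sum );                    // prints "1.0"
```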
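The scaled-integer approach above can be sketched as a class. The name DollarsAndCents comes from the text, but the members shown here are illustrative assumptions:

```java
class DollarsAndCents {
   private final long totalCents;                 // exact amount, in cents

   DollarsAndCents( long dollars, long cents ) {
      totalCents = dollars * 100 + cents;         // dollars scaled by 100
   }

   private DollarsAndCents( long totalCents ) {
      this.totalCents = totalCents;
   }

   DollarsAndCents plus( DollarsAndCents other ) {
      return new DollarsAndCents( totalCents + other.totalCents );
   }

   long dollars() { return totalCents / 100; }
   long cents()   { return totalCents % 100; }    // always in the 0-to-99 range
}
```

With this class, new DollarsAndCents( 10, 99 ).plus( new DollarsAndCents( 0, 2 ) ) yields 11 dollars and 1 cent, with no rounding anywhere.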

Check language and library support for specific data types Some languages, including Visual Basic, have data types such as Currency that specifically support data that is sensitive to rounding errors. If your language has a built-in data type that provides such functionality, use it!



Code Complete: A Practical Handbook of Software Construction, Second Edition
ISBN: 0735619670
Year: 2003