Hack 99. Track Your Approximations | Perl Hacks: Tips & Tools for Programming, Debugging, and Surviving

Avoid rounding errors; get the right results.

Floating-point numbers are inherently approximate. Perl represents numbers in the most accurate way it can on your hardware, but typically it can't do better than about 16 significant digits. That means that a calculation you think is accurate:

my $dx   = 21123059652674105965267410596526741059652674100000000; my $rate = 1.23e12; my $end  = ( 23 * $dx - $rate * 230 - 2.34562516 ** 2 - 0.5 ) ** 0.33;

may in fact not be precise.

On a 32-bit machine, the various floating-point approximations introduced along the way may mean the final value of $end is inaccurate by about 937. Of course, the correct value of $end is approximately 520642400412471062.6461124479995125761153. If $end is your annual profit margin, you may not really care if it's off by a thousand dollars or so. On the other hand, if the same calculation were a trajectory for a lunar lander, turning off the retro-rockets 937 meters too high might matter a great deal.

How can you make sure your calculations aren't fatally inaccurate? The easiest way is to let Perl do it for you. To accomplish that, you need to change the way floating-point numbers work. Easy.

Interval interlude

Interval arithmetic is a simple technique for tracking the accuracy of numeric computations. Instead of representing each value as a single number, encode it as a range: minimum possible value to maximum possible value.

For example, most platforms can't represent the number 1.2345678901234567890 exactly. Interval arithmetic would encode it as the range [1.23456789012345, 1.23456789012346], assuming those two values are the closest lower and upper bounds that your machine can represent. On the other hand, if your machine could represent the number 1.23456, then it would encode it as [1.23456, 1.23456], with identical lower and upper bounds as there is no uncertainty about the actual value.

Once every number is properly encoded, any unary operations on the number are applied to both the minimal and maximal values, producing a new range that encodes the result. For example, the operation:

sqrt( [1.2, 1.3] )

yields:

[1.095445, 1.140175]

being the square roots of the minimal and maximal values. Logically, the true square root of the true value must lie somewhere in that new range.

Binary operations are more complex. They have to be performed on every possible combination of the minimal and maximal values of each operand. Then the minimal and maximal outcomes are used as the encoding of the result. For example, the multiplication:

[1.2, 1.3] * [-1, 0.9]

produces the result:

[-1.3, 1.17]

because:

1.2 *  -1  -1.2 1.3 *  -1  -1.3   (minimal value) 1.2 * 0.9   1.08 1.3 * 0.9   1.17  (maximal value)

The advantage of interval arithmetic is that, provided you're careful about rounding errors, the exact result is always guaranteed to lie somewhere in the interval you produce. Intervals also make it easy to estimate how accurate your computation is: the smaller the interval, the more precise the answer.

You can also think of intervals as: average value( half-interval). So you could also write the operation:

[1.2, 1.3] * [-1, 0.9]    [-1.3, 1.17]

as:

1.25(0.05) * -0.05(0.95)    -0.065(1.235)

This representation gives a clear indication of how the accuracy of the approximation changes under different operations, and how the uncertainty in the result grows over time.

Teaching Perl to think in intervals

To make Perl track the accuracy of your floating-point calculations, you first have to convince it to represent every floating point number as an interval:

package Number::Intervals; # Compute maximal error in the representation of a given number... sub _eps_for {     my ($num, $epsilon) = (shift) x 2;              # copy arg to both vars     $epsilon /= 2 while $num + $epsilon/2 != $num;  # whittle epsilon down     return $epsilon; } # Create an interval object, allowing for representation errors... sub _interval {     use List::Util qw( min max );     my ($min, $max) = ( min(@_), max(@_) );     return bless [$min - _eps_for($min), $max + _eps_for($max)], __PACKAGE__; } # Convert all floating-point constants to interval objects... sub import {     use overload;     overload::constant(         float => sub         {             my ($raw, $cooked) = @_;             return _interval($cooked);         },     ); }

When your code uses Number::Intervals, its import( ) will call the constant( ) subroutine from the standard overload module. That subroutine does exactly what its name suggests: it overloads the standard handling of constants with new behaviors. In this case, you're overloading the handling of floating-point constants by providing a subroutine that will be called every time a literal floating-point value appears in your program.

That handler subroutine receives two arguments: a string containing the raw source code that defined the constant ($raw), and a numeric value that is how Perl would normally interpret that source code definition ($cooked). The subroutine should return an object to use instead of the constant value.

In this instance, the handler just returns the corresponding interval object for the cooked value, as provided by _interval($cooked). That function determines the minimal and maximal values it receives and uses them as the lower and upper bounds of the resulting range. Note that it also subtracts the smallest possible amount (_eps_for($min)) from the lower bound and adds the smallest possible amount (_eps_for($max)) to the upper bound.

Adding and subtracting these epsilon values doesn't produce the smallest possible interval representing the original value, but the interval it does produce has three essential advantages: it's trivial to compute, it's guaranteed to correctly bound any number passed to _interval( ) (regardless of the rounding scheme your floating-point implementation uses), and it still produces the second-smallest possible interval representing the original number.

Of course, this isn't the end of the story. Depending on how you use Number::Intervals objects, you may get the wrong results. You can fix that too, though, in "Overload Your Operators" [Hack #100].