Recipe 18.16. Simulating Floating Point

Credit: Raymond Hettinger

Problem

You need to simulate floating-point arithmetic in software, either to show students the details of the classic problems with floating point (e.g., representation error, loss of precision, failure of the distributive, commutative, and associative laws), or to explore the numerical robustness of alternative algorithms.

Solution

We can reproduce every floating-point operation, with explicitly bounded precision, by coding a Python class that overloads all the special methods used in arithmetic operators:

prec = 8             # number of decimal digits (must be under 15)

class F(object):
    def __init__(self, value, full=None):
        # round value to prec significant digits; full carries the result
        # that ordinary double-precision arithmetic would have produced
        self.value = float('%.*e' % (prec-1, value))
        if full is None:
            full = self.value
        self.full = full
    def __str__(self):
        return str(self.value)
    def __repr__(self):
        return "F(%s, %r)" % (self, self.full)
    def error(self):
        # distance between the rounded and full-precision results, in ULPs
        ulp = float('1' + ('%.4e' % self.value)[-4:]) * 10 ** (1-prec)
        return int(abs(self.value - self.full) / ulp)
    def __coerce__(self, other):
        if not isinstance(other, F):
            return (self, F(other))
        return (self, other)
    def __add__(self, other):
        return F(self.value + other.value, self.full + other.full)
    def __sub__(self, other):
        return F(self.value - other.value, self.full - other.full)
    def __mul__(self, other):
        return F(self.value * other.value, self.full * other.full)
    def __div__(self, other):
        return F(self.value / other.value, self.full / other.full)
    def __neg__(self):
        return F(-self.value, -self.full)
    def __abs__(self):
        return F(abs(self.value), abs(self.full))
    def __pow__(self, other):
        return F(pow(self.value, other.value), pow(self.full, other.full))
    def __cmp__(self, other):
        return cmp(self.value, other.value)
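As a quick sanity check (this snippet is our own, not part of the original recipe), dividing two instances shows the rounded value, the full-precision value carried along in parallel, and the error measured in units in the last place (ULPs):

x = F(1) / F(3)
print x             # 0.33333333: one third rounded to eight significant digits
print repr(x)       # shows both the rounded value and the full-precision one
print x.error()     # 0: the rounded quotient is within one ULP of the full result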

Discussion

The initializer of class F rounds the input value to the given precision (the global constant prec). This rounding produces what is known as representation error because the result is the nearest possible representable value for the specified number of digits. For instance, at three digits of precision, 3.527104 is stored as 3.53, so the representation error is 0.002896.
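To watch that rounding happen with class F itself, you can temporarily lower the global prec before constructing an instance (a small check of our own, not part of the original recipe):

prec = 3                        # F reads the global prec at construction time
x = F(3.527104)
print x                         # 3.53, the nearest representable 3-digit value
print abs(3.527104 - x.value)   # the representation error, about 0.002896
prec = 8                        # restore the recipe's default precision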

Since the underlying representation used in this recipe is Python's ordinary float, the simulation works only up to 15 digits (the typical limit for double-precision floating point). If you need more than 15 digits, you can use Python 2.4's decimal.Decimal type as the underlying representation. This way, you can get any precision you ask for, although the computation occurs in decimal rather than in binary. Alternatively, to get binary floating point with arbitrarily high precision, use the gmpy Python wrapper for the GMP (GNU Multiple Precision) arithmetic library, specifically the gmpy.mpf type. One way or another, you need to change only the two calls to float in this recipe's Solution into calls to Python 2.4's decimal.Decimal, or to gmpy.mpf (requesting the appropriate number of "digits" or bits), to use class F with higher precision than 15 digits. gmpy is at http://gmpy.sourceforge.net.
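As a minimal sketch of that substitution (our own illustration, using a hypothetical helper named round_sig rather than code from the recipe), here is how decimal can round to prec significant digits, the job that float('%.*e' % (prec-1, value)) does for class F, without the 15-digit ceiling:

from decimal import Decimal, Context

prec = 20                        # more significant digits than a float carries

def round_sig(value):
    # Context(prec=n).plus(x) rounds x to n significant digits, the same
    # job the float('%.*e' % (prec-1, value)) expression does in class F
    return Context(prec=prec).plus(Decimal(str(value)))

print round_sig('3.14159265358979323846264338')   # keeps 20 significant digits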

One key use of this recipe is to show students the classic failures of the associative, commutative, and distributive laws (Knuth, The Art of Computer Programming, vol. 2, pp. 214-215); for example:

# Show failure of the associative law
u, v, w = F(11111113), F(-11111111), F(7.51111111)
assert (u+v)+w == 9.5111111
assert u+(v+w) == 10
# Show failure of the commutative law for addition
assert u+v+w != v+w+u
# Show failure of the distributive law
u, v, w = F(20000), F(-6), F(6.0000003)
assert u*v == -120000
assert u*w == 120000.01
assert v+w == .0000003
assert (u*v) + (u*w) == .01
assert u * (v+w) == .006

The other main way to use the code in this recipe is to compare the numerical accuracy of different algorithms that compute the same results. For example, we can compare the following three approaches to computing averages:

def avgsum(data):       # Sum all of the elements, then divide
    return sum(data, F(0)) / len(data)

def avgrun(data):       # Make small adjustments to a running mean
    m = data[0]
    k = 1
    for x in data[1:]:
        k += 1
        m += (x-m)/k    # Recurrence formula for mean
    return m

def avgrun_kahan(data): # Adjustment method with Kahan error correction term
    m = data[0]
    k = 1
    dm = 0
    for x in data[1:]:
        k += 1
        adjm = (x-m)/k - dm
        newm = m + adjm
        dm = (newm - m) - adjm
        m = newm
    return m

Here is a way to exercise these approaches and display their errors:

import random
prec = 5
data = [F(random.random()*10-5) for i in xrange(1000)]
print '%s\t%s\t%s' % ('Computed', 'ULP Error', 'Method')
print '%s\t%s\t%s' % ('--------', '---------', '------')
for f in avgsum, avgrun, avgrun_kahan:
    result = f(data)
    print '%s\t%6d\t\t%s' % (result, result.error(), f.__name__)
print '\n%r\tbaseline average using full precision' % result.full

Here is typical output from this snippet (the exact numbers will differ each time you run it, since the data being averaged is random):

Computed        ULP Error       Method
--------        ---------       ------
-0.020086           15          avgsum
-0.020061            9          avgrun
-0.020072            1          avgrun_kahan

-0.020070327734999997   baseline average using full precision

The last example demonstrates how to extract a full-precision floating-point result from an instance of F by using the instance's full attribute. Keeping the full-precision value around is helpful for running an algorithm at full precision, as a baseline for seeing the effects of using less precision.

The full-precision result excludes the representation error in the original inputs. For example, with prec = 3 and d = F(3.8761) / F(2.7181), d.full is 1.4264705882352939, the same result that regular division would yield starting from the nearest representable values, 3.88 / 2.72. This choice helpfully isolates accumulated floating-point operation errors from the artifacts of the original data entry. The separation is reasonable because real floating-point systems have to start with representable constants; however, if the original representation error has to be tracked, you can do so by entering the number twice in the call to F; for example, use F(2.7181, 2.7181) rather than F(2.7181).
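To make the difference concrete, here is a small comparison of the two choices (our own illustration, assuming prec is set to 3 as in the example above):

prec = 3
d1 = F(3.8761) / F(2.7181)                    # full starts from 3.88 / 2.72
d2 = F(3.8761, 3.8761) / F(2.7181, 2.7181)    # full tracks the exact inputs
print '%r' % d1.full    # 1.4264705882352939, operation error only
print '%r' % d2.full    # roughly 1.426033, includes the inputs' representation error
prec = 8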

See Also

Recipe 18.15 for algorithms for accurate sums; gmpy is at http://gmpy.sourceforge.net.


