Practical C++ Programming
Authors: Oualline S.
Published year: 2003
Pages: 206-208/364

19.1 Floating-Point Format

Floating-point numbers consist of three parts: a sign, a fraction, and an exponent. In this chapter, the fraction is expressed as a four-digit decimal and the exponent is a single decimal digit. So our format is:

±f.fff x 10^±e

where:

    ±	is the sign (plus or minus).
    f.fff	is the four-digit fraction.
    ±e	is the single-digit exponent.

Zero is +0.000 x 10^+0. We represent these numbers in "E" format: ±f.fffE±e. This format is similar to the floating-point format used in many computers. The IEEE has defined a floating-point standard (IEEE 754), but not all machines use it.

Table 19-1 shows some typical floating-point numbers.

Table 19-1. Floating-point examples

    Notation	Number
    +1.000E+0	1.0
    +3.300E+5	330000.0
    -8.223E-3	-0.008223
    +0.000E+0	0.0
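As a minimal sketch of this format in C++, the block below stores the three parts as integers and prints them in "E" notation. The `DecFloat` struct and the `to_e_format` function are hypothetical names introduced here for illustration; they are not from the book. The fraction is assumed to be held as an integer from 0 to 9999, meaning f.fff scaled by 1000.

```cpp
#include <cassert>
#include <string>

// Hypothetical type for this sketch (not from the book).
struct DecFloat {
    int sign;      // +1 or -1
    int fraction;  // f.fff scaled by 1000, so 0..9999
    int exponent;  // single digit, -9..+9
};

// Formats a DecFloat as ±f.fffE±e.
std::string to_e_format(const DecFloat& n) {
    assert(n.fraction >= 0 && n.fraction <= 9999);
    assert(n.exponent >= -9 && n.exponent <= 9);

    std::string s;
    s += (n.sign < 0) ? '-' : '+';
    s += static_cast<char>('0' + n.fraction / 1000);        // digit before the point
    s += '.';
    s += static_cast<char>('0' + (n.fraction / 100) % 10);  // three digits after it
    s += static_cast<char>('0' + (n.fraction / 10) % 10);
    s += static_cast<char>('0' + n.fraction % 10);
    s += 'E';
    s += (n.exponent < 0) ? '-' : '+';
    s += static_cast<char>('0' + (n.exponent < 0 ? -n.exponent : n.exponent));
    return s;
}
```

For example, `to_e_format({-1, 8223, -3})` yields `-8.223E-3`, the third row of Table 19-1.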

The floating-point operations defined in this chapter follow a rigid set of rules. To minimize errors, we make use of a guard digit. This is an extra digit added to the end of the fraction during computation. Many computers use a guard digit in their floating-point units.
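To see what the guard digit buys, the sketch below subtracts two positive numbers in the four-digit format with either zero or one guard digit. The function name `subtract_with_guard` and the `Difference` struct are assumptions made for this illustration, and the function only handles the simple case where the first operand is the larger one.

```cpp
#include <cassert>

// Hypothetical result type for this sketch: a four-digit fraction
// (f.fff scaled by 1000) plus a decimal exponent.
struct Difference {
    int fraction;
    int exponent;
};

// Computes a - b for positive operands with a >= b and ea >= eb,
// carrying `guard` extra digits (0 or 1) during the computation.
Difference subtract_with_guard(int fa, int ea, int fb, int eb, int guard) {
    assert(guard == 0 || guard == 1);
    assert(ea >= eb);
    const int scale = (guard == 1) ? 10 : 1;
    int a = fa * scale;
    int b = fb * scale;

    // Align the exponents: each right shift drops the last digit of b.
    for (int e = eb; e < ea; ++e) b /= 10;

    int diff = a - b;
    assert(diff >= 0);
    if (diff == 0) return {0, 0};
    int exponent = ea;

    // Normalize so that one nonzero digit sits before the decimal point.
    while (diff < 1000 * scale) { diff *= 10; --exponent; }

    // Round on the guard digit, then drop it.
    if (guard == 1) {
        diff = (diff + 5) / 10;
        if (diff >= 10000) { diff /= 10; ++exponent; }
    }
    return {diff, exponent};
}
```

For 1.000E+0 minus 9.999E-1 the exact answer is 1.000E-4. With one guard digit, `subtract_with_guard(1000, 0, 9999, -1, 1)` produces that value. With no guard digit, the final 9 is lost while aligning the exponents and the result is 1.000E-3, ten times too large.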


19.2 Floating Addition/Subtraction

To add two numbers, such as 2.0 and 0.3, the computer must perform the following steps:

  1. Start with the numbers.

    +2.000E+0	The number is 2.0.
    +3.000E-1	The number is 0.3.
    
  2. Add guard digits to both numbers.

    +2.0000E+0	The number is 2.0.
    +3.0000E-1	The number is 0.3.
    
  3. Shift the number with the smaller exponent to the right one digit and increment its exponent. Continue until the exponents of the two numbers match.

    +2.0000E+0	The number is 2.0.
    +0.3000E+0	The number is 0.3.
    
  4. Add the two fractions. The result has the same exponent as the two numbers.

    +2.0000E+0	The number is 2.0.
    +0.3000E+0	The number is 0.3.
    _________________________________
    +2.3000E+0	The result is 2.3.
    
  5. Normalize the number by shifting it left or right until there is just one nonzero digit to the left of the decimal point. Adjust the exponent accordingly. A number like +0.1234E+0 would be normalized to +1.2340E-1. Because the number +2.3000E+0 is already normalized, we do nothing.

  6. Finally, if the guard digit is greater than or equal to 5, round the next digit up. Otherwise, truncate the number.

    +2.3000E+0	Round last digit.
    +2.300E+0	The result is 2.3.
    

To subtract a number:

  1. Change the sign of the second operand.

  2. Add.
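The addition steps and the subtraction rule above can be sketched in C++ as follows. This is an illustration under stated assumptions, not the book's implementation: `DecFloat`, `add`, and `subtract` are hypothetical names, the fraction is held as an integer (f.fff scaled by 1000), and exponent overflow past a single digit is not checked.

```cpp
#include <cassert>
#include <cstdlib>
#include <utility>

// Hypothetical type for this sketch (not from the book).
struct DecFloat {
    int sign;      // +1 or -1
    int fraction;  // f.fff scaled by 1000; 1000..9999 when normalized, 0 for zero
    int exponent;
};

DecFloat add(DecFloat a, DecFloat b) {
    assert(a.sign == 1 || a.sign == -1);
    assert(b.sign == 1 || b.sign == -1);

    // Make `a` the operand with the larger exponent.
    if (a.exponent < b.exponent) std::swap(a, b);

    // Step 2: append a guard digit (f.fff becomes f.fffg), folding in the sign.
    int fa = a.sign * a.fraction * 10;
    int fb = b.sign * b.fraction * 10;

    // Step 3: shift the smaller number right until the exponents match.
    // Digits shifted past the guard digit are lost.
    for (int e = b.exponent; e < a.exponent; ++e) fb /= 10;

    // Step 4: add the fractions; the result keeps the shared exponent.
    int sum = fa + fb;
    int sign = (sum < 0) ? -1 : +1;
    int mag = std::abs(sum);
    int exponent = a.exponent;
    if (mag == 0) return {+1, 0, 0};

    // Step 5: normalize to one nonzero digit before the decimal point.
    while (mag >= 100000) { mag /= 10; ++exponent; }
    while (mag < 10000)   { mag *= 10; --exponent; }

    // Step 6: round on the guard digit, then drop it.
    mag = (mag + 5) / 10;
    if (mag >= 10000) { mag /= 10; ++exponent; }  // 9.9995 rounds up to 10.000

    return {sign, mag, exponent};
}

// Subtraction: change the sign of the second operand, then add.
DecFloat subtract(DecFloat a, DecFloat b) {
    b.sign = -b.sign;
    return add(a, b);
}
```

With these definitions, `add({+1, 2000, 0}, {+1, 3000, -1})` reproduces the worked example: the result has fraction 2300 and exponent 0, that is, +2.300E+0.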


19.3 Multiplication and Division

When we want to multiply two numbers, such as 0.12 x 11.0, the following rules apply:

  1. Start with the numbers:

    +1.200E-1	The number is 0.12.
    +1.100E+1	The number is 11.0.
    
  2. Add the guard digit.

    +1.2000E-1	The number is 0.12.
    +1.1000E+1	The number is 11.0.
    
  3. Multiply the two fractions and add the exponents (1.2 x 1.1 = 1.32, -1 + 1 = 0).

    +1.2000E-1	The number is 0.12.
    +1.1000E+1	The number is 11.0.
    __________________________________
    +1.3200E+0	The result is 1.32.
    
  4. Normalize the result.

    +1.3200E+0	The number is 1.32.
    
  5. If the guard digit is greater than or equal to 5, round the next digit up. Otherwise, truncate the number.

    +1.320E+0	The number is 1.32.
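The multiplication steps above can be sketched in C++ as follows. As before, `DecFloat` and `multiply` are hypothetical names chosen for this illustration, the fraction is held as an integer (f.fff scaled by 1000), and exponent overflow is not checked.

```cpp
#include <cassert>

// Hypothetical type for this sketch (not from the book).
struct DecFloat {
    int sign;      // +1 or -1
    int fraction;  // f.fff scaled by 1000; 1000..9999 when normalized, 0 for zero
    int exponent;
};

DecFloat multiply(const DecFloat& a, const DecFloat& b) {
    assert(a.sign == 1 || a.sign == -1);
    assert(b.sign == 1 || b.sign == -1);
    if (a.fraction == 0 || b.fraction == 0) return {+1, 0, 0};

    // Step 2: append a guard digit, giving five-digit fractions scaled by 10^4.
    long long fa = a.fraction * 10LL;
    long long fb = b.fraction * 10LL;

    // Step 3: multiply the fractions and add the exponents.
    // The raw product is scaled by 10^8; bring it back to 10^4.
    long long mag = (fa * fb) / 10000;
    int exponent = a.exponent + b.exponent;

    // Step 4: normalize. Two fractions in [1, 10) give a product in [1, 100),
    // so at most one right shift is needed.
    if (mag >= 100000) { mag /= 10; ++exponent; }

    // Step 5: round on the guard digit, then drop it.
    mag = (mag + 5) / 10;
    if (mag >= 10000) { mag /= 10; ++exponent; }

    return {a.sign * b.sign, static_cast<int>(mag), exponent};
}
```

Calling `multiply({+1, 1200, -1}, {+1, 1100, 1})` reproduces the 0.12 x 11.0 example: fraction 1320 with exponent 0, that is, +1.320E+0.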
    

Notice that multiplication doesn't require all that shifting. As far as computer hardware designers are concerned, the rules for multiplication are a lot shorter than those for addition. Integer multiplication is a lot slower than integer addition. In floating point, multiplication speed is a lot closer to that of addition.

To divide numbers like 100.0 by 30.0, we must perform the following steps:

  1. Start with the numbers.

    +1.000E+2	The number is 100.0.
    +3.000E+1	The number is 30.0.
    
  2. Add the guard digit.

    +1.0000E+2	The number is 100.0.
    +3.0000E+1	The number is 30.0.
    
  3. Divide the fractions, and subtract the exponents.

    +1.0000E+2	The number is 100.0.
    +3.0000E+1	The number is 30.0.
    ___________________________________
    +0.3333E+1	The result is 3.333.
    
  4. Normalize the result.

    +3.3330E+0	The result is 3.333.
    
  5. If the guard digit is greater than or equal to 5, round the next digit up. Otherwise, truncate the number.

    +3.333E+0	The result is 3.333.
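The division steps above can be sketched in C++ along the same lines. `DecFloat` and `divide` are hypothetical names chosen for this illustration; the fraction is held as an integer (f.fff scaled by 1000), the divisor is assumed to be nonzero, and exponent overflow is not checked.

```cpp
#include <cassert>

// Hypothetical type for this sketch (not from the book).
struct DecFloat {
    int sign;      // +1 or -1
    int fraction;  // f.fff scaled by 1000; 1000..9999 when normalized, 0 for zero
    int exponent;
};

DecFloat divide(const DecFloat& a, const DecFloat& b) {
    assert(b.fraction != 0);  // division by zero is not handled in this sketch
    if (a.fraction == 0) return {+1, 0, 0};

    // Step 2: append a guard digit, giving five-digit fractions scaled by 10^4.
    long long fa = a.fraction * 10LL;
    long long fb = b.fraction * 10LL;

    // Step 3: divide the fractions and subtract the exponents.
    // Pre-scaling the dividend keeps the quotient scaled by 10^4.
    long long mag = (fa * 10000) / fb;
    int exponent = a.exponent - b.exponent;

    // Step 4: normalize. Two fractions in [1, 10) give a quotient in (0.1, 10),
    // so at most one left shift is needed.
    if (mag < 10000) { mag *= 10; --exponent; }

    // Step 5: round on the guard digit, then drop it.
    mag = (mag + 5) / 10;
    if (mag >= 10000) { mag /= 10; ++exponent; }

    return {a.sign * b.sign, static_cast<int>(mag), exponent};
}
```

Calling `divide({+1, 1000, 2}, {+1, 3000, 1})` reproduces the 100.0 / 30.0 example: the quotient +0.3333E+1 is normalized to +3.3330E+0 and rounded to +3.333E+0.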
    