Practical C++ Programming Authors: Oualline S. Published year: 2003 Pages: 206-208/364

### 19.1 Floating-Point Format

Floating-point numbers consist of three parts: a sign, a fraction, and an exponent. In this chapter, the fraction is expressed as a four-digit decimal and the exponent is a single decimal digit. So our format is:

±f.fff × 10^(±e)

where:

- ± is the sign (plus or minus).
- f.fff is the four-digit fraction.
- ±e is the single-digit exponent.

Zero is +0.000 × 10^(+0). We represent these numbers in "E" format: ±f.fffE±e. This format is similar to the floating-point format used in many computers. The IEEE has defined a floating-point standard (IEEE 754), but not all machines use it.

Table 19-1 shows some typical floating-point numbers.

##### Table 19-1. Floating-point examples

| Notation | Number |
|---|---|
| +1.000E+0 | 1.0 |
| +3.300E+5 | 330000.0 |
| -8.223E-3 | -0.008223 |
| +0.000E+0 | 0.0 |
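To make the notation concrete, here is a minimal sketch of how a number in this format might be stored and printed. The `ToyFloat` type and the `to_e_format` function are hypothetical names invented for this illustration; they are not part of the chapter's code or of any library.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical type for the chapter's decimal format (an assumption
// made for this sketch, not the book's own data structure).
struct ToyFloat {
    bool negative;  // true for '-', false for '+'
    int  fraction;  // digits of f.fff as an integer, e.g. 3300 for 3.300
    int  exponent;  // single decimal digit with sign, -9..+9
};

// Render a ToyFloat in the "E" notation used in Table 19-1.
std::string to_e_format(const ToyFloat& n) {
    assert(n.fraction >= 0 && n.fraction <= 9999);
    assert(n.exponent >= -9 && n.exponent <= 9);

    std::string text;
    text += n.negative ? '-' : '+';
    text += static_cast<char>('0' + n.fraction / 1000);        // f.
    text += '.';
    text += static_cast<char>('0' + (n.fraction / 100) % 10);  // .f
    text += static_cast<char>('0' + (n.fraction / 10) % 10);   // .0f
    text += static_cast<char>('0' + n.fraction % 10);          // .00f
    text += 'E';
    text += (n.exponent < 0) ? '-' : '+';
    text += static_cast<char>('0' + std::abs(n.exponent));
    return text;
}
```

For example, `to_e_format(ToyFloat{false, 3300, 5})` produces the second row of the table, `+3.300E+5`.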

The floating-point operations defined in this chapter follow a rigid set of rules. To minimize errors, we make use of a guard digit: an extra digit added to the end of the fraction during computation. Many computers use a guard digit in their floating-point units.
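As a rough illustration of why the extra digit helps, the sketch below adds two positive four-digit fractions with and without a guard digit. The function `add_fractions` and its parameters are hypothetical, and the sketch assumes the sum still fits in four digits.

```cpp
#include <cassert>

// Hypothetical helper: add two positive four-digit fractions (stored as
// integers, e.g. 5556 for 5.556) where the exponent of `smaller` is
// `shift` less than the exponent of `larger`.
int add_fractions(int larger, int smaller, int shift, bool use_guard) {
    assert(shift >= 0);

    // With a guard digit, both fractions carry one extra trailing digit.
    int scale = use_guard ? 10 : 1;
    int a = larger * scale;
    int b = smaller * scale;

    // Align the exponents; digits shifted off the right end are lost.
    for (int i = 0; i < shift; ++i) {
        b /= 10;
    }

    int sum = a + b;
    if (use_guard) {
        // Round on the guard digit, then drop it.
        sum = sum / 10 + (sum % 10 >= 5 ? 1 : 0);
    }
    return sum;  // assumes the result still has four digits
}
```

Adding +1.000E+0 and +5.556E-1 (exact sum 1.5556) gives 1555 without the guard digit and 1556 with it, so the guarded result is the correctly rounded one.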

### 19.2 Floating Addition/Subtraction

To add two numbers, such as 2.0 and 0.3, the computer must perform the following steps:

1. Start with the numbers.

```
+2.000E+0	The number is 2.0.
+3.000E-1	The number is 0.3.
```
2. Add guard digits to both numbers.

```
+2.0000E+0	The number is 2.0.
+3.0000E-1	The number is 0.3.
```
3. Shift the number with the smaller exponent to the right one digit and increment its exponent. Continue until the exponents of the two numbers match.

```
+2.0000E+0	The number is 2.0.
+0.3000E+0	The number is 0.3.
```
4. Add the two fractions. The result has the same exponent as the two numbers.

```
+2.0000E+0	The number is 2.0.
+0.3000E+0	The number is 0.3.
_________________________________
+2.3000E+0	The result is 2.3.
```
5. Normalize the number by shifting it left or right until there is just one nonzero digit to the left of the decimal point. Adjust the exponent accordingly. A number like +0.1234E+0 would be normalized to +1.2340E-1. Because the number +2.3000E+0 is already normalized, we do nothing.

6. Finally, if the guard digit is greater than or equal to 5, round the next digit up. Otherwise, truncate the number.

```
+2.3000E+0	Round last digit.
+2.300E+0	The result is 2.3.
```

To subtract a number:

1. Change the sign of the second operand.
2. Add.
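The addition steps and the subtraction rule can be sketched in C++ roughly as follows. `ToyFloat`, `toy_add`, and `toy_subtract` are hypothetical names chosen for this illustration, and the sketch does not check that the resulting exponent still fits in a single digit.

```cpp
#include <cassert>

// Hypothetical type for the chapter's decimal format.
struct ToyFloat {
    bool negative;  // sign
    int  fraction;  // digits of f.fff as an integer, 0..9999
    int  exponent;  // power of ten (range checks omitted in this sketch)
};

ToyFloat toy_add(ToyFloat a, ToyFloat b) {
    assert(a.fraction >= 0 && a.fraction <= 9999);
    assert(b.fraction >= 0 && b.fraction <= 9999);

    // Step 2: add a guard digit (fold the sign into the value).
    long fa = (a.negative ? -1L : 1L) * a.fraction * 10;
    long fb = (b.negative ? -1L : 1L) * b.fraction * 10;
    int ea = a.exponent;
    int eb = b.exponent;

    // A zero operand should not force the other one to be shifted.
    if (fa == 0) ea = eb;
    if (fb == 0) eb = ea;

    // Step 3: shift the number with the smaller exponent to the right.
    while (ea < eb) { fa /= 10; ++ea; }
    while (eb < ea) { fb /= 10; ++eb; }

    // Step 4: add the fractions.
    long sum = fa + fb;
    int exponent = ea;
    bool negative = sum < 0;
    long magnitude = negative ? -sum : sum;
    if (magnitude == 0) return {false, 0, 0};

    // Step 5: normalize to one nonzero digit before the decimal point.
    while (magnitude >= 100000) { magnitude /= 10; ++exponent; }
    while (magnitude < 10000)   { magnitude *= 10; --exponent; }

    // Step 6: round on the guard digit, then drop it.
    long rounded = magnitude / 10 + (magnitude % 10 >= 5 ? 1 : 0);
    if (rounded == 10000) { rounded = 1000; ++exponent; }  // 9.9995 -> 10.00

    return {negative, static_cast<int>(rounded), exponent};
}

// Subtraction: change the sign of the second operand, then add.
ToyFloat toy_subtract(ToyFloat a, ToyFloat b) {
    b.negative = !b.negative;
    return toy_add(a, b);
}
```

Calling `toy_add` with +2.000E+0 and +3.000E-1 follows the worked example and yields a fraction of 2300 with exponent 0, that is, +2.300E+0.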


### 19.3 Multiplication and Division

When we want to multiply two numbers, such as 0.12 × 11.0, the following rules apply:

1. Start with the numbers.

```
+1.200E-1	The number is 0.12.
+1.100E+1	The number is 11.0.
```

2. Add guard digits to both numbers.

```
+1.2000E-1	The number is 0.12.
+1.1000E+1	The number is 11.0.
```
3. Multiply the two fractions and add the exponents (1.2 x 1.1 = 1.32, -1 + 1 = 0).

```
+1.2000E-1	The number is 0.12.
+1.1000E+1	The number is 11.0.
__________________________________
+1.3200E+0	The result is 1.32.
```
4. Normalize the result.

```
+1.3200E+0	The number is 1.32.
```
5. If the guard digit is greater than or equal to 5, round the next digit up. Otherwise, truncate the number.

```
+1.320E+0	The result is 1.32.
```

Notice that in multiplication, you didn't have to go through all that shifting. As far as computer hardware designers are concerned, the rules for multiplication are a lot shorter than those for addition. Integer multiplication is a lot slower than integer addition, but in floating point, multiplication speed is a lot closer to that of addition.
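The multiplication steps can be sketched in the same spirit. Again, `ToyFloat` and `toy_multiply` are hypothetical names used only for illustration, and the sketch ignores the possibility that the exponent no longer fits in a single digit.

```cpp
#include <cassert>

// Hypothetical type for the chapter's decimal format.
struct ToyFloat {
    bool negative;  // sign
    int  fraction;  // digits of f.fff as an integer, 0..9999
    int  exponent;  // power of ten (range checks omitted in this sketch)
};

ToyFloat toy_multiply(ToyFloat a, ToyFloat b) {
    assert(a.fraction >= 0 && a.fraction <= 9999);
    assert(b.fraction >= 0 && b.fraction <= 9999);

    // Step 2: add a guard digit to both fractions (f.ffff as an integer).
    long long fa = a.fraction * 10LL;
    long long fb = b.fraction * 10LL;

    // Step 3: multiply the fractions and add the exponents. The raw
    // product has eight decimal places; keep only the first four.
    long long product = (fa * fb) / 10000;
    int exponent = a.exponent + b.exponent;
    if (product == 0) return {false, 0, 0};

    // Step 4: normalize to one nonzero digit before the decimal point.
    while (product >= 100000) { product /= 10; ++exponent; }
    while (product < 10000)   { product *= 10; --exponent; }

    // Step 5: round on the guard digit, then drop it.
    long long rounded = product / 10 + (product % 10 >= 5 ? 1 : 0);
    if (rounded == 10000) { rounded = 1000; ++exponent; }

    // The result is negative when exactly one operand is negative.
    return {a.negative != b.negative, static_cast<int>(rounded), exponent};
}
```

Multiplying +1.200E-1 by +1.100E+1 with this sketch gives a fraction of 1320 and an exponent of 0, matching the +1.320E+0 worked out above. No alignment loop is needed, which is the shortcut the text points out.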

To divide numbers like 100.0 by 30.0, we must perform the following steps:

1. Start with the numbers.

```
+1.000E+2	The number is 100.0.
+3.000E+1	The number is 30.0.
```

2. Add guard digits to both numbers.

```
+1.0000E+2	The number is 100.0.
+3.0000E+1	The number is 30.0.
```
3. Divide the fractions, and subtract the exponents.

```
+1.0000E+2	The number is 100.0.
+3.0000E+1	The number is 30.0.
___________________________________
+0.3333E+1	The result is 3.333.
```
4. Normalize the result.

```
+3.3330E+0	The result is 3.333.
```
5. If the guard digit is greater than or equal to 5, round the next digit up. Otherwise, truncate the number.

```
+3.333E+0	The result is 3.333.
```
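A comparable sketch of the division steps follows. `ToyFloat` and `toy_divide` are hypothetical names for this illustration; the sketch assumes normalized operands, treats division by zero as a programming error, and does not check the exponent range.

```cpp
#include <cassert>

// Hypothetical type for the chapter's decimal format.
struct ToyFloat {
    bool negative;  // sign
    int  fraction;  // digits of f.fff as an integer, 0..9999
    int  exponent;  // power of ten (range checks omitted in this sketch)
};

ToyFloat toy_divide(ToyFloat a, ToyFloat b) {
    assert(a.fraction >= 0 && a.fraction <= 9999);
    assert(b.fraction > 0 && b.fraction <= 9999);  // no division by zero

    // Step 2: add a guard digit to both fractions (f.ffff as an integer).
    long long fa = a.fraction * 10LL;
    long long fb = b.fraction * 10LL;

    // Step 3: divide the fractions and subtract the exponents. Scaling
    // the dividend keeps four decimal places in the integer quotient.
    long long quotient = (fa * 10000) / fb;
    int exponent = a.exponent - b.exponent;
    if (quotient == 0) return {false, 0, 0};

    // Step 4: normalize to one nonzero digit before the decimal point.
    while (quotient >= 100000) { quotient /= 10; ++exponent; }
    while (quotient < 10000)   { quotient *= 10; --exponent; }

    // Step 5: round on the guard digit, then drop it.
    long long rounded = quotient / 10 + (quotient % 10 >= 5 ? 1 : 0);
    if (rounded == 10000) { rounded = 1000; ++exponent; }

    return {a.negative != b.negative, static_cast<int>(rounded), exponent};
}
```

Dividing +1.000E+2 by +3.000E+1 passes through the same intermediate values as the worked example (0.3333 with exponent 1, then 3.3330 with exponent 0) and ends with a fraction of 3333 and an exponent of 0.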