Getting into the Guts of Floating-Point Numbers | Creating Games in C++: A Step-by-Step Guide

By definition, floating-point numbers have fractional parts. When a program stores a floating-point number in a variable, it actually stores two pieces of information: the significant digits and the exponent.

The significant digits in a floating-point number are the digits of the number that most significantly affect its value. For example, the number in Figure 9.1 contains a lot of digits. However, not all of them are significant.

Figure 9.1. The significant digits of a floating-point number.

As Figure 9.1 indicates, the leftmost seven digits of the floating-point number are the most significant. Anything to the right of those digits doesn't affect the value of the number much. This can lead to both important advantages and important problems. I'll talk about why in just a few moments.

Note

About seven significant digits is typical for the float data type on most personal computers.

As I already mentioned, floating-point numbers store the most significant digits and an exponent. Figure 9.2 illustrates how programs would store the number in Figure 9.1.

Figure 9.2. The most significant digits and an exponent.

When you multiply a number by a power of 10, you move the decimal point. Multiplying any number by 10⁴ moves the decimal point right four places. As a result, the number 1.234567x10⁴ is the same as 12345.67.

If we take the leftmost seven digits of the number in Figure 9.1 as the most significant, then 12345.67 is approximately the same as 12345.678901234567890123456789. The difference is small. To make things easy for the computer, the decimal point is always assumed to be after the leftmost digit, as shown in Figure 9.2. To put it into the correct place, you multiply the number by a power of 10. In the case of Figure 9.2, the number must be multiplied by 10⁴ to get the decimal point into the correct spot. So when a computer stores a floating-point number, it stores the significant digits, in this case 1.234567, and the exponent, which is 10⁴ in this example.

Okay, I lied.

Note

Multiplying by a negative power of 10 moves the decimal point to the left.

There is one optimization that the computer makes when it stores floating-point numbers. If both we and the computer know that we're always storing the significant digits and a power of 10, then we really don't need to store the 10. Instead, the computer can just store the exponent or power that the 10 is raised to. In the case of Figure 9.2, the computer wouldn't store 10⁴; it would just store the 4. In reality, it's the 4 that's the exponent, and the 10 is just assumed. That's why we say that the computer stores the most significant digits and an exponent. Figure 9.3 shows what really gets stored in a floating-point variable.

Figure 9.3. The truth about floating-point numbers.

When the computer uses the number in Figure 9.3, it understands what the significant digits are for and it knows that the exponent is a power of 10. It also understands that in order to use the floating-point number, it must raise 10 to the power of the exponent and then multiply the result by the significant digits. It does all of this automatically for you whenever you use floating-point numbers.

You may be wondering why you have to understand how C++ implements floating-point numbers. The answer is that this knowledge is necessary to avoid potential problems that can crop up when using floating-point numbers. To find out what they are, keep reading.

Note

Of course I'm using the words understands and knows in the preceding paragraph figuratively. Computers, in spite of what we see in the movies, are incapable of understanding or knowing anything.

Floating-Point Numbers and Precision Errors

Because computers don't store floating-point numbers exactly, there's a potential problem. Let's go back to the number 12345.678901234567890123456789. If you declare a variable of type float in your program, the program stores 12345.678901234567890123456789 as 1.234567 and 4. It throws away all but the seven most significant digits. What if you need those digits for your calculation to be correct?

This type of problem is called a precision error. It's possible for floating-point variables to lack the precision needed to perform the calculation correctly. In other words, it's possible that the digits are not significant to the computer but are very significant to you.

Precision errors can also occur if the significant digits are too far apart. Take the number 900000.00000012. If your game stores that number in a variable of type float, it will contain 9.0 and 5. That's 9.0x10⁵. The .00000012 got thrown away.

How can you fix precision errors?

The answer is that you can use a bigger data type. For example, on most personal computers, the float data type currently contains about seven significant digits. If you need more significant digits, then you can use a double instead. The C++ specification states that a double is no shorter than a float. Usually, it is implemented as twice the number of bits as a float. If so, a variable of type double contains about 15 significant digits. I doubt you'll ever need anything more precise than that for your games.

Warning

Using a larger data type means that your program uses more memory and may perform calculations more slowly. However, in most instances, you will probably not be able to notice the difference.

Floating-Point Numbers and Rounding Errors

If you divide 2.0 by 3.0, which is a floating-point calculation, what do you get?

You and I know the answer is 0.666666..., with the 6s repeating infinitely. However, computers don't have a way of storing an infinite number of 6s. When the computer has to represent an infinite number of digits in a fixed amount of memory, it rounds. As a result, in a variable of type float, 0.666666... becomes 0.6666667.

Depending on the calculations your game is doing, rounding can cause errors. This is particularly true when your game is simulating advanced physics, as many 3D games do these days.

The fix for rounding errors is the same as the fix for precision errors. You use a larger data type. The program still rounds when it has to. However, using a larger data type, such as double, means a greater number of digits in the numbers. Therefore, your results are more accurate even if they are still rounded.