Remember that this is not rocket science, so expect minor deviations in the results of these formulas. A single-precision float, for example, is only 32 bits in size. For higher precision, use 64-bit double-precision or 80-bit double extended-precision floating-point instead. These formats are based on the IEEE 754 standard. Unfortunately, the 80-bit format is available only in scalar form on the 80x86's FPU, and 64-bit packed double-precision is available only on processors that support SSE2.
Most programmers know floating-point values only from declarations such as float, double, real4, or real8. They know that a sign bit, when set, indicates a negative value and, when clear, a positive one. That is usually the limit of their knowledge, because floating-point is generally treated as a black box that they never need to look inside.
This book requires you to understand a little more, and that is discussed in Chapter 8.