For most of the number crunching in your games or tools, you will most likely use single-precision floating-point. For artificial intelligence (AI) and other high-precision calculations, you may wish to use double-precision instead, but it exists only in scalar form on the FPU unless SSE2 or above is available, so without that support the vector functionality must be emulated in a sequential fashion. Even with the higher precision, there is still a bit of an accuracy problem.
An alternative would be to use integer calculations in a fixed-point format of zero or more places. If the data size is large enough to contain the number, then there is no precision loss!
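As a minimal sketch of the fixed-point idea, the C fragment below stores values in 32-bit integers with 16 fractional bits. The 16.16 layout, the type name `fix16`, and the helper names are assumptions chosen for illustration; the text does not prescribe any particular format.

```c
#include <stdint.h>

/* Hypothetical 16.16 fixed-point type: 16 integer bits, 16 fraction bits. */
typedef int32_t fix16;

#define FIX16_FRAC_BITS 16

/* Convert a whole number to fixed point (assumes it fits in 16 bits). */
static fix16 fix16_from_int(int32_t n)
{
    return (fix16)(n * (1 << FIX16_FRAC_BITS));
}

/* Addition and subtraction are plain integer operations, so they are
   exact as long as the result stays inside the 32-bit range. */
static fix16 fix16_add(fix16 a, fix16 b) { return a + b; }
static fix16 fix16_sub(fix16 a, fix16 b) { return a - b; }
```

With this layout the value 0.5 is the integer `1 << 15`, and adding it to itself gives exactly the representation of 1.0, with no rounding step involved.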
The instruction variants can get pretty verbose: fixed-point (integer) addition supports 8-, 16-, and 32-bit data elements within a 128-bit vector, each signed or unsigned, with or without saturation. The interesting thing about adding signed and unsigned numbers is that, other than the carry or borrow, the resulting bits are exactly the same, so the same operation can be used for both. This can be seen in the following 8-bit example:
Unsigned | Hex | Signed |
---|---|---|
95 | 05Fh | 95 |
Notice that the resulting bits from the 8-bit calculation are all the same. Only the carry is different and the resulting bits are only interpreted as being signed or unsigned.
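The claim can be checked with a small C sketch. The helper names are hypothetical, and the second operand used in the usage note (200) is an arbitrary value picked for illustration, not one taken from the example above.

```c
#include <stdint.h>

/* Add two bytes as unsigned values and keep only the low 8 bits. */
static uint8_t add_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Add the same two bit patterns as signed (two's complement) values,
   then return the low 8 bits of the signed sum. */
static uint8_t add_s8_bits(uint8_t a, uint8_t b)
{
    int sa = (a & 0x80) ? (int)a - 256 : (int)a;  /* signed view of a */
    int sb = (b & 0x80) ? (int)b - 256 : (int)b;  /* signed view of b */
    return (uint8_t)(sa + sb);                    /* low 8 bits of the sum */
}
```

For example, `add_u8(95, 200)` computes 295 and keeps `0x27` with a carry out, while `add_s8_bits(95, 200)` computes 95 + (-56) = 39, which is also `0x27`. Only the interpretation of the bits differs.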
Now let's examine these functions more closely. MMX and SSE2 have the biggest payoff, as 3DNow! and SSE are primarily for floating-point support.
    mov ebx,pbB   ; Vector B
    mov eax,pbA   ; Vector A
    mov edx,pbD   ; Vector Destination
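The following is a 16×8-bit addition; substituting PSUBB for PADDB transforms it into a subtraction. Note that with the registers loaded as shown, PSUBB computes B − A in each byte, since the destination register holds the B data.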
    movq  mm0,[ebx+0]   ; Read B Data {B7...B0}
    movq  mm1,[ebx+8]   ;             {BF...B8}
    movq  mm2,[eax+0]   ; Read A Data {A7...A0}
    movq  mm3,[eax+8]   ;             {AF...A8}
    paddb mm0,mm2       ; lower 64 bits {A7+B7 ... A0+B0}
    paddb mm1,mm3       ; upper 64 bits {AF+BF ... A8+B8}
    movq  [edx+0],mm0
    movq  [edx+8],mm1
For SSE2, it is essentially the same function wrapper, keeping in mind that MOVDQA is for aligned memory and MOVDQU is for unaligned memory.
    movdqa xmm0,[ebx]   ; Read B Data {BF...B0}
    movdqa xmm1,[eax]   ; Read A Data {AF...A0}
    paddb  xmm0,xmm1    ; {vA+vB} 128 bits {AF+BF ... A0+B0}
    movdqa [edx],xmm0   ; Write D Data
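When checking a vector routine like this, a scalar reference can be handy. The sketch below is one possible C equivalent of the 16×8-bit wraparound add; it reuses the pointer names `pbA`, `pbB`, and `pbD` from the listings, while the function name `vec_add_16x8` is made up for illustration.

```c
#include <stdint.h>

/* Scalar reference for a 16 x 8-bit add with wraparound, i.e. the
   per-byte effect of PADDB across one 128-bit vector. */
static void vec_add_16x8(uint8_t *pbD, const uint8_t *pbA, const uint8_t *pbB)
{
    for (int i = 0; i < 16; ++i) {
        /* The cast keeps the low 8 bits; the carry out of each byte
           is discarded and never spills into the neighboring byte. */
        pbD[i] = (uint8_t)(pbA[i] + pbB[i]);
    }
}
```

Unlike MOVDQA, this version places no alignment requirement on its pointers, which makes it convenient as a fallback or as a test oracle.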
The following master substitution table lists the instruction to swap in for each change of functionality: addition versus subtraction, with or without saturation.
Element size | Add | Sub | Add (signed sat.) | Sub (signed sat.) | Add (unsigned sat.) | Sub (unsigned sat.) |
---|---|---|---|---|---|---|
8-bit | paddb | psubb | paddsb | psubsb | paddusb | psubusb |
16-bit | paddw | psubw | paddsw | psubsw | paddusw | psubusw |
32-bit | paddd | psubd | | | | |
64-bit | paddq | psubq | | | | |
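To make the difference between the columns concrete, here is a hedged C sketch of what a single 8-bit lane does under each mode. The function names are invented for this example; the real instructions apply the same rule to every lane of the vector at once.

```c
#include <stdint.h>

/* One lane of PADDB: the result wraps modulo 256. */
static uint8_t add_wrap_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* One lane of PADDUSB: unsigned add, clamped to the range 0..255. */
static uint8_t add_sat_u8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* One lane of PSUBUSB: unsigned subtract, clamped at 0. */
static uint8_t sub_sat_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}

/* One lane of PADDSB: signed add, clamped to the range -128..127. */
static int8_t add_sat_s8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;
    if (sum > 127)  sum = 127;
    if (sum < -128) sum = -128;
    return (int8_t)sum;
}
```

For instance, 200 + 100 wraps to 44 with the plain add but clamps to 255 with unsigned saturation, and 100 + 100 clamps to 127 with signed saturation.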