Vector Addition and Subtraction (Fixed Point)

For most of the number crunching in your games or tools, you will most likely use single-precision floating-point. For artificial intelligence (AI) and other high-precision calculations you may wish to use double-precision instead. However, on the x86, double-precision exists only in scalar form on the FPU unless SSE2 or above is available, so vector operations on doubles must otherwise be emulated sequentially. And even with the higher precision, floating-point values still suffer from rounding error.

An alternative is to use integer calculations in a fixed-point format with zero or more fractional places. As long as the data size is large enough to hold the number, addition and subtraction incur no precision loss!
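A minimal sketch in C of this precision point, assuming a hypothetical fixed-point format that stores values as integer hundredths (the FIXED_SCALE constant and both function names are illustrative, not from the original text). It performs ten additions of 0.1 in double precision and ten additions of 10 hundredths in a 32-bit integer.

```c
#include <stdint.h>

/* Assumption for illustration: values are stored as integer
   hundredths (two fixed decimal places), so 0.10 is stored as 10. */
#define FIXED_SCALE 100

/* Accumulate 0.1 ten times in double precision. 0.1 has no exact
   binary representation, so each addition can carry rounding error. */
double sum_tenths_double(void)
{
    double sum = 0.0;
    for (int i = 0; i < 10; i++)
        sum += 0.1;
    return sum;
}

/* The same accumulation in fixed point. Each step adds exactly
   10 hundredths; integer addition is exact as long as the running
   sum stays within the range of int32_t. */
int32_t sum_tenths_fixed(void)
{
    int32_t sum = 0;
    for (int i = 0; i < 10; i++)
        sum += FIXED_SCALE / 10;   /* 0.10 expressed in hundredths */
    return sum;
}
```

With strict 64-bit double arithmetic (for example, x86-64 code using SSE2), sum_tenths_double() returns 0.9999999999999999 rather than 1.0, while sum_tenths_fixed() always returns exactly 100, that is, 1.00.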

Pseudo Vec

These can get pretty verbose: fixed-point (integer) addition must support 8-, 16-, and 32-bit data elements within a 128-bit vector, both signed and unsigned, with and without saturation. The interesting thing about adding signed and unsigned numbers, other than the carry or borrow, is that the resulting bits are exactly the same, so the same equation can be used for both. This can be seen in the following 8-bit example:

Unsigned             Hex                  Signed
     95                 05Fh                   95
  + 240               + 0F0h              + (-16)
    335 (C=1, 79)     C=1 04Fh (79)       C=0, 79

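Notice that the bits produced by the 8-bit calculation (04Fh) are the same in every case. Only the treatment of the carry differs: for unsigned data it signals that the true sum, 335, did not fit in 8 bits, while for signed data it is ignored, and 95 + (-16) = 79 is already correct. Whether the result is signed or unsigned is purely a matter of interpretation.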
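The following is a minimal Pseudo Vec sketch in C, assuming a 16-element vector of 8-bit values. The function names and VEC_BYTES are hypothetical; they only mimic the behavior of PADDB, PADDUSB, and PADDSB and are not intrinsics or the book's own routines. It shows that one wraparound routine serves both signed and unsigned data, while saturation needs separate signed and unsigned versions because the clamping limits differ.

```c
#include <stdint.h>

#define VEC_BYTES 16   /* one 128-bit vector of 8-bit elements */

/* Wraparound add (PADDB-like): keep only the low 8 bits of each sum.
   The same loop is correct for signed and unsigned data, because
   two's-complement addition produces identical bits either way. */
void vec_add_u8_wrap(uint8_t *d, const uint8_t *a, const uint8_t *b)
{
    for (int i = 0; i < VEC_BYTES; i++)
        d[i] = (uint8_t)(a[i] + b[i]);   /* carry out is discarded */
}

/* Unsigned saturation (PADDUSB-like): clamp each sum to 0..255. */
void vec_add_u8_sat(uint8_t *d, const uint8_t *a, const uint8_t *b)
{
    for (int i = 0; i < VEC_BYTES; i++) {
        int sum = a[i] + b[i];           /* 0..510, computed in int */
        d[i] = (uint8_t)(sum > 255 ? 255 : sum);
    }
}

/* Signed saturation (PADDSB-like): clamp each sum to -128..127. */
void vec_add_s8_sat(int8_t *d, const int8_t *a, const int8_t *b)
{
    for (int i = 0; i < VEC_BYTES; i++) {
        int sum = a[i] + b[i];           /* -256..254, computed in int */
        if (sum > 127)  sum = 127;
        if (sum < -128) sum = -128;
        d[i] = (int8_t)sum;
    }
}
```

Feeding the example's values into element 0 (a[0] = 0x5F, b[0] = 0xF0) gives 0x4F from vec_add_u8_wrap, which reads as 79 whether interpreted as signed or unsigned. However, vec_add_u8_sat clamps the same inputs to 0xFF, while vec_add_s8_sat on 95 and -16 gives 79. The wraparound result is shared; the saturated results are not, which is why the saturating instructions come in separate signed and unsigned forms.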

Pseudo Vec (x86)

Now let's examine these functions more closely. MMX and SSE2 have the biggest payoff here, since 3DNow! and SSE are primarily for floating-point support.

mov    ebx,pbB    ; Vector B
mov    eax,pbA    ; Vector A
mov    edx,pbD    ; Vector Destination

The following is a 16×8-bit addition; substituting PSUBB for PADDB transforms it into a subtraction.

movq    mm0,[ebx+0]    ; Read B Data {B7...B0}
movq    mm1,[ebx+8]    ;             {BF...B8}
movq    mm2,[eax+0]    ; Read A Data {A7...A0}
movq    mm3,[eax+8]    ;             {AF...A8}

paddb   mm0,mm2        ; lower 64 bits {A7+B7 ... A0+B0}
paddb   mm1,mm3        ; upper 64 bits {AF+BF ... A8+B8}

movq    [edx+0],mm0    ; Write D Data
movq    [edx+8],mm1

For SSE2 the function wrapper is essentially the same, keeping in mind aligned memory (MOVDQA) versus non-aligned memory (MOVDQU). Note that integer PADDB on XMM registers requires SSE2, not the original SSE.

movdqa  xmm0,[ebx]     ; Read B Data {BF...B0}
movdqa  xmm1,[eax]     ; Read A Data {AF...A0}
paddb   xmm0,xmm1      ; {vA+vB} 128 bits {AF+BF ... A0+B0}
movdqa  [edx],xmm0     ; Write D Data

The following master substitution table shows which instruction to use for each element size when switching between addition and subtraction, with and without saturation.

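             Wraparound         Signed saturation     Unsigned saturation
             Add      Sub       Add       Sub         Add        Sub
8-bit        paddb    psubb     paddsb    psubsb      paddusb    psubusb
16-bit       paddw    psubw     paddsw    psubsw      paddusw    psubusw
32-bit       paddd    psubd     (none)    (none)      (none)     (none)
64-bit       paddq    psubq     (none)    (none)      (none)     (none)

32/64-Bit 80x86 Assembly Language Architecture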
ISBN: 1598220020
Year: 2003
Pages: 191
