Vector Addition and Subtraction (Fixed Point)

For most of the number crunching in your games or tools, you will most likely use single-precision floating-point. For artificial intelligence (AI) and other high-precision calculations, you may wish to use double-precision instead. On the FPU, however, double-precision exists only in scalar form. Unless SSE2 or later is available, packed double-precision operations must be emulated sequentially. Even with the higher precision, floating-point still has an accuracy problem: most decimal fractions cannot be represented exactly in binary, so rounding errors accumulate.

An alternative is to use integer calculations in a fixed-point format with zero or more fractional places. As long as the data size is large enough to hold the numbers involved, addition and subtraction lose no precision at all!
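The following minimal C sketch makes the contrast concrete. It assumes a hypothetical decimal fixed-point type, fix3_t, which stores each value as an int32_t scaled by 1000 (three decimal places). The names fix3_t, FIX_SCALE, fix3_add, fix3_sub, and the sum_tenths_* helpers are illustrative and do not come from the book. Summing 0.1 ten times in double precision does not land exactly on 1.0, while the scaled-integer sum does.

```c
#include <stdint.h>

/* Hypothetical decimal fixed-point format: the stored int32_t is the real
   value multiplied by FIX_SCALE, giving three decimal places. */
#define FIX_SCALE 1000
typedef int32_t fix3_t;

/* Adding or subtracting two values in the same format is a plain integer
   operation; nothing is rounded as long as the result fits in 32 bits. */
static fix3_t fix3_add(fix3_t a, fix3_t b) { return a + b; }
static fix3_t fix3_sub(fix3_t a, fix3_t b) { return a - b; }

/* Sum 0.1 ten times in double precision. 0.1 has no exact binary
   representation, so the rounding errors accumulate and the IEEE double
   result is not exactly 1.0. */
static double sum_tenths_double(void)
{
    double sum = 0.0;
    for (int i = 0; i < 10; i++)
        sum += 0.1;
    return sum;
}

/* The same sum in fixed point: 0.1 is stored exactly as FIX_SCALE / 10,
   so the total is exactly FIX_SCALE (1.000). */
static fix3_t sum_tenths_fixed(void)
{
    fix3_t sum = 0;
    for (int i = 0; i < 10; i++)
        sum = fix3_add(sum, FIX_SCALE / 10);
    return sum;
}
```

The add and subtract code is identical for a binary scale such as 2^16. What changes is which fractions can be represented exactly: under a binary scale, 0.1 cannot be.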

Pseudo Vec

These functions can get quite verbose. Fixed-point (integer) addition needs to support 8-, 16-, and 32-bit data elements within a 128-bit vector, both signed and unsigned, and both with and without saturation. The interesting thing about adding signed and unsigned numbers is that, apart from the carry or borrow, the resulting bits are exactly the same. The same operation can therefore be used for both. The following 8-bit example shows this:

  Interpretation   Operands       True sum   8-bit result   Flags
  Unsigned         95 + 240       335        79             C=1 (carry out)
  Hex              05Fh + 0F0h    14Fh       04Fh           C=1
  Signed           95 + (-16)     79         79             no signed overflow

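Notice that the resulting bits from the 8-bit calculation are identical in every case. Only two things differ: how those bits are interpreted, and what counts as overflow. As unsigned values, the carry out (C=1) signals that the true sum of 335 did not fit in 8 bits. As signed values, 95 + (-16) = 79 fits, so there is no signed overflow. The x86 overflow flag (OF) is 0, even though the processor still sets the carry flag.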
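Here is a short C check of the same 8-bit example, written as a sketch. The helpers add8 and overflow8 are hypothetical and are not book code. add8 keeps the low 8 bits and reports the carry out of bit 7. overflow8 applies the usual two's-complement rule: two operands of the same sign that produce a result of the opposite sign have overflowed. This is the condition the x86 OF flag reports.

```c
#include <stdint.h>

/* Hypothetical helper: add two bytes, return the low 8 bits, and store the
   carry out of bit 7 in *carry. The returned bits are the same whether the
   operands are read as signed or unsigned. */
static uint8_t add8(uint8_t a, uint8_t b, int *carry)
{
    unsigned wide = (unsigned)a + (unsigned)b;  /* up to 9 significant bits */
    *carry = (int)((wide >> 8) & 1u);           /* carry out of bit 7 */
    return (uint8_t)wide;                       /* keep the low 8 bits */
}

/* Hypothetical helper: signed overflow occurs when both operands have the
   same sign bit but the result's sign bit differs from it. */
static int overflow8(uint8_t a, uint8_t b, uint8_t sum)
{
    return ((~(a ^ b) & (a ^ sum)) & 0x80) != 0;
}
```

For example, add8(95, 240, &c) returns 79 with c set to 1. Reading the operand 240 as int8_t gives -16, and overflow8 reports no signed overflow for that sum.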

Pseudo Vec (x86)

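Now let's examine these functions more closely. MMX and SSE2 offer the biggest payoff here, since 3DNow! and SSE are primarily for floating-point support. Each function begins with the same wrapper, which loads the pointers to the two source vectors and the destination vector: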

 mov    ebx,pbB    ; Vector B
 mov    eax,pbA    ; Vector A
 mov    edx,pbD    ; Vector Destination

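The following is a 16×8-bit addition. Substituting PSUBB for PADDB transforms it into a subtraction. Note that PSUBB subtracts its source operand from its destination operand. With B loaded into the destination registers, as in this listing, the result would be B - A. To compute A - B, load A into MM0/MM1 and B into MM2/MM3 instead.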

vmp_paddB (MMX) 16×8-Bit

Listing 7-1: \chap07\pas\PAddX86M.asm
 movq    mm0,[ebx+0]    ; Read B Data {B7...B0}
 movq    mm1,[ebx+8]    ;             {BF...B8}
 movq    mm2,[eax+0]    ; Read A Data {A7...A0}
 movq    mm3,[eax+8]    ;             {AF...A8}

 paddb   mm0,mm2        ; lower 64 bits {A7+B7 ... A0+B0}
 paddb   mm1,mm3        ; upper 64 bits {AF+BF ... A8+B8}

 movq    [edx+0],mm0    ; Write D Data
 movq    [edx+8],mm1
                        ; (EMMS is required before any later x87 FPU code)
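As a point of reference, here is a hypothetical scalar C equivalent of Listing 7-1. The names vmp_paddB_ref and vmp_psubB_ref are illustrative. Each of the 16 byte lanes wraps modulo 256, as PADDB and PSUBB do, and no carry or borrow crosses from one lane into the next. The subtraction computes A - B.

```c
#include <stdint.h>

/* Hypothetical scalar reference for the 16x8-bit add: D = A + B per byte,
   wrapping modulo 256 like PADDB, with no carry between lanes. */
static void vmp_paddB_ref(uint8_t *pbD, const uint8_t *pbA, const uint8_t *pbB)
{
    for (int i = 0; i < 16; i++)
        pbD[i] = (uint8_t)(pbA[i] + pbB[i]);
}

/* Same loop with subtraction, D = A - B per byte, matching PSUBB. */
static void vmp_psubB_ref(uint8_t *pbD, const uint8_t *pbA, const uint8_t *pbB)
{
    for (int i = 0; i < 16; i++)
        pbD[i] = (uint8_t)(pbA[i] - pbB[i]);
}
```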

For SSE2, the function wrapper is essentially the same. Keep in mind that MOVDQA requires 16-byte-aligned memory, whereas MOVDQU accepts unaligned memory.

vmp_paddB (SSE2) 16×8-Bit

Listing 7-2: \chap07\pas\PAddX86M.asm
 movdqa  xmm0,[ebx]     ; Read B Data {BF...B0}
 movdqa  xmm1,[eax]     ; Read A Data {AF...A0}

 paddb   xmm0,xmm1      ; {vA+vB} 128 bits {AF+BF ... A0+B0}

 movdqa  [edx],xmm0     ; Write D Data
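The same operation can be sketched with the SSE2 compiler intrinsics from <emmintrin.h>. The function names vmp_paddB_sse2 and vmp_psubB_sse2 are hypothetical. This version uses the unaligned _mm_loadu_si128 and _mm_storeu_si128 (MOVDQU), so it accepts any pointers. Using _mm_load_si128 and _mm_store_si128 (MOVDQA) instead would require 16-byte alignment, as in Listing 7-2. A scalar fallback keeps the sketch compiling where SSE2 is not enabled.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical intrinsics version: D = A + B for 16 bytes (PADDB). */
static void vmp_paddB_sse2(uint8_t *pbD, const uint8_t *pbA, const uint8_t *pbB)
{
#if defined(__SSE2__)
    __m128i vA = _mm_loadu_si128((const __m128i *)pbA);  /* MOVDQU load A */
    __m128i vB = _mm_loadu_si128((const __m128i *)pbB);  /* MOVDQU load B */
    _mm_storeu_si128((__m128i *)pbD, _mm_add_epi8(vA, vB));  /* PADDB, store D */
#else
    for (int i = 0; i < 16; i++)            /* portable fallback */
        pbD[i] = (uint8_t)(pbA[i] + pbB[i]);
#endif
}

/* Hypothetical intrinsics version: D = A - B for 16 bytes (PSUBB). */
static void vmp_psubB_sse2(uint8_t *pbD, const uint8_t *pbA, const uint8_t *pbB)
{
#if defined(__SSE2__)
    __m128i vA = _mm_loadu_si128((const __m128i *)pbA);
    __m128i vB = _mm_loadu_si128((const __m128i *)pbB);
    _mm_storeu_si128((__m128i *)pbD, _mm_sub_epi8(vA, vB));  /* PSUBB */
#else
    for (int i = 0; i < 16; i++)
        pbD[i] = (uint8_t)(pbA[i] - pbB[i]);
#endif
}
```

Because the intrinsic takes its operands by value, the compiler chooses the register order, so the A - B ordering issue from the hand-written listing does not arise here.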

The following master substitution table gives the instruction for each element size and operation: addition versus subtraction, either wrapping around or with signed or unsigned saturation.

          Wraparound         Signed saturation    Unsigned saturation
          Add      Sub       Add       Sub        Add        Sub
8-bit     paddb    psubb     paddsb    psubsb     paddusb    psubusb
16-bit    paddw    psubw     paddsw    psubsw     paddusw    psubusw
32-bit    paddd    psubd     n/a       n/a        n/a        n/a
64-bit    paddq    psubq     n/a       n/a        n/a        n/a

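The saturating forms in the table clamp each lane to its element range rather than wrapping. Below is a minimal scalar model of one 8-bit lane of each form in C. The *_lane function names are hypothetical. Here a is the destination operand and b is the source, because the x86 forms compute dest + src and dest - src. The 16-bit forms follow the same pattern using the int16_t/uint16_t limits.

```c
#include <stdint.h>

/* Signed saturation (PADDSB / PSUBSB): compute exactly in int, then clamp
   to -128..127. */
static int8_t paddsb_lane(int8_t a, int8_t b)
{
    int s = a + b;
    if (s > INT8_MAX) s = INT8_MAX;
    if (s < INT8_MIN) s = INT8_MIN;
    return (int8_t)s;
}

static int8_t psubsb_lane(int8_t a, int8_t b)
{
    int s = a - b;
    if (s > INT8_MAX) s = INT8_MAX;
    if (s < INT8_MIN) s = INT8_MIN;
    return (int8_t)s;
}

/* Unsigned saturation (PADDUSB / PSUBUSB): clamp to 0..255. */
static uint8_t paddusb_lane(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > UINT8_MAX ? UINT8_MAX : s);
}

static uint8_t psubusb_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);   /* never goes below zero */
}
```

For example, paddusb_lane(95, 240) yields 255, whereas PADDB on the same bytes wraps around to 79.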

32/64-Bit 80x86 Assembly Language Architecture
ISBN: 1598220020
Year: 2003
Pages: 191
