Vector Addition and Subtraction (Fixed Point) | 32/64-Bit 80x86 Assembly Language Architecture

For most of the number crunching in your games or tools, you will most likely use single-precision floating-point. For artificial intelligence (AI) and other high-precision calculations, you may wish to use double-precision instead, but on the FPU it exists only in scalar form; packed double-precision requires SSE2 or above. Without SSE2, vector functionality must therefore be emulated sequentially. But even with the higher precision, there is still a bit of an accuracy problem.

An alternative would be to use integer calculations in a fixed-point format of zero or more fractional places. If the data size is large enough to contain the number, then there is no precision loss!
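As a rough illustration of that claim, the following C sketch sums one tenth repeatedly, once in single-precision floating-point and once as an integer count of thousandths. The scale factor of 1000 and the function names are assumptions made for this example; they do not come from the original text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scale: values are stored as whole thousandths. */
#define FIXED_SCALE 1000

/* Sum 0.1 n times in single-precision floating-point. */
float sum_tenths_float(int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += 0.1f;               /* 0.1 has no exact binary representation */
    return sum;
}

/* Sum 0.1 n times as an integer count of thousandths. */
int32_t sum_tenths_fixed(int n)
{
    /* The result must fit in the data size, otherwise the claim no longer holds. */
    assert(n >= 0 && n <= INT32_MAX / (FIXED_SCALE / 10));

    int32_t sum = 0;
    for (int i = 0; i < n; i++)
        sum += FIXED_SCALE / 10;   /* exactly 100 thousandths */
    return sum;
}
```

With n = 1000 the fixed-point total is exactly 100000 thousandths, that is, 100.000. The floating-point total typically drifts slightly away from 100 because each addition rounds, although the exact value depends on the compiler and target.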

Pseudo Vec

These can get pretty verbose: fixed-point (integer) addition supports 8-, 16-, and 32-bit data elements within a 128-bit vector, signed and unsigned, with and without saturation. The interesting thing about adding signed and unsigned numbers, other than the carry or borrow, is that the resulting value will be exactly the same, and thus the same equation can be used. This can be viewed in the following 8-bit example:

	Unsigned	Hex	Signed
First operand	95	05Fh	95
Second operand	+ 240	+ 0F0h	+ (-16)
Full sum	335	14Fh	79
Carry	C=1	C=1	C=0
8-bit result	79	04Fh	79

Notice that the resulting bits from the 8-bit calculation are all the same. Only the carry is different and the resulting bits are only interpreted as being signed or unsigned.
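The 8-bit example above can be checked with a few lines of C. This is only a sketch: the helper names are hypothetical, and the carry is computed by hand because C does not expose the processor's carry flag.

```c
#include <assert.h>
#include <stdint.h>

/* Add two bytes modulo 256; the carry out of bit 7 is simply dropped. */
uint8_t add8_wrap(uint8_t a, uint8_t b)
{
    return (uint8_t)(a + b);
}

/* Carry out of bit 7 under the unsigned interpretation. */
int add8_carry(uint8_t a, uint8_t b)
{
    return ((unsigned)a + (unsigned)b) > 0xFFu;
}

/* Reinterpret the same eight bits as a two's-complement value. */
int as_signed8(uint8_t bits)
{
    return bits < 0x80 ? (int)bits : (int)bits - 256;
}

/* Reproduces the 8-bit example: 05Fh + 0F0h. */
void check_table_example(void)
{
    uint8_t bits = add8_wrap(0x5F, 0xF0);

    assert(bits == 0x4F);                   /* same result bits either way   */
    assert(add8_carry(0x5F, 0xF0) == 1);    /* unsigned: 95 + 240 carries    */
    assert(as_signed8(0xF0) == -16);        /* signed view of second operand */
    assert(95 + as_signed8(0xF0) == as_signed8(bits));  /* 95 + (-16) = 79   */
}
```

Because one addition serves both interpretations, a single non-saturating instruction per element size is enough; only the saturating forms need separate signed and unsigned variants.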

Pseudo Vec (x86)

Now let's examine these functions more closely. MMX and SSE2 have the biggest payoff, as 3DNow! and SSE are primarily for floating-point support.

 mov    ebx,pbB    ; Vector B
 mov    eax,pbA    ; Vector A
 mov    edx,pbD    ; Vector Destination

The following is a 16×8-bit addition, but substituting a PSUBB for the PADDB will transform it into a subtraction.

vmp_paddB (MMX) 16×8-Bit

Listing 7-1: \chap07\pas\PAddX86M.asm

 movq    mm0,[ebx+0]    ; Read B Data {B7...B0}
 movq    mm1,[ebx+8]    ;             {BF...B8}
 movq    mm2,[eax+0]    ; Read A Data {A7...A0}
 movq    mm3,[eax+8]    ;             {AF...A8}
 paddb   mm0,mm2        ; lower 64 bits {A7+B7 ... A0+B0}
 paddb   mm1,mm3        ; upper 64 bits {AF+BF ... A8+B8}
 movq    [edx+0],mm0
 movq    [edx+8],mm1

For SSE2, it is essentially the same function wrapper; keep in mind that MOVDQA requires 16-byte-aligned memory, whereas MOVDQU also handles unaligned memory.

vmp_paddB (SSE2) 16×8-Bit

Listing 7-2: \chap07\pas\PAddX86M.asm

 movdqa  xmm0,[ebx]     ; Read B Data {BF...B0}
 movdqa  xmm1,[eax]     ; Read A Data {AF...A0}
 paddb   xmm0,xmm1      ; {vA+vB} 128 bits {AF+BF ... A0+B0}
 movdqa  [edx],xmm0     ; Write D Data
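For comparison, here is a minimal C sketch of the same 16×8-bit addition. The function name paddb16 is hypothetical and is not the book's own C code. It assumes a compiler that defines `__SSE2__` and supplies `<emmintrin.h>` when SSE2 is enabled; otherwise it falls back to a scalar loop that produces the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics */
#endif

/* Hypothetical counterpart of the listing:
   pbD[i] = pbA[i] + pbB[i] for 16 bytes, wrapping modulo 256. */
void paddb16(uint8_t *pbD, const uint8_t *pbA, const uint8_t *pbB)
{
    assert(pbD && pbA && pbB);

#if defined(__SSE2__)
    /* Unaligned loads and store (MOVDQU). The aligned forms,
       _mm_load_si128 and _mm_store_si128, need 16-byte alignment. */
    __m128i a = _mm_loadu_si128((const __m128i *)pbA);
    __m128i b = _mm_loadu_si128((const __m128i *)pbB);
    _mm_storeu_si128((__m128i *)pbD, _mm_add_epi8(a, b));   /* PADDB */
#else
    /* Scalar emulation: one byte at a time, carry discarded. */
    for (size_t i = 0; i < 16; i++)
        pbD[i] = (uint8_t)(pbA[i] + pbB[i]);
#endif
}
```

Swapping `_mm_add_epi8` for `_mm_sub_epi8` (and `+` for `-` in the fallback) gives the PSUBB variant, mirroring the instruction substitution described for the assembly version.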

The following is a master substitution table for change of functionality, addition versus subtraction (inclusive/exclusive of saturation).

	Wraparound		Signed saturation		Unsigned saturation	
	Add	Sub	Add	Sub	Add	Sub
8-bit	paddb	psubb	paddsb	psubsb	paddusb	psubusb
16-bit	paddw	psubw	paddsw	psubsw	paddusw	psubusw
32-bit	paddd	psubd
64-bit	paddq	psubq
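To make the saturating variants concrete, the following C sketch emulates what one 8-bit element does under PADDUSB, PSUBUSB, PADDSB, and PSUBSB. The function names are hypothetical; the clamping limits are those of the 8-bit unsigned and signed ranges.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned saturating add (per-byte behaviour of PADDUSB): clamp at 255. */
uint8_t addus8(uint8_t a, uint8_t b)
{
    unsigned sum = (unsigned)a + (unsigned)b;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Unsigned saturating subtract (PSUBUSB): clamp at 0. */
uint8_t subus8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Clamp an int to the signed 8-bit range [-128, 127]. */
static int8_t clamp_s8(int v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}

/* Signed saturating add (PADDSB) and subtract (PSUBSB). */
int8_t adds8(int8_t a, int8_t b) { return clamp_s8((int)a + (int)b); }
int8_t subs8(int8_t a, int8_t b) { return clamp_s8((int)a - (int)b); }

/* A few spot checks, including the operands from the 8-bit example. */
void check_saturation(void)
{
    assert(addus8(95, 240) == 255);    /* wraparound would give 79 */
    assert(subus8(16, 95) == 0);       /* no borrow into "negative" */
    assert(adds8(100, 100) == 127);    /* positive overflow clamps  */
    assert(adds8(-100, -100) == -128); /* negative overflow clamps  */
    assert(adds8(95, -16) == 79);      /* in range: unchanged       */
}
```

Unlike the wraparound forms, signed and unsigned saturation clamp at different limits, which is why the table lists them as separate instructions.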