The scalar multiplication of vectors is also a relatively simple matter for vector math instructions to handle, just like the scalar addition and subtraction of vectors. The trick is to replicate the scalar so it appears like a second vector.
This function multiplies each element of a vector by a scalar. A scalar has multiple uses, but the primary one is "scaling" a vector. A scalar of one leaves the vector the same length, a scalar of two doubles its length, and so on.
    void vmp_VecScale(vmp3DVector * const pvD,
                      const vmp3DVector * const pvA, float fScalar)
    {
        pvD->x = pvA->x * fScalar;
        pvD->y = pvA->y * fScalar;
        pvD->z = pvA->z * fScalar;
    }

    void vmp_QVecScale(vmp3DQVector * const pvD,
                       const vmp3DQVector * const pvA, float fScalar)
    {
        pvD->x = pvA->x * fScalar;
        pvD->y = pvA->y * fScalar;
        pvD->z = pvA->z * fScalar;
        pvD->w = pvA->w * fScalar;
    }
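As a quick check of the scaling behavior described above, the following minimal sketch scales a vector by one and by two and compares squared lengths. The three-float layout of vmp3DVector is an assumption (the structure is not shown in this section), and lengthSquared and scaleDemo are hypothetical helpers added only for illustration.

```c
#include <assert.h>

/* Assumed layout: three packed floats (not shown in this section). */
typedef struct { float x, y, z; } vmp3DVector;

/* Same logic as vmp_VecScale in the text: D = A * s, element by element. */
void vmp_VecScale(vmp3DVector * const pvD,
                  const vmp3DVector * const pvA, float fScalar)
{
    pvD->x = pvA->x * fScalar;
    pvD->y = pvA->y * fScalar;
    pvD->z = pvA->z * fScalar;
}

/* Hypothetical helper: squared length, which avoids a square root. */
float lengthSquared(const vmp3DVector *pv)
{
    return pv->x * pv->x + pv->y * pv->y + pv->z * pv->z;
}

/* Hypothetical demo of the "scaling" claim. */
void scaleDemo(void)
{
    const vmp3DVector a = { 3.0f, 4.0f, 12.0f };   /* length 13 */
    vmp3DVector same, twice;

    vmp_VecScale(&same, &a, 1.0f);    /* scalar of one: same length   */
    vmp_VecScale(&twice, &a, 2.0f);   /* scalar of two: double length */

    assert(lengthSquared(&same) == 169.0f);    /* 13 * 13 */
    assert(lengthSquared(&twice) == 676.0f);   /* 26 * 26 */
}
```

Comparing squared lengths keeps the arithmetic exact for these inputs; doubling the length shows up as a fourfold increase in the squared length.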
    mov eax,vA    ; Vector A
    mov edx,vD    ; Vector destination
The 32-bit scalar is unpacked into a pair and then handled in much the same way as the multiplication of two vectors.
    movd      mm0,fScalar                      ; fScalar {0 s}
    punpckldq mm0,mm0                          ; {s s}
    movq      mm1,[eax]                        ; vA.xy {Ay Ax}
    movd      mm2,(vmp3DVector PTR [eax]).z    ; {0 Az}
    pfmul     mm1,mm0                          ; {Ays Axs}
    pfmul     mm2,mm0                          ; {0s Azs}
    movq      [edx],mm1                        ; {Ays Axs}
    movd      (vmp3DVector PTR [edx]).z,mm2    ; { Azs}
The SSE version of the code is changed from a 64-bit load to a 128-bit load, but the principles remain the same.
    pxor   xmm1,xmm1                 ; {0 0 0 0}
    movss  xmm1,fScalar              ; {0 0 0 s}
    movaps xmm2,[edx]                ; {Dw # # #}
    movaps xmm0,[eax]                ; vA.xyz# {# Az Ay Ax}
    shufps xmm1,xmm1,11000000b       ; 3 0 0 0 {0 s s s}
    andps  xmm2,OWORD PTR himsk32    ; {Dw 0 0 0}
    mulps  xmm0,xmm1                 ; {# Azs Ays Axs}
    andps  xmm0,OWORD PTR lomsk96    ; {0 Azs Ays Axs}
    orps   xmm0,xmm2                 ; {Dw Azs Ays Axs}
    movaps [edx],xmm0                ; {Dw Azs Ays Axs}

    movss  xmm1,fScalar              ; {0 0 0 s}
    movaps xmm0,[eax]                ; vA.xyzw {Aw Az Ay Ax}
    shufps xmm1,xmm1,00000000b       ; 0 0 0 0 {s s s s}
    mulps  xmm0,xmm1                 ; {Aws Azs Ays Axs}
    movaps [edx],xmm0                ; {Aws Azs Ays Axs}
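The replicate-then-multiply idea behind the packed code can be sketched in portable C. This is only an illustration of the data flow, not a replacement for the assembly: the four-float layout of vmp3DQVector is assumed, and quadScaleSplat and splatDemo are hypothetical names.

```c
#include <assert.h>

/* Assumed layout for the quad vector: four packed floats. */
typedef struct { float x, y, z, w; } vmp3DQVector;

/* Hypothetical helper: scale a quad vector the way the packed code does. */
void quadScaleSplat(vmp3DQVector *pvD, const vmp3DQVector *pvA, float fScalar)
{
    const float a[4] = { pvA->x, pvA->y, pvA->z, pvA->w };
    float s[4], d[4];
    int i;

    for (i = 0; i < 4; i++)      /* replicate the scalar: {s s s s} */
        s[i] = fScalar;

    for (i = 0; i < 4; i++)      /* one multiply per lane */
        d[i] = a[i] * s[i];

    pvD->x = d[0];
    pvD->y = d[1];
    pvD->z = d[2];
    pvD->w = d[3];
}

/* Hypothetical demo: halve every element of {1 2 3 4}. */
void splatDemo(void)
{
    const vmp3DQVector a = { 1.0f, 2.0f, 3.0f, 4.0f };
    vmp3DQVector d;

    quadScaleSplat(&d, &a, 0.5f);
    assert(d.x == 0.5f && d.y == 1.0f && d.z == 1.5f && d.w == 2.0f);
}
```

Once the scalar occupies every lane, the scale is an ordinary element-by-element vector multiply, which is why the same multiply instruction serves both cases.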
What is the difference between a dot product and a cross product and what are their equations?
A dot product, also known as an inner product, of two vectors is the sum of the products of their corresponding {XYZ} elements, and thus results in a scalar. Not to oversimplify it, but this scalar is equal to 0 if the two vectors are perpendicular (θ = 90°), positive if the angle between them is acute (θ < 90°), and negative if the angle is obtuse (θ > 90°).
$\mathbf{v} = \{v_1, v_2, v_3\}$ and $\mathbf{w} = \{w_1, w_2, w_3\}$

These are vectors that produce a scalar defined by $\mathbf{v} \cdot \mathbf{w}$ when their products are combined. The dot product is represented by the following equation:

$\mathbf{v} \cdot \mathbf{w} = v_1 w_1 + v_2 w_2 + v_3 w_3$
The equation resolves to the following simplified form:
$D = A_x B_x + A_y B_y + A_z B_z$
D = Ax*Bx + Ay*By + Az*Bz;
So as we have learned, we first write it in a high-level language before writing it in assembly code.
    void vmp_DotProduct(float * const pfD,
                        const vmp3DVector * const pvA,
                        const vmp3DVector * const pvB)
    {
        *pfD = pvA->x * pvB->x
             + pvA->y * pvB->y
             + pvA->z * pvB->z;
    }
This is one of my favorite equations because it does not slice, dice, or chop, but it culls, it illuminizes, it simplifies, it cosineizes (not a real word, but you know what I mean). It is the Sledge-O-Matic!!! Well, not quite comedian Gallagher's watermelon disintegration kitchen utensil, but it does do many things and so it is just as useful.
From Figure 13-1 you will note that if the resulting scalar value is positive (+), the vectors are pointing in the same general direction. If zero (0), they are perpendicular to each other, and if negative (-), they are pointed in opposite directions.
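The three sign cases can be tried directly with the C function. This is a minimal sketch: the three-float vmp3DVector layout is assumed, and dotSignDemo is a hypothetical helper that exists only to exercise the function.

```c
#include <assert.h>

/* Assumed layout: three packed floats. */
typedef struct { float x, y, z; } vmp3DVector;

/* Same logic as vmp_DotProduct in the text. */
void vmp_DotProduct(float * const pfD,
                    const vmp3DVector * const pvA,
                    const vmp3DVector * const pvB)
{
    *pfD = pvA->x * pvB->x
         + pvA->y * pvB->y
         + pvA->z * pvB->z;
}

/* Hypothetical demo: one vector tested against three others. */
void dotSignDemo(void)
{
    const vmp3DVector a       = {  1.0f, 0.0f, 0.0f };
    const vmp3DVector acute   = {  1.0f, 1.0f, 0.0f };   /* 45 degrees from a  */
    const vmp3DVector perp    = {  0.0f, 1.0f, 0.0f };   /* 90 degrees from a  */
    const vmp3DVector obtuse  = { -1.0f, 1.0f, 0.0f };   /* 135 degrees from a */
    float d;

    vmp_DotProduct(&d, &a, &acute);
    assert(d > 0.0f);     /* same general direction */

    vmp_DotProduct(&d, &a, &perp);
    assert(d == 0.0f);    /* perpendicular */

    vmp_DotProduct(&d, &a, &obtuse);
    assert(d < 0.0f);     /* opposite directions */
}
```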
Before explaining further, it should be pointed out that to keep 3D graphics algorithms as simple as possible, the three vertices for each polygon should all be ordered in the same direction. For example, by using the left-hand rule and keeping all the vertices of a visible face in a clockwise direction, such as in Figure 13-2, back face culling becomes possible. If all visible face surfaces use this same orientation, then any face whose vertices occur in a counterclockwise direction is back facing, thus pointing away, and need not be drawn, saving render time.
Contrarily, if polygons are arranged in a counterclockwise orientation, then the inverse occurs where a positive value is drawn and a negative value is culled. Keep in mind, however, that most software algorithms keep things in a clockwise orientation.
By calculating the dot product of the normal vector of the polygon with a vector between one of the polygon's vertices and the camera, it can be determined if the polygon is back facing and needs to be culled. A resulting positive value indicates that the face is pointed away, hence back facing and can be culled and not rendered. A negative value indicates a face oriented toward the camera and thus visible.
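The culling test can be sketched in C as follows. The text does not state which way the vertex-to-camera vector points, so this sketch assumes it runs from the camera to the vertex, which is the direction that makes a positive result mean "facing away". The vmp3DVector layout is assumed, and isBackFacing and cullDemo are hypothetical names.

```c
#include <assert.h>

/* Assumed layout: three packed floats. */
typedef struct { float x, y, z; } vmp3DVector;

/* Hypothetical helper: returns 1 if the face can be culled, else 0. */
int isBackFacing(const vmp3DVector *pvNormal,
                 const vmp3DVector *pvVertex,
                 const vmp3DVector *pvCamera)
{
    /* Assumption: the view vector runs from the camera to the vertex. */
    const float vx = pvVertex->x - pvCamera->x;
    const float vy = pvVertex->y - pvCamera->y;
    const float vz = pvVertex->z - pvCamera->z;

    /* Dot product of the face normal with the view vector. */
    const float d = pvNormal->x * vx
                  + pvNormal->y * vy
                  + pvNormal->z * vz;

    return d > 0.0f;    /* positive: pointed away, so cull */
}

/* Hypothetical demo: a face five units in front of the camera. */
void cullDemo(void)
{
    const vmp3DVector camera = { 0.0f, 0.0f,  0.0f };
    const vmp3DVector vertex = { 0.0f, 0.0f,  5.0f };
    const vmp3DVector away   = { 0.0f, 0.0f,  1.0f };   /* normal points away   */
    const vmp3DVector toward = { 0.0f, 0.0f, -1.0f };   /* normal faces camera  */

    assert(isBackFacing(&away, &vertex, &camera) == 1);     /* culled  */
    assert(isBackFacing(&toward, &vertex, &camera) == 0);   /* visible */
}
```

A face seen exactly edge-on gives a dot product of zero; this sketch treats that case as visible, which is a design choice rather than something the text specifies.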
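Another use for the dot product is finding the cosine of the angle between two vectors. The cosine is obtained by dividing the dot product by the product of the magnitudes of the two vectors:

$\cos\theta = \dfrac{\mathbf{v} \cdot \mathbf{w}}{|\mathbf{v}|\,|\mathbf{w}|}$

Note that $\mathbf{v}$ and $\mathbf{w}$ are vectors and that $|\mathbf{v}|$ and $|\mathbf{w}|$ are their magnitudes.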
And using standard trigonometric formulas, such as:
$1 = \cos^2\theta + \sin^2\theta$
sine and other trigonometric results can be calculated.
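As a worked example of the steps just described, take two illustrative vectors (chosen here only for easy arithmetic, not taken from the text) and carry the calculation through. The sine is taken as the positive root because the angle between two vectors lies between 0° and 180°.

```latex
\begin{align*}
\mathbf{v} &= (1, 0, 0), \qquad \mathbf{w} = (1, 1, 0) \\
\mathbf{v} \cdot \mathbf{w} &= 1 \cdot 1 + 0 \cdot 1 + 0 \cdot 0 = 1 \\
|\mathbf{v}| &= \sqrt{1^2 + 0^2 + 0^2} = 1, \qquad
|\mathbf{w}| = \sqrt{1^2 + 1^2 + 0^2} = \sqrt{2} \\
\cos\theta &= \frac{\mathbf{v} \cdot \mathbf{w}}{|\mathbf{v}|\,|\mathbf{w}|}
            = \frac{1}{\sqrt{2}} \approx 0.7071 \\
\sin\theta &= \sqrt{1 - \cos^2\theta} = \sqrt{1 - \tfrac{1}{2}}
            = \frac{1}{\sqrt{2}} \approx 0.7071 \\
\theta &= 45^\circ
\end{align*}
```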
So the good stuff is yet to come!
The 3DNow! instruction set uses the 64-bit MMX registers, but 64-bit memory alignment cannot be guaranteed. In this case, it is typically better to handle memory access as individual 32-bit floats, then unpack them into 64-bit pairs, process them, and save the results individually as 32-bit values. The PFACC instruction is unique in that it allows the high and low 32 bits within each register to be summed with each other.
    mov   ebx,vB                           ; Vector B
    mov   eax,vA                           ; Vector A
    mov   edx,vD                           ; Vector destination

    movd  mm0,(vmp3DVector PTR [ebx]).z    ; {0 Bz}
    movd  mm1,(vmp3DVector PTR [eax]).z    ; {0 Az}
    movq  mm2,[ebx]                        ; {By Bx}
    movq  mm3,[eax]                        ; {Ay Ax}
    pfmul mm0,mm1                          ; {00 BzAz}
    pfmul mm2,mm3                          ; {ByAy BxAx}
    pfacc mm2,mm2                          ; {ByAy+BxAx ByAy+BxAx}
    pfadd mm0,mm2                          ; {ByAy+BxAx+0 ByAy+BxAx+BzAz}
    movd  [edx],mm0                        ; Save {ByAy+BxAx+BzAz}
The SSE version loads the 96-bit vector value using a 128-bit XMM register. The operation entails the multiplication of the {XYZ} pairs from both vectors. The data is swizzled to allow scalar additions, and then the 32-bit single-precision float scalar result is written to memory.
    movaps   xmm1,[ebx]                ; vB.xyz# {# Bz By Bx}
    movaps   xmm0,[eax]                ; vA.xyz# {# Az Ay Ax}
    mulps    xmm0,xmm1                 ; {A#B# AzBz AyBy AxBx}
    movaps   xmm1,xmm0
    movaps   xmm2,xmm0
    unpckhps xmm0,xmm0                 ; {A#B# A#B# AzBz AzBz}
    shufps   xmm1,xmm1,11100001b       ; {A#B# AzBz AxBx AyBy}
    addss    xmm2,xmm0                 ; {A#B# AzBz AyBy AzBz+AxBx}
    addss    xmm2,xmm1                 ; {A#B# AzBz AyBy AzBz+AxBx+AyBy}
    movss    [edx],xmm2                ; Save {AzBz+AxBx+AyBy}
A cross product (also known as the vector product, and called the outer product in some texts) of two vectors is a third vector perpendicular to the plane of the two original vectors. The two vectors define two sides of a polygon face, and their cross product points away from that face.
$\mathbf{v} = \{v_1, v_2, v_3\}$ and $\mathbf{w} = \{w_1, w_2, w_3\}$ are vectors in three-dimensional space $\mathbb{R}^3$. The cross product is represented by the following equation, where the standard basis vectors are $\mathbf{i} = (1,0,0)$, $\mathbf{j} = (0,1,0)$, and $\mathbf{k} = (0,0,1)$:

$\mathbf{v} \times \mathbf{w} = (v_2 w_3 - v_3 w_2)\,\mathbf{i} - (v_1 w_3 - v_3 w_1)\,\mathbf{j} + (v_1 w_2 - v_2 w_1)\,\mathbf{k}$
The equation resolves to the following simplified form:
$D_x = A_y B_z - A_z B_y$    Dx = Ay*Bz - Az*By;
$D_y = A_z B_x - A_x B_z$    Dy = Az*Bx - Ax*Bz;
$D_z = A_x B_y - A_y B_x$    Dz = Ax*By - Ay*Bx;
Note the following simple vector structure is actually 12 bytes, which will pose a data alignment problem for SIMD operations.
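The structure itself is not reproduced at this point, so the following is a minimal sketch under the assumption that it consists of three packed single-precision floats, with a four-float quad variant for comparison. The field names follow the functions in this section; vectorOffset and sizeDemo are hypothetical helpers, and the sizes assume a 4-byte float.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definitions; field names follow the functions in this section. */
typedef struct { float x, y, z; }    vmp3DVector;     /* 3 * 4 = 12 bytes */
typedef struct { float x, y, z, w; } vmp3DQVector;    /* 4 * 4 = 16 bytes */

/* Hypothetical helper: byte offset of element n in an array of 3-float vectors. */
size_t vectorOffset(size_t n)
{
    return n * sizeof(vmp3DVector);
}

/* Hypothetical demo: shows why 12-byte elements drift off 16-byte boundaries. */
void sizeDemo(void)
{
    assert(sizeof(vmp3DVector) == 12);
    assert(sizeof(vmp3DQVector) == 16);

    /* Even if the array base is 16-byte aligned, element 1 starts at byte 12. */
    assert(vectorOffset(1) % 16 != 0);

    /* Only every fourth element (offset 48) lands on a 16-byte boundary. */
    assert(vectorOffset(4) % 16 == 0);
}
```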
One method is to use individual single-precision floating-point calculations, with which you may already be familiar. With this in mind, examine the following simple C language function that implements it. Note the use of the temporary floats x and y to prevent the result for each field {x,y,z} from overwriting either source, pvA or pvB, in the case where the destination pvD is also a source.
    void vmp_CrossProduct(vmp3DVector* const pvD,
                          const vmp3DVector* pvA,
                          const vmp3DVector* pvB)
    {
        float x, y;

        x = pvA->y * pvB->z - pvA->z * pvB->y;
        y = pvA->z * pvB->x - pvA->x * pvB->z;
        pvD->z = pvA->x * pvB->y - pvA->y * pvB->x;
        pvD->x = x;
        pvD->y = y;
    }
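A short usage sketch shows both the basic result and why the temporaries matter. The three-float vmp3DVector layout is assumed, crossDemo is a hypothetical helper, and the input values are chosen only for easy arithmetic.

```c
#include <assert.h>

/* Assumed layout: three packed floats. */
typedef struct { float x, y, z; } vmp3DVector;

/* Same logic as vmp_CrossProduct in the text. */
void vmp_CrossProduct(vmp3DVector* const pvD,
                      const vmp3DVector* pvA,
                      const vmp3DVector* pvB)
{
    float x, y;

    x = pvA->y * pvB->z - pvA->z * pvB->y;
    y = pvA->z * pvB->x - pvA->x * pvB->z;
    pvD->z = pvA->x * pvB->y - pvA->y * pvB->x;
    pvD->x = x;
    pvD->y = y;
}

/* Hypothetical demo. */
void crossDemo(void)
{
    const vmp3DVector i = { 1.0f, 0.0f, 0.0f };
    const vmp3DVector j = { 0.0f, 1.0f, 0.0f };
    const vmp3DVector b = { 4.0f, 5.0f, 6.0f };
    vmp3DVector a = { 1.0f, 2.0f, 3.0f };
    vmp3DVector d;

    /* i x j = k: perpendicular to both inputs. */
    vmp_CrossProduct(&d, &i, &j);
    assert(d.x == 0.0f && d.y == 0.0f && d.z == 1.0f);

    /* Destination is also source A; x and y are held in temporaries
       until z has been computed from the original values. */
    vmp_CrossProduct(&a, &a, &b);
    assert(a.x == -3.0f && a.y == 6.0f && a.z == -3.0f);
}
```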
The 3DNow! instruction set uses the 64-bit MMX registers, but 64-bit memory alignment cannot be guaranteed. In this case it is typically better to handle memory access as individual 32-bit floats, then unpack them into 64-bit pairs, process them, and save the results individually as 32-bit values. This example is rather large, so extra blank lines separate the various logic stages, and it has been left unoptimized to make it more readable.
    mov       ebx,vB                           ; Vector B
    mov       eax,vA                           ; Vector A
    mov       edx,vD                           ; Vector destination

    movd      mm0,(vmp3DVector PTR [ebx]).x    ; vB.x {0 Bx}
    movd      mm1,(vmp3DVector PTR [ebx]).y    ; vB.y {0 By}
    movd      mm2,(vmp3DVector PTR [ebx]).z    ; vB.z {0 Bz}
    movd      mm3,(vmp3DVector PTR [eax]).x    ; vA.x {0 Ax}
    movd      mm4,(vmp3DVector PTR [eax]).y    ; vA.y {0 Ay}
    movd      mm5,(vmp3DVector PTR [eax]).z    ; vA.z {0 Az}

    pfmul     mm4,mm0                          ; vB.xy {0 AyBx}
    punpckldq mm0,mm1                          ; {By Bx}
    movd      mm1,(vmp3DVector PTR [eax]).y    ; vA.y {Ay}
    movd      mm6,(vmp3DVector PTR [ebx]).y    ; vB.y {By}
    punpckldq mm2,mm2                          ; {Bz Bz}
    punpckldq mm3,mm1                          ; {Ay Ax}
    punpckldq mm5,mm5                          ; {Az Az}

    pfmul     mm2,mm3                          ; vA.xy {BzAy BzAx}
    pfmul     mm5,mm0                          ; vB.xy {AzBy AzBx}
    pfmul     mm6,mm3                          ; vA.xy {0Ay ByAx}

    movq      mm7,mm2                          ; {BzAy BzAx}
    pfsub     mm2,mm5                          ; {BzAy-AzBy BzAx-AzBx}
    psrlq     mm2,32                           ; x@ {0 BzAy-AzBy}
    pfsub     mm5,mm7                          ; y@ {AzBy-BzAy AzBx-BzAx}
    pfsub     mm6,mm4                          ; z@ {0-0 ByAx-AyBx}

    movd      (vmp3DVector PTR [edx]).x,mm2    ; x=AyBz-AzBy
    movd      (vmp3DVector PTR [edx]).y,mm5    ; y=AzBx-AxBz
    movd      (vmp3DVector PTR [edx]).z,mm6    ; z=AxBy-AyBx
If you examine it closely you will notice the operations performed within each block and how they correlate to the generic C code that was provided.
The SSE version uses the 128-bit XMM registers, with MOVUPS used instead of MOVAPS where memory is not guaranteed to be aligned. This function has also been left unoptimized to make it more readable.
    movaps xmm1,[ebx]                ; vB.xyz# {# Bz By Bx}
    movaps xmm0,[eax]                ; vA.xyz# {# Az Ay Ax}

    ; Crop the 4th (w) field
    andps  xmm1,OWORD PTR lomsk96    ; {0 Bz By Bx}
    andps  xmm0,OWORD PTR lomsk96    ; {0 Az Ay Ax}

    movaps xmm5,xmm1
    movaps xmm6,xmm0
    shufps xmm1,xmm1,11010010b       ; 3 1 0 2 {0 By Bx Bz}
    shufps xmm0,xmm0,11001001b       ; 3 0 2 1 {0 Ax Az Ay}
    shufps xmm6,xmm6,11010010b       ; 3 1 0 2 {0 Ay Ax Az}
    shufps xmm5,xmm5,11001001b       ; 3 0 2 1 {0 Bx Bz By}

    movaps xmm2,[edx]                ; Get destination {Dw # # #}
    mulps  xmm1,xmm0
    mulps  xmm5,xmm6
    andps  xmm2,OWORD PTR himsk32    ; {Dw 0 0 0}
    subps  xmm1,xmm5                 ; { 0 z y x}
    orps   xmm1,xmm2                 ; {Dw z y x}
    movups [edx],xmm1                ; vD.wxyz {Dw z y x}
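The shuffle pattern in the SSE listing can be mirrored in plain C to make the data flow easier to follow. This is a sketch of the idea only, using float arrays in {x, y, z} order rather than XMM registers; crossByRotation and rotationDemo are hypothetical names.

```c
#include <assert.h>

/* Hypothetical helper: cross product by rotating the elements of each
   input, multiplying lane by lane, and subtracting, as the shuffles do. */
void crossByRotation(float d[3], const float a[3], const float b[3])
{
    /* Rotated copies, taken before d is written so d may alias a or b. */
    const float a_yzx[3] = { a[1], a[2], a[0] };
    const float b_zxy[3] = { b[2], b[0], b[1] };
    const float a_zxy[3] = { a[2], a[0], a[1] };
    const float b_yzx[3] = { b[1], b[2], b[0] };
    int i;

    for (i = 0; i < 3; i++)
        d[i] = a_yzx[i] * b_zxy[i]     /* {AyBz AzBx AxBy} */
             - a_zxy[i] * b_yzx[i];    /* {AzBy AxBz AyBx} */
}

/* Hypothetical demo with values chosen for easy arithmetic. */
void rotationDemo(void)
{
    const float a[3] = { 1.0f, 2.0f, 3.0f };
    const float b[3] = { 4.0f, 5.0f, 6.0f };
    float d[3];

    crossByRotation(d, a, b);
    assert(d[0] == -3.0f);    /* 2*6 - 3*5 */
    assert(d[1] ==  6.0f);    /* 3*4 - 1*6 */
    assert(d[2] == -3.0f);    /* 1*5 - 2*4 */
}
```

Arranging the operands this way means all three output elements come from one packed multiply per term and one packed subtract, which is what the two MULPS instructions and the SUBPS accomplish.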