We shall now illustrate the very common operation of referring to successive entries in a list using vector components. Three-component vectors occur frequently in physics and engineering problems. In vector algebra, the scalar product of two vectors (also called the inner product, or the dot product) is the sum of products of corresponding components:

$$\mathbf{V} \cdot \mathbf{W} = V_x W_x + V_y W_y + V_z W_z$$
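As a point of reference before turning to the assembly version, the definition can be sketched in a high-level language. This is only an illustrative sketch in Python; the function name `dot` and the sample values are arbitrary choices made here, not part of the sample program.

```python
def dot(v, w):
    """Return the sum of products of corresponding components."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same number of components")
    return sum(a * b for a, b in zip(v, w))


# Arbitrary example values: 1*4 + 2*5 + 3*6
print(dot([1, 2, 3], [4, 5, 6]))  # prints 32
```

Python integers never overflow, so this sketch hides exactly the issues of storage width and sign extension that an assembly language version must handle explicitly.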
It makes sense to store the x-, y-, and z-components of each vector in adjacent information units. We will select word-length storage for the components of two vectors, V and W, in our sample program (Figure 4-5), and the resulting scalar product, P, will be stored in a quad word.

Figure 4-5 DOTPROD: An illustration of data access instructions

    // DOTPROD       Scalar Product of 3-vectors
    // This program will compute the scalar product
    // of two three-element vectors V and W.
            .data                           // Declare storage
            .align  8                       // Desired alignment
    P:      .skip   8                       // Space for product
    V:      data2   -1,+3,+5                // Vx, Vy, Vz
    W:      data2   -2,-4,+6                // Wx, Wy, Wz
            .text                           // Section for code
            .align  32                      // Desired alignment
            .global main                    // These three lines
            .proc   main                    //  mark the mandatory
    main:                                   //  'main' program entry
            .body                           // Now we really begin...
    first:  movl    r14 = V;;               // Pointer for V
            movl    r15 = W;;               // Pointer for W
            movl    r16 = P;;               // Pointer for P
            mov     r20 = 0;;               // r20 = running sum
            ld2     r21 = [r14],2;;         // Get Vx; bump pointer
            ld2     r22 = [r15],2;;         // Get Wx; bump pointer
            pmpy2.r r21 = r21,r22;;         // Compute Vx times Wx
            sxt4    r21 = r21;;             // Extend 32 bits to 64
            add     r20 = r20,r21;;         // Update the sum
            ld2     r21 = [r14],2;;         // Get Vy; bump pointer
            ld2     r22 = [r15],2;;         // Get Wy; bump pointer
            pmpy2.r r21 = r21,r22;;         // Compute Vy times Wy
            sxt4    r21 = r21;;             // Extend 32 bits to 64
            add     r20 = r20,r21;;         // Update the sum
            ld2     r21 = [r14],2;;         // Get Vz; bump pointer
            ld2     r22 = [r15],2;;         // Get Wz; bump pointer
            pmpy2.r r21 = r21,r22;;         // Compute Vz times Wz
            sxt4    r21 = r21;;             // Extend 32 bits to 64
            add     r20 = r20,r21;;         // Update the sum
            st8     [r16] = r20;;           // Store computed product
                                            // No more components...
    done:   mov     r8 = 0;;                // Signal all is normal
            br.ret.sptk.many b0             // Back to command line
            .endp   main                    // Mark end of procedure

Some computer architectures can map data structures using fixed offsets for the component values relative to a fixed base address for each vector, i.e., (V, V+2, V+4).
The Itanium ISA, on the other hand, offers only register indirect addressing. We chose registers r14, r15, and r16 to point to V, W, and the result P, respectively.

Each component is expressed as a 2-byte word. We used ld2 instructions that also perform zero-extension in the destination register. We took advantage of postincrementing with the Itanium load and store instructions, since the x-, y-, and z-components of each vector are stored as successive words. (We did not remove the increment of 2 from the last pair of load instructions. If we were to write a more general scheme utilizing a loop to compute the dot product of two N-component vectors, it would not be convenient to isolate the last component as a special case.)

Each multiplication of two word-length components using the pmpy2.r instruction yields a product expressed as a double word in the destination register. Because that 32-bit product may be negative, we extended it to 64 bits using the sxt4 instruction so that the 64-bit addition into the running sum gives correct results.

With this background, you should have little difficulty in following the flow of the entire calculation. Using the debugger on a Linux system, we could proceed as follows:

    L> gcc -Wall -O0 -o bin/dotprod dotprod.s
    L> gdb bin/dotprod
    [messages deleted here]
    (gdb) break done
    Breakpoint 1 at 0x40000000000005e0
    (gdb) run
    Starting program: /home/user/bin/dotprod

    Breakpoint 1, 0x40000000000005e0 in done ()
    (gdb) x/g &P
    0x6000000000000770 <P>: 0x0000000000000014
    (gdb) q
    The program is running.  Exit anyway? (y or n) y
    L>

The correct answer is $(-1 \times -2) + (+3 \times -4) + (+5 \times +6) = (+2) + (-12) + (+30) = 20_{10} = 14_{16}$. Alternatively, you could monitor the contents of registers r20 and r21 as you step through the sequence of instructions to the label done. Be attentive to the two's complement arithmetic operations.
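To make the two's complement bookkeeping concrete, the register-level steps can be modeled in a few lines of Python. This is a minimal sketch, not a full emulation: the helper names (`ld2`, `pmpy2_r`, `sxt4`, `dotprod`) are hypothetical Python functions named after the instructions they imitate, the model of pmpy2.r assumes that the instruction multiplies the signed 16-bit elements in bits 15:0 and 47:32 of its operands to form two 32-bit products, and a Python loop stands in for the three repeated groups of instructions.

```python
MASK64 = (1 << 64) - 1  # registers are modeled as 64-bit unsigned patterns


def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's complement integer."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        value -= 1 << bits
    return value


def ld2(component):
    """Model of ld2: a 16-bit pattern, zero-extended to 64 bits."""
    return component & 0xFFFF


def pmpy2_r(a, b):
    """Model of pmpy2.r (assumed semantics): signed 16x16 multiplies of
    bits 15:0 and bits 47:32, giving two packed 32-bit products."""
    low = (to_signed(a, 16) * to_signed(b, 16)) & 0xFFFFFFFF
    high = (to_signed(a >> 32, 16) * to_signed(b >> 32, 16)) & 0xFFFFFFFF
    return (high << 32) | low


def sxt4(a):
    """Model of sxt4: sign-extend the low 32 bits to 64 bits."""
    return to_signed(a, 32) & MASK64


def dotprod(v, w, extend=True):
    """Accumulate the products as the sample program does with r20 and r21."""
    r20 = 0
    for vc, wc in zip(v, w):
        r21 = pmpy2_r(ld2(vc), ld2(wc))
        if extend:
            r21 = sxt4(r21)
        r20 = (r20 + r21) & MASK64  # 64-bit add, wrapping on overflow
    return r20


V = [-1, 3, 5]
W = [-2, -4, 6]
print(hex(dotprod(V, W)))                # prints 0x14
print(hex(dotprod(V, W, extend=False)))  # prints 0x100000014
```

The second call omits the sign extension: the negative product of the y-components is then added as the large positive value 0xFFFFFFF4, and the sum is wrong. Comparing the intermediate values of `r20` and `r21` in this model with the debugger's register display is one way to check your understanding as you step through the program.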
Using a label such as done, where output instructions would be inserted, works just as well in the HP-UX command-line environment:

    H> cc +DD64 -o bin/dotprod dotprod.s
    H> adb bin/dotprod
    adb> done:b
    adb> :r
    Process 9619 Thread 9728 Execed
    Breakpoint 1 set at address 0x4000980
    main + 0xc0: >      adds    r8=0,r0
                        nop.f   0
                        nop.b   0;;
    Hit Breakpoint 1 at address 0x4000980
    adb> P/jx
    P:              0x14
    adb> q
    H>

where P is the symbolic address for the quad word result in memory.

In later chapters, we shall usually demonstrate the sample programs using either the GNU tools (Linux) or the HP-UX tools, but not both, in the interest of keeping the book concise and readable.