The formatting of numeric data as strings of ASCII characters is a common and important issue. When we discussed number systems in Chapter 1, we did not dwell on detailed methods for converting numbers from one radix to another. In this section, we develop one method for formatting unsigned decimal integers that illustrates several instructions and techniques presented in this chapter.

If we start with a value such as FE₁₆, how might we systematically convert this to 254₁₀? If we divide the value by 10, we get a quotient of 25 and a remainder of 4. If we divide that first quotient by 10, we get a second quotient of 2 and a second remainder of 5. We can repeat this process until we get a quotient of 0 and a final remainder of 2. The sequence of computed remainders (4, 5, 2 in this example) comprises the digits of the base-10 result we want, but in the reverse of printed order.

To put those characters into the proper order, we could use some form of last-in first-out (LIFO) stack. We briefly touched upon stack addressing in Chapter 4, and we shall describe stacks based on Itanium memory addressing in a later chapter. A buffer region set aside in the data segment, with appropriate address pointers, will meet the needs of the current example, regardless of the endian order of storage.

The program in Figure 6-5 contains, between the labels booth and nosign, a version of Booth's algorithm for multiplication, which was developed in Sections 6.5.1 and 6.5.2. The method of loop control, using the loop count register, is that of the DOTCLOOP program (Figure 5-2). Instructions at the label first are the usual initializations. Instructions at the label again initialize the values reused by each simulated division. Likewise, instructions at the label booth are the necessary loop initializations for multiplication by the reciprocal.
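The repeated-division scheme described above can be sketched in a few lines of Python. This is only an illustrative model, not part of DECNUM: the function name to_decimal_backwards and the parameter width are hypothetical, and the sketch assumes a non-negative value that fits in 64 bits (at most 20 decimal digits). It stores each remainder in front of its predecessor in a fixed buffer, so no separate stack is needed to reverse the digits.

```python
def to_decimal_backwards(value, width=20):
    """Fill a fixed buffer from its end with the ASCII decimal digits of value.

    Assumes 0 <= value < 2**64, so that width=20 digits always suffice.
    Returns the buffer and the index of the first valid character.
    """
    buf = bytearray(width + 1)          # final byte stays 0, like a null terminator
    pos = width - 1                     # slot just before the terminator
    while True:
        value, remainder = divmod(value, 10)
        buf[pos] = 0x30 | remainder     # 0..9 becomes ASCII '0'..'9'
        pos -= 1                        # the next digit goes at a lower index
        if value == 0:                  # stop when the quotient reaches zero
            break
    return buf, pos + 1


buf, start = to_decimal_backwards(0xFE)
print(buf[start:-1].decode("ascii"))    # prints 254
```

The loop always stores at least one digit, so a value of 0 yields the string "0" rather than an empty string.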
The outer program loop runs once per remainder value in register r3, computed from the division operations on the previous quotient in register r20; it stops when the final quotient computed in register r9 is 0. The previous quotient is saved at the beginning of this loop. Algorithms of this type usually have such a requirement for passing the baton at each stage.

The inner loop controlled by the loop count register ar.lc runs 64 times. The extra addition of X that occurs at label nosign makes the operation correct, on an unsigned basis, even when the multiplier (0.8) appears to be negative. An unsigned shift (shr.u) is used for the division by 8 in order to emphasize that this algorithm operates on an unsigned basis.

The remainder is computed in a standard manner (dividend − divisor × quotient), where appropriate shladd and add instructions perform the multiplication by 10 (Section 4.2.3). The or instruction converts each isolated remainder value (in the range 0 to 9) into its corresponding ASCII representation (0x30 to 0x39). Each newly produced byte is stored in front of its predecessor (i.e., at a lower address), so the st1 instruction uses postdecrementing. The first byte is stored at the address ZERO-1. When the program reaches done, the last byte has been stored at an address that is one larger than the value in register r16.

The gdb debugger can print a null-terminated string using the /s specification if given the appropriate starting address:

    (gdb) break done
    Breakpoint 1 at 0x4000000000000700
    (gdb) run
    Starting program: /house/user/bin/decnum

    Breakpoint 1, 0x4000000000000700 in done ()
    (gdb) p/x $r16+1
    $1 = 0x600000000000097d
    (gdb) x/s $r16+1
    0x600000000000097d <BUF+13>:     "1193046"
    (gdb)

The printed string should be 1193046, the decimal equivalent of 0x123456. As a function for a larger program, DECNUM would probably return an address pointer to the first valid character in the string in memory.
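The arithmetic of one pass through the outer loop can be checked with a small Python model. It is a sketch under stated assumptions rather than a transcription of the program: the names booth_high_half, divide_by_10, and to_signed64 are hypothetical, 64-bit registers are imitated by masking, and the input is assumed to satisfy 0 <= x < 2**63. The model mimics the Booth recoding loop, the unsigned correction at nosign, the shift by 3, and the remainder and ASCII steps.

```python
MASK64 = (1 << 64) - 1
DOT8 = 0xCCCCCCCCCCCCCCCD   # 0.8 scaled by 2**64 and rounded up; its top bit is set


def to_signed64(v):
    """Reinterpret a 64-bit pattern as a two's-complement signed value."""
    return v - (1 << 64) if v & (1 << 63) else v


def booth_high_half(x):
    """High 64 bits of the unsigned product x * DOT8, assuming 0 <= x < 2**63."""
    low, high, prev = DOT8, 0, 0              # roles of r8 (R), r9 (L), r19
    for _ in range(64):                       # ar.lc = W-1 gives 64 traversals
        bit = low & 1                         # lowest bit of R
        if bit != prev:                       # act only where adjacent bits differ
            if bit == 0:
                high = (high + x) & MASK64    # current 0, previous 1: add X
            else:
                high = (high - x) & MASK64    # current 1, previous 0: subtract X
        prev = bit
        low = ((high << 63) | (low >> 1)) & MASK64   # shrp: low bit of L enters R
        high = (to_signed64(high) >> 1) & MASK64     # shr: arithmetic shift of L
    return (high + x) & MASK64                # nosign: correction for "negative" 0.8


def divide_by_10(x):
    """Quotient, remainder, and ASCII code for one simulated division by 10."""
    quotient = booth_high_half(x) >> 3        # (0.8 * x) / 8 = x / 10
    times5 = (quotient << 2) + quotient       # shladd: 4*q + q
    remainder = x - (times5 + times5)         # dividend - 10 * quotient
    return quotient, remainder, 0x30 | remainder


print(divide_by_10(0xFE))                     # prints (25, 4, 52)
```

For the value FE₁₆ this gives a quotient of 25, a remainder of 4, and the character code 0x34 (decimal 52), matching the first step of the worked example. The correction in booth_high_half is applied unconditionally only because the top bit of DOT8 is set; a multiplier with a clear top bit would not need it.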
Figure 6-5 DECNUM: Multiplying by a reciprocal to perform integer division

// DECNUM         Convert integer to ASCII

// This program converts a positive number from X1
// into a string of ASCII-encoded decimal digits
// starting somewhere in buffer BUF, ending with ZERO.

        W       = 64                    // W = number width
        DOT8    = 0xcccccccccccccccd    // 0.8 (approx.)

        .data                           // Declare storage
        .align  8                       // Desired alignment
X1:     data8   0x123456                // Number to convert
BUF:    .skip   20*1                    // Space for 20 digits
ZERO:   data1   0                       // Null termination

        .text                           // Section for code
        .align  32                      // Desired alignment
        .global main                    // These three lines
        .proc   main                    //  mark the mandatory
main:                                   //  'main' program entry
        .prologue                       // Leaf procedure can save
        .save   ar.lc, r31              //  the caller's ar.lc
        mov     r31 = ar.lc             //  in a scratch register
        .body                           // Now we really begin...
first:  movl    r21 = DOT8              // 0.8 reciprocal factor
new:    add     r15 = @gprel(X1),r1;;   // r15 -> input
        ld8     r9 = [r15]              // Get number to convert
        movl    r16 = ZERO-1;;          // r16 -> output (backwards)
again:  mov     r20 = r9                // Save previous quotient
        mov     ar.lc = W-1             // Traversals minus one
booth:  mov     r19 = 0                 // Set bit n-1 to zero
        mov     r8 = r21                // Get reciprocal factor (R)
        mov     r9 = 0;;                // Set L to zero
cycle:  and     r22 = 0x1,r8;;          // Isolate lowest bit of R
        xor     r23 = r19,r22;;         // r23 = whether to act
        cmp.ne  p6,p0 = 0,r23           // p6 = whether to act
        mov     r19 = r22;;             // Bit n-1 for next time
 (p6)   cmp.eq.unc p7,p8 = 0,r22;;      // Add, subtract, nop?
 (p7)   add     r9 = r9,r20             // Add X to L
 (p8)   sub     r9 = r9,r20;;           // Subtract X from L
        shrp    r8 = r9,r8,1            // New R of shifted LR
        shr     r9 = r9,1               // New L of shifted LR
        br.cloop.sptk.few cycle;;       // More cycles?
nosign: add     r9 = r9,r20;;           // Add X to L
        shr.u   r9 = r9,3;;             // r9 = quotient = r20/10
        shladd  r3 = r9,2,r9;;          // r3 = 5*quotient
        add     r3 = r3,r3;;            // r3 = 10*quotient now
        sub     r3 = r20,r3;;           // r3 = remainder now
        or      r3 = 0x30,r3;;          // Make into ASCII char
        st1     [r16] = r3,-1           // Store the character
        cmp.ne  p6,p0 = r9,r0           // If quotient not zero,
 (p6)   br.cond.sptk.few again          //  repeat the cycle
done:   mov     r8 = 0                  // Signal all is normal
        mov     ar.lc = r31             // Restore caller's ar.lc
        br.ret.sptk.many b0;;           // Back to command line
        .endp   main                    // Mark end of procedure

The label new seems superfluous, but it has been carefully positioned just above the start of the program loop. Future examples will follow this design: truly global initializations (e.g., loading the fraction 0.8 into register r21 in DECNUM) are set apart from functional (re)initializations, such as preparing to obtain the next number to be processed in DECNUM at label new. Careful organization is required for any program that has a main, or outer, loop. Each iteration can prompt for case-specific input, process that case, produce output, and return to the prompt until an exit condition occurs.

You may now have an appreciation for the generous number of Itanium processor registers. In DECNUM we have used more registers in one simple program than were available in many older architectures, such as IA-32. Finally, note that careful register assignments were made for consistency with variations of this program later in the book.