6.6 DECNUM: Converting an Integer to Decimal Format | Itanium® Architecture for Programmers: Understanding 64-Bit Processors and EPIC Principles

The formatting of numeric data as strings of ASCII characters is a common and important issue. When we discussed number systems in Chapter 1, we did not dwell on detailed methods for converting numbers from one radix to another. In this section, we develop one method for formatting unsigned decimal integers that illustrates several instructions and techniques presented in this chapter.

If we start with a value such as FE₁₆, how might we systematically convert this to 254₁₀? If we divide the value by 10, we get a quotient of 25 and a remainder of 4. If we divide that first quotient by 10, we get a second quotient of 2 and a second remainder of 5. We can repeat this process until we get a quotient of 0 and a final remainder of 2. The sequence of computed remainders (4, 5, 2 in this example) comprises the digits of the base-10 result we want, but in the reverse of printed order.

In order to put those characters into the proper order, we could use some form of last-in first-out (LIFO) stack. We briefly touched upon stack addressing in Chapter 4, and we shall describe stacks based on Itanium memory addressing in a later chapter. A buffer region set aside in the data segment, with appropriate address pointers, will meet the needs of the current example, regardless of the endian order of storage, because each character occupies a single byte.

The program in Figure 6-5 contains, between the labels booth and nosign, a version of Booth's algorithm for multiplication, which was developed in Sections 6.5.1 and 6.5.2. The method of loop control, using the loop count register, is that of the DOTCLOOP program (Figure 5-2).

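Instructions at the labels first and new are the usual initializations: loading the reciprocal factor into register r21, fetching the number to convert, and pointing register r16 at the output buffer. Instructions at the label again initialize the values reused by each simulated division. Likewise, instructions at the label booth are the necessary loop initializations for multiplication by the reciprocal.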

The outer program loop, which begins at label again, runs once for each decimal digit. On each pass it divides the previous quotient, saved in register r20, by 10, leaving the new quotient in register r9 and the remainder in register r3; the loop stops when the quotient in r9 becomes 0. The previous quotient must be saved at the beginning of this loop because the multiplication code reuses r9 for the high-order half (L) of the product. Algorithms of this type usually have such a requirement for passing the baton at each stage.

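The inner loop controlled by the loop count register ar.lc runs 64 times, because ar.lc is set to W-1 = 63 and br.cloop falls through only when the count is already zero. Booth's algorithm treats its multiplier as a signed number, and the multiplier 0.8 (0xcccccccccccccccd) has its most significant bit set, so it appears to be negative, with a signed value 2⁶⁴ smaller than its unsigned value. The signed product is therefore too small by X × 2⁶⁴, and the extra addition of X to L at label nosign makes the operation correct, on an unsigned basis, leaving in L the high-order half of the product, which approximates 0.8 × X. Dividing that by 8 yields X/10. An unsigned shift (shr.u) is used for the division by 8 in order to emphasize that this algorithm operates on an unsigned basis.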

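The remainder is computed in the standard manner (dividend − divisor × quotient), where appropriate shladd and add instructions perform the multiplication by 10 (Section 4.2.3): shladd forms 4 × quotient + quotient = 5 × quotient, and add then doubles it. The or instruction converts each isolated remainder value (in the range 0 to 9) into its corresponding ASCII representation (0x30 to 0x39); or suffices because the values 0 to 9 occupy only the low four bits, which are all zero in 0x30.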

Each newly produced byte is stored in front of its predecessor, that is, at a lower address, so the st1 instruction uses postdecrementing. The first byte is stored at the address ZERO-1. When the program reaches done, the last byte has been stored at an address that is one larger than the value in register r16.

The gdb debugger can print a null-terminated string using the /s specification if given the appropriate starting address:

    (gdb) break done
    Breakpoint 1 at 0x4000000000000700
    (gdb) run
    Starting program: /house/user/bin/decnum
    Breakpoint 1, 0x4000000000000700 in done ()
    (gdb) p/x $r16+1
    $1 = 0x600000000000097d
    (gdb) x/s $r16+1
    0x600000000000097d <BUF+13>:     "1193046"
    (gdb)

The printed string should be 1193046, the decimal equivalent of 0x123456. As a function for a larger program, DECNUM would probably return an address pointer to the first valid character in the string in memory.

Figure 6-5 DECNUM: Multiplying by a reciprocal to perform integer division

    // DECNUM         Convert integer to ASCII
    // This program converts a positive number from X1
    // into a string of ASCII-encoded decimal digits
    // starting somewhere in buffer BUF, ending with ZERO.

            W        = 64              // W = number width
            DOT8     = 0xcccccccccccccccd  // 0.8 (approx.)

            .data                      // Declare storage
            .align   8                 // Desired alignment
    X1:     data8    0x123456          // Number to convert
    BUF:    .skip    20*1              // Space for 20 digits
    ZERO:   data1    0                 // Null termination

            .text                      // Section for code
            .align   32                // Desired alignment
            .global  main              // These three lines
            .proc    main              //  mark the mandatory
    main:                              //   'main' program entry
            .prologue                  // Leaf procedure can save
            .save    ar.lc, r31        //  the caller's ar.lc
            mov      r31 = ar.lc       //   in a scratch register
            .body                      // Now we really begin...
    first:  movl     r21 = DOT8        // 0.8 reciprocal factor
    new:    add      r15 = @gprel(X1),r1;; // r15 -> input
            ld8      r9 = [r15]        // Get number to convert
            movl     r16 = ZERO-1;;    // r16 -> output (backwards)
    again:  mov      r20 = r9          // Save previous quotient
            mov      ar.lc = W-1       // Traversals minus one
    booth:  mov      r19 = 0           // Set bit n-1 to zero
            mov      r8 = r21          // Get reciprocal factor (R)
            mov      r9 = 0;;          // Set L to zero
    cycle:  and      r22 = 0x1,r8;;    // Isolate lowest bit of R
            xor      r23 = r19,r22;;   // r23 = whether to act
            cmp.ne   p6,p0 = 0,r23     // p6 = whether to act
            mov      r19 = r22;;       // Bit n-1 for next time
      (p6)  cmp.eq.unc p7,p8 = 0,r22;; // Add, subtract, nop?
      (p7)  add      r9 = r9,r20       // Add X to L
      (p8)  sub      r9 = r9,r20;;     // Subtract X from L
            shrp     r8 = r9,r8,1      // New R of shifted LR
            shr      r9 = r9,1         // New L of shifted LR
            br.cloop.sptk.few cycle;;  // More cycles?
    nosign: add      r9 = r9,r20;;     // Add X to L
            shr.u    r9 = r9,3;;       // r9 = quotient = r20/10
            shladd   r3 = r9,2,r9;;    // r3 = 5*quotient
            add      r3 = r3,r3;;      // r3 = 10*quotient now
            sub      r3 = r20,r3;;     // r3 = remainder now
            or       r3 = 0x30,r3;;    // Make into ASCII char
            st1      [r16] = r3,-1     // Store the character
            cmp.ne   p6,p0 = r9,r0     // If quotient not zero,
      (p6)  br.cond.sptk.few again     //  repeat the cycle
    done:   mov      r8 = 0            // Signal all is normal
            mov      ar.lc = r31       // Restore caller's ar.lc
            br.ret.sptk.many b0;;      // Back to command line
            .endp    main              // Mark end of procedure

The label new seems superfluous, but it has been carefully positioned just above the start of the program loop. Future examples will follow this design; truly global initializations (e.g., loading the fraction 0.8 into register r21 in DECNUM) are set apart from functional (re)initializations, such as preparing to obtain the next number to be processed in DECNUM at label new. Careful organization is required for any program that has a main, or outer, loop. Each iteration can prompt for case-specific input, process that case, produce output, and return to the prompt until an exit condition occurs.

You may now have an appreciation for the generous number of Itanium processor registers. DECNUM uses twelve general registers besides r0, plus three predicate registers, in one simple program; that is more general registers than many older architectures provide (IA-32, for example, has only eight). Finally, note that careful register assignments were made for consistency with variations of this program later in the book.