9.4 SORTSTR: Sorting Strings | ItaniumR Architecture for Programmers. Understanding 64-Bit Processors and EPIC Principles

A major task to which computers are applied is the sorting of numeric or string data. In the study of computer science, one encounters numerous sorting algorithms, each with its own strengths and weaknesses. The bubble sort is among the simplest sorting algorithms, and is an excellent choice to demonstrate use of the Itanium ISA for sorting tasks.

The SORTSTR program (Figure 9-4) sorts strings alphabetically according to the sequence of ASCII character codes. It resembles the SCANTERM program (Figure 9-3) in its input and output aspects; it also illustrates some of the principles of the byte-oriented instructions introduced in Chapter 6. A companion program that sorts numeric data is presented later in this chapter.

Figure 9-4 SORTSTR: Bubble sort for strings

 // SORTSTR      Bubble sort strings // This program can read 100 or fewer strings (each <= SIZE // characters) from the standard input, sort them using // the bubble sort algorithm, and display the sorted list.          SIZE    = 20             // Max size of a string          SIZEQ   = ((SIZE+7)/8)*8 // Round up to quad words          CASES   = 100            // Array length          .global gets, puts          .data                    // Declare storage          .align  8                // Desired alignment ARRAY:   .skip   CASES*SIZEQ      // Room for 100 strings PRMT:    stringz "Enter strings (null line to end)"          .text                    // Section for code          .align  32               // Desired alignment          .global main             // These three lines          .proc   main             //  mark the mandatory main:                             //   'main' program entry          .prologue 12,r32         // Mask for rp, ar.pfs only          alloc   loc0 = ar.pfs,0,5,3,0  // ins, locals, outs          .save   rp,loc1          // Must save return address          mov     loc1 = b0;;      //  to our caller          .body first:   add     out0 = @gprel(PRMT),gp  // out0 -> prompt          mov     loc2 = gp        // Save gp          br.call.sptk.many b0 = puts  // Unformatted output          mov     gp = loc2        // Restore gp          cmp4.lt p6,p0 = ret0,r0  // If any error,    (p6)  br.cond.sptk.few stop0;; //  go to handler (null)          mov     loc3 = 0         // loc3 counts strings          add     loc4 = @gprel(ARRAY),gp;; // loc4 -> store line:    mov     out0 = loc4      // out0 -> buffer          br.call.sptk.many b0 = gets  // Unformatted input          mov     gp = loc2        // Restore gp          cmp.eq  p6,p0 = ret0,r0  // If any error,    (p6)  br.cond.sptk.few stop1;; //  go to handler (null)          ld1     r22 = [loc4],SIZEQ-1;;// Get 1st character          cmp.eq  p6,p7 = 0,r22    // Null code marks end    (p6)  br.cond.spnt.few sort;;  //  of our work here          add     loc3 = 1,loc3    // Count one line          st1     [loc4] = r0,1;;  // Ensure null at end          cmp.gt  p6,p0 = CASES-1,loc3 // If storage is ok,    (p6)  br.cond.sptk.many line   //  then look for more sort:    add     r20 = -1,loc3;;  // r20 = outer count-down o_loop:  mov     r21 = r20        // r21 = inner count-down          add     r20 = -1,r20     // Count down for o_loop          add     r22 = @gprel(ARRAY),gp;; // r22 -> 1st          add     r23 = SIZEQ,r22;;// r23 -> 2nd string i_loop:  mov     r2 = r22         // r2 -> 1st string (temp)          mov     r3 = r23         // r3 -> 2nd string (temp)          add     r21 = -1,r21;;   // Count down for i_loop look:    ld1     r24 = [r2],1     // r24 = byte from 1st          ld1     r25 = [r3],1;;   // r25 = byte from 2nd          cmp.eq  p6,p0 = 0,r24    // Null code means    (p6)  br.cond.sptk.many adjust;; //  1,2 order is ok why:     cmp.eq  p6,p0 = r24,r25  // If equal at this byte,    (p6)  br.cond.spnt.few look;;  //  go look at the next          cmp.ltu p6,p0 = r24,r25  // If 1,2 order is ok,    (p6)  br.cond.spnt.many adjust //  go on to next strings          mov     r2 = r22         // r2 -> 1st string (temp)          mov     r3 = r23;;       // r3 -> 2nd string (temp)          mov     r26 = (SIZEQ/8)-1// r26 = quad word counter swap:    ld8     r27 = [r2]       // Get a quad word          ld8     r28 = [r3]       //  from each string          add     r26 = -1,r26;;   // Count down...          st8     [r2] = r28,8     // Swap the data          st8     [r3] = r27,8     //  between the locations          cmp.gt  p6,p0 = r26,r0   //  ...until whole strings    (p6)  br.cond.spnt.few swap;;  //      have been swapped adjust:  add     r22 = SIZEQ,r22  // Advance to next 1st          add     r23 = SIZEQ,r23  // Advance to next 2nd          cmp.gt  p6,p0 = r21,r0   // Still more work    (p6)  br.cond.sptk.many i_loop;; //  for inner loop to do?          cmp.gt   p6,p0 = r20,r0  // Still more work    (p6)  br.cond.sptk.many o_loop //  for outer loop to do?          add     loc4 = @gprel(ARRAY-SIZEQ),gp;; p_loop:  add     loc4 = SIZEQ,loc4;; // loc4 -> string          mov     out0 = loc4      // out0 -> format          add     loc3 = -1,loc3   // Count down for p_loop          br.call.sptk.many b0 = puts // C output function          mov     gp = loc2        // Restore gp          cmp4.lt p6,p0 = ret0,r0  // If any error,    (p6)  br.cond.sptk.few stop0;; //  go to handler (null)          cmp.gt  p6,p0 = loc3,r0  // Still more work    (p6)  br.cond.sptk.few p_loop  //  for print loop to do? stop0:                            // Output error stop1:                            // EOF or input error done:    mov     ret0 = 0         // Signal all is normal          mov     b0 = loc1        // Restore return address          mov     ar.pfs = loc0    // Restore caller's ar.pfs          br.ret.sptk.many b0;;    // Back to command line          .endp   main             // Mark end of procedure

Each string is allotted SIZE bytes, rounded up to SIZEQ, a multiple of 8 bytes. As the gets function does not ensure that each string has a null at the end, we always force a null byte in the last character position of every element in ARRAY.

A maximum of five local variables is needed to preserve entry context, and also to manage the input loop extending from line to sort. Program termination is determined by testing the first character of each newly entered string against the NUL character (ASCII 0x0). Rather than warn when ARRAY is filled, the program simply proceeds to the sorting phase with the maximum number of CASES (strings). Register loc3 retains the number of non-null strings entered.

The section starting at sort consists of outer and inner loops required for an implementation of the bubble sort algorithm. The outer loop starts at o_loop and is controlled by register r20. The inner loop starts at i_loop and is controlled by register r21. Registers r22 and r23 point to the active string elements in ARRAY, and registers r2 and r3 are temporary pointers initialized relative to r22 and r23, respectively. These loops surround two other separate loops, one to make comparisons and the other to perform interchanges of strings as they are sorted.

The loop starting at look inspects two adjacently stored strings, byte by byte, to determine whether they are already in proper order or must be exchanged. Bytes are loaded into registers r24 and r25 for inspection and comparison. If the first string is shorter (i.e., the null byte is detected), it should normally precede the other string. If the character from the first string comes earlier in the ASCII collating sequence than the character from the second, as determined by an unsigned comparison, the two strings are already in the proper order and program execution moves to the next pair of strings by advancing pointers at adjust.

The loop starting at swap exchanges two strings eight bytes at a time. This 64-bit technique, which requires quad word alignment and storage allotment of quad word multiples, is significantly faster than moving data by the byte.

In classic textbook explanations of the bubble sort algorithm, a theoretical temporary storage element, equivalent in capacity to the sorted entities, is used to demonstrate the operation:

 TEMP := A A := B B := TEMP

It would be a waste of effort to move entire strings twice. As we have pointers to each element (registers r2, r3), and the quad word values (registers r27, r28), we simply swap the address pointers. While this involves two temporary storage units (registers r27, r28), rather than one, these units are of a size that can be manipulated in a single cycle, whereas a string of undetermined length would require many more cycles to move.

Control of the various loop structures in this program has been designed with the fewest branch instructions and the smoothest possible logical flow. At adjust, the pointers are advanced by the quad word multiple SIZEQ allotted to each string. For the loop that prints the results, the storage pointer loc4 into ARRAY is preindexed by -SIZEQ in order to handle the first and subsequent traversals smoothly.

Strings begin at regularly spaced addresses and the null at the end is preserved. After the sorting phase, the simple loop starting at p_loop prints each string in order using the printf function:

 Sample input:  two                Two                TWO                three                ten                thirteen                twenty                one hundred Sample output: TWO                Two                one hundred                ten                thirteen                three                twenty                two

Note that this ASCII-oriented sort is not the same as the case-insensitive sorts and merges used by dictionaries and book indices.