A major task to which computers are applied is the sorting of numeric or string data. In the study of computer science, one encounters numerous sorting algorithms, each with its own strengths and weaknesses. The bubble sort is among the simplest sorting algorithms, and is an excellent choice to demonstrate use of the Itanium ISA for sorting tasks. The SORTSTR program (Figure 9-4) sorts strings alphabetically according to the sequence of ASCII character codes. It resembles the SCANTERM program (Figure 9-3) in its input and output aspects; it also illustrates some of the principles of the byte-oriented instructions introduced in Chapter 6. A companion program that sorts numeric data is presented later in this chapter. Figure 9-4 SORTSTR: Bubble sort for strings// SORTSTR Bubble sort strings // This program can read 100 or fewer strings (each <= SIZE // characters) from the standard input, sort them using // the bubble sort algorithm, and display the sorted list. SIZE = 20 // Max size of a string SIZEQ = ((SIZE+7)/8)*8 // Round up to quad words CASES = 100 // Array length .global gets, puts .data // Declare storage .align 8 // Desired alignment ARRAY: .skip CASES*SIZEQ // Room for 100 strings PRMT: stringz "Enter strings (null line to end)" .text // Section for code .align 32 // Desired alignment .global main // These three lines .proc main // mark the mandatory main: // 'main' program entry .prologue 12,r32 // Mask for rp, ar.pfs only alloc loc0 = ar.pfs,0,5,3,0 // ins, locals, outs .save rp,loc1 // Must save return address mov loc1 = b0;; // to our caller .body first: add out0 = @gprel(PRMT),gp // out0 -> prompt mov loc2 = gp // Save gp br.call.sptk.many b0 = puts // Unformatted output mov gp = loc2 // Restore gp cmp4.lt p6,p0 = ret0,r0 // If any error, (p6) br.cond.sptk.few stop0;; // go to handler (null) mov loc3 = 0 // loc3 counts strings add loc4 = @gprel(ARRAY),gp;; // loc4 -> store line: mov out0 = loc4 // out0 -> buffer br.call.sptk.many b0 = gets // Unformatted input mov gp = loc2 // Restore gp cmp.eq p6,p0 = ret0,r0 // If any error, (p6) br.cond.sptk.few stop1;; // go to handler (null) ld1 r22 = [loc4],SIZEQ-1;;// Get 1st character cmp.eq p6,p7 = 0,r22 // Null code marks end (p6) br.cond.spnt.few sort;; // of our work here add loc3 = 1,loc3 // Count one line st1 [loc4] = r0,1;; // Ensure null at end cmp.gt p6,p0 = CASES-1,loc3 // If storage is ok, (p6) br.cond.sptk.many line // then look for more sort: add r20 = -1,loc3;; // r20 = outer count-down o_loop: mov r21 = r20 // r21 = inner count-down add r20 = -1,r20 // Count down for o_loop add r22 = @gprel(ARRAY),gp;; // r22 -> 1st add r23 = SIZEQ,r22;;// r23 -> 2nd string i_loop: mov r2 = r22 // r2 -> 1st string (temp) mov r3 = r23 // r3 -> 2nd string (temp) add r21 = -1,r21;; // Count down for i_loop look: ld1 r24 = [r2],1 // r24 = byte from 1st ld1 r25 = [r3],1;; // r25 = byte from 2nd cmp.eq p6,p0 = 0,r24 // Null code means (p6) br.cond.sptk.many adjust;; // 1,2 order is ok why: cmp.eq p6,p0 = r24,r25 // If equal at this byte, (p6) br.cond.spnt.few look;; // go look at the next cmp.ltu p6,p0 = r24,r25 // If 1,2 order is ok, (p6) br.cond.spnt.many adjust // go on to next strings mov r2 = r22 // r2 -> 1st string (temp) mov r3 = r23;; // r3 -> 2nd string (temp) mov r26 = (SIZEQ/8)-1// r26 = quad word counter swap: ld8 r27 = [r2] // Get a quad word ld8 r28 = [r3] // from each string add r26 = -1,r26;; // Count down... st8 [r2] = r28,8 // Swap the data st8 [r3] = r27,8 // between the locations cmp.gt p6,p0 = r26,r0 // ...until whole strings (p6) br.cond.spnt.few swap;; // have been swapped adjust: add r22 = SIZEQ,r22 // Advance to next 1st add r23 = SIZEQ,r23 // Advance to next 2nd cmp.gt p6,p0 = r21,r0 // Still more work (p6) br.cond.sptk.many i_loop;; // for inner loop to do? cmp.gt p6,p0 = r20,r0 // Still more work (p6) br.cond.sptk.many o_loop // for outer loop to do? add loc4 = @gprel(ARRAY-SIZEQ),gp;; p_loop: add loc4 = SIZEQ,loc4;; // loc4 -> string mov out0 = loc4 // out0 -> format add loc3 = -1,loc3 // Count down for p_loop br.call.sptk.many b0 = puts // C output function mov gp = loc2 // Restore gp cmp4.lt p6,p0 = ret0,r0 // If any error, (p6) br.cond.sptk.few stop0;; // go to handler (null) cmp.gt p6,p0 = loc3,r0 // Still more work (p6) br.cond.sptk.few p_loop // for print loop to do? stop0: // Output error stop1: // EOF or input error done: mov ret0 = 0 // Signal all is normal mov b0 = loc1 // Restore return address mov ar.pfs = loc0 // Restore caller's ar.pfs br.ret.sptk.many b0;; // Back to command line .endp main // Mark end of procedure Each string is allotted SIZE bytes, rounded up to SIZEQ, a multiple of 8 bytes. As the gets function does not ensure that each string has a null at the end, we always force a null byte in the last character position of every element in ARRAY. A maximum of five local variables is needed to preserve entry context, and also to manage the input loop extending from line to sort. Program termination is determined by testing the first character of each newly entered string against the NUL character (ASCII 0x0). Rather than warn when ARRAY is filled, the program simply proceeds to the sorting phase with the maximum number of CASES (strings). Register loc3 retains the number of non-null strings entered. The section starting at sort consists of outer and inner loops required for an implementation of the bubble sort algorithm. The outer loop starts at o_loop and is controlled by register r20. The inner loop starts at i_loop and is controlled by register r21. Registers r22 and r23 point to the active string elements in ARRAY, and registers r2 and r3 are temporary pointers initialized relative to r22 and r23, respectively. These loops surround two other separate loops, one to make comparisons and the other to perform interchanges of strings as they are sorted. The loop starting at look inspects two adjacently stored strings, byte by byte, to determine whether they are already in proper order or must be exchanged. Bytes are loaded into registers r24 and r25 for inspection and comparison. If the first string is shorter (i.e., the null byte is detected), it should normally precede the other string. If the character from the first string comes earlier in the ASCII collating sequence than the character from the second, as determined by an unsigned comparison, the two strings are already in the proper order and program execution moves to the next pair of strings by advancing pointers at adjust. The loop starting at swap exchanges two strings eight bytes at a time. This 64-bit technique, which requires quad word alignment and storage allotment of quad word multiples, is significantly faster than moving data by the byte. In classic textbook explanations of the bubble sort algorithm, a theoretical temporary storage element, equivalent in capacity to the sorted entities, is used to demonstrate the operation: TEMP := A A := B B := TEMP It would be a waste of effort to move entire strings twice. As we have pointers to each element (registers r2, r3), and the quad word values (registers r27, r28), we simply swap the address pointers. While this involves two temporary storage units (registers r27, r28), rather than one, these units are of a size that can be manipulated in a single cycle, whereas a string of undetermined length would require many more cycles to move. Control of the various loop structures in this program has been designed with the fewest branch instructions and the smoothest possible logical flow. At adjust, the pointers are advanced by the quad word multiple SIZEQ allotted to each string. For the loop that prints the results, the storage pointer loc4 into ARRAY is preindexed by -SIZEQ in order to handle the first and subsequent traversals smoothly. Strings begin at regularly spaced addresses and the null at the end is preserved. After the sorting phase, the simple loop starting at p_loop prints each string in order using the printf function: Sample input: two Two TWO three ten thirteen twenty one hundred Sample output: TWO Two one hundred ten thirteen three twenty two Note that this ASCII-oriented sort is not the same as the case-insensitive sorts and merges used by dictionaries and book indices. |