9.3 SCANTERM: Using C Standard I/O

We presented in Chapter 6 the simple SCANTEXT program to scan through a stored string, byte by byte, looking for space characters as separators between words. That program, like many of our other illustrations, had no provision for I/O outside of the debugger. We will rework the SCANTEXT program to add input and output from the "terminal" (i.e., the keyboard and display), using external calls to standard C library functions (Table 9-1).

The new SCANTERM program in Figure 9-3 displays a prompt with puts, obtains a line of input using gets, counts the number of words and the total number of characters, and then prints a summary line containing two embedded numeric fields in decimal radix using printf.

We encourage you to study SCANTERM carefully, as it employs several useful techniques. We used symbols to facilitate setting up the data segment (e.g., IBUFL) and to identify the space character. Most of the program is composed of one big loop, starting at the label line and ending just above the label done.

Line-oriented utility programs depend on exception handling by the operating environment to stop what would otherwise be an infinite loop. This program is halted if control-C is entered at the keyboard when gets is called, as Linux and Unix consider control-C to be an error condition for gets.

Figure 9-3 SCANTERM: Showing calls to C functions for standard I/O

// SCANTERM     Demonstrate I/O for terminal
// This program does lexical analysis by finding the
// separate words on an input line. It counts words and
// characters within words. (Stop it with CTRL/C.)
        IBUFL   = 256                   // Input allowance
        SPACE   = 0x20                  // ASCII code for <SP>
        .global gets, puts, printf
        .data                           // Declare storage
        .align  8                       // Desired alignment
IBUF:   .skip   IBUFL                   // Input line
PRMT:   stringz "Enter a line of text (CTRL/C to quit)"
TELL:   stringz "Found %d words containing %d characters.\n"
        .text                           // Section for code
        .align  32                      // Desired alignment
        .global main                    // These three lines
        .proc   main                    //  mark the mandatory
main:                                   //  'main' program entry
        .prologue 12,r32                // Mask for rp, ar.pfs only
        alloc   loc0 = ar.pfs,0,3,3,0   // ins, locals, outs
        .save   rp,loc1                 // Must save return address
        mov     loc1 = b0;;             //  to our caller
        .body
        mov     loc2 = gp               // Save gp
line:   add     out0 = @gprel(PRMT),gp  // out0 -> prompt
        br.call.sptk.many b0 = puts     // Unformatted output
        mov     gp = loc2               // Restore gp
        cmp4.lt p6,p0 = ret0,r0         // If any error,
 (p6)   br.cond.sptk.few stop0;;        //  go to handler (null)
        add     out0 = @gprel(IBUF),gp  // out0 -> buffer
        br.call.sptk.many b0 = gets     // Unformatted input
        mov     gp = loc2               // Restore gp
        cmp.eq  p6,p0 = ret0,r0         // If any error,
 (p6)   br.cond.sptk.few stop1;;        //  go to handler (null)
first:  mov     r20 = 0                 // Gr20 = character count
        mov     r21 = 0                 // Gr21 = word count
        addl    r14 = @gprel(IBUF),gp;; // Gr14 --> input
next:   ld1     r22 = [r14],1;;         // Get a character; bump
        cmp.eq  p7,p8 = SPACE,r22       // End of word?
        cmp.eq  p6,p0 = 0x0,r22         // NUL marks end
 (p6)   br.cond.spnt.few nomore;;       //  of our work here
 (p8)   add     r20 = 0x1,r20           // No: count a character
 (p7)   add     r21 = 0x1,r21           // Yes: count a word
        br.cond.sptk.few next           // Go back for more
nomore: add     r21 = 0x1,r21           // The last word
        add     out0 = @gprel(TELL),gp;; // out0 -> format
        mov     out1 = r21              // out1 = number of words
        mov     out2 = r20              // out2 = number of chars
        br.call.sptk.many b0 = printf   // C print function
        mov     gp = loc2               // Restore gp
        cmp4.lt p6,p0 = ret0,r0         // If any error,
 (p6)   br.cond.sptk.few stop0          //  go to handler (null)
        br.cond.sptk.many line          // Look for another line?
stop0:                                  // Output error
stop1:                                  // EOF or input error
done:   mov     ret0 = 0                // Signal all is normal
        mov     b0 = loc1               // Restore return address
        mov     ar.pfs = loc0           // Restore caller's ar.pfs
        br.ret.sptk.many b0;;           // Back to command line
        .endp   main                    // Mark end of procedure

In SCANTERM, the global pointer must be reset immediately after calling a routine using br.call with register b0. Only then should any returned values be accessed or tested for errors, as we do here with the conditional branches to labels stop0 and stop1. If SCANTERM were a production program, we would insert code to handle error reporting and recovery at those locations; instead, we simply exit in the event of any error.

Inside the big loop, SCANTERM has a nested loop from next to nomore that is traversed once per byte encountered in the input text stream. Loop control here simply involves testing for the expected null byte written into the input buffer by the gets function. When this inner loop terminates, the word counter must be specially incremented one final time in order to reflect the last word on the input line that was followed by the null byte instead of a space.

The calls to the C functions follow the principles that we developed in Chapter 7 for HP-UX and Linux calling standards, using the specific argument requirements given in Table 9-1. For example, printf requires that out0 contain the address of a string to include in the printed output, and if that string contains format parameters, number values or string addresses should be contained in out1 through out7 (and the stack thereafter).

Since the SCANTERM program uses standard functions from the C language library, we anticipated that it would function correctly with HP-UX software:

H> cc_bundled +DD64 -o bin/scanterm scanterm.s
H> bin/scanterm
Enter a line of text (CTRL/C to quit)
one two three
Found 3 words containing 11 characters.
Enter a line of text (CTRL/C to quit)
1 2 999
Found 3 words containing 5 characters.
Enter a line of text (CTRL/C to quit)
H>

and with gcc:

L> gcc -O0 -o bin/scanterm scanterm.s
/tmp/cc9T8kwJ.o: In function 'line':
/tmp/cc9T8kwJ.o(.text+0x32): the 'gets' function is dangerous and should not be used.
L> scanterm
Enter a line of text (CTRL/C to quit)
one two three
Found 3 words containing 11 characters.
Enter a line of text (CTRL/C to quit)
1 2 999
Found 3 words containing 5 characters.
Enter a line of text (CTRL/C to quit)
L>

or with Intel's ecc software for Linux:

L> ecc -O0 -o bin/scanterm scanterm.s
scanterm.s:
scanterm.o: In function 'main':
scanterm.o(.text+0x32): the `gets' function is dangerous and should not be used.
L> scanterm
Enter a line of text (CTRL/C to quit)
one two three
Found 3 words containing 11 characters.
Enter a line of text (CTRL/C to quit)
1 2 999
Found 3 words containing 5 characters.
Enter a line of text (CTRL/C to quit)
L>

Note that the Linux compilers issue a security warning about using the gets function; HP-UX software, using default options, does not. The gets function lacks any parameter to specify the size of the input, which might lead to a buffer overrun with related security issues on multiuser systems. Although its use is discouraged in critical applications, we illustrate gets for ease of testing, learning, and debugging simple programs without intending it as a model for advanced development.
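For readers who want to compare the assembly logic with a higher-level form, the following is a minimal C sketch of the same scan, not part of SCANTERM itself. It swaps in fgets, which takes a buffer size, for gets. The helper names scan_line and scan_once are hypothetical, chosen only for this illustration; IBUFL and SPACE reuse the symbol names from Figure 9-3.

```c
#include <stdio.h>
#include <string.h>

#define IBUFL 256   /* same input allowance as SCANTERM */
#define SPACE 0x20  /* ASCII code for <SP> */

/* Hypothetical helper mirroring the next..nomore loop: each space ends
   a word, every other byte counts as a character, and the word that
   precedes the terminating NUL is added once the loop is finished. */
void scan_line(const char *s, int *words, int *chars)
{
    int w = 0, c = 0;

    for (; *s != '\0'; s++) {
        if (*s == SPACE)
            w++;            /* Yes: count a word */
        else
            c++;            /* No: count a character */
    }
    *words = w + 1;         /* The last word */
    *chars = c;
}

/* Hypothetical helper: prompt, read one bounded line, print the summary.
   Returns 0 on success, -1 on end of file or any I/O error. */
int scan_once(FILE *in, FILE *out)
{
    char ibuf[IBUFL];
    int words, chars;

    if (fputs("Enter a line of text (CTRL/C to quit)\n", out) == EOF)
        return -1;
    if (fgets(ibuf, sizeof ibuf, in) == NULL)   /* reads at most IBUFL-1 bytes */
        return -1;
    ibuf[strcspn(ibuf, "\n")] = '\0';   /* fgets keeps the newline; gets drops it */
    scan_line(ibuf, &words, &chars);
    if (fprintf(out, "Found %d words containing %d characters.\n",
                words, chars) < 0)
        return -1;
    return 0;
}
```

Calling scan_once(stdin, stdout) repeatedly until it returns nonzero would approximate the big loop of SCANTERM. Like the assembly version, this sketch treats every space as the end of a word, so consecutive spaces inflate the word count and an empty line is reported as one word containing zero characters.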