9.3 SCANTERM: Using C Standard I/O

We presented in Chapter 6 the simple SCANTEXT program to scan through a stored string, byte by byte, looking for space characters as separators between words. That program, like many of our other illustrations, had no provision for I/O outside of the debugger.

We will rework the SCANTEXT program to add input and output from the "terminal" (i.e., the keyboard and display), using external calls to standard C library functions (Table 9-1). The new SCANTERM program in Figure 9-3 displays a prompt with puts, obtains a line of input using gets, counts the number of words and the total number of characters, and then prints a summary line containing two embedded numeric fields in decimal radix using printf.

We encourage you to study SCANTERM carefully, as it employs several useful techniques. We used symbols to facilitate setting up the data segment (e.g., IBUFL) and to identify the space character. Most of the program is one big loop, starting at line and ending just above done. Line-oriented utility programs depend on the operating environment to stop what would otherwise be an infinite loop. This program is halted if control-C is entered at the keyboard while gets is waiting for input: under Linux and Unix the terminal normally turns control-C into an interrupt signal whose default action terminates the process. An end-of-file condition or a read error, by contrast, makes gets return a null pointer, which the program tests for immediately after the call.

Figure 9-3 SCANTERM: Showing calls to C functions for standard I/O
// SCANTERM      Demonstrate I/O for terminal
// This program does lexical analysis by finding the
// separate words on an input line. It counts words and
// characters within words. (Stop it with CTRL/C.)

         IBUFL   = 256            // Input allowance
         SPACE   = 0x20           // ASCII code for <SP>
         .global gets, puts, printf

         .data                    // Declare storage
         .align  8                // Desired alignment
IBUF:    .skip   IBUFL            // Input line
PRMT:    stringz "Enter a line of text (CTRL/C to quit)"
TELL:    stringz "Found %d words containing %d characters.\n"

         .text                    // Section for code
         .align  32               // Desired alignment
         .global main             // These three lines
         .proc   main             //  mark the mandatory
main:                             //   'main' program entry
         .prologue 12,r32         // Mask for rp, ar.pfs only
         alloc   loc0 = ar.pfs,0,3,3,0  // ins, locals, outs
         .save   rp,loc1          // Must save return address
         mov     loc1 = b0;;      //  to our caller
         .body
         mov     loc2 = gp        // Save gp
line:    add     out0 = @gprel(PRMT),gp  // out0 -> prompt
         br.call.sptk.many b0 = puts  // Unformatted output
         mov     gp = loc2        // Restore gp
         cmp4.lt p6,p0 = ret0,r0  // If any error,
   (p6)  br.cond.sptk.few stop0;; //  go to handler (null)
         add     out0 = @gprel(IBUF),gp  // out0 -> buffer
         br.call.sptk.many b0 = gets  // Unformatted input
         mov     gp = loc2        // Restore gp
         cmp.eq  p6,p0 = ret0,r0  // If any error,
   (p6)  br.cond.sptk.few stop1;; //  go to handler (null)
first:   mov     r20 = 0          // Gr20 = character count
         mov     r21 = 0          // Gr21 = word count
         addl    r14 = @gprel(IBUF),gp;; // Gr14 --> input
next:    ld1     r22 = [r14],1;;  // Get a character; bump
         cmp.eq  p7,p8 = SPACE,r22 // End of word?
         cmp.eq  p6,p0 = 0x0,r22  //   NUL marks end
   (p6)  br.cond.spnt.few nomore;; //   of our work here
   (p8)  add     r20 = 0x1,r20    // No: count a character
   (p7)  add     r21 = 0x1,r21    // Yes: count a word
         br.cond.sptk.few next    // Go back for more
nomore:  add     r21 = 0x1,r21    // The last word
         add     out0 = @gprel(TELL),gp;;  // out0 -> format
         mov     out1 = r21       // out1 = number of words
         mov     out2 = r20       // out2 = number of chars
         br.call.sptk.many b0 = printf  // C print function
         mov     gp = loc2        // Restore gp
         cmp4.lt p6,p0 = ret0,r0  // If any error,
   (p6)  br.cond.sptk.few stop0   //  go to handler (null)
         br.cond.sptk.many line   // Look for another line?
stop0:                            // Output error
stop1:                            // EOF or input error
done:    mov     ret0 = 0         // Signal all is normal
         mov     b0 = loc1        // Restore return address
         mov     ar.pfs = loc0    // Restore caller's ar.pfs
         br.ret.sptk.many b0;;    // Back to command line
         .endp   main             // Mark end of procedure

In SCANTERM, the global pointer must be reset immediately after calling a routine using br.call with register b0. Only then should any returned values be accessed or tested for errors, as we do here with the conditional branches to labels stop0 and stop1. If SCANTERM were a production program, we would insert code to handle error reporting and recovery at those locations; instead, we simply exit in the event of any error.
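The two kinds of error test in SCANTERM can be restated in C as a rough sketch. The helper names output_failed and input_failed are our own illustrations and do not appear in the program. The sketch relies on the standard library behavior that puts and printf return a negative int on failure, while gets returns a null pointer at end of file or on an input error.

```c
#include <stdio.h>

/* Hypothetical helpers; the names are ours, not part of SCANTERM. */

/* puts and printf return an int, and a negative value reports failure.
   SCANTERM tests this with the 4-byte signed compare cmp4.lt.         */
int output_failed(int ret)
{
    return ret < 0;
}

/* gets returns a pointer, and a null pointer reports end of file or
   an input error. SCANTERM tests this with the 8-byte compare cmp.eq. */
int input_failed(const char *ret)
{
    return ret == NULL;
}
```

A caller might write if (output_failed(puts(prompt))) and leave the loop, which corresponds to the branch to stop0; the pointer test corresponds to the branch to stop1.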

Inside the big loop, SCANTERM has a nested loop from next to nomore that is traversed once per byte encountered in the input text stream. Loop control here simply involves testing for the expected null byte written into the input buffer by the gets function. When this inner loop terminates, the word counter must be specially incremented one final time in order to reflect the last word on the input line that was followed by the null byte instead of a space.
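For readers who find a high-level rendering helpful, the inner loop can be sketched in C as follows. The function scan_line and the type struct counts are hypothetical names that we introduce only for illustration; the logic is intended to mirror the instructions from next to nomore, including the final increment of the word count.

```c
#define SPACE 0x20                 /* ASCII code for <SP>, as in the listing */

struct counts {                    /* hypothetical result type */
    int words;
    int chars;
};

/* Count words and non-space characters in a NUL-terminated line. */
struct counts scan_line(const char *p)
{
    struct counts c = { 0, 0 };

    for (;;) {
        unsigned char ch = (unsigned char)*p++;  /* get a character; bump pointer */
        if (ch == 0)
            break;                               /* NUL marks end of the line */
        if (ch == SPACE)
            c.words++;                           /* a space ends a word */
        else
            c.chars++;                           /* anything else is counted */
    }
    c.words++;                                   /* the last word */
    return c;
}
```

With this logic the line "one two three" yields 3 words and 11 characters. Note that every space counts as a word separator, so consecutive spaces inflate the word count, and an empty line still reports one word.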

The calls to the C functions follow the principles that we developed in Chapter 7 for HP-UX and Linux calling standards, using the specific argument requirements given in Table 9-1. For example, printf requires that out0 contain the address of a format string; if that string contains conversion specifications such as %d or %s, the corresponding numeric values or string addresses should be placed in out1 through out7 (and on the stack thereafter).
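The argument order that SCANTERM sets up for printf can be seen more directly in C. In this minimal sketch we substitute snprintf for printf so that the formatted text lands in a buffer where it can be inspected; format_summary is a hypothetical helper name, while the format string is the one labeled TELL in the listing.

```c
#include <stdio.h>

/* Same text as the TELL string in the SCANTERM data segment. */
static const char TELL[] = "Found %d words containing %d characters.\n";

/* Hypothetical helper: the format address is the first argument
   (out0 in SCANTERM), followed by the word count (out1) and the
   character count (out2). Returns what snprintf returns.        */
int format_summary(char *dst, size_t size, int words, int chars)
{
    return snprintf(dst, size, TELL, words, chars);
}
```

The values are consumed from left to right, one per %d, which is why the word count must be in out1 and the character count in out2.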

Since the SCANTERM program uses standard functions from the C language library, we anticipated that it would function correctly with HP-UX software:

H> cc_bundled +DD64 -o bin/scanterm scanterm.s
H> bin/scanterm
Enter a line of text (CTRL/C to quit)
one two three
Found 3 words containing 11 characters.
Enter a line of text (CTRL/C to quit)
1 2 999
Found 3 words containing 5 characters.
Enter a line of text (CTRL/C to quit)
H>

and with gcc:

L> gcc -O0 -o bin/scanterm scanterm.s
/tmp/cc9T8kwJ.o: In function 'line':
/tmp/cc9T8kwJ.o(.text+0x32): the 'gets' function is dangerous and should not be used.
L> scanterm
Enter a line of text (CTRL/C to quit)
one two three
Found 3 words containing 11 characters.
Enter a line of text (CTRL/C to quit)
1 2 999
Found 3 words containing 5 characters.
Enter a line of text (CTRL/C to quit)
L>

or with Intel's ecc software for Linux:

L> ecc -O0 -o bin/scanterm scanterm.s
scanterm.s:
scanterm.o: In function 'main':
scanterm.o(.text+0x32): the `gets' function is dangerous and should not be used.
L> scanterm
Enter a line of text (CTRL/C to quit)
one two three
Found 3 words containing 11 characters.
Enter a line of text (CTRL/C to quit)
1 2 999
Found 3 words containing 5 characters.
Enter a line of text (CTRL/C to quit)
L>

Note that the Linux compilers issue a security warning about using the gets function; HP-UX software, using default options, does not.

The gets function lacks any parameter to specify the size of the input, which might lead to a buffer overrun with related security issues on multiuser systems. Although its use is discouraged in critical applications, we illustrate gets for ease of testing, learning, and debugging simple programs without intending it as a model for advanced development.
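For comparison only, a bounded alternative can be sketched in C with the standard fgets function, which takes the buffer size as a parameter. SCANTERM itself does not use this approach. The wrapper read_line is a hypothetical name of ours; it removes the trailing newline so that the buffer contents resemble what gets would have stored.

```c
#include <stdio.h>
#include <string.h>

#define IBUFL 256                  /* input allowance, as in the listing */

/* Hypothetical wrapper around fgets. Reads at most size-1 characters
   into buf and returns buf, or NULL at end of file or on an error.   */
char *read_line(char *buf, int size, FILE *in)
{
    if (fgets(buf, size, in) == NULL)
        return NULL;
    buf[strcspn(buf, "\n")] = '\0';  /* fgets keeps the newline; drop it */
    return buf;
}
```

A call such as read_line(ibuf, IBUFL, stdin) would take the place of the call to gets. A line longer than size-1 characters is then delivered in successive pieces rather than overrunning the buffer.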



Itanium® Architecture for Programmers: Understanding 64-Bit Processors and EPIC Principles
Year: 2003