7.2 String I/O

Now we'll zoom back in to the string I/O level and examine the echo and read statements, which give the shell I/O capabilities that are more analogous to those of conventional programming languages.

7.2.1 echo

As we've seen countless times in this book, echo simply prints its arguments to standard output. Now we'll explore the command in greater detail.

7.2.1.1 Options to echo

echo accepts a few dash options, listed in Table 7.2.

Table 7.2. echo Options
Option	Function
-e	Turns on the interpretation of backslash-escaped characters
-E	Turns off the interpretation of backslash-escaped characters on systems where this mode is the default
-n	Omits the final newline (same as the \c escape sequence)
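As a rough illustration of the three options, the sketch below captures the output of each form in a variable so the differences can be compared. It assumes bash's default settings (in particular, that the xpg_echo option is off); the variable names are ours.

```shell
#!/usr/bin/env bash
# Assumption: bash defaults, i.e. the xpg_echo option is off.

plain=$(echo 'one\ttwo')        # no option: the backslash sequence is printed as typed
escaped=$(echo -e 'one\ttwo')   # -e: \t is turned into a TAB
literal=$(echo -E 'one\ttwo')   # -E: interpretation is explicitly switched off

# -n leaves out the final newline, so the next echo continues on the same line.
joined=$(echo -n 'terminal? '; echo 'wy60')

printf '%s\n' "$plain" "$escaped" "$literal" "$joined"
```

Only the -e form contains a real TAB; the other two print the two characters \ and t.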

7.2.1.2 echo escape sequences

echo accepts a number of escape sequences that start with a backslash. ^[2] These are similar to the escape sequences recognized by the C language; they are listed in Table 7.3.

^[2] You must use a double backslash if you don't surround the string that contains them with quotes; otherwise, the shell itself "steals" a backslash before passing the arguments to echo.

These sequences exhibit fairly predictable behavior, except for \f: on some displays, it causes a screen clear, while on others it causes a line feed. It ejects the page on most printers. \v is somewhat obsolete; it usually causes a line feed.

Table 7.3. echo Escape Sequences
Sequence	Character Printed
\a	ALERT or CTRL-G (bell)
\b	BACKSPACE or CTRL-H
\c	Omit final NEWLINE
\E	Escape character ^[3]
\f	FORMFEED or CTRL-L
\n	NEWLINE (not at end of command) or CTRL-J
\r	RETURN (ENTER) or CTRL-M
\t	TAB or CTRL-I
\v	VERTICAL TAB or CTRL-K
\nnn	ASCII character with octal (base-8) value nnn, where nnn is 1 to 3 digits (current bash documentation writes this as \0nnn)
\\	Single backslash

^[3] Not available in versions of bash prior to 2.0.

The octal-value sequence is even more device-dependent and can be used for complex I/O, such as cursor control and special graphics characters.
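The quoting rule in footnote 2 is easy to check for yourself. The following sketch, which assumes bash with its default echo behavior, compares a quoted escape sequence, a doubled backslash, and an unprotected single backslash, and also shows \c; the variable names are ours.

```shell
#!/usr/bin/env bash
# Assumption: bash defaults, i.e. the xpg_echo option is off.

quoted=$(echo -e 'left\tright')    # quotes protect the backslash, so echo receives \t
doubled=$(echo -e left\\tright)    # unquoted \\ reaches echo as a single backslash
stolen=$(echo -e left\tright)      # unquoted \t: the shell removes the backslash first

# \c ends the output at that point, so no newline is printed (compare echo -n).
no_newline=$(echo -e 'prompt: \c'; echo 'answer')

printf '%s\n' "$quoted" "$doubled" "$stolen" "$no_newline"
```

The first two forms produce a TAB between the words; the third prints lefttright, because echo never sees a backslash at all.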

7.2.2 read

The other half of the shell's string I/O facilities is the read command, which allows you to read values into shell variables. The basic syntax is:

    read var1 var2...

This statement takes a line from the standard input and breaks it down into words delimited by any of the characters in the value of the environment variable IFS (see Chapter 4; these are usually a space, a TAB, and NEWLINE). The words are assigned to variables var1, var2, etc. For example:

    $ read character1 character2
    alice duchess
    $ echo $character1
    alice
    $ echo $character2
    duchess

If there are more words than variables, then excess words are assigned to the last variable. If you omit the variables altogether, the entire line of input is assigned to the variable REPLY.
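These rules can be tried without typing at a terminal by feeding read from a here-document. The sketch below is only an illustration: the input lines and the variable names (first, rest, user, shell_path) are made up for the example.

```shell
#!/usr/bin/env bash
# More words than variables: the last variable receives the remainder of the line.
read first rest <<EOF
alice duchess dodo
EOF
echo "first=$first rest=$rest"

# No variable names: the whole line goes into REPLY.
read <<EOF
off with her head
EOF
echo "REPLY=$REPLY"

# A different IFS, set for this one command only, changes the delimiters.
IFS=: read user shell_path <<EOF
alice:/bin/bash
EOF
echo "user=$user shell_path=$shell_path"
```

Placing the IFS assignment in front of read keeps the change local to that command, so the rest of the script still splits on whitespace.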

You may have identified this as the "missing ingredient" in the shell programming capabilities we have seen thus far. It resembles input statements in conventional languages, like its namesake in Pascal. So why did we wait this long to introduce it?

Actually, read is sort of an "escape hatch" from traditional shell programming philosophy, which dictates that the most important unit of data to process is a text file, and that UNIX utilities such as cut, grep, sort, etc., should be used as building blocks for writing programs.

read, on the other hand, implies line-by-line processing. You could use it to write a shell script that does what a pipeline of utilities would normally do, but such a script would inevitably look like:

    while (read a line) do
        process the line
        print the processed line
    end

This type of script is usually much slower than a pipeline; furthermore, it has the same form as a program someone might write in C (or some similar language) that does the same thing much faster. In other words, if you are going to write it in this line-by-line way, there is no point in writing a shell script.
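To make the comparison concrete, here is a small sketch that extracts the first colon-separated field from some made-up data in both styles. The data and variable names are assumptions for illustration; on input this small the speed difference is invisible, but the loop runs shell code for every line while cut processes the whole stream in one program.

```shell
#!/usr/bin/env bash
# Made-up sample data: one name:role pair per line.
data='alice:queen
duchess:cook
dodo:lory'

# Line-by-line style: the shell itself handles every line.
loop_out=$(printf '%s\n' "$data" |
    while IFS=: read name rest; do
        echo "$name"
    done)

# Pipeline style: a single utility does the same job.
pipe_out=$(printf '%s\n' "$data" | cut -d: -f1)

echo "$loop_out"
echo "$pipe_out"
```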

7.2.2.1 Reading lines from files

Nevertheless, shell scripts with read are useful for certain kinds of tasks. One is when you are reading data from a file small enough that efficiency isn't a concern (say a few hundred lines or fewer), and it's really necessary to get bits of input into shell variables.

Consider the case of a UNIX machine that has terminals that are hardwired to the terminal lines of the machine. It would be nice if the TERM environment variable were set to the correct terminal type when a user logged in.

One way to do this would be to have some code that sets the terminal information when a user logs in. This code would presumably reside in /etc/profile, the system-wide initialization file that bash runs before running a user's .bash_profile. If the terminals on the system change over time (as surely they must), then the code would have to be changed. It would be better to store the information in a file and change just the file instead.

Assume we put the information in a file whose format is typical of such UNIX "system configuration" files: each line contains a device name, a TAB, and a TERM value.

We'll call the file /etc/terms, and it would typically look something like this:

    console	console
    tty01	wy60
    tty03	vt100
    tty04	vt100
    tty07	wy85
    tty08	vt100

The values on the left are terminal lines and those on the right are the terminal types that TERM can be set to. The terminals connected to this system are a Wyse 60 (wy60), three VT100s (vt100), and a Wyse 85 (wy85). The machine's master terminal is the console, which has a TERM value of console.

We can use read to get the data from this file, but first we need to know how to test for the end-of-file condition. Simple: read's exit status is 1 (i.e., non-zero) when there is nothing to read. This leads to a clean while loop:

    TERM=vt100       # assume this as a default
    line=$(tty)
    while read dev termtype; do
        if [ "$dev" = "$line" ]; then
            TERM=$termtype
            echo "TERM set to $TERM."
            break
        fi
    done

The while loop reads each line of the input into the variables dev and termtype. In each pass through the loop, the if looks for a match between $dev and the user's tty ($line, obtained by command substitution from the tty command). If a match is found, TERM is set, a message is printed, and the loop exits; otherwise TERM remains at the default setting of vt100.

We're not quite done, though: this code reads from the standard input, not from /etc/terms! We need to know how to redirect input to multiple commands. It turns out that there are a few ways of doing this.

7.2.2.2 I/O redirection and multiple commands

One way to solve the problem is with a subshell, as we'll see in the next chapter. This involves creating a separate process to do the reading. However, it is usually more efficient to do it in the same process; bash gives us four ways of doing this.

The first, which we have seen already, is with a function:

    findterm () {
        TERM=vt100       # assume this as a default
        line=$(tty)
        while read dev termtype; do
            if [ "$dev" = "$line" ]; then
                TERM=$termtype
                echo "TERM set to $TERM."
                break
            fi
        done
    }

    findterm < /etc/terms

A function acts like a script in that it has its own set of standard I/O descriptors, which can be redirected in the line of code that calls the function. In other words, you can think of this code as if findterm were a script and you typed findterm < /etc/terms on the command line. The read statement takes input from /etc/terms a line at a time, and the function runs correctly.

The second way is to simplify this slightly by placing the redirection at the end of the function:

    findterm () {
        TERM=vt100       # assume this as a default
        line=$(tty)
        while read dev termtype; do
            if [ "$dev" = "$line" ]; then
                TERM=$termtype
                echo "TERM set to $TERM."
                break
            fi
        done
    } < /etc/terms

Whenever findterm is called, it takes its input from /etc/terms.
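A self-contained way to experiment with a redirection attached to a function definition is sketched below. It is not the chapter's findterm: lookup_term is a hypothetical helper that takes the device name as an argument instead of calling tty, and a temporary file stands in for /etc/terms so the example can run anywhere.

```shell
#!/usr/bin/env bash
terms_file=$(mktemp)    # temporary stand-in for /etc/terms
printf 'console\tconsole\ntty01\twy60\ntty03\tvt100\n' > "$terms_file"

# Hypothetical helper: looks up the device given as $1 and leaves the
# answer in the variable found (vt100 if there is no matching entry).
lookup_term () {
    wanted=$1
    found=vt100
    while read dev termtype; do
        if [ "$dev" = "$wanted" ]; then
            found=$termtype
            break
        fi
    done
} < "$terms_file"

lookup_term tty01; first=$found
lookup_term tty99; second=$found
echo "tty01 -> $first, tty99 -> $second"
rm -f "$terms_file"
```

The redirection is carried out each time the function runs, so the second call starts again at the top of the file rather than where the first call stopped.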

The third way is by putting the I/O redirector at the end of the loop, like this:

    TERM=vt100       # assume this as a default
    line=$(tty)
    while read dev termtype; do
        if [ "$dev" = "$line" ]; then
            TERM=$termtype
            echo "TERM set to $TERM."
            break
        fi
    done < /etc/terms

You can use this technique with any flow-control construct, including if...fi, case...esac, select...done, and until...done. This makes sense because these are all compound statements that the shell treats as single commands for these purposes. This technique works fine (the read command reads a line at a time) as long as all of the input is done within the compound statement.
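As a small illustration with a construct other than while, the sketch below attaches the redirection to an if...fi statement and uses read's exit status as the condition. The two temporary files are stand-ins invented for the example: one holds entries in the /etc/terms format and the other is empty.

```shell
#!/usr/bin/env bash
terms_file=$(mktemp)    # temporary stand-in for /etc/terms
printf 'console\tconsole\ntty01\twy60\n' > "$terms_file"
empty_file=$(mktemp)    # a file with nothing to read

# read succeeds, so the first branch runs.
if read dev termtype; then
    first_entry="$dev uses $termtype"
else
    first_entry='no entries'
fi < "$terms_file"

# read hits end-of-file at once and returns a non-zero status.
if read dev termtype; then
    empty_entry="$dev uses $termtype"
else
    empty_entry='no entries'
fi < "$empty_file"

echo "$first_entry"
echo "$empty_entry"
rm -f "$terms_file" "$empty_file"
```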

7.2.2.3 Command blocks

But if you want to redirect I/O to or from an arbitrary group of commands without creating a separate process, you need to use a construct that we haven't seen yet. If you surround some code with { and }, the code will behave like a function that has no name. This is another type of compound statement. In accordance with the equivalent concept in the C language, we'll call this a command block.

What good is a block? In this case, it means that the code within the curly brackets ({}) will take standard I/O descriptors just as we described in the last block of code. This construct is appropriate for the current example because the code needs to be called only once, and the entire script is not really large enough to merit breaking down into functions. Here is how we use a block in the example:

    {
        TERM=vt100       # assume this as a default
        line=$(tty)
        while read dev termtype; do
            if [ "$dev" = "$line" ]; then
                TERM=$termtype
                echo "TERM set to $TERM."
                break
            fi
        done
    } < /etc/terms

To help you understand how this works, think of the curly brackets and the code inside them as if they were one command, i.e.:

 { TERM=vt100; line=$(tty); while ... } < /etc/terms;
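Because every command inside the block shares the same redirected input, several commands can take turns reading from one file. The sketch below assumes a made-up report file with a heading line followed by entries; the file contents and variable names are ours.

```shell
#!/usr/bin/env bash
report=$(mktemp)    # temporary file invented for this example
printf 'device type\nconsole console\ntty01 wy60\ntty03 vt100\n' > "$report"

{
    read header                  # the first read consumes the heading line
    count=0
    while read dev termtype; do  # the loop continues from the second line
        count=$((count + 1))
    done
} < "$report"

echo "heading: $header"
echo "entries: $count"
rm -f "$report"
```

The block runs in the current shell rather than in a separate process, which is why header and count are still set after the closing brace.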

Configuration files for system administration tasks like this one are actually fairly common; a prominent example is /etc/hosts, which lists machines that are accessible in a TCP/IP network. We can make /etc/terms more like these standard files by allowing comment lines in the file that start with #, just as in shell scripts. This way /etc/terms can look like this:

    # System Console is console
    console	console
    # Cameron's line has a Wyse 60
    tty01	wy60
    ...

We can handle comment lines by modifying the while loop so that it ignores lines beginning with #. We can place a grep in the test:

    if [ -z "$(echo "$dev" | grep '^#')" ] && [ "$dev" = "$line" ]; then
    ...

As we saw in Chapter 5, the && combines the two conditions so that both must be true for the entire condition to be true.
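The grep test starts an extra process for every line of the file. One alternative, which is our substitution and not the approach used above, is to match the comment character with a case pattern. The sketch below uses a temporary file in place of /etc/terms and a fixed value for line so that it runs without a terminal; skipped and termtype_found are names invented for the example.

```shell
#!/usr/bin/env bash
terms_file=$(mktemp)    # temporary stand-in for /etc/terms
cat > "$terms_file" <<'EOF'
# System Console is console
console console
# Cameron's line has a Wyse 60
tty01 wy60
EOF

line=tty01              # fixed here; the chapter obtains it from the tty command
termtype_found=vt100    # default, as in the chapter's code
skipped=0

while read dev termtype; do
    case $dev in
        '#'*) skipped=$((skipped + 1)); continue ;;   # comment line: go to the next one
    esac
    if [ "$dev" = "$line" ]; then
        termtype_found=$termtype
        break
    fi
done < "$terms_file"

echo "found $termtype_found after skipping $skipped comment lines"
rm -f "$terms_file"
```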

As another example of command blocks, consider the case of creating a standard algebraic notation frontend to the dc command. dc is a UNIX utility that simulates a Reverse Polish Notation (RPN) calculator: ^[4]

^[4] If you have ever owned a Hewlett-Packard calculator you will be familiar with RPN. We'll discuss RPN further in one of the exercises at the end of this chapter.

    { while read line; do
          echo "$(alg2rpn $line)"
      done
    } | dc

We'll assume that the actual conversion from one notation to the other is handled by a function called alg2rpn. It takes a line of standard algebraic notation as an argument and prints the RPN equivalent on the standard output. The while loop reads lines and passes them through the conversion function, until an EOF is typed. Everything is executed inside the command block and the output is piped to the dc command for evaluation.
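To see the plumbing at work, you can try the block with a toy conversion function. The alg2rpn below is only a stand-in written for this sketch: it understands nothing more than "operand operator operand" and appends dc's p command so that the result is printed. The evaluation step is attempted only if dc is installed.

```shell
#!/usr/bin/env bash
# Toy stand-in for alg2rpn (an assumption, not the real conversion):
# "2 + 3" arrives as three arguments and leaves as "2 3 + p".
alg2rpn () {
    echo "$1 $3 $2 p"
}

# The command block on its own, with its output captured for inspection.
rpn_lines=$(printf '2 + 3\n10 - 4\n' |
    { while read line; do
          echo "$(alg2rpn $line)"
      done
    })
echo "$rpn_lines"

# Where dc is available, the same block can feed it directly.
if command -v dc > /dev/null 2>&1; then
    printf '2 + 3\n10 - 4\n' |
    { while read line; do
          echo "$(alg2rpn $line)"
      done
    } | dc
fi
```

Note that $line is deliberately left unquoted so that it splits into separate arguments; an expression containing * would need extra care, because the shell would also expand it as a wildcard.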

7.2.2.4 Reading user input

The other type of task to which read is suited is prompting a user for input. Think about it: we have hardly seen any such scripts so far in this book. In fact, the only ones were the modified solutions to Task 5-4, which involved select.

As you've probably figured out, read can be used to get user input into shell variables.

We can use echo to prompt the user, like this:

    echo -n 'terminal? '
    read TERM
    echo "TERM is $TERM"

Here is what this looks like when it runs:

    terminal? wy60
    TERM is wy60

However, shell convention dictates that prompts should go to standard error, not standard output. (Recall that select prompts to standard error.) We could just use file descriptor 2 with the output redirector we saw earlier in this chapter:

    echo -n 'terminal? ' >&2
    read TERM
    echo "TERM is $TERM"
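One reason for the convention is that a script's standard output is often captured or piped, and a prompt mixed into it would pollute the data. The sketch below wraps the three lines in a hypothetical function, ask_term, and supplies the answer through a pipe so that the two streams can be examined separately.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the prompt-and-read sequence.
ask_term () {
    echo -n 'terminal? ' >&2    # the prompt goes to standard error
    read answer
    echo "TERM is $answer"      # the result goes to standard output
}

# Standard output only: the prompt is discarded along with standard error.
stdout_only=$(echo wy60 | ask_term 2> /dev/null)

# Both streams together, roughly what a user at a terminal would see.
both=$(echo wy60 | ask_term 2>&1)

echo "$stdout_only"
echo "$both"
```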

We'll now look at a more complex example by showing how Task 5-5 would be done if select didn't exist. Compare this with the code in Chapter 5:

    echo 'Select a directory:'
    done=false

    while [ $done = false ]; do
        num=1
        for direc in $DIR_STACK; do
            echo "$num) $direc"
            num=$((num+1))
        done
        echo -n 'directory? '
        read REPLY

        if [ "$REPLY" -lt $num ] && [ "$REPLY" -gt 0 ]; then
            set - $DIR_STACK

            # statements that manipulate the stack...

            done=true
        else
            echo 'invalid selection.'
        fi
    done

The while loop is necessary so that the code repeats if the user makes an invalid choice. select includes the ability to construct multicolumn menus if there are many choices, and better handling of null user input.

Before leaving read, we should note that it has four options: -a, -e, -p, and -r. ^[5] The first of these options allows you to read values into an array. Each successive item read in is assigned to the given array starting at index 0. For example:

^[5] -a, -e, and -p are not available in versions of bash prior to 2.0.

    $ read -a people
    alice duchess dodo
    $ echo ${people[2]}
    dodo

In this case, the array people now contains the items alice, duchess, and dodo.
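The same thing can be done non-interactively by feeding read from a here-document, which is handy for trying it out in a script. This is just a sketch of the example above with the input supplied inline.

```shell
#!/usr/bin/env bash
read -a people <<EOF
alice duchess dodo
EOF

echo "first:  ${people[0]}"     # indexing starts at 0
echo "third:  ${people[2]}"
echo "length: ${#people[@]}"    # number of elements read
```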

The option -e can be used only with scripts run from interactive shells. It causes readline to be used to gather the input line, which means that you can use any of the readline editing features that we looked at in Chapter 2.

The -p option followed by a string argument prints the string before reading input. We could have used this in the earlier examples of read, where we printed out a prompt before doing the read. For example, the directory selection script could have used read -p 'directory? ' REPLY.
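A detail worth knowing when testing scripts that use -p: bash writes the prompt to standard error, and only when the input is coming from a terminal. In the sketch below the input comes from a here-document, so no prompt appears at all; the variable name choice is ours.

```shell
#!/usr/bin/env bash
# Input is not a terminal here, so bash does not display the prompt.
read -p 'directory? ' choice <<EOF
2
EOF

echo "You chose $choice"
```

Run interactively without the here-document, the same read command would display directory? and wait for a line to be typed.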

read lets you input lines that are longer than the width of your display by providing a backslash (\) as a continuation character, just as in shell scripts. The -r option overrides this, in case your script reads from a file that may contain lines that happen to end in backslashes. read -r also preserves any other escape sequences the input might contain. For example, if the file hatter contains this line:

    A line with a\n escape sequence

Then read -r aline will include the backslash in the variable aline, whereas without the -r, read will "eat" the backslash. As a result:

    $ read -r aline < hatter
    $ echo -e "$aline"
    A line with a
     escape sequence

However:

    $ read aline < hatter
    $ echo -e "$aline"
    A line with an escape sequence
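The continuation behavior itself can be demonstrated in the same way. The sketch below writes a made-up two-line file whose first line ends in a backslash and reads it with and without -r; the temporary file and the variable names are assumptions for the example.

```shell
#!/usr/bin/env bash
sample=$(mktemp)    # temporary file, a stand-in for real input
# Two physical lines; the first one ends in a backslash.
printf 'first half \\\nsecond half\n' > "$sample"

read joined < "$sample"     # backslash-newline is a continuation: both lines are read
read -r raw < "$sample"     # -r: the backslash is ordinary data and the line ends there

echo "joined: $joined"
echo "raw:    $raw"
rm -f "$sample"
```

Without -r, a single read consumes both physical lines as one logical line; with -r, it stops at the first newline and keeps the trailing backslash.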