
7.2. String I/O

Now we'll zoom back in to the string I/O level and examine the echo, printf, and read statements, which give the shell I/O capabilities that are more analogous to those of conventional programming languages.

7.2.1. echo

As we've seen countless times in this book, echo simply prints its arguments to standard output. Now we'll explore the command in greater detail.

7.2.1.1 Options to echo

echo accepts a few dash options, listed in Table 7-2.

Table 7-2. echo options
Option	Function
-e	Turns on the interpretation of backslash-escaped characters
-E	Turns off the interpretation of backslash-escaped characters on systems where this mode is the default
-n	Omits the final newline (same as the \c escape sequence)

7.2.1.2 echo escape sequences

When given the -e option, echo interprets a number of escape sequences that start with a backslash.^[2] They are listed in Table 7-3.

^[2] You must use a double backslash if you don't surround the string that contains them with quotes; otherwise, the shell itself "steals" a backslash before passing the arguments to echo.

These sequences exhibit fairly predictable behavior, except for \f: on some displays, it causes a screen clear, while on others it causes a line feed. It ejects the page on most printers. \v is somewhat obsolete; it usually causes a line feed.

Table 7-3. echo escape sequences
Sequence	Character printed
\a	ALERT or CTRL-G (bell)
\b	BACKSPACE or CTRL-H
\c	Omit final NEWLINE
\e	Escape character (same as \E)
\E	Escape character^[3]
\f	FORMFEED or CTRL-L
\n	NEWLINE (not at end of command) or CTRL-J
\r	RETURN (ENTER) or CTRL-M
\t	TAB or CTRL-I
\v	VERTICAL TAB or CTRL-K
\nnn	Not supported by bash's echo; write octal values as \0nnn instead (printf, however, accepts \nnn in its format string)
\0nnn	The eight-bit character whose value is the octal (base-8) value nnn, where nnn is 0 to 3 digits
\xHH	The eight-bit character whose value is the hexadecimal (base-16) value HH (one or two hexadecimal digits)
\\	Single backslash

^[3] Not available in versions of bash prior to 2.0.

The \0nnn and \xHH sequences are even more device-dependent and can be used for complex I/O, such as cursor control and special graphics characters.
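To see the options and escape sequences side by side, here is a small sketch. The function name show_escapes is hypothetical, and the sketch assumes bash's builtin echo with its default behavior, in which escapes are interpreted only when -e is given. The third line also illustrates footnote 2: without quotes, the shell removes one of the two backslashes before echo sees the argument.

```shell
#!/usr/bin/env bash
# Hypothetical demo function: each line exercises one echo feature.
show_escapes() {
    echo 'a\tb'            # no -e: the backslash sequence is printed literally
    echo -e 'a\tb'         # -e: \t becomes a TAB
    echo -e a\\tb          # unquoted: the shell strips one backslash, echo sees a\tb
    echo -n 'no '          # -n: no trailing newline ...
    echo 'newline'         # ... so this text continues the same line
    echo -e 'A\x42\0103'   # \xHH and \0nnn: hex 42 is B, octal 103 is C
    echo -e 'stop\c here'  # \c: nothing after it is printed, not even the newline
    echo                   # supply the newline that \c suppressed
}

show_escapes
```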

7.2.2. printf

bash's echo command is quite powerful and for most cases entirely adequate. However, there are occasions where a more powerful and flexible approach is needed for printing information, especially when the information needs to be formatted. bash provides this with the built-in printf command, which is modeled on the printf() function of the C standard library.^[4]

^[4] printf is not available in versions of bash prior to version 2.02.

The printf command can output a string similar to the echo command:

printf "hello world"

Unlike the echo command, printf does not automatically provide a newline. If we want it to behave exactly like a plain echo, we must supply the newline ourselves by adding \n to the end:

printf "hello world\n"

You may ask why this is any better than echo. The printf command has two parts, which is what makes it so powerful.

printf format-string [arguments]

The first part is a string that describes the format specifications; this is best supplied as a string constant in quotes. The second part is an argument list, such as a list of strings or variable values that correspond to the format specifications. (The format is reused as necessary to use up all of the arguments. If the format requires more arguments than are supplied, the extra format specifications behave as if a zero value or null string, as appropriate, had been supplied). A format specification is preceded by a percent sign (%), and the specifier is one of the characters described below. Two of the main format specifiers are %s for strings and %d for decimal integers.
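The reuse rule and the missing-argument rule are easiest to see in a couple of one-liners; the variable names below are only illustrative.

```shell
#!/usr/bin/env bash
# Two format specifications but four arguments:
# printf applies the whole format twice.
pairs=$(printf '%s=%d\n' alpha 1 beta 2)
echo "$pairs"        # alpha=1 and beta=2, on separate lines

# Too few arguments: the missing %s acts as a null string
# and the missing %d acts as zero.
short=$(printf '[%s|%s|%d]\n' only)
echo "$short"        # [only||0]
```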

This sounds complicated but we can begin by re-casting the last example:

printf "%s %s\n" hello world

This prints hello world on a line of its own, just as the previous example did. The word hello has been assigned to the first format specification, %s. Likewise, world has been assigned to the second %s. printf then prints these two strings followed by the newline.

We could also achieve the same result by making hello an explicit part of the format string:

$ printf "hello %s\n" world
hello world

The allowed specifiers are shown in Table 7-4.

Table 7-4. printf format specifiers
Specifier	Description
%c	ASCII character (prints first character of corresponding argument)
%d	Decimal integer
%i	Same as %d
%e	Floating-point format ([-]d.precisione[+-]dd) (see following text for meaning of precision)
%E	Floating-point format ([-]d.precisionE[+-]dd)
%f	Floating-point format ([-]ddd.precision)
%g	%e or %f conversion, whichever is shorter, with trailing zeros removed
%G	%E or %f conversion, whichever is shorter, with trailing zeros removed
%o	Unsigned octal value
%s	String
%u	Unsigned decimal value
%x	Unsigned hexadecimal number; uses a-f for 10 to 15
%X	Unsigned hexadecimal number; uses A-F for 10 to 15
%%	Literal %
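Here is a quick sketch exercising most of the specifiers in Table 7-4, one argument per specifier. It sets LC_ALL=C on the assumption that your locale might otherwise use a comma as the decimal point.

```shell
#!/usr/bin/env bash
LC_ALL=C   # assumption: force the C locale so the decimal point is "."

printf '%c|%d|%i|%o|%u\n' hello 42 42 8 42
# prints: h|42|42|10|42

printf '%x|%X|%e|%f|%g|%%\n' 255 255 1234.5 3.5 1000000
# prints: ff|FF|1.234500e+03|3.500000|1e+06|%
```

Note that %c uses only the first character of hello, and that %g switches to exponential notation for 1000000 because that form is shorter.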

The printf command can be used to specify the width and alignment of output fields. A format expression can take three optional modifiers following % and preceding the format specifier:

%flags width.precision format-specifier

The width of the output field is a numeric value. When you specify a field width, the contents of the field are right-justified by default. You must specify a flag of "-" to get left-justification. (The rest of the flags are discussed shortly.) Thus, "%-20s" outputs a left-justified string in a field 20 characters wide. If the string is shorter than 20 characters, the field is padded with spaces to fill. In the following examples, a | is output to indicate the actual width of the field. The first example right-justifies the text:

printf "|%10s|\n" hello

It produces:

|     hello|

The next example left-justifies the text:

printf "|%-10s|\n" hello

It produces:

|hello     |

The precision modifier, used for decimal or floating-point values, controls the number of digits that appear in the result. For string values, it controls the maximum number of characters from the string that will be printed.

You can specify both the width and precision dynamically, via values in the printf argument list. You do this by specifying asterisks, instead of literal values.

$ myvar=42.123456
$ printf "|%*.*G|\n" 5 6 $myvar
|42.1235|

In this example, the width is 5, the precision is 6, and the value to print comes from the value of myvar.

The precision is optional. Its exact meaning varies by control letter, as shown in Table 7-5.

Table 7-5. Meaning of precision
Conversion	Precision means
%d, %i, %o, %u, %x, %X	The minimum number of digits to print. When the value has fewer digits, it is padded with leading zeros. The default precision is 1.
%e, %E	The number of digits to print after the decimal point. The default precision is 6. A precision of 0 inhibits printing of the decimal point.
%f	The number of digits to the right of the decimal point. The default precision is 6.
%g, %G	The maximum number of significant digits. The default precision is 6.
%s	The maximum number of characters to print.
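The following sketch applies an explicit precision to each kind of conversion in Table 7-5 and then shows the default precision of 6 for %f and %e. As before, LC_ALL=C is an assumption to keep the decimal point a period.

```shell
#!/usr/bin/env bash
LC_ALL=C   # assumption: force the C locale so the decimal point is "."

# Explicit precision for each kind of conversion.
printf '%.3d|%.2f|%.3e|%.3g|%.3s|\n' 7 3.14159 31415.9 3.14159 abcdef
# prints: 007|3.14|3.142e+04|3.14|abc|

# No precision given: %f and %e print 6 digits after the decimal point.
printf '%f|%e\n' 2 2
# prints: 2.000000|2.000000e+00
```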

Finally, one or more flags may precede the field width and the precision. We've already seen the "-" flag for left-justification. The rest of the flags are shown in Table 7-6.

Table 7-6. Flags for printf
Character	Description
-	Left-justify the formatted value within the field.
space	Prefix positive values with a space and negative values with a minus.
+	Always prefix numeric values with a sign, even if the value is positive.
#	Use an alternate form: %o has a preceding 0; %x and %X are prefixed with 0x and 0X, respectively; %e, %E and %f always have a decimal point in the result; and %g and %G do not have trailing zeros removed.
0	Pad output with zeros, not spaces. This only happens when the field width is wider than the converted result. In the C language, this flag applies to all output formats, even non-numeric ones. For bash, it only applies to the numeric formats.

If printf cannot perform a format conversion, it returns a non-zero exit status.
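A sketch that puts each flag from Table 7-6 to work, followed by a conversion that fails. The bracket characters only mark the field boundaries, and the variable name status is illustrative.

```shell
#!/usr/bin/env bash
# One flag per conversion, in the order of Table 7-6.
printf '[%-5d][%+d][% d][%#o][%#x][%05d]\n' 42 42 42 8 255 42
# prints: [42   ][+42][ 42][010][0xff][00042]

# "abc" is not a number, so the %d conversion fails;
# the error message goes to standard error (discarded here).
printf '%d\n' abc 2>/dev/null
status=$?
echo "exit status: $status"   # non-zero
```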

7.2.2.1 Additional bash printf specifiers

Besides the standard specifiers just described, bash accepts two additional specifiers. %b is also part of the POSIX specification of printf, but %q is not, so it may be missing from versions of the printf command found in some other shells and in other places in UNIX:

%b

When used instead of %s, expands echo-style escape sequences in the argument string. For example:

$ printf "%s\n" 'hello\nworld'
hello\nworld
$ printf "%b\n" 'hello\nworld'
hello
world

%q

When used instead of %s, prints the string argument in such a way that it can be used for shell input. For example:

$ printf "%q\n" "greetings to the world"
greetings\ to\ the\ world
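%q is most useful when a command line has to be built up as a string and later re-read by the shell, for example with the eval builtin. In this sketch the file names are made up: the arguments are quoted into one string, and eval then splits that string back into exactly the original three words, spaces and quote included.

```shell
#!/usr/bin/env bash
# Hypothetical argument list: one name contains spaces, one a single quote.
args=$(printf '%q ' 'my file.txt' "it's" plain)
echo "$args"           # each word backslash-quoted for re-reading by the shell

# Re-parse the quoted string: eval restores the original words.
eval "set -- $args"
echo "$#"              # 3
printf '<%s>\n' "$@"   # <my file.txt>, <it's>, <plain>, one per line
```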

7.2.3. read

The other half of the shell's string I/O facilities is the read command, which allows you to read values into shell variables. The basic syntax is:

read var1 var2...

This statement takes a line from the standard input and breaks it down into words delimited by any of the characters in the value of the shell variable IFS (see Chapter 4; these are usually a space, a TAB, and a NEWLINE). The words are assigned to variables var1, var2, etc. For example:

$ read character1 character2
alice duchess
$ echo $character1
alice
$ echo $character2
duchess

If there are more words than variables, then excess words are assigned to the last variable. If you omit the variables altogether, the entire line of input is assigned to the variable REPLY.
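The following sketch shows the three cases. To avoid typing the input, it feeds read with bash's here-string syntax (<<<), which supplies its word as one line of standard input; the variable names are illustrative. It also shows the opposite case: when there are fewer words than variables, the leftover variables are set to the null string.

```shell
#!/usr/bin/env bash
# More words than variables: the last variable gets the rest of the line.
read first rest <<< 'alice duchess dodo'
echo "first=$first rest=$rest"      # first=alice rest=duchess dodo

# Fewer words than variables: the leftover variables are empty.
read a b c <<< 'hatter'
echo "a=$a b=[$b] c=[$c]"           # a=hatter b=[] c=[]

# No variable names: the whole line lands in REPLY.
read <<< 'mad tea party'
echo "REPLY=$REPLY"                 # REPLY=mad tea party
```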

You may have identified this as the "missing ingredient" in the shell programming capabilities we have seen thus far. It resembles input statements in conventional languages, like its namesake in Pascal. So why did we wait this long to introduce it?

Actually, read is sort of an "escape hatch" from traditional shell programming philosophy, which dictates that the most important unit of data to process is a text file, and that UNIX utilities such as cut, grep, sort, etc., should be used as building blocks for writing programs.

read, on the other hand, implies line-by-line processing. You could use it to write a shell script that does what a pipeline of utilities would normally do, but such a script would inevitably look like:

while (read a line) do
    process the line
    print the processed line
end

This type of script is usually much slower than a pipeline; furthermore, it has the same form as a program someone might write in C (or some similar language) that does the same thing much faster. In other words, if you are going to write it in this line-by-line way, there is little point in writing a shell script.

7.2.3.1 Reading lines from files

Nevertheless, shell scripts with read are useful for certain kinds of tasks. One is when you are reading data from a file small enough so that efficiency isn't a concern (say a few hundred lines or less), and it's really necessary to get bits of input into shell variables.

Consider the case of a UNIX machine that has terminals that are hardwired to the terminal lines of the machine. It would be nice if the TERM environment variable was set to the correct terminal type when a user logged in.

One way to do this would be to have some code that sets the terminal information when a user logs in. This code would presumably reside in /etc/profile, the system-wide initialization file that bash runs before running a user's .bash_profile. If the terminals on the system change over time, as surely they must, then the code would have to be changed. It would be better to store the information in a file and change just the file instead.

Assume we put the information in a file whose format is typical of such UNIX "system configuration" files: each line contains a device name, a TAB, and a TERM value.

We'll call the file /etc/terms, and it would typically look something like this:

console    console
tty01      wy60
tty03      vt100
tty04      vt100
tty07      wy85
tty08      vt100

The values on the left are terminal lines and those on the right are the terminal types that TERM can be set to. The terminals connected to this system are a Wyse 60 (wy60), three VT100s (vt100), and a Wyse 85 (wy85). The machine's master terminal is the console, which has a TERM value of console.

We can use read to get the data from this file, but first we need to know how to test for the end-of-file condition. Simple: read's exit status is 1 (i.e., non-zero) when there is nothing to read. This leads to a clean while loop:

TERM=vt100           # assume this as a default
line=$(tty)
line=${line#/dev/}   # tty prints /dev/tty01; /etc/terms lists tty01
while read dev termtype; do
    if [ "$dev" = "$line" ]; then
        TERM=$termtype
        echo "TERM set to $TERM."
        break
    fi
done

The while loop reads each line of the input into the variables dev and termtype. In each pass through the loop, the if looks for a match between $dev and the user's tty ($line, obtained by command substitution from the tty command). If a match is found, TERM is set, a message is printed, and the loop exits; otherwise TERM remains at the default setting of vt100.

We are not quite done, though: this code reads from the standard input, not from /etc/terms! We need to know how to redirect input to multiple commands. It turns out that there are a few ways of doing this.

7.2.3.2 I/O redirection and multiple commands

One way to solve the problem is with a subshell, as we'll see in the next chapter. This involves creating a separate process to do the reading. However, it is usually more efficient to do it in the same process; bash gives us four ways of doing this.

The first, which we have seen already, is with a function:

findterm () {
    TERM=vt100           # assume this as a default
    line=$(tty)
    line=${line#/dev/}   # tty prints /dev/tty01; /etc/terms lists tty01
    while read dev termtype; do
        if [ "$dev" = "$line" ]; then
            TERM=$termtype
            echo "TERM set to $TERM."
            break
        fi
    done
}

findterm < /etc/terms

A function acts like a script in that it has its own set of standard I/O descriptors, which can be redirected in the line of code that calls the function. In other words, you can think of this code as if findterm were a script and you typed findterm < /etc/terms on the command line. The read statement takes input from /etc/terms a line at a time, and the function runs correctly.

The second way is to simplify this slightly by placing the redirection at the end of the function:

findterm () {
    TERM=vt100           # assume this as a default
    line=$(tty)
    line=${line#/dev/}   # tty prints /dev/tty01; /etc/terms lists tty01
    while read dev termtype; do
        if [ "$dev" = "$line" ]; then
            TERM=$termtype
            echo "TERM set to $TERM."
            break
        fi
    done
} < /etc/terms

Whenever findterm is called, it takes its input from /etc/terms.

The third way is by putting the I/O redirector at the end of the loop, like this:

TERM=vt100           # assume this as a default
line=$(tty)
line=${line#/dev/}   # tty prints /dev/tty01; /etc/terms lists tty01
while read dev termtype; do
    if [ "$dev" = "$line" ]; then
        TERM=$termtype
        echo "TERM set to $TERM."
        break
    fi
done < /etc/terms

You can use this technique with any flow-control construct, including if...fi, case...esac, for...done, select...done, and until...done. This makes sense because these are all compound statements that the shell treats as single commands for these purposes. This technique works fine (the read command reads a line at a time) as long as all of the input is done within the compound statement.
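The loop is hard to try out away from a hardwired terminal, because tty prints "not a tty" when standard input is not a terminal. The sketch below is a variation that takes the file and the device name as arguments and prints the terminal type instead of setting TERM, so that it can be tested with any file. find_termtype and its parameters are hypothetical; the default of vt100 comes from the text, and the redirection sits on the loop as in the third approach.

```shell
#!/usr/bin/env bash
# find_termtype FILE DEVICE (hypothetical helper): print the TERM value
# listed for DEVICE in FILE, or the default vt100 if DEVICE is not listed.
find_termtype() {
    local file=$1 device=${2#/dev/}   # accept either tty01 or /dev/tty01
    local result=vt100                # assume this as a default
    local dev termtype
    while read dev termtype; do
        if [ "$dev" = "$device" ]; then
            result=$termtype
            break
        fi
    done < "$file"                    # one redirection for the whole loop
    echo "$result"
}

# Usage on a real terminal: TERM=$(find_termtype /etc/terms "$(tty)")
```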

7.2.3.3 Command blocks

But if you want to redirect I/O to or from an arbitrary group of commands without creating a separate process, you need to use a construct that we haven't seen yet. If you surround some code with { and }, the code will behave like a function that has no name. This is another type of compound statement. In accordance with the equivalent concept in the C language, we'll call this a command block.

What good is a block? In this case, it means that the code within the curly brackets ({}) takes its standard I/O descriptors from any redirection attached to the block, just as the findterm function did. This construct is appropriate for the current example because the code needs to be called only once, and the entire script is not really large enough to merit breaking down into functions. Here is how we use a block in the example:

{
    TERM=vt100           # assume this as a default
    line=$(tty)
    line=${line#/dev/}   # tty prints /dev/tty01; /etc/terms lists tty01
    while read dev termtype; do
        if [ "$dev" = "$line" ]; then
            TERM=$termtype
            echo "TERM set to $TERM."
            break
        fi
    done
} < /etc/terms

To help you understand how this works, think of the curly brackets and the code inside them as if they were one command, i.e.:

{ TERM=vt100; line=$(tty); while ... } < /etc/terms;
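A block is also handy when several commands should consume consecutive lines of the same input. In this sketch (the input text is invented) the two read commands share a single redirection, so the second one picks up where the first left off; two separate redirections from the same file would each start again at the first line. Because the braces do not create a separate process, the variables are still set after the block ends.

```shell
#!/usr/bin/env bash
{
    read header         # first line of the input
    read first_entry    # second line: same input, continued
} <<'EOF'
# device termtype
console console
tty01 wy60
EOF

echo "header=$header"             # header=# device termtype
echo "first entry=$first_entry"   # first entry=console console
```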

Configuration files for system administration tasks like this one are actually fairly common; a prominent example is /etc/hosts, which lists machines that are accessible in a TCP/IP network. We can make /etc/terms more like these standard files by allowing comment lines in the file that start with #, just as in shell scripts. This way /etc/terms can look like this:

#
# System Console is console
console    console
#
# Cameron's line has a Wyse 60
tty01      wy60
...

We can handle comment lines by modifying the while loop so that it ignores lines beginning with #. We can place a grep in the test:

if [ -z "$(echo "$dev" | grep '^#')" ] && [ "$dev" = "$line" ]; then
    ...

As we saw in Chapter 5, the && combines the two conditions so that both must be true for the entire condition to be true.
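The grep test works, but it starts extra processes for every line of the file. A sketch of an alternative uses a case pattern to skip both comment lines and blank lines without running any external command; skip_comments is a hypothetical name, and it simply prints the entries it keeps.

```shell
#!/usr/bin/env bash
# Print "dev termtype" for each entry on standard input,
# skipping comment lines and blank lines.
skip_comments() {
    local dev termtype
    while read dev termtype; do
        case $dev in
            '#'* | '') continue ;;   # comment line or empty line
        esac
        echo "$dev $termtype"
    done
}

# Usage: skip_comments < /etc/terms
```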

As another example of command blocks, consider the case of creating a standard algebraic notation frontend to the dc command. dc is a UNIX utility that simulates a Reverse Polish Notation (RPN) calculator:^[5]

^[5] If you have ever owned a Hewlett-Packard calculator you will be familiar with RPN. We'll discuss RPN further in one of the exercises at the end of this chapter.

{ while read line; do
    echo "$(alg2rpn $line)"
  done
} | dc

We'll assume that the actual conversion from one notation to the other is handled by a function called alg2rpn. It takes a line of standard algebraic notation as an argument and prints the RPN equivalent on the standard output. The while loop reads lines and passes them through the conversion function, until an EOF is typed. Everything is executed inside the command block and the output is piped to the dc command for evaluation.

7.2.3.4 Reading user input

The other type of task to which read is suited is prompting a user for input. Think about it: we have hardly seen any such scripts so far in this book. In fact, the only ones were the modified solutions to Task 5-4, which involved select.

As you've probably figured out, read can be used to get user input into shell variables.

We can use echo to prompt the user, like this:

echo -n 'terminal? '
read TERM
echo "TERM is $TERM"

Here is what this looks like when it runs:

terminal? wy60
TERM is wy60

However, shell convention dictates that prompts should go to standard error, not standard output. (Recall that select prompts to standard error.) We could just use file descriptor 2 with the output redirector we saw earlier in this chapter:

echo -n 'terminal? ' >&2
read TERM
echo "TERM is $TERM"
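If a script prompts in several places, this pattern can be wrapped in a small function. ask is a hypothetical name: it prints its argument as a prompt on standard error and echoes the reply on standard output. Since command substitution captures only standard output, the prompt still reaches the terminal while the answer is captured.

```shell
#!/usr/bin/env bash
# ask PROMPT: show PROMPT on standard error, read one line, print it.
ask() {
    local answer
    echo -n "$1" >&2    # the prompt goes to standard error
    read answer
    echo "$answer"      # only the answer goes to standard output
}

# Usage (interactive): TERM=$(ask 'terminal? ')
```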

We'll now look at a more complex example by showing how Task 5-5 would be done if select didn't exist. Compare this with the code in Chapter 5:

echo 'Select a directory:'
done=false

while [ $done = false ]; do
    num=1
    for direc in $DIR_STACK; do
        echo "$num) $direc"
        num=$((num+1))
    done
    echo -n 'directory? '
    read REPLY

    if [ "$REPLY" -lt $num ] && [ "$REPLY" -gt 0 ]; then
        set -- $DIR_STACK

        #statements that manipulate the stack...

        done=true
    else
        echo 'invalid selection.'
    fi
done

The while loop is necessary so that the code repeats if the user makes an invalid choice. select includes the ability to construct multicolumn menus if there are many choices, and better handling of null user input.

Before leaving read, we should note that it has eight options: -a, -d, -e, -n, -p, -r, -t, and -s.^[6] The first of these options allows you to read values into an array. Each successive item read in is assigned to the given array starting at index 0. For example:

^[6] -a, -d, -e, -n, -p, -t and -s are not available in versions of bash prior to 2.0.

$ read -a people
alice duchess dodo
$ echo ${people[2]}
dodo
$

In this case, the array people now contains the items alice, duchess, and dodo.
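Once the words are in an array, they can be counted and visited by index. A short sketch, again using a here-string in place of typed input:

```shell
#!/usr/bin/env bash
read -a people <<< 'alice duchess dodo'
echo "${#people[@]} people"       # 3 people
for i in "${!people[@]}"; do      # the indices: 0, 1, 2
    echo "$i: ${people[i]}"       # 0: alice, 1: duchess, 2: dodo
done
```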

A delimiter can be specified with the -d option. read then stops at the first occurrence of the delimiter's first character, rather than at a NEWLINE. In this example only the s of stop counts, so read returns as soon as the s is typed:

$ read -d stop aline
alice duches$
$ echo $aline
alice duche
$
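-d is handy for input that isn't organized into lines. This sketch reads colon-separated fields one at a time; split_colons is a hypothetical name. The last field has no colon after it, so read returns a non-zero status for it even though it did fill the variable, which is why the loop also checks whether anything was read.

```shell
#!/usr/bin/env bash
# Print each colon-separated field from standard input on its own line.
split_colons() {
    local field
    # read fails on the final field (no ':' follows it), but $field is
    # still set; the -n test keeps that last field.
    while read -d : field || [ -n "$field" ]; do
        echo "$field"
    done
}

# Usage: printf '%s' "$PATH" | split_colons
```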

The option -e can be used only with scripts run from interactive shells. It causes readline to be used to gather the input line, which means that you can use any of the readline editing features that we looked at in Chapter 2.

The -n option specifies how many characters read will read. For example, if we tell it to read only ten characters, it returns as soon as it has read that many, without waiting for a NEWLINE:

$ read -n 10 aline
abcdefghij$
$ echo $aline
abcdefghij
$

The -p option followed by a string argument prints the string as a prompt before reading input. The prompt goes to standard error, as convention requires, and is displayed only when input is coming from a terminal. We could have used this in the earlier examples of read, where we printed out a prompt before doing the read. For example, the directory selection script could have used read -p 'directory? ' REPLY.

read lets you input lines that are longer than the width of your display by providing a backslash (\) as a continuation character, just as in shell scripts. The -r option overrides this, in case your script reads from a file that may contain lines that happen to end in backslashes. read -r also preserves any other escape sequences the input might contain. For example, if the file hatter contains this line:

A line with a\n escape sequence

Then read -r aline will include the backslash in the variable aline, whereas without the -r, read will "eat" the backslash. As a result:

$ read -r aline < hatter
$ echo -e "$aline"
A line with a
 escape sequence
$

However:

$ read aline < hatter
$ echo -e "$aline"
A line with an escape sequence
$
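When a script must process each line of a file exactly as written, a common idiom combines -r with an IFS that is empty for the duration of the read, so that neither backslashes nor leading and trailing blanks are altered. A sketch, with print_lines as a hypothetical name; it numbers each line so the preserved text is easy to check.

```shell
#!/usr/bin/env bash
# Copy standard input to standard output unchanged, numbering each line.
print_lines() {
    local line n=0
    # IFS= keeps leading/trailing blanks; -r keeps backslashes.
    # The -n test also handles a last line without a final NEWLINE.
    while IFS= read -r line || [ -n "$line" ]; do
        n=$((n+1))
        printf '%d: %s\n' "$n" "$line"
    done
}

# Usage: print_lines < hatter
```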

The -s option stops read from echoing the typed characters to the terminal. This can be useful when a script wants to take single-keystroke commands without displaying them (e.g., moving something around with the arrow keys). In this case it could be combined with the -n option to read a single character each time through a loop: read -s -n1 key
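Here is a sketch of such a loop. The key bindings (h, l, and q) and the function name key_loop are hypothetical. On a terminal the keys are not echoed; when input comes from a pipe, as in the usage line, -s simply has no effect.

```shell
#!/usr/bin/env bash
# Move a position left (h) or right (l) one step per keystroke; q quits.
key_loop() {
    local key pos=0
    while read -s -n1 key; do
        case $key in
            h) pos=$((pos-1)) ;;
            l) pos=$((pos+1)) ;;
            q) break ;;
        esac
    done
    echo "final position: $pos"
}

# Usage, non-interactively: printf 'llhlq' | key_loop
```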

The last option, -t, allows a time in seconds to be specified. read will wait the specified time for input and then finish. This is useful if you want a script to wait for input but continue processing if nothing is supplied.
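A sketch of a prompt with a fallback value follows; ask_default is a hypothetical name, and the 10-second limit is arbitrary. read returns a non-zero status both when the time runs out (bash then reports a status greater than 128) and at end-of-file, and in either case the function prints the default, as it does for an empty reply.

```shell
#!/usr/bin/env bash
# ask_default PROMPT DEFAULT: print the reply, or DEFAULT if nothing
# arrives within 10 seconds, input ends, or the reply is empty.
ask_default() {
    local reply
    if read -t 10 -p "$1" reply && [ -n "$reply" ]; then
        echo "$reply"
    else
        echo "$2"
    fi
}

# Usage: TERM=$(ask_default 'terminal? ' vt100)
```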