Input and Output | UNIX: The Complete Reference, Second Edition (Complete Reference Series)

As mentioned at the beginning of this chapter, awk uses standard input and output. This means that you can use the normal shell redirection operators to save output to a file (or read input from a file). You can include awk in command pipelines, and you can get input from the keyboard if no file is specified. However, awk also has a few special features for working with input and output.

Getting Input

Normally your awk program gets input from the file or files that you specify when you run the command, or from standard input if no files are specified. Sometimes, however, you need to get input from another source in addition to this input file. For example, as part of a program you may want to display a message and get a response that the user types in at the keyboard.

You can use the getline function to read a line of input from the keyboard or another file. By default, getline reads its input from the same file that you specified on the awk command line. Each time it is called, it reads the next line and splits it into fields. This is useful if you want precise control over the input loop; for example, you may wish to read the file only up to a certain point and then go to an END statement.
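As a minimal sketch of that kind of control, the following reads its input with plain getline inside a BEGIN block and stops at a marker line. The marker text STOP, the sample data, and the variable names count and result are assumptions made for illustration.

```shell
# Sample input is supplied by printf; "STOP" is a hypothetical marker line.
result=$(printf 'alpha\nbeta\nSTOP\ngamma\n' | awk '
BEGIN {
    # Plain getline returns 1 when it reads a line, 0 at end of input,
    # and -1 on error. It sets $0, the fields, and NR.
    while ((getline) > 0) {
        if ($0 == "STOP")
            break               # stop reading; the rest of the input is ignored
        count++
        print NR ": " $1
    }
    print "lines before marker:", count
}')
printf '%s\n' "$result"
```

Testing the return value of getline keeps the loop from running forever when the input ends before the marker is found.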

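The following instruction reads the next line of the current input (the files named on the command line, or standard input if none were named) and assigns it to the variable X. Because the line goes into X, $0 and the fields are left unchanged: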

 getline X

To get input from another file, you redirect the input to getline, as in this example:

 getline < "my_file"

This will read the next line of the file my_file. You can then use the built-in variables $0, $1, and so on to work with the line. Note that unlike shell file redirection, awk requires you to put quotes around the filename, or it will be interpreted as a variable name. You might use getline like this to combine data from two different files, by reading in data from my_file in addition to whatever file you may have opened from the command line. You can also read input from a named file and assign it to a variable, as in this example:

 getline nextline < "my_file"

This reads a line from my_file and assigns it to nextline.
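One way this might be used to combine two files is sketched below: for each record of the main input, the program reads one line from a second file. The temporary price file, the sample data, and the names prices, pricefile, and price are assumptions for illustration. Here pricefile is deliberately unquoted inside the program, because it is an awk variable that holds the filename.

```shell
prices=$(mktemp)                      # hypothetical second file
printf '1.50\n2.25\n' > "$prices"

result=$(printf 'widget\ngadget\n' | awk -v pricefile="$prices" '
{
    # Read the next line of the second file into price.
    # $0 and the fields of the main record are not changed.
    if ((getline price < pricefile) > 0)
        print $1, price
    else
        print $1, "(no price)"        # second file ran out or could not be read
}')
rm -f "$prices"
printf '%s\n' "$result"
```

The parentheses around the getline expression matter: without them, the comparison with 0 could be parsed as part of the filename expression.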

The UNIX System identifies the keyboard as the logical file /dev/tty. To read a line from the keyboard, use getline with /dev/tty as the filename. You must enclose /dev/tty in quotes, as in "/dev/tty", just as you would any other string or filename.

This example shows how you could use getline for keyboard input to add information interactively to a file. The following program fragment prints the item name (field 1) and old price (field 4) for each inventory record, reads the new price that the user types at the keyboard, substitutes the new price, and then writes the new record to the file outputfile:

 {
   print $1, "Old price:", $4
   getline new < "/dev/tty"
   $4=new
   print "New price:", $0 > "outputfile"
 }

Using Command-Line Arguments

Normally awk interprets words on the command line after the program as names of input files. However, it is possible to use the command line to give arguments to an awk program.

The number of command-line arguments is stored in the built-in variable ARGC. The command-line arguments themselves are stored in a built-in array called ARGV. The awk command itself is counted as an argument, so ARGV[0] is awk and ARGC is always at least 1. Options and the program text are not included in ARGV. ARGV[1] is the first argument after the program, ARGV[2] comes after that, and so on.

Since by default awk treats words on the command line as input filenames, you must tell it not to try to read the contents of your command-line arguments. If you want to use a word as an argument, you must read its value in a BEGIN statement and then set the corresponding ARGV element to null so that it will not be treated as a filename. For example,

 BEGIN {
   searchpattern=ARGV[1]
   ARGV[1]=""
 }
 $0 ~ searchpattern {print}

sets a variable called searchpattern equal to the first command-line argument and then sets ARGV[1] equal to null (the empty string) so that awk will not try to read in lines from it. The program then treats the value of searchpattern as a regular expression and prints every input line that matches it.
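A sketch of how such a program might be run from the shell follows. The sample input and the argument red are assumptions for illustration, and an extra print statement shows the value of ARGC. Because the only argument is set to the empty string, awk has no filenames left and reads standard input.

```shell
result=$(printf 'red apple\ngreen pear\nred plum\n' | awk '
BEGIN {
    print "ARGC =", ARGC        # counts awk itself plus one argument
    searchpattern = ARGV[1]     # save the argument before awk uses it
    ARGV[1] = ""                # empty element: not opened as a file
}
$0 ~ searchpattern { print }
' red)
printf '%s\n' "$result"
```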

Printing Output

There are two commands for printing output in awk. One of these is the print command, which you have been using. The other command, printf, can be used to print formatted output.

The print command has this form:

 print expr1, expr2, ...

The expressions may be variables, strings, or any other awk expression. The commas are necessary if you want items separated by the output field separator. If you leave out the commas, the values will be printed with no separators between them. Remember that if you want to print a string, it must be enclosed in quotes. A word that is not enclosed in quotes is treated as a variable. By itself, print prints the entire record ($0).
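The difference between these forms can be seen in a short sketch. The one-line sample record is an assumption for illustration.

```shell
result=$(printf 'widget 4 1.50\n' | awk '{
    print $1, $2            # commas: items separated by OFS (a space by default)
    print $1 $2             # no commas: the two values are concatenated
    print "total", $2 * $3  # a quoted string and an arithmetic expression
    print                   # by itself: the entire record ($0)
}')
printf '%s\n' "$result"
```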

You can control the character used to separate output fields by setting the output field separator (OFS) variable. The following statement prints the item name and selling price from an inventory file, using a tab as the output field separator:

 BEGIN {OFS="\t"} {print $1, $4}

The printf command provides formatted output, similar to C. With printf, you can print out a number as an integer, with different numbers of decimal places, or in octal or hex. You can print a string left-justified, truncated to a specific length, or padded with initial spaces. For example,

 printf("%s\n%d\n%f\n", $1, $2, $3)

will print three fields (a string, an integer, and a floating-point number), each followed by a newline.
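The justification, padding, truncation, and number-base options mentioned above might look like the following sketch. The sample record and the particular field widths are assumptions chosen for illustration.

```shell
result=$(printf 'widget 4 1.5\n' | awk '{
    # %-10s  string left-justified in a field 10 characters wide
    # %5d    integer right-justified in a field 5 characters wide
    # %8.2f  floating-point number, 8 wide, 2 decimal places
    printf("%-10s|%5d|%8.2f\n", $1, $2, $3)
    # %.3s   string truncated to 3 characters; %o octal; %x hexadecimal
    printf("%.3s %o %x\n", $1, 64, 255)
}')
printf '%s\n' "$result"
```

Unlike print, printf does not add a newline or the output field separator on its own, so both must appear in the format string if you want them.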

Sending Output to Files

You can use the shell redirection operators on the command line to save output from an awk program in a file or pipe it to a command. But you can also use file redirection inside a program to send part of the output to a file. For example,

 {
   if ($6 ~ "toy")
     print $0 >> "toy_file"
   else
     print $0 >> "alt_file"
 }

This separates an inventory file into two parts based on the contents of the sixth field. The operator >> is used to append output to the files toy_file and alt_file.
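A runnable sketch of the same idea is shown below. It writes into a temporary directory so that no existing files are touched. The directory, the sample records, and the variable names toys and other are assumptions for illustration. The close calls in the END statement are an addition that is not in the fragment above; they flush and close the output files explicitly.

```shell
dir=$(mktemp -d)                          # scratch directory for the output files
printf 'ball 1 2 3 4 toy\nlamp 1 2 3 4 home\nkite 1 2 3 4 toy\n' |
awk -v toys="$dir/toy_file" -v other="$dir/alt_file" '
{
    if ($6 ~ "toy")
        print $0 >> toys                  # append records whose field 6 matches
    else
        print $0 >> other                 # everything else goes to the other file
}
END { close(toys); close(other) }'

toy_lines=$(cat "$dir/toy_file")
alt_lines=$(cat "$dir/alt_file")
rm -rf "$dir"
printf '%s\n' "$toy_lines" "$alt_lines"
```

With > in place of >>, awk would empty each file the first time the program writes to it and then keep adding lines for the rest of the run; >> preserves whatever the files held before the program started.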