How awk Works


The basic operation of awk is simple. It reads input from a file, a pipe, or the keyboard, and searches each line of input for patterns that you have specified. When it finds a line that matches a pattern, it performs an action. You specify the patterns and actions in an awk program.

An awk program consists of one or more pattern/action statements of the form

 pattern {action} 

A statement like this tells awk to test for the pattern in every line of input, and to perform the corresponding action whenever the pattern matches the input line. The pattern/action concept is an extension of the target/search model used by grep. In grep, the target is a pattern, and the action is to print the line containing the pattern.

You can use awk as a replacement for grep. The following awk program searches for lines containing the word “widget.” When it finds such a line, it prints it.

 /widget/ {print}

The slashes indicate that you are searching for the target string “widget”. The action, print, is enclosed in braces.

Here is another example of a simple awk program:

 /widget/ {w_count=w_count+1}

The pattern is the same, but the action is different. In this case, whenever a line contains “widget,” the variable w_count is incremented by 1.
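
The counting program above produces no output on its own, because its only action updates a variable. As a minimal sketch, one way to see the result is to add an END statement (covered later in this chapter) that prints w_count once the input is exhausted. The three input lines are invented for illustration and are piped to awk on standard input.

```shell
# Hypothetical input lines; two of the three contain "widget".
count=$(printf '%s\n' 'widget 12' 'gadget 7' 'blue widget 3' |
  awk '/widget/ {w_count = w_count + 1} END {print w_count}')

# Show the number of matching lines.
echo "$count"
```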

The simplest way to run an awk program is to include it on the command line as an argument to the awk command, followed by the name of an input file. For example, the following program prints every line from the file inventory that contains the string “widget”:

 $ awk '/widget/ {print}' inventory

This command line consists of the awk command, then the text of the program itself in single quotes, and then the name of the input file, inventory. The program text is enclosed in single quotes to prevent the shell from interpreting its contents as separate arguments or as instructions to the shell.
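
To try the command above without a real inventory file, the following sketch builds a small one in a scratch directory first. The file contents are assumptions made up for this example; only the awk command itself comes from the text.

```shell
# Work in a scratch directory so no existing file is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical sample data; the real inventory file is not shown in the text.
printf '%s\n' 'widget 12' 'gadget 7' 'blue widget 3' > inventory

# Print every line that contains the string "widget".
out=$(awk '/widget/ {print}' inventory)
printf '%s\n' "$out"
```

The single quotes matter here: without them the shell would split the program at the space and treat the braces and slashes as its own syntax.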

Default Patterns and Actions

If you want the action to apply to every line in the file, omit the pattern. By default, awk will match every line, so an action statement with no pattern causes awk to perform that action for every line in the input. For example, the command

 $ awk '{print $1}' students

uses the special variable $1 to print the first field of every line in the file students.

You can also omit the action. The default action is to print an entire line, so if you specify a pattern with no action, awk will print every line that matches that pattern. For example,

 $ awk '/science/' students

will print every line in students that contains the string science.
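
The two defaults can be compared side by side. This is a sketch only; the three-line students file is invented for illustration.

```shell
# Work in a scratch directory so no existing file is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical student list: a name followed by a subject.
printf '%s\n' 'Ana science' 'Raj writing' 'Lee science' > students

# No pattern: the action runs for every line of input.
first_fields=$(awk '{print $1}' students)

# No action: each matching line is printed in full.
science_lines=$(awk '/science/' students)

printf '%s\n' "$first_fields" "$science_lines"
```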

Working with Fields

You may recall from Chapter 20 that the shell automatically assigns the variables $1, $2, and so on to the command-line arguments for a script. Similarly, awk automatically separates each line of input into fields and assigns the fields to variables. So $1 is the first field in each line, $2 is the second, and so on. The entire line is in $0.

This makes it easy to work with tables and other formatted text files. For example, instead of printing whole lines, you can print specific fields from a table. Suppose you have the following list of names, states, and phone numbers:

 Ben       IN      650-333-4321
 Dan       AK      907-671-4321
 Marissa   NJ      732-741-1234
 Robin     CA      650-273-1234

If you want to print the names of everyone in area code 650, the pattern to match is 650-, and the action when a match is found is to print the name in the first field.

You can use the awk program

 /650-/ {print $1}

where $1 indicates the first field in each line. You can run this program with the following command:

 $ awk '/650-/ {print $1}' contacts

This produces the following output:

 Ben
 Robin
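
The same approach extends to more than one field. The sketch below recreates the contact list from the text in a scratch directory and prints both the name and the state for each match. Separating the items in print with a comma makes awk put a single space between them in the output.

```shell
# Work in a scratch directory so no existing file is overwritten.
cd "$(mktemp -d)" || exit 1

# The contact list shown in the text.
cat > contacts <<'EOF'
Ben       IN      650-333-4321
Dan       AK      907-671-4321
Marissa   NJ      732-741-1234
Robin     CA      650-273-1234
EOF

# Print fields 1 and 2 (name and state) for every line containing "650-".
out=$(awk '/650-/ {print $1, $2}' contacts)
printf '%s\n' "$out"
```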

Fields are separated by a field separator. The default field separator is white space, consisting of any number of spaces and/or tabs. This means that each word in a line is a separate field. Many structured files use a field separator other than a space, such as a colon, a comma, or a single tab, so that you can have several words in one field. You can use the -F option on the command line to specify the field separator. For example,

 $ awk -F, 'program goes here'

specifies a comma as the separator, and

 $ awk -F"\t" 'program goes here'

tells awk to use a tab as a separator. Since the backslash is a special character in the shell, it must be enclosed in quotation marks. Otherwise, the effect would be to tell awk to use t as the field separator.
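
As an illustration of both separators, the sketch below feeds awk one comma-separated line and one tab-separated line. The data is invented for this example. In each case the field being printed contains or sits next to a space that the default separator would have split on.

```shell
# Comma-separated input: the second field contains a space.
city=$(printf '%s\n' 'Ben,San Jose,650-333-4321' | awk -F, '{print $2}')

# Tab-separated input: printf turns \t into a real tab character,
# so the first field is "Dan Smith" and the second is "AK".
state=$(printf 'Dan Smith\tAK\n' | awk -F"\t" '{print $2}')

printf '%s\n' "$city" "$state"
```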

Using Standard Input and Output

Like most UNIX System commands, awk uses standard input and output. If you do not specify an input file, the program will read and act on standard input. This allows you to use an awk program as a part of a command pipeline. For example, it is common to use sort to sort data before awk operates on it:

 $ sort input_file | awk -f program_file

Because the default for standard input is the keyboard, if you do not specify an input file, and if it is not part of a pipeline, an awk program will read and act on lines that you type in from the keyboard. This can be useful for testing your awk programs. Remember that you can terminate input by typing CTRL-D.
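
A small sketch of awk reading standard input in a pipeline: printf stands in for a real input file, sort orders the lines, and awk prints the first field of each sorted line. The three input lines are assumptions for illustration.

```shell
# awk has no file argument, so it reads what sort writes to the pipe.
out=$(printf '%s\n' 'Robin CA' 'Ben IN' 'Dan AK' | sort | awk '{print $1}')
printf '%s\n' "$out"
```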

As with any command that uses standard output, you can redirect output from an awk program to a file or to a pipeline. For example, the command

 $ awk '{print $1}' contacts > namelist

copies the first field from each line of contacts to a file called namelist.

You can get input from multiple files by listing each filename on the command line. awk takes its input from each file in turn. For example, the following command line reads and acts on all of the first file, phone1, and then reads and acts on the second file, phone2. It sends the output (the first field of each line in both files) to lp.

 $ awk '{print $1}' phone1 phone2 | lp
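
The order in which awk reads several files can be checked with a sketch like the following. The contents of phone1 and phone2 are invented, and the output is redirected to a file rather than piped to lp, on the assumption that no printer is configured.

```shell
# Work in a scratch directory so no existing file is overwritten.
cd "$(mktemp -d)" || exit 1

# Hypothetical phone lists.
printf '%s\n' 'Ben 650-333-4321' 'Dan 907-671-4321' > phone1
printf '%s\n' 'Marissa 732-741-1234' > phone2

# awk reads phone1 to the end, then phone2, as if they were one input.
awk '{print $1}' phone1 phone2 > namelist

out=$(cat namelist)
printf '%s\n' "$out"
```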

Running an awk Program from a File

You can store the text of an awk program in a file. To run a program from a file, use awk -f, followed by the filename. The following command line runs the program saved in the file prog_file. awk takes its input from input_file:

 $ awk -f prog_file input_file

If the file is not in the current directory, you must give awk a pathname (full or relative) for it. If you are using gawk, you can use the environment variable AWKPATH to specify a list of directories to search for program files. A typical default AWKPATH is .:/usr/lib/awk:/usr/local/lib/awk, although the exact value depends on how gawk was built and installed. If you modify your AWKPATH, you may want to save it in your shell configuration file (e.g., in .bash_profile if you are using bash).

Here’s how you could set and use AWKPATH in bash:

 $ export AWKPATH=$AWKPATH:$HOME/bin/awk
 $ ls ~/bin/awk
 testprog
 $ gawk -f testprog testinput

An even better way to save an awk program in a file is to create an executable script. If you add the line #!/bin/awk -f (where /bin/awk is the path for awk on your system) to the top of your file, you can run the program as a stand-alone script. You must have execute permission on the file before you can run it.

 $ cat sampleProg
 #!/bin/awk -f
 /black/ {print}
 $ chmod u+x sampleProg
 $ ./sampleProg inputfile
 Sphinx of black quartz, judge my vow.

When you run this script, the shell reads the first line and calls awk, which runs the program.
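
Because the path to awk differs between systems, a script can be generated with whatever path the shell finds. The sketch below is one way to do that, not the only one. It uses command -v to locate awk, and the second input line is an invented non-matching line.

```shell
# Work in a scratch directory so no existing file is overwritten.
cd "$(mktemp -d)" || exit 1

# Look up where awk lives on this system instead of assuming /bin/awk.
awk_path=$(command -v awk)

# Write the script: the #! line first, then the awk program.
printf '#!%s -f\n' "$awk_path" > sampleProg
printf '%s\n' '/black/ {print}' >> sampleProg

# The file must be executable before it can be run directly.
chmod u+x sampleProg

printf '%s\n' 'Sphinx of black quartz, judge my vow.' 'A line with no match.' > inputfile
out=$(./sampleProg inputfile)
printf '%s\n' "$out"
```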

Multiline Programs

You can do a surprising amount with one-line awk programs, but programs can also contain many lines. Multiline programs simply consist of multiple pattern/action statements. Each line of input is checked against all of the patterns in turn. For each matching pattern, the corresponding action is performed. For example,

 $ cat countStudents
 # Count the number of lines containing "science" or "writing"
 /science/ { sci = sci + 1 }
 /writing/ { wri = wri + 1 }
 # At the end of the input, print the totals
 END {print sci " science and " wri " writing students." }
 $ awk -f countStudents student-list
 47 science and 39 writing students.

This program uses the END statement to perform an action at the end of the input. See the section “BEGIN and END” later in this chapter for more information about how END works.
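
Since every line is tested against every pattern, a single line can trigger more than one action. The sketch below shows this with an invented three-line student list in which one line mentions both subjects. The program file name countBoth is hypothetical.

```shell
# Work in a scratch directory so no existing file is overwritten.
cd "$(mktemp -d)" || exit 1

# A two-pattern program with an END statement that prints the totals.
cat > countBoth <<'EOF'
/science/ { sci = sci + 1 }
/writing/ { wri = wri + 1 }
END { print sci " science and " wri " writing students." }
EOF

# Hypothetical input; the second line matches both patterns.
printf '%s\n' 'Ana science' 'Raj science writing' 'Lee writing' > student-list

out=$(awk -f countBoth student-list)
printf '%s\n' "$out"
```

Three input lines yield a total of four, because the second line is counted once by each pattern.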

An action statement can also continue over multiple lines. Although you can chain together multiple actions using semicolons, your programs will be easier to read if you break them up into separate lines. If you do, the opening brace of the action must be on the same line as the pattern it matches. You can have as many lines as you want in the action before the final brace. For example,

 $ cat numberLines
 # Add line numbers to the input
 # Since there is no pattern, do this to every line in the file
 {
   n = n + 1             # add 1 to the number of lines
   print n " " $0        # print the line number, a space, and the original line
 }

The comments in these programs make them easier to read. Like the shell, awk uses the # symbol for comments. A comment begins with the # character and runs to the end of the line, and awk ignores it whether it occupies a whole line or follows a statement.
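
To see the line-numbering program in action, the sketch below saves it to a file in a scratch directory and runs it on two invented input lines supplied on standard input.

```shell
# Work in a scratch directory so no existing file is overwritten.
cd "$(mktemp -d)" || exit 1

# Save the multiline program; the quoted 'EOF' keeps $0 away from the shell.
cat > numberLines <<'EOF'
# Add line numbers to the input
{
  n = n + 1             # add 1 to the number of lines
  print n " " $0        # print the line number, a space, and the original line
}
EOF

out=$(printf '%s\n' 'first line' 'second line' | awk -f numberLines)
printf '%s\n' "$out"
```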




UNIX: The Complete Reference, Second Edition (Complete Reference Series)
ISBN: 0072263369
Year: 2006
Pages: 316
