Chapter 5. The awk Utility: awk as a UNIX Tool

CONTENTS

5.1 What Is awk?
5.2 awk's Format
5.3 Formatting Output
5.4 awk Commands from Within a File
5.5 Records and Fields
5.6 Patterns and Actions
5.7 Regular Expressions
5.8 awk Commands in a Script File
5.9 Review
UNIX TOOLS LAB EXERCISE

5.1 What Is awk?

Awk is a programming language used for manipulating data and generating reports. The data may come from standard input, one or more files, or as output from a process. Awk can be used at the command line for simple operations, or it can be written into programs for larger applications. Because awk can manipulate data, it is an indispensable tool used in shell scripts and for managing small databases.

Awk scans a file (or input) line by line, from the first to the last line, searching for lines that match a specified pattern and performing selected actions (enclosed in curly braces) on those lines. If there is a pattern with no action, all lines that match the pattern are displayed; if there is an action with no pattern, the action is performed on every input line.

5.1.1 What Does awk Stand for?

Awk stands for the first initials in the last names of each of the authors of the language, Alfred Aho, Brian Kernighan, and Peter Weinberger. They could have called it wak or kaw, but for whatever reason, awk won out.

5.1.2 Which awk?

There are a number of versions of awk: old awk, new awk, gnu awk (gawk), POSIX awk, and others. Awk was originally written in 1977, and in 1985, the original implementation was improved so that awk could handle larger programs. Additional features included user-defined functions, dynamic regular expressions, processing multiple input files, and more. On most systems, the command is awk if using the old version, nawk if using the new version, and gawk if using the gnu version.^[1]

5.2 awk's Format

An awk program consists of the awk command, the program instructions enclosed in quotes (or in a file), and the name of the input file. If an input file is not specified, input comes from standard input (stdin), the keyboard.

Awk instructions consist of patterns, actions, or a combination of patterns and actions. A pattern is a statement consisting of an expression of some type. If you do not see the keyword if, but you think the word if when evaluating the expression, it is a pattern. Actions consist of one or more statements separated by semicolons or newlines and enclosed in curly braces. Patterns cannot be enclosed in curly braces, and consist of regular expressions enclosed in forward slashes or expressions consisting of one or more of the many operators provided by awk.

Awk commands can be typed at the command line or in awk script files. The input lines can come from files, pipes, or standard input.

5.2.1 Input from Files

In the following examples, the percent sign (%) is the C shell prompt.

FORMAT

% nawk 'pattern' filename
% nawk '{action}' filename
% nawk 'pattern {action}' filename

Here is a sample file called employees:

Example 5.1

% cat employees
Tom Jones     4424   5/12/66   543354
Mary Adams    5346   11/4/63   28765
Sally Chang   1654   7/22/54   650000
Billy Black   1683   9/23/44   336500

% nawk '/Mary/' employees
Mary Adams    5346   11/4/63   28765

EXPLANATION

Nawk prints all lines that contain the pattern Mary.

Example 5.2

% cat employees
Tom Jones     4424    5/12/66   543354
Mary Adams    5346    11/4/63   28765
Sally Chang   1654    7/22/54   650000
Billy Black   1683    9/23/44   336500

% nawk '{print $1}' employees
Tom
Mary
Sally
Billy

EXPLANATION

Nawk prints the first field of file employees, where the field starts at the left margin of the line and is delimited by whitespace.

Example 5.3

% cat employees
Tom Jones     4424    5/12/66   543354
Mary Adams    5346    11/4/63   28765
Sally Chang   1654    7/22/54   650000
Billy Black   1683    9/23/44   336500

% nawk '/Sally/{print $1, $2}' employees
Sally Chang

EXPLANATION

Nawk prints the first and second fields of file employees, only if the line contains the pattern Sally. Remember, the field separator is whitespace.

5.2.2 Input from Commands

The output from a UNIX command or commands can be piped to awk for processing. Shell programs commonly use awk to manipulate the output of commands.

FORMAT

% command | nawk 'pattern'
% command | nawk '{action}'
% command | nawk 'pattern {action}'

Example 5.4

1   % df | nawk '$4 > 75000'
    /oracle   (/dev/dsk/c0t0d057 ):390780 blocks      105756 files
    /opt      (/dev/dsk/c0t0d058 ):1943994 blocks      49187 files

2   % rusers | nawk '/root$/{print $1}'
    owl
    crow
    bluebird

EXPLANATION

1. The df command reports the free disk space on file systems. The output of the df command is piped to nawk (new awk). If the fourth field is greater than 75,000 blocks, the line is printed.
2. The rusers command prints those logged on remote machines on the network. The output of the rusers command is piped to nawk as input. The first field is printed if the regular expression root is matched at the end of the line ($); that is, all machine names are printed where root is logged on.

5.3 Formatting Output

5.3.1 The print Function

The action part of the awk command is enclosed in curly braces. If no action is specified and a pattern is matched, awk takes the default action, which is to print the lines that are matched to the screen. The print function is used to print simple output that does not require fancy formatting. For more sophisticated formatting, the printf or sprintf functions are used. If you are familiar with C, then you already know how printf and sprintf work.

The print function can also be explicitly used in the action part of awk as {print}. The print function accepts arguments as variables, computed values, or string constants. Strings must be enclosed in double quotes. Commas are used to separate the arguments; if commas are not provided, the arguments are concatenated together. The comma evaluates to the value of the output field separator (OFS), which is by default a space.

The output of the print function can be redirected or piped to another program, and the output of another program can be piped to awk for printing. (See "Redirection" on page 16 and "Pipes" on page 19.)
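Both directions can be sketched briefly. This is a minimal illustration, assuming a POSIX awk is installed as awk (substitute nawk or gawk as needed); the scratch directory and the file name names.txt are hypothetical.

```shell
# Work in a scratch directory so nothing is left behind (hypothetical paths).
dir=$(mktemp -d)

# Redirect print output to a file from inside awk.
printf 'Tom Jones\nMary Adams\n' |
    awk -v out="$dir/names.txt" '{ print $1 > out }'
saved=$(cat "$dir/names.txt")

# Pipe print output to another program; the command is a quoted string.
sorted=$(printf 'Tom\nMary\nBilly\n' | awk '{ print $1 | "sort" }')

echo "$saved"
echo "$sorted"
rm -rf "$dir"
```

The first command leaves the first field of each line in the file; the second hands every printed line to sort, which writes the sorted list when awk exits.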

Example 5.5

% date
Wed Jul 28 22:23:16 PDT 2001

% date | nawk '{ print "Month: " $2 "\nYear: " , $6 }'
Month: Jul
Year:  2001

EXPLANATION

The output of the UNIX date command will be piped to nawk. The string Month: is printed, followed by the second field, the string containing the newline character, \n, and Year:, followed by the sixth field ($6).

Escape Sequences. Escape sequences are represented by a backslash and a letter or number. They can be used in strings to represent tabs, newlines, form feeds, and so forth (see Table 5.1).

Table 5.1. print Escape Sequences
Escape Sequence	Meaning
\b	Backspace.
\f	Form feed.
\n	Newline.
\r	Carriage return.
\t	Tab.
\047	Octal value 47, a single quote.
\c	c represents any other character, e.g., \".
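Two of the sequences in Table 5.1 can be tried at the command line. This is a small sketch, assuming a POSIX awk installed as awk; the input string Tom is only sample data.

```shell
# \t inserts a tab; \047 is the octal code for a single quote, which is
# awkward to type inside a single-quoted awk program.
line=$(echo "Tom" | awk '{ print "Name:\t\047" $1 "\047" }')
echo "$line"
```

The octal form is handy because a literal single quote would otherwise end the quoted awk program as far as the shell is concerned.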

Example 5.6

% cat employees
Tom Jones      4424    5/12/66   543354
Mary Adams     5346    11/4/63   28765
Sally Chang    1654    7/22/54   650000
Billy Black    1683    9/23/44   336500

% nawk '/Sally/{print "\t\tHave a nice day, " $1, $2 "\!"}' employees
                Have a nice day, Sally Chang!

EXPLANATION

If the line contains the pattern Sally, the print function prints two tabs, the string Have a nice day, the first (where $1 is Sally) and second fields (where $2 is Chang), followed by a string containing an exclamation mark.

5.3.2 The OFMT Variable

When printing numbers, you may want to control the format of the number. Normally this would be done with the printf function, but the special awk variable, OFMT, can be set to control the printing of numbers when using the print function. It is set by default to "%.6g", which prints at most six significant digits in total, not six digits to the right of the decimal point. (The following example shows how this value can be changed.)

Example 5.7

% nawk 'BEGIN{OFMT="%.2f"; print 1.2456789, 12E-2}'
1.25 0.12

EXPLANATION

The OFMT variable is set so that floating point numbers (f) will be printed with two digits following the decimal point. The percent sign (%) indicates a format is being specified.
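For comparison, the default setting can be observed directly. This is a minimal sketch, assuming a POSIX awk installed as awk; the numbers are arbitrary sample values.

```shell
# Default OFMT ("%.6g"): six significant digits in total.
default_out=$(awk 'BEGIN { print 3.14159265, 1234.56789 }')

# Integer values are printed as integers regardless of OFMT.
int_out=$(awk 'BEGIN { OFMT = "%.2f"; print 17, 1.2456789 }')

echo "$default_out"   # 3.14159 1234.57
echo "$int_out"       # 17 1.25
```

Note that 1234.56789 keeps only two decimal places under the default, because the six digits are counted across the whole number.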

5.3.3 The printf Function

When printing output, you may want to specify the amount of space between fields so that columns line up neatly. Since the print function with tabs does not always guarantee the desired output, the printf function can be used for formatting fancy output.

The printf function writes a formatted string to standard output, like the printf statement in C. The printf statement consists of a quoted control string that may be embedded with format specifications and modifiers. The control string is followed by a comma and a list of comma-separated expressions that will be formatted according to the specifications stated in the control string. Unlike the print function, printf does not provide a newline. The escape sequence, \n, must be provided if a newline is desired.

For each percent sign and format specifier, there must be a corresponding argument. To print a literal percent sign, two percent signs must be used. See Table 5.2 for a list of printf conversion characters and Table 5.3 for printf modifiers. The format specifiers are preceded by a percent sign; see Table 5.4 for a list of printf format specifiers.
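The two rules about the newline and the literal percent sign can be seen in one short command. This is a sketch only, assuming a POSIX awk installed as awk.

```shell
# printf adds no newline of its own, so the two calls share one line
# until "\n" is written explicitly; "%%" yields one literal percent sign.
out=$(awk 'BEGIN { printf "%d%%", 50; printf " done\n" }')
echo "$out"   # 50% done
```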

When an argument is printed, the place where the output is printed is called the field, and the width of the field is the number of characters contained in that field.

The pipe symbol (vertical bar) in the following examples, when part of the printf string, is part of the text and is used to indicate where the formatting begins and ends.

Example 5.8

1   % echo "UNIX" | nawk ' {printf "|%-15s|\n", $1}'
    (Output)
    |UNIX           |

2   % echo "UNIX" | nawk '{ printf "|%15s|\n", $1}'
    (Output)
    |           UNIX|

EXPLANATION

1. The output of the echo command, UNIX, is piped to nawk. The printf function contains a control string. The percent sign alerts printf that it will be printing a 15-space, left-justified string enclosed in vertical bars and terminated with a newline. The dash after the percent sign indicates left justification. The control string is followed by a comma and $1. The string UNIX will be formatted according to the format specification in the control string.
2. The string UNIX is printed in a right-justified, 15-space string, enclosed in vertical bars, and terminated with a newline.

Example 5.9

% cat employees
Tom Jones     4424    5/12/66   543354
Mary Adams    5346    11/4/63   28765
Sally Chang   1654    7/22/54   650000
Billy Black   1683    9/23/44   336500

% nawk '{printf "The name is: %-15s ID is %8d\n", $1, $3}' employees
The name is: Tom             ID is     4424
The name is: Mary            ID is     5346
The name is: Sally           ID is     1654
The name is: Billy           ID is     1683

EXPLANATION

The string to be printed is enclosed in double quotes. The first format specifier is %-15s. It has a corresponding argument, $1, positioned directly to the right of the comma after the closing quote in the control string. The percent sign indicates a format specification: The dash means left justify, the 15s means 15-space string. At this spot, print a left-justified, 15-space string followed by the string ID is and a number.

The %8d format specifies that the decimal (integer) value of $3 will be printed in its place within the string. The number will be right justified and take up eight spaces. Placing the quoted string and expressions within parentheses is optional.

Table 5.2. printf Conversion Characters
Conversion Character	Definition
c	Character.
s	String.
d	Decimal number.
ld	Long decimal number.
u	Unsigned decimal number.
lu	Long unsigned decimal number.
x	Hexadecimal number.
lx	Long hexadecimal number.
o	Octal number.
lo	Long octal number.
e	Floating point number in scientific notation (e-notation).
f	Floating point number.
g	Floating point number using either e or f conversion, whichever takes the least space.

Table 5.3. printf Modifiers
Character	Definition
-	Left-justification modifier.
#	Integers in octal format are displayed with a leading 0; integers in hexadecimal form are displayed with a leading 0x.
+	For conversions using d, e, f, and g, integers are displayed with a numeric sign + or -.
0	The displayed value is padded with zeros instead of whitespace.
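The modifiers in Table 5.3 can be combined in a single control string. This is a minimal sketch, assuming a POSIX awk installed as awk; the arguments are arbitrary sample values, and the vertical bars only mark where each field begins and ends.

```shell
# -  left-justify in the field      # prefix octal with 0, hex with 0x
# +  always show the sign           0 pad with zeros instead of spaces
out=$(awk 'BEGIN { printf "|%-5s|%#o|%#x|%+d|%05d|\n", "ab", 8, 255, 7, 42 }')
echo "$out"   # |ab   |010|0xff|+7|00042|
```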

Table 5.4. printf Format Specifiers
Format Specifier	What It Does
Given x = 'A', y = 15, z = 2.3, and $1 = Bob Smith:
%c	Prints a single ASCII character. printf("The character is %c\n",x) prints: The character is A.
%d	Prints a decimal number. printf("The boy is %d years old\n", y) prints: The boy is 15 years old.
%e	Prints the e notation of a number. printf("z is %e\n",z) prints: z is 2.300000e+00.
%f	Prints a floating point number. printf("z is %f\n", 2.3 *2) prints: z is 4.600000.
%o	Prints the octal value of a number. printf("y is %o\n", y) prints: y is 17.
%s	Prints a string of characters. printf("The name of the culprit is %s\n", $1) prints: The name of the culprit is Bob Smith.
%x	Prints the hex value of a number. printf ("y is %x\n", y) prints: y is f.
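The values in Table 5.4 can be checked with one command. This is a sketch under stated assumptions: a POSIX awk installed as awk, and a colon chosen as the field separator with -F: purely so that $1 holds the whole string Bob Smith, as the table assumes.

```shell
# x, y, and z take the values given at the top of Table 5.4.
out=$(echo "Bob Smith" | awk -F: '{
    x = "A"; y = 15; z = 2.3
    printf("%c %d %e %f %o %x %s\n", x, y, z, z * 2, y, y, $1)
}')
echo "$out"   # A 15 2.300000e+00 4.600000 17 f Bob Smith
```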

5.4 awk Commands from Within a File

If awk commands are placed in a file, the -f option is used with the name of the awk file, followed by the name of the input file to be processed. A record is read into awk's buffer and each of the commands in the awk file is tested and executed for that record. After awk has finished with the first record, it is discarded and the next record is read into the buffer, and so on. If an action is not controlled by a pattern, the action is performed for every record. If a pattern does not have an action associated with it, the default is to print each record that the pattern matches.

Example 5.10

(The Database)
   $1       $2         $3         $4           $5
   Tom      Jones      4424       5/12/66      543354
   Mary     Adams      5346       11/4/63      28765
   Sally    Chang      1654       7/22/54      650000
   Billy    Black      1683       9/23/44      336500

   % cat awkfile
1  /^Mary/{print "Hello Mary!"}
2  {print $1, $2, $3}

   % nawk -f awkfile employees
   Tom Jones 4424
   Hello Mary!
   Mary Adams 5346
   Sally Chang 1654
   Billy Black 1683

EXPLANATION

1. If the record begins with the regular expression Mary, the string Hello Mary! is printed. The action is controlled by the pattern preceding it. Fields are separated by whitespace.
2. The first, second, and third field of each record are printed. The action occurs for each line because there is not a pattern controlling the action.

5.5 Records and Fields

5.5.1 Records

Awk does not see input data as an endless string of characters, but sees it as having a format or structure. By default, each line is called a record and is terminated with a newline.

The Record Separator. By default, the output and input record separator (line separator) is a newline, stored in the built-in awk variables ORS and RS, respectively. The ORS and RS values can be changed, but only in a limited fashion.
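As a small illustration of changing RS, the sketch below splits its input on semicolons instead of newlines. It assumes a POSIX awk installed as awk and sets the variable in a BEGIN block, which runs before any input is read; the input string is sample data.

```shell
# Each semicolon now ends a record, so one line of input yields three records.
records=$(printf 'Tom;Mary;Sally' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }')
echo "$records"
```

A single-character separator such as this one is the portable case; longer values of RS are handled differently by different awk versions.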

The $0 Variable. An entire record is referenced as $0 by awk. (When $0 is changed by substitution or assignment, the value of NF, the number of fields, may be changed.) The record is terminated by the value stored in awk's built-in variable RS, a newline by default.

Example 5.11

% cat employees
Tom Jones     4424     5/12/66     543354
Mary Adams    5346     11/4/63     28765
Sally Chang   1654     7/22/54     650000
Billy Black   1683     9/23/44     336500

% nawk '{print $0}' employees
Tom Jones     4424     5/12/66     543354
Mary Adams    5346     11/4/63     28765
Sally Chang   1654     7/22/54     650000
Billy Black   1683     9/23/44     336500

EXPLANATION

The nawk variable $0 holds the current record. It is printed to the screen. By default, nawk would also print the record if the command were

% nawk '{print}' employees

The NR Variable. The number of each record is stored in awk's built-in variable, NR. Each time a record is read, the value of NR is incremented by one.

Example 5.12

% cat employees
Tom Jones     4424     5/12/66     543354
Mary Adams    5346     11/4/63     28765
Sally Chang   1654     7/22/54     650000
Billy Black   1683     9/23/44     336500

% nawk '{print NR, $0}' employees
1 Tom Jones     4424     5/12/66     543354
2 Mary Adams    5346     11/4/63     28765
3 Sally Chang   1654     7/22/54     650000
4 Billy Black   1683     9/23/44     336500

EXPLANATION

Each record, $0, is printed as it is stored in the file and is preceded with the number of the record, NR.

5.5.2 Fields

Each record consists of words called fields which, by default, are separated by whitespace, that is, blank spaces or tabs. Awk keeps track of the number of fields in its built-in variable, NF. The value of NF can vary from line to line, and the limit is implementation-dependent, typically 100 fields per line. New fields can be created. The following example has four records (lines) and five fields (columns). Each record starts at the first field, represented as $1, then moves to the second field, $2, and so forth.

Example 5.13

(Fields are represented by a dollar sign and the number of the field.)

(The Database)
$1        $2         $3        $4          $5
Tom       Jones      4424      5/12/66     543354
Mary      Adams      5346      11/4/63     28765
Sally     Chang      1654      7/22/54     650000
Billy     Black      1683      9/23/44     336500

% nawk '{print NR, $1, $2, $5}' employees
1 Tom Jones 543354
2 Mary Adams 28765
3 Sally Chang 650000
4 Billy Black 336500

EXPLANATION

Nawk will print the number of the record (NR), and the first, second, and fifth fields (columns) of each line in the file.

Example 5.14

% nawk '{print $0, NF}' employees
Tom Jones      4424   5/12/66  543354 5
Mary Adams     5346   11/4/63  28765 5
Sally Chang    1654   7/22/54  650000 5
Billy Black    1683   9/23/44  336500 5

EXPLANATION

Nawk will print each record ($0) in the file, followed by the number of fields.
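The remark that new fields can be created can be illustrated as follows. This is a minimal sketch, assuming a POSIX awk installed as awk; the variable name before and the value new are made up for the illustration.

```shell
# Assigning to a field past the last one extends the record: NF grows,
# and $0 is rebuilt with the intervening fields left empty.
out=$(echo "Tom Jones 4424" |
    awk '{ before = NF; $5 = "new"; print before, NF; print $0 }')
echo "$out"
```

The record starts with three fields and ends with five; the fourth field is empty, which is why two spaces appear before the new value in the rebuilt record.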

5.5.3 Field Separators

The Input Field Separator. Awk's built-in variable, FS, holds the value of the input field separator. When the default value of FS is used, awk separates fields by spaces and/or tabs, stripping leading blanks and tabs. The FS can be changed by assigning a new value to it, either in a BEGIN statement or at the command line. For now, we will assign the new value at the command line. To change the value of FS at the command line, the -F option is used, followed by the character representing the new separator.

Changing the Field Separator at the Command Line

Example 5.15

% cat employees2
Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500

% nawk -F: '/Tom Jones/{print $1, $2}' employees2
Tom Jones 4424

EXPLANATION

The -F option is used to reassign the value of the input field separator at the command line. When a colon is placed directly after the -F option, nawk will look for colons to separate the fields in the employees2 file.
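The other way of changing FS, an assignment in a BEGIN statement, might look like the sketch below. It assumes a POSIX awk installed as awk and feeds in one record in the colon-separated layout shown above.

```shell
# FS is assigned before the first record is read, so every record
# is split on colons, just as with -F: on the command line.
out=$(printf 'Tom Jones:4424:5/12/66:543354\n' |
    awk 'BEGIN { FS = ":" } { print $1, $2 }')
echo "$out"   # Tom Jones 4424
```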

Using More Than One Field Separator. You may specify more than one input separator. If more than one character is used for the field separator, FS, then the string is a regular expression and is enclosed in square brackets. In the following example, the field separator is a space, colon, or tab. (The old version of awk did not support this feature.)

Example 5.16

% nawk -F'[ :\t]' '{print $1, $2, $3}' employees
Tom Jones 4424
Mary Adams 5346
Sally Chang 1654
Billy Black 1683

EXPLANATION

The -F option is followed by a regular expression enclosed in brackets. If a space, colon, or tab is encountered, nawk will use that character as a field separator. The expression is surrounded by quotes so that the shell will not pounce on the metacharacters for its own. (Remember that the shell uses brackets for filename expansion.)

The Output Field Separator. The default output field separator is a single space and is stored in awk's internal variable, OFS. In all of the examples thus far, we have used the print statement to send output to the screen. The comma that is used to separate fields in print statements evaluates to whatever value the OFS has been set to. If the default is used, the comma inserted between $1 and $2 will evaluate to a single space and the print function will print the fields with a space between them. The OFS can be changed.

Example 5.17

% cat employees2
Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500

(The Command Line)
% nawk -F: '/Tom Jones/{print $1, $2, $3, $4}' employees2
Tom Jones 4424 5/12/66 543354

EXPLANATION

The output field separator, a space, is stored in nawk's OFS variable. The comma between the fields evaluates to whatever is stored in OFS. The fields are printed to standard output separated by a space.

Example 5.18

% nawk -F: '/Tom Jones/{print $1 $2 $3 $4}' employees2
Tom Jones44245/12/66543354

EXPLANATION

The fields are jammed together because the comma was not used to separate the fields. The OFS will not be evaluated unless the comma separates the fields.

% nawk -F: '/Tom Jones/{print $0}' employees2
Tom Jones:4424:5/12/66:543354

EXPLANATION

The $0 variable holds the current record exactly as it is found in the input file. The record will be printed as-is.
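Changing the OFS itself can be sketched as follows, assuming a POSIX awk installed as awk. The separator " | " is an arbitrary choice made so that the effect is easy to see, and the assignment is placed in a BEGIN block so that it takes effect before the first record.

```shell
# The comma in print becomes the new OFS; plain juxtaposition still
# concatenates the fields with nothing between them.
out=$(printf 'Tom Jones:4424:5/12/66:543354\n' |
    awk -F: 'BEGIN { OFS = " | " } { print $1, $2; print $1 $2 }')
echo "$out"
```

The first output line reads Tom Jones | 4424 and the second Tom Jones4424, which shows that only the comma is affected by OFS.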

5.6 Patterns and Actions

5.6.1 Patterns

Awk patterns control what actions awk will take on a line of input. A pattern consists of a regular expression, an expression resulting in a true or false condition, or a combination of these. The default action is to print each line where the expression results in a true condition. When reading a pattern expression, there is an implied if statement. When an if is implied, there can be no curly braces surrounding it. When the if is explicit, it becomes an action statement and the syntax is different. (See "Conditional Statements".)

Example 5.19

% cat employees
Tom Jones      4424    5/12/66   543354
Mary Adams     5346    11/4/63   28765
Sally Chang    1654    7/22/54   650000
Billy Black    1683    9/23/44   336500

(The Command Line)
1    % nawk '/Tom/' employees
     Tom Jones      4424    5/12/66   543354

2    % nawk '$3 < 4000' employees
     Sally Chang    1654    7/22/54   650000
     Billy Black    1683    9/23/44   336500

EXPLANATION

1. If the pattern Tom is matched in the input file, the record is printed. The default action is to print the line if no explicit action is specified. This is equivalent to
```
nawk '$0 ~ /Tom/{print $0}' employees
```
2. If the third field is less than 4000, the record is printed.
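A pattern can also combine a regular expression with a comparison. The sketch below assumes a POSIX awk installed as awk and uses the logical operators && (AND) and || (OR), which this section has not introduced; the employees data is copied into a temporary file so the commands can be run anywhere.

```shell
employees=$(mktemp)
cat > "$employees" <<'EOF'
Tom Jones      4424    5/12/66   543354
Mary Adams     5346    11/4/63   28765
Sally Chang    1654    7/22/54   650000
Billy Black    1683    9/23/44   336500
EOF

# Both tests must be true: the line starts with S and $3 is below 4000.
both=$(awk '/^S/ && $3 < 4000 { print $1, $2 }' "$employees")

# Either test may be true: the line starts with Tom or $5 is below 100000.
either=$(awk '/^Tom/ || $5 < 100000 { print $1 }' "$employees")

echo "$both"
echo "$either"
rm -f "$employees"
```

The first command selects only Sally Chang; the second selects Tom and Mary.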

5.6.2 Actions

Actions are statements enclosed within curly braces and separated by semicolons.^[2] If a pattern precedes an action, the pattern dictates when the action will be performed. Actions can be simple statements or complex groups of statements. Statements are separated by semicolons, or by a newline if placed on their own line. In the following command line, for example, two actions are separated by a semicolon:

% nawk '/Tom/{print "hi Tom"};{x=5}' file

FORMAT

{ action }

Example 5.20

{ print $1, $2 }

EXPLANATION

The action is to print fields 1 and 2.

Patterns can be associated with actions. Remember, actions are statements enclosed in curly braces. A pattern controls the action from the first open curly brace to the first closing curly brace. If an action follows a pattern, the first opening curly brace must be on the same line as the pattern. Patterns are never enclosed in curly braces.

FORMAT

pattern{ action statement; action statement; etc. }

       or

pattern{
       action statement
       action statement
}

Example 5.21

% nawk '/Tom/{print "Hello there, "  $1}' employees
Hello there, Tom

EXPLANATION

If the record contains the pattern Tom, the string Hello there, Tom will print.

A pattern with no action displays all lines matching the pattern. String-matching patterns contain regular expressions enclosed in forward slashes.

5.7 Regular Expressions

A regular expression to awk is a pattern that consists of characters enclosed in forward slashes. Awk supports the use of regular expression metacharacters (same as egrep) to modify the regular expression in some way. If a string in the input line is matched by the regular expression, the resulting condition is true, and any actions associated with the expression are executed. If no action is specified and an input line is matched by the regular expression, the record is printed. See Table 5.5.

Example 5.22

% nawk '/Mary/' employees
Mary Adams     5346   11/4/63   28765

EXPLANATION

All lines in the employees file containing the regular expression pattern Mary are displayed.

Example 5.23

% nawk '/Mary/{print $1, $2}' employees
Mary Adams

EXPLANATION

The first and second fields of all lines in the employees file containing the regular expression pattern Mary are displayed.

Table 5.5. awk Regular Expression Metacharacters
Metacharacter	What It Does
^	Matches at the beginning of the string.
$	Matches at the end of the string.
.	Matches any single character.
*	Matches zero or more of the preceding character.
+	Matches one or more of the preceding character.
?	Matches zero or one of the preceding character.
[ABC]	Matches any one character in the set of characters A, B, or C.
[^ABC]	Matches any one character not in the set of characters A, B, or C.
[A-Z]	Matches any one character in the range from A to Z.
A|B	Matches either A or B.
(AB)+	Matches one or more sets of AB.
\*	Matches a literal asterisk.
&	Used in the replacement string to represent what was found in the search string.
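A few of these metacharacters can be combined in one expression. The sketch below assumes a POSIX awk installed as awk; the list of names is sample data, and sub() is awk's built-in substitution function, which this section has not yet covered.

```shell
names=$(printf 'Tom\nTommy\nTim\nMary\n')

# ? makes the preceding group optional; ( ) groups; | separates alternatives.
alt=$(echo "$names" | awk '/^(Tom(my)?|Mary)$/')

# & in the replacement string stands for the text that was matched.
amp=$(echo "Tom Jones" | awk '{ sub(/Tom/, "&my"); print }')

echo "$alt"
echo "$amp"   # Tommy Jones
```

The first command prints Tom, Tommy, and Mary but not Tim.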

Example 5.24

% nawk '/^Mary/' employees
Mary Adams    5346   11/4/63   28765

EXPLANATION

All lines in the employees file that start with the regular expression Mary are displayed.

Example 5.25

% nawk '/^[A-Z][a-z]+ /' employees
Tom Jones     4424   5/12/66   543354
Mary Adams    5346   11/4/63   28765
Sally Chang   1654   7/22/54   650000
Billy Black   1683   9/23/44   336500

EXPLANATION

All lines in the employees file where the line begins with an uppercase letter, followed by one or more lowercase letters, followed by a space are displayed.

5.7.1 The match Operator

The match operator, the tilde (~), is used to match an expression within a record or field.

Example 5.26

% cat employees
Tom Jones     4424   5/12/66   543354
Mary Adams    5346   11/4/63   28765
Sally Chang   1654   7/22/54   650000
Billy Black   1683   9/23/44   336500

% nawk '$1 ~ /[Bb]ill/' employees
Billy Black   1683   9/23/44   336500

EXPLANATION

Any lines matching Bill or bill in the first field are displayed.

Example 5.27

% nawk '$1 !~ /ly$/' employees
Tom Jones     4424    5/12/66   543354
Mary Adams    5346    11/4/63   28765

EXPLANATION

Any lines whose first field does not end in ly are displayed.

5.8 awk Commands in a Script File

When you have multiple awk pattern/action statements, it is often much easier to put the statements in a script. The script is a file containing awk comments and statements. If statements and actions are on the same line, they are separated by semicolons. If statements are on separate lines, semicolons are not necessary. If an action follows a pattern, the opening curly brace must be on the same line as the pattern. Comments are preceded by a pound (#) sign.

Example 5.28

% cat employees2
Tom Jones:4424:5/12/66:54335
Mary Adams:5346:11/4/63:28765
Billy Black:1683:9/23/44:336500
Sally Chang:1654:7/22/54:65000
Jose Tomas:1683:9/23/44:33650

(The Awk Script)
% cat info
1   # My first awk script by Jack Sprat
    # Script name: info; Date: February 28, 2001
2   /Tom/{print "Tom's birthday is "$3}
3   /Mary/{print NR, $0}
4   /^Sally/{print "Hi Sally. " $1 " has a salary of $" $4 "."}
    # End of info script

(The Command Line)
5   % nawk -F: -f info employees2
    Tom's birthday is 5/12/66
    2 Mary Adams:5346:11/4/63:28765
    Hi Sally. Sally Chang has a salary of $65000.
    Tom's birthday is 9/23/44

EXPLANATION

1. These are comment lines.
2. If the regular expression Tom is matched against an input line, the string Tom's birthday is and the value of the third field ($3) are printed. Because Tom also occurs in Tomas, the last record matches as well.
3. If the regular expression Mary is matched against an input line, the action block prints NR, the number of the current record, and the record.
4. If the regular expression Sally is found at the beginning of the input line, the string Hi Sally. is printed, followed by the value of the first field ($1), the string has a salary of $, and the value of the fourth field ($4).
5. The nawk command is followed by the -F: option, specifying the colon to be the field separator. The -f option is followed by the name of the awk script. Awk will read instructions from the info file. The input file, employees2, is next.

5.9 Review

The examples in this section use a sample database, called datafile. In the database, the input field separator, FS, is whitespace, the default. The number of fields, NF, is 8. The number may vary from line to line, but in this file, the number of fields is fixed. The record separator, RS, is the newline, which separates each line of the file. Awk keeps track of the number of each record in the NR variable. The output field separator, OFS, is a space. If a comma is used to separate fields, when the line is printed, each field printed will be separated by a space.

5.9.1 Simple Pattern Matching

% cat datafile
northwest          NW    Joel Craig          3.0     .98     3       4
western            WE    Sharon Kelly        5.3     .97     5       23
southwest          SW    Chris Foster        2.7     .8      2       18
southern           SO    May Chin            5.1     .95     4       15
southeast          SE    Derek Johnson       4.0     .7      4       17
eastern            EA    Susan Beal          4.4     .84     5       20
northeast          NE    TJ Nichols          5.1     .94     3       13
north              NO    Val Shultz          4.5     .89     5       9
central            CT    Sheri Watson        5.7     .94     5       13

Example 5.29

% nawk '/west/' datafile
northwest          NW    Joel Craig          3.0     .98     3       4
western            WE    Sharon Kelly        5.3     .97     5       23
southwest          SW    Chris Foster        2.7     .8      2       18

EXPLANATION

All lines containing the pattern west are printed.

Example 5.30

% nawk '/^north/' datafile
northwest          NW    Joel Craig          3.0     .98     3       4
northeast          NE    TJ Nichols          5.1     .94     3       13
north              NO    Val Shultz          4.5     .89     5       9

EXPLANATION

All lines beginning with the pattern north are printed.

Example 5.31

% nawk '/^(no|so)/' datafile
northwest          NW    Joel Craig          3.0     .98     3       4
southwest          SW    Chris Foster        2.7     .8      2       18
southern           SO    May Chin            5.1     .95     4       15
southeast          SE    Derek Johnson       4.0     .7      4       17
northeast          NE    TJ Nichols          5.1     .94     3       13
north              NO    Val Shultz          4.5     .89     5       9

EXPLANATION

All lines beginning with the pattern no or so are printed.

5.9.2 Simple Actions

Example 5.32

% nawk '{print $3, $2}' datafile
Joel NW
Sharon WE
Chris SW
May SO
Derek SE
Susan EA
TJ NE
Val NO
Sheri CT

EXPLANATION

The output field separator, OFS, is a space by default. The comma between $3 and $2 is translated to the value of the OFS. The third field is printed, followed by a space and the second field.

Example 5.33

nawk '{print $3 $2}' datafile
JoelNW
SharonWE
ChrisSW
MaySO
DerekSE
SusanEA
TJNE
ValNO
SheriCT

EXPLANATION

The third field is printed, immediately followed by the second field. Because no comma separates $3 and $2, the two values are concatenated and no space appears between them in the output.

Example 5.34

nawk 'print $1' datafile
nawk: syntax error at source line 1
 context is
        >>> print <<<  $1
nawk: bailing out at source line 1

EXPLANATION

This is the nawk (new awk) error message. Nawk error messages are much more verbose than those of the old awk. In this program, the curly braces are missing in the action statement.

Example 5.35

awk 'print $1' datafile
awk: syntax error near line 1
awk: bailing out near line 1

EXPLANATION

This is the awk (old awk) error message. Old awk programs were difficult to debug since almost all errors produced this same message. The curly braces are missing in the action statement.
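The repair for both of the preceding errors is the same: enclose the action in curly braces. The sketch below runs the broken and the corrected program side by side. It assumes a POSIX-style awk named awk; the wording of the error message differs from one implementation to another, so the sketch only checks the exit status. The variable names data, status, and fixed are illustrative.

```shell
# Recreate the chapter's datafile in a temporary location (assumption: mktemp is available).
data=$(mktemp)
cat > "$data" <<'EOF'
northwest   NW  Joel Craig      3.0  .98  3  4
western     WE  Sharon Kelly    5.3  .97  5  23
southwest   SW  Chris Foster    2.7  .8   2  18
southern    SO  May Chin        5.1  .95  4  15
southeast   SE  Derek Johnson   4.0  .7   4  17
eastern     EA  Susan Beal      4.4  .84  5  20
northeast   NE  TJ Nichols      5.1  .94  3  13
north       NO  Val Shultz      4.5  .89  5  9
central     CT  Sheri Watson    5.7  .94  5  13
EOF

# Missing braces: awk reports a syntax error and exits with a nonzero status.
if awk 'print $1' "$data" >/dev/null 2>&1; then
    status="accepted"
else
    status="rejected"
fi
echo "program without braces was $status"

# Braces added: the action is valid and the first field of each record is printed.
fixed=$(awk '{print $1}' "$data")
printf '%s\n' "$fixed"
rm -f "$data"
```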

Example 5.36

nawk '{print $0}' datafile
northwest     NW      Joel Craig          3.0  .98  3    4
western       WE      Sharon Kelly        5.3  .97  5    23
southwest     SW      Chris Foster        2.7  .8   2    18
southern      SO      May Chin            5.1  .95  4    15
southeast     SE      Derek Johnson       4.0  .7   4    17
eastern       EA      Susan Beal          4.4  .84  5    20
northeast     NE      TJ Nichols          5.1  .94  3    13
north         NO      Val Shultz          4.5  .89  5    9
central       CT      Sheri Watson        5.7  .94  5    13

EXPLANATION

Each record is printed. $0 holds the current record.

Example 5.37

nawk '{print "Number of fields: "NF}' datafile
Number of fields: 8
Number of fields: 8
Number of fields: 8
Number of fields: 8
Number of fields: 8
Number of fields: 8
Number of fields: 8
Number of fields: 8
Number of fields: 8

EXPLANATION

There are 8 fields in each record. The built-in awk variable NF holds the number of fields and is reset for each record.
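Because NF holds a number, prefixing it with a dollar sign ($NF) refers to the last field of the current record. The sketch below prints both values for each record; it assumes a POSIX-style awk named awk, and the temporary file and the variable names data and out are illustrative.

```shell
# Recreate the chapter's datafile in a temporary location (assumption: mktemp is available).
data=$(mktemp)
cat > "$data" <<'EOF'
northwest   NW  Joel Craig      3.0  .98  3  4
western     WE  Sharon Kelly    5.3  .97  5  23
southwest   SW  Chris Foster    2.7  .8   2  18
southern    SO  May Chin        5.1  .95  4  15
southeast   SE  Derek Johnson   4.0  .7   4  17
eastern     EA  Susan Beal      4.4  .84  5  20
northeast   NE  TJ Nichols      5.1  .94  3  13
north       NO  Val Shultz      4.5  .89  5  9
central     CT  Sheri Watson    5.7  .94  5  13
EOF

# NF is the field count; $NF is the contents of the last field.
out=$(awk '{print NF, $NF}' "$data")
printf '%s\n' "$out"
rm -f "$data"
```

For this file $NF and $8 are the same field on every line, since each record has exactly eight fields.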

% cat datafile
northwest           NW   Joel Craig          3.0    .98    3     4
western             WE   Sharon Kelly        5.3    .97    5     23
southwest           SW   Chris Foster        2.7    .8     2     18
southern            SO   May Chin            5.1    .95    4     15
southeast           SE   Derek Johnson       4.0    .7     4     17
eastern             EA   Susan Beal          4.4    .84    5     20
northeast           NE   TJ Nichols          5.1    .94    3     13
north               NO   Val Shultz          4.5    .89    5     9
central             CT   Sheri Watson        5.7    .94    5     13

5.9.3 Regular Expressions in Pattern and Action Combinations

Example 5.38

nawk '/northeast/{print $3, $2}' datafile
TJ NE

EXPLANATION

If the record contains (or matches) the pattern northeast, the third field, followed by the second field, is printed.

Example 5.39

nawk '/E/' datafile
western       WE      Sharon Kelly         5.3  .97  5    23
southeast     SE      Derek Johnson        4.0  .7   4    17
eastern       EA      Susan Beal           4.4  .84  5    20
northeast     NE      TJ Nichols           5.1  .94  3    13

EXPLANATION

If the record contains an E, the entire record is printed.

Example 5.40

nawk '/^[ns]/{print $1}' datafile
northwest
southwest
southern
southeast
northeast
north

EXPLANATION

If the record begins with an n or s, the first field is printed.

Example 5.41

nawk '$5 ~ /\.[7-9]+/' datafile
southwest     SW     Chris Foster       2.7  .8   2    18
central       CT     Sheri Watson       5.7  .94  5    13

EXPLANATION

If the fifth field ($5) contains a literal period followed by one or more digits in the range 7 through 9, the record is printed.
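The match operator restricts the regular expression to a single field. As a hedged comparison, the sketch below counts how many records match when the test is limited to $5 and how many match when the same expression is applied to the whole record. It assumes a POSIX-style awk named awk; the variable names data, field_hits, and record_hits are illustrative.

```shell
# Recreate the chapter's datafile in a temporary location (assumption: mktemp is available).
data=$(mktemp)
cat > "$data" <<'EOF'
northwest   NW  Joel Craig      3.0  .98  3  4
western     WE  Sharon Kelly    5.3  .97  5  23
southwest   SW  Chris Foster    2.7  .8   2  18
southern    SO  May Chin        5.1  .95  4  15
southeast   SE  Derek Johnson   4.0  .7   4  17
eastern     EA  Susan Beal      4.4  .84  5  20
northeast   NE  TJ Nichols      5.1  .94  3  13
north       NO  Val Shultz      4.5  .89  5  9
central     CT  Sheri Watson    5.7  .94  5  13
EOF

# Expression tested against the fifth field only.
field_hits=$(awk '$5 ~ /\.[7-9]+/' "$data" | awk 'END {print NR}')

# Same expression tested against the entire record ($0).
record_hits=$(awk '/\.[7-9]+/' "$data" | awk 'END {print NR}')

echo "field match: $field_hits records, record match: $record_hits records"
rm -f "$data"
```

The record-wide test selects every line, because the sixth field of most records (.98, .97, .8, and so on) also contains a period followed by 7, 8, or 9.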

Example 5.42

nawk '$2 !~ /E/{print $1, $2}' datafile
northwest NW
southwest SW
southern SO
north NO
central CT

EXPLANATION

If the second field does not contain the pattern E, the first field followed by the second field ($1, $2) is printed.
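The ~ and !~ operators are complements: every record satisfies exactly one of them for a given field and expression. The sketch below, which assumes a POSIX-style awk named awk and uses the illustrative variable names data, with_e, and without_e, splits the second field of the sample file into the two groups.

```shell
# Recreate the chapter's datafile in a temporary location (assumption: mktemp is available).
data=$(mktemp)
cat > "$data" <<'EOF'
northwest   NW  Joel Craig      3.0  .98  3  4
western     WE  Sharon Kelly    5.3  .97  5  23
southwest   SW  Chris Foster    2.7  .8   2  18
southern    SO  May Chin        5.1  .95  4  15
southeast   SE  Derek Johnson   4.0  .7   4  17
eastern     EA  Susan Beal      4.4  .84  5  20
northeast   NE  TJ Nichols      5.1  .94  3  13
north       NO  Val Shultz      4.5  .89  5  9
central     CT  Sheri Watson    5.7  .94  5  13
EOF

# Region codes that contain an uppercase E.
with_e=$(awk '$2 ~ /E/ {print $2}' "$data")

# Region codes that do not contain an uppercase E.
without_e=$(awk '$2 !~ /E/ {print $2}' "$data")

echo "match:";    printf '%s\n' "$with_e"
echo "no match:"; printf '%s\n' "$without_e"
rm -f "$data"
```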

% cat datafile
northwest           NW   Joel Craig           3.0    .98    3    4
western             WE   Sharon Kelly         5.3    .97    5    23
southwest           SW   Chris Foster         2.7    .8     2    18
southern            SO   May Chin             5.1    .95    4    15
southeast           SE   Derek Johnson        4.0    .7     4    17
eastern             EA   Susan Beal           4.4    .84    5    20
northeast           NE   TJ Nichols           5.1    .94    3    13
north               NO   Val Shultz           4.5    .89    5    9
central             CT   Sheri Watson         5.7    .94    5    13

Example 5.43

nawk '$3 ~ /^Joel/{print $3 " is a nice guy."}' datafile
Joel is a nice guy.

EXPLANATION

If the third field ($3) begins with the pattern Joel, the third field is printed, followed by the string " is a nice guy." Note that the leading space must be included inside the quoted string if it is to appear in the output.

Example 5.44

nawk '$8 ~ /[0-9][0-9]$/{print $8}' datafile
23
18
15
17
20
13
13

EXPLANATION

If the eighth field ($8) ends in two digits, it is printed.

Example 5.45

nawk '$4 ~ /Chin$/{print "The price is $" $8 "."}' datafile
The price is $15.

EXPLANATION

If the fourth field ($4) ends with Chin, the string enclosed in double quotes ("The price is $"), the eighth field ($8), and the string containing a period are printed.
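The same line can also be produced with printf, which some readers find easier to follow than a chain of concatenated strings. The sketch below is one way to write it, assuming a POSIX-style awk named awk; the variable names data and out are illustrative. Inside the double-quoted awk string the dollar sign is an ordinary character, and the single quotes around the whole program keep the shell from expanding $4 and $8.

```shell
# Recreate the chapter's datafile in a temporary location (assumption: mktemp is available).
data=$(mktemp)
cat > "$data" <<'EOF'
northwest   NW  Joel Craig      3.0  .98  3  4
western     WE  Sharon Kelly    5.3  .97  5  23
southwest   SW  Chris Foster    2.7  .8   2  18
southern    SO  May Chin        5.1  .95  4  15
southeast   SE  Derek Johnson   4.0  .7   4  17
eastern     EA  Susan Beal      4.4  .84  5  20
northeast   NE  TJ Nichols      5.1  .94  3  13
north       NO  Val Shultz      4.5  .89  5  9
central     CT  Sheri Watson    5.7  .94  5  13
EOF

# %d is replaced by the eighth field; printf needs an explicit newline.
out=$(awk '$4 ~ /Chin$/ {printf "The price is $%d.\n", $8}' "$data")
printf '%s\n' "$out"
rm -f "$data"
```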

Example 5.46

nawk '/TJ/{print $0}' datafile
northeast     NE      TJ Nichols           5.1   .94    3    13

EXPLANATION

If the record contains the pattern TJ, $0 (the record) is printed.

5.9.4 Input Field Separators

% cat datafile2
Joel Craig:northwest:NW:3.0:.98:3:4
Sharon Kelly:western:WE:5.3:.97:5:23
Chris Foster:southwest:SW:2.7:.8:2:18
May Chin:southern:SO:5.1:.95:4:15
Derek Johnson:southeast:SE:4.0:.7:4:17
Susan Beal:eastern:EA:4.4:.84:5:20
TJ Nichols:northeast:NE:5.1:.94:3:13
Val Shultz:north:NO:4.5:.89:5:9
Sheri Watson:central:CT:5.7:.94:5:13

Example 5.47

nawk '{print $1}' datafile2
Joel
Sharon
Chris
May
Derek
Susan
TJ
Val
Sheri

EXPLANATION

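The default input field separator is whitespace, and the only whitespace in each record of datafile2 is the space between the first and last names. The first field ($1) is therefore the first name, and it is printed.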

Example 5.48

nawk -F: '{print $1}' datafile2
Joel Craig
Sharon Kelly
Chris Foster
    <more output here>
Val Shultz
Sheri Watson

EXPLANATION

The -F option specifies the colon as the input field separator. The first field ($1), which now holds the full name, is printed.
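The separator can also be set from inside the program by assigning to the built-in variable FS in a BEGIN block, which runs before any input is read. The sketch below compares the two approaches; it assumes a POSIX-style awk named awk, and the temporary file and the variable names data2, by_option, and by_begin are illustrative.

```shell
# Recreate the chapter's datafile2 in a temporary location (assumption: mktemp is available).
data2=$(mktemp)
cat > "$data2" <<'EOF'
Joel Craig:northwest:NW:3.0:.98:3:4
Sharon Kelly:western:WE:5.3:.97:5:23
Chris Foster:southwest:SW:2.7:.8:2:18
May Chin:southern:SO:5.1:.95:4:15
Derek Johnson:southeast:SE:4.0:.7:4:17
Susan Beal:eastern:EA:4.4:.84:5:20
TJ Nichols:northeast:NE:5.1:.94:3:13
Val Shultz:north:NO:4.5:.89:5:9
Sheri Watson:central:CT:5.7:.94:5:13
EOF

# Separator given on the command line.
by_option=$(awk -F: '{print $1}' "$data2")

# Separator assigned to FS before the first record is read.
by_begin=$(awk 'BEGIN {FS = ":"} {print $1}' "$data2")

[ "$by_option" = "$by_begin" ] && echo "both forms print the same names"
printf '%s\n' "$by_option"
rm -f "$data2"
```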

Example 5.49

nawk '{print "Number of fields: "NF}' datafile2
Number of fields: 2
Number of fields: 2
Number of fields: 2
    <more of the same output here>
Number of fields: 2
Number of fields: 2

EXPLANATION

Since the field separator is the default (whitespace), the number of fields in each record is 2. The only space is between the first and last names.

Example 5.50

nawk -F: '{print "Number of fields: "NF}' datafile2
Number of fields: 7
Number of fields: 7
Number of fields: 7
    <more of the same output here>
Number of fields: 7
Number of fields: 7

EXPLANATION

Since the field separator is a colon, the number of fields in each record is 7.

Example 5.51

nawk -F"[ :]" '{print $1, $2}' datafile2
Joel Craig
Sharon Kelly
Chris Foster
May Chin
Derek Johnson
Susan Beal
TJ Nichols
Val Shultz
Sheri Watson

EXPLANATION

Multiple field separators can be specified with nawk as a regular expression. Either a space or a colon will be designated as a field separator. The first and second fields ($1, $2) are printed.
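With both a space and a colon acting as separators, the first and last names become separate fields and the region name moves to $3. The sketch below prints the first three fields and the field count of the first record; it assumes a POSIX-style awk named awk, and the variable names data2, out, and nf are illustrative.

```shell
# Recreate the chapter's datafile2 in a temporary location (assumption: mktemp is available).
data2=$(mktemp)
cat > "$data2" <<'EOF'
Joel Craig:northwest:NW:3.0:.98:3:4
Sharon Kelly:western:WE:5.3:.97:5:23
Chris Foster:southwest:SW:2.7:.8:2:18
May Chin:southern:SO:5.1:.95:4:15
Derek Johnson:southeast:SE:4.0:.7:4:17
Susan Beal:eastern:EA:4.4:.84:5:20
TJ Nichols:northeast:NE:5.1:.94:3:13
Val Shultz:north:NO:4.5:.89:5:9
Sheri Watson:central:CT:5.7:.94:5:13
EOF

# The bracket expression makes either a space or a colon a field separator.
out=$(awk -F'[ :]' '{print $1, $2, $3}' "$data2")
printf '%s\n' "$out"

# Each record now splits into eight fields instead of seven.
nf=$(awk -F'[ :]' 'NR == 1 {print NF}' "$data2")
echo "fields in first record: $nf"
rm -f "$data2"
```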

% cat datafile
northwest           NW   Joel Craig           3.0    .98    3    4
western             WE   Sharon Kelly         5.3    .97    5    23
southwest           SW   Chris Foster         2.7    .8     2    18
southern            SO   May Chin             5.1    .95    4    15
southeast           SE   Derek Johnson        4.0    .7     4    17
eastern             EA   Susan Beal           4.4    .84    5    20
northeast           NE   TJ Nichols           5.1    .94    3    13
north               NO   Val Shultz           4.5    .89    5    9
central             CT   Sheri Watson         5.7    .94    5    13

5.9.5 awk Scripting

Example 5.52

cat nawk.sc1
# This is a comment
# This is my first awk script
/^north/{print $1, $2, $3}
/^south/{print "The " $1 " district."}

nawk -f nawk.sc1 datafile
northwest NW Joel
The southwest district.
The southern district.
The southeast district.
northeast NE TJ
north NO Val

EXPLANATION

If the record begins with the pattern north, the first, second, and third fields ($1, $2, $3) are printed.
If the record begins with the pattern south, the string "The ", the value of the first field ($1), and the string " district." are printed.
The -f option precedes the name of the awk script file, which is followed by the input file to be processed.
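The whole exercise can be reproduced from a shell session along the following lines. This is only a sketch: it assumes a POSIX-style awk named awk, writes the script and the data to temporary files rather than to nawk.sc1 and datafile, and uses the illustrative variable names data, script, and out.

```shell
# Recreate the chapter's datafile in a temporary location (assumption: mktemp is available).
data=$(mktemp)
cat > "$data" <<'EOF'
northwest   NW  Joel Craig      3.0  .98  3  4
western     WE  Sharon Kelly    5.3  .97  5  23
southwest   SW  Chris Foster    2.7  .8   2  18
southern    SO  May Chin        5.1  .95  4  15
southeast   SE  Derek Johnson   4.0  .7   4  17
eastern     EA  Susan Beal      4.4  .84  5  20
northeast   NE  TJ Nichols      5.1  .94  3  13
north       NO  Val Shultz      4.5  .89  5  9
central     CT  Sheri Watson    5.7  .94  5  13
EOF

# The script file holds only awk statements; no quotes are needed around them.
script=$(mktemp)
cat > "$script" <<'EOF'
# Lines starting with a pound sign are comments.
/^north/ {print $1, $2, $3}
/^south/ {print "The " $1 " district."}
EOF

# -f names the script file; the input file follows.
out=$(awk -f "$script" "$data")
printf '%s\n' "$out"
rm -f "$data" "$script"
```

Every pattern in the script is tried against each record in turn, so the output follows the order of the input file rather than the order of the rules.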

UNIX TOOLS LAB EXERCISE

Lab 3: awk Exercise

Mike Harrington:(510) 548-1278:250:100:175
Christian Dobbins:(408) 538-2358:155:90:201
Susan Dalsass:(206) 654-6279:250:60:50
Archie McNichol:(206) 548-1348:250:100:175
Jody Savage:(206) 548-1278:15:188:150
Guy Quigley:(916) 343-6410:250:100:175
Dan Savage:(406) 298-7744:450:300:275
Nancy McNeil:(206) 548-1278:250:80:75
John Goldenrod:(916) 348-4278:250:100:175
Chet Main:(510) 548-5258:50:95:135
Tom Savage:(408) 926-3456:250:168:200
Elizabeth Stachelin:(916) 440-1763:175:75:300

The database above contains colon-separated records holding each contributor's name, phone number, and contributions to the party campaign for each of the past three months.

1:	Print all the phone numbers.
2:	Print Dan's phone number.
3:	Print Susan's name and phone number.
4:	Print all last names beginning with D.
5:	Print all first names beginning with either a C or E.
6:	Print all first names containing only four characters.
7:	Print the first names of all those in the 916 area code.
8:	Print Mike's campaign contributions. Each value should be printed with a leading dollar sign; e.g., $250 $100 $175.
9:	Print last names followed by a comma and the first name.
10:	Write an awk script called facts that:
	a. Prints full names and phone numbers for the Savages.
	b. Prints Chet's contributions.
	c. Prints all those who contributed $250 the first month.

[1] On SCO UNIX, the new version is spelled awk, and on Linux, the gnu version is spelled awk. This text pertains primarily to the new awk, nawk. The gnu implementation, gawk, is fully upward-compatible with nawk.
[2] On some versions of awk, actions must be separated by semicolons or newlines, and the statements within the curly braces also must be separated by semicolons or newlines. SVR4's nawk requires the use of semicolons or newlines to separate statements within an action, but does not require the use of semicolons to separate actions; for example, the two actions that follow do not need a semicolon:
