Awk is a programming language used for manipulating data and generating reports. The data may come from standard input, one or more files, or as output from a process. Awk can be used at the command line for simple operations, or it can be written into programs for larger applications. Because awk can manipulate data, it is an indispensable tool used in shell scripts and for managing small databases.
Awk scans a file (or input) line by line, from the first line to the last, searching for lines that match a specified pattern and performing selected actions (enclosed in curly braces) on those lines. If there is a pattern with no action, all lines that match the pattern are displayed; if there is an action with no pattern, the action is performed on every input line.
Awk stands for the first initials in the last names of each of the authors of the language, Alfred Aho, Brian Kernighan, and Peter Weinberger. They could have called it wak or kaw, but for whatever reason, awk won out.
There are a number of versions of awk: old awk, new awk, gnu awk (gawk), POSIX awk, and others. Awk was originally written in 1977, and in 1985, the original implementation was improved so that awk could handle larger programs. Additional features included user-defined functions, dynamic regular expressions, processing multiple input files, and more. On most systems, the command is awk if using the old version, nawk if using the new version, and gawk if using the gnu version.[1]
An awk program consists of the awk command, the program instructions enclosed in quotes (or in a file), and the name of the input file. If an input file is not specified, input comes from standard input (stdin), the keyboard.
Awk instructions consist of patterns, actions, or a combination of patterns and actions. A pattern is a statement consisting of an expression of some type. A pattern carries an implied if: if you do not see the keyword if, but you think "if" when evaluating the expression, it is a pattern. Actions consist of one or more statements separated by semicolons or newlines and enclosed in curly braces. Patterns are never enclosed in curly braces. They consist of regular expressions enclosed in forward slashes, or of expressions built from one or more of the many operators awk provides.
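The three instruction forms can be compared side by side. This is a sketch that assumes a POSIX-compatible awk invoked as awk (on the older systems this chapter describes, the new awk may be installed as nawk). It also assumes a few lines of the sample employees data, supplied inline instead of from a file.

```shell
#!/bin/sh
# Three lines of the sample employees data, supplied inline (assumed input).
data='Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000'

# Pattern only: every matching line is printed (the default action).
printf '%s\n' "$data" | awk '/Mary/'

# Action only: the action runs on every input line.
printf '%s\n' "$data" | awk '{ print $1 }'

# Pattern and action: read it as "if the third field is less than
# 5000, print the first field".
printf '%s\n' "$data" | awk '$3 < 5000 { print $1 }'
```

The output is as follows:

- The first command prints Mary's line.
- The second prints Tom, Mary, and Sally.
- The third prints Tom and Sally.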
Awk commands can be typed at the command line or in awk script files. The input lines can come from files, pipes, or standard input.
In the following examples, the percent sign (%) is the C shell prompt.
FORMAT

    % nawk 'pattern' filename
    % nawk '{action}' filename
    % nawk 'pattern {action}' filename
Here is a sample file called employees:
    % cat employees
    Tom Jones 4424 5/12/66 543354
    Mary Adams 5346 11/4/63 28765
    Sally Chang 1654 7/22/54 650000
    Billy Black 1683 9/23/44 336500

    % nawk '/Mary/' employees
    Mary Adams 5346 11/4/63 28765

EXPLANATION

Nawk prints all lines that contain the pattern Mary.
    % cat employees
    Tom Jones 4424 5/12/66 543354
    Mary Adams 5346 11/4/63 28765
    Sally Chang 1654 7/22/54 650000
    Billy Black 1683 9/23/44 336500

    % nawk '{print $1}' employees
    Tom
    Mary
    Sally
    Billy

EXPLANATION

Nawk prints the first field of file employees, where the field starts at the left margin of the line and is delimited by whitespace.
    % cat employees
    Tom Jones 4424 5/12/66 543354
    Mary Adams 5346 11/4/63 28765
    Sally Chang 1654 7/22/54 650000
    Billy Black 1683 9/23/44 336500

    % nawk '/Sally/{print $1, $2}' employees
    Sally Chang

EXPLANATION

Nawk prints the first and second fields of file employees, but only if the line contains the pattern Sally. Remember, the field separator is whitespace.
The output from a UNIX command or commands can be piped to awk for processing. Shell programs commonly use awk to manipulate the output of commands.
FORMAT

    % command | nawk 'pattern'
    % command | nawk '{action}'
    % command | nawk 'pattern {action}'
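    1  % df | nawk '$4 > 75000'
       /oracle (/dev/dsk/c0t0d057 ):390780 blocks 105756 files
       /opt (/dev/dsk/c0t0d058 ):1943994 blocks 49187 files

    2  % rusers | nawk '/root$/{print $1}'
       owl
       crow
       bluebird

EXPLANATION

1. The output of the df command is piped to nawk. If the fourth field of a line is greater than 75000, the line is printed.
2. The output of the rusers command is piped to nawk. This command lists the users logged in on machines on the network. If a line ends with root, its first field (the machine name) is printed.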
The action part of the awk command is enclosed in curly braces. If no action is specified and a pattern is matched, awk takes the default action, which is to print the lines that are matched to the screen. The print function is used to print simple output that does not require fancy formatting. For more sophisticated formatting, the printf or sprintf functions are used. If you are familiar with C, then you already know how printf and sprintf work.
The print function can also be explicitly used in the action part of awk as {print}. The print function accepts arguments as variables, computed values, or string constants. Strings must be enclosed in double quotes. Commas are used to separate the arguments; if commas are not provided, the arguments are concatenated together. The comma evaluates to the value of the output field separator (OFS), which is by default a space.
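The effect of commas, concatenation, and OFS on print can be checked directly. This sketch assumes a POSIX-compatible awk invoked as awk (nawk on some older systems). It feeds in a fixed line shaped like the date command's output, so the result does not depend on the clock.

```shell
#!/bin/sh
# A fixed line in the format of date's output (assumed sample input).
line='Wed Jul 28 22:23:16 PDT 2001'

# With commas, print separates its arguments with OFS (a space by default).
echo "$line" | awk '{ print $2, $6 }'

# Without commas, the arguments are concatenated.
echo "$line" | awk '{ print $2 $6 }'

# String constants go in double quotes; changing OFS shows that
# every comma emits the current OFS value.
echo "$line" | awk 'BEGIN { OFS = "/" } { print "Date:", $2, $3, $6 }'
```

The three commands print `Jul 2001`, `Jul2001`, and `Date:/Jul/28/2001`.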
The output of the print function can be redirected or piped to another program, and the output of another program can be piped to awk for printing. (See "Redirection" on page 16 and "Pipes" on page 19.)
    % date
    Wed Jul 28 22:23:16 PDT 2001

    % date | nawk '{ print "Month: " $2 "\nYear: " , $6 }'
    Month: Jul
    Year:  2001

EXPLANATION

The output of the UNIX date command is piped to nawk. The string Month: is printed, followed by the second field ($2). Next comes the string containing the newline character (\n) and Year:. The comma then inserts the output field separator (a space), followed by the sixth field ($6).
Escape Sequences. Escape sequences are represented by a backslash and a letter or number. They can be used in strings to represent tabs, newlines, form feeds, and so forth (see Table 5.1).
Escape Sequence | Meaning |
---|---|
\b | Backspace. |
\f | Form feed. |
\n | Newline. |
\r | Carriage return. |
\t | Tab. |
\047 | Octal value 47, a single quote. |
\c | c represents any other character, e.g., \". |
    % cat employees
    Tom Jones 4424 5/12/66 543354
    Mary Adams 5346 11/4/63 28765
    Sally Chang 1654 7/22/54 650000
    Billy Black 1683 9/23/44 336500

    % nawk '/Sally/{print "\t\tHave a nice day, " $1, $2 "\!"}' employees
                    Have a nice day, Sally Chang!

EXPLANATION

If the line contains the pattern Sally, the print function prints the following, in order:

- two tabs;
- the string Have a nice day,;
- the first field ($1 is Sally);
- a space (from the comma);
- the second field ($2 is Chang);
- a string containing an exclamation mark.

The backslash before the ! keeps the C shell from treating it as a history reference.
When printing numbers, you may want to control their format. Normally this is done with the printf function, but the special awk variable OFMT can be set to control how the print function prints numbers. OFMT is set by default to "%.6g", which prints up to six significant digits. (The following example shows how this value can be changed.)
    % nawk 'BEGIN{OFMT="%.2f"; print 1.2456789, 12E-2}'
    1.25 0.12
EXPLANATION

The OFMT variable is set so that floating point numbers (f) will be printed with two digits following the decimal point. The percent sign (%) indicates that a format is being specified.
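The sketch below reuses the values from the example above and adds an integer. It assumes a POSIX-compatible awk invoked as awk (nawk on some older systems). In such awks, OFMT affects only non-integer values printed with print; integral values are still printed as integers, and printf ignores OFMT entirely.

```shell
#!/bin/sh
# OFMT governs how print converts non-integer numbers to text.
awk 'BEGIN { OFMT = "%.2f"; print 1.2456789, 12E-2, 42 }'

# printf never consults OFMT; its own format string decides.
awk 'BEGIN { OFMT = "%.2f"; printf "%.4f\n", 1.2456789 }'
```

The first command prints `1.25 0.12 42`; the second prints `1.2457`.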
When printing output, you may want to specify the amount of space between fields so that columns line up neatly. Since the print function with tabs does not always guarantee the desired output, the printf function can be used for formatting fancy output.
The printf function prints a formatted string to standard output, like the printf statement in C. The printf statement consists of a quoted control string that may be embedded with format specifications and modifiers. The control string is followed by a comma and a list of comma-separated expressions. These expressions are formatted according to the specifications stated in the control string. Unlike the print function, printf does not provide a newline. The escape sequence \n must be provided if a newline is desired.
For each percent sign and format specifier, there must be a corresponding argument. To print a literal percent sign, two percent signs must be used. See Table 5.2 for a list of printf conversion characters and Table 5.3 for printf modifiers. The format specifiers are preceded by a percent sign; see Table 5.4 for a list of printf format specifiers.
When an argument is printed, the place where the output is printed is called the field, and the width of the field is the number of characters contained in that field.
The pipe symbol (vertical bar) in the following examples, when part of the printf string, is part of the text and is used to indicate where the formatting begins and ends.
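    1  % echo "UNIX" | nawk '{printf "|%-15s|\n", $1}'
       (Output)
       |UNIX           |

    2  % echo "UNIX" | nawk '{printf "|%15s|\n", $1}'
       (Output)
       |           UNIX|

EXPLANATION

1. The output of the echo command, UNIX, is piped to nawk. The printf control string contains the format specification %-15s:
   - the percent sign introduces the specification;
   - the dash means left-justify;
   - 15s means a string printed in a field 15 characters wide.

   The vertical bars show where the field begins and ends.
2. Because the dash is omitted, the same string is printed right-justified in a 15-character field.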
    % cat employees
    Tom Jones 4424 5/12/66 543354
    Mary Adams 5346 11/4/63 28765
    Sally Chang 1654 7/22/54 650000
    Billy Black 1683 9/23/44 336500

    % nawk '{printf "The name is: %-15s ID is %8d\n", $1, $3}' employees
    The name is: Tom             ID is     4424
    The name is: Mary            ID is     5346
    The name is: Sally           ID is     1654
    The name is: Billy           ID is     1683

EXPLANATION

The string to be printed is enclosed in double quotes. The first format specifier is %-15s. Its corresponding argument, $1, is positioned directly to the right of the comma after the closing quote in the control string. The percent sign indicates a format specification: the dash means left-justify, and 15s means a string in a field 15 characters wide.

At this spot, printf prints a left-justified string in a 15-character field, followed by the string ID is and a number. The %8d format specifies that the decimal (integer) value of $3 will be printed in its place within the string. The number will be right-justified and take up eight spaces. Placing the quoted string and expressions within parentheses is optional.
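Field width and justification can be reproduced in a few lines. This sketch assumes a POSIX-compatible awk invoked as awk (nawk on some older systems). It pipes two lines of the employees data in directly instead of reading the file.

```shell
#!/bin/sh
# %-15s pads the string on the right to 15 columns (left-justified);
# %15s pads on the left (right-justified). The bars mark the field edges.
echo UNIX | awk '{ printf "|%-15s|\n", $1 }'
echo UNIX | awk '{ printf "|%15s|\n", $1 }'

# The employees report: name left-justified in 15 columns, ID
# right-justified in 8. printf adds no newline, so \n is explicit.
printf '%s\n' 'Tom Jones 4424 5/12/66 543354' 'Mary Adams 5346 11/4/63 28765' |
awk '{ printf "The name is: %-15s ID is %8d\n", $1, $3 }'
```

Because every line is padded to the same widths, the ID is column lines up no matter how long each first name is.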
Conversion Character | Definition |
---|---|
c | Character. |
s | String. |
d | Decimal number. |
ld | Long decimal number. |
u | Unsigned decimal number. |
lu | Long unsigned decimal number. |
x | Hexadecimal number. |
lx | Long hexadecimal number. |
o | Octal number. |
lo | Long octal number. |
e | Floating point number in scientific notation (e-notation). |
f | Floating point number. |
g | Floating point number using either e or f conversion, whichever takes the least space. |
Character | Definition |
---|---|
- | Left-justification modifier. |
# | Integers in octal format are displayed with a leading 0; integers in hexadecimal form are displayed with a leading 0x. |
+ | For conversions using d, e, f, and g, numbers are displayed with a numeric sign, + or -. |
0 | The displayed value is padded with zeros instead of whitespace. |
Format Specifier | What It Does |
---|---|
Given x = 'A', y = 15, z = 2.3, and $1 = Bob Smith: | |
%c | Prints a single ASCII character. printf("The character is %c\n",x) prints: The character is A. |
%d | Prints a decimal number. printf("The boy is %d years old\n", y) prints: The boy is 15 years old. |
%e | Prints the e notation of a number. printf("z is %e\n",z) prints: z is 2.300000e+00. |
%f | Prints a floating point number. printf("z is %f\n", 2.3 *2) prints: z is 4.600000. |
%o | Prints the octal value of a number. printf("y is %o\n", y) prints: y is 17. |
%s | Prints a string of characters. printf("The name of the culprit is %s\n", $1) prints: The name of the culprit is Bob Smith. |
%x | Prints the hex value of a number. printf ("y is %x\n", y) prints: y is f. |
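The sample values from Table 5.4 can be run through each conversion in one program. This sketch assumes a POSIX-compatible awk invoked as awk (nawk on some older systems). The second printf exercises the #, +, and 0 modifiers from Table 5.3. The | characters are only separators in the output.

```shell
#!/bin/sh
# x, y, and z take the values given at the top of Table 5.4.
awk 'BEGIN {
    x = "A"; y = 15; z = 2.3
    # Conversions: character, decimal, e-notation, float, octal, hex.
    printf("%c|%d|%e|%f|%o|%x\n", x, y, z, z * 2, y, y)
    # Modifiers: alternate form (#), explicit sign (+), zero padding (0).
    printf("%#o|%#x|%+d|%05d\n", y, y, y, 42)
}'
```

The program prints two lines:

- `A|15|2.300000e+00|4.600000|17|f`
- `017|0xf|+15|00042`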
If awk commands are placed in a file, the -f option is used with the name of the awk file, followed by the name of the input file to be processed. A record is read into awk's buffer, and each of the commands in the awk file is tested and executed for that record. After awk has finished with the first record, it is discarded, the next record is read into the buffer, and so on. If an action is not controlled by a pattern, the action is performed on every record. If a pattern does not have an action associated with it, the default is to print each record that the pattern matches.
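    (The Database)
    $1 $2 $3 $4 $5
    Tom Jones 4424 5/12/66 543354
    Mary Adams 5346 11/4/63 28765
    Sally Chang 1654 7/22/54 650000
    Billy Black 1683 9/23/44 336500

    % cat awkfile
    1  /^Mary/{print "Hello Mary!"}
    2  {print $1, $2, $3}

    % nawk -f awkfile employees
    Tom Jones 4424
    Hello Mary!
    Mary Adams 5346
    Sally Chang 1654
    Billy Black 1683

EXPLANATION

1. If the record begins with the regular expression Mary, the string Hello Mary! is printed. The action is controlled by the pattern preceding it.
2. This action has no pattern, so it is performed on every record: the first, second, and third fields of each record are printed.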
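The awkfile example can be recreated in a scratch directory to watch both instructions run against each record in turn. This sketch assumes two things:

- a POSIX-compatible awk is invoked as awk (nawk on some older systems);
- mktemp -d is available, as it is on most modern systems though it is not required by POSIX.

The directory name it creates is arbitrary.

```shell
#!/bin/sh
# Scratch directory for the script and data files.
dir=$(mktemp -d) || exit 1

# The two-line awk program from the example above.
cat > "$dir/awkfile" <<'EOF'
/^Mary/{print "Hello Mary!"}
{print $1, $2, $3}
EOF

# A shortened copy of the employees data.
cat > "$dir/employees" <<'EOF'
Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
EOF

# -f names the program file; the input file follows it.
awk -f "$dir/awkfile" "$dir/employees"

rm -rf "$dir"
```

The output is `Tom Jones 4424`, `Hello Mary!`, and `Mary Adams 5346`, one per line. The patterned instruction fires only on Mary's record; the unpatterned one fires on both.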
Awk does not see input data as an endless string of characters, but sees it as having a format or structure. By default, each line is called a record and is terminated with a newline.
The Record Separator. By default, the output and input record separator (line separator) is a newline, stored in the built-in awk variables ORS and RS, respectively. The ORS and RS values can be changed, but only in a limited fashion.
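A minimal sketch of changing both separators in a BEGIN block follows. It assumes a POSIX-compatible awk invoked as awk (nawk on some older systems). It uses only single-character separators, since older versions of awk honor only the first character of RS.

```shell
#!/bin/sh
# ORS: end each output record with "; " instead of a newline.
printf 'Tom\nMary\nSally\n' | awk 'BEGIN { ORS = "; " } { print $1 }'
echo    # the last record ended in "; ", so finish the line here

# RS: treat each semicolon as the end of an input record.
printf 'Tom;Mary;Sally' | awk 'BEGIN { RS = ";" } { print NR ": " $0 }'
```

The first command prints `Tom; Mary; Sally; ` on one line. The second prints `1: Tom`, `2: Mary`, and `3: Sally`, each on its own line.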
The $0 Variable. An entire record is referenced as $0 by awk. (When $0 is changed by substitution or assignment, the value of NF, the number of fields, may change.) The record separator is stored in awk's built-in variable RS, a newline by default.
    % cat employees
    Tom Jones 4424 5/12/66 543354
    Mary Adams 5346 11/4/63 28765
    Sally Chang 1654 7/22/54 650000
    Billy Black 1683 9/23/44 336500

    % nawk '{print $0}' employees
    Tom Jones 4424 5/12/66 543354
    Mary Adams 5346 11/4/63 28765
    Sally Chang 1654 7/22/54 650000
    Billy Black 1683 9/23/44 336500

EXPLANATION

The nawk variable $0 holds the current record, which is printed to the screen. By default, print with no arguments prints $0, so nawk would also print the record if the command were:

    % nawk '{print}' employees
The NR Variable. The number of each record is stored in awk's built-in variable, NR. Each time a record is read, NR is incremented by one. NR is therefore 1 while the first record is being processed, 2 while the second is being processed, and so on.
    % cat employees
    Tom Jones 4424 5/12/66 543354
    Mary Adams 5346 11/4/63 28765
    Sally Chang 1654 7/22/54 650000
    Billy Black 1683 9/23/44 336500

    % nawk '{print NR, $0}' employees
    1 Tom Jones 4424 5/12/66 543354
    2 Mary Adams 5346 11/4/63 28765
    3 Sally Chang 1654 7/22/54 650000
    4 Billy Black 1683 9/23/44 336500

EXPLANATION

Each record, $0, is printed as it is stored in the file and is preceded by the number of the record, NR.
Each record consists of words called fields, which by default are separated by whitespace, that is, blank spaces or tabs. Awk keeps track of the number of fields in its built-in variable, NF. The value of NF can vary from line to line. The maximum number of fields is implementation-dependent (typically 100 fields per line in older versions). New fields can be created. The following example has four records (lines) and five fields (columns). Each record starts at the first field, represented as $1, then moves to the second field, $2, and so forth.
(Fields are represented by a dollar sign and the number of the field.)

    (The Database)
    $1 $2 $3 $4 $5
    Tom Jones 4424 5/12/66 543354
    Mary Adams 5346 11/4/63 28765
    Sally Chang 1654 7/22/54 650000
    Billy Black 1683 9/23/44 336500

    % nawk '{print NR, $1, $2, $5}' employees
    1 Tom Jones 543354
    2 Mary Adams 28765
    3 Sally Chang 650000
    4 Billy Black 336500

EXPLANATION

Nawk will print the number of the record (NR) and the first, second, and fifth fields (columns) of each line in the file.
    % nawk '{print $0, NF}' employees
    Tom Jones 4424 5/12/66 543354 5
    Mary Adams 5346 11/4/63 28765 5
    Sally Chang 1654 7/22/54 650000 5
    Billy Black 1683 9/23/44 336500 5

EXPLANATION

Nawk will print each record ($0) in the file, followed by the number of fields.
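Two properties of NF mentioned above, that it varies per record and that new fields can be created, can be seen in a short sketch. It assumes a POSIX-compatible awk invoked as awk (nawk on some older systems). The input lines are made up for illustration.

```shell
#!/bin/sh
# NF is recomputed for every record; $NF is the last field.
printf 'a b c\nd e\n' | awk '{ print NF, $NF }'

# Assigning to a field past NF creates it: NF grows, any skipped
# field ($4 here) is empty, and $0 is rebuilt using OFS.
echo 'Tom Jones 4424' | awk '{ $5 = "new"; print NF; print }'
```

The first command prints `3 c` and `2 e`. The second prints `5` and then `Tom Jones 4424  new`. The two spaces before new surround the empty fourth field.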
The Input Field Separator. Awk's built-in variable, FS, holds the value of the input field separator. When the default value of FS is used, awk separates fields by spaces and/or tabs, stripping leading blanks and tabs. FS can be changed by assigning a new value to it, either in a BEGIN block or at the command line. For now, we will assign the new value at the command line. To change the value of FS at the command line, use the -F option, followed by the character representing the new separator.
Changing the Field Separator at the Command Line
    % cat employees2
    Tom Jones:4424:5/12/66:543354
    Mary Adams:5346:11/4/63:28765
    Sally Chang:1654:7/22/54:650000
    Billy Black:1683:9/23/44:336500

    % nawk -F: '/Tom Jones/{print $1, $2}' employees2
    Tom Jones 4424

EXPLANATION

The -F option is used to reassign the value of the input field separator at the command line. When a colon is placed directly after the -F option, nawk looks for colons to separate the fields in the employees2 file.
Using More Than One Field Separator. You may specify more than one input field separator. If FS is more than one character long, it is treated as a regular expression. A bracket expression lists a set of single characters, any one of which separates fields. In the following example, the field separator is a space, colon, or tab. (The old version of awk did not support this feature.)
    % nawk -F'[ :\t]' '{print $1, $2, $3}' employees
    Tom Jones 4424
    Mary Adams 5346
    Sally Chang 1654
    Billy Black 1683
EXPLANATION

The -F option is followed by a regular expression enclosed in brackets. If a space, colon, or tab is encountered, nawk will use that character as a field separator. The expression is surrounded by quotes so that the shell will not interpret the metacharacters itself. (Remember that the shell uses brackets for filename expansion.)
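The bracket-expression separator can be given either on the command line or in a BEGIN block, as the text notes. This sketch assumes a POSIX-compatible awk invoked as awk (nawk on some older systems). It builds a record that mixes a space, colons, and a tab.

```shell
#!/bin/sh
# A record containing a space, colons, and a tab.
record=$(printf 'Tom Jones:4424\t5/12/66')

# Bracket expression on the command line, as in the example above.
echo "$record" | awk -F'[ :\t]' '{ print $1, $2, $3, $4 }'

# The same separator assigned to FS in a BEGIN block instead.
echo "$record" | awk 'BEGIN { FS = "[ :\t]" } { print NF }'
```

The first command prints `Tom Jones 4424 5/12/66`, and the second prints `4`.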
The Output Field Separator. The default output field separator is a single space and is stored in awk's built-in variable, OFS. In all of the examples thus far, we have used the print statement to send output to the screen. The comma that separates fields in a print statement evaluates to whatever OFS has been set to. If the default is used, the comma inserted between $1 and $2 evaluates to a single space, and print prints the fields with a space between them. OFS can be changed.

    % cat employees2
    Tom Jones:4424:5/12/66:543354
    Mary Adams:5346:11/4/63:28765
    Sally Chang:1654:7/22/54:650000
    Billy Black:1683:9/23/44:336500

    (The Command Line)
    % nawk -F: '/Tom Jones/{print $1, $2, $3, $4}' employees2
    Tom Jones 4424 5/12/66 543354

EXPLANATION

The output field separator, a space, is stored in nawk's OFS variable. The comma between the fields evaluates to whatever is stored in OFS. The fields are printed to standard output separated by a space.

    % nawk -F: '/Tom Jones/{print $1 $2 $3 $4}' employees2
    Tom Jones44245/12/66543354

EXPLANATION

The fields are jammed together because no comma separates them. OFS is not inserted unless a comma separates the fields.

    % nawk -F: '/Tom Jones/{print $0}' employees2
    Tom Jones:4424:5/12/66:543354

EXPLANATION

The $0 variable holds the current record exactly as it is found in the input file. The record is printed as is.
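Changing OFS is usually done once, in a BEGIN block. This sketch assumes a POSIX-compatible awk invoked as awk (nawk on some older systems). The " | " separator is an arbitrary choice for illustration.

```shell
#!/bin/sh
rec='Tom Jones:4424:5/12/66:543354'

# Each comma in print now emits " | " instead of a space.
echo "$rec" | awk -F: 'BEGIN { OFS = " | " } { print $1, $2, $3 }'

# Without commas OFS is never inserted, and $0 on its own is
# printed exactly as it was read.
echo "$rec" | awk -F: 'BEGIN { OFS = " | " } { print $1 $2; print $0 }'
```

The first command prints `Tom Jones | 4424 | 5/12/66`. The second prints `Tom Jones4424` followed by the unchanged record.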
Awk patterns control what actions awk will take on a line of input. A pattern consists of a regular expression, an expression resulting in a true or false condition, or a combination of these. The default action is to print each line where the expression results in a true condition. When reading a pattern expression, there is an implied if statement. When an if is implied, there can be no curly braces surrounding it. When the if is explicit, it becomes an action statement and the syntax is different. (See "Conditional Statements".)
    % cat employees
    Tom Jones 4424 5/12/66 543354
    Mary Adams 5346 11/4/63 28765
    Sally Chang 1654 7/22/54 650000
    Billy Black 1683 9/23/44 336500

    (The Command Line)
    1  % nawk '/Tom/' employees
       Tom Jones 4424 5/12/66 543354

    2  % nawk '$3 < 4000' employees
       Sally Chang 1654 7/22/54 650000
       Billy Black 1683 9/23/44 336500

EXPLANATION

1. If the pattern Tom is matched in the input file, the record is printed.
2. If the third field is less than 4000, the record is printed.
Actions are statements enclosed within curly braces and separated by semicolons.[2] If a pattern precedes an action, the pattern dictates when the action will be performed. Actions can be simple statements or complex groups of statements. Statements are separated by semicolons, or by a newline if placed on their own line.
    nawk '/Tom/{print "hi Tom"};{x=5}' file
FORMAT

    { action }

EXAMPLE

    { print $1, $2 }

EXPLANATION

The action is to print fields 1 and 2. Patterns can be associated with actions. Remember, actions are statements enclosed in curly braces. A pattern controls the action from the first opening curly brace to its matching closing curly brace. If an action follows a pattern, the opening curly brace must be on the same line as the pattern. Patterns are never enclosed in curly braces.
FORMAT

    pattern{ action statement; action statement; etc. }

or

    pattern{
        action statement
        action statement }
    % nawk '/Tom/{print "Hello there, " $1}' employees
    Hello there, Tom

EXPLANATION

If the record contains the pattern Tom, the string Hello there, Tom is printed. A pattern with no action displays all lines matching the pattern. String-matching patterns contain regular expressions enclosed in forward slashes.
A regular expression to awk is a pattern that consists of characters enclosed in forward slashes. Awk supports the use of regular expression metacharacters (same as egrep) to modify the regular expression in some way. If a string in the input line is matched by the regular expression, the resulting condition is true, and any actions associated with the expression are executed. If no action is specified and an input line is matched by the regular expression, the record is printed. See Table 5.5.
    % nawk '/Mary/' employees
    Mary Adams 5346 11/4/63 28765

EXPLANATION

All lines in the employees file containing the regular expression pattern Mary are displayed.

    % nawk '/Mary/{print $1, $2}' employees
    Mary Adams

EXPLANATION

The first and second fields of all lines in the employees file containing the regular expression pattern Mary are displayed.
Metacharacter | Meaning |
---|---|
^ | Matches at the beginning of a string. |
$ | Matches at the end of a string. |
. | Matches any single character. |
* | Matches zero or more of the preceding character. |
+ | Matches one or more of the preceding character. |
? | Matches zero or one of the preceding character. |
[ABC] | Matches any one character in the set of characters, i.e., A, B, or C. |
[^ABC] | Matches any one character not in the set of characters, i.e., not A, B, or C. |
[A-Z] | Matches any character in the range from A to Z. |
A|B | Matches either A or B. |
(AB)+ | Matches one or more sets of AB. |
\* | Matches a literal asterisk. |
& | Used in the replacement string to represent what was found in the search string. |
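Several rows of Table 5.5 can be exercised against the employee names. This sketch assumes a POSIX-compatible awk invoked as awk (nawk on some older systems). The last command uses sub, a standard awk substitution function that this section does not otherwise cover; it appears here only to show what & does in a replacement string.

```shell
#!/bin/sh
names='Tom Jones
Mary Adams
Sally Chang
Billy Black'

# ^ anchor plus alternation in a group: lines starting with Tom or Mary.
printf '%s\n' "$names" | awk '/^(Tom|Mary) /'

# Range in a bracket expression: last names starting with A through C.
printf '%s\n' "$names" | awk '$2 ~ /^[A-C]/ { print $2 }'

# + and $: first names ending in one or more l's followed by y.
printf '%s\n' "$names" | awk '$1 ~ /l+y$/ { print $1 }'

# & in a replacement string stands for the text that was matched.
echo 'Tom Jones' | awk '{ sub(/Tom/, "[&]"); print }'
```

The commands print the following, in order:

- Tom's and Mary's lines;
- Adams, Chang, and Black;
- Sally and Billy;
- `[Tom] Jones`.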
    % nawk '/^Mary/' employees
    Mary Adams 5346 11/4/63 28765

EXPLANATION

All lines in the employees file that start with the regular expression Mary are displayed.

    % nawk '/^[A-Z][a-z]+ /' employees
    Tom Jones 4424 5/12/66 543354
    Mary Adams 5346 11/4/63 28765
    Sally Chang 1654 7/22/54 650000
    Billy Black 1683 9/23/44 336500

EXPLANATION

All lines in the employees file that begin with an uppercase letter, followed by one or more lowercase letters and then a space, are displayed.
The match operator, the tilde (~), is used to match an expression within a record or field.
    % cat employees
    Tom Jones 4424 5/12/66 543354
    Mary Adams 5346 11/4/63 28765
    Sally Chang 1654 7/22/54 650000
    Billy Black 1683 9/23/44 336500

    % nawk '$1 ~ /[Bb]ill/' employees
    Billy Black 1683 9/23/44 336500

EXPLANATION

Any lines matching Bill or bill in the first field are displayed.
    % nawk '$1 !~ /ly$/' employees
    Tom Jones 4424 5/12/66 543354
    Mary Adams 5346 11/4/63 28765

EXPLANATION

Any lines whose first field does not end in ly are displayed.
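The match operators can be tried on a short list of first names; the list is made up for illustration. This sketch assumes a POSIX-compatible awk invoked as awk (nawk on some older systems). The last command shows a dynamic regular expression, where the right side of ~ is a string variable holding the pattern. Here that variable, pat, is a hypothetical name passed in with -v.

```shell
#!/bin/sh
firsts='Tom
Billy
bill
Sally'

# ~ : the first field contains Bill or bill.
printf '%s\n' "$firsts" | awk '$1 ~ /[Bb]ill/'

# !~ : the first field does not end in ly.
printf '%s\n' "$firsts" | awk '$1 !~ /ly$/'

# Dynamic regular expression: the pattern comes from a variable.
printf '%s\n' "$firsts" | awk -v pat='^[ST]' '$1 ~ pat'
```

The commands print `Billy` and `bill`, then `Tom` and `bill`, then `Tom` and `Sally`.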
When you have multiple awk pattern/action statements, it is often much easier to put the statements in a script. The script is a file containing awk comments and statements. If statements and actions are on the same line, they are separated by semicolons. If statements are on separate lines, semicolons are not necessary. If an action follows a pattern, the opening curly brace must be on the same line as the pattern. Comments are preceded by a pound (#) sign.
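    % cat employees
    Tom Jones:4424:5/12/66:54335
    Mary Adams:5346:11/4/63:28765
    Billy Black:1683:9/23/44:336500
    Sally Chang:1654:7/22/54:65000
    Jose Tomas:1683:9/23/44:33650

    (The Awk Script)
    % cat info
    1  # My first awk script by Jack Sprat
       # Script name: info; Date: February 28, 2001
    2  /Tom/{print "Tom's birthday is "$3}
    3  /Mary/{print NR, $0}
    4  /^Sally/{print "Hi Sally. " $1 " has a salary of $" $4 "."}
       # End of info script

    (The Command Line)
    5  % nawk -F: -f info employees
       Tom's birthday is 5/12/66
       2 Mary Adams:5346:11/4/63:28765
       Hi Sally. Sally Chang has a salary of $65000.
       Tom's birthday is 9/23/44

EXPLANATION

1. These are comments, each beginning with a pound sign (#).
2. If the record contains the regular expression Tom, the string Tom's birthday is and the third field ($3) are printed. Because the pattern is not anchored, the record for Jose Tomas also matches.
3. If the record contains the regular expression Mary, the record number (NR) and the entire record ($0) are printed.
4. If the record begins with Sally, the string Hi Sally. is printed, followed by:
   - the first field ($1);
   - the string has a salary of $;
   - the fourth field ($4);
   - a period.
5. The -F: option sets the field separator to a colon, and the -f option names the script file, info, which is applied to the employees file.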
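The script-file rules described above (comments after #, the opening brace on the pattern's line, no semicolons between statements on separate lines) can be checked in one small script. This sketch assumes two things:

- a POSIX-compatible awk is invoked as awk (nawk on some older systems);
- mktemp is available, as it is on most modern systems.

The script contents are adapted from the info example.

```shell
#!/bin/sh
# Write a commented awk script to a temporary file.
script=$(mktemp) || exit 1
cat > "$script" <<'EOF'
# Comments run from # to the end of the line.
/Tom/ { print "Tom's birthday is " $3 }
/^Sally/ {
    # One statement per line, so no semicolons are needed.
    print "Hi Sally."
    print $1 " has a salary of $" $4 "."
}
EOF

# Colon-separated input, as in the example above.
printf '%s\n' 'Tom Jones:4424:5/12/66:54335' 'Sally Chang:1654:7/22/54:65000' |
awk -F: -f "$script"

rm -f "$script"
```

The output is `Tom's birthday is 5/12/66`, `Hi Sally.`, and `Sally Chang has a salary of $65000.`, each on its own line.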
The examples in this section use a sample database, called datafile. In the database, the input field separator, FS, is whitespace, the default. The number of fields, NF, is 8. The number may vary from line to line, but in this file, the number of fields is fixed. The record separator, RS, is the newline, which separates each line of the file. Awk keeps track of the number of each record in the NR variable. The output field separator, OFS, is a space. If a comma is used to separate fields, when the line is printed, each field printed will be separated by a space.
    % cat datafile
    northwest NW Joel Craig 3.0 .98 3 4
    western WE Sharon Kelly 5.3 .97 5 23
    southwest SW Chris Foster 2.7 .8 2 18
    southern SO May Chin 5.1 .95 4 15
    southeast SE Derek Johnson 4.0 .7 4 17
    eastern EA Susan Beal 4.4 .84 5 20
    northeast NE TJ Nichols 5.1 .94 3 13
    north NO Val Shultz 4.5 .89 5 9
    central CT Sheri Watson 5.7 .94 5 13
    nawk '/west/' datafile
    northwest NW Joel Craig 3.0 .98 3 4
    western WE Sharon Kelly 5.3 .97 5 23
    southwest SW Chris Foster 2.7 .8 2 18

EXPLANATION

All lines containing the pattern west are printed.

    nawk '/^north/' datafile
    northwest NW Joel Craig 3.0 .98 3 4
    northeast NE TJ Nichols 5.1 .94 3 13
    north NO Val Shultz 4.5 .89 5 9

EXPLANATION

All lines beginning with the pattern north are printed.

    nawk '/^(no|so)/' datafile
    northwest NW Joel Craig 3.0 .98 3 4
    southwest SW Chris Foster 2.7 .8 2 18
    southern SO May Chin 5.1 .95 4 15
    southeast SE Derek Johnson 4.0 .7 4 17
    northeast NE TJ Nichols 5.1 .94 3 13
    north NO Val Shultz 4.5 .89 5 9

EXPLANATION

All lines beginning with the pattern no or so are printed.

    nawk '{print $3, $2}' datafile
    Joel NW
    Sharon WE
    Chris SW
    May SO
    Derek SE
    Susan EA
    TJ NE
    Val NO
    Sheri CT

EXPLANATION

The output field separator, OFS, is a space by default. The comma between $3 and $2 is translated to the value of OFS. The third field is printed, followed by a space and the second field.
    nawk '{print $3 $2}' datafile
    JoelNW
    SharonWE
    ChrisSW
    MaySO
    DerekSE
    SusanEA
    TJNE
    ValNO
    SheriCT

EXPLANATION

The third field is followed by the second field. Since no comma separates fields $3 and $2, the output is displayed without spaces between the fields.

    nawk 'print $1' datafile
    nawk: syntax error at source line 1
     context is
            >>> print <<< $1
    nawk: bailing out at source line 1

EXPLANATION

This is the nawk (new awk) error message. Nawk error messages are much more verbose than those of the old awk. In this program, the curly braces are missing in the action statement.

    awk 'print $1' datafile
    awk: syntax error near line 1
    awk: bailing out near line 1

EXPLANATION

This is the awk (old awk) error message. Old awk programs were difficult to debug, since almost all errors produced this same message. The curly braces are missing in the action statement.

    nawk '{print $0}' datafile
    northwest NW Joel Craig 3.0 .98 3 4
    western WE Sharon Kelly 5.3 .97 5 23
    southwest SW Chris Foster 2.7 .8 2 18
    southern SO May Chin 5.1 .95 4 15
    southeast SE Derek Johnson 4.0 .7 4 17
    eastern EA Susan Beal 4.4 .84 5 20
    northeast NE TJ Nichols 5.1 .94 3 13
    north NO Val Shultz 4.5 .89 5 9
    central CT Sheri Watson 5.7 .94 5 13

EXPLANATION

Each record is printed. $0 holds the current record.

    nawk '{print "Number of fields: "NF}' datafile
    Number of fields: 8
    Number of fields: 8
    Number of fields: 8
    Number of fields: 8
    Number of fields: 8
    Number of fields: 8
    Number of fields: 8
    Number of fields: 8
    Number of fields: 8

EXPLANATION

There are 8 fields in each record. The built-in awk variable NF holds the number of fields and is reset for each record.
    nawk '/northeast/{print $3, $2}' datafile
    TJ NE

EXPLANATION

If the record contains (or matches) the pattern northeast, the third field, followed by the second field, is printed.

    nawk '/E/' datafile
    western WE Sharon Kelly 5.3 .97 5 23
    southeast SE Derek Johnson 4.0 .7 4 17
    eastern EA Susan Beal 4.4 .84 5 20
    northeast NE TJ Nichols 5.1 .94 3 13

EXPLANATION

If the record contains an E, the entire record is printed.

    nawk '/^[ns]/{print $1}' datafile
    northwest
    southwest
    southern
    southeast
    northeast
    north

EXPLANATION

If the record begins with an n or s, the first field is printed.

    nawk '$5 ~ /\.[7-9]+/' datafile
    southwest SW Chris Foster 2.7 .8 2 18
    central CT Sheri Watson 5.7 .94 5 13

EXPLANATION

If the fifth field ($5) contains a literal period followed by one or more digits in the range 7 through 9, the record is printed.

    nawk '$2 !~ /E/{print $1, $2}' datafile
    northwest NW
    southwest SW
    southern SO
    north NO
    central CT

EXPLANATION

If the second field does not contain the pattern E, the first field followed by the second field ($1, $2) is printed.
    nawk '$3 ~ /^Joel/{print $3 " is a nice guy."}' datafile
    Joel is a nice guy.

EXPLANATION

If the third field ($3) begins with the pattern Joel, the third field is printed, followed by the string is a nice guy. Note that a space must be included in the string if it is to be printed.

    nawk '$8 ~ /[0-9][0-9]$/{print $8}' datafile
    23
    18
    15
    17
    20
    13
    13

EXPLANATION

If the eighth field ($8) ends in two digits, it is printed.

    nawk '$4 ~ /Chin$/{print "The price is $" $8 "."}' datafile
    The price is $15.

EXPLANATION

If the fourth field ($4) ends with Chin, the following are printed:

- the string enclosed in double quotes ("The price is $");
- the eighth field ($8);
- the string containing a period.

    nawk '/TJ/{print $0}' datafile
    northeast NE TJ Nichols 5.1 .94 3 13

EXPLANATION

If the record contains the pattern TJ, $0 (the record) is printed.
    % cat datafile2
    Joel Craig:northwest:NW:3.0:.98:3:4
    Sharon Kelly:western:WE:5.3:.97:5:23
    Chris Foster:southwest:SW:2.7:.8:2:18
    May Chin:southern:SO:5.1:.95:4:15
    Derek Johnson:southeast:SE:4.0:.7:4:17
    Susan Beal:eastern:EA:4.4:.84:5:20
    TJ Nichols:northeast:NE:5.1:.94:3:13
    Val Shultz:north:NO:4.5:.89:5:9
    Sheri Watson:central:CT:5.7:.94:5:13

    nawk '{print $1}' datafile2
    Joel
    Sharon
    Chris
    May
    Derek
    Susan
    TJ
    Val
    Sheri

EXPLANATION

The default input field separator is whitespace. The first field ($1) is printed.
    nawk -F: '{print $1}' datafile2
    Joel Craig
    Sharon Kelly
    Chris Foster
    <more output here>
    Val Shultz
    Sheri Watson

EXPLANATION

The -F option specifies the colon as the input field separator. The first field ($1) is printed.

    nawk '{print "Number of fields: "NF}' datafile2
    Number of fields: 2
    Number of fields: 2
    Number of fields: 2
    <more of the same output here>
    Number of fields: 2
    Number of fields: 2

EXPLANATION

Since the field separator is the default (whitespace), the number of fields for each record is 2. The only space is between the first and last name.

    nawk -F: '{print "Number of fields: "NF}' datafile2
    Number of fields: 7
    Number of fields: 7
    Number of fields: 7
    <more of the same output here>
    Number of fields: 7
    Number of fields: 7

EXPLANATION

Since the field separator is a colon, the number of fields in each record is 7.

    nawk -F"[ :]" '{print $1, $2}' datafile2
    Joel Craig
    Sharon Kelly
    Chris Foster
    May Chin
    Derek Johnson
    Susan Beal
    TJ Nichols
    Val Shultz
    Sheri Watson

EXPLANATION

With nawk, multiple field separators can be specified as a regular expression. Either a space or a colon will be treated as a field separator. The first and second fields ($1, $2) are printed.
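The three field-separator settings used with datafile2 can be compared on a single record. This sketch assumes a POSIX-compatible awk invoked as awk (nawk on some older systems). It uses the first line of datafile2 as inline input.

```shell
#!/bin/sh
# The first record of datafile2, split three different ways.
line='Joel Craig:northwest:NW:3.0:.98:3:4'

echo "$line" | awk '{ print NF }'           # default FS: whitespace
echo "$line" | awk -F: '{ print NF }'       # colon
echo "$line" | awk -F'[ :]' '{ print NF }'  # space or colon
```

The commands print 2, 7, and 8. The last count is 8 because the space between Joel and Craig now splits the name into two fields.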
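    cat nawk.sc1
    # This is a comment
    # This is my first awk script
    1  /^north/{print $1, $2, $3}
    2  /^south/{print "The " $1 " district."}

    3  nawk -f nawk.sc1 datafile
       northwest NW Joel
       The southwest district.
       The southern district.
       The southeast district.
       northeast NE TJ
       north NO Val

EXPLANATION

1. If the record begins with north, the first, second, and third fields ($1, $2, $3) are printed.
2. If the record begins with south, the string The, the first field ($1), and the string district. are printed.
3. The -f option precedes the name of the awk script file, nawk.sc1, followed by the input file, datafile.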
Mike Harrington:(510) 548-1278:250:100:175
Christian Dobbins:(408) 538-2358:155:90:201
Susan Dalsass:(206) 654-6279:250:60:50
Archie McNichol:(206) 548-1348:250:100:175
Jody Savage:(206) 548-1278:15:188:150
Guy Quigley:(916) 343-6410:250:100:175
Dan Savage:(406) 298-7744:450:300:275
Nancy McNeil:(206) 548-1278:250:80:75
John Goldenrod:(916) 348-4278:250:100:175
Chet Main:(510) 548-5258:50:95:135
Tom Savage:(408) 926-3456:250:168:200
Elizabeth Stachelin:(916) 440-1763:175:75:300
The database above contains the names, phone numbers, and money contributions to the party campaign for the past three months.
Exercise | Task |
---|---|
1: | Print all the phone numbers. |
2: | Print Dan's phone number. |
3: | Print Susan's name and phone number. |
4: | Print all last names beginning with D. |
5: | Print all first names beginning with either a C or E. |
6: | Print all first names containing only four characters. |
7: | Print the first names of all those in the 916 area code. |
8: | Print Mike's campaign contributions. Each value should be printed with a leading dollar sign; e.g., $250 $100 $175. |
9: | Print last names followed by a comma and the first name. |
10: | Write an awk script called facts that:
|
[1] On SCO UNIX, the new version is spelled awk, and on Linux, the gnu version is spelled awk. This text pertains primarily to the new awk, nawk. The gnu implementation, gawk, is fully upward-compatible with nawk.
[2] On some versions of awk, actions must be separated by semicolons or newlines, and the statements within the curly braces also must be separated by semicolons or newlines. SVR4's nawk requires semicolons or newlines to separate statements within an action, but it does not require semicolons to separate actions. For example, in nawk '/Tom/{print "hi Tom"};{x=5}' file, the semicolon between the two actions could be omitted.