An awk program (from the command line or from program-file) consists of one or more lines containing a pattern and/or action in the following format:

    pattern { action }

The pattern selects lines from the input. The awk utility performs the action on all lines that the pattern selects. The braces surrounding the action enable awk to differentiate it from the pattern. If a program line does not contain a pattern, awk selects all lines in the input. If a program line does not contain an action, awk copies the selected lines to standard output.

To start, awk compares the first line of input (from the file-list or standard input) with each pattern in the program. If a pattern selects the line (if there is a match), awk takes the action associated with the pattern. If the line is not selected, awk takes no action. When awk has completed its comparisons for the first line of input, it repeats the process for the next line of input, continuing this process of comparing subsequent lines of input until it has read all of the input.

If several patterns select the same line, awk takes the actions associated with each of the patterns in the order in which they appear in the program. It is possible for awk to send a single line from the input to standard output more than once.

Patterns

You can use a regular expression (Appendix A), enclosed within slashes, as a pattern. The ~ operator tests whether a field or variable matches a regular expression. The !~ operator tests for no match. You can perform both numeric and string comparisons using the relational operators listed in Table 14-1. You can combine any of the patterns using the Boolean operators || (OR) or && (AND).
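The pattern forms described above can be tried with a short sketch. The sample data (name, quantity, price) and the shell variable names are hypothetical, chosen only for illustration.

```shell
# Hypothetical sample data: name, quantity, price
data='apple 3 1.50
pear 12 0.75
plum 7 2.00
apricot 20 1.25'

# Regular expression pattern with no action: selected lines are copied as is
regex_out=$(printf '%s\n' "$data" | awk '/^ap/')

# ~ tests a single field; && combines it with a numeric comparison
combo_out=$(printf '%s\n' "$data" | awk '$1 ~ /^p/ && $2 > 10 { print $1 }')

# Two patterns select the same line, so awk sends that line to output twice
twice_out=$(printf '%s\n' "$data" | awk '/plum/ { print } $3 >= 2 { print }')

printf '%s\n' "$regex_out" "$combo_out" "$twice_out"
```

The first command displays the apple and apricot lines, the second displays pear, and the third displays the plum line twice because both of its patterns select that line.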
BEGIN and END

Two unique patterns, BEGIN and END, execute commands before awk starts its processing and after it finishes. The awk utility executes the actions associated with the BEGIN pattern before, and with the END pattern after, it processes all the input.

, (comma)

The comma is the range operator. If you separate two patterns with a comma on a single awk program line, awk selects a range of lines, beginning with the first line that matches the first pattern. The last line awk selects is the next line that matches the second pattern. If no line matches the second pattern, awk selects every line through the end of the input. After awk finds the second pattern, it begins the process again by looking for the first pattern.

Actions

The action portion of an awk command causes awk to take that action when it matches a pattern. When you do not specify an action, awk performs the default action, which is the print command (explicitly represented as {print}). This action copies the record (normally a line; see "Variables" later in this chapter) from the input to standard output.

When you follow a print command with arguments, awk displays only the arguments you specify. These arguments can be variables or string constants. You can send the output from a print command to a file (>), append it to a file (>>), or send it through a pipe to the input of another program (|).

Unless you separate items in a print command with commas, awk catenates them. Commas cause awk to separate the items with the output field separator (OFS, normally a SPACE; see "Variables").

You can include several actions on one line by separating them with semicolons.

Comments

The awk utility disregards anything on a program line following a pound sign (#). You can document an awk program by preceding comments with this symbol.

Variables

Although you do not need to declare awk variables prior to their use, you can optionally assign initial values to them.
Unassigned numeric variables are initialized to 0; string variables are initialized to the null string. In addition to user variables, awk maintains program variables. You can use both user and program variables in the pattern and in the action portion of an awk program. Table 14-2 lists a few program variables.
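As a minimal sketch of how BEGIN, END, the range operator, and variables fit together, the following commands count the lines in a range. The input text and the user variable count are hypothetical; the sketch assumes NR (record number) and NF (number of fields), which are standard awk program variables.

```shell
# Hypothetical input with one START ... STOP block
log='header
START
alpha beta
gamma
STOP
trailer'

# count is never assigned an initial value, so it starts at 0
range_out=$(printf '%s\n' "$log" | awk '
BEGIN          { print "begin" }
/START/,/STOP/ { count++; print NR, NF, $0 }
END            { print "lines in range:", count }')

printf '%s\n' "$range_out"
```

The range selects input lines 2 through 5, so the END action reports a count of 4. The commas in each print command separate the items with the output field separator.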
In addition to initializing variables within a program, you can use the -v option to initialize variables on the command line. This feature is useful when the value of a variable changes from one run of awk to the next.

By default the input and output record separators are NEWLINE characters. Thus awk takes each line of input to be a separate record and appends a NEWLINE to the end of each output record. By default the input field separators are SPACEs and TABs. The default output field separator is a SPACE. You can change the value of any of the separators at any time by assigning a new value to its associated variable either from within the program or from the command line by using the -v option.

Functions

Table 14-3 lists a few of the functions that awk provides for manipulating numbers and strings.
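The following sketch sets the separators and a user variable with the -v option and then calls two string functions. The colon-separated data and the variable name min are hypothetical. The sketch assumes length and substr are among the functions the table lists; both are standard awk functions.

```shell
# Hypothetical colon-separated records: name, ID, home directory
users='root:0:/root
sam:1000:/home/sam
max:1001:/home/max'

# -v sets the input field separator, the output field separator, and min
sep_out=$(printf '%s\n' "$users" |
    awk -v FS=':' -v OFS=' -> ' -v min=1000 '$2 >= min { print $1, $3 }')

# length returns the number of characters; substr extracts part of a string
func_out=$(printf 'hello world\n' | awk '{ print length($1), substr($2, 1, 3) }')

printf '%s\n' "$sep_out" "$func_out"
```

Because min is supplied on the command line, the same program can be run with a different threshold without editing it.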
Arithmetic Operators

The awk arithmetic operators listed in Table 14-4 are from the C programming language.
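A brief sketch of some common arithmetic operators follows. It assumes the table includes the usual +, -, *, /, %, +=, and ++ operators along with ^ for exponentiation; the variable names are arbitrary.

```shell
# A BEGIN-only program reads no input, so no file or pipe is needed
arith_out=$(awk 'BEGIN {
    x = 7; y = 2
    print x + y, x - y, x * y, x / y, x % y, x ^ y
    x += 3      # x is now 10
    x++         # x is now 11
    print x
}')

printf '%s\n' "$arith_out"
```

Division in awk is floating-point, so 7 / 2 yields 3.5 rather than 3.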
Associative Arrays

An associative array is one of awk's most powerful features. These arrays use strings as indexes. Using an associative array, you can mimic a traditional array by using numeric strings as indexes.

You assign a value to an element of an associative array just as you would assign a value to any other awk variable. The syntax is

    array[string] = value

where array is the name of the array, string is the index of the element of the array you are assigning a value to, and value is the value you are assigning to that element.

You can use a special for structure with an associative array. The syntax is

    for (elem in array) action

where elem is a variable that takes on the index of each element of the array as the for structure loops through them, array is the name of the array, and action is the action that awk takes for each element in the array. You can use the elem variable in this action.

The "Examples" section found later in this chapter contains programs that use associative arrays.

printf

You can use the printf command in place of print to control the format of the output that awk generates. The awk version of printf is similar to that found in the C language. A printf command has the following syntax:

    printf "control-string", arg1, arg2, ..., argn

The control-string determines how printf formats arg1, arg2, ..., argn. These arguments can be variables or other expressions. Within the control-string you can use \n to indicate a NEWLINE and \t to indicate a TAB.

The control-string contains conversion specifications, one for each argument. A conversion specification has the following syntax:

    %[-][x[.y]]conv

where - causes printf to left-justify the argument; x is the minimum field width; and .y is the number of places to the right of a decimal point in a number. The conv indicates the type of numeric conversion and can be selected from the letters in Table 14-5. Refer to "Examples" later in this chapter for examples of how to use printf.
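The sketch below combines an associative array with printf. The sales data and the array names count and total are hypothetical. It assumes the s, d, and f conversions are among those the table lists. Because the order in which for visits array elements is not defined, the output is piped through sort.

```shell
# Hypothetical data: manufacturer and sale amount
sales='chevy 10.5
ford 20
chevy 4.25
ford 1
vw 2'

# count[] and total[] are indexed by the string in the first field
report=$(printf '%s\n' "$sales" | awk '
    { count[$1]++; total[$1] += $2 }
END { for (make in total)
          printf "%-6s|%3d|%7.2f\n", make, count[make], total[make] }' | sort)

printf '%s\n' "$report"
```

In the control-string, %-6s left-justifies the name in a six-character field, %3d right-justifies the count in three characters, and %7.2f displays the total with two places to the right of the decimal point. The | characters only make the field widths visible.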
Control Structures

Control (flow) statements alter the order of execution of commands within an awk program. This section details the if...else, while, and for control structures. In addition, the break and continue statements work in conjunction with the control structures to alter the order of execution of commands. See page 524 for more information on control structures. You do not need to use braces around commands when you specify a single, simple command.

if...else

The if...else control structure tests the status returned by the condition and transfers control based on this status. The syntax of an if...else structure is shown below. The else part is optional.

    if (condition)
        {commands}
    [else
        {commands}]

The simple if statement shown here does not use braces:

    if ($5 <= 5000) print $0

Next is an awk program that uses a simple if...else structure. Again, there are no braces.

    $ cat if1
    BEGIN {
        nam="sam"
        if (nam == "max")
            print "nam is max"
        else
            print "nam is not max, it is", nam
    }
    $ awk -f if1
    nam is not max, it is sam

while

The while structure loops through and executes the commands as long as the condition is true. The syntax of a while structure is

    while (condition)
        {commands}

The next awk program uses a simple while structure to display powers of 2. This example uses braces because the while loop contains more than one statement.

    $ cat while1
    BEGIN {
        n = 1
        while (n <= 5) {
            print "2^" n, 2**n
            n++
        }
    }
    $ awk -f while1
    2^1 2
    2^2 4
    2^3 8
    2^4 16
    2^5 32

for

The syntax of a for control structure is

    for (init; condition; increment)
        {commands}

A for structure starts by executing the init statement, which usually sets a counter to 0 or 1. It then loops through the commands as long as the condition is true. After each loop it executes the increment statement.
The for1 awk program does the same thing as the preceding while1 program except that it uses a for statement, which makes the program simpler:

    $ cat for1
    BEGIN {
        for (n=1; n <= 5; n++)
            print "2^" n, 2**n
    }
    $ awk -f for1
    2^1 2
    2^2 4
    2^3 8
    2^4 16
    2^5 32

The awk utility supports an alternative for syntax for working with associative arrays:

    for (var in array)
        {commands}

This for structure loops through elements of the associative array named array, assigning the value of the index of each element of array to var each time through the loop. For example, the following END action displays each index of the manuf array along with the value of the corresponding element:

    END {for (name in manuf) print name, manuf[name]}

break

The break statement transfers control out of a for or while loop, terminating execution of the innermost loop it appears in.

continue

The continue statement transfers control to the end of a for or while loop, causing execution of the innermost loop it appears in to continue with the next iteration.
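The following sketch shows break and continue inside a for loop, followed by a for...in loop over an array. The loop bounds and the input used to fill manuf are hypothetical. The second command pipes its output through sort because the order in which for...in visits the elements is not defined.

```shell
loop_out=$(awk 'BEGIN {
    for (n = 1; n <= 10; n++) {
        if (n % 2 == 0) continue    # skip even numbers
        if (n > 7) break            # leave the loop at the first odd n above 7
        printf "%d ", n
    }
    print "done"
}')

# manuf[] counts how many times each first field appears in the input
manuf_out=$(printf 'chevy\nford\nchevy\n' |
    awk '{ manuf[$1]++ } END { for (name in manuf) print name, manuf[name] }' | sort)

printf '%s\n' "$loop_out" "$manuf_out"
```

The first program displays 1 3 5 7 done: continue skips each even value of n, and break ends the loop when n reaches 9, before 9 is displayed.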