Language Basics


An awk program (from the command line or from program-file) consists of one or more lines containing a pattern and/or action in the following format:

pattern { action }


The pattern selects lines from the input. The awk utility performs the action on all lines that the pattern selects. The braces surrounding the action enable awk to differentiate it from the pattern. If a program line does not contain a pattern, awk selects all lines in the input. If a program line does not contain an action, awk copies the selected lines to standard output.

To start, awk compares the first line of input (from the file-list or standard input) with each pattern in the program. If a pattern selects the line (if there is a match), awk takes the action associated with the pattern. If the line is not selected, awk takes no action. When awk has completed its comparisons for the first line of input, it repeats the process for the next line of input, continuing this process of comparing subsequent lines of input until it has read all of the input.

If several patterns select the same line, awk takes the actions associated with each of the patterns in the order in which they appear in the program. It is possible for awk to send a single line from the input to standard output more than once.
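As a minimal sketch of these rules, the following command runs a two-line program over three made-up input lines (the sample text and the result variable are illustrative only). The first program line has a pattern but no action, so awk copies the lines it selects. The second has both a pattern and an action. The line apple pie is selected by both patterns, so it is sent to standard output twice.

```shell
# Hypothetical sample input; any three lines would do.
result=$(printf 'apple pie\nbanana split\ncherry pie\n' |
    awk '/apple/
         /pie/ { print "pie:", $0 }')
printf '%s\n' "$result"
```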

Patterns

You can use a regular expression (Appendix A), enclosed within slashes, as a pattern. The ~ operator tests whether a field or variable matches a regular expression. The !~ operator tests for no match. You can perform both numeric and string comparisons using the relational operators listed in Table 14-1. You can combine any of the patterns using the Boolean operators || (OR) or && (AND).

Table 14-1. Relational operators

Relop    Meaning
<        Less than
<=       Less than or equal to
==       Equal to
!=       Not equal to
>=       Greater than or equal to
>        Greater than
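The pattern forms described above can be combined, as in this sketch. The three input records (name, department, salary) are invented for illustration. The first program line uses ~ together with a numeric comparison, and the second uses !~ together with a string comparison.

```shell
# Hypothetical input: name, department, salary.
result=$(printf 'max sales 4000\nsam sales 6000\nzach eng 7000\n' |
    awk '$2 ~ /^sal/ && $3 > 5000 { print $1 }
         $2 !~ /sales/ || $1 == "max" { print "other:", $1 }')
printf '%s\n' "$result"
```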


BEGIN and END

Two unique patterns, BEGIN and END, execute commands before awk starts its processing and after it finishes. The awk utility executes the actions associated with the BEGIN pattern before, and with the END pattern after, it processes all the input.
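A small sketch of BEGIN and END, assuming three arbitrary input lines: the BEGIN action runs before any input is read, the middle action runs once per line, and the END action reports the total. The variable name count is chosen here for illustration.

```shell
result=$(printf 'a\nb\nc\n' |
    awk 'BEGIN { print "start" }
         { count++ }                    # runs once for each input line
         END { print "lines:", count }')
printf '%s\n' "$result"
```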

, (comma)

The comma is the range operator. If you separate two patterns with a comma on a single awk program line, awk selects a range of lines, beginning with the first line that matches the first pattern. The last line awk selects is the next line that matches the second pattern. If no line matches the second pattern, awk selects every line through the end of the input. After awk finds a line that matches the second pattern, it starts over, looking for the next line that matches the first pattern.
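The following sketch shows both behaviors on invented input in which the markers START and STOP are hypothetical. The first range ends at STOP. The second range never finds a closing STOP, so it runs to the end of the input.

```shell
result=$(printf 'one\nSTART\ntwo\nSTOP\nthree\nSTART\nfour\n' |
    awk '/START/,/STOP/')
printf '%s\n' "$result"
```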

Actions

The action portion of an awk command causes awk to take that action when it matches a pattern. When you do not specify an action, awk performs the default action, which is the print command (explicitly represented as {print}). This action copies the record (normally a line; see "Variables" later in this chapter) from the input to standard output.

When you follow a print command with arguments, awk displays only the arguments you specify. These arguments can be variables or string constants. You can send the output from a print command to a file (>), append it to a file (>>), or send it through a pipe to the input of another program (|).

Unless you separate items in a print command with commas, awk catenates them. Commas cause awk to separate the items with the output field separator (OFS, normally a SPACE; see "Variables").

You can include several actions on one line by separating them with semicolons.
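To illustrate, this sketch puts three print commands on one line, separated by semicolons, and feeds them a single made-up record. The first print catenates its items, the second separates them with OFS, and the third catenates a string constant between two fields.

```shell
result=$(printf 'alpha beta\n' |
    awk '{ print $1 $2; print $1, $2; print $2 "-" $1 }')
printf '%s\n' "$result"
```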

Comments

The awk utility disregards anything on a program line following a pound sign (#). You can document an awk program by preceding comments with this symbol.

Variables

Although you do not need to declare awk variables prior to their use, you can optionally assign initial values to them. Unassigned numeric variables are initialized to 0; string variables are initialized to the null string. In addition to user variables, awk maintains program variables. You can use both user and program variables in the pattern and in the action portion of an awk program. Table 14-2 lists a few program variables.

Table 14-2. Variables

Variable    Meaning
$0          The current record (as a single variable)
$1-$n       Fields in the current record
FILENAME    Name of the current input file (null for standard input)
FS          Input field separator (default: SPACE or TAB)
NF          Number of fields in the current record
NR          Record number of the current record
OFS         Output field separator (default: SPACE)
ORS         Output record separator (default: NEWLINE)
RS          Input record separator (default: NEWLINE)


In addition to initializing variables within a program, you can use the -v option to initialize variables on the command line. This feature is useful when the value of a variable changes from one run of awk to the next.

By default the input and output record separators are NEWLINE characters. Thus awk takes each line of input to be a separate record and appends a NEWLINE to the end of each output record. By default the input field separators are SPACEs and TABs. The default output field separator is a SPACE. You can change the value of any of the separators at any time by assigning a new value to its associated variable either from within the program or from the command line by using the -v option.
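As a sketch of the -v option, the command below sets the input field separator, the output field separator, and a user variable from the command line. The variable name limit and the colon-separated sample records are assumptions made for this example.

```shell
# limit is a hypothetical user variable; FS and OFS are awk program variables.
result=$(printf 'max:4000\nsam:6000\n' |
    awk -v FS=: -v OFS=' -> ' -v limit=5000 '$2 > limit { print NR, $1, NF }')
printf '%s\n' "$result"
```

Only the second record passes the test, so awk prints its record number, first field, and field count, separated by the new OFS.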

Functions

Table 14-3 lists a few of the functions that awk provides for manipulating numbers and strings.

Table 14-3. Functions

Function               Meaning
length(str)            Returns the number of characters in str; without an argument, returns the number of characters in the current record
int(num)               Returns the integer portion of num
index(str1, str2)      Returns the index of str2 in str1 or 0 if str2 is not present
split(str,arr,del)     Places elements of str, delimited by del, in the array arr[1]...arr[n]; returns the number of elements in the array
sprintf(fmt,args)      Formats args according to fmt and returns the formatted string; mimics the C programming language function of the same name
substr(str,pos,len)    Returns the substring of str that begins at pos and is len characters long
tolower(str)           Returns a copy of str in which all uppercase letters are replaced with their lowercase counterparts
toupper(str)           Returns a copy of str in which all lowercase letters are replaced with their uppercase counterparts
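The following sketch calls several of these functions from a BEGIN action, so no input is needed. The string values and the names s, n, and parts are chosen for illustration. Note that string positions start at 1.

```shell
result=$(awk 'BEGIN {
    s = "Hello, World"
    print length(s)                  # number of characters in s
    print index(s, "World")          # position where "World" starts
    print substr(s, 8, 5)            # 5 characters starting at position 8
    print toupper(s)
    print int(7.9)                   # integer portion only
    n = split("a:b:c", parts, ":")   # fills parts[1]..parts[3]
    print n, parts[1], parts[3]
    print sprintf("%05d", 42)        # zero-padded to width 5
}')
printf '%s\n' "$result"
```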


Arithmetic Operators

The awk arithmetic operators listed in Table 14-4 are from the C programming language.

Table 14-4. Arithmetic operators

Operator    Meaning
*           Multiplies the expression preceding the operator by the expression following it
/           Divides the expression preceding the operator by the expression following it
%           Takes the remainder after dividing the expression preceding the operator by the expression following it
+           Adds the expression preceding the operator to the expression following it
-           Subtracts the expression following the operator from the expression preceding it
=           Assigns the value of the expression following the operator to the variable preceding it
++          Increments the variable preceding the operator
--          Decrements the variable preceding the operator
+=          Adds the expression following the operator to the variable preceding it and assigns the result to the variable preceding the operator
-=          Subtracts the expression following the operator from the variable preceding it and assigns the result to the variable preceding the operator
*=          Multiplies the variable preceding the operator by the expression following it and assigns the result to the variable preceding the operator
/=          Divides the variable preceding the operator by the expression following it and assigns the result to the variable preceding the operator
%=          Assigns the remainder, after dividing the variable preceding the operator by the expression following it, to the variable preceding the operator
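A brief sketch that walks one variable through several of these operators. The starting value 17 and the name x are arbitrary. The comments give the value of x after each step.

```shell
result=$(awk 'BEGIN {
    x = 17
    print x % 5     # remainder of 17 / 5
    print x / 4     # division is not truncated
    x += 3          # x is 20
    x -= 5          # x is 15
    x *= 2          # x is 30
    x /= 4          # x is 7.5
    x++             # x is 8.5
    x--             # x is 7.5
    x %= 2          # x is 1.5
    print x
}')
printf '%s\n' "$result"
```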


Associative Arrays

An associative array is one of awk's most powerful features. These arrays use strings as indexes. Using an associative array, you can mimic a traditional array by using numeric strings as indexes.

You assign a value to an element of an associative array just as you would assign a value to any other awk variable. The syntax is

array[string] = value


where array is the name of the array, string is the index of the element of the array you are assigning a value to, and value is the value you are assigning to that element.

You can use a special for structure with an associative array. The syntax is

for (elem in array) action


where elem is a variable that takes on the value of each index of the array as the for structure loops through the elements, array is the name of the array, and action is the action that awk takes for each element in the array. You can use the elem variable in this action; for example, array[elem] retrieves the value stored under that index.
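As a sketch of both constructs, the program below counts how many times each word appears in some invented input, using the word itself as the array index. The array name count and the sample words are assumptions for this example. Because awk does not guarantee the order in which a for...in loop visits the indexes, the output is piped through sort.

```shell
result=$(printf 'ford\nchevy\nford\nvolvo\nford\n' |
    awk '{ count[$1]++ }                         # index is the word itself
         END { for (name in count) print name, count[name] }' |
    sort)
printf '%s\n' "$result"
```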

The "Examples" section found later in this chapter contains programs that use associative arrays.

printf

You can use the printf command in place of print to control the format of the output that awk generates. The awk version of printf is similar to that found in the C language. A printf command has the following syntax:

printf "control-string", arg1, arg2, ..., argn


The control-string determines how printf formats arg1, arg2, ..., argn. These arguments can be variables or other expressions. Within the control-string you can use \n to indicate a NEWLINE and \t to indicate a TAB. The control-string contains conversion specifications, one for each argument. A conversion specification has the following syntax:

%[-][x[.y]]conv


where - causes printf to left-justify the argument; x is the minimum field width, and .y is the number of places to the right of a decimal point in a number. The conv indicates the type of numeric conversion and can be selected from the letters in Table 14-5. Refer to "Examples" later in this chapter for examples of how to use printf.

Table 14-5. Numeric conversion

conv    Type of conversion
d       Decimal
e       Exponential notation
f       Floating-point number
g       Use f or e, whichever is shorter
o       Unsigned octal
s       String of characters
x       Unsigned hexadecimal
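The conversion specifications can be sketched as follows, using two made-up records. The control-string left-justifies the first field in a width of 6, prints the second as a floating-point number in a width of 8 with two decimal places, and prints the record number as a decimal in a width of 4. The vertical bars are included only to make the field widths visible.

```shell
result=$(printf 'max 4000.5\nzach 72\n' |
    awk '{ printf "%-6s|%8.2f|%4d\n", $1, $2, NR }')
printf '%s\n' "$result"
```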


Control Structures

Control (flow) statements alter the order of execution of commands within an awk program. This section details the if...else, while, and for control structures. In addition, the break and continue statements work in conjunction with the control structures to alter the order of execution of commands. See page 524 for more information on control structures. You do not need to use braces around commands when you specify a single, simple command.

if...else

The if...else control structure tests the status returned by the condition and transfers control based on this status. The syntax of an if...else structure is shown below. The else part is optional.

if (condition)
    {commands}
[else
    {commands}]


The simple if statement shown here does not use braces:

if ($5 <= 5000) print $0


Next is an awk program that uses a simple if...else structure. Again, there are no braces.

$ cat if1
BEGIN   {
        nam="sam"
        if (nam == "max")
                print "nam is max"
            else
                print "nam is not max, it is", nam
        }
$ awk -f if1
nam is not max, it is sam


while

The while structure loops through and executes the commands as long as the condition is true. The syntax of a while structure is

while (condition)
    {commands}


The next awk program uses a simple while structure to display powers of 2. This example uses braces because the while loop contains more than one statement.

$ cat while1
BEGIN{
    n = 1
    while (n <= 5)
        {
        print "2^" n, 2**n
        n++
        }
    }
$ awk -f while1
2^1 2
2^2 4
2^3 8
2^4 16
2^5 32


for

The syntax of a for control structure is

for (init; condition; increment)
    {commands}


A for structure starts by executing the init statement, which usually sets a counter to 0 or 1. It then loops through the commands as long as the condition is true. After each loop it executes the increment statement. The for1 awk program does the same thing as the preceding while1 program except that it uses a for statement, which makes the program simpler:

$ cat for1
BEGIN   {
        for (n=1; n <= 5; n++)
        print "2^" n, 2**n
        }
$ awk -f for1
2^1 2
2^2 4
2^3 8
2^4 16
2^5 32


The awk utility supports an alternative for syntax for working with associative arrays:

for (var in array)
    {commands}


This for structure loops through elements of the associative array named array, assigning the value of the index of each element of array to var each time through the loop.

END    {for (name in manuf) print name, manuf[name]}


break

The break statement transfers control out of a for or while loop, terminating execution of the innermost loop it appears in.

continue

The continue statement transfers control to the end of a for or while loop, causing execution of the innermost loop it appears in to continue with the next iteration.
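A minimal sketch of both statements in one for loop. The loop bounds and tests are arbitrary choices for illustration. The continue statement skips the even numbers, and the break statement ends the loop at the first odd number greater than 7.

```shell
result=$(awk 'BEGIN {
    for (n = 1; n <= 10; n++) {
        if (n % 2 == 0) continue   # go straight to the next iteration
        if (n > 7) break           # leave the loop entirely
        print n
    }
}')
printf '%s\n' "$result"
```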




A Practical Guide to UNIX for Mac OS X Users
ISBN: 0131863339
EAN: 2147483647
Year: 2005
Pages: 234
