Command-Line AWK


Let's start by looking at some simple awk scripts that demonstrate its behavior. We'll use the following data in the file missiles.txt (which contains data about Cold War nuclear delivery platforms). The data (Listing 22.1) is delimited with : and contains five fields (missile name, length, weight, range, and speed).

Listing 22.1: Missile Data for awk Scripts. (This information can also be viewed at the Strategic Air Command Web site at http://www.strategic-air-command.com and on the CD-ROM at ./source/ch22/missiles.txt)

    Thor:65:109330:1725:10250
    Snark:67:48147:6325:650
    Jupiter:55:110000:1976:9022
    Atlas:75:260000:6300:17500
    Titan:98:221500:6300:15000
    Minuteman III:56:65000:6300:15000
    Peacekeeper:71:195000:6000:15000
 

We can emit lines in much the same way that sed did, by giving a search expression as the pattern and the print command as the action:

    # awk '/Thor/{print}' missiles.txt
    Thor:65:109330:1725:10250
    #
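The pattern is a regular expression, so it can be anchored as well. The following is a minimal sketch, not taken from the original text, that prints only the records whose first character is T. Three records from Listing 22.1 are piped in so the snippet runs on its own, and the variable name matches is purely illustrative.

```shell
# Three records from Listing 22.1, piped in instead of reading missiles.txt.
matches=$(printf '%s\n' \
    'Thor:65:109330:1725:10250' \
    'Snark:67:48147:6325:650' \
    'Titan:98:221500:6300:15000' |
    awk '/^T/ { print }')   # ^ anchors the match to the start of the record

printf '%s\n' "$matches"
```

With this input, only the Thor and Titan records are emitted.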

Without a pattern, we simply emit the entire file:

    # awk '{print}' missiles.txt
    Thor:65:109330:1725:10250
    Snark:67:48147:6325:650
    Jupiter:55:110000:1976:9022
    Atlas:75:260000:6300:17500
    Titan:98:221500:6300:15000
    Minuteman III:56:65000:6300:15000
    Peacekeeper:71:195000:6000:15000
    #

Rather than emit the entire line, we can emit selected fields instead. Awk automatically splits each line (otherwise known as a record) into fields, which in our data are delimited by the colon. So if we wanted to emit the name and speed (the fifth field) of the Thor missile, we could do this as:

    # awk -F: '/Thor/{print $1 ":" $5}' missiles.txt
    Thor:10250
    #

Note that we specify the delimiter as : using the -F command-line option. Each field is parsed to a $ variable. The first field is defined as $1, the second as $2, and so on. The entire record is defined as $0.
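As a minimal sketch of how the field variables behave, the following splits a single record from Listing 22.1 and prints $1, $4, and $0 on separate lines. The record is fed through a pipe so the snippet is self-contained, and the names record and out are only for illustration.

```shell
# One record from Listing 22.1, used here instead of the full file.
record='Snark:67:48147:6325:650'

# $1 is the first field, $4 the fourth, and $0 the whole record.
out=$(printf '%s\n' "$record" | awk -F: '{ print $1; print $4; print $0 }')

printf '%s\n' "$out"
```

This prints Snark, then 6325, then the unmodified record.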

We can make our output more readable by simply including more text in the print command. Since the fifth field holds the speed, we label it as such:

    # awk -F: '/Thor/{print "Missile " $1 " has a speed of " $5 " mph"}' missiles.txt
    Missile Thor has a speed of 10250 mph
    #

Numeric comparisons on the data are also possible. Consider this example, which emits those missiles that have a speed greater than 12,000 mph:

    # awk -F: '$5 > 12000 { print $1 }' missiles.txt
    Atlas
    Titan
    Minuteman III
    Peacekeeper
    #

In this example, our pattern is the test of $5 (the speed field) being greater than 12,000. When this test pattern is satisfied, our action is to emit the first field (the name of the missile).
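A pattern can also combine several comparisons with && or ||. The sketch below is an illustration rather than part of the original text. It assumes the field order given for Listing 22.1 (weight in field 3, speed in field 5) and recreates the data in a temporary file so that it runs on its own; the names data and light_fast are hypothetical.

```shell
# Recreate the data from Listing 22.1 in a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
Thor:65:109330:1725:10250
Snark:67:48147:6325:650
Jupiter:55:110000:1976:9022
Atlas:75:260000:6300:17500
Titan:98:221500:6300:15000
Minuteman III:56:65000:6300:15000
Peacekeeper:71:195000:6000:15000
EOF

# Both tests must hold: weight ($3) under 200000 and speed ($5) over 12000.
light_fast=$(awk -F: '$3 < 200000 && $5 > 12000 { print $1 }' "$data")

printf '%s\n' "$light_fast"
rm -f "$data"
```

Only Minuteman III and Peacekeeper satisfy both conditions.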

Awk provides a number of built-in variables that can be useful. For example, if we wanted to know the number of records in the file, we could use the optional END section with the NR built-in variable (Number of the Record):

    # awk 'END { print NR }' missiles.txt
    7
    #

Note here that we emit NR at the end, so it's the total number of records that were in the file. If we emit NR at each line, it's the number of that given line, such as:

    # awk -F: '{ print NR, $0 }' missiles.txt
    1 Thor:65:109330:1725:10250
    2 Snark:67:48147:6325:650
    3 Jupiter:55:110000:1976:9022
    4 Atlas:75:260000:6300:17500
    5 Titan:98:221500:6300:15000
    6 Minuteman III:56:65000:5300:15000
    7 Peacekeeper:71:195000:6000:15000
    #
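Because NR holds the current record number while records are being read, it can also serve as a pattern. The following sketch (an illustration, with the hypothetical variable name selected) skips the first record and labels the rest with their record numbers. Three records from Listing 22.1 are piped in to keep it self-contained.

```shell
# Three records from Listing 22.1, piped in instead of reading missiles.txt.
selected=$(printf '%s\n' \
    'Thor:65:109330:1725:10250' \
    'Snark:67:48147:6325:650' \
    'Jupiter:55:110000:1976:9022' |
    awk -F: 'NR >= 2 { print NR ": " $1 }')   # act only on record 2 onward

printf '%s\n' "$selected"
```

The output is "2: Snark" followed by "3: Jupiter".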

Given the range and speed data, we can calculate how long each missile takes to reach its maximum range (roughly calculated as range over speed). This is provided as:

    # awk -F: '{ printf "%15s %3.2f\n", $1, $4/$5 }' missiles.txt
               Thor 0.17
              Snark 9.73
            Jupiter 0.22
              Atlas 0.36
              Titan 0.42
      Minuteman III 0.35
        Peacekeeper 0.40
    #

Here we demonstrate simple arithmetic ($4 / $5 to compute the time to target) but also the use of printf within awk. Rather than simply print the results (as we've done in previous examples), we use the printf command to provide a more structured output. We specify size and alignment for our string (missile name) and also the format of our time-to-target result. From this data, we can see that the Snark has the longest time of flight (it's also the slowest of the missiles shown here), and Thor the shortest.
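The width in a printf conversion right-justifies by default; a leading minus sign left-justifies instead. As a small sketch of that difference (the format string and the variable name line are illustrative, not from the original text), the following pads the name to 10 columns on the left and the time to 6 columns on the right, with | marking the column edges:

```shell
# One record from Listing 22.1; range is field 4 and speed is field 5.
line=$(printf '%s\n' 'Snark:67:48147:6325:650' |
    awk -F: '{ printf "%-10s|%6.2f|\n", $1, $4 / $5 }')

printf '%s\n' "$line"
```

This prints "Snark     |  9.73|".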

Let's look at one final example in this command-line section that demonstrates a bit more of the arithmetic properties of awk. Let's say that we have one of each of these missiles, and we want to know their combined weight. This is easily calculated, using each of the three awk sections, as:

    # awk 'BEGIN {FS=":"} {wt += $3} END {print wt}' missiles.txt
    1008977
    #

Let's look at each of the three sections to see what's going on. In the first section (BEGIN), we specify our field separator (using the built-in FS variable). We could also specify this on the command line (with the -F option), but setting it here makes it part of the script and is therefore less error prone. For each record that we find, we sum the weight field (field 3). Note that we did not initialize our wt variable, as awk will automatically initialize it to zero when it's created. After we've processed the last record, our END section is performed, where we simply print the weight total (just a tad over 1 million pounds or 457,664.27 kilograms).
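The kilogram figure can be checked in the END section itself. This sketch assumes the conversion factor of 0.45359237 kg per pound and recreates the data from Listing 22.1 in a temporary file so that it runs on its own; the names data and totals are illustrative.

```shell
# Recreate the data from Listing 22.1 in a temporary file.
data=$(mktemp)
cat > "$data" <<'EOF'
Thor:65:109330:1725:10250
Snark:67:48147:6325:650
Jupiter:55:110000:1976:9022
Atlas:75:260000:6300:17500
Titan:98:221500:6300:15000
Minuteman III:56:65000:6300:15000
Peacekeeper:71:195000:6000:15000
EOF

# BEGIN sets the separator, the main rule sums field 3, END reports the total.
totals=$(awk 'BEGIN { FS = ":" }
              { wt += $3 }
              END { printf "%d lb = %.2f kg\n", wt, wt * 0.45359237 }' "$data")

printf '%s\n' "$totals"
rm -f "$data"
```

The result is "1008977 lb = 457664.27 kg", which agrees with the totals quoted in the text.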

We've looked at some of awk's built-in variables so far (such as FS and NR). These and other built-in variables are available for use. A list of some of the most useful is shown in Table 22.1.

Table 22.1: Awk's Built-in Variables

    Variable    Description
    NR          Input record number
    NF          Number of fields in the current record
    FS          Field separator (default space and tab)
    OFS         Output field separator (default space)
    RS          Input record separator (default newline)
    ORS         Output record separator (default newline)
    FILENAME    Current input filename
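Several of these variables can be seen working together in one command. The following sketch (illustrative only; the variable name summary is hypothetical) sets FS and OFS in the BEGIN section, then prints the record number, the field count, the first field, and the last field, which is reached through $NF. Commas in the print command are what cause OFS to be placed between the values.

```shell
# One record from Listing 22.1, piped in instead of reading missiles.txt.
summary=$(printf '%s\n' 'Titan:98:221500:6300:15000' |
    awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1, $NF }')

printf '%s\n' "$summary"
```

This prints "1 | 5 | Titan | 15000".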




GNU/Linux Application Programming (Programming Series)
ISBN: 1584505680
Year: 2006
Pages: 203
Authors: M. Tim Jones
