Now let's dig in further and explore some of awk's other capabilities. In this section, we'll move beyond simple single-line scripts and look at how applications can be developed in awk.
Writing scripts in awk allows us to build bigger and more complex applications. Let's start with an update to our previous application (printing the total weight of all missiles). In this example, we'll sum all of the numeric data and add headers and trailers to the data (see Listing 22.2).
 1:  BEGIN {
 2:
 3:    FS = ":"
 4:    printf "\n           Name   Length    Range"
 5:    printf "    Speed   Weight\n"
 6:
 7:  }
 8:
 9:  {
10:    printf "%15s %8d %8d %8d %8d\n", $1, $2, $4, $5, $3
11:
12:    len += $2
13:    wt  += $3
14:    rng += $4
15:    spd += $5
16:
17:  }
18:
19:  END {
20:
21:    printf "\n         Totals"
22:    printf "\n"
23:    printf "                %8d %8d %8d %8d\n\n",
24:           len, rng, spd, wt
25:
26:  }
We invoke this script (called tabulate.awk) as follows (sample results are shown for the input file from Listing 22.1):
# awk -f tabulate.awk missiles.txt

           Name   Length    Range    Speed   Weight
           Thor       65     1725    10250   109330
          Snark       67     6325      650    48147
        Jupiter       55     1976     9022   110000
          Atlas       75     6300    17500   260000
          Titan       98     6300    15000   221500
  Minuteman III       56     5300    15000    65000
    Peacekeeper       71     6000    15000   195000

         Totals
                     487    33926    82422  1008977

#
Granted, the data is meaningless, but it illustrates how we can process the input data and format the output data.
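The accumulation pattern in Listing 22.2 can be tried without a separate script or data file. The following is a minimal sketch, not the book's listing: the helper name sum_fields is hypothetical, and the name:length:weight:range:speed record layout is an assumption inferred from the field numbers used in the listings.

```shell
# Hypothetical helper: sums field 2 (length) and field 3 (weight) of
# colon-separated records read from standard input.
sum_fields() {
    awk -F: '
        { len += $2; wt += $3 }            # runs once per input record
        END { printf "%d %d\n", len, wt }  # runs once, after the last record
    '
}

# Two records in the assumed name:length:weight:range:speed layout.
printf 'Thor:65:109330:1725:10250\nSnark:67:48147:6325:650\n' | sum_fields
```

For these two records the sketch prints 132 157477. Neither len nor wt is initialized anywhere, because awk treats an unset variable as zero the first time it is used in arithmetic.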
Let's now update our application to find the extremes of the data. In this example, we'll save the record for the missile that is longest, the one that is heaviest, the one with the longest range, and the one that is fastest. This example illustrates a very interesting feature of awk that many everyday languages do not provide: dynamic, associative arrays.
Listing 22.3 shows our new application, which is similar to our original in Listing 22.2. In our BEGIN section (lines 1-7) we set up our field separator and then emit our table header.
For each record in the file, we have a default action (lines 9-29). We check each of our test elements (length, weight, range, and speed), and if one exceeds the current maximum (initially zero), we save both the new maximum and the current record. Recall that numeric variables are automatically initialized to zero. Note that the current record is saved in an associative array that is also dynamic: we've declared neither the array (saved) nor its size. Our index is a string that identifies the particular record of interest. Note that we can remove an element from our associative array using the delete command, such as:
delete saved["longest"]
which will remove the entry identified by the longest index.
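To see the dynamic behavior in isolation, here is a small sketch (the function name array_demo and the stored values are illustrative assumptions, not part of the listings). It creates two elements by assignment, deletes one, and then uses awk's in operator to check which keys remain.

```shell
# Hypothetical demonstration of dynamic associative arrays in awk.
array_demo() {
    awk 'BEGIN {
        saved["longest"] = "Titan"   # element is created on first assignment
        saved["fastest"] = "Atlas"
        delete saved["longest"]      # remove a single element by its index

        # The "in" operator tests for a key without creating it.
        if ("longest" in saved) {
            print "longest: present"
        } else {
            print "longest: deleted"
        }
        if ("fastest" in saved) {
            print "fastest: present"
        }
    }'
}

array_demo
```

The sketch reports that longest has been deleted while fastest is still present. Testing with in is preferable to reading saved["longest"] directly, because merely referencing a missing element creates it with an empty value.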
Once the last record is processed, the END section runs (lines 31-52). Here we simply emit the records captured while searching for the extremes. Note the use of the split function, which splits a string into its individual fields (as awk does automatically when a record is read). The split function takes a string (here, a record stored in our associative array), the name of the array that will receive the split elements, and the separator to split on.
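As a quick illustration of split on its own, the following sketch (with a hypothetical function name, split_demo, and a record in the field layout assumed from the listings) splits one saved record and prints the element count that split returns along with two of the resulting elements.

```shell
# Hypothetical demonstration of awk's split function.
split_demo() {
    awk 'BEGIN {
        record = "Titan:98:221500:6300:15000"
        n = split(record, var, ":")   # n is the number of elements produced
        printf "%d fields, name=%s, weight=%d\n", n, var[1], var[3]
    }'
}

split_demo
```

This prints 5 fields, name=Titan, weight=221500. The elements are numbered from 1, which matches the way $1, $2, and so on number the fields of an input record.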
 1:  BEGIN {
 2:
 3:    FS = ":"
 4:    printf "\n           Name   Length    Range"
 5:    printf "    Speed   Weight\n"
 6:
 7:  }
 8:
 9:  {
10:    if ($2 > longest) {
11:      saved["longest"] = $0
12:      longest = $2
13:    }
14:
15:    if ($3 > heaviest) {
16:      saved["heaviest"] = $0
17:      heaviest = $3
18:    }
19:
20:    if ($4 > longest_range) {
21:      saved["longest_range"] = $0
22:      longest_range = $4
23:    }
24:
25:    if ($5 > fastest) {
26:      saved["fastest"] = $0
27:      fastest = $5
28:    }
29:  }
30:
31:  END {
32:
33:    printf "----------------"
34:    printf "--------\n"
35:
36:    split(saved["longest"], var, ":")
37:    printf "%15s %8d %8d %8d %8d (Longest)\n\n",
38:           var[1], var[2], var[4], var[5], var[3]
39:
40:    split(saved["heaviest"], var, ":")
41:    printf "%15s %8d %8d %8d %8d (Heaviest)\n\n",
42:           var[1], var[2], var[4], var[5], var[3]
43:
44:    split(saved["longest_range"], var, ":")
45:    printf "%15s %8d %8d %8d %8d (Longest Range)\n\n",
46:           var[1], var[2], var[4], var[5], var[3]
47:
48:    split(saved["fastest"], var, ":")
49:    printf "%15s %8d %8d %8d %8d (Fastest)\n\n",
50:           var[1], var[2], var[4], var[5], var[3]
51:
52:  }
The sample output, given our previous data file, is shown below:
# awk -f order.awk missiles.txt

           Name   Length    Range    Speed   Weight
------------------------
          Titan       98     6300    15000   221500 (Longest)

          Atlas       75     6300    17500   260000 (Heaviest)

          Snark       67     6325      650    48147 (Longest Range)

          Atlas       75     6300    17500   260000 (Fastest)

#
Awk does provide some shortcuts to simplify our application. Consider the following replacement to the END section of Listing 22.3 (see Listing 22.4).
 1:  END {
 2:
 3:    printf "----------------"
 4:    printf "--------\n"
 5:
 6:    for (name in saved) {
 7:
 8:      split(saved[name], var, ":")
 9:      printf "%15s %8d %8d %8d %8d (%s)\n\n",
10:             var[1], var[2], var[4], var[5], var[3], name
11:
12:    }
13:
14:  }
In this example, we illustrate awk's for loop, but using an index other than an integer (which is what we commonly use to iterate through a loop). At line 6, we walk through the indices of the saved array (longest, heaviest, longest_range, and fastest). Using name at line 8, we split out the entry in the saved array for that index and emit the data as we did before. Note that awk does not guarantee the order in which a for (name in saved) loop visits the indices, and that the label printed is now the index string itself (for example, longest_range).
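Because the traversal order of a for-in loop is unspecified, output that must appear in a predictable order needs some extra step. One possible approach, sketched below under the assumption that alphabetical order by index is acceptable, is to pipe the loop's output through the standard sort utility. The function name extremes_demo and the two sample records are illustrative only.

```shell
# Hypothetical demonstration: walk an associative array with for-in,
# then sort the output so the result does not depend on traversal order.
extremes_demo() {
    awk 'BEGIN {
        saved["longest"]  = "Titan:98:221500:6300:15000"
        saved["heaviest"] = "Atlas:75:260000:6300:17500"

        for (name in saved) {            # visits each index exactly once
            split(saved[name], var, ":")
            printf "%s %s\n", name, var[1]
        }
    }' | sort                            # impose a stable order on the lines
}

extremes_demo
```

With the sort in place, the heaviest line always precedes the longest line. Without it, the two lines could appear in either order depending on the awk implementation.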