Scripted AWK

Now let's dig in further and explore some of awk's other capabilities. In this section, we'll move beyond simple single-line scripts and look at how applications can be developed in awk.

Writing awk programs as script files lets us build larger and more complex applications. Let's start with an update to our previous application (printing the total weight of all missiles). In this example, we'll sum all of the numeric fields and add a header and a totals trailer to the output (see Listing 22.2).

Listing 22.2: Expanding Our Summing Application (on the CD-ROM at ./source/ch21/tabulate.awk)
start example
 1:  BEGIN {
 2:
 3:    FS = ":"
 4:    printf "\n           Name    Length     Range"
 5:    printf "     Speed    Weight\n"
 6:
 7:  }
 8:
 9:  {
10:    printf "%15s  %8d  %8d  %8d  %8d\n", $1, $2, $4, $5, $3
11:
12:    len += $2
13:    wt  += $3
14:    rng += $4
15:    spd += $5
16:
17:  }
18:
19:  END {
20:
21:    printf "\n         Totals     "
22:    printf "    \n"
23:    printf "                 %8d  %8d  %8d  %8d\n\n",
24:           len, rng, spd, wt
25:
26:  }
end example
 

Listing 22.2 illustrates the three awk sections. Within the BEGIN section, we define our field separator and emit a header line (lines 3-5). Next, for each record in the input file, we emit the fields of the record in a different order from the original record (line 10). Lines 10-15 are performed for every record, because this action has no pattern to restrict it. Lines 12-15 simply accumulate the fields we want totals for: total length (len), total weight (wt), total range (rng), and total speed (spd). The END section, which is performed after the last record has been processed, emits the totals. Note the use of printf here to better control the format of the output.
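To make the order of execution concrete, here is a minimal sketch that keeps the same BEGIN / per-record / END layout but reduces it to a single accumulator. It is wrapped in a shell function named sum_lengths, a hypothetical name chosen for illustration, and it assumes colon-separated records whose second field is the length, as in Listing 22.2.

```shell
#!/bin/sh
# Sketch only: sum_lengths is a hypothetical helper, not part of the book.
# Same BEGIN / per-record / END layout as Listing 22.2, one accumulator.
# Assumes colon-separated records whose second field is the length.
sum_lengths() {
  awk '
    BEGIN { FS = ":"; print "Name:Length" }          # once, before any input
          { print $1 ":" $2; len += $2 }             # once per input record
    END   { print "Total:" len " (" NR " records)" } # once, after the last record
  '
}

# Usage: two records taken from the sample data
printf 'Thor:65\nSnark:67\n' | sum_lengths
```

Because BEGIN runs before awk reads any input, setting FS there takes effect for the very first record, and NR in the END section holds the total number of records read.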

We invoke this script (called tabulate.awk) as follows; the sample results use our input file from Listing 22.1:

 # awk -f tabulate.awk missiles.txt

            Name    Length     Range     Speed    Weight
            Thor        65      1725     10250    109330
           Snark        67      6325       650     48147
         Jupiter        55      1976      9022    110000
           Atlas        75      6300     17500    260000
           Titan        98      6300     15000    221500
   Minuteman III        56      5300     15000     65000
     Peacekeeper        71      6000     15000    195000

          Totals
                       487     33926     82422   1008977

 #

Granted, these totals are meaningless, but the example illustrates how we can process input data and format the output.

Let's now update our application to find the extremes of the data. In this example, we'll store the missile that is the longest, the heaviest, the one with the longest range, and the fastest. This example illustrates a very useful aspect of awk that many of the languages we use every day don't provide: dynamic, associative arrays.

Listing 22.3 shows our new application, which is similar to our original in Listing 22.2. In our BEGIN section (lines 1-7), we set up our field separator and then emit our table header.

For each record in the file, we have a default action (lines 9-29). We check each of our test elements (length, weight, range, and speed), and if the current record's value exceeds the largest seen so far (initially zero), we save both the new maximum and the current record. Recall that an uninitialized awk variable evaluates to zero in a numeric context. Note here that the current record is saved in an associative array that is also dynamic: we've declared neither our array (saved) nor its size. Our index is a string that identifies the particular record of interest. We can also remove an element from an associative array using the delete statement, such as:

 delete saved["longest"] 

which removes the entry stored under the index longest.
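The following minimal sketch exercises these array properties in isolation: elements are created on first assignment with no declaration, the in operator tests whether an index exists, and delete removes an entry. The function name array_demo and the stored values are hypothetical, chosen only for illustration. One subtlety worth knowing: merely reading a missing element creates it with an empty value, which is why the membership test uses in rather than comparing an element against an empty string.

```shell
#!/bin/sh
# Sketch only: array_demo is a hypothetical helper; the values are made up.
array_demo() {
  awk 'BEGIN {
    saved["longest"] = "Titan:98"     # created on first assignment, no declaration
    saved["fastest"] = "Atlas:75"

    if ("longest" in saved)           # "in" tests for an index without creating it
      print "longest: present"

    delete saved["longest"]           # remove one entry
    if ("longest" in saved)
      print "longest: present"
    else
      print "longest: absent"

    tmp = saved["heaviest"]           # reading a missing element creates it (empty)
    n = 0
    for (k in saved)                  # count what is left: fastest and heaviest
      n++
    print "elements: " n
  }'
}

array_demo
```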

Once the last record is processed, we perform our END section (lines 31-52). Here we simply emit the data we stored while capturing the extremes. Note the use of the split function, which splits a string into its individual fields (just as awk does automatically with each record it reads). The split function takes a string (here, one stored in our associative array), the name of an array that will receive the split elements, and the field separator to split on (":"), and it returns the number of elements created.
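Here is a small sketch of split on its own, using one record in the name:length:weight:range:speed layout that the listing's field references imply (the Titan values come from the sample output; split_demo is a hypothetical name). It also shows a detail that matters when Listing 22.3 reuses var for each split: split deletes any existing elements of the target array before filling it.

```shell
#!/bin/sh
# Sketch only: split_demo is a hypothetical helper.
# Assumed layout: name:length:weight:range:speed
split_demo() {
  awk 'BEGIN {
    n = split("Titan:98:221500:6300:15000", var, ":")
    print n, var[1], var[4]          # element count, name, range

    n = split("a:b", var, ":")       # reuse var: old elements are deleted first
    if (3 in var)
      print n, "var[3] still present"
    else
      print n, "var[3] cleared"
  }'
}

split_demo
```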

Listing 22.3: Finding and Storing the Extremes (on the CD-ROM at ./source/ch21/order.awk)
start example
 1:  BEGIN {
 2:
 3:    FS = ":"
 4:    printf "\n           Name    Length     Range"
 5:    printf "     Speed    Weight\n"
 6:
 7:  }
 8:
 9:  {
10:    if ($2 > longest) {
11:      saved["longest"] = $0
12:      longest = $2
13:    }
14:
15:    if ($3 > heaviest) {
16:      saved["heaviest"] = $0
17:      heaviest = $3
18:    }
19:
20:    if ($4 > longest_range) {
21:      saved["longest_range"] = $0
22:      longest_range = $4
23:    }
24:
25:    if ($5 > fastest) {
26:      saved["fastest"] = $0
27:      fastest = $5
28:    }
29:  }
30:
31:  END {
32:
33:    printf "-----------------------------------"
34:    printf "--------------------\n"
35:
36:    split(saved["longest"], var, ":")
37:    printf "%15s  %8d  %8d  %8d  %8d (Longest)\n\n",
38:           var[1], var[2], var[4], var[5], var[3]
39:
40:    split(saved["heaviest"], var, ":")
41:    printf "%15s  %8d  %8d  %8d  %8d (Heaviest)\n\n",
42:           var[1], var[2], var[4], var[5], var[3]
43:
44:    split(saved["longest_range"], var, ":")
45:    printf "%15s  %8d  %8d  %8d  %8d (Longest Range)\n\n",
46:           var[1], var[2], var[4], var[5], var[3]
47:
48:    split(saved["fastest"], var, ":")
49:    printf "%15s  %8d  %8d  %8d  %8d (Fastest)\n\n",
50:           var[1], var[2], var[4], var[5], var[3]
51:
52:  }
end example
 

The sample output, given our previous data file, is shown below:

 # awk -f order.awk missiles.txt

            Name    Length     Range     Speed    Weight
 -------------------------------------------------------
           Titan        98      6300     15000    221500 (Longest)

           Atlas        75      6300     17500    260000 (Heaviest)

           Snark        67      6325       650     48147 (Longest Range)

           Atlas        75      6300     17500    260000 (Fastest)

 #
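Listing 22.3 spells out one if block per column. As an alternative (a variation, not the approach the listing uses), the sketch below collapses those blocks into a single loop over fields 2 through NF, keying the running maxima by field number. It assumes the same colon-separated layout with the name first; column_maxima is a hypothetical name, and the three input records are rebuilt from the sample output. Seeding each maximum from the first value seen, rather than relying on the zero default, also keeps it correct for columns that could hold negative numbers.

```shell
#!/bin/sh
# Sketch only: column_maxima is a hypothetical helper.
# Reads colon-separated records (name first) and reports, for each numeric
# column, which record held the largest value.
column_maxima() {
  awk -F: '
    {
      for (i = 2; i <= NF; i++) {
        if (!(i in max) || $i + 0 > max[i]) {  # first value seen, or a new maximum
          max[i] = $i + 0                      # store as a number
          who[i] = $1                          # remember the record name
        }
      }
      if (NF > ncols)
        ncols = NF
    }
    END {
      for (i = 2; i <= ncols; i++)
        printf "%d %s %d\n", i, who[i], max[i]
    }'
}

printf 'Thor:65:109330:1725:10250\nSnark:67:48147:6325:650\nAtlas:75:260000:6300:17500\n' | column_maxima
```

The END loop uses a numeric index rather than for (i in max), so the columns are reported in field order.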

Awk does provide some shortcuts to simplify our application. Consider the following replacement for the END section of Listing 22.3 (see Listing 22.4).

Listing 22.4: Replacement of the END Section of Listing 22.3 (on the CD-ROM at ./source/ch21/order2.awk )
start example
 1:  END {
 2:
 3:    printf "-----------------------------------"
 4:    printf "--------------------\n"
 5:
 6:    for (name in saved) {
 7:
 8:      split(saved[name], var, ":")
 9:      printf "%15s  %8d  %8d  %8d  %8d (%s)\n\n",
10:             var[1], var[2], var[4], var[5], var[3], name
11:
12:    }
13:
14:  }
end example
 

In this example, we illustrate awk's for loop, this time iterating over the indices of an array rather than over a range of integers. At line 6, we walk through the indices of the saved array (longest, heaviest, longest_range, and fastest). Using name at line 8, we split out the entry in the saved array for that index and emit the data as we did before. Note that awk doesn't define the order in which for (name in saved) visits the indices, so the rows may not appear in the same order as in Listing 22.3.
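When the report order matters, one common workaround is to keep the desired index order in a second array and walk it with an ordinary numeric for loop. The sketch below builds that order list with split; ordered_report is a hypothetical name, and the stored values are abbreviated to the missile names from the sample output.

```shell
#!/bin/sh
# Sketch only: ordered_report is a hypothetical helper; values are abbreviated.
ordered_report() {
  awk 'BEGIN {
    saved["longest"]       = "Titan"
    saved["heaviest"]      = "Atlas"
    saved["longest_range"] = "Snark"
    saved["fastest"]       = "Atlas"

    # for (name in saved) has no guaranteed order, so list the indices
    # explicitly and walk them with a numeric loop instead.
    n = split("longest heaviest longest_range fastest", order, " ")
    for (i = 1; i <= n; i++)
      print order[i] ": " saved[order[i]]
  }'
}

ordered_report
```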




GNU/Linux Application Programming
GNU/Linux Application Programming (Programming Series)
ISBN: 1584505680
EAN: 2147483647
Year: 2006
Pages: 203
Authors: M. Tim Jones
