Project 62. Learn Advanced awk

"What other advanced editing tasks can I accomplish with the awk command?"

This project covers three tasks that illustrate some of the more advanced editing capabilities of the awk text-processing language. Project 58 shows you how to apply such commands to a batch of files. Project 60 covers basic use of awk, and Projects 59 and 61 cover the sed command.

Process, Count and Report with awk

We'll illustrate some of the more advanced features of awk through three tasks: processing a CSV (comma-separated-value) file, counting lines in a file, and analyzing and reporting on a file.

The awk processing language is a full-fledged programming language, providing variables, conditional statements, loops, and functions. Its syntax closely follows that of the C programming language (which has also been adopted by other languages, such as PHP). The aim of this project is not to teach the language but simply to give a few examples of awk's capabilities.

Process a CSV File

A CSV file is a plain-text representation of a data table or spreadsheet, and most spreadsheet programs can write and read CSV files. A CSV file contains a table of values in which columns are separated by commas and rows are separated by line breaks. Because awk expects white space to be used as a field (or column) separator, we must tell it to expect a comma instead. Here's a simple example that extracts field 2 (surname) and field 4 (position) from a CSV file. We specify the field separator by using option -F.

$ cat people.csv
scott,sheppard,mr,editor in chief,01
adrian,mayo,mr,editor,02
jan,forbes,miss,goddess,68
$ awk -F "," '{printf ("%-15s %s\n", $2,$4)}' people.csv
sheppard        editor in chief
mayo            editor
forbes          goddess


Internally, awk holds the field separator in the variable FS, and we could have set it directly within the script as the action for the BEGIN pattern (which is matched once, before any input is read).

$ awk 'BEGIN {FS=","}; {printf ("%-15s %s\n", $2,$4)}' people.csv


The field separator is actually a regular expression. Here's an example in which we specify a regular expression containing a list of possible separators.

$ cat people2.csv
scott:sheppard,mr,editor in chief-01
adrian:mayo,mr,editor-02
jan:forbes,miss,goddess-68
$ awk 'BEGIN {FS=",|:|-"}; {printf ("%-15s %-15s Code %s\n", $2,$4, $5)}' people2.csv
sheppard        editor in chief Code 01
mayo            editor          Code 02
forbes          goddess         Code 68
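
Because each separator here is a single character, the same field separator could probably also be written as a bracket expression. The following is a minimal sketch, assuming the people2.csv file shown above; the here-document merely recreates that file so the snippet runs on its own, and the | characters in the output format are there only to make the field boundaries visible.

```shell
# Recreate the sample file from the example above.
cat > people2.csv <<'EOF'
scott:sheppard,mr,editor in chief-01
adrian:mayo,mr,editor-02
jan:forbes,miss,goddess-68
EOF

# [,:-] matches any one of comma, colon, or hyphen. The hyphen is
# placed last so that it is taken literally rather than as a range.
out=$(awk 'BEGIN {FS="[,:-]"} {printf("%s|%s|%s\n", $2, $4, $5)}' people2.csv)
printf '%s\n' "$out"
```

The alternation form ",|:|-" is still needed when a separator is longer than one character.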


Learn More

See Project 77 if you are unfamiliar with regular expressions.


Tip

The technique of setting the awk variable FS is handy when you wish to swap the field separator midway through a script. Identify a pattern that corresponds to the point in the text where the separator changes, and attach an action to it that resets the FS variable.
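
As a sketch of this tip, assume a hypothetical file, mixed.txt, whose first lines are colon-separated and whose remaining lines, after a marker line of three hyphens, are comma-separated. Both the filename and the marker are invented for illustration. A new value of FS applies from the next input line onward, so the action attached to the marker pattern resets FS and then skips the marker line itself.

```shell
# Hypothetical input: colon-separated names, a marker line,
# then comma-separated names.
cat > mixed.txt <<'EOF'
scott:sheppard
adrian:mayo
---
jan,forbes
EOF

# Start with ":" as the separator; switch to "," at the marker.
# "next" skips the marker so that it is not printed as data.
out=$(awk 'BEGIN {FS=":"}
/^---$/ {FS=","; next}
{print $2}' mixed.txt)
printf '%s\n' "$out"
```

With this input, the script should print the three surnames, one per line.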


Count Lines

Take a look at this awk script, which counts the number of lines in a file and reports on which lines contain the text Sophie.

$ cat sophie.txt
I hopped out of the car and promptly ate gravel. The
non-retracting seat belt had wrapped itself around my ankle
clearly attempting to do what Sophie failed to do during the
drive home - kill me. :-) Sophie rushed to my rescue, helped
me up, and brushed off the stones from my dress. There are
better ways to get "stoned"!
$ awk 'BEGIN {n=0}
> {n=n+1}
> /Sophie/{printf("Line %d\n", n)}
> END {printf ("Total lines in file %d\n", n)}' sophie.txt
Line 3
Line 4
Total lines in file 6


This example demonstrates the use of awk variables. We set variable n to be 0 at the start of the script, using the BEGIN pattern. For each line read, we increment n and print the current line number if the line contains Sophie. Finally, at the end of the file, matched by the special pattern END, we print the total number of lines in the file.
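
awk also maintains a built-in variable, NR, that holds the number of the line currently being read, so the counter need not be kept by hand. Here is a minimal sketch of the same report using NR; the three-line file notes.txt is a hypothetical stand-in invented to keep the snippet short and self-contained.

```shell
# Hypothetical three-line input file.
cat > notes.txt <<'EOF'
Sophie drove.
I walked.
Sophie waved.
EOF

# NR is the current line number; in the END action it still holds
# the number of the last line read, which is the total line count.
out=$(awk '/Sophie/ {printf("Line %d\n", NR)}
END {printf("Total lines in file %d\n", NR)}' notes.txt)
printf '%s\n' "$out"
```

Keeping an explicit counter such as n remains useful when you want to count only some of the lines, for example those that match a pattern.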

Use Conditionals

Our final example shows the use of conditional statements in an awk script. Suppose that we have a file, posts.txt, that lists the members of a bulletin board and the number of posts each member has made to the board. Let's process this file to find who has made the most posts and print the names of all members who have made 50 or more posts. We'll write our awk script to a script file instead of typing it on the command line. Here's our test input file.

$ cat posts.txt
Mayo 50 posts
Forbes 35 posts
Sheppard 12 posts
Trevor 345678 posts
Hollis 17 posts


And here's our awk script, arbitrarily called script.awk.

$ cat script.awk
BEGIN { print "Fifty or more posts"; max = 0; name = ""}
{if ($2 >= 50) print $0}
{if ($2 > max) {max = $2; name = $1}}
END { printf ("Max posts %d by %s\n", max, name); print "---" }


Lines 2 and 3 are if statements. Line 2 tests whether the number of posts (the value of field 2) is greater than or equal to (using the relational operator >=) our threshold value of 50, and if so, the code prints the input line. Line 3 tests whether the number of posts is greater than the maximum so far (set to 0 at the beginning), and if so, the code saves this value and the poster's name to the variables max and name.

$ awk -f script.awk posts.txt
Fifty or more posts
Mayo 50 posts
Trevor 345678 posts
Max posts 345678 by Trevor
---
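
The two if statements could also be written as awk patterns, because a relational expression is itself a valid pattern. The following sketch assumes the posts.txt file shown above (recreated here so that the snippet is self-contained) and is intended to produce the same report as script.awk; it is an alternative formulation, not a replacement for the script in this project.

```shell
# Recreate the test input file from the example above.
cat > posts.txt <<'EOF'
Mayo 50 posts
Forbes 35 posts
Sheppard 12 posts
Trevor 345678 posts
Hollis 17 posts
EOF

# Each condition is a pattern; its action runs only for the lines
# on which the condition is true.
out=$(awk 'BEGIN {print "Fifty or more posts"; max = 0}
$2 >= 50 {print}
$2 > max {max = $2; name = $1}
END {printf("Max posts %d by %s\n", max, name); print "---"}' posts.txt)
printf '%s\n' "$out"
```

The if form is the better fit when a test must sit inside a larger action; the pattern form keeps simple filters short.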


Tip

The awk language has constructs that mirror those of the C programming language. Check the man page for more details.
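
As one small illustration of these C-like constructs, the sketch below uses a for loop to print the fields of each line in reverse order. It relies on the built-in variable NF, which holds the number of fields in the current line, and assumes the people.csv file from the first example (recreated here so that the snippet runs on its own).

```shell
# Recreate the sample file from the first example.
cat > people.csv <<'EOF'
scott,sheppard,mr,editor in chief,01
adrian,mayo,mr,editor,02
jan,forbes,miss,goddess,68
EOF

# Start with the last field, then walk backward to field 1,
# appending each field after a comma.
out=$(awk -F "," '{
    line = $NF
    for (i = NF - 1; i >= 1; i--)
        line = line "," $i
    print line
}' people.csv)
printf '%s\n' "$out"
```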





Mac OS X UNIX 101 Byte-Sized Projects
ISBN: 0321374118
Year: 2003
Pages: 153
Authors: Adrian Mayo
