Project 62. Learn Advanced awk

"What other advanced editing tasks can I accomplish with the awk command?"

This project covers three tasks that illustrate some of the more advanced editing capabilities of the awk text-processing language. Project 58 shows you how to apply such commands to a batch of files. Project 60 covers basic use of awk, and Projects 59 and 61 cover the sed command.

Process, Count and Report with awk

We'll illustrate some of the more advanced features of awk through three tasks: processing a CSV (comma-separated-value) file, counting lines in a file, and analyzing and reporting on a file.

The awk processing language is a full-fledged programming language, providing variables, conditional statements, loops, and functions. Its syntax closely follows that of the C programming language (which has also been adopted by other languages, such as PHP). The aim of this project is not to teach the language but simply to give a few examples of awk's capabilities.

Process a CSV File

A CSV file is a plain-text representation of a data table or spreadsheet, and most spreadsheet programs can write and read CSV files. A CSV file contains a table of values in which columns are separated by commas and rows are separated by line breaks. Because awk expects white space to be used as a field (or column) separator, we must tell it to expect a comma instead.

Here's a simple example that extracts field 2 (surname) and field 4 (position) from a CSV file. We specify the field separator by using option -F.

$ cat people.csv
scott,sheppard,mr,editor in chief,01
adrian,mayo,mr,editor,02
jan,forbes,miss,goddess,68
$ awk -F "," '{printf ("%-15s %s\n", $2,$4)}' people.csv
sheppard        editor in chief
mayo            editor
forbes          goddess

Internally to awk, the variable FS holds the field separator, and we could have set it directly within the script as an action to the BEGIN pattern (which matches before the first line of input is read).

$ awk 'BEGIN {FS=","}; {printf ("%-15s %s\n", $2,$4)}' people.csv

The field separator is actually a regular expression. Here's an example in which we specify a regular expression containing a list of possible separators.

$ cat people2.csv
scott:sheppard,mr,editor in chief-01
adrian:mayo,mr,editor-02
jan:forbes,miss,goddess-68
$ awk 'BEGIN {FS=",|:|-"}; {printf ("%-15s %-15s Code %s\n", $2,$4, $5)}' people2.csv
sheppard        editor in chief Code 01
mayo            editor          Code 02
forbes          goddess         Code 68
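
As a rough sketch of the two ways of setting the separator described above, the following script feeds the people2.csv sample data to awk twice: once with the -F option and once with FS assigned in a BEGIN action. The bracket expression [,:-] is an assumption on our part, chosen as a more compact spelling of the ",|:|-" alternation used in the text, and the variable names data, by_option, and by_begin are purely illustrative.

```shell
#!/bin/sh
# Sample rows copied from people2.csv in the text.
data='scott:sheppard,mr,editor in chief-01
adrian:mayo,mr,editor-02
jan:forbes,miss,goddess-68'

# Form 1: separator regular expression supplied with the -F option.
# [,:-] matches a single comma, colon, or hyphen; the hyphen is placed
# last inside the brackets so that it is taken literally.
by_option=$(printf '%s\n' "$data" |
    awk -F '[,:-]' '{ printf("%-15s %s\n", $2, $5) }')

# Form 2: the same regular expression assigned to FS in a BEGIN action.
by_begin=$(printf '%s\n' "$data" |
    awk 'BEGIN { FS = "[,:-]" } { printf("%-15s %s\n", $2, $5) }')

# Print one result; the comparison below confirms the forms agree.
printf '%s\n' "$by_option"
if [ "$by_option" = "$by_begin" ]; then
    echo "Both forms split the fields identically"
fi
```

Either form should print the surname and code columns for all three rows. Which one to use is largely a matter of taste: -F keeps a one-line command short, whereas setting FS in BEGIN keeps the separator inside the script when the script is stored in a file.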
Count Lines

Take a look at this awk script, which counts the number of lines in a file and reports on which lines contain the text Sophie.

$ cat sophie.txt
I hopped out of the car and promptly ate gravel. The non-retracting
seat belt had wrapped itself around my ankle clearly attempting to
do what Sophie failed to do during the drive home - kill me. :-)
Sophie rushed to my rescue, helped me up, and brushed off the
stones from my dress. There are better ways to get
"stoned"!
$ awk 'BEGIN {n=0}
> {n=n+1}
> /Sophie/{printf("Line %d\n", n)}
> END {printf ("Total lines in file %d\n", n)}' sophie.txt
Line 3
Line 4
Total lines in file 6

This example demonstrates the use of awk variables. We set variable n to be 0 at the start of the script, using the BEGIN pattern. For each line read, we increment n and print the current line number if the line contains Sophie. Finally, at the end of the file, matched by the special pattern END, we print the total number of lines in the file.

Use Conditionals

Our final example shows the use of conditional statements in an awk script. Suppose that we have a file, posts.txt, that lists the members of a bulletin board and the number of posts each member has made to the board. Let's process this file to find who has made the most posts and print the names of all members who have made 50 or more posts. We'll write our awk script to a script file instead of typing it on the command line.

Here's our test input file.

$ cat posts.txt
Mayo 50 posts
Forbes 35 posts
Sheppard 12 posts
Trevor 345678 posts
Hollis 17 posts

And here's our awk script, arbitrarily called script.awk.

$ cat script.awk
BEGIN { print "Fifty or more posts"; max = 0; name = ""}
{if ($2 >= 50) print $0}
{if ($2 > max) {max = $2; name = $1}}
END { printf ("Max posts %d by %s\n", max, name);
print "---" }

Lines 2 and 3 are if statements.
Line 2 tests whether the number of posts (the value of field 2) is greater than or equal to (using the relational operator >=) our threshold value of 50, and if so, the code prints the input line. Line 3 tests whether the number of posts is greater than the maximum so far (set to 0 at the beginning), and if so, the code saves this value and the poster's name to the variables max and name.

$ awk -f script.awk posts.txt
Fifty or more posts
Mayo 50 posts
Trevor 345678 posts
Max posts 345678 by Trevor
---
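
The same conditional logic can be sketched with the threshold passed in from the command line rather than written into the script. This is only an illustration, not part of script.awk: the variable names limit, high, and low are our own, and the script relies on awk's standard -v option to assign limit before any input is read. It also adds an else branch, which the original script does not need.

```shell
#!/bin/sh
# Sample rows copied from posts.txt in the text.
posts='Mayo 50 posts
Forbes 35 posts
Sheppard 12 posts
Trevor 345678 posts
Hollis 17 posts'

# limit is a hypothetical name; -v assigns it before the first line is read.
report=$(printf '%s\n' "$posts" | awk -v limit=50 '
    BEGIN { max = 0; name = ""; high = 0; low = 0 }
    # if/else: count members at or above the limit, and those below it.
    { if ($2 >= limit) { high++ } else { low++ } }
    # A pattern can do the work of an if: update the running maximum.
    $2 > max { max = $2; name = $1 }
    END { printf("%d at or above %d, %d below, top poster %s\n",
                 high, limit, low, name) }
')
echo "$report"
```

Changing limit=50 to another number adjusts the report without editing the awk program itself. Note that the third rule uses the comparison as a pattern, which behaves the same as wrapping the action in an if statement.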
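
The line-counting script in the Count Lines section keeps its own counter, n, to show how variables work. As a point of comparison, awk also maintains a built-in variable, NR, that holds the number of input lines read so far, so a shorter version is possible. The sketch below uses a small made-up input (not sophie.txt) held in the illustrative variable text.

```shell
#!/bin/sh
# Hypothetical four-line input; "Sophie" appears on lines 2 and 4.
text='first line
Sophie is mentioned here
third line
and Sophie again'

# NR is awk's built-in count of lines read so far, so no BEGIN action or
# explicit increment is needed. In the END action NR still holds the
# number of the last line read, which is the total line count.
out=$(printf '%s\n' "$text" | awk '
    /Sophie/ { printf("Line %d\n", NR) }
    END      { printf("Total lines in file %d\n", NR) }
')
echo "$out"
```

For this input the script reports lines 2 and 4 and a total of 4 lines. The explicit counter in the original remains useful when you want to count something other than every input line.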