6.6. Records and Fields


6.6.1 Records

Awk does not see input data as an endless string of characters, but as having a format or structure. By default, each line is called a record and is terminated with a newline.

The Record Separator

By default, the output and input record separator (line separator) is a newline, stored in the built-in awk variables ORS and RS, respectively. The ORS and RS values can be changed, but only in a limited fashion.
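As a rough sketch of changing both separators, the snippet below sets RS and ORS in a BEGIN block. The semicolon-separated input and the variable names records and out are made up for illustration, and plain awk is used on the assumption that it is a POSIX-style awk that behaves like nawk here.

```shell
# Hypothetical input: three records separated by semicolons, not newlines.
records='Tom Jones;Mary Adams;Sally Chang'

# RS = ";" makes awk treat each semicolon as the end of a record.
# ORS = "\n\n" makes print end every output record with a blank line.
out=$(printf '%s' "$records" |
  awk 'BEGIN { RS = ";"; ORS = "\n\n" } { print }')

printf '%s\n' "$out"
```

Each name should appear on its own line, with a blank line between names.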

The $0 Variable

An entire record is referenced as $0 by awk. (When $0 is changed by substitution or assignment, the value of NF, the number of fields, may be changed.) The record separator is stored in awk's built-in variable RS, a newline by default.

Example 6.11.
 % cat employees
 Tom Jones       4424      5/12/66     543354
 Mary Adams      5346      11/4/63     28765
 Sally Chang     1654      7/22/54     650000
 Billy Black     1683      9/23/44     336500

 % nawk '{print $0}' employees
 Tom Jones       4424      5/12/66     543354
 Mary Adams      5346      11/4/63     28765
 Sally Chang     1654      7/22/54     650000
 Billy Black     1683      9/23/44     336500

EXPLANATION

The nawk variable $0 holds the current record. It is printed to the screen. By default, nawk would also print the record if the command were

 %  nawk '{print}' employees  
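The note that assigning to $0 can change NF can be checked with a small sketch like the one below. The sample line and the variable names line and out are illustrative, and awk stands in for nawk on the assumption that both follow POSIX behavior.

```shell
# A hypothetical one-line record with five whitespace-separated fields.
line='Tom Jones 4424 5/12/66 543354'

# Print NF, assign a shorter string to $0, then print NF and $0 again.
# Assigning to $0 makes awk split the record into fields again.
out=$(printf '%s\n' "$line" |
  awk '{ print NF; $0 = "Tom Jones"; print NF, $0 }')

printf '%s\n' "$out"
```

The first line of output is the original field count (5); the second shows the count after the assignment (2) followed by the new record.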

The NR Variable

The number of each record is stored in awk's built-in variable, NR. Each time a record is read, the value of NR is incremented by one.

Example 6.12.
 % cat employees
 Tom Jones       4424     5/12/66     543354
 Mary Adams      5346     11/4/63     28765
 Sally Chang     1654     7/22/54     650000
 Billy Black     1683     9/23/44     336500

 % nawk '{print NR, $0}' employees
 1 Tom Jones       4424     5/12/66     543354
 2 Mary Adams      5346     11/4/63     28765
 3 Sally Chang     1654     7/22/54     650000
 4 Billy Black     1683     9/23/44     336500

EXPLANATION

Each record, $0, is printed as it is stored in the file and is preceded with the number of the record, NR.
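Because NR keeps counting as records are read, it can also pick out one record or report the total. The sketch below assumes an inline copy of the employees data (the shell variables employees and out are illustrative) and uses an END block, which runs after the last record has been read; awk again stands in for nawk.

```shell
# Inline copy of the sample data, so no file is needed (an assumption
# made only to keep the sketch self-contained).
employees='Tom Jones 4424 5/12/66 543354
Mary Adams 5346 11/4/63 28765
Sally Chang 1654 7/22/54 650000
Billy Black 1683 9/23/44 336500'

# NR == 2 selects only the second record.
# In END, NR still holds the number of the last record read.
out=$(printf '%s\n' "$employees" |
  awk 'NR == 2 { print NR, $0 } END { print "records:", NR }')

printf '%s\n' "$out"
```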

6.6.2 Fields

Each record consists of words called fields that, by default, are separated by whitespace (blank spaces or tabs). Awk keeps track of the number of fields in its built-in variable, NF. The value of NF can vary from line to line, and the limit is implementation-dependent, typically 100 fields per line in older versions. New fields can be created. The following example has four records (lines) and five fields (columns). Each record starts at the first field, represented as $1, then moves to the second field, $2, and so forth.

Example 6.13.
 (Fields are represented by a dollar sign and the number of the field.)
 (The Database)
 $1        $2         $3       $4           $5
 Tom       Jones      4424     5/12/66      543354
 Mary      Adams      5346     11/4/63      28765
 Sally     Chang      1654     7/22/54      650000
 Billy     Black      1683     9/23/44      336500

 % nawk '{print NR, $1, $2, $5}' employees
 1 Tom Jones 543354
 2 Mary Adams 28765
 3 Sally Chang 650000
 4 Billy Black 336500

EXPLANATION

Nawk will print the number of the record (NR), and the first, second, and fifth fields (columns) of each line in the file.

Example 6.14.
 % nawk '{print $0, NF}' employees
 Tom Jones       4424      5/12/66     543354 5
 Mary Adams      5346      11/4/63     28765 5
 Sally Chang     1654      7/22/54     650000 5
 Billy Black     1683      9/23/44     336500 5

EXPLANATION

Nawk will print each record ($0) in the file, followed by the number of fields.
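The remark that new fields can be created can be illustrated with a short sketch. The value "bonus" and the variable names line and out are invented for the example, and awk is assumed to behave like nawk. Since NF is a number, $NF refers to the last field of the record.

```shell
# A hypothetical record with five fields.
line='Tom Jones 4424 5/12/66 543354'

# First print NF and the last field ($NF).
# Then assign to $6, which does not exist yet: awk creates it,
# updates NF, and rebuilds $0 with the new field appended.
out=$(printf '%s\n' "$line" |
  awk '{ print NF, $NF; $6 = "bonus"; print NF, $0 }')

printf '%s\n' "$out"
```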

6.6.3 Field Separators

The Input Field Separator

Awk's built-in variable, FS, holds the value of the input field separator. When the default value of FS is used, awk separates fields by spaces and/or tabs, stripping leading blanks and tabs. The value of FS can be changed by assigning a new value to it, either in a BEGIN statement or at the command line. For now, we will assign the new value at the command line. To change the value of FS at the command line, the -F option is used, followed by the character representing the new separator.

Changing the Field Separator at the Command Line

See Example 6.15 for a demonstration on how to change the input field separator at the command line using the -F option.

Example 6.15.
 % cat employees2
 Tom Jones:4424:5/12/66:543354
 Mary Adams:5346:11/4/63:28765
 Sally Chang:1654:7/22/54:650000
 Billy Black:1683:9/23/44:336500

 % nawk -F: '/Tom Jones/{print $1, $2}' employees2
 Tom Jones 4424

EXPLANATION

The -F option is used to reassign the value of the input field separator at the command line. When a colon is placed directly after the -F option, nawk will look for colons to separate the fields in the employees2 file.
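The other way to set FS mentioned earlier, assignment in a BEGIN statement, might look like the sketch below. It assumes an inline two-line excerpt of employees2 (the shell variables employees2 and out are illustrative) and uses awk in place of nawk.

```shell
# Inline excerpt of the colon-separated sample data.
employees2='Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765'

# FS is assigned in BEGIN, before the first record is read,
# so every record is split on colons, just as with -F: .
out=$(printf '%s\n' "$employees2" |
  awk 'BEGIN { FS = ":" } /Tom Jones/ { print $1, $2 }')

printf '%s\n' "$out"
```

This should print the same line as the -F: form: the name, a space, and the number 4424.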

Using More Than One Field Separator

You may specify more than one input separator. If more than one character is used for the field separator, FS, the string is treated as a regular expression; to let any one of several characters act as a separator, enclose them in square brackets. In the following example, the field separator is a space, colon, or tab. (The old version of awk did not support this feature.)

Example 6.16.
 % nawk -F'[ :\t]' '{print $1, $2, $3}' employees
 Tom Jones 4424
 Mary Adams 5346
 Sally Chang 1654
 Billy Black 1683

EXPLANATION

The -F option is followed by a regular expression enclosed in brackets. If a space, colon, or tab is encountered, nawk will use that character as a field separator. The expression is surrounded by quotes so that the shell will not interpret the metacharacters itself. (Remember that the shell uses brackets for filename expansion.)
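To see what the bracketed separator changes, the sketch below splits the same colon-separated line twice, once with -F: and once with -F'[ :\t]'. The sample line and the variable names line, one, and many are illustrative, and awk is assumed to treat the bracket expression the way nawk does.

```shell
# One hypothetical line in the colon-separated format.
line='Tom Jones:4424:5/12/66:543354'

# With a single colon as separator, the first field is the whole name.
one=$(printf '%s\n' "$line" | awk -F: '{ print $1 }')

# With space, colon, or tab as separators, the name itself is split,
# so the first field is only the first name.
many=$(printf '%s\n' "$line" | awk -F'[ :\t]' '{ print $1 }')

printf '%s\n%s\n' "$one" "$many"
```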

The Output Field Separator

The default output field separator is a single space and is stored in awk's internal variable, OFS. In all of the examples thus far, we have used the print statement to send output to the screen. The comma that is used to separate fields in print statements evaluates to whatever OFS has been set to. If the default is used, the comma inserted between $1 and $2 will evaluate to a single space and the print statement will print the fields with a space between them.

The fields are jammed together if a comma is not used to separate the fields. The OFS will not be evaluated unless the comma separates the fields. The OFS can be changed.

Example 6.17.
 % cat employees2
 Tom Jones:4424:5/12/66:543354
 Mary Adams:5346:11/4/63:28765
 Sally Chang:1654:7/22/54:650000
 Billy Black:1683:9/23/44:336500

 (The Command Line)
 % nawk -F: '/Tom Jones/{print $1, $2, $3, $4}' employees2
 Tom Jones 4424 5/12/66 543354

EXPLANATION

The output field separator, a space, is stored in nawk's OFS variable. The comma between the fields evaluates to whatever is stored in OFS. The fields are printed to standard output separated by a space.
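A quick way to see both effects, a changed OFS and fields printed without a comma, is sketched below. The hyphen chosen for OFS and the variable names line and out are arbitrary, and awk is used in place of nawk.

```shell
# One hypothetical line in the colon-separated format.
line='Tom Jones:4424:5/12/66:543354'

# OFS is set to a hyphen in BEGIN.
# First print: the comma between $1 and $2 becomes the OFS value.
# Second print: no comma, so the two fields are joined with nothing between.
out=$(printf '%s\n' "$line" |
  awk -F: 'BEGIN { OFS = "-" } { print $1, $2; print $1 $2 }')

printf '%s\n' "$out"
```

The first output line carries the hyphen between the two fields; the second shows them jammed together.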

Example 6.18.
 % nawk -F: '/Tom Jones/{print $0}' employees2
 Tom Jones:4424:5/12/66:543354

EXPLANATION

The $0 variable holds the current record exactly as it is found in the input file. The record will be printed as is.



UNIX Shells by Example
UNIX Shells by Example (4th Edition)
ISBN: 013147572X
EAN: 2147483647
Year: 2004
Pages: 454
Authors: Ellie Quigley
