1.11 Shell Tools

awk, sed, and egrep are a related set of Unix shell tools for text processing. awk and egrep use a DFA match engine, and sed uses an NFA engine. For an explanation of the rules behind these engines, see Section 1.2.

This reference covers GNU egrep 2.4.2, a program for searching lines of text; GNU sed 3.02, a tool for scripting editing commands; and GNU awk 3.1, a programming language for text processing.

1.11.1 Supported Metacharacters

awk, egrep, and sed support the metacharacters and metasequences listed in Table 1-46 through Table 1-50. For expanded definitions of each metacharacter, see Section 1.2.1.

Table 1-46. Character representations

Sequence         Meaning                                                              Tool
\a               Alert (bell).                                                        awk, sed
\b               Backspace; supported only in character class.
\f               Form feed.                                                           awk, sed
\n               Newline (line feed).                                                 awk, sed
\r               Carriage return.                                                     awk, sed
\t               Horizontal tab.                                                      awk, sed
\v               Vertical tab.                                                        awk, sed
\ooctal          A character specified by a one-, two-, or three-digit octal code.
\octal           A character specified by a one-, two-, or three-digit octal code.
\xhex            A character specified by a two-digit hexadecimal code.               awk, sed
\ddecimal        A character specified by a one-, two-, or three-digit decimal code.  awk, sed
\cchar           A named control character (e.g., \cC is Control-C).                  awk, sed
\metacharacter   Escape the metacharacter so that it literally represents itself.     awk, sed, egrep
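
As a rough illustration of two of these escapes, the sketch below uses awk with a tab escape and an octal escape inside a regular expression. The sample input lines are made up for the example, and the variable names are hypothetical.

```shell
# Count lines that contain a literal tab, written as \t in the pattern.
tab_count=$(printf 'name\tvalue\nno tab here\n' | awk '/\t/ { n++ } END { print n + 0 }')
echo "$tab_count"

# Octal 101 is the ASCII code for "A", so /\101/ selects lines containing "A".
octal_match=$(printf 'Apple\nbanana\n' | awk '/\101/')
echo "$octal_match"
```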

Table 1-47. Character classes and class-like constructs

Class         Meaning                                                                              Tool
[...]         Matches any single character listed or contained within a listed range.              awk, sed, egrep
[^...]        Matches any single character that is not listed or contained within a listed range.  awk, sed, egrep
.             Matches any single character, except newline.                                        awk, sed, egrep
\w            Matches an ASCII word character, [a-zA-Z0-9_].                                       egrep, sed
\W            Matches a character that is not an ASCII word character, [^a-zA-Z0-9_].              egrep, sed
[:prop:]      Matches any character in the POSIX character class.                                  awk, sed
[^[:prop:]]   Matches any character not in the POSIX character class.                              awk, sed
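
The POSIX classes are written inside a bracket expression, which is easy to get wrong. The following is a minimal sketch using sed; the input string and variable names are hypothetical.

```shell
# Replace every digit with "#" using the POSIX class [:digit:].
masked=$(echo 'order 66 shipped' | sed 's/[[:digit:]]/#/g')
echo "$masked"

# Delete everything that is not a letter using a negated POSIX class.
letters=$(echo 'order 66 shipped' | sed 's/[^[:alpha:]]//g')
echo "$letters"
```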

Table 1-48. Anchors and other zero-width tests

Sequence  Meaning                                                            Tool
^         Matches only start of string, even if newlines are embedded.       awk, sed, egrep
$         Matches only end of search string, even if newlines are embedded.  awk, sed, egrep
\<        Matches beginning of word boundary.
\>        Matches end of word boundary.
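
A short sketch of the difference between line anchors and word boundaries follows. It assumes GNU grep and uses grep -E, which accepts the same pattern syntax as egrep; the word list is invented for the example.

```shell
words='cat
concatenate
bobcat
cat nap'

# ^ and $ anchor the pattern to the start and end of the line.
whole_line=$(printf '%s\n' "$words" | grep -E '^cat$')
echo "$whole_line"

# \< and \> require a word boundary on each side of "cat".
whole_word=$(printf '%s\n' "$words" | grep -E '\<cat\>')
echo "$whole_word"
```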


Table 1-49. Comments and mode modifiers

Modifier                    Meaning                                            Tool
flag: i or I                Case-insensitive matching for ASCII characters.    sed
command-line option: -i     Case-insensitive matching for ASCII characters.    egrep
set IGNORECASE to non-zero  Case-insensitive matching for Unicode characters.  awk
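
The sketch below shows two of these modifiers side by side. It assumes GNU sed (for the I flag) and uses grep -E in place of egrep; the awk IGNORECASE variable is left out because it only has an effect in GNU awk. The headline text and variable names are hypothetical.

```shell
headline='Spiderman Menaces City!'

# grep: the -i command-line option makes the whole pattern case-insensitive.
grep_hit=$(echo "$headline" | grep -iE 'spiderman')
echo "$grep_hit"

# GNU sed: the I flag on the s command does the same for one substitution.
sed_out=$(echo "$headline" | sed 's/spiderman/Spider-Man/I')
echo "$sed_out"
```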


Table 1-50. Grouping, capturing, conditional, and control

Sequence  Meaning                                              Tool
\(...\)   Group and capture submatches, filling \1,\2,...,\9.
\n        Contains the nth earlier submatch.
...|...   Alternation; match one or the other.                 egrep, awk, sed

Greedy quantifiers
*         Match 0 or more times.                               awk, sed, egrep
+         Match 1 or more times.                               awk, sed, egrep
?         Match 1 or 0 times.                                  awk, sed, egrep
\{n\}     Match exactly n times.                               sed, egrep
\{n,\}    Match at least n times.                              sed, egrep
\{x,y\}   Match at least x times, but no more than y times.    sed, egrep
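
To make the quantifiers and grouping constructs concrete, here is a small sketch. It uses grep -E (equivalent to egrep) for alternation and sed for the escaped interval and capture syntax; the inputs are made up for illustration.

```shell
# grep -E: ? makes the preceding item optional, | separates alternatives.
ere_hits=$(printf 'color\ncolour\ncolr\ngray\ngrey\n' | grep -E '^(colou?r|gr[ae]y)$')
echo "$ere_hits"

# sed: \{2,3\} matches two or three repetitions (greedy, so three here),
# and \(...\) captures the text for the \1 backreference.
bre_out=$(echo 'aaaa-bb' | sed 's/\(a\{2,3\}\)/[\1]/')
echo "$bre_out"
```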


 egrep [options] pattern files

egrep searches files for occurrences of pattern and prints out each matching line.


 $ echo 'Spiderman Menaces City!' > dailybugle.txt
 $ egrep -i 'spider[- ]?man' dailybugle.txt
 Spiderman Menaces City!

 sed '[address1][,address2]s/pattern/replacement/[flags]' files
 sed -f script files

By default, sed applies the substitution to every line in files. Each address can be either a line number or a regular expression pattern. A regular expression address must be enclosed in forward slash delimiters (/.../). If only address1 is supplied, the substitution applies only to that line number or to each line matching the pattern. If address2 is also supplied, the substitution begins on the line indicated or first matched by address1 and continues through the line indicated or matched by address2, or through the end of the file if address2 is never reached.
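
As a sketch of two-address ranges, the example below prefixes selected lines with "> ". The five-line input is hypothetical and is piped in rather than read from a file.

```shell
lines='one
two
three
four
five'

# Line-number addresses: substitute only on lines 2 through 3.
by_number=$(printf '%s\n' "$lines" | sed '2,3s/^/> /')
echo "$by_number"

# Regular-expression addresses: from the first line matching /two/
# through the next line matching /four/.
by_pattern=$(printf '%s\n' "$lines" | sed '/two/,/four/s/^/> /')
echo "$by_pattern"
```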

Two sequences, & and \n, are interpreted in replacement based on the results of the match. The sequence & is replaced with the text matched by pattern. The sequence \n corresponds to a capture group (1..9) in the current match.
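
A minimal sketch of both replacement sequences, using invented input strings:

```shell
# & stands for the entire text matched by the pattern.
amp=$(echo 'error 404' | sed 's/[0-9][0-9]*/<&>/')
echo "$amp"

# \1 and \2 stand for the first and second \(...\) capture groups.
swapped=$(echo 'Parker, Peter' | sed 's/\([A-Za-z]*\), \([A-Za-z]*\)/\2 \1/')
echo "$swapped"
```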

The available flags are:

n

Substitute the nth match in a line, where n is between 1 and 512.

g

Substitute all occurrences of pattern in a line.

p

Print lines with successful substitutions.

w file

Write lines with successful substitutions to file.
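
The flags can be compared on one small input, as sketched below. The example assumes the standard sed -n option, which is not covered above and suppresses the default output so that only the lines selected by p are printed; the temporary file created with mktemp is likewise just for the demonstration.

```shell
line='a-b-c-d'

# A numeric flag replaces only that occurrence: here, the second "-".
second_only=$(echo "$line" | sed 's/-/+/2')
echo "$second_only"

# g replaces every occurrence on the line.
all=$(echo "$line" | sed 's/-/+/g')
echo "$all"

# p prints lines where a substitution happened; -n hides everything else.
changed=$(printf 'a-b\nplain\n' | sed -n 's/-/+/p')
echo "$changed"

# w writes the substituted lines to a file instead.
out=$(mktemp)
printf 'a-b\nplain\n' | sed -n "s/-/+/w $out"
written=$(cat "$out")
rm -f "$out"
echo "$written"
```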


Change date formats from MM/DD/YYYY to DD.MM.YYYY.

 $ echo '12/30/1969' | sed 's!\([0-9][0-9]\)/\([0-9][0-9]\)/\([0-9]\{2,4\}\)!\2.\1.\3!g'

 awk 'instructions' files
 awk -f script files

The awk script contained in either instructions or script should be a series of /pattern/ { action } pairs. The action code is applied to each line matched by pattern. awk also supplies several functions for pattern matching.
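
The pattern-action structure might look like the following sketch. The log lines are hypothetical, and the END block (a standard awk feature not described above) is added only to print the totals.

```shell
log='INFO start
ERROR disk full
INFO done
ERROR timeout'

# Each /pattern/ { action } pair runs its action on the lines it matches.
report=$(printf '%s\n' "$log" | awk '
    /^ERROR/ { errors++; print "line " NR ": " $0 }
    /^INFO/  { infos++ }
    END      { print errors " errors, " infos " info lines" }
')
echo "$report"
```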


match(text, pattern)

If pattern matches in text, returns the position in text where the match starts. A failed match returns zero. A successful match also sets the variable RSTART to the position where the match started and the variable RLENGTH to the number of characters in the match.

gsub(pattern, replacement, text)

Substitutes each match of pattern in text with replacement and returns the number of substitutions. Defaults to $0 if text is not supplied.

sub(pattern, replacement, text)

Substitutes the first match of pattern in text with replacement. A successful substitution returns 1, and an unsuccessful substitution returns 0. Defaults to $0 if text is not supplied.
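
The return values and side effects of the three functions can be seen in a short sketch like this one. The input line and the variable names (pos, line, first, n, m) are hypothetical.

```shell
result=$(echo 'Contact: peter@dailybugle.example' | awk '{
    # match() returns the start position and sets RSTART and RLENGTH.
    pos = match($0, /[a-z]+@/)
    print pos, RSTART, RLENGTH

    # gsub() changes every match in its target and returns the count.
    line = $0
    n = gsub(/e/, "E", line)
    print n, line

    # sub() changes only the first match and returns 1 or 0.
    first = $0
    m = sub(/e/, "E", first)
    print m, first
}')
echo "$result"
```

Copying $0 into a variable before each call keeps the original line intact, since both sub() and gsub() modify their target in place.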


Create an awk file and then run it from the command line.

 $ cat sub.awk
 {
     gsub(/https?:\/\/[a-z_.\w\/\#~:?+=&;%@!-]*/,
          "<a href=\"&\">&</a>");
     print
 }
 $ echo "Check the website, http://www.oreilly.com/catalog/repr" | awk -f sub.awk

1.11.2 Other Resources

  • sed & awk, by Dale Dougherty and Arnold Robbins (O'Reilly), is an introduction and reference to both tools.

Regular Expression Pocket Reference: Regular Expressions for Perl, Ruby, PHP, Python, C, Java and .NET (Pocket Reference (OReilly))
ISBN: 0596514271
Year: 2003