4.9. grep -E or egrep (GNU Extended grep)

The main advantage of using extended grep is that additional regular expression metacharacters (see Table 4.10) have been added to the basic set. With the -E option, GNU grep allows the use of these new metacharacters.

Table 4.10. egrep's Regular Expression Metacharacters

| Metacharacter | Function | Example | What It Matches |
| ^ | Beginning-of-line anchor | ^love | Matches all lines beginning with love. |
| $ | End-of-line anchor | love$ | Matches all lines ending with love. |
| . | Matches one character | l..e | Matches lines containing an l, followed by two characters, followed by an e. |
| * | Matches zero or more of the preceding character | ' *love' | Matches lines with zero or more spaces (the pattern begins with a space), followed by the pattern love. |
| [ ] | Matches one character in the set | [Ll]ove | Matches lines containing love or Love. |
| [^ ] | Matches one character not in the set | [^A-KM-Z]ove | Matches lines containing a character that is not in the range A through K or M through Z, followed by ove. |
| New with grep -E or egrep |
| + | Matches one or more of the preceding character | [a-z]+ove | Matches one or more lowercase letters, followed by ove. Would find move, approve, love, behoove, etc. |
| ? | Matches zero or one of the preceding character | lo?ve | Matches an l followed by either one o or none at all. Would find love or lve. |
| a|b | Matches either a or b | love|hate | Matches either expression, love or hate. |
| () | Groups characters | love(able|ly) (ov)+ | Matches loveable or lovely. Matches one or more occurrences of ov. |
| x{m} x{m,} x{m,n} [a] | Repetition of character x: m times, at least m times, or between m and n times | o{5} o{5,} o{5,10} | Matches if the line has 5 consecutive occurrences of o, at least 5 occurrences of o, or between 5 and 10 occurrences of o. |
| \w | Alphanumeric word character; [a-zA-Z0-9_] | l\w*e | Matches an l followed by zero or more word characters, and an e. |
| \W | Nonalphanumeric (nonword) character; [^a-zA-Z0-9_] | \W\w* | Matches a nonword character (\W) followed by zero or more word characters (\w). |
| \b | Word boundary | \blove\b | Matches only the word love. |

[a] The { } metacharacters are not supported on all versions of UNIX or all pattern-matching utilities; they usually work with vi and grep. They don't work with UNIX egrep at all.

4.9.1 grep -E and egrep Examples

The following examples illustrate how the extended set of regular expression metacharacters is used with grep -E and egrep. The grep examples presented earlier illustrate the use of the standard metacharacters, which egrep also recognizes. With basic GNU grep (grep -G), it is possible to use any of the additional metacharacters, provided that each of the special metacharacters is preceded by a backslash. The following examples show all three variants of grep accomplishing the same task.

The examples in this section use the following datafile, repeated periodically for your convenience.

% cat datafile
| northwest | NW | Charles Main | 3.0 | .98 | 3 | 34 |
| western | WE | Sharon Gray | 5.3 | .97 | 5 | 23 |
| southwest | SW | Lewis Dalsass | 2.7 | .8 | 2 | 18 |
| southern | SO | Suan Chin | 5.1 | .95 | 4 | 15 |
| southeast | SE | Patricia Hemenway | 4.0 | .7 | 4 | 17 |
| eastern | EA | TB Savage | 4.4 | .84 | 5 | 20 |
| northeast | NE | AM Main Jr. | 5.1 | .94 | 3 | 13 |
| north | NO | Margot Weber | 4.5 | .89 | 5 | 9 |
| central | CT | Ann Stephens | 5.7 | .94 | 5 | 13 |

Example 4.41.
1 % egrep 'NW|EA' datafile
northwest NW Charles Main 3.0 .98 3 34
eastern EA TB Savage 4.4 .84 5 20
2 % grep -E 'NW|EA' datafile
northwest NW Charles Main 3.0 .98 3 34
eastern EA TB Savage 4.4 .84 5 20
3 % grep 'NW|EA' datafile
4 % grep 'NW\|EA' datafile
northwest NW Charles Main 3.0 .98 3 34
eastern EA TB Savage 4.4 .84 5 20

EXPLANATION
1. Prints the line if it contains either the expression NW or the expression EA. In this example, egrep is used. If you do not have the GNU version of grep, use egrep.
2. In this example, GNU grep is used with the -E option to enable the extended metacharacters. This is the same as egrep.
3. Regular grep does not normally support extended regular expressions; the vertical bar is an extended regular expression metacharacter used for alternation. Regular grep doesn't recognize it and searches for the literal pattern 'NW|EA'. Nothing matches; nothing prints.
4. With GNU regular grep (grep -G), if the metacharacter is preceded by a backslash, it is interpreted as an extended regular expression metacharacter, just as with egrep and grep -E.

% cat datafile
| northwest | NW | Charles Main | 3.0 | .98 | 3 | 34 |
| western | WE | Sharon Gray | 5.3 | .97 | 5 | 23 |
| southwest | SW | Lewis Dalsass | 2.7 | .8 | 2 | 18 |
| southern | SO | Suan Chin | 5.1 | .95 | 4 | 15 |
| southeast | SE | Patricia Hemenway | 4.0 | .7 | 4 | 17 |
| eastern | EA | TB Savage | 4.4 | .84 | 5 | 20 |
| northeast | NE | AM Main Jr. | 5.1 | .94 | 3 | 13 |
| north | NO | Margot Weber | 4.5 | .89 | 5 | 9 |
| central | CT | Ann Stephens | 5.7 | .94 | 5 | 13 |

Example 4.42.
% egrep '3+' datafile
% grep -E '3+' datafile
% grep '3\+' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
northeast NE AM Main Jr. 5.1 .94 3 13
central CT Ann Stephens 5.7 .94 5 13

EXPLANATION
Prints all lines containing one or more 3s.

Example 4.43.
% egrep '2\.?[0-9]' datafile
% grep -E '2\.?[0-9]' datafile
% grep '2\.\?[0-9]' datafile
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
eastern EA TB Savage 4.4 .84 5 20

EXPLANATION
Prints all lines containing a 2, followed by zero or one period, followed by a digit in the range 0 through 9.

Example 4.44.
% egrep '(no)+' datafile
% grep -E '(no)+' datafile
% grep '\(no\)\+' datafile
northwest NW Charles Main 3.0 .98 3 34
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9

EXPLANATION
Prints lines containing one or more occurrences of the pattern group no.

Example 4.45.
% grep -E '\w+\W+[ABC]' datafile
northwest NW Charles Main 3.0 .98 3 34
southern SO Suan Chin 5.1 .95 4 15
northeast NE AM Main Jr. 5.1 .94 3 13
central CT Ann Stephens 5.7 .94 5 13

EXPLANATION
Prints all lines containing one or more alphanumeric word characters (\w+), followed by one or more nonalphanumeric characters (\W+), followed by one letter in the set ABC.

Example 4.46.
% egrep 'S(h|u)' datafile
% grep -E 'S(h|u)' datafile
% grep 'S\(h\|u\)' datafile
western WE Sharon Gray 5.3 .97 5 23
southern SO Suan Chin 5.1 .95 4 15

EXPLANATION
Prints all lines containing S, followed by either h or u; i.e., Sh or Su.

Example 4.47.
% egrep 'Sh|u' datafile
% grep -E 'Sh|u' datafile
% grep 'Sh\|u' datafile
western WE Sharon Gray 5.3 .97 5 23
southern SO Suan Chin 5.1 .95 4 15
southwest SW Lewis Dalsass 2.7 .8 2 18
southeast SE Patricia Hemenway 4.0 .7 4 17

EXPLANATION
Prints all lines containing the expression Sh or u.

4.9.2 Anomalies with Regular and Extended Variants of grep

The variants of GNU grep supported by Linux are almost, but not exactly, the same as their UNIX namesakes. For example, the version of egrep found in Solaris or BSD UNIX does not support three metacharacter sets: \{ \} for repetition, \( \) for tagging characters, and \< \>, the word anchors. Under Linux, these features are available with grep, grep -E, and GNU egrep; the extended variants write the braces and parentheses without backslashes. The following examples illustrate these differences, just in case you are running bash or tcsh under a UNIX system other than Linux, and you want to use grep and its family in your shell scripts.

The examples in this section use the following datafile, repeated periodically for your convenience.

% cat datafile
| northwest | NW | Charles Main | 3.0 | .98 | 3 | 34 |
| western | WE | Sharon Gray | 5.3 | .97 | 5 | 23 |
| southwest | SW | Lewis Dalsass | 2.7 | .8 | 2 | 18 |
| southern | SO | Suan Chin | 5.1 | .95 | 4 | 15 |
| southeast | SE | Patricia Hemenway | 4.0 | .7 | 4 | 17 |
| eastern | EA | TB Savage | 4.4 | .84 | 5 | 20 |
| northeast | NE | AM Main Jr. | 5.1 | .94 | 3 | 13 |
| north | NO | Margot Weber | 4.5 | .89 | 5 | 9 |
| central | CT | Ann Stephens | 5.7 | .94 | 5 | 13 |

Example 4.48.
(Linux GNU grep)
1 % grep '<north>' datafile    # Must use backslashes
2 % grep '\<north\>' datafile
north NO Margot Weber 4.5 .89 5 9
3 % grep -E '\<north\>' datafile
north NO Margot Weber 4.5 .89 5 9
4 % egrep '\<north\>' datafile
north NO Margot Weber 4.5 .89 5 9
(Solaris egrep)
5 % egrep '\<north\>' datafile
<no output; not recognized>

EXPLANATION
1. No matter what variant of grep is being used, the word anchor metacharacters, < >, must be preceded by a backslash.
2. This time, grep searches for a word that begins and ends with north. \< represents the beginning-of-word anchor and \> represents the end-of-word anchor.
3. grep with the -E option also recognizes the word anchors.
4. The GNU form of egrep recognizes the word anchors.
5. When using Solaris (SVR4), egrep does not recognize word anchors as regular expression metacharacters.

Example 4.49.
(Linux GNU grep)
1 % grep 'w(es)t.*\1' datafile
grep: Invalid back reference
2 % grep 'w\(es\)t.*\1' datafile
northwest NW Charles Main 3.0 .98 3 34
3 % grep -E 'w(es)t.*\1' datafile
northwest NW Charles Main 3.0 .98 3 34
4 % egrep 'w(es)t.*\1' datafile
northwest NW Charles Main 3.0 .98 3 34
(Solaris egrep)
5 % egrep 'w(es)t.*\1' datafile
<no output; not recognized>

EXPLANATION
1. When using regular grep, the ( ) extended metacharacters must be backslashed or an error occurs.
2. If the regular expression w\(es\)t is matched, the pattern es is saved and stored in memory register 1. The expression reads: if west is found, tag and save es, search for any number of characters (.*) after it, followed by es (\1) again, and print the line. The es in Charles is matched by the backreference.
3. This is the same as the previous example, except that grep with the -E switch does not precede the ( ) with backslashes.
4. GNU egrep also uses the extended metacharacters ( ) without backslashes.
5. With Solaris, egrep doesn't recognize any form of tagging and backreferencing.

% cat datafile
| northwest | NW | Charles Main | 3.0 | .98 | 3 | 34 |
| western | WE | Sharon Gray | 5.3 | .97 | 5 | 23 |
| southwest | SW | Lewis Dalsass | 2.7 | .8 | 2 | 18 |
| southern | SO | Suan Chin | 5.1 | .95 | 4 | 15 |
| southeast | SE | Patricia Hemenway | 4.0 | .7 | 4 | 17 |
| eastern | EA | TB Savage | 4.4 | .84 | 5 | 20 |
| northeast | NE | AM Main Jr. | 5.1 | .94 | 3 | 13 |
| north | NO | Margot Weber | 4.5 | .89 | 5 | 9 |
| central | CT | Ann Stephens | 5.7 | .94 | 5 | 13 |

Example 4.50.
(Linux GNU grep)
1 % grep '\.[0-9]\{2\}[^0-9]' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southern SO Suan Chin 5.1 .95 4 15
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
2 % grep -E '\.[0-9]{2}[^0-9]' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southern SO Suan Chin 5.1 .95 4 15
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
3 % egrep '\.[0-9]{2}[^0-9]' datafile
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southern SO Suan Chin 5.1 .95 4 15
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
(Solaris egrep)
4 % egrep '\.[0-9]{2}[^0-9]' datafile
<no output; not recognized with or without backslashes>

EXPLANATION
1. The extended metacharacters { } are used for repetition. The GNU and UNIX versions of regular grep do not evaluate this extended metacharacter set unless the curly braces are preceded by backslashes. The whole expression reads: search for a literal period, \., followed by a digit between 0 and 9, [0-9], repeated exactly two times, \{2\}, followed by a nondigit, [^0-9].
2. With extended grep, grep -E, the repetition metacharacters, {2}, do not need to be preceded by backslashes as in the previous example.
3. Because GNU egrep and grep -E are functionally the same, this command produces the same output as the previous example.
4. This is the standard UNIX version of egrep. It does not recognize the curly braces as an extended metacharacter set, either with or without backslashes.
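
The comparisons in this section can be reproduced with a short script. What follows is a minimal sketch, not part of the original examples. It assumes a POSIX-conforming grep, such as GNU grep, and assumes the datafile fields are separated by single spaces. The variable names (alt_count, ere_interval, and so on) are made up for illustration. The script uses only constructs that POSIX defines for basic and extended syntax, so the GNU-only backslashed forms such as \| and \+ are left out.

```shell
#!/bin/sh
# Sketch only: rebuild the section's sample data in a temporary file.
# Assumption: fields are separated by single spaces.
datafile=$(mktemp)
cat > "$datafile" <<'EOF'
northwest NW Charles Main 3.0 .98 3 34
western WE Sharon Gray 5.3 .97 5 23
southwest SW Lewis Dalsass 2.7 .8 2 18
southern SO Suan Chin 5.1 .95 4 15
southeast SE Patricia Hemenway 4.0 .7 4 17
eastern EA TB Savage 4.4 .84 5 20
northeast NE AM Main Jr. 5.1 .94 3 13
north NO Margot Weber 4.5 .89 5 9
central CT Ann Stephens 5.7 .94 5 13
EOF

# Alternation (|) is an extended metacharacter, so it needs -E.
alt_count=$(grep -E -c 'NW|EA' "$datafile")

# One-or-more (+) applied to a single character and to a group.
plus_count=$(grep -E -c '3+' "$datafile")
group_count=$(grep -E -c '(no)+' "$datafile")

# Repetition: bare braces with -E, backslashed braces in basic syntax.
ere_interval=$(grep -E -c '\.[0-9]{2}[^0-9]' "$datafile")
bre_interval=$(grep -c '\.[0-9]\{2\}[^0-9]' "$datafile")

# Tagging and backreference in basic syntax: west, then es again later.
backref_line=$(grep 'w\(es\)t.*\1' "$datafile")

rm -f "$datafile"

echo "NW|EA: $alt_count  3+: $plus_count  (no)+: $group_count"
echo "two-digit decimals: $ere_interval (extended), $bre_interval (basic)"
echo "backreference: $backref_line"
```

The -c option makes grep print a count of matching lines instead of the lines themselves, which keeps the comparison between the basic and extended forms easy to read. On a system whose egrep behaves like the Solaris version described above, replacing grep -E with egrep in the interval test would be expected to give a different count.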