4.2. Filtering Files: grep, egrep, fgrep, and uniq

Often it's handy to filter the contents of a file, selecting only those lines that meet some criterion. The utilities that do this include the following:

  • egrep, fgrep, and grep, which filter out all lines that do not contain a specified pattern

  • uniq, which filters out duplicate adjacent lines

The next few subsections describe these utilities.

4.2.1. Filtering Patterns: egrep/fgrep/grep

egrep, fgrep, and grep allow you to scan a file and filter out all of the lines that don't contain a specified pattern. They are very similar in nature, the main difference being the kind of text patterns that each can filter. I'll begin by describing the common features of all three, and then finish up by illustrating the differences. Figure 4-2 gives a brief synopsis of the three utilities.

Figure 4-2. Description of the grep command.

Utility: grep -hilnvw pattern { fileName } *

fgrep -hilnvwx string { fileName } *

egrep -hilnvw pattern { fileName } *

grep (Global or Get Regular Expression and Print) is a utility that searches a list of files for a pattern. If no files are specified, it searches standard input instead. pattern may be a regular expression. All lines that match the pattern are written to standard output.

If more than one file is specified, each matching line is preceded by the name of its file unless the -h option is specified. The -n option precedes each matching line with its line number. The -i option causes the case of the pattern to be ignored. The -l option displays only the names of the files that contain the specified pattern. The -v option causes grep to display all of the lines that don't match the pattern. The -w option restricts matching to whole words only.

fgrep (Fixed grep) is a fast version of grep that can search only for fixed strings. egrep (Extended grep) supports matching with extended regular expressions. fgrep supports one additional option: -x outputs only lines that are exactly equal to string.

For more information about regular expressions, consult the Appendix.


To obtain a list of all the lines in a file that contain a string, follow grep with the string and the name of the file to scan. Here's an example:

$ cat grepfile                 ...list the file to be filtered.
Well you know it's your bedtime,
So turn off the light,
Say all your prayers and then,
Oh you sleepy young heads dream of wonderful things,
Beautiful mermaids will swim through the sea,
And you will be swimming there too.
$ grep the grepfile            ...search for the word "the".
So turn off the light,
Say all your prayers and then,
Beautiful mermaids will swim through the sea,
And you will be swimming there too.
$ _


Notice that words that contain the string "the" also satisfied the matching condition. Here's an example of the -w and -n options:

$ grep -wn the grepfile       ...be more particular this time!
2:So turn off the light,
5:Beautiful mermaids will swim through the sea,
$ _


To display only those lines in a file that don't match, use the -v option:

$ grep -wnv the grepfile      ...reverse the filter.
1:Well you know it's your bedtime,
3:Say all your prayers and then,
4:Oh you sleepy young heads dream of wonderful things,
6:And you will be swimming there too.
$ _
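Figure 4-2 lists the -i option, but none of the sessions above use it. The following is a minimal sketch rather than part of the original session: it recreates grepfile in a scratch directory (the workdir variable and the use of mktemp are my own assumptions) and compares a case-sensitive whole-word search for "and" with a case-insensitive one.

```shell
#!/bin/sh
# Recreate the sample file from the text in a scratch directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/grepfile" <<'EOF'
Well you know it's your bedtime,
So turn off the light,
Say all your prayers and then,
Oh you sleepy young heads dream of wonderful things,
Beautiful mermaids will swim through the sea,
And you will be swimming there too.
EOF

# Case-sensitive: only the lowercase word "and" on line 3 qualifies.
sensitive=$(grep -wn and "$workdir/grepfile")

# -i ignores case, so the capitalized "And" on line 6 qualifies as well.
insensitive=$(grep -iwn and "$workdir/grepfile")

printf 'case-sensitive:\n%s\n' "$sensitive"
printf 'case-insensitive:\n%s\n' "$insensitive"
```

The first search should report only line 3, while the second also reports line 6. Note that "wonderful" on line 4 is not selected by either search, because it contains "ond" rather than "and".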


If you specify more than one file to search, each selected line is preceded by the name of the file in which it appears. In the following example, I searched my C source files for the string "x". Please consult Chapter 5, "The Linux Shells," for a description of the shell wildcard mechanism.

$ grep -w x *.c      ...search all files ending in ".c".
a.c:test (int x)
fact2.c:long factorial (x)
fact2.c:int x;
fact2.c:  if ((x == 1) || (x == 0))
fact2.c:    result = x * factorial (x-1);
$ grep -wl x *.c     ...list names of matching files.
a.c
fact2.c
$ _
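The -h option, which suppresses the file name prefix, can be compared with the default and with -l in a small experiment. This is only a sketch: the three files one.c, two.c, and three.c and their contents are invented for illustration and are not the author's source files.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL          # keep the order of the *.c expansion predictable
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir" || exit 1

# Three hypothetical source files; only two of them use the word x.
printf '%s\n' 'int x;' 'int y;' > one.c
printf '%s\n' 'int xx;' 'x = 1;' > two.c
printf '%s\n' 'int z;' > three.c

with_names=$(grep -w x *.c)      # default: each line is prefixed by its file name
without_names=$(grep -hw x *.c)  # -h: the same lines without the prefix
names_only=$(grep -lw x *.c)     # -l: just the names of files with a match

printf '%s\n' "$with_names" "--" "$without_names" "--" "$names_only"
```

Because of -w, the line "int xx;" in two.c is not selected, and three.c does not appear in any of the three results.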


fgrep, grep, and egrep all support the options that I've described so far. The difference between them is that each allows a different kind of text pattern to be matched (Figure 4-3).



Figure 4-3. The differences in the grep command family.

Utility    Kind of pattern that may be searched for
fgrep      Fixed string only.
grep       Regular expression.
egrep      Extended regular expression.


For information about regular expressions and extended regular expressions, consult the Appendix.
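The three kinds of pattern can be compared side by side. The sketch below makes two assumptions of its own: it uses grep -F and grep -E, which POSIX defines as equivalents of fgrep and egrep (recent GNU grep releases print an obsolescence warning for the older names), and it searches an invented file called samples rather than anything from the text.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Hypothetical sample data: five short lines that differ only slightly.
printf '%s\n' 'a.c' 'abc' 'a+c' 'aac' 'abcd' > "$workdir/samples"

# Print a multi-line result on a single line.
one_line() { printf '%s' "$1" | tr '\n' ' '; }

# Fixed string: "." is an ordinary character, so only the literal a.c matches.
fixed=$(grep -F 'a.c' "$workdir/samples")

# Regular expression: "." matches any single character.
basic=$(grep 'a.c' "$workdir/samples")

# In a basic regular expression "+" is an ordinary character...
basic_plus=$(grep 'a+c' "$workdir/samples")

# ...but in an extended regular expression it means "one or more".
extended_plus=$(grep -E 'a+c' "$workdir/samples")

# -x accepts a line only if the whole line equals the string.
substring=$(grep -F 'abc' "$workdir/samples")
whole_line=$(grep -F -x 'abc' "$workdir/samples")

printf 'grep -F    a.c : %s\n' "$(one_line "$fixed")"
printf 'grep       a.c : %s\n' "$(one_line "$basic")"
printf 'grep       a+c : %s\n' "$(one_line "$basic_plus")"
printf 'grep -E    a+c : %s\n' "$(one_line "$extended_plus")"
printf 'grep -F    abc : %s\n' "$(one_line "$substring")"
printf 'grep -F -x abc : %s\n' "$(one_line "$whole_line")"
```

The same pattern text, a+c, selects a different line depending on whether it is read as a basic or an extended regular expression, which is the distinction Figure 4-3 summarizes.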

To illustrate the use of grep and egrep regular expressions, here is a piece of text followed by the lines of text that would match various regular expressions. When using egrep or grep, place regular expressions inside single quotes to prevent interference from the shell. In the examples in Figures 4-4 and 4-5 the portion of each line of this example text that satisfies the regular expression is italicized.

Well you know it's your bedtime,
So turn off the light,
Say all your prayers and then,
Oh you sleepy young heads dream of wonderful things,
Beautiful mermaids will swim through the sea,
And you will be swimming there too.


4.2.1.1. Matching Patterns

Figure 4-4. Pattern matching in grep.

grep pattern    Lines that match
.nd             Say all your prayers and then,
                Oh you sleepy young heads dream of wonderful things,
                And you will be swimming there too.
^.nd            And you will be swimming there too.
sw.*ng          And you will be swimming there too.
[A-D]           Beautiful mermaids will swim through the sea,
                And you will be swimming there too.
\.              And you will be swimming there too. (the "." matches)
a.              Say all your prayers and then,
                Oh you sleepy young heads dream of wonderful things,
                Beautiful mermaids will swim through the sea,
a.$             Beautiful mermaids will swim through the sea,
[a-m]nd         Say all your prayers and then,
[^a-m]nd        Oh you sleepy young heads dream of wonderful things,
                And you will be swimming there too.
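The entries in Figure 4-4 can be reproduced with a short loop. Treat this as a sketch under a few assumptions of my own: the show helper and the scratch directory are hypothetical, and LC_ALL=C is set so that ranges such as [A-D] and [a-m] follow ASCII order, since other locales may interpret ranges differently. Each pattern is single-quoted so that the shell passes characters such as *, [, $, and \ to grep untouched.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/grepfile" <<'EOF'
Well you know it's your bedtime,
So turn off the light,
Say all your prayers and then,
Oh you sleepy young heads dream of wonderful things,
Beautiful mermaids will swim through the sea,
And you will be swimming there too.
EOF

# show PATTERN: print the pattern, then the numbered lines it selects.
show() {
  printf 'pattern: %s\n' "$1"
  grep -n -e "$1" "$workdir/grepfile"
}

# The patterns from Figure 4-4, in the same order.
for pattern in '.nd' '^.nd' 'sw.*ng' '[A-D]' '\.' 'a.' 'a.$' '[a-m]nd' '[^a-m]nd'
do
  show "$pattern"
done
```

The -e option simply marks the next argument as the pattern, and -n adds line numbers so that the output is easy to compare with the figure.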


Figure 4-5. Pattern matching in egrep.

egrep pattern   Lines that match
s.*w            Oh you sleepy young heads dream of wonderful things,
                Beautiful mermaids will swim through the sea,
                And you will be swimming there too.
s.+w            Oh you sleepy young heads dream of wonderful things,
                Beautiful mermaids will swim through the sea,
off|will        So turn off the light,
                Beautiful mermaids will swim through the sea,
                And you will be swimming there too.
im*ing          And you will be swimming there too.
im?ing          <no matches>

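The extended patterns in Figure 4-5 can be checked in the same way. In this sketch I use grep -E, the POSIX equivalent of egrep; the matching_lines helper and the scratch directory are hypothetical names. The final command uses -o, which prints only the matched portion of each line; it is supported by GNU and BSD grep but is not required by POSIX, so consider it optional.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/grepfile" <<'EOF'
Well you know it's your bedtime,
So turn off the light,
Say all your prayers and then,
Oh you sleepy young heads dream of wonderful things,
Beautiful mermaids will swim through the sea,
And you will be swimming there too.
EOF

# matching_lines PATTERN: numbered lines selected by an extended regular expression.
matching_lines() { grep -E -n -e "$1" "$workdir/grepfile"; }

star=$(matching_lines 's.*w')        # zero or more characters between s and w
plus=$(matching_lines 's.+w')        # at least one character between s and w
either=$(matching_lines 'off|will')  # either alternative may match
many_m=$(matching_lines 'im*ing')    # i, any number of m's, then ing
optional_m=$(matching_lines 'im?ing') # i, at most one m, then ing

printf 's.*w:\n%s\n' "$star"
printf 's.+w:\n%s\n' "$plus"
printf 'off|will:\n%s\n' "$either"
printf 'im*ing:\n%s\n' "$many_m"
printf 'im?ing:\n%s\n' "${optional_m:-<no matches>}"

# Optional extension: print only the text that matched, not the whole line.
grep -E -o -e 'im*ing' "$workdir/grepfile"
```

Line 6 illustrates the difference between * and +: in "swimming" the w follows the s immediately, which satisfies s.*w but not s.+w. Likewise, "swimming" has two m's, so im*ing matches and im?ing does not.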



4.2.2. Removing Duplicate Lines: uniq

The uniq utility displays a file with all of its identical adjacent lines replaced by a single occurrence of the repeated line (Figure 4-6).

Figure 4-6. Description of the uniq command.

Utility: uniq -c -number [ inputfile [ outputfile ] ]

uniq is a utility that displays its input file with all adjacent repeated lines collapsed to a single occurrence of the repeated line. If an input file is not specified, standard input is read. The -c option causes each line to be preceded by the number of occurrences that were found. If number is specified, then number fields of each line are ignored.


Here's an example:

$ cat animals              ...look at the test file.
cat snake
monkey snake
dolphin elephant
dolphin elephant
goat elephant
pig pig
pig pig
monkey pig
$ uniq animals             ...filter out duplicate adjacent lines.
cat snake
monkey snake
dolphin elephant
goat elephant
pig pig
monkey pig
$ uniq -c animals          ...display a count with the lines.
   1 cat snake
   1 monkey snake
   2 dolphin elephant
   1 goat elephant
   2 pig pig
   1 monkey pig
$ uniq -1 animals          ...ignore first field of each line.
cat snake
dolphin elephant
pig pig
$ _
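Two points about uniq are worth trying out. First, POSIX spells the "skip fields" option as -f number, and some implementations may not accept the older -number form shown in Figure 4-6. Second, because uniq compares only adjacent lines, a repeated line that is separated from its twin survives unless the input is sorted first. The sketch below assumes single spaces between fields; the scratch directory and the use of cut to extract the first field are my own additions.

```shell
#!/bin/sh
LC_ALL=C; export LC_ALL
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/animals" <<'EOF'
cat snake
monkey snake
dolphin elephant
dolphin elephant
goat elephant
pig pig
pig pig
monkey pig
EOF

# -f 1 skips the first field when comparing adjacent lines.
skip_first=$(uniq -f 1 "$workdir/animals")

# Take only the first field of each line: "monkey" now occurs twice,
# but the two occurrences are not adjacent, so uniq keeps both.
adjacent_only=$(cut -d ' ' -f 1 "$workdir/animals" | uniq)

# Sorting first brings equal lines together, so every duplicate collapses.
sorted_first=$(cut -d ' ' -f 1 "$workdir/animals" | sort | uniq)

printf 'uniq -f 1:\n%s\n' "$skip_first"
printf 'uniq alone:\n%s\n' "$adjacent_only"
printf 'sort | uniq:\n%s\n' "$sorted_first"
```

In the second result "monkey" appears twice, while in the third it appears once.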





Linux for Programmers and Users
ISBN: 0131857487
Year: 2007
Pages: 339