Among the most commonly used tools in the UNIX System are those for finding words in files, especially grep, fgrep, and egrep. These commands search for text that matches a target or pattern that you specify. You can use them to extract information from files, to search the output of a command for lines relating to a particular item, and to locate files containing a particular keyword.
The three commands in the grep family are very similar. All of them print lines matching a target. They differ, however, in how you specify the search targets.
grep is the most commonly used of the three commands. It lets you search for a target which may be one or more words or patterns containing wildcards and other regular expression elements.
fgrep (fixed grep) does not allow regular expressions but does allow you to search for multiple targets.
egrep (extended grep) takes a richer set of regular expressions, as well as allowing multiple target searches, and is considerably faster than grep.
The grep command searches through one or more files for lines containing a target and then prints all of the matching lines it finds. For example, the following command prints all lines in the file mtg_note that contain the word “room”:
$ grep room mtg_note
will be held at 2:00 in room 1J303. We will discuss
Note that you specify the target as the first argument and follow it with the names of the files to search. Think of the command as “search for target in file.”
The target can be a phrase, that is, two or more words separated by spaces. If the target contains spaces, however, you have to enclose it in quotes to prevent the shell from treating the different words as separate arguments. The following searches for lines containing the phrase “boxing wizards” in the file pangrams:
$ grep "boxing wizards" pangrams
The five boxing wizards jump quickly.
Note that if the words “boxing” and “wizards” appear on different lines (separated by a newline character), grep will not find them, because it looks at only one line at a time.
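The following is a minimal sketch of that line-at-a-time behavior. The contents written to pangrams here are made up for illustration, and the file lives in a temporary directory so that nothing in your own files is touched.

```shell
# Hypothetical sample data: the phrase is intact on line 1
# and split across lines 2 and 3.
dir=$(mktemp -d)
printf '%s\n' \
  'The five boxing wizards jump quickly.' \
  'Five more boxing' \
  'wizards arrived late.' > "$dir/pangrams"

# The quotes make the shell pass "boxing wizards" to grep as one argument.
matches=$(grep "boxing wizards" "$dir/pangrams")
printf '%s\n' "$matches"

rm -r "$dir"
```

Only the first line is printed. The second and third lines each contain one of the two words, but neither contains the whole phrase.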
If you give grep two or more files to search, it includes the name of the file before each line of output. For example, the following command searches for lines containing the string “vacation” in all of the files in the current directory:
$ grep vacation *
mbox: I'll be gone on vacation July 24–28, but we could meet
mbox: so, the only week when we're all available for a vacation
savemail: sounds like a great idea for a vacation. I'd love
The output lists the names of the two files that contain the target word “vacation” (mbox and savemail) and the line(s) containing the target in each file.
You can use this feature to locate a file when you have forgotten its name but remember a keyword that would identify it. For example, if you keep copies of your saved e-mail in a particular directory, you can use grep to find the one dealing with a particular subject by searching for a word or phrase that you know is contained in it. The following command shows how you can use grep to find a message from someone named Dan:
$ grep Dan *
savemail27: From: Dan N <dnidz>
savemail43: well, sure. Dancing is pretty good exercise, so I
This shows you that the letter you were looking for is in the file savemail27.
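As a rough sketch of searching several files at once, the following builds a small directory of hypothetical saved messages and searches all of them. The file savemail51 and the message bodies are invented for this illustration.

```shell
# Hypothetical mail directory with three saved messages.
dir=$(mktemp -d)
printf '%s\n' 'From: Dan N <dnidz>' 'Lunch on Friday?' > "$dir/savemail27"
printf '%s\n' 'well, sure. Dancing is pretty good exercise, so I' > "$dir/savemail43"
printf '%s\n' 'Nothing relevant here.' > "$dir/savemail51"

# Run grep from inside the directory so that * expands to bare filenames.
out=$(cd "$dir" && grep Dan *)
printf '%s\n' "$out"

rm -r "$dir"
```

Each matching line is prefixed with the name of the file it came from, and savemail51 does not appear because it contains no match. Note that “Dancing” matches too, since grep looks for the string rather than the whole word.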
The examples so far have used grep to search for specific words or strings of text, but grep also allows you to search for patterns that may match a number of different words or strings. The patterns for grep can be the same kinds of regular expressions that were described in Chapter 5. For example,
$ grep 'ch.*se' recipes
will find entries containing “chinese” or “cheese”, or in fact any line that has a ch sometime before an se, including something like “Blanch for 45 seconds”.
In the preceding pattern, the dot (.) matches any character other than newline. The asterisk says that those characters may be repeated any number of times. Together, .* indicates any string of any characters. Note that in this example the target pattern “ch.*se” is enclosed in single quotation marks. This prevents the asterisk from being treated by the shell as a filename wildcard. In general, you need to use quotes around any regular expression containing a character that has special meaning for the shell. (Filename wildcards and other special shell symbols are discussed in Chapter 4.)
Other regular expression symbols that are often useful in specifying targets for grep include the caret (^) and dollar sign ($), which are used to anchor words to the beginning and end of lines, and brackets ([ ]), which are used to indicate a class of characters. The following example shows how these can be used to specify patterns as targets:
$ grep '^Section [1-9]$' manuscript
This command finds all lines that contain just “Section n”, where n is a number from 1 to 9, in the file manuscript. The caret at the beginning and the dollar sign at the end indicate that the pattern must match the whole line. The brackets indicate that the target can include any one of the numbers from 1 to 9.
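To see what the anchors contribute, this sketch runs the pattern with and without them against a made-up manuscript file. The four sample lines are assumptions chosen to show the difference.

```shell
# Hypothetical manuscript with section headings and running text.
dir=$(mktemp -d)
printf '%s\n' \
  'Section 1' \
  'See Section 2 for details' \
  'Section 10' \
  'Section 7' > "$dir/manuscript"

# Anchored: the whole line must be "Section" followed by one digit from 1 to 9.
whole=$(grep '^Section [1-9]$' "$dir/manuscript")

# Unanchored: the same text may appear anywhere in the line.
loose=$(grep 'Section [1-9]' "$dir/manuscript")

printf 'anchored:\n%s\n\nunanchored:\n%s\n' "$whole" "$loose"
rm -r "$dir"
```

The anchored pattern prints only “Section 1” and “Section 7”. Without the anchors all four lines match, including “Section 10”, because it contains “Section 1”.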
Table 19–1 lists regular expression symbols that are useful in forming grep search patterns.
Symbol | Definition | Example | Matches |
---|---|---|---|
. | Matches any single character. | th.nk | think, thank, thunk, etc. |
\ | Quotes the following character. | script\.py | script.py |
* | Matches zero or more repetitions of the previous item. | ap*le | ale, apple, etc. |
[ ] | Matches any one of the characters inside. | [QqXx] | Q, q, X, or x |
[a-z] | Matches any one of the characters in the range. | [0-9]* | any number: 0110, 27, 9876, etc. |
^ | Matches the beginning of a line. | ^If | any line beginning with If |
$ | Matches the end of a line. | \.$ | any line ending in a period |
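The following sketch tries a few of these symbols on a small, invented word list. It is meant only to show how the dot, backslash, asterisk, and dollar sign interact.

```shell
# Hypothetical word list, one entry per line.
dir=$(mktemp -d)
printf '%s\n' 'ale' 'apple' 'ankle' 'script.py' 'scriptxpy' 'It ended.' > "$dir/words"

# p* allows zero or more p's between "a" and "le".
star=$(grep '^ap*le$' "$dir/words")

# An unquoted dot matches any character, so both script lines match.
dot=$(grep 'script.py' "$dir/words")

# A backslash makes the dot literal, so only script.py matches.
esc=$(grep 'script\.py' "$dir/words")

# \.$ matches lines whose last character is a period.
period=$(grep '\.$' "$dir/words")

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$star" "$dot" "$esc" "$period"
rm -r "$dir"
```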
Normally, grep distinguishes between uppercase and lowercase. For example, the following command would find “Unix” but not “UNIX” or “unix”:
$ grep Unix notes
You can use the -i (ignore case) option to find all lines containing a target regardless of uppercase and lowercase distinctions. This command finds all occurrences of the word “unix” regardless of capitalization:
$ grep -i unix notes
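Here is a small sketch comparing the two commands. The contents of notes are assumed for the sake of the example.

```shell
# Hypothetical notes file with three capitalizations of the word.
dir=$(mktemp -d)
printf '%s\n' \
  'Unix is a family of systems.' \
  'The UNIX System' \
  'unix-like tools' \
  'Linux notes' > "$dir/notes"

# Case-sensitive search: only the exact spelling "Unix" matches.
exact=$(grep Unix "$dir/notes")

# Case-insensitive search: Unix, UNIX, and unix all match.
anycase=$(grep -i unix "$dir/notes")

printf 'exact:\n%s\n\nignore case:\n%s\n' "$exact" "$anycase"
rm -r "$dir"
```

The line “Linux notes” is not printed in either case, because it does not contain the letters u-n-i-x in that order.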
The -r option causes grep to recursively search files in all the subdirectories of the current directory.
$ grep -r "\.p[ly]" *
PerlScripts/quickmail.pl: # usage: quickmail.pl recipient subject contents
PythonScripts/zwrite.py: # usage: zwrite.py username
The backslash (\) prevents the dot (.) from being treated as a regular expression character; it represents a literal period here, so grep searches for lines containing “.pl” or “.py”. Be careful: if the directory contains many subdirectories with many files in them, it can take a very long time for a command like this to complete.
Another useful grep option, -n, prints the line number of each line on which the target (here, while) is found. For example,
$ grep -n while perlsample.pl
4: while (<>){
11: while ($n > -0) {
One of the common uses of grep is to find which of several files in a directory deals with a particular topic. If all you want is to identify the files that contain a particular word or pattern, there is no need to print out the matching lines. With the -l (list) option, grep suppresses the printing of matching lines and just prints the names of files that contain the target. The following example lists all files in the current directory that include the word “Duckpond”:
$ grep -l Duckpond *
about.html
index.html
report.cgi
You can use this option with the shell command substitution feature described in Chapter 4 to use these filenames as arguments to another UNIX System command. For example, the following command will use more to display the contents of all the files found by grep. Note that the grep command is enclosed in backquotes (`), not single quotation marks:
$ more `grep -l Duckpond *`
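The sketch below shows the same idea in a form that can run unattended. It substitutes cat for more, since more is interactive, and the three files and their contents are hypothetical.

```shell
# Hypothetical web directory: two files mention Duckpond, one does not.
dir=$(mktemp -d)
printf '%s\n' 'Welcome to Duckpond' > "$dir/about.html"
printf '%s\n' 'Duckpond home' > "$dir/index.html"
printf '%s\n' 'nothing to see here' > "$dir/notes.txt"

# -l prints only the names of the files that contain a match.
files=$(cd "$dir" && grep -l Duckpond *)
printf '%s\n' "$files"

# Command substitution with backquotes passes those names to cat.
combined=$(cd "$dir" && cat `grep -l Duckpond *`)
printf '%s\n' "$combined"

rm -r "$dir"
```

One caveat with this technique: the shell splits the substituted list on whitespace, so it works reliably only when the filenames contain no spaces.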
By default, grep finds all lines that match the target pattern. Sometimes, though, it is useful to find the lines that do not match a particular pattern. You can do this with the -v option, which tells grep to print all lines that do not contain the specified target. This provides a quick way to find entries in a file that are missing a required piece of information. For example, suppose the file phonenums contains your personal phone book. The following command will print all lines in phonenums that do not contain numbers:
$ grep -v '[0-9]' phonenums
The -v option can also be useful for removing unwanted information from the output of another command. Chapter 3 described the file command and showed how you can use it to get a short description of the type of information contained in a file. Because the file command includes the word “directory” in its output for directories, you could list all files in the current directory that are not directories by piping the output of file to grep -v, as shown in the following example:
$ file * | grep -v directory
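As an illustration of the phone book case, this sketch uses -v to pick out entries that have no digits in them. The entries are invented for the example.

```shell
# Hypothetical phone book: two entries are missing a number.
dir=$(mktemp -d)
printf '%s\n' \
  'saul 555-1122' \
  'michelle' \
  'anita 555-6677' \
  'robin (ask for number)' > "$dir/phonenums"

# -v inverts the match: print the lines that contain no digit at all.
missing=$(grep -v '[0-9]' "$dir/phonenums")
printf '%s\n' "$missing"

rm -r "$dir"
```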
The fgrep command is similar to grep, but with three main differences: You can use it to search for several targets at once, it does not allow you to use regular expressions to search for patterns, and it is faster than grep. When you need to search many files or a very large file, the difference in speed can be significant.
With fgrep, you can search for lines containing any one of several targets. For example, the following command finds all entries in the phone_nums file that contain any of the words “saul”, “michelle”, or “anita”:
$ fgrep "saul
> michelle
> anita" phone_nums
The output might look like this:
saul 555-1122
saul (home) 555-1100
michelle 555-3344
anita 555-6677
When you give fgrep multiple search targets, each one must be on a separate line, and the entire search string must be in quotation marks. In this example, if you didn’t put michelle on a separate line you would be searching for saul michelle, and if you left out the quotes, the command would execute as soon as you hit ENTER.
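The following sketch shows both properties of fixed-string searching: several targets at once, and no special meaning for regular expression characters. It uses grep -F, the POSIX-standard equivalent of fgrep, on the assumption that your system may have a recent GNU grep, which prints a warning when invoked as fgrep. The sample data is made up.

```shell
# Hypothetical phone list.
dir=$(mktemp -d)
printf '%s\n' \
  'saul 555-1122' \
  'saul (home) 555-1100' \
  'michelle 555-3344' \
  'anita 555-6677' \
  'robin 555-9900' > "$dir/phone_nums"

# Three targets, one per line, inside a single quoted argument.
found=$(grep -F "saul
michelle
anita" "$dir/phone_nums")
printf '%s\n' "$found"

# With -F the dot is an ordinary character, so only "a.c" matches.
literal=$(printf '%s\n' 'a.c' 'abc' | grep -F 'a.c')

# Without -F the dot matches any character, so both lines match.
pattern=$(printf '%s\n' 'a.c' 'abc' | grep 'a.c')

printf 'fixed: %s\n' "$literal"
rm -r "$dir"
```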
With the -f (file) option, you can tell fgrep to take the search targets from a file, rather than having to enter them directly. If you had a file in your home directory named .friends containing the usernames of your friends on the system, you could use fgrep to search the output of the finger command for the names on your list, like this:
$ finger | fgrep -f ~/.friends
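Because finger output depends on who is logged in, the sketch below uses a hypothetical file named logins to stand in for it, and a file named friends in a temporary directory in place of ~/.friends. As before, grep -F is used as the standard spelling of fgrep.

```shell
# Hypothetical target list, one username per line.
dir=$(mktemp -d)
printf '%s\n' 'saul' 'anita' > "$dir/friends"

# Stand-in for the output of finger (invented for this example).
printf '%s\n' 'saul   pts/0' 'root   console' 'anita  pts/3' > "$dir/logins"

# -f reads the targets from the file; the text to search arrives on
# standard input, just as it would from a pipe.
online=$(grep -F -f "$dir/friends" < "$dir/logins")
printf '%s\n' "$online"

rm -r "$dir"
```

When building a target file, avoid blank lines: an empty target matches every line of input.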
The egrep command is the most powerful member of the grep command family. You can use it like fgrep to search for multiple targets, and it provides a larger set of regular expressions than grep. In fact, if you find yourself using the extended features of egrep often, you may want to add an alias that replaces grep with egrep in your shell configuration file. (For example, if you are using bash, you could add the line “alias grep=egrep” to your .bashrc.)
You can tell egrep to search for several targets in two ways: by putting them on separate lines as in fgrep, or by separating them with the vertical bar or pipe symbol (|). For example, the following command uses the pipe symbol to tell egrep to search for the words dan, robin, ben, and mari in the file phone_list:
$ egrep "dan|robin|ben|mari" phone_list
dan dnidz x1234
robin rpelc x3141
ben bsquared x9876
marissa mbaskett x2718
Note that there are no spaces between the pipe symbol and the targets. If there were, egrep would consider the spaces part of the target string. Also note the use of quotation marks to prevent the shell from interpreting the pipe symbol as an instruction to create a pipeline.
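The effect of stray spaces can be seen in this sketch. It uses grep -E, the POSIX-standard equivalent of egrep, and a phone_list with one extra hypothetical entry (carol) that should not match.

```shell
# Hypothetical phone list.
dir=$(mktemp -d)
printf '%s\n' \
  'dan dnidz x1234' \
  'robin rpelc x3141' \
  'ben bsquared x9876' \
  'marissa mbaskett x2718' \
  'carol cjones x1111' > "$dir/phone_list"

# Four alternatives, no spaces around the pipe symbols.
alt=$(grep -E "dan|robin|ben|mari" "$dir/phone_list")

# With spaces, the targets become "dan " and " robin".
spaced=$(grep -E "dan | robin" "$dir/phone_list")

printf 'no spaces:\n%s\n\nwith spaces:\n%s\n' "$alt" "$spaced"
rm -r "$dir"
```

The second search finds only the dan entry. The robin entry is missed because “robin” is at the start of its line, with no space in front of it.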
Table 19–2 summarizes the egrep extensions to the grep regular expression symbols.
Symbol | Definition | Example | Matches |
---|---|---|---|
+ | Matches one or more repetitions of the previous item. | .+ | any non-empty line |
? | Matches the previous item zero or one times. | index\.html? | index.htm, index.html |
( ) | Groups a portion of the pattern. | script(\.pl)? | script, script.pl |
\| | Matches either the value before or after the \|. | (E\|e)xit | Exit, exit |
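The examples in the table can be tried out with a sketch like the following. It uses grep -E in place of egrep and anchors each pattern with ^ and $ so that only whole-line matches are reported. The list of names is invented.

```shell
# Hypothetical list of names, one per line.
dir=$(mktemp -d)
printf '%s\n' \
  'index.htm' 'index.html' 'index.htmll' \
  'script' 'script.pl' 'script.py' \
  'Exit' 'exit' 'quit' > "$dir/names"

# l? makes the final "l" optional.
html=$(grep -E '^index\.html?$' "$dir/names")

# (\.pl)? makes the whole ".pl" group optional.
opt=$(grep -E '^script(\.pl)?$' "$dir/names")

# (E|e) accepts either capitalization of the first letter.
either=$(grep -E '^(E|e)xit$' "$dir/names")

printf '%s\n---\n%s\n---\n%s\n' "$html" "$opt" "$either"
rm -r "$dir"
```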
The egrep command provides most of the basic options of both grep and fgrep. You can tell it to ignore uppercase and lowercase distinctions (-i), search recursively through subdirectories (-r), print the line number of each match (-n), print only the names of files containing target lines (-l), print lines that do not contain the target (-v), and take the list of targets from a file (-f).