Archive files are good mechanisms to reduce the complexity of
manipulating large
Grep is a scan utility that originated with Unix systems, but is now implemented in many environments, including Qshell. The grep utility looks for files that contain character strings. Although grep is typically used with text files, it may also be used with binary files.
Anyone who deals with the Integrated File System should learn to use grep, because the Find String Using PDM command (FNDSTRPDM) won't work with IFS files. Even if you don't deal with the IFS, you still might want to use grep because it works with source physical files and has more powerful search capabilities than FNDSTRPDM.
Different sources give different versions of the origin of the
Here is the syntax of grep :
grep [options] regular-expression [input-files]
The
grep print *.java
By default, the search is case-sensitive, so this grep command will not find Print, PRINT, pRINT, or any other combination of cases.
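The following is a minimal sketch of that behavior, assuming a POSIX-style shell and grep. The sample lines and variable names are hypothetical, and the -i option (ignore case) is the same one used in the ls pipeline example later in this section.

```shell
# Hypothetical sample input; the three lines differ only in case.
sample=$(printf '%s\n' 'print' 'Print' 'PRINT')

# Default search is case-sensitive: only the all-lowercase line matches.
exact=$(printf '%s\n' "$sample" | grep print)

# With -i, case is ignored and every line matches.
anycase=$(printf '%s\n' "$sample" | grep -i print)

echo "case-sensitive search found: $exact"
echo "case-insensitive search found:"
printf '%s\n' "$anycase"
```

Because no input file is named, grep reads the lines from stdin.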
Grep would be useful even if this were the only kind of search it could perform, but grep can do much more, because it accepts regular expressions built from the following metacharacters.
| Metacharacter | Description |
|---|---|
| . (period) | Match any character except end-of-line. |
| * | Match zero or more occurrences of the preceding pattern. |
| ^ | Match from the beginning of the line. |
| $ | Match from the end of the line. |
| [ ] | Match any character within the brackets. Ranges may be specified with a hyphen. |
| [^ ] | Negates the groups or ranges of characters in the brackets. The caret must be the first character within the brackets. |
| \{m\} | Match exactly m occurrences of the preceding pattern. |
| \{m,\} | Match m or more occurrences of the preceding pattern. |
| \{m,n\} | Match m to n occurrences of the preceding pattern. |
| \ | Escape the following character so that it is treated literally rather than as a metacharacter. |
| \(\) | Define a back reference to save matched characters as a pattern. The matched pattern can be referred to with a backslash followed by a number later in the expression. |
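
As a rough illustration of the period, asterisk, and caret metacharacters, here is a small sketch. The word list and variable names are hypothetical, and the commands assume a POSIX-style grep reading from stdin.

```shell
# Hypothetical sample data, one word per line.
words=$(printf '%s\n' 'cat' 'cot' 'coat' 'ct' 'scat')

# . matches exactly one character: c, any single character, t.
one_char=$(printf '%s\n' "$words" | grep '^c.t$')       # cat, cot

# * matches zero or more of the preceding pattern: c, any number of o's, t.
zero_or_more=$(printf '%s\n' "$words" | grep '^co*t$')  # cot, ct

# ^ anchors the match at the start of the line, so "scat" is excluded.
starts_c=$(printf '%s\n' "$words" | grep '^c')          # cat, cot, coat, ct

printf '%s\n' "$one_char" "--" "$zero_or_more" "--" "$starts_c"
```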
You may also use certain symbolic names in place of characters. These are shown in Table 17.2.
| Symbol | Description |
|---|---|
| [[:alpha:]] | Any letter in either case |
| [[:upper:]] | Any uppercase letter |
| [[:lower:]] | Any lowercase letter |
| [[:digit:]] | Any decimal digit |
| [[:xdigit:]] | Any hexadecimal digit, where A-F may be upper or lowercase |
| [[:alnum:]] | Any letter or decimal digit |
| [[:space:]] | Any white-space character, such as a space or tab |
| [[:blank:]] | Any space or tab character |
| [[:punct:]] | Any punctuation mark |
| [[:cntrl:]] | Any control character |
| [[:print:]] | Any printable character, including the space |
| [[:graph:]] | Any printable character other than the space (that is, a letter, digit, or punctuation mark) |
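
To see how some of these classes differ, consider the sketch below. The three input lines and the variable names are hypothetical, and the results assume a POSIX-style grep in an ASCII-compatible locale.

```shell
# Hypothetical inputs: one line with a space, one with a punctuation mark,
# and one with only letters and a digit.
lines=$(printf '%s\n' 'a b' 'a-b' 'ab1')

# Lines made up entirely of letters and digits.
alnum_only=$(printf '%s\n' "$lines" | grep '^[[:alnum:]]*$')   # ab1

# [[:graph:]] accepts letters, digits, and punctuation, but not the space.
graph_only=$(printf '%s\n' "$lines" | grep '^[[:graph:]]*$')   # a-b, ab1

# [[:print:]] also accepts the space, so all three lines qualify.
print_only=$(printf '%s\n' "$lines" | grep '^[[:print:]]*$')

printf '%s\n' "$alnum_only" "--" "$graph_only" "--" "$print_only"
```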
To
cat goodoleboys.txt
Name Born Phone Dog Wife Shotgun Paid
========= ======== ======== ======== ========= ======= =====
Chuck Dec 25 444-2345 Blue Mary Sue 12 .50
Bubba Oct 13 444-1111 Buck Mary Jean 12
Billy Bob June 11 444-4340 Leotis Lisa Sue 12
Amos Jan 4 333-1119 Amos Abigail 20
Otis Sept 17 444-8000 Ol' Sal Sally 12
Claude May 31 333-4340 Blue Etheline 12
Roscoe Feb 2 444-2234 Rover Alice Jean 410
Arlis June 19 444-1314 Redeye Suzy Beth 12 .75
Junior April 30 BR-549 Percival Lilly Faye 12
Bill Feb 29 333-4444 Daisy Daisy 20
Ernest T. ?? none none none none
The first example simply finds all lines that begin with uppercase C:
grep ^C goodoleboys.txt
Chuck Dec 25 444-2345 Blue Mary Sue 12 .50
Claude May 31 333-4340 Blue Etheline 12
Figure 17.2 is a slightly more complex example. It finds all lines that end with a zero. The first form of grep shown in the figure is used with files that are delimited with a single linefeed character, as is typical of Unix files. The second form is for files that are delimited with a combination of carriage-return and linefeed characters.
grep '0$' goodoleboys.txt
grep '0[[:cntrl:]]$' goodoleboys.txt
Chuck Dec 25 444-2345 Blue Mary Sue 12 .50
Amos Jan 4 333-1119 Amos Abigail 20
Roscoe Feb 2 444-2234 Rover Alice Jean 410
Bill Feb 29 333-4444 Daisy Daisy 20
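
The difference between the two forms can be checked with a sketch like the one below. The scratch directory, file names, and file contents are hypothetical, and the -c option (print a count of matching lines instead of the lines themselves) is assumed to be available, as it is in POSIX grep.

```shell
# Build two hypothetical one-line files in a scratch directory: one ends its
# line with a linefeed only, the other with carriage-return plus linefeed.
workdir="${TMPDIR:-/tmp}/grepeol.$$"
mkdir -p "$workdir"
printf 'Paid 20\n'   > "$workdir/unix.txt"
printf 'Paid 20\r\n' > "$workdir/crlf.txt"

# '0$' matches only when the zero is the very last character before the linefeed.
lf_plain=$(grep -c '0$' "$workdir/unix.txt" || true)
crlf_plain=$(grep -c '0$' "$workdir/crlf.txt" || true)

# Allowing one control character (the carriage return) before the end of the
# line makes the pattern match the second file.
crlf_cntrl=$(grep -c '0[[:cntrl:]]$' "$workdir/crlf.txt" || true)

echo "linefeed file, '0\$': $lf_plain"
echo "CR/LF file, '0\$': $crlf_plain"
echo "CR/LF file, '0[[:cntrl:]]\$': $crlf_cntrl"

rm -rf "$workdir"
```

The `|| true` guards are there only because grep returns a nonzero exit status when it finds no matching lines.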
The following grep command looks for lines that contain a dollar sign followed by any two characters and a period:
grep '\$..\.' goodoleboys.txt
Arlis June 19 444-1314 Redeye Suzy Beth 12 .75
The first two periods function as metacharacters. The dollar sign and last period do not because they are preceded with backslashes.
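Here is a small sketch of the same escaping technique, run against hypothetical amounts rather than the sample file. The variable names are illustrative, and a POSIX-style grep is assumed.

```shell
# Hypothetical amounts, one per line. Single quotes keep the shell from
# treating $10 and $2 as parameters.
amounts=$(printf '%s\n' 'fee $10.75' 'fee $2.50' 'fee 10.75')

# \$ is a literal dollar sign, each unescaped . matches any one character,
# and \. is a literal period.
two_char=$(printf '%s\n' "$amounts" | grep '\$..\.')

echo "$two_char"
```

Only the first line qualifies: the second has just one character between the dollar sign and the period, and the third has no dollar sign.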
These two grep commands find lines that contain a single quote:
grep "'" goodoleboys.txt
grep \' goodoleboys.txt
Otis Sept 17 444-8000 Ol' Sal Sally 12
The first line shows that double quotes can "escape" single quotes.
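Both quoting styles can be tried without the sample file. The sketch below feeds grep two lines taken from goodoleboys.txt through stdin; the variable names are illustrative.

```shell
# Two lines from the sample file; only the first contains a single quote.
data=$(printf '%s\n' \
  "Otis Sept 17 444-8000 Ol' Sal Sally 12" \
  "Amos Jan 4 333-1119 Amos Abigail 20")

# Double quotes protect the single quote from the shell.
with_double=$(printf '%s\n' "$data" | grep "'")

# A backslash does the same job for a single character.
with_backslash=$(printf '%s\n' "$data" | grep \')

echo "$with_double"
echo "$with_backslash"
```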
This command finds lines with three zeros together:
grep '0\{3\}' goodoleboys.txt
Otis Sept 17 444-8000 Ol' Sal Sally 12
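The three interval forms can be compared side by side. This is only a sketch: the phone numbers are copied from the sample file, the variable names are illustrative, and a POSIX-style grep is assumed.

```shell
# Phone numbers copied from the sample file, one per line.
phones=$(printf '%s\n' '444-8000' '444-4340' '333-4444' 'BR-549')

# \{3\}: exactly three zeros in a row.
three_zeros=$(printf '%s\n' "$phones" | grep '0\{3\}')          # 444-8000

# \{4,\}: four or more fours in a row.
four_fours=$(printf '%s\n' "$phones" | grep '4\{4,\}')          # 333-4444

# \{1,2\}: one or two uppercase letters at the start, then a hyphen.
letter_prefix=$(printf '%s\n' "$phones" | grep '^[A-Z]\{1,2\}-')  # BR-549

printf '%s\n' "$three_zeros" "$four_fours" "$letter_prefix"
```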
Figure 17.3 expands on the previous examples to find lines where the same uppercase letter followed by a lowercase letter is repeated. The \( and \) pair indicates that the match is to be saved as a pattern, which can be referred to as \1. If other patterns were saved, they would be referred to as \2, \3, etc. These expressions are known as back references.
grep '\([A-Z][a-z]\).*\1' goodoleboys.txt
Bubba Oct 13 444-1111 Buck Mary Jean 12
Amos Jan 4 333-1119 Amos Abigail 20
Otis Sept 17 444-8000 Ol' Sal Sally 12
Roscoe Feb 2 444-2234 Rover Alice Jean 410
Bill Feb 29 333-4444 Daisy Daisy 20
In the first line returned, the pattern Bu is found twice. In the second line, Am is repeated. In the third line, Sa is repeated. In the fourth and fifth lines, Ro and Da are repeated.
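A back reference is easiest to understand by looking at a line it rejects as well as lines it accepts. The sketch below uses three lines from the sample file; the variable names are illustrative, and a POSIX-style grep is assumed.

```shell
# Three lines from the sample file. The Claude line has no repeated
# uppercase-lowercase pair (Cl, Ma, Bl, and Et each occur once).
data=$(printf '%s\n' \
  'Bubba Oct 13 444-1111 Buck Mary Jean 12' \
  'Claude May 31 333-4340 Blue Etheline 12' \
  'Bill Feb 29 333-4444 Daisy Daisy 20')

# \( \) saves the two-character match; \1 requires the same two characters
# to appear again later in the line.
repeated=$(printf '%s\n' "$data" | grep '\([A-Z][a-z]\).*\1')

printf '%s\n' "$repeated"
```

The Bubba line matches on Bu and the Bill line on Da, while the Claude line is dropped.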
Figure 17.4 illustrates the use of grep metacharacters that have to do with including and excluding characters in a search.
grep '^[CR]' goodoleboys.txt
Chuck Dec 25 444-2345 Blue Mary Sue 12 .50
Claude May 31 333-4340 Blue Etheline 12
Roscoe Feb 2 444-2234 Rover Alice Jean 410
grep '^[C-R]' goodoleboys.txt
Name Born Phone Dog Wife Shotgun Paid
Chuck Dec 25 444-2345 Blue Mary Sue 12 .50
Otis Sept 17 444-8000 Ol' Sal Sally 12
Claude May 31 333-4340 Blue Etheline 12
Roscoe Feb 2 444-2234 Rover Alice Jean 410
Junior April 30 BR-549 Percival Lilly Faye 12
Ernest T. ?? none none none none
grep '^[^p-r]' goodoleboys.txt
Amos Jan 4 333-1119 Amos Abigail 20
Roscoe Feb 2 444-2234 Rover Alice Jean 410
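
The three bracket forms (a list, a range, and a negated list) can be contrasted on a shorter input. This sketch uses four names from the sample file; the variable names are illustrative, and the range result assumes a locale in which uppercase letters collate in ASCII order.

```shell
# Four names from the sample file, one per line.
names=$(printf '%s\n' 'Chuck' 'Bubba' 'Roscoe' 'Otis')

# A list: lines that begin with C or with R.
c_or_r=$(printf '%s\n' "$names" | grep '^[CR]')        # Chuck, Roscoe

# A range: lines that begin with any letter from C through R.
c_through_r=$(printf '%s\n' "$names" | grep '^[C-R]')  # Chuck, Roscoe, Otis

# A negated list: lines that begin with anything except C or R.
not_c_or_r=$(printf '%s\n' "$names" | grep '^[^CR]')   # Bubba, Otis

printf '%s\n' "$c_or_r" "--" "$c_through_r" "--" "$not_c_or_r"
```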
The three grep commands in Figure 17.5 use symbolic names. The first command finds lines with an alphabetic character, in either case, followed by a hyphen. The second command finds lines with white space followed by exactly three digits and more white space. The last command finds lines where a letter is followed by a punctuation mark.
grep '[[:alpha:]]-' goodoleboys.txt
Junior April 30 BR-549 Percival Lilly Faye 12
grep '[[:space:]][[:digit:]]\{3\}[[:space:]]' goodoleboys.txt
Roscoe Feb 2 444-2234 Rover Alice Jean 410
grep '[[:alpha:]][[:punct:]]' goodoleboys.txt
Otis Sept 17 444-8000 Ol' Sal Sally 12
Junior April 30 BR-549 Percival Lilly Faye 12
Ernest T. ?? none none none none
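Figures 17.6 and 17.7 combine metacharacters and symbolic names to perform complex searches. Figure 17.6 finds lines whose first non-blank token is a word of five to seven letters. Figure 17.7 finds lines in which a zero is followed by a printable character, and then lines in which a zero is followed by a control character.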
grep '^[[:space:]]*[[:alpha:]]\{5,7\}[[:space:]]' goodoleboys.txt
Chuck Dec 25 444-2345 Blue Mary Sue 12 .50
Bubba Oct 13 444-1111 Buck Mary Jean 12
Billy Bob June 11 444-4340 Leotis Lisa Sue 12
Claude May 31 333-4340 Blue Etheline 12
Roscoe Feb 2 444-2234 Rover Alice Jean 410
Arlis June 19 444-1314 Redeye Suzy Beth 12 .75
Junior April 30 BR-549 Percival Lilly Faye 12
Ernest T. ?? none none none none

grep '0[[:print:]]' goodoleboys.txt
Billy Bob June 11 444-4340 Leotis Lisa Sue 12
Otis Sept 17 444-8000 Ol' Sal Sally 12
Claude May 31 333-4340 Blue Etheline 12
Arlis June 19 444-1314 Redeye Suzy Beth 12 .75
Junior April 30 BR-549 Percival Lilly Faye 12
grep '0[[:cntrl:]]' goodoleboys.txt
Chuck Dec 25 444-2345 Blue Mary Sue 12 .50
Amos Jan 4 333-1119 Amos Abigail 20
Roscoe Feb 2 444-2234 Rover Alice Jean 410
Bill Feb 29 333-4444 Daisy Daisy 20
A search pattern does not have to be quoted if it contains no blanks:
grep an goodoleboys.txt
However, there's nothing wrong with placing single or double quotes around a search string that has no blanks. Therefore, the following two grep commands are equivalent to the previous one:
grep 'an' goodoleboys.txt
grep "an" goodoleboys.txt
When the search argument includes a parameter or variable, you need to use quotes, unless you are sure that the parameter or variable will never contain blanks. Even so, it is good to use quotes just to be on the safe side.
The grep command in Figure 17.8 fails when $searchname is not quoted because Qshell sees Billy as the search string, Bob as the first file name, and goodoleboys.txt as the second file name. Grep succeeds only when the search argument is quoted.
/home/JSMITH $ searchname='Billy Bob'
/home/JSMITH $ grep $searchname goodoleboys.txt
grep: 001-0023 Error found opening file Bob. No such path or directory.
/home/JSMITH $ grep "$searchname" goodoleboys.txt
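
The failure and the fix can be reproduced in a scratch directory with a sketch like this one. The directory name, the two-line copy of the sample file, and the status and output variables are all hypothetical. It assumes a POSIX-style grep, which returns an exit status greater than 1 when it cannot open a file.

```shell
# Scratch directory holding a two-line excerpt of the sample file.
workdir="${TMPDIR:-/tmp}/grepquote.$$"
mkdir -p "$workdir"
printf '%s\n' 'Billy Bob June 11 444-4340 Leotis Lisa Sue 12' \
              'Bill Feb 29 333-4444 Daisy Daisy 20' > "$workdir/goodoleboys.txt"

searchname='Billy Bob'

# Unquoted: the shell splits the value, so grep sees the pattern Billy and the
# file names Bob and goodoleboys.txt. Bob does not exist, so grep reports an error.
unquoted_status=0
unquoted_out=$( cd "$workdir" && grep $searchname goodoleboys.txt 2>/dev/null ) || unquoted_status=$?

# Quoted: the whole value, blank included, is the search pattern.
quoted_out=$( cd "$workdir" && grep "$searchname" goodoleboys.txt )

echo "unquoted exit status: $unquoted_status"
echo "quoted match: $quoted_out"

rm -rf "$workdir"
```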
Single quotes and double quotes function differently in Qshell. Single quotes, also called strong quotes , protect from parameter substitution. Double quotes, also called weak quotes , permit parameter substitution.
Figure 17.9 illustrates this point. The echo command shows that the fifth positional parameter has the value 444 . Parameter substitution occurs in the first grep command, in which the search pattern is not quoted, and the second grep command, in which the search pattern is delimited by weak quotes. In the third grep command, parameter substitution does not take place; grep looks for the string $5 (five dollars).
echo $5
444
/home/JSMITH $ grep $5 goodoleboys.txt
Chuck Dec 25 444-2345 Blue Mary Sue 12 .50
Bubba Oct 13 444-1111 Buck Mary Jean 12
Billy Bob June 11 444-4340 Leotis Lisa Sue 12
Otis Sept 17 444-8000 Ol' Sal Sally 12
Roscoe Feb 2 444-2234 Rover Alice Jean 410
Arlis June 19 444-1314 Redeye Suzy Beth 12 .75
Bill Feb 29 333-4444 Daisy Daisy 20
/home/JSMITH $ grep "$5" goodoleboys.txt
Chuck Dec 25 444-2345 Blue Mary Sue 12 .50
Bubba Oct 13 444-1111 Buck Mary Jean 12
Billy Bob June 11 444-4340 Leotis Lisa Sue 12
Otis Sept 17 444-8000 Ol' Sal Sally 12
Roscoe Feb 2 444-2234 Rover Alice Jean 410
Arlis June 19 444-1314 Redeye Suzy Beth 12 .75
Bill Feb 29 333-4444 Daisy Daisy 20
/home/JSMITH $ grep '$5' goodoleboys.txt
Otis Sept 17 444-8000 Ol' Sal Sally 12
Did you notice the
Here is the rule of thumb you should keep in mind:
Use double quotes if the search string includes the name of a variable whose value is to be substituted. Otherwise, use single quotes.
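The rule of thumb can be seen in a short sketch. Here a hypothetical variable named prefix stands in for the positional parameter used in Figure 17.9, and the three input lines are made up for illustration.

```shell
# Hypothetical variable and data. The last line contains the literal text $prefix.
prefix=444
data=$(printf '%s\n' 'Bubba 444-1111' 'Amos 333-1119' 'label: $prefix')

# Weak (double) quotes: the shell substitutes the value, so grep searches for 444.
weak=$(printf '%s\n' "$data" | grep "$prefix")

# Strong (single) quotes: no substitution, so grep searches for the text $prefix.
strong=$(printf '%s\n' "$data" | grep '$prefix')

echo "double quotes found: $weak"
echo "single quotes found: $strong"
```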
You may list one or more file names at the end of the grep command. Each name can be an individual file, or it may contain wildcard characters for file-name expansion (globbing, discussed in chapter 14). If you omit the input-files parameter, grep reads from stdin. The only time you are likely to omit the input-files parameter, however, is when grep is reading the output of another command through a pipeline.
In the following example, output of the List Directory Contents command ( ls ) is the input to grep :
ls | grep -i '[A-Z][12]'
This example lists files in the current directory whose names contain a letter of the alphabet, followed by either a one or a two.
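To try the pipeline without depending on the contents of your own directory, you could run it against a scratch directory, as in this sketch. The directory and file names are hypothetical.

```shell
# Scratch directory with four hypothetical, empty files.
workdir="${TMPDIR:-/tmp}/greppipe.$$"
mkdir -p "$workdir"
: > "$workdir/notes.txt"
: > "$workdir/report1.txt"
: > "$workdir/report2.txt"
: > "$workdir/report3.txt"

# grep has no input-files argument, so it reads the output of ls from stdin.
# Only names in which a letter is followed by 1 or 2 are kept.
matches=$(ls "$workdir" | grep -i '[A-Z][12]')

printf '%s\n' "$matches"

rm -rf "$workdir"
```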
There are several different ways to fill in the input-files parameter. One way is to list a file name, like this:
grep '22.34' mydata.csv
In this case, only one file (mydata.csv) in the current directory is searched. You can specify a full path on the file name, of course:
grep '22.34' /home/jsmith/mydata.csv
You may want to use globbing to search more than one file at a time. The following example shows how to search all the CSV files in the current directory:
grep '22.34' *.csv
In the preceding two examples, the input-files parameter has only one argument. You can list more than one file, if you wish, separating them with white space. The command shown here searches three files:
grep '22.34' fileone.csv filetwo.csv filethree.csv
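Two details of these commands are worth checking for yourself. When grep searches more than one file, it prefixes each matching line with the file name, and the unescaped period in '22.34' is a metacharacter that matches any character. The sketch below illustrates both points, using a scratch directory and hypothetical file contents, and assuming a POSIX-style grep.

```shell
# Scratch directory with two hypothetical CSV files.
workdir="${TMPDIR:-/tmp}/grepfiles.$$"
mkdir -p "$workdir"
printf 'price,22.34\n' > "$workdir/fileone.csv"
printf 'price,22934\n' > "$workdir/filetwo.csv"

# The unescaped period matches any character, so 22934 qualifies too.
loose=$( cd "$workdir" && grep '22.34' *.csv )

# Escaping the period restricts the match to a literal 22.34.
strict=$( cd "$workdir" && grep '22\.34' *.csv )

printf '%s\n' "$loose" "--" "$strict"

rm -rf "$workdir"
```

If the period in your search string is meant literally, escape it with a backslash.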
All of these commands search IFS files, but you can search source physical file members, too. For example, the following command searches all members of source physical file SRC in library MYLIB:
grep 'pgm' /qsys.lib/mylib.lib/src.file/*
You can mix and match IFS files and source physical files, too. In this command, grep searches all members of a source physical file, as well as all HTML and text files in the current IFS directory:
grep 'pgm' /qsys.lib/js.lib/src.file/* *.htm* *.txt