Qshell for iSeries - page 130


Summary

Archive files are good mechanisms to reduce the complexity of manipulating large numbers of files in Qshell. Data-compression utilities available in Qshell can also help with network performance and size requirements when dealing with large archives. The Java jar utility can be used as a very portable data-compression tool. The pax utility is dated and is no longer often used.



Chapter 17: Grep

Overview

Grep is a scan utility that originated with Unix systems but is now implemented in many environments, including Qshell. The grep utility searches files for lines that contain a specified pattern of characters and displays those lines. Although grep is typically used with text files, it may also be used with binary files.

Anyone who deals with the Integrated File System (IFS) should learn to use grep, because the Find String Using PDM command (FNDSTRPDM) won't work with IFS files. Even if you don't deal with the IFS, you might still want to use grep, because it works with source physical files and has more powerful search capabilities than FNDSTRPDM.

Different sources give different accounts of the origin of the term grep, but it most likely comes from a search command in the ed and ex text editors. The command is g/re/p, where g indicates that the search is global (that is, over the entire file), re indicates that a regular expression describes the search string, and p indicates that the results are to be printed (i.e., displayed on the screen).

Here is the syntax of grep:

grep [options] regular-expression [input-files]



Regular Expressions

The regular-expression parameter is the string for which you are searching. The simplest form of a regular expression is an exact sequence of characters for which the system is to search. In the following example, grep searches all files in the current directory whose names end with .java (i.e., all Java source files) for the string print:

grep print *.java

By default, the search is case-sensitive, so this grep command will not find Print, PRINT, pRINT, or any other combination of cases.
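You can see the effect of case sensitivity without creating a file by piping a few lines into grep. The sketch below assumes a POSIX-style shell and a standard grep, and the variable names are only illustrative. It also uses the -i option, which appears later in this chapter and makes the search ignore case.

```shell
#!/bin/sh
# Three spellings of the same word, fed to grep on stdin (no file needed).
sample=$(printf '%s\n' 'print' 'Print' 'PRINT')

# Default search: case-sensitive, so only the all-lowercase line matches.
exact=$(printf '%s\n' "$sample" | grep -c 'print')

# -i: ignore case, so every spelling matches.
ignore_case=$(printf '%s\n' "$sample" | grep -ci 'print')

echo "case-sensitive matches: $exact"          # prints: case-sensitive matches: 1
echo "case-insensitive matches: $ignore_case"  # prints: case-insensitive matches: 3
```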

Grep would be useful even if this were the only kind of search it could perform, but grep can do much more, because it knows how to interpret metacharacters (sophisticated versions of wildcards). Table 17.1 describes these special symbols and their meanings.

Table 17.1: Metacharacters for Use with Grep

Metacharacter   Description
. (period)      Match any single character except end-of-line.
*               Match zero or more occurrences of the preceding pattern.
^               Match at the beginning of the line.
$               Match at the end of the line.
[ ]             Match any one character within the brackets. Ranges may be specified with a hyphen.
[^ ]            Match any one character that is not within the brackets. The caret must be the first character within the brackets.
\{m\}           Match exactly m occurrences of the preceding pattern.
\{m,\}          Match m or more occurrences of the preceding pattern.
\{m,n\}         Match m to n occurrences of the preceding pattern.
\               Turn off the special meaning of the following character.
\( \)           Save the matched characters as a subexpression. The saved text can be referred to later in the expression with a backslash followed by a number (\1, \2, and so on); this is called a back reference.
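The following sketch tries several of the metacharacters in Table 17.1 against a few rows adapted from goodoleboys.txt, with the Paid column omitted. It assumes a POSIX-style shell and grep. The helper function count and the variable names are illustrative only, and each variable holds the number of matching lines.

```shell
#!/bin/sh
# A few rows adapted from goodoleboys.txt (Paid column omitted).
data=$(printf '%s\n' \
  'Chuck     Dec 25    444-2345 Blue     Mary Sue   12' \
  'Bubba     Oct 13    444-1111 Buck     Mary Jean  12' \
  'Amos      Jan 4     333-1119 Amos     Abigail    20' \
  "Otis      Sept 17   444-8000 Ol' Sal  Sally      12" \
  'Roscoe    Feb 2     444-2234 Rover    Alice Jean 410')

# count PATTERN: print how many sample lines match the basic regular expression.
# "|| true" keeps a zero count (grep exit status 1) from looking like a failure.
count() {
  printf '%s\n' "$data" | grep -c -e "$1" || true
}

caret=$(count '^C')          # ^     start of line: Chuck
dollar=$(count '0$')         # $     end of line: Amos (20), Roscoe (410)
dot=$(count 'B.ck')          # .     any one character: Buck
star=$(count 'Ro.*Ro')       # .*    any run of characters: Roscoe ... Rover
interval=$(count '0\{3\}')   # \{3\} exactly three zeros in a row: 8000
bracket=$(count '^[AO]')     # [ ]   one of the listed characters: Amos, Otis

echo "$caret $dollar $dot $star $interval $bracket"   # prints: 1 2 1 1 1 2
```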

You may also use certain symbolic names in place of characters. These are shown in Table 17.2.

Table 17.2: Symbolic Names

Symbol          Description
[[:alpha:]]     Any letter in either case
[[:upper:]]     Any uppercase letter
[[:lower:]]     Any lowercase letter
[[:digit:]]     Any decimal digit
[[:xdigit:]]    Any hexadecimal digit (0-9, A-F, or a-f)
[[:alnum:]]     Any letter or decimal digit
[[:space:]]     Any white-space character: space, tab, newline, vertical tab, formfeed, or carriage return
[[:blank:]]     Any space or tab character
[[:punct:]]     Any punctuation mark
[[:cntrl:]]     Any control character
[[:print:]]     Any printable character, including the space
[[:graph:]]     Any printable character except the space (that is, a letter, digit, or punctuation mark)
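Here is a similar sketch for the symbolic names in Table 17.2. It again uses rows adapted from goodoleboys.txt and rests on the same assumptions: a POSIX-style shell and an illustrative helper named count.

```shell
#!/bin/sh
# Rows adapted from goodoleboys.txt (some columns dropped).
data=$(printf '%s\n' \
  'Junior    April 30  BR-549   Percival' \
  'Roscoe    Feb 2     444-2234 Rover' \
  "Otis      Sept 17   444-8000 Ol' Sal")

# count PATTERN: number of sample lines matching PATTERN (0 if none).
count() {
  printf '%s\n' "$data" | grep -c -e "$1" || true
}

alpha_hyphen=$(count '[[:alpha:]]-')           # letter then hyphen: BR-549
four_digits=$(count '[[:digit:]]\{4\}')        # four digits in a row: 2234, 8000
alpha_punct=$(count '[[:alpha:]][[:punct:]]')  # letter then punctuation: R-, l'
two_upper=$(count '[[:upper:]]\{2\}')          # two uppercase letters: BR

echo "$alpha_hyphen $four_digits $alpha_punct $two_upper"   # prints: 1 2 2 1
```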

Grep Examples

To illustrate how regular expressions work, several grep examples follow, along with explanations of what each one accomplishes. The data file being searched is goodoleboys.txt, shown in Figure 17.1.

start figure


cat goodoleboys.txt


Name      Born      Phone    Dog       Wife    Shotgun  Paid


========= ======== ======== ======== ========= =======  =====


Chuck     Dec 25    444-2345 Blue     Mary Sue   12    .50


Bubba     Oct 13    444-1111 Buck     Mary Jean  12


Billy Bob June 11   444-4340 Leotis   Lisa Sue   12


Amos      Jan 4     333-1119 Amos     Abigail    20


Otis      Sept 17   444-8000 Ol' Sal  Sally      12


Claude    May 31    333-4340 Blue     Etheline   12


Roscoe    Feb 2     444-2234 Rover    Alice Jean 410


Arlis     June 19   444-1314 Redeye   Suzy Beth  12    .75


Junior    April 30  BR-549   Percival Lilly Faye 12


Bill      Feb 29    333-4444 Daisy    Daisy      20


Ernest T. ??        none     none     none       none

end figure

Figure 17.1: The goodoleboys.txt file is used for the search examples that follow.

The first example simply finds all lines that begin with uppercase C:


grep ^C    goodoleboys.txt


Chuck      Dec 25    444-2345 Blue      Mary Sue      12      .50


Claude     May 31    333-4340 Blue      Etheline      12

Figure 17.2 is a slightly more complex example. It finds all lines that end with a zero. The first form of grep shown in the figure is used with files that are delimited with a single linefeed character, as is typical of Unix files. The second form is for files that are delimited with a combination of carriage-return and linefeed characters. The [[:cntrl:]] expression allows for the carriage return.

start figure


grep '0$' goodoleboys.txt


grep '0[[:cntrl:]]$' goodoleboys.txt


Chuck     Dec 25    444-2345 Blue      Mary Sue      12      .50


Amos      Jan 4     333-1119 Amos      Abigail       20


Roscoe    Feb 2     444-2234 Rover     Alice Jean    410


Bill      Feb 29    333-4444 Daisy     Daisy         20

end figure

Figure 17.2: Find lines that end with a zero.
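The carriage-return problem in Figure 17.2 is easy to reproduce by writing a line that ends in CR+LF. This sketch assumes a POSIX-style shell. It also shows one alternative: deleting the carriage returns with tr before grep sees them. That alternative assumes tr is available in your environment.

```shell
#!/bin/sh
# One record from goodoleboys.txt, written with a CR+LF line ending
# the way many PC tools write text files.
line='Amos      Jan 4     333-1119 Amos     Abigail    20'

# '0$' fails: the last character before end-of-line is the CR, not the 0.
lf_only=$(printf '%s\r\n' "$line" | grep -c '0$' || true)

# '0[[:cntrl:]]$' allows for the trailing CR, as in Figure 17.2.
with_cr=$(printf '%s\r\n' "$line" | grep -c '0[[:cntrl:]]$' || true)

# Alternative: strip the CRs first, then the simple pattern works.
stripped=$(printf '%s\r\n' "$line" | tr -d '\r' | grep -c '0$' || true)

echo "$lf_only $with_cr $stripped"   # prints: 0 1 1
```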

The following grep command looks for lines that contain a dollar sign followed by any two characters and a period:


grep '\$..\.' goodoleboys.txt


Arlis     June 19   444-1314 Redeye    Suzy Beth      12      .75

The first two periods function as metacharacters. The dollar sign and the last period do not, because each is preceded by a backslash.
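This small sketch applies the escaped pattern to two dollar amounts. The amounts are hypothetical, invented for illustration and not taken from goodoleboys.txt. Only the second amount has exactly two characters between the dollar sign and a period. A POSIX-style shell is assumed.

```shell
#!/bin/sh
# Hypothetical Paid values (not from goodoleboys.txt).
amounts=$(printf '%s\n' 'Chuck $2.50' 'Arlis $10.75')

# \$ = literal dollar sign, each . = any one character, \. = literal period.
match=$(printf '%s\n' "$amounts" | grep '\$..\.')

echo "$match"   # prints: Arlis $10.75
```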

These two grep commands find lines that contain a single quote:


grep "'" goodoleboys.txt


grep \' goodoleboys.txt


Otis       Sept 17  444-8000 Ol' Sal  Sally        12

The first command shows that double quotes can "escape" (protect) a single quote; the second uses a backslash for the same purpose.
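You can check the two quoting styles side by side. The dog names below come from goodoleboys.txt, the variable names are illustrative, and a POSIX-style shell is assumed.

```shell
#!/bin/sh
# Dog names from goodoleboys.txt; only one contains a single quote.
dogs=$(printf '%s\n' 'Blue' "Ol' Sal" 'Rover')

# Inside double quotes, a single quote is an ordinary character.
dq=$(printf '%s\n' "$dogs" | grep "'")

# Outside quotes, a backslash escapes the single quote.
bs=$(printf '%s\n' "$dogs" | grep \')

echo "$dq"   # prints: Ol' Sal
echo "$bs"   # prints: Ol' Sal
```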

This command finds lines with three zeros together:


grep '0\{3\}' goodoleboys.txt


Otis       Sept 17  444-8000 Ol' Sal   Sally       12

Figure 17.3 builds on the previous examples to find lines in which the same two-letter pattern, an uppercase letter followed by a lowercase letter, occurs twice. The \( and \) pair tells grep to save the matched text as a pattern, which can then be referred to as \1. If other patterns were saved, they would be referred to as \2, \3, and so on. These expressions are known as back references.

start figure


grep '\([A-Z][a-z]\).*\1' goodoleboys.txt


Bubba      Oct 13   444-1111 Buck     Mary Jean     12


Amos       Jan 4    333-1119 Amos     Abigail       20


Otis       Sept 17  444-8000 Ol' Sal  Sally         12


Roscoe     Feb 2    444-2234 Rover    Alice Jean    410


Bill       Feb 29   333-4444 Daisy    Daisy         20

end figure

Figure 17.3: Find lines in which an uppercase-lowercase letter pair is repeated.

In the first line returned, the pattern Bu is found twice (Bubba and Buck). In the second line, Am is repeated; in the third, Sa; in the fourth, Ro; and in the fifth, Da.
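You can try the back reference on three rows adapted from goodoleboys.txt. In this sketch, which assumes a POSIX-style shell, only the Bubba and Roscoe rows repeat an uppercase-lowercase pair. The variable names are illustrative.

```shell
#!/bin/sh
# Three rows adapted from goodoleboys.txt (some columns dropped).
data=$(printf '%s\n' \
  'Chuck     Dec 25    Blue     Mary Sue' \
  'Bubba     Oct 13    Buck     Mary Jean' \
  'Roscoe    Feb 2     Rover    Alice Jean')

# \([A-Z][a-z]\) saves an uppercase+lowercase pair as \1;
# .*\1 then requires that same pair to appear again later on the line.
pattern='\([A-Z][a-z]\).*\1'

repeats=$(printf '%s\n' "$data" | grep "$pattern")
n=$(printf '%s\n' "$data" | grep -c "$pattern" || true)

printf '%s\n' "$repeats"   # the Bubba and Roscoe rows
```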

Figure 17.4 illustrates the use of grep metacharacters that have to do with including and excluding characters in a search.

start figure


grep '^[CR]' goodoleboys.txt


Chuck     Dec 25    444-2345 Blue      Mary Sue      12     .50


Claude    May 31    333-4340 Blue      Etheline      12


Roscoe    Feb 2     444-2234 Rover     Alice Jean    410


grep '^[C-R]' goodoleboys.txt


Name      Born      Phone    Dog       Wife       Shotgun   Paid


Chuck     Dec 25    444-2345 Blue      Mary Sue      12     .50


Otis      Sept 17   444-8000 Ol' Sal   Sally         12


Claude    May 31    333-4340 Blue      Etheline      12


Roscoe    Feb 2     444-2234 Rover     Alice Jean    410


Junior    April 30  BR-549   Percival  Lilly Faye    12


Ernest T. ??        none     none      none          none


grep 'A[^p-r]' goodoleboys.txt


Amos       Jan 4    333-1119 Amos      Abigail       20


Roscoe     Feb 2    444-2234 Rover     Alice Jean    410

end figure

Figure 17.4: Find lines that begin with C or R, then find lines that begin with any letter between C and R (inclusive). Finally, find lines that contain an A that is not followed by a letter from p to r, inclusive.
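You can try the bracket expressions in Figure 17.4 on the opening words of a few rows. The sketch sets LC_ALL=C as a precaution, because in some locales a range such as C-R follows the locale's collation order and may include characters you don't expect. Whether Qshell needs this setting is an assumption to check on your own system. The helper count is illustrative.

```shell
#!/bin/sh
# Precaution: make ranges mean plain ASCII order.
LC_ALL=C
export LC_ALL

# Rows adapted from goodoleboys.txt (some columns dropped).
data=$(printf '%s\n' \
  'Chuck     Blue     Mary Sue' \
  'Amos      Amos     Abigail' \
  'Otis      Sally' \
  'Roscoe    Rover    Alice Jean' \
  'Junior    April 30' \
  'Arlis     Redeye')

# count PATTERN: number of sample lines matching PATTERN (0 if none).
count() {
  printf '%s\n' "$data" | grep -c -e "$1" || true
}

list=$(count '^[CR]')        # starts with C or R: Chuck, Roscoe
range=$(count '^[C-R]')      # starts with C..R: Chuck, Otis, Roscoe, Junior
negated=$(count 'A[^p-r]')   # A not followed by p, q, or r: Amos, Roscoe (Alice)

echo "$list $range $negated"   # prints: 2 4 2
```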

The three grep commands in Figure 17.5 use symbolic names. The first command finds lines with an alphabetic character, in either case, followed by a hyphen. The second command finds lines with white space followed by exactly three digits and more white space. The last command finds lines where a letter is followed by a punctuation mark.

start figure


grep '[[:alpha:]]-' goodoleboys.txt


Junior    April 30  BR-549   Percival Lilly Faye    12


grep  '[[:space:]][[:digit:]]\{3\}[[:space:]]' goodoleboys.txt


Roscoe    Feb 2     444-2234 Rover    Alice Jean    410


grep '[[:alpha:]][[:punct:]]' goodoleboys.txt


Otis       Sept 17  444-8000 Ol' Sal  Sally         12


Junior     April 30 BR-549   Percival Lilly Faye    12


Ernest T. ??        none     none     none          none

end figure

Figure 17.5: These commands illustrate the use of symbolic names.

Figures 17.6 and 17.7 combine metacharacters and symbolic names to perform complex searches. Figure 17.6 finds lines whose first non-blank token is a group of five to seven letters in any case. Figure 17.7 finds lines that contain a zero followed by a printable character, and then finds lines where the zero is followed by a control character.

start figure



grep  '^[[:space:]]*[[:alpha:]]\{5,7\}[[:space:]]' goodoleboys.txt


Chuck      Dec 25   444-2345 Blue      Mary Sue      12     .50


Bubba      Oct 13   444-1111 Buck      Mary Jean     12


Billy Bob  June 11  444-4340 Leotis    Lisa Sue      12


Claude     May 31   333-4340 Blue      Etheline      12


Roscoe     Feb 2    444-2234 Rover     Alice Jean    410


Arlis      June 19  444-1314 Redeye    Suzy Beth     12     .75


Junior     April 30 BR-549   Percival  Lilly Faye    12


Ernest T.  ??       none     none      none          none

end figure

Figure 17.6: Find lines whose first non-blank token is a group of five to seven letters in any case.
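The anchoring in Figure 17.6 matters. Without the ^ and the trailing [[:space:]], a longer name would still contain a run of five to seven letters and would match. In the sketch below, the name Bartholomew is hypothetical and was added to show an 11-letter first token being rejected. A POSIX-style shell is assumed.

```shell
#!/bin/sh
# Rows adapted from goodoleboys.txt, plus one hypothetical long name.
data=$(printf '%s\n' \
  'Amos      Jan 4' \
  '   Claude    May 31' \
  'Roscoe    Feb 2' \
  'Bartholomew Feb 3')

# Optional leading white space, then 5 to 7 letters, then white space.
# ^ and the final [[:space:]] reject first tokens shorter or longer than that.
pattern='^[[:space:]]*[[:alpha:]]\{5,7\}[[:space:]]'

n=$(printf '%s\n' "$data" | grep -c "$pattern" || true)
printf '%s\n' "$data" | grep "$pattern"   # the Claude and Roscoe rows
```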

start figure


grep '0[[:print:]]' goodoleboys.txt


Billy Bob June 11   444-4340 Leotis    Lisa Sue      12


Otis      Sept 17   444-8000 Ol' Sal   Sally         12


Claude    May 31    333-4340 Blue      Etheline      12


Arlis     June 19   444-1314 Redeye    Suzy Beth     12     .75


Junior    April 30  BR-549   Percival  Lilly Faye    12


grep '0[[:cntrl:]]' goodoleboys.txt


Chuck     Dec 25    444-2345 Blue      Mary Sue      12     .50


Amos      Jan 4     333-1119 Amos      Abigail       20


Roscoe    Feb 2     444-2234 Rover     Alice Jean    410


Bill      Feb 29    333-4444 Daisy     Daisy         20

end figure

Figure 17.7: Find lines where a zero is followed by a printable character, then find lines where a zero is followed by a control character.

Quotes

A search pattern does not have to be enclosed in quotes if it does not contain any white space or special characters. In the following example, grep searches for the string an:

grep an goodoleboys.txt

However, there's nothing wrong with placing single or double quotes around a search string that has no blanks. Therefore, the following two grep commands are equivalent to the previous one:

grep 'an' goodoleboys.txt
grep "an" goodoleboys.txt

When the search argument includes a parameter or variable, you need to enclose it in double quotes unless you are sure that the parameter or variable will never contain blanks. Even then, it is good practice to use the quotes just to be on the safe side.

The grep command in Figure 17.8 fails when $searchname is not quoted because Qshell sees Billy as the search string, Bob as the first file name, and goodoleboys.txt as the second file name. Grep succeeds only when the search argument is quoted.

start figure


/home/JSMITH $


searchname='Billy Bob'


/home/JSMITH $


grep $searchname goodoleboys.txt


grep: 001-0023 Error found opening file Bob. No such path or directory.


/home/JSMITH $


grep "$searchname" goodoleboys.txt

end figure

Figure 17.8: The grep search fails if the search pattern is not quoted because of an embedded space.
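Here is the same problem in a form you can run in any POSIX-style shell. Instead of letting grep fail, the sketch uses set -- to count how many words the unquoted variable is split into. The sample rows are adapted from goodoleboys.txt, and the variable names other than searchname are illustrative.

```shell
#!/bin/sh
data=$(printf '%s\n' 'Billy Bob June 11' 'Bubba     Oct 13')
searchname='Billy Bob'   # no spaces around the =

# Quoted: grep receives one pattern argument, "Billy Bob".
quoted=$(printf '%s\n' "$data" | grep "$searchname")

# Unquoted: the shell splits the value into two words. In a grep command
# the second word would be taken as a file name (the error in Figure 17.8).
set -- $searchname
words=$#

echo "$quoted"   # prints: Billy Bob June 11
echo "$words"    # prints: 2
```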

Single quotes and double quotes function differently in Qshell. Single quotes, also called strong quotes, prevent parameter substitution. Double quotes, also called weak quotes, permit parameter substitution.

Figure 17.9 illustrates this point. The echo command shows that the fifth positional parameter has the value 444 . Parameter substitution occurs in the first grep command, in which the search pattern is not quoted, and the second grep command, in which the search pattern is delimited by weak quotes. In the third grep command, parameter substitution does not take place; grep looks for the string $5 (five dollars).

start figure


echo $5


444


/home/JSMITH $


grep $5 goodoleboys.txt


Chuck     Dec 25   444-2345 Blue     Mary Sue      12     .50


Bubba     Oct 13   444-1111 Buck     Mary Jean     12


Billy Bob June 11  444-4340 Leotis   Lisa Sue      12


Otis      Sept 17  444-8000 Ol' Sal  Sally         12


Roscoe    Feb 2    444-2234 Rover    Alice Jean    410


Arlis     June 19  444-1314 Redeye   Suzy Beth     12     .75


Bill      Feb 29   333-4444 Daisy    Daisy 20


/home/JSMITH $


grep "$5" goodoleboys.txt


Chuck     Dec 25    444-2345 Blue    Mary Sue      12     .50


Bubba     Oct 13    444-1111 Buck    Mary Jean     12


Billy Bob June 11   444-4340 Leotis  Lisa Sue      12


Otis      Sept 17   444-8000 Ol' Sal Sally         12


Roscoe    Feb 2     444-2234 Rover   Alice Jean    410


Arlis     June 19   444-1314 Redeye  Suzy Beth     12     .75


Bill      Feb 29    333-4444 Daisy   Daisy         20


/home/JSMITH $


grep '$5' goodoleboys.txt


Otis      Sept 17  444-8000 Ol' Sal  Sally         12

end figure

Figure 17.9: Strong quotes forbid parameter substitution; weak quotes allow it.

Did you notice the strange thing that grep does in this example? It matches a literal dollar-sign character ($) instead of treating it as an end-of-line metacharacter. Why? A dollar sign acts as an end-of-line anchor only when it is the last character of the pattern; anywhere else, as in $5, it stands for itself. Although taking advantage of this behavior will show that you are a real grep guru, you shouldn't rely on it. It's tricky and unclear to whoever might be changing the script later, even if that person is you. Instead, to search for a literal dollar sign, use \$.

Here is the rule of thumb you should keep in mind:

Use double quotes if the search string includes the name of a variable whose value is to be substituted. Otherwise, use single quotes.
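You can check the rule of thumb with a short sketch. The data lines are hypothetical, and set -- stands in for however the fifth positional parameter gets its value in your own script. A POSIX-style shell is assumed.

```shell
#!/bin/sh
# Hypothetical data: one line contains a literal "$5", the other 444.
data=$(printf '%s\n' 'Otis   paid $5' 'Chuck  444-2345')

# Give the fifth positional parameter the value 444, as in Figure 17.9.
set -- a b c d 444

# Weak (double) quotes: the shell substitutes $5, so grep searches for 444.
weak=$(printf '%s\n' "$data" | grep "$5")

# Strong (single) quotes keep $5 away from the shell, and \$ tells grep
# plainly that the dollar sign is literal rather than an end-of-line anchor.
strong=$(printf '%s\n' "$data" | grep '\$5')

echo "$weak"     # prints: Chuck  444-2345
echo "$strong"   # prints: Otis   paid $5
```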

Files

You may list one or more file names at the end of the grep command. Each name can be an individual file, or it may contain wildcard characters for file-name expansion (globbing, discussed in chapter 14). If you omit the input-files parameter, grep reads from stdin. The only time you are likely to omit the input-files parameter, however, is when grep is reading the output of another command through a pipeline.

In the following example, the output of the List Directory Contents command (ls) is the input to grep:

ls | grep -i '[A-Z][12]'

This example lists files in the current directory whose names contain a letter of the alphabet (in either case, because of the -i option) followed by either a one or a two.
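To make the pipeline's output predictable, this sketch pipes a hypothetical list of file names into grep in place of ls. LC_ALL=C is set so that, without -i, [A-Z] means only the uppercase ASCII letters. That setting is a precaution, not a known Qshell requirement.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

# Hypothetical file names, one per line, as ls writes them to a pipe.
names=$(printf '%s\n' 'report1.txt' 'Q2sales.csv' 'notes.txt' 'x3.dat')

# With -i, the letter may be either case: report1.txt (t1) and Q2sales.csv (Q2).
with_i=$(printf '%s\n' "$names" | grep -i '[A-Z][12]')

# Without -i, [A-Z] is uppercase only, so just Q2sales.csv matches.
without_i=$(printf '%s\n' "$names" | grep '[A-Z][12]')

printf '%s\n' "$with_i"
```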

There are several different ways to fill in the input-files parameter. One way is to list a file name, like this:

grep '22.34'  mydata.csv

In this case, only one file (mydata.csv, in the current directory) is searched. You can specify a full path on the file name, of course:

grep '22.34'  /home/jsmith/mydata.csv

You may want to use globbing to search more than one file at a time. The following example shows how to search all the CSV files in the current directory:

grep   '22.34'  *.csv

In the preceding two examples, the input-files parameter has only one argument. You can list more than one file, if you wish, separating them with white space. The command shown here searches three files:

grep '22.34' fileone.csv filetwo.csv filethree.csv
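The sketch below creates two hypothetical CSV files in a scratch directory and assumes the mktemp utility is available. It shows two details worth knowing when you search several files. First, the unescaped period in '22.34' matches any character. Second, when a standard grep searches more than one file, it prefixes each matching line with the file name.

```shell
#!/bin/sh
# Scratch directory holding two hypothetical CSV files.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf '%s\n' 'A,22.34' 'B,22534' > fileone.csv
printf '%s\n' 'C,22.34' > filetwo.csv

# Unescaped period: matches any character, so 22534 is counted too.
loose=$(cat *.csv | grep -c '22.34')

# Escaped period: only a literal decimal point matches.
strict=$(cat *.csv | grep -c '22\.34')

# Several file arguments: each match is prefixed with its file name.
listing=$(grep '22\.34' *.csv)
echo "$listing"   # fileone.csv:A,22.34 and filetwo.csv:C,22.34

cd / && rm -rf "$dir"
```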

All of these commands search IFS files, but you can search source physical file members, too. For example, the next command searches all members of MYLIB/MYSRC for the string pgm :

grep 'pgm' /qsys.lib/mylib.lib/mysrc.file/*

You can mix and match IFS files and source physical files, too. In this command, grep searches all members of a source physical file, as well as all HTML and text files in the current IFS directory:

grep 'pgm' /qsys.lib/js.lib/src.file/*  *.htm*  *.txt