Filters and Regular Expressions


Filters are commands that read data, perform operations on that data, and then send the results to the standard output. Filters generate different kinds of output, depending on their task. Some filters output only information about the input, others output selected parts of the input, and still others output the entire input in a modified form. Some filters are limited to one of these tasks, while others have options that select among them. You can think of a filter as operating on a stream of data: it receives data and generates modified output. As data passes through the filter, it is analyzed, screened, or modified.

The data stream input to a filter consists of a sequence of bytes that can be received from files, devices, or the output of other commands or filters. The filter operates on the data stream, but it does not modify the source of the data. If a filter receives input from a file, the file itself is not modified. Only its data is read and fed into the filter.

The output of a filter is usually sent to the standard output. It can then be redirected to another file or device, or piped as input to another utility or filter. All the features of redirection and pipes apply to filters. Often data is read by one filter and its modified output piped into another filter.

Note 

Data may undergo several modifications as it passes from one filter to another. However, the original source of the data is never changed.
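The following is a minimal sketch of data passing through several filters while its source stays untouched. The file name fruits and its contents are assumptions made up for the illustration, and the work is done in a temporary directory.

```shell
# Hypothetical sample data; the file name and contents are assumptions.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'pear\napple\npear\nfig\n' > fruits

# Remember the file's contents so they can be compared afterward.
before=$(cat fruits)

# Three filters in a row: sort orders the lines, uniq drops adjacent
# duplicate lines, and tr converts lowercase letters to uppercase.
result=$(sort fruits | uniq | tr 'a-z' 'A-Z')
printf '%s\n' "$result"

# The source file still holds the original, unsorted data.
after=$(cat fruits)
```

Comparing $before and $after shows that only the stream was modified, not the file.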

Many utilities and filters use patterns to locate and select specific text in your file. Sometimes, you may need to use patterns in a more flexible and powerful way, searching for several different variations on a given pattern. You can include a set of special characters in your pattern to enable a flexible search. A pattern that contains such special characters is called a regular expression. Regular expressions can be used in most filters and utilities that employ pattern searches such as sed, awk, grep, and egrep.

Tip 

Although many of the special characters used for regular expressions are similar to the shell file expansion characters, they are used in a different way. Shell file expansion characters operate on filenames. Regular expressions search text.
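As a rough illustration of this difference, the sketch below uses the asterisk once as a shell file expansion character and once inside a regular expression. The file names notes1, notes2, and letters are hypothetical.

```shell
# Hypothetical files created in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'ac\nabc\nabbc\nadc\n' > letters
: > notes1
: > notes2

# Unquoted, the shell expands notes* to the matching filenames.
files=$(echo notes*)
printf '%s\n' "$files"

# Quoted, the pattern reaches grep intact. In a regular expression,
# b* means "zero or more b characters", so the line adc does not match.
matches=$(grep 'ab*c' letters)
printf '%s\n' "$matches"
```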

You can save the output of a filter in a file or send it to a printer. To do so, you need to use redirection or pipes. To save the output of a filter to a file, you redirect it to the file using the redirection operator, >. To send output to the printer, you pipe the output to the lpr utility. In the next command, the cat command pipes its output to the lpr command, which then prints it.

$ cat complist | lpr 
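Saving filter output with redirection might look like the sketch below. The contents of complist and the output name complist.sorted are assumptions. The lpr line is left as a comment because it needs a configured printer.

```shell
# Hypothetical contents for the complist file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'monitor\nkeyboard\nmodem\n' > complist

# Redirect the output of the sort filter to a new file.
sort complist > complist.sorted
saved=$(cat complist.sorted)
printf '%s\n' "$saved"

# To print the sorted list instead of saving it:
#   sort complist | lpr
```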

All filters accept input from the standard input. In fact, the output of one filter can be piped as the input for another filter. Many filters also accept input directly from files, however. Such filters can take filenames as their arguments and read data directly from those files.
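The two ways of supplying input can be sketched as follows, again with made-up contents for complist. sort serves as an example of a filter that accepts filenames, and tr as one that reads only the standard input.

```shell
# Hypothetical contents for the complist file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'monitor\nkeyboard\nmodem\n' > complist

# sort accepts a filename argument and opens the file itself.
from_arg=$(sort complist)

# sort also reads the standard input when no filename is given.
from_pipe=$(cat complist | sort)

# tr takes no filename arguments, so its input must arrive on the
# standard input, here through input redirection.
upper=$(tr 'a-z' 'A-Z' < complist)
printf '%s\n' "$upper"
```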

Searching Files: grep

The grep and fgrep filters search the contents of files for a pattern and print each line in which the pattern occurs. When more than one file is searched, each line is preceded by the name of the file in which it was found. grep interprets its pattern as a regular expression, whereas fgrep (equivalent to grep -F) treats it as a fixed string. In the basic form used here, grep is given a single pattern, whereas fgrep can be given a list of strings and search for all of them at once.
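On systems whose grep follows POSIX, several patterns can also be supplied with repeated -e options, and the -F option selects the fixed-string matching that fgrep performs. The sketch below assumes such a grep; the files preface and dots and their contents are hypothetical.

```shell
# Hypothetical sample files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'data in the file.\nplain text\nno match here\n' > preface
printf 'a.b\naxb\n' > dots

# Each -e option adds one pattern; a line matching any of them is printed.
both=$(grep -e data -e text preface)
printf '%s\n' "$both"

# As a regular expression, the period matches any single character.
regex=$(grep 'a.b' dots)

# With -F the pattern is a fixed string, so the period is literal.
fixed=$(grep -F 'a.b' dots)
printf '%s\n' "$fixed"
```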

The grep filter takes two types of arguments. The first argument is the pattern to be searched for; the second argument is a list of filenames, which are the files to be searched. You enter the filenames on the command line after the pattern. You can also use special characters, such as the asterisk, to generate a file list.

$ grep pattern filenames-list 

If you want to include more than one word in the pattern search, you enclose the words within single quotation marks. This is to quote the spaces between the words in the pattern. Otherwise, the shell would interpret the space as a delimiter or argument on the command line, and grep would try to interpret words in the pattern as part of the file list. In the next example, grep searches for the pattern "text file":

$ grep 'text file' preface
A text file in Unix
text files, changing or
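The effect of the quotes can be sketched with a made-up preface file. The sketch assumes that no file named file exists in the directory.

```shell
# Hypothetical contents for the preface file.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'A text file in Unix\nplain text in a file\ntext files, changing or\n' > preface

# Quoted: grep receives the single pattern "text file".
quoted=$(grep 'text file' preface)
printf '%s\n' "$quoted"

# Unquoted: the pattern is just "text", and both file and preface are
# taken as filenames. Since no file named file exists here, grep
# reports an error and returns a nonzero exit status.
unquoted_status=0
grep text file preface > /dev/null 2>&1 || unquoted_status=$?
```

The second line of the sample file contains both words, but not next to each other, so the quoted pattern skips it.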

If you use more than one file in the file list, grep will output the name of the file before the matching line. In the next example, two files, preface and intro, are searched for the pattern "data". Before each occurrence, the filename is output.

$ grep data preface intro
preface: data in the file.
intro: new data
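A runnable version of this search, with made-up contents for preface and intro, might look like this. Note that grep separates the filename from the matching line with a colon.

```shell
# Hypothetical contents for the two files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'An overview\ndata in the file.\n' > preface
printf 'new data\nother lines\n' > intro

# With two files in the list, each match is prefixed by its filename.
multi=$(grep data preface intro)
printf '%s\n' "$multi"

# With a single file, no filename prefix is printed.
single=$(grep data preface)
printf '%s\n' "$single"
```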

As mentioned earlier, you can also use shell file expansion characters to generate a list of files to be searched. In the next example, the asterisk file expansion character is used to generate a list of all files in the current directory. This is a simple way of searching all of a directory's files for a pattern.

$ grep data * 

The special characters are often useful for searching a selected set of files. For example, if you want to search all your C program source code files for a particular pattern, you can specify the set of source code files with *.c. Suppose you have an unintended infinite loop in your program and you need to locate all instances of iterations. The next example searches only those files with a .c extension for the pattern "while" and displays the lines of code that perform iterations:

$ grep while *.c 
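The sketch below tries this out on a few hypothetical files (loop.c, main.c, and notes.txt are made-up names and contents). Only the files ending in .c are searched.

```shell
# Hypothetical source files plus one file that is not C code.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'int i = 0;\nwhile (i < 10) {\ni++;\n}\n' > loop.c
printf 'int main(void) {\nreturn 0;\n}\n' > main.c
printf 'wait a while\n' > notes.txt

# The shell expands *.c to loop.c and main.c before grep runs,
# so notes.txt is never searched.
hits=$(grep while *.c)
printf '%s\n' "$hits"

# The -l option lists only the names of the files that contain a match.
names=$(grep -l while *.c)
```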

Regular Expressions

Regular expressions enable you to match possible variations on a pattern, as well as patterns located at different points in the text. You can search for patterns in your text that have different ending or beginning letters, or you can match text at the beginning or end of a line. The regular expression special characters are the circumflex, dollar sign, asterisk, period, and brackets: ^, $, *, ., []. The circumflex and dollar sign match on the beginning and end of a line. The asterisk matches zero or more repetitions of the preceding character, the period matches any single character, and the brackets match any one character from a class (set) of characters. See Table 8-8 for a listing of the regular expression special characters.

Table 8-8: Regular Expression Special Characters

Character   Match                 Operation
^           Start of a line       References the beginning of a line
$           End of a line         References the end of a line
.           Any character         Matches on any one possible character in a pattern
*           Repeated characters   Matches on zero or more repetitions of the preceding character in a pattern
[]          Classes               Matches on classes of characters (a set of characters) in the pattern

Note 

Regular expressions are used extensively in many Linux filters and applications to perform searches and matching operations. The Vi and Emacs editors and the sed, diff, grep, and gawk filters all use regular expressions.
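The special characters listed in Table 8-8 can be tried out with grep on a small word list. The file words and its contents are assumptions chosen so that each pattern selects a different set of lines.

```shell
# Hypothetical word list, one word per line.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'cat\ncot\ncoat\nct\nconcat\ncats\n' > words

# ^ anchors the pattern to the beginning of the line.
begins=$(grep '^co' words)

# $ anchors the pattern to the end of the line.
ends=$(grep 'cat$' words)

# . matches exactly one character of any kind.
anychar=$(grep 'c.t' words)

# * matches zero or more repetitions of the preceding character.
repeated=$(grep 'co*t' words)

# [] matches any one character from the listed set.
class=$(grep 'c[ao]t' words)

printf '%s\n' "$begins" "$ends" "$anychar" "$repeated" "$class"
```

The patterns are enclosed in single quotes so that the shell does not try to expand *, [, or $ before grep sees them.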

Suppose you want to use the long-form output of ls to display just your directories. One way to do this is to generate a long-form listing of all your files and pipe this list to grep, which can then pick out the directory entries. You can do this by using the ^ special character to specify the beginning of a line. Remember, in the long-form output of ls, the first character indicates the file type. A d represents a directory, an l represents a symbolic link, and a dash (-) represents a regular file. Using the pattern '^d', grep will match only on those lines beginning with a d.

$ ls -l | grep '^d'
drwxr-x---  2  chris 512 Feb 10 04:30  reports
drwxr-x---  2  chris 512 Jan 6  01:20  letters
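The same pipeline can be tried in a scratch directory. In the sketch below, the directory names reports and letters come from the example above, while the regular file memo is a made-up addition. The -c option makes grep print a count of matching lines instead of the lines themselves.

```shell
# Two directories and one regular file in a temporary directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir reports letters
: > memo

# Show only the long-form entries whose first character is d.
ls -l | grep '^d'

# Count the directory entries and the regular-file entries.
dir_count=$(ls -l | grep -c '^d')
file_count=$(ls -l | grep -c '^-')
```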




Red Hat(c) The Complete Reference
Red Hat Enterprise Linux & Fedora Edition (DVD): The Complete Reference
ISBN: 0072230754
EAN: 2147483647
Year: 2004
Pages: 328
