The three commands covered in this chapter, along with regular expressions (pattern matching), are often grouped together. Many books are devoted to awk and sed, and grep usually goes along for the ride because awk is derived in part from sed and grep. In addition, the regular expressions used for pattern matching in awk, sed, and grep are similar.
We'll take a look at regular expressions and pattern matching in general, and then cover the three commands in this chapter:
Regular expressions
sed, awk, and grep commands
Regular expressions describe patterns for which you are searching. A regular expression usually defines the pattern using wildcards. Because a regular expression defines a pattern you are searching for, the terms "regular expression" and "pattern matching" are often used interchangeably.
Let's get down to a couple of words-of-caution immediately:
Regular expressions are different from the file matching patterns used by the shell. Regular expressions are used by many programs, including those covered in this chapter. The file matching done by the shell and by programs such as find follows different rules from the regular expressions covered in this chapter.
Use single quotes around regular expressions. The meta-characters used in this chapter must be quoted so that the shell passes them to the program unmodified as an argument. You will, therefore, see most regular expressions in this chapter quoted.
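Both cautions can be seen in a short sketch. The scratch directory and the file names abc.txt and abbc.txt are made up for illustration; the point is only that * means one thing to the shell and another to grep, and that the single quotes keep the shell from touching the pattern.

```shell
# Hypothetical scratch directory and file names, used only for illustration.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch abc.txt abbc.txt

# Shell file matching: * stands for any run of characters in a file name.
glob_matches=$(echo ab*.txt)

# Regular expression: * means "zero or more of the preceding character",
# so 'ab*c' matches an a, then any number of b's, then a c.
# The single quotes stop the shell from expanding * before grep sees it.
regex_matches=$(printf 'ac\nabc\nabbc\nabd\n' | grep 'ab*c')

echo "$glob_matches"
echo "$regex_matches"
```

The shell expands ab*.txt to the two file names, while grep uses the quoted pattern to select ac, abc, and abbc from its input.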
When using the programs in this book, such as grep and vi, you provide a regular expression that the program evaluates. The command will search for the pattern you supply. The pattern could be as simple as a string or it could be wildcards. The wildcards used by many programs are called meta-characters.
Table 6-1 lists the meta-characters, describes their use, and shows the program(s) to which they apply. Only the programs covered in this book (awk, grep, sed, and vi) are shown in Table 6-1. These meta-characters may also be used with other programs, such as ed and egrep, which are not covered in this book.
Meta Character | awk | grep | sed | vi | Use |
---|---|---|---|---|---|
. | Yes | Yes | Yes | Yes | Match any single character. |
* | Yes | Yes | Yes | Yes | Match any number of the single character that precedes *. |
[...] | Yes | Yes | Yes | Yes | Match any one of the characters in the set [...]. |
$ | Yes | Yes | Yes | Yes | Matches the end of the line. |
^ | Yes | Yes | Yes | Yes | Matches the beginning of the line. |
\ | Yes | Yes | Yes | Yes | Escape the special character that follows \. |
\{n,m\} | Yes | Yes | No | No | Match a range of occurrences of a single character between n and m. |
+ | Yes | No | No | No | Match one or more occurrences of the preceding regular expression. |
? | Yes | No | No | No | Match zero or one occurrence of the preceding regular expression. |
| | Yes | No | No | No | The preceding or following regular expression can be matched. |
() | Yes | No | No | No | Groups regular expressions in a typical parenthesis fashion. |
\< \> | No | No | No | Yes | Match a word's beginning or end. |
You may want to refer to this table when regular expressions are used for one of the commands in the table.
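A few of the meta-characters in the table can be tried with grep in a small sketch. The sample words below are made up for illustration, and only meta-characters that the table lists for grep are used.

```shell
# Made-up sample lines, used only for illustration.
sample=$(printf 'cat\ncot\ncoat\ncart\nc.t\n')

# ^ and $ anchor the pattern to the beginning and end of the line.
# . matches any single character: cat, cot, and c.t all match.
dot_count=$(printf '%s\n' "$sample" | grep -c '^c.t$')

# \ escapes the dot so that it matches only a literal period.
literal=$(printf '%s\n' "$sample" | grep '^c\.t$')

# [...] matches any one character from the set.
set_matches=$(printf '%s\n' "$sample" | grep '^c[ao]t$')

# * matches any number (including zero) of the preceding character.
star_matches=$(printf '%s\n' "$sample" | grep '^co*at$')

echo "$dot_count"
echo "$literal"
echo "$set_matches"
echo "$star_matches"
```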
Most of the editing performed on UNIX systems is done with vi. I have devoted a chapter to vi in this book, because of its prominence as a UNIX editor. Many times, we don't have the luxury of invoking vi when we need to edit a file. You may be writing a shell program or piping information between processes and need to edit in a non-interactive manner. sed can help here. Its name comes from stream editor, and it's a tool for filtering text files.
You can specify the name of the file you wish to edit with sed; otherwise, sed takes its input from standard input. sed reads one line at a time and applies the editing you specify to each line. You can also specify particular line numbers for sed to edit.
sed uses many of the same commands as ed. You can view some of the ed commands in the vi chapter, and I also supply a summary of these at the end of this sed section.
You can invoke sed in the following two ways:
    sed [-n] [-e] 'command' filename(s)
    sed [-n] -f scriptfile filename(s)
The first form of sed is for issuing commands on the command line. By default, sed will display all lines. The -n specifies that you want only to print lines that are specified with the p command.
If you supply more than one instruction on the command line, -e is used to inform sed that the next argument is an instruction.
The second form allows you to specify one or more scripts containing editing commands.
The following is a summary of the three options that appear in the two different forms of sed:
-n | Print only lines that are specified with the p command. |
-e command | The argument following -e is an editing command. |
-f filename | The argument following -f is a file containing editing commands. |
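The two forms can be compared side by side in a minimal sketch. The file names sample.txt and script.sed and the three-line input are made up for illustration.

```shell
# Hypothetical scratch directory and file names, used only for illustration.
work=$(mktemp -d)
printf 'one\ntwo\nthree\n' > "$work/sample.txt"

# First form: two instructions on the command line, each introduced by -e.
# -n suppresses the default output, so only lines named with p are printed.
cmdline_out=$(sed -n -e '1p' -e '3p' "$work/sample.txt")

# Second form: the same instructions stored in a script file, read with -f.
printf '1p\n3p\n' > "$work/script.sed"
script_out=$(sed -n -f "$work/script.sed" "$work/sample.txt")

echo "$cmdline_out"
echo "$script_out"
```

Both invocations print the first and third lines of the input.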
Let's view a couple of simple examples of what you can do with sed. These examples use some of the sed commands that appear at the end of this section. We'll use a file called passwd.test. We'll view this file with cat and then view only lines 16, 17, and 18 using the p command of sed:
    # cat passwd.test
    root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
    bin:*:1:1:bin:/bin:
    daemon:*:2:2:daemon:/sbin:
    adm:*:3:4:adm:/var/adm:
    lp:*:4:7:lp:/var/spool/lpd:
    sync:*:5:0:sync:/sbin:/bin/sync
    shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
    halt:*:7:0:halt:/sbin:/sbin/halt
    mail:*:8:12:mail:/var/spool/mail:
    news:*:9:13:news:/var/spool/news:
    uucp:*:10:14:uucp:/var/spool/uucp:
    operator:*:11:0:operator:/root:
    games:*:12:100:games:/usr/games:
    gopher:*:13:30:gopher:/usr/lib/gopher-data:
    ftp:*:14:50:FTP User:/home/ftp:
    man:*:15:15:Manuals Owner:/:
    nobody:*:65534:65534:Nobody:/:/bin/false
    col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
    #
    # sed 16,18p passwd.test
    root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
    bin:*:1:1:bin:/bin:
    daemon:*:2:2:daemon:/sbin:
    adm:*:3:4:adm:/var/adm:
    lp:*:4:7:lp:/var/spool/lpd:
    sync:*:5:0:sync:/sbin:/bin/sync
    shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
    halt:*:7:0:halt:/sbin:/sbin/halt
    mail:*:8:12:mail:/var/spool/mail:
    news:*:9:13:news:/var/spool/news:
    uucp:*:10:14:uucp:/var/spool/uucp:
    operator:*:11:0:operator:/root:
    games:*:12:100:games:/usr/games:
    gopher:*:13:30:gopher:/usr/lib/gopher-data:
    ftp:*:14:50:FTP User:/home/ftp:
    man:*:15:15:Manuals Owner:/:
    man:*:15:15:Manuals Owner:/:
    nobody:*:65534:65534:Nobody:/:/bin/false
    nobody:*:65534:65534:Nobody:/:/bin/false
    col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
    col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
    #
    # sed -n 16,18p passwd.test
    man:*:15:15:Manuals Owner:/:
    nobody:*:65534:65534:Nobody:/:/bin/false
    col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
The first attempt to print only lines 16, 17, and 18 results in all of the lines in the file being printed and lines 16, 17, and 18 being printed twice. The reason is that sed reads each line of input, acts on it, and by default prints it, so the p command prints the selected lines a second time. To print only the selected lines, we use the -n switch to suppress the default output. We then specify the lines we want to print, and only these go to standard output.
Now that we know how to view lines 16, 17, and 18 of the file, let's again view passwd.test and delete those same three lines with d:
    # cat passwd.test
    root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
    bin:*:1:1:bin:/bin:
    daemon:*:2:2:daemon:/sbin:
    adm:*:3:4:adm:/var/adm:
    lp:*:4:7:lp:/var/spool/lpd:
    sync:*:5:0:sync:/sbin:/bin/sync
    shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
    halt:*:7:0:halt:/sbin:/sbin/halt
    mail:*:8:12:mail:/var/spool/mail:
    news:*:9:13:news:/var/spool/news:
    uucp:*:10:14:uucp:/var/spool/uucp:
    operator:*:11:0:operator:/root:
    games:*:12:100:games:/usr/games:
    gopher:*:13:30:gopher:/usr/lib/gopher-data:
    ftp:*:14:50:FTP User:/home/ftp:
    man:*:15:15:Manuals Owner:/:
    nobody:*:65534:65534:Nobody:/:/bin/false
    col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
    #
    # sed 16,18d passwd.test
    root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
    bin:*:1:1:bin:/bin:
    daemon:*:2:2:daemon:/sbin:
    adm:*:3:4:adm:/var/adm:
    lp:*:4:7:lp:/var/spool/lpd:
    sync:*:5:0:sync:/sbin:/bin/sync
    shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
    halt:*:7:0:halt:/sbin:/sbin/halt
    mail:*:8:12:mail:/var/spool/mail:
    news:*:9:13:news:/var/spool/news:
    uucp:*:10:14:uucp:/var/spool/uucp:
    operator:*:11:0:operator:/root:
    games:*:12:100:games:/usr/games:
    gopher:*:13:30:gopher:/usr/lib/gopher-data:
    ftp:*:14:50:FTP User:/home/ftp:
As a rule, enclose any sed command that contains special characters in single quotes so that it is passed directly to sed, unmodified and uninterpreted by the shell; 16,18d contains no special characters, so it works here without quotes. In this example, we specify the range of lines to delete, 16 through 18, and the d for delete. We could specify just one line to delete, such as 16, and not specify an entire range. Because we did not redirect the output as part of the sed command line, the result is sent to standard output. The original file remains intact.
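If you want to keep the edited result, redirect the output of sed to a new file. The following is a minimal sketch; the file names original.txt and edited.txt and their contents are made up for illustration. A different output name is used because redirecting to the input file itself would truncate it before sed reads it.

```shell
# Hypothetical scratch directory and file names, used only for illustration.
work=$(mktemp -d)
printf 'line1\nline2\nline3\nline4\n' > "$work/original.txt"

# Delete lines 2 and 3 and redirect the result to a new file.
sed '2,3d' "$work/original.txt" > "$work/edited.txt"

edited=$(cat "$work/edited.txt")       # contents of the new file
original=$(cat "$work/original.txt")   # the input file is untouched

echo "$edited"
```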
We could search for a pattern in a file and delete only those lines containing the pattern. The following example shows searching for bash and deleting the lines that contain bash:
    # cat passwd.test
    root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
    bin:*:1:1:bin:/bin:
    daemon:*:2:2:daemon:/sbin:
    adm:*:3:4:adm:/var/adm:
    lp:*:4:7:lp:/var/spool/lpd:
    sync:*:5:0:sync:/sbin:/bin/sync
    shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
    halt:*:7:0:halt:/sbin:/sbin/halt
    mail:*:8:12:mail:/var/spool/mail:
    news:*:9:13:news:/var/spool/news:
    uucp:*:10:14:uucp:/var/spool/uucp:
    operator:*:11:0:operator:/root:
    games:*:12:100:games:/usr/games:
    gopher:*:13:30:gopher:/usr/lib/gopher-data:
    ftp:*:14:50:FTP User:/home/ftp:
    man:*:15:15:Manuals Owner:/:
    nobody:*:65534:65534:Nobody:/:/bin/false
    col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
    #
    # sed '/bash/ d' passwd.test
    bin:*:1:1:bin:/bin:
    daemon:*:2:2:daemon:/sbin:
    adm:*:3:4:adm:/var/adm:
    lp:*:4:7:lp:/var/spool/lpd:
    sync:*:5:0:sync:/sbin:/bin/sync
    shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
    halt:*:7:0:halt:/sbin:/sbin/halt
    mail:*:8:12:mail:/var/spool/mail:
    news:*:9:13:news:/var/spool/news:
    uucp:*:10:14:uucp:/var/spool/uucp:
    operator:*:11:0:operator:/root:
    games:*:12:100:games:/usr/games:
    gopher:*:13:30:gopher:/usr/lib/gopher-data:
    ftp:*:14:50:FTP User:/home/ftp:
    man:*:15:15:Manuals Owner:/:
    nobody:*:65534:65534:Nobody:/:/bin/false
Both lines containing bash were deleted from passwd.test (the root line and the col line).
As I mentioned earlier, it is a good idea to use single quotes around all regular expressions. In this example, I enclosed in single quotes both the pattern for which I was searching and the command to execute.
What if you wanted to delete all lines except those that contain bash? You would insert an exclamation mark before the d to delete all lines except those that contain bash, as shown in the following example:
    # cat passwd.test
    root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
    bin:*:1:1:bin:/bin:
    daemon:*:2:2:daemon:/sbin:
    adm:*:3:4:adm:/var/adm:
    lp:*:4:7:lp:/var/spool/lpd:
    sync:*:5:0:sync:/sbin:/bin/sync
    shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
    halt:*:7:0:halt:/sbin:/sbin/halt
    mail:*:8:12:mail:/var/spool/mail:
    news:*:9:13:news:/var/spool/news:
    uucp:*:10:14:uucp:/var/spool/uucp:
    operator:*:11:0:operator:/root:
    games:*:12:100:games:/usr/games:
    gopher:*:13:30:gopher:/usr/lib/gopher-data:
    ftp:*:14:50:FTP User:/home/ftp:
    man:*:15:15:Manuals Owner:/:
    nobody:*:65534:65534:Nobody:/:/bin/false
    col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
    #
    # sed '/bash/ !d' passwd.test
    root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
    col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
This deletes all but the two lines containing bash from the output; passwd.test itself is unchanged.
Now that we have seen how to display and delete specific lines of the file, let's see how to add three lines to the end of the file:
    # sed '$a\
    > This is a backup of passwd file\
    > for viewing purposes only\
    > so do not modify' passwd.test
    root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
    bin:*:1:1:bin:/bin:
    daemon:*:2:2:daemon:/sbin:
    adm:*:3:4:adm:/var/adm:
    lp:*:4:7:lp:/var/spool/lpd:
    sync:*:5:0:sync:/sbin:/bin/sync
    shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
    halt:*:7:0:halt:/sbin:/sbin/halt
    mail:*:8:12:mail:/var/spool/mail:
    news:*:9:13:news:/var/spool/news:
    uucp:*:10:14:uucp:/var/spool/uucp:
    operator:*:11:0:operator:/root:
    games:*:12:100:games:/usr/games:
    gopher:*:13:30:gopher:/usr/lib/gopher-data:
    ftp:*:14:50:FTP User:/home/ftp:
    man:*:15:15:Manuals Owner:/:
    nobody:*:65534:65534:Nobody:/:/bin/false
    col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
    This is a backup of passwd file
    for viewing purposes only
    so do not modify
The backslashes (\) are used liberally in this example. Each backslash at the end of a line tells sed that the text continues on the next line. We go to the end of the file, as designated by the $, use a to append, and then supply the text we wish to add, one line at a time. These lines are great to add to the end of the file, but we really should add them to the beginning of the file. The following example shows this approach:
    # sed '1i\
    > This is a backup passwd file\
    > for viewing purposes only\
    > so do not modify' passwd.test
    This is a backup passwd file
    for viewing purposes only
    so do not modify
    root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
    bin:*:1:1:bin:/bin:
    daemon:*:2:2:daemon:/sbin:
    adm:*:3:4:adm:/var/adm:
    lp:*:4:7:lp:/var/spool/lpd:
    sync:*:5:0:sync:/sbin:/bin/sync
    shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
    halt:*:7:0:halt:/sbin:/sbin/halt
    mail:*:8:12:mail:/var/spool/mail:
    news:*:9:13:news:/var/spool/news:
    uucp:*:10:14:uucp:/var/spool/uucp:
    operator:*:11:0:operator:/root:
    games:*:12:100:games:/usr/games:
    gopher:*:13:30:gopher:/usr/lib/gopher-data:
    ftp:*:14:50:FTP User:/home/ftp:
    man:*:15:15:Manuals Owner:/:
    nobody:*:65534:65534:Nobody:/:/bin/false
    col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
First, we run sed, specifying that on line one we are going to begin inserting the text shown. We use the single quote immediately following sed and use another single quote on the last line when we are done specifying all the information, except for the input file, which is passwd.test.
We have only scratched the surface of commands you can use with sed. The following sed summary includes the commands we have used (p for print; d for delete; a for append; and i for insert), as well as others that were not part of the examples.
sed - Stream editor.
awk can pretty much do it all. With awk, you can search, modify files, generate reports, and a lot more. awk performs these tasks by searching for patterns in lines of input (from standard input or from a file). For each line that matches the specified pattern, it can perform some very complex processing on that line. The code to actually process matching lines of input is a cross between a shell script and a C program.
Data manipulation tasks that would be very complex with combinations of grep, cut, and paste are very easily done with awk. Because awk is a programming language, it can also perform mathematical operations or check the input very easily (shells don't do math very well). It can even do floating-point math (shells deal only with integers and strings).
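As a small illustration of the floating-point math mentioned above, here is a minimal sketch that averages a column of numbers. The "name size" records are made up for illustration, and the END block, which runs after the last line of input, is an awk feature beyond what this section covers.

```shell
# Made-up "name size" records, used only for illustration.
average=$(printf 'a.log 10\nb.log 25\nc.log 4\n' |
    awk '{ total += $2 } END { printf "%.2f\n", total / NR }')

# total holds the sum of the second field; NR is the number of lines read.
echo "$average"
```

The sum of the sizes is 39 and there are three records, so the result is 13.00.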
The basic form of an awk program looks like this:
    awk '/pattern_to_match/ {prog to run}' input_file_names
Notice that the whole program is enclosed in single quotes. If no input file names are specified, awk reads from standard input (as from a pipe).
The pattern_to_match must appear between the / characters. The pattern is actually a regular expression. Regular expressions were covered earlier in this chapter. Some common regular expression examples are included shortly.
The program to execute is written in awk code, which looks something like C. The program is executed whenever a line of input matches the pattern_to_match. If /pattern_to_match/ does not precede the program in {}, then the program is executed for every line of input.
awk works with fields of the input lines. Fields are words separated by white space or some other field separator. awk uses white space as a field separator by default. You can use the -F option to specify the field separator, as shown in a later example. The fields in awk patterns and programs are referenced with $, followed by the field number. For example, the second field of an input line is $2. If you are using an awk command in your shell programs, the fields ($1, $2, etc.) are not confused with the shell script's positional parameters because the awk variables are enclosed in single quotes, causing the shell to ignore them.
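The points about fields can be tried in a short sketch. The input strings and the value shell_argument are made up for illustration.

```shell
# White space is the default separator, so $2 is the second word.
second=$(echo 'alpha beta gamma' | awk '{ print $2 }')

# -F: switches the separator to a colon, as in a passwd-style record.
name_uid=$(echo 'root:x:0:0' | awk -F: '{ print $1, $3 }')

# Give the shell its own positional parameter $1, then show that the
# single-quoted awk program still sees awk's first field, not the shell's.
set -- shell_argument
awk_first=$(echo 'awk_field other' | awk '{ print $1 }')

echo "$second"
echo "$name_uid"
echo "$awk_first"
```

The last command prints awk_field rather than shell_argument because the single quotes keep the shell from substituting its own $1.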
You really need to see some examples of using awk to appreciate its power. The following few examples use a file called newfiles, which contains a list of files on a system less than 15 days old. This file is generated as part of a system administration audit program that checks various aspects of a UNIX system. The following shows the contents of newfiles:
    # cat newfiles
    PROG>>>>> report of files not older than 14 days by find the file system is /
    -rw-r--r-- 1 root root 567 Dec 7 07:16 ./etc/mnttab
    -rw-r--r-- 1 root root 20713 Dec 7 07:18 ./etc/rc.log
    -rw-r--r-- 1 root root 0 Dec 7 07:17 ./etc/hpC2400/hparray.map
    -rw-r--r-- 1 root root 0 Dec 7 07:17 ./etc/hpC2400/hparray.devs
    -rw-r--r-- 1 root root 0 Dec 7 07:17 ./etc/hpC2400/hparray.luns
    -rw-r--r-- 1 root root 0 Dec 7 07:17 ./etc/hpC2400/hparray.addr
    -r-s------ 1 root root 0 Dec 7 07:17 ./etc/hpC2400/pscan.lock
    -r-s------ 1 root root 0 Dec 7 07:17 ./etc/hpC2400/monitor.lock
    -rw-r--r-- 1 root root 14299 Dec 7 07:17 ./etc/hpC2400/HPARRAY.INFO
    -rw-r--r-- 1 bin bin 8553 Dec 7 07:02 ./etc/shutdownlog
    -rw-r--r-- 1 root mail 32768 Dec 7 07:16 ./etc/mail/aliases.db
    -rw-r--r-- 1 root mail 33 Dec 7 07:16 ./etc/mail/sendmail.pid
    -rw-r--r-- 1 root root 13 Dec 7 07:16 ./etc/opt/dce/boot_time
    -rw-r--r-- 1 root root 720 Dec 7 13:34 ./etc/utmp
    -rw-r--r-- 1 root root 0 Dec 7 07:16 ./etc/xtab
    -rw-r--r-- 1 root root 0 Dec 7 07:18 ./etc/rmtab
    -rw-r--r-- 1 root root 40814 Dec 7 07:15 ./etc/rc.log.old
    -rw-r--r-- 1 root root 4620 Dec 7 13:34 ./etc/utmpx
    -rw-r--r-- 1 root root 9 Dec 7 13:17 ./etc/ntp.drift
    -rw-r--r-- 1 root root 616 Dec 7 07:15 ./etc/auto_parms.log
    -rw-r--r-- 1 root sys 219 Dec 7 07:00 ./etc/auto_parms.log.old
    -rw-rw-rw- 1 root sys 520 Nov 23 12:37 ./.sw/sessions/swlist.last
    -r--r--r-- 1 root informix 76 Dec 7 07:17 ./INFORMIXTMP/.inf.shmPSREP
    -r--r--r-- 1 root informix 76 Dec 7 07:18 ./INFORMIXTMP/.inf.shmPSDEV
    -rw------- 1 autosys autosys 4052 Nov 25 14:08 ./home/autosys/.sh_history
    -rw------- 1 tsaxs users 2228 Dec 1 13:15 ./home/tsaxs/.sh_history
    -rw------- 1 tsfxo users 2862 Nov 24 10:08 ./home/tsfxo/.sh_history
    PROG>>>>> report of files not older than 14 days by find the file system is /usr
    -rw-rw-rw- 1 opop6 users 21 Dec 7 13:46 ./local/adm/etc/lmonitor.hst
    -rw-r--r-- 1 tsgjf users 1093 Dec 7 13:17 ./local/flexlm/licenses/license.log
    PROG>>>>> report of files not older than 14 days by find the file system is /opt
    -rw-rw-r-- 1 bin bin 200 Dec 7 07:17 ./pred/bin/OPSDBPF
    -rw-r--r-- 1 root sys 800028 Dec 7 07:17 ./pred/bin/PSRNLOGD
    PROG>>>>> report of files not older than 14 days by find the file system is /var
    -rw-r--r-- 1 root sys 45089 Dec 7 07:16 ./adm/sw/swagentd.log
    -rw-rw-rw- 1 root sys 56 Dec 7 07:16 ./adm/sw/sessions/swlist.last
    -rw-rw-r-- 1 root root 12236 Dec 7 07:16 ./adm/ps_data
    -rw-r--r-- 1 root root 65 Dec 7 07:17 ./adm/cron/log
    -rw-r--r-- root root 162 Dec 7 07:00 ./adm/cron/OLDlog
    -r--r--r-- 1 root root 734143 Dec 7 07:16 ./adm/syslog/mail.log
    -rw-r--r-- 1 root root 65743 Dec 7 13:56 ./adm/syslog/syslog.log
    -rw-r--r-- 1 root root 4924974 Dec 7 07:02 ./adm/syslog/OLDsyslog.log
    -rw-rw-r-- 1 adm adm 2750700 Dec 7 13:52 ./adm/wtmp
    -rw------- 1 root other 145920 Dec 3 14:36 ./adm/btmp
    -rw-r--r-- 1 lp lp 33 Dec 7 07:17 ./adm/lp/log
    -rw-r--r-- 1 lp lp 67 Dec 7 07:01 ./adm/lp/oldlog
    -rw-r--r-- 1 root root 4330 Dec 7 07:18 ./adm/diag/device_table
    -rw-r--r-- 1 root root 34 Dec 7 07:18 ./adm/diag/misc_sys_data
    -rwxr-xr-x 1 root root 995368 Nov 22 15:16 ./adm/diag/LOG0190
    -rwxr-xr-x 1 root root 995368 Nov 23 02:05 ./adm/diag/LOG0191
    -rwxr-xr-x 1 root root 453964 Nov 23 07:01 ./adm/diag/LOG0192
    -rwxr-xr-x 1 root root 970448 Nov 23 18:35 ./adm/diag/LOG0193
    -rwxr-xr-x 1 root root 995368 Nov 24 05:24 ./adm/diag/LOG0194
    -rwxr-xr-x 1 root root 995368 Nov 24 16:14 ./adm/diag/LOG0195
    -rwxr-xr-x 1 root root 995368 Nov 25 03:03 ./adm/diag/LOG0196
    -rwxr-xr-x 1 root root 995368 Nov 25 13:52 ./adm/diag/LOG0197
    -rwxr-xr-x 1 root root 995368 Nov 26 00:41 ./adm/diag/LOG0198
    -rwxr-xr-x 1 root root 995368 Nov 26 11:31 ./adm/diag/LOG0199
    -rwxr-xr-x 1 root root 995368 Nov 26 22:20 ./adm/diag/LOG0200
    -rwxr-xr-x 1 root root 995368 Nov 27 09:09 ./adm/diag/LOG0201
    -rwxr-xr-x 1 root root 995368 Nov 27 19:58 ./adm/diag/LOG0202
    -rwxr-xr-x 1 root root 995368 Nov 28 06:48 ./adm/diag/LOG0203
    -rwxr-xr-x 1 root root 995368 Nov 28 17:37 ./adm/diag/LOG0204
    -rwxr-xr-x 1 root root 995368 Nov 29 04:26 ./adm/diag/LOG0205
    -rwxr-xr-x 1 root root 995368 Nov 29 15:16 ./adm/diag/LOG0206
    -rwxr-xr-x 1 root root 995368 Nov 30 02:05 ./adm/diag/LOG0207
    -rwxr-xr-x 1 root root 452020 Nov 30 06:59 ./adm/diag/LOG0208
    -rwxr-xr-x 1 root root 970448 Nov 30 18:35 ./adm/diag/LOG0209
    -rwxr-xr-x 1 root root 995368 Dec 1 05:24 ./adm/diag/LOG0210
    -rwxr-xr-x 1 root root 995368 Dec 1 16:13 ./adm/diag/LOG0211
    -rwxr-xr-x 1 root root 995368 Dec 2 03:03 ./adm/diag/LOG0212
    -rwxr-xr-x 1 root root 995368 Dec 2 13:52 ./adm/diag/LOG0213
    -rwxr-xr-x 1 root root 995368 Dec 3 00:41 ./adm/diag/LOG0214
    -rwxr-xr-x 1 root root 995368 Dec 3 11:31 ./adm/diag/LOG0215
    -rwxr-xr-x 1 root root 995368 Dec 3 22:20 ./adm/diag/LOG0216
    -rwxr-xr-x 1 root root 995368 Dec 4 09:09 ./adm/diag/LOG0217
    -rwxr-xr-x 1 root root 995368 Dec 4 19:58 ./adm/diag/LOG0218
    -rwxr-xr-x 1 root root 995368 Dec 5 06:48 ./adm/diag/LOG0219
    -rwxr-xr-x 1 root root 995368 Dec 5 17:37 ./adm/diag/LOG0220
    -rwxr-xr-x 1 root root 995368 Dec 6 04:26 ./adm/diag/LOG0221
    -rwxr-xr-x 1 root root 995368 Dec 6 15:15 ./adm/diag/LOG0222
    -rwxr-xr-x 1 root root 995368 Dec 7 02:05 ./adm/diag/LOG0223
    -rwxr-xr-x 1 root root 453964 Dec 7 07:00 ./adm/diag/LOG0224
    -rwxr-xr-x 1 root root 543740 Dec 7 13:57 ./adm/diag/LOG0225
    -rw-r--r-- 1 root root 19587 Dec 7 07:16 ./adm/ptydaemonlog
    -rw-r--r-- 1 root root 52 Dec 7 07:16 ./adm/conslog.opts
    -rw-r--r-- 1 root root 0 Dec 7 07:16 ./adm/rpc.statd.log
    -rw-r--r-- 1 root root 0 Dec 7 07:16 ./adm/rpc.lockd.log
    -rw-r--r-- 1 root root 24250 Dec 7 07:16 ./adm/vtdaemonlog
    -rw------- 1 root root 214 Dec 7 12:07 ./adm/sulog
    -rw------- 1 root root 381 Dec 3 17:34 ./adm/OLDsulog
    -rw-r--r-- 1 root sys 145 Dec 7 07:16 ./adm/rbootd.log
    -rw------- 1 sysadm psoft 60 Dec 1 16:59 ./tmp/EAAa09057
    -rw-r--r-- 1 tsgjf users 0 Dec 7 13:17 ./tmp/lockHPCUPLANGS
    -rw-r--r-- 1 tsgjf users 175 Dec 7 06:40 ./tmp/.flexlm/lmgrd.1507
    -rw-r--r-- 1 tsgjf users 175 Dec 7 13:28 ./tmp/.flexlm/lmgrd.1505
    -rw-r--r-- 1 lp lp 0 Dec 7 07:17 ./spool/lp/outputq
    -rw-rw-rw- 1 lp lp 4 Dec 7 07:17 ./spool/lp/SCHEDLOCK
    -rw------- 1 root sys 0 Nov 23 07:00 ./spool/cron/tmp/croutAAAa01030
    -rw------- 1 root sys 0 Nov 30 07:00 ./spool/cron/tmp/croutAAAa01039
    -rw------- 1 root sys 0 Dec 7 07:00 ./spool/cron/tmp/croutAAAb01039
    -rw-r--r-- 1 root root 4 Dec 7 07:16 ./run/syslog.pid
    -rw-r--r-- 1 root root 4 Dec 7 07:16 ./run/gated.pid
    -rw-r--r-- 1 root sys 145 Dec 7 07:16 ./run/gated.version
    -rw-r--r-- 1 root sys 3 Dec 7 07:16 ./statmon/state
    -rw-r--r-- 1 root root 29771 Dec 7 07:16 ./opt/dce/config/dce_config.log
    -rw-r--r-- 1 root sys 74 Dec 7 07:16 ./opt/dce/rpc/local/00404/srvr_socks
    -rw-r--r-- 1 root root 72 Dec 7 07:16 ./opt/dce/rpc/local/00927/srvr_socks
    -rw-r--r-- 1 root root 32768 Dec 7 07:16 ./opt/dce/dced/Ep.db
    -rw-r--r-- 1 root root 32768 Dec 7 07:20 ./opt/dce/dced/Llb.db
    -rw-r--r-- 1 root root 0 Nov 30 07:16 ./opt/perf/status.ttd
    -rw-r--r-- 1 root root 33 Dec 7 07:17 ./opt/perf/datafiles/RUN
    -rwxrwxrwx 1 root sys 9243180 Dec 7 13:55 ./opt/perf/datafiles/logappl
    -rwxrwxrwx 1 root sys 8697612 Dec 7 13:55 ./opt/perf/datafiles/logdev
    -rwxrwxrwx 1 root sys 9195152 Dec 7 13:55 ./opt/perf/datafiles/logglob
    -rwxrwxrwx 1 root sys 11112 Dec 7 07:17 ./opt/perf/datafiles/logindx
    -rwxrwxrwx 1 root sys 17639080 Dec 7 13:57 ./opt/perf/datafiles/logproc
    -rwxrwxrwx 1 root sys 3797 Dec 7 07:17 ./opt/perf/datafiles/mikslp.data
    -rw-rw-rw- 1 root sys 105 Nov 30 10:45 ./opt/perf/datafiles/agdb
    -rw-r--r-- 1 root root 5 Dec 7 07:17 ./opt/perf/datafiles/.perflbd.pid
    -rw-rw-rw- 1 root sys 21176 Dec 7 07:20 ./opt/perf/status.scope
    -rw-rw-rw- 1 root root 5 Nov 30 07:16 ./opt/perf/ttd.pid
    -rw-r--r-- 1 root root 0 Dec 7 07:17 ./opt/perf/status.mi
    -rw-rw-rw- 1 root sys 8254 Dec 7 07:17 ./opt/perf/status.perflbd
    -rw-rw-rw- 1 root sys 21507 Dec 7 07:20 ./opt/perf/status.rep_server
    -rw-rw-rw- 1 root sys 24570 Dec 7 07:20 ./opt/perf/status.alarmgen
    -rw-rw-rw- 1 root sys 160956 Dec 6 21:13 ./opt/omni/log/inet.log
    -rw-rw-rw- 1 root sys 158796 Dec 7 07:17 ./sam/log/samlog
    -rw-r--r-- 1 root root 64730 Dec 7 07:17 ./sam/boot.config
    -rw-rw-rw- 1 root sys 11906 Nov 24 14:27 ./sam/poe.iout
    -rw-rw-rw- 1 root sys 11906 Nov 23 09:10 ./sam/poe.iout.old
    -rw-rw-rw- 1 root sys 29 Nov 24 14:27 ./sam/poe.dion
You can see that this file contains several fields separated by white space. The next example evaluates the third field to determine whether it equals "adm"; if so, the line is printed:
    # awk '$3 == "adm" {print}' newfiles
    -rw-rw-r-- 1 adm adm 2750700 Dec 7 13:52 ./adm/wtmp
There is precisely one line that contains exactly "adm" in the third field.
The next example evaluates the third field to determine whether it matches "adm", meaning that the third field has "adm" embedded anywhere in it (the ~ operator performs a regular expression match); if so, the line is printed:
    # awk '$3 ~ "adm" {print}' newfiles
    -rw-rw-r-- 1 adm adm 2750700 Dec 7 13:52 ./adm/wtmp
    -rw------- 1 sysadm psoft 60 Dec 1 16:59 ./tmp/EAAa09057
This result prints the line from the last example, which has "adm" in the third field as well as a line that contains "sysadm."
The next example performs the same search as the previous example; however, this time only fields nine and five are printed:
    # awk '$3 ~ "adm" {print $9, $5}' newfiles
    ./adm/wtmp 2750700
    ./tmp/EAAa09057 60
This time only the name of the file, field nine, and the size of the file, field five, are printed.
The next example evaluates the third field to determine whether it does not equal "root"; if so, the entire line is printed:
    # awk '$3 != "root" {print}' newfiles
    PROG>>>>> report of files not older than 14 days by find the file system is /
    -rw-r--r-- 1 bin bin 8553 Dec 7 07:02 ./etc/shutdownlog
    -rw------- 1 autosys autosys 4052 Nov 25 14:08 ./home/autosys/.sh_history
    -rw------- 1 tsaxs users 2228 Dec 1 13:15 ./home/tsaxs/.sh_history
    -rw------- 1 tsfxo users 2862 Nov 24 10:08 ./home/tsfxo/.sh_history
    PROG>>>>> report of files not older than 14 days by find the file system is /usr
    -rw-rw-rw- 1 opop6 users 21 Dec 7 13:46 ./local/adm/etc/lmonitor.hst
    -rw-r--r-- 1 tsgjf users 1093 Dec 7 13:17 ./local/flexlm/licenses/license.log
    PROG>>>>> report of files not older than 14 days by find the file system is /opt
    -rw-rw-r-- 1 bin bin 200 Dec 7 07:17 ./pred/bin/OPSDBPF
    PROG>>>>> report of files not older than 14 days by find the file system is /var
    -rw-rw-r-- 1 adm adm 2750700 Dec 7 13:52 ./adm/wtmp
    -rw-r--r-- 1 lp lp 33 Dec 7 07:17 ./adm/lp/log
    -rw-r--r-- 1 lp lp 67 Dec 7 07:01 ./adm/lp/oldlog
    -rw------- 1 sysadm psoft 60 Dec 1 16:59 ./tmp/EAAa09057
    -rw-r--r-- 1 tsgjf users 0 Dec 7 13:17 ./tmp/lockHPCUPLANGS
    -rw-r--r-- 1 tsgjf users 175 Dec 7 06:40 ./tmp/.flexlm/lmgrd.1507
    -rw-r--r-- 1 tsgjf users 175 Dec 7 13:28 ./tmp/.flexlm/lmgrd.1505
    -rw-r--r-- 1 lp lp 0 Dec 7 07:17 ./spool/lp/outputq
    -rw-rw-rw- 1 lp lp 4 Dec 7 07:17 ./spool/lp/SCHEDLOCK
This command results in many lines being printed that do not have "root" in the third field.
newfiles had white space to separate the fields. We don't often have this luxury in the UNIX world. The upcoming examples use passwd.test, which has a colon (:) as a field separator. passwd.test is shown below:
    # cat passwd.test
    root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
    bin:*:1:1:bin:/bin:
    daemon:*:2:2:daemon:/sbin:
    adm:*:3:4:adm:/var/adm:
    lp:*:4:7:lp:/var/spool/lpd:
    sync:*:5:0:sync:/sbin:/bin/sync
    shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
    halt:*:7:0:halt:/sbin:/sbin/halt
    mail:*:8:12:mail:/var/spool/mail:
    news:*:9:13:news:/var/spool/news:
    uucp:*:10:14:uucp:/var/spool/uucp:
    operator:*:11:0:operator:/root:
    games:*:12:100:games:/usr/games:
    gopher:*:13:30:gopher:/usr/lib/gopher-data:
    ftp:*:14:50:FTP User:/home/ftp:
    man:*:15:15:Manuals Owner:/:
    nobody:*:65534:65534:Nobody:/:/bin/false
    col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
You can specify the field separator with the -F option followed by a separator, which is a colon (:) in passwd.test. The following example specifies the field separator and then evaluates the first field to determine whether it equals "root"; if so, it prints out the entire line:
    # awk -F: '$1 == "root" {print}' passwd.test
    root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
The following example specifies the field separator and then evaluates the fourth field to determine whether it equals "0", which means the user is a member of the same group as "root"; if so, it prints out the entire line:
    # awk -F: '$4 == "0" {print}' passwd.test
    root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
    sync:*:5:0:sync:/sbin:/bin/sync
    halt:*:7:0:halt:/sbin:/sbin/halt
    operator:*:11:0:operator:/root:
You can perform many types of comparisons besides == using awk. The following examples show the use of several comparison operators on our trusty passwd.test file. The first example prints all users who are in a group with a value less than 14:
    # awk -F: '$4 < 14 {print}' passwd.test
    root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
    bin:*:1:1:bin:/bin:
    daemon:*:2:2:daemon:/sbin:
    adm:*:3:4:adm:/var/adm:
    lp:*:4:7:lp:/var/spool/lpd:
    sync:*:5:0:sync:/sbin:/bin/sync
    shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
    halt:*:7:0:halt:/sbin:/sbin/halt
    mail:*:8:12:mail:/var/spool/mail:
    news:*:9:13:news:/var/spool/news:
    operator:*:11:0:operator:/root:
The next example prints all users who are in a group with a value less than or equal to 14:
    # awk -F: '$4 <= 14 {print}' passwd.test
    root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
    bin:*:1:1:bin:/bin:
    daemon:*:2:2:daemon:/sbin:
    adm:*:3:4:adm:/var/adm:
    lp:*:4:7:lp:/var/spool/lpd:
    sync:*:5:0:sync:/sbin:/bin/sync
    shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
    halt:*:7:0:halt:/sbin:/sbin/halt
    mail:*:8:12:mail:/var/spool/mail:
    news:*:9:13:news:/var/spool/news:
    uucp:*:10:14:uucp:/var/spool/uucp:
    operator:*:11:0:operator:/root:
Let's now print all users who are in a group that does not have a value of 14:
    # awk -F: '$4 != 14 {print}' passwd.test
    root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
    bin:*:1:1:bin:/bin:
    daemon:*:2:2:daemon:/sbin:
    adm:*:3:4:adm:/var/adm:
    lp:*:4:7:lp:/var/spool/lpd:
    sync:*:5:0:sync:/sbin:/bin/sync
    shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
    halt:*:7:0:halt:/sbin:/sbin/halt
    mail:*:8:12:mail:/var/spool/mail:
    news:*:9:13:news:/var/spool/news:
    operator:*:11:0:operator:/root:
    games:*:12:100:games:/usr/games:
    gopher:*:13:30:gopher:/usr/lib/gopher-data:
    ftp:*:14:50:FTP User:/home/ftp:
    man:*:15:15:Manuals Owner:/:
    nobody:*:65534:65534:Nobody:/:/bin/false
    col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
Let's now print all users who are in a group with a value greater than or equal to 14:
    # awk -F: '$4 >= 14 {print}' passwd.test
    uucp:*:10:14:uucp:/var/spool/uucp:
    games:*:12:100:games:/usr/games:
    gopher:*:13:30:gopher:/usr/lib/gopher-data:
    ftp:*:14:50:FTP User:/home/ftp:
    man:*:15:15:Manuals Owner:/:
    nobody:*:65534:65534:Nobody:/:/bin/false
    col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
The last example shows all users who are in a group with a value greater than 14:
# awk -F: '$4 > 14 {print}' passwd.test
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
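The comparisons in this section can be tried without touching a real password file. The following is a minimal sketch, assuming a made-up three-line sample in passwd format; the file contents and the variable names (sample, below, at_most, above) are illustrative and not part of the original examples. Note the single quotes around the awk program: without them the shell would expand $4 and treat < and > as redirection.

```shell
# Build a small, made-up sample in passwd format (field 4 is the group ID).
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
uucp:x:10:14:uucp:/var/spool/uucp:
games:x:12:100:games:/usr/games:
EOF

# Print only the user name (field 1) of each line that passes the comparison.
below=$(awk -F: '$4 < 14 {print $1}' "$sample")
at_most=$(awk -F: '$4 <= 14 {print $1}' "$sample")
above=$(awk -F: '$4 > 14 {print $1}' "$sample")

echo "group below 14:   $below"
echo "group at most 14:" $at_most
echo "group above 14:   $above"

rm -f "$sample"
```

Printing $1 rather than the whole line keeps the output short enough to compare the three operators at a glance.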
There is much more to awk than what I covered in this section. There are additional awk examples in the shell programming chapter.
The following table summarizes some of the comparison operators of awk covered in this section:
awk - Search a line for a specified pattern and perform operation(s).
Here in the information age, we have too much information. We are constantly trying to extract the information we are after from stacks of information. The grep command is used to search for text and display it. The name grep comes from the ed editor command g/re/p, which globally searches for a regular expression and prints the matching lines. Let's first look at a few simple searches and display the output with grep. Figure 6-1 shows a long listing of /home/denise in which we use grep to search for patterns.
First, we search for the pattern netscape. This produces a list of files, all of which begin with .netscape.
Next we use the -c option to count the lines in which netscape is found. The result is 6.
Do you think that grep is case-sensitive? The next example shows searching for the same pattern typed with different capitalization, and no matches are found.
Using the -i option causes grep to ignore the distinction between uppercase and lowercase when searching for the pattern, and again, all the original matches are found.
Also, more than one pattern can be searched for. Using the -F option, both netscape and .c are searched for, and a longer list of matches is found. Notice that the two patterns to search for are enclosed in double quotes and are separated by a new line.
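The options just described can be sketched against a small stand-in for the directory listing. The file names below are made up for illustration and are not the contents of /home/denise; the variable names are likewise hypothetical.

```shell
# A made-up stand-in for the directory listing; the names are illustrative only.
listing=$(mktemp)
cat > "$listing" <<'EOF'
.netscape
.netscape-cache
hello.c
notes.txt
EOF

count=$(grep -c netscape "$listing")              # count of matching lines
upper=$(grep -c NETSCAPE "$listing" || true)      # case-sensitive, so the count is 0
anycase=$(grep -ic NETSCAPE "$listing")           # -i ignores case
# -F takes fixed strings; the newline inside the quotes separates the two patterns.
both=$(grep -F 'netscape
.c' "$listing")

echo "netscape: $count  NETSCAPE: $upper  NETSCAPE with -i: $anycase"
echo "$both"

rm -f "$listing"
```

Because -F treats its patterns as fixed strings, the period in .c matches only a literal period here, not any character.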
Let's now take a look at a couple of more advanced searches using grep. We'll use the passwd.test file as the basis for our searches because each line in it contains a lot of information. To start, the following is the contents of the passwd.test file on a Linux system:
# cat passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
We can search for a string in the password file just as we did in the earlier grep example. The following example searches for news in the passwd.test file:
# grep news passwd.test
news:*:9:13:news:/var/spool/news:
Now let's check to see whether there is a user named bin in the passwd.test file. In order for a user named bin to have an entry in the passwd.test file, the user name, in this case bin, would be the first entry in the line. Here is the result of searching for this user:
# grep bin passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
Many lines from passwd.test that contain the string bin are indeed produced; however, we have to search through these lines to find the user bin, which is the line in which bin is the first string that appears. This is more than we wanted when we initiated our search. We wanted to see a user named bin, which would appear at the beginning of a line. We can further qualify our search, in this case to limit the search to a string at the beginning of a line, by using the pattern matching discussed at the beginning of this chapter (see Table 6-1). In this case, we want to search only at the beginning of a line for bin, so we'll qualify our search with a caret (^) to restrict the search to the beginning of the line, as shown in the following example:
# grep ^bin passwd.test
bin:*:1:1:bin:/bin:
This search results in exactly the information in which we are interested, that is, a line beginning with bin. When using special characters, such as the caret (^) in this example, you should enclose the special characters in single quotes ('). Special characters may be interpreted by the shell and cause problems with the arguments we're trying to send to grep. Enclosing the search pattern in single quotes ensures that the search pattern, in this case ^bin, is passed directly to grep. The search pattern in single quotes looks like the following:
# grep '^bin' passwd.test
bin:*:1:1:bin:/bin:
Because we are going to have to search for this line in the passwd.test file after we find it, we may as well print out the line number as well as the line itself by using the -n option, as shown in the following example:
# grep -n '^bin' passwd.test
2:bin:*:1:1:bin:/bin:
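The difference between the unanchored and anchored searches can be sketched on a made-up three-line sample in passwd format. The sample contents and the variable names are assumptions for illustration, not the passwd.test file itself.

```shell
# Made-up sample: "bin" appears somewhere in every line, but starts only one.
sample=$(mktemp)
cat > "$sample" <<'EOF'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:
daemon:x:2:2:daemon:/sbin:
EOF

loose=$(grep -c bin "$sample")          # lines containing "bin" anywhere
anchored=$(grep -n '^bin' "$sample")    # lines starting with "bin", with line numbers

echo "lines containing bin: $loose"
echo "lines starting with bin: $anchored"

rm -f "$sample"
```

A user name such as binder would also match ^bin. Adding the field separator to the pattern, as in '^bin:', ties the match to the complete user name.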
The following is a summary of the grep command:
grep - Search for text and display results.
The following are the HP-UX manual pages for many of the commands used in the chapter. Commands often differ among UNIX variants, so you may find differences in the options or other areas for some commands; however, the following manual pages serve as an excellent reference.
awk - Pattern-processing language.
awk(1) awk(1) NAME awk - pattern-directed scanning and processing language SYNOPSIS awk [-Ffs] [-v var=value] [program | -f progfile ...] [file ...] DESCRIPTION awk scans each input file for lines that match any of a set of patterns specified literally in program or in one or more files specified as -f progfile. With each pattern there can be an associated action that is to be performed when a line in a file matches the pattern. Each line is matched against the pattern portion of every pattern-action statement, and the associated action is performed for each matched pattern. The file name - means the standard input. Any file of the form var=value is treated as an assignment, not a filename. An assignment is evaluated at the time it would have been opened if it were a filename, unless the -v option is used. An input line is made up of fields separated by white space, or by regular expression FS. The fields are denoted $1, $2, ...; $0 refers to the entire line. Options awk recognizes the following options and arguments: -F fs Specify regular expression used to separate fields. The default is to recognize space and tab characters, and to discard leading spaces and tabs. If the -F option is used, leading input field separators are no longer discarded. -f progfile Specify an awk program file. Up to 100 program files can be specified. The pattern-action statements in these files are executed in the same order as the files were specified. -v var=value Cause var=value assignment to occur before the BEGIN action (if it exists) is executed. Statements A pattern-action statement has the form: pattern { action } A missing { action } means print the line; a missing pattern always matches. Pattern-action statements are separated by new-lines or semicolons. An action is a sequence of statements. 
A statement can be one of the following: if(expression) statement [else statement] while(expression) statement for(expression;expression;expression) statement for(var in array) statement do statement while(expression) break continue {[statement ...]} expression # commonly var=expression print[expression-list] [> expression] printf format [, expression-list] [> expression] return [expression] next # skip remaining patterns on this input line. delete array [expression] # delete an array element. exit [expression] # exit immediately; status is expression. Statements are terminated by semicolons, newlines or right braces. An empty expression-list stands for $0. String constants are quoted (""), with the usual C escapes recognized within. Expressions take on string or numeric values as appropriate, and are built using the operators +, -, *, /, %, ^ (exponentiation), and concatenation (indicated by a blank). The operators ++, --, +=, -=, *=, /=, %=, ^=, **=, >, >=, <, <=, ==, !=, and ?: are also available in expressions. Variables can be scalars, array elements (denoted x[i]) or fields. Variables are initialized to the null string. Array subscripts can be any string, not necessarily numeric (this allows for a form of associative memory). Multiple subscripts such as [i,j,k] are permitted. The constituents are concatenated, separated by the value of SUBSEP. The print statement prints its arguments on the standard output (or on a file if >file or >>file is present or on a pipe if |cmd is present), separated by the current output field separator, and terminated by the output record separator. file and cmd can be literal names or parenthesized expressions. Identical string values in different statements denote the same open file. The printf statement formats its expression list according to the format (see printf(3)). 
Built-In Functions The built-in function close(expr) closes the file or pipe expr opened by a print or printf statement or a call to getline with the same string-valued expr. This function returns zero if successful, otherwise, it returns non-zero. The customary functions exp, log, sqrt, sin, cos, atan2 are built in. Other built-in functions are: blength[([s])] Length of its associated argument (in bytes) taken as a string, or of $0 if no argument. length[([s])] Length of its associated argument (in characters) taken as a string, or of $0 if no argument. rand() Returns a random number between zero and one. srand([expr]) Sets the seed value for rand, and returns the previous seed value. If no argument is given, the time of day is used as the seed value; otherwise, expr is used. int(x) Truncates to an integer value substr(s, m[, n]) Return the at most n-character substring of s that begins at position m, numbering from 1. If n is omitted, the substring is limited by the length of string s. index(s, t) Return the position, in characters, numbering from 1, in string s where string t first occurs, or zero if it does not occur at all. match(s, ere) Return the position, in characters, numbering from 1, in string s where the extended regular expression ere occurs, or 0 if it does not. The variables RSTART and RLENGTH are set to the position and length of the matched string. split(s, a[, fs]) Splits the string s into array elements a[1], a[2], ..., a[n], and returns n. The separation is done with the regular expression fs, or with the field separator FS if fs is not given. sub(ere, repl [, in]) Substitutes repl for the first occurrence of the extended regular expression ere in the string in. If in is not given, $0 is used. gsub Same as sub except that all occurrences of the regular expression are replaced; sub and gsub return the number of replacements. sprintf(fmt, expr, ...) String resulting from formatting expr ... 
according to the printf(3S) format fmt system(cmd) Executes cmd and returns its exit status toupper(s) Converts the argument string s to uppercase and returns the result. tolower(s) Converts the argument string s to lowercase and returns the result. The built-in function getline sets $0 to the next input record from the current input file; getline < file sets $0 to the next record from file. getline x sets variable x instead. Finally, cmd | getline pipes the output of cmd into getline; each call of getline returns the next line of output from cmd. In all cases, getline returns 1 for a successful input, 0 for end of file, and -1 for an error. Patterns Patterns are arbitrary Boolean combinations (with ! || &&) of regular expressions and relational expressions. awk supports Extended Regular Expressions as described in regexp(5). Isolated regular expressions in a pattern apply to the entire line. Regular expressions can also occur in relational expressions, using the operators ~ and !~. /re/ is a constant regular expression; any string (constant or variable) can be used as a regular expression, except in the position of an isolated regular expression in a pattern. A pattern can consist of two patterns separated by a comma; in this case, the action is performed for all lines from an occurrence of the first pattern though an occurrence of the second. A relational expression is one of the following: expression matchop regular-expression expression relop expression expression in array-name (expr,expr,...) in array-name where a relop is any of the six relational operators in C, and a matchop is either ~ (matches) or !~ (does not match). A conditional is an arithmetic expression, a relational expression, or a Boolean combination of the two. The special patterns BEGIN and END can be used to capture control before the first input line is read and after the last. BEGIN and END do not combine with other patterns. 
Special Characters
The following special escape sequences are recognized by awk in both regular expressions and strings:
Escape    Meaning
\a        alert character
\b        backspace character
\f        form-feed character
\n        new-line character
\r        carriage-return character
\t        tab character
\v        vertical-tab character
\nnn      1- to 3-digit octal value nnn
\xhhh     1- to n-digit hexadecimal number
Variable Names
Variable names with special meanings are:
FS        Input field separator regular expression; a space character by default; also settable by option -Ffs.
NF        The number of fields in the current record.
NR        The ordinal number of the current record from the start of input. Inside a BEGIN action the value is zero. Inside an END action the value is the number of the last record processed.
FNR       The ordinal number of the current record in the current file. Inside a BEGIN action the value is zero. Inside an END action the value is the number of the last record processed in the last file processed.
FILENAME  A pathname of the current input file.
RS        The input record separator; a newline character by default.
OFS       The print statement output field separator; a space character by default.
ORS       The print statement output record separator; a newline character by default.
OFMT      Output format for numbers (default %.6g). If the value of OFMT is not a floating-point format specification, the results are unspecified.
CONVFMT   Internal conversion format for numbers (default %.6g). If the value of CONVFMT is not a floating-point format specification, the results are unspecified.
SUBSEP    The subscript separator string for multi-dimensional arrays; the default value is "\034"
ARGC      The number of elements in the ARGV array.
ARGV      An array of command line arguments, excluding options and the program argument numbered from zero to ARGC-1. The arguments in ARGV can be modified or added to; ARGC can be altered.
As each input file ends, awk will treat the next non-null element of ARGV, up to the current value of ARGC-1, inclusive, as the name of the next input file. Thus, setting an element of ARGV to null means that it will not be treated as an input file. The name - indicates the standard input. If an argument matches the format of an assignment operand, this argument will be treated as an assignment rather than a file argument. ENVIRON Array of environment variables; subscripts are names. For example, if environment variable V=thing, ENVIRON["V"] produces thing. RSTART The starting position of the string matched by the match function, numbering from 1. This is always equivalent to the return value of the match function. RLENGTH The length of the string matched by the match function. Functions can be defined (at the position of a pattern-action statement) as follows: function foo(a, b, c) { ...; return x } Parameters are passed by value if scalar, and by reference if array name. Functions can be called recursively. Parameters are local to the function; all other variables are global. Note that if pattern-action statements are used in an HP-UX command line as an argument to the awk command, the pattern-action statement must be enclosed in single quotes to protect it from the shell. For example, to print lines longer than 72 characters, the pattern-action statement as used in a script (-f progfile command form) is: length > 72 The same pattern action statement used as an argument to the awk command is quoted in this manner: awk 'length > 72' EXTERNAL INFLUENCES Environment Variables LANG Provides a default value for the internationalization variables that are unset or null. If LANG is unset or null, the default value of "C" (see lang(5)) is used. If any of the internationalization variables contains an invalid setting, awk will behave as if all internationalization variables are set to "C". See environ(5). 
LC_ALL If set to a non-empty string value, overrides the values of all the other internationalization variables. LC_CTYPE Determines the interpretation of text as single and/or multi-byte characters, the classification of characters as printable, and the characters matched by character class expressions in regular expressions. LC_NUMERIC Determines the radix character used when interpreting numeric input, performing conversion between numeric and string values and formatting numeric output. Regardless of locale, the period character (the decimal-point character of the POSIX locale) is the decimal-point character recognized in processing awk programs (including assignments in command-line arguments). LC_COLLATE Determines the locale for the behavior of ranges, equivalence classes and multi-character collating elements within regular expressions. LC_MESSAGES Determines the locale that should be used to affect the format and contents of diagnostic messages written to standard error and informative messages written to standard output. NLSPATH Determines the location of message catalogues for the processing of LC_MESSAGES. PATH Determines the search path when looking for commands executed by system(cmd), or input and output pipes. In addition, all environment variables will be visible via the awk variable ENVIRON. International Code Set Support Single- and multi-byte character code sets are supported except that variable names must contain only ASCII characters and regular expressions must contain only valid characters. DIAGNOSTICS awk supports up to 199 fields ($1, $2, ..., $199) per record. 
EXAMPLES
Print lines longer than 72 characters:
    length > 72
Print first two fields in opposite order:
    { print $2, $1 }
Same, with input fields separated by comma and/or blanks and tabs:
    BEGIN { FS = ",[ \t]*|[ \t]+" }
    { print $2, $1 }
Add up first column, print sum and average:
    { s += $1 }
    END { print "sum is", s, " average is", s/NR }
Print all lines between start/stop pairs:
    /start/, /stop/
Simulate echo command (see echo(1)):
    BEGIN { # Simulate echo(1)
    for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
    printf "\n"
    exit }
AUTHOR
awk was developed by AT&T, IBM, OSF, and HP.
SEE ALSO
lex(1), sed(1). A. V. Aho, B. W. Kernighan, P. J. Weinberger: The AWK Programming Language, Addison-Wesley, 1988.
STANDARDS CONFORMANCE
awk: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
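Two of the manual page examples above can be tried directly from the command line. The following is a minimal sketch with made-up input; the variable names (numbers, summary, swapped) are illustrative, and the average string is written without the leading space used in the manual page so that the output has single spaces throughout.

```shell
# Made-up numbers, one per line, so the first field is the whole line.
numbers=$(printf '10\n20\n30\n')

# s accumulates field 1 on every line; END runs once after the last line,
# where NR holds the number of records read.
summary=$(printf '%s\n' "$numbers" | awk '{ s += $1 } END { print "sum is", s, "average is", s/NR }')
echo "$summary"     # sum is 60 average is 20

# Print the first two fields of a line in opposite order.
swapped=$(echo 'first second' | awk '{ print $2, $1 }')
echo "$swapped"     # second first
```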
grep - Command to match a specified pattern.
grep(1) grep(1) NAME grep, egrep, fgrep - search a file for a pattern SYNOPSIS Plain call with pattern grep [-E|-F] [-c|-l|-q] [-insvx] pattern [file ...] Call with (multiple) -e pattern grep [-E|-F] [-c|-l|-q] [-binsvx] -e pattern... [-e pattern] ... [file ...] Call with -f file grep [-E|-F] [-c|-l|-q] [-insvx] [-f pattern_file] [file ...] Obsolescent: egrep [-cefilnsv] [expression] [file ...] fgrep [-cefilnsvx] [strings] [file ...] DESCRIPTION The grep command searches the input text files (standard input default) for lines matching a pattern. Normally, each line found is copied to the standard output. grep supports the Basic Regular Expression syntax (see regexp(5)). The -E option (egrep) supports Extended Regular Expression (ERE) syntax (see regexp(5)). The -F option (fgrep) searches for fixed strings using the fast Boyer-Moore string searching algorithm. The -E and -F options treat newlines embedded in the pattern as alternation characters. A null expression or string matches every line. The forms egrep and fgrep are maintained for backward compatibility. The use of the -E and -F options is recommended for portability. Options -E Extended regular expressions. Each pattern specified is a sequence of one or more EREs. The EREs can be separated by newline characters or given in separate -e expression options. A pattern matches an input line if any ERE in the sequence matches the contents of the input line without its trailing newline character. The same functionality is obtained by using egrep. -F Fixed strings. Each pattern specified is a sequence of one or more strings. Strings can be separated by newline characters or given in separate -e expression options. A pattern matches an input line if the line contains any of the strings in the sequence. The same functionality is obtained by using fgrep. -b Each line is preceded by the block number on which it was found. This is useful in locating disk block numbers by context. 
Block numbers are calculated by dividing by 512 the number of bytes that have been read from the file and rounding down the result. -c Only a count of matching lines is printed. -e expression Same as a simple expression argument, but useful when the expression begins with a hyphen (-). Multiple -e options can be used to specify multiple patterns; an input line is selected if it matches any of the specified patterns. -f pattern_file The regular expression (grep and grep -E) or strings list (grep -F) is taken from the pattern_file. -i Ignore uppercase/lowercase distinctions during comparisons. -l Only the names of files with matching lines are listed (once), separated by newlines. If standard input is searched, a path name of - is listed. -n Each line is preceded by its relative line number in the file starting at 1. The line number is reset for each file searched. This option is ignored if -c, -b, -l, or -q is specified. -q (Quiet) Do not write anything to the standard output, regardless of matching lines. Exit with zero status upon finding the first matching line. Overrides any options that would produce output. -s Error messages produced for nonexistent or unreadable files are suppressed. -v All lines but those matching are printed. -x (eXact) Matches are recognized only when the entire input line matches the fixed string or regular expression. In all cases in which output is generated, the file name is output if there is more than one input file. Care should be taken when using the characters $, *, [, ^, |, (, ), and \ in expression, because they are also meaningful to the shell. It is safest to enclose the entire expression argument in single quotes ('...'). EXTERNAL INFLUENCES Environment Variables LANG determines the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. If LANG is not specified or is set to the empty string, a default of C (see lang(5)) is used. 
LC_ALL determines the locale to use to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.
LC_COLLATE determines the collating sequence used in evaluating regular expressions.
LC_CTYPE determines the interpretation of text as single byte and/or multi-byte characters, the classification of characters as letters, the case information for the -i option, and the characters matched by character class expressions in regular expressions.
LC_MESSAGES determines the language in which messages are displayed.
If any internationalization variable contains an invalid setting, the commands behave as if all internationalization variables are set to C. See environ(5).
International Code Set Support
Single-byte and multi-byte character code sets are supported.
RETURN VALUE
Upon completion, grep returns one of the following values:
0    One or more matches found.
1    No match found.
2    Syntax error or inaccessible file (even if matches were found).
EXAMPLES
In the Bourne shell (sh(1)) the following example searches two files, finding all lines containing occurrences of any of four strings:
    grep -F 'if
    then
    else
    fi' file1 file2
Note that the single quotes are necessary to tell grep -F when the strings have ended and the file names have begun.
For the C shell (see csh(1)) the following command can be used:
    grep -F 'if\
    then\
    else\
    fi' file1 file2
To search a file named address containing the following entries:
    Ken 112 Warring St. Apt. A
    Judy 387 Bowditch Apt. 12
    Ann 429 Sixth St.
the command:
    grep Judy address
prints:
    Judy 387 Bowditch Apt. 12
To search a file for lines that contain either a Dec or Nov, use either of the following commands:
    grep -E '[Dd]ec|[Nn]ov' file
    egrep -i 'dec|nov' file
Search all files in the current directory for the string xyz:
    grep xyz *
Search all files in the current directory subtree for the string xyz, and ensure that no error occurs due to file name expansion exceeding system argument list limits:
    find . -type f -print |xargs grep xyz
The previous example does not print the name of files where string xyz appears. To force grep to print file names, add a second argument to the grep command portion of the command line:
    find . -type f -print |xargs grep xyz /dev/null
In this form, the first file name is that produced by find, and the second file name is the null file.
WARNINGS
(XPG4 only.) If the -q option is specified, the exit status will be zero if an input line is selected, even if an error was detected. Otherwise, default actions will be performed.
SEE ALSO
sed(1), sh(1), regcomp(3C), environ(5), lang(5), regexp(5).
STANDARDS CONFORMANCE
grep: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
egrep: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
fgrep: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
sed - Stream text editor.
sed(1) sed(1)
NAME
sed - stream text editor
SYNOPSIS
sed [-n] script [file ...]
sed [-n] [-e script] ... [-f script_file] ... [file ...]
DESCRIPTION
sed copies the named text files (standard input default) to the standard output, edited according to a script containing up to 100 commands. Only complete input lines are processed. Any input text at the end of a file that is not terminated by a new-line character is ignored.
Options
sed recognizes the following options:
-f script_file    Take script from file script_file.
-e script         Edit according to script. If there is just one -e option and no -f options, the flag -e can be omitted.
-n                Suppress the default output.
sed interprets all -escript and -fscript_file arguments in the order given. Use caution, if mixing -e and -f options, to avoid unpredictable or incorrect results.
Command Scripts
A script consists of editor commands, one per line, of the following form:
    [address [, address]] function [arguments]
In normal operation, sed cyclically copies a line of input into a pattern space (unless there is something left after a D command), applies in sequence all commands whose addresses select that pattern space, and, at the end of the script, copies the pattern space to the standard output (except under -n) and deletes the pattern space. Some of the commands use a hold space to save all or part of the pattern space for subsequent retrieval.
Command Addresses
An address is either a decimal number that counts input lines cumulatively across files, a $ which addresses the last line of input, or a context address; that is, a /regular expression/ in the style of ed(1) modified thus:
- In a context address, the construction \?regular expression?, where ? is any character, is identical to /regular expression/. Note that in the context address \xabc\xdefx, the second x stands for itself, so that the regular expression is abcxdef.
- The escape sequence \n matches a new-line character embedded in the pattern space.
- A period (.)
matches any character except the terminal new line of the pattern space. - A command line with no addresses selects every pattern space. - A command line with one address selects each pattern space that matches the address. - A command line with two addresses selects the inclusive range from the first pattern space that matches the first address through the next pattern space that matches the second (if the second address is a number less than or equal to the line number first selected, only one line is selected). Thereafter the process is repeated, looking again for the first address. sed supports Basic Regular Expression syntax (see regexp(5)). Editing commands can also be applied to only non-selected pattern spaces by use of the negation function ! (described below). Command Functions In the following list of functions, the maximum number of permissible addresses for each function is indicated in parentheses. Other function elements are interpreted as follows: text One or more lines, all but the last of which end with \ to hide the new-line. Backslashes in text are treated like backslashes in the replacement string of an s command, and can be used to protect initial blanks and tabs against the stripping that is done on every script line. rfile Must terminate the command line, and must be preceded by exactly one blank. wfile Must terminate the command line, and must be preceded by exactly one blank. Each wfile is created before processing begins. There can be at most 10 distinct wfile arguments. sed recognizes the following functions: (1)a\ text Append. Place text on the output before reading next input line. (2)b label Branch to the : command bearing label. If no label is specified, branch to the end of the script. (2)c\ text Change. Delete the pattern space. With 0 or 1 address or at the end of a 2-address range, place text on the output. Start the next cycle. (2)d Delete pattern space and start the next cycle. 
(2)D Delete initial segment of pattern space through first new-line and start the next cycle. (2)g Replace contents of the pattern space with contents of the hold space. (2)G Append contents of hold space to the pattern space. (2)h Replace contents of the hold space with contents of the pattern space. (2)H Append the contents of the pattern space to the hold space. (1)i\ text Insert. Place text on the standard output. (2)l List the pattern space on the standard output in an unambiguous form. Non-printing characters are spelled in three-digit octal number format (with a preceding backslash), and long lines are folded. (2)n Copy the pattern space to the standard output if the default output has not been suppressed (by the -n option on the command line or the #n command in the script file). Replace the pattern space with the next line of input. (2)N Append the next line of input to the pattern space with an embedded new-line. (The current line number changes.) (2)p Print. Copy the pattern space to the standard output. (2)P Copy the initial segment of the pattern space through the first new-line to the standard output. (1)q Quit. Branch to the end of the script. Do not start a new cycle. (1)r rfile Read contents of rfile and place on output before reading the next input line. (2)s/regular expression/replacement/flags Substitute replacement string for instances of regular expression in the pattern space. Any character can be used instead of /. For a fuller description see ed(1). flags is zero or more of: n n=1-2048 (LINE_MAX). Substitute for just the nth occurrence of regular expression in the pattern space. g Global. Substitute for all non-overlapping instances of regular expression rather than just the first one. p Print the pattern space if a replacement was made and the default output has been suppressed (by the -n option on the command line or the #n command in the script file). w wfile Write. Append the pattern space to wfile if a replacement was made. 
(2)t label    Test. Branch to the : command bearing the label if any substitutions have been made since the most recent reading of an input line or execution of a t. If label is empty, branch to the end of the script.
(2)w wfile    Write. Append the pattern space to wfile.
(2)x          Exchange the contents of the pattern and hold spaces.
(2)y/string1/string2/
              Transform. Replace all occurrences of characters in string1 with the corresponding character in string2. The lengths of string1 and string2 must be equal.
(2)! function Don't. Apply the function (or group, if function is {) only to lines not selected by the address or addresses.
(0): label    This command does nothing; it bears a label for b and t commands to branch to.
(1)=          Place the current line number on the standard output as a line.
(2){          Execute the following commands through a matching } only when the pattern space is selected. The syntax is:

              { cmd1
                cmd2
                cmd3
                .
                .
                .
              }

(0)           An empty command is ignored.
(0)#          If a # appears as the first character on the first line of a script file, that entire line is treated as a comment with one exception: If the character after the # is an n, the default output is suppressed. The rest of the line after #n is also ignored. A script file must contain at least one non-comment line.

EXTERNAL INFLUENCES

Environment Variables

LANG provides a default value for the internationalization variables that are unset or null. If LANG is unset or null, the default value of "C" (see lang(5)) is used. If any of the internationalization variables contains an invalid setting, sed will behave as if all internationalization variables are set to "C". See environ(5).

LC_ALL If set to a non-empty string value, overrides the values of all the other internationalization variables.

LC_CTYPE determines the interpretation of text as single and/or multi-byte characters, the classification of characters as printable, and the characters matched by character class expressions in regular expressions.
LC_MESSAGES determines the locale that should be used to affect the format and contents of diagnostic messages written to standard error and informative messages written to standard output.

NLSPATH determines the location of message catalogues for the processing of LC_MESSAGES.

International Code Set Support

Single- and multi-byte character code sets are supported.

EXAMPLES

Make a simple substitution in a file from the command line or from a shell script, changing abc to xyz:

     sed 's/abc/xyz/' file1 >file1.out

Same as above but use shell or environment variables var1 and var2 in search and replacement strings:

     sed "s/$var1/$var2/" file1 >file1.out

or

     sed 's/'$var1'/'$var2'/' file1 >file1.out

Multiple substitutions in a single command:

     sed -e 's/abc/xyz/' -e 's/lmn/rst/' file1

or

     sed -e 's/abc/xyz/' \
         -e 's/lmn/rst/' \
         file1 >file1.out

WARNINGS

sed limits command scripts to a total of not more than 100 commands.

The hold space is limited to 8192 characters.

sed processes only text files. See the glossary for a definition of text files and their limitations.

AUTHOR

sed was developed by OSF and HP.

SEE ALSO

awk(1), ed(1), grep(1), environ(5), lang(5), regexp(5).

sed: A Non-Interactive Streaming Editor tutorial in the Text Processing Users Guide.

STANDARDS CONFORMANCE

sed: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
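
The short script below is not part of the manual page. It is a minimal sketch that exercises a few of the address forms and functions described above (a two-address range, the negation function !, the g and n flags of s, the y function, and the hold space functions G and h). The helper function sample and the variable names are hypothetical, chosen only for illustration. Each sed command is passed with its own -e option, on the assumption that this form is more widely accepted than joining commands with semicolons.

```shell
#!/bin/sh
# sample is a hypothetical helper that writes five short lines of input.
sample() {
    printf 'one\ntwo\nthree\nfour\nfive\n'
}

# Two addresses: select the inclusive range of lines 2 through 4.
# -n suppresses the default output, so only lines reached by p are printed.
range=$(sample | sed -n '2,4p')

# One context address with the negation function !:
# delete every line that does NOT contain the letter o.
negated=$(sample | sed '/o/!d')

# Without a flag, s replaces only the first match on each line.
var1=abc
var2=xyz
first=$(printf 'abc abc\n' | sed "s/$var1/$var2/")

# The g flag replaces every non-overlapping match.
global=$(printf 'aaa\n' | sed 's/a/b/g')

# A numeric flag replaces only the nth match (here, the second).
second=$(printf 'aaa\n' | sed 's/a/b/2')

# y maps each character of string1 to the corresponding character of string2.
upper=$(printf 'abc\n' | sed 'y/abc/ABC/')

# Hold space: on every line but the first, G appends the hold space to the
# pattern space; h then copies the pattern space back into the hold space.
# On the last line ($) the accumulated text is printed, reversing the input.
reversed=$(sample | sed -n -e '1!G' -e 'h' -e '$p')

printf 'range:\n%s\n' "$range"
printf 'negated:\n%s\n' "$negated"
printf 'first: %s\n' "$first"
printf 'global: %s\n' "$global"
printf 'second: %s\n' "$second"
printf 'upper: %s\n' "$upper"
printf 'reversed:\n%s\n' "$reversed"
```

Note that the script addresses are enclosed in single quotes, so the shell passes characters such as ! and $ to sed unchanged. The one substitution that uses shell variables is enclosed in double quotes so that $var1 and $var2 are expanded before sed sees them.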