Chapter 6. Advanced UNIX Tools - Regular Expressions, sed, awk, and grep

CONTENTS

Three Commands
Regular Expression Words-of-Caution
Expressions Are Strings and Wildcards
sed
awk
grep
Manual Pages for Some Commands Used in Chapter 6

Three Commands

The three commands covered in this chapter, along with regular expressions (pattern matching), are often grouped together. Many books are devoted to awk and sed, and grep usually goes along for the ride in them because awk is partly derived from sed and grep. In addition, the regular expressions used for pattern matching in awk, sed, and grep are similar.

We'll take a look at regular expressions and pattern matching in general, and then cover the three commands in this chapter:

Regular expressions
sed, awk, and grep commands

Regular Expression Words-of-Caution

Regular expressions describe the patterns for which you are searching, usually by combining ordinary characters with wildcards. Because a regular expression defines the pattern you are searching for, the terms "regular expression" and "pattern matching" are often used interchangeably.

Let's get down to a couple of words-of-caution immediately:

Regular expressions are different from the file matching patterns used by the shell. Many programs, including those covered in this chapter, use regular expressions, but the file name matching done by the shell and by programs such as find follows different rules from the regular expressions covered in this chapter.
Use single quotes around regular expressions. Many of the meta-characters used in this chapter are also special to the shell, so they must be quoted if the shell is to pass them to the command unchanged as part of an argument. You will, therefore, see most regular expressions in this chapter quoted.
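
Both words-of-caution can be seen in a short experiment. The following is only a sketch, assuming a POSIX shell with grep and mktemp available; the scratch directory and the file names (notes.txt, ab.txt, and b.txt) are made up for this illustration.

```shell
#!/bin/sh
# Scratch directory and file names are hypothetical, for illustration only.
dir=$(mktemp -d)
printf 'ab\nb\nxyz\n' > "$dir/notes.txt"
: > "$dir/ab.txt"
: > "$dir/b.txt"

# Shell file matching: * stands for any string, so a*.txt names only ab.txt.
glob_matches=$(cd "$dir" && echo a*.txt)

# Regular expression: a* means zero or more a's. Every line contains at
# least zero a's, so grep -c counts all three lines of notes.txt.
regex_count=$(grep -c 'a*' "$dir/notes.txt")

# Quoted, the expression reaches grep unchanged: it selects lines made up
# of any number of a's followed by a single b.
quoted_matches=$(grep '^a*b$' "$dir/notes.txt")

echo "file matching: $glob_matches"
echo "regex count:   $regex_count"
echo "quoted regex:"
echo "$quoted_matches"

rm -rf "$dir"
```

The same two characters, a*, mean one thing to the shell and another to grep, which is why the quotes matter.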

Expressions Are Strings and Wildcards

When using the programs in this book, such as grep and vi, you provide a regular expression that the program evaluates. The command will search for the pattern you supply. The pattern could be as simple as a string or it could be wildcards. The wildcards used by many programs are called meta-characters.

Table 6-1 lists the meta-characters, describes their use, and shows the program(s) to which each applies. Only the programs covered in this book (awk, grep, sed, and vi) are shown in Table 6-1. These meta-characters may also be used with other programs, such as ed and egrep, which are not covered in this book.

Table 6-1. Meta-Characters and Programs to Which They Apply
Meta Character	awk	grep	sed	vi	Use
.	Yes	Yes	Yes	Yes	Match any single character.
*	Yes	Yes	Yes	Yes	Match any number of the single character that precedes *.
[...]	Yes	Yes	Yes	Yes	Match any one of the characters in the set [...].
$	Yes	Yes	Yes	Yes	Matches the end of the line.
^	Yes	Yes	Yes	Yes	Matches the beginning of the line.
\	Yes	Yes	Yes	Yes	Escape the special character that follows \.
\{n,m\}	Yes	Yes	No	No	Match a range of occurrences of a single character between n and m.
+	Yes	No	No	No	Match one or more occurrences of the preceding regular expression.
?	Yes	No	No	No	Match zero or one occurrence of the preceding regular expression.
|	Yes	No	No	No	Match either the preceding or the following regular expression.
()	Yes	No	No	No	Groups regular expressions in a typical parenthesis fashion.
\< \>	No	No	No	Yes	Match a word's beginning or end.

You may want to refer to this table when regular expressions are used for one of the commands in the table.

sed

Most of the editing performed on UNIX systems is done with vi. I have devoted a chapter to vi in this book, because of its prominence as a UNIX editor. Many times, we don't have the luxury of invoking vi when we need to edit a file. You may be writing a shell program or piping information between processes and need to edit in a non-interactive manner. sed can help here. Its name comes from stream editor, and it's a tool for filtering text files.

You can give sed the name of the file you wish to edit; if you don't, it takes its input from standard input. sed reads one line at a time and applies the editing you specify to each line. You can also restrict the editing to specific line numbers.
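
As a quick sketch of the two sources of input, assuming a POSIX shell and a made-up three-line file called lines.txt, the following runs the same edit once on a named file and once on standard input. The 2d instruction deletes line 2 only; the d command is covered later in this section.

```shell
#!/bin/sh
# lines.txt is a hypothetical file created only for this illustration.
dir=$(mktemp -d)
printf 'first\nsecond\nthird\n' > "$dir/lines.txt"

# Input named on the command line; 2d deletes line 2 and nothing else.
from_file=$(sed '2d' "$dir/lines.txt")

# No file name given: sed reads standard input, supplied here by a pipe.
from_pipe=$(printf 'first\nsecond\nthird\n' | sed '2d')

echo "$from_file"
echo "$from_pipe"

rm -rf "$dir"
```

Both commands produce the same two lines, first and third.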

sed uses many of the same commands as ed. You can view some of the ed commands in the vi chapter, and I also supply a summary of these at the end of this sed section.

You can invoke sed in the following two ways:

sed [-n] [-e] 'command' filename(s)
sed [-n] -f scriptfile filename(s)

The first form of sed is for issuing commands on the command line. By default, sed displays every line of its input. The -n option suppresses this, so that only lines you select with the p command are printed.

If you supply more than one instruction on the command line, precede each one with -e to inform sed that the next argument is an instruction.

The second form allows you to specify one or more scripts containing editing commands.

The following is a summary of the three options that appear in the two different forms of sed:

-n	Print only lines that are specified with the p command.
-e command	The argument following -e is an editing command.
-f filename	The argument following -f is a file containing editing commands.
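
The two forms can be compared side by side. This is a minimal sketch, assuming a POSIX sed; the input file sample.txt and the script file edit.sed are names invented for the illustration.

```shell
#!/bin/sh
# sample.txt and edit.sed are hypothetical names used only in this sketch.
dir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$dir/sample.txt"

# First form: two instructions on the command line, each introduced by -e.
# Line 1 is deleted; with -n, only lines matching gamma are printed.
two_e=$(sed -n -e '1d' -e '/gamma/p' "$dir/sample.txt")

# Second form: the same two instructions kept in a script file, read with -f.
printf '1d\n/gamma/p\n' > "$dir/edit.sed"
from_script=$(sed -n -f "$dir/edit.sed" "$dir/sample.txt")

echo "$two_e"
echo "$from_script"

rm -rf "$dir"
```

A script file tends to be easier to maintain once you have more than two or three instructions.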

Let's view a couple of simple examples of what you can do with sed. These examples use some of the sed commands that appear at the end of this section. We'll use a file called passwd.test. We'll view this file with cat and then view only lines 16, 17, and 18 using sed's p command, indicating we want only the specified lines printed:

# cat passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
#
# sed 16,18p passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
#
# sed -n 16,18p passwd.test
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash

The first attempt to print only lines 16, 17, and 18 results in all of the lines in the file being printed and lines 16, 17, and 18 being printed twice. The reason is that sed reads each line of input, acts on it, and by default writes every line to standard output; the p command then prints lines 16, 17, and 18 a second time. To print only the lines we want, we use the -n switch to suppress the default output and then specify, with p, the lines we want to print. Only these lines go to standard output.

Now that we know how to view lines 16, 17, and 18 of the file, let's again view passwd.test and delete those same three lines with d:

# cat passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
#
# sed '16,18d' passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:

As with our earlier grep example, we enclose any special characters in single quotes so that the shell passes them to sed unmodified and uninterpreted. In this example, we specify the range of lines to delete, 16 through 18, followed by d for delete. We could specify just one line to delete, such as 16, rather than an entire range. Because we did not redirect the output as part of the sed command line, the result is sent to standard output. The original file remains intact.
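
If you want to keep the edited text, redirect it to a new file. The following sketch assumes a POSIX shell and uses made-up file names (input.txt and trimmed.txt); it deletes a single line, deletes a range into a new file, and then confirms that the input file still has all of its lines.

```shell
#!/bin/sh
# input.txt and trimmed.txt are hypothetical names for this illustration.
dir=$(mktemp -d)
printf 'one\ntwo\nthree\nfour\n' > "$dir/input.txt"

# Delete a single line (line 2) rather than a range; output goes to stdout.
single=$(sed '2d' "$dir/input.txt")

# Delete lines 2 through 3 and keep the result by redirecting it to a new file.
sed '2,3d' "$dir/input.txt" > "$dir/trimmed.txt"
trimmed=$(cat "$dir/trimmed.txt")

# The input file is untouched: it still contains four non-empty lines.
original_count=$(grep -c '.' "$dir/input.txt")

echo "$single"
echo "$trimmed"
echo "lines left in input.txt: $original_count"

rm -rf "$dir"
```

Redirect to a different file name than the input. If you redirect onto the input file itself, the shell empties that file before sed has a chance to read it.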

We could search for a pattern in a file and delete only those lines containing the pattern. The following example shows searching for bash and deleting the lines that contain bash:

# cat passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
#
# sed '/bash/ d' passwd.test
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false

Both lines containing bash (the root line and the col line) were deleted from the output. The file passwd.test itself is unchanged.

As I had mentioned earlier, it is a good idea to use single quotes around all regular expressions. In this example, I enclosed in single quotes the pattern for which I was searching and the command to execute.

What if you wanted to delete all lines except those that contain bash? You would insert an exclamation mark before the d to delete all lines except those that contain bash, as shown in the following example:

# cat passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
#
# sed '/bash/ !d' passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash

This results in every line except the two containing bash being deleted from the output.

Now that we have seen how to display and delete specific lines of the file, let's see how to add three lines to the end of the file:

# sed '$a\
> This is a backup of passwd file\
> for viewing purposes only\
> so do not modify' passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
This is a backup of passwd file
for viewing purposes only
so do not modify

The backslashes (\) are used liberally in this example. Each backslash at the end of a line tells sed that the text continues on the next line; the > at the start of each following line is the shell's continuation prompt, not part of the command. We go to the end of the file, as designated by the $, give the a command followed by a backslash, and then supply the lines of text we wish to append. These lines are great to add to the end of the file, but we really should add them to the beginning of the file. The following example shows this approach:

# sed '1i\
> This is a backup passwd file\
> for viewing purposes only\
> so do not modify\
> ' passwd.test
This is a backup passwd file
for viewing purposes only
so do not modify
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash

First, we run sed, specifying that on line one we are going to begin inserting the text shown. We use the single quote immediately following sed and use another single quote on the last line when we are done specifying all the information, except for the input file, which is passwd.test.

We have only scratched the surface of commands you can use with sed. The following sed summary includes the commands we have used (p for print; d for delete; a for append; and i for insert), as well as others that were not part of the examples.

sed - Stream editor.

Commands
	a	Append text.
	b	Branch to a label.
	c	Replace lines with text.
	d	Delete the pattern space (the current text buffer).
	D	Delete the first line of the pattern space.
	g	Copy the hold space into the pattern space, overwriting its contents.
	G	Append the hold space to the pattern space rather than overwriting it.
	h	Copy the pattern space into hold space.
	H	Append the contents of pattern space into hold space.
	i	Insert text.
	l	List the contents of the pattern space.
	n	Read the next line of input into the pattern space.
	N	Append next line of input to pattern space.
	p	Print the pattern space.
	P	Print from the start of the pattern space up to and including the first newline.
	q	Quit when address is encountered.
	r	Read in a file.
	s	Substitute patterns.
	t	Branch if substitution has been made to the current pattern space.
	w	Append the contents of the pattern space to the specified file.
	x	Interchange the contents of the holding area and pattern space.
	y	Translate characters.
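
Three of the commands in this summary that were not part of the earlier examples (s, y, and q) are easy to try. The following is only a sketch, assuming a POSIX sed; users.txt is a made-up file loosely modeled on passwd.test.

```shell
#!/bin/sh
# users.txt is a hypothetical file, loosely modeled on passwd.test.
dir=$(mktemp -d)
printf 'root:0:/bin/bash\nbin:1:\ncol:100:/bin/bash\n' > "$dir/users.txt"

# s: substitute the first match of a pattern on each line (bash becomes sh).
substituted=$(sed 's/bash/sh/' "$dir/users.txt")

# y: translate characters one for one (every colon becomes a comma).
translated=$(sed 'y/:/,/' "$dir/users.txt")

# q: quit when the address is reached, so only lines 1 and 2 are printed.
first_two=$(sed '2q' "$dir/users.txt")

echo "$substituted"
echo "$translated"
echo "$first_two"

rm -rf "$dir"
```

As in the earlier examples, each instruction is enclosed in single quotes and the file itself is not modified.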

awk

awk can pretty much do it all. With awk, you can search, modify files, generate reports, and a lot more. awk performs these tasks by searching for patterns in lines of input (from standard input or from a file). For each line that matches the specified pattern, it can perform some very complex processing on that line. The code to actually process matching lines of input is a cross between a shell script and a C program.

Data manipulation tasks that would be very complex with combinations of grep, cut, and paste are very easily done with awk. Because awk is a programming language, it can also perform mathematical operations or check the input very easily (shells don't do math very well). It can even do floating-point math (shells deal only with integers and strings).
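
To make the point about math concrete, here is a small sketch that averages three made-up numbers, first with POSIX shell arithmetic and then with awk. It uses awk's END block and the NR variable (the number of input lines read), both standard awk features that have not been introduced at this point in the chapter.

```shell
#!/bin/sh
# Three hypothetical file sizes, one per line.
sizes='10
20
35'

# POSIX shell arithmetic is integer only: 65 / 3 is truncated to 21.
shell_avg=$(( (10 + 20 + 35) / 3 ))

# awk adds the first field of every line, then prints the total and a
# floating-point average once all of the input has been read.
awk_report=$(printf '%s\n' "$sizes" |
    awk '{ sum += $1 } END { printf "%d %.2f\n", sum, sum / NR }')

echo "shell average: $shell_avg"
echo "awk total and average: $awk_report"
```

The shell reports 21, while awk reports a total of 65 and an average of 21.67.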

The basic form of an awk program looks like this:

awk '/pattern_to_match/ {prog to run}' input_file_names

Notice that the whole program is enclosed in single quotes. If no input file names are specified, awk reads from standard input (as from a pipe).

The pattern_to_match must appear between the / characters. The pattern is actually a regular expression. Regular expressions were covered earlier in this chapter. Some common regular expression examples are included shortly.

The program to execute is written in awk code, which looks something like C. The program is executed whenever a line of input matches the pattern_to_match. If /pattern_to_match/ does not precede the program in {}, then the program is executed for every line of input.
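
The effect of including or omitting the pattern can be sketched with a tiny input file. This assumes a POSIX awk; menu.txt and its contents are invented for the illustration.

```shell
#!/bin/sh
# menu.txt is a hypothetical three-line file created for this sketch.
dir=$(mktemp -d)
printf 'apple pie\nbanana split\ncherry tart\n' > "$dir/menu.txt"

# With a pattern: the program runs only for lines that match /an/.
with_pattern=$(awk '/an/ {print}' "$dir/menu.txt")

# Without a pattern: the program runs for every line of input.
without_pattern=$(awk '{print}' "$dir/menu.txt")

# With no input file name, awk reads standard input (here, a pipe).
from_stdin=$(printf 'apple pie\nbanana split\n' | awk '/pie/ {print}')

echo "$with_pattern"
echo "$without_pattern"
echo "$from_stdin"

rm -rf "$dir"
```

Only banana split contains the string an, so it is the only line the first command prints.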

awk works with fields of the input lines. Fields are words separated by white space or some other field separator. awk uses white space as a field separator by default. You can use the -F option to specify the field separator, as shown in a later example. The fields in awk patterns and programs are referenced with $, followed by the field number. For example, the second field of an input line is $2. If you are using an awk command in your shell programs, the fields ($1, $2, etc.) are not confused with the shell script's positional parameters because the awk variables are enclosed in single quotes, causing the shell to ignore them.
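
The following sketch, assuming a POSIX shell and awk, shows the default field separator, the -F option, and what happens to $2 with and without single quotes. The positional parameters set at the top (shell-first and shell-second) are made-up values used only to make the difference visible.

```shell
#!/bin/sh
# Give the shell its own positional parameters (hypothetical values).
set -- shell-first shell-second

# Default separator: white space. awk's $2 is the second word of the line.
ws_field=$(echo 'alpha beta gamma' | awk '{ print $2 }')

# -F: makes the colon the field separator instead.
colon_field=$(echo 'alpha:beta:gamma' | awk -F: '{ print $2 }')

# In double quotes the shell expands $2 before awk ever runs, so awk is
# handed the program { print "shell-second" } and never sees a field.
shell_param=$(echo 'alpha beta gamma' | awk "{ print \"$2\" }")

echo "$ws_field"
echo "$colon_field"
echo "$shell_param"
```

The first two commands print beta; the third prints shell-second, which is exactly the confusion the single quotes prevent.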

You really need to see some examples of using awk to appreciate its power. The following few examples use a file called newfiles, which contains a list of files on a system less than 15 days old. This file is generated as part of a system administration audit program that checks various aspects of a UNIX system. The following shows the contents of newfiles:

# cat newfiles
PROG>>>>> report of files not older than 14 days by find
the file system is /
-rw-r--r--   1 root       root         567 Dec  7 07:16 ./etc/mnttab
-rw-r--r--   1 root       root       20713 Dec  7 07:18 ./etc/rc.log
-rw-r--r--   1 root       root           0 Dec  7 07:17 ./etc/hpC2400/hparray.map
-rw-r--r--   1 root       root           0 Dec  7 07:17 ./etc/hpC2400/hparray.devs
-rw-r--r--   1 root       root           0 Dec  7 07:17 ./etc/hpC2400/hparray.luns
-rw-r--r--   1 root       root           0 Dec  7 07:17 ./etc/hpC2400/hparray.addr
-r-s------   1 root       root           0 Dec  7 07:17 ./etc/hpC2400/pscan.lock
-r-s------   1 root       root           0 Dec  7 07:17 ./etc/hpC2400/monitor.lock
-rw-r--r--   1 root       root       14299 Dec  7 07:17 ./etc/hpC2400/HPARRAY.INFO
-rw-r--r--   1 bin        bin         8553 Dec  7 07:02 ./etc/shutdownlog
-rw-r--r--   1 root       mail       32768 Dec  7 07:16 ./etc/mail/aliases.db
-rw-r--r--   1 root       mail          33 Dec  7 07:16 ./etc/mail/sendmail.pid
-rw-r--r--   1 root       root          13 Dec  7 07:16 ./etc/opt/dce/boot_time
-rw-r--r--   1 root       root         720 Dec  7 13:34 ./etc/utmp
-rw-r--r--   1 root       root           0 Dec  7 07:16 ./etc/xtab
-rw-r--r--   1 root       root           0 Dec  7 07:18 ./etc/rmtab
-rw-r--r--   1 root       root       40814 Dec  7 07:15 ./etc/rc.log.old
-rw-r--r--   1 root       root        4620 Dec  7 13:34 ./etc/utmpx
-rw-r--r--   1 root       root           9 Dec  7 13:17 ./etc/ntp.drift
-rw-r--r--   1 root       root         616 Dec  7 07:15 ./etc/auto_parms.log
-rw-r--r--   1 root       sys          219 Dec  7 07:00 ./etc/auto_parms.log.old
-rw-rw-rw-   1 root       sys          520 Nov 23 12:37 ./.sw/sessions/swlist.last
-r--r--r--   1 root       informix      76 Dec  7 07:17 ./INFORMIXTMP/.inf.shmPSREP
-r--r--r--   1 root       informix      76 Dec  7 07:18 ./INFORMIXTMP/.inf.shmPSDEV
-rw-------   1 autosys    autosys     4052 Nov 25 14:08 ./home/autosys/.sh_history
-rw-------   1 tsaxs      users       2228 Dec  1 13:15 ./home/tsaxs/.sh_history
-rw-------   1 tsfxo      users       2862 Nov 24 10:08 ./home/tsfxo/.sh_history
PROG>>>>> report of files not older than 14 days by find
the file system is /usr
-rw-rw-rw-   1 opop6      users         21 Dec  7 13:46 ./local/adm/etc/lmonitor.hst
-rw-r--r--   1 tsgjf      users       1093 Dec  7 13:17 ./local/flexlm/licenses/license.log
PROG>>>>> report of files not older than 14 days by find
the file system is /opt
-rw-rw-r--   1 bin        bin          200 Dec  7 07:17 ./pred/bin/OPSDBPF
-rw-r--r--   1 root       sys       800028 Dec  7 07:17 ./pred/bin/PSRNLOGD
PROG>>>>> report of files not older than 14 days by find
the file system is /var
-rw-r--r--   1 root       sys        45089 Dec  7 07:16 ./adm/sw/swagentd.log
-rw-rw-rw-   1 root       sys           56 Dec  7 07:16 ./adm/sw/sessions/swlist.last
-rw-rw-r--   1 root       root       12236 Dec  7 07:16 ./adm/ps_data
-rw-r--r--   1 root       root          65 Dec  7 07:17 ./adm/cron/log
-rw-r--r--     root       root         162 Dec  7 07:00 ./adm/cron/OLDlog
-r--r--r--   1 root       root      734143 Dec  7 07:16 ./adm/syslog/mail.log
-rw-r--r--   1 root       root       65743 Dec  7 13:56 ./adm/syslog/syslog.log
-rw-r--r--   1 root       root     4924974 Dec  7 07:02 ./adm/syslog/OLDsyslog.log
-rw-rw-r--   1 adm        adm      2750700 Dec  7 13:52 ./adm/wtmp
-rw-------   1 root       other     145920 Dec  3 14:36 ./adm/btmp
-rw-r--r--   1 lp         lp            33 Dec  7 07:17 ./adm/lp/log
-rw-r--r--   1 lp         lp            67 Dec  7 07:01 ./adm/lp/oldlog
-rw-r--r--   1 root       root        4330 Dec  7 07:18 ./adm/diag/device_table
-rw-r--r--   1 root       root          34 Dec  7 07:18 ./adm/diag/misc_sys_data
-rwxr-xr-x   1 root       root      995368 Nov 22 15:16 ./adm/diag/LOG0190
-rwxr-xr-x   1 root       root      995368 Nov 23 02:05 ./adm/diag/LOG0191
-rwxr-xr-x   1 root       root      453964 Nov 23 07:01 ./adm/diag/LOG0192
-rwxr-xr-x   1 root       root      970448 Nov 23 18:35 ./adm/diag/LOG0193
-rwxr-xr-x   1 root       root      995368 Nov 24 05:24 ./adm/diag/LOG0194
-rwxr-xr-x   1 root       root      995368 Nov 24 16:14 ./adm/diag/LOG0195
-rwxr-xr-x   1 root       root      995368 Nov 25 03:03 ./adm/diag/LOG0196
-rwxr-xr-x   1 root       root      995368 Nov 25 13:52 ./adm/diag/LOG0197
-rwxr-xr-x   1 root       root      995368 Nov 26 00:41 ./adm/diag/LOG0198
-rwxr-xr-x   1 root       root      995368 Nov 26 11:31 ./adm/diag/LOG0199
-rwxr-xr-x   1 root       root      995368 Nov 26 22:20 ./adm/diag/LOG0200
-rwxr-xr-x   1 root       root      995368 Nov 27 09:09 ./adm/diag/LOG0201
-rwxr-xr-x   1 root       root      995368 Nov 27 19:58 ./adm/diag/LOG0202
-rwxr-xr-x   1 root       root      995368 Nov 28 06:48 ./adm/diag/LOG0203
-rwxr-xr-x   1 root       root      995368 Nov 28 17:37 ./adm/diag/LOG0204
-rwxr-xr-x   1 root       root      995368 Nov 29 04:26 ./adm/diag/LOG0205
-rwxr-xr-x   1 root       root      995368 Nov 29 15:16 ./adm/diag/LOG0206
-rwxr-xr-x   1 root       root      995368 Nov 30 02:05 ./adm/diag/LOG0207
-rwxr-xr-x   1 root       root      452020 Nov 30 06:59 ./adm/diag/LOG0208
-rwxr-xr-x   1 root       root      970448 Nov 30 18:35 ./adm/diag/LOG0209
-rwxr-xr-x   1 root       root      995368 Dec  1 05:24 ./adm/diag/LOG0210
-rwxr-xr-x   1 root       root      995368 Dec  1 16:13 ./adm/diag/LOG0211
-rwxr-xr-x   1 root       root      995368 Dec  2 03:03 ./adm/diag/LOG0212
-rwxr-xr-x   1 root       root      995368 Dec  2 13:52 ./adm/diag/LOG0213
-rwxr-xr-x   1 root       root      995368 Dec  3 00:41 ./adm/diag/LOG0214
-rwxr-xr-x   1 root       root      995368 Dec  3 11:31 ./adm/diag/LOG0215
-rwxr-xr-x   1 root       root      995368 Dec  3 22:20 ./adm/diag/LOG0216
-rwxr-xr-x   1 root       root      995368 Dec  4 09:09 ./adm/diag/LOG0217
-rwxr-xr-x   1 root       root      995368 Dec  4 19:58 ./adm/diag/LOG0218
-rwxr-xr-x   1 root       root      995368 Dec  5 06:48 ./adm/diag/LOG0219
-rwxr-xr-x   1 root       root      995368 Dec  5 17:37 ./adm/diag/LOG0220
-rwxr-xr-x   1 root       root      995368 Dec  6 04:26 ./adm/diag/LOG0221
-rwxr-xr-x   1 root       root      995368 Dec  6 15:15 ./adm/diag/LOG0222
-rwxr-xr-x   1 root       root      995368 Dec  7 02:05 ./adm/diag/LOG0223
-rwxr-xr-x   1 root       root      453964 Dec  7 07:00 ./adm/diag/LOG0224
-rwxr-xr-x   1 root       root      543740 Dec  7 13:57 ./adm/diag/LOG0225
-rw-r--r--   1 root       root       19587 Dec  7 07:16 ./adm/ptydaemonlog
-rw-r--r--   1 root       root          52 Dec  7 07:16 ./adm/conslog.opts
-rw-r--r--   1 root       root           0 Dec  7 07:16 ./adm/rpc.statd.log
-rw-r--r--   1 root       root           0 Dec  7 07:16 ./adm/rpc.lockd.log
-rw-r--r--   1 root       root       24250 Dec  7 07:16 ./adm/vtdaemonlog
-rw-------   1 root       root         214 Dec  7 12:07 ./adm/sulog
-rw-------   1 root       root         381 Dec  3 17:34 ./adm/OLDsulog
-rw-r--r--   1 root       sys          145 Dec  7 07:16 ./adm/rbootd.log
-rw-------   1 sysadm     psoft         60 Dec  1 16:59 ./tmp/EAAa09057
-rw-r--r--   1 tsgjf      users          0 Dec  7 13:17 ./tmp/lockHPCUPLANGS
-rw-r--r--   1 tsgjf      users        175 Dec  7 06:40 ./tmp/.flexlm/lmgrd.1507
-rw-r--r--   1 tsgjf      users        175 Dec  7 13:28 ./tmp/.flexlm/lmgrd.1505
-rw-r--r--   1 lp         lp             0 Dec  7 07:17 ./spool/lp/outputq
-rw-rw-rw-   1 lp         lp             4 Dec  7 07:17 ./spool/lp/SCHEDLOCK
-rw-------   1 root       sys            0 Nov 23 07:00 ./spool/cron/tmp/croutAAAa01030
-rw-------   1 root       sys            0 Nov 30 07:00 ./spool/cron/tmp/croutAAAa01039
-rw-------   1 root       sys            0 Dec  7 07:00 ./spool/cron/tmp/croutAAAb01039
-rw-r--r--   1 root       root           4 Dec  7 07:16 ./run/syslog.pid
-rw-r--r--   1 root       root           4 Dec  7 07:16 ./run/gated.pid
-rw-r--r--   1 root       sys          145 Dec  7 07:16 ./run/gated.version
-rw-r--r--   1 root       sys            3 Dec  7 07:16 ./statmon/state
-rw-r--r--   1 root       root       29771 Dec  7 07:16 ./opt/dce/config/dce_config.log
-rw-r--r--   1 root       sys           74 Dec  7 07:16 ./opt/dce/rpc/local/00404/srvr_socks
-rw-r--r--   1 root       root          72 Dec  7 07:16 ./opt/dce/rpc/local/00927/srvr_socks
-rw-r--r--   1 root       root       32768 Dec  7 07:16 ./opt/dce/dced/Ep.db
-rw-r--r--   1 root       root       32768 Dec  7 07:20 ./opt/dce/dced/Llb.db
-rw-r--r--   1 root       root           0 Nov 30 07:16 ./opt/perf/status.ttd
-rw-r--r--   1 root       root          33 Dec  7 07:17 ./opt/perf/datafiles/RUN
-rwxrwxrwx   1 root       sys      9243180 Dec  7 13:55 ./opt/perf/datafiles/logappl
-rwxrwxrwx   1 root       sys      8697612 Dec  7 13:55 ./opt/perf/datafiles/logdev
-rwxrwxrwx   1 root       sys      9195152 Dec  7 13:55 ./opt/perf/datafiles/logglob
-rwxrwxrwx   1 root       sys        11112 Dec  7 07:17 ./opt/perf/datafiles/logindx
-rwxrwxrwx   1 root       sys     17639080 Dec  7 13:57 ./opt/perf/datafiles/logproc
-rwxrwxrwx   1 root       sys         3797 Dec  7 07:17 ./opt/perf/datafiles/mikslp.data
-rw-rw-rw-   1 root       sys          105 Nov 30 10:45 ./opt/perf/datafiles/agdb
-rw-r--r--   1 root       root           5 Dec  7 07:17 ./opt/perf/datafiles/.perflbd.pid
-rw-rw-rw-   1 root       sys        21176 Dec  7 07:20 ./opt/perf/status.scope
-rw-rw-rw-   1 root       root           5 Nov 30 07:16 ./opt/perf/ttd.pid
-rw-r--r--   1 root       root           0 Dec  7 07:17 ./opt/perf/status.mi
-rw-rw-rw-   1 root       sys         8254 Dec  7 07:17 ./opt/perf/status.perflbd
-rw-rw-rw-   1 root       sys        21507 Dec  7 07:20 ./opt/perf/status.rep_server
-rw-rw-rw-   1 root       sys        24570 Dec  7 07:20 ./opt/perf/status.alarmgen
-rw-rw-rw-   1 root       sys       160956 Dec  6 21:13 ./opt/omni/log/inet.log
-rw-rw-rw-   1 root       sys       158796 Dec  7 07:17 ./sam/log/samlog
-rw-r--r--   1 root       root       64730 Dec  7 07:17 ./sam/boot.config
-rw-rw-rw-   1 root       sys        11906 Nov 24 14:27 ./sam/poe.iout
-rw-rw-rw-   1 root       sys        11906 Nov 23 09:10 ./sam/poe.iout.old
-rw-rw-rw-   1 root       sys           29 Nov 24 14:27 ./sam/poe.dion

You can see that this file contains several fields separated by white space. The next example tests whether the third field equals "adm" and, if so, prints the line:

# awk '$3 == "adm" {print}' newfiles
-rw-rw-r--   1 adm        adm      2750700 Dec  7 13:52 ./adm/wtmp

There is precisely one line that contains exactly "adm" in the third field.

The next example uses the ~ (match) operator to test whether the third field matches the pattern "adm", meaning that the third field has "adm" embedded anywhere in it, and if so, the line is printed:

# awk '$3 ~ "adm" {print}' newfiles
-rw-rw-r--   1 adm        adm      2750700 Dec  7 13:52 ./adm/wtmp
-rw-------   1 sysadm     psoft         60 Dec  1 16:59 ./tmp/EAAa09057

This command prints the line from the last example, which has exactly "adm" in the third field, as well as a line whose third field is "sysadm".

The next example performs the same search as the previous example; however, this time only fields nine and five are printed:

# awk '$3 ~ "adm" {print $9, $5}' newfiles
./adm/wtmp 2750700
./tmp/EAAa09057 60

This time only the name of the file (field nine) and the size of the file (field five) are printed, in that order.

The next example evaluates the third field to determine if it does not equal "root," and if so, prints the entire line:

# awk '$3 != "root" {print}' newfiles
PROG>>>>> report of files not older than 14 days by find
the file system is /
-rw-r--r--   1 bin        bin         8553 Dec  7 07:02 ./etc/shutdownlog
-rw-------   1 autosys    autosys     4052 Nov 25 14:08 ./home/autosys/.sh_history
-rw-------   1 tsaxs      users       2228 Dec  1 13:15 ./home/tsaxs/.sh_history
-rw-------   1 tsfxo      users       2862 Nov 24 10:08 ./home/tsfxo/.sh_history
PROG>>>>> report of files not older than 14 days by find
the file system is /usr
-rw-rw-rw-   1 opop6      users         21 Dec  7 13:46 ./local/adm/etc/lmonitor.hst
-rw-r--r--   1 tsgjf      users       1093 Dec  7 13:17 ./local/flexlm/licenses/license.log
PROG>>>>> report of files not older than 14 days by find
the file system is /opt
-rw-rw-r--   1 bin        bin          200 Dec  7 07:17 ./pred/bin/OPSDBPF
PROG>>>>> report of files not older than 14 days by find
the file system is /var
-rw-rw-r--   1 adm        adm      2750700 Dec  7 13:52 ./adm/wtmp
-rw-r--r--   1 lp         lp            33 Dec  7 07:17 ./adm/lp/log
-rw-r--r--   1 lp         lp            67 Dec  7 07:01 ./adm/lp/oldlog
-rw-------   1 sysadm     psoft         60 Dec  1 16:59 ./tmp/EAAa09057
-rw-r--r--   1 tsgjf      users          0 Dec  7 13:17 ./tmp/lockHPCUPLANGS
-rw-r--r--   1 tsgjf      users        175 Dec  7 06:40 ./tmp/.flexlm/lmgrd.1507
-rw-r--r--   1 tsgjf      users        175 Dec  7 13:28 ./tmp/.flexlm/lmgrd.1505
-rw-r--r--   1 lp         lp             0 Dec  7 07:17 ./spool/lp/outputq
-rw-rw-rw-   1 lp         lp             4 Dec  7 07:17 ./spool/lp/SCHEDLOCK

This command results in many lines being printed that do not have "root" in the third field.

newfiles had whitespace to separate the fields. We don't often have this luxury in the UNIX world. The upcoming examples use passwd.test, which has a colon (:) as a field separator. passwd.test is shown below:

# cat passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash

You can specify the field separator with the -F option followed by a separator, which is a colon (:) in passwd.test. The following example specifies the field separator and then evaluates the first field to determine whether it equals "root," and if so, prints out the entire line:

# awk -F: '$1 == "root" {print}' passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash

The following example specifies the field separator and then evaluates the fourth field to determine whether it equals "0," which means the user is a member of the same group as "root", and if so, prints out the entire line:

# awk -F: '$4 == "0" {print}' passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
sync:*:5:0:sync:/sbin:/bin/sync
halt:*:7:0:halt:/sbin:/sbin/halt
operator:*:11:0:operator:/root:
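
The -F option also combines with a field list in the print statement. The following is a minimal sketch, assuming three lines copied from passwd.test are supplied on standard input; the variable names are chosen only for this illustration.

```shell
# Three lines in the passwd.test format shown above.
entries='root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
sync:*:5:0:sync:/sbin:/bin/sync'

# -F: splits each line on colons; print the login name and home
# directory of every entry whose group field (field 4) is 0.
group0=$(printf '%s\n' "$entries" | awk -F: '$4 == "0" {print $1, $6}')

echo "$group0"
```

Printing only fields one and six gives a shorter report than printing the whole line, which can be easier to read when the file is large.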

You can perform many types of comparisons besides == using awk. The following examples show the use of several comparison operators on our trusty passwd.test file. The first example prints all users who are in a group with a value less than 14:

# awk -F: '$4 < 14 {print}' passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
operator:*:11:0:operator:/root:

The next example prints all users who are in a group with a value less than or equal to 14:

# awk -F: '$4 <= 14 {print}' passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
operator:*:11:0:operator:/root:

Let's now print all users who are in a group that does not have a value of 14:

# awk -F: '$4 != 14 {print}' passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash

Let's now print all users who are in a group with a value greater than or equal to 14:

# awk -F: '$4 >= 14 {print}' passwd.test
uucp:*:10:14:uucp:/var/spool/uucp:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash

The last example shows all users who are in a group with a value greater than 14:

# awk -F: '$4 > 14 {print}' passwd.test
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
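
Comparisons can also be combined. The awk manual page reproduced later in this chapter notes that patterns may be joined with && and ||, so a range test is possible. The following is a minimal sketch that assumes four lines taken from passwd.test; the variable names are illustrative only.

```shell
# Four lines in the passwd.test format shown above.
entries='news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
games:*:12:100:games:/usr/games:
nobody:*:65534:65534:Nobody:/:/bin/false'

# Print the login name of every entry whose group value is
# at least 14 and at most 100.
in_range=$(printf '%s\n' "$entries" | awk -F: '$4 >= 14 && $4 <= 100 {print $1}')

echo "$in_range"
```

Because the right-hand sides are unquoted numbers, awk compares field four numerically, so 65534 is correctly treated as larger than 100.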

There is much more to awk than what I covered in this section. There are additional awk examples in the shell programming chapter.

The following table summarizes some of the comparison operators of awk covered in this section:

awk - Search a line for a specified pattern and perform operation(s).

Comparison operators:
	<	Less than.
	<=	Less than or equal to.
	==	Equal to.
	~	Matches a regular expression.
	!=	Not equal to.
	>=	Greater than or equal to.
	>	Greater than.
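
The difference between ~ and == is easy to miss: ~ succeeds when the pattern is found anywhere in the field, while == requires the whole field to be equal. The sketch below assumes a made-up list of owner names, one per line; it also uses the !~ operator, which the awk manual page lists as "does not match".

```shell
# Hypothetical owner names, one per line.
owners='adm
sysadm
root'

# ~ matches the pattern anywhere in the field, so sysadm is selected too.
regex_hits=$(printf '%s\n' "$owners" | awk '$1 ~ "adm" {print $1}')

# == requires the whole field to equal the string.
exact_hits=$(printf '%s\n' "$owners" | awk '$1 == "adm" {print $1}')

# !~ selects the fields that do not match the pattern.
non_hits=$(printf '%s\n' "$owners" | awk '$1 !~ "adm" {print $1}')

echo "$regex_hits"
echo "$exact_hits"
echo "$non_hits"
```

This is why the earlier search of newfiles for "adm" in the third field also reported the file owned by sysadm.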

grep

Here in the information age, we have too much information. We are constantly trying to extract the information we are after from stacks of information. The grep command is used to search for text and display it. The name grep comes from the ed editor command g/re/p, which globally searches for a regular expression and prints the matching lines. Let's first look at a few simple searches and display the output with grep. Figure 6-1 shows creating a long listing for /home/denise, and using grep, we search for patterns.

Figure 6-1. grep Command

graphics/06fig01.gif

First, we search for the pattern netscape. This produces a list of files, all of which begin with .netscape.

Next we use the -c option to count the lines in which netscape is found. The result is 6.

Do you think that grep is case-sensitive? The next example shows searching for the pattern netscape typed in a different case, and no matches are found, so grep is indeed case-sensitive by default.

Using the -i option causes grep to ignore the difference between uppercase and lowercase and just search for the pattern, and again, all the original matches are found.

Also, more than one pattern can be searched for. Using the -F option, both netscape and .c are searched for, and a longer list of matches is found. Notice that the two patterns to search for are enclosed in double quotes and are separated by a newline.
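
Because Figure 6-1 is an image, here is a minimal sketch of a two-pattern search with -F. The three file names are made up for illustration and are not the contents of /home/denise. The second form uses one -e option per pattern, which the grep manual page later in this chapter describes as an alternative to separating the patterns with a newline.

```shell
# Hypothetical file names standing in for a directory listing.
listing='.netscape-bookmarks.html
hello.c
notes.txt'

# Two fixed strings separated by a newline inside one quoted argument.
patterns='netscape
.c'
matches=$(printf '%s\n' "$listing" | grep -F "$patterns")

# The same search with one -e option per pattern.
matches_e=$(printf '%s\n' "$listing" | grep -F -e netscape -e .c)

echo "$matches"
```

With -F the period in .c is an ordinary character rather than a wildcard, so notes.txt is not selected.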

Let's now take a look at a couple of more advanced searches using grep. We'll use the passwd.test file as the basis for our searches because each line in it contains a lot of information. To start, the following is the contents of the passwd.test file on a Linux system:

# cat passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash

We can search for a string in the password file just as we did in the earlier grep example. The following example searches for news in the passwd.test file:

# grep news passwd.test
news:*:9:13:news:/var/spool/news:

Now let's check to see whether there is a user named bin in the passwd.test file. In order for a user named bin to have an entry in the passwd.test file, the user name, in this case bin, would be the first entry in the line. Here is the result of searching for this user:

# grep bin passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash

Many lines from passwd.test that contain the string bin are indeed produced; however, we have to search through these lines to find the user bin, which is the line in which bin is the first string that appears. This is more than we wanted when we initiated our search. We wanted to see a user named bin, which would appear at the beginning of a line. We can further qualify our search, in this case to limit the search to a string at the beginning of a line, by using the pattern matching discussed at the beginning of this chapter (see Table 6-1). In this case, we want to search only at the beginning of a line for bin, so we'll qualify our search with a caret (^) to restrict the search to the beginning of the line, as shown in the following example:

# grep ^bin passwd.test
bin:*:1:1:bin:/bin:

This search results in exactly the information in which we are interested, that is, a line beginning with bin. When using special characters, such as the caret (^) in this example, you should enclose the search pattern in single quotes ('). Special characters may be interpreted by the shell and cause problems with the arguments we're trying to send to grep. Enclosing the search pattern in single quotes ensures that the search pattern, in this case ^bin, is passed directly to grep. The search pattern in single quotes looks like the following:

# grep '^bin' passwd.test
bin:*:1:1:bin:/bin:

Because we are going to have to search for this line in the passwd.test file after we find it, we may as well print out the line number as well as the line itself by using the -n option, as shown in the following example:

# grep -n '^bin' passwd.test
2:bin:*:1:1:bin:/bin:
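
One caution with ^bin is that it also selects any user name that merely starts with bin. The following minimal sketch assumes a third entry, binx, that is hypothetical and does not appear in passwd.test; adding the colon that ends the first field narrows the match to the user bin alone.

```shell
# Two lines from passwd.test plus one hypothetical entry (binx).
entries='root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
binx:*:20:20:hypothetical user:/home/binx:'

# ^bin selects every line that starts with bin, including binx.
anchored=$(printf '%s\n' "$entries" | grep -n '^bin')

# ^bin: also requires the field separator right after the name.
exact=$(printf '%s\n' "$entries" | grep -n '^bin:')

echo "$anchored"
echo "$exact"
```

In both commands the single quotes pass the caret to grep untouched, and -n prefixes each selected line with its line number.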

The following is a summary of the grep command:

grep - Search for text and display results.

Options
	-c	Return the number of matching lines without showing you the text.
	-h	Show the text with no reference to file names.
	-i	Ignore the case when searching.
	-l	Return the names of files containing a match without showing you the text.
	-n	Return the line number of the text searched for in a file as well as the text itself.
	-v	Return the lines that do not match the text you searched for.
	-E	Treat the pattern as one or more extended regular expressions (same as egrep).
	-F	Treat the pattern as one or more fixed strings (same as fgrep).
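
A few of these options can be tried without a file by piping text into grep. This is a minimal sketch, assuming four lines copied from passwd.test; the variable names are chosen only for this illustration.

```shell
# Four lines in the passwd.test format shown earlier.
entries='root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:'

# -c prints only the number of lines that contain bin.
count=$(printf '%s\n' "$entries" | grep -c bin)

# -v prints the lines that do not contain bin.
missing=$(printf '%s\n' "$entries" | grep -v bin)

# -i ignores case, so BIN selects the same lines as bin.
nocase=$(printf '%s\n' "$entries" | grep -i -c BIN)

echo "$count"
echo "$missing"
echo "$nocase"
```

Note that -c counts lines, not occurrences: the bin entry contains the string twice but adds only one to the count.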

Manual Pages for Some Commands Used in Chapter 6

The following are the HP-UX manual pages for many of the commands used in the chapter. Commands often differ among UNIX variants, so you may find differences in the options or other areas for some commands; however, the following manual pages serve as an excellent reference.

awk

awk - Pattern-processing language.

awk(1)                                                                awk(1)

NAME
      awk - pattern-directed scanning and processing language

SYNOPSIS
      awk [-Ffs] [-v var=value] [program | -f progfile ...] [file ...]

DESCRIPTION
      awk scans each input file for lines that match any of a set of patterns specified literally in program or in one or more files specified as -f progfile. With each pattern there can be an associated action that is to be performed when a line in a file matches the pattern. Each line is matched against the pattern portion of every pattern-action statement, and the associated action is performed for each matched pattern. The file name - means the standard input. Any file of the form var=value is treated as an assignment, not a filename. An assignment is evaluated at the time it would have been opened if it were a filename, unless the -v option is used.

      An input line is made up of fields separated by white space, or by regular expression FS. The fields are denoted $1, $2, ...; $0 refers to the entire line.

   Options
      awk recognizes the following options and arguments:

      -F fs          Specify regular expression used to separate fields. The default is to recognize space and tab characters, and to discard leading spaces and tabs. If the -F option is used, leading input field separators are no longer discarded.

      -f progfile    Specify an awk program file. Up to 100 program files can be specified. The pattern-action statements in these files are executed in the same order as the files were specified.

      -v var=value   Cause var=value assignment to occur before the BEGIN action (if it exists) is executed.

   Statements
      A pattern-action statement has the form:

           pattern { action }

      A missing { action } means print the line; a missing pattern always matches. Pattern-action statements are separated by new-lines or semicolons.

      An action is a sequence of statements.
      A statement can be one of the following:

           if(expression) statement [else statement]
           while(expression) statement
           for(expression;expression;expression) statement
           for(var in array) statement
           do statement while(expression)
           break
           continue
           {[statement ...]}
           expression                    # commonly var=expression
           print[expression-list] [> expression]
           printf format [, expression-list] [> expression]
           return [expression]
           next                          # skip remaining patterns on this input line.
           delete array [expression]     # delete an array element.
           exit [expression]             # exit immediately; status is expression.

      Statements are terminated by semicolons, newlines or right braces. An empty expression-list stands for $0. String constants are quoted (""), with the usual C escapes recognized within. Expressions take on string or numeric values as appropriate, and are built using the operators +, -, *, /, %, ^ (exponentiation), and concatenation (indicated by a blank). The operators ++, --, +=, -=, *=, /=, %=, ^=, **=, >, >=, <, <=, ==, !=, and ?: are also available in expressions. Variables can be scalars, array elements (denoted x[i]) or fields. Variables are initialized to the null string. Array subscripts can be any string, not necessarily numeric (this allows for a form of associative memory). Multiple subscripts such as [i,j,k] are permitted. The constituents are concatenated, separated by the value of SUBSEP.

      The print statement prints its arguments on the standard output (or on a file if >file or >>file is present or on a pipe if |cmd is present), separated by the current output field separator, and terminated by the output record separator. file and cmd can be literal names or parenthesized expressions. Identical string values in different statements denote the same open file. The printf statement formats its expression list according to the format (see printf(3)).
   Built-In Functions
      The built-in function close(expr) closes the file or pipe expr opened by a print or printf statement or a call to getline with the same string-valued expr. This function returns zero if successful, otherwise, it returns non-zero.

      The customary functions exp, log, sqrt, sin, cos, atan2 are built in. Other built-in functions are:

      blength[([s])]         Length of its associated argument (in bytes) taken as a string, or of $0 if no argument.

      length[([s])]          Length of its associated argument (in characters) taken as a string, or of $0 if no argument.

      rand()                 Returns a random number between zero and one.

      srand([expr])          Sets the seed value for rand, and returns the previous seed value. If no argument is given, the time of day is used as the seed value; otherwise, expr is used.

      int(x)                 Truncates to an integer value.

      substr(s, m[, n])      Return the at most n-character substring of s that begins at position m, numbering from 1. If n is omitted, the substring is limited by the length of string s.

      index(s, t)            Return the position, in characters, numbering from 1, in string s where string t first occurs, or zero if it does not occur at all.

      match(s, ere)          Return the position, in characters, numbering from 1, in string s where the extended regular expression ere occurs, or 0 if it does not. The variables RSTART and RLENGTH are set to the position and length of the matched string.

      split(s, a[, fs])      Splits the string s into array elements a[1], a[2], ..., a[n], and returns n. The separation is done with the regular expression fs, or with the field separator FS if fs is not given.

      sub(ere, repl [, in])  Substitutes repl for the first occurrence of the extended regular expression ere in the string in. If in is not given, $0 is used.

      gsub                   Same as sub except that all occurrences of the regular expression are replaced; sub and gsub return the number of replacements.

      sprintf(fmt, expr, ...)
                             String resulting from formatting expr ... according to the printf(3S) format fmt.

      system(cmd)            Executes cmd and returns its exit status.

      toupper(s)             Converts the argument string s to uppercase and returns the result.

      tolower(s)             Converts the argument string s to lowercase and returns the result.

      The built-in function getline sets $0 to the next input record from the current input file; getline < file sets $0 to the next record from file. getline x sets variable x instead. Finally, cmd | getline pipes the output of cmd into getline; each call of getline returns the next line of output from cmd. In all cases, getline returns 1 for a successful input, 0 for end of file, and -1 for an error.

   Patterns
      Patterns are arbitrary Boolean combinations (with ! || &&) of regular expressions and relational expressions. awk supports Extended Regular Expressions as described in regexp(5). Isolated regular expressions in a pattern apply to the entire line. Regular expressions can also occur in relational expressions, using the operators ~ and !~. /re/ is a constant regular expression; any string (constant or variable) can be used as a regular expression, except in the position of an isolated regular expression in a pattern.

      A pattern can consist of two patterns separated by a comma; in this case, the action is performed for all lines from an occurrence of the first pattern through an occurrence of the second.

      A relational expression is one of the following:

           expression matchop regular-expression
           expression relop expression
           expression in array-name
           (expr,expr,...) in array-name

      where a relop is any of the six relational operators in C, and a matchop is either ~ (matches) or !~ (does not match). A conditional is an arithmetic expression, a relational expression, or a Boolean combination of the two.

      The special patterns BEGIN and END can be used to capture control before the first input line is read and after the last. BEGIN and END do not combine with other patterns.
   Special Characters
      The following special escape sequences are recognized by awk in both regular expressions and strings:

           Escape    Meaning
           \a        alert character
           \b        backspace character
           \f        form-feed character
           \n        new-line character
           \r        carriage-return character
           \t        tab character
           \v        vertical-tab character
           \nnn      1- to 3-digit octal value nnn
           \xhhh     1- to n-digit hexadecimal number

   Variable Names
      Variable names with special meanings are:

      FS         Input field separator regular expression; a space character by default; also settable by option -Ffs.

      NF         The number of fields in the current record.

      NR         The ordinal number of the current record from the start of input. Inside a BEGIN action the value is zero. Inside an END action the value is the number of the last record processed.

      FNR        The ordinal number of the current record in the current file. Inside a BEGIN action the value is zero. Inside an END action the value is the number of the last record processed in the last file processed.

      FILENAME   A pathname of the current input file.

      RS         The input record separator; a newline character by default.

      OFS        The print statement output field separator; a space character by default.

      ORS        The print statement output record separator; a newline character by default.

      OFMT       Output format for numbers (default %.6g). If the value of OFMT is not a floating-point format specification, the results are unspecified.

      CONVFMT    Internal conversion format for numbers (default %.6g). If the value of CONVFMT is not a floating-point format specification, the results are unspecified.

      SUBSEP     The subscript separator string for multi-dimensional arrays; the default value is "\034".

      ARGC       The number of elements in the ARGV array.

      ARGV       An array of command line arguments, excluding options and the program argument numbered from zero to ARGC-1. The arguments in ARGV can be modified or added to; ARGC can be altered. As each input file ends, awk will treat the next non-null element of ARGV, up to the current value of ARGC-1, inclusive, as the name of the next input file. Thus, setting an element of ARGV to null means that it will not be treated as an input file. The name - indicates the standard input. If an argument matches the format of an assignment operand, this argument will be treated as an assignment rather than a file argument.

      ENVIRON    Array of environment variables; subscripts are names. For example, if environment variable V=thing, ENVIRON["V"] produces thing.

      RSTART     The starting position of the string matched by the match function, numbering from 1. This is always equivalent to the return value of the match function.

      RLENGTH    The length of the string matched by the match function.

      Functions can be defined (at the position of a pattern-action statement) as follows:

           function foo(a, b, c) { ...; return x }

      Parameters are passed by value if scalar, and by reference if array name. Functions can be called recursively. Parameters are local to the function; all other variables are global.

      Note that if pattern-action statements are used in an HP-UX command line as an argument to the awk command, the pattern-action statement must be enclosed in single quotes to protect it from the shell. For example, to print lines longer than 72 characters, the pattern-action statement as used in a script (-f progfile command form) is:

           length > 72

      The same pattern action statement used as an argument to the awk command is quoted in this manner:

           awk 'length > 72'

EXTERNAL INFLUENCES
   Environment Variables
      LANG          Provides a default value for the internationalization variables that are unset or null. If LANG is unset or null, the default value of "C" (see lang(5)) is used. If any of the internationalization variables contains an invalid setting, awk will behave as if all internationalization variables are set to "C". See environ(5).
      LC_ALL        If set to a non-empty string value, overrides the values of all the other internationalization variables.

      LC_CTYPE      Determines the interpretation of text as single and/or multi-byte characters, the classification of characters as printable, and the characters matched by character class expressions in regular expressions.

      LC_NUMERIC    Determines the radix character used when interpreting numeric input, performing conversion between numeric and string values and formatting numeric output. Regardless of locale, the period character (the decimal-point character of the POSIX locale) is the decimal-point character recognized in processing awk programs (including assignments in command-line arguments).

      LC_COLLATE    Determines the locale for the behavior of ranges, equivalence classes and multi-character collating elements within regular expressions.

      LC_MESSAGES   Determines the locale that should be used to affect the format and contents of diagnostic messages written to standard error and informative messages written to standard output.

      NLSPATH       Determines the location of message catalogues for the processing of LC_MESSAGES.

      PATH          Determines the search path when looking for commands executed by system(cmd), or input and output pipes.

      In addition, all environment variables will be visible via the awk variable ENVIRON.

   International Code Set Support
      Single- and multi-byte character code sets are supported except that variable names must contain only ASCII characters and regular expressions must contain only valid characters.

DIAGNOSTICS
      awk supports up to 199 fields ($1, $2, ..., $199) per record.
EXAMPLES
      Print lines longer than 72 characters:

           length > 72

      Print first two fields in opposite order:

           { print $2, $1 }

      Same, with input fields separated by comma and/or blanks and tabs:

           BEGIN { FS = ",[ \t]*|[ \t]+" }
                 { print $2, $1 }

      Add up first column, print sum and average:

                 { s += $1 }
           END   { print "sum is", s, " average is", s/NR }

      Print all lines between start/stop pairs:

           /start/, /stop/

      Simulate echo command (see echo(1)):

           BEGIN { # Simulate echo(1)
                   for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
                   printf "\n"
                   exit }

AUTHOR
      awk was developed by AT&T, IBM, OSF, and HP.

SEE ALSO
      lex(1), sed(1).

      A. V. Aho, B. W. Kernighan, P. J. Weinberger: The AWK Programming Language, Addison-Wesley, 1988.

STANDARDS CONFORMANCE
      awk: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2

grep

grep - Command to match a specified pattern.

grep(1)                                                              grep(1)

NAME
      grep, egrep, fgrep - search a file for a pattern

SYNOPSIS
   Plain call with pattern
      grep [-E|-F] [-c|-l|-q] [-insvx] pattern [file ...]

   Call with (multiple) -e pattern
      grep [-E|-F] [-c|-l|-q] [-binsvx] -e pattern... [-e pattern] ... [file ...]

   Call with -f file
      grep [-E|-F] [-c|-l|-q] [-insvx] [-f pattern_file] [file ...]

   Obsolescent:
      egrep [-cefilnsv] [expression] [file ...]
      fgrep [-cefilnsvx] [strings] [file ...]

DESCRIPTION
      The grep command searches the input text files (standard input default) for lines matching a pattern. Normally, each line found is copied to the standard output. grep supports the Basic Regular Expression syntax (see regexp(5)). The -E option (egrep) supports Extended Regular Expression (ERE) syntax (see regexp(5)). The -F option (fgrep) searches for fixed strings using the fast Boyer-Moore string searching algorithm. The -E and -F options treat newlines embedded in the pattern as alternation characters. A null expression or string matches every line.

      The forms egrep and fgrep are maintained for backward compatibility. The use of the -E and -F options is recommended for portability.

   Options
      -E               Extended regular expressions. Each pattern specified is a sequence of one or more EREs. The EREs can be separated by newline characters or given in separate -e expression options. A pattern matches an input line if any ERE in the sequence matches the contents of the input line without its trailing newline character. The same functionality is obtained by using egrep.

      -F               Fixed strings. Each pattern specified is a sequence of one or more strings. Strings can be separated by newline characters or given in separate -e expression options. A pattern matches an input line if the line contains any of the strings in the sequence. The same functionality is obtained by using fgrep.

      -b               Each line is preceded by the block number on which it was found. This is useful in locating disk block numbers by context. Block numbers are calculated by dividing by 512 the number of bytes that have been read from the file and rounding down the result.

      -c               Only a count of matching lines is printed.

      -e expression    Same as a simple expression argument, but useful when the expression begins with a hyphen (-). Multiple -e options can be used to specify multiple patterns; an input line is selected if it matches any of the specified patterns.

      -f pattern_file  The regular expression (grep and grep -E) or strings list (grep -F) is taken from the pattern_file.

      -i               Ignore uppercase/lowercase distinctions during comparisons.

      -l               Only the names of files with matching lines are listed (once), separated by newlines. If standard input is searched, a path name of - is listed.

      -n               Each line is preceded by its relative line number in the file starting at 1. The line number is reset for each file searched. This option is ignored if -c, -b, -l, or -q is specified.

      -q               (Quiet) Do not write anything to the standard output, regardless of matching lines. Exit with zero status upon finding the first matching line. Overrides any options that would produce output.

      -s               Error messages produced for nonexistent or unreadable files are suppressed.

      -v               All lines but those matching are printed.

      -x               (eXact) Matches are recognized only when the entire input line matches the fixed string or regular expression.

      In all cases in which output is generated, the file name is output if there is more than one input file. Care should be taken when using the characters $, *, [, ^, |, (, ), and \ in expression, because they are also meaningful to the shell. It is safest to enclose the entire expression argument in single quotes ('...').

EXTERNAL INFLUENCES
   Environment Variables
      LANG determines the locale to use for the locale categories when both LC_ALL and the corresponding environment variable (beginning with LC_) do not specify a locale. If LANG is not specified or is set to the empty string, a default of C (see lang(5)) is used.
      LC_ALL determines the locale to use to override any values for locale categories specified by the settings of LANG or any environment variables beginning with LC_.

      LC_COLLATE determines the collating sequence used in evaluating regular expressions.

      LC_CTYPE determines the interpretation of text as single byte and/or multi-byte characters, the classification of characters as letters, the case information for the -i option, and the characters matched by character class expressions in regular expressions.

      LC_MESSAGES determines the language in which messages are displayed.

      If any internationalization variable contains an invalid setting, the commands behave as if all internationalization variables are set to C. See environ(5).

   International Code Set Support
      Single-byte and multi-byte character code sets are supported.

RETURN VALUE
      Upon completion, grep returns one of the following values:

           0    One or more matches found.
           1    No match found.
           2    Syntax error or inaccessible file (even if matches were found).

EXAMPLES
      In the Bourne shell (sh(1)) the following example searches two files, finding all lines containing occurrences of any of four strings:

           grep -F 'if
           then
           else
           fi' file1 file2

      Note that the single quotes are necessary to tell grep -F when the strings have ended and the file names have begun.

      For the C shell (see csh(1)) the following command can be used:

           grep -F 'if\
           then\
           else\
           fi' file1 file2

      To search a file named address containing the following entries:

           Ken 112 Warring St. Apt. A
           Judy 387 Bowditch Apt. 12
           Ann 429 Sixth St.

      the command:

           grep Judy address

      prints:

           Judy 387 Bowditch Apt. 12

      To search a file for lines that contain either a Dec or Nov, use either of the following commands:

           grep -E '[Dd]ec|[Nn]ov' file
           egrep -i 'dec|nov' file

      Search all files in the current directory for the string xyz:

           grep xyz *

      Search all files in the current directory subtree for the string xyz, and ensure that no error occurs due to file name expansion exceeding system argument list limits:

           find . -type f -print |xargs grep xyz

      The previous example does not print the name of files where string xyz appears. To force grep to print file names, add a second argument to the grep command portion of the command line:

           find . -type f -print |xargs grep xyz /dev/null

      In this form, the first file name is that produced by find, and the second file name is the null file.

WARNINGS
      (XPG4 only.) If the -q option is specified, the exit status will be zero if an input line is selected, even if an error was detected. Otherwise, default actions will be performed.

SEE ALSO
      sed(1), sh(1), regcomp(3C), environ(5), lang(5), regexp(5).

STANDARDS CONFORMANCE
      grep: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
      egrep: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
      fgrep: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2

sed

sed - Stream text editor.

sed(1)                                                                sed(1)

NAME
      sed - stream text editor

SYNOPSIS
      sed [-n] script [file ...]
      sed [-n] [-e script] ... [-f script_file] ... [file ...]

DESCRIPTION
      sed copies the named text files (standard input default) to the standard output, edited according to a script containing up to 100 commands. Only complete input lines are processed. Any input text at the end of a file that is not terminated by a new-line character is ignored.

   Options
      sed recognizes the following options:

      -f script_file   Take script from file script_file.

      -e script        Edit according to script. If there is just one -e option and no -f options, the flag -e can be omitted.

      -n               Suppress the default output.

      sed interprets all -escript and -fscript_file arguments in the order given. Use caution, if mixing -e and -f options, to avoid unpredictable or incorrect results.

   Command Scripts
      A script consists of editor commands, one per line, of the following form:

           [address [, address]] function [arguments]

      In normal operation, sed cyclically copies a line of input into a pattern space (unless there is something left after a D command), applies in sequence all commands whose addresses select that pattern space, and, at the end of the script, copies the pattern space to the standard output (except under -n) and deletes the pattern space.

      Some of the commands use a hold space to save all or part of the pattern space for subsequent retrieval.
Command Addresses       An address is either a decimal number that counts input lines       cumulatively across files,a $which addresses the last line of input,       or a context address; that is, a /regular expression/ in the style of       ed(1) modified thus:            -  In a context address, the construction \?regular expression?,               where ? is any character, is identical to /regular               expression/. Note that in the context address \xabc\xdefx,               the second x stands for itself, so that the regular expression               is abcxdef.            -  The escape sequence \n matches a new-line character embedded               in the pattern space.            -  A period (.) matches any character except the terminal new              line of the pattern space.            -  A command line with no addresses selects every pattern space.            -  A command line with one address selects each pattern space               that matches the address.            -  A command line with two addresses selects the inclusive range               from the first pattern space that matches the first address               through the next pattern space that matches the second (if the               second address is a number less than or equal to the line               number first selected, only one line is selected). Thereafter               the process is repeated, looking again for the first address.       sed supports Basic Regular Expression syntax (see regexp(5)).       Editing commands can also be applied to only non-selected pattern       spaces by use of the negation function ! (described below).     Command Functions       In the following list of functions, the maximum number of permissible       addresses for each function is indicated in parentheses. Other       function elements are interpreted as follows:            text        One or more lines, all but the last of which end with                        \ to hide the new-line. 
Backslashes in text are                        treated like backslashes in the replacement string of                        an s command, and can be used to protect initial                        blanks and tabs against the stripping that is done on                        every script line.            rfile       Must terminate the command line, and must be preceded                        by exactly one blank.            wfile       Must terminate the command line, and must be preceded                        by exactly one blank. Each wfile is created before                        processing begins. There can be at most 10 distinct                        wfile arguments.       sed recognizes the following functions:       (1)a\       text        Append. Place text on the output before reading next                   input line.       (2)b label  Branch to the : command bearing label. If no label is                   specified, branch to the end of the script.       (2)c\       text        Change. Delete the pattern space. With 0 or 1 address or                   at the end of a 2-address range, place text on the output.                   Start the next cycle.       (2)d        Delete pattern space and start the next cycle.       (2)D        Delete initial segment of pattern space through first                   new-line and start the next cycle.       (2)g        Replace contents of the pattern space with contents of the                   hold space.       (2)G        Append contents of hold space to the pattern space.       (2)h        Replace contents of the hold space with contents of the                   pattern space.       (2)H        Append the contents of the pattern space to the hold                   space.       (1)i\       text        Insert. Place text on the standard output.       (2)l        List the pattern space on the standard output in an                   unambiguous form. 
Non-printing characters are spelled in                   three-digit octal number format (with a preceding                   backslash), and long lines are folded.       (2)n        Copy the pattern space to the standard output if the                   default output has not been suppressed (by the -n option                   on the command line or the #n command in the script file).                   Replace the pattern space with the next line of input.       (2)N        Append the next line of input to the pattern space with an                   embedded new-line. (The current line number changes.)       (2)p        Print. Copy the pattern space to the standard output.       (2)P        Copy the initial segment of the pattern space through the                   first new-line to the standard output.       (1)q        Quit. Branch to the end of the script. Do not start a                   new cycle.       (1)r        rfile Read contents of rfile and place on output before reading                   the next input line.       (2)s/regular expression/replacement/flags                   Substitute replacement string for instances of regular                   expression in the pattern space. Any character can be                   used instead of /. For a fuller description see ed(1).                   flags is zero or more of:                      n           n=1-2048 (LINE_MAX). Substitute for just                                  the nth occurrence of regular expression in                                  the pattern space.                      g           Global. Substitute for all non-overlapping                                  instances of regular expression rather than                                  just the first one.                      
p           Print the pattern space if a replacement                                  was made and the default output has been                                  suppressed (by the -n option on the command                                  line or the #n command in the script file).                      w wfile     Write. Append the pattern space to wfile                                  if a replacement was made.       (2)t label  Test. Branch to the : command bearing the label if any                   substitutions have been made since the most recent reading                   of an input line or execution of a t. If label is empty,                   branch to the end of the script.       (2)w wfile  Write. Append the pattern space to wfile.       (2)x        Exchange the contents of the pattern and hold spaces.       (2)y/string1/string2/                   Transform. Replace all occurrences of characters in                   string1 with the corresponding character in string2. The                   lengths of string1 and string2 must be equal.       (2)! function                   Don't. Apply the function (or group, if function is {)                   only to lines not selected by the address or addresses.       (0): label  This command does nothing; it bears a label for b and t                   commands to branch to.       (1)=        Place the current line number on the standard output as a                   line.       (2){        Execute the following commands through a matching } only                   when the pattern space is selected. The syntax is:                   { cmd1                   cmd2                   cmd3                    .                    .                    .                   }       (0)         An empty command is ignored.       
(0)#        If a # appears as the first character on the first line of                   a script file, that entire line is treated as a comment                   with one exception: If the character after the # is an n,                   the default output is suppressed. The rest of the line                   after #n is also ignored. A script file must contain at                   least one non-comment line.  EXTERNAL INFLUENCES     Environment Variables       LANG provides a default value for the internationalization variables       that are unset or null. If LANG is unset or null, the default value of       "C" (see lang(5)) is used. If any of the internationalization       variables contains an invalid setting, sed will behave as if all       internationalization variables are set to "C". See environ(5).       LC_ALL If set to a non-empty string value, overrides the values of all       the other internationalization variables.       LC_CTYPE determines the interpretation of text as single and/or       multi-byte characters, the classification of characters as printable,       and the characters matched by character class expressions in regular       expressions.       LC_MESSAGES determines the locale that should be used to affect the       format and contents of diagnostic messages written to standard error       and informative messages written to standard output.       NLSPATH determines the location of message catalogues for the       processing of LC_MESSAGES.     International Code Set Support       Single- and multi-byte character code sets are supported.  
EXAMPLES       Make a simple substitution in a file from the command line or from a       shell script, changing abc to xyz:            sed 's/abc/xyz/' file1 >file1.out       Same as above but use shell or environment variables var1 and var2 in       search and replacement strings:            sed "s/$var1/$var2/" file1 >file1.out       or            sed 's/'$var1'/'$var2'/' file1 >file1.out       Multiple substitutions in a single command:            sed -e 's/abc/xyz/' -e 's/lmn/rst/' file1       or            sed -e 's/abc/xyz/' \            -e 's/lmn/rst/' \            file1 >file1.out  WARNINGS       sed limits command scripts to a total of not more than 100 commands.       The hold space is limited to 8192 characters.       sed processes only text files. See the glossary for a definition of       text files and their limitations.  AUTHOR       sed was developed by OSF and HP.  SEE ALSO       awk(1), ed(1), grep(1), environ(5), lang(5), regexp(5).       sed: A Non-Interactive Streaming Editor tutorial in the Text       Processing Users Guide.  STANDARDS CONFORMANCE       sed: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
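The manual page examples cover only the s function, so here is a short sketch that exercises a few of the other pieces it describes: line-number, $, and context addresses, the ! negation, the nth-occurrence flag of s, the y function, and the hold space. The input file and its contents are made up for illustration, and the sketch assumes mktemp is available on your system.

```shell
# Sample input (hypothetical contents).
input=$(mktemp)
printf 'alpha\nbeta\ngamma\ndelta\n' > "$input"

# Two line-number addresses with -n and p: print only lines 2 through 3.
range=$(sed -n '2,3p' "$input")                  # beta, gamma

# The $ address selects the last line of input. The single quotes stop the
# shell from treating $p as a variable.
last=$(sed -n '$p' "$input")                     # delta

# A context address with !: delete every line that does NOT start with b.
only_b=$(sed '/^b/!d' "$input")                  # beta

# The numeric flag of s replaces just the nth occurrence on each line.
second=$(printf 'a-a-a\n' | sed 's/a/X/2')       # a-X-a

# y maps each character in the first string to the one in the second.
upper=$(sed 'y/abc/ABC/' "$input")               # AlphA, BetA, gAmmA, deltA

# Hold space: on every line but the first, append the hold space to the
# pattern space (G), then save the result (h); print once at the last
# line. The effect is to reverse the order of the lines.
reversed=$(sed -n -e '1!G' -e 'h' -e '$p' "$input")

printf '%s\n' "$range" "$last" "$only_b" "$second" "$upper" "$reversed"
rm -f "$input"
```

The hold space example keeps the whole file in memory, so with the 8192-character hold space limit mentioned in the WARNINGS section it is suitable only for small inputs on such a system.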
