Chapter 5. UNIX Tools - split, wc, sort, cmp, diff, comm, dircmp, cut, paste, join, and tr

CONTENTS


  •  Not All Commands on All UNIX Variants
  •  split
  •  wc
  •  sort
  •  cmp, diff, and comm
  •  dircmp
  •  cut
  •  paste
  •  tr
  •  Manual Pages for Some Commands Used in Chapter 5

Not All Commands on All UNIX Variants

A variety of commands are covered in this chapter, including:

  • split, wc, sort, cmp, diff, comm, dircmp, cut, paste, join, and tr commands

I cover many useful and enjoyable commands in this chapter. Not all of the commands, however, are available on every UNIX variant. If a specific command is not available on your system, then you probably have a similar command or can combine more than one command to achieve the desired result.

split

Some files are just too long. The file listing that we looked at earlier may be more easily managed if split into multiple files. We can use the split command to break listing into files of 25 lines each, as shown in Figure 5-1:

Figure 5-1. split Command

graphics/05fig01.gif

Note that the split command produced several files from listing called xaa, xab, and so on. The -l option is used to specify the number of lines in files produced by split.

Here is a summary of the split command:

split - Split a file into multiple files.

Options

-l line_count    Split the file into files with line_count lines per file.
-b n             Split the file into files with n bytes per file.
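
The steps shown in Figure 5-1 can be sketched as follows. This is only a minimal sketch: the file name listing comes from the text, but its contents are generated here by a loop purely for illustration, and the scratch directory created with mktemp is an assumption.

```shell
# Work in a scratch directory so the xaa, xab, ... files do not clutter anything.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-in for the "listing" file: 60 one-word lines.
i=1
while [ "$i" -le 60 ]; do
    echo "file$i"
    i=$((i + 1))
done > listing

# Split into pieces of 25 lines each; split names the pieces xaa, xab, xac.
split -l 25 listing

pieces=$(ls x??)
last_piece_lines=$(wc -l < xac)
echo "$pieces"
echo "lines in xac: $last_piece_lines"
```

With 60 input lines, the first two pieces hold 25 lines each and the last piece holds the remainder.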

wc

We know that we have split listing into separate files of 25 lines each, but how many lines were in listing originally? How about the number of words in listing? Those of us who get paid by the word for some of the articles we write often want to know. How about the number of characters in a file? The wc command can produce a word, line, and character count for you. Figure 5-2 shows issuing the wc command with the -wlc options, which produce a count of words with the -w option, lines with the -l option, and characters with the -c option.

Figure 5-2. wc Command

graphics/05fig02.gif


Notice that the number of words and lines produced by wc is the same for the file listing. The reason is that each line contains exactly one word. When we display the words, lines, and characters with the wc command for the text file EMACS.tutorial, we can see that the number of words is 6251, the number of lines is 825, and the number of characters is 34491. In a text file, in this case a tutorial, you would expect many more words than lines.

Here is a summary of the wc command:

wc - Produce a count of words, lines, and characters.

Options

-l    Print the number of lines in a file.
-w    Print the number of words in a file.
-c    Print the number of characters in a file.
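
As a small illustration of the three counts, here is a sketch that runs wc on a made-up three-line file (the sample text and the temporary file are assumptions, not the listing or EMACS.tutorial files from the figures). On POSIX systems -c counts bytes, which is the same as characters for single-byte text such as this.

```shell
# Hypothetical sample text: 3 lines, 7 words.
sample=$(mktemp)
printf 'one two three\nfour five\nsix seven\n' > "$sample"

# Reading from standard input keeps the file name out of wc's output.
lines=$(wc -l < "$sample")
words=$(wc -w < "$sample")
chars=$(wc -c < "$sample")
echo "lines=$lines words=$words chars=$chars"

rm -f "$sample"
```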

sort

Sometimes the contents of files are not sorted in the way you would like. You can use the sort command to sort files with a variety of options.


You may find, as you continue to use your UNIX system, that your system administrator is riding you about the amount of disk space that you are consuming. You can monitor your disk usage with the du command. Figure 5-3 shows creating a file called disk_space that lists the amount of disk space consumed by files and directories, and then shows the first 20 lines of the file:

Figure 5-3. sort Command Example #1

graphics/05fig03.gif


Notice that the result is sorted alphabetically. In many cases, this is what you want. If the file were not sorted alphabetically, you could use the sort command to do so. In this case, we don't care as much about seeing entries in alphabetical order as we do in numeric order, that is, the files and directories that are consuming the most space. Figure 5-4 shows sorting the file disk_space numerically with the -n option and reversing the order of the sort with the -r option so that the biggest numbers appear first. We then specify the output file name with the -o option.

Figure 5-4. sort Command Example #2

graphics/05fig04.gif
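
The numeric, reversed sort described above can be sketched on a few made-up lines. The sizes and paths below are hypothetical du-style data, and the output file name disk_space_numeric is an assumption chosen for this sketch.

```shell
workdir=$(mktemp -d)

# Hypothetical du-style output: size in blocks, a tab, then the path.
printf '12\t./bin\n4\t./notes\n250\t./mail\n38\t./src\n' > "$workdir/disk_space"

# -n compares the leading numbers, -r puts the largest first,
# -o names the output file instead of writing to standard output.
sort -n -r -o "$workdir/disk_space_numeric" "$workdir/disk_space"

biggest=$(head -n 1 "$workdir/disk_space_numeric")
cat "$workdir/disk_space_numeric"

rm -rf "$workdir"
```

Without -n, the line beginning with 4 would sort after the line beginning with 38, because the comparison would be made character by character rather than by value.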

What if the items being sorted had many more fields than our two-column disk usage example? Let's go back to the passwd.test file for a more complex sort. Let's cat passwd.test so we can again see its contents:


# cat passwd.test
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
sync:*:5:0:sync:/sbin:/bin/sync
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
halt:*:7:0:halt:/sbin:/sbin/halt
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
man:*:15:15:Manuals Owner:/:
nobody:*:65534:65534:Nobody:/:/bin/false
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash


Now let's use sort to determine which users are in the same group. Fields are separated in passwd.test by a colon (:). The fourth field is the group to which a user belongs. For instance, bin is in group 1, daemon in group 2, and so on. To sort by group, we would have to specify three options to the sort command. The first is to specify the delimiter (or field separator) of colon (:) using the -t option. Next, we would have to specify the field on which we wish to sort with the -k option. Finally, we want a numeric sort, so use the -n option. The following example shows a numeric sort of the passwd.test file by the fourth field:

# sort -t: -k4 -n passwd.test
halt:*:7:0:halt:/sbin:/sbin/halt
operator:*:11:0:operator:/root:
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
sync:*:5:0:sync:/sbin:/bin/sync
bin:*:1:1:bin:/bin:
daemon:*:2:2:daemon:/sbin:
adm:*:3:4:adm:/var/adm:
lp:*:4:7:lp:/var/spool/lpd:
shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
mail:*:8:12:mail:/var/spool/mail:
news:*:9:13:news:/var/spool/news:
uucp:*:10:14:uucp:/var/spool/uucp:
man:*:15:15:Manuals Owner:/:
gopher:*:13:30:gopher:/usr/lib/gopher-data:
ftp:*:14:50:FTP User:/home/ftp:
col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash
games:*:12:100:games:/usr/games:
nobody:*:65534:65534:Nobody:/:/bin/false
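
In the output above, users that share a group are ordered among themselves by the rest of the line. As a possible refinement, the sketch below restricts the first key to field 4 alone and adds the user ID in field 3 as a second numeric key. The five entries are copied from the passwd.test listing; the temporary file and the use of cut to trim the output are assumptions made for this sketch.

```shell
passwd_sample=$(mktemp)

# A few entries in passwd format, taken from the passwd.test listing.
cat > "$passwd_sample" <<'EOF'
root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
bin:*:1:1:bin:/bin:
sync:*:5:0:sync:/sbin:/bin/sync
halt:*:7:0:halt:/sbin:/sbin/halt
operator:*:11:0:operator:/root:
EOF

# Primary key: field 4 only (group ID), compared numerically.
# Secondary key: field 3 only (user ID), compared numerically,
# which orders the users within each group.
by_group=$(sort -t: -k4,4n -k3,3n "$passwd_sample" | cut -d: -f1,3,4)
echo "$by_group"

rm -f "$passwd_sample"
```

Writing -k4,4 rather than -k4 matters because a key given as -k4 runs from field 4 to the end of the line.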


The following is a summary of the sort command.

sort - Sort lines of files (alphabetically by default).

Options

-b         Ignore leading spaces and tabs.
-c         Check whether files are already sorted, and if so, do nothing.
-d         Ignore punctuation and sort in dictionary order.
-f         Ignore the case of entries when sorting.
-i         Ignore nonprinting characters when sorting.
-ks        Use field s as the field on which to base the sort.
-m         Merge sorted files.
-n         Sort in numeric order.
-o file    Specify the output file name rather than write to standard output.
-r         Reverse the order of the sort by starting with the last letter of the alphabet or with the largest number, as we did in the example.
+n         Skip n fields or columns before sorting.

cmp, diff, and comm

A fact of life is that as you go about editing files, you may occasionally lose track of what changes you have made to which files. You may then need to make comparisons of files. Let's take a look at three such commands, cmp, diff, and comm, and see how they compare files.


Let's assume that we have modified a script called llsum. The unmodified version of llsum was saved as llsum.orig. Using the head command, we can view the first 20 lines of llsum and then the first 20 lines of llsum.orig:

# head -20 llsum
#
#!/bin/sh
# Displays a truncated long listing (ll) and
# displays size statistics
# of the files in the listing.
ll $* | \
awk ' BEGIN { x=i=0; printf "%-25s%-10s%8s%8s\n",\
                       "FILENAME","OWNER","SIZE","TYPE" }
       $1 ~ /^[-dlps]/ {# line format for normal files
               printf "%-25s%-10s%8d",$9,$3,$5
               x = x + $5
               i++
                       }
       $1 ~ /^-/ { printf "%8s\n","file" } # standard file types
       $1 ~ /^d/ { printf "%8s\n","dir" }
       $1 ~ /^l/ { printf "%8s\n","link" }
       $1 ~ /^p/ { printf "%8s\n","pipe" }
       $1 ~ /^s/ { printf "%8s\n","socket" }
       $1 ~ /^[bc]/ { # line format for device files
               printf "%-25s%-10s%8s%8s\n",$10,$3,"","dev"
                  }
#
# head -20 llsum.orig
#
#!/bin/sh
# Displays a truncated long listing (ll) and
# displays size statistics
# of the files in the listing.
ll $* | \
awk ' BEGIN { x=i=0; printf "%-16s%-10s%8s%8s\n",\
                       "FILENAME","OWNER","SIZE","TYPE" }
       $1 ~ /^[-dlps]/ {# line format for normal files
               printf "%-16s%-10s%8d",$9,$3,$5
               x = x + $5
               i++
                       }
       $1 ~ /^-/ { printf "%8s\n","file" } # standard file types
       $1 ~ /^d/ { printf "%8s\n","dir" }
       $1 ~ /^l/ { printf "%8s\n","link" }
       $1 ~ /^p/ { printf "%8s\n","pipe" }
       $1 ~ /^s/ { printf "%8s\n","socket" }
       $1 ~ /^[bc]/ { # line format for device files
               printf "%-16s%-10s%8s%8s\n",$10,$3,"","dev"
                  }

I'm not sure what changes I made to llsum.orig to improve it, so we can first use cmp to see whether indeed differences exist between the files.


$
$ cmp llsum llsum.orig
llsum llsum.orig differ: char 154, line 6
$

cmp does not report back much information, only that character 154 in the file at line 6 is different in the two files. There may indeed be other differences, but this is all we know about so far.

To get information about all of the differences in the two files, we could use the -l option to cmp:

$ cmp -l llsum llsum.orig
   154  62  61
   155  65  66
   306  62  61
   307  65  66
   675  62  61
   676  65  66

This is not all that useful an output, however. We want to see not only the position of the differences, but also the differences themselves.
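
For reference, the second and third columns that cmp -l prints are octal byte values, so they can be decoded by hand. The sketch below builds two hypothetical one-line files that differ only in "25" versus "16" (the file names new and old are assumptions) and then uses printf octal escapes to turn the values 62 65 and 61 66 back into characters.

```shell
workdir=$(mktemp -d)

# Two hypothetical one-line files that differ only in "25" versus "16".
printf 'printf "%%-25s"\n' > "$workdir/new"
printf 'printf "%%-16s"\n' > "$workdir/old"

# cmp exits with status 1 when the files differ, so "|| true" keeps a
# script running under "set -e" from stopping here.
diffs=$(cmp -l "$workdir/new" "$workdir/old" || true)
echo "$diffs"

# Octal 62 and 65 are the characters 2 and 5; octal 61 and 66 are 1 and 6.
new_chars=$(printf '\062\065')
old_chars=$(printf '\061\066')
echo "first file: $new_chars  second file: $old_chars"

rm -rf "$workdir"
```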

Now we can use diff to describe all the differences in the two files:


$ diff llsum llsum.orig
6c6
< awk ' BEGIN { x=i=0; printf "%-25s%-10s%8s%8s\n",\
---
> awk ' BEGIN { x=i=0; printf "%-16s%-10s%8s%8s\n",\
9c9
<            printf "%-25s%-10s%8d",$9,$3,$5
---
>            printf "%-16s%-10s%8d",$9,$3,$5
19c19
<            printf "%-25s%-10s%8s%8s\n",$10,$3,"","dev"
---
>            printf "%-16s%-10s%8s%8s\n",$10,$3,"","dev"
$

We now know that lines 6, 9, and 19 differ between the two files, and diff lists each of these lines for us. From this listing, we can see that the number 16 in llsum.orig was changed to 25 in the newer llsum file, and this accounts for all of the differences in the two files. The "less than" sign (<) precedes lines from the first file, in this case llsum. The "greater than" sign (>) precedes lines from the second file, in this case llsum.orig. I widened the first field from 16 to 25 characters because I wanted the second group of information produced by llsum to start at column 25. The second group of information is the owner, as shown in the following example:

$ llsum
FILENAME                 OWNER         SIZE    TYPE
README                   denise         810    file
backup_files             denise        3408    file
biography                denise         427    file
cshtest                  denise        1024     dir
gkill                    denise        1855    file
gkill.out                denise         191    file
hostck                   denise         924    file
ifstat                   denise        1422    file
ifstat.int               denise        2147    file
ifstat.out               denise         723    file
introdos                 denise       54018    file
introux                  denise       52476    file
letter                   denise       23552    file
letter.auto              denise       69632    file
letter.auto.recover      denise       71680    file
letter.backup            denise       23552    file
letter.lck               denise          57    file
letter.recover           denise       69632    file
llsum                    denise        1267    file
llsum.orig               denise        1267    file
llsum.out                denise        1657    file
llsum.tomd.out           denise        1356    file
psg                      denise         670    file
psg.int                  denise         802    file
psg.out                  denise         122    file
sam_adduser              denise        1010    file
tdolan                   denise        1024     dir
trash                    denise        4554    file
trash.out                denise         329    file
typescript               denise        2017    file
The files listed occupy 393605 bytes (0.3754 Mbytes)
Average file size is 13120 bytes
$

When we run llsum.orig, the second group of information, which is the owner, clearly starts at column 16 and not column 25:

$ llsum.orig
FILENAME        OWNER         SIZE    TYPE
README          denise         810    file
backup_files    denise        3408    file
biography       denise         427    file
cshtest         denise        1024     dir
gkill           denise        1855    file
gkill.out       denise         191    file
hostck          denise         924    file
ifstat          denise        1422    file
ifstat.int      denise        2147    file
ifstat.out      denise         723    file
introdos        denise       54018    file
introux         denise       52476    file
letter          denise       23552    file
letter.auto     denise       69632    file
letter.auto.rec denise       71680    file
letter.backup   denise       23552    file
letter.lck      denise          57    file
letter.recover  denise       69632    file
llsum           denise        1267    file
llsum.orig      denise        1267    file
llsum.out       denise        1657    file
llsum.tomd.out  denise        1356    file
psg             denise         670    file
psg.int         denise         802    file
psg.out         denise         122    file
sam_adduser     denise        1010    file
tdolan          denise        1024     dir
trash           denise        4554    file
trash.out       denise         329    file
typescript      denise        3894    file
The files listed occupy 395482 bytes (0.3772 Mbytes)
Average file size is 13182 bytes
script done on Mon Dec 11 12:59:18
$
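
The change markers that diff prints can be reproduced on a small scale. This is a minimal sketch with two made-up three-line files (the names new and old and their contents are assumptions) in which only the second line differs.

```shell
workdir=$(mktemp -d)

# Hypothetical files that differ only on line 2.
printf 'alpha\nwidth 25\nomega\n' > "$workdir/new"
printf 'alpha\nwidth 16\nomega\n' > "$workdir/old"

# diff exits with status 1 when it finds differences, so "|| true"
# keeps a script running under "set -e" from stopping here.
changes=$(diff "$workdir/new" "$workdir/old" || true)
echo "$changes"

rm -rf "$workdir"
```

The 2c2 header says that line 2 of the first file must be changed to match line 2 of the second file; the line marked with < comes from the first file, and the line marked with > comes from the second.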


We can compare two sorted files using comm and see the lines that are unique to each file, as well as the lines found in both files. When we compare two files with comm, the lines that are unique to the first file appear in the first column, the lines unique to the second file appear in the second column and the lines contained in both files appear in the third column. Let's go back to the /etc/passwd file to illustrate this comparison. We'll compare two /etc/passwd files, the active /etc/passwd file in use and an old /etc/passwd file from a backup:

# comm /etc/passwd /etc/passwd.backup
                root:PgYQCkVH65hyQ:0:0:root:/root:/bin/bash
                bin:*:1:1:bin:/bin:
                daemon:*:2:2:daemon:/sbin:
                adm:*:3:4:adm:/var/adm:
                lp:*:4:7:lp:/var/spool/lpd:
                sync:*:5:0:sync:/sbin:/bin/sync
                shutdown:*:6:11:shutdown:/sbin:/sbin/shutdown
                halt:*:7:0:halt:/sbin:/sbin/halt
                mail:*:8:12:mail:/var/spool/mail:
                news:*:9:13:news:/var/spool/news:
                uucp:*:10:14:uucp:/var/spool/uucp:
        operator1:*:12:0:operator:/root:
                operator:*:11:0:operator:/root:
games:*:12:100:games:/usr/games:
                gopher:*:13:30:gopher:/usr/lib/gopher-data:
                ftp:*:14:50:FTP User:/home/ftp:
                man:*:15:15:Manuals Owner:/:
                nobody:*:65534:65534:Nobody:/:/bin/false
                col:Wh0yzfAV2qm2Y:100:100:Caldera OpenLinux User:/home/col:/bin/bash

You can see from this output that the user games appears only in the active /etc/passwd file, the user operator1 appears only in the /etc/passwd.backup file, and all of the other entries appear in both files.
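
The three columns can also be selected individually by suppressing the others with the -1, -2, and -3 options. The sketch below uses two short, made-up user lists rather than real passwd files (the file names active and backup are assumptions) and sorts them first, because comm expects sorted input. LC_ALL=C is set so that sort and comm agree on the collating order.

```shell
workdir=$(mktemp -d)

# Hypothetical user lists, sorted so that comm can compare them.
printf 'root\ngames\nbin\noperator\n'     | LC_ALL=C sort > "$workdir/active"
printf 'root\nbin\noperator\noperator1\n' | LC_ALL=C sort > "$workdir/backup"

# Suppress columns 2 and 3: lines found only in the first file.
only_active=$(LC_ALL=C comm -23 "$workdir/active" "$workdir/backup")
# Suppress columns 1 and 3: lines found only in the second file.
only_backup=$(LC_ALL=C comm -13 "$workdir/active" "$workdir/backup")
# Suppress columns 1 and 2: lines found in both files.
in_both=$(LC_ALL=C comm -12 "$workdir/active" "$workdir/backup")

echo "only in active: $only_active"
echo "only in backup: $only_backup"
echo "in both:"
echo "$in_both"

rm -rf "$workdir"
```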


The following is a summary of the cmp and diff commands:

cmp - Compare the contents of two files. The byte position and line number of the first difference between the two files is returned.

Options

-l    Display the byte position and differing characters for all differences within a file.
-s    Work silently; that is, only exit codes are returned.

diff - Compares two files and reports differing lines.

Options

-b    Ignore blanks at the end of a line.
-i    Ignore case differences.
-t    Expand tabs in output to spaces.
-w    Ignore spaces and tabs.

dircmp

Why stop at comparing files? You will probably have many directories in your user area as well. dircmp compares two directories and produces information about the contents of directories.

To begin with, let's perform a long listing of two directories:


$ ls -l krsort.dir.old
total 168
-rwxr-xr-x   1 denise   users     34592 Oct 31 11:27 krsort
-rwxr-xr-x   1 denise   users      3234 Oct 31 11:27 krsort.c
-rwxr-xr-x   1 denise   users     32756 Oct 31 11:27 krsort.dos
-rw-r--r--   1 denise   users      9922 Oct 31 11:27 krsort.q
-rwxr-xr-x   1 denise   users      3085 Oct 31 11:27 krsortorig.c
$
$ ls -l krsort.dir.new
total 168
-rwxr-xr-x   1 denise   users     34592 Oct 31 15:17 krsort
-rwxr-xr-x   1 denise   users     32756 Oct 31 15:17 krsort.dos
-rw-r--r--   1 denise   users      9922 Oct 31 15:17 krsort.q
-rwxr-xr-x   1 denise   users      3234 Oct 31 15:17 krsort.test.c
-rwxr-xr-x   1 denise   users      3085 Oct 31 15:17 krsortorig.c
$

From this listing, you can see clearly that one file is unique to each directory. krsort.c appears in only the krsort.dir.old directory, and krsort.test.c appears in only the krsort.dir.new directory. Let's now use dircmp to inform us of the differences in these two directories:

$ dircmp krsort.dir.old krsort.dir.new
krsort.dir.old only and krsort.dir.new only Page 1
./krsort.c         ./krsort.test.c
Comparison of krsort.dir.old krsort.dir.new Page 1
directory      .
same           ./krsort
same           ./krsort.dos
same           ./krsort.q
same           ./krsortorig.c
$

This is a useful output. First, the files that appear in only one directory are listed. Then, the files common to both directories are listed.
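
If dircmp is not available on your system, part of its report can be approximated by combining commands covered in this chapter. The following is only a rough sketch under that assumption: it compares file names, not file contents, and the directory names old and new and the empty files in them are made up for illustration.

```shell
workdir=$(mktemp -d)
mkdir "$workdir/old" "$workdir/new"

# Hypothetical directory contents, loosely modeled on the krsort example.
touch "$workdir/old/krsort" "$workdir/old/krsort.c"
touch "$workdir/new/krsort" "$workdir/new/krsort.test.c"

# List each directory and sort the names so that comm can compare them.
ls "$workdir/old" | LC_ALL=C sort > "$workdir/old.list"
ls "$workdir/new" | LC_ALL=C sort > "$workdir/new.list"

only_old=$(LC_ALL=C comm -23 "$workdir/old.list" "$workdir/new.list")
only_new=$(LC_ALL=C comm -13 "$workdir/old.list" "$workdir/new.list")
in_both=$(LC_ALL=C comm -12 "$workdir/old.list" "$workdir/new.list")

echo "only in old: $only_old"
echo "only in new: $only_new"
echo "in both:     $in_both"

rm -rf "$workdir"
```

To learn whether the files present in both directories are actually identical, each pair would still have to be checked with cmp or diff.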


The following is a summary of the dircmp command:

dircmp - Compare directories.

Options

-d    Compare the contents of files with the same name in both directories and produce a report of what must be done to make the files identical.
-s    Suppress messages about identical files.

cut

There are times when you have an output that has too many fields in it. When we issued the llsum command earlier, it produced four fields: FILENAME, OWNER, SIZE, and TYPE. What if we want to take this output and look at just the FILENAME and SIZE? We could modify the llsum script, or we could use the cut command to eliminate the OWNER and TYPE fields with the following commands:


$ llsum | cut -c 1-25,37-43
FILENAME                    SIZE
README                       810
backup_files                3408
biography                    427
cshtest                     1024
gkill                       1855
gkill.out                    191
hostck                       924
ifstat                      1422
ifstat.int                  2147
ifstat.out                   723
introdos                   54018
introux                    52476
letter                     23552
letter.auto                69632
letter.auto.recover        71680
letter.backup              23552
letter.lck                    57
letter.recover             69632
llsum                       1267
llsum.orig                  1267
llsum.out                   1657
llsum.tomd.out              1356
psg                          670
psg.int                      802
psg.out                      122
sam_adduser                 1010
tdolan                      1024
trash                       4554
trash.out                    329
typescript                    74
The files listed occupy 3 (0.373
Average file size is 1305
$

Here the output of llsum is piped to cut, which extracts only characters 1 through 25 and 37 through 43. These characters correspond to the fields we want. At the end of the output are two lines that are only partially printed. We don't want these lines, so we can use grep -v to eliminate them and print all other lines. At the end of the following listing, the same pipeline is run again with its output redirected to the file llsum.out, which we'll use later:


$ ./llsum | grep -v "bytes" | cut -c 1-25,37-43
FILENAME                    SIZE
README                       810
backup_files                3408
biography                    427
cshtest                     1024
gkill                       1855
gkill.out                    191
hostck                       924
ifstat                      1422
ifstat.int                  2147
ifstat.out                   723
introdos                   54018
introux                    52476
letter                     23552
letter.auto                69632
letter.auto.recover        71680
letter.backup              23552
letter.lck                    57
letter.recover             69632
llsum                       1267
llsum.orig                  1267
llsum.out                   1657
llsum.tomd.out              1356
psg                          670
psg.int                      802
psg.out                      122
sam_adduser                 1010
tdolan                      1024
trash                       4554
trash.out                    329
typescript                  1242
$ llsum | grep -v "bytes" | cut -c 1-25,37-43 > llsum.out
$


The following is a summary of the cut command, with some of the more commonly used options:

cut - Extract specified fields from each line.

Options

-c list    Extract based on character position, as shown in the example.
-f list    Extract based on fields.
-d char    The character following the d is the delimiter when using the -f option. The delimiter is the character that separates fields.
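
The example earlier in this section used only -c. As a complement, here is a minimal sketch of the -f and -d options applied to one colon-delimited line taken from the passwd.test listing (the variable names are assumptions made for this sketch).

```shell
# One entry in passwd format, copied from the passwd.test listing.
line='games:*:12:100:games:/usr/games:'

# -d: sets the delimiter to a colon; -f1,4 keeps fields 1 and 4
# (the user name and the group ID).
user_and_group=$(echo "$line" | cut -d: -f1,4)

# -c works on character positions instead of delimited fields.
first_five=$(echo "$line" | cut -c 1-5)

echo "$user_and_group"
echo "$first_five"
```

Field-based extraction is usually the better fit when columns vary in width, as they do in a passwd file.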

paste


Files can be merged in a variety of ways. If you want to merge files on a line-by-line basis, you can use the paste command. The first line in the second file is pasted to the end of the first line in the first file and so on.

Let's use the cut command just covered and extract only the permissions field, or characters 1 through 10, to get only the permissions for files. We'll then save these in the file ll.out:


$ ls -al | cut -c 1-10
total 798
drwxrwxrwx
drwxrwxrwx
-rwxrwxrwx
-rwxrwxrwx
-rwxrwxrwx
drwxr-xr-x
-rwxrwxrwx
-rw-r--r--
-rwxrwxrwx
-rwxrwxrwx
-rwxr-xr-x
-rw-r--r--
-rw-r--r--
-rwxrwxrwx
-rw-r--r--
-rw-r--r--
-rw-r--r--
-rw-r--r--
-rw-rw-rw-
-rw-r--r--
-rw-r--r--
-rwxrwxrwx
-rwxr-xr-x
-rw-r--r--
-rw-r--r--
-rwxrwxrwx
-rwxr-xr-x
-rw-r--r--
-rwxrwxrwx
drwxr-xr-x
-rwxrwxrwx
-rw-r--r--
-rw-r--r--
$ ls -al | cut -c 1-10 > ll.out
$


We can now use the paste command to paste the permissions saved in the ll.out file to the other file-related information in the llsum.out file:

$ paste llsum.out ll.out
FILENAME                    SIZE        total 792
README                       810        -rwxrwxrwx
backup_files                3408        -rwxrwxrwx
biography                    427        -rwxrwxrwx
cshtest                     1024        drwxr-xr-x
gkill                       1855        -rwxrwxrwx
gkill.out                    191        -rw-r--r--
hostck                       924        -rwxrwxrwx
ifstat                      1422        -rwxrwxrwx
ifstat.int                  2147        -rwxr-xr-x
ifstat.out                   723        -rw-r--r--
introdos                   54018        -rw-r--r--
introux                    52476        -rwxrwxrwx
letter                     23552        -rw-r--r--
letter.auto                69632        -rw-r--r--
letter.auto.recover        71680        -rw-r--r--
letter.backup              23552        -rw-r--r--
letter.lck                    57        -rw-rw-rw-
letter.recover             69632        -rw-r--r--
ll.out                      1057        -rw-r--r--
llsum                       1267        -rwxrwxrwx
llsum.orig                  1267        -rwxr-xr-x
llsum.out                   1657        -rw-r--r--
llsum.tomd.out              1356        -rw-r--r--
psg                          670        -rwxrwxrwx
psg.int                      802        -rwxr-xr-x
psg.out                      122        -rw-r--r--
sam_adduser                 1010        -rwxrwxrwx
tdolan                      1024        drwxr-xr-x
trash                       4554        -rwxrwxrwx
trash.out                    329        -rw-r--r--
typescript                   679        -rw-r--r--
$

This has produced a list that includes FILENAME and SIZE from llsum.out and permissions from ll.out.
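
By default paste separates the merged columns with a tab. A different separator can be chosen with the -d option, as in this minimal sketch; the two small files names and sizes are made up for illustration, with values borrowed from the listing above.

```shell
workdir=$(mktemp -d)

# Hypothetical input files with one value per line.
printf 'README\nbiography\n' > "$workdir/names"
printf '810\n427\n'          > "$workdir/sizes"

# Line 1 of sizes is pasted after line 1 of names, and so on;
# -d: places a colon between the columns instead of a tab.
merged=$(paste -d: "$workdir/names" "$workdir/sizes")
echo "$merged"

rm -rf "$workdir"
```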

If both files have the same first field, you can use the join command to merge the two files.
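
A minimal sketch of join follows, assuming two small files that are already sorted on their first field. The file names sizes and perms and their contents are made up for illustration, with values borrowed from the listings above.

```shell
workdir=$(mktemp -d)

# Hypothetical inputs, both sorted on the first field (the file name).
printf 'gkill 1855\nhostck 924\nifstat 1422\n' > "$workdir/sizes"
printf 'gkill -rwxrwxrwx\nhostck -rwxrwxrwx\nifstat -rwxrwxrwx\ntrash -rwxrwxrwx\n' > "$workdir/perms"

# Lines whose first fields match are combined into one output line.
joined=$(join "$workdir/sizes" "$workdir/perms")
echo "$joined"

# -a 2 also prints lines from the second file that have no match.
with_unpaired=$(join -a 2 "$workdir/sizes" "$workdir/perms")
echo "$with_unpaired"

rm -rf "$workdir"
```

Unlike paste, join matches lines by key rather than by position, so the trash entry, which appears only in the second file, is left out unless -a 2 is given.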

The following is a summary of the paste and join commands, with some of the more commonly used options:


paste - Merge lines of files.

Options

-d list      Use list as the delimiter between columns. You can use special escape sequences for list such as \n for newline and \t for tab.

join - Combine two presorted files that have a common key field.

Options

-a n         Produce the normal output and also generate a line for each line in file n (1 or 2) that can't be joined.
-e string    Replace empty fields in output with string.
-t char      Use char as the field separator.

tr


tr translates characters. tr is ideal for such tasks as changing case. For instance, what if you want to translate all lowercase characters to uppercase? The following example lists files that have the suffix "zip" and then translates the file names to uppercase:

$ ls -al *.zip
file1.zip
file2.zip
file3.zip
file4.zip
file5.zip
file6.zip
file7.zip
$ ls -al *.zip | tr "[:lower:]" "[:upper:]"
FILE1.ZIP
FILE2.ZIP
FILE3.ZIP
FILE4.ZIP
FILE5.ZIP
FILE6.ZIP
FILE7.ZIP
$

We use brackets in this case because we are translating a class of characters.
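
The same class notation works with the -d option. The following minimal sketch translates one made-up file name to uppercase and then deletes the digits from it; the variable names are assumptions made for this sketch.

```shell
name='file1.zip'

# Translate every lowercase letter to its uppercase equivalent.
upper=$(echo "$name" | tr '[:lower:]' '[:upper:]')

# -d deletes every character in the given class, here the digits.
no_digits=$(echo "$name" | tr -d '[:digit:]')

echo "$upper"
echo "$no_digits"
```

Quoting the classes keeps the shell from treating the brackets as a file name pattern.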

The following is a summary of the tr command, with some of the more commonly used options.

tr - Translate characters.

Options

-A           Translate on a byte-by-byte basis.
-d           Delete all occurrences of characters specified.
[:class:]    Translate from one character class to another, such as from lowercase class to uppercase class, as shown in the example.

Manual Pages for Some Commands Used in Chapter 5

The following are the HP-UX manual pages for many of the commands used in this chapter. Commands often differ among UNIX variants, so you may find differences in the options or other areas for some commands; however, the following manual pages serve as an excellent reference.

cmp

cmp - Compare files.

cmp(1)                                                                 cmp(1)

NAME
      cmp - compare two files

SYNOPSIS
      cmp [-l] [-s] file1 file2

DESCRIPTION
      cmp compares two files (if file1 or file2 is -, the standard input is
      used). Under default options, cmp makes no comment if the files are
      the same; if they differ, it announces the byte and line number at
      which the difference occurred. If one file is an initial subsequence
      of the other, that fact is noted.

      cmp recognizes the following options:

           -l      Print the byte number (decimal) and the differing bytes
                   (octal) for each difference (byte numbering begins at 1
                   rather than 0).

           -s      Print nothing for differing files; return codes only.

EXTERNAL INFLUENCES
   Environment Variables
      LANG determines the language in which messages are displayed. If LANG
      is not specified or is set to the empty string, a default of "C" (see
      lang(5)) is used instead of LANG. If any internationalization
      variable contains an invalid setting, cmp behaves as if all
      internationalization variables are set to "C". See environ(5).

   International Code Set Support
      Single- and multi-byte character code sets are supported.

DIAGNOSTICS
      cmp returns the following exit values:

           0 Files are identical.
           1 Files are not identical.
           2 Inaccessible or missing argument.

      cmp prints the following warning if the comparison succeeds till the
      end of file of file1(file2) is reached.

           cmp: EOF on file1(file2)

SEE ALSO
      comm(1), diff(1).

STANDARDS CONFORMANCE
      cmp: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2

comm


comm - Produce three-column output of sorted files.

comm(1)                                                            comm(1)

NAME
      comm - select or reject lines common to two sorted files

SYNOPSIS
      comm [-[123]] file1 file2

DESCRIPTION
      comm reads file1 and file2, which should be ordered in increasing
      collating sequence (see sort(1) and Environment Variables below), and
      produces a three-column output:

           Column 1: Lines that appear only in file1,
           Column 2: Lines that appear only in file2,
           Column 3: Lines that appear in both files.

      If - is used for file1 or file2, the standard input is used.

      Options 1, 2, or 3 suppress printing of the corresponding column.
      Thus comm -12 prints only the lines common to the two files; comm -23
      prints only lines in the first file but not in the second; comm -123
      does nothing useful.

EXTERNAL INFLUENCES
   Environment Variables
      LC_COLLATE determines the collating sequence comm expects from the
      input files.

      LC_MESSAGES determines the language in which messages are displayed.

      If LC_MESSAGES is not specified in the environment or is set to the
      empty string, the value of LANG determines the language in which
      messages are displayed. If LC_COLLATE is not specified in the
      environment or is set to the empty string, the value of LANG is used
      as a default. If LANG is not specified or is set to the empty string,
      a default of ``C'' (see lang(5)) is used instead of LANG. If any
      internationalization variable contains an invalid setting, comm
      behaves as if all internationalization variables are set to ``C''.
      See environ(5).

   International Code Set Support
      Single- and multi-byte character code sets are supported.

EXAMPLES
      The following examples assume that file1 and file2 have been ordered
      in the collating sequence defined by the LC_COLLATE or LANG
      environment variable.

      Print all lines common to file1 and file2 (in other words, print
      column 3):

           comm -12 file1 file2

      Print all lines that appear in file1 but not in file2 (in other words,
      print column 1):

           comm -23 file1 file2

      Print all lines that appear in file2 but not in file1 (in other words,
      print column 2):

           comm -13 file1 file2

SEE ALSO
      cmp(1), diff(1), sdiff(1), sort(1), uniq(1).

STANDARDS CONFORMANCE
      comm: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2

cut

cut - Cut selected fields from the lines in a file.

cut(1)                                                               cut(1)

NAME
      cut - cut out (extract) selected fields of each line of a file

SYNOPSIS
      cut -c list [file ...]
      cut -b list [-n] [file ...]
      cut -f list [-d char] [-s] [file ...]

DESCRIPTION
      cut cuts out (extracts) columns from a table or fields from each line
      in a file; in data base parlance, it implements the projection of a
      relation. Fields as specified by list can be fixed length (defined in
      terms of character or byte position in a line when using the -c or -b
      option), or the length can vary from line to line and be marked with a
      field delimiter character such as the tab character (when using the -f
      option). cut can be used as a filter; if no files are given, the
      standard input is used.

      When processing single-byte character sets, the -c and -b options are
      equivalent and produce identical results. When processing multi-byte
      character sets, when the -b and -n options are used together, their
      combined behavior is very similar, but not identical to the -c option.

   Options
      Options are interpreted as follows:

           list           A comma-separated list of integer byte (-b
                          option), character (-c option), or field (-f
                          option) numbers, in increasing order, with
                          optional - to indicate ranges. For example:

                               1,4,7     Positions 1, 4, and 7.
                               1-3,8     Positions 1 through 3 and 8.
                               -5,10     Positions 1 through 5 and 10.
                               3-        Position 3 through last position.

           -b list        Cut based on a list of bytes. Each selected byte
                          is output unless the -n option is also specified.

           -c list        Cut based on character positions specified by list
                          (-c 1-72 extracts the first 72 characters of each
                          line).

           -f list        Where list is a list of fields assumed to be
                          separated in the file by a delimiter character
                          (see -d); for example, -f 1,7 copies the first and
                          seventh field only. Lines with no field
                          delimiters will be passed through intact (useful
                          for table subheadings), unless -s is specified.

           -d char        The character following -d is the field delimiter
                          (-f option only). Default is tab. Space or other
                          characters with special meaning to the shell must
                          be quoted. Adjacent field delimiters delimit null
                          fields.

           -n             Do not split characters. If the high end of a
                          range within a list is not the last byte of a
                          character, that character is not included in the
                          output. However, if the low end of a range within
                          a list is not the first byte of a character, the
                          entire character is included in the output.

           -s             Suppresses lines with no delimiter characters when
                          using -f option. Unless -s is specified, lines
                          with no delimiters appear in the output without
                          alteration.

   Hints
      Use grep to extract text from a file based on text pattern recognition
      (using regular expressions). Use paste to merge files line-by-line in
      columnar format. To rearrange columns in a table in a different
      sequence, use cut and paste. See grep(1) and paste(1) for more
      information.

EXTERNAL INFLUENCES
   Environment Variables
      LC_CTYPE determines the interpretation of text as single and/or
      multi-byte characters.

      If LC_CTYPE is not specified in the environment or is set to the empty
      string, the value of LANG is used as a default for each unspecified or
      empty variable. If LANG is not specified or is set to the empty
      string, a default of "C" (see lang(5)) is used instead of LANG. If
      any internationalization variable contains an invalid setting, cut
      behaves as if all internationalization variables are set to "C". See
      environ(5).

   International Code Set Support
      The delimiter specified with the -d argument must be a single-byte
      character. Otherwise, single- and multi-byte character code sets are
      supported.

EXAMPLES
      Password file mapping of user ID to user names:

           cut -d : -f 1,5 /etc/passwd

      Set environment variable name to current login name:

           name=`who am i | cut -f 1 -d " "`

      Convert file source containing lines of arbitrary length into two
      files where file1 contains the first 500 bytes (unless the 500th byte
      is within a multi-byte character), and file2 contains the remainder of
      each line:

           cut -b 1-500 -n source > file1
           cut -b 501- -n source > file2

DIAGNOSTICS
      line too long  Line length must not exceed LINE_MAX characters or
                     fields, including the new-line character (see
                     limits(5)).

      bad list for b/c/f option
                     Missing -b, -c, or -f option or incorrectly specified
                     list. No error occurs if a line has fewer fields than
                     the list calls for.

      no fields      list is empty.

WARNINGS
      cut does not expand tabs. Pipe text through expand(1) if tab
      expansion is required.

      Backspace characters are treated the same as any other character. To
      eliminate backspace characters before processing by cut, use the fold
      or col command (see fold(1) and col(1)).

AUTHOR
      cut was developed by OSF and HP.

SEE ALSO
      grep(1), paste(1).

STANDARDS CONFORMANCE
      cut: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
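
The -f, -d, -c, and -s options are easier to remember once you have tried them on a small file. The following is a rough sketch that uses a made-up, passwd-style file (the file name and both records are hypothetical) rather than your real /etc/passwd.

```shell
#!/bin/sh
tmp=${TMPDIR:-/tmp}/cut_demo.$$
mkdir -p "$tmp"

# Hypothetical colon-delimited records in the style of /etc/passwd.
cat > "$tmp/passwd.sample" <<'EOF'
root:x:0:3:Super User:/:/sbin/sh
denise:x:101:20:Denise Example:/home/denise:/usr/bin/ksh
EOF

# -d names the field delimiter; -f 1,5 keeps the first and fifth fields.
names=$(cut -d : -f 1,5 "$tmp/passwd.sample")

# -c works on character positions instead of delimited fields.
prefix=$(cut -c 1-4 "$tmp/passwd.sample")

# Without -s, a line that has no delimiter passes through intact;
# with -s it is suppressed.
kept=$(printf 'heading\na:b\n' | cut -d : -f 2)
dropped=$(printf 'heading\na:b\n' | cut -d : -f 2 -s)

printf '%s\n' "$names" "$prefix" "$kept" "$dropped"

rm -rf "$tmp"
```

Note that the selected fields are written with the same delimiter that was used to split the input, so the first command prints login and name pairs still joined by a colon.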

diff

diff - File and directory comparison.

diff(1)                                                        diff(1)

NAME
      diff - differential file and directory comparator

SYNOPSIS
      diff [-C n] [-S name] [-lrs] [-bcefhintw] dir1 dir2
      diff [-C n] [-S name] [-bcefhintw] file1 file2
      diff [-D string] [-biw] file1 file2

DESCRIPTION
   Comparing Directories
      If both arguments are directories, diff sorts the contents of the
      directories by name, then runs the regular file diff algorithm
      (described below) on text files that have the same name in each
      directory but are different. Binary files that differ, common
      subdirectories, and files that appear in only one directory are
      listed. When comparing directories, the following options are
      recognized:

           -l         Long output format; each text file diff is piped
                      through pr to paginate it (see pr(1)). Other
                      differences are remembered and summarized after all
                      text file differences are reported.

           -r         Applies diff recursively to common subdirectories
                      encountered.

           -s         diff reports files that are identical but otherwise
                      not mentioned.

           -S name    Starts a directory diff in the middle of the sorted
                      directory, beginning with file name.

   Comparing Files
      When run on regular files, and when comparing text files that differ
      during directory comparison, diff tells what lines must be changed in
      the files to bring them into agreement. diff usually finds a smallest
      sufficient set of file differences. However, it can be misled by
      lines containing very few characters or by other situations. If
      neither file1 nor file2 is a directory, either can be specified as -,
      in which case the standard input is used. If file1 is a directory, a
      file in that directory whose filename is the same as the filename of
      file2 is used (and vice versa).

      There are several options for output format. The default output
      format contains lines resembling the following:

           n1 a n3,n4
           n1,n2 d n3
           n1,n2 c n3,n4

      These lines resemble ed commands to convert file1 into file2. The
      numbers after the letters pertain to file2. In fact, by exchanging a
      for d and reading backwards one may ascertain equally how to convert
      file2 into file1. As in ed, identical pairs where n1=n2 or n3=n4 are
      abbreviated as a single number.

      Following each of these lines come all the lines that are affected in
      the first file flagged by <, then all the lines that are affected in
      the second file flagged by >.

      Except for -b, -w, -i, or -t which can be given with any of the
      others, the following options are mutually exclusive:

           -e       Produce a script of a, c, and d commands for the ed
                    editor suitable for recreating file2 from file1. Extra
                    commands are added to the output when comparing
                    directories with -e, so that the result is a shell
                    script for converting text files common to the two
                    directories from their state in dir1 to their state in
                    dir2 (see sh-bourne(1)).

           -f       Produce a script similar to that of the -e option that
                    is not useful with ed but is more readable by humans.

           -n       Produce a script similar to that of -e, but in the
                    opposite order, and with a count of changed lines on
                    each insert or delete command. This is the form used by
                    rcsdiff (see rcsdiff(1)).

           -c       Produce a difference list with 3 lines of context. -c
                    modifies the output format slightly: the output begins
                    with identification of the files involved, followed by
                    their creation dates, then each change separated by a
                    line containing about twelve asterisks (*). Lines
                    removed from file1 are marked with -, and lines added to
                    file2 are marked +. Lines that change from one file to
                    the other are marked in both files with !. Changes
                    that lie within 3 lines of each other in the file are
                    grouped together on output.

           -C n     Output format similar to -c but with n lines of context.

           -h       Do a fast, half-hearted job. This option works only
                    when changed stretches are short and well separated, but
                    can be used on files of unlimited length.

           -D string
                    Create a merged version of file1 and file2 on the
                    standard output, with C preprocessor controls included
                    so that a compilation of the result without defining
                    string is equivalent to compiling file1, while compiling
                    the result with string defined is equivalent to
                    compiling file2.

           -b       Ignore trailing blanks (spaces and tabs) and treat other
                    strings of blanks as equal.

           -w       Ignore all whitespace (blanks and tabs). For example,
                    if ( a == b ) and if(a==b) are treated as equal.

           -i       Ignores uppercase/lowercase differences. Thus A is
                    treated the same as a.

           -t       Expand tabs in output lines. Normal or -c output adds
                    one or more characters to the front of each line.
                    Resulting misalignment of indentation in the original
                    source lines can make the output listing difficult to
                    interpret. This option preserves original source file
                    indentation.

EXTERNAL INFLUENCES
   Environment Variables
      LANG determines the locale to use for the locale categories when both
      LC_ALL and the corresponding environment variable (beginning with LC_)
      do not specify a locale. If LANG is not set or is set to the empty
      string, a default of "C" (see lang(5)) is used.

      LC_CTYPE determines the space characters for the diff command, and the
      interpretation of text within file as single- and/or multi-byte
      characters.

      LC_MESSAGES determines the language in which messages are displayed.

      If any internationalization variable contains an invalid setting, diff
      and diffh behave as if all internationalization variables are set to
      "C". See environ(5).

   International Code Set Support
      Single- and multi-byte character code sets are supported with the
      exception that diff and diffh do not recognize multi-byte alternative
      space characters.

RETURN VALUE
      Upon completion, diff returns with one of the following exit values:

            0 No differences were found.
            1 Differences were found.
           >1 An error occurred.

EXAMPLES
      The following command creates a script file script:

           diff -e x1 x2 >script

      w is added to the end of the script in order to save the file:

           echo w >> script

      The script file can then be used to create the file x2 from the file
      x1 using the editor ed in the following manner:

           ed x1 < script

      The following command produces the difference output with 2 lines of
      context information before and after the line that was different:

           diff -C2 x1 x2

      The following command ignores all blanks and tabs and ignores
      uppercase-lowercase differences.

           diff -wi x1 x2

WARNINGS
      Editing scripts produced by the -e or -f option are naive about
      creating lines consisting of a single dot (.).

      When comparing directories with the -b, -w, or -i options specified,
      diff first compares the files in the same manner as cmp, then runs the
      diff algorithm if they are not equal. This may cause a small amount
      of spurious output if the files are identical except for insignificant
      blank strings or uppercase/lowercase differences.

      The default algorithm requires memory allocation of roughly six times
      the size of the file. If sufficient memory is not available for
      handling large files, the -h option or bdiff can be used (see
      bdiff(1)).

      When run on directories with the -r option, diff recursively descends
      sub-trees. When comparing deep multi-level directories, more memory
      may be required than is currently available on the system. The amount
      of memory required depends on the depth of recursion and the size of
      the files.

AUTHOR
      diff was developed by AT&T, the University of California, Berkeley,
      and HP.

FILES
      /usr/lbin/diffh used by -h option

SEE ALSO
      bdiff(1), cmp(1), comm(1), diff3(1), diffmk(1), dircmp(1), ed(1),
      more(1), nroff(1), rcsdiff(1), sccsdiff(1), sdiff(1), terminfo(4).

STANDARDS CONFORMANCE
      diff: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
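
The default output format and the exit values are the two things I find myself checking most often. Here is a small sketch, assuming two made-up files x1 and x2 in a scratch directory, that captures both and then tries the -wi combination from the EXAMPLES section on a second hypothetical pair of files.

```shell
#!/bin/sh
tmp=${TMPDIR:-/tmp}/diff_demo.$$
mkdir -p "$tmp"

# Hypothetical inputs: line 2 differs and x2 has one extra line.
printf '%s\n' one two three    > "$tmp/x1"
printf '%s\n' one 2 three four > "$tmp/x2"

# Default format: a "c" (change) entry and an "a" (append) entry.
changes=$(diff "$tmp/x1" "$tmp/x2")
status=$?            # 1 means differences were found
printf '%s\n' "$changes"
echo "exit status: $status"

# -w ignores all whitespace and -i ignores case, so these two
# files are treated as equal and diff returns 0.
printf 'Hello  World\n' > "$tmp/a"
printf 'helloworld\n'   > "$tmp/b"
diff -wi "$tmp/a" "$tmp/b"
wi_status=$?
echo "exit status with -wi: $wi_status"

rm -rf "$tmp"
```

Because diff returns 1 when files differ, a script that runs it under `set -e` will stop at the first difference; testing the exit value explicitly, as above, avoids that surprise.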

dircmp

dircmp - Compare directories and produce results.

dircmp(1)                                                        dircmp(1)

NAME
      dircmp - directory comparison

SYNOPSIS
      dircmp [-d] [-s] [-wn] dir1 dir2

DESCRIPTION
      dircmp examines dir1 and dir2 and generates various tabulated
      information about the contents of the directories. Sorted listings of
      files that are unique to each directory are generated for all the
      options. If no option is entered, a sorted list is output indicating
      whether the filenames common to both directories have the same
      contents.

           -d      Compare the contents of files with the same name in both
                   directories and output a list telling what must be
                   changed in the two files to bring them into agreement.
                   The list format is described in diff(1).

           -s      Suppress messages about identical files.

           -wn     Change the width of the output line to n characters. The
                   default width is 72.

EXTERNAL INFLUENCES
   Environment Variables
      LC_COLLATE determines the order in which the output is sorted.

      If LC_COLLATE is not specified in the environment or is set to the
      empty string, the value of LANG is used as a default. If LANG is not
      specified or is set to the empty string, a default of "C" (see
      lang(5)) is used instead of LANG. If any internationalization
      variable contains an invalid setting, dircmp behaves as if all
      internationalization variables are set to "C" (see environ(5)).

   International Code Set Support
      Single- and multi-byte character code sets are supported.

EXAMPLES
      Compare the two directories slate and sleet and produce a list of
      changes that would make the directories identical:

           dircmp -d slate sleet

SEE ALSO
      cmp(1), diff(1).

STANDARDS CONFORMANCE
      dircmp: SVID2, SVID3, XPG2, XPG3
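
dircmp is one of the commands you may not find on every UNIX variant. As a rough stand-in, and not a reimplementation of dircmp, the sketch below combines ls, comm, and cmp to produce the same kind of information for a single directory level: files unique to each directory and whether the common files match. The directories slate and sleet are borrowed from the manual page; their contents and the scratch directory are made up.

```shell
#!/bin/sh
# Assumption: one directory level only, and no spaces in file names.
LC_ALL=C
export LC_ALL

tmp=${TMPDIR:-/tmp}/dircmp_demo.$$
mkdir -p "$tmp/slate" "$tmp/sleet"

# Hypothetical directory contents.
echo same > "$tmp/slate/a";  echo same > "$tmp/sleet/a"
echo old  > "$tmp/slate/b";  echo new  > "$tmp/sleet/b"
echo x    > "$tmp/slate/only_slate"
echo y    > "$tmp/sleet/only_sleet"

# ls output is sorted, which is what comm needs.
ls "$tmp/slate" > "$tmp/list1"
ls "$tmp/sleet" > "$tmp/list2"

only_slate=$(comm -23 "$tmp/list1" "$tmp/list2")
only_sleet=$(comm -13 "$tmp/list1" "$tmp/list2")

# For names found in both directories, cmp -s reports (through its
# exit value) whether the contents are identical.
: > "$tmp/report"
for f in $(comm -12 "$tmp/list1" "$tmp/list2"); do
    if cmp -s "$tmp/slate/$f" "$tmp/sleet/$f"; then
        printf '%s same\n' "$f" >> "$tmp/report"
    else
        printf '%s different\n' "$f" >> "$tmp/report"
    fi
done
report=$(cat "$tmp/report")

echo "only in slate: $only_slate"
echo "only in sleet: $only_sleet"
printf '%s\n' "$report"

rm -rf "$tmp"
```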

join

join - Join two relations based on lines in files.

join(1)                                                            join(1)

NAME
      join - relational database operator

SYNOPSIS
      join [options] file1 file2

DESCRIPTION
      join forms, on the standard output, a join of the two relations
      specified by the lines of file1 and file2. If file1 or file2 is -,
      the standard input is used.

      file1 and file2 must be sorted in increasing collating sequence (see
      Environment Variables below) on the fields on which they are to be
      joined; normally the first in each line.

      The output contains one line for each pair of lines in file1 and file2
      that have identical join fields. The output line normally consists of
      the common field followed by the rest of the line from file1, then the
      rest of the line from file2.

      The default input field separators are space, tab, or new-line. In
      this case, multiple separators count as one field separator, and
      leading separators are ignored. The default output field separator is
      a space.

      Some of the below options use the argument n. This argument should be
      a 1 or a 2 referring to either file1 or file2, respectively.

   Options
      -a n        In addition to the normal output, produce a line for each
                  unpairable line in file n, where n is 1 or 2.

      -e s        Replace empty output fields by string s.

      -j m        Join on field m of both files. The argument m must be
                  delimited by space characters. This option and the
                  following two are provided for backward compatibility.
                  Use of the -1 and -2 options (see below) is recommended
                  for portability.

      -j1 m       Join on field m of file1.

      -j2 m       Join on field m of file2.

      -o list     Each output line comprises the fields specified in list,
                  each element of which has the form n.m, where n is a file
                  number and m is a field number. The common field is not
                  printed unless specifically requested.

      -t c        Use character c as a separator (tab character). Every
                  appearance of c in a line is significant. The character c
                  is used as the field separator for both input and output.

      -v file_number
                  Instead of the default output, produce a line only for
                  each unpairable line in file_number, where file_number is
                  1 or 2.

      -1 f        Join on field f of file 1. Fields are numbered starting
                  with 1.

      -2 f        Join on field f of file 2. Fields are numbered starting
                  with 1.

EXTERNAL INFLUENCES
   Environment Variables
      LC_COLLATE determines the collating sequence join expects from input
      files.

      LC_CTYPE determines the alternative blank character as an input field
      separator, and the interpretation of data within files as single
      and/or multi-byte characters. LC_CTYPE also determines whether the
      separator defined through the -t option is a single- or multi-byte
      character.

      If LC_COLLATE or LC_CTYPE is not specified in the environment or is
      set to the empty string, the value of LANG is used as a default for
      each unspecified or empty variable. If LANG is not specified or is
      set to the empty string, a default of "C" (see lang(5)) is used
      instead of LANG. If any internationalization variable contains an
      invalid setting, join behaves as if all internationalization variables
      are set to "C" (see environ(5)).

   International Code Set Support
      Single- and multi-byte character code sets are supported with the
      exception that multi-byte-character file names are not supported.

EXAMPLES
      The following command line joins the password file and the group file,
      matching on the numeric group ID, and outputting the login name, the
      group name, and the login directory. It is assumed that the files
      have been sorted in the collating sequence defined by the LC_COLLATE
      or LANG environment variable on the group ID fields.

           join -1 4 -2 3 -o 1.1 2.1 1.6 -t: /etc/passwd /etc/group

      The following command produces an output consisting of all possible
      combinations of lines that have identical first fields in the two
      sorted files sorted_file1 and sorted_file2, with each line consisting
      of the first and third fields from sorted_file1 and the second and
      fourth fields from sorted_file2:

           join -j1 1 -j2 1 -o 1.1,2.2,1.3,2.4 sorted_file1 sorted_file2

WARNINGS
      With default field separation, the collating sequence is that of sort
      -b; with -t, the sequence is that of a plain sort.

      The conventions of join, sort, comm, uniq, and awk are incongruous.

      Numeric filenames may cause conflict when the -o option is used
      immediately before listing filenames.

AUTHOR
      join was developed by OSF and HP.

SEE ALSO
      awk(1), comm(1), sort(1), uniq(1).

STANDARDS CONFORMANCE
      join: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
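
The difference between the default output, -a, -v, and -o is easiest to see side by side. The following is a minimal sketch using two made-up files, emp and dept, that share an ID in the first field; the file names and records are hypothetical, and LC_ALL=C is set so the files are in the collating sequence join expects.

```shell
#!/bin/sh
LC_ALL=C
export LC_ALL

tmp=${TMPDIR:-/tmp}/join_demo.$$
mkdir -p "$tmp"

# Hypothetical relations, both already sorted on field 1.
cat > "$tmp/emp" <<'EOF'
101 denise
102 marty
103 anne
EOF
cat > "$tmp/dept" <<'EOF'
101 sales
103 support
104 finance
EOF

# Default: one line per pair of lines with identical join fields.
paired=$(join "$tmp/emp" "$tmp/dept")

# -a 1: also print lines from file 1 that have no partner.
with_unpaired=$(join -a 1 "$tmp/emp" "$tmp/dept")

# -v 2: print only the lines from file 2 that have no partner.
only_unpaired=$(join -v 2 "$tmp/emp" "$tmp/dept")

# -o: choose output fields as file.field; the join field is
# left out unless it is requested.
picked=$(join -o 1.2,2.2 "$tmp/emp" "$tmp/dept")

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' \
    "$paired" "$with_unpaired" "$only_unpaired" "$picked"

rm -rf "$tmp"
```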

paste

paste - Merge lines of files.

paste(1)                                                           paste(1)

NAME
      paste - merge same lines of several files or subsequent lines of one
      file

SYNOPSIS
      paste file1 file2 ...
      paste -d list file1 file2 ...
      paste -s [-d list] file1 file2 ...

DESCRIPTION
      In the first two forms, paste concatenates corresponding lines of the
      given input files file1, file2, etc. It treats each file as a column
      or columns in a table and pastes them together horizontally (parallel
      merging). In other words, it is the horizontal counterpart of cat(1)
      which concatenates vertically; i.e., one file after the other. In the
      -s option form above, paste replaces the function of an older command
      with the same name by combining subsequent lines of the input file
      (serial merging). In all cases, lines are glued together with the tab
      character, or with characters from an optionally specified list.
      Output is to standard output, so paste can be used as the start of a
      pipe, or as a filter if - is used instead of a file name.

      paste recognizes the following options and command-line arguments:

           -d        Without this option, the new-line characters of all but
                     the last file (or last line in case of the -s option)
                     are replaced by a tab character. This option allows
                     replacing the tab character by one or more alternate
                     characters (see below).

           list      One or more characters immediately following -d replace
                     the default tab as the line concatenation character.
                     The list is used circularly; i.e., when exhausted, it
                     is reused. In parallel merging (that is, no -s
                     option), the lines from the last file are always
                     terminated with a new-line character, not from the
                     list. The list can contain the special escape
                     sequences: \n (new-line), \t (tab), \\ (backslash), and
                     \0 (empty string, not a null character). Quoting may
                     be necessary if characters have special meaning to the
                     shell. (For example, to get one backslash, use
                     -d"\\\\").

           -s        Merge subsequent lines rather than one from each input
                     file. Use tab for concatenation, unless a list is
                     specified with the -d option. Regardless of the list,
                     the very last character of the file is forced to be a
                     new-line.

           -         Can be used in place of any file name to read a line
                     from the standard input (there is no prompting).

EXTERNAL INFLUENCES
   Environment Variables
      LC_CTYPE determines the locale for the interpretation of text as
      single- and/or multi-byte characters.

      LC_MESSAGES determines the language in which messages are displayed.

      If LC_CTYPE or LC_MESSAGES is not specified in the environment or is
      set to the empty string, the value of LANG is used as a default for
      each unspecified or empty variable. If LANG is not specified or is
      set to the empty string, a default of "C" (see lang(5)) is used
      instead of LANG.

      If any internationalization variable contains an invalid setting,
      paste behaves as if all internationalization variables are set to "C".
      See environ(5).

   International Code Set Support
      Single- and multi-byte character code sets are supported.

RETURN VALUE
      These commands return the following values upon completion:

            0 Completed successfully.
           >0 An error occurred.

EXAMPLES
      List directory in one column:

           ls | paste -d" " -

      List directory in four columns:

           ls | paste - - - -

      Combine pairs of lines into lines:

           paste -s -d"\t\n" file

   Notes
      pr -t -m... works similarly, but creates extra blanks, tabs and
      new-lines for a nice page layout.

DIAGNOSTICS
      too many files           Except for the -s option, no more than
                               OPEN_MAX - 3 input files can be specified
                               (see limits(5)).

AUTHOR
      paste was developed by OSF and HP.

SEE ALSO
      cut(1), grep(1), pr(1).

STANDARDS CONFORMANCE
      paste: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
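
Parallel merging, serial merging, and the use of - for standard input are three different behaviors hiding behind one command. Here is a short sketch that shows each one; the files letters and numbers and the scratch directory are invented for the example.

```shell
#!/bin/sh
tmp=${TMPDIR:-/tmp}/paste_demo.$$
mkdir -p "$tmp"

# Hypothetical input files, four lines each.
printf '%s\n' a b c d > "$tmp/letters"
printf '%s\n' 1 2 3 4 > "$tmp/numbers"

# Parallel merge: line N of each file, glued with a tab.
side_by_side=$(paste "$tmp/letters" "$tmp/numbers")

# Same merge, but with a colon from the -d list instead of a tab.
with_colon=$(paste -d : "$tmp/letters" "$tmp/numbers")

# Serial merge: all lines of one file joined into a single line.
serial=$(paste -s -d , "$tmp/letters")

# Each - reads the next line of standard input, so two of them
# fold the input into two columns.
pairs=$(paste - - < "$tmp/letters")

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' \
    "$side_by_side" "$with_colon" "$serial" "$pairs"

rm -rf "$tmp"
```

Together with cut, the parallel form gives you a simple way to rearrange columns: cut each column into its own file, then paste the files back in a different order.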

sort

sort - Sort contents of files.

sort(1)                                                             sort(1)  NAME       sort - sort or merge files  SYNOPSIS       sort [-m] [-o output] [-bdfinruM] [-t char] [-k keydef] [-y [kmem]] [-z       recsz] [-T dir] [file ...]       sort [-c] [-AbdfinruM] [-t char] [-k keydef] [-y [kmem]] [-z recsz] [-T       dir] [file ...]  DESCRIPTION       sort performs one of the following functions:            1.  Sorts lines of all the named files together and writes the                result to the specified output.            2.  Merges lines of all the named (presorted) files together and                writes the result to the specified output.            3.  Checks that a single input file is correctly presorted.       The standard input is read if - is used as a file name or no input       files are specified.       Comparisons are based on one or more sort keys extracted from each       line of input. By default, there is one sort key, the entire input       line. Ordering is lexicographic by characters using the collating       sequence of the current locale. If the locale is not specified or is       set to the POSIX locale, then ordering is lexicographic by bytes in       machine-collating sequence. If the locale includes multi-byte       characters, single-byte characters are machine-collated before multi      byte characters.     Behavior Modification Options       The following options alter the default behavior:            -A          Sorts on a byte-by-byte basis using each character's                        encoded value. On some systems, extended characters                        will be considered negative values, and so sort                        before ASCII characters. If you are sorting ASCII                        characters in a non-C/POSIX locale, this flag                        performs much faster.            -c          Check that the single input file is sorted according                        to the ordering rules. 
No output is produced; the                        exit code is set to indicate the result.            -m          Merge only; the input files are assumed to be already                        sorted.            -o output   The argument given is the name of an output file to                        use instead of the standard output. This file can be                        the same as one of the input files.            -u          Unique: suppress all but one in each set of lines                        having equal keys. If used with the -c option, check                        to see that there are no lines with duplicate keys,                        in addition to checking that the input file is                        sorted.            -y [kmem]   The amount of main memory used by the sort can have a                        large impact on its performance. If this option is                        omitted, sort begins using a system default memory                        size, and continues to use more space as needed. If                        this option is presented with a value, kmem, sort                        starts using that number of kilobytes of memory,                        unless the administrative minimum or maximum is                        violated, in which case the corresponding extremum                        will be used. Thus, -y 0 is guaranteed to start with                        minimum memory. By convention, -y (with no argument)                        starts with maximum memory.            -z recsz    The size of the longest line read is recorded in the                        sort phase so that buffers can be allocated during                        the merge phase. If the sort phase is omitted via                        the -c or -m options, a popular system default size                        will be used. Lines longer than the buffer size will                        cause sort to terminate abnormally. 
Supplying the                        actual number of bytes in the longest line to be                        merged (or some larger value) will prevent abnormal                        termination.            -T dir      Use dir as the directory for temporary scratch files                        rather than the default directory, which is is one of                        the following, tried in order: the directory as                        specified in the TMPDIR environment variable;                        /var/tmp, and finally, /tmp.     Ordering Rule Options       When ordering options appear before restricted sort key       specifications, the ordering rules are applied globally to all sort       keys. When attached to a specific sort key (described below), the       ordering options override all global ordering options for that key.       The following options override the default ordering rules:            -d          Quasi-dictionary order: only alphanumeric characters                        and blanks (spaces and tabs), as defined by LC_CTYPE                        are significant in comparisons (see environ(5)).                        (XPG4 only.) The behavior is undefined for a sort key                        to which -i or -n also applies.            -f          Fold letters. Prior to being compared, all lowercase                        letters are effectively converted into their                        uppercase equivalents, as defined by LC_CTYPE.            -i          In non-numeric comparisons, ignore all characters                        which are non-printable, as defined by LC_CTYPE. For                        the ASCII character set, octal character codes 001                        through 037 and 0177 are ignored.            
           -n          The sort key is restricted to an initial numeric
                       string consisting of optional blanks, an optional
                       minus sign, zero or more digits with optional radix
                       character, and optional thousands separators. The
                       radix and thousands separator characters are defined
                       by LC_NUMERIC. The field is sorted by arithmetic
                       value. An empty (missing) numeric field is treated
                       as arithmetic zero. Leading zeros and plus or minus
                       signs on zeros do not affect the ordering. The -n
                       option implies the -b option (see below).

           -r          Reverse the sense of comparisons.

           -M          Compare as months. The first several non-blank
                       characters of the field are folded to uppercase and
                       compared with the langinfo(5) items ABMON_1 < ABMON_2
                       < ... < ABMON_12. An invalid field is treated as
                       being less than ABMON_1 string. For example,
                       American month names are compared such that JAN < FEB
                       < ... < DEC. An invalid field is treated as being
                       less than all months. The -M option implies the -b
                       option (see below).

   Field Separator Options
      The treatment of field separators can be altered using the options:

           -t char     Use char as the field separator character; char is
                       not considered to be part of a field (although it can
                       be included in a sort key). Each occurrence of char
                       is significant (for example, <char><char> delimits an
                       empty field). If -t is not specified, <blank>
                       characters will be used as default field separators;
                       each maximal sequence of <blank> characters that
                       follows a non-<blank> character is a field separator.

           -b          Ignore leading blanks when determining the starting
                       and ending positions of a restricted sort key. If
                       the -b option is specified before the first -k option
                       (+pos1 argument), it is applied to all -k options
                       (+pos1 arguments). Otherwise, the -b option can be
                       attached independently to each -k field_start or
                       field_end option (+pos1 or -pos2 argument; see
                       below). Note that the -b option is only effective
                       when restricted sort key specifications are given.

   Restricted Sort Key
           -k keydef   The keydef argument defines a restricted sort key.
                       The format of this definition is

                            field_start[type][,field_end[type]]

                       which defines a key field beginning at field_start
                       and ending at field_end. The characters at positions
                       field_start and field_end are included in the key
                       field, providing that field_end does not precede
                       field_start. A missing field_end means the end of the
                       line. Fields and characters within fields are
                       numbered starting with 1. Note that this is
                       different than the obsolete form of restricted sort
                       keys, where numbering starts at 0. See WARNINGS
                       below.

                       Specifying field_start and field_end involves the
                       notion of a field, a minimal sequence of characters
                       followed by a field separator or a new-line. By
                       default, the first blank of a sequence of blanks acts
                       as the field separator. All blanks in a sequence of
                       blanks are considered to be part of the next field;
                       for example, all blanks at the beginning of a line
                       are considered to be part of the first field.

                       The arguments field_start and field_end each have the
                       form m.n which are optionally followed by one or more
                       of the type options b, d, f, i, n, r, or M. These
                       modifiers have the functionality for this key only,
                       that their command-line counterparts have for the
                       entire record.

                       A field_start position specified by m.n is
                       interpreted to mean the nth character in the mth
                       field. A missing n means .1, indicating the first
                       character of the mth field. If the -b option is in
                       effect, n is counted from the first non-blank
                       character in the mth field.

                       A field_end position specified by m.n is interpreted
                       to mean the nth character in the mth field. If n is
                       missing, the mth field ends at the last character of
                       the field. If the -b option is in effect, n is
                       counted from the first non-<blank> character in the
                       mth field.

                       Multiple -k options are permitted and are significant
                       in command line order. A maximum of 9 -k options can
                       be given. If no -k option is specified, a default
                       sort key of the entire line is used. When there are
                       multiple sort keys, later keys are compared only
                       after all earlier keys compare equal. Lines that
                       otherwise compare equal are ordered with all bytes
                       significant. If all the specified keys compare
                       equal, the entire record is used as the final key.

                       The -k option is intended to replace the obsolete
                       [+pos1 [-pos2]] notation, using field_start and
                       field_end respectively. The fully specified [+pos1
                       [-pos2]] form:

                            +w.x-y.z

                       is equivalent to:

                            -k w+1.x+1,y.0 (if z == 0)
                            -k w+1.x+1,y+1.z (if z > 0)

   Obsolete Restricted Sort Key
      The notation +pos1 -pos2 restricts a sort key to one beginning at pos1
      and ending at pos2. The characters at positions pos1 and pos2 are
      included in the sort key (provided that pos2 does not precede pos1).
      A missing -pos2 means the end of the line.

      Specifying pos1 and pos2 involves the notion of a field, a minimal
      sequence of characters followed by a field separator or a new-line.
      By default, the first blank (space or tab) of a sequence of blanks
      acts as the field separator. All blanks in a sequence of blanks are
      considered to be part of the next field; for example, all blanks at
      the beginning of a line are considered to be part of the first field.

      pos1 and pos2 each have the form m.n optionally followed by one or
      more of the flags bdfinrM. A starting position specified by +m.n is
      interpreted to mean character n+1 in field m+1. A missing .n means
      .0, indicating the first character of field m+1. If the b flag is in
      effect, n is counted from the first non-blank in field m+1; +m.0b
      refers to the first non-blank character in field m+1.

      A last position specified by -m.n is interpreted to mean the nth
      character (including separators) after the last character of the mth
      field. A missing .n means .0, indicating the last character of the
      mth field. If the b flag is in effect, n is counted from the last
      leading blank in field m+1; -m.1b refers to the first non-blank in
      field m+1.

EXTERNAL INFLUENCES
   Environment Variables
      LC_COLLATE determines the default ordering rules applied to the sort.

      LC_CTYPE determines the locale for interpretation of sequences of
      bytes of text data as characters (e.g., single- versus multibyte
      characters in arguments and input files) and the behavior of character
      classification for the -b, -d, -f, -i, and -n options.

      LC_NUMERIC determines the definition of the radix and thousands
      separator characters for the -n option.

      LC_TIME determines the month names for the -M option.

      LC_MESSAGES determines the language in which messages are displayed.

      LC_ALL determines the locale to use to override the values of all the
      other internationalization variables.

      NLSPATH determines the location of message catalogs for the processing
      of LC_MESSAGES.

      LANG provides a default value for the internationalization variables
      that are unset or null. If LANG is unset or null, the default value of
      "C" (see lang(5)) is used.

      If any of the internationalization variables contains an invalid
      setting, sort behaves as if all internationalization variables are set
      to "C". See environ(5).

   International Code Set Support
      Single- and multi-byte character code sets are supported.

EXAMPLES
      Sort the contents of infile with the second field as the sort key:

           sort -k 2,2 infile

      Sort, in reverse order, the contents of infile1 and infile2, placing
      the output in outfile and using the first two characters of the second
      field as the sort key:

           sort -r -o outfile -k 2.1,2.2 infile1 infile2

      Sort, in reverse order, the contents of infile1 and infile2, using the
      first non-blank character of the fourth field as the sort key:

           sort -r -k 4.1b,4.1b infile1 infile2

      Print the password file (/etc/passwd) sorted by numeric user ID (the
      third colon-separated field):

           sort -t: -k 3n,3 /etc/passwd

      Print the lines of the presorted file infile, suppressing all but the
      first occurrence of lines having the same third field:

           sort -mu -k 3,3 infile

DIAGNOSTICS
      sort exits with one of the following values:

            0   All input files were output successfully, or -c was
                specified and the input file was correctly presorted.

            1   Under the -c option, the file was not ordered as specified,
                or if the -c and -u options were both specified, two input
                lines were found with equal keys. This exit status is not
                returned if the -c option is not used.

           >1   An error occurred such as when one or more input lines are
                too long.

      When the last line of an input file is missing a new-line character,
      sort appends one, prints a warning message, and continues.

      If an error occurs when accessing the tables that contain the
      collation rules for the specified language, sort prints a warning
      message and defaults to the POSIX locale.

      If a -d, -f, or -i option is specified for a language with multi-byte
      characters, sort prints a warning message and ignores the option.

WARNINGS
      Numbering of fields and characters within fields (-k option) has
      changed to conform to the POSIX standard. Beginning at HP-UX Release
      9.0, the -k option numbers fields and characters within fields,
      starting with 1. Prior to HP-UX Release 9.0, numbering started at 0.

      A field separator specified by the -t option is recognized only if it
      is a single-byte character.

      The character type classification categories alpha, digit, space, and
      print are not defined for multi-byte characters. For languages with
      multi-byte characters, all characters are significant in comparisons.

FILES
      /var/tmp/stm???
      /tmp/stm???

AUTHOR
      sort was developed by OSF and HP.

SEE ALSO
      comm(1), join(1), uniq(1), collate8(4), environ(5), hpnls(5), lang(5).

STANDARDS CONFORMANCE
      sort: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
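
To see the -t and -k options from the sort manual page at work, here is a minimal sketch that sorts three made-up passwd-style records on the third colon-separated field, first numerically and then as plain text. The sample records and the variable names (data, by_uid, by_text) are my own assumptions for illustration, and LC_ALL=C is set only to keep the ordering predictable across locales.

```shell
# Hypothetical sample records in a passwd-like, colon-separated layout.
data='carol:x:1002:100
alice:x:5:100
bob:x:40:100'

# Numeric sort restricted to the third field (key 3n,3).
by_uid=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k 3n,3)

# Same key without the n modifier: the field is compared as text,
# so "1002" comes before "40", which comes before "5".
by_text=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k 3,3)

printf 'numeric:\n%s\n' "$by_uid"
printf 'text:\n%s\n' "$by_text"
```

The numeric run lists alice, bob, and carol in that order; the text run reverses them, which is the usual reason to attach the n modifier to a key that holds numbers.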

tr

tr - Substitute selected characters.

tr(1)                                                                 tr(1)

NAME
      tr - translate characters

SYNOPSIS
      tr [-Acs] string1 string2

      tr -s [-Ac] string1

      tr -d [-Ac] string1

      tr -ds [-Ac] string1 string2

DESCRIPTION
      tr copies the standard input to the standard output with substitution
      or deletion of selected characters. Input characters from string1 are
      replaced with the corresponding characters in string2. If necessary,
      string1 and string2 can be quoted to avoid pattern matching by the
      shell.

      tr recognizes the following command line options:

           -A             Translates on a byte-by-byte basis. When this flag
                          is specified tr does not support extended
                          characters.

           -c             Complements the set of characters in string1,
                          which is the set of all characters in the current
                          character set, as defined by the current setting
                          of LC_CTYPE, except for those actually specified
                          in the string1 argument. These characters are
                          placed in the array in ascending collation
                          sequence, as defined by the current setting of
                          LC_COLLATE.

           -d             Deletes all occurrences of input characters or
                          collating elements found in the array specified in
                          string1.

                          If -c and -d are both specified, all characters
                          except those specified by string1 are deleted. The
                          contents of string2 are ignored, unless -s is also
                          specified. Note, however, that the same string
                          cannot be used for both the -d and the -s flags;
                          when both flags are specified, both string1 (used
                          for deletion) and string2 (used for squeezing) are
                          required.

                          If -d is not specified, each input character or
                          collating element found in the array specified by
                          string1 is replaced by the character or collating
                          element in the same relative position specified by
                          string2.

           -s             Replaces any character specified in string1 that
                          occurs as a string of two or more repeating
                          characters as a single instance of the character
                          in string2.

                          If the string2 contains a character class, the
                          argument's array contains all of the characters in
                          that character class. For example:

                          tr -s '[:space:]'

                          In a case conversion, however, the string2 array
                          contains only those characters defined as the
                          second characters in each of the toupper or
                          tolower character pairs, as appropriate. For
                          example:

                          tr -s '[:upper:]' '[:lower:]'

      The following abbreviation conventions can be used to introduce ranges
      of characters, repeated characters or single-character collating
      elements into the strings:

           c1-c2 or       Stands for the range of collating elements c1
           [c1-c2]        through c2, inclusive, as defined by the current
                          setting of the LC_COLLATE locale category.

           [:class:] or   Stands for all the characters belonging to the
           [[:class:]]    defined character class, as defined by the current
                          setting of LC_CTYPE locale category. The following
                          character class names will be accepted when
                          specified in string1: alnum, alpha, blank, cntrl,
                          digit, graph, lower, print, punct, space, upper,
                          or xdigit. Character classes are expanded in
                          collation order.

                          When the -d and -s flags are specified together,
                          any of the character class names are accepted in
                          string2; otherwise, only character class names
                          lower or upper are accepted in string2 and then
                          only if the corresponding character class (upper
                          and lower, respectively) is specified in the same
                          relative position in string1. Such a
                          specification is interpreted as a request for case
                          conversion.

                          When [:lower:] appears in string1 and [:upper:]
                          appears in string2, the arrays contain the
                          characters from the toupper mapping in the
                          LC_CTYPE category of the current locale. When
                          [:upper:] appears in string1 and [:lower:] appears
                          in string2, the arrays contain the characters from
                          the tolower mapping in the LC_CTYPE category of
                          the current locale.

           [=c=] or       Stands for all the characters or collating
           [[=c=]]        elements belonging to the same equivalence class
                          as c, as defined by the current setting of
                          LC_COLLATE locale category. An equivalence class
                          expression is allowed only in string1, or in
                          string2 when it is being used by the combined -d
                          and -s options.

           [a*n]          Stands for n repetitions of a. If the first digit
                          of n is 0, n is considered octal; otherwise, n is
                          treated as a decimal value. A zero or missing n
                          is interpreted as large enough to extend the
                          string2-based sequence to the length of the
                          string1-based sequence.

      The escape character \ can be used as in the shell to remove special
      meaning from any character in a string. In addition, \ followed by 1,
      2, or 3 octal digits represents the character whose ASCII code is
      given by those digits.

      An ASCII NUL character in string1 or string2 can be represented only
      as an escaped character; i.e. as \000, but is treated like other
      characters and translated correctly if so specified. NUL characters
      in the input are not stripped out unless the option -d "\000" is
      given.

EXTERNAL INFLUENCES
   Environment Variables
      LANG provides a default value for the internationalization variables
      that are unset or null. If LANG is unset or null, the default value of
      "C" (see lang(5)) is used. If any of the internationalization
      variables contains an invalid setting, tr will behave as if all
      internationalization variables are set to "C". See environ(5).

      LC_ALL If set to a non-empty string value, overrides the values of all
      the other internationalization variables.

      LC_CTYPE determines the interpretation of text as single and/or
      multi-byte characters, the classification of characters as printable,
      and the characters matched by character class expressions in regular
      expressions.

      LC_MESSAGES determines the locale that should be used to affect the
      format and contents of diagnostic messages written to standard error
      and informative messages written to standard output.

      NLSPATH determines the location of message catalogues for the
      processing of LC_MESSAGES.

RETURN VALUE
      tr exits with one of the following values:

            0   All input was processed successfully.

           >0   An error occurred.

EXAMPLES
      For the ASCII character set and default collation sequence, create a
      list of all the words in file1, one per line in file2, where a word is
      taken to be a maximal string of alphabetics. Quote the strings to
      protect the special characters from interpretation by the shell (012
      is the ASCII code for a new-line (line feed) character):

           tr -cs "[A-Z][a-z]" "[\012*]" <file1 >file2

      Same as above, but for all character sets and collation sequences:

           tr -cs "[:alpha:]" "[\012*]" <file1 >file2

      Translate all lower case characters in file1 to upper case and write
      the result to standard output.

           tr "[:lower:]" "[:upper:]" <file1

      Use an equivalence class to identify accented variants of the base
      character e in file1, strip them of diacritical marks and write the
      result to file2:

           tr "[=e=]" "[e*]" <file1 >file2

      Translate each digit in file1 to a # (number sign), and write the
      result to file2.

           tr "0-9" "[#*]" <file1 >file2

      The * (asterisk) tells tr to repeat the # (number sign) enough times
      to make the second string as long as the first one.

AUTHOR
      tr was developed by OSF and HP.

SEE ALSO
      ed(1), sh(1), ascii(5), environ(5), lang(5), regexp(5).

STANDARDS CONFORMANCE
      tr: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
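
The four forms in the tr synopsis are easier to tell apart when run against the same input. The sketch below is only an illustration: the sample string and the variable names (text, upper, squeezed, no_digits, word_list) are my own assumptions, and it reads from printf rather than a file so that it is self-contained.

```shell
# Hypothetical sample input with mixed case, repeated spaces, and digits.
text='Hello,   World 2001'

# Translate: map every lowercase letter to its uppercase equivalent.
upper=$(printf '%s\n' "$text" | tr '[:lower:]' '[:upper:]')

# Squeeze (-s): collapse each run of spaces into a single space.
squeezed=$(printf '%s\n' "$text" | tr -s ' ')

# Delete (-d): remove every digit.
no_digits=$(printf '%s\n' "$text" | tr -d '[:digit:]')

# Complement and squeeze (-cs): turn each run of non-letters into one
# new-line, which leaves one word per line.
word_list=$(printf '%s\n' "$text" | tr -cs '[:alpha:]' '[\012*]')

printf '%s\n' "$upper" "$squeezed" "$no_digits" "$word_list"
```

Single quotes around the strings keep the shell from treating the brackets and the backslash as its own special characters, as the manual page advises.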

wc

wc - Count words, bytes, and lines.

wc(1)                                                               wc(1)

NAME
      wc - word, line, and byte or character count

SYNOPSIS
      wc [-c|-m] [-lw] [names]

DESCRIPTION
      The wc command counts lines, words, and bytes or characters in the
      named files, or in the standard input if no names are specified. It
      also keeps a total count for all named files.

      A word is a maximal string of characters delimited by spaces, tabs, or
      new-lines.

      wc recognizes the following command-line options:

           -c        Write to the standard output the number of bytes in
                     each input file.

           -m        Write to the standard output the number of characters
                     in each input file.

           -w        Write to the standard output the number of words in
                     each input file.

           -l        Write to the standard output the number of newline
                     characters in each input file.

      The c and m options are mutually exclusive. Otherwise, the l, w, and
      c or m options can be used in any combination to specify that a subset
      of lines, words, and bytes or characters are to be reported.

      When any option is specified, wc will report only the information
      requested by the specified options. If no option is specified, the
      default output is -lwc.

      When names are specified on the command line, they are printed along
      with the counts.

EXTERNAL INFLUENCES
   Environment Variables
      LC_CTYPE determines the range of graphics and space characters, and
      the interpretation of text as single- and/or multi-byte characters.

      LC_MESSAGES determines the language in which messages are displayed.

      If LC_CTYPE or LC_MESSAGES is not specified in the environment or is
      set to the empty string, the value of LANG is used as a default for
      each unspecified or empty variable. If LANG is not specified or is
      set to the empty string, a default of "C" (see lang(5)) is used
      instead of LANG.

      If any internationalization variable contains an invalid setting, wc
      behaves as if all internationalization variables are set to "C". See
      environ(5).

   International Code Set Support
      Single- and multi-byte character code sets are supported.

WARNINGS
      The wc command counts the number of newlines to determine the line
      count. If a text file has a final line that is not terminated with a
      newline character, the count will be off by one.

   Standard Output (XPG4 only)
      By default, the standard output contains an entry for each input file
      of the form:

      "%d %d %d %s\n", <newlines>, <words>, <bytes>, <file>

      If the -m option is specified, the number of characters replaces the
      <bytes> field in this format.

      If any options are specified and the -l option is not specified, the
      number of newlines is not written.

      If any options are specified and the -w option is not specified, the
      number of words is not written.

      If any options are specified and neither -c nor -m is specified, the
      number of bytes or characters is not written.

      If no input file operands are specified, no file name is written and
      no blank characters preceding the pathname are written.

      If more than one input file operand is specified, an additional line
      is written, of the same format as the other lines, except that the
      word total (in the POSIX Locale) is written instead of a pathname and
      the total of each column is written as appropriate. Such an
      additional line, if any, is written at the end of the input.

   Exit Status
      The wc utility shall exit with one of the following values:

            0   Successful completion.

           >0   An error occurred.

EXAMPLES
      Print the number of words and characters in file1:

           wc -wm file1

      The following is printed when the above command is executed:

           n1 n2 file1

      where n1 is the number of words and n2 is the number of characters in
      file1.

STANDARDS CONFORMANCE
      wc: SVID2, SVID3, XPG2, XPG3, XPG4, POSIX.2
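
The following sketch exercises the -l, -w, and -c options and the newline caveat from the WARNINGS section. The file contents and variable names are made up for illustration, and the tr -d ' ' step is my own addition because some wc implementations pad their counts with leading blanks.

```shell
# Create a small scratch file with two newline-terminated lines.
sample=$(mktemp)
printf 'one two three\nfour five\n' > "$sample"

# Reading from standard input keeps the file name out of the output.
lines=$(wc -l < "$sample" | tr -d ' ')
words=$(wc -w < "$sample" | tr -d ' ')
bytes=$(wc -c < "$sample" | tr -d ' ')

# A final line with no trailing newline is not counted by -l:
# two lines of text, but only one newline character.
unterminated=$(printf 'a\nb' | wc -l | tr -d ' ')

echo "$lines $words $bytes $unterminated"   # prints: 2 5 24 1

rm -f "$sample"
```

The last count shows why a line total from wc can be one short for files whose final line is missing its newline.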

UNIX User's Handbook
UNIX Users Handbook (2nd Edition)
ISBN: 0130654191
EAN: 2147483647
Year: 2001
Pages: 34
