Editing and Formatting Files


There are many ways to edit and format files in the UNIX System. Chapter 5 described the text editors vi and emacs. Chapter 21 will explain how to use awk and sed to write programs that modify file contents. In addition, the troff, nroff, and LaTeX systems can be used to create formatted documents. For example, many of the UNIX man pages are formatted with nroff, which is why they cannot be saved to a file with

 $ man command > manfile

To save a man page, use the command

 $ man command | col -b > manfile

which sends the output of man through the col filter for nroff output. Formatting documents with troff, nroff, and LaTeX is explained in detail on the companion web site.

The commands pr and fmt can be used to add simple formatting to a file, such as a header with page numbers, often before printing it.

The tr command is a small but useful tool for processing text. It translates characters according to a simple set of rules.

spell searches a file for misspelled words. The related commands ispell and aspell allow you to interactively correct the spelling in a file.

pr

The most common use of pr is to add a header to every page of a file. The header contains the page number, date, time, and name of the file. For example, if names is a simple data file that contains a short list of names and addresses, with no header information, then with pr, you get the following:

 $ pr names
 Aug 28 15:25 2006  names Page 1
 Nate   nate@engineer.com
 Rebecca   rlf@library.edu
 Dan   dkraut@bio.ca.edu
 Liz   liz@thebest.net

pr is often used to add header information to files when they are printed, as shown here:

 $ pr notes | lp

If you name several files, each one will have its own header, with the appropriate name and page numbers in the output.

You can also use pr in a pipeline to add header information to the output of another command. This is very useful for printing data files when you need to keep track of date or version information. The following command prints the long-format file listing of the current directory with a header that includes today’s date:

 $ ls -l | pr | lp

You can customize the heading with the -h option followed by the heading you want. The following command prints “Chapter 19 --- First Draft” at the top of each page of output:

 $ pr -h "Chapter 19 --- First Draft" chapter19 | lp

Note that when the header text contains spaces, it must be enclosed by quotation marks.
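As a minimal sketch of the heading behavior described above, the following builds a small sample file and paginates it with a custom heading. The file name draft.txt and its contents are made up for illustration, and the output is captured in a variable instead of being piped to lp so that the example runs without a printer.

```shell
# Use a scratch directory so nothing in the current directory is touched.
tmpdir=$(mktemp -d)

# A hypothetical two-line file standing in for a real document.
printf 'first line\nsecond line\n' > "$tmpdir/draft.txt"

# -h replaces the file name in the page header with the quoted text.
paged=$(pr -h "Chapter 19 --- First Draft" "$tmpdir/draft.txt")

# Show the top of the first page: the header block, then the body.
printf '%s\n' "$paged" | head -n 8
```

The header line should contain the date, the custom heading, and the page number. Its exact layout varies between pr implementations.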

Simple Formatting with pr

pr also has options for simple formatting. To double-space a file when you print it, use the -d option. The -n option adds line numbers to the output. The following command prints the file double-spaced and with line numbers:

 $ pr -d -n program.c | lp

You can use pr to print output in two or more columns. For example, the following prints the names of the files in the current directory in three columns:

 $ ls | pr -3 | lp

pr handles simple page layout, including setting the number of lines per page, the line width, and the offset of the left margin. The following command specifies a line width of 60 characters, a left margin offset of eight characters, and a page length of 60 lines:

 $ pr -w 60 -o 8 -l 60 note | lp
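The numbering and column options can be tried without a printer, as in the rough sketch below. The six-word input is invented for illustration. The -t option, which is not discussed in this section, is a standard pr option that suppresses the page header and trailer so that only the formatted lines are shown.

```shell
# Hypothetical six-line input.
items=$(printf '%s\n' alpha beta gamma delta epsilon zeta)

# -t drops the page header and trailer; -n numbers each line.
numbered=$(printf '%s\n' "$items" | pr -t -n)
printf '%s\n' "$numbered"

# -3 arranges the same input in three columns, filled top to bottom;
# -w 60 limits the total line width to 60 characters.
columns=$(printf '%s\n' "$items" | pr -t -3 -w 60)
printf '%s\n' "$columns"
```

With six items and three columns, the second command should produce two rows, with alpha and beta in the first column.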

fmt

Another simple formatter, fmt, can be used to control the width of your output. fmt breaks, fills, or joins lines in the input you give it and produces output lines that have up to the number of characters you specify. The default width is 72 characters (GNU fmt defaults to 75), but you can use the -w option to specify other line widths. fmt is a quick way to consolidate files that contain lots of short lines, or to eliminate long lines from files before sending them to a printer. In general, it makes ragged files look better. The following illustrates how fmt works.

 $ cat sample
 This is an example of a short file that contains lines of varying width.

We can even up the lines in the file sample as follows.

 $ fmt -w 16 sample
 This is an example of a short file that contains lines of varying width.
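The effect of the width option is easier to see on a file whose line breaks are known. The sketch below writes a hypothetical ragged file (the line breaks are invented, not taken from the sample above), refills it to 30 characters, and reports the longest resulting line.

```shell
tmpdir=$(mktemp -d)

# A hypothetical ragged file: some very short lines and one long one.
printf '%s\n' \
  'This is' \
  'an example of a short file that contains' \
  'lines' \
  'of varying width.' > "$tmpdir/sample"

# Refill the paragraph so that no output line exceeds 30 characters.
filled=$(fmt -w 30 "$tmpdir/sample")
printf '%s\n' "$filled"

# Report the longest line as a quick check on the width limit.
longest=$(printf '%s\n' "$filled" |
  awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }')
echo "longest line: $longest characters"
```

Where the lines break can differ between fmt implementations, because some versions balance line lengths instead of filling each line greedily. The width limit itself should hold in all of them.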

tr

tr replaces one set of characters with another set. For example, you could use tr to translate all the : (colon) characters in the /etc/passwd file into tabs, like this:

 $ tr : '\t' < /etc/passwd
 root    x       0       0    root   /root    /bin/bash
 dbp     x     944     100    Dave Barker-Plummer   /home/dbp   /bin/bash
 etch    x     945     100    John Etchemendy       /home/etch  /bin/bash
 a-liu   x     946     100    Albert Liu      /home/a-liu    /bin/bash

In this example, the escape sequence \t stands for the TAB character. It is enclosed in single quotes to prevent the shell from interpreting it. File redirection (with the input operator <) is used to send the contents of /etc/passwd to tr. The tr command is one of the few common UNIX System tools that does not allow you to specify a filename as an argument. tr only reads standard input, so you have to use input redirection or a pipe to give it input.

The tr command can translate any number of characters. In general, you give tr two lists of characters: the list of characters to be translated, and the list of characters they will be replaced by. tr translates the first character in the input list to the first character in the output list, the second input character to the second output character, and so on. For example, the following command replaces the characters a, b, and c in lowerfile with the corresponding uppercase letters, and saves the output to a new file:

 $ tr abc ABC < lowerfile > upperfile

Because each character in the input list corresponds to one character in the output list, the two lists must have the same number of characters.
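The character-by-character mapping can be checked on a short string, as in this small sketch. The sample text and the scratch files are made up for illustration.

```shell
# a, b, and c become A, B, and C; every other character passes through.
first=$(printf 'a cab backs up\n' | tr abc ABC)
echo "$first"     # A CAB BACks up

# tr reads only standard input, so a file must be redirected or piped in.
tmpdir=$(mktemp -d)
printf 'abc:def\n' > "$tmpdir/lowerfile"
tr abc ABC < "$tmpdir/lowerfile" > "$tmpdir/upperfile"
result=$(cat "$tmpdir/upperfile")
echo "$result"    # ABC:def
```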

Specifying Ranges and Repetitions

You can use brackets and a minus sign (-) to indicate a range of characters, similar to the use of range patterns in regular expressions and filename matching. The following example uses tr to translate all lowercase letters in name_file to uppercase:

 $ cat name_file
 ben robin dan marissa
 $ tr '[a-z]' '[A-Z]' < name_file
 BEN ROBIN DAN MARISSA

tr can be used to encode or decode text using simple substitution ciphers (codes). A specific example of this is the rot13 cipher, which replaces each letter in the input text with the letter 13 letters later in the alphabet (wrapping around at the end). For instance, k is translated to x and Y is translated to L. The following command encrypts a file using this rule. Note that rot13 maps lowercase letters to lowercase letters and uppercase letters to uppercase letters.

 $ cat hello
 Hello, world
 $ tr "[a-m][n-z][A-M][N-Z]" "[n-z][a-m][N-Z][A-M]" < hello > code.rot13
 $ cat code.rot13
 Uryyb, jbeyq

You can use the same tr command to decrypt a file encrypted with the rot13 rule. The rot13 cipher is sometimes used to weakly encrypt potentially offensive jokes in newsgroups.
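The round trip can be sketched as a small shell function. The function name rot13 is chosen here for illustration. The ranges are written without brackets, which is the form POSIX tr expects, so this is a variant of the command shown above, not a copy of it.

```shell
# Each half of the alphabet maps onto the other half, case preserved.
rot13() {
  tr 'A-Za-z' 'N-ZA-Mn-za-m'
}

encoded=$(echo 'Hello, world' | rot13)
echo "$encoded"                 # Uryyb, jbeyq

# Applying the same mapping again restores the original text.
decoded=$(echo "$encoded" | rot13)
echo "$decoded"                 # Hello, world
```

Because 13 is half of 26, encoding and decoding are the same operation, which is why a single function serves both purposes.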

If you want to translate each of a set of input characters to the same single output character, you can use an asterisk to tell tr to repeat the output character. For example, the following replaces each digit in the input with the number sign (#).

 $ tr '[0-9]' '[#*]' < data

This particular feature of tr is not found in all versions of UNIX.

Removing Repeated Characters

The previous example translates digits to number signs. Each digit of a number produces a number sign in the output, so 1024 comes out as ####. You can tell tr to remove repeated characters from the translated string with the -s (squeeze) option. The following version of the preceding command replaces each number in the input with a single number sign in the output, regardless of how many digits it contains:

 $ tr -s '[0-9]' '[#*]' < data
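The difference between plain translation and squeezing can be seen side by side in the following sketch. The input line is hypothetical, and the digit range is written without brackets, as POSIX tr expects. As noted above, the [#*] repetition syntax may not be available in every version of tr.

```shell
line='order 1024 shipped 7 items'   # hypothetical input

# [#*] repeats # as many times as needed to match the first set.
masked=$(echo "$line" | tr '0-9' '[#*]')
echo "$masked"      # order #### shipped # items

# -s squeezes each run of repeated output characters down to one.
squeezed=$(echo "$line" | tr -s '0-9' '[#*]')
echo "$squeezed"    # order # shipped # items
```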

You can use tr to create a list of all the words appearing in a file. The following command puts every word in the file on a separate line by replacing each group of spaces with a newline. It then sorts the words into alphabetical order and uses uniq to produce an output that lists each word and the number of times it occurs in the file.

 $ cat short_file
 This is the first line.
 And this is the last.
 $ cat short_file | tr -s ' ' '\n' | sort | uniq -c
 1 And
 1 This
 1 first
 2 is
 1 last.
 1 line.
 2 the
 1 this

If you wanted to list words in order of descending frequency, you could pipe the output of uniq -c to sort -rn.
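The complete frequency pipeline might look like the sketch below. It recreates short_file in a scratch directory and sets LC_ALL=C, an assumption added here so that the sort order is byte-based and does not depend on the local language settings.

```shell
tmpdir=$(mktemp -d)
printf 'This is the first line.\nAnd this is the last.\n' > "$tmpdir/short_file"

# 1. tr -s ' ' '\n'  puts each word on its own line
# 2. sort            groups identical words together
# 3. uniq -c         counts each group
# 4. sort -rn        orders the counts from highest to lowest
freq=$(tr -s ' ' '\n' < "$tmpdir/short_file" |
  LC_ALL=C sort | uniq -c | LC_ALL=C sort -rn)
printf '%s\n' "$freq"
```

The two words that occur twice (is and the) should appear first. The order among words with the same count depends on the sort implementation.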

Other Options for tr

Sometimes it is convenient to specify the input list by its complement, that is, by telling tr which characters not to translate. You can do this with the -c (complement) option.

The following command makes nonalphanumeric characters in a file easily visible by translating characters that are not alphabetic or digits to an underscore.

 $ tr -c '[A-Z][a-z][0-9]' '[_*]' < messyfile

You can use the -d (delete) option to tell tr to delete characters in the input set from its output. This is an easy way to remove special or nonprinting characters from a file. The following command uses the -c and -d options to remove everything except alphabetic characters and digits:

 $ tr -cd "[a-z][A-Z][0-9]" < messyfile

In particular, this example deletes punctuation marks, spaces, and newlines, so the output is a single unbroken line.
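Both options can be compared on one short string, as in the sketch below. The input is hypothetical and the ranges are written without brackets, as POSIX tr expects. In the first command, the newline is added to the set so that line breaks survive the translation.

```shell
messy='a.b, c!'    # hypothetical input

# -c complements the first set: everything that is NOT a letter, digit,
# or newline becomes an underscore.
marked=$(printf '%s\n' "$messy" | tr -c 'A-Za-z0-9\n' '[_*]')
echo "$marked"     # a_b__c_

# -cd deletes the complement instead, keeping only letters and digits.
# The trailing newline is deleted too.
kept=$(printf '%s\n' "$messy" | tr -cd 'A-Za-z0-9')
echo "$kept"       # abc
```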

spell

spell is a UNIX command that allows you to check the spelling in a file. Running

 $ spell textfile

will produce a list of the words that are misspelled in textfile. The -b option causes spell to use British spellings.

Many Linux systems come with the command ispell, which allows you to interactively correct misspelled words. ispell can also be downloaded from http://ficus-www.cs.ucla.edu/geoff/ispell.html. A similar program, called aspell, can be found at http://aspell.net/. To check the spelling in a file with aspell, use

 $ aspell check textfile

aspell often does a better job of suggesting alternatives to misspelled words than ispell. The manual can be found online at http://aspell.net/man-html/index.html.




UNIX. The Complete Reference
UNIX: The Complete Reference, Second Edition (Complete Reference Series)
ISBN: 0072263369
EAN: 2147483647
Year: 2006
Pages: 316
