Project 57. Edit Text Files

"How do I quickly strip blank lines from hundreds of files?"
This project introduces utilities that apply simple transformations to text files, such as translating case, removing excessive white space, stripping blank lines, folding long lines, and converting between space and tab characters. It covers the commands tr, expand, unexpand, fold, and fmt. When you need to modify a file in a more complex manner, consider using either sed, covered in Projects 59 and 61, or awk, covered in Projects 60 and 62.

Change File Content

The tr command searches its input for specific characters and translates them into other characters. It takes as its arguments two strings, translating each character found in the first string into the character at the same position in the second string. Rather oddly, tr does not take filename arguments but always reads its standard input and writes to its standard output. We'll get around this by using input/output redirection.

Let's convert the contents of the file jill.txt to all uppercase, reading jill.txt as standard input and writing to loud.txt as standard output.

$ cat jill.txt
She likes black. I'm not sure if she has ambitions to be a Goth, or an undertaker. Just a passing phase I suspect - hearse today, gone tomorrow.
$ tr 'abcdefghijklmnopqrstuvwxyz' 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' <jill.txt >loud.txt
$ cat loud.txt
SHE LIKES BLACK. I'M NOT SURE IF SHE HAS AMBITIONS TO BE A GOTH, OR AN UNDERTAKER. JUST A PASSING PHASE I SUSPECT - HEARSE TODAY, GONE TOMORROW.

The tr command understands character classes such as "all lowercase characters" or "all printable characters." Therefore, we can shorten our command to

$ tr '[:lower:]' '[:upper:]' <jill.txt >loud.txt

Read the man page for tr for a list of the classes it recognizes.

Convert from Mac to Unix

A typical use for tr is to convert an old-style Mac file to a Unix-compliant file. Mac OS 9 used a Return character (ASCII code 13) to mark the end of a line, whereas Unix uses a Newline character (ASCII code 10).
A Mac-style text file appears to consist of one very long line when viewed by a Unix text-processing utility or editor. Suppose that we have files imported from Mac OS 9 and wish to make them play nice in a Unix environment. We use tr to translate Return, represented by the special sequence \r, into Newline, represented by \n.

$ tr '\r' '\n' < mac-file > unix-file

Write Back to the Original File

It's not possible to redirect output straight back to the file being read, because the shell empties the output file before the command starts reading it. However, the following trick, which uses a semicolon to separate two commands on a single line, will produce that effect. The command before the semicolon redirects translated output from mac-file to a new file called tmp. When that command completes, the mv command renames tmp to mac-file, overwriting the original file with a translated replacement.

$ tr '\r' '\n' < mac-file > tmp; mv tmp mac-file

Strip Lines and Characters

In this section, we look at ways to tidy up files. You might want to strip out nonprinting characters, for example, or remove excessive white space.
Let's start by removing excessive white space from the file spaced. We define excessive white space as two or more consecutive spaces, tabs (\t), or Newlines (\n). We employ the tr command again, with option -s to squeeze repeated occurrences of selected characters into a single occurrence, and direct the translated output to the file squashed.
$ tr -s ' \t\n' <spaced >squashed

Alternatively, if we accept the definition of white space given by man 3 isspace, we can achieve the same effect by typing

$ tr -s '[:space:]' <spaced >squashed

Next, suppose that you have a file containing control and other nonprinting characters that you wish to remove. Let's view the file by using cat and option -v to display nonprinting characters visibly (for example, Control-a is displayed as ^A).

$ cat -v control
abc^A^B^C def^D^E^F ghi^G^H jkl uvw^U^V^W xyz

To remove all nonprinting characters, use tr. As a first attempt, apply it with option -d, which deletes the specified characters. Use the class [:print:], which specifies all printing characters (the ones you want to preserve), and add option -c to select the complement of the class (everything that isn't in it):

$ tr -cd '[:print:]' <control
abc def ghijkluvw xyz$

This deletes all nonprinting characters, all right, but unfortunately, those characters include some, such as Tab and Newline, that are essential to text formatting. (The trailing $ is the next shell prompt, which follows immediately because the final Newline was deleted too.) We can get around this problem by adding the class [:space:] to the selected characters. Our next attempt deletes all characters that are not printable but leaves behind the "white space" that provides formatting.

$ tr -cd '[:print:][:space:]' <control
abc def ghi jkl uvw xyz

Expand and Unexpand

The command expand expands tab characters into the appropriate number of spaces; the command unexpand does the reverse. Pass option -a to unexpand to ensure that runs of spaces throughout each line are converted; otherwise, only leading spaces are converted.

Files containing long lines, such as those often found in HTML source code, can have their contents broken into shorter lines with the fold command. Here's the original file (which is one long line, but shown split across four lines in the book).

$ cat count
He puzzled at my counting. Did I exceed his range? Apparently, there is an African tribe who count one, two, many. So perhaps he's related.
More likely he's related to the African tribe who have "one too many".

We specify that output lines should be at most 50 characters long by using the option -w50. The output is shown displayed on the Terminal screen, but we could redirect it to a file or back to the original file by using the technique described in "Write Back to the Original File" earlier in this project.

$ fold -w50 count
He puzzled at my counting. Did I exceed his range? Apparently, there is an African tribe who count o ne, two, many. So perhaps he's related. More likely he's related to the African tribe who have "one too many".

Alternatively, we might use the command fmt, which breaks lines at spaces instead of midword.

$ fmt -50 count
He puzzled at my counting. Did I exceed his range? Apparently, there is an African tribe who count one, two, many. So perhaps he's related. More likely he's related to the African tribe who have "one too many".
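As a middle ground between the two commands above, fold also accepts option -s, which breaks each line after the last space that fits within the width instead of cutting midword. The following is a minimal sketch rather than part of the original project: the scratch directory and the file named sample are hypothetical stand-ins, used so that nothing real is touched.

```shell
# Scratch directory and sample file are hypothetical, for illustration only.
workdir=$(mktemp -d)
printf 'one two three four five six seven eight nine ten\n' > "$workdir/sample"

# Plain fold cuts after exactly 20 characters, even in the middle of a word.
fold -w20 "$workdir/sample"

# With -s, fold breaks after the last space that fits within 20 characters.
fold -s -w20 "$workdir/sample"
```

Unlike fmt, fold -s never joins short lines together; it only splits lines that are too long.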
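To return to the question that opens this project, the squeeze option of tr and the write-back trick can be combined in a loop over many files. This is a minimal sketch under stated assumptions, not a definitive recipe: the scratch directory, the file names a.txt and b.txt, and the .tmp suffix are all hypothetical, and "blank" here means completely empty lines only.

```shell
# Work in a scratch directory so the demonstration touches nothing real.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two hypothetical sample files containing empty lines.
printf 'alpha\n\n\nbeta\n\ngamma\n' > a.txt
printf 'one\n\ntwo\n' > b.txt

for f in *.txt; do
    # Squeeze each run of Newlines into one, which drops the empty lines,
    # then replace the original only if tr succeeded.
    tr -s '\n' < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

cat a.txt b.txt
```

The loop uses && rather than a semicolon so that mv runs only when tr has succeeded. Lines that contain only spaces or tabs are left in place, and empty lines at the very top of a file are reduced to one rather than removed; sed or awk, covered in later projects, are better suited to those cases.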