4.12. Transforming Files

There are several utilities that perform a transformation on the contents of a file, including the following:

  • gzip, gunzip, and zcat, which convert a file into a space-efficient intermediate format and then back again. These utilities are useful for saving disk space.

  • sed, a general-purpose programmable stream editor that edits a file according to a pre-prepared set of instructions.

  • tr, which maps characters from one set to another. This utility is useful for performing simple mappings such as converting a file from uppercase to lowercase.

  • ul, which converts embedded underline sequences in a file to a form suitable for a particular terminal type.


The next few subsections contain a description of each utility.

4.12.1. Compressing Files: gzip and gunzip

The GNU file compression utility is called gzip. The utility to uncompress the file, as you might guess, is gunzip. Figure 4-24 describes these utilities.

Figure 4-24. Description of the gzip, gunzip, and zcat commands.

Utility: gzip -cv { fileName }+

gunzip -cv { fileName }+

zcat { fileName }+

gzip replaces a file by its compressed version, appending a ".gz" suffix. The -c option sends the result to standard output rather than overwriting the original file. The -v option displays the amount of compression that takes place.

The gunzip command uncompresses a file created by gzip.

zcat is equivalent to gunzip -c.


This is how to use gzip and gunzip:

$ ls -lG palindrome.c reverse.c
-rw-r--r--   1 ables        224 Jul  1 14:14 palindrome.c
-rw-r--r--   1 ables        266 Jul  1 14:14 reverse.c
$ gzip -v palindrome.c reverse.c
palindrome.c:            34.3% -- replaced with palindrome.c.gz
reverse.c:               39.4% -- replaced with reverse.c.gz
$ ls -lG palindrome.c.gz reverse.c.gz
-rw-r--r--   1 ables        178 Jul  1 14:14 palindrome.c.gz
-rw-r--r--   1 ables        189 Jul  1 14:14 reverse.c.gz
$ gunzip -v *.gz
palindrome.c.gz:         34.3% -- replaced with palindrome.c
reverse.c.gz:            39.4% -- replaced with reverse.c
$ ls -lG palindrome.c reverse.c
-rw-r--r--   1 ables        224 Jul  1 14:14 palindrome.c
-rw-r--r--   1 ables        266 Jul  1 14:14 reverse.c
$ _
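The -c option is handy when you want to keep the original file. The following is a minimal sketch of that usage, assuming a made-up file called sample.txt in a scratch directory; the file name and contents are mine, not part of the session above.

```shell
# Scratch directory, so that no real files are touched.
workdir=$(mktemp -d)

# sample.txt is a made-up file: 200 identical lines, which compress well.
i=0
while [ "$i" -lt 200 ]; do
    echo 'go cart racing'
    i=$((i + 1))
done > "$workdir/sample.txt"

# -c sends the compressed data to standard output; sample.txt is left alone.
gzip -c "$workdir/sample.txt" > "$workdir/sample.txt.gz"

# gunzip -c writes the uncompressed data to standard output
# (zcat is described in Figure 4-24 as equivalent).
gunzip -c "$workdir/sample.txt.gz" > "$workdir/restored.txt"

# cmp -s is silent and succeeds only if the two files are identical.
if cmp -s "$workdir/sample.txt" "$workdir/restored.txt"; then
    roundtrip=ok
else
    roundtrip=failed
fi
echo "round trip: $roundtrip"
```

Because the output is redirected explicitly, both sample.txt and sample.txt.gz exist afterward, which is not the case when gzip is run without -c.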


4.12.2. Stream Editing: sed

The stream editor utility sed scans one or more files and performs an editing action on all of the lines that match a particular condition. The actions and conditions may be stored in a sed script. sed is useful for performing simple repetitive editing tasks.

sed is a fairly comprehensive utility. Because of this, I've only attempted to describe the main features and options of sed; however, the material in this section will allow you to write a good number of useful sed scripts.


Figure 4-25 gives a synopsis of sed.

Figure 4-25. Description of the sed command.

Utility: sed [ -e script ] [ -f scriptfile ] { fileName }*

sed is a utility that edits an input stream according to a script that contains editing commands. Each editing command is separated by a newline, and describes an action and a line or range of lines to perform the action upon. A sed script may be stored in a file and executed by using the -f option. If a script is placed directly on the command line, it should be surrounded by single quotes. If no files are specified, sed reads from standard input. The format of sed scripts is described in the following sections.


4.12.2.1. sed commands

A sed script is a list of one or more of the commands shown in Figure 4-26, separated by newlines.

Figure 4-26. Editing commands in sed.

Command syntax               Meaning
address a\text               Append text after the line specified by address.
addressRange c\text          Replace the text specified by addressRange with text.
addressRange d               Delete the text specified by addressRange.
address i\text               Insert text before the line specified by address.
address r name               Append the contents of the file name after the line specified by address.
addressRange s/expr/str/     On each selected line, substitute the first occurrence of the regular expression expr by the string str.
addressRange s/expr/str/g    On each selected line, substitute every occurrence of the regular expression expr by the string str.


The following rules apply:

  • address must be either a line number or a regular expression. A regular expression selects all of the lines that match the expression. You may use $ to select the last line.


  • addressRange can be a single address or two addresses separated by a comma. If two addresses are specified, then every line from the first line that matches the first address through the next line that matches the second address is selected, inclusive.

  • If no address is specified, then the command is applied to all of the lines.
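The addressing rules can be tried out on a small input. The following is a sketch that assumes a made-up five-line input (the words one through five); the variable names are mine and are there only so that each result can be inspected.

```shell
# Hypothetical five-line input used by every command below.
input=$(printf '%s\n' one two three four five)

# A line-number range: delete lines 2 through 4.
by_number=$(printf '%s\n' "$input" | sed '2,4d')

# A regular-expression range: delete from the first line matching /two/
# through the next line matching /four/, inclusive.
by_regex=$(printf '%s\n' "$input" | sed '/two/,/four/d')

# $ addresses the last line of the input.
without_last=$(printf '%s\n' "$input" | sed '$d')

# No address at all: the command is applied to every line.
all_changed=$(printf '%s\n' "$input" | sed 's/^/> /')

printf '%s\n' "$by_number" "$by_regex" "$without_last" "$all_changed"
```

The first two commands select the same three lines in different ways, so both leave only the lines "one" and "five".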

4.12.2.2. Substituting Text

In the following example, I supplied the sed script on the command line. The script inserted a couple of spaces at the start of every line.

$ cat arms                      ...look at the original file.
People just like me,
Are all around the world,
Waiting for the loved ones that they need.
And with my heart,
I make a simple wish,
Plain enough for anyone to see.
$ sed 's/^/  /' arms > arms.indent    ...indent the file.
$ cat arms.indent                     ...look at the result.
  People just like me,
  Are all around the world,
  Waiting for the loved ones that they need.
  And with my heart,
  I make a simple wish,
  Plain enough for anyone to see.
$ _


To remove all of the leading spaces from a file, use the substitute operator in the reverse fashion:

$ sed 's/^ *//' arms.indent    ...remove leading spaces.
People just like me,
Are all around the world,
Waiting for the loved ones that they need.
And with my heart,
I make a simple wish,
Plain enough for anyone to see.
$ _
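Both substitutions so far match at most once per line, so they don't show the effect of the g suffix listed in Figure 4-26. Here is a small sketch of the difference, using a made-up line of text that contains the pattern three times:

```shell
# Hypothetical input line in which "at" occurs three times.
line='a cat sat on a mat'

# Without g, only the first occurrence on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/at/AT/')

# With g, every occurrence on the line is replaced.
every=$(printf '%s\n' "$line" | sed 's/at/AT/g')

echo "$first_only"
echo "$every"
```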


4.12.2.3. Deleting Text

The next example illustrates a script that deleted all of the lines that contained the character 'a':

$ sed '/a/d' arms      ...remove all lines containing an 'a'.
People just like me,
$ _



To delete only those lines that contain the word 'a', I surrounded the regular expression with escaped angle brackets (\< and \>), which match the start and the end of a word:

$ sed '/\<a\>/d' arms
People just like me,
Are all around the world,
Waiting for the loved ones that they need.
And with my heart,
Plain enough for anyone to see.
$ _
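The two delete scripts can be compared side by side on a smaller input. This sketch assumes GNU sed, as found on Linux; as far as I know, the \< and \> word anchors are an extension that not every version of sed supports. The three input lines are made up for the illustration.

```shell
# Hypothetical three-line input: only the first line contains
# the letter a as a word of its own.
input=$(printf '%s\n' 'a simple wish' 'all around' 'just like me')

# /a/ matches any line containing the letter a anywhere.
no_letter_a=$(printf '%s\n' "$input" | sed '/a/d')

# /\<a\>/ matches only lines in which a stands alone as a word
# (assumption: GNU sed word anchors).
no_word_a=$(printf '%s\n' "$input" | sed '/\<a\>/d')

printf '%s\n' "$no_letter_a"
printf '%s\n' "$no_word_a"
```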


4.12.2.4. Inserting Text

In the next example, I inserted a copyright notice at the top of the file by using the insert command. Notice that I stored the sed script in a file and executed it by using the -f option.

$ cat sed5                   ...look at the sed script.
1i\
Copyright 1992, 1998, & 2002 by Graham Glass\
All rights reserved\
$ sed -f sed5 arms           ...insert a copyright notice.
Copyright 1992, 1998, & 2002 by Graham Glass
All rights reserved
People just like me,
Are all around the world,
Waiting for the loved ones that they need.
And with my heart,
I make a simple wish,
Plain enough for anyone to see.
$ _
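The append command from Figure 4-26 works the same way as insert, except that the text goes after the addressed line. The following sketch assumes a made-up two-line file and a script file called append.sed; both names are mine.

```shell
# Scratch directory holding a hypothetical input file and sed script.
workdir=$(mktemp -d)
printf '%s\n' 'People just like me,' 'Are all around the world,' \
    > "$workdir/arms.sample"

# The script appends one line of text after the last line ($).
cat > "$workdir/append.sed" <<'EOF'
$a\
The End
EOF

appended=$(sed -f "$workdir/append.sed" "$workdir/arms.sample")
printf '%s\n' "$appended"
```

Storing the script in a file avoids having to quote the backslash and newline on the command line.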


4.12.2.5. Replacing Text

To replace lines, use the change command. In the following example, I replaced lines 1 through 3 with a single censored message:

$ cat sed6                    ...list the sed script.
1,3c\
Lines 1-3 are censored.
$ sed -f sed6 arms            ...execute the script.
Lines 1-3 are censored.
And with my heart,
I make a simple wish,
Plain enough for anyone to see.
$ _



To replace individual lines with a message rather than an entire group, supply a separate command for each line:

$ cat sed7                    ...list the sed script.
1c\
Line 1 is censored.
2c\
Line 2 is censored.
3c\
Line 3 is censored.
$ sed -f sed7 arms            ...execute the script.
Line 1 is censored.
Line 2 is censored.
Line 3 is censored.
And with my heart,
I make a simple wish,
Plain enough for anyone to see.
$ _


4.12.2.6. Inserting Files

In the following example, I inserted a message after the last line of the file:

$ cat insert     ...list the file to be inserted.
The End
$ sed '$r insert' arms   ...execute the script.
People just like me,
Are all around the world,
Waiting for the loved ones that they need.
And with my heart,
I make a simple wish,
Plain enough for anyone to see.
The End
$ _
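The address given to r doesn't have to be $. As a sketch, the following uses a regular expression as the address, so that the file is inserted after every matching line. The files book.txt and rule.txt are made up for the illustration, and the script assumes that the scratch directory path contains no spaces.

```shell
# Scratch directory with a hypothetical document and a file to insert.
workdir=$(mktemp -d)
printf '%s\n' 'chapter one' 'body text' 'chapter two' 'more text' \
    > "$workdir/book.txt"
printf '%s\n' '----' > "$workdir/rule.txt"

# Insert rule.txt after every line that starts with "chapter".
# Double quotes let the shell expand $workdir inside the script.
with_rules=$(sed "/^chapter/r $workdir/rule.txt" "$workdir/book.txt")
printf '%s\n' "$with_rules"
```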


4.12.2.7. Multiple sed Commands

This last example illustrates the use of multiple sed commands. I inserted a "<<" sequence at the start of each line, and appended a ">>" sequence to the end of each line:

$ sed -e 's/^/<< /' -e 's/$/ >>/' arms
<< People just like me, >>
<< Are all around the world, >>
<< Waiting for the loved ones that they need. >>
<< And with my heart, >>
<< I make a simple wish, >>
<< Plain enough for anyone to see. >>
$ _
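Since the commands in a sed script are separated by newlines, the same two commands can also be written as one quoted script that spans two lines. This sketch compares the two forms on a single line of the sample text; the variable names are mine.

```shell
line='People just like me,'

# Two commands, each supplied by its own -e option.
with_e=$(printf '%s\n' "$line" | sed -e 's/^/<< /' -e 's/$/ >>/')

# The same two commands in one quoted script, separated by a newline.
with_newline=$(printf '%s\n' "$line" | sed 's/^/<< /
s/$/ >>/')

echo "$with_e"
echo "$with_newline"
```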



4.12.3. Translating Characters: tr

The tr utility maps the characters in a file from one character set to another (Figure 4-27).

Figure 4-27. Description of the tr command.

Utility: tr -cds string1 string2

tr maps all of the characters in its standard input from the character set string1 to the character set string2. If the length of string2 is less than the length of string1, it's padded by repeating its last character; in other words, the command "tr abc de" is equivalent to "tr abc dee".

A character set may be specified using a notation similar to the [] notation of shell filename substitution, but without the enclosing brackets:

  • To specify the character set a, d, and f, simply write them as a single string: adf.

  • To specify the character set a through z, separate the start and end characters by a dash: a-z.

By default, tr replaces every character of standard input in string1 by its corresponding character in string2.

The -c option causes string1 to be complemented before the mapping is performed. Complementing a string means that it is replaced by a string that contains every ASCII character except those in the original string. The net effect is that every character of standard input that does not occur in string1 is replaced.

The -d option causes every character in string1 to be deleted from standard input.

The -s option causes every repeated output character to be condensed into a single instance.


Here are some examples of tr in action:

$ cat go.cart                 ...list the sample input file.
go cart racing
$ tr a-z A-Z < go.cart        ...translate lower to uppercase.
GO CART RACING
$ tr a-c D-E < go.cart        ...replace abc by DEE.
go EDrt rDEing
$ tr -c a X < go.cart         ...replace every non-a with X.
XXXXaXXXXXaXXXXX$             ...even last newline is replaced.
$ tr -c a-z '\012' < go.cart  ...replace non-alphas with
go                            ...octal 12 (newline).
cart
racing
$ tr -cs a-z '\012' < go.cart ...repeat, but condense
go                            ...repeated newlines.
cart
racing
$ tr -d a-c < go.cart         ...delete all a-c characters.
go rt ring
$ _
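The effect of -s is easier to see when the input really does contain repeated characters. The following sketch assumes a made-up line with runs of spaces between the words; note that the first command gives tr -s a single string, which squeezes repeats of the characters in that string without translating anything.

```shell
# Hypothetical input with runs of spaces between the words.
text='go   cart    racing'

# -s with one string: squeeze each run of spaces to a single space.
squeezed=$(printf '%s\n' "$text" | tr -s ' ')

# -c and -s together: turn every run of non-lowercase characters
# into one newline, leaving one word per line.
words=$(printf '%s\n' "$text" | tr -cs a-z '\012')

# -d: delete every vowel, leaving everything else untouched.
no_vowels=$(printf '%s\n' "$text" | tr -d aeiou)

printf '%s\n' "$squeezed" "$words" "$no_vowels"
```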


4.12.4. Converting Underline Sequences: ul

The ul utility transforms a file that contains underlining characters so that it appears correctly on a particular terminal type. This is useful with commands like man that generate underlined text. Figure 4-28 describes how ul works.

Figure 4-28. Description of the ul command.

Utility: ul -tterminal { filename }*

ul is a utility that transforms underline characters in its input so that they will display correctly on the specified terminal. If no terminal is specified, the one defined by the TERM environment variable is assumed. The "/etc/termcap" file (or terminfo database) is used by ul to determine the correct underline sequence.


For example, let's say that you want to use the man utility to produce a document that you wish to print on a simple ASCII-only printer. The man utility generates underline characters for your current terminal, so to filter the output so that it's suitable for a dumb printer, pipe the output of man through ul with the "dumb" terminal setting. Here's an example:

$ man who | ul -tdumb > man.txt
$ head man.txt           ...look at the first 10 lines.
WHO(1)                 User Commands                   WHO(1)
NAME
    who - show who is logged on
SYNOPSIS
    who [OPTION]... [ FILE | ARG1 ARG2 ]
DESCRIPTION
$ _





Linux for Programmers and Users
ISBN: 0131857487
Year: 2007
Pages: 339
