

Project 59. Learn the sed Stream Editor

"How do I write a script to perform the same sequence of editing commands on a number of text files?"

This project shows you how to use the sed stream editor, which changes text files by reading editing commands from a script. Project 58 shows how to apply such commands to a batch of files. Project 61 covers more advanced use of sed, and Projects 60 and 62 cover the awk command.

The sed Basics

The sed stream editor was written to edit text files, but it's not an interactive editor like nano, vim, or emacs. Instead of following commands entered "live" by a user, sed executes edits according to instructions provided in a command script. The most common use of sed is to apply the same set of edits to many files, either as a one-time transformation or at regular intervals: to make a small change across hundreds of HTML files, for example, or to process Apache log files once a day.

The sed command writes its output to standard out, so it can easily create new files as it edits existing ones. As of Mac OS X 10.4 (Tiger), sed also accepts the option -i, which directs it to write changes back to the original source file.

A sed script consists of editing commands; each command describes a line range and a function. When sed receives files as input, it reads each file line by line. When an input line falls in a command's line range, sed applies the corresponding function to that input line. An input line may fall in the line range of many commands and, therefore, will have many functions applied to it.

You can write a sed script directly to the command line or to a file. This project considers simple scripts of just a few lines, which we'll write directly to the command line. Scripts that are more complex are usually written to files and are the subject of Project 61.

Tip

Specify option -i and a filename extension to make sed write the edits back to the input file instead of to standard out. The original file is saved in a backup file named as the original file plus the specified extension. The following command changes fuse.txt and writes the original file to fuse.txt.bak.

$ sed -i.bak 's/Jill/Jillean/g' fuse.txt
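To see the backup behavior for yourself without touching real files, here is a minimal sketch that works in a scratch directory. The sample text is abridged from the fuse.txt example used in this project; the variable names are just for illustration.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)

# Sample file (abridged from the fuse.txt example in this project).
printf '%s\n' "Jill has a short fuse" "So who lit Jill's fuse?" > "$workdir/fuse.txt"

# Edit in place; the attached suffix .bak names the backup copy.
sed -i.bak 's/Jill/Jillean/g' "$workdir/fuse.txt"

edited=$(cat "$workdir/fuse.txt")
backup=$(cat "$workdir/fuse.txt.bak")
echo "edited: $edited"
echo "backup: $backup"

rm -r "$workdir"
```

The edited file should hold the Jillean text, and the .bak file should still hold the original.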



Next, we'll look at a few examples to clarify what we've just learned.

Let's Edit

Substituting one pattern for another is a common use for sed. Should you wish to be formal and replace Jill with Jillean, for example, you could employ the following command.

First, let's view the original file.

$ cat fuse.txt
Jill has a short fuse - light it and stand well back.
So who lit Jill's fuse, and did he stand well back? Read on


Next, invoke sed to perform the edit by typing

$ sed 's/Jill/Jillean/g' fuse.txt
Jillean has a short fuse - light it and stand well back.
So who lit Jillean's fuse, and did he stand well back? Read on


We invoked sed, passing the quoted script 's/Jill/Jillean/g' and the name of the file to process. Although not necessary in this example, it's wise always to quote the script to prevent the shell from expanding special characters before passing them to sed.

Our script consists of one command, which does not define a line range; therefore, its function is applied to every line of the file. The function is s for substitute, which has the syntax s/match-text/replace-text/flags. The flag g is for global replace; see "sed Functions" later in this project.

Our next example deletes all blank lines from a file. The sed command to do this specifies a line range ("every blank line") and the function delete. "Every blank line" is defined by the regular expression ^$, delimited by forward slash characters (/). The function d deletes the matched line.

$ sed '/^$/d' blanks.txt
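Note that ^$ matches only lines that are completely empty. The sketch below, using a made-up sample file, shows the difference between that and a variant that also removes lines containing nothing but spaces or tabs. The variant pattern is an addition for illustration, not part of the original example.

```shell
workdir=$(mktemp -d)

# Five lines: the second is empty, the fourth holds three spaces.
printf 'first\n\nsecond\n   \nthird\n' > "$workdir/blanks.txt"

# Deletes only truly empty lines; the line of spaces survives.
empty_only=$(sed '/^$/d' "$workdir/blanks.txt")

# Variant: also delete lines that contain nothing but whitespace.
whitespace_too=$(sed '/^[[:space:]]*$/d' "$workdir/blanks.txt")

printf '%s\n' "$empty_only"
echo "---"
printf '%s\n' "$whitespace_too"

rm -r "$workdir"
```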


Line ranges are usually given as plain text or regular expressions, and all lines that contain a match for the text or regular expression fall in the line range.

Make sed grep

Make sed behave like grep by combining function p, to print matching lines, and option -n, to suppress the automatic echoing of every input line. We'll search the file biff.txt for all lines that contain the text Biff.

$ sed -n '/Biff/p' biff.txt
*I forget the name - let's assume Biff for want of a single syllable, grunt-able word).
*'Biff' grinned, and I swear that I could hear a few synaptic connections sparking the thought 'threesome' (had he been able to count to three).
*I declined the unspoken suggestion, confusing Biff somewhat


(I've added a star [ * ] to mark each line because lines in the original text occupy several lines when printed in this book.)

In this example, the line range is described by the plain-text expression /Biff/; every line that contains the text Biff will fall in the range and be printed by the function p.
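A quick way to convince yourself that the two commands are equivalent is to run both against the same file. This sketch uses a three-line stand-in for biff.txt (the sample text is invented) and also shows what happens if you forget option -n.

```shell
workdir=$(mktemp -d)
printf '%s\n' "Biff grinned" "nobody here" "confusing Biff somewhat" > "$workdir/biff.txt"

# sed with -n and p prints only the matching lines, as grep does.
with_sed=$(sed -n '/Biff/p' "$workdir/biff.txt")
with_grep=$(grep 'Biff' "$workdir/biff.txt")

# Without -n, each matching line appears twice: once from p and
# once from the automatic echo of every input line.
without_n=$(sed '/Biff/p' "$workdir/biff.txt" | wc -l | tr -d ' ')

printf '%s\n' "$with_sed"
echo "lines printed without -n: $without_n"

rm -r "$workdir"
```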

Encode a File

Have some fun encoding text files. In this example, we apply function y, which transforms input lines by replacing characters listed in the first set with those listed in the second set. Our example replaces each lowercase letter with the next letter of the alphabet (z wraps around to a). No filenames are specified, so sed reads and writes standard input and output. Press Control-d (end of input) when you get bored.

$ sed 'y/abcdefghijklmnopqrstuvwxyz/bcdefghijklmnopqrstuvwxyza/'
this is just a bit of fun
uijt jt kvtu b cju pg gvo
<Control-d>
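Decoding is simply the same function with the two sets swapped. Here is a small sketch that encodes the sample sentence non-interactively and then decodes it again; the variable names are only for illustration.

```shell
# Encode: each lowercase letter becomes the next letter of the alphabet.
encoded=$(echo "this is just a bit of fun" |
  sed 'y/abcdefghijklmnopqrstuvwxyz/bcdefghijklmnopqrstuvwxyza/')

# Decode: swap the two character sets to reverse the mapping.
decoded=$(echo "$encoded" |
  sed 'y/bcdefghijklmnopqrstuvwxyza/abcdefghijklmnopqrstuvwxyz/')

echo "$encoded"
echo "$decoded"
```

Both sets must contain the same number of characters, or sed will refuse to run the script.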


Learn More

Projects 77 and 78 explain regular expressions.


Tip

In older versions of sed that do not have the -i option, you can write back to the original file as follows.

$ sed 's/witch/which/' input > tmp; mv tmp input


Project 6 covers input/output redirection.
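A slightly more defensive sketch of the same idea follows. It assumes the mktemp command is available, and it uses && so that the original file is replaced only if sed succeeded. The file contents are made up for the demonstration.

```shell
workdir=$(mktemp -d)
input="$workdir/input"
echo "witch way is witch?" > "$input"

# Use a uniquely named temporary file rather than a fixed name like tmp.
tmp=$(mktemp "$workdir/tmp.XXXXXX")

# Replace the original only if sed exited successfully.
sed 's/witch/which/' "$input" > "$tmp" && mv "$tmp" "$input"

result=$(cat "$input")
echo "$result"

rm -r "$workdir"
```

Because the script has no g flag, only the first witch on the line is changed.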


Line Ranges in sed Scripts

Tip

Specify option -E should the expression be an extended regular expression; otherwise, sed will recognize only basic regular expressions.
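As a small illustration of the difference, the sketch below uses alternation, which needs extended regular expressions. The list of names is invented for the example. Without -E, the parentheses and the bar are treated as ordinary characters, so nothing matches.

```shell
names=$(printf '%s\n' Jan Sophie Biff)

# With -E, (Jan|Sophie) means "Jan or Sophie".
extended=$(printf '%s\n' "$names" | sed -n -E '/^(Jan|Sophie)$/p')

# Without -E, the same expression looks for the literal text (Jan|Sophie).
basic_count=$(printf '%s\n' "$names" | sed -n '/^(Jan|Sophie)$/p' | wc -l | tr -d ' ')

printf '%s\n' "$extended"
echo "matches without -E: $basic_count"
```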


Tip

You can specify line addresses to sed with a line number instead of a pattern. To make sed act like the head command and print the first three lines of a file, for example, we could type

$ sed -n '1,3p' index.php
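Here is a short sketch of numeric addresses in action, using a made-up five-line file. It also uses the address $, which stands for the last line of input; that address is not covered in the text above but is standard in sed.

```shell
workdir=$(mktemp -d)
printf '%s\n' one two three four five > "$workdir/lines.txt"

# Lines 1 through 3, like head -n 3.
first_three=$(sed -n '1,3p' "$workdir/lines.txt")
head_three=$(head -n 3 "$workdir/lines.txt")

# The address $ means the last line, so this acts like tail -n 1.
last_line=$(sed -n '$p' "$workdir/lines.txt")

# From line 4 to the end of the file.
from_four=$(sed -n '4,$p' "$workdir/lines.txt")

printf '%s\n' "$first_three" "$last_line" "$from_four"

rm -r "$workdir"
```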



Let's examine line ranges in more detail. The most basic line range is the empty one that matches all lines in the input file. When not empty, a line range may be a single address, which usually consists of a regular expression (such as /^$/ to select all empty lines) or plain text (such as /Jill/ to select all lines containing the text Jill).

Two addresses separated by a comma select all lines from the first line that matches the first address to the first subsequent line that matches the second address. To select just the lines in Chapter One, for example, we might specify the line range /Chapter One/,/Chapter Two/ (assuming that Chapter One starts with the text "Chapter One," and similarly for Chapter Two).
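The following sketch tries the chapter range on a tiny invented file. Notice that the line matching the second address is itself part of the range, so the "Chapter Two" heading is printed too.

```shell
workdir=$(mktemp -d)
printf '%s\n' "Preface" "Chapter One" "It began." "Chapter Two" "It went on." \
  > "$workdir/book.txt"

# Print from the first line matching "Chapter One" through the
# next line matching "Chapter Two", inclusive.
chapter_one=$(sed -n '/Chapter One/,/Chapter Two/p' "$workdir/book.txt")

printf '%s\n' "$chapter_one"

rm -r "$workdir"
```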

sed Functions

Immediately following a line range, sed expects to see an editing function to be applied to each line in the range. Here are some of the most useful sed functions:

  • s substitutes one pattern for another pattern. For example, s/witch/which/ replaces witch with which, but only the first occurrence on each line. To replace all occurrences on a line, specify the flag g (global): s/witch/which/g. The pattern to match against may be a regular expression.

  • d deletes the line.

  • p prints the line to standard out.

  • w writes the line to a file.

  • y replaces characters like the tr command, covered in Project 57. To convert a line to all capitals, we would specify the function y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/.
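The sketch below exercises three of the functions listed above on invented sample text: s with and without the g flag, w writing matched lines to a file, and y converting to capitals. The file names draft.txt and notes.txt are hypothetical.

```shell
workdir=$(mktemp -d)
printf '%s\n' "Note fix this" "Body text" "Note and this" > "$workdir/draft.txt"

# s without g changes only the first match on a line; with g, all of them.
first_only=$(echo "witch witch" | sed 's/witch/which/')
all_matches=$(echo "witch witch" | sed 's/witch/which/g')

# w takes the rest of the command as the file name, so it comes last.
# Double quotes let the shell expand $workdir inside the script.
sed -n "/^Note/w $workdir/notes.txt" "$workdir/draft.txt"
notes=$(cat "$workdir/notes.txt")

# y maps each lowercase letter to its capital.
upper=$(echo "body text" | sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')

printf '%s\n' "$first_only" "$all_matches" "$notes" "$upper"

rm -r "$workdir"
```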

Multiple sed Commands

Suppose that we have a couple of replacements to make to a text file and that we also want to delete lines that contain the author's notes. We could apply sed several times, once for each edit, but instead, let's take advantage of the fact that a sed script, like any script, can include multiple commands.

With sed, there are three methods available for writing multiple-command scripts:

  • Separate the commands with semicolons.

  • Present several commands by specifying sed option -e.

  • Write multiple commands to a script file.

The following examples have Jan drinking gin instead of vodka, make Sophie 5 years younger (she'll love me for that), and remove the author's notes. Here's the original text.

$ cat sophie.txt
Note Move this section down
I returned to planet earth when a lady sat down beside me and announced: "Hi, I'm Sophie". "Hello, I'm... (thinking through a Vodka haze) Jan".
Note Check the grammar here
She smiled and we chatted for a while. Sophie was about 30, bleached-blonde, good-looking, and just a shade overweight.


Now let's apply a three-command sed script in which we separate the commands of the script with semicolons.

$ sed 's/Vodka/Gin/g;s/30/25/g;/^[Nn]ote/d' sophie.txt
I returned to planet earth when a lady sat down beside me and announced: "Hi, I'm Sophie". "Hello, I'm... (thinking through a Gin haze) Jan".
She smiled and we chatted for a while. Sophie was about 25, bleached-blonde, good-looking, and just a shade overweight.


Alternatively, we could specify three separate commands by typing

$ sed -e 's/Vodka/Gin/g' -e 's/30/25/g' -e '/^[Nn]ote/d' sophie.txt


For the third alternative, we'll create a script file called 3edits and pass the name of that file to sed. A sed script is a regular text file with each edit command on a separate line.

$ cat 3edits
s/Vodka/Gin/g
s/30/25/g
/^[Nn]ote/d
$ sed -f 3edits sophie.txt


All three alternatives yield the same results.
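If you'd like to check that claim rather than take it on trust, here is a sketch that runs all three forms against an abridged copy of sophie.txt and compares the results. The bracket expression [Nn] matches either an uppercase or a lowercase n at the start of the line.

```shell
workdir=$(mktemp -d)

# Abridged sample text based on sophie.txt from this project.
cat > "$workdir/sophie.txt" <<'EOF'
Note Move this section down
"Hello, I'm... (thinking through a Vodka haze) Jan".
Note Check the grammar here
Sophie was about 30, bleached-blonde.
EOF

# The script file, one command per line.
printf '%s\n' 's/Vodka/Gin/g' 's/30/25/g' '/^[Nn]ote/d' > "$workdir/3edits"

out_semicolons=$(sed 's/Vodka/Gin/g;s/30/25/g;/^[Nn]ote/d' "$workdir/sophie.txt")
out_e=$(sed -e 's/Vodka/Gin/g' -e 's/30/25/g' -e '/^[Nn]ote/d' "$workdir/sophie.txt")
out_f=$(sed -f "$workdir/3edits" "$workdir/sophie.txt")

printf '%s\n' "$out_f"

rm -r "$workdir"
```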

Complex Line Ranges

In previous examples, we used a single address (in the form of a regular expression) to match the lines we wanted to edit. Tell sed to select a range of lines to edit by specifying two addresses. The first line to match the first address marks the start of the range. The first line after that to match the second address marks the end of the range.

As an example, suppose that we want to remove all of the many paragraphs within an HTML file that have been assigned the class tail. Each begins with the HTML tag <p >, and each ends with a closing tag </p>, like this abbreviated example.

<p >
<b>Six Vodkas: </b>A tale of Vodka, misunderstanding, an...
</p>


To delete all such paragraphs, we use the following command.

$ sed '/<p >/,/<\/p>/d' tails.html


Having matched and deleted the first such paragraph, sed continues to search the file for subsequent ranges and deletes all those it finds.
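The sketch below demonstrates this on a small invented HTML file that contains two such paragraphs. It assumes, as in the abbreviated example, that the opening and closing tags sit on lines of their own; sed starts looking for the end of a range on the line after the one that opened it, so a paragraph written entirely on one line would not close its own range.

```shell
workdir=$(mktemp -d)

# Hypothetical file with two matching paragraphs and other markup around them.
cat > "$workdir/tails.html" <<'EOF'
<h1>Tails</h1>
<p >
<b>Six Vodkas: </b>A tale
</p>
<hr>
<p >
<b>The Immovable Object: </b>A tale
</p>
<div>footer</div>
EOF

# Every range from a <p > line to the next </p> line is deleted.
remaining=$(sed '/<p >/,/<\/p>/d' "$workdir/tails.html")

printf '%s\n' "$remaining"

rm -r "$workdir"
```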

To print the paragraphs instead of deleting them, type

$ sed -n '/<p >/,/<\/p>/p' index.php
...
<p >
<b>The Immovable Object: </b>A tale of pain, pursed lips...
</p>


Suppose you wish to edit only in paragraphs that meet certain criteria: changing tale to tail only within paragraphs of class tail, for example. To do so, we again use a range to specify the matching criteria and apply a sed substitute function to the range.

$ sed '/<p >/,/<\/p>/s/tale/tail/' index.php
<p >
<b>The Immovable Object: </b>A tail of pain, pursed lips...
</p>
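To confirm that text outside the range is left alone, this sketch runs the same command on an invented file in which the word tale appears both inside and outside a matching paragraph. The file name sample.html is hypothetical.

```shell
workdir=$(mktemp -d)

# The first line uses a plain <p> tag, so it does not start the range.
cat > "$workdir/sample.html" <<'EOF'
<p>A tale told elsewhere</p>
<p >
<b>The Immovable Object: </b>A tale of pain
</p>
EOF

edited=$(sed '/<p >/,/<\/p>/s/tale/tail/' "$workdir/sample.html")

# Count the lines that still say tale and the lines that now say tail.
tale_lines=$(printf '%s\n' "$edited" | grep -c 'tale')
tail_lines=$(printf '%s\n' "$edited" | grep -c 'tail')

printf '%s\n' "$edited"

rm -r "$workdir"
```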


Note

In these examples, we allowed sed to write to standard output to better illustrate what's happening. Normally, we would write back to the original file by specifying option -i.


Tails

If you are wondering why tales is spelled tails, visit the home page of Jan's Web site (http://jan.1dot1.com); it's an HTML joke.





Mac OS X Unix 101 Byte-Sized Projects
ISBN: 0321374118
Year: 2003
Pages: 153
Authors: Adrian Mayo
