sed | UNIX: The Complete Reference, Second Edition (Complete Reference Series)

sed works in basically the same way as awk: it takes a set of patterns and simple editing commands and applies them to an input stream. It has a different syntax (which will seem very familiar if you are a vi user, but will probably be rather difficult if you are not), and slightly different capabilities. In particular, it lacks the field processing and control flow features of awk. Most programs which can be written in sed can also be written in awk. However, sed can be very useful for performing a simple set of editing commands on input before sending it on to awk.

How sed Works

To edit a file with sed, you give it a list of editing commands and the filename. For example, the following command deletes the first line of the file data and prints the result to standard output:

 $ sed '1d' data

Note that editing commands are enclosed in single quotation marks. This is because the editing command list is treated as an argument to the sed command, and it may contain spaces, newlines, or other special characters. The name of the file to edit can be specified as the second argument on the command line. If you do not give it a filename, sed reads and edits standard input.

The sed command reads its input one line at a time. If a line is selected by a command in the command list, sed performs the appropriate editing operation and prints the resulting line. If a line is not selected, it is copied to standard output. Editing commands and line addresses are very similar to the commands and addresses used with ed, which is discussed in Appendix A. Experienced vi users will also recognize many of the commands.
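You can see this behavior without creating a file by piping a few lines of text into sed. The following is a minimal sketch. The three sample lines are made up, and printf is used only to generate the input.

```shell
# Sketch: with no filename, sed edits standard input one line at a time.
# Line 2 is selected by the address 2 and deleted. Lines 1 and 3 are not
# selected, so sed copies them to standard output unchanged.
result=$(printf 'alpha\nbeta\ngamma\n' | sed '2d')
printf '%s\n' "$result"
```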

sed does not modify the original file. To save the changes sed makes, use file redirection, as in

 $ sed '1d' data > newdata
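The following sketch confirms that the original file is left alone. It assumes a POSIX shell with mktemp available. The file names data and newdata follow the text, and their contents are invented for the example.

```shell
# Sketch: sed writes its result to standard output and leaves the input file alone.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'header\nrow 1\nrow 2\n' > data

sed '1d' data > newdata   # save the edited copy under a new name

cat data      # still starts with the header line
cat newdata   # the header line is gone
```

Do not redirect the output back to the input file (sed '1d' data > data). The shell truncates data before sed starts reading it, so the file ends up empty.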

Selecting Lines

The sed editing commands generally consist of an address and an operation. The address tells sed which lines to act on. There are two ways to specify addresses: by line numbers and by regular expression patterns.

As the previous example showed, you can specify a line with a single number. You can also specify a range of lines by listing the first and last lines in the range, separated by a comma. The following command deletes the first four lines of data:

 $ sed '1,4d' data
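Here is a small sketch of a line-number range. It uses generated input rather than a real data file, and the numbered lines are illustrative only.

```shell
# Sketch: the range address 1,4 selects lines 1 through 4, inclusive.
# printf reuses its format for each argument, producing six numbered lines.
result=$(printf 'line %s\n' 1 2 3 4 5 6 | sed '1,4d')
printf '%s\n' "$result"
```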

Regular expression patterns select all lines that contain a string matching the pattern. The following command removes all lines containing “New York” from the file states:

 $ sed '/New York/d' states

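sed uses regular expressions much like awk's, with one important difference. By default, sed uses basic regular expressions, in which the characters |, +, ?, and unescaped parentheses are ordinary characters rather than operators. Many versions of sed, including GNU and BSD sed, accept a -E option that makes sed use extended regular expressions like awk's. You can also specify a range using two regular expressions separated by a comma, just as in awk.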
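Here is a minimal sketch of a range defined by two patterns. The state names are made-up sample input. The range includes both the line that starts it and the line that ends it.

```shell
# Sketch: deletion starts at the first line matching /Texas/ and ends with
# the next line matching /Ohio/. Lines outside the range are copied as-is.
result=$(printf '%s\n' 'New York' 'Texas' 'Utah' 'Ohio' 'Maine' |
    sed '/Texas/,/Ohio/d')
printf '%s\n' "$result"
```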

Editing Commands

In addition to the delete command (d), sed supports the a (append), i (insert), and c (change) commands for adding or replacing text. The r and w commands read from and write to a file.
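The following sketch uses i, c, a, r, and w together on a three-line sample file. It uses the traditional form of a, i, and c, in which the command letter is followed by a backslash, a newline, and then the text. Some versions of sed also accept a shorter one-line form, but the traditional form should work with both GNU and BSD sed. The file names input, extra, and matches are hypothetical.

```shell
# Sketch of i, c, a, r, and w. All file names and contents are made up.
dir=$(mktemp -d)
cd "$dir" || exit 1
printf 'one\ntwo\nthree\n' > input
printf 'from another file\n' > extra

# 1i inserts before line 1, 2c replaces line 2, 3a appends text after line 3,
# 3r appends the contents of extra after line 3, and w copies lines matching
# /one/ into the file matches.
result=$(sed '1i\
inserted before line 1
2c\
line 2 replaced
3a\
appended after line 3
3r extra
/one/w matches' input)
printf '%s\n' "$result"
cat matches   # contains the single line "one"
```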

By default, sed prints all lines to standard output. If you invoke sed with the -n option (no copy), only those lines that you explicitly print are sent to standard output. For example, the following prints lines 10 through 20 only:

 $ sed -n '10,20p' file
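To see what -n changes, you can compare the same p command with and without it. This is a sketch on three lines of made-up input.

```shell
# Without -n, every line is printed, and p prints line 2 a second time.
default_out=$(printf 'a\nb\nc\n' | sed '2p')
# With -n, only the line explicitly printed with p appears.
quiet_out=$(printf 'a\nb\nc\n' | sed -n '2p')
printf '%s\n' "$default_out" "---" "$quiet_out"
```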

Replacing Strings

The substitute (s) command works like the similar vi command. This example switches all occurrences of 2006 to 2007 in the file scheduling:

 $ sed 's/2006/2007/g' scheduling

Because there is no line address or pattern at the beginning, this command will be applied to every line in the input file. As in vi, the g at the end of the substitution stands for “global”. It causes the substitution to be applied to every part of the line that matches the pattern, not just the first.
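One way to see the effect of g is to run the same substitution with and without it on a line that contains two matches. The sample line is invented for this sketch.

```shell
line='2006 budget, revised 2006'
# Without g, only the first match on the line is replaced.
without_g=$(printf '%s\n' "$line" | sed 's/2006/2007/')
# With g, every match on the line is replaced.
with_g=$(printf '%s\n' "$line" | sed 's/2006/2007/g')
printf '%s\n' "$without_g" "$with_g"
```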

You can also use an explicit search pattern to find all the lines containing the string “2006” before applying the substitution:

 $ sed '/2006/s//2007/g' scheduling

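This command tells sed to operate on all lines containing the pattern 2006, and in each of those lines to change all instances of the target (2006) to 2007. The empty pattern in s//2007/ tells sed to reuse the most recent regular expression, which here is 2006.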
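The following sketch runs this command on three made-up lines. The second line contains no 2006, so it is not selected and passes through unchanged.

```shell
# /2006/ selects lines; the empty pattern in s//2007/g reuses /2006/.
result=$(printf '%s\n' 'meeting 2006' 'no year here' 'review 2006, 2006' |
    sed '/2006/s//2007/g')
printf '%s\n' "$result"
```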

Substitution is a very common use of sed. If you are not familiar with this syntax for substitutions, you might want to review vi substitutions in Chapter 5.

Using sed and awk Together

It is often convenient to use sed and awk together to solve a problem. Even though awk has a full set of commands for manipulating text, using sed to filter the input to awk can simplify and clarify things. You can use sed for its simple text editing capabilities, and awk for its ability to deal with fields and records, as well as for its rich programming capabilities.

The following example shows how you can use sed and awk together to extract a list of songs from a music database. Here is part of the entry for one song from an XML music data file:

 $ cat mysongs
 <key>Name</key><string>Airportman</string>
 <key>Artist</key><string>R.E.M.</string>
 <key>Album</key><string>Up</string>
 <key>Genre</key><string>Rock</string>
 <key>Kind</key><string>MPEG audio file</string>
 <key>Size</key><integer>4091947</integer>
 <key>Total Time</key><integer>255608</integer>
 <key>Track Number</key><integer>1</integer>
 <key>Track Count</key><integer>14</integer>
 <key>Year</key><integer>1998</integer>

The data is stored as a simple keyword/value pair, with XML markup tags.

In its current form, the information is hard to read. Also, there are some fields that you don’t really need. You can use sed to turn this file full of data into a useful table. Specifically, you can eliminate the XML tags and create a table showing the song title, artist, album, and track number.

Processing the File with sed

The first step is to use sed to remove the XML tags, and to insert a : after the keyword in each line. Inserting the : isn’t too hard. The substitution

 s/<\/key>/: /

will replace the “</key>” entries with “: ”. Removing the XML tags, however, is a bit more difficult. The substitution

 s/<.*>//g

will actually delete everything from the first < to the last >. That’s because * is greedy, meaning it will try to match the longest string possible, which in this case is essentially the whole line. The substitution

 s/<[^>]*>//g

will do the trick, although it’s more complicated. The pattern “<[^>]*>” matches a <, then any string of characters that does not include >, and finally a > sign. So the substitution will delete each XML tag separately, such as “<key>”, “<integer>”, and “</integer>”.
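The difference between the two patterns is easy to check on a single line of the music data. In this sketch, the greedy pattern removes the entire line, while the pattern with [^>] removes only the tags.

```shell
line='<key>Album</key><string>Up</string>'
# Greedy: <.*> spans from the first < to the last >, i.e. the whole line here.
greedy=$(printf '%s\n' "$line" | sed 's/<.*>//g')
# [^>]* cannot run past a >, so each tag is matched and removed on its own.
careful=$(printf '%s\n' "$line" | sed 's/<[^>]*>//g')
printf '[%s]\n[%s]\n' "$greedy" "$careful"
```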

You can combine the two substitutions on one line by separating them with a semicolon (;) and run a single sed command:

 $ sed 's/<\/key>/: /; s/<[^>]*>//g' mysongs
 Name: Airportman
 Artist: R.E.M.
 Album: Up
 Genre: Rock
 Kind: MPEG audio file
 Size: 4091947
 Total Time: 255608
 Track Number: 1
 Track Count: 14
 Year: 1998

This is an improvement. It’s readable, but each entry still occupies a block of separate lines, and it still includes extra information.

You can remove the extra lines with statements like

 /Kind: / d

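or remove them all at once with a single pattern that uses alternation (|). Alternation is part of extended regular expressions, so this pattern requires running sed with an option such as -E (available in GNU and BSD sed):

 /(Kind|Genre|Size|Total Time|Track Count|Year): / d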

but the output is still on multiple lines:

 $ sed -E 's/<\/key>/: /; s/<[^>]*>//g
 > /(Kind|Genre|Size|Total Time|Track Count|Year): / d' mysongs
 Name: Airportman
 Artist: R.E.M.
 Album: Up
 Track Number: 1
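Here is a self-contained sketch of this step. It recreates the mysongs entry shown earlier in a temporary directory, which is an assumption made so the example can run anywhere. It uses -E so that | works as alternation.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
cat > mysongs <<'EOF'
<key>Name</key><string>Airportman</string>
<key>Artist</key><string>R.E.M.</string>
<key>Album</key><string>Up</string>
<key>Genre</key><string>Rock</string>
<key>Kind</key><string>MPEG audio file</string>
<key>Size</key><integer>4091947</integer>
<key>Total Time</key><integer>255608</integer>
<key>Track Number</key><integer>1</integer>
<key>Track Count</key><integer>14</integer>
<key>Year</key><integer>1998</integer>
EOF

# Commands run in order on each line: label the key, strip tags,
# then delete the lines whose label is in the alternation list.
result=$(sed -E 's/<\/key>/: /; s/<[^>]*>//g
/(Kind|Genre|Size|Total Time|Track Count|Year): /d' mysongs)
printf '%s\n' "$result"
```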

That’s fine for a short example like this, but not ideal for a long file with many entries.

Using sed as a Filter for awk

At this point, a better solution would be to use sed to remove the field name along with the XML tags, and then pass the results to awk. You can then use awk to select only the fields that you want, and arrange them so that they are all on one line and in the right order.

The sed command to remove the “<key>…</key>” data and the other tags is

 $ sed 's/<key>.*<\/key>//; s/<[^>]*>//g' mysongs
 Airportman
 R.E.M.
 Up
 Rock
 MPEG audio file
 4091947
 255608
 1
 14
 1998

The awk command will read records in this format and use the field variables to select the fields you want and output them in the proper order. Since the input records use a newline as the field delimiter and a blank line as the record delimiter, the awk program begins with a BEGIN statement that sets the field separator (FS) to a newline and the record separator (RS) to the empty string, which tells awk to treat blank lines as record separators. It also sets the output field separator (OFS) to a tab. In each record, the song title is field 1, the artist is field 2, the album is field 3, and the track number is field 8.

The commands for the awk program are in the file makesonglist.

 $ cat makesonglist
 BEGIN {FS="\n"; RS=""; OFS="\t"}
       {print $2, $3, $1, $8}

Putting the sed command together with the awk program produces the result you want.

 $ sed 's/<key>.*<\/key>//; s/<[^>]*>//g' mysongs | awk -f makesonglist
 R.E.M.  Up      Airportman      1
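You can try the complete pipeline as a sketch with two records. The first record is the entry shown earlier. The second is a hypothetical placeholder, added only to show that a blank line separates records when RS is empty. The awk program has the same structure as makesonglist. In this ten-line sample, the track number is field 8.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# Two records separated by a blank line. The second record is made up.
cat > mysongs <<'EOF'
<key>Name</key><string>Airportman</string>
<key>Artist</key><string>R.E.M.</string>
<key>Album</key><string>Up</string>
<key>Genre</key><string>Rock</string>
<key>Kind</key><string>MPEG audio file</string>
<key>Size</key><integer>4091947</integer>
<key>Total Time</key><integer>255608</integer>
<key>Track Number</key><integer>1</integer>
<key>Track Count</key><integer>14</integer>
<key>Year</key><integer>1998</integer>

<key>Name</key><string>Second Song</string>
<key>Artist</key><string>Example Artist</string>
<key>Album</key><string>Example Album</string>
<key>Genre</key><string>Rock</string>
<key>Kind</key><string>MPEG audio file</string>
<key>Size</key><integer>1000000</integer>
<key>Total Time</key><integer>200000</integer>
<key>Track Number</key><integer>2</integer>
<key>Track Count</key><integer>14</integer>
<key>Year</key><integer>1998</integer>
EOF

# Newline-separated fields, blank-line-separated records (RS=""),
# tab-separated output: artist, album, title, track number.
cat > makesonglist <<'EOF'
BEGIN {FS="\n"; RS=""; OFS="\t"}
      {print $2, $3, $1, $8}
EOF

# sed strips the key labels and tags; awk reassembles each record on one line.
result=$(sed 's/<key>.*<\/key>//; s/<[^>]*>//g' mysongs | awk -f makesonglist)
printf '%s\n' "$result"
```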

Troubleshooting Your awk Programs

If awk finds an error in a program, it will give you a “Syntax error” message. This can be frustrating, especially to a beginner, as the syntax of awk programs can be tricky. Here are some points to check if you are getting a mysterious error message or if you are not getting the output you expect:

Make sure that there is a space between the final single quotation mark in the command line and any arguments or input filenames that follow it.
Make sure you enclosed the awk program in single quotation marks to protect it from interpretation by the shell.
Make sure you put braces around the action statement.
Do not confuse the operators == and =. Use == for comparing the value of two variables or expressions. Use = to assign a value to a variable.
Regular expressions must be enclosed in forward slashes, not backslashes.
If you are using a filename inside a program, it must be enclosed in double quotation marks. (But filenames on the command line are not enclosed in quotation marks.)
Putting each pattern/action pair on its own line makes your program easier to read. If you do combine several on one line, separate them with semicolons.
If your field separator is something other than a space, and you are sending output to a new file, specify the output field separator as well as the input field separator in order to get the expected results.
If you change the order of fields or add a new field, use a print statement as part of the action statement, or the modified fields will not appear in your output.
If an action statement takes more than one line, the opening brace must be on the same line as the pattern statement.
Remember to use a > if you want to redirect output to a file on the command line.
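The following sketch illustrates the == versus = pitfall by running the same pattern both ways on two lines of made-up data. With =, the pattern assigns "NY" to $2. The value of that assignment is a non-empty string, which awk treats as true, so every line is selected.

```shell
data='Ann NY
Bob CA'

# Correct: == compares, so only the NY line is printed.
right=$(printf '%s\n' "$data" | awk '$2 == "NY" { print $1 }')

# Mistake: = assigns "NY" to $2 on every line. The assignment's value is the
# non-empty string "NY", which awk treats as true, so every line matches.
wrong=$(printf '%s\n' "$data" | awk '$2 = "NY" { print $1 }')

printf '%s\n' "$right" "---" "$wrong"
```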