Module 113 Regular Expressions (REs) | Illustrated Unix System V/BSD


Module 113
Regular Expressions (REs)

DESCRIPTION

Regular expressions are patterns consisting of ASCII characters. They are used by various UNIX commands to search for a string of text that matches a given pattern. Most characters used in a regular expression pattern represent themselves. Special characters provide a way to restrict or generalize the pattern you are trying to match.

The nawk, ed, egrep, ex, expr, grep, pg, sed, ksh, and vi commands all use some form of regular expressions to perform these search and match functions. Unfortunately, each command supports its own set of regular expressions, so there is no single standard set to remember. Most commands share a common base of regular expressions and differ only in the additional ones they support.

This module describes all possible regular expressions in full. Examples are provided for each pattern. The module for each command that uses regular expressions contains a brief listing of the regular expressions it supports.

REGULAR EXPRESSIONS

The following table contains each regular expression and the task that it performs when used inside a pattern. Regular expressions are often referred to as REs in UNIX terminology. Thus we use the RE notation for uniformity and brevity.

Metacharacters

Metacharacters are characters that have a special meaning inside a regular expression pattern. They are often referred to as special or magic characters. The following is a brief listing of these characters.

\	Escapes the meaning of a metacharacter.
^	Matches the beginning of the line.
$	Matches the end of the line.
.	Matches any single character.
[ ]	A character class. Matches any one character in the class.
|	Alternation of regular expressions. Matches either one or the other of the regular expressions provided.
( )	Concatenation operation. Normally the parentheses are omitted.
*	Matches zero or more of the preceding regular expression.
+	Matches one or more of the preceding regular expression.
?	Matches zero or one of the preceding regular expression.

To have a metacharacter interpreted as a normal character, precede it with a backslash (\). Multiple metacharacters may be combined to define variable expressions consisting of only metacharacters.
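
The following sketch shows a few of these metacharacters at work. It assumes a POSIX grep that accepts the -E option (the modern equivalent of egrep); the sample words and the variable names are made up for illustration only.

```shell
# Sample input: one candidate string per line (illustrative data).
input='cat
cot
ct
caat
c.t
dog'

# . matches any single character, so c.t selects cat, cot, and c.t.
any_char=$(printf '%s\n' "$input" | grep -E '^c.t$')

# \. is an escaped period and matches only a literal period.
literal_dot=$(printf '%s\n' "$input" | grep -E '^c\.t$')

# ? allows zero or one "a"; + requires one or more.
zero_or_one=$(printf '%s\n' "$input" | grep -E '^ca?t$')
one_or_more=$(printf '%s\n' "$input" | grep -E '^ca+t$')

# | matches either alternative; ^ and $ anchor the match to the whole line.
either=$(printf '%s\n' "$input" | grep -E '^(cat|dog)$')

printf '%s\n' "$any_char" "$literal_dot" "$zero_or_one" "$one_or_more" "$either"
```

Without the ^ and $ anchors, each pattern would also select lines that merely contain a matching substring.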

BASIC REGULAR EXPRESSIONS

Basic regular expressions make up the building blocks of all regular expressions.

Simple Regular Expressions

Simple regular expressions are plain text that does not contain any metacharacters. Some examples follow.

NOTE:
The UNIX operating system and its utilities are case sensitive. Each ASCII character is recognized by its unique code. Thus when you search for strings of text you must be aware of the uppercase versus lowercase mismatch. Regular expressions provide a means of circumventing the problems associated with case-sensitive searches. There are also commands that can be used to translate text to uppercase or lowercase only.

John	Matches the word John. It does not match john or JOHN.
Go home now	Matches the string Go home now.
Not again	Matches the string Not again.
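
As a rough illustration of the case sensitivity described in the note, the sketch below runs grep against three spellings of the same name. It assumes a POSIX grep and tr; the -i option and the tr command are not described in this module and are shown only as common ways around a case mismatch. The variable names are illustrative.

```shell
# Three spellings of the same name, one per line (illustrative data).
names='John
john
JOHN'

# The simple RE John matches only the exact capitalization.
exact=$(printf '%s\n' "$names" | grep 'John')

# A bracket list accepts either case for the first letter.
either_case=$(printf '%s\n' "$names" | grep '[Jj]ohn')

# grep -i ignores case altogether.
any_case=$(printf '%s\n' "$names" | grep -i 'john')

# tr translates the text to lowercase before any search takes place.
lowered=$(printf '%s\n' "$names" | tr '[:upper:]' '[:lower:]')

printf '%s\n' "$exact" "$either_case" "$any_case" "$lowered"
```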

Single-character Regular Expressions

Single-character regular expressions are used to match themselves or various combinations of strings. They allow you to generalize the string you want to match. In some situations you may consider them as a way to restrict the matching of unwanted strings.

Character	Description


c	A single ordinary character that matches itself. For example:

 ice cream

	matches the string ice cream. Each character in the string has to match.
\c	A single special character that has been escaped with a backslash. It no longer has a special meaning and thus matches itself. Special characters are . (period), * (asterisk), [ (left bracket), \ (backslash), ^ (caret), $ (dollar sign), and ~ (tilde). For example:

 Tim\.

	matches the string Tim followed by a literal period.
.	Matches any single character except a newline. For example:

t.m

	matches such strings as tammie, timmy, tom, tommy, and tim, because each contains a t, any one character, and then an m.
[ list ]	Matches any one of the ASCII characters listed within the brackets. The four special characters ., *, [, and \ represent themselves if used within the brackets. For example:

 s[ea]t

	matches sat and set but does not match seat, because the brackets match exactly one character.
[ c1 - c2 ]	Matches any one of the ASCII characters in the range defined within the brackets. The - (hyphen) may be used to indicate a range of ASCII characters. This is part of the list feature described for the previous notation. For instance:

 [0-9]

	matches any single digit from 0 to 9. It is equivalent to [0123456789].
[^ list ]	Matches any one character that is NOT listed within the brackets. Ranges may be specified. For example:

 [^0-9]

	matches any single character that is not a digit.
*	Matches zero or more occurrences of the preceding regular expression. Matches the longest possible string. For example:

 se*d

matches sd, sed, and seed. The pattern

 s.*d

matches sd, sed, seed, sid, sod, so don't move ahead, and various other strings.
Remember, this regular expression matches the longest possible string. For example, if you have the following text:

 Start at the starting line when we start.

and you use a regular expression like

 s/t.*t/top/

you substitute top for the longest possible string that begins and ends with t. The match runs from the first t in Start to the final t in start, so the result is:

 Stop.
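
The single-character regular expressions in this table can be tried from the shell. The following is a minimal sketch, assuming a POSIX grep and sed; the sample data and variable names are made up for illustration, and the ^ and $ anchors are added so that each pattern must match the whole line.

```shell
# Candidate words, one per line (illustrative data).
words='sat
set
seat
sd
sed
seed'

# \. matches only a literal period, so Timmy is not selected.
escaped=$(printf '%s\n' 'Tim.' 'Timmy' | grep 'Tim\.')

# s[ea]t allows exactly one of e or a between s and t.
list_match=$(printf '%s\n' "$words" | grep '^s[ea]t$')

# se*d allows zero or more e characters between s and d.
star_match=$(printf '%s\n' "$words" | grep '^se*d$')

# [^0-9] matches any character that is not a digit; deleting every
# such character leaves only the digits.
digits_only=$(printf '%s\n' 'Room 42, floor 7' | sed 's/[^0-9]//g')

# t.*t is greedy: it runs from the first t to the last t on the line.
greedy=$(printf '%s\n' 'Start at the starting line when we start.' | sed 's/t.*t/top/')

printf '%s\n' "$escaped" "$list_match" "$star_match" "$digits_only" "$greedy"
```

The last command reproduces the substitution shown above and prints Stop. as its result.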
