7.2. Metacharacters

Different metacharacters have different meanings, depending upon where they are used. In particular, regular expressions used for searching through text (matching) have one set of metacharacters, while the metacharacters used when processing replacement text (such as in a text editor) have a different set. These sets also vary somewhat per program. This section covers the metacharacters used for searching and replacing, with descriptions of the variants in the different utilities.
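
As a rough illustration of the two sets, the sketch below runs sed on made-up sample strings. It assumes any POSIX sed; the variable names are chosen here only for the example. The dot is special in the search pattern but literal in the replacement text, while the ampersand is special only in the replacement text.

```shell
# Assumes a POSIX sed; the sample strings are invented for illustration.

# In the search pattern, "." is a metacharacter: it matches any single
# character, so the pattern a.c matches both "abc" and "a.c".
search_demo=$(printf 'abc\na.c\n' | sed 's/a.c/X/')

# In the replacement text, "." is literal and "&" is the metacharacter:
# "&" stands for whatever text the search pattern matched.
replace_demo=$(printf 'abc\n' | sed 's/b/.&./')

printf '%s\n' "$search_demo" "$replace_demo"
```

Both input lines in the first command become X, and the second command turns abc into a.b.c.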

7.2.1. Search Patterns

The characters in the following table have special meaning only in search patterns.

Character   Pattern
.           Match any single character except newline. Can match newline in awk.
*           Match any number (or none) of the single character that immediately precedes it. The preceding character can also be a regular expression. For example, since . (dot) means any character, .* means "match any number of any character."
^           Match the following regular expression at the beginning of the line or string.
$           Match the preceding regular expression at the end of the line or string.
[ ]         Match any one of the enclosed characters: a hyphen (-) indicates a range of consecutive characters. A circumflex (^) as the first character in the brackets reverses the sense: it matches any one character not in the list. A hyphen or close bracket (]) as the first character is treated as a member of the list. All other metacharacters are treated as members of the list (i.e., literally).
{n,m}       Match a range of occurrences of the single character that immediately precedes it. The preceding character can also be a regular expression. {n} matches exactly n occurrences, {n,} matches at least n occurrences, and {n,m} matches any number of occurrences between n and m. n and m must be between 0 and 255, inclusive. (GNU programs allow a range of 0 to 32,767.)
\{n,m\}     Just like {n,m}, described earlier, but with backslashes in front of the braces. (Historically, different utilities used different syntaxes for the same thing.)
\           Turn off the special meaning of the following character.
\( \)       Save the subpattern enclosed between \( and \) into a special holding space. Up to nine subpatterns can be saved on a single line. The text matched by the subpatterns can be "replayed" in substitutions by the escape sequences \1 to \9.
\n          Replay the nth subpattern enclosed in \( and \) into the pattern at this point. n is a number from 1 to 9, with 1 starting on the left. See the following Examples.
\< \>       Match characters at beginning (\<) or end (\>) of a word.
+           Match one or more instances of preceding regular expression.
?           Match zero or one instances of preceding regular expression.
|           Match the regular expression specified before or after the vertical bar (alternation).
( )         Apply a match to the enclosed group of regular expressions.
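
Several of the search metacharacters above can be tried out with grep. The following is a minimal sketch, assuming a POSIX grep in which plain grep uses the backslashed forms (\{n\}, \( \)) and grep -E uses the bare forms ({n}, ?, |, ( )). The word list and variable names are invented for the example.

```shell
# Sample input (arbitrary words chosen for illustration).
input='cat
cart
caat
dog
catalog'

# ^ and $ anchor the match; a* allows zero or more "a" characters.
star=$(printf '%s\n' "$input" | grep '^ca*t$')

# [a-c] matches exactly one character in the range a through c.
bracket=$(printf '%s\n' "$input" | grep 'c[a-c]t')

# Interval expression: \{2\} in basic syntax, {2} with grep -E.
interval_bre=$(printf '%s\n' "$input" | grep 'ca\{2\}t')
interval_ere=$(printf '%s\n' "$input" | grep -E 'ca{2}t')

# ? makes the preceding "r" optional; | with ( ) selects alternatives.
optional=$(printf '%s\n' "$input" | grep -E 'car?t')
either=$(printf '%s\n' "$input" | grep -E '^(cat|dog)$')

# \(a\) saves the subpattern and \1 replays it, so this needs "aa".
backref=$(printf '%s\n' "$input" | grep '\(a\)\1')

printf 'star: %s\n' $star
printf 'bracket: %s\n' $bracket
printf 'optional: %s\n' $optional
printf 'either: %s\n' $either
printf 'intervals and backref: %s %s %s\n' "$interval_bre" "$interval_ere" "$backref"
```

With this input, the star pattern selects cat and caat, the bracket pattern selects cat and catalog, and the two interval forms and the back-reference each select only caat.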


Many Unix systems allow the use of POSIX "character classes" within the square brackets that enclose a group of characters. Each class name is written between [: and :], and the whole class must itself appear inside a bracket expression. For example, [[:alnum:]] matches a single alphanumeric character.

Class    Characters matched
alnum    Alphanumeric characters
alpha    Alphabetic characters
blank    Space or TAB
cntrl    Control characters
digit    Decimal digits
graph    Printable (visible) characters, excluding space
lower    Lowercase characters
print    Printable characters, including space
punct    Punctuation characters
space    Whitespace characters
upper    Uppercase characters
xdigit   Hexadecimal digits
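
The classes can be exercised with grep as in the sketch below. It assumes a grep that supports POSIX character classes and sets LC_ALL=C so that the classes take their ASCII meanings; the sample lines and variable names are invented for the example.

```shell
# Sample input (arbitrary lines chosen for illustration).
input='abc123
ABC
hello world
42
foo_bar'

# Lines made up entirely of one or more decimal digits.
digits_only=$(printf '%s\n' "$input" | LC_ALL=C grep '^[[:digit:]][[:digit:]]*$')

# Lines containing at least one uppercase letter.
has_upper=$(printf '%s\n' "$input" | LC_ALL=C grep '[[:upper:]]')

# Lines containing a space or TAB.
has_blank=$(printf '%s\n' "$input" | LC_ALL=C grep '[[:blank:]]')

# A leading ^ inside the brackets negates the class: lines containing
# at least one character that is not alphanumeric.
has_non_alnum=$(printf '%s\n' "$input" | LC_ALL=C grep '[^[:alnum:]]')

printf '%s\n' "$digits_only" "$has_upper" "$has_blank" "$has_non_alnum"
```

Note that the class name sits inside its own [: :] pair within the outer brackets, which is why the patterns contain doubled brackets.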


Finally, the GNU versions of the standard utilities accept additional escape sequences that act like metacharacters. (Because \b can also be interpreted as the sequence for the ASCII Backspace character, different utilities treat it differently. Check each utility's documentation.)

Sequence   Meaning
\b         Word boundary, either beginning or end of a word, as for the \< and \> metacharacters described earlier.
\B         Interword match; matches between two word-constituent characters.
\w         Matches any word-constituent character; equivalent to [[:alnum:]_].
\W         Matches any non-word-constituent character; equivalent to [^[:alnum:]_].
\`         Beginning of an Emacs buffer. Used by most other GNU utilities to mean unambiguously "beginning of string."
\'         End of an Emacs buffer. Used by most other GNU utilities to mean unambiguously "end of string."
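
The word-related sequences can be compared side by side with grep. This sketch assumes GNU grep, since these escapes are GNU extensions and other grep implementations may reject or ignore them; the sample lines and variable names are invented for the example.

```shell
# Assumes GNU grep. Sample input chosen for illustration.
input='cat
concat
cat nap
catalog'

# \b on both sides: "cat" must stand alone as a whole word.
whole_word=$(printf '%s\n' "$input" | grep '\bcat\b')

# \< anchors only the start of a word, \> only the end.
word_start=$(printf '%s\n' "$input" | grep '\<cat')
word_end=$(printf '%s\n' "$input" | grep 'cat\>')

# \B requires the opposite of a boundary: "cat" must begin inside a word.
inside_word=$(printf '%s\n' "$input" | grep '\Bcat')

# \W matches any character that is not a letter, digit, or underscore.
has_non_word=$(printf '%s\n' "$input" | grep '\W')

printf 'whole word: %s\n' "$whole_word"
printf 'word start: %s\n' "$word_start"
printf 'word end: %s\n' "$word_end"
printf 'inside word: %s\n' "$inside_word"
printf 'non-word char: %s\n' "$has_non_word"
```

Here only "cat" and "cat nap" contain cat as a whole word, while "concat" is the only line in which cat starts in the middle of a word.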


7.2.2. Replacement Patterns

The characters in the following table have special meaning only in replacement patterns.

Character   Pattern
\           Turn off the special meaning of the following character.
\n          Reuse the text matched by the nth subpattern previously saved by \( and \) as part of the replacement pattern. n is a number from 1 to 9, with 1 starting on the left.
&           Reuse the text matched by the search pattern as part of the replacement pattern.
~           Reuse the previous replacement pattern in the current replacement pattern. Must be the only character in the replacement pattern (ex and vi).
%           Reuse the previous replacement pattern in the current replacement pattern. Must be the only character in the replacement pattern (ed).
\u          Convert first character of replacement pattern to uppercase.
\U          Convert entire replacement pattern to uppercase.
\l          Convert first character of replacement pattern to lowercase.
\L          Convert entire replacement pattern to lowercase.
\e          Turn off previous \u or \l.
\E          Turn off previous \U or \L.
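
A few of these replacement metacharacters can be tried with sed, as sketched below. The first three commands should work with any POSIX sed. The last three assume GNU sed, which accepts \u, \U, and \E as an extension; that is an assumption of this example rather than something every sed provides. The sample text and variable names are invented for the example.

```shell
# Sample text chosen for illustration.
text='hello world'

# \1 and \2 reuse the saved subpatterns, here in swapped order.
swapped=$(printf '%s\n' "$text" | sed 's/\(hello\) \(world\)/\2 \1/')

# & reuses the entire text matched by the search pattern.
bracketed=$(printf '%s\n' "$text" | sed 's/world/[&]/')

# A backslash turns off the special meaning of &, giving a literal "&".
literal_amp=$(printf '%s\n' "$text" | sed 's/world/\&/')

# The remaining commands assume GNU sed (case conversion is an extension).
# \u uppercases only the next character of the replacement.
capitalized=$(printf '%s\n' "$text" | sed 's/hello/\u&/')

# \U uppercases everything that follows it in the replacement.
shouted=$(printf '%s\n' "$text" | sed 's/.*/\U&/')

# \E ends the conversion started by \U, so the second word is untouched.
partial=$(printf '%s\n' "$text" | sed 's/\(hello\) \(world\)/\U\1\E \2/')

printf '%s\n' "$swapped" "$bracketed" "$literal_amp" \
    "$capitalized" "$shouted" "$partial"
```

The ~ and % forms are not shown because they depend on the editor remembering a previous substitution within an interactive ex, vi, or ed session.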


    Unix in a Nutshell, Fourth Edition
    ISBN: 0596100299
    EAN: 2147483647
    Year: 2005
    Pages: 201
