3.1. Regular Expression Syntax

The typical regular expression is delimited by a pair of slashes; the %r form can also be used. Table 3.1, "Basic Regular Expressions," gives some simple examples:

Table 3.1. Basic Regular Expressions

Regex         Explanation
/Ruby/        Match the literal text Ruby
/[Rr]uby/     Match Ruby or ruby
/^abc/        Match an abc at the beginning of a line
%r(xyz$)      Match an xyz at the end of a line
%r|[0-9]*|    Match any sequence of (zero or more) digits
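The following sketch runs the patterns from Table 3.1 against a few strings to show what each literal form does. It assumes a reasonably recent Ruby; the sample strings are invented for illustration. The =~ operator is used because it returns the offset of the first match, or nil when there is none.

```ruby
# Slash-delimited and %r-delimited literals produce equivalent Regexp objects.
slash   = /[Rr]uby/
percent = %r{[Rr]uby}
p slash == percent                            # true

# %r avoids escaping when the pattern itself contains slashes.
p "/usr/local/bin/ruby" =~ %r{/usr/local/bin} # 0

# =~ returns the index of the first match, or nil if there is none.
p "I like ruby" =~ /[Rr]uby/                  # 7
p "I like Python" =~ /[Rr]uby/                # nil

# ^ and $ anchor at line boundaries, not only at the ends of the string.
text = "123\nabc def\nuvw xyz"
p text =~ /^abc/                              # 4
p text =~ %r(xyz$)                            # 16

# * permits zero repetitions, so this pattern succeeds even before any digit:
# it finds an empty match at offset 0.
p "abc42"[%r|[0-9]*|]                         # ""
p "abc42".scan(%r|[0-9]+|)                    # ["42"]
```

The last two lines are worth remembering: a pattern built only from * can match the empty string, so it "succeeds" on almost any input.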


It is also possible to place a modifier, consisting of a single letter, immediately after a regex. Table 3.2 shows the most common modifiers:

Table 3.2. Regular Expression Modifiers

Modifier   Meaning
i          Ignore case in regex
o          Perform expression substitution (#{...}) only once
m          Multiline mode (dot matches newline)
x          Extended regex (allow whitespace, comments)


Others will be covered in Chapter 4.
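A minimal sketch of the four modifiers, using invented input strings. Note that Ruby's m differs from Perl's: in Ruby, ^ and $ already match at line boundaries, so m changes only the behavior of the dot.

```ruby
# i: ignore case.
p "RUBY" =~ /ruby/      # nil
p "RUBY" =~ /ruby/i     # 0

# m: in Ruby, this lets . match a newline as well.
p "a\nb" =~ /a.b/       # nil
p "a\nb" =~ /a.b/m      # 0

# x: literal whitespace in the pattern is ignored and # starts a comment.
date = /
  (\d{4})   # year
  -
  (\d{2})   # month
  -
  (\d{2})   # day
/x
md = date.match("Due: 2024-03-15")
p md.captures           # ["2024", "03", "15"]

# o: the #{...} interpolation runs only the first time this literal is
# evaluated; later evaluations reuse the same compiled Regexp.
sources = %w[cat dog].map { |word| /#{word}/o.source }
p sources               # ["cat", "cat"]
```

The o example shows why the modifier is rarely what you want inside a loop: the second word is silently ignored.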

To complete our introduction to regular expressions, Table 3.3 lists the most common symbols and notations available:

Table 3.3. Common Notations Used in Regular Expressions

Notation          Meaning
^                 Beginning of a line or string
$                 End of a line or string
.                 Any character except newline (unless in multiline mode)
\w                Word character (digit, letter, or underscore)
\W                Non-word character
\s                Whitespace character (space, tab, newline, and so on)
\S                Non-whitespace character
\d                Digit (same as [0-9])
\D                Non-digit
\A                Beginning of a string
\Z                End of a string, or before a newline at the end
\z                End of a string
\b                Word boundary (outside [ ] only)
\B                Non-word boundary
\b                Backspace (inside [ ] only)
[]                Any single character in the set
*                 0 or more of previous subexpression
*?                0 or more of previous subexpression (non-greedy)
+                 1 or more of previous subexpression
+?                1 or more of previous subexpression (non-greedy)
{m,n}             m to n instances of previous subexpression
{m,n}?            m to n instances of previous subexpression (non-greedy)
?                 0 or 1 of previous subexpression
|                 Alternatives
(?= )             Positive lookahead
(?! )             Negative lookahead
()                Grouping of subexpressions
(?> )             Atomic (independent) subexpression; no backtracking into it
(?: )             Non-capturing group
(?imx-imx)        Turn options on/off henceforth
(?imx-imx:expr)   Turn options on/off for this expression
(?# )             Comment
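The sketch below exercises a selection of the notations in Table 3.3: anchors, the two meanings of \b, greedy and non-greedy repetition, lookahead, non-capturing and atomic groups, inline options, and comments. The input strings are made up for illustration, and each trailing comment gives the value that p prints.

```ruby
# ^ and $ work at line boundaries; \A, \Z, and \z work only at string boundaries.
s = "first line\nsecond line\n"
p s =~ /^second/      # 11
p s =~ /\Asecond/     # nil: "second" is not at the very start
p s =~ /line$/        # 6: $ matches before any newline
p s =~ /line\Z/       # 18: \Z allows one trailing newline
p s =~ /line\z/       # nil: \z requires the absolute end

# \b is a word boundary outside brackets and a backspace inside them.
p "Rubyist Ruby" =~ /\bRuby\b/   # 8
p "a\bb" =~ /[\b]/               # 1

# Greedy versus non-greedy repetition.
html = "<b>bold</b> and <i>italic</i>"
p html[/<.+>/]        # "<b>bold</b> and <i>italic</i>"
p html[/<.+?>/]       # "<b>"

# Lookahead tests what follows without consuming it.
prices = "100USD 200EUR"
p prices.scan(/\d+(?=EUR)/)        # ["200"]
p "foobar foobaz" =~ /foo(?!bar)/  # 7

# (?: ) groups without capturing, so only two captures appear here.
caps = "2024-03-15".match(/(?:\d+)-(\d+)-(\d+)/).captures
p caps                # ["03", "15"]

# (?> ) never gives back what it matched, so the trailing "a" cannot match.
p "aaa" =~ /a+a/      # 0
p "aaa" =~ /(?>a+)a/  # nil

# Inline options and comments.
p "RUBY rocks" =~ /(?i:ruby) rocks/   # 0
p "RUBY ROCKS" =~ /(?i:ruby) rocks/   # nil: i applies only inside the group
p "abc" =~ /a(?# ignored)bc/          # 0
```

The atomic-group pair is the subtle one: /a+a/ succeeds because a+ backtracks by one character, while (?>a+) keeps all three characters and refuses to backtrack.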


A solid understanding of regular expressions greatly benefits the modern programmer. A complete discussion of the topic is far beyond the scope of this book; if you are interested, see the definitive work, Mastering Regular Expressions by Jeffrey Friedl.

For additions and extensions to the material in this section, refer to section 3.13, "Ruby and Oniguruma."




The Ruby Way, Second Edition: Solutions and Techniques in Ruby Programming (2nd Edition)
ISBN: 0672328844
Year: 2004
Pages: 269
Authors: Hal Fulton
