3.9. Extended Regular Expressions

Regular expressions are frequently cryptic, especially as they get longer. The x directive enables you to stretch a regex across multiple lines: spaces and newlines in the pattern are ignored, so you can use them for indentation and readability, and a # begins a comment that runs to the end of the line. This also encourages the use of comments, although (?#...) comments are possible even in simple regexes.
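As a small illustration, the following sketch writes the same pattern compactly and with the x directive. The phone-number pattern and the variable names (compact, extended, with_space, ignored) are hypothetical, chosen only for this example. It also shows the one trap of extended mode: because whitespace is ignored, a literal space has to be escaped.

```ruby
# Hypothetical pattern: a seven-digit phone number like "555-1234".
compact = /^\d{3}-\d{4}$/

# The same pattern with the x directive: whitespace and newlines are
# ignored, and # starts a comment that runs to the end of the line.
extended = /
  ^        # start of line
  \d{3}    # three digits
  -        # literal hyphen
  \d{4}    # four digits
  $        # end of line
/x

# Because whitespace is ignored under x, a literal space must be escaped.
with_space = /hello\ world/x   # matches "hello world"
ignored    = /hello world/x    # the space is dropped: same as /helloworld/

%w[555-1234 5551234].each do |s|
  puts "#{s}: compact=#{compact.match?(s)} extended=#{extended.match?(s)}"
end
```

Both versions accept "555-1234" and reject "5551234"; the extended form simply costs more lines in exchange for readability.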

For a contrived example of a moderately complex regular expression, let's suppose that we had a list of addresses like this:

addresses = [ "409 W Jackson Ave",           "No. 27 Grande Place",
              "16000 Pennsylvania Avenue",   "2367 St. George St.",
              "22 Rue Morgue",               "33 Rue St. Denis",
              "44 Rue Zeeday",               "55 Santa Monica Blvd.",
              "123 Main St., Apt. 234",      "123 Main St., #234",
              "345 Euneva Avenue, Suite 23", "678 Euneva Ave, Suite A"]


In these examples, each address consists of three parts: a number, a street name, and an optional suite or apartment number. I'm making the arbitrary rule that the number may be preceded by an optional No., whose period may be omitted. Likewise, let's arbitrarily say that the street name may consist of ordinary word characters but may also include the apostrophe, hyphen, and period. Finally, if the optional suite number is used, it must be preceded by a comma and one of the tokens Apt., Suite, or # (number sign), and it must be either a string of digits or a single capital letter.

Here is the regular expression I created for this. Notice that I've commented it heavily (maybe even too heavily):

regex = / ^                  # Beginning of string
          ((No\.?)\s+)?      # Optional: No[.]
          \d+ \s+            # Digits and spacing
          ((\w|[.'-])+       # Street name... may be
           \s*               #   multiple words.
          )+
          (,\s*              # Optional: Comma etc.
           (Apt\.?|Suite|\#) # Apt[.], Suite, #
           \s*               # Optional spacing (so "#234" works)
           (\d+|[A-Z])       # Numbers or single letter
          )?
          $                  # End of string
        /x
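One way to check a pattern like this is to run it over the sample data. The sketch below does that under one stated assumption: its copy of the regex uses \s* rather than \s+ after the Apt./Suite/# token, because with \s+ the sample "123 Main St., #234" would not match (nothing separates the # from the digits). The two rejected strings are hypothetical inputs added for contrast.

```ruby
addresses = [ "409 W Jackson Ave",           "No. 27 Grande Place",
              "16000 Pennsylvania Avenue",   "2367 St. George St.",
              "22 Rue Morgue",               "33 Rue St. Denis",
              "44 Rue Zeeday",               "55 Santa Monica Blvd.",
              "123 Main St., Apt. 234",      "123 Main St., #234",
              "345 Euneva Avenue, Suite 23", "678 Euneva Ave, Suite A"]

# Copy of the address regex. Assumption: the spacing after the
# Apt./Suite/# token is \s* rather than \s+, so "#234" is accepted.
regex = / ^                  # Beginning of string
          ((No\.?)\s+)?      # Optional: No[.]
          \d+ \s+            # Digits and spacing
          ((\w|[.'-])+       # Street name... may be
           \s*               #   multiple words.
          )+
          (,\s*              # Optional: Comma etc.
           (Apt\.?|Suite|\#) # Apt[.], Suite, #
           \s*               # Optional spacing
           (\d+|[A-Z])       # Numbers or single letter
          )?
          $                  # End of string
        /x

# Hypothetical strings that the rules above should reject:
# no leading number, and an unrecognized suite token.
rejects = ["Main St.", "123 Main St., Floor 2"]

(addresses + rejects).each do |addr|
  status = regex.match?(addr) ? "match" : "no match"
  puts format("%-30s %s", addr, status)
end
```

Every sample address reports "match", while the two rejects report "no match". Note that ^ and $ anchor to line boundaries in Ruby; for untrusted multi-line input, \A and \z would be the stricter choice.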


The point here is clear: when your regex grows past a certain threshold of complexity (where that threshold lies is a matter of opinion), make it an extended regex so that you can format it and add comments.

You may have noticed that I used comments in the style of ordinary Ruby comments (# ...) instead of inline regex comments ((?#...)). Why did I do that? Simply because I could: in an extended regex, a # starts a comment that runs to the end of the line. The (?#...) form is needed only when the comment must end somewhere other than the end of the line (for example, when more "meat" of the regex follows the comment on the same line).
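To contrast the two comment styles, here is a sketch using a hypothetical ZIP-code pattern (the names zip and zip_x are illustrative only). The (?#...) form closes with a parenthesis, so more of the pattern can follow on the same line, and it works without the x directive; the # form in extended mode simply runs to the end of the line. Both patterns are anchored with \A and \z (start and end of string).

```ruby
# Hypothetical pattern: a US ZIP code with an optional +4 suffix.
# (?#...) comments close with ")", so more pattern can follow on the
# same line, and they work without the x directive.
zip = /\A\d{5}(?#five digits)(-\d{4})?(?#optional +4 suffix)\z/

# In extended mode, a # comment simply runs to the end of the line.
zip_x = /
  \A \d{5}     # five digits
  (-\d{4})?    # optional +4 suffix
  \z
/x

%w[12345 12345-6789 1234 12345-678].each do |s|
  puts "#{s}: #{zip.match?(s)} #{zip_x.match?(s)}"
end
```

The two patterns behave identically; the choice between them is purely about layout.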




The Ruby Way(c) Solutions and Techniques in Ruby Programming
The Ruby Way, Second Edition: Solutions and Techniques in Ruby Programming (2nd Edition)
ISBN: 0672328844
EAN: 2147483647
Year: 2004
Pages: 269
Authors: Hal Fulton
