Introducing Regular Expressions


ColdFusion includes support for regular expressions. If you've worked at all with Perl, you probably know all about regular expressions because they are such a central part of Perl's string handling and manipulation capabilities, and generally walk hand in hand with the Perl language itself. As a rule, they aren't nearly as important to ColdFusion coders as they are to Perl coders, but regular expressions are still incredibly useful in ColdFusion development.

This chapter introduces you to regular expressions and explains how they can be used in ColdFusion applications.

What Are Regular Expressions?

Regular expressions are a way of looking for characters within chunks of text, using special wildcards to describe exactly what you're looking for. There are a lot of different wildcards you can use, from the simple * and ? characters that you probably recognize from the DOS or Unix command line, to less common, more powerful wildcards that really only apply to regular expressions.

What Are Regular Expressions Similar To?

The analogy isn't perfect, but you can think of regular expressions as being kind of like WHERE clauses in SQL, except that regular expressions are for querying plain text rather than database tables. Instead of specifying which records you want to find with a WHERE clause, you specify which characters you want to find using regular expressions.

Actually, the analogy works better if you think of regular expressions as being specifically analogous to a SELECT query that uses the LIKE keyword to search the database based on wildcards. You remember the LIKE keyword from SQL, don't you? It lets you select records using syntax such as:

 SELECT * FROM Films WHERE Summary LIKE '%color%' 

As you probably know, the database would respond to this query with all films that contain the word color in the summary. The % characters are behaving as wildcards; you can think of each % as being shorthand for saying "any amount of text." So you are asking the database to return all records where Summary includes any amount of text, followed by the word color, followed by any amount of text. Some databases, such as Microsoft SQL Server, also let you use sets of characters as wildcards, like this:

 SELECT * FROM Films WHERE Summary LIKE '%[Pp]ress [0-9]%' 

To this second query, the database would respond with all films where the summary contains the phrase Press 1 or Press 2 (or Press 3, and so on), using either a lowercase or uppercase P.
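For comparison, here is roughly how the same search looks as a regular expression. Treat this as a sketch only: it uses Python's re module rather than ColdFusion (an assumption made purely so the example can be run on its own), and the film summaries are made up for illustration.

```python
import re

# Hypothetical film summaries, invented for this example.
summaries = [
    "Press 1 to see the trailer",
    "The press 7 gang rides again",
    "A story about the printing press",
]

# [Pp] matches either case of the letter; [0-9] matches any single digit.
pattern = re.compile(r"[Pp]ress [0-9]")

# search() looks anywhere in the string, so nothing like the
# leading and trailing % wildcards is needed.
matches = [s for s in summaries if pattern.search(s)]
print(matches)
```

The character sets carry over unchanged from the LIKE version; only the surrounding % wildcards disappear.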

Even if you're not familiar with these SQL wildcards, you can see the basic idea. The various wildcard characters are used to describe what you're looking for. Regular expressions are really no different conceptually, except that there are lots of wildcards instead of only a few.

NOTE

Regular expression purists may shudder at the way I'm using the term "wildcard" here. Bear with me. We'll get to the nitty-gritty later.


At the risk of belaboring this introduction, and as I hinted earlier, you can also think of regular expressions as similar to the * and ? wildcards that you use on the command line to find files. Again, as you probably know, MS-DOS lets you use commands like this:

 c:\>dir P*.txt 

This command finds all files in the current directory that start with P and that have a .txt extension. The * wildcard does the same thing here as the % wildcard does in SQL: It stands in for the idea of any number of characters.
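One caution before carrying the analogy too far: in a regular expression, * on its own does not mean "any number of characters." It means "zero or more of whatever comes just before it," so the usual equivalent of the command-line * is a period (any character) followed by *. The sketch below shows a rough regular expression counterpart to P*.txt. It uses Python's re module as a stand-in for ColdFusion, and the file names are hypothetical.

```python
import re

# Hypothetical directory listing, invented for this example.
filenames = ["Plan.txt", "press.txt", "Photo.jpg", "Report.txt"]

# P       a literal P
# .*      any character, repeated zero or more times (the DOS * equivalent)
# \.txt   a literal period followed by txt
# IGNORECASE mimics the case-insensitive matching of the DOS command line.
pattern = re.compile(r"P.*\.txt", re.IGNORECASE)

# fullmatch() requires the whole name to fit the pattern.
matches = [name for name in filenames if pattern.fullmatch(name)]
print(matches)
```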

So, you're already familiar with a couple of regular expression-like ways of using wildcards to find information. Now you just need to learn the specific wildcards you can use with regular expressions, and how to use them in your ColdFusion applications. That's what the rest of this chapter is all about.

What Are Regular Expressions Used For?

Within the context of ColdFusion applications, regular expressions are generally used for these purposes:

  • Pattern searching. Regular expressions can be used as a kind of search utility that finds one or more exact occurrences of a pattern. By pattern, I mean a word, number, entire phrase, or any combination of characters, both printable and not. A match is successful when one or more occurrences of the pattern exist. You might use pattern searching to find all telephone numbers in a given paragraph of text, or all hyperlinks in a chunk of HTML.

  • Pattern testing. Testing a pattern is a form of data validation, and an excellent one at that. The regular expression in this context is the rule, or set of rules, that your data must conform to in order to pass the test. You might use pattern testing to validate a user's form entries.

  • Pattern removal. Pattern removal ensures data integrity by allowing you to search and remove unwanted or hazardous patterns within a block of text. Any string that causes complications within your application is hazardous. You might use pattern removal to remove all curse words, email addresses, or telephone numbers from a chunk of text, leaving the rest of the text alone.

  • Pattern replacement. Functioning as a search-and-replace mechanism, pattern replacement allows you to find one or more occurrences of a pattern within a block of text and then replace it with a new pattern, parts of the original pattern, or a mixture of both. You might use pattern replacement to surround all email addresses in a block of text with a mailto: hyperlink so the user can click the address to send a message.

You'll see regular expressions being used for each of these purposes in this chapter's example listings.
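To make the four purposes concrete in the meantime, here is a minimal sketch that applies one pattern in all four ways. It is written with Python's re module rather than ColdFusion's own functions (an assumption made only so the example is self-contained), and the phone number pattern is deliberately simplistic and hypothetical.

```python
import re

# Simplified, hypothetical pattern: three digits, a hyphen, four digits.
PHONE = r"\d{3}-\d{4}"

text = "Call 555-1234 or 555-9876 before noon."

# Pattern searching: find every occurrence in the text.
found = re.findall(PHONE, text)

# Pattern testing: does an entire value conform to the rule?
good_entry = re.fullmatch(PHONE, "555-1234") is not None
bad_entry = re.fullmatch(PHONE, "555-12345") is not None

# Pattern removal: delete every occurrence, leaving the rest alone.
removed = re.sub(PHONE, "", text)

# Pattern replacement: build new text around the original match (\g<0>).
replaced = re.sub(PHONE, r"[phone: \g<0>]", text)

print(found, good_entry, bad_entry)
print(removed)
print(replaced)
```

Notice that removal is really just replacement with an empty string, which is why the two tend to share the same function in most regular expression tools.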

What Do Regular Expressions Look Like?

Just so you can get a quick sense of what they look like, I'll show you some regular expressions now. Unless you've used regular expressions before, don't expect to understand these examples at this point. I'm showing them to you now just so you'll get an idea of how powerful the various wildcards are.

This regular expression matches the abbreviation CFML (each letter can be in upper- or lowercase, and each letter may or may not have a period after it):

 [Cc]\.?[Ff]\.?[Mm]\.?[Ll]\.? 

This regular expression matches any HTML tag (or, for that matter, a CFML, XML, or any other type of angle-bracketed tag):

 <[^>]*> 

This regular expression is one way of matching an email address:

 ([\w._]+)\@([\w_]+(\.[\w_]+)+) 
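If you'd like to see these three patterns do something, the sketch below runs each one against a short sample string. Python's re module is used here as a stand-in for ColdFusion (an assumption made so the example can be run as-is), and the sample strings and the address ben@example.com are invented.

```python
import re

# The three patterns shown above, copied verbatim.
cfml = re.compile(r"[Cc]\.?[Ff]\.?[Mm]\.?[Ll]\.?")
tag = re.compile(r"<[^>]*>")
email = re.compile(r"([\w._]+)\@([\w_]+(\.[\w_]+)+)")

# Every spelling of the abbreviation is found, with or without periods.
cfml_hits = cfml.findall("CFML, c.f.m.l. and Cfml all match.")

# Each angle-bracketed tag is found; the text between tags is skipped.
tag_hits = tag.findall("<p>Hello <b>world</b></p>")

# The parentheses capture the parts before and after the @ sign.
m = email.search("Write to ben@example.com today")
user, domain = m.group(1), m.group(2)

print(cfml_hits)
print(tag_hits)
print(user, domain)
```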

Do Regular Expressions Differ Among Languages?

Yes. There are many tools and programming languages that provide regular expression functionality of one sort or another. Perl, JavaScript, grep/egrep, POSIX, and ColdFusion are just a few; there are plenty more. Over the years, some of the tools and languages have added their own extensions or improvements. Most of the basic regular expression wildcards will work in any of these tools, but other wildcards might work in Tool A but not in Tool B, or might have a slightly different meaning in Tool C. People often refer to the various levels of compatibility as "flavors" (the Perl flavor, the POSIX flavor, and so on).

You can think of these tweaks and flavors as resembling the various changes and improvements that have been made over the years to SQL, to the point where queries written for Access, Oracle, and Sybase databases might look considerably different (especially if the queries are doing something complicated). But that doesn't change the fact that they are all based on the same basic syntax; if you've learned one, you've basically learned them all.

NOTE

The term "regular expression" gets a bit tedious to read over and over again, so I will often use the term RegEx instead. It's a customary way to shorten the term.


Historically, ColdFusion was no exception: its regular expression support was a flavor of its own. Many developers complained about how ColdFusion seemed to borrow bits and pieces of other RegEx flavors, without ever being truly compatible with any of them. This situation has since been rectified. The regular expression support in CFML is now based on the RegEx flavor that developers wanted most: the Perl 5 flavor. There are only a few differences, and those differences flow naturally from the inherent differences between the languages themselves.



Advanced Macromedia ColdFusion MX 7 Application Development
ISBN: 0321292693
Year: 2006
Pages: 240
Authors: Ben Forta, et al
