11.10 Using Regular Expressions

Regular expressions are descriptions of textual patterns that enable string matching. You have likely used something similar already. If you have ever entered *.cfm to view only the ColdFusion markup files in a directory, you have used a wildcard pattern, which is a simpler cousin of the regular expression. (In a true regular expression, the * character means "zero or more of the preceding item" rather than "any run of characters.") You may or may not have had occasion to use regular expressions with ColdFusion functions.

Regular expressions are a useful tool, and we will cover them here briefly because JDK 1.4 makes them easy to use; the new classes live in the java.util.regex package.

11.10.1 Regex in CF

Using a regular expression in ColdFusion boils down to calling one of two functions: REFind() and REFindNoCase(). These functions are prefixed with RE, which stands for regular expression.

REFind() performs a case-sensitive search that returns the position of the first occurrence of a regular expression in a string, beginning with the position specified. If no occurrences are found, it returns 0.

REFindNoCase() is a case-insensitive version of REFind(). Here are a few examples showing different ways to use this function:

A simple regex:

 <cfoutput>#REFindNoCase("a+c", "abcaaccdd")#</cfoutput> 

Using POSIX syntax:

 <cfoutput>#REFindNoCase("[[:alpha:]]", "aBBccDdeeE")#  </cfoutput> 
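For comparison, here is a minimal sketch of how the same two lookups might be written in Java. The class ReFindSketch and its reFindNoCase() method are hypothetical names invented for this illustration; they are not part of ColdFusion or the JDK. The sketch assumes that a 1-based position and a 0 for "not found" are the behavior you want to mimic. Note that java.util.regex spells the POSIX alphabetic class as \p{Alpha} rather than [[:alpha:]].

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class ReFindSketch {

    // Mimics REFindNoCase(): returns the 1-based position of the first
    // match, or 0 when the pattern does not occur in the input.
    static int reFindNoCase(String regex, String input) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        // Matcher.start() is 0-based, so add 1 to follow the CF convention.
        return m.find() ? m.start() + 1 : 0;
    }

    public static void main(String[] args) {
        // Same inputs as the two ColdFusion examples above.
        System.out.println(reFindNoCase("a+c", "abcaaccdd"));
        System.out.println(reFindNoCase("\\p{Alpha}", "aBBccDdeeE"));
        // A pattern that does not occur in the input.
        System.out.println(reFindNoCase("xyz", "abcaaccdd"));
    }
}
```

The backslash is doubled because it must first be escaped inside a Java string literal before the regex engine sees it.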

11.10.2 Metacharacters and Pattern Matching

Table 11.3 shows the most common metacharacters used in regular expression matching.

Table 11.3. The Most Common Metacharacters Used in Regex

Metacharacter    Meaning
\d               Any digit (0-9)
\D               Any non-digit
\s               A whitespace character
\S               Any non-whitespace character
\w               Letters, numbers, and underscores
\W               Any non-\w character
.                Any single character
^                Start of the input (nothing before this position)
$                End of the input (nothing after this position)
+                One or more of the preceding item
*                Zero or more of the preceding item
?                Zero or one of the preceding item
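The following is a small sketch of how a few of these metacharacters behave in java.util.regex. The class MetacharSketch and its contains() helper are hypothetical names used only for this illustration. The uppercase forms \D, \S, and \W are used for the negated classes, and each backslash is doubled because it sits inside a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MetacharSketch {

    // Returns true if the regex matches anywhere inside the input.
    static boolean contains(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // \d: is there a digit in the input?
        System.out.println(contains("\\d", "Item 7"));
        // \D: is there a non-digit in an all-digit input?
        System.out.println(contains("\\D", "12345"));
        // \s: is there whitespace in an input that has none?
        System.out.println(contains("\\s", "no_spaces"));
        // ^ anchors the match to the start of the input.
        System.out.println(contains("^Cold", "ColdFusion"));
        // $ anchors the match to the end of the input.
        System.out.println(contains("Cold$", "ColdFusion"));
        // \w+ needs at least one letter, digit, or underscore.
        System.out.println(contains("\\w+", "***"));
    }
}
```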

It can be very difficult to get your mind around how regexes work without seeing some examples (or after seeing some, for that matter). So, Table 11.4 provides some examples:

Table 11.4. Regex Examples

Regular Expression    Strings that Match
A*B                   B, AB, AAB, AAAB, and so on
A+B                   AB, AAB, AAAB, and so on
A?B                   B or AB
[XYZ]B                XB, YB, or ZB
[A-C]B                AB, BB, or CB
[3-5]X                3X, 4X, or 5X
Item\s\d              Item 1 (but not Item5 or ItemX)
(X|Z)Y                XY or ZY
(hi\s){2}             hi hi (each "hi" followed by one whitespace character)
(hi\s){1,3}           hi, hi hi, or hi hi hi (each "hi" followed by one whitespace character)
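One way to get a feel for these expressions is to test them against sample strings. The sketch below does that with Pattern.matches(), which requires the whole input to match. The class TableExamplesSketch and its fullMatch() helper are hypothetical names for this illustration. Alternation is written here as (X|Z) and the range quantifier as {1,3}, which is the syntax java.util.regex expects.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class TableExamplesSketch {

    // Returns true only if the ENTIRE input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Quantifiers: zero or more, one or more, zero or one.
        System.out.println(fullMatch("A*B", "AAAB"));
        System.out.println(fullMatch("A+B", "B"));
        System.out.println(fullMatch("A?B", "AAB"));
        // Character classes and ranges.
        System.out.println(fullMatch("[A-C]B", "CB"));
        System.out.println(fullMatch("Item\\s\\d", "Item 1"));
        System.out.println(fullMatch("Item\\s\\d", "Item5"));
        // Alternation inside a group.
        System.out.println(fullMatch("(X|Z)Y", "ZY"));
        // Counted repetition; note the trailing space required by \s.
        System.out.println(fullMatch("(hi\\s){2}", "hi hi "));
        System.out.println(fullMatch("(hi\\s){1,3}", "hi hi hi "));
        System.out.println(fullMatch("(hi\\s){1,3}", "hi hi hi hi "));
    }
}
```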

It used to be that performing regular expression matching in Java was a rather elaborate process. One had to use StringTokenizers to match text in substrings using charAt(). JDK 1.4 makes it easier by introducing a new package called java.util.regex. Two classes in this package do most of the work when you use regex: Pattern and Matcher.

11.10.2.1 public final class Pattern

A Pattern object is a compiled instance of a string representing a regular expression. The Pattern object can then be used to create a Matcher object to match character sequences against the regular expression.
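As a rough sketch of what this looks like in practice, the example below compiles a pattern once and reuses it for several inputs. The class PatternSketch, the ZIP constant, and the isZip() method are hypothetical names chosen for illustration, and the five-digit pattern is just an assumed example.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class PatternSketch {

    // Compile once and reuse. Pattern instances are immutable,
    // so one compiled pattern can serve many matchers.
    private static final Pattern ZIP = Pattern.compile("\\d{5}");

    // True if the whole input is exactly five digits.
    static boolean isZip(String s) {
        return ZIP.matcher(s).matches();
    }

    // Flags passed to compile() change how the pattern is interpreted.
    static boolean equalsIgnoringCase(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                      .matcher(input)
                      .matches();
    }

    public static void main(String[] args) {
        // pattern() returns the regex string the object was compiled from.
        System.out.println(ZIP.pattern());
        System.out.println(isZip("02139"));
        System.out.println(isZip("2139"));
        System.out.println(equalsIgnoringCase("boston", "BOSTON"));
    }
}
```

Compiling is the expensive step, so holding the Pattern in a constant avoids repeating that work on every call.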

11.10.2.2 public final class Matcher

The Matcher object interprets patterns in order to match a character sequence. You create a matcher by invoking the pattern's matcher method. There are three kinds of match operations you can perform:

  1. matches() tries to match the complete input sequence against the pattern.

  2. lookingAt() tries to match the input sequence, starting at the beginning, against the pattern; unlike matches(), it does not require the entire input to match.

  3. find() scans the input sequence looking for the next subsequence matching the pattern.
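The difference between the three operations is easiest to see side by side. In the sketch below, the class MatchKindsSketch and its tryAll() method are hypothetical names used for illustration; the method returns the results of matches(), lookingAt(), and find() in that order, each from a fresh matcher.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class MatchKindsSketch {

    // Returns { matches(), lookingAt(), find() } for the given input.
    static boolean[] tryAll(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        // A fresh matcher per call keeps the three results independent.
        boolean whole = p.matcher(input).matches();
        boolean prefix = p.matcher(input).lookingAt();
        boolean anywhere = p.matcher(input).find();
        return new boolean[] { whole, prefix, anywhere };
    }

    public static void main(String[] args) {
        String[] inputs = { "aaab", "aabxx", "xxaab" };
        for (int i = 0; i < inputs.length; i++) {
            boolean[] r = tryAll("a*b", inputs[i]);
            System.out.println(inputs[i]
                + " matches=" + r[0]
                + " lookingAt=" + r[1]
                + " find=" + r[2]);
        }
    }
}
```

With the pattern a*b, the input "aaab" satisfies all three operations, "aabxx" fails only matches() because of the trailing characters, and "xxaab" satisfies only find() because the match does not start at the beginning.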

A typical sequence for matching a regular expression then looks like this:

 Pattern p = Pattern.compile("a*b");
 Matcher m = p.matcher("aaaaab");
 boolean b = m.matches();
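When you want every match rather than a single yes-or-no answer, the usual approach is to call find() in a loop and read each match with group(). The sketch below illustrates that idea; the class FindLoopSketch and its findAll() method are hypothetical names, and it uses generics, so it assumes a JDK newer than 1.4.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class FindLoopSketch {

    // Collects the text of every non-overlapping match, left to right.
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        // Each call to find() resumes where the previous match ended.
        while (m.find()) {
            // group() returns the text matched by the most recent find().
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("a*b", "ab aab b"));
    }
}
```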

Let's look at an example. We will create a source file that finds all instances of "Allaire" and replaces them with "Macromedia." This is perhaps not an advanced regex example, but it demonstrates what is important here: how to use these two classes in conjunction to perform any regex matching you need to do.

First, we create a file called MyData.dat that contains the following information:

Allaire is a company with an office in Boston.

Not only does Allaire make ColdFusion, but now

Allaire has great parties.

We will use the regex engine to replace all instances of "Allaire" with instances of "Macromedia."

Note

This example also uses instances of the java.io.File class and the java.io.BufferedReader and java.io.BufferedWriter classes. You can work with files and directories using objects of these classes where you would choose <cffile> in ColdFusion. This example takes an existing file and reads its data in, then writes a new file with the updated information. Look in the API for more on this subject.


You will want to change the location of the file and match the path to one suitable for your system.

11.10.3 RegExTest.java

 // performs a search to find and replace
 // pattern instances using regex
 package chp11;

 import java.util.regex.*;
 import java.io.*;

 public class RegExTest {
     public static void main(String[] args)
         throws Exception {
             // create file objects; each backslash in a
             // Windows path must be escaped in a Java string
         File inFile = new File("C:\\MyData.dat");
         File outFile = new File("C:\\UpdatedData.dat");
             // get an input stream
             // to read a file in
         FileInputStream inputStream =
             new FileInputStream(inFile);
             // get an output stream
             // so we can write the file
             // back out.
         FileOutputStream outputStream =
             new FileOutputStream(outFile);
             // the BufferedReader performs
             // efficient reading-in of text
             // from a character source
         BufferedReader readerIn = new BufferedReader(
             new InputStreamReader(inputStream));
         BufferedWriter writerOut = new BufferedWriter(
             new OutputStreamWriter(outputStream));
             // write pattern and compile it into
             // a pattern object.
             // Your regex goes here:
         Pattern p = Pattern.compile("Allaire");
             // use the matcher to find occurrences;
             // it starts empty and is reset per line
         Matcher m = p.matcher("");
         String s = null;
         while ((s = readerIn.readLine()) != null) {
             m.reset(s);
                 // replace every match in this line
             String result = m.replaceAll("Macromedia");
             writerOut.write(result);
             writerOut.newLine();
         }
             // close connections
         readerIn.close();
         writerOut.close();
         System.out.println("Operation performed successfully.");
     }
 }

The output is:

 Operation performed successfully. 

Upon opening the file, you should see all instances of "Allaire" replaced with "Macromedia."
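If you want to experiment with the replacement logic without touching the file system, the same reset-and-replace loop can be run over text held in memory. The following is a minimal sketch under that assumption; the class ReplaceSketch and its replaceLines() method are hypothetical names, and a StringReader stands in for the file input stream.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class ReplaceSketch {

    // Applies replaceAll() line by line, reusing a single Matcher.
    static String replaceLines(String text, String regex, String replacement) {
        Matcher m = Pattern.compile(regex).matcher("");
        StringBuilder out = new StringBuilder();
        BufferedReader in = new BufferedReader(new StringReader(text));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                // Point the existing matcher at the new line.
                m.reset(line);
                out.append(m.replaceAll(replacement)).append('\n');
            }
        } catch (IOException e) {
            // Not expected for an in-memory reader; rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String data = "Allaire is a company with an office in Boston.\n"
                    + "Not only does Allaire make ColdFusion, but now\n"
                    + "Allaire has great parties.";
        System.out.print(replaceLines(data, "Allaire", "Macromedia"));
    }
}
```

Reusing one Matcher with reset() mirrors the file-based program and avoids creating a new matcher object for every line.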

With the new regex package, Java gives you text-processing power closer to that of sed, awk, or Perl for this kind of operation.


   


Java for ColdFusion Developers
ISBN: 0130461806
EAN: 2147483647
Year: 2005
Pages: 206
Authors: Eben Hewitt
