Regular expressions are descriptions of textual patterns that enable string matching. You have likely used at least a simple form of pattern matching already. Perhaps the most familiar pattern character is the wildcard character ( * ). So if you have ever entered *.cfm to view only the ColdFusion markup files in a directory, you have used a close relative of regular expressions (in a regular expression proper, * means "zero or more of the preceding item" rather than "any text"). You may or may not have had occasion to use regular expressions with ColdFusion functions.
Regular expressions are a useful tool, and we will cover them here briefly because JDK 1.4 makes them easy to use; the new classes live in the java.util.regex package.
Using a regular expression in ColdFusion boils down to calling one of two functions: REFind() and REFindNoCase(). These functions are prefixed with RE, which stands for regular expression.
REFind() performs a case-sensitive search that returns the position of the first occurrence of a regular expression in a string, beginning with the position specified. If no occurrences are found, it returns 0.
REFindNoCase() is a case-insensitive version of REFind(). Here are a few examples showing the different kinds of usage of this function:
A simple regex:
<cfoutput>#REFindNoCase("a+c", "abcaaccdd")#</cfoutput>
Using POSIX syntax:
<cfoutput>#REFindNoCase("[[:alpha:]]", "aBBccDdeeE")# </cfoutput>
Table 11.3 shows the most common metacharacters used in regular expression matching.
Metacharacter | Meaning |
---|---|
\d | Any digit (0-9) |
\D | Any non-digit |
\s | A whitespace character |
\S | Any non-whitespace character |
\w | Letters, numbers, and underscores |
\W | Any non-\w characters |
. | Any single character |
^ | Nothing before this character |
$ | Nothing after this character |
+ | One or more |
* | Zero or more |
? | Zero or one |
It can be very difficult to get your mind around how regexes work without seeing some examples (or after seeing some, for that matter). So, Table 11.4 provides some examples:
Regular Expression | Strings that Match |
---|---|
A*B | B, AB, AAB, AAAB, and so on |
A+B | AB, AAB, AAAB, and so on |
A?B | B or AB |
[XYZ]B | XB, YB, or ZB |
[A-C]B | AB, BB, or CB |
[3-5]X | 3X, 4X, or 5X |
Item\s\d | Item 1 (but not Item5 or ItemX) |
(X\|Z)Y | XY or ZY |
(hi\s){2} | hi hi |
(hi\s){1,3} | hi, hi hi, or hi hi hi |
It used to be that performing regular expression matching in Java was a rather elaborate process. One had to use StringTokenizers to match text in substrings using charAt() . The JDK 1.4 makes it easier, having introduced a new package called java.util.regex . There are two classes in this package that help you work with regex.
A Pattern object is a compiled instance of a string representing a regular expression. The Pattern object can then be used to create a Matcher object to match character sequences against the regular expression.
The Matcher object interprets patterns in order to match a character sequence. You create a matcher by invoking the pattern's matcher method. There are three kinds of match operations you can perform:
matches() tries to match the complete input sequence against the pattern.
lookingAt() tries to match the input sequence, starting at the beginning, against the pattern.
find() scans the input sequence looking for the next subsequence matching the pattern.
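The difference between the three operations is easiest to see by running one pattern against several inputs. The following is a minimal sketch, not code from this chapter: the class name MatchKinds, the helper tryAll(), and the sample inputs are made up for illustration, and the loop syntax assumes Java 5 or later.

```java
import java.util.regex.Pattern;

public class MatchKinds {

    // Hypothetical helper: returns the results of matches(), lookingAt(),
    // and find(), in that order, for one regex and one input.
    static boolean[] tryAll(String regex, String input) {
        Pattern p = Pattern.compile(regex);
        // A fresh matcher for each call, so no operation is affected
        // by state left behind by another.
        boolean whole = p.matcher(input).matches();     // entire input must match
        boolean prefix = p.matcher(input).lookingAt();  // match must start at index 0
        boolean anywhere = p.matcher(input).find();     // match may start anywhere
        return new boolean[] { whole, prefix, anywhere };
    }

    public static void main(String[] args) {
        String[] inputs = { "aab", "aabcc", "ccaab" };
        for (String input : inputs) {
            boolean[] r = tryAll("a+b", input);
            System.out.println(input + ": matches=" + r[0]
                    + " lookingAt=" + r[1] + " find=" + r[2]);
        }
    }
}
```

With the pattern a+b, only "aab" satisfies matches(); "aabcc" fails matches() but passes lookingAt(), because the match starts at the beginning; and "ccaab" is found only by find().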
A typical sequence for matching a regular expression then looks like this:
Pattern p = Pattern.compile("a*b");
Matcher m = p.matcher("aaaaab");
boolean b = m.matches();
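When a pattern is used only once, the static convenience method Pattern.matches() performs the compile-and-match sequence in a single call. The sketch below uses it to try a few of the expressions from Table 11.4; the class name TableCheck and the helper fullMatch() are hypothetical. Note that a regex backslash such as \s has to be written as \\s inside a Java string literal.

```java
import java.util.regex.Pattern;

public class TableCheck {

    // Hypothetical helper: true only if the whole input matches the regex.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch("A*B", "B"));              // true: zero A's allowed
        System.out.println(fullMatch("A+B", "B"));              // false: at least one A needed
        System.out.println(fullMatch("[A-C]B", "CB"));          // true
        System.out.println(fullMatch("Item\\s\\d", "Item 1"));  // true
        System.out.println(fullMatch("Item\\s\\d", "Item5"));   // false: no whitespace
        System.out.println(fullMatch("(hi\\s){2}", "hi hi "));  // true: each "hi" needs trailing whitespace
    }
}
```

For a pattern that is applied many times, compiling it once with Pattern.compile() and reusing the Pattern object avoids recompiling the expression on every call.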
Let's look at an example. Let's create a source file that will find all instances of "Allaire" and replace them with "Macromedia." This perhaps is not an advanced regex example, but it demonstrates what is important here: how to use these two classes in conjunction to perform any regex matching you need to do.
First, we create a file called MyData.dat that contains the following information:
Allaire is a company with an office in Boston.
Not only does Allaire make ColdFusion, but now
Allaire has great parties.
We will use the regex engine to replace all instances of "Allaire" with instances of "Macromedia."
Note
This example also uses instances of the java.io.File class and the java.io.BufferedReader and java.io.BufferedWriter classes. You can work with files and directories using objects of these classes where you would choose <cffile> in ColdFusion. This example takes an existing file and reads its data in, then writes a new file with the updated information. Look in the API for more on this subject.
You will want to change the location of the file and match the path to one suitable for your system.
// performs a search to find and replace
// pattern instances using regex
package chp11;

import java.util.regex.*;
import java.io.*;

public class RegExTest {

    public static void main(String[] args) throws Exception {

        // create file objects; a backslash in a Java
        // string literal must be escaped as \\
        File inFile = new File("C:\\MyData.dat");
        File outFile = new File("C:\\UpdatedData.dat");

        // get an input stream
        // to read a file in
        FileInputStream inputStream = new FileInputStream(inFile);

        // get an output stream
        // so we can write the file
        // back out.
        FileOutputStream outputStream = new FileOutputStream(outFile);

        // the BufferedReader performs
        // efficient reading-in of text
        // from a character source
        BufferedReader readerIn = new BufferedReader(
                new InputStreamReader(inputStream));
        BufferedWriter writerOut = new BufferedWriter(
                new OutputStreamWriter(outputStream));

        // write pattern and compile it into
        // a pattern object.
        // Your regex goes here:
        Pattern p = Pattern.compile("Allaire");

        // create one matcher up front; it is
        // reset against each line in the loop
        Matcher m = p.matcher("");

        String s = null;
        while ((s = readerIn.readLine()) != null) {
            // point the matcher at the current line
            m.reset(s);

            // replace every occurrence in this line
            String result = m.replaceAll("Macromedia");
            writerOut.write(result);
            writerOut.newLine();
        }

        // close connections
        readerIn.close();
        writerOut.close();

        System.out.println("Operation performed successfully.");
    }
}
The output is:
Operation performed successfully.
Upon opening the file, you should see all instances of "Allaire" replaced with "Macromedia."
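To experiment with the replacement loop without touching the file system, the same reset-and-replace idea can be run against a string held in memory. This is only a sketch under stated assumptions: the class name ReplaceInMemory and the method replaceEachLine() are invented for illustration, and StringBuilder assumes Java 5 or later.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceInMemory {

    // Hypothetical helper: applies the replacement line by line and
    // returns the result, ending every line with '\n'.
    static String replaceEachLine(String text, String regex, String replacement)
            throws IOException {
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher("");  // one matcher, reset for each line
        BufferedReader reader = new BufferedReader(new StringReader(text));
        StringBuilder out = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            m.reset(line);
            out.append(m.replaceAll(replacement)).append('\n');
        }
        reader.close();
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        String data = "Allaire is a company with an office in Boston.\n"
                + "Not only does Allaire make ColdFusion, but now\n"
                + "Allaire has great parties.";
        System.out.print(replaceEachLine(data, "Allaire", "Macromedia"));
    }
}
```

One caution when adapting this: replaceAll() gives the characters $ and \ a special meaning in the replacement string, so a replacement containing either one needs escaping. A plain word such as "Macromedia" is unaffected.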
With the new regex package, Java gives you power closer to that of sed, awk, or Perl for this kind of operation.