Problem

You're ready to get started using regular expression processing to beef up your Java code by testing to see if a given pattern can match in a given string.

Solution

Use the Java Regular Expressions Package, java.util.regex.

Discussion

The good news is that the Java API for regexes is actually easy to use. If all you need is to find out whether a given regex matches a string, you can use the convenient boolean matches( ) method of the String class, which accepts a regex pattern in String form as its argument:

    if (inputString.matches(stringRegexPattern)) {
        // it matched... do something with it...
    }

This is, however, a convenience routine, and convenience always comes at a price. If the regex is going to be used more than once or twice in a program, it is more efficient to construct and use a Pattern and its Matcher(s). A complete program constructing a Pattern and using it to match is shown here:

    import java.util.regex.*;

    /**
     * Simple example of using regex class.
     */
    public class RESimple {
        public static void main(String[] argv) throws PatternSyntaxException {
            String pattern = "^Q[^u]\\d+\\.";
            String input = "QA777. is the next flight. It is on time.";

            Pattern p = Pattern.compile(pattern);

            boolean found = p.matcher(input).lookingAt( );

            System.out.println("'" + pattern + "'" +
                (found ? " matches '" : " doesn't match '") + input + "'");
        }
    }

The java.util.regex package consists mainly of two classes, Pattern and Matcher, which provide the public API shown in Example 4-1. (It also defines the PatternSyntaxException named in the program above.)

Example 4-1. Regex public API

    /** The main public API of the java.util.regex package.
     * Prepared by javap and Ian Darwin.
     */

    package java.util.regex;

    public final class Pattern {
        // Flags values ('or' together)
        public static final int
            UNIX_LINES, CASE_INSENSITIVE, COMMENTS, MULTILINE,
            DOTALL, UNICODE_CASE, CANON_EQ;
        // Factory methods (no public constructors)
        public static Pattern compile(String patt);
        public static Pattern compile(String patt, int flags);
        // Method to get a Matcher for this Pattern
        public Matcher matcher(CharSequence input);
        // Information methods
        public String pattern( );
        public int flags( );
        // Convenience methods
        public static boolean matches(String pattern, CharSequence input);
        public String[] split(CharSequence input);
        public String[] split(CharSequence input, int max);
    }

    public final class Matcher {
        // Action: find or match methods
        public boolean matches( );
        public boolean find( );
        public boolean find(int start);
        public boolean lookingAt( );
        // "Information about the previous match" methods
        public int start( );
        public int start(int whichGroup);
        public int end( );
        public int end(int whichGroup);
        public int groupCount( );
        public String group( );
        public String group(int whichGroup);
        // Reset methods
        public Matcher reset( );
        public Matcher reset(CharSequence newInput);
        // Replacement methods
        public Matcher appendReplacement(StringBuffer where, String newText);
        public StringBuffer appendTail(StringBuffer where);
        public String replaceAll(String newText);
        public String replaceFirst(String newText);
        // Information methods
        public Pattern pattern( );
    }

    /* String, showing only the regex-related methods */
    public final class String {
        public boolean matches(String regex);
        public String replaceFirst(String regex, String newStr);
        public String replaceAll(String regex, String newStr);
        public String[] split(String regex);
        public String[] split(String regex, int max);
    }

This API is large enough to require some explanation. The normal steps for regex matching in a production program are:

1. Create a Pattern by calling the static method Pattern.compile( ).

2. Request a Matcher from the pattern by calling its matcher(CharSequence) method for each String (or other CharSequence) you wish to look through.

3. Call (once or more) one of the finder methods (discussed later in this section) on the resulting Matcher.
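The compile-once, match-many usage that these steps describe can be sketched as a small program. This is only an illustration under stated assumptions: the class name ReuseDemo, the helper countMatches, and the sample input lines are invented for this sketch and are not part of the recipe's own code. Only the regex is taken from RESimple.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: compile a Pattern once, then reuse it for many inputs. */
public class ReuseDemo {
    // Step 1: compile the regex a single time (same regex as RESimple).
    private static final Pattern FLIGHT = Pattern.compile("^Q[^u]\\d+\\.");

    /** Counts how many of the given lines begin with a match of FLIGHT. */
    public static int countMatches(String[] lines) {
        int count = 0;
        for (String line : lines) {
            // Step 2: ask the shared Pattern for a Matcher over this input.
            Matcher m = FLIGHT.matcher(line);
            // Step 3: call a finder method; lookingAt() tests the start of the input.
            if (m.lookingAt()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Sample data made up for this sketch.
        String[] lines = {
            "QA777. is the next flight.",
            "Qu123. is not a flight number.",
            "The QA777. flight is late.",
            "QX9. departs at noon."
        };
        System.out.println(countMatches(lines) + " of " + lines.length
            + " lines matched");
    }
}
```

The Pattern is held in a static final field so that the cost of compiling the regex is paid once, however many lines are examined.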
The CharSequence interface, added to java.lang with JDK 1.4, provides simple read-only access to objects containing a collection of characters. The standard implementations are String and StringBuffer (described in Chapter 3), and the "new I/O" class java.nio.CharBuffer.

Of course, you can perform regex matching in other ways, such as using the convenience methods in Pattern or even in java.lang.String. For example:

    // StringConvenience.java -- show String convenience routine for "match"
    String pattern = ".*Q[^u]\\d+\\..*";
    String line = "Order QT300. Now!";
    if (line.matches(pattern)) {
        System.out.println(line + " matches \"" + pattern + "\"");
    } else {
        System.out.println("NO MATCH");
    }

But the three-step list just described is the "standard" pattern for matching. You'd likely use the String convenience routine in a program that only used the regex once; if the regex is used more than once, it is worth taking the time to "compile" it, since the convenience routine compiles the pattern again on every call while a compiled Pattern can be reused. As well, the Matcher has several finder methods, which provide more flexibility than the String convenience routine matches( ). The Matcher methods are:

matches( )
    Compares the entire string against the pattern; this is the same test that the String convenience routine performs.

lookingAt( )
    Matches the pattern only at the beginning of the string.

find( )
    Looks for the pattern anywhere in the string (not necessarily at the first character), starting at the beginning or, if a previous call succeeded, just after the characters matched by that call.
Each of these methods returns boolean, with true meaning a match and false meaning no match. To check whether a given string matches a given pattern, you need only type something like the following:

    Matcher m = Pattern.compile(patt).matcher(line);
    if (m.find( )) {
        System.out.println(line + " matches " + patt);
    }

But you may also want to extract the text that matched, which is the subject of the next recipe.

The following recipes cover uses of this API. Initially, the examples just use arguments of type String as the input source. Use of other CharSequence types is covered in Recipe 4.5.
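To make the difference between the finder methods concrete, the following sketch runs matches(), lookingAt(), and find() with one regex against several inputs. The class name FinderDemo, the helper describe, and the sample strings are assumptions made for this illustration. The regex is the one from RESimple with its leading ^ removed, so that find() is free to match in the middle of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch contrasting the three Matcher finder methods. */
public class FinderDemo {

    /** Reports what each finder method returns for the given regex and input. */
    public static String describe(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean whole = m.matches();     // must match the entire input
        boolean prefix = m.lookingAt();  // must match at the start of the input
        // find() would otherwise resume after a previous successful match,
        // so reset the Matcher to make it search from the beginning.
        m.reset();
        boolean anywhere = m.find();     // may match anywhere in the input
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        String regex = "Q[^u]\\d+\\.";
        // Sample inputs made up for this sketch.
        String[] inputs = {
            "QA777.",
            "QA777. is the next flight.",
            "Order QT300. Now!",
            "No flight here"
        };
        for (String input : inputs) {
            System.out.println("'" + input + "': " + describe(regex, input));
        }
    }
}
```

Only the first input satisfies all three methods. The second fails matches() because text follows the match, the third satisfies only find() because the match starts in the middle of the string, and the last satisfies none of them.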