Problem
You need to find the text that the regex matched.

Solution
Sometimes you need to know more than just whether a regex matched a string. In editors and many other tools, you want to know exactly which characters were matched. Remember that with multipliers such as *, the length of the text that was matched may have no relationship to the length of the pattern that matched it. Do not underestimate the mighty .*, which happily matches thousands or millions of characters if allowed to. As you saw in the previous recipe, you can find out whether a given match succeeds just by using find() or matches(). But in other applications, you will want to get the characters that the pattern matched. After a successful call to one of those methods, you can use the Matcher's "information" methods, start(), end(), groupCount(), and group(int), to get information on the match.
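As a minimal sketch of those information methods, the following reports each group's start index, end index, and matched text. The class name MatchInfoDemo and the helper describe() are hypothetical, and the parentheses added to the pattern are an assumption made here so that there is more than one group to report; the methods called on Matcher are the standard java.util.regex ones.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the book's source code
class MatchInfoDemo {

    /** Returns one line per group: its index, [start,end) range, and text. */
    static String describe(String patt, String line) {
        Matcher m = Pattern.compile(patt).matcher(line);
        if (!m.find()) {
            return "NO MATCH";
        }
        StringBuilder sb = new StringBuilder();
        // Group 0 is the whole match; groups 1..groupCount() are the parens
        for (int i = 0; i <= m.groupCount(); i++) {
            sb.append("group ").append(i)
              .append(" [").append(m.start(i)).append(',').append(m.end(i))
              .append(") = \"").append(m.group(i)).append("\"\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Same pattern as REmatch.java, with parens added for illustration
        System.out.print(describe("Q([^u])(\\d+)\\.", "Order QT300. Now!"));
    }
}
```

With this input the whole match is "QT300." at indexes 6 to 12, group 1 is "T", and group 2 is "300". Note that end() returns the index just past the last matched character, which is why it can be passed directly to String.substring().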
The notion of parentheses or "capture groups" is central to regex processing. Groups may be nested to any level of complexity. The group(int) method lets you retrieve the characters that matched a given parenthesis group. If you haven't used any explicit parens, you can just treat whatever matched as "level zero." For example:

    // Part of REmatch.java
    String patt = "Q[^u]\\d+\\.";
    Pattern r = Pattern.compile(patt);
    String line = "Order QT300. Now!";
    Matcher m = r.matcher(line);
    if (m.find()) {
        System.out.println(patt + " matches \"" +
            m.group(0) + "\" in \"" + line + "\"");
    } else {
        System.out.println("NO MATCH");
    }

When run, this prints:

    Q[^u]\d+\. matches "QT300." in "Order QT300. Now!"

An extended version of the REDemo program presented in Recipe 4.2, called REDemo2, provides a display of all the capture groups in a given regex; one example is shown in Figure 4-3.

Figure 4-3. REDemo2 in action

It is also possible to get the starting and ending indexes of the text that the pattern matched, and from them its length (remember that terms with multipliers, such as the \d+ in this example, can match an arbitrary number of characters in the string). You can use these in conjunction with the String.substring() methods as follows:

    // Part of regexsubstr.java -- Prints exactly the same as REmatch.java
    String patt = "Q[^u]\\d+\\.";   // same pattern as in REmatch.java
    Pattern r = Pattern.compile(patt);
    String line = "Order QT300. Now!";
    Matcher m = r.matcher(line);
    if (m.find()) {
        System.out.println(patt + " matches \"" +
            line.substring(m.start(0), m.end(0)) +
            "\" in \"" + line + "\"");
    } else {
        System.out.println("NO MATCH");
    }

Suppose you need to extract several items from a string.
If the input is:

    Smith, John
    Adams, John Quincy

and you want to get out:

    John Smith
    John Quincy Adams

just use:

    // from REmatchTwoFields.java
    // Construct a regex with parens to "grab" both field1 and field2
    Pattern r = Pattern.compile("(.*), (.*)");
    Matcher m = r.matcher(inputLine);
    if (!m.matches())
        throw new IllegalArgumentException("Bad input: " + inputLine);
    System.out.println(m.group(2) + ' ' + m.group(1));
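To try the two-field extraction on both sample lines, the fragment can be wrapped in a small self-contained program. This is only a sketch: the class name NameSwapDemo and the method firstNameFirst() are hypothetical names chosen here, not taken from REmatchTwoFields.java.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper around the REmatchTwoFields.java fragment
class NameSwapDemo {

    // Group 1 captures the text before ", "; group 2 captures the rest
    private static final Pattern NAME = Pattern.compile("(.*), (.*)");

    /** Turns "Last, First" into "First Last"; rejects lines without ", ". */
    static String firstNameFirst(String inputLine) {
        Matcher m = NAME.matcher(inputLine);
        // matches() requires the pattern to cover the entire line
        if (!m.matches()) {
            throw new IllegalArgumentException("Bad input: " + inputLine);
        }
        return m.group(2) + ' ' + m.group(1);
    }

    public static void main(String[] args) {
        String[] lines = { "Smith, John", "Adams, John Quincy" };
        for (String line : lines) {
            System.out.println(firstNameFirst(line));
        }
    }
}
```

Compiling the pattern once into a static field, rather than on every call, is a design choice of this sketch; the original fragment compiles it inline, which behaves the same for a single line.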