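Problem

You need to find the text that the regex matched.

Solution

Sometimes you need to know more than just whether a regex matched a string. In editors and many other tools, you want to know exactly which characters were matched. Remember that with multipliers such as *, the length of the text that was matched may have no relationship to the length of the pattern that matched it. Do not underestimate the mighty .*, which happily matches thousands or millions of characters if allowed to. As you saw in the previous recipe, you can find out whether a given match succeeds just by using find( ) or matches( ). In other applications, though, you will want to get the characters that the pattern matched. After a successful call to one of those methods, you can use these Matcher "information" methods to get details of the match:

start( ) and end( ): return the index of the first character of the match and the index just past its last character.
groupCount( ): returns the number of parenthesized capture groups in the pattern, or 0 if none were used.
group(int i): returns the characters matched by group i of the current match, for i from 0 through groupCount( ). Group 0 is the entire match, so group(0) (or just group( )) returns the whole portion of the string that matched.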
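As a sketch of start( ), end( ), groupCount( ), and group(int) working together, the hypothetical class below (GroupInfoDemo and describeGroups are illustrative names, not from the book's sources) finds the first match and reports, for every group from 0 through groupCount( ), the matched text and its start/end indexes. It assumes the pattern (Q([^u])(\d+))\. , a parenthesized variant of the example pattern used in this recipe, so that there are nested groups to inspect.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupInfoDemo {

    // Describes group 0 through groupCount() of the first match of patt in line.
    // Returns an empty array if there is no match.
    static String[] describeGroups(String patt, String line) {
        Matcher m = Pattern.compile(patt).matcher(line);
        if (!m.find()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            if (m.group(i) == null) {
                // An optional group that took no part in this match.
                out[i] = i + ": (no match)";
            } else {
                // start(i) is the first matched index; end(i) is one past the last.
                out[i] = i + ": \"" + m.group(i) + "\" at " + m.start(i) + "-" + m.end(i);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // A parenthesized variant of this recipe's pattern: group 1 encloses groups 2 and 3.
        String patt = "(Q([^u])(\\d+))\\.";
        for (String s : describeGroups(patt, "Order QT300. Now!")) {
            System.out.println(s);
        }
    }
}
```

Run on "Order QT300. Now!", this should print four lines: group 0 is "QT300." at 6-12, group 1 is "QT300" at 6-11, group 2 is "T" at 7-8, and group 3 is "300" at 8-11. Groups are numbered by the position of their opening parenthesis, which is why the outer group is number 1. The null check matters for optional groups such as (a)?, since group(i) returns null (and start(i) returns -1) for a group that did not participate.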
The notion of parentheses, or "capture groups," is central to regex processing. Groups may be nested to any level of complexity. The group(int) method lets you retrieve the characters that matched a given parenthesized group. Group 0 always refers to the entire match, so if you haven't used any explicit parens, you can just treat whatever matched as "level zero." For example:

    // Part of REmatch.java
    String patt = "Q[^u]\\d+\\.";
    Pattern r = Pattern.compile(patt);
    String line = "Order QT300. Now!";
    Matcher m = r.matcher(line);
    if (m.find( )) {
        // group(0) is the entire matched text
        System.out.println(patt + " matches \"" +
            m.group(0) +
            "\" in \"" + line + "\"");
    } else {
        System.out.println("NO MATCH");
    }

When run, this prints:

    Q[^u]\d+\. matches "QT300." in "Order QT300. Now!"

An extended version of the REDemo program presented in Recipe 4.2, called REDemo2, provides a display of all the capture groups in a given regex; one example is shown in Figure 4-3.

Figure 4-3. REDemo2 in action

It is also possible to get the starting and ending indexes and the length of the text that the pattern matched (remember that terms with multipliers, such as the \d+ in this example, can match an arbitrary number of characters in the string). You can use these in conjunction with the String.substring( ) methods as follows:

    // Part of regexsubstr.java -- prints exactly the same as REmatch.java
    String patt = "Q[^u]\\d+\\.";
    Pattern r = Pattern.compile(patt);
    String line = "Order QT300. Now!";
    Matcher m = r.matcher(line);
    if (m.find( )) {
        // start(0) and end(0) delimit the whole match (group 0)
        System.out.println(patt + " matches \"" +
            line.substring(m.start(0), m.end(0)) +
            "\" in \"" + line + "\"");
    } else {
        System.out.println("NO MATCH");
    }

Suppose you need to extract several items from a string.
If the input is:

    Smith, John
    Adams, John Quincy

and you want to get out:

    John Smith
    John Quincy Adams

just use:

    // from REmatchTwoFields.java
    // Construct a regex with parens to "grab" both field1 and field2
    Pattern r = Pattern.compile("(.*), (.*)");
    // inputLine holds one line of input, e.g., "Smith, John"
    Matcher m = r.matcher(inputLine);
    if (!m.matches( ))
        throw new IllegalArgumentException("Bad input: " + inputLine);
    System.out.println(m.group(2) + ' ' + m.group(1));
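For reference, here is a minimal self-contained sketch of the same two-field technique, runnable on its own. The class SwapNamesDemo and its swap( ) method are hypothetical names chosen for illustration, not the contents of REmatchTwoFields.java; the sketch simply wraps the recipe's pattern and group(2)/group(1) swap in a method and runs it over both sample lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SwapNamesDemo {

    // Two capture groups: everything before ", " and everything after it.
    private static final Pattern NAME = Pattern.compile("(.*), (.*)");

    // Turns "Last, First Middle" into "First Middle Last".
    static String swap(String inputLine) {
        Matcher m = NAME.matcher(inputLine);
        // matches() requires the entire line to fit the pattern, not just part of it.
        if (!m.matches()) {
            throw new IllegalArgumentException("Bad input: " + inputLine);
        }
        // String + char + String concatenates the space as text.
        return m.group(2) + ' ' + m.group(1);
    }

    public static void main(String[] args) {
        String[] lines = { "Smith, John", "Adams, John Quincy" };
        for (String line : lines) {
            System.out.println(swap(line));
        }
    }
}
```

This should print John Smith and John Quincy Adams. Because the first (.*) is greedy, a line containing more than one ", " is split at the last one, so swap("Smith, Jr., John") returns "John Smith, Jr.".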