Recipe 4.9 Matching Newlines in Text


Problem

You need to match newlines in text.

Solution

Use \n or \r.

See also the flag constant Pattern.MULTILINE, which makes ^ and $ match at the beginning and end of each line, that is, just after and just before a newline.
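As a minimal sketch of the advice above, the following program puts \r and \n directly into a pattern to find and split on line endings. The class name NewlineFind and the method countLineEndings are made up for this illustration; only Pattern, Matcher, and String.split( ) come from the standard API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: locate line endings by naming \r and \n explicitly in the pattern. */
public class NewlineFind {

    // The Java string escapes put real CR and LF characters into the pattern,
    // where they match themselves. "\r\n" comes first in the alternation so
    // that a Windows-style ending is consumed as one match, not two.
    static final String ANY_LINE_ENDING = "\r\n|\r|\n";

    /** Count the line endings in text, treating \r\n as a single ending. */
    static int countLineEndings(String text) {
        Matcher m = Pattern.compile(ANY_LINE_ENDING).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "one\ntwo\r\nthree\rfour";
        System.out.println("Line endings: " + countLineEndings(text));    // 3
        System.out.println("Lines: " + text.split(ANY_LINE_ENDING).length); // 4
    }
}
```

Listing the three conventions explicitly keeps the match independent of any flags passed to Pattern.compile( ).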

Discussion

While line-oriented tools from Unix such as sed and grep match regular expressions one line at a time, not all tools do. The sam text editor from Bell Laboratories was the first interactive tool I know of to allow multiline regular expressions; the Perl scripting language followed shortly. In the Java API, the newline character by default has no special significance. The BufferedReader method readLine( ) strips off the line terminator at the end of each line it returns. If you read in gobs of characters using some method other than readLine( ), you may have some number of \n, \r, or \r\n sequences in your text string.[4] Normally the regex API treats all of these as line terminators, equivalent to \n. If you want only \n to be recognized as a line terminator, pass the UNIX_LINES flag to the Pattern.compile( ) method.

[4] Or a few related Unicode characters, including the next-line (\u0085), line-separator (\u2028), and paragraph-separator (\u2029) characters.
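The effect of UNIX_LINES can be sketched with $ in MULTILINE mode, where the set of recognized line terminators decides whether a match is found. The class name UnixLinesDemo and the helper endsALine are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

/** Sketch: UNIX_LINES narrows the set of line terminators to \n alone. */
public class UnixLinesDemo {

    /** True if "engines" is found immediately before a line terminator or the end of input. */
    static boolean endsALine(String input, int flags) {
        return Pattern.compile("engines$", flags).matcher(input).find();
    }

    public static void main(String[] args) {
        int multi = Pattern.MULTILINE;
        int unixMulti = Pattern.MULTILINE | Pattern.UNIX_LINES;

        // A bare \r counts as a line terminator by default...
        System.out.println(endsALine("engines\rmore", multi));      // true
        // ...but not once UNIX_LINES is set.
        System.out.println(endsALine("engines\rmore", unixMulti));  // false
        // \n is a line terminator in both modes.
        System.out.println(endsALine("engines\nmore", unixMulti));  // true
    }
}
```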

In Unix, ^ and $ are commonly used to match the beginning or end of a line, respectively. In this API, the regex metacharacters ^ and $ by default ignore line terminators and match only at the beginning and the end, respectively, of the entire string. However, if you pass the MULTILINE flag into Pattern.compile( ), these expressions match just after or just before, respectively, a line terminator; $ also matches the very end of the string. The wildcard . does not match a line terminator unless you pass the DOTALL flag; with DOTALL set, the line ending is just an ordinary character that . or similar expressions can match. If you want to know exactly where the line ending is, \n or \r in the pattern match it regardless of the flags. In other words, apart from these rules for ., ^, and $, a newline character is just another character to this API. See the sidebar Pattern.compile( ) Flags. An example of newline matching is shown in Example 4-6.

Example 4-6. NLMatch.java
import java.util.regex.*;

/**
 * Show line ending matching using regex class.
 * @author Ian F. Darwin, ian@darwinsys.com
 * @version $Id: ch04.xml,v 1.4 2004/05/04 20:11:27 ian Exp $
 */
public class NLMatch {
    public static void main(String[] argv) {

        String input = "I dream of engines\nmore engines, all day long";
        System.out.println("INPUT: " + input);
        System.out.println( );

        String[] patt = {
            "engines.more engines",
            "engines$"
        };

        for (int i = 0; i < patt.length; i++) {
            System.out.println("PATTERN " + patt[i]);

            boolean found;
            Pattern p1l = Pattern.compile(patt[i]);
            found = p1l.matcher(input).find( );
            System.out.println("DEFAULT match " + found);

            Pattern pml = Pattern.compile(patt[i],
                Pattern.DOTALL|Pattern.MULTILINE);
            found = pml.matcher(input).find( );
            System.out.println("MultiLine match " + found);
            System.out.println( );
        }
    }
}

If you run this code, the first pattern (with the wildcard character .) matches only when DOTALL is set, because . does not match a line terminator by default, while the second pattern (with $) matches only when MULTILINE is set.

> java NLMatch
INPUT: I dream of engines
more engines, all day long

PATTERN engines.more engines
DEFAULT match false
MultiLine match true

PATTERN engines$
DEFAULT match false
MultiLine match true
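Example 4-6 sets DOTALL and MULTILINE together, so it cannot show which flag is responsible for which match. The following sketch applies the flags one at a time to the same input; the class name FlagsApart and the helper found are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

/** Sketch: apply DOTALL and MULTILINE separately to see what each one changes. */
public class FlagsApart {

    static final String INPUT = "I dream of engines\nmore engines, all day long";

    /** True if regex, compiled with the given flags, is found anywhere in INPUT. */
    static boolean found(String regex, int flags) {
        return Pattern.compile(regex, flags).matcher(INPUT).find();
    }

    public static void main(String[] args) {
        // DOTALL lets the wildcard . match the \n between the two lines.
        System.out.println(found("engines.more engines", 0));                 // false
        System.out.println(found("engines.more engines", Pattern.MULTILINE)); // false
        System.out.println(found("engines.more engines", Pattern.DOTALL));    // true

        // MULTILINE lets $ match just before the \n and ^ match just after it.
        System.out.println(found("engines$", 0));                 // false
        System.out.println(found("engines$", Pattern.DOTALL));    // false
        System.out.println(found("engines$", Pattern.MULTILINE)); // true
        System.out.println(found("^more", 0));                    // false
        System.out.println(found("^more", Pattern.MULTILINE));    // true
    }
}
```

Passing 0 as the flags argument is equivalent to calling the one-argument Pattern.compile( ).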



Java Cookbook, Second Edition
ISBN: 0596007019
Year: 2003
Pages: 409
Authors: Ian F Darwin
