Recipe 4.7 Controlling Case in Regular Expressions

Problem

You want to find text regardless of case.

Solution

Compile the Pattern, passing the flag Pattern.CASE_INSENSITIVE to indicate that matching should be case-independent (that is, it should "fold," or ignore, differences in case). On its own, this flag folds only US-ASCII letters; if your input might contain other letters, for example because your code might run in different locales (see Chapter 15), add Pattern.UNICODE_CASE. Without these flags, the default is normal, case-sensitive matching. These flags (and others) are passed to the Pattern.compile( ) method, as in:

// CaseMatch.java
Pattern reCaseInsens = Pattern.compile(pattern,
    Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
reCaseInsens.matcher(input).matches();  // will match case-insensitively

This flag must be passed when you create the Pattern; as Pattern objects are immutable, they cannot be changed once constructed.
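As a minimal, self-contained sketch of the idea (this is not the CaseMatch.java listing; the class name CaseFoldSketch and the helper matchesIgnoringCase are made up for illustration), the following compares matching with and without the two flags. The non-ASCII test string is written with Unicode escapes so that the result does not depend on the source file's encoding.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the book's online source.
class CaseFoldSketch {

    /** Returns true if the entire input matches the regex, ignoring case. */
    static boolean matchesIgnoringCase(String regex, String input) {
        // Flags can only be supplied at compile time; the Pattern is immutable.
        Pattern p = Pattern.compile(regex,
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        return p.matcher(input).matches();
    }

    /** Same as above, but with CASE_INSENSITIVE only (ASCII-only folding). */
    static boolean matchesIgnoringAsciiCase(String regex, String input) {
        Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        // Default matching is case-sensitive.
        System.out.println(Pattern.matches("hello", "HeLLo"));          // false
        System.out.println(matchesIgnoringCase("hello", "HeLLo"));      // true

        // "\u00e9t\u00e9" is e-acute, t, e-acute; the input is its uppercase form.
        String lower = "\u00e9t\u00e9";
        String upper = "\u00c9T\u00c9";
        System.out.println(matchesIgnoringAsciiCase(lower, upper));     // false
        System.out.println(matchesIgnoringCase(lower, upper));          // true
    }
}
```

The third result is false because CASE_INSENSITIVE by itself folds only the ASCII letters; adding UNICODE_CASE extends the folding to the accented characters.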

The full source code for this example is online as CaseMatch.java.

Pattern.compile( ) Flags

The seven flags described here can be passed as the second argument to Pattern.compile( ). If more than one flag is needed, combine them with the bitwise OR operator (|). In alphabetical order, the flags are:

CANON_EQ: Enables so-called "canonical equivalence," that is, characters are matched by their canonical decomposition, so that the composite character é and the letter e followed by the "combining character mark" for the acute accent ( ´ ) each match the other (see Recipe 4.8).
CASE_INSENSITIVE: Turns on case-insensitive matching (see Recipe 4.7).
COMMENTS: Causes whitespace and comments (from # to end-of-line) to be ignored in the pattern.
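DOTALL: Allows dot (.) to match any character, including the newline; by default, dot does not match line terminators (see Recipe 4.9).
MULTILINE: Specifies multiline mode, in which ^ and $ match at the beginning and end of each line rather than only at the beginning and end of the entire input (see Recipe 4.9).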
UNICODE_CASE: Enables Unicode-aware case folding (see Recipe 4.7).
UNIX_LINES: Makes \n the only valid "newline" sequence recognized by ., ^, and $, including in MULTILINE mode (see Recipe 4.9).
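To illustrate how several of these flags combine, here is a short sketch; the class name FlagComboSketch, the helper countMatches, and the sample text are assumptions made up for this example. It counts matches of the same pattern under different flag combinations, using 0 to mean "no flags."

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the book's online source.
class FlagComboSketch {

    /** Counts how many times the regex is found in the input under the given flags. */
    static int countMatches(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "one\nTwo\nthree";
        String startsWithT = "^t\\w+";

        // No flags: ^ matches only at the very start of the input.
        System.out.println(countMatches(startsWithT, 0, text));                 // 0
        // MULTILINE: ^ also matches after each newline, so "three" is found.
        System.out.println(countMatches(startsWithT, Pattern.MULTILINE, text)); // 1
        // Flags are or'd together: now "Two" is found as well.
        System.out.println(countMatches(startsWithT,
            Pattern.MULTILINE | Pattern.CASE_INSENSITIVE, text));               // 2

        // DOTALL: dot is allowed to match the newline between "one" and "Two".
        System.out.println(countMatches("one.Two", 0, text));                   // 0
        System.out.println(countMatches("one.Two", Pattern.DOTALL, text));      // 1

        // COMMENTS: whitespace and #-comments inside the pattern are ignored.
        String commented = "t \\w+   # a t followed by word characters";
        System.out.println(countMatches(commented,
            Pattern.COMMENTS | Pattern.CASE_INSENSITIVE, text));                // 2
    }
}
```

Because each flag is a distinct bit in an int, any combination can be built with | and passed as a single argument.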