Matching Strings with Regular Expressions

Problem

You want to know whether or not a string matches a certain pattern.

Solution

You can usually describe the pattern as a regular expression. The =~ operator tests a string against a regular expression:

	string = 'This is a 30-character string.'

	if string =~ /([0-9]+)-character/ and $1.to_i == string.length
	 "Yes, there are #$1 characters in that string."
	end
	# => "Yes, there are 30 characters in that string."
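The =~ operator returns the character index where the first match begins, or nil if there is no match. As a side effect it sets the last-match variables ($~, $1, and so on). Here is a minimal sketch that inspects those values for the string above:

```ruby
string = 'This is a 30-character string.'

# =~ returns the index where the first match starts
position = string =~ /([0-9]+)-character/ # => 10

# $~ holds the MatchData for the most recent successful match
whole_match  = $~[0]        # => "30-character" (the entire match)
first_group  = $1           # => "30" (the first capture group)
before_match = $~.pre_match # => "This is a " (text before the match)

# A failed match returns nil and resets $~ to nil
missing       = string =~ /[0-9]+-word/ # => nil
after_failure = $~                      # => nil
```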

You can also use Regexp#match:

	match = Regexp.compile('([0-9]+)-character').match(string)
	if match && match[1].to_i == string.length
	 "Yes, there are #{match[1]} characters in that string."
	end
	# => "Yes, there are 30 characters in that string."
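When you only need a yes-or-no answer, Ruby 2.4 and later also provide Regexp#match? and String#match?. They return true or false and, unlike =~ and match, do not build a MatchData object or update $~ and $1. A short sketch, assuming Ruby 2.4 or newer:

```ruby
string  = 'This is a 30-character string.'
pattern = /([0-9]+)-character/

found     = pattern.match?(string)       # => true
not_found = string.match?(/[0-9]+-word/) # => false

# match? leaves $~ alone, so it still refers to the earlier =~ match
'abc' =~ /b/
pattern.match?(string)
last_match_text = $~[0] # => "b"
```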

You can check a string against a series of regular expressions with a case statement:

	string = "123"

	case string
	when /^[a-zA-Z]+$/
	 "Letters"
	when /^[0-9]+$/
	 "Numbers"
	else
	 "Mixed"
	end
	# => "Numbers"
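A case statement works here because each when clause calls Regexp#=== on the string. That method performs a real match, so capture groups like $1 are available inside the branch that matched. A sketch of both points; the method name size_label is hypothetical:

```ruby
# Each `when /re/` clause is evaluated as /re/ === string
numbers_test = /^[0-9]+$/ === "123"    # => true
letters_test = /^[a-zA-Z]+$/ === "123" # => false

# Hypothetical helper: $1 is set by the when clause that matched
def size_label(string)
  case string
  when /\A([0-9]+)-character\z/ then "#{$1} characters"
  when /\A[0-9]+\z/             then "a bare number"
  else "something else"
  end
end

size_label("30-character") # => "30 characters"
size_label("123")          # => "a bare number"
size_label("abc")          # => "something else"
```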

Discussion

Regular expressions are a cryptic but powerful minilanguage for string matching and substring extraction. They've been around for a long time in Unix utilities like sed, but Perl is the language that made them a core feature of general-purpose programming. Today almost every modern language supports Perl-style regular expressions.

Ruby provides several ways of creating regular expressions. The following all create equivalent Regexp objects:

	/something/
	Regexp.new("something")
	Regexp.compile("something")
	%r{something}
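One way to check that these forms build the same pattern is to compare each object's source and options. The %r{} form is also handy when a pattern contains forward slashes, which would need escaping inside /.../. A minimal sketch:

```ruby
regexps = [
  /something/,
  Regexp.new("something"),
  Regexp.compile("something"), # Regexp.compile is an alias for Regexp.new
  %r{something}
]

# All four have the same pattern source and no modifier flags set
sources   = regexps.map(&:source).uniq  # => ["something"]
flag_sets = regexps.map(&:options).uniq # => [0]

# %r{} lets a pattern contain unescaped slashes
path_match = %r{/usr/local/bin} =~ "/usr/local/bin/ruby" # => 0
```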

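Ruby also supports modifiers that change how a regular expression matches. Each one has a constant in the Regexp class and a one-letter flag you can append to a literal.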

Table 1-1.
Constant	Flag	Effect
Regexp::IGNORECASE	i	Makes matches case-insensitive.
Regexp::MULTILINE	m	Normally, the . metacharacter does not match a newline. This modifier makes . match newlines too, so the regexp treats line breaks like any other character.
Regexp::EXTENDED	x	Ignores unescaped whitespace and allows # comments inside the regexp, so you can space out complex patterns to make them more legible.

Here's how to use these modifiers to create regular expressions:

	/something/mxi
	Regexp.new('something',
	 Regexp::EXTENDED | Regexp::IGNORECASE | Regexp::MULTILINE)
	%r{something}mxi
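Each modifier constant is a separate bit in an integer, which is why the constants can be combined with bitwise OR and why Regexp#options reports them as a single number. A sketch that checks the three forms above carry the same flags and tests for one flag with bitwise AND:

```ruby
# Each modifier constant is a distinct bit, so | combines them
flags = Regexp::EXTENDED | Regexp::IGNORECASE | Regexp::MULTILINE # => 7

literal = /something/mxi
built   = Regexp.new('something', flags)
percent = %r{something}mxi

# All three regexps carry the same option bits
same_options = [literal, built, percent].map(&:options).uniq # => [7]

# Bitwise AND tests for an individual modifier
ignores_case       = (literal.options & Regexp::IGNORECASE) != 0     # => true
plain_ignores_case = (/something/.options & Regexp::IGNORECASE) != 0 # => false
```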

Here's how the modifiers work:

	case_insensitive = /mangy/i
	case_insensitive =~ "I'm mangy!" # => 4
	case_insensitive =~ "Mangy Jones, at your service." # => 0

	multiline = /a.b/m
	multiline =~ "banana\nbanana" # => 5
	/a.b/ =~ "banana\nbanana" # => nil
	# But note: an explicit \n in the pattern matches a newline even without m
	/a\nb/ =~ "banana\nbanana" # => 5
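In Ruby the m modifier only changes what . matches. Unlike Perl, Ruby's ^ and $ always match at the start and end of each line, not of the whole string; \A and \z match only at the start and end of the string. This matters for validation checks like the ^...$ patterns in the Solution's case statement. A sketch, assuming Ruby 2.4 or newer for match?:

```ruby
text = "first line\nsecond line"

# ^ matches at the start of every line, with or without the m modifier
at_line_start   = /^second/ =~ text  # => 11
# \A matches only at the very start of the string
at_string_start = /\Asecond/ =~ text # => nil

# $ matches at the end of every line; \z only at the end of the string
at_line_end   = /line$/ =~ text  # => 6
at_string_end = /line\z/ =~ text # => 18

# So a ^...$ check can accept a string that is not all digits
loose  = /^[0-9]+$/.match?("123\nabc")   # => true
strict = /\A[0-9]+\z/.match?("123\nabc") # => false
```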

	extended = %r{ \ was # Match " was"
	               \s    # Match one whitespace character
	               a     # Match "a" }xi
	extended =~ "What was Alfred doing here?" # => 4
	extended =~ "My, that was a yummy mango." # => 8
	extended =~ "It was\n\n\na fool's errand" # => nil
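The x modifier pays off with longer patterns. As a sketch, here is the pattern from the Solution rewritten in extended mode with a comment on each piece (the variable name character_count is hypothetical), followed by a reminder that a literal space must be escaped in x mode:

```ruby
character_count = %r{
  ([0-9]+)     # one or more digits, captured as group 1
  -character   # the literal text "-character"
}x

string = 'This is a 30-character string.'
match  = character_count.match(string)
digits = match[1]       # => "30"
offset = match.begin(0) # => 10

# In x mode unescaped whitespace is ignored, so a literal space needs a backslash
ignored_space = /a b/x =~ "ab"   # => 0
escaped_space = /a\ b/x =~ "ab"  # => nil
escaped_match = /a\ b/x =~ "a b" # => 0
```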