Using Regular Expressions
The .NET Framework provides rich regular expression support within the base class library. Previously, regular expressions were supported only through third-party libraries for languages such as C++ and Visual Basic. In .NET, the regular expression syntax is fully supported, and a number of classes in the System.Text.RegularExpressions namespace are geared toward leveraging this powerful string processing language.
Understanding Expression Syntax
The hardest part of using regular expressions in .NET is gaining an initial understanding of the language. Unlike C#, the syntax for regular expressions is somewhat cryptic, and it takes time and practice to understand and apply correctly. The language itself is based on a set of escape sequences. These escape sequences translate into patterns that are used to recognize items within a source string. Each escape sequence has a particular meaning and often works in combination with other elements to create the overall expression. The best way to learn the language of regular expressions is to jump in with both feet.
How to Use Matching
The most basic use of regular expressions is that of matching substrings within a string. Matching can be for words,
\bA
Translated, it means "begin on a word boundary and find the letter A." The matches produced would be two matches for the letter A: the first letter of Angie and Albert. No other characters would be included in the match result. We could extend this pattern to locate all words that begin with the letter A. The regular expression to accomplish this would be as follows:
\bA\w+
This expression
If we want to find words that do not begin with the letter A, but rather contain the letter A or a, the following expression could be used:
\b[^aA\s]\w*[aA]\w+
This expression reads, "find words that do not begin with a or A or a space, that begin with letters or digits and contain either a or A, and are followed by one or more
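The three expressions discussed above can be tried out with a short program. The following is a minimal sketch, not part of the original listings: the sample sentence and the FindAll helper are hypothetical names chosen for illustration, and only the standard Regex.Matches call is assumed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchingSketch
{
    // Collects the text of every match of pattern within input.
    public static List<string> FindAll(string input, string pattern)
    {
        List<string> found = new List<string>();
        foreach (Match match in Regex.Matches(input, pattern))
        {
            found.Add(match.Value);
        }
        return found;
    }

    public static void Main()
    {
        // Hypothetical sample text, chosen only for illustration.
        string input = "Angie and Albert ate a banana";

        string[] patterns =
        {
            @"\bA",                 // the letter A at a word boundary
            @"\bA\w+",              // whole words that begin with A
            @"\b[^aA\s]\w*[aA]\w+"  // words that contain a or A after a different first character
        };

        foreach (string pattern in patterns)
        {
            List<string> found = FindAll(input, pattern);
            Console.WriteLine("{0} -> {1}", pattern, string.Join(", ", found.ToArray()));
        }
    }
}
```

Matching is case sensitive by default, so the lowercase words in the sample are not picked up by the first two patterns.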
Table 4.6. Regular Expression Single-Character Escape Sequences
The items in Table 4.6 are included for completeness and are probably not going to be part of normal usage. Table 4.7 lists the standard character classes for regular expressions.
Table 4.7. Regular Expression Character Classes
At this point, you have seen the basic character classes and escape sequences as well as some regular expressions. Next, you'll make use of the .NET Regex class to match input and display the results. Listing 4.5 matches various phone number input types and shows how to test and match for multiple formats.
Listing 4.5. Using the Regex Class for String Matching
using System;
using System.Text; //StringBuilder
using System.Text.RegularExpressions; //Regular Expression classes
namespace Listing_4_5
{
    class Class1
    {
        static void Main(string[] args)
        {
            string phoneNumber1 = "555-1212";
            string phoneNumber2 = "919-555-1212";
            string phoneNumber3 = "(919) 555-1212";
            string invalidPhoneNumberFormat = "919.555.1212";
            //Match the phone number for the following formats
            //1: 555-1212
            //2: 919-555-1212
            //3: (919) 555-1212
            StringBuilder expressionBuilder = new StringBuilder( );
            //The @ symbol is used so that \ is not treated as a C# escape character
            //start at beginning of line (^), 3 digits hyphen 4 digits
            expressionBuilder.Append( @"^\d{3}-\d{4}" );
            //or (|), another expression to follow
            expressionBuilder.Append( "|" );
            //start at beginning of line (^), 3 digits hyphen 3 digits hyphen 4 digits
            expressionBuilder.Append( @"^\d{3}-\d{3}-\d{4}" );
            //or (|), last expression to meet the criteria
            expressionBuilder.Append( "|" );
            //start at beginning of line (^), open paren, 3 digits, close paren,
            //space, 3 digits hyphen 4 digits
            //Note: the open and close parens must be escaped with the \ character.
            expressionBuilder.Append( @"^\(\d{3}\)\s\d{3}-\d{4}" );
            //Now we have the regular expression, create the Regex object
            Regex phoneMatchExpression = new Regex( expressionBuilder.ToString( ) );
            //Match the phone numbers
            if( phoneMatchExpression.Match( phoneNumber1 ).Success )
                Console.WriteLine( "phoneNumber1 matches" );
            else
                Console.WriteLine( "phoneNumber1 has invalid format" );
            if( phoneMatchExpression.Match( phoneNumber2 ).Success )
                Console.WriteLine( "phoneNumber2 matches" );
            else
                Console.WriteLine( "phoneNumber2 has invalid format" );
            if( phoneMatchExpression.Match( phoneNumber3 ).Success )
                Console.WriteLine( "phoneNumber3 matches" );
            else
                Console.WriteLine( "phoneNumber3 has invalid format" );
            if( phoneMatchExpression.Match( invalidPhoneNumberFormat ).Success )
                Console.WriteLine( "invalidPhoneNumberFormat matches" );
            else
                Console.WriteLine( "invalidPhoneNumberFormat has invalid format" );
        }
    }
}
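The patterns in Listing 4.5 are anchored only at the start of the input, so trailing characters after a valid prefix are not rejected. As a variation, the sketch below groups the same three formats into one alternation and anchors both ends with ^ and $. The class name, method name, and the extra test string 555-12123 are hypothetical and are not part of the original listing.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneFormatSketch
{
    // The same three formats as Listing 4.5, wrapped in a group so that
    // ^ and $ apply to every alternative rather than only the first and last.
    private static readonly Regex PhonePattern = new Regex(
        @"^(\d{3}-\d{4}|\d{3}-\d{3}-\d{4}|\(\d{3}\)\s\d{3}-\d{4})$" );

    // Returns true when the whole candidate string is in one of the accepted formats.
    public static bool IsValidPhone( string candidate )
    {
        return PhonePattern.IsMatch( candidate );
    }

    public static void Main()
    {
        string[] candidates =
        {
            "555-1212", "919-555-1212", "(919) 555-1212", "919.555.1212", "555-12123"
        };

        foreach( string candidate in candidates )
        {
            Console.WriteLine( "{0}: {1}", candidate,
                IsValidPhone( candidate ) ? "matches" : "has invalid format" );
        }
    }
}
```

Regex.IsMatch is used here because only a yes or no answer is needed; Match( ).Success, as in the listing, is equivalent for this purpose.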
Validating Data with Regular Expressions
Whenever you hear the phrase data validation, think of applying regular expressions. By using regular expressions, you can validate character ranges, length, and format. A useful example of this is validating passwords for standards compliance; for example, if the password must be
Table 4.8. Regular Expression Assertions
Armed with assertions, it is now possible to validate a password whose length is 8 to 12 characters and that must include at least one digit, one lowercase character, and one uppercase character. The following expression uses assertions to implement this validation:
^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,12}$
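To see lookahead assertions at work, the sketch below wraps a password check in a small helper. It assumes the rule stated in the text (8 to 12 characters) and additionally requires a digit, a lowercase letter, and an uppercase letter; the class name, method name, and sample passwords are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PasswordSketch
{
    // Each (?=...) is a zero-width lookahead: it checks a condition from the
    // start of the string without consuming characters. The final .{8,12}
    // then enforces the overall length between ^ and $.
    private static readonly Regex PasswordPattern = new Regex(
        @"^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,12}$" );

    public static bool IsValidPassword( string candidate )
    {
        return PasswordPattern.IsMatch( candidate );
    }

    public static void Main()
    {
        // Hypothetical sample passwords, for illustration only.
        string[] candidates =
        {
            "Passw0rdOk", "password1", "PASSWORD1", "Pw0rd", "Passw0rdIsTooLong"
        };

        foreach( string candidate in candidates )
        {
            Console.WriteLine( "{0}: {1}", candidate,
                IsValidPassword( candidate ) ? "valid" : "invalid" );
        }
    }
}
```

Because the lookaheads do not consume input, their order does not matter, and further rules (such as a required punctuation character) could be added as extra assertions.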
Grouping Matches
When parsing strings, locating substrings and quickly accessing them can be difficult with the System.String and System.StringBuilder classes. However, the regular expression grouping support allows for quick access to matched substrings. By creating a named
Grouping is accomplished by creating a named or unnamed group using the following syntax:
(?<group-name>pattern)
With the group name specified, a returned Match object contains a Groups collection that can be indexed by the name of the captured group. Listing 4.6 shows how to use grouping to parse and capture data from the web query string param1=data1&param2=data2.
Listing 4.6. Grouping with Regular Expressions
using System;
using System.Text.RegularExpressions;
namespace Listing_4_6 {
    class Class1 {
        static void Main(string[] args) {
            string queryString = "param1=data1&param2=data2";
            //Each named group captures the value that follows its parameter name
            Regex queryStringExpression = new Regex(
                @"param1=(?<param1>\w+[^&])&param2=(?<param2>\w+[^&])" );
            Match match = queryStringExpression.Match( queryString );
            if( match.Success ) {
                //display the group data
                Console.WriteLine( "param1 := {0}", match.Groups[ "param1" ].Value );
                Console.WriteLine( "param2 := {0}", match.Groups[ "param2" ].Value );
            }
        }
    }
}
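Listing 4.6 hard-codes the parameter names into the pattern. As a possible generalization, the sketch below uses two named groups and Regex.Matches to collect every name=value pair from a query string. The group names name and value, the class, and the Parse helper are illustrative assumptions rather than part of the original listing.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class QueryStringSketch
{
    // Each match captures one name=value pair. The value runs up to the
    // next & (or the end of the string).
    private static readonly Regex PairPattern =
        new Regex( @"(?<name>\w+)=(?<value>[^&]*)" );

    public static Dictionary<string, string> Parse( string queryString )
    {
        Dictionary<string, string> pairs = new Dictionary<string, string>();
        foreach( Match match in PairPattern.Matches( queryString ) )
        {
            // Groups can be indexed by the name given in (?<name>...).
            pairs[ match.Groups[ "name" ].Value ] = match.Groups[ "value" ].Value;
        }
        return pairs;
    }

    public static void Main()
    {
        foreach( KeyValuePair<string, string> pair in Parse( "param1=data1&param2=data2" ) )
        {
            Console.WriteLine( "{0} := {1}", pair.Key, pair.Value );
        }
    }
}
```

This sketch does not decode URL-escaped characters; it only demonstrates how named groups expose the captured substrings.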
Replacing Matched Strings
One of the more useful features of regular expression is the ability to implement
TIP
Unnamed groups are created whenever a pattern is
Looking back at the phone number validation example, if the phone number is 9195551212 and you want to display it as (919) 555-1212, you could use a matching expression in combination with a replacement expression. The code necessary to create the desired result is as follows:
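string phoneNumber = "9195551212";
string match = @"(\d{3})(\d{3})(\d{4})";
//$1, $2, and $3 refer to the first, second, and third unnamed groups
string replace = @"($1) $2-$3";
string result = Regex.Replace( phoneNumber, match, replace );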
After executing the Replace method, the result string would contain the newly formatted phone number.
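The same replacement can also be written with named groups, which are referenced in the replacement string as ${name}. The following is a minimal sketch under that assumption; the group names area, prefix, and line, along with the class and helper method, are hypothetical and chosen only for readability.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneReplaceSketch
{
    // Ten digits split into three named groups, anchored at both ends so
    // that only a complete ten-digit string is reformatted.
    private static readonly Regex TenDigits = new Regex(
        @"^(?<area>\d{3})(?<prefix>\d{3})(?<line>\d{4})$" );

    // Returns the formatted number, or the input unchanged if it does not match.
    public static string FormatPhone( string digits )
    {
        return TenDigits.Replace( digits, "(${area}) ${prefix}-${line}" );
    }

    public static void Main()
    {
        Console.WriteLine( FormatPhone( "9195551212" ) );
        Console.WriteLine( FormatPhone( "555-1212" ) );
    }
}
```

When the pattern does not match, Replace returns the input string as is, so callers that need to detect invalid input would still check IsMatch first.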