|
The most basic use of regular expressions is matching substrings within a string. A match can target words, characters, or any other sequence you need. For example, given the string "Angie called Albert", we could begin by locating the capital letter A only where it begins a word. The following regular expression does just that:

    \bA

Translated, it means "begin on a word boundary and find the letter A." It produces two matches, each consisting of the single letter A: the first letter of Angie and the first letter of Albert. No other characters are included in the match results.

We can extend this pattern to locate all words that begin with the letter A:

    \bA\w+

This expression translates to "begin at a word boundary and locate the letter A followed by one or more word characters (letters, digits, or underscores)." It again produces two matches, this time Angie and Albert.

To find words that do not begin with the letter A but do contain the letter A or a, the following expression could be used:

    \b[^aA\s]\w*[aA]\w+

This expression reads, "start at a word boundary with a character that is not a, A, or whitespace; continue through zero or more word characters; then find either a or A followed by one or more word characters." Against the sample string it matches only called. Because of the trailing \w+, a word whose only a is its last letter is not matched.

Table 4.6 lists the basic escape sequences for regular expressions.
The items in Table 4.6 are included for completeness and are unlikely to come up in everyday use. Table 4.7 lists the standard character classes for regular expressions.
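As a quick check of the three word-matching patterns discussed earlier, the following sketch runs each one against "Angie called Albert" using Regex.Matches. The class and method names (WordMatchDemo, FindAll) are made up for illustration and do not come from the numbered listings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
static class WordMatchDemo
{
    // Returns the text of every match of pattern within input.
    public static List<string> FindAll(string input, string pattern)
    {
        List<string> results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            results.Add(m.Value);
        }
        return results;
    }

    static void Main()
    {
        string input = "Angie called Albert";
        string[] patterns = { @"\bA", @"\bA\w+", @"\b[^aA\s]\w*[aA]\w+" };

        foreach (string pattern in patterns)
        {
            // Print each pattern followed by its comma-separated matches.
            Console.WriteLine("{0} -> {1}", pattern,
                string.Join(", ", FindAll(input, pattern)));
        }
    }
}
```

The first pattern should yield A twice, the second Angie and Albert, and the third only called.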
At this point, you have seen the basic character classes and escape sequences as well as some regular expressions. Next, you'll use the .NET Regex class to match input and display the results. Listing 4.5 tests several phone number formats and shows how to combine multiple formats in a single expression.

Listing 4.5. Using the Regex Class for String Matching

    using System;
    using System.Text;                      //StringBuilder
    using System.Text.RegularExpressions;   //Regular expression classes

    namespace Listing_4_5 {
        class Class1 {
            static void Main(string[] args) {
                string phoneNumber1 = "555-1212";
                string phoneNumber2 = "919-555-1212";
                string phoneNumber3 = "(919) 555-1212";
                string invalidPhoneNumberFormat = "919.555.1212";

                //Match the phone number for the following formats
                //1: 555-1212
                //2: 919-555-1212
                //3: (919) 555-1212
                StringBuilder expressionBuilder = new StringBuilder( );

                //The @ symbol creates a verbatim string, so the \ is not
                //treated as a C# escape character.
                //Start of input (^), 3 digits, hyphen, 4 digits, end of input ($).
                //The $ anchor rejects trailing characters such as 555-12345.
                expressionBuilder.Append( @"^\d{3}-\d{4}$" );

                //or, another expression to follow
                expressionBuilder.Append( "|" );

                //Start of input (^), 3 digits, hyphen, 3 digits, hyphen,
                //4 digits, end of input ($)
                expressionBuilder.Append( @"^\d{3}-\d{3}-\d{4}$" );

                //or, the last expression to meet the criteria
                expressionBuilder.Append( "|" );

                //Start of input (^), open paren, 3 digits, close paren,
                //space, 3 digits, hyphen, 4 digits, end of input ($).
                //Note: the open and close parens must be escaped
                //with the \ character.
                expressionBuilder.Append( @"^\(\d{3}\)\s\d{3}-\d{4}$" );

                //Now we have the regular expression; create the Regex object
                Regex phoneMatchExpression = new Regex( expressionBuilder.ToString( ) );

                //Match the phone numbers
                if( phoneMatchExpression.Match( phoneNumber1 ).Success )
                    Console.WriteLine( "phoneNumber1 matches" );
                else
                    Console.WriteLine( "phoneNumber1 has invalid format" );

                if( phoneMatchExpression.Match( phoneNumber2 ).Success )
                    Console.WriteLine( "phoneNumber2 matches" );
                else
                    Console.WriteLine( "phoneNumber2 has invalid format" );

                if( phoneMatchExpression.Match( phoneNumber3 ).Success )
                    Console.WriteLine( "phoneNumber3 matches" );
                else
                    Console.WriteLine( "phoneNumber3 has invalid format" );

                if( phoneMatchExpression.Match( invalidPhoneNumberFormat ).Success )
                    Console.WriteLine( "invalidPhoneNumberFormat matches" );
                else
                    Console.WriteLine( "invalidPhoneNumberFormat has invalid format" );
            }
        }
    }

Validating Data with Regular Expressions

Whenever you hear the phrase data validation, think of applying regular expressions. With regular expressions, you can validate character ranges, length, and format. A useful example is validating passwords for standards compliance, such as requiring that a password be alphanumeric with at least one uppercase letter and one digit. To handle such validation, it is necessary to understand and apply regular expression assertions. Table 4.8 lists the assertions and their descriptions.
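The password rule mentioned above (alphanumeric, with at least one uppercase letter and one digit) can be sketched with two positive lookahead assertions. This is a minimal illustration under the assumption that "alphanumeric" means ASCII letters and digits only; the PasswordRuleDemo class and its IsValid method are hypothetical names, not part of the chapter's listings.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
static class PasswordRuleDemo
{
    // (?=.*[A-Z])   lookahead: an uppercase letter appears somewhere ahead
    // (?=.*[0-9])   lookahead: a digit appears somewhere ahead
    // [A-Za-z0-9]+  only ASCII letters and digits between ^ and $
    private static readonly Regex Rule =
        new Regex(@"^(?=.*[A-Z])(?=.*[0-9])[A-Za-z0-9]+$");

    public static bool IsValid(string password)
    {
        return Rule.IsMatch(password);
    }

    static void Main()
    {
        // Assumed sample inputs: one valid, three that each break one rule.
        string[] samples = { "Secret42", "secret42", "SECRETXY", "Secret 42" };
        foreach (string s in samples)
        {
            Console.WriteLine("{0}: {1}", s, IsValid(s));
        }
    }
}
```

Each lookahead tests one requirement without consuming any input, so the requirements can appear in any order within the password. Note that in .NET the $ anchor also matches just before a final newline; \z is the stricter end-of-input anchor if that matters.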
Armed with assertions, it is now possible to validate a password that must be 8 to 12 characters long and include at least one uppercase letter and one digit. The following expression uses assertions to implement this validation; as written, it also requires at least one lowercase letter:

    ^(?=.*\d+)(?=.*[a-z]+)(?=.*[A-Z]+).{8,12}$

Each (?=...) lookahead checks one requirement without consuming any characters, and .{8,12} between the ^ and $ anchors enforces the length.

Grouping Matches

When parsing strings, locating substrings and accessing them quickly can be difficult with System.String and System.StringBuilder alone. Regular expression grouping support, however, allows quick access to matched substrings. By creating a named group, you can access the data captured by that part of the pattern by name. Such grouping makes a task such as parsing web query string parameters very simple. A named group is created with the following syntax:

    (?<group-name>pattern)

With the group name specified, a returned Match object contains a Groups collection that can be indexed by the name of the captured group. Listing 4.6 shows how to use grouping to parse and capture data from the web query string param1=data1&param2=data2.

Listing 4.6. Grouping with Regular Expressions

    using System;
    using System.Text.RegularExpressions;

    namespace Listing_4_6 {
        class Class1 {
            static void Main(string[] args) {
                string queryString = "param1=data1&param2=data2";

                //Each named group captures one or more characters
                //up to the next & separator (or the end of the input)
                Regex queryStringExpression =
                    new Regex( @"param1=(?<param1>[^&]+)&param2=(?<param2>[^&]+)" );

                Match match = queryStringExpression.Match( queryString );
                if( match.Success ) {
                    //display the group data
                    Console.WriteLine( "param1 := {0}", match.Groups[ "param1" ].Value );
                    Console.WriteLine( "param2 := {0}", match.Groups[ "param2" ].Value );
                }
            }
        }
    }

Replacing Matched Strings

One of the more useful features of regular expressions is the ability to implement search-and-replace-style functionality with a very powerful language: regular expressions themselves.
The replacement works for both named and unnamed groups and allows a new string to be created from the matched pattern and a supplied replacement expression.

TIP

Unnamed groups are created whenever a pattern is enclosed in parentheses. These unnamed groups are assigned one-based ordinal numbers and can be referenced in a replacement expression as $1, $2, $3, ….

Looking back at the phone number validation example, if the phone number is 9195551212 and you want to display it as (919) 555-1212, you could use a matching expression in combination with a replacement expression. The code necessary to create the desired result is as follows:

    string phoneNumber = "9195551212";
    string match = @"(\d{3})(\d{3})(\d{4})";
    string replace = @"($1) $2-$3";
    string result = Regex.Replace( phoneNumber, match, replace );

After the Replace method executes, result contains the newly formatted phone number, (919) 555-1212. |
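The text notes that replacement also works with named groups. A minimal sketch of that, assuming the same ten-digit input, uses the ${name} substitution syntax in the replacement expression. The group names (area, prefix, line) and the NamedReplaceDemo class are hypothetical and chosen only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, for illustration only.
static class NamedReplaceDemo
{
    // Reformats exactly ten digits as (area) prefix-line.
    // Input that does not match the pattern is returned unchanged.
    public static string FormatPhone(string digits)
    {
        string pattern = @"^(?<area>\d{3})(?<prefix>\d{3})(?<line>\d{4})$";
        string replacement = "(${area}) ${prefix}-${line}";
        return Regex.Replace(digits, pattern, replacement);
    }

    static void Main()
    {
        Console.WriteLine(FormatPhone("9195551212")); // prints (919) 555-1212
    }
}
```

Named references tend to stay readable when a pattern grows or its groups are reordered, whereas numbered references such as $1 must be updated to follow the new group positions.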
|