Recipe 10.2. Extracting Groups from a MatchCollection

Problem

You have a regular expression that contains one or more named groups, such as the following:

 \\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\ 

where the named group TheServer will match any server name within a UNC string, and TheService will match any service name within a UNC string.

You need to store the groups that are returned by this regular expression in a keyed collection (such as a Dictionary<string, Group>) in which the key is the group name.

Solution

The ExtractGroupings method shown in Example 10-3 obtains a set of Group objects keyed by their matching group name.

Example 10-3. ExtractGroupings method

 using System;
 using System.Collections.Generic;
 using System.Text.RegularExpressions;

 public static List<Dictionary<string, Group>> ExtractGroupings(string source,
                                                                string matchPattern,
                                                                bool wantInitialMatch)
 {
     List<Dictionary<string, Group>> keyedMatches =
         new List<Dictionary<string, Group>>();

     // Group 0 is the complete match; skip it unless the caller asks for it.
     int startingElement = 1;
     if (wantInitialMatch)
     {
         startingElement = 0;
     }

     Regex RE = new Regex(matchPattern, RegexOptions.Multiline);
     MatchCollection theMatches = RE.Matches(source);

     foreach (Match m in theMatches)
     {
         Dictionary<string, Group> groupings = new Dictionary<string, Group>();

         for (int counter = startingElement; counter < m.Groups.Count; counter++)
         {
             // If we had just returned the MatchCollection directly, the
             // GroupNameFromNumber method would not be available to use.
             groupings.Add(RE.GroupNameFromNumber(counter), m.Groups[counter]);
         }

         keyedMatches.Add(groupings);
     }

     return (keyedMatches);
 }

The ExtractGroupings method can be used in the following manner to extract named groups and organize them by name:

 public static void TestExtractGroupings()
 {
     string source = @"Path = ""\\MyServer\MyService\MyPath;
                               \\MyServer2\MyService2\MyPath2\""";
     string matchPattern = @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";

     foreach (Dictionary<string, Group> grouping in
              ExtractGroupings(source, matchPattern, true))
     {
         foreach (KeyValuePair<string, Group> kvp in grouping)
             Console.WriteLine("Key / Value = " + kvp.Key + " / " + kvp.Value);
         Console.WriteLine("");
     }
 }

This test method creates a source string and a regular expression pattern in the matchPattern variable. The two groupings in this regular expression, (?<TheServer>\w*) and (?<TheService>\w*), appear here:

 string matchPattern = @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\"; 

The names for these two groups are: TheServer and TheService. Text that matches either of these groupings can be accessed through these group names.

The source and matchPattern variables are passed in to the ExtractGroupings method, along with a Boolean value, which is discussed shortly. This method returns a List<T> containing Dictionary<string,Group> objects, one per match. Each Dictionary<string,Group> object contains the Group objects for the named groups in the regular expression, keyed by group name.

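This test method, TestExtractGroupings, displays the following (the order of the lines within each grouping can vary, because Dictionary<TKey,TValue> does not guarantee an enumeration order):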

 Key / Value = 0 / \\MyServer\MyService\
 Key / Value = TheService / MyService
 Key / Value = TheServer / MyServer

 Key / Value = 0 / \\MyServer2\MyService2\
 Key / Value = TheService / MyService2
 Key / Value = TheServer / MyServer2

If the last parameter to the ExtractGroupings method were to be changed to false, the following output would result:

 Key / Value = TheService / MyService
 Key / Value = TheServer / MyServer

 Key / Value = TheService / MyService2
 Key / Value = TheServer / MyServer2

The only difference between these two outputs is that the first grouping (the one with the key 0) is not displayed when the last parameter to ExtractGroupings is changed to false. The first grouping in each Match is always the complete match of the regular expression.
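The relationship between the key 0 and the complete match can be checked with a minimal sketch like the one below. It reuses the pattern from this recipe; the WholeMatchDemo class and the DescribeGroupZero method are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WholeMatchDemo
{
    // Pattern taken from this recipe.
    private const string Pattern = @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";

    // Hypothetical helper: returns "name=value" for group 0 of the first
    // match, or null when the source contains no match.
    public static string DescribeGroupZero(string source)
    {
        Regex re = new Regex(Pattern);
        Match m = re.Match(source);
        if (!m.Success)
        {
            return null;
        }

        // Group 0 has no user-supplied name, so its name is the string "0",
        // and its value is the same text as the Match itself.
        return re.GroupNameFromNumber(0) + "=" + m.Groups[0].Value;
    }
}
```

Calling WholeMatchDemo.DescribeGroupZero(@"\\MyServer\MyService\MyPath") should produce the name 0 followed by the full matched text, which is why that entry appears only when wantInitialMatch is true.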

Discussion

Groups within a regular expression can be defined in one of two ways. The first way is to add parentheses around the subpattern that you wish to define as a grouping. This type of grouping is sometimes labeled as unnamed. This grouping can later be easily extracted from the final text in each Match object returned by running the regular expression. The regular expression for this recipe could be modified, as follows, to use a simple unnamed group:

 string matchPattern = @"\\\\(\w*)\\(\w*)\\"; 

After running the regular expression, you can access these groups using an integer index starting at 1 (index 0 holds the complete match).
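As a rough illustration of index-based access, the sketch below runs the unnamed-group pattern against a single UNC path. The UnnamedGroupDemo class and the GetServerAndService method are hypothetical names, not part of the recipe.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnnamedGroupDemo
{
    // Hypothetical helper: returns the text captured by the two unnamed
    // groups of the first match, or null when there is no match.
    public static string[] GetServerAndService(string uncPath)
    {
        Match m = Regex.Match(uncPath, @"\\\\(\w*)\\(\w*)\\");
        if (!m.Success)
        {
            return null;
        }

        // Index 0 is the whole match; the unnamed groups start at index 1
        // and are numbered left to right by their opening parenthesis.
        return new[] { m.Groups[1].Value, m.Groups[2].Value };
    }
}
```

For the input @"\\MyServer\MyService\MyPath", the returned array should hold MyServer followed by MyService.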

The second way to define a group within a regular expression is to use one or more named groups. A named group is defined by adding parentheses around the subpattern that you wish to define as a grouping and, additionally, adding a name to each grouping, using the following syntax:

 (?<Name>\w*) 

The Name portion of this syntax is the name you specify for this group. After executing this regular expression, you can access this group by the name Name.
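A named group still receives a number, which is what lets the Solution walk the groups by index and recover each name. The sketch below shows both directions using the recipe's pattern; the NamedGroupDemo class and its two methods are hypothetical names used only for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Pattern taken from this recipe.
    private static readonly Regex UncRegex =
        new Regex(@"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\");

    // Hypothetical helper: looks a group up by name on the first match.
    // Returns null when the input contains no match.
    public static string GetService(string uncPath)
    {
        Match m = UncRegex.Match(uncPath);
        return m.Success ? m.Groups["TheService"].Value : null;
    }

    // Hypothetical helper: maps a group name to the number the regex
    // engine assigned to it. With no unnamed groups in the pattern, the
    // named groups are numbered left to right starting at 1.
    public static int GetGroupNumber(string groupName)
    {
        return UncRegex.GroupNumberFromName(groupName);
    }
}
```

With this pattern, GetGroupNumber("TheServer") and GetGroupNumber("TheService") should return 1 and 2, so m.Groups[2] and m.Groups["TheService"] refer to the same group.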

To access each group, you must first use a loop to iterate over each Match object in the MatchCollection. For each Match object, you access the GroupCollection's indexer, using the following unnamed syntax:

 string group1 = m.Groups[1].Value;
 string group2 = m.Groups[2].Value;

or the following named syntax where m is the Match object:

 string group1 = m.Groups["Group1_Name"].Value;
 string group2 = m.Groups["Group2_Name"].Value;

If the Match method was used to return a single Match object instead of the MatchCollection, use the following syntax to access each group:

 // Unnamed syntax
 string group1 = theMatch.Groups[1].Value;
 string group2 = theMatch.Groups[2].Value;

 // Named syntax
 string group1 = theMatch.Groups["Group1_Name"].Value;
 string group2 = theMatch.Groups["Group2_Name"].Value;

where theMatch is the Match object returned by the Match method.
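When only a single Match is requested, it is usually worth checking its Success property before reading any group, since the Match method returns an unsuccessful Match rather than null when nothing is found. The following sketch assumes the recipe's pattern; SingleMatchDemo and TryGetServer are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SingleMatchDemo
{
    // Pattern taken from this recipe.
    private const string Pattern = @"\\\\(?<TheServer>\w*)\\(?<TheService>\w*)\\";

    // Hypothetical helper: reads the TheServer group from the first match
    // only. Returns false and sets server to null when nothing matches.
    public static bool TryGetServer(string source, out string server)
    {
        Match theMatch = Regex.Match(source, Pattern);

        // Read the group only when the match actually succeeded.
        server = theMatch.Success ? theMatch.Groups["TheServer"].Value : null;
        return theMatch.Success;
    }
}
```

Because Match stops at the first occurrence, a source string containing both \\MyServer\MyService\ and \\MyServer2\MyService2\ should yield only MyServer here; the MatchCollection approach in the Solution is the one to use when every occurrence is needed.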

See Also

See the ".NET Framework Regular Expressions" and "Hashtable Class" topics in the MSDN documentation.



C# Cookbook