Recipe 10.1. Enumerating Matches
Problem
You need to find one or more substrings that match a particular pattern within a string. You also need to be able to tell the searching code whether to return all matching substrings or only the unique ones, so that each distinct matched string appears just once.
Solution
Call the FindSubstrings method shown in Example 10-1, which executes a regular expression and obtains all of the matching text. The findAllUnique parameter controls whether the method returns all matches or only the unique matches. Note that when findAllUnique is set to true, the unique matches are returned sorted alphabetically.
Example 10-1. FindSubstrings method
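As a minimal sketch of how a method like FindSubstrings could work, reconstructed from the description above rather than copied from Example 10-1, the following assumes the signature FindSubstrings(string source, string matchPattern, bool findAllUnique). Only findAllUnique is named in the recipe; the other parameter names, the input "<a><b><a>", and the pattern <[^>]*> are hypothetical. The sketch calls Regex.Matches to collect every match and, when findAllUnique is true, keys a non-generic SortedList on each match's Value so that repeated strings are dropped and the rest come back sorted. It uses C# 9 top-level statements so it runs as a small standalone program.

```csharp
using System;
using System.Collections;
using System.Text.RegularExpressions;

// Usage with a hypothetical input; requires C# 9 or later for top-level statements.
string text = "<a><b><a>";
Match[] allTags = FindSubstrings(text, "<[^>]*>", false);
Match[] uniqueTags = FindSubstrings(text, "<[^>]*>", true);
Console.WriteLine(string.Join(" ", Array.ConvertAll(allTags, m => m.Value)));    // <a> <b> <a>
Console.WriteLine(string.Join(" ", Array.ConvertAll(uniqueTags, m => m.Value))); // <a> <b>

// Regex.Match (the instance method) returns only the first match.
Match first = new Regex("<[^>]*>").Match(text);
Console.WriteLine(first.Value); // <a>

// Sketch of FindSubstrings reconstructed from the recipe's description,
// not the book's own listing. Parameter names other than findAllUnique are assumed.
static Match[] FindSubstrings(string source, string matchPattern, bool findAllUnique)
{
    Regex regex = new Regex(matchPattern);
    MatchCollection matches = regex.Matches(source);

    if (!findAllUnique)
    {
        // Every match, in the order it occurs in source.
        Match[] allMatches = new Match[matches.Count];
        matches.CopyTo(allMatches, 0);
        return allMatches;
    }

    // SortedList keeps its keys sorted; keying on the matched text drops
    // duplicates, and the first Match seen for each string is kept.
    SortedList uniqueMatches = new SortedList();
    foreach (Match m in matches)
    {
        if (!uniqueMatches.ContainsKey(m.Value))
        {
            uniqueMatches.Add(m.Value, m);
        }
    }

    Match[] result = new Match[uniqueMatches.Count];
    uniqueMatches.Values.CopyTo(result, 0);
    return result;
}
```

Because the non-generic SortedList orders string keys with Comparer.Default, which is culture-sensitive, keys that differ mainly in punctuation (such as < and </ tags) may sort differently under different cultures. Passing StringComparer.Ordinal to the SortedList constructor gives a fixed, culture-independent order.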
The TestFindSubstrings method shown in Example 10-2 searches for tags in an XML string by looking for blocks of text that begin with the < character and end with the > character. It first displays all unique tag matches in the XML string and then displays every tag match in the string.
Example 10-2. The TestFindSubstrings method
The following text will be displayed:
UNIQUE MATCHES
<!-- My comment -->
<![CDATA[<escaped> <><chars>>>>>]]>
</Control>
</Window>
<?xml version="1.0\" encoding=\"UTF-8\"?>
<Control >
<Control >
<Property Top="0" Left="0" Caption="Enter Name Here"/>
<Property Top="0" Left="0" Text="BLANK"/>
<Window >
ALL MATCHES
<?xml version="1.0\" encoding=\"UTF-8\"?>
<!-- my comment -->
<![CDATA[<escaped> <><chars>>>>>]]>
<Window >
<Control >
<Property Top="0" Left="0" Text="BLANK"/>
</Control>
<Control >
<Property Top="0" Left="0" Caption="Enter Name Here"/>
</Control>
<Control >
<Property Top="0" Left="0" Caption="Enter Name Here"/>
</Control>
</Window>
Discussion
As you can see, the regular expression classes in the FCL are easy to use. The first step is to create an instance of the Regex class that holds the regular expression pattern, along with any options for running it. The second step is to call one of two instance methods on that Regex object: Match, if you need only the first match, or Matches, if you need every match. Match returns a single Match object, and Matches returns a MatchCollection that holds one Match object per match found. The FindSubstrings method returns an array of Match objects that the calling code can use.
You may have noticed that the unique matches are returned sorted, whereas the full list of matches is returned in the order in which the matches occur in the string. FindSubstrings stores the unique matched strings in a SortedList, which keeps its entries sorted by key as they are added.
The regular expression used in the TestFindSubstrings method, <.*>, is very simplistic and will work in most, but not all, conditions.
For example, if two tags are on the same line, as in <tagData></tagData>, the regular expression matches the entire line rather than each tag separately, because .* is greedy and runs to the last > on the line. You could change the regular expression from <.*> to <[^>]*> so that each match stops at the first closing > ([^>]* matches any run of characters other than >). However, this fails in the CDATA section, where it matches <![CDATA[<escaped>, <>, and <chars> separately instead of the whole <![CDATA[<escaped> <><chars>>>>>]]> section. The more complicated pattern @"(<!\[CDATA.*>|<[^>]*>)" matches either <!\[CDATA.*> (a greedy match that runs to the last > on the line, which covers the entire CDATA section when it sits on a line of its own) or <[^>]*>, described previously.
See Also
See the ".NET Framework Regular Expressions" and "SortedList Class" topics in the MSDN documentation.
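To see how the three patterns discussed in this recipe differ, the following sketch (again using C# 9 top-level statements) runs each one against two hypothetical sample lines: the <tagData></tagData> example and a CDATA line like the one in the recipe's output. MatchValues is a hypothetical helper that returns the text of each match in order.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample lines: two tags on one line, and a CDATA section
// like the one in the recipe's output.
string sameLine = "<tagData></tagData>";
string cdataLine = "<![CDATA[<escaped> <><chars>>>>>]]>";

// The three patterns from the discussion, from simplest to most specific.
string[] patterns = { "<.*>", "<[^>]*>", @"(<!\[CDATA.*>|<[^>]*>)" };

foreach (string pattern in patterns)
{
    Console.WriteLine($"Pattern: {pattern}");
    Console.WriteLine("  " + string.Join(" | ", MatchValues(sameLine, pattern)));
    Console.WriteLine("  " + string.Join(" | ", MatchValues(cdataLine, pattern)));
}

// Hypothetical helper: the text of every match, in order of occurrence.
static string[] MatchValues(string input, string regexPattern) =>
    Regex.Matches(input, regexPattern).Cast<Match>().Select(m => m.Value).ToArray();
```

Here <.*> returns each line as a single match, <[^>]*> splits both lines at every >, and the combined pattern splits the tagData line into its two tags while keeping the CDATA line whole. The combined pattern relies on the CDATA section sitting on one line: . does not match a newline unless RegexOptions.Singleline is set, and the greedy .* runs to the last > on the line, so any tag that follows the CDATA section on the same line would be absorbed into the same match.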