Recipe 2.5. Finding the Location of All Occurrences of a String Within Another String

Problem

You need to search a string for every occurrence of a specific string. In addition, the case sensitivity, or insensitivity, of the search needs to be controlled.

Solution

Using IndexOf or IndexOfAny in a loop, you can determine how many occurrences of a character or string exist as well as their locations within the string. To find each occurrence of a string in another string using a case-sensitive search, use the following code:

    using System;
    using System.Collections.Generic;

    public static int[] FindAll(string matchStr, string searchedStr, int startPos)
    {
        int foundPos = -1; // -1 represents not found.
        int count = 0;
        List<int> foundItems = new List<int>();

        do
        {
            foundPos = searchedStr.IndexOf(matchStr, startPos);
            if (foundPos > -1)
            {
                startPos = foundPos + 1;
                count++;
                foundItems.Add(foundPos);

                Console.WriteLine("Found item at position: " + foundPos);
            }
        } while (foundPos > -1 && startPos < searchedStr.Length);

        return foundItems.ToArray();
    }

If the FindAll method is called with the following parameters:

 int[] allOccurrences = FindAll("Red", "BlueTealRedredGreenRedYellow", 0); 

the string "Red" is found at locations 8 and 19 in the string searchedStr. This code uses the IndexOf method inside a do loop to iterate through each occurrence of the matchStr string in the searchedStr string.

To find a character in a string using a case-sensitive search, use the following code:

    public static int[] FindAll(char matchChar, string searchedStr, int startPos)
    {
        int foundPos = -1; // -1 represents not found.
        int count = 0;
        List<int> foundItems = new List<int>();

        do
        {
            foundPos = searchedStr.IndexOf(matchChar, startPos);
            if (foundPos > -1)
            {
                startPos = foundPos + 1;
                count++;
                foundItems.Add(foundPos);

                Console.WriteLine("Found item at position: " + foundPos);
            }
        } while (foundPos > -1 && startPos < searchedStr.Length);

        return foundItems.ToArray();
    }

If the FindAll method is called with the following parameters:

 int[] allOccurrences = FindAll('r', "BlueTealRedredGreenRedYellow", 0); 

the character 'r' is found at locations 11 and 15 in the string searchedStr. This code uses the IndexOf method inside a do loop to iterate through each occurrence of the matchChar character in the searchedStr string. Overloading the FindAll method to accept either a char or a string avoids the cost of converting the char to a string before searching.

To find each occurrence of a string in another string using a case-insensitive search, use the following code:

    public static int[] FindAny(string matchStr, string searchedStr, int startPos)
    {
        int foundPos = -1; // -1 represents not found.
        int count = 0;
        List<int> foundItems = new List<int>();

        // Factor out case-sensitivity
        searchedStr = searchedStr.ToUpper();
        matchStr = matchStr.ToUpper();

        do
        {
            foundPos = searchedStr.IndexOf(matchStr, startPos);
            if (foundPos > -1)
            {
                startPos = foundPos + 1;
                count++;
                foundItems.Add(foundPos);

                Console.WriteLine("Found item at position: " + foundPos);
            }
        } while (foundPos > -1 && startPos < searchedStr.Length);

        return foundItems.ToArray();
    }

If the FindAny method is called with the following parameters:

 int[] allOccurrences = FindAny("Red", "BlueTealRedredGreenRedYellow", 0); 

the string "Red" is found at locations 8, 11, and 19 in the string searchedStr. This code uses the IndexOf method inside a do loop to iterate through each occurrence of the matchStr string in the searchedStr string. The search is rendered case-insensitive by using the ToUpper method on both the searchedStr and the matchStr strings.
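As an alternative to uppercasing both strings, the comparison mode can be passed directly to IndexOf. The following is a minimal sketch, not part of the original recipe: the class name CaseInsensitiveSearch and the method name FindAnyOrdinal are hypothetical, and it assumes that an ordinal (culture-independent) case-insensitive comparison is acceptable for your data. It relies on the String.IndexOf(String, Int32, StringComparison) overload.

```csharp
using System;
using System.Collections.Generic;

public static class CaseInsensitiveSearch
{
    // Hypothetical variant of the recipe's FindAny(string, string, int).
    // Assumption: ordinal, case-insensitive matching is what the caller wants.
    public static int[] FindAnyOrdinal(string matchStr, string searchedStr, int startPos)
    {
        List<int> foundItems = new List<int>();

        // An empty match string would "match" at every position, so report nothing.
        if (string.IsNullOrEmpty(matchStr))
        {
            return foundItems.ToArray();
        }

        int foundPos = -1; // -1 represents not found.
        do
        {
            // No ToUpper copies are made; the comparison mode handles case.
            foundPos = searchedStr.IndexOf(matchStr, startPos,
                                           StringComparison.OrdinalIgnoreCase);
            if (foundPos > -1)
            {
                foundItems.Add(foundPos);
                startPos = foundPos + 1; // Step past the start of this match.
            }
        } while (foundPos > -1 && startPos < searchedStr.Length);

        return foundItems.ToArray();
    }
}
```

Calling CaseInsensitiveSearch.FindAnyOrdinal("Red", "BlueTealRedredGreenRedYellow", 0) should report the same positions as the ToUpper version (8, 11, and 19) without creating uppercase copies of either string.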

To find a set of characters in a string, use the following code:

    public static int[] FindAny(char[] matchCharArray, string searchedStr,
                                int startPos)
    {
        int foundPos = -1; // -1 represents not found.
        int count = 0;
        List<int> foundItems = new List<int>();

        do
        {
            foundPos = searchedStr.IndexOfAny(matchCharArray, startPos);
            if (foundPos > -1)
            {
                startPos = foundPos + 1;
                count++;
                foundItems.Add(foundPos);

                Console.WriteLine("Found item at position: " + foundPos);
            }
        } while (foundPos > -1 && startPos < searchedStr.Length);

        return foundItems.ToArray();
    }

If the FindAny method is called with the following parameters:

    int[] allOccurrences = FindAny(new char[] {'R', 'r'},
                                   "BlueTealRedredGreenRedYellow", 0);

the characters 'r' and 'R' are found at locations 8, 11, 15, and 19 in the string searchedStr. This code uses the IndexOfAny method inside a do loop to iterate through each occurrence of any character in the array within the searchedStr string. The search is rendered case-insensitive by using an array of char containing all characters, both uppercase and lowercase, to be searched for.

Discussion

In the example code, the foundPos variable contains the location of the found character/string within the searchedStr string. The startPos variable contains the next position at which to start the search. The IndexOf or IndexOfAny method is used to perform the actual searching. The count variable simply counts the number of times the character/string was found in the searchedStr string.

The example uses a do loop so that the IndexOf or IndexOfAny operation is executed at least once before the check in the while clause is performed. This check determines whether there are any more character/string matches to be found in the searchedStr string. The loop terminates when foundPos is -1 (meaning that no more matches can be found in the searchedStr string) or when startPos reaches the end of the string. When foundPos equals -1, there are no more instances of the match value in the searchedStr string; therefore, you can exit the loop. If, however, startPos is moved past the end of the searchedStr string, an out-of-bounds condition exists: IndexOf and IndexOfAny throw an ArgumentOutOfRangeException when the starting index is negative or greater than the length of the string. To prevent this, always check that any positioning variables that are modified inside the loop, such as the startPos variable, are within their intended bounds.
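The bounds check described above can also be applied to the caller-supplied starting position before the loop begins. The following is a rough sketch under stated assumptions: the class name BoundsCheckedSearch and the method name FindAllChecked are hypothetical, and the choice to throw on a bad startPos (rather than return an empty array) is only one possible design.

```csharp
using System;
using System.Collections.Generic;

public static class BoundsCheckedSearch
{
    // Hypothetical variant of the recipe's FindAll(char, string, int) that
    // validates its arguments before searching.
    public static int[] FindAllChecked(char matchChar, string searchedStr, int startPos)
    {
        if (searchedStr == null)
        {
            throw new ArgumentNullException("searchedStr");
        }

        // Reject a starting position outside the string up front, with a clear message.
        if (startPos < 0 || startPos > searchedStr.Length)
        {
            throw new ArgumentOutOfRangeException("startPos",
                "startPos must be between 0 and the length of searchedStr.");
        }

        List<int> foundItems = new List<int>();

        // Checking startPos first means IndexOf is never called at or past the end.
        while (startPos < searchedStr.Length)
        {
            int foundPos = searchedStr.IndexOf(matchChar, startPos);
            if (foundPos < 0)
            {
                break; // No more matches.
            }

            foundItems.Add(foundPos);
            startPos = foundPos + 1;
        }

        return foundItems.ToArray();
    }
}
```

In this sketch, a startPos equal to the length of searchedStr is treated as valid and simply yields an empty array, while anything larger (or negative) is rejected before IndexOf is ever called.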

Once a match is found by the IndexOf or IndexOfAny method, the if statement body is executed to increment the count variable by one and to move the startPos up past the previously found match. The count variable is incremented by one to indicate that another match was found. The startPos is increased to the starting position of the last match found plus 1. Adding 1 is necessary so that you do not keep matching the same character/string that was previously matched, which will cause an infinite loop to occur in the code if at least one match is found in the searchedStr string. To see this behavior, remove the +1 from the code.

There is one potential problem with this code. Consider the case where:

 searchedStr = "aaaa"; matchStr = "aa"; 

The code contained in this recipe will match "aa" three times.

    (aa)aa
    a(aa)a
    aa(aa)

This situation may be fine for some applications, but not if you need it to return only the following matches:

    (aa)aa
    aa(aa)

To do this, change the following line inside the do loop:

 startPos = foundPos + 1; 

to this:

 startPos = foundPos + matchStr.Length; 

This code moves the startPos pointer beyond the first matched string, disallowing any internal matches.
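Both behaviors can be offered by a single method that selects the step size. The following is a minimal sketch rather than the recipe's own code: the class name OverlapSearch and the allowOverlap parameter are hypothetical, and it assumes an ordinal comparison, whereas the recipe's code uses the default IndexOf overload.

```csharp
using System;
using System.Collections.Generic;

public static class OverlapSearch
{
    // Hypothetical variant of FindAll that lets the caller choose whether
    // matches may overlap.
    public static int[] FindAll(string matchStr, string searchedStr,
                                int startPos, bool allowOverlap)
    {
        List<int> foundItems = new List<int>();

        // An empty match string has no meaningful step size, so report nothing.
        if (string.IsNullOrEmpty(matchStr))
        {
            return foundItems.ToArray();
        }

        // Step by 1 to allow internal matches, or by the match length to skip them.
        int step = allowOverlap ? 1 : matchStr.Length;

        while (startPos < searchedStr.Length)
        {
            // Assumption: ordinal comparison, chosen here for predictable results.
            int foundPos = searchedStr.IndexOf(matchStr, startPos,
                                               StringComparison.Ordinal);
            if (foundPos < 0)
            {
                break; // No more matches.
            }

            foundItems.Add(foundPos);
            startPos = foundPos + step;
        }

        return foundItems.ToArray();
    }
}
```

With searchedStr set to "aaaa" and matchStr set to "aa", passing true for allowOverlap yields positions 0, 1, and 2, while passing false yields positions 0 and 2.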

To convert this code to use a while loop rather than a do loop, the foundPos variable must be initialized to 0, and the while loop expression should be as follows:

    while (foundPos >= 0 && startPos < searchedStr.Length)
    {
        foundPos = searchedStr.IndexOf(matchChar, startPos);
        if (foundPos > -1)
        {
            startPos = foundPos + 1;
            count++;
        }
    }

See Also

See the "String.IndexOf Method" and "String.IndexOfAny Method" topics in the MSDN documentation.



C# Cookbook