Working with Regular Expressions
Regular expressions let you search and manipulate strings using a special syntax, and C# supports them through the .NET Framework's System.Text.RegularExpressions namespace. You apply a regular expression to text either to find parts of that text or to modify it. Regular expressions are a language all their own.
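To make "search and modify" concrete, here is a minimal sketch using the static Regex.IsMatch and Regex.Replace methods from System.Text.RegularExpressions. The class name SearchAndModify, its helper methods, and the sample sentence are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative sketch: the same pattern (\d+, "one or more digits")
// is used first to search text and then to modify it.
static class SearchAndModify
{
    // Search: true if the text contains at least one run of digits.
    public static bool ContainsNumber(string text) =>
        Regex.IsMatch(text, @"\d+");

    // Modify: replace every run of digits with the placeholder "#".
    public static string MaskNumbers(string text) =>
        Regex.Replace(text, @"\d+", "#");

    static void Main()
    {
        string text = "Order 1234 shipped in 3 boxes";
        Console.WriteLine(ContainsNumber(text)); // True
        Console.WriteLine(MaskNumbers(text));    // Order # shipped in # boxes
    }
}
```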
A full treatment of creating regular expressions is beyond the scope of this book (the topic alone would take a complete chapter), but you can find many useful regular expressions already built into the Regular Expression Editor in the C# IDE. Here are some of the pre-built regular expressions you'll find there:
As you can see, regular expressions are terse and tightly packed. In the previous regular expressions, \w stands for a "word character": a letter, a digit, or an underscore.
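As a rough illustration of \w alongside two other common shorthand classes, \d (a digit) and \s (a whitespace character), the sketch below collects the characters each one matches in a short sample string. The ShorthandClasses class and its MatchedChars helper are illustrative names, not part of the .NET library.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative sketch of the shorthand character classes \w, \d, and \s.
static class ShorthandClasses
{
    // Concatenates every match of a one-character pattern found in text.
    public static string MatchedChars(string pattern, string text)
    {
        string result = "";
        foreach (Match m in Regex.Matches(text, pattern))
        {
            result += m.Value;
        }
        return result;
    }

    static void Main()
    {
        string text = "a_1 b-2";
        Console.WriteLine(MatchedChars(@"\w", text));        // a_1b2 (the space and '-' are not word characters)
        Console.WriteLine(MatchedChars(@"\d", text));        // 12
        Console.WriteLine(MatchedChars(@"\s", text).Length); // 1 (the single space)
    }
}
```

Note that in .NET, \w also matches letters outside the ASCII range, such as é, unless you pass RegexOptions.ECMAScript.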
Let's take a look at an example: say that you want to pick all the lowercase words out of a string.
Using Regular Expression Matches
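Say that we have some sample text, "Here is the text!", and we want to pull out all the words made up only of lowercase letters. To match such words, we can combine two regular expression features: \b, which matches a word boundary (a position between a word character and a non-word character, or at the start or end of the string next to a word character), and the character class [a-z], which matches any single lowercase letter from a to z.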
So to match lowercase words, we can use the regular expression \b[a-z]+\b. The + operator means "one or more of," so this pattern is a word boundary, followed by one or more lowercase letters, followed by another word boundary. Now that we have the regular expression, how do we use it? With the System.Text.RegularExpressions.Regex class, we can create a new Regex object that holds the regular expression:
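using System.Text.RegularExpressions;
class ch02_25
{
    static void Main()
    {
        string text = "Here is the text!";
        Regex regularExpression = new Regex(@"\b[a-z]+\b");
        // ...
Next, we apply the regular expression to the text with the Regex object's Matches method, which returns a MatchCollection object: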
    static void Main()
    {
        string text = "Here is the text!";
        Regex regularExpression = new Regex(@"\b[a-z]+\b");
        MatchCollection matches = regularExpression.Matches(text);
        // ...
This MatchCollection object holds every match of our regular expression in text. All we have to do now is loop over that object with a foreach loop and display each match, as you see in ch02_25.cs, Listing 2.25.
Listing 2.25 Using Regular Expressions (ch02_25.cs)
using System.Text.RegularExpressions;
class ch02_25
{
    static void Main()
    {
        string text = "Here is the text!";
        Regex regularExpression = new Regex(@"\b[a-z]+\b");
        MatchCollection matches = regularExpression.Matches(text);
        foreach (Match match in matches)
        {
            if (match.Length != 0)
            {
                System.Console.WriteLine(match);
            }
        }
    }
}
Here's what you see when you run this code:
C:\>ch02_25
is
the
text
As you can see, the code found and displayed the lowercase words in the test string.
Using Regular Expression Groups
Regular expressions can also use groups to pick out several separate pieces of the same text string. For example, take a look at this string:
"Order number: 1234 Customer number: 5678"
Say that you want to pick out the order number (1234) and the customer number (5678) from this text. You can match a digit with \d, so to match a four-digit number, you use \d\d\d\d.
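Writing \d four times works, but regular expressions also offer the quantifier {n}, which repeats the preceding item exactly n times, so \d{4} describes the same four-digit pattern. Here's a small sketch, using a hypothetical FourDigits helper class, that applies both forms to the sample string and collects the results.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative sketch: \d\d\d\d and \d{4} match the same text.
static class FourDigits
{
    // Returns the text of every match of pattern in text, in order.
    public static string[] Find(string pattern, string text)
    {
        MatchCollection matches = Regex.Matches(text, pattern);
        string[] found = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            found[i] = matches[i].Value;
        }
        return found;
    }

    static void Main()
    {
        string text = "Order number: 1234 Customer number: 5678";
        Console.WriteLine(string.Join(",", Find(@"\d\d\d\d", text))); // 1234,5678
        Console.WriteLine(string.Join(",", Find(@"\d{4}", text)));    // 1234,5678
    }
}
```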
Here's a regular expression that will match the whole string (note that in regular expressions, ordinary characters such as letters, spaces, and colons simply match themselves):
"Order number: (\d\d\d\d) Customer number: (\d\d\d\d)"
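Note the parentheses around each four-digit pattern: each pair of parentheses creates a group, which captures the part of the text that its pattern matches. You can also give a group a name with the syntax (?<name>pattern), which makes its match easy to retrieve later. Here's the same regular expression with the two groups named order and customer:
"Order number: (?<order>\d\d\d\d) Customer number: (?<customer>\d\d\d\d)"
After we apply this regular expression to the sample text, we need to recover the text that matched each group to get the order and customer numbers. You do that with a Match object's Groups property; for example, match.Groups["order"] returns the match to the order group. You can see this at work in ch02_26.cs, Listing 2.26, where we recover the matches to the order and customer groups.
Listing 2.26 Using Regular Expression Groups (ch02_26.cs)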
using System.Text.RegularExpressions;
class ch02_26
{
    static void Main()
    {
        string text = "Order number: 1234 Customer number: 5678";
        Regex regularExpression =
            new Regex(@"Order number: (?<order>\d\d\d\d) " +
                      @"Customer number: (?<customer>\d\d\d\d)");
        MatchCollection matches = regularExpression.Matches(text);
        foreach (Match match in matches)
        {
            if (match.Length != 0)
            {
                System.Console.WriteLine("Order number: {0}",
                    match.Groups["order"]);
                System.Console.WriteLine("Customer number: {0}",
                    match.Groups["customer"]);
            }
        }
    }
}
Here's what you see when you run this code. As you can see, the code picked out the order and customer numbers:
C:\>ch02_26
Order number: 1234
Customer number: 5678
Using Capture Collections
You can even use the same group name multiple times in the same regular expression. For example, take a look at this text:
"Order, customer numbers: 1234, 5678"
Say that you want to get both four-digit numbers here. In this case, you could use this regular expression with two groups, both of which are named number:
"(?<number>\d\d\d\d), (?<number>\d\d\d\d)"
Now, to display the matches to the number group, you might use this code:
string text = "Order, customer numbers: 1234, 5678";
Regex regularExpression =
    new Regex(@"(?<number>\d\d\d\d), (?<number>\d\d\d\d)");
MatchCollection matches = regularExpression.Matches(text);
foreach (Match match in matches)
{
    if (match.Length != 0)
    {
        System.Console.WriteLine(match.Groups["number"]);
    }
}
The problem here is that when you run this code, you only see this result, which is the second of the two numbers we're looking for:
5678
When a group captures text more than once, the Group object reports only its last capture, so the second match to the number group hides the first. To find the matches to both groups even though they have the same name, you can use the group's Captures property, which holds every capture made by that group. In this case, match.Groups["number"].Captures returns a CaptureCollection of Capture objects holding all the matches to the number group, and all we have to do is loop over that collection and display the matches we've found. You can see how this works in ch02_27.cs, Listing 2.27, which uses two groups named number in the same regular expression and displays the matches to both groups.
Listing 2.27 Using Regular Expression Capture Groups (ch02_27.cs)
using System.Text.RegularExpressions;
class ch02_27
{
    static void Main()
    {
        string text = "Order, customer numbers: 1234, 5678";
        Regex regularExpression =
            new Regex(@"(?<number>\d\d\d\d), (?<number>\d\d\d\d)");
        MatchCollection matches = regularExpression.Matches(text);
        foreach (Match match in matches)
        {
            if (match.Length != 0)
            {
                foreach (Capture capture in
                    match.Groups["number"].Captures)
                {
                    System.Console.WriteLine(capture);
                }
            }
        }
    }
}
Here are the results of this code, which picked up both matches:
C:\>ch02_27
1234
5678
And that's it. Now you're working with regular expressions in C#. We have some basic C# programming under our belts at this point; the next step, starting in Chapter 3, is where C# really starts to come into its own, with object-oriented programming.
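As a recap of the group techniques covered above, this sketch puts two details side by side: unnamed groups, like those in the first order/customer pattern, are numbered from 1 in order of their opening parentheses (Groups[0] is always the whole match), and when two groups share a name, the group's Value holds only the last capture while Captures keeps them all. The GroupRecap class and its method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

// Illustrative recap of numbered groups and of Value versus Captures.
static class GroupRecap
{
    // Unnamed groups are numbered from 1; Groups[0] is the whole match.
    public static string[] NumberedGroups(string text)
    {
        Match match = Regex.Match(text,
            @"Order number: (\d\d\d\d) Customer number: (\d\d\d\d)");
        return new[] { match.Groups[1].Value, match.Groups[2].Value };
    }

    const string DuplicateNames = @"(?<number>\d\d\d\d), (?<number>\d\d\d\d)";

    // With a repeated group name, Value reports only the last capture.
    public static string LastCapture(string text) =>
        Regex.Match(text, DuplicateNames).Groups["number"].Value;

    // Captures keeps every capture the group made, in order.
    public static string[] AllCaptures(string text)
    {
        Group group = Regex.Match(text, DuplicateNames).Groups["number"];
        string[] result = new string[group.Captures.Count];
        for (int i = 0; i < group.Captures.Count; i++)
        {
            result[i] = group.Captures[i].Value;
        }
        return result;
    }

    static void Main()
    {
        string[] ids = NumberedGroups("Order number: 1234 Customer number: 5678");
        Console.WriteLine(ids[0] + " " + ids[1]);               // 1234 5678
        string text = "Order, customer numbers: 1234, 5678";
        Console.WriteLine(LastCapture(text));                   // 5678
        Console.WriteLine(string.Join(" ", AllCaptures(text))); // 1234 5678
    }
}
```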