In the previous matching tag examples, my input has matched p and H2 tags, and the Regex finds them just fine. However, there's nothing in the regular expression itself that requires the opening and closing tags to match. I'm going to add a test that shows that unmatched tags will still pass this Regex, and then see if I can figure out how to require them to match. I seem to remember that there's a way to do that with the Regex class. Here's the new test:
[Test] public void InvalidXmlNotHandledYet() {
Regex r = new Regex("<(?<prefix>.*)>(?<body>.*)</(?<suffix>.*)>");
Match m = r.Match("<p>this is a para</H2>");
Assert(m.Success);
AssertEquals("p",m.Groups["prefix"].Value);
AssertEquals("H2",m.Groups["suffix"].Value);
}
Just as expected, the same Regex matches a p followed by an H2. Not what we really want, but we want to be sure we understand what our code does. This test now motivates the next extension, to a Regex that does force the tags to match. I'm not sure we will need this; we may already have gone beyond our current need for regular expressions, but my mission here is to learn as much as I can, in a reasonable time, about how Regex works. Now I'll have to search the Help a bit. Hold on...
The documentation seems to suggest that you can have named backreferences, using \k. I'll write a test. Hold on again... All right! Worked almost the first time: just a simple mistake away from perfect. Here's the new test:
[Test] public void Backreference() {
Regex r = new Regex("<(?<prefix>.*)>(?<body>.*)</\\k<prefix>.*>");
Match m = r.Match("<p>this is a para</p>");
Assert(m.Success);
m = r.Match("<p>this is a para</H2>");
Assert(!m.Success);
}
In this test, notice that we had to type \\k to get the \k into the expression. This is because C# strings, like most languages' strings, already use the backslash to prefix newlines and other special characters. We have to type two of them to get one backslash into the string. The amazing thing is that I actually remembered to do that the first time! The mistake? I left the word suffix there instead of saying \k<prefix>, as was my intent.
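The doubled backslash can be avoided with a C# verbatim string literal (the @"..." form), which takes backslashes literally. Here is a minimal sketch of both spellings side by side, using the same pattern as the Backreference test. The class name BackreferenceSketch and the method TagsMatch are hypothetical, made up for illustration; they are not part of the tests above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not part of the original tests.
public static class BackreferenceSketch
{
    // Regular string literal: the regex backslash must be doubled.
    public const string Escaped = "<(?<prefix>.*)>(?<body>.*)</\\k<prefix>.*>";

    // Verbatim string literal: the backslash is taken literally.
    public const string Verbatim = @"<(?<prefix>.*)>(?<body>.*)</\k<prefix>.*>";

    // True when the closing tag begins with whatever the prefix group captured.
    public static bool TagsMatch(string input)
    {
        return new Regex(Verbatim).IsMatch(input);
    }
}
```

Both constants hold exactly the same characters, so either spelling produces the same Regex. Under these assumptions, TagsMatch("<p>this is a para</p>") returns true and TagsMatch("<p>this is a para</H2>") returns false, mirroring the two assertions in the test.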