RegularExpressionValidator Control


Before we go into this control, let's briefly discuss regular expressions: what they are and how they are used. Complete coverage of regular expressions is well beyond the scope of this chapter (and this book!), but we will provide a brief introduction to this powerful tool, along with a number of helpful examples and references to resources where you can find more information. Regular expressions are often the simplest, easiest way to search or compare strings, so they are naturally a good fit for validating user input on forms.

A regular expression is a pattern of text that serves as a template for a particular character pattern. Strings can then be evaluated to determine whether any portion of the string matches the pattern described by the regular expression. Regular expression syntax includes many special metacharacters that give these patterns a great deal of flexibility and power (and also sometimes make them look as if a drunken monkey had been typing on the keyboard). Nonetheless, regular expressions are a very powerful tool that, until recently, was largely limited to the UNIX world of computing.

Regular expressions were developed in the 1950s, believe it or not, by American mathematician Stephen Kleene, and were used to describe what he called "the algebra of regular sets." Ken Thompson, the principal inventor of UNIX, would later incorporate regular expressions into that operating system's search utilities. The first practical application of regular expressions was in the Unix editor called qed.

http://msdn.Microsoft.com/scripting/jscript/doc/reconearlybeginnings.htm

More recently, regular expressions have found their way into many different search engine utilities, JScript/JavaScript, and even recent versions of Microsoft's VBScript language.

At their simplest, regular expressions are just simple strings. If you have a string and you want it to match "abc", you could simply use the regular expression "abc". However, this expression would also match "123abc", "abc123", and "This string has abc in it". To restrict the match to just the string in the expression, you need a couple of metacharacters: the caret ("^") and the dollar sign ("$"). The caret matches the beginning of a string, and the dollar sign matches the end of a string. So to create a regular expression that will only match "abc", you would use the expression "^abc$". Now, this is pretty cool, but obviously the utility of matching a single, hard-coded expression is very limited. Let's move on to look at wildcard characters.
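As a quick illustration of anchoring, here is a minimal sketch. It uses Python's `re` module purely as a stand-in for trying patterns out (the validator itself runs on .NET); the variable names are ours, not part of any API discussed in this chapter.

```python
import re

# Unanchored: the pattern may match anywhere inside the string.
unanchored = re.compile(r"abc")
# Anchored: ^ and $ pin the match to the start and end of the string.
anchored = re.compile(r"^abc$")

samples = ["abc", "123abc", "abc123", "This string has abc in it"]

for text in samples:
    found_anywhere = unanchored.search(text) is not None
    found_exact = anchored.search(text) is not None
    print(f"{text!r}: unanchored={found_anywhere}, anchored={found_exact}")
```

Every sample contains "abc", so the unanchored pattern finds a match in all four, while the anchored pattern accepts only the first.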

What if you want to match all strings that begin with "abc" and end with "123", regardless of any characters in the middle? Then you could use the expression "^abc.*123$". The period (".") matches any single character, and the asterisk ("*") matches the preceding element zero or more times, so ".*" together matches any number of characters, much as the DOS * wildcard does. Note that the asterisk on its own is not a wildcard: "^abc*123$" would match only "ab" followed by zero or more "c" characters and then "123". To match exactly one character, use the period by itself, so "^abc.123$" would match "abca123" and "abc5123" but not "abcXX123".
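The difference between the period, the asterisk, and the two combined is easy to get wrong, so here is a small sketch that tries each form. Again, Python's `re` module is only a convenient stand-in, and the variable names are illustrative.

```python
import re

# Exactly one arbitrary character between "abc" and "123".
single_char = re.compile(r"^abc.123$")
# Any run of characters (including none) between "abc" and "123".
any_run = re.compile(r"^abc.*123$")
# The asterisk applies only to the "c" before it:
# "ab", then zero or more "c", then "123".
star_on_c = re.compile(r"^abc*123$")

samples = ["abca123", "abc5123", "abcXX123", "abc123", "ab123", "abccc123"]

for text in samples:
    print(
        f"{text!r}: single_char={single_char.search(text) is not None}, "
        f"any_run={any_run.search(text) is not None}, "
        f"star_on_c={star_on_c.search(text) is not None}"
    )
```

Note that "abcXX123" is accepted only by the ".*" form, while "ab123" is accepted only by the bare-asterisk form, which is rarely what a "wildcard" search intends.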

Finally, you can match one character out of a set by placing the set within square brackets ("[ ]"). The hyphen ("-") can be used to specify a range, or the characters can simply be listed inside the brackets. For example, "[12345]" and "[1-5]" would both match any single digit from 1 to 5. Placing the caret ("^") at the beginning of a bracket expression inverts the match, so "[^1-5]" would match any character except a digit from 1 to 5.
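To see bracket expressions in action, the following sketch checks single characters against a listed set, a range, and a negated range. It is written in Python only for convenience, and the names are ours.

```python
import re

listed = re.compile(r"^[12345]$")   # explicit list of characters
ranged = re.compile(r"^[1-5]$")     # the same set written as a range
negated = re.compile(r"^[^1-5]$")   # any single character NOT in 1..5

candidates = ["0", "1", "3", "5", "6", "a"]

for ch in candidates:
    print(
        f"{ch!r}: listed={listed.search(ch) is not None}, "
        f"ranged={ranged.search(ch) is not None}, "
        f"negated={negated.search(ch) is not None}"
    )
```

The listed and ranged forms always agree, and the negated form gives the opposite answer for every single-character input.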

Now, that should give you enough of a grip on regular expressions for us to start using them. Let's take a quick look at the RegularExpressionValidator's one special property that makes it stand out from the BaseValidator class (see Table 8.6), and then we'll look at some examples.

Table 8.6. `RegularExpressionValidator` Class Additional Property

Property	Type	Description
`ValidationExpression`	`String`	The regular expression to be used to validate the `ControlToValidate` .

The RegularExpressionValidator control exposes only one property in addition to the standard properties common to all validator controls: ValidationExpression. This property is set to the regular expression that ASP.NET matches against the ControlToValidate, and the control concludes that the ControlToValidate is valid if its contents match this expression. Listing 8.4 demonstrates how to validate a few common fields using the RegularExpressionValidator control.

Listing 8.4 Adding regular expression validation to a text box.

<%@ Page %>
<HTML>
    <HEAD>
        <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" >
    </HEAD>
    <body>
        <form method="post" runat="server">
            <asp:Label ID="errorMessage" Runat="server"></asp:Label>
            <table>
                <tr>
                    <td>
                        Email
                    </td>
                    <td>
                        <asp:textbox runat="Server" columns="30"
                            id="email"></asp:textbox>
                    </td>
                    <td>
                        <asp:regularexpressionvalidator runat="Server"
                            id="email_regex" controltovalidate="email" errormessage="" display="dynamic"
                            validationexpression="^[\w-\.]{1,}@([\w-]+\.)+[\w-]{2,3}$">
                        *
                        </asp:regularexpressionvalidator>
                    </td>
                </tr>
                <tr>
                    <td>
                        Phone
                    </td>
                    <td>
                        <asp:textbox runat="Server" columns="15"
                            id="phone"></asp:textbox>
                    </td>
                    <td>
                        <asp:regularexpressionvalidator runat="Server"
                            id="phone_regex" controltovalidate="phone" errormessage="" display="dynamic"
                            validationexpression="^[2-9]\d{2}-\d{3}-\d{4}$">
                        *
                        </asp:regularexpressionvalidator>
                    </td>
                </tr>
                <tr>
                    <td>
                        Social Security Number
                    </td>
                    <td>
                        <asp:textbox runat="Server" columns="15"
                            id="ssn"></asp:textbox>
                    </td>
                    <td>
                        <asp:regularexpressionvalidator runat="Server"
                            id="ssn_regex" controltovalidate="ssn" errormessage="" display="dynamic"
                            validationexpression="^\d{3}-\d{2}-\d{4}$">
                        *
                        </asp:regularexpressionvalidator>
                    </td>
                </tr>
                <tr>
                    <td>
                        ZIP/Postal Code
                    </td>
                    <td>
                        <asp:textbox runat="Server" columns="15"
                            id="zip"></asp:textbox>
                    </td>
                    <td>
                        <asp:regularexpressionvalidator runat="Server"
                            id="zipregex" controltovalidate="zip" errormessage="" display="dynamic"
                            validationexpression="^\d{5}-\d{4}|\d{5}|[A-Z]\d[A-Z]\s{1}\d[A-Z]\d$">
                            *
                        </asp:regularexpressionvalidator>
                    </td>
                </tr>
            </table>
            <asp:button id="save_button" runat="Server"
                text="Save"></asp:button>
        </form>
    </body>
</HTML>

This example uses four regular expressions, which validate commonly used form fields. Let's examine them in detail to see exactly how they work before we conclude our coverage of the RegularExpressionValidator. First, and probably the most common of all, is the e-mail regular expression. In researching this chapter I quickly found at least half a dozen different e-mail validation regular expressions, each slightly different from the others in syntax and function. Some explicitly required that the last part of the domain name be either "com", "edu", "mil", "gov", "net", or "org". Others simply checked to ensure that an at sign ("@") and a period (".") were present in the string to be checked. Given the fact that every country has its own top-level domain (TLD), and several new TLDs are slated to be introduced in the next year or two, I chose to simply check the pattern of the string, rather than the specific words involved. The regular expression used to validate an e-mail address in Listing 8.4 looks like this:

  ^[\w-\.]+@([\w-]+\.)+[\w-]{2,3}$

This regular expression uses a few metacharacters that we haven't yet covered, so we'll go over them now. Starting at the left, we've already discussed the caret ("^"), which matches the beginning of the string. And we know that square brackets ("[ ]") contain lists of characters. In this case, "\w" is a special character sequence that matches any single word character: a letter, a digit, or the underscore ("_"). It is functionally equivalent to "[A-Za-z0-9_]", but much shorter. Because hyphens and periods are also allowed in e-mail addresses, we also list the hyphen ("-") and the period ("\."). The period is a special character by itself, and so must be "escaped" with the backslash ("\") in order to reference the actual period. Next comes the plus ("+") character. This character matches the preceding expression one or more times. In this case, it means that the address must begin with one or more alphanumeric characters, underscores, hyphens, and periods, in any combination.

That brings us to the at sign ("@"), which simply matches the "@" character in the e-mail address. Following the "@", things are a bit more complicated. Parentheses are used much as they are in mathematical expressions, as a means of grouping. Inside the parentheses, we match any series of alphanumeric characters (including hyphens and underscores, but not periods), as represented by the expression "[\w-]+". This is followed by a backslash and a period ("\."), which match a literal period. The entire expression in parentheses is then matched one or more times by adding a plus ("+") to the end of it.

Finally, the last portion of the expression consists of "[\w-]{2,3}$". This uses the familiar "[\w-]" expression, but now we introduce yet another new metacharacter, the curly braces ("{ }"). Curly braces designate how many times to match the preceding expression, similar to the plus ("+") but more specific. The curly braces include a minimum and maximum value (or if only one value is used, only that exact number of matches is allowed). If the maximum is left out but the comma is kept, there is no upper limit, which is why the "{1,}" that appears in Listing 8.4 is equivalent to the plus. In this case, we are only matching character strings that are two or three characters long. Lastly, we match the end of the string with the dollar sign ("$").
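To get a feel for what this e-mail pattern accepts and rejects, here is a minimal sketch written in Python as a stand-in for the .NET engine. Two assumptions to flag: the helper name `looks_like_email` is hypothetical, and the first character class is written as `[\w.-]` rather than `[\w-\.]` because Python's `re` module does not accept a range that starts from `\w`; the intended set of characters is the same.

```python
import re

# Same structure as the e-mail expression discussed above. The hyphen is
# moved to the end of the first class ([\w.-]) so that Python does not try
# to read "\w-\." as a character range.
EMAIL_PATTERN = re.compile(r"^[\w.-]+@([\w-]+\.)+[\w-]{2,3}$")


def looks_like_email(value: str) -> bool:
    """Hypothetical helper: True if value fits the pattern's shape."""
    return EMAIL_PATTERN.search(value) is not None


candidates = [
    "user@example.com",             # local part, one domain label, 3-letter ending
    "first.last@mail.example.org",  # periods in the local part, two labels
    "user@example",                 # no period after the "@"
    "user@example.info",            # final part is four characters long
    "@example.com",                 # nothing before the "@"
]

for candidate in candidates:
    print(f"{candidate!r}: {looks_like_email(candidate)}")
```

The fourth candidate shows a consequence of the "{2,3}" quantifier: an address whose final part is longer than three characters does not fit the pattern, which is worth keeping in mind if such addresses need to be accepted.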

Okay, that was a lot to cover for one regular expression, but the good news is that you now know most of the metacharacters commonly used in regular expressions. We'll see that the next three expressions are much easier to explain, now that you have this strong knowledge base. The next regular expression used in Listing 8.4 is for a phone number and matches patterns that look like this:

 "ANN-NNN-NNNN"

where N is a numeric digit from 0 to 9, and A is a numeric digit from 2 to 9 (0 and 1 are not valid starting digits for phone numbers). Note that this regular expression only matches domestic U.S. telephone numbers, but it would not be difficult to modify it to suit another country's standard, or to support international extensions. The regular expression to match this pattern looks like this:

  ^[2-9]\d{2}-\d{3}-\d{4}$

You're familiar with the caret ("^") and the dollar sign ("$") by now. The first character is matched by the "[2-9]" expression, which matches any digit from 2 to 9. This is immediately followed by "\d{2}". The backslash followed by the letter d ("\d") is a special metacharacter that matches any numeric digit; it is equivalent to "[0-9]". And of course we've already covered curly braces: this evaluates to exactly two occurrences of "\d", or two numeric digits. This is followed by the hyphen ("-"), which matches the same character in the phone number. Next we match exactly three numeric digits, another hyphen, and then exactly four numeric digits, with the remaining expression of "\d{3}-\d{4}", and that is it!
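The phone number expression can be tried out with a short sketch like the one below. Python is used only as a convenient test bed, and `is_us_phone` is a hypothetical helper name rather than anything provided by ASP.NET.

```python
import re

# First digit 2-9, then two digits, a hyphen, three digits, a hyphen, four digits.
PHONE_PATTERN = re.compile(r"^[2-9]\d{2}-\d{3}-\d{4}$")


def is_us_phone(value: str) -> bool:
    """Hypothetical helper: True if value looks like ANN-NNN-NNNN."""
    return PHONE_PATTERN.search(value) is not None


candidates = [
    "555-123-4567",   # fits the pattern
    "155-123-4567",   # starts with 1, which [2-9] excludes
    "5551234567",     # hyphens are required
    "555-123-456",    # last group has only three digits
]

for candidate in candidates:
    print(f"{candidate!r}: {is_us_phone(candidate)}")
```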

After you've seen the phone number regular expression, the social security number expression should be pretty self-explanatory. Its expression matches the pattern of "NNN-NN-NNNN", where N is any numeric digit.

  ^\d{3}-\d{2}-\d{4}$

This evaluates to exactly three numeric digits, a hyphen, exactly two numeric digits, another hyphen, and finally exactly four numeric digits. Simple, right?
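For completeness, the same kind of quick check works for the social security number expression. As before, this is only a sketch in Python, and `is_ssn_shaped` is a hypothetical name; it checks the shape of the value, not whether the number was ever issued.

```python
import re

# Three digits, a hyphen, two digits, a hyphen, four digits.
SSN_PATTERN = re.compile(r"^\d{3}-\d{2}-\d{4}$")


def is_ssn_shaped(value: str) -> bool:
    """Hypothetical helper: True if value has the NNN-NN-NNNN shape."""
    return SSN_PATTERN.search(value) is not None


for candidate in ["123-45-6789", "123456789", "12-345-6789"]:
    print(f"{candidate!r}: {is_ssn_shaped(candidate)}")
```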

The last regular expression we will examine matches a ZIP or postal code for the United States or Canada. This is a bit more complicated than the phone or SSN examples because there are three different options that are all valid. A ZIP code is valid if it consists of either five digits or five digits, a hyphen, and four digits. Canadian postal codes, on the other hand, have the form "ANA NAN", where N is a digit from 0 to 9 and A is a capital letter from A to Z. The expression, then, is

  ^\d{5}-\d{4}|\d{5}|[A-Z]\d[A-Z] \d[A-Z]\d$

This introduces a few more new characters that we have yet to deal with. Going from left to right, we see that the first portion of the expression matches ZIP+4 patterns of exactly five digits, a hyphen, and exactly four digits, using the pattern "\d{5}-\d{4}". The next character, the pipe ("|"), is read as "or", and is used to separate multiple valid patterns. In this case, the next valid expression is "\d{5}", which is exactly five numeric digits (which matches a standard U.S. ZIP code). Next we encounter another pipe ("|"), which leads to our third valid option. Although this looks fairly imposing, it is actually pretty straightforward. The first half of this expression is "[A-Z]\d[A-Z]" and simply matches any single capital letter, any single digit, and then any single capital letter. This is followed by a space, which simply matches a space in the postal code (Listing 8.4 writes this as "\s{1}", which matches exactly one whitespace character), and finally the expression "\d[A-Z]\d", which matches a single digit, a single capital letter, and finally another single digit.
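One subtlety with the pipe is worth a sketch: because alternation binds more loosely than anything else in the pattern, the caret belongs only to the first alternative and the dollar sign only to the last. The code below, in Python as a stand-in, compares the expression as written with a grouped variant. The grouped pattern and the helper names are our own additions, and the sketch assumes the validator accepts a value only when the match spans the entire input, which `re.fullmatch` imitates here.

```python
import re

# The expression as discussed above: three alternatives separated by pipes.
ZIP_OR_POSTAL = r"^\d{5}-\d{4}|\d{5}|[A-Z]\d[A-Z] \d[A-Z]\d$"
# Assumed variant: parentheses make ^ and $ apply to all three alternatives.
ZIP_OR_POSTAL_GROUPED = r"^(\d{5}-\d{4}|\d{5}|[A-Z]\d[A-Z] \d[A-Z]\d)$"


def matches_whole_value(pattern: str, value: str) -> bool:
    """Hypothetical helper: True only if the pattern covers the entire value."""
    return re.fullmatch(pattern, value) is not None


def matches_somewhere(pattern: str, value: str) -> bool:
    """Hypothetical helper: True if the pattern matches any part of the value."""
    return re.search(pattern, value) is not None


for candidate in ["12345", "12345-6789", "K1A 0B1", "123456", "k1a 0b1"]:
    print(f"{candidate!r}: {matches_whole_value(ZIP_OR_POSTAL, candidate)}")

# With a plain search, the middle alternative is not anchored at all.
print(matches_somewhere(ZIP_OR_POSTAL, "abc12345xyz"))
print(matches_somewhere(ZIP_OR_POSTAL_GROUPED, "abc12345xyz"))
```

When the whole value must match, both forms give the same answers for these inputs. If the expression is ever reused in a plain search, the grouped form is the safer choice.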

You can find a lot more information on regular expressions in Microsoft's MSDN resource. Their scripting documentation includes a good article on regular expression syntax, located at http://msdn.microsoft.com/scripting/vbscript/doc/jsgrpregexpsyntax.htm. A quick search for regular expressions in any search engine will yield many more articles online. Finally, for a growing list of ready-made regular expressions and an online expression tester, visit the Regular Expression Library at http://regexlib.com/.
