Your Regex Library


Are you ready to explore your own exclusive library of ready-to-run regular expressions, sample code and all? Here's a listing of the exciting tips we'll be covering in this chapter:

  • Exactly-One-Digit Checker

  • Real Number Matcher

  • Alphanumerical Matcher: No Spaces or Dots

  • 24-Hour Clock Time Check

  • Identifying Valid Dates

  • File Path and Extension Check

  • Checking for Repeated Words

  • Getting Capitalized Words

  • Matching Numbers from a String

  • "Who Are You?" Name Checker

  • Naughty-Word Filter Expression

  • True Email Address Validation

  • Validating a Web Site Address

  • Internet URL Matcher: FTP, HTTP, HTTPS

  • Checking for a Valid Domain

  • IP Address Checker

  • Extracting Links and Images from HTML

  • Checking HTML Color Codes

  • Credit Card Validation

  • Password Format Enforcing Expression

  • Defining Your Own HTML: Custom Tags, with Expressions

  • ISBN Checker

  • Is That a GUID?

  • U.S. ZIP Code Checker

  • U.S. Social Security Number Checker

  • U.S. Phone Number Checker

  • U.S. State Checker

  • U.K. Postal Code Matcher

  • U.K. National Insurance Number Check

  • U.K. Telephone Number Validator

  • Converting American and British Dates

  • French, German, and Japanese Expressions

  • The Simple Cure for Loose Expressions

Enough talk. A little less conversation, a little more action....

Exactly-One-Digit Checker

Starting off simply, we're going to look at two regular expressions that match exactly one digit: firstly, any digit from zero to nine, and, secondly, one digit within a certain range. We'll be using the shared System.Text.RegularExpressions.Regex.IsMatch function to compare a value with our expression. If a match occurs, it returns True.

One-Digit Regular Expression

Expression: ^\d$

Sample matches: 0, 5, 3

Sample nonmatches: K, 492, Jazz

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "5"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch( _
        strValue, "^\d$")

One Digit in Range Regular Expression

Expression: ^[5-8]$ (the 5 and 8 boundaries can be altered to any single digit)

Sample matches: 5, 6, 8

Sample nonmatches: 3, 9, K

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "10"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch( _
        strValue, "^[5-8]$")
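Two .NET-specific details are worth knowing before you rely on these one-digit checks for validation. By default, \d matches any Unicode decimal digit (such as the Arabic-Indic digits), not just 0 to 9, and $ also matches just before a final newline. The sketch below, assuming a simple console application, compares the tip's pattern with a stricter variant that uses [0-9] and the \z end-of-string anchor; the module and function names are illustrative only.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module DigitCheckDemo

    ' Pattern as given in the tip: \d is Unicode-aware and $ tolerates a final line feed.
    Function IsOneDigitLoose(value As String) As Boolean
        Return Regex.IsMatch(value, "^\d$")
    End Function

    ' Stricter variant: ASCII 0-9 only, and \z allows nothing after the digit.
    Function IsOneDigitStrict(value As String) As Boolean
        Return Regex.IsMatch(value, "^[0-9]\z")
    End Function

    Sub Main()
        Dim arabicIndicThree As String = ChrW(&H663)   ' U+0663 ARABIC-INDIC DIGIT THREE
        Dim withNewline As String = "5" & vbLf

        Console.WriteLine(IsOneDigitLoose(arabicIndicThree))   ' True
        Console.WriteLine(IsOneDigitStrict(arabicIndicThree))  ' False
        Console.WriteLine(IsOneDigitLoose(withNewline))        ' True
        Console.WriteLine(IsOneDigitStrict(withNewline))       ' False
    End Sub

End Module
```

Whether the looser behavior matters depends on what you do with the value next; if it is later converted with code that expects ASCII digits, the stricter form is the safer choice.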

Real Number Matcher

It's often useful to check for a real number without having to resort to the less-flexible and now-defunct IsNumeric function. This regular expression does just that, and allows for an optional positive or negative sign, too.

Real Number Regular Expression

Expression: ^[-+]?\d+(\.\d+)?$

Sample matches: 18, +3.142, -0.20

Sample nonmatches: 540-70, .70, 250x

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "-0.20"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch( _
        strValue, "^[-+]?\d+(\.\d+)?$")
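If you go on to convert the value, a sketch like the following uses the tip's expression as a gate and then Double.TryParse to get the number. It assumes that "." is always the decimal separator (hence CultureInfo.InvariantCulture), and it adds RegexOptions.ECMAScript so that \d accepts only the ASCII digits 0 to 9; TryParseRealNumber is a hypothetical helper name, not a framework method.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module RealNumberDemo

    ' The tip's expression; ECMAScript makes \d mean only the ASCII digits 0-9.
    Private ReadOnly RealNumberPattern As New Regex( _
        "^[-+]?\d+(\.\d+)?$", RegexOptions.ECMAScript)

    ' Hypothetical helper: returns True and fills result only when the text
    ' passes the format check and converts to a Double.
    Function TryParseRealNumber(value As String, ByRef result As Double) As Boolean
        result = 0
        If value Is Nothing OrElse Not RealNumberPattern.IsMatch(value) Then
            Return False
        End If
        ' InvariantCulture: "." is always treated as the decimal separator.
        Return Double.TryParse(value, NumberStyles.Float, _
            CultureInfo.InvariantCulture, result)
    End Function

    Sub Main()
        Dim number As Double
        If TryParseRealNumber("-0.20", number) Then
            Console.WriteLine(number.ToString(CultureInfo.InvariantCulture))  ' -0.2
        End If
        Console.WriteLine(TryParseRealNumber(".70", number))  ' False
    End Sub

End Module
```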

Alphanumerical Matcher: No Spaces or Dots

It's occasionally useful to check whether a string consists purely of alphabetic and numeric characters, and not the likes of spaces, dots, backslashes, or other weird whatnots. This expression is best used to ensure a user has chosen an acceptable username and password during signup.

Alphanumerical Regular Expression

Expression: ^[a-zA-Z0-9]+$

Sample matches: karlmoore, 10b, green63

Sample nonmatches: 3.142, United Kingdom, $48

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "karl.moore"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch( _
        strValue, "^[a-zA-Z0-9]+$")

24-Hour Clock Time Check

This next regular expression can be a real gem. It checks for a valid time in the HH:MM 24-hour clock format.

24-Hour HH:MM Regular Expression

Expression: ^([0-1][0-9]|[2][0-3]):([0-5][0-9])$

Sample matches: 12:00, 19:34, 02:57

Sample nonmatches: 02:57am, 12:18 PM, 24:00

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "17:57"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch( _
        strValue, "^([0-1][0-9]|[2][0-3]):([0-5][0-9])$")
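Because the hour and the minutes sit in their own capturing groups, the same expression can also pull the values out for you. The following sketch (the TryGetTime helper is hypothetical) reads Groups(1) and Groups(2) from a Match and turns them into a TimeSpan.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module TimeDemo

    ' Same pattern as the tip; group 1 holds the hour, group 2 the minutes.
    Private ReadOnly TimePattern As New Regex("^([0-1][0-9]|[2][0-3]):([0-5][0-9])$")

    ' Hypothetical helper: converts a valid HH:MM string into a TimeSpan.
    Function TryGetTime(value As String, ByRef result As TimeSpan) As Boolean
        result = TimeSpan.Zero
        If value Is Nothing Then Return False
        Dim m As Match = TimePattern.Match(value)
        If Not m.Success Then Return False
        Dim hours As Integer = Integer.Parse(m.Groups(1).Value)
        Dim minutes As Integer = Integer.Parse(m.Groups(2).Value)
        result = New TimeSpan(hours, minutes, 0)
        Return True
    End Function

    Sub Main()
        Dim t As TimeSpan
        If TryGetTime("17:57", t) Then
            Console.WriteLine(t.TotalMinutes)   ' 1077
        End If
    End Sub

End Module
```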

Identifying Valid Dates

It's often useful to identify valid dates, but, with IsDate now relegated to the uncool Microsoft.VisualBasic namespace, developers are looking for another method of checking.

This little expression checks that the format of a value matches XX/XX/YYYY, where XX can be either one or two digits, YYYY is always exactly four digits, and each separator can be either a slash or a dash. It's not foolproof, as the sample matches show, but it's useful as a stopgap check. (See "Checking for a Date the Intelligent .NET Way" in Chapter 7 for a more sound method of checking for a valid date.)

XX/XX/YYYY Date Checker Regular Expression

Expression: ^\d{1,2}(\/|-)\d{1,2}(\/|-)\d{4}$

Sample matches: 1/1/2004, 20/05/1975, 99/99/9999

Sample nonmatches: 1/1/04, 001/01/2004, 08.08.2004

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "05/02/2004"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch( _
        strValue, "^\d{1,2}(\/|-)\d{1,2}(\/|-)\d{4}$")
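One way to close the gap that 99/99/9999 exposes is to run the format check first and then ask the framework whether the date really exists. The sketch below swaps in DateTime.TryParseExact for that second step; it is not the Chapter 7 technique, and it assumes day-first (British) dates, so swap d and M in the format strings for U.S. month-first input. IsRealDate is a hypothetical name.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module DateDemo

    ' The tip's format check (slash or dash separators).
    Private ReadOnly DatePattern As New Regex("^\d{1,2}(\/|-)\d{1,2}(\/|-)\d{4}$")

    ' Assumption: day comes first. "d" and "M" accept one or two digits when parsing.
    Private ReadOnly DateFormats As String() = {"d/M/yyyy", "d-M-yyyy"}

    ' Hypothetical helper: format check, then a real calendar check.
    Function IsRealDate(value As String) As Boolean
        If value Is Nothing OrElse Not DatePattern.IsMatch(value) Then Return False
        Dim parsed As DateTime
        Return DateTime.TryParseExact(value, DateFormats, _
            CultureInfo.InvariantCulture, DateTimeStyles.None, parsed)
    End Function

    Sub Main()
        Console.WriteLine(IsRealDate("20/05/1975"))  ' True
        Console.WriteLine(IsRealDate("99/99/9999"))  ' False: passes the regex, not a real date
        Console.WriteLine(IsRealDate("29/02/2003"))  ' False: 2003 was not a leap year
    End Sub

End Module
```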

File Path and Extension Check

Imagine that the user types in a filename. You may not want to use the typed file just yet (perhaps it doesn't exist), but you can still check that its path format is correct and that the file extension is appropriate. The following expression does just that, and even handles network locations.

File Path Regular Expression

Expression: ^([a-zA-Z]\:|\\)\\([^\\]+\\)*[^\/:*?"<>|]+\.DOC(l)?$ (alter the DOC here to your valid file extension; use IgnoreCase)

Sample matches: c:\data.doc, e:\whitecliff\staff\km\file.DOC, \\network\km\file.doc

Sample nonmatches: c:\, c:\myreport.txt, sitrep.doc

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "c:\files\report.doc"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch( _
        strValue, "^([a-zA-Z]\:|\\)\\([^\\]+\\)*[^\/:*?""<>|]+\.doc(l)?$", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)

Checking for Repeated Words

Code for handling regular expressions is pretty standard. In fact, there are three core techniques: using .IsMatch, as we have so far; using .Replace, as we will shortly; and retrieving a list of matches from a chunk of text and then cycling through the results. That's the format adopted by the following regular expression, which checks for repeated words in a string (especially useful for error checking). This expression works using back references and word boundaries, and is a little more complex than the ones we've seen so far.

Repeated-Words Regular Expression

Expression: \b(\w+)\s+\1\b (use IgnoreCase)

Sample matches: apple apple, and the the views were amazing, Truly truly

Sample nonmatches: karl, an occurrence, didn't didn't (apostrophes, hyphens, and most other nonalphabetic characters split words)

Sample VB .NET code:

    ' Setup class references
    Dim objRegEx As System.Text.RegularExpressions.Regex
    Dim objMatch As System.Text.RegularExpressions.Match
    ' Create regular expression; \1 refers back to the word captured by (\w+)
    objRegEx = New System.Text.RegularExpressions.Regex( _
        "\b(\w+)\s+\1\b", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase _
        Or System.Text.RegularExpressions.RegexOptions.Compiled)
    ' Match our text with the expression
    objMatch = objRegEx.Match("Why didn't they they ask Evans?")
    ' Loop through matches and display captured portion in MessageBox
    While objMatch.Success
        Dim strMatch As String
        strMatch = objMatch.Groups(1).ToString
        MessageBox.Show(strMatch)
        objMatch = objMatch.NextMatch()
    End While
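The three core techniques described at the start of this tip can all be applied to the same repeated-words expression. This minimal sketch, assuming a console project (the module and function names are illustrative), uses IsMatch to detect a doubled word, Matches with For Each as an alternative to the NextMatch loop, and Replace with the $1 substitution to collapse each doubled pair into a single word.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module RepeatedWordsDemo

    Private ReadOnly RepeatedWord As New Regex("\b(\w+)\s+\1\b", RegexOptions.IgnoreCase)

    ' Technique 1: IsMatch. Does the text contain any doubled word?
    Function HasRepeatedWord(text As String) As Boolean
        Return RepeatedWord.IsMatch(text)
    End Function

    ' Technique 2: Matches. List every doubled word (captured group 1).
    Function ListRepeatedWords(text As String) As List(Of String)
        Dim words As New List(Of String)
        For Each m As Match In RepeatedWord.Matches(text)
            words.Add(m.Groups(1).Value)
        Next
        Return words
    End Function

    ' Technique 3: Replace. Keep only the first word of each doubled pair ($1).
    Function RemoveRepeatedWords(text As String) As String
        Return RepeatedWord.Replace(text, "$1")
    End Function

    Sub Main()
        Dim sample As String = "Why didn't they they ask Evans?"
        Console.WriteLine(HasRepeatedWord(sample))      ' True
        Console.WriteLine(RemoveRepeatedWords(sample))  ' Why didn't they ask Evans?
    End Sub

End Module
```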

Getting Capitalized Words

This next expression is relatively simple and could have a number of different uses, like identifying proper nouns such as names and places, or picking out keywords for summarizing a document. It matches words that begin with a capital letter and have the rest of the word in lowercase. Watch out, however: this expression uses word boundaries, defining a word as an alphabetic string separated by nonalphabetic characters. This means that Didn't is classed as two separate words ("Didn" and "t"). Worth keeping in mind.

Capitalized-Word Regular Expression

Expression: (\b[^\Wa-z0-9_][^\WA-Z0-9_]*\b)

Sample matches: Amazing, Peter, Bonbon

Sample nonmatches: james, VB .NET, BizArre

Sample VB .NET code:

    ' Setup class references
    Dim objRegEx As System.Text.RegularExpressions.Regex
    Dim objMatch As System.Text.RegularExpressions.Match
    ' Create regular expression
    objRegEx = New System.Text.RegularExpressions.Regex( _
        "(\b[^\Wa-z0-9_][^\WA-Z0-9_]*\b)", _
        System.Text.RegularExpressions.RegexOptions.Compiled)
    ' Match our text with the expression
    objMatch = objRegEx.Match("VB.NET Secrets is A ReAl amazin' Read!")
    ' Loop through matches and display matching value in MessageBox
    While objMatch.Success
        Dim strMatch As String
        strMatch = objMatch.Value
        MessageBox.Show(strMatch)
        objMatch = objMatch.NextMatch()
    End While

Matching Numbers from a String

Being able to pick out the numbers from a string is a highly useful type of artificial intelligence. You may wish to automatically parse appropriate account numbers from an email message and suggest them to your customer service advisor, for example. This expression enables you to pass it a string and return individual matches on all the sets of numerical data contained therein. The expression is actually similar to the Real Number Matcher tip earlier, but, instead of just checking whether one string matches, our code searches and returns all matches in a string.

Matching-Numbers Regular Expression

Expression: (\d+\.?\d*|\.\d+)

Sample matches: 75, 0.01, 812.15

Sample nonmatches: one, eight pounds, £-.

Sample VB .NET code:

    ' Setup class references
    Dim objRegEx As System.Text.RegularExpressions.Regex
    Dim objMatch As System.Text.RegularExpressions.Match
    ' Create regular expression
    objRegEx = New System.Text.RegularExpressions.Regex( _
        "(\d+\.?\d*|\.\d+)", _
        System.Text.RegularExpressions.RegexOptions.Compiled)
    ' Match our text with the expression
    objMatch = objRegEx.Match("John is 50. Acct #32315. Owes .21.")
    ' Loop through matches and display matching value in MessageBox
    While objMatch.Success
        Dim strMatch As String
        strMatch = objMatch.Value
        MessageBox.Show(strMatch)
        objMatch = objMatch.NextMatch()
    End While
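As written, \.? also picks up a sentence-ending full stop, so the sample text yields 50. and 32315. rather than 50 and 32315. The sketch below collects the matches into a list using Regex.Matches and trims that trailing dot; whether you want such values trimmed is an assumption, and ExtractNumbers is a hypothetical helper.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module NumberExtractDemo

    Private ReadOnly NumberPattern As New Regex("(\d+\.?\d*|\.\d+)")

    ' Collects every number in the text. TrimEnd removes a full stop that
    ' \.? swallowed at the end of a sentence, such as the one in "50.".
    Function ExtractNumbers(text As String) As List(Of String)
        Dim results As New List(Of String)
        For Each m As Match In NumberPattern.Matches(text)
            results.Add(m.Value.TrimEnd("."c))
        Next
        Return results
    End Function

    Sub Main()
        For Each n As String In ExtractNumbers("John is 50. Acct #32315. Owes .21.")
            Console.WriteLine(n)   ' 50, then 32315, then .21
        Next
    End Sub

End Module
```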

"Who Are You?" Name Checker

I like this one. It's cool. It's intelligent. It's a name checker: simply pass in the first and/or last name of a person in any case, and it'll match on a valid name. It's not perfect, but it'll filter out many incorrect name formats, such as those containing invalid characters or numerical data. It's great for quickly removing invalid entries in your mailing list or deleting spam from your inbox.

Valid-Name Regular Expression

Expression: ^[a-zA-Z]+(([\'\,\.\- ][a-zA-Z ])?[a-zA-Z]*)*$

Sample matches: K. Moore, Mike O'Brien, Elizabeth Du-Banter

Sample nonmatches: Karl01, Mike_brien, john--doe

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "Visual Basic .NET"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch( _
        strValue, "^[a-zA-Z]+(([\'\,\.\- ][a-zA-Z ])?[a-zA-Z]*)*$")

Naughty-Word Filter Expression

If you're running a site where users are allowed to make unmoderated posts, you may need to screen offensive language. This regular expression can help filter even the naughtiest of swear words. It matches whole words only, and, used with .Replace, swaps each one for the phrase of your choice. You simply need to replace the sample words with your own, separating each entry with the pipe character (|).

Profanity Filter Regular Expression

Expression: (\bdamn\b)|(\bhell\b) (change to use your own words; use IgnoreCase where appropriate)

Sample matches: damn, to hell with that, you are a Hell's Angel

Sample nonmatches: be damned, hello squire, hellish bean

Sample VB .NET code:

    Dim objRegEx As New System.Text.RegularExpressions.Regex( _
        "(\bdamn\b)|(\bhell\b)", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)
    Dim strValue As String = "Oh hell, fiend, damn you!"
    Dim strResult As String = _
        objRegEx.Replace(strValue, "#@%$!!")
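Maintaining the word list inside a single pattern string soon gets fiddly. The following sketch, with hypothetical BuildFilter and Censor helpers, assembles the same (\bword\b)|(\bword\b) shape from an array, uses Regex.Escape so that any punctuation in a word is treated literally, and passes a MatchEvaluator lambda (VB 2008 or later) to Replace so each word is masked with asterisks of the same length instead of a fixed phrase.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module WordFilterDemo

    ' Builds "(\bword1\b)|(\bword2\b)|..." from a list, the same shape as the tip's
    ' expression. Regex.Escape stops characters such as "." or "*" being read as syntax.
    Function BuildFilter(words As String()) As Regex
        Dim parts As New List(Of String)
        For Each word As String In words
            parts.Add("(\b" & Regex.Escape(word) & "\b)")
        Next
        Return New Regex(String.Join("|", parts.ToArray()), RegexOptions.IgnoreCase)
    End Function

    ' Replaces each match with asterisks of the same length via a MatchEvaluator lambda.
    Function Censor(wordFilter As Regex, text As String) As String
        Return wordFilter.Replace(text, Function(m As Match) New String("*"c, m.Length))
    End Function

    Sub Main()
        Dim wordFilter As Regex = BuildFilter(New String() {"damn", "hell"})
        Console.WriteLine(Censor(wordFilter, "Oh hell, fiend, damn you!"))  ' Oh ****, fiend, **** you!
        Console.WriteLine(Censor(wordFilter, "hello squire"))               ' hello squire
    End Sub

End Module
```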

True Email Address Validation

You'll find many email address regular expressions winging around the Web these days, but they never really take into account the variety of addresses that are available, and usually cater only to mainstream users with AOL accounts. Those with email addresses based on an IP address, or using little-known country codes, often get ignored. The following regular expression, however, follows the email address format more closely and should match on the vast majority of valid addresses, including those two awkward cases.

Real Email Regular Expression

Expression: ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$

Sample matches: karl@karlmoore.com, foo12@bar6.edu, a.lan@bury.tv

Sample nonmatches: karl, @karlmoore.com, john@johnsplace

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "karl@karlmoore.com"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch(strValue, _
        "^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|" & _
        "(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$")

Validating a Web Site Address

Looking to ensure that your user has entered a correctly formatted Web site address, including the HTTP? This is the regular expression for you.

Web Site Address Regular Expression

Expression: ^(http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?)$

Sample matches: http://www.karlmoore.com, http://mtv.com/

Sample nonmatches: www.yahoo.com, cnn.com

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "http://www.bbc.co.uk/"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch(strValue, _
        "^(http://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]*)?)$")

Internet URL Matcher: FTP, HTTP, HTTPS

Most supposed URL regular expressions don't match against the likes of ftp:// and https://. But not this one: it will match against all ftp://, http://, and https:// URLs. Here, I'm also using three sets of code to demonstrate the expression: first, a simple match; second, cycling through a document and extracting all matching URLs; and, third, hyperlinking all matches using the .Replace function with the $0 ("entire match goes here") substitution.

FTP/HTTP/HTTPS URL Regular Expression

Expression: (http|ftp|https):\/\/[\w]+(.[\w]+)([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?

Sample matches: https://www.karlmoore.com, ftp://ftp.download.com/

Sample nonmatches: www.yahoo.com, cnn.com, HTTP://www.km.com/

Sample VB .NET code #1:

    Dim blnMatch As Boolean, strValue As String = "ftp://files.mysite.com/"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch(strValue, _
        "^((http|ftp|https):\/\/[\w]+(.[\w]+)" & _
        "([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?)$")
TOP TIP  

You may notice that the expression used here is slightly different from the one listed previously. That's because we're tightening it to ensure that our string matches exactly and doesn't contain any excess. You'll find a full explanation later in this chapter, in "The Simple Cure for Loose Expressions" tip.

Sample VB .NET code #2:

    ' Setup class references
    Dim objRegEx As System.Text.RegularExpressions.Regex
    Dim objMatch As System.Text.RegularExpressions.Match
    ' Create regular expression
    objRegEx = New System.Text.RegularExpressions.Regex( _
        "(http|ftp|https):\/\/[\w]+(.[\w]+)([\w\-\.,@?^=" & _
        "%&:/~\+#]*[\w\-\@?^=%&/~\+#])?", _
        System.Text.RegularExpressions.RegexOptions.Compiled)
    ' Match our text with the expression
    objMatch = objRegEx.Match("<html> http://www.samplesite.com/ " & _
        "etc etc ftp://samplesite.com/ </html>")
    ' Loop through matches and display matching value in MessageBox
    While objMatch.Success
        Dim strMatch As String
        strMatch = objMatch.Value
        MessageBox.Show(strMatch)
        objMatch = objMatch.NextMatch()
    End While

Sample VB .NET code #3:

    ' Setup objRegEx, passing in expression ("pattern")
    Dim objRegEx As New System.Text.RegularExpressions.Regex( _
        "(http|ftp|https):\/\/[\w]+(.[\w]+)([\w\-\.,@?^=" & _
        "%&:/~\+#]*[\w\-\@?^=%&/~\+#])?")
    ' Our actual value
    Dim strValue As String = "<html> http://www.samplesite.tv/ " & _
        "etc etc ftp://www.samplesite.com/ </html>"
    ' Finding matches and replacing with "HREF" and actual match ($0)
    Dim strResult As String = _
        objRegEx.Replace(strValue, "<a href=""$0"">$0</a>")

Checking for a Valid Domain

Whether you're running your own domain sales service or are making regular automated updates to your DNS zone files, being able to check that a domain adheres to a valid format is an important task, and it's one this next regular expression will help you accomplish.

Note that this expression will allow something like www.karlmoore.com, as it could theoretically be a valid subdomain. If you don't want this to occur, you may wish to check whether the first four characters are "www." and, if so, strip them out.

Domain-Format Regular Expression

Expression: ^[a-zA-Z0-9]+([a-zA-Z0-9\-\.]+)?\.(com|org|net|mil|edu|info)$ (add more domain suffixes in the same format, as required; use IgnoreCase)

Sample matches: karlmoore.com, secure.whitecliff.net, my-valid-domain.edu

Sample nonmatches: karlmoore!.com, kmcom, test.co.uk, 127.0.0.1

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "example.com"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch( _
        strValue, "^[a-zA-Z0-9]+([a-zA-Z0-9\-\.]+)?\.(com|org|net|mil|edu|info)$", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)

IP Address Checker

It's not the most common requirement in the world, but validating the format of an IP address can be awfully difficult if you're doing it in code. With a regular expression, however, it's easy, as this next number demonstrates.

IP Address Regular Expression

Expression: ^(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\.(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])$

Sample matches: 127.0.0.1, 255.255.255.0, 168.87.32.1

Sample nonmatches: 256.0.0.1, 123.abc.etc.1, 0.255.255.255

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "123.456.789.101"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch(strValue, _
        "^(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9])\." & _
        "(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\." & _
        "(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[1-9]|0)\." & _
        "(25[0-5]|2[0-4][0-9]|[0-1]{1}[0-9]{2}|[1-9]{1}[0-9]{1}|[0-9])$")

Extracting Links and Images from HTML

If you're looking to extract all the links or images from an HTML string, you're in luck. I've got just the tip for you. Simply flick back to Chapter 7, "More .NET Secrets," and check out the regular-expression-based "Tricks of Parsing a Web Page for Links and Images" tip in the Internet section.

Checking HTML Color Codes

This is another one of those expressions that, although you probably won't use it all too often, is a lifesaver when you do. This baby checks for valid HTML hexadecimal color codes. Whether they're in three-digit shorthand form (which can express all 216 Web-safe colors) or in full six-digit form, hash (#) or no hash, this expression will check their validity. It's useful if you're putting together your own graphics or HTML application, or are allowing users to customize colors on your site.

HTML Color Code Regular Expression

Expression: ^#?([a-f]|[A-F]|[0-9]){3}(([a-f]|[A-F]|[0-9]){3})?$

Sample matches: 00FF00, #0000FF, #039

Sample nonmatches: orange, 0x000000, #000FF

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "#0000FF"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch(strValue, _
        "^#?([a-f]|[A-F]|[0-9]){3}(([a-f]|[A-F]|[0-9]){3})?$")

Credit Card Validation

Imagine that you're designing an offline orders system. Your salespeople want to be able to record credit card details for debiting later. Wouldn't it be nice to at least check the basic format of the card number to ensure greater accuracy? This expression does just that, matching on all major credit cards, including VISA (length of 16 characters, prefix of 4), MasterCard (length of 16, prefix between 51 and 55), Discover (length of 16, prefix 6011), and American Express (length of 15, prefix 34 or 37). For the 16-digit cards, you can even insert optional hyphens or spaces between each group of four digits.

Credit Card Regular Expression

Expression: ^(?:((?:4\d{3})|(?:5[1-5]\d{2})|(?:6011)|(?:3[68]\d{2})|(?:30[012345]\d))[ -]?(\d{4})[ -]?(\d{4})[ -]?(\d{4})|3[47]\d{13})$

Sample matches: 6011-1234-5678-1234 (Discover), 5401 2345 6789 1011 (MasterCard)

Sample nonmatches: 3401-123-456-789, 3411-5555-3333-111

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "5420206023965209"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch(strValue, _
        "^(?:((?:4\d{3})|(?:5[1-5]\d{2})|(?:6011)|" & _
        "(?:3[68]\d{2})|(?:30[012345]\d))[ -]?(\d{4})" & _
        "[ -]?(\d{4})[ -]?(\d{4})|3[47]\d{13})$")
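The expression checks only the layout of a card number, so a single mistyped digit still passes. Card numbers also carry a check digit that can be verified with the Luhn (mod 10) algorithm, which is a separate technique from the tip's regular expression. A minimal sketch follows, assuming the value has already passed the format check; PassesLuhn is a hypothetical name, and 4111 1111 1111 1111 is a widely used test number.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LuhnDemo

    ' Luhn (mod 10) checksum over the digits of a card number.
    Function PassesLuhn(cardNumber As String) As Boolean
        If cardNumber Is Nothing Then Return False
        ' Drop the optional spaces and hyphens the format expression allows.
        Dim digits As String = Regex.Replace(cardNumber, "[ -]", "")
        If digits.Length = 0 Then Return False

        Dim sum As Integer = 0
        Dim doubleIt As Boolean = False
        ' Walk from the rightmost digit, doubling every second digit.
        For i As Integer = digits.Length - 1 To 0 Step -1
            Dim c As Char = digits(i)
            If c < "0"c OrElse c > "9"c Then Return False
            Dim d As Integer = AscW(c) - AscW("0"c)
            If doubleIt Then
                d *= 2
                If d > 9 Then d -= 9
            End If
            sum += d
            doubleIt = Not doubleIt
        Next
        Return sum Mod 10 = 0
    End Function

    Sub Main()
        Console.WriteLine(PassesLuhn("4111 1111 1111 1111"))  ' True
        Console.WriteLine(PassesLuhn("4111 1111 1111 1112"))  ' False
    End Sub

End Module
```

Used together, the regular expression rejects badly shaped input and the checksum catches most single-digit typing errors; neither tells you whether the account actually exists.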

Password Format Enforcing Expression

Numerous password-related regular expressions are available on the Internet. This is one of my personally created favorites, however. It checks that the first character of the password is a letter, that the password is between 4 and 15 characters in length, and that only letters, numbers, and the underscore character are used. Anything else, and it's no match.

Password Format Regular Expression

Expression: ^[a-zA-Z]\w{3,14}$

Sample matches: abcd, SuPeR, password24

Sample nonmatches: oh boy, 12km, longinvalidpassword

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "MyInvalidPassword"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch(strValue, _
        "^[a-zA-Z]\w{3,14}$")

Defining Your Own HTML: Custom Tags, with Expressions

I run an online forum and know how tricky it can be allowing users to post HTML. They can send the whole page design haywire; they insert little scripts to make windows pop up; they turn my scrollbar a funny color. In short, those few naughty users screw things up, and that's why I and numerous other Webmasters enjoy implementing a little pseudo-HTML.

How? They design a number of their own tags and get users to type them directly, such as [url]http://www.yahoo.com/[/url]. When the pages are served up, these tags are processed using regular expressions, and regular HTML is put in their place. You can use buttons to automatically insert the tags through JavaScript (see how you compose a post at my own www.vbforums.com for an example), and can even embed one tag inside another.

This tip shares a number of the most commonly requested tags, encapsulated in one chunk of commented sample code. After you've figured out the general structure of these custom tags, you'll find it easy to customize them to your exact requirements.

TOP TIP  

If you want to strip any HTML initially entered by the user, check out the "Converting HTML to Text, Easily" tip in the Internet section of Chapter 7 ("More .NET Secrets").

Custom Tag Regular Expressions

Sample VB .NET code:

    ' YourData holds the text posted by the user. Each Replace works on the
    ' result of the previous one, so every kind of tag gets converted.
    Dim strResult As String = YourData

    '  Turns [url]http://www.cnn.com/[/url]
    '  into <a target="_blank" href="http://www.cnn.com/">http://www.cnn.com/</a>
    '  (plain [url] tags are handled first so that the [url=...] pattern
    '  below cannot stretch across one of them)
    strResult = System.Text.RegularExpressions.Regex.Replace(strResult, _
        "\[URL\]([^\""]*?)\[/URL\]", _
        "<a target=""_blank"" href=""$1"">$1</a>", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)

    '  Turns [url=http://www.cnn.com/]CNN[/url]
    '  into <a target="_blank" href="http://www.cnn.com/">CNN</a>
    strResult = System.Text.RegularExpressions.Regex.Replace(strResult, _
        "\[url[^>]*?=\s*?[""']?([^'"" >]+?)[ '""]?\]([^\""]*?)\[/url\]", _
        "<a target=""_blank"" href=""$1"">$2</a>", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)

    '  Turns [email]myname@domain.com[/email]
    '  into <a href="mailto:myname@domain.com">myname@domain.com</a>
    strResult = System.Text.RegularExpressions.Regex.Replace(strResult, _
        "\[email\]([^\""]*?)\[/email\]", _
        "<a href=""mailto:$1"">$1</a>", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)

    '  Turns [email=myname@domain.com]Click here to email me[/email]
    '  into <a href="mailto:myname@domain.com">Click here to email me</a>
    strResult = System.Text.RegularExpressions.Regex.Replace(strResult, _
        "\[email[^>]*?=\s*?[""']?([^'"" >]+?)[ '""]?\]([^\""]*?)\[/email\]", _
        "<a href=""mailto:$1"">$2</a>", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)

    ' [b]some text[/b] produces bold text
    strResult = System.Text.RegularExpressions.Regex.Replace(strResult, _
        "\[b\]([^\""]*?)\[/b\]", "<b>$1</b>", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)

    ' [u]some text[/u] produces underlined text
    strResult = System.Text.RegularExpressions.Regex.Replace(strResult, _
        "\[u\]([^\""]*?)\[/u\]", "<u>$1</u>", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)

    ' [i]some text[/i] produces italicised text
    strResult = System.Text.RegularExpressions.Regex.Replace(strResult, _
        "\[i\]([^\""]*?)\[/i\]", "<i>$1</i>", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)

    ' [color=blue]some text[/color] produces blue text
    strResult = System.Text.RegularExpressions.Regex.Replace(strResult, _
        "\[color[^>]*?=\s*[""']?([^'"" >]+?)[ '""]?\]([^\""]*?)\[/color\]", _
        "<font color=""$1"">$2</font>", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)

    ' [size=4]some text[/size] produces size 4 text
    strResult = System.Text.RegularExpressions.Regex.Replace(strResult, _
        "\[size[^>]*?=\s*[""']?([^'"" >]+?)[ '""]?\]([^\""]*?)\[/size\]", _
        "<font size=""$1"">$2</font>", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)

    ' [font=courier]some text[/font] produces Courier font text
    strResult = System.Text.RegularExpressions.Regex.Replace(strResult, _
        "\[font[^>]*?=\s*[""']?([^'"" >]+?)[ '""]?\]([^\""]*?)\[/font\]", _
        "<font face=""$1"">$2</font>", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)

    '  Turns [img]http://www.karlmoore.com/km/images/km_splash.gif[/img]
    '  into <img src="http://www.karlmoore.com/km/images/km_splash.gif">
    strResult = System.Text.RegularExpressions.Regex.Replace(strResult, _
        "\[img\]([^\""]*?)\[/img\]", "<img src=""$1"">", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)

    '  Turns [quote]Do or do not, there is no try.[/quote]
    '  into <blockquote><p><hr>Do or do not, there is no try.<hr></p></blockquote>
    strResult = System.Text.RegularExpressions.Regex.Replace(strResult, _
        "\[quote\]([^\""]*?)\[/quote\]", _
        "<blockquote><p><hr>$1<hr></p></blockquote>", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)

ISBN Checker

Practically every book you'll find in your local library has an International Standard Book Number (ISBN), and they all adhere to one simple format: ten characters, the first nine of which are digits, with the last being either a digit or an X. And that's exactly what this next regular expression checks for. This expression is useful for error checking your media library database, or for those running the next Amazon.com.

ISBN Regular Expression

Expression: ^\d{9}[\dX]$

Sample matches: 1234567890, 159059021X, 123456789X

Sample nonmatches: 159059021-X, IsbnNumber, X1234568X

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "159059021X"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch(strValue, _
        "^\d{9}[\dX]$")
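Like the credit card expression, this one checks format only: the last character of an ISBN-10 is a check digit. Multiplying the ten characters by the weights 10 down to 1 (with X counting as 10) must give a total divisible by 11. The following sketch adds that calculation; it uses [0-9] rather than \d so that only ASCII digits reach the arithmetic, and IsValidIsbn10 is a hypothetical helper.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module IsbnDemo

    ' ISBN-10 check: weights 10 down to 1; X stands for 10 and may appear only last.
    Function IsValidIsbn10(isbn As String) As Boolean
        If isbn Is Nothing OrElse Not Regex.IsMatch(isbn, "^[0-9]{9}[0-9X]$") Then
            Return False
        End If
        Dim sum As Integer = 0
        For i As Integer = 0 To 9
            Dim value As Integer
            If isbn(i) = "X"c Then
                value = 10
            Else
                value = AscW(isbn(i)) - AscW("0"c)
            End If
            sum += value * (10 - i)
        Next
        Return sum Mod 11 = 0
    End Function

    Sub Main()
        Console.WriteLine(IsValidIsbn10("159059021X"))  ' True
        Console.WriteLine(IsValidIsbn10("1234567890"))  ' False: right format, wrong check digit
    End Sub

End Module
```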

Is That a GUID?

Globally unique identifiers (GUIDs) are strings of hexadecimal characters grouped into blocks of 8-4-4-4-12 characters, separated by dashes. This regular expression checks the general format of a GUID: if the value contains letters or numbers in this sequence, along with the dashes, it matches. (Note that it accepts any letter, not just the hexadecimal A to F.)

GUID Regular Expression

Expression: ^[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}$ (use IgnoreCase)

Sample matches: 87374201-1CB6-46A2-AEAB-C0F2F8ABA75D

Sample nonmatches: 873742011CB646A2AEABC0F2F8ABA75D

Sample VB .NET code:

    Dim strValue As String = System.Guid.NewGuid().ToString()
    Dim blnMatch As Boolean
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch(strValue, _
        "^[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}$", _
        System.Text.RegularExpressions.RegexOptions.IgnoreCase)
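If you'd rather accept only genuine hexadecimal GUIDs, narrowing [A-Z0-9] to [0-9A-F] is enough. The sketch below (module and function names are illustrative) runs both versions against a value containing non-hex letters and against a freshly generated GUID.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GuidDemo

    ' The tip's pattern: any letter A-Z (with IgnoreCase) in 8-4-4-4-12 groups.
    Private Const LoosePattern As String = _
        "^[A-Z0-9]{8}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{4}-[A-Z0-9]{12}$"

    ' Stricter variant: GUIDs are hexadecimal, so only 0-9 and A-F are allowed.
    Private Const HexPattern As String = _
        "^[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}$"

    Function IsLooseGuid(value As String) As Boolean
        Return Regex.IsMatch(value, LoosePattern, RegexOptions.IgnoreCase)
    End Function

    Function IsHexGuid(value As String) As Boolean
        Return Regex.IsMatch(value, HexPattern, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        Dim notHex As String = "ZZZZZZZZ-1CB6-46A2-AEAB-C0F2F8ABA75D"
        Console.WriteLine(IsLooseGuid(notHex))                    ' True
        Console.WriteLine(IsHexGuid(notHex))                      ' False
        Console.WriteLine(IsHexGuid(Guid.NewGuid().ToString()))   ' True
    End Sub

End Module
```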

U.S. ZIP Code Checker

ZIP codes in the United States come in two different formats: the older five-digit type (90210), and the newer and more complex nine-digit version, the ZIP+4, divided by a dash into parts of five then four (60126-8722). This regular expression checks for both, and returns a match if valid.

U.S. ZIP Code Regular Expression

Expression: ^(\d{5}(-\d{4})?)$

Sample matches: 90210, 60126-8722, 10101

Sample nonmatches: 9O21O (letter O rather than zero), 60126-87234, UsZip-Code

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "90210"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch(strValue, _
        "^(\d{5}(-\d{4})?)$")

U.S. Social Security Number Checker

Social Security numbers in the United States always follow a definite pattern: nine digits in total, grouped with hyphens as NNN-NN-NNNN. The following expression returns a match when compared with a set of characters that exactly matches this format.

U.S. Social Security Regular Expression

Expression: ^\d{3}-\d{2}-\d{4}$

Sample matches: 123-45-6789, 101-56-5032, 103-49-1232

Sample nonmatches: 123456789, 123-456-789, Social_Security

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "123-45-6789"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch(strValue, _
        "^\d{3}-\d{2}-\d{4}$")

U.S. Phone Number Checker

U.S. telephone numbers can be entered in a variety of different formats. This neat little expression enforces a little standardization, matching only the following formats: NNN-NNN-NNNN (with an area code) and NNN-NNNN (a local number without one).

U.S. Phone Regular Expression

Expression: ^(?:\d{3}-)?\d{3}-\d{4}$

Sample matches: 555-555-5555, 555-5555, 123-456-7890

Sample nonmatches: 1231231234, 123 456 7890, 001.123.4567

Sample VB .NET code:

    Dim blnMatch As Boolean, strValue As String = "(123) - 123-1234"
    blnMatch = System.Text.RegularExpressions.Regex.IsMatch(strValue, _
        "^(?:\d{3}-)?\d{3}-\d{4}$")

U.S. State Checker

Two-character state abbreviations are common input fields on data entry forms. But it's a nightmare validating whether the user has entered a correct value. Wouldn't it be useful to encompass all that logic in one handy regular expression? Here it is. This matches on any valid two-character state abbreviation, from Washington (WA) to California (CA).

U.S. State Regular Expression

Expression: ^A[KLRZ]$|^C[AOT]$|^D[CE]$|^FL$|^GA$|^HI$|^I[ADLN]$|^K[SY]$|^LA$|^M[ADEINOST]$|^N[CDEHJMVY]$|^O[HKR]$|^PA$|^RI$|^S[CD]$|^T[NX]$|^UT$|^V[AT]$|^W[AIVY]$

Sample matches: MI, AK, OH

Sample nonmatches: OD, Michigan, S.C.

Sample VB .NET code:

 Dim blnMatch As Boolean, strValue As String = "CA"
 blnMatch = System.Text.RegularExpressions.Regex.IsMatch( _
     strValue, "^A[KLRZ]$|^C[AOT]$|^D[CE]$|^FL$|^GA$|^HI$|^I[ADLN]$|" & _
     "^K[SY]$|^LA$|^M[ADEINOST]$|^N[CDEHJMVY]$|^O[HKR]$|" & _
     "^PA$|^RI$|^S[CD]$|^T[NX]$|^UT$|^V[AT]$|^W[AIVY]$")
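Writing ^ and $ around every alternative works, but you can express the same idea by grouping the alternatives once and anchoring the group, which is easier to maintain. The sketch below is a grouped rewrite along those lines, not the expression as printed. StateCheck and IsUsState are hypothetical names, and the N group here covers NC, ND, NE, NH, NJ, NM, NV, and NY. The sketch also shows that matching is case-sensitive unless you pass RegexOptions.IgnoreCase.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module StateCheck
    ' Anchors factored out of the alternation: ^(?: ... )$ applies to every
    ' alternative at once, so each one no longer needs its own ^ and $
    Public Const StatePattern As String = _
        "^(?:A[KLRZ]|C[AOT]|D[CE]|FL|GA|HI|I[ADLN]|K[SY]|LA|" & _
        "M[ADEINOST]|N[CDEHJMVY]|O[HKR]|PA|RI|S[CD]|T[NX]|UT|" & _
        "V[AT]|W[AIVY])$"

    ' Hypothetical helper wrapping the grouped expression
    Function IsUsState(ByVal value As String) As Boolean
        Return Regex.IsMatch(value, StatePattern)
    End Function

    Sub Main()
        For Each s As String In New String() {"MI", "AK", "OH", "OD", "Michigan", "S.C."}
            Console.WriteLine(s & " -> " & IsUsState(s).ToString())
        Next
        ' Case-sensitive by default: "ca" fails unless IgnoreCase is passed
        Console.WriteLine(IsUsState("ca"))                                             ' False
        Console.WriteLine(Regex.IsMatch("ca", StatePattern, RegexOptions.IgnoreCase))  ' True
    End Sub
End Module
```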

U.K. Postal Code Matcher

Americans attempting to validate U.K. postal codes must feel slightly dazed. There's no simple five-digit solution here. The United Kingdom has at least six different types of postal code, all quite normal and recognizable to the inhabitants of England, Wales, and Scotland: LN NLL, LLN NLL, LNN NLL, LLNN NLL, LLNL NLL, and LNL NLL, where L represents a letter and N a number. This cute little expression matches the lot.

U.K. Postal Code Regular Expression

Expression: ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$

Sample matches: DN14 8PX, W1C 7PQ, n1 8af

Sample nonmatches: M18-4NQ, S663 9AY, 90210

Sample VB .NET code:

 Dim blnMatch As Boolean, strValue As String = "DN14 8PX"
 blnMatch = System.Text.RegularExpressions.Regex.IsMatch(strValue, _
     "^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$")
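This expression has a couple of properties that are easy to miss. The a-z ranges mean lowercase postcodes pass, and " {0,1}" makes the space optional, so "DN148PX" is accepted too. The sketch below demonstrates both. PostcodeCheck and IsUkPostcode are hypothetical names.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PostcodeCheck
    ' Same pattern as above
    Public Const PostcodePattern As String = _
        "^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$"

    ' Hypothetical helper wrapping the expression above
    Function IsUkPostcode(ByVal value As String) As Boolean
        Return Regex.IsMatch(value, PostcodePattern)
    End Function

    Sub Main()
        Console.WriteLine(IsUkPostcode("DN14 8PX"))  ' True: LLNN NLL
        Console.WriteLine(IsUkPostcode("W1C 7PQ"))   ' True: LNL NLL
        Console.WriteLine(IsUkPostcode("n1 8af"))    ' True: a-z ranges accept lowercase
        Console.WriteLine(IsUkPostcode("DN148PX"))   ' True: " {0,1}" makes the space optional
        Console.WriteLine(IsUkPostcode("M18-4NQ"))   ' False: a hyphen is not a space
    End Sub
End Module
```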

U.K. National Insurance Number Check

The U.K. National Insurance number is the equivalent of the U.S. Social Security number. It does, however, have a slightly different format: LLNNNNNNL. The following expression checks for this exact combination of letters and numbers and returns a match as appropriate.

U.K. National Insurance Number Regular Expression

Expression: ^[A-Za-z]{2}[0-9]{6}[A-Za-z]{1}$

Sample matches: WC723814X, PF482301X, AI393150D

Sample nonmatches: 12345678X, P4F82301X, WC7238144X

Sample VB .NET code:

 Dim blnMatch As Boolean, strValue As String = "KM123456Y"
 blnMatch = System.Text.RegularExpressions.Regex.IsMatch(strValue, _
     "^[A-Za-z]{2}[0-9]{6}[A-Za-z]{1}$")
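Here's a sketch that runs this section's sample values through the expression. NiCheck and IsNiNumber are hypothetical names. Like the expression itself, the sketch checks only the LLNNNNNNL shape and knows nothing about which letter prefixes are actually issued.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NiCheck
    ' Two letters, six digits, one letter (LLNNNNNNL); format only
    Public Const NiPattern As String = "^[A-Za-z]{2}[0-9]{6}[A-Za-z]{1}$"

    ' Hypothetical helper wrapping the expression above
    Function IsNiNumber(ByVal value As String) As Boolean
        Return Regex.IsMatch(value, NiPattern)
    End Function

    Sub Main()
        ' The section's sample matches and nonmatches
        For Each s As String In New String() {"WC723814X", "PF482301X", "AI393150D", _
                                              "12345678X", "P4F82301X", "WC7238144X"}
            Console.WriteLine(s & " -> " & IsNiNumber(s).ToString())
        Next
    End Sub
End Module
```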

U.K. Telephone Number Validator

A U.K. landline or mobile telephone number can come in any of three core formats: NNN NNNN NNNN, NNNN NNN NNN, or NNNNN NNN NNN. This neat expression checks for each of these formats, allowing for optional spaces.

U.K. Telephone Number Regular Expression

Expression: (0\d{2} ?\d{4} ?\d{4}$)|(0\d{3} ?\d{3} ?\d{3}$)|(0\d{4} ?\d{3} ?\d{3}$)

Sample matches: 020 1234 1234, 01405 123 123, 01142 212 123

Sample nonmatches: 020-1234-1234, +44 20 1234 1234, 12345678

Sample VB .NET code:

 Dim blnMatch As Boolean, strValue As String = "020 1234 1234"
 blnMatch = System.Text.RegularExpressions.Regex.IsMatch(strValue, _
     "(0\d{2} ?\d{4} ?\d{4}$)|(0\d{3} ?\d{3} ?\d{3}$)|" & _
     "(0\d{4} ?\d{3} ?\d{3}$)")
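Notice that each alternative in this expression ends with $ but none starts with ^. As a result, it also accepts a valid number with extra text in front of it. The sketch below compares that loose form with a version that is grouped and anchored at both ends, using the ^( and )$ trick from "The Simple Cure for Loose Expressions" at the end of this chapter. UkPhoneCheck, LoosePattern, and TightPattern are hypothetical names.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module UkPhoneCheck
    ' Three alternatives, each anchored only at the end ($)
    Public Const LoosePattern As String = _
        "(0\d{2} ?\d{4} ?\d{4}$)|(0\d{3} ?\d{3} ?\d{3}$)|(0\d{4} ?\d{3} ?\d{3}$)"

    ' Same alternatives, grouped and anchored at both ends
    Public Const TightPattern As String = _
        "^((0\d{2} ?\d{4} ?\d{4})|(0\d{3} ?\d{3} ?\d{3})|(0\d{4} ?\d{3} ?\d{3}))$"

    Sub Main()
        Console.WriteLine(Regex.IsMatch("020 1234 1234", LoosePattern))       ' True
        Console.WriteLine(Regex.IsMatch("Tel: 020 1234 1234", LoosePattern))  ' True: no ^ anchor
        Console.WriteLine(Regex.IsMatch("Tel: 020 1234 1234", TightPattern))  ' False
        Console.WriteLine(Regex.IsMatch("01405 123 123", TightPattern))       ' True
    End Sub
End Module
```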

Converting American and British Dates

The Americans and British get along very well together. Their date formats, however, don't. An American would write 10/1/2004 for October 1st, 2004, whereas a Brit would read it as the 10th of January 2004. That's where the following regular expression comes in handy. It captures the day and month parts in named groups, and the Replace function then uses back references to those groups to switch them around, converting between the two formats.

American/British Date Regular Expression

Expression: \b(?<part1>\d{1,2})/(?<part2>\d{1,2})/(?<year>\d{2,4})\b

Sample matches: 5/12/02 (becomes 12/5/02), 20/05/2003 (becomes 05/20/2003), 31/12/2005 (becomes 12/31/2005)

Sample nonmatches: 5-12-02, 31/12/ 03, December 31, 2003

Sample VB .NET code:

 Dim strValue As String = "31/12/2002"
 ' Swap the day and month groups: strNewValue becomes "12/31/2002"
 Dim strNewValue As String = _
     System.Text.RegularExpressions.Regex.Replace(strValue, _
     "\b(?<part1>\d{1,2})/(?<part2>\d{1,2})/(?<year>\d{2,4})\b", _
     "${part2}/${part1}/${year}")
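The expression is bounded by \b rather than ^ and $, so Regex.Replace swaps every date it finds inside a longer piece of text, not just a lone date. The sketch below shows this. DateSwap and SwapDayMonth are hypothetical names. Note that the swap is symmetric, so the same call converts American to British and back again. It also checks only the N/N/NN shape, not whether the date actually exists.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module DateSwap
    ' Same pattern as above: named groups for the two day/month parts and the year
    Public Const DatePattern As String = _
        "\b(?<part1>\d{1,2})/(?<part2>\d{1,2})/(?<year>\d{2,4})\b"

    ' Hypothetical helper: swaps the first two parts of every date in the text
    Function SwapDayMonth(ByVal text As String) As String
        Return Regex.Replace(text, DatePattern, "${part2}/${part1}/${year}")
    End Function

    Sub Main()
        Dim strText As String = "Invoiced 31/12/2002, paid 5/1/2003."
        Console.WriteLine(SwapDayMonth(strText))
        ' Prints: Invoiced 12/31/2002, paid 1/5/2003.
    End Sub
End Module
```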

French, German, and Japanese Expressions

I can't claim credit for this next lot of regular expressions. They come directly from the sample expressions suggested by the ASP.NET RegularExpressionValidator control. (See the "Five Steps to Regular Expressions" tip in Chapter 3 for more information.) They're listed here for reference only, minus example code:

  • French phone number: (0( \d|\d ))?\d\d \d\d(\d \d| \d\d )\d\d

  • French postal code: \d{5}

  • German phone number: ((\(0\d\d\) |(\(0\d{3}\) )?\d )?\d\d \d\d \d\d|\(0\d{4}\) \d \d\d-\d\d?)

  • German postal code: (D-)?\d{5}

  • Japanese phone number: (0\d{1,4}-|\(0\d{1,4}\) ?)?\d{1,4}-\d{4}

  • Japanese postal code: \d{3}(-(\d{4}|\d{2}))?

The Simple Cure for Loose Expressions

If you re writing code using regular expressions you ve extracted from books or found on the Internet, you may need to do just a little work before sticking them into your application.

Imagine, for example, that you decided to use the Microsoft regular expression for validating a French postal code, \d{5}, which asks for five simple digits. You'd expect IsMatch to return True on something like "12345". But you wouldn't think it'd match on "ABC12345XYZ", now would you?

Well, it does.

Why? Because the expression is too loose. So long as the pattern occurs anywhere in the string, IsMatch gives your code the official nod. The fact that your database may end up full of completely corrupt data is of no consequence.

TOP TIP  

If the IsMatch function returns True, you can find out which part is matching through the Match function. If multiple parts match, you can cycle through them all using the collection returned by the Matches function.
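Here's a minimal sketch of that tip, using the \d{5} expression from above on a string that contains two five-digit runs. MatchDemo is a hypothetical module name. Regex.IsMatch, Regex.Match, and Regex.Matches are the standard System.Text.RegularExpressions methods.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module MatchDemo
    Sub Main()
        Dim strValue As String = "ABC12345XYZ and 67890"

        ' IsMatch only says whether the pattern occurs somewhere in the string
        Console.WriteLine(Regex.IsMatch(strValue, "\d{5}"))              ' True

        ' Match returns the first occurrence; Value holds the matched text
        Dim m As Match = Regex.Match(strValue, "\d{5}")
        Console.WriteLine(m.Value & " at index " & m.Index.ToString())  ' 12345 at index 3

        ' Matches returns a MatchCollection holding every occurrence
        For Each found As Match In Regex.Matches(strValue, "\d{5}")
            Console.WriteLine(found.Value)                               ' 12345, then 67890
        Next
    End Sub
End Module
```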

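The trick is to tighten up your expression. But how? Follow my simple rule: just add ^( to the beginning of your expression and )$ to the end. The ^ and $ anchors pin the match to the start and end of the string. The parentheses make sure the anchors apply to the whole expression, even when it contains alternatives separated by |. The result is that IsMatch returns True only when there is an exact match, with no excess.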

So, with the slightly amended ^(\d{5})$, something like "12345" will match, but "ABC12345XYZ" won't. Just the way you want it.
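The following sketch puts the loose and tightened versions side by side. It also shows why the parentheses in ^( and )$ matter once an expression contains alternatives. Finally, it demonstrates one .NET quirk worth knowing: $ also matches just before a final newline, so use \z instead if even a trailing line break counts as excess. TightenDemo is a hypothetical module name.

```vbnet
Imports System
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module TightenDemo
    Sub Main()
        ' Loose: \d{5} is found inside the longer string
        Console.WriteLine(Regex.IsMatch("ABC12345XYZ", "\d{5}"))        ' True
        ' Tightened with ^( and )$: only an exact five-digit string passes
        Console.WriteLine(Regex.IsMatch("ABC12345XYZ", "^(\d{5})$"))    ' False
        Console.WriteLine(Regex.IsMatch("12345", "^(\d{5})$"))          ' True

        ' Without the parentheses, ^ binds only to the first alternative
        ' and $ only to the last
        Console.WriteLine(Regex.IsMatch("99999X", "^\d{5}|\d{3}$"))     ' True
        Console.WriteLine(Regex.IsMatch("99999X", "^(\d{5}|\d{3})$"))   ' False

        ' In .NET, $ also matches just before a final newline; \z does not
        Console.WriteLine(Regex.IsMatch("12345" & vbLf, "^(\d{5})$"))   ' True
        Console.WriteLine(Regex.IsMatch("12345" & vbLf, "^(\d{5})\z"))  ' False
    End Sub
End Module
```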

Watch out, however: you'll want to use tightening only with the IsMatch function. If you want to cycle through a list of matches (think of the second code sample in the "Internet URL Matcher: FTP, HTTP, HTTPS" tip), don't tighten your expression. Here, you want it to be loose and search the whole text for matching patterns. Also, although we've demonstrated that the regular expressions behind the RegularExpressionValidator control are loose, the control itself checks for an exact match, so don't be put off using it.

In all of the examples in this chapter, I've already tightened or loosened each expression as the situation demands. Now that you know the ^( and )$ technique, you can take it one step further, customizing these and other expressions to meet your exact requirements.

Good luck!




The Ultimate VB .NET and ASP.NET Code Book
ISBN: 1590591062
Year: 2003
Pages: 76
Authors: Karl Moore
