Regular Expressions


A regular expression is a powerful pattern-matching facility that enables you to compare patterns against target strings to find pattern matches.

So what exactly is a regular expression? Suppose that you want to validate a phone number in the simple North American format (999) 999 9999 using VBScript string functions:

Function IsPhone(strPhone)
  'check the length, then each literal character and digit group
  'of the format (999) 999 9999
  If Len(strPhone) = 14 And Left(strPhone, 1) = "(" And _
     IsNumeric(Mid(strPhone, 2, 3)) And Mid(strPhone, 5, 2) = ") " And _
     IsNumeric(Mid(strPhone, 7, 3)) And Mid(strPhone, 10, 1) = " " And _
     IsNumeric(Mid(strPhone, 11, 4)) Then
    IsPhone = True
  Else
    IsPhone = False
  End If
End Function

The IsPhone function will return True if the argument that is passed matches a phone number in the format (999) 999 9999. It would return True if the number (123) 456 7890 was passed, but not 456 7890 or 555-124 3445.

The following code performs phone-number matching using a regular expression:

Function IsPhone(strPhone)
  Dim objRegExp
  Set objRegExp = New RegExp
  'optional area code (with or without parentheses), then 3 and 4 digits
  objRegExp.Pattern = "^(\(?\d{3}\)?)?[ -]?\d{3}[ -]?\d{4}$"
  IsPhone = objRegExp.Test(strPhone)
End Function

This function matches phone numbers with or without area codes, and with either spaces or hyphens as separators. It returns True for each of the following: (123) 456 7890, 123456 7890, 555-1234, 6663434, and 604-434-2343.
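The following minimal sketch checks the pattern against the sample numbers above. It is written in standard JavaScript, the ECMAScript language that JScript implements, and assumes an engine such as Node.js, so console.log stands in for WScript.Echo. The function name isPhone is illustrative. A regular expression literal is used so that the backslashes need no extra escaping.

```javascript
// Same pattern as the VBScript IsPhone function, written as a regex literal.
var phonePattern = /^(\(?\d{3}\)?)?[ -]?\d{3}[ -]?\d{4}$/;

// Returns true only if the whole string matches; ^ and $ anchor both ends.
function isPhone(strPhone) {
    return phonePattern.test(strPhone);
}

var samples = ["(123) 456 7890", "123456 7890", "555-1234", "6663434",
               "604-434-2343", "(123) 456 789", "55-1234"];
for (var i = 0; i < samples.length; i++) {
    console.log(samples[i] + " -> " + isPhone(samples[i]));
}
```

The last two samples fail. (123) 456 789 has only three digits in its final group, and 55-1234 has too few digits before the hyphen.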

The same checks could be written with VBScript string functions, but the code would quickly become unwieldy and difficult to maintain. The regular expression version could easily be extended to check overseas phone numbers by modifying only the pattern string; no additional code would be required.

Regular expression patterns can perform sophisticated text matching, extracting, and search and replace operations.

This chapter gives a general introduction to building regular expression patterns and how to use them to match as well as search for and replace items in strings of text. For a much more detailed immersion in the art of regular expressions, get the book Mastering Regular Expressions, by Jeffrey Friedl.


For more information on regular expressions, refer to the MSDN Library article "What's New in Windows Script 5.5." You can also review the article "Microsoft Beefs Up VBScript with Regular Expressions."

Validating a String


You want to validate a string.


Create an instance of the RegExp object and set its Pattern property to the expression you want to validate against. Then call the Test method, passing the string you want to check. The following snippet validates e-mail strings:

Dim objRegExp, strAddress

'create a new instance of the RegExp object
Set objRegExp = New RegExp
'make matching case-insensitive
objRegExp.IgnoreCase = True
'set pattern
objRegExp.Pattern = "\w+(\.\w+)?@\w+(\.\w+)+"

strAddress = InputBox("Enter an E-mail address")
'check if address is valid
If Not objRegExp.Test(strAddress) Then
  MsgBox "Not a valid E-mail address"
End If
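Test succeeds when the pattern matches anywhere in the target string. As a result, the unanchored pattern above also accepts input that merely contains an address. The following JavaScript sketch compares the pattern as written with a variant anchored by ^ and $, which forces the whole string to match. The anchored variant and the sample addresses are illustrative additions, not part of the original script.

```javascript
// The e-mail pattern from the script, as written and anchored at both ends.
var loose = /\w+(\.\w+)?@\w+(\.\w+)+/i;
var strict = /^\w+(\.\w+)?@\w+(\.\w+)+$/i;

var candidates = ["fred@example.com", "fred.smith@mail.example.org",
                  "junk!! fred@example.com !!", "fred@localhost"];
for (var i = 0; i < candidates.length; i++) {
    console.log(candidates[i] + "  loose=" + loose.test(candidates[i]) +
                "  strict=" + strict.test(candidates[i]));
}
```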


Regular expression operations are exposed through the RegExp object in VBScript version 5.0 and later. RegExp is built into the VBScript language itself. Some components, such as file manipulation, are exposed through external objects; RegExp, by contrast, is instantiated directly with the New operator:

Dim objRegExp
'create a new instance of a RegExp object
Set objRegExp = New RegExp

Once an instance of the object has been created, you can set a number of properties before evaluating the expression.

By default, regular expression tests are case-sensitive. To make tests case-insensitive, set the IgnoreCase property to True.

The Pattern property contains the actual expression pattern you are going to test.

If the pattern is a plain string of characters, the target string is searched for any occurrence of that string. Use the Test method to test a target string against a regular expression. The syntax is as follows:

bResult = objRegExp.Test(strTarget)

strTarget represents the target string being tested. The Test method returns True if the pattern finds a match anywhere in the target string.

In the following sample, a string is checked for an occurrence of "fred":

Dim objRegExp, strName

Set objRegExp = New RegExp
objRegExp.IgnoreCase = True
objRegExp.Pattern = "fred"
strName = InputBox("Enter your name")

If objRegExp.Test(strName) Then
  MsgBox "The name matched"
Else
  MsgBox "The name didn't match"
End If
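The same literal-pattern behavior can be seen in a short JavaScript sketch, assuming a standard engine such as Node.js. The i flag plays the role of IgnoreCase = True. A literal pattern matches wherever it occurs, even inside a longer word.

```javascript
var caseSensitive = /fred/;   // like IgnoreCase = False (the default)
var ignoreCase = /fred/i;     // like IgnoreCase = True

console.log(caseSensitive.test("Fred Smith")); // false: "F" differs in case
console.log(ignoreCase.test("Fred Smith"));    // true
console.log(ignoreCase.test("Alfredo"));       // true: "fred" occurs inside the word
```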

Regular expressions use metacharacters to perform more complex operations than matching simple strings. At the simplest level, metacharacters provide operations somewhat similar to the wildcards used at the command line. The metacharacters that control how many times part of a pattern is repeated are called quantifiers.

Quantifiers match the preceding subexpression in the regular expression a certain number of times, depending on the metacharacter.

A subexpression may represent a single character or a group of characters and metacharacters. Table 8-1 lists metacharacter quantifiers.

Table 8-1: Metacharacter Quantifiers

Metacharacter   Description
*               Matches the preceding subexpression zero or more times. The pattern fred* would match fred, freddy, and french.
+               Matches the preceding subexpression one or more (but not zero) times. The pattern fred+ would match fred and freddy, but not french.
?               Matches the preceding subexpression zero or one time. The pattern colou?r would match color and colour.
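The examples in Table 8-1 can be checked with a small JavaScript sketch. It uses standard ECMAScript run under an engine such as Node.js, and the helper name show is hypothetical. Note that fred* matches french: the pattern only requires fre followed by zero or more d characters, and the match may occur anywhere in the string.

```javascript
// Print whether each pattern finds a match in each input string.
function show(re, inputs) {
    for (var i = 0; i < inputs.length; i++) {
        console.log(String(re) + " vs " + inputs[i] + ": " + re.test(inputs[i]));
    }
}

show(/fred*/, ["fred", "freddy", "french"]);      // all true: "fre" plus zero or more d
show(/fred+/, ["fred", "freddy", "french"]);      // french is false: needs at least one d
show(/colou?r/, ["color", "colour", "colouur"]);  // colouur is false: at most one u
```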

In the previous examples, each quantifier applied only to the single character before it. You may want a quantifier to apply to a group of characters and modifiers instead. Any expression surrounded by parentheses "()" is treated as a single subexpression.

For the regular expression (Fred)+(Bob)?, Fred and Bob are subexpressions because they are surrounded by parentheses. The plus sign (+) metacharacter applies to the subexpression Fred, and the question mark (?) metacharacter applies to the subexpression Bob.

Without the parentheses, the plus sign (+) in the regular expression Fred+Bob? would apply only to the preceding d character, and the question mark (?) would apply only to the preceding b.
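The difference between grouped and ungrouped quantifiers shows up clearly in a JavaScript sketch. The ^ and $ anchors, which are described next, are added here only so that each pattern must match the whole test string. The inputs are made up for illustration.

```javascript
// Grouped: + repeats "Fred" and ? makes "Bob" optional.
var grouped = /^(Fred)+(Bob)?$/;
// Ungrouped: + repeats only "d" and ? makes only the final "b" optional.
var ungrouped = /^Fred+Bob?$/;

var inputs = ["FredFredBob", "Fred", "FredddBo", "FredBob"];
for (var i = 0; i < inputs.length; i++) {
    console.log(inputs[i] + ": grouped=" + grouped.test(inputs[i]) +
                " ungrouped=" + ungrouped.test(inputs[i]));
}
```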

If you want to treat a metacharacter as a normal character in a regular expression, precede it with a backslash. The expression \(\*\*\) would match the string (**) because each of the characters is preceded by a backslash (\).

The dollar sign ($) metacharacter matches the end of a string. The caret (^) metacharacter matches the beginning of a string. Table 8-2 lists examples using these metacharacters.

Table 8-2: The Dollar Sign ($) and Caret (^) Metacharacters

Pattern      Description
end$         Matches the string "the end" but not "end of the world"
^beginning   Matches the string "beginning of line" but not "in the beginning"

To match one or another pattern, use the vertical bar (|) alternate metacharacter to separate alternate choices. Table 8-3 lists alternate metacharacter samples.

Table 8-3: Alternate Metacharacters

Pattern                                       Description
^(Monday|Tuesday|Wednesday|Thursday|Friday)   Matches any string that starts with a weekday (e.g., Monday or Wednesday, but not Saturday or Sunday)
january|jan                                   Matches a string that contains either january or jan
^(http|ftp):                                  Matches a string that starts with http: or ftp:
(403|404)$                                    Matches any string that ends with 403 or 404
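The following JavaScript sketch exercises the descriptions in Tables 8-2 and 8-3. The patterns were written to fit those descriptions and are assumptions; the book's own table entries may be phrased differently. The test strings are illustrative.

```javascript
var endAnchor = /end$/;
var startAnchor = /^beginning/;
var weekday = /^(Monday|Tuesday|Wednesday|Thursday|Friday)/;
var protocol = /^(http|ftp):/;
var httpError = /(403|404)$/;

console.log(endAnchor.test("the end"));             // true
console.log(endAnchor.test("end of the world"));    // false: "end" is not at the end
console.log(startAnchor.test("in the beginning"));  // false: not at the start
console.log(weekday.test("Wednesday meeting"));     // true
console.log(weekday.test("Saturday"));              // false
console.log(protocol.test("ftp://server/file"));    // true
console.log(httpError.test("Error 404"));           // true
```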

Use the period (.) metacharacter to test for one occurrence of any character except the newline character. Table 8-4 lists period metacharacter samples.


Table 8-4: Period Metacharacters



Matches any string that contains text surrounded by

and tags

The pattern ^Mr .+ .+ would match any string starting with Mr and containing a first and last name of any length.

To test for a range or set of characters in a pattern, surround the range with square brackets. The pattern ^Mr [A-Z]+ [A-Z]+$ would test for Mr firstname lastname, where the names must contain valid alphabetic characters. While the range specified only lists uppercase letters, if the IgnoreCase property is set to True, the case will be ignored when testing.

Prefixing the range with a caret (^) inside the square brackets indicates that you want to match any character except those in the specified range. The pattern H[^A-F] would match H1 and HG, but not HA.

To match a string with the salutation Mr. (with a period), the pattern ^Mr. .+ .+$ wouldn't work as intended. The period (.) is a metacharacter that matches any character, not just a literal period, so the pattern would also match Mrs Jane Smith.

If you test for a character that also represents a metacharacter, prefix it with a backslash (\).

The pattern used to match the salutation with a period would look like this: ^Mr\. .+ .+$.
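The escaping and range rules can be checked with a short JavaScript sketch, assuming a standard engine such as Node.js. The names are made up for illustration, and the i flag stands in for IgnoreCase = True.

```javascript
var unescaped = /^Mr. .+ .+$/;            // "." matches any character
var escaped = /^Mr\. .+ .+$/;             // "\." matches only a literal period
var lettersOnly = /^Mr [A-Z]+ [A-Z]+$/i;  // i flag, like IgnoreCase = True
var notAtoF = /H[^A-F]/;                  // H followed by anything except A-F

console.log(unescaped.test("Mrs Jane Smith"));  // true: "s" satisfies "."
console.log(escaped.test("Mrs Jane Smith"));    // false
console.log(escaped.test("Mr. John Smith"));    // true
console.log(lettersOnly.test("Mr john smith")); // true: case ignored
console.log(lettersOnly.test("Mr J0hn Smith")); // false: "0" is not a letter
console.log(notAtoF.test("HG") + " " + notAtoF.test("HA")); // true false
```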

Certain characters prefixed by the backslash represent special characters, such as the newline and form-feed characters, or sequences and ranges of other metacharacters. Table 8-5 lists additional metacharacters.

Table 8-5: Additional Metacharacters

Metacharacter   Description
\d              Digit character. Same as [0-9].
\D              Nondigit character. Same as [^0-9].
\f              Form-feed character.
\n              Newline character.
\r              Carriage return character.
\s              White space character. This includes space, form feed, newline, carriage return, tab, and vertical tab. Same as [ \f\n\r\t\v].
\S              Non-white space character. Same as [^ \f\n\r\t\v].
\t              Tab character.
\v              Vertical tab character.
\w              Any word character (a letter, digit, or underscore). Same as [A-Za-z0-9_].
\W              Nonword character. Same as [^A-Za-z0-9_].
\b              Word boundary. Matches the position between a word (\w) and a nonword (\W) character.
\B              Nonword boundary.
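A few of the Table 8-5 metacharacters are shown in action in the following JavaScript sketch. It assumes a standard engine such as Node.js, and the sample text is made up. The g flag returns every match, as Global = True does in VBScript.

```javascript
var text = "Call 555-1234 at\t9am";

console.log(text.match(/\d+/g).join("|"));          // 555|1234|9
console.log(text.replace(/\s+/g, "_"));             // Call_555-1234_at_9am
console.log("user_name-1".match(/\w+/g).join("|")); // user_name|1
console.log(/\bcat\b/.test("the cat sat"));         // true
console.log(/\bcat\b/.test("concatenate"));         // false: "cat" is inside a word
```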

Let's look at the pattern used in the Solution script to check e-mail addresses: \w+(\.\w+)?@\w+(\.\w+)+. The first part, \w+, checks for one or more word characters. The second part, (\.\w+)?, checks for zero or one occurrence of a period followed by one or more word characters. The at sign (@) is a literal character that must appear in the string.

After the at sign, \w+ matches one or more word characters. The last part of the pattern, (\.\w+)+, matches one or more sequences of a period followed by one or more word characters.

You've seen how you can use the +, *, and ? quantifiers to perform repetitive matches. They can represent zero, one, or any number of matches. But what if you want to perform a set number of matches?

Use braces ({}) to specify a set number of matches. The quantifier {5} would match the preceding subexpression exactly 5 times, {5,} would match it at least 5 times, and {2,5} would match it a minimum of 2 and a maximum of 5 times. Table 8-6 lists regular expression samples.

Table 8-6: Regular Expressions




Matches exactly four numbers.


Matches at least two numbers.


A 5- or 9-digit ZIP code.


IP address, such as


Date in mm/dd/yyyy format. The year can be either two or four digits.


Web or FTP server address. Must contain at least two elements of a fully qualified domain name, so http://odin is not valid, but is.

^(\(?\d{3}\)?)?[ -]?\d{3}[ -]?\d{4}$

North American phone number. Would validate 555-1234 and (123)555-1234. Would not validate international area codes.
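Table 8-6 describes several checks that use braces. The following JavaScript sketch shows one possible way to write three of them. These patterns are illustrative assumptions and may differ from the book's own. They are anchored so that the whole string must match.

```javascript
var fourDigits = /^\d{4}$/;                        // exactly four digits
var zipCode = /^\d{5}(-\d{4})?$/;                  // 5- or 9-digit ZIP code
var usDate = /^\d{1,2}\/\d{1,2}\/(\d{4}|\d{2})$/;  // mm/dd/yy or mm/dd/yyyy

console.log(fourDigits.test("2000") + " " + fourDigits.test("20000")); // true false
console.log(zipCode.test("12345") + " " + zipCode.test("12345-6789")); // true true
console.log(zipCode.test("12345-678"));                                // false
console.log(usDate.test("5/13/2000") + " " + usDate.test("5/13/99"));  // true true
console.log(usDate.test("5/13/199"));                                  // false
```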

Square brackets define a range or set of characters; a bracketed expression matches any single character it contains. Ranges can be combined with the other metacharacters to check for a range of values. Table 8-7 lists regular expression range samples.

Table 8-7: Regular Expression Ranges




A number between 11 and 99.


Matches a string that starts with any character outside of the range of B to K and ends with a number between 1 and 9. A3 and y5 are valid, but E4 and S0 are not.


Checks for valid e-mail address (e.g.,


Matches time (e.g., 12:30 but not 44:30).


A 1- or 2-digit hexadecimal value.
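Two of the Table 8-7 descriptions are written as bracket ranges in the following hedged JavaScript sketch. The time pattern assumes a 24-hour clock (00:00 to 23:59), which is an assumption about what the table intends. Both patterns are illustrations, not necessarily the book's.

```javascript
// Hours 0-23 (optionally zero-padded) and minutes 00-59.
var time24 = /^([01]?\d|2[0-3]):[0-5]\d$/;
// One or two hexadecimal digits in either case.
var hexByte = /^[0-9A-Fa-f]{1,2}$/;

console.log(time24.test("12:30") + " " + time24.test("44:30")); // true false
console.log(time24.test("23:59") + " " + time24.test("24:00")); // true false
console.log(hexByte.test("FF") + " " + hexByte.test("1a"));     // true true
console.log(hexByte.test("1G") + " " + hexByte.test("ABC"));    // false false
```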

You can reference a subexpression elsewhere in the regular expression with a backslash followed by the number of the subexpression, counting from 1. This is known as back-referencing.

For example, to match an HTML tag pair, use <(.*)>.*</\1>. The <(.*)> expression matches an opening HTML tag. The parentheses surrounding (.*) indicate that the text it matches will be stored as a subexpression.

The \1 expression back-references the first subexpression match, which in this case is the text inside the opening tag. As a result, the closing tag must use the same name.
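The back-reference can be tried in JavaScript, where the same \1 syntax applies. This is a minimal sketch assuming a standard engine such as Node.js. The pattern is deliberately simple: an opening tag with attributes, such as <a href="x">, will not pair with its closing tag, because the captured text includes the attributes.

```javascript
// In a regex literal the "/" of the closing tag must be escaped as "\/".
var tagPair = /<(.*)>.*<\/\1>/;

console.log(tagPair.test("<b>bold</b>"));      // true
console.log(tagPair.test("<b>bold</i>"));      // false: closing tag name differs
console.log("<b>bold</b>".match(tagPair)[1]);  // b (the captured tag name)
```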

A backslash followed by an x character and then a two-digit hexadecimal value, such as \x22, matches the character with the corresponding ASCII value (\x22 is the double quote).

A backslash followed directly by an octal value of up to three digits (with no x), such as \042, also matches the character with the corresponding ASCII value; \042 is another way of writing the double quote.
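The following short JavaScript sketch demonstrates the hexadecimal form. It assumes a standard engine such as Node.js and uses only the \x escape.

```javascript
var quote = /\x22/;   // \x22 is the double-quote character

console.log(quote.test('say "hi"'));   // true
console.log(quote.test("say hi"));     // false
console.log(/\x41\x42/.test("ABC"));   // true: \x41 is "A" and \x42 is "B"
```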

Regular expressions are more tightly integrated into JScript. Unlike VBScript, JScript allows literal regular expressions, similar to literal strings. Forward slashes (/) surround these literal regular expressions:

objRegExp = /strPattern/[strSwitch];

JScript can also create a regular expression using the new operator and built-in RegExp object:

var objRegExp = new RegExp(strPattern,[strSwitch]);

strPattern is the regular expression pattern you want to use. The strSwitch parameter represents one or more optional switches. The switches that are available are i, which ignores case when matching text, and g, which performs a global search.

//the following statements create two instances of a RegExp object
//with the same pattern
var objRegExp = new RegExp("Mr .+ .+", "i");

var objRegExp2 = /Mr .+ .+/i;
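Patterns containing backslashes need different escaping in the two forms, because a string literal consumes one level of backslashes before the RegExp constructor sees them. The following JavaScript sketch illustrates this, assuming a standard engine such as Node.js. The \w-based pattern is an illustrative variant of the one above.

```javascript
// A regex literal needs no extra escaping.
var fromLiteral = /Mr \w+ \w+/i;
// In a string, "\\w" is needed so that the RegExp receives \w.
var fromString = new RegExp("Mr \\w+ \\w+", "i");

console.log(fromLiteral.source === fromString.source); // true: same pattern text
console.log(fromString.test("mr fred smith"));         // true: i flag ignores case
console.log(fromLiteral.global);                       // false: no g switch given
```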

If you want to use regular expressions from a scripting language or development environment that supports COM objects (such as Visual Basic), you can create an instance of the regular expression object using the VBScript.RegExp program ID:

'the following statement can be used to create an instance of the
'regular expression object in other VB dialects
Set objRegExp= CreateObject("vbscript.RegExp")


For more information, read the MSDN Library article "Microsoft Beefs Up VBScript with Regular Expressions."

See Also

Search for "Regular expression syntax" in the VBScript documentation Help file. Solution 5.10, Solution 6.1, and Solution 9.2.

Matching Multiple Patterns


You want to find all the substrings in a string that match a pattern.


Create a RegExp object, set the Pattern property to the pattern you want to match, and set the Global property to True. The Execute method returns the matches as a Matches collection. The following sample lists all numeric values in a comma-delimited string:

Dim objRegExp, objMatches, objMatch
Set objRegExp = New RegExp
'set pattern to extract all numeric values from string
objRegExp.Pattern = "\d+\.?\d*|\.\d+"
objRegExp.IgnoreCase = True
objRegExp.Global = True
Set objMatches = objRegExp.Execute("111.13,1232,ABC,444,55")

For Each objMatch In objMatches
  WScript.Echo "Found match:" & objMatch.Value & " at position " & _
    objMatch.FirstIndex
Next

Repetitive patterns can be extracted from a string using regular expressions.

The RegExp object has a Global property. Setting it to True causes the input to be searched for every occurrence of the pattern. When Global is False (the default), only the first match is returned.

To return a list of any matches from the expression, invoke the Execute method:

Set objMatches = objRegExp.Execute(strString)

objRegExp represents a regular expression object, while the strString parameter is the string to test.

The Execute method returns a Matches collection that contains a Match object for each match in the input string.

Each Match object has Value, FirstIndex, and Length properties. Value returns the text that was matched, FirstIndex returns the zero-based offset in the string where the match begins, and Length returns the length of the match.

The following generic function extracts items from a comma-delimited string. The elements are stored in an array, which is returned by the function:

Function ExtractCSV(strCSV)

  Dim objRegExp, objMatch, aRet, objMatches, nF
  Set objRegExp = New RegExp
  ' matches digits, digits followed by a point, digits followed by a
  ' point and then more digits, a point followed by digits, or anything
  ' enclosed by double quotes (\x22 is the double-quote character)
  objRegExp.Pattern = "\d+\.?\d*|\.\d+|\x22[^""]+\x22"

  objRegExp.IgnoreCase = True
  objRegExp.Global = True
  Set objMatches = objRegExp.Execute(strCSV)
  If objMatches.Count > 0 Then
    ' size the array to hold exactly one element per match
    ReDim aRet(objMatches.Count - 1)
    For nF = 0 To objMatches.Count - 1 ' iterate Matches collection
      Set objMatch = objMatches.Item(nF)
      ' check if the string is surrounded by quotes; if so, remove them
      If Left(objMatch.Value, 1) = """" And _
         Right(objMatch.Value, 1) = """" Then
        aRet(nF) = Mid(objMatch.Value, 2, Len(objMatch.Value) - 2)
      Else
        aRet(nF) = objMatch.Value
      End If
    Next
    ExtractCSV = aRet
  Else
    ExtractCSV = Empty
  End If

End Function

The string elements can be numeric or text values. Text values must be surrounded by double quotes; they can contain any character other than a double quote, including commas.

aValues = ExtractCSV("10.50,10,"" Fred Smith"",20")
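The same extraction logic can be sketched in JavaScript. The name extractCSV and the choice to return undefined in place of VBScript's Empty are assumptions made for illustration. The pattern matches numbers, or a double-quoted run of non-quote characters whose quotes are then stripped.

```javascript
// Return an array of numeric and quoted-text elements, or undefined if none.
function extractCSV(strCSV) {
    var re = /\d+\.?\d*|\.\d+|"[^"]+"/g;
    var result = [];
    var m;
    while ((m = re.exec(strCSV)) !== null) {
        var v = m[0];
        // remove surrounding double quotes from text values
        if (v.charAt(0) === '"' && v.charAt(v.length - 1) === '"') {
            v = v.substring(1, v.length - 1);
        }
        result.push(v);
    }
    return result.length > 0 ? result : undefined;
}

console.log(extractCSV('10.50,10," Fred Smith",20').join("|")); // 10.50|10| Fred Smith|20
```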

See Also

For more information, read the MSDN Library article "Global Property."

Matching Subexpressions


You want to extract all regular expression subexpression matches from a target string.


Use the SubMatches property of a Match object to list all matched subexpressions:

Dim objRegExp, objMatches, nF
Set objRegExp = New RegExp
' match "digits slash digits slash digits"
objRegExp.Pattern = "(\d+)/(\d+)/(\d+)"
objRegExp.IgnoreCase = True

Set objMatches = objRegExp.Execute("5/13/2000")

'list all subexpressions
For nF = 0 To objMatches(0).SubMatches.Count - 1
  WScript.Echo objMatches(0).SubMatches(nF)
Next

The Matches collection returned by the Execute method lists the results of running a single pattern against a string. This is useful for finding multiple matches of one pattern, as demonstrated by Solution 8.2.

If you need to match multiple different expressions, or a set number of expressions in a certain order, you need to use subexpressions. For example, a regular expression such as (\d+):(\d+) contains two subexpressions.
If this regular expression were executed against the string "10:30", it would return two subexpressions: 10 and 30. VBScript did not support subexpression matching until version 5.5. Documentation provided with earlier VBScript versions states that subexpressions can be processed using the Matches collection, but this is not the case.

VBScript 5.5 introduces a SubMatches property to the Match object, which provides access to any subexpressions returned from the execution of a regular expression against a string.

The first Match object in the Matches collection exposes the subexpression matches through its SubMatches property. The SubMatches collection has a Count property that returns the number of submatches.

Each submatch can be accessed through this collection as an array element. The following sample extracts the month, day, and year from a date string and displays each element from the SubMatches collection:

Dim objRegExp, objMatches, objSubMatches
'Set objRegExp = New RegExp
Set objRegExp = CreateObject("VBScript.RegExp")
' match "digits slash digits slash digits"
objRegExp.Pattern = "(\d+)/(\d+)/(\d+)"
Set objMatches = objRegExp.Execute("5/13/2000")
'the first element of the matches collection contains the submatches
Set objSubMatches = objMatches(0).SubMatches
'list the number of subexpression matches and each match
WScript.Echo "Subexpression count " & objSubMatches.Count
WScript.Echo "Subexpression 1 " & objSubMatches(0)
WScript.Echo "Subexpression 2 " & objSubMatches(1)
WScript.Echo "Subexpression 3 " & objSubMatches(2)
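For comparison, the following JavaScript sketch extracts the same date parts with exec. It assumes a standard engine such as Node.js. The returned array holds the whole match at index 0 and the subexpressions from index 1 onward.

```javascript
var m = /(\d+)\/(\d+)\/(\d+)/.exec("5/13/2000");

// m[0] holds the whole match; m[1] to m[3] hold the subexpressions.
console.log("Subexpression count " + (m.length - 1)); // 3
for (var i = 1; i < m.length; i++) {
    console.log("Subexpression " + i + " " + m[i]);
}
```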

The following script, regflt.vbs, is a generic command-line program that executes a regular expression pattern against standard input, outputting any extracted subexpressions to standard output delimited by a comma:

'command line regular expression filter
Dim nF, strDelim, strLine, strPattern
Dim objRegExp, objMatches, objSubMatches

If WScript.Arguments.Count < 1 Or WScript.Arguments.Count > 2 Then
  ShowUsage
End If

strDelim = "," 'set delimiter
strPattern = WScript.Arguments(0)

'check if alternate output delimiter specified
If WScript.Arguments.Count = 2 Then strDelim = WScript.Arguments(1)

'create regular expression object and set properties
Set objRegExp = New RegExp
objRegExp.Pattern = strPattern
objRegExp.IgnoreCase = True
objRegExp.Global = True

'loop until the end of the text stream has been encountered
Do While Not WScript.StdIn.AtEndOfStream

  'read line from standard input
  strLine = WScript.StdIn.ReadLine

  'execute regular expression match
  Set objMatches = objRegExp.Execute(strLine)

  'if a match is made, write each subexpression of the first match,
  'separated by the delimiter and ending with a line break
  If objMatches.Count > 0 Then
    Set objSubMatches = objMatches(0).SubMatches
    If objSubMatches.Count > 0 Then
      For nF = 0 To objSubMatches.Count - 2
        WScript.StdOut.Write objSubMatches(nF) & strDelim
      Next
      WScript.StdOut.WriteLine objSubMatches(objSubMatches.Count - 1)
    End If
  End If
Loop

Sub ShowUsage()
  WScript.Echo "regflt filters standard input against " & _
    "a regular expression." & vbLf & _
    "Syntax:" & vbLf & _
    "regflt.vbs regexp [delimiter]" & vbLf & _
    "regexp     regular expression" & vbLf & _
    "delimiter  optional. Character to delimit output columns"
  WScript.Quit -1
End Sub

Only one parameter is required for the script, which is the regular expression that is being executed against each line of the standard input. An optional second parameter can be passed to change the output delimiter, which by default is a comma.

The script can be useful for extracting values from existing files and building delimited files based on the output.

In the following sample, a price list is stored in a text file called prices.txt:

Bakery Price List Page 1
White bread $1.20
Brown bread $1.25
Whole wheat $1.30
Buns 0.30ea

To extract the descriptions and prices to an output file called prices.csv, use this command (//nologo keeps the cscript banner out of the output file):

cscript //nologo regflt.vbs "^(\w+ \w*).+(\d+\.\d+)" < prices.txt > prices.csv

The results are stored in the prices.csv file:

White bread,1.20
Brown bread,1.25
Whole wheat,1.30
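The per-line logic of regflt.vbs can be sketched as a pure JavaScript function, which shows why only three lines reach prices.csv. The name filterLine is hypothetical. Like the script, it joins the first match's subexpressions with a delimiter.

```javascript
// Return the subexpressions of the first match joined by delim, or null.
function filterLine(line, re, delim) {
    var m = re.exec(line);
    if (m === null || m.length < 2) {
        return null;  // no match, or no subexpressions to output
    }
    return m.slice(1).join(delim);
}

var pricePattern = /^(\w+ \w*).+(\d+\.\d+)/i;
var lines = ["Bakery Price List Page 1", "White bread $1.20",
             "Brown bread $1.25", "Whole wheat $1.30", "Buns 0.30ea"];
for (var i = 0; i < lines.length; i++) {
    var out = filterLine(lines[i], pricePattern, ",");
    if (out !== null) {
        console.log(out);
    }
}
```

The heading contains no decimal number, so it never matches. In Buns 0.30ea, the price starts right after the first word, so .+ cannot consume a character without breaking up the price.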

Values can be piped from any other application that outputs information to standard output. The following currency exchange page is stored on a remote Web server:


Exchange Rates

Exchange Rates for date:1/12/2000

Canadian Dollar CAN .67
Australian Dollar AUS .64
German Mark GDR .7
British Pound PND 1.5
Japanese Yen YEN .9

Using the httpget script from Chapter 11, the page is retrieved and piped to the regflt script:

cscript //nologo httpget.wsf "" | cscript //nologo regflt.vbs "(.+)(.+)(.+)" > exchange.csv

The exchange values are extracted and redirected to a file called exchange.csv:

Canadian Dollar,CAN,.67
Australian Dollar,AUS,.64
German Mark,GDR,.7
British Pound,PND,1.5
Japanese Yen,YEN,.9

JScript has always provided access to subexpressions through the RegExp object. JScript's way of accessing subexpressions differs from VBScript's because JScript follows the ECMAScript language standard.

To access a subexpression, either read the RegExp object's numbered match property for the subexpression you want, or index into the array of matches returned when the regular expression is executed.

The numeric match properties are identified by a dollar sign ($) followed by the match number:

var objRegExp = new RegExp("(.+):(.+)", "ig");
var aMatches = objRegExp.exec("hello:there");

//output the first and second subexpression match
WScript.Echo(RegExp.$1);
WScript.Echo(RegExp.$2);

//do the same as above, except reference the matches array
WScript.Echo(aMatches[1]);
WScript.Echo(aMatches[2]);

If you are processing an unknown number of subexpressions, or more than nine, access them through the array returned by the match or exec method; the RegExp object's numbered properties only go up to $9. The following JScript code sample implements a command-line regular expression filter similar to the Solution script:

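//filters regular expression elements from strings
var nF, rst, strLine, strOut, aMatch;

//exactly one argument (the regular expression) is required
if (WScript.Arguments.length != 1) {
    WScript.Echo("Syntax: " + WScript.ScriptName + " regexp");
    WScript.Quit(-1);
}

// get the regular expression string; "i" ignores case like the VBScript version
rst = new RegExp(WScript.Arguments.Item(0), "i");

//loop until the end of the text stream has been encountered
while (!WScript.StdIn.AtEndOfStream) {
    strLine = WScript.StdIn.ReadLine();
    aMatch = strLine.match(rst);
    //element 0 is the whole match; elements 1 onward are the subexpressions
    if (aMatch) {
        strOut = "";
        for (nF = 1; nF < aMatch.length; nF++) {
            strOut += (nF > 1 ? "," : "") + aMatch[nF];
        }
        WScript.StdOut.WriteLine(strOut);
    }
}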

See Also

For more information, read the MSDN Library article "What's New in Windows Script 5.5."

Replacing Values


You want to replace all occurrences of a pattern with something else.


In the following example, a match is made against a date string and the first date element is swapped with the second:

Dim objRegExp

Set objRegExp = New RegExp
objRegExp.IgnoreCase = True
objRegExp.Pattern = "^(\d{1,2})/(\d{1,2})/(\d{2,4})$"

WScript.Echo objRegExp.Replace("5/3/2000", "$2/$1/$3")

The result of this script is 3/5/2000.


Regular expressions provide powerful search and replace capabilities.

A replace operation can replace a string that matches a regular expression with another string or with a subexpression result. It can also call a separate function to perform the replace operation.

objRegExp.Replace(strInput, strReplaceText)

The Replace method arguments are listed in Table 8-8.

Table 8-8: Replace Method Arguments

Argument         Description
objRegExp        Regular expression object. The criteria used to search for the expression to replace are specified in the Pattern property of the regular expression.
strInput         String to search.
strReplaceText   Replacement text or reference to replacement function.

The Replace method returns the result of the replace operation(s). If no match is found, the method returns the original input string. Unless the Global property is set to True, only the first match is replaced.

A simple search and replace can be made by passing a string value as the replacement text.

The replace operation can also use subexpression matches. To reference a subexpression in the replacement string, insert a dollar sign ($) followed by the subexpression number: for example, $1, $2, and so on.
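The effect of the global setting and of $n references can be checked with the JavaScript String replace method. The following is a minimal sketch assuming a standard engine such as Node.js; the sample strings are made up.

```javascript
var once = /cat/;    // no g flag: only the first match is replaced
var all = /cat/g;    // g flag: like Global = True

console.log("cat cat cat".replace(once, "dog"));  // dog cat cat
console.log("cat cat cat".replace(all, "dog"));   // dog dog dog
// $1 and $2 refer to the first and second subexpressions
console.log("Smith, Fred".replace(/(\w+), (\w+)/, "$2 $1")); // Fred Smith
```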

As of version 5.5 of VBScript and JScript, replacement operations can call a function to perform the replacement.

To pass a function, use the GetRef function to obtain a reference to the replacement function by name. The function takes three parameters: the matched string, the position of the match, and the source string.

The result returned by the function is used as the replacement value. The following example illustrates how to call a replace function to perform a "proper" case conversion (the first letter is uppercase and the rest are lowercase) on a string:

Dim objRegExp

Set objRegExp = New RegExp
objRegExp.IgnoreCase = True
objRegExp.Pattern = "\w+"
objRegExp.Global = True

WScript.Echo _
  objRegExp.Replace("FRED MCTAVISH", GetRef("Proper"))

Function Proper(strMatch, nPos, strSource)
  'check if the match starts with Mc (comparison ignores case)
  If StrComp(Left(strMatch, 2), "Mc", vbTextCompare) = 0 Then
    Proper = "Mc" & UCase(Mid(strMatch, 3, 1)) & LCase(Mid(strMatch, 4))
  Else
    Proper = UCase(Left(strMatch, 1)) & LCase(Mid(strMatch, 2))
  End If
End Function

The Proper function is passed to the Replace method. On each string match, the function is called and the string is converted to "proper" case. The capability to call a function allows for additional logic and calculations to be executed against the matched strings.
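JavaScript's String replace method also accepts a function. When the pattern has no subexpressions, the function receives the same three values: the match, its position, and the source string. The following sketch ports the Proper logic under that assumption.

```javascript
// Convert a word to "proper" case, keeping the Mc prefix convention.
function proper(strMatch, nPos, strSource) {
    if (strMatch.substring(0, 2).toLowerCase() === "mc") {
        return "Mc" + strMatch.charAt(2).toUpperCase() +
            strMatch.substring(3).toLowerCase();
    }
    return strMatch.charAt(0).toUpperCase() + strMatch.substring(1).toLowerCase();
}

console.log("FRED MCTAVISH".replace(/\w+/g, proper)); // Fred McTavish
```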

JScript implements the replace functionality through the replace method of the String object:

strResult = objString.replace(objRegExp, strReplaceText)

Instead of invoking a Replace method on the regular expression object, you pass a regular expression object to the string's replace method. The objRegExp parameter represents the regular expression object, while strReplaceText is the replacement text.

//swap day/month in date
var strResult, objRegExp;
var strDate = "5/3/2000";
//create regular expression object; the forward slashes in the
//pattern must be escaped inside a regular expression literal
objRegExp = /^(\d{1,2})\/(\d{1,2})\/(\d{2,4})$/i;
//replace first value with second
strResult = strDate.replace(objRegExp, "$2/$1/$3");
WScript.Echo(strResult);

See Also

Solution 9.2. For more information, read the MSDN Library article "Microsoft Beefs Up VBScript with Regular Expressions."

Managing Enterprise Systems with the Windows Script Host
ISBN: 1893115674
EAN: 2147483647
Year: 2005
Pages: 242
Authors: Stein Borge © 2008-2020.