Regular Expressions

Overview

Versions 5 and later of VBScript fully support regular expressions. Before that, regular expressions were a sorely lacking feature in VBScript, and one that made it inferior to other scripting languages, including JavaScript.


Introduction to Regular Expressions

Regular expressions provide powerful facilities for character pattern matching and replacing. Before regular expressions were added to the VBScript engine, performing a search-and-replace task throughout a string required a fair amount of code, consisting mainly of loops and calls to the InStr and Mid functions. Now it is possible to do all this with one line of code using a regular expression.

If you've programmed in another language (C++, Perl, awk, or JavaScript; even Microsoft's own JScript supported regular expressions before VBScript did), regular expressions won't be new to you. However, experienced programmers need to know one thing in order to use regular expressions in VBScript: VBScript does not support regular expression literals (like /a pattern/). Instead, VBScript uses text strings assigned to the Pattern property of a RegExp object. In many ways this is superior to the traditional method, because there is no new syntax to learn. But if you are used to regular expressions from other languages, especially client-side JavaScript, this may not be what you expect.

Note  

Many Windows-based text editors have followed in the footsteps of the Unix text editor vi and now support regular expression searches. These include UltraEdit-32 (www.ultraedit.com) and SlickEdit (www.slickedit.com).


Regular Expressions in Action

The quickest and easiest way to become familiar with regular expressions is to look at a few examples. Here is probably one of the simplest examples of regular expressions in action: a simple find-and-replace.



Dim re, s
Set re = New RegExp
re.Pattern = "France"
s = "The rain in France falls mainly on the plains."
MsgBox re.Replace(s, "Spain")


Nothing spectacular; it's just a simple find and replace, but it is a powerful foundation to build on. Here's how the code works. First, we create a new regular expression object.



Set re = New RegExp


Then we set the key property on that object. This is the pattern that we want to match.



re.Pattern = "France"


The following line sets up the string we will be searching.



s = "The rain in France falls mainly on the plains."


The last line is the powerhouse of the script and does the real work. It asks our regular expression object to find the first occurrence of "France" (the pattern) within the string held in the variable s and to replace it with "Spain". We then use a message box to show off our great find-and-replace skills.



MsgBox re.Replace(s, "Spain")


When the script is run, the final output should be as shown in Figure 9-1.


Figure 9-1

Now, it's all well and good to hard-code the string and search criteria, but you can make the script a lot more flexible by having it accept the string and the find-and-replace criteria as user input.



Dim re, s, sc
Set re = New RegExp
s = InputBox("Type a string for the code to search")
re.Pattern = InputBox("Type in a pattern to find")
sc = InputBox("Type in a string to replace the pattern")
MsgBox re.Replace(s, sc)


This is almost exactly the same code as before, but with a few key differences. Instead of having everything hard-coded into the script, we introduce flexibility by using three input boxes.



s = InputBox("Type a string for the code to search")
re.Pattern = InputBox("Type in a pattern to find")
sc = InputBox("Type in a string to replace the pattern")


The final change is in the last line, which passes the sc variable to the Replace method as the replacement text.



MsgBox re.Replace(s, sc)


This lets you manually enter the string you want to be searched, as shown in Figure 9-2.

Figure 9-2

Then you can enter the pattern you want to find (see Figure 9-3).

Figure 9-3

Finally, enter a string to replace the pattern (see Figure 9-4).

Figure 9-4

This lets you try out something you might already be wondering about: what happens if you try to find and replace a pattern that doesn't exist in the string? In fact, nothing happens, as shown here. Type in the string as shown in Figure 9-5.

Figure 9-5

Next, enter a pattern that doesn't appear anywhere in the string. In Figure 9-6 we use the string 'JScript'.

Figure 9-6

In the next prompt, enter a string to replace the nonexistent pattern. Because no replacement will be carried out, it can be anything. In Figure 9-7 we use the string 'JavaScript'.

Figure 9-7

Notice what happens. Nothing. As you can see in Figure 9-8, the initial string is unchanged.


Figure 9-8
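Replace quietly returns the original string when nothing matches. This means a script that takes its pattern from the user cannot tell from the result alone whether anything was replaced.

The following is a minimal sketch, assuming you want to report that case. The helper ReplaceAndReport is a hypothetical name, not part of VBScript. It calls Test before Replace and hands the result back through a ByRef argument. It uses WScript.Echo, so run it under Windows Script Host (for example, with cscript).

```vbscript
' Hypothetical helper (not part of VBScript): replaces the first match of
' pattern in text and reports through "found" whether any match existed.
Function ReplaceAndReport(ByVal text, ByVal pattern, ByVal replacement, ByRef found)
    Dim re
    Set re = New RegExp
    re.Pattern = pattern
    found = re.Test(text)                  ' True only if the pattern occurs
    ReplaceAndReport = re.Replace(text, replacement)
End Function

Dim result, wasFound
result = ReplaceAndReport("The rain in Spain", "JScript", "JavaScript", wasFound)
If wasFound Then
    WScript.Echo "Replaced: " & result
Else
    WScript.Echo "Pattern not found; string unchanged: " & result
End If
```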

Building on Simplicity

Obviously the examples you've seen so far are quite simple ones, and to be honest, we could probably do everything we've done here just as easily using VBScript's string manipulation functions. But what if we wanted to replace all occurrences of a string? Or what if we wanted to replace all occurrences of a string, but only when they appear at the beginning of a word?

We need to make some tweaks to the code. Take a look at the following code.



Dim re, s
Set re = New RegExp
re.Pattern = "\bin"
re.Global = True
s = "The rain in Spain falls mainly on the plains."
MsgBox re.Replace(s, "in the country of")


This version has two key differences. First, it uses a special sequence (\b) to match a word boundary (we'll explore all the special sequences available in the Regular Expression Characters section). The result is shown in Figure 9-9.

Figure 9-9

What if we left the \b out, like this?

Dim re, s
Set re = New RegExp
re.Pattern = "in"
re.Global = True
s = "The rain in Spain falls mainly on the plains."
MsgBox re.Replace(s, "in the country of")

Without \b, the "in" part of the words "rain", "Spain", "mainly", and "plains" would also be changed to "in the country of". As you can see in Figure 9-10, this gives some very funny, but undesirable, results.

Figure 9-10

Second, by setting the Global property to True, we ensure that we match every occurrence of "in" that we want, not just the first.

Dim re, s
Set re = New RegExp
re.Pattern = "\bin"
re.Global = True
s = "The rain in Spain falls mainly on the plains."
MsgBox re.Replace(s, "in the country of")
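The \b in this pattern anchors only the start of "in". A word that merely begins with "in", such as "inside", would therefore still be changed.

The following is a minimal sketch, assuming you want whole words only. It puts \b on both sides of the word. The helper name ReplaceAll and the sample sentence are illustrative. WScript.Echo assumes the script runs under Windows Script Host.

```vbscript
' Replaces every match of pattern in text (Global = True).
Function ReplaceAll(ByVal text, ByVal pattern, ByVal replacement)
    Dim re
    Set re = New RegExp
    re.Pattern = pattern
    re.Global = True
    ReplaceAll = re.Replace(text, replacement)
End Function

Dim sample
sample = "Stay inside in the rain."
WScript.Echo ReplaceAll(sample, "\bin", "IN")     ' Stay INside IN the rain.
WScript.Echo ReplaceAll(sample, "\bin\b", "IN")   ' Stay inside IN the rain.
```

The "in" inside "rain" is never touched by either pattern, because there is no word boundary between "a" and "i".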

Regular expressions provide a very powerful language for expressing complicated patterns like these, so let's get on with learning about the objects that allow us to use them within VBScript.


The RegExp Object

The RegExp object is the object that provides simple regular expression support in VBScript. All the properties and methods relating to regular expressions in VBScript are related to this object.

Dim re
Set re = New RegExp


This object has three properties and three methods. The three properties are:

  • Global property
  • IgnoreCase property
  • Pattern property

The three methods are:

  • Execute method
  • Replace method
  • Test method

RegExp Properties

As mentioned before, the RegExp object has three properties. Let's take a look at each of them in turn.

Global Property

The Global property sets or returns a Boolean value that indicates whether a pattern should match all occurrences in an entire search string or just the first occurrence.

object.Global [= value ]
object Always a RegExp object
value There are two possible values: True and False

If the value of the Global property is True, the pattern is applied to the entire string and every occurrence is matched; if it is False, only the first occurrence is matched. The default is False (not True, as documented in some Microsoft sources).

Dim re, s
Set re = New RegExp
re.Pattern = "\bin"
re.Global = True
s = "The rain in Spain falls mainly on the plains."
MsgBox re.Replace(s, "in the country of")
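A quick way to see the default in action is to run the same replacement twice: once with Global left at False, and once with it set to True.

In this minimal sketch, the helper name ReplaceAin, the pattern "ain", and the sample string are illustrative only. It runs under Windows Script Host.

```vbscript
' Runs the same replacement with Global off and on.
Function ReplaceAin(ByVal text, ByVal replaceEvery)
    Dim re
    Set re = New RegExp
    re.Pattern = "ain"
    re.Global = replaceEvery      ' False is also the default for a new RegExp
    ReplaceAin = re.Replace(text, "AIN")
End Function

WScript.Echo ReplaceAin("rain Spain plain", False)   ' rAIN Spain plain
WScript.Echo ReplaceAin("rain Spain plain", True)    ' rAIN SpAIN plAIN
```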

IgnoreCase Property

The IgnoreCase property sets or returns a Boolean value that indicates whether or not the pattern search is case-sensitive.

object.IgnoreCase [= value ]
object Always a RegExp object
value There are two possible values: True and False

If the value of the IgnoreCase property is False, the search is case-sensitive; if it is True, it is not. The default is False (not True, as documented in some Microsoft sources).

Dim re, s
Set re = New RegExp
re.Pattern = "\bin"
re.Global = True
re.IgnoreCase = True
s = "The rain In Spain falls mainly on the plains."
MsgBox re.Replace(s, "in the country of")
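IgnoreCase works the same way with Test, which only reports whether a match exists.

In this minimal sketch, the helper ContainsWord is a hypothetical name. It assumes the word passed in contains no regular expression special characters, because the word is pasted straight into the pattern.

```vbscript
' Reports whether word appears as a whole word in text.
' Assumes word contains no regular expression special characters.
Function ContainsWord(ByVal text, ByVal word, ByVal caseInsensitive)
    Dim re
    Set re = New RegExp
    re.Pattern = "\b" & word & "\b"
    re.IgnoreCase = caseInsensitive
    ContainsWord = re.Test(text)
End Function

WScript.Echo ContainsWord("The rain In Spain", "in", False)   ' False: only "In" is present
WScript.Echo ContainsWord("The rain In Spain", "in", True)    ' True
```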

Pattern Property

The Pattern property sets or returns the regular expression pattern being searched.

object.Pattern [= "searchstring"]
object Always a RegExp object
searchstring The regular expression being searched for. May include any of the regular expression characters (optional)

Dim re, s
Set re = New RegExp
re.Pattern = "\bin"
re.Global = True
s = "The rain in Spain falls mainly on the plains."
MsgBox re.Replace(s, "in the country of")
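Because Pattern is just a string, anything placed in it is interpreted as a regular expression. That includes the text a user types into the InputBox in the earlier example: a "." matches any character, and a "+" becomes a repeat count.

The following is a minimal sketch, assuming you want such input matched literally. The hypothetical helper EscapePattern puts a backslash in front of each special character. To do this, it uses a remembered match ($1) in the replacement text, a feature described under the Replace method later in this chapter.

```vbscript
' Hypothetical helper: puts a backslash in front of every regular expression
' special character so that the text is matched literally.
Function EscapePattern(ByVal text)
    Dim re
    Set re = New RegExp
    re.Global = True
    ' Character class listing \ ^ $ . | ? * + ( ) [ ] { }
    re.Pattern = "([\\^$.|?*+()\[\]{}])"
    ' $1 is the special character just matched; the backslash is literal text
    EscapePattern = re.Replace(text, "\$1")
End Function

Dim reLiteral
Set reLiteral = New RegExp
reLiteral.Pattern = EscapePattern("3.5")      ' becomes 3\.5
WScript.Echo reLiteral.Test("version 3x5")    ' False: the dot is now literal
WScript.Echo reLiteral.Test("version 3.5")    ' True
```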

Regular Expression Characters

Tip  

Capitalized special characters do the opposite of their lower case counterparts.

Character      Description
\              Marks the next character as either a special character or a literal
^              Matches the beginning of input
$              Matches the end of input
*              Matches the preceding character zero or more times
+              Matches the preceding character one or more times
?              Matches the preceding character zero or one time
.              Matches any single character except a newline character
(pattern)      Matches pattern and remembers the match. The matched substring can be retrieved from the resulting Matches collection through each Match object's SubMatches collection (Item [0]...[n]). To match the parentheses characters themselves, precede them with a backslash: use "\(" or "\)"
(?:pattern)    Matches pattern but does not capture the match; that is, it is a noncapturing match that is not stored for possible later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'anomal(?:y|ies)' is a more economical expression than 'anomaly|anomalies'
(?=pattern)    Positive lookahead matches the search string at any point where a string matching pattern begins. This is a noncapturing match; that is, the match is not captured for possible later use. For example, 'Windows (?=95|98|NT|2000|XP)' matches 'Windows' in 'Windows XP' but not 'Windows' in 'Windows 3.1'
(?!pattern)    Negative lookahead matches the search string at any point where a string not matching pattern begins. This is a noncapturing match; that is, the match is not captured for possible later use. For example, 'Windows (?!95|98|NT|2000|XP)' matches 'Windows' in 'Windows 3.1' but does not match 'Windows' in 'Windows XP'
x|y            Matches either x or y
{n}            Matches exactly n times (n must always be a nonnegative integer)
{n,}           Matches at least n times (n must always be a nonnegative integer; note the terminating comma)
{n,m}          Matches at least n and at most m times (m and n must always be nonnegative integers)
[xyz]          Matches any one of the enclosed characters (xyz represents a character set)
[^xyz]         Matches any character not enclosed (^xyz represents a negative character set)
[a-z]          Matches any character in the specified range (a-z represents a range of characters)
[^m-z]         Matches any character not in the specified range (^m-z represents a negative range of characters)
\b             Matches a word boundary, that is, the position between a word character and a nonword character such as a space
\B             Matches a nonword boundary
\d             Matches a digit character. Equivalent to [0-9]
\D             Matches a nondigit character. Equivalent to [^0-9]
\f             Matches a form-feed character
\n             Matches a newline character
\r             Matches a carriage return character
\s             Matches any white space including space, tab, form-feed, and so on. Equivalent to "[ \f\n\r\t\v]"
\S             Matches any nonwhite space character. Equivalent to "[^ \f\n\r\t\v]"
\t             Matches a tab character
\v             Matches a vertical tab character
\w             Matches any word character including underscore. Equivalent to "[A-Za-z0-9_]"
\W             Matches any nonword character. Equivalent to "[^A-Za-z0-9_]"
\.             Matches .
\|             Matches |
\{             Matches {
\}             Matches }
\\             Matches \
\[             Matches [
\]             Matches ]
\(             Matches (
\)             Matches )
\num           Matches num, where num is a positive integer. A reference back to remembered matches within the pattern itself; for example, '(.)\1' matches two consecutive identical characters
$num           In the replacement text passed to the Replace method, inserts remembered match number num
\n (octal)     Matches n, where n is an octal escape value. Octal escape values must be 1, 2, or 3 digits long
\uxxxx         Matches the Unicode character whose code is given by the four hexadecimal digits xxxx
\xn            Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long

Many of these codes are self-explanatory, but some examples would probably help with others. We've already seen a simple pattern:



re.Pattern = "\bin"


Often it's useful to match any one of a whole class of characters. We do this by enclosing the characters we want to match in square brackets. For example, the following code replaces a single digit (any digit from 2 to 9) with a more generic term.

Dim re, s
Set re = New RegExp
re.Pattern = "[23456789]"
s = "Spain received 3 millimeters of rain last week."
MsgBox re.Replace(s, "many")


Figure 9-11 shows the output from this code.

Figure 9-11

In this case, the number "3" is replaced with the text "many" . As you might expect, we can shorten this class by using a range. This pattern does the same as the preceding one but saves some typing.

Dim re, s
Set re = New RegExp
re.Pattern = "[2-9]"
s = "Spain received 3 millimeters of rain last week."
MsgBox re.Replace(s, "many")

Replacing digits is a common task. In fact, the pattern [0-9] (covering all the digits) is used so often that there is a shortcut for it: \d is equivalent to [0-9].



Dim re, s
Set re = New RegExp
re.Pattern = "\d"
s = "a b c d e f 1 g 2 h ... 10 z"
MsgBox re.Replace(s, "a number")


The string with the replaced character is shown in Figure 9-12.


Figure 9-12

But what if you wanted to match everything except a digit? Then we can use negation, which is indicated by a circumflex ( ^ ) used within the class square brackets.

Tip  

Using ^ outside the square brackets has a totally different meaning, which is discussed after the next example.

Thus, to match any character other than a digit we can use any of the following patterns.



re.Pattern = "[^0-9]"   'the hard way
re.Pattern = "[^\d]"    'a little shorter
re.Pattern = "\D"       'another of those special characters


The last option here uses another of the dozen or so special characters. In most cases these characters just save you some extra typing (or act as a good memory shorthand) but a few, like matching tabs and other nonprintable characters, can be very useful.
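As a small example of \D at work, this minimal sketch strips everything except digits from a string. You might do this, for instance, to normalize a phone number before storing it. The helper name DigitsOnly and the sample number are illustrative.

```vbscript
' Removes every nondigit character, keeping only 0-9.
Function DigitsOnly(ByVal text)
    Dim re
    Set re = New RegExp
    re.Pattern = "\D"      ' any character that is not a digit
    re.Global = True       ' without this, only the first nondigit is removed
    DigitsOnly = re.Replace(text, "")
End Function

WScript.Echo DigitsOnly("(555) 123-4567")   ' 5551234567
```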

There are three special characters that anchor a pattern. They don't match any characters themselves. Instead, they force another pattern to appear at the beginning of the input (^ used outside of []), at the end of the input ($), or at a word boundary (\b, which we've already seen).
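Here is a minimal sketch of the difference an anchor makes; the two helper names are illustrative. Without anchors, \d+ succeeds if digits appear anywhere in the input. With ^ and $ around it, the digits must make up the entire input.

```vbscript
' True if the whole input consists of one or more digits.
Function IsAllDigits(ByVal text)
    Dim re
    Set re = New RegExp
    re.Pattern = "^\d+$"   ' ^ and $ tie the match to the start and end of the input
    IsAllDigits = re.Test(text)
End Function

' True if digits appear anywhere in the input.
Function HasDigits(ByVal text)
    Dim re
    Set re = New RegExp
    re.Pattern = "\d+"     ' no anchors: a match anywhere is enough
    HasDigits = re.Test(text)
End Function

WScript.Echo IsAllDigits("2003")    ' True
WScript.Echo IsAllDigits("2003a")   ' False
WScript.Echo HasDigits("2003a")     ' True
```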

Another way to shorten our patterns is to use repeat counts. The basic idea is to place the repeat count after the character or class. For example, the following pattern matches all three digits of "100" and replaces them, as shown in Figure 9-13.



Dim re, s
Set re = New RegExp
re.Pattern = "\d{3}"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a whopping number of")


Figure 9-13

Without the repeat count, only the first digit would be replaced, leaving '00' in the final string, as Figure 9-14 shows.

Figure 9-14



Dim re, s
Set re = New RegExp
re.Pattern = "\d"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a whopping number of")

Note also that we can't just set re.Global = True because we'd end up with four instances of the phrase "a whopping number of" in the result. The result is shown in Figure 9-15.

Figure 9-15

Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "\d"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a whopping number of")
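The four replacements happen because, with Global set to True, \d produces one match per digit. The following minimal sketch makes this visible by counting matches. It uses the Execute method (covered in the section on RegExp methods) together with the Count property of the collection that Execute returns. The helper name CountMatches is illustrative.

```vbscript
' Counts the separate matches a pattern finds when Global is True.
Function CountMatches(ByVal text, ByVal pattern)
    Dim re
    Set re = New RegExp
    re.Pattern = pattern
    re.Global = True
    CountMatches = re.Execute(text).Count
End Function

Dim rainText
rainText = "Spain received 100 millimeters of rain in the last 2 weeks."
WScript.Echo CountMatches(rainText, "\d")      ' 4: one match per digit
WScript.Echo CountMatches(rainText, "\d+")     ' 2: "100" and "2"
WScript.Echo CountMatches(rainText, "\d{3}")   ' 1: "100"
```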

As the previous table shows, we can also specify a minimum number of matches, {min,}, or a range, {min,max}. Again, a few repeat patterns are used so often that they have special shortcuts.



re.Pattern = "\d+"   'one or more digits, \d{1,}
re.Pattern = "\d*"   'zero or more digits, \d{0,}
re.Pattern = "\d?"   'optional: zero or one, \d{0,1}


Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "\d+"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a number")

The output of the last code is shown in Figure 9-16.

Figure 9-16

Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "\d*"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a number")


The output of the preceding code is shown in Figure 9-17.

Figure 9-17

Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "\d?"
s = "Spain received 100 millimeters of rain in the last 2 weeks."
MsgBox re.Replace(s, "a number")

The output of the preceding code is shown in Figure 9-18.

Figure 9-18

The last special characters we should discuss are remembered matches. These are useful when we want to use some or all of the text that matched our pattern as part of the replacement text. See the Replace method for an example of using remembered matches.

To illustrate this, and to bring all this discussion of special characters together, let's do something more useful. We want to search an arbitrary text string and locate any URLs within it. To keep this example simple and reasonably small, we will search only for URLs that use the http: protocol. We will, however, handle some of the quirks of DNS names, including an unlimited number of domain levels. Don't worry if you don't "speak DNS"; what you know from typing URLs into your browser will suffice.

Our code uses yet another of the RegExp object's methods, which we'll meet in more detail in the next section. For now, we need only know that Execute performs the pattern match and returns each match in a collection. Here's the code.



Dim re, s, colMatches, match
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is "
s = s & vbCrLf & "http://www.wrox.com. And "
s = s & vbCrLf & "http://www.pc.ibm.com - even with 4 levels."
Set colMatches = re.Execute(s)
For Each match In colMatches
    MsgBox "Found valid URL: " & match.Value
Next


As we'd expect, the real work is done in the line that sets the actual pattern. It looks a bit daunting at first, but it's actually quite easy to follow. Let's break it down.

Our pattern begins with the fixed string http://. We then use parentheses to group the real workhorse of this pattern.

For this breakdown, we write the group in the slightly simpler form \w[\w-]*\w\.. This matches the same DNS labels as \w+[\w-]*\w+\. in the code: a label that starts and ends with a word character and contains word characters or dashes in between. This part of the pattern matches one level of a DNS name, including the trailing dot:

re.Pattern = "http://(\w[\w-]*\w\.)*\w+"

This part begins with one of the special characters we looked at earlier, \w, which matches [A-Za-z0-9_], or in English, all the alphanumeric characters plus the underscore. Next we use the class brackets, [\w-], to match either a word character or a dash, because DNS names can include dashes. Why didn't we use the same class for the first character? Simple: valid DNS names cannot begin or end with a dash. We allow zero or more characters from this expanded class by using the * repeat count.

After that, we again strictly require a word character (\w) so our domain name doesn't end in a dash. The last item in the parentheses, \., matches the dot used to separate DNS levels.

Note

We can't use the dot alone, because it is a special character that normally matches any single character except a newline. Thus, we "escape" this character by preceding it with a backslash (\.).

After wrapping all that in parentheses, to keep our grouping straight, we again use the * repeat count. The group itself matches one level of a fully qualified DNS name. With the repeat count, (\w[\w-]*\w\.)* matches any number of such levels, each followed by a dot.

We end the pattern with \w+, requiring one or more word characters for the top-level domain name (for example, com, org, edu, and so on).

RegExp Methods

We've covered the properties of the RegExp object, so it's time to take a look at the methods. There are three methods associated with the RegExp object that we can look at.

Execute Method

This method is used to execute a regular expression search against a specified string and returns a Matches collection. This is the trigger in the code to run the pattern matching on the string.

object.Execute(string)
object Always a RegExp object
string The text string to be searched (required)

Dim re, s, colMatches, match
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is "
s = s & vbCrLf & "http://www.wrox.com. And "
s = s & vbCrLf & "http://www.pc.ibm.com - even with 4 levels."
Set colMatches = re.Execute(s)
For Each match In colMatches
    MsgBox "Found valid URL: " & match.Value
Next
Note  

Some of Microsoft's own documentation has been known to contain such errors, though most of them have hopefully been corrected by now.

Remember that the result of Execute is always a collection, possibly an empty one. You can use a test like If re.Execute(s).Count = 0 Then, or better yet, use the Test method, which is designed for this purpose.
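Because Execute can return an empty collection, code that gathers its results should handle that case.

The following is a minimal sketch, assuming you want the URLs as a plain VBScript array. The hypothetical helper ExtractUrls checks Count before building the array, and returns an empty array when nothing matches. It uses a URL pattern equivalent to the one in the example above, with each domain level written as \w[\w-]*\w\..

```vbscript
' Hypothetical helper: collects every http:// URL in text into an array.
Function ExtractUrls(ByVal text)
    Dim re, colMatches, urls(), i
    Set re = New RegExp
    re.Global = True
    re.Pattern = "http://(\w[\w-]*\w\.)*\w+"
    Set colMatches = re.Execute(text)
    If colMatches.Count = 0 Then
        ExtractUrls = Array()          ' empty array when nothing matches
        Exit Function
    End If
    ReDim urls(colMatches.Count - 1)
    For i = 0 To colMatches.Count - 1
        urls(i) = colMatches.Item(i).Value
    Next
    ExtractUrls = urls
End Function

Dim foundUrls
foundUrls = ExtractUrls("See http://www.wrox.com. Also http://www.pc.ibm.com today.")
WScript.Echo Join(foundUrls, vbCrLf)
```

The trailing period after "www.wrox.com" is not included in the match, because the final \w+ must end the URL.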

Replace Method

This method is used to replace text found in a regular expression search.

object.Replace(string1, string2)
object Always a RegExp object.
string1 The text string in which the text replacement is to occur (required).
string2 The replacement text string (required).

MsgBox re.Replace(s, "** TOP SECRET! **")

The output of the last code is shown in Figure 9-19.

Figure 9-19

The Replace method can also replace subexpressions in the pattern. To accomplish this, we use the special characters $1, $2, and so on in the replacement text. These "parameters" refer to remembered matches.

Backreferencing

A remembered match is simply a part of a pattern whose matched text is saved for later use; referring back to it is known as backreferencing. We designate which parts we want stored in a temporary buffer by enclosing them in parentheses. Each captured submatch is stored in the order in which it is encountered (from left to right in a regular expression pattern). The buffer numbers where the submatches are stored begin at 1 and continue up to a maximum of 99 subexpressions. They are then referred to sequentially as $1, $2, and so on.

You can prevent that part of the regular expression from being saved by using the noncapturing metacharacters '?:', '?=', or '?!'.

In the following example, we remember the first five words (each consisting of one or more nonwhite space characters) and then display only four of them in the replacement text.

Dim re, s
Set re = New RegExp
re.Pattern = "(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)"
s = "VBScript is not very cool."
MsgBox re.Replace(s, "$1 $2 $4 $5")

The output of the preceding code is shown in Figure 9-20.


Figure 9-20

Notice that in this code we have added a (\S+)\s+ pair for each word in the string (the last word needs only (\S+)). This gives the code greater control over how the string is handled. Because the pattern consumes the whole string, no unmatched tail is carried over into the result. Take great care when using backreferencing to make sure that the output you get is what you expect it to be!
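Remembered matches can also reorder text. In VBScript 5.5 and later, each Match object also exposes its remembered matches through a SubMatches collection, so you can read them without calling Replace.

The following minimal sketch shows both; the helper names SwapFirstTwoWords and GetSubMatch are illustrative. SubMatches is zero-based, so $1 corresponds to SubMatches(0).

```vbscript
' Swaps the first two words using remembered matches in the replacement text.
Function SwapFirstTwoWords(ByVal text)
    Dim re
    Set re = New RegExp
    re.Pattern = "^(\S+)\s+(\S+)"
    SwapFirstTwoWords = re.Replace(text, "$2 $1")
End Function

' Returns remembered match number n (counted like $n) from the first match,
' or Empty when the pattern does not match. Requires VBScript 5.5 or later.
Function GetSubMatch(ByVal text, ByVal pattern, ByVal n)
    Dim re, colMatches
    Set re = New RegExp
    re.Pattern = pattern
    Set colMatches = re.Execute(text)
    If colMatches.Count = 0 Then
        GetSubMatch = Empty
    Else
        GetSubMatch = colMatches(0).SubMatches(n - 1)   ' SubMatches is zero-based
    End If
End Function

WScript.Echo SwapFirstTwoWords("cool very VBScript is")    ' very cool VBScript is
WScript.Echo GetSubMatch("VBScript is not very cool.", "(\S+)\s+(\S+)\s+(\S+)", 3)   ' not
```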

Test Method

The Test method executes a regular expression search against a specified string and returns a Boolean value that indicates whether or not a pattern match was found.

object.Test(string)
object Always a RegExp object
string The text string upon which the regular expression is executed (required)

The Test method returns True if a pattern match is found and False if no match is found. This is the preferred way to determine whether a string contains a pattern. Note that we often need to make patterns case-insensitive, as in the following example.



Dim re, s
Set re = New RegExp
re.IgnoreCase = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "Some long string with http://www.wrox.com buried in it."
If re.Test(s) Then
    MsgBox "Found a URL."
Else
    MsgBox "No URL found."
End If


The output of the preceding code is shown in Figure 9-21.


Figure 9-21

The Matches Collection

The Matches collection is a collection of regular expression Match objects.

A Matches collection contains individual Match objects. The only way to create this collection is by using the Execute method of the RegExp object. It is important to remember that the Matches collection is read-only, as are the individual Match objects.

When a regular expression is executed, zero or more Match objects result. Each Match object provides access to three things:

  • The string found by the regular expression
  • The length of the string
  • An index to where the match was found

Remember to set the Global property to True or your Matches collection can never contain more than one member. This is an easy way to create a very simple but hard to trace bug!

Dim re, s, objMatch, colMatches, sMsg
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is "
s = s & vbCrLf & "http://www.wrox.com. As is "
s = s & vbCrLf & "http://www.wiley.com."
Set colMatches = re.Execute(s)
sMsg = ""
For Each objMatch in colMatches
    sMsg = sMsg & "Match of " & objMatch.Value
    sMsg = sMsg & ", found at position " & objMatch.FirstIndex & " of the string. "
    sMsg = sMsg & "The length matched is "
    sMsg = sMsg & objMatch.Length & "." & vbCrLf
Next
MsgBox sMsg


Matches Properties

Matches is a simple collection and supports just two properties:

  • Count
  • Item

Count returns the number of items in the collection.

Dim re, s, objMatch, colMatches, sMsg
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is "
s = s & vbCrLf & "http://www.wrox.com. As is "
s = s & vbCrLf & "http://www.wiley.com."
Set colMatches = re.Execute(s)
MsgBox colMatches.Count


The output of the preceding code is shown in Figure 9-22.


Figure 9-22

Item returns the Match object at the specified index. The index is zero-based, so the first match is Item(0).

Dim re, s, objMatch, colMatches, sMsg
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is "
s = s & vbCrLf & "http://www.wrox.com. As is "
s = s & vbCrLf & "http://www.wiley.com."
Set colMatches = re.Execute(s)
MsgBox colMatches.Item(0).Value
MsgBox colMatches.Item(1).Value
MsgBox colMatches.Item(2).Value


The Match Object

Match objects are the members in a Matches collection. The only way to create a Match object is by using the Execute method of the RegExp object. When a regular expression is executed, zero or more Match objects can result.

Each Match object provides the following:

  • Access to the string found by the regular expression
  • The length of the string found
  • An index to where in the string the match was found

Match Properties

The Match object has three properties. All three properties are read-only:

  • FirstIndex
  • Length
  • Value

FirstIndex Property

The FirstIndex property returns the position in a search string where a match occurs. The position is zero-based: a match at the very start of the string has a FirstIndex of 0.

object.FirstIndex
object Always a Match object
Dim re, s, objMatch, colMatches, sMsg
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is "
s = s & vbCrLf & "http://www.wrox.com. As is "
s = s & vbCrLf & "http://www.wiley.com."
Set colMatches = re.Execute(s)
sMsg = ""
For Each objMatch in colMatches
    sMsg = sMsg & "Match of " & objMatch.Value
    sMsg = sMsg & ", found at position " & objMatch.FirstIndex & " of the string. "
    sMsg = sMsg & "The length matched is "
    sMsg = sMsg & objMatch.Length & "." & vbCrLf
Next
MsgBox sMsg
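FirstIndex is zero-based, whereas VBScript string functions such as Mid count characters from 1.

The following is a minimal sketch, assuming you want to pull the matched text back out of the original string with Mid. Add 1 to FirstIndex, and use Length as the character count. The helper name FirstMatchViaMid is illustrative.

```vbscript
' Extracts the first match with Mid, converting the zero-based FirstIndex
' to Mid's one-based start position. Returns "" when there is no match.
Function FirstMatchViaMid(ByVal text, ByVal pattern)
    Dim re, colMatches, m
    Set re = New RegExp
    re.Pattern = pattern
    Set colMatches = re.Execute(text)
    If colMatches.Count = 0 Then
        FirstMatchViaMid = ""
        Exit Function
    End If
    Set m = colMatches(0)
    FirstMatchViaMid = Mid(text, m.FirstIndex + 1, m.Length)
End Function

WScript.Echo FirstMatchViaMid("Visit http://www.wrox.com now", "http://\S+")   ' http://www.wrox.com
```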

Length Property

The Length property returns the length of a match found in a search string.

object.Length
object Always a Match object
Dim re, s, objMatch, colMatches, sMsg
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is "
s = s & vbCrLf & "http://www.wrox.com. As is "
s = s & vbCrLf & "http://www.wiley.com."
Set colMatches = re.Execute(s)
sMsg = ""
For Each objMatch in colMatches
    sMsg = sMsg & "Match of " & objMatch.Value
    sMsg = sMsg & ", found at position " & objMatch.FirstIndex & " of the string. "
    sMsg = sMsg & "The length matched is "
    sMsg = sMsg & objMatch.Length & "." & vbCrLf
Next
MsgBox sMsg

Value Property

The Value property returns the value or text of a match found in a search string.

object.Value
object Always a Match object.
Dim re, s, objMatch, colMatches, sMsg
Set re = New RegExp
re.Global = True
re.Pattern = "http://(\w+[\w-]*\w+\.)*\w+"
s = "http://www.kingsley-hughes.com is a valid web address. And so is "
s = s & vbCrLf & "http://www.wrox.com. As is "
s = s & vbCrLf & "http://www.wiley.com."
Set colMatches = re.Execute(s)
sMsg = ""
For Each objMatch in colMatches
    sMsg = sMsg & "Match of " & objMatch.Value
    sMsg = sMsg & ", found at position " & objMatch.FirstIndex & " of the string. "
    sMsg = sMsg & "The length matched is "
    sMsg = sMsg & objMatch.Length & "." & vbCrLf
Next
MsgBox sMsg


A Few Examples

We've covered a lot of theory in the past few pages. Theory is great, but you might like to see regular expressions in action. Let's complete this chapter with a few examples of how you can use regular expressions to solve real-life problems.

Validating Phone Number Input

Validating input prevents bogus or dubious information from being entered by a user. One piece of information that many developers need to check is whether a telephone number has been entered correctly. We cannot write a script that checks whether a number is actually a working phone number. We can, however, use script and regular expressions to enforce a format on the input, which helps to eliminate incorrect entries.



Dim re, s, objMatch, colMatches
Set re = New RegExp
re.Pattern = "\([0-9]{3}\)[0-9]{3}-[0-9]{4}"
re.Global = True
re.IgnoreCase = True
s = InputBox("Enter your phone number in the following format (XXX)XXX-XXXX:")
If re.Test(s) Then
    MsgBox "Thank you!"
Else
    MsgBox "Sorry but that number is not in a valid format."
End If


The code is simple, but again it is the pattern that does all the hard work. Depending on the input, you can get one of two possible output messages, shown in Figures 9-23 and 9-24.


Figure 9-23


Figure 9-24

If you want to make this script applicable in countries with other formats you will have to do a little work on it, but customizing it wouldn't be difficult.
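The phone-number pattern in the script above is not anchored. As a result, Test also returns True for input that merely contains a correctly formatted number, such as "(555)123-45678".

The following is a minimal sketch, assuming you want the entire input to match. It adds ^ and $ to the pattern. The helper name IsPhoneFormat and the sample numbers are illustrative.

```vbscript
' Returns True only when the whole input is in the form (XXX)XXX-XXXX.
Function IsPhoneFormat(ByVal text)
    Dim re
    Set re = New RegExp
    ' ^ and $ reject extra characters before or after the number;
    ' \( and \) match literal parentheses.
    re.Pattern = "^\([0-9]{3}\)[0-9]{3}-[0-9]{4}$"
    IsPhoneFormat = re.Test(text)
End Function

WScript.Echo IsPhoneFormat("(555)123-4567")    ' True
WScript.Echo IsPhoneFormat("(555)123-45678")   ' False
```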

Breaking Down URIs

Here is an example that can be used to break down a Uniform Resource Identifier (URI) into its component parts. Take the following URI:



http://www.wrox.com:80/misc-pages/support.shtml


We can write a script that will break it down into the protocol (ftp, http, and so on), the domain address, the port, and the page/path. To do this we can use the following pattern.

"(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"


The following code will carry out the task.



Dim re, s
Set re = New RegExp
re.Pattern = "(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"
re.Global = True
re.IgnoreCase = True
s = "http://www.wrox.com:80/misc-pages/support.shtml"
MsgBox re.Replace(s, "$1")   'protocol
MsgBox re.Replace(s, "$2")   'domain address
MsgBox re.Replace(s, "$3")   'port
MsgBox re.Replace(s, "$4")   'page/path
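The script above runs Replace four times, once per remembered match. In VBScript 5.5 and later, you can instead get all the parts from a single Execute call by reading the SubMatches collection of the match.

This is a minimal sketch; the helper name SplitUri is illustrative. It returns an empty array when the string does not look like a URI.

```vbscript
' Splits a URI into protocol, domain, port, and path with one Execute call,
' reading the remembered matches from SubMatches (VBScript 5.5 and later).
Function SplitUri(ByVal uri)
    Dim re, colMatches, sm
    Set re = New RegExp
    re.Pattern = "(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"
    Set colMatches = re.Execute(uri)
    If colMatches.Count = 0 Then
        SplitUri = Array()
        Exit Function
    End If
    Set sm = colMatches(0).SubMatches
    ' The port part is blank when the URI has none
    SplitUri = Array(sm(0), sm(1), sm(2), sm(3))
End Function

Dim uriParts
uriParts = SplitUri("http://www.wrox.com:80/misc-pages/support.shtml")
WScript.Echo Join(uriParts, vbCrLf)
```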


Testing for HTML Elements

Testing for HTML elements is easy; all you need is the right pattern. Here is one that works for elements with both an opening and a closing tag. The \1 refers back to the tag name remembered by (.*), so the closing tag must match the opening one.

"<(.*)>.*<\/\1>"

How you script this depends on what you want to do. Here is a simple script just for demonstration purposes.



Dim re, s
Set re = New RegExp
re.IgnoreCase = True
re.Pattern = "<(.*)>.*<\/\1>"
s = "<p>This is a paragraph</p>"
If re.Test(s) Then
    MsgBox "HTML element found."
Else
    MsgBox "No HTML element found."
End If

Matching White Space

Sometimes it can be really handy to be able to match white space, that is, lines that are either completely empty or that contain only white space (spaces and tab characters). Here is the pattern you would need for that.

"^[ \t]*$"

That breaks down to the following:

^ - Matches the start of the line.

[ \t]* - Matches zero or more space or tab (\t) characters.

$ - Matches the end of the line.



Dim re, s, colMatches, objMatch, sMsg
Set re = New RegExp
re.Global = True
re.Pattern = "^[ \t]*$"
s = " "
Set colMatches = re.Execute(s)
sMsg = ""
For Each objMatch in colMatches
    sMsg = sMsg & "Blank line found at position " & objMatch.FirstIndex & " of the string."
Next
MsgBox sMsg
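By default, ^ and $ match only the very start and end of the whole string, so the script above really tests whether the entire string is blank.

VBScript 5.5 and later also provide a Multiline property, which is not among the three properties listed earlier in this chapter. When Multiline is True, ^ and $ also match at the start and end of each line. This is a minimal sketch, assuming VBScript 5.5 or later; the helper name HasBlankLine and the sample text are illustrative.

```vbscript
' Tests for a blank (empty or spaces/tabs only) line; perLine turns on
' Multiline so that ^ and $ also match at each line break.
Function HasBlankLine(ByVal text, ByVal perLine)
    Dim re
    Set re = New RegExp
    re.Pattern = "^[ \t]*$"
    re.Multiline = perLine
    HasBlankLine = re.Test(text)
End Function

Dim textLines
' Three lines separated by line feeds; the middle one holds only spaces.
textLines = "line one" & vbLf & "   " & vbLf & "line three"
WScript.Echo HasBlankLine(textLines, False)   ' False: the whole string is not blank
WScript.Echo HasBlankLine(textLines, True)    ' True: the middle line is blank
```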


Matching HTML Comment Tags

When you come to the section on Windows Script Host, we'll show you how to use VBScript and Windows Script Host to work with the file system. Once you can do this, reading and modifying files is within your reach. One good application of regular expressions is looking for comment tags within an HTML file. You could then choose to remove them before making the files available on the Web.

Here is a script that can detect HTML comment tags.



Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "^.*<!--.*-->.*$"
s = "<title>A Title</title><!-- This is a comment -->"
If re.Test(s) Then
    MsgBox "HTML comment tags found."
Else
    MsgBox "No HTML comment tags found."
End If

With a simple modification to the pattern and the use of Replace method, we can get the script to remove the comment tag altogether.

Dim re, s
Set re = New RegExp
re.Global = True
re.Pattern = "(^.*)(<!--.*-->)(.*$)"
s = "<title>A Title</title><!-- This is a comment -->"
If re.Test(s) Then
    MsgBox "HTML comment tags found."
Else
    MsgBox "No HTML comment tags found."
End If
MsgBox re.Replace(s, "$1" & "$3")
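An anchored pattern of the form (^.*)(<!--.*-->)(.*$) has two limitations. It can remove at most one comment per string, because ^ can match only once. And because "." does not match a newline, it misses comments that span several lines.

The following is a minimal sketch that removes every comment instead, assuming VBScript 5.5 or later (which added non-greedy quantifiers such as *?). The helper name StripComments and the sample HTML are illustrative.

```vbscript
' Removes every HTML comment, including comments that span several lines.
Function StripComments(ByVal html)
    Dim re
    Set re = New RegExp
    re.Global = True
    ' [\s\S] matches any character, newlines included; *? is non-greedy, so
    ' each match stops at the first --> instead of running on to the last one.
    re.Pattern = "<!--[\s\S]*?-->"
    StripComments = re.Replace(html, "")
End Function

WScript.Echo StripComments("<p>One</p><!-- first --><p>Two</p><!-- second -->")   ' <p>One</p><p>Two</p>
```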


Summary

In this chapter we've covered, in depth, regular expressions and how they fit into the world of VBScript. You've seen how regular expressions can be used to carry out effective, flexible pattern matching within text strings. You've also seen examples of what can be done by integrating regular expressions with script, including customizable find-and-replace within text strings and input validation.

Learning to use regular expressions can seem a bit daunting, and even those comfortable with programming sometimes find regular expressions forbidding and choose less flexible solutions instead. However, the power and flexibility that regular expressions give the programmer are immense, and your efforts will be quickly rewarded!




VBScript Programmer's Reference
ISBN: 0470168080
Year: 2003
Pages: 242
