Managing Enterprise Systems with the Windows Script Host
Authors: Borge S.
Published year: 2005
Pages: 69-70/242

8.2 Matching Multiple Patterns

Problem

You want to find every occurrence of a pattern in a string.

Solution

Create a RegExp object, set its Pattern property to the pattern you want to match, and set its Global property to True. Then call the Execute method, which returns a Matches collection containing every match. The following sample lists all numeric values in a comma-delimited string, together with the position of each:

Set objRegExp = New RegExp
'set pattern to extract all numeric values from string
objRegExp.Pattern = "\d+\.?\d*|\.\d+"
objRegExp.IgnoreCase = True
objRegExp.Global = True
Set objMatches = objRegExp.Execute("111.13,1232,ABC,444,55")
For Each objMatch In objMatches
    Wscript.Echo "Found match: " & objMatch.Value & " at position " & _
        objMatch.FirstIndex
Next
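For comparison, here is a minimal sketch of the same loop in JavaScript, the ECMAScript language that JScript implements. It returns an array of strings instead of calling WScript.Echo, so it also runs outside WSH. The function name describeMatches is hypothetical. The g flag plays the role of Global = True, and m.index corresponds to FirstIndex.

```javascript
// Hypothetical port of the Solution: report each numeric value and its
// zero-based position, like objMatch.Value and objMatch.FirstIndex.
function describeMatches(text) {
    // same pattern as the Solution; the g flag corresponds to Global = True
    var re = /\d+\.?\d*|\.\d+/g;
    var lines = [];
    var m;
    // each exec() call returns the next match and advances re.lastIndex;
    // it returns null once no further match exists
    while ((m = re.exec(text)) !== null) {
        lines.push("Found match: " + m[0] + " at position " + m.index);
    }
    return lines;
}
```

Calling describeMatches("111.13,1232,ABC,444,55") yields four lines, for 111.13, 1232, 444 and 55 at positions 0, 7, 16 and 20.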

Discussion

A regular expression can extract every occurrence of a pattern from a string, not just the first one.

The RegExp object has a Global property. Setting it to True makes the pattern match every occurrence in the input. When Global is False (the default), only the first match is returned.
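The effect of Global can be sketched in JavaScript, where the g flag on a regular expression serves the same purpose. The helper names firstNumber and allNumbers are hypothetical.

```javascript
// No g flag: match() stops at the first match (like Global = False) and
// returns an array whose element 0 is the matched text, or null.
function firstNumber(text) {
    var m = text.match(/\d+/);
    return m === null ? null : m[0];
}

// g flag: match() returns every matched substring (like Global = True),
// or null when nothing matches.
function allNumbers(text) {
    var m = text.match(/\d+/g);
    return m === null ? [] : m;
}
```

With the input "10,20,30", firstNumber returns only "10", while allNumbers returns all three values.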

To return a list of any matches from the expression, invoke the Execute method:

Set objMatches = objRegExp.Execute(strString)

objRegExp represents a regular expression object, while the strString parameter is the string to test.

Execute returns a Matches collection that contains one Match object for each match found in the input string.

Each Match object has Value and FirstIndex properties. Value returns the text that was matched. FirstIndex returns the zero-based position in the string where the match begins.

The following generic function extracts items from a comma-delimited string. The elements are stored in an array, which is returned by the function:

Function ExtractCSV(strCSV)
    Dim objRegExp, objMatch, aRet, objMatches, nF
    Set objRegExp = New RegExp
    ' matches digits, digits followed by a point, digits followed by a
    ' point and then more digits, a point followed by digits, or anything
    ' enclosed by double-quotes
    objRegExp.Pattern = "\d+\.?\d*|\.\d+|\x22[^""]+\x22"
    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    Set objMatches = objRegExp.Execute(strCSV)
    If objMatches.Count > 0 Then
        ' one array element per match (upper bound is Count - 1)
        ReDim aRet(objMatches.Count - 1)
        ' iterate Matches collection
        For nF = 0 To objMatches.Count - 1
            Set objMatch = objMatches.Item(nF)
            ' check if the string is surrounded by quotes, if so remove them
            If Left(objMatch.Value, 1) = """" And _
               Right(objMatch.Value, 1) = """" Then
                aRet(nF) = Mid(objMatch.Value, 2, Len(objMatch.Value) - 2)
            Else
                aRet(nF) = objMatch.Value
            End If
        Next
        ExtractCSV = aRet
    Else
        ExtractCSV = Empty
    End If
End Function

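Each element can be a numeric or a text value. Text values must be enclosed in double quotes. They can contain any character other than a double quote, including commas. For example, the following call returns an array containing the strings 10.50, 10, Fred Smith (with its leading space) and 20: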

aValues = ExtractCSV("10.50,10,"" Fred Smith"",20")
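A minimal JavaScript sketch of the same technique follows. It uses the same pattern and the same quote-stripping rule. extractCSV is a hypothetical port, not part of WSH, and it returns null where the VBScript version returns Empty.

```javascript
// Hypothetical port of ExtractCSV: numbers with an optional decimal part,
// a point followed by digits, or a run of non-quote characters in quotes.
function extractCSV(csv) {
    var re = /\d+\.?\d*|\.\d+|"[^"]+"/g;
    var found = csv.match(re);
    if (found === null) {
        return null; // the VBScript version returns Empty here
    }
    var result = [];
    for (var i = 0; i < found.length; i++) {
        var item = found[i];
        // strip the surrounding quotes from text values
        if (item.charAt(0) === '"' && item.charAt(item.length - 1) === '"') {
            item = item.substring(1, item.length - 1);
        }
        result.push(item);
    }
    return result;
}
```

Because the quoted alternative consumes everything up to the closing quote, a comma inside a text value, as in '"Smith, Fred",5', does not split the value.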

See Also

For more information, read the MSDN Library article "Global Property" ( http://msdn.microsoft.com/library/en-us/script56/html/vsproGlobal.asp ).

8.3 Matching Subexpressions

Problem

You want to extract all regular expression subexpression matches from a target string.

Solution

Use the SubMatches property of a Match object in the Matches collection to list all matched subexpressions:

Set objRegExp = New RegExp
' match "digits slash digits slash digits"
objRegExp.Pattern = "(\d+)\/(\d+)\/(\d+)"
objRegExp.IgnoreCase = True
Set objMatches = objRegExp.Execute("5/13/2000")
'list all subexpressions
For nF = 0 To objMatches(0).SubMatches.Count - 1
    Wscript.Echo objMatches(0).SubMatches(nF)
Next

Discussion

The Matches collection returned by the RegExp object's Execute method lists the matches of a single pattern against a string. This is useful for finding multiple occurrences of one pattern, as demonstrated in Solution 8.2.

If you need to capture several different parts of a match, such as a set number of values in a certain order, use subexpressions. These are portions of the pattern enclosed in parentheses. For example, the following regular expression contains two subexpressions:

(.+):(.+)

If this regular expression were executed against the string "10:30", it would return two subexpressions: 10 and 30. VBScript did not support subexpression matching until version 5.5. Documentation provided with earlier VBScript versions states that subexpressions can be processed using the Matches collection, but this is not the case.
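In JavaScript the same pattern can be tried directly. The array returned by exec holds the whole match at index 0 and the subexpressions from index 1 onward. The sketch below (splitOnColon is a hypothetical name) also shows a detail worth knowing: .+ is greedy, so when there is more than one colon the first subexpression takes everything up to the last colon.

```javascript
// Two subexpressions: the text before a colon and the text after it.
function splitOnColon(text) {
    var m = /(.+):(.+)/.exec(text);
    // m[0] is the whole match; m[1] and m[2] are the subexpressions
    return m === null ? null : [m[1], m[2]];
}
```

splitOnColon("10:30") returns ["10", "30"], while splitOnColon("1:2:3") returns ["1:2", "3"].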

VBScript 5.5 introduced a SubMatches property on the Match object. It provides access to the text captured by each subexpression when a regular expression is executed against a string.

When Global is False (the default, as in the Solution), the Matches collection holds at most one Match object, objMatches(0). Its SubMatches property returns a SubMatches collection, and that collection's Count property returns the number of submatches.

Each submatch is accessed by its zero-based index in this collection. The following sample extracts the month, day, and year from a date string and displays each element of the SubMatches collection:

Dim objRegExp, objMatches, objSubMatches
'Set objRegExp = New RegExp
Set objRegExp = CreateObject("VBScript.RegExp")
' match "digits slash digits slash digits"
objRegExp.Pattern = "(\d+)\/(\d+)\/(\d+)"
Set objMatches = objRegExp.Execute("5/13/2000")
'the first element of the Matches collection contains the submatches
Set objSubMatches = objMatches(0).SubMatches
'list the number of subexpression matches and each match
Wscript.Echo "Subexpression count " & objSubMatches.Count
Wscript.Echo "Subexpression 1 " & objSubMatches(0)
Wscript.Echo "Subexpression 2 " & objSubMatches(1)
Wscript.Echo "Subexpression 3 " & objSubMatches(2)
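In JavaScript, the counterpart of the SubMatches collection is the array returned by exec without its first element. The hypothetical dateParts helper below returns the submatch count and the three parts of the same date string.

```javascript
// Hypothetical helper: split an m/d/yyyy string into its three parts,
// mirroring objMatches(0).SubMatches in the VBScript sample.
function dateParts(text) {
    var m = /(\d+)\/(\d+)\/(\d+)/.exec(text);
    if (m === null) {
        return null;
    }
    // m.length - 1 is the number of subexpressions (SubMatches.Count)
    return { count: m.length - 1, month: m[1], day: m[2], year: m[3] };
}
```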

The following script, regflt.vbs , is a generic command-line program that executes a regular expression pattern against standard input, outputting any extracted subexpressions to standard output delimited by a comma:

'regflt.vbs
'command line regular expression filter
Dim nF, strDelim, strLine, strPattern
Dim objRegExp, objMatches

If WScript.Arguments.Count < 1 Or WScript.Arguments.Count > 2 Then
    ShowUsage
End If

strDelim = ","  'set default delimiter
strPattern = WScript.Arguments(0)

'check if alternate output delimiter specified
If WScript.Arguments.Count = 2 Then strDelim = WScript.Arguments(1)

'create regular expression object and set properties
Set objRegExp = New RegExp
objRegExp.Pattern = strPattern
objRegExp.IgnoreCase = True
objRegExp.Global = True

'loop until the end of the text stream has been encountered
Do While Not WScript.StdIn.AtEndOfStream
    'read line from standard input
    strLine = WScript.StdIn.ReadLine
    'execute regular expression match
    Set objMatches = objRegExp.Execute(strLine)
    'if matches are made, write the submatches of the first match,
    'separated by the delimiter
    If objMatches.Count > 0 Then
        For nF = 0 To objMatches(0).SubMatches.Count - 2
            WScript.StdOut.Write objMatches(0).SubMatches(nF) & strDelim
        Next
        WScript.StdOut.WriteLine objMatches(0).SubMatches(nF)
    End If
Loop

Sub ShowUsage()
    WScript.Echo "regflt filters standard input against " & _
        "a regular expression." & vbLf & _
        "Syntax:" & vbLf & _
        "regflt.vbs regexp [delimiter]" & vbLf & _
        "regexp     regular expression" & vbLf & _
        "delimiter  optional. Character used to delimit output columns"
    WScript.Quit -1
End Sub

Only one parameter is required for the script, which is the regular expression that is being executed against each line of the standard input. An optional second parameter can be passed to change the output delimiter, which by default is a comma.

The script can be useful for extracting values from existing files and building delimited files based on the output.
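The core of regflt.vbs can be sketched as a pure JavaScript function over an array of lines, which makes the filtering logic easy to test without standard input. filterLines is a hypothetical name. The sketch deliberately leaves out the g flag. regflt.vbs uses only the first match on each line, and a global JavaScript regex reused across exec calls would carry its lastIndex from one line into the next.

```javascript
// Hypothetical, testable core of regflt: apply a pattern to each line and
// join the subexpressions of the first match with a delimiter.
function filterLines(lines, pattern, delim) {
    // "i" mirrors IgnoreCase = True in regflt.vbs
    var re = new RegExp(pattern, "i");
    var out = [];
    if (delim === undefined) {
        delim = ","; // same default as regflt.vbs
    }
    for (var i = 0; i < lines.length; i++) {
        var m = re.exec(lines[i]);
        if (m !== null) {
            // drop element 0 (the whole match) and keep the subexpressions
            out.push(m.slice(1).join(delim));
        }
    }
    return out;
}
```

Lines that do not match, such as a heading without a decimal price, are skipped, just as in the VBScript version.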

In the following sample, a price list is stored in a text file called prices.txt:

Bakery Price List Page 1
White bread   $1.20
Brown bread   $1.25
Whole wheat   $1.30
Buns          0.30ea

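To extract the descriptions and prices into an output file called prices.csv, run regflt.vbs with prices.txt as standard input and redirect standard output to prices.csv. The //NoLogo switch stops CScript from writing its banner into the output file:

cscript //NoLogo regflt.vbs "^(\w+ \w*).+(\d+\.\d+)" < prices.txt > prices.csv

The results are stored in the prices.csv file: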

White bread,1.20
Brown bread,1.25
Whole wheat,1.30
Buns,0.30
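There is one caveat with the pattern used above. The greedy .+ gives back only as many characters as (\d+\.\d+) needs, so a price with two or more digits before the decimal point loses its leading digits. A lazy .+? avoids this. The sketch below compares the two patterns in JavaScript. The line "Rye bread   $10.20" is made-up test data, and price is a hypothetical helper.

```javascript
// The greedy pattern from the example and a lazy variant of it.
var greedy = /^(\w+ \w*).+(\d+\.\d+)/;
var lazy = /^(\w+ \w*).+?(\d+\.\d+)/;

// Return the captured price (second subexpression), or null if no match.
function price(re, line) {
    var m = re.exec(line);
    return m === null ? null : m[2];
}
```

price(greedy, "Rye bread   $10.20") returns "0.20", while price(lazy, "Rye bread   $10.20") returns "10.20". For single-digit prices such as $1.20 both patterns agree.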

Input can also be piped from any other application that writes to standard output. The following currency exchange page is stored on a remote Web server:

<html><head>
<title>Exchange Rates</title>
</head><body>
<p>Exchange Rates for date:1/12/2000</p>
<table border="1" width="529">
<tr><td>Canadian Dollar</td><td>CAN</td><td>.67</td></tr>
<tr><td>Australian Dollar</td><td>AUS</td><td>.64</td></tr>
<tr><td>German Mark</td><td>GDR</td><td>.7</td></tr>
<tr><td>British Pound</td><td>PND</td><td>1.5</td></tr>
<tr><td>Japanese Yen</td><td>YEN</td><td>.9</td></tr>
</table>
</body></html>

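Using the httpget script from Chapter 11, the page is retrieved and piped to the regflt script. Because regflt.vbs tests each line separately, the pattern relies on each table row being on a line of its own: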

cscript //NoLogo httpget.wsf "http://acme.com/xchange.htm" | cscript //NoLogo regflt.vbs "<tr><td>(.+)</td><td>(.+)</td><td>(.+)</td></tr>" > exchange.csv

The exchange values are extracted and redirected to a file called exchange.csv:

Canadian Dollar,CAN,.67
Australian Dollar,AUS,.64
German Mark,GDR,.7
British Pound,PND,1.5
Japanese Yen,YEN,.9

JScript has always provided access to subexpressions through its RegExp object. It exposes them differently from VBScript because JScript follows the ECMAScript language standard.

To access a subexpression, either read the corresponding numbered property of the global RegExp object, or index the array returned when the regular expression is executed with exec.

The numbered properties are named with a dollar sign ($) followed by the subexpression number, from RegExp.$1 through RegExp.$9:

var objRegExp = new RegExp("(.+):(.+)", "ig");
var aMatches = objRegExp.exec("hello:there");
//output the first and second subexpression match
WScript.Echo(RegExp.$1);
WScript.Echo(RegExp.$2);
//do the same as above, except reference the matches array
WScript.Echo(aMatches[1]);
WScript.Echo(aMatches[2]);
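The same comparison can be written as a self-contained JavaScript function (bothWays is a hypothetical name). Modern engines such as the one in Node.js still support RegExp.$1 through RegExp.$9 as legacy properties. These properties belong to the global RegExp object, however, and the next successful match anywhere in the script overwrites them, so the sketch copies them immediately. The exec array does not have this problem.

```javascript
// Hypothetical helper showing both ways of reading subexpressions.
function bothWays(text) {
    var re = /(.+):(.+)/;
    var aMatches = re.exec(text);
    if (aMatches === null) {
        return null;
    }
    // copy the legacy static properties at once: any later successful
    // regular expression match replaces them
    var fromStatic = [RegExp.$1, RegExp.$2];
    // the returned array keeps its own copy of the subexpressions
    var fromArray = [aMatches[1], aMatches[2]];
    return { fromStatic: fromStatic, fromArray: fromArray };
}
```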

If you are processing an unknown number of subexpressions, or more than nine, read them from the array returned by exec or String.match instead, since the numbered RegExp properties stop at $9. The following JScript sample implements a command-line regular expression filter similar to regflt.vbs:

//regflt.js
//filters regular expression elements from strings
var nF, rst, re, strLine, arg;
if (WScript.Arguments.length < 1 || WScript.Arguments.length > 1)
    WScript.Quit(-1);
// get the regular expression string and compile it once
rst = WScript.Arguments.Item(0);
re = new RegExp(rst);
//loop until the end of the text stream has been encountered
while (!WScript.StdIn.AtEndOfStream) {
    strLine = WScript.StdIn.ReadLine();
    arg = strLine.match(re);
    if (arg) {
        //element 0 is the whole match; elements 1..n are the subexpressions
        for (nF = 1; nF < arg.length; nF++)
            WScript.StdOut.Write(arg[nF] + (nF < arg.length - 1 ? "," : ""));
        WScript.StdOut.WriteLine();
    }
}
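To illustrate the nine-subexpression limit, the hypothetical elevenGroups function below uses a pattern with eleven subexpressions. There is no RegExp.$10 or RegExp.$11, but the array returned by match holds every subexpression.

```javascript
// Hypothetical demonstration: a pattern with eleven subexpressions,
// each capturing one word character.
function elevenGroups(text) {
    var re = /(\w)(\w)(\w)(\w)(\w)(\w)(\w)(\w)(\w)(\w)(\w)/;
    var m = text.match(re);
    if (m === null) {
        return null;
    }
    // m[10] and m[11] are reachable only through the array
    return m.slice(1);
}
```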

See Also

For more information, read the MSDN Library article "What's New in Windows Script 5.5" ( http://msdn.microsoft.com/library/en-us/dnclinic/html/scripting121399.asp ).
