Recipe 5.37. Using Regular Expressions to Extract All Numbers


Problem

You want to extract all numbers from a string that has extra whitespace, text, and other nonnumeric characters interspersed throughout.

Solution

Sample code folder: Chapter 05\RegexExtractNum

Use a regular expression (Regex) object to identify and parse out a list of all numbers in the string.

Discussion

This can be a tricky problem if the exact format of the string is not known. Accurately identifying which runs of characters belong to numbers is difficult in the general case, because negative signs, decimal points, scientific notation, and other complications can arise. Fortunately, the Regex class greatly simplifies the task. The following code demonstrates how it works:

Imports System.Text.RegularExpressions

' ...Later, in a method...
Dim source As String = _
    "This 321.0 string -0.020 contains " & _
    "3.0E-17 several 1 2. 34 numbers"
Dim result As String

' Optional sign, optional digits plus decimal point,
' one or more required digits, optional exponent.
Dim parser As New _
    Regex("[-+]?([0-9]*\.)?[0-9]+([eE][-+]?[0-9]+)?")
Dim sourceMatches As MatchCollection = _
    parser.Matches(source)
Dim counter As Integer

result = "Count: " & _
    sourceMatches.Count.ToString() & vbNewLine
For counter = 0 To sourceMatches.Count - 1
    result &= vbNewLine
    ' The matched text, exactly as it appears in source...
    result &= sourceMatches(counter).Value.ToString()
    result &= Space(5)
    ' ...followed by the same text converted to a Double.
    result &= CDbl(sourceMatches(counter).Value).ToString()
Next counter
MsgBox(result)

The string to be parsed is source, which contains a variety of integer and floating-point numbers, both positive and negative, with words and other nonnumeric characters mixed in. A Regex object named parser is instantiated with a regular expression designed to locate conventionally formatted numbers: an optional sign, optional digits followed by a decimal point, one or more required digits, and an optional exponent. The Regex object's Matches() method is applied to the string and returns a MatchCollection. The collection's Count property gives the number of matches found in the string. Each Match in the collection has a Value property that holds the matched text as a String; calling ToString() on it, as the loop does, simply returns that same text.
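To see how the pattern breaks down, the following sketch assembles the same regular expression from named pieces and prints each Match's Index (its character offset in the source string) along with its Value. It is a minimal console-application sketch; the module name PatternWalkthrough and the constant names are hypothetical labels chosen for illustration, not part of the recipe's sample code.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PatternWalkthrough
    ' Hypothetical names for each piece of the recipe's pattern.
    Public Const OptionalSign As String = "[-+]?"                  ' a leading - or +, if present
    Public Const OptionalDigitsAndPoint As String = "([0-9]*\.)?"   ' zero or more digits, then a decimal point
    Public Const RequiredDigits As String = "[0-9]+"               ' at least one digit
    Public Const OptionalExponent As String = "([eE][-+]?[0-9]+)?" ' an exponent such as E-17

    ' The assembled pattern is identical to the one used in the recipe.
    Public Const NumberPattern As String = _
        OptionalSign & OptionalDigitsAndPoint & RequiredDigits & OptionalExponent

    Sub Main()
        Dim parser As New Regex(NumberPattern)
        Dim source As String = _
            "This 321.0 string -0.020 contains 3.0E-17 several 1 2. 34 numbers"

        ' Each Match records its position (Index) and the matched text (Value).
        For Each found As Match In parser.Matches(source)
            Console.WriteLine("{0,3}: {1}", found.Index, found.Value)
        Next
    End Sub
End Module
```

Run against the recipe's sample string, this should list six matches. Note that the text "2." yields the match "2": the optional group can consume "2.", but the pattern then requires at least one digit after the decimal point, so the engine backtracks and matches the digit alone.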

Figure 5-42 shows the results of parsing the sample string, listing the numbers found using the regular expression. Each line after the count shows two values for one match. The first is the match's Value, the substring exactly as it appears in the original string. The second is that string converted to a Double with CDbl() and then back to a string. This extra step verifies that each matched string really does convert to a numeric value.
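One caveat: CDbl() converts using the current culture's number format, so on a system where the comma is the decimal separator, a match such as "321.0" may be rejected or misread. A possible alternative, sketched below under the assumption that the source text always uses a period as the decimal point, is Double.TryParse() with CultureInfo.InvariantCulture. The module NumberExtractor and its ExtractNumbers() function are hypothetical names, and console output stands in for the recipe's MsgBox() call.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports System.Text.RegularExpressions

Module NumberExtractor
    ' Same pattern as the recipe: optional sign, optional digits plus
    ' decimal point, required digits, optional exponent.
    Private ReadOnly numberPattern As New Regex( _
        "[-+]?([0-9]*\.)?[0-9]+([eE][-+]?[0-9]+)?")

    ' Hypothetical helper: returns every number found in source as a Double.
    Public Function ExtractNumbers(ByVal source As String) As List(Of Double)
        Dim numbers As New List(Of Double)
        For Each found As Match In numberPattern.Matches(source)
            Dim value As Double
            ' Invariant culture: "." is always the decimal separator.
            ' NumberStyles.Float permits a leading sign, a decimal point, and an exponent.
            If Double.TryParse(found.Value, NumberStyles.Float, _
                               CultureInfo.InvariantCulture, value) Then
                numbers.Add(value)
            End If
        Next
        Return numbers
    End Function

    Sub Main()
        Dim numbers As List(Of Double) = ExtractNumbers( _
            "This 321.0 string -0.020 contains 3.0E-17 several 1 2. 34 numbers")
        Console.WriteLine("Count: " & numbers.Count.ToString())
        For Each n As Double In numbers
            Console.WriteLine(n.ToString(CultureInfo.InvariantCulture))
        Next
    End Sub
End Module
```

For the sample string, the list should contain 321, -0.02, 3E-17, 1, 2, and 34. Because TryParse() returns False instead of throwing an exception, a match that cannot be converted is simply skipped rather than halting the loop.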

Figure 5-42. Parsing the sample string reveals all the numbers it contains


The regular expression presented in this example is one of many that can be found on Internet web sites, which are a good resource for locating regular expressions suited to specific purposes.


See Also

Recipe 5.38 also discusses regular expression processing. The following web sites are just some of the many places on the Internet that provide regular expression samples:

http://www.regular-expressions.info/examples.html
http://sitescooper.org/tao_regexps.html
http://en.wikipedia.org/wiki/Regular_expression



Visual Basic 2005 Cookbook: Solutions for VB 2005 Programmers (Cookbooks (OReilly))
ISBN: 0596101775
Year: 2006
Pages: 400