Regular Expression Examples


Before we wrap up this quick introduction to regular expressions, let's review a few regular expressions that you're likely to need and use.

E-mail Address

It's reasonable to expect that you'll want to test whether a string of text matches an e-mail address pattern. Here is one such regular expression:

 ^([\w-]+)(\.[\w-]+)*@([\w-]+\.)+[a-zA-Z]{2,7}$ 

The pattern is a sequence consisting of:

  • A start anchor (^)

  • The expression ([\w-]+), which matches one or more word characters or dashes

  • The expression (\.[\w-]+)*, which matches zero or more groups made up of a period followed by one or more word characters or dashes

  • The @ character

  • The expression ([\w-]+\.)+, which matches one or more groups of word characters or dashes that each end in a period

  • The expression [a-zA-Z]{2,7}

  • An end anchor ($)

The expression [a-zA-Z]{2,7} matches a run of at least two and no more than seven letters. This allows top-level domains such as .ca and .museum.
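To see how these pieces line up against an address, the following sketch runs the pattern through the static [regex]::Match method and inspects the capture groups. The sample addresses and variable names are assumptions chosen for illustration, and the pattern is single-quoted so that PowerShell leaves the $ character alone.

```powershell
# Sketch only: the addresses below are illustrative.
$pattern = '^([\w-]+)(\.[\w-]+)*@([\w-]+\.)+[a-zA-Z]{2,7}$'

$m = [regex]::Match('alex@dev.sapien.com', $pattern)
$m.Success                 # whether the whole address matched
$m.Groups[1].Value         # text captured by the first ([\w-]+)

# A repeated group keeps every repetition in its Captures collection.
$labels = @($m.Groups[3].Captures | ForEach-Object { $_.Value })
$labels                    # one entry per "label." piece before the final letters

# The {2,7} quantifier bounds the final run of letters.
$tooShort = [regex]::IsMatch('user@example.c', $pattern)
$tooLong  = [regex]::IsMatch('user@example.abcdefgh', $pattern)
$inRange  = [regex]::IsMatch('user@example.museum', $pattern)
```

In this sketch, group 3 reports two captures because ([\w-]+\.)+ repeats once for dev. and once for sapien., while the Value property of the group holds only the last repetition.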

There's more than one way

There are many different regular expression patterns for an e-mail address. Even though this particular pattern should work for just about any address, it is not 100% guaranteed. We used this pattern because it is relatively simple to follow.

Here's how we might use this expression:

 PS C:\> $regex=[regex]"^([\w-]+)(\.[\w-]+)*@([\w-]+\.)+[a-zA-Z]{2,7}$"
 PS C:\> $var= ("j.hicks@sapien.com","oops@ca",`
 >> "don@sapien.com","alex@dev.sapien.com")
 PS C:\> $var
 j.hicks@sapien.com
 oops@ca
 don@sapien.com
 alex@dev.sapien.com
 PS C:\> $var -match $regex
 j.hicks@sapien.com
 don@sapien.com
 alex@dev.sapien.com
 PS C:\> $var.count
 4
 PS C:\>

We start by creating a regex object with our e-mail pattern and define a variable that holds an array of e-mail addresses to check. We've included one entry that we know will fail to match. The easiest way to list the matches is to use the -match operator: when its left operand is an array, it returns the elements that match, which here are all the valid e-mail addresses.
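The difference between a single string and an array on the left side of -match is easy to overlook, so here is a minimal sketch of both cases. The variable names are assumptions for illustration, and @( ) is used only to make sure the results are arrays even when one element comes back.

```powershell
$pattern = '^([\w-]+)(\.[\w-]+)*@([\w-]+\.)+[a-zA-Z]{2,7}$'

# Single string on the left: the result is a Boolean.
$single = 'don@sapien.com' -match $pattern

# Array on the left: the result is the subset of elements that match.
$list = 'don@sapien.com', 'oops@ca', 'alex@dev.sapien.com'
$good = @($list -match $pattern)

# -notmatch returns the complementary subset.
$bad = @($list -notmatch $pattern)
```

Because the array form returns elements rather than TRUE or FALSE, it works as a quick filter but should not be used directly as a yes-or-no test.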

If you try the $regex.matches($var) or $regex.IsMatch($var) expressions, nothing or FALSE is returned:

 PS C:\> $regex.matches($var)
 PS C:\> $regex.IsMatch($var)
 False
 PS C:\>

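This occurs because $var is an array. When an array is passed to a method that expects a string, PowerShell joins the elements into a single space-separated string, and that string as a whole does not match our anchored pattern. We need to enumerate the array and evaluate each element against the regular expression pattern: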

 PS C:\> foreach ($item in $var) {
 >> if ($regex.IsMatch($item)) {
 >> write-host $item "is a valid address"
 >> }
 >> else {
 >> write-host "$item is NOT a valid address" }
 >> }
 >>
 j.hicks@sapien.com is a valid address
 oops@ca is NOT a valid address
 don@sapien.com is a valid address
 alex@dev.sapien.com is a valid address
 PS C:\>

In this example we're enumerating each item in $var. If the current variable item matches the regular expression pattern, then we display a message confirming the match. Otherwise we display a non-matching message.
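The following sketch shows what the regex methods receive when they are handed the whole array, and then sorts the addresses into two lists with Where-Object instead of a foreach loop. It assumes the $OFS variable has not been changed from its default (a single space); the $joined, $valid, and $invalid names are illustrative.

```powershell
$regex = [regex]'^([\w-]+)(\.[\w-]+)*@([\w-]+\.)+[a-zA-Z]{2,7}$'
$var = 'j.hicks@sapien.com', 'oops@ca', 'don@sapien.com', 'alex@dev.sapien.com'

# Converting the array to a string joins the elements with spaces
# (assuming the default $OFS), so the anchored pattern cannot match.
$joined = [string]$var
$joined
$regex.IsMatch($joined)

# Evaluate one element at a time and keep both outcomes.
$valid   = @($var | Where-Object { $regex.IsMatch($_) })
$invalid = @($var | Where-Object { -not $regex.IsMatch($_) })
```

Keeping the two lists separate is one possible design; it lets a script report the rejected entries without a second pass through the data.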

String with No Spaces

Up to now we've been using regular expressions to match alphanumeric characters. However, we can also match whitespace characters such as a space, tab, or new line, or match the absence of whitespace. Here's a regex object that uses \S, which matches any nonwhitespace character:

 PS C:\> $regex=[regex]"\S"
 PS C:\> $var="The-quick-brown-fox-jumped-over-the-lazy-dog."
 PS C:\> $var2="The quick brown fox jumped over the lazy dog."
 PS C:\>

In this example we have two variables - one with white spaces and the other without. Which one will return TRUE when evaluated with IsMatch?

 PS C:\> $regex.IsMatch($var)
 True
 PS C:\> $regex.IsMatch($var2)
 True
 PS C:\>

Actually, this is a trick question because both return TRUE. This happens because \S is looking for any nonwhitespace character. Since each letter or the dash is a nonwhitespace character, the pattern matches. If our aim is to check a string to find out if it contains any spaces, then we really need to use a different regular expression and understand that a finding of FALSE is what we're seeking:

 PS C:\> $regex=[regex]"\s{1}"
 PS C:\> $regex.Ismatch($var)
 False
 PS C:\> $regex.Ismatch($var2)
 True
 PS C:\>

The regular expression \s{1} matches exactly one whitespace character, which makes it equivalent to plain \s: it succeeds if whitespace appears anywhere in the string. Evaluating $var with IsMatch returns FALSE because there are no spaces in the string. The same execution with $var2 returns TRUE because there are spaces in the string. So, if we wanted to take some action based on this type of negative matching, we might use something like this:

NegativeMatchingTest.ps1

 $var="The-quick-brown-fox-jumped-over-the-lazy-dog."
 $var2="The quick brown fox jumped over the lazy dog."
 $regex=[regex]"\s{1}"

 #IsMatch returns a Boolean, so test it directly rather than
 #comparing it to the string "False"
 $var
 if ($regex.IsMatch($var)) {
   write-host "Expression has spaces"
 }
 else {
   write-host "Expression has no spaces"
 }

 $var2
 if ($regex.IsMatch($var2)) {
   write-host "Expression has spaces"
 }
 else {
   write-host "Expression has no spaces"
 }

This action produces the following output:

 The-quick-brown-fox-jumped-over-the-lazy-dog.
 Expression has no spaces
 The quick brown fox jumped over the lazy dog.
 Expression has spaces
 PS C:\>

The purpose of this example is to illustrate that there may be times when you want to match on something that is missing or a negative pattern.
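One way to express this kind of negative test more directly is the -notmatch operator, which returns TRUE when the pattern is absent. The sketch below wraps it in a small function; Test-NoWhitespace is a hypothetical name used only for this illustration.

```powershell
# Hypothetical helper: True when no whitespace appears anywhere in $Text.
function Test-NoWhitespace {
    param([string]$Text)
    return ($Text -notmatch '\s')
}

$var  = 'The-quick-brown-fox-jumped-over-the-lazy-dog.'
$var2 = 'The quick brown fox jumped over the lazy dog.'

$first  = Test-NoWhitespace $var     # no whitespace in this string
$second = Test-NoWhitespace $var2    # this string contains spaces

# An alternative positive test: the whole string must consist of
# one or more nonwhitespace characters.
$anchored      = $var -match '^\S+$'
$anchoredEmpty = '' -match '^\S+$'
$helperEmpty   = Test-NoWhitespace ''
```

The two approaches disagree on an empty string: -notmatch '\s' accepts it because nothing in it is whitespace, while ^\S+$ rejects it because at least one character is required. Which behavior is appropriate depends on the script.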

Domain Credential

Let's look at a regular expression example to match a Windows account name that is in the format Domain\username:

 PS C:\> $regex=[regex]("\w+\\\w+")
 PS C:\> $var=@("sapien\jeff","sapien\don","sapien\alex")
 PS C:\> $regex.matches($var)

 Groups   : {sapien\jeff}
 Success  : True
 Captures : {sapien\jeff}
 Index    : 0
 Length   : 11
 Value    : sapien\jeff

 Groups   : {sapien\don}
 Success  : True
 Captures : {sapien\don}
 Index    : 12
 Length   : 10
 Value    : sapien\don

 Groups   : {sapien\alex}
 Success  : True
 Captures : {sapien\alex}
 Index    : 23
 Length   : 11
 Value    : sapien\alex

 PS C:\>

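Again, we create our regex object and a variable that holds some account names. Invoking the Matches method shows the results. Because $var is an array, it reaches the method as a single space-separated string, which is why the Index values are 0, 12, and 23. As we've demonstrated earlier, you can display the match values in at least two different ways: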

 PS C:\> foreach ($m in $regex.matches($var)) {$m.value}
 sapien\jeff
 sapien\don
 sapien\alex
 PS C:\> $var -match $regex
 sapien\jeff
 sapien\don
 sapien\alex
 PS C:\>

Which method you choose will depend on what you want to do with the information.
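If the goal is to work with the domain and the user name separately, named capture groups are one option. The sketch below is an assumption-laden variation on the pattern above: it adds ^ and $ anchors and the group names domain and user, none of which appear in the original expression, and it uses the [pscustomobject] shortcut, which requires PowerShell 3.0 or later.

```powershell
# Variation on the pattern above with named groups (names are illustrative).
$regex = [regex]'^(?<domain>\w+)\\(?<user>\w+)$'
$var = @('sapien\jeff', 'sapien\don', 'sapien\alex')

$accounts = @(foreach ($item in $var) {
    $m = $regex.Match($item)
    if ($m.Success) {
        # Emit one object per account with the two parts split out.
        [pscustomobject]@{
            Domain = $m.Groups['domain'].Value
            User   = $m.Groups['user'].Value
        }
    }
})

$accounts | Format-Table -AutoSize
```

Each element is matched on its own here, so the anchors can require that the entire string be in Domain\username form.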

Telephone Number

Matching a phone number is pretty straightforward. We can use the pattern \d{3}-\d{4} to match any basic phone number without the area code:

 PS C:\> $regex=[regex]"\d{3}-\d{4}"
 PS C:\> "555-1234" -match $regex
 True
 PS C:\> $matches

 Name                           Value
 ----                           -----
 0                              555-1234

 PS C:\> "5551-234" -match $regex
 False
 PS C:\> $regex.IsMatch("abc-defg")
 False
 PS C:\> $regex.IsMatch("123-0987")
 True
 PS C:\>

Hopefully these examples are looking familiar. First we define a regular expression object, and then we test different strings to see if there is a match. You can see that a match requires three digits (\d{3}), then a dash (-), then four digits (\d{4}). Because the pattern is not anchored, it also matches when that sequence appears inside a longer string.
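When the whole string must be a phone number, anchors are needed. The sketch below compares the unanchored pattern with a stricter variant that also accepts an optional area code; the stricter pattern, the accepted formats, and the variable names are assumptions made for illustration rather than a complete phone number validator.

```powershell
$loose  = [regex]'\d{3}-\d{4}'
# Assumed formats: 555-1234, 315-555-1234, or (315) 555-1234
$strict = [regex]'^(\(\d{3}\) |\d{3}-)?\d{3}-\d{4}$'

# The unanchored pattern finds 555-1234 inside a longer run of digits.
$looseHit  = $loose.IsMatch('1555-12345')
$strictHit = $strict.IsMatch('1555-12345')

# The anchored pattern accepts only the complete formats listed above.
$plain  = $strict.IsMatch('555-1234')
$dashed = $strict.IsMatch('315-555-1234')
$parens = $strict.IsMatch('(315) 555-1234')
```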

IP Address

For our final example, let's look at a likely use for a regular expression. We want to examine a Web log and pull out all the IP addresses. Here's the complete set of commands. We'll go through them at the end:

 PS C:\Logs> $var=get-content "ex060211.log"
 PS C:\Logs> $regex=[regex]"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
 PS C:\Logs> $regex.Matches($var)

 Groups   : {192.168.10.1}
 Success  : True
 Captures : {192.168.10.1}
 Index    : 15679
 Length   : 12
 Value    : 192.168.10.1

 Groups   : {217.58.174.3}
 Success  : True
 Captures : {217.58.174.3}
 Index    : 15728
 Length   : 12
 Value    : 217.58.174.3

 PS C:\Logs> $regex.Matches($var)| `
 >> select-object -unique -property "value"

 Value
 -----
 192.168.10.1
 69.207.16.195
 61.77.118.73
 69.207.43.227
 59.16.161.193
 221.248.23.251
 202.196.222.222
 216.127.66.128
 64.252.96.72
 213.97.113.25
 85.124.110.222
 59.11.81.103
 59.13.34.109
 220.195.3.86
 38.119.239.197
 217.58.174.3
 220.135.88.151
 69.241.39.66
 213.152.142.15

 PS C:\Logs>

The first thing we do is dump the contents of the log file to the variable $var. Next we create an object variable that will be a regular expression object by casting it as type regex and specifying the matching pattern. Remember, we need to use a regular expression object because the -match operator only checks for the first instance of a match. In an IIS log, the first IP listed is usually the host Web server and we want the visitor's IP address that comes second. Everything we've covered up to now about patterns and regular expressions is still valid. We're just going to use an object with built-in regular expression functionality. You'll also notice that our IP address pattern does not use ^ and $. That's because the IP addresses we're looking for don't start or end each line of the log file.

Invoking the Matches method of the regex object essentially takes our matching pattern and compares it to the contents of $var. Whenever a match is found, it will be displayed. We've edited the output to only show a few representative examples.

That alone might be sufficient. But we'll take this one step further and send the output of the Matches method to the Select-Object cmdlet. With this cmdlet we can select only the Value property of each regular expression match and also return a list of unique values.
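One limitation of this pattern is that \d{1,3} accepts any run of one to three digits, so a string such as 999.300.1.1 also matches. The sketch below adds a range check on each octet before the unique list is built. The log lines are made up for illustration and are not taken from a real IIS log, and Test-OctetRange is a hypothetical helper name.

```powershell
$regex = [regex]'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

# Made-up log lines for illustration only; a real IIS log has more fields.
$lines = @(
    '2006-02-11 00:01:02 192.168.10.1 GET /default.htm 217.58.174.3',
    '2006-02-11 00:01:09 192.168.10.1 GET /logo.gif 217.58.174.3',
    '2006-02-11 00:02:15 192.168.10.1 GET /default.htm 69.207.16.195',
    '2006-02-11 00:03:40 192.168.10.1 GET /default.htm 999.300.1.1'
)

# Hypothetical helper: True only when every dotted part is 255 or less.
function Test-OctetRange {
    param([string]$Address)
    foreach ($octet in $Address.Split('.')) {
        if ([int]$octet -gt 255) { return $false }
    }
    return $true
}

# Collect every match on every line, dropping out-of-range candidates.
$addresses = foreach ($line in $lines) {
    foreach ($m in $regex.Matches($line)) {
        if (Test-OctetRange $m.Value) { $m.Value }
    }
}

# Reduce to unique values, keeping first-seen order.
$unique = @($addresses | Select-Object -Unique)
$unique
```

Checking the range in code keeps the regular expression short and readable; the alternative is a much longer pattern that encodes the 0 to 255 limit for each octet.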



Windows PowerShell. TFM
Internet Forensics
ISBN: 982131445
EAN: 2147483647
Year: 2004
Pages: 289
