4.4. Take String Comparison Beyond -eq, -lt, and -gt

In Chapter 3, we looked at a number of comparison operators, such as -eq, -lt, and -gt, which can be applied to many of the different types of data we work with in the shell. Each of these three operators works effectively on strings by checking for identical strings and giving some idea of relative alphabetical ordering. However, there are many cases in which we'd like to do a more meaningful comparison of the actual letters within a string: for example, to test whether the string contains a certain shorter string or whether it matches a certain format such as the aaa.bbb.ccc.ddd format of a numeric IP address.

We'll look at two approaches to matching strings in this section. The first technique uses the -like operator with some basic wildcard rules to see whether one string contains another. The second technique uses the -match operator and relies on regular expressions to communicate more complex matching rules. Before we begin, let's look at some examples of regular expressions.

Fear of Regular Expressions

Regular expressions can appear very daunting at first glance. The seemingly senseless sequence of backslashes, characters, parentheses, brackets, and other punctuation can easily become overwhelming. Authoring regular expressions isn't necessarily a skill you can pick up overnight, nor should you need to. Careful use of the -match operator with some well-crafted regular expressions is a handy way to match text quickly and efficiently. However, if you find yourself authoring regular expressions that are hundreds of characters long, it may be time to think about breaking the tests out separately and perhaps grouping them together inside a function.


4.4.1. Regular Expressions

A regular expression describes a set of matching strings according to a series of rules. In this section, we'll cover a few of the basic rules and look at some common examples, but it's important to realize that regular expressions are a vast topic that won't be covered exhaustively here. For a more complete picture of the topic, consider picking up a copy of Mastering Regular Expressions (O'Reilly).

There are three principles that are fundamental to understanding and effectively using regular expressions. The first is the concept of alternates: the idea that a single regular expression can express two or more different strings to match against. Alternates are separated by a vertical bar (|), which is the same symbol used for building a pipeline. For example, the regular expression w3svc|iisadmin|msftpsvc matches "w3svc", "iisadmin", "msftpsvc", and the string "w3svc service is started but iisadmin is not." Square brackets are often used as shorthand for specifying single-character alternates; for example, [aeiou] is equivalent to a|e|i|o|u. The hyphen can also be used inside brackets to cover a range; [a-m] matches any letter in the first half of the alphabet.
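As a rough sketch of alternates in action, the following lines use the -match operator (introduced properly later in this section) with the service-name pattern from the paragraph above. The variable name $svcRegex is only an illustration, and the expected results are shown as comments.

```powershell
# Alternation: any one of the three service names may appear in the string.
$svcRegex = 'w3svc|iisadmin|msftpsvc'
"iisadmin" -match $svcRegex                                        # True
"w3svc service is started but iisadmin is not." -match $svcRegex   # True
"spooler" -match $svcRegex                                         # False

# A bracketed set is shorthand for single-character alternates.
"gray" -match 'gr[ae]y'      # True
"groy" -match 'gr[ae]y'      # False

# A hyphen inside brackets covers a range of characters.
"c" -match '^[a-m]$'         # True
"x" -match '^[a-m]$'         # False
```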

Second, different parts of a regular expression can be grouped together using parentheses. Grouping is useful when only part of a longer regular expression is subject to alternation or quantification. For example, the regular expression (w3|msftp)svc matches both "w3svc" and "msftpsvc." Groups can be nested inside each other, provided every open parenthesis is matched to a closing one.

Quantification, the third key part of regular expressions, gives us the power to specify how many times a certain character or sequence must occur to constitute a match. For example, the regular expression (domain\\)?user describes the strings "user" and "domain\user" but not "domain\domain\user", because the ? allows the group to appear at most once. Table 4-3 describes the quantifiers available for use.

Table 4-3. Common quantifiers for denoting quantity in regular expressions

     Quantifier    Matches the preceding expression...
     ----------    -----------------------------------
     *             Zero or more times
     +             One or more times
     ?             Once at most
     {n}           Exactly n times
     {n,}          At least n times
     {n,m}         At least n and at most m times
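The quantifiers can be tried out with a few quick tests. This is a minimal sketch using the -match operator covered later in this section; the variable name $userRegex and the sample strings are illustrative only. The patterns are wrapped in ^ and $ (start and end of string) so that the whole string has to satisfy the quantifier, and the last line of the first group shows what happens without them.

```powershell
# ? lets the "domain\" group appear at most once.
$userRegex = '^(domain\\)?user$'
"user" -match $userRegex                        # True
"domain\user" -match $userRegex                 # True
"domain\domain\user" -match $userRegex          # False

# Without ^ and $, the longer string still contains a match.
"domain\domain\user" -match '(domain\\)?user'   # True

# {n} and {n,m} count repetitions of the preceding expression.
"2005" -match '^\d{4}$'          # True
"205" -match '^\d{4}$'           # False
"ab" -match '^[a-z]{2,3}$'       # True
"abcd" -match '^[a-z]{2,3}$'     # False
```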


Regular expressions can also use a set of special characters as shorthand for common matches. These special characters, shown in Table 4-4, are different from those covered earlier in this chapter, and they apply only to regular expressions.

Table 4-4. Common special characters used in regular expressions

     Special character    Meaning
     -----------------    -------
     .                    Any single character
     ^                    Start of a string
     $                    End of a string
     \b                   Word boundary (the position between a word character and a non-word character, such as a space or newline)
     \d                   Digit (0-9)
     \n                   Newline
     \s                   Whitespace (space, tab, newline, etc.)
     \t                   Tab
     \w                   Word character (letters, digits, and underscore)


Many of these special characters have an inverse associated with their capital letter form. For example, \S matches anything that isn't whitespace, and \W matches anything that isn't a word character (a letter, digit, or underscore).
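A few quick tests make the special characters and their inverses more concrete. This is only a sketch with made-up sample strings, again using the -match operator described later in this section.

```powershell
# \d matches a digit anywhere in the string.
"build 42" -match '\d'            # True
"build" -match '\d'               # False

# \s matches whitespace; \S is its inverse.
"foo bar" -match '\s'             # True
"foo bar" -match '^\S+$'          # False (the space is not \S)

# \w matches letters, digits, and underscore; \W is its inverse.
"foo_bar1" -match '^\w+$'         # True
"foo-bar" -match '^\w+$'          # False (the hyphen is not \w)
"foo-bar" -match '\W'             # True

# \b matches only at the edge of a word.
"the cat sat" -match '\bcat\b'    # True
"concatenate" -match '\bcat\b'    # False
```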

To wrap up this short tour, Table 4-5 contains a few examples of simple regular expressions that we'll rely on in the examples that follow.

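Table 4-5. Simple regular expressions

     Type of information                          Regular expression
     -------------------                          ------------------
     Windows username                             (\w*\\)?\w*
     IP address                                   ^\d+\.\d+\.\d+\.\d+$
     Simple private IP addresses                  ^(10\.\d+\.\d+\.\d+|172\.[1-3][0-9]\.\d+\.\d+|192\.168\.\d+\.\d+)$
       (RFC 1918 ranges 10.x.x.x,
       172.16-31.x.x, 192.168.x.x)
     GUID (in the registry format of              ^{?[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}}?$
       {xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx})

The private-address expression is deliberately simple: [1-3][0-9] accepts a second octet anywhere from 10 through 39, which is wider than the 16 through 31 that RFC 1918 defines.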


With the basics in place, it's time to match some strings.

4.4.2. How Do I Do That?

Let's start by reviewing the -eq comparison operator. When used on strings, -eq does a case-insensitive test to see whether strings are identical (not close, but identical):

     MSH D:\MshScripts> "foo" -eq "foo"
     True
     MSH D:\MshScripts> "foo" -eq "bar"
     False
     MSH D:\MshScripts> "foo " -eq "foo"
     False
     MSH D:\MshScripts> "foo" -eq "FOO"
     True

The -like operator brings into play all of the wildcards we just saw. The behaviors of *, ?, [, and ] all follow the same rules as we saw when matching filenames:

     MSH D:\MshScripts> "foo" -like "foo"
     True
     MSH D:\MshScripts> "foobar" -like "foo*"
     True
     MSH D:\MshScripts> "foobar" -like "*ba?"
     True
     MSH D:\MshScripts> "gray" -like "gr[ae]y"
     True

-like has a related operator, -clike, that is used to perform case-sensitive matching. The two operators treat wildcards in almost exactly the same fashion; the only difference is that the -clike operator distinguishes between uppercase and lowercase letters:

     MSH D:\MshScripts> "foo" -like "FOO"
     True
     MSH D:\MshScripts> "foo" -clike "FOO"
     False

Both -like and -clike have inverse operators, -notlike and -cnotlike, that return true when no match is made and false when a match is present. The -notlike operator is a handy shortcut for -not ("a" -like "b"):

     MSH D:\MshScripts> "foo" -notlike "FOO"
     False
     MSH D:\MshScripts> "foo" -cnotlike "FOO"
     True

Wildcard comparisons are a useful tool and can be applied to all types of string-matching tasks. However, there are types of strings that cannot be captured in sufficient detail with wildcards alone. For example, it's possible to match one character (?) or any number of characters (*), yet there's no quantifier for expressing a count such as "between two and four." Likewise, a wildcard match is wide open: letters, numbers, and punctuation are all allowed. For some more specific matches, it's time to bring in the regular expressions.

MSH performs regular expression matching with the -match operator. As with -like, it, too, has related operators for case sensitivity (-cmatch) and negative matches (-notmatch and -cnotmatch).
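The related operators follow the same pattern as their -like counterparts. As a brief sketch, reusing the "foo" and "FOO" strings from the earlier -like examples:

```powershell
# -match ignores case by default; -cmatch does not.
"foo" -match "FOO"        # True
"foo" -cmatch "FOO"       # False

# The negative forms simply invert the result.
"foo" -notmatch "FOO"     # False
"foo" -cnotmatch "FOO"    # True
```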

Let's look at a few simple regular expression matches. Although we're looking at all of these examples as simple command-line Boolean tests, these ideas can easily be transferred to other places, such as the where-object cmdlet, taking wildcards to a whole new level:

     MSH C:\WINDOWS\system32> "ipv6.exe" -match ".*exe"
     True
     MSH D:\MshScripts> "ipv6.exe" -match ".*\d{1}.*exe"
     True
     MSH D:\MshScripts> "ipv6.exe" -match ".*\d{2}.*exe"
     False                # regex required two consecutive digits
     MSH D:\MshScripts> get-childitem | where-object { $_ -match ".*\d{2}.*exe" }

         Directory: FileSystem::C:\WINDOWS\System32

     Mode                LastWriteTime     Length Name
     ----                -------------     ------ ----
     -a---          8/4/2004   5:00 AM      47104 cmdl32.exe
     -a---          8/4/2004   5:00 AM      39936 cmmon32.exe
     -a---          8/4/2004   5:00 AM      45568 drwtsn32.exe
     -a---          8/4/2004   5:00 AM      45568 extrac32.exe
     -a---          8/4/2004   5:00 AM      92224 krnl386.exe
     -a---          8/4/2004   5:00 AM     123392 mplay32.exe
     -a---          8/4/2004   5:00 AM       3252 nw16.exe
     -a---          8/4/2004   5:00 AM      32768 odbcad32.exe
     -a---          8/4/2004   5:00 AM       3584 regedt32.exe
     -a---          8/4/2004   5:00 AM      11776 regsvr32.exe
     -a---          8/4/2004   5:00 AM      33280 rundll32.exe
     ...

It's worthwhile to compare the behavior of -like with -match to better understand their differences. Even the simplest cases turn up some surprises:

     MSH D:\MshScripts> "foobar" -like "foo"
     False
     MSH D:\MshScripts> "foobar" -match "foo"
     True

When used without any special characters, quantifiers, or alternates, regular expression matching is similar to wildcard matching with one key difference: if no wildcards are present in a -like match, the strings must be identical for a match to occur, whereas with a regular expression, it's sufficient for the string to simply contain the regular expression. When the entire string must conform to a pattern, keep this in mind and start the regular expression with a caret (^) and end it with a dollar sign ($). The following example shows the different outcomes that result when you try to match an invalid dotted IP address against the unanchored and anchored forms of the same regular expression:

     MSH D:\MshScripts> "1.2.3.4.5" -match "\d+\.\d+\.\d+\.\d+"
     True        # No!
     MSH D:\MshScripts> "1.2.3.4.5" -match "^\d+\.\d+\.\d+\.\d+$"
     False       # That's better

Let's take a look at a slightly more involved example. First, we'll use the regular expression for a GUID and verify that it's working correctly against a sample GUID:

     MSH D:\MshScripts> $guidRegex = "^{?[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}}?$"
     MSH D:\MshScripts> $myGuid = [System.Guid]::NewGuid( ).ToString( )
     MSH D:\MshScripts> $myGuid
     496a3bc7-861d-4176-9778-e01f266ba835
     MSH D:\MshScripts> $myGuid -match $guidRegex
     True

For one last example, let's turn our attention to IP addresses. To grab the current IP address, we'll again dip into the .NET Framework and then run the IP through a couple of regular expressions to confirm that it's both valid and non-private:

     MSH D:\MshScripts> function get-ipaddress {
     >>$hostname = [System.Net.Dns]::GetHostName( )
     >>$hosts = [System.Net.Dns]::GetHostByName($hostname)
     >>$hosts.AddressList[0].ToString( )
     >>}
     >>
     MSH D:\MshScripts> $ipRegex = "^\d+\.\d+\.\d+\.\d+$"
     MSH D:\MshScripts> $privateIpRegex = "^(10\.\d+\.\d+\.\d+|172\.[1-3][0-9]\.\d+\.\d+|192\.168\.\d+\.\d+)$"
     MSH D:\MshScripts> $myIP = get-ipaddress
     MSH D:\MshScripts> $myIP
     169.254.136.191
     MSH D:\MshScripts> $myIP -match $ipRegex
     True
     MSH D:\MshScripts> $myIP -notmatch $privateIpRegex
     True
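The simple private-address expression accepts any 172.x block whose second octet is two digits starting with 1, 2, or 3. If a stricter test is wanted, one possible refinement, offered here as a sketch rather than as part of the original example, uses alternation to pin the second octet to 16 through 31. The variable names $loosePrivateIpRegex and $strictPrivateIpRegex are hypothetical.

```powershell
# The simple pattern used in this section.
$loosePrivateIpRegex = '^(10\.\d+\.\d+\.\d+|172\.[1-3][0-9]\.\d+\.\d+|192\.168\.\d+\.\d+)$'

# A tighter variant: 1[6-9] covers 16-19, 2[0-9] covers 20-29, 3[01] covers 30-31.
$strictPrivateIpRegex = '^(10\.\d+\.\d+\.\d+|172\.(1[6-9]|2[0-9]|3[01])\.\d+\.\d+|192\.168\.\d+\.\d+)$'

"172.16.0.1" -match $strictPrivateIpRegex        # True
"172.31.255.254" -match $strictPrivateIpRegex    # True
"172.32.0.1" -match $strictPrivateIpRegex        # False
"172.32.0.1" -match $loosePrivateIpRegex         # True (the simple pattern over-matches)
"10.1.2.3" -match $strictPrivateIpRegex          # True
"192.168.1.1" -match $strictPrivateIpRegex       # True
"169.254.136.191" -match $strictPrivateIpRegex   # False
```

Neither pattern checks that each octet is 255 or less, so both are best paired with a general validity test.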

4.4.3. What About...

... Why is -like needed? Can't its behavior be achieved just by using the -match operator? While it's true that regular expressions can be used to get the same results as wildcard matches, there are good reasons to have both. If the -like wildcard syntax makes a comparison easier to read, it usually makes long-term maintenance of scripts easier as well.
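To make the trade-off concrete, here is a rough side-by-side of the earlier wildcard tests and regular expressions that behave the same way on these sample strings. The regular expressions need explicit anchors and .* to reproduce what the wildcard form says more briefly.

```powershell
# "Starts with foo"
"foobar" -like "foo*"          # True
"foobar" -match '^foo.*$'      # True
"xfoobar" -like "foo*"         # False
"xfoobar" -match '^foo.*$'     # False

# "Ends with ba plus one more character"
"foobar" -like "*ba?"          # True
"foobar" -match '^.*ba.$'      # True

# Character sets read the same in both syntaxes, apart from the anchors.
"gray" -like "gr[ae]y"         # True
"gray" -match '^gr[ae]y$'      # True
```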

... Does variable expansion work here? Absolutely. As we saw earlier, MSH exercises variable expansion on any strings it sees that are enclosed in double quotes. Make sure to use single or double quotes appropriately, depending on how you want MSH to handle your variables:

     MSH D:\MshScripts> $myVar = "Andy"
     MSH D:\MshScripts> "Hello, $myVar" -ilike "*andy*"
     True
     MSH D:\MshScripts> $myRegex = "\d{3}"
     MSH D:\MshScripts> "test133" -match "test$myRegex"
     True
     MSH D:\MshScripts> "test148" -match 'test$myRegex' # no expansion
     False

4.4.4. Where Can I Learn More?

We've only scratched the surface of regular expressions in this section. They stand as a very expressive tool for solving all types of text-matching scenarios, and they can be significantly more complex and powerful than the examples we've looked at here. The regular expression language covered here is precisely the same as the one offered by the .NET Framework. Information is available at http://msdn.microsoft.com/library/default.asp?url=/library/en-us/cpgenref/html/cpconregularexpressionslanguageelements.asp.

We've covered a number of distinct aspects of the MSH infrastructure in this chapter. We'll wrap up with a discussion of the error-handling mechanisms built into MSH.




Monad Jumpstart
Year: 2005
Pages: 117