A Regular Expression Rosetta Stone

Regular expressions are incredibly powerful, and their usefulness extends beyond just restricting input. They constitute a technology worth understanding for solving many complex data manipulation problems. I write many applications, mostly in Perl and C#, that use regular expressions to analyze log files for attack signatures and to analyze source code for security defects. Because subtle variations exist in regular expression syntax between programming languages and execution environments, the rest of this chapter outlines some of these variations. (Note that my intention is only to give you a number of regular expression quick references.)

Regular Expressions in Perl

Perl is recognized as a leader in regular expression support, in part because of its excellent string-handling and file-handling support. A regular expression that extracts the time from a string in Perl looks like this:

$_ = "We leave at 12:15pm for Mount Doom. ";
if (/.*(\d{2}:\d{2}[ap]m)/i) {
    print $1;
}

Note that the regular expression takes no arguments, because if no argument is provided, the $_ implicit variable is used. If the data is in a variable other than $_, you should use the following syntax:

$var =~ /expression/;

Regular Expressions in Managed Code

Most if not all applications written in C#, Managed C++, Microsoft Visual Basic .NET, ASP.NET, and so on have access to the .NET Framework and as such can use the System.Text.RegularExpressions namespace. I've already outlined its syntax earlier in this chapter. However, for completeness, following are C#, Visual Basic .NET, and Managed C++ examples of the time extraction code I showed earlier in Perl.

C# Example

// C# Example
String s = @"We leave at 12:15pm for Mount Doom.";
Regex r = new Regex(@".*(\d{2}:\d{2}[ap]m)",RegexOptions.IgnoreCase);
if (r.Match(s).Success)
    Console.Write(r.Match(s).Result("$1"));

Visual Basic .NET Example

' Visual Basic .NET example
Imports System.Text.RegularExpressions

Dim s As String
Dim r As Regex
s = "We leave at 12:15pm for Mount Doom."
r = New Regex(".*(\d{2}:\d{2}[ap]m)", RegexOptions.IgnoreCase)
If r.Match(s).Success Then
    Console.Write(r.Match(s).Result("$1"))
End If

Managed C++ Example

// Managed C++ version
#using <mscorlib.dll>
#include <tchar.h>
#using <system.dll>

using namespace System;
using namespace System::Text;
using namespace System::Text::RegularExpressions;

String *s = S"We leave at 12:15pm for Mount Doom.";
Regex *r = new Regex(".*(\\d{2}:\\d{2}[ap]m)",IgnoreCase);
if (r->Match(s)->Success)
    Console::WriteLine(r->Match(s)->Result(S"$1"));

Note that the same code applies to ASP.NET because ASP.NET is language-neutral.

Regular Expressions in Script

The base JavaScript 1.2 language supports regular expressions by using syntax similar to Perl. Netscape Navigator 4 and later and Microsoft Internet Explorer 4 and later also support regular expressions.

var r = /.*(\d{2}:\d{2}[ap]m)/;
var s = "We leave at 12:15pm for Mount Doom.";
if (s.match(r))
    alert(RegExp.$1);
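As a hedged alternative to reading the RegExp.$1 static property, the array returned by match can be inspected directly. The sketch below assumes a modern JavaScript engine (it uses const, which JavaScript 1.2 does not have); the helper name extractTime is hypothetical and not part of the original sample. It also adds the i flag so that the pattern is case-insensitive like the Perl and .NET versions.

```javascript
// Hypothetical helper: returns the first time found in text, or null.
function extractTime(text) {
  // The i flag mirrors Perl's /i and .NET's RegexOptions.IgnoreCase.
  const pattern = /.*(\d{2}:\d{2}[ap]m)/i;
  const m = text.match(pattern);
  // m is null when nothing matches; m[1] holds the first capture group.
  return m ? m[1] : null;
}

console.log(extractTime("We leave at 12:15pm for Mount Doom."));
console.log(extractTime("We leave at noon."));
```

Returning the captured text from a function avoids relying on shared state that any later match would overwrite.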

Regular expressions are also available to developers in Microsoft Visual Basic Scripting Edition (VBScript) version 5 via the RegExp object:

Set r = new RegExp
r.Pattern = ".*(\d{2}:\d{2}[ap]m)"
r.IgnoreCase = True
Set m = r.Execute("We leave at 12:15pm for Mount Doom.")
MsgBox m(0).SubMatches(0)

If you plan to use regular expressions in client code, use them only to validate client requests and save round-trips. Client-side checks are not a security technique, because an attacker can bypass them; the server must validate the data again.
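To illustrate the point, here is a minimal sketch of the same kind of check repeated on the server. It assumes a server-side JavaScript runtime such as Node.js; the function name isValidTime is hypothetical, and the pattern is anchored with ^ and $ on the assumption that the whole field should be a time value rather than merely contain one.

```javascript
// Hypothetical server-side validator; run it on every request,
// regardless of any checks the browser may already have performed.
function isValidTime(input) {
  // Reject non-strings first, then require the entire value to match.
  return typeof input === "string" && /^\d{2}:\d{2}[ap]m$/i.test(input);
}

console.log(isValidTime("12:15pm"));
console.log(isValidTime("12:15pm<script>"));
```

The client-side copy of such a check only spares well-behaved users a round-trip; the server-side copy is the one that decides whether the data is accepted.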

NOTE
Because ASP uses JScript and VBScript, you can access the regular expressions in these languages from within your Web pages.

Regular Expressions in C++

Now for the difficult language! Not that it is hard to write C++ code; rather, the language has limited class support for regular expressions. If you use the Standard Template Library (STL), an STL-aware class named Regex++ is available at http://www.boost.org. You can read a good article written by the Regex++ author at http://www.ddj.com/documents/s=1486/ddj0110a/0110a.htm.

Microsoft Visual C++, included with Microsoft Visual Studio .NET, includes a lightweight Active Template Library (ATL) regular expression parser template class, CAtlRegExp. Note that the regular expression syntax used by Regex++ and CAtlRegExp differs from the classic syntax: some of the less-used operators are missing, and some elements are different. The syntax for CAtlRegExp regular expressions is at http://msdn.microsoft.com/library/en-us/vclib/html/vclrfcatlregexp.asp.

The following is an example of using CAtlRegExp:

#include <AtlRX.h>

CAtlRegExp<> re;
re.Parse(".*{\\d\\d:\\d\\d[ap]m}",FALSE);
CAtlREMatchContext<> mc;
if (re.Match("We leave at 12:15pm for Mount Doom.", &mc)) {
    const CAtlREMatchContext<>::RECHAR* szStart = 0;
    const CAtlREMatchContext<>::RECHAR* szEnd = 0;
    mc.GetMatch(0,&szStart, &szEnd);
    ptrdiff_t nLength = szEnd - szStart;
    // The precision argument for %.*s must be an int.
    printf("%.*s",(int)nLength, szStart);
}



Writing Secure Code, Second Edition
ISBN: 0735617228
EAN: 2147483647
Year: 2001
Pages: 286
