5.2 Using Perl-Compatible Regular Expressions in PHP


You want to use Perl-compatible regular expressions within PHP, either because you like them more or because you need their added functionality and speed. (I normally use them because I like their syntax so much more.)

Technique

PHP provides a set of functions for pattern matching with Perl-compatible regular expression syntax, implemented by the Perl Compatible Regular Expressions library (PCRE) that is bundled with PHP.

    <?php
    // Sample input (illustrative values)
    $line  = "first second third";
    $file  = "apple pie\ncherry tart\n";
    $input = "1.5*2";

    // A run of non-whitespace characters, whitespace, then another run of
    // non-whitespace; the two runs are captured in $match[1] and $match[2]
    preg_match('/(\S+)\s+(\S+)/', $line, $match);

    // Match every occurrence of the same pattern in $file,
    // the equivalent of using the /g modifier in Perl
    preg_match_all('/(\S+)\s+(\S+)/', $file, $match);

    // Split $line into an array of characters
    // (PREG_SPLIT_NO_EMPTY drops the empty strings '//' yields at both ends)
    $chars = preg_split('//', $line, -1, PREG_SPLIT_NO_EMPTY);

    // Keep only the non-whitespace elements of $chars
    $chars = preg_grep('/^\S+$/', $chars);

    // Escape regular expression metacharacters (and the / delimiter)
    $input = preg_quote($input, '/');
    ?>
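The comment on preg_match_all() compares it to Perl's /g modifier, but the shape of its result array is PHP-specific. A small sketch, using a made-up two-line sample string, shows the default PREG_PATTERN_ORDER layout next to the PREG_SET_ORDER flag:

```php
<?php
// Sample text (illustrative only): two "word word" pairs on two lines
$text = "apple pie\ncherry tart\n";

// Default PREG_PATTERN_ORDER: $byPattern[n] lists capture group n
// across every match; the return value is the number of matches
$count = preg_match_all('/(\S+)\s+(\S+)/', $text, $byPattern);

// PREG_SET_ORDER: $bySet[i] holds the full match and the groups of match i
preg_match_all('/(\S+)\s+(\S+)/', $text, $bySet, PREG_SET_ORDER);

echo $count, "\n";                       // 2
echo implode(', ', $byPattern[1]), "\n"; // apple, cherry
echo implode(', ', $bySet[1]), "\n";     // cherry tart, cherry, tart
```

Pattern order is handy when you want one column (say, every first word); set order is handier when you process one match at a time.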

Comments

The Perl Compatible Regular Expressions library was added in PHP version 3.0.9. When PCRE came out, it was like comfort food for me. (One of my favorite parts of Perl is its regular expression support; the other is the sort() function.) But the PCRE library is more than comfort food for converted Perl programmers: it is also much faster and more powerful than PHP's POSIX-style ereg() functions. (It still lacks some of the power that Perl contains, but that is discussed in the next chapter.)
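As a concrete illustration of "more powerful", here is a minimal sketch of two PCRE features that POSIX extended regular expressions (the syntax behind PHP's older ereg() functions, deprecated in PHP 5.3 and removed in PHP 7) do not offer: lazy quantifiers and lookahead. The function name pcre_features_demo() and the sample strings are illustrative only.

```php
<?php
// Hypothetical demo function: returns the results of three matches
function pcre_features_demo()
{
    $html = '<b>one</b> and <b>two</b>';

    // Lazy quantifier: .*? stops at the FIRST </b>
    preg_match('/<b>(.*?)<\/b>/', $html, $lazy);

    // Greedy .* for comparison: runs on to the LAST </b>
    preg_match('/<b>(.*)<\/b>/', $html, $greedy);

    // Lookahead: letters immediately followed by a digit; the digit is
    // checked by (?=\d) but not included in the match
    preg_match_all('/[a-z]+(?=\d)/', 'abc1 def ghi2', $ahead);

    return array($lazy[1], $greedy[1], $ahead[0]);
}

list($lazy, $greedy, $ahead) = pcre_features_demo();
echo $lazy, "\n";                  // one
echo $greedy, "\n";                // one</b> and <b>two
echo implode(', ', $ahead), "\n";  // abc, ghi
```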

Perl-compatible regular expressions are too large a topic for this book to cover in full. For a complete discussion, I suggest O'Reilly's Mastering Regular Expressions by Jeffrey E. F. Friedl. Instead, this recipe builds an actual program in PHP using Perl-compatible regular expressions.

Mini-Problem

In the beginning, there was UNIX. UNIX begat the Internet, which begat the email system, which begat a program called uucp, an acronym for UNIX-to-UNIX copy. uucp used 7-bit characters, which were perfectly suited to its original purpose of carrying email messages and Usenet news, but woefully inadequate for binary files that use the full 8-bit character set. To work around this, resourceful programmers created uuencoding, which re-encodes 8-bit data using only printable 7-bit characters so that it can be sent through email. All modern UNIX systems come with the uuencode and uudecode utilities for encoding and decoding. But what if you don't have a UNIX decoder at hand?
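To make the 8-bit-to-7-bit idea concrete, here is a minimal sketch of how one line of uuencoded data is built: the first character stores the byte count, and each group of three bytes (24 bits) is split into four 6-bit values, each shifted into the printable range by adding 32. Most modern encoders, including PHP's, write the value 0 as a backquote instead of a space. The function name uuencode_line() is hypothetical; PHP 5 and later also provide a built-in convert_uuencode() that produces the same data lines, without the begin/end wrapper.

```php
<?php
// Hypothetical helper: encode up to 45 bytes as one uuencoded line
function uuencode_line($bytes)
{
    // Map a 6-bit value to printable ASCII; 0 becomes '`' rather than space
    $enc = function ($v) {
        return $v === 0 ? '`' : chr($v + 32);
    };

    $n = strlen($bytes);
    $line = $enc($n);  // first character carries the byte count

    // Pad with NUL bytes to a multiple of three so every group is complete
    $padded = str_pad($bytes, (int) ceil($n / 3) * 3, "\0");

    for ($i = 0; $i < strlen($padded); $i += 3) {
        $a = ord($padded[$i]);
        $b = ord($padded[$i + 1]);
        $c = ord($padded[$i + 2]);
        // 24 bits -> four 6-bit values
        $line .= $enc($a >> 2)
               . $enc((($a & 0x03) << 4) | ($b >> 4))
               . $enc((($b & 0x0F) << 2) | ($c >> 6))
               . $enc($c & 0x3F);
    }
    return $line;
}

echo uuencode_line("Cat"), "\n";  // #0V%T
```

Every output character falls between space (0x20) and backquote (0x60), so the result survives any 7-bit channel.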

Mini-Technique

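You can use PHP to convert the document back to its original format. This script uses PCRE functions to find the begin and end lines of a uuencoded message named on the command line, and decodes the lines in between. (PHP must be installed as a standalone CGI or command-line binary so that the script can be run from the shell with the input file as its argument.)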

    <?php
    // Decode one line of uuencoded data. The first character gives the
    // byte count; each following group of four characters carries three
    // bytes, six bits per character, offset from the space character (32).
    function uudecode_line($data)
    {
        $len = (ord($data[0]) - 32) & 0x3F;
        $out = '';
        for ($i = 1; strlen($out) < $len; $i += 4) {
            // Pad short groups, since some mailers strip trailing spaces
            $g = str_pad(substr($data, $i, 4), 4, ' ');
            $a = (ord($g[0]) - 32) & 0x3F;
            $b = (ord($g[1]) - 32) & 0x3F;
            $c = (ord($g[2]) - 32) & 0x3F;
            $d = (ord($g[3]) - 32) & 0x3F;
            $out .= chr((($a << 2) | ($b >> 4)) & 0xFF)
                  . chr((($b << 4) | ($c >> 2)) & 0xFF)
                  . chr((($c << 6) | $d) & 0xFF);
        }
        return substr($out, 0, $len);
    }

    // $argv[0] is the script name; the input file is the first argument
    if (!isset($argv[1])) {
        die("You must specify an Infile\n");
    }
    $fp = @fopen($argv[1], "r") or die("Cannot Open Uuencoded file\n");

    // Look for the header line: begin <octal mode> <file name>
    $valid = false;
    while (($line = fgets($fp, 1024)) !== false) {
        if (preg_match('/^begin\s+([0-7]{3,4})\s+(.+?)\s*$/', $line, $match)) {
            $filemode  = $match[1];
            $writefile = basename($match[2]);  // never write outside the cwd
            $valid = true;
            break;
        }
    }
    if (!$valid) {
        die("Not a uuencoded file\n");
    }

    $fpw = @fopen($writefile, 'wb') or die("Cannot Open $writefile\n");
    while (($line = fgets($fp, 1024)) !== false) {
        if (preg_match('/^end\s*$/', $line)) {
            break;
        }
        $data = rtrim($line, "\r\n");
        if ($data === '') {
            continue;
        }
        // Valid uuencoded characters run from space (0x20) to backquote (0x60)
        if (!preg_match('/^[\x20-\x60]+$/', $data)) {
            die("Invalid line\n");
        }
        fwrite($fpw, uudecode_line($data));
    }
    fclose($fpw);
    fclose($fp);
    chmod($writefile, octdec($filemode));
    ?>


PHP Developer's Cookbook
PHP Developers Cookbook (2nd Edition)
ISBN: 0672323257
Year: 2000
Pages: 351
