Technique

PHP provides a set of functions for pattern matching with Perl-compatible regular expression syntax, via the Perl Compatible Regular Expressions (PCRE) library included with PHP.

<?php
// Non-whitespace characters, whitespace, non-whitespace characters
preg_match ('/(\S+)\s+(\S+)/', $line, $match);

// Match all occurrences of non-whitespace, whitespace, non-whitespace;
// the same as using the /g modifier in Perl
preg_match_all ('/(\S+)\s+(\S+)/', $file, $match);

// Split $line into an array of characters; PREG_SPLIT_NO_EMPTY drops the
// empty strings that would otherwise appear at each end of the result
$chars = preg_split ('//', $line, -1, PREG_SPLIT_NO_EMPTY);

// Remove all whitespace elements of $chars
$chars = preg_grep ('/^\S+$/', $chars);

// Quote unquoted regular expression meta-characters
$input = preg_quote ($input);
?>

Comments

The Perl Compatible Regular Expressions library was added in PHP version 3.0.9. When PCRE came out, it was like comfort food for me. (One of my favorite parts of Perl is its regular expression support; the other is the sort() function.) But the Perl-compatible regular expression library is not only comfort food for converted Perl programmers; it is also a much faster and more powerful regular expression library. (It still lacks some of the power that Perl contains, but that is discussed in the next chapter.)

Perl-compatible regular expressions are too large a topic for this book to cover. For a full discussion, I suggest O'Reilly's Mastering Regular Expressions. What we will do in this recipe is create an actual program in PHP using Perl-compatible regular expressions.

Mini-Problem

In the beginning, there was UNIX. UNIX begat the Internet, which begat the email system, which begat a program called uucp, an acronym for UNIX-to-UNIX copy. uucp used 7-bit characters, perfectly suited for its original purpose of email messages and Usenet news browsing, but woefully inadequate for handling binary files that used a full 8-bit character set.
To work around this, resourceful programmers created UUencoding, which encodes 8-bit data as 7-bit characters so that it can be sent over email. All modern UNIX systems come with the necessary encoding and decoding programs. But what if you don't have a UNIX decoder available?

Mini-Technique

You can use PHP to convert the document back to its original format. This script combines convert_uudecode() (available since PHP 5) with PCRE to parse a uuencoded file named on the command line. (PHP must be installed as a CGI or command-line binary.) Note that imap_8bit() is not suitable for this job: it converts an 8-bit string to quoted-printable rather than decoding uuencoded data.

<?php
// $argv[0] is the script name; the input file is the first argument
if (!isset ($argv[1])) {
    die ("You must specify an Infile");
}

$valid = false;
$fp = @fopen ($argv[1], "r") or die ("Cannot Open Uuencoded file");

// Find the header line: "begin <octal mode> <file name>"
while (($line = fgets ($fp, 1024)) !== false) {
    if (preg_match ('/^begin\s+([0-7]+)\s+(.+?)\s*$/', $line, $match)) {
        $filemode  = $match[1];
        $writefile = $match[2];
        $valid = true;
        break;
    }
}

if (!$valid) {
    die ('Not a uuencoded file');
}

$fpw = @fopen ($writefile, 'w') or die ("Cannot Open $writefile");

while (($line = fgets ($fp, 1024)) !== false) {
    // The "end" line terminates the encoded data, so test for it
    // before trying to decode the line
    if (preg_match ('/^end\s*$/', $line)) {
        break;
    }

    // convert_uudecode() returns false for a malformed line
    $decoded = @convert_uudecode ($line);
    if ($decoded === false) {
        die ("Invalid line");
    }

    fputs ($fpw, $decoded);
}

fclose ($fpw);
fclose ($fp);

// Apply the permissions from the header; the mode is an octal string
chmod ($writefile, octdec ($filemode));
?>
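
To see the header parsing and line-by-line decoding work without a uuencoded file on disk, here is a minimal sketch of an in-memory round trip. It assumes PHP 7.1 or later and uses PHP's built-in convert_uuencode() and convert_uudecode(). The function name uudecode_message(), the sample payload, and the file name sample.bin are hypothetical and exist only for this illustration.

```php
<?php
// Hypothetical helper (not part of the recipe): decodes a complete uuencoded
// message held in a string and returns [mode, file name, decoded data].
function uudecode_message(string $message): array
{
    $mode = null;
    $name = null;
    $data = '';
    $inBody = false;

    foreach (preg_split('/\r?\n/', $message) as $line) {
        if (!$inBody) {
            // Header: "begin <octal mode> <file name>"
            if (preg_match('/^begin\s+([0-7]+)\s+(.+?)\s*$/', $line, $match)) {
                $mode = $match[1];
                $name = $match[2];
                $inBody = true;
            }
            continue;
        }

        // The "end" line closes the message
        if (preg_match('/^end\s*$/', $line)) {
            return [$mode, $name, $data];
        }

        // convert_uudecode() returns false for a malformed line
        $decoded = @convert_uudecode($line);
        if ($decoded === false) {
            throw new RuntimeException("Invalid line: $line");
        }
        $data .= $decoded;
    }

    throw new RuntimeException('Not a complete uuencoded message');
}

// Build a sample message. convert_uuencode() produces only the body lines,
// so the "begin" and "end" lines are added by hand.
$original = "Binary-ish payload: \x00\x01\xFE\xFF, long enough to span more than one encoded line.";
$message  = "begin 644 sample.bin\n" . convert_uuencode($original) . "end\n";

[$mode, $name, $data] = uudecode_message($message);
printf("%s %s %s\n", $mode, $name, $data === $original ? 'round trip ok' : 'mismatch');
```

Checking for the end line before decoding matters because "end" is not itself valid uuencoded data. The sketch returns the file name instead of writing to it, which leaves the caller free to decide whether a name taken from an untrusted message is safe to use.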