PHP exploded in popularity during the early phases of the late-90s web boom and remains wildly popular today. One reason for this popularity is that even nonengineers can start using its basic features with very little preparation. Yet, despite this approachability, PHP also provides a vast cornucopia of advanced features and functions sure to please the seasoned engineer. PHP supports regular expressions, of course, and does so with no fewer than three separate, unrelated regex engines.
The three regex engines in PHP are the "preg," "ereg," and "mb_ereg" engines. This book covers the preg suite of functions. It's backed by an NFA engine that is generally superior, in both features and speed, to the other two. ("preg" is normally pronounced "p-reg.")
Reliance on Early Chapters

Before looking at what's in this chapter, it's important to emphasize that it relies heavily on the base material in Chapters 1 through 6. Readers interested only in PHP may be inclined to start their reading with this chapter, but I want to encourage them not to miss the benefits of the preface (in particular, the typographical conventions) and the earlier chapters: Chapters 1, 2, and 3 introduce basic concepts, features, and techniques involved with regular expressions, while Chapters 4, 5, and 6 offer important keys to regex understanding that directly apply to PHP's preg engine. Among the important concepts covered in earlier chapters are the base mechanics of how an NFA regex engine goes about attempting a match, greediness, backtracking, and efficiency concerns.
Along those lines, let me emphasize that despite convenient tables such as the one in this chapter on page 441, or, say, ones in earlier chapters such as those on pages 114 and 123, this book's foremost intention is not to be a reference, but a detailed instruction on how to master regular expressions.
This chapter starts with a few words on the history of the preg engine, followed by a look at the regex flavor it provides. Later sections cover in detail the preg function interface, followed by preg-specific efficiency concerns, and finally, extended examples.
preg Background and History

The "preg" name comes from the prefix used with all of the interface function names, and stands for "Perl Regular Expressions." This engine was added by Andrei Zmievski, who was frustrated with the limitations of the then-current standard ereg suite. ("ereg" stands for "extended regular expressions," a POSIX-compliant package that is "extended" compared to the most simple regex flavors, but is considered fairly minimalistic by today's standards.)
Andrei created the preg suite by writing an interface to PCRE ("Perl Compatible Regular Expressions"), an excellent NFA-based regular-expression library that closely mimics the regular-expression syntax and semantics of Perl, and provides exactly the power Andrei sought.
Before finding PCRE, Andrei had first looked at the Perl source code to see whether it might be borrowed for use in PHP. He was undoubtedly not the first to examine Perl's regex source code, nor the first to come to the quick realization that it is not for the faint of heart. As powerful and fast as Perl regexes are for the user, the source code itself had been worked and reworked by many people over the years and had become something rather beyond human understanding.
Luckily, Philip Hazel at the University of Cambridge in England had been befuddled by Perl's regex source code as well, so to fulfil his own needs, he created the PCRE library (introduced on page 91). Philip had the luxury of starting from scratch with a known semantics to mimic, and so it was with great relief that several years later Andrei found a well-written, well-documented, high-performance library he could tie in to PHP.
Following Perl's changes over the years, PCRE has itself evolved, and with it, PHP. This book covers PHP Versions 4.4.3 and 5.1.4, both of which incorporate PCRE Version 6.6.
In case you are not familiar with PHP's version-numbering scheme, note that both the 4.x and 5.x tracks are maintained in parallel, with the 5.x versions being a much-expanded rewrite. Because both are maintained and released independently, it's possible for a 5.x version to contain an older version of PCRE than a more-modern 4.x version.