10.6. PHP Efficiency Issues

PHP's preg routines use PCRE, an optimized NFA regular-expression engine, so many of the techniques discussed in Chapters 4 through 6 apply directly. This includes benchmarking critical sections of code to understand practically, and not just theoretically, what is fast and what is not. Chapter 6 shows an example of benchmarking in PHP (☞ 234).

For particularly time-critical code, remember that a callback is generally faster than using the e pattern modifier (☞ 465), and that named capture with very long strings can result in a lot of extra data copying.

Regular expressions are compiled as they're encountered at runtime, but PHP has a huge 4,096-entry cache (☞ 242), so in practice, a particular pattern string is compiled only the first time it is encountered.

The S pattern modifier deserves special coverage: it "studies" a regex to try to achieve a faster match. (This is unrelated, by the way, to Perl's study function, which works with target text rather than a regular expression ☞ 359.)

10.6.1. The S Pattern Modifier: "Study"

Using the S pattern modifier instructs the preg engine to spend a little extra time studying the regular expression prior to its application, with the hope that the extra time spent increases match speed enough to justify it. It may well be that no extra speed is achieved by doing this, but in some situations the speedup is measured in orders of magnitude.
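As a minimal sketch of the syntax, the S modifier is written after the closing delimiter like any other pattern modifier. The sample text and the pattern below are hypothetical, chosen only for illustration. Studying may change how quickly a match is found, but it should not change what is matched.

```php
<?php
// Hypothetical sample text; any string would do.
$text = 'Two cats, one dog, and three birds.';

// Same expression twice: once plain, once with the S ("study") modifier
// placed after the closing delimiter.
$plain   = '/(?:cat|dog|bird)s?/';
$studied = '/(?:cat|dog|bird)s?/S';

preg_match_all($plain, $text, $without);
preg_match_all($studied, $text, $with);

// Both calls find the same matches; only the matching speed can differ.
print_r($with[0]);
var_dump($without[0] === $with[0]);
```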
Currently, the situations where study can and can't help are fairly well defined: it enhances what Chapter 6 calls the initial class discrimination optimization (☞ 247).

I'll start by noting that unless you intend to apply a regex to a lot of text, there's probably not a lot of time to save in the first place. You need to be concerned with the S pattern modifier only when applying the same regex to large chunks of text, or to many small chunks.

10.6.1.1. Standard optimizations, without the S pattern modifier

Consider a simple expression such as <(\w+). Due to the nature of this regex, we know that every match must begin with the '<' character. A regex engine can (and in the preg suite's case, does) take advantage of that by presearching the target string for '<' and applying the full regular expression at those locations only (since a match must begin with '<', applying it starting at any other character is pointless).

This simple presearch can be much faster than a full regex application, and therein lies the optimization. In particular, the less frequently the character in question appears in the target text, the greater the optimization. Also, the more work a regex engine must do to detect a first-character failure, the greater the benefit of the optimization. This optimization helps <i>|</i>|<b>|</b> more than <(\w+) because in the first case, the regex engine would otherwise have to attempt four different alternatives before moving on to the next attempt. That's a lot of work to avoid.

10.6.1.2. Enhancing the optimization with the S pattern modifier

The preg engine is smart enough to apply this optimization to most expressions that have only a single character that must start any match, as in the previous examples. However, the S pattern modifier tells the engine to preanalyze the expression to enable this optimization for expressions whose possible matches have multiple starting characters.
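The presearch idea can be imitated in ordinary PHP code to make it concrete. The sketch below is a userland illustration only and is not how PCRE is implemented internally. The function name presearchMatchAll and its parameters are hypothetical. The set of possible starting characters, which the engine would work out for itself, is supplied by hand here. With a one-character set it mirrors the standard optimization; with a larger set it mirrors the case that studying is meant to enable.

```php
<?php
/**
 * Userland imitation of a first-character presearch (illustration only).
 *
 * $regexBody  : the expression without delimiters (must not contain '~')
 * $startChars : hand-supplied characters that can begin a match
 * $text       : the target text
 */
function presearchMatchAll(string $regexBody, string $startChars, string $text): array
{
    // The A modifier anchors each attempt at the offset passed to preg_match().
    $anchored = '~' . $regexBody . '~A';
    $matches  = [];
    $pos      = 0;
    $len      = strlen($text);

    while ($pos < $len) {
        // Cheap scan: skip every character that cannot start a match.
        $pos += strcspn($text, $startChars, $pos);
        if ($pos >= $len) {
            break;
        }
        // Apply the full regex only at a candidate position.
        if (preg_match($anchored, $text, $m, 0, $pos) === 1 && $m[0] !== '') {
            $matches[] = $m[0];
            $pos += strlen($m[0]);
        } else {
            $pos++; // false alarm: move past this candidate
        }
    }
    return $matches;
}

// Single starting character, as in the standard optimization.
$html = 'a <b>bold</b> and <i>italic</i> word';
$tags = presearchMatchAll('<i>|</i>|<b>|</b>', '<', $html);

// Several possible starting characters, the case studying is meant to cover.
$pets = presearchMatchAll('cat|dog|bird', 'cdb', 'a cat, a bad dog, a bird');

print_r($tags);
print_r($pets);
```

The fewer candidate positions the cheap scan finds, the fewer times the full expression has to be applied, which is the source of the savings described above.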
Here are several sample expressions, some of which we've already seen in this chapter, that require the S pattern modifier to be optimized in this way:
10.6.1.3. When the S pattern modifier can't help

It's instructive to look at the type of expressions that don't benefit from the S pattern modifier:
10.6.1.4. Suggested use

It doesn't take long for the preg engine to do the extra analysis invoked by the S pattern modifier, so if you'll be applying a regex to relatively large chunks of text, it doesn't hurt to use it. If you think there's any chance it might apply, the potential benefit makes it worthwhile.
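Whether the modifier pays off for a given expression and data set is best settled by measurement. The following is a rough benchmarking sketch; the helper name timePattern, the test text, and the pass count are all assumptions made for illustration. Timings vary by machine and PHP build, and on recent PHP versions built on PCRE2 the modifier may be accepted but make no measurable difference, so treat the numbers as indicative only.

```php
<?php
/**
 * Time repeated preg_match_all() calls for one pattern (hypothetical helper).
 * Returns [elapsed seconds, matches found in one pass].
 */
function timePattern(string $pattern, string $text, int $passes): array
{
    $count = 0;
    $start = microtime(true);
    for ($i = 0; $i < $passes; $i++) {
        $count = preg_match_all($pattern, $text);
    }
    return [microtime(true) - $start, $count];
}

// Hypothetical test data: a large chunk of text with scattered matches.
$text = str_repeat('The quick brown fox jumps over the lazy dog. ', 5000);

// An expression whose matches can begin with several different characters.
[$plainTime, $plainCount]     = timePattern('/fox|dog|quick/', $text, 10);
[$studiedTime, $studiedCount] = timePattern('/fox|dog|quick/S', $text, 10);

printf("without S: %.4fs (%d matches)\n", $plainTime, $plainCount);
printf("with S:    %.4fs (%d matches)\n", $studiedTime, $studiedCount);
```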