Recipe 21.5. Avoiding Regular Expressions

21.5.1. Problem

You want to improve script performance by optimizing string-matching operations.

21.5.2. Solution

Replace unnecessary regular expression calls with faster string and character type function alternatives.

21.5.3. Discussion

A common source of unnecessary computation is the use of regular expression functions when they are not needed. For example, suppose you're validating a form submission and want to make sure that the username contains only alphanumeric characters.

A common approach to this problem is a regular expression:

<?php
if (!preg_match('/^[a-z0-9]*$/i', $username)) {
    echo 'please enter a valid username.';
}
?>

The same test can be performed much faster with the ctype_alnum( ) function.

Using code-timing techniques covered in Recipe 21.1, let's compare the above test with ctype_alnum( ):

<?php
$username = 'foo411';

// Time the regular expression test (same pattern as above, including the $ anchor)
$start = microtime(true);
if (!preg_match('/^[a-z0-9]*$/i', $username)) {
    echo 'please enter a valid username';
}
$regextime = microtime(true) - $start;

// Time the ctype_alnum( ) test
$start = microtime(true);
if (!ctype_alnum($username)) {
    echo 'please enter a valid username';
}
$ctypetime = microtime(true) - $start;

echo "preg_match took:  $regextime seconds\n";
echo "ctype_alnum took: $ctypetime seconds\n";
?>

This will output results similar to:

preg_match took:  0.000163078308105 seconds
ctype_alnum took: 9.05990600586E-06 seconds

ctype_alnum( ) is considerably faster: 9.05990600586E-06 is scientific notation for roughly 0.00000906 seconds, which makes it about 18 times faster than the preg_match( ) call in this run, with the same result for this input.
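A single timed call is sensitive to noise such as one-time regex compilation or other activity on the machine, so the ratio will vary from run to run. The following is a minimal sketch of the same comparison averaged over many iterations. The average_time( ) helper and the iteration count are assumptions made for illustration and are not part of the recipe; the closure call adds the same small overhead to both measurements.

```php
<?php
// Hypothetical helper (not from the recipe): average seconds per call
// of $test over $iterations runs.
function average_time(callable $test, $iterations = 100000) {
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $test();
    }
    return (microtime(true) - $start) / $iterations;
}

$username = 'foo411';

$regextime = average_time(function () use ($username) {
    return preg_match('/^[a-z0-9]*$/i', $username);
});

$ctypetime = average_time(function () use ($username) {
    return ctype_alnum($username);
});

printf("preg_match average:  %.9f seconds\n", $regextime);
printf("ctype_alnum average: %.9f seconds\n", $ctypetime);
```

The absolute numbers depend on the PHP version and hardware, so treat the ratio between the two averages as the useful figure.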

When applied to a complex application, replacing unnecessary regular expressions with equivalent alternatives can add up to a significant performance gain.
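Before swapping a regular expression for a ctype or string function, it is worth confirming that the two tests really agree on edge cases. The sketch below compares ctype_alnum( ) against two patterns over a handful of sample inputs. The disagreements( ) helper, the sample list, and the tightened pattern are assumptions chosen for illustration. Two details matter here: `*` accepts an empty string while ctype_alnum( ) does not, and `$` also matches just before a trailing newline unless the D modifier is used.

```php
<?php
// Hypothetical helper (not from the recipe): return the samples on which
// the pattern and ctype_alnum() give different answers.
function disagreements($pattern, array $samples) {
    $found = array();
    foreach ($samples as $sample) {
        $regex = preg_match($pattern, $sample) === 1;
        if ($regex !== ctype_alnum($sample)) {
            $found[] = $sample;
        }
    }
    return $found;
}

// Illustrative inputs, including an empty string and a trailing newline.
$samples = array('foo411', 'FOO', 'foo 411', 'foo_411', '', "foo411\n");

// The pattern used in the recipe.
$loose = disagreements('/^[a-z0-9]*$/i', $samples);

// A tightened variant: + requires at least one character,
// D makes $ match only at the very end of the string.
$strict = disagreements('/^[a-z0-9]+$/iD', $samples);

echo "recipe pattern differs on:    ", var_export($loose, true), "\n";
echo "tightened pattern differs on: ", var_export($strict, true), "\n";
```

For these samples the recipe's pattern differs from ctype_alnum( ) only on the empty string and the newline-terminated string, which ctype_alnum( ) rejects.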

A good litmus test for deciding whether you need a regular expression is whether the match you're performing can be explained in a brief sentence. Granted, some matches, such as "string is a valid email address," cannot be adequately verified without a complex regular expression. However, "check if string A contains string B" can be tested several ways, but is ultimately a very simple test that does not require regular expressions:

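<?php
$haystack = 'The quick brown fox jumps over the lazy dog';
$needle = 'lazy dog';

// slowest: ereg( ) was deprecated in PHP 5.3 and removed in PHP 7,
// so it is left commented out here
// if (ereg($needle, $haystack)) echo 'match!';

// slow: preg_quote( ) escapes any regex metacharacters in $needle
if (preg_match('/' . preg_quote($needle, '/') . '/', $haystack)) echo 'match!';

// fast
if (strstr($haystack, $needle)) echo 'match!';

// fastest
if (strpos($haystack, $needle) !== false) echo 'match!';
?>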
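The strict `!== false` comparison in the strpos( ) test matters because strpos( ) returns the offset of the match, and 0 is a valid offset. The following minimal sketch shows the pitfall and a case-insensitive variant using stripos( ). The variable names are illustrative only.

```php
<?php
$haystack = 'The quick brown fox jumps over the lazy dog';

// strpos() returns the offset of the first match; here the match
// starts at the very beginning of the string, so the offset is 0.
$position = strpos($haystack, 'The');

// A loose truthiness check misreads offset 0 as "no match".
$looseCheck = (bool) $position;

// The strict comparison distinguishes offset 0 from false.
$strictCheck = ($position !== false);

// Case-insensitive "contains" without a regular expression.
$caseInsensitive = (stripos($haystack, 'LAZY DOG') !== false);

var_dump($position, $looseCheck, $strictCheck, $caseInsensitive);
```

Here the loose check reports no match even though the needle is present, while the strict check and the stripos( ) test both report a match.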

There is certainly a benefit to double-checking the ctype and string functions before committing to a regular expression, particularly if you're working on a section of code that will loop repeatedly.

21.5.4. See Also

The PHP manual's documentation on ctype functions, on string functions, and on regular expression functions.

