7.7. Perl Efficiency Issues
For the most part, efficiency with Perl regular expressions is achieved in the same way as with any tool that uses a Traditional NFA. The techniques discussed in Chapter 6 (the internal optimizations, the unrolling methods, the "Think" section) all apply to Perl.
There are, of course, Perl-specific issues as well, and this section looks at them in turn.
7.7.1. "There's More Than One Way to Do It"
There are often many ways to go about solving any particular problem, so there's no substitute for really knowing all that Perl has to offer when balancing efficiency and readability. Let's look at the simple problem of padding an IP address like '18.181.0.24' such that each of the four parts becomes exactly three digits: '018.181.000.024'. One simple and readable solution is:
$ip = sprintf("%03d.%03d.%03d.%03d", split(/\./, $ip));
This is a fine solution, but there are certainly other ways to do the job. In the interest of comparison, Table 7-6 examines various ways to achieve the same goal, and their relative efficiency (they're listed from the most efficient to the least). This example's goal is simple and not very interesting in and of itself, yet it represents a common text-handling task, so I encourage you to spend some time understanding the various approaches. You may even see some Perl techniques that are new to you.
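To give a feel for the kinds of alternatives being compared, here is a minimal sketch of three ways to do the padding. The subroutine names are hypothetical, and the variants are illustrations only; they are not presented as the numbered entries of Table 7-6.

```perl
use strict;
use warnings;

# Variant A: split on literal dots, then format each part.
sub pad_split {
    my ($ip) = @_;
    return sprintf("%03d.%03d.%03d.%03d", split(/\./, $ip));
}

# Variant B: a list-context m/.../g returns the four runs of digits.
sub pad_match {
    my ($ip) = @_;
    return sprintf("%03d.%03d.%03d.%03d", $ip =~ m/\d+/g);
}

# Variant C: rewrite each run of digits in place, using /e to call sprintf.
sub pad_subst {
    my ($ip) = @_;
    $ip =~ s/(\d+)/sprintf("%03d", $1)/eg;
    return $ip;
}

# Each variant prints 018.181.000.024 for this well-formed input.
for my $pad (\&pad_split, \&pad_match, \&pad_subst) {
    print $pad->('18.181.0.24'), "\n";
}
```

All three agree on a well-formed address; where they differ is in readability and in how much work Perl must do behind the scenes.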
Each approach produces the same result when given a correct IP address, but fails in different ways if given something else. If there is any chance that the data will be malformed, you'll need more care than any of these solutions provide. That aside, the practical differences lie in efficiency and readability. As for readability, #1 and #13 seem the most straightforward (although it's interesting to see the wide gap in efficiency). Also straightforward are #3 and #4 (similar to #1) and #8 (similar to #13). The rest all suffer from varying degrees of crypticness.
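If malformed data is a real possibility, one option is to check the overall shape of the input before padding it. The following is a sketch under that assumption; `pad_ip_checked` is a hypothetical helper, and it verifies only that there are four dot-separated groups of one to three digits, not that each group is 255 or less.

```perl
use strict;
use warnings;

# Hypothetical helper: pad only when the input looks like four dotted
# groups of one to three digits; otherwise return undef.
sub pad_ip_checked {
    my ($ip) = @_;
    return undef unless defined($ip)
        && $ip =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/;
    return sprintf("%03d.%03d.%03d.%03d", $1, $2, $3, $4);
}

for my $input ('18.181.0.24', '18.181.0', 'not an address') {
    my $padded = pad_ip_checked($input);
    print "$input => ", (defined($padded) ? $padded : '(rejected)'), "\n";
}
```

The extra match costs some time, of course, so whether such a check is worthwhile depends on how much you trust the source of the data.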
So, what about efficiency? Why are some less efficient than others? It's the interactions among how an NFA works (Chapter 4), Perl's many regex optimizations (Chapter 6), and the speed of other Perl constructs (such as sprintf, and the mechanics of the substitution operator). The substitution operator's /e modifier, while indispensable at times, does seem to be mostly at the bottom of the list.
It's interesting to compare two pairs, #3/#4 and #8/#14. The two regexes of each pair differ only in their use of parentheses: the one without the parentheses is just a bit faster than the one with. But #8's use of $& as a way to avoid parentheses comes at a high cost not shown by these benchmarks (see page 355).
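A comparison of this kind can be tried with Perl's standard Benchmark module. What follows is a sketch only: the two subroutines (hypothetical names) differ solely in whether the regex has capturing parentheses, and the timings you see will depend on your Perl version and machine.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $sample = '18.181.0.24';

# Without parentheses, a list-context m/.../g returns each whole match.
sub pad_no_parens {
    my ($ip) = @_;
    return sprintf("%03d.%03d.%03d.%03d", $ip =~ m/\d+/g);
}

# With parentheses, it returns the captured text instead (the same digits here).
sub pad_with_parens {
    my ($ip) = @_;
    return sprintf("%03d.%03d.%03d.%03d", $ip =~ m/(\d+)/g);
}

# Run each candidate for at least one CPU second and print a comparison chart.
cmpthese(-1, {
    no_parens   => sub { my $r = pad_no_parens($sample) },
    with_parens => sub { my $r = pad_with_parens($sample) },
});
```

A negative count tells cmpthese to run each test for at least that many CPU seconds rather than for a fixed number of iterations, which tends to give steadier numbers for very fast code.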