Recipe 1.18 Escaping Characters

1.18.1 Problem

You need to output a string with certain characters (quotes, commas, etc.) escaped. For instance, you're producing a format string for sprintf and want to convert literal % signs into %%.

1.18.2 Solution

Use a substitution to backslash or double each character to be escaped:

# backslash
$var =~ s/([CHARLIST])/\\$1/g;

# double
$var =~ s/([CHARLIST])/$1$1/g;
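The two substitutions above can be wrapped in a small helper. The sketch below is only an illustration: escape_chars and its 'double'/'backslash' mode argument are hypothetical names, not part of Perl. It also assumes the character list arrives at run time in a variable. For that reason it puts \Q...\E (the quotemeta escape covered at the end of this recipe) inside the character class, so that characters like ], ^, -, and \ in the list are not read as class syntax.

```perl
use strict;
use warnings;

# Hypothetical helper: escape every character listed in $chars within $text,
# either by prefixing a backslash or (mode 'double') by doubling it.
sub escape_chars {
    my ($text, $chars, $mode) = @_;
    die "escape_chars: empty character list\n" unless length $chars;

    # \Q...\E quotemeta's the list, so ], ^, - and \ are matched literally
    # inside the character class instead of being treated as class syntax.
    if ($mode eq 'double') {
        $text =~ s/([\Q$chars\E])/$1$1/g;
    } else {
        $text =~ s/([\Q$chars\E])/\\$1/g;
    }
    return $text;
}

print escape_chars(q(say "hi" to Bob's dog), q('"), 'backslash'), "\n";
# prints: say \"hi\" to Bob\'s dog
print escape_chars("100% of 50%", '%', 'double'), "\n";
# prints: 100%% of 50%%
```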

1.18.3 Discussion

$var is the variable to be altered. The CHARLIST is a list of characters to escape and can contain backslash escapes like \t and \n. If you just have one character to escape, omit the brackets (and the capturing parentheses, since you can write that character directly in the replacement):

$string =~ s/%/%%/g;
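For the sprintf case from the Problem section, here's a minimal sketch, assuming the text you want to embed in the format (the hypothetical $label) comes from somewhere you don't control. Doubling its % signs first means sprintf treats each one as a literal percent sign rather than as the start of a conversion.

```perl
use strict;
use warnings;

my $label = "100% done";    # hypothetical text that contains a literal %

# Double every % so sprintf reads each one as a literal percent sign.
(my $safe = $label) =~ s/%/%%/g;

my $format = "$safe: %d items\n";    # "100%% done: %d items\n"
my $line   = sprintf($format, 3);
print $line;                         # 100% done: 3 items
```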

The following code lets you do escaping when preparing strings to submit to the shell. (In practice, you would need to escape more than just ' and " to make any arbitrary string safe for the shell. Getting the list of characters right is so hard, and the risks if you get it wrong are so great, that you're better off using the list form of system and exec to run programs, shown in Recipe 16.2. They avoid the shell altogether.)

$string = q(Mom said, "Don't do that.");
$string =~ s/(['"])/\\$1/g;

We had to use two backslashes in the replacement because the replacement section of a substitution is read as a double-quoted string, and to get one backslash, you need to write two. Here's a similar example for VMS DCL, where you need to double every quote to get one through:

$string = q(Mom said, "Don't do that.");
$string =~ s/(['"])/$1$1/g;

Microsoft command interpreters are harder to work with. In Windows, COMMAND.COM recognizes double quotes but not single ones, disregards backquotes for running commands, and requires a backslash to make a double quote into a literal. Any of the many free or commercial Unix-like shell environments available for Windows will work just fine, though.
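Following the advice earlier in this discussion, here's a minimal sketch of sidestepping the shell altogether. It uses the list form of a piped open (the same idea as the list form of system described in Recipe 16.2), so each argument reaches the program exactly as written and nothing needs escaping. The running perl ($^X) stands in for whatever program you actually want to run; that choice is an assumption made only to keep the example portable. List-form piped opens may not be available on every platform, notably older Windows builds of perl.

```perl
use strict;
use warnings;

# Text with both kinds of quotes, as in the shell example above.
my $string = q(Mom said, "Don't do that.");

# List form: every element is handed to the program as a separate argument,
# with no shell in between, so no quoting or escaping is needed.
# $^X (the running perl) stands in for the program you really want to run;
# '--' ends perl's own switch processing so $string is always an argument.
my @cmd = ($^X, '-e', 'print $ARGV[0]', '--', $string);

open(my $fh, '-|', @cmd) or die "Can't run $cmd[0]: $!";
my $output = do { local $/; <$fh> };
close($fh) or die "$cmd[0] exited with status $?";

print "$output\n";    # Mom said, "Don't do that."
```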

Because we're using character classes in the regular expressions, we can use - to define a range and ^ at the start to negate. This escapes all characters that aren't in the range A through Z.

$string =~ s/([^A-Z])/\\$1/g;

In practice, you wouldn't want to do that, since it would, for example, turn a lowercase "a" into "\a", which a double-quoted string interprets as the ASCII BEL character. (Usually when you mean non-alphabetic characters, \PL works better.)
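To see the difference, here's a small comparison on a hypothetical input string: the [^A-Z] class escapes the lowercase letters too, while \PL leaves every letter alone and escapes only the punctuation and the space.

```perl
use strict;
use warnings;

my $text = "Hi, Bob!";    # hypothetical input

# Escape everything outside A-Z: lowercase letters get escaped as well.
(my $range = $text) =~ s/([^A-Z])/\\$1/g;

# Escape everything that is not a letter (\PL: not in Unicode property L).
(my $letters = $text) =~ s/(\PL)/\\$1/g;

print "$range\n";      # H\i\,\ B\o\b\!
print "$letters\n";    # Hi\,\ Bob\!
```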

If you want to escape all non-word characters, use the \Q and \E string metacharacters or the quotemeta function. For example, these are equivalent:

$string = "this \Qis a test!\E";
$string = "this is\\ a\\ test\\!";
$string = "this " . quotemeta("is a test!");
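As a quick check, the sketch below builds the three strings and prints them. It then shows the most common reason to reach for \Q in practice, assuming you have some literal text (the hypothetical $needle) that happens to contain regex metacharacters and you want to search for it as-is.

```perl
use strict;
use warnings;

# The three equivalent forms from the text.
my @forms = (
    "this \Qis a test!\E",
    "this is\\ a\\ test\\!",
    "this " . quotemeta("is a test!"),
);
print "$_\n" for @forms;    # each prints: this is\ a\ test\!

# Searching for literal text that contains regex metacharacters.
my $needle   = "1+1=2?";            # hypothetical literal text
my $haystack = "Is 1+1=2? Yes.";
if ($haystack =~ /\Q$needle\E/) {   # the pattern becomes 1\+1\=2\?
    print "found the literal text\n";
}
```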

1.18.4 See Also

The s/// operator in perlre(1) and perlop(1) and Chapter 5 of Programming Perl; the quotemeta function in perlfunc(1) and Chapter 29 of Programming Perl; the discussion of HTML escaping in Recipe 19.1; Recipe 19.5 for how to avoid having to escape strings to give the shell.



Perl Cookbook
Perl Cookbook, Second Edition
ISBN: 0596003137
Year: 2003
Pages: 501
