Item 53: Use pack and unpack for data munging.
Perl's built-in pack and unpack operators are two of the bigger, sharper blades on the "Swiss Army Chainsaw." [1] Perhaps they were originally intended as a ho-hum means of translating binary data to and from Perl data types like strings and integers, but pack and unpack can be put to more interesting and offbeat uses.
The pack operator works more or less like sprintf. It takes a format string followed by a list of values to be formatted, and returns a string:
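A minimal sketch of such a call, with a format and values chosen purely for illustration: a space-padded string (A4), an unsigned byte (C), and a 16-bit big-endian integer (n).

```perl
use strict;
use warnings;

# Pack three values: a 4-byte space-padded string, one unsigned byte,
# and one 16-bit integer in big-endian ("network") order.
my $packed = pack("A4 C n", "ab", 65, 258);

# Show the result as hex bytes so the layout is visible.
my @bytes = map { sprintf("%02x", ord($_)) } split //, $packed;
printf "%d bytes: %s\n", length($packed), join(" ", @bytes);
# prints "7 bytes: 61 62 20 20 41 01 02"
```

The string "ab" is padded to four bytes with spaces, 65 becomes the single byte "A", and 258 becomes the two bytes 0x01 0x02.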
The unpack operator works the other way:
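As an illustrative sketch, the same kind of format string can be used to pull values back out of a binary string. The 7-byte input here is written out literally and is only an example.

```perl
use strict;
use warnings;

# A 7-byte string: "ab" padded to 4 bytes, the byte "A", then 0x0102.
my $packed = "ab  A\x01\x02";

# unpack takes the format and the string, and returns a list of values.
my ($str, $byte, $short) = unpack("A4 C n", $packed);

print "$str $byte $short\n";   # prints "ab 65 258"
```

Note that the A specifier strips trailing spaces when unpacking, so the padded field comes back as "ab".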
The pack format string is a list of single-character specifiers, each giving the type of data to be packed or unpacked. Here is the current list of specifiers:
Each specifier may be followed by a repeat count indicating how many values from the list to format. The repeat counts for the string specifiers (A, a, B, b, H, and h) are special: they indicate how many bytes/bits/nybbles to add to the output string. An asterisk used as a repeat count means to use the specifier preceding the asterisk for all the remaining items.
The unpack operator can also compute checksums. Just precede a specifier with a percent sign and a number indicating how many bits of checksum are desired. The extracted items are then checksummed together into a single item:
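The following sketch illustrates repeat counts, the asterisk, and checksums side by side. The values are arbitrary examples, and the variable names are made up for this illustration.

```perl
use strict;
use warnings;

# A count on a numeric specifier consumes that many list items.
my $three = pack("C3", 1, 2, 3);        # "\x01\x02\x03"

# A count on a string specifier sets the field width instead.
my $padded = pack("A5", "hi");          # "hi   " (space padded)
my $nulls  = pack("a5", "hi");          # "hi\0\0\0" (null padded)
my $hex    = pack("H4", "4a4b");        # "JK": 4 nybbles make 2 bytes

# An asterisk applies the specifier to all remaining items.
my $all = pack("C*", 72, 105, 33);      # "Hi!"

# %16 requests a 16-bit checksum: the byte values are summed
# and the total is reduced modulo 2**16.
my $sum = unpack("%16C*", "Hi!");       # 72 + 105 + 33 = 210

# With a bit-string specifier, the checksum counts the set bits.
my $bits = unpack("%32b*", "\x0f");     # 4

print "$sum $bits\n";                   # prints "210 4"
```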
Sorting with pack
Suppose that you have a list of numeric Internet addresses, in string form, to sort, something like:
    11.22.33.44
    1.3.5.7
    23.34.45.56
You would like to have them in "numeric" order. That is, the list should be sorted on the numeric value of the first number, then subsorted on the second, then the third, and finally the fourth. As usual, if you try to sort a list like this ASCIIbetically, the results are in the wrong order (see Item 14). Sorting numerically won't work either, because Perl would use only the leading numeric part of each string (11.22 for 11.22.33.44) and ignore the rest. Using pack provides a pretty good solution:
    @sorted_addr = sort {
        pack('C*', split /\./, $a) cmp
        pack('C*', split /\./, $b)
    } @addr;
For efficiency, this definitely should be rewritten as a Schwartzian Transform (see Item 14):
    @sorted_addr =
        map  { $_->[0] }
        sort { $a->[1] cmp $b->[1] }
        map  { [$_, pack('C*', split /\./)] }
        @addr;
Notice that the comparison operator used in the sort is cmp, not <=>. The pack function is converting a list of numbers (e.g., 11, 22, 33, 44) into a 4-byte string ("\x0b\x16\x21\x2c"). Comparing these strings ASCIIbetically produces the proper sorting order. Of course, you could also use Socket and write:
    use Socket;
    @sorted_addr =
        map  { $_->[0] }
        sort { $a->[1] cmp $b->[1] }
        map  { [$_, inet_aton($_)] }
        @addr;
but obviously pack provides a more general capability.
Manipulating hex escapes
Because pack and unpack understand hexadecimal strings, they can be useful in manipulating strings containing hex escapes and the like. For example, suppose you are programming for the World Wide Web and would like to "URI unescape" unsafe characters in a string. To URI unescape a string, you need to replace each occurrence of an escape (a percent sign followed by two hex digits) with the corresponding character. For example, "a%5eb" would be decoded to yield "a^b".
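To make the hexadecimal support concrete, here is a small sketch using the H specifier in both directions. The strings are just examples built around the "a^b" case.

```perl
use strict;
use warnings;

# H* unpacks a string into its hex digits, high nybble first.
my $hex = unpack("H*", "a^b");      # "615e62"

# Packing hex digits with H* goes the other way.
my $str = pack("H*", "615e62");     # "a^b"

# Two hex digits, as found in an escape such as %5e,
# become a single character.
my $char = pack("H2", "5e");        # "^"

print "$hex $str $char\n";          # prints "615e62 a^b ^"
```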
You can write a Perl substitution to do this in one line:
    $_ = "a%5eb";
    s/%([0-9a-fA-F]{2})/pack("C", hex($1))/ge;
This particular snippet is common in older hand-rolled CGI scripts. However, it's somewhat obscure looking, and as is the case for many commonly performed tasks in Perl, there is a module designed specifically for the job:
    use URI::Escape;
    $_ = uri_unescape "a%5eb";
UUencoding/decoding
Have you ever tried to write a program to uudecode a file? It's easy in Perl, thanks to the uuencode/decode support built into pack and unpack:
A uudecode program
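A minimal sketch of how such a decoder might look, assuming the usual uuencode layout of a "begin MODE NAME" line, data lines, and a closing "end" line. The sub name uudecode_lines is hypothetical, and the code works on a list of lines rather than a file so that it can be tried without any input file.

```perl
use strict;
use warnings;

# Decode a list of uuencoded lines. Returns the file name and mode
# from the "begin" line, plus the decoded data.
sub uudecode_lines {
    my @lines = @_;
    my ($mode, $name, $data, $in_body) = (undef, undef, "", 0);
    for my $line (@lines) {
        if (!$in_body) {
            # Skip everything up to and including the "begin" line.
            if ($line =~ /^begin\s+(\d+)\s+(\S+)/) {
                ($mode, $name, $in_body) = ($1, $2, 1);
            }
            next;
        }
        last if $line =~ /^end\s*$/;
        # The "u" specifier decodes one uuencoded line at a time.
        $data .= unpack("u", $line) // "";
    }
    return ($name, $mode, $data);
}

# Round trip: pack("u", ...) produces the encoded body lines.
my $body  = pack("u", "Hello, uuencode!\n");
my @lines = ("begin 644 hello.txt\n", split(/^/, $body), "`\n", "end\n");
my ($name, $mode, $data) = uudecode_lines(@lines);
print "$name ($mode): $data";   # prints "hello.txt (644): Hello, uuencode!"
```

A real program would read the lines from a file or standard input and write the decoded data to the named file, with the mode applied via chmod.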