Item 53: Use pack and unpack for data munging.


Perl's built-in pack and unpack operators are two of the bigger, sharper blades on the "Swiss Army Chainsaw." [1] Perhaps they were originally intended as a ho-hum means of translating binary data to and from Perl data types like strings and integers, but pack and unpack can be put to more interesting and offbeat uses.

[1] One of the many obliquely complimentary names Perl has been given.

The pack operator works more or less like sprintf. It takes a format string followed by a list of values to be formatted, and returns a string:

 pack("CCCC", 80, 101, 114, 108)   # "Perl" (pack 4 unsigned chars)

The unpack operator works the other way:

 unpack("CCCC", "Perl")   # (80, 101, 114, 108)
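As a quick check of the round trip, here is a minimal sketch that packs the four character codes and unpacks the result again. The variable names ($packed, @codes) are illustrative and not from the original text.

```perl
use strict;
use warnings;

# Pack four unsigned chars into a 4-byte string.
my $packed = pack("CCCC", 80, 101, 114, 108);

# Unpack the same string back into a list of integers.
my @codes = unpack("CCCC", $packed);

print "$packed\n";   # Perl
print "@codes\n";    # 80 101 114 108
```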

The pack format string is a list of single-character specifiers that specify the type of data to be packed or unpacked. Here is the current list of specifiers:

Format specifiers for pack and unpack

Format  Description                                     Example                                  Result
A       ASCII string, space padded                      pack "A2A3", "Pea", "rl"                 "Perl "
a       ASCII string, null padded                       pack "a2a3", "Pea", "rl"                 "Perl\0"
B       bit string, descending order                    pack "B8", "00110000"                    "0"
b       bit string, ascending (vec) order               pack "b8", "00001100"                    "0"
H       hex string, high nybble first                   pack "H*", "5065726c"                    "Perl"
h       hex string, low nybble first                    pack "h2h2h2h2", "05", "56", "27", "c6"  "Perl"
C       unsigned char                                   unpack "C*", "\377\1\2\376"              255, 1, 2, 254
c       signed char                                     unpack "c*", "\377\1\2\376"              -1, 1, 2, -2
S       16-bit unsigned integer                         unpack "S2", "\377\1\2\376"              65281, 766 [*]
s       16-bit signed integer                           unpack "s2", "\377\1\2\376"              -255, 766 [*]
L       32-bit unsigned integer                         unpack "L", "\377\1\2\376"               4278256382 [*]
l       32-bit signed integer                           unpack "l", "\377\1\2\376"               -16710914 [*]
I       "native" unsigned integer, at least 32 bits     unpack "I", "\377\1\2\376"               4278256382 [*]
i       "native" signed integer, at least 32 bits       unpack "i", "\377\1\2\376"               -16710914 [*]
N       32-bit integer in "network" (big-endian) order  unpack "N", "\377\1\2\376"               4278256382
n       16-bit integer, network order                   unpack "n2", "\377\1\2\376"              65281, 766
V       32-bit integer in "VAX" (little-endian) order   unpack "V*", "\377\1\2\376"              4261544447
v       16-bit integer, VAX order                       unpack "v2", "\377\1\2\376"              511, 65026
u       uuencoded string                                unpack "u*", '$4&5R;```'                 "Perl"
w       BER (Basic Encoding Rules) encoded integer      unpack "ww", "\177\377\177"              127, 16383
X       back up 1 byte                                  pack "A4XXA2", "Peat", "rl"              "Perl"
x       null byte                                       unpack "L", pack("Cxxx", 1)              16777216 [*]
@       null fill to absolute position                  unpack "H*", pack('@3C', 1)              "00000001"

[*] Depends on platform endianness; this table was constructed on a big-endian machine.
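To illustrate the endianness caveat, the sketch below unpacks the same four bytes with N, V, and L. The first two results are fixed by the specifier. The third depends on the machine running the code. The variable names are illustrative.

```perl
use strict;
use warnings;

# The four bytes FF 01 02 FE, written with octal escapes.
my $bytes = "\377\1\2\376";

my $big    = unpack("N", $bytes);   # always big-endian:    4278256382
my $little = unpack("V", $bytes);   # always little-endian: 4261544447
my $native = unpack("L", $bytes);   # matches one of the two above

print "N: $big\nV: $little\nL: $native\n";
```

When data crosses machines (files, network packets), N/n or V/v are usually the safer choice because the result does not change with the host.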

Each specifier may be followed by a repeat count indicating how many values from the list to format. The repeat counts for the string specifiers (A, a, B, b, H, and h) are special: they indicate how many bytes/bits/nybbles to add to the output string. An asterisk used as a repeat count means to use the specifier preceding the asterisk for all the remaining items.
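A small sketch of how repeat counts behave for numeric and string specifiers. The variable names are made up for illustration.

```perl
use strict;
use warnings;

# Numeric specifiers: the count is the number of list items consumed.
my $three = pack("C3", 80, 101, 114);        # same as "CCC"
my $all   = pack("C*", 80, 101, 114, 108);   # one C per remaining item

# String specifiers: the count is a width in bytes (A, a),
# bits (B, b), or nybbles (H, h).
my $padded = pack("A6", "Perl");     # "Perl  " (space padded to 6 bytes)
my $cut    = pack("A2", "Perl");     # "Pe" (truncated to 2 bytes)
my $hex    = unpack("H4", "Perl");   # "5065" (first 4 nybbles)

print "[$three] [$all] [$padded] [$cut] [$hex]\n";
```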

The unpack operator can also compute checksums. Just precede a specifier with a percent sign and a number indicating how many bits of checksum are desired. The extracted items are then checksummed together into a single item:

 unpack "c4", "\1\2\3\4";      # 1, 2, 3, 4
 unpack "%16c4", "\1\2\3\4";   # 10
 unpack "%3c4", "\1\2\3\4";    # 2
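The same mechanism works with other specifiers. As a hedged sketch, the following sums the bytes of a string and counts the set bits in a two-byte string. The sample data and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $data = "Perl";

# Sum of the byte values, kept to 32 bits: 80 + 101 + 114 + 108.
my $sum = unpack("%32C*", $data);         # 403

# The same sum kept to 8 bits, i.e. modulo 256.
my $low8 = unpack("%8C*", $data);         # 147

# Summing the bits of a bit string counts the bits that are set.
my $bits = unpack("%32b*", "\xff\x01");   # 9

print "$sum $low8 $bits\n";
```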

Sorting with pack

Suppose that you have a list of numeric Internet addresses (in string form) to sort, something like:

 11.22.33.44
 1.3.5.7
 23.34.45.56

You would like to have them in "numeric" order. That is, the list should be sorted on the numeric value of the first number, then subsorted on the second, then the third, and finally the fourth. As usual, if you try to sort a list like this ASCIIbetically, the results are in the wrong order (see Item 14). Sorting numerically won't work either, because that would only sort on the first number in each string. Using pack provides a pretty good solution:

 @sorted_addr =
   sort { pack('C*', split /\./, $a) cmp
          pack('C*', split /\./, $b) } @addr;

For efficiency, this definitely should be rewritten as a Schwartzian Transform (see Item 14):

 @sorted_addr =
   map { $_->[0] }
   sort { $a->[1] cmp $b->[1] }
   map { [$_, pack('C*', split /\./)] }
   @addr;

Notice that the comparison operator used in the sort is cmp, not <=>. The pack function converts a list of numbers (e.g., 11, 22, 33, 44) into a 4-byte string ("\x0b\x16\x21\x2c"). Comparing these strings ASCIIbetically produces the proper sorting order. Of course, you could also use Socket and write:

 use Socket;   # exports inet_aton

 @sorted_addr =
   map { $_->[0] }
   sort { $a->[1] cmp $b->[1] }
   map { [$_, inet_aton($_)] }
   @addr;

but obviously pack provides a more general capability.
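For a self-contained run of the pack-based sort, here is a sketch with sample data. The address 1.20.5.7 is an addition, included because it is the case that a plain string sort gets wrong.

```perl
use strict;
use warnings;

my @addr = qw(11.22.33.44 1.3.5.7 23.34.45.56 1.20.5.7);

# Plain string sort: "1.20.5.7" lands before "1.3.5.7".
my @ascii = sort @addr;

# Schwartzian Transform on the packed 4-byte key.
my @sorted_addr =
  map  { $_->[0] }
  sort { $a->[1] cmp $b->[1] }
  map  { [$_, pack('C*', split /\./)] }
  @addr;

print "string sort: @ascii\n";
print "packed sort: @sorted_addr\n";
```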

Manipulating hex escapes

Because pack and unpack understand hexadecimal strings, they can be useful in manipulating strings containing hex escapes and the like.

For example, suppose you are programming for the World Wide Web and would like to "URI unescape" unsafe characters in a string. To URI unescape a string, you need to replace each occurrence of an escape (a percent sign followed by two hex digits) with the corresponding character. For example, "a%5eb" would be decoded to yield "a^b". You can write a Perl substitution to do this in one line:

 $_ = "a%5eb";
 s/%([0-9a-fA-F]{2})/pack("c",hex($1))/ge;

This particular snippet is widespread in some older hand-rolled CGI scripts. However, it's somewhat obscure looking, and as is the case for many commonly performed tasks in Perl, there is a module designed specifically for the job:

 use URI::Escape;
 $_ = uri_unescape "a%5eb";
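To show the hex specifiers working in both directions, here is a hedged sketch of a hand-rolled escape/unescape pair. The helper names escape_bytes and unescape_bytes are hypothetical, and so is the choice of "safe" characters. For real work, URI::Escape remains the better option. The unescape uses C (unsigned char) rather than c so that byte values above 127 fit.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every byte outside a conservative
# safe set with %XX, using unpack "H2" to produce the hex digits.
sub escape_bytes {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9_.~-])/'%' . uc(unpack('H2', $1))/ge;
    return $s;
}

# Hypothetical helper: the one-line substitution wrapped in a sub.
sub unescape_bytes {
    my ($s) = @_;
    $s =~ s/%([0-9a-fA-F]{2})/pack('C', hex($1))/ge;
    return $s;
}

print escape_bytes("a^b"), "\n";       # a%5Eb
print unescape_bytes("a%5eb"), "\n";   # a^b
```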

UUencoding/decoding

Have you ever tried to write a program to uudecode a file? It's easy in Perl, thanks to the uuencode/decode support built into pack and unpack:

A uudecode program

 # Skip to the start of the uuencoded data.
 while (<>) {
   last if ($mode, $filename) =
     /^begin\s+(\d+)\s+(\S+)/i;
 }

 # Assuming we got started:
 if ($mode) {
   # Create output file.
   open F, ">$filename" or
     die "couldn't open $filename: $!\n";
   # Set the mode.
   chmod oct($mode), $filename or
     die "couldn't set mode: $!\n";
   print "$mode $filename\n";
   # Read a line of data, uudecode it, print it, until done.
   while (<>) {
     last if (/^(`|end)/i);
     print F unpack('u*', $_);
   }
 }
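The encoding direction works the same way with pack. The sketch below, using made-up sample data, encodes 100 bytes and then decodes them one line at a time, as the loop in the uudecode program does. By default pack "u" puts 45 input bytes on each output line, so 100 bytes yield three lines.

```perl
use strict;
use warnings;

# 100 bytes of sample data (byte values 0 through 99).
my $original = join '', map { chr($_) } 0 .. 99;

# Encode: one uuencoded line per 45 bytes of input.
my $encoded = pack('u', $original);
my @lines   = split /^/, $encoded;

# Decode line by line and reassemble.
my $decoded = join '', map { unpack('u', $_) } @lines;

print scalar(@lines), " lines\n";   # 3 lines
print $decoded eq $original ? "round trip ok\n" : "mismatch\n";
```

This produces only the data lines. A complete uuencoded file also needs the "begin MODE NAME" header and the "end" trailer that the uudecode program looks for.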



Effective Perl Programming: Writing Better Programs with Perl
ISBN: 0201419750
EAN: 2147483647
Year: 1996
Pages: 116
