Hack 11 Adding HTTP Headers to Your Request
Add more functionality to your programs, or mimic common browsers, by sending additional HTTP headers.
The most commonly used syntax for LWP::UserAgent requests that carry extra headers passes them as key/value pairs after the URL:
$response = $browser->get( $url, $key1, $value1, $key2, $value2, ... );
Why is adding HTTP headers sometimes necessary? It really depends on the site that you're pulling data from; some will respond only to actions that appear to come from common end-user browsers, such as Internet Explorer, Netscape, Mozilla, or Safari. Others, in a desperate attempt to minimize bandwidth costs, will send only compressed data [Hack #16], requiring decoding on the client end. All these client necessities can be enabled through the use of HTTP headers. For example, here's how to send more Netscape-like headers:
my @ns_headers = (
    'User-Agent'      => 'Mozilla/4.76 [en] (Win98; U)',
    # One logical value, joined with "." so no newline ends up in the header.
    'Accept'          => 'image/gif, image/x-xbitmap, image/jpeg, '
                       . 'image/pjpeg, image/png, */*',
    'Accept-Charset'  => 'iso-8859-1,*',
    'Accept-Language' => 'en-US',
);
$response = $browser->get($url, @ns_headers);
Or, alternatively, without the interim array:
$response = $browser->get($url,
    'User-Agent'      => 'Mozilla/4.76 [en] (Win98; U)',
    # One logical value, joined with "." so no newline ends up in the header.
    'Accept'          => 'image/gif, image/x-xbitmap, image/jpeg, '
                       . 'image/pjpeg, image/png, */*',
    'Accept-Charset'  => 'iso-8859-1,*',
    'Accept-Language' => 'en-US',
);
In these headers, you're telling the remote server which types of data you're willing to Accept and in what order: GIFs, bitmaps, JPEGs, PNGs, and then anything else (you'd rather have a GIF first, but an HTML file is fine if the server can't provide the data in your preferred formats). For servers that cater to international users by offering translated documents, the Accept-Language and Accept-Charset headers give you the ability to choose which language and character set you would like the response in.
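To see exactly what such a header list turns into, one option is to build the request without sending it. The sketch below assumes the HTTP::Request::Common module (distributed alongside LWP) and uses http://www.example.com/ as a placeholder URL; neither appears in the text above.

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(GET);

# Placeholder URL, for illustration only.
my $url = 'http://www.example.com/';

my @ns_headers = (
    'User-Agent'      => 'Mozilla/4.76 [en] (Win98; U)',
    'Accept'          => 'image/gif, image/x-xbitmap, image/jpeg, '
                       . 'image/pjpeg, image/png, */*',
    'Accept-Charset'  => 'iso-8859-1,*',
    'Accept-Language' => 'en-US',
);

# GET() only constructs an HTTP::Request object; nothing goes over the network.
my $request = GET($url, @ns_headers);

# Dump the request line and headers as they would be sent.
print $request->as_string;
```

Handing the same object to $browser->request($request) would then actually send it, which makes this a convenient way to check your headers before hitting a live site.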
If you were only going to change the User-Agent, you could just modify the $browser object's default line from libwww-perl/5.65 (or the like) to whatever you wish, using LWP::UserAgent's agent method (in this case, for Netscape 4.76):
$browser->agent('Mozilla/4.76 [en] (Win98; U)');
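Other headers can be made to stick to the $browser object in a similar way. As a hedged sketch: more recent releases of LWP::UserAgent offer a default_header method, which is not mentioned in the text above and may well be missing from a libwww-perl as old as 5.65. It attaches a header to every request the object makes:

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $browser = LWP::UserAgent->new;

# Replace the default libwww-perl identification string.
$browser->agent('Mozilla/4.76 [en] (Win98; U)');

# Assumption: this LWP::UserAgent is recent enough to provide default_header.
$browser->default_header('Accept-Language' => 'en-US');

# Read the settings back; no request is sent here.
print 'User-Agent: ',      $browser->agent, "\n";
print 'Accept-Language: ', $browser->default_header('Accept-Language'), "\n";
```

Headers passed directly to get still apply to that one request, so per-object defaults and per-request pairs can be mixed as needed.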
Here's a short list of common User-Agents you might wish to mimic:
Mozilla/4.0 (compatible; MSIE 5.22; Mac_PowerPC)
Mozilla/4.0 (compatible; MSIE 6.0; Windows 98)
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.3) Gecko/20030312
Some sites prefer you arrive from a particular page and check the Referer header to enforce it. You can supply that header as well:
$response = $browser->get($url, 'Referer' => 'http://site.com/url.html');
It just goes to show you that relying upon a certain Referer or specific User-Agent is no security worth considering for your own site and resources.
Sean Burke and Kevin Hemenway