Hacks 8-32

The idea behind scraping sites often arises out of pure, immediate, and frantic desire: it's late at night, you've forgotten your son's soccer game for the twelfth time in a row, and you're vowing never to let it happen again. Sure, you could place a bookmark to the school calendar in your browser toolbar, but you want something even more insidious, something you couldn't possibly forget or grow accustomed to seeing.

A bit later, you've got a Perl script that automatically emails you every hour of every day that a game is scheduled. You've just made your life less forgetful, your computer more useful, and your son more loving. This is where spidering and scraping shine: when you've got an itch that can best be scratched by getting your computer involved. And if there's one programming language that can quickly scratch an itch better than any other, it's Perl.

Perl is renowned for "making easy things easier and hard things possible," and has earned nicknames such as the "Swiss Army chainsaw," "Internet duct tape," and the ultimate "glue language." Since it's a scripting language (as opposed to a compiled one, like C), rapid development is its modus operandi: throw together bits and pieces of code from here and there, try it out, tweak, hem, haw, and deploy. Along with its immense repository of existing code (see CPAN, the Comprehensive Perl Archive Network, at http://www.cpan.org) and its uncanny ability to "do what you mean," it's a perfect language on which to base a spidering hacks book.

In this book, we're going to assume you have a rudimentary knowledge of Perl. You may not be much more than an acolyte, but we're hoping you can create something a little more advanced than "Hello, World." What we're not going to assume, however, is that you've done much, if any, network programming before. We still hear tales of those who have stayed away from Internet programming because they're scared of how difficult it might be.

Trust us: like a lot of things with Perl, it's a lot easier than you think. In this chapter, we'll devote a decent amount of time to getting you up to speed with what you need to know: installing the network access modules for Perl [Hack #8] and then learning how to use them, from the simplest query [Hack #9] on up to progress bars [Hack #18], faking HTTP headers [Hack #11], and so on.



Spidering Hacks
ISBN: 0596005776
Year: 2005
Pages: 157
