How to Use This Book

You can read the book from cover to cover if you like, but you might be better served by picking an interesting item from the table of contents and just diving in.

If you're relatively new to Perl or spidering, you should consider starting with a few hacks from each progressive chapter. The principles in Chapter 1 give you a firm grounding on what we'll be doing, while Chapter 2 acquaints you with the various network and parsing modules of Perl. Chapter 3 starts off nice and easy by simply downloading files, and Chapter 4, the bulk of the book, moves you into the more complicated realm of repurposing information.

Conventions Used in This Book

The following is a list of the typographical conventions used in this book:


Italic

Used to indicate new terms, example URLs, filenames, file extensions, directories, program names, and, of course, for emphasis. For example, a path in the filesystem will appear as /Developer/Applications.


Constant width

Used to show code examples, anything that might be typed from the keyboard, the contents of files, and the output from commands.


Constant width italic

Used in examples and tables to show text that should be replaced with your own user-supplied values.


Color

The second color is used to indicate a cross-reference within the text, and occasionally to help keep the author awake during late-night writing binges.

You should pay special attention to notes set apart from the text with the following icons:

This is a tip, suggestion, or general note. It contains useful supplementary information about the topic at hand.


This is a warning or note of caution. When you see one of these, your safety, privacy, or money might be in jeopardy.


The thermometer icons, found next to each hack, indicate the relative complexity of the hack:

[Thermometer icons: beginner, moderate, expert]

How to Contact Us

We have tested and verified the information in this book to the best of our ability, but you may find that features have changed (or even that we have made mistakes!). As a reader of this book, you can help us to improve future editions by sending us your feedback. Please let us know about any errors, inaccuracies, bugs, misleading or confusing statements, and typos that you find anywhere in this book.

Please also let us know what we can do to make this book more useful to you. We take your comments seriously and will try to incorporate reasonable suggestions into future editions. You can write to us at:

O'Reilly & Associates, Inc.
1005 Gravenstein Hwy N.
Sebastopol, CA 95472
(800) 998-9938 (in the U.S. or Canada)
(707) 829-0515 (international/local)
(707) 829-0104 (fax)

To ask technical questions or to comment on the book, send email to:

bookquestions@oreilly.com

For more information about this book and others, see the O'Reilly web site:

http://www.oreilly.com

For details about Spidering Hacks, including examples, errata, reviews, and plans for future editions, go to:

http://www.oreilly.com/catalog/spiderhks/

Got a Hack?

To explore Hacks books online or to contribute a hack for future titles, visit:

http://hacks.oreilly.com

Chapter 1. Walking Softly

Hacks #1-7

Hack 1. A Crash Course in Spidering and Scraping

Hack 2. Best Practices for You and Your Spider

Hack 3. Anatomy of an HTML Page

Hack 4. Registering Your Spider

Hack 5. Preempting Discovery

Hack 6. Keeping Your Spider Out of Sticky Situations

Hack 7. Finding the Patterns of Identifiers