Spidering Hacks
ISBN: 0596005776
Year: 2005
Pages: 157
Authors: Kevin Hemenway, Tara Calishain
Table of Contents
Copyright
Credits
About the Authors
Contributors
Preface
Why Spidering Hacks?
How This Book Is Organized
How to Use This Book
Conventions Used in This Book
How to Contact Us
Got a Hack?
Chapter 1. Walking Softly
Hacks 1-7
Hack 1 A Crash Course in Spidering and Scraping
Hack 2 Best Practices for You and Your Spider
Hack 3 Anatomy of an HTML Page
Hack 4 Registering Your Spider
Hack 5 Preempting Discovery
Hack 6 Keeping Your Spider Out of Sticky Situations
Hack 7 Finding the Patterns of Identifiers
Chapter 2. Assembling a Toolbox
Hacks 8-32
Perl Modules
Resources You May Find Helpful
Hack 8 Installing Perl Modules
Hack 9 Simply Fetching with LWP::Simple
Hack 10 More Involved Requests with LWP::UserAgent
Hack 11 Adding HTTP Headers to Your Request
Hack 12 Posting Form Data with LWP
Hack 13 Authentication, Cookies, and Proxies
Hack 14 Handling Relative and Absolute URLs
Hack 15 Secured Access and Browser Attributes
Hack 16 Respecting Your Scrapee's Bandwidth
Hack 17 Respecting robots.txt
Hack 18 Adding Progress Bars to Your Scripts
Hack 19 Scraping with HTML::TreeBuilder
Hack 20 Parsing with HTML::TokeParser
Hack 21 WWW::Mechanize 101
Hack 22 Scraping with WWW::Mechanize
Hack 23 In Praise of Regular Expressions
Hack 24 Painless RSS with Template::Extract
Hack 25 A Quick Introduction to XPath
Hack 26 Downloading with curl and wget
Hack 27 More Advanced wget Techniques
Hack 28 Using Pipes to Chain Commands
Hack 29 Running Multiple Utilities at Once
Hack 30 Utilizing the Web Scraping Proxy
Hack 31 Being Warned When Things Go Wrong
Hack 32 Being Adaptive to Site Redesigns
Chapter 3. Collecting Media Files
Hacks 33-42
Hack 33 Detective Case Study: Newgrounds
Hack 34 Detective Case Study: iFilm
Hack 35 Downloading Movies from the Library of Congress
Hack 36 Downloading Images from Webshots
Hack 37 Downloading Comics with dailystrips
Hack 38 Archiving Your Favorite Webcams
Hack 39 News Wallpaper for Your Site
Hack 40 Saving Only POP3 Email Attachments
Hack 41 Downloading MP3s from a Playlist
Hack 42 Downloading from Usenet with nget
Chapter 4. Gleaning Data from Databases
Hacks 43-89
Hack 43 Archiving Yahoo! Groups Messages with yahoo2mbox
Hack 44 Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups
Hack 45 Gleaning Buzz from Yahoo!
Hack 46 Spidering the Yahoo! Catalog
Hack 47 Tracking Additions to Yahoo!
Hack 48 Scattersearch with Yahoo! and Google
Hack 49 Yahoo! Directory Mindshare in Google
Hack 50 Weblog-Free Google Results
Hack 51 Spidering, Google, and Multiple Domains
Hack 52 Scraping Amazon.com Product Reviews
Hack 53 Receive an Email Alert for Newly Added Amazon.com Reviews
Hack 54 Scraping Amazon.com Customer Advice
Hack 55 Publishing Amazon.com Associates Statistics
Hack 56 Sorting Amazon.com Recommendations by Rating
Hack 57 Related Amazon.com Products with Alexa
Hack 58 Scraping Alexa's Competitive Data with Java
Hack 59 Finding Album Information with FreeDB and Amazon.com
Hack 60 Expanding Your Musical Tastes
Hack 61 Saving Daily Horoscopes to Your iPod
Hack 62 Graphing Data with RRDTOOL
Hack 63 Stocking Up on Financial Quotes
Hack 64 Super Author Searching
Hack 65 Mapping O'Reilly Best Sellers to Library Popularity
Hack 66 Using All Consuming to Get Book Lists
Hack 67 Tracking Packages with FedEx
Hack 68 Checking Blogs for New Comments
Hack 69 Aggregating RSS and Posting Changes
Hack 70 Using the Link Cosmos of Technorati
Hack 71 Finding Related RSS Feeds
Hack 72 Automatically Finding Blogs of Interest
Hack 73 Scraping TV Listings
Hack 74 What's Your Visitor's Weather Like?
Hack 75 Trendspotting with Geotargeting
Hack 76 Getting the Best Travel Route by Train
Hack 77 Geographic Distance and Back Again
Hack 78 Super Word Lookup
Hack 79 Word Associations with Lexical Freenet
Hack 80 Reformatting Bugtraq Reports
Hack 81 Keeping Tabs on the Web via Email
Hack 82 Publish IE's Favorites to Your Web Site
Hack 83 Spidering GameStop.com Game Prices
Hack 84 Bargain Hunting with PHP
Hack 85 Aggregating Multiple Search Engine Results
Hack 86 Robot Karaoke
Hack 87 Searching the Better Business Bureau
Hack 88 Searching for Health Inspections
Hack 89 Filtering for the Naughties
Chapter 5. Maintaining Your Collections
Hacks 90-93
Hack 90 Using cron to Automate Tasks
Hack 91 Scheduling Tasks Without cron
Hack 92 Mirroring Web Sites with wget and rsync
Hack 93 Accumulating Search Results Over Time
Chapter 6. Giving Back to the World
Hacks 94-100
Hack 94 Using XML::RSS to Repurpose Data
Hack 95 Placing RSS Headlines on Your Site
Hack 96 Making Your Resources Scrapable with Regular Expressions
Hack 97 Making Your Resources Scrapable with a REST Interface
Hack 98 Making Your Resources Scrapable with XML-RPC
Hack 99 Creating an IM Interface
Hack 100 Going Beyond the Book
Colophon
Index
Index SYMBOL
Index A
Index B
Index C
Index D
Index E
Index F
Index G
Index H
Index I
Index J
Index K
Index L
Index M
Index N
Index O
Index P
Index Q
Index R
Index S
Index T
Index U
Index V
Index W
Index X
Index Y