Spidering Hacks
ISBN: 0596005776
Year: 2005
Pages: 157
Authors: Kevin Hemenway, Tara Calishain
Table of Contents
Copyright
Credits
About the Authors
Contributors
Preface
Why Spidering Hacks?
How This Book Is Organized
How to Use This Book
Conventions Used in This Book
How to Contact Us
Got a Hack?
Chapter 1. Walking Softly
Hacks 1-7
Hack 1 A Crash Course in Spidering and Scraping
Hack 2 Best Practices for You and Your Spider
Hack 3 Anatomy of an HTML Page
Hack 4 Registering Your Spider
Hack 5 Preempting Discovery
Hack 6 Keeping Your Spider Out of Sticky Situations
Hack 7 Finding the Patterns of Identifiers
Chapter 2. Assembling a Toolbox
Hacks 8-32
Perl Modules
Resources You May Find Helpful
Hack 8 Installing Perl Modules
Hack 9 Simply Fetching with LWP::Simple
Hack 10 More Involved Requests with LWP::UserAgent
Hack 11 Adding HTTP Headers to Your Request
Hack 12 Posting Form Data with LWP
Hack 13 Authentication, Cookies, and Proxies
Hack 14 Handling Relative and Absolute URLs
Hack 15 Secured Access and Browser Attributes
Hack 16 Respecting Your Scrapee's Bandwidth
Hack 17 Respecting robots.txt
Hack 18 Adding Progress Bars to Your Scripts
Hack 19 Scraping with HTML::TreeBuilder
Hack 20 Parsing with HTML::TokeParser
Hack 21 WWW::Mechanize 101
Hack 22 Scraping with WWW::Mechanize
Hack 23 In Praise of Regular Expressions
Hack 24 Painless RSS with Template::Extract
Hack 25 A Quick Introduction to XPath
Hack 26 Downloading with curl and wget
Hack 27 More Advanced wget Techniques
Hack 28 Using Pipes to Chain Commands
Hack 29 Running Multiple Utilities at Once
Hack 30 Utilizing the Web Scraping Proxy
Hack 31 Being Warned When Things Go Wrong
Hack 32 Being Adaptive to Site Redesigns
Chapter 3. Collecting Media Files
Hacks 33-42
Hack 33 Detective Case Study: Newgrounds
Hack 34 Detective Case Study: iFilm
Hack 35 Downloading Movies from the Library of Congress
Hack 36 Downloading Images from Webshots
Hack 37 Downloading Comics with dailystrips
Hack 38 Archiving Your Favorite Webcams
Hack 39 News Wallpaper for Your Site
Hack 40 Saving Only POP3 Email Attachments
Hack 41 Downloading MP3s from a Playlist
Hack 42 Downloading from Usenet with nget
Chapter 4. Gleaning Data from Databases
Hacks 43-89
Hack 43 Archiving Yahoo Groups Messages with yahoo2mbox
Hack 44 Archiving Yahoo Groups Messages with WWW::Yahoo::Groups
Hack 45 Gleaning Buzz from Yahoo
Hack 46 Spidering the Yahoo Catalog
Hack 47 Tracking Additions to Yahoo
Hack 48 Scattersearch with Yahoo and Google
Hack 49 Yahoo Directory Mindshare in Google
Hack 50 Weblog-Free Google Results
Hack 51 Spidering, Google, and Multiple Domains
Hack 52 Scraping Amazon.com Product Reviews
Hack 53 Receive an Email Alert for Newly Added Amazon.com Reviews
Hack 54 Scraping Amazon.com Customer Advice
Hack 55 Publishing Amazon.com Associates Statistics
Hack 56 Sorting Amazon.com Recommendations by Rating
Hack 57 Related Amazon.com Products with Alexa
Hack 58 Scraping Alexa's Competitive Data with Java
Hack 59 Finding Album Information with FreeDB and Amazon.com
Hack 60 Expanding Your Musical Tastes
Hack 61 Saving Daily Horoscopes to Your iPod
Hack 62 Graphing Data with RRDTOOL
Hack 63 Stocking Up on Financial Quotes
Hack 64 Super Author Searching
Hack 65 Mapping O'Reilly Best Sellers to Library Popularity
Hack 66 Using All Consuming to Get Book Lists
Hack 67 Tracking Packages with FedEx
Hack 68 Checking Blogs for New Comments
Hack 69 Aggregating RSS and Posting Changes
Hack 70 Using the Link Cosmos of Technorati
Hack 71 Finding Related RSS Feeds
Hack 72 Automatically Finding Blogs of Interest
Hack 73 Scraping TV Listings
Hack 74 What's Your Visitor's Weather Like?
Hack 75 Trendspotting with Geotargeting
Hack 76 Getting the Best Travel Route by Train
Hack 77 Geographic Distance and Back Again
Hack 78 Super Word Lookup
Hack 79 Word Associations with Lexical Freenet
Hack 80 Reformatting Bugtraq Reports
Hack 81 Keeping Tabs on the Web via Email
Hack 82 Publish IE's Favorites to Your Web Site
Hack 83 Spidering GameStop.com Game Prices
Hack 84 Bargain Hunting with PHP
Hack 85 Aggregating Multiple Search Engine Results
Hack 86 Robot Karaoke
Hack 87 Searching the Better Business Bureau
Hack 88 Searching for Health Inspections
Hack 89 Filtering for the Naughties
Chapter 5. Maintaining Your Collections
Hacks 90-93
Hack 90 Using cron to Automate Tasks
Hack 91 Scheduling Tasks Without cron
Hack 92 Mirroring Web Sites with wget and rsync
Hack 93 Accumulating Search Results Over Time
Chapter 6. Giving Back to the World
Hacks 94-100
Hack 94 Using XML::RSS to Repurpose Data
Hack 95 Placing RSS Headlines on Your Site
Hack 96 Making Your Resources Scrapable with Regular Expressions
Hack 97 Making Your Resources Scrapable with a REST Interface
Hack 98 Making Your Resources Scrapable with XML-RPC
Hack 99 Creating an IM Interface
Hack 100 Going Beyond the Book
Colophon
Index
Index SYMBOL
Index A
Index B
Index C
Index D
Index E
Index F
Index G
Index H
Index I
Index J
Index K
Index L
Index M
Index N
Index O
Index P
Index Q
Index R
Index S
Index T
Index U
Index V
Index W
Index X
Index Y