Rather than await discovery, introduce yourself!

No matter how gentle and polite your spider is, sooner or later you're going to be noticed. Some webmaster's going to see what your spider is up to, and they're going to want some answers. Rather than wait for that to happen, why not take the initiative and make the first contact yourself? Let's look at the ways you can preempt discovery, make the arguments for your spider, and announce it to the world.

Making Contact

If you've written a great spider, why not tell the site about it? For a small site, this is relatively easy and painless: just look for the Feedback, About, or Contact links. For larger sites, though, figuring out whom to contact is more difficult. Try the technical contacts first, and then web feedback contacts. I've found that public relations contacts are usually best to reach last. It's tempting to start with them, because their addresses are usually easy to find, but PR folk like to concentrate on dealing with press people (which you're probably not), and they probably won't know enough programming to understand your request. (PR people, this isn't meant pejoratively. We still love you. Keep helping us promote O'Reilly books. Kiss, kiss.) If you absolutely can't find anyone to reach out to, try these three steps:
Making the Arguments for Your Spider

Now that you have a contact address, give a line of reasoning for your spider. If you can clearly describe what your spider's all about, great. But it may get to the point where you have to code up an example to show to the webmaster. If the person you're talking to isn't Perl-savvy, consider making a client-side version of your script with Perl2Exe (http://www.indigostar.com/perl2exe.htm) or PAR (http://search.cpan.org/author/AUTRIJUS/PAR) and sending it to her to test drive. Offer to show her the code. Explain what it does. Give samples of the output. If she really likes it, offer to let her distribute it from her site! Remember, all the average, nonprogramming webmaster is going to hear is "Hi! I wrote this Program and it Does Stuff to your site! Mind if I use it?" Be understanding if she wants a complete explanation and a little reassurance.

Making Your Spider Easy to Find and Learn About

Another good way to make sure that someone knows about your spider is to include contact information in the spider's User-Agent [Hack #11]. Contact information can be an email address or a web address. Whichever you choose, be sure to monitor the address and, if you point to a web site, make sure it has adequate information about your spider.

Considering Legal Issues

Despite making contact, getting permission, and putting plenty of information about your spider on the Web, you may still have questions. Is your spider illegal? Are you going to get in trouble for using it? There are many open issues with respect to the laws relating to the Web, and cases, experts, and scholars (not to mention members of the Web community) disagree heartily on most of them. Getting permission and operating within its limits probably reduces your risk, particularly if the site's a small one (that is, run by a person or two instead of a corporation). If you don't have permission and the site's terms of service aren't clear, the risk is greater.
That's probably also true if you've not asked permission and you're spidering a site that makes an API available and has very overt terms of service (like Google). Legal issues on the Internet are constantly evolving; the medium is just too new to make sweeping statements about fair use and what's going to be okay and what's not. It's not just how your spider does its work, but also what you do with what you collect. In fact, we need to warn you that just because a hack is in the book doesn't mean we can promise that it won't create risks, or that no webmaster will ever consider the hack a violation of the relevant terms of service or some other legal right.

Use your common sense (don't suck everything off a web site, put it on yours, and think you're okay), keep copyright laws in mind (don't take entire wire service stories and stick them on your site), and ask permission (the worst thing they can say is no, right?). If you're really worried, your best results will come from talking to an experienced lawyer.
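
To make the earlier advice about putting contact information in the User-Agent concrete, here's a minimal sketch in Python, using only the standard library. The spider name, version, web address, and email address are all made-up placeholders, and the "name/version (+URL; email)" layout is just one common convention, not a requirement. The script only builds the request; it doesn't send anything over the network.

```python
from urllib.request import Request

# Hypothetical values for illustration; substitute your own spider's details.
SPIDER_NAME = "ExampleSpider"
SPIDER_VERSION = "1.0"
INFO_URL = "http://www.example.com/spider.html"
CONTACT_EMAIL = "spider-admin@example.com"


def build_user_agent(name, version, info_url, email):
    """Return a User-Agent string that names the spider and says how to reach its owner."""
    return f"{name}/{version} (+{info_url}; {email})"


def build_request(url):
    """Prepare (but do not send) a request that carries the identifying headers."""
    headers = {
        "User-Agent": build_user_agent(
            SPIDER_NAME, SPIDER_VERSION, INFO_URL, CONTACT_EMAIL
        ),
        # From is the standard HTTP header for a contact email address.
        "From": CONTACT_EMAIL,
    }
    return Request(url, headers=headers)


request = build_request("http://www.example.com/")
# urllib stores header names capitalized as "User-agent".
print(request.get_header("User-agent"))
```

A webmaster who spots this string in the server logs can see which program made the request, where to read about it, and whom to write to. That only helps if the page and mailbox behind those placeholders actually exist and are monitored.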