1.3. Why Syndicate Your Content?

The advantages of using other people's feeds are obvious, but what about supplying your own? There are at least nine reasons to do so:

  • It increases traffic to your site.

  • It builds brand awareness for your site.

  • It can help with search engine rankings.

  • It helps cement relationships within a community of sites.

  • It improves the site/user relationship.

  • With additional technologies, it allows others to add features to your service, such as update notification via instant messaging.

  • It makes the Internet an altogether richer place, pushing semantic technology along and encouraging reuse. Good things happen when you share your data.

  • It gives you a good excuse to play with some cool stuff.

  • It saves bandwidth that would otherwise be wasted, by reducing the amount of screen-scraping of your site.

There you are: social, spiritual, and mercenary reasons to provide a feed for your site.


1.4. Legal Implications

The copyright implications for RSS feeds are quite simple. Feed publishers have two choices, and each determines what users may do with the feed.

First, the publisher can decide that the feed must be licensed in some way. In this case, only authorized users can use the feed. It is good manners on the part of the publisher to make this as obvious as possible: at the least by providing a copyright notice in an XML comment, and preferably by making it difficult for unauthorized users to get to the feed. Password protection is a reasonable minimum. Registering a pay-only feed with aggregators or allowing Google to see the feed is asking for trouble.
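The copyright notice mentioned above can be placed both in an XML comment, for people reading the raw file, and in the RSS 2.0 `<copyright>` channel element, for software. What follows is a minimal Python sketch under those assumptions; the function name `build_licensed_feed` is hypothetical, and the sketch does nothing about the password protection, which belongs in your web server configuration.

```python
from xml.sax.saxutils import escape


def build_licensed_feed(title, link, description, notice, items):
    """Hypothetical helper: return a minimal RSS 2.0 document that carries
    a copyright notice. `items` is a list of (title, link) pairs."""
    # XML forbids "--" inside a comment, so refuse such notices outright.
    if "--" in notice:
        raise ValueError("notice must not contain '--'")
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        # Human-readable notice, placed before the root element.
        f"<!-- {notice} -->",
        '<rss version="2.0">',
        "  <channel>",
        f"    <title>{escape(title)}</title>",
        f"    <link>{escape(link)}</link>",
        f"    <description>{escape(description)}</description>",
        # Machine-readable notice in the RSS 2.0 channel element.
        f"    <copyright>{escape(notice)}</copyright>",
    ]
    for item_title, item_link in items:
        lines += [
            "    <item>",
            f"      <title>{escape(item_title)}</title>",
            f"      <link>{escape(item_link)}</link>",
            "    </item>",
        ]
    lines += ["  </channel>", "</rss>"]
    return "\n".join(lines)
```

Escaping the text with `escape` keeps characters such as `&` and `<` in titles from breaking the XML.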

Second, and most commonly, the publisher can decide that the RSS feed is entirely free to use. In this case, it is only polite for the publishers of public RSS feeds to consider the feed entirely in the public domain: free to be used by anyone, for anything. This might sound a little radical to the average company vice president, but remember that there is nothing in the RSS feed that didn't, in some way, appear in the source information in the first place. It is rather futile to get upset that someone is not displaying your headlines in the company-approved font, or is committing a similar infraction; that is somewhat against the spirit of the exercise.

Screen-scraping a site to create a feed, by writing a script that reads the site-specific layout, is a different matter. U.S. courts, at least, have already found (in the Ticketmaster versus Tickets.com case of October 1999 to March 2000) that linking to a page is not in itself a breach of copyright. You can also argue, perhaps less convincingly, that reproducing headlines and excerpts from a site falls under fair-use guidelines for review purposes. However, it is extremely bad form to keep scraping a site after the site owner asks you to stop. Instead, try to evangelize RSS to the site owner and persuade him to start a proper feed.

Nevertheless, for private use, screen-scraping is a useful technique. In later chapters you'll see how running screen-scraping scripts on your local machine can produce extremely useful feed-based applications. Because these are entirely self-contained, there's no legal issue at all.
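A script that "reads the site-specific layout" can be sketched with Python's standard `html.parser` module. This is a minimal sketch, not a general tool: it assumes a hypothetical page in which each headline is marked up as `<h2 class="headline"><a href="...">...</a></h2>`, and the names `HeadlineScraper` and `scrape_headlines` are made up for illustration. A scraper for a real site has to be written to match that site's own markup, and breaks whenever the markup changes.

```python
from html.parser import HTMLParser


class HeadlineScraper(HTMLParser):
    """Hypothetical scraper: collects (title, link) pairs from pages that
    mark headlines up as <h2 class="headline"><a href="...">...</a></h2>."""

    def __init__(self):
        super().__init__()
        self.headlines = []
        self._in_headline = False  # currently inside <h2 class="headline">
        self._href = None          # href of the headline link being read
        self._text = []            # text chunks of that link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "h2" and "headline" in classes:
            self._in_headline = True
        elif tag == "a" and self._in_headline:
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text that sits inside a headline link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                self.headlines.append((title, self._href))
            self._href = None
        elif tag == "h2":
            self._in_headline = False


def scrape_headlines(html):
    """Return the (title, link) pairs found in an HTML string."""
    parser = HeadlineScraper()
    parser.feed(html)
    parser.close()
    return parser.headlines
```

Run locally, the resulting pairs could be written out as `<item>` elements to give you a private feed of a site that offers none.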

1.4.1. If You Are Scraped

If you are being scraped heavily and want it stopped, there are four ways to do so. First, scrapers should obey robots.txt: placing a robots.txt file in the root directory of your site sends a clear signal that most scrapers will follow. Second, you can contact the scraper and ask her to stop; if she is professional, she will do so immediately. Third, you can block the scraper's IP address, although this is sometimes rather like herding cats, because scrapers can move to new addresses.
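Both sides of the robots.txt convention can be shown in a few lines. The sketch below assumes a hypothetical scraper whose user-agent is `HeadlineBot`: the robots.txt text asks that one robot to keep out of the whole site and every other robot to stay out of `/cgi-bin/`. The Python function, a name made up for illustration, shows how a polite scraper can check the rules with the standard library's `urllib.robotparser` before fetching a page. Note that robots.txt is purely advisory; nothing forces a scraper to consult it.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt for your site's root directory.
# "HeadlineBot" is a hypothetical scraper name.
ROBOTS_TXT = """\
User-agent: HeadlineBot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
"""


def may_scrape(robots_txt, user_agent, url):
    """Hypothetical helper: True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_scrape(ROBOTS_TXT, "HeadlineBot", "http://example.com/news/"))  # False
print(may_scrape(ROBOTS_TXT, "OtherBot", "http://example.com/news/"))     # True
```

In a real scraper you would instead point `RobotFileParser` at the live file with `set_url()` and `read()`, and re-read it periodically.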

The fourth and best way is to make a feed of your own. I'll show how to do so in the following chapters.