10.13. Having Amazon Produce Its Own RSS Feeds

Andrew Odewahn, an editor at O'Reilly, spotted that you can have Amazon.com produce an RSS feed from any search on its site. As he says:

You could subscribe to a search for books on "software engineering," "Java," and "history of europe" to easily keep track of what's going on.

To do this, you need only play with a URL. The URL for an RSS 0.91 feed of a keyword search for "software engineering" is:

http://xml.amazon.com/onca/xml3?t=webservices-20&dev-t=amznRss&KeywordSearch=software%20engineering&mode=books&bcm=&type=lite&page=1&ct=text/xml&sort=+salesrank&f=http://xml.amazon.com/xsl/xml-rss091.xsl

As you can see, this URL actually calls Amazon's web services system and then formats the results by passing them through an XSLT stylesheet, xml-rss091.xsl, which is named in the f parameter.
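The same URL can be assembled from its parts, which makes it easy to generate feeds for several searches at once. What follows is a minimal sketch in Perl, the language used elsewhere in this chapter. The subs uri_escape_simple and amazon_feed_url are hypothetical names, the parameter values are copied straight from the example URL, and the percent-encoding is done by hand (for ASCII keywords) so that only core Perl is needed.

```perl
use strict;
use warnings;

# Percent-encode everything except RFC 3986 unreserved characters,
# so "software engineering" becomes "software%20engineering".
# Assumes ASCII input.
sub uri_escape_simple {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord $1)/ge;
    return $s;
}

# Hypothetical helper: build the RSS 0.91 feed URL for a keyword search,
# using the same parameters (and parameter order) as the example URL.
sub amazon_feed_url {
    my ( $keywords, $sort ) = @_;
    $sort = 'salesrank' unless defined $sort;
    my @params = (
        [ 't',             'webservices-20' ],
        [ 'dev-t',         'amznRss' ],
        [ 'KeywordSearch', uri_escape_simple($keywords) ],
        [ 'mode',          'books' ],
        [ 'bcm',           '' ],
        [ 'type',          'lite' ],
        [ 'page',          '1' ],
        [ 'ct',            'text/xml' ],
        [ 'sort',          "+$sort" ],    # leading "+" as in the example
        [ 'f',             'http://xml.amazon.com/xsl/xml-rss091.xsl' ],
    );
    return 'http://xml.amazon.com/onca/xml3?'
      . join( '&', map { "$_->[0]=$_->[1]" } @params );
}
```

Calling amazon_feed_url('software engineering') reproduces the URL above; passing a second argument such as 'daterank' changes the sort order.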

Andrew goes on to say,

By default, the search results are sorted by Amazon rank. After hacking around a bit, I found that you can change this default to any of a variety of other orders. For example, you can modify the feed so that books are sorted by publication date to get an automatically updated list of new books published on a particular topic.

To do this, scroll over the URL until you see the "&sort=+salesrank" part. To change the sort order, replace "salesrank" with the option you'd prefer. Here's a list of possible options found on Amazon's Web Services page, http://www.amazon.com/gp/aws/landing.html:

  • Publication date: daterank

  • Featured items: pmrank

  • Sales rank: salesrank

  • Customer reviews: reviewrank

  • Price (low to high): pricerank

  • Price (high to low): inverse-pricerank

So, to get a feed of all the new books published in Software Engineering, just replace "salesrank" with "daterank" and click update. From now on, you can easily see all the new stuff coming out in one place. For example:

http://xml.amazon.com/onca/xml3?t=webservices-20&dev-t=amznRss&KeywordSearch=Ben%20Hammersley&mode=books&bcm=&type=lite&page=1&ct=text/xml&sort=+pricerank&f=http://xml.amazon.com/xsl/xml-rss091.xsl

produces a feed of the books I have written, in order of price. This technique highlights a very good point: it is well worth examining complex URLs carefully, because they are very often hackable.
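If you keep several of these subscriptions, swapping the sort order can be scripted too. This Perl sketch rewrites the sort parameter of an existing feed URL and refuses values that aren't in the list of options above. The sub name with_sort_order is hypothetical, and it assumes, as in the examples, that the value is written with a leading +.

```perl
use strict;
use warnings;

# Sort values from the list of options in the text.
my %valid_sort = map { $_ => 1 }
  qw(daterank pmrank salesrank reviewrank pricerank inverse-pricerank);

# Hypothetical helper: replace the value of the sort parameter in an
# existing feed URL, keeping the leading "+" used in the examples.
sub with_sort_order {
    my ( $url, $sort ) = @_;
    die "Unknown sort order '$sort'\n" unless $valid_sort{$sort};
    $url =~ s/([?&]sort=)\+?[^&]*/${1}+$sort/
      or die "No sort parameter in URL\n";
    return $url;
}
```

Validating against the known list catches typos early, since a misspelled sort value would otherwise only show up as an empty or broken feed.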


10.14. Cross-Poster for Movable Type

By now, the traditional use of feeds as a form of content syndication is beginning to look somewhat old-fashioned. But fear not: here's a use that is as traditional as can be.

I needed a script that would check all of the weblogs I write for and cross-post everything I had written there onto my own weblog. Bear in mind that I'm not the only author on these other sites, so the script has to pick out my posts and ignore everyone else's.

To do this, check their RSS feeds on a set schedule, grab the relevant items, build one big entry out of them, and then post it. This code is for a Movable Type installation, but it isn't hard to modify it to fit another weblogging platform.

10.14.1. Walking Through the Code

Open the proceedings by loading all of the libraries and modules and setting a few configuration variables. This is exactly the code I have running on my own server, so you need to change the following library paths, author ID, blog ID, and configuration file path to match your own Movable Type installation:

    use lib "/web/script/ben/mediacooperative.com/lib";
    use lib "/web/script/ben/mediacooperative.com/extlib";
    use lib "/web/script/ben/lib/perl";
    use MT;
    use MT::Entry;
    use Date::Manip;
    use LWP::Simple 'get';
    use XML::RSS;
    
    my $MTauthor = "1";
    my $MTblogID = "3";
    my $MTconfig = "/home/ben/web/mediacooperative.com/mt.cfg";
    my $guts     = "";

Now, set up a list of sites to check. For each site, you need to define only the feed URL and the name that appears in the <dc:creator> or <author> element of my posts there. Everything else you can get from the feed itself. For example:

  • http://del.icio.us/rss/bhammersley "bhammersley"

  • http://www.oreillynet.com/feeds/author/?x-au=909 "Ben Hammersley"

  • http://monkeyfilter.com/rss.php "DangerIsMyMiddleName"

  • http://www.benhammersley.com/expeditions/northpole2006/index.rdf "Ben Hammersley"

You can store these pairs in an array of arrays:

    my @sites_to_check = (
        [ "http://del.icio.us/rss/bhammersley",               "bhammersley" ],
        [ "http://www.oreillynet.com/feeds/author/?x-au=909", "Ben Hammersley" ],
        [ "http://monkeyfilter.com/rss.php", "DangerIsMyMiddleName" ],
        [
            "http://www.benhammersley.com/expeditions/northpole2006/index.rdf",
            "Ben Hammersley"
        ],
    );
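If you'd rather keep the site list outside the script, a few lines of Perl can build the same array of arrays from a plain text file. This is a sketch under stated assumptions: the file format (one site per line, the feed URL followed by the name in double quotes, with # comments allowed) is hypothetical, as is the sub name read_site_list.

```perl
use strict;
use warnings;

# Hypothetical helper: turn lines of the form
#   http://example.com/feed.rss "My Name"
# into the array-of-arrays structure used by the cross-poster.
sub read_site_list {
    my (@lines) = @_;
    my @sites;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^\s*(?:#|$)/;    # skip blank lines and comments
        if ( $line =~ /^\s*(\S+)\s+"([^"]+)"\s*$/ ) {
            push @sites, [ $1, $2 ];       # [ feed URL, author name ]
        }
        else {
            warn "Skipping unrecognised line: $line\n";
        }
    }
    return @sites;
}

# Usage, reading a file named on the command line:
#   my @sites_to_check = read_site_list(<>);
```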

Now for the main loop, which goes through each feed in turn: downloading it, parsing it, and so on. Start by taking the feed URL and the author name (the name I go by on that site) out of the array reference and into $site_feed_url and $site_author_nym. This step could be omitted, but for the sake of clarity, we'll leave it in.

    for my $site_being_checked (@sites_to_check) {
    
        my $site_feed_url   = $site_being_checked->[0];
        my $site_author_nym = $site_being_checked->[1];

Now, retrieve the feed with LWP::Simple's get, or move on to the next site if the download fails:

        my $feed_xml = get($site_feed_url) or next;

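And now, to parse it. Create a new instance of the XML::RSS parser and jam the feed into it. Because the parser dies if a feed isn't well-formed XML, wrap the call in an eval so that one broken feed skips to the next site instead of killing the whole run:

    my $rss_parser = XML::RSS->new();
    eval { $rss_parser->parse($feed_xml); 1 } or next;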

To prepare for the strange occasion where there might be new content to post, query the newly created parser object for the feed's title and link:

    my $feed_name = $rss_parser->{channel}->{title};
    my $feed_link = $rss_parser->{channel}->{link};

Now, go through each of the items within the feed, and grab all the needed data out of them: the link, title, description, author, and date. Note that you have to include the fallbacks to the guid and the various dc (Dublin Core) values to deal with different versions of RSS.

    foreach my $item ( @{ $rss_parser->{items} } ) {
    
        my $item_link        = $$item{link} || $$item{guid};
        my $item_title       = $$item{title};
        my $item_description = $$item{description};
        my $item_author      = $$item{author} || $$item{dc}->{creator};
        my $item_date        = $$item{pubDate} || $$item{dc}->{date};
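To see the fallbacks in isolation, they can be pulled out into a small function and exercised with hand-built hashes. This sketch assumes, as the loop does, that XML::RSS keeps plain elements under their own names and Dublin Core elements in a dc sub-hash; the sub name item_fields is hypothetical. Reading the dc sub-hash through a copy also avoids autovivifying an empty dc entry on items that don't have one.

```perl
use strict;
use warnings;

# Hypothetical helper: pick the fields out of one parsed item, falling
# back to guid and to the Dublin Core (dc) elements used by RSS 1.0.
sub item_fields {
    my ($item) = @_;
    my $dc = $item->{dc} || {};    # no autovivification of $item->{dc}
    return {
        link        => $item->{link}    || $item->{guid},
        title       => $item->{title},
        description => $item->{description},
        author      => $item->{author}  || $dc->{creator},
        date        => $item->{pubDate} || $dc->{date},
    };
}
```

Bear in mind that RSS 2.0 defines author as the author's email address, so on RSS 2.0 feeds the name you match against may need to be an address rather than a display name.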

Now, check whether each item was written by me within the last day. First, convert the post's date into seconds since the epoch with Date::Manip's UnixDate function and its %s format. Then compare that with the current time from Perl's built-in time function: if the post is less than 24 hours old, and it was written by me, then all is good. Items that lack a date or an author, or whose date can't be parsed, are skipped.

Note: to get this code to work with del.icio.us and any other sites that use date strings ending in Z instead of +00:00 (which Date::Manip can't deal with), you have to use a nasty substitution. Sorry about that.

        next unless defined $item_date and defined $item_author;
    
        # Replace a trailing "Z" with an explicit UTC offset for Date::Manip
        $item_date =~ s/Z$/+00:00/;
    
        # "%s" gives seconds since the epoch; empty if the date won't parse
        my $item_epoch = UnixDate( $item_date, "%s" );
        next unless defined $item_epoch and length $item_epoch;
    
        if ( time() - $item_epoch < 24 * 60 * 60
            and $item_author eq $site_author_nym )
        {
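For the common case, the 24-hour test can also be sketched with only core Perl. This version, with the hypothetical subs w3cdtf_to_epoch and posted_within_a_day, understands only W3C-DTF timestamps of the kind usually found in dc:date (for example 2005-06-01T12:30:00Z). It handles a trailing Z directly, so it needs no substitution, but it does not parse the RFC 822 dates that RSS 2.0 uses in pubDate, which is one reason to lean on Date::Manip in the real script.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: convert a W3C-DTF timestamp such as
# "2005-06-01T12:30:00Z" or "2005-06-01T13:30:00+01:00" to seconds since
# the epoch (UTC). Returns undef for anything it doesn't recognise,
# including RFC 822 dates.
sub w3cdtf_to_epoch {
    my ($date) = @_;
    return unless defined $date;
    my ( $y, $mo, $d, $h, $mi, $s, $tz ) = $date =~
      /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d)(?::(\d\d))?(?:\.\d+)?(Z|[+-]\d\d:\d\d)$/
      or return;
    $s = 0 unless defined $s;    # seconds are optional in W3C-DTF
    my $epoch = timegm( $s, $mi, $h, $d, $mo - 1, $y );
    if ( $tz ne 'Z' ) {
        my ( $sign, $tz_h, $tz_m ) = $tz =~ /^([+-])(\d\d):(\d\d)$/;
        my $offset = ( $tz_h * 60 + $tz_m ) * 60;
        # Local time = UTC + offset, so UTC = local time - offset
        $epoch -= $sign eq '+' ? $offset : -$offset;
    }
    return $epoch;
}

# Hypothetical helper: true if $date is at most 24 hours older than $now
# (epoch seconds, e.g. from time()); false if $date can't be parsed.
sub posted_within_a_day {
    my ( $date, $now ) = @_;
    my $epoch = w3cdtf_to_epoch($date);
    return 0 unless defined $epoch;
    my $age = $now - $epoch;
    return $age >= 0 && $age < 24 * 60 * 60;
}
```

In the script you would call posted_within_a_day($item_date, time()) and combine it with the author check.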

If both tests pass, add a chunk of HTML to the $guts variable, within which you're building the new entry:

            $guts .=
    qq|<div><blockquote><p><a href="$item_link">$item_title</a><br/>posted
    to <a href="$feed_link">$feed_name</a></p><p>$item_description</p></blockquote></div>|;
    
            }
        }
    }
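The HTML above interpolates the item title and the feed name directly, so a title containing a literal & or < ends up as malformed markup. A minimal sketch of a safer builder follows. The names escape_html and entry_html are hypothetical, and the description is deliberately left unescaped on the assumption that it already contains HTML, as RSS descriptions commonly do.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and
# double-quoted attribute values.
sub escape_html {
    my ($s) = @_;
    $s = '' unless defined $s;
    $s =~ s/&/&amp;/g;    # must run first, before other entities appear
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: build one quoted entry with the same markup as the
# loop. Links, title, and feed name are escaped; the description is not.
sub entry_html {
    my ( $item_link, $item_title, $feed_link, $feed_name, $description ) = @_;
    my @escaped = map { escape_html($_) }
      $item_link, $item_title, $feed_link, $feed_name;
    my $body = defined $description ? $description : '';
    return sprintf
qq|<div><blockquote><p><a href="%s">%s</a><br/>posted to <a href="%s">%s</a></p><p>%s</p></blockquote></div>|,
      @escaped, $body;
}
```

Inside the loop, $guts .= entry_html($item_link, $item_title, $feed_link, $feed_name, $item_description); would then replace the literal qq|...| string.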

Now, having worked through all the feeds, check whether $guts has anything in it. If it does, post it as a new Movable Type entry, rebuild the weblog, and ping the aggregators:

    if ( $guts ne "" ) {
    
        my $mt    = MT->new( Config => $MTconfig ) or die MT->errstr;
        my $entry = MT::Entry->new;
    
        $entry->blog_id($MTblogID);
        $entry->status( MT::Entry::RELEASE( ) );
        $entry->author_id($MTauthor);
        $entry->title("Posted elsewhere today");
        $entry->text($guts);
        $entry->convert_breaks(0);
        $entry->save
          or die $entry->errstr;
    
        # rebuild the site
    
        $mt->rebuild( BlogID => $MTblogID )
          or die "Rebuild error: " . $mt->errstr;
    
        # ping aggregators
    
        $mt->ping($MTblogID);
    
    }

10.14.2. The Entire Listing

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    use lib "/web/script/ben/mediacooperative.com/lib";
    use lib "/web/script/ben/mediacooperative.com/extlib";
    use lib "/web/script/ben/lib/perl";
    use MT;
    use MT::Entry;
    use Date::Manip;
    use LWP::Simple 'get';
    use XML::RSS;
    
    # Movable Type settings: change these to match your installation
    my $MTauthor = "1";
    my $MTblogID = "3";
    my $MTconfig = "/home/ben/web/mediacooperative.com/mt.cfg";
    my $guts     = "";    # HTML for the new entry accumulates here
    
    # Each element: [ feed URL, the name I post under on that site ]
    my @sites_to_check = (
        [ "http://del.icio.us/rss/bhammersley",               "bhammersley" ],
        [ "http://www.oreillynet.com/feeds/author/?x-au=909", "Ben Hammersley" ],
        [ "http://monkeyfilter.com/rss.php", "DangerIsMyMiddleName" ],
        [
            "http://www.benhammersley.com/expeditions/northpole2006/index.rdf",
            "Ben Hammersley"
        ],
    );
    
    for my $site_being_checked (@sites_to_check) {
    
        my $site_feed_url   = $site_being_checked->[0];
        my $site_author_nym = $site_being_checked->[1];
    
        # Download the feed; skip this site if that fails
        my $feed_xml = get($site_feed_url) or next;
    
        # XML::RSS dies on malformed XML; skip the feed rather than abort
        my $rss_parser = XML::RSS->new();
        eval { $rss_parser->parse($feed_xml); 1 } or next;
    
        my $feed_name = $rss_parser->{channel}->{title};
        my $feed_link = $rss_parser->{channel}->{link};
    
        foreach my $item ( @{ $rss_parser->{items} } ) {
    
            # Fall back to guid and Dublin Core fields for other RSS versions
            my $item_link        = $$item{link} || $$item{guid};
            my $item_title       = $$item{title};
            my $item_description = $$item{description};
            my $item_author      = $$item{author} || $$item{dc}->{creator};
            my $item_date        = $$item{pubDate} || $$item{dc}->{date};
    
            next unless defined $item_date and defined $item_author;
    
            # Date::Manip can't deal with a trailing "Z", so spell out the offset
            $item_date =~ s/Z$/+00:00/;
    
            # "%s" gives seconds since the epoch; empty if the date won't parse
            my $item_epoch = UnixDate( $item_date, "%s" );
            next unless defined $item_epoch and length $item_epoch;
    
            # Keep only my own items from the last 24 hours
            if ( time() - $item_epoch < 24 * 60 * 60
                and $item_author eq $site_author_nym )
            {
                $guts .=
    qq|<div><blockquote><p><a href="$item_link">$item_title</a><br/>posted
    to <a href="$feed_link">$feed_name</a></p><p>$item_description</p></blockquote></div>|;
            }
        }
    }
    
    if ( $guts ne "" ) {
    
        my $mt    = MT->new( Config => $MTconfig ) or die MT->errstr;
        my $entry = MT::Entry->new;
    
        $entry->blog_id($MTblogID);
        $entry->status( MT::Entry::RELEASE() );
        $entry->author_id($MTauthor);
        $entry->title("Posted elsewhere today");
        $entry->text($guts);
        $entry->convert_breaks(0);
        $entry->save
          or die $entry->errstr;
    
        # rebuild the site
    
        $mt->rebuild( BlogID => $MTblogID )
          or die "Rebuild error: " . $mt->errstr;
    
        # ping aggregators
    
        $mt->ping($MTblogID);
    }