10.14. Cross-Poster for Movable Type

By now, the traditional use of feeds as a form of content syndication is beginning to look somewhat old-fashioned. But fear not: here's a use that is as traditional as can be.

I needed a script to check all of the weblogs I write for and to cross-post everything I write onto my own weblog. Bear in mind that I'm not the only author on these other sites, so the script has to pick out only my posts.

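To do this, the script checks their RSS feeds on a set schedule, picks out the items I've written in the past 24 hours, builds one big entry from them, and posts it. Because it looks back exactly 24 hours, it makes sense to run it once a day. This code is for a Movable Type installation, but it isn't hard to modify it to fit another weblogging platform.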

10.14.1. Walking Through the Code

You open the proceedings by loading all of the libraries and modules. This is exactly the code running on my own server, so you need to change the following paths to point to your own Movable Type libraries, and set the author ID, blog ID, and configuration file to match your own installation:

use lib "/web/script/ben/mediacooperative.com/lib";
use lib "/web/script/ben/mediacooperative.com/extlib";
use lib "/web/script/ben/lib/perl";

use MT;
use MT::Entry;
use Date::Manip;
use LWP::Simple 'get';
use XML::RSS;

my $MTauthor = "1";
my $MTblogID = "3";
my $MTconfig = "/home/ben/web/mediacooperative.com/mt.cfg";
my $guts     = "";

Now, let's set up a list of sites to check. For each site, you need to define only the feed URL and the <dc:creator> or <author> name under which I am posting. Everything else you can get from the feed itself. For example:

http://del.icio.us/rss/bhammersley                                  "bhammersley"
http://www.oreillynet.com/feeds/author/?x-au=909                    "Ben Hammersley"
http://monkeyfilter.com/rss.php                                     "DangerIsMyMiddleName"
http://www.benhammersley.com/expeditions/northpole2006/index.rdf    "Ben Hammersley"

You can do this with an array of arrays:

my @sites_to_check = (
    [ "http://del.icio.us/rss/bhammersley",               "bhammersley" ],
    [ "http://www.oreillynet.com/feeds/author/?x-au=909", "Ben Hammersley" ],
    [ "http://monkeyfilter.com/rss.php",                  "DangerIsMyMiddleName" ],
    [
        "http://www.benhammersley.com/expeditions/northpole2006/index.rdf",
        "Ben Hammersley"
    ],
);
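Each element of @sites_to_check is a reference to a two-element array. A minimal sketch of walking that structure and unpacking each pair with a single list assignment; the two sites here are placeholders, not real feeds:

```perl
use strict;
use warnings;

# Placeholder data in the same shape as @sites_to_check:
# each element is a reference to a [ feed URL, author name ] pair.
my @sites_to_check = (
    [ "http://example.com/feed.rss",  "alice" ],
    [ "http://example.org/index.rdf", "Alice Example" ],
);

my @seen;
for my $site (@sites_to_check) {
    # Dereference the inner array and unpack both fields at once
    my ( $feed_url, $author_nym ) = @$site;
    push @seen, "$author_nym <$feed_url>";
}

print "$_\n" for @seen;
```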

Now, the loop. You go through each feed, downloading, parsing, and so on. Let's start by taking the site_feed_url and the site_author_nym (the name under which I go on that site) out of the array. This step could be omitted, but for the sake of clarity, we'll leave it here.

for my $site_being_checked (@sites_to_check) {
    my $site_feed_url   = $site_being_checked->[0];
    my $site_author_nym = $site_being_checked->[1];

Now, retrieve the feed, or go to the next one if it fails:

    my $feed_xml = get("$site_feed_url") or next;

And now, to parse it. You do so by spawning a new instance of the XML::RSS parser and jamming the feed into it:

    my $rss_parser = XML::RSS->new();

    # parse() dies on malformed XML; skip that feed rather than abort the run
    eval { $rss_parser->parse($feed_xml); 1 } or next;
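XML::RSS's parse method dies if the feed isn't well-formed XML, so an unguarded call stops the whole run at the first broken feed. A minimal sketch of the eval-and-skip pattern; parse_feed is a hypothetical stand-in for the real parser that simply dies on input not starting with a tag:

```perl
use strict;
use warnings;

# Hypothetical stand-in for XML::RSS->parse: dies on input that
# doesn't look like XML, much as a real XML parser does on bad markup.
sub parse_feed {
    my ($xml) = @_;
    die "not well-formed\n" unless $xml =~ /^\s*</;
    return { ok => 1 };
}

my @feeds  = ( "<rss/>", "garbage", "<rdf:RDF/>" );
my $parsed = 0;
my @errors;

for my $feed_xml (@feeds) {
    # If parsing dies, eval returns undef and $@ holds the message
    my $result = eval { parse_feed($feed_xml) };
    unless ($result) {
        push @errors, $@;
        next;    # move on to the next feed
    }
    $parsed++;
}

print "parsed $parsed feeds, skipped ", scalar(@errors), "\n";
```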

To prepare for the (strange) occasion when there is new content to post, query the newly created RSS parser object for the feed's title and link, which you'll use to credit the source:

    my $feed_name = $rss_parser->{channel}->{title};
    my $feed_link = $rss_parser->{channel}->{link};

Now, go through each of the items within the feed and grab all the data you need from them: the link, title, description, author, and date. Note the fallbacks to the guid and the Dublin Core (dc) values, which deal with the different versions of RSS.

    foreach my $item ( @{ $rss_parser->{items} } ) {
        my $item_link        = $$item{link}    || $$item{guid};
        my $item_title       = $$item{title};
        my $item_description = $$item{description};
        my $item_author      = $$item{author}  || $$item{dc}->{creator};
        my $item_date        = $$item{pubDate} || $$item{dc}->{date};
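Each || picks the first field that is present and true. A minimal sketch of the same fallback logic on hand-built hashes, assuming the item layout the code above relies on (plain elements as top-level keys, Dublin Core elements under a dc sub-hash); the helper name normalize_item is hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: collapse RSS 2.0 and RSS 1.0 (Dublin Core)
# field names into one set of keys, preferring the RSS 2.0 element.
sub normalize_item {
    my ($item) = @_;
    my $dc = $item->{dc} || {};
    return {
        link   => $item->{link}    || $item->{guid},
        title  => $item->{title},
        author => $item->{author}  || $dc->{creator},
        date   => $item->{pubDate} || $dc->{date},
    };
}

# An RSS 1.0-style item carrying its author and date in dc
my $rss1 = {
    link  => "http://example.com/1",
    title => "First",
    dc    => { creator => "alice", date => "2005-03-01T12:00:00Z" },
};

# An RSS 2.0-style item with only a guid and a pubDate
my $rss2 = {
    guid    => "http://example.com/2",
    title   => "Second",
    author  => "alice",
    pubDate => "Tue, 01 Mar 2005 12:00:00 GMT",
};

my $n1 = normalize_item($rss1);
my $n2 = normalize_item($rss2);
print "$n1->{author} $n1->{date}\n";
print "$n2->{link} $n2->{date}\n";
```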

Now, check whether any of these items were written by me in the last day. First, work out what time it is now. Then, compare the post's date with the current time: if the post is less than 24 hours old and was written by me, all is good.

Note: to get this code to work with del.icio.us and any other sites whose date strings end in Z instead of +00:00 (which Date::Manip can't deal with), you have to use a nasty substitution. Sorry about that.
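If you'd rather not depend on Date::Manip at all, the same 24-hour test can be done with core Perl for feeds that use W3CDTF (ISO 8601) timestamps, as RSS 1.0's dc:date does. This is a swapped-in technique, not the one used in this script: the hypothetical iso8601_to_epoch helper handles only the full YYYY-MM-DDThh:mm:ss form with a Z or ±hh:mm offset, and would need extending for RFC 822 pubDate strings.

```perl
use strict;
use warnings;
use Time::Local qw(timegm);

# Hypothetical helper: turn "YYYY-MM-DDThh:mm:ss" plus "Z" or a
# "+hh:mm"/"-hh:mm" offset into seconds since the epoch (UTC).
# Returns undef for anything it doesn't recognize.
sub iso8601_to_epoch {
    my ($date) = @_;
    return undef unless defined $date;

    # Same trick as the script: treat a trailing Z as +00:00
    $date =~ s/Z$/+00:00/;

    my ( $y, $mo, $d, $h, $mi, $s, $sign, $oh, $om ) = $date =~
      /^(\d{4})-(\d\d)-(\d\d)T(\d\d):(\d\d):(\d\d)(?:\.\d+)?([+-])(\d\d):(\d\d)$/
      or return undef;

    # timegm() reads the fields as UTC and wants a 0-based month;
    # it dies on out-of-range values, so trap that as well
    my $epoch = eval { timegm( $s, $mi, $h, $d, $mo - 1, $y ) };
    return undef unless defined $epoch;

    # Local clock time = UTC + offset, so UTC = local time - offset
    my $offset = ( $oh * 3600 + $om * 60 ) * ( $sign eq '+' ? 1 : -1 );
    return $epoch - $offset;
}

# True if $date is less than 24 hours before $now (and not after it)
sub within_last_day {
    my ( $date, $now ) = @_;
    my $epoch = iso8601_to_epoch($date);
    return 0 unless defined $epoch;
    my $age = $now - $epoch;
    return $age >= 0 && $age < 24 * 60 * 60;
}

my $now = timegm( 0, 0, 12, 2, 2, 2005 );    # 2005-03-02T12:00:00Z
for my $date ( "2005-03-02T01:00:00Z", "2005-03-01T10:00:00+00:00" ) {
    printf "%s is %s\n", $date,
      within_last_day( $date, $now ) ? "recent" : "too old";
}
```

The walkthrough itself continues with Date::Manip: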

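        # Skip items that carry no date at all
        next unless $item_date;

        # Date::Manip can't cope with a trailing "Z", so rewrite it as +00:00
        $item_date =~ s/Z$/+00:00/;

        # Convert the item's date to seconds since the epoch (UTC);
        # skip the item if Date::Manip can't parse it
        my $item_epoch = UnixDate( $item_date, "%s" ) or next;

        # Posted less than 24 hours ago, and by me?
        if (    time() - $item_epoch < 24 * 60 * 60
            and defined $item_author
            and $item_author eq $site_author_nym )
        {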

If all the tests turn out to be true, add a bunch of HTML to the $guts variable, within which you're building the new entry:

            $guts .= qq|<div><blockquote><p><a href="$item_link">$item_title</a><br/>|
                   . qq|posted to <a href="$feed_link">$feed_name</a></p>|
                   . qq|<p>$item_description</p></blockquote></div>|;
        }
    }
}
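The title and feed name are interpolated straight into the HTML. An XML parser hands back text with entities already decoded, so a title such as "Q&amp;A" arrives as "Q&A" and can break the markup. A minimal sketch of building one block with the plain-text fields escaped; escape_html and entry_block are hypothetical helpers, and the description is passed through unescaped on the assumption that it already contains HTML:

```perl
use strict;
use warnings;

# Hypothetical helper: escape the characters that are special in HTML
sub escape_html {
    my ($text) = @_;
    return '' unless defined $text;
    $text =~ s/&/&amp;/g;    # first, so the entities below aren't re-escaped
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Build one quoted block in the same layout as the script, escaping the
# plain-text fields but passing the (already HTML) description through.
sub entry_block {
    my (%f) = @_;
    my ( $link, $title, $feed_link, $feed_name ) =
      map { escape_html($_) } @f{qw(item_link item_title feed_link feed_name)};
    return qq|<div><blockquote><p><a href="$link">$title</a><br/>|
      . qq|posted to <a href="$feed_link">$feed_name</a></p>|
      . qq|<p>$f{item_description}</p></blockquote></div>|;
}

my $guts = "";
$guts .= entry_block(
    item_link        => "http://example.com/1?a=1&b=2",
    item_title       => "Q&A: <RSS> vs Atom",
    feed_link        => "http://example.com/",
    feed_name        => "Example",
    item_description => "Some <em>HTML</em> text",
);
print "$guts\n";
```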

Now, having worked through all of the feeds, if $guts has anything in it, post it as a new entry, rebuild the site, and ping the aggregators:

if ( $guts ne "" ) {
    my $mt    = MT->new( Config => $MTconfig ) or die MT->errstr;
    my $entry = MT::Entry->new;
    $entry->blog_id($MTblogID);
    $entry->status( MT::Entry::RELEASE() );
    $entry->author_id($MTauthor);
    $entry->title("Posted elsewhere today");
    $entry->text($guts);
    $entry->convert_breaks(0);
    $entry->save
      or die $entry->errstr;

    # rebuild the site
    $mt->rebuild( BlogID => $MTblogID )
      or die "Rebuild error: " . $mt->errstr;

    # ping aggregators
    $mt->ping($MTblogID);
}
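Only this last block is specific to Movable Type, which is why porting the script to another platform is mostly a matter of replacing it. One way to make that seam explicit, sketched with a hypothetical publish_if_new helper and a dry-run poster that just collects entries instead of calling any weblog API:

```perl
use strict;
use warnings;

# Hypothetical helper: hand the entry to $poster only if there is content.
# $poster is any code reference taking ($title, $body). Returns 1 if it
# was called, 0 if there was nothing to post.
sub publish_if_new {
    my ( $guts, $poster ) = @_;
    return 0 if !defined $guts || $guts eq "";
    $poster->( "Posted elsewhere today", $guts );
    return 1;
}

# Dry-run poster: records what would have been posted
my @posted;
my $dry_run = sub {
    my ( $title, $body ) = @_;
    push @posted, "$title: $body";
};

publish_if_new( "",                 $dry_run );    # nothing to post
publish_if_new( "<div>entry</div>", $dry_run );    # posts once
print scalar(@posted), " entry posted\n";
```

In a real port, the poster would wrap the Movable Type calls above, or whatever your own platform provides.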

10.14.2. The Entire Listing

#!/usr/bin/perl
use strict;
use warnings;

use lib "/web/script/ben/mediacooperative.com/lib";
use lib "/web/script/ben/mediacooperative.com/extlib";
use lib "/web/script/ben/lib/perl";

use MT;
use MT::Entry;
use Date::Manip;
use LWP::Simple 'get';
use XML::RSS;

my $MTauthor = "1";
my $MTblogID = "3";
my $MTconfig = "/home/ben/web/mediacooperative.com/mt.cfg";
my $guts     = "";

my @sites_to_check = (
    [ "http://del.icio.us/rss/bhammersley",               "bhammersley" ],
    [ "http://www.oreillynet.com/feeds/author/?x-au=909", "Ben Hammersley" ],
    [ "http://monkeyfilter.com/rss.php",                  "DangerIsMyMiddleName" ],
    [
        "http://www.benhammersley.com/expeditions/northpole2006/index.rdf",
        "Ben Hammersley"
    ],
);

for my $site_being_checked (@sites_to_check) {
    my $site_feed_url   = $site_being_checked->[0];
    my $site_author_nym = $site_being_checked->[1];

    # Fetch the feed; move on to the next site if the request fails
    my $feed_xml = get($site_feed_url) or next;

    # parse() dies on malformed XML; skip that feed rather than abort the run
    my $rss_parser = XML::RSS->new();
    eval { $rss_parser->parse($feed_xml); 1 } or next;

    my $feed_name = $rss_parser->{channel}->{title};
    my $feed_link = $rss_parser->{channel}->{link};

    foreach my $item ( @{ $rss_parser->{items} } ) {
        my $item_link        = $$item{link}    || $$item{guid};
        my $item_title       = $$item{title};
        my $item_description = $$item{description};
        my $item_author      = $$item{author}  || $$item{dc}->{creator};
        my $item_date        = $$item{pubDate} || $$item{dc}->{date};

        # Skip items that carry no date at all
        next unless $item_date;

        # Date::Manip can't cope with a trailing "Z", so rewrite it as +00:00
        $item_date =~ s/Z$/+00:00/;

        # Seconds since the epoch (UTC); skip dates Date::Manip can't parse
        my $item_epoch = UnixDate( $item_date, "%s" ) or next;

        # Posted less than 24 hours ago, and by me?
        if (    time() - $item_epoch < 24 * 60 * 60
            and defined $item_author
            and $item_author eq $site_author_nym )
        {
            $guts .= qq|<div><blockquote><p><a href="$item_link">$item_title</a><br/>|
                   . qq|posted to <a href="$feed_link">$feed_name</a></p>|
                   . qq|<p>$item_description</p></blockquote></div>|;
        }
    }
}

if ( $guts ne "" ) {
    my $mt    = MT->new( Config => $MTconfig ) or die MT->errstr;
    my $entry = MT::Entry->new;
    $entry->blog_id($MTblogID);
    $entry->status( MT::Entry::RELEASE() );
    $entry->author_id($MTauthor);
    $entry->title("Posted elsewhere today");
    $entry->text($guts);
    $entry->convert_breaks(0);
    $entry->save
      or die $entry->errstr;

    # rebuild the site
    $mt->rebuild( BlogID => $MTblogID )
      or die "Rebuild error: " . $mt->errstr;

    # ping aggregators
    $mt->ping($MTblogID);
}



    Developing Feeds with RSS and Atom
    ISBN: 0596008813
    Year: 2003
    Pages: 118
