Hack 66 Using All Consuming to Get Book Lists


You can retrieve a list of the most-mentioned books in the weblog community, as well as personal book lists and recommendations, through either of All Consuming's two web service APIs.

This hack could represent the future of web applications. It glues together pieces of several web service APIs and then, in turn, offers an API to its features. If someone were to create a derivative application with this API, it would represent a third layer of abstraction from Amazon.com's service. Entire lightweight services may someday be built layer upon layer like this, with dozens of interconnected applications exchanging data freely behind the scenes.

If this is a book about scraping and spidering, why include instructions on how to use web-based APIs? Quite simply, they make scraping easier. Instead of having to worry about ever-changing HTML [Hack #32], you merely have to do some quick research to learn the provided interface. Likewise, using an API makes it easier to combine raw data from scraping with prepared data from sites like Technorati, All Consuming, Alexa, and Amazon.com. For an example, check out [Hack #59].


All Consuming (http://www.allconsuming.net) is a fairly small application, built on top of a mountain of information that has been made freely available through web services. Amazon.com's Web Services API fuels the invaluable book information, Google's API allows us to get related web sites for book titles, and Weblogs.com has an XML file that lets us know which web sites have been updated each hour. Combining these three services, we can create lists of books that are being talked about on the Web. It only makes sense for us to give back to this generous community by opening up SOAP and REST interfaces to All Consuming's information, to be used for free and in any way that can be invented.

The SOAP Code

Here's an example of how you can access All Consuming information on your own, using SOAP and Perl. Create a file called display_weekly_list_with_soap.cgi:

    #!/usr/bin/perl -w
    # display_weekly_list_with_soap.cgi
    use strict;
    use SOAP::Lite +autodispatch =>
        uri   => 'http://www.allconsuming.net/AllConsumingAPI',
        proxy => 'http://www.allconsuming.net/soap.cgi';

    # optional values for the API (here: 12:00 on May 28, 2003).
    my ($hour,$day,$month,$year) = qw( 12 28 05 2003 );

    my $AllConsumingObject =
        AllConsumingAPI->new(
            $hour,  # optional
            $day,   # optional
            $month, # optional
            $year   # optional
        );

This creates a new object, $AllConsumingObject, which you can then use to retrieve a wide variety of data, as explained in the following sections.

Most-mentioned lists

Every hour, All Consuming crawls recently updated weblogs to see if any new books have been mentioned for the first time on any given site. It combines this information with Amazon.com's Web Services API, aggregates frequently mentioned books into hourly and weekly lists, and archives them all the way back to August 2002. GetHourlyList sends you the most recent hour's list information, GetWeeklyList sends you the most recent aggregation of all activity during the last week, and GetArchiveList returns the hourly or weekly list that corresponds with the date that you specify when creating the object (the $hour, $day, $month, and $year variables). For example:

    my $HourlyData   = $AllConsumingObject->GetHourlyList;
    my $WeeklyData   = $AllConsumingObject->GetWeeklyList;
    my $ArchivedData = $AllConsumingObject->GetArchiveList;

Personal book lists

People have created their own book lists directly through All Consuming, assigning them to categories like Currently Reading, Favorite Books, and Completed Books. Although some of these lists are available for use on other sites through methods like JavaScript includes, if someone wants to add a Favorite Books list to their site, they'll have to use the SOAP or REST interfaces to do so:

    my $CurrentlyReading = $AllConsumingObject->GetCurrentlyReadingList('insert name');
    my $FavoriteBooks    = $AllConsumingObject->GetFavoriteBooksList('insert name');
    my $PurchasedBooks   = $AllConsumingObject->GetPurchasedBooksList('insert name');
    my $CompletedBooks   = $AllConsumingObject->GetCompletedBooksList('insert name');

Book metadata and weblog mentions

Some users have added valuable metadata about books, such as first lines and number of pages. This is mostly for fun, and it allows me to have an hourly "first line trivia" question on my homepage, to see if you can guess the book that the first line comes from. In any case, if you want to retrieve book metadata for a given book, you can do so with the following method:

    my $Metadata = $AllConsumingObject->GetMetadataForBook('insert ISBN');

The argument passed in is the ISBN (International Standard Book Number) for the book you'd like to retrieve metadata from. For a list of metadata that's currently available for use, you can check out the metadata scorecard at All Consuming (http://www.allconsuming.net/scorecard.html).

Alternatively, if you'd like to receive a list of all of the weblogs that have mentioned a particular book, you can retrieve that information using the following method:

    my $WeblogMentions = $AllConsumingObject->GetWeblogMentionsForBook('insert ISBN');

Friends and recommendations

All Consuming also has friend relationships (between people who have marked their favorite web sites so they can keep track of what they're reading), as well as book recommendations based on the sum of all those friend relationships. You can get a list of web sites that you or someone else has marked as a friend, by including your weblog URL:

    my $Friends = $AllConsumingObject->GetFriends('insert URL');

And to get a list of books that all of your friends are currently reading, sorted by how recently and how often they have been mentioned, you can do this:

    my $Recommendations = $AllConsumingObject->GetRecommendations('insert URL');

To iterate through the results these methods return, do something like this:

    # The array here may differ depending
    # on the type of data being returned.
    if (ref($WeeklyData->{'asins'}) eq 'ARRAY') {
        foreach my $item (@{$WeeklyData->{'asins'}}) {
            print "TITLE: $item->{'title'}\n",
            "AUTHOR: $item->{'author'}\n\n";
        }
    }

Of course, in either of these examples, you can change the URL passed to any other URL. For a full list of methods you can invoke on this object, visit the instructions (http://allconsuming.net/news/000012.html) and code samples (http://allconsuming.net/soap-code-example.txt).

The REST Code

For those who think SOAP is a bit of overkill for simple applications like this, you can get the same information REST-style. Add this code to a file called display_weekly_list_with_rest.cgi:

    #!/usr/bin/perl -w
    # display_weekly_list_with_rest.cgi
    use strict;
    use LWP::Simple;
    use XML::Simple;

    # Any of the URLs mentioned below can replace this one.
    my $URLToGet = 'http://allconsuming.net/rest.cgi?weekly=1';

    # Download and parse. get() returns undef if the request fails.
    my $XML = get($URLToGet);
    die "Could not retrieve $URLToGet\n" unless defined $XML;
    my $ParsedXML = XMLin($XML, suppressempty => 1);

    # The array here may differ depending
    # on the type of data being returned.
    if (ref($ParsedXML->{'asins'}) eq 'ARRAY') {
        foreach my $item (@{$ParsedXML->{'asins'}}) {
            print "TITLE: $item->{'title'}\n",
            "AUTHOR: $item->{'author'}\n\n";
        }
    }

Following are the URL formats you can access via HTTP to return XML data directly.

Most-mentioned lists

Here's the REST interface for requesting the hourly and weekly most-mentioned lists:

    http://allconsuming.net/rest.cgi?hourly=1
    http://allconsuming.net/rest.cgi?weekly=1

If you'd like to retrieve an archived list of most-mentioned books, you can specify the date, like so:

    http://allconsuming.net/rest.cgi?archive=1&hour=12&day=12&month=5&year=2003
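
The archive URL can also be assembled from separate date components. The following is a minimal sketch rather than anything taken from All Consuming's documentation: the helper name archive_url is hypothetical, and the accepted ranges (hour 0-23, day 1-31, month 1-12, four-digit year) are assumptions, since the service's own validation rules are not stated here.

```perl
use strict;
use warnings;

# Hypothetical helper: build the archive URL from date parts.
# The parameter names (archive, hour, day, month, year) come from the
# URL format shown above; the range checks are assumptions.
sub archive_url {
    my ($hour, $day, $month, $year) = @_;
    die "hour must be 0-23\n"
        unless $hour =~ /^\d+$/ && $hour <= 23;
    die "day must be 1-31\n"
        unless $day =~ /^\d+$/ && $day >= 1 && $day <= 31;
    die "month must be 1-12\n"
        unless $month =~ /^\d+$/ && $month >= 1 && $month <= 12;
    die "year must be four digits\n"
        unless $year =~ /^\d{4}$/;

    # %d drops leading zeros, so '05' is written as 5.
    return sprintf
        'http://allconsuming.net/rest.cgi?archive=1&hour=%d&day=%d&month=%d&year=%d',
        $hour, $day, $month, $year;
}

print archive_url(12, 12, 5, 2003), "\n";
```

Checking the ranges up front catches a swapped day and month (for example, a month of 28) before any request is sent.
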
Personal book lists

To retrieve a list of any of your categorized books in XML format, add your username to any of the following URLs. Note the category name in the URL.

    http://allconsuming.net/rest.cgi?currently_reading=1&username=insert name
    http://allconsuming.net/rest.cgi?favorite_books=1&username=insert name
    http://allconsuming.net/rest.cgi?purchased_books=1&username=insert name
    http://allconsuming.net/rest.cgi?completed_books=1&username=insert name

Book metadata and weblog mentions

To get XML data about a specific item, include the ISBN in these URLs:

    http://allconsuming.net/rest.cgi?metadata=1&isbn=insert ISBN
    http://allconsuming.net/rest.cgi?weblog_mentions_for_book=1&isbn=insert ISBN

Friends and recommendations

To find XML data that includes friends or recommendations for a given weblog, you can include the weblog's URL in the appropriate format:

    http://allconsuming.net/rest.cgi?friends=1&url=insert URL
    http://allconsuming.net/rest.cgi?recommendations=1&url=insert URL
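
A weblog URL contains characters such as ':' and '/' that are normally percent-encoded when they appear inside a query string value. The sketch below shows one way to assemble the request URLs listed in this section. It assumes that rest.cgi decodes percent-encoded values in the usual way, which the text above does not confirm. The helper names escape_value and rest_url are hypothetical, and the input is assumed to be plain ASCII.

```perl
use strict;
use warnings;

# Percent-encode everything except the "unreserved" characters
# (letters, digits, '-', '.', '_', '~'). Assumes ASCII input.
sub escape_value {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $value;
}

# Hypothetical helper: $method is one of the flags listed in this
# section (friends, recommendations, metadata, and so on); %params
# holds the remaining name/value pairs.
sub rest_url {
    my ($method, %params) = @_;
    my $url = "http://allconsuming.net/rest.cgi?$method=1";
    for my $key (sort keys %params) {
        $url .= "&$key=" . escape_value($params{$key});
    }
    return $url;
}

print rest_url('friends',  url  => 'http://www.erikbenson.com/'), "\n";
print rest_url('metadata', isbn => '0465045669'), "\n";
```

The resulting string can be assigned to $URLToGet in display_weekly_list_with_rest.cgi.
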

Running the Hack

Running display_weekly_list_with_rest.cgi without modification shows:

    % perl display_weekly_list_with_rest.cgi
    TITLE: Peer-to-Peer : Harnessing the Power of Disruptive Technologies
    AUTHOR: Andy Oram

    TITLE: Quicksilver : Volume One of The Baroque Cycle
    AUTHOR: Neal Stephenson

    TITLE: A Pattern Language: Towns, Buildings, Construction
    AUTHOR: Christopher Alexander, Sara Ishikawa, Murray Silverstein

    TITLE: Designing With Web Standards
    AUTHOR: Jeffrey Zeldman

    TITLE: Slander: Liberal Lies About the American Right
    AUTHOR: Ann H. Coulter

    TITLE: Bias : A CBS Insider Exposes How the Media Distort the News
    AUTHOR: Bernard Goldberg

    TITLE: The Adventures of Charmin the Bear
    AUTHOR: David McKee, Joanna Quinn

The XML Results

The returned output of both the SOAP and REST interfaces will be XML that looks something like this:

    <opt>
      <header
        lastBuildDate="Sat May 28 13:30:02 2003"
        title="All Consuming"
        language="en-us"
        description="Most recent books being talked about by webloggers."
        link="http://allconsuming.net/"
        number_updated="172"
      />
      <asins
        asin="0465045669"
        title="Metamagical Themas"
        author="Douglas R. Hofstadter"
        url="http://www.erikbenson.com/"
        image="http://images.amazon.com/images/P/0465045669.01.THUMBZZZ.jpg"
        excerpt="Douglas Hoftstadter's lesser-known book, Metamagical Themas, has a great chapter or two on self-referential sentences like 'This sentence was in the past tense.'."
        amazon_url="http://amazon.com/exec/obidos/ASIN/0465045669/"
        allconsuming_url="http://allconsuming.net/item.cgi?id=0465045669"
      />
    </opt>

If multiple items are returned, there will be multiple <asins /> elements.
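
The iteration code shown earlier only handles the case where 'asins' is an array reference. With XML::Simple's default settings, a response containing a single <asins /> element is parsed into a plain hash reference instead, so that check would skip it. The following sketch normalizes both shapes into a list. The helper name items_of is hypothetical, and the two sample structures are hand-built stand-ins for parsed responses, not real service output.

```perl
use strict;
use warnings;

# Hypothetical helper: return the <asins /> entries as a flat list,
# whether the parser produced an array reference (several items),
# a single hash reference (one item), or nothing at all.
sub items_of {
    my ($data) = @_;
    my $asins = $data->{'asins'};
    return ()      unless defined $asins;
    return @$asins if ref($asins) eq 'ARRAY';
    return ($asins);
}

# Hand-built stand-ins for parsed responses (not real service output).
my $one  = { asins => { title  => 'Metamagical Themas',
                        author => 'Douglas R. Hofstadter' } };
my $many = { asins => [ { title => 'A', author => 'X' },
                        { title => 'B', author => 'Y' } ] };

for my $item (items_of($one), items_of($many)) {
    print "TITLE: $item->{'title'}\n",
          "AUTHOR: $item->{'author'}\n\n";
}
```

For the REST interface, passing ForceArray => ['asins'] to XMLin should have the same effect at parse time, because it asks XML::Simple to represent that element as an array even when it appears only once.
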

Hacking the Hack

Although All Consuming currently tracks only book trends, it also stores information about other types of items that are available at Amazon.com, such as CDs, DVDs, and electronics. You can't find this information anywhere on All Consuming's site, but if you use either of the APIs to retrieve weblog mentions for an ASIN (Amazon.com Standard Identification Number) that belongs to a product category other than books, it will still faithfully return any weblog data that it has for that item.

Erik Benson



Spidering Hacks
ISBN: 0596005776
Year: 2005
Pages: 157
