Hack 57 Related Amazon.com Products with Alexa


[Difficulty: moderate]

Given any URL, Alexa will return traffic data, user ratings, and even related Amazon.com products. This hack creates a cloud of related product data for any given URL.

Alexa (http://www.alexa.com), an Amazon.com property, measures a web site's traffic and rates its popularity relative to other sites on similar topics. Along with these Related Links, you can read and write reviews, as well as find similar products at Amazon.com. Some interesting scripts can be created simply by following through the various information Alexa provides via its XML exports. For example, we can create a list of products recommended not only for a given web site, but also for the web sites related to it. Following those related web sites and fetching their related Amazon.com products creates a cloud of items related to the original URL. In the following section, we'll walk you through the code for one such cloud creator.
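The crawl itself is a one-hop walk: request Alexa's data for the starting URL, collect its related sites and products, then repeat the request once for each related site and pool the results. As a rough sketch of that idea (in Python rather than the Perl used below, purely for illustration; the endpoint and the cli/dat/ver query parameters match the ones the real script sends, and the fetching/parsing step is left as a caller-supplied stub):

```python
from urllib.parse import urlencode

def alexa_data_url(url):
    """Build the Alexa data URL the script requests; the cli, dat,
    and ver parameters mirror those in the Perl code below."""
    if not url.startswith("http://"):
        url = "http://" + url
    args = [("cli", 10), ("dat", "snba"), ("ver", "7.0"), ("url", url)]
    return "http://data.alexa.com/data?" + urlencode(args)

def build_cloud(start, fetch_related_and_products):
    """One-hop 'cloud': products for the start URL plus the products
    of every site Alexa relates to it. fetch_related_and_products is
    a hypothetical callable (it would fetch and parse the XML) that
    returns (related_site_urls, product_asins) for a given URL."""
    related, products = fetch_related_and_products(start)
    asins = list(products)
    for site in related:
        _, more = fetch_related_and_products(site)
        asins.extend(more)
    # de-duplicate the accumulated ASINs, as the Perl script
    # does with its %products hash
    return sorted(set(asins))
```

The deduplication matters: related sites frequently share recommended products, so the same ASIN tends to show up several times across one cloud.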

The Code

For this script, you'll need an Amazon.com developer token, which can be obtained from http://www.amazon.com/webservices/. Save the following code to a file called alexa.pl:

#!/usr/bin/perl -w
use strict;
use URI;
use LWP::Simple;
use Net::Amazon;
use XML::Simple;

use constant AMAZON_TOKEN => 'your token here';
use constant DEBUG => 0;

# get our arguments. the first argument is the
# URL to fetch, and the second is the output.
my $url = shift || die "$0 <url> [<output>]\n";
my $output = shift || '/www/htdocs/cloud.html';

# we'll need to fetch the Alexa XML at some point, and
# we'll do it a few different times, so we create a
# subroutine for it. Using the URI module, we can
# correctly encode a URL with a query. In fact, you'll
# notice the majority of this function is involved with
# this, and at the end we use LWP::Simple to actually
# download and return the XML.
#####################################################
sub fetch_xml {
    my $url = shift;
    $url = "http://$url" unless $url =~ m[^http://];
    warn "Fetching Alexa data for $url\n" if DEBUG;
    my @args = (
        cli => 10,    dat => 'snba',
        ver => '7.0', url => $url,
    );
    my $base = 'http://data.alexa.com/data';
    my $uri = URI->new( $base );
    $uri->query_form( @args );
    $uri = $uri->as_string;
    return get( $uri );
}

# raw XML is no good for us, though, as we want to extract
# particular items of interest. we use XML::Simple to turn
# the XML into Perl data structures, because it's easier
# than fiddling with event handling (as with XML::Parser
# or XML::SAX), and we know there's only a small amount of
# data. we want the list of related sites and the list of
# related products. we extract and return both.
#####################################################
sub handle_xml {
    my $page = shift;
    my $xml = XMLin( $page );

    my @related = map {
        {
            asin  => $_->{ASIN},
            title => $_->{TITLE},
            href  => $xml->{RLS}{PREFIX}.$_->{HREF},
        }
    } @{ $xml->{RLS}{RL} };

    my @products;
    if (ref $xml->{SD}{AMZN}{PRODUCT} eq 'ARRAY') {
        @products = map { $_->{ASIN} } @{ $xml->{SD}{AMZN}{PRODUCT} };
    } else { @products = $xml->{SD}{AMZN}{PRODUCT}{ASIN}; }

    return ( \@related, \@products );
}

# Functions done; now for the program:
warn "Start URL is $url\n" if DEBUG;

my @products; # running accumulation of product ASINs
{
    my $page = fetch_xml( $url );
    my ($related, $new_products) = handle_xml( $page );
    @products = @$new_products; # running list
    for (@$related) {
        my $xml = fetch_xml( $_->{href} );
        my ($related, $new_products) = handle_xml( $xml );
        push @products, @$new_products;
    }
}

# We now have a list of products in @products, so
# we'd best do something with them. Let's look
# them up on Amazon and see what their titles are.
my $amazon = Net::Amazon->new( token => AMAZON_TOKEN );
my %products = map { $_ => undef } @products;
for my $asin ( sort keys %products ) {
    warn "Searching for $asin...\n" if DEBUG;
    my $response = $amazon->search( asin => $asin );
    my @products = $response->properties;
    die "ASIN is not unique!?" unless @products == 1;
    my $product = $products[0];
    $products{$asin} = {
        name  => $product->ProductName,
        price => $product->OurPrice,
        asin  => $asin,
    };
}

# Right. We now have name, price, and
# ASIN. Let's output an HTML report:
{
    umask 022;
    warn "Writing to $output\n" if DEBUG;
    open my $fh, '>', $output or die $!;
    print $fh "<html><head><title>Cloud around $url</title></head><body>";
    if (keys %products) {
        print $fh "<table>";
        for my $asin (sort keys %products) {
            my $data = $products{$asin};
            printf $fh "<tr><td>".
                       "<a href=\"http://amazon.com/exec/obidos/ASIN/%s\">".
                       "%s</a></td> <td>%s</td></tr>",
                       @{$data}{qw( asin name price )};
        }
        print $fh "</table>";
    }
    else { print $fh "No related products found.\n"; }
    print $fh "</body></html>\n";
}
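The report step hinges on one small detail: each ASIN maps straight to an Amazon detail-page URL of the form http://amazon.com/exec/obidos/ASIN/&lt;asin&gt;. A Python rendering of the same table row, for illustration (it adds HTML escaping of the name and price, which the Perl printf skips):

```python
import html

def report_row(asin, name, price):
    """Format one report row the way the script's printf does:
    a detail-page link built from the ASIN, then the price."""
    link = "http://amazon.com/exec/obidos/ASIN/" + asin
    return ('<tr><td><a href="%s">%s</a></td> <td>%s</td></tr>'
            % (link, html.escape(name), html.escape(price)))
```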

Running the Hack

Run the script on the command line, passing it the URL you're interested in and a filename to which you'd like the results saved (you can also hardcode a default output location into the script). Here's an example run with DEBUG turned on:

% perl alexa.pl http://www.gamegrene.com/ testing.html
Start URL is http://www.gamegrene.com/
Fetching Alexa data for http://www.gamegrene.com/
Fetching Alexa data for http://www.elvesontricycles.com/
Fetching Alexa data for http://www.chimeramag.com/
Fetching Alexa data for http://pages.infinit.net/raymondl
Fetching Alexa data for http://www.beyond-adventure.com/
Fetching Alexa data for http://strcat.com/News
Fetching Alexa data for http://members.aol.com/stocdred
Fetching Alexa data for http://lost-souls.hk.st/
Fetching Alexa data for http://www.gamerspulse.com/
Fetching Alexa data for http://www.gignews.com/
Fetching Alexa data for http://www.gamesfirst.com/
Searching for 0070120102...
Searching for 0070213631...
Searching for 0070464081...
Searching for 0070465886...
..etc..
Searching for 1879239027...
Writing to testing.html

Figure 4-4 shows an example of the resulting file.

Figure 4-4. Amazon.com's related products for Gamegrene.com

Hacking the Hack

As it stands, the script requires manual running or a cron script [Hack #90] to regularly place the latest information on your own pages (if that's your intent, of course). You might want to turn this into a CGI program and let people enter web sites of their own choosing. This is pretty easy to do. If you've created an HTML form that accepts the desired web site in an input named url, like this:

<form method="GET" action="alexa.pl">
URL: <input type="text" name="url" />
</form>

then modifying your script to accept this value means changing this:

# get our arguments. the first argument is the
# URL to fetch, and the second is the output.
my $url = shift || die "$0 <url> [<output>]\n";
my $output = shift || '/www/htdocs/cloud.html';

to this:

use LWP::Simple qw(!head);
use CGI qw/:standard/;
my $url = param('url');

and redirecting the output, changing this:

warn "Writing to $output\n" if DEBUG;
open my $fh, '>', $output or die $!;

to the waiting web browser:

my $fh = *STDOUT; # redirect.
print $fh "Content-type: text/html\n\n";

Be sure to remove the original use LWP::Simple; line at the beginning of the script. Since both CGI and LWP::Simple export a function named head, you'll get a number of redefinition warnings unless you change the way LWP::Simple is imported. By telling it not to import its (here unneeded) head function via qw(!head), the new code circumvents those warnings.
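Under CGI, a GET form delivers its fields in the QUERY_STRING environment variable, and param('url') is simply pulling the url field out of that percent-encoded string. The same lookup in Python, for illustration:

```python
from urllib.parse import parse_qs

def param_url(query_string):
    """What CGI's param('url') does under the hood: extract the
    (percent-decoded) value of the url field from a GET query
    string, or None if the field is absent."""
    values = parse_qs(query_string).get("url", [])
    return values[0] if values else None
```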

Iain Truskett



Spidering Hacks
ISBN: 0596005776