Hack 57. Related Amazon.com Products with Alexa
Given any URL, Alexa will return traffic data, user ratings, and even related Amazon.com products. This hack creates a cloud of related product data for any given URL.

Alexa (http://www.alexa.com), an Amazon.com property, measures a web site's traffic, then rates its popularity relative to other sites on similar topics. Along with these Related Links, you can read and write reviews, as well as find similar products at Amazon.com. Some interesting scripts can be created simply by following the various information Alexa provides via its XML exports. For example, we can create a list of products recommended not only for a given web site, but also for web sites related to the original. Following those related web sites and obtaining their related Amazon.com products creates a cloud of items related to the original URL. In the following section, we'll walk you through the code for one such cloud creator.

The Code

For this script, you'll need an Amazon.com developer token, which can be obtained from http://www.amazon.com/webservices/. Save the following code to a file called alexa.pl:

```perl
#!/usr/bin/perl -w
use strict;
use URI;
use LWP::Simple;
use Net::Amazon;
use XML::Simple;
use constant AMAZON_TOKEN => 'your token here';
use constant DEBUG => 0;

# get our arguments. the first argument is the
# URL to fetch, and the second is the output file.
my $url    = shift || die "$0 <url> [<output>]\n";
my $output = shift || '/www/htdocs/cloud.html';

# we'll need to fetch the Alexa XML at some point, and
# we'll do it a few different times, so we create a
# subroutine for it. Using the URI module, we can
# correctly encode a URL with a query. In fact, you'll
# notice the majority of this function is involved with
# this, and at the end we use LWP::Simple to actually
# download and return the XML.
#####################################################
sub fetch_xml {
    my $url = shift;
    $url = "http://$url" unless $url =~ m[^http://];
    warn "Fetching Alexa data for $url\n" if DEBUG;
    my @args = (
        cli => 10,
        dat => 'snba',
        ver => '7.0',
        url => $url,
    );
    my $base = 'http://data.alexa.com/data';
    my $uri  = URI->new( $base );
    $uri->query_form( @args );
    $uri = $uri->as_string;
    return get( $uri );
}

# raw XML is no good for us, though, as we want to extract
# particular items of interest. we use XML::Simple to turn
# the XML into Perl data structures, because it's easier
# than fiddling with event handling (as with XML::Parser
# or XML::SAX), and we know there's only a small amount of
# data. we want the list of related sites and the list of
# related products. we extract and return both.
#####################################################
sub handle_xml {
    my $page = shift;
    my $xml  = XMLin( $page );

    my @related = map {
        {
            asin  => $_->{ASIN},
            title => $_->{TITLE},
            href  => $xml->{RLS}{PREFIX} . $_->{HREF},
        }
    } @{ $xml->{RLS}{RL} };

    my @products;
    if (ref $xml->{SD}{AMZN}{PRODUCT} eq 'ARRAY') {
        @products = map { $_->{ASIN} } @{ $xml->{SD}{AMZN}{PRODUCT} };
    } else {
        @products = $xml->{SD}{AMZN}{PRODUCT}{ASIN};
    }

    return ( \@related, \@products );
}

# Functions done; now for the program:
warn "Start URL is $url\n" if DEBUG;

my @products; # running accumulation of product ASINs
{
    my $page = fetch_xml( $url );
    my ($related, $new_products) = handle_xml( $page );
    @products = @$new_products; # running list

    for (@$related) {
        my $xml = fetch_xml( $_->{href} );
        my ($related, $new_products) = handle_xml( $xml );
        push @products, @$new_products;
    }
}

# We now have a list of products in @products, so
# we'd best do something with them. Let's look
# them up on Amazon and see what their titles are.
my $amazon   = Net::Amazon->new( token => AMAZON_TOKEN );
my %products = map { $_ => undef } @products;

for my $asin ( sort keys %products ) {
    warn "Searching for $asin...\n" if DEBUG;
    my $response = $amazon->search( asin => $asin );
    my @products = $response->properties;
    die "ASIN is not unique!?" unless @products == 1;
    my $product = $products[0];
    $products{$asin} = {
        name  => $product->ProductName,
        price => $product->OurPrice,
        asin  => $asin,
    };
}

# Right. We now have name, price, and
# ASIN. Let's output an HTML report:
{
    umask 022;
    warn "Writing to $output\n" if DEBUG;
    open my $fh, '>', $output or die $!;
    print $fh "<html><head><title>Cloud around $url</title></head><body>";
    if (keys %products) {
        print $fh "<table>";
        for my $asin (sort keys %products) {
            my $data = $products{$asin};
            printf $fh "<tr><td>".
                "<a href=\"http://amazon.com/exec/obidos/ASIN/%s\">".
                "%s</a></td> <td>%s</td></tr>",
                @{$data}{qw( asin name price )};
        }
        print $fh "</table>";
    } else {
        print $fh "No related products found.\n";
    }
    print $fh "</body></html>\n";
}
```

Running the Hack

Run the script on the command line, passing it the URL you're interested in and a filename to which you'd like the results saved (you can also hardcode a default output location into the script). The following shows an example run with the script's DEBUG output turned on:

```
% perl alexa.pl http://www.gamegrene.com/ testing.html
Start URL is http://www.gamegrene.com/
Fetching Alexa data for http://www.gamegrene.com/
Fetching Alexa data for http://www.elvesontricycles.com/
Fetching Alexa data for http://www.chimeramag.com/
Fetching Alexa data for http://pages.infinit.net/raymondl
Fetching Alexa data for http://www.beyond-adventure.com/
Fetching Alexa data for http://strcat.com/News
Fetching Alexa data for http://members.aol.com/stocdred
Fetching Alexa data for http://lost-souls.hk.st/
Fetching Alexa data for http://www.gamerspulse.com/
Fetching Alexa data for http://www.gignews.com/
Fetching Alexa data for http://www.gamesfirst.com/
Searching for 0070120102...
Searching for 0070213631...
Searching for 0070464081...
Searching for 0070465886...
..etc..
Searching for 1879239027...
Writing to testing.html
```

Figure 4-4 shows an example of the resulting file.

Figure 4-4. Amazon.com's related products for Gamegrene.com

Hacking the Hack

As the script stands, it requires manual running or a cron script [Hack #90] to regularly place the latest information on your own pages (if that's your intent, of course).
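If you do want the page regenerated automatically, a crontab entry is all it takes. Here's a minimal sketch; the script location, Perl path, and schedule are examples only, not part of the hack:

```
# min hour dom mon dow  command
# Rebuild the cloud page for gamegrene.com every day at 6 a.m.
0 6 * * * /usr/bin/perl /home/user/alexa.pl http://www.gamegrene.com/ /www/htdocs/cloud.html
```

Because the script writes its HTML to a fixed output path, each nightly run simply overwrites the previous page.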
You might want to turn this into a CGI program and let people enter web sites of their own choice. This is pretty easy to do. Say you've created an HTML form that accepts the desired web site in an input named url, like this:

```html
<form method="GET" action="alexa.pl">
URL: <input type="text" name="url" />
</form>
```

Modifying the script to accept this value means changing this:

```perl
# get our arguments. the first argument is the
# URL to fetch, and the second is the output file.
my $url    = shift || die "$0 <url> [<output>]\n";
my $output = shift || '/www/htdocs/cloud.html';
```

to this:

```perl
use LWP::Simple qw(!head);
use CGI qw/:standard/;
my $url = param('url');
```

and changing the output from a filename, like this:

```perl
warn "Writing to $output\n" if DEBUG;
open my $fh, '>', $output or die $!;
```

to the waiting web browser:

```perl
my $fh = *STDOUT; # redirect.
print $fh "Content-type: text/html\n\n";
```

Be sure to remove the original use LWP::Simple; line at the beginning of the script. Since both CGI and LWP::Simple export a function named head, you'll get a number of warning messages about redefinitions unless you change the way LWP::Simple is imported. By telling it not to import its own (here unneeded) head function, the new code avoids these warnings.

Iain Truskett
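A final note on the CGI conversion described above: the web server must be configured to execute alexa.pl rather than serve it as plain text. A minimal Apache sketch, assuming the script lives in a /var/www/cgi-bin directory (the path and layout are assumptions about your server, not part of the hack):

```
<Directory "/var/www/cgi-bin">
    Options +ExecCGI
    AddHandler cgi-script .pl
</Directory>
```

The script must also be readable and executable by the web server user (chmod 755 alexa.pl).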