Hack 56 Sorting Amazon.com Recommendations by Rating


Find the highest-rated items among your Amazon.com product recommendations.

If you've taken the time to fine-tune your Amazon.com recommendations, you know how precise they can be. If you've also looked at the star rating for some of your favorite products, then you know that the rating can be a good indication of quality. The Amazon.com recommendation and the customer rating both add important information to a product, and they can help you make a decision about whether or not to buy one item over another.

To get a feel for the products Amazon.com recommends for you, you can visit your book recommendations at any time at the following URL:

 http://www.amazon.com/o/tg/stores/recs/instant-recs/-/books/0/

In addition to books, you can also find recommendations in other product categories. You can replace books in the URL with any of Amazon.com's catalogs, including music, electronics, dvd, and photo.

When you browse to your recommendations, you'll likely find several pages of items. Wouldn't it be great if you could add the customer review dimension by sorting the entire list by its average star rating? This hack does exactly that with a bit of screen scraping.

The Code

Because Amazon.com doesn't offer sorting by customer rating, this script first gathers all of your Amazon.com book recommendations into one list. By providing your Amazon.com account's email address and password, the script logs in as you and then requests the book recommendations page. It continues to request pages in a loop, picking out the details of your product recommendations with regular expressions. Once all the products and details are stored in arrays and hashes, they can be sorted and printed out in any order you want (in this case, by average star rating).
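
The sort step can be tried in isolation. The following is a minimal sketch, assuming a few made-up ASINs, titles, and ratings (none of them come from Amazon.com). It keeps the ASINs and ratings in parallel arrays and sorts the array indices by rating, highest first:

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical sample data, standing in for scraped recommendations.
my @asins   = ('ASIN0001', 'ASIN0002', 'ASIN0003');
my @ratings = (3.5, 5, 4);
my %title_list = (
    ASIN0001 => 'First Sample Title',
    ASIN0002 => 'Second Sample Title',
    ASIN0003 => 'Third Sample Title',
);

# Sort the index numbers, not the values, so that position $i in
# @asins and @ratings keeps referring to the same product.
my @order = sort { $ratings[$b] <=> $ratings[$a] } 0..$#ratings;

for my $i (@order) {
    print "$title_list{$asins[$i]} ($asins[$i]): $ratings[$i] stars\n";
}
```

Sorting indices rather than the ratings themselves is what lets the script look up the matching ASIN, title, and author after the sort.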

Be sure to insert your email address and password in the proper places in the following code. You'll also need to have write permission in the script's directory so you can store Amazon.com cookies in a text file, cookies.lwp.

The code listed here intentionally logs you in under an unsecured HTTP connection, to better ensure that the script is portable across systems that don't have the relevant SSL libraries installed. If you know you have them working properly, be sure to change http:// to https:// to gain some added protection for your login information.
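
One way to make that choice at run time is to test whether an SSL module loads before picking the URL scheme. This is a sketch, not part of the original script. It assumes that HTTPS support for LWP comes from either LWP::Protocol::https or the older Crypt::SSLeay, and the helper name pick_scheme is made up for illustration:

```perl
#!/usr/bin/perl -w
use strict;

# pick_scheme is a hypothetical helper: it returns 'https' if one of
# the modules that commonly provide SSL support for LWP can be loaded,
# and falls back to 'http' otherwise.
sub pick_scheme {
    for my $module ('LWP::Protocol::https', 'Crypt::SSLeay') {
        # Turn Some::Module into Some/Module.pm for require.
        (my $file = "$module.pm") =~ s!::!/!g;
        return 'https' if eval { require $file; 1 };
    }
    return 'http';
}

my $scheme = pick_scheme();
my $logurl = "${scheme}://www.amazon.com/exec/obidos/flex-sign-in-done/";
print "Login URL: $logurl\n";
```

A successful module load is only a hint that HTTPS requests will work on your system, so it is still worth checking the response from the login request.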


Save the following script to a file called get_recommendations.pl:

    #!/usr/bin/perl -w
    # get_recommendations.pl
    #
    # A script to log on to Amazon, retrieve
    # recommendations, and sort by highest rating.
    # Usage: perl get_recommendations.pl

    use strict;
    use HTTP::Cookies;
    use LWP::UserAgent;

    # Amazon email and password.
    my $email    = 'insert email address';
    my $password = 'insert password';

    # Amazon login URL for normal users.
    my $logurl = "http://www.amazon.com/exec/obidos/flex-sign-in-done/";

    # Now log into Amazon.
    my $ua = LWP::UserAgent->new;
    $ua->agent("(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)");
    $ua->cookie_jar( HTTP::Cookies->new('file' => 'cookies.lwp','autosave' => 1));
    my %headers = ( 'content-type' => "application/x-www-form-urlencoded" );
    $ua->post($logurl,
      [ email       => $email,
        password    => $password,
        method      => 'get', opt => 'oa',
        page        => 'recs/instant-recs-sign-in-standard.html',
        response    => "tg/recs/recs-post-login-dispatch/-/recs/pd_rw_gw_r",
        'next-page' => 'recs/instant-recs-register-standard.html',
        action      => 'sign-in checked' ], %headers);

    # Set some variables to hold
    # our sorted recommendations.
    my (%title_list, %author_list);
    my (@asins, @ratings, $done);

    # We're logged in, so request the recommendations.
    my $recurl = "http://www.amazon.com/exec/obidos/tg/".
                 "stores/recs/instant-recs/-/books/0/t";

    # Set all Amazon recommendations in
    # an array/title and author in hashes.
    until ($done) {

        # Send the request for the recommendations.
        my $content = $ua->get($recurl)->content;

        # Loop through the HTML, looking for matches.
        # (The pattern must stay on a single line.)
        while ($content =~ m!<td colspan=2 width=100%>.*?detail/-/(.*?)/ref.*?<b>(.*?)</b>.*?by (.*?)\n.*?Average Customer Review&#58;.*?(.*?)out of 5 stars.*?<td colspan=3><hr noshade size=1></td>!mgis) {
            # Copy the captures, defaulting to empty strings.
            my ($asin,$title,$author,$rating) = ($1||'',$2||'',$3||'',$4||'');
            $title  =~ s!<.+?>!!g; # drop all HTML tags, cheaply.
            $rating =~ s!\n!!g;    # remove newlines from the rating.
            $rating =~ s! !!g;     # remove spaces from the rating.
            $title_list{$asin} = $title;    # store the title.
            $author_list{$asin} = $author;  # and the author.
            push (@asins, $asin);           # and the ASINs.
            push (@ratings, $rating);       # and the ... OK!
        }

        # See if there are more results. If so, continue the loop.
        if ($content =~ m!<a href=(.*?instant-recs.*?)>more results.*?</a>!i) {
            $recurl = "http://www.amazon.com$1"; # reassign the URL.
        } else { $done = 1; } # nope, we're done.
    }

    # Sort the results by highest star rating and print!
    for (sort { $ratings[$b] <=> $ratings[$a] } 0..$#ratings) {
        next unless $asins[$_]; # skip el blancos.
        print "$title_list{$asins[$_]}  ($asins[$_])\n" .
              "by $author_list{$asins[$_]} \n" .
              "$ratings[$_] stars.\n\n";
    }

Running the Hack

Run the hack from the command line and send the results to another file, like this:

 % perl get_recommendations.pl > top_rated_recommendations.txt

The text file top_rated_recommendations.txt should be filled with product recommendations, with the highest-rated items on top. You can tweak the URL in $recurl to look for DVDs, CDs, or other product types by changing books in the URL to the product line you're interested in.
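
To avoid editing the script for each product line, the category could be taken from the command line. The sketch below is an assumption about how that might look, not part of the original hack. The recs_url helper and the %known list are made up for illustration, and the category names are only the ones mentioned earlier in this hack:

```perl
#!/usr/bin/perl -w
use strict;

# Catalog names mentioned in this hack; Amazon.com may accept others.
my %known = map { $_ => 1 } qw(books music electronics dvd photo);

# recs_url is a hypothetical helper that builds the recommendations
# URL in the same form as $recurl in get_recommendations.pl.
sub recs_url {
    my ($category) = @_;
    die "Unknown category: $category\n" unless $known{$category};
    return "http://www.amazon.com/exec/obidos/tg/" .
           "stores/recs/instant-recs/-/$category/0/t";
}

# Use the first command-line argument, defaulting to books.
my $category = shift(@ARGV) || 'books';
my $recurl   = recs_url($category);
print "$recurl\n";
```

If the helper were merged into get_recommendations.pl in place of the fixed $recurl assignment, passing dvd as the first argument would request DVD recommendations instead of books.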

See Also

  • Amazon Hacks (http://oreilly.com/catalog/amazonhks/) by Paul Bausch

Paul Bausch



Spidering Hacks
ISBN: 0596005776
EAN: 2147483647
Year: 2005
Pages: 157
