Hack 59 Finding Album Information with FreeDB and Amazon.com

By combining identifying information from one database with related information from another, you can create powerful applications with little effort.

Although using an MP3 collection to turn your computer into a jukebox might be all the rage these days, some of us are still listening to audio CDs. And, thanks to the FreeDB project (http://www.freedb.org) and the original CDDB before it, we can identify CDs based on their contents and look up information such as artist and the names of tracks. Once we have that information, we can try looking up more from other sources.

With the help of the Amazon.com API (http://www.amazon.com/webservices/), we can find things like cover art, other albums by the same artist, and release dates of albums. If we put this all together, we can come up with a pretty decent Now Playing display for what we're listening to.

Getting Started

So, this is what we want our script to do:

  • Calculate a disc ID for the current CD.

  • Perform a search on FreeDB for details on the CD.

  • Use the FreeDB record data to perform a search at Amazon.com.

  • Get information from the Amazon.com results for the current album.

  • Collect details on other albums from the same artist.

  • Construct an HTML page to display all the results.

To get this hack started, let's sketch out the overall flow of our script:

    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;

    # Settings for our Amazon developer account
    our $amazon_affilate_id = "   your affiliate ID, if any   ";
    our $amazon_api_key     = "   your amazon api key   ";

    # Location of a FreeDB mirror web interface
    our $freedb_url  = 'http://freedb.freedb.org/~cddb/cddb.cgi';

    # Get the discid of the current CD
    my $discid = get_discid(  );

    # Search for the CD details on FreeDB
    my $cd_info = freedb_search($discid);

    # Given the artist, look for music on Amazon
    my @amazon_rec = amazon_music_search($cd_info->{artist});

    # Try to match the FreeDB title up
    # with Amazon to find current playing.
    my $curr_rec = undef;
    my @other_recs = (  );
    for my $rec (@amazon_rec) {
      if ( !defined $curr_rec && $cd_info->{title} eq $rec->{title} ) {
        $curr_rec = $rec;
      } else {
        push @other_recs, $rec;
      }
    }

    print html_template({current=>$curr_rec, others=>\@other_recs});

Note that we've set up a few overall configuration variables, such as our Amazon.com affiliate ID and a key for use with calls to the API. You'll want to check out the documentation for Amazon.com Web Services and sign up for a developer token. This allows Amazon.com to identify one consumer of their services from another. Now we have the overall flow of the script, so let's work out the implementation of the functions we're calling.
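The main loop compares the FreeDB title and the Amazon.com title with a plain eq, so a difference in case or punctuation between the two databases is enough to miss the current album. Here is a minimal sketch of a more forgiving comparison. The helper names normalize_title and split_current are made up for this illustration and are not part of the script:

```perl
use strict;
use warnings;

# Hypothetical helper: reduce a title to lowercase letters,
# digits, and single spaces.
sub normalize_title {
    my $title = shift;
    $title = '' unless defined $title;
    $title = lc $title;
    $title =~ s/[^a-z0-9]+/ /g;   # punctuation and whitespace runs become one space
    $title =~ s/^ +| +$//g;       # trim the ends
    return $title;
}

# Hypothetical helper: split records into (first match, everything else).
sub split_current {
    my ($wanted, @recs) = @_;
    my $key = normalize_title($wanted);
    my ($current, @others);
    for my $rec (@recs) {
        if (!defined $current && normalize_title($rec->{title}) eq $key) {
            $current = $rec;
        } else {
            push @others, $rec;
        }
    }
    return ($current, \@others);
}

my ($curr, $others) = split_current(
    'Alapalooza',
    { title => 'ALAPALOOZA' },
    { title => 'Some Other Album' },
);
print defined $curr ? "Matched: $curr->{title}\n" : "No match\n";
```

Whether this much normalization is too loose or not loose enough depends on how the two databases spell their titles, so treat the rules as a starting point.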

Checking Your Disc ID

The first part of our hack is a little tricky, and it depends a lot on your operating system. To perform a search on FreeDB, we first need to identify the current CD, and that requires access to the CD device itself. This is fairly easy to do under Linux and Mac OS X; other environments will require more homework.

For Linux and Mac OS X, we can use a small program called cd-discid (http://lly.org/~rcw/cd-discid/). If you happen to be using Debian Linux, you can install the cd-discid package using apt-get. If you're on Mac OS X and have Fink (http://fink.sourceforge.net) installed, use fink install cd-discid. If neither of these applies to you, don't worry: we can skip this step and use a hardcoded disc ID to at least see how the script works.

Once the program is installed, we can use this function under Linux:

    sub get_discid {
      # For Linux
      my $cd_discid = '/usr/local/bin/cd-discid';
      my $cd_dev    = '/dev/cdrom';
      return `$cd_discid $cd_dev`;
    }

Basically, this calls the disc ID program using /dev/cdrom as the device containing the audio CD to be identified. You might need to adjust the path to both the program and the CD device in this function.

If you're using Mac OS X, then this implementation should work for you:

    sub get_discid {
      # For Mac OS X
      my $cd_discid = '/sw/bin/cd-discid';
      my ($cd_dev)  = '/dev/'.
        join '', map { /= "(.*?)"$/ }
          grep { /"BSD Name"/ }
            split(/\n/, `ioreg -w 0 -c IOCDMedia`);
      return `$cd_discid $cd_dev`;
    }

This looks kind of tricky, but it uses a utility called ioreg, which lists I/O devices registered with the system. We check for devices in which CD media is currently inserted and do some filtering and scraping to discover the BSD Unix device name for the appropriate device. It's dirty, but it works well.
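To see the scraping step in isolation, here is a small sketch that runs the same kind of filter over a canned string instead of live ioreg output. The helper name bsd_names_from_ioreg and the sample text are assumptions made for illustration; real ioreg output carries many more properties per device:

```perl
use strict;
use warnings;

# Hypothetical helper: pull every "BSD Name" value out of
# ioreg-style text and turn it into a device path.
sub bsd_names_from_ioreg {
    my $text = shift;
    my @devices;
    for my $line (split /\n/, $text) {
        next unless $line =~ /"BSD Name"\s*=\s*"(.*?)"/;
        push @devices, "/dev/$1";
    }
    return @devices;
}

# Illustrative input only, not captured from a real machine.
my $sample = join "\n",
    '    | |   "Preferred Block Size" = 2352',
    '    | |   "BSD Name" = "disk1"',
    '    | |   "Type" = "CD-ROM"';

my @devices = bsd_names_from_ioreg($sample);
print @devices ? "CD device: $devices[0]\n" : "No CD media found\n";
```

Returning a list rather than one joined string makes the "no disc inserted" and "two drives loaded" cases easy to tell apart.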

However, if none of this works for you (either because you're using a Windows machine, or else had installation problems with the source code), you can opt to use a canned disc ID in order to explore the rest of this hack:

    sub get_discid {
      # If all else fails... use Weird Al's "Alapalooza"
      return "a60a840c+12 150 17795 37657 54225 72617 87907 106037 ".
        "125857 141985 164055 165660 185605 2694";
    }
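The disc ID string is easier to reason about once it is split into its fields: the eight-digit hex ID, the number of tracks, one frame offset per track, and the total playing time in seconds. The sketch below takes the canned string apart; parse_discid_line is a hypothetical helper, not something the script needs:

```perl
use strict;
use warnings;

# Hypothetical helper: split a cd-discid style line into its fields.
# Fields may be separated by spaces or by '+'.
sub parse_discid_line {
    my $line = shift;
    $line =~ s/\s+$//;                    # drop a trailing newline, if any
    my ($id, $ntracks, @nums) = split /[ +]+/, $line;
    my $seconds = pop @nums;              # last field: total length in seconds
    return {
        id      => $id,
        tracks  => $ntracks,
        offsets => \@nums,                # one frame offset per track
        seconds => $seconds,
    };
}

my $info = parse_discid_line(
    "a60a840c+12 150 17795 37657 54225 72617 87907 106037 " .
    "125857 141985 164055 165660 185605 2694"
);
printf "Disc %s: %d tracks, %d seconds\n",
    $info->{id}, $info->{tracks}, $info->{seconds};

# The last two hex digits of the ID repeat the track count.
printf "Track count from ID: %d\n", hex(substr($info->{id}, 6, 2));
```

A quick sanity check like the last line can catch a truncated or mistyped ID before it is sent to FreeDB.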

Digging Up the FreeDB Details

Once we have a disc ID, we can make a query against the FreeDB web service. From there, we should be able to get the name of the artist, as well as the album title and a list of track titles. Usage of the FreeDB web service is described at:

http://www.freedb.org/modules.php?name=Sections&sop=viewarticle&artid=28

under Addendum B, "CDDBP under HTTP."
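Each request to the web interface is a single URL carrying three parameters: cmd (the CDDB command, with spaces written as +), hello (user name, host name, client name, and client version), and proto (the protocol level). As a rough sketch, a URL builder might look like this. The helper name freedb_request_url is an assumption, and the category in the example is only a placeholder:

```perl
use strict;
use warnings;

# Hypothetical helper: build a CDDB-over-HTTP URL for a command such as
# "cddb query <discid ...>" or "cddb read <category> <id>".
sub freedb_request_url {
    my ($base, $command, %hello) = @_;
    my $cmd = $command;
    $cmd =~ s/^\s+|\s+$//g;               # trim stray whitespace and newlines
    $cmd =~ s/\s+/+/g;                    # spaces become '+' in the query string
    my $hello = join '+', @hello{qw(user host client version)};
    return "$base?cmd=$cmd&hello=$hello&proto=1";
}

my $url = freedb_request_url(
    'http://freedb.freedb.org/~cddb/cddb.cgi',
    'cddb read misc a60a840c',            # the category here is only an example
    user   => 'joe_random', host    => 'www.asdf.com',
    client => 'freebot',    version => '2.1',
);
print "$url\n";
```

Keeping the hello values in one place also makes it harder for the query and read requests to identify themselves differently by accident.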

Let's start implementing the FreeDB search by making a call to the web service:

    sub freedb_search {
      my $discid = shift;

      # Get the discid for the current
      # CD and make a FreeDB query with it.
      # (Every space must become a '+', hence the /g.)
      $discid =~ s/ /\+/g;
      my $disc_query = get("$freedb_url?cmd=cddb+query+$discid&".
                           "hello=joe_random+www.asdf.com+freebot+2.1&proto=1");
      my ($code, $cat, $id, @rest) = split(/ /, $disc_query);

The first thing we do is escape the spaces in the disc ID for use in the URL used to request a query on the FreeDB web service. Then, we request the URL. In response to the request, we get a status code, along with a category and record ID. We can use this category and record ID to look up the details for our audio CD:

      # Using the results of the discid query, look up the CD's details.
      # Create a hash from the name/value pairs in the detail response.
      # (Note that we clean up EOL characters in the data.)
      my %freedb_data =
        map { s/\r//; /(.*)=(.*)/ }
          split(/\n/,
                get("$freedb_url?cmd=cddb+read+$cat+$id&".
                    "hello=deusx+www.decafbad.com+freebot+2.1&proto=1"));

The result of the FreeDB read request gives us a set of name/value pairs, one per line. So, we can split the result of the query by lines and use a regular expression on each to extract the name/value pairs and place them directly into a hash. However, as we receive it, the data is not quite as convenient to handle as it could be, so we can rearrange and restructure things before returning the results:

      # Rework the FreeDB result data into
      # a more easily handled structure.
      my %disc_info = ( );

      # Artist and title are separated by ' / ' in DTITLE.
      ($disc_info{artist}, $disc_info{title}) =
        split(/ \/ /, $freedb_data{DTITLE});

      # Extract series of tracks from
      # TTITLE0..TTITLEn; stop at
      # first empty title.
      my @tracks = (  );
      my $track_no = 0;
      while ($freedb_data{"TTITLE$track_no"}) {
        push @tracks, $freedb_data{"TTITLE$track_no"};
        $track_no++;
      }
      $disc_info{tracks} = \@tracks;

      return \%disc_info;
    }

With this, we convert a flat set of cumbersome name/value pairs into a more flexible Perl data structure. Artist name and album title are accessible via artist and title keys in the structure, respectively, and track names are available as an array reference under the tracks key.
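To try the restructuring step without a network connection, the sketch below parses a made-up entry held in a string. The helper name parse_freedb_entry and the sample entry (including its placeholder track titles and category) are assumptions. The sketch also assumes that a key repeated on consecutive lines continues the previous value, which is how long titles are usually split in this format:

```perl
use strict;
use warnings;

# Hypothetical helper: turn the text of a "cddb read" response into
# an { artist, title, tracks } structure.
sub parse_freedb_entry {
    my $text = shift;
    my %data;
    for my $line (split /\r?\n/, $text) {
        next if $line =~ /^#/;                 # comment lines
        next unless $line =~ /^(\w+)=(.*)$/;   # skips the status line and final '.'
        $data{$1} = '' unless defined $data{$1};
        $data{$1} .= $2;                       # a repeated key continues the value
    }
    my ($artist, $title) = split m{ / }, ($data{DTITLE} || ''), 2;
    my @tracks;
    for (my $n = 0; defined $data{"TTITLE$n"}; $n++) {
        push @tracks, $data{"TTITLE$n"};
    }
    return { artist => $artist, title => $title, tracks => \@tracks };
}

# Made-up sample entry; the track titles are placeholders.
my $sample = join "\r\n",
    '210 misc a60a840c CD database entry follows',
    '# xmcd',
    'DISCID=a60a840c',
    'DTITLE=Weird Al Yankovic / Alapalooza',
    'TTITLE0=First Placeholder',
    'TTITLE1=Second ',
    'TTITLE1=Placeholder',
    '.';

my $cd = parse_freedb_entry($sample);
print "$cd->{artist}: $cd->{title} (", scalar @{ $cd->{tracks} }, " tracks)\n";
```

Anchoring the pattern on a leading key name also keeps a title that happens to contain an equals sign from being split in the wrong place.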

Rocking with Amazon.com

The next thing our script needs is the ability to search Amazon.com for products by a given artist. Luckily, Amazon.com's Web Services produce clean XML, so it won't be too hard to extract what we need from the data, even without using a full XML parser.

But first, we'll need a couple of convenience functions added to our script:

    sub trim_space {
      my $val = shift;
      $val=~s/^\s+//;
      $val=~s/\s+$//g;
      return $val;
    }

    sub clean_name {
      my $name = shift;
      $name=lc($name);
      $name=trim_space($name);
      $name=~s/[^a-z0-9 ]//g;
      $name=~s/ /_/g;
      return $name;
    }

The first function trims whitespace from the ends of a string, and the second cleans up a string to ensure that it contains only lowercase alphanumeric characters and underscores. This last function is used to make fairly uniform hash keys in data structures.
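A few sample inputs show what these helpers do to tag names. The functions are repeated here only so the snippet runs on its own, and the input strings are arbitrary examples:

```perl
use strict;
use warnings;

sub trim_space {
    my $val = shift;
    $val =~ s/^\s+//;
    $val =~ s/\s+$//;
    return $val;
}

sub clean_name {
    my $name = shift;
    $name = lc $name;
    $name = trim_space($name);
    $name =~ s/[^a-z0-9 ]//g;   # drop anything but letters, digits, spaces
    $name =~ s/ /_/g;           # remaining spaces become underscores
    return $name;
}

for my $raw ('ProductName', ' Release Date ', 'List-Price') {
    printf "%-16s => %s\n", "'$raw'", clean_name($raw);
}
```

Note that punctuation is removed rather than replaced, so List-Price collapses to listprice while an embedded space survives as an underscore.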

Next, we can implement our Amazon.com Web Services (http://www.amazon.com/gp/aws/landing.html) searching code:

    # Search for artists via the Amazon search API.
    sub amazon_music_search {
      my ($artist) = @_;
      # Replace every character that isn't alphanumeric or a space.
      $artist =~ s/[^A-Za-z0-9 ]/ /g;

      # Construct the base URL for Amazon artist searches.
      my $base_url = "http://xml.amazon.com/onca/xml3?t=$amazon_affilate_id&".
        "dev-t=$amazon_api_key&mode=music&type=lite&f=xml".
          "&ArtistSearch=$artist";

The first thing we do is take the artist name as a parameter and try to clean up all characters that aren't alphanumeric or spaces. Then, we construct the URL to query the web service, as described in the documentation from the Amazon.com software development kit.
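The artist name still contains spaces when it is placed in the URL. One way to make the query string unambiguous is to percent-encode the value first. The sketch below does this by hand with a hypothetical escape_param helper and assumes plain ASCII input; if the URI::Escape module is installed, its uri_escape function does the same job:

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode everything except
# letters, digits, and the characters - _ . ~
sub escape_param {
    my $text = shift;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

my $artist = 'Weird Al Yankovic';
print "ArtistSearch=", escape_param($artist), "\n";
```

With escaping in place, stripping punctuation from the artist name becomes a choice about search quality rather than a requirement for building a valid URL.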

Next, we start to get the results of our search. Queries on Amazon.com's Web Services return results a handful at a time across several pages; so, if we want to gather all the results, we'll first need to figure out how many total pages there are. Luckily, this is a part of every page of results, so we can grab the first page and extract this information with a simple regular expression:

      # Get the first page of search results.
      my $content = get($base_url."&page=1");

      # Find the total number of search results pages to be processed.
      # Fall back to a single page if the tag is missing.
      $content =~ m{<totalpages>(.*?)</totalpages>}mgis;
      my ($totalpages) = ($1 || '1');

After getting the total number of pages, we can start gathering the rest of the pages into an array, starting with the first page we have already downloaded. We can do this with a quick Perl expression that maps the page numbers to page requests, the results of which are added to the array. Notice that we also sleep for a second in between requests, as per the instructions in the Amazon.com Web Services license:

      # Grab all pages of search results.
      my @search_pages = ($content);
      if ($totalpages > 1) {
        push @search_pages,
          map { sleep(1); get($base_url."&page=$_") } (2..$totalpages);
      }
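The paging logic can be exercised without touching the network by passing the fetching step in as a code reference. In this sketch, fetch_all_pages is a hypothetical helper and the canned pages stand in for real responses. It also skips pages that fail to download, which the listing above does not do:

```perl
use strict;
use warnings;

# Hypothetical helper: collect every page of a paged result set.
# $fetch is a code reference that takes a page number and returns text.
sub fetch_all_pages {
    my ($fetch, $delay) = @_;
    my $first = $fetch->(1);
    return () unless defined $first;            # first request failed
    my ($total) = $first =~ m{<totalpages>\s*(\d+)\s*</totalpages>}i;
    $total ||= 1;                               # no tag: assume a single page
    my @pages = ($first);
    for my $page (2 .. $total) {
        sleep $delay if $delay;                 # pause between requests
        my $text = $fetch->($page);
        push @pages, $text if defined $text;    # skip pages that failed
    }
    return @pages;
}

# Stand-in for get(): three canned pages instead of real web requests.
my @canned;
push @canned, "<totalpages>3</totalpages><page>$_</page>" for 1 .. 3;

my @pages = fetch_all_pages(sub { $canned[ $_[0] - 1 ] }, 0);
print scalar(@pages), " pages collected\n";
```

In the real script the code reference would wrap get($base_url."&page=$n") and the delay would be 1.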

Now that we have all the pages of the results, we can process them all and extract data for each album found. Details for each item are, appropriately enough, found as children of a tag named details. We can extract these children from each occurrence of the details tag using a regular expression. We can also grab the URL to the item detail page from an attribute named url:

      # Extract data for all the records
      # found in the search results.
      my @records;
      for my $content (@search_pages) {

        # Grab the content of all <details> tags
        while ($content =~ m{<details(?!s) url="(.*?)".*?>(.*?)</details>}mgis) {

          # Extract the URL attribute and tag body content.
          my($url, $details_content) = ($1 || '', $2 || '');

After extracting the child tags for a detail record, we can build a Perl hash from child tag names and their content values, using another relatively simple regular expression and our convenience functions:

          # Extract all the tags from the detail record, using
          # tag name as hash key and tag contents as value.
          # (\1 matches the closing tag that pairs with the opening one.)
          my %record = (_type=>'amazon', url=>$url);
          while ($details_content =~ m{<(.*?)>(.*?)</\1>}mgis) {
            my ($name, $val) = ($1 || '', $2 || '');
            $record{clean_name($name)} = $val;
          }

However, not all of the child tags of details are flat tags. In particular, the names of an album's artists are themselves child tags nested inside the artists tag. So, with one more regular expression and a map function, we can further process these child tags into a list. We can also rename productname to title, for more intuitive use later:

          # Further process the artists list to extract artist
          # names, and standardize on product name as title.
          my $artists = $record{artists} || '';
          $record{artists} =
            [ map { $_ } ( $artists =~ m{<artist>(.*?)</artist>}mgis ) ];
          $record{title} = $record{productname};

          push @records, \%record;
        }
      }

      return @records;
    }

So, with a few web requests and less than a handful of regular expressions, we can search for and harvest a pile of records on albums found at Amazon.com for a given artist.
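To watch the extraction work on something small, here is a sketch that applies the same regular-expression approach to a hand-written fragment. The helper name parse_details is hypothetical, and the XML is a made-up fragment shaped like the response described above, not real Amazon.com output:

```perl
use strict;
use warnings;

# Hypothetical helper: regex-based extraction applied to one string.
sub parse_details {
    my $xml = shift;
    my @records;
    while ($xml =~ m{<details(?!s)\s+url="(.*?)".*?>(.*?)</details>}gis) {
        my ($url, $body) = ($1, $2);
        my %record = (url => $url);
        # Each child tag becomes a lowercase hash key.
        while ($body =~ m{<(\w+)>(.*?)</\1>}gis) {
            $record{ lc $1 } = $2;
        }
        # The artists tag holds nested tags; turn it into a list.
        my $artists = defined $record{artists} ? $record{artists} : '';
        $record{artists} = [ $artists =~ m{<artist>(.*?)</artist>}gis ];
        $record{title}   = $record{productname};
        push @records, \%record;
    }
    return @records;
}

# Made-up fragment; the values are placeholders.
my $sample = <<'XML';
<Details url="http://www.example.com/item/1">
  <ProductName>Alapalooza</ProductName>
  <Artists><Artist>Weird Al Yankovic</Artist></Artists>
  <ReleaseDate>placeholder date</ReleaseDate>
</Details>
XML

for my $rec (parse_details($sample)) {
    print "$rec->{title} by ", join(', ', @{ $rec->{artists} }), "\n";
}
```

The captured values are copied into named variables before the inner loop runs, because every successful match overwrites $1 and $2.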

Presenting the Results

At this point, we can identify a CD, look up its details in FreeDB, and search for albums at Amazon.com. The last thing our main program does is combine all these functions, determine which Amazon.com product is the current album, and feed it and the rest of the albums to a function to prepare an HTML page with the results.

Now, we can implement the construction of that page:

    sub html_template {
      my $vars = shift;

      my $out = '';

      $out .= qq^
        <html>
          <head><title>Now Playing</title></head>
          <body>
            <div align="center">
              <h1>Now playing:</h1>
      ^;
      $out .= format_album($vars->{current}, 1);

      $out .= qq^
              <h1>Also by this artist:</h1>
              <table border="1" cellspacing="0" cellpadding="8">
      ^;

This code begins an HTML page, using a function we'll implement in a minute, which produces a display of an album with title and cover art. Next, we can put together a table that shows the rest of the related albums from this artist. We create a table showing smaller cover art, with three columns per row:

      my $col = 0;
      my $row = '';
      for my $rec (@{$vars->{others}}) {
        $row .= '<td align="center" width="33%">';
        $row .= format_album($rec, 0);
        $row .= "</td>\n";
        $col++;
        if (($col % 3) == 0) {
          $out .= "<tr>\n$row\n</tr>\n";
          $row = '';
        }
      }
      # Emit any leftover albums as a final, shorter row.
      $out .= "<tr>\n$row\n</tr>\n" if $row ne '';

Finally, we close up the table and finish off the page:

      $out .= qq^
              </table>
            </div>
        </body></html>
      ^;

      return $out;
    }
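One way to lay cells out in rows of three, including a final short row when the count is not a multiple of three, is to slice the list up front instead of counting columns. This is only a sketch, and table_rows is a hypothetical helper rather than part of the script:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap a list of cell strings into table rows of
# $per_row cells each; the last row may be shorter.
sub table_rows {
    my ($per_row, @cells) = @_;
    my $out = '';
    while (my @row = splice(@cells, 0, $per_row)) {
        $out .= "<tr>\n" . join('', map { "<td>$_</td>\n" } @row) . "</tr>\n";
    }
    return $out;
}

print table_rows(3, 'A' .. 'G');
```

Here each cell would be the string returned by format_album($rec, 0) for one of the other albums.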

The last thing we need is a function to prepare HTML to display an album:

    sub format_album {
      my ($rec, $large) = @_;

      my $out = '';
      my $img = ($large) ? 'imageurllarge' : 'imageurlmedium';

      $out .= qq^<a href="$rec->{url}"><img src="$rec->{$img}"/></a><br/>^;
      $out .= qq^<b><a href="$rec->{url}">$rec->{title}</a></b><br />^;

      if (defined $rec->{releasedate}) {
        $out .= qq^Released: $rec->{releasedate}^;
      }

      if (ref($rec->{artists}) eq 'ARRAY') {
        $out .= '<br />by <b>'.join(', ', @{$rec->{artists}}).'</b>';
      }

      # Return the HTML explicitly, whether or not artists were listed.
      return $out;
    }

This function produces several lines of HTML. The first line displays an album's cover art in one of two sizes, based on the second parameter. The second line displays the album's title, linked to its detail page at Amazon.com. The third line shows when the album was released, if we have this information, and the final line lists the artist who created the album.

Hacking the Hack

With this script and the help of FreeDB and Amazon.com, we can go from a CD in the tray to an HTML Now Playing display. This could be integrated into a CD player application and improved in any number of ways:

  • The handful of regular expressions used to parse Amazon.com's XML are mostly adequate, but a proper XML parser, like XML::Simple, would be better.

  • Errors and unidentified CDs are not handled very well.

  • Other web services could be pulled in to further use the harvested CD data.
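As a starting point for the error-handling item, the sketch below looks at the status code on the first line of a FreeDB query response before trusting the rest. It assumes the usual CDDB conventions as we understand them: 200 for a single exact match on the status line, 210 or 211 for a list of candidates ended by a lone period, and anything else (such as 202) meaning no usable match. The helper name parse_query_response and the sample response are made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: interpret the text returned by a "cddb query"
# request. Returns a list of { category, id, title } candidates,
# which is empty when nothing usable came back.
sub parse_query_response {
    my $text = shift;
    return () unless defined $text && length $text;   # request failed
    my ($status, @lines) = split /\r?\n/, $text;
    return () unless defined $status && $status =~ /^(\d{3})\s+(.*)$/;
    my ($code, $rest) = ($1, $2);

    my @candidates;
    if ($code == 200) {                        # one match, on the status line
        @candidates = ($rest);
    } elsif ($code == 210 || $code == 211) {   # list of matches, ended by '.'
        @candidates = grep { $_ ne '.' } @lines;
    }                                          # anything else: no match

    my @found;
    for my $line (@candidates) {
        my ($category, $id, $title) = split ' ', $line, 3;
        push @found, { category => $category, id => $id, title => $title };
    }
    return @found;
}

# Made-up response for illustration.
my @hits = parse_query_response(
    "211 Found inexact matches, list follows\r\n" .
    "misc a60a840c Weird Al Yankovic / Alapalooza\r\n" .
    ".\r\n"
);
print @hits ? "Best guess: $hits[0]{title}\n" : "CD not found\n";
```

With something like this in place, the script can show a "CD not found" page instead of requesting details for an empty category and ID.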

Maybe someday, something like this could be enhanced with the ability to automatically purchase and download music from an online store, to grab albums you don't yet have in order to expand your jukebox even further. Powerful things happen when one simple tool can be easily chained to another.

l.m.orchard



Spidering Hacks
ISBN: 0596005776
Year: 2005
Pages: 157