Hack 88 Searching for Health Inspections


How healthy are the restaurants in your neighborhood? And when you find a good one, how do you get there? By combining databases with maps!

You don't have to scrape a site to build a URL that leads to its resources! This hack searches Seattle's King County database of restaurant inspections (http://www.decadeonline.com/main.phtml?agency=skc), which can be queried with a complete restaurant name or just a single word. The script returns a list of the restaurants found, with a link to each restaurant's health inspection information and a direct link to a MapQuest map of its location.

What? Isn't scraping MapQuest against its TOS? Yes, but this program doesn't touch the MapQuest site; instead, it builds a direct link to a relevant MapQuest map. So, while a user might access a MapQuest page based on this program's output, we never programmatically access the site and thus never violate the TOS.

The Code

Save this script as kcrestaurants.pl:

    #!/usr/bin/perl -w
    use strict;
    use HTML::TableExtract;
    use LWP::Simple;
    use URI::Escape;

    # get our restaurant name from the command line.
    my $name = shift || die "Usage: kcrestaurants.pl <string>\n";

    # and our constructed URL to the health database.
    my $url = "http://www.decadeonline.com/results.phtml?agency=skc".
              "&forceresults=1&offset=0&businessname=" . uri_escape($name) .
              "&businessstreet=&city=&zip=&soundslike=&sort=FACILITY_NAME";

    # download our health data.
    my $data = get($url) or die "Could not download $url\n";
    die "No restaurants matched your search query.\n"
        if $data =~ /no results were found/;

    # and suck in the returned matches.
    my $te = HTML::TableExtract->new(keep_html => 1, count => 1);
    $te->parse($data) or die $!; # yum, yum, i love second table!

    # and now loop through the data.
    foreach my $ts ($te->table_states) {
      foreach my $row ($ts->rows) {
         next if $row->[1] =~ /Site Address/; # skip if this is our header.
         foreach ( qw/ 0 1 / ) { # remove googly poofs.
            $row->[$_] =~ s/^\s+|\s+|\s+$/ /g; # collapse whitespace runs.
            $row->[$_] =~ s/\n|\f|\r/ /g; # remove newlines.
         }

         # determine name/addresses.
         my ($url, $name, $address, $mp_url);
         if ($row->[0] =~ /href="(.*?)">.*?2">(.*?)<\/font>/) {
             ($url, $name) = ($1, $2); # almost there.
         } if ($row->[1] =~ /2">(.*?)<\/font>/) { $address = $1; }

         # and the MapQuest URL.
         if ($address =~ /(.*), ([^,]*)/) {
             my $street = $1; my $city = $2;
             $mp_url = "http://www.mapquest.com/maps/map.adp?".
                       "country=US&address=" . uri_escape($street) .
                       "&city=" . uri_escape($city) . "&state=WA&zipcode=";
         }

         print "Company name: $name\n";
         print "Company address: $address\n";
         print "Results of past inspections:\n ".
               "http://www.decadeonline.com/$url\n";
         print "MapQuest URL: $mp_url\n\n";
      }
    }

Running the Hack

To run the hack, just specify the restaurant name or keyword you want to search for. If there's no restaurant found based on your query, it'll say as much:

    % perl kcrestaurants.pl perlfood
    No restaurants matched your search query.

A matching search returns health inspection and MapQuest links:

    % perl kcrestaurants.pl "restaurant le gourmand"
    Company name: RESTAURANT LE GOURMAND
    Company address: 425 NW MARKET ST , Seattle
    Results of past inspections:
     http://www.decadeonline.com/fac.phtml?
       agency=skc&forceresults=1&facid=FA0003608
    MapQuest URL: http://www.mapquest.com/maps/map.adp?country=US&address
       =425%20NW%20MARKET%20ST%20&city=Seattle&state=WA&zipcode=

Or, if there are a number of results, it returns a complete list:

    % perl kcrestaurants.pl restaurant
    Company name: RESTAURANT EL TAPATIO
    Company address: 3720 FACTORIA BL , Bellevue
    Results of past inspections:
     http://www.decadeonline.com/fac.phtml?
       agency=skc&forceresults=1&facid=FA0003259
    MapQuest URL: http://www.mapquest.com/maps/map.adp?country=US&address
       =3720%20FACTORIA%20BL%20&city=Bellevue&state=WA&zipcode=

    Company name: RESTAURANT ICHIBAN
    Company address: 601 S MAIN ST , Seattle
    Results of past inspections:
     http://www.decadeonline.com/fac.phtml?
       agency=skc&forceresults=1&facid=FA0001743
    MapQuest URL: http://www.mapquest.com/maps/map.adp?country=US&address
       =601%20S%20MAIN%20ST%20&city=Seattle&state=WA&zipcode=
    ...
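Notice the trailing %20 in the address parameter of the MapQuest URLs above. The database returns the address with a space before the comma ("425 NW MARKET ST , Seattle"), and the script's pattern /(.*), ([^,]*)/ keeps that space as part of the street. The following is a minimal sketch of how the split could be tightened by trimming both pieces before escaping them. The helper name split_address is hypothetical and not part of the original script.

```perl
use strict;
use warnings;
use URI::Escape;

# split_address() is a hypothetical helper for illustration only.
# It uses the same pattern as the script, then trims each piece.
sub split_address {
    my ($address) = @_;

    # greedy street, then the last comma-free chunk as the city.
    return unless $address =~ /(.*), ([^,]*)/;
    my ($street, $city) = ($1, $2);

    # strip leading and trailing whitespace from both values.
    s/^\s+|\s+$//g for $street, $city;

    return ($street, $city);
}

my ($street, $city) = split_address("425 NW MARKET ST , Seattle");
print "address=", uri_escape($street), "&city=", uri_escape($city), "\n";
```

With the sample address, the street becomes "425 NW MARKET ST" with no trailing space, so the escaped value no longer ends in %20.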

Hacking the Hack

If you don't live in Seattle, you might not personally have much use for this particular example. But if you live anywhere within the United States, the code can be adapted to suit you. Many counties in the United States have posted their restaurant inspection scores online. Go to your state or county's official web site (the county site is better if you know what it is) and search for restaurant inspections. From there, you should be able to find restaurant scores from which you can build a script like this. Bear in mind that different counties have different levels of information.
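When adapting the script to another county's database, it may help to separate the base URL from the query parameters so that both are easy to swap. The sketch below assumes the other site also accepts a plain GET query string. The helper name build_search_url is hypothetical, and only the King County URL and parameter names are taken from the script above.

```perl
use strict;
use warnings;
use URI::Escape;

# build_search_url() is a hypothetical helper, not part of the original hack.
# It takes a base URL and an ordered list of key/value pairs.
sub build_search_url {
    my ($base, @pairs) = @_;
    my @query;

    # consume the pairs two at a time, preserving their order.
    while (my ($key, $value) = splice(@pairs, 0, 2)) {
        $value = '' unless defined $value;
        push @query, uri_escape($key) . '=' . uri_escape($value);
    }

    return $base . '?' . join('&', @query);
}

# the King County search, using the parameters from kcrestaurants.pl.
my $url = build_search_url(
    "http://www.decadeonline.com/results.phtml",
    agency         => "skc",
    forceresults   => 1,
    offset         => 0,
    businessname   => "le gourmand",
    businessstreet => "",
    city           => "",
    zip            => "",
    soundslike     => "",
    sort           => "FACILITY_NAME",
);
print "$url\n";
```

For a different county, you would replace the base URL and the parameter list with whatever that site's search form submits. The table-parsing patterns would need to change as well, since they are specific to the King County result pages.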

You don't have to use MapQuest either. If you have the name, city, and state of a restaurant, you can build a URL to get the phone number from Google. (However, you can't use the Google API to perform this search, because it does not yet support the phonebook: syntax.)

Let's take our previous example of the Restaurant Le Gourmand, located in Seattle, Washington. The Google search syntax for a phonebook query would be:

 bphonebook:Restaurant Le Gourmand Seattle WA 

And the URL to lead to the result would look like this:

 http://www.google.com/search?q=bphonebook:Restaurant+Le+Gourmand+Seattle+WA 

You might want to use that instead of, or in addition to, a link to MapQuest.
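As a rough sketch, the phonebook link could be built the same way the script builds the MapQuest link. The helper name phonebook_url is hypothetical. The code assumes the bphonebook: syntax and URL form shown above, and keeps the operator prefix unescaped so the colon survives.

```perl
use strict;
use warnings;
use URI::Escape;

# phonebook_url() is a hypothetical helper, not part of the original hack.
sub phonebook_url {
    my ($name, $city, $state) = @_;

    # break the search terms into words, dropping any empty pieces.
    my @words = grep { length } split /\s+/, "$name $city $state";

    # escape each word on its own, then join the words with '+'.
    my $query = join '+', map { uri_escape($_) } @words;

    return "http://www.google.com/search?q=bphonebook:$query";
}

print phonebook_url("Restaurant Le Gourmand", "Seattle", "WA"), "\n";
```

Inside kcrestaurants.pl, you could call a helper like this with $name, $city, and "WA" and print the result alongside the MapQuest URL.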



Spidering Hacks
ISBN: 0596005776
Year: 2005
Pages: 157
