Hack 87 Searching the Better Business Bureau

Is that new company offering to build your house, deliver your groceries, and walk your dog legit and free of complaint? Find out with an automated query of the Better Business Bureau's web site.

If you're a citizen of the United States, you're probably aware of the Better Business Bureau (http://www.bbb.org), a nonprofit organization that acts as a neutral party in resolving complaints between businesses and consumers. There are over 125 local Better Business Bureaus across the country.

The Better Business Bureau (BBB) company database is searchable by URL. This hack runs a BBB search by URL and provides information on a business if one is found. Further, the hack searches PlanetFeedback.com for any additional online feedback about that company.

Links to feedback and basic company information are provided, but a tally of customer complaints from the BBB is not. Why? Each of the 125 local bureaus provides a different amount of data and formats that data in slightly different ways; adding the code to handle them all would be, we suspect, a monumental undertaking. So, we won't attempt that here; instead, we'll stick to basic company information only.

The Code

Save this script as bbbcheck.pl:

    #!/usr/bin/perl -w
    use strict;
    use LWP::Simple;
    use URI::Escape;

    # $MAX_BBB_SEARCH_RETRIES is the number of times that the
    # script will attempt to look up the URL on the BBB web site.
    # (Experimentally, the BBB web site appeared to give "database
    # unavailable" error messages about 30% of the time.)
    my $MAX_BBB_SEARCH_RETRIES = 3;

    # $MAX_BBB_REFERRAL_PAGE_RETRIES is the number of times the
    # script will attempt to download the company information
    # from the URL provided in the search results.
    my $MAX_BBB_REFERRAL_PAGE_RETRIES = 3;

    # suck in our business URL, and append it to the BBB URL.
    my $business_url = shift || die "You didn't pass a URL for checking!\n";
    my $search_url   = "http://search.bbb.org/results.html?tabletouse=".
                       "url_search&url=" . $business_url;
    my %company; # place we keep company info.

    # keep asking for results until we get them or run out of retries.
    for (my $i = 1; $i <= $MAX_BBB_SEARCH_RETRIES; ++$i) {
        my $data = get($search_url); # gotcha, bugaboo!

        # did we have a problem? pause if so. (test for undef first,
        # so we never run a regex against an undefined value.)
        if (!defined($data) or $data =~ /apologize.*delay/) {
           print "Connection to BBB failed. Waiting 5 seconds to retry.\n";
           sleep(5); next; # let's try this again, shall we?
        }

        # die if there's no data to yank.
        die "There were no companies found for this URL.\n"
             if $data =~ /There are no companies/i;

        # get the company name, address, and redirect.
        if ($data =~ /<!-- n -->.*?href="(.*?)">(.*)<!--  -->.*?">(.*)<\/f/i) {
           $company{redir}   = "http://search.bbb.org/$1";
           $company{name}    = $2; $company{address} = $3;
           $company{address} =~ s/<br>/\n/g;
           print "\nCompany name and address:\n";
           print "$company{name}\n$company{address}\n\n";
        }

        # if there was no redirect, then we can't
        # move on to the local BBB site, so we die.
        unless ($company{redir}) {
          die "Unable to process the results returned. You can inspect ".
              "the results manually at the following url: $search_url\n"; }

        last if $data;
    }

    # every search attempt failed, so there is nothing to follow up on.
    die "Could not get search results from the BBB.\n" unless $company{redir};

    # now that we have the redirect for the local BBB site,
    # we'll try to download its contents and parse them.
    for (my $i = 1; $i <= $MAX_BBB_REFERRAL_PAGE_RETRIES; ++$i) {
        my $data = get($company{redir});

        # did we have a problem? pause if so.
        unless (defined $data) {
           print "Connection to BBB failed. Waiting 5 seconds to retry.\n";
           sleep(5); next; # let's try this again, shall we?
        }

        # strip line breaks so each pattern can match across lines,
        # then grab even more information.
        $data =~ s/[\n\f\r]//g;
        if ($data=~/Date:<\/b>.*?<td.*?>(.*?)<\/td>/i){$company{start}=$1;}
        if ($data=~/Entity:<\/b>.*?<td.*?>(.*?)<\/td>/i){$company{entity}=$1;}
        if ($data=~/l ?:<\/b>.*?<td.*?>(.*?)<\/td>/i){$company{principal}=$1;}
        if ($data=~/Phone.*?:<\/b>.*?<td.*?>(.*?)<\/td>/i){$company{phone}=$1;}
        if ($data=~/Fax.*?:<\/b>.*?<td.*?>(.*?)<\/td>/){$company{fax}=$1;}
        if ($data=~/Status:<\/b>.*?<td.*?>(.*?)<\/td>/){$company{mbr}=$1;}
        if ($data=~/BBB:<\/b>.*?<td.*?>(.*?)<\/td>/){$company{joined}=$1;}
        if ($data=~/sification:<\/b>.*?<td.*?>(.*?)<\/td>/){$company{type}=$1;}
        last if $data;
    }

    # print out the extra data we've found. the keys here
    # must match the ones filled in by the loop above.
    print "Further information (if any):\n";
    foreach (qw/start entity principal phone fax mbr joined type/) {
       next unless $company{$_}; # skip blanks.
       print " Start Date: " if $_ eq "start";
       print " Type of Entity: " if $_ eq "entity";
       print " Principal: " if $_ eq "principal";
       print " Phone Number: " if $_ eq "phone";
       print " Fax Number: " if $_ eq "fax";
       print " Membership Status: " if $_ eq "mbr";
       print " Date Joined BBB: " if $_ eq "joined";
       print " Business Classification: " if $_ eq "type";
       print "$company{$_}\n";
    } print "\n";

    # alright. we have all our magic data that we can get from the
    # BBB, so let's see if there's anything on PlanetFeedback.com to display.
    my $planetfeedback_url = "http://www.planetfeedback.com/sharedLetters".
                             "Results/1,2933,,00.html?frmCompany=".
                             uri_escape($company{name})."&frmFeedbackType".
                             "One=0&frmIndustry=0&frmFeedbackTypeTwo=0".
                             "&frmMaxValue=20&buttonClicked=submit1".
                             "&frmEventType=0";
    my $data = get($planetfeedback_url) or # go, speed
      die "Error downloading from PlanetFeedback: $!"; # racer, go!

    # did we get anything worth showing?
    if ($data =~ /not posted any Shared Letters/i) {
       print "No feedback found for company '$company{name}'\n";
    } else { print "Feedback available at $planetfeedback_url\n"; }
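Both download loops in the script follow the same try, pause, and retry pattern. As a minimal sketch (not part of the original script), that pattern could be pulled into a single helper. The name fetch_with_retries and its arguments are hypothetical; the caller passes in the fetch and the "is this response usable?" check as code references, so the helper itself needs no network modules:

```perl
use strict;
use warnings;

# Hypothetical helper: call $fetch up to $max_tries times, sleeping
# $delay seconds between failed attempts. $is_ok decides whether a
# defined response is actually usable.
sub fetch_with_retries {
    my ($fetch, $is_ok, $max_tries, $delay) = @_;
    for my $attempt (1 .. $max_tries) {
        my $data = $fetch->();
        return $data if defined $data and $is_ok->($data);
        # no point in pausing after the final attempt.
        sleep $delay if $delay and $attempt < $max_tries;
    }
    return; # every attempt failed.
}

# In bbbcheck.pl, the search loop might then shrink to something like:
#
#   my $data = fetch_with_retries(
#       sub { get($search_url) },
#       sub { $_[0] !~ /apologize.*delay/ },
#       $MAX_BBB_SEARCH_RETRIES, 5,
#   ) or die "Connection to BBB failed.\n";
```

Keeping the acceptance test separate from the fetch means the same helper could serve the referral page loop, which only cares whether the download returned anything at all.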

Running the Hack

Invoke the script on the command line with the URL of a business site you'd like to check. If there's no match at the BBB (a distinct possibility, since it doesn't contain every known business URL), the script will stop:

    % perl bbbcheck.pl http://www.oreilly.com
    There were no companies found for this URL.

If there is a match, it'll give you some information about the company, then check PlanetFeedback.com for additional data. If they've received any comments on the business at hand, you'll be provided a URL for further reading.

Let's do a little checking up on Microsoft, shall we?

    % perl bbbcheck.pl http://www.microsoft.com

    Company name and address:
    MICROSOFT CORPORATION
    9255 Towne Center Dr 4th Fl
    SAN DIEGO, CA

    Further information (if any):
     Start Date: January 1975
     Type of Entity: Corporation
     Principal: Ms Shaina Houston FMS
     Phone Number: January 1975
     Fax Number: (858) 909-3838
     Membership Status: Yes
     Date Joined BBB: May 2003
     Business Classification: Computer Sales & Service

    Feedback available at http://www.planetfeedback.com/sharedLettersResults/
    1,2933,,00.html?frmCompany=MICROSOFT%20CORPORATION&frmFeedbackTypeOne=0&
    frmIndustry=0&frmFeedbackTypeTwo=0&frmMaxValue=20&buttonClicked=submit1&
    frmEventType=0

Hacking the Hack

The script here is extensive in what it does. After all, it visits two sites and provides you with a fair amount of information. But despite that, it's still pretty bare-bones. The output is sent only to the screen, and the amount of information it scrapes is limited because of the multiple formats of the various local BBBs.

So, when you're planning on improving the script, focus on two different things. First, think about how you might scrape more information if it were presented in a more standard format. For example, say you want to search only businesses in San Francisco. The BBB search site allows for that, though you'll have to search by business name instead of URL (see the first search option at http://search.bbb.org/search.html). If you search for businesses only in San Francisco, you'll get results only from the Golden Gate BBB. With one data format, you can access more information, including any complaint numbers and the company's standing in the BBB.
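If you do settle on a single bureau's page format, one way to make new fields cheap to add is to keep the label patterns in a table rather than writing one if statement per field. The following is only a sketch: the field names and the extract_fields routine are hypothetical, and the label and table-cell layout is assumed from the patterns the script already uses, not checked against any live BBB page.

```perl
use strict;
use warnings;

# field name => pattern for the label that precedes the value.
# Supporting another field in a known format means adding one row.
my @FIELDS = (
    [ start  => qr/Date/     ],
    [ entity => qr/Entity/   ],
    [ phone  => qr/Phone.*?/ ],
    [ fax    => qr/Fax.*?/   ],
);

# Pull each "Label:</b> ... <td>value</td>" pair out of a page and
# return the matches as a hash of field name => value.
sub extract_fields {
    my ($html) = @_;
    my %found;
    for my $field (@FIELDS) {
        my ($key, $label) = @$field;
        $found{$key} = $1
            if $html =~ /${label}:<\/b>.*?<td.*?>(.*?)<\/td>/s;
    }
    return %found;
}
```

A call such as my %company = extract_fields($data) would then stand in for the run of pattern matches in the second loop, and fields missing from a page are simply absent from the hash.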

The second thing you'll want to improve is output. Currently, this hack sends out only plain text, and, as you saw previously, the PlanetFeedback.com URL is long enough to wrap across several lines. To fix this, you might want to spit out HTML instead, allowing you to simply click a link instead of copying and pasting. For that matter, you could set up an array with several business URLs and send all their results to the same file.
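As a rough sketch of the HTML idea, the routines below turn a list of company names and feedback URLs into a small page of clickable links. None of this is in the original script: html_escape, company_to_html, and report_html are hypothetical names, and the page layout is just one plausible choice.

```perl
use strict;
use warnings;

# Escape the characters that are special in HTML text and attributes.
sub html_escape {
    my ($text) = @_;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    $text =~ s/"/&quot;/g;
    return $text;
}

# Render one company as a list item with a clickable feedback link.
sub company_to_html {
    my ($name, $feedback_url) = @_;
    return sprintf qq{<li>%s: <a href="%s">feedback</a></li>\n},
        html_escape($name), html_escape($feedback_url);
}

# Build a whole page from a list of [name, feedback_url] pairs,
# one pair per business URL that was checked.
sub report_html {
    my @companies = @_;
    my $items = join '', map { company_to_html(@$_) } @companies;
    return "<html><body><ul>\n$items</ul></body></html>\n";
}
```

To cover several businesses in one run, you could loop over an array of URLs, collect a [name, feedback URL] pair for each, and print the result of report_html to a single file opened for writing.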



Spidering Hacks
ISBN: 0596005776
Year: 2005
Pages: 157
