Hack 77 Geographic Distance and Back Again


When you're traveling from one place to another, it's usually handy to know exactly how many miles you're going to be on the road. One of the best ways to get the most accurate result is to use latitude and longitude.

Dr. Seuss once wrote, "From here to there, from near to far, funny things are everywhere." But just how far apart are those funny things, anyway?

Given the latitude and longitude of two terrestrial objects, and assuming the earth to be a perfect sphere with a smooth surface, the "great circle" calculation to find the shortest surface distance between those two objects is a simple bit of trigonometry. Even though the earth is neither smooth nor a perfect sphere, the calculation is surprisingly accurate. I found the position (i.e., the latitude and longitude) of my home and of the home of a friend who lives a short distance away. Using a town map and a ruler, I measured the distance at 7.49 miles. Using the positions and trigonometry, the calculated distance came out at 7.43 miles.
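The "simple bit of trigonometry" can be sketched in a few lines of Perl. This is a minimal sketch using the haversine form of the great-circle formula; it is not the code Geo::Distance uses internally. The sub names and the mean earth radius of 3958.8 miles are assumptions made for illustration, and a different radius shifts the result slightly.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed mean earth radius in miles; not a value taken from the hack.
my $EARTH_RADIUS_MILES = 3958.8;

# atan2(1,1) is pi/4, so degrees * (pi/4) / 45 gives radians.
sub deg2rad { return $_[0] * atan2( 1, 1 ) / 45; }

# Haversine great-circle distance between two points given in degrees.
sub great_circle_miles {
    my ( $lat1, $lon1, $lat2, $lon2 ) = map { deg2rad($_) } @_;

    my $sin_dlat = sin( ( $lat2 - $lat1 ) / 2 );
    my $sin_dlon = sin( ( $lon2 - $lon1 ) / 2 );

    my $h = $sin_dlat * $sin_dlat
          + cos($lat1) * cos($lat2) * $sin_dlon * $sin_dlon;

    # Central angle between the two points, in radians.
    my $angle = 2 * atan2( sqrt($h), sqrt( 1 - $h ) );

    return $EARTH_RADIUS_MILES * $angle;
}

# The two Connecticut positions reported later in this hack.
printf "%.2f miles\n",
    great_circle_miles( 41.46380, -73.42021, 41.35659, -73.41078 );
```

For the two positions used here, the result lands close to the 7.43 miles quoted above; the exact digits depend on the radius chosen.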

That was good enough for me, so I set about to create a program that would accept two addresses and return the distance between them. Initially, I thought I'd have the program done in about 30 minutes. Ultimately, it required a few hours of research and a creative hack of MapPoint. The tough part? Getting the true latitude and longitude for an address, something I mistakenly thought would be trivial on our little high-tech planet. Not so!

The Latitude/Longitude Question

The difficulty associated with this hack can be demonstrated through a very simple exercise: right now, before you read any further, using any online resource that you like, go find the latitude and longitude of your house; not just of your Zip Code, but of your actual house.

Not so easy, is it? In fact, I was surprised by the difficulty this problem presented. I found several resources (the easiest to use being the U.S. Census web site, http://www.census.gov) that could turn Zip Codes into positions, presumably somewhere near the center of the Zip Code's geographic region, but virtually nothing that would give me the position of an actual address. In the past, I used a mapping service called MapBlast! (http://www.mapblast.com), and I thought I recalled that this service would give me map positions. However, a trip to MapBlast! now lands you at MapPoint, Microsoft's mapping service, which apparently acquired MapBlast! in the not-too-distant past.

At this point, I'll spare you the details of my research and cut to the chase:

  • If you want the position for a Zip Code, it's easy; there are lots of sites and even some Perl packages that will do this automatically for you.

  • The major mapping services will take a position and present you with a map, but they won't give you a position if you give them an address.

  • Microsoft has a nice set of web service APIs in addition to MapPoint, and they can be used to find the position of an address. Unfortunately, it's a subscription service.

  • Pay services (search http://www.geocode.com to find a few) can turn an address into a position.

  • Whether intentional or not, MapPoint does publish the position for an address in its publicly accessible web interface. It's not published on the page; it's published in the URL.

I found that last item in the list most intriguing. I discovered it quite by accident. I had mapped my address and by chance took a look at the URL. I recognized some numbers that looked suspiciously like my latitude and longitude. I played around a bit and found the behavior was consistent; MapPoint returns a latitude/longitude position in its URL whenever it maps an address. Try it. Go to http://mappoint.msn.com/ and map an address. Then, look closely at the URL for the parameter whose name is C. It's the latitude and longitude of the address you just looked up. Now, all I needed to do was find a way to make MapPoint give that data up to a Perl script!
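Pulling the position out of such a URL takes only a small pattern match. The sketch below assumes the C parameter holds the latitude, a three-character separator (taken here to be a URL-encoded comma), and then the longitude. The sample URL and the sub name position_from_url are hypothetical; only the general shape of the C parameter comes from the description above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical map URL; the real one carries more parameters.
my $sample_url =
    'http://mappoint.msn.com/map.aspx?L=USA&C=41.46380%2c-73.42021';

# Return (latitude, longitude) from the C parameter, or an empty list
# when the URL has no recognizable C parameter.
sub position_from_url {
    my ($url) = @_;
    my ( $lat, $lon ) =
        $url =~ /[?&]C=(-?[0-9]+\.[0-9]+)...(-?[0-9]+\.[0-9]+)/;
    return defined $lat ? ( $lat, $lon ) : ();
}

my ( $lat, $lon ) = position_from_url($sample_url);
if ( defined $lat ) {
    print "latitude $lat, longitude $lon\n";
}
else {
    print "no position found in URL\n";
}
```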

Hacking the Latitude Out of MapPoint

Getting MapPoint to respond to a Perl script as it would to a browser was a bit more difficult than a straightforward GET or POST. My first few quick attempts earned me return data that contained messages like "Function not allowed," "ROBOT-NOINDEX," and "The page you are looking for does not exist." In the end, I grabbed my trusty packet analyzer and monitored the traffic between IE and MapPoint, ultimately learning what it would take to make MapPoint think it was talking to a browser and not to a script. Here's what happens:

  1. The first GET request to http://mappoint.msn.com/ earns you a Location: HTTP header in return. The new location redirects to home.aspx, prefixed by a pathname that includes a long string of arbitrary characters, presumably a session ID or some other form of tracking information.

  2. A GET on the new location retrieves the "Find a Map" form. Among the obvious fields (street, city, and so on) are some hidden ones. In particular, one hidden field named __VIEWSTATE contains about 1 KB of encoded data. It turns out that returning the exact __VIEWSTATE is important when sending the address query.

  3. Next, we do a POST to send MapPoint the address we want mapped. In addition to the address information and the __VIEWSTATE field, there are a few other hidden fields to send. In the present code, we send a request specifically for an address in the United States. MapPoint supports other countries, as well as "Place" queries for the entire world, and it wouldn't be too much work to extend the program to handle these as well.

  4. In response to the POST, we get another Location: HTTP header, this time redirecting to map.aspx. The URL contains several arguments, and among them is the latitude/longitude data that we want.

  5. If you perform a GET on the new location, now you get the map. Our script doesn't do this last GET, however, because the data we want is in the URL, not on the result page.
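Step 2 depends on capturing the hidden __VIEWSTATE value exactly as the server sent it, so that it can be echoed back in the POST of step 3. The following sketch shows one way to do that with a regular expression. The HTML is a made-up stand-in for the "Find a Map" page, and the sub name extract_viewstate is hypothetical; the real form markup may differ.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the page returned in step 2; not the real MapPoint markup.
my $sample_page = <<'HTML';
<form method="post" action="home.aspx">
<input type="hidden" name="__VIEWSTATE" value="dDwtMTIzNDU2Nzg5Ozs+" />
<input type="text" name="FndControl:StreetText" />
</form>
HTML

# Return the raw __VIEWSTATE value, or undef when the field is absent.
sub extract_viewstate {
    my ($page) = @_;
    my ($state) = $page =~ /name="__VIEWSTATE"\s+value="([^"]*)"/s;
    return $state;
}

my $viewstate = extract_viewstate($sample_page);
print defined $viewstate
    ? "Got __VIEWSTATE: $viewstate\n"
    : "No __VIEWSTATE field found\n";
```

The value is kept as an opaque string on purpose: nothing in the steps above requires decoding it, only sending it back unchanged.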

The Code

If you take a look at the GetPosition function in the code, you'll see that it follows the first four steps in the previous section exactly, stopping short of the final GET described in step 5. The code also includes a simple routine to parse an address (to make the thing user-friendly, not because we had to) and a mainline to glue it all together and report the results. I used a nice package named Geo::Distance to perform the actual distance calculations. Time for some Perl!

Save the following code as geodist.pl:

 #!/usr/bin/perl -w
 # Usage: geodist.pl --from="fromaddr" --to="toaddr" [--unit="unit"]
 # See ParseAddress() below for the format of addresses. Default unit is
 # "mile". Other units are yard, foot, inch, kilometer, meter, centimeter.

 use strict;
 use Getopt::Long;
 use Geo::Distance;
 use HTTP::Request::Common;
 use LWP::UserAgent;

 # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

 my $_ADDRESS_REGEX = q<(((([^\,]+),\s*)?([^\,]+),\s*)?([A-Z]{2}))?> .
   q<(\s*(\d{5}(-\d{4})?))?>;

 sub ParseAddress {
   # Moderately robust regex parse of an address of the form:
   #   Street Address, City, ST ZIP
   # Assumes that a city implies a state, and a street address implies a
   # city; otherwise, all fields are optional. Does a good job so long as
   # there are no commas in street address or city fields.

   my $AddrIn = shift;
   my $ComponentsOut = shift;

   $AddrIn =~ /$_ADDRESS_REGEX/;

   # Capture groups: 4 = street, 5 = city, 6 = state, 8 = ZIP.
   $ComponentsOut->{Address} = $4 if $4;
   $ComponentsOut->{City} = $5 if $5;
   $ComponentsOut->{State} = $6 if $6;
   $ComponentsOut->{Zip} = $8 if $8;
 }

 # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

 sub GetPosition {
   # Hack mappoint.msn.com to obtain the longitude and latitude of an
   # address. MapPoint doesn't actually return lon/lat as user data, but
   # it can be found in a Location header when a successful map request is
   # made. Testing has shown this to be a robust hack. Biggest caveat
   # presently is failure when MapPoint returns multiple address matches.

   my $AddressIn = shift;
   my $LatitudeOut = shift;     # reference to a scalar
   my $LongitudeOut = shift;    # reference to a scalar

   # Create a user agent for HTTP requests.
   my $ua = LWP::UserAgent->new;

   # First do a simple request to get the redirect that MapPoint sends us.
   my $req = GET( 'http://mappoint.msn.com/' );
   my $res = $ua->simple_request( $req );

   # Save the redirect URI and then grab the full page.
   my $uri = $res->header( 'Location' );
   $req = GET( 'http://mappoint.msn.com' . $uri );
   $res = $ua->request( $req );

   # Get the __VIEWSTATE hidden input from the result.
   my ( $__VIEWSTATE ) =
     $res->content =~ /name="__VIEWSTATE" value="([^\"]*)"/s;

   # Construct the form fields expected by the mapper.
   $req = POST( 'http://mappoint.msn.com' . $uri,
     [ 'FndControl:SearchType' => 'Address',
       'FndControl:ARegionSelect' => '12',
       'FndControl:StreetText' => $AddressIn->{Address},
       'FndControl:CityText' => $AddressIn->{City},
       'FndControl:StateText' => $AddressIn->{State},
       'FndControl:ZipText' => $AddressIn->{Zip},
       'FndControl:isRegionChange' => '0',
       'FndControl:resultOffSet' => '0',
       'FndControl:BkARegion' => '12',
       'FndControl:BkPRegion' => '15',
       'FndControl:hiddenSearchType' => '',
       '__VIEWSTATE' => $__VIEWSTATE
     ] );

   # Works without referer, but we include it for good measure.
   $req->push_header( 'Referer' => 'http://mappoint.msn.com' . $uri );

   # Do a simple request because all we care about is the redirect URI.
   $res = $ua->simple_request( $req );

   # Extract and return the latitude/longitude from the redirect URI.
   ( $$LatitudeOut, $$LongitudeOut ) =
     ( $res->header( 'Location' ) || '' ) =~
       /C=(-?[0-9]+\.[0-9]+)...(-?[0-9]+\.[0-9]+)/;
 }

 # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

 sub main {
   # Get the command-line options.
   my ( $FromOpt, %FromAddress, $ToOpt, %ToAddress );
   my $UnitOpt = 'mile';
   GetOptions( "from=s" => \$FromOpt,
               "to=s"   => \$ToOpt,
               "unit=s" => \$UnitOpt );

   # Parse the addresses.
   ParseAddress( $FromOpt, \%FromAddress );
   ParseAddress( $ToOpt, \%ToAddress );

   # Get latitude/longitude for the addresses.
   my ( $FromLat, $FromLon, $ToLat, $ToLon );
   GetPosition( \%FromAddress, \$FromLat, \$FromLon );
   GetPosition( \%ToAddress, \$ToLat, \$ToLon );

   # If we at least got some numbers, then find the distance.
   if ( $FromLat && $FromLon && $ToLat && $ToLon ) {
     print "($FromLat,$FromLon) to ($ToLat,$ToLon) is ";
     my $geo = Geo::Distance->new;
     # distance_calc() is the method this hack was written against; check
     # the documentation of your installed Geo::Distance for its current name.
     print $geo->distance_calc( $UnitOpt, $FromLon,
                                $FromLat, $ToLon, $ToLat );
     if ( $UnitOpt eq 'inch' ) { print " inches\n"; }
     elsif ( $UnitOpt eq 'foot' ) { print " feet\n"; }
     else { print " ", $UnitOpt, "s\n"; }
   }
   else {
     print "Latitude/Longitude lookup failed for FROM address\n"
       if !( $FromLat && $FromLon );
     print "Latitude/Longitude lookup failed for TO address\n"
       if !( $ToLat && $ToLon );
   }
 }

 main();
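To see how the address pattern behaves on its own, the sketch below applies the same regular expression to a few sample inputs and reports which fields it recognizes. Counting the parentheses in the pattern, the street, city, state, and ZIP fall in capture groups 4, 5, 6, and 8. The sub name parse_address and the list of samples are illustrative and not part of geodist.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as in geodist.pl: "Street Address, City, ST ZIP".
my $ADDRESS_REGEX = q<(((([^\,]+),\s*)?([^\,]+),\s*)?([A-Z]{2}))?>
                  . q<(\s*(\d{5}(-\d{4})?))?>;

# Return a hash reference holding only the fields that were recognized.
sub parse_address {
    my ($addr) = @_;
    my %parts;
    if ( $addr =~ /$ADDRESS_REGEX/ ) {
        $parts{Address} = $4 if defined $4;
        $parts{City}    = $5 if defined $5;
        $parts{State}   = $6 if defined $6;
        $parts{Zip}     = $8 if defined $8;
    }
    return \%parts;
}

for my $addr ( '14 Horseshoe Drive, Brookfield, CT',
               'Los Angeles, CA',
               '06804' ) {
    my $p = parse_address($addr);
    print "$addr => ",
        join( '; ', map { "$_=$p->{$_}" } sort keys %$p ), "\n";
}
```

Because every part of the pattern is optional, the match itself never fails; an input such as a city with no state simply yields no fields at all.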

Running the Hack

A few quick examples show how the hack works:

 % perl geodist.pl --from="Los Angeles, CA" --to="New York, NY"
 (34.05466,-118.24150) to (40.71012,-74.00657) is 2448.15742500315 miles

 % perl geodist.pl --from="14 Horseshoe Drive, Brookfield, CT" \
     --to="5 Mountain Orchard, Bethel, CT"
 (41.46380,-73.42021) to (41.35659,-73.41078) is 7.43209675476431 miles

 % perl geodist.pl --from=06804 --to=06801
 (41.47364,-73.38575) to (41.36418,-73.39262) is 7.57999735385486 miles

If something goes wrong with a position lookup (either because MapPoint didn't find the address or because it found multiple addresses), the script simply indicates which address had a problem:

 % perl geodist.pl --from="Los Angeles, CA" --to="New York"
 Latitude/Longitude lookup failed for TO address

In this case, "New York" is too general and needs to be refined further.

Hacking the Hack

The most obvious enhancement is to address the two shortcomings of the existing hack: it works only with addresses within the U.S., and it fails if MapPoint returns multiple address matches. Addressing the first issue is a matter of adding some options to the command line and then changing the fields sent in the POST query. Addressing the second issue is a bit more difficult; it's easy to parse the list that comes back, but the question is what to do with it. Do you just take the first address in the list? This may or may not be what the user wants. A true solution would probably have to present the list to the user and allow him to choose.
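One way to handle the multiple-match case is to show the candidates and let the user pick one. The sketch below covers only that selection step: it assumes the list of candidate addresses has already been parsed out of MapPoint's response, which this hack does not do. The sub name choose_match and the sample candidates are hypothetical, and the input is simulated with an in-memory filehandle so the example runs unattended.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Present a numbered list on $out, read a choice from $in, and return the
# selected candidate. Returns nothing on an empty list or an invalid choice.
sub choose_match {
    my ( $candidates, $in, $out ) = @_;
    return unless @$candidates;
    return $candidates->[0] if @$candidates == 1;

    for my $i ( 0 .. $#$candidates ) {
        printf {$out} "%d. %s\n", $i + 1, $candidates->[$i];
    }
    print {$out} "Which address? ";

    my $choice = <$in>;
    return unless defined $choice && $choice =~ /^\s*(\d+)\s*$/;
    my $index = $1;
    return if $index < 1 || $index > @$candidates;
    return $candidates->[ $index - 1 ];
}

# Hypothetical candidate list; simulate the user typing "2".
my @demo_candidates = ( 'Springfield, IL', 'Springfield, MA' );
my $typed = "2\n";
open my $demo_in, '<', \$typed or die "cannot open in-memory input: $!";

my $picked = choose_match( \@demo_candidates, $demo_in, \*STDOUT );
print "\n", defined $picked ? "Using: $picked\n" : "No valid selection\n";
```

In an interactive script, passing \*STDIN and \*STDOUT as the two handles would prompt the user directly.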

Ron Pacheco



Spidering Hacks
ISBN: 0596005776
Year: 2005
Pages: 157
