Hack 26. Follow Your Packets Across the Internet

Ever wonder where your network traffic goes when you visit a site on the 'Net?

Not long after the Internet came into wide use, a new word gained currency among the public to evoke the experience of finding information and communicating with people all over the globe: cyberspace. In fact, the word "cyberspace," taken from cybernetics, a technical term for the study of control and communication in machines and living things, has been so overused that it comes across as trite or hackneyed today. All the same, the word conjures up an image of sweeping digital vistas, waiting to be explored and homesteaded, and so has a great deal of potency, which is probably why the word became a cliche in the first place.

The fact, however, is that the Internet works so beautifully and, usually, so transparently, that most people don't take the time to consider that cyberspace and meatspace (as we hackers sometimes jokingly refer to the Real World) are actually connected. Obviously, every web server, DSL router, cable modem, dial-up service, and so on, is located somewhere on the planet. But who knows where?

3.11.1. From Clicks to Bricks

As it happens, the Whereis service at http://www.parsec.it/whereis/ knows where Internet addresses are hosted in the real world, sometimes with astonishing accuracy. The front page of the site, clearly modeled after Google's, offers a simple search box, where you can type in an Internet domain name or an IP address in dotted-quad format (e.g., 192.168.1.1). Clicking the locate button takes you to the view shown in Figure 3-18, with a Google Map showing a marker over the most probable physical location of that Internet address. Clicking on the marker pops up some basic information about the address, including the country and locale that it's believed to be physically located in or near.

Whereis uses the standard Google Maps API to display the map on the results page. Embedded within a JavaScript block on the results page is a call to the GMarker( ) constructor, which specifies the physical coordinates of the Internet address and generates the marker that you see on the map.
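Because those coordinates sit in plain JavaScript in the page source, they can be pulled out with a regular expression. Here is a minimal sketch of that idea. The page fragment is hypothetical (the real Whereis markup may well differ), extractCoords is a name made up for illustration, and the argument order assumes the Google Maps API conventions of GPoint(x, y) for longitude and latitude and GLatLng(lat, lng).

```javascript
// Hypothetical fragment of a results page; the real Whereis markup may differ.
const pageSource = `
  var marker = new GMarker(new GPoint(-122.0574, 37.4192));
  map.addOverlay(marker);
`;

// Pull the two numeric arguments out of the first GMarker(...) call found.
// Returns { lat, lon } or null if no such call is present.
function extractCoords(source) {
  const num = "(-?\\d+(?:\\.\\d+)?)";
  const re = new RegExp(
    "new\\s+GMarker\\(\\s*new\\s+(GPoint|GLatLng)\\(\\s*" +
      num + "\\s*,\\s*" + num + "\\s*\\)"
  );
  const m = re.exec(source);
  if (!m) return null;
  const first = Number(m[2]);
  const second = Number(m[3]);
  // Assumption: GPoint takes (x, y), i.e. (longitude, latitude),
  // while GLatLng takes (latitude, longitude).
  return m[1] === "GPoint"
    ? { lat: second, lon: first }
    : { lat: first, lon: second };
}

console.log(extractCoords(pageSource));
```

Scraping like this is brittle by nature: if the page layout changes, the pattern stops matching, which is why the function returns null rather than guessing.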

What's particularly interesting about Whereis is how accurately it identifies the approximate location of high-speed residential connections, such as DSL and cable modems. If you have such a connection, try putting in your own IP address at home. If you don't happen to know what your IP address is at home (and it may be assigned dynamically), you can use an online service like http://www.whatismyipaddress.com/ to find out what public IP address you're appearing from, and then cut and paste that address into the Whereis search box.

What's even more interesting about Whereis is when it fails, as it might if your Internet provider uses an upstream web proxy (which AOL has been known to do). In that case, Whereis may decide that you're in, say, Reston, Virginia, even though the sign outside your house says "Welcome to Rapid City, Iowa!" Note also that what Whereis tries to return is the physical location of the hardware hosting the domain, not the place that the web site or even the domain name purports to represent, which is why web sites such as zooleika.org.uk and www.freemap.in turn up in Fremont, California, rather than in London or Mumbai. Also, it's conceivable that some large web sites might be hosted in different locations, with different IP addresses for the same domain name, which might result in different locations being returned on different tries for a single domain name.

Figure 3-18. Whereis correctly places www.google.com in Mountain View, California

3.11.2. How It Works

The fact that Whereis does as well as it does seems nothing short of miraculous, under the circumstances. The Internet Assigned Numbers Authority (IANA) licenses a number of organizations around the world to manage the allocation of IP addresses and domain names. Each of these organizations maintains its own public database of address assignments, which typically can be accessed through the whois service on the 'Net. The problem is that each whois database returns results that have different information and are formatted in different ways. Not only that, but even once you've got, say, a mailing address for the owner of a given range of IP addresses, you still don't necessarily know where that place is in the world, in terms of latitude and longitude, which means you can't yet put it on a map or say what else is nearby.

So, the problem of physically locating an IP address turns out to be quite difficult; so difficult, in fact, that there aren't any worthwhile free-as-in-freedom sources of this information on the 'Net that are more precise than the country level. Mapping IP addresses to countries can be useful for collecting statistics on international visitors to your weblog, but it's no good for making decent maps. Instead, Parsec Tech s.r.l., the maintainers of the Whereis service, would seem to be using MaxMind's GeoIP database, which, as you can see, does offer pretty impressive results. You can learn about MaxMind's products, which, interestingly enough, are used first and foremost for credit card fraud prevention, at http://www.maxmind.com/.

3.11.3. Hacking the Hack

Seeing where your favorite web sites actually live in the real world can be quite fascinating, but doesn't it make you curious to see how your requests get there in the first place? It definitely did for me, so I decided to hack the hack, by building a traceroute mapping service on top of Whereis. In technical parlance, the term traceroute is used to describe an attempt to discern which computers an Internet Protocol packet travels through on the way to its destination, after the Unix network diagnostics tool designed for that purpose. (A very similar tool ships with Windows, but its name has been abbreviated to tracert.)

The traceroute utility works as follows: all traffic sent over the Internet is broken up into packets, and each packet that's sent is marked with the IP address of the sender and the intended receiver. Additionally, each IP packet is marked with a Time-to-Live (TTL), which specifies how many network hops the packet can travel through. Each time a router forwards the packet towards its destination, the TTL value in the packet header is decremented by one, and if it ever reaches zero, the packet is discarded and a message (an ICMP "Time Exceeded" error) is sent back to the sender informing it that the packet never arrived. This feature of the Internet Protocol keeps packets caught in a routing loop from circulating forever, and it helps network engineers detect loops and other routing problems.

traceroute piggybacks on this process, by first sending out a test packet to a given destination with a TTL of 1, and then a packet with a TTL of 2, and so on. Each time, the computer at each successive network hop along the way drops the TTL to zero and bounces the packet, thereby revealing its IP address. The process continues until the TTL value reaches the number of network hops to the destination, at which point the entire route is known. The time taken to perform each step provides an estimate of round-trip network latency and can be used to identify bottlenecks in a network route.
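The probing loop is easier to see in a toy simulation than in prose. The sketch below does not touch the network at all: route is a made-up list of hops (using addresses reserved for documentation), and sendProbe and traceroute are illustrative names, not part of any real tool.

```javascript
// Hypothetical route: one entry per hop, with the destination last.
// These are documentation-range addresses, not real hosts.
const route = ["192.0.2.1", "198.51.100.7", "203.0.113.42"];

// Simulate one probe. Each hop decrements the TTL; the hop at which the
// TTL reaches zero replies, unless the probe has reached the destination.
function sendProbe(path, ttl) {
  for (let i = 0; i < path.length; i++) {
    ttl -= 1;
    if (i === path.length - 1) {
      return { from: path[i], reached: true };  // arrived at the destination
    }
    if (ttl === 0) {
      return { from: path[i], reached: false }; // TTL expired at this hop
    }
  }
  return null; // empty path: nothing to probe
}

// Send probes with TTL 1, 2, 3, ... and record who answers each one.
function traceroute(path, maxHops = 30) {
  const hops = [];
  for (let ttl = 1; ttl <= maxHops; ttl++) {
    const reply = sendProbe(path, ttl);
    if (reply === null) break;
    hops.push(reply.from);
    if (reply.reached) break;
  }
  return hops;
}

console.log(traceroute(route));
```

Note the maxHops cap: a real traceroute has one too, so that an unreachable destination doesn't keep it probing forever.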

Fortunately, traceroute has a sufficiently simple output format that the results can easily be parsed in JavaScript by a web browser, and that's exactly what the Google Maps traceroute at http://mappinghacks.com/projects/gmaps/traceroute.html does. Start by running traceroute <hostname> from your *nix or OS X terminal, or tracert <hostname> from the Windows command line, where <hostname> is an Internet domain name or an IP address. Copy the output to the clipboard, then go back to your web browser and paste the results into the text box on the right, as shown in Figure 3-19. Finally, click "Trace your packets!" and watch the hosts between you and your chosen destination appear on the map, one by one. Clicking on any of the markers that show up opens an info window that shows the country and locale, the servers hosted there, and the reported network latency to that host.

Figure 3-19. The route from New York City to googlemapshacks.com

On Windows, you can get to the command-line interface by going to Start → Run, typing in cmd, and then clicking OK.

Under the hood, when you click Trace, a JavaScript function parses the individual entries from the supplied traceroute output by looping over it with a regular expression. Next, it sends each IP address to Whereis via asynchronous XMLHttpRequest(), and then uses another regular expression to scrape the coordinates out of the JavaScript in the returned HTML page. As the results come back, a marker is created on the map for each unique location, using Google Maps GMarker objects. Some care is taken to note when Internet hosts are listed as being at the same location, so as to avoid redundant overlapping markers. Additionally, a line is drawn between each successive location with GPolyline overlays, to mark out the path of your packets as they speed through the ether. The code runs to over 180 lines, so we don't have room to print it here, but you can always view the source of the page in your browser if you're curious to see how it works.
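To give a flavor of the parsing and de-duplication steps without reprinting the page's code, here is a minimal sketch written from scratch. It is not the code from the traceroute page: the function names, the regular expressions, the sample output, and the coords lookup table (standing in for answers from Whereis) are all assumptions made for illustration.

```javascript
// Parse pasted traceroute/tracert output into { hop, ip, ms } records.
function parseTraceroute(text) {
  const hops = [];
  for (const line of text.split(/\r?\n/)) {
    const hop = /^\s*(\d+)\s/.exec(line);          // hop lines start with a number
    if (!hop) continue;                            // skips headers and blank lines
    // Prefer an address in (...) or [...]; else take the first bare dotted quad.
    const ip = /[(\[](\d{1,3}(?:\.\d{1,3}){3})[)\]]/.exec(line) ||
               /(\d{1,3}(?:\.\d{1,3}){3})/.exec(line);
    if (!ip) continue;                             // "* * *" timeouts have no address
    const ms = /(\d+(?:\.\d+)?)\s*ms/.exec(line);  // first reported latency
    hops.push({ hop: Number(hop[1]), ip: ip[1], ms: ms ? Number(ms[1]) : null });
  }
  return hops;
}

// Collapse hosts at identical coordinates into one marker each, and build
// the list of successive distinct locations for the connecting line.
function groupByLocation(located) {
  const markers = new Map(); // "lat,lon" -> { lat, lon, hosts }
  const path = [];
  let lastKey = null;
  for (const h of located) {
    const key = `${h.lat},${h.lon}`;
    if (!markers.has(key)) markers.set(key, { lat: h.lat, lon: h.lon, hosts: [] });
    markers.get(key).hosts.push(h.ip);
    if (key !== lastKey) {
      path.push({ lat: h.lat, lon: h.lon });
      lastKey = key;
    }
  }
  return { markers: [...markers.values()], path };
}

// Made-up sample output using documentation-range addresses.
const sample = [
  "traceroute to example.com (203.0.113.42), 30 hops max, 60 byte packets",
  " 1  gw.example.net (192.0.2.1)  1.123 ms  0.987 ms  1.045 ms",
  " 2  * * *",
  " 3  core.example.net (198.51.100.7)  12.4 ms  12.9 ms  12.2 ms",
  " 4  203.0.113.42 (203.0.113.42)  24.5 ms  25.1 ms  24.9 ms",
].join("\n");

// Hypothetical lookup results; the real tool asks Whereis for these.
const coords = {
  "192.0.2.1": { lat: 40.71, lon: -74.0 },
  "198.51.100.7": { lat: 40.71, lon: -74.0 },
  "203.0.113.42": { lat: 37.42, lon: -122.08 },
};

const located = parseTraceroute(sample).map((h) => ({ ...h, ...coords[h.ip] }));
const result = groupByLocation(located);
console.log(result.markers.length, result.path);
```

In this sample the first two responding hosts share a location, so they end up on a single marker, and the connecting line has only two points.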

Try it with a few different hosts and see what sort of results you get. You might see certain patterns emerge; for example, your packets may have to run through a number of specific hops just to get out of your ISP. For variety's sake, you might try mapping traceroute results from any of the online traceroute services listed at http://traceroute.org/. Finally, if you map network routes that cross the Pacific Ocean, you may discover an interesting flaw in Google Maps polylines, as shown in Figure 3-20, which is that they can't cross the International Date Line! Oops!
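If you'd like to know ahead of time which legs of a route will hit this problem, one simple test is whether two successive longitudes differ by more than 180 degrees, which means the short way between them runs through the date line. The sketch below only detects such legs and doesn't attempt a fix; the function names and the rough city coordinates are mine, chosen for illustration.

```javascript
// True if the shorter way between two longitudes passes through +/-180 degrees.
function crossesDateLine(lonA, lonB) {
  return Math.abs(lonA - lonB) > 180;
}

// Return the index of the starting point of every leg that crosses the line.
function dateLineLegs(points) {
  const crossing = [];
  for (let i = 1; i < points.length; i++) {
    if (crossesDateLine(points[i - 1].lon, points[i].lon)) {
      crossing.push(i - 1);
    }
  }
  return crossing;
}

// Approximate coordinates, for illustration only.
const hops = [
  { name: "New York", lat: 40.71, lon: -74.0 },
  { name: "San Francisco", lat: 37.77, lon: -122.42 },
  { name: "Tokyo", lat: 35.68, lon: 139.69 },
];

console.log(dateLineLegs(hops));
```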

Figure 3-20. Hey, wait! You're going the wrong way!
