Ever wonder where your network traffic goes when you visit a site on the 'Net?
Not long after the Internet came into wide use, a new word gained currency among the public to evoke the experience of finding information and communicating with people all over the globe: cyberspace. In fact, the word "cyberspace," derived from cybernetics, the study of control and communication in machines and living things, has been so overused that it comes across as trite or hackneyed today. All the same, the word conjures up an image of sweeping digital vistas, waiting to be explored and homesteaded, and so has a great deal of potency, which is probably why the word became a cliche in the first place.
The fact, however, is that the Internet works so beautifully and, usually, so transparently, that most people don't take the time to consider that cyberspace and meatspace (as we hackers sometimes jokingly refer to the Real World) are actually connected. Obviously, every web server, DSL router, cable modem, dial-up service, and so on, is located somewhere on the planet. But who knows where?
3.11.1. From Clicks to Bricks
As it happens, the Whereis service at http://www.parsec.it/whereis/ knows where Internet addresses are hosted in the real world, sometimes with astonishing accuracy. The front page of the site, clearly modeled after Google's, offers a simple search box, where you can type in an Internet domain name or an IP address in dotted-quad format (e.g., 192.168.1.1). Clicking the locate button takes you to the view shown in Figure 3-18, with a Google Map showing a marker over the most probable physical location of that Internet address. Clicking on the marker pops up some basic information about the address, including the country and locale that it's believed to be physically located in or near.
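The search box accepts either kind of input, so a service like this has to tell the two apart before it can look anything up. The following is a minimal sketch of that first step in Python, not a description of how Whereis itself does it; the function name classify_query is made up for illustration. It also shows that the example address above, 192.168.1.1, sits in a private range, so no locator could place it on a map.

```python
import ipaddress


def classify_query(text):
    """Return ("ip", address) for a dotted-quad IPv4 address,
    or ("domain", name) for anything else. Hypothetical helper."""
    query = text.strip()
    try:
        return ("ip", ipaddress.IPv4Address(query))
    except ipaddress.AddressValueError:
        # Not four valid octets, so treat the input as a domain name.
        return ("domain", query.lower())


kind, value = classify_query("192.168.1.1")
print(kind, value, value.is_private)   # private addresses have no public location
print(classify_query("www.google.com"))
```

A domain name would still need to be resolved to an IP address before any location lookup could happen.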
What's particularly interesting about Whereis is how accurately it identifies the approximate location of high-speed residential connections, such as DSL and cable modems. If you have such a connection, try putting in your own IP address at home. If you don't happen to know what your IP address is at home (and it may be assigned dynamically), you can use an online service like http://www.whatismyipaddress.com/ to find out what public IP address you're appearing from, and then cut and paste that address into the Whereis search box.
What's even more interesting about Whereis is when it fails, as it might if your Internet provider uses an upstream web proxy (which AOL has been known to do). In that case, Whereis may decide that you're in, say, Reston, Virginia, even though the sign outside your house says "Welcome to Rapid City, Iowa!" Note also that what Whereis tries to return is the physical location of the hardware hosting the domain, not the place that the web site or even the domain name purports to represent, which is why web sites such as zooleika.org.uk and www.freemap.in turn up in Fremont, California, rather than in London or Mumbai. Also, some large web sites may be hosted in several locations, with different IP addresses for the same domain name, so different tries for a single name might return different locations.
Figure 3-18. Whereis correctly places www.google.com in Mountain View, California
3.11.2. How It Works
The fact that Whereis does as well as it does seems nothing short of miraculous, under the circumstances. The Internet Assigned Numbers Authority (IANA) delegates the allocation of IP addresses and domain names to a number of organizations around the world. Each of these organizations maintains its own public database of address assignments, which typically can be accessed through the whois service on the 'Net. The problem is that each whois database returns results that have different information and are formatted in different ways. Not only that, but even once you've got, say, a mailing address for the owner of a given range of IP addresses, you still don't necessarily know where that place is in the world, in terms of latitude and longitude, which means you can't yet put it on a map or say what else is nearby.
So, the problem of physically locating an IP address turns out to be quite difficult; so difficult, in fact, that there aren't any worthwhile free-as-in-freedom sources of this information on the 'Net that are more precise than the country level. Mapping IP addresses to countries can be useful for collecting statistics on international visitors to your weblog, but it's no good for making decent maps. Instead, Parsec Tech s.r.l., the maintainers of the Whereis service, would seem to be using MaxMind's GeoIP database, which, as you can see, does offer pretty impressive results. You can learn about MaxMind's products, which, interestingly enough, are used first and foremost for credit card fraud prevention, at http://www.maxmind.com/.
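To make the country-level case concrete, here is a minimal sketch of how a lookup against a table of address ranges might work. It is not MaxMind's format or API. The table below is entirely hypothetical: the three networks are reserved documentation ranges, and the country codes attached to them are invented for the example.

```python
import bisect
import ipaddress

# Hypothetical allocation table. The networks are reserved documentation
# ranges and the country codes are invented for illustration only.
ALLOCATIONS = [
    ("192.0.2.0/24", "US"),
    ("198.51.100.0/24", "IT"),
    ("203.0.113.0/24", "IN"),
]

# Convert each network into (first address, last address, country) as integers.
RANGES = []
for cidr, country in ALLOCATIONS:
    network = ipaddress.ip_network(cidr)
    RANGES.append((int(network[0]), int(network[-1]), country))
RANGES.sort()
STARTS = [start for start, _end, _country in RANGES]


def country_for(ip):
    """Return the country code for an address, or None if it is not in the table."""
    value = int(ipaddress.ip_address(ip))
    # Find the last range that starts at or before this address.
    index = bisect.bisect_right(STARTS, value) - 1
    if index >= 0:
        start, end, country = RANGES[index]
        if start <= value <= end:
            return country
    return None


print(country_for("198.51.100.7"))
print(country_for("10.0.0.1"))
```

Sorting the ranges and using a binary search keeps each lookup fast even when the table holds many thousands of ranges, which matters if you are processing a whole web server log.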
3.11.3. Hacking the Hack
Seeing where your favorite web sites actually live in the real world can be quite fascinating, but doesn't it make you curious to see how your requests get there in the first place? It definitely did for me, so I decided to hack the hack, by building a traceroute mapping service on top of Whereis. In technical parlance, the term traceroute is used to describe an attempt to discern which computers an Internet Protocol packet travels through on the way to its destination, after the Unix network diagnostics tool designed for that purpose. (A very similar tool ships with Windows, but its name has been abbreviated to tracert.)
The traceroute utility works as follows: all traffic sent over the Internet is broken up into packets, and each packet that's sent is marked with the IP address of the sender and the intended receiver. Additionally, each IP packet is marked with a Time-to-Live (TTL), which specifies how many network hops the packet can travel through. Each time a computer forwards the packet towards its destination, the TTL value in the packet header is decremented by one. If the TTL ever reaches zero, that computer discards the packet and sends a message (an ICMP Time Exceeded message) back to the sender, reporting that the packet expired before reaching the receiver. This feature of the Internet Protocol is designed to allow network engineers to detect loops and other routing problems.
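The TTL rule is easy to see in a toy simulation. The sketch below sends no real packets; the function name forward and the addresses in the path (all taken from reserved documentation ranges) are assumptions made for illustration.

```python
def forward(path, ttl):
    """Simulate one packet travelling along `path`, a list of addresses
    whose last entry is the destination. Returns (address, outcome)."""
    for router in path[:-1]:
        ttl -= 1                      # each forwarding computer decrements the TTL
        if ttl == 0:
            # The packet is discarded here, and this router reports back.
            return router, "time-exceeded"
    return path[-1], "delivered"


# Hypothetical three-hop path: two routers, then the destination.
PATH = ["192.0.2.1", "198.51.100.1", "203.0.113.80"]

print(forward(PATH, 1))    # expires at the first router
print(forward(PATH, 2))    # expires at the second router
print(forward(PATH, 64))   # plenty of TTL, so the packet arrives
```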
traceroute piggybacks on this process, by first sending out a test packet to a given destination with a TTL of 1, and then a packet with a TTL of 2, and so on. Each time, the computer at each successive network hop along the way drops the TTL to zero and bounces the packet, thereby revealing its IP address. The process continues until the TTL value reaches the number of network hops to the destination, at which point the entire route is known. The time taken to perform each step provides an estimate of round-trip network latency and can be used to identify bottlenecks in a network route.
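The probing loop itself is short. Here is a minimal sketch of it in Python, run against a simulated route rather than the real network, since a real implementation typically needs raw sockets and elevated privileges. The route, its latencies, and the helper send_probe are all hypothetical stand-ins.

```python
# Hypothetical route as (address, simulated round-trip time in ms).
# The addresses come from reserved documentation ranges.
ROUTE = [("192.0.2.1", 1.2), ("198.51.100.1", 9.8), ("203.0.113.80", 23.5)]


def send_probe(ttl):
    """Stand-in for sending one test packet with the given starting TTL.
    Returns (address that answered, round-trip ms, reached destination?)."""
    hops = min(ttl, len(ROUTE))       # the packet expires or arrives at this hop
    address, rtt = ROUTE[hops - 1]
    return address, rtt, hops == len(ROUTE)


def traceroute(max_hops=30):
    """Probe with TTL 1, 2, 3, ... until the destination itself answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        address, rtt, reached = send_probe(ttl)
        hops.append((ttl, address, rtt))
        if reached:
            break
    return hops


for ttl, address, rtt in traceroute():
    print(ttl, address, rtt, "ms")
```

The max_hops limit is there so that the loop still ends if the destination never answers.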
Figure 3-19. The route from New York City to googlemapshacks.com
Try it with a few different hosts and see what sort of results you get. You might see certain patterns emerge; for example, your packets may have to run through a number of specific hops just to get out of your ISP. For variety's sake, you might try mapping traceroute results from any of the online traceroute services listed at http://traceroute.org/. Finally, if you map network routes that cross the Pacific Ocean, you may discover an interesting flaw in Google Maps polylines, as shown in Figure 3-20, which is that they can't cross the International Date Line! Oops!
Figure 3-20. Hey, wait! You're going the wrong way!
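One possible way around the date line problem is to shift longitudes before drawing, so that no two consecutive points are more than 180 degrees apart. The sketch below does only that arithmetic. Whether a given mapping API will draw longitudes outside the range -180 to 180 the way you want is an assumption you would need to test, and the function name and sample coordinates are illustrative only.

```python
def unwrap_longitudes(points):
    """Given a list of (lat, lng) pairs, return a new list in which each
    longitude is shifted by a multiple of 360 degrees so that consecutive
    points never differ by more than 180 degrees of longitude."""
    if not points:
        return []
    result = [points[0]]
    for lat, lng in points[1:]:
        previous = result[-1][1]
        while lng - previous > 180:
            lng -= 360
        while lng - previous < -180:
            lng += 360
        result.append((lat, lng))
    return result


# Approximate coordinates for Tokyo, Honolulu, and San Francisco.
route = [(35.68, 139.69), (21.31, -157.86), (37.77, -122.42)]
for lat, lng in unwrap_longitudes(route):
    print(lat, round(lng, 2))
```

With the longitudes unwrapped, the line from Tokyo keeps heading east across the Pacific instead of doubling back across Asia, Europe, and the Atlantic.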