Hack 84. Build a Free World Gazetteer

Build on the free GEOnet data set of millions of world cities and landmarks.

A gazetteer is "a geographical dictionary; a book giving the names and descriptions, etc., of many places" (Webster's). Many commercial gazetteer services exist online. They provide an index of "interesting places" with geospatial coordinates: an index for a world atlas.

A gazetteer service is useful for extracting spatial information from text. It also helps you work out what your own interesting places are near.

7.8.1. The Web Interface

You can try out this simple gazetteer at http://mappinghacks.com/cgi-bin/gazetteer.cgi. Try typing a place name into the search box. Figure 7-6 shows the results of a sample search. You can search for an exact match, or you can match from the start or the end of the name. An option to filter results by country is available. You should see any matches for your place name, along with the country it's in, the feature type of the place, and an approximate GPS reference for it.

Figure 7-6. A simple web gazetteer service

The gazetteer responds to simple GET requests. You can add query=place to look for a place name, and filter by asking for country=ISO code. This will only return exact matches for the name; optionally, you can try a fuzzy match from the start or the end of your query by appending match=start or match=end to the URL. This gazetteer also has a simple machine-readable output. The following URL, with format=rdf appended to the end, will return the same data in easy-to-parse XML:

http://mappinghacks.com/cgi-bin/index.cgi?query=London&country=UK&format=rdf
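
As a rough sketch of how a client might assemble such a request, the following Python snippet builds the URL from the parameters described above. Python is used here for illustration (the hack's own script is Perl), the function name gazetteer_url is hypothetical, and the snippet only constructs the URL; it does not assume the service is still reachable.

```python
from urllib.parse import urlencode

# Base URL taken from the example request in this hack.
BASE_URL = "http://mappinghacks.com/cgi-bin/index.cgi"

def gazetteer_url(query, country=None, match=None, fmt=None):
    """Build a GET URL from the query, country, match, and format parameters."""
    params = {"query": query}
    if country:
        params["country"] = country
    if match in ("start", "end"):   # the only fuzzy-match modes described
        params["match"] = match
    if fmt:
        params["format"] = fmt
    return BASE_URL + "?" + urlencode(params)

print(gazetteer_url("London", country="UK", fmt="rdf"))
```

Using urlencode rather than string concatenation means place names containing spaces or punctuation are escaped correctly.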

How does this gazetteer work? It is built from a source of free data known as GEOnet, published by the U.S. government's National Geospatial-Intelligence Agency, formerly the National Imagery and Mapping Agency. It provides coverage of interesting places: not just populated places, but also hydrographic features, landmarks, and some transport infrastructure. GEOnet offers feature indexes with locations for the whole world as a free download. The data is inaccurate and outdated in some areas, but it is a good start for building your own free world gazetteer.

7.8.2. The Data

The GEOnet Names Server (GNS) lives at http://earth-info.nga.mil/gns/html/index.html. Here you'll find the raw data and a web browsing interface for all the GEOnet data. All the GEOnet datafiles are available via FTP as well as HTTP. Grab them all using the ncftpget client, part of the ncftp package that is standard on *NIX machines:

> ncftpget ftp://ftp.nga.mil/pub/gns_data/*.zip

This will download all the GNS files in ZIP format into the current directory. Zipped, the data is a little under 200 MB in size! Each file consists of a tab-separated list of values in a common format. The format is explained in more detail at http://earth-info.nga.mil/gns/html/gis_countryfiles.html.

The fields we are particularly interested in are:

LAT and LONG

WGS84 latitude and longitude in decimal degrees

UFI

A unique feature identifier, which we'll keep for future-proofing purposes

DSG

A "Feature Designation Code," which identifies the different types of features

FULL_NAME

The full name of the place
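
To make the field list concrete, here is a minimal Python sketch that pulls these fields out of one tab-separated line. The zero-based column positions are the ones used by the loading script later in this hack; the function name parse_record and the sample record are made up for illustration, and Python is used here although the hack's own script is Perl.

```python
# Zero-based column positions, copied from the loading script in this hack.
COLUMNS = {"UFI": 1, "LAT": 3, "LONG": 4, "DSG": 10, "FULL_NAME": 22}

def parse_record(line):
    """Split one tab-separated GEOnet line into the fields of interest."""
    fields = line.rstrip("\n").split("\t")
    record = {name: fields[pos] for name, pos in COLUMNS.items()}
    # Coordinates arrive as text; convert them to decimal degrees.
    record["LAT"] = float(record["LAT"])
    record["LONG"] = float(record["LONG"])
    return record

# A made-up record for illustration only (hypothetical UFI, rough coordinates).
sample = [""] * 25
sample[1], sample[3], sample[4] = "12345", "51.5", "-0.1"
sample[10], sample[22] = "PPLC", "London"
print(parse_record("\t".join(sample)))
```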

The GEOnet data has "Feature Classification" codes for all kinds of features, including vegetation and undersea features, but for the most part it consists of information about populated places. This can be tremendously useful for answering requests such as "Show me all GP surgeries near Birmingham." We took a list of GEOnet feature types that we found on the Web and used it to populate the feature table of our gazetteer.

Each of the GEOnet country files has a two-letter code. These aren't the familiar ISO two-letter codes, though; they are FIPS codes used by the U.S. government. Most of the rest of the world uses ISO codes, as do other applications that might use a gazetteer service, so we'll convert from FIPS to ISO while building our database. However, be aware that ISO may claim copyright on the ISO country codes!
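
The conversion itself is just a lookup table. The Python sketch below shows the idea with a handful of well-known pairs; it is not the full table (that comes from the country data supplied with the hack), and the function name fips_to_iso is hypothetical. Note how the two schemes collide: AU means Austria in FIPS but Australia in ISO, which is why the codes cannot be mixed.

```python
# A few FIPS 10-4 to ISO 3166-1 alpha-2 pairs, for illustration only.
FIPS_TO_ISO = {
    "UK": "GB",  # United Kingdom
    "GM": "DE",  # Germany
    "JA": "JP",  # Japan
    "SZ": "CH",  # Switzerland
    "AS": "AU",  # Australia
    "AU": "AT",  # Austria
}

def fips_to_iso(fips):
    """Return the ISO code for a FIPS code, or None if it is not in the table."""
    return FIPS_TO_ISO.get(fips.upper())

print(fips_to_iso("au"))   # FIPS AU is Austria, so this prints AT
```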

To store our world model, we'll use a simple SQL schema. This could be any SQL database, such as MySQL. We used SQLite, a self-contained SQL database engine that keeps each database in a single file, to build this example. To get you started, we've provided a SQL file with the model of countries and their codes, and the various feature types, which you can download from http://mappinghacks.com/gazetteer/, along with the original files and short scripts from which it was built:

create table country( 
 id integer primary key not null,
 name varchar(255),
 iso varchar(2),
 fips varchar(2)
);
create table feature (
 id integer primary key not null,
 name varchar(64),
 code varchar(4),
 fc varchar(2)
);
 
CREATE TABLE place (
 id integer primary key not null,
 name varchar(255),
 country integer,
 ufi integer,
 feature_type integer,
 lat double,
 lon double,
 alt double
);
create index name_index on place (name);
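
If you want to experiment with the schema before loading any real data, a sketch along these lines creates it in a throwaway SQLite database using Python's standard sqlite3 module. The helper name make_db and the sample country row are illustrative assumptions, not part of the downloadable SQL file.

```python
import sqlite3

# The schema from this hack, as one script.
SCHEMA = """
create table country (id integer primary key not null, name varchar(255),
                      iso varchar(2), fips varchar(2));
create table feature (id integer primary key not null, name varchar(64),
                      code varchar(4), fc varchar(2));
create table place (id integer primary key not null, name varchar(255),
                    country integer, ufi integer, feature_type integer,
                    lat double, lon double, alt double);
create index name_index on place (name);
"""

def make_db(path=":memory:"):
    """Create the gazetteer tables in a new SQLite database."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db

db = make_db()
# One illustrative row; the id column is filled in automatically.
db.execute("insert into country (name, iso, fips) values (?, ?, ?)",
           ("United Kingdom", "GB", "UK"))
print(db.execute("select id, iso, fips from country").fetchall())
```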

 

7.8.3. The Code

We wrote a quick script to go through the GEOnet .zip files one by one, unpacking them and looking up their country and feature codes:

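#!/usr/bin/perl
# Load every GEOnet country file in a directory into the place table.

use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use DBI;

my $dbh = DBI->connect('dbi:SQLite:gaz.db', '', '',
                       { AutoCommit => 0, RaiseError => 1 });

my $dir = shift or die "Usage: $0 directory\n";
opendir(DIR, $dir) or die "Couldn't read $dir: $!";
my @files = grep { /\.zip$/ } readdir(DIR);
closedir(DIR);

# Map FIPS country codes and feature designation codes to row ids.
my (%countries, %types);

my $sth = $dbh->prepare("select id, fips from country");
$sth->execute;
while (my $row = $sth->fetchrow_hashref) {
    $countries{lc($row->{fips})} = $row->{id};
}

$sth = $dbh->prepare("select id, code from feature");
$sth->execute;
while (my $row = $sth->fetchrow_hashref) {
    $types{$row->{code}} = $row->{id};
}

# Placeholders take care of quoting and store missing lookups as NULL.
my $insert = $dbh->prepare(
    "INSERT INTO place (name,country,ufi,feature_type,lat,lon)
     values (?,?,?,?,?,?)");

# Work inside the data directory so extracted files land next to the archives.
chdir($dir) or die "Couldn't chdir to $dir: $!";

my $count = 0;
foreach my $f (@files) {
    my $zip = Archive::Zip->new;
    $zip->read($f) == AZ_OK or die "Couldn't read $f";
    (my $code = $f) =~ s/\.zip$//;
    $zip->extractMember($_) foreach $zip->members;

    open(FILE, '<', "$code.txt") or die "Couldn't open $code.txt: $!";
    while (<FILE>) {
        chomp;
        my @fields = split(/\t/, $_);
        next unless @fields > 22;       # skip short or blank lines
        next if $fields[0] eq 'RC';     # skip the header row, if present
        my ($RC,$UFI,$LAT,$LONG,$FC,$DSG,$ADM1,$ADM2,$GENERIC,$FULL_NAME) =
            @fields[0,1,3,4,9,10,13,14,20,22];
        $insert->execute($FULL_NAME, $countries{lc $code}, $UFI,
                         $types{$DSG}, $LAT, $LONG);
        # Commit in batches so the load stays fast.
        if (++$count > 5000) {
            $dbh->commit;
            $count = 0;
        }
    }
    close FILE;
}
$dbh->commit;       # write out the final partial batch
$dbh->disconnect;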

How do we read our new dictionary of places? Before we build an interface to it, we can write simple queries against it in SQL. The next statement looks for all place names containing "Abu" and returns the place and country:

> SELECT place.name, country.iso from place, country 
WHERE place.name like '%Abu%' 
AND country.id = place.country;

Once we've found the place we're looking for (e.g., Abu Dhabi), we can look for geospatial information about it:

> SELECT place.name, place.lat,place.lon,feature.name from place, feature 
WHERE place.name = 'Abu Dhabi' 
AND feature.id = place.feature_type;
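
The same lookup can be run from a program. The following Python sketch uses the standard sqlite3 module with a bound parameter instead of a literal place name. It builds a tiny in-memory table so that it runs on its own; the sample row (rough coordinates for Abu Dhabi, a feature named "capital") and the function name locate are illustrative assumptions, not data from GEOnet.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
create table feature (id integer primary key, name varchar(64), code varchar(4));
create table place (id integer primary key, name varchar(255),
                    feature_type integer, lat double, lon double);
insert into feature (id, name, code) values (1, 'capital', 'PPLC');
insert into place (name, feature_type, lat, lon)
    values ('Abu Dhabi', 1, 24.47, 54.37);
""")

def locate(db, name):
    """Return (name, lat, lon, feature name) rows for an exact place name."""
    sql = """select place.name, place.lat, place.lon, feature.name
             from place, feature
             where place.name = ? and feature.id = place.feature_type"""
    return db.execute(sql, (name,)).fetchall()

print(locate(db, "Abu Dhabi"))
```

Binding the name with ? keeps names containing apostrophes from breaking the query.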

We might want a list of all populated places, or all rivers, in a country. This is especially useful when we have a list of things and want to figure out which ones are cities, for example when trying to extract spatial proper nouns from news items or other spatial references [Hack #45].
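
As a rough illustration of that idea, the Python sketch below checks a list of candidate names against the place table and keeps the ones the gazetteer knows. The sample rows and the function name known_places are hypothetical, and exact matching is a deliberate simplification; real text would need more careful tokenizing and disambiguation.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("create table place (id integer primary key, name varchar(255))")
db.executemany("insert into place (name) values (?)",
               [("Abu Dhabi",), ("London",), ("Birmingham",)])

def known_places(db, candidates):
    """Return the candidates that appear as exact names in the place table."""
    found = []
    for name in candidates:
        row = db.execute("select 1 from place where name = ? limit 1",
                         (name,)).fetchone()
        if row:
            found.append(name)
    return found

print(known_places(db, ["London", "Reuters", "Abu Dhabi"]))
```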

7.8.4. Hacking the Hack

If you have access to a PostGIS database, you can make the gazetteer more interesting by storing the latitude and longitude as POINT geometry types, rather than as a pair of plain numeric columns. With polygons representing country borders or political administrative areas, you can make much more sophisticated spatial queries; distance between points and "is this place in this area" are just the start. A further hack that pursues these ideas can be found online at http://mappinghacks.com/projects/gutenmap/, where the GNS is used as the basis for a (rough) interactive map of the Peloponnesian War. See [Hack #87], which covers PostGIS geometry functions in much more detail.
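
Even without PostGIS, a distance between two gazetteer entries can be approximated directly from their latitude and longitude. The Python sketch below uses the haversine formula on a spherical Earth; this is a stand-in for what a spatial database would compute, not PostGIS itself, and the function name and the rough London and Paris coordinates are illustrative assumptions.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean radius; assumes a spherical Earth

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in decimal degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Rough coordinates for London and Paris.
print(round(haversine_km(51.5074, -0.1278, 48.8566, 2.3522)), "km")
```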

Mapping Hacks
Mapping Hacks: Tips & Tools for Electronic Cartography
ISBN: 0596007035
Year: 2004
Pages: 172
