13.7. Load Data into the Database | Web Mapping Illustrated: Using Open Source GIS Toolkits

< Day Day Up >

There are many ways to put data into a database. One method is to manually type SQL commands with psql to insert data into a table. You can also use programs that convert data into an SQL script that can then be loaded, or you can use a tool that exports data directly to the database.

13.7.1. Using shp2pgsql

PostGIS comes with command-line tools called pgsql2shp and shp2pgsql. These can be used to convert from a PostGIS table to a shapefile and back.

The tool shp2pgsql converts the shapes to a text stream of SQL commands. Therefore, you need to pipe it to a text file for later loading or to the psql command to use it immediately. For example, use one of these methods:

 > shp2pgsql countyp020.shp countyp020 >  mycounties.sql  > psql -d project1 -f mycounties.sql

or use this one to load the data immediately to the database:

 > shp2pgsql mycounties.shp mycounties | psql -d project1

By default, the shp2pgsql command puts the geometry data into a field called the_geom, whereas the default for the ogr2ogr command (shown next) puts the geometry data into a field called wkb_geometry. These defaults can be overridden and changed to something more meaningful. To acquire the countyp020 data, see the next section.

13.7.2. Using ogr2ogr

ogr2ogr is an excellent program for putting spatial data into a database. It's part of the GDAL/OGR toolkit included in FWTools and introduced in Chapters 3 and 7. This command-line tool takes any OGR supported data layer and exports it into the database.

This example uses some data taken from the U.S. National Atlas site at http://nationalatlas.gov/atlasftp.html. Specifically, it uses the County Boundaries data at http://edcftp.cr.usgs.gov/pub/data/nationalatlas/countyp020.tar.gz.

The file is a gzip'd tar file. On Linux, you can expand this file using the command-line program tar:

 > tar -xzvf countyp020.tar.gz

This creates a shapefile named countyp020. On Windows, most Zip programs (e.g., WinZip) can decompress this file for you.

The ogrinfo command then provides a summary of how many features are in the file:

 > ogrinfo countyp020.shp -al -so ... Feature Count: 6138 ...

It is a good idea to have a feel for how much data will be converted before running the next step. The more features there are, the more time it will take to load the data to the database.

To translate this shapefile to the project1 database, use the ogr2ogr command:

 > ogr2ogr -f "PostgreSQL" PG:dbname=project1 countyp020.shp

This is one of the simplest examples of using ogr2ogr with PostgreSQL/PostGIS. The first parameter is the target data format. In this case, it is a PostgreSQL database. The prefix PG: provides more detail about the target data source. Here, only the database name is supplied: dbname=project1.

If the database is on a different computer or requires more user access privileges, further details will be required. Multiple database parameters can be included, and you can quote them like this:

 .."PG:dbname=project1 host=mypc user=tyler"..

The source data filename is countyp020.shp, a shapefile data layer. The conversion process may take a minute or two to run.

As shown in Example 13-1, check to see that it was loaded successfully by going into the psql interpreter again. You can list the tables that are available and do a simple query to see if all the features came across.

Example 13-1. Checking the results of loading a shapefile into PostgreSQL

 > psql project1 Welcome to psql 7.4.2, the PostgreSQL interactive terminal.   Type:  \copyright for distribution terms        \h for help with SQL commands        \? for help on internal slash commands        \g or terminate with semicolon to execute query        \q to quit   project1=# \dt              List of relations  Schema |       Name       | Type  | Owner --------+------------------+-------+-------  public | countyp020       | table | tyler  public | geometry_columns | table | tyler  public | spatial_ref_sys  | table | tyler (3 rows)   project1=# SELECT count(*) FROM countyp020;  count -------   6138 (1 row)

This reports the number of features that are in the table. Each geographic feature, in this case, counties, has a record in the countyp020 table.

You can specify the name you want the table to have when it is created using the -nln parameter followed by the name. In the previous example, it used the name of the original data layer as the name of the output table. If you want the output table to have a different name (counties) use the ogr2ogr command like this (all one line):

 > ogr2ogr -f "PostgreSQL" PG:dbname=project1 countyp020.shp -nln counties

This table has more than geographic data. In the psql interpreter, the columns of data in the table can be listed using the \d parameter followed by the name of the table, as shown in Example 13-2.

Example 13-2. Listing the columns and types in the countyp020 table

 # \d countyp020             Table "public.countyp020"     Column    |     Type     --------------+---------------  ogc_fid      | integer       wkb_geometry | geometry      area         | numeric(9,3)  perimeter    | numeric(9,3)  countyp020   | numeric(9,0)  state        | character(2)  county       | character(50)  fips         | character(5)  state_fips   | character(2)  square_mil   | numeric(19,3) Check constraints:     "$1" CHECK (srid(wkb_geometry) = -1)     "$2" CHECK (geometrytype(wkb_geometry) = 'POLYGON'::text OR wkb_geometry IS NULL)

Each column is listed and shows the datatype that each column can hold. PostgreSQL databases can handle all these types without PostGIS, except for the geometry data. Only one column in the table contains geometry data: the wkb_geometry column has geometry listed as its type:

  wkb_geometry | geometry

You can have multiple columns holding geometry data in your table. You can even store different types of geometries (points, lines, or polygons) in the same table by adding more fields of the type geometry.

The ogr2ogr utility automatically names the geometry column wkb_geometry (which stands for well-known binary geometry, another OGC specification), but it can be called anything you like. The other columns hold standard database information numeric or character/text data.

Both the shp2pgsql and ogr2ogr methods do some metadata tracking by inserting records into the table called geometry_columns. This table tracks which columns in a table include geometric features. Some applications depend on finding entries in this table, and others don't. For example, MapServer doesn't require this, but the ArcMap-PostGIS connector (discussed at the end of this chapter) depends on it. It is good practice to keep this table up to date. Here's a sample entry in the table:

 -[ RECORD 1 ]-----+------------- f_table_catalog   | f_table_schema    | public f_table_name      | countyp020 f_geometry_column | wkb_geometry coord_dimension   | 3 srid              | -1 type              | POLYGON attrelid          | 949399 varattnum         | 2 stats             |

This is the entry for the countyp020 table, as shown by f_table_name = countyp020. The name of the column holding the geometric data is stored in the f_geometry_columnfield. PostGIS can hold 3D data. coord_dimension can be set to 2 (2D) or 3 (3D). The SRID field refers to the spatial reference system. In this example, it is set to srid = -1, meaning none is specific. The field type = POLYGON says that the geometry column holds polygon features. Any application that accesses PostGIS data can inspect the geometry_columns table instead of looking for information from each and every table.

< Day Day Up >