6.3. Examining Data Content | Web Mapping Illustrated: Using Open Source GIS Toolkits


Examining data content is an important part of any project. Understanding information specific to your dataset will help you use it more effectively. Each piece of spatial data will have some geographic component to it (coordinates describing the location of real features), but it will also have what are called attributes. These are non-geographic data about the geographic feature, such as the size of population, the name of a building, the color of a lake, etc. You will often hear the geographic coordinate data described as spatial data and the attribute information referred to as tabular, attribute, or nonspatial data. It is equally valid to call any dataset spatial if it has some geographic component to it.

6.3.1. Viewing Summary Information About Airports

The MapServer demo data includes a variety of vector spatial files; therefore you will use the ogrinfo tool to gather information about the files. At the command prompt, change into the workshop folder, and run the ogrinfo command to have it list the datasets that are in the data folder. The output from the command will look like Example 6-1.

Example 6-1. Showing a list of the layer names available in a folder containing shapefiles

 > ogrinfo data
 INFO: Open of 'data' using driver 'ESRI Shapefile' successful.
 1: twprgpy3 (Polygon)
 2: rmprdln3 (Line String)
 3: lakespy2 (Polygon)
 4: stprkpy3 (Polygon)
 5: ctyrdln3 (Line String)
 6: dlgstln2 (Line String)
 7: mcd90py2 (Polygon)
 8: twprdln3 (Line String)
 9: plsscpy3 (Polygon)
 10: mcdrdln3 (Line String)
 11: majrdln3 (Line String)
 12: drgidx (Polygon)
 13: airports (Point)
 14: ctybdpy2 (Polygon)

This shows that there are 14 layers in the data folder (the order of the listing may vary on other systems). You can also see that the folder contains ESRI shapefile format files. Each shapefile is a layer in this listing. If you look at the files located in the data folder, you will see that there are far more than 14 files. This is because a shapefile consists of more than one file: one holds spatial data, another holds tabular data, etc.
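The relationship between files and layers can be sketched in a few lines of Python. This is only an illustration, not part of ogrinfo: the function name group_shapefile_parts is hypothetical, and the sketch assumes that every layer is identified by a file ending in .shp, with companion files (such as .shx and .dbf) sharing the same base name.

```python
from collections import defaultdict
from pathlib import Path


def group_shapefile_parts(folder):
    """Group the files in a folder by base name and keep only shapefile layers.

    Hypothetical helper: a group counts as a layer only when it contains
    a .shp file, which holds the geometry.
    """
    groups = defaultdict(set)
    for path in Path(folder).iterdir():
        if path.is_file():
            # path.stem is the name without extension, path.suffix the extension.
            groups[path.stem].add(path.suffix.lower())
    return {name: sorted(exts) for name, exts in groups.items() if ".shp" in exts}


if __name__ == "__main__":
    # Assumes a folder called "data" exists in the current directory.
    layers = group_shapefile_parts("data")
    for name, extensions in sorted(layers.items()):
        print(name, extensions)
    print(len(layers), "layers")
```

Run against the workshop data folder, a sketch like this should report one entry per layer even though each layer is spread across several files.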

A summary of more information for each layer can be seen by adding the name of the layer to the ogrinfo command and a -summary parameter, as shown in Example 6-2.

Example 6-2. Showing the attributes, extent, and other information about a particular layer

 > ogrinfo -summary data airports
 INFO: Open of 'data' using driver 'ESRI Shapefile' successful.
 Layer name: airports
 Geometry: Point
 Feature Count: 12
 Extent: (434634.000000, 5228719.000000) - (496393.000000, 5291930.000000)
 Layer SRS WKT:
 (unknown)
 NAME: String (64.0)
 LAT: Real (12.4)
 LON: Real (12.4)
 ELEVATION: Real (12.4)
 QUADNAME: String (32.0)

This example shows information about the airports layer.

     Geometry: Point

The geographic features in this file are points. In the next example, you will see that each airport feature has one pair of location coordinates.

     Feature Count: 12
     Extent: (434634.000000, 5228719.000000) - (496393.000000, 5291930.000000)

There are 12 airport features in this layer, and they fall within the range of coordinates shown in the Extent line. The coordinates are measured in meters and are projected into the Universal Transverse Mercator (UTM) projection.

     Layer SRS WKT:
     (unknown)

This explains what map projection the data is in. SRS stands for spatial reference system and WKT for well-known text format. Without getting into too much detail, these are terms popularized or created by the OGC. The SRS gives information about projections, datums, units of measure in the data, etc. WKT is a method for describing those statistics in a text-based, human-readable format (as opposed to a binary format). Refer to Appendix A for more information about map projections, SRS, and the EPSG numbering system. See Chapter 12 for more information on the OGC and its role in setting standards.

The previous example also says unknown because the creator of the data didn't explicitly include projection information within the file. This isn't very helpful if you don't know where the data is from. However, those familiar with the data might guess that it is in UTM coordinates.

     NAME: String (64.0)
     LAT: Real (12.4)
     LON: Real (12.4)
     ELEVATION: Real (12.4)
     QUADNAME: String (32.0)

These five lines tell you about the nonspatial information that accompanies each geographic feature. A feature, in this case, is a coordinate for an airport. These pieces of information are often referred to as attributes, properties, columns, or fields. Each attribute has a name identifier and can hold a certain type of information. In the previous example, the text before the colon is the name of the attribute. Don't be confused by the fact that there is also an attribute called NAME in this file: the first line describes an attribute that happens to be called NAME. The word after the colon tells you what kind of data can be held in that attribute: either String (text characters) or Real (numbers). The numbers in parentheses tell you more specifically how much of each kind of data can be stored in the attribute. For example, NAME: String (64.0) means that the attribute called NAME can hold up to 64 letters or numbers. Likewise, ELEVATION: Real (12.4) means that the ELEVATION attribute can hold numbers up to 12 digits wide with a maximum of 4 decimal places.
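To make the width and precision notation concrete, here is a minimal Python sketch that splits one attribute line from the ogrinfo summary into its parts. The function name parse_field and the regular expression are assumptions made for this illustration; they are not part of OGR, and they only handle lines shaped like the five shown above.

```python
import re

# Matches lines such as "ELEVATION: Real (12.4)": name, type, width, precision.
FIELD_PATTERN = re.compile(r"^\s*(\w+):\s*(\w+)\s*\((\d+)\.(\d+)\)\s*$")


def parse_field(line):
    """Split one attribute line of ogrinfo summary output into its parts."""
    match = FIELD_PATTERN.match(line)
    if match is None:
        raise ValueError("not an attribute definition: %r" % line)
    name, kind, width, precision = match.groups()
    return {
        "name": name,
        "type": kind,
        "width": int(width),          # number before the dot
        "precision": int(precision),  # number after the dot (decimal places)
    }


if __name__ == "__main__":
    for text in ["NAME: String (64.0)", "ELEVATION: Real (12.4)"]:
        print(parse_field(text))
```

For ELEVATION: Real (12.4), the sketch reports a width of 12 and a precision of 4, which is the same reading given in the paragraph above.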

You may be wondering why this is important to review. Some of the most common errors in using map data can be traced back to a poor understanding of the data. This is why reviewing data with tools such as ogrinfo can be very helpful before launching into mapmaking. If you don't understand what kind of attributes you have at your disposal, you may not use the data to its fullest potential or you may push its use beyond appropriate bounds. Understanding your data in this depth will prevent future mistakes during the mapping process or during any analysis you may undertake. If your analysis relies on a certain kind of numbers with a level of precision or expected length of text, you need to make sure that the data you are analyzing actually holds these kinds of values, or you will get misleading results. Having this knowledge early in the process will help you have a more enjoyable experience along the way.

6.3.2. Viewing Detailed Airport Location Information

Summary information tells only part of the story. The same tools can be used to provide detailed information about the geographic data and its attributes. To get details, instead of summary information, you can use ogrinfo with a dataset and layer name like that in Example 6-3, but don't include the -summary parameter.

Example 6-3. Showing all the details about a shapefile layer

 > ogrinfo data airports
 INFO: Open of 'data' using driver 'ESRI Shapefile' successful.
 Layer name: airports
 Geometry: Point
 Feature Count: 12
 Extent: (434634.000000, 5228719.000000) - (496393.000000, 5291930.000000)
 Layer SRS WKT:
 (unknown)
 NAME: String (64.0)
 LAT: Real (12.4)
 LON: Real (12.4)
 ELEVATION: Real (12.4)
 QUADNAME: String (32.0)
 OGRFeature(airports):0
   NAME (String) = Bigfork Municipal Airport
   LAT (Real) =      47.7789
   LON (Real) =     -93.6500
   ELEVATION (Real) =    1343.0000
   QUADNAME (String) = Effie
   POINT (451306 5291930)
 OGRFeature(airports):1
   NAME (String) = Bolduc Seaplane Base
   LAT (Real) =      47.5975
   LON (Real) =     -93.4106
   ELEVATION (Real) =    1325.0000
   QUADNAME (String) = Balsam Lake
   POINT (469137 5271647)

This view of the airport details tells you what value each airport has for each attribute. As you can see, the summary information is still included at the top of the listing, but then there are small sections for each feature. In this case there are seven lines for each airport: a feature identifier, the five attribute values, and the POINT geometry. For example, you can see the name of the airport, but you can also see the UTM coordinate shown beside POINT.

This dataset also has a set of LAT and LON fields that are just numeric attributes and have nothing to do with using this data in a map. Not all types of point data have these two attributes. They just happened to be part of the attributes the creator wanted to keep. The actual UTM coordinates are encoded in the last line of each feature, POINT.

Only two features are shown in this example, the first starting with OGRFeature(airports):0. The full example goes all the way to OGRFeature(airports):11, including all 12 airports. The rest of the points aren't shown in this example, just to keep it simple.

ogrinfo is a great tool for digging even deeper into your data. There are more options that can be used, including a database query-like ability to select features and the ability to list only features that fall within a certain area. Running man ogrinfo (if your operating system supports manpages) shows the full usage for each parameter. Otherwise, the details are available on the OGR web site at http://www.gdal.org/ogr/ogr_utilities.html. You can also run the ogrinfo command with the --help parameter (ogrinfo --help) to get a summary of options. Example 6-4 shows some examples of how they can be used with your airport data.

Example 6-4. Listing the features that meet a specific attribute query

 > ogrinfo data airports -where "name='Bolduc Seaplane Base'"
 INFO: Open of 'data' using driver 'ESRI Shapefile' successful.
 Layer name: airports
 Geometry: Point
 Feature Count: 1
 Extent: (434634.000000, 5228719.000000) - (496393.000000, 5291930.000000)
 Layer SRS WKT:
 (unknown)
 NAME: String (64.0)
 LAT: Real (12.4)
 LON: Real (12.4)
 ELEVATION: Real (12.4)
 QUADNAME: String (32.0)
 OGRFeature(airports):1
   NAME (String) = Bolduc Seaplane Base
   LAT (Real) =      47.5975
   LON (Real) =     -93.4106
   ELEVATION (Real) =    1325.0000
   QUADNAME (String) = Balsam Lake
   POINT (469137 5271647)

This example lists only those airports that have the name Bolduc Seaplane Base. As you can imagine, there is only one. Therefore, Example 6-4 lists the summary information about this layer and one set of attribute values for the single airport that meets this criterion. The -sql option, shown in Example 6-5, can also specify what attributes to list in the ogrinfo output.

If you are familiar with SQL, you will understand that the -sql option accepts an SQL statement. If SQL is something new to you, please refer to other database query language documentation, such as:

SQL in a Nutshell (O'Reilly)
SQL tutorial at http://www.w3schools.com/sql/

Many database manuals include a comprehensive reference section on SQL. The implementation of SQL in ogrinfo isn't complete and supports only SELECT statements.

Example 6-5. Selecting certain features and showing specific attributes in the results

 > ogrinfo data airports -sql "select name from airports where quadname='Side Lake'"
 INFO: Open of 'data' using driver 'ESRI Shapefile' successful.
 layer names ignored in combination with -sql.
 Layer name: airports
 Geometry: Point
 Feature Count: 2
 Extent: (434634.000000, 5228719.000000) - (496393.000000, 5291930.000000)
 Layer SRS WKT:
 (unknown)
 name: String (64.0)
 OGRFeature(airports):4
   name (String) = Christenson Point Seaplane Base
   POINT (495913 5279532)
 OGRFeature(airports):10
   name (String) = Sixberrys Landing Seaplane Base
   POINT (496393 5280458)

The SQL parameter is set to show only one attribute, NAME, rather than all five attributes for each feature. It still shows the coordinates by default, but none of the other information is displayed. This is combined with a query to show only those features that meet a certain QUADNAME requirement.
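If SQL is new to you, it may help to see the same selection written as plain Python. The sketch below is only an analogy for what the query asks for, not how ogrinfo works internally. The AIRPORTS list is a hand-copied subset of the features printed in the examples in this section, and the function name select is hypothetical.

```python
# A hand-copied subset of the airports layer, taken from the example output.
AIRPORTS = [
    {"fid": 0, "NAME": "Bigfork Municipal Airport", "QUADNAME": "Effie"},
    {"fid": 1, "NAME": "Bolduc Seaplane Base", "QUADNAME": "Balsam Lake"},
    {"fid": 4, "NAME": "Christenson Point Seaplane Base", "QUADNAME": "Side Lake"},
    {"fid": 10, "NAME": "Sixberrys Landing Seaplane Base", "QUADNAME": "Side Lake"},
]


def select(records, columns, predicate):
    """Keep records for which predicate is true, then keep only the named columns."""
    return [
        {column: record[column] for column in columns}
        for record in records
        if predicate(record)
    ]


if __name__ == "__main__":
    # Roughly: select name from airports where quadname='Side Lake'
    rows = select(AIRPORTS, ["NAME"], lambda r: r["QUADNAME"] == "Side Lake")
    for row in rows:
        print(row["NAME"])
```

The where clause corresponds to the predicate that filters features, and the list of names after select corresponds to the columns that are kept.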

Example 6-6 shows how ogrinfo can use some spatial logic to find features that are within a certain area.

Example 6-6. Listing features that are located within a range of coordinates

 > ogrinfo data airports -spat 451869 5225734 465726 5242150
 Layer name: airports
 Geometry: Point
 Feature Count: 2
 Extent: (434634.000000, 5228719.000000) - (496393.000000, 5291930.000000)
 Layer SRS WKT:
 (unknown)
 NAME: String (64.0)
 LAT: Real (12.4)
 LON: Real (12.4)
 ELEVATION: Real (12.4)
 QUADNAME: String (32.0)
 OGRFeature(airports):7
   NAME (String) = Grand Rapids-Itasca County/Gordon Newstrom Field
   LAT (Real) =      47.2108
   LON (Real) =     -93.5097
   ELEVATION (Real) =    1355.0000
   QUADNAME (String) = Grand Rapids
   POINT (461401 5228719)
 OGRFeature(airports):8
   NAME (String) = Richter Ranch Airport
   LAT (Real) =      47.3161
   LON (Real) =     -93.5914
   ELEVATION (Real) =    1340.0000
   QUADNAME (String) = Cohasset East
   POINT (455305 5240463)

The ability to show only features based on where they are located is quite powerful. You do so using the -spat parameter followed by two pairs of coordinates. The first pair of coordinates, 451869 5225734, represents the southwest corner of the area you are interested in querying. The second pair of coordinates, 465726 5242150, represents the northeast corner of the area you are interested in, creating a rectangular area.

This is typically referred to as a bounding box, where one pair of coordinates represents the lower-left corner of the box and the other pair represents the upper right. A bounding box gives a program, such as ogrinfo, a quick way to find features you need.

ogrinfo then shows only those features that are located within the area you define. Because the data is stored in UTM coordinates, the coordinates in the -spat parameter must also be UTM coordinates; you can't specify them in decimal degrees (°), for instance. The coordinates must always be specified using the same units and projection as the source data, or you will get inaccurate results.
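The bounding box test itself is simple enough to sketch in Python. This is an illustration of the idea for point features only, not ogrinfo's actual implementation; the function name in_bounding_box is hypothetical, and the point coordinates are copied from the example output in this section.

```python
def in_bounding_box(x, y, xmin, ymin, xmax, ymax):
    """Return True when the point (x, y) lies inside or on the edge of the box."""
    return xmin <= x <= xmax and ymin <= y <= ymax


# Feature id -> UTM coordinate, copied from the POINT lines in the examples.
AIRPORT_POINTS = {
    0: (451306, 5291930),
    1: (469137, 5271647),
    4: (495913, 5279532),
    7: (461401, 5228719),
    8: (455305, 5240463),
    10: (496393, 5280458),
}

# Southwest corner first, then northeast corner, as given to -spat.
BOX = (451869, 5225734, 465726, 5242150)

if __name__ == "__main__":
    inside = [fid for fid, (x, y) in AIRPORT_POINTS.items()
              if in_bounding_box(x, y, *BOX)]
    print(inside)
```

Of the six points listed here, only features 7 and 8 pass the test, which agrees with the two features returned in Example 6-6. Note that the comparison only makes sense because the box and the points use the same UTM coordinates.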

Example 6-7 is similar to a previous example showing complex query syntax using the -sql parameter, but it differs in one respect.

Example 6-7. Summary information about the results of a complex SQL query

 > ogrinfo data airports -sql "select * from airports where elevation > 1350 and quadname like '%Lake'" -summary
 INFO: Open of 'data' using driver 'ESRI Shapefile' successful.
 layer names ignored in combination with -sql.
 Layer name: airports
 Geometry: Point
 Feature Count: 5
 Extent: (434634.000000, 5228719.000000) - (496393.000000, 5291930.000000)

If you add the -summary option, it doesn't list all the attributes of the features, but shows only a summary of the information. In this case, it summarizes only information that met the criteria of the -sql parameter. This is very handy if you just want to know how many features meet certain criteria or fall within a certain area but don't care to see all the details.

6.3.3. Viewing Statistics About a Satellite Image

You can download a sample satellite image from http://geogratis.cgdi.gc.ca/download/RADARSAT/mosaic/canada_mosaic_lcc_1000m.zip. If you unzip the file, you create a file called canada_mosaic_lcc_1000m.tif. This is a file containing an image from the RADARSAT satellite. For more information about RADARSAT, see http://www.ccrs.nrcan.gc.ca/ccrs/data/satsens/radarsat/rsatndx_e.html.

To better understand what kind of data this is, use the gdalinfo command. Like the ogrinfo command, this tool lists certain pieces of information about a file, but the GDAL tools can interact with raster/image data. The output from gdalinfo is also very similar to ogrinfo as you can see in Example 6-8. You should change to the same folder as the image before running the gdalinfo command.

Example 6-8. Listing information about the downloaded image

 > gdalinfo canada_mosaic_lcc_1000m.tif
 Driver: GTiff/GeoTIFF
 Size is 5700, 4800
 Coordinate System is:
 PROJCS["LCC         E008",
     GEOGCS["NAD83",
         DATUM["North_American_Datum_1983",
             SPHEROID["GRS 1980",6378137,298.2572221010042,
                 AUTHORITY["EPSG","7019"]],
             AUTHORITY["EPSG","6269"]],
         PRIMEM["Greenwich",0],
         UNIT["degree",0.0174532925199433],
         AUTHORITY["EPSG","4269"]],
     PROJECTION["Lambert_Conformal_Conic_2SP"],
     PARAMETER["standard_parallel_1",49],
     PARAMETER["standard_parallel_2",77],
     PARAMETER["latitude_of_origin",0],
     PARAMETER["central_meridian",-95],
     PARAMETER["false_easting",0],
     PARAMETER["false_northing",0],
     UNIT["metre",1,
         AUTHORITY["EPSG","9001"]]]
 Origin = (-2600000.000000,10500000.000000)
 Pixel Size = (1000.00000000,-1000.00000000)
 Corner Coordinates:
 Upper Left  (-2600000.000,10500000.000) (177d17'32.31"W, 66d54'22.82"N)
 Lower Left  (-2600000.000, 5700000.000) (122d54'49.00"W, 36d12'53.87"N)
 Upper Right ( 3100000.000,10500000.000) (  9d58'39.57"W, 62d25'50.45"N)
 Lower Right ( 3100000.000, 5700000.000) ( 62d32'49.65"W, 34d18'5.61"N)
 Center      (  250000.000, 8100000.000) ( 89d56'43.00"W, 62d46'47.18"N)
 Band 1 Block=5700x1 Type=Byte, ColorInterp=Gray

There are five main sections in this report. Unlike ogrinfo, there aren't a lot of different options, and attributes are very simplistic. The first line tells you what image format the file is.

     Driver: GTiff/GeoTIFF

In this case, it tells you the file is a GeoTIFF image. TIFF images are used in general computerized photographic applications such as digital photography and printing. However, GeoTIFF implies that the image has some geographic information encoded into it. gdalinfo can be run with the --formats option, which lists all the raster formats it can read and possibly write. The version of GDAL included with FWTools has support for more than three dozen formats! These include several proprietary software vendor formats and many related to specific types of satellite data.

The next line shows the size of the image:

     Size is 5700, 4800

An image size is characterized by the number of data rows and columns. An image is a type of raster data. A raster is made up of numerous rows of adjoining squares called cells or pixels. Rows usually consist of cells that are laid out east to west, whereas columns of cells are north to south. This isn't always the case but is a general rule of thumb. This image has 5,700 columns and 4,800 rows. The first value in the size statement is usually the width, therefore the number of columns of cells. Row and column numbering usually begins at the upper-left corner of the image and increases toward the lower-right corner. Therefore, cell 0,0 is the upper left, and because numbering starts at 0, cell 5699,4799 is the lower right.

Images can be projected into various coordinate reference systems (see Appendix A for more about map projections):

     Coordinate System is:
     PROJCS["LCC         E008",
         GEOGCS["NAD83",
             DATUM["North_American_Datum_1983",
                 SPHEROID["GRS 1980",6378137,298.2572221010042,
                     AUTHORITY["EPSG","7019"]],
                 AUTHORITY["EPSG","6269"]],
             PRIMEM["Greenwich",0],
             UNIT["degree",0.0174532925199433],
             AUTHORITY["EPSG","4269"]],
         PROJECTION["Lambert_Conformal_Conic_2SP"],
         PARAMETER["standard_parallel_1",49],
         PARAMETER["standard_parallel_2",77],
         PARAMETER["latitude_of_origin",0],
         PARAMETER["central_meridian",-95],
         PARAMETER["false_easting",0],
         PARAMETER["false_northing",0],
         UNIT["metre",1,
             AUTHORITY["EPSG","9001"]]]

These assign a cell to a global geographic coordinate. Often these coordinates need to be adjusted to improve the appearance of particular applications or to line up with other pieces of data. This image is in a projection called Lambert Conformal Conic (LCC). You will need to know what projection data is in if you want to use it with other data. If the projections between data don't match, you may need to reproject them into a common projection.

MapServer can reproject files/layers on the fly. This means you don't have to change your source data unless you want higher performance.

The latitude of origin and central meridian settings are given in geographic coordinates using degree (°) units. They describe where the coordinate 0,0 starts. Latitude 0° represents the equator. In map projections central meridians are represented by a longitude value. Longitude -95°, or 95° West, runs through central Canada.

         PARAMETER["latitude_of_origin",0],
         PARAMETER["central_meridian",-95],

Note that in the earlier projection, the unit setting is metre. When you look at Pixel Size in a moment, you will see a number but no unit. It is in this unit (meters) that the pixel sizes are measured.

Cells are given row and column numbers, but are also given geographic coordinate values. The origin setting tells what the geographic coordinate is of the cell at row 0, column 0. Here, the value of origin is in the same projection and units as the projection for the whole image. The east/west coordinate -2,600,000 is 2,600,000 meters west of the central meridian. The north/south coordinate is 10,500,000 meters north of the equator.

     Origin = (-2600000.000000,10500000.000000)
     Pixel Size = (1000.00000000,-1000.00000000)

Cells are also called pixels and each of them has a defined size. In this example the pixels have a size of 1000 x 1000: the -1000 is just a notation; the negative aspect of it can be ignored for now. In most cases, your pixels will be square, though it is possible to have rasters with nonsquare pixels. The unit of these pixel sizes is in meters, as defined earlier in the projection for the image. That means each pixel is 1,000 meters wide and 1,000 meters high.

Each pixel has a coordinate value as well. This coordinate locates the upper-left corner of the pixel. Depending on the size of a pixel, it can be difficult to accurately locate it: a pixel is a square, not a discrete point location. Therefore, the upper-left corner of the pixel covers a different place on the ground than the center, but both have the same location coordinate. The accuracy of raster-based data is limited by the size of the pixel.
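The origin and pixel size are all you need to work out the coordinate of any pixel's upper-left corner. The following Python sketch shows the arithmetic using the values reported by gdalinfo for this image. It assumes a north-up image with no rotation, and the function name pixel_to_map is hypothetical. It also shows what the negative pixel height does: the north/south coordinate gets smaller as the row number gets larger.

```python
# Values copied from the gdalinfo report for canada_mosaic_lcc_1000m.tif.
ORIGIN_X, ORIGIN_Y = -2600000.0, 10500000.0
PIXEL_WIDTH, PIXEL_HEIGHT = 1000.0, -1000.0


def pixel_to_map(col, row):
    """Return the projected coordinate of the upper-left corner of a pixel.

    Assumes a north-up image without rotation.
    """
    x = ORIGIN_X + col * PIXEL_WIDTH
    # PIXEL_HEIGHT is negative, so y decreases as the row number increases.
    y = ORIGIN_Y + row * PIXEL_HEIGHT
    return x, y


if __name__ == "__main__":
    print(pixel_to_map(0, 0))        # upper-left corner of the image
    print(pixel_to_map(2850, 2400))  # middle of the image
    print(pixel_to_map(5700, 4800))  # outer edge at the lower right
```

The three results are (-2600000.0, 10500000.0), (250000.0, 8100000.0), and (3100000.0, 5700000.0), which match the Upper Left, Center, and Lower Right values that gdalinfo prints under Corner Coordinates.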

Much like the previous origin settings, the corner coordinates tell you the geographic coordinates of the four corners and the center of the image:

     Corner Coordinates:
     Upper Left  (-2600000.000,10500000.000) (177d17'32.31"W, 66d54'22.82"N)
     Lower Left  (-2600000.000, 5700000.000) (122d54'49.00"W, 36d12'53.87"N)
     Upper Right ( 3100000.000,10500000.000) (  9d58'39.57"W, 62d25'50.45"N)
     Lower Right ( 3100000.000, 5700000.000) ( 62d32'49.65"W, 34d18'5.61"N)
     Center      (  250000.000, 8100000.000) ( 89d56'43.00"W, 62d46'47.18"N)

Notice that the coordinates are first given in their projected values, but also given in their unprojected geographic coordinates, longitude, and latitude. Knowing this will help you determine where on the earth your image falls. If you thought this image was in Greece, you'd be wrong. The geographic coordinates clearly put it in the western hemisphere: 177d17'32.31"W is 177 degrees, 17 minutes, 32.31 seconds west of the prime meridian.
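If you want to use these values in other software, you will often need them as decimal degrees rather than degrees, minutes, and seconds. The Python sketch below converts the notation printed by gdalinfo. The function name dms_to_decimal and the regular expression are assumptions for this illustration, and the sketch follows the common convention that west longitudes and south latitudes are negative.

```python
import re

# Matches strings such as 177d17'32.31"W (leading spaces are allowed).
DMS_PATTERN = re.compile(r"^\s*(\d+)d\s*(\d+)'\s*([\d.]+)\"([NSEW])\s*$")


def dms_to_decimal(text):
    """Convert a degrees/minutes/seconds string from gdalinfo to decimal degrees."""
    match = DMS_PATTERN.match(text)
    if match is None:
        raise ValueError("unrecognized coordinate: %r" % text)
    degrees, minutes, seconds, hemisphere = match.groups()
    value = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # West and south are negative by convention.
    return -value if hemisphere in "SW" else value


if __name__ == "__main__":
    print(round(dms_to_decimal("177d17'32.31\"W"), 6))
    print(round(dms_to_decimal("66d54'22.82\"N"), 6))
```

The upper-left longitude, 177d17'32.31"W, works out to roughly -177.2923 degrees: 177 plus 17/60 plus 32.31/3600, made negative because it is west of the prime meridian.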

Images are made up of different bands of data. In some cases, you can have a dozen different bands, where each band stores values about a specific wavelength of light that a sensor photographed. In this case, there is only one band: Band 1. The ColorInterp=Gray setting tells you that it is a grayscale image, and Type=Byte tells you that it is an 8-bit (8 bits=1 byte) image. Because 8 bits of data can hold 256 different values, this image could have 256 different shades of gray.

     Band 1 Block=5700x1 Type=Byte, ColorInterp=Gray

If you have more than one band in an image, you can start to have color images that combine values from, for example, red, green, and blue (RGB) bands. Most normal digital photographs you see are set up this way, with each band having 256 values of its specific color. When combined, they can be assigned to specific RGB values on, for example, your computer monitor. That type of image would be considered a 24-bit image (8 bits per band x 3 bands).
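The arithmetic behind these numbers can be sketched in Python. Both function names are hypothetical, and the byte order used in pack_rgb (red in the highest bits) is only one common convention, chosen here for illustration.

```python
def values_per_band(bits):
    """Number of distinct values a band with the given bit depth can hold."""
    return 2 ** bits


def pack_rgb(red, green, blue):
    """Combine three 8-bit band values into one 24-bit number.

    Red is placed in the highest 8 bits; this ordering is an assumption.
    """
    for value in (red, green, blue):
        if not 0 <= value <= 255:
            raise ValueError("band values must be between 0 and 255")
    return (red << 16) | (green << 8) | blue


if __name__ == "__main__":
    print(values_per_band(8))       # values in one 8-bit band
    print(values_per_band(8 * 3))   # combinations in a 24-bit RGB image
    print(pack_rgb(255, 255, 255))  # white, the largest 24-bit value
```

One 8-bit band gives 256 possible values, and three such bands together give 256 x 256 x 256 possible colors, which is 16,777,216.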

If you add the -mm parameter to the gdalinfo command, as shown in Example 6-9, you get a summary of the minimum and maximum color values for the bands in the image.

Example 6-9. Using the min/max summary option

 > gdalinfo canada_mosaic_lcc_1000m.tif -mm
 ...
 Band 1 Block=5700x1 Type=Byte, ColorInterp=Gray
     Computed Min/Max=0.000,255.000

This shows that the values in this image span the full 8-bit range of 256 possible values, from a minimum of 0 to a maximum of 255.
