Concepts


Map data sets and response data sets are used in the GMAP procedure. These data sets must contain the required variables or the procedure stops and you get an error message. Depending on the type of map data set used, the map and response data sets can be used individually in the GMAP procedure or merged into a single data set to be used in the GMAP procedure. Each data set must contain the same identification variable.

About Map Data Sets

There are two types of map data sets: traditional map data sets and feature tables. Each uses a different data arrangement to store the spatial information needed to create maps. All of the map data delivered with SAS/GRAPH is available in both the traditional map data set and feature table format.

About Traditional Data Sets

A traditional map data set is a SAS data set that contains coordinates that define the boundaries of map areas, such as states or counties.

Required Variables

A traditional map data set must contain at least these variables:

  • a numeric variable named X that contains the horizontal coordinates of the boundary points. The value of this variable could be either projected or unprojected. If unprojected, X represents longitude.

  • a numeric variable named Y that contains the vertical coordinates of the boundary points. The value of this variable could be either projected or unprojected. If unprojected, Y represents latitude.

  • one or more variables that uniquely identify the areas in the map. Map area identification variables can be either character or numeric and are indicated in the ID statement.

The X and Y variable values in the traditional map data set do not have to be in any specific units because they are rescaled by the GMAP procedure based on the minimum and maximum values in the data set. The minimum X and Y values are in the lower-left corner of the map, and the maximum X and Y values are in the upper-right corner.

Traditional map data sets in which the X and Y variables contain longitude and latitude should be projected before you use them with PROC GMAP. See Chapter 39, The GPROJECT Procedure, on page 1161 for details.

Segment Variable

Optionally, the traditional map data set can also contain a variable named SEGMENT to identify map areas that comprise noncontiguous polygons. Each unique value of the SEGMENT variable within a single map area defines a distinct polygon. If the SEGMENT variable is not present, each map area is drawn as a separate closed polygon that indicates a single segment.

The observations for each segment of a map area in the map data set must occur in the order in which the points are to be joined. The GMAP procedure forms map area outlines by connecting the boundary points of each segment in the order in which they appear in the data set, eventually joining the last point to the first point to complete the polygon.
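As a minimal sketch of the layout described above, the following DATA step builds a small traditional map data set and draws it. The data set name WORK.MYMAP, the ID variable AREA, and the coordinates are all hypothetical. Area 1 consists of two noncontiguous polygons (SEGMENT 1 and 2), and the points of each segment are listed in the order in which they are to be joined.

```sas
/* Hypothetical map: area 1 has two segments, area 2 has one */
data work.mymap;
   input area segment x y;
   datalines;
1 1 0 0
1 1 2 0
1 1 2 2
1 1 0 2
1 2 3 0
1 2 4 0
1 2 4 1
2 1 0 3
2 1 4 3
2 1 4 5
2 1 0 5
;
run;

/* Use the map data set as its own response data set to draw the areas */
proc gmap map=work.mymap data=work.mymap;
   id area;
   choro area / discrete;
run;
quit;
```

Because GMAP rescales X and Y from the minimum and maximum values in the data set, the arbitrary units used here do not matter.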

LONG and LAT Variables

In addition to the variables described in Required Variables on page 999, the SAS/GRAPH map data sets can also contain the following variables:

  • a numeric variable named LONG containing the unprojected longitude (in radians or degrees) of the boundary points

  • a numeric variable named LAT containing the unprojected latitude (in radians or degrees) of the boundary points.

The GMAP procedure uses the values of the X and Y variables to draw the map. Therefore, if you want to produce an unprojected map by using the values in LONG and LAT, you would have to rename LONG and LAT to X and Y first.

SAS/GRAPH software includes a number of predefined map data sets. These data sets are described in Viewing Map Data Sets on page 1001.

Traditional Map Data Sets Containing X, Y, LONG, and LAT

Most of the traditional map data sets that are provided with SAS/GRAPH software contain four coordinate variables (X, Y, LONG, and LAT). In this case, X and Y are always projected values that will be used by the SAS/GRAPH procedures (by default). If you need to use the unprojected values that are contained in the LONG and LAT variables, then you must

  1. drop the existing X and Y variables

  2. rename the LONG and LAT variables to X and Y.

The MAP= value in the GMAP procedure automatically uses X and Y. See Input Map Data Sets that Contain Both Projected and Unprojected Values on page 1164 for more details.
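The two steps above can be sketched with data set options on a SET statement. This example assumes that MAPS.ASIA is one of the data sets that contains all four coordinate variables. The output name WORK.ASIA_UNPROJ is hypothetical.

```sas
/* DROP= is applied before RENAME=, so the projected X and Y are   */
/* discarded first and LONG and LAT then take over their names.    */
data work.asia_unproj;
   set maps.asia(drop=x y rename=(long=x lat=y));
run;
```

The resulting data set holds unprojected coordinates in X and Y, so it would typically be passed to the GPROJECT procedure before being used as the MAP= value.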

Traditional Map Data Sets Containing Only X and Y

The traditional map data sets that contain X and Y variables (and no LONG and LAT variables) are usually projected maps. However, there are a few traditional map data sets for the US and Canada that contain X and Y values that are unprojected longitude and latitude. In this case, you will need to use the GPROJECT procedure to project the map (see Chapter 39, The GPROJECT Procedure, on page 1161).
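A minimal sketch of projecting such a map before display follows. It assumes that MAPS.STATES is one of the unprojected US data sets and that a response data set WORK.SITES with the variables STATE and REGION exists. The output name WORK.STATES_PROJ is hypothetical.

```sas
/* Project the unprojected X and Y values with the default projection */
proc gproject data=maps.states out=work.states_proj;
   id state;
run;

/* Use the projected copy as the map data set */
proc gmap map=work.states_proj data=work.sites;
   id state;
   choro region / discrete;
run;
quit;
```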

Note: You can determine whether a SAS traditional map data set is projected or unprojected by looking at the description of each variable that is displayed when you use the CONTENTS procedure or by browsing the MAPS.METAMAPS data set.
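For example, the following step lists the variables of a map data set together with their labels. MAPS.STATES is used only as an illustration.

```sas
/* The Label column of the variable list describes each coordinate variable */
proc contents data=maps.states;
run;
```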

About Feature Tables

An alternative to using the traditional map data set is the feature table. While the traditional map data set stores the spatial information across multiple observations, the feature table uses a data arrangement to store all of the spatial information in a single variable value. The feature table's data arrangement uses the $GEOREF SAS/GRAPH format.

$GEOREF Format

The $GEOREF format stores spatial information in binary data streams, making it possible to store as a single variable value all the information needed to draw a map area. Thus, the feature tables use only a single observation for each map area, and they treat a field of spatial information just like any other information that can be added to a data set. Each $GEOREF value points to a corresponding traditional map data set to retrieve the coordinate values. The traditional map data set associated with the feature table must be located in the SAS library with the feature table for GMAP to proceed correctly.

To locate the variable that contains the spatial information, run PROC CONTENTS on a feature table. In the Output window, the variable containing the spatial information will have $GEOREF as the value in the column labelled Format.

Note: Some feature tables, like MAPS.NAMES, have more than one $GEOREF format variable.

Merging Feature Tables with Response Data Sets

To display response data with a feature table, the feature table must be merged with a response data set. The merged data set is then specified by the DATA= option in the PROC GMAP statement. The combined data set can be used repeatedly for generating maps, without having to merge the map and response data again.

First, use PROC SORT to sort the response data set and the feature table by a variable that is present in both data sets. Once sorted, the data sets can be merged with PROC SQL or with a DATA step MERGE statement whose BY variable is the variable used to sort the data sets. After the merge, the $GEOREF formatted variable from the feature table becomes the new data set's identification variable to be used in the GMAP procedure. See Example 11 on page 1069 for more details.
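The sort, merge, and map sequence might look like the following sketch. Several names are assumptions: the feature table MAPS.US2 is assumed to contain a STATE variable, the response data set WORK.SITES is assumed to contain STATE and REGION, and _MAP_GEOMETRY_ is taken to be the $GEOREF formatted variable for the US map. Check the feature table with PROC CONTENTS before relying on these names.

```sas
/* Sort copies of both data sets by the shared variable (STATE is assumed) */
proc sort data=maps.us2 out=work.us2;
   by state;
run;

proc sort data=work.sites out=work.sites_sorted;
   by state;
run;

/* Merge the feature table with the response data */
data work.combined;
   merge work.us2 work.sites_sorted;
   by state;
run;

/* No MAP= option: the merged data set carries the geometry */
proc gmap data=work.combined;
   id _map_geometry_;
   choro region / discrete;
run;
quit;
```

If GMAP cannot resolve the geometry, the library requirement for the associated traditional map data set, described under the $GEOREF format, is the first thing to check.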

Viewing Map Data Sets

When viewed in SAS, a data set is displayed as a table, with the variable names or labels displayed as column headings and the variable values arranged in columns and rows. The data sets that contain geometry objects describe a map by its spatial features, so their data tables are referred to as feature tables. Because feature tables store the spatial information in a single variable value, each observation pairs the spatial data for one map area with its response data (a 1:1 relationship). The traditional map data sets define map areas using geometric coordinates, so their data tables are also referred to as geometry tables. Traditional map data sets store the geometric coordinates across multiple observations.

In the MAPS library, there is a data set named METAMAPS, which contains meta data about all of the data sets that are delivered in the library. Among the meta data in MAPS.METAMAPS are the following four variables, which you can use to determine which feature table corresponds to a particular geometry table:

Table 35.1

Variable | Description
MEMNAME  | Identifies the names of all of the data sets that are delivered in the MAPS library.
MEMCODE  | Indicates whether a data set represents a feature table (F) or a geometry table (G).
F_TABLE  | Indicates the corresponding feature table for a geometry table. This variable is blank for rows that contain meta data about a feature table.
F_GEOCOL | Indicates the variable, in the feature table, whose values encapsulate the geometry object.

For example, consider the data sets MAPS.ASIA, MAPS.STATES, and MAPS.US. Each of these represents a geometry table, and to locate the corresponding feature tables, you would look in MAPS.METAMAPS to find the MEMNAME values ASIA, STATES, and US. Here are the relevant values on those rows:

Table 35.2

MEMNAME | MEMCODE | F_TABLE | F_GEOCOL
ASIA    | G       | NAMES   | CONT95_GEO
STATES  | G       | US2     | GEO_STATE
US      | G       | US2     | _MAP_GEOMETRY_

From these values, you can see that the data sets that are named ASIA, STATES, and US all represent geometry tables because their MEMCODE values are G. The feature table corresponding to the ASIA data set is the data set NAMES, which stores the spatial information in the variable CONT95_GEO. The feature tables corresponding to STATES and US are both in the data set US2. The spatial information corresponding to STATES is stored in the variable GEO_STATE, and the spatial information corresponding to US is stored in the variable _MAP_GEOMETRY_.
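A lookup like this one can be sketched with PROC PRINT. The UPCASE function is used only as a precaution in case the MEMNAME values are not stored in uppercase.

```sas
/* Show the feature-table mapping for three geometry tables */
proc print data=maps.metamaps noobs;
   where upcase(memname) in ('ASIA', 'STATES', 'US');
   var memname memcode f_table f_geocol;
run;
```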

Speciality Map Data Sets

There are several map data sets available with SAS/GRAPH software that allow you to easily label maps:

MAPS.USCENTER

  • contains the coordinates of the visual center of each state in the U.S. and Washington, D.C., as well as coordinates in the ocean for states that are too small to contain a label. There are two pairs of variables for locating labels using Annotate data sets. The X and Y variables are projected and can be used with the MAPS.US and MAPS.USCOUNTY data sets. The LONG and LAT variables are unprojected longitude and latitude in degrees and can be used with the MAPS.STATES, MAPS.COUNTIES, and MAPS.COUNTY data sets.

MAPS.USCITY

  • contains the locations of selected cities in the U.S. Many city names occur in more than one state, so you may have to subset by state to avoid duplication. There are two pairs of variables for locating labels using Annotate data sets. The X and Y variables contain projected coordinates and can be used with the MAPS.US and MAPS.USCOUNTY data sets. The LONG and LAT variables contain the unprojected longitude and latitude in degrees. These can be used to place labels on the MAPS.STATES, MAPS.COUNTIES, or MAPS.COUNTY data sets.

MAPS.CANCENS

  • contains the names of the Canadian census divisions. You can use MAPS.CANCENS with the MAPS.CANADA and MAPS.CANADA3 data sets.

For details on each of these data sets, see the MAPS.METAMAPS data set.
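As an illustration of labeling with one of these data sets, the following sketch builds an Annotate data set from MAPS.USCENTER and places two-letter state codes on MAPS.US. It assumes that MAPS.USCENTER contains a character variable OCEAN whose value is N for the in-state label positions. The data set name WORK.LABELS and the text size are hypothetical choices.

```sas
/* Build label observations from the projected X and Y centers */
data work.labels;
   length function $ 8 text $ 2;
   retain xsys ysys '2' hsys '3' when 'a' function 'label' size 2.5;
   set maps.uscenter(drop=long lat where=(ocean='N'));
   text = fipstate(state);   /* FIPS code to postal abbreviation */
run;

/* Draw the map and overlay the labels */
proc gmap map=maps.us data=maps.us;
   id state;
   choro state / discrete nolegend annotate=work.labels;
run;
quit;
```

Setting XSYS and YSYS to '2' places the labels in the map's own coordinate system, which is why the projected X and Y pair is the one to use with MAPS.US.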

About Response Data Sets

A response data set is a SAS data set that contains

  • one or more response variables that contain data values that are associated with map areas. Each value of the response variable is associated with a map area in the map data set.

  • identification variables that identify the map area to which a response value belongs. These variables must be the same as those that are contained in the map data set.

The response data set can contain other variables in addition to these required variables.

Using the Response Data Set with the Map Data Sets

The traditional map data set and the response data set must be used independently in the PROC GMAP statement, where the response data set is specified by the DATA= option and the traditional map data set is specified by the MAP= option. The values of the map area ID variables in the response data set determine the map areas to be included on the map. Unless the ALL option is used in the PROC GMAP statement, only the map areas with response values are shown on the map. As a result, you do not need to subset your map data set if you are mapping only a small section of the map. However, if you map the same small section frequently, then create a subset of the map data set for efficiency.

If you have a response data set named WORK.SITES, then the syntax for using GMAP might resemble the following:

    /* if necessary, define a libref pointing to the SAS maps library */
    libname maps 'SAS-data-library';

    /* generate a map */
    proc gmap map=maps.us data=work.sites;
       id state;
       choro region / discrete;
    run;
    quit;

A feature table and response data set are merged using a variable contained in both data sets. The new combined data set becomes the DATA= value in the PROC GMAP statement. When the response data set and the feature table are merged into one, do not use MAP= map-data-set in the PROC GMAP statement. The $GEOREF formatted variable is the ID variable for the combined data set. See Example 11 on page 1069 for more details.

Note: Response data that does not correspond to a map feature will be included in the legend.

About Response Variables

The GMAP procedure can produce block, choropleth, prism, and surface maps for both numeric and character response variables. Numeric variables fall into two categories: discrete and continuous.

  • Discrete variables contain a finite number of specific numeric values that are to be represented on the map. For example, a variable that contains only the values 1989 or 1990 is a discrete variable.

  • Continuous variables contain a range of numeric values that are to be represented on the map. For example, a variable that contains any real value between 0 and 100 is a continuous variable.

Numeric response variables are always treated as continuous variables unless the DISCRETE option is used in the action statement.

About Response Levels

Response levels are the values that identify categories of data on the graph. The categories that are shown on the graph are based on the values of the response variable. Based on the type of the response variable, a response level can be determined by any of the following:

  • a character value

  • the MIDPOINTS= option

  • a range of numeric values

  • a specific numeric value.

When response levels are determined by a character value, the GMAP procedure treats each unique value as a response level. For example, if the response variable contains the names of ten regions, each region will be a response level, resulting in ten response levels.

When character response levels are determined by the MIDPOINTS= option, any response variable values that do not match one of the specified response level values are ignored.

When response levels are determined by a range of numeric values, each response level has the same number of observations. These options are exceptions to this:

  • The LEVELS= option specifies the number of response levels to be used on the map.

  • The DISCRETE option causes the numeric variable to be treated as a discrete variable.

  • The MIDPOINTS= option chooses specific response level values as medians of the value ranges.

If the response variable values are continuous, then the GMAP procedure assigns response level intervals automatically unless you specify otherwise. The response levels represent a range of values rather than a single value.

When response levels are determined by specific numeric values, and the DISCRETE option is specified, one level is created for each value. If the response variable has an associated format, then each formatted value is represented by a different response level. Formatted values are truncated to 16 characters.
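The options that control response levels can be compared side by side in a sketch like the following. The response data set WORK.SITES and its numeric variables SALES and YEAR are hypothetical, as are the midpoint values.

```sas
proc gmap map=maps.us data=work.sites;
   id state;

   /* Default: SALES is treated as continuous and GMAP chooses the ranges */
   choro sales;
run;

   /* Request five response levels */
   choro sales / levels=5;
run;

   /* Specify the response level values directly */
   choro sales / midpoints=100 200 300;
run;

   /* One response level for each distinct value of YEAR */
   choro year / discrete;
run;
quit;
```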

The BLOCK, CHORO, and PRISM statements assign patterns to response levels. In CHORO and PRISM maps, response levels are shown as map areas. However, in BLOCK maps, response levels are shown as blocks. The default fill pattern for the response level is solid.

PATTERN statements can define the fill patterns and colors for both blocks and map areas. PATTERN definitions that define valid block patterns are applied to the blocks (response levels), and PATTERN definitions that define valid map patterns are applied to map areas.

See PATTERN Statement on page 169 for more information on fill pattern values and default pattern rotation.
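A short sketch of PATTERN definitions for map areas follows. The colors are arbitrary, and WORK.SITES with the variables STATE and REGION is assumed to exist.

```sas
/* Clear earlier PATTERN definitions, then define solid map patterns */
goptions reset=pattern;
pattern1 value=msolid color=blue;
pattern2 value=msolid color=green;
pattern3 value=msolid color=red;

/* Each response level of REGION takes the next pattern in sequence */
proc gmap map=maps.us data=work.sites;
   id state;
   choro region / discrete;
run;
quit;
```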

About Identification Variables

For traditional map data sets and response data sets, id-variable(s) identify the map areas (for example, counties, states, or provinces) that make up the map. A unit area or map area is a group of observations with the same ID value. The GMAP procedure matches the value of the response variables for each map area in the response data set to the corresponding map area in the traditional map data set in order to create the output graphs.

With feature tables, the geo-variable, or $GEOREF formatted variable containing the spatial information, is the identification variable. Each observation in a feature table has a unique $GEOREF formatted variable value. When merging the feature table with the response data set using a SQL or DATA step statement, the identification variable can be any variable that is contained within both data sets. Once the merged data set has been created, the geo-variable is used in the PROC GMAP ID statement for the merged feature table and response data set. See Example 11 on page 1069 for more details.

Displaying Map Areas and Response Data

Whether the GMAP procedure draws a map area and whether it displays patterns for response values depends on the contents of the response data set and on the ALL and MISSING options. The following table describes the conditions under which the procedure does or does not display map areas and response data.

If the response data set | And if | Then the procedure
includes the map area | the map area has a response value | draws the map area and displays the response data
includes the map area | the map area has no response value (that is, the value is missing) | draws the map area but leaves it empty
includes the map area | the map area has no response value and the MISSING option is used in the map statement | draws the map area and displays a response level for the missing value
does not include the map area | the ALL option is used in the PROC GMAP statement | draws the map area but leaves it empty
does not include the map area | the ALL option is not used | does not draw the map area
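The ALL and MISSING options from the table can be combined as in this sketch, which again assumes a response data set WORK.SITES with the variables STATE and REGION.

```sas
/* ALL draws every map area, including those absent from WORK.SITES.  */
/* MISSING gives missing REGION values their own response level.      */
proc gmap map=maps.us data=work.sites all;
   id state;
   choro region / discrete missing;
run;
quit;
```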

Summary of Use

To use the GMAP procedure, you must do the following:

  1. If necessary, issue a LIBNAME statement for the SAS data library that contains the map data set that you want to display.

  2. If using a traditional map data set, determine what processing needs to be done to the map data set before it is displayed. Use the GPROJECT, GREDUCE, and GREMOVE procedures or a DATA step to perform the necessary processing.

  3. Issue a LIBNAME statement for the SAS data library that contains the response data set, or use a DATA step to create a response data set.

  4. If using a traditional map data set, use the PROC GMAP statement to identify the map data set as the MAP= value and response data set as the DATA= value.

  5. If using a feature table, use PROC SORT to individually sort the feature table and response data set by a variable common to both data sets. Next, use SQL or the DATA step MERGE to merge the feature table with the response data set by using a variable common to both data sets. Use the combined data set as the DATA= value in the PROC GMAP statement (do not include MAP= in the PROC GMAP statement).

  6. Use the ID statement to name the id-variable(s) or the geo-variable.

  7. Use a BLOCK, CHORO, PRISM, or SURFACE statement to identify the response variable and generate the map.
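For a traditional map data set, the steps above might come together as in the following sketch. The response values are invented for illustration. STATE is assumed to hold numeric FIPS codes (13, 37, 45, and 51 are Georgia, North Carolina, South Carolina, and Virginia) so that it matches the ID variable in MAPS.US, which is already projected and needs no further processing here.

```sas
/* Steps 1 and 3: define librefs if needed, then create the response data */
/* libname maps 'SAS-data-library'; */
data work.sites;
   input state region;
   datalines;
13 1
37 1
45 2
51 2
;
run;

/* Steps 4, 6, and 7: name the data sets, the ID variable, and the response */
title 'Sites by Region';
proc gmap map=maps.us data=work.sites;
   id state;
   choro region / discrete;
run;
quit;
```

Only the four states in WORK.SITES are drawn because the ALL option is not specified.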

Accessing SAS Maps Online

Visit SAS Maps Online to download data updates, sample SAS/GRAPH programs that use the map data sets delivered with SAS/GRAPH, and GIF images of maps. SAS Maps Online is located at the following URL:

http://support.sas.com/rnd/datavisualization/mapsonline/html/




SAS/GRAPH 9.1 Reference, Volumes I and II
Year: 2004