Concepts


The GBARLINE procedure produces bar charts based on the values of a bar variable with plot overlays based on the values of a plot variable. The values of the bar variable are represented by a set of midpoints. The graph itself displays information about the bar variable in the form of bar statistics.

Figure 28.2 on page 741 illustrates the parts of a bar line graph.

Figure 28.2: Parts of a Bar Line Graph

Bar line graphs have three axes:

  • a midpoint axis that shows the categories of data, based on the bar variable

  • a left response axis that displays the scale of values for the bar statistic (based on the summary variable, if specified)

  • a right response axis that displays the scale of values for the plot statistic.

The response axes are divided into evenly spaced intervals identified with major tick marks that are labeled with the corresponding statistic value. Minor tick marks are evenly distributed between the major tick marks. Each axis is labeled with the variable name or label.

About the Bar Variable

The bar variable is the variable in the input data set whose values determine the categories of data represented by the bars. The bar variable generates the midpoints to which each observation in the data set contributes.

The bar variable can be either character or numeric. Character bar variables contain character values, which are always discrete. Numeric bar variables fall into two categories: discrete and continuous.

  • Discrete variables contain a finite number of specific numeric values that are to be represented on the chart. For example, a variable that contains years, such as 1984 or 2002, is a discrete variable.

  • Continuous variables contain a range of numeric values that are to be represented on the chart. For example, a variable of temperature data that contains real values between 0 and 212 is a continuous variable.

Numeric bar variables are always treated as continuous variables unless the DISCRETE option is used in the BAR statement.

About Midpoints

Midpoints are the values of the bar variable that identify categories of data. By default, midpoints are selected or calculated by the procedure. The way the procedure handles the midpoints depends on whether the values of the bar variable are character, discrete numeric, or continuous numeric.

Character Values

A character bar variable generates a midpoint for each unique value of the variable. In the following example, the bar variable CITY contains the names of three different cities, and each city is a midpoint, resulting in three midpoints for the chart:

Figure 28.3: Character Midpoints

By default, character midpoints are arranged in alphabetic order. If a character variable has an associated format, then the values are arranged in order of the formatted values.

Discrete Numeric Values

A numeric bar variable used with the DISCRETE option generates a midpoint for each unique value of the bar variable. In the following example, the numeric variable YEAR used with the DISCRETE option produces one midpoint for each year:

Figure 28.4: Discrete Numeric Midpoints

By default, numeric midpoints are arranged in ascending order. If the numeric variable has an associated format, then each formatted value generates a separate midpoint. Formatted numeric variables are arranged in ascending order according to their unformatted numeric values.

Continuous Numeric Values

A continuous numeric variable generates midpoints that represent ranges of values. By default, the GBARLINE procedure determines the ranges, calculates the median value of each range, and displays the appropriate median value at each midpoint on the chart. A value that falls exactly halfway between two midpoints is placed in the higher range.

In the following example, the numeric variable AGE produces five midpoints, each of which represents a six-year age range; the median value of the range is displayed at each midpoint:

Figure 28.5: Continuous Numeric Midpoints

By default, midpoints of ranges are arranged in ascending order.

Selecting and Ordering Midpoints

For character or discrete numeric values, you can use the MIDPOINTS= option to rearrange the midpoints or to exclude midpoints from the chart. For example, to change the default alphabetic order of the midpoints in Figure 28.3 on page 743, specify

 midpoints='Tokyo' 'Denver' 'Seattle' 

To exclude the midpoint for Denver, specify

 midpoints='Tokyo' 'Seattle' 

In this case, values excluded by the option are not included in the calculation of the bar statistic.

You can order or select discrete numeric midpoint values just as you do character values, but you omit the quotation marks when specifying numeric values.
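The effect of listing midpoints explicitly can be sketched in Python. This illustrates the counting only and is not SAS code. The data assumes seven Denver, nine Seattle, and five Tokyo observations, and `frequency_by_midpoint` is a hypothetical helper name.

```python
from collections import Counter

# Assumed data: 7 Denver, 9 Seattle, and 5 Tokyo observations.
cities = ["Denver"] * 7 + ["Seattle"] * 9 + ["Tokyo"] * 5


def frequency_by_midpoint(values, midpoints):
    """Count observations per listed midpoint, in the listed order.

    Values that are not named in `midpoints` are left out entirely,
    so they do not contribute to any statistic.
    """
    counts = Counter(values)
    return [(m, counts[m]) for m in midpoints]


# Reordering: all three cities, in the requested order.
reordered = frequency_by_midpoint(cities, ["Tokyo", "Denver", "Seattle"])
# Excluding Denver: its seven observations drop out of the calculation.
subset = frequency_by_midpoint(cities, ["Tokyo", "Seattle"])

print(reordered)  # [('Tokyo', 5), ('Denver', 7), ('Seattle', 9)]
print(subset)     # [('Tokyo', 5), ('Seattle', 9)]
```

In the second call, any percentage would be based on the remaining 14 observations rather than on all 21.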

For continuous numeric variables, use the LEVELS= or MIDPOINTS= option to change the number of midpoints, to control the range of values each midpoint represents, or to change the order of the midpoints. To control the range of values each midpoint represents, use the MIDPOINTS= option to specify the median value of each range. For example, to select the ranges 20–29, 30–39, and 40–49, specify

 midpoints=25 35 45 

Alternatively, to select the number of midpoints that you want and let the procedure calculate the ranges and medians, use the LEVELS= option.

You can also use formats to control the ranges of continuous numeric variables, but in that case the values are no longer continuous but become discrete.

Note: You cannot use the MIDPOINTS= option to exclude continuous numeric values from the chart because values below or above the ranges specified by the option are automatically included in the first and last midpoints, respectively. To exclude continuous numeric values from a chart, use a WHERE statement in a DATA step or the WHERE= data set option.
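The range rules for continuous values can be sketched in Python. This illustrates the logic only and is not SAS code. It assumes that range boundaries lie halfway between adjacent midpoints, and `assign_midpoint` is a hypothetical helper name.

```python
def assign_midpoint(value, midpoints):
    """Return the midpoint whose range contains `value`.

    Assumption: the boundary between two ranges is halfway between
    their midpoints. A value exactly on a boundary goes to the higher
    range, and values beyond the outer ranges fall into the first or
    last midpoint.
    """
    ordered = sorted(midpoints)
    for lower, upper in zip(ordered, ordered[1:]):
        boundary = (lower + upper) / 2
        if value < boundary:
            return lower
    return ordered[-1]


midpoints = [25, 35, 45]
print(assign_midpoint(29, midpoints))  # 25
print(assign_midpoint(30, midpoints))  # 35 (halfway value goes higher)
print(assign_midpoint(12, midpoints))  # 25 (below the first range)
print(assign_midpoint(70, midpoints))  # 45 (above the last range)
```

Because out-of-range values such as 12 and 70 still land in a midpoint, only filtering the input data removes them from the chart.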

See also the description of the LEVELS= and MIDPOINTS= options.

About the Plot Variable

The plot variable is the variable in the input data set whose values are used to generate the overlay plot line. The plot variable is optional, but if specified, it must be a numeric variable.

To specify a plot variable, use the SUMVAR= option on the PLOT statement:

 PLOT / SUMVAR=height; 

When you specify a plot variable with the SUMVAR= option, the only statistics available for the plot are the sum or the mean. You can specify the statistic with the TYPE= option. SUM (TYPE=SUM) is the default.

If you do not specify a plot variable, then the bar variable is used as the plot variable. The only statistics available for the plot are percentage, cumulative percentage, frequency, or cumulative frequency. The default statistic is frequency (TYPE=FREQ).

For more information about these statistics, see About Chart Statistics on page 745. See also the descriptions of the SUMVAR= and TYPE= options for the PLOT statement.

About Chart Statistics

The chart statistics are the statistical values calculated for the bar variable and the plot variable. The GBARLINE procedure calculates six chart statistics. You can specify the chart statistics with the TYPE= option. For the bar, the default statistic is frequency. For the plot, the default statistic is sum when a plot variable is specified with the SUMVAR= option; otherwise, it is frequency.

The examples given in the descriptions of these statistics assume a data set with two variables, CITY and SALES. The values of CITY are Denver, Seattle, and Tokyo. There are 21 observations: seven for Denver, nine for Seattle, and five for Tokyo.

Frequency

The frequency statistic is the total number of observations in the data set for each midpoint. For example, seven observations of the bar variable, CITY, contain the value Denver, so the frequency for the Denver midpoint is 7.

Cumulative Frequency

The cumulative frequency statistic adds the frequency for the current midpoint to the frequency of all of the preceding midpoints. For example, the frequency for the Denver midpoint is 7, and the frequency for the next midpoint, Seattle, is 9, so the cumulative frequency for Seattle is 16.

Percentage

The percentage statistic is calculated by dividing the frequency for each midpoint by the total frequency count for all midpoints in the chart or group and multiplying it by 100. For example, the frequency count for the Denver midpoint is 7 and the total frequency count for the chart is 21, so the percentage statistic for Denver is 33.3%.

Cumulative Percentage

The cumulative percentage statistic adds the percentage for the current midpoint to the percentage for all of the preceding midpoints in the chart or group. For example, the percentage for the Denver midpoint is 33.3, and the percentage for the next midpoint, Seattle, is 42.9, so the cumulative percentage for Seattle is 76.2.
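The four frequency-based statistics for the CITY example can be reproduced with a short Python sketch. This illustrates the arithmetic only and is not SAS code. The dictionary keys (`freq`, `cfreq`, `pct`, `cpct`) and the `chart_statistics` helper are illustrative names.

```python
from itertools import accumulate

# Observation counts from the CITY example, in default alphabetic order.
frequency = {"Denver": 7, "Seattle": 9, "Tokyo": 5}


def chart_statistics(freq):
    """Compute frequency, cumulative frequency, percentage, and
    cumulative percentage for each midpoint, in order."""
    total = sum(freq.values())
    cumulative = list(accumulate(freq.values()))
    stats = {}
    for (midpoint, count), running in zip(freq.items(), cumulative):
        stats[midpoint] = {
            "freq": count,
            "cfreq": running,
            "pct": 100 * count / total,
            "cpct": 100 * running / total,
        }
    return stats


stats = chart_statistics(frequency)
for midpoint, row in stats.items():
    print(midpoint, row["freq"], row["cfreq"],
          round(row["pct"], 1), round(row["cpct"], 1))
# Denver 7 7 33.3 33.3
# Seattle 9 16 42.9 76.2
# Tokyo 5 21 23.8 100.0
```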

Sum

The sum statistic is the total of the values, for each midpoint, for the variable specified by the SUMVAR= option. For example, if you specify SUMVAR=SALES and the values of the SALES variable for the seven Denver observations are 8734, 982, 1504, 3207, 4502, 624, and 918, the sum statistic for the Denver midpoint is 20,471.

You must use the SUMVAR= option to specify the variable for which you want the sum statistic.

Mean

The mean statistic is the average of the values, for each midpoint, for the variable specified by the SUMVAR= option. For example, if TYPE=MEAN and SUMVAR=SALES, the mean statistic for the Denver midpoint is 20,471 divided by 7, or approximately 2924.43.

You must use the SUMVAR= option to specify the variable for which you want the mean statistic.
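As a check on the sum and mean examples, the Denver figures can be worked through in Python. This illustrates the arithmetic only and is not SAS code. The variable names are illustrative.

```python
# SALES values for the seven Denver observations.
denver_sales = [8734, 982, 1504, 3207, 4502, 624, 918]

total = sum(denver_sales)          # the sum statistic
mean = total / len(denver_sales)   # the mean statistic

print(total)           # 20471
print(round(mean, 2))  # 2924.43
```

The unrounded mean is 20471 / 7, which is about 2924.4286.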

Calculating Weighted Statistics

By default, each observation is counted only once in the calculation of a chart statistic. To calculate weighted statistics in which an observation can be counted more than once, use the FREQ= option. This option identifies a variable whose values are used as a multiplier for the observation in the calculation of the statistic. If the value of the FREQ= variable is missing, 0, or negative, then the observation is excluded from the calculation.

If you use the SUMVAR= option, then the SUMVAR= variable value for an observation is multiplied by the FREQ= variable value for the observation for use in calculating the chart statistic.

For example, to use a variable called COUNT to produce weighted statistics, assign FREQ=COUNT. If you also assign the variable HEIGHT to the SUMVAR= option, then the following table shows how the values of COUNT and HEIGHT would affect the statistic calculation:

Value of COUNT    Value of HEIGHT    Number of times the observation is used    Value used for HEIGHT
1                 55                 1                                          55
5                 65                 5                                          325
.                 63                 -                                          -
-3                60                 -                                          -

By default, the percentage and cumulative percentage statistics are calculated based on the frequency. If you want to graph a percentage or cumulative percentage based on a sum, then you can use the FREQ= option to specify a variable to use for the "sum" calculation and specify the PCT statistic, as shown in this example:

 freq=count type=pct 

Because the variable that is specified by the FREQ= option determines the number of times an observation is counted, the value of COUNT is the equivalent of the sum statistic.
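The weighting rules for the COUNT and HEIGHT example can be sketched in Python. This illustrates the arithmetic only and is not SAS code. `None` stands in for a SAS missing value, and `weighted_totals` is a hypothetical helper name.

```python
# (COUNT, HEIGHT) pairs from the example; None represents a missing value.
observations = [(1, 55), (5, 65), (None, 63), (-3, 60)]


def weighted_totals(rows):
    """Return (weighted frequency, weighted sum of HEIGHT).

    An observation whose COUNT is missing, 0, or negative is excluded.
    Otherwise it is counted COUNT times, and HEIGHT is multiplied by COUNT.
    """
    frequency = 0
    height_sum = 0
    for count, height in rows:
        if count is None or count <= 0:
            continue  # excluded from the calculation
        frequency += count
        height_sum += count * height
    return frequency, height_sum


print(weighted_totals(observations))  # (6, 380)
```

Only the first two observations contribute: 1 + 5 = 6 for the frequency, and 55 + 325 = 380 for the sum.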

See also the descriptions of the TYPE=, SUMVAR=, and FREQ= options.

Note: The FREQ= option is not supported by ActiveX or Java.

Missing Values

By default, the GBARLINE procedure ignores missing midpoint values for the bar variable. If you specify the MISSING option, then missing values are treated as a valid midpoint and are included on the chart.

When the value of the variable that is specified in the FREQ= option is missing, 0, or negative, the observation is excluded from the calculation of the chart statistic.

When the value of the variable specified in the SUMVAR= option is missing, the observation is excluded from the calculation of the chart statistic.

If the value of the plot variable is missing, then the GBARLINE procedure does not include the observation in the plot overlay. If you specify interpolation with a SYMBOL definition, then the plot is not broken at the missing value.

Plot Variable Values Out of Range

You can exclude data values from a plot overlay by restricting the range of axis values with the RAXIS= option or with the ORDER= option in an AXIS statement. When an observation contains a value outside of the specified axis range, the GBARLINE procedure excludes the observation from the plot and issues a message to the log.

If you specify interpolation with a SYMBOL definition, then by default values outside of the axis range are excluded from interpolation calculations and, as a result, can change interpolated values for the plot overlay.

To specify that values out of range are included in the interpolation calculations, use the MODE= option in a SYMBOL statement. When MODE=INCLUDE, values that fall outside of the axis range are included in interpolation calculations but excluded from the plot. The default (MODE=EXCLUDE) omits observations that are outside of the axis range from interpolation calculations. See the MODE= option in SYMBOL Statement on page 183 for details.

About Patterns

When a chart needs one or more patterns, the procedure uses either default patterns and outlines that are automatically generated by SAS/GRAPH, or patterns, colors, outlines, and images that are defined by PATTERN statements, graphics options, and procedure options.

The following sections summarize pattern behavior for the GBARLINE procedure. For more information, see PATTERN Statement on page 169.

Default Patterns and Outlines

In general, the default pattern that the GBARLINE procedure uses is a solid fill that it rotates once through the colors list, skipping the color that is being used as the foreground color. The procedure also outlines all areas in the foreground color. (Typically, the foreground color is the first color in the device's colors list.)

Specifically, the GBARLINE procedure uses default patterns and outlines when you do not specify any of the following:

  • any PATTERN statements

  • the COLORS= graphics option (that is, you use the device's default colors list and it has more than one color)

  • the COUTLINE= option in the BAR statement.

If you do not specify any of these statements or options, then the GBARLINE procedure

  • selects the first default fill pattern, which is always solid, and rotates it through the colors list, generating one solid pattern for each color. If the first color in the device's colors list is black (or white), then the procedure skips that color and begins generating patterns with the next color.

  • uses the foreground color to outline every patterned area.

If the procedure needs additional patterns, PROC GBARLINE selects the next default pattern fill (empty) and rotates it through the colors list, skipping the foreground color as before. The procedure continues in this fashion until it has generated enough patterns for the chart.
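The default rotation described above can be sketched in Python. This illustrates the logic only and is not SAS code. It models just the two fills named in this section (solid, then empty). The colors list and the `default_patterns` helper are hypothetical.

```python
def default_patterns(colors, foreground, needed, fills=("solid", "empty")):
    """Generate up to `needed` (fill, color) pairs.

    Each fill is rotated through the colors list, skipping the
    foreground color, before the next fill is started. Only the fills
    passed in `fills` are modeled here.
    """
    usable = [c for c in colors if c != foreground]
    patterns = []
    for fill in fills:
        for color in usable:
            if len(patterns) == needed:
                return patterns
            patterns.append((fill, color))
    return patterns


# Hypothetical device colors list whose first color is the foreground.
colors = ["black", "red", "green", "blue"]
for fill, color in default_patterns(colors, foreground="black", needed=5):
    print(fill, color, "outline:", "black")
# solid red outline: black
# solid green outline: black
# solid blue outline: black
# empty red outline: black
# empty green outline: black
```

Every pattern is outlined in the foreground color, which is why the outline stays black throughout this example.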

Changing any of these conditions may change or override the default behavior:

  • If you specify a colors list with the COLORS= option in a GOPTIONS statement and the list contains more than one color, then the procedure rotates the default solid pattern through that list, using every color, even if the foreground color is black (or white). The default outline color remains the foreground color.

  • Whenever there are PATTERN definitions in effect, whether or not the GBARLINE procedure can use them, the default outline color for all patterns changes from foreground to SAME, as described in User-Defined Patterns, Outlines, and Images on page 748.

For a description of these graphics options, see Chapter 8, Graphics Options and Device Parameters Dictionary, on page 261.

User-Defined Patterns, Outlines, and Images

You can use PATTERN statements to explicitly specify patterns, including color or fill type or both. You can also specify images to fill the bars. For complete information on all patterns, see PATTERN Statement on page 169. See also the section on controlling patterns and colors for each chart type.

When you use PATTERN statements, the procedure uses the specified patterns until all of the PATTERN definitions they generate have been used. Then, if more patterns are required, the procedure returns to the default pattern rotation.

Whenever you specify any PATTERN statement, the default pattern outline changes. Instead of the foreground color, the outline color is the same as the fill color; for example, a blue bar has a blue outline. The effect is the same as specifying COUTLINE=SAME. Even when the procedure runs out of user-defined patterns and generates default patterns, the outlines continue to match the interior pattern color.

To change the outline color of any pattern, whether it's a default or user-defined pattern, use the COUTLINE= option in the BAR statement that generates the chart.

You can use the PATTERN statement to fill specified bars with specified images. For details, see Placing Images on the Bars of Two-Dimensional Bar Charts on page 116.

You can also add background images. The IBACK= graphics option (see IBACK on page 317) specifies image files that fill the background area. For further information, including a listing of recognized image file types, see Image File Types Supported by SAS/GRAPH on page 106 and Placing a Background Image on page 113.

Version 6 Patterns

If you specify the V6COMP graphics option, then the procedure generates patterns by rotating the appropriate Version 6 default patterns through all of the colors in the colors list. With V6COMP, all patterns are outlined in the same color as the fill.

Note: The V6COMP graphics option is not supported by ActiveX for graphs generated by the GBARLINE procedure.




SAS/GRAPH 9.1 Reference, Volumes I and II
Year: 2004
Pages: 342

flylib.com © 2008-2017.
If you may any questions please contact us: flylib@qtcs.net