Syntax


The syntax for the BOXPLOT procedure is as follows:

  • PROC BOXPLOT < options > ;

    • PLOT analysis-variable*group-variable < ( block-variables ) > < =symbol-variable > < / options > ;

    • INSET keywords < / options > ;

    • INSETGROUP keywords < / options > ;

    • BY variables ;

    • ID variables ;

Both the PROC BOXPLOT and PLOT statements are required. You can specify any number of PLOT statements within a single PROC BOXPLOT invocation.

PROC BOXPLOT Statement

  • PROC BOXPLOT < options > ;

The PROC BOXPLOT statement starts the BOXPLOT procedure. The following options can appear in the PROC BOXPLOT statement.

ANNOTATE= SAS-data-set

ANNO= SAS-data-set

  • specifies an ANNOTATE= type data set, as described in SAS/GRAPH Software: Reference, that enhances all box plots requested in subsequent PLOT statements.

BOX= SAS-data-set

  • names an input data set containing group summary statistics and outlier values. Typically, this data set is created as an OUTBOX= data set in a previous run of PROC BOXPLOT. Each group summary statistic and outlier value is recorded in a separate observation in a BOX= data set, so there are multiple observations per group. You cannot use a BOX= data set together with a DATA= or HISTORY= data set. If you do not specify one of these input data sets, the procedure uses the most recently created SAS data set as a DATA= data set.

DATA= SAS-data-set

  • names an input data set containing raw data to be analyzed. You cannot use a DATA= data set together with a BOX= or a HISTORY= data set. If you do not specify one of these input data sets, the procedure uses the most recently created SAS data set as a DATA= data set.

GOUT= < libref. > output-catalog

  • specifies the SAS catalog in which to save the graphics output that is produced by the BOXPLOT procedure. If you omit the libref, PROC BOXPLOT looks for the catalog in the temporary library called WORK and creates the catalog if it does not exist.

HISTORY= SAS-data-set

HIST= SAS-data-set

  • names an input data set containing group summary statistics. Typically, this data set is created as an OUTHISTORY= data set in a previous run of PROC BOXPLOT, but it can also be created using a SAS summarization procedure such as PROC MEANS. The HISTORY= data set can contain only one observation for each value of the group-variable. You cannot use a HISTORY= data set with a DATA= or a BOX= data set. If you do not specify one of these three input data sets, PROC BOXPLOT uses the most recently created data set as a DATA= data set.
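As a rough sketch of how these input data sets relate, the following statements save group summary statistics and outlier values in an OUTBOX= data set and then read them back with the BOX= option. The output data set name pistonbox is hypothetical, and the sketch assumes a raw data set pistons with variables diameter and hour.

```sas
/* First run: read raw data and save summary statistics and outliers. */
/* NOCHART suppresses the plot; pistonbox is an illustrative name.    */
proc boxplot data=pistons;
   plot diameter*hour / outbox=pistonbox nochart;
run;

/* Second run: draw the box plot from the saved OUTBOX= data set.     */
/* BOX= cannot be combined with DATA= or HISTORY=.                    */
proc boxplot box=pistonbox;
   plot diameter*hour;
run;
```

Saving the summary once and replotting from it can be convenient when the raw data set is large and only the plot options change between runs.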

PLOT Statement

  • PLOT ( analysis-variables )* group-variable < ( block-variables ) > < =symbol-variable >< /options > ;

You can specify multiple PLOT statements after the PROC BOXPLOT statement. The components of the PLOT statement are as follows.

analysis-variables

identify one or more variables to be analyzed. An analysis variable is required. If you specify more than one analysis variable, enclose the list in parentheses. For example, the following statements request distinct box plots for the variables weight, length, and width:

  proc boxplot data=summary;
     plot (weight length width)*day;
  run;

group-variable

specifies the variable that identifies groups in the data. The group variable is required. In the preceding PLOT statement, day is the group variable.

block-variables

specify optional variables that group the data into blocks of consecutive groups. These blocks are labeled in a legend, and each block variable provides one level of labels in the legend.

symbol-variable

specifies an optional variable whose levels (unique values) determine the symbol marker used to plot the means. Distinct symbol markers are displayed for points corresponding to the various levels of the symbol variable. You can specify the symbol markers with SYMBOLn statements (refer to SAS/GRAPH Software: Reference for complete details).

options

enhance the appearance of the box plot, request additional analyses, save results in data sets, and so on. Complete descriptions for each option follow.
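The components described above can be combined in a single PLOT statement. The following is a minimal sketch only; the data set parts and the variables shift and operator are hypothetical names chosen for illustration.

```sas
/* Two analysis variables, one group variable (day), one block        */
/* variable (shift), and a symbol variable (operator).                */
/* All data set and variable names here are assumptions.              */
proc boxplot data=parts;
   plot (weight length)*day (shift) = operator / boxstyle=schematic;
run;
```

In this sketch a separate box plot would be produced for weight and for length, each grouped by day, with a block legend built from shift and mean markers that vary by operator.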

Table 18.1 lists all options in the PLOT statement by function.

PLOT Statement Options

Table 18.1: PLOT Statement Options

Option

Description

Options for Controlling Box Appearance

BOXCONNECT=

connects features of adjacent box-and-whisker plots with line segments

BOXSTYLE=

specifies style of box-and-whisker plots

BOXWIDTH=

specifies width of box-and-whisker plots

BOXWIDTHSCALE=

specifies that widths of box-and-whisker plots vary proportionately to group size

CBOXES=

specifies color for outlines of box-and-whisker plots

CBOXFILL=

specifies fill color for interior of box-and-whisker plots

IDCOLOR=

specifies outlier symbol color in schematic box-and-whisker plots

IDCTEXT=

specifies outlier label color in schematic box-and-whisker plots

IDFONT=

specifies outlier label font in schematic box-and-whisker plots

IDHEIGHT=

specifies outlier label height in schematic box-and-whisker plots

IDSYMBOL=

specifies outlier symbol in schematic box-and-whisker plots

LBOXES=

specifies line types for outlines of box-and-whisker plots

NOSERIFS

eliminates serifs from the whiskers of box-and-whisker plots

NOTCHES

specifies that box-and-whisker plots are to be notched

PCTLDEF=

specifies percentile definition used for box-and-whisker plots

Options for Plotting and Labeling Points

ALLLABEL=

labels means of box-and-whisker plots

CLABEL=

specifies color for labels requested with ALLLABEL= option

CCONNECT=

specifies color for line segments that connect points on plot

LABELANGLE=

specifies angle for labels requested with ALLLABEL= option

SYMBOLLEGEND=

specifies LEGEND statement for levels of the symbol variable

SYMBOLORDER=

specifies order in which symbols are assigned for levels of the symbol variable

Reference Line Options

CHREF=

specifies color for lines requested by HREF= option

CVREF=

specifies color for lines requested by VREF= option

HREF=

requests reference lines perpendicular to horizontal axis

HREFLABELS=

specifies labels for HREF= lines

HREFLABPOS=

specifies position of HREFLABELS= labels

LHREF=

specifies line type for HREF= lines

LVREF=

specifies line type for VREF= lines

NOBYREF

specifies that reference line information in a data set is to be applied uniformly to plots created for all BY groups

VREF=

requests reference lines perpendicular to vertical axis

VREFLABELS=

specifies labels for VREF= lines

VREFLABPOS=

specifies position of VREFLABELS= labels

Block Variable Legend Options

BLOCKLABELPOS=

specifies position of label for the block variable legend

BLOCKLABTYPE=

specifies text size of the block variable legend

BLOCKPOS=

specifies vertical position of the block variable legend

BLOCKREP

repeats identical consecutive labels in the block variable legend

CBLOCKLAB=

specifies colors for filling frames enclosing block variable labels

CBLOCKVAR=

specifies colors for filling background of the block variable legend

Axis and Axis Label Options

CAXIS=

specifies color for axis lines and tick marks

CFRAME=

specifies fill color for frame for plot area

CONTINUOUS

produces horizontal axis for continuous group variable values

CTEXT=

specifies color for tick mark values and axis labels

HAXIS=

specifies major tick mark values for horizontal axis

HEIGHT=

specifies height of axis label and axis legend text

HMINOR=

specifies number of minor tick marks between major tick marks on horizontal axis

HOFFSET=

specifies length of offset at both ends of horizontal axis

NOHLABEL

suppresses label for horizontal axis

NOTICKREP

specifies that only the first occurrence of repeated, adjacent character group values is to be labeled on horizontal axis

NOVANGLE

requests vertical axis labels that are strung out vertically

SKIPHLABELS=

specifies thinning factor for tick mark labels on horizontal axis

TURNHLABELS

requests horizontal axis labels that are strung out vertically

VAXIS=

specifies major tick mark values for vertical axis

VFORMAT=

specifies format for vertical axis tick marks

VMINOR=

specifies number of minor tick marks between major tick marks on vertical axis

VOFFSET=

specifies length of offset at both ends of vertical axis

VZERO

forces origin to be included in vertical axis

WAXIS=

specifies width of axis lines

Input Data Set Options

MISSBREAK

specifies that missing values between identical character group values signify the start of a new group

Output Data Set Options

OUTBOX=

produces an output data set containing group summary statistics and outlier values

OUTHISTORY=

produces an output data set containing group summary statistics

Graphical Enhancement Options

ANNOTATE=

specifies annotate data set that adds features to box plot

BWSLEGEND

displays a legend identifying the function of group size specified with the BOXWIDTHSCALE= option

DESCRIPTION=

specifies string that appears in the description field of the PROC GREPLAY master menu for box plot

FONT=

specifies software font for labels and legends on plots

HTML=

specifies URLs to be associated with box-and-whisker plots

NAME=

specifies name that appears in the name field of the PROC GREPLAY master menu for box plot

NLEGEND

requests a legend displaying group sizes

OUTHIGHHTML=

specifies URLs to be associated with high outliers on box-and-whisker plots

OUTLOWHTML=

specifies URLs to be associated with low outliers on box-and-whisker plots

PAGENUM=

specifies the form of the label used in pagination

PAGENUMPOS=

specifies the position of the page number requested with the PAGENUM= option

Grid Options

CGRID=

specifies color for grid requested with ENDGRID or GRID option

ENDGRID

adds grid after last box-and-whisker plot

GRID

adds grid to box plot

LENDGRID=

specifies line type for grid requested with the ENDGRID option

LGRID=

specifies line type for grid requested with the GRID option

WGRID=

specifies width of grid lines

Plot Layout Options

INTERVAL=

specifies natural time interval between consecutive group positions when time, date, or datetime format is associated with a numeric group variable

INTSTART=

specifies first major tick mark value on horizontal axis when a date, time, or datetime format is associated with numeric group variable

MAXPANELS=

specifies maximum number of panels for plot

NOCHART

suppresses creation of the box plot

NOFRAME

suppresses frame for plot area

NPANELPOS=

specifies number of group positions per panel on each plot

REPEAT

repeats last group position on panel as first group position of next panel

TOTPANELS=

specifies number of panels to be used to display plot

Overlay Options

CCOVERLAY=

specifies colors for line segments connecting points on overlays

COVERLAY=

specifies colors for points on overlays

LOVERLAY=

specifies line types for line segments connecting points on overlays

NOOVERLAYLEGEND

suppresses overlay legend

OVERLAY=

specifies variables to be plotted on overlays

OVERLAYHTML=

specifies URLs to be associated with overlay plot points

OVERLAYID=

specifies labels for overlay plot points

OVERLAYLEGLAB=

specifies label for overlay legend

OVERLAYSYM=

specifies symbols used for overlays

OVERLAYSYMHT=

specifies heights for overlay symbols

WOVERLAY=

specifies widths for line segments connecting points on overlays

Clipping Options

CCLIP=

specifies color for plot symbol for clipped points

CLIPFACTOR=

determines extent to which extreme values are clipped

CLIPLEGEND=

specifies text for clipping legend

CLIPLEGPOS=

specifies position of clipping legend

CLIPSUBCHAR=

specifies substitution character for CLIPLEGEND= text

CLIPSYMBOL=

specifies plot symbol for clipped points

CLIPSYMBOLHT=

specifies symbol marker height for clipped points

COVERLAYCLIP=

specifies color for clipped points on overlays

OVERLAYCLIPSYM=

specifies symbol for clipped points on overlays

OVERLAYCLIPSYMHT=

specifies symbol height for clipped points on overlays

Following are explanations of the options that you can specify in the PLOT statement after a slash (/).

ALLLABEL=VALUE

ALLLABEL=( variable )

  • labels the point plotted for the mean of each box-and-whisker plot with its VALUE or with the value of a variable in the input data set.

ANNOTATE= SAS-data-set

  • specifies an ANNOTATE= type data set, as described in SAS/GRAPH Software: Reference.

BLOCKLABELPOS=ABOVE | LEFT

  • specifies the position of a block variable label in the block legend. The keyword ABOVE places the label immediately above the legend, and LEFT places the label to the left of the legend. Use the keyword LEFT with labels that are short enough to fit in the margin of the plot; otherwise, they are truncated. The default keyword is ABOVE.

BLOCKLABTYPE=SCALED | TRUNCATED

BLOCKLABTYPE= height

  • specifies how lengthy block variable values are to be treated when there is insufficient space to display them in the block legend. If you specify BLOCKLABTYPE=SCALED, the values are uniformly reduced in height so that they fit. If you specify BLOCKLABTYPE=TRUNCATED, lengthy values are truncated on the right until they fit. You can also specify a text height in vertical percent screen units for the values. By default, lengthy values are not displayed. For more information, see the section 'Displaying Blocks of Data' on page 530.

BLOCKPOS= n

  • specifies the vertical position of the legend for the values of the block variables. Values of n and the corresponding positions are as follows. By default, BLOCKPOS=1.

    n    Legend Position
    1    top of plot, offset from axis frame
    2    top of plot, immediately above axis frame
    3    bottom of plot, immediately above horizontal axis
    4    bottom of plot, below horizontal axis label

BLOCKREP

  • specifies that block variable values for all groups are to be displayed. By default, only the first block variable value in any block is displayed, and repeated block variable values are not displayed.
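The block legend options described above might be combined as in the following sketch. The data set parts and the block variable shift are hypothetical; only the option names come from the descriptions above.

```sas
/* Place the block legend at the bottom of the plot, label it on the  */
/* left, truncate long values, and repeat the value for every group.  */
/* parts, weight, day, and shift are illustrative names.              */
proc boxplot data=parts;
   plot weight*day (shift) /
      blockpos      = 3
      blocklabelpos = left
      blocklabtype  = truncated
      blockrep;
run;
```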

BOXCONNECT=MEAN | MEDIAN | MAX | MIN | Q1 | Q3

BOXCONNECT

  • specifies that the points in adjacent box-and-whisker plots representing group means, medians, maximum values, minimum values, first quartiles, or third quartiles are to be connected with line segments. If the BOXCONNECT option is specified without a keyword identifying the points to be connected, group means are connected. By default, no points are connected.

BOXSTYLE= keyword

  • specifies the style of the box-and-whisker plots displayed. If you specify BOXSTYLE=SKELETAL, the whiskers are drawn from the edges of the box to the extreme values of the group. This plot is sometimes referred to as a skeletal box-and-whisker plot. By default, the whiskers are drawn with serifs: you can specify the NOSERIFS option to draw the whiskers without serifs.

    In the following descriptions, the terms fence and far fence refer to the distance from the first and third quartiles (25th and 75th percentiles, respectively), expressed in terms of the interquartile range (IQR). For example, the lower fence is located at 1.5 × IQR below the 25th percentile; the upper fence is located at 1.5 × IQR above the 75th percentile. Similarly, the lower far fence is located at 3 × IQR below the 25th percentile; the upper far fence is located at 3 × IQR above the 75th percentile.

    If you specify BOXSTYLE=SCHEMATIC, a whisker is drawn from the upper edge of the box to the largest observed value within the upper fence and from the lower edge of the box to the smallest observed value within the lower fence. Serifs are added to the whiskers by default. Observations outside the fences are identified with a special symbol; you can specify the shape and color for this symbol with the IDSYMBOL= and IDCOLOR= options. The default symbol is a square. This type of plot corresponds to the schematic box-and-whisker plot described in Chapter 2 of Tukey (1977). See Figure 18.5 and the discussion in the section 'Styles of Box Plots' on page 522 for more information.

    If you specify BOXSTYLE=SCHEMATICID, a schematic box-and-whisker plot is displayed in which an ID variable value is used to label the symbol marking each observation outside the upper and lower fences. A BOX= data set can contain a variable named _ID_ that is used as the ID variable. Otherwise, the first variable listed in the ID statement provides the labels.

    If you specify BOXSTYLE=SCHEMATICIDFAR, a schematic box-and-whisker plot is displayed in which the value of the ID variable is used to label the symbol marking each observation outside the lower and upper far fences. Observations between the fences and the far fences are identified with a symbol but are not labeled with the ID variable.

    Figure 18.3 illustrates the elements of a skeletal box-and-whisker plot.

Figure 18.3: Skeletal Box-and-Whisker Plot

The skeletal style of the box-and-whisker plot shown in Figure 18.3 is the default.
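To illustrate the schematic styles, the following sketch requests a schematic plot in which observations outside the fences are labeled with an ID variable. The data set parts and the ID variable inspector are hypothetical; the circle symbol and red color are arbitrary choices.

```sas
/* Label each observation beyond the fences (1.5 x IQR from the       */
/* quartiles) with the first ID variable. Names are illustrative.     */
proc boxplot data=parts;
   plot weight*day /
      boxstyle = schematicid
      idsymbol = circle
      idcolor  = red;
   id inspector;
run;
```

Replacing SCHEMATICID with SCHEMATICIDFAR would restrict the labels to observations beyond the far fences, as described above.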

BOXWIDTH= value

  • specifies the width (in horizontal percent screen units) of the box-and-whisker plots.

BOXWIDTHSCALE= value

  • specifies that the box-and-whisker plot width is to vary proportionately to a particular function of the group size n. The function is determined by the value.

    If you specify a positive value, the widths are proportional to n^value (n raised to the power value). In particular, if you specify BOXWIDTHSCALE=1, the widths are proportional to the group size. If you specify BOXWIDTHSCALE=0.5, the widths are proportional to √n, as described by McGill, Tukey, and Larsen (1978). If you specify BOXWIDTHSCALE=0, the widths are proportional to log(n). See Example 18.4 on page 543 for an illustration of the BOXWIDTHSCALE= option.

    You can specify the BWSLEGEND option to display a legend identifying the function of n used to determine the box-and-whisker plot widths.

    By default, the box widths are constant.

BWSLEGEND

  • displays a legend identifying the function of group size n specified with the BOXWIDTHSCALE= option. No legend is displayed if all group sizes are equal. The BWSLEGEND option is not applicable unless you also specify the BOXWIDTHSCALE= option.
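A minimal sketch of the two options used together follows. The data set parts and its variables are hypothetical, and the groups are assumed to differ in size, since no legend is displayed when all group sizes are equal.

```sas
/* Box widths proportional to the square root of the group size,      */
/* with a legend that identifies the scaling function.                */
/* parts, weight, and day are illustrative names.                     */
proc boxplot data=parts;
   plot weight*day / boxwidthscale=0.5 bwslegend;
run;
```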

CAXIS= color

CAXES= color

CA= color

  • specifies the color for the axes and tick marks. This option overrides any COLOR= specifications in an AXIS statement. The default value is the first color in the device color list.

CBLOCKLAB= color | (color-list)

  • specifies fill colors for the frames that enclose the block variable labels in a block legend. By default, these areas are not filled. Colors in the CBLOCKLAB= list are matched with block variables in the order in which they appear in the PLOT statement.

CBLOCKVAR= variable | (variable-list)

  • specifies variables whose values are colors for filling the background of the legend associated with block variables. Each CBLOCKVAR= variable must be a character variable of no more than eight characters in the input data set, and its values must be valid SAS/GRAPH color names (refer to SAS/GRAPH Software: Reference for complete details). A list of CBLOCKVAR= variables must be enclosed in parentheses.

    The procedure matches the CBLOCKVAR= variables with block variables in the order specified. That is, each block legend is filled with the color value of the CBLOCKVAR= variable of the first observation in each block. In general, values of the ith CBLOCKVAR= variable are used to fill the block of the legend corresponding to the ith block variable.

    By default, fill colors are not used for the block variable legend. The CBLOCKVAR= option is available only when block variables are used in the PLOT statement.

CBOXES= color

CBOXES= (variable)

  • specifies the colors for the outlines of the box-and-whisker plots created with the PLOT statement. You can use one of the following approaches:

    • You can specify CBOXES= color to provide a single outline color for all the box-and-whisker plots.

    • You can specify CBOXES= (variable) to provide a distinct outline color for each box-and-whisker plot as the value of the variable. The variable must be a character variable of length 8 or less in the input data set, and its values must be valid SAS/GRAPH color names (refer to SAS/GRAPH Software: Reference for complete details). The outline color of the plot displayed for a particular group is the value of the variable in the observations corresponding to this group. Note that, if there are multiple observations per group in the input data set, the values of the variable should be identical for all the observations in a given group.

  • The default color is the second color in the device color list.

CBOXFILL= color

CBOXFILL= (variable)

  • specifies the interior fill colors for the box-and-whisker plots. You can use one of the following approaches:

    • You can specify CBOXFILL= color to provide a single color for all of the box-and-whisker plots.

    • You can specify CBOXFILL= (variable) to provide a distinct color for each box-and-whisker plot as the value of the variable. The variable must be a character variable of length 8 or less in the input data set, and its values must be valid SAS/GRAPH color names (or the value EMPTY, which you can use to suppress color filling). Refer to SAS/GRAPH Software: Reference for complete details. The interior color of the box displayed for a particular group is the value of the variable in the observations corresponding to this group. Note that if there are multiple observations per group in the input data set, the values of the variable should be identical for all the observations in a given group.

  • By default, the interiors are not filled.
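The variable form of CBOXES= and CBOXFILL= requires a character color variable in the input data set. The following sketch adds one in a DATA step; the data set pistons2, the variable boxcolor, and the cutoff at hour 12 are all hypothetical choices made for illustration.

```sas
/* Add a character variable (length 8) holding a color name that is   */
/* constant within each group, as the variable form requires.         */
data pistons2;
   set pistons;
   length boxcolor $ 8;
   if hour <= 12 then boxcolor = 'blue';
   else boxcolor = 'red';
run;

/* Single outline color for all boxes; fill color taken per group     */
/* from the boxcolor variable.                                        */
proc boxplot data=pistons2;
   plot diameter*hour / cboxes=black cboxfill=(boxcolor);
run;
```

Because boxcolor is derived from the group variable itself, its value is automatically identical for all observations in a given group.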

CCLIP= color

  • specifies a color for the plotting symbol that is specified with the CLIPSYMBOL= option to mark clipped values. The default color is the color specified in the COLOR= option in the SYMBOL1 statement.

CCONNECT= color

  • specifies the color for line segments connecting points on the plot. The default color is the color specified in the COLOR= option in the SYMBOL1 statement. This option is not applicable unless you also specify the BOXCONNECT= option.

CCOVERLAY= (color-list)

  • specifies the colors for line segments connecting points on overlay plots. Colors in the CCOVERLAY= list are matched with variables in the corresponding positions in the OVERLAY= list. By default, points are connected by line segments of the same color as the plotted points. You can specify the value NONE to suppress the line segments connecting points of an overlay plot.

CFRAME= color

  • specifies the color for filling the rectangle enclosed by the axes and the frame. By default, this area is not filled. The CFRAME= option cannot be used in conjunction with the NOFRAME option.

CGRID= color

  • specifies the color for the grid requested by the ENDGRID or GRID option. By default, the grid is the same color as the axes.

CHREF= color

  • specifies the color for the lines requested by the HREF= option. The default value is the first color in the device color list.

CLABEL= color

  • specifies the color for labels produced by the ALLLABEL= option. The default color is the CTEXT= color.

CLIPFACTOR= factor

  • requests clipping of extreme values on the box plot. The factor that you specify determines the extent to which these values are clipped, and it must be greater than 1.

    For examples of the CLIPFACTOR= option, see Figure 18.14 on page 534 and Figure 18.15 on page 535. Related clipping options are CCLIP=, CLIPLEGEND=, CLIPLEGPOS=, CLIPSUBCHAR=, and CLIPSYMBOL=.

CLIPLEGEND='label'

  • specifies the label for the legend that indicates the number of clipped boxes when the CLIPFACTOR= option is used. The label must be no more than 16 characters and must be enclosed in quotes. For an example, see Figure 18.15 on page 535.

CLIPLEGPOS=TOP | BOTTOM

  • specifies the position for the legend that indicates the number of clipped boxes when the CLIPFACTOR= option is used. The keywords TOP and BOTTOM position the legend at the top or bottom of the chart, respectively. Do not specify CLIPLEGPOS=TOP together with the PHASELEGEND option or the BLOCKPOS=1 or BLOCKPOS=2 options. By default, CLIPLEGPOS=BOTTOM.

CLIPSUBCHAR='character'

  • specifies a substitution character (such as #) for the label provided with the CLIPLEGEND= option. The substitution character is replaced with the number of boxes that are clipped. For example, suppose that the following statements produce a chart in which three boxes are clipped:

      proc boxplot data=pistons;
         plot diameter*hour /
            clipfactor  = 1.5
            cliplegend  = 'Boxes clipped=#'
            clipsubchar = '#' ;
      run;

    Then the clipping legend displayed on the chart will be

      Boxes clipped=3  

CLIPSYMBOL= symbol

  • specifies a plot symbol used to identify clipped points on the chart and in the legend when the CLIPFACTOR= option is used. You should use this option in conjunction with the CLIPFACTOR= option. The default symbol is CLIPSYMBOL=SQUARE.

CLIPSYMBOLHT= value

  • specifies the height for the symbol marker used to identify clipped points on the chart when the CLIPFACTOR= option is used. The default is the height specified with the H= option in the SYMBOL statement.

    For general information about clipping options, refer to 'Clipping Extreme Values' on page 532.

CONTINUOUS

  • specifies that numeric group variable values are to be treated as continuous values. By default, the values of a numeric group variable are considered discrete values unless the HAXIS= option is specified. For more information, see the discussion in the section 'Continuous Group Variables' on page 524.

COVERLAY= (color-list)

  • specifies the colors used to plot overlay variables. Colors in the COVERLAY= list are matched with variables in the corresponding positions in the OVERLAY= list.

COVERLAYCLIP= color

  • specifies the color used to plot clipped values on overlay plots when the CLIPFACTOR= option is used.

CTEXT= color

  • specifies the color for tick mark values and axis labels. The default color is the color specified in the CTEXT= option in the most recent GOPTIONS statement.

CVREF= color

  • specifies the color for the lines requested by the VREF= option. The default value is the first color in the device color list.

DESCRIPTION= 'string'

DES= 'string'

  • specifies a description of the box plot, not longer than 40 characters, that appears in the PROC GREPLAY master menu. The default string is the variable name.

ENDGRID

  • adds a grid to the rightmost portion of the plot, beginning with the first labeled major tick mark position that follows the last box-and-whisker plot. You can use the HAXIS= option to force space to be added to the horizontal axis.

FONT= font

  • specifies a software font for labels and legends. You can also specify fonts for axis labels in an AXIS statement. The FONT= font takes precedence over the FTEXT= font specified in the GOPTIONS statement. Hardware characters are used by default. Refer to SAS/GRAPH Software: Reference for more information on the GOPTIONS statement.

GRID

  • adds a grid to the box plot. Grid lines are horizontal lines positioned at labeled major tick marks, and they cover the length and height of the plotting area.

HAXIS= values

HAXIS=AXIS n

  • specifies tick mark values for the horizontal (group) axis. If the group variable is numeric, the values must be numeric and equally spaced. Optionally, you can specify an axis name defined in a previous AXIS statement. Refer to SAS/GRAPH Software: Reference for more information on the AXIS statement.

    Specifying the HAXIS= option with a numeric group variable causes the group variable values to be treated as continuous values. For more information, see the description of the CONTINUOUS option and the discussion in the section 'Continuous Group Variables' on page 524. Numeric values can be given in an explicit or implicit list. If the group variable is character, values must be quoted strings of length 16 or less. If a date, time, or datetime format is associated with a numeric group variable, SAS datetime literals can be used. Examples of HAXIS= lists follow:

    • haxis=0 2 4 6 8 10

    • haxis=0 to 10 by 2

    • haxis='LT12A' 'LT12B' 'LT12C' 'LT15A' 'LT15B' 'LT15C'

    • haxis='20MAY88'D to '20AUG88'D by 7

    • haxis='01JAN88'D to '31DEC88'D by 30

  • If the group variable is numeric, the HAXIS= list must span the group variable values; if it is a character variable, the HAXIS= list must include all of the group variable values. You can add group positions to the box plot by specifying HAXIS= values that are not group variable values.

    If you specify a large number of HAXIS= values, some of these may be thinned to avoid collisions between tick mark labels. To avoid thinning, use one of the following methods:

    • Shorten values of the group variable by eliminating redundant characters. For example, if your group variable has values LOT1, LOT2, LOT3, and so on, you can use the SUBSTR function in a DATA step to eliminate LOT from each value, and you can modify the horizontal axis label to indicate that the values refer to lots.

    • Use the TURNHLABELS option to turn the labels vertically.

    • Use the NPANELPOS= option to force fewer group positions per panel.
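As an illustration of the HAXIS= option with a numeric group variable, the sketch below assumes that hour in the pistons data set takes values between 0 and 24, so that the list spans the group values. That range is an assumption, not something stated above.

```sas
/* Equally spaced major tick marks with three minor tick marks        */
/* between them; TURNHLABELS turns the labels vertically.             */
/* Specifying HAXIS= causes hour to be treated as continuous.         */
proc boxplot data=pistons;
   plot diameter*hour /
      haxis  = 0 to 24 by 4
      hminor = 3
      turnhlabels;
run;
```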

HEIGHT= value

  • specifies the height (in vertical screen percent units) of the text for axis labels and legends. This value takes precedence over the HTEXT= value specified in the GOPTIONS statement. This option is recommended for use with software fonts specified with the FONT= option or with the FTEXT= option in the GOPTIONS statement. Refer to SAS/GRAPH Software: Reference for complete information on the GOPTIONS statement.

HMINOR= n

HM= n

  • specifies the number of minor tick marks between each major tick mark on the horizontal axis. Minor tick marks are not labeled. The default is HMINOR=0.

HOFFSET= value

  • specifies the length (in percent screen units) of the offset at both ends of the horizontal axis. You can eliminate the offset by specifying HOFFSET=0.

HREF= values

HREF= SAS-data-set

  • draws reference lines perpendicular to the horizontal (group) axis on the box plot. You can use this option in the following ways:

    • You can specify the values for the lines with an HREF= list. If the group variable is numeric, the values must be numeric. If the group variable is character, the values must be quoted strings of up to 16 characters. If the group variable is formatted, the values must be given as internal values. Examples of HREF= values follow:

        href=5
        href=5 10 15 20 25 30
        href='Shift 1' 'Shift 2' 'Shift 3'

    • You can specify reference line values as the values of a variable named _REF_ in an HREF= data set. The type and length of _REF_ must match those of the group variable specified in the PLOT statement. Optionally, you can provide labels for the lines as values of a variable named _REFLAB_, which must be a character variable of length 16 or less. If you want distinct reference lines to be displayed in plots for different analysis variables specified in the PLOT statement, you must include a character variable named _VAR_, whose values are the analysis variable names. If you do not include the variable _VAR_, all of the lines are displayed in all of the plots.

      Each observation in an HREF= data set corresponds to a reference line. If BY variables are used in the input data set, the same BY variable structure must be used in the reference line data set unless you specify the NOBYREF option.

  • Unless the CONTINUOUS or HAXIS= option is specified, numeric group variable values are treated as discrete values, and only HREF= values matching these discrete values are valid. Other values are ignored.
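The data set form of HREF= can be sketched as follows. The data set name hrefdata and the label text are hypothetical, and the sketch assumes that hour is a numeric group variable whose discrete values include 8 and 16.

```sas
/* One observation per reference line. _REF_ is numeric to match the  */
/* (assumed numeric) group variable; _REFLAB_ is character, length    */
/* 16 or less. Values and labels are illustrative.                    */
data hrefdata;
   length _reflab_ $ 16;
   input _ref_ _reflab_ $;
   datalines;
8 Shift2
16 Shift3
;
run;

proc boxplot data=pistons;
   plot diameter*hour / href=hrefdata lhref=2;
run;
```

Because no _VAR_ variable is included, both lines would appear in every plot requested by the PLOT statement.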

HREFLABELS= 'label1' ... 'labeln'

HREFLABEL= 'label1' ... 'labeln'

HREFLAB= 'label1' ... 'labeln'

  • specifies labels for the reference lines requested by the HREF= option. The number of labels must equal the number of lines. Enclose each label in quotes. Labels can be up to 16 characters.

HREFLABPOS= n

  • specifies the vertical position of the HREFLABEL= label, as described in the following table. By default, n=2.

    HREFLABPOS=   Label Position
    1             along top of plot area
    2             staggered from top to bottom of plot area
    3             along bottom of plot area
    4             staggered from bottom to top of plot area
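
    As an illustration, the list form of HREF= can be combined with the label options. This sketch again assumes the hypothetical pistons data set, with group values 8 and 16 present; the label text is made up.

    ```sas
    proc boxplot data=pistons;
       /* Two lines, two labels, labels placed along the top of the plot area */
       plot diameter*hour / href=8 16
                            hreflabels='Shift 2' 'Shift 3'
                            hreflabpos=1;
    run;
    ```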

HTML= variable

  • specifies uniform resource locators (URLs) as values of the specified character variable (or formatted values of a numeric variable). These URLs are associated with box-and-whisker plots when graphics output is directed into HTML. The value of the HTML= variable should be the same for each observation with a given value of the group variable.

IDCOLOR= color

  • specifies the color of the symbol marker used to identify outliers in schematic box-and-whisker plots (that is, when you specify one of the keywords SCHEMATIC, SCHEMATICID, or SCHEMATICIDFAR with the BOXSTYLE= option). The default color is the color specified with the CBOXES= option; otherwise, the second color in the device color list is used.

IDCTEXT= color

  • specifies the color for the text used to label outliers when you specify one of the keywords SCHEMATICID or SCHEMATICIDFAR with the BOXSTYLE= option. The default value is the color specified with the CTEXT= option.

IDFONT= font

  • specifies the font for the text used to label outliers when you specify one of the keywords SCHEMATICID or SCHEMATICIDFAR with the BOXSTYLE= option. The default font is SIMPLEX.

IDHEIGHT= value

  • specifies the height for the text used to label outliers when you specify one of the keywords SCHEMATICID or SCHEMATICIDFAR with the BOXSTYLE= option. The default value is the height specified with the HTEXT= option in the GOPTIONS statement. Refer to SAS/GRAPH Software: Reference for complete information on the GOPTIONS statement.

IDSYMBOL= symbol

  • specifies the symbol marker used to identify outliers in schematic box plots. The default symbol is SQUARE.

INTERVAL=DAY | DTDAY | HOUR | MINUTE | MONTH | QTR | SECOND

  • specifies the natural time interval between consecutive group positions when a time, date, or datetime format is associated with a numeric group variable. By default, the INTERVAL= option uses the number of group positions per panel (screen or page) that you specify with the NPANELPOS= option. The default time interval keywords for various time formats are shown in the following table.

    Format     Default Keyword     Format     Default Keyword
    DATE       DAY                 MONYY      MONTH
    DATETIME   DTDAY               TIME       SECOND
    DDMMYY     DAY                 TOD        SECOND
    HHMM       HOUR                WEEKDATE   DAY
    HOUR       HOUR                WORDDATE   DAY
    MMDDYY     DAY                 YYMMDD     DAY
    MMSS       MINUTE              YYQ        QTR

    You can use the INTERVAL= option to modify the effect of the NPANELPOS= option, which specifies the number of group positions per panel. The INTERVAL= option enables you to match the scale of the horizontal axis to the scale of the group variable without having to associate a different format with the group variable.

    For example, suppose that your formatted group values span an overall time interval of 100 days and a DATETIME format is associated with the group variable. Since the default interval for the DATETIME format is DTDAY and since NPANELPOS=25 by default, the plot is displayed with four panels.

    Now, suppose that your data span an overall time interval of 100 hours and a DATETIME format is associated with the group variable. The plot for these data is created in a single panel, but the data occupy only a small fraction of the plot since the scale of the data (hours) does not match that of the horizontal axis (days). If you specify INTERVAL=HOUR, the horizontal axis is scaled for 25 hours, matching the scale of the data, and the plot is displayed with four panels.

    You should use the INTERVAL= option only in conjunction with the CONTINUOUS or HAXIS= option, which produces a horizontal axis of continuous group variable values. For more information, see the descriptions of the CONTINUOUS and HAXIS= options, and the discussion in the section 'Continuous Group Variables' on page 524.
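
    The 100-hour scenario above might be coded as in the following sketch. The data set readings and the variables temp and stamp are hypothetical; stamp is assumed to be a numeric group variable with a DATETIME format.

    ```sas
    /* Without INTERVAL=HOUR the axis would be scaled in days (DTDAY). */
    proc boxplot data=readings;
       plot temp*stamp / continuous interval=hour;
    run;
    ```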

INTSTART= value

  • specifies the starting value for a numeric horizontal axis when a date, time, or datetime format is associated with the group variable. If the value specified is greater than the first group variable value, this option has no effect.

LABELANGLE= angle

  • specifies the angle at which labels requested with the ALLLABEL= option are drawn. A positive angle rotates the labels counterclockwise; a negative angle rotates them clockwise. By default, labels are oriented horizontally.

LBOXES= linetype

LBOXES= (variable)

  • specifies the line types for the outlines of the box-and-whisker plots. You can use one of the following approaches:

    • You can specify LBOXES= linetype to provide a single linetype for all of the box-and-whisker plots.

    • You can specify LBOXES= (variable) to provide a distinct line type for each box-and-whisker plot. The variable must be a numeric variable in the input data set, and its values must be valid SAS/GRAPH linetype values (numbers ranging from 1 to 46). The line type for the plot displayed for a particular group is the value of the variable in the observations corresponding to this group. Note that if there are multiple observations per group in the input data set, the values of the variable should be identical for all of the observations in a given group.

  • The default value is 1, which produces solid lines. Refer to the description of the SYMBOL statement in SAS/GRAPH Software: Reference for more information on valid linetypes.
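
    A sketch of the variable form, assuming the hypothetical pistons data set. The derived data set pistons2, the variable ltype, and the cutoff of 12 are illustrative. Deriving ltype from the group variable keeps its value constant within each group.

    ```sas
    data pistons2;
       set pistons;
       if hour > 12 then ltype = 2;   /* dashed outline */
       else ltype = 1;                /* solid outline  */
    run;

    proc boxplot data=pistons2;
       plot diameter*hour / lboxes=(ltype);
    run;
    ```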

LENDGRID= n

  • specifies the line type for the grid requested with the ENDGRID option. The default value is n=1, which produces a solid line. If you use the LENDGRID= option, you do not need to specify the ENDGRID option. Refer to the description of the SYMBOL statement in SAS/GRAPH Software: Reference for more information on valid linetypes.

LGRID= n

  • specifies the line type for the grid requested with the GRID option. The default value is n=1, which produces a solid line. If you use the LGRID= option, you do not need to specify the GRID option. Refer to the description of the SYMBOL statement in SAS/GRAPH Software: Reference for more information on valid linetypes.

LHREF= linetype

LH= linetype

  • specifies the line type for reference lines requested with the HREF= option. The default value is 2, which produces a dashed line. Refer to the description of the SYMBOL statement in SAS/GRAPH Software: Reference for more information on valid linetypes.

LOVERLAY= (linetypes)

  • specifies line types for the line segments connecting points on overlay plots. Line types in the LOVERLAY= list are matched with variables in the corresponding positions in the OVERLAY= list.

LVREF= linetype

LV= linetype

  • specifies the line type for reference lines requested by the VREF= option. The default value is 2, which produces a dashed line. Refer to the description of the SYMBOL statement in SAS/GRAPH Software: Reference for more information on valid linetypes.

MAXPANELS= n

  • specifies the maximum number of panels (pages or screens) for a plot. By default, n=20.

MISSBREAK

  • determines how groups are formed when observations are read from a DATA= data set and a character group variable is provided. When you specify the MISSBREAK option, observations with missing values of the group variable are not processed. Furthermore, the next observation with a nonmissing value of the group variable is treated as the beginning observation of a new group even if this value is identical to the most recent nonmissing group value. In other words, by specifying the MISSBREAK option and inserting an observation with a missing group variable value into a group of consecutive observations with the same group variable value, you can split the group into two distinct groups of observations.

    By default (that is, when you omit the MISSBREAK option), observations with missing values of the group variable are not processed, and all remaining observations with the same consecutive value of the group variable are treated as a single group.
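
    The splitting behavior can be sketched with a small hypothetical data set (batches, batch, and weight are made-up names). In list input, a single period is read as a missing value for both character and numeric variables.

    ```sas
    data batches;
       length batch $ 8;
       input batch $ weight;
       datalines;
    A 10.1
    A 10.4
    A 10.2
    . .
    A 10.9
    A 11.2
    A 11.0
    ;

    /* With MISSBREAK, the six A observations form two groups of three. */
    proc boxplot data=batches;
       plot weight*batch / missbreak;
    run;
    ```

    Omitting MISSBREAK from the PLOT statement would instead treat all six observations as one group.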

NAME= 'string'

  • specifies a name for the box plot, not more than eight characters, that appears in the PROC GREPLAY master menu.

NLEGEND

  • requests a legend displaying group sizes. If the size is the same for each group, that number is displayed. Otherwise, the minimum and maximum group sizes are displayed.

NOBYREF

  • specifies that the reference line information in an HREF= or VREF= data set is to be applied uniformly to box plots created for all the BY groups in the input data set. If you specify the NOBYREF option, you do not need to provide BY variables in the reference line data set. By default, you must provide BY variables.

NOCHART

  • suppresses the creation of the chart. You typically specify the NOCHART option when you are using the procedure to compute group summary statistics and save them in an output data set.

NOFRAME

  • suppresses the default frame drawn around the plot.

NOHLABEL

  • suppresses the label for the horizontal (group) axis. Use the NOHLABEL option when the meaning of the axis is evident from the tick mark labels, such as when a date format is associated with the group variable.

NOOVERLAYLEGEND

  • suppresses the legend for overlay plots that is displayed by default when the OVERLAY= option is specified.

NOSERIFS

  • eliminates serifs from the whiskers of box-and-whisker plots.

NOTCHES

  • specifies that box-and-whisker plots are to be notched. The endpoints of the notches are located at the median plus and minus $1.58\,(\mathrm{IQR}/\sqrt{n})$, where IQR is the interquartile range and n is the group size. The medians (central lines) of two box-and-whisker plots are significantly different at approximately the 0.05 level if the corresponding notches do not overlap. Refer to McGill, Tukey, and Larsen (1978) for more information. Figure 18.4 illustrates the NOTCHES option. Notice the folding effect at the bottom, which happens when the endpoint of a notch is beyond its corresponding quartile. This situation typically occurs when the group size is small.

Figure 18.4: Box Plot: the NOTCHES Option
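
    As a worked illustration with assumed values (median 10, IQR 4, group size n = 16), and taking the notch half-width to be 1.58 IQR/√n as in McGill, Tukey, and Larsen (1978):

    ```latex
    \[
    10 \pm 1.58\,\frac{4}{\sqrt{16}} = 10 \pm 1.58
    \quad\Longrightarrow\quad [\,8.42,\ 11.58\,]
    \]
    ```

    With the same median and IQR but n = 4, the half-width grows to 3.16. If the quartiles are at 8 and 12, both notch endpoints (6.84 and 13.16) then lie beyond the quartiles, which is the situation that produces the folding effect.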

NOTICKREP

  • applies to character-valued group variables and specifies that only the first occurrence of repeated, adjacent group values is to be labeled on the horizontal axis.

NOVANGLE

  • requests vertical axis labels that are oriented vertically. By default, the labels are drawn at an angle of 90 degrees if a software font is used.

NPANELPOS= n

NPANEL= n

  • specifies the number of group positions per panel. A panel is defined as a screen or page. You typically specify the NPANELPOS= option to display more box-and-whisker plots on a panel than the default number, which is n=25.

    You can specify a positive or negative number for n. The absolute value of n must be at least 5. If n is positive, the number of positions is adjusted so that it is approximately equal to n and so that all panels display approximately the same number of group positions. If n is negative, no balancing is done, and each panel (except possibly the last) displays approximately n positions. In this case, the approximation is due only to axis scaling.

    You can use the INTERVAL= option to change the effect of the NPANELPOS= option when a date or time format is associated with the group variable. The INTERVAL= option enables you to match the scale of the horizontal axis to the scale of the group variable without having to associate a different format with the group variable.

OUTBOX= SAS-data-set

  • creates an output data set that contains group summary statistics and outlier values for a box plot. You can use an OUTBOX= data set as a BOX= input data set in a subsequent run of the procedure. See 'OUTBOX= Data Set' for details.
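
    A sketch of the round trip, assuming the hypothetical pistons data set. The output data set name pistonbox is illustrative.

    ```sas
    /* First run: save summary statistics and outliers without drawing a chart. */
    proc boxplot data=pistons;
       plot diameter*hour / outbox=pistonbox nochart;
    run;

    /* Later run: draw the box plot from the saved summary data set. */
    proc boxplot box=pistonbox;
       plot diameter*hour;
    run;
    ```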

OUTHIGHHTML= variable

  • specifies a variable whose values are URLs to be associated with outlier points above the upper fence on a schematic box plot when graphics output is directed into HTML.

OUTHISTORY= SAS-data-set

  • creates an output data set that contains the group summary statistics. You can use an OUTHISTORY= data set as a HISTORY= input data set in a subsequent run of the procedure. See 'OUTHISTORY= Data Set' for details.

OUTLOWHTML= variable

  • specifies a variable whose values are URLs to be associated with outlier points below the lower fence on a schematic box plot when graphics output is directed into HTML.

OVERLAY= (variable-list)

  • specifies variables to be plotted as overlays on the box plot. One value for each overlay variable is plotted at each group position. If there are multiple observations with the same group variable value in the input data set, the overlay variable values from the first observation in each group are plotted. By default, the points in an overlay plot are connected with line segments.
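
    For illustration, suppose the hypothetical pistons data set also holds a numeric variable target whose value is constant within each hour. This sketch overlays it as a dashed line of width 2.

    ```sas
    proc boxplot data=pistons;
       plot diameter*hour / overlay=(target)
                            loverlay=(2)
                            woverlay=(2);
    run;
    ```

    The LOVERLAY= and WOVERLAY= entries are matched by position with the single variable in the OVERLAY= list.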

OVERLAYCLIPSYM= symbol

  • specifies the symbol used to plot clipped values on overlay plots when the CLIPFACTOR= option is used.

OVERLAYCLIPSYMHT= value

  • specifies the height for the symbol used to plot clipped values on overlay plots when the CLIPFACTOR= option is used.

OVERLAYHTML= (variable-list)

  • specifies variables whose values are URLs to be associated with points on overlay plots when graphics output is directed into HTML. Variables in the OVERLAYHTML= list are matched with variables in the corresponding positions in the OVERLAY= list.

OVERLAYID= (variable-list)

  • specifies variables whose formatted values are used to label points on overlays. Variables in the OVERLAYID= list are matched with variables in the corresponding positions in the OVERLAY= list. The value of the OVERLAYID= variable should be the same for each observation with a given value of the group variable.

OVERLAYLEGLAB= 'label'

  • specifies the label displayed to the left of the overlay legend produced by the OVERLAY= option. The label can be up to 16 characters and must be enclosed in quotes. The default label is 'Overlays:'.

OVERLAYSYM= (symbol-list)

  • specifies symbols used to plot overlay variables. Symbols in the OVERLAYSYM= list are matched with variables in the corresponding positions in the OVERLAY= list.

OVERLAYSYMHT= (value-list)

  • specifies the heights of symbols used to plot overlay variables. Symbol heights in the OVERLAYSYMHT= list are matched with variables in the corresponding positions in the OVERLAY= list.

PAGENUM= 'string'

  • specifies the form of the label used for pagination. The string must be no longer than 16 characters, and it must include one or two occurrences of the substitution character '#'. The first '#' is replaced with the page number, and the optional second '#' is replaced with the total number of pages.

    The PAGENUM= option is useful when you are working with a large number of groups, resulting in multiple pages of output. For example, suppose that each of the following PLOT statements produces multiple pages:

      proc boxplot data=pistons;
         plot diameter*hour / pagenum='Page #';
         plot diameter*hour / pagenum='Page # of #';
         plot diameter*hour / pagenum='#/#';
      run;

    The third page produced by the first statement would be labeled Page 3. The third page produced by the second statement would be labeled Page 3 of 5. The third page produced by the third statement would be labeled 3/5.

    By default, no page number is displayed.

PAGENUMPOS=TL | TR | BL | BR | TL100 | TR100 | BL0 | BR0

  • specifies where to position the page number requested with the PAGENUM= option. The keywords TL, TR, BL, and BR correspond to the positions top left, top right, bottom left, and bottom right, respectively. You can use the TL100 and TR100 keywords to ensure that the page number appears at the very top of a page when a title is displayed. The BL0 and BR0 keywords ensure that the page number appears at the very bottom of a page when footnotes are displayed.

    The default keyword is BR.

PCTLDEF= index

  • specifies one of five definitions used to calculate percentiles in the construction of box-and-whisker plots. The index can be 1, 2, 3, 4, or 5. The five corresponding percentile definitions are discussed in the section 'Percentile Definitions' on page 523. The default index is 5.

REPEAT

REP

  • specifies that the horizontal axis of a plot that spans multiple pages is to be arranged so that the last group position on a page is repeated as the first group position on the next page. The REPEAT option facilitates cutting and pasting panels together. When a SAS DATETIME format is associated with the group variable, the REPEAT option is the default.

SKIPHLABELS= n

SKIPHLABEL= n

  • specifies the number n of consecutive tick mark labels, beginning with the second tick mark label, that are thinned (not displayed) on the horizontal (group) axis. For example, specifying SKIPHLABEL=1 causes every other label to be skipped. Specifying SKIPHLABEL=2 causes the second and third labels to be skipped, the fifth and sixth labels to be skipped, and so forth.

    The default value of the SKIPHLABELS= option is the smallest value n for which tick mark labels do not collide. A specified n will be overridden to avoid collision. To reduce thinning, you can use the TURNHLABELS option.

SYMBOLLEGEND=LEGEND n

SYMBOLLEGEND=NONE

  • controls the legend for the levels of a symbol variable (see Example 18.1). You can specify SYMBOLLEGEND=LEGEND n, where n is the number of a LEGEND statement defined previously. You can specify SYMBOLLEGEND=NONE to suppress the default legend. Refer to SAS/GRAPH Software: Reference for more information on the LEGEND statement.

SYMBOLORDER=DATA | INTERNAL | FORMATTED

SYMORD=DATA | INTERNAL | FORMATTED

  • specifies the order in which symbols are assigned for levels of the symbol variable. The DATA keyword assigns symbols to values in the order in which values appear in the input data. The INTERNAL keyword assigns symbols based on sorted order of internal values of the symbol variable, and the FORMATTED keyword assigns them based on sorted formatted values. The default value is FORMATTED.

TOTPANELS= n

  • specifies the total number of panels (pages or screens) to be used to display the plot. This option overrides the NPANEL= option.

TURNHLABELS

TURNHLABEL

  • turns the major tick mark labels for the horizontal (group) axis so that they are arranged vertically. By default, labels are arranged horizontally. You should specify a software font (using the FONT= option) in conjunction with the TURNHLABELS option. Otherwise, the labels may be displayed with a mixture of hardware and software fonts.

    Note that arranging the labels vertically may leave insufficient vertical space on the panel for a plot.

VAXIS= value-list

VAXIS=AXIS n

  • specifies major tick mark values for the vertical axis of a box plot. The values must be listed in increasing order, must be evenly spaced, and must span the range of values displayed on the plot. You can specify the values with an explicit list or with an implicit list, as shown in the following example:

      proc boxplot;
         plot width*hour / vaxis=0 2 4 6 8;
         plot width*hour / vaxis=0 to 8 by 2;
      run;

    You can also specify a previously defined AXIS statement with the VAXIS= option.
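
    A sketch of the AXIS form, reusing the variables from the example above. The AXIS1 definition relies on the SAS/GRAPH AXIS statement, and the label text is illustrative.

    ```sas
    /* Define the axis once, then refer to it by number in VAXIS=. */
    axis1 order=(0 to 8 by 2) label=('Width');

    proc boxplot;
       plot width*hour / vaxis=axis1;
    run;
    ```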

VFORMAT= format

  • specifies a format to be used for displaying tick mark labels on the vertical axis of the box plot.

VMINOR= n

VM= n

  • specifies the number of minor tick marks between each major tick mark on the vertical axis. Minor tick marks are not labeled. By default, VMINOR=0.

VOFFSET= value

  • specifies the length in percent screen units of the offset at the ends of the vertical axis.

VREF= value-list

VREF= SAS-data-set

  • draws reference lines perpendicular to the vertical axis on the box plot. You can use this option in the following ways:

    • Specify the values for the lines with a VREF= list. Examples of the VREF= option follow:

        vref=20
        vref=20 40 80

    • Specify the values for the lines as the values of a numeric variable named _REF_ in a VREF= data set. Optionally, you can provide labels for the lines as values of a variable named _REFLAB_, which must be a character variable of length 16 or less. If you want distinct reference lines to be displayed in plots for different analysis variables specified in the PLOT statement, you must include a character variable named _VAR_, whose values are the names of the analysis variables. If you do not include the variable _VAR_, all of the lines are displayed in all of the plots.

      Each observation in the VREF= data set corresponds to a reference line. If BY variables are used in the input data set, the same BY variable structure must be used in the VREF= data set unless you specify the NOBYREF option.
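
      The following sketch shows one way a VREF= data set with _VAR_ might be built. It assumes the hypothetical pistons data set contains two analysis variables, diameter and weight. The target values and labels are invented for illustration, and the parenthesized list of analysis variables in the PLOT statement is assumed to be accepted.

      ```sas
      /* One reference line per analysis variable, selected through _VAR_. */
      data targets;
         length _var_ $ 8 _reflab_ $ 16;
         _var_ = 'diameter'; _ref_ = 4.3; _reflab_ = 'Diameter target'; output;
         _var_ = 'weight';   _ref_ = 12;  _reflab_ = 'Weight target';   output;
      run;

      proc boxplot data=pistons;
         plot (diameter weight)*hour / vref=targets;
      run;
      ```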

VREFLABELS= 'label1' ... 'labeln'

  • specifies labels for the reference lines requested by the VREF= option. The number of labels must equal the number of lines. Enclose each label in quotes. Labels can be up to 16 characters.

VREFLABPOS= n

  • specifies the horizontal position of the VREFLABEL= label, as described in the following table. By default, n=1.

    n   Label Position
    1   left-justified in plot area
    2   right-justified in plot area
    3   left-justified in right margin

VZERO

  • forces the origin to be included in the vertical axis for a box plot.

WAXIS= n

  • specifies the width in pixels for the axis and frame lines. By default, n=1.

WGRID= n

  • specifies the width in pixels for grid lines requested with the ENDGRID and GRID options. By default, n=1.

WOVERLAY= (value-list)

  • specifies the widths in pixels for the line segments connecting points on overlay plots. Widths in the WOVERLAY= list are matched with variables in the corresponding positions in the OVERLAY= list. By default, all overlay widths are 1.

INSET Statement

  • INSET keywords < / options > ;

You can use any number of INSET statements in the BOXPLOT procedure. Each INSET statement produces one inset and must follow a PLOT statement. The inset appears in all panels produced by the last PLOT statement preceding it. The data requested using the keywords are displayed in the order in which they are specified. Summary statistics requested with an INSET statement are calculated using the observations in all groups.

keywords

identify summary statistics or other data to be displayed in the inset. By default, inset statistics are identified with appropriate labels, and numeric values are printed using appropriate formats. However, you can provide customized labels and formats. You provide the customized label by specifying the keyword for that statistic followed by an equal sign (=) and the label in quotes. Labels can have up to 24 characters. You provide the numeric format in parentheses after the keyword. Note that if you specify both a label and a format for a statistic, the label must appear before the format. The keywords are listed in Table 18.2.

options

control the appearance of the inset. Table 18.3 lists all the options in the INSET statement. Complete descriptions for each option follow.

Table 18.2: INSET Statement Keywords

    Keyword   Description
    DATA=     (label, value) pairs from SAS-data-set
    MEAN      mean of all observations
    MIN       minimum observed value
    MAX       maximum observed value
    NMIN      minimum group size
    NMAX      maximum group size
    NOBS      number of observations in box plot
    STDDEV    pooled standard deviation

Table 18.3: INSET Options

    Option                       Description
    CFILL= color | BLANK         specifies color of inset background
    CFILLH= color                specifies color of header background
    CFRAME= color                specifies color of frame
    CHEADER= color               specifies color of header text
    CSHADOW= color               specifies color of drop shadow
    CTEXT= color                 specifies color of inset text
    DATA                         specifies data units for POSITION=(x,y) coordinates
    FONT= font                   specifies font of text
    FORMAT= format               specifies format of values in inset
    HEADER= 'quoted string'      specifies header text
    HEIGHT= value                specifies height of inset text
    NOFRAME                      suppresses frame around inset
    POSITION= position           specifies position of inset
    REFPOINT=BR | BL | TR | TL   specifies reference point of inset positioned with POSITION=(x,y) coordinates

The DATA= keyword specifies a SAS data set containing (label, value) pairs to be displayed in an inset. The data set must contain the variables _LABEL_ and _VALUE_. _LABEL_ is a character variable of length 24 or less whose values provide labels for inset entries. _VALUE_ can be character or numeric, and provides values displayed in the inset. The label and value from each observation in the DATA= data set occupy one line in the inset.
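
A sketch of the DATA= keyword in use follows. The data set plantinfo and its contents are hypothetical, as is the input data set pistons with variables diameter and hour.

```sas
/* Each observation supplies one (label, value) line in the inset. */
data plantinfo;
   length _label_ $ 24 _value_ $ 12;
   _label_ = 'Plant'; _value_ = 'Plant A'; output;
   _label_ = 'Line';  _value_ = '3';       output;
run;

proc boxplot data=pistons;
   plot diameter*hour;
   inset data=plantinfo nobs mean;
run;
```

Because entries are displayed in the order specified, the two lines from plantinfo would precede the NOBS and MEAN entries.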

The pooled standard deviation requested with the STDDEV keyword is defined as

\[
s_p = \sqrt{\frac{\sum_{i=1}^{N} s_i^2\,(n_i - 1)}{\sum_{i=1}^{N} (n_i - 1)}}
\]

where $N$ is the number of groups, $n_i$ is the size of the $i$th group, and $s_i^2$ is the variance of the $i$th group.
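
As a worked illustration, assume the usual pooling in which each group variance is weighted by its degrees of freedom, $s_p = \sqrt{\sum_i s_i^2 (n_i - 1) / \sum_i (n_i - 1)}$, and take two hypothetical groups with $n_1 = 3$, $s_1^2 = 4$ and $n_2 = 5$, $s_2^2 = 2$:

```latex
\[
s_p = \sqrt{\frac{4\,(3-1) + 2\,(5-1)}{(3-1) + (5-1)}}
    = \sqrt{\frac{16}{6}}
    \approx 1.633
\]
```

The larger group contributes more to the pooled value because its variance carries four degrees of freedom rather than two.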

Following are descriptions of the options that you can specify in the INSET statement after a slash (/).

CFILL= color | BLANK

  • specifies the color of the inset background (including the header background if you do not specify the CFILLH= option).

    If you do not specify the CFILL= option, then by default, the background is empty. This means that items that overlap the inset (such as box-and-whisker plots or reference lines) show through the inset. If you specify any value for the CFILL= option, then overlapping items no longer show through the inset. Specify CFILL=BLANK to leave the background uncolored and also to prevent items from showing through the inset.

CFILLH= color

  • specifies the color of the header background. By default, if you do not specify a CFILLH= color, the CFILL= color is used.

CFRAME= color

  • specifies the color of the frame around the inset. By default, the frame is the same color as the axis of the plot.

CHEADER= color

  • specifies the color of the header text. By default, if you do not specify a CHEADER= color, the CTEXT= color is used.

CSHADOW= color

CS= color

  • specifies the color of the drop shadow. If you do not specify the CSHADOW= option, a drop shadow is not displayed.

CTEXT= color

CT= color

  • specifies the color of the text in the inset. By default, the inset text color is the same as the other text on the box plot.

DATA

  • specifies that data coordinates are to be used in positioning the inset with the POSITION= option. The DATA option is available only when you specify POSITION=( x,y ), and it must be placed immediately after the coordinates ( x,y ). See the entry for the POSITION= option.

FONT= font

  • specifies the font of the text. By default, the font is SIMPLEX if the inset is located in the interior of the plot, and the font is the same as the other text displayed on the plot if the inset is located in the exterior of the plot.

FORMAT= format

  • specifies a format for all the values displayed in an inset. If you specify a format for a particular statistic, then this format overrides the format you specified with the FORMAT= option.

HEADER= 'string'

  • specifies the header text. The string cannot exceed 40 characters. If you do not specify the HEADER= option, no header line appears in the inset.

HEIGHT= value

  • specifies the height of the text.

NOFRAME

  • suppresses the frame drawn around the text.

POSITION= position

POS= position

  • determines the position of the inset. The position can be a compass point keyword, a margin keyword, or a pair of coordinates (x,y). You can specify coordinates in axis percent units or axis data units. For more information, see 'Positioning Insets' on page 526. By default, POSITION=NW, which positions the inset in the upper left (northwest) corner of the plot.

REFPOINT=BR | BL | TR | TL

RP=BR | BL | TR | TL

  • specifies the reference point for an inset that is positioned by a pair of coordinates with the POSITION= option. Use the REFPOINT= option with POSITION= coordinates. The REFPOINT= option specifies which corner of the inset frame you want positioned at coordinates ( x, y ). The keywords BL, BR, TL, and TR represent bottom left, bottom right, top left, and top right, respectively. The default is REFPOINT=BL.

    If you specify the position of the inset as a compass point or margin keyword, the REFPOINT= option is ignored.
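
The following sketch pulls several INSET options together. It assumes the hypothetical pistons data set; the labels, the 6.3 format, and the coordinates are illustrative, and the coordinates are taken to be in axis percent units because the DATA option is not specified.

```sas
proc boxplot data=pistons;
   plot diameter*hour;
   /* For MEAN the label precedes the format; STDDEV keeps its default label. */
   inset nobs mean='Overall mean' (6.3) stddev /
         header='Summary'
         position=(5,95) refpoint=tl
         cfill=blank;
run;
```

Here REFPOINT=TL asks for the top left corner of the inset frame to be placed at the given coordinates, and CFILL=BLANK keeps boxes and reference lines from showing through the inset.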

INSETGROUP Statement

  • INSETGROUP keywords < / options > ;

The INSETGROUP statement displays statistics associated with individual groups on the box plot produced by the last PLOT statement preceding it. No more than two INSETGROUP statements can be associated with a given PLOT statement: one above the box plot and one below it. The data requested using the keywords are displayed in the order in which they are specified.

keywords

identify summary statistics to be displayed in the insets. By default, inset statistics are identified with appropriate labels, and numeric values are printed using appropriate formats. However, you can provide customized labels and formats. You provide the customized label by specifying the keyword for that statistic followed by an equal sign (=) and the label in quotes. Labels can have up to 24 characters. You provide the numeric format in parentheses after the keyword. Note that if you specify both a label and a format for a statistic, the label must appear before the format. The keywords are listed in Table 18.4.

options

control the appearance of the insets. Table 18.5 lists all the options in the INSETGROUP statement. Complete descriptions for each option follow.

Table 18.4: INSETGROUP Statement Keywords

    Keyword   Description
    MEAN      group mean
    MIN       minimum value in group
    MAX       maximum value in group
    N         number of observations in group
    NHIGH     number of outliers above upper fence
    NLOW      number of outliers below lower fence
    NOUT      total number of outliers
    Q1        first quartile
    Q2        second quartile
    Q3        third quartile
    RANGE     range of group values
    STDDEV    group standard deviation

Table 18.5: INSETGROUP Options

    Option                    Description
    CFILL= color | BLANK      specifies color of inset background
    CFILLH= color             specifies color of header background
    CFRAME= color             specifies color of frame
    CHEADER= color            specifies color of header text
    CTEXT= color              specifies color of inset text
    FONT= font                specifies font of text
    FORMAT= format            specifies format of values in inset
    HEADER= 'quoted string'   specifies header text
    HEIGHT= value             specifies height of inset text
    NOFRAME                   suppresses frame around inset
    POSITION= position        specifies position of inset

Table 18.5 lists all options in the INSETGROUP statement.

Following are descriptions of the options that you can specify in the INSETGROUP statement after a slash (/).

CFILL= color

  • specifies the color of the inset background (including the header background if you do not specify the CFILLH= option). If you do not specify the CFILL= option, then by default, the background is empty.

CFILLH= color

  • specifies the color of the header background. By default, if you do not specify a CFILLH= color, the CFILL= color is used.

CFRAME= color

  • specifies the color of the frame around the inset. By default, the frame is the same color as the axis of the plot.

CHEADER= color

  • specifies the color of the header text. By default, if you do not specify a CHEADER= color, the CTEXT= color is used.

CTEXT= color

CT= color

  • specifies the color of the inset text. By default, the inset text color is the same as the other text on the plot.

FONT= font

  • specifies the font of the inset text. By default, the font is SIMPLEX.

FORMAT= format

  • specifies a format for all the values displayed in an inset. If you specify a format for a particular statistic, then this format overrides the format you specified with the FORMAT= option.

HEADER= 'string'

  • specifies the header text. The string cannot exceed 40 characters. If you do not specify the HEADER= option, no header line appears in the inset.

HEIGHT= value

  • specifies the height of the text.

NOFRAME

  • suppresses the frame drawn around the text.

POSITION= position

POS= position

  • determines the position of the inset. Valid positions are TOP, TOPOFF, AXIS, and BOTTOM. By default, POSITION=TOP.

    Position Keyword    Description
    TOP                 top of plot, immediately above axis frame
    TOPOFF              top of plot, offset from axis frame
    AXIS                bottom of plot, immediately above horizontal axis
    BOTTOM              bottom of plot, below horizontal axis label
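As an illustration of the INSETGROUP options described above, the following is a minimal sketch rather than an example from this guide. It assumes the SASHELP.CLASS sample data set is available, that MIN, MEAN, and MAX are valid INSETGROUP keywords, and that traditional graphics output is in use, since color options such as CFILL= may be ignored under ODS Graphics. The data set name work.class_sorted and the color LIGR are illustrative choices.

```sas
/* Group observations must be contiguous, so sort by the group variable first. */
proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

proc boxplot data=work.class_sorted;
   plot Height*Sex;
   /* Show per-group statistics in an inset offset above the axis frame. */
   insetgroup min mean max /
      header   = 'Height by Sex'   /* header text, at most 40 characters */
      format   = 5.1               /* applied to every value in the inset */
      position = topoff            /* one of TOP, TOPOFF, AXIS, BOTTOM */
      cfill    = ligr;             /* light gray background (assumed color) */
run;
```

Because CFILLH= is not specified here, the header background would take the CFILL= color, as described for the CFILLH= option.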

BY Statement

  • BY variables ;

You can specify a BY statement with PROC BOXPLOT to obtain separate box plots for each group defined by the levels of the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables.

If your input data set is not sorted in ascending order, use one of the following alternatives:

  • Sort the data using the SORT procedure with a similar BY statement.

  • Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the BOXPLOT procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

  • Create an index on the BY variables using the DATASETS procedure.

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
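The sorting alternative can be sketched as follows. This is an illustrative example, not one from this guide: it assumes the SASHELP.CLASS sample data set, and the output data set name work.class_by is hypothetical. The data are sorted by the BY variable and then by the group variable, so that each BY group contains contiguous groups for the box plot.

```sas
/* Sort by the BY variable (Sex), then by the group variable (Age). */
proc sort data=sashelp.class out=work.class_by;
   by Sex Age;
run;

/* One box plot of Height by Age is produced for each value of Sex. */
proc boxplot data=work.class_by;
   by Sex;
   plot Height*Age;
run;
```

If the observations were already arranged in BY groups that are not in ascending order, the BY statement could instead be written as `by Sex notsorted;` and the PROC SORT step omitted.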

ID Statement

  • ID variables ;

The ID statement specifies variables used to identify observations. The ID variables must be variables in the input data set.

If you specify one of the keywords SCHEMATICID or SCHEMATICIDFAR with the BOXSTYLE= option, the value of an ID variable is used to label each extreme observation. When you specify a BOX= data set, the label values come from the variable _ID_, if it is present in the data set. When you specify a DATA= or HISTORY= input data set, or a BOX= data set that does not contain the variable _ID_, the labels come from the first variable listed in the ID statement. If there is no ID statement, the outliers are not labeled.
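The labeling behavior for a DATA= data set can be sketched as follows. This is a hedged illustration that assumes the SASHELP.CLASS sample data set, where Name identifies each observation; the data set name work.class_sorted is hypothetical. Labels appear only if some observations actually fall outside the whiskers.

```sas
/* Sort so that each group's observations are contiguous. */
proc sort data=sashelp.class out=work.class_sorted;
   by Sex;
run;

proc boxplot data=work.class_sorted;
   /* SCHEMATICID requests schematic box plots with labeled outliers. */
   plot Weight*Sex / boxstyle=schematicid;
   /* The first ID variable supplies the outlier labels. */
   id Name;
run;
```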




SAS.STAT 9.1 Users Guide (Vol. 1)
SAS/STAT 9.1 Users Guide, Volumes 1-7
ISBN: 1590472438
EAN: 2147483647
Year: 2004
Pages: 156