Syntax


PROC UNIVARIATE < options > ;

  • BY variables ;

  • CLASS variable-1 < (v-options) > < variable-2 < (v-options) > >
    < / KEYLEVEL= value1 | ( value1 value2 ) > ;

  • FREQ variable ;

  • HISTOGRAM < variables > < /options > ;

  • ID variables ;

  • INSET keyword-list < /options > ;

  • OUTPUT < OUT= SAS-data-set >
    < keyword1=names ... keywordk=names > < percentile-options > ;

  • PROBPLOT < variables > < /options > ;

  • QQPLOT < variables > < /options > ;

  • VAR variables ;

  • WEIGHT variable ;

The PROC UNIVARIATE statement invokes the procedure. The VAR statement specifies the numeric variables to be analyzed, and it is required if the OUTPUT statement is used to save summary statistics in an output data set. If you do not use the VAR statement, all numeric variables in the data set are analyzed.

The plot statements HISTOGRAM, PROBPLOT, and QQPLOT create graphical displays, and the INSET statement enhances these displays by adding a table of summary statistics directly on the graph. You can specify one or more of each of the plot statements, the INSET statement, and the OUTPUT statement. If you use a VAR statement, the variables listed in a plot statement must be a subset of the variables listed in the VAR statement.

You can use a CLASS statement to specify one or two variables that group the data into classification levels. The analysis is carried out for each combination of levels, and you can use the CLASS statement with a plot statement to create a comparative display.

You can specify a BY statement to obtain a separate analysis for each BY group. The FREQ statement specifies a variable whose values provide the frequency for each observation. The WEIGHT statement specifies a variable whose values are used to weight certain statistics. The ID statement specifies one or more variables to identify the extreme observations.
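
As a minimal sketch of how these statements fit together, the following program analyzes one variable across the levels of one class variable, adds a fitted normal curve and an inset to the histogram, and saves summary statistics. The data set Parts, the variables Machine and Thickness, the data values, and the output names PartStats, AvgThick, and SdThick are hypothetical and are used for illustration only.

      /* Hypothetical data: thickness measurements from two machines */
      data Parts;
         input Machine $ Thickness @@;
         datalines;
      A 3.46 A 3.51 A 3.49 A 3.55 A 3.44
      B 3.60 B 3.58 B 3.63 B 3.57 B 3.61
      ;

      proc univariate data=Parts;
         class Machine;                      /* one analysis per level of Machine   */
         var Thickness;                      /* required because OUTPUT is used     */
         histogram Thickness / normal;       /* must be a subset of the VAR list    */
         inset n mean std / position=ne;     /* applies to the preceding HISTOGRAM  */
         output out=PartStats mean=AvgThick std=SdThick;
      run;

Note that the INSET statement follows the plot statement that it enhances, and that the variable in the HISTOGRAM statement also appears in the VAR statement.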

PROC UNIVARIATE Statement

  • PROC UNIVARIATE < options > ;

The PROC UNIVARIATE statement is required to invoke the UNIVARIATE procedure. You can use the PROC UNIVARIATE statement by itself to request a variety of statistics for summarizing the data distribution of each analysis variable:

  • sample moments

  • basic measures of location and variability

  • confidence intervals for the mean, standard deviation, and variance

  • tests for location

  • tests for normality

  • trimmed and Winsorized means

  • robust estimates of scale

  • quantiles and related confidence intervals

  • extreme observations and extreme values

  • frequency counts for observations

  • missing values

In addition, you can use options in the PROC UNIVARIATE statement to

  • specify the input data set to be analyzed

  • specify a graphics catalog for saving graphical output

  • specify rounding units for variable values

  • specify the definition used to calculate percentiles

  • specify the divisor used to calculate variances and standard deviations

  • request that plots be produced on line printers and define special printing characters used for features

  • suppress tables

The following are the options that can be used with the PROC UNIVARIATE statement:

ALL

  • requests all statistics and tables that the FREQ, MODES, NEXTRVAL=5, PLOT, and CIBASIC options generate. If the analysis variables are not weighted, this option also requests the statistics and tables generated by the CIPCTLDF, CIPCTLNORMAL, LOCCOUNT, NORMAL, ROBUSTSCALE, TRIMMED=.25, and WINSORIZED=.25 options. PROC UNIVARIATE also uses any values that you specify for ALPHA=, MU0=, NEXTRVAL=, CIBASIC, CIPCTLDF, CIPCTLNORMAL, TRIMMED=, or WINSORIZED= to produce the output.

ALPHA= α

  • specifies the level of significance α for 100(1 − α)% confidence intervals. The value α must be between 0 and 1; the default value is 0.05, which results in 95% confidence intervals.

  • Note that specialized ALPHA= options are available for a number of confidence interval options. For example, you can specify CIBASIC( ALPHA=0.10 ) to request a table of basic confidence limits at the 90% level. The default values of these options are the value of the ALPHA= option in the PROC statement.

ANNOTATE= SAS-data-set

ANNO= SAS-data-set

  • specifies an input data set that contains annotate variables as described in SAS/GRAPH Reference. You can use this data set to add features to your high-resolution graphics. PROC UNIVARIATE adds the features in this data set to every high-resolution graph that is produced in the procedure. PROC UNIVARIATE does not use the ANNOTATE= data set unless you create a high-resolution graph with the HISTOGRAM, PROBPLOT, or QQPLOT statement. Use the ANNOTATE= option in the HISTOGRAM, PROBPLOT, or QQPLOT statement if you want to add a feature to a specific graph produced by the statement.

CIBASIC < ( < TYPE= keyword > < ALPHA= α > ) >

  • requests confidence limits for the mean, standard deviation, and variance based on the assumption that the data are normally distributed. If you use the CIBASIC option, you must use the default value of VARDEF=, which is DF.

  • TYPE= keyword

    • specifies the type of confidence limit, where keyword is LOWER, UPPER, or TWOSIDED. The default value is TWOSIDED.

  • ALPHA= α

    • specifies the level of significance α for 100(1 − α)% confidence intervals. The value α must be between 0 and 1. The default is the value of ALPHA= given in the PROC statement; if that option is also omitted, the default is 0.05, which results in 95% confidence intervals.

CIPCTLDF < ( < TYPE= keyword > < ALPHA= α > ) >

CIQUANTDF < ( < TYPE= keyword > < ALPHA= α > ) >

  • requests confidence limits for quantiles based on a method that is distribution-free. In other words, no specific parametric distribution such as the normal is assumed for the data. PROC UNIVARIATE uses order statistics (ranks) to compute the confidence limits as described by Hahn and Meeker (1991). This option does not apply if you use a WEIGHT statement.

  • TYPE= keyword

    • specifies the type of confidence limit, where keyword is LOWER, UPPER, SYMMETRIC, or ASYMMETRIC. The default value is SYMMETRIC.

  • ALPHA= α

    • specifies the level of significance α for 100(1 − α)% confidence intervals. The value α must be between 0 and 1. The default is the value of ALPHA= given in the PROC statement; if that option is also omitted, the default is 0.05, which results in 95% confidence intervals.

CIPCTLNORMAL < ( < TYPE= keyword > < ALPHA= α > ) >

CIQUANTNORMAL < ( < TYPE= keyword > < ALPHA= α > ) >

  • requests confidence limits for quantiles based on the assumption that the data are normally distributed. The computational method is described in Section 4.4.1 of Hahn and Meeker (1991) and uses the noncentral t distribution as given by Odeh and Owen (1980). This option does not apply if you use a WEIGHT statement.

  • TYPE= keyword

    • specifies the type of confidence limit, where keyword is LOWER, UPPER, or TWOSIDED. The default is TWOSIDED.

  • ALPHA= α

    • specifies the level of significance α for 100(1 − α)% confidence intervals. The value α must be between 0 and 1. The default is the value of ALPHA= given in the PROC statement; if that option is also omitted, the default is 0.05, which results in 95% confidence intervals.
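
    The interaction between the ALPHA= option in the PROC statement and the specialized ALPHA= options can be sketched as follows. The data set Parts, the variable Thickness, and the data values are hypothetical. In this sketch, ALPHA=0.01 in the PROC statement sets the level for every confidence interval option that does not specify its own ALPHA=, while CIBASIC overrides it with 0.10.

      /* Hypothetical data */
      data Parts;
         input Thickness @@;
         datalines;
      3.46 3.51 3.49 3.55 3.44 3.60 3.58 3.63 3.57 3.61
      ;

      proc univariate data=Parts
                      alpha=0.01             /* default level: 99% limits          */
                      cibasic(alpha=0.10)    /* basic limits at the 90% level      */
                      cipctldf(type=lower)   /* distribution-free, inherits 0.01   */
                      cipctlnormal;          /* normal-based, inherits 0.01        */
         var Thickness;
      run;

    No WEIGHT statement is used, because the quantile confidence limit options do not apply to weighted analyses.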

DATA= SAS-data-set

  • specifies the input SAS data set to be analyzed. If the DATA= option is omitted, the procedure uses the most recently created SAS data set.

EXCLNPWGT

  • excludes observations with nonpositive weight values (zero or negative) from the analysis. By default, PROC UNIVARIATE treats observations with negative weights like those with zero weights and counts them in the total number of observations. This option applies only when you use a WEIGHT statement.

FREQ

  • requests a frequency table that consists of the variable values, frequencies, cell percentages, and cumulative percentages.

    If you specify the WEIGHT statement, PROC UNIVARIATE includes the weighted count in the table and uses this value to compute the percentages.

GOUT= graphics-catalog

  • specifies the SAS catalog that PROC UNIVARIATE uses to save high-resolution graphics output. If you omit the libref in the name of the graphics-catalog, PROC UNIVARIATE looks for the catalog in the temporary library called WORK and creates the catalog if it does not exist.

LOCCOUNT

  • requests a table that shows the number of observations greater than, not equal to, and less than the value of MU0=. PROC UNIVARIATE uses these values to construct the sign test and the signed rank test. This option does not apply if you use a WEIGHT statement.

MODES | MODE

  • requests a table of all possible modes. By default, when the data contain multiple modes, PROC UNIVARIATE displays the lowest mode in the table of basic statistical measures. When all the values are unique, PROC UNIVARIATE does not produce a table of modes.

MU0= values

LOCATION= values

  • specifies the value of the mean or location parameter (μ0) in the null hypothesis for tests of location summarized in the table labeled Tests for Location: Mu0=value. If you specify one value, PROC UNIVARIATE tests the same null hypothesis for all analysis variables. If you specify multiple values, a VAR statement is required, and PROC UNIVARIATE tests a different null hypothesis for each analysis variable in the corresponding order. The default value is 0.

    The following statement tests the hypothesis μ0 = 0 for the first variable and the hypothesis μ0 = 0.5 for the second variable.

      proc univariate mu0=0 0.5;  
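
    Because multiple MU0= values require a VAR statement, a complete step might look like the following sketch. The data set Shifts, the variables Drift and Offset, and the data values are hypothetical. The LOCCOUNT option is added only to show the counts on which the sign test and signed rank test are based.

      /* Hypothetical data: two measurements per unit */
      data Shifts;
         input Drift Offset @@;
         datalines;
      0.12 0.48  -0.05 0.61  0.03 0.55  -0.10 0.43  0.08 0.52
      0.01 0.47  -0.02 0.58  0.06 0.49
      ;

      proc univariate data=Shifts mu0=0 0.5 loccount;
         var Drift Offset;   /* Drift is tested against 0, Offset against 0.5 */
      run;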

NEXTROBS= n

  • specifies the number of extreme observations that PROC UNIVARIATE lists in the table of extreme observations. The table lists the n lowest observations and the n highest observations. The default value is 5, and n can range between 0 and half the maximum number of observations. You can specify NEXTROBS=0 to suppress the table of extreme observations.

NEXTRVAL= n

  • specifies the number of extreme values that PROC UNIVARIATE lists in the table of extreme values. The table lists the n lowest unique values and the n highest unique values. The value of n can range between 0 and half the maximum number of observations. By default, n = 0 and no table is displayed.
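
    As a brief sketch, the following step requests both tables and uses an ID statement so that the extreme observations are labeled. The data set Parts, the variables Serial and Thickness, and the data values are hypothetical.

      /* Hypothetical data: a serial number identifies each part */
      data Parts;
         input Serial $ Thickness @@;
         datalines;
      P01 3.46 P02 3.51 P03 3.49 P04 3.55 P05 3.44
      P06 3.60 P07 3.58 P08 3.63 P09 3.57 P10 3.61
      ;

      proc univariate data=Parts nextrobs=3 nextrval=2;
         var Thickness;
         id Serial;          /* identifies the extreme observations */
      run;

    With 10 observations, both values are within the allowed range of 0 to half the number of observations.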

NOBYPLOT

  • suppresses side-by-side box plots that are created by default when you use the BY statement and the ALL option or the PLOT option in the PROC statement.

NOPRINT

  • suppresses all the tables of descriptive statistics that the PROC UNIVARIATE statement creates. NOPRINT does not suppress the tables that the HISTOGRAM statement creates. You can use the NOPRINT option in the HISTOGRAM statement to suppress the creation of its tables. Use NOPRINT when you want to create an OUT= output data set only.
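
    A common use of NOPRINT is to build an output data set without any displayed tables. The following sketch assumes a hypothetical data set Parts with a variable Thickness; the output data set name ThickStats and the output variable names are also hypothetical.

      /* Hypothetical data */
      data Parts;
         input Thickness @@;
         datalines;
      3.46 3.51 3.49 3.55 3.44 3.60 3.58 3.63 3.57 3.61
      ;

      proc univariate data=Parts noprint;
         var Thickness;                        /* required with OUTPUT         */
         output out=ThickStats
                n=NObs mean=Mean std=Std       /* keyword=name pairs           */
                pctlpts=5 95 pctlpre=P;        /* percentile-options: P5, P95  */
      run;

      proc print data=ThickStats;
      run;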

NORMAL

NORMALTEST

  • requests tests for normality that include a series of goodness-of-fit tests based on the empirical distribution function. The table provides test statistics and p-values for the Shapiro-Wilk test (provided the sample size is less than or equal to 2000), the Kolmogorov-Smirnov test, the Anderson-Darling test, and the Cramér-von Mises test. This option does not apply if you use a WEIGHT statement.

PCTLDEF= value

DEF= value

  • specifies the definition that PROC UNIVARIATE uses to calculate quantiles. The default value is 5. Values can be 1, 2, 3, 4, or 5. You cannot use PCTLDEF= when you compute weighted quantiles. See the section Calculating Percentiles for details on quantile definitions.

PLOTS | PLOT

  • produces a stem-and-leaf plot (or a horizontal bar chart), a box plot, and a normal probability plot in line printer output. If you use a BY statement, side-by-side box plots that are labeled Schematic Plots appear after the univariate analysis for the last BY group.

PLOTSIZE= n

  • specifies the approximate number of rows used in line-printer plots requested with the PLOTS option. If n is larger than the value of the SAS system option PAGESIZE=, PROC UNIVARIATE uses the value of PAGESIZE=. If n is less than 8, PROC UNIVARIATE uses eight rows to draw the plots.

ROBUSTSCALE

  • produces a table with robust estimates of scale. The statistics include the interquartile range, Gini's mean difference, the median absolute deviation about the median (MAD), and two statistics proposed by Rousseeuw and Croux (1993), Qn and Sn. This option does not apply if you use a WEIGHT statement.

ROUND= units

  • specifies the units to use to round the analysis variables prior to computing statistics. If you specify one unit, PROC UNIVARIATE uses this unit to round all analysis variables. If you specify multiple units, a VAR statement is required, and each unit rounds the values of the corresponding analysis variable. If ROUND=0, no rounding occurs. The ROUND= option reduces the number of unique variable values, thereby reducing memory requirements for the procedure. For example, to make the rounding unit 1 for the first analysis variable and 0.5 for the second analysis variable, submit the statement

      proc univariate round=1 0.5;
         var yldstren tenstren;
      run;

  • When a variable value is midway between the two nearest rounded points, the value is rounded to the nearest even multiple of the roundoff value. For example, with a roundoff value of 1, the variable values of −2.5, −2.2, and −1.5 are rounded to −2; the values of −0.5, 0.2, and 0.5 are rounded to 0; and the values of 0.6, 1.2, and 1.4 are rounded to 1.

TRIMMED= values < ( < TYPE= keyword > < ALPHA= α > ) >

TRIM= values < ( < TYPE= keyword > < ALPHA= α > ) >

  • requests a table of trimmed means, where value specifies the number or the proportion of observations that PROC UNIVARIATE trims. If the value is the number n of trimmed observations, n must be between 0 and half the number of nonmissing observations. If value is a proportion p between 0 and 1/2, the number of observations that PROC UNIVARIATE trims is the smallest integer that is greater than or equal to np, where n is the number of observations. To include confidence limits for the mean and the Student's t test in the table, you must use the default value of VARDEF=, which is DF. For details concerning the computation of trimmed means, see the section Trimmed Means.

  • TYPE= keyword

    • specifies the type of confidence limit for the mean, where keyword is LOWER, UPPER, or TWOSIDED. The default value is TWOSIDED.

  • ALPHA= α

    • specifies the level of significance α for 100(1 − α)% confidence intervals. The value α must be between 0 and 1; the default value is 0.05, which results in 95% confidence intervals.

  • This option does not apply if you use a WEIGHT statement.

VARDEF= divisor

  • specifies the divisor to use in the calculation of variances and standard deviations. By default, VARDEF=DF. The following table shows the possible values for divisor and associated divisors.

    Table 3.1: Possible Values for VARDEF=

    Value           Divisor                     Formula for Divisor
    DF              Degrees of freedom          $n - 1$
    N               Number of observations      $n$
    WDF             Sum of weights minus one    $(\sum_i w_i) - 1$
    WEIGHT | WGT    Sum of weights              $\sum_i w_i$

  • The procedure computes the variance as $CSS/d$, where $d$ is the divisor and $CSS$ is the corrected sum of squares, which equals $\sum_{i=1}^{n} (x_i - \bar{x})^2$. When you weight the analysis variables, $CSS = \sum_{i=1}^{n} w_i (x_i - \bar{x}_w)^2$, where $\bar{x}_w$ is the weighted mean.

  • The default value is DF. To compute the standard error of the mean, confidence limits, and Student's t test, use the default value of VARDEF=.

  • When you use the WEIGHT statement and VARDEF=DF, the variance is an estimate of $\sigma^2$, where the variance of the ith observation is $\mathrm{var}(x_i) = \sigma^2 / w_i$ and $w_i$ is the weight for the ith observation. This yields an estimate of the variance of an observation with unit weight.

  • When you use the WEIGHT statement and VARDEF=WGT, the computed variance is asymptotically (for large n) an estimate of $\sigma^2 / \bar{w}$, where $\bar{w}$ is the average weight. This yields an asymptotic estimate of the variance of an observation with average weight.
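
    A weighted analysis with a nondefault divisor can be sketched as follows. The data set Assay, the variables Conc and Wt, and the data values are hypothetical. With VARDEF=WGT the divisor is the sum of the weights, so the table of basic confidence limits and the Student's t test, which require VARDEF=DF, are not part of what this step is meant to show.

      /* Hypothetical data: each concentration carries a weight */
      data Assay;
         input Conc Wt @@;
         datalines;
      10.2 1  9.8 2  10.5 1  10.1 3  9.9 2
      ;

      proc univariate data=Assay vardef=wgt;
         var Conc;
         weight Wt;          /* weights enter the weighted mean and CSS */
      run;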

WINSORIZED= values < ( < TYPE= keyword > < ALPHA= α > ) >

WINSOR= values < ( < TYPE= keyword > < ALPHA= α > ) >

  • requests a table of Winsorized means, where value is the number or the proportion of observations that PROC UNIVARIATE uses to compute the Winsorized mean. If the value is the number n of Winsorized observations, n must be between 0 and half the number of nonmissing observations. If value is a proportion p between 0 and 1/2, the number of observations that PROC UNIVARIATE uses is equal to the smallest integer that is greater than or equal to np, where n is the number of observations. To include confidence limits for the mean and the Student's t test in the table, you must use the default value of VARDEF=, which is DF. For details concerning the computation of Winsorized means, see the section Winsorized Means.

  • TYPE= keyword

    • specifies the type of confidence limit for the mean, where keyword is LOWER, UPPER, or TWOSIDED. The default is TWOSIDED.

  • ALPHA= α

    • specifies the level of significance α for 100(1 − α)% confidence intervals. The value α must be between 0 and 1; the default value is 0.05, which results in 95% confidence intervals.

  • This option does not apply if you use a WEIGHT statement.
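
    The robust options can be combined in one PROC statement, as in the following sketch. The data set Parts, the variable Thickness, and the data values are hypothetical. With 10 nonmissing observations, TRIMMED=1 trims one observation from each tail, and the proportion 0.25 corresponds to the smallest integer greater than or equal to 10 × 0.25 = 2.5, that is, 3 observations.

      /* Hypothetical data with one unusually large value */
      data Parts;
         input Thickness @@;
         datalines;
      3.46 3.51 3.49 3.55 3.44 3.60 3.58 3.63 3.57 4.90
      ;

      proc univariate data=Parts
                      trimmed=1 0.25      /* a count and a proportion            */
                      winsorized=0.25     /* a proportion                        */
                      robustscale;        /* robust estimates of scale           */
         var Thickness;                   /* no WEIGHT statement is used         */
      run;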

BY Statement

  • BY variables ;

You can specify a BY statement with PROC UNIVARIATE to obtain separate analyses for each BY group. The BY statement specifies the variables that the procedure uses to form BY groups. You can specify more than one variable. If you do not use the NOTSORTED option in the BY statement, the observations in the data set must either be sorted by all the variables that you specify or they must be indexed appropriately.

DESCENDING

  • specifies that the data set is sorted in descending order by the variable that immediately follows the word DESCENDING in the BY statement.

NOTSORTED

  • specifies that observations are not necessarily sorted in alphabetic or numeric order. The data are grouped in another way, for example, chronological order.

    The requirement for ordering or indexing observations according to the values of BY variables is suspended for BY-group processing when you use the NOTSORTED option. In fact, the procedure does not use an index if you specify NOTSORTED. The procedure defines a BY group as a set of contiguous observations that have the same values for all BY variables. If observations with the same values for the BY variables are not contiguous, the procedure treats each contiguous set as a separate BY group.
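
As a minimal sketch of BY-group processing, the following steps sort a data set in descending order of the BY variable and then analyze each group. The data set Parts, the variables Machine and Thickness, the data values, and the sorted data set name PartsSorted are hypothetical.

      /* Hypothetical data, not sorted by Machine */
      data Parts;
         input Machine $ Thickness @@;
         datalines;
      B 3.60 A 3.46 B 3.58 A 3.51 A 3.49 B 3.63 A 3.55 B 3.57
      ;

      /* Sort first, because NOTSORTED is not used in the BY statement below */
      proc sort data=Parts out=PartsSorted;
         by descending Machine;
      run;

      proc univariate data=PartsSorted;
         by descending Machine;   /* must match the sort order */
         var Thickness;
      run;

If the unsorted data set Parts were analyzed with BY Machine NOTSORTED instead, each contiguous run of observations with the same Machine value would be treated as a separate BY group.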

CLASS Statement

  • CLASS variable-1 < (v-options) > < variable-2 < (v-options) > >
    < / KEYLEVEL= value1 | ( value1 value2 ) > ;

The CLASS statement specifies one or two variables that the procedure uses to group the data into classification levels. Variables in a CLASS statement are referred to as class variables. Class variables can be numeric or character. Class variables can have floating point values, but they typically have a few discrete values that define levels of the variable. You do not have to sort the data by class variables. PROC UNIVARIATE uses the formatted values of the class variables to determine the classification levels.

You can specify the following v-options enclosed in parentheses after the class variable:

MISSING

  • specifies that missing values for the CLASS variable are to be treated as valid classification levels. Special missing values that represent numeric values (the letters A through Z and the underscore (_) character) are each considered as a separate value. If you omit MISSING, PROC UNIVARIATE excludes the observations with a missing class variable value from the analysis. Enclose this option in parentheses after the class variable.

ORDER=DATA | FORMATTED | FREQ | INTERNAL

  • specifies the display order for the class variable values. The default value is INTERNAL. You can specify the following values with the ORDER= option:

  • DATA

    • orders values according to their order in the input data set. When you use a HISTOGRAM, PROBPLOT, or QQPLOT statement, PROC UNIVARIATE displays the rows (columns) of the comparative plot from top to bottom (left to right) in the order that the class variable values first appear in the input data set.

  • FORMATTED

    • orders values by their ascending formatted values. This order may depend on your operating environment. When you use a HISTOGRAM, PROBPLOT, or QQPLOT statement, PROC UNIVARIATE displays the rows (columns) of the comparative plot from top to bottom (left to right) in increasing order of the formatted class variable values. For example, suppose a numeric class variable DAY (with values 1, 2, and 3) has a user-defined format that assigns Wednesday to the value 1, Thursday to the value 2, and Friday to the value 3. The rows of the comparative plot will appear in alphabetical order (Friday, Thursday, Wednesday) from top to bottom.

    • If there are two or more distinct internal values with the same formatted value, then PROC UNIVARIATE determines the order by the internal value that occurs first in the input data set. For numerical variables without an explicit format, the levels are ordered by their internal values.

  • FREQ

    • orders values by descending frequency count so that levels with the most observations are listed first. If two or more values have the same frequency count, PROC UNIVARIATE uses the formatted values to determine the order.

    • When you use a HISTOGRAM, PROBPLOT, or QQPLOT statement, PROC UNIVARIATE displays the rows (columns) of the comparative plot from top to bottom (left to right) in order of decreasing frequency count for the class variable values.

  • INTERNAL

    • orders values by their unformatted values, which yields the same order as PROC SORT. This order may depend on your operating environment.

    • When you use a HISTOGRAM, PROBPLOT, or QQPLOT statement, PROC UNIVARIATE displays the rows (columns) of the comparative plot from top to bottom (left to right) in increasing order of the internal (unformatted) values of the class variable. The first class variable is used to label the rows of the comparative plots (top to bottom). The second class variable is used to label the columns of the comparative plots (left to right). For example, suppose a numeric class variable DAY (with values 1, 2, and 3) has a user-defined format that assigns Wednesday to the value 1, Thursday to the value 2, and Friday to the value 3. The rows of the comparative plot will appear in day-of-the-week order (Wednesday, Thursday, Friday) from top to bottom.

  • You can specify the following option after the slash (/) in the CLASS statement.

KEYLEVEL= value1 | ( value1 value2 )

  • specifies the key cell in a comparative plot. PROC UNIVARIATE first determines the bin size and midpoints for the key cell, and then extends the midpoint list to accommodate the data ranges for the remaining cells. Thus, the choice of the key cell determines the uniform horizontal axis that PROC UNIVARIATE uses for all cells. If you specify only one class variable and use a HISTOGRAM statement, KEYLEVEL= value identifies the key cell as the level for which variable is equal to value. By default, PROC UNIVARIATE sorts the levels in the order that is determined by the ORDER= option. Then, the key cell is the first occurrence of a level in this order. The cells display in order from top to bottom or left to right. Consequently, the key cell appears at the top (or left). When you specify a different key cell with the KEYLEVEL= option, this cell appears at the top (or left).

  • Likewise, with the PROBPLOT and QQPLOT statements, the key cell determines uniform axis scaling. If you specify two class variables, use KEYLEVEL= ( value1 value2 ) to identify the key cell as the level for which variable-n is equal to value-n.

  • By default, PROC UNIVARIATE sorts the levels of the first CLASS variable in the order that is determined by its ORDER= option and, within each of these levels, it sorts the levels of the second CLASS variable in the order that is determined by its ORDER= option. Then, the default key cell is the first occurrence of a combination of levels for the two variables in this order. The cells display in the order of the first CLASS variable from top to bottom and in the order of the second CLASS variable from left to right. Consequently, the default key cell appears at the upper left corner.

  • When you specify a different key cell with the KEYLEVEL= option, this cell appears at the upper left corner.

  • The length of the KEYLEVEL= value cannot exceed 16 characters and you must specify a formatted value.

  • The KEYLEVEL= option does not apply unless you specify a HISTOGRAM, PROBPLOT, or QQPLOT statement.
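
    The DAY example described under ORDER= can be sketched as a comparative histogram with an explicit key cell. The format name dayfmt, the data set Calls, the variable Minutes, and the data values are hypothetical, and the quoted form of the KEYLEVEL= value is an assumption about how a formatted value is written; the NROWS= option is likewise assumed to be available in the HISTOGRAM statement.

      /* Hypothetical user-defined format for the class variable */
      proc format;
         value dayfmt 1='Wednesday' 2='Thursday' 3='Friday';
      run;

      /* Hypothetical data */
      data Calls;
         input Day Minutes @@;
         format Day dayfmt.;
         datalines;
      1 4.2 1 5.1 1 3.8 2 6.0 2 5.5 2 6.3 3 4.9 3 5.2 3 4.4
      ;

      proc univariate data=Calls noprint;
         /* v-option in parentheses after the variable; KEYLEVEL= after the slash */
         class Day (order=internal) / keylevel='Thursday';
         histogram Minutes / nrows=3;     /* one row per level of Day */
      run;

    The KEYLEVEL= value is given as a formatted value of at most 16 characters, as required above.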

FREQ Statement

  • FREQ variable ;

The FREQ statement specifies a numeric variable whose value represents the frequency of the observation. If you use the FREQ statement, the procedure assumes that each observation represents n observations, where n is the value of variable. If the value of variable is not an integer, SAS truncates it. If the value is less than 1 or is missing, the procedure excludes that observation from the analysis. See Example 3.6.

Note: The FREQ statement affects the degrees of freedom, but the WEIGHT statement does not.
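
As a minimal sketch, the following step analyzes data that are stored as one row per distinct value together with a count. The data set Defects, the variables Count and Lots, and the data values are hypothetical.

      /* Hypothetical data: Lots is the number of lots with Count defects */
      data Defects;
         input Count Lots @@;
         datalines;
      0 12  1 7  2 4  3 2  4 1
      ;

      proc univariate data=Defects;
         var Count;
         freq Lots;    /* each row represents Lots observations */
      run;

The five rows are analyzed as if the data set contained 12 + 7 + 4 + 2 + 1 = 26 observations.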

HISTOGRAM Statement

  • HISTOGRAM < variables > < /options > ;

The HISTOGRAM statement creates histograms and optionally superimposes estimated parametric and nonparametric probability density curves. You cannot use the WEIGHT statement with the HISTOGRAM statement. You can use any number of HISTOGRAM statements after a PROC UNIVARIATE statement. The components of the HISTOGRAM statement are described as follows.

variables

  • are the variables for which histograms are to be created. If you specify a VAR statement, the variables must also be listed in the VAR statement. Otherwise, the variables can be any numeric variables in the input data set. If you do not specify variables in a VAR statement or in the HISTOGRAM statement, then by default, a histogram is created for each numeric variable in the DATA= data set. If you use a VAR statement and do not specify any variables in the HISTOGRAM statement, then by default, a histogram is created for each variable listed in the VAR statement.

  • For example, suppose a data set named Steel contains exactly two numeric variables named Length and Width. The following statements create two histograms, one for Length and one for Width:

      proc univariate data=Steel;
         histogram;
      run;

  • Likewise, the following statements create histograms for Length and Width:

      proc univariate data=Steel;
         var Length Width;
         histogram;
      run;

  • The following statements create a histogram for Length only:

      proc univariate data=Steel;
         var Length Width;
         histogram Length;
      run;

options

  • add features to the histogram. Specify all options after the slash (/) in the HISTOGRAM statement. Options can be one of the following:

    • primary options for fitted parametric distributions and kernel density estimates

    • secondary options for fitted parametric distributions and kernel density estimates

    • general options for graphics and output data sets

  • For example, in the following statements, the NORMAL option displays a fitted normal curve on the histogram, the MIDPOINTS= option specifies midpoints for the histogram, and the CTEXT= option specifies the color of the text:

      proc univariate data=Steel;
         histogram Length / normal
                            midpoints = 5.6 5.8 6.0 6.2 6.4
                            ctext     = blue;
      run;

  • Table 3.2 through Table 3.12 list the HISTOGRAM options by function. For complete descriptions, see the section Dictionary of Options.

Table 3.2: Primary Options for Parametric Fitted Distributions

Option                                Description
BETA( beta-options )                  Fits beta distribution with threshold parameter θ, scale parameter σ, and shape parameters α and β
EXPONENTIAL( exponential-options )    Fits exponential distribution with threshold parameter θ and scale parameter σ
GAMMA( gamma-options )                Fits gamma distribution with threshold parameter θ, scale parameter σ, and shape parameter α
LOGNORMAL( lognormal-options )        Fits lognormal distribution with threshold parameter θ, scale parameter ζ, and shape parameter σ
NORMAL( normal-options )              Fits normal distribution with mean μ and standard deviation σ
WEIBULL( Weibull-options )            Fits Weibull distribution with threshold parameter θ, scale parameter σ, and shape parameter c

Parametric Density Estimation Options

  • Table 3.2 lists primary options that display a parametric density estimate on the histogram.

  • Table 3.3 through Table 3.9 list secondary options that specify parameters for fitted parametric distributions and that control the display of fitted curves. Specify these secondary options in parentheses after the primary distribution option. For example, you can fit a normal curve by specifying the NORMAL option as follows:

      proc univariate;
         histogram / normal(color=red mu=10 sigma=0.5);
      run;

    Table 3.3: Secondary Options Used with All Parametric Distribution Options

    Option                  Description
    COLOR= color            Specifies color of density curve
    FILL                    Fills area under density curve
    L= linetype             Specifies line type of curve
    MIDPERCENTS             Prints table of midpoints of histogram intervals
    NOPRINT                 Suppresses tables summarizing curve
    PERCENTS= value-list    Lists percents for which quantiles calculated from data and quantiles estimated from curve are tabulated
    W= n                    Specifies width of density curve

    Table 3.4: Secondary Beta-Options

    Option                  Description
    ALPHA= value            Specifies first shape parameter α for beta curve
    BETA= value             Specifies second shape parameter β for beta curve
    SIGMA= value | EST      Specifies scale parameter σ for beta curve
    THETA= value | EST      Specifies lower threshold parameter θ for beta curve

    Table 3.5: Secondary Exponential-Options

    Option                  Description
    SIGMA= value            Specifies scale parameter σ for exponential curve
    THETA= value | EST      Specifies threshold parameter θ for exponential curve

    Table 3.6: Secondary Gamma-Options

    Option                  Description
    ALPHA= value            Specifies shape parameter α for gamma curve
    SIGMA= value            Specifies scale parameter σ for gamma curve
    THETA= value | EST      Specifies threshold parameter θ for gamma curve

    Table 3.7: Secondary Lognormal-Options

    Option                  Description
    SIGMA= value            Specifies shape parameter σ for lognormal curve
    THETA= value | EST      Specifies threshold parameter θ for lognormal curve
    ZETA= value             Specifies scale parameter ζ for lognormal curve

    Table 3.8: Secondary Normal-Options

    Option                  Description
    MU= value               Specifies mean μ for normal curve
    SIGMA= value            Specifies standard deviation σ for normal curve

    Table 3.9: Secondary Weibull-Options

    Option                  Description
    C= value                Specifies shape parameter c for Weibull curve
    SIGMA= value            Specifies scale parameter σ for Weibull curve
    THETA= value | EST      Specifies threshold parameter θ for Weibull curve

  • In the preceding NORMAL example, the COLOR= normal-option draws the curve in red, and the MU= and SIGMA= normal-options specify the parameters μ = 10 and σ = 0.5 for the curve. Note that the sample mean and sample standard deviation are used to estimate μ and σ, respectively, when the MU= and SIGMA= normal-options are not specified.
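
    More than one primary distribution option can appear in the same HISTOGRAM statement, each with its own secondary options. The following sketch superimposes a normal curve with estimated parameters and a lognormal curve whose threshold is estimated with THETA=EST. The data set Steel and the variable Length follow the earlier examples, but the data values are hypothetical, and whether the threshold estimate is well behaved for such a small sample is not guaranteed.

      /* Hypothetical values for the Steel data set */
      data Steel;
         input Length @@;
         datalines;
      5.61 5.78 5.92 6.01 6.05 6.10 6.18 6.24 6.33 6.41 5.85 5.97
      ;

      proc univariate data=Steel noprint;
         histogram Length / normal(color=blue)               /* μ and σ estimated from the data */
                            lognormal(theta=est color=red);  /* threshold θ estimated           */
      run;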

Nonparametric Density Estimation Options

Use the option KERNEL( kernel-options ) to compute kernel density estimates. Specify the following secondary options in parentheses after the KERNEL option to control features of density estimates requested with the KERNEL option.

Table 3.10: Kernel-Options

Option                                 Description
C= value-list | MISE                   Specifies standardized bandwidth parameter c
COLOR= color                           Specifies color of the kernel density curve
FILL                                   Fills area under kernel density curve
K=NORMAL | QUADRATIC | TRIANGULAR      Specifies type of kernel function
L= linetype                            Specifies line type used for kernel density curve
LOWER=                                 Specifies lower bound for kernel density curve
UPPER=                                 Specifies upper bound for kernel density curve
W= n                                   Specifies line width for kernel density curve

General Options

Table 3.11 summarizes options for enhancing histograms, and Table 3.12 summarizes options for requesting output data sets.

Table 3.11: General Graphics Options

Option

Description

ANNOKEY

Applies annotation requested in ANNOTATE= data set to key cell only

ANNOTATE=

Specifies annotate data set

BARWIDTH=

Specifies width for the bars

CAXIS=

Specifies color for axis

CBARLINE=

Specifies color for outlines of histogram bars

CFILL=

Specifies color for filling under curve

CFRAME=

Specifies color for frame

CFRAMESIDE=

Specifies color for filling frame for row labels

CFRAMETOP=

Specifies color for filling frame for column labels

CGRID=

Specifies color for grid lines

CHREF=

Specifies color for HREF= lines

CPROP=

Specifies color for proportion of frequency bar

CTEXT=

Specifies color for text

CTEXTSIDE=

Specifies color for row labels of comparative histograms

CTEXTTOP=

Specifies color for column labels of comparative histograms

CVREF=

Specifies color for VREF= lines

DESCRIPTION=

Specifies description for plot in graphics catalog

ENDPOINTS=

Lists endpoints for histogram intervals

FONT=

Specifies software font for text

FORCEHIST

Forces creation of histogram

GRID

Creates a grid

FRONTREF

Draws reference lines in front of histogram bars

HEIGHT=

Specifies height of text used outside framed areas

HMINOR=

Specifies number of horizontal minor tick marks

HOFFSET=

Specifies offset for horizontal axis

HREF=

Specifies reference lines perpendicular to the horizontal axis

HREFLABELS=

Specifies labels for HREF= lines

HREFLABPOS=

Specifies vertical position of labels for HREF= lines

INFONT=

Specifies software font for text inside framed areas

INHEIGHT=

Specifies height of text inside framed areas

INTERTILE=

Specifies distance between tiles

LGRID=

Specifies a line type for grid lines

LHREF=

Specifies line style for HREF= lines

LVREF=

Specifies line style for VREF= lines

MAXNBIN=

Specifies maximum number of bins to display

MAXSIGMAS=

Limits the number of bins that display to within a specified number of standard deviations above and below mean of data in key cell

MIDPOINTS=

Lists midpoints for histogram intervals

NAME=

Specifies name for plot in graphics catalog

NCOLS=

Specifies number of columns in comparative histogram

NOBARS

Suppresses histogram bars

NOFRAME

Suppresses frame around plotting area

NOHLABEL

Suppresses label for horizontal axis

NOPLOT

Suppresses plot

NOVLABEL

Suppresses label for vertical axis

NOVTICK

Suppresses tick marks and tick mark labels for vertical axis

NROWS=

Specifies number of rows in comparative histogram

PFILL=

Specifies pattern for filling under curve

RTINCLUDE

Includes right endpoint in interval

TURNVLABELS

Turn and vertically string out characters in labels for vertical axis

VAXIS=

Specifies AXIS statement or values for vertical axis

VAXISLABEL=

Specifies label for vertical axis

VMINOR=

Specifies number of vertical minor tick marks

VOFFSET=

Specifies length of offset at upper end of vertical axis

VREF=

Specifies reference lines perpendicular to the vertical axis

VREFLABELS=

Specifies labels for VREF= lines

VREFLABPOS=

Specifies horizontal position of labels for VREF= lines

VSCALE=

Specifies scale for vertical axis

WAXIS=

Specifies line thickness for axes and frame

WBARLINE=

Specifies line thickness for bar outlines

WGRID=

Specifies line thickness for grid

Table 3.12: Options for Requesting Output Data Sets

Option

Description

MIDPERCENTS

Creates table of histogram intervals

OUTHISTOGRAM=

Specifies output data set containing information on histogram intervals

Dictionary of Options

The following entries provide detailed descriptions of options in the HISTOGRAM statement.

ALPHA= value

  • specifies the shape parameter α for fitted curves requested with the BETA and GAMMA options. Enclose the ALPHA= option in parentheses after the BETA or GAMMA options. By default, the procedure calculates a maximum likelihood estimate for α. You can specify A= as an alias for ALPHA= if you use it as a beta-option. You can specify SHAPE= as an alias for ALPHA= if you use it as a gamma-option.

ANNOKEY

  • applies the annotation requested with the ANNOTATE= option to the key cell only. By default, the procedure applies annotation to all of the cells. This option is not available unless you use the CLASS statement. You can use the KEYLEVEL= option in the CLASS statement to specify the key cell.

ANNOTATE= SAS-data-set

ANNO= SAS-data-set

  • specifies an input data set containing annotate variables as described in SAS/GRAPH Software: Reference . The ANNOTATE= data set you specify in the HISTOGRAM statement is used for all plots created by the statement. You can also specify an ANNOTATE= data set in the PROC UNIVARIATE statement to enhance all plots created by the procedure.

BARWIDTH= value

  • specifies the width of the histogram bars in screen percent units.

BETA < ( beta-options ) >

  • displays a fitted beta density curve on the histogram. The BETA option can occur only once in a HISTOGRAM statement. The beta distribution is bounded below by the parameter θ and above by the value θ + σ. Use the THETA= and SIGMA= beta-options to specify these parameters. By default, THETA=0 and SIGMA=1. You can specify THETA=EST and SIGMA=EST to request maximum likelihood estimates for θ and σ. See Example 3.21.

    Note: Three- and four-parameter maximum likelihood estimation may not always converge.

    The beta distribution has two shape parameters, α and β. If these parameters are known, you can specify their values with the ALPHA= and BETA= beta-options. By default, the procedure computes maximum likelihood estimates for α and β. Table 3.3 and Table 3.4 list options you can specify with the BETA option.
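The two ways of handling the bounds described above can be sketched as follows. This is a minimal illustration, not a statement from the original examples: the data set work.ratios and the variable ratio are hypothetical names.

```sas
/* Sketch only: work.ratios and ratio are hypothetical names. */
proc univariate data=work.ratios;
   var ratio;
   /* Default bounds THETA=0 and SIGMA=1; the shape parameters */
   /* alpha and beta are estimated by maximum likelihood.      */
   histogram ratio / beta;
   /* Also request maximum likelihood estimates of the bounds. */
   /* As noted above, this estimation may not always converge. */
   histogram ratio / beta(theta=est sigma=est);
run;
```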

BETA= value

B= value

  • specifies the second shape parameter β for beta density curves requested with the BETA option. Enclose the BETA= option in parentheses after the BETA option. By default, the procedure calculates a maximum likelihood estimate for β.

C= value

  • specifies the shape parameter c for Weibull density curves requested with the WEIBULL option. Enclose the C= Weibull-option in parentheses after the WEIBULL option. If you do not specify a value for c , the procedure calculates a maximum likelihood estimate. You can specify the SHAPE= Weibull-option as an alias for the C= Weibull-option .

C= value-list | MISE

  • specifies the standardized bandwidth parameter c for kernel density estimates requested with the KERNEL option. Enclose the C= kernel-option in parentheses after the KERNEL option. You can specify up to five values to request multiple estimates. You can also specify the C=MISE option, which produces the estimate with a bandwidth that minimizes the approximate mean integrated square error (MISE).

  • You can also use the C= kernel-option with the K= kernel-option , which specifies the kernel function, to compute multiple estimates. If you specify more kernel functions than bandwidths, the last bandwidth in the list is repeated for the remaining estimates. Likewise, if you specify more bandwidths than kernel functions, the last kernel function is repeated for the remaining estimates. If you do not specify a value for c , the bandwidth that minimizes the approximate MISE is used for all the estimates.
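The pairing rule for bandwidths and kernel functions can be sketched with a short example. The data set work.measurements and the variable thickness are hypothetical names chosen for illustration, and the bandwidth values are arbitrary.

```sas
/* Sketch only: work.measurements and thickness are hypothetical names. */
proc univariate data=work.measurements noprint;
   var thickness;
   /* Three bandwidths but only two kernel functions: the last     */
   /* kernel function (TRIANGULAR) is reused for the third curve,  */
   /* giving (0.25, NORMAL), (0.5, TRIANGULAR), (1, TRIANGULAR).   */
   histogram thickness / kernel(c=0.25 0.5 1 k=normal triangular);
   /* A single estimate whose bandwidth minimizes the approximate MISE. */
   histogram thickness / kernel(c=mise);
run;
```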

CAXIS= color

CAXES= color

CA= color

  • specifies the color for the axes and tick marks. This option overrides any COLOR= specifications in an AXIS statement. The default value is the first color in the device color list.

CBARLINE= color

  • specifies the color for the outline of the histogram bars. This option overrides the C= option in the SYMBOL1 statement. The default value is the first color in the device color list.

CFILL= color

  • specifies the color to fill the bars of the histogram (or the area under a fitted density curve if you also specify the FILL option). See the entries for the FILL and PFILL= options for additional details. Refer to SAS/GRAPH Software: Reference for a list of colors. By default, bars and curve areas are not filled.

CFRAME= color

  • specifies the color for the area that is enclosed by the axes and frame. The area is not filled by default.

CFRAMESIDE= color

  • specifies the color to fill the frame area for the row labels that display along the left side of the comparative histogram. This color also fills the frame area for the label of the corresponding class variable (if you associate a label with the variable). By default, these areas are not filled. This option is not available unless you use the CLASS statement.

CFRAMETOP= color

  • specifies the color to fill the frame area for the column labels that display across the top of the comparative histogram. This color also fills the frame area for the label of the corresponding class variable (if you associate a label with the variable). By default, these areas are not filled. This option is not available unless you use the CLASS statement.

CGRID= color

  • specifies the color for grid lines when a grid displays on the histogram. The default color is the first color in the device color list. This option also produces a grid.

CHREF= color

CH= color

  • specifies the color for horizontal axis reference lines requested by the HREF= option. The default is the first color in the device color list.

COLOR= color

  • specifies the color of the density curve. Enclose the COLOR= option in parentheses after the distribution option or the KERNEL option. If you use the COLOR= option with the KERNEL option, you can specify a list of up to five colors in parentheses for multiple kernel density estimates. If there are more estimates than colors, the last color specified is used for the remaining estimates.

CPROP= color | EMPTY

  • specifies the color for a horizontal bar whose length (relative to the width of the tile) indicates the proportion of the total frequency that is represented by the corresponding cell in a comparative histogram. By default, no bars are displayed. This option is not available unless you use the CLASS statement. You can specify the keyword EMPTY to display empty bars. See Example 3.20.

CTEXT= color

CT= color

  • specifies the color for tick mark values and axis labels. The default is the color specified for the CTEXT= option in the GOPTIONS statement. In the absence of a GOPTIONS statement, the default color is the first color in the device color list.

CTEXTSIDE= color

  • specifies the color for the row labels that display along the left side of the comparative histogram. By default, the color specified by the CTEXT= option is used. If you omit the CTEXT= option, the color specified in the GOPTIONS statement is used. If you omit the GOPTIONS statement, the first color in the device color list is used. This option is not available unless you use the CLASS statement. You can specify the CFRAMESIDE= option to change the background color for the row labels.

CTEXTTOP= color

  • specifies the color for the column labels that display across the top of the comparative histogram. By default, the color specified by the CTEXT= option is used. If you omit the CTEXT= option, the color specified in the GOPTIONS statement is used. If you omit the GOPTIONS statement, the first color in the device color list is used. This option is not available unless you use the CLASS statement. You can use the CFRAMETOP= option to change the background color for the column labels.

CVREF= color

CV= color

  • specifies the color for lines requested with the VREF= option. The default is the first color in the device color list.

DESCRIPTION= ' string '

DES= ' string '

  • specifies a description, up to 40 characters long, that appears in the PROC GREPLAY master menu. The default value is the variable name.

ENDPOINTS < = values | KEY | UNIFORM >

  • uses the endpoints as the tick mark values for the horizontal axis and determines how to compute the bin width of the histogram bars, where values specifies values for both the left and right endpoint of each histogram interval. The width of the histogram bars is the difference between consecutive endpoints. The procedure uses the same values for all variables.

  • The range of endpoints must cover the range of the data. For example, if you specify

      endpoints=2 to 10 by 2  

    then all of the observations must fall in the intervals [2,4), [4,6), [6,8), and [8,10]. The endpoints must also be evenly spaced and listed in increasing order.

    KEY

    determines the endpoints for the data in the key cell. The initial number of endpoints is based on the number of observations in the key cell using the method of Terrell and Scott (1985). The procedure extends the endpoint list for the key cell in either direction as necessary until it spans the data in the remaining cells.

    UNIFORM

    determines the endpoints by using all the observations as if there were no cells. In other words, the number of endpoints is based on the total sample size by using the method of Terrell and Scott (1985).

  • Neither KEY nor UNIFORM applies unless you use the CLASS statement.

  • If you omit ENDPOINTS, the procedure uses the midpoints. If you specify ENDPOINTS without a list of values, the procedure computes the endpoints by using an algorithm (Terrell and Scott 1985) that is primarily applicable to continuous data that are approximately normally distributed.

  • If you specify both MIDPOINTS= and ENDPOINTS, the procedure issues a warning message and uses the endpoints.

  • If you specify RTINCLUDE, the procedure includes the right endpoint of each histogram interval in that interval instead of including the left endpoint.

  • If you use a CLASS statement and specify ENDPOINTS, the procedure uses ENDPOINTS=KEY as the default. However, if the key cell is empty, the procedure uses ENDPOINTS=UNIFORM.
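As an illustration of the ENDPOINTS= list and the RTINCLUDE option together, the following sketch reuses the endpoint list from the example above. The data set work.measurements and the variable thickness are hypothetical names, and the data are assumed to lie between 2 and 10.

```sas
/* Sketch only: work.measurements and thickness are hypothetical names. */
proc univariate data=work.measurements noprint;
   var thickness;
   /* Evenly spaced endpoints in increasing order; each bar is 2 units wide. */
   /* RTINCLUDE makes each interval include its right endpoint rather than   */
   /* its left endpoint.                                                     */
   histogram thickness / endpoints=2 to 10 by 2 rtinclude;
run;
```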

EXPONENTIAL < ( exponential-options ) >

EXP < ( exponential-options ) >

  • displays a fitted exponential density curve on the histogram. The EXPONENTIAL option can occur only once in a HISTOGRAM statement. The parameter θ must be less than or equal to the minimum data value. Use the THETA= exponential-option to specify θ. By default, THETA=0. You can specify THETA=EST to request the maximum likelihood estimate for θ. Use the SIGMA= exponential-option to specify σ. By default, the procedure computes a maximum likelihood estimate for σ. Table 3.3 and Table 3.5 list options you can specify with the EXPONENTIAL option.

FILL

  • fills areas under the fitted density curve or the kernel density estimate with colors and patterns. The FILL option can occur with only one fitted curve. Enclose the FILL option in parentheses after a density curve option or the KERNEL option. The CFILL= and PFILL= options specify the color and pattern for the area under the curve. For a list of available colors and patterns, see SAS/GRAPH Reference .

FONT= font

  • specifies a software font for reference line and axis labels. You can also specify fonts for axis labels in an AXIS statement. The FONT= font takes precedence over the FTEXT= font specified in the GOPTIONS statement. Hardware characters are used by default.

FORCEHIST

  • forces the creation of a histogram if there is only one unique observation. By default, a histogram is not created if the standard deviation of the data is zero.

FRONTREF

  • draws reference lines requested with the HREF= and VREF= options in front of the histogram bars. By default, reference lines are drawn behind the histogram bars and can be obscured by them.

GAMMA < ( gamma-options ) >

  • displays a fitted gamma density curve on the histogram. The GAMMA option can occur only once in a HISTOGRAM statement. The parameter θ must be less than the minimum data value. Use the THETA= gamma-option to specify θ. By default, THETA=0. You can specify THETA=EST to request the maximum likelihood estimate for θ. Use the ALPHA= and the SIGMA= gamma-options to specify the shape parameter α and the scale parameter σ. By default, PROC UNIVARIATE computes maximum likelihood estimates for α and σ. The procedure calculates the maximum likelihood estimate of α iteratively using the Newton-Raphson approximation. Table 3.3 and Table 3.6 list options you can specify with the GAMMA option. See Example 3.22.

GRID

  • displays a grid on the histogram. Grid lines are horizontal lines that are positioned at major tick marks on the vertical axis.

HEIGHT= value

  • specifies the height, in percentage screen units, of text for axis labels, tick mark labels, and legends. This option takes precedence over the HTEXT= option in the GOPTIONS statement.

HMINOR= n

HM= n

  • specifies the number of minor tick marks between each major tick mark on the horizontal axis. Minor tick marks are not labeled. By default, HMINOR=0.

HOFFSET= value

  • specifies the offset, in percentage screen units, at both ends of the horizontal axis. You can use HOFFSET=0 to eliminate the default offset.

HREF= values

  • draws reference lines that are perpendicular to the horizontal axis at the values that you specify. If a reference line is almost completely obscured, then use the FRONTREF option to draw the reference lines in front of the histogram bars. Also see the CHREF=, HREFCHAR=, and LHREF= options.

HREFLABELS= 'label1' ... 'labeln'

HREFLABEL= 'label1' ... 'labeln'

HREFLAB= 'label1' ... 'labeln'

  • specifies labels for the lines requested by the HREF= option. The number of labels must equal the number of lines. Enclose each label in quotes. Labels can have up to 16 characters.

HREFLABPOS= 1 | 2 | 3

  • specifies the vertical position of HREFLABELS= labels. If you specify HREFLABPOS=1, the labels are positioned along the top of the histogram. If you specify HREFLABPOS=2, the labels are staggered from top to bottom of the histogram. If you specify HREFLABPOS=3, the labels are positioned along the bottom of the histogram. By default, HREFLABPOS=1.
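The reference line options can be combined as in the following sketch. The data set work.measurements, the variable thickness, the reference values, and the label text are all hypothetical and chosen only for illustration.

```sas
/* Sketch only: names, reference values, and labels are hypothetical. */
proc univariate data=work.measurements noprint;
   var thickness;
   histogram thickness / href=9.5 10.5                          /* two reference lines        */
                         hreflabels='Lower limit' 'Upper limit' /* one label per line         */
                         hreflabpos=2                           /* stagger labels top to bottom */
                         lhref=2                                /* dashed line type           */
                         frontref;                              /* draw lines in front of bars */
run;
```

Each label stays within the 16-character limit, and the number of labels matches the number of HREF= values.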

INFONT= font

  • specifies a software font to use for text inside the framed areas of the histogram. The INFONT= option takes precedence over the FTEXT= option in the GOPTIONS statement. For a list of fonts, see SAS/GRAPH Reference .

INHEIGHT= value

  • specifies the height, in percentage screen units, of text used inside the framed areas of the histogram. By default, the height specified by the HEIGHT= option is used. If you do not specify the HEIGHT= option, the height specified with the HTEXT= option in the GOPTIONS statement is used.

INTERTILE= value

  • specifies the distance, in horizontal percentage screen units, between the framed areas, which are called tiles . By default, INTERTILE=0.75 percentage screen units. This option is not available unless you use the CLASS statement. You can specify INTERTILE=0 to create contiguous tiles.

K= NORMAL | QUADRATIC | TRIANGULAR

  • specifies the kernel function (normal, quadratic, or triangular) used to compute a kernel density estimate. You can specify up to five values to request multiple estimates. You must enclose this option in parentheses after the KERNEL option. You can also use the K= kernel-option with the C= kernel-option , which specifies standardized bandwidths. If you specify more kernel functions than bandwidths, the procedure repeats the last bandwidth in the list for the remaining estimates. Likewise, if you specify more bandwidths than kernel functions, the procedure repeats the last kernel function for the remaining estimates. By default, K=NORMAL.

KERNEL < ( kernel-options ) >

  • superimposes up to five kernel density estimates on the histogram. By default, the procedure uses the AMISE method to compute kernel density estimates. To request multiple kernel density estimates on the same histogram, specify a list of values for either the C= kernel-option or the K= kernel-option. Table 3.10 lists options you can specify with the KERNEL option. See Example 3.23.

L= linetype

  • specifies the line type used for fitted density curves. Enclose the L= option in parentheses after the distribution option or the KERNEL option. If you use the L= option with the KERNEL option, you can specify a list of up to five line types for multiple kernel density estimates. See the entries for the C= and K= options for details on specifying multiple kernel density estimates. By default, L=1, which produces a solid line.

LGRID= linetype

  • specifies the line type for the grid when a grid displays on the histogram. By default, LGRID=1, which produces a solid line. This option also creates a grid.

LHREF= linetype

LH= linetype

  • specifies the line type for the reference lines that you request with the HREF= option. By default, LHREF=2, which produces a dashed line.

LOGNORMAL < ( lognormal-options ) >

  • displays a fitted lognormal density curve on the histogram. The LOGNORMAL option can occur only once in a HISTOGRAM statement. The parameter θ must be less than the minimum data value. Use the THETA= lognormal-option to specify θ. By default, THETA=0. You can specify THETA=EST to request the maximum likelihood estimate for θ. Use the SIGMA= and ZETA= lognormal-options to specify σ and ζ. By default, the procedure computes maximum likelihood estimates for σ and ζ. Table 3.3 and Table 3.7 list options you can specify with the LOGNORMAL option. See Example 3.22 and Example 3.24.
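Several distribution options can appear in the same HISTOGRAM statement, each with its own parenthesized secondary options. The following sketch overlays a lognormal and a gamma curve; the data set work.measurements and the variable thickness are hypothetical names, and the choice of line types is arbitrary.

```sas
/* Sketch only: work.measurements and thickness are hypothetical names. */
proc univariate data=work.measurements;
   var thickness;
   /* Estimate the threshold for both curves instead of using THETA=0; */
   /* L= assigns a different line type to each curve.                  */
   histogram thickness / lognormal(theta=est l=1)
                         gamma(theta=est l=2);
run;
```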

LOWER= value-list

  • specifies lower bounds for kernel density estimates requested with the KERNEL option. Enclose the LOWER= option in parentheses after the KERNEL option. You can specify up to five lower bounds for multiple kernel density estimates. If you specify more kernel estimates than lower bounds, the last lower bound is repeated for the remaining estimates. The default is a missing value, indicating no lower bounds for fitted kernel density curves.

LVREF= linetype

LV= linetype

  • specifies the line type for lines requested with the VREF= option. By default, LVREF=2, which produces a dashed line.

MAXNBIN= n

  • specifies the maximum number of bins displayed in the comparative histogram. This option is useful when the scales or ranges of the data distributions differ greatly from cell to cell. By default, the bin size and midpoints are determined for the key cell, and then the midpoint list is extended to accommodate the data ranges for the remaining cells. However, if the cell scales differ considerably, the resulting number of bins may be so great that each cell histogram is scaled into a narrow region. By using MAXNBIN= to limit the number of bins, you can narrow the window about the data distribution in the key cell. This option is not available unless you specify the CLASS statement. The MAXNBIN= option is an alternative to the MAXSIGMAS= option.

MAXSIGMAS= value

  • limits the number of bins displayed in the comparative histogram to a range of value standard deviations (of the data in the key cell) above and below the mean of the data in the key cell. This option is useful when the scales or ranges of the data distributions differ greatly from cell to cell. By default, the bin size and midpoints are determined for the key cell, and then the midpoint list is extended to accommodate the data ranges for the remaining cells. However, if the cell scales differ considerably, the resulting number of bins may be so great that each cell histogram is scaled into a narrow region. By using MAXSIGMAS= to limit the number of bins, you can narrow the window that surrounds the data distribution in the key cell. This option is not available unless you specify the CLASS statement.
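A comparative histogram that uses several of the CLASS-dependent options might look like the following sketch. The data set work.measurements, the class variable supplier, and the analysis variable thickness are hypothetical names, and the option values are arbitrary.

```sas
/* Sketch only: data set and variable names are hypothetical. */
proc univariate data=work.measurements noprint;
   class supplier;                 /* one cell per supplier level */
   var thickness;
   histogram thickness / nrows=2          /* two rows of tiles                      */
                         maxsigmas=3      /* keep bins within 3 standard deviations */
                                          /* of the key cell mean                   */
                         intertile=1      /* spacing between tiles                  */
                         cframeside=ligr; /* fill the row label frames              */
run;
```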

MIDPERCENTS

  • requests a table listing the midpoints and percentage of observations in each histogram interval. If you specify MIDPERCENTS in parentheses after a density estimate option, the procedure displays a table that lists the midpoints, the observed percentage of observations, and the estimated percentage of the population in each interval (estimated from the fitted distribution). See Example 3.18.

MIDPOINTS= values | KEY | UNIFORM

  • specifies how to determine the midpoints for the histogram intervals, where values determines the width of the histogram bars as the difference between consecutive midpoints. The procedure uses the same values for all variables.

  • The range of midpoints, extended at each end by half of the bar width, must cover the range of the data. For example, if you specify

      midpoints=2 to 10 by 0.5  

    then all of the observations should fall between 1.75 and 10.25. You must use evenly spaced midpoints listed in increasing order.

    KEY

    determines the midpoints for the data in the key cell. The initial number of midpoints is based on the number of observations in the key cell by using the method of Terrell and Scott (1985). The procedure extends the midpoint list for the key cell in either direction as necessary until it spans the data in the remaining cells.

    UNIFORM

    determines the midpoints by using all the observations as if there were no cells. In other words, the number of midpoints is based on the total sample size by using the method of Terrell and Scott (1985).

  • Neither KEY nor UNIFORM applies unless you use the CLASS statement. By default, if you use a CLASS statement, MIDPOINTS=KEY; however, if the key cell is empty, then MIDPOINTS=UNIFORM. Otherwise, the procedure computes the midpoints by using an algorithm (Terrell and Scott 1985) that is primarily applicable to continuous data that are approximately normally distributed.
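The range that a midpoint list covers follows from extending the first and last midpoints by half the bar width. The sketch below works this out for the list 2 to 10 by 0.5 and then uses the same list in a HISTOGRAM statement; the data set work.measurements and the variable thickness are hypothetical names.

```sas
/* Range covered by MIDPOINTS=2 TO 10 BY 0.5. */
data _null_;
   first = 2;  last = 10;  width = 0.5;
   lower = first - width / 2;   /* 2 - 0.25  = 1.75  */
   upper = last  + width / 2;   /* 10 + 0.25 = 10.25 */
   put lower= upper=;           /* writes the two bounds to the log */
run;

/* Sketch only: work.measurements and thickness are hypothetical names. */
proc univariate data=work.measurements noprint;
   var thickness;
   histogram thickness / midpoints=2 to 10 by 0.5;
run;
```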

MU= value

  • specifies the parameter µ for normal density curves requested with the NORMAL option. Enclose the MU= option in parentheses after the NORMAL option. By default, the procedure uses the sample mean for µ .

NAME= ' string '

  • specifies a name for the plot, up to eight characters long, that appears in the PROC GREPLAY master menu. The default value is UNIVAR .

NCOLS= n

NCOL= n

  • specifies the number of columns in a comparative histogram. By default, NCOLS=1 if you specify only one class variable, and NCOLS=2 if you specify two class variables. This option is not available unless you use the CLASS statement. If you specify two class variables, you can use the NCOLS= option with the NROWS= option.

NOBARS

  • suppresses drawing of histogram bars, which is useful for viewing fitted curves only.

NOFRAME

  • suppresses the frame around the subplot area.

NOHLABEL

  • suppresses the label for the horizontal axis. You can use this option to reduce clutter.

NOPLOT

NOCHART

  • suppresses the creation of a plot. Use this option when you only want to tabulate summary statistics for a fitted density or create an OUTHISTOGRAM= data set.

NOPRINT

  • suppresses tables summarizing the fitted curve. Enclose the NOPRINT option in parentheses following the distribution option.

NORMAL < ( normal-options ) >

  • displays a fitted normal density curve on the histogram. The NORMAL option can occur only once in a HISTOGRAM statement. Use the MU= and SIGMA= normal-options to specify µ and σ. By default, the procedure uses the sample mean and sample standard deviation for µ and σ. Table 3.3 and Table 3.8 list options you can specify with the NORMAL option. See Example 3.19.

NOVLABEL

  • suppresses the label for the vertical axis. You can use this option to reduce clutter.

NOVTICK

  • suppresses the tick marks and tick mark labels for the vertical axis. This option also suppresses the label for the vertical axis.

NROWS= n

NROW= n

  • specifies the number of rows in a comparative histogram. By default, NROWS=2. This option is not available unless you use the CLASS statement. If you specify two class variables, you can use the NCOLS= option with the NROWS= option.

OUTHISTOGRAM= SAS-data-set

OUTHIST= SAS-data-set

  • creates a SAS data set that contains information about histogram intervals. Specifically, the data set contains the midpoints of the histogram intervals, the observed percentage of observations in each interval, and the estimated percentage of observations in each interval (estimated from each of the specified fitted curves).
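To save the interval information without drawing a plot, the OUTHISTOGRAM= and NOPLOT options can be combined, as in the following sketch. The names work.measurements, thickness, and work.bins are hypothetical.

```sas
/* Sketch only: data set and variable names are hypothetical. */
proc univariate data=work.measurements;
   var thickness;
   /* MIDPERCENTS tabulates the intervals; OUTHISTOGRAM= saves them; */
   /* NOPLOT suppresses the histogram itself.                        */
   histogram thickness / normal midpercents
                         outhistogram=work.bins noplot;
run;

/* Inspect the saved interval information. */
proc print data=work.bins;
run;
```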

PERCENTS= values

PERCENT= values

  • specifies a list of percents for which quantiles calculated from the data and quantiles estimated from the fitted curve are tabulated. The percents must be between 0 and 100. Enclose the PERCENTS= option in parentheses after the curve option. The default percents are 1, 5, 10, 25, 50, 75, 90, 95, and 99.
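A custom percent list replaces the default list in the quantile table for the fitted curve. In the following sketch, the data set work.measurements and the variable thickness are hypothetical names, and the percents are arbitrary values between 0 and 100.

```sas
/* Sketch only: work.measurements and thickness are hypothetical names. */
proc univariate data=work.measurements;
   var thickness;
   /* Tabulate observed and estimated quantiles at 2.5, 50, and 97.5 percent. */
   histogram thickness / normal(percents=2.5 50 97.5);
run;
```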

PFILL= pattern

  • specifies a pattern used to fill the bars of the histograms (or the areas under a fitted curve if you also specify the FILL option). See the entries for the CFILL= and FILL options for additional details. Refer to SAS/GRAPH Software: Reference for a list of pattern values. By default, the bars and curve areas are not filled.

RTINCLUDE

  • includes the right endpoint of each histogram interval in that interval. By default, the left endpoint is included in the histogram interval.

SCALE= value

  • is an alias for the SIGMA= option for curves requested by the BETA, EXPONENTIAL, GAMMA, and WEIBULL options and an alias for the ZETA= option for curves requested by the LOGNORMAL option.

SHAPE= value

  • is an alias for the ALPHA= option for curves requested with the GAMMA option, an alias for the SIGMA= option for curves requested with the LOGNORMAL option, and an alias for the C= option for curves requested with the WEIBULL option.

SIGMA= value | EST

  • specifies the parameter σ for the fitted density curve when you request the BETA, EXPONENTIAL, GAMMA, LOGNORMAL, NORMAL, or WEIBULL option.

    See Table 3.13 for a summary of how to use the SIGMA= option. You must enclose this option in parentheses after the density curve option. As a beta-option, you can specify SIGMA=EST to request a maximum likelihood estimate for σ.

    Table 3.13: Uses of the SIGMA= Option

    Distribution Keyword   SIGMA= Specifies     Default Value                  Alias
    BETA                   Scale parameter σ    1                              SCALE=
    EXPONENTIAL            Scale parameter σ    Maximum likelihood estimate    SCALE=
    GAMMA                  Scale parameter σ    Maximum likelihood estimate    SCALE=
    WEIBULL                Scale parameter σ    Maximum likelihood estimate    SCALE=
    LOGNORMAL              Shape parameter σ    Maximum likelihood estimate    SHAPE=
    NORMAL                 Scale parameter σ    Sample standard deviation      (none)

THETA= value | EST

  • specifies the lower threshold parameter θ for curves requested with the BETA, EXPONENTIAL, GAMMA, LOGNORMAL, and WEIBULL options. Enclose the THETA= option in parentheses after the curve option. By default, THETA=0. If you specify THETA=EST, an estimate is computed for θ.

THRESHOLD= value

  • is an alias for the THETA= option. See the preceding entry for the THETA= option.

TURNVLABELS

TURNVLABEL

  • turns the characters in the vertical axis labels so that they display vertically. This happens by default when you use a hardware font.

UPPER= value-list

  • specifies upper bounds for kernel density estimates requested with the KERNEL option. Enclose the UPPER= option in parentheses after the KERNEL option. You can specify up to five upper bounds for multiple kernel density estimates. If you specify more kernel estimates than upper bounds, the last upper bound is repeated for the remaining estimates. The default is a missing value, indicating no upper bounds for fitted kernel density curves.

VAXIS= name | value-list

  • specifies the name of an AXIS statement describing the vertical axis. Alternatively, you can specify a value-list for the vertical axis.

VAXISLABEL= label

  • specifies a label for the vertical axis. Labels can have up to 40 characters.

VMINOR= n

VM= n

  • specifies the number of minor tick marks between each major tick mark on the vertical axis. Minor tick marks are not labeled. The default is zero.

VOFFSET= value

  • specifies the offset, in percentage screen units, at the upper end of the vertical axis.

VREF= value-list

  • draws reference lines perpendicular to the vertical axis at the values specified. Also see the CVREF=, LVREF=, and VREFCHAR= options. If a reference line is almost completely obscured, then use the FRONTREF option to draw the reference lines in front of the histogram bars.

VREFLABELS= 'label1' ... 'labeln'

VREFLABEL= 'label1' ... 'labeln'

VREFLAB= 'label1' ... 'labeln'

  • specifies labels for the lines requested by the VREF= option. The number of labels must equal the number of lines. Enclose each label in quotes. Labels can have up to 16 characters.

VREFLABPOS= n

  • specifies the horizontal position of VREFLABELS= labels. If you specify VREFLABPOS=1, the labels are positioned at the left of the histogram. If you specify VREFLABPOS=2, the labels are positioned at the right of the histogram. By default, VREFLABPOS=1.

VSCALE=COUNT | PERCENT | PROPORTION

  • specifies the scale of the vertical axis for a histogram. The value COUNT requests the data be scaled in units of the number of observations per data unit. The value PERCENT requests the data be scaled in units of percent of observations per data unit. The value PROPORTION requests the data be scaled in units of proportion of observations per data unit. The default is PERCENT.
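
  • For instance, the following sketch switches the vertical axis from the default percent scale to counts and relabels it with the VAXISLABEL= option. It assumes the SASHELP.CLASS sample data set, and the label text is a hypothetical choice.

      /* Assumptions: SASHELP.CLASS as input; label text is illustrative */
      proc univariate data=sashelp.class noprint;
         /* scale bar heights as numbers of observations instead of percents */
         histogram Height / vscale=count
                            vaxislabel='Number of students';
      run;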

W= n

  • specifies the width, in pixels, of the fitted density curve or the kernel density estimate curve. By default, W=1. You must enclose this option in parentheses after the density curve option or the KERNEL option. As a kernel-option, you can specify a list of up to five W= values.

WAXIS= n

  • specifies the line thickness, in pixels, for the axes and frame. By default, WAXIS=1.

WBARLINE= n

  • specifies the width of bar outlines. By default, WBARLINE=1.

WEIBULL < ( Weibull-options ) >

  • displays a fitted Weibull density curve on the histogram. The WEIBULL option can occur only once in a HISTOGRAM statement. The parameter θ must be less than the minimum data value. Use the THETA= Weibull-option to specify θ. By default, THETA=0. You can specify THETA=EST to request the maximum likelihood estimate for θ. Use the C= and SIGMA= Weibull-options to specify the shape parameter c and the scale parameter σ. By default, the procedure computes the maximum likelihood estimates for c and σ. Table 3.3 (page 214) and Table 3.9 (page 215) list options that you can specify with the WEIBULL option. See Example 3.22.

  • PROC UNIVARIATE calculates the maximum likelihood estimate of c iteratively by using the Newton-Raphson approximation. See also the C=, SIGMA=, and THETA= Weibull-options.
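
  • The following sketch fits a Weibull curve with a fixed threshold and leaves c and σ to be estimated by maximum likelihood. It assumes the SASHELP.CLASS sample data set, and THETA=40 is an assumed value meant to lie below the smallest Weight value; substitute a value below the minimum of your own data.

      /* Assumptions: SASHELP.CLASS as input; THETA=40 is below the data minimum */
      proc univariate data=sashelp.class noprint;
         /* threshold fixed by the user; shape c and scale sigma are estimated */
         histogram Weight / weibull(theta=40);
      run;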

WGRID= n

  • specifies the line thickness for the grid.

ZETA= value

  • specifies a value for the scale parameter ζ for lognormal density curves requested with the LOGNORMAL option. Enclose the ZETA= lognormal-option in parentheses after the LOGNORMAL option. By default, the procedure calculates a maximum likelihood estimate for ζ. You can specify the SCALE= option as an alias for the ZETA= option.

ID Statement

  • ID variables ;

The ID statement specifies one or more variables to include in the table of extreme observations. The corresponding values of the ID variables appear beside the n largest and n smallest observations, where n is the value of the NEXTROBS= option. See Example 3.3.
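
As a minimal sketch, assuming the SASHELP.CLASS sample data set (not part of this section), the following statements label the three largest and three smallest Weight values with the corresponding Name values:

  /* Assumption: SASHELP.CLASS as input, with variables Name and Weight */
  proc univariate data=sashelp.class nextrobs=3;
     var Weight;
     id Name;      /* Name appears beside each extreme observation */
  run;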

INSET Statement

  • INSET keyword-list < /options > ;

The INSET statement places a box or table of summary statistics, called an inset, directly in a high-resolution graph created with the HISTOGRAM, PROBPLOT, or QQPLOT statement.

The INSET statement must follow the HISTOGRAM, PROBPLOT, or QQPLOT statement that creates the plot that you want to augment. The inset appears in all the graphs that the preceding plot statement produces.

You can use multiple INSET statements after a plot statement to add multiple insets to a plot. See Example 3.17.
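
The sketch below shows two INSET statements that follow one HISTOGRAM statement, so both insets would appear on the same histogram. It assumes the SASHELP.CLASS sample data set; the compass positions are arbitrary choices.

  /* Assumption: SASHELP.CLASS as input */
  proc univariate data=sashelp.class noprint;
     histogram Height / normal;
     inset n mean std        / position=nw;   /* first inset: sample statistics */
     inset normal(ad adpval) / position=ne;   /* second inset: fit statistics   */
  run;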

In an INSET statement, you specify one or more keywords that identify the information to display in the inset. The information is displayed in the order that you request the keywords. Keywords can be any of the following:

  • statistical keywords

  • primary keywords

  • secondary keywords

The available statistical keywords are:

Table 3.14: Descriptive Statistic Keywords

CSS        Corrected sum of squares
CV         Coefficient of variation
KURTOSIS   Kurtosis
MAX        Largest value
MEAN       Sample mean
MIN        Smallest value
MODE       Most frequent value
N          Sample size
NMISS      Number of missing values
NOBS       Number of observations
RANGE      Range
SKEWNESS   Skewness
STD        Standard deviation
STDMEAN    Standard error of the mean
SUM        Sum of the observations
SUMWGT     Sum of the weights
USS        Uncorrected sum of squares
VAR        Variance

Table 3.15: Percentile Statistic Keywords

P1       1st percentile
P5       5th percentile
P10      10th percentile
Q1       Lower quartile (25th percentile)
MEDIAN   Median (50th percentile)
Q3       Upper quartile (75th percentile)
P90      90th percentile
P95      95th percentile
P99      99th percentile
QRANGE   Interquartile range (Q3 - Q1)

Table 3.16: Robust Statistics Keywords

GINI         Gini's mean difference
MAD          Median absolute difference about the median
QN           Qn, alternative to MAD
SN           Sn, alternative to MAD
STD_GINI     Gini's standard deviation
STD_MAD      MAD standard deviation
STD_QN       Qn standard deviation
STD_QRANGE   Interquartile range standard deviation
STD_SN       Sn standard deviation

Table 3.17: Hypothesis Testing Keywords

MSIGN        Sign statistic
NORMALTEST   Test statistic for normality
PNORMAL      Probability value for the test of normality
SIGNRANK     Signed rank statistic
PROBM        Probability of greater absolute value for the sign statistic
PROBN        Probability value for the test of normality
PROBS        Probability value for the signed rank test
PROBT        Probability value for the Student's t test
T            Statistic for the Student's t test

A primary keyword enables you to specify secondary keywords in parentheses immediately after the primary keyword. Primary keywords are BETA, EXPONENTIAL, GAMMA, LOGNORMAL, NORMAL, WEIBULL, WEIBULL2, KERNEL, and KERNELn. If you specify a primary keyword but omit a secondary keyword, the inset displays a colored line and the distribution name as a key for the density curve.

By default, PROC UNIVARIATE identifies inset statistics with appropriate labels and prints numeric values using appropriate formats. To customize the label, specify the keyword followed by an equal sign (=) and the desired label in quotes. To customize the format, specify a numeric format in parentheses after the keyword. Labels can have up to 24 characters.

If you specify both a label and a format for a statistic, the label must appear before the format. For example,

  inset n='Sample Size' std='Std Dev' (5.2);  

requests customized labels for two statistics and displays the standard deviation with a field width of 5 and two decimal places.

The following tables list primary keywords:

Table 3.18: Parametric Density Primary Keywords

Keyword       Distribution            Plot Statement Availability
BETA          Beta                    All plot statements
EXPONENTIAL   Exponential             All plot statements
GAMMA         Gamma                   All plot statements
LOGNORMAL     Lognormal               All plot statements
NORMAL        Normal                  All plot statements
WEIBULL       Weibull (3-parameter)   All plot statements
WEIBULL2      Weibull (2-parameter)   PROBPLOT and QQPLOT

Table 3.19: Kernel Density Estimate Primary Keywords

Keyword   Description
KERNEL    Displays statistics for all kernel estimates
KERNELn   Displays statistics for only the nth kernel density estimate (n = 1, 2, 3, 4, or 5)

Table 3.20 through Table 3.28 list the secondary keywords available with primary keywords in Table 3.18 and Table 3.19.

Table 3.20: Secondary Keywords Available with the BETA Keyword

Secondary Keyword   Alias       Description
ALPHA               SHAPE1      First shape parameter α
BETA                SHAPE2      Second shape parameter β
SIGMA               SCALE       Scale parameter σ
THETA               THRESHOLD   Lower threshold parameter θ
MEAN                            Mean of the fitted distribution
STD                             Standard deviation of the fitted distribution

Table 3.21: Secondary Keywords Available with the EXP Keyword

Secondary Keyword   Alias       Description
SIGMA               SCALE       Scale parameter σ
THETA               THRESHOLD   Threshold parameter θ
MEAN                            Mean of the fitted distribution
STD                             Standard deviation of the fitted distribution

Table 3.22: Secondary Keywords Available with the GAMMA Keyword

Secondary Keyword   Alias       Description
ALPHA               SHAPE       Shape parameter α
SIGMA               SCALE       Scale parameter σ
THETA               THRESHOLD   Threshold parameter θ
MEAN                            Mean of the fitted distribution
STD                             Standard deviation of the fitted distribution

Table 3.23: Secondary Keywords Available with the LOGNORMAL Keyword

Secondary Keyword   Alias       Description
SIGMA               SHAPE       Shape parameter σ
THETA               THRESHOLD   Threshold parameter θ
ZETA                SCALE       Scale parameter ζ
MEAN                            Mean of the fitted distribution
STD                             Standard deviation of the fitted distribution

Table 3.24: Secondary Keywords Available with the NORMAL Keyword

Secondary Keyword   Alias   Description
MU                  MEAN    Mean parameter µ
SIGMA               STD     Scale parameter σ

Table 3.25: Secondary Keywords Available with the WEIBULL Keyword

Secondary Keyword   Alias       Description
C                   SHAPE       Shape parameter c
SIGMA               SCALE       Scale parameter σ
THETA               THRESHOLD   Threshold parameter θ
MEAN                            Mean of the fitted distribution
STD                             Standard deviation of the fitted distribution

Table 3.26: Secondary Keywords Available with the WEIBULL2 Keyword

Secondary Keyword   Alias       Description
C                   SHAPE       Shape parameter c
SIGMA               SCALE       Scale parameter σ
THETA               THRESHOLD   Known lower threshold
MEAN                            Mean of the fitted distribution
STD                             Standard deviation of the fitted distribution

Table 3.27: Secondary Keywords Available with the KERNEL Keyword

Secondary Keyword   Description
TYPE                Kernel type: normal, quadratic, or triangular
BANDWIDTH           Bandwidth λ for the density estimate
BWIDTH              Alias for BANDWIDTH
C                   Standardized bandwidth c for the density estimate: c = λ / (Q n^(-1/5)), where n = sample size, λ = bandwidth, and Q = interquartile range
AMISE               Approximate mean integrated square error (MISE) for the kernel density

Table 3.28: Goodness-of-Fit Statistics for Fitted Curves

Secondary Keyword   Description
AD                  Anderson-Darling EDF test statistic
ADPVAL              Anderson-Darling EDF test p-value
CVM                 Cramér-von Mises EDF test statistic
CVMPVAL             Cramér-von Mises EDF test p-value
KSD                 Kolmogorov-Smirnov EDF test statistic
KSDPVAL             Kolmogorov-Smirnov EDF test p-value

The inset statistics listed in Table 3.18 through Table 3.28 are not available unless you request a plot statement and options that calculate these statistics. For example,

  proc univariate data=score;
     histogram final / normal;
     inset mean std normal(ad adpval);
  run;

The MEAN and STD keywords display the sample mean and standard deviation of FINAL. The NORMAL keyword with the secondary keywords AD and ADPVAL displays the Anderson-Darling goodness-of-fit test statistic and p-value. The statistics that are specified with the NORMAL keyword are available only because the NORMAL option is requested in the HISTOGRAM statement.

The KERNEL or KERNELn keyword is available only if you request a kernel density estimate in a HISTOGRAM statement. The WEIBULL2 keyword is available only if you request a two-parameter Weibull distribution in the PROBPLOT or QQPLOT statement.
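
As a hedged sketch of this dependency, the following statements request two kernel density estimates and then display the standardized bandwidth and AMISE for each curve separately. The SASHELP.CLASS data set and the C= kernel-option values are assumptions made for illustration.

  /* Assumptions: SASHELP.CLASS as input; C= values are arbitrary */
  proc univariate data=sashelp.class noprint;
     /* the KERNEL option makes the kernel inset keywords available */
     histogram Height / kernel(c=0.5 1.0);
     /* one set of secondary keywords per curve number */
     inset kernel1(c amise) kernel2(c amise) / position=ne;
  run;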

Summary of Options

The following table lists INSET statement options, which are specified after the slash (/) in the INSET statement. For complete descriptions, see the section Dictionary of Options on page 235.

Table 3.29: INSET Options

CFILL= color | BLANK         Specifies color of inset background
CFILLH= color                Specifies color of header background
CFRAME= color                Specifies color of frame
CHEADER= color               Specifies color of header text
CSHADOW= color               Specifies color of drop shadow
CTEXT= color                 Specifies color of inset text
DATA                         Specifies data units for POSITION=(x,y) coordinates
DATA= SAS-data-set           Specifies data set for statistics in the inset table
FONT= font                   Specifies font of text
FORMAT= format               Specifies format of values in inset
HEADER= 'quoted string'      Specifies header text
HEIGHT= value                Specifies height of inset text
NOFRAME                      Suppresses frame around inset
POSITION= position           Specifies position of inset
REFPOINT=BR | BL | TR | TL   Specifies reference point of inset positioned with POSITION=(x,y) coordinates

Dictionary of Options

The following entries provide detailed descriptions of options for the INSET statement.

To specify the same format for all the statistics in the INSET statement, use the FORMAT= option.

To create a completely customized inset, use a DATA= data set. The data set contains the label and the value that you want to display in the inset.

If you specify multiple kernel density estimates, you can request inset statistics for all the estimates with the KERNEL keyword. Alternatively, you can display inset statistics for individual curves with the KERNELn keyword, where n is the curve number between 1 and 5.

CFILL= color | BLANK

  • specifies the color of the background. If you omit the CFILLH= option, the header background is included. By default, the background is empty, which causes items that overlap the inset (such as curves or histogram bars) to show through the inset.

  • If you specify a value for the CFILL= option, then overlapping items no longer show through the inset. Use CFILL=BLANK to leave the background uncolored and to prevent items from showing through the inset.

CFILLH= color

  • specifies the color of the header background. The default value is the CFILL= color.

CFRAME= color

  • specifies the color of the frame. The default value is the same color as the axis of the plot.

CHEADER= color

  • specifies the color of the header text. The default value is the CTEXT= color.

CSHADOW= color

  • specifies the color of the drop shadow. By default, if a CSHADOW= option is not specified, a drop shadow is not displayed.

CTEXT= color

  • specifies the color of the text. The default value is the same color as the other text on the plot.

DATA

  • specifies that data coordinates are to be used in positioning the inset with the POSITION= option. The DATA option is available only when you specify POSITION=(x,y). You must place DATA immediately after the coordinates (x,y).

DATA= SAS-data-set

  • requests that PROC UNIVARIATE display customized statistics from a SAS data set in the inset table. The data set must contain two variables:

    _LABEL_

    a character variable whose values provide labels for inset entries.

    _VALUE_

    a variable that is either character or numeric and whose values provide values for inset entries.

  • The label and value from each observation in the data set occupy one line in the inset. The position of the DATA= keyword in the keyword list determines the position of its lines in the inset.
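
  • A minimal sketch of a customized inset follows. The data set name InsetInfo, its labels and values, and the SASHELP.CLASS input data set are all hypothetical; only the variable names _LABEL_ and _VALUE_ come from the description above.

      /* Hypothetical inset data set: one observation per inset line */
      data InsetInfo;
         length _LABEL_ $ 12 _VALUE_ $ 12;
         _LABEL_ = 'Source';  _VALUE_ = 'Class A';  output;
         _LABEL_ = 'Term';    _VALUE_ = 'Fall';     output;
      run;

      /* Assumption: SASHELP.CLASS as input */
      proc univariate data=sashelp.class noprint;
         histogram Height;
         /* DATA= is listed first, so its two lines precede N and MEAN */
         inset data=InsetInfo n mean / position=ne;
      run;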

FONT= font

  • specifies the font of the text. By default, if you locate the inset in the interior of the plot then the font is SIMPLEX. If you locate the inset in the exterior of the plot then the font is the same as the other text on the plot.

FORMAT= format

  • specifies a format for all the values in the inset. If you specify a format for a particular statistic, then this format overrides FORMAT= format. For more information about SAS formats, see SAS Language Reference: Dictionary.

HEADER= string

  • specifies the header text. The string cannot exceed 40 characters. By default, no header line appears in the inset. If all the keywords that you list in the INSET statement are secondary keywords that correspond to a fitted curve on a histogram, PROC UNIVARIATE displays a default header that indicates the distribution and identifies the curve.

HEIGHT= value

  • specifies the height of the text.

NOFRAME

  • suppresses the frame drawn around the text.

POSITION= position

POS= position

  • determines the position of the inset. The position is a compass point keyword, a margin keyword, or a pair of coordinates (x,y). You can specify coordinates in axis percent units or axis data units. The default value is NW, which positions the inset in the upper left (northwest) corner of the display. See the section Positioning the Inset on page 285.

REFPOINT=BR | BL | TR | TL

  • specifies the reference point for an inset that PROC UNIVARIATE positions by a pair of coordinates with the POSITION= option. The REFPOINT= option specifies which corner of the inset frame that you want to position at coordinates (x,y). The keywords are BL, BR, TL, and TR, which correspond to bottom left, bottom right, top left, and top right. The default value is BL. You must use REFPOINT= with POSITION=(x,y) coordinates.
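
  • The following sketch combines POSITION=(x,y), the DATA option, and REFPOINT= so that the top left corner of the inset is anchored at a point given in axis data units. The coordinates (60,25) and the SASHELP.CLASS data set are assumptions chosen for illustration.

      /* Assumptions: SASHELP.CLASS as input; (60,25) is an arbitrary point */
      proc univariate data=sashelp.class noprint;
         histogram Height;
         /* DATA directly follows the coordinates; TL = top left corner */
         inset n mean / position=(60,25) data refpoint=tl;
      run;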

OUTPUT Statement

  • OUTPUT < OUT= SAS-data-set >
    < keyword1=names ... keywordk=names > < percentile-options > ;

The OUTPUT statement saves statistics and BY variables in an output data set. When you use a BY statement, each observation in the OUT= data set corresponds to one of the BY groups. Otherwise, the OUT= data set contains only one observation.

You can use any number of OUTPUT statements in the UNIVARIATE procedure. Each OUTPUT statement creates a new data set containing the statistics specified in that statement. You must use the VAR statement with the OUTPUT statement. The OUTPUT statement must contain a specification of the form keyword=names or the PCTLPTS= and PCTLPRE= specifications. See Example 3.7 and Example 3.8.

OUT= SAS-data-set

  • identifies the output data set. If SAS-data-set does not exist, PROC UNIVARIATE creates it. If you omit OUT=, the data set is named DATAn, where n is the smallest integer that makes the name unique.

keyword = name

  • specifies the statistics to include in the output data set and gives names to the new variables that contain the statistics. Specify a keyword for each desired statistic, an equal sign, and the names of the variables to contain the statistic. In the output data set, the first variable listed after a keyword in the OUTPUT statement contains the statistic for the first variable listed in the VAR statement; the second variable contains the statistic for the second variable in the VAR statement, and so on. If the list of names following the equal sign is shorter than the list of variables in the VAR statement, the procedure uses the names in the order in which the variables are listed in the VAR statement. The available keywords are listed in the following tables:

    Table 3.30: Descriptive Statistic Keywords

    CSS        Corrected sum of squares
    CV         Coefficient of variation
    KURTOSIS   Kurtosis
    MAX        Largest value
    MEAN       Sample mean
    MIN        Smallest value
    MODE       Most frequent value
    N          Sample size
    NMISS      Number of missing values
    NOBS       Number of observations
    RANGE      Range
    SKEWNESS   Skewness
    STD        Standard deviation
    STDMEAN    Standard error of the mean
    SUM        Sum of the observations
    SUMWGT     Sum of the weights
    USS        Uncorrected sum of squares
    VAR        Variance

    Table 3.31: Quantile Statistic Keywords

    P1       1st percentile
    P5       5th percentile
    P10      10th percentile
    Q1       Lower quartile (25th percentile)
    MEDIAN   Median (50th percentile)
    Q3       Upper quartile (75th percentile)
    P90      90th percentile
    P95      95th percentile
    P99      99th percentile
    QRANGE   Interquartile range (Q3 - Q1)

    Table 3.32: Robust Statistics Keywords

    GINI         Gini's mean difference
    MAD          Median absolute difference about the median
    QN           Qn, alternative to MAD
    SN           Sn, alternative to MAD
    STD_GINI     Gini's standard deviation
    STD_MAD      MAD standard deviation
    STD_QN       Qn standard deviation
    STD_QRANGE   Interquartile range standard deviation
    STD_SN       Sn standard deviation

    Table 3.33: Hypothesis Testing Keywords

    MSIGN        Sign statistic
    NORMALTEST   Test statistic for normality
    SIGNRANK     Signed rank statistic
    PROBM        Probability of a greater absolute value for the sign statistic
    PROBN        Probability value for the test of normality
    PROBS        Probability value for the signed rank test
    PROBT        Probability value for the Student's t test
    T            Statistic for the Student's t test

  • To store the same statistic for several analysis variables, specify a list of names. The order of the names corresponds to the order of the analysis variables in the VAR statement. PROC UNIVARIATE uses the first name to create a variable that contains the statistic for the first analysis variable, the next name to create a variable that contains the statistic for the second analysis variable, and so on. If you do not want to output statistics for all the analysis variables, specify fewer names than the number of analysis variables.
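
  • As a sketch of this name matching, the statements below save the mean and standard deviation for two analysis variables but the median for only the first one. The SASHELP.CLASS input data set, the output data set name Stats, and the new variable names are assumptions made for illustration.

      /* Assumptions: SASHELP.CLASS as input; Stats and the new names are hypothetical */
      proc univariate data=sashelp.class noprint;
         var Height Weight;
         output out=Stats
                mean=HeightMean WeightMean     /* one name per VAR variable     */
                std=HeightStd WeightStd
                median=HeightMedian;           /* fewer names: Height only      */
      run;

      proc print data=Stats;
      run;

    Because no BY statement is used, Stats would contain a single observation with the five new variables.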

  • The UNIVARIATE procedure automatically computes the 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th, and 99th percentiles for the data. These can be saved in an output data set using keyword=names specifications. For additional percentiles, you can use the following percentile-options:

PCTLPTS= percentiles

  • specifies one or more percentiles that are not automatically computed by the UNIVARIATE procedure. The PCTLPRE= and PCTLPTS= options must be used together. You can specify percentiles with the expression start TO stop BY increment where start is a starting number, stop is an ending number, and increment is a number to increment by. The PCTLPTS= option generates additional percentiles and outputs them to a data set; these additional percentiles are not printed.

  • To compute the 50th, 95th, 97.5th, and 100th percentiles, submit the statement

      output pctlpre=P_ pctlpts=50,95 to 100 by 2.5;

  • You can use PCTLPTS= to output percentiles that are not in the list of quantile statistics. PROC UNIVARIATE computes the requested percentiles based on the method that you specify with the PCTLDEF= option in the PROC UNIVARIATE statement. You must use PCTLPRE=, and optionally PCTLNAME=, to specify variable names for the percentiles. For example, the following statements create an output data set that is named Pctls that contains the 20th and 40th percentiles of the analysis variables PreTest and PostTest:

      proc univariate data=Score;
         var PreTest PostTest;
         output out=Pctls pctlpts=20 40 pctlpre=PreTest_ PostTest_
                pctlname=P20 P40;
      run;

  • PROC UNIVARIATE saves the 20th and 40th percentiles for PreTest and PostTest in the variables PreTest_P20, PostTest_P20, PreTest_P40, and PostTest_P40.

PCTLPRE= prefixes

  • specifies one or more prefixes to create the variable names for the variables that contain the PCTLPTS= percentiles. To save the same percentiles for more than one analysis variable, specify a list of prefixes. The order of the prefixes corresponds to the order of the analysis variables in the VAR statement. The PCTLPRE= and PCTLPTS= options must be used together.

  • The procedure generates new variable names using the prefix and the percentile values. If the specified percentile is an integer, the variable name is simply the prefix followed by the value. If the specified value is not an integer, an underscore replaces the decimal point in the variable name, and decimal values are truncated to one decimal place. For example, the following statements create the variables PWID20, PWID33_3, PWID66_6, and PWID80 for the 20th, 33.33rd, 66.67th, and 80th percentiles of Width, respectively:

      proc univariate noprint;
         var Width;
         output pctlpts=20 33.33 66.67 80 pctlpre=pwid;
      run;

  • If you request percentiles for more than one variable, you should list prefixes in the same order in which the variables appear in the VAR statement. If combining the prefix and percentile value results in a name longer than 32 characters, the prefix is truncated so that the variable name is 32 characters.

PCTLNAME= suffixes

  • specifies one or more suffixes to create the names for the variables that contain the PCTLPTS= percentiles. PROC UNIVARIATE creates a variable name by combining the PCTLPRE= value and suffix-name. Because the suffix names are associated with the percentiles that are requested, list the suffix names in the same order as the PCTLPTS= percentiles. If you specify n suffixes with the PCTLNAME= option and m percentile values with the PCTLPTS= option, where m > n, the suffixes are used to name the first n percentiles, and the default names are used for the remaining m - n percentiles. For example, consider the following statements:

      proc univariate;
         var Length Width Height;
         output pctlpts  = 20 40
                pctlpre  = pl pw ph
                pctlname = twenty;
      run;

  • The value TWENTY in the PCTLNAME= option is used for only the first percentile in the PCTLPTS= list. This suffix is appended to the values in the PCTLPRE= option to generate the new variable names PLTWENTY, PWTWENTY, and PHTWENTY, which contain the 20th percentiles for Length, Width, and Height, respectively. Since a second PCTLNAME= suffix is not specified, variable names for the 40th percentiles for Length, Width, and Height are generated using the prefixes and percentile values. Thus, the output data set contains the variables PLTWENTY, PL40, PWTWENTY, PW40, PHTWENTY, and PH40.

  • You must specify PCTLPRE= to supply prefix names for the variables that contain the PCTLPTS= percentiles.

  • If the number of PCTLNAME= values is fewer than the number of percentiles, or if you omit PCTLNAME=, PROC UNIVARIATE uses the percentile as the suffix to create the name of the variable that contains the percentile. For an integer percentile, PROC UNIVARIATE uses the percentile. Otherwise, PROC UNIVARIATE truncates decimal values of percentiles to two decimal places and replaces the decimal point with an underscore.

  • If either the prefix and suffix name combination or the prefix and percentile name combination is longer than 32 characters, PROC UNIVARIATE truncates the prefix name so that the variable name is 32 characters.

PROBPLOT Statement

  • PROBPLOT < variables > < /options > ;

The PROBPLOT statement creates a probability plot, which compares ordered variable values with the percentiles of a specified theoretical distribution. If the data distribution matches the theoretical distribution, the points on the plot form a linear pattern. Consequently, you can use a probability plot to determine how well a theoretical distribution models a set of measurements.

Probability plots are similar to Q-Q plots, which you can create with the QQPLOT statement. Probability plots are preferable for graphical estimation of percentiles, whereas Q-Q plots are preferable for graphical estimation of distribution parameters.

You can use any number of PROBPLOT statements in the UNIVARIATE procedure. The components of the PROBPLOT statement are described as follows.

variables

  • are the variables for which to create probability plots. If you specify a VAR statement, the variables must also be listed in the VAR statement. Otherwise, the variables can be any numeric variables in the input data set. If you do not specify a list of variables, then by default the procedure creates a probability plot for each variable listed in the VAR statement, or for each numeric variable in the DATA= data set if you do not specify a VAR statement. For example, each of the following PROBPLOT statements produces two probability plots, one for Length and one for Width:

      proc univariate data=Measures;
         var Length Width;
         probplot;

      proc univariate data=Measures;
         probplot Length Width;
      run;

options

  • specify the theoretical distribution for the plot or add features to the plot. If you specify more than one variable, the options apply equally to each variable. Specify all options after the slash (/) in the PROBPLOT statement. You can specify only one option naming a distribution in each PROBPLOT statement, but you can specify any number of other options. The distributions available are the beta, exponential, gamma, lognormal, normal, two-parameter Weibull, and three-parameter Weibull. By default, the procedure produces a plot for the normal distribution.

  • In the following example, the NORMAL option requests a normal probability plot for each variable, while the MU= and SIGMA= normal-options request a distribution reference line corresponding to the normal distribution with µ = 10 and σ = 0.3. The SQUARE option displays the plot in a square frame, and the CTEXT= option specifies the text color.

      proc univariate data=Measures;
         probplot Length1 Length2 / normal(mu=10 sigma=0.3)
                                    square ctext=blue;
      run;

  • Table 3.34 through Table 3.43 list the PROBPLOT options by function. For complete descriptions, see the section Dictionary of Options on page 245. Options can be any of the following:

    • primary options

    • secondary options

    • general options

Table 3.34: Primary Options for Theoretical Distributions

BETA(beta-options)                 Specifies beta probability plot for shape parameters α and β specified with mandatory ALPHA= and BETA= beta-options
EXPONENTIAL(exponential-options)   Specifies exponential probability plot
GAMMA(gamma-options)               Specifies gamma probability plot for shape parameter α specified with mandatory ALPHA= gamma-option
LOGNORMAL(lognormal-options)       Specifies lognormal probability plot for shape parameter σ specified with mandatory SIGMA= lognormal-option
NORMAL(normal-options)             Specifies normal probability plot
WEIBULL(Weibull-options)           Specifies three-parameter Weibull probability plot for shape parameter c specified with mandatory C= Weibull-option
WEIBULL2(Weibull2-options)         Specifies two-parameter Weibull probability plot

Table 3.35: Secondary Reference Line Options Used with All Distributions

COLOR= color   Specifies color of distribution reference line
L= linetype    Specifies line type of distribution reference line
W= n           Specifies width of distribution reference line

Table 3.36: Secondary Beta-Options

ALPHA= value-list | EST   Specifies mandatory shape parameter α
BETA= value-list | EST    Specifies mandatory shape parameter β
SIGMA= value | EST        Specifies σ for distribution reference line
THETA= value | EST        Specifies θ for distribution reference line

Table 3.37: Secondary Exponential-Options

SIGMA= value | EST   Specifies σ for distribution reference line
THETA= value | EST   Specifies θ for distribution reference line

Table 3.38: Secondary Gamma-Options

ALPHA= value-list | EST   Specifies mandatory shape parameter α
SIGMA= value | EST        Specifies σ for distribution reference line
THETA= value | EST        Specifies θ for distribution reference line

Table 3.39: Secondary Lognormal-Options

SIGMA= value         Specifies mandatory shape parameter σ
SLOPE= value | EST   Specifies slope of distribution reference line
THETA= value | EST   Specifies θ for distribution reference line
ZETA= value          Specifies ζ for distribution reference line (slope is exp(ζ))

Table 3.40: Secondary Normal-Options

MU= value | EST      Specifies µ for distribution reference line
SIGMA= value | EST   Specifies σ for distribution reference line

Table 3.41: Secondary Weibull-Options

C= value-list | EST   Specifies mandatory shape parameter c
SIGMA= value | EST    Specifies σ for distribution reference line
THETA= value | EST    Specifies θ for distribution reference line

Table 3.42: Secondary Weibull2-Options

C= value | EST       Specifies c for distribution reference line (slope is 1/c)
SIGMA= value | EST   Specifies σ for distribution reference line (intercept is log(σ))
SLOPE= value | EST   Specifies slope of distribution reference line
THETA= value         Specifies known lower threshold

Table 3.43: General Graphics Options

Option

Description

ANNOKEY

Applies annotation requested in ANNOTATE= data set to key cell only

ANNOTATE=

Specifies annotate data set

CAXIS=

Specifies color for axis

CFRAME=

Specifies color for frame

CFRAMESIDE=

Specifies color for filling frame for row labels

CFRAMETOP=

Specifies color for filling frame for column labels

CGRID=

Specifies color for grid lines

CHREF=

Specifies color for HREF= lines

CTEXT=

Specifies color for text

CVREF=

Specifies color for VREF= lines

DESCRIPTION=

Specifies description for plot in graphics catalog

FONT=

Specifies software font for text

GRID

Creates a grid

HEIGHT=

Specifies height of text used outside framed areas

HMINOR=

Specifies number of horizontal minor tick marks

HREF=

Specifies reference lines perpendicular to the horizontal axis

HREFLABELS=

Specifies labels for HREF= lines

INFONT=

Specifies software font for text inside framed areas

INHEIGHT=

Specifies height of text inside framed areas

INTERTILE=

Specifies distance between tiles

LGRID=

Specifies a line type for grid lines

LHREF=

Specifies line style for HREF= lines

LVREF=

Specifies line style for VREF= lines

NADJ=

Adjusts sample size when computing percentiles

NAME=

Specifies name for plot in graphics catalog

NCOLS=

Specifies number of columns in comparative probability plot

NOFRAME

Suppresses frame around plotting area

NOHLABEL

Suppresses label for horizontal axis

NOVLABEL

Suppresses label for vertical axis

NOVTICK

Suppresses tick marks and tick mark labels for vertical axis

NROWS=

Specifies number of rows in comparative probability plot

PCTLMINOR

Requests minor tick marks for percentile axis

PCTLORDER=

Specifies tick mark labels for percentile axis

RANKADJ=

Adjusts ranks when computing percentiles

SQUARE

Displays plot in square format

VAXISLABEL=

Specifies label for vertical axis

VMINOR=

Specifies number of vertical minor tick marks

VREF=

Specifies reference lines perpendicular to the vertical axis

VREFLABELS=

Specifies labels for VREF= lines

VREFLABPOS=

Specifies horizontal position of labels for VREF= lines

WAXIS=

Specifies line thickness for axes and frame

Distribution Options

Table 3.34 lists options for requesting a theoretical distribution.

Table 3.35 through Table 3.42 list secondary options that specify distribution parameters and control the display of a distribution reference line. Specify these options in parentheses after the distribution keyword. For example, you can request a normal probability plot with a distribution reference line by specifying the NORMAL option as follows:

  proc univariate;
     probplot Length / normal(mu=10 sigma=0.3 color=red);
  run;

The MU= and SIGMA= normal-options display a distribution reference line that corresponds to the normal distribution with mean µ = 10 and standard deviation σ = 0.3, and the COLOR= normal-option specifies the color for the line.
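As a variation on the example above, the following sketch requests the reference line from estimates instead of fixed values. The data set Measures and the variable Length are simulated here purely for illustration (the seed, sample size, and RAND parameters are assumptions, not part of the original example). MU=EST, SIGMA=EST, COLOR=, L=, and W= are the secondary options listed in Table 3.35 and Table 3.40.

```sas
/* Hypothetical data: 50 simulated lengths (illustrative values only) */
data Measures;
   call streaminit(2024);
   do i = 1 to 50;
      Length = rand('NORMAL', 10, 0.3);
      output;
   end;
   drop i;
run;

proc univariate data=Measures noprint;
   /* Reference line based on the sample mean and sample standard deviation */
   probplot Length / normal(mu=est sigma=est color=blue l=2 w=2);
run;
```

If the points fall close to the dashed line, the normal distribution with the estimated parameters is a plausible fit for these data.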

General Graphics Options

Table 3.43 summarizes general options for enhancing probability plots.

Dictionary of Options

The following entries provide detailed descriptions of options in the PROBPLOT statement.

ALPHA= value | EST

  • specifies the mandatory shape parameter α for probability plots requested with the BETA and GAMMA options. Enclose the ALPHA= option in parentheses after the BETA or GAMMA options. If you specify ALPHA=EST, a maximum likelihood estimate is computed for α.

ANNOKEY

  • applies the annotation requested with the ANNOTATE= option to the key cell only. By default, the procedure applies annotation to all of the cells. This option is not available unless you use the CLASS statement. Specify the KEYLEVEL= option in the CLASS statement to specify the key cell.

ANNOTATE= SAS-data-set

ANNO= SAS-data-set

  • specifies an input data set containing annotate variables as described in SAS/GRAPH Software: Reference. The ANNOTATE= data set you specify in the PROBPLOT statement is used for all plots created by the statement. You can also specify an ANNOTATE= data set in the PROC UNIVARIATE statement to enhance all plots created by the procedure.

BETA(ALPHA= value | EST BETA= value | EST < beta-options > )

  • creates a beta probability plot for each combination of the required shape parameters α and β specified by the required ALPHA= and BETA= beta-options. If you specify ALPHA=EST and BETA=EST, the procedure creates a plot based on maximum likelihood estimates for α and β. You can specify the SCALE= beta-option as an alias for the SIGMA= beta-option and the THRESHOLD= beta-option as an alias for the THETA= beta-option.

  • To obtain graphical estimates of α and β, specify lists of values in the ALPHA= and BETA= beta-options, and select the combination of α and β that most nearly linearizes the point pattern. To assess the point pattern, you can add a diagonal distribution reference line corresponding to lower threshold parameter θ and scale parameter σ with the THETA= and SIGMA= beta-options. Alternatively, you can add a line that corresponds to estimated values of θ and σ with the beta-options THETA=EST and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the beta distribution with parameters α, β, θ, and σ is a good fit.

BETA= value | EST

B= value | EST

  • specifies the mandatory shape parameter β for probability plots requested with the BETA option. Enclose the BETA= option in parentheses after the BETA option. If you specify BETA=EST, a maximum likelihood estimate is computed for β.

C= value | EST

  • specifies the shape parameter c for probability plots requested with the WEIBULL and WEIBULL2 options. Enclose this option in parentheses after the WEIBULL or WEIBULL2 option. C= is a required Weibull-option in the WEIBULL option; in this situation, it accepts a list of values, or if you specify C=EST, a maximum likelihood estimate is computed for c. You can optionally specify C=value or C=EST as a Weibull2-option with the WEIBULL2 option to request a distribution reference line; in this situation, you must also specify the Weibull2-option SIGMA=value or SIGMA=EST.

CAXIS= color

CAXES= color

  • specifies the color for the axes. This option overrides any COLOR= specifications in an AXIS statement. The default value is the first color in the device color list.

CFRAME= color

  • specifies the color for the area that is enclosed by the axes and frame. The area is not filled by default.

CFRAMESIDE= color

  • specifies the color to fill the frame area for the row labels that display along the left side of a comparative probability plot. This color also fills the frame area for the label of the corresponding class variable (if you associate a label with the variable). By default, these areas are not filled. This option is not available unless you use the CLASS statement.

CFRAMETOP= color

  • specifies the color to fill the frame area for the column labels that display across the top of a comparative probability plot. This color also fills the frame area for the label of the corresponding class variable (if you associate a label with the variable). By default, these areas are not filled. This option does not apply unless you use the CLASS statement.

CGRID= color

  • specifies the color for grid lines when a grid displays on the plot. The default color is the first color in the device color list. This option also produces a grid.

CHREF= color

CH= color

  • specifies the color for horizontal axis reference lines requested by the HREF= option. The default color is the first color in the device color list.

COLOR= color

  • specifies the color of the diagonal distribution reference line. The default color is the first color in the device color list. Enclose the COLOR= option in parentheses after a distribution option keyword.

CTEXT= color

  • specifies the color for tick mark values and axis labels. The default color is the color that you specify for the CTEXT= option in the GOPTIONS statement. If you omit the GOPTIONS statement, the default is the first color in the device color list.

CVREF= color

CV= color

  • specifies the color for the reference lines requested by the VREF= option. The default color is the first color in the device color list.

DESCRIPTION= ' string '

DES= ' string '

  • specifies a description, up to 40 characters long, that appears in the PROC GREPLAY master menu. The default string is the variable name.

EXPONENTIAL < ( exponential-options ) >

EXP < ( exponential-options ) >

  • creates an exponential probability plot. To assess the point pattern, add a diagonal distribution reference line corresponding to θ and σ with the THETA= and SIGMA= exponential-options. Alternatively, you can add a line corresponding to estimated values of the threshold parameter θ and the scale parameter σ with the exponential-options THETA=EST and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the exponential distribution with parameters θ and σ is a good fit. You can specify the SCALE= exponential-option as an alias for the SIGMA= exponential-option and the THRESHOLD= exponential-option as an alias for the THETA= exponential-option.

FONT= font

  • specifies a software font for the reference lines and axis labels. You can also specify fonts for axis labels in an AXIS statement. The FONT= font takes precedence over the FTEXT= font specified in the GOPTIONS statement. Hardware characters are used by default.

GAMMA(ALPHA= value | EST < gamma-options > )

  • creates a gamma probability plot for each value of the shape parameter α given by the mandatory ALPHA= gamma-option. If you specify ALPHA=EST, the procedure creates a plot based on a maximum likelihood estimate for α. To obtain a graphical estimate of α, specify a list of values for the ALPHA= gamma-option, and select the value that most nearly linearizes the point pattern. To assess the point pattern, add a diagonal distribution reference line corresponding to θ and σ with the THETA= and SIGMA= gamma-options. Alternatively, you can add a line corresponding to estimated values of the threshold parameter θ and the scale parameter σ with the gamma-options THETA=EST and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the gamma distribution with parameters α, θ, and σ is a good fit. You can specify the SCALE= gamma-option as an alias for the SIGMA= gamma-option and the THRESHOLD= gamma-option as an alias for the THETA= gamma-option.

GRID

  • displays a grid. Grid lines are reference lines that are perpendicular to the percentile axis at major tick marks.

HEIGHT= value

  • specifies the height, in percentage screen units, of text for axis labels, tick mark labels, and legends. This option takes precedence over the HTEXT= option in the GOPTIONS statement.

HMINOR= n

HM= n

  • specifies the number of minor tick marks between each major tick mark on the horizontal axis. Minor tick marks are not labeled. By default, HMINOR=0.

HREF= values

  • draws reference lines that are perpendicular to the horizontal axis at the values you specify.

HREFLABELS= 'label1' ... 'labeln'

HREFLABEL= 'label1' ... 'labeln'

HREFLAB= 'label1' ... 'labeln'

  • specifies labels for the reference lines requested by the HREF= option. The number of labels must equal the number of reference lines. Labels can have up to 16 characters.

HREFLABPOS= n

  • specifies the vertical position of HREFLABELS= labels. If you specify HREFLABPOS=1, the labels are positioned along the top of the plot. If you specify HREFLABPOS=2, the labels are staggered from top to bottom of the plot. If you specify HREFLABPOS=3, the labels are positioned along the bottom of the plot. By default, HREFLABPOS=1.
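The HREF= family of options can be combined in one PROBPLOT statement. The sketch below is illustrative only: the data set Measures and variable Width are simulated (seed and parameters are assumptions), and it assumes that HREF= values on a probability plot are read in percentile units because the horizontal axis is the percentile axis. These options apply to traditional graphics output.

```sas
/* Hypothetical data: 50 simulated widths (illustrative values only) */
data Measures;
   call streaminit(95);
   do i = 1 to 50;
      Width = rand('NORMAL', 10, 0.3);
      output;
   end;
   drop i;
run;

proc univariate data=Measures noprint;
   probplot Width / normal
                    href=95                        /* one reference line            */
                    hreflabels='95th percentile'   /* one label, under 16 characters */
                    hreflabpos=3                   /* label along the bottom        */
                    chref=red
                    lhref=2;                       /* dashed line                   */
run;
```

One label is supplied for the single reference line, which satisfies the rule that the number of labels must equal the number of lines.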

INFONT= font

  • specifies a software font to use for text inside the framed areas of the plot. The INFONT= option takes precedence over the FTEXT= option in the GOPTIONS statement. For a list of fonts, see SAS/GRAPH Reference .

INHEIGHT= value

  • specifies the height, in percentage screen units, of text used inside the framed areas of the plot. By default, the height specified by the HEIGHT= option is used. If you do not specify the HEIGHT= option, the height specified with the HTEXT= option in the GOPTIONS statement is used.

INTERTILE= value

  • specifies the distance, in horizontal percentage screen units, between the framed areas, which are called tiles . By default, the tiles are contiguous. This option is not available unless you use the CLASS statement.

L= linetype

  • specifies the line type for a diagonal distribution reference line. Enclose the L= option in parentheses after a distribution option. By default, L=1, which produces a solid line.

LGRID= linetype

  • specifies the line type for the grid requested by the GRID option. By default, LGRID=1, which produces a solid line.

LHREF= linetype

LH= linetype

  • specifies the line type for the reference lines that you request with the HREF= option. By default, LHREF=2, which produces a dashed line.

LOGNORMAL(SIGMA= value | EST < lognormal-options > )

LNORM(SIGMA= value | EST < lognormal-options > )

  • creates a lognormal probability plot for each value of the shape parameter σ given by the mandatory SIGMA= lognormal-option. If you specify SIGMA=EST, the procedure creates a plot based on a maximum likelihood estimate for σ. To obtain a graphical estimate of σ, specify a list of values for the SIGMA= lognormal-option, and select the value that most nearly linearizes the point pattern. To assess the point pattern, add a diagonal distribution reference line corresponding to θ and ζ with the THETA= and ZETA= lognormal-options. Alternatively, you can add a line corresponding to estimated values of the threshold parameter θ and the scale parameter ζ with the lognormal-options THETA=EST and ZETA=EST. Agreement between the reference line and the point pattern indicates that the lognormal distribution with parameters σ, θ, and ζ is a good fit. You can specify the THRESHOLD= lognormal-option as an alias for the THETA= lognormal-option and the SCALE= lognormal-option as an alias for the ZETA= lognormal-option. See Example 3.26.
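The two-step workflow described in this entry (try several shape values, then add a reference line) might look like the following sketch. The data set Plates, the variable Diameter, and the simulation parameters are hypothetical and chosen only so that the code is self-contained; they are not taken from Example 3.26.

```sas
/* Hypothetical data: threshold 5, scale zeta = 0, shape sigma = 0.5 */
data Plates;
   call streaminit(26);
   do i = 1 to 50;
      Diameter = 5 + exp(0.5 * rand('NORMAL', 0, 1));
      output;
   end;
   drop i;
run;

proc univariate data=Plates noprint;
   /* Step 1: one plot per listed shape value; choose the most nearly linear */
   probplot Diameter / lognormal(sigma=0.2 0.5 0.8);

   /* Step 2: for the chosen shape, add a line from estimated theta and zeta */
   probplot Diameter / lognormal(sigma=0.5 theta=est zeta=est);
run;
```

The first statement produces three plots, one for each SIGMA= value. The second adds the reference line only after a shape value has been selected by eye.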

LVREF= linetype

  • specifies the line type for the reference lines requested with the VREF= option. By default, LVREF=2, which produces a dashed line.

MU= value | EST

  • specifies the mean µ for a normal probability plot requested with the NORMAL option. Enclose the MU= normal-option in parentheses after the NORMAL option. The MU= normal-option must be specified with the SIGMA= normal-option , and they request a distribution reference line. You can specify MU=EST to request a distribution reference line with µ equal to the sample mean.

NADJ= value

  • specifies the adjustment value added to the sample size in the calculation of theoretical percentiles. By default, NADJ=1/4. Refer to Chambers et al. (1983).

NAME= ' string '

  • specifies a name for the plot, up to eight characters long, that appears in the PROC GREPLAY master menu. The default value is UNIVAR .

NCOLS= n

NCOL= n

  • specifies the number of columns in a comparative probability plot. By default, NCOLS=1 if you specify only one class variable, and NCOLS=2 if you specify two class variables. This option is not available unless you use the CLASS statement. If you specify two class variables, you can use the NCOLS= option with the NROWS= option.

NOFRAME

  • suppresses the frame around the subplot area.

NOHLABEL

  • suppresses the label for the horizontal axis. You can use this option to reduce clutter.

NORMAL < ( normal-options ) >

  • creates a normal probability plot. This is the default if you omit a distribution option. To assess the point pattern, you can add a diagonal distribution reference line corresponding to µ and σ with the MU= and SIGMA= normal-options. Alternatively, you can add a line corresponding to estimated values of µ and σ with the normal-options MU=EST and SIGMA=EST; the estimates of the mean µ and the standard deviation σ are the sample mean and sample standard deviation. Agreement between the reference line and the point pattern indicates that the normal distribution with parameters µ and σ is a good fit.

NOVLABEL

  • suppresses the label for the vertical axis. You can use this option to reduce clutter.

NOVTICK

  • suppresses the tick marks and tick mark labels for the vertical axis. This option also suppresses the label for the vertical axis.

NROWS= n

NROW= n

  • specifies the number of rows in a comparative probability plot. By default, NROWS=2. This option is not available unless you use the CLASS statement. If you specify two class variables, you can use the NCOLS= option with the NROWS= option.
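A comparative probability plot with one class variable could be laid out as in the sketch below. The data set Parts, the class variable Supplier, and the simulated values are assumptions made for illustration. NROWS=, NCOLS=, and INTERTILE= are the options described in this dictionary.

```sas
/* Hypothetical data: two suppliers, 40 simulated widths each */
data Parts;
   call streaminit(11);
   length Supplier $ 1;
   do Supplier = 'A', 'B';
      do i = 1 to 40;
         Width = rand('NORMAL', 10, 0.3);
         output;
      end;
   end;
   drop i;
run;

proc univariate data=Parts noprint;
   class Supplier;
   /* Two stacked tiles, one per supplier, separated by a small gap */
   probplot Width / normal(mu=est sigma=est)
                    nrows=2 ncols=1 intertile=1;
run;
```

Because only one class variable is used, NROWS=2 and NCOLS=1 match the documented defaults; they are written out here only to make the layout explicit.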

PCTLMINOR

  • requests minor tick marks for the percentile axis. The HMINOR= option overrides the minor tick marks requested by the PCTLMINOR option.

PCTLORDER= values

  • specifies the tick marks that are labeled on the theoretical percentile axis. Since the values are percentiles, the labels must be between 0 and 100, exclusive. The values must be listed in increasing order and must cover the plotted percentile range. Otherwise, the default values of 1, 5, 10, 25, 50, 75, 90, 95, and 99 are used.

RANKADJ= value

  • specifies the adjustment value added to the ranks in the calculation of theoretical percentiles. By default, RANKADJ=-3/8, as recommended by Blom (1958). Refer to Chambers et al. (1983) for additional information.
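To see what the NADJ= and RANKADJ= adjustments do, the following DATA step sketch computes plotting positions by hand. It assumes, based on the wording "added to the ranks" and "added to the sample size", that the i-th ordered value is plotted at probability (i + RANKADJ) / (n + NADJ). The data set and variable names are hypothetical, and the second column uses the alternative choice RANKADJ=-1/2 with NADJ=0 purely for comparison.

```sas
/* Plotting positions for n = 10 under two (RANKADJ, NADJ) choices.
   Assumed formula: p = (i + RANKADJ) / (n + NADJ). */
data PlotPos;
   n = 10;
   do i = 1 to n;
      p_blom = (i - 3/8) / (n + 1/4);         /* RANKADJ=-3/8, NADJ=1/4 (Blom)  */
      p_alt  = (i - 1/2) / n;                 /* RANKADJ=-1/2, NADJ=0           */
      z_blom = quantile('NORMAL', p_blom);    /* matching standard normal quantiles */
      z_alt  = quantile('NORMAL', p_alt);
      output;
   end;
run;

proc print data=PlotPos noobs;
   var i p_blom p_alt z_blom z_alt;
   format p_blom p_alt 6.4 z_blom z_alt 7.3;
run;
```

The two choices differ most at the smallest and largest ranks, which is where the adjustment has the greatest effect on the appearance of the plot.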

SCALE= value | EST

  • is an alias for the SIGMA= option for plots requested by the BETA, EXPONENTIAL, GAMMA, and WEIBULL options and for the ZETA= option when you request the LOGNORMAL option. See the entries for the SIGMA= and ZETA= options.

SHAPE= value | EST

  • is an alias for the ALPHA= option with the GAMMA option, for the SIGMA= option with the LOGNORMAL option, and for the C= option with the WEIBULL and WEIBULL2 options. See the entries for the ALPHA=, SIGMA=, and C= options.

SIGMA= value | EST

  • specifies the parameter σ, where σ > 0. Alternatively, you can specify SIGMA=EST to request a maximum likelihood estimate for σ. The interpretation and use of the SIGMA= option depend on the distribution option with which it is used. See Table 3.44 for a summary of how to use the SIGMA= option. You must enclose this option in parentheses after the distribution option.

Table 3.44: Uses of the SIGMA= Option

Distribution Option

Use of the SIGMA= Option

BETA, EXPONENTIAL, GAMMA, WEIBULL

THETA=θ and SIGMA=σ request a distribution reference line corresponding to θ and σ.

LOGNORMAL

SIGMA=σ1 ... σn requests n probability plots with shape parameters σ1 ... σn. The SIGMA= option must be specified.

NORMAL

MU=µ and SIGMA=σ request a distribution reference line corresponding to µ and σ. SIGMA=EST requests a line with σ equal to the sample standard deviation.

WEIBULL2

SIGMA=σ and C=c request a distribution reference line corresponding to σ and c.

SLOPE= value | EST

  • specifies the slope for a distribution reference line requested with the LOGNORMAL and WEIBULL2 options. Enclose the SLOPE= option in parentheses after the distribution option. When you use the SLOPE= lognormal-option with the LOGNORMAL option, you must also specify a threshold parameter value θ with the THETA= lognormal-option to request the line. The SLOPE= lognormal-option is an alternative to the ZETA= lognormal-option for specifying ζ, since the slope is equal to exp(ζ).

    When you use the SLOPE= Weibull2-option with the WEIBULL2 option, you must also specify a scale parameter value σ with the SIGMA= Weibull2-option to request the line. The SLOPE= Weibull2-option is an alternative to the C= Weibull2-option for specifying c, since the slope is equal to 1/c.

    For example, the first and second PROBPLOT statements produce the same probability plots and the third and fourth PROBPLOT statements produce the same probability plots:

      proc univariate data=Measures;
         probplot Width / lognormal(sigma=2 theta=0 zeta=0);
         probplot Width / lognormal(sigma=2 theta=0 slope=1);
         probplot Width / weibull2(sigma=2 theta=0 c=.25);
         probplot Width / weibull2(sigma=2 theta=0 slope=4);
      run;

SQUARE

  • displays the probability plot in a square frame. By default, the plot is in a rectangular frame.

THETA= value | EST

  • specifies the lower threshold parameter θ for plots requested with the BETA, EXPONENTIAL, GAMMA, LOGNORMAL, WEIBULL, and WEIBULL2 options. Enclose the THETA= option in parentheses after a distribution option. When used with the WEIBULL2 option, the THETA= option specifies the known lower threshold θ, for which the default is 0. When used with the other distribution options, the THETA= option specifies θ for a distribution reference line; alternatively in this situation, you can specify THETA=EST to request a maximum likelihood estimate for θ. To request the line, you must also specify a scale parameter.

THRESHOLD= value | EST

  • is an alias for the THETA= option.

VAXISLABEL= ' label '

  • specifies a label for the vertical axis. Labels can have up to 40 characters.

VMINOR= n

VM= n

  • specifies the number of minor tick marks between each major tick mark on the vertical axis. Minor tick marks are not labeled. The default is zero.

VREF= values

  • draws reference lines perpendicular to the vertical axis at the values specified. Also see the CVREF=, LVREF=, and VREFCHAR= options.

VREFLABELS= 'label1' ... 'labeln'

VREFLABEL= 'label1' ... 'labeln'

VREFLAB= 'label1' ... 'labeln'

  • specifies labels for the reference lines requested by the VREF= option. The number of labels must equal the number of reference lines. Enclose each label in quotes. Labels can have up to 16 characters.

VREFLABPOS= n

  • specifies the horizontal position of VREFLABELS= labels. If you specify VREFLABPOS=1, the labels are positioned at the left of the plot. If you specify VREFLABPOS=2, the labels are positioned at the right of the plot. By default, VREFLABPOS=1.

W= n

  • specifies the width, in pixels, for a diagonal distribution line. Enclose the W= option in parentheses after the distribution option. By default, W=1.

WAXIS= n

  • specifies the line thickness, in pixels, for the axes and frame. By default, WAXIS=1.

WEIBULL(C= value | EST < Weibull-options > )

WEIB(C= value | EST < Weibull-options > )

  • creates a three-parameter Weibull probability plot for each value of the required shape parameter c specified by the mandatory C= Weibull-option. To create a plot that is based on a maximum likelihood estimate for c, specify C=EST. To obtain a graphical estimate of c, specify a list of values in the C= Weibull-option, and select the value that most nearly linearizes the point pattern. To assess the point pattern, add a diagonal distribution reference line corresponding to θ and σ with the THETA= and SIGMA= Weibull-options. Alternatively, you can add a line corresponding to estimated values of θ and σ with the Weibull-options THETA=EST and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the Weibull distribution with parameters c, θ, and σ is a good fit. You can specify the SCALE= Weibull-option as an alias for the SIGMA= Weibull-option and the THRESHOLD= Weibull-option as an alias for the THETA= Weibull-option.

WEIBULL2 < ( Weibull2-options ) >

W2 < ( Weibull2-options ) >

  • creates a two-parameter Weibull probability plot. You should use the WEIBULL2 option when your data have a known lower threshold θ, which is 0 by default. To specify the threshold value θ, use the THETA= Weibull2-option. By default, THETA=0. An advantage of the two-parameter Weibull plot over the three-parameter Weibull plot is that the parameters c and σ can be estimated from the slope and intercept of the point pattern. A disadvantage is that the two-parameter Weibull distribution applies only in situations where the threshold parameter is known. To obtain a graphical estimate of θ, specify a list of values for the THETA= Weibull2-option, and select the value that most nearly linearizes the point pattern. To assess the point pattern, add a diagonal distribution reference line corresponding to σ and c with the SIGMA= and C= Weibull2-options. Alternatively, you can add a distribution reference line corresponding to estimated values of σ and c with the Weibull2-options SIGMA=EST and C=EST. Agreement between the reference line and the point pattern indicates that the Weibull distribution with parameters c, θ, and σ is a good fit. You can specify the SCALE= Weibull2-option as an alias for the SIGMA= Weibull2-option and the SHAPE= Weibull2-option as an alias for the C= Weibull2-option.
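A minimal sketch of a two-parameter Weibull plot with an estimated reference line follows. The data set Failures and the variable Time are hypothetical, simulated with shape 2, scale 5, and threshold 0 only so that the example is self-contained.

```sas
/* Hypothetical data: 60 simulated failure times, known threshold 0 */
data Failures;
   call streaminit(7);
   do i = 1 to 60;
      Time = rand('WEIBULL', 2, 5);   /* shape c = 2, scale sigma = 5 */
      output;
   end;
   drop i;
run;

proc univariate data=Failures noprint;
   /* THETA=0 is the default and is written out only for clarity */
   probplot Time / weibull2(theta=0 sigma=est c=est);
run;
```

Because the reference line has slope 1/c and intercept log(σ), a point pattern that follows the line supports both the estimated shape and the estimated scale.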

ZETA= value | EST

  • specifies a value for the scale parameter ζ for the lognormal probability plots requested with the LOGNORMAL option. Enclose the ZETA= lognormal-option in parentheses after the LOGNORMAL option. To request a distribution reference line with intercept θ and slope exp(ζ), specify both the THETA= and ZETA= lognormal-options.

QQPLOT Statement

  • QQPLOT < variables > < /options > ;

The QQPLOT statement creates quantile-quantile plots (Q-Q plots) using high-resolution graphics and compares ordered variable values with quantiles of a specified theoretical distribution. If the data distribution matches the theoretical distribution, the points on the plot form a linear pattern. Thus, you can use a Q-Q plot to determine how well a theoretical distribution models a set of measurements.

Q-Q plots are similar to probability plots, which you can create with the PROBPLOT statement. Q-Q plots are preferable for graphical estimation of distribution parameters, whereas probability plots are preferable for graphical estimation of percentiles.

You can use any number of QQPLOT statements in the UNIVARIATE procedure. The components of the QQPLOT statement are described as follows.

variables

  • are the variables for which to create Q-Q plots. If you specify a VAR statement, the variables must also be listed in the VAR statement. Otherwise, the variables can be any numeric variables in the input data set. If you do not specify a list of variables , then by default the procedure creates a Q-Q plot for each variable listed in the VAR statement, or for each numeric variable in the DATA= data set if you do not specify a VAR statement. For example, each of the following QQPLOT statements produces two Q-Q plots, one for Length and one for Width :

      proc univariate data=Measures;
         var Length Width;
         qqplot;
      run;

      proc univariate data=Measures;
         qqplot Length Width;
      run;

options

  • specify the theoretical distribution for the plot or add features to the plot. If you specify more than one variable, the options apply equally to each variable. Specify all options after the slash (/) in the QQPLOT statement. You can specify only one option naming the distribution in each QQPLOT statement, but you can specify any number of other options. The distributions available are the beta, exponential, gamma, lognormal, normal, two-parameter Weibull, and three-parameter Weibull. By default, the procedure produces a plot for the normal distribution.

  • In the following example, the NORMAL option requests a normal Q-Q plot for each variable. The MU= and SIGMA= normal-options request a distribution reference line with intercept 10 and slope 0.3 for each plot, corresponding to a normal distribution with mean µ = 10 and standard deviation σ = 0.3. The SQUARE option displays the plot in a square frame, and the CTEXT= option specifies the text color.

      proc univariate data=measures;
         qqplot length1 length2 / normal(mu=10 sigma=0.3)
                                  square ctext=blue;
      run;

  • Table 3.45 through Table 3.54 list the QQPLOT options by function. For complete descriptions, see the section Dictionary of Options on page 258.

    Table 3.45: Primary Options for Theoretical Distributions

    BETA( beta-options )

    Specifies beta Q-Q plot for shape parameters α and β specified with mandatory ALPHA= and BETA= beta-options

    EXPONENTIAL( exponential-options )

    Specifies exponential Q-Q plot

    GAMMA( gamma-options )

    Specifies gamma Q-Q plot for shape parameter α specified with mandatory ALPHA= gamma-option

    LOGNORMAL( lognormal-options )

    Specifies lognormal Q-Q plot for shape parameter σ specified with mandatory SIGMA= lognormal-option

    NORMAL( normal-options )

    Specifies normal Q-Q plot

    WEIBULL( Weibull-options )

    Specifies three-parameter Weibull Q-Q plot for shape parameter c specified with mandatory C= Weibull-option

    WEIBULL2( Weibull2-options )

    Specifies two-parameter Weibull Q-Q plot

    Table 3.46: Secondary Reference Line Options Used with All Distributions

    COLOR= color

    Specifies color of distribution reference line

    L= linetype

    Specifies line type of distribution reference line

    W= n

    Specifies width of distribution reference line

    Table 3.47: Secondary Beta-Options

    ALPHA= value-list | EST

    Specifies mandatory shape parameter α

    BETA= value-list | EST

    Specifies mandatory shape parameter β

    SIGMA= value | EST

    Specifies σ for distribution reference line

    THETA= value | EST

    Specifies θ for distribution reference line

    Table 3.48: Secondary Exponential-Options

    SIGMA= value | EST

    Specifies σ for distribution reference line

    THETA= value | EST

    Specifies θ for distribution reference line

    Table 3.49: Secondary Gamma-Options

    ALPHA= value-list | EST

    Specifies mandatory shape parameter α

    SIGMA= value | EST

    Specifies σ for distribution reference line

    THETA= value | EST

    Specifies θ for distribution reference line

    Table 3.50: Secondary Lognormal-Options

    SIGMA= value-list | EST

    Specifies mandatory shape parameter σ

    SLOPE= value | EST

    Specifies slope of distribution reference line

    THETA= value | EST

    Specifies θ for distribution reference line

    ZETA= value

    Specifies ζ for distribution reference line (slope is exp(ζ))

    Table 3.51: Secondary Normal-Options

    MU= value | EST

    Specifies µ for distribution reference line

    SIGMA= value | EST

    Specifies σ for distribution reference line

    Table 3.52: Secondary Weibull-Options

    C= value-list | EST

    Specifies mandatory shape parameter c

    SIGMA= value | EST

    Specifies σ for distribution reference line

    THETA= value | EST

    Specifies θ for distribution reference line

    Table 3.53: Secondary Weibull2-Options

    C= value | EST

    Specifies c for distribution reference line (slope is 1/c)

    SIGMA= value | EST

    Specifies σ for distribution reference line (intercept is log(σ))

    SLOPE= value | EST

    Specifies slope of distribution reference line

    THETA= value

    Specifies known lower threshold

    Table 3.54: General Graphics Options

    Option

    Description

    ANNOKEY

    Applies annotation requested in ANNOTATE= data set to key cell only

    ANNOTATE=

    Specifies annotate data set

    CAXIS=

    Specifies color for axis

    CFRAME=

    Specifies color for frame

    CFRAMESIDE=

    Specifies color for filling frame for row labels

    CFRAMETOP=

    Specifies color for filling frame for column labels

    CGRID=

    Specifies color for grid lines

    CHREF=

    Specifies color for HREF= lines

    CTEXT=

    Specifies color for text

    CVREF=

    Specifies color for VREF= lines

    DESCRIPTION=

    Specifies description for plot in graphics catalog

    FONT=

    Specifies software font for text

    GRID

    Creates a grid

    HEIGHT=

    Specifies height of text used outside framed areas

    HMINOR=

    Specifies number of horizontal minor tick marks

    HREF=

    Specifies reference lines perpendicular to the horizontal axis

    HREFLABELS=

    Specifies labels for HREF= lines

    HREFLABPOS=

    Specifies vertical position of labels for HREF= lines

    INFONT=

    Specifies software font for text inside framed areas

    INHEIGHT=

    Specifies height of text inside framed areas

    INTERTILE=

    Specifies distance between tiles

    LGRID=

    Specifies a line type for grid lines

    LHREF=

    Specifies line style for HREF= lines

    LVREF=

    Specifies line style for VREF= lines

    NADJ=

    Adjusts sample size when computing percentiles

    NAME=

    Specifies name for plot in graphics catalog

    NCOLS=

    Specifies number of columns in comparative Q-Q plot

    NOFRAME

    Suppresses frame around plotting area

    NOHLABEL

    Suppresses label for horizontal axis

    NOVLABEL

    Suppresses label for vertical axis

    NOVTICK

    Suppresses tick marks and tick mark labels for vertical axis

    NROWS=

    Specifies number of rows in comparative Q-Q plot

    PCTLAXIS

    Displays a nonlinear percentile axis

    PCTLMINOR

    Requests minor tick marks for percentile axis

    PCTLSCALE

    Replaces theoretical quantiles with percentiles

    RANKADJ=

    Adjusts ranks when computing percentiles

    SQUARE

    Displays plot in square format

    VAXISLABEL=

    Specifies label for vertical axis

    VMINOR=

    Specifies number of vertical minor tick marks

    VREF=

    Specifies reference lines perpendicular to the vertical axis

    VREFLABELS=

    Specifies labels for VREF= lines

    VREFLABPOS=

    Specifies horizontal position of labels for VREF= lines

    WAXIS=

    Specifies line thickness for axes and frame

  • Options can be any of the following:

    • primary options

    • secondary options

    • general options

Distribution Options

Table 3.45 lists primary options for requesting a theoretical distribution.

Table 3.46 through Table 3.53 list secondary options that specify distribution parameters and control the display of a distribution reference line. Specify these options in parentheses after the distribution keyword. For example, you can request a normal Q-Q plot with a distribution reference line by specifying the NORMAL option as follows:

  proc univariate;
     qqplot Length / normal(mu=10 sigma=0.3 color=red);
  run;

The MU= and SIGMA= normal-options display a distribution reference line that corresponds to the normal distribution with mean µ = 10 and standard deviation σ = 0.3, and the COLOR= normal-option specifies the color for the line.

General Options

Table 3.54 summarizes general options for enhancing Q-Q plots.

Dictionary of Options

The following entries provide detailed descriptions of options in the QQPLOT statement.

ALPHA= value | EST

  • specifies the mandatory shape parameter α for quantile plots requested with the BETA and GAMMA options. Enclose the ALPHA= option in parentheses after the BETA or GAMMA options. If you specify ALPHA=EST, a maximum likelihood estimate is computed for α.

ANNOKEY

  • applies the annotation requested with the ANNOTATE= option to the key cell only. By default, the procedure applies annotation to all of the cells. This option is not available unless you use the CLASS statement. Specify the KEYLEVEL= option in the CLASS statement to specify the key cell.

ANNOTATE= SAS-data-set

ANNO= SAS-data-set

  • specifies an input data set containing annotate variables as described in SAS/GRAPH Software: Reference. The ANNOTATE= data set you specify in the QQPLOT statement is used for all plots created by the statement. You can also specify an ANNOTATE= data set in the PROC UNIVARIATE statement to enhance all plots created by the procedure.

BETA(ALPHA= value | EST BETA= value | EST < beta-options > )

  • creates a beta quantile plot for each combination of the required shape parameters α and β specified by the required ALPHA= and BETA= beta-options. If you specify ALPHA=EST and BETA=EST, the procedure creates a plot based on maximum likelihood estimates for α and β. You can specify the SCALE= beta-option as an alias for the SIGMA= beta-option and the THRESHOLD= beta-option as an alias for the THETA= beta-option.

  • To obtain graphical estimates of α and β, specify lists of values in the ALPHA= and BETA= beta-options, and select the combination of α and β that most nearly linearizes the point pattern. To assess the point pattern, you can add a diagonal distribution reference line corresponding to lower threshold parameter θ and scale parameter σ with the THETA= and SIGMA= beta-options. Alternatively, you can add a line that corresponds to estimated values of θ and σ with the beta-options THETA=EST and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the beta distribution with parameters α, β, θ, and σ is a good fit.
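The two uses of the BETA option described above might be sketched as follows. The data set Measures and the variable Width are hypothetical names chosen for illustration, and the shape values are arbitrary; the data are assumed to lie in a range where a beta distribution is plausible.

```sas
/* Measures and Width are hypothetical names used only for illustration */
proc univariate data=Measures noprint;
   /* one quantile plot per (alpha, beta) combination, for visual comparison */
   qqplot Width / beta(alpha=2 3 beta=2 3);
   /* fixed shape parameters plus a reference line with estimated theta and sigma */
   qqplot Width / beta(alpha=2 beta=3 theta=est sigma=est);
run;
```

The first statement is meant for picking the most nearly linear point pattern by eye; the second adds the reference line once a shape combination has been chosen.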

BETA= value | EST

B= value | EST

  • specifies the mandatory shape parameter β for quantile plots requested with the BETA option. Enclose the BETA= option in parentheses after the BETA option. If you specify BETA=EST, a maximum likelihood estimate is computed for β.

C= value | EST

  • specifies the shape parameter c for quantile plots requested with the WEIBULL and WEIBULL2 options. Enclose this option in parentheses after the WEIBULL or WEIBULL2 option. C= is a required Weibull-option in the WEIBULL option; in this situation, it accepts a list of values, or if you specify C=EST, a maximum likelihood estimate is computed for c. You can optionally specify C= value or C=EST as a Weibull2-option with the WEIBULL2 option to request a distribution reference line; in this situation, you must also specify the Weibull2-option SIGMA= value or SIGMA=EST.

CAXIS= color

CAXES= color

  • specifies the color for the axes. This option overrides any COLOR= specifications in an AXIS statement. The default value is the first color in the device color list.

CFRAME= color

  • specifies the color for the area that is enclosed by the axes and frame. The area is not filled by default.

CFRAMESIDE= color

  • specifies the color to fill the frame area for the row labels that display along the left side of a comparative quantile plot. This color also fills the frame area for the label of the corresponding class variable (if you associate a label with the variable). By default, these areas are not filled. This option is not available unless you use the CLASS statement.

CFRAMETOP= color

  • specifies the color to fill the frame area for the column labels that display across the top of a comparative quantile plot. This color also fills the frame area for the label of the corresponding class variable (if you associate a label with the variable). By default, these areas are not filled. This option does not apply unless you use the CLASS statement.

CGRID= color

  • specifies the color for grid lines when a grid displays on the plot. The default color is the first color in the device color list. This option also produces a grid.

CHREF= color

CH= color

  • specifies the color for horizontal axis reference lines requested by the HREF= option. The default color is the first color in the device color list.

COLOR= color

  • specifies the color of the diagonal distribution reference line. The default color is the first color in the device color list. Enclose the COLOR= option in parentheses after a distribution option keyword.

CTEXT= color

  • specifies the color for tick mark values and axis labels. The default color is the color that you specify for the CTEXT= option in the GOPTIONS statement. If you omit the GOPTIONS statement, the default is the first color in the device color list.

CVREF= color

CV= color

  • specifies the color for the reference lines requested by the VREF= option. The default color is the first color in the device color list.

DESCRIPTION= 'string'

DES= 'string'

  • specifies a description, up to 40 characters long, that appears in the PROC GREPLAY master menu. The default string is the variable name.

EXPONENTIAL < ( exponential-options ) >

EXP < ( exponential-options ) >

  • creates an exponential quantile plot. To assess the point pattern, add a diagonal distribution reference line corresponding to θ and σ with the THETA= and SIGMA= exponential-options. Alternatively, you can add a line corresponding to estimated values of the threshold parameter θ and the scale parameter σ with the exponential-options THETA=EST and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the exponential distribution with parameters θ and σ is a good fit. You can specify the SCALE= exponential-option as an alias for the SIGMA= exponential-option and the THRESHOLD= exponential-option as an alias for the THETA= exponential-option.

FONT= font

  • specifies a software font for the reference lines and axis labels. You can also specify fonts for axis labels in an AXIS statement. The FONT= font takes precedence over the FTEXT= font specified in the GOPTIONS statement. Hardware characters are used by default.

GAMMA(ALPHA= value | EST < gamma-options > )

  • creates a gamma quantile plot for each value of the shape parameter α given by the mandatory ALPHA= gamma-option. If you specify ALPHA=EST, the procedure creates a plot based on a maximum likelihood estimate for α. To obtain a graphical estimate of α, specify a list of values for the ALPHA= gamma-option, and select the value that most nearly linearizes the point pattern. To assess the point pattern, add a diagonal distribution reference line corresponding to θ and σ with the THETA= and SIGMA= gamma-options. Alternatively, you can add a line corresponding to estimated values of the threshold parameter θ and the scale parameter σ with the gamma-options THETA=EST and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the gamma distribution with parameters α, θ, and σ is a good fit. You can specify the SCALE= gamma-option as an alias for the SIGMA= gamma-option and the THRESHOLD= gamma-option as an alias for the THETA= gamma-option.

GRID

  • displays a grid of horizontal lines positioned at major tick marks on the vertical axis.

HEIGHT= value

  • specifies the height, in percentage screen units, of text for axis labels, tick mark labels, and legends. This option takes precedence over the HTEXT= option in the GOPTIONS statement.

HMINOR= n

HM= n

  • specifies the number of minor tick marks between each major tick mark on the horizontal axis. Minor tick marks are not labeled. By default, HMINOR=0.

HREF= values

  • draws reference lines that are perpendicular to the horizontal axis at specified values. When you use the PCTLAXIS option, HREF= values must be in quantile units.

HREFLABELS= 'label1' . . . 'labeln'

HREFLABEL= 'label1' . . . 'labeln'

HREFLAB= 'label1' . . . 'labeln'

  • specifies labels for the reference lines requested by the HREF= option. The number of labels must equal the number of reference lines. Labels can have up to 16 characters.

HREFLABPOS= n

  • specifies the vertical position of HREFLABELS= labels. If you specify HREFLABPOS=1, the labels are positioned along the top of the plot. If you specify HREFLABPOS=2, the labels are staggered from top to bottom of the plot. If you specify HREFLABPOS=3, the labels are positioned along the bottom of the plot. By default, HREFLABPOS=1.
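As a small sketch of how the HREF= family of options combines, the following statement draws three labeled reference lines on a normal Q-Q plot. The data set Measures and the variable Width are hypothetical, and the reference values are arbitrary positions on the theoretical quantile axis.

```sas
/* Measures and Width are hypothetical names used only for illustration */
proc univariate data=Measures noprint;
   qqplot Width / normal
                  href=-1 0 1                 /* lines at three normal quantiles */
                  hreflabels='-1' '0' '+1'    /* one label per reference line    */
                  hreflabpos=3                /* labels along the bottom         */
                  lhref=2 chref=blue;         /* dashed blue reference lines     */
run;
```

The number of labels matches the number of HREF= values, as the HREFLABELS= entry requires.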

INFONT= font

  • specifies a software font to use for text inside the framed areas of the plot. The INFONT= option takes precedence over the FTEXT= option in the GOPTIONS statement. For a list of fonts, see SAS/GRAPH Reference .

INHEIGHT= value

  • specifies the height, in percentage screen units, of text used inside the framed areas of the plot. By default, the height specified by the HEIGHT= option is used. If you do not specify the HEIGHT= option, the height specified with the HTEXT= option in the GOPTIONS statement is used.

INTERTILE= value

  • specifies the distance, in horizontal percentage screen units, between the framed areas, which are called tiles . By default, INTERTILE=0.75 percentage screen units. This option is not available unless you use the CLASS statement. You can specify INTERTILE=0 to create contiguous tiles.

L= linetype

  • specifies the line type for a diagonal distribution reference line. Enclose the L= option in parentheses after a distribution option. By default, L=1, which produces a solid line.

LGRID= linetype

  • specifies the line type for the grid requested by the GRID option. By default, LGRID=1, which produces a solid line. The LGRID= option also produces a grid.

LHREF= linetype

LH= linetype

  • specifies the line type for the reference lines that you request with the HREF= option. By default, LHREF=2, which produces a dashed line.

LOGNORMAL(SIGMA= value | EST < lognormal-options > )

LNORM(SIGMA= value | EST < lognormal-options > )

  • creates a lognormal quantile plot for each value of the shape parameter σ given by the mandatory SIGMA= lognormal-option. If you specify SIGMA=EST, the procedure creates a plot based on a maximum likelihood estimate for σ. To obtain a graphical estimate of σ, specify a list of values for the SIGMA= lognormal-option, and select the value that most nearly linearizes the point pattern. To assess the point pattern, add a diagonal distribution reference line corresponding to θ and ζ with the THETA= and ZETA= lognormal-options. Alternatively, you can add a line corresponding to estimated values of the threshold parameter θ and the scale parameter ζ with the lognormal-options THETA=EST and ZETA=EST. Agreement between the reference line and the point pattern indicates that the lognormal distribution with parameters σ, θ, and ζ is a good fit. You can specify the THRESHOLD= lognormal-option as an alias for the THETA= lognormal-option and the SCALE= lognormal-option as an alias for the ZETA= lognormal-option. See Example 3.31 through Example 3.33.
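The graphical estimation workflow described for the LOGNORMAL option could look roughly like this. Measures and Width are hypothetical names, and the shape values 0.2, 0.5, and 0.8 are arbitrary candidates rather than recommendations.

```sas
/* Measures and Width are hypothetical names used only for illustration */
proc univariate data=Measures noprint;
   /* step 1: one plot per candidate shape value; pick the most linear pattern */
   qqplot Width / lognormal(sigma=0.2 0.5 0.8);
   /* step 2: for the chosen shape, add a line with estimated theta and zeta */
   qqplot Width / lognormal(sigma=0.5 theta=est zeta=est);
run;
```

The value sigma=0.5 in the second statement only stands in for whichever candidate looked most linear in the first step.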

LVREF= linetype

  • specifies the line type for the reference lines requested with the VREF= option. By default, LVREF=2, which produces a dashed line.

MU= value | EST

  • specifies the mean µ for a normal quantile plot requested with the NORMAL option. Enclose the MU= normal-option in parentheses after the NORMAL option. The MU= normal-option must be specified with the SIGMA= normal-option , and they request a distribution reference line. You can specify MU=EST to request a distribution reference line with µ equal to the sample mean.

NADJ= value

  • specifies the adjustment value added to the sample size in the calculation of theoretical percentiles. By default, NADJ=1/4. Refer to Chambers et al. (1983) for additional information.

NAME= 'string'

  • specifies a name for the plot, up to eight characters long, that appears in the PROC GREPLAY master menu. The default value is UNIVAR.

NCOLS= n

NCOL= n

  • specifies the number of columns in a comparative quantile plot. By default, NCOLS=1 if you specify only one class variable, and NCOLS=2 if you specify two class variables. This option is not available unless you use the CLASS statement. If you specify two class variables, you can use the NCOLS= option with the NROWS= option.

NOFRAME

  • suppresses the frame around the subplot area. If you specify the PCTLAXIS option, then you cannot specify the NOFRAME option.

NOHLABEL

  • suppresses the label for the horizontal axis. You can use this option to reduce clutter.

NORMAL < ( normal-options ) >

  • creates a normal quantile plot. This is the default if you omit a distribution option. To assess the point pattern, you can add a diagonal distribution reference line corresponding to µ and σ with the MU= and SIGMA= normal-options. Alternatively, you can add a line corresponding to estimated values of µ and σ with the normal-options MU=EST and SIGMA=EST; the estimates of the mean µ and the standard deviation σ are the sample mean and sample standard deviation. Agreement between the reference line and the point pattern indicates that the normal distribution with parameters µ and σ is a good fit. See Example 3.28 and Example 3.30.
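A minimal sketch of the estimated reference line, assuming a hypothetical data set Measures with a numeric variable Length, might be:

```sas
/* Measures and Length are hypothetical names used only for illustration */
proc univariate data=Measures noprint;
   /* reference line from the sample mean and sample standard deviation,
      drawn as a red dashed line two pixels wide */
   qqplot Length / normal(mu=est sigma=est color=red l=2 w=2);
run;
```

COLOR=, L=, and W= are the line-appearance options documented in this dictionary; they go inside the parentheses with the distribution parameters.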

NOVLABEL

  • suppresses the label for the vertical axis. You can use this option to reduce clutter.

NOVTICK

  • suppresses the tick marks and tick mark labels for the vertical axis. This option also suppresses the label for the vertical axis.

NROWS= n

NROW= n

  • specifies the number of rows in a comparative quantile plot. By default, NROWS=2. This option is not available unless you use the CLASS statement. If you specify two class variables, you can use the NCOLS= option with the NROWS= option.
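For a comparative Q-Q plot, the layout options are used together with a CLASS statement. The following is only a sketch: Measures, Width, and the class variable Supplier (assumed to have two levels) are hypothetical names.

```sas
/* Measures, Width, and Supplier are hypothetical names for illustration */
proc univariate data=Measures noprint;
   class Supplier;
   /* place the two class levels side by side instead of stacking them */
   qqplot Width / normal(mu=est sigma=est)
                  nrows=1 ncols=2
                  intertile=1;
run;
```

Without NROWS= and NCOLS=, a single class variable would use the defaults described above (one column, two rows).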

PCTLAXIS < ( axis-options ) >

  • adds a nonlinear percentile axis along the frame of the Q-Q plot opposite the theoretical quantile axis. The added axis is identical to the axis for probability plots produced with the PROBPLOT statement. When using the PCTLAXIS option, you must specify HREF= values in quantile units, and you cannot use the NOFRAME option. You can specify the following axis-options :

Table 3.55: Axis Options

    Option                    Description

    GRID                      Draws vertical grid lines at major percentiles
    GRIDCHAR='character'      Specifies grid line plotting character on line printer
    LABEL='string'            Specifies label for percentile axis
    LGRID= linetype           Specifies line type for grid
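The axis-options are given in parentheses after the PCTLAXIS keyword. A short sketch, with Measures and Width as hypothetical names and an arbitrary label string:

```sas
/* Measures and Width are hypothetical names used only for illustration */
proc univariate data=Measures noprint;
   qqplot Width / normal(mu=est sigma=est)
                  pctlaxis(grid lgrid=2 label='Normal Percentiles');
run;
```

Because PCTLAXIS is in effect, NOFRAME is left out and any HREF= values would have to be given in quantile units.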

PCTLMINOR

  • requests minor tick marks for the percentile axis when you specify PCTLAXIS. The HMINOR= option overrides the PCTLMINOR option.

PCTLSCALE

  • requests scale labels for the theoretical quantile axis in percentile units, resulting in a nonlinear axis scale. Tick marks are drawn uniformly across the axis based on the quantile scale. In all other respects, the plot remains the same, and you must specify HREF= values in quantile units. For a true nonlinear axis, use the PCTLAXIS option or use the PROBPLOT statement.

RANKADJ= value

  • specifies the adjustment value added to the ranks in the calculation of theoretical percentiles. By default, RANKADJ=-3/8, as recommended by Blom (1958). Refer to Chambers et al. (1983) for additional information.
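To make the role of the two adjustments concrete, the sketch below computes normal plotting positions by hand as (i + RANKADJ)/(n + NADJ), using the default values RANKADJ=-3/8 and NADJ=1/4. It is an illustration rather than the procedure's internal code: the data set Measures, the variable Width, and the output names are assumptions, and tied values receive no special treatment.

```sas
/* Measures, Width, Sorted, and Positions are hypothetical names */
proc sort data=Measures(keep=Width where=(Width ne .)) out=Sorted;
   by Width;
run;

data Positions;
   set Sorted nobs=n;                  /* n = number of nonmissing values     */
   rankadj = -3/8;                     /* default RANKADJ= value              */
   nadj    = 1/4;                      /* default NADJ= value                 */
   p = (_n_ + rankadj) / (n + nadj);   /* adjusted percentile for the ith obs */
   z = probit(p);                      /* standard normal quantile at p       */
run;
```

Plotting Width against z from the Positions data set gives the points of a normal Q-Q plot under these assumptions.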

SCALE= value | EST

  • is an alias for the SIGMA= option for plots requested by the BETA, EXPONENTIAL, GAMMA, WEIBULL, and WEIBULL2 options and for the ZETA= option with the LOGNORMAL option. See the entries for the SIGMA= and ZETA= options.

SHAPE= value | EST

  • is an alias for the ALPHA= option with the GAMMA option, for the SIGMA= option with the LOGNORMAL option, and for the C= option with the WEIBULL and WEIBULL2 options. See the entries for the ALPHA=, SIGMA=, and C= options.

SIGMA= value | EST

  • specifies the parameter σ, where σ > 0. Alternatively, you can specify SIGMA=EST to request a maximum likelihood estimate for σ. The interpretation and use of the SIGMA= option depend on the distribution option with which it is used, as summarized in Table 3.56. Enclose this option in parentheses after the distribution option.

Table 3.56: Uses of the SIGMA= Option

    Distribution Option                  Use of the SIGMA= Option

    BETA, EXPONENTIAL, GAMMA, WEIBULL    THETA=θ and SIGMA=σ request a distribution reference line corresponding to θ and σ.
    LOGNORMAL                            SIGMA=σ1 . . . σn requests n quantile plots with shape parameters σ1 . . . σn. The SIGMA= option must be specified.
    NORMAL                               MU=µ and SIGMA=σ request a distribution reference line corresponding to µ and σ. SIGMA=EST requests a line with σ equal to the sample standard deviation.
    WEIBULL2                             SIGMA=σ and C=c request a distribution reference line corresponding to σ and c.

SLOPE= value | EST

  • specifies the slope for a distribution reference line requested with the LOGNORMAL and WEIBULL2 options. Enclose the SLOPE= option in parentheses after the distribution option. When you use the SLOPE= lognormal-option with the LOGNORMAL option, you must also specify a threshold parameter value θ with the THETA= lognormal-option to request the line. The SLOPE= lognormal-option is an alternative to the ZETA= lognormal-option for specifying ζ, since the slope is equal to exp(ζ).

  • When you use the SLOPE= Weibull2-option with the WEIBULL2 option, you must also specify a scale parameter value σ with the SIGMA= Weibull2-option to request the line. The SLOPE= Weibull2-option is an alternative to the C= Weibull2-option for specifying c, since the slope is equal to 1/c.

  • For example, the first and second QQPLOT statements produce the same quantile plots and the third and fourth QQPLOT statements produce the same quantile plots:

      proc univariate data=Measures;
         qqplot Width / lognormal(sigma=2 theta=0 zeta=0);
         qqplot Width / lognormal(sigma=2 theta=0 slope=1);
         qqplot Width / weibull2(sigma=2 theta=0 c=.25);
         qqplot Width / weibull2(sigma=2 theta=0 slope=4);
      run;

SQUARE

  • displays the quantile plot in a square frame. By default, the frame is rectangular.

THETA= value | EST

  • specifies the lower threshold parameter θ for plots requested with the BETA, EXPONENTIAL, GAMMA, LOGNORMAL, WEIBULL, and WEIBULL2 options. Enclose the THETA= option in parentheses after a distribution option. When used with the WEIBULL2 option, the THETA= option specifies the known lower threshold θ, for which the default is 0. When used with the other distribution options, the THETA= option specifies θ for a distribution reference line; alternatively in this situation, you can specify THETA=EST to request a maximum likelihood estimate for θ. To request the line, you must also specify a scale parameter.

THRESHOLD= value | EST

  • is an alias for the THETA= option.

VAXISLABEL= 'label'

  • specifies a label for the vertical axis. Labels can have up to 40 characters.

VMINOR= n

VM= n

  • specifies the number of minor tick marks between each major tick mark on the vertical axis. Minor tick marks are not labeled. The default is zero.

VREF= values

  • draws reference lines perpendicular to the vertical axis at the values specified. Also see the CVREF=, LVREF=, and VREFCHAR= options.

VREFLABELS= 'label1' . . . 'labeln'

VREFLABEL= 'label1' . . . 'labeln'

VREFLAB= 'label1' . . . 'labeln'

  • specifies labels for the reference lines requested by the VREF= option. The number of labels must equal the number of reference lines. Enclose each label in quotes. Labels can have up to 16 characters.

VREFLABPOS= n

  • specifies the horizontal position of VREFLABELS= labels. If you specify VREFLABPOS=1, the labels are positioned at the left of the plot. If you specify VREFLABPOS=2, the labels are positioned at the right of the plot. By default, VREFLABPOS=1.

W= n

  • specifies the width, in pixels, for a diagonal distribution reference line. Enclose the W= option in parentheses after the distribution option. By default, W=1.

WAXIS= n

  • specifies the line thickness, in pixels, for the axes and frame. By default, WAXIS=1.

WEIBULL(C= value | EST < Weibull-options > )

WEIB(C= value | EST < Weibull-options > )

  • creates a three-parameter Weibull quantile plot for each value of the required shape parameter c specified by the mandatory C= Weibull-option. To create a plot that is based on a maximum likelihood estimate for c, specify C=EST. To obtain a graphical estimate of c, specify a list of values in the C= Weibull-option, and select the value that most nearly linearizes the point pattern. To assess the point pattern, add a diagonal distribution reference line corresponding to θ and σ with the THETA= and SIGMA= Weibull-options. Alternatively, you can add a line corresponding to estimated values of θ and σ with the Weibull-options THETA=EST and SIGMA=EST. Agreement between the reference line and the point pattern indicates that the Weibull distribution with parameters c, θ, and σ is a good fit. You can specify the SCALE= Weibull-option as an alias for the SIGMA= Weibull-option and the THRESHOLD= Weibull-option as an alias for the THETA= Weibull-option. See Example 3.34.

WEIBULL2 < ( Weibull2-options ) >

W2 < ( Weibull2-options ) >

  • creates a two-parameter Weibull quantile plot. You should use the WEIBULL2 option when your data have a known lower threshold θ, which is 0 by default. To specify the threshold value θ, use the THETA= Weibull2-option. By default, THETA=0. An advantage of the two-parameter Weibull plot over the three-parameter Weibull plot is that the parameters c and σ can be estimated from the slope and intercept of the point pattern. A disadvantage is that the two-parameter Weibull distribution applies only in situations where the threshold parameter is known. To obtain a graphical estimate of θ, specify a list of values for the THETA= Weibull2-option, and select the value that most nearly linearizes the point pattern. To assess the point pattern, add a diagonal distribution reference line corresponding to σ and c with the SIGMA= and C= Weibull2-options. Alternatively, you can add a distribution reference line corresponding to estimated values of σ and c with the Weibull2-options SIGMA=EST and C=EST. Agreement between the reference line and the point pattern indicates that the Weibull distribution with parameters c, θ, and σ is a good fit. You can specify the SCALE= Weibull2-option as an alias for the SIGMA= Weibull2-option and the SHAPE= Weibull2-option as an alias for the C= Weibull2-option. See Example 3.34.
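The contrast between the two Weibull options might be sketched as follows. Measures and Width are hypothetical names, the candidate shape values are arbitrary, and the threshold of 0 in the second statement is assumed to be known.

```sas
/* Measures and Width are hypothetical names used only for illustration */
proc univariate data=Measures noprint;
   /* three-parameter form: one plot per candidate shape value c */
   qqplot Width / weibull(c=1.5 2 2.5);
   /* two-parameter form: known threshold, estimated scale and shape line */
   qqplot Width / weibull2(theta=0 sigma=est c=est);
run;
```

In the first statement C= is mandatory, whereas in the second it is optional and only controls the reference line.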

ZETA= value | EST

  • specifies a value for the scale parameter ζ for the lognormal quantile plots requested with the LOGNORMAL option. Enclose the ZETA= lognormal-option in parentheses after the LOGNORMAL option. To request a distribution reference line with intercept θ and slope exp(ζ), specify the THETA= and ZETA= lognormal-options.

VAR Statement

  • VAR variables ;

The VAR statement specifies the analysis variables and their order in the results. By default, if you omit the VAR statement, PROC UNIVARIATE analyzes all numeric variables that are not listed in the other statements.

Using the OUTPUT Statement with the VAR Statement

You must provide a VAR statement when you use an OUTPUT statement. To store the same statistic for several analysis variables in the OUT= data set, you specify a list of names in the OUTPUT statement. PROC UNIVARIATE makes a one-to-one correspondence between the order of the analysis variables in the VAR statement and the list of names that follow a statistic keyword.
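The one-to-one correspondence can be seen in a short sketch. The data set Measures, the variables Length and Width, and the output names are hypothetical.

```sas
/* Measures, Length, Width, and Stats are hypothetical names */
proc univariate data=Measures noprint;
   var Length Width;
   /* the first name after each keyword goes with Length, the second with Width */
   output out=Stats mean=MeanLength MeanWidth
                    std=StdLength  StdWidth;
run;
```

Here MeanLength and StdLength would hold the statistics for Length, and MeanWidth and StdWidth those for Width, because the names follow the order of the VAR statement.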

WEIGHT Statement

  • WEIGHT variable ;

The WEIGHT statement specifies numeric weights for analysis variables in the statistical calculations. The UNIVARIATE procedure uses the values w_i of the WEIGHT variable to modify the computation of a number of summary statistics by assuming that the variance of the ith value x_i of the analysis variable is equal to σ²/w_i, where σ is an unknown parameter. The values of the WEIGHT variable do not have to be integers and are typically positive. By default, observations with nonpositive or missing values of the WEIGHT variable are handled as follows: [*]

  • If the value is zero, the observation is counted in the total number of observations.

  • If the value is negative, it is converted to zero, and the observation is counted in the total number of observations.

  • If the value is missing, the observation is excluded from the analysis.

To exclude observations that contain negative and zero weights from the analysis, use the EXCLNPWGT option in the PROC UNIVARIATE statement. Note that most SAS/STAT procedures, such as PROC GLM, exclude negative and zero weights by default. The weight variable does not change how the procedure determines the range, mode, extreme values, extreme observations, or number of missing values. When you specify a WEIGHT statement, the procedure also computes a weighted standard error and a weighted version of Student's t test. The Student's t test is the only test of location that PROC UNIVARIATE computes when you weight the analysis variables.

When you specify a WEIGHT variable, the procedure uses its values, w_i, to compute weighted versions of the statistics [†] provided in the Moments table. For example, the procedure computes a weighted mean x̄_w and a weighted variance s_w² as

\[ \bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i} \]

and

\[ s_w^2 = \frac{1}{d} \sum_i w_i \left( x_i - \bar{x}_w \right)^2 \]

where x_i is the ith variable value. The divisor d is controlled by the VARDEF= option in the PROC UNIVARIATE statement.

The WEIGHT statement does not affect the determination of the mode, extreme values, extreme observations, or the number of missing values of the analysis variables. However, the weights w_i are used to compute weighted percentiles. [*] The WEIGHT variable has no effect on graphical displays produced with the plot statements.

The CIPCTLDF, CIPCTLNORMAL, LOCCOUNT, NORMAL, ROBUSTSCALE, TRIMMED=, and WINSORIZED= options are not available with the WEIGHT statement.

To compute weighted skewness or kurtosis, use VARDEF=DF or VARDEF=N in the PROC statement.

You cannot specify the HISTOGRAM, PROBPLOT, or QQPLOT statements with the WEIGHT statement.

When you use the WEIGHT statement, consider which value of the VARDEF= option is appropriate. See the VARDEF= option and the calculation of weighted statistics for more information.
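A weighted analysis along these lines could be requested as in the sketch below. The data set Measures, the analysis variable Width, and the weight variable Wt are hypothetical names; the PROC SQL step is only an independent hand check of the weighted mean, not part of the procedure.

```sas
/* Measures, Width, and Wt are hypothetical names used only for illustration */
proc univariate data=Measures vardef=df exclnpwgt;
   var Width;
   weight Wt;      /* no plot statements are used together with WEIGHT here */
run;

/* hand check: weighted mean = sum(w*x) / sum(w) over positive weights */
proc sql;
   select sum(Wt*Width) / sum(Wt) as WeightedMean
   from Measures
   where Wt > 0 and Width is not missing;
quit;
```

VARDEF=DF is one of the two settings for which weighted skewness and kurtosis are computed, and EXCLNPWGT drops observations with zero or negative weights instead of keeping them in the observation count.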

[*] In Release 6.12 and earlier releases, observations were used in the analysis if and only if the WEIGHT variable value was greater than zero.

[†] In Release 6.12 and earlier releases, weighted skewness and kurtosis were not computed.

[*] In Release 6.12 and earlier releases, the weights did not affect the computation of percentiles and the procedure did not exclude the observations with missing weights from the count of observations.




Base SAS 9.1.3 Procedures Guide (Vol. 3)
Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4
ISBN: 1590472047
EAN: 2147483647
Year: 2004
Pages: 74
