Syntax: MEANS Procedure


Tip: Supports the Output Delivery System. See Output Delivery System on page 32 for details.

ODS Table Name: Summary

Reminder: You can use the ATTRIB, FORMAT, LABEL, and WHERE statements. See Chapter 3, Statements with the Same Function in Multiple Procedures, on page 57 for details. You can also use any global statements. See Global Statements on page 18 for a list.

PROC MEANS <option(s)> <statistic-keyword(s)>;

    BY <DESCENDING> variable-1 <<DESCENDING> variable-n> <NOTSORTED>;

    CLASS variable(s) </ option(s)>;

    FREQ variable;

    ID variable(s);

    OUTPUT <OUT=SAS-data-set> <output-statistic-specification(s)>
        <id-group-specification(s)> <maximum-id-specification(s)>
        <minimum-id-specification(s)> </ option(s)>;

    TYPES request(s);

    VAR variable(s) </ WEIGHT=weight-variable>;

    WAYS list;

    WEIGHT variable;

To do this
    Use this statement

Calculate separate statistics for each BY group
    BY

Identify variables whose values define subgroups for the analysis
    CLASS

Identify a variable whose values represent the frequency of each observation
    FREQ

Include additional identification variables in the output data set
    ID

Create an output data set that contains specified statistics and identification variables
    OUTPUT

Identify specific combinations of class variables to use to subdivide the data
    TYPES

Identify the analysis variables and their order in the results
    VAR

Specify the number of ways to make unique combinations of class variables
    WAYS

Identify a variable whose values weight each observation in the statistical calculations
    WEIGHT
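
As a rough illustration of how several of these statements fit together, the following sketch summarizes two analysis variables within subgroups and saves the means. It assumes that the SASHELP.CLASS sample data set is available; the output data set name WORK.CLASS_STATS and the variable names MEAN_HEIGHT and MEAN_WEIGHT are hypothetical.

```sas
/* Assumption: SASHELP.CLASS (sample data shipped with SAS) is available. */
proc means data=sashelp.class n mean std;
   class sex;                      /* subgroups are defined by the values of SEX */
   var height weight;              /* analysis variables, in display order       */
   output out=work.class_stats     /* hypothetical output data set name          */
          mean=mean_height mean_weight;
run;
```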

PROC MEANS Statement

See also: Chapter 47, The SUMMARY Procedure, on page 1191

PROC MEANS <option(s)> <statistic-keyword(s)>;

To do this
    Use this option

Specify the input data set
    DATA=

Disable floating point exception recovery
    NOTRAP

Specify the amount of memory to use for data summarization with class variables
    SUMSIZE=

Override the SAS system option THREADS | NOTHREADS
    THREADS | NOTHREADS

Control the classification levels

  Specify a secondary data set that contains the combinations of class variables to analyze
      CLASSDATA=

  Create all possible combinations of class variable values
      COMPLETETYPES

  Exclude from the analysis all combinations of class variable values that are not in the CLASSDATA= data set
      EXCLUSIVE

  Use missing values as valid values to create combinations of class variables
      MISSING

Control the statistical analysis

  Specify the confidence level for the confidence limits
      ALPHA=

  Exclude observations with nonpositive weights from the analysis
      EXCLNPWGTS

  Specify the sample size to use for the P² quantile estimation method
      QMARKERS=

  Specify the quantile estimation method
      QMETHOD=

  Specify the mathematical definition used to compute quantiles
      QNTLDEF=

  Select the statistics
      statistic-keyword

  Specify the variance divisor
      VARDEF=

Control the output

  Specify the field width for the statistics
      FW=

  Specify the number of decimal places for the statistics
      MAXDEC=

  Suppress reporting the total number of observations for each unique combination of the class variables
      NONOBS

  Suppress all displayed output
      NOPRINT

  Order the values of the class variables according to the specified order
      ORDER=

  Display the output
      PRINT

  Display the analysis for all requested combinations of class variables
      PRINTALLTYPES

  Display the values of the ID variables
      PRINTIDVARS

Control the output data set

  Specify that the _TYPE_ variable contain character values
      CHARTYPE

  Order the output data set by descending _TYPE_ value
      DESCENDTYPES

  Select ID variables based on minimum values
      IDMIN

  Limit the output statistics to the observations with the highest _TYPE_ value
      NWAY

Options

ALPHA= value

  • specifies the confidence level to compute the confidence limits for the mean. The percentage for the confidence limits is (1 − value) × 100. For example, ALPHA=.05 results in a 95% confidence limit.

    Default: .05

    Range: between 0 and 1

    Interaction: To compute confidence limits, specify the statistic-keyword CLM, LCLM, or UCLM.

    See also: Confidence Limits on page 553

    Featured in: Example 7 on page 574
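
A minimal sketch of ALPHA= in use, assuming that the SASHELP.CLASS sample data set is available. With ALPHA=0.10, the formula above gives (1 − 0.10) × 100 = 90% confidence limits.

```sas
/* Assumption: SASHELP.CLASS is available. */
proc means data=sashelp.class alpha=0.10 n mean clm;
   var height;     /* CLM requests two-sided confidence limits for the mean */
run;
```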

CHARTYPE

  • specifies that the _TYPE_ variable in the output data set is a character representation of the binary value of _TYPE_. The length of the variable equals the number of class variables.

    Main discussion: Output Data Set on page 557

    Interaction: When you specify more than 32 class variables, _TYPE_ automatically becomes a character variable.

    Featured in: Example 10 on page 579
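
The following sketch shows one way to inspect the character form of _TYPE_. It assumes that SASHELP.CLASS is available; WORK.TYPES and MEAN_HEIGHT are hypothetical names. With two class variables, the description above implies a _TYPE_ value that is two characters long.

```sas
/* Assumption: SASHELP.CLASS is available. */
proc means data=sashelp.class chartype noprint;
   class sex age;                            /* two class variables         */
   var height;
   output out=work.types mean=mean_height;   /* hypothetical names          */
run;

proc print data=work.types;
   var _type_ sex age _freq_ mean_height;    /* _TYPE_ is a character value */
run;
```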

CLASSDATA= SAS-data-set

  • specifies a data set that contains the combinations of values of the class variables that must be present in the output. Any combinations of values of the class variables that occur in the CLASSDATA= data set but not in the input data set appear in the output and have a frequency of zero.

    Restriction: The CLASSDATA= data set must contain all class variables. Their data type and format must match the corresponding class variables in the input data set.

    Interaction: If you use the EXCLUSIVE option, then PROC MEANS excludes any observation in the input data set whose combination of class variables is not in the CLASSDATA= data set.

    Tip: Use the CLASSDATA= data set to filter or to supplement the input data set.

    Featured in: Example 4 on page 565
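
As an illustration of filtering and supplementing with CLASSDATA=, the sketch below builds a small hypothetical data set, WORK.WANTED, whose class variables match the type of SEX and AGE in SASHELP.CLASS (assumed to be available). The combination M/20 is chosen on the assumption that it does not occur in the input data, so it should appear with a frequency of zero.

```sas
/* Hypothetical CLASSDATA= data set: SEX is character, AGE is numeric. */
data work.wanted;
   length sex $1;
   input sex $ age;
   datalines;
F 12
M 12
M 20
;
run;

/* EXCLUSIVE drops input observations whose SEX/AGE combination
   is not listed in WORK.WANTED. */
proc means data=sashelp.class classdata=work.wanted exclusive n mean;
   class sex age;
   var height;
run;
```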

COMPLETETYPES

  • creates all possible combinations of class variables even if the combination does not occur in the input data set.

    Interaction: The PRELOADFMT option in the CLASS statement ensures that PROC MEANS writes all user-defined format ranges or values for the combinations of class variables to the output, even when a frequency is zero.

    Tip: Using COMPLETETYPES does not increase the memory requirements.

    Featured in: Example 6 on page 571

DATA= SAS-data-set

  • identifies the input SAS data set.

    Main discussion: Input Data Sets on page 19

DESCENDTYPES

  • orders observations in the output data set by descending _TYPE_ value.

    Alias: DESCENDING | DESCEND

    Interaction: DESCENDTYPES has no effect if you specify NWAY.

    Tip: Use DESCENDTYPES to make the overall total (_TYPE_=0) the last observation in each BY group.

    See also: Output Data Set on page 557

    Featured in: Example 9 on page 578

EXCLNPWGTS

  • excludes observations with nonpositive weight values (zero or negative) from the analysis. By default, PROC MEANS treats observations with negative weights like those with zero weights and counts them in the total number of observations.

    Alias: EXCLNPWGT

    See also: WEIGHT= on page 548 and WEIGHT Statement on page 549

EXCLUSIVE

  • excludes from the analysis all combinations of the class variables that are not found in the CLASSDATA= data set.

    Requirement: If a CLASSDATA= data set is not specified, then this option is ignored.

    Featured in: Example 4 on page 565

FW= field-width

  • specifies the field width to display the statistics in printed or displayed output. FW= has no effect on statistics that are saved in an output data set.

    Default: 12

    Tip: If PROC MEANS truncates column labels in the output, then increase the field width.

    Featured in: Example 1 on page 558, Example 4 on page 565, and Example 5 on page 568

IDMIN

  • specifies that the output data set contain the minimum value of the ID variables.

    Interaction: Specify PRINTIDVARS to display the value of the ID variables in the output.

    See also: ID Statement on page 540

MAXDEC= number

  • specifies the maximum number of decimal places to display the statistics in the printed or displayed output. MAXDEC= has no effect on statistics that are saved in an output data set.

    Default: BEST. width for columnar format, typically about 7.

    Range: 0-8

    Featured in: Example 2 on page 560 and Example 4 on page 565

MISSING

  • considers missing values as valid values to create the combinations of class variables. Special missing values that represent numeric values (the letters A through Z and the underscore (_) character) are each considered as a separate value.

    Default: If you omit MISSING, then PROC MEANS excludes the observations with a missing class variable value from the analysis.

    See also: SAS Language Reference: Concepts for a discussion of missing values that have special meaning.

    Featured in: Example 6 on page 571

NONOBS

  • suppresses the column that displays the total number of observations for each unique combination of the values of the class variables. This column corresponds to the _FREQ_ variable in the output data set.

    See also: The N Obs Statistic on page 556

    Featured in: Example 5 on page 568 and Example 6 on page 571

NOPRINT

  • See PRINT | NOPRINT on page 531.

NOTHREADS

  • See THREADS | NOTHREADS on page 534.

NOTRAP

  • disables floating point exception (FPE) recovery during data processing. By default, PROC MEANS traps these errors and sets the statistic to missing.

    In operating environments where the overhead of FPE recovery is significant, NOTRAP can improve performance. Note that normal SAS FPE handling is still in effect so that PROC MEANS terminates in the case of math exceptions.

NWAY

  • specifies that the output data set contain only statistics for the observations with the highest _TYPE_ and _WAY_ values. When you specify class variables, this corresponds to the combination of all class variables.

    Interaction: If you specify a TYPES statement or a WAYS statement, then PROC MEANS ignores this option.

    See also: Output Data Set on page 557

    Featured in: Example 10 on page 579

ORDER=DATA | FORMATTED | FREQ | UNFORMATTED

  • specifies the sort order to create the unique combinations for the values of the class variables in the output, where

    DATA

    • orders values according to their order in the input data set.

      Interaction: If you use PRELOADFMT in the CLASS statement, then the order for the values of each class variable matches the order that PROC FORMAT uses to store the values of the associated user-defined format. If you use the CLASSDATA= option, then PROC MEANS uses the order of the unique values of each class variable in the CLASSDATA= data set to order the output levels. If you use both options, then PROC MEANS first uses the user-defined formats to order the output. If you omit EXCLUSIVE, then PROC MEANS appends the unique values of the class variables in the input data set, in the order in which they are encountered, after the user-defined format values and the CLASSDATA= values.

      Tip: By default, PROC FORMAT stores a format definition in sorted order. Use the NOTSORTED option to store the values or ranges of a user-defined format in the order that you define them.

    FORMATTED

    • orders values by their ascending formatted values. This order depends on your operating environment.

      Alias: FMT | EXTERNAL

    FREQ

    • orders values by descending frequency count so that levels with the most observations are listed first.

      Interaction: For multiway combinations of the class variables, PROC MEANS determines the order of a class variable combination from the individual class variable frequencies.

      Interaction: Use the ASCENDING option in the CLASS statement to order values by ascending frequency count.

    UNFORMATTED

    • orders values by their unformatted values, which yields the same order as PROC SORT. This order depends on your operating environment.

      Alias: UNFMT | INTERNAL

    Default: UNFORMATTED

    See also: Ordering the Class Values on page 551

PCTLDEF=

  • See QNTLDEF= on page 532.

PRINT | NOPRINT

  • specifies whether PROC MEANS displays the statistical analysis. NOPRINT suppresses all the output.

    Default: PRINT

    Tip: Use NOPRINT when you want to create only an OUT= output data set.

    Featured in: For an example of NOPRINT, see Example 8 on page 576 and Example 12 on page 584

PRINTALLTYPES

  • displays all requested combinations of class variables (all _TYPE_ values) in the printed or displayed output. Normally, PROC MEANS shows only the NWAY type.

    Alias: PRINTALL

    Interaction: If you use the NWAY option, the TYPES statement, or the WAYS statement, then PROC MEANS ignores this option.

    Featured in: Example 4 on page 565

PRINTIDVARS

  • displays the values of the ID variables in printed or displayed output.

    Alias: PRINTIDS

    Interaction: Specify IDMIN to display the minimum value of the ID variables.

    See also: ID Statement on page 540

QMARKERS= number

  • specifies the default number of markers to use for the P² quantile estimation method. The number of markers controls the size of fixed memory space.

    Default: The default value depends on which quantiles you request. For the median (P50), number is 7. For the quartiles (P25 and P50), number is 25. For the quantiles P1, P5, P10, P90, P95, or P99, number is 105. If you request several quantiles, then PROC MEANS uses the largest value of number.

    Range: an odd integer greater than 3

    Tip: Increase the number of markers above the default settings to improve the accuracy of the estimate; reduce the number of markers to conserve memory and computing time.

    Main Discussion: Quantiles on page 555

QMETHOD=OS | P2 | HIST

  • specifies the method that PROC MEANS uses to process the input data when it computes quantiles. If the number of observations is less than or equal to the QMARKERS= value and QNTLDEF=5, then both methods produce the same results.

    OS

    • uses order statistics. This is the same method that PROC UNIVARIATE uses.

      Note: This technique can be very memory-intensive.

    P2 | HIST

    • uses the P² method to approximate the quantile.

    Default: OS

    Restriction: When QMETHOD=P2, PROC MEANS will not compute weighted quantiles.

    Tip: When QMETHOD=P2, reliable estimations of some quantiles (P1, P5, P95, P99) may not be possible for some data sets.

    Main Discussion: Quantiles on page 555
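
A short sketch of requesting approximate quantiles, assuming that SASHELP.CLASS is available. The QMARKERS= value of 25 is only an illustrative choice that satisfies the range stated above (an odd integer greater than 3).

```sas
/* Assumption: SASHELP.CLASS is available. */
proc means data=sashelp.class qmethod=p2 qmarkers=25 median q1 q3;
   var weight;     /* unweighted analysis: QMETHOD=P2 does not compute weighted quantiles */
run;
```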

QNTLDEF=1 | 2 | 3 | 4 | 5

  • specifies the mathematical definition that PROC MEANS uses to calculate quantiles when QMETHOD=OS. To use QMETHOD=P2, you must use QNTLDEF=5.

    Default: 5

    Alias: PCTLDEF=

    Main discussion: Quantile and Related Statistics on page 1359

statistic-keyword(s)

  • specifies which statistics to compute and the order to display them in the output.

    The available keywords in the PROC statement are

    Descriptive statistic keywords

    CLM                 RANGE
    CSS                 SKEWNESS | SKEW
    CV                  STDDEV | STD
    KURTOSIS | KURT     STDERR
    LCLM                SUM
    MAX                 SUMWGT
    MEAN                UCLM
    MIN                 USS
    N                   VAR
    NMISS

    Quantile statistic keywords

    MEDIAN | P50        Q3 | P75
    P1                  P90
    P5                  P95
    P10                 P99
    Q1 | P25            QRANGE

    Hypothesis testing keywords

    PROBT               T

    Default: N, MEAN, STD, MIN, and MAX

    Requirement: To compute standard error, confidence limits for the mean, and the Student's t-test, you must use the default value of the VARDEF= option, which is DF. To compute skewness or kurtosis, you must use VARDEF=N or VARDEF=DF.

    Tip: Use CLM or both LCLM and UCLM to compute a two-sided confidence limit for the mean. Use only LCLM or only UCLM to compute a one-sided confidence limit.

    Main discussion: The definitions of the keywords and the formulas for the associated statistics are listed in Keywords and Formulas on page 1354.

    Featured in: Example 1 on page 558 and Example 3 on page 563

SUMSIZE= value

  • specifies the amount of memory that is available for data summarization when you use class variables. value may be one of the following:

    n | nK | nM | nG

    • specifies the amount of memory available in bytes, kilobytes, megabytes, or gigabytes, respectively. If n is 0, then PROC MEANS uses the value of the SAS system option SUMSIZE=.

    MAXIMUM | MAX

    • specifies the maximum amount of memory that is available.

    Default: The value of the SUMSIZE= system option.

    Tip: For best results, do not make SUMSIZE= larger than the amount of physical memory that is available for the PROC step. If additional space is needed, then PROC MEANS uses utility files.

    See also: The SAS system option SUMSIZE= in SAS Language Reference: Dictionary.

    Main discussion: Computational Resources on page 552

THREADS | NOTHREADS

  • enables or disables parallel processing of the input data set. This option overrides the SAS system option THREADS | NOTHREADS. See SAS Language Reference: Concepts for more information about parallel processing.

    Default: value of SAS system option THREADS | NOTHREADS.

    Interaction: PROC MEANS honors the SAS system option THREADS except when a BY statement is specified or the value of the SAS system option CPUCOUNT is less than 2. You can use THREADS in the PROC MEANS statement to force PROC MEANS to use parallel processing in these situations.

VARDEF= divisor

  • specifies the divisor to use in the calculation of the variance and standard deviation. Table 28.1 on page 534 shows the possible values for divisor and associated divisors.

    Table 28.1: Possible Values for VARDEF=

    Value           Divisor                     Formula for Divisor
    DF              degrees of freedom          $n - 1$
    N               number of observations      $n$
    WDF             sum of weights minus one    $\left(\sum_i w_i\right) - 1$
    WEIGHT | WGT    sum of weights              $\sum_i w_i$

    The procedure computes the variance as CSS/divisor, where CSS is the corrected sum of squares and equals $\sum_i (x_i - \bar{x})^2$. When you weight the analysis variables, CSS equals $\sum_i w_i (x_i - \bar{x}_w)^2$, where $\bar{x}_w$ is the weighted mean.

    Default: DF

    Requirement: To compute the standard error of the mean, confidence limits for the mean, or the Student's t-test, use the default value of VARDEF=.

    Tip: When you use the WEIGHT statement and VARDEF=DF, the variance is an estimate of $\sigma^2$, where the variance of the $i$th observation is $\mathrm{var}(x_i) = \sigma^2 / w_i$ and $w_i$ is the weight for the $i$th observation. This yields an estimate of the variance of an observation with unit weight.

    Tip: When you use the WEIGHT statement and VARDEF=WGT, the computed variance is asymptotically (for large $n$) an estimate of $\sigma^2 / \bar{w}$, where $\bar{w}$ is the average weight. This yields an asymptotic estimate of the variance of an observation with average weight.

    See also: Weighted Statistics Example on page 65

    Main discussion: Keywords and Formulas on page 1354
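
To make the divisor concrete, here is a small sketch with a hypothetical three-observation data set, WORK.WEIGHTED. By the formulas above, the weighted mean is 14/6, the weighted CSS is 10/3, and the WDF divisor is (1 + 2 + 3) − 1 = 5, so the variance should work out to 2/3.

```sas
/* Hypothetical data: X is the analysis variable, W is its weight. */
data work.weighted;
   input x w;
   datalines;
1 1
2 2
3 3
;
run;

/* VARDEF=WDF divides the weighted CSS by (sum of weights - 1). */
proc means data=work.weighted vardef=wdf mean var;
   var x;
   weight w;
run;
```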

BY Statement

Produces separate statistics for each BY group.

Main discussion: BY on page 58

See also: Comparison of the BY and CLASS Statements on page 539

Featured in: Example 3 on page 563

BY <DESCENDING> variable-1 < <DESCENDING> variable-n > <NOTSORTED>;

Required Arguments

variable

  • specifies the variable that the procedure uses to form BY groups. You can specify more than one variable. If you omit the NOTSORTED option in the BY statement, then the observations in the data set either must be sorted by all the variables that you specify or must be indexed appropriately. Variables in a BY statement are called BY variables.

Options

DESCENDING

  • specifies that the observations are sorted in descending order by the variable that immediately follows the word DESCENDING in the BY statement.

NOTSORTED

  • specifies that observations are not necessarily sorted in alphabetic or numeric order. The observations are sorted in another way, for example, chronological order.

    The requirement for ordering or indexing observations according to the values of BY variables is suspended for BY-group processing when you use the NOTSORTED option. In fact, the procedure does not use an index if you specify NOTSORTED. The procedure defines a BY group as a set of contiguous observations that have the same values for all BY variables. If observations with the same values for the BY variables are not contiguous, then the procedure treats each contiguous set as a separate BY group.
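
A minimal sketch of BY-group processing, assuming that SASHELP.CLASS is available; WORK.CLASS_SORTED is a hypothetical name. The data set is sorted first because NOTSORTED is not specified.

```sas
/* Sort by the BY variable so that each BY group is contiguous and ordered. */
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

/* One set of statistics is produced for each value of SEX. */
proc means data=work.class_sorted n mean;
   by sex;
   var height weight;
run;
```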

Using the BY Statement with the SAS System Option NOBYLINE

If you use the BY statement with the SAS system option NOBYLINE, which suppresses the BY line that normally appears in output that is produced with BY-group processing, then PROC MEANS always starts a new page for each BY group. This behavior ensures that if you create customized BY lines by putting BY-group information in the title and suppressing the default BY lines with NOBYLINE, then the information in the titles matches the report on the pages. (See Creating Titles That Contain BY-Group Information on page 20 and Suppressing the Default BY Line on page 20.)
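
The following sketch shows one way to build such a customized title. It assumes that SASHELP.CLASS is available; WORK.CLASS_SORTED and the title text are illustrative.

```sas
proc sort data=sashelp.class out=work.class_sorted;
   by sex;
run;

options nobyline;                               /* suppress the default BY line      */
title "Height summary for Sex=#byval(sex)";     /* BY value is inserted in the title */

proc means data=work.class_sorted n mean;
   by sex;
   var height;
run;

options byline;                                 /* restore the default BY line       */
title;                                          /* clear the title                   */
```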

CLASS Statement

Specifies the variables whose values define the subgroup combinations for the analysis.

Tip: You can use multiple CLASS statements.

Tip: Some CLASS statement options are also available in the PROC MEANS statement. They affect all CLASS variables. Options that you specify in a CLASS statement apply only to the variables in that CLASS statement.

See also: For information about how the CLASS statement groups formatted values, see Formatted Values on page 25.

Featured in: Example 2 on page 560, Example 4 on page 565, Example 5 on page 568, Example 6 on page 571, and Example 10 on page 579

CLASS variable(s) </ options >;

Required Arguments

variable(s)

  • specifies one or more variables that the procedure uses to group the data. Variables in a CLASS statement are referred to as class variables. Class variables are numeric or character. Class variables can have continuous values, but they typically have a few discrete values that define levels of the variable. You do not have to sort the data by class variables.

    Interaction: Use the TYPES statement or the WAYS statement to control which class variables PROC MEANS uses to group the data.

    Tip: To reduce the number of class variable levels, use a FORMAT statement to combine variable values. When a format combines several internal values into one formatted value, PROC MEANS outputs the lowest internal value.

    See also: Using Class Variables on page 550

Options

ASCENDING

  • specifies to sort the class variable levels in ascending order.

    Alias: ASCEND

    Interaction: PROC MEANS issues a warning message if you specify both ASCENDING and DESCENDING and ignores both options.

    Featured in: Example 10 on page 579

DESCENDING

  • specifies to sort the class variable levels in descending order.

    Alias: DESCEND

    Interaction: PROC MEANS issues a warning message if you specify both ASCENDING and DESCENDING and ignores both options.

EXCLUSIVE

  • excludes from the analysis all combinations of the class variables that are not found in the preloaded range of user-defined formats.

    Requirement: You must specify PRELOADFMT to preload the class variable formats.

    Featured in: Example 6 on page 571

GROUPINTERNAL

  • specifies not to apply formats to the class variables when PROC MEANS groups the values to create combinations of class variables.

    Interaction: If you specify the PRELOADFMT option, then PROC MEANS ignores the GROUPINTERNAL option and uses the formatted values.

    Interaction: If you specify the ORDER=FORMATTED option, then PROC MEANS ignores the GROUPINTERNAL option and uses the formatted values.

    Tip: This option saves computer resources when the numeric class variables contain discrete values.

    See also: Computer Resources on page 539

MISSING

  • considers missing values as valid values for the class variable levels. Special missing values that represent numeric values (the letters A through Z and the underscore (_) character) are each considered as a separate value.

    Default: If you omit MISSING, then PROC MEANS excludes the observations with a missing class variable value from the analysis.

    See also: SAS Language Reference: Concepts for a discussion of missing values with special meanings.

    Featured in: Example 10 on page 579

MLF

  • enables PROC MEANS to use the primary and secondary format labels for a given range or overlapping ranges to create subgroup combinations when a multilabel format is assigned to a class variable.

    Requirement: You must use PROC FORMAT and the MULTILABEL option in the VALUE statement to create a multilabel format.

    Interaction: If you use the OUTPUT statement with MLF, then the class variable contains a character string that corresponds to the formatted value. Because the formatted value becomes the internal value, the length of this variable is the number of characters in the longest format label.

    Interaction: Using MLF with ORDER=FREQ may not produce the order that you expect for the formatted values.

    Tip: If you omit MLF, then PROC MEANS uses the primary format labels, which corresponds to using the first external format value, to determine the subgroup combinations.

    See also: The MULTILABEL option in the VALUE statement of the FORMAT procedure on page 440.

    Featured in: Example 5 on page 568

    Note: When the formatted values overlap, one internal class variable value maps to more than one class variable subgroup combination. Therefore, the sum of the N statistics for all subgroups is greater than the number of observations in the data set (the overall N statistic).
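
As an illustration of overlapping ranges, the sketch below defines a hypothetical multilabel format, AGEMLF, and applies it with MLF. It assumes that SASHELP.CLASS is available. Because every age falls into both a narrow range and the 'All ages' range, the subgroup N values should sum to more than the number of observations.

```sas
/* Hypothetical multilabel format with overlapping ranges. */
proc format;
   value agemlf (multilabel)
      11 - 13 = '11 to 13'
      14 - 16 = '14 to 16'
      11 - 16 = 'All ages';
run;

proc means data=sashelp.class n mean;
   class age / mlf;          /* use primary and secondary format labels */
   format age agemlf.;
   var height;
run;
```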

ORDER=DATA | FORMATTED | FREQ | UNFORMATTED

  • specifies the order to group the levels of the class variables in the output, where

    DATA

    • orders values according to their order in the input data set.

      Interaction: If you use PRELOADFMT, then the order of the values of each class variable matches the order that PROC FORMAT uses to store the values of the associated user-defined format. If you use the CLASSDATA= option in the PROC statement, then PROC MEANS uses the order of the unique values of each class variable in the CLASSDATA= data set to order the output levels. If you use both options, then PROC MEANS first uses the user-defined formats to order the output. If you omit EXCLUSIVE in the PROC statement, then PROC MEANS appends after the user-defined format and the CLASSDATA= values the unique values of the class variables in the input data set based on the order in which they are encountered.

      Tip: By default, PROC FORMAT stores a format definition in sorted order. Use the NOTSORTED option to store the values or ranges of a user-defined format in the order that you define them.

      Featured in: Example 10 on page 579

    FORMATTED

    • orders values by their ascending formatted values. This order depends on your operating environment. If no format has been assigned to a class variable, then the default format, BEST12., is used.

      Alias: FMT | EXTERNAL

      Featured in: Example 5 on page 568

    FREQ

    • orders values by descending frequency count so that levels with the most observations are listed first.

      Interaction: For multiway combinations of the class variables, PROC MEANS determines the order of a level from the individual class variable frequencies.

      Interaction: Use the ASCENDING option to order values by ascending frequency count.

      Featured in: Example 5 on page 568

    UNFORMATTED

    • orders values by their unformatted values, which yields the same order as PROC SORT. This order depends on your operating environment. This sort sequence is particularly useful for displaying dates chronologically.

      Alias: UNFMT | INTERNAL

    Default: UNFORMATTED

    Tip: By default, all orders except FREQ are ascending. For descending orders, use the DESCENDING option.

    See also: Ordering the Class Values on page 551

PRELOADFMT

  • specifies that all formats are preloaded for the class variables.

    Requirement: PRELOADFMT has no effect unless you specify either COMPLETETYPES, EXCLUSIVE, or ORDER=DATA and you assign formats to the class variables.

    Interaction: To limit PROC MEANS output to the combinations of formatted class variable values present in the input data set, use the EXCLUSIVE option in the CLASS statement.

    Interaction: To include all ranges and values of the user-defined formats in the output, even when the frequency is zero, use COMPLETETYPES in the PROC statement.

    Featured in: Example 6 on page 571
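
A sketch of PRELOADFMT combined with COMPLETETYPES, assuming that SASHELP.CLASS is available. The format AGEGRP is hypothetical; any of its ranges that has no matching observations should still appear in the output with a frequency of zero.

```sas
/* Hypothetical user-defined format that groups AGE into ranges. */
proc format;
   value agegrp
      low -< 13 = 'Under 13'
      13  -< 16 = '13 to 15'
      16 - high = '16 and over';
run;

proc means data=sashelp.class completetypes n mean;
   class age / preloadfmt;   /* preload all ranges of the assigned format */
   format age agegrp.;
   var height;
run;
```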

Comparison of the BY and CLASS Statements

Using the BY statement is similar to using the CLASS statement and the NWAY option in that PROC MEANS summarizes each BY group as an independent subset of the input data. Therefore, no overall summarization of the input data is available. However, unlike the CLASS statement, the BY statement requires that you previously sort BY variables.

When you use the NWAY option, PROC MEANS might encounter insufficient memory for the summarization of all the class variables. You can move some class variables to the BY statement. For maximum benefit, move class variables to the BY statement that are already sorted or that have the greatest number of unique values.

You can use the CLASS and BY statements together to analyze the data by the levels of class variables within BY groups. See Example 3 on page 563.

How PROC MEANS Handles Missing Values for Class Variables

By default, if an observation contains a missing value for any class variable, then PROC MEANS excludes that observation from the analysis. If you specify the MISSING option in the PROC statement, then the procedure considers missing values as valid levels for the combination of class variables.

Specifying the MISSING option in the CLASS statement allows you to control the acceptance of missing values for individual class variables.

Computer Resources

The total number of unique class values that PROC MEANS allows depends on the amount of computer memory that is available. See Computational Resources on page 552 for more information.

The GROUPINTERNAL option can improve computer performance because the grouping process is based on the internal values of the class variables. If a numeric class variable is not assigned a format and you do not specify GROUPINTERNAL, then PROC MEANS uses the default format, BEST12., to format numeric values as character strings. Then PROC MEANS groups these numeric variables by their character values, which takes additional time and computer memory.

FREQ Statement

Specifies a numeric variable that contains the frequency of each observation.

Main discussion: FREQ on page 61

FREQ variable ;

Required Arguments

variable

  • specifies a numeric variable whose value represents the frequency of the observation. If you use the FREQ statement, then the procedure assumes that each observation represents n observations, where n is the value of variable. If n is not an integer, then SAS truncates it. If n is less than 1 or is missing, then the procedure does not use that observation to calculate statistics.

    The sum of the frequency variable represents the total number of observations.

    Note: The FREQ variable does not affect how PROC MEANS identifies multiple extremes when you use the IDGROUP syntax in the OUTPUT statement.
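
The rules above can be sketched with a small hypothetical data set, WORK.COUNTS. Following those rules, the frequency 2.7 is truncated to 2, and the observations with a frequency of 0 or a missing frequency are not used, so the procedure should treat the data as three observations of 10 and two observations of 20.

```sas
/* Hypothetical data: SCORE is analyzed, COUNT is the frequency. */
data work.counts;
   input score count;
   datalines;
10 3
20 2.7
30 0
40 .
;
run;

proc means data=work.counts n mean sum;
   var score;
   freq count;     /* each observation represents COUNT observations */
run;
```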

ID Statement

Includes additional variables in the output data set.

See Also: Discussion of id-group-specification in OUTPUT Statement on page 540.

ID variable(s) ;

Required Arguments

variable(s)

  • identifies one or more variables from the input data set whose maximum values for groups of observations PROC MEANS includes in the output data set.

    Interaction: Use IDMIN in the PROC statement to include the minimum value of the ID variables in the output data set.

    Tip: Use the PRINTIDVARS option in the PROC statement to include the value of the ID variable in the displayed output.

Selecting the Values of the ID Variables

When you specify only one variable in the ID statement, the value of the ID variable for a given observation is the maximum (minimum) value found in the corresponding group of observations in the input data set. When you specify multiple variables in the ID statement, PROC MEANS selects the maximum value by processing the variables in the ID statement in the order that you list them. PROC MEANS determines which observation to use from all the ID variables by comparing the values of the first ID variable. If more than one observation contains the same maximum (minimum) ID value, then PROC MEANS uses the second and subsequent ID variable values as tiebreakers. In any case, all ID values are taken from the same observation for any given BY group or classification level within a type.

See Sorting Orders for Character Variables on page 1028 for information on how PROC MEANS compares character values to determine the maximum value.
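As a minimal sketch of the selection rule, the following step uses two ID variables. The data set WORK.SALES and its variables are assumptions made for this illustration only.

```sas
/* Hypothetical data: two East observations tie on the maximum Quarter. */
data work.sales;
   input Region $ Rep $ Quarter Amount;
   datalines;
East Ann 2 100
East Bob 2 150
East Cal 1 90
West Di 1 200
West Eve 2 120
;
run;

/* Quarter is compared first; Rep breaks the tie. Both ID values in */
/* an output row come from the same input observation.              */
proc means data=work.sales noprint;
   class Region;
   id Quarter Rep;
   var Amount;
   output out=work.idmax sum=TotalAmount;
run;

proc print data=work.idmax;
run;
```

Following the rule described above, the East row should carry Quarter=2 and Rep=Bob, because Bob sorts after Ann among the observations tied on Quarter. Adding IDMIN to the PROC MEANS statement selects minimum values instead.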

OUTPUT Statement

Writes statistics to a new SAS data set.

Tip: You can use multiple OUTPUT statements to create several OUT= data sets.

Featured in: Example 8 on page 576, Example 9 on page 578, Example 10 on page 579, Example 11 on page 581, and Example 12 on page 584

OUTPUT <OUT= SAS-data-set > < output-statistic-specification(s) >

  • < id-group-specification(s) > < maximum-id-specification(s) >

  • < minimum-id-specification(s) > </ option(s) >;

Options

OUT= SAS-data-set

  • names the new output data set. If SAS-data-set does not exist, then PROC MEANS creates it. If you omit OUT=, then the data set is named DATA n , where n is the smallest integer that makes the name unique.

  • Default: DATA n

  • Tip: You can use data set options with the OUT= option. See Data Set Options on page 18 for a list.

output-statistic-specification(s)

  • specifies the statistics to store in the OUT= data set and names one or more variables that contain the statistics. The form of the output-statistic-specification is

    • statistic-keyword <( variable-list )>=< name(s) >

  • where

  • statistic-keyword

    • specifies which statistic to store in the output data set. The available statistic keywords are

      Descriptive statistics keywords

      CSS                 RANGE
      CV                  SKEWNESS | SKEW
      KURTOSIS | KURT     STDDEV | STD
      LCLM                STDERR
      MAX                 SUM
      MEAN                SUMWGT
      MIN                 UCLM
      N                   USS
      NMISS               VAR

      Quantile statistics keywords

      MEDIAN | P50        Q3 | P75
      P1                  P90
      P5                  P95
      P10                 P99
      Q1 | P25            QRANGE

      Hypothesis testing keywords

      PROBT               T

    • By default, the statistics in the output data set automatically inherit the analysis variable's format, informat, and label. However, statistics computed for N, NMISS, SUMWGT, USS, CSS, VAR, CV, T, PROBT, SKEWNESS, and KURTOSIS do not inherit the analysis variable's format because this format might be invalid for these statistics (for example, dollar or datetime formats).

    • Restriction: If you omit variable and name(s) , then PROC MEANS allows the statistic-keyword only once in a single OUTPUT statement, unless you also use the AUTONAME option.

    • Featured in: Example 8 on page 576, Example 9 on page 578, Example 11 on page 581, and Example 12 on page 584

  • variable-list

    • specifies the names of one or more numeric analysis variables whose statistics you want to store in the output data set.

    • Default: all numeric analysis variables

  • name(s)

    • specifies one or more names for the variables in the output data set that will contain the analysis variable statistics. The first name contains the statistic for the first analysis variable; the second name contains the statistic for the second analysis variable; and so on.

    • Default: the analysis variable name. If you specify AUTONAME, then the default is the combination of the analysis variable name and the statistic-keyword .

    • Interaction: If you specify variable-list , then PROC MEANS uses the order in which you specify the analysis variables to store the statistics in the output data set variables.

    • Featured in: Example 8 on page 576

  • Default: If you use the CLASS statement and an OUTPUT statement without an output-statistic-specification , then the output data set contains five observations for each combination of class variables: the value of N, MIN, MAX, MEAN, and STD. If you use the WEIGHT statement or the WEIGHT option in the VAR statement, then the output data set also contains an observation with the sum of weights (SUMWGT) for each combination of class variables.

  • Tip: Use the AUTONAME option to have PROC MEANS generate unique names for multiple variables and statistics.
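The following sketch combines the three parts of an output-statistic-specification (keyword, optional variable list, optional names). The data set WORK.EXAM and its variables are hypothetical.

```sas
/* Hypothetical data: two analysis variables and one class variable. */
data work.exam;
   input Section $ Midterm Final;
   datalines;
A 70 80
A 90 85
B 60 75
B 80 95
;
run;

proc means data=work.exam noprint;
   class Section;
   var Midterm Final;
   output out=work.stats
          mean=MidtermMean FinalMean   /* names follow the VAR order  */
          max(Final)=FinalMax          /* one variable, one name      */
          std= / autoname;             /* names generated by AUTONAME */
run;

proc print data=work.stats;
run;
```

Because STD= is given without names, AUTONAME supplies them by combining each analysis variable name with the statistic keyword, as described under name(s) above.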

id-group-specification

  • combines and extends the features of the ID statement, the IDMIN option in the PROC statement, and the MAXID and MINID options in the OUTPUT statement to create an OUT= data set that identifies multiple extreme values. The form of the id-group-specification is

    • IDGROUP (<MIN|MAX ( variable-list-1 ) < MIN|MAX ( variable-list-n )>>

      • <<MISSING> <OBS> <LAST>> OUT <[ n ]>

      • ( id-variable-list )=< name(s) >)

  • MIN|MAX( variable-list )

    • specifies the selection criteria to determine the extreme values of one or more input data set variables specified in variable-list . Use MIN to determine the minimum extreme value and MAX to determine the maximum extreme value.

      When you specify multiple selection variables, the ordering of observations for the selection of n extremes is done the same way that PROC SORT sorts data with multiple BY variables. PROC MEANS concatenates the variable values into a single key. The MAX( variable-list ) selection criterion is similar to using PROC SORT and the DESCENDING option in the BY statement.

    • Default: If you do not specify MIN or MAX, then PROC MEANS uses the observation number as the selection criterion to output observations.

    • Restriction: If you specify criteria that are contradictory, then PROC MEANS uses only the first selection criterion.

    • Interaction: When multiple observations contain the same extreme values in all the MIN or MAX variables, PROC MEANS uses the observation number to resolve which observation to write to the output. By default, PROC MEANS uses the first observation to resolve any ties. However, if you specify the LAST option, then PROC MEANS uses the last observation to resolve any ties.

  • LAST

    • specifies that the OUT= data set contains values from the last observation (or the last n observations, if n is specified). If you do not specify LAST, then the OUT= data set contains values from the first observation (or the first n observations, if n is specified). The OUT= data set might contain several observations because in addition to the value of the last (first) observation, the OUT= data set contains values from the last (first) observation of each subgroup level that is defined by combinations of class variable values.

    • Interaction: When you specify MIN or MAX and when multiple observations contain the same extreme values, PROC MEANS uses the observation number to resolve which observation to save to the OUT= data set. If you specify LAST, then PROC MEANS uses the later observations to resolve any ties. If you do not specify LAST, then PROC MEANS uses the earlier observations to resolve any ties.

  • MISSING

    • specifies that missing values be used in selection criteria.

    • Alias: MISS

  • OBS

    • includes an _OBS_ variable in the OUT= data set that contains the number of the observation in the input data set where the extreme value was found.

    • Interaction: If you use WHERE processing, then the value of _OBS_ might not correspond to the location of the observation in the input data set.

    • Interaction: If you use [ n ] to write multiple extreme values to the output, then PROC MEANS creates n _OBS_ variables and uses the suffix n to create the variable names, where n is a sequential integer from 1 to n .

  • [ n ]

    • specifies the number of extreme values for each variable in id-variable-list to include in the OUT= data set. PROC MEANS creates n new variables and uses the suffix _n to create the variable names, where n is a sequential integer from 1 to n .

      By default, PROC MEANS determines one extreme value for each level of each requested type. If n is greater than one, then n extremes are output for each level of each type. When n is greater than one and you request extreme value selection, the time complexity is Θ(T * N log₂ n), where T is the number of types requested and N is the number of observations in the input data set. By comparison, to group the entire data set, the time complexity is Θ(N log₂ N).

    • Default: 1

    • Range: an integer between 1 and 100

    • Example: To output two minimum extreme values for each variable, use

       idgroup(min(x) out[2](x y z)=MinX MinY MinZ); 
      • The OUT= data set contains the variables MinX_1, MinX_2, MinY_1, MinY_2, MinZ_1, and MinZ_2.

  • ( id-variable-list )

    • identifies one or more input data set variables whose values PROC MEANS includes in the OUT= data set. PROC MEANS determines which observations to output by the selection criteria that you specify (MIN, MAX, and LAST).

  • name(s)

    • specifies one or more names for variables in the OUT= data set.

    • Default: If you omit name , then PROC MEANS uses the names of variables in the id-variable-list .

    • Tip: Use the AUTONAME option to automatically resolve naming conflicts.

  • Alias: IDGRP

  • Requirement: You must specify the MIN|MAX selection criteria first and OUT( id-variable-list )= after the suboptions MISSING, OBS, and LAST.

  • Tip: You can use id-group-specification to mimic the behavior of the ID statement and a maximum-id-specification or minimum-id-specification in the OUTPUT statement.

  • Tip: When you want the output data set to contain extreme values along with other id variables, it is more efficient to include them in the id-variable-list than to request separate statistics. For example, the statement

     output idgrp(max(x) out(x a b)= ); 

    is more efficient than the statement

     output idgrp(max(x) out(a b)= ) max(x)=; 
  • Featured in: Example 8 on page 576 and Example 12 on page 584

  • CAUTION:

    • The IDGROUP syntax allows you to create output variables with the same name. When this happens, only the first variable appears in the output data set. Use the AUTONAME option to automatically resolve these naming conflicts.

  • Note: If you specify fewer new variable names than the combination of analysis variables and identification variables, then the remaining output variables use the corresponding names of the ID variables as soon as PROC MEANS exhausts the list of new variable names.
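As an illustration of the IDGROUP syntax, the following sketch keeps the two largest values of an analysis variable per class level, together with an identifying variable. The data set WORK.SALES and the output names TopAmount and TopRep are hypothetical.

```sas
/* Hypothetical data: three sales observations per region. */
data work.sales;
   input Region $ Rep $ Amount;
   datalines;
East Ann 100
East Bob 150
East Cal 90
West Di 200
West Eve 120
West Flo 180
;
run;

/* MAX(Amount) is the selection criterion; OUT[2] requests the two */
/* most extreme observations and copies Amount and Rep from each.  */
proc means data=work.sales noprint;
   class Region;
   var Amount;
   output out=work.top2
          idgroup(max(Amount) out[2](Amount Rep)=TopAmount TopRep);
run;

proc print data=work.top2;
run;
```

Following the naming rule for [ n ], the output data set should contain TopAmount_1, TopAmount_2, TopRep_1, and TopRep_2. Listing Amount in the id-variable-list avoids a separate MAX(Amount)= request, in line with the efficiency tip above.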

maximum-id-specification(s)

  • specifies that one or more identification variables be associated with the maximum values of the analysis variables. The form of the maximum-id-specification is

    • MAXID <( variable-1 <( id-variable-list-1 )> < variable-n

      • <( id-variable-list-n )>>)> = name(s)

  • variable

    • identifies the numeric analysis variable whose maximum values PROC MEANS determines. PROC MEANS might determine several maximum values for a variable because, in addition to the overall maximum value, subgroup levels, which are defined by combinations of class variable values, also have maximum values.

    • Tip: If you use an ID statement and omit variable , then PROC MEANS uses all analysis variables.

  • id-variable-list

    • identifies one or more variables whose values identify the observations with the maximum values of the analysis variable.

    • Default: the ID statement variables

  • name(s)

    • specifies the names for new variables that contain the values of the identification variable associated with the maximum value of each analysis variable.

  • Tip: If you use an ID statement, and omit variable and id-variable , then PROC MEANS associates all ID statement variables with each analysis variable. Thus, for each analysis variable, the number of variables that are created in the output data set equals the number of variables that you specify in the ID statement.

  • Tip: Use the AUTONAME option to automatically resolve naming conflicts.

  • Limitation: If multiple observations contain the maximum value within a class level, then PROC MEANS saves the value of the ID variable for only the first of those observations in the output data set.

  • Featured in: Example 11 on page 581

  • CAUTION:

    • The MAXID syntax allows you to create output variables with the same name. When this happens, only the first variable appears in the output data set. Use the AUTONAME option to automatically resolve these naming conflicts.

  • Note: If you specify fewer new variable names than the combination of analysis variables and identification variables, then the remaining output variables use the corresponding names of the ID variables as soon as PROC MEANS exhausts the list of new variable names.

minimum-id-specification(s)

  • See the description of maximum-id-specification on page 544. This option behaves in exactly the same way, except that PROC MEANS determines the minimum values instead of the maximum values. The form of the minimum-id-specification is

    • MINID<( variable-1 <( id-variable-list-1 )> < variable-n

      • <( id-variable-list-n )>>)> = name(s)
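The following sketch pairs MAXID and MINID with the corresponding MAX and MIN statistics. The data set WORK.SALES and the output variable names are assumptions for illustration.

```sas
/* Hypothetical data: one identification variable (Rep) per sale. */
data work.sales;
   input Region $ Rep $ Amount;
   datalines;
East Ann 100
East Bob 150
West Di 200
West Eve 120
;
run;

/* For each level, store the extreme Amount and the Rep value from */
/* the observation where that extreme occurs.                      */
proc means data=work.sales noprint;
   class Region;
   var Amount;
   output out=work.best
          max=MaxAmount  maxid(Amount(Rep))=BestRep
          min=MinAmount  minid(Amount(Rep))=WorstRep;
run;

proc print data=work.best;
run;
```

If several observations share the extreme value within a level, only the first one supplies the identification value, as stated in the Limitation above.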

AUTOLABEL

  • specifies that PROC MEANS appends the statistic name to the end of the variable label. If an analysis variable has no label, then PROC MEANS creates a label by appending the statistic name to the analysis variable name.

  • Featured in: Example 12 on page 584

AUTONAME

  • specifies that PROC MEANS creates a unique variable name for an output statistic when you do not explicitly assign the variable name in the OUTPUT statement. This is accomplished by appending the statistic-keyword to the end of the input variable name from which the statistic was derived. For example, the statement

     output min(x)=/autoname; 

    produces the x_Min variable in the output data set.

  • AUTONAME activates the SAS internal mechanism to automatically resolve conflicts in the variable names in the output data set. Duplicate variables will not generate errors. As a result, the statement

     output min(x)= min(x)=/autoname; 

    produces two variables, x_Min and x_Min2, in the output data set.

  • Featured in: Example 12 on page 584

KEEPLEN

  • specifies that statistics in the output data set inherit the length of the analysis variable that PROC MEANS uses to derive them.

  • CAUTION:

    • You permanently lose numeric precision when the length of the analysis variable causes PROC MEANS to truncate or round the value of the statistic. However, the precision of the statistic will match that of the input.

LEVELS

  • includes a variable named _LEVEL_ in the output data set. This variable contains a value from 1 to n that indicates a unique combination of the values of class variables (the values of the _TYPE_ variable).

  • Main discussion: Output Data Set on page 557

  • Featured in: Example 8 on page 576

NOINHERIT

  • specifies that the variables in the output data set that contain statistics do not inherit the attributes (label and format) of the analysis variables that are used to derive them.

  • Tip: By default, the output data set includes an output variable for each analysis variable and five observations that contain N, MIN, MAX, MEAN, and STDDEV. Unless you specify NOINHERIT, this variable inherits the format of the analysis variable, which might be invalid for the N statistic (for example, datetime formats).

WAYS

  • includes a variable named _WAY_ in the output data set. This variable contains a value from 1 to the maximum number of class variables that indicates how many class variables PROC MEANS combines to create the _TYPE_ value.

  • Main discussion: Output Data Set on page 557

  • See also: WAYS Statement on page 548

  • Featured in: Example 8 on page 576
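As a short sketch of the LEVELS and WAYS options, the following step adds both variables to the output data set. The data set WORK.SALES and its variables are hypothetical.

```sas
/* Hypothetical data: two class variables and one analysis variable. */
data work.sales;
   input Region $ Quarter Amount;
   datalines;
East 1 100
East 2 150
West 1 200
West 2 120
;
run;

/* Options follow the slash in the OUTPUT statement. */
proc means data=work.sales noprint;
   class Region Quarter;
   var Amount;
   output out=work.typed sum=Total / levels ways;
run;

/* _TYPE_, _WAY_, and _LEVEL_ appear alongside the class variables. */
proc print data=work.typed;
run;
```

Printing the result shows how _WAY_ groups the types by the number of class variables combined, and how _LEVEL_ numbers the rows within each _TYPE_ value.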

TYPES Statement

Identifies which of the possible combinations of class variables to generate.

Main discussion: Output Data Set on page 557

Requirement: CLASS statement

Featured in: Example 2 on page 560, Example 5 on page 568, and Example 12 on page 584

TYPES request(s) ;

Required Arguments

request(s)

  • specifies which of the 2^k combinations of class variables PROC MEANS uses to create the types, where k is the number of class variables. A request is composed of one class variable name, several class variable names separated by asterisks, or ().

  • To request class variable combinations quickly, use a grouping syntax by placing parentheses around several variables and joining other variables or variable combinations. For example, the following statements illustrate grouping syntax:

    Request                  Equivalent to

    types A*(B C);           types A*B A*C;
    types (A B)*(C D);       types A*C A*D B*C B*D;
    types (A B C)*D;         types A*D B*D C*D;

  • Interaction: The CLASSDATA= option places constraints on the NWAY type. PROC MEANS generates all other types as if derived from the resulting NWAY type.

  • Tip: Use ( ) to request the overall total (_TYPE_=0).

  • Tip: If you do not need all types in the output data set, then use the TYPES statement to specify particular subtypes rather than applying a WHERE clause to the data set. Doing so saves time and computer memory.

Order of Analyses in the Output

The analyses are written to the output in order of increasing values of the _TYPE_ variable, which is calculated by PROC MEANS. The _TYPE_ variable has a unique value for each combination of class variables; the values are determined by how you specify the CLASS statement, not the TYPES statement. Therefore, if you specify

 class A B C;
 types (A B)*C;

then the B*C analysis (_TYPE_=3) is written first, followed by the A*C analysis (_TYPE_=5). However, if you specify

 class B A C;
 types (A B)*C;

then the A*C analysis comes first.

The _TYPE_ variable is calculated even if no output data set is requested. For more information about the _TYPE_ variable, see Output Data Set on page 557.
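The ordering rule can be checked with a sketch such as the following, which requests the overall total and two two-way types. The data set WORK.DEMO and the variable X are hypothetical.

```sas
/* Hypothetical data: every combination of three two-level class */
/* variables, with one analysis value per combination.           */
data work.demo;
   do A = 1 to 2;
      do B = 1 to 2;
         do C = 1 to 2;
            X = 100*A + 10*B + C;
            output;
         end;
      end;
   end;
run;

/* () requests the overall total; A*(B C) expands to A*B and A*C. */
proc means data=work.demo noprint;
   class A B C;
   types () A*(B C);
   var X;
   output out=work.bytype mean=MeanX;
run;

proc print data=work.bytype;
run;
```

With CLASS A B C, the rule above implies that the rows appear in the order _TYPE_=0 (overall), _TYPE_=5 (A*C), and _TYPE_=6 (A*B), regardless of the order in which the requests are written in the TYPES statement.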

VAR Statement

Identifies the analysis variables and their order in the output.

Default: If you omit the VAR statement, then PROC MEANS analyzes all numeric variables that are not listed in the other statements. When all variables are character variables, PROC MEANS produces a simple count of observations.

Tip: You can use multiple VAR statements.

See also: Chapter 47, The SUMMARY Procedure, on page 1191

Featured in: Example 1 on page 558

VAR variable(s) </ WEIGHT= weight-variable >;

Required Arguments

variable(s)

  • identifies the analysis variables and specifies their order in the results.

Option

WEIGHT= weight-variable

  • specifies a numeric variable whose values weight the values of the variables that are specified in the VAR statement. The values of the variable do not have to be integers. If the value of the weight variable is

    Weight value     PROC MEANS

    0                counts the observation in the total number of observations
    less than 0      converts the value to zero and counts the observation in the total number of observations
    missing          excludes the observation

    To exclude observations that contain negative and zero weights from the analysis, use EXCLNPWGT. Note that most SAS/STAT procedures, such as PROC GLM, exclude negative and zero weights by default.

    The weight variable does not change how the procedure determines the range, extreme values, or number of missing values.

  • Restriction: To compute weighted quantiles, use QMETHOD=OS in the PROC statement.

  • Restriction: Skewness and kurtosis are not available with the WEIGHT option.

  • Tip: When you use the WEIGHT option, consider which value of the VARDEF= option is appropriate. See the discussion of VARDEF= on page 534.

  • Tip: Use the WEIGHT option in multiple VAR statements to specify different weights for the analysis variables.

  • Note: Prior to Version 7 of SAS, the procedure did not exclude the observations with missing weights from the count of observations.
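The following sketch shows one way to apply different weights to different analysis variables by using the WEIGHT= option in separate VAR statements. The data set WORK.SURVEY and all of its variable names are hypothetical.

```sas
/* Hypothetical data: person-level and household-level weights. */
data work.survey;
   input Income PersonWt Rooms HouseWt;
   datalines;
30000 1.5 4 2.0
45000 0.8 6 1.0
52000 2.2 5 0.5
;
run;

/* Each VAR statement carries its own weight variable. */
proc means data=work.survey mean sum sumwgt;
   var Income / weight=PersonWt;
   var Rooms  / weight=HouseWt;
run;
```

Requesting SUMWGT alongside the other statistics makes it easy to confirm which weight variable was applied to each analysis variable.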

WAYS Statement

Specifies the number of ways to make unique combinations of class variables.

Tip: Use the TYPES statement to specify additional combinations of class variables.

Featured in: Example 6 on page 571

WAYS list ;

Required Arguments

list

  • specifies one or more integers that define the number of class variables to combine to form all the unique combinations of class variables. For example, you can specify 2 for all possible pairs and 3 for all possible triples. The list can be specified in the following ways:

    • m

      m1 m2 ... mn

      m1,m2,...,mn

      m TO n <BY increment >

      m1,m2, TO m3 <BY increment >, m4

  • Range: 0 to maximum number of class variables

  • Example: To create the two-way types for the classification variables A, B, and C, use

     class A B C;
     ways 2;
    • This WAYS statement is equivalent to specifying a*b, a*c, and b*c in the TYPES statement.

  • See also: WAYS option on page 546
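As a sketch of the list forms, the following step requests all one-way and two-way combinations with a TO range. The data set WORK.DEMO and the variable X are hypothetical.

```sas
/* Hypothetical data: every combination of three two-level class */
/* variables, with one analysis value per combination.           */
data work.demo;
   do A = 1 to 2;
      do B = 1 to 2;
         do C = 1 to 2;
            X = 100*A + 10*B + C;
            output;
         end;
      end;
   end;
run;

/* WAYS 1 TO 2 requests A, B, C and A*B, A*C, B*C. The WAYS option */
/* in the OUTPUT statement adds _WAY_ to the output data set.      */
proc means data=work.demo noprint;
   class A B C;
   ways 1 to 2;
   var X;
   output out=work.byway mean=MeanX / ways;
run;

proc print data=work.byway;
run;
```

The three-way type A*B*C is omitted because 3 is not in the list; adding it (for example, WAYS 1 TO 3) would include it.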

WEIGHT Statement

Specifies weights for observations in the statistical calculations.

See also: For information on how to calculate weighted statistics and for an example that uses the WEIGHT statement, see WEIGHT on page 63

WEIGHT variable ;

Required Arguments

variable

  • specifies a numeric variable whose values weight the values of the analysis variables. The values of the variable do not have to be integers. If the value of the weight variable is

    Weight value     PROC MEANS

    0                counts the observation in the total number of observations
    less than 0      converts the value to zero and counts the observation in the total number of observations
    missing          excludes the observation

  • To exclude observations that contain negative and zero weights from the analysis, use EXCLNPWGT. Note that most SAS/STAT procedures, such as PROC GLM, exclude negative and zero weights by default.

  • Restriction: To compute weighted quantiles, use QMETHOD=OS in the PROC statement.

  • Restriction: Skewness and kurtosis are not available with the WEIGHT statement.

  • Interaction: If you use the WEIGHT= option in a VAR statement to specify a weight variable, then PROC MEANS uses this variable instead to weight those VAR statement variables.

  • Tip: When you use the WEIGHT statement, consider which value of the VARDEF= option is appropriate. See the discussion of VARDEF= on page 534 and the calculation of weighted statistics in Keywords and Formulas on page 1354 for more information.

  • Note: Prior to Version 7 of SAS, the procedure did not exclude the observations with missing weights from the count of observations.

CAUTION:

  • Single extreme weight values can cause inaccurate results. When one (and only one) weight value is many orders of magnitude larger than the other weight values (for example, 49 weight values of 1 and one weight value of 1 × 10^14), certain statistics might not be within acceptable accuracy limits. The affected statistics are those that are based on the second moment (such as standard deviation, corrected sum of squares, variance, and standard error of the mean). Under certain circumstances, no warning is written to the SAS log.
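The handling of zero, negative, and missing weights can be explored with a sketch like the following, which runs the same analysis with and without EXCLNPWGT. The data set WORK.WTDEMO and the variables Y and W are hypothetical.

```sas
/* Hypothetical data: positive, zero, negative, and missing weights. */
data work.wtdemo;
   input Y W;
   datalines;
10 2
20 0
30 -1
40 .
50 3
;
run;

/* Default handling: per the table above, the zero and negative    */
/* weights are counted in N and the missing weight is excluded.    */
proc means data=work.wtdemo n mean sumwgt;
   var Y;
   weight W;
run;

/* EXCLNPWGT also drops observations with zero or negative weights. */
proc means data=work.wtdemo n mean sumwgt exclnpwgt;
   var Y;
   weight W;
run;
```

According to the rules described above, N should be 4 in the first step and 2 in the second, while the weighted mean is the same in both because zero weights contribute nothing to it.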




Base SAS 9.1.3 Procedures Guide (Vol. 1)
Base SAS 9.1 Procedures Guide, Volumes 1, 2, 3 and 4
ISBN: 1590472047
Year: 2004
Pages: 260
