Syntax | SAS/STAT 9.1 User's Guide (Vol. 5)

The following statements are available in PROC NPAR1WAY:

PROC NPAR1WAY < options > ;
- BY variables ;
- CLASS variable ;
- EXACT statistic-options < / computation-options > ;
- FREQ variable ;
- OUTPUT < OUT= SAS-data-set >< options > ;
- VAR variables ;

Both the PROC NPAR1WAY statement and the CLASS statement are required for the NPAR1WAY procedure. The rest of this section gives detailed syntax information for the BY, CLASS, EXACT, FREQ, OUTPUT, and VAR statements in alphabetical order after the description of the PROC NPAR1WAY statement. Table 52.1 summarizes the basic function of each PROC NPAR1WAY statement.

Table 52.1: Summary of PROC NPAR1WAY Statements
Statement	Description
BY	provides separate analyses for each BY group
CLASS	identifies the classification variable
EXACT	requests exact tests
FREQ	identifies a frequency variable
OUTPUT	requests an output data set
VAR	identifies analysis variables

PROC NPAR1WAY Statement

PROC NPAR1WAY < options > ;

The PROC NPAR1WAY statement invokes the procedure and optionally identifies the input data set or requests particular analyses. By default, the procedure uses the most recently created SAS data set and omits missing values from the analysis. If you do not specify any analysis options, PROC NPAR1WAY performs an analysis of variance (option ANOVA), tests for location differences (options WILCOXON, MEDIAN, SAVAGE, and VW), and performs empirical distribution function tests (option EDF).
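As a minimal sketch of the default behavior described above, the following step names no analysis options, so PROC NPAR1WAY would run its default set of analyses. The data set work.trial and the variables group and score are hypothetical and appear only for illustration.

```sas
/* Hypothetical data: two groups (A, B) and one numeric response */
data work.trial;
   input group $ score @@;
   datalines;
A 12 A 15 A 11 A 19 A 14
B 18 B 22 B 17 B 25 B 20
;

/* No analysis options are given, so the default analyses are performed */
proc npar1way data=work.trial;
   class group;   /* required: the single classification variable */
   var score;     /* numeric response variable to analyze */
run;
```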

The following table lists the options available with the PROC NPAR1WAY statement. Descriptions follow in alphabetical order.

Table 52.2: PROC NPAR1WAY Statement Options
Task	Options
Specify the input data set	DATA=
Include missing CLASS values	MISSING
Suppress all displayed output	NOPRINT
Request analyses	AB ANOVA D EDF KLOTZ MEDIAN MOOD SAVAGE SCORES=DATA ST VW WILCOXON
Suppress continuity correction	CORRECT=NO

You can specify the following options in the PROC NPAR1WAY statement:

AB

requests an analysis using Ansari-Bradley scores. See the section Ansari-Bradley Scores on page 3168 for more information.

ANOVA

requests a standard analysis of variance on the raw data.

CORRECT=NO

suppresses the continuity correction for the Wilcoxon two-sample test and the Siegel-Tukey two-sample test. See the section Simple Linear Rank Tests for Two-Sample Data on page 3163 for more information.

D

requests the one-sided Kolmogorov-Smirnov D+ and D− statistics and their asymptotic p-values, in addition to the two-sided D statistic produced by the EDF option for two-sample data. The D option invokes the EDF option. The statistics D+ and D− are provided automatically if you request exact Kolmogorov-Smirnov statistics with the KS option in the EXACT statement for two-sample data. See the section Tests Based on the Empirical Distribution Function on page 3168 for details on Kolmogorov-Smirnov statistics.

DATA= SAS-data-set

names the SAS data set to be analyzed by PROC NPAR1WAY. If you omit the DATA= option, the procedure uses the most recently created SAS data set.

EDF

requests statistics based on the empirical distribution function. These include the Kolmogorov-Smirnov and Cramer-von Mises tests and, if there are only two classification levels, the Kuiper test. See the section Tests Based on the Empirical Distribution Function on page 3168 for more information.

The EDF option produces the Kolmogorov-Smirnov D statistic for two-sample data. You can also request the one-sided D+ and D− statistics for two-sample data with the D option.

KLOTZ

requests an analysis using Klotz scores. See the section Klotz Scores on page 3168 for more information.

MEDIAN

requests an analysis using median scores. When there are two classification levels, or two samples, this option produces the two-sample median test. When there are more than two samples, this option produces the multisample median test, which is also known as the Brown-Mood test. See the section Median Scores on page 3167 for more information.

MISSING

treats missing values of the CLASS variable as a valid class level.

MOOD

requests an analysis using Mood scores. See the section Mood Scores on page 3168 for more information.

NOPRINT

suppresses the display of all output. You can use the NOPRINT option when you only want to create an output data set. Note that this option temporarily disables the Output Delivery System (ODS). For more information, see Chapter 14, Using the Output Delivery System.

SAVAGE

requests an analysis using Savage scores. See the section Savage Scores on page 3167 for more information.

SCORES=DATA

requests an analysis using input data as scores. This option gives you the flexibility to construct any scores for your data with the DATA step and then analyze these scores with PROC NPAR1WAY. See the section Scores for Linear Rank and One-Way ANOVA Tests on page 3166 for more information. Using the SCORES=DATA option for raw (unscored) two-sample data produces a permutation test known as Pitman's test.

ST

requests an analysis using Siegel-Tukey scores. See the section Siegel-Tukey Scores on page 3167 for more information.

VW

requests an analysis using Van der Waerden scores. See the section Van der Waerden Scores on page 3167 for more information.

WILCOXON

requests an analysis using Wilcoxon scores. When there are two classification levels, or two samples, this option produces the Wilcoxon rank-sum test. For any number of classification levels, this option produces the Kruskal-Wallis test. See the section Wilcoxon Scores on page 3166 for more information.
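The following sketch shows how the analysis options above might be combined in the PROC NPAR1WAY statement. The first step requests only Wilcoxon and median scores and suppresses the continuity correction. The second step analyzes scores constructed in a DATA step by using SCORES=DATA. The data set names, the variable names, and the log transformation are assumptions made for illustration.

```sas
/* Hypothetical two-sample data */
data work.trial;
   input group $ score @@;
   datalines;
A 12 A 15 A 11 A 19 A 14
B 18 B 22 B 17 B 25 B 20
;

/* Request only Wilcoxon and median analyses, without continuity correction */
proc npar1way data=work.trial wilcoxon median correct=no;
   class group;
   var score;
run;

/* Construct custom scores in a DATA step (a log transform, as an example) */
data work.scored;
   set work.trial;
   logscore = log(score);
run;

/* Analyze the constructed values directly as scores */
proc npar1way data=work.scored scores=data;
   class group;
   var logscore;
run;
```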

BY Statement

BY variables ;

You can specify a BY statement with PROC NPAR1WAY to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables.

If your input data set is not sorted in ascending order, use one of the following alternatives:

- Sort the data using the SORT procedure with a similar BY statement.
- Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the NPAR1WAY procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.
- Create an index on the BY variables using the DATASETS procedure.

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
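As an illustration of the first alternative, the sketch below sorts a data set with PROC SORT and then uses the same BY variable in PROC NPAR1WAY to obtain one analysis per BY group. The data set work.sites and the variables site, group, and score are hypothetical.

```sas
/* Hypothetical data: two sites, each with groups A and B */
data work.sites;
   input site $ group $ score @@;
   datalines;
north A 12 south A 15 north B 18 south B 22
north A 11 south A 19 north B 17 south B 25
north A 14 south A 13 north B 21 south B 20
;

/* Sort by the BY variable so that the procedure finds the groups in order */
proc sort data=work.sites;
   by site;
run;

/* One Wilcoxon analysis is produced for each value of site */
proc npar1way data=work.sites wilcoxon;
   by site;
   class group;
   var score;
run;
```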

CLASS Statement

CLASS variable ;

The CLASS statement, which is required, names one and only one classification variable. The variable can be character or numeric. The CLASS variable identifies groups (or samples) in the data, and PROC NPAR1WAY provides analyses to examine differences among these groups. There may be two or more groups in the data.

EXACT Statement

EXACT statistic-options < / computation-options > ;

The EXACT statement requests exact tests for the specified statistics. Optionally, PROC NPAR1WAY computes Monte Carlo estimates of the exact p-values. The statistic-options specify the statistics for which to provide exact tests, and the computation-options specify options for the computation of exact statistics.

CAUTION: PROC NPAR1WAY computes exact tests with fast and efficient algorithms that are superior to direct enumeration. Exact tests are appropriate when a data set is small, sparse, skewed, or heavily tied. For some large problems, computation of exact tests may require a large amount of time and memory. Consider using asymptotic tests for such problems. Alternatively, when asymptotic methods may not be sufficient for such large problems, consider using Monte Carlo estimation of exact p-values. See the section Computational Resources on page 3173 for more information.

Statistic-Options

The statistic-options specify the statistics for which to provide exact tests.

Exact p-values are available for all nonparametric tests of location and scale differences produced by PROC NPAR1WAY. These include tests based on the following scores: Wilcoxon, median, Van der Waerden, Savage, Siegel-Tukey, Ansari-Bradley, Klotz, and Mood scores. Additionally, exact p-values are available for tests using the raw input data as scores. The procedure computes exact p-values when the data are classified into two levels (two-sample tests) and when the data are classified into more than two levels (multisample tests). Two-sample tests are based on simple linear rank statistics. Multisample tests are based on one-way ANOVA statistics. Exact p-values are also available for the two-sample Kolmogorov-Smirnov test. See the section Exact Tests on page 3171 for details.

Table 52.3 lists the available statistic-options and the exact tests computed. The option names are identical to the corresponding options in the PROC NPAR1WAY statement and the OUTPUT statement.

Table 52.3: EXACT Statement Statistic-Options
Option	Exact Test Computed
AB	Ansari-Bradley Test
KLOTZ	Klotz Test
KS	Two-Sample Kolmogorov-Smirnov Test
MEDIAN	Median Test
MOOD	Mood Test
SAVAGE	Savage Test
SCORES=DATA	Test Using Input Data as Scores
ST	Siegel-Tukey Test
WILCOXON	Wilcoxon Test for Two-Sample Data
	Kruskal-Wallis Test for Multisample Data
VW	Van der Waerden Test

If you list no statistic-options in the EXACT statement, then PROC NPAR1WAY computes all the available exact p-values for those tests requested in the PROC NPAR1WAY statement.

Computation-Options

The computation-options specify options for computation of exact statistics. You can specify the following computation-options in the EXACT statement:

ALPHA= α

specifies the level of the confidence limits for Monte Carlo p-value estimates. The value of the ALPHA= option must be between 0 and 1, and the default is 0.01. A confidence level of α produces 100(1 − α)% confidence limits. The default of ALPHA=0.01 produces 99% confidence limits for the Monte Carlo estimates. The ALPHA= option invokes the MC option.

MAXTIME= value

specifies the maximum clock time (in seconds) that PROC NPAR1WAY can use to compute an exact p-value. If the procedure does not complete the computation within the specified time, the computation terminates. The value of the MAXTIME= option must be a positive number. The MAXTIME= option is valid for Monte Carlo estimation of exact p-values, as well as for direct exact p-value computation.

See the section Computational Resources on page 3173 for more information.

MC

requests Monte Carlo estimation of exact p-values, instead of direct exact p-value computation. Monte Carlo estimation can be useful for large problems that require a great amount of time and memory for exact computations but for which asymptotic approximations may not be sufficient. See the section Monte Carlo Estimation on page 3174 for more information.

The MC option is available for all the EXACT statement statistic-options. The ALPHA=, N=, and SEED= options also invoke the MC option.

N= n

specifies the number of samples for Monte Carlo estimation. The value of the N= option must be a positive integer, and the default is 10,000 samples. Larger values of n produce more precise estimates of exact p-values. Because larger values of n generate more samples, the computation time increases. The N= option invokes the MC option.

POINT

requests exact point probabilities for the test statistics.

The POINT option is available for all the EXACT statement statistic-options. The POINT option is not available with the MC option.

SEED= number

specifies the initial seed for random number generation for Monte Carlo estimation. The value of the SEED= option must be an integer. If you do not specify the SEED= option, or if the SEED= value is negative or zero, PROC NPAR1WAY uses the time of day from the computer's clock to obtain the initial seed. The SEED= option invokes the MC option.
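The two steps below sketch how the statistic-options and computation-options might be combined in the EXACT statement. The first step requests a direct exact Wilcoxon test with point probabilities and a time limit. The second step requests Monte Carlo estimation with a fixed seed. The data set work.trial, its variables, and the particular option values (60 seconds, 20,000 samples, seed 12345, ALPHA=0.05) are assumptions chosen for illustration.

```sas
/* Hypothetical two-sample data */
data work.trial;
   input group $ score @@;
   datalines;
A 12 A 15 A 11 A 19 A 14
B 18 B 22 B 17 B 25 B 20
;

/* Direct exact computation with point probabilities,   */
/* terminated if it takes longer than 60 seconds        */
proc npar1way data=work.trial wilcoxon;
   class group;
   var score;
   exact wilcoxon / point maxtime=60;
run;

/* Monte Carlo estimate of the exact p-value from 20,000 samples, */
/* with 95% confidence limits and a positive seed so that the     */
/* clock is not used to initialize the random number stream       */
proc npar1way data=work.trial wilcoxon;
   class group;
   var score;
   exact wilcoxon / mc n=20000 seed=12345 alpha=0.05;
run;
```

POINT and MC appear in separate steps because the POINT option is not available with the MC option.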

FREQ Statement

FREQ variable ;

The FREQ statement names a numeric variable that provides a frequency for each observation in the DATA= data set. If you use a FREQ statement, PROC NPAR1WAY assumes that an observation occurs n times, where n is the value of the FREQ variable for that observation. The sum of the FREQ variable values represents the total number of observations, and the analysis is based on this expanded number of observations.

If the value of the FREQ variable is missing or is less than one, PROC NPAR1WAY does not use that observation in the analysis. If the value of the FREQ variable is not an integer, PROC NPAR1WAY uses only the integer portion as the frequency of the observation.
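A small sketch of the FREQ statement follows. Each row of the hypothetical data set work.counts represents count identical observations, so the analysis would be based on 16 observations in total rather than on the 6 rows stored. The data set and variable names are illustrative only.

```sas
/* Hypothetical grouped data: count gives the number of */
/* observations that share the same group and score     */
data work.counts;
   input group $ score count;
   datalines;
A 1 4
A 2 3
A 3 1
B 1 1
B 2 2
B 3 5
;

/* Each row is treated as if it occurred count times */
proc npar1way data=work.counts wilcoxon;
   class group;
   var score;
   freq count;
run;
```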

OUTPUT Statement

OUTPUT < OUT= SAS-data-set >< options > ;

The OUTPUT statement creates a SAS data set containing statistics computed by PROC NPAR1WAY. You specify which statistics to store in the output data set, using options identical to those used in the PROC NPAR1WAY statement. The output data set contains one observation for each analysis variable named in the VAR statement. For more information on the contents of the output data set, see the section Output Data Set on page 3175.

Note that you can use the Output Delivery System (ODS) to create a SAS data set from any piece of PROC NPAR1WAY output. For more information, see Table 52.6 on page 3184 and Chapter 14, Using the Output Delivery System.

You can specify the following options in the OUTPUT statement:

OUT= SAS-data-set

names the output data set. If you omit the OUT= option, the data set is named DATAn, where n is the smallest integer that makes the name unique.

options

specifies the statistics you want in the new data set. The options are identical to those used in the PROC NPAR1WAY statement to request analyses. Table 52.4 shows the available options. When you specify one of these options in the OUTPUT statement, the output data set contains statistics from that analysis. See the section Output Data Set on page 3175 for a list of the output data set variables corresponding to each option.

Table 52.4: OUTPUT Statement Options
Option	Output Data Set Statistics
AB	Ansari-Bradley Test
ANOVA	Standard analysis of variance
EDF	Kolmogorov-Smirnov Test
	Cramer-von Mises Test
	Kuiper Test for Two-Sample Data
KLOTZ	Klotz Test
MEDIAN	Median Test
MOOD	Mood Test
SAVAGE	Savage Test
SCORES=DATA	Test Using Input Data as Scores
ST	Siegel-Tukey Test
WILCOXON	Wilcoxon Test for Two-Sample Data
	Kruskal-Wallis Test
VW	Van der Waerden Test

If you do not specify any statistics options in the OUTPUT statement, then the output data set includes statistics from all analyses specified in the PROC NPAR1WAY statement.
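As a sketch of the OUTPUT statement, the step below suppresses displayed output with NOPRINT and stores only the Wilcoxon statistics in an output data set, which is then printed. The names work.trial and work.wilcoxon_stats are hypothetical.

```sas
/* Hypothetical two-sample data */
data work.trial;
   input group $ score @@;
   datalines;
A 12 A 15 A 11 A 19 A 14
B 18 B 22 B 17 B 25 B 20
;

/* Create an output data set of Wilcoxon statistics without displaying results */
proc npar1way data=work.trial wilcoxon noprint;
   class group;
   var score;
   output out=work.wilcoxon_stats wilcoxon;
run;

/* Inspect the stored statistics: one observation per analysis variable */
proc print data=work.wilcoxon_stats;
run;
```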

VAR Statement

VAR variables ;

The VAR statement names the response or dependent variables to be analyzed. These variables must be numeric. If the VAR statement is omitted, the procedure analyzes all numeric variables in the data set except for the CLASS variable, the FREQ variable, and the BY variables.
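The following sketch contrasts omitting and specifying the VAR statement. In the first step, both numeric variables in the hypothetical data set work.labs would be analyzed; in the second, only weight is analyzed. The data set and its variables are assumptions for illustration.

```sas
/* Hypothetical data with one character CLASS variable and two numeric variables */
data work.labs;
   input group $ weight height;
   datalines;
A 61 170
A 72 181
A 58 165
A 66 174
B 80 178
B 69 169
B 75 183
B 77 176
;

/* No VAR statement: all numeric variables (weight and height) are analyzed */
proc npar1way data=work.labs wilcoxon;
   class group;
run;

/* VAR statement restricts the analysis to weight */
proc npar1way data=work.labs wilcoxon;
   class group;
   var weight;
run;
```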