Details


Missing Values

If an observation has a missing value for a response variable, PROC NPAR1WAY excludes that observation from the analysis.

By default, PROC NPAR1WAY excludes observations with missing values of the CLASS variable. If you specify the MISSING option, PROC NPAR1WAY treats missing values of the CLASS variable as a valid class level and includes these observations in the analysis.

PROC NPAR1WAY treats missing BY variable values like any other BY variable value. The missing values form a separate BY group. When a value of the FREQ variable is missing, PROC NPAR1WAY excludes the observation from the analysis.

Tied Values

Tied values occur when two or more observations are equal, whether the observations occur in the same sample or in different samples. In theory, nonparametric tests were developed for continuous distributions where the probability of a tie is zero. In practice, however, ties often occur. PROC NPAR1WAY uses the same method to handle ties for all score types. The procedure computes the scores as if there were no ties, averages the scores for tied observations, and assigns this average score to each observation with the same value.

When there are tied values, PROC NPAR1WAY first sorts the observations in ascending order and assigns ranks as if there were no ties. Then the procedure computes the scores based on these ranks, using the formula for the specified score type. The procedure averages the scores for tied observations and assigns this average score to each of the tied observations. Thus, all equal data values have the same score value. PROC NPAR1WAY then computes the test statistic from these scores.
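The tie-handling rule above can be sketched in Python as follows. This is a minimal illustration, not SAS code. The function name `tie_averaged_scores` and its `score(rank, n)` callback, which maps an untied rank to a score for whichever score type is in use, are hypothetical.

```python
from typing import Callable, List, Sequence

def tie_averaged_scores(values: Sequence[float],
                        score: Callable[[int, int], float]) -> List[float]:
    """Score observations as if untied, then average scores within tied groups."""
    n = len(values)
    order = sorted(range(n), key=lambda j: values[j])  # ascending sort
    result = [0.0] * n
    start = 0
    while start < n:
        # Extend `end` over the run of observations tied with order[start].
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        # The untied ranks for this run are start+1 .. end+1.
        ranks = range(start + 1, end + 2)
        average = sum(score(r, n) for r in ranks) / len(ranks)
        for k in range(start, end + 1):
            result[order[k]] = average  # every tied value gets the same score
        start = end + 1
    return result

# Wilcoxon scores are the ranks themselves, so averaging yields midranks.
def wilcoxon(rank: int, n: int) -> float:
    return float(rank)

print(tie_averaged_scores([3.1, 1.0, 2.5, 2.5], wilcoxon))  # [4.0, 1.0, 2.5, 2.5]
```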

Note that the asymptotic tests may be less accurate when the distribution of the data is heavily tied. For such data, it may be appropriate to use the exact tests provided by PROC NPAR1WAY as described in the section Exact Tests on page 3171.

When computing empirical distribution function statistics for data with ties, PROC NPAR1WAY uses the formulas given in the section Tests Based on the Empirical Distribution Function on page 3168. No special handling of ties is necessary.

Note that PROC NPAR1WAY bases its computations on the internal numeric values of the analysis variables; the procedure does not format or round these values before analysis. When values differ in their internal representation, even slightly, PROC NPAR1WAY does not treat them as tied values. If this is a concern for your data, then round the analysis variables by an appropriate amount before invoking PROC NPAR1WAY. For information on the ROUND function, refer to the discussion in SAS Language Reference: Dictionary.
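The same floating-point effect is easy to reproduce outside SAS. This Python sketch uses the built-in `round` only as an analogue of the SAS ROUND function. Rounding to 10 decimal places is an arbitrary assumption; an appropriate rounding unit depends on the precision of your data.

```python
# Two values that look equal when printed can still differ internally.
x = 0.1 + 0.2   # stored as 0.30000000000000004
y = 0.3

print(x == y)                         # False: these would not be treated as tied
print(round(x, 10) == round(y, 10))   # True: tied after rounding
```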

Statistical Computations

Simple Linear Rank Tests for Two-Sample Data

Statistics of the form

$$S = \sum_{j=1}^{n} c_j \, a(R_j)$$

are called simple linear rank statistics, where

$R_j$ is the rank of observation j

$a(R_j)$ is the score based on that rank

$c_j$ is an indicator variable denoting the class to which the jth observation belongs

n is the total number of observations

For two-sample data (where the observations are classified into two levels), PROC NPAR1WAY calculates simple linear rank statistics for the scores that you specify. The section Scores for Linear Rank and One-Way ANOVA Tests on page 3166 describes the available scores, which you can use to test for differences in location and differences in scale.

To compute S, PROC NPAR1WAY sums the scores of the observations in the smaller of the two samples. If both samples have the same number of observations, PROC NPAR1WAY sums the scores for the sample that appears first in the input data set.

For each score that you specify, PROC NPAR1WAY computes an asymptotic test of the null hypothesis of no difference between the two classification levels. Exact tests are also available for these two-sample linear rank statistics. PROC NPAR1WAY computes exact tests for each score type that you specify in the EXACT statement. See the section Exact Tests on page 3171 for details.

To compute an asymptotic test for a linear rank sum statistic, PROC NPAR1WAY uses a standardized test statistic z, which has an asymptotic standard normal distribution under the null hypothesis. The standardized test statistic is computed as

$$z = \frac{S - E(S)}{\sqrt{\mathrm{Var}(S)}}$$

where E(S) is the expected value of S under the null hypothesis, and Var(S) is the variance under the null hypothesis. As shown in Randles and Wolfe (1979),

$$E(S) = \frac{n_1}{n} \sum_{j=1}^{n} a(R_j)$$

where $n_1$ is the number of observations in the first (smaller) class level or sample, $n_2$ is the number of observations in the other class level, and

$$\mathrm{Var}(S) = \frac{n_1 n_2}{n(n-1)} \sum_{j=1}^{n} \left( a(R_j) - \bar{a} \right)^2$$

where $\bar{a}$ is the average score,

$$\bar{a} = \frac{1}{n} \sum_{j=1}^{n} a(R_j)$$
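A minimal Python sketch of S, E(S), Var(S), and z as defined above, including the rule for choosing which sample's scores are summed. It assumes that the scores a(R_j) have already been computed, with tied values averaged as described in the section on ties. The names `summed_level` and `linear_rank_z` are hypothetical.

```python
import math
from typing import Hashable, Sequence, Tuple

def summed_level(labels: Sequence[Hashable]) -> Hashable:
    """Level whose scores form S: the smaller sample, or the level that
    appears first in the data when both samples have the same size."""
    counts = {}
    for lab in labels:                 # dicts keep first-appearance order
        counts[lab] = counts.get(lab, 0) + 1
    first, second = counts             # ValueError unless exactly two levels
    return second if counts[second] < counts[first] else first

def linear_rank_z(scores: Sequence[float],
                  labels: Sequence[Hashable]) -> Tuple[float, float, float, float]:
    """Return (S, E(S), Var(S), z) with no continuity correction."""
    level = summed_level(labels)
    n = len(scores)
    n1 = sum(1 for lab in labels if lab == level)
    n2 = n - n1
    s = sum(a for a, lab in zip(scores, labels) if lab == level)
    a_bar = sum(scores) / n
    e_s = n1 * a_bar                   # equals (n1/n) * sum of all scores
    var_s = n1 * n2 / (n * (n - 1)) * sum((a - a_bar) ** 2 for a in scores)
    return s, e_s, var_s, (s - e_s) / math.sqrt(var_s)

# Wilcoxon scores (ranks 1..6); level "A" holds the three smallest ranks.
print(linear_rank_z([1, 2, 3, 4, 5, 6], ["A", "A", "A", "B", "B", "B"]))
```

With Wilcoxon scores and no ties, Var(S) reduces to the familiar $n_1 n_2 (n+1)/12$, which is 5.25 in this example.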

PROC NPAR1WAY computes one-sided and two-sided asymptotic p-values for each two-sample linear rank test. When the test statistic z is greater than its null hypothesis expected value of zero, PROC NPAR1WAY computes the right-sided p-value, which is the probability of a larger value of the statistic occurring under the null hypothesis. When the test statistic is less than or equal to zero, PROC NPAR1WAY computes the left-sided p-value, which is the probability of a smaller value of the statistic occurring under the null hypothesis. The one-sided p-value $P_1$ can be expressed as

$$P_1 = \begin{cases} \Pr(Z > z) & \text{if } z > 0 \\ \Pr(Z < z) & \text{if } z \le 0 \end{cases}$$

where Z has a standard normal distribution. The two-sided p-value $P_2$ is computed as

$$P_2 = \Pr(|Z| > |z|)$$

For Wilcoxon scores and Siegel-Tukey scores, PROC NPAR1WAY incorporates a continuity correction when computing the standardized test statistic z, unless you specify the CORRECT=NO option. PROC NPAR1WAY applies the continuity correction by subtracting 0.5 from the numerator S - E(S) if it is greater than zero. If the numerator is less than zero, PROC NPAR1WAY adds 0.5. Some sources recommend a continuity correction for nonparametric tests that use a continuous distribution to approximate a discrete distribution; refer to Sheskin (1997). If you specify CORRECT=NO, PROC NPAR1WAY does not use a continuity correction for any test.
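The p-value rules and the continuity correction above can be sketched as follows. The sketch assumes that S, E(S), and Var(S) are already known; the sample call uses the Wilcoxon values S = 6, E(S) = 10.5, and Var(S) = 5.25 for ranks 1 through 6. The function name `asymptotic_p_values` is hypothetical. Passing `correct=True` mirrors the default for Wilcoxon and Siegel-Tukey scores. Pass `correct=False` for other score types, or to mimic CORRECT=NO.

```python
from statistics import NormalDist

def asymptotic_p_values(s: float, e_s: float, var_s: float, correct: bool = True):
    """Return (z, one-sided p-value, two-sided p-value)."""
    numerator = s - e_s
    if correct:                      # continuity correction moves the numerator toward 0
        if numerator > 0:
            numerator -= 0.5
        elif numerator < 0:
            numerator += 0.5
    z = numerator / var_s ** 0.5
    cdf = NormalDist().cdf
    p1 = 1 - cdf(z) if z > 0 else cdf(z)   # right-sided if z > 0, else left-sided
    p2 = 2 * (1 - cdf(abs(z)))             # Pr(|Z| > |z|)
    return z, p1, p2

print(asymptotic_p_values(6, 10.5, 5.25))                 # with correction
print(asymptotic_p_values(6, 10.5, 5.25, correct=False))  # like CORRECT=NO
```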

One-Way ANOVA Tests

PROC NPAR1WAY computes a one-way ANOVA test for each score type that you specify. Under the null hypothesis of no difference among class levels (or samples), this test statistic has an asymptotic chi-square distribution with r - 1 degrees of freedom, where r is the number of class levels. For Wilcoxon scores, this test is known as the Kruskal-Wallis test.

Exact one-way ANOVA tests are also available for multisample data (where the data are classified into more than two levels). For two-sample data, exact simple linear rank tests are available. PROC NPAR1WAY computes exact tests for each score type that you specify in the EXACT statement. See the section Exact Tests on page 3171 for details on exact tests.

PROC NPAR1WAY computes the one-way ANOVA test statistic as

$$C = \frac{1}{S^2} \sum_{i=1}^{r} \frac{\left( T_i - E(T_i) \right)^2}{n_i}$$

where $T_i$ is the total of scores for class level i, $E(T_i)$ is the expected total for level i under the null hypothesis of no difference among levels, $n_i$ is the number of observations in level i, and $S^2$ is the sample variance of the scores.

$$T_i = \sum_{j=1}^{n} c_{ij} \, a(R_j)$$

where $a(R_j)$ is the score for observation j, and $c_{ij}$ indicates whether observation j is in level i.

$$E(T_i) = n_i \bar{a}, \qquad S^2 = \frac{1}{n-1} \sum_{j=1}^{n} \left( a(R_j) - \bar{a} \right)^2$$

where $\bar{a}$ is the average score,

$$\bar{a} = \frac{1}{n} \sum_{j=1}^{n} a(R_j)$$
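A minimal Python sketch of the one-way ANOVA score statistic C, assuming that the scores are already computed. The name `one_way_score_test` is hypothetical. With Wilcoxon scores and no ties, C reduces to the Kruskal-Wallis statistic.

```python
from collections import defaultdict
from typing import Hashable, Sequence, Tuple

def one_way_score_test(scores: Sequence[float],
                       levels: Sequence[Hashable]) -> Tuple[float, int]:
    """Return (C, degrees of freedom) for the one-way ANOVA score statistic."""
    n = len(scores)
    a_bar = sum(scores) / n
    s2 = sum((a - a_bar) ** 2 for a in scores) / (n - 1)   # sample variance S^2
    totals = defaultdict(float)   # T_i, total score per level
    counts = defaultdict(int)     # n_i, observations per level
    for a, level in zip(scores, levels):
        totals[level] += a
        counts[level] += 1
    # E(T_i) = n_i * a_bar
    c = sum((totals[i] - counts[i] * a_bar) ** 2 / counts[i] for i in totals) / s2
    return c, len(totals) - 1

# Wilcoxon scores with three levels: the Kruskal-Wallis statistic.
print(one_way_score_test([1, 2, 3, 4, 5, 6], ["A", "A", "B", "B", "C", "C"]))
```

With two levels, C equals $z^2$ from the two-sample test without continuity correction. This gives a quick consistency check between the two formulas.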

Scores for Linear Rank and One-Way ANOVA Tests

For each score type that you specify, PROC NPAR1WAY computes a one-way ANOVA statistic and also a linear rank statistic for two-sample data. The following score types are used primarily to test for differences in location: Wilcoxon, median, Van der Waerden, and Savage. The following score types are used to test for scale differences: Siegel-Tukey, Ansari-Bradley, Klotz, and Mood. This section gives formulas for the score types. For further information on the formulas and the applicability of each score, refer to Randles and Wolfe (1979), Gibbons and Chakraborti (1992), Conover (1999), and Hollander and Wolfe (1999).

In addition to the score types described in this section, you can specify the SCORES=DATA option to use the input data observations as scores. This enables you to produce a very wide variety of tests. You can construct any scores using the DATA step, and then PROC NPAR1WAY computes the corresponding linear rank and one-way ANOVA tests. You can also analyze the raw data with the SCORES=DATA option; for two-sample data, this permutation test is known as Pitman's test.

Wilcoxon Scores

Wilcoxon scores are the ranks of the observations.

Using Wilcoxon scores in the linear rank statistic for two-sample data produces the rank sum statistic of the Mann-Whitney-Wilcoxon test. Using Wilcoxon scores in the one-way ANOVA statistic produces the Kruskal-Wallis test. Wilcoxon scores are locally most powerful for location shifts of a logistic distribution.

When computing the asymptotic Wilcoxon two-sample test, PROC NPAR1WAY uses a continuity correction by default, as described in the section Simple Linear Rank Tests for Two-Sample Data on page 3163. If you specify CORRECT=NO in the PROC NPAR1WAY statement, the procedure does not use a continuity correction.

Median Scores

Median scores equal 1 for observations greater than the median, and 0 otherwise.

$$a(R_j) = \begin{cases} 1 & \text{if } R_j > \frac{n+1}{2} \\ 0 & \text{if } R_j \le \frac{n+1}{2} \end{cases}$$

Using median scores in the linear rank statistic for two-sample data produces the two-sample median test. The one-way ANOVA statistic with median scores is equivalent to the Brown-Mood test. Median scores are particularly powerful for distributions that are symmetric and heavy-tailed.
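The score formulas in this section are easy to tabulate. This Python sketch returns median scores for untied ranks 1 through n. When there are ties, these scores would then be averaged over tied observations, as described in the section Tied Values. The function name is illustrative only.

```python
from typing import List

def median_scores(n: int) -> List[float]:
    """Median scores for untied ranks 1..n: 1 above (n+1)/2, otherwise 0."""
    middle = (n + 1) / 2
    return [1.0 if rank > middle else 0.0 for rank in range(1, n + 1)]

print(median_scores(5))  # [0.0, 0.0, 0.0, 1.0, 1.0]
print(median_scores(4))  # [0.0, 0.0, 1.0, 1.0]
```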

Van der Waerden Scores

Van der Waerden scores are the quantiles of a standard normal distribution. These scores are also known as quantile normal scores.

$$a(R_j) = \Phi^{-1}\left( \frac{R_j}{n+1} \right)$$

where $\Phi$ is the cumulative distribution function of a standard normal distribution. These scores are powerful for normal distributions.
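A short sketch of Van der Waerden scores for untied ranks 1 through n. It uses Python's `statistics.NormalDist().inv_cdf` as $\Phi^{-1}$. The function name is illustrative.

```python
from statistics import NormalDist
from typing import List

def van_der_waerden_scores(n: int) -> List[float]:
    """a(R_j) = Phi^{-1}(R_j / (n + 1)) for untied ranks 1..n."""
    inverse_cdf = NormalDist().inv_cdf       # standard normal quantile function
    return [inverse_cdf(rank / (n + 1)) for rank in range(1, n + 1)]

print(van_der_waerden_scores(3))  # approximately [-0.674, 0.0, 0.674]
```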

Savage Scores

Savage scores are expected values of order statistics from the exponential distribution, with 1 subtracted to center the scores around 0.

$$a(R_j) = \sum_{i = n - R_j + 1}^{n} \frac{1}{i} - 1$$

Savage scores are powerful for comparing scale differences in exponential distributions or location shifts in extreme value distributions (Hajek 1969, p. 83).
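A sketch of Savage scores for untied ranks 1 through n (the function name is illustrative). The scores sum to zero because the expected order statistics of n standard exponential variables sum to n. Subtracting 1 from each of the n scores therefore centers them.

```python
from typing import List

def savage_scores(n: int) -> List[float]:
    """a(R_j) = sum_{i = n - R_j + 1}^{n} 1/i - 1 for untied ranks 1..n."""
    return [sum(1.0 / i for i in range(n - rank + 1, n + 1)) - 1.0
            for rank in range(1, n + 1)]

scores = savage_scores(3)
print(scores)        # approximately [-0.667, -0.167, 0.833]
print(sum(scores))   # approximately 0, because the scores are centered
```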

Siegel-Tukey Scores

Siegel-Tukey scores are computed as

$$a(1) = 1,\; a(n) = 2,\; a(n-1) = 3,\; a(2) = 4,\; a(3) = 5,\; a(n-2) = 6,\; a(n-3) = 7,\; a(4) = 8,\; \ldots$$

where the score values continue to increase in this pattern towards the middle ranks until all observations have been assigned a score.
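The alternating pattern can be generated with two pointers that move in from the low and high ends. The first score goes to the lowest rank, and after that the scores go in pairs, alternating between the high and low ends. This is a hypothetical helper written to reproduce the sequence shown above, not SAS code.

```python
from typing import List

def siegel_tukey_scores(n: int) -> List[int]:
    """Siegel-Tukey scores for untied ranks 1..n (returned in rank order)."""
    scores = [0] * (n + 1)            # 1-based: scores[rank]
    lo, hi = 1, n
    next_score = 1
    take_low, remaining = True, 1     # first block: one rank from the low end
    while lo <= hi:
        if take_low:
            scores[lo] = next_score
            lo += 1
        else:
            scores[hi] = next_score
            hi -= 1
        next_score += 1
        remaining -= 1
        if remaining == 0:            # switch ends; later blocks take two ranks
            take_low, remaining = not take_low, 2
    return scores[1:]

print(siegel_tukey_scores(8))  # [1, 4, 5, 8, 7, 6, 3, 2]
```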

Ansari-Bradley Scores

Ansari-Bradley scores are similar to Siegel-Tukey scores, but Ansari-Bradley assigns the same scores to corresponding extreme ranks. (Siegel-Tukey scores are a permutation of the ranks 1, 2, ..., n.)

$$a(1) = 1,\; a(2) = 2,\; a(3) = 3,\; \ldots,\; a(n-2) = 3,\; a(n-1) = 2,\; a(n) = 1$$

Equivalently, Ansari-Bradley scores are defined as

$$a(R_j) = \frac{n+1}{2} - \left| R_j - \frac{n+1}{2} \right|$$

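The closed form for Ansari-Bradley scores reproduces the symmetric pattern directly. A short sketch follows; the function name is illustrative.

```python
from typing import List

def ansari_bradley_scores(n: int) -> List[float]:
    """a(R_j) = (n + 1)/2 - |R_j - (n + 1)/2| for untied ranks 1..n."""
    middle = (n + 1) / 2
    return [middle - abs(rank - middle) for rank in range(1, n + 1)]

print(ansari_bradley_scores(5))  # [1.0, 2.0, 3.0, 2.0, 1.0]
print(ansari_bradley_scores(4))  # [1.0, 2.0, 2.0, 1.0]
```
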
Klotz Scores

Klotz scores are the squares of the Van der Waerden (or quantile normal) scores.

$$a(R_j) = \left( \Phi^{-1}\left( \frac{R_j}{n+1} \right) \right)^2$$

where $\Phi$ is the cumulative distribution function of a standard normal distribution.

Mood Scores

Mood scores are computed as the square of the difference between each rank and the average rank.

$$a(R_j) = \left( R_j - \frac{n+1}{2} \right)^2$$
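Klotz and Mood scores are both scale scores: extreme ranks at either end receive large scores, and middle ranks receive small ones. A combined sketch for untied ranks follows. The function names are illustrative, and `NormalDist().inv_cdf` plays the role of $\Phi^{-1}$.

```python
from statistics import NormalDist
from typing import List

def klotz_scores(n: int) -> List[float]:
    """Squared Van der Waerden scores: (Phi^{-1}(R_j / (n + 1)))^2."""
    inverse_cdf = NormalDist().inv_cdf
    return [inverse_cdf(rank / (n + 1)) ** 2 for rank in range(1, n + 1)]

def mood_scores(n: int) -> List[float]:
    """Squared distance of each rank from the average rank (n + 1)/2."""
    middle = (n + 1) / 2
    return [(rank - middle) ** 2 for rank in range(1, n + 1)]

print(mood_scores(4))   # [2.25, 0.25, 0.25, 2.25]
print(klotz_scores(3))  # approximately [0.455, 0.0, 0.455]
```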

Tests Based on the Empirical Distribution Function

If you specify the EDF option, PROC NPAR1WAY computes tests based on the empirical distribution function. These include the Kolmogorov-Smirnov and Cramer-von Mises tests, and also the Kuiper test for two-sample data. This section gives formulas for these test statistics. For further information on the formulas and the interpretation of EDF statistics, refer to Hollander and Wolfe (1999) and Gibbons and Chakraborti (1992). For details on the k -sample analogues of the Kolmogorov-Smirnov and Cramer-von Mises statistics used by NPAR1WAY, refer to Kiefer (1959).

The empirical distribution function (EDF) of a sample $\{x_j\}$, $j = 1, 2, \ldots, n$, is defined as the following function:

$$F(x) = \frac{1}{n} \sum_{j=1}^{n} I(x_j \le x)$$

where $I(\cdot)$ is an indicator function. PROC NPAR1WAY uses the subsample of values within the ith class level to generate an EDF for the class, $F_i$. The EDF for the overall sample, pooled over classes, can also be expressed as

$$F(x) = \frac{1}{n} \sum_{i=1}^{r} n_i F_i(x)$$

where $n_i$ is the number of observations in the ith class level, and n is the total number of observations.
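The two EDF definitions can be checked against each other numerically. This Python sketch (function names and sample data are illustrative) computes the pooled EDF as the weighted average of the class EDFs and compares it with the EDF of the combined sample.

```python
from typing import Dict, Sequence

def edf(sample: Sequence[float], x: float) -> float:
    """F(x) = (1/n) * sum_j I(x_j <= x)."""
    return sum(1 for value in sample if value <= x) / len(sample)

def pooled_edf(groups: Dict[str, Sequence[float]], x: float) -> float:
    """F(x) = (1/n) * sum_i n_i * F_i(x), with F_i the EDF of class level i."""
    n = sum(len(g) for g in groups.values())
    return sum(len(g) * edf(g, x) for g in groups.values()) / n

groups = {"A": [1.0, 2.0, 3.0], "B": [2.0, 4.0]}
# The weighted average of the class EDFs equals the EDF of the pooled sample.
print(pooled_edf(groups, 2.0), edf([1.0, 2.0, 3.0, 2.0, 4.0], 2.0))
```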

Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov statistic measures the maximum deviation of the EDF within the classes from the pooled EDF. PROC NPAR1WAY computes the Kolmogorov-Smirnov statistic as

$$KS = \max_j \sqrt{ \frac{1}{n} \sum_{i=1}^{r} n_i \left( F_i(x_j) - F(x_j) \right)^2 }, \quad j = 1, 2, \ldots, n$$

The asymptotic Kolmogorov-Smirnov statistic is computed as

$$KS_a = KS \times \sqrt{n}$$

For each class level i and overall, PROC NPAR1WAY displays the value of $F_i$ at the maximum deviation from F and the value $(F_i - F)$ at the maximum deviation from F. PROC NPAR1WAY also gives the observation where the maximum deviation occurs.
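A minimal Python sketch of the k-sample statistic, evaluating the deviation at every observation $x_j$. It takes $KS_a = KS \times \sqrt{n}$ as the asymptotic scaling. The function name is hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

def ks_statistic(groups: Dict[str, Sequence[float]]) -> Tuple[float, float]:
    """Return (KS, KS_a) for k class levels."""
    pooled = [x for g in groups.values() for x in g]
    n = len(pooled)

    def edf(sample: Sequence[float], x: float) -> float:
        return sum(1 for v in sample if v <= x) / len(sample)

    ks = max(
        math.sqrt(sum(len(g) * (edf(g, x) - edf(pooled, x)) ** 2
                      for g in groups.values()) / n)
        for x in pooled                     # evaluate at every observation x_j
    )
    return ks, ks * math.sqrt(n)

print(ks_statistic({"A": [1.0, 2.0], "B": [3.0, 4.0]}))  # (0.5, 1.0)
```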

If there are only two class levels, PROC NPAR1WAY computes the two-sample Kolmogorov-Smirnov test statistic D as

$$D = \max_j \left| F_1(x_j) - F_2(x_j) \right|, \quad j = 1, 2, \ldots, n$$

The p-value for this test is the probability that D is greater than the observed value d under the null hypothesis of no difference between class levels or samples. PROC NPAR1WAY computes the asymptotic p-value for D with the approximation

$$\Pr(D > d) = 2 \sum_{i=1}^{\infty} (-1)^{i-1} \exp\left( -2 i^2 z^2 \right)$$

where

$$z = d \sqrt{\frac{n_1 n_2}{n}}$$

The quality of this approximation has been studied by Hodges (1957).
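A Python sketch of D and its asymptotic p-value. For $z \ge 1$ it sums the alternating series given above. For $0.05 \le z < 1$ it uses an equivalent, faster-converging series for the same limiting Kolmogorov distribution, and below $z = 0.05$ it returns 1. This switch between series is a choice made for this sketch, not necessarily how PROC NPAR1WAY evaluates the approximation. All function names are hypothetical.

```python
import math
from typing import Sequence

def two_sample_d(x1: Sequence[float], x2: Sequence[float]) -> float:
    """D = max_j |F_1(x_j) - F_2(x_j)| over all observations."""
    def edf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)
    return max(abs(edf(x1, x) - edf(x2, x)) for x in list(x1) + list(x2))

def kolmogorov_sf(z: float) -> float:
    """Pr(K > z) for the limiting Kolmogorov distribution."""
    if z < 0.05:
        return 1.0          # the CDF is below 1e-200 here
    if z < 1.0:
        # Equivalent small-z series for the CDF; converges quickly here.
        cdf = math.sqrt(2 * math.pi) / z * sum(
            math.exp(-((2 * i - 1) ** 2) * math.pi ** 2 / (8 * z * z))
            for i in range(1, 20))
        return 1.0 - cdf
    # The alternating series from the approximation above.
    return 2 * sum((-1) ** (i - 1) * math.exp(-2 * i * i * z * z)
                   for i in range(1, 101))

def ks_two_sample_p(x1: Sequence[float], x2: Sequence[float]) -> float:
    """Asymptotic Pr(D > d), with z = d * sqrt(n1 * n2 / n)."""
    n1, n2 = len(x1), len(x2)
    z = two_sample_d(x1, x2) * math.sqrt(n1 * n2 / (n1 + n2))
    return kolmogorov_sf(z)
```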

If you specify the D option, or if you request exact Kolmogorov-Smirnov p-values with the KS option in the EXACT statement, PROC NPAR1WAY also computes the one-sided Kolmogorov-Smirnov statistics $D^+$ and $D^-$ for two-sample data.

$$D^+ = \max_j \left( F_1(x_j) - F_2(x_j) \right), \qquad D^- = \max_j \left( F_2(x_j) - F_1(x_j) \right), \quad j = 1, 2, \ldots, n$$

The asymptotic probability that $D^+$ is greater than the observed value $d^+$, under the null hypothesis of no difference between the two class levels, is computed as

$$\Pr(D^+ > d^+) = \exp\left( -2 z^2 \right), \quad z = d^+ \sqrt{\frac{n_1 n_2}{n}}$$

Similarly, the asymptotic probability that $D^-$ is greater than the observed value $d^-$ is computed as

$$\Pr(D^- > d^-) = \exp\left( -2 z^2 \right), \quad z = d^- \sqrt{\frac{n_1 n_2}{n}}$$
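A short sketch of the one-sided statistics and their asymptotic tail probabilities, following the formulas just given. The function name `one_sided_ks` is hypothetical.

```python
import math
from typing import Sequence, Tuple

def one_sided_ks(x1: Sequence[float],
                 x2: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (D+, Pr(D+ > d+), D-, Pr(D- > d-)) using exp(-2 z^2)."""
    def edf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)
    points = list(x1) + list(x2)
    d_plus = max(edf(x1, x) - edf(x2, x) for x in points)
    d_minus = max(edf(x2, x) - edf(x1, x) for x in points)
    scale = math.sqrt(len(x1) * len(x2) / len(points))   # sqrt(n1 * n2 / n)
    return (d_plus, math.exp(-2 * (d_plus * scale) ** 2),
            d_minus, math.exp(-2 * (d_minus * scale) ** 2))

# Sample 1 lies entirely below sample 2, so F_1 exceeds F_2: D+ = 1, D- = 0.
print(one_sided_ks([1.0, 2.0], [3.0, 4.0]))
```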

To request exact p-values for the Kolmogorov-Smirnov statistics, you can specify the KS option in the EXACT statement. See the section Exact Tests on page 3171 for more information.

Cramer-von Mises Test

The Cramer-von Mises statistic is defined as

$$CM = \frac{1}{n^2} \sum_{i=1}^{r} n_i \sum_{j=1}^{p} t_j \left( F_i(x_j) - F(x_j) \right)^2$$

where $t_j$ is the number of ties at the jth distinct value and p is the number of distinct values. The asymptotic value is computed as

$$CM_a = CM \times n$$

PROC NPAR1WAY displays the contribution of each class level to the sum $CM_a$.
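A Python sketch of CM, $CM_a$, and the per-level contributions to $CM_a$. It assumes that the contribution of level i is $\frac{n_i}{n} \sum_j t_j (F_i(x_j) - F(x_j))^2$, so that the contributions add up to $CM_a$. The function name is hypothetical.

```python
from typing import Dict, Sequence, Tuple

def cramer_von_mises(groups: Dict[str, Sequence[float]]
                     ) -> Tuple[float, float, Dict[str, float]]:
    """Return (CM, CM_a, per-level contributions to CM_a)."""
    pooled = [x for g in groups.values() for x in g]
    n = len(pooled)

    def edf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)

    distinct = sorted(set(pooled))                    # the p distinct values
    ties = {x: pooled.count(x) for x in distinct}     # t_j
    contributions = {
        name: len(g) / n * sum(ties[x] * (edf(g, x) - edf(pooled, x)) ** 2
                               for x in distinct)
        for name, g in groups.items()
    }
    cm_a = sum(contributions.values())
    return cm_a / n, cm_a, contributions

print(cramer_von_mises({"A": [1.0, 2.0], "B": [3.0, 4.0]}))
```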

Kuiper Test

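For data with two class levels, PROC NPAR1WAY computes the Kuiper statistic, its scaled value for the asymptotic distribution, and the asymptotic p-value. The Kuiper statistic is computed as

$$K = \max_j \left( F_1(x_j) - F_2(x_j) \right) - \min_j \left( F_1(x_j) - F_2(x_j) \right), \quad j = 1, 2, \ldots, n$$

The asymptotic value is

$$K_a = K \times \sqrt{\frac{n_1 n_2}{n}}$$

PROC NPAR1WAY displays $\max_j \left( F_1(x_j) - F_2(x_j) \right)$ for each class level.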

The p-value for the Kuiper test is the probability of observing a larger value of $K_a$ under the null hypothesis of no difference between the two classes. PROC NPAR1WAY computes this p-value according to Owen (1962), p. 441.
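A sketch of K and $K_a$ for two samples (the function name is hypothetical). It does not attempt Owen's p-value computation.

```python
import math
from typing import Sequence, Tuple

def kuiper(x1: Sequence[float], x2: Sequence[float]) -> Tuple[float, float]:
    """Return (K, K_a) for two samples; the p-value is not computed here."""
    def edf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)
    diffs = [edf(x1, x) - edf(x2, x) for x in list(x1) + list(x2)]
    k = max(diffs) - min(diffs)        # largest deviation in each direction
    n1, n2 = len(x1), len(x2)
    return k, k * math.sqrt(n1 * n2 / (n1 + n2))

print(kuiper([1.0, 4.0], [2.0, 3.0]))  # (1.0, 1.0)
```

In this example the two-sample D is only 0.5, but K is 1. This is because the first sample's EDF deviates from the second's in both directions, so K picks up the difference in spread that D only partly reflects.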

Exact Tests

PROC NPAR1WAY provides exact p-values for tests for location and scale differences based on the following scores: Wilcoxon, median, Van der Waerden, Savage, Siegel-Tukey, Ansari-Bradley, Klotz, and Mood scores. Additionally, PROC NPAR1WAY provides exact p-values for tests using the raw data as scores. Exact tests are available for two-sample and multisample data. When the data are classified into two samples, tests are based on simple linear rank statistics. When the data are classified into more than two samples, tests are based on one-way ANOVA statistics.

Exact tests can be useful in situations where the asymptotic assumptions are not met and the asymptotic p-values are not close approximations for the true p-values. Standard asymptotic methods involve the assumption that the test statistic follows a particular distribution when the sample size is sufficiently large. When the sample size is not large, asymptotic results may not be valid, and the asymptotic p-values may differ substantially from the exact p-values. Asymptotic results may also be unreliable when the distribution of the data is sparse, skewed, or heavily tied; refer to Agresti (1996) and Bishop, Fienberg, and Holland (1975). Exact computations are based on the statistical theory of exact conditional inference for contingency tables, reviewed by Agresti (1992).

In addition to computation of exact p-values, PROC NPAR1WAY provides the option of estimating exact p-values by Monte Carlo simulation. This can be useful for problems that are so large that exact computations require a great amount of time and memory, but for which asymptotic approximations may not be sufficient.

The following sections summarize the exact computational algorithms, define the exact p-values that PROC NPAR1WAY computes, discuss the computational resource requirements, and describe the Monte Carlo estimation option.

Computational Algorithms

PROC NPAR1WAY computes exact p-values using the network algorithm developed by Mehta and Patel (1983). This algorithm provides a substantial advantage over direct enumeration, which can be very time-consuming and is feasible only for small problems. Refer to Agresti (1992) for a review of algorithms for computation of exact p-values, and refer to Mehta, Patel, and Tsiatis (1984) and Mehta, Patel, and Senchaudhuri (1991) for information on the performance of the network algorithm.

PROC NPAR1WAY constructs a contingency table from the input data, with rows formed by the levels of the classification variable and columns formed by the response variable values. The reference set for a given contingency table is the set of all contingency tables with the observed marginal row and column sums. Corresponding to this reference set, the network algorithm forms a directed acyclic network consisting of nodes in a number of stages. A path through the network corresponds to a distinct table in the reference set. The distances between nodes are defined so that the total distance of a path through the network is the corresponding value of the test statistic. At each node, the algorithm computes the shortest and longest path distances for all the paths that pass through that node. For the two-sample linear rank statistics, which can be expressed as a linear combination of cell frequencies multiplied by increasing row and column scores, PROC NPAR1WAY computes shortest and longest path distances using the algorithm given in Agresti, Mehta, and Patel (1990). For the multisample one-way test statistics, PROC NPAR1WAY computes an upper bound for the longest path and a lower bound for the shortest path, following the approach of Valz and Thompson (1994).

The longest and shortest path distances or bounds for a node are compared to the value of the test statistic to determine whether all paths through the node contribute to the p-value, none of the paths through the node contribute to the p-value, or neither of these situations occurs. If all paths through the node contribute, the p-value is incremented accordingly, and these paths are eliminated from further analysis. If no paths contribute, these paths are eliminated from the analysis. Otherwise, the algorithm continues, still processing this node and the associated paths. The algorithm finishes when all nodes have been accounted for.

In applying the network algorithm, PROC NPAR1WAY uses full precision to represent all statistics, row and column scores, and other quantities involved in the computations. Although it is possible to use rounding to improve the speed and memory requirements of the algorithm, PROC NPAR1WAY does not do this since it can result in reduced accuracy of the p-values.

Definition of p-Values

For two-sample linear rank tests, PROC NPAR1WAY computes exact one-sided and two-sided p-values for each test specified in the EXACT statement. For the one-sided test, PROC NPAR1WAY displays the right-sided p-value when the observed value of the test statistic is greater than its expected value. The right-sided p-value is the sum of probabilities for those tables having a test statistic greater than or equal to the observed test statistic. Otherwise, when the test statistic is less than or equal to its expected value, PROC NPAR1WAY displays the left-sided p-value. The left-sided p-value is the sum of probabilities for those tables having a test statistic less than or equal to the one observed. The one-sided p-value $P_1$ can be expressed as

$$P_1 = \begin{cases} \Pr(\text{Test Statistic} \ge S) & \text{if } S > \text{Mean} \\ \Pr(\text{Test Statistic} \le S) & \text{if } S \le \text{Mean} \end{cases}$$

where S is the observed value of the test statistic and Mean is the expected value of the test statistic under the null hypothesis. PROC NPAR1WAY computes the two-sided p-value as the sum of the one-sided p-value and the corresponding area in the opposite tail of the distribution of the statistic, equidistant from the expected value. The two-sided p-value $P_2$ can be expressed as

$$P_2 = \Pr\left( \left| \text{Test Statistic} - \text{Mean} \right| \ge \left| S - \text{Mean} \right| \right)$$
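For very small two-sample problems, the exact p-values defined above can be checked by direct enumeration. This is the brute-force approach that the section Computational Algorithms contrasts with the network algorithm, not the network algorithm that PROC NPAR1WAY uses. The sketch enumerates all C(n, n1) assignments of scores to the summed sample, each equally likely under the null hypothesis given the observed margins. The function name and the tolerance `eps` for comparing floating-point sums are assumptions of this sketch.

```python
from itertools import combinations
from math import comb
from typing import Sequence, Tuple

def exact_p_values(scores: Sequence[float], in_sample: Sequence[bool],
                   eps: float = 1e-9) -> Tuple[float, float, float]:
    """Return (P1, P2, point probability) for S = sum of scores in the sample."""
    n = len(scores)
    n1 = sum(in_sample)
    s_obs = sum(a for a, c in zip(scores, in_sample) if c)
    mean = n1 * sum(scores) / n                 # expected value of S under H0
    right = left = two = point = 0
    for idx in combinations(range(n), n1):      # every equally likely assignment
        s = sum(scores[j] for j in idx)
        right += s >= s_obs - eps
        left += s <= s_obs + eps
        two += abs(s - mean) >= abs(s_obs - mean) - eps
        point += abs(s - s_obs) <= eps
    total = comb(n, n1)
    p1 = (right if s_obs > mean else left) / total
    return p1, two / total, point / total

# Wilcoxon scores; the sample holds ranks 1, 2, 3 (the smallest possible S).
print(exact_p_values([1, 2, 3, 4, 5, 6], [True, True, True, False, False, False]))
```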

For multisample data, the tests are based on one-way ANOVA statistics. For a test of this form, large values of the test statistic indicate a departure from the null hypothesis; the test is inherently two-sided. The exact p-value is the sum of probabilities for those tables having a test statistic greater than or equal to the value of the observed test statistic.

If you specify the POINT option in the EXACT statement, PROC NPAR1WAY also displays exact point probabilities for the test statistics. The exact point probability is the exact probability that the test statistic equals the observed value.

Computational Resources

PROC NPAR1WAY uses relatively fast and efficient algorithms for exact computations. These recently developed algorithms, together with improvements in computer power, now make it feasible to perform exact computations for data sets where previously only asymptotic methods could be applied. Nevertheless, there are still large problems that may require a prohibitive amount of time and memory for exact computations, depending on the speed and memory available on your computer. For large problems, consider whether exact methods are really needed or whether asymptotic methods might give results quite close to the exact results while requiring much less computer time and memory. When asymptotic methods may not be sufficient for such large problems, consider using Monte Carlo estimation of exact p-values, as described in the section Monte Carlo Estimation on page 3174.

No formula can predict in advance how much time and memory are needed to compute an exact p-value for a given problem. The time and memory required depend on several factors, including which test is being performed, the total sample size, the number of rows and columns, and the specific arrangement of the observations into table cells. Generally, larger problems (in terms of total sample size, number of rows, and number of columns) tend to require more time and memory. Additionally, for a fixed total sample size, time and memory requirements tend to increase as the number of rows and columns increases, since this corresponds to an increase in the number of tables in the reference set. Also, for a fixed sample size, time and memory requirements increase as the marginal row and column totals become more homogeneous. Refer to Agresti, Mehta, and Patel (1990) and Gail and Mantel (1977).

At any time while PROC NPAR1WAY is computing exact p-values, you can terminate the computations by pressing the system interrupt key sequence (refer to the SAS Companion for your system) and choosing to stop computations. After you terminate exact computations, PROC NPAR1WAY completes all other remaining tasks. The procedure produces the requested output and reports missing values for any exact p-values not computed by the time of termination.

You can also use the MAXTIME= option in the EXACT statement to limit the amount of time PROC NPAR1WAY uses for exact computations. You specify a MAXTIME= value that is the maximum amount of time (in seconds) that PROC NPAR1WAY can use to compute an exact p-value. If PROC NPAR1WAY does not finish computing an exact p-value within that time, it terminates the computation and completes all other remaining tasks.

Monte Carlo Estimation

If you specify the MC option in the EXACT statement, PROC NPAR1WAY computes Monte Carlo estimates of the exact p-values instead of directly computing the exact p-values. Monte Carlo estimation can be useful for large problems that require a great amount of time and memory for exact computations but for which asymptotic approximations may not be sufficient. To describe the precision of each Monte Carlo estimate, PROC NPAR1WAY provides the asymptotic standard error and 100(1 - α)% confidence limits. The confidence level α is determined by the ALPHA= option in the EXACT statement, which by default equals 0.01 and produces 99% confidence limits. The N= option in the EXACT statement specifies the number of samples PROC NPAR1WAY uses for Monte Carlo estimation; the default is 10,000 samples. You can specify a larger value for N to improve the precision of the Monte Carlo estimates; because more samples are generated, the computation time increases. Or you can specify a smaller value of N to reduce the computation time.

To compute a Monte Carlo estimate of an exact p-value, PROC NPAR1WAY generates a random sample of tables with the same total sample size, row totals, and column totals as the observed table. PROC NPAR1WAY uses the algorithm of Agresti, Wackerly, and Boyett (1979), which generates tables in proportion to their hypergeometric probabilities conditional on the marginal frequencies. For each sample table, PROC NPAR1WAY computes the value of the test statistic and compares it to the value for the observed table. When estimating a right-sided p-value, PROC NPAR1WAY counts all sample tables for which the test statistic is greater than or equal to the observed test statistic. Then the p-value estimate equals the number of these tables divided by the total number of tables sampled.

$$MC = \frac{M}{N}$$

where

M = number of samples with (Test Statistic ≥ t)

N = total number of samples

t = observed Test Statistic

PROC NPAR1WAY computes left-sided and two-sided p-value estimates in a similar manner. For left-sided p-values, PROC NPAR1WAY evaluates whether the test statistic for each sampled table is less than or equal to the observed test statistic. For two-sided p-values, PROC NPAR1WAY examines the sample test statistics according to the expression for $P_2$ given in the section Definition of p-Values on page 3172.

The variable M is a binomial variable with N trials and success probability p. It follows that the asymptotic standard error of the Monte Carlo estimate is

$$\mathrm{ASE}(MC) = \sqrt{\frac{MC \, (1 - MC)}{N - 1}}$$

PROC NPAR1WAY constructs asymptotic confidence limits for the p-values according to

$$MC \pm z_{\alpha/2} \times \mathrm{ASE}(MC)$$

where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$ percentile of the standard normal distribution, and the confidence level α is determined by the ALPHA= option in the EXACT statement.

When the Monte Carlo estimate MC equals 0, PROC NPAR1WAY computes the confidence limits for the p-value as

$$\left( 0,\; 1 - \alpha^{1/N} \right)$$

When the Monte Carlo estimate MC equals 1, PROC NPAR1WAY computes the confidence limits as

$$\left( \alpha^{1/N},\; 1 \right)$$
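A Python sketch of a Monte Carlo estimate of a right-sided exact p-value for a two-sample linear rank statistic. It includes the standard error and confidence limits described above, and it assumes the boundary limits $(0, 1 - \alpha^{1/N})$ and $(\alpha^{1/N}, 1)$ when the estimate is 0 or 1. Instead of the Agresti, Wackerly, and Boyett (1979) algorithm, it draws uniformly random relabelings of the observations. For this statistic, that also samples tables with the observed margins in proportion to their hypergeometric probabilities. The function name, the `seed` parameter, and the comparison tolerance `eps` are hypothetical. The defaults for `n_samples` and `alpha` match the documented N= and ALPHA= defaults.

```python
import math
import random
from statistics import NormalDist
from typing import Optional, Sequence, Tuple

def monte_carlo_right_p(scores: Sequence[float], in_sample: Sequence[bool],
                        n_samples: int = 10000, alpha: float = 0.01,
                        seed: Optional[int] = None, eps: float = 1e-9
                        ) -> Tuple[float, float, Tuple[float, float]]:
    """Return (MC estimate, ASE, confidence limits) for the right-sided p-value."""
    rng = random.Random(seed)
    labels = list(in_sample)
    s_obs = sum(a for a, c in zip(scores, labels) if c)
    m = 0
    for _ in range(n_samples):
        rng.shuffle(labels)           # random relabeling; sample sizes stay fixed
        s = sum(a for a, c in zip(scores, labels) if c)
        m += s >= s_obs - eps         # count samples with statistic >= observed
    mc = m / n_samples
    ase = math.sqrt(mc * (1 - mc) / (n_samples - 1))
    if mc == 0:
        limits = (0.0, 1 - alpha ** (1 / n_samples))
    elif mc == 1:
        limits = (alpha ** (1 / n_samples), 1.0)
    else:
        z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
        limits = (mc - z * ase, mc + z * ase)
    return mc, ase, limits

# Ranks 4, 5, 6 in the sample: the exact right-sided p-value is 1/20 = 0.05.
print(monte_carlo_right_p([1, 2, 3, 4, 5, 6],
                          [False, False, False, True, True, True], seed=1))
```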

Output Data Set

The OUTPUT statement creates a SAS data set that contains statistics computed by PROC NPAR1WAY. You specify which statistics to store in the output data set, using options identical to those used in the PROC NPAR1WAY statement. When you specify one of these options in the OUTPUT statement, PROC NPAR1WAY includes all available statistics from that analysis in the output data set.

The output data set contains one observation for each analysis variable within a BY-group. The OUTPUT data set can include the following variables:

  • BY variables

  • _VAR_, which identifies the analysis variable

  • variables containing the specified statistics

The following table lists the variable names and descriptions for all available statistics. Note that some statistics are available only for the two-sample case (where the classification variable groups the data into two classes). Other statistics are available only for the multisample case.

When you request exact p-values for certain analyses using the EXACT statement, PROC NPAR1WAY also includes those p-values in the output data set if you specify the corresponding analysis options in the OUTPUT statement. If you do not request exact p-values, then they do not appear in the output data set.

Monte Carlo estimates of exact p-values are not available in this output data set, but you can use the Output Delivery System (ODS) to store Monte Carlo estimates in a SAS data set. You can use the Output Delivery System to create a SAS data set from any piece of PROC NPAR1WAY output. For more information, see Table 52.6 on page 3184 and Chapter 14, "Using the Output Delivery System."

Table 52.5: Output Data Set Variable Names and Descriptions

Option

Output Variables

 

Variable Descriptions

ANOVA

_MSA_

 

ANOVA Effect Mean Square, Among MS

_MSE_

ANOVA Error Mean Square, Within MS

_F_

F Statistic for ANOVA

P_F

p -value, F Statistic for ANOVA

WILCOXON

_WIL_

[ *]

Two-sample Wilcoxon Statistic

Z_ IL

[ *]

Wilcoxon Statistic, Standardized

PL_WIL

[ *]

p -value, Wilcoxon Test (Left-sided)

PR_WIL

[ *]

p -value, Wilcoxon Test (Right-sided)

P2_WIL

[ *]

p -value, Wilcoxon Test (Two-sided)

PTL_WIL

[ *]

p -value, Wilcoxon t Approximation (Left-sided)

PTR_WIL

[ *]

p -value, Wilcoxon t Approximation, (Right-sided)

PT2_WIL

[ *]

p -value, Wilcoxon t Approximation, (Two-sided)

XPL_WIL

[ *]

Exact p -value, Wilcoxon Test (Left-sided)

XPR_WIL

[ *]

Exact p -value, Wilcoxon Test (Right-sided)

XPT_WIL

[ *]

Exact Point Probability, Wilcoxon Test

XP2_WIL

[ *]

Exact p -value, Wilcoxon Test (Two-sided)

_KW_

 

Kruskal-Wallis Statistic

DF_KW

Degrees of Freedom, Kruskal-Wallis Test

P_KW

p -value, Kruskal-Wallis Test

XP_KW

[ **]

Exact p -value, Kruskal-Wallis Test

XPT_KW

[ **]

Exact Point Probability, Kruskal-Wallis Test

MEDIAN

_MED_

[ *]

Two-sample Median Statistic

Z_MED

[ *]

Median Statistic, Standardized

PL_MED

[ *]

p -value, Median Test (Left-sided)

PR_MED

[ *]

p -value, Median Test (Right-sided)

P2_MED

[ *]

p -value, Median Test (Two-sided)

XPL_MED

[ *]

Exact p -value, Median Test (Left-sided)

XPR_MED

[ *]

Exact p -value, Median Test (Right-sided)

XPT_MED

[ *]

Exact Point Probability, Median Test

XP2_MED

[ *]

Exact p -value, Median Test (Two-sided)

_CHMED_

 

Median Chi-square (Brown-Mood Test)

DF_CHMED

Degrees of Freedom, Median Chi-square

P_CHMED

p -value, Median Chi-square Test

XP_CHMED

[ **]

Exact p -value, Median Chi-square

XPT_CHME

[ **]

Exact Point Probability, Median Chi-square

VW

_VW_

[ *]

Two-sample Van der Waerden Statistic

Z_VW

[ *]

Van der Waerden Statistic, Standardized

PL_VW

[ *]

p -value, Van der Waerden Test (Left-sided)

PR_VW

[ *]

p -value, Van der Waerden Test (Right-sided)

P2_VW

[ *]

p -value, Van der Waerden Test (Two-sided)

XPL_VW

[ *]

Exact p -value, Van der Waerden Test (Left-sided)

XPR_VW

[ *]

Exact p -value, Van der Waerden Test (Right-sided)

XPT_VW

[ *]

Exact Point Probability, Van der Waerden Test

XP2_VW

[ *]

Exact p -value, Van der Waerden Test (Two-sided)

_CHVW_

 

Van der Waerden Chi-square

DF_CHVW

Degrees of Freedom, Van der Waerden Chi-square

P_CHVW

p -value, Van der Waerden Chi-square Test

XP_CHVW

[ **]

Exact p -value, Van der Waerden Chi-square

XPT_CHVW

[ **]

Exact Point Prob, Van der Waerden Chi-square

SAVAGE        _SAV_ [ *]       Two-sample Savage Statistic
              Z_SAV [ *]       Savage Statistic, Standardized
              PL_SAV [ *]      p-value, Savage Test (Left-sided)
              PR_SAV [ *]      p-value, Savage Test (Right-sided)
              P2_SAV [ *]      p-value, Savage Test (Two-sided)
              XPL_SAV [ *]     Exact p-value, Savage Test (Left-sided)
              XPR_SAV [ *]     Exact p-value, Savage Test (Right-sided)
              XPT_SAV [ *]     Exact Point Probability, Savage Test
              XP2_SAV [ *]     Exact p-value, Savage Test (Two-sided)
              _CHSAV_          Savage Chi-square
              DF_CHSAV         Degrees of Freedom, Savage Chi-square
              P_CHSAV          p-value, Savage Chi-square Test
              XP_CHSAV [ **]   Exact p-value, Savage Chi-square
              XPT_CHSA [ **]   Exact Point Probability, Savage Chi-square

ST            _ST_ [ *]        Two-sample Siegel-Tukey Statistic
              Z_ST [ *]        Siegel-Tukey Statistic, Standardized
              PL_ST [ *]       p-value, Siegel-Tukey Test (Left-sided)
              PR_ST [ *]       p-value, Siegel-Tukey Test (Right-sided)
              P2_ST [ *]       p-value, Siegel-Tukey Test (Two-sided)
              XPL_ST [ *]      Exact p-value, Siegel-Tukey Test (Left-sided)
              XPR_ST [ *]      Exact p-value, Siegel-Tukey Test (Right-sided)
              XPT_ST [ *]      Exact Point Probability, Siegel-Tukey Test
              XP2_ST [ *]      Exact p-value, Siegel-Tukey Test (Two-sided)
              _CHST_           Siegel-Tukey Chi-square
              DF_CHST          Degrees of Freedom, Siegel-Tukey Chi-square
              P_CHST           p-value, Siegel-Tukey Chi-square Test
              XP_CHST [ **]    Exact p-value, Siegel-Tukey Chi-square
              XPT_CHST [ **]   Exact Point Probability, Siegel-Tukey Chi-square

AB            _AB_ [ *]        Two-sample Ansari-Bradley Statistic
              Z_AB [ *]        Ansari-Bradley Statistic, Standardized
              PL_AB [ *]       p-value, Ansari-Bradley Test (Left-sided)
              PR_AB [ *]       p-value, Ansari-Bradley Test (Right-sided)
              P2_AB [ *]       p-value, Ansari-Bradley Test (Two-sided)
              XPL_AB [ *]      Exact p-value, Ansari-Bradley Test (Left-sided)
              XPR_AB [ *]      Exact p-value, Ansari-Bradley Test (Right-sided)
              XPT_AB [ *]      Exact Point Probability, Ansari-Bradley Test
              XP2_AB [ *]      Exact p-value, Ansari-Bradley Test (Two-sided)
              _CHAB_           Ansari-Bradley Chi-square
              DF_CHAB          Degrees of Freedom, Ansari-Bradley Chi-square
              P_CHAB           p-value, Ansari-Bradley Chi-square Test
              XP_CHAB [ **]    Exact p-value, Ansari-Bradley Chi-square
              XPT_CHAB [ **]   Exact Point Probability, Ansari-Bradley Chi-square

KLOTZ         _KLOTZ_ [ *]     Two-sample Klotz Statistic
              Z_K [ *]         Klotz Statistic, Standardized
              PL_K [ *]        p-value, Klotz Test (Left-sided)
              PR_K [ *]        p-value, Klotz Test (Right-sided)
              P2_K [ *]        p-value, Klotz Test (Two-sided)
              XPL_K [ *]       Exact p-value, Klotz Test (Left-sided)
              XPR_K [ *]       Exact p-value, Klotz Test (Right-sided)
              XPT_K [ *]       Exact Point Probability, Klotz Test
              XP2_K [ *]       Exact p-value, Klotz Test (Two-sided)
              _CHK_            Klotz Chi-square
              DF_CHK           Degrees of Freedom, Klotz Chi-square
              P_CHK            p-value, Klotz Chi-square Test
              XP_CHK [ **]     Exact p-value, Klotz Chi-square
              XPT_CHK [ **]    Exact Point Probability, Klotz Chi-square

MOOD          _MOOD_ [ *]      Two-sample Mood Statistic
              Z_MOOD [ *]      Mood Statistic, Standardized
              PL_MOOD [ *]     p-value, Mood Test (Left-sided)
              PR_MOOD [ *]     p-value, Mood Test (Right-sided)
              P2_MOOD [ *]     p-value, Mood Test (Two-sided)
              XPL_MOOD [ *]    Exact p-value, Mood Test (Left-sided)
              XPR_MOOD [ *]    Exact p-value, Mood Test (Right-sided)
              XPT_MOOD [ *]    Exact Point Probability, Mood Test
              XP2_MOOD [ *]    Exact p-value, Mood Test (Two-sided)
              _CHMOOD_         Mood Chi-square
              DF_CHMOO         Degrees of Freedom, Mood Chi-square
              P_CHMOOD         p-value, Mood Chi-square Test
              XP_CHMOO [ **]   Exact p-value, Mood Chi-square
              XPT_CHMO [ **]   Exact Point Probability, Mood Chi-square

SCORES=DATA   _DATA_ [ *]      Two-sample Data Scores Statistic
              Z_DATA [ *]      Data Scores Statistic, Standardized
              PL_DATA [ *]     p-value, Data Scores Test (Left-sided)
              PR_DATA [ *]     p-value, Data Scores Test (Right-sided)
              P2_DATA [ *]     p-value, Data Scores Test (Two-sided)
              XPL_DATA [ *]    Exact p-value, Data Scores Test (Left-sided)
              XPR_DATA [ *]    Exact p-value, Data Scores Test (Right-sided)
              XPT_DATA [ *]    Exact Point Probability, Data Scores Test
              XP2_DATA [ *]    Exact p-value, Data Scores Test (Two-sided)
              _CHDATA_         Data Scores Chi-square
              DF_CHDAT         Degrees of Freedom, Data Scores Chi-square
              P_CHDATA         p-value, Data Scores Chi-square Test
              XP_CHDAT [ **]   Exact p-value, Data Scores Chi-square
              XPT_CHDA [ **]   Exact Point Probability, Data Scores Chi-square

EDF           _KS_             Kolmogorov-Smirnov Statistic
              _KSA_            Kolmogorov-Smirnov Statistic (Asymptotic)
              _Dp_ [ *]        Two-sample Kolmogorov-Smirnov D+
              P_Dp [ *]        p-value, Kolmogorov-Smirnov D+
              _Dm_ [ *]        Two-sample Kolmogorov-Smirnov D-
              P_Dm [ *]        p-value, Kolmogorov-Smirnov D-
              _D_ [ *]         Two-sample Kolmogorov-Smirnov Statistic
              P_KSA [ *]       p-value, Two-sample Kolmogorov-Smirnov
              XP_Dp [ *]       Exact p-value, Kolmogorov-Smirnov D+
              XPT_Dp [ *]      Exact Point Probability, Kolmogorov-Smirnov D+
              XP_Dm [ *]       Exact p-value, Kolmogorov-Smirnov D-
              XPT_Dm [ *]      Exact Point Probability, Kolmogorov-Smirnov D-
              XP_D [ *]        Exact p-value, Kolmogorov-Smirnov D
              XPT_D [ *]       Exact Point Probability, Kolmogorov-Smirnov D
              _CM_             Cramer-von Mises Statistic
              _CMA_            Cramer-von Mises Statistic (Asymptotic)
              _K_ [ *]         Kuiper Two-sample Statistic
              _KA_ [ *]        Kuiper Two-sample Statistic (Asymptotic)
              P_KA [ *]        p-value, Two-sample Kuiper (Asymptotic)

[ *] Statistic included only for two-sample cases

[ **] Statistic included only for multisample cases

Displayed Output

If you specify the ANOVA option, PROC NPAR1WAY displays a Class Means table and an Analysis of Variance table for each response variable. The Class Means table includes the following information for each CLASS variable value, or level:

  • N, the number of observations

  • the Mean of the response variable

The Analysis of Variance table includes the following information for each Source of variation (Among classes and Within classes):

  • DF, the degrees of freedom associated with the source

  • the Sum of Squares

  • the Mean Square, the sum of squares divided by the degrees of freedom

The Analysis of Variance table also includes the following:

  • the F Value for testing the hypothesis that the group means are equal. This is computed by dividing the Mean Square (Among) by the Mean Square (Within).

  • Pr > F, the significance probability corresponding to the F Value
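The Analysis of Variance quantities above can be illustrated with a short Python sketch. It is not PROC NPAR1WAY code: the function name one_way_anova and the dictionary-of-lists input layout are hypothetical, and Pr > F is omitted because the Python standard library provides no F distribution.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (df_among, df_within, ss_among, ss_within, f_value).

    groups maps each class level to a list of response values
    (a hypothetical input layout chosen for this sketch).
    """
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    class_means = {level: mean(ys) for level, ys in groups.items()}

    # Among classes: squared deviations of the class means from the grand
    # mean, weighted by class size.
    ss_among = sum(len(ys) * (class_means[lv] - grand_mean) ** 2
                   for lv, ys in groups.items())
    # Within classes: squared deviations of each value from its class mean.
    ss_within = sum((y - class_means[lv]) ** 2
                    for lv, ys in groups.items() for y in ys)

    df_among, df_within = k - 1, n - k
    ms_among = ss_among / df_among      # Mean Square (Among)
    ms_within = ss_within / df_within   # Mean Square (Within)
    f_value = ms_among / ms_within      # F Value
    return df_among, df_within, ss_among, ss_within, f_value


print(one_way_anova({"A": [1, 2, 3], "B": [4, 5, 6]}))
```

If SciPy is available, scipy.stats.f.sf(f_value, df_among, df_within) returns the upper-tail probability that corresponds to Pr > F.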

For each score type that you specify, PROC NPAR1WAY displays a Class Scores table. The available score types include Wilcoxon, median, Van der Waerden, Savage, Siegel-Tukey, Ansari-Bradley, Klotz, Mood, and raw data scores. PROC NPAR1WAY assigns the specified scores to the response variable values and classifies them according to the CLASS variable values. The Class Scores table includes the following information for each class:

  • N, the number of observations

  • Sum of Scores

  • Expected Under H0, the expected sum of scores under the null hypothesis of no difference among classes

  • Std Dev Under H0, the standard deviation under the null hypothesis

  • Mean Score
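As an illustration of the Class Scores columns, the following Python sketch uses Wilcoxon (rank) scores, giving tied values the average rank in line with the handling of ties described in the section Tied Values. The formulas it uses for Expected Under H0 (n_i times the mean of all n scores) and Std Dev Under H0 (the standard permutation result: the square root of n_i(n - n_i)/(n(n - 1)) times the sum of squared deviations of all scores from their mean) are assumptions of this sketch, and the helper names wilcoxon_scores and class_scores are hypothetical.

```python
from math import sqrt


def wilcoxon_scores(values):
    """Rank scores, giving tied values the average (mid) rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    scores = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mid_rank = (start + end) / 2 + 1  # average of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            scores[order[pos]] = mid_rank
        start = end + 1
    return scores


def class_scores(values, classes):
    """Columns of a Class Scores table for Wilcoxon scores (assumed formulas)."""
    scores = wilcoxon_scores(values)
    n = len(scores)
    a_bar = sum(scores) / n
    ss = sum((a - a_bar) ** 2 for a in scores)
    table = {}
    for level in sorted(set(classes)):
        s_i = [a for a, c in zip(scores, classes) if c == level]
        n_i = len(s_i)
        table[level] = {
            "N": n_i,
            "Sum of Scores": sum(s_i),
            "Expected Under H0": n_i * a_bar,
            # Permutation standard deviation of a sum of n_i scores drawn
            # without replacement from all n scores.
            "Std Dev Under H0": sqrt(n_i * (n - n_i) / (n * (n - 1)) * ss),
            "Mean Score": sum(s_i) / n_i,
        }
    return table


for level, row in class_scores([1, 2, 2, 3], ["A", "A", "B", "B"]).items():
    print(level, row)
```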

When there are only two levels of the CLASS variable, PROC NPAR1WAY displays the following Two-Sample Test results for each analysis of scores:

  • Statistic, which is the sum of scores for the class with the smaller sample size

  • Z, the standardized test statistic, which has an asymptotic standard normal distribution under the null hypothesis

  • One-Sided Pr < Z or One-Sided Pr > Z, the asymptotic one-sided p-value, displayed as Pr < Z when Z <= 0 and as Pr > Z when Z > 0

  • Two-Sided Pr > |Z|, the asymptotic two-sided p-value

For Wilcoxon scores, PROC NPAR1WAY also displays a t-approximation for the two-sample test.
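Given the Statistic, its expected value, and its standard deviation under H0, the two-sample asymptotic results follow from the standard normal distribution. This Python sketch applies no continuity correction, which the procedure may use for some score types, so its values are illustrative only; two_sample_z is a hypothetical name.

```python
from statistics import NormalDist


def two_sample_z(statistic, expected, std_dev):
    """Return (Z, one-sided p-value, two-sided p-value); no continuity correction."""
    z = (statistic - expected) / std_dev
    phi = NormalDist().cdf  # standard normal CDF
    # One-sided p-value in the direction of Z: Pr < Z if Z <= 0, else Pr > Z.
    one_sided = phi(z) if z <= 0 else 1 - phi(z)
    two_sided = 2 * (1 - phi(abs(z)))  # Pr > |Z|
    return z, one_sided, two_sided


print(two_sample_z(statistic=3.5, expected=5.0, std_dev=1.5 ** 0.5))
```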

If you request an exact test by specifying the score type in the EXACT statement, PROC NPAR1WAY displays the following exact p-values for two-sample data:

  • One-Sided Pr <= S or One-Sided Pr >= S, the one-sided exact p-value, displayed as Pr <= S when S <= Mean and as Pr >= S when S > Mean, where S is the test statistic and Mean is its expected value under the null hypothesis

  • Point Pr = S, the point probability, if you specify the POINT option in the EXACT statement

  • Two-Sided Pr >= |S - Mean|, the two-sided exact p-value
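For small samples, the exact two-sample p-values can be illustrated by enumerating every equally likely way to assign n1 of the n scores to the class whose sum is S, which gives the permutation distribution of S under the null hypothesis. This brute-force Python sketch grows combinatorially with n and is not presented as the algorithm PROC NPAR1WAY uses; exact_two_sample is a hypothetical name, and the tolerance eps for comparing floating-point sums is an assumption.

```python
from itertools import combinations
from math import comb


def exact_two_sample(scores, n1, s_obs, eps=1e-9):
    """Exact one-sided, point, and two-sided p-values by full enumeration.

    scores: all n scores (both classes); n1: size of the class whose sum is S;
    s_obs: observed S. Returns (one_sided, point, two_sided).
    """
    n = len(scores)
    mean = n1 * sum(scores) / n  # expected value of S under H0
    total = comb(n, n1)          # number of equally likely assignments
    le = ge = eq = extreme = 0
    for idx in combinations(range(n), n1):
        s = sum(scores[i] for i in idx)
        le += (s <= s_obs + eps)
        ge += (s >= s_obs - eps)
        eq += (abs(s - s_obs) <= eps)
        extreme += (abs(s - mean) >= abs(s_obs - mean) - eps)
    one_sided = (le if s_obs <= mean else ge) / total  # Pr <= S or Pr >= S
    return one_sided, eq / total, extreme / total


# Ranks 1..4, smaller class holds ranks 1 and 2 (S = 3).
print(exact_two_sample([1, 2, 3, 4], n1=2, s_obs=3))  # 1/6, 1/6, 1/3
```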

If you request Monte Carlo estimates for the exact test by specifying the MC option in the EXACT statement, PROC NPAR1WAY displays the following information for two-sample data:

  • Estimate of One-Sided Pr <= S or One-Sided Pr >= S, the one-sided exact p-value, together with its Lower and Upper Confidence Limits

  • Estimate of Two-Sided Pr >= |S - Mean|, the two-sided exact p-value, together with its Lower and Upper Confidence Limits

  • Number of Samples used to compute the Monte Carlo estimates

  • Initial Seed used to compute the Monte Carlo estimates
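The Monte Carlo estimate can be sketched in the same setting by sampling random class assignments instead of enumerating all of them. In this Python sketch, the default of 10,000 samples, the 99% confidence level (alpha = 0.01), the normal-approximation confidence limits, and the clipping of the limits to [0, 1] are assumptions for illustration; monte_carlo_two_sided is a hypothetical name, and its seed argument plays the role of the Initial Seed.

```python
import random
from math import sqrt
from statistics import NormalDist


def monte_carlo_two_sided(scores, n1, s_obs, n_samples=10000, alpha=0.01,
                          seed=None, eps=1e-9):
    """Monte Carlo estimate of Pr >= |S - Mean| with normal-approximation limits.

    n_samples, alpha, and the clipping to [0, 1] are illustrative assumptions.
    """
    rng = random.Random(seed)               # seed plays the role of the Initial Seed
    mean = n1 * sum(scores) / len(scores)   # expected value of S under H0
    hits = 0
    for _ in range(n_samples):
        s = sum(rng.sample(scores, n1))     # random assignment of n1 scores to the class
        if abs(s - mean) >= abs(s_obs - mean) - eps:
            hits += 1
    p_hat = hits / n_samples
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n_samples)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


print(monte_carlo_two_sided([1, 2, 3, 4], n1=2, s_obs=3, seed=12345))
```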

For both two-sample and multisample data, PROC NPAR1WAY displays the following One-Way Analysis for each score type:

  • Chi-Square, the one-way ANOVA statistic for testing the null hypothesis of no difference among classes

  • DF, the degrees of freedom

  • Pr > Chi-Square, the asymptotic p-value

For multisample data, if you request an exact test by specifying the score type in the EXACT statement, PROC NPAR1WAY also displays the exact p-value as follows:
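A Python sketch of the one-way chi-square, assuming the statistic is the sum over classes of (Sum of Scores - Expected Under H0)^2 / n_i, divided by the sample variance of all n scores, with k - 1 degrees of freedom for k classes. For Wilcoxon scores without ties, this form reduces to the Kruskal-Wallis statistic, which the example data illustrate. The name one_way_chi_square is hypothetical; with SciPy, scipy.stats.chi2.sf(chi_square, df) would give the asymptotic p-value.

```python
def one_way_chi_square(scores, classes):
    """Return (chi_square, df) for scores grouped by class level (assumed formula)."""
    n = len(scores)
    a_bar = sum(scores) / n
    # Sample variance of all n scores, with divisor n - 1.
    s2 = sum((a - a_bar) ** 2 for a in scores) / (n - 1)
    levels = sorted(set(classes))
    total = 0.0
    for level in levels:
        s_i = [a for a, c in zip(scores, classes) if c == level]
        # (Sum of Scores - Expected Under H0)^2 / n_i
        total += (sum(s_i) - len(s_i) * a_bar) ** 2 / len(s_i)
    return total / s2, len(levels) - 1


# Wilcoxon scores (ranks 1..6, no ties) for three classes of two observations.
chi_square, df = one_way_chi_square([1, 2, 3, 4, 5, 6], ["A", "A", "B", "B", "C", "C"])
print(chi_square, df)  # about 4.571 with 2 degrees of freedom
```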

  • Exact Pr >= Chi-Square

  • Exact Pr = Chi-Square, the point probability, if you specify the POINT option in the EXACT statement

For multisample data, if you request a Monte Carlo estimate for the exact test by specifying the MC option in the EXACT statement, PROC NPAR1WAY displays the following information:

  • Estimate of Pr >= Chi-Square, together with its Lower and Upper Confidence Limits

  • Number of Samples used to compute the Monte Carlo estimate

  • Initial Seed used to compute the Monte Carlo estimate

If you specify the EDF option, PROC NPAR1WAY produces tables for the Kolmogorov-Smirnov Test, the Cramer-von Mises Test, and, for two-sample data only, the Kuiper Test. The Kolmogorov-Smirnov Test table includes the following information for each CLASS variable value, or level:

  • N, the number of observations

  • EDF at Maximum, the value of the class EDF (empirical distribution function) at its maximum deviation from the pooled EDF

  • Deviation from Mean at Maximum, the value of $\sqrt{n_i}\,(F_i - F)$ at its maximum, where $n_i$ is the class sample size, $F_i$ is the class EDF, and $F$ is the pooled EDF

PROC NPAR1WAY displays the following Kolmogorov-Smirnov statistics:

  • KS, the Kolmogorov-Smirnov statistic

  • KSa, the asymptotic Kolmogorov-Smirnov statistic, where $\mathrm{KSa} = \mathrm{KS} \times \sqrt{n}$ and $n$ is the total number of observations

For two-sample data, PROC NPAR1WAY displays the following Kolmogorov-Smirnov statistics:

  • Pr > KSa, the asymptotic p-value for KSa, which equals Pr > D

  • $D = \max_j |F_1(x_j) - F_2(x_j)|$, the two-sample Kolmogorov-Smirnov statistic

For two-sample data, if you specify the D option, PROC NPAR1WAY also displays the following one-sided Kolmogorov-Smirnov statistics and their asymptotic p-values:

  • $D^+ = \max_j \left( F_1(x_j) - F_2(x_j) \right)$

  • Pr > D+

  • $D^- = \max_j \left( F_2(x_j) - F_1(x_j) \right)$

  • Pr > D-
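The Kolmogorov-Smirnov quantities can be sketched in Python by evaluating each class EDF and the pooled EDF at every distinct data value x_j. The sketch assumes KS = max_j sqrt((1/n) sum_i n_i (F_i(x_j) - F(x_j))^2), which should be checked against the section Tests Based on the Empirical Distribution Function. Under that assumption, for two classes KSa = KS × sqrt(n) equals D × sqrt(n_1 n_2 / n), which is consistent with Pr > KSa equaling Pr > D. The names edf_values and ks_statistics are hypothetical.

```python
from bisect import bisect_right
from math import sqrt


def edf_values(sample, points):
    """Empirical distribution function of sample at each point (hypothetical helper)."""
    ordered = sorted(sample)
    return [bisect_right(ordered, x) / len(ordered) for x in points]


def ks_statistics(groups):
    """KS and KSa for any number of classes; D, D+ and D- when there are two."""
    pooled = [y for g in groups for y in g]
    n = len(pooled)
    points = sorted(set(pooled))  # distinct values x_j
    f_pooled = edf_values(pooled, points)
    f_class = [edf_values(g, points) for g in groups]

    # Assumed definition: KS = max_j sqrt((1/n) * sum_i n_i (F_i(x_j) - F(x_j))^2)
    ks = max(
        sqrt(sum(len(g) * (fi[j] - f_pooled[j]) ** 2 for g, fi in zip(groups, f_class)) / n)
        for j in range(len(points))
    )
    result = {"KS": ks, "KSa": ks * sqrt(n)}
    if len(groups) == 2:
        f1, f2 = f_class
        result["D"] = max(abs(a - b) for a, b in zip(f1, f2))
        result["D+"] = max(a - b for a, b in zip(f1, f2))
        result["D-"] = max(b - a for a, b in zip(f1, f2))
    return result


print(ks_statistics([[1, 3, 5, 7], [2, 4, 6, 8, 9]]))
```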

For two-sample data, if you request an exact Kolmogorov-Smirnov test by specifying the KS option in the EXACT statement, PROC NPAR1WAY displays the following exact p-values:

  • Exact Pr >= D

  • Exact Pr >= D+

  • Exact Pr >= D-

  • Exact Point Pr = D, Exact Point Pr = D+, and Exact Point Pr = D-, if you specify the POINT option in the EXACT statement

If you request Monte Carlo estimates for the two-sample exact Kolmogorov-Smirnov test, PROC NPAR1WAY displays the following information for two-sample data:

  • Estimate of Pr >= D, together with its Lower and Upper Confidence Limits

  • Estimate of Pr >= D+, together with its Lower and Upper Confidence Limits

  • Estimate of Pr >= D-, together with its Lower and Upper Confidence Limits

  • Number of Samples used to compute the Monte Carlo estimates

  • Initial Seed used to compute the Monte Carlo estimates

The Cramer-von Mises Test table includes the following information for each CLASS variable value, or level:

  • N, the number of observations

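  • Summed Deviation from Mean, which is $\frac{n_i}{n} \sum_{j=1}^{p} t_j \left( F_i(x_j) - F(x_j) \right)^2$, where $t_j$ is the number of observations tied at the $j$th distinct value $x_j$ and $p$ is the number of distinct values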

PROC NPAR1WAY also displays the following Cramer-von Mises statistics:

  • CM, the Cramer-von Mises statistic

  • CMa, the asymptotic Cramer-von Mises statistic, where CMa = n CM
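A Python sketch of the Cramer-von Mises quantities, assuming CM = (1/n^2) sum_i n_i sum_j t_j (F_i(x_j) - F(x_j))^2, where t_j is the number of observations tied at the distinct value x_j. Under that assumption, each class's Summed Deviation from Mean is (n_i / n) sum_j t_j (F_i(x_j) - F(x_j))^2, and the summed deviations add up to CMa = n CM. The names edf_values and cramer_von_mises are hypothetical.

```python
from bisect import bisect_right


def edf_values(sample, points):
    """Empirical distribution function of sample at each point (hypothetical helper)."""
    ordered = sorted(sample)
    return [bisect_right(ordered, x) / len(ordered) for x in points]


def cramer_von_mises(groups):
    """Return (summed deviations per class, CM, CMa) under the assumed formulas.

    groups: list of lists of response values, one list per class.
    """
    pooled = [y for g in groups for y in g]
    n = len(pooled)
    points = sorted(set(pooled))              # distinct values x_1 < ... < x_p
    ties = [pooled.count(x) for x in points]  # t_j, observations equal to x_j
    f_pooled = edf_values(pooled, points)

    summed_dev = []
    for g in groups:
        f_class = edf_values(g, points)
        squares = sum(t * (fi - f) ** 2 for t, fi, f in zip(ties, f_class, f_pooled))
        summed_dev.append(len(g) / n * squares)  # (n_i / n) * sum_j t_j (F_i - F)^2

    cma = sum(summed_dev)  # CMa = n * CM
    cm = cma / n
    return summed_dev, cm, cma


print(cramer_von_mises([[1, 1, 2, 4], [2, 3, 3, 5]]))
```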

For two-sample data, PROC NPAR1WAY displays the Kuiper Test table, which includes the following information for each class:

  • N, the number of observations

  • Deviation from Mean, which is $\max_j \left( F_1(x_j) - F_2(x_j) \right)$

PROC NPAR1WAY also displays the following Kuiper two-sample test statistics:

  • K, the Kuiper two-sample test statistic

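  • Ka, the asymptotic Kuiper two-sample test statistic, where $\mathrm{Ka} = K \times \sqrt{n_1 n_2 / n}$, $n_1$ and $n_2$ are the class sample sizes, and $n = n_1 + n_2$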

  • Pr > Ka
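A Python sketch of the Kuiper statistics for two classes, assuming K = max_j (F_1(x_j) - F_2(x_j)) - min_j (F_1(x_j) - F_2(x_j)), which equals D+ + D-, and Ka = K × sqrt(n_1 n_2 / n). The names edf_values and kuiper_two_sample are hypothetical, and Pr > Ka is omitted because its asymptotic distribution is not reproduced here.

```python
from bisect import bisect_right
from math import sqrt


def edf_values(sample, points):
    """Empirical distribution function of sample at each point (hypothetical helper)."""
    ordered = sorted(sample)
    return [bisect_right(ordered, x) / len(ordered) for x in points]


def kuiper_two_sample(x1, x2):
    """Return (K, Ka) for two samples under the assumed definitions."""
    points = sorted(set(x1) | set(x2))  # distinct pooled values x_j
    diffs = [a - b for a, b in zip(edf_values(x1, points), edf_values(x2, points))]
    # Both EDFs equal 1 at the largest value, so the last difference is 0;
    # hence max(diffs) >= 0 >= min(diffs) and K = D+ + D-.
    k = max(diffs) - min(diffs)
    n1, n2 = len(x1), len(x2)
    ka = k * sqrt(n1 * n2 / (n1 + n2))
    return k, ka


print(kuiper_two_sample([1, 4], [2, 3]))  # (1.0, 1.0)
```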

ODS Table Names

PROC NPAR1WAY assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table. For more information on ODS, see Chapter 14, Using the Output Delivery System.

The WILCOXON, MEDIAN, VW, SAVAGE, and EDF options are the default if you do not specify any analysis options in the PROC NPAR1WAY statement.

Table 52.6: ODS Tables Produced in PROC NPAR1WAY

ODS Table Name      Description                                                    Statement  Option
ANOVA               Analysis of variance                                           PROC       ANOVA
ABAnalysis          Ansari-Bradley one-way analysis                                PROC       AB
ABMC                Monte Carlo estimates for the Ansari-Bradley exact test        EXACT      AB / MC
ABScores            Ansari-Bradley scores                                          PROC       AB
ABTest              Ansari-Bradley two-sample test                                 PROC       AB [ *]
ClassMeans          Class Means                                                    PROC       ANOVA
CVMStats            Cramer-von Mises statistics                                    PROC       EDF
CVMTest             Cramer-von Mises test                                          PROC       EDF
DataScores          Data scores                                                    PROC       SCORES=DATA
DataScoresAnalysis  Data scores one-way analysis                                   PROC       SCORES=DATA
DataScoresMC        Monte Carlo estimates for the exact test based on data scores  EXACT      SCORES=DATA / MC
DataScoresTest      Data scores two-sample test                                    PROC       SCORES=DATA [ *]
KlotzAnalysis       Klotz one-way analysis                                         PROC       KLOTZ
KlotzMC             Monte Carlo estimates for the Klotz exact test                 EXACT      KLOTZ / MC
KlotzScores         Klotz scores                                                   PROC       KLOTZ
KlotzTest           Klotz two-sample test                                          PROC       KLOTZ
KolSmirExactTest    Kolmogorov-Smirnov exact test                                  EXACT      KS [ *]
KolSmir2Stats       Kolmogorov-Smirnov two-sample statistics                       PROC       EDF [ *]
KolSmirStats        Kolmogorov-Smirnov statistics                                  PROC       EDF [ **]
KolSmirTest         Kolmogorov-Smirnov test                                        PROC       EDF
KruskalWallisMC     Monte Carlo estimates for the Kruskal-Wallis exact test        EXACT      WILCOXON / MC [ **]
KruskalWallisTest   Kruskal-Wallis test                                            PROC       WILCOXON
KSMC                Monte Carlo estimates for the Kolmogorov-Smirnov exact test    EXACT      KS / MC [ *]
KuiperStats         Kuiper two-sample statistics                                   PROC       EDF [ *]
KuiperTest          Kuiper test                                                    PROC       EDF [ *]
MedianAnalysis      Median one-way analysis                                        PROC       MEDIAN
MedianMC            Monte Carlo estimates for the median exact test                EXACT      MEDIAN / MC
MedianScores        Median scores                                                  PROC       MEDIAN
MedianTest          Median two-sample test                                         PROC       MEDIAN [ *]
MoodAnalysis        Mood one-way analysis                                          PROC       MOOD
MoodMC              Monte Carlo estimates for the Mood exact test                  EXACT      MOOD / MC
MoodScores          Mood scores                                                    PROC       MOOD
MoodTest            Mood two-sample test                                           PROC       MOOD
SavageAnalysis      Savage one-way analysis                                        PROC       SAVAGE
SavageMC            Monte Carlo estimates for the Savage exact test                EXACT      SAVAGE / MC
SavageScores        Savage scores                                                  PROC       SAVAGE
SavageTest          Savage two-sample test                                         PROC       SAVAGE [ *]
STAnalysis          Siegel-Tukey one-way analysis                                  PROC       ST
STMC                Monte Carlo estimates for the Siegel-Tukey exact test          EXACT      ST / MC
STScores            Siegel-Tukey scores                                            PROC       ST
STTest              Siegel-Tukey two-sample test                                   PROC       ST [ *]
VWAnalysis          Van der Waerden one-way analysis                               PROC       VW
VWMC                Monte Carlo estimates for the Van der Waerden exact test       EXACT      VW / MC
VWScores            Van der Waerden scores                                         PROC       VW
VWTest              Van der Waerden two-sample test                                PROC       VW [ *]
WilcoxonMC          Monte Carlo estimates for the Wilcoxon two-sample exact test   EXACT      WILCOXON / MC [ *]
WilcoxonScores      Wilcoxon scores                                                PROC       WILCOXON
WilcoxonTest        Wilcoxon two-sample test                                       PROC       WILCOXON [ *]

[ *] PROC NPAR1WAY produces this table only for two-sample data.

[ **] PROC NPAR1WAY produces this table only for multisample data.




SAS.STAT 9.1 Users Guide (Vol. 5)
Year: 2004
Pages: 98
