If an observation has a missing value for a response variable, PROC NPAR1WAY excludes that observation from the analysis.
By default, PROC NPAR1WAY excludes observations with missing values of the CLASS variable. If you specify the MISSING option, PROC NPAR1WAY treats missing values of the CLASS variable as a valid class level and includes these observations in the analysis.
PROC NPAR1WAY treats missing BY variable values like any other BY variable value. The missing values form a separate BY group. When a value of the FREQ variable is missing, PROC NPAR1WAY excludes the observation from the analysis.
Tied values occur when two or more observations are equal, whether the observations occur in the same sample or in different samples. In theory, nonparametric tests were developed for continuous distributions where the probability of a tie is zero. In practice, however, ties often occur. PROC NPAR1WAY uses the same method to handle ties for all score types. The procedure computes the scores as if there were no ties, averages the scores for tied observations, and assigns this average score to each observation with the same value.
When there are tied values, PROC NPAR1WAY first sorts the observations in ascending order and assigns ranks as if there were no ties. Then the procedure computes the scores based on these ranks, using the formula for the specified score type. The procedure averages the scores for tied observations and assigns this average score to each of the tied observations. Thus, all equal data values have the same score value. PROC NPAR1WAY then computes the test statistic from these scores.
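The tie-handling rule can be sketched in Python as follows. This is a minimal illustration, not the procedure's implementation; the function `tied_scores` and the `score(rank, n)` callback are hypothetical names chosen here. The sketch sorts the observations, scores them as if all ranks were distinct, and gives every member of a tied run the average of that run's scores.

```python
from typing import Callable, List, Sequence


def tied_scores(values: Sequence[float],
                score: Callable[[int, int], float]) -> List[float]:
    """Return one score per observation, averaging scores over tied values.

    `score(rank, n)` is a hypothetical callback that maps an untied rank
    (1..n) to a score for a sample of size n.
    """
    n = len(values)
    # Sort observation indices in ascending order of their values.
    order = sorted(range(n), key=lambda j: values[j])
    result = [0.0] * n
    start = 0
    while start < n:
        # Extend `end` over the run of values equal to values[order[start]].
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        # The run occupies untied ranks start+1 .. end+1; average their scores.
        run_scores = [score(rank, n) for rank in range(start + 1, end + 2)]
        average = sum(run_scores) / len(run_scores)
        for pos in range(start, end + 1):
            result[order[pos]] = average
        start = end + 1
    return result


def wilcoxon(rank: int, n: int) -> float:
    """Wilcoxon scores are the ranks themselves."""
    return float(rank)


print(tied_scores([3.1, 2.0, 3.1, 5.4], wilcoxon))  # [2.5, 1.0, 2.5, 4.0]
```

Ties are detected with an exact `==` comparison, mirroring the procedure's use of internal numeric values rather than formatted ones.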
Note that the asymptotic tests may be less accurate when the distribution of the data is heavily tied. For such data, it may be appropriate to use the exact tests provided by PROC NPAR1WAY, as described in the section "Exact Tests."
When computing empirical distribution function statistics for data with ties, PROC NPAR1WAY uses the formulas given in the section "Tests Based on the Empirical Distribution Function." No special handling of ties is necessary.
Note that PROC NPAR1WAY bases its computations on the internal numeric values of the analysis variables; the procedure does not format or round these values before analysis. When values differ in their internal representation, even slightly, PROC NPAR1WAY does not treat them as tied values. If this is a concern for your data, then round the analysis variables by an appropriate amount before invoking PROC NPAR1WAY. For information on the ROUND function, refer to the discussion in SAS Language Reference: Dictionary.
Statistics of the form
$$ S = \sum_{j=1}^{n} c_j \, a(R_j) $$
are called simple linear rank statistics, where
Symbol | Description |
---|---|
$R_j$ | the rank of observation $j$ |
$a(R_j)$ | the score based on that rank |
$c_j$ | an indicator variable denoting the class to which the $j$th observation belongs |
$n$ | the total number of observations |
For two-sample data (where the observations are classified into two levels), PROC NPAR1WAY calculates simple linear rank statistics for the scores that you specify. The section "Scores for Linear Rank and One-Way ANOVA Tests" describes the available scores, which you can use to test for differences in location and differences in scale.
To compute S , PROC NPAR1WAY sums the scores of the observations in the smaller of the two samples. If both samples have the same number of observations, PROC NPAR1WAY sums those scores for the sample that appears first in the input data set.
For each score that you specify, PROC NPAR1WAY computes an asymptotic test of the null hypothesis of no difference between the two classification levels. Exact tests are also available for these two-sample linear rank statistics. PROC NPAR1WAY computes exact tests for each score type that you specify in the EXACT statement. See the section "Exact Tests" for details.
To compute an asymptotic test for a linear rank sum statistic, PROC NPAR1WAY uses a standardized test statistic $z$, which has an asymptotic standard normal distribution under the null hypothesis. The standardized test statistic is computed as
$$ z = \frac{S - E(S)}{\sqrt{\mathrm{Var}(S)}} $$
where $E(S)$ is the expected value of $S$ under the null hypothesis, and $\mathrm{Var}(S)$ is the variance under the null hypothesis. As shown in Randles and Wolfe (1979),
$$ E(S) = \frac{n_1}{n} \sum_{j=1}^{n} a(R_j) $$
where $n_1$ is the number of observations in the first (smaller) class level or sample, $n_2$ is the number of observations in the other class level, and
$$ \mathrm{Var}(S) = \frac{n_1 n_2}{n(n-1)} \sum_{j=1}^{n} \left( a(R_j) - \bar{a} \right)^2 $$
where $\bar{a}$ is the average score,
$$ \bar{a} = \frac{1}{n} \sum_{j=1}^{n} a(R_j) $$
PROC NPAR1WAY computes one-sided and two-sided asymptotic p-values for each two-sample linear rank test. When the test statistic $z$ is greater than its null hypothesis expected value of zero, PROC NPAR1WAY computes the right-sided p-value, which is the probability of a larger value of the statistic occurring under the null hypothesis. When the test statistic is less than or equal to zero, PROC NPAR1WAY computes the left-sided p-value, which is the probability of a smaller value of the statistic occurring under the null hypothesis. The one-sided p-value $P_1$ can be expressed as
$$ P_1 = \begin{cases} \mathrm{Prob}(Z > z) & \text{if } z > 0 \\ \mathrm{Prob}(Z < z) & \text{if } z \le 0 \end{cases} $$
where $Z$ has a standard normal distribution. The two-sided p-value $P_2$ is computed as
$$ P_2 = \mathrm{Prob}\left( |Z| > |z| \right) $$
For Wilcoxon scores and Siegel-Tukey scores, PROC NPAR1WAY incorporates a continuity correction when computing the standardized test statistic $z$, unless you specify the CORRECT=NO option. PROC NPAR1WAY applies the continuity correction by subtracting 0.5 from the numerator $S - E(S)$ if it is greater than zero. If the numerator is less than zero, PROC NPAR1WAY adds 0.5. Some sources recommend a continuity correction for nonparametric tests that use a continuous distribution to approximate a discrete distribution. Refer to Sheskin (1997). If you specify CORRECT=NO, PROC NPAR1WAY does not use a continuity correction for any test.
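As a rough check on the two-sample linear rank test, the following Python sketch computes $S$, $E(S)$, $\mathrm{Var}(S)$, the standardized $z$ (optionally with the 0.5 continuity correction), and the asymptotic $P_1$ and $P_2$. It is an illustration under stated assumptions, not PROC NPAR1WAY's code: the function name `linear_rank_test` is hypothetical, the scores are assumed to be tie-averaged already, and the caller is assumed to flag the smaller sample with $c_j = 1$.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def linear_rank_test(scores: Sequence[float], c: Sequence[int],
                     correct: bool = False) -> Tuple[float, float, float, float]:
    """Return (S, z, P1, P2) for a two-sample simple linear rank statistic.

    scores:  a(R_j) for every observation (ties already averaged)
    c:       c_j = 1 if observation j is in the sample whose scores are
             summed (assumed to be the smaller sample), else 0
    correct: apply the 0.5 continuity correction to S - E(S)
    """
    n = len(scores)
    n1 = sum(c)
    n2 = n - n1
    s = sum(cj * a for cj, a in zip(c, scores))
    a_bar = sum(scores) / n
    e_s = n1 * a_bar  # E(S) = (n1 / n) * sum of all scores
    var_s = n1 * n2 / (n * (n - 1)) * sum((a - a_bar) ** 2 for a in scores)

    numerator = s - e_s
    if correct:
        if numerator > 0:
            numerator -= 0.5
        elif numerator < 0:
            numerator += 0.5
    z = numerator / math.sqrt(var_s)

    phi = NormalDist().cdf
    p1 = 1.0 - phi(z) if z > 0 else phi(z)  # right-sided if z > 0, else left-sided
    p2 = 2.0 * (1.0 - phi(abs(z)))          # Prob(|Z| > |z|)
    return s, z, p1, p2


# Wilcoxon scores (ranks) for two samples of size 3; sample 1 holds ranks 1-3.
ranks = [1, 2, 3, 4, 5, 6]
s, z, p1, p2 = linear_rank_test(ranks, [1, 1, 1, 0, 0, 0], correct=True)
```

In this example $S = 6$ and $E(S) = 10.5$, so the corrected numerator is $-4.5 + 0.5 = -4.0$ and the left-sided p-value is reported.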
PROC NPAR1WAY computes a one-way ANOVA test for each score type that you specify. Under the null hypothesis of no difference among class levels (or samples), this test statistic has an asymptotic chi-square distribution with $r - 1$ degrees of freedom, where $r$ is the number of class levels. For Wilcoxon scores, this test is known as the Kruskal-Wallis test.
Exact one-way ANOVA tests are also available for multisample data (where the data are classified into more than two levels). For two-sample data, exact simple linear rank tests are available. PROC NPAR1WAY computes exact tests for each score type that you specify in the EXACT statement. See the section "Exact Tests" for details on exact tests.
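PROC NPAR1WAY computes the one-way ANOVA test statistic as
$$ C = \frac{1}{S^2} \sum_{i=1}^{r} \frac{\left( T_i - E(T_i) \right)^2}{n_i} $$
where $T_i$ is the total of scores for the class level $i$, $E(T_i)$ is the expected total for level $i$ under the null hypothesis of no difference among levels, $n_i$ is the number of observations in level $i$, and $S^2$ is the sample variance of the scores.
$$ T_i = \sum_{j=1}^{n} c_{ij} \, a(R_j) \qquad E(T_i) = \frac{n_i}{n} \sum_{j=1}^{n} a(R_j) $$
where $a(R_j)$ is the score for observation $j$, and $c_{ij}$ indicates whether observation $j$ is in level $i$.
$$ S^2 = \frac{1}{n-1} \sum_{j=1}^{n} \left( a(R_j) - \bar{a} \right)^2 $$
where $\bar{a}$ is the average score,
$$ \bar{a} = \frac{1}{n} \sum_{j=1}^{n} a(R_j) $$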
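The one-way ANOVA statistic on scores can be sketched in Python as shown here. The name `oneway_score_statistic` is hypothetical, and the sketch returns only $C$ and its degrees of freedom $r - 1$; it does not compute the chi-square p-value. With Wilcoxon scores (the ranks) it should reproduce the Kruskal-Wallis statistic.

```python
from collections import defaultdict
from typing import Hashable, Sequence, Tuple


def oneway_score_statistic(scores: Sequence[float],
                           levels: Sequence[Hashable]) -> Tuple[float, int]:
    """Return (C, degrees of freedom) for the one-way ANOVA statistic on scores.

    scores: a(R_j) for every observation (ties already averaged)
    levels: the class level of each observation
    """
    n = len(scores)
    a_bar = sum(scores) / n
    s2 = sum((a - a_bar) ** 2 for a in scores) / (n - 1)  # sample variance S^2

    totals = defaultdict(float)  # T_i
    counts = defaultdict(int)    # n_i
    for a, level in zip(scores, levels):
        totals[level] += a
        counts[level] += 1

    # E(T_i) = (n_i / n) * sum of all scores = n_i * a_bar
    c = sum((totals[i] - counts[i] * a_bar) ** 2 / counts[i] for i in totals) / s2
    return c, len(totals) - 1


# Wilcoxon scores give the Kruskal-Wallis statistic.
c, df = oneway_score_statistic([1, 2, 3, 4, 5, 6], ["A", "A", "B", "B", "C", "C"])
```

The example has no ties, so the result agrees with the familiar Kruskal-Wallis form $\frac{12}{n(n+1)} \sum_i T_i^2 / n_i - 3(n+1)$; both give $32/7 \approx 4.571$ with 2 degrees of freedom.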
For each score type that you specify, PROC NPAR1WAY computes a one-way ANOVA statistic and also a linear rank statistic for two-sample data. The following score types are used primarily to test for differences in location: Wilcoxon, median, Van der Waerden, and Savage. The following score types are used to test for scale differences: Siegel-Tukey, Ansari-Bradley, Klotz, and Mood. This section gives formulas for the score types. For further information on the formulas and the applicability of each score, refer to Randles and Wolfe (1979), Gibbons and Chakraborti (1992), Conover (1999), and Hollander and Wolfe (1999).
In addition to the score types described in this section, you can specify the SCORES=DATA option to use the input data observations as scores. This enables you to produce a very wide variety of tests. You can construct any scores using the DATA step, and then PROC NPAR1WAY computes the corresponding linear rank and one-way ANOVA tests. You can also analyze the raw data with the SCORES=DATA option; for two-sample data, this permutation test is known as Pitman's test.
Wilcoxon scores are the ranks of the observations.
$$ a(R_j) = R_j $$
Using Wilcoxon scores in the linear rank statistic for two-sample data produces the rank sum statistic of the Mann-Whitney-Wilcoxon test. Using Wilcoxon scores in the one-way ANOVA statistic produces the Kruskal-Wallis test. Wilcoxon scores are locally most powerful for location shifts of a logistic distribution.
When computing the asymptotic Wilcoxon two-sample test, PROC NPAR1WAY uses a continuity correction by default, as described in the section "Simple Linear Rank Tests for Two-Sample Data." If you specify CORRECT=NO in the PROC NPAR1WAY statement, the procedure does not use a continuity correction.
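Median scores equal 1 for observations greater than the median, and 0 otherwise.
$$ a(R_j) = \begin{cases} 1 & \text{if } R_j > \frac{n+1}{2} \\ 0 & \text{if } R_j \le \frac{n+1}{2} \end{cases} $$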
Using median scores in the linear rank statistic for two-sample data produces the two-sample median test. The one-way ANOVA statistic with median scores is equivalent to the Brown-Mood test. Median scores are particularly powerful for distributions that are symmetric and heavy-tailed.
Van der Waerden scores are the quantiles of a standard normal distribution. These scores are also known as quantile normal scores.
$$ a(R_j) = \Phi^{-1}\left( \frac{R_j}{n+1} \right) $$
where $\Phi$ is the cumulative distribution function of a standard normal distribution. These scores are powerful for normal distributions.
Savage scores are expected values of order statistics from the exponential distribution, with 1 subtracted to center the scores around 0.
$$ a(R_j) = \left( \sum_{i=1}^{R_j} \frac{1}{n - i + 1} \right) - 1 $$
Savage scores are powerful for comparing scale differences in exponential distributions or location shifts in extreme value distributions (Hajek 1969, p. 83).
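The location score formulas translate directly into Python. This sketch assumes untied integer ranks $r = 1, \ldots, n$ (tied data would then be averaged as described under tie handling); the function names are hypothetical, and `statistics.NormalDist().inv_cdf` stands in for $\Phi^{-1}$.

```python
from statistics import NormalDist

# Each function maps an untied rank r (1..n) to a score.


def wilcoxon_score(r: int, n: int) -> float:
    return float(r)


def median_score(r: int, n: int) -> float:
    # 1 for ranks above the median rank (n + 1) / 2, else 0
    return 1.0 if r > (n + 1) / 2 else 0.0


def van_der_waerden_score(r: int, n: int) -> float:
    # standard normal quantile of r / (n + 1)
    return NormalDist().inv_cdf(r / (n + 1))


def savage_score(r: int, n: int) -> float:
    # expected exponential order statistic, minus 1 to center at 0
    return sum(1.0 / (n - i + 1) for i in range(1, r + 1)) - 1.0


n = 5
for name, f in [("Wilcoxon", wilcoxon_score), ("median", median_score),
                ("Van der Waerden", van_der_waerden_score), ("Savage", savage_score)]:
    print(name, [round(f(r, n), 3) for r in range(1, n + 1)])
```

One quick sanity check on the centering: over a full set of ranks $1, \ldots, n$ the unshifted Savage sums total exactly $n$, so after subtracting 1 from each score the Savage scores sum to 0.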
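Siegel-Tukey scores are computed as
$$ a(1) = 1,\; a(n) = 2,\; a(n-1) = 3,\; a(2) = 4,\; a(3) = 5,\; a(n-2) = 6,\; a(n-3) = 7,\; a(4) = 8,\; \ldots $$
where the score values continue to increase in this pattern towards the middle ranks until all observations have been assigned a score.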
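Ansari-Bradley scores are similar to Siegel-Tukey scores, but Ansari-Bradley assigns the same scores to corresponding extreme ranks. (Siegel-Tukey scores are just a permutation of the ranks $1, 2, \ldots, n$.)
$$ a(1) = 1,\; a(n) = 1,\; a(2) = 2,\; a(n-1) = 2,\; \ldots $$
Equivalently, Ansari-Bradley scores are defined as
$$ a(R_j) = \frac{n+1}{2} - \left| R_j - \frac{n+1}{2} \right| $$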
Klotz scores are the squares of the Van der Waerden (or quantile normal) scores.
$$ a(R_j) = \left[ \Phi^{-1}\left( \frac{R_j}{n+1} \right) \right]^2 $$
where $\Phi$ is the cumulative distribution function of a standard normal distribution.
Mood scores are computed as the square of the difference between each rank and the average rank.
$$ a(R_j) = \left( R_j - \frac{n+1}{2} \right)^2 $$
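The scale scores can be sketched the same way, again assuming untied integer ranks and hypothetical function names. For Siegel-Tukey, the sketch walks two pointers in from the ends of the rank range, assigning the single lowest rank first and then alternating pairs of high and low ranks, which reproduces the pattern $a(1)=1, a(n)=2, a(n-1)=3, a(2)=4, \ldots$

```python
from statistics import NormalDist
from typing import List


def siegel_tukey_scores(n: int) -> List[int]:
    """Return [a(1), ..., a(n)]: 1 to the lowest rank, then 2 and 3 to the two
    highest ranks, 4 and 5 to the next two lowest, and so on toward the middle."""
    scores = [0] * (n + 1)  # index 0 unused so that scores[r] is a(r)
    lo, hi = 1, n
    next_score = 1
    take_low = True
    group_size = 1          # the first group is the single lowest rank
    while lo <= hi:
        for _ in range(group_size):
            if lo > hi:
                break
            if take_low:
                scores[lo] = next_score
                lo += 1
            else:
                scores[hi] = next_score
                hi -= 1
            next_score += 1
        take_low = not take_low
        group_size = 2      # later groups alternate in pairs
    return scores[1:]


def ansari_bradley_score(r: int, n: int) -> float:
    return (n + 1) / 2 - abs(r - (n + 1) / 2)


def klotz_score(r: int, n: int) -> float:
    return NormalDist().inv_cdf(r / (n + 1)) ** 2


def mood_score(r: int, n: int) -> float:
    return (r - (n + 1) / 2) ** 2


print(siegel_tukey_scores(6))                             # [1, 4, 5, 6, 3, 2]
print([ansari_bradley_score(r, 6) for r in range(1, 7)])  # [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
```

The two printed lists show the difference described above: Siegel-Tukey produces a permutation of $1, \ldots, n$, while Ansari-Bradley gives the same score to each pair of corresponding extreme ranks.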
If you specify the EDF option, PROC NPAR1WAY computes tests based on the empirical distribution function. These include the Kolmogorov-Smirnov and Cramer-von Mises tests, and also the Kuiper test for two-sample data. This section gives formulas for these test statistics. For further information on the formulas and the interpretation of EDF statistics, refer to Hollander and Wolfe (1999) and Gibbons and Chakraborti (1992). For details on the k -sample analogues of the Kolmogorov-Smirnov and Cramer-von Mises statistics used by NPAR1WAY, refer to Kiefer (1959).
The empirical distribution function (EDF) of a sample $\{x_j\}$, $j = 1, 2, \ldots, n$, is defined as the following function:
$$ F(x) = \frac{1}{n} \sum_{j=1}^{n} I(x_j \le x) $$
where $I(\cdot)$ is an indicator function. PROC NPAR1WAY uses the subsample of values within the $i$th class level to generate an EDF for the class, $F_i$. The EDF for the overall sample, pooled over classes, can also be expressed as
$$ F(x) = \frac{1}{n} \sum_{i=1}^{r} n_i \, F_i(x) $$
where $r$ is the number of class levels, $n_i$ is the number of observations in the $i$th class level, and $n$ is the total number of observations.
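The Kolmogorov-Smirnov statistic measures the maximum deviation of the EDF within the classes from the pooled EDF. PROC NPAR1WAY computes the Kolmogorov-Smirnov statistic as
$$ KS = \max_j \sqrt{ \frac{1}{n} \sum_{i=1}^{r} n_i \left( F_i(x_j) - F(x_j) \right)^2 }, \qquad j = 1, 2, \ldots, n $$
The asymptotic Kolmogorov-Smirnov statistic is computed as
$$ KS_a = KS \sqrt{n} $$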
For each class level $i$ and overall, PROC NPAR1WAY displays the value of $F_i$ at the maximum deviation from $F$ and the value $(F_i - F)$ at the maximum deviation from $F$. PROC NPAR1WAY also gives the observation where the maximum deviation occurs.
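A small Python sketch of the $k$-sample statistic, evaluated at every observation $x_j$ as in the formula. The names `edf` and `ks_statistic` are hypothetical, and the pooled EDF is computed directly from the pooled sample, which is equivalent to $\frac{1}{n}\sum_i n_i F_i$.

```python
import math
from bisect import bisect_right
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def edf(sorted_sample: List[float], x: float) -> float:
    """F(x) = (1/n) * #{x_j <= x} for an already sorted sample."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def ks_statistic(values: Sequence[float],
                 levels: Sequence[Hashable]) -> Tuple[float, float]:
    """Return (KS, KS_a) for the k-sample Kolmogorov-Smirnov statistic."""
    groups = defaultdict(list)
    for v, level in zip(values, levels):
        groups[level].append(v)
    for g in groups.values():
        g.sort()
    pooled = sorted(values)
    n = len(pooled)

    ks = 0.0
    for x in pooled:  # evaluate at every observation x_j
        f = edf(pooled, x)
        dev = sum(len(g) * (edf(g, x) - f) ** 2 for g in groups.values()) / n
        ks = max(ks, math.sqrt(dev))
    return ks, ks * math.sqrt(n)


ks, ks_a = ks_statistic([1.0, 2.0, 3.0, 4.0], ["A", "A", "B", "B"])
```

For two classes the formula simplifies to $KS = D\sqrt{n_1 n_2}/n$, where $D = \max_j |F_1(x_j) - F_2(x_j)|$, so $KS_a = D\sqrt{n_1 n_2 / n}$. In the example $D = 1$, giving $KS = 0.5$ and $KS_a = 1.0$.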
If there are only two class levels, PROC NPAR1WAY computes the two-sample Kolmogorov-Smirnov test statistic $D$ as
$$ D = \max_j \left| F_1(x_j) - F_2(x_j) \right|, \qquad j = 1, 2, \ldots, n $$
The p-value for this test is the probability that $D$ is greater than the observed value $d$ under the null hypothesis of no difference between class levels or samples. PROC NPAR1WAY computes the asymptotic p-value for $D$ with the approximation
$$ \mathrm{Prob}(D > d) = 2 \sum_{i=1}^{\infty} (-1)^{i-1} \exp\left( -2 i^2 z^2 \right) $$
where
$$ z = d \sqrt{\frac{n_1 n_2}{n}} $$
The quality of this approximation has been studied by Hodges (1957).
If you specify the D option, or if you request exact Kolmogorov-Smirnov p-values with the KS option in the EXACT statement, PROC NPAR1WAY also computes the one-sided Kolmogorov-Smirnov statistics $D^+$ and $D^-$ for two-sample data:
$$ D^+ = \max_j \left( F_1(x_j) - F_2(x_j) \right) \qquad D^- = \max_j \left( F_2(x_j) - F_1(x_j) \right) $$
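The asymptotic probability that $D^+$ is greater than the observed value $d^+$, under the null hypothesis of no difference between the two class levels, is computed as
$$ \mathrm{Prob}(D^+ > d^+) = \exp\left( -2 z_+^2 \right), \qquad z_+ = d^+ \sqrt{\frac{n_1 n_2}{n}} $$
Similarly, the asymptotic probability that $D^-$ is greater than the observed value $d^-$ is computed as
$$ \mathrm{Prob}(D^- > d^-) = \exp\left( -2 z_-^2 \right), \qquad z_- = d^- \sqrt{\frac{n_1 n_2}{n}} $$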
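The two-sample statistics and their asymptotic p-values can be sketched as follows. Assumptions of this sketch: the names are hypothetical, the infinite series is truncated once a term falls below $10^{-15}$, and for $z < 0.1$ the function returns 1 directly, because the series converges very slowly there while the tail probability is 1 to double precision.

```python
import math
from bisect import bisect_right
from typing import Sequence


def kolmogorov_tail(z: float, tol: float = 1e-15) -> float:
    """2 * sum_{i>=1} (-1)^(i-1) * exp(-2 i^2 z^2), the approximation to Prob(D > d)."""
    if z < 0.1:
        # The series converges very slowly for small z; here the tail
        # probability equals 1 to double precision.
        return 1.0
    total, i = 0.0, 1
    while True:
        term = math.exp(-2.0 * i * i * z * z)
        total += term if i % 2 == 1 else -term
        if term < tol:
            break
        i += 1
    return min(1.0, 2.0 * total)


def two_sample_ks(x1: Sequence[float], x2: Sequence[float]) -> dict:
    s1, s2 = sorted(x1), sorted(x2)
    n1, n2 = len(s1), len(s2)
    # F1(x_j) - F2(x_j) at every observation x_j of the pooled sample
    diffs = [bisect_right(s1, x) / n1 - bisect_right(s2, x) / n2 for x in s1 + s2]
    d = max(abs(t) for t in diffs)
    d_plus = max(diffs)
    d_minus = max(-t for t in diffs)
    scale = math.sqrt(n1 * n2 / (n1 + n2))
    return {
        "D": d, "D+": d_plus, "D-": d_minus,
        "p(D)": kolmogorov_tail(d * scale),
        "p(D+)": math.exp(-2.0 * (d_plus * scale) ** 2),
        "p(D-)": math.exp(-2.0 * (d_minus * scale) ** 2),
    }


result = two_sample_ks([1.0, 2.0], [3.0, 4.0])
```

In the example every value of the first sample lies below the second, so $D = D^+ = 1$, $D^- = 0$, and the one-sided p-value for $D^+$ is $e^{-2}$.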
To request exact p-values for the Kolmogorov-Smirnov statistics, you can specify the KS option in the EXACT statement. See the section "Exact Tests" for more information.
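The Cramer-von Mises statistic is defined as
$$ CM = \frac{1}{n^2} \sum_{i=1}^{r} n_i \left( \sum_{j=1}^{p} t_j \left( F_i(x_j) - F(x_j) \right)^2 \right) $$
where $t_j$ is the number of ties at the $j$th distinct value and $p$ is the number of distinct values. The asymptotic value is computed as
$$ CM_a = n \cdot CM $$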
PROC NPAR1WAY displays the contribution of each class level to the sum $CM_a$.
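A minimal Python sketch of $CM$, $CM_a$, and the per-class contributions to $CM_a$, summing over the distinct values $x_j$ weighted by their tie counts $t_j$. The function name `cramer_von_mises` is hypothetical.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence, Tuple


def cramer_von_mises(values: Sequence[float], levels: Sequence[Hashable]
                     ) -> Tuple[float, float, Dict[Hashable, float]]:
    """Return (CM, CM_a, contribution of each class level to CM_a)."""
    n = len(values)
    pooled = sorted(values)
    groups = defaultdict(list)
    for v, level in zip(values, levels):
        groups[level].append(v)
    for g in groups.values():
        g.sort()
    ties = Counter(values)  # t_j for each of the p distinct values x_j

    contributions = {}
    for level, g in groups.items():
        inner = sum(t * (bisect_right(g, x) / len(g) - bisect_right(pooled, x) / n) ** 2
                    for x, t in ties.items())
        # n * (n_i / n^2) * inner = n_i * inner / n
        contributions[level] = len(g) * inner / n
    cm_a = sum(contributions.values())
    return cm_a / n, cm_a, contributions


cm, cm_a, parts = cramer_von_mises([1.0, 2.0, 3.0, 4.0], ["A", "A", "B", "B"])
```

For this small example each class contributes 0.1875, so $CM_a = 0.375$ and $CM = CM_a / 4 = 0.09375$.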
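For data with two class levels, PROC NPAR1WAY computes the Kuiper statistic, its scaled value for the asymptotic distribution, and the asymptotic p-value. The Kuiper statistic is computed as
$$ K = \max_j \left( F_1(x_j) - F_2(x_j) \right) - \min_j \left( F_1(x_j) - F_2(x_j) \right) $$
The asymptotic value is
$$ K_a = K \sqrt{\frac{n_1 n_2}{n}} $$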
PROC NPAR1WAY displays $\max_j \left( F_1(x_j) - F_2(x_j) \right)$ for each class level.
The p-value for the Kuiper test is the probability of observing a larger value of $K_a$ under the null hypothesis of no difference between the two classes. PROC NPAR1WAY computes this p-value according to Owen (1962), p. 441.
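The Kuiper statistic and its scaled value can be sketched in a few lines of Python; the function name `kuiper` is hypothetical, and the sketch does not attempt the Owen (1962) p-value. Because $K$ adds the largest deviations in both directions, it can exceed $D$ when the two EDFs cross.

```python
import math
from bisect import bisect_right
from typing import Sequence, Tuple


def kuiper(x1: Sequence[float], x2: Sequence[float]) -> Tuple[float, float]:
    """Return (K, K_a) for two samples."""
    s1, s2 = sorted(x1), sorted(x2)
    n1, n2 = len(s1), len(s2)
    # F1(x_j) - F2(x_j) at every observation x_j of the pooled sample
    diffs = [bisect_right(s1, x) / n1 - bisect_right(s2, x) / n2 for x in s1 + s2]
    k = max(diffs) - min(diffs)
    return k, k * math.sqrt(n1 * n2 / (n1 + n2))


k, k_a = kuiper([1.0, 4.0], [2.0, 3.0])
```

Here the first sample straddles the second, so $F_1 - F_2$ reaches $+0.5$ and $-0.5$: $D = 0.5$, while $K = 1.0$.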
PROC NPAR1WAY provides exact p-values for tests for location and scale differences based on the following scores: Wilcoxon, median, Van der Waerden, Savage, Siegel-Tukey, Ansari-Bradley, Klotz, and Mood scores. Additionally, PROC NPAR1WAY provides exact p-values for tests using the raw data as scores. Exact tests are available for two-sample and multisample data. When the data are classified into two samples, tests are based on simple linear rank statistics. When the data are classified into more than two samples, tests are based on one-way ANOVA statistics.
Exact tests can be useful in situations where the asymptotic assumptions are not met and the asymptotic p-values are not close approximations for the true p-values. Standard asymptotic methods involve the assumption that the test statistic follows a particular distribution when the sample size is sufficiently large. When the sample size is not large, asymptotic results may not be valid, with the asymptotic p-values differing perhaps substantially from the exact p-values. Asymptotic results may also be unreliable when the distribution of the data is sparse, skewed, or heavily tied. Refer to Agresti (1996) and Bishop, Fienberg, and Holland (1975). Exact computations are based on the statistical theory of exact conditional inference for contingency tables, reviewed by Agresti (1992).
In addition to computation of exact p-values, PROC NPAR1WAY provides the option of estimating exact p-values by Monte Carlo simulation. This can be useful for problems that are so large that exact computations require a great amount of time and memory, but for which asymptotic approximations may not be sufficient.
The following sections summarize the exact computational algorithms, define the exact p-values that PROC NPAR1WAY computes, discuss the computational resource requirements, and describe the Monte Carlo estimation option.
PROC NPAR1WAY computes exact p-values using the network algorithm developed by Mehta and Patel (1983). This algorithm provides a substantial advantage over direct enumeration, which can be very time consuming and feasible only for small problems. Refer to Agresti (1992) for a review of algorithms for computation of exact p-values, and refer to Mehta, Patel, and Tsiatis (1984) and Mehta, Patel, and Senchaudhuri (1991) for information on the performance of the network algorithm.
PROC NPAR1WAY constructs a contingency table from the input data, with rows formed by the levels of the classification variable and columns formed by the response variable values. The reference set for a given contingency table is the set of all contingency tables with the observed marginal row and column sums. Corresponding to this reference set, the network algorithm forms a directed acyclic network consisting of nodes in a number of stages. A path through the network corresponds to a distinct table in the reference set. The distances between nodes are defined so that the total distance of a path through the network is the corresponding value of the test statistic. At each node, the algorithm computes the shortest and longest path distances for all the paths that pass through that node. For the two-sample linear rank statistics, which can be expressed as a linear combination of cell frequencies multiplied by increasing row and column scores, PROC NPAR1WAY computes shortest and longest path distances using the algorithm given in Agresti, Mehta, and Patel (1990). For the multisample one-way test statistics, PROC NPAR1WAY computes an upper bound for the longest path and a lower bound for the shortest path, following the approach of Valz and Thompson (1994).
The longest and shortest path distances or bounds for a node are compared to the value of the test statistic to determine whether all paths through the node contribute to the p-value, none of the paths through the node contribute to the p-value, or neither of these situations occurs. If all paths through the node contribute, the p-value is incremented accordingly, and these paths are eliminated from further analysis. If no paths contribute, these paths are eliminated from the analysis. Otherwise, the algorithm continues, still processing this node and the associated paths. The algorithm finishes when all nodes have been accounted for.
In applying the network algorithm, PROC NPAR1WAY uses full precision to represent all statistics, row and column scores, and other quantities involved in the computations. Although it is possible to use rounding to improve the speed and memory requirements of the algorithm, PROC NPAR1WAY does not do this because it can result in reduced accuracy of the p-values.
For two-sample linear rank tests, PROC NPAR1WAY computes exact one-sided and two-sided p-values for each test specified in the EXACT statement. For the one-sided test, PROC NPAR1WAY displays the right-sided p-value when the observed value of the test statistic is greater than its expected value. The right-sided p-value is the sum of probabilities for those tables having a test statistic greater than or equal to the observed test statistic. Otherwise, when the test statistic is less than or equal to its expected value, PROC NPAR1WAY displays the left-sided p-value. The left-sided p-value is the sum of probabilities for those tables having a test statistic less than or equal to the one observed. The one-sided p-value $P_1$ can be expressed as
$$ P_1 = \begin{cases} \mathrm{Prob}(\text{Test Statistic} \ge S) & \text{if } S > \text{Mean} \\ \mathrm{Prob}(\text{Test Statistic} \le S) & \text{if } S \le \text{Mean} \end{cases} $$
where $S$ is the observed value of the test statistic and Mean is the expected value of the test statistic under the null hypothesis. PROC NPAR1WAY computes the two-sided p-value as the sum of the one-sided p-value and the corresponding area in the opposite tail of the distribution of the statistic, equidistant from the expected value. The two-sided p-value $P_2$ can be expressed as
$$ P_2 = \mathrm{Prob}\left( \left| \text{Test Statistic} - \text{Mean} \right| \ge \left| S - \text{Mean} \right| \right) $$
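To make the definitions of $P_1$ and $P_2$ concrete, this Python sketch computes them for a two-sample linear rank statistic by enumerating every possible split of the observed scores, each equally likely under the null hypothesis. This is direct enumeration, swapped in for illustration; it is not the network algorithm PROC NPAR1WAY uses, and it is practical only for small $n$. The name `exact_linear_rank_pvalues` and the tolerance `eps` are assumptions of this sketch.

```python
from itertools import combinations
from typing import Sequence, Tuple


def exact_linear_rank_pvalues(scores: Sequence[float], n1: int, s_obs: float,
                              eps: float = 1e-9) -> Tuple[float, float, float]:
    """Return (P1, P2, point probability) by enumerating all C(n, n1) samples.

    Under the null hypothesis every choice of n1 observations for the first
    sample is equally likely, so each enumerated statistic has equal weight.
    `eps` absorbs floating-point error when comparing non-integer scores.
    """
    n = len(scores)
    mean = n1 * sum(scores) / n
    stats = [sum(subset) for subset in combinations(scores, n1)]
    total = len(stats)
    if s_obs > mean:
        p1 = sum(t >= s_obs - eps for t in stats) / total   # right-sided
    else:
        p1 = sum(t <= s_obs + eps for t in stats) / total   # left-sided
    p2 = sum(abs(t - mean) >= abs(s_obs - mean) - eps for t in stats) / total
    point = sum(abs(t - s_obs) <= eps for t in stats) / total
    return p1, p2, point


p1, p2, point = exact_linear_rank_pvalues([1, 2, 3, 4, 5, 6], n1=3, s_obs=6)
```

With Wilcoxon scores 1 to 6, $n_1 = 3$, and observed $S = 6$ (the three smallest ranks), $S$ is below its mean of 10.5, so $P_1$ is left-sided: only 1 of the 20 splits gives $S \le 6$, so $P_1 = 0.05$. $P_2 = 0.10$, because the split $\{4, 5, 6\}$ is equally far above the mean.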
For multisample data, the tests are based on one-way ANOVA statistics. For a test of this form, large values of the test statistic indicate a departure from the null hypothesis; the test is inherently two-sided. The exact p-value is the sum of probabilities for those tables having a test statistic greater than or equal to the value of the observed test statistic.
If you specify the POINT option in the EXACT statement, PROC NPAR1WAY also displays exact point probabilities for the test statistics. The exact point probability is the exact probability that the test statistic equals the observed value.
PROC NPAR1WAY uses relatively fast and efficient algorithms for exact computations. These recently developed algorithms, together with improvements in computer power, now make it feasible to perform exact computations for data sets where previously only asymptotic methods could be applied. Nevertheless, there are still large problems that may require a prohibitive amount of time and memory for exact computations, depending on the speed and memory available on your computer. For large problems, consider whether exact methods are really needed or whether asymptotic methods might give results quite close to the exact results while requiring much less computer time and memory. When asymptotic methods may not be sufficient for such large problems, consider using Monte Carlo estimation of exact p-values, as described in the section "Monte Carlo Estimation."
No formula exists that can predict in advance how much time and memory are needed to compute an exact p-value for a certain problem. The time and memory required depend on several factors, including which test is being performed, the total sample size, the number of rows and columns, and the specific arrangement of the observations into table cells. Generally, larger problems (in terms of total sample size, number of rows, and number of columns) tend to require more time and memory. Additionally, for a fixed total sample size, time and memory requirements tend to increase as the number of rows and columns increases, since this corresponds to an increase in the number of tables in the reference set. Also for a fixed sample size, time and memory requirements increase as the marginal row and column totals become more homogeneous. Refer to Agresti, Mehta, and Patel (1990) and Gail and Mantel (1977).
At any time while PROC NPAR1WAY is computing exact p-values, you can terminate the computations by pressing the system interrupt key sequence (refer to the SAS Companion for your system) and choosing to stop computations. After you terminate exact computations, PROC NPAR1WAY completes all other remaining tasks. The procedure produces the requested output and reports missing values for any exact p-values not computed by the time of termination.
You can also use the MAXTIME= option in the EXACT statement to limit the amount of time PROC NPAR1WAY uses for exact computations. You specify a MAXTIME= value that is the maximum amount of time (in seconds) that PROC NPAR1WAY can use to compute an exact p-value. If PROC NPAR1WAY does not finish computing an exact p-value within that time, it terminates the computation and completes all other remaining tasks.
If you specify the MC option in the EXACT statement, PROC NPAR1WAY computes Monte Carlo estimates of the exact p-values instead of directly computing the exact p-values. Monte Carlo estimation can be useful for large problems that require a great amount of time and memory for exact computations but for which asymptotic approximations may not be sufficient. To describe the precision of each Monte Carlo estimate, PROC NPAR1WAY provides the asymptotic standard error and $100(1 - \alpha)\%$ confidence limits. The value of $\alpha$ is determined by the ALPHA= option in the EXACT statement; by default, $\alpha = 0.01$, which produces 99% confidence limits. The N= option in the EXACT statement specifies the number of samples PROC NPAR1WAY uses for Monte Carlo estimation; the default is 10,000 samples. You can specify a larger value for N to improve the precision of the Monte Carlo estimates; because larger values of N generate more samples, the computation time increases. Alternatively, you can specify a smaller value of N to reduce the computation time.
To compute a Monte Carlo estimate of an exact p-value, PROC NPAR1WAY generates a random sample of tables with the same total sample size, row totals, and column totals as the observed table. PROC NPAR1WAY uses the algorithm of Agresti, Wackerly, and Boyett (1979), which generates tables in proportion to their hypergeometric probabilities conditional on the marginal frequencies. For each sample table, PROC NPAR1WAY computes the value of the test statistic and compares it to the value for the observed table. When estimating a right-sided p-value, PROC NPAR1WAY counts all sample tables for which the test statistic is greater than or equal to the observed test statistic. Then the p-value estimate equals the number of these tables divided by the total number of tables sampled.
$$ \mathrm{MC} = \frac{M}{N} $$
where $M$ is the number of samples with (Test Statistic $\ge t$), $N$ is the total number of samples, and $t$ is the observed Test Statistic.
PROC NPAR1WAY computes left-sided and two-sided p-value estimates in a similar manner. For left-sided p-values, PROC NPAR1WAY evaluates whether the test statistic for each sampled table is less than or equal to the observed test statistic. For two-sided p-values, PROC NPAR1WAY examines the sample test statistics according to the expression for $P_2$ given in the section "Definition of p-Values."
The variable $M$ is a binomial variable with $N$ trials and success probability $p$. It follows that the asymptotic standard error of the Monte Carlo estimate is
$$ \mathrm{SE}(\mathrm{MC}) = \sqrt{ \frac{\mathrm{MC} \, (1 - \mathrm{MC})}{N - 1} } $$
PROC NPAR1WAY constructs asymptotic confidence limits for the p-values according to
$$ \mathrm{MC} \pm z_{\alpha/2} \cdot \mathrm{SE}(\mathrm{MC}) $$
where $z_{\alpha/2}$ is the $100(1 - \alpha/2)$ percentile of the standard normal distribution, and $\alpha$ is determined by the ALPHA= option in the EXACT statement.
When the Monte Carlo estimate MC equals 0, PROC NPAR1WAY computes the confidence limits for the p-value as
$$ \left( 0,\; 1 - \alpha^{1/N} \right) $$
When the Monte Carlo estimate MC equals 1, PROC NPAR1WAY computes the confidence limits as
$$ \left( \alpha^{1/N},\; 1 \right) $$
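The Monte Carlo estimate, its standard error, and the confidence limits (including the special cases MC = 0 and MC = 1) can be sketched in Python for a right-sided two-sample test. As an assumption of this sketch, tables are generated by randomly splitting the observed scores into samples of the observed sizes, which for two-sample data samples the same conditional (hypergeometric) distribution; this is swapped in for the Agresti, Wackerly, and Boyett (1979) algorithm, which the sketch does not reproduce. The function name and parameters are hypothetical; `n_samples` and `alpha` mirror the defaults of the N= and ALPHA= options.

```python
import math
import random
from statistics import NormalDist
from typing import Optional, Sequence, Tuple


def mc_right_pvalue(scores: Sequence[float], n1: int, s_obs: float,
                    n_samples: int = 10_000, alpha: float = 0.01,
                    seed: Optional[int] = None, eps: float = 1e-9
                    ) -> Tuple[float, float, Tuple[float, float]]:
    """Return (MC, SE, (lower, upper)) for a right-sided p-value estimate."""
    rng = random.Random(seed)
    m = 0
    for _ in range(n_samples):
        # A random split of the observed scores keeps the sample sizes fixed.
        t = sum(rng.sample(list(scores), n1))
        if t >= s_obs - eps:
            m += 1
    mc = m / n_samples
    se = math.sqrt(mc * (1.0 - mc) / (n_samples - 1))
    if mc == 0.0:
        limits = (0.0, 1.0 - alpha ** (1.0 / n_samples))
    elif mc == 1.0:
        limits = (alpha ** (1.0 / n_samples), 1.0)
    else:
        z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}
        limits = (mc - z * se, mc + z * se)
    return mc, se, limits


mc, se, (lower, upper) = mc_right_pvalue([1, 2, 3, 4, 5, 6], n1=3, s_obs=15, seed=1)
```

In this example the exact right-sided p-value is 1/20 = 0.05, so the estimate should land close to 0.05, with 99% limits that tighten as `n_samples` grows.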
The OUTPUT statement creates a SAS data set that contains statistics computed by PROC NPAR1WAY. You specify which statistics to store in the output data set, using options identical to those used in the PROC NPAR1WAY statement. When you specify one of these options in the OUTPUT statement, PROC NPAR1WAY includes all available statistics from that analysis in the output data set.
The output data set contains one observation for each analysis variable within a BY-group. The OUTPUT data set can include the following variables:
BY variables
_VAR_, which identifies the analysis variable
variables containing the specified statistics
The following table lists the variable names and descriptions for all available statistics. Note that some statistics are available only for the two-sample case (where the classification variable groups the data into two classes); these are marked [*]. Other statistics are available only for the multisample case; these are marked [**].
When you request exact p-values for certain analyses using the EXACT statement, PROC NPAR1WAY also includes those p-values in the output data set if you specify the corresponding analysis options in the OUTPUT statement. If you do not request exact p-values, then they do not appear in the output data set.
Monte Carlo estimates of exact p-values are not available in this output data set, but you can use the Output Delivery System (ODS) to store Monte Carlo estimates in a SAS data set. You can use ODS to create a SAS data set from any piece of PROC NPAR1WAY output. For more information, see Table 52.6 and Chapter 14, "Using the Output Delivery System."
Option | Output Variable | Note | Description |
---|---|---|---|
ANOVA | _MSA_ | | ANOVA Effect Mean Square, Among MS |
ANOVA | _MSE_ | | ANOVA Error Mean Square, Within MS |
ANOVA | _F_ | | F Statistic for ANOVA |
ANOVA | P_F | | p-value, F Statistic for ANOVA |
WILCOXON | _WIL_ | [*] | Two-sample Wilcoxon Statistic |
WILCOXON | Z_WIL | [*] | Wilcoxon Statistic, Standardized |
WILCOXON | PL_WIL | [*] | p-value, Wilcoxon Test (Left-sided) |
WILCOXON | PR_WIL | [*] | p-value, Wilcoxon Test (Right-sided) |
WILCOXON | P2_WIL | [*] | p-value, Wilcoxon Test (Two-sided) |
WILCOXON | PTL_WIL | [*] | p-value, Wilcoxon t Approximation (Left-sided) |
WILCOXON | PTR_WIL | [*] | p-value, Wilcoxon t Approximation (Right-sided) |
WILCOXON | PT2_WIL | [*] | p-value, Wilcoxon t Approximation (Two-sided) |
WILCOXON | XPL_WIL | [*] | Exact p-value, Wilcoxon Test (Left-sided) |
WILCOXON | XPR_WIL | [*] | Exact p-value, Wilcoxon Test (Right-sided) |
WILCOXON | XPT_WIL | [*] | Exact Point Probability, Wilcoxon Test |
WILCOXON | XP2_WIL | [*] | Exact p-value, Wilcoxon Test (Two-sided) |
WILCOXON | _KW_ | | Kruskal-Wallis Statistic |
WILCOXON | DF_KW | | Degrees of Freedom, Kruskal-Wallis Test |
WILCOXON | P_KW | | p-value, Kruskal-Wallis Test |
WILCOXON | XP_KW | [**] | Exact p-value, Kruskal-Wallis Test |
WILCOXON | XPT_KW | [**] | Exact Point Probability, Kruskal-Wallis Test |
MEDIAN | _MED_ | [*] | Two-sample Median Statistic |
MEDIAN | Z_MED | [*] | Median Statistic, Standardized |
MEDIAN | PL_MED | [*] | p-value, Median Test (Left-sided) |
MEDIAN | PR_MED | [*] | p-value, Median Test (Right-sided) |
MEDIAN | P2_MED | [*] | p-value, Median Test (Two-sided) |
MEDIAN | XPL_MED | [*] | Exact p-value, Median Test (Left-sided) |
MEDIAN | XPR_MED | [*] | Exact p-value, Median Test (Right-sided) |
MEDIAN | XPT_MED | [*] | Exact Point Probability, Median Test |
MEDIAN | XP2_MED | [*] | Exact p-value, Median Test (Two-sided) |
MEDIAN | _CHMED_ | | Median Chi-square (Brown-Mood Test) |
MEDIAN | DF_CHMED | | Degrees of Freedom, Median Chi-square |
MEDIAN | P_CHMED | | p-value, Median Chi-square Test |
MEDIAN | XP_CHMED | [**] | Exact p-value, Median Chi-square |
MEDIAN | XPT_CHME | [**] | Exact Point Probability, Median Chi-square |
VW | _VW_ | [*] | Two-sample Van der Waerden Statistic |
VW | Z_VW | [*] | Van der Waerden Statistic, Standardized |
VW | PL_VW | [*] | p-value, Van der Waerden Test (Left-sided) |
VW | PR_VW | [*] | p-value, Van der Waerden Test (Right-sided) |
VW | P2_VW | [*] | p-value, Van der Waerden Test (Two-sided) |
VW | XPL_VW | [*] | Exact p-value, Van der Waerden Test (Left-sided) |
VW | XPR_VW | [*] | Exact p-value, Van der Waerden Test (Right-sided) |
VW | XPT_VW | [*] | Exact Point Probability, Van der Waerden Test |
VW | XP2_VW | [*] | Exact p-value, Van der Waerden Test (Two-sided) |
VW | _CHVW_ | | Van der Waerden Chi-square |
VW | DF_CHVW | | Degrees of Freedom, Van der Waerden Chi-square |
VW | P_CHVW | | p-value, Van der Waerden Chi-square Test |
VW | XP_CHVW | [**] | Exact p-value, Van der Waerden Chi-square |
VW | XPT_CHVW | [**] | Exact Point Probability, Van der Waerden Chi-square |
SAVAGE | _SAV_ | [*] | Two-sample Savage Statistic |
SAVAGE | Z_SAV | [*] | Savage Statistic, Standardized |
SAVAGE | PL_SAV | [*] | p-value, Savage Test (Left-sided) |
SAVAGE | PR_SAV | [*] | p-value, Savage Test (Right-sided) |
SAVAGE | P2_SAV | [*] | p-value, Savage Test (Two-sided) |
SAVAGE | XPL_SAV | [*] | Exact p-value, Savage Test (Left-sided) |
SAVAGE | XPR_SAV | [*] | Exact p-value, Savage Test (Right-sided) |
SAVAGE | XPT_SAV | [*] | Exact Point Probability, Savage Test |
SAVAGE | XP2_SAV | [*] | Exact p-value, Savage Test (Two-sided) |
SAVAGE | _CHSAV_ | | Savage Chi-square |
SAVAGE | DF_CHSAV | | Degrees of Freedom, Savage Chi-square |
SAVAGE | P_CHSAV | | p-value, Savage Chi-square Test |
SAVAGE | XP_CHSAV | [**] | Exact p-value, Savage Chi-square |
SAVAGE | XPT_CHSA | [**] | Exact Point Probability, Savage Chi-square |
ST | _ST_ | [*] | Two-sample Siegel-Tukey Statistic |
ST | Z_ST | [*] | Siegel-Tukey Statistic, Standardized |
ST | PL_ST | [*] | p-value, Siegel-Tukey Test (Left-sided) |
ST | PR_ST | [*] | p-value, Siegel-Tukey Test (Right-sided) |
ST | P2_ST | [*] | p-value, Siegel-Tukey Test (Two-sided) |
ST | XPL_ST | [*] | Exact p-value, Siegel-Tukey Test (Left-sided) |
ST | XPR_ST | [*] | Exact p-value, Siegel-Tukey Test (Right-sided) |
ST | XPT_ST | [*] | Exact Point Probability, Siegel-Tukey Test |
ST | XP2_ST | [*] | Exact p-value, Siegel-Tukey Test (Two-sided) |
ST | _CHST_ | | Siegel-Tukey Chi-square |
ST | DF_CHST | | Degrees of Freedom, Siegel-Tukey Chi-square |
ST | P_CHST | | p-value, Siegel-Tukey Chi-square Test |
ST | XP_CHST | [**] | Exact p-value, Siegel-Tukey Chi-square |
ST | XPT_CHST | [**] | Exact Point Probability, Siegel-Tukey Chi-square |
AB | _AB_ | [*] | Two-sample Ansari-Bradley Statistic |
AB | Z_AB | [*] | Ansari-Bradley Statistic, Standardized |
AB | PL_AB | [*] | p-value, Ansari-Bradley Test (Left-sided) |
AB | PR_AB | [*] | p-value, Ansari-Bradley Test (Right-sided) |
AB | P2_AB | [*] | p-value, Ansari-Bradley Test (Two-sided) |
AB | XPL_AB | [*] | Exact p-value, Ansari-Bradley Test (Left-sided) |
AB | XPR_AB | [*] | Exact p-value, Ansari-Bradley Test (Right-sided) |
AB | XPT_AB | [*] | Exact Point Probability, Ansari-Bradley Test |
AB | XP2_AB | [*] | Exact p-value, Ansari-Bradley Test (Two-sided) |
AB | _CHAB_ | | Ansari-Bradley Chi-square |
AB | DF_CHAB | | Degrees of Freedom, Ansari-Bradley Chi-square |
AB | P_CHAB | | p-value, Ansari-Bradley Chi-square Test |
AB | XP_CHAB | [**] | Exact p-value, Ansari-Bradley Chi-square |
AB | XPT_CHAB | [**] | Exact Point Probability, Ansari-Bradley Chi-square |
KLOTZ | _KLOTZ_ | [*] | Two-sample Klotz Statistic |
KLOTZ | Z_K | [*] | Klotz Statistic, Standardized |
KLOTZ | PL_K | [*] | p-value, Klotz Test (Left-sided) |
KLOTZ | PR_K | [*] | p-value, Klotz Test (Right-sided) |
KLOTZ | P2_K | [*] | p-value, Klotz Test (Two-sided) |
KLOTZ | XPL_K | [*] | Exact p-value, Klotz Test (Left-sided) |
KLOTZ | XPR_K | [*] | Exact p-value, Klotz Test (Right-sided) |
KLOTZ | XPT_K | [*] | Exact Point Probability, Klotz Test |
KLOTZ | XP2_K | [*] | Exact p-value, Klotz Test (Two-sided) |
KLOTZ | _CHK_ | | Klotz Chi-square |
KLOTZ | DF_CHK | | Degrees of Freedom, Klotz Chi-square |
KLOTZ | P_CHK | | p-value, Klotz Chi-square Test |
KLOTZ | XP_CHK | [**] | Exact p-value, Klotz Chi-square |
KLOTZ | XPT_CHK | [**] | Exact Point Probability, Klotz Chi-square |
MOOD | _MOOD_ | [*] | Two-sample Mood Statistic |
MOOD | Z_MOOD | [*] | Mood Statistic, Standardized |
MOOD | PL_MOOD | [*] | p-value, Mood Test (Left-sided) |
MOOD | PR_MOOD | [*] | p-value, Mood Test (Right-sided) |
MOOD | P2_MOOD | [*] | p-value, Mood Test (Two-sided) |
MOOD | XPL_MOOD | [*] | Exact p-value, Mood Test (Left-sided) |
MOOD | XPR_MOOD | [*] | Exact p-value, Mood Test (Right-sided) |
MOOD | XPT_MOOD | [*] | Exact Point Probability, Mood Test |
MOOD | XP2_MOOD | [*] | Exact p-value, Mood Test (Two-sided) |
_CHMOOD_ | Mood Chi-square | ||
DF_CHMOO | Degrees of Freedom, Mood Chi-square | ||
P_CHMOOD | p -value, Mood Chi-square Test | ||
XP_CHMOO | [ **] | Exact p -value, Mood Chi-square | |
XPT_CHMO | [ **] | Exact Point Probability, Mood Chi-square | |
SCORES=DATA | _DATA_ | [ *] | Two-sample Data Scores Statistic |
Z_DATA | [ *] | Data Scores Statistic, Standardized | |
PL_DATA | [ *] | p -value, Data Scores Test (Left-sided) | |
PR_DATA | [ *] | p -value, Data Scores Test (Right-sided) | |
P2_DATA | [ *] | p -value, Data Scores Test (Two-sided) | |
XPL_DATA | [ *] | Exact p -value, Data Scores Test (Left-sided) | |
XPR_DATA | [ *] | Exact p -value, Data Scores Test (Right-sided) | |
XPT_DATA | [ *] | Exact Point Probability, Data Scores Test | |
XP2_DATA | [ *] | Exact p -value, Data Scores Test (Two-sided) | |
_CHDATA_ | Data Scores Chi-square | ||
DF_CHDAT | Degrees of Freedom, Data Scores Chi-square | ||
P_CHDATA | p -value, Data Scores Chi-square Test | ||
XP_CHDAT | [ **] | Exact p -value, Data Scores Chi-square | |
XPT_CHDA | [ **] | Exact Point Probability, Data Scores Chi-square | |
EDF | _KS_ | Kolmogorov-Smirnov Statistic | |
_KSA_ | Kolmogorov-Smirnov Statistic (Asymptotic) | ||
_Dp_ | [ *] | Two-sample Kolmogorov-Smirnov D+ | |
P_Dp | [ *] | p -value, Kolmogorov-Smirnov D+ | |
_Dm_ | [ *] | Two-sample Kolmogorov-Smirnov D- | |
P_Dm | [ *] | p -value, Kolmogorov-Smirnov D- | |
_D_ | [ *] | Two-sample Kolmogorov-Smirnov Statistic | |
P_KSA | [ *] | p -value, Two-sample Kolmogorov-Smirnov | |
XP_Dp | [ *] | Exact p -value, Kolmogorov-Smirnov D+ | |
XPT_Dp | [ *] | Exact Point Probability, Kolmogorov-Smirnov D+ | |
XP_ Dm | [ *] | Exact p -value, Kolmogorov-Smirnov D- | |
XPT_Dm | [ *] | Exact Point Probability, Kolmogorov-Smirnov D- | |
XP_D | [ *] | Exact p -value, Kolmogorov-Smirnov D | |
XPT_D | [ *] | Exact Point Probability, Kolmogorov-Smirnov D | |
_CM_ | Cramer-von Mises Statistic | ||
_CMA_ | Cramer-von Mises Statistic (Asymptotic) | ||
_K_ | [ *] | Kuiper Two-sample Statistic | |
_KA_ | [ *] | Kuiper Two-sample Statistic (Asymptotic) | |
P_KA | [ *] | p -value, Two-sample Kuiper (Asymptotic) | |
[ *] Statistic included only for two-sample cases [ **] Statistic included only for multisample cases |
If you specify the ANOVA option, PROC NPAR1WAY displays a Class Means table and an Analysis of Variance table for each response variable. The Class Means table includes the following information for each CLASS variable value, or level:
N, the number of observations
the Mean of the response variable
The Analysis of Variance table includes the following information for each Source of variation (Among classes and Within classes):
DF, the degrees of freedom associated with the source
the Sum of Squares
the Mean Square, the sum of squares divided by the degrees of freedom
The Analysis of Variance table also includes the following:
the F Value for testing the hypothesis that the group means are equal. This is computed by dividing the Mean Square (Among) by the Mean Square (Within).
Pr > F, the significance probability corresponding to the F Value
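To make the Class Means and Analysis of Variance quantities concrete, the following Python sketch computes the class means, the Among and Within degrees of freedom, sums of squares, and mean squares, and the F Value for a small data set. It is a minimal sketch assuming the responses are supplied as a dictionary of lists keyed by class level; the function name `anova_table` is hypothetical. Pr > F is omitted because the Python standard library has no F distribution.

```python
def anova_table(groups):
    """One-way ANOVA quantities for a dict mapping class level -> list of responses."""
    all_values = [y for ys in groups.values() for y in ys]
    n = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n
    class_means = {c: sum(ys) / len(ys) for c, ys in groups.items()}
    # Among classes: squared deviations of class means from the grand mean, weighted by N
    ss_among = sum(len(ys) * (class_means[c] - grand_mean) ** 2 for c, ys in groups.items())
    # Within classes: squared deviations of each response from its own class mean
    ss_within = sum((y - class_means[c]) ** 2 for c, ys in groups.items() for y in ys)
    df_among, df_within = k - 1, n - k
    ms_among = ss_among / df_among      # Mean Square = Sum of Squares / DF
    ms_within = ss_within / df_within
    return {
        "class_means": class_means,
        "among": (df_among, ss_among, ms_among),
        "within": (df_within, ss_within, ms_within),
        "F": ms_among / ms_within,      # Mean Square (Among) / Mean Square (Within)
    }

result = anova_table({"A": [1, 2, 3], "B": [4, 5, 6]})
print(result["F"])  # prints 13.5
```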
For each score type that you specify, PROC NPAR1WAY displays a Class Scores table. The available score types include Wilcoxon, median, Van der Waerden, Savage, Siegel-Tukey, Ansari-Bradley, Klotz, Mood, and raw data scores. PROC NPAR1WAY assigns the specified scores to the response variable values and classifies them according to the CLASS variable values. The Class Scores table includes the following information for each class:
N, the number of observations
Sum of Scores
Expected Under H0, the expected sum of scores under the null hypothesis of no difference among classes
Std Dev Under H0, the standard deviation under the null hypothesis
Mean Score
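The Class Scores columns can be sketched in Python for Wilcoxon scores, where each score is the rank of the observation and tied values receive the average of the ranks they occupy. This sketch assumes the usual permutation moments of a linear rank statistic: an expected sum of $n_i \bar{a}$ and a variance of $\frac{n_i (n - n_i)}{n (n - 1)} \sum_j (a_j - \bar{a})^2$, where $\bar{a}$ is the mean of all $n$ scores. The function names are hypothetical, and other score types would replace the ranks with their own score formula.

```python
from math import sqrt

def average_ranks(values):
    """Ascending ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j + 2) / 2.0  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_class_scores(values, classes):
    """Return {level: (N, Sum of Scores, Expected Under H0, Std Dev Under H0, Mean Score)}."""
    scores = average_ranks(values)  # Wilcoxon scores are the tie-averaged ranks
    n = len(scores)
    a_bar = sum(scores) / n
    ss = sum((a - a_bar) ** 2 for a in scores)
    table = {}
    for level in sorted(set(classes)):
        sc = [a for a, g in zip(scores, classes) if g == level]
        n_i = len(sc)
        total = sum(sc)
        expected = n_i * a_bar
        std_dev = sqrt(n_i * (n - n_i) / (n * (n - 1)) * ss)
        table[level] = (n_i, total, expected, std_dev, total / n_i)
    return table

print(wilcoxon_class_scores([1.2, 3.4, 3.4, 5.0], ["A", "A", "B", "B"]))
```

In this example the two values of 3.4 would occupy ranks 2 and 3, so each receives the average score 2.5.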
When there are only two levels of the CLASS variable, PROC NPAR1WAY displays the following Two-Sample Test results for each analysis of scores:
Statistic, which is the sum of scores for the class with the smaller sample size
Z, the standardized test statistic, which has an asymptotic standard normal distribution under the null hypothesis
One-Sided Pr < Z or One-Sided Pr > Z, the asymptotic one-sided p-value, displayed as Pr < Z when Z <= 0 and as Pr > Z when Z > 0
Two-Sided Pr > |Z|, the asymptotic two-sided p-value
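A minimal Python sketch of the asymptotic two-sample results: it standardizes a statistic with its null expectation and standard deviation and evaluates standard normal tail areas with `math.erfc`. The function name is hypothetical, and the sketch ignores any continuity correction that the procedure may apply for particular score types.

```python
from math import erfc, sqrt

def two_sample_normal_test(statistic, expected, std_dev):
    """Standardize a two-sample score statistic and return asymptotic p-values."""
    z = (statistic - expected) / std_dev
    if z <= 0:
        # Lower tail: Pr < Z = Phi(z) = 0.5 * erfc(-z / sqrt(2))
        one_sided = ("Pr < Z", 0.5 * erfc(-z / sqrt(2)))
    else:
        # Upper tail: Pr > Z = 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
        one_sided = ("Pr > Z", 0.5 * erfc(z / sqrt(2)))
    # Pr > |Z| = 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    two_sided = erfc(abs(z) / sqrt(2))
    return z, one_sided, two_sided

# Hypothetical values: S = 3.5, expected 5.0, standard deviation sqrt(1.5)
print(two_sample_normal_test(3.5, 5.0, sqrt(1.5)))
```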
For Wilcoxon scores, PROC NPAR1WAY also displays a t -approximation for the two-sample test.
If you request an exact test by specifying the score type in the EXACT statement, PROC NPAR1WAY displays the following exact p-values for two-sample data:
One-Sided Pr <= S or One-Sided Pr >= S, the one-sided exact p-value, displayed as Pr <= S when S <= Mean and as Pr >= S when S > Mean, where S is the test statistic and Mean is its expected value under the null hypothesis
Point Pr = S, the point probability, if you specify the POINT option in the EXACT statement
Two-Sided Pr >= |S - Mean|, the two-sided exact p-value
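For small samples, the exact p-values can be illustrated by brute force: under the null hypothesis every assignment of the pooled scores to the smaller class is equally likely, so enumerating all of them gives the permutation distribution of S. This Python sketch uses `itertools.combinations` for that enumeration; it is not the algorithm PROC NPAR1WAY uses internally, and the function name and the tolerance `eps` (which guards against rounding in averaged scores) are assumptions.

```python
from itertools import combinations

def exact_two_sample(scores, observed, eps=1e-9):
    """Exact permutation p-values for the sum of scores of one class."""
    n1 = len(observed)
    s_obs = sum(observed)
    mean = n1 * sum(scores) / len(scores)  # expected value of S under H0
    # Every choice of n1 positions from the pooled scores is equally likely under H0
    sums = [sum(c) for c in combinations(scores, n1)]
    total = len(sums)
    if s_obs <= mean:
        one_sided = sum(s <= s_obs + eps for s in sums) / total  # Pr <= S
    else:
        one_sided = sum(s >= s_obs - eps for s in sums) / total  # Pr >= S
    point = sum(abs(s - s_obs) <= eps for s in sums) / total     # Point Pr = S
    two_sided = sum(abs(s - mean) >= abs(s_obs - mean) - eps for s in sums) / total
    return one_sided, point, two_sided

# Pooled scores 1..4; the class of size 2 received scores 1 and 2, so S = 3
print(exact_two_sample([1, 2, 3, 4], [1, 2]))
```

Because the number of assignments grows combinatorially, this direct enumeration is practical only for very small samples.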
If you request Monte Carlo estimates for the exact test by specifying the MC option in the EXACT statement, PROC NPAR1WAY displays the following information for two-sample data:
Estimate of One-Sided Pr <= S or One-Sided Pr >= S, the one-sided exact p-value, together with its Lower and Upper Confidence Limits
Estimate of Two-Sided Pr >= |S - Mean|, the two-sided exact p-value, together with its Lower and Upper Confidence Limits
Number of Samples used to compute the Monte Carlo estimates
Initial Seed used to compute the Monte Carlo estimates
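The idea behind the Monte Carlo estimate can be sketched in Python: draw random reassignments of the pooled scores to the class, count how often the statistic is at least as extreme as the observed one, and attach normal-approximation confidence limits. The sample count, seed, and confidence level below are hypothetical choices for the sketch (not documented procedure defaults), and the normal-approximation interval is an assumption about how the limits are formed.

```python
import random
from math import sqrt
from statistics import NormalDist

def mc_two_sided(scores, observed, n_samples=10000, seed=12345, alpha=0.01, eps=1e-9):
    """Monte Carlo estimate of Two-Sided Pr >= |S - Mean| with normal-approximation limits."""
    rng = random.Random(seed)  # fixed seed makes the estimate reproducible
    n1 = len(observed)
    mean = n1 * sum(scores) / len(scores)
    obs_dev = abs(sum(observed) - mean)
    hits = 0
    for _ in range(n_samples):
        s = sum(rng.sample(scores, n1))  # random reassignment of pooled scores to the class
        if abs(s - mean) >= obs_dev - eps:
            hits += 1
    p_hat = hits / n_samples
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n_samples)
    lower = max(0.0, p_hat - half_width)
    upper = min(1.0, p_hat + half_width)
    return p_hat, lower, upper, n_samples, seed

print(mc_two_sided([1, 2, 3, 4], [1, 2]))
```

The exact two-sided p-value for this tiny example is 1/3, so the estimate should fall close to that value.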
For both two-sample and multisample data, PROC NPAR1WAY displays the following One-Way Analysis for each score type:
Chi-Square, the one-way ANOVA statistic for testing the null hypothesis of no difference among classes
DF, the degrees of freedom
Pr > Chi-Square, the asymptotic p-value
For multisample data, if you request an exact test by specifying the score type in the EXACT statement, PROC NPAR1WAY also displays the exact p-value as follows:
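A Python sketch of the one-way analysis, assuming the common form of the chi-square statistic for linear rank scores, $C = \frac{1}{S^2} \sum_i \frac{(S_i - n_i \bar{a})^2}{n_i}$ with $S^2 = \frac{1}{n-1} \sum_j (a_j - \bar{a})^2$ and $k - 1$ degrees of freedom for $k$ classes. The chi-square tail probability is evaluated with the series for the regularized lower incomplete gamma function. Function names are hypothetical, and the series loses relative precision for very small p-values.

```python
from math import exp, lgamma, log

def one_way_chi_square(scores, classes):
    """Chi-square statistic and DF for scores already assigned to observations."""
    n = len(scores)
    a_bar = sum(scores) / n
    s2 = sum((a - a_bar) ** 2 for a in scores) / (n - 1)
    levels = sorted(set(classes))
    c = 0.0
    for level in levels:
        sc = [a for a, g in zip(scores, classes) if g == level]
        n_i = len(sc)
        c += (sum(sc) - n_i * a_bar) ** 2 / n_i  # squared deviation from E(S_i) / n_i
    return c / s2, len(levels) - 1

def chi2_sf(x, df):
    """Pr > Chi-Square via 1 - P(df/2, x/2), P = regularized lower incomplete gamma."""
    a, y = df / 2.0, x / 2.0
    if y <= 0:
        return 1.0
    term = total = 1.0 / a
    k = 0
    while term > total * 1e-16 and k < 100000:
        k += 1
        term *= y / (a + k)
        total += term
    lower = exp(-y + a * log(y) - lgamma(a)) * total
    return max(0.0, 1.0 - lower)

# Tie-averaged Wilcoxon scores for two classes
chi, df = one_way_chi_square([1, 2.5, 2.5, 4], ["A", "A", "B", "B"])
print(chi, df, chi2_sf(chi, df))
```

With Wilcoxon scores this form reduces to the Kruskal-Wallis statistic, including its adjustment for ties.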
Exact Pr >= Chi-Square
Exact Pr = Chi-Square, the point probability, if you specify the POINT option in the EXACT statement
For multisample data, if you request a Monte Carlo estimate for the exact test by specifying the MC option in the EXACT statement, PROC NPAR1WAY displays the following information:
Estimate of Pr >= Chi-Square, together with its Lower and Upper Confidence Limits
Number of Samples used to compute the Monte Carlo estimate
Initial Seed used to compute the Monte Carlo estimate
If you specify the EDF option, PROC NPAR1WAY produces tables for the Kolmogorov-Smirnov Test, the Cramer-von Mises Test, and for two-sample data only, the Kuiper Test. The Kolmogorov-Smirnov Test table includes the following information for each CLASS variable value, or level:
N, the number of observations
EDF at Maximum, the value of the class EDF (empirical distribution function) at its maximum deviation from the pooled EDF
Deviation from Mean at Maximum, the value of $\sqrt{n_i}\,(F_i - F)$ at its maximum, where $n_i$ is the class sample size, $F_i$ is the class EDF, and $F$ is the pooled EDF
PROC NPAR1WAY displays the following Kolmogorov-Smirnov statistics:
KS, the Kolmogorov-Smirnov statistic
KSa, the asymptotic Kolmogorov-Smirnov statistic, where $KSa = KS\sqrt{n}$ and $n$ is the total sample size
For two-sample data, PROC NPAR1WAY also displays the following Kolmogorov-Smirnov statistics:
$D = \max_j \left| F_1(x_j) - F_2(x_j) \right|$, the two-sample Kolmogorov-Smirnov statistic
Pr > KSa, the asymptotic p-value for KSa, which equals Pr > D
For two-sample data, if you specify the D option, PROC NPAR1WAY also displays the following one-sided Kolmogorov-Smirnov statistics and their asymptotic p-values:
$D^{+} = \max_j \left( F_1(x_j) - F_2(x_j) \right)$
Pr > D+
$D^{-} = \max_j \left( F_2(x_j) - F_1(x_j) \right)$
Pr > D-
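The two-sample Kolmogorov-Smirnov quantities can be sketched in Python by evaluating both class EDFs at every distinct pooled value. For two classes, $KS\sqrt{n}$ simplifies to $D\sqrt{n_1 n_2 / n}$, which the sketch uses for KSa. The tail probability uses the standard asymptotic Kolmogorov series $Q(\lambda) = 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2\lambda^2}$; that the procedure evaluates Pr > KSa this way is an assumption, and the function names are hypothetical.

```python
from math import exp, sqrt

def edf(sample, x):
    """Empirical distribution function: fraction of the sample <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def ks_two_sample(x1, x2):
    pooled = sorted(set(x1) | set(x2))
    diffs = [edf(x1, t) - edf(x2, t) for t in pooled]  # F1(x_j) - F2(x_j)
    d_plus = max(diffs)                 # D+ = max (F1 - F2)
    d_minus = max(-d for d in diffs)    # D- = max (F2 - F1)
    d = max(abs(d) for d in diffs)      # D  = max |F1 - F2|
    n1, n2 = len(x1), len(x2)
    ksa = sqrt(n1 * n2 / (n1 + n2)) * d
    return d, d_plus, d_minus, ksa

def kolmogorov_sf(lam, terms=100):
    """Asymptotic Pr > lambda for the Kolmogorov distribution."""
    if lam < 0.2:
        return 1.0  # the alternating series converges slowly here; the tail is essentially 1
    s = sum((-1) ** (k - 1) * exp(-2 * k * k * lam * lam) for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * s))

d, d_plus, d_minus, ksa = ks_two_sample([1, 2, 3], [4, 5, 6])
print(d, d_plus, d_minus, ksa, kolmogorov_sf(ksa))
```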
For two-sample data, if you request an exact Kolmogorov-Smirnov test by specifying the KS option in the EXACT statement, PROC NPAR1WAY displays the following exact p-values:
Exact Pr >= D
Exact Pr >= D+
Exact Pr >= D-
Exact Point Pr = D, Exact Point Pr = D+, and Exact Point Pr = D-, if you specify the POINT option in the EXACT statement
If you request Monte Carlo estimates for the two-sample exact Kolmogorov-Smirnov test, PROC NPAR1WAY displays the following information for two-sample data:
Estimate of Pr >= D, together with its Lower and Upper Confidence Limits
Estimate of Pr >= D+, together with its Lower and Upper Confidence Limits
Estimate of Pr >= D-, together with its Lower and Upper Confidence Limits
Number of Samples used to compute the Monte Carlo estimates
Initial Seed used to compute the Monte Carlo estimates
The Cramer-von Mises Test table includes the following information for each CLASS variable value, or level:
N, the number of observations
Summed Deviation from Mean, which is
PROC NPAR1WAY also displays the following Cramer-von Mises statistics:
CM, the Cramer-von Mises statistic
CMa, the asymptotic Cramer-von Mises statistic, where CMa = n CM
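A Python sketch of the Cramer-von Mises computation, assuming the pooled-EDF form $CM = \frac{1}{n^2} \sum_i n_i \sum_j t_j \left( F_i(x_j) - F(x_j) \right)^2$, where $t_j$ is the number of observations tied at the $j$th distinct value, together with $CMa = n \cdot CM$ as stated above. The per-class contributions in the sketch sum to CM; treating them as the displayed Summed Deviation from Mean is an assumption, and the function name is hypothetical.

```python
from collections import Counter

def cramer_von_mises(values, classes):
    """Per-class contributions, CM, and CMa = n * CM using the pooled EDF."""
    n = len(values)
    ties = Counter(values)  # t_j: number of observations tied at distinct value x_j

    def edf(sample, x):
        return sum(1 for v in sample if v <= x) / len(sample)

    contrib = {}
    for level in sorted(set(classes)):
        sample = [v for v, g in zip(values, classes) if g == level]
        n_i = len(sample)
        s = sum(t * (edf(sample, x) - edf(values, x)) ** 2 for x, t in ties.items())
        contrib[level] = n_i * s / n ** 2
    cm = sum(contrib.values())
    return contrib, cm, n * cm

contrib, cm, cma = cramer_von_mises([1, 2, 3, 4, 5, 6], ["A", "A", "A", "B", "B", "B"])
print(contrib, cm, cma)
```

For two classes this CMa equals the familiar two-sample form $\frac{n_1 n_2}{n^2} \sum_j t_j \left( F_1(x_j) - F_2(x_j) \right)^2$.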
For two-sample data, PROC NPAR1WAY displays the Kuiper Test table, which includes the following information for each class:
N, the number of observations
Deviation from Mean, which is $\max_j \left( F_1(x_j) - F_2(x_j) \right)$
PROC NPAR1WAY also displays the following Kuiper two-sample test statistics:
K, the Kuiper two-sample test statistic
Ka, the asymptotic Kuiper two-sample test statistic, where $Ka = K\sqrt{n_1 n_2 / n}$
Pr > Ka
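A Python sketch of the Kuiper two-sample test, assuming $K = \max_j (F_1 - F_2) + \max_j (F_2 - F_1)$, which equals the range of $F_1(x_j) - F_2(x_j)$, and $Ka = K\sqrt{n_1 n_2 / n}$. Pr > Ka is approximated with the standard asymptotic Kuiper series $Q(\lambda) = 2\sum_{j \ge 1} (4j^2\lambda^2 - 1) e^{-2j^2\lambda^2}$; whether the procedure uses exactly this approximation is an assumption, and the function names are hypothetical.

```python
from math import exp, sqrt

def edf(sample, x):
    """Empirical distribution function: fraction of the sample <= x."""
    return sum(1 for v in sample if v <= x) / len(sample)

def kuiper_two_sample(x1, x2):
    pooled = sorted(set(x1) | set(x2))
    diffs = [edf(x1, t) - edf(x2, t) for t in pooled]
    dev1 = max(diffs)                # max_j (F1 - F2)
    dev2 = max(-d for d in diffs)    # max_j (F2 - F1)
    k = dev1 + dev2                  # equals max(F1 - F2) - min(F1 - F2)
    n1, n2 = len(x1), len(x2)
    ka = k * sqrt(n1 * n2 / (n1 + n2))
    return dev1, dev2, k, ka

def kuiper_sf(lam, terms=100):
    """Asymptotic Pr > lambda for the Kuiper statistic."""
    if lam < 0.4:
        return 1.0  # the series converges slowly here; the tail probability is essentially 1
    s = sum((4 * j * j * lam * lam - 1) * exp(-2 * j * j * lam * lam) for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2 * s))

dev1, dev2, k, ka = kuiper_two_sample([1, 4], [2, 3])
print(dev1, dev2, k, ka, kuiper_sf(ka))
```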
PROC NPAR1WAY assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table. For more information on ODS, see Chapter 14, Using the Output Delivery System.
The WILCOXON, MEDIAN, VW, SAVAGE, and EDF options are in effect by default if you do not specify any analysis options in the PROC NPAR1WAY statement.
ODS Table Name | Description | Statement | Option |
---|---|---|---|
ANOVA | Analysis of variance | PROC | ANOVA |
ABAnalysis | Ansari-Bradley one-way analysis | PROC | AB |
ABMC | Monte Carlo estimates for the Ansari-Bradley exact test | EXACT | AB / MC |
ABScores | Ansari-Bradley scores | PROC | AB |
ABTest | Ansari-Bradley two-sample test | PROC | AB [ *] |
ClassMeans | Class Means | PROC | ANOVA |
CVMStats | Cramer-von Mises statistics | PROC | EDF |
CVMTest | Cramer-von Mises test | PROC | EDF |
DataScores | Data scores | PROC | SCORES=DATA |
DataScoresAnalysis | Data scores one-way analysis | PROC | SCORES=DATA |
DataScoresMC | Monte Carlo estimates for the exact test based on data scores | EXACT | SCORES=DATA / MC |
DataScoresTest | Data scores two-sample test | PROC | SCORES=DATA [ *] |
KlotzAnalysis | Klotz one-way analysis | PROC | KLOTZ |
KlotzMC | Monte Carlo estimates for the Klotz exact test | EXACT | KLOTZ / MC |
KlotzScores | Klotz scores | PROC | KLOTZ |
KlotzTest | Klotz two-sample test | PROC | KLOTZ [ *] |
KolSmirExactTest | Kolmogorov-Smirnov exact test | EXACT | KS [ *] |
KolSmir2Stats | Kolmogorov-Smirnov two-sample statistics | PROC | EDF [ *] |
KolSmirStats | Kolmogorov-Smirnov statistics | PROC | EDF [ **] |
KolSmirTest | Kolmogorov-Smirnov test | PROC | EDF |
KruskalWallisMC | Monte Carlo estimates for the Kruskal-Wallis exact test | EXACT | WILCOXON / MC [ **] |
KruskalWallisTest | Kruskal-Wallis test | PROC | WILCOXON |
KSMC | Monte Carlo estimates for the Kolmogorov-Smirnov exact test | EXACT | KS / MC [ *] |
KuiperStats | Kuiper two-sample statistics | PROC | EDF [ *] |
KuiperTest | Kuiper test | PROC | EDF [ *] |
MedianAnalysis | Median one-way analysis | PROC | MEDIAN |
MedianMC | Monte Carlo estimates for the median exact test | EXACT | MEDIAN / MC |
MedianScores | Median scores | PROC | MEDIAN |
MedianTest | Median two-sample test | PROC | MEDIAN [ *] |
MoodAnalysis | Mood one-way analysis | PROC | MOOD |
MoodMC | Monte Carlo estimates for the Mood exact test | EXACT | MOOD / MC |
MoodScores | Mood scores | PROC | MOOD |
MoodTest | Mood two-sample test | PROC | MOOD [ *] |
SavageAnalysis | Savage one-way analysis | PROC | SAVAGE |
SavageMC | Monte Carlo estimates for the Savage exact test | EXACT | SAVAGE / MC |
SavageScores | Savage scores | PROC | SAVAGE |
SavageTest | Savage two-sample test | PROC | SAVAGE [ *] |
STAnalysis | Siegel-Tukey one-way analysis | PROC | ST |
STMC | Monte Carlo estimates for the Siegel-Tukey exact test | EXACT | ST / MC |
STScores | Siegel-Tukey scores | PROC | ST |
STTest | Siegel-Tukey two-sample test | PROC | ST [ *] |
VWAnalysis | Van der Waerden one-way analysis | PROC | VW |
VWMC | Monte Carlo estimates for the Van der Waerden exact test | EXACT | VW / MC |
VWScores | Van der Waerden scores | PROC | VW |
VWTest | Van der Waerden two-sample test | PROC | VW [ *] |
WilcoxonMC | Monte Carlo estimates for the Wilcoxon two-sample exact test | EXACT | WILCOXON / MC [ *] |
WilcoxonScores | Wilcoxon scores | PROC | WILCOXON |
WilcoxonTest | Wilcoxon two-sample test | PROC | WILCOXON [ *] |

[ *] PROC NPAR1WAY produces this table only for two-sample data.
[ **] PROC NPAR1WAY produces this table only for multisample data.