Getting Started


This example illustrates how you can use PROC NPAR1WAY to perform a one-way nonparametric analysis. The data, from Halverson and Sherwood (1932), consist of weight gain measurements for five different levels of gossypol additive. Gossypol is a substance contained in cottonseed shells, and these data were collected to study the effect of gossypol on animal nutrition.

The following DATA step statements create the SAS data set Gossypol:

  data Gossypol;
     input Dose n;          /* group header: dose level and group size */
     do i=1 to n;
        input Gain @@;      /* read one gain at a time, holding the line */
        output;             /* write one observation per gain */
     end;
     datalines;
  0 16
  228 229 218 216 224 208 235 229 233 219 224 220 232 200 208 232
  .04 11
  186 229 220 208 228 198 222 273 216 198 213
  .07 12
  179 193 183 180 143 204 114 188 178 134 208 196
  .10 17
  130 87 135 116 118 165 151 59 126 64 78 94 150 160 122 110 178
  .13 11
  154 130 130 118 118 104 112 134 98 100 104
  ;

The data set Gossypol contains the variable Dose, which represents the amount of gossypol additive, and the variable Gain, which represents the weight gain.

Researchers are interested in whether there is a difference in weight gain among the different dose levels of gossypol. The following statements invoke the NPAR1WAY procedure to perform a nonparametric analysis of this problem:

  proc npar1way data=Gossypol;
     class Dose;
     var Gain;
  run;

The variable Dose is the CLASS variable, and the VAR statement specifies Gain as the response variable. The CLASS statement is required, and it must name exactly one CLASS variable. You can name one or more analysis variables in the VAR statement. If you omit the VAR statement, PROC NPAR1WAY analyzes all numeric variables in the data set except for the CLASS variable, the FREQ variable, and the BY variables.
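Because the DATA step above also leaves the counters n and i in Gossypol, omitting the VAR statement would analyze them alongside Gain. The following is a minimal sketch of one way to rely on the default variable selection anyway, assuming you only want Gain analyzed: the KEEP= data set option restricts the input to Dose and Gain before the procedure reads it.

```sas
/* Sketch: omit VAR and let PROC NPAR1WAY select the numeric variables. */
/* KEEP= hides the DATA step counters n and i, so only Gain remains.    */
proc npar1way data=Gossypol(keep=Dose Gain);
   class Dose;
run;
```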

Since no analysis options are specified in the PROC NPAR1WAY statement, the ANOVA, WILCOXON, MEDIAN, VW, SAVAGE, and EDF options are invoked by default. The following tables show the results of these analyses.

The tables in Figure 52.1 are produced with the ANOVA option. For each level of the CLASS variable Dose, PROC NPAR1WAY displays the number of observations and the mean of the analysis variable Gain. PROC NPAR1WAY then displays a standard analysis of variance on the raw data, which gives the same results as the GLM and ANOVA procedures. The p-value for the F test is <.0001, which indicates that Dose accounts for a significant portion of the variability in the dependent variable Gain.

start figure
  The NPAR1WAY Procedure

  Analysis of Variance for Variable Gain
  Classified by Variable Dose

  Dose       N          Mean
  --------------------------
  0         16    222.187500
  0.04      11    217.363636
  0.07      12    175.000000
  0.1       17    120.176471
  0.13      11    118.363636

  Source    DF    Sum of Squares    Mean Square    F Value    Pr > F
  ------------------------------------------------------------------
  Among      4     140082.986077    35020.74652    55.8143    <.0001
  Within    62      38901.998997      627.45160

  Average scores were used for ties.
end figure

Figure 52.1: Analysis of Variance

The WILCOXON option produces the output in Figure 52.2. PROC NPAR1WAY first provides a summary of the Wilcoxon scores for the analysis variable Gain by class level. For each level of the CLASS variable Dose, PROC NPAR1WAY displays the following information: number of observations, sum of the Wilcoxon scores, expected sum under the null hypothesis of no difference among class levels, standard deviation under the null hypothesis, and mean score.

start figure
  Wilcoxon Scores (Rank Sums) for Variable Gain
  Classified by Variable Dose

                    Sum of      Expected       Std Dev          Mean
  Dose     N        Scores      Under H0      Under H0         Score
  ------------------------------------------------------------------
  0       16        890.50         544.0     67.978966     55.656250
  0.04    11        555.00         374.0     59.063588     50.454545
  0.07    12        395.50         408.0     61.136622     32.958333
  0.1     17        275.50         578.0     69.380741     16.205882
  0.13    11        161.50         374.0     59.063588     14.681818

  Average scores were used for ties.

  Kruskal-Wallis Test
  Chi-Square         52.6656
  DF                       4
  Pr > Chi-Square     <.0001
end figure

Figure 52.2: Wilcoxon Score Analysis

Next, PROC NPAR1WAY displays the one-way ANOVA statistic, which for Wilcoxon scores is known as the Kruskal-Wallis test. The statistic equals 52.6656, with four degrees of freedom, which is the number of class levels minus one. The p-value, or probability of a larger statistic under the null hypothesis, is <.0001. This leads to rejection of the null hypothesis that there is no difference in location for Gain among the levels of Dose. This p-value is asymptotic, computed from the asymptotic chi-square distribution of the test statistic. For certain data sets it may also be useful to compute the exact p-value, for example, for small data sets or for data sets that are sparse, skewed, or heavily tied. You can use the EXACT statement to request exact p-values for any of the location or scale tests available in PROC NPAR1WAY.
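As an illustration of the EXACT statement, the following sketch requests a Monte Carlo estimate of the exact p-value for the Kruskal-Wallis test on the Gossypol data. The MC, N=, and SEED= options are EXACT statement options. The number of samples and the seed are arbitrary values chosen for this sketch, and no output is shown because none was produced for this section.

```sas
/* Sketch: Monte Carlo estimate of the exact Kruskal-Wallis p-value. */
proc npar1way data=Gossypol wilcoxon;
   class Dose;
   var Gain;
   /* MC samples from the permutation distribution instead of */
   /* enumerating it; N= and SEED= values here are arbitrary. */
   exact wilcoxon / mc n=10000 seed=1932;
run;
```

Monte Carlo estimation is used here on the assumption that full enumeration would be expensive for 67 observations in five groups. For a small two-sample problem, omitting the MC option requests the exact computation itself.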

Figure 52.3 through Figure 52.5 display the analyses produced by the MEDIAN, VW, and SAVAGE options. For each score type, PROC NPAR1WAY provides a summary of scores and the one-way ANOVA statistic, as previously described for Wilcoxon scores. Other score types available in PROC NPAR1WAY are Siegel-Tukey, Ansari-Bradley, Klotz, and Mood, which are used to test for scale differences. Additionally, you can request the SCORES=DATA option, which uses the input data as scores. This option gives you the flexibility to construct any scores for your data with the DATA step and then analyze these scores with PROC NPAR1WAY.
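The SCORES=DATA workflow described above can be sketched as follows. The data set name GossypolScores and the log-transformed score LogGain are hypothetical choices made only for illustration. Any score you can compute in a DATA step could take their place.

```sas
/* Sketch: build user-defined scores in a DATA step. */
data GossypolScores;            /* hypothetical data set name */
   set Gossypol;
   LogGain = log(Gain);         /* hypothetical score; all Gain values are positive */
run;

/* SCORES=DATA analyzes the values of LogGain as the scores themselves. */
proc npar1way data=GossypolScores scores=data;
   class Dose;
   var LogGain;
run;
```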

start figure
  Median Scores (Number of Points Above Median) for Variable Gain
  Classified by Variable Dose

                    Sum of      Expected       Std Dev          Mean
  Dose     N        Scores      Under H0      Under H0         Score
  ------------------------------------------------------------------
  0       16          16.0      7.880597      1.757902          1.00
  0.04    11          11.0      5.417910      1.527355          1.00
  0.07    12           6.0      5.910448      1.580963          0.50
  0.1     17           0.0      8.373134      1.794152          0.00
  0.13    11           0.0      5.417910      1.527355          0.00

  Average scores were used for ties.

  Median One-Way Analysis
  Chi-Square         54.1765
  DF                       4
  Pr > Chi-Square     <.0001
end figure

Figure 52.3: Median Score Analysis
start figure
  Van der Waerden Scores (Normal) for Variable Gain
  Classified by Variable Dose

                    Sum of      Expected       Std Dev          Mean
  Dose     N        Scores      Under H0      Under H0         Score
  ------------------------------------------------------------------
  0       16     16.116474           0.0      3.325957      1.007280
  0.04    11      8.340899           0.0      2.889761      0.758264
  0.07    12     -0.576674           0.0      2.991186     -0.048056
  0.1     17    -14.688921           0.0      3.394540     -0.864054
  0.13    11     -9.191777           0.0      2.889761     -0.835616

  Average scores were used for ties.

  Van der Waerden One-Way Analysis
  Chi-Square         47.2972
  DF                       4
  Pr > Chi-Square     <.0001
end figure

Figure 52.4: Van der Waerden Score Analysis
start figure
  Savage Scores (Exponential) for Variable Gain
  Classified by Variable Dose

                    Sum of      Expected       Std Dev          Mean
  Dose     N        Scores      Under H0      Under H0         Score
  ------------------------------------------------------------------
  0       16     16.074391           0.0      3.385275      1.004649
  0.04    11      7.693099           0.0      2.941300      0.699373
  0.07    12     -3.584958           0.0      3.044534     -0.298746
  0.1     17    -11.979488           0.0      3.455082     -0.704676
  0.13    11     -8.203044           0.0      2.941300     -0.745731

  Average scores were used for ties.

  Savage One-Way Analysis
  Chi-Square         39.4908
  DF                       4
  Pr > Chi-Square     <.0001
end figure

Figure 52.5: Savage Score Analysis
start figure
  Kolmogorov-Smirnov Test for Variable Gain
  Classified by Variable Dose

                    EDF at    Deviation from Mean
  Dose     N       Maximum             at Maximum
  -----------------------------------------------
  0       16      0.000000              -1.910448
  0.04    11      0.000000              -1.584060
  0.07    12      0.333333              -0.499796
  0.1     17      1.000000               2.153861
  0.13    11      1.000000               1.732565
  Total   67      0.477612

  Maximum Deviation Occurred at Observation 36
  Value of Gain at Maximum = 178.0

  Kolmogorov-Smirnov Statistics (Asymptotic)
  KS   0.457928    KSa   3.748300

  Cramer-von Mises Test for Variable Gain
  Classified by Variable Dose

              Summed Deviation
  Dose     N         from Mean
  ------------------------------
  0       16          2.165210
  0.04    11          0.918280
  0.07    12          0.348227
  0.1     17          1.497542
  0.13    11          1.335745

  Cramer-von Mises Statistics (Asymptotic)
  CM   0.093508    CMa   6.265003
end figure

Figure 52.6: Empirical Distribution Function Analysis

The tables in Figure 52.6 display the empirical distribution function statistics, comparing the distribution of Gain for the different levels of Dose. These tables are produced by the EDF option, and they include Kolmogorov-Smirnov statistics and Cramer-von Mises statistics.

In the preceding example, the CLASS variable Dose has five levels, and the analyses examine possible differences among these five levels, or samples. The following statements invoke the NPAR1WAY procedure to perform a nonparametric analysis of the two lowest levels of Dose:

  proc npar1way data=Gossypol;
     where Dose <= .04;
     class Dose;
     var Gain;
  run;

The following tables show the results of this two-sample analysis. The tables in Figure 52.7 are produced by the ANOVA option.

start figure
  The NPAR1WAY Procedure

  Analysis of Variance for Variable Gain
  Classified by Variable Dose

  Dose       N          Mean
  --------------------------
  0         16    222.187500
  0.04      11    217.363636

  Source    DF    Sum of Squares    Mean Square    F Value    Pr > F
  ------------------------------------------------------------------
  Among      1        151.683712     151.683712     0.5587    0.4617
  Within    25       6786.982955     271.479318

  Average scores were used for ties.
end figure

Figure 52.7: Analysis of Variance for Two-Sample Data

Figure 52.8 displays the output produced by the WILCOXON option. PROC NPAR1WAY provides a summary of the Wilcoxon scores for the analysis variable Gain for each of the two class levels. Since there are only two levels, PROC NPAR1WAY displays the two-sample test, based on the simple linear rank statistic with Wilcoxon scores. The normal approximation includes a continuity correction. To remove it, you can specify the CORRECT=NO option. PROC NPAR1WAY also gives a t approximation for the Wilcoxon two-sample test. As in the multisample analysis, PROC NPAR1WAY computes a one-way ANOVA statistic, which for Wilcoxon scores is known as the Kruskal-Wallis test. None of these p-values indicates a difference in Gain between the two Dose levels at the .05 level of significance.

start figure
  Wilcoxon Scores (Rank Sums) for Variable Gain
  Classified by Variable Dose

                    Sum of      Expected       Std Dev          Mean
  Dose     N        Scores      Under H0      Under H0         Score
  ------------------------------------------------------------------
  0       16        253.50         224.0     20.221565     15.843750
  0.04    11        124.50         154.0     20.221565     11.318182

  Average scores were used for ties.

  Wilcoxon Two-Sample Test
  Statistic             124.5000

  Normal Approximation
  Z                      -1.4341
  One-Sided Pr <  Z       0.0758
  Two-Sided Pr > |Z|      0.1515

  t Approximation
  One-Sided Pr <  Z       0.0817
  Two-Sided Pr > |Z|      0.1635

  Z includes a continuity correction of 0.5.

  Kruskal-Wallis Test
  Chi-Square              2.1282
  DF                           1
  Pr > Chi-Square         0.1446
end figure

Figure 52.8: Wilcoxon Two-Sample Analysis
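The effect of the continuity correction can be checked by hand from the values in Figure 52.8. The following sketch recomputes the normal-approximation Z with and without the 0.5 correction, and then requests the uncorrected test with the CORRECT=NO option. The variable names S, E, SD, Zc, and Zu are introduced only for this illustration.

```sas
/* Recompute Z from the Figure 52.8 values for the 11-observation group. */
data _null_;
   S  = 124.5;        /* sum of Wilcoxon scores for Dose=0.04 */
   E  = 154.0;        /* expected sum under H0 */
   SD = 20.221565;    /* standard deviation under H0 */
   /* S is below E, so the 0.5 correction moves it toward E. */
   Zc = (S - E + 0.5) / SD;    /* with continuity correction */
   Zu = (S - E) / SD;          /* without continuity correction */
   put Zc= 8.4 Zu= 8.4;
run;

/* Request the two-sample test without the continuity correction. */
proc npar1way data=Gossypol wilcoxon correct=no;
   where Dose <= .04;
   class Dose;
   var Gain;
run;
```

The corrected value Zc should agree in magnitude with the 1.4341 reported in Figure 52.8. The uncorrected value is slightly larger in magnitude.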

Figure 52.9 through Figure 52.11 display the two-sample analyses produced by the MEDIAN, VW, and SAVAGE options.

start figure
  Median Scores (Number of Points Above Median) for Variable Gain
  Classified by Variable Dose

                    Sum of      Expected       Std Dev          Mean
  Dose     N        Scores      Under H0      Under H0         Score
  ------------------------------------------------------------------
  0       16           9.0      7.703704      1.299995      0.562500
  0.04    11           4.0      5.296296      1.299995      0.363636

  Average scores were used for ties.

  Median Two-Sample Test
  Statistic              4.0000
  Z                     -0.9972
  One-Sided Pr <  Z      0.1593
  Two-Sided Pr > |Z|     0.3187

  Median One-Way Analysis
  Chi-Square             0.9943
  DF                          1
  Pr > Chi-Square        0.3187
end figure

Figure 52.9: Median Two-Sample Analysis
start figure
  Van der Waerden Scores (Normal) for Variable Gain
  Classified by Variable Dose

                    Sum of      Expected       Std Dev          Mean
  Dose     N        Scores      Under H0      Under H0         Score
  ------------------------------------------------------------------
  0       16      3.346520           0.0      2.320336      0.209157
  0.04    11     -3.346520           0.0      2.320336     -0.304229

  Average scores were used for ties.

  Van der Waerden Two-Sample Test
  Statistic             -3.3465
  Z                     -1.4423
  One-Sided Pr <  Z      0.0746
  Two-Sided Pr > |Z|     0.1492

  Van der Waerden One-Way Analysis
  Chi-Square             2.0801
  DF                          1
  Pr > Chi-Square        0.1492
end figure

Figure 52.10: Van der Waerden Two-Sample Analysis
start figure
  Savage Scores (Exponential) for Variable Gain
  Classified by Variable Dose

                    Sum of      Expected       Std Dev          Mean
  Dose     N        Scores      Under H0      Under H0         Score
  ------------------------------------------------------------------
  0       16      1.834554           0.0      2.401839      0.114660
  0.04    11     -1.834554           0.0      2.401839     -0.166778

  Average scores were used for ties.

  Savage Two-Sample Test
  Statistic             -1.8346
  Z                     -0.7638
  One-Sided Pr <  Z      0.2225
  Two-Sided Pr > |Z|     0.4450

  Savage One-Way Analysis
  Chi-Square             0.5834
  DF                          1
  Pr > Chi-Square        0.4450
end figure

Figure 52.11: Savage Two-Sample Analysis

The tables in Figure 52.12 display the empirical distribution function statistics, comparing the distribution of Gain for the two levels of Dose. The p-value for the Kolmogorov-Smirnov two-sample test is 0.6199, which indicates no rejection of the null hypothesis that the Gain distributions are identical for the two levels of Dose.

start figure
  Kolmogorov-Smirnov Test for Variable Gain
  Classified by Variable Dose

                    EDF at    Deviation from Mean
  Dose     N       Maximum             at Maximum
  -----------------------------------------------
  0       16      0.250000              -0.481481
  0.04    11      0.545455               0.580689
  Total   27      0.370370

  Maximum Deviation Occurred at Observation 4
  Value of Gain at Maximum = 216.0

  Kolmogorov-Smirnov Two-Sample Test (Asymptotic)
  KS    0.145172    D          0.295455
  KSa   0.754337    Pr > KSa   0.6199

  Cramer-von Mises Test for Variable Gain
  Classified by Variable Dose

              Summed Deviation
  Dose     N         from Mean
  ------------------------------
  0       16          0.098638
  0.04    11          0.143474

  Cramer-von Mises Statistics (Asymptotic)
  CM   0.008967    CMa   0.242112

  Kuiper Test for Variable Gain
  Classified by Variable Dose

                   Deviation
  Dose     N       from Mean
  --------------------------
  0       16        0.090909
  0.04    11        0.295455

  Kuiper Two-Sample Test (Asymptotic)
  K   0.386364    Ka   0.986440    Pr > Ka   0.8383
end figure

Figure 52.12: Two-Sample EDF Tests
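The Kolmogorov-Smirnov values printed in Figure 52.12 are consistent with the relation KSa = KS * sqrt(n), where n is the total number of observations. The following sketch checks that arithmetic and then requests only the EDF tests for the two lowest dose levels. The DATA step variable names are introduced only for this illustration.

```sas
/* Check KSa = KS * sqrt(n) against the values in Figure 52.12. */
data _null_;
   KS  = 0.145172;    /* KS statistic from the figure */
   n   = 27;          /* total observations in the two samples */
   KSa = KS * sqrt(n);
   put KSa= 10.6;
run;

/* Request only the EDF analysis for the two lowest dose levels. */
proc npar1way data=Gossypol edf;
   where Dose <= .04;
   class Dose;
   var Gain;
run;
```

The computed KSa should come out at approximately 0.7543, matching the figure up to rounding of the printed KS value.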



SAS/STAT 9.1 User's Guide (Vol. 5)
Year: 2004
Pages: 98