Examples | SAS/STAT 9.1 User's Guide (Vol. 5)

Example 53.1. Precise Analysis of Variance

The data for the following example are from Powell et al. (1982). To calibrate an instrument for measuring atomic weight, 24 replicate measurements of the atomic weight of silver (chemical symbol Ag) are made with each of two instruments, the new instrument and a reference instrument, for 48 measurements in all.

Note: The results from this example vary from machine to machine depending on floating-point configuration.

The following statements read the measurements for the two instruments into the SAS data set AgWeight.

   title 'Atomic Weight of Silver by Two Different Instruments';
   data AgWeight;
      input Instrument AgWeight @@;
      datalines;
   1 107.8681568   1 107.8681465   1 107.8681572   1 107.8681785
   1 107.8681446   1 107.8681903   1 107.8681526   1 107.8681494
   1 107.8681616   1 107.8681587   1 107.8681519   1 107.8681486
   1 107.8681419   1 107.8681569   1 107.8681508   1 107.8681672
   1 107.8681385   1 107.8681518   1 107.8681662   1 107.8681424
   1 107.8681360   1 107.8681333   1 107.8681610   1 107.8681477
   2 107.8681079   2 107.8681344   2 107.8681513   2 107.8681197
   2 107.8681604   2 107.8681385   2 107.8681642   2 107.8681365
   2 107.8681151   2 107.8681082   2 107.8681517   2 107.8681448
   2 107.8681198   2 107.8681482   2 107.8681334   2 107.8681609
   2 107.8681101   2 107.8681512   2 107.8681469   2 107.8681360
   2 107.8681254   2 107.8681261   2 107.8681450   2 107.8681368
   ;

Notice that the variation in the atomic weight measurements is about seven orders of magnitude smaller than their mean: all 48 values agree in their first seven significant digits (107.8681). Such data can be difficult for standard, regression-based analysis-of-variance procedures to handle correctly.
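One way to see the difficulty is numerical cancellation. The raw sum of squares of the 48 values is about $48 \times 107.868^2 \approx 5.6 \times 10^{5}$, while the corrected total sum of squares is only about $1.41 \times 10^{-8}$, so a formula that subtracts two raw quantities of that size cancels roughly 13 to 14 of the roughly 16 significant digits that double precision carries. The following SAS sketch contrasts that one-pass formula with a two-pass computation about the mean. The data set name Stats and the variable names OnePass and TwoPass are illustrative choices, not part of the original example, and the sketch is not a description of how any SAS procedure computes its sums of squares internally.

```sas
/* Pass 1: N, mean, and uncorrected sum of squares (USS) of AgWeight */
proc means data=AgWeight noprint;
   var AgWeight;
   output out=Stats n=N mean=Mean uss=USS;
run;

/* Pass 2: accumulate squared deviations about the pass-1 mean,  */
/* then compare with the one-pass formula USS - N*Mean**2        */
data _null_;
   if _n_ = 1 then set Stats;        /* N, Mean, USS are retained          */
   set AgWeight end=last;
   TwoPass + (AgWeight - Mean)**2;   /* sum statement: retained, starts at 0 */
   if last then do;
      OnePass = USS - N*Mean**2;     /* subtracts two values near 5.6E5     */
      put "Two-pass corrected SS: " TwoPass e20.;
      put "One-pass corrected SS: " OnePass e20.;
   end;
run;
```

Both values can be compared with the Corrected Total sum of squares in Output 53.1.1 (1.4133515E-8). The two-pass value would be expected to agree closely, while the one-pass value may be accurate to only a couple of digits; the exact discrepancy depends on the machine's floating-point behavior.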

The following statements invoke the ORTHOREG procedure to perform a simple one-way analysis of variance, testing for differences between the two instruments.

   proc orthoreg data=AgWeight;
      class Instrument;
      model AgWeight = Instrument;
   run;

Output 53.1.1 shows the resulting analysis.

Output 53.1.1: PROC ORTHOREG Results for Atomic Weight Example

                 Atomic Weight of Silver by Two Different Instruments

                                The ORTHOREG Procedure

                               Class Level Information

                          Factor        Levels    -Values-

                          Instrument         2    1 2


                 Atomic Weight of Silver by Two Different Instruments

                                The ORTHOREG Procedure

                             Dependent Variable: AgWeight

                                         Sum of
   Source                 DF         Squares     Mean Square    F Value    Pr > F

   Model                   1    3.6383419E-9    3.6383419E-9      15.95    0.0002
   Error                  46    1.0495173E-8    2.281559E-10
   Corrected Total        47    1.4133515E-8


                      Root MSE                0.0000151048
                      R-Square                0.2574265445


                                                         Standard
   Parameter          DF   Parameter Estimate          Error   t Value   Pr > |t|

   Intercept           1     107.868136354166   3.0832608E-6   3.499E7     <.0001
   (Instrument='1')    1     0.00001741249999   4.3603893E-6      3.99     0.0002
   (Instrument='2')    0                    0              .       .        .

The mean difference between instruments is about $1.74 \times 10^{-5}$ (the value of the (Instrument='1') parameter in the parameter estimates table), whereas the level of background variation in the measurements is about $1.51 \times 10^{-5}$ (the value of the root mean squared error). The difference is significant, with a p-value of 0.0002.
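The quantities in Output 53.1.1 are linked by the usual one-way ANOVA identities, so they can be checked by hand. The worked arithmetic below uses the rounded printed values, so the last digit may differ slightly; here $n_1 = n_2 = 24$ is the number of measurements per instrument and $d$ denotes the (Instrument='1') estimate, the difference between the two instrument means.

$$\text{MSE} = \frac{\text{SS}_{\text{Error}}}{46} = \frac{1.0495173\times10^{-8}}{46} \approx 2.281559\times10^{-10}, \qquad \text{Root MSE} = \sqrt{\text{MSE}} \approx 1.51048\times10^{-5}$$

$$F = \frac{\text{SS}_{\text{Model}}/1}{\text{MSE}} = \frac{3.6383419\times10^{-9}}{2.281559\times10^{-10}} \approx 15.95, \qquad R^2 = \frac{\text{SS}_{\text{Model}}}{\text{SS}_{\text{Total}}} = \frac{3.6383419\times10^{-9}}{1.4133515\times10^{-8}} \approx 0.2574$$

$$\text{SE}(d) = \sqrt{\text{MSE}\left(\frac{1}{24} + \frac{1}{24}\right)} \approx 4.36039\times10^{-6}, \qquad t = \frac{d}{\text{SE}(d)} = \frac{1.74125\times10^{-5}}{4.3603893\times10^{-6}} \approx 3.99, \qquad t^2 \approx F$$

$$\text{SS}_{\text{Model}} = \frac{n_1 n_2}{n_1 + n_2}\, d^2 = 12\,(1.74125\times10^{-5})^2 = 3.638341875\times10^{-9}$$

With $d$ rounded to $1.74125\times10^{-5}$, the last identity reproduces the NIST-certified model sum of squares, 3.6383418750000E-09, to all printed digits.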

The National Institute of Standards and Technology (1998) has provided certified ANOVA values for this data set. The following statements use ODS to capture the ANOVA values produced by both the ORTHOREG and GLM procedures and write them to the log with more digits than the printed output shows, for comparison with the NIST-certified values:

   /* Suppress listing output and save each procedure's ANOVA and
      fit-statistics tables as data sets */
   ods listing close;
   ods output ANOVA         = OrthoregANOVA
              FitStatistics = OrthoregFitStat;
   proc orthoreg data=AgWeight;
      class Instrument;
      model AgWeight = Instrument;
   run;

   ods output OverallANOVA  = GLMANOVA
              FitStatistics = GLMFitStat;
   proc glm data=AgWeight;
      class Instrument;
      model AgWeight = Instrument;
   run;
   quit;
   ods listing;

   /* Write the ORTHOREG values to the log in high-precision formats */
   data _null_; set OrthoregANOVA  (in=inANOVA)
                    OrthoregFitStat(in=inFitStat);
      if (inANOVA) then do;
         if (Source = 'Model') then put "Model SS: " ss e20.;
         if (Source = 'Error') then put "Error SS: " ss e20.;
      end;
      if (inFitStat) then do;
         if (Statistic = 'Root MSE') then
            put "Root MSE: " nValue1 e20.;
         if (Statistic = 'R-Square') then
            put "R-Square: " nValue1 best20.;
      end;
   run;

   /* Write the GLM values to the log in the same formats */
   data _null_; set GLMANOVA  (in=inANOVA)
                    GLMFitStat(in=inFitStat);
      if (inANOVA) then do;
         if (Source = 'Model') then put "Model SS: " ss e20.;
         if (Source = 'Error') then put "Error SS: " ss e20.;
      end;
      if (inFitStat) then
         put "Root MSE: " RootMSE e20.;
      if (inFitStat) then
         put "R-Square: " RSquare best20.;
   run;

In releases of SAS/STAT software prior to Version 8, PROC GLM gave much less accurate results than PROC ORTHOREG, as shown in the following tables, which compare the ANOVA values certified by NIST with those produced by the two procedures.

	Model SS	Error SS
NIST-certified	3.6383418750000E-09	1.0495172916667E-08
ORTHOREG	3.6383418747907E-09	1.0495172916797E-08
GLM, Version 8	3.6383418747907E-09	1.0495172916797E-08
GLM, Previous releases	0	1.0331496763990E-08

	Root MSE	R-Square
NIST-certified	1.5104831444641E-05	0.25742654453832
ORTHOREG	1.5104831444735E-05	0.25742654452494
GLM, Version 8	1.5104831444735E-05	0.25742654452494
GLM, Previous releases	1.4986585859992E-05	0

While the ORTHOREG values and the GLM values for Version 8 are quite close to the certified ones, the GLM values for prior releases are not. In fact, because the model sum of squares is so small, prior releases of the GLM procedure set it (and consequently R²) to zero.
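The size of the gap can be quantified with the log relative error, $\text{LRE} = -\log_{10}\left(|x - c|/|c|\right)$, which roughly counts how many leading significant digits of a computed value $x$ agree with the certified value $c$. This measure is not part of the original comparison; it is a common summary in accuracy studies against the NIST reference data and is used here only for illustration. The common factors $10^{-9}$ and $10^{-8}$ cancel in each ratio.

$$\text{Model SS, ORTHOREG:}\quad -\log_{10}\frac{|3.6383418747907 - 3.6383418750000|}{3.6383418750000} = -\log_{10}\left(5.75\times10^{-11}\right) \approx 10.2$$

$$\text{Error SS, ORTHOREG:}\quad -\log_{10}\frac{|1.0495172916797 - 1.0495172916667|}{1.0495172916667} = -\log_{10}\left(1.24\times10^{-11}\right) \approx 10.9$$

$$\text{Error SS, GLM prior releases:}\quad -\log_{10}\frac{|1.0331496763990 - 1.0495172916667|}{1.0495172916667} = -\log_{10}\left(1.56\times10^{-2}\right) \approx 1.8$$

A model sum of squares reported as zero has relative error 1 and therefore an LRE of 0.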

Example 53.2. Wampler Data

This example applies the ORTHOREG procedure to a collection of data sets noted for being ill conditioned. The OUTEST= data set is used to collect the results for comparison with values certified to be correct by the National Institute of Standards and Technology (1998).

Note: The results from this example vary from machine to machine depending on floating-point configuration.

The data are from Wampler (1970). The independent variates for all five data sets are $x^i$, $i = 1, \ldots, 5$, for $x = 0, 1, \ldots, 20$. Two of the five dependent variables are exact linear functions of the independent terms:

$$y_1 = 1 + x + x^2 + x^3 + x^4 + x^5$$

$$y_2 = 1 + 0.1x + 0.01x^2 + 0.001x^3 + 0.0001x^4 + 0.00001x^5$$

The other three dependent variables have the same mean value as $y_1$, but with nonzero errors:

$$y_3 = y_1 + e, \qquad y_4 = y_1 + 100e, \qquad y_5 = y_1 + 10000e$$

where $e$ is a vector of values with standard deviation 2044, chosen to be orthogonal to the mean model for $y_1$.
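The stated standard deviation, and the certified RMSE values used later in this example, can be checked by hand. The following worked arithmetic assumes that the 21 values of $e$, for $x = 0, \ldots, 20$, are the signed NIST Wampler errors 759, -2048, 2048, -2048, 2523, -2048, 2048, -2048, 1838, -2048, 2048, -2048, 1838, -2048, 2048, -2048, 2523, -2048, 2048, -2048, 759 (three symmetric pairs of values other than 2048, plus five entries of +2048 and ten of -2048).

$$\sum_{x=0}^{20} e_x = 2(759 + 2523 + 1838) + 5(2048) - 10(2048) = 10240 - 10240 = 0$$

$$\sum_{x=0}^{20} e_x^2 = 2\left(759^2 + 2523^2 + 1838^2\right) + 15 \cdot 2048^2 = 20{,}639{,}708 + 62{,}914{,}560 = 83{,}554{,}268$$

$$s_e = \sqrt{\frac{83{,}554{,}268}{20}} \approx 2043.9$$

Because $e$ is orthogonal to $1, x, \ldots, x^5$, the least-squares residuals for $y_3$ are $e$ itself, and with $21 - 6 = 15$ error degrees of freedom

$$\text{RMSE}(y_3) = \sqrt{\frac{83{,}554{,}268}{15}} \approx 2360.145$$

Scaling $e$ by 100 and 10000 scales the RMSE by the same factors, giving about 236014.5 for $y_4$ and 23601450 for $y_5$, in line with the certified values 2360.14502379268, 236014.502379268, and 23601450.2379268.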

The following statements create a SAS data set Wampler containing the Wampler data, run a SAS macro program using PROC ORTHOREG to fit a fifth-order polynomial in x to each of the Wampler dependent variables, and collect the results in a data set named ParmEst.

   data Wampler;
      do x=0 to 20;
         input e @@;
         /* Two exact polynomial responses */
         y1 = 1 +       x    +        x**2 +      x**3
                +       x**4 +        x**5;
         y2 = 1 + .1   *x    + .01   *x**2 + .001*x**3
                + .0001*x**4 + .00001*x**5;
         /* Three noisy responses with increasingly large errors */
         y3 = y1 +       e;
         y4 = y1 +   100*e;
         y5 = y1 + 10000*e;
         output;
      end;
      datalines;
   759 -2048 2048 -2048 2523 -2048 2048 -2048 1838 -2048 2048
   -2048 1838 -2048 2048 -2048 2523 -2048 2048 -2048 759
   ;

   %macro WTest;
      /* Start with an empty results data set */
      data ParmEst; if (0); run;
      %do i = 1 %to 5;
         /* Fit the fifth-order polynomial and save the estimates */
         proc orthoreg data=Wampler outest=ParmEst&i noprint;
            model y&i = x x*x x*x*x x*x*x*x x*x*x*x*x;
         /* Tag the estimates with the response name and append them */
         data ParmEst&i; set ParmEst&i; Dep = "y&i";
         data ParmEst; set ParmEst ParmEst&i;
            label Col1='x'    Col2='x**2' Col3='x**3'
                  Col4='x**4' Col5='x**5';
         run;
      %end;
   %mend;
   %WTest;
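The claim that e is orthogonal to the mean model can also be checked directly: the cross-products of e with $1, x, \ldots, x^5$ should all sum to zero. The following is a minimal DATA step sketch, assuming the Wampler data set created above; the temporary array S and the variables p and Total are hypothetical names introduced only for this check.

```sas
/* Sum e*x**k over the 21 observations for k = 0, ..., 5 */
data _null_;
   set Wampler end=last;
   array S{0:5} _temporary_ (6*0);   /* temporary arrays are retained */
   p = 1;                            /* p holds x**k; avoids 0**0     */
   do k = 0 to 5;
      S{k} = S{k} + e*p;
      p = p*x;
   end;
   if last then do k = 0 to 5;
      Total = S{k};
      put "Sum of e*x**" k 1. ": " Total best16.;
   end;
run;
```

Every product here is an integer small enough to be represented exactly in double precision, so each sum should come out as exactly 0; a nonzero sum would point to a problem in the data lines rather than to rounding.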

Instead of displaying the raw values of the RMSE and parameter estimates, use a further DATA step to compute the deviations from the values certified to be correct by the National Institute of Standards and Technology (1998).

   data ParmEst; set ParmEst;
      /* Subtract the certified RMSE for each response */
      if      (Dep = 'y1') then
         _RMSE_ = _RMSE_ - 0.00000000000000;
      else if (Dep = 'y2') then
         _RMSE_ = _RMSE_ - 0.00000000000000;
      else if (Dep = 'y3') then
         _RMSE_ = _RMSE_ - 2360.14502379268;
      else if (Dep = 'y4') then
         _RMSE_ = _RMSE_ - 236014.502379268;
      else if (Dep = 'y5') then
         _RMSE_ = _RMSE_ - 23601450.2379268;
      /* Subtract the certified parameter values */
      if (Dep ^= 'y2') then do;
         Intercept = Intercept - 1.00000000000000;
         Col1      = Col1      - 1.00000000000000;
         Col2      = Col2      - 1.00000000000000;
         Col3      = Col3      - 1.00000000000000;
         Col4      = Col4      - 1.00000000000000;
         Col5      = Col5      - 1.00000000000000;
      end;
      else do;
         Intercept = Intercept - 1.00000000000000;
         Col1      = Col1      - 0.100000000000000;
         Col2      = Col2      - 0.100000000000000e-1;
         Col3      = Col3      - 0.100000000000000e-2;
         Col4      = Col4      - 0.100000000000000e-3;
         Col5      = Col5      - 0.100000000000000e-4;
      end;
   run;
   proc print data=ParmEst label noobs;
      title 'Wampler data: Deviations from Certified Values';
      format _RMSE_ Intercept Col1-Col5 e9.;
      var Dep _RMSE_ Intercept Col1-Col5;
   run;

The results, shown in Output 53.2.1, indicate that the values computed by PROC ORTHOREG are quite close to the NIST-certified values.

Output 53.2.1: Wampler data: Deviations from Certified Values

                   Wampler data: Deviations from Certified Values

   Dep     _RMSE_  Intercept          x       x**2       x**3       x**4       x**5

   y1    0.00E+00   1.49E-10   9.08E-12   5.99E-12   1.26E-12   9.68E-14   2.00E-15
   y2    0.00E+00   6.33E-15   5.55E-16   1.37E-16   1.13E-17   5.56E-19   1.52E-20
   y3    1.09E-11   3.02E-10   1.70E-10   4.88E-11   5.75E-12   3.18E-13   6.88E-15
   y4    3.20E-10   2.74E-09   5.60E-09   2.12E-09   2.89E-10   1.63E-11   3.24E-13
   y5    2.98E-08   2.46E-07   5.54E-07   2.12E-07   2.90E-08   1.64E-09   3.27E-11