Example 64.2. Regression Parameter Estimates


In this example, PROC REG computes regression parameter estimates for the Fitness data. (See Example 64.1 to create the Fitness data set.) The parameter estimates are output to a data set and used as scoring coefficients. For the first part of this example, PROC SCORE is used to score the Fitness data, which are the same data used in the regression.

In the second part of this example, PROC SCORE is used to score a new data set, Fitness2. For PROC SCORE, the TYPE= specification is PARMS, and the names of the score variables are found in the variable _MODEL_, which gets its values from the model label. The code listed after the output displays below produces Output 64.2.1 through Output 64.2.3.

Output 64.2.1: Creating an OUTEST= Data Set with PROC REG
start example
  REGRESSION SCORING EXAMPLE

  The REG Procedure
  Model: oxyhat
  Dependent Variable: Oxygen

  Analysis of Variance

                                 Sum of           Mean
  Source              DF        Squares         Square    F Value    Pr > F
  Model                5      509.62201      101.92440      15.80    0.0021
  Error                6       38.70060        6.45010
  Corrected Total     11      548.32261

  Root MSE              2.53970    R-Square     0.9294
  Dependent Mean       48.38942    Adj R-Sq     0.8706
  Coeff Var             5.24847

  Parameter Estimates

                       Parameter       Standard
  Variable     DF       Estimate          Error    t Value    Pr > |t|
  Intercept     1      151.91550       31.04738       4.89      0.0027
  Age           1       -0.63045        0.42503      -1.48      0.1885
  Weight        1       -0.10586        0.11869      -0.89      0.4068
  RunTime       1       -1.75698        0.93844      -1.87      0.1103
  RunPulse      1       -0.22891        0.12169      -1.88      0.1090
  RestPulse     1       -0.17910        0.13005      -1.38      0.2176
end example
 
Output 64.2.2: OUTEST= Data Set from PROC REG Reproduced with PROC PRINT
start example
  REGRESSION SCORING EXAMPLE
  OUTEST= Data Set from PROC REG

  Obs    _MODEL_    _TYPE_    _DEPVAR_     _RMSE_    Intercept         Age
    1    oxyhat     PARMS     Oxygen      2.53970      151.916    -0.63045

                                                 Rest
  Obs      Weight     RunTime    RunPulse       Pulse    Oxygen
    1    -0.10586    -1.75698    -0.22891    -0.17910        -1
end example
 
Output 64.2.3: Predicted and Residual Scores from the OUT= Data Set Created by PROC SCORE and Reproduced Using PROC PRINT
start example
  REGRESSION SCORING EXAMPLE
  Predicted Scores for Regression

                                      Run     Rest      Run
  Obs    Age    Weight    Oxygen     Time    Pulse    Pulse      oxyhat
    1     44     89.47    44.609    11.37       62      178     42.8771
    2     40     75.07    45.313    10.07       62      185     47.6050
    3     44     85.84    54.297     8.65       45      156     56.1211
    4     42     68.15    59.571     8.17       40      166     58.7044
    5     38     89.02    49.874     9.22       55      178     51.7386
    6     47     77.45    44.811    11.63       58      176     42.9756
    7     40     75.98    45.681    11.95       70      176     44.8329
    8     43     81.19    49.091    10.85       64      162     48.6020
    9     44     81.42    39.442    13.08       63      174     41.4613
   10     38     81.87    60.055     8.63       48      170     56.6171
   11     44     73.03    50.541    10.13       45      168     52.1299
   12     45     87.66    37.388    14.03       56      186     37.0080

  REGRESSION SCORING EXAMPLE
  Negative Residual Scores for Regression

                                      Run     Rest      Run
  Obs    Age    Weight    Oxygen     Time    Pulse    Pulse      oxyhat
    1     44     89.47    44.609    11.37       62      178    -1.73195
    2     40     75.07    45.313    10.07       62      185     2.29197
    3     44     85.84    54.297     8.65       45      156     1.82407
    4     42     68.15    59.571     8.17       40      166    -0.86657
    5     38     89.02    49.874     9.22       55      178     1.86460
    6     47     77.45    44.811    11.63       58      176    -1.83542
    7     40     75.98    45.681    11.95       70      176    -0.84811
    8     43     81.19    49.091    10.85       64      162    -0.48897
    9     44     81.42    39.442    13.08       63      174     2.01935
   10     38     81.87    60.055     8.63       48      170    -3.43787
   11     44     73.03    50.541    10.13       45      168     1.58892
   12     45     87.66    37.388    14.03       56      186    -0.38002
end example
 
  /* Fit the regression and save the parameter estimates in RegOut */
  proc reg data=Fitness outest=RegOut;
     OxyHat: model Oxygen=Age Weight RunTime RunPulse RestPulse;
     title 'REGRESSION SCORING EXAMPLE';
  run;

  proc print data=RegOut;
     title2 'OUTEST= Data Set from PROC REG';
  run;

  /* Independent variables only: OxyHat holds predicted values */
  proc score data=Fitness score=RegOut out=RScoreP type=parms;
     var Age Weight RunTime RunPulse RestPulse;
  run;

  proc print data=RScoreP;
     title2 'Predicted Scores for Regression';
  run;

  /* Dependent variable included: OxyHat holds negative residuals */
  proc score data=Fitness score=RegOut out=RScoreR type=parms;
     var Oxygen Age Weight RunTime RunPulse RestPulse;
  run;

  proc print data=RScoreR;
     title2 'Negative Residual Scores for Regression';
  run;

Output 64.2.1 shows the PROC REG output. The Parameter Estimates table lists the estimates in the column labeled 'Parameter Estimate'. These estimates are output to the RegOut data set.

Output 64.2.2 lists the RegOut data set. Notice that _TYPE_='PARMS' and _MODEL_='OXYHAT'. The _MODEL_ value comes from the label in the MODEL statement in PROC REG.

Output 64.2.3 lists the data sets created by PROC SCORE. Since the SCORE= data set does not contain observations with _TYPE_='MEAN' or _TYPE_='STD', the data in the Fitness data set are not standardized before scoring. The SCORE= data set contains the variable Intercept, so this intercept value is used in computing the score. To produce the RScoreP data set, the VAR statement in PROC SCORE includes only the independent variables from the model in PROC REG. As a result, the OxyHat variable contains predicted values. To produce the RScoreR data set, the VAR statement in PROC SCORE includes both the dependent variable (Oxygen) and the independent variables from the model in PROC REG. Because the OUTEST= data set stores a coefficient of -1 for the dependent variable, the OxyHat variable then contains negative residuals (PREDICT-ACTUAL). If the RESIDUAL option is specified, the variable OxyHat contains positive residuals (ACTUAL-PREDICT). If the PREDICT option is specified, the OxyHat variable contains predicted values.
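
The scoring arithmetic described above can be reproduced by hand as a check. The following DATA step is only a sketch, not part of the original example: it assumes the RegOut and Fitness data sets from the preceding code exist, and the names ManualScore, ManualPred, ManualNegResid, and the b_ prefix are hypothetical. It reads the single PARMS row once, forms the linear combination of the independent variables plus the intercept, and then subtracts Oxygen to mimic the effect of the -1 coefficient.

```sas
/* Sketch: reproduce the PROC SCORE results with a DATA step.        */
/* ManualScore, ManualPred, ManualNegResid, and the b_ prefix are    */
/* hypothetical names introduced for this illustration.              */
data ManualScore;
   /* Read the one PARMS row once; SET variables are retained. */
   if _n_ = 1 then
      set RegOut(keep=Intercept Age Weight RunTime RunPulse RestPulse
                 rename=(Age=b_Age Weight=b_Weight RunTime=b_RunTime
                         RunPulse=b_RunPulse RestPulse=b_RestPulse));
   set Fitness;

   /* Independent variables only: the predicted value */
   ManualPred = Intercept + b_Age*Age + b_Weight*Weight
              + b_RunTime*RunTime + b_RunPulse*RunPulse
              + b_RestPulse*RestPulse;

   /* Adding Oxygen with a coefficient of -1: PREDICT-ACTUAL */
   ManualNegResid = ManualPred - Oxygen;

   drop Intercept b_:;
run;

proc print data=ManualScore;
   var Oxygen ManualPred ManualNegResid;
run;
```

If the sketch behaves as intended, ManualPred and ManualNegResid should agree with the OxyHat columns of RScoreP and RScoreR in Output 64.2.3, up to rounding in the printed values.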

The second part of this example uses the parameter estimates to score a new data set. The following code produces Output 64.2.4 and Output 64.2.5:

  /* The FITNESS2 data set contains observations 13-16 from */
  /* the FITNESS data set used in EXAMPLE 2 in the PROC REG */
  /* chapter.                                               */
  data Fitness2;
     input Age Weight Oxygen RunTime RestPulse RunPulse;
     datalines;
  45  66.45  44.754  11.12  51  176
  47  79.15  47.273  10.60  47  162
  54  83.12  51.855  10.33  50  166
  49  81.42  49.156   8.95  44  180
  ;

  proc print data=Fitness2;
     title 'REGRESSION SCORING EXAMPLE';
     title2 'New Raw Data Set to be Scored';
  run;

  proc score data=Fitness2 score=RegOut out=NewPred type=parms
             nostd predict;
     var Oxygen Age Weight RunTime RunPulse RestPulse;
  run;

  proc print data=NewPred;
     title2 'Predicted Scores for Regression';
     title3 'for Additional Data from FITNESS2';
  run;

Output 64.2.4 lists the Fitness2 data set.

Output 64.2.4: Listing of the Fitness2 Data Set
start example
  REGRESSION SCORING EXAMPLE
  New Raw Data Set to be Scored

                                      Run     Rest      Run
  Obs    Age    Weight    Oxygen     Time    Pulse    Pulse
    1     45     66.45    44.754    11.12       51      176
    2     47     79.15    47.273    10.60       47      162
    3     54     83.12    51.855    10.33       50      166
    4     49     81.42    49.156     8.95       44      180
end example
 

PROC SCORE scores the Fitness2 data set using the parameter estimates in the RegOut data set. These parameter estimates result from fitting a regression equation to the Fitness data set. The NOSTD option is specified, so the raw data are not standardized before scoring. (However, the NOSTD option is not necessary here. The SCORE= data set does not contain observations with _TYPE_='MEAN' or _TYPE_='STD', so standardization is not performed.) The VAR statement contains the dependent variable and the independent variables used in PROC REG. In addition, the PREDICT option is specified. This combination gives predicted values for the new score variable. The name of the new score variable is OxyHat, from the value of the _MODEL_ variable in the SCORE= data set. Output 64.2.5 shows the data set produced by PROC SCORE.

Output 64.2.5: Predicted Scores from the OUT= Data Set Created by PROC SCORE and Reproduced Using PROC PRINT
start example
  REGRESSION SCORING EXAMPLE
  Predicted Scores for Regression
  for Additional Data from FITNESS2

                                      Run     Rest      Run
  Obs    Age    Weight    Oxygen     Time    Pulse    Pulse      oxyhat
    1     45     66.45    44.754    11.12       51      176     47.5507
    2     47     79.15    47.273    10.60       47      162     49.7802
    3     54     83.12    51.855    10.33       50      166     43.9682
    4     49     81.42    49.156     8.95       44      180     47.5949
end example
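
Two variations on the Fitness2 scoring step may help confirm the remarks made about the NOSTD, PREDICT, and RESIDUAL options. This is a sketch rather than part of the original example: it assumes the Fitness2, RegOut, and NewPred data sets already exist, and the output data set names NewPredNoOpt and NewResid are hypothetical.

```sas
/* Variation 1: the same scoring step without NOSTD.                 */
/* NewPredNoOpt is a hypothetical output data set name.              */
proc score data=Fitness2 score=RegOut out=NewPredNoOpt type=parms predict;
   var Oxygen Age Weight RunTime RunPulse RestPulse;
run;

/* RegOut has no _TYPE_='MEAN' or 'STD' rows, so the comparison     */
/* is expected to find no differences in OxyHat.                     */
proc compare base=NewPred compare=NewPredNoOpt;
run;

/* Variation 2: RESIDUAL in place of PREDICT, so that OxyHat        */
/* holds ACTUAL-PREDICT. NewResid is a hypothetical name.            */
proc score data=Fitness2 score=RegOut out=NewResid type=parms residual;
   var Oxygen Age Weight RunTime RunPulse RestPulse;
run;
```

For the first Fitness2 observation, the values in Output 64.2.5 suggest a positive residual of about 44.754 - 47.5507 = -2.7967 under the second variation.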
 



SAS/STAT 9.1 User's Guide (Vol. 6)
Year: 2004
Pages: 127
