In this example, PROC REG computes regression parameter estimates for the Fitness data. (See Example 64.1 to create the Fitness data set.) The parameter estimates are output to a data set and used as scoring coefficients. For the first part of this example, PROC SCORE is used to score the Fitness data, which are the same data used in the regression.
In the second part of this example, PROC SCORE is used to score a new data set, Fitness2. For PROC SCORE, the TYPE= specification is PARMS, and the names of the score variables are found in the variable _MODEL_, which gets its values from the model label. The following output, Output 64.2.1 through Output 64.2.3, is produced by the code listed after it.
Output 64.2.1: PROC REG Output

REGRESSION SCORING EXAMPLE

The REG Procedure
Model: oxyhat
Dependent Variable: Oxygen

Analysis of Variance

                             Sum of         Mean
Source             DF       Squares       Square   F Value   Pr > F
Model               5     509.62201    101.92440     15.80   0.0021
Error               6      38.70060      6.45010
Corrected Total    11     548.32261

Root MSE           2.53970    R-Square    0.9294
Dependent Mean    48.38942    Adj R-Sq    0.8706
Coeff Var          5.24847

Parameter Estimates

                   Parameter     Standard
Variable     DF     Estimate        Error   t Value   Pr > |t|
Intercept     1    151.91550     31.04738      4.89     0.0027
Age           1     -0.63045      0.42503     -1.48     0.1885
Weight        1     -0.10586      0.11869     -0.89     0.4068
RunTime       1     -1.75698      0.93844     -1.87     0.1103
RunPulse      1     -0.22891      0.12169     -1.88     0.1090
RestPulse     1     -0.17910      0.13005     -1.38     0.2176
Output 64.2.2: OUTEST= Data Set from PROC REG

REGRESSION SCORING EXAMPLE
OUTEST= Data Set from PROC REG

Obs   _MODEL_   _TYPE_   _DEPVAR_    _RMSE_   Intercept        Age
  1   oxyhat    PARMS    Oxygen     2.53970     151.916   -0.63045

Obs     Weight    RunTime   RunPulse   RestPulse   Oxygen
  1   -0.10586   -1.75698   -0.22891    -0.17910       -1
Output 64.2.3: Data Sets Created by PROC SCORE

REGRESSION SCORING EXAMPLE
Predicted Scores for Regression

Obs   Age   Weight   Oxygen   RunTime   RestPulse   RunPulse    oxyhat
  1    44    89.47   44.609     11.37          62        178   42.8771
  2    40    75.07   45.313     10.07          62        185   47.6050
  3    44    85.84   54.297      8.65          45        156   56.1211
  4    42    68.15   59.571      8.17          40        166   58.7044
  5    38    89.02   49.874      9.22          55        178   51.7386
  6    47    77.45   44.811     11.63          58        176   42.9756
  7    40    75.98   45.681     11.95          70        176   44.8329
  8    43    81.19   49.091     10.85          64        162   48.6020
  9    44    81.42   39.442     13.08          63        174   41.4613
 10    38    81.87   60.055      8.63          48        170   56.6171
 11    44    73.03   50.541     10.13          45        168   52.1299
 12    45    87.66   37.388     14.03          56        186   37.0080

REGRESSION SCORING EXAMPLE
Negative Residual Scores for Regression

Obs   Age   Weight   Oxygen   RunTime   RestPulse   RunPulse     oxyhat
  1    44    89.47   44.609     11.37          62        178   -1.73195
  2    40    75.07   45.313     10.07          62        185    2.29197
  3    44    85.84   54.297      8.65          45        156    1.82407
  4    42    68.15   59.571      8.17          40        166   -0.86657
  5    38    89.02   49.874      9.22          55        178    1.86460
  6    47    77.45   44.811     11.63          58        176   -1.83542
  7    40    75.98   45.681     11.95          70        176   -0.84811
  8    43    81.19   49.091     10.85          64        162   -0.48897
  9    44    81.42   39.442     13.08          63        174    2.01935
 10    38    81.87   60.055      8.63          48        170   -3.43787
 11    44    73.03   50.541     10.13          45        168    1.58892
 12    45    87.66   37.388     14.03          56        186   -0.38002
/* Fit the regression and write the parameter estimates to RegOut */
proc reg data=Fitness outest=RegOut;
   OxyHat: model Oxygen=Age Weight RunTime RunPulse RestPulse;
   title 'REGRESSION SCORING EXAMPLE';
run;

proc print data=RegOut;
   title2 'OUTEST= Data Set from PROC REG';
run;

/* Independent variables only: OxyHat holds predicted values */
proc score data=Fitness score=RegOut out=RScoreP type=parms;
   var Age Weight RunTime RunPulse RestPulse;
run;

proc print data=RScoreP;
   title2 'Predicted Scores for Regression';
run;

/* Dependent variable included: OxyHat holds negative residuals */
proc score data=Fitness score=RegOut out=RScoreR type=parms;
   var Oxygen Age Weight RunTime RunPulse RestPulse;
run;

proc print data=RScoreR;
   title2 'Negative Residual Scores for Regression';
run;
Output 64.2.1 shows the PROC REG output. The column labeled 'Parameter Estimate' lists the parameter estimates. These estimates are written to the RegOut data set by the OUTEST= option.
Output 64.2.2 lists the RegOut data set. Notice that _TYPE_='PARMS' and that _MODEL_='OXYHAT', which comes from the label in the MODEL statement in PROC REG.
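Read as scoring coefficients, the single PARMS observation in RegOut defines a linear combination of the input variables. As a sketch, using the estimates printed by PROC REG with the negative slopes that the predicted values in Output 64.2.3 imply (the symbol on the left is notation introduced here, not part of the SAS output):

```latex
\widehat{\text{Oxygen}} = 151.91550
  - 0.63045\,\text{Age}
  - 0.10586\,\text{Weight}
  - 1.75698\,\text{RunTime}
  - 0.22891\,\text{RunPulse}
  - 0.17910\,\text{RestPulse}
```

PROC SCORE evaluates this combination for each observation in the DATA= data set and stores the result in the variable named by _MODEL_.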
Output 64.2.3 lists the data sets created by PROC SCORE. Since the SCORE= data set does not contain observations with _TYPE_='MEAN' or _TYPE_='STD', the data in the Fitness data set are not standardized before scoring. The SCORE= data set contains the variable Intercept, so this intercept value is used in computing the score.

To produce the RScoreP data set, the VAR statement in PROC SCORE includes only the independent variables from the model in PROC REG. As a result, the OxyHat variable contains predicted values. To produce the RScoreR data set, the VAR statement in PROC SCORE includes both the dependent variable and the independent variables from the model in PROC REG. As a result, the OxyHat variable contains negative residuals (PREDICT-ACTUAL). If the RESIDUAL option is specified with this VAR statement, the OxyHat variable contains positive residuals (ACTUAL-PREDICT). If the PREDICT option is specified instead, the OxyHat variable contains predicted values even though the dependent variable is listed in the VAR statement.
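The RESIDUAL case described above is not run in this example. A minimal sketch of what it might look like follows, assuming the Fitness and RegOut data sets from the preceding steps still exist; the output data set name RScoreRes and the title text are hypothetical.

```sas
/* Sketch only: RScoreRes is a hypothetical data set name.        */
/* RESIDUAL requests ACTUAL-PREDICT, so the VAR statement lists   */
/* the dependent variable together with the independent ones.     */
proc score data=Fitness score=RegOut out=RScoreRes type=parms residual;
   var Oxygen Age Weight RunTime RunPulse RestPulse;
run;

proc print data=RScoreRes;
   title2 'Residual Scores (ACTUAL-PREDICT) for Regression';
run;
```

If this behaves as described, each OxyHat value in RScoreRes should be the negative of the corresponding value in RScoreR. For the first observation that would be roughly 44.609 - 42.8771, or about 1.73.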
The second part of this example uses the parameter estimates to score a new data set. The following code produces Output 64.2.4 and Output 64.2.5:
/* The FITNESS2 data set contains observations 13-16 from */
/* the FITNESS data set used in EXAMPLE 2 in the PROC REG */
/* chapter.                                               */
data Fitness2;
   input Age Weight Oxygen RunTime RestPulse RunPulse;
   datalines;
45 66.45 44.754 11.12 51 176
47 79.15 47.273 10.60 47 162
54 83.12 51.855 10.33 50 166
49 81.42 49.156  8.95 44 180
;

proc print data=Fitness2;
   title 'REGRESSION SCORING EXAMPLE';
   title2 'New Raw Data Set to be Scored';
run;

/* PREDICT yields predicted values even though Oxygen is in VAR */
proc score data=Fitness2 score=RegOut out=NewPred type=parms
           nostd predict;
   var Oxygen Age Weight RunTime RunPulse RestPulse;
run;

proc print data=NewPred;
   title2 'Predicted Scores for Regression';
   title3 'for Additional Data from FITNESS2';
run;
Output 64.2.4 lists the Fitness2 data set.
Output 64.2.4: Fitness2 Data Set

REGRESSION SCORING EXAMPLE
New Raw Data Set to be Scored

Obs   Age   Weight   Oxygen   RunTime   RestPulse   RunPulse
  1    45    66.45   44.754     11.12          51        176
  2    47    79.15   47.273     10.60          47        162
  3    54    83.12   51.855     10.33          50        166
  4    49    81.42   49.156      8.95          44        180
PROC SCORE scores the Fitness2 data set using the parameter estimates in the RegOut data set. These parameter estimates result from fitting a regression equation to the Fitness data set. The NOSTD option is specified, so the raw data are not standardized before scoring. (However, the NOSTD option is not necessary here. The SCORE= data set does not contain observations with _TYPE_='MEAN' or _TYPE_='STD', so standardization is not performed.) The VAR statement contains the dependent variable and the independent variables used in PROC REG. In addition, the PREDICT option is specified. This combination gives predicted values for the new score variable. The name of the new score variable is OxyHat, from the value of the _MODEL_ variable in the SCORE= data set. Output 64.2.5 shows the data set produced by PROC SCORE.
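To make the scoring arithmetic concrete, the same predictions can be approximated with a DATA step that reads the coefficients from RegOut and applies them to each Fitness2 observation. This is a sketch under stated assumptions, not how PROC SCORE is implemented: it assumes RegOut holds exactly one PARMS observation, as in Output 64.2.2, and the names ManualPred, OxyHatManual, and the b_ prefixed variables are hypothetical.

```sas
/* Sketch only: ManualPred, OxyHatManual, and b_* are hypothetical names. */
data ManualPred;
   /* Read the single PARMS observation once; variables read by SET  */
   /* are retained, so the coefficients stay available for every row. */
   if _n_ = 1 then
      set RegOut(keep=Intercept Age Weight RunTime RunPulse RestPulse
                 rename=(Intercept=b_Intercept Age=b_Age
                         Weight=b_Weight RunTime=b_RunTime
                         RunPulse=b_RunPulse RestPulse=b_RestPulse));
   set Fitness2;

   /* Intercept plus the sum of coefficient times raw value */
   OxyHatManual = b_Intercept
                + b_Age*Age + b_Weight*Weight + b_RunTime*RunTime
                + b_RunPulse*RunPulse + b_RestPulse*RestPulse;

   drop b_:;
run;

proc print data=ManualPred;
   title2 'DATA Step Check of Predicted Scores';
run;
```

The coefficients are renamed on input so they do not collide with the like-named variables in Fitness2. The OxyHatManual values should match the OxyHat values that PROC SCORE writes to NewPred.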
Output 64.2.5: Predicted Scores for Fitness2

REGRESSION SCORING EXAMPLE
Predicted Scores for Regression
for Additional Data from FITNESS2

Obs   Age   Weight   Oxygen   RunTime   RestPulse   RunPulse    oxyhat
  1    45    66.45   44.754     11.12          51        176   47.5507
  2    47    79.15   47.273     10.60          47        162   49.7802
  3    54    83.12   51.855     10.33          50        166   43.9682
  4    49    81.42   49.156      8.95          44        180   47.5949
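As a rough hand check of the first row, the printed coefficients can be applied to the first Fitness2 observation. This is only a sketch: it uses the rounded estimates from the PROC REG listing, with negative slopes assumed as the predicted values imply, so the last digits differ slightly from the PROC SCORE result.

```latex
\begin{aligned}
\widehat{\text{Oxygen}}_1
  &= 151.91550 - 0.63045(45) - 0.10586(66.45) - 1.75698(11.12)
     - 0.22891(176) - 0.17910(51) \\
  &\approx 151.9155 - 28.3703 - 7.0344 - 19.5376 - 40.2882 - 9.1341 \\
  &\approx 47.551
\end{aligned}
```

This agrees with the listed OxyHat value of 47.5507 to three significant digits; the remaining difference comes from the rounding of the printed coefficients.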