Examples


Example 61.1. Aerobic Fitness Prediction

Aerobic fitness (measured by the ability to consume oxygen) is fit to some simple exercise tests. The goal is to develop an equation to predict fitness based on the exercise tests rather than on expensive and cumbersome oxygen consumption measurements. Three model-selection methods are used: forward selection, backward selection, and MAXR selection. The following statements produce Output 61.1.1 through Output 61.1.5. (Collinearity diagnostics for the full model are shown in Figure 61.42 on page 3896.)

   *-------------------Data on Physical Fitness-------------------*
   These measurements were made on men involved in a physical
   fitness course at N.C.State Univ. The variables are Age
   (years), Weight (kg), Oxygen intake rate (ml per kg body
   weight per minute), time to run 1.5 miles (minutes), heart
   rate while resting, heart rate while running (same time
   Oxygen rate measured), and maximum heart rate recorded while
   running.
   ***Certain values of MaxPulse were changed for this analysis.
   *--------------------------------------------------------------*;
   data fitness;
      input Age Weight Oxygen RunTime RestPulse RunPulse MaxPulse @@;
      datalines;
   44 89.47 44.609 11.37 62 178 182   40 75.07 45.313 10.07 62 185 185
   44 85.84 54.297  8.65 45 156 168   42 68.15 59.571  8.17 40 166 172
   38 89.02 49.874  9.22 55 178 180   47 77.45 44.811 11.63 58 176 176
   40 75.98 45.681 11.95 70 176 180   43 81.19 49.091 10.85 64 162 170
   44 81.42 39.442 13.08 63 174 176   38 81.87 60.055  8.63 48 170 186
   44 73.03 50.541 10.13 45 168 168   45 87.66 37.388 14.03 56 186 192
   45 66.45 44.754 11.12 51 176 176   47 79.15 47.273 10.60 47 162 164
   54 83.12 51.855 10.33 50 166 170   49 81.42 49.156  8.95 44 180 185
   51 69.63 40.836 10.95 57 168 172   51 77.91 46.672 10.00 48 162 168
   48 91.63 46.774 10.25 48 162 164   49 73.37 50.388 10.08 67 168 168
   57 73.37 39.407 12.63 58 174 176   54 79.38 46.080 11.17 62 156 165
   52 76.32 45.441  9.63 48 164 166   50 70.87 54.625  8.92 48 146 155
   51 67.25 45.118 11.08 48 172 172   54 91.63 39.203 12.88 44 168 172
   51 73.71 45.790 10.47 59 186 188   57 59.08 50.545  9.93 49 148 155
   49 76.32 48.673  9.40 56 186 188   48 61.24 47.920 11.50 52 170 176
   52 82.78 47.467 10.50 53 170 172
   ;
   proc reg data=fitness;
      model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
            / selection=forward;
      model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
            / selection=backward;
      model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
            / selection=maxr;
   run;

Output 61.1.1: Forward Selection Method: PROC REG
   The REG Procedure
   Model: MODEL1
   Dependent Variable: Oxygen

   Forward Selection: Step 1

   Variable RunTime Entered: R-Square = 0.7434 and C(p) = 13.6988

   Analysis of Variance

                                    Sum of           Mean
   Source               DF         Squares         Square   F Value   Pr > F
   Model                 1       632.90010      632.90010     84.01   <.0001
   Error                29       218.48144        7.53384
   Corrected Total      30       851.38154

                  Parameter      Standard
   Variable        Estimate         Error    Type II SS   F Value   Pr > F
   Intercept       82.42177       3.85530    3443.36654    457.05   <.0001
   RunTime         -3.31056       0.36119     632.90010     84.01   <.0001

   Bounds on condition number: 1, 1
   --------------------------------------------------------------------------------

   Forward Selection: Step 2

   Variable Age Entered: R-Square = 0.7642 and C(p) = 12.3894

   Analysis of Variance

                                    Sum of           Mean
   Source               DF         Squares         Square   F Value   Pr > F
   Model                 2       650.66573      325.33287     45.38   <.0001
   Error                28       200.71581        7.16842
   Corrected Total      30       851.38154

                  Parameter      Standard
   Variable        Estimate         Error    Type II SS   F Value   Pr > F
   Intercept       88.46229       5.37264    1943.41071    271.11   <.0001
   Age             -0.15037       0.09551      17.76563      2.48   0.1267
   RunTime         -3.20395       0.35877     571.67751     79.75   <.0001

   Bounds on condition number: 1.0369, 4.1478
   --------------------------------------------------------------------------------
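As a check on how the Step 1 statistics fit together, the following worked arithmetic recomputes R-Square, the F value, and C(p) from the sums of squares. It assumes the usual definition of Mallows' C(p), in which the error variance is estimated by the mean square error of the full six-variable model (5.36825, shown in Step 0 of Output 61.1.2), N = 31 observations, and p = 2 parameters including the intercept.

```latex
\[
R^2 = \frac{SS_{\text{Model}}}{SS_{\text{Total}}}
    = \frac{632.90010}{851.38154} \approx 0.7434
\]
\[
F = \frac{MS_{\text{Model}}}{MS_{\text{Error}}}
  = \frac{632.90010}{7.53384} \approx 84.01
\]
\[
C_p = \frac{SSE_p}{s^2_{\text{full}}} - (N - 2p)
    = \frac{218.48144}{5.36825} - (31 - 4)
    \approx 40.6988 - 27 = 13.6988
\]
```

The same arithmetic applies to every later step; only the sums of squares and the parameter count p change.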
 

The FORWARD model-selection method begins with no variables in the model and adds RunTime, then Age, then RunPulse, then MaxPulse,

   Forward Selection: Step 3

   Variable RunPulse Entered: R-Square = 0.8111 and C(p) = 6.9596

   Analysis of Variance

                                    Sum of           Mean
   Source               DF         Squares         Square   F Value   Pr > F
   Model                 3       690.55086      230.18362     38.64   <.0001
   Error                27       160.83069        5.95669
   Corrected Total      30       851.38154

                  Parameter      Standard
   Variable        Estimate         Error    Type II SS   F Value   Pr > F
   Intercept      111.71806      10.23509     709.69014    119.14   <.0001
   Age             -0.25640       0.09623      42.28867      7.10   0.0129
   RunTime         -2.82538       0.35828     370.43529     62.19   <.0001
   RunPulse        -0.13091       0.05059      39.88512      6.70   0.0154

   Bounds on condition number: 1.3548, 11.597
   --------------------------------------------------------------------------------

   Forward Selection: Step 4

   Variable MaxPulse Entered: R-Square = 0.8368 and C(p) = 4.8800

   Analysis of Variance

                                    Sum of           Mean
   Source               DF         Squares         Square   F Value   Pr > F
   Model                 4       712.45153      178.11288     33.33   <.0001
   Error                26       138.93002        5.34346
   Corrected Total      30       851.38154

                  Parameter      Standard
   Variable        Estimate         Error    Type II SS   F Value   Pr > F
   Intercept       98.14789      11.78569     370.57373     69.35   <.0001
   Age             -0.19773       0.09564      22.84231      4.27   0.0488
   RunTime         -2.76758       0.34054     352.93570     66.05   <.0001
   RunPulse        -0.34811       0.11750      46.90089      8.78   0.0064
   MaxPulse         0.27051       0.13362      21.90067      4.10   0.0533

   Bounds on condition number: 8.4182, 76.851
   --------------------------------------------------------------------------------

and finally, Weight. The final variable available to add to the model, RestPulse, is not added because it does not meet the 50% significance-level criterion for entry into the model (the default value of the SLE= option is 0.5 for FORWARD selection).

   Forward Selection: Step 5

   Variable Weight Entered: R-Square = 0.8480 and C(p) = 5.1063

   Analysis of Variance

                                    Sum of           Mean
   Source               DF         Squares         Square   F Value   Pr > F
   Model                 5       721.97309      144.39462     27.90   <.0001
   Error                25       129.40845        5.17634
   Corrected Total      30       851.38154

                  Parameter      Standard
   Variable        Estimate         Error    Type II SS   F Value   Pr > F
   Intercept      102.20428      11.97929     376.78935     72.79   <.0001
   Age             -0.21962       0.09550      27.37429      5.29   0.0301
   Weight          -0.07230       0.05331       9.52157      1.84   0.1871
   RunTime         -2.68252       0.34099     320.35968     61.89   <.0001
   RunPulse        -0.37340       0.11714      52.59624     10.16   0.0038
   MaxPulse         0.30491       0.13394      26.82640      5.18   0.0316

   Bounds on condition number: 8.7312, 104.83
   --------------------------------------------------------------------------------

   No other variable met the 0.5000 significance level for entry into the model.

   Summary of Forward Selection

          Variable    Number     Partial      Model
   Step   Entered     Vars In   R-Square   R-Square      C(p)   F Value   Pr > F
      1   RunTime           1     0.7434     0.7434   13.6988     84.01   <.0001
      2   Age               2     0.0209     0.7642   12.3894      2.48   0.1267
      3   RunPulse          3     0.0468     0.8111    6.9596      6.70   0.0154
      4   MaxPulse          4     0.0257     0.8368    4.8800      4.10   0.0533
      5   Weight            5     0.0112     0.8480    5.1063      1.84   0.1871
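The 0.5 default for entry is permissive. If you want forward selection to admit fewer variables, you can specify the SLENTRY= option (alias SLE=) in the MODEL statement. The following sketch uses 0.10, a value chosen only for illustration. Because Age was the best candidate at Step 2 and entered with a p-value of 0.1267, a threshold of 0.10 would be expected to stop the selection after RunTime.

```sas
/* Sketch only: same data set and candidate variables as above.      */
/* The threshold 0.10 is an illustrative assumption, not a           */
/* recommendation from the original example.                         */
proc reg data=fitness;
   model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
         / selection=forward slentry=0.10;
run;
quit;
```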

The BACKWARD model-selection method begins with the full model.

Output 61.1.2: Backward Selection Method: PROC REG
   The REG Procedure
   Model: MODEL2
   Dependent Variable: Oxygen

   Backward Elimination: Step 0

   All Variables Entered: R-Square = 0.8487 and C(p) = 7.0000

   Analysis of Variance

                                    Sum of           Mean
   Source               DF         Squares         Square   F Value   Pr > F
   Model                 6       722.54361      120.42393     22.43   <.0001
   Error                24       128.83794        5.36825
   Corrected Total      30       851.38154

                  Parameter      Standard
   Variable        Estimate         Error    Type II SS   F Value   Pr > F
   Intercept      102.93448      12.40326     369.72831     68.87   <.0001
   Age             -0.22697       0.09984      27.74577      5.17   0.0322
   Weight          -0.07418       0.05459       9.91059      1.85   0.1869
   RunTime         -2.62865       0.38456     250.82210     46.72   <.0001
   RunPulse        -0.36963       0.11985      51.05806      9.51   0.0051
   RestPulse       -0.02153       0.06605       0.57051      0.11   0.7473
   MaxPulse         0.30322       0.13650      26.49142      4.93   0.0360

   Bounds on condition number: 8.7438, 137.13
   --------------------------------------------------------------------------------
 

RestPulse is the first variable deleted,

   Backward Elimination: Step 1

   Variable RestPulse Removed: R-Square = 0.8480 and C(p) = 5.1063

   Analysis of Variance

                                    Sum of           Mean
   Source               DF         Squares         Square   F Value   Pr > F
   Model                 5       721.97309      144.39462     27.90   <.0001
   Error                25       129.40845        5.17634
   Corrected Total      30       851.38154

                  Parameter      Standard
   Variable        Estimate         Error    Type II SS   F Value   Pr > F
   Intercept      102.20428      11.97929     376.78935     72.79   <.0001
   Age             -0.21962       0.09550      27.37429      5.29   0.0301
   Weight          -0.07230       0.05331       9.52157      1.84   0.1871
   RunTime         -2.68252       0.34099     320.35968     61.89   <.0001
   RunPulse        -0.37340       0.11714      52.59624     10.16   0.0038
   MaxPulse         0.30491       0.13394      26.82640      5.18   0.0316

   Bounds on condition number: 8.7312, 104.83
   --------------------------------------------------------------------------------
followed by Weight. No other variables are deleted from the model because the remaining variables (Age, RunTime, RunPulse, and MaxPulse) are all significant at the 10% significance level (the default value of the SLS= option is 0.1 for the BACKWARD elimination method).
   Backward Elimination: Step 2

   Variable Weight Removed: R-Square = 0.8368 and C(p) = 4.8800

   Analysis of Variance

                                    Sum of           Mean
   Source               DF         Squares         Square   F Value   Pr > F
   Model                 4       712.45153      178.11288     33.33   <.0001
   Error                26       138.93002        5.34346
   Corrected Total      30       851.38154

                  Parameter      Standard
   Variable        Estimate         Error    Type II SS   F Value   Pr > F
   Intercept       98.14789      11.78569     370.57373     69.35   <.0001
   Age             -0.19773       0.09564      22.84231      4.27   0.0488
   RunTime         -2.76758       0.34054     352.93570     66.05   <.0001
   RunPulse        -0.34811       0.11750      46.90089      8.78   0.0064
   MaxPulse         0.27051       0.13362      21.90067      4.10   0.0533

   Bounds on condition number: 8.4182, 76.851
   --------------------------------------------------------------------------------

   All variables left in the model are significant at the 0.1000 level.

   Summary of Backward Elimination

          Variable    Number     Partial      Model
   Step   Removed     Vars In   R-Square   R-Square      C(p)   F Value   Pr > F
      1   RestPulse         5     0.0007     0.8480    5.1063      0.11   0.7473
      2   Weight            4     0.0112     0.8368    4.8800      1.84   0.1871
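The stay criterion can be changed in the same way with the SLSTAY= option (alias SLS=). The following sketch uses 0.05, a value chosen only for illustration. Under that threshold MaxPulse, whose p-value in the Step 2 model is 0.0533, would no longer qualify to stay, so the elimination would be expected to continue past the four-variable model. Which variables remain after that point cannot be read from the output shown here.

```sas
/* Sketch only: stricter stay threshold for backward elimination.    */
/* The value 0.05 is an illustrative assumption.                     */
proc reg data=fitness;
   model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
         / selection=backward slstay=0.05;
run;
quit;
```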

The MAXR method tries to find the best one-variable model, the best two-variable model, and so on. For the fitness data, the one-variable model contains RunTime; the two-variable model contains RunTime and Age;

Output 61.1.3: Maximum R-Square Improvement Selection Method: PROC REG
   The REG Procedure
   Model: MODEL3
   Dependent Variable: Oxygen

   Maximum R-Square Improvement: Step 1

   Variable RunTime Entered: R-Square = 0.7434 and C(p) = 13.6988

   Analysis of Variance

                                    Sum of           Mean
   Source               DF         Squares         Square   F Value   Pr > F
   Model                 1       632.90010      632.90010     84.01   <.0001
   Error                29       218.48144        7.53384
   Corrected Total      30       851.38154

                  Parameter      Standard
   Variable        Estimate         Error    Type II SS   F Value   Pr > F
   Intercept       82.42177       3.85530    3443.36654    457.05   <.0001
   RunTime         -3.31056       0.36119     632.90010     84.01   <.0001

   Bounds on condition number: 1, 1
   --------------------------------------------------------------------------------

   The above model is the best  1-variable model found.

   Maximum R-Square Improvement: Step 2

   Variable Age Entered: R-Square = 0.7642 and C(p) = 12.3894

   Analysis of Variance

                                    Sum of           Mean
   Source               DF         Squares         Square   F Value   Pr > F
   Model                 2       650.66573      325.33287     45.38   <.0001
   Error                28       200.71581        7.16842
   Corrected Total      30       851.38154

                  Parameter      Standard
   Variable        Estimate         Error    Type II SS   F Value   Pr > F
   Intercept       88.46229       5.37264    1943.41071    271.11   <.0001
   Age             -0.15037       0.09551      17.76563      2.48   0.1267
   RunTime         -3.20395       0.35877     571.67751     79.75   <.0001

   Bounds on condition number: 1.0369, 4.1478
   --------------------------------------------------------------------------------

   The above model is the best  2-variable model found.
 
the three-variable model contains RunTime, Age, and RunPulse; the four-variable model contains Age, RunTime, RunPulse, and MaxPulse; the five-variable model
   Maximum R-Square Improvement: Step 3

   Variable RunPulse Entered: R-Square = 0.8111 and C(p) = 6.9596

   Analysis of Variance

                                    Sum of           Mean
   Source               DF         Squares         Square   F Value   Pr > F
   Model                 3       690.55086      230.18362     38.64   <.0001
   Error                27       160.83069        5.95669
   Corrected Total      30       851.38154

                  Parameter      Standard
   Variable        Estimate         Error    Type II SS   F Value   Pr > F
   Intercept      111.71806      10.23509     709.69014    119.14   <.0001
   Age             -0.25640       0.09623      42.28867      7.10   0.0129
   RunTime         -2.82538       0.35828     370.43529     62.19   <.0001
   RunPulse        -0.13091       0.05059      39.88512      6.70   0.0154

   Bounds on condition number: 1.3548, 11.597
   --------------------------------------------------------------------------------

   The above model is the best  3-variable model found.

   Maximum R-Square Improvement: Step 4

   Variable MaxPulse Entered: R-Square = 0.8368 and C(p) = 4.8800

   Analysis of Variance

                                    Sum of           Mean
   Source               DF         Squares         Square   F Value   Pr > F
   Model                 4       712.45153      178.11288     33.33   <.0001
   Error                26       138.93002        5.34346
   Corrected Total      30       851.38154

                  Parameter      Standard
   Variable        Estimate         Error    Type II SS   F Value   Pr > F
   Intercept       98.14789      11.78569     370.57373     69.35   <.0001
   Age             -0.19773       0.09564      22.84231      4.27   0.0488
   RunTime         -2.76758       0.34054     352.93570     66.05   <.0001
   RunPulse        -0.34811       0.11750      46.90089      8.78   0.0064
   MaxPulse         0.27051       0.13362      21.90067      4.10   0.0533

   Bounds on condition number: 8.4182, 76.851
   --------------------------------------------------------------------------------

   The above model is the best  4-variable model found.
contains Age, Weight, RunTime, RunPulse, and MaxPulse; and finally, the six-variable model contains all the variables in the MODEL statement.
   Maximum R-Square Improvement: Step 5

   Variable Weight Entered: R-Square = 0.8480 and C(p) = 5.1063

   Analysis of Variance

                                    Sum of           Mean
   Source               DF         Squares         Square   F Value   Pr > F
   Model                 5       721.97309      144.39462     27.90   <.0001
   Error                25       129.40845        5.17634
   Corrected Total      30       851.38154

                  Parameter      Standard
   Variable        Estimate         Error    Type II SS   F Value   Pr > F
   Intercept      102.20428      11.97929     376.78935     72.79   <.0001
   Age             -0.21962       0.09550      27.37429      5.29   0.0301
   Weight          -0.07230       0.05331       9.52157      1.84   0.1871
   RunTime         -2.68252       0.34099     320.35968     61.89   <.0001
   RunPulse        -0.37340       0.11714      52.59624     10.16   0.0038
   MaxPulse         0.30491       0.13394      26.82640      5.18   0.0316

   Bounds on condition number: 8.7312, 104.83
   --------------------------------------------------------------------------------

   The above model is the best  5-variable model found.

   Maximum R-Square Improvement: Step 6

   Variable RestPulse Entered: R-Square = 0.8487 and C(p) = 7.0000

   Analysis of Variance

                                    Sum of           Mean
   Source               DF         Squares         Square   F Value   Pr > F
   Model                 6       722.54361      120.42393     22.43   <.0001
   Error                24       128.83794        5.36825
   Corrected Total      30       851.38154

                  Parameter      Standard
   Variable        Estimate         Error    Type II SS   F Value   Pr > F
   Intercept      102.93448      12.40326     369.72831     68.87   <.0001
   Age             -0.22697       0.09984      27.74577      5.17   0.0322
   Weight          -0.07418       0.05459       9.91059      1.85   0.1869
   RunTime         -2.62865       0.38456     250.82210     46.72   <.0001
   RunPulse        -0.36963       0.11985      51.05806      9.51   0.0051
   RestPulse       -0.02153       0.06605       0.57051      0.11   0.7473
   MaxPulse         0.30322       0.13650      26.49142      4.93   0.0360

   Bounds on condition number: 8.7438, 137.13
   --------------------------------------------------------------------------------

   The above model is the best  6-variable model found.

   No further improvement in R-Square is possible.

Note that for all three of these methods, RestPulse contributes least to the model. In the case of forward selection, it is not added to the model. In the case of backward selection, it is the first variable to be removed from the model. In the case of MAXR selection, RestPulse is included only for the full model.

For the STEPWISE, BACKWARD, and FORWARD selection methods, you can control the amount of detail displayed by using the DETAILS option. For example, the following statements display only the selection summary table for the FORWARD selection method:

   proc reg data=fitness;
      model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
            / selection=forward details=summary;
   run;

Output 61.1.4: Forward Selection Summary
   The REG Procedure
   Model: MODEL1
   Dependent Variable: Oxygen

   Summary of Forward Selection

          Variable    Number     Partial      Model
   Step   Entered     Vars In   R-Square   R-Square      C(p)   F Value   Pr > F
      1   RunTime           1     0.7434     0.7434   13.6988     84.01   <.0001
      2   Age               2     0.0209     0.7642   12.3894      2.48   0.1267
      3   RunPulse          3     0.0468     0.8111    6.9596      6.70   0.0154
      4   MaxPulse          4     0.0257     0.8368    4.8800      4.10   0.0533
      5   Weight            5     0.0112     0.8480    5.1063      1.84   0.1871
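The DETAILS option applies equally to the STEPWISE method, which is mentioned above but not run in this example. The following sketch requests stepwise selection with DETAILS=STEPS. The SLENTRY= and SLSTAY= values of 0.15 are written out explicitly even though they are the defaults for STEPWISE; no output is shown because this run is not part of the original example.

```sas
/* Sketch only: stepwise selection with step-level detail.           */
/* slentry= and slstay= repeat the STEPWISE defaults for clarity.    */
proc reg data=fitness;
   model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
         / selection=stepwise slentry=0.15 slstay=0.15 details=steps;
run;
quit;
```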
 

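Next, the RSQUARE model-selection method is used to request R² and C(p) statistics for all possible combinations of the six independent variables. Because PROC REG is interactive, the MODEL statement can be submitted after the previous RUN statement without a new PROC REG statement. The following statements produce Output 61.1.5: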

      model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
            / selection=rsquare cp;
      title 'Physical fitness data: all models';
   run;

Output 61.1.5: All Models by the RSQUARE Method: PROC REG
   Physical fitness data: all models

   The REG Procedure
   Model: MODEL2
   Dependent Variable: Oxygen

   R-Square Selection Method

   Number in
     Model     R-Square        C(p)   Variables in Model
       1         0.7434     13.6988   RunTime
       1         0.1595    106.3021   RestPulse
       1         0.1584    106.4769   RunPulse
       1         0.0928    116.8818   Age
       1         0.0560    122.7072   MaxPulse
       1         0.0265    127.3948   Weight
   ------------------------------------------------------------------------------
       2         0.7642     12.3894   Age RunTime
       2         0.7614     12.8372   RunTime RunPulse
       2         0.7452     15.4069   RunTime MaxPulse
       2         0.7449     15.4523   Weight RunTime
       2         0.7435     15.6746   RunTime RestPulse
       2         0.3760     73.9645   Age RunPulse
       2         0.3003     85.9742   Age RestPulse
       2         0.2894     87.6951   RunPulse MaxPulse
       2         0.2600     92.3638   Age MaxPulse
       2         0.2350     96.3209   RunPulse RestPulse
       2         0.1806    104.9523   Weight RestPulse
       2         0.1740    105.9939   RestPulse MaxPulse
       2         0.1669    107.1332   Weight RunPulse
       2         0.1506    109.7057   Age Weight
       2         0.0675    122.8881   Weight MaxPulse
   ------------------------------------------------------------------------------
       3         0.8111      6.9596   Age RunTime RunPulse
       3         0.8100      7.1350   RunTime RunPulse MaxPulse
       3         0.7817     11.6167   Age RunTime MaxPulse
       3         0.7708     13.3453   Age Weight RunTime
       3         0.7673     13.8974   Age RunTime RestPulse
       3         0.7619     14.7619   RunTime RunPulse RestPulse
       3         0.7618     14.7729   Weight RunTime RunPulse
       3         0.7462     17.2588   Weight RunTime MaxPulse
       3         0.7452     17.4060   RunTime RestPulse MaxPulse
       3         0.7451     17.4243   Weight RunTime RestPulse
       3         0.4666     61.5873   Age RunPulse RestPulse
       3         0.4223     68.6250   Age RunPulse MaxPulse
       3         0.4091     70.7102   Age Weight RunPulse
       3         0.3900     73.7424   Age RestPulse MaxPulse
       3         0.3568     79.0013   Age Weight RestPulse
       3         0.3538     79.4891   RunPulse RestPulse MaxPulse
       3         0.3208     84.7216   Weight RunPulse MaxPulse
       3         0.2902     89.5693   Age Weight MaxPulse
       3         0.2447     96.7952   Weight RunPulse RestPulse
       3         0.1882    105.7430   Weight RestPulse MaxPulse
   ------------------------------------------------------------------------------
       4         0.8368      4.8800   Age RunTime RunPulse MaxPulse
       4         0.8165      8.1035   Age Weight RunTime RunPulse
       4         0.8158      8.2056   Weight RunTime RunPulse MaxPulse
       4         0.8117      8.8683   Age RunTime RunPulse RestPulse
       4         0.8104      9.0697   RunTime RunPulse RestPulse MaxPulse
       4         0.7862     12.9039   Age Weight RunTime MaxPulse
       4         0.7834     13.3468   Age RunTime RestPulse MaxPulse
       4         0.7750     14.6788   Age Weight RunTime RestPulse
       4         0.7623     16.7058   Weight RunTime RunPulse RestPulse
       4         0.7462     19.2550   Weight RunTime RestPulse MaxPulse
       4         0.5034     57.7590   Age Weight RunPulse RestPulse
       4         0.5025     57.9092   Age RunPulse RestPulse MaxPulse
       4         0.4717     62.7830   Age Weight RunPulse MaxPulse
       4         0.4256     70.0963   Age Weight RestPulse MaxPulse
       4         0.3858     76.4100   Weight RunPulse RestPulse MaxPulse
   ------------------------------------------------------------------------------
       5         0.8480      5.1063   Age Weight RunTime RunPulse MaxPulse
       5         0.8370      6.8461   Age RunTime RunPulse RestPulse MaxPulse
       5         0.8176      9.9348   Age Weight RunTime RunPulse RestPulse
       5         0.8161     10.1685   Weight RunTime RunPulse RestPulse MaxPulse
       5         0.7887     14.5111   Age Weight RunTime RestPulse MaxPulse
       5         0.5541     51.7233   Age Weight RunPulse RestPulse MaxPulse
   ------------------------------------------------------------------------------
       6         0.8487      7.0000   Age Weight RunTime RunPulse RestPulse MaxPulse
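With six candidate variables the RSQUARE method lists 63 models, and the list roughly doubles with each additional variable. If only the leading models of each size are of interest, the BEST= option limits the display. The following sketch keeps the three models with the largest R-square for each subset size and also requests adjusted R-square with the ADJRSQ option. The value 3 is an illustrative assumption.

```sas
/* Sketch only: show the 3 best models of each size, with C(p) and   */
/* adjusted R-square. best=3 is an arbitrary illustrative choice.    */
proc reg data=fitness;
   model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
         / selection=rsquare cp adjrsq best=3;
run;
quit;
```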
 

The models in Output 61.1.5 are arranged first by the number of variables in the model and second by the magnitude of R² for the model. Before making a final decision about which model to use, you would want to perform collinearity diagnostics. Note that, since many different models have been fit and the choice of a final model is based on R², the statistics are biased and the p-values for the parameter estimates are not valid.
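As one way to carry out that check, the following sketch requests variance inflation factors, tolerances, and collinearity diagnostics for the four-variable model that has the smallest C(p) in Output 61.1.5. The choice of that model is an assumption made for illustration. In the selection output, the bound on the condition number rises from 1.3548 to 8.4182 when MaxPulse joins RunPulse in the model, which is the kind of change these diagnostics help examine.

```sas
/* Sketch only: collinearity diagnostics for one candidate model.    */
/* VIF, TOL, and COLLIN are MODEL statement options in PROC REG.     */
proc reg data=fitness;
   model Oxygen=Age RunTime RunPulse MaxPulse / vif tol collin;
run;
quit;
```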

Example 61.2. Predicting Weight by Height and Age

In this example, the weights of school children are modeled as a function of their heights and ages. Modeling is performed separately for boys and girls. The example shows the use of a BY statement with PROC REG, multiple MODEL statements, and the OUTEST= and OUTSSCP= options, which create data sets. Since the BY statement is used, interactive processing is not possible in this example; no statements can appear after the first RUN statement. The following statements produce Output 61.2.1 through Output 61.2.4:

   *------------Data on Age, Weight, and Height of Children-------*
   Age (months), height (inches), and weight (pounds) were
   recorded for a group of school children.
   From Lewis and Taylor (1967).
   *--------------------------------------------------------------*;
   data htwt;
      input sex $ age :3.1 height weight @@;
      datalines;
   f 143 56.3  85.0 f 155 62.3 105.0 f 153 63.3 108.0 f 161 59.0  92.0
   f 191 62.5 112.5 f 171 62.5 112.0 f 185 59.0 104.0 f 142 56.5  69.0
   f 160 62.0  94.5 f 140 53.8  68.5 f 139 61.5 104.0 f 178 61.5 103.5
   f 157 64.5 123.5 f 149 58.3  93.0 f 143 51.3  50.5 f 145 58.8  89.0
   f 191 65.3 107.0 f 150 59.5  78.5 f 147 61.3 115.0 f 180 63.3 114.0
   f 141 61.8  85.0 f 140 53.5  81.0 f 164 58.0  83.5 f 176 61.3 112.0
   f 185 63.3 101.0 f 166 61.5 103.5 f 175 60.8  93.5 f 180 59.0 112.0
   f 210 65.5 140.0 f 146 56.3  83.5 f 170 64.3  90.0 f 162 58.0  84.0
   f 149 64.3 110.5 f 139 57.5  96.0 f 186 57.8  95.0 f 197 61.5 121.0
   f 169 62.3  99.5 f 177 61.8 142.5 f 185 65.3 118.0 f 182 58.3 104.5
   f 173 62.8 102.5 f 166 59.3  89.5 f 168 61.5  95.0 f 169 62.0  98.5
   f 150 61.3  94.0 f 184 62.3 108.0 f 139 52.8  63.5 f 147 59.8  84.5
   f 144 59.5  93.5 f 177 61.3 112.0 f 178 63.5 148.5 f 197 64.8 112.0
   f 146 60.0 109.0 f 145 59.0  91.5 f 147 55.8  75.0 f 145 57.8  84.0
   f 155 61.3 107.0 f 167 62.3  92.5 f 183 64.3 109.5 f 143 55.5  84.0
   f 183 64.5 102.5 f 185 60.0 106.0 f 148 56.3  77.0 f 147 58.3 111.5
   f 154 60.0 114.0 f 156 54.5  75.0 f 144 55.8  73.5 f 154 62.8  93.5
   f 152 60.5 105.0 f 191 63.3 113.5 f 190 66.8 140.0 f 140 60.0  77.0
   f 148 60.5  84.5 f 189 64.3 113.5 f 143 58.3  77.5 f 178 66.5 117.5
   f 164 65.3  98.0 f 157 60.5 112.0 f 147 59.5 101.0 f 148 59.0  95.0
   f 177 61.3  81.0 f 171 61.5  91.0 f 172 64.8 142.0 f 190 56.8  98.5
   f 183 66.5 112.0 f 143 61.5 116.5 f 179 63.0  98.5 f 186 57.0  83.5
   f 182 65.5 133.0 f 182 62.0  91.5 f 142 56.0  72.5 f 165 61.3 106.5
   f 165 55.5  67.0 f 154 61.0 122.5 f 150 54.5  74.0 f 155 66.0 144.5
   f 163 56.5  84.0 f 141 56.0  72.5 f 147 51.5  64.0 f 210 62.0 116.0
   f 171 63.0  84.0 f 167 61.0  93.5 f 182 64.0 111.5 f 144 61.0  92.0
   f 193 59.8 115.0 f 141 61.3  85.0 f 164 63.3 108.0 f 186 63.5 108.0
   f 169 61.5  85.0 f 175 60.3  86.0 f 180 61.3 110.5 m 165 64.8  98.0
   m 157 60.5 105.0 m 144 57.3  76.5 m 150 59.5  84.0 m 150 60.8 128.0
   m 139 60.5  87.0 m 189 67.0 128.0 m 183 64.8 111.0 m 147 50.5  79.0
   m 146 57.5  90.0 m 160 60.5  84.0 m 156 61.8 112.0 m 173 61.3  93.0
   m 151 66.3 117.0 m 141 53.3  84.0 m 150 59.0  99.5 m 164 57.8  95.0
   m 153 60.0  84.0 m 206 68.3 134.0 m 250 67.5 171.5 m 176 63.8  98.5
   m 176 65.0 118.5 m 140 59.5  94.5 m 185 66.0 105.0 m 180 61.8 104.0
   m 146 57.3  83.0 m 183 66.0 105.5 m 140 56.5  84.0 m 151 58.3  86.0
   m 151 61.0  81.0 m 144 62.8  94.0 m 160 59.3  78.5 m 178 67.3 119.5
   m 193 66.3 133.0 m 162 64.5 119.0 m 164 60.5  95.0 m 186 66.0 112.0
   m 143 57.5  75.0 m 175 64.0  92.0 m 175 68.0 112.0 m 175 63.5  98.5
   m 173 69.0 112.5 m 170 63.8 112.5 m 174 66.0 108.0 m 164 63.5 108.0
   m 144 59.5  88.0 m 156 66.3 106.0 m 149 57.0  92.0 m 144 60.0 117.5
   m 147 57.0  84.0 m 188 67.3 112.0 m 169 62.0 100.0 m 172 65.0 112.0
   m 150 59.5  84.0 m 193 67.8 127.5 m 157 58.0  80.5 m 168 60.0  93.5
   m 140 58.5  86.5 m 156 58.3  92.5 m 156 61.5 108.5 m 158 65.0 121.0
   m 184 66.5 112.0 m 156 68.5 114.0 m 144 57.0  84.0 m 176 61.5  81.0
   m 168 66.5 111.5 m 149 52.5  81.0 m 142 55.0  70.0 m 188 71.0 140.0
   m 203 66.5 117.0 m 142 58.8  84.0 m 189 66.3 112.0 m 188 65.8 150.5
   m 200 71.0 147.0 m 152 59.5 105.0 m 174 69.8 119.5 m 166 62.5  84.0
   m 145 56.5  91.0 m 143 57.5 101.0 m 163 65.3 117.5 m 166 67.3 121.0
   m 182 67.0 133.0 m 173 66.0 112.0 m 155 61.8  91.5 m 162 60.0 105.0
   m 177 63.0 111.0 m 177 60.5 112.0 m 175 65.5 114.0 m 166 62.0  91.0
   m 150 59.0  98.0 m 150 61.8 118.0 m 188 63.3 115.5 m 163 66.0 112.0
   m 171 61.8 112.0 m 162 63.0  91.0 m 141 57.5  85.0 m 174 63.0 112.0
   m 142 56.0  87.5 m 148 60.5 118.0 m 140 56.8  83.5 m 160 64.0 116.0
   m 144 60.0  89.0 m 206 69.5 171.5 m 159 63.3 112.0 m 149 56.3  72.0
   m 193 72.0 150.0 m 194 65.3 134.5 m 152 60.8  97.0 m 146 55.0  71.5
   m 139 55.0  73.5 m 186 66.5 112.0 m 161 56.8  75.0 m 153 64.8 128.0
   m 196 64.5  98.0 m 164 58.0  84.0 m 159 62.8  99.0 m 178 63.8 112.0
   m 153 57.8  79.5 m 155 57.3  80.5 m 178 63.5 102.5 m 142 55.0  76.0
   m 164 66.5 112.0 m 189 65.0 114.0 m 164 61.5 140.0 m 167 62.0 107.5
   m 151 59.3  87.0
   ;
   title '----- Data on age, weight, and height of children ------';
   proc reg outest=est1 outsscp=sscp1 rsquare;
      by sex;
      eq1: model weight=height;
      eq2: model weight=height age;
   proc print data=sscp1;
      title2 'SSCP type data set';
   proc print data=est1;
      title2 'EST type data set';
   run;

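BY-group processing expects the observations to be grouped by the BY variable. In the htwt data set all of the f records precede the m records, so no sorting is needed here. For a data set that is not already in that order, a minimal sketch of the usual preparation step, to be run before the PROC REG step, is:

```sas
/* Sketch only: sort by the BY variable before BY-group processing.  */
/* Not required for htwt as entered above, which is already grouped. */
proc sort data=htwt;
   by sex;
run;
```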
Output 61.2.1: Height and Weight Data: Female Children
   ----- Data on age, weight, and height of children ------

   ------------------------------------ sex=f -------------------------------------

   The REG Procedure
   Model: eq1
   Dependent Variable: weight

   Analysis of Variance

                                    Sum of           Mean
   Source               DF         Squares         Square   F Value   Pr > F
   Model                 1           21507          21507    141.09   <.0001
   Error               109           16615      152.42739
   Corrected Total     110           38121

   Root MSE            12.34615    R-Square    0.5642
   Dependent Mean      98.87838    Adj R-Sq    0.5602
   Coeff Var           12.48620

   Parameter Estimates

                       Parameter      Standard
   Variable     DF      Estimate         Error   t Value   Pr > |t|
   Intercept     1    -153.12891      21.24814     -7.21     <.0001
   height        1       4.16361       0.35052     11.88     <.0001

   ----- Data on age, weight, and height of children ------

   ------------------------------------ sex=f -------------------------------------

   The REG Procedure
   Model: eq2
   Dependent Variable: weight

   Analysis of Variance

                                    Sum of           Mean
   Source               DF         Squares         Square   F Value   Pr > F
   Model                 2           22432          11216     77.21   <.0001
   Error               108           15689      145.26700
   Corrected Total     110           38121

   Root MSE            12.05268    R-Square    0.5884
   Dependent Mean      98.87838    Adj R-Sq    0.5808
   Coeff Var           12.18939

   Parameter Estimates

                       Parameter      Standard
   Variable     DF      Estimate         Error   t Value   Pr > |t|
   Intercept     1    -150.59698      20.76730     -7.25     <.0001
   height        1       3.60378       0.40777      8.84     <.0001
   age           1       1.90703       0.75543      2.52     0.0130
 
Output 61.2.2: Height and Weight Data: Male Children
  ----- Data on age, weight, and height of children ------

  ------------------------------------ sex=m -------------------------------------

  The REG Procedure
  Model: eq1
  Dependent Variable: weight

  Analysis of Variance

                                    Sum of           Mean
  Source                   DF      Squares         Square    F Value    Pr > F
  Model                     1        31126          31126     206.24    <.0001
  Error                   124        18714      150.92222
  Corrected Total         125        49840

  Root MSE             12.28504    R-Square     0.6245
  Dependent Mean      103.44841    Adj R-Sq     0.6215
  Coeff Var            11.87552

  Parameter Estimates

                        Parameter       Standard
  Variable     DF        Estimate          Error    t Value    Pr > |t|
  Intercept     1      -125.69807       15.99362      -7.86      <.0001
  height        1         3.68977        0.25693      14.36      <.0001

  ----- Data on age, weight, and height of children ------

  ------------------------------------ sex=m -------------------------------------

  The REG Procedure
  Model: eq2
  Dependent Variable: weight

  Analysis of Variance

                                    Sum of           Mean
  Source                   DF      Squares         Square    F Value    Pr > F
  Model                     2        32975          16487     120.24    <.0001
  Error                   123        16866      137.11922
  Corrected Total         125        49840

  Root MSE             11.70979    R-Square     0.6616
  Dependent Mean      103.44841    Adj R-Sq     0.6561
  Coeff Var            11.31945

  Parameter Estimates

                        Parameter       Standard
  Variable     DF        Estimate          Error    t Value    Pr > |t|
  Intercept     1      -113.71346       15.59021      -7.29      <.0001
  height        1         2.68075        0.36809       7.28      <.0001
  age           1         3.08167        0.83927       3.67      0.0004
 
Output 61.2.3: SSCP Matrix
  ----- Data on age, weight, and height of children ------
  SSCP type data set

  Obs   sex   _TYPE_   _NAME_      Intercept       height        weight          age
    1    f     SSCP    Intercept       111.0      6718.40      10975.50      1824.90
    2    f     SSCP    height         6718.4    407879.32     669469.85    110818.32
    3    f     SSCP    weight        10975.5    669469.85    1123360.75    182444.95
    4    f     SSCP    age            1824.9    110818.32     182444.95     30363.81
    5    f     N                       111.0       111.00        111.00       111.00
    6    m     SSCP    Intercept       126.0      7825.00      13034.50      2072.10
    7    m     SSCP    height         7825.0    488243.60     817919.60    129432.57
    8    m     SSCP    weight        13034.5    817919.60    1398238.75    217717.45
    9    m     SSCP    age            2072.1    129432.57     217717.45     34515.95
   10    m     N                       126.0       126.00        126.00       126.00
 
Output 61.2.4: OUTEST Data Set
  ----- Data on age, weight, and height of children ------
  EST type data set

  Obs  sex  _MODEL_  _TYPE_  _DEPVAR_   _RMSE_  Intercept   height  weight      age  _IN_  _P_  _EDF_    _RSQ_
    1   f    eq1     PARMS   weight    12.3461   -153.129  4.16361      -1   .          1    2    109  0.56416
    2   f    eq2     PARMS   weight    12.0527   -150.597  3.60378      -1  1.90703     2    3    108  0.58845
    3   m    eq1     PARMS   weight    12.2850   -125.698  3.68977      -1   .          1    2    124  0.62451
    4   m    eq2     PARMS   weight    11.7098   -113.713  2.68075      -1  3.08167     2    3    123  0.66161
 

For both females and males, the overall F statistics for both models are significant, indicating that each model explains a significant portion of the variation in the data. For females, the full model is

\[ \widehat{\text{weight}} = -150.60 + 3.60 \times \text{height} + 1.91 \times \text{age} \]

and, for males, the full model is

\[ \widehat{\text{weight}} = -113.71 + 2.68 \times \text{height} + 3.08 \times \text{age} \]

The OUTSSCP= data set is shown in Output 61.2.3. Note how the BY groups are separated. Observations with _TYPE_='N' contain the number of observations in the associated BY group. Observations with _TYPE_='SSCP' contain the rows of the uncorrected sums of squares and crossproducts matrix. The observations with _NAME_='Intercept' contain crossproducts for the intercept.
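As a quick check on how to read the uncorrected SSCP rows, the values for the female BY group in Output 61.2.3 can be used to recover a sample mean and the corrected total sum of squares for weight that appears in Output 61.2.1. The notation below is introduced only for this illustration.

```latex
\begin{align*}
n &= 111, \qquad \sum_i \text{height}_i = 6718.40, \qquad \sum_i \text{weight}_i = 10975.50 \\[4pt]
\overline{\text{height}} &= \frac{6718.40}{111} \approx 60.53 \\[4pt]
\text{Corrected SS(weight)} &= \sum_i \text{weight}_i^2 - \frac{\bigl(\sum_i \text{weight}_i\bigr)^2}{n}
  = 1123360.75 - \frac{10975.50^2}{111} \approx 38121
\end{align*}
```

The last value agrees with the Corrected Total sum of squares reported for the female group.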

The OUTEST= data set is displayed in Output 61.2.4; again, the BY groups are separated. The _MODEL_ column contains the labels for models from the MODEL statements. If no labels were specified, the defaults MODEL1 and MODEL2 would appear as values for _MODEL_. Note that _TYPE_='PARMS' for all observations, indicating that all observations contain parameter estimates. The _DEPVAR_ column displays the dependent variable, and the _RMSE_ column gives the Root Mean Square Error for the associated model. The Intercept column gives the estimate for the intercept for the associated model, and variables with the same name as variables in the original data set (height, age) give parameter estimates for those variables. The dependent variable, weight, is shown with a value of -1. The _IN_ column contains the number of regressors in the model not including the intercept; _P_ contains the number of parameters in the model; _EDF_ contains the error degrees of freedom; and _RSQ_ contains the R-square statistic. Finally, note that the _IN_, _P_, _EDF_, and _RSQ_ columns appear in the OUTEST= data set because the RSQUARE option is specified in the PROC REG statement.
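The summary columns in the OUTEST= data set can be tied back to the printed ANOVA tables. The following sketch does this for model eq2 in the female BY group, using the rounded sums of squares from Output 61.2.1, so the results agree with Output 61.2.4 only up to rounding.

```latex
\begin{align*}
\_\text{EDF}\_ &= n - \_\text{P}\_ = 111 - 3 = 108 \\[4pt]
\_\text{RSQ}\_ &= \frac{\text{SS(Model)}}{\text{SS(Corrected Total)}} = \frac{22432}{38121} \approx 0.5884 \\[4pt]
\_\text{RMSE}\_ &= \sqrt{\text{MSE}} = \sqrt{145.26700} \approx 12.0527
\end{align*}
```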

Example 61.3. Regression with Quantitative and Qualitative Variables

At times it is desirable to have independent variables in the model that are qualitative rather than quantitative. This is easily handled in a regression framework. Regression uses qualitative variables to distinguish between populations. There are two main advantages of fitting both populations in one model. You gain the ability to test for different slopes or intercepts in the populations, and more degrees of freedom are available for the analysis.

Regression with qualitative variables is different from analysis of variance and analysis of covariance. Analysis of variance uses qualitative independent variables only. Analysis of covariance uses quantitative variables in addition to the qualitative variables in order to account for correlation in the data and reduce MSE; however, the quantitative variables are not of primary interest and merely improve the precision of the analysis.

Consider the case where $Y_i$ is the dependent variable, $X1_i$ is a quantitative variable, $X2_i$ is a qualitative variable taking on values 0 or 1, and $X1_i X2_i$ is the interaction. The variable $X2_i$ is called a dummy, binary, or indicator variable. With values 0 or 1, it distinguishes between two populations. The model is of the form

\[ Y_i = \beta_0 + \beta_1 X1_i + \beta_2 X2_i + \beta_3 X1_i X2_i + \epsilon_i \]

for the observations $i = 1, 2, \ldots, n$. The parameters to be estimated are $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$. The number of dummy variables used is one less than the number of qualitative levels. This yields a nonsingular $X'X$ matrix. See Chapter 10 of Neter, Wasserman, and Kutner (1990) for more details.
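To see why one fewer dummy variable than levels is used, consider a sketch with two levels and two hypothetical indicators $D1_i$ and $D2_i$ (names introduced here for illustration), one for each level. Every observation belongs to exactly one level, so the two indicator columns add up to the intercept column:

```latex
\[
D1_i + D2_i = 1 \quad \text{for all } i
\qquad\Longrightarrow\qquad
\mathbf{d}_1 + \mathbf{d}_2 - \mathbf{1} = \mathbf{0}
\]
```

The columns of $X$ are then linearly dependent and $X'X$ is singular. Dropping one indicator, as is done with the single variable $X2_i$ in the model above, removes the dependence.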

An example from Neter, Wasserman, and Kutner (1990) follows. An economist is investigating the relationship between the size of an insurance firm and the speed at which it implements new insurance innovations. He believes that the type of firm may affect this relationship and suspects that there may be some interaction between the size and type of firm. The dummy variable in the model allows the two firm types to have different intercepts. The interaction term allows the firm types to have different slopes as well.

In this study, $Y_i$ is the number of months from the time the first firm implemented the innovation to the time it was implemented by the $i$th firm. The variable $X1_i$ is the size of the firm, measured in total assets of the firm. The variable $X2_i$ denotes the firm type and is 0 if the firm is a mutual fund company and 1 if the firm is a stock company. Together, the dummy variable and the interaction term allow each firm type to have a different intercept and slope.

The previous model can be broken down into a model for each firm type by plugging in the values for $X2_i$. If $X2_i = 0$, the model is

\[ Y_i = \beta_0 + \beta_1 X1_i + \epsilon_i \]

This is the model for a mutual company. If $X2_i = 1$, the model for a stock firm is

\[ Y_i = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) X1_i + \epsilon_i \]

This model has intercept $\beta_0 + \beta_2$ and slope $\beta_1 + \beta_3$.

The data follow. Note that the interaction term is created in the DATA step since polynomial effects such as size*type are not allowed in the MODEL statement in the REG procedure.

  title 'Regression With Quantitative and Qualitative Variables';
  data insurance;
     input time size type @@;
     sizetype=size*type;
     datalines;
  17 151 0   26  92 0   21 175 0   30  31 0   22 104 0
   0 277 0   12 210 0   19 120 0    4 290 0   16 238 0
  28 164 1   15 272 1   11 295 1   38  68 1   31  85 1
  21 224 1   20 166 1   13 305 1   30 124 1   14 246 1
  ;
  run;
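For comparison, a procedure that accepts crossed effects in its MODEL statement can fit the same model without a precomputed product term. The following is a minimal sketch using PROC GLM, not part of the original example. It assumes the insurance data set created above and deliberately omits a CLASS statement so that type stays a numeric 0/1 regressor and size*type is the same product as sizetype.

```sas
/* Sketch only: same model as the PROC REG fit, written with a crossed */
/* effect. With no CLASS statement, type is treated as a numeric 0/1   */
/* regressor, so size*type equals the sizetype variable built in the   */
/* DATA step.                                                          */
proc glm data=insurance;
   model time = size type size*type / solution;
run;
quit;
```

Under these assumptions the parameter estimates should match those from the PROC REG analysis. If type were listed in a CLASS statement, the parameterization and the printed estimates would differ.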

The following statements begin the analysis:

  proc reg data=insurance;
     model time = size type sizetype;
  run;

The ANOVA table is displayed in Output 61.3.1.

Output 61.3.1: ANOVA Table and Parameter Estimates
  Regression With Quantitative and Qualitative Variables

  The REG Procedure
  Model: MODEL1
  Dependent Variable: time

  Analysis of Variance

                                    Sum of           Mean
  Source                   DF      Squares         Square    F Value    Pr > F
  Model                     3   1504.41904      501.47301      45.49    <.0001
  Error                    16    176.38096       11.02381
  Corrected Total          19   1680.80000

  Root MSE              3.32021    R-Square     0.8951
  Dependent Mean       19.40000    Adj R-Sq     0.8754
  Coeff Var            17.11450

  Parameter Estimates

                        Parameter       Standard
  Variable     DF        Estimate          Error    t Value    Pr > |t|
  Intercept     1        33.83837        2.44065      13.86      <.0001
  size          1        -0.10153        0.01305      -7.78      <.0001
  type          1         8.13125        3.65405       2.23      0.0408
  sizetype      1     -0.00041714        0.01833      -0.02      0.9821
 

The overall F statistic is significant (F=45.490, p<0.0001). The interaction term is not significant (t=-0.023, p=0.9821). Hence, this term should be removed and the model re-fitted, as shown in the following statements.

     delete sizetype;
     print;
  run;

The DELETE statement removes the interaction term ( sizetype ) from the model. The new ANOVA table is shown in Output 61.3.2.

Output 61.3.2: ANOVA Table and Parameter Estimates
  Regression With Quantitative and Qualitative Variables

  The REG Procedure
  Model: MODEL1.1
  Dependent Variable: time

  Analysis of Variance

                                    Sum of           Mean
  Source                   DF      Squares         Square    F Value    Pr > F
  Model                     2   1504.41333      752.20667      72.50    <.0001
  Error                    17    176.38667       10.37569
  Corrected Total          19   1680.80000

  Root MSE              3.22113    R-Square     0.8951
  Dependent Mean       19.40000    Adj R-Sq     0.8827
  Coeff Var            16.60377

  Parameter Estimates

                        Parameter       Standard
  Variable     DF        Estimate          Error    t Value    Pr > |t|
  Intercept     1        33.87407        1.81386      18.68      <.0001
  size          1        -0.10174        0.00889     -11.44      <.0001
  type          1         8.05547        1.45911       5.52      <.0001
 

The overall F statistic is still significant (F=72.497, p<0.0001). The intercept and the coefficients associated with size and type are significantly different from zero (t=18.675, p<0.0001; t=-11.443, p<0.0001; t=5.521, p<0.0001, respectively). Notice that the R-square did not change with the omission of the interaction term.
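The two ANOVA tables show why R-square is unchanged to four decimal places. The following sketch compares the model sums of squares from Output 61.3.1 and Output 61.3.2. The extra sum of squares attributable to the interaction term is tiny, and the corresponding one-degree-of-freedom F statistic agrees, up to rounding, with the square of the reported t statistic.

```latex
\begin{align*}
\text{SS(sizetype} \mid \text{size, type)} &= 1504.41904 - 1504.41333 = 0.00571 \\[4pt]
F &= \frac{0.00571 / 1}{11.02381} \approx 0.0005 \approx (-0.023)^2 \\[4pt]
R^2_{\text{full}} &= \frac{1504.41904}{1680.80} \approx 0.8951, \qquad
R^2_{\text{reduced}} = \frac{1504.41333}{1680.80} \approx 0.8951
\end{align*}
```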

The fitted model is

\[ \widehat{\text{time}} = 33.874 - 0.102 \times \text{size} + 8.055 \times \text{type} \]

The fitted model for a mutual fund company ($X2_i = 0$) is

\[ \widehat{\text{time}} = 33.874 - 0.102 \times \text{size} \]

and the fitted model for a stock company ($X2_i = 1$) is

\[ \widehat{\text{time}} = (33.874 + 8.055) - 0.102 \times \text{size} = 41.930 - 0.102 \times \text{size} \]

So the two models have different intercepts but the same slope.
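As a worked illustration, the estimates from Output 61.3.2 can be evaluated at a hypothetical firm size of 200 (a value chosen here only for the example; it is not a data point discussed in the text):

```latex
\begin{align*}
\text{mutual } (\text{type}=0):\quad \widehat{\text{time}} &= 33.87407 - 0.10174 \times 200 \approx 13.5 \\
\text{stock } (\text{type}=1):\quad \widehat{\text{time}} &= 33.87407 + 8.05547 - 0.10174 \times 200 \approx 21.6
\end{align*}
```

Because the slope is common to both firm types, the predicted difference between a stock firm and a mutual firm of the same size is the type coefficient, about 8.06 months, whatever the size.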

Now plot the residual versus predicted values using the firm type as the plot symbol (PLOT=TYPE); this can be useful in determining if the firm types have different residual patterns. PROC REG does not support the plot y*x=type syntax for high-resolution graphics, so use PROC GPLOT to create Output 61.3.3. First, the OUTPUT statement saves the residuals and predicted values from the new model in the OUT= data set.

     output out=out r=r p=p;
  run;

  symbol1 v='0' c=blue   f=swissb;
  symbol2 v='1' c=yellow f=swissb;
  axis1 label=(angle=90);

  proc gplot data=out;
     plot r*p=type    / nolegend vaxis=axis1 cframe=ligr;
     plot p*size=type / nolegend vaxis=axis1 cframe=ligr;
  run;
Output 61.3.3: Plot of Residual vs. Predicted Values
 

The residuals show no major trend. Neither firm type by itself shows a trend either. This indicates that the model is satisfactory.

A plot of the predicted values versus size appears in Output 61.3.4, where the firm type is again used as the plotting symbol.

The different intercepts are very evident in this plot.

Example 61.4. Displaying Plots for Simple Linear Regression

This example introduces the basic PROC REG graphics syntax used to produce a standard plot of data from the aerobic fitness data set (Example 61.1 on page 3924). A simple linear regression of Oxygen on RunTime is performed, and a plot of Oxygen*RunTime is requested. The fitted model, the regression line, and the four default statistics are also displayed in Output 61.4.1.

  data fitness;
     set fitness;
     label Age      ='age(years)'
           Weight   ='weight(kg)'
           Oxygen   ='oxygen uptake(ml/kg/min)'
           RunTime  ='1.5 mile time(min)'
           RestPulse='rest pulse'
           RunPulse ='running pulse'
           MaxPulse ='maximum running pulse';

  proc reg data=fitness;
     model Oxygen=RunTime;
     plot Oxygen*RunTime / cframe=ligr;
  run;
Output 61.4.1: Simple Linear Regression
 

Example 61.5. Creating a $C_p$ Plot

The $C_p$ statistics for model selection are plotted against the number of parameters in the model, and the CHOCKING= and CMALLOWS= options draw useful reference lines. Note the four default statistics in the plot margin, the default model equation, and the default legend in Output 61.5.1.

  title 'Cp Plot with Reference Lines';
  symbol1 c=green;

  proc reg data=fitness;
     model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
           / selection=rsquare noprint;
     plot cp.*np.
          / chocking=red cmallows=blue
            vaxis=0 to 15 by 5 cframe=ligr;
  run;
Output 61.5.1: $C_p$ Plot
 

Using the criteria suggested by Hocking (1976) (see the section Dictionary of PLOT Statement Options beginning on page 3844), Output 61.5.1 indicates that a 6-variable model is a reasonable choice for doing parameter estimation, while a 5-variable model may be suitable for doing prediction.
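For orientation, the quantities behind the plot can be sketched as follows. This assumes the usual definition of Mallows' $C_p$ and the reference-line equations as described in the PLOT statement option dictionary cited above; the symbols $SSE_p$, $s^2$, and $p_{full}$ are notation introduced here, and the dictionary remains the authoritative statement.

```latex
\begin{align*}
C_p &= \frac{SSE_p}{s^2} - (n - 2p) \\[4pt]
\text{CMALLOWS= line:}\quad C_p &= p \\
\text{CHOCKING= lines:}\quad C_p &= p \quad\text{and}\quad C_p = 2p - p_{full} \\[4pt]
\text{Hocking (1976):}\quad & C_p \le p \ \text{(prediction)}, \qquad
C_p \le 2p - p_{full} + 1 \ \text{(parameter estimation)}
\end{align*}
```

Here $SSE_p$ is the error sum of squares of a subset model with $p$ parameters (including the intercept), $s^2$ is the mean squared error of the full model, $n$ is the number of observations, and $p_{full}$ is the number of parameters in the full model excluding the intercept.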

Example 61.6. Controlling Plot Appearance with Graphic Options

This example uses model fit summary statistics from the OUTEST= data set to create a plot for a model selection analysis. Global graphics statements and PLOT statement options are used to control the appearance of the plot.

  goptions ctitle=black   htitle=3.5pct ftitle=swiss
           ctext =magenta htext =3.0pct ftext =swiss
           cback =ligr    border;
  symbol1 v=circle c=red h=1 w=2;
  title1 'Selection=Rsquare';
  title2 'plot Rsquare versus the number of parameters P in each model';

  proc reg data=fitness;
     model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse
           / selection=rsquare noprint;
     plot rsq.*np.
          / aic bic edf gmsep jp np pc sbc sp
            haxis=2 to 7 by 1
            caxis=red cframe=white ctext=blue
            modellab='Full Model' modelht=2.4
            statht=2.4;
  run;

In the GOPTIONS statement,

BORDER

frames the entire display

CBACK=

specifies the background color

CTEXT=

selects the default color for the border and all text, including titles, footnotes, and notes

CTITLE=

specifies the title, footnote, note, and border color

HTEXT=

specifies the height for all text in the display

HTITLE=

specifies the height for the first title line

FTEXT=

selects the default font for all text, including titles, footnotes, notes, the model label and equation, the statistics, the axis labels, the tick values, and the legend

FTITLE=

specifies the first title font

For more information on the GOPTIONS statement and other global graphics statements, refer to SAS/GRAPH Software: Reference .

Output 61.6.1: Controlling Plot Appearance and Plotting OUTEST= Statistics
 

In Output 61.6.1, note the following:

  • The PLOT statement option CTEXT= affects all text not controlled by the CTITLE= option in the GOPTIONS statement. Hence, the GOPTIONS statement option CTEXT=MAGENTA has no effect. Therefore, the color of the title is black and all other text is blue.

  • The area enclosed by the axes and the frame has a white background, while the background outside the plot area is gray.

  • The MODELHT= option allows the entire model equation to fit on one line.

  • The STATHT= option allows the statistics in the margin to fit in one column.

  • The displayed statistics and the fitted model equation refer to the selected model. See the Traditional High-Resolution Graphics Plots section beginning on page 3840 for more information.

Example 61.7. Plotting Model Diagnostic Statistics

This example illustrates how you can display diagnostics for checking the adequacy of a regression model. The following statements plot the studentized deleted residuals against the observation number for the full model. Reference lines on the vertical axis (VREF=) at ±tinv(.95, n-p-1) = ±1.714 are added to identify possible outlying Oxygen values. A reference line on the vertical axis is displayed at zero by default when the RSTUDENT option is specified. The graph is shown in Output 61.7.1. Observations 15 and 17 are indicated as possible outliers.

  title 'Check for Outlying Observations';
  symbol v=dot h=1 c=green;

  proc reg data=fitness;
     model Oxygen=Age Weight RunTime RunPulse RestPulse MaxPulse;
     plot rstudent.*obs.
          / vref= -1.714 1.714 cvref=blue lvref=1
            href= 0 to 30 by 5 chref=red  cframe=ligr;
  run;
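The cutoff used for the VREF= lines can be reproduced with the TINV function in a short DATA step. This is a sketch that assumes n = 31 observations in the fitness data and p = 7 parameters in the full model (six regressors plus the intercept); the variable names n, p, and cutoff are introduced here for illustration.

```sas
/* Sketch only: compute the 0.95 quantile of the t distribution with  */
/* n - p - 1 = 23 degrees of freedom and write it to the log.         */
data _null_;
   n = 31;                           /* assumed number of observations */
   p = 7;                            /* assumed number of parameters   */
   cutoff = tinv(0.95, n - p - 1);
   put cutoff= 8.3;
run;
```

The value written to the log should be close to the 1.714 quoted in the text.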
Output 61.7.1: Plotting Model Diagnostic Statistics
 

Example 61.8. Creating PP and QQ Plots

The following program creates probability-probability plots and quantile-quantile plots of the residuals (Output 61.8.1 and Output 61.8.2, respectively). An annotation data set is created to produce the (0,0)-(1,1) reference line for the PP-plot. Note that the NOSTAT option for the PP-plot suppresses the statistics that would be displayed in the margin.

  data annote1;   length function color ;   retain ysys xsys 


SAS.STAT 9.1 Users Guide (Vol. 6)
Year: 2004
Pages: 127
