The LOGISTIC procedure is similar in use to the other regression procedures in the SAS System. To demonstrate the similarity, suppose the response variable y is binary or ordinal, and x1 and x2 are two explanatory variables of interest. To fit a logistic regression model, you can use a MODEL statement similar to that used in the REG procedure:
proc logistic;
   model y = x1 x2;
run;
The response variable y can be either character or numeric. PROC LOGISTIC enumerates the response categories and orders the response levels according to the response variable option ORDER= in the MODEL statement. The procedure also accepts grouped binary response data, specified in events/trials form:
proc logistic;
   model r/n = x1 x2;
run;
Here, n represents the number of trials and r represents the number of events.
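In the binary case, the model that these statements fit can be written in standard notation as follows. Here $\pi$ is the probability of the response level with Ordered Value 1 (the event); for events/trials input, it is the probability that a single trial is an event. The symbols $\alpha$, $\beta_1$, and $\beta_2$ are labels chosen here for illustration. PROC LOGISTIC reports the corresponding estimates under the names Intercept, x1, and x2.

$$\operatorname{logit}(\pi) = \log\frac{\pi}{1-\pi} = \alpha + \beta_1 x_1 + \beta_2 x_2$$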
The following example illustrates the use of PROC LOGISTIC. The data, taken from Cox and Snell (1989, pp. 10–11), give the number r of ingots not ready for rolling, out of n tested, for several combinations of heating time and soaking time. The following invocation of PROC LOGISTIC fits the binary logit model to the grouped data:
data ingots;
   input Heat Soak r n @@;
   datalines;
7 1.0 0 10  14 1.0 0 31  27 1.0 1 56  51 1.0 3 13
7 1.7 0 17  14 1.7 0 43  27 1.7 4 44  51 1.7 0  1
7 2.2 0  7  14 2.2 2 33  27 2.2 0 21  51 2.2 0  1
7 2.8 0 12  14 2.8 0 31  27 2.8 1 22  51 4.0 0  1
7 4.0 0  9  14 4.0 0 19  27 4.0 1 16
;

proc logistic data=ingots;
   model r/n = Heat Soak;
run;
The results of this analysis are shown in the following tables.
PROC LOGISTIC first displays background information about the model fit (Figure 42.1). This includes the name of the input data set, the response variables, the number of observations used, and the link function.
The LOGISTIC Procedure

            Model Information
Data Set                      WORK.INGOTS
Response Variable (Events)    r
Response Variable (Trials)    n
Model                         binary logit
Optimization Technique        Fisher's scoring

Number of Observations Read      19
Number of Observations Used      19
Sum of Frequencies Read         387
Sum of Frequencies Used         387
The Response Profile table (Figure 42.2) lists the response categories (which are Event and Nonevent when grouped data are input), their ordered values, and their total frequencies for the given data.
           Response Profile
 Ordered     Binary            Total
   Value     Outcome       Frequency
       1     Event                12
       2     Nonevent            375

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.
The Model Fit Statistics table (Figure 42.3) contains the Akaike Information Criterion (AIC), the Schwarz Criterion (SC), and the negative of twice the log likelihood (-2 Log L) for both the intercept-only model and the fitted model. You can use AIC and SC to compare different models. Models with smaller values are preferred. The Testing Global Null Hypothesis: BETA=0 table (Figure 42.3) gives the results of the likelihood ratio, efficient score, and Wald tests of the joint significance of the explanatory variables (Soak and Heat).
            Model Fit Statistics
                              Intercept
               Intercept            and
Criterion           Only     Covariates
AIC              108.988        101.346
SC               112.947        113.221
-2 Log L         106.988         95.346

        Testing Global Null Hypothesis: BETA=0
Test                 Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        11.6428     2        0.0030
Score                   15.1091     2        0.0005
Wald                    13.0315     2        0.0015
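As a worked check of Figure 42.3, the criteria can be reproduced from the usual definitions. Here k is the number of estimated parameters (1 for the intercept-only model, 3 with Heat and Soak) and N = 387 is the sum of frequencies used. This assumes that the procedure uses the textbook forms of AIC and SC. The arithmetic agrees with the displayed values up to rounding of -2 Log L. The likelihood ratio statistic is the drop in -2 Log L, with DF equal to the number of added parameters.

$$\mathrm{AIC} = -2\log L + 2k, \qquad \mathrm{SC} = -2\log L + k\log N$$

$$\text{Intercept only: } 106.988 + 2(1) = 108.988, \qquad 106.988 + \log 387 \approx 112.947$$

$$\text{Intercept and covariates: } 95.346 + 2(3) = 101.346, \qquad 95.346 + 3\log 387 \approx 113.221$$

$$\chi^2_{\mathrm{LR}} = 106.988 - 95.346 \approx 11.64 \quad \text{on } 3 - 1 = 2 \text{ DF}$$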
The Analysis of Maximum Likelihood Estimates table in Figure 42.4 lists the parameter estimates, their standard errors, and the results of the Wald test for each individual parameter. The Odds Ratio Estimates table (Figure 42.4) shows the odds ratio for each effect, estimated by exponentiating the corresponding parameter estimate, together with 95% Wald confidence limits.
          Analysis of Maximum Likelihood Estimates
                             Standard          Wald
Parameter   DF   Estimate       Error    Chi-Square    Pr > ChiSq
Intercept    1    -5.5592      1.1197       24.6503        <.0001
Heat         1     0.0820      0.0237       11.9454        0.0005
Soak         1     0.0568      0.3312        0.0294        0.8639

            Odds Ratio Estimates
             Point          95% Wald
Effect    Estimate      Confidence Limits
Heat         1.085       1.036       1.137
Soak         1.058       0.553       2.026
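The entries in Figure 42.4 are connected by simple arithmetic, shown here for Heat. This assumes the standard Wald statistic and a Wald interval on the logit scale with z = 1.96 for 95% coverage. The squared ratio of the estimate to its standard error gives the Wald chi-square. Exponentiating the estimate and the interval endpoints gives the odds ratio and its limits. Small differences from the displayed values come from rounding the estimates to four decimals.

$$\chi^2_{\mathrm{Wald}} = \left(\frac{0.0820}{0.0237}\right)^2 \approx 11.97 \quad (\text{displayed: } 11.9454)$$

$$\widehat{\mathrm{OR}}_{\text{Heat}} = e^{0.0820} \approx 1.085$$

$$\left(e^{0.0820 - 1.96 \times 0.0237},\; e^{0.0820 + 1.96 \times 0.0237}\right) = \left(e^{0.0355},\; e^{0.1285}\right) \approx (1.036,\; 1.137)$$

With Soak held fixed, each one-unit increase in Heat therefore multiplies the estimated odds of an ingot not being ready by about 1.085.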
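Using the parameter estimates, you can calculate the estimated logit of $\pi$ as

$$\operatorname{logit}(\hat{\pi}) = -5.5592 + 0.0820 \times \text{Heat} + 0.0568 \times \text{Soak}$$

If Heat = 7 and Soak = 1, then $\operatorname{logit}(\hat{\pi}) = -4.9284$. Using this logit estimate, you can calculate $\hat{\pi}$ as follows:

$$\hat{\pi} = \frac{1}{1 + e^{4.9284}} = 0.0072$$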
This gives the predicted probability of the event (ingot not ready for rolling) for Heat =7 and Soak =1. Note that PROC LOGISTIC can calculate these statistics for you; use the OUTPUT statement with the PREDICTED= option.
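Here is a minimal sketch of that approach. It assumes that the grouped INGOTS data set (variables Heat, Soak, r, n) from the first DATA step is still in the WORK library. The output data set name IngotPred and the variable name Phat are hypothetical names chosen for illustration.

```sas
/* Refit the grouped (events/trials) model and save predicted        */
/* event probabilities. Assumes WORK.INGOTS is the r/n version.      */
/* IngotPred and Phat are illustrative names, not required by SAS.   */
proc logistic data=ingots;
   model r/n = Heat Soak;
   output out=IngotPred predicted=Phat;   /* Phat: estimated P(not ready) */
run;

/* Show the saved prediction for Heat=7, Soak=1 (about 0.0072). */
proc print data=IngotPred;
   where Heat = 7 and Soak = 1;
   var Heat Soak r n Phat;
run;
```

Each input observation receives its own predicted probability. This is usually more convenient than evaluating the logit by hand for every covariate combination.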
Finally, the Association of Predicted Probabilities and Observed Responses table (Figure 42.5) contains four measures of association for assessing the predictive ability of a model. They are based on the number of pairs of observations with different response values, the number of concordant pairs, and the number of discordant pairs, which are also displayed. Formulas for these statistics are given in the Rank Correlation of Observed Responses and Predicted Probabilities section on page 2350.
  Association of Predicted Probabilities and Observed Responses
Percent Concordant     64.4     Somers' D    0.460
Percent Discordant     18.4     Gamma        0.555
Percent Tied           17.2     Tau-a        0.028
Pairs                  4500     c            0.730
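The four measures can be reproduced approximately from the displayed percentages, assuming the usual rank-correlation definitions. In the formulas, t is the number of pairs with different responses, $n_c$ and $n_d$ are the numbers of concordant and discordant pairs, and N = 387 is the sum of frequencies. Because the percentages are rounded, Gamma comes out as 0.556 here, against 0.555 in the table, which is computed from the exact pair counts.

$$t = 12 \times 375 = 4500$$

$$c = \frac{n_c + 0.5\,(t - n_c - n_d)}{t} = 0.644 + 0.5 \times 0.172 = 0.730$$

$$\text{Somers' } D = \frac{n_c - n_d}{t} = 0.644 - 0.184 = 0.460$$

$$\text{Gamma} = \frac{n_c - n_d}{n_c + n_d} \approx \frac{0.460}{0.828} \approx 0.556$$

$$\tau_a = \frac{n_c - n_d}{0.5\,N(N-1)} = \frac{0.460 \times 4500}{0.5 \times 387 \times 386} \approx 0.028$$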
To illustrate an alternative form of input data, the following program creates the INGOTS data set with the new variables NotReady and Freq instead of n and r. The variable NotReady represents the response of individual units. It has the value 1 for units not ready for rolling (event) and the value 0 for units ready for rolling (nonevent). The variable Freq gives the frequency of occurrence of each combination of Heat, Soak, and NotReady. Compared with the previous data set, rows with NotReady=1 have Freq = r, and rows with NotReady=0 have Freq = n - r.
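Instead of typing the expanded data, you can derive them from the grouped data set. The following is a sketch that assumes the grouped INGOTS data set (Heat, Soak, r, n) is still in the WORK library, so it must run before the DATA step that replaces INGOTS. The name IngotsLong is hypothetical.

```sas
/* Expand the grouped (events/trials) INGOTS data set into one row  */
/* per response level, weighted by a frequency variable.            */
/* IngotsLong is an illustrative name; run this before INGOTS is    */
/* overwritten by the NotReady/Freq version.                        */
data IngotsLong;
   set ingots;                         /* grouped form: Heat Soak r n */
   NotReady = 1; Freq = r;             /* units not ready (events)    */
   if Freq > 0 then output;            /* skip zero-frequency rows    */
   NotReady = 0; Freq = n - r;         /* units ready (nonevents)     */
   if Freq > 0 then output;
   keep Heat Soak NotReady Freq;
run;
```

Dropping the zero-frequency rows leaves the same 25 records, 387 units in total with 12 events, that are typed in by the following DATA step, although possibly in a different order.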
data ingots;
   input Heat Soak NotReady Freq @@;
   datalines;
7 1.0 0 10  14 1.0 0 31  14 4.0 0 19  27 2.2 0 21  51 1.0 1  3
7 1.7 0 17  14 1.7 0 43  27 1.0 1  1  27 2.8 1  1  51 1.0 0 10
7 2.2 0  7  14 2.2 1  2  27 1.0 0 55  27 2.8 0 21  51 1.7 0  1
7 2.8 0 12  14 2.2 0 31  27 1.7 1  4  27 4.0 1  1  51 2.2 0  1
7 4.0 0  9  14 2.8 0 31  27 1.7 0 40  27 4.0 0 15  51 4.0 0  1
;
The following SAS statements invoke PROC LOGISTIC to fit the same model using the alternative form of the input data set.
proc logistic data=ingots;
   model NotReady(event='1') = Soak Heat;
   freq Freq;
run;
The results of this analysis are the same as those of the previous one. The displayed output of the two runs is identical except for the background information about the model fit and the Response Profile table, which is shown in Figure 42.6.
The LOGISTIC Procedure

           Response Profile
 Ordered                      Total
   Value     NotReady     Frequency
       1     0                  375
       2     1                   12

Probability modeled is NotReady=1.
By default, Ordered Values are assigned to the sorted response values in ascending order, and PROC LOGISTIC models the probability of the response level that corresponds to Ordered Value 1. You can change these defaults in several ways. The preceding statements use the response variable option EVENT= to model the probability of NotReady=1, as displayed in Figure 42.6. See the Response Level Ordering section on page 2329 for more details.