The LOGISTIC procedure is similar in use to the other regression procedures in the SAS System. To demonstrate the similarity, suppose the response variable y is binary or ordinal, and x1 and x2 are two explanatory variables of interest. To fit a logistic regression model, you can use a MODEL statement similar to that used in the REG procedure:
proc logistic;
   model y=x1 x2;
run;
The response variable y can be either character or numeric. PROC LOGISTIC enumerates the total number of response categories and orders the response levels according to the response variable option ORDER= in the MODEL statement. The procedure also allows the input of binary response data that are grouped:
proc logistic;
   model r/n=x1 x2;
run;
Here, n represents the number of trials and r represents the number of events.
The following example illustrates the use of PROC LOGISTIC. The data, taken from Cox and Snell (1989, pp. 10–11), consist of the number, r, of ingots not ready for rolling, out of n tested, for a number of combinations of heating time and soaking time. The following invocation of PROC LOGISTIC fits the binary logit model to the grouped data:
data ingots;
   input Heat Soak r n @@;
   datalines;
7 1.0 0 10  14 1.0 0 31  27 1.0 1 56  51 1.0 3 13
7 1.7 0 17  14 1.7 0 43  27 1.7 4 44  51 1.7 0  1
7 2.2 0  7  14 2.2 2 33  27 2.2 0 21  51 2.2 0  1
7 2.8 0 12  14 2.8 0 31  27 2.8 1 22  51 4.0 0  1
7 4.0 0  9  14 4.0 0 19  27 4.0 1 16
;

proc logistic data=ingots;
   model r/n=Heat Soak;
run;
The results of this analysis are shown in the following tables.
PROC LOGISTIC first lists background information in Figure 42.1 about the fitting of the model. Included are the name of the input data set, the response variable(s) used, the number of observations used, and the link function used.
                 The LOGISTIC Procedure

                   Model Information

   Data Set                      WORK.INGOTS
   Response Variable (Events)    r
   Response Variable (Trials)    n
   Model                         binary logit
   Optimization Technique        Fisher's scoring

   Number of Observations Read          19
   Number of Observations Used          19
   Sum of Frequencies Read             387
   Sum of Frequencies Used             387
The Response Profile table (Figure 42.2) lists the response categories (which are Event and Nonevent when grouped data are input), their ordered values, and their total frequencies for the given data.
              Response Profile

   Ordered     Binary           Total
     Value     Outcome      Frequency

         1     Event               12
         2     Nonevent           375

          Model Convergence Status

   Convergence criterion (GCONV=1E-8) satisfied.
The Model Fit Statistics table (Figure 42.3) contains the Akaike Information Criterion (AIC), the Schwarz Criterion (SC), and the negative of twice the log likelihood (-2 Log L) for the intercept-only model and the fitted model. AIC and SC can be used to compare different models, and the ones with smaller values are preferred. Results of the likelihood ratio test and the efficient score test for testing the joint significance of the explanatory variables ( Soak and Heat ) are included in the Testing Global Null Hypothesis: BETA=0 table (Figure 42.3).
              Model Fit Statistics

                                  Intercept
                   Intercept         and
   Criterion         Only        Covariates

   AIC              108.988        101.346
   SC               112.947        113.221
   -2 Log L         106.988         95.346

      Testing Global Null Hypothesis: BETA=0

   Test                 Chi-Square    DF    Pr > ChiSq

   Likelihood Ratio        11.6428     2        0.0030
   Score                   15.1091     2        0.0005
   Wald                    13.0315     2        0.0015
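As a check on the table, the fit statistics can be reproduced by hand from -2 Log L. The sketch below assumes the usual definitions of AIC and SC; the symbols p (number of estimated parameters: 1 for the intercept-only model, 3 for the fitted model) and N (sum of frequencies, 387) are introduced here for illustration only. Small differences in the last digit come from rounding in the displayed values.

```latex
\[
\begin{aligned}
\mathrm{AIC} &= -2\log L + 2p
  &&\Rightarrow\; 106.988 + 2(1) = 108.988, \quad 95.346 + 2(3) = 101.346 \\
\mathrm{SC}  &= -2\log L + p\,\ln N
  &&\Rightarrow\; 106.988 + \ln 387 \approx 112.95, \quad 95.346 + 3\ln 387 \approx 113.22 \\
\chi^2_{\mathrm{LR}} &= (-2\log L)_{\text{intercept only}} - (-2\log L)_{\text{fitted}}
  &&\Rightarrow\; 106.988 - 95.346 \approx 11.64
\end{aligned}
\]
```

Note that AIC favors the fitted model (101.346 versus 108.988), while SC, which penalizes the two extra parameters more heavily, is slightly larger for the fitted model (113.221 versus 112.947).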
The Analysis of Maximum Likelihood Estimates table in Figure 42.4 lists the parameter estimates, their standard errors, and the results of the Wald test for individual parameters. The odds ratio for each effect parameter, estimated by exponentiating the corresponding parameter estimate, is shown in the Odds Ratio Estimates table (Figure 42.4), along with 95% Wald confidence intervals.
          Analysis of Maximum Likelihood Estimates

                             Standard          Wald
   Parameter   DF  Estimate     Error    Chi-Square    Pr > ChiSq

   Intercept    1   -5.5592    1.1197       24.6503        <.0001
   Heat         1    0.0820    0.0237       11.9454        0.0005
   Soak         1    0.0568    0.3312        0.0294        0.8639

                 Odds Ratio Estimates

               Point          95% Wald
   Effect   Estimate      Confidence Limits

   Heat        1.085       1.036       1.137
   Soak        1.058       0.553       2.026
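The odds ratio entries can be reproduced from the estimates and standard errors in the first table. The following worked calculation is a sketch that assumes the 95% Wald limits are formed as the exponentiated estimate plus or minus 1.96 standard errors; the symbols for the estimate and its standard error are introduced here for illustration.

```latex
\[
\begin{aligned}
\text{Heat:}\quad & e^{0.0820} \approx 1.085, &
  e^{0.0820 \,\mp\, 1.96 \times 0.0237} &\approx (1.036,\; 1.137) \\
\text{Soak:}\quad & e^{0.0568} \approx 1.058, &
  e^{0.0568 \,\mp\, 1.96 \times 0.3312} &\approx (0.553,\; 2.026)
\end{aligned}
\]
```

The interval for Soak contains 1, which agrees with its nonsignificant Wald test (p = 0.8639), whereas the interval for Heat lies entirely above 1.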
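Using the parameter estimates, you can calculate the estimated logit of $\pi$ as

$$\operatorname{logit}(\hat{\pi}) = -5.5592 + 0.0820 \times \mathit{Heat} + 0.0568 \times \mathit{Soak}$$

If Heat=7 and Soak=1, then $\operatorname{logit}(\hat{\pi}) = -4.9284$. Using this logit estimate, you can calculate $\hat{\pi}$ as follows:

$$\hat{\pi} = \frac{1}{1 + e^{4.9284}} = 0.0072$$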
This gives the predicted probability of the event (ingot not ready for rolling) for Heat =7 and Soak =1. Note that PROC LOGISTIC can calculate these statistics for you; use the OUTPUT statement with the PREDICTED= option.
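A minimal sketch of that approach follows. It refits the grouped-data model and saves the fitted probabilities; the output data set name pred and the variable name phat are hypothetical names chosen for this illustration.

```sas
/* Sketch: refit the grouped-data model and save predicted probabilities. */
/* The data set name PRED and the variable name PHAT are hypothetical.    */
proc logistic data=ingots;
   model r/n = Heat Soak;
   output out=pred predicted=phat;
run;

/* Each observation in PRED carries its fitted event probability in PHAT. */
proc print data=pred;
   var Heat Soak r n phat;
run;
```

For the observation with Heat=7 and Soak=1, the value of phat should agree with the hand calculation up to rounding of the displayed parameter estimates.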
Finally, the Association of Predicted Probabilities and Observed Responses table (Figure 42.5) contains four measures of association for assessing the predictive ability of a model. They are based on the number of pairs of observations with different response values, the number of concordant pairs, and the number of discordant pairs, which are also displayed. Formulas for these statistics are given in the Rank Correlation of Observed Responses and Predicted Probabilities section on page 2350.
   Association of Predicted Probabilities and Observed Responses

   Percent Concordant     64.4    Somers' D    0.460
   Percent Discordant     18.4    Gamma        0.555
   Percent Tied           17.2    Tau-a        0.028
   Pairs                  4500    c            0.730
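The entries in this table can be checked against one another. The sketch below assumes the standard rank-correlation definitions, with symbols introduced here for illustration: $n_c$ and $n_d$ are the numbers of concordant and discordant pairs, $t$ is the total number of pairs with different responses, and $N$ is the sum of frequencies (387). Because the percentages are displayed to one decimal place, the results match the table only up to rounding.

```latex
\[
\begin{aligned}
t &= 12 \times 375 = 4500 \\
\text{Somers' } D &= \frac{n_c - n_d}{t} = 0.644 - 0.184 = 0.460 \\
\text{Gamma} &= \frac{n_c - n_d}{n_c + n_d} = \frac{0.460}{0.828} \approx 0.556 \\
\text{Tau-}a &= \frac{n_c - n_d}{0.5\,N(N-1)} = \frac{0.460 \times 4500}{0.5 \times 387 \times 386} \approx 0.028 \\
c &= \frac{n_c + 0.5\,(t - n_c - n_d)}{t} = 0.644 + 0.5 \times 0.172 = 0.730
\end{aligned}
\]
```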
To illustrate the use of an alternative form of input data, the following program creates the INGOTS data set with new variables NotReady and Freq instead of n and r. The variable NotReady represents the response of individual units; it has a value of 1 for units not ready for rolling (event) and a value of 0 for units ready for rolling (nonevent). The variable Freq represents the frequency of occurrence of each combination of Heat, Soak, and NotReady. Note that, compared to the previous data set, NotReady=1 implies Freq=r, and NotReady=0 implies Freq=n-r.
data ingots;
   input Heat Soak NotReady Freq @@;
   datalines;
7 1.0 0 10  14 1.0 0 31  14 4.0 0 19  27 2.2 0 21  51 1.0 1  3
7 1.7 0 17  14 1.7 0 43  27 1.0 1  1  27 2.8 1  1  51 1.0 0 10
7 2.2 0  7  14 2.2 1  2  27 1.0 0 55  27 2.8 0 21  51 1.7 0  1
7 2.8 0 12  14 2.2 0 31  27 1.7 1  4  27 4.0 1  1  51 2.2 0  1
7 4.0 0  9  14 2.8 0 31  27 1.7 0 40  27 4.0 0 15  51 4.0 0  1
;
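Rather than retyping the data, the same layout could be derived from the grouped form with a DATA step. The following is a sketch only: the data set names ingots_grouped (assumed to hold Heat, Soak, r, and n from the first example) and ingots_freq are hypothetical names introduced for this illustration.

```sas
/* Sketch: expand grouped data (r events out of n trials) into the       */
/* NotReady/Freq layout. INGOTS_GROUPED and INGOTS_FREQ are hypothetical */
/* names; INGOTS_GROUPED is assumed to hold Heat, Soak, r, and n.        */
data ingots_freq;
   set ingots_grouped;
   keep Heat Soak NotReady Freq;

   /* Events: NotReady=1 occurs r times. */
   NotReady = 1;
   Freq = r;
   if Freq > 0 then output;

   /* Nonevents: NotReady=0 occurs n-r times. */
   NotReady = 0;
   Freq = n - r;
   if Freq > 0 then output;
run;
```

Combinations with a zero count are skipped, which mirrors the listing above, where no record has Freq equal to 0.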
The following SAS statements invoke PROC LOGISTIC to fit the same model using the alternative form of the input data set.
proc logistic data=ingots;
   model NotReady(event='1') = Soak Heat;
   freq Freq;
run;
The results of this analysis are the same as those of the previous one. The displayed output for the two runs is identical except for the background information about the model fit and the Response Profile table shown in Figure 42.6.
            The LOGISTIC Procedure

               Response Profile

   Ordered                      Total
     Value     NotReady     Frequency

         1     0                  375
         2     1                   12

   Probability modeled is NotReady=1.
By default, Ordered Values are assigned to the sorted response values in ascending order, and PROC LOGISTIC models the probability of the response level that corresponds to the Ordered Value 1. There are several methods to change these defaults; the preceding statements specify the response variable option EVENT= to model the probability of NotReady =1 as displayed in Figure 42.6. See the Response Level Ordering section on page 2329 for more details.
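As an illustration of other ways to change the default, the following sketch shows two alternatives to EVENT= that should also model the probability of NotReady=1. It assumes the response variable options DESCENDING and REF= are available in your release; consult the Response Level Ordering section for the authoritative list.

```sas
/* Sketch of two other ways to model NotReady=1 (alternatives to EVENT=). */

/* 1. Reverse the sort order so that NotReady=1 gets Ordered Value 1. */
proc logistic data=ingots;
   model NotReady(descending) = Soak Heat;
   freq Freq;
run;

/* 2. Name NotReady=0 as the reference (nonevent) level. */
proc logistic data=ingots;
   model NotReady(ref='0') = Soak Heat;
   freq Freq;
run;
```

With DESCENDING, the Response Profile table lists NotReady=1 first, so the ordered values differ from those in Figure 42.6 even though the fitted model is the same.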