In this example, Dose is a variable representing the level of a stimulus, N represents the number of subjects tested at each level of the stimulus, and Response is the number of subjects responding to that level of the stimulus. Both probit and logit response models are fit to the data. The LOG10 option in the PROC PROBIT statement requests that the log base 10 of Dose be used as the independent variable. Specifically, for a given level of Dose, the probability p of a positive response is modeled as

$$p = \Pr(\text{Response}) = F\bigl(b_0 + b_1 \log_{10}(\text{Dose})\bigr)$$
The probabilities are estimated first using the normal distribution function (the default) and then using the logistic distribution function. Note that, in this model specification, the natural rate is assumed to be zero.
The LACKFIT option specifies lack-of-fit tests and the INVERSECL option specifies inverse confidence limits.
In the DATA step that reads the data, a number of observations are generated that have a missing value for the response. Although the PROBIT procedure does not use the observations with missing values to fit the model, it does give predicted values for all nonmissing sets of independent variables. These data points fill in the plot of fitted and observed values in the logistic model displayed in Output 60.1.2. The plot, requested with the PREDPPLOT statement, displays the estimated logistic cumulative distribution function and the observed response rates. The VAR=DOSE option specifies the horizontal axis variable in the plot.
The following statements produce Output 60.1.1:
data a;
   infile cards eof=eof;
   input Dose N Response;
   Observed=Response/N;
   output;
   return;
eof: do Dose=0.5 to 7.5 by 0.25;
      output;
   end;
   datalines;
1 10 1
2 12 2
3 10 4
4 10 5
5 12 8
6 10 8
7 10 10
;

proc probit log10;
   model Response/N=Dose / lackfit inversecl itprint;
   output out=B p=Prob std=std xbeta=xbeta;
   title 'Output from Probit Procedure';
run;

symbol v=dot c=white;

proc probit log10;
   model Response/N=Dose / d=logistic inversecl;
   predpplot var=dose cfit=blue cframe=ligr inborder;
   output out=B p=Prob std=std xbeta=xbeta;
   title 'Plot of Observed and Fitted Probabilities';
run;

Output from Probit Procedure

Probit Procedure

Iteration History for Parameter Estimates

Iter   Ridge   Loglikelihood      Intercept    Log10(Dose)
   0       0      -51.292891              0              0
   1       0      -37.881166   -1.355817008    2.635206083
   2       0      -37.286169   -1.764939171   3.3408954936
   3       0      -37.280389   -1.812147863   3.4172391614
   4       0      -37.280388   -1.812704962    3.418117919
   5       0      -37.280388   -1.812704962    3.418117919
Model Information

Data Set                  WORK.B
Events Variable           Response
Trials Variable           N
Number of Observations    7
Number of Events          38
Number of Trials          74
Missing Values            29
Name of Distribution      Normal
Log Likelihood            -37.28038802

Last Evaluation of the Negative of the Gradient

   Intercept    Log10(Dose)
 3.434907E-7     2.09809E-8

Last Evaluation of the Negative of the Hessian

                  Intercept     Log10(Dose)
Intercept      36.005280383    20.152675982
Log10(Dose)    20.152675982    13.078826305

Goodness-of-Fit Tests

Statistic              Value    DF    Pr > ChiSq
Pearson Chi-Square    3.6497     5        0.6009
L.R. Chi-Square       4.6381     5        0.4616

Response-Covariate Profile

Response Levels               2
Number of Covariate Values    7
The p-values in the Goodness-of-Fit table of 0.6009 for the Pearson chi-square and 0.4616 for the likelihood ratio chi-square indicate an adequate fit for the model fit with the normal distribution.
Analysis of Parameter Estimates

                             Standard     95% Confidence       Chi-
Parameter      DF  Estimate     Error         Limits         Square   Pr > ChiSq
Intercept       1   -1.8127    0.4493   -2.6934   -0.9320     16.27       <.0001
Log10(Dose)     1    3.4181    0.7455    1.9569    4.8794     21.02       <.0001

Probit Model in Terms of Tolerance Distribution

        MU         SIGMA
0.53032254    0.29255866

Estimated Covariance Matrix for Tolerance Parameters

               MU       SIGMA
MU       0.002418    0.000409
SIGMA    0.000409    0.004072
Tolerance distribution parameter estimates for the normal distribution indicate a mean tolerance for the population of 0.5303.
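The tolerance parameters follow directly from the regression estimates: for a model of the form F(b0 + b1 x) with F normal, MU = -b0/b1 and SIGMA = 1/b1. The Python sketch below is a side calculation outside SAS, with illustrative variable names. It assumes the rounded estimates from the Analysis of Parameter Estimates table and takes the intercept as -1.8127, the sign implied by its confidence limits and by the reported value of MU.

```python
# Cross-check of the tolerance-distribution parameters reported by PROC PROBIT.
# Assumption: the fitted model is p = Phi(b0 + b1 * log10(Dose)), using the
# rounded estimates from the Analysis of Parameter Estimates table.
b0 = -1.8127   # intercept (sign inferred from the confidence limits)
b1 = 3.4181    # slope for Log10(Dose)

mu = -b0 / b1      # mean of the tolerance distribution on the log10 scale
sigma = 1.0 / b1   # standard deviation of the tolerance distribution

print(f"MU = {mu:.4f}, SIGMA = {sigma:.4f}")
```

Up to rounding of the inputs, the two values agree with the MU and SIGMA entries in the tolerance distribution table.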
Probit Analysis on Log10(Dose)

Probability    Log10(Dose)      95% Fiducial Limits
       0.01       -0.15027     -0.69518     0.07710
       0.02       -0.07052     -0.55766     0.13475
       0.03       -0.01992     -0.47064     0.17156
       0.04        0.01814     -0.40534     0.19941
       0.05        0.04911     -0.35233     0.22218
       0.06        0.07546     -0.30731     0.24165
       0.07        0.09857     -0.26793     0.25881
       0.08        0.11926     -0.23273     0.27425
       0.09        0.13807     -0.20080     0.28837
       0.10        0.15539     -0.17147     0.30142
       0.15        0.22710     -0.05086     0.35631
       0.20        0.28410      0.04369     0.40124
       0.25        0.33299      0.12343     0.44116
       0.30        0.37690      0.19348     0.47857
       0.35        0.41759      0.25658     0.51504
       0.40        0.45620      0.31429     0.55182
       0.45        0.49356      0.36754     0.58999
       0.50        0.53032      0.41693     0.63057
       0.55        0.56709      0.46296     0.67451
       0.60        0.60444      0.50618     0.72271
       0.65        0.64305      0.54734     0.77603
       0.70        0.68374      0.58745     0.83550
       0.75        0.72765      0.62776     0.90265
       0.80        0.77655      0.66999     0.98008
       0.85        0.83354      0.71675     1.07279
       0.90        0.90525      0.77313     1.19191
       0.91        0.92257      0.78646     1.22098
       0.92        0.94139      0.80083     1.25265
       0.93        0.96208      0.81653     1.28759
       0.94        0.98519      0.83394     1.32672
       0.95        1.01154      0.85367     1.37149
       0.96        1.04250      0.87669     1.42424
       0.97        1.08056      0.90480     1.48928
       0.98        1.13116      0.94189     1.57602
       0.99        1.21092      0.99987     1.71321
The LD50 (the ED50 for log dose) is 0.5303, the log dose corresponding to a probability of 0.5. This is the same as the mean tolerance for the normal distribution.
Probit Analysis on Dose

Probability        Dose      95% Fiducial Limits
       0.01     0.70750     0.20175      1.19427
       0.02     0.85012     0.27691      1.36380
       0.03     0.95517     0.33834      1.48444
       0.04     1.04266     0.39324      1.58274
       0.05     1.11971     0.44429      1.66793
       0.06     1.18976     0.49282      1.74443
       0.07     1.25478     0.53960      1.81473
       0.08     1.31600     0.58515      1.88042
       0.09     1.37427     0.62980      1.94252
       0.10     1.43019     0.67380      2.00181
       0.15     1.68696     0.88950      2.27147
       0.20     1.92353     1.10584      2.51906
       0.25     2.15276     1.32870      2.76161
       0.30     2.38180     1.56128      3.01000
       0.35     2.61573     1.80543      3.27374
       0.40     2.85893     2.06200      3.56306
       0.45     3.11573     2.33098      3.89038
       0.50     3.39096     2.61175      4.27138
       0.55     3.69051     2.90374      4.72619
       0.60     4.02199     3.20759      5.28090
       0.65     4.39594     3.52651      5.97077
       0.70     4.82770     3.86765      6.84706
       0.75     5.34134     4.24385      7.99189
       0.80     5.97787     4.67724      9.55169
       0.85     6.81617     5.20900     11.82480
       0.90     8.03992     5.93105     15.55653
       0.91     8.36704     6.11584     16.63320
       0.92     8.73752     6.32165     17.89163
       0.93     9.16385     6.55431     19.39034
       0.94     9.66463     6.82245     21.21881
       0.95    10.26925     7.13949     23.52275
       0.96    11.02811     7.52816     26.56066
       0.97    12.03830     8.03149     30.85201
       0.98    13.52585     8.74763     37.67206
       0.99    16.25233     9.99709     51.66627
The ED50 for dose is 3.39 with a 95% confidence interval of (2.61, 4.27).
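Because the model is fit on the log10 scale, the dose-scale ED50 and its limits are the base-10 antilogs of the corresponding entries in the Log10(Dose) table. A minimal Python sketch of that back-transformation (a side calculation outside SAS; the variable names are illustrative):

```python
# Back-transform the log10-scale ED50 and its 95% fiducial limits to the dose scale.
# Inputs are the Probability = 0.50 row of the "Probit Analysis on Log10(Dose)" table.
log_ed50, log_lower, log_upper = 0.53032, 0.41693, 0.63057

ed50 = 10 ** log_ed50
lower = 10 ** log_lower
upper = 10 ** log_upper

print(f"ED50 = {ed50:.2f}, 95% limits = ({lower:.2f}, {upper:.2f})")
```

The results match the Probability = 0.50 row of the Probit Analysis on Dose table to the precision shown.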
Plot of Observed and Fitted Probabilities

Probit Procedure

Model Information

Data Set                  WORK.A
Events Variable           Response
Trials Variable           N
Number of Observations    7
Number of Events          38
Number of Trials          74
Missing Values            29
Name of Distribution      Logistic
Log Likelihood            -37.11065336

Algorithm converged.

Analysis of Parameter Estimates

                             Standard     95% Confidence       Chi-
Parameter      DF  Estimate     Error         Limits         Square   Pr > ChiSq
Intercept       1   -3.2246    0.8861   -4.9613   -1.4880     13.24       0.0003
Log10(Dose)     1    5.9702    1.4492    3.1299    8.8105     16.97       <.0001
The regression parameter estimates for the logistic model of -3.22 and 5.97 are approximately $\pi/\sqrt{3}$ times as large as those for the normal model.
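The scaling arises because a standard logistic distribution has standard deviation pi/sqrt(3), about 1.81, whereas the standard normal has standard deviation 1. The Python sketch below (a side calculation outside SAS) compares the two sets of rounded estimates; the negative intercepts are taken from the signs implied by the confidence limits in the parameter tables.

```python
import math

# Standard deviation of the standard logistic distribution.
scale = math.pi / math.sqrt(3)

# Rounded estimates from the two Analysis of Parameter Estimates tables.
probit = {"Intercept": -1.8127, "Log10(Dose)": 3.4181}
logit = {"Intercept": -3.2246, "Log10(Dose)": 5.9702}

# Ratio of each logistic estimate to the corresponding normal estimate.
ratios = {name: logit[name] / probit[name] for name in probit}
for name, ratio in ratios.items():
    print(f"{name}: logistic/normal = {ratio:.3f} (pi/sqrt(3) = {scale:.3f})")
```

The ratios are close to, but not exactly, pi/sqrt(3), which is why the text describes the relationship as approximate.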
Probit Analysis on Log10(Dose)

Probability    Log10(Dose)      95% Fiducial Limits
       0.01       -0.22955     -0.97441     0.04234
       0.02       -0.11175     -0.75158     0.12404
       0.03       -0.04212     -0.62018     0.17265
       0.04        0.00780     -0.52618     0.20771
       0.05        0.04693     -0.45265     0.23533
       0.06        0.07925     -0.39205     0.25826
       0.07        0.10686     -0.34037     0.27796
       0.08        0.13103     -0.29521     0.29530
       0.09        0.15259     -0.25502     0.31085
       0.10        0.17209     -0.21875     0.32498
       0.15        0.24958     -0.07552     0.38207
       0.20        0.30792      0.03092     0.42645
       0.25        0.35611      0.11742     0.46451
       0.30        0.39820      0.19143     0.49932
       0.35        0.43644      0.25684     0.53275
       0.40        0.47221      0.31588     0.56619
       0.45        0.50651      0.36986     0.60089
       0.50        0.54013      0.41957     0.63807
       0.55        0.57374      0.46559     0.67894
       0.60        0.60804      0.50846     0.72474
       0.65        0.64381      0.54896     0.77673
       0.70        0.68205      0.58815     0.83637
       0.75        0.72414      0.62752     0.90582
       0.80        0.77233      0.66915     0.98876
       0.85        0.83067      0.71631     1.09242
       0.90        0.90816      0.77562     1.23343
       0.91        0.92766      0.79014     1.26931
       0.92        0.94922      0.80607     1.30912
       0.93        0.97339      0.82378     1.35391
       0.94        1.00100      0.84384     1.40523
       0.95        1.03332      0.86713     1.46546
       0.96        1.07245      0.89511     1.53864
       0.97        1.12237      0.93053     1.63228
       0.98        1.19200      0.97952     1.76329
       0.99        1.30980      1.06166     1.98569
Probit Analysis on Dose

Probability        Dose      95% Fiducial Limits
       0.01     0.58945     0.10607      1.10241
       0.02     0.77312     0.17718      1.33058
       0.03     0.90757     0.23978      1.48817
       0.04     1.01813     0.29773      1.61327
       0.05     1.11413     0.35266      1.71922
       0.06     1.20018     0.40546      1.81244
       0.07     1.27896     0.45670      1.89654
       0.08     1.35218     0.50675      1.97379
       0.09     1.42100     0.55588      2.04572
       0.10     1.48625     0.60430      2.11339
       0.15     1.77656     0.84038      2.41030
       0.20     2.03199     1.07379      2.66961
       0.25     2.27043     1.31046      2.91416
       0.30     2.50152     1.55393      3.15736
       0.35     2.73172     1.80652      3.40996
       0.40     2.96627     2.06957      3.68292
       0.45     3.21006     2.34345      3.98927
       0.50     3.46837     2.62768      4.34578
       0.55     3.74746     2.92138      4.77466
       0.60     4.05546     3.22451      5.30573
       0.65     4.40366     3.53961      5.98041
       0.70     4.80891     3.87391      6.86079
       0.75     5.29836     4.24155      8.05044
       0.80     5.92009     4.66820      9.74455
       0.85     6.77126     5.20365     12.37149
       0.90     8.09391     5.96508     17.11715
       0.91     8.46559     6.16800     18.59129
       0.92     8.89644     6.39837     20.37592
       0.93     9.40575     6.66469     22.58957
       0.94    10.02317     6.97977     25.42292
       0.95    10.79732     7.36428     29.20549
       0.96    11.81534     7.85438     34.56521
       0.97    13.25466     8.52173     42.88232
       0.98    15.55972     9.53941     57.98207
       0.99    20.40815    11.52549     96.75820
Both the ED50 and the LD50 are similar to those for the normal model.
The PREDPPLOT statement creates the plot of observed and fitted probabilities in Output 60.1.2. The dashed lines represent pointwise confidence bands for the probabilities.
In this example, two preparations, a standard preparation and a test preparation, are each given at several dose levels to groups of insects. The symptoms are recorded for each insect within each group, and two multilevel probit models are fit. Because the natural sort order of the three levels is not the same as the response order, the ORDER=DATA option is specified in the PROC PROBIT statement to get the desired order.
The following statements produce Output 60.2.1:
data multi;
   input Prep $ Dose Symptoms $ N;
   LDose=log10(Dose);
   if Prep='test' then PrepDose=LDose;
   else PrepDose=0;
   datalines;
stand 10 None 33
stand 10 Mild 7
stand 10 Severe 10
stand 20 None 17
stand 20 Mild 13
stand 20 Severe 17
stand 30 None 14
stand 30 Mild 3
stand 30 Severe 28
stand 40 None 9
stand 40 Mild 8
stand 40 Severe 32
test 10 None 44
test 10 Mild 6
test 10 Severe 0
test 20 None 32
test 20 Mild 10
test 20 Severe 12
test 30 None 23
test 30 Mild 7
test 30 Severe 21
test 40 None 16
test 40 Mild 6
test 40 Severe 19
;

proc probit order=data;
   class Prep Symptoms;
   nonpara: model Symptoms=Prep LDose PrepDose / lackfit;
   weight N;
   title 'Probit Models for Symptom Severity';
run;

proc probit order=data;
   class Prep Symptoms;
   parallel: model Symptoms=Prep LDose / lackfit;
   weight N;
   title 'Probit Models for Symptom Severity';
run;

Probit Models for Symptom Severity

Probit Procedure

Class Level Information

Name        Levels    Values
Prep             2    stand test
Symptoms         3    None Mild Severe
The first model allows for nonparallelism between the dose response curves for the two preparations by inclusion of an interaction between Prep and LDose. The interaction term is labeled PrepDose in the Analysis of Parameter Estimates table. The results of this first model indicate that the parameter for the interaction term is not significant, having a Wald chi-square of 0.73. Also, since the first model is a generalization of the second, a likelihood ratio test statistic for this same parameter can be obtained by multiplying the difference in log likelihoods between the two models by 2. The value obtained, 2 × (-345.94 - (-346.31)), is 0.73. This is in close agreement with the Wald chi-square from the first model. The lack-of-fit test statistics for the two models do not indicate a problem with either fit.
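The likelihood ratio computation can be reproduced from the unrounded log likelihoods in the two Model Information tables. The Python sketch below is a side calculation outside SAS; it also attaches a p-value, using the identity that the upper tail of a chi-square distribution with 1 degree of freedom is erfc(sqrt(x/2)). The p-value is not part of the original output and is shown only for comparison with the Wald test.

```python
import math

# Log likelihoods from the two Model Information tables (both are negative).
loglik_nonparallel = -345.9401767   # first model, includes PrepDose
loglik_parallel = -346.306141       # second model, omits PrepDose

# Likelihood ratio statistic for the single PrepDose parameter.
lr_stat = 2 * (loglik_nonparallel - loglik_parallel)

# Upper-tail probability of a chi-square variate with 1 degree of freedom.
p_value = math.erfc(math.sqrt(lr_stat / 2))

print(f"LR statistic = {lr_stat:.2f}, p-value = {p_value:.4f}")
```

The statistic rounds to 0.73, and the p-value is close to the Wald p-value of 0.3935 reported for PrepDose.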
Model Information

Data Set                  WORK.MULTI
Dependent Variable        Symptoms
Weight Variable           N
Number of Observations    23
Missing Values            1
Name of Distribution      Normal
Log Likelihood            -345.9401767

Analysis of Parameter Estimates

                            Standard     95% Confidence       Chi-
Parameter     DF  Estimate     Error         Limits         Square   Pr > ChiSq
Intercept      1    3.8080    0.6252    2.5827    5.0333     37.10       <.0001
Intercept2     1    0.4684    0.0559    0.3589    0.5780     70.19       <.0001
Prep stand     1   -1.2573    0.8190   -2.8624    0.3479      2.36       0.1247
Prep test      0    0.0000    0.0000    0.0000    0.0000       .          .
LDose          1   -2.1512    0.3909   -2.9173   -1.3851     30.29       <.0001
PrepDose       1   -0.5072    0.5945   -1.6724    0.6580      0.73       0.3935
Probit Models for Symptom Severity

Probit Procedure

Class Level Information

Name        Levels    Values
Prep             2    stand test
Symptoms         3    None Mild Severe

Model Information

Data Set                  WORK.MULTI
Dependent Variable        Symptoms
Weight Variable           N
Number of Observations    23
Missing Values            1
Name of Distribution      Normal
Log Likelihood            -346.306141

Analysis of Parameter Estimates

                            Standard     95% Confidence       Chi-
Parameter     DF  Estimate     Error         Limits         Square   Pr > ChiSq
Intercept      1    3.4148    0.4126    2.6061    4.2235     68.50       <.0001
Intercept2     1    0.4678    0.0558    0.3584    0.5772     70.19       <.0001
Prep stand     1   -0.5675    0.1259   -0.8142   -0.3208     20.33       <.0001
Prep test      0    0.0000    0.0000    0.0000    0.0000       .          .
LDose          1   -2.3721    0.2949   -2.9502   -1.7940     64.68       <.0001
The negative coefficient associated with LDose indicates that the probability of having no symptoms (Symptoms=None) or no or mild symptoms (Symptoms=None or Symptoms=Mild) decreases as LDose increases; that is, the probability of a severe symptom increases with LDose. This association is apparent for both treatment groups.
The negative coefficient associated with the standard treatment group (Prep=stand) indicates that the standard treatment is associated with more severe symptoms across all LDose values.
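To make these interpretations concrete, the Python sketch below (a side calculation outside SAS) turns the parallel-model estimates into probabilities for the three response levels. It assumes the cumulative form in which Pr(None) = Phi(x'b) and Pr(None or Mild) = Phi(Intercept2 + x'b); that form, the helper name symptom_probs, and the negative signs on the Prep and LDose coefficients (taken from the interpretation above) are assumptions of this sketch.

```python
from math import log10
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal cumulative distribution function

# Rounded estimates from the parallel model.
intercept, intercept2 = 3.4148, 0.4678
prep_effect = {"stand": -0.5675, "test": 0.0}
b_ldose = -2.3721

def symptom_probs(prep, dose):
    """Return (P(None), P(Mild), P(Severe)) under the assumed cumulative probit form."""
    xb = intercept + prep_effect[prep] + b_ldose * log10(dose)
    p_none = phi(xb)
    p_none_or_mild = phi(xb + intercept2)
    return p_none, p_none_or_mild - p_none, 1.0 - p_none_or_mild

for prep in ("stand", "test"):
    for dose in (10, 40):
        none, mild, severe = symptom_probs(prep, dose)
        print(f"{prep:5s} dose={dose:2d}: None={none:.3f} Mild={mild:.3f} Severe={severe:.3f}")
```

Under these assumptions the probability of severe symptoms rises with dose for both preparations and is higher for the standard preparation at any given dose.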
The following statements use the PREDPPLOT statement to create the plot shown in Output 60.2.2 of the probabilities of the response taking on individual levels as a function of LDose. Since there are two covariates, LDose and Prep, the value of the CLASS variable Prep is fixed at the highest level, test. Although not shown here, the CDFPLOT statement creates similar plots of the cumulative response probabilities, instead of individual response level probabilities.

proc probit data=multi order=data;
   class Prep Symptoms;
   parallel: model Symptoms=Prep LDose / lackfit;
   predpplot var=ldose level=("None" "Mild" "Severe")
             cfit=blue cframe=ligr inborder noconf;
   weight N;
   title 'Probit Models for Symptom Severity';
run;
The following statements use the XDATA= data set to create a plot of the predicted probabilities with Prep set to the stand level. The resulting plot is shown in Output 60.2.3.

data xrow;
   input Prep $ Dose Symptoms $ N;
   LDose=log10(Dose);
   datalines;
stand 40 Severe 32
;
run;

proc probit data=multi order=data xdata=xrow;
   class Prep Symptoms;
   parallel: model Symptoms=Prep LDose / lackfit;
   predpplot var=ldose level=("None" "Mild" "Severe")
             cfit=blue cframe=ligr inborder noconf;
   weight N;
   title 'Predicted Probabilities for Standard Preparation';
run;
In this example, a series of people are questioned as to whether or not they would subscribe to a new newspaper. For each person, the variables sex (Female, Male), age, and subs (1=yes, 0=no) are recorded. The PROBIT procedure is used to fit a logistic regression model to the probability of a positive response (subscribing) as a function of the variables sex and age. Specifically, the probability of subscribing is modeled as

$$p = \Pr(\text{subs}=1) = F\bigl(b_0 + b_1 \times \text{sex} + b_2 \times \text{age}\bigr)$$
where F is the cumulative logistic distribution function.
By default, the PROBIT procedure models the probability of the lower response level for binary data. One way to model Pr(subs=1) is to format the response variable so that the formatted value corresponding to subs=1 is the lower level. The following statements format the values of subs as 1 = accept and 0 = reject, so that PROBIT models Pr(accept) = Pr(subs=1).
The following statements produce Output 60.3.1:
data news;
   input sex $ age subs;
   datalines;
Female 35 0
Male 44 0
Male 45 1
Female 47 1
Female 51 0
Female 47 0
Male 54 1
Male 47 1
Female 35 0
Female 34 0
Female 48 0
Female 56 1
Male 46 1
Female 59 1
Female 46 1
Male 59 1
Male 38 1
Female 39 0
Male 49 1
Male 42 1
Male 50 1
Female 45 0
Female 47 0
Female 30 1
Female 39 0
Female 51 0
Female 45 0
Female 43 1
Male 39 1
Male 31 0
Female 39 0
Male 34 0
Female 52 1
Female 46 0
Male 58 1
Female 50 1
Female 32 0
Female 52 1
Female 35 0
Female 51 0
;

proc format;
   value subscrib 1 = 'accept'
                  0 = 'reject';
run;

proc probit;
   class subs sex;
   model subs=sex age / d=logistic itprint;
   format subs subscrib.;
   title 'Logistic Regression of Subscription Status';
run;

Logistic Regression of Subscription Status

Probit Procedure

Class Level Information

Name    Levels    Values
subs         2    accept reject
sex          2    Female Male

PROC PROBIT is modeling the probabilities of levels of subs having LOWER Ordered Values in the response profile table.
Iteration History for Parameter Estimates

Iter   Ridge   Loglikelihood      Intercept      sexFemale             age
   0       0      -27.725887              0              0               0
   1       0      -20.142659   -3.634567629   -1.648455751    0.1051634384
   2       0       -19.52245   -5.254865196   -2.234724956    0.1506493473
   3       0      -19.490439   -5.728485385   -2.409827238    0.1639621828
   4       0      -19.490303    -5.76187293   -2.422349862    0.1649007124
   5       0      -19.490303     -5.7620267   -2.422407743    0.1649050312
   6       0      -19.490303     -5.7620267   -2.422407743    0.1649050312

Model Information

Data Set                  WORK.NEWS
Dependent Variable        subs
Number of Observations    40
Name of Distribution      Logistic
Log Likelihood            -19.49030281

Last Evaluation of the Negative of the Gradient

   Intercept       sexFemale             age
 5.95457E-12    8.768328E-10     1.636696E-8

Last Evaluation of the Negative of the Hessian

                Intercept       sexFemale             age
Intercept    6.4597397447    4.6042218284    292.04051848
sexFemale    4.6042218284    4.6042218284    216.20829515
age          292.04051848    216.20829515    13487.329973

Algorithm converged.

Analysis of Parameter Estimates

                            Standard     95% Confidence       Chi-
Parameter     DF  Estimate     Error         Limits         Square   Pr > ChiSq
Intercept      1   -5.7620    2.7635  -11.1783   -0.3458      4.35       0.0371
sex Female     1   -2.4224    0.9559   -4.2959   -0.5489      6.42       0.0113
sex Male       0    0.0000    0.0000    0.0000    0.0000       .          .
age            1    0.1649    0.0652    0.0371    0.2927      6.40       0.0114
From Output 60.3.1, there appears to be an effect due to both the variables sex and age. The positive coefficient for age indicates that older people are more likely to subscribe than younger people. The negative coefficient for sex indicates that females are less likely to subscribe than males.
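The size of these effects is easier to judge on the probability scale. The Python sketch below (a side calculation outside SAS; the function name prob_subscribe is illustrative) evaluates the fitted logistic model using the rounded estimates. It assumes the intercept and the Female coefficient are negative, as implied by their confidence limits and by the interpretation above.

```python
import math

# Rounded estimates from the Analysis of Parameter Estimates table.
b0 = -5.7620        # intercept
b_female = -2.4224  # effect of sex = Female (Male is the reference level)
b_age = 0.1649      # effect of one additional year of age

def prob_subscribe(sex, age):
    """Fitted probability of subscribing under the logistic model."""
    xb = b0 + (b_female if sex == "Female" else 0.0) + b_age * age
    return 1.0 / (1.0 + math.exp(-xb))

for age in (35, 50):
    for sex in ("Female", "Male"):
        print(f"{sex:6s} age {age}: P(subscribe) = {prob_subscribe(sex, age):.3f}")
```

At any given age the fitted probability is lower for females than for males, and for either sex it increases with age.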
The data, which are from an epidemiology study, consist of five variables: the number, r, of individuals surviving after an epidemic, out of n treated, for combinations of medicine dosage (dose), treatment (treat = A, B), and sex (sex = 0 (Female), 1 (Male)).
To see whether the two treatments have different effects on the survival rates of male and female individuals, the interaction term between the two variables treat and sex is included in the model.
The following invocation of PROC PROBIT fits the binary probit model to the grouped data:
data epidemic;
   input treat$ dose n r sex;
   label dose = Dose;
   datalines;
A 2.17 142 142 0
A .57 132 47 1
A 1.68 128 105 1
A 1.08 126 100 0
A 1.79 125 118 0
B 1.66 117 115 1
B 1.49 127 114 0
B 1.17 51 44 1
B 2.00 127 126 0
B .80 129 100 1
;

data xval;
   input treat $ dose sex;
   datalines;
B 2. 1
;

title 'Epidemiology Study';

proc probit optc lackfit covout data = epidemic
            outest = out1 xdata = xval;
   class treat sex;
   model r/n = dose treat sex sex*treat / corrb covb inversecl;
   output out = out2 p = p;
   predpplot var = dose
             font = swiss
             vref(intersect) = .6667
             vreflab = 'two thirds'
             vreflabpos = 2
             cfit = blue
             cframe = ligr;
   inset / cfill = white ctext = blue pos = se;
   ippplot font = swiss
           href(intersect) = .75
           hreflab = 'three quarters'
           vreflabpos = 2
           threshlabpos = 2
           cfit = blue
           cframe = ligr;
   inset / cfill = white ctext = blue;
   lpredplot font = swiss
             vref(intersect) = 1.
             vreflab = 'unit probit'
             vreflabpos = 2
             cfit = blue
             cframe = ligr;
   inset / cfill = white ctext = blue;
run;
The results of this analysis are shown in the following tables and figures.
Beginning with SAS Release 8.2, the PROBIT procedure does not support multiple MODEL statements. Only the last one is used if there is more than one MODEL statement in one invocation of the PROBIT procedure.
Epidemiology Study

Probit Procedure

Class Level Information

Name     Levels    Values
treat         2    A B
sex           2    0 1
Output 60.4.1 displays the table of level information for all classification variables in the CLASS statement.
Output 60.4.2 displays the table of parameter information for the effects in the MODEL statement. The name of a parameter is generated by combining the variable names and level names in the effect. The maximum length of a parameter name is 32. The names of the effects are specified in the MODEL statement. The length of effect names can be specified with the NAMELEN= option in the PROC PROBIT statement; the default length is 20.

Parameter Information

Parameter     Effect       treat    sex
Intercept     Intercept
dose          dose
treatA        treat        A
treatB        treat        B
sex0          sex                   0
sex1          sex                   1
treatAsex0    treat*sex    A        0
treatAsex1    treat*sex    A        1
treatBsex0    treat*sex    B        0
treatBsex1    treat*sex    B        1

Output 60.4.3 displays background information about the model fit. Included are the name of the input data set, the response variables used, and the number of observations, events, and trials. The table also includes the convergence status of the model-fitting algorithm and the final value of the log-likelihood function.

Model Information

Data Set                  WORK.EPIDEMIC
Events Variable           r
Trials Variable           n
Number of Observations    10
Number of Events          1011
Number of Trials          1204
Name of Distribution      Normal
Log Likelihood            -387.2467391

Algorithm converged.

Goodness-of-Fit Tests

Statistic              Value    DF    Pr > ChiSq
Pearson Chi-Square    4.9317     4        0.2944
L.R. Chi-Square       5.7079     4        0.2220

Response-Covariate Profile

Response Levels                2
Number of Covariate Values    10
Output 60.4.4 displays the table of goodness-of-fit tests requested with the LACKFIT option in the PROC PROBIT statement. Two goodness-of-fit statistics, the Pearson chi-square statistic and the likelihood ratio chi-square statistic, are computed. The grouping method for computing these statistics can be specified with the AGGREGATE= option; see the description of that option for details and the second part of this example for an illustration. By default, the PROBIT procedure uses the covariates in the MODEL statement to do the grouping. Observations with the same values of the covariates in the MODEL statement are grouped into cells, and the two statistics are computed according to these cells. The total number of cells and the number of levels for the response variable are reported next in the Response-Covariate Profile.
In this example, neither the Pearson chi-square test nor the likelihood ratio chi-square test is significant at the 0.1 level, which is the default test level used by the PROBIT procedure. That means that the model, which includes the interaction of treat and sex, is suitable for this epidemiology data set. (Further investigation shows that models without the interaction of treat and sex are not acceptable by either test.)
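The p-values in the Goodness-of-Fit table can be reproduced from the statistics and their 4 degrees of freedom. For a chi-square distribution with 4 degrees of freedom, the upper-tail probability has the closed form exp(-x/2)(1 + x/2), so no statistical library is needed. The Python sketch below is a side calculation outside SAS, and the function name is illustrative.

```python
import math

def chi2_upper_tail_df4(x):
    """P(X > x) for a chi-square variate with 4 degrees of freedom."""
    half = x / 2
    return math.exp(-half) * (1 + half)

# Statistics from the Goodness-of-Fit Tests table (DF = 4 for both).
pearson_p = chi2_upper_tail_df4(4.9317)
lr_p = chi2_upper_tail_df4(5.7079)

print(f"Pearson p = {pearson_p:.4f}, L.R. p = {lr_p:.4f}")

# Neither test rejects the model at the default 0.1 level.
adequate_fit = pearson_p > 0.1 and lr_p > 0.1
```

Both values agree with the Pr > ChiSq column to the precision shown, and both exceed 0.1.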
Type III Analysis of Effects

                      Wald
Effect       DF    Chi-Square    Pr > ChiSq
dose          1       42.1691        <.0001
treat         1       16.1421        <.0001
sex           1        1.7710        0.1833
treat*sex     1       13.9343        0.0002

Output 60.4.5 displays the Type III test results for all effects specified in the MODEL statement, which include the degrees of freedom for the effect, the Wald chi-square test statistic, and the p-value.
Output 60.4.6 displays the table of parameter estimates for the model. The PROBIT procedure displays information for all the parameters of an effect. Degenerate parameters are indicated by zero degrees of freedom. Confidence intervals are computed for all parameters with nonzero degrees of freedom, including the natural threshold C if the OPTC option is specified in the PROC PROBIT statement. The confidence level can be specified with the ALPHA= option in the MODEL statement. The default confidence level is 95%.

Analysis of Parameter Estimates

                                Standard     95% Confidence       Chi-
Parameter         DF  Estimate     Error         Limits         Square   Pr > ChiSq
Intercept          1   -0.8871    0.3632   -1.5991   -0.1752      5.96       0.0146
dose               1    1.6774    0.2583    1.1711    2.1837     42.17       <.0001
treat      A       1   -1.2537    0.2616   -1.7664   -0.7410     22.97       <.0001
treat      B       0    0.0000    0.0000    0.0000    0.0000       .          .
sex        0       1   -0.4633    0.2289   -0.9119   -0.0147      4.10       0.0429
sex        1       0    0.0000    0.0000    0.0000    0.0000       .          .
treat*sex  A 0     1    1.2899    0.3456    0.6126    1.9672     13.93       0.0002
treat*sex  A 1     0    0.0000    0.0000    0.0000    0.0000       .          .
treat*sex  B 0     0    0.0000    0.0000    0.0000    0.0000       .          .
treat*sex  B 1     0    0.0000    0.0000    0.0000    0.0000       .          .
_C_                1    0.2735    0.0946    0.0881    0.4589
From this table, you can see the following results:
dose has a significant positive effect on the survival rate.
Individuals under treatment A have a lower survival rate.
Male individuals have a higher survival rate.
Female individuals under treatment A have a higher survival rate.
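These conclusions can be checked on the probability scale. With the OPTC option, the fitted probability is C + (1 - C) Phi(x'b), where C is the natural threshold. The Python sketch below (a side calculation outside SAS; the function name survival_prob is illustrative) evaluates that expression with the rounded estimates, taking the Intercept, treat A, and sex 0 coefficients as negative, the signs implied by their confidence limits. Its results can be compared with the fitted probabilities stored by the OUTPUT statement in the data set Out2.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # standard normal cumulative distribution function

# Rounded estimates from the Analysis of Parameter Estimates table.
b0, b_dose = -0.8871, 1.6774
b_treat_a, b_sex0, b_a_sex0 = -1.2537, -0.4633, 1.2899
c = 0.2735  # natural threshold _C_

def survival_prob(treat, sex, dose):
    """Fitted survival probability: C + (1 - C) * Phi(linear predictor)."""
    xb = b0 + b_dose * dose
    if treat == "A":
        xb += b_treat_a
    if sex == 0:
        xb += b_sex0
    if treat == "A" and sex == 0:
        xb += b_a_sex0
    return c + (1 - c) * phi(xb)

# Three observations from the epidemic data set: (treat, sex, dose).
for treat, sex, dose in (("A", 0, 2.17), ("A", 1, 0.57), ("B", 1, 0.80)):
    print(f"treat={treat} sex={sex} dose={dose}: p = {survival_prob(treat, sex, dose):.4f}")
```

For these three observations the computed values agree with the p column of Out2 (0.99272, 0.35925, and 0.76414) to about three decimal places, the small differences being due to rounding of the estimates.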
Output 60.4.7 and Output 60.4.8 display the estimated covariance matrix and the estimated correlation matrix, respectively, for the estimated parameters with nonzero degrees of freedom. They are computed from the inverse of the Hessian matrix of the estimated parameters.

Estimated Covariance Matrix

              Intercept        dose      treatA        sex0  treatAsex0         _C_
Intercept      0.131944    0.087353    0.053551    0.030285    0.067056    0.028073
dose           0.087353    0.066723    0.047506    0.034081    0.058620    0.018196
treatA         0.053551    0.047506    0.068425    0.036063    0.075323    0.017084
sex0           0.030285    0.034081    0.036063    0.052383    0.063599    0.008088
treatAsex0     0.067056    0.058620    0.075323    0.063599    0.119408    0.019134
_C_            0.028073    0.018196    0.017084    0.008088    0.019134    0.008948

Estimated Correlation Matrix

              Intercept        dose      treatA        sex0  treatAsex0         _C_
Intercept      1.000000    0.930998    0.563595    0.364284    0.534227    0.817027
dose           0.930998    1.000000    0.703083    0.576477    0.656744    0.744699
treatA         0.563595    0.703083    1.000000    0.602359    0.833299    0.690420
sex0           0.364284    0.576477    0.602359    1.000000    0.804154    0.373565
treatAsex0     0.534227    0.656744    0.833299    0.804154    1.000000    0.585364
_C_            0.817027    0.744699    0.690420    0.373565    0.585364    1.000000
Output 60.4.9 displays the computed values and fiducial limits for the first single continuous variable, dose, in the MODEL statement at the given probability levels, without the effect of the natural threshold. This table is produced when the INVERSECL option is specified in the MODEL statement. If there is no single continuous variable in the MODEL specification but the INVERSECL option is specified, an error is reported. If the XDATA= option is used to input a data set for the independent variables in the MODEL statement, the PROBIT procedure uses these values for the independent variables other than the single continuous variable. Missing values are not permitted in the XDATA= data set for the independent variables; although the value for the single continuous variable is not used in computing the fiducial limits, a suitable valid value should still be given. In the data set xval created by the preceding SAS statements, dose=2.
Probit Analysis on dose

Probability        dose      95% Fiducial Limits
       0.01    -0.85801    -1.81301    -0.33743
       0.02    -0.69549    -1.58167    -0.21116
       0.03    -0.59238    -1.43501    -0.13093
       0.04    -0.51482    -1.32476    -0.07050
       0.05    -0.45172    -1.23513    -0.02130
       0.06    -0.39802    -1.15888     0.02063
       0.07    -0.35093    -1.09206     0.05742
       0.08    -0.30877    -1.03226     0.09039
       0.09    -0.27043    -0.97790     0.12040
       0.10    -0.23513    -0.92788     0.14805
       0.15    -0.08900    -0.72107     0.26278
       0.20     0.02714    -0.55706     0.35434
       0.25     0.12678    -0.41669     0.43322
       0.30     0.21625    -0.29095     0.50437
       0.35     0.29917    -0.17477     0.57064
       0.40     0.37785    -0.06487     0.63387
       0.45     0.45397     0.04104     0.69546
       0.50     0.52888     0.14481     0.75654
       0.55     0.60380     0.24800     0.81819
       0.60     0.67992     0.35213     0.88157
       0.65     0.75860     0.45879     0.94803
       0.70     0.84151     0.56985     1.01942
       0.75     0.93099     0.68770     1.09847
       0.80     1.03063     0.81571     1.18970
       0.85     1.14677     0.95926     1.30171
       0.90     1.29290     1.12867     1.45386
       0.91     1.32819     1.16747     1.49273
       0.92     1.36654     1.20867     1.53590
       0.93     1.40870     1.25284     1.58450
       0.94     1.45579     1.30084     1.64012
       0.95     1.50949     1.35397     1.70515
       0.96     1.57258     1.41443     1.78353
       0.97     1.65015     1.48626     1.88238
       0.98     1.75326     1.57833     2.01720
       0.99     1.91577     1.71776     2.23537
See the section "XDATA= SAS-data-set" for the default values for those effects other than the single continuous variable, for which the fiducial limits are computed.
In this example, there are two classification variables, treat and sex. Fiducial limits for the dose variable are computed for the highest level of the classification variables, treat = B and sex = 1, which is the default specification. Since these are the default values, you would get the same values and fiducial limits if you did not specify the XDATA= option in this example. The confidence level for the fiducial limits can be specified with the ALPHA= option in the MODEL statement. The default level is 95%.
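Because treat = B and sex = 1 are the reference levels, their effects are zero and the dose value at a given probability depends only on the intercept and the dose coefficient: dose = (Phi^{-1}(p) - b0) / b1, with the natural threshold excluded. The Python sketch below (a side calculation outside SAS; the function name dose_at is illustrative) uses the rounded estimates and takes the intercept as -0.8871, the sign implied by its confidence limits. It reproduces only the point values, not the fiducial limits.

```python
from statistics import NormalDist

# Rounded estimates; treat = B and sex = 1 contribute nothing to the predictor.
b0, b_dose = -0.8871, 1.6774

def dose_at(prob):
    """Dose at which Phi(b0 + b_dose * dose) equals prob (threshold C excluded)."""
    return (NormalDist().inv_cdf(prob) - b0) / b_dose

for prob in (0.01, 0.50, 0.90):
    print(f"probability {prob:.2f}: dose = {dose_at(prob):.4f}")
```

The computed values agree with the dose column of the Probit Analysis on dose table to about four decimal places, including the negative value at probability 0.01.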
If a LOG10 or LOG option is used in the PROC PROBIT statement, the values and the fiducial limits are computed for both the single continuous variable and its logarithm.
Output 60.4.10 displays the OUTEST= data set. All parameters for an effect are included.
Obs _MODEL_ _NAME_ _TYPE_ _DIST_ _STATUS_ _LNLIKE_ r Intercept 1 r PARMS Normal 0 Converged 387.247 1.00000 0.88714 2 Intercept COV Normal 0 Converged 387.247 0.88714 0.13194 3 dose COV Normal 0 Converged 387.247 1.67739 0.08735 4 treatA COV Normal 0 Converged 387.247 1.25367 0.05355 5 treatB COV Normal 0 Converged 387.247 0.00000 0.00000 6 sex0 COV Normal 0 Converged 387.247 0.46329 0.03029 7 sex1 COV Normal 0 Converged 387.247 0.00000 0.00000 8 treatAsex0 COV Normal 0 Converged 387.247 1.28991 0.06706 9 treatAsex1 COV Normal 0 Converged 387.247 0.00000 0.00000 10 treatBsex0 COV Normal 0 Converged 387.247 0.00000 0.00000 11 treatBsex1 COV Normal 0 Converged 387.247 0.00000 0.00000 12 _C_ COV Normal 0 Converged 387.247 0.27347 0.02807 treat treat treat treat treat Obs dose treatA B sex0 sex1 Asex0 Asex1 Bsex0 Bsex1 _C_ 1 1.67739 1.25367 0 0.46329 0 1.28991 0 0 0 0.27347 2 0.08735 0.05355 0 0.03029 0 0.06706 0 0 0 0.02807 3 0.06672 0.04751 0 0.03408 0 0.05862 0 0 0 0.01820 4 0.04751 0.06843 0 0.03606 0 0.07532 0 0 0 0.01708 5 0.00000 0.00000 0 0.00000 0 0.00000 0 0 0 0.00000 6 0.03408 0.03606 0 0.05238 0 0.06360 0 0 0 0.00809 7 0.00000 0.00000 0 0.00000 0 0.00000 0 0 0 0.00000 8 0.05862 0.07532 0 0.06360 0 0.11941 0 0 0 0.01913 9 0.00000 0.00000 0 0.00000 0 0.00000 0 0 0 0.00000 10 0.00000 0.00000 0 0.00000 0 0.00000 0 0 0 0.00000 11 0.00000 0.00000 0 0.00000 0 0.00000 0 0 0 0.00000 12 0.01820 -0.01708 0 -0.00809 0 0.01913 0 0 0 0.00895
The following three outputs, Output 60.4.11, Output 60.4.12, and Output 60.4.13, are generated from the three plot statements. The first plot, specified with the PREDPPLOT statement, is the plot of the predicted probability against the single continuous variable Dose, which is specified by the VAR= option in the PREDPPLOT statement. This single continuous variable must be in the MODEL statement. If the VAR= option is not used, the first single continuous variable in the MODEL statement is used. In this example, you would get the same plot if the VAR = dose was not used in the PREDPPLOT statement. You can specify values of other independent variables in the MODEL statement using an XDATA= data set, or by using the default values.
The second plot, requested with the IPPPLOT statement, is the inverse of the predicted probability plot, with fiducial limits. The fiducial limits are not simply the inverse of the confidence limits in the predicted probability plot; see the section Inverse Confidence Limits on page 3761 for the computation of these limits. The third plot, requested with the LPREDPLOT statement, shows the linear predictor $x'\beta$ against the first single continuous variable (or the single continuous variable specified by the VAR= option), with Wald confidence intervals.
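For orientation, a Wald interval for the linear predictor at a covariate vector $x$ usually takes the form below. This is the standard textbook expression, given here as an assumption rather than quoted from the procedure's documentation. The symbols are introduced only for this sketch: $\hat{\beta}$ is the vector of estimated regression parameters, $\hat{V}$ is the block of the estimated covariance matrix that corresponds to $\hat{\beta}$, and $z_{1-\alpha/2}$ is the standard normal quantile.

```latex
x'\hat{\beta} \;\pm\; z_{1-\alpha/2}\,\sqrt{x'\hat{V}x}
```

Because the interval is symmetric on the scale of the linear predictor, it is generally not symmetric once it is mapped to the probability scale.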
After each plot statement, an optional INSET statement draws a box (the inset box) within the plot. The inset box can display information about the model fit. See INSET Statement on page 3723 for more detail.
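By combining the INEST= data set with the MAXIT= option in the MODEL statement, the PROBIT procedure can do prediction, provided that the parameterizations of the models used for the training data and the validation data are exactly the same. With MAXIT=0, no iterations are performed, so the INEST= estimates are applied to the new data as they are.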
After the first invocation of PROC PROBIT, the estimated parameters and their covariance matrix are in the OUTEST= data set Out1, and the fitted probabilities for the training data set epidemic are in the OUTPUT data set Out2. See Output 60.4.10 on page 3791 for the data set Out1 and Output 60.4.14 on page 3795 for the data set Out2.
Obs  treat  dose    n    r  sex        p
  1  A      2.17  142  142    0  0.99272
  2  A      0.57  132   47    1  0.35925
  3  A      1.68  128  105    1  0.81899
  4  A      1.08  126  100    0  0.77517
  5  A      1.79  125  118    0  0.96682
  6  B      1.66  117  115    1  0.97901
  7  B      1.49  127  114    0  0.90896
  8  B      1.17   51   44    1  0.89749
  9  B      2.00  127  126    0  0.98364
 10  B      0.80  129  100    1  0.76414
The validation data are collected in the data set validate. The second invocation of PROC PROBIT simply passes the parameters estimated from the training data set epidemic to the validation data set validate for prediction. The predicted probabilities are stored in the OUTPUT data set Out3 (see Output 60.4.15 on page 3795). The third invocation of PROC PROBIT passes the estimated parameters as initial values for a new fit of the same model to the validation data set. The predicted probabilities are stored in the OUTPUT data set Out4 (see Output 60.4.16 on page 3795). Goodness-of-fit tests are computed on the cells defined by the AGGREGATE= variable group. The results are shown in Output 60.4.17 on page 3796.
data validate;
   input treat $ dose sex n r group;
   datalines;
B 2.0 0 44 43 1
B 2.0 1 54 52 2
B 1.5 1 36 32 3
B 1.5 0 45 40 4
A 2.0 0 66 64 5
A 2.0 1 89 89 6
A 1.5 1 45 39 7
A 1.5 0 66 60 8
B 2.0 0 44 44 1
B 2.0 1 54 54 2
B 1.5 1 36 30 3
B 1.5 0 45 41 4
A 2.0 0 66 65 5
A 2.0 1 89 88 6
A 1.5 1 45 38 7
A 1.5 0 66 59 8
;

/* Second invocation: predict for the validation data from the training */
/* estimates in Out1; MAXIT=0 requests no iterations.                   */
proc probit optc data=validate inest=out1;
   class treat sex;
   model r/n = dose treat sex sex*treat / maxit=0;
   output out=out3 p=p;
run;

/* Third invocation: refit the same model to the validation data, with  */
/* the training estimates as initial values and lack-of-fit tests on    */
/* the cells defined by group.                                          */
proc probit optc lackfit data=validate inest=out1;
   class treat sex;
   model r/n = dose treat sex sex*treat / aggregate=group;
   output out=out4 p=p;
run;
Obs  treat  dose  sex   n   r  group        p
  1  B       2.0    0  44  43      1  0.98364
  2  B       2.0    1  54  52      2  0.99506
  3  B       1.5    1  36  32      3  0.96247
  4  B       1.5    0  45  40      4  0.91145
  5  A       2.0    0  66  64      5  0.98500
  6  A       2.0    1  89  89      6  0.91835
  7  A       1.5    1  45  39      7  0.74300
  8  A       1.5    0  66  60      8  0.91666
  9  B       2.0    0  44  44      1  0.98364
 10  B       2.0    1  54  54      2  0.99506
 11  B       1.5    1  36  30      3  0.96247
 12  B       1.5    0  45  41      4  0.91145
 13  A       2.0    0  66  65      5  0.98500
 14  A       2.0    1  89  88      6  0.91835
 15  A       1.5    1  45  38      7  0.74300
 16  A       1.5    0  66  59      8  0.91666
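The p column of the Out3 listing directly above can be reproduced by hand, which makes explicit what "passing the estimated parameters" amounts to. The DATA step below is a minimal sketch, not the procedure's own code. It assumes the usual natural-rate form p = C + (1 - C) * Phi(x'b) implied by the OPTC option, and it assumes that the Intercept, treatA, and sex0 estimates are negative. Those minus signs are not visible in the Out1 listing as printed here; the signs were chosen because they reproduce the Out3 probabilities. The data set name handscore and all variable names beginning with b_ are hypothetical.

```sas
/* Hand-scoring sketch: apply the Out1 estimates to the validation data. */
data handscore;
   set validate;

   /* Estimates copied from the PARMS row of Out1.                 */
   /* ASSUMPTION: negative signs on b_int, b_trtA, b_sex0.         */
   b_int     = -0.88714;   /* Intercept (sign assumed)             */
   b_dose    =  1.67739;   /* dose                                 */
   b_trtA    = -1.25367;   /* treat A (sign assumed)               */
   b_sex0    = -0.46329;   /* sex 0 (sign assumed)                 */
   b_trtAsx0 =  1.28991;   /* treat A by sex 0 interaction         */
   c         =  0.27347;   /* _C_, the natural (threshold) rate    */

   /* Indicator coding; treat B and sex 1 are the reference levels */
   /* (their estimates are 0 in Out1).                             */
   isA = (treat = 'A');
   is0 = (sex = 0);

   /* Linear predictor and natural-rate-adjusted probability. */
   xb     = b_int + b_dose*dose + b_trtA*isA + b_sex0*is0
            + b_trtAsx0*isA*is0;
   p_hand = c + (1 - c)*probnorm(xb);

   keep treat dose sex xb p_hand;
run;

proc print data=handscore;
run;
```

Under these assumptions, p_hand should agree with the p column of Out3 to about four decimal places; small differences come from rounding the estimates to five decimals.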
Obs  treat  dose  sex   n   r  group        p
  1  B       2.0    0  44  43      1  0.98954
  2  B       2.0    1  54  52      2  0.98262
  3  B       1.5    1  36  32      3  0.86187
  4  B       1.5    0  45  40      4  0.90095
  5  A       2.0    0  66  64      5  0.98768
  6  A       2.0    1  89  89      6  0.98614
  7  A       1.5    1  45  39      7  0.88075
  8  A       1.5    0  66  60      8  0.88964
  9  B       2.0    0  44  44      1  0.98954
 10  B       2.0    1  54  54      2  0.98262
 11  B       1.5    1  36  30      3  0.86187
 12  B       1.5    0  45  41      4  0.90095
 13  A       2.0    0  66  65      5  0.98768
 14  A       2.0    1  89  88      6  0.98614
 15  A       1.5    1  45  38      7  0.88075
 16  A       1.5    0  66  59      8  0.88964
Probit Procedure
Goodness-of-Fit Tests

Statistic            Value   DF  Pr > ChiSq
Pearson Chi-Square  2.8101    2      0.2454
L.R. Chi-Square     2.8080    2      0.2456
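The p-values in the goodness-of-fit table can be checked with a short DATA step. This is a sketch under one assumption: that the 2 degrees of freedom arise as the 8 cells defined by group minus 6 estimated parameters (Intercept, dose, treat A, sex 0, the treat A by sex 0 interaction, and the natural rate _C_). That reading is consistent with the table but is not stated in the text. The variable names are hypothetical.

```sas
/* Check the goodness-of-fit p-values against a chi-square distribution. */
data _null_;
   /* ASSUMPTION: DF = cells defined by group - estimated parameters */
   df = 8 - 6;

   /* Upper-tail probabilities for the two reported statistics */
   p_pearson = 1 - probchi(2.8101, df);
   p_lr      = 1 - probchi(2.8080, df);

   /* Write the results to the SAS log */
   put df= p_pearson= 6.4 p_lr= 6.4;
run;
```

The two probabilities written to the log should match the Pr > ChiSq column. Both are well above 0.05, so neither test indicates lack of fit for the refitted model.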