Examples | SAS/STAT 9.1 User's Guide (Vol. 5)

Example 60.1. Dosage Levels

In this example, Dose is a variable representing the level of a stimulus, N represents the number of subjects tested at each level of the stimulus, and Response is the number of subjects responding to that level of the stimulus. Both probit and logit response models are fit to the data. The LOG10 option in the PROC PROBIT statement requests that the log base 10 of Dose be used as the independent variable. Specifically, for a given level of Dose, the probability p of a positive response is modeled as

\[ p = \Pr(\text{Response}) = F\left(b_0 + b_1 \log_{10}(\text{Dose})\right) \]

where F is a cumulative distribution function and b_0 and b_1 are the intercept and slope parameters to be estimated.

The probabilities are estimated first using the normal distribution function (the default) and then using the logistic distribution function. Note that, in this model specification, the natural rate is assumed to be zero.

The LACKFIT option specifies lack-of-fit tests and the INVERSECL option specifies inverse confidence limits.

In the DATA step that reads the data, a number of observations are generated that have a missing value for the response. Although the PROBIT procedure does not use the observations with the missing values to fit the model, it does give predicted values for all nonmissing sets of independent variables. These data points fill in the plot of fitted and observed values in the logistic model displayed in Output 60.1.2. The plot, requested with the PREDPPLOT statement, displays the estimated logistic cumulative distribution function and the observed response rates. The VAR=DOSE option specifies the horizontal axis variable in the plot.
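As a rough illustration of what the DATA step appends, the following Python sketch rebuilds the grid of doses (0.5 to 7.5 by 0.25) that carry a missing response. The variable names are illustrative only and are not part of the SAS program; the count it prints should correspond to the Missing Values entry in the Model Information table.

```python
# Observed records from the DATA step: (Dose, N, Response).
observed = [(1, 10, 1), (2, 12, 2), (3, 10, 4), (4, 10, 5),
            (5, 12, 8), (6, 10, 8), (7, 10, 10)]

# Grid generated at end of file: Dose = 0.5 to 7.5 by 0.25.
# Integer steps avoid accumulating floating-point error.
n_steps = int(round((7.5 - 0.5) / 0.25)) + 1
grid = [0.5 + 0.25 * i for i in range(n_steps)]

# Grid rows have no N or Response, so they are used only for prediction.
rows = list(observed) + [(d, None, None) for d in grid]
n_missing = sum(1 for _, n, r in rows if r is None)

print(len(observed), n_missing)
```

The seven observed records are the only ones used to fit the model; the remaining rows exist so that predicted values are available across the whole dose range for the plot.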

Output 60.1.2: Plot of Observed and Fitted Probabilities

The following statements produce Output 60.1.1:

  data a;
     infile cards eof=eof;
     input Dose N Response;
     Observed= Response/N;
     output;
     return;
  eof: do Dose=0.5 to 7.5 by 0.25;
          output;
       end;
     datalines;
  1 10 1
  2 12 2
  3 10 4
  4 10 5
  5 12 8
  6 10 8
  7 10 10
  ;

  proc probit log10;
     model Response/N=Dose / lackfit inversecl itprint;
     output out=B p=Prob std=std xbeta=xbeta;
     title 'Output from Probit Procedure';
  run;

  symbol v=dot c=white;

  proc probit log10;
     model Response/N=Dose / d=logistic inversecl;
     predpplot var = dose cfit = blue cframe=ligr inborder;
     output out=B p=Prob std=std xbeta=xbeta;
     title 'Output from Probit Procedure';
  run;

Output 60.1.1: Dosage Levels: PROC PROBIT

  Output from Probit Procedure
  Probit Procedure

  Iteration History for Parameter Estimates

  Iter   Ridge   Loglikelihood       Intercept     Log10(Dose)
     0       0      -51.292891               0               0
     1       0      -37.881166    -1.355817008     2.635206083
     2       0      -37.286169    -1.764939171    3.3408954936
     3       0      -37.280389    -1.812147863    3.4172391614
     4       0      -37.280388    -1.812704962     3.418117919
     5       0      -37.280388    -1.812704962     3.418117919

  Output from Probit Procedure
  Probit Procedure

  Model Information

  Data Set                        WORK.B
  Events Variable               Response
  Trials Variable                      N
  Number of Observations               7
  Number of Events                    38
  Number of Trials                    74
  Missing Values                      29
  Name of Distribution            Normal
  Log Likelihood            -37.28038802

  Last Evaluation of the Negative of the Gradient

    Intercept    Log10(Dose)
  3.434907E-7     2.09809E-8

  Last Evaluation of the Negative of the Hessian

                    Intercept     Log10(Dose)
  Intercept      36.005280383    20.152675982
  Log10(Dose)    20.152675982    13.078826305

  Goodness-of-Fit Tests

  Statistic               Value    DF    Pr > ChiSq
  Pearson Chi-Square     3.6497     5        0.6009
  L.R.    Chi-Square     4.6381     5        0.4616

  Response-Covariate Profile

  Response Levels                2
  Number of Covariate Values     7

The p-values in the Goodness-of-Fit table of 0.6009 for the Pearson chi-square and 0.4616 for the likelihood ratio chi-square indicate an adequate fit for the model fit with the normal distribution.
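The Pearson statistic and its p-value can be approximately reproduced by hand. The Python sketch below is an illustration, not part of the SAS program: it assumes the intercept is negative (as the accompanying text implies), takes the degrees of freedom as the number of covariate values minus the two estimated parameters, and uses a small helper, `chi2_sf`, written here for integer degrees of freedom.

```python
import math
from statistics import NormalDist

# Observed data from the DATA step: (Dose, N, Response).
data = [(1, 10, 1), (2, 12, 2), (3, 10, 4), (4, 10, 5),
        (5, 12, 8), (6, 10, 8), (7, 10, 10)]

# Final estimates from the iteration history (intercept assumed negative).
b0, b1 = -1.812704962, 3.418117919


def chi2_sf(x, df):
    """Upper-tail chi-square probability for a positive integer df."""
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    term, total = 1.0, 0.0
    for k in range((df - 1) // 2):
        if k > 0:
            term *= x / (2 * k + 1)
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


# Pearson chi-square: sum over dose groups of (observed - expected)^2 / variance.
phi = NormalDist().cdf
pearson = 0.0
for dose, n, r in data:
    p = phi(b0 + b1 * math.log10(dose))
    pearson += (r - n * p) ** 2 / (n * p * (1 - p))

df = len(data) - 2  # 7 covariate values minus 2 parameters (assumption)
print(round(pearson, 4), df, round(chi2_sf(pearson, df), 4))
```

Large p-values here mean the observed response counts are not far from those expected under the fitted normal model.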

  Output from Probit Procedure
  Probit Procedure

  Analysis of Parameter Estimates

                            Standard     95% Confidence      Chi-
  Parameter     DF Estimate    Error         Limits        Square  Pr > ChiSq
  Intercept      1  -1.8127   0.4493   -2.6934   -0.9320    16.27      <.0001
  Log10(Dose)    1   3.4181   0.7455    1.9569    4.8794    21.02      <.0001

  Probit Model in Terms of Tolerance Distribution

          MU         SIGMA
  0.53032254    0.29255866

  Estimated Covariance Matrix for Tolerance Parameters

                 MU       SIGMA
  MU       0.002418    0.000409
  SIGMA    0.000409    0.004072

Tolerance distribution parameter estimates for the normal distribution indicate a mean tolerance for the population of 0.5303.

  Output from Probit Procedure
  Probit Procedure

  Probit Analysis on Log10(Dose)

  Probability    Log10(Dose)     95% Fiducial Limits
         0.01       -0.15027    -0.69518     0.07710
         0.02       -0.07052    -0.55766     0.13475
         0.03       -0.01992    -0.47064     0.17156
         0.04        0.01814    -0.40534     0.19941
         0.05        0.04911    -0.35233     0.22218
         0.06        0.07546    -0.30731     0.24165
         0.07        0.09857    -0.26793     0.25881
         0.08        0.11926    -0.23273     0.27425
         0.09        0.13807    -0.20080     0.28837
         0.10        0.15539    -0.17147     0.30142
         0.15        0.22710    -0.05086     0.35631
         0.20        0.28410     0.04369     0.40124
         0.25        0.33299     0.12343     0.44116
         0.30        0.37690     0.19348     0.47857
         0.35        0.41759     0.25658     0.51504
         0.40        0.45620     0.31429     0.55182
         0.45        0.49356     0.36754     0.58999
         0.50        0.53032     0.41693     0.63057
         0.55        0.56709     0.46296     0.67451
         0.60        0.60444     0.50618     0.72271
         0.65        0.64305     0.54734     0.77603
         0.70        0.68374     0.58745     0.83550
         0.75        0.72765     0.62776     0.90265
         0.80        0.77655     0.66999     0.98008
         0.85        0.83354     0.71675     1.07279
         0.90        0.90525     0.77313     1.19191
         0.91        0.92257     0.78646     1.22098
         0.92        0.94139     0.80083     1.25265
         0.93        0.96208     0.81653     1.28759
         0.94        0.98519     0.83394     1.32672
         0.95        1.01154     0.85367     1.37149
         0.96        1.04250     0.87669     1.42424
         0.97        1.08056     0.90480     1.48928
         0.98        1.13116     0.94189     1.57602
         0.99        1.21092     0.99987     1.71321

The LD50 (ED50 for log dose) is 0.5303, the log10 dose corresponding to a probability of 0.5. This is the same as the mean tolerance for the normal distribution.

  Output from Probit Procedure
  Probit Procedure

  Probit Analysis on Dose

  Probability        Dose      95% Fiducial Limits
         0.01     0.70750     0.20175      1.19427
         0.02     0.85012     0.27691      1.36380
         0.03     0.95517     0.33834      1.48444
         0.04     1.04266     0.39324      1.58274
         0.05     1.11971     0.44429      1.66793
         0.06     1.18976     0.49282      1.74443
         0.07     1.25478     0.53960      1.81473
         0.08     1.31600     0.58515      1.88042
         0.09     1.37427     0.62980      1.94252
         0.10     1.43019     0.67380      2.00181
         0.15     1.68696     0.88950      2.27147
         0.20     1.92353     1.10584      2.51906
         0.25     2.15276     1.32870      2.76161
         0.30     2.38180     1.56128      3.01000
         0.35     2.61573     1.80543      3.27374
         0.40     2.85893     2.06200      3.56306
         0.45     3.11573     2.33098      3.89038
         0.50     3.39096     2.61175      4.27138
         0.55     3.69051     2.90374      4.72619
         0.60     4.02199     3.20759      5.28090
         0.65     4.39594     3.52651      5.97077
         0.70     4.82770     3.86765      6.84706
         0.75     5.34134     4.24385      7.99189
         0.80     5.97787     4.67724      9.55169
         0.85     6.81617     5.20900     11.82480
         0.90     8.03992     5.93105     15.55653
         0.91     8.36704     6.11584     16.63320
         0.92     8.73752     6.32165     17.89163
         0.93     9.16385     6.55431     19.39034
         0.94     9.66463     6.82245     21.21881
         0.95    10.26925     7.13949     23.52275
         0.96    11.02811     7.52816     26.56066
         0.97    12.03830     8.03149     30.85201
         0.98    13.52585     8.74763     37.67206
         0.99    16.25233     9.99709     51.66627

The ED50 for dose is 3.39 with a 95% confidence interval of (2.61, 4.27).
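The tolerance parameters and the point estimates in the two probit analysis tables follow from the regression coefficients. The Python sketch below is illustrative only: it assumes the intercept is negative, uses MU = -b0/b1 and SIGMA = 1/b1, and back-transforms from the log10 scale by raising 10 to the estimate. The function name `log10_dose_at` is hypothetical, and the fiducial limits are not reproduced here.

```python
from statistics import NormalDist

# Final estimates from the iteration history (intercept assumed negative).
b0, b1 = -1.812704962, 3.418117919

mu = -b0 / b1      # mean of the tolerance distribution, log10(Dose) scale
sigma = 1 / b1     # standard deviation of the tolerance distribution


def log10_dose_at(p):
    """log10(Dose) at which the fitted response probability equals p."""
    return mu + sigma * NormalDist().inv_cdf(p)


ed50_log = log10_dose_at(0.50)
ed50 = 10 ** ed50_log            # back-transform to the dose scale
ed90 = 10 ** log10_dose_at(0.90)

print(round(mu, 5), round(sigma, 5), round(ed50, 2), round(ed90, 2))
```

Because the standard normal quantile at 0.5 is zero, the log10 ED50 coincides with MU, which is why the two values quoted in the text agree.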

  Plot of Observed and Fitted Probabilities
  Probit Procedure

  Model Information

  Data Set                        WORK.A
  Events Variable               Response
  Trials Variable                      N
  Number of Observations               7
  Number of Events                    38
  Number of Trials                    74
  Missing Values                      29
  Name of Distribution          Logistic
  Log Likelihood            -37.11065336

  Algorithm converged.

  Plot of Observed and Fitted Probabilities
  Probit Procedure

  Analysis of Parameter Estimates

                            Standard     95% Confidence      Chi-
  Parameter     DF Estimate    Error         Limits        Square  Pr > ChiSq
  Intercept      1  -3.2246   0.8861   -4.9613   -1.4880    13.24      0.0003
  Log10(Dose)    1   5.9702   1.4492    3.1299    8.8105    16.97      <.0001

The regression parameter estimates for the logistic model, −3.22 and 5.97, are approximately $\pi/\sqrt{3}$ times as large as those for the normal model.
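The scaling factor can be checked numerically. The factor used below is the standard deviation of the standard logistic distribution, which is the usual explanation for the difference in scale between logit and probit coefficients; the estimates are taken from the two parameter tables with the intercepts assumed negative. This is an illustrative Python sketch, not SAS output.

```python
import math

# Estimates from the two Analysis of Parameter Estimates tables.
normal = {"Intercept": -1.8127, "Log10(Dose)": 3.4181}
logistic = {"Intercept": -3.2246, "Log10(Dose)": 5.9702}

# Standard deviation of the standard logistic distribution.
scale = math.pi / math.sqrt(3)

# Ratio of each logistic estimate to the corresponding normal estimate.
ratios = {name: logistic[name] / normal[name] for name in normal}

print(round(scale, 4), {name: round(v, 3) for name, v in ratios.items()})
```

The ratios are close to, but not exactly, the theoretical factor, which is why the text says "approximately".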

  Plot of Observed and Fitted Probabilities
  Probit Procedure

  Probit Analysis on Log10(Dose)

  Probability    Log10(Dose)     95% Fiducial Limits
         0.01       -0.22955    -0.97441     0.04234
         0.02       -0.11175    -0.75158     0.12404
         0.03       -0.04212    -0.62018     0.17265
         0.04        0.00780    -0.52618     0.20771
         0.05        0.04693    -0.45265     0.23533
         0.06        0.07925    -0.39205     0.25826
         0.07        0.10686    -0.34037     0.27796
         0.08        0.13103    -0.29521     0.29530
         0.09        0.15259    -0.25502     0.31085
         0.10        0.17209    -0.21875     0.32498
         0.15        0.24958    -0.07552     0.38207
         0.20        0.30792     0.03092     0.42645
         0.25        0.35611     0.11742     0.46451
         0.30        0.39820     0.19143     0.49932
         0.35        0.43644     0.25684     0.53275
         0.40        0.47221     0.31588     0.56619
         0.45        0.50651     0.36986     0.60089
         0.50        0.54013     0.41957     0.63807
         0.55        0.57374     0.46559     0.67894
         0.60        0.60804     0.50846     0.72474
         0.65        0.64381     0.54896     0.77673
         0.70        0.68205     0.58815     0.83637
         0.75        0.72414     0.62752     0.90582
         0.80        0.77233     0.66915     0.98876
         0.85        0.83067     0.71631     1.09242
         0.90        0.90816     0.77562     1.23343
         0.91        0.92766     0.79014     1.26931
         0.92        0.94922     0.80607     1.30912
         0.93        0.97339     0.82378     1.35391
         0.94        1.00100     0.84384     1.40523
         0.95        1.03332     0.86713     1.46546
         0.96        1.07245     0.89511     1.53864
         0.97        1.12237     0.93053     1.63228
         0.98        1.19200     0.97952     1.76329
         0.99        1.30980     1.06166     1.98569

  Plot of Observed and Fitted Probabilities
  Probit Procedure

  Probit Analysis on Dose

  Probability        Dose      95% Fiducial Limits
         0.01     0.58945     0.10607      1.10241
         0.02     0.77312     0.17718      1.33058
         0.03     0.90757     0.23978      1.48817
         0.04     1.01813     0.29773      1.61327
         0.05     1.11413     0.35266      1.71922
         0.06     1.20018     0.40546      1.81244
         0.07     1.27896     0.45670      1.89654
         0.08     1.35218     0.50675      1.97379
         0.09     1.42100     0.55588      2.04572
         0.10     1.48625     0.60430      2.11339
         0.15     1.77656     0.84038      2.41030
         0.20     2.03199     1.07379      2.66961
         0.25     2.27043     1.31046      2.91416
         0.30     2.50152     1.55393      3.15736
         0.35     2.73172     1.80652      3.40996
         0.40     2.96627     2.06957      3.68292
         0.45     3.21006     2.34345      3.98927
         0.50     3.46837     2.62768      4.34578
         0.55     3.74746     2.92138      4.77466
         0.60     4.05546     3.22451      5.30573
         0.65     4.40366     3.53961      5.98041
         0.70     4.80891     3.87391      6.86079
         0.75     5.29836     4.24155      8.05044
         0.80     5.92009     4.66820      9.74455
         0.85     6.77126     5.20365     12.37149
         0.90     8.09391     5.96508     17.11715
         0.91     8.46559     6.16800     18.59129
         0.92     8.89644     6.39837     20.37592
         0.93     9.40575     6.66469     22.58957
         0.94    10.02317     6.97977     25.42292
         0.95    10.79732     7.36428     29.20549
         0.96    11.81534     7.85438     34.56521
         0.97    13.25466     8.52173     42.88232
         0.98    15.55972     9.53941     57.98207
         0.99    20.40815    11.52549     96.75820

Both the ED50 and the LD50 are similar to those for the normal model.
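To see how close the two models are, the logistic ED50 can be recomputed from the rounded logistic estimates by inverting the logit. This Python sketch is illustrative only: it assumes the intercept is negative, and small differences from the tabled values come from rounding in the printed coefficients.

```python
import math

# Rounded estimates from the logistic fit (intercept assumed negative).
b0, b1 = -3.2246, 5.9702


def log10_dose_at(p):
    """log10(Dose) at which the fitted logistic probability equals p."""
    logit = math.log(p / (1 - p))
    return (logit - b0) / b1


ed50_log_logistic = log10_dose_at(0.5)
ed50_logistic = 10 ** ed50_log_logistic

# Values reported for the normal model in the earlier tables.
ed50_log_normal, ed50_normal = 0.53032, 3.39096

print(round(ed50_log_logistic, 4), round(ed50_logistic, 2))
print(round(ed50_logistic - ed50_normal, 2))
```

The difference between the two ED50 estimates is small relative to the width of either set of fiducial limits.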

The PREDPPLOT statement creates the plot of observed and fitted probabilities in Output 60.1.2. The dashed lines represent pointwise confidence bands for the probabilities.

Example 60.2. Multilevel Response

In this example, two preparations, a standard preparation and a test preparation, are each given at several dose levels to groups of insects. The symptoms are recorded for each insect within each group, and two multilevel probit models are fit. Because the natural sort order of the three levels is not the same as the response order, the ORDER=DATA option is specified in the PROC PROBIT statement to get the desired order.

The following statements produce Output 60.2.1:

  data multi;
     input Prep $ Dose Symptoms $ N;
     LDose=log10(Dose);
     if Prep='test' then PrepDose=LDose;
     else PrepDose=0;
     datalines;
  stand     10      None       33
  stand     10      Mild        7
  stand     10      Severe     10
  stand     20      None       17
  stand     20      Mild       13
  stand     20      Severe     17
  stand     30      None       14
  stand     30      Mild        3
  stand     30      Severe     28
  stand     40      None        9
  stand     40      Mild        8
  stand     40      Severe     32
  test      10      None       44
  test      10      Mild        6
  test      10      Severe      0
  test      20      None       32
  test      20      Mild       10
  test      20      Severe     12
  test      30      None       23
  test      30      Mild        7
  test      30      Severe     21
  test      40      None       16
  test      40      Mild        6
  test      40      Severe     19
  ;

  proc probit order=data;
     class Prep Symptoms;
     nonpara: model Symptoms=Prep LDose PrepDose / lackfit;
     weight N;
     title 'Probit Models for Symptom Severity';
  run;

  proc probit order=data;
     class Prep Symptoms;
     parallel: model Symptoms=Prep LDose / lackfit;
     weight N;
     title 'Probit Models for Symptom Severity';
  run;

Output 60.2.1: Multilevel Response: PROC PROBIT

  Probit Models for Symptom Severity
  Probit Procedure

  Class Level Information

  Name          Levels    Values
  Prep               2    stand test
  Symptoms           3    None Mild Severe

The first model allows for nonparallelism between the dose response curves for the two preparations by inclusion of an interaction between Prep and LDose. The interaction term is labeled PrepDose in the Analysis of Parameter Estimates table. The results of this first model indicate that the parameter for the interaction term is not significant, having a Wald chi-square of 0.73. Also, since the first model is a generalization of the second, a likelihood ratio test statistic for this same parameter can be obtained by multiplying the difference in log likelihoods between the two models by 2. The value obtained, 2 × (−345.94 − (−346.31)), is 0.73. This is in close agreement with the Wald chi-square from the first model. The lack-of-fit test statistics for the two models do not indicate a problem with either fit.
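The likelihood ratio arithmetic can be spelled out as follows. This Python sketch is illustrative only: it uses the two log likelihoods from the Model Information tables (taken as negative, as in the text) and evaluates the upper tail of a chi-square distribution with 1 degree of freedom through the complementary error function.

```python
import math

# Log likelihoods from the two Model Information tables.
loglik_nonparallel = -345.9401767   # model with PrepDose
loglik_parallel = -346.306141       # model without PrepDose

# Likelihood ratio statistic for the single PrepDose parameter.
lr = 2 * (loglik_nonparallel - loglik_parallel)

# Upper-tail probability of a chi-square variable with 1 df:
# P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(lr / 2))

print(round(lr, 2), round(p_value, 2))
```

The resulting p-value is close to the one reported for the Wald test of PrepDose, so both tests lead to the same conclusion about parallelism.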

  Probit Models for Symptom Severity
  Probit Procedure

  Model Information

  Data Set                    WORK.MULTI
  Dependent Variable            Symptoms
  Weight Variable                      N
  Number of Observations              23
  Missing Values                       1
  Name of Distribution            Normal
  Log Likelihood            -345.9401767

  Probit Models for Symptom Severity
  Probit Procedure

  Analysis of Parameter Estimates

                                Standard     95% Confidence      Chi-
  Parameter         DF Estimate    Error         Limits        Square  Pr > ChiSq
  Intercept          1   3.8080   0.6252    2.5827    5.0333    37.10      <.0001
  Intercept2         1   0.4684   0.0559    0.3589    0.5780    70.19      <.0001
  Prep       stand   1  -1.2573   0.8190   -2.8624    0.3479     2.36      0.1247
  Prep       test    0   0.0000   0.0000    0.0000    0.0000      .         .
  LDose              1  -2.1512   0.3909   -2.9173   -1.3851    30.29      <.0001
  PrepDose           1  -0.5072   0.5945   -1.6724    0.6580     0.73      0.3935

  Probit Models for Symptom Severity
  Probit Procedure

  Class Level Information

  Name          Levels    Values
  Prep               2    stand test
  Symptoms           3    None Mild Severe

  Probit Models for Symptom Severity
  Probit Procedure

  Model Information

  Data Set                    WORK.MULTI
  Dependent Variable            Symptoms
  Weight Variable                      N
  Number of Observations              23
  Missing Values                       1
  Name of Distribution            Normal
  Log Likelihood             -346.306141

  Probit Models for Symptom Severity
  Probit Procedure

  Analysis of Parameter Estimates

                                Standard     95% Confidence      Chi-
  Parameter         DF Estimate    Error         Limits        Square  Pr > ChiSq
  Intercept          1   3.4148   0.4126    2.6061    4.2235    68.50      <.0001
  Intercept2         1   0.4678   0.0558    0.3584    0.5772    70.19      <.0001
  Prep       stand   1  -0.5675   0.1259   -0.8142   -0.3208    20.33      <.0001
  Prep       test    0   0.0000   0.0000    0.0000    0.0000      .         .
  LDose              1  -2.3721   0.2949   -2.9502   -1.7940    64.68      <.0001

The negative coefficient associated with LDose indicates that the probability of having no symptoms (Symptoms = None) or no or mild symptoms (Symptoms = None or Symptoms = Mild) decreases as LDose increases; that is, the probability of a severe symptom increases with LDose. This association is apparent for both treatment groups.

The negative coefficient associated with the standard treatment group (Prep = stand) indicates that the standard treatment is associated with more severe symptoms across all LDose values.
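To make the direction of these effects concrete, the Python sketch below evaluates the fitted probabilities of each symptom level under the parallel-lines model. It is an illustration under stated assumptions: the coefficients carry the signs described in the text, the cumulative probability of None is taken as the normal CDF of the linear predictor, and the cumulative probability of None or Mild is obtained by adding Intercept2 to that predictor. The function name `level_probs` is hypothetical.

```python
import math
from statistics import NormalDist

# Estimates from the parallel-lines model (signs as described in the text).
intercept, intercept2 = 3.4148, 0.4678
prep_effect = {"stand": -0.5675, "test": 0.0}
b_ldose = -2.3721


def level_probs(prep, dose):
    """Fitted probabilities of None, Mild, and Severe for one preparation."""
    phi = NormalDist().cdf
    eta = intercept + prep_effect[prep] + b_ldose * math.log10(dose)
    p_none = phi(eta)                       # Pr(Symptoms = None)
    p_none_or_mild = phi(eta + intercept2)  # Pr(None or Mild), assumed form
    return {"None": p_none,
            "Mild": p_none_or_mild - p_none,
            "Severe": 1 - p_none_or_mild}


for prep in ("stand", "test"):
    for dose in (10, 40):
        probs = level_probs(prep, dose)
        print(prep, dose, {k: round(v, 3) for k, v in probs.items()})
```

Under these assumptions the probability of a severe symptom rises with dose for both preparations and is higher for the standard preparation at any given dose.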

The following statements use the PREDPPLOT statement to create the plot shown in Output 60.2.2 of the probabilities of the response taking on individual levels as a function of LDose. Since there are two covariates, LDose and Prep, the value of the CLASS variable Prep is fixed at the highest level, test. Although not shown here, the CDFPLOT statement creates similar plots of the cumulative response probabilities, instead of individual response level probabilities.

  proc probit data=multi order=data;
     class Prep Symptoms;
     parallel: model Symptoms=Prep LDose / lackfit;
     predpplot var=ldose level=("None" "Mild" "Severe")
               cfit=blue cframe=ligr inborder noconf ;
     weight N;
     title 'Probit Models for Symptom Severity';
  run;

Output 60.2.2: Plot of Predicted Probabilities for the Test Preparation Group

The following statements use the XDATA= data set to create a plot of the predicted probabilities with Prep set to the stand level. The resulting plot is shown in Output 60.2.3.

  data xrow;
     input Prep $ Dose Symptoms $ N;
     LDose=log10(Dose);
     datalines;
  stand     40      Severe     32
  ;
  run;

  proc probit data=multi order=data xdata=xrow;
     class Prep Symptoms;
     parallel: model Symptoms=Prep LDose / lackfit;
     predpplot var=ldose level=("None" "Mild" "Severe")
               cfit=blue cframe=ligr inborder noconf ;
     weight N;
     title 'Predicted Probabilities for Standard Preparation';
  run;

Output 60.2.3: Plot of Predicted Probabilities for the Standard Preparation Group

Example 60.3. Logistic Regression

In this example, a series of people are questioned as to whether or not they would subscribe to a new newspaper. For each person, the variables sex (Female, Male), age, and subs (1=yes, 0=no) are recorded. The PROBIT procedure is used to fit a logistic regression model to the probability of a positive response (subscribing) as a function of the variables sex and age. Specifically, the probability of subscribing is modeled as

\[ p = \Pr(\text{subs} = 1) = F\left(b_0 + b_1 \times \text{sex} + b_2 \times \text{age}\right) \]

where F is the cumulative logistic distribution function.

By default, the PROBIT procedure models the probability of the lower response level for binary data. One way to model Pr(subs=1) is to format the response variable so that the formatted value corresponding to subs=1 is the lower level. The following statements format the values of subs as 1 = accept and 0 = reject, so that PROBIT models Pr(accept) = Pr(subs=1).

The following statements produce Output 60.3.1:

  data news;
     input sex $ age subs;
     datalines;
  Female     35    0
  Male       44    0
  Male       45    1
  Female     47    1
  Female     51    0
  Female     47    0
  Male       54    1
  Male       47    1
  Female     35    0
  Female     34    0
  Female     48    0
  Female     56    1
  Male       46    1
  Female     59    1
  Female     46    1
  Male       59    1
  Male       38    1
  Female     39    0
  Male       49    1
  Male       42    1
  Male       50    1
  Female     45    0
  Female     47    0
  Female     30    1
  Female     39    0
  Female     51    0
  Female     45    0
  Female     43    1
  Male       39    1
  Male       31    0
  Female     39    0
  Male       34    0
  Female     52    1
  Female     46    0
  Male       58    1
  Female     50    1
  Female     32    0
  Female     52    1
  Female     35    0
  Female     51    0
  ;

  proc format;
     value subscrib 1 = 'accept' 0 = 'reject';
  run;

  proc probit;
     class subs sex;
     model subs=sex age / d=logistic itprint;
     format subs subscrib.;
     title 'Logistic Regression of Subscription Status';
  run;

Output 60.3.1: Logistic Regression: PROC PROBIT

  Logistic Regression of Subscription Status
  Probit Procedure

  Class Level Information

  Name      Levels    Values
  subs           2    accept reject
  sex            2    Female Male

  PROC PROBIT is modeling the probabilities of levels of subs having LOWER
  Ordered Values in the response profile table.

  Logistic Regression of Subscription Status
  Probit Procedure

  Iteration History for Parameter Estimates

  Iter   Ridge   Loglikelihood       Intercept       sexFemale             age
     0       0      -27.725887               0               0               0
     1       0      -20.142659    -3.634567629    -1.648455751    0.1051634384
     2       0       -19.52245    -5.254865196    -2.234724956    0.1506493473
     3       0      -19.490439    -5.728485385    -2.409827238    0.1639621828
     4       0      -19.490303     -5.76187293    -2.422349862    0.1649007124
     5       0      -19.490303      -5.7620267    -2.422407743    0.1649050312
     6       0      -19.490303      -5.7620267    -2.422407743    0.1649050312

  Model Information

  Data Set                     WORK.NEWS
  Dependent Variable                subs
  Number of Observations              40
  Name of Distribution          Logistic
  Log Likelihood            -19.49030281

  PROC PROBIT is modeling the probabilities of levels of subs having LOWER
  Ordered Values in the response profile table.

  Logistic Regression of Subscription Status
  Probit Procedure

  Last Evaluation of the Negative of the Gradient

    Intercept       sexFemale             age
  5.95457E-12    8.768328E-10     1.636696E-8

  Last Evaluation of the Negative of the Hessian

                  Intercept       sexFemale             age
  Intercept    6.4597397447    4.6042218284    292.04051848
  sexFemale    4.6042218284    4.6042218284    216.20829515
  age          292.04051848    216.20829515    13487.329973

  Algorithm converged.

  Logistic Regression of Subscription Status
  Probit Procedure

  Analysis of Parameter Estimates

                                 Standard     95% Confidence      Chi-
  Parameter          DF Estimate    Error         Limits        Square  Pr > ChiSq
  Intercept           1  -5.7620   2.7635  -11.1783   -0.3458     4.35      0.0371
  sex        Female   1  -2.4224   0.9559   -4.2959   -0.5489     6.42      0.0113
  sex        Male     0   0.0000   0.0000    0.0000    0.0000      .         .
  age                 1   0.1649   0.0652    0.0371    0.2927     6.40      0.0114

From Output 60.3.1, there appears to be an effect due to both the variables sex and age. The positive coefficient for age indicates that older people are more likely to subscribe than younger people. The negative coefficient for sex indicates that females are less likely to subscribe than males.
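The size of these effects is easier to judge on the probability scale. The Python sketch below plugs the rounded estimates into the logistic CDF; it assumes the intercept and the Female coefficient are negative, as the interpretation above implies, and the function name `p_subscribe` is illustrative rather than anything produced by the procedure.

```python
import math

# Rounded estimates from the Analysis of Parameter Estimates table
# (intercept and Female coefficient assumed negative).
b0, b_female, b_age = -5.7620, -2.4224, 0.1649


def p_subscribe(sex, age):
    """Fitted Pr(subs = 1) from the logistic model; Male is the reference."""
    eta = b0 + (b_female if sex == "Female" else 0.0) + b_age * age
    return 1 / (1 + math.exp(-eta))


for sex in ("Female", "Male"):
    for age in (35, 50):
        print(sex, age, round(p_subscribe(sex, age), 3))
```

At any fixed age the fitted probability is higher for males, and within each sex it increases with age.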

Example 60.4. An Epidemiology Study

The data, which are from an epidemiology study, consist of five variables: the number, r, of individuals surviving after an epidemic, out of n treated, for combinations of medicine dosage (dose), treatment (treat = A, B), and sex (sex = 0 (Female), 1 (Male)).

To see if the two treatments have different effects on male and female individual survival rate, the interaction term between the two variables treat and sex is included in the model.

The following invocation of PROC PROBIT fits the binary probit model to the grouped data:

  data epidemic;
     input treat$ dose n r sex;
     label dose = Dose;
     datalines;
  A  2.17 142 142  0
  A   .57 132  47  1
  A  1.68 128 105  1
  A  1.08 126 100  0
  A  1.79 125 118  0
  B  1.66 117 115  1
  B  1.49 127 114  0
  B  1.17  51  44  1
  B  2.00 127 126  0
  B   .80 129 100  1
  ;

  data xval;
     input treat $ dose sex ;
     datalines;
  B  2. 1
  ;

  title 'Epidemiology Study';

  proc probit optc lackfit covout data = epidemic
              outest = out1 xdata = xval;
     class treat sex;
     model r/n = dose treat sex sex*treat/corrb covb inversecl;
     output out = out2 p =p;

     predpplot var = dose
               font = swiss
               vref(intersect) = .6667
               vreflab = 'two thirds'
               vreflabpos = 2
               cfit=blue
               cframe=ligr
               ;
     inset / cfill = white
             ctext = blue
             pos = se ;

     ippplot font = swiss
             href(intersect) = .75
             hreflab = 'three quarters'
             vreflabpos = 2
             threshlabpos = 2
             cfit=blue
             cframe=ligr
             ;
     inset / cfill = white
             ctext = blue;

     lpredplot font = swiss
               vref(intersect) = 1.
               vreflab = 'unit probit'
               vreflabpos = 2
               cfit=blue
               cframe=ligr
               ;
     inset / cfill = white
             ctext = blue;
  run;

The results of this analysis are shown in the following tables and figures.

Beginning with SAS Release 8.2, the PROBIT procedure does not support multiple MODEL statements. Only the last one is used if there is more than one MODEL statement in one invocation of the PROBIT procedure.

Output 60.4.1: Class Level Information

  Epidemiology Study
  Probit Procedure

  Class Level Information

  Name       Levels    Values
  treat           2    A B
  sex             2    0 1

Output 60.4.1 displays the table of level information for all classification variables in the CLASS statement.

Output 60.4.2 displays the table of parameter information for the effects in the MODEL statement. The name of a parameter is generated by combining the variable names and level names in the effect. The maximum length of a parameter name is 32. The names of the effects are specified in the MODEL statement. The length of effect names can be specified by the NAMELEN= option in the PROC PROBIT statement; the default length is 20.

Output 60.4.2: Parameter Information

  Epidemiology Study
  Probit Procedure

  Parameter Information

  Parameter     Effect       treat    sex
  Intercept     Intercept
  dose          dose
  treatA        treat        A
  treatB        treat        B
  sex0          sex                   0
  sex1          sex                   1
  treatAsex0    treat*sex    A        0
  treatAsex1    treat*sex    A        1
  treatBsex0    treat*sex    B        0
  treatBsex1    treat*sex    B        1

Output 60.4.3 displays background information about the model fit. Included are the name of the input data set, the response variables used, and the number of observations, events, and trials. The table also includes the convergence status of the model-fitting algorithm and the final value of the log-likelihood function.

Output 60.4.3: Model Information

  Epidemiology Study
  Probit Procedure

  Model Information

  Data Set                  WORK.EPIDEMIC
  Events Variable                       r
  Trials Variable                       n
  Number of Observations               10
  Number of Events                   1011
  Number of Trials                   1204
  Name of Distribution             Normal
  Log Likelihood             -387.2467391

  Algorithm converged.

Output 60.4.4: Goodness-of-Fit Tests and Response-Covariate Profile

  Epidemiology Study
  Probit Procedure

  Goodness-of-Fit Tests

  Statistic               Value    DF    Pr > ChiSq
  Pearson Chi-Square     4.9317     4        0.2944
  L.R.    Chi-Square     5.7079     4        0.2220

  Response-Covariate Profile

  Response Levels                2
  Number of Covariate Values    10

Output 60.4.4 displays the table of goodness-of-fit tests requested with the LACKFIT option in the PROC PROBIT statement. Two goodness-of-fit statistics, the Pearson chi-square statistic and the likelihood ratio chi-square statistic, are computed. The grouping method for computing these statistics can be specified by the AGGREGATE= option. The details can be found in the description of the AGGREGATE= option, and an example can be found in the second part of this example. By default, the PROBIT procedure uses the covariates in the MODEL statement to do grouping. Observations with the same values of the covariates in the MODEL statement are grouped into cells, and the two statistics are computed according to these cells. The total number of cells and the number of levels for the response variable are reported next in the Response-Covariate Profile.

In this example, neither the Pearson chi-square test nor the likelihood ratio chi-square test is significant at the 0.1 level, which is the default test level used by the PROBIT procedure. That means that the model, which includes the interaction of treat and sex, is suitable for this epidemiology data set. (Further investigation shows that models without the interaction of treat and sex are not acceptable by either test.)
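The comparison against the 0.1 level can be retraced from the reported statistics. In the Python sketch below, the degrees of freedom are assumed to be the number of cells minus the number of estimated parameters (including the natural threshold), which is consistent with the value in the table, and the upper-tail probability uses the closed form that applies to a chi-square distribution with 4 degrees of freedom. The helper name `chi2_sf_df4` is illustrative.

```python
import math

n_cells = 10   # Number of Covariate Values
n_params = 6   # Intercept, dose, treat A, sex 0, treat*sex A 0, _C_
df = n_cells - n_params   # assumed rule, matches DF in the table


def chi2_sf_df4(x):
    """Upper-tail probability for a chi-square variable with 4 df."""
    return math.exp(-x / 2) * (1 + x / 2)


level = 0.1
pearson_p = chi2_sf_df4(4.9317)
lr_p = chi2_sf_df4(5.7079)

print(df, round(pearson_p, 4), round(lr_p, 4))
print(pearson_p > level and lr_p > level)
```

Both p-values exceed the test level, so neither statistic gives evidence of lack of fit.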

Output 60.4.5: Type III Tests

  Epidemiology Study
  Probit Procedure

  Type III Analysis of Effects

                          Wald
  Effect       DF    Chi-Square    Pr > ChiSq
  dose          1       42.1691        <.0001
  treat         1       16.1421        <.0001
  sex           1        1.7710        0.1833
  treat*sex     1       13.9343        0.0002

Output 60.4.5 displays the Type III test results for all effects specified in the MODEL statement, which include the degrees of freedom for the effect, the Wald chi-square test statistic, and the p-value.

Output 60.4.6 displays the table of parameter estimates for the model. The PROBIT procedure displays information for all the parameters of an effect. Degenerate parameters are indicated by zero degrees of freedom. Confidence intervals are computed for all parameters with nonzero degrees of freedom, including the natural threshold C if the OPTC option is specified in the PROC PROBIT statement. The confidence level can be specified by the ALPHA= option in the MODEL statement. The default confidence level is 95%.

Output 60.4.6: Analysis of Parameter Estimates

  Epidemiology Study
  Probit Procedure

  Analysis of Parameter Estimates

                               Standard    95% Confidence      Chi-
  Parameter       DF  Estimate    Error        Limits         Square  Pr > ChiSq
  Intercept        1   -0.8871   0.3632   -1.5991  -0.1752      5.96      0.0146
  dose             1    1.6774   0.2583    1.1711   2.1837     42.17      <.0001
  treat     A      1   -1.2537   0.2616   -1.7664  -0.7410     22.97      <.0001
  treat     B      0    0.0000   0.0000    0.0000   0.0000       .         .
  sex       0      1   -0.4633   0.2289   -0.9119  -0.0147      4.10      0.0429
  sex       1      0    0.0000   0.0000    0.0000   0.0000       .         .
  treat*sex A 0    1    1.2899   0.3456    0.6126   1.9672     13.93      0.0002
  treat*sex A 1    0    0.0000   0.0000    0.0000   0.0000       .         .
  treat*sex B 0    0    0.0000   0.0000    0.0000   0.0000       .         .
  treat*sex B 1    0    0.0000   0.0000    0.0000   0.0000       .         .
  _C_              1    0.2735   0.0946    0.0881   0.4589

From this table, you can see the following results:

dose has a significant positive effect on the survival rate.
Individuals under treatment A have a lower survival rate.
Male individuals have a higher survival rate.
Female individuals under treatment A have a higher survival rate.
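The confidence limits and chi-square statistics in Output 60.4.6 follow from each estimate and its standard error. The following DATA step is a minimal sketch for the dose parameter only; the variable names are illustrative, and the limits are assumed to be the usual normal-theory limits (estimate plus or minus a normal quantile times the standard error), which is consistent with the tabulated values:

```sas
/* Sketch: Wald limits and chi-square for the dose parameter.  */
/* Inputs are copied from Output 60.4.6; names are illustrative. */
data _null_;
   estimate = 1.6774;                 /* dose estimate            */
   se       = 0.2583;                 /* its standard error       */
   alpha    = 0.05;                   /* default: 95% limits      */
   z        = probit(1 - alpha/2);    /* standard normal quantile */
   lower    = estimate - z*se;
   upper    = estimate + z*se;
   chisq    = (estimate/se)**2;       /* 1-DF Wald chi-square     */
   pvalue   = 1 - probchi(chisq, 1);
   put lower= 8.4 upper= 8.4 chisq= 8.2 pvalue= pvalue6.4;
run;
```

The results should be close to the tabulated limits (1.1711, 2.1837) and chi-square (42.17). Because dose is a single-parameter effect that is not involved in the interaction, the same chi-square value also appears for dose in the Type III table.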

Output 60.4.7 and Output 60.4.8 display the estimated covariance matrix and the estimated correlation matrix, respectively, for the estimated parameters with nonzero degrees of freedom. They are computed from the inverse of the Hessian matrix of the estimated parameters.

Output 60.4.7: Estimated Covariance Matrix

  Epidemiology Study
  Probit Procedure

  Estimated Covariance Matrix

               Intercept        dose      treatA        sex0  treatAsex0         _C_
  Intercept     0.131944   -0.087353    0.053551    0.030285   -0.067056   -0.028073
  dose         -0.087353    0.066723   -0.047506   -0.034081    0.058620    0.018196
  treatA        0.053551   -0.047506    0.068425    0.036063   -0.075323   -0.017084
  sex0          0.030285   -0.034081    0.036063    0.052383   -0.063599   -0.008088
  treatAsex0   -0.067056    0.058620   -0.075323   -0.063599    0.119408    0.019134
  _C_          -0.028073    0.018196   -0.017084   -0.008088    0.019134    0.008948

Output 60.4.8: Estimated Correlation Matrix

  Epidemiology Study
  Probit Procedure

  Estimated Correlation Matrix

               Intercept        dose      treatA        sex0  treatAsex0         _C_
  Intercept     1.000000   -0.930998    0.563595    0.364284   -0.534227   -0.817027
  dose         -0.930998    1.000000   -0.703083   -0.576477    0.656744    0.744699
  treatA        0.563595   -0.703083    1.000000    0.602359   -0.833299   -0.690420
  sex0          0.364284   -0.576477    0.602359    1.000000   -0.804154   -0.373565
  treatAsex0   -0.534227    0.656744   -0.833299   -0.804154    1.000000    0.585364
  _C_          -0.817027    0.744699   -0.690420   -0.373565    0.585364    1.000000
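Each entry of the correlation matrix is the corresponding covariance divided by the product of the two standard errors. As a small illustrative check (the variable names are hypothetical), the Intercept and treatA entry can be recomputed from Output 60.4.7:

```sas
/* Sketch: correlation of two estimates from their covariance. */
/* Inputs are copied from Output 60.4.7; names are illustrative. */
data _null_;
   cov_int_trt = 0.053551;   /* Cov(Intercept, treatA) */
   var_int     = 0.131944;   /* Var(Intercept)         */
   var_trt     = 0.068425;   /* Var(treatA)            */
   corr = cov_int_trt / sqrt(var_int*var_trt);
   put corr= 8.6;
run;
```

The result should be about 0.5636, matching the tabulated 0.563595 up to rounding of the six-decimal inputs.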

Output 60.4.9 displays the computed values and fiducial limits for dose, the first single continuous variable in the MODEL statement, at the given probability levels. These values are computed without the effect of the natural threshold, and they are displayed when the INVERSECL option is specified in the MODEL statement. If there is no single continuous variable in the MODEL specification but the INVERSECL option is specified, an error is reported. If the XDATA= option is used to input a data set for the independent variables in the MODEL statement, the PROBIT procedure uses these values for the independent variables other than the single continuous variable. Missing values are not permitted in the XDATA= data set for the independent variables. Although the value of the single continuous variable is not used in computing the fiducial limits, a suitable valid value must still be given for it. In the data set xval created by the SAS statements on page 3784, dose = 2.

Output 60.4.9: Probit Analysis on Dose

  Epidemiology Study
  Probit Procedure

  Probit Analysis on dose

  Probability        dose      95% Fiducial Limits
         0.01    -0.85801    -1.81301    -0.33743
         0.02    -0.69549    -1.58167    -0.21116
         0.03    -0.59238    -1.43501    -0.13093
         0.04    -0.51482    -1.32476    -0.07050
         0.05    -0.45172    -1.23513    -0.02130
         0.06    -0.39802    -1.15888     0.02063
         0.07    -0.35093    -1.09206     0.05742
         0.08    -0.30877    -1.03226     0.09039
         0.09    -0.27043    -0.97790     0.12040
         0.10    -0.23513    -0.92788     0.14805
         0.15    -0.08900    -0.72107     0.26278
         0.20     0.02714    -0.55706     0.35434
         0.25     0.12678    -0.41669     0.43322
         0.30     0.21625    -0.29095     0.50437
         0.35     0.29917    -0.17477     0.57064
         0.40     0.37785    -0.06487     0.63387
         0.45     0.45397     0.04104     0.69546
         0.50     0.52888     0.14481     0.75654
         0.55     0.60380     0.24800     0.81819
         0.60     0.67992     0.35213     0.88157
         0.65     0.75860     0.45879     0.94803
         0.70     0.84151     0.56985     1.01942
         0.75     0.93099     0.68770     1.09847
         0.80     1.03063     0.81571     1.18970
         0.85     1.14677     0.95926     1.30171
         0.90     1.29290     1.12867     1.45386
         0.91     1.32819     1.16747     1.49273
         0.92     1.36654     1.20867     1.53590
         0.93     1.40870     1.25284     1.58450
         0.94     1.45579     1.30084     1.64012
         0.95     1.50949     1.35397     1.70515
         0.96     1.57258     1.41443     1.78353
         0.97     1.65015     1.48626     1.88238
         0.98     1.75326     1.57833     2.01720
         0.99     1.91577     1.71776     2.23537

See the section XDATA= SAS-data-set on page 3763 for the default values for those effects other than the single continuous variable, for which the fiducial limits are computed.

In this example, there are two classification variables, treat and sex . Fiducial limits for the dose variable are computed for the highest level of the classification variables, treat = B and sex = 1, which is the default specification. Since these are the default values, you would get the same values and fiducial limits if you did not specify the XDATA= option in this example. The confidence level for the fiducial limits can be specified by the ALPHA= option in the MODEL statement. The default level is 95%.
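At the reference levels treat = B and sex = 1, the classification effects contribute nothing to the linear predictor, so the dose value for a given probability reduces to inverting the probit line in Intercept and dose. The following sketch reproduces only the point estimates in the dose column, not the fiducial limits. It takes the Intercept estimate as negative (-0.88714), which is what the interpretation of the parameter estimates implies, and the variable names are illustrative:

```sas
/* Sketch: dose values for selected probabilities at treat=B,  */
/* sex=1, ignoring the natural threshold. Point estimates only; */
/* the fiducial limits require the covariance matrix as well.  */
data _null_;
   b0 = -0.88714;                      /* Intercept estimate (sign assumed) */
   b1 =  1.67739;                      /* dose estimate                     */
   do p = 0.01, 0.50, 0.99;
      dose = (probit(p) - b0) / b1;    /* solve probit(p) = b0 + b1*dose    */
      put p= 4.2 dose= 8.5;
   end;
run;
```

The three values should be close to the dose column of Output 60.4.9 at probabilities 0.01, 0.50, and 0.99, with the value at 0.01 being negative.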

If a LOG10 or LOG option is used in the PROC PROBIT statement, the values and the fiducial limits are computed for both the single continuous variable and its logarithm.

Output 60.4.10 displays the OUTEST= data set. All parameters for an effect are included.

Output 60.4.10: Outest Data Set for Epidemiology Study

  Obs  _MODEL_  _NAME_      _TYPE_  _DIST_  _STATUS_      _LNLIKE_         r    Intercept
    1           r           PARMS   Normal  0 Converged   -387.247  -1.00000     -0.88714
    2           Intercept   COV     Normal  0 Converged   -387.247  -0.88714      0.13194
    3           dose        COV     Normal  0 Converged   -387.247   1.67739     -0.08735
    4           treatA      COV     Normal  0 Converged   -387.247  -1.25367      0.05355
    5           treatB      COV     Normal  0 Converged   -387.247   0.00000      0.00000
    6           sex0        COV     Normal  0 Converged   -387.247  -0.46329      0.03029
    7           sex1        COV     Normal  0 Converged   -387.247   0.00000      0.00000
    8           treatAsex0  COV     Normal  0 Converged   -387.247   1.28991     -0.06706
    9           treatAsex1  COV     Normal  0 Converged   -387.247   0.00000      0.00000
   10           treatBsex0  COV     Normal  0 Converged   -387.247   0.00000      0.00000
   11           treatBsex1  COV     Normal  0 Converged   -387.247   0.00000      0.00000
   12           _C_         COV     Normal  0 Converged   -387.247   0.27347     -0.02807

                            treat                     treat     treat  treat  treat
  Obs      dose    treatA     B        sex0   sex1    Asex0     Asex1  Bsex0  Bsex1       _C_
    1   1.67739  -1.25367     0    -0.46329     0    1.28991      0      0      0     0.27347
    2  -0.08735   0.05355     0     0.03029     0   -0.06706      0      0      0    -0.02807
    3   0.06672  -0.04751     0    -0.03408     0    0.05862      0      0      0     0.01820
    4  -0.04751   0.06843     0     0.03606     0   -0.07532      0      0      0    -0.01708
    5   0.00000   0.00000     0     0.00000     0    0.00000      0      0      0     0.00000
    6  -0.03408   0.03606     0     0.05238     0   -0.06360      0      0      0    -0.00809
    7   0.00000   0.00000     0     0.00000     0    0.00000      0      0      0     0.00000
    8   0.05862  -0.07532     0    -0.06360     0    0.11941      0      0      0     0.01913
    9   0.00000   0.00000     0     0.00000     0    0.00000      0      0      0     0.00000
   10   0.00000   0.00000     0     0.00000     0    0.00000      0      0      0     0.00000
   11   0.00000   0.00000     0     0.00000     0    0.00000      0      0      0     0.00000
   12   0.01820  -0.01708     0    -0.00809     0    0.01913      0      0      0     0.00895

The following three outputs, Output 60.4.11, Output 60.4.12, and Output 60.4.13, are generated from the three plot statements. The first plot, specified with the PREDPPLOT statement, is the plot of the predicted probability against the single continuous variable dose, which is specified by the VAR= option in the PREDPPLOT statement. This single continuous variable must be in the MODEL statement. If the VAR= option is not used, the first single continuous variable in the MODEL statement is used. In this example, you would get the same plot if the VAR=dose option were not specified in the PREDPPLOT statement. You can specify values of the other independent variables in the MODEL statement by using an XDATA= data set or by using the default values.

Output 60.4.11: Predicted Probability Plot

Output 60.4.12: Inverse Predicted Probability Plot

Output 60.4.13: Linear Predictor Plot

The second plot, specified with the IPPPLOT statement, is the inverse of the predicted probability plot with the fiducial limits. Note that the fiducial limits are not simply the inverse of the confidence limits in the predicted probability plot; see the section Inverse Confidence Limits on page 3761 for the computation of these limits. The third plot, specified with the LPREDPLOT statement, is the plot of the linear predictor x′β against the first single continuous variable (or the single continuous variable specified by the VAR= option) with the Wald confidence intervals.

After each plot statement, an optional INSET statement is used to draw a box within the plot (inset box). In the inset box, information about the model fitting can be specified. See INSET Statement on page 3723 for more detail.

By combining the INEST= data set with the MAXIT= option in the MODEL statement, the PROBIT procedure can perform prediction, provided that the parameterizations of the models used for the training data and the validation data are exactly the same.

After the first invocation of PROC PROBIT, you have the estimated parameters and their covariance matrix in the data set OUTEST = Out1 , and the fitted probabilities for the training data set epidemic in the data set OUTPUT = Out2 . See Output 60.4.10 on page 3791 for the data set Out1 and Output 60.4.14 on page 3795 for the data set Out2 .

Output 60.4.14: Out2

  Obs   treat    dose      n      r    sex          p
    1     A      2.17    142    142     0     0.99272
    2     A      0.57    132     47     1     0.35925
    3     A      1.68    128    105     1     0.81899
    4     A      1.08    126    100     0     0.77517
    5     A      1.79    125    118     0     0.96682
    6     B      1.66    117    115     1     0.97901
    7     B      1.49    127    114     0     0.90896
    8     B      1.17     51     44     1     0.89749
    9     B      2.00    127    126     0     0.98364
   10     B      0.80    129    100     1     0.76414

The validation data are collected in data set validate . The second invocation of PROC PROBIT simply passes the estimated parameters from the training data set epidemic to the validation data set validate for prediction. The predicted probabilities are stored in the data set OUTPUT = Out3 (see Output 60.4.15 on page 3795). The third invocation of PROC PROBIT passes the estimated parameters as initial values for a new fit of the validation data set using the same model. Predicted probabilities are stored in the data set OUTPUT = Out4 (see Output 60.4.16 on page 3795). Goodness-of-Fit tests are computed based on the cells grouped by the AGGREGATE= group variable. Results are shown in Output 60.4.17 on page 3796.

  data validate;
     input treat $ dose sex n r group;
     datalines;
  B  2.0  0  44 43  1
  B  2.0  1  54 52  2
  B  1.5  1  36 32  3
  B  1.5  0  45 40  4
  A  2.0  0  66 64  5
  A  2.0  1  89 89  6
  A  1.5  1  45 39  7
  A  1.5  0  66 60  8
  B  2.0  0  44 44  1
  B  2.0  1  54 54  2
  B  1.5  1  36 30  3
  B  1.5  0  45 41  4
  A  2.0  0  66 65  5
  A  2.0  1  89 88  6
  A  1.5  1  45 38  7
  A  1.5  0  66 59  8
  ;

  proc probit optc data=validate inest=out1;
     class treat sex;
     model r/n = dose treat sex sex*treat / maxit=0;
     output out=out3 p=p;
  run;

  proc probit optc lackfit data=validate inest=out1;
     class treat sex;
     model r/n = dose treat sex sex*treat / aggregate=group;
     output out=out4 p=p;
  run;

Output 60.4.15: Out3

  Obs   treat    dose    sex     n     r    group          p
    1     B       2.0     0     44    43      1      0.98364
    2     B       2.0     1     54    52      2      0.99506
    3     B       1.5     1     36    32      3      0.96247
    4     B       1.5     0     45    40      4      0.91145
    5     A       2.0     0     66    64      5      0.98500
    6     A       2.0     1     89    89      6      0.91835
    7     A       1.5     1     45    39      7      0.74300
    8     A       1.5     0     66    60      8      0.91666
    9     B       2.0     0     44    44      1      0.98364
   10     B       2.0     1     54    54      2      0.99506
   11     B       1.5     1     36    30      3      0.96247
   12     B       1.5     0     45    41      4      0.91145
   13     A       2.0     0     66    65      5      0.98500
   14     A       2.0     1     89    88      6      0.91835
   15     A       1.5     1     45    38      7      0.74300
   16     A       1.5     0     66    59      8      0.91666
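Because MAXIT=0 leaves the INEST= estimates unchanged, the probabilities in Out3 can be checked by hand from the PARMS row of Out1. The following sketch assumes the threshold model p = C + (1 - C)Φ(x′β), where Φ is the standard normal distribution function, and takes the Intercept, treat A, and sex 0 estimates as negative, which is what the interpretation of the parameter estimates implies. The variable names are illustrative:

```sas
/* Sketch: predicted probabilities for two validation rows,    */
/* using the estimates from Out1 (signs assumed as noted).     */
data _null_;
   b0       = -0.88714;   /* Intercept          */
   b_dose   =  1.67739;   /* dose               */
   b_trtA   = -1.25367;   /* treat A            */
   b_sex0   = -0.46329;   /* sex 0              */
   b_trtA_0 =  1.28991;   /* treat A * sex 0    */
   c        =  0.27347;   /* natural threshold  */

   /* observation 1: treat=B, sex=0, dose=2.0 */
   xbeta1 = b0 + b_dose*2.0 + b_sex0;
   p1     = c + (1 - c)*probnorm(xbeta1);

   /* observation 5: treat=A, sex=0, dose=2.0 */
   xbeta5 = b0 + b_dose*2.0 + b_trtA + b_sex0 + b_trtA_0;
   p5     = c + (1 - c)*probnorm(xbeta5);

   put p1= 8.5 p5= 8.5;
run;
```

The two values should be close to the p values listed for observations 1 and 5 of Out3 (0.98364 and 0.98500).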

Output 60.4.16: Out4

  Obs   treat    dose    sex     n     r    group          p
    1     B       2.0     0     44    43      1      0.98954
    2     B       2.0     1     54    52      2      0.98262
    3     B       1.5     1     36    32      3      0.86187
    4     B       1.5     0     45    40      4      0.90095
    5     A       2.0     0     66    64      5      0.98768
    6     A       2.0     1     89    89      6      0.98614
    7     A       1.5     1     45    39      7      0.88075
    8     A       1.5     0     66    60      8      0.88964
    9     B       2.0     0     44    44      1      0.98954
   10     B       2.0     1     54    54      2      0.98262
   11     B       1.5     1     36    30      3      0.86187
   12     B       1.5     0     45    41      4      0.90095
   13     A       2.0     0     66    65      5      0.98768
   14     A       2.0     1     89    88      6      0.98614
   15     A       1.5     1     45    38      7      0.88075
   16     A       1.5     0     66    59      8      0.88964

Output 60.4.17: Goodness-of-Fit Table

  Probit Procedure

  Goodness-of-Fit Tests

  Statistic              Value    DF    Pr > ChiSq
  Pearson Chi-Square    2.8101     2        0.2454
  L.R.    Chi-Square    2.8080     2        0.2456