Examples | SAS/STAT 9.1 User's Guide, Volume 3

Example 39.1. Motorette Failure

This example fits a Weibull model and a lognormal model to the motorette failure data given in Kalbfleisch and Prentice (1980, p. 5). An output data set called models is specified to contain the parameter estimates. By default, the procedure uses the natural log of the variable time as the response. After this log transformation, the Weibull model is fit using the extreme value baseline distribution, and the lognormal model is fit using the normal baseline distribution.

Since the extreme value and normal distributions do not contain any shape parameters, the variable SHAPE1 is missing in the models data set. An additional output data set, out, is created that contains the predicted quantiles and their standard errors for values of the covariate corresponding to temp=130 and temp=150. This is done with the control variable, which is set to 1 for only two observations.

Using the standard error estimates obtained from the output data set, approximate 90% confidence limits for the predicted quantiles are then created in a subsequent DATA step on the log scale. The logs of the predicted values are computed because the values of the P= variable in the OUT= data set are in the same units as the original response variable, time. The standard errors of the quantiles of log(time) are approximated (using a first-order Taylor series expansion) by the standard error of the predicted quantile of time divided by the predicted quantile itself, which is what the assignment stde=std/predtime computes. These confidence limits are then converted back to the original scale by the exponential function. The following statements produce Output 39.1.1 through Output 39.1.5.
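The Taylor series step can be written out as follows. This is a sketch of the standard delta-method argument, with $\hat{t}_p$ denoting the predicted quantile (the variable predtime) and $\operatorname{se}(\hat{t}_p)$ its standard error (the variable std); the notation is introduced here and is not part of the original text.

```latex
\operatorname{Var}\!\left(\log \hat{t}_p\right)
  \approx \left(\left.\frac{d \log t}{dt}\right|_{t=\hat{t}_p}\right)^{2}
          \operatorname{Var}\!\left(\hat{t}_p\right)
  = \frac{\operatorname{Var}(\hat{t}_p)}{\hat{t}_p^{\,2}}
\quad\Longrightarrow\quad
\operatorname{se}\!\left(\log \hat{t}_p\right) \approx \frac{\operatorname{se}(\hat{t}_p)}{\hat{t}_p},
\qquad
\text{limits} = \exp\!\left(\log \hat{t}_p \pm 1.64\,\frac{\operatorname{se}(\hat{t}_p)}{\hat{t}_p}\right)
```

For the first row of Output 39.1.5, 5999.85/16519.27 gives stde of about 0.3632 and log(16519.27) is about 9.7123, so the limits are exp(9.7123 - 0.5956) and exp(9.7123 + 0.5956), which agree with the printed lower and upper values (9105.47 and 29969.51) up to rounding.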

   title 'Motorette Failures With Operating Temperature as a Covariate';

   data motors;
      input time censor temp @@;
      /* On the first pass, write two extra rows (temp=130 and temp=150)  */
      /* with missing time and control=1; they are used only to request   */
      /* predicted quantiles.                                             */
      if _N_=1 then
         do;
            temp=130;
            time=.;
            control=1;
            z=1000/(273.2+temp);
            output;
            temp=150;
            time=.;
            control=1;
            z=1000/(273.2+temp);
            output;
         end;
      if temp>150;              /* subsetting IF: keep the 170, 190, 220 runs */
      control=0;
      z=1000/(273.2+temp);
      output;
      datalines;
   8064 0 150 8064 0 150 8064 0 150 8064 0 150 8064 0 150
   8064 0 150 8064 0 150 8064 0 150 8064 0 150 8064 0 150
   1764 1 170 2772 1 170 3444 1 170 3542 1 170 3780 1 170
   4860 1 170 5196 1 170 5448 0 170 5448 0 170 5448 0 170
    408 1 190  408 1 190 1344 1 190 1344 1 190 1440 1 190
   1680 0 190 1680 0 190 1680 0 190 1680 0 190 1680 0 190
    408 1 220  408 1 220  504 1 220  504 1 220  504 1 220
    528 0 220  528 0 220  528 0 220  528 0 220  528 0 220
   ;

   proc print data=motors;
   run;

   /* Model A: Weibull (the default distribution) */
   proc lifereg data=motors outest=modela covout;
      a: model time*censor(0)=z;
      output out=outa quantiles=.1 .5 .9 std=std p=predtime
             control=control;
   run;

   /* Model B: lognormal */
   proc lifereg data=motors outest=modelb covout;
      b: model time*censor(0)=z / dist=lnormal;
      output out=outb quantiles=.1 .5 .9 std=std p=predtime
             control=control;
   run;

   data models;
      set modela modelb;
   run;

   proc print data=models;
      id _model_;
      title 'fitted models';
   run;

   data out;
      set outa outb;
   run;

   /* Approximate 90% limits, computed on the log scale and transformed back */
   data out1;
      set out;
      ltime=log(predtime);
      stde=std/predtime;
      upper=exp(ltime+1.64*stde);
      lower=exp(ltime-1.64*stde);
   run;

   proc print;
      id temp;
      title 'quantile estimates and confidence limits';
   run;

Output 39.1.1: Motorette Failure Data

   Motorette Failures With Operating Temperature as a Covariate

   Obs    time    censor    temp    control       z
     1       .       0       130       1       2.48016
     2       .       0       150       1       2.36295
     3    1764       1       170       0       2.25632
     4    2772       1       170       0       2.25632
     5    3444       1       170       0       2.25632
     6    3542       1       170       0       2.25632
     7    3780       1       170       0       2.25632
     8    4860       1       170       0       2.25632
     9    5196       1       170       0       2.25632
    10    5448       0       170       0       2.25632
    11    5448       0       170       0       2.25632
    12    5448       0       170       0       2.25632
    13     408       1       190       0       2.15889
    14     408       1       190       0       2.15889
    15    1344       1       190       0       2.15889
    16    1344       1       190       0       2.15889
    17    1440       1       190       0       2.15889
    18    1680       0       190       0       2.15889
    19    1680       0       190       0       2.15889
    20    1680       0       190       0       2.15889
    21    1680       0       190       0       2.15889
    22    1680       0       190       0       2.15889
    23     408       1       220       0       2.02758
    24     408       1       220       0       2.02758
    25     504       1       220       0       2.02758
    26     504       1       220       0       2.02758
    27     504       1       220       0       2.02758
    28     528       0       220       0       2.02758
    29     528       0       220       0       2.02758
    30     528       0       220       0       2.02758
    31     528       0       220       0       2.02758
    32     528       0       220       0       2.02758

Output 39.1.2: Motorette Failure, Model A

   The LIFEREG Procedure

   Model Information

   Data Set                     WORK.MOTORS
   Dependent Variable             Log(time)
   Censoring Variable                censor
   Censoring Value(s)                     0
   Number of Observations                30
   Noncensored Values                    17
   Right Censored Values                 13
   Left Censored Values                   0
   Interval Censored Values               0
   Missing Values                         2
   Name of Distribution             Weibull
   Log Likelihood              -22.95148315

   Type III Analysis of Effects

                     Wald
   Effect    DF    Chi-Square    Pr > ChiSq
   z          1       99.5239        <.0001

   Analysis of Parameter Estimates

                                Standard     95% Confidence       Chi-
   Parameter      DF  Estimate     Error         Limits          Square  Pr > ChiSq
   Intercept       1  -11.8912    1.9655  -15.7435   -8.0389      36.60      <.0001
   z               1    9.0383    0.9060    7.2626   10.8141      99.52      <.0001
   Scale           1    0.3613    0.0795    0.2347    0.5561
   Weibull Shape   1    2.7679    0.6091    1.7982    4.2605

Output 39.1.3: Motorette Failure, Model B

   The LIFEREG Procedure

   Model Information

   Data Set                     WORK.MOTORS
   Dependent Variable             Log(time)
   Censoring Variable                censor
   Censoring Value(s)                     0
   Number of Observations                30
   Noncensored Values                    17
   Right Censored Values                 13
   Left Censored Values                   0
   Interval Censored Values               0
   Missing Values                         2
   Name of Distribution           Lognormal
   Log Likelihood              -24.47381031

   Type III Analysis of Effects

                     Wald
   Effect    DF    Chi-Square    Pr > ChiSq
   z          1       42.0001        <.0001

   Analysis of Parameter Estimates

                                Standard     95% Confidence       Chi-
   Parameter      DF  Estimate     Error         Limits          Square  Pr > ChiSq
   Intercept       1  -10.4706    2.7719  -15.9034   -5.0377      14.27      0.0002
   z               1    8.3221    1.2841    5.8052   10.8389      42.00      <.0001
   Scale           1    0.6040    0.1107    0.4217    0.8652

Output 39.1.4: Motorette Failure, Fitted Models

   fitted models

   _MODEL_  _NAME_     _TYPE_  _DIST_     _STATUS_     _LNLIKE_      time   Intercept         z    _SCALE_
   A        time       PARMS   Weibull    0 Converged  -22.9515   -1.0000    -11.8912   9.03834    0.36128
   A        Intercept  COV     Weibull    0 Converged  -22.9515  -11.8912      3.8632  -1.77878    0.03448
   A        z          COV     Weibull    0 Converged  -22.9515    9.0383     -1.7788   0.82082   -0.01488
   A        Scale      COV     Weibull    0 Converged  -22.9515    0.3613      0.0345  -0.01488    0.00632
   B        time       PARMS   Lognormal  0 Converged  -24.4738   -1.0000    -10.4706   8.32208    0.60403
   B        Intercept  COV     Lognormal  0 Converged  -24.4738  -10.4706      7.6835  -3.55566    0.03267
   B        z          COV     Lognormal  0 Converged  -24.4738    8.3221     -3.5557   1.64897   -0.01285
   B        Scale      COV     Lognormal  0 Converged  -24.4738    0.6040      0.0327  -0.01285    0.01226

Output 39.1.5: Motorette Failure, Quantile Estimates and Confidence Limits

   quantile estimates and confidence limits

   temp  time  censor  control     z      _PROB_  predtime       std     ltime     stde      upper      lower
    130    .      0       1     2.48016    0.1    16519.27   5999.85    9.7123  0.36320   29969.51    9105.47
    130    .      0       1     2.48016    0.5    32626.65   9874.33   10.3929  0.30265   53595.71   19861.63
    130    .      0       1     2.48016    0.9    50343.22  15044.35   10.8266  0.29884   82183.49   30838.80
    150    .      0       1     2.36295    0.1     5726.74   1569.34    8.6529  0.27404    8976.12    3653.64
    150    .      0       1     2.36295    0.5    11310.68   2299.92    9.3335  0.20334   15787.62    8103.28
    150    .      0       1     2.36295    0.9    17452.49   3629.28    9.7672  0.20795   24545.37   12409.24
    130    .      0       1     2.48016    0.1    12033.19   5482.34    9.3954  0.45560   25402.68    5700.09
    130    .      0       1     2.48016    0.5    26095.68  11359.45   10.1695  0.43530   53285.36   12779.95
    130    .      0       1     2.48016    0.9    56592.19  26036.90   10.9436  0.46008  120349.65   26611.42
    150    .      0       1     2.36295    0.1     4536.88   1443.07    8.4200  0.31808    7643.71    2692.83
    150    .      0       1     2.36295    0.5     9838.86   2901.15    9.1941  0.29487   15957.38    6066.36
    150    .      0       1     2.36295    0.9    21336.97   7172.34    9.9682  0.33615   37029.72   12294.62
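The first six rows of Output 39.1.5 (the Weibull fit) can be checked by hand from the estimates in Output 39.1.2: the p-th quantile of log(time) is Intercept + (coefficient of z) * z + Scale * w, where w = log(-log(1-p)) is the p-th quantile of the standard extreme value distribution. The following DATA step is a minimal sketch of that check and is not part of the original program. The data set name check and its variable names are made up for illustration, and the estimates are typed in from the printed output, so the results should agree with the ltime and predtime columns only up to rounding.

```sas
data check;
   /* Estimates copied by hand from Output 39.1.2 (Weibull fit) */
   intercept = -11.8912;
   beta_z    =   9.0383;
   scale     =   0.3613;
   do temp = 130, 150;
      z = 1000 / (273.2 + temp);        /* same covariate as in data set motors */
      do p = 0.1, 0.5, 0.9;
         w        = log(-log(1 - p));   /* standard extreme value quantile */
         ltime    = intercept + beta_z*z + scale*w;
         predtime = exp(ltime);
         output;
      end;
   end;
run;

proc print data=check noobs;
   var temp z p ltime predtime;
run;
```

For the lognormal rows, the same sketch would use the estimates from Output 39.1.3 and replace w with the standard normal quantile, probit(p).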

Example 39.2. Computing Predicted Values for a Tobit Model

The LIFEREG procedure can be used to perform a Tobit analysis. The Tobit model, described by Tobin (1958), is a regression model for left-censored data assuming a normally distributed error term. The model parameters are estimated by maximum likelihood. PROC LIFEREG provides estimates of the parameters of the distribution of the uncensored data. Refer to Greene (1993) and Maddala (1983) for a more complete discussion of censored normal data and related distributions. This example shows how you can use PROC LIFEREG and the DATA step to compute two of the three types of predicted values discussed there.

Consider a continuous random variable Y and a constant C. If you were to sample from the distribution of Y but discard values less than (greater than) C, the distribution of the remaining observations would be truncated on the left (right). If you were to sample from the distribution of Y and report values less than (greater than) C as C, the distribution of the sample would be left (right) censored.

The probability density function of the left-truncated random variable $Y'$ is given by

$$f_{Y'}(y) = \frac{f_Y(y)}{\Pr(Y > C)} \quad \text{for } y > C$$

where $f_Y(y)$ is the probability density function of Y. PROC LIFEREG cannot compute the proper likelihood function to estimate parameters or predicted values for a truncated distribution.

Suppose the model being fit is specified as follows:

$$Y_i^* = x_i'\beta + \epsilon_i$$

where $\epsilon_i$ is a normal error term with zero mean and standard deviation $\sigma$.

Define the censored random variable $Y_i$ as

$$Y_i = \begin{cases} 0 & \text{if } Y_i^* \le 0 \\ Y_i^* & \text{if } Y_i^* > 0 \end{cases}$$

This is the Tobit model for left-censored normal data. $Y_i^*$ is sometimes called the latent variable. PROC LIFEREG estimates parameters of the distribution of $Y_i^*$ by maximum likelihood.

You can use the LIFEREG procedure to compute predicted values based on the mean functions of the latent and observed variables. The mean of the latent variable $Y_i^*$ is $x_i'\beta$, and you can compute values of the mean for different settings of $x_i$ by specifying XBETA=variable-name in an OUTPUT statement. Estimates of $x_i'\beta$ for each observation are written to the OUT= data set. Predicted values of the observed variable $Y_i$ can be computed based on the mean

$$E(Y_i) = \Phi\!\left(\frac{x_i'\beta}{\sigma}\right)\left(x_i'\beta + \sigma\lambda_i\right)$$

where

$$\lambda_i = \frac{\phi(x_i'\beta/\sigma)}{\Phi(x_i'\beta/\sigma)}$$

and $\phi$ and $\Phi$ represent the normal probability density and cumulative distribution functions.
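The mean of the observed variable follows from splitting on whether the latent variable is positive. The derivation below is a sketch of that standard argument; the shorthand $u_i = x_i'\beta/\sigma$ is introduced here for brevity and does not appear in the original text.

```latex
\begin{aligned}
E(Y_i) &= \Pr(Y_i^* \le 0)\cdot 0 \;+\; \Pr(Y_i^* > 0)\,E\!\left(Y_i^* \mid Y_i^* > 0\right) \\
       &= \Phi(u_i)\left(x_i'\beta + \sigma\,\frac{\phi(u_i)}{\Phi(u_i)}\right)
        \;=\; \Phi(u_i)\,x_i'\beta + \sigma\,\phi(u_i),
        \qquad u_i = \frac{x_i'\beta}{\sigma}
\end{aligned}
```

As a numerical check against the first row of Output 39.2.2, take the printed values -2043.42 for the latent mean and 1582.870 for the scale. Then u is about -1.291, the normal CDF at u is about 0.0984, and the normal density at u is about 0.1734, so the censored mean is roughly 0.0984 * (-2043.42) + 1582.870 * 0.1734, or about 73. The printed value is 73.46; the small difference comes from rounding the intermediate quantities.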

Although the distribution of $\epsilon_i$ in the Tobit model is often assumed normal, you can use other distributions for the Tobit model in the LIFEREG procedure by specifying a distribution with the DISTRIBUTION= option in the MODEL statement. One distribution worth mentioning is the logistic distribution. For this distribution, the MLE has a bounded influence function with respect to the response variable, but not with respect to the design variables. If you believe your data have outliers in the response direction, you might try this distribution for a more robust estimation of the Tobit model.

With the logistic distribution, the predicted values of the observed variable $Y_i$ can be computed based on the mean

$$E(Y_i) = \sigma \ln\!\left(1 + \exp\!\left(x_i'\beta/\sigma\right)\right)$$

where $\sigma$ is the scale parameter of the logistic distribution.

The following table shows a subset of the Mroz (1987) data set. In these data, Hours is the number of hours the wife worked outside the household in a given year, Yrs_Ed is the years of education, and Yrs_Exp is the years of work experience. A Tobit model is fit to the hours worked, with years of education and experience as covariates.

Hours	Yrs_Ed	Yrs_Exp
0	8	9
0	8	12
0	9	10
0	10	15
0	11	4
0	11	6
1000	12	1
1960	12	29
0	13	3
2100	13	36
3686	14	11
1920	14	38
0	15	14
1728	16	3
1568	16	19
1316	17	7
0	17	15

If the wife was not employed (worked 0 hours), her hours worked are left-censored at zero. To accommodate left censoring in PROC LIFEREG, you need two variables to indicate the censoring status of observations. You can think of these variables as the lower and upper endpoints of interval censoring. If there is no censoring, set both variables to the observed value of Hours. To indicate left censoring, set the lower endpoint to missing and the upper endpoint to the censored value, zero in this case.

The following statements create a SAS data set with the variables Hours, Yrs_Ed, and Yrs_Exp from the preceding data. A new variable, Lower, is created such that Lower=. if Hours=0 and Lower=Hours if Hours>0.

   data subset;
      input Hours Yrs_Ed Yrs_Exp @@;
      /* Lower endpoint is missing for left-censored (zero-hour) records */
      if Hours eq 0
         then Lower=.;
         else Lower=Hours;
      datalines;
   0 8 9 0 8 12 0 9 10 0 10 15 0 11 4 0 11 6
   1000 12 1 1960 12 29 0 13 3 2100 13 36
   3686 14 11 1920 14 38 0 15 14 1728 16 3
   1568 16 19 1316 17 7 0 17 15
   ;

The following statements fit a normal regression model to the left-censored Hours data using Yrs_Ed and Yrs_Exp as covariates. You need the estimated standard deviation of the normal distribution to compute the predicted values of the censored distribution from the preceding formulas. The data set OUTEST contains the standard deviation estimate in a variable named _SCALE_. You also need estimates of $x_i'\beta$. These are contained in the data set OUT as the variable Xbeta.

   proc lifereg data=subset outest=OUTEST(keep=_scale_);
      model (lower, hours) = yrs_ed yrs_exp / d=normal;
      output out=OUT xbeta=Xbeta;
   run;

Output 39.2.1 shows the results of the model fit. These tables show parameter estimates for the uncensored, or latent variable, distribution.

Output 39.2.1: Parameter Estimates from PROC LIFEREG

   The LIFEREG Procedure

   Model Information

   Data Set                     WORK.SUBSET
   Dependent Variable                 Lower
   Dependent Variable                 Hours
   Number of Observations                17
   Noncensored Values                     8
   Right Censored Values                  0
   Left Censored Values                   9
   Interval Censored Values               0
   Name of Distribution              Normal
   Log Likelihood               -74.9369977

   Analysis of Parameter Estimates

                           Standard      95% Confidence        Chi-
   Parameter  DF  Estimate    Error          Limits           Square  Pr > ChiSq
   Intercept   1  -5598.64  2850.248  -11185.0   -12.2553       3.86      0.0495
   Yrs_Ed      1  373.1477  191.8872   -2.9442   749.2397       3.78      0.0518
   Yrs_Exp     1   63.3371   38.3632  -11.8533   138.5276       2.73      0.0987
   Scale       1  1582.870  442.6732  914.9433   2738.397

The following statements combine the two data sets created by PROC LIFEREG to compute predicted values for the censored distribution. The OUTEST= data set contains the estimate of the standard deviation from the uncensored distribution, and the OUT= data set contains estimates of $x_i'\beta$.

   data predict;
      drop lambda _scale_ _prob_;
      set out;
      if _n_ eq 1 then set outest;   /* read the scale estimate once; it is retained */
      lambda = pdf('NORMAL',Xbeta/_scale_)
               / cdf('NORMAL',Xbeta/_scale_);
      Predict = cdf('NORMAL', Xbeta/_scale_)
                * (Xbeta + _scale_*lambda);
      label Xbeta='MEAN OF UNCENSORED VARIABLE'
            Predict = 'MEAN OF CENSORED VARIABLE';
   run;

   proc print data=predict noobs label;
      var hours lower yrs: xbeta predict;
   run;

Output 39.2.2 shows the original variables, the predicted means of the uncensored distribution, and the predicted means of the censored distribution.

Output 39.2.2: Predicted Means from PROC LIFEREG

                                          MEAN OF       MEAN OF
                                        UNCENSORED      CENSORED
   Hours    Lower    Yrs_Ed    Yrs_Exp    VARIABLE      VARIABLE
       0        .         8          9    -2043.42         73.46
       0        .         8         12    -1853.41         94.23
       0        .         9         10    -1606.94        128.10
       0        .        10         15     -917.10        276.04
       0        .        11          4    -1240.67        195.76
       0        .        11          6    -1113.99        224.72
    1000     1000        12          1    -1057.53        238.63
    1960     1960        12         29      715.91       1052.94
       0        .        13          3     -557.71        391.42
    2100     2100        13         36     1532.42       1672.50
    3686     3686        14         11      322.14        805.58
    1920     1920        14         38     2032.24       2106.81
       0        .        15         14      885.30       1170.39
    1728     1728        16          3      561.74        951.69
    1568     1568        16         19     1575.13       1708.24
    1316     1316        17          7     1188.23       1395.61
       0        .        17         15     1694.93       1809.97
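The logistic alternative mentioned earlier in this example can be handled with the same two-step pattern. The following is a sketch only: it assumes the data set subset created above, the data set names OUTEST_L, OUT_L, and predict_l are made up for illustration, and the censored-mean expression is the closed form for a logistic latent variable left-censored at zero, scale * log(1 + exp(Xbeta/scale)). No output for this variant is given in the source, so none is shown here.

```sas
/* Tobit-type fit with a logistic error term instead of a normal one */
proc lifereg data=subset outest=OUTEST_L(keep=_scale_);
   model (lower, hours) = yrs_ed yrs_exp / d=logistic;
   output out=OUT_L xbeta=Xbeta;
run;

data predict_l;
   drop _scale_;
   set OUT_L;
   if _n_ eq 1 then set OUTEST_L;    /* scale estimate, retained across rows */
   /* mean of the logistic variable after left censoring at zero */
   Predict = _scale_ * log(1 + exp(Xbeta / _scale_));
   label Xbeta   = 'MEAN OF UNCENSORED VARIABLE'
         Predict = 'MEAN OF CENSORED VARIABLE';
run;

proc print data=predict_l noobs label;
   var hours lower yrs: xbeta predict;
run;
```

Comparing the Predict column from this step with Output 39.2.2 is one way to see how sensitive the predictions are to the assumed error distribution.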

Example 39.3. Overcoming Convergence Problems by Specifying Initial Values

This example illustrates the use of parameter initial value specification to help overcome convergence difficulties.

The following statements create a data set and request that a Weibull regression model be fit to the data.

   data raw;
      input censor x c1 @@;
      datalines;
   0 16 0.00   0 17 0.00   0 18 0.00
   0 17 0.04   0 18 0.04   0 18 0.04
   0 23 0.40   0 22 0.40   0 22 0.40
   0 33 4.00   0 34 4.00   0 35 4.00
   1 54 40.00  1 54 40.00  1 54 40.00
   1 54 400.00 1 54 400.00 1 54 400.00
   ;
   run;

   proc print;
   run;

   title 'OLS (default) initial values';
   proc lifereg data=raw;
      model x*censor(1) = c1 / distribution = weibull itprint;
   run;

Output 39.3.1 shows the data set contents.

Output 39.3.1: Contents of the Data Set

   Obs    censor     x        c1
     1       0      16      0.00
     2       0      17      0.00
     3       0      18      0.00
     4       0      17      0.04
     5       0      18      0.04
     6       0      18      0.04
     7       0      23      0.40
     8       0      22      0.40
     9       0      22      0.40
    10       0      33      4.00
    11       0      34      4.00
    12       0      35      4.00
    13       1      54     40.00
    14       1      54     40.00
    15       1      54     40.00
    16       1      54    400.00
    17       1      54    400.00
    18       1      54    400.00

Convergence was not attained in 50 iterations for this model, as the messages to the log indicate:

WARNING: Convergence was not attained in 50 iterations. You may want to increase the maximum number of iterations (MAXITER= option) or change the convergence criteria (CONVERGE = value) in the MODEL statement.
WARNING: The procedure is continuing in spite of the above warning. Results shown are based on the last maximum likelihood iteration. Validity of the model fit is questionable.

The first line (Iter=0) of the iteration history table, in Output 39.3.2, shows the default initial ordinary least squares (OLS) estimates of the parameters.
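The first remedy that the warning suggests is to raise the iteration limit with the MAXITER= option in the MODEL statement. The following is a minimal sketch of that option; the value 200 is an arbitrary choice for illustration, and the source does not report whether a larger limit is enough for these data. The example itself takes a different route and changes the initial values.

```sas
title 'OLS initial values, larger iteration limit';
proc lifereg data=raw;
   /* MAXITER= raises the limit of 50 iterations named in the warning */
   model x*censor(1) = c1 / distribution = weibull maxiter=200 itprint;
run;
```

Because ITPRINT is kept, the iteration history shows whether the log likelihood is still changing when the limit is reached.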

Output 39.3.2: Initial Least Squares

   OLS (default) initial values

   Iter   Ridge       Loglike       Intercept              c1           Scale
      0       0    -22.891088    3.2324769714    0.0020664542    0.3995754195

The log logistic distribution is more robust to large values of the response than the Weibull distribution, so one approach to improving the convergence performance is to fit a log logistic distribution and, if this converges, to use the resulting parameter estimates as initial values in a subsequent fit of a model with the Weibull distribution.

The following statements fit a log logistic distribution to the data.

   proc lifereg data=raw;
      model x*censor(1) = c1 / distribution = llogistic;
   run;

The algorithm converges, and the maximum likelihood estimates for the log logistic distribution are shown in Output 39.3.3.

Output 39.3.3: Estimates from the Log Logistic Distribution

   The LIFEREG Procedure

   Model Information

   Data Set                        WORK.RAW
   Dependent Variable                Log(x)
   Censoring Variable                censor
   Censoring Value(s)                     1
   Number of Observations                18
   Noncensored Values                    12
   Right Censored Values                  6
   Left Censored Values                   0
   Interval Censored Values               0
   Name of Distribution           LLogistic
   Log Likelihood              12.093136846

   Analysis of Parameter Estimates

                               Standard    95% Confidence       Chi-
   Parameter      DF  Estimate    Error        Limits          Square  Pr > ChiSq
   Intercept       1    2.8983   0.0318    2.8360    2.9606   8309.43      <.0001
   c1              1    0.1592   0.0133    0.1332    0.1852    143.85      <.0001
   Scale           1    0.0498   0.0122    0.0308    0.0804

The following statements re-fit the Weibull model using the maximum likelihood estimates from the log logistic fit as initial values.

   proc lifereg data=raw outest=outest;
      model x*censor(1) = c1 / itprint distribution = weibull
                               intercept=2.898 initial=0.16 scale=0.05;
      output out=out xbeta=xbeta;
   run;

Examination of the resulting output in Output 39.3.4 shows that the convergence problem has been solved by specifying different initial values.

The initial values can also be supplied through an INEST= data set. The following invocation of PROC LIFEREG, which uses an INEST= data set to provide starting values for the three parameters, is equivalent to the previous invocation.

   data in;
      input intercept c1 scale;
      datalines;
   2.898 0.16 0.05
   ;

   proc lifereg data=raw inest=in outest=outest;
      model x*censor(1) = c1 / itprint distribution = weibull;
      output out=out xbeta=xbeta;
   run;

Output 39.3.4: Final Estimates from the Weibull Distribution

   The LIFEREG Procedure

   Model Information

   Data Set                        WORK.RAW
   Dependent Variable                Log(x)
   Censoring Variable                censor
   Censoring Value(s)                     1
   Number of Observations                18
   Noncensored Values                    12
   Right Censored Values                  6
   Left Censored Values                   0
   Interval Censored Values               0
   Name of Distribution             Weibull
   Log Likelihood              11.232023272

   Algorithm converged.

   Analysis of Parameter Estimates

                               Standard    95% Confidence       Chi-
   Parameter      DF  Estimate    Error        Limits          Square  Pr > ChiSq
   Intercept       1    2.9699   0.0326    2.9059    3.0338   8278.86      <.0001
   c1              1    0.1435   0.0165    0.1111    0.1758     75.43      <.0001
   Scale           1    0.0844   0.0189    0.0544    0.1308
   Weibull Shape   1   11.8526   2.6514    7.6455   18.3749
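Besides reading the "Algorithm converged" note in the listing, the convergence status can be inspected in the OUTEST= data set that the preceding steps already create. The following sketch assumes that this data set has the same layout as the one printed in Output 39.1.4, where _STATUS_ reads "0 Converged" for a successful fit; the variable list is an assumption based on that output.

```sas
/* Print the status, log likelihood, and estimates saved by OUTEST=outest */
proc print data=outest noobs;
   var _dist_ _status_ _lnlike_ intercept c1 _scale_;
run;
```

Checking _STATUS_ this way is convenient when several candidate sets of initial values are tried and only the converged fits should be kept.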

Example 39.4. Analysis of Arbitrarily Censored Data with Interaction Effects

The following artificial data are for a study of the natural recovery time of mice after injection of a certain toxin. Twenty mice were grouped by sex (sex: 1 = Male, 2 = Female) into two groups of equal size. Their ages (in days) were recorded at the time of injection, and their recovery times (in minutes) were also recorded. Toxin density in the blood was used to decide whether a mouse had recovered. Mice were checked at two times for recovery. If a mouse had recovered at the first time, the observation is left-censored, and no further measurement is made; the variable time1 is set to missing and time2 is set to the measurement time to indicate left censoring. If a mouse had not recovered at the first time, it was checked later at a second time. If it had recovered by the second measurement time, the observation is interval-censored; time1 is set to the first measurement time and time2 is set to the second measurement time. If there was no recovery at the second measurement, the observation is right-censored; time1 is set to the second measurement time and time2 is set to missing to indicate right censoring. Observations in which time1 and time2 are equal are treated as uncensored recovery times.

The following statements create a SAS data set containing the data from the experiment and fit a Weibull model with age, sex, and age and sex interaction as covariates.

   title 'Natural Recovery Time';

   data mice;
      input sex age time1 time2 ;
      datalines;
   1  57  631   631
   1  45  .     170
   1  54  227   227
   1  43  143   143
   1  64  916   .
   1  67  691   705
   1  44  100   100
   1  59  730   .
   1  47  365   365
   1  74  1916  1916
   2  79  1326  .
   2  75  837   837
   2  84  1200  1235
   2  54  .     365
   2  74  1255  1255
   2  71  1823  .
   2  65  537   637
   2  33  583   683
   2  77  955   .
   2  46  577   577
   ;

   /* Covariate settings for the two probability plots */
   data xrow1;
      input sex age time1 time2 ;
      datalines;
   1  50  .  .
   ;

   data xrow2;
      input sex age time1 time2 ;
      datalines;
   2  60.6  .  .
   ;

   proc lifereg data=mice xdata=xrow1;
      class sex ;
      model (time1, time2) = age sex age*sex / dist=Weibull;
      probplot / nodata
                 font = swiss
                 plower=.5
                 vref(intersect) = 75
                 vreflab = '75 Percent'
                 vreflabpos = 2
                 cfit=blue
                 cframe=ligr
                 ;
      inset / cfill = white
              ctext = blue;
   run;

Standard output is shown in Output 39.4.1. Tables containing general model information, Type III tests for the main effects and interaction terms, and parameter estimates are created.

Output 39.4.1: Parameter Estimates for the Interaction Model

   Natural Recovery Time

   The LIFEREG Procedure

   Model Information

   Data Set                       WORK.MICE
   Dependent Variable            Log(time1)
   Dependent Variable            Log(time2)
   Number of Observations                20
   Noncensored Values                     9
   Right Censored Values                  5
   Left Censored Values                   2
   Interval Censored Values               4
   Name of Distribution             Weibull
   Log Likelihood              -25.91033295

   Type III Analysis of Effects

                      Wald
   Effect     DF    Chi-Square    Pr > ChiSq
   age         1       33.8496        <.0001
   sex         1       14.0245        0.0002
   age*sex     1       10.7196        0.0011

   Analysis of Parameter Estimates

                                  Standard    95% Confidence       Chi-
   Parameter         DF  Estimate    Error        Limits          Square  Pr > ChiSq
   Intercept          1    5.4110   0.5549    4.3234    6.4986     95.08      <.0001
   age                1    0.0250   0.0086    0.0081    0.0419      8.42      0.0037
   sex            1   1   -3.9808   1.0630   -6.0643   -1.8974     14.02      0.0002
   sex            2   0    0.0000   0.0000    0.0000    0.0000       .         .
   age*sex        1   1    0.0613   0.0187    0.0246    0.0980     10.72      0.0011
   age*sex        2   0    0.0000   0.0000    0.0000    0.0000       .         .
   Scale              1    0.4087   0.0900    0.2654    0.6294
   Weibull Shape      1    2.4468   0.5391    1.5887    3.7682
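The censoring counts in the Model Information table can be cross-checked against the coding rules described at the start of this example. The following DATA step is a sketch of that check and is not part of the original program; the data set name mice_type and the variable censtype are made up for illustration, and the rule that equal endpoints mean an uncensored value is taken from the way the nine noncensored observations are coded in the data.

```sas
data mice_type;
   set mice;
   length censtype $ 8;
   if missing(time1) and missing(time2) then censtype = 'missing';
   else if missing(time1)               then censtype = 'left';      /* recovered by first check  */
   else if missing(time2)               then censtype = 'right';     /* not recovered at second   */
   else if time1 = time2                then censtype = 'exact';     /* uncensored recovery time  */
   else                                      censtype = 'interval';  /* recovered between checks  */
run;

proc freq data=mice_type;
   tables censtype;
run;
```

Counting the rows of the mice data by hand gives 9 exact, 5 right-censored, 2 left-censored, and 4 interval-censored observations, which is what the procedure reports.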

The following two plots display the predicted probability against the recovery time for two different populations. Output 39.4.2 is created by the PROBPLOT statement together with the option XDATA=xrow1, which specifies the population with sex=1, age=50. Although the SAS statements are not shown, Output 39.4.3 is created by the PROBPLOT statement together with the option XDATA=xrow2, which specifies the population with sex=2, age=60.6. The xrow2 values are the default values that the LIFEREG procedure would use for the probability plot if the XDATA= option had not been specified (60.6 is the mean age of the 20 mice). Reference lines are used to display specified predicted probability points and their relative locations on the plot.

Output 39.4.2: Probability Plot for Recovery Time with sex=1, age=50

Output 39.4.3: Probability Plot for Recovery Time with sex=2, age=60.6
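The statements for Output 39.4.3 are not given in the source. Presumably they differ from the preceding PROC LIFEREG step only in the XDATA= option; the following sketch is written under that assumption and reuses the PROBPLOT and INSET options shown earlier unchanged.

```sas
proc lifereg data=mice xdata=xrow2;   /* xrow2: sex=2, age=60.6 */
   class sex ;
   model (time1, time2) = age sex age*sex / dist=Weibull;
   probplot / nodata
              font = swiss
              plower=.5
              vref(intersect) = 75
              vreflab = '75 Percent'
              vreflabpos = 2
              cfit=blue
              cframe=ligr
              ;
   inset / cfill = white
           ctext = blue;
run;
```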

Example 39.5. Probability Plotting: Right Censoring

The following statements create a SAS data set containing observed and right-censored lifetimes of 70 diesel engine fans (Nelson 1982, p. 318).

   title 'Engine Fan Lifetime Study';

   data fan;
      input lifetime censor@@;
      lifetime = lifetime / 1000;      /* hours to thousands of hours */
      label lifetime = Lifetime;
      datalines;
    450 0    460 1   1150 0   1150 0   1560 1
   1600 0   1660 1   1850 1   1850 1   1850 1
   1850 1   1850 1   2030 1   2030 1   2030 1
   2070 0   2070 0   2080 0   2200 1   3000 1
   3000 1   3000 1   3000 1   3100 0   3200 1
   3450 0   3750 1   3750 1   4150 1   4150 1
   4150 1   4150 1   4300 1   4300 1   4300 1
   4300 1   4600 0   4850 1   4850 1   4850 1
   4850 1   5000 1   5000 1   5000 1   6100 1
   6100 0   6100 1   6100 1   6300 1   6450 1
   6450 1   6700 1   7450 1   7800 1   7800 1
   8100 1   8100 1   8200 1   8500 1   8500 1
   8500 1   8750 1   8750 0   8750 1   9400 1
   9900 1  10100 1  10100 1  10100 1  11500 1
   ;
   run;

Some of the fans had not failed at the time the data were collected, and the unfailed units have right-censored lifetimes. The variable LIFETIME represents either a failure time or a censoring time in thousands of hours. The variable CENSOR is equal to 0 if the value of LIFETIME is a failure time, and it is equal to 1 if the value is a censoring time. The following statements use the LIFEREG procedure to produce the probability plot with an inset for the engine lifetimes.

   symbol v=dot c=white;

   proc lifereg data=fan;
      model lifetime*censor(1) = / d = weibull;
      probplot
         cencolor = red
         cframe   = ligr
         cfit     = blue
         ppout
         npintervals=simul
         ;
      inset /
         cfill = white
         ctext = blue;
   run;

The resulting graphical output is shown in Output 39.5.1. The estimated CDF, a line representing the maximum likelihood fit, and pointwise parametric confidence bands are plotted in the body of Output 39.5.1. The values of right-censored observations are plotted along the top of the graph. The Cumulative Probability Estimates table is also created in Output 39.5.2.

Output 39.5.1: Probability Plot for the Fan Data

Output 39.5.2: CDF Estimates

   The LIFEREG Procedure

   Cumulative Probability Estimates

                                Simultaneous                     Kaplan-
                               95% Confidence       Kaplan-        Meier
               Cumulative          Limits             Meier     Standard
   Lifetime   Probability     Lower     Upper      Estimate        Error
       0.45        0.0071    0.0007    0.2114        0.0143       0.0142
       1.15        0.0215    0.0033    0.2114        0.0288       0.0201
       1.15        0.0360    0.0073    0.2168        0.0433       0.0244
       1.6         0.0506    0.0125    0.2304        0.0580       0.0282
       2.07        0.0666    0.0190    0.2539        0.0751       0.0324
       2.07        0.0837    0.0264    0.2760        0.0923       0.0361
       2.08        0.1008    0.0344    0.2972        0.1094       0.0392
       3.1         0.1189    0.0436    0.3223        0.1283       0.0427
       3.45        0.1380    0.0535    0.3471        0.1477       0.0460
       4.6         0.1602    0.0653    0.3844        0.1728       0.0510
       6.1         0.1887    0.0791    0.4349        0.2046       0.0581
       8.75        0.2488    0.0884    0.6391        0.2930       0.0980

Example 39.6. Probability Plotting: Arbitrary Censoring
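The first few values in the Kaplan-Meier Estimate column can be reproduced by hand from the fan data. The following worked computation is a sketch of the usual product-limit calculation, with tied failures taken one at a time as the two rows for lifetime 1.15 suggest; the symbols $\hat{F}$ (estimated failure probability), $d_j$ (failures), and $n_j$ (units at risk) are notation introduced here.

```latex
\hat{F}(t) = 1 - \prod_{t_j \le t}\left(1 - \frac{d_j}{n_j}\right)
\\[6pt]
\hat{F}(0.45) = 1 - \frac{69}{70} \approx 0.0143
\\[4pt]
\hat{F}(1.15)_{\text{first}} = 1 - \frac{69}{70}\cdot\frac{67}{68} \approx 0.0288
\\[4pt]
\hat{F}(1.15)_{\text{second}} = 1 - \frac{69}{70}\cdot\frac{67}{68}\cdot\frac{66}{67} \approx 0.0433
```

The risk set drops from 70 to 68 before the failures at 1.15 because one unit failed at 0.45 and one unit was censored at 0.46.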

Table 39.3 contains microprocessor failure data from Nelson (1990). Units were inspected at predetermined time intervals. The data consist of inspection interval endpoints (in hours) and the number of units failing in each interval. A missing (.) lower endpoint indicates left censoring, and a missing upper endpoint indicates right censoring. These can be thought of as semi-infinite intervals with a lower (upper) endpoint of negative (positive) infinity for left (right) censoring.

Table 39.3: Interval-Censored Data
Lower Endpoint	Upper Endpoint	Number Failed
.	6	6
6	12	2
24	48	2
24	.	1
48	168	1
48	.	839
168	500	1
168	.	150
500	1000	2
500	.	149
1000	2000	1
1000	.	147
2000	.	122

The following SAS program will compute the Turnbull estimate and create a lognormal probability plot.

  data micro;
     input t1 t2 f;
     datalines;
  . 6 6
  6 12 2
  12 24 0
  24 48 2
  24 .  1
  48 168 1
  48 .   839
  168 500 1
  168 .   150
  500 1000 2
  500 .    149
  1000 2000 1
  1000 . 147
  2000 . 122
  ;
  symbol v=dot c=white;
  proc lifereg data=micro;
     model (t1 t2)=/d=lognormal intercept=25 scale=5;
     weight f;
     probplot
        cframe = ligr
        cfit   = blue
        pupper = 10
        itprintem
        printprobs
        maxitem = (1000,25)
        ppout;
     inset / cfill = white;
  run;

The two initial values INTERCEPT= 25 and SCALE= 5 in the MODEL statement are used to aid convergence in the model-fitting algorithm.
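As a reminder of what these two starting values refer to, the lognormal model with no covariates can be written as follows. This is a sketch in assumed notation: $\mu$ for the intercept, $\sigma$ for the scale, and $\Phi$ for the standard normal CDF.

```latex
% Sketch: lognormal model without covariates (notation assumed).
\[
  \log T \;=\; \mu + \sigma\,\varepsilon,
  \qquad \varepsilon \sim N(0,1),
  \qquad
  F(t) \;=\; \Phi\!\left( \frac{\log t - \mu}{\sigma} \right).
\]
% INTERCEPT=25 supplies the starting value for \mu and
% SCALE=5 supplies the starting value for \sigma.
```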

The following tables are created by the PROBPLOT statement in addition to the standard tabular output from the MODEL statement. Output 39.6.1 shows the iteration history for the Turnbull estimate of the CDF for the microprocessor data. With both options ITPRINTEM and PRINTPROBS specified in the PROBPLOT statement, this table contains the log likelihoods and interval probabilities for every 25th iteration and the last iteration. It would only contain the log likelihoods if the option PRINTPROBS were not specified.

Output 39.6.1: Iteration History for the Turnbull Estimate

  The LIFEREG Procedure

  Iteration History for the Turnbull Estimate of the CDF

  Iteration  Loglikelihood        (., 6)       (6, 12)      (24, 48)     (48, 168)    (168, 500)   (500, 1000)  (1000, 2000)     (2000, .)
          0      1133.4051         0.125         0.125         0.125         0.125         0.125         0.125         0.125         0.125
         25      104.16622    0.00421644    0.00140548    0.00140648    0.00173338    0.00237846    0.00846094    0.04565407    0.93474475
         50      101.15151    0.00421644    0.00140548    0.00140648    0.00173293    0.00234891    0.00727679    0.01174486    0.96986811
         75      101.06641    0.00421644    0.00140548    0.00140648    0.00173293    0.00234891    0.00727127    0.00835638     0.9732621
        100      101.06534    0.00421644    0.00140548    0.00140648    0.00173293    0.00234891    0.00727125    0.00801814    0.97360037
        125      101.06533    0.00421644    0.00140548    0.00140648    0.00173293    0.00234891    0.00727125    0.00798438    0.97363413
        130      101.06533    0.00421644    0.00140548    0.00140648    0.00173293    0.00234891    0.00727125      0.007983    0.97363551

The table in Output 39.6.2 summarizes the Turnbull estimates of the interval probabilities, the reduced gradients, and the Lagrange multipliers, as described in the section "Arbitrarily Censored Data" on page 2119.

Output 39.6.2: Summary for the Turnbull Algorithm

  The LIFEREG Procedure

     Lower      Upper                        Reduced        Lagrange
  Lifetime   Lifetime    Probability        Gradient      Multiplier
         .          6         0.0042               0               0
         6         12         0.0014               0               0
        24         48         0.0014               0               0
        48        168         0.0017               0               0
       168        500         0.0023               0               0
       500       1000         0.0073    -7.219342E-9               0
      1000       2000         0.0080    -0.037063236               0
      2000          .         0.9736    0.0003038877               0

Output 39.6.3 shows the final estimate of the CDF, along with standard errors and nonparametric confidence limits. Two kinds of nonparametric confidence limits are available: pointwise and simultaneous. Pointwise limits are the default. To request simultaneous limits instead, specify the NPINTERVALS=SIMUL option in the PROBPLOT statement.
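A minimal sketch of requesting simultaneous limits for the same fit is shown below. It assumes the micro data set created earlier in this example and omits the cosmetic PROBPLOT options; only the NPINTERVALS= setting differs from the default behavior.

```sas
/* Sketch: same model as above, but with simultaneous     */
/* nonparametric confidence limits on the probability plot */
proc lifereg data=micro;
   model (t1 t2) = / d=lognormal intercept=25 scale=5;
   weight f;
   probplot npintervals=simul;
run;
```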

Output 39.6.3: Final CDF Estimates for Turnbull Algorithm

  The LIFEREG Procedure

  Cumulative Probability Estimates

                                       Pointwise 95%
                                         Confidence
     Lower      Upper     Cumulative       Limits           Standard
  Lifetime   Lifetime    Probability     Lower     Upper       Error
         6          6         0.0042    0.0019    0.0094      0.0017
        12         24         0.0056    0.0028    0.0112      0.0020
        48         48         0.0070    0.0038    0.0130      0.0022
       168        168         0.0088    0.0047    0.0164      0.0028
       500        500         0.0111    0.0058    0.0211      0.0037
      1000       1000         0.0184    0.0094    0.0357      0.0063
      2000       2000         0.0264    0.0124    0.0553      0.0101
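The cumulative probabilities in Output 39.6.3 appear to be running sums of the interval probabilities in the last row (iteration 130) of Output 39.6.1. The following DATA step is a small check of that reading, not part of the original example; the data set name turnbull_cdf and the variable names are hypothetical, and the probabilities are copied by hand from the iteration history.

```sas
/* Hypothetical check: names are illustrative, values are */
/* copied from the iteration 130 row of Output 39.6.1     */
data turnbull_cdf;
   input lower upper prob;
   cdf + prob;        /* sum statement: cdf starts at 0 and is retained */
   format cdf 6.4;
   datalines;
. 6 0.00421644
6 12 0.00140548
24 48 0.00140648
48 168 0.00173293
168 500 0.00234891
500 1000 0.00727125
1000 2000 0.007983
2000 . 0.97363551
;
proc print data=turnbull_cdf noobs;
run;
```

Rounded to four decimal places, the first seven running sums match the Cumulative Probability column of Output 39.6.3 (0.0042 through 0.0264), and the eight probabilities together sum to 1.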

Output 39.6.4 shows the CDF estimates, the maximum likelihood fit, and the pointwise parametric confidence limits plotted on a lognormal probability plot.

Output 39.6.4: Lognormal Probability Plot for the Microprocessor Data