Any observation with missing values for the dependent variable is not used in the model estimation unless it is one and only one of the values in an interval specification. Also, if one of the explanatory variables or the censoring variable is missing, the observation is not used. For any observation to be used in the estimation of a model, only the
Main effects as well as interaction terms are allowed in the model specification, similar to the GLM procedure. For numeric variables, a main effect is a linear term equal to the value of the variable unless the variable appears in the CLASS statement. For variables listed in the CLASS statement, PROC LIFEREG creates indicator variables (variables taking the values zero or one) for every level of the variable except the last level. If there is no intercept
By default, the LIFEREG Procedure computes initial values for the parameters using ordinary least squares (OLS) ignoring censoring. This might not be the best set of starting values for a given set of data. For example, if there are extreme values in your data the OLS fit may be excessively influenced by the extreme observations,
You can specify the INITIAL= option in the MODEL statement to override these starting values. You can also specify the INTERCEPT=, SCALE=, and SHAPE= options to set initial values of the intercept, scale, and shape parameters. For models with multilevel interaction effects, it is a little difficult to use the INITIAL= option to provide starting values for all parameters. In this case, you can use the INEST= data set. See the section
INEST= Data Set on
page 2121 for detail. The INEST= data set
The rank of the design matrix
X
is estimated before the model is fit.
The log-likelihood function is maximized by means of a ridge-stabilized Newton-Raphson algorithm. The maximized value of the log likelihood can take positive or negative values, depending on the specified model and the values of the maximum
If convergence of the maximum likelihood estimates is attained, a Type III chi-square test statistic is computed for each effect, testing whether there is any contribution from any of the levels of the effect. This statistic is computed as a quadratic form in the appropriate parameter estimates using the corresponding submatrix of the
The asymptotic covariance matrix is computed as the inverse of the
Chi-square tests for individual parameters are Wald tests based on the observed information matrix and the parameter estimates. If an effect has a single degree of freedom in the parameter estimates table, the chi-square test for this parameter is equivalent to the Type III test for this effect.
In releases previous to Version 8.2, a multiple degree of freedom statistic was computed for each effect to test for contribution from any level of the effect. In general, the Type III test statistic in a main effect only model (no interaction terms) will be equal to the previously computed effect statistic, unless there are collinearities among the effects. If there are collinearities, the Type III statistic will adjust for them, and the value of the Type III statistic and the number of degrees of freedom might not be equal to those of the previous effect statistic.
Suppose there are n observations from the model y = X ² + ƒ ˆˆ , where X i s an n — k matrix of covariate values (including the intercept), y is a vector of responses, and ˆˆ is a vector of errors with survival function S , cumulative distribution function F , and probability density function f . That is, S ( t ) = Pr( ˆˆ i >t ), F ( t ) = Pr( ˆˆ i ‰ t ), and f ( t ) = dF ( t ) /dt , where ˆˆ i is a component of the error vector. Then, if all the responses are observed, the log likelihood, L , can be written as
where
If some of the responses are left, right, or interval censored, the log likelihood can be written as
with the first sum over uncensored observations, the second sum over right-censored observations, the third sum over left-censored observations, the last sum over interval-censored observations, and
where z i is the lower end of a censoring interval.
If the response is specified in the binomial format, events / trials , then the log-likelihood function is
where
r
i
is the number of events and
n
i
is the number of trials for the
i
th observation. In this case,
. For the symmetric distributions, logistic and normal, this is the same as
. Additional information on censored and limited dependent variable models can be found in Kalbfleisch and Prentice (1980) and Maddala (1983).
The estimated covariance matrix of the parameter estimates is computed as the negative inverse of I , which is the information matrix of second derivatives of L with respect to the parameters evaluated at the final parameter estimates. If I is not positive definite, a positive definite submatrix of I is inverted, and the remaining rows and columns of the inverse are set to zero. If some of the parameters, such as the scale and intercept, are restricted, the corresponding elements of the estimated covariance matrix are set to zero. The standard error estimates for the parameter estimates are taken as the square roots of the corresponding diagonal elements.
For restrictions placed on the intercept, scale, and shape parameters, one-degree-of-freedom Lagrange
where g is the derivative of the log likelihood with respect to the restricted parameter at the restricted maximum and
where the 1 subscripts refer to the restricted parameter and the 2 subscripts refer to the unrestricted parameters. The information matrix is evaluated at the restricted maximum. These statistics are asymptotically distributed as chi-squares with one degree of freedom under the null hypothesis that the restrictions are valid, provided that some regularity conditions are satisfied. See Rao (1973, p. 418) for a more complete discussion. It is possible for these statistics to be missing if the observed information matrix is not positive definite. Higher degree-of-freedom tests for multiple restrictions are not currently computed.
A Lagrange multiplier test statistic is computed to test this constraint. Notice that this test statistic is comparable to the Wald test statistic for testing that the scale is one. The Wald statistic is the result of squaring the difference of the estimate of the scale parameter from one and dividing this by the square of its estimated standard error.
For each distribution, the baseline survival function (
S
) and the probability density function(
f
) are listed for the additive random disturbance (
y
or log(
T
)) with location parameter
¼
and scale parameter
ƒ
. See the section Overview on page 2083. These distributions apply when the log of the response is
For example, for the WEIBULL distribution, S ( w ) and f ( w ) are the survival function and the probability density function for the extreme value distribution (distribution of the log of the response) while G ( t ) and g ( t ) are the survival function and the probability density function of a Weibull distribution (using the untransformed response).
The
Additionally, it is worth mentioning that, for the Weibull distribution, the accelerated failure time model is also a proportional-hazards model. However, the parameterization for the covariates
The distributions supported in the LIFEREG procedure follow. ¼ = Intercept and ƒ = Scale in the output.
where exp( ˆ’ ¼ ) = ± .
where “( a ) denotes the complete gamma function, “( a, z ) denotes the incomplete gamma function, and is a free shape parameter. The parameter is referred to as Shape by PROC LIFEREG. Refer to Lawless, 1982, p.240 and Klein and Moeschberger, 1997, p.386 for a description of the generalized gamma distribution.
where ³ = 1 / ƒ and ± = exp( ˆ’ ¼ / ƒ ).
where is the cumulative distribution function for the normal distribution.
where ƒ =1 / ³ and ± = exp( ˆ’ ¼ / ƒ ).
If your parameterization is different from the ones shown here, you can still use the procedure to fit your model. For example, a common parameterization for the Weibull distribution is
so that » = exp( ¼ ) and ² = 1 / ƒ .
Again note that the expected value of the baseline log response is, in general, not zero and that the distributions are not symmetric in all cases. Thus, for a given set of covariates, x , the expected value of the log response is not always x ² ² .
Some relations among the distributions are as
The gamma with Shape =1 is a Weibull distribution.
The gamma with Shape =0 is a lognormal distribution.
The Weibull with Scale =1 is an exponential distribution.
For a given set of covariates, x (including the intercept term), the p th quantile of the log response, y p , is given by
where
u
p
is the
p
th quantile of the baseline distribution. The estimated quantile is computed by replacing the unknown parameters with their estimates, including any shape parameters on which the baseline distribution might depend. The estimated quantile of the original response is obtained by taking the exponential of the estimated log quantile unless the NOLOG option is specified in the
The standard errors of the quantile estimates are computed using the estimated covariance matrix of the parameter estimates and a Taylor series expansion of the quantile estimate. The standard error is computed as
where V is the estimated covariance matrix of the parameter vector ( ² ² , ƒ , ) ² , and z is the vector
where
is the vector of the shape parameters. Unless the NOLOG option is specified, this standard error estimate is converted into a standard error estimate for exp(
y
p
) as exp(
·
p
)STD. It may be more desirable to compute confidence limits for the log response and convert them back to the original response variable than to use the standard error estimates for exp(
y
p
) directly. See Example 39.1 for a 90% confidence interval of the response
The variable, CDF , is computed as
where the
and F is the baseline cumulative distribution function.
Confidence intervals are computed for all model parameters and are
A two-sided (1
ˆ’
±
) — 100% confidence interval [
²
iL
,
²
iU
] for the regression parameter
²
i
is based on the asymptotic normality of the maximum likelihood
i
and is computed by
where SE
i
is the estimated standard error of
i
, and
z
p
is the
p
— 100% percentile of the standard normal distribution.
A two-sided (1
ˆ’
±
) — 100% confidence interval [
ƒ
L
,
ƒ
U
] for the scale parameter
ƒ
in the location-scale model is based on the asymptotic normality of the logarithm of the maximum likelihood estimator log(
), and is computed by
Refer to Meeker and Escobar (1998) for more information.
The Weibull distribution scale parameter · and shape parameter ² are obtained by transforming the extreme value location parameter ¼ and scale parameter ƒ :
Consequently, two-sided (1 ˆ’ ± ) — 100% confidence intervals for the Weibull scale and shape parameters are computed as
A two-sided (1 ˆ’ ± ) — 100% confidence interval for the 3-parameter gamma shape parameter is computed by
Probability plots are useful tools for the display and analysis of lifetime data. Probability plots use an inverse distribution scale so that a cumulative distribution function (CDF) plots as a straight line. A nonparametric estimate of the CDF of the lifetime data will plot approximately as a straight line, thus providing a visual assessment of goodness-of-fit.
You can use the PROBPLOT statement in LIFEREG to create probability plots of data that are complete, right-censored, interval-censored, or a combination of censoring types (arbitrarily censored). A line representing the maximum likelihood fit from the MODEL statement and pointwise parametric confidence bands for the cumulative probabilities are also included on the plot.
A random variable Y belongs to a location-scale family of distributions if its CDF F is of the form
where ¼ is the location parameter and ƒ is the scale parameter. Here, G is a CDF that cannot depend on any unknown parameters, and G is the CDF of Y if ¼ = 0 and ƒ =1. For example, if Y is a normal random variable with mean ¼ and standard deviation ƒ ,
and
The normal, extreme value, and logistic distributions are location-scale models. The 3-parameter gamma distribution is a location-scale model if the shape parameter is fixed. If T has a lognormal, Weibull, or log-logistic distribution, then log( T ) has a distribution that is a location-scale model. Probability plots are constructed for lognormal, Weibull, and log-logistic distributions by using log( T ) instead of T in the plots.
Let
y
(1)
‰
y
(2)
‰
...
‰
y
(
n
)
be ordered observations of a random sample with distribution function
F
(
y
). A probability plot is a plot of the points
y
(
i
)
against
m
i
=
G
ˆ’
1
(
a
i
), where
a
i
=
(
y
i
) is an estimate of the CDF
The nonparametric CDF estimates
a
i
are sometimes called
plotting
If
F
is one of the location-scale distributions, then
y
is the lifetime;
If the data actually have the stated distribution, then
‰ˆ
F
,
and points ( y ( i ) , m i ) should fall approximately on a straight line.
There are several ways to compute the nonparametric CDF estimates used in probability plots from lifetime data. These are discussed in the
The censoring times must be taken into account when you compute plotting positions for right-censored data. The modified Kaplan-Meier method described in the following section is the default method for computing nonparametric CDF estimates for display on probability plots. Refer to Abernethy (1996), Meeker and Escobar (1998), and Nelson (1982) for discussions of the
Let
y
(1)
‰
y
(2)
‰
...
‰
y
(
n
)
be ordered observations of a random sample including failure times and
with S = 1. The expected rank plotting position is computed as a i = 1 “ S i . The option PPOS=EXPRANK specifies the expected rank plotting position.
For the Kaplan-Meier method,
The Kaplan-Meier plotting position is then computed as
. The option PPOS=KM specifies the Kaplan-Meier plotting position.
For the modified Kaplan-Meier method, use
where
S
i
is computed from the Kaplan-Meier formula with
S
= 1. The plotting position is then computed as
The option PPOS=MKM specifies the modified Kaplan-Meier plotting position. If the PPOS option is not specified, the modified Kaplan-Meier plotting position is used as the default method.
For complete samples,
a
i
=
i/
(
n
+ 1) for the expected rank method,
for the Kaplan-Meier method, and
for the modified Kaplan-Meier method. If the largest observation is a failure for the Kaplan-Meier estimator, then
F
n
= 1 and the point is not plotted.
Let
y
(1)
‰
y
(2)
‰
...
‰
y
(
n
)
be ordered observations of a random sample including failure times and censor times. A failure order number
j
i
is assigned to the
i
th failure:
j
i
=
j
i
ˆ’
1
+
”
, where
j
= 0. The increment
”
is initially 1 and is modified when a censoring time is
The plotting position is computed for the i th failure time as
For complete samples, the failure order number
j
i
is equal to
i
, the order of the failure in the sample. In this case, the preceding equation for
a
i
is an
The LIFEREG procedure can create probability plots for data that consists of combinations of exact, left-censored, right-censored, and interval-censored lifetimes, that is, arbitrarily censored data. The LIFEREG procedure uses an iterative algorithm developed by Turnbull (1976) to compute a nonparametric maximum likelihood estimate of the cumulative distribution function for the data. Since the technique is maximum likelihood, standard errors of the cumulative probability estimates are computed from the inverse of the associated Fisher information matrix. This algorithm is an example of the expectation-maximization (EM) algorithm. The default initial estimate
If an interval probability is smaller than a tolerance (10 ˆ’ 6 by default) after convergence, the probability is set to zero, the interval probabilities are renormalized so that they add to one, and iterations are restarted. Usually the algorithm converges in just a few more iterations. You can change the default value of the tolerance with the TOLPROB= option. You can specify the NOPOLISH option to avoid setting small probabilities to zero and restarting the algorithm.
If you specify the ITPRINTEM option, a table summarizing the Turnbull estimate of the interval probabilities is displayed. The columns labeled Reduced Gradient and Lagrange Multiplier are used in checking final convergence of the maximum likelihood estimate. The Lagrange multipliers must all be greater than or equal to zero, or the solution is not maximum likelihood. Refer to Gentleman and Geyer (1994) for more details of the convergence checking. Also refer to Meeker and Escobar (1998, chap. 3) for more information.
See Example 39.6 on page 2142 for an illustration.
You can use the PPOUT option in the PROBPLOT statement to create a table containing the nonparametric CDF estimates computed by the selected method, Kaplan-Meier CDF estimates, standard errors of the Kaplan-Meier estimator, and nonparametric confidence limits for the CDF. The confidence limits are either pointwise or simultaneous, depending on the value of the NPINTERVALS= option in the PROBPLOT statement. The method used in the LIFEREG procedure for computation of approximate pointwise and simultaneous confidence intervals for cumulative failure probabilities relies on the Kaplan-Meier estimator of the cumulative distribution function of failure time and approximate standard deviation of the Kaplan-Meier estimator. For the case of arbitrarily censored data, the Turnbull algorithm, discussed previously, provides an extension of the Kaplan-Meier estimator. Both the Kaplan-Meier and the Turnbull estimators provide an estimate of the standard error of the CDF estimator, se
that is used in computing confidence intervals.
Approximate (1 ˆ’ ± )100% pointwise confidence intervals are computed as in Meeker and Escobar (1998, section 3.6) as
where
where z p is the p th quantile of the standard normal distribution.
Approximate (1 ˆ’ ± )100% simultaneous confidence bands valid over the lifetime interval ( t a , t b ) are computed as the Equal Precision case of Nair (1984) and Meeker and Escobar (1998, section 3.8) as
where
where the factor x = e a,b, 1 ˆ’ ± / 2 is the solution of
The time interval (
t
a
, t
b
) over which the bands are valid depends in a complicated way on the constants
a
and
b
defined in Nair (1984), 0
< a < b <
1.
a
and
b
are chosen by default so that the confidence bands are valid between the lowest and highest times corresponding to failures in the case of multiply censored data, or, to the
If specified, the INEST= data set specifies initial estimates for all the parameters in the model. The INEST= data set must contain the intercept variable (named Intercept ) and all independent variables in the MODEL statement.
If BY processing is used, the INEST= data set should also include the BY variables, and there must be at least one observation for each BY group. If there is more than one observation in one BY group, the first one read is used for that BY
If the INEST= data set also contains the _ TYPE_ variable, only observations with _ TYPE_ value PARMS are used as starting values. Combining the INEST= data set and the MAXITER= option in the MODEL statement, partial scoring can be done, such as predicting on a validation data set by using the model built from a training data set.
You can specify starting values for the iterative algorithm in the INEST= data set. This data set overwrites the INITIAL= option in the MODEL statement, which is a little difficult to use for models including multilevel interaction effects. The INEST= data set has the same structure as the OUTEST= data set but is not required to have all the variables or observations that appear in the OUTEST= data set. One simple use of the INEST= option is passing the previous OUTEST= data set directly to the next model as an INEST= data set,
The OUTEST= data set contains parameter estimates and the log likelihood for the model. You can specify a label in the MODEL statement to distinguish between the estimates for different modeling using the LIFEREG procedure. If the COVOUT option is specified, the OUTEST= data set also contains the estimated covariance matrix of the parameter estimates. Note that, if the LIFEREG procedure does not converge, the parameter estimates are set to missing in the OUTEST data set.
The OUTEST= data set contains all variables specified in the MODEL statement and the BY statement. One observation consists of parameter values for the model with the dependent variable having the value ˆ’ 1. If the COVOUT option is specified, there are additional observations containing the rows of the estimated covariance matrix. For these observations, the dependent variable contains the parameter estimate for the corresponding row variable. The following variables are also added to the data set:
|
_ MODEL_ |
a character variable containing the label of the MODEL statement, if present. Otherwise, the variable s value is blank. |
|
_ NAME_ |
a character variable containing the name of the dependent variable for the parameter estimates observations or the
|
|
_ TYPE_ |
a character variable containing the type of the observation, either PARMS for parameter estimates or COV for covariance estimates |
|
_ DIST_ |
a character variable containing the name of the distribution modeled |
|
_ LNLIKE_ |
a numeric variable containing the last computed value of the log likelihood |
|
INTERCEPT |
a numeric variable containing the intercept parameter estimates and covariances |
|
_ SCALE_ |
a numeric variable containing the scale parameter estimates and covariances |
|
_ SHAPE1_ |
a numeric variable containing the first shape parameter estimates and covariances if the specified distribution has additional shape parameters |
Any BY variables specified are also added to the OUTEST= data set.
The XDATA= data set is used for plotting the predicted probability when there are covariates specified in a MODEL statement and a probability plot is specified with a PROBPLOT statement. See Example 39.4 on page 2136 for an illustration.
The XDATA= data set is an input SAS data set that contains values for all the independent variables in the MODEL statement and variables in the CLASS statement.The XDATA= data set has the same structure as the DATA= data set but is not required to have all the variables or observations that appear in the DATA= data set.
The XDATA= data set must contain all the independent variables in the MODEL statement and variables in the CLASS statement. Even though variables in the CLASS statement may not be used, valid values are required for these variables in the XDATA= data set. Missing values are not allowed. Missing values are not allowed in the XDATA= data set for any of the independent variables either. Missing values are allowed for the dependent variables and other variables if they are included in the XDATA= data set.
If BY processing is used, the XDATA= data set should also include the BY variables, and there must be at least one valid observation for each BY group. If there is more than one valid observation in a BY group, the last one read is used for that BY group.
If there is no XDATA= data set in the PROC LIFEREG statement, by default, the LIFEREG procedure will use the overall mean for effects containing a continuous variable (or variables) and the highest level of a single classification variable as reference level. The rules are summarized as follows:
If the effect contains a continuous variable (or variables), the overall mean of this effect (not the variables) is used.
If the effect is a single classification variable, the highest level of the variable is used.
Let p be the number of parameters estimated in the model. The minimum working space (in bytes) needed is
However, if sufficient space is available, the input data set is also kept in memory; otherwise, the input data set is reread for each evaluation of the likelihood function and its derivatives, with the resulting execution time of the procedure substantially increased.
Let n be the number of observations used in the model estimation. Each evaluation of the likelihood function and its first and second derivatives requires O ( np 2 ) multiplications and additions, n individual function evaluations for the log density or log distribution function, and n evaluations of the first and second derivatives of the function. The calculation of each updating step from the gradient and Hessian requires O ( p 3 ) multiplications and additions. The O ( v ) notation means that, for large values of the argument, v , O ( v ) is approximately a constant times v .
For each model, PROC LIFEREG displays
the name of the Data Set
the name of the Dependent Variable
the name of the Censoring Variable
the Censoring Value(s) that
the number of Noncensored and Censored Values
the final estimate of the maximized log likelihood
the iteration history and the Last Evaluation of the Gradient and Hessian if the ITPRINT option is specified (not shown)
For each explanatory variable in the model, the LIFEREG procedure displays
the name of the Variable
the degrees of freedom (DF) associated with the variable in the model
the Estimate of the parameter
the standard error (Std Err) estimate from the observed information matrix
an approximate chi-square statistic for testing that the parameter is zero (the class variables also have an overall chi-square test statistic computed that precedes the individual level parameters)
the probability of a larger chi-square value (Pr>Chi)
the Label of the variable or, if the variable is a class level, the Value of the class variable
If there are constrained parameters in the model, such as the scale or intercept, then PROC LIFEREG displays a Lagrange multiplier test for the constraint.
PROC LIFEREG assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table. For more information on ODS, see Chapter 14, Using the Output Delivery System.
|
ODS Table Name |
Description |
Statement |
Option |
|---|---|---|---|
|
ClassLevels |
Class variable levels |
CLASS |
default [ *] |
|
ConvergenceStatus |
Convergence status |
MODEL |
default |
|
CorrB |
Parameter estimate correlation matrix |
MODEL |
CORRB |
|
CovB |
Parameter estimate covariance matrix |
MODEL |
COVB |
|
EMIterHistory |
Iteration history for Turnbull algorithm |
PROBPLOT |
ITPRINTEM |
|
IterHistory |
Iteration history |
MODEL |
ITPRINT |
|
LagrangeStatistics |
Lagrange statistics |
MODEL |
NOINT NOSCALE |
|
LastGrad |
Last evaluation of the gradient |
MODEL |
ITPRINT |
|
LastHess |
Last evaluation of the Hessian |
MODEL |
ITPRINT |
|
ModelInfo |
Model information |
MODEL |
default |
|
NObs |
Observations Summary |
PROC |
default |
|
ParameterEstimates |
Parameter estimates |
MODEL |
default |
|
ParmInfo |
Parameter indices |
MODEL |
default |
|
ProbEstimates |
Nonparametric CDF estimates |
PROBPLOT |
PPOUT |
|
Turnbull |
Probability estimates from Turnbull algorithm |
PROBPLOT |
ITPRINTEM |
|
Type3Analysis |
Type 3 tests |
MODEL |
default [ *] |
|
[ *] Depends on data. |
|||