Chapter 54: The PHREG Procedure


Overview

The analysis of survival data requires special techniques because the data are almost always incomplete, and familiar parametric assumptions may be unjustifiable. Investigators follow subjects until they reach a prespecified endpoint (for example, death). However, subjects sometimes withdraw from a study, or the study is completed before the endpoint is reached. In these cases, the survival times (also known as failure times) are censored; subjects survived to a certain time beyond which their status is unknown. The uncensored survival times are sometimes referred to as event times. Methods for survival analysis must account for both censored and uncensored data.
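As an illustration of how censoring arises, the following minimal Python sketch records what an investigator would actually observe for one subject. The names `Observation` and `observe`, and the three example subjects, are hypothetical and are not part of PROC PHREG; the sketch assumes that withdrawal and study completion are the only sources of censoring.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    time: float   # follow-up time actually recorded
    event: bool   # True = endpoint observed, False = censored


def observe(true_failure_time, withdrawal_time, study_end):
    """Return the recorded (time, event) pair for one subject.

    Hypothetical helper: the subject is censored at whichever comes
    first, withdrawal or the end of the study.
    """
    censor_time = min(withdrawal_time, study_end)
    if true_failure_time <= censor_time:
        return Observation(true_failure_time, True)
    return Observation(censor_time, False)


never = float("inf")  # subject never withdraws
records = [
    observe(true_failure_time=5.0, withdrawal_time=never, study_end=10.0),
    observe(true_failure_time=14.0, withdrawal_time=never, study_end=10.0),
    observe(true_failure_time=9.0, withdrawal_time=3.0, study_end=10.0),
]
# First subject: event time 5.0.
# Second subject: censored at 10.0 (study ended first).
# Third subject: censored at 3.0 (withdrew).
```

Each record pairs a time with a censoring indicator, which is the form of input that survival methods generally expect.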

Many types of models have been used for survival data. Two of the more popular are the accelerated failure time model (Kalbfleisch and Prentice 1980) and the Cox proportional hazards model (Cox 1972). Each makes its own assumptions about the underlying distribution of the survival times. Two closely related functions often used to describe the distribution of survival times are the survivor function and the hazard function (see the section "Failure Time Distribution" on page 3239 for definitions). The accelerated failure time model assumes a parametric form for the effects of the explanatory variables and usually assumes a parametric form for the underlying survivor function. Cox's proportional hazards model also assumes a parametric form for the effects of the explanatory variables, but it allows an unspecified form for the underlying survivor function.

The PHREG procedure performs regression analysis of survival data based on the Cox proportional hazards model. Cox's semiparametric model is widely used in the analysis of survival data to explain the effect of explanatory variables on hazard rates. The survival time of each member of a population is assumed to follow its own hazard function, $\lambda_i(t)$, expressed as

$$\lambda_i(t) = \lambda(t; Z_i) = \lambda_0(t) \exp(Z_i' \beta)$$

where $\lambda_0(t)$ is an arbitrary and unspecified baseline hazard function, $Z_i$ is the vector of explanatory variables for the $i$th individual, and $\beta$ is the vector of unknown regression parameters associated with the explanatory variables. The vector $\beta$ is assumed to be the same for all individuals. The survivor function can be expressed as

$$S(t; Z_i) = [S_0(t)]^{\exp(Z_i' \beta)}$$

where $S_0(t) = \exp\left(-\int_0^t \lambda_0(u)\,du\right)$ is the baseline survivor function. To estimate $\beta$, Cox (1972; 1975) introduced the partial likelihood function, which eliminates the unknown baseline hazard $\lambda_0(t)$ and accounts for censored survival times.
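To make the role of the partial likelihood concrete, the following Python sketch evaluates the log partial likelihood for data with no tied event times, together with the survivor function relationship above. It is an illustration under stated assumptions, not the algorithm PROC PHREG uses: the function names and the four-subject data set are hypothetical, and ties are assumed absent. Each event contributes its linear predictor minus the log of the summed exponentiated linear predictors over the risk set (the subjects still under observation at that time), so the baseline hazard never appears.

```python
import math


def cox_partial_loglik(beta, times, events, covariates):
    """Log partial likelihood, assuming distinct event times (no ties).

    beta       : list of regression parameters
    times      : observed times (event or censoring)
    events     : True if the time is an event time, False if censored
    covariates : one list of explanatory variables per subject
    """
    eta = [sum(b * z for b, z in zip(beta, row)) for row in covariates]
    loglik = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            continue  # censored subjects enter only through the risk sets
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        loglik += eta[i] - math.log(sum(math.exp(eta[j]) for j in risk_set))
    return loglik


def survivor(s0_t, beta, z):
    """S(t; Z) = S0(t) ** exp(Z'beta), given the baseline value S0(t)."""
    return s0_t ** math.exp(sum(b * x for b, x in zip(beta, z)))


# Hypothetical data: four subjects, one binary covariate.
times = [2.0, 3.0, 5.0, 7.0]
events = [True, False, True, True]
covariates = [[1.0], [0.0], [1.0], [0.0]]

# With beta = 0 the risk sets have sizes 4, 2, and 1,
# so the value is -(log 4 + log 2 + log 1) = -log 8.
value_at_zero = cox_partial_loglik([0.0], times, events, covariates)
```

Maximizing this function over the regression parameters gives the partial likelihood estimate; the sketch only evaluates the function and leaves the optimization aside.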

The partial likelihood of Cox also allows time-dependent explanatory variables. An explanatory variable is time-dependent if its value for any given individual can change over time. Time-dependent variables have many useful applications in survival analysis. You can use a time-dependent variable to model the effect of subjects changing treatment groups. Or you can include time-dependent variables such as blood pressure or blood chemistry measures that vary with time during the course of a study. You can also use time-dependent variables to test the validity of the proportional hazards model.

An alternative way to fit models with time-dependent explanatory variables is to use the counting process style of input. The counting process formulation enables PROC PHREG to fit a superset of the Cox model, known as the multiplicative hazards model. This extension also accommodates recurrent events data and left truncation of failure times. The theory of these models is based on the counting process approach pioneered by Andersen and Gill (1982), and the model is often referred to as the Andersen-Gill model.
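The idea behind the counting process style of input can be sketched in Python as follows. Each subject's follow-up is split into intervals (start, stop] over which the explanatory variable is constant, and only the final interval can carry the event. The function `to_counting_process` and its row layout are hypothetical illustrations, not a PROC PHREG interface; the sketch assumes a single time-dependent variable whose change times are known.

```python
def to_counting_process(subject_id, end_time, event, changes):
    """Split one subject's follow-up into (start, stop] rows.

    changes : list of (time, value) pairs sorted by time, the first at
              time 0, giving the covariate value from that time onward.
    Returns rows of the form (id, start, stop, status, value), where
    status is 1 only on the interval that ends with the event.
    """
    rows = []
    for k, (start, value) in enumerate(changes):
        if start >= end_time:
            break  # change happens after follow-up has ended
        next_change = changes[k + 1][0] if k + 1 < len(changes) else end_time
        stop = min(next_change, end_time)
        status = 1 if (event and stop == end_time) else 0
        rows.append((subject_id, start, stop, status, value))
    return rows


# Hypothetical subject who switches treatment group at time 4
# and experiences the event at time 12.
rows = to_counting_process("A", end_time=12.0, event=True,
                           changes=[(0.0, 0), (4.0, 1)])
# rows == [("A", 0.0, 4.0, 0, 0), ("A", 4.0, 12.0, 1, 1)]
```

A subject with several changes simply produces more rows, and a subject who is censored before the first change produces a single row with status 0.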

Multivariate failure time data arise when each study subject can potentially experience several events (for instance, multiple infections after surgery) or when there exists some natural or artificial clustering of subjects (for instance, a litter of mice) that induces dependence among the failure times of the same cluster. Data in the former situation are referred to as multiple events data, which include recurrent events data as a special case; data in the latter situation are referred to as clustered data. You can use PROC PHREG to carry out various methods for analyzing these data.

The population under study may consist of a number of subpopulations, each of which has its own baseline hazard function. PROC PHREG performs a stratified analysis to adjust for such subpopulation differences. Under the stratified model, the hazard function for the $j$th individual in the $i$th stratum is expressed as

$$\lambda_{ij}(t) = \lambda_{i0}(t) \exp(Z_{ij}' \beta)$$

where $\lambda_{i0}(t)$ is the baseline hazard function for the $i$th stratum, and $Z_{ij}$ is the vector of explanatory variables for the individual. The regression coefficients are assumed to be the same for all individuals across all strata.
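A stratified analysis can be pictured as follows: risk sets are formed within each stratum, so each stratum keeps its own baseline hazard, while a single set of regression coefficients is shared across strata. The Python sketch below is a hedged illustration of that structure, assuming one explanatory variable (scalar `beta`) and no tied event times; the function name and data are hypothetical and this is not PROC PHREG's implementation.

```python
import math
from collections import defaultdict


def stratified_partial_loglik(beta, times, events, z, strata):
    """Log partial likelihood with risk sets restricted to each stratum.

    Assumes a single covariate (scalar beta) and distinct event times.
    """
    members_by_stratum = defaultdict(list)
    for idx, label in enumerate(strata):
        members_by_stratum[label].append(idx)

    loglik = 0.0
    for members in members_by_stratum.values():
        for i in members:
            if not events[i]:
                continue  # censored: contributes only to risk sets
            # Risk set: subjects in the SAME stratum still under observation.
            denom = sum(math.exp(beta * z[j])
                        for j in members if times[j] >= times[i])
            loglik += beta * z[i] - math.log(denom)
    return loglik


# Hypothetical data: two strata of two subjects each.
times = [2.0, 3.0, 5.0, 7.0]
events = [True, False, True, True]
z = [1.0, 0.0, 1.0, 0.0]
strata = ["a", "a", "b", "b"]

stratified = stratified_partial_loglik(0.0, times, events, z, strata)
pooled = stratified_partial_loglik(0.0, times, events, z, ["all"] * 4)
# At beta = 0: stratified = -log 4, pooled = -log 8.
```

The two values differ only because the risk sets differ; the coefficient `beta` is common to both strata in either case.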

Ties in the failure times may arise when the time scale is genuinely discrete or when survival times generated from the continuous-time model are grouped into coarser units. The PHREG procedure includes four methods of handling ties. The discrete logistic model is available for discrete time-scale data. The other three methods apply to continuous time-scale data. The exact method computes the exact conditional probability under the model that the set of observed tied event times occurs before all the censored times with the same value or before larger values. The Breslow and Efron methods provide approximations to the exact method.
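The difference between the two approximations can be seen in a short Python sketch. For a time with d tied events, the Breslow approximation uses the full risk-set sum d times, whereas the Efron approximation subtracts a fraction k/d (k = 0, ..., d-1) of the tied subjects' contribution from successive terms. These are the standard textbook forms of the two approximations; the function name, the scalar `beta`, and the data are assumptions made for illustration, and the exact and discrete methods are not shown.

```python
import math


def tied_loglik(beta, times, events, z, method="breslow"):
    """Log partial likelihood under the Breslow or Efron approximation.

    Assumes a single covariate (scalar beta).
    """
    if method not in ("breslow", "efron"):
        raise ValueError("method must be 'breslow' or 'efron'")

    eta = [beta * v for v in z]
    loglik = 0.0
    event_times = sorted({t for t, d in zip(times, events) if d})
    for t in event_times:
        tied = [i for i, (ti, di) in enumerate(zip(times, events))
                if di and ti == t]
        risk_sum = sum(math.exp(eta[j])
                       for j, tj in enumerate(times) if tj >= t)
        tied_sum = sum(math.exp(eta[i]) for i in tied)
        d = len(tied)

        loglik += sum(eta[i] for i in tied)
        for k in range(d):
            # Breslow keeps the full risk-set sum for every tied event;
            # Efron removes k/d of the tied subjects' contribution.
            frac = k / d if method == "efron" else 0.0
            loglik -= math.log(risk_sum - frac * tied_sum)
    return loglik


# Hypothetical data: two events tied at time 1.
times = [1.0, 1.0, 2.0, 3.0]
events = [True, True, True, False]
z = [1.0, 0.0, 1.0, 0.0]

breslow = tied_loglik(0.0, times, events, z, method="breslow")
efron = tied_loglik(0.0, times, events, z, method="efron")
# At beta = 0: breslow = -log 32, efron = -log 24.
```

When no event times are tied, d is 1 at every event time and the two approximations coincide.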

Variable selection is a typical exploratory exercise in multiple regression when the investigator is interested in identifying important prognostic factors from a large number of candidate variables. The PHREG procedure provides four model selection methods: forward selection, backward elimination, stepwise selection, and best subset selection. The best subset selection method is based on the likelihood score statistic. This method identifies a specified number of best models containing one, two, three variables, and so on, up to the single model containing all of the explanatory variables.

The PHREG procedure also enables you to

  • include an offset variable in the model

  • weight the observations in the input data

  • test linear hypotheses about the regression parameters

  • perform conditional logistic regression analysis for matched case-control studies

  • create a SAS data set containing survivor function estimates, residuals, and regression diagnostics

  • create a SAS data set containing survival distribution estimates and confidence intervals for the survivor function at each event time for a given realization of the explanatory variables

PROC PHREG can also be used to fit the multinomial logit choice model to discrete choice data. See [http://support.sas.com/techsup/tnote/tnote_stat.html#market] for more information on discrete choice modeling and the multinomial logit model. Look for the latest "Discrete Choice" report.

The remaining sections of this chapter contain information on how to use PROC PHREG, information on the underlying statistical methodology, and some sample applications of the procedure. The "Getting Started" section on page 3217 introduces PROC PHREG with two examples. The "Syntax" section on page 3221 describes the syntax of the procedure. The "Details" section on page 3239 summarizes the statistical techniques employed in PROC PHREG. The "Examples" section on page 3272 includes eight additional examples of useful applications. Experienced SAS/STAT software users may decide to proceed to the "Syntax" section, while other users may choose to read both the "Getting Started" and "Examples" sections before proceeding to "Syntax" and "Details."

Experimental graphics are now available in PROC PHREG for model assessment. For more information, see the section "ODS Graphics" on page 3271.




SAS.STAT 9.1 Users Guide (Vol. 5)
Year: 2004