Details


Failure Time Distribution

Let T be a nonnegative random variable representing the failure time of an individual from a homogeneous population. The survival distribution function (also known as the survivor function) of T is written as

$$S(t) = \Pr(T \ge t)$$

A mathematically equivalent way of specifying the distribution of T is through its hazard function. The hazard function λ(t) specifies the instantaneous failure rate at t. If T is a continuous random variable, λ(t) is expressed as

$$\lambda(t) = \lim_{\Delta t \to 0^{+}} \frac{\Pr(t \le T < t + \Delta t \mid T \ge t)}{\Delta t} = \frac{f(t)}{S(t)}$$

where f(t) is the probability density function of T. If T is discrete with masses at x_1 < x_2 < ..., then λ(t) is given by

$$\lambda(t) = \begin{cases} \lambda_j & \text{if } t = x_j \\ 0 & \text{otherwise} \end{cases}$$

where

$$\lambda_j = \Pr(T = x_j \mid T \ge x_j)$$

for j = 1, 2, ....

Partial Likelihood Function for the Cox Model

Let Z_l(t) denote the vector of explanatory variables for the lth individual at time t. Let t_1 < t_2 < ... < t_k denote the k distinct, ordered event times. Let d_i denote the multiplicity of failures at t_i; that is, d_i is the size of the set D_i of individuals that fail at t_i. Let w_l be the weight associated with the lth individual. Using this notation, the likelihood functions used in PROC PHREG to estimate β are described in the following sections.

Continuous Time Scale

Let R_i denote the risk set just before the ith ordered event time t_i. Let R*_i denote the set of individuals whose event or censored times exceed t_i or whose censored times are equal to t_i.

Exact Likelihood

$$L_1(\beta) = \prod_{i=1}^{k} \int_0^{\infty} \prod_{l \in D_i} \left[ 1 - \exp\left( - \frac{e^{\beta' Z_l(t_i)}}{\sum_{j \in R_i^*} e^{\beta' Z_j(t_i)}}\, t \right) \right] e^{-t}\, dt$$
Breslow Likelihood

$$L_2(\beta) = \prod_{i=1}^{k} \frac{\exp\left(\beta' \sum_{j \in D_i} Z_j(t_i)\right)}{\left[ \sum_{l \in R_i} e^{\beta' Z_l(t_i)} \right]^{d_i}}$$

Incorporating weights, the Breslow likelihood becomes

$$L_2(\beta) = \prod_{i=1}^{k} \frac{\exp\left(\sum_{j \in D_i} w_j\, \beta' Z_j(t_i)\right)}{\left[ \sum_{l \in R_i} w_l\, e^{\beta' Z_l(t_i)} \right]^{\sum_{j \in D_i} w_j}}$$
Efron Likelihood

$$L_3(\beta) = \prod_{i=1}^{k} \frac{\exp\left(\beta' \sum_{j \in D_i} Z_j(t_i)\right)}{\prod_{j=1}^{d_i} \left[ \sum_{l \in R_i} e^{\beta' Z_l(t_i)} - \frac{j-1}{d_i} \sum_{l \in D_i} e^{\beta' Z_l(t_i)} \right]}$$

Incorporating weights, the Efron likelihood becomes

$$L_3(\beta) = \prod_{i=1}^{k} \frac{\exp\left(\sum_{j \in D_i} w_j\, \beta' Z_j(t_i)\right)}{\prod_{j=1}^{d_i} \left[ \sum_{l \in R_i} w_l\, e^{\beta' Z_l(t_i)} - \frac{j-1}{d_i} \sum_{l \in D_i} w_l\, e^{\beta' Z_l(t_i)} \right]^{\bar{w}_i}}$$

where $\bar{w}_i = \left( \sum_{l \in D_i} w_l \right) / d_i$ is the average weight of the individuals that fail at $t_i$.

Discrete Time Scale

Let Q_i denote the set of all subsets of d_i individuals from the risk set R_i. For each q ∈ Q_i, q is a d_i-tuple (q_1, q_2, ..., q_{d_i}) of individuals who might have failed at t_i.

Discrete Logistic Likelihood

$$L_4(\beta) = \prod_{i=1}^{k} \frac{\exp\left(\beta' \sum_{j \in D_i} Z_j(t_i)\right)}{\sum_{q \in Q_i} \exp\left(\beta' \sum_{j=1}^{d_i} Z_{q_j}(t_i)\right)}$$

The computation of L_4(β) and its derivatives is based on an adaptation of the recurrence algorithm of Gail et al. (1981) to the logarithmic scale. When there are no ties on the event times (that is, d_i = 1 for all i), the four likelihood functions L_1(β), L_2(β), L_3(β), and L_4(β) reduce to the same expression. In a stratified analysis, the partial likelihood is the product of the partial likelihood functions for the individual strata.
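These likelihoods correspond to values of the TIES= option in the MODEL statement: TIES=EXACT for L_1(β), TIES=BRESLOW (the default) for L_2(β), TIES=EFRON for L_3(β), and TIES=DISCRETE for L_4(β). A minimal sketch, assuming a data set with hypothetical variables Time, Status, Z1, and Z2:

  proc phreg;
     model Time*Status(0)=Z1 Z2 / ties=efron;   /* use the Efron likelihood L3 */
  run;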

Counting Process Style of Input

In the counting process formulation, data for each subject are identified by a triple {N, Y, Z} of counting, at-risk, and covariate processes. Here, N(t) indicates the number of events that the subject experiences over the time interval (0, t]; Y(t) indicates whether the subject is at risk at time t (one if at risk and zero otherwise); and Z(t) is a vector of explanatory variables for the subject at time t. The sample path of N is a step function with jumps of size +1 at the event times, and N(0) = 0. Unless Z(t) changes continuously with time, the data for each subject can be represented by multiple observations, each identifying a semiclosed time interval (t1, t2], the values of the explanatory variables over that interval, and the event status at t2. The subject remains at risk during the interval (t1, t2], and an event may occur at t2. Values of the explanatory variables for the subject remain unchanged in the interval. This style of data input was originated by Terry M. Therneau (1994).

For example, suppose a patient has tumor recurrences at weeks 3, 10, and 15 and is followed to week 23. The explanatory variables are Trt (treatment), Z1 (initial tumor number), and Z2 (initial tumor size); for this patient, the values of Trt, Z1, and Z2 are (1, 1, 3). The data for this patient are represented by the following four observations:

T1    T2    Event    Trt    Z1    Z2
 0     3      1       1      1     3
 3    10      1       1      1     3
10    15      1       1      1     3
15    23      0       1      1     3

Here, (T1, T2] contains the at-risk interval for each observation. The variable Event is a censoring variable indicating whether a recurrence has occurred at T2; a value of 1 indicates a tumor recurrence, and a value of 0 indicates nonrecurrence. The PHREG procedure fits the multiplicative hazards model, which is specified as follows:

  proc phreg;
     model (T1,T2)*Event(0)=Trt Z1 Z2;
  run;

Another useful application of the counting process formulation is delayed entry of subjects into the risk set. For example, in studying the mortality of workers exposed to a carcinogen, the survival time is chosen to be the worker's age at death by malignant neoplasm. Any worker who joined the workplace at an age greater than a given event time is not included in the corresponding risk set. The variables for a worker consist of Entry (age at which the worker entered the workplace), Age (age at death or age censored), Status (an indicator of whether the observation time is censored, with the value 0 identifying a censored time), and X1 and X2 (explanatory variables thought to be related to survival). The specification for such an application is as follows.

  proc phreg;
     model (Entry, Age)*Status(0)=X1 X2;
  run;

Alternatively, you can use a time-dependent variable to control the risk set, as illustrated in the following specification:

  proc phreg;
     model Age*Status(0)=X1 X2;
     if Age < Entry then X1=.;
  run;

Here, X1 becomes a time-dependent variable. At a given death time t, the value of X1 is reevaluated for each subject with Age ≥ t; subjects with Entry > t are given a missing value in X1 and are subsequently removed from the risk set. Computationally, this approach is not as efficient as the one that uses the counting process formulation.

The Multiplicative Hazards Model

Consider a set of n subjects such that the counting process N_i = {N_i(t), t ≥ 0} for the ith subject represents the number of observed events experienced over time t. The sample paths of the process N_i are step functions with jumps of size +1, with N_i(0) = 0. Let β denote the vector of unknown regression coefficients. The multiplicative hazards function λ(t, Z_i(t)) for N_i is given by

$$\lambda(t, Z_i(t)) = Y_i(t)\, \lambda_0(t)\, e^{\beta' Z_i(t)}$$

where

  • Y_i(t) indicates whether the ith subject is at risk at time t (specifically, Y_i(t) = 1 if at risk and Y_i(t) = 0 otherwise)

  • Z_i(t) is the vector of explanatory variables for the ith subject at time t

  • λ_0(t) is an unspecified baseline hazard function

Refer to Fleming and Harrington (1991) and Andersen et al. (1992). The Cox model is a special case of this multiplicative hazards model, where Y_i(t) = 1 until the first event or censoring, and Y_i(t) = 0 thereafter.

The partial likelihood for n independent triplets (N_i, Y_i, Z_i), i = 1, ..., n, has the form

$$L(\beta) = \prod_{i=1}^{n} \prod_{t \ge 0} \left[ \frac{Y_i(t)\, e^{\beta' Z_i(t)}}{\sum_{j=1}^{n} Y_j(t)\, e^{\beta' Z_j(t)}} \right]^{\Delta N_i(t)}$$

where ΔN_i(t) = 1 if N_i(t) − N_i(t−) = 1, and ΔN_i(t) = 0 otherwise.

Proportional Rates/Means Models for Recurrent Events

Let N ( t ) be the number of events experienced by a subject over the time interval (0 ,t ]. Let dN ( t ) be the increment of the counting process N over [ t, t + dt ). The rate function is given by

$$d\mu_Z(t) \equiv E[\,dN(t) \mid Z\,] = e^{\beta' Z(t)}\, d\mu_0(t)$$

where μ_0(·) is an unknown continuous function. If the covariates Z are time independent, the rate model reduces to the mean model

$$\mu_Z(t) \equiv E[\,N(t) \mid Z\,] = e^{\beta' Z}\, \mu_0(t)$$

The partial likelihood for n independent triplets (N_i, Y_i, Z_i), i = 1, ..., n, of counting, at-risk, and covariate processes is the same as that of the multiplicative hazards model. However, a robust sandwich estimate is used for the covariance matrix of the parameter estimator instead of the model-based estimate.

Let T_{ki} be the kth event time of the ith subject. Let C_i be the censoring time of the ith subject. The at-risk indicator and the failure counting process are, respectively,

$$Y_i(t) = I(C_i \ge t) \qquad \text{and} \qquad N_i(t) = \sum_{k} I(T_{ki} \le t,\; T_{ki} \le C_i)$$

Denote

$$S^{(0)}(\beta, t) = \sum_{i=1}^{n} Y_i(t)\, e^{\beta' Z_i(t)}, \qquad S^{(1)}(\beta, t) = \sum_{i=1}^{n} Y_i(t)\, e^{\beta' Z_i(t)} Z_i(t), \qquad \bar{Z}(\beta, t) = \frac{S^{(1)}(\beta, t)}{S^{(0)}(\beta, t)}$$

Let β̂ be the maximum likelihood estimate of β, and let I(β̂) be the observed information matrix. The robust sandwich covariance matrix estimate is given by

$$\hat{V}_s(\hat\beta) = I^{-1}(\hat\beta) \left[ \sum_{i=1}^{n} \hat{U}_i \hat{U}_i' \right] I^{-1}(\hat\beta)$$

where

$$\hat{U}_i = \int_0^{\infty} \left[ Z_i(t) - \bar{Z}(\hat\beta, t) \right] d\hat{M}_i(t), \qquad \hat{M}_i(t) = N_i(t) - \int_0^{t} Y_i(s)\, e^{\hat\beta' Z_i(s)}\, d\hat{\mu}_0(s)$$

For a given realization ξ of the covariates, the Nelson estimator is used to predict the mean function

$$\hat{\mu}_\xi(t) = e^{\hat\beta' \xi}\, \hat{\mu}_0(t), \qquad \hat{\mu}_0(t) = \sum_{i=1}^{n} \int_0^{t} \frac{dN_i(s)}{S^{(0)}(\hat\beta, s)}$$

with standard error estimate given by

$$\hat\sigma(t, \xi) = \left\{ \sum_{i=1}^{n} \left[ e^{\hat\beta' \xi} \int_0^t \frac{d\hat{M}_i(s)}{S^{(0)}(\hat\beta, s)} + \hat{H}(t, \xi)'\, I^{-1}(\hat\beta)\, \hat{U}_i \right]^2 \right\}^{1/2}$$

where

$$\hat{H}(t, \xi) = e^{\hat\beta' \xi} \int_0^t \left[ \xi - \bar{Z}(\hat\beta, s) \right] \frac{\sum_{i=1}^{n} dN_i(s)}{S^{(0)}(\hat\beta, s)}$$

Since the cumulative mean function is always nonnegative, the log transform is used to compute confidence intervals. The 100(1 − α)% pointwise confidence limits for μ_ξ(t) are

$$\hat{\mu}_\xi(t)\, \exp\left( \pm\, z_{\alpha/2}\, \frac{\hat\sigma(t, \xi)}{\hat{\mu}_\xi(t)} \right)$$

where z_{α/2} is the upper 100α/2 percentage point of the standard normal distribution.

Newton-Raphson Method

Let L(β) be one of the likelihood functions described in the previous subsections. Let l(β) = log L(β). Finding β such that L(β) is maximized is equivalent to finding the solution β̂ to the likelihood equations

$$\frac{\partial l(\beta)}{\partial \beta} = 0$$

With β^(0) = 0 as the initial solution, the iterative scheme is expressed as

$$\beta^{(j+1)} = \beta^{(j)} - \left[ \frac{\partial^2 l\left(\beta^{(j)}\right)}{\partial \beta^2} \right]^{-1} \frac{\partial l\left(\beta^{(j)}\right)}{\partial \beta}$$

The term after the minus sign is the Newton-Raphson step. If the likelihood function evaluated at β^(j+1) is less than that evaluated at β^(j), then β^(j+1) is recomputed using half the step size. The iterative scheme continues until convergence is obtained, that is, until β^(j+1) is sufficiently close to β^(j). Then the maximum likelihood estimate of β is β̂ = β^(j+1).
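The following PROC IML program sketches this iterative scheme with step-halving. It is purely illustrative: the modules LOGLIK, GRAD, and HESS are hypothetical stand-ins for the partial log likelihood and its derivatives, and here they encode a simple concave quadratic rather than any PHREG likelihood.

  proc iml;
     /* Illustrative log likelihood l(b) = -(b1-1)^2 - (b2+2)^2
        and its gradient and Hessian; these modules are stand-ins
        for the partial log likelihood and its derivatives. */
     start loglik(b);
        return( -(b[1]-1)##2 - (b[2]+2)##2 );
     finish;
     start grad(b);
        return( -2*(b[1]-1) // -2*(b[2]+2) );
     finish;
     start hess(b);
        return( diag({-2, -2}) );
     finish;

     b = {0, 0};                            /* initial solution beta = 0   */
     converged = 0;
     do iter = 1 to 25 until(converged);
        step = -inv(hess(b)) * grad(b);     /* Newton-Raphson step         */
        bnew = b + step;
        do while(loglik(bnew) < loglik(b)); /* halve the step if the       */
           step = step / 2;                 /* likelihood does not improve */
           bnew = b + step;
        end;
        converged = (max(abs(bnew - b)) < 1e-8);
        b = bnew;
     end;
     print b;                               /* maximum likelihood estimate */
  quit;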

The model-based variance estimate of β̂ is obtained by inverting the information matrix I(β̂):

$$\hat{V}_m(\hat\beta) = I^{-1}(\hat\beta) = \left[ -\frac{\partial^2 l(\hat\beta)}{\partial \beta^2} \right]^{-1}$$

Robust Sandwich Variance Estimate

For the ith subject, i = 1, ..., n, let X_i, w_i, and Z_i(t) be the observed time, weight, and covariate vector at time t, respectively. Let Δ_i be the event indicator and let Y_i(t) = I(X_i ≥ t). Let

$$S^{(0)}(\beta, t) = \sum_{i=1}^{n} w_i\, Y_i(t)\, e^{\beta' Z_i(t)}, \qquad S^{(1)}(\beta, t) = \sum_{i=1}^{n} w_i\, Y_i(t)\, e^{\beta' Z_i(t)} Z_i(t)$$

Let Z̄(β, t) = S^(1)(β, t)/S^(0)(β, t). The score residual for the ith individual is

$$L_i(\beta) = \Delta_i \left[ Z_i(X_i) - \bar{Z}(\beta, X_i) \right] - \sum_{j:\, X_j \le X_i} \frac{\Delta_j\, w_j\, e^{\beta' Z_i(X_j)}}{S^{(0)}(\beta, X_j)} \left[ Z_i(X_j) - \bar{Z}(\beta, X_j) \right]$$

The robust sandwich variance estimate of β̂, derived by Binder (1992), who incorporated weights into the analysis, is

$$\hat{V}_s(\hat\beta) = I^{-1}(\hat\beta) \left[ \sum_{i=1}^{n} \left( w_i\, L_i(\hat\beta) \right)^{\otimes 2} \right] I^{-1}(\hat\beta)$$

where I(β̂) is the observed information matrix and a^{⊗2} = aa′. Note that when w_i ≡ 1,

$$\hat{V}_s(\hat\beta) = D'D$$

where D is the matrix of DFBETA residuals. This robust variance estimate was proposed by Lin and Wei (1989) and Reid and Crépeau (1985).
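For example, the following minimal sketch (with hypothetical variables Time, Status, Z1, and Z2) requests the robust sandwich covariance estimate through the COVS option and writes the DFBETA statistics to an output data set:

  proc phreg covs;
     model Time*Status(0)=Z1 Z2;
     output out=Resid dfbeta=dfZ1 dfZ2;   /* one DFBETA variable per covariate */
  run;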

Testing the Global Null Hypothesis

The following three test statistics can be used to test the global null hypothesis H_0: β = 0. Under mild assumptions, each statistic has an asymptotic chi-square distribution with p degrees of freedom under the null hypothesis, where p is the dimension of β.

Likelihood Ratio Test

$$\chi^2_{LR} = 2\left[ \log L(\hat\beta) - \log L(0) \right]$$

This formulation of the likelihood ratio test is not appropriate for the COVS option.

Wald's Test

$$\chi^2_W = \hat\beta'\, \left[ \hat{V}(\hat\beta) \right]^{-1} \hat\beta$$

where V̂(β̂) = V̂_m(β̂) for the model-based variance estimate and V̂(β̂) = V̂_s(β̂) for the robust sandwich variance estimate.

Score Test

$$\chi^2_S = U'(0)\, I^{-1}(0)\, U(0)$$

where U(β) = ∂l(β)/∂β is the vector of score functions. Replacing I^{-1}(0) by the robust sandwich variance estimate V̂_s(β̂), the modified score test is also printed when the COVS option is specified.

Hazards Ratio Estimates and Confidence Limits

Let β_i and β̂_i denote the ith component of β and β̂, respectively. The hazards ratio (also known as the risk ratio) for the explanatory variable with regression coefficient β_i is defined as exp(β_i). The hazards ratio is estimated by exp(β̂_i). The 100(1 − α)% confidence limits for the hazards ratio are calculated as

$$\exp\left( \hat\beta_i \pm z_{\alpha/2}\, \sqrt{\hat{V}_{ii}(\hat\beta)} \right)$$

where V̂_ii(β̂) is the ith diagonal element of the estimated covariance matrix V̂(β̂), and z_{α/2} is the 100(1 − α/2) percentile point of the standard normal distribution.

The hazards ratio is the ratio of the hazard functions that correspond to a change of one unit in the given variable, conditional on fixed values of all other variables.
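For example, with a hypothetical coefficient estimate β̂_i = 0.5 whose standard error is 0.2, the estimated hazards ratio and its 95% confidence limits (z_{0.025} = 1.96) are

$$\exp(0.5) = 1.649, \qquad \exp(0.5 \pm 1.96 \times 0.2) = (1.114,\; 2.440)$$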

Testing Linear Hypotheses about Regression Coefficients

Linear hypotheses for β are expressed in matrix form as

$$H_0\colon\ L\beta = c$$

where L is a matrix of coefficients for the linear hypotheses, and c is a vector of constants. The Wald chi-square statistic for testing H_0 is computed as

$$\chi^2_W = (L\hat\beta - c)' \left[ L\, \hat{V}(\hat\beta)\, L' \right]^{-1} (L\hat\beta - c)$$

where V̂(β̂) is the estimated covariance matrix. Under H_0, χ²_W has an asymptotic chi-square distribution with r degrees of freedom, where r is the rank of L.

Analysis of Multivariate Failure Time Data

Multivariate failure time data arise when each study subject can potentially experience several events (for instance, multiple infections after surgery) or when there exists some natural or artificial clustering of subjects (for instance, a litter of mice) that induces dependence among the failure times of the same cluster. Data in the former situation are referred to as multiple events data, and data in the latter situation are referred to as clustered data. The multiple events data can be further classified into ordered and unordered data. For ordered data, there is a natural ordering of the multiple failures within a subject, which includes recurrent events data as a special case. For unordered data, the multiple event times result from several concurrent failure processes.

Multiple events data can be analyzed by the method of Wei, Lin, and Weissfeld (1989), or WLW, based on the marginal Cox models. For the special case of recurrent events data, you can fit the intensity model (Andersen and Gill 1982), the proportional rates/means model (Pepe and Cai 1993; Lawless and Nadeau 1995; Lin, Wei, Yang, and Ying 2000), or the stratified models for total time and gap time proposed by Prentice, Williams, and Peterson (1981), or PWP. For clustered data, you can carry out the analysis of Lee, Wei, and Amato (1992) based on the marginal Cox model. To use PROC PHREG to perform these analyses correctly and effectively, you have to array your data in a specific way to produce the correct risk sets.

All examples described in this section can be found in the program phrmult.sas in the SAS/STAT sample library. Furthermore, the Examples section in this chapter contains two examples to illustrate the methods for analyzing recurrent events data and clustered data.

Marginal Cox Models for Multiple Events Data

Suppose there are n subjects and each subject can experience up to K potential events. Let Z_{ki}(·) be the covariate process associated with the kth event for the ith subject. The marginal Cox models are given by

$$\lambda_k(t;\, Z_{ki}) = \lambda_k(t)\, e^{\beta_k' Z_{ki}(t)}, \qquad k = 1, \ldots, K,\quad i = 1, \ldots, n$$

where λ_k(t) is the (event-specific) baseline hazard function for the kth event and β_k is the (event-specific) column vector of regression coefficients for the kth event. WLW estimates β_1, ..., β_K by the maximum partial likelihood estimates β̂_1, ..., β̂_K, respectively, and uses a robust sandwich covariance matrix estimate for (β̂_1', ..., β̂_K')' to account for the dependence of the multiple failure times.

By using a properly prepared input data set, you can estimate the regression parameters for all the marginal Cox models and compute the robust sandwich covariance estimates in one PROC PHREG invocation. For convenience of discussion, suppose each subject can potentially experience K = 3 events and there are two explanatory variables Z1 and Z2. The event-specific parameters to be estimated are β_1 = (β_11, β_21)' for the first marginal model, β_2 = (β_12, β_22)' for the second marginal model, and β_3 = (β_13, β_23)' for the third marginal model. Inference for these parameters is based on the robust sandwich covariance matrix estimate of the parameter estimators. It is necessary that each row of the input data set represent the data for a potential event of a subject. The input data set should contain

  • an ID variable for identifying the subject so that all observations of the same subject have the same ID value

  • an Enum variable to index the multiple events. For example, Enum =1 for the first event, Enum =2 for the second event, and so on.

  • a Time variable to represent the observed time from some time origin for the event. For recurrent events data, it is the time from study entry to each recurrence.

  • a Status variable to indicate whether the Time value is a censored or uncensored time. For example, Status=1 indicates an uncensored time and Status=0 indicates a censored time.

  • independent variables ( Z1 and Z2 ).

The WLW analysis can be carried out by specifying

  proc phreg covs(aggregate);
     model Time*Status(0)=Z11 Z12 Z13 Z21 Z22 Z23;
     strata Enum;
     id ID;
     Z11= Z1 * (Enum=1);
     Z12= Z1 * (Enum=2);
     Z13= Z1 * (Enum=3);
     Z21= Z2 * (Enum=1);
     Z22= Z2 * (Enum=2);
     Z23= Z2 * (Enum=3);
  run;

Variable Enum is specified in the STRATA statement so that there is one marginal Cox model for each distinct value of Enum. Variables Z11, Z12, Z13, Z21, Z22, and Z23 in the MODEL statement are event-specific variables derived from the independent variables Z1 and Z2 by the given programming statements. In particular, variables Z11, Z12, and Z13 are the event-specific variables for the explanatory variable Z1; variables Z21, Z22, and Z23 are the event-specific variables for the explanatory variable Z2. For j = 1, 2, and k = 1, 2, 3, variable Zjk contains the same values as the explanatory variable Zj for the rows that correspond to the kth marginal model and the value 0 for all other rows; as such, β_jk is the regression coefficient for Zjk. You can avoid using the programming statements in PROC PHREG by creating these event-specific variables in the input data set using the same programming statements in a DATA step.

The option COVS(AGGREGATE) is specified in the PROC statement to obtain the robust sandwich estimate of the covariance matrix, and the score residuals used in computing the middle part of the sandwich estimate are aggregated over identical ID values. You can also include TEST statements in the PROC PHREG code to test various linear hypotheses of the regression parameters based on the robust sandwich covariance matrix estimate.

Consider the AIDS study data in Wei, Lin, and Weissfeld (1989) from a randomized clinical trial to assess the antiretroviral capacity of ribavirin over time in AIDS patients. Blood samples were collected at weeks 4, 8, and 12 from each patient in three treatment groups (placebo, low dose of ribavirin, and high dose). For each serum sample, the failure time is the number of days before virus positivity was detected. If the sample was contaminated or it took longer than was achievable in the laboratory, the sample was censored. For example,

  • Patient #1 in the placebo group has uncensored times 9, 6, and 7 days (that is, it took 9 days to detect viral positivity in the first blood sample, 6 days for the second blood sample, and 7 days for the third blood sample).

  • Patient #14 in the low dose group of ribavirin has uncensored times of 16 and 17 days for the first and second samples, respectively, and a censored time of 21 days for the third blood sample.

  • Patient #28 in the high dose group has an uncensored time of 21 days for the first sample, no measurement for the second blood sample, and a censored time of 25 days for the third sample.

For a full-rank parameterization, two design variables are sufficient to represent three treatment groups. Based on the reference coding with placebo as the reference, the values of the two dummy explanatory variables Z1 and Z2 representing the treatments are

Treatment Group        Z1    Z2
Placebo                 0     0
Low dose ribavirin      1     0
High dose ribavirin     0     1

The bulk of the task in using PROC PHREG to perform the WLW analysis lies in the preparation of the input data set. As discussed earlier, the input data set should contain the ID , Enum , Time , and Status variables, and event-specific independent variables Z11, Z12, Z13, Z21, Z22, and Z23. Data for the three patients described earlier are arrayed as follows:

ID    Time    Status    Enum    Z1    Z2
 1      9       1         1      0     0
 1      6       1         2      0     0
 1      7       1         3      0     0
14     16       1         1      1     0
14     17       1         2      1     0
14     21       0         3      1     0
28     21       1         1      0     1
28     25       0         3      0     1

The first three rows are data for Patient #1 with event times at 9, 6, and 7 days, one row for each event. The next three rows are data for Patient #14, who has an uncensored time of 16 days for the first serum sample, an uncensored time of 17 days for the second sample, and a censored time of 21 days for the third sample. The last two rows are data for Patient #28 of the high dose group (Z1=0 and Z2=1). Since the patient did not have a second serum sample, there are only two rows of data.

To perform the WLW analysis, you specify

  proc phreg covs(aggregate);
     model Time*Status(0)=Z11 Z12 Z13 Z21 Z22 Z23;
     strata Enum;
     id ID;
     Z11= Z1 * (Enum=1);
     Z12= Z1 * (Enum=2);
     Z13= Z1 * (Enum=3);
     Z21= Z2 * (Enum=1);
     Z22= Z2 * (Enum=2);
     Z23= Z2 * (Enum=3);
     EqualLowDose: test Z11=Z12, Z12=Z13;
     AverageLow: test Z11,Z12,Z13 / e average;
  run;

Two linear hypotheses are tested using the TEST statements. The specification

  EqualLowDose: test Z11=Z12, Z12=Z13;  

tests the null hypothesis β_11 = β_12 = β_13 of identical low-dose effects across the three marginal models. The specification

  AverageLow: test Z11,Z12,Z13 / e average;  

tests the null hypothesis of no low-dose effects (that is, β_11 = β_12 = β_13 = 0). The AVERAGE option computes the optimal weights for estimating the average low-dose effect β̄_1 and performs a 1 DF test of the null hypothesis that β̄_1 = 0. The E option displays the coefficients for the linear hypotheses, including the optimal weights.

Marginal Cox Models for Clustered Data

Suppose there are n clusters with K_i members in the ith cluster, i = 1, ..., n. Let Z_{ki}(·) be the covariate process associated with the kth member of the ith cluster. The marginal Cox model is given by

$$\lambda(t;\, Z_{ki}) = \lambda(t)\, e^{\beta' Z_{ki}(t)}, \qquad k = 1, \ldots, K_i,\quad i = 1, \ldots, n$$

where λ(t) is an arbitrary baseline hazard function and β is the vector of regression coefficients. Lee, Wei, and Amato (1992) estimate β by the maximum partial likelihood estimate β̂ under the independent working assumption and use a robust sandwich covariance estimate to account for the intracluster dependence.

To use PROC PHREG to analyze the clustered data, each member of a cluster is represented by an observation in the input data set. The input data set to PROC PHREG should contain

  • an ID variable to identify the cluster so that members of the same cluster have the same ID value

  • a Time variable to represent the observed survival time of a member of a cluster

  • a Status variable to indicate whether the Time value is an uncensored or censored time. For example, Status=1 indicates an uncensored time and Status=0 indicates a censored time.

  • the explanatory variables thought to be related to the failure time

Consider a tumor study in which one of three female rats of the same litter was randomly subjected to a drug treatment. The failure time is the time from randomization to the detection of tumor. If a rat died before the tumor was detected, the failure time was censored. For example,

  • In litter #1, the drug-treated rat has an uncensored time of 101 days, one untreated rat has a censored time of 49 days, and the other untreated rat has a failure time of 104 days.

  • In litter #3, the drug-treated rat has a censored time of 104 days, one untreated rat has a censored time of 102 days, and the other untreated rat has a censored time of 104 days.

In this example, a litter is a cluster and the rats of the same litter are members of the cluster. Let Trt be a 0-1 variable representing the treatment a rat received, with value 1 for drug treatment and 0 otherwise. Data for the two litters of rats described earlier contribute six observations to the input data set:

Litter    Time    Status    Trt
1          101      1        1
1           49      0        0
1          104      1        0
3          104      0        1
3          102      0        0
3          104      0        0

The analysis of Lee, Wei, and Amato (1992) can be performed by PROC PHREG as follows:

  proc phreg covs(aggregate);
     model Time*Status(0)=Trt;
     id Litter;
  run;

Intensity and Rate/Mean Models for Recurrent Events Data

Suppose each subject experiences recurrences of the same phenomenon. Let N(t) be the number of events a subject experiences over the interval [0, t] and let Z(·) be the covariate process of the subject.

The intensity model (Andersen and Gill 1982) is given by

$$\lambda(t \mid \mathcal{F}_t) = Y(t)\, \lambda_0(t)\, e^{\beta' Z(t)}$$

where F_t represents all the information of the processes N and Z up to time t, λ_0(t) is an arbitrary baseline intensity function, and β is the vector of regression coefficients. This model has two components: (1) all the influence of the prior events on future recurrences, if there is any, is mediated through the time-dependent covariates, and (2) the covariates have multiplicative effects on the instantaneous rate of the counting process. If the covariates are time invariant, the risk of recurrences is unaffected by the past events.

The proportional rates and means models (Pepe and Cai 1993; Lawless and Nadeau 1995; Lin, Wei, Yang, and Ying 2000) assume that the covariates have multiplicative effects on the mean and rate functions of the counting process. The rate function is given by

$$d\mu_Z(t) = e^{\beta' Z(t)}\, d\mu_0(t)$$

where μ_0(t) is an unknown continuous function and β is the vector of regression parameters. If Z is time invariant, the mean function is given by

$$\mu_Z(t) = e^{\beta' Z}\, \mu_0(t)$$

For both the intensity and the proportional rates/means models, estimates of the regression coefficients are obtained by solving the partial likelihood score equations. However, the covariance matrix estimate for the intensity model is computed as the inverse of the observed information matrix, while that for the proportional rates/means model is given by a sandwich estimate. For a given pattern of fixed covariates, the Nelson estimate of the cumulative intensity function is the same as that of the cumulative mean function, but their standard errors are not the same.

To fit the intensity or rate/mean model using PROC PHREG, the counting process style of input is needed. A subject with K events contributes K+1 observations to the input data set. The kth observation of the subject identifies the time interval from the (k−1)th event (or time 0 if k=1) to the kth event, k = 1, ..., K. The (K+1)th observation represents the time interval from the Kth event to the time of censorship. The input data set should contain

  • a TStart variable to represent the (k−1)th recurrence time or the value 0 if k=1

  • a TStop variable to represent the kth recurrence time or the follow-up time if k=K+1

  • a Status variable indicating whether the TStop time is a recurrence time or a censored time; for example, Status =1 for a recurrence time and Status =0 for censored time

  • explanatory variables thought to be related to the recurrence times

If the rate/mean model is used, the input data should also contain an ID variable for identifying the subjects.

Consider the Chronic Granulomatous Disease (CGD) data listed in Fleming and Harrington (1991). The disease is a rare disorder characterized by recurrent pyogenic infections. The study is a placebo-controlled randomized clinical trial conducted by the International CGD Cooperative Study to assess the effect of gamma interferon in reducing the rate of infection. For each study patient, the times of recurrent infections along with a number of prognostic factors were collected. For example,

  • Patient #174054, age 38, in the gamma interferon group had a follow-up time of 293 days without any infection.

  • Patient #204001, age 12, in the placebo group had an infection at 219 days, a recurrent infection at 373 days, and was followed up to 414 days.

Let Trt be the variable representing the treatment status with value 1 for gamma interferon and value 2 for placebo. Let Age be a covariate representing the age of the CGD patient. Data for the two CGD patients described earlier are given in the following table.

ID        TStart    TStop    Status    Trt    Age
174054       0        293       0       1      38
204001       0        219       1       2      12
204001     219        373       1       2      12
204001     373        414       0       2      12

Since Patient #174054 had no infection through the end of the follow-up period (293 days), there is only one observation representing the period from time 0 to the end of the follow-up. Data for Patient #204001 are broken into three observations, since there are two infections. The first observation represents the period from time 0 to the first infection, the second observation represents the period from the first infection to the second infection, and the third time period represents the period from the second infection to the end of the follow-up.

The following specification fits the intensity model.

  proc phreg;
     model (TStart,TStop)*Status(0)=Trt Age;
  run;

You can predict the cumulative intensity function for a given pattern of fixed covariates by specifying the CUMHAZ= option in the BASELINE statement. Suppose you are interested in two fixed patterns, one for patients of age 30 in the gamma interferon group and the other for patients of age 1 in the placebo group. You first create the SAS data set as follows:

  data Pattern;
     Trt=1; Age=30;
     output;
     Trt=2; Age=1;
     output;
  run;

You then include the following BASELINE statement in the PROC PHREG specification. The CUMHAZ=_all_ option produces the cumulative hazard function estimates, the standard error estimates, and the lower and upper pointwise confidence limits.

  baseline data=Pattern out=Out1 cumhaz=_all_ / nomean;

The following specification of PROC PHREG fits the mean model and predicts the cumulative mean function for the two patterns of covariates in the Pattern data set.

  proc phreg covs(aggregate);
     model (TStart,TStop)*Status(0)=Trt Age;
     baseline data=Pattern out=Out2 cmf=_all_ / nomean;
     id ID;
  run;

The COVS(AGGREGATE) option computes the robust sandwich covariance matrix estimate. The CMF=_ALL_ option adds the cumulative mean function estimates, the standard error estimates, and the lower and upper pointwise confidence limits to the OUT=Out2 data set.

PWP Models for Recurrent Events Data

Let N(t) be the number of events a subject experiences by time t. Let Z(t) be the covariate vector of the subject at time t. For a subject who has K events before censorship takes place, let t_0 = 0, let t_k be the kth recurrence time, k = 1, ..., K, and let t_{K+1} be the censored time. Prentice, Williams, and Peterson (1981) consider two time scales: a total time from the beginning of the study and a gap time from the immediately preceding failure. The PWP models are stratified Cox-type models that allow the shape of the hazard function to depend on the number of preceding events and possibly on other characteristics of {N(t), Z(t)}. The total time and gap time models are given, respectively, as follows:

$$\lambda_k(t) = \lambda_{0k}(t)\, e^{\beta_k' Z(t)}, \qquad t_{k-1} < t \le t_k \qquad \text{(total time)}$$

$$\lambda_k(t) = \lambda_{0k}(t - t_{k-1})\, e^{\beta_k' Z(t)}, \qquad t_{k-1} < t \le t_k \qquad \text{(gap time)}$$

where λ_0k is an arbitrary baseline intensity function, and β_k is a vector of stratum-specific regression coefficients. Here, a subject moves to the kth stratum immediately after the (k−1)th recurrence time and remains there until the kth recurrence occurs or until censorship takes place. For instance, a subject who experiences only one event moves from the first stratum to the second stratum after the event occurs and remains in the second stratum until the end of the follow-up.

You can use PROC PHREG to carry out the analyses of the PWP models, but you have to prepare the input data set to provide the correct risk sets. The input data set for analyzing total times is the same as that for the AG intensity model, with an additional variable to represent the stratum that the subject is in. A subject with K events contributes K+1 observations to the input data set, one for each stratum that the subject moves to. The input data should contain

  • a TStart variable to represent the (k−1)th recurrence time or the value 0 if k=1

  • a TStop variable to represent the kth recurrence time or the time of censorship if k=K+1

  • a Status variable with value 1 if the TStop value is a recurrence time and value 0 if it is a censored time

  • an Enum variable representing the index of the stratum that the subject is in. For a subject who has only one event at t_1 and is followed to time t_c, Enum=1 for the first observation (where TStop=t_1 and Status=1) and Enum=2 for the second observation (where TStop=t_c and Status=0).

  • explanatory variables thought to be related to the recurrence times

To analyze gap times, the input data set should also include a GapTime variable, which is equal to TStop − TStart.
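For example, a DATA step such as the following (the data set names Cgd and CgdGap are hypothetical) creates the GapTime variable from the counting process variables:

  data CgdGap;
     set Cgd;                      /* input data set with TStart and TStop */
     GapTime = TStop - TStart;
  run;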

Consider the data of two subjects in the CGD data described in the previous section.

  • Patient #174054, age 38, in the gamma interferon group had a follow-up time of 293 days without any infection.

  • Patient #204001, age 12, in the placebo group had an infection at 219 days, a recurrent infection at 373 days, and a follow-up time of 414 days.

To illustrate, suppose all subjects have at most two observed events. The data for the two subjects in the input data set are as follows:

ID        TStart    TStop    GapTime    Status    Enum    Trt    Age
174054       0        293       293        0        1      1      38
204001       0        219       219        1        1      2      12
204001     219        373       154        1        2      2      12
204001     373        414        41        0        3      2      12

Subject #174054 contributes only one observation to the input data, since there is no observed event. Subject #204001 contributes three observations, since there are two observed events.

To fit the total time model of PWP with stratum-specific slopes, you can either create the stratum-specific explanatory variables ( Trt 1 , Trt 2 , and Trt 3 for Trt , and Age1 , Age2 , and Age3 for Age ) in a DATA step, or you can specify them in PROC PHREG using programming statements as follows:

  proc phreg;
     model (TStart,TStop)*Status(0)=Trt1 Trt2 Trt3 Age1 Age2 Age3;
     strata Enum;
     Trt1= Trt * (Enum=1);
     Trt2= Trt * (Enum=2);
     Trt3= Trt * (Enum=3);
     Age1= Age * (Enum=1);
     Age2= Age * (Enum=2);
     Age3= Age * (Enum=3);
  run;

To fit the total time model of PWP with the common regression coefficients, you specify

  proc phreg;
     model (TStart,TStop)*Status(0)=Trt Age;
     strata Enum;
  run;

To fit the gap time model of PWP with stratum-specific regression coefficients, you specify

  proc phreg;
     model GapTime*Status(0)=Trt1 Trt2 Trt3 Age1 Age2 Age3;
     strata Enum;
     Trt1= Trt * (Enum=1);
     Trt2= Trt * (Enum=2);
     Trt3= Trt * (Enum=3);
     Age1= Age * (Enum=1);
     Age2= Age * (Enum=2);
     Age3= Age * (Enum=3);
  run;

To fit the gap time model of PWP with common regression coefficients, you specify

  proc phreg;
     model GapTime*Status(0)=Trt Age;
     strata Enum;
  run;

Residuals

The cumulative baseline hazard function is estimated by

$$\hat{\Lambda}_0(t) = \sum_{i:\, t_i \le t} \frac{d_i}{\sum_{l \in R_i} e^{\hat\beta' Z_l(t_i)}}$$

Although this formula is for the TIES=BRESLOW option, the same formula is used for the other TIES= options. The discrepancies between results obtained by using an appropriate formula for a nondefault TIES= option and those obtained by this formula are minimal.

The martingale residual at t is defined as

$$\hat{M}_i(t) = N_i(t) - \int_0^t Y_i(s)\, e^{\hat\beta' Z_i(s)}\, d\hat{\Lambda}_0(s)$$

Here M̂_i(t) estimates the difference over (0, t] between the observed number of events for the ith subject and a conditional expected number of events. The quantity M̂_i ≡ M̂_i(∞) is referred to as the martingale residual for the ith subject. When the counting process MODEL specification is used, the RESMART= variable contains the component (M̂_i(t_2) − M̂_i(t_1)) instead of the martingale residual at t_2. The martingale residual for a subject can be obtained by summing up these component residuals within the subject. For the Cox model with no time-dependent explanatory variables, the martingale residual for the ith subject with observation time t_i and event status δ_i, where

$$\delta_i = \begin{cases} 0 & \text{if } t_i \text{ is a censored time} \\ 1 & \text{if } t_i \text{ is an event time} \end{cases}$$

is

$$\hat{M}_i = \delta_i - \hat{\Lambda}_0(t_i)\, e^{\hat\beta' Z_i}$$

The deviance residuals d_i are a transform of the martingale residuals:

$$d_i = \operatorname{sign}(\hat{M}_i)\, \sqrt{ -2\left[ \hat{M}_i + N_i(\infty) \log\left( \frac{N_i(\infty) - \hat{M}_i}{N_i(\infty)} \right) \right] }$$

The square root shrinks large negative martingale residuals, while the logarithmic transformation expands martingale residuals that are close to unity. As such, the deviance residuals are more symmetrically distributed about zero than the martingale residuals. For the Cox model, the deviance residual reduces to the form

$$d_i = \operatorname{sign}(\hat{M}_i)\, \sqrt{ -2\left[ \hat{M}_i + \delta_i \log\left( \delta_i - \hat{M}_i \right) \right] }$$

When the counting process MODEL specification is used, values of the RESDEV= variable are set to missing because the deviance residuals can be calculated on a per subject basis only.

The Schoenfeld (1982) residual vector is calculated on a per-event-time basis as

$$U_i(t_i) = Z_i(t_i) - \bar{Z}(\hat\beta, t_i)$$

where t_i is an event time, and Z̄(β̂, t) is a weighted average of the covariates over the risk set at time t, given by

$$\bar{Z}(\hat\beta, t) = \frac{\sum_{l \in R_t} Z_l(t)\, e^{\hat\beta' Z_l(t)}}{\sum_{l \in R_t} e^{\hat\beta' Z_l(t)}}$$

Under the proportional hazards assumption, the Schoenfeld residuals have the sample path of a random walk; therefore, they are useful in assessing time trend or lack of proportionality. Harrell (1986) proposed a z -transform of the Pearson correlation between these residuals and the rank order of the failure time as a test statistic for nonproportional hazards. Therneau, Grambsch, and Fleming (1990) considered a Kolmogorov-type test using the cumulative sum of the residuals.
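You can write the Schoenfeld residuals (and the weighted version described in the next section) to a data set with the OUTPUT statement; a minimal sketch, assuming hypothetical variables Time, Status, Z1, and Z2:

  proc phreg;
     model Time*Status(0)=Z1 Z2;
     output out=Sch ressch=schZ1 schZ2 wtressch=wschZ1 wschZ2;
  run;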

The score process for the ith subject at time t is

$$L_i(\hat\beta, t) = \int_0^t \left[ Z_i(s) - \bar{Z}(\hat\beta, s) \right] d\hat{M}_i(s)$$

The vector L_i ≡ L_i(β̂, ∞) is the score residual for the ith subject. When the counting process MODEL specification is used, the RESSCO= variables contain the components of (L_i(t_2) − L_i(t_1)) instead of the score process at t_2. The score residual for a subject can be obtained by summing up these component residuals within the subject.

The score residuals are a decomposition of the first partial derivative of the log likelihood. They are useful in assessing the influence of each subject on individual parameter estimates. They also play an important role in the computation of the robust sandwich variance estimators of Lin and Wei (1989) and Wei, Lin, and Weissfeld (1989).

Diagnostics Based on Weighted Residuals

The vector of weighted Schoenfeld residuals, r_i, is computed as

$$r_i = n_e\, \hat{V}(\hat\beta)\, U_i(t_i)$$

where n_e is the total number of events, V̂(β̂) is the estimated covariance matrix of β̂, and U_i(t_i) is the vector of Schoenfeld residuals at the event time t_i. The components of r_i are output to the WTRESSCH= variables.

The weighted Schoenfeld residuals are useful in assessing the proportional hazards assumption. The idea is that most of the common alternatives to proportional hazards can be cast in terms of a time-varying coefficient model

$$\lambda(t, Z) = \lambda(t)\, e^{\beta(t)' Z}$$

where λ(t, Z) and λ(t) are hazard rates. Let β̂_j and r_ij be the jth components of β̂ and r_i, respectively. Grambsch and Therneau (1994) suggest using a smoothed plot of (β̂_j + r_ij) versus t_i to discover the functional form of the time-varying coefficient β_j(t). A zero slope indicates that the coefficient does not vary with time.

The weighted score residuals are used more often than their unscaled counterparts in assessing local influence. Let β̂_(i) be the estimate of β when the ith subject is left out, and let Δ_i = β̂ − β̂_(i). The jth component of Δ_i can be used to assess any untoward effect of the ith subject on β̂_j. The exact computation of Δ_i involves refitting the model each time a subject is omitted. Cain and Lange (1984) derived the following approximation of Δ_i as weighted score residuals:

$$\hat{\Delta}_i = \hat{V}(\hat\beta)\, L_i$$

Here, V̂(β̂) is the estimated covariance matrix of β̂, and L_i is the vector of the score residuals for the ith subject. Values of Δ̂_i are output to the DFBETA= variables. Again, when the counting process MODEL specification is used, the DFBETA= variables contain the component corresponding to (L_i(t_2) − L_i(t_1)). The vector Δ̂_i for a subject can be obtained by summing these components within the subject.

Note that these DFBETA statistics are a transform of the score residuals. In computing the robust sandwich variance estimators of Lin and Wei (1989) and Wei, Lin, and Weissfeld (1989), it is more convenient to use the DFBETA statistics than the score residuals (see Example 54.8 on page 3304).

Influence of Observations on Overall Fit of the Model

The LD statistic approximates the likelihood displacement, which is the amount by which minus twice the log likelihood (−2 log L(β̂)), under a fitted model, changes when each subject in turn is left out. When the ith subject is omitted, the likelihood displacement is

$$LD_i = 2\left[ l(\hat\beta) - l(\hat\beta_{(i)}) \right]$$

where β̂_(i) is the vector of parameter estimates obtained by fitting the model without the ith subject. Instead of refitting the model without the ith subject, Pettitt and Bin Daud (1989) propose that the likelihood displacement for the ith subject be approximated by

$$LD_i \approx L_i'\, \hat{V}(\hat\beta)\, L_i$$

This approximation is output to the LD= variable.

The LMAX statistic is another global influence statistic. This statistic is based on the symmetric matrix

$$B = L\, \hat{V}(\hat\beta)\, L'$$

where L is the matrix whose rows are the score residual vectors L_i'. The elements of the eigenvector associated with the largest eigenvalue of the matrix B, standardized to unit length, give a measure of the sensitivity of the fit of the model to each observation in the data. The influence of the ith subject on the global fit of the model is proportional to the magnitude of v_i, where v_i is the ith element of the vector v that satisfies

$$Bv = \lambda_{\max} v, \qquad v'v = 1$$

with λ_max being the largest eigenvalue of B. The sign of v_i is irrelevant, and its absolute value is output to the LMAX= variable.

When the counting process MODEL specification is used, the LD= and LMAX= variables are set to missing, because these two global influence statistics can be calculated on a per subject basis only.

Survival Distribution Estimates for the Cox Model

Two estimators of the survivor function are available: one is the product-limit estimate and the other is based on the empirical cumulative hazard function.

Product-Limit Estimates

Let C_i denote the set of individuals censored in the half-open interval [t_i, t_{i+1}), where t_0 = 0 and t_{k+1} = ∞. Let γ_l denote the censoring times in [t_i, t_{i+1}), where l ranges over C_i. The likelihood function for all individuals is given by

$$L = \prod_{i=0}^{k} \left\{ \prod_{j \in D_i} \left[ S(t_i)^{e^{\beta' Z_j}} - S(t_i + 0)^{e^{\beta' Z_j}} \right] \prod_{l \in C_i} S(\gamma_l + 0)^{e^{\beta' Z_l}} \right\}$$

where D_0 is empty. The likelihood L is maximized by taking S(t) = S(t_i + 0) for t_i < t < t_{i+1} and allowing the probability mass to fall only on the observed event times t_1, ..., t_k. By considering a discrete model with hazard contribution 1 − α_i at t_i, you take

$$S(t) = \prod_{i:\, t_i < t} \alpha_i$$

Substitution into the likelihood function produces

$$L(\alpha) = \prod_{i=1}^{k} \left[ \prod_{j \in D_i} \left( 1 - \alpha_i^{e^{\hat\beta' Z_j}} \right) \prod_{l \in R_i - D_i} \alpha_i^{e^{\hat\beta' Z_l}} \right]$$

If you replace β with β̂ estimated from the partial likelihood function and then maximize with respect to α_1, ..., α_k, the maximum likelihood estimate α̂_i of α_i becomes a solution of

$$\sum_{j \in D_i} \frac{e^{\hat\beta' Z_j}}{1 - \alpha_i^{e^{\hat\beta' Z_j}}} = \sum_{l \in R_i} e^{\hat\beta' Z_l}, \qquad i = 1, \ldots, k$$

When only a single failure occurs at t_i, α̂_i can be found explicitly. Otherwise, an iterative solution is obtained by the Newton method.

The estimated baseline cumulative hazard function is

$$\hat{\Lambda}_0(t) = -\log \hat{S}_0(t)$$

where Ŝ_0(t) is the estimated baseline survivor function given by

$$\hat{S}_0(t) = \prod_{i:\, t_i < t} \hat\alpha_i$$

For details, refer to Kalbfleisch and Prentice (1980). For a given realization ξ of the explanatory variables, the product-limit estimate of the survival function at Z = ξ is

$$\hat{S}_{PL}(t, \xi) = \left[ \hat{S}_0(t) \right]^{e^{\hat\beta' \xi}}$$

Empirical Cumulative Hazards Function Estimates

Let ξ be a given realization of the explanatory variables. The empirical cumulative hazard function estimate at Z = ξ is

$$\hat{\Lambda}(t, \xi) = e^{\hat\beta' \xi} \sum_{i:\, t_i \le t} \frac{d_i}{\sum_{l \in R_i} e^{\hat\beta' Z_l(t_i)}}$$

The variance estimator of Λ̂(t, ξ) is given by the following (Tsiatis 1981):

$$\widehat{\operatorname{Var}}\left[ \hat{\Lambda}(t, \xi) \right] = e^{2\hat\beta' \xi} \sum_{i:\, t_i \le t} \frac{d_i}{\left[ \sum_{l \in R_i} e^{\hat\beta' Z_l(t_i)} \right]^2} + H(t, \xi)'\, \hat{V}(\hat\beta)\, H(t, \xi)$$

where V̂(β̂) is the estimated covariance matrix of β̂ and

$$H(t, \xi) = e^{\hat\beta' \xi} \sum_{i:\, t_i \le t} \frac{d_i \left[ \xi - \bar{Z}(\hat\beta, t_i) \right]}{\sum_{l \in R_i} e^{\hat\beta' Z_l(t_i)}}$$

The empirical cumulative hazard function (CH) estimate of the survivor function for Z = ξ is

$$\hat{S}_{CH}(t, \xi) = \exp\left[ -\hat{\Lambda}(t, \xi) \right]$$

Confidence Intervals for the Survivor Function

Let Ŝ_PL(t, ξ) and Ŝ_CH(t, ξ) correspond to the product-limit (PL) and empirical cumulative hazard function (CH) estimates of the survivor function for Z = ξ, respectively. Both the standard error of log(Ŝ_PL(t, ξ)) and the standard error of log(Ŝ_CH(t, ξ)) are approximated by τ(t, ξ), the square root of the variance estimate of Λ̂(t, ξ); refer to Kalbfleisch and Prentice (1980, p. 116). By the delta method, the standard errors of Ŝ_PL(t, ξ) and Ŝ_CH(t, ξ) are given by

$$\hat{S}_{PL}(t, \xi)\, \tau(t, \xi) \qquad \text{and} \qquad \hat{S}_{CH}(t, \xi)\, \tau(t, \xi)$$

respectively. The standard errors of log[−log(Ŝ_PL(t, ξ))] and log[−log(Ŝ_CH(t, ξ))] are given by

$$\frac{\tau(t, \xi)}{\left| \log \hat{S}_{PL}(t, \xi) \right|} \qquad \text{and} \qquad \frac{\tau(t, \xi)}{\left| \log \hat{S}_{CH}(t, \xi) \right|}$$

respectively.

Let z_{α/2} be the upper 100α/2 percentile point of the standard normal distribution. A 100(1 − α)% confidence interval for the survivor function S(t, ξ) is given in the following table.

CLTYPE    Method    Confidence Limits
LOG       PL        $\hat{S}_{PL}(t,\xi)\,\exp\left[\pm z_{\alpha/2}\,\tau(t,\xi)\right]$
LOG       CH        $\hat{S}_{CH}(t,\xi)\,\exp\left[\pm z_{\alpha/2}\,\tau(t,\xi)\right]$
LOGLOG    PL        $\hat{S}_{PL}(t,\xi)^{\exp\left[\pm z_{\alpha/2}\,\tau(t,\xi)/\log \hat{S}_{PL}(t,\xi)\right]}$
LOGLOG    CH        $\hat{S}_{CH}(t,\xi)^{\exp\left[\pm z_{\alpha/2}\,\tau(t,\xi)/\log \hat{S}_{CH}(t,\xi)\right]}$
NORMAL    PL        $\hat{S}_{PL}(t,\xi)\left[1 \pm z_{\alpha/2}\,\tau(t,\xi)\right]$
NORMAL    CH        $\hat{S}_{CH}(t,\xi)\left[1 \pm z_{\alpha/2}\,\tau(t,\xi)\right]$
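These estimates and confidence limits are requested through the BASELINE statement. A minimal sketch, assuming hypothetical variables Time, Status, Z1, and Z2 and a covariates data set named Pattern:

  proc phreg;
     model Time*Status(0)=Z1 Z2;
     baseline covariates=Pattern out=Surv survival=S
              lower=LowerS upper=UpperS / cltype=loglog;
  run;

The CLTYPE= option selects the transform (LOG, LOGLOG, or NORMAL) used for the confidence limits in the preceding table.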

Left Truncation of Failure Times

Left truncation arises when individuals only come under observation some known time after the natural time origin of the phenomenon under study. The risk set just prior to an event time does not include individuals whose left truncation times exceed the given event time. Thus, any contribution to the likelihood must be conditional on the truncation limit having been exceeded.

Although left truncation can be accommodated in PROC PHREG through the counting process style of input, such a specification does not allow survival estimates to be output. Using the ENTRY= option in PROC PHREG for left truncation, on the other hand, does not suppress the computation of the survival estimates.

Consider the following specifications of PROC PHREG:

  proc phreg data=one;
     model t2*dead(0)=x1-x10 / entry=t1;
     baseline out=out1 survival=s;
     title 'The ENTRY= Option Is Specified';
  run;

  proc phreg data=one;
     model (t1,t2)*dead(0)=x1-x10;
     baseline out=out2 survival=s;
     title 'Counting Process Style of Input';
  run;

Both specifications yield the same model estimates; however, the baseline data set out2 is empty, since survivor function estimates are not computed when you use the counting process style of input.

Variable Selection Methods

Five variable selection methods are available. The simplest method (and the default) is SELECTION=NONE, for which PROC PHREG fits the complete model as specified in the MODEL statement. The other four methods are FORWARD for forward selection, BACKWARD for backward elimination, STEPWISE for stepwise selection, and SCORE for best subsets selection. These methods are specified with the SELECTION= option in the MODEL statement.

When SELECTION=FORWARD, PROC PHREG first estimates parameters for the variables forced into the model. These variables are the first n explanatory variables in the MODEL statement, where n is the number specified by the START= or INCLUDE= option in the MODEL statement (n is zero by default). Next, the procedure computes the adjusted chi-square statistic for each variable not in the model and examines the largest of these statistics. If it is significant at the SLENTRY= level, the corresponding variable is added to the model. Once a variable is entered in the model, it is never removed from the model. The process is repeated until none of the remaining variables meet the specified level for entry or until the STOP= value is reached.

When SELECTION=BACKWARD, parameters for the complete model as specified in the MODEL statement are estimated unless the START= option is specified. In that case, only the parameters for the first n explanatory variables in the MODEL statement are estimated, where n is the number specified by the START= option. Results of the Wald test for individual parameters are examined. The least significant variable that does not meet the SLSTAY= level for staying in the model is removed. Once a variable is removed from the model, it remains excluded. The process is repeated until no other variable in the model meets the specified level for removal or until the STOP= value is reached.

The SELECTION=STEPWISE option is similar to the SELECTION=FORWARD option except that variables already in the model do not necessarily remain. Variables are entered into and removed from the model in such a way that each forward selection step can be followed by one or more backward elimination steps. The stepwise selection process terminates if no further variable can be added to the model or if the variable just entered into the model is the only variable removed in the subsequent backward elimination.

For SELECTION=SCORE, PROC PHREG uses the branch-and-bound algorithm of Furnival and Wilson (1974) to find a specified number of models with the highest likelihood score (chi-square) statistic for all possible model sizes, from 1, 2, 3 variables, and so on, up to the single model containing all of the explanatory variables. The number of models displayed for each model size is controlled by the BEST= option. You can use the START= option to impose a minimum model size, and you can use the STOP= option to impose a maximum model size. For instance, with BEST=3, START=2, and STOP=5, the SCORE selection method displays the best three models (that is, the three models with the highest score chi-squares) containing 2, 3, 4, and 5 variables.
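As a sketch of that last instance, assuming hypothetical variables Time, Status, and Z1-Z8:

  proc phreg;
     model Time*Status(0)=Z1-Z8 / selection=score best=3 start=2 stop=5;
  run;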

The SEQUENTIAL and STOPRES options can alter the default criteria for adding variables to or removing variables from the model when they are used with the FORWARD, BACKWARD, or STEPWISE selection methods.

Assessment of the Proportional Hazards Model (Experimental)

The proportional hazards model specifies that the hazard function for the failure time T associated with a p × 1 column covariate vector Z takes the form

$$\lambda(t;\, Z) = \lambda_0(t)\, e^{\beta' Z}$$

where λ_0(·) is an unspecified baseline hazard function and β is a p × 1 column vector of regression parameters. Lin et al. (1993) present graphical and numerical methods for model assessment based on the cumulative sums of martingale residuals and their transforms over certain coordinates (e.g., covariate values or follow-up times). The distributions of these stochastic processes under the assumed model can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by simulation. Each observed residual pattern can then be compared, both graphically and numerically, with a number of realizations from the null distribution. Such comparisons enable you to assess objectively whether the observed residual pattern reflects anything beyond random fluctuation. These procedures are useful in determining appropriate functional forms of covariates and assessing the proportional hazards assumption. You use the ASSESS statement to carry out these model-checking procedures.

For a sample of n subjects, let (X_i, Δ_i, Z_i) be the data of the ith subject; that is, X_i represents the observed failure time, Δ_i has a value of 1 if X_i is an uncensored time and 0 otherwise, and Z_i = (Z_1i, ..., Z_pi)' is a p-vector of covariates. Let N_i(t) = Δ_i I(X_i ≤ t) and Y_i(t) = I(X_i ≥ t). Let

$$S^{(0)}(\beta, t) = \sum_{i=1}^{n} Y_i(t)\, e^{\beta' Z_i}, \qquad S^{(1)}(\beta, t) = \sum_{i=1}^{n} Y_i(t)\, e^{\beta' Z_i} Z_i, \qquad \bar{Z}(\beta, t) = \frac{S^{(1)}(\beta, t)}{S^{(0)}(\beta, t)}$$

Let β̂ be the maximum partial likelihood estimate of β, and let I(β̂) be the observed information matrix.

The martingale residuals are defined as

$$\hat{M}_i(t) = N_i(t) - \int_0^t Y_i(s)\, e^{\hat\beta' Z_i}\, d\hat{\Lambda}_0(s)$$

where

$$\hat{\Lambda}_0(t) = \sum_{i=1}^{n} \int_0^t \frac{dN_i(s)}{S^{(0)}(\hat\beta, s)}$$

The empirical score process U(β̂, t) = (U_1(β̂, t), ..., U_p(β̂, t))' is a transform of the martingale residuals:

$$U(\hat\beta, t) = \sum_{i=1}^{n} Z_i\, \hat{M}_i(t)$$

Checking the Functional Form of a Covariate

To check the functional form of the jth covariate, consider the partial-sum process of the martingale residuals M̂_i ≡ M̂_i(∞):

$$W_j(z) = \sum_{i=1}^{n} I(Z_{ji} \le z)\, \hat{M}_i$$

Under the null hypothesis that the model holds, W_j(z) can be approximated by a zero-mean Gaussian process Ŵ_j(z), realizations of which are generated by replacing the martingale residuals in W_j(z) with G_i M̂_i together with a correction term that accounts for the estimation of β, where (G_1, ..., G_n) are independent standard normal variables that are independent of (X_i, Δ_i, Z_i), i = 1, ..., n.

You can assess the functional form of the jth covariate by plotting a small number of realizations (the default is 20) of Ŵ_j(z) on the same graph as the observed W_j(z) and visually comparing them to see how typical the observed pattern of W_j(z) is of the null distribution samples. You can supplement the graphical inspection method with a Kolmogorov-type supremum test. Let s_j be the observed value of S_j = sup_z |W_j(z)|, and let Ŝ_j = sup_z |Ŵ_j(z)|. The p-value Pr(S_j ≥ s_j) is approximated by Pr(Ŝ_j ≥ s_j), which in turn is approximated by generating a large number of realizations (1000 is the default) of Ŵ_j(·).

Checking the Proportional Hazards Assumption

Consider the standardized empirical score process for the jth component of Z, obtained by standardizing U_j(β̂, t) by the corresponding diagonal element of the observed information matrix:

$$U_j^*(t) = \frac{U_j(\hat\beta, t)}{\left[ I(\hat\beta) \right]_{jj}^{1/2}}$$

Under the null hypothesis that the model holds, U*_j(t) can be approximated by a zero-mean Gaussian process Û*_j(t), constructed with the same simulation technique from (G_1, ..., G_n), independent standard normal variables that are independent of (X_i, Δ_i, Z_i), i = 1, ..., n.

You can assess the proportional hazards assumption for the jth covariate by plotting a few realizations of Û*_j(t) on the same graph as the observed U*_j(t) and visually comparing them to see how typical the observed pattern of U*_j(t) is of the null distribution samples. Again you can supplement the graphical inspection method with a Kolmogorov-type supremum test. Let s_j be the observed value of S_j = sup_t |U*_j(t)|, and let Ŝ_j = sup_t |Û*_j(t)|. The p-value Pr(S_j ≥ s_j) is approximated by Pr(Ŝ_j ≥ s_j), which in turn is approximated by generating a large number of realizations (1000 is the default) of Û*_j(·).
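A minimal sketch of these model-checking procedures, assuming hypothetical variables Time, Status, Z1, and Z2 (the SEED= value simply makes the simulation reproducible):

  ods html;
  ods graphics on;
  proc phreg;
     model Time*Status(0)=Z1 Z2;
     assess var=(Z1) ph / resample seed=7027;
  run;
  ods graphics off;
  ods html close;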

Computational Resources

Let n be the number of observations in a BY group. Let p be the number of explanatory variables. The minimum working space (in bytes) needed to process the BY group grows with both n and p.

Extra memory is needed for certain TIES= options. Let k be the maximum multiplicity of tied event times. The TIES=DISCRETE and TIES=EXACT options require additional memory (in bytes) that grows with both k and p.

If sufficient space is available, the input data are also kept in memory. Otherwise, the input data are reread from the utility file for each evaluation of the likelihood function and its derivatives, with the resulting execution time substantially increased.

Displayed Output

If you use the NOPRINT option in the PROC PHREG statement, the procedure does not display any output. Otherwise, the displayed output of the PHREG procedure includes the following:

  • the Model Information table, which contains:

    • the two-level name of the input data set

    • the name and label of the failure time variable

    • if you specify the censoring variable,

      • the name and label of the censoring variable

      • the values that the censoring variable assumes to indicate censored times

    • if you use the OFFSET= option in the MODEL statement, the name and label of the offset variable

    • if you specify the FREQ statement, the name and label of the frequency variable

    • if you specify the WEIGHT statement, the name and label of the weight variable

    • the method of handling ties in the failure time

  • the Summary of the Number of Event and Censored Values table, which displays, for each stratum, the breakdown of the number of events and censored values. This table is not produced if the NOSUMMARY option is specified.

  • if you specify the SIMPLE option in the PROC PHREG statement, the Simple Statistics for Explanatory Variables table, which displays, for each stratum, the mean, standard deviation, and minimum and maximum for each explanatory variable in the MODEL statement

  • if you specify the ITPRINT option in the MODEL statement, the Iteration History table, which displays the iteration number, step size, log likelihood, and parameter estimates at each iteration. The last evaluation of the gradient vector is also displayed.

  • the Model Fit Statistics table, which gives the values of −2 log likelihood for fitting a model with no explanatory variables and for fitting a model with all the explanatory variables. The AIC and SBC are also given in this table.

  • the Testing Global Null Hypothesis: BETA=0 table, which displays results of the likelihood ratio test, the score test, and the Wald test

  • the Analysis of Maximum Likelihood Estimates table, which contains:

    • the maximum likelihood estimate of the parameter

    • the estimated standard error of the parameter estimate, computed as the square root of the corresponding diagonal element of the estimated covariance matrix

    • if you specify the COVS option in the PROC statement, the ratio of the robust standard error estimate to the model-based standard error estimate

    • the Wald Chi-Square statistic, computed as the square of the parameter estimate divided by its standard error estimate

    • the degrees of freedom of the Wald chi-square statistic. It has a value of 1 unless the corresponding parameter is redundant or infinite, in which case the value is 0.

    • the p -value of the Wald chi-square statistic with respect to a chi-square distribution with one degree of freedom

    • the hazards ratio estimate computed by exponentiating the parameter estimate

    • if you specify the RISKLIMITS option in the MODEL statement, the confidence limits for the hazards ratio

  • if you specify SELECTION=SCORE in the MODEL statement, the Regression Models Selected by Score Criterion table, which gives the number of explanatory variables in each model, the score chi-square statistic, and the names of the variables included in the model

  • if you use the FORWARD or STEPWISE selection method and you specify the DETAILS option in the MODEL statement, the Analysis of Variables Not in the Model table, which gives the Score chi-square statistic for testing the significance of each variable not in the model (after adjusting for the variables already in the model), and the p -value of the chi-square statistic with respect to a chi-square distribution with one degree of freedom. This table is produced before a variable is selected for entry in a forward selection step.

  • if you specify the FORWARD, BACKWARD, or STEPWISE selection method, a table summarizing the model-building process, which gives the step number, the explanatory variable entered or removed at each step, the chi-square statistic, and the corresponding p -value on which the entry or removal is based

  • if you use the COVB option in the MODEL statement, the estimated covariance matrix of the parameter estimates

  • if you use the CORRB option in the MODEL statement, the estimated correlation matrix of the parameter estimates

  • if you specify a TEST statement,

    • the Linear Coefficients table, which gives the coefficients and constants of the linear hypothesis (if the E option is specified)

    • the printing of the intermediate calculations of the Wald test (if the option PRINT is specified)

    • the Test Results table, which gives the Wald chi-square statistic, the degrees of freedom, and the p -value

    • the Average Effect table, which gives the weighted average of the parameter estimates for the variables in the TEST statement, the estimated standard error, the z-score, and the p -value (if the AVERAGE option is specified)

ODS Table Names

PROC PHREG assigns a name to each table it creates. You can use these names to reference the table when using the Output Delivery System (ODS) to select tables and create output data sets. These names are listed in the following table. For more information on ODS, see Chapter 14, Using the Output Delivery System.

Table 54.1: ODS Tables Produced in PROC PHREG

ODS Table Name                Description                                            Statement    Option
BestSubsets                   Best subset selection                                  MODEL        SELECTION=SCORE
CensoredSummary               Summary of event and censored observations             MODEL        default
ConvergenceStatus             Convergence status                                     MODEL        default
CorrB                         Estimated correlation matrix of parameter estimators   MODEL        CORRB
CovB                          Estimated covariance matrix of parameter estimators    MODEL        COVB
FitStatistics                 Model fit statistics                                   MODEL        default
FunctionalFormSupTest         Supremum test for functional form (experimental)       ASSESS       VAR=
GlobalScore                   Global chi-square test                                 MODEL        NOFIT
GlobalTests                   Tests of the global null hypothesis                    MODEL        default
IterHistory                   Iteration history                                      MODEL        ITPRINT
LastGradient                  Last evaluation of gradient                            MODEL        ITPRINT
ModelBuildingSummary          Summary of model building                              MODEL        SELECTION=B/F/S
ModelInfo                     Model information                                      PROC         default
NObs                          Number of observations                                              default
ParameterEstimates            Maximum likelihood estimates of model parameters       MODEL        default
ProportionalHazardsSupTest    Supremum test for proportional hazards (experimental)  ASSESS       PH
ResidualChiSq                 Residual chi-square                                    MODEL        SELECTION=F/B
SimpleStatistics              Summary statistics for explanatory variables           PROC         SIMPLE
TestAverage                   Average effect for test                                TEST         AVERAGE
TestCoeff                     Coefficients for linear hypotheses                     TEST         E
TestPrint1                    L[cov(b)]L' and Lb-c                                   TEST         PRINT
TestPrint2                    Ginv(L[cov(b)]L') and Ginv(L[cov(b)]L')(Lb-c)          TEST         PRINT
TestStmts                     Linear hypotheses testing results                      TEST
VariablesNotInModel           Analysis of variables not in the model                 MODEL        SELECTION=F/S

ODS Graphics (Experimental)

This section describes the use of ODS for creating statistical graphs for model assessment with the PHREG procedure. These graphics are experimental in this release, meaning that both the graphical results and the syntax for specifying them are subject to change in a future release.

To request these graphs you must specify the ODS GRAPHICS statement in addition to the ASSESS statement in PROC PHREG. For more information on the ODS GRAPHICS statement, see Chapter 15, Statistical Graphics Using ODS.

ODS Graph Names

PROC PHREG assigns a name to each graph it creates using the ODS. You can use these names to reference the graphs when using ODS. The names are listed in Table 54.2.

Table 54.2: ODS Graphics Produced by PROC PHREG

ODS Graph Name         Description                                      Statement    Option
CumulativeResiduals    Cumulative martingale residual plot              ASSESS       VAR=
CumResidPanel          Panel plot of cumulative martingale residuals    ASSESS       VAR= and CRPANEL
ScoreProcess           Standardized score process plot                  ASSESS       PH



