Syntax


The following statements are available in PROC PHREG.

  • PROC PHREG < options > ;

    • ASSESS keyword </ options > ;

    • MODEL response < * censor (list) > = variables < / options > ;

    • < programming statements >

    • STRATA variable < (list) >< variable < (list) >>< /option > ;

    • < label: > TEST equation1 < , ... , equationk >< /option > ;

    • FREQ variable ;

    • WEIGHT variable < /option > ;

    • ID variables ;

    • OUTPUT < OUT= SAS-data-set >

      • < keyword= name ... keyword=name >< /options > ;

    • BASELINE < OUT = SAS-data-set >

      • < COVARIATES= SAS-data-set >

      • < keyword=name ... keyword=name >< /options > ;

    • BY variables ;

The PROC PHREG statement invokes the procedure. All other statements except the MODEL statement are optional. Items within < > are optional, and there is no required order for the statements following the PROC PHREG statement. The MODEL statement specifies the variables that define the survival time, the censoring variable, and the explanatory variables. The STRATA statement specifies a variable or set of variables defining the strata for the analysis. The TEST statement contains equations that define linear hypotheses concerning the model parameters. The ID statement specifies the variables with values that are used to label the observations. The OUTPUT and BASELINE statements create data sets containing the survival estimates. DATA step programming statements can be included to create time-dependent explanatory variables.
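As a minimal sketch of how these statements fit together, the following steps fit a stratified Cox model and then a model with a time-dependent covariate defined by a programming statement. The data set Work.Trial and the variables Time, Status, Age, Treatment, and Clinic are hypothetical, and Status=0 is assumed to mark censored observations with strictly positive times.

```sas
/* Hypothetical data: Work.Trial with Time, Status (0 = censored), */
/* Age, Treatment, and Clinic.                                      */
proc phreg data=Work.Trial;
   model Time*Status(0) = Age Treatment;   /* the only required statement   */
   strata Clinic;                          /* one baseline hazard per clinic */
   TrtEffect: test Treatment = 0;          /* labeled linear hypothesis      */
run;

/* Time-dependent covariate created by a programming statement. */
proc phreg data=Work.Trial;
   model Time*Status(0) = Age Treatment TrtLogT;
   TrtLogT = Treatment * log(Time);        /* re-evaluated at each event time */
run;
```

Because the statements after PROC PHREG have no required order, the STRATA and TEST statements could equally precede the MODEL statement.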

PROC PHREG Statement

  • PROC PHREG < options > ;

You can specify the following options in the PROC PHREG statement.

COVOUT

  • adds the estimated covariance matrix of the parameter estimates to the OUTEST= data set. The COVOUT option has no effect unless the OUTEST= option is specified.

COVM

  • requests the model-based covariance matrix (which is the inverse of the observed information matrix) be presented and used in the analysis if the COVS option is also specified. The COVM option has no effect if the COVS option is not specified.

COVSANDWICH < (AGGREGATE) >

COVS < (AGGREGATE) >

  • requests the robust sandwich estimate of Lin and Wei (1989) for the covariance matrix. When this option is specified, this robust sandwich estimate is used in the Wald tests for testing the global null hypothesis, null hypotheses of individual parameters, and the hypotheses in the TEST statements. In addition, a modified score test is computed in the testing of the global null hypothesis, and the parameter estimates table has an additional StdErrRatio column, which contains the ratios of the robust estimate of the standard error relative to the corresponding model-based estimate. Optionally, you can specify the keyword AGGREGATE enclosed in parentheses after the COVSANDWICH (or COVS) option, which requests a summing up of the score residuals for each distinct ID pattern in the computation of the robust sandwich covariance estimate. The AGGREGATE option has no effect if the ID statement is not specified.

DATA= SAS-data-set

  • names the SAS data set containing the data to be analyzed. If you omit the DATA= option, the procedure uses the most recently created SAS data set.

MULTIPASS

  • requests that, for each Newton-Raphson iteration, PROC PHREG recompiles the risk sets corresponding to the event times for the (start,stop) style of response and re-computes the values of the time-dependent variables defined by the programming statements for each observation in the risk sets. If the MULTIPASS option is not specified, PROC PHREG computes all risk sets and all the variable values and saves them into a utility file. The MULTIPASS option decreases required disk space at the expense of increased execution time; however, for very large data, it may actually save time since it is time consuming to write and read large utility files. This option has an effect only when the (start, stop) style of response is used or when there are time-dependent explanatory variables.

NOPRINT

  • suppresses all displayed output. Note that this option temporarily disables the Output Delivery System (ODS); see Chapter 14, Using the Output Delivery System, for more information.

NOSUMMARY

  • suppresses the display of the event and censored observation frequencies.

OUTEST= SAS-data-set

  • creates an output SAS data set that contains estimates of the regression coefficients. If you use the COVOUT option, the data set also contains the estimated covariance matrix of the parameter estimates. The data set includes

    • any BY variables specified

    • _TIES_, a character variable of length 8 with four possible values: BRESLOW, DISCRETE, EFRON, and EXACT. These are the four values of the TIES= option in the MODEL statement.

    • _TYPE_, a character variable of length 8 with two possible values: PARMS for parameter estimates or COV for covariance estimates. If both the COVM and COVS options are specified in the PROC PHREG statement along with the COVOUT option, _TYPE_=COVM for the model-based covariance estimates and _TYPE_=COVS for the robust sandwich covariance estimates.

    • _STATUS_, a character variable indicating whether the estimates have converged

    • _NAME_, a character variable containing the name of the TIME variable for the row of parameter estimates and the name of each explanatory variable to label the rows of covariance estimates

    • one variable for each explanatory variable in the MODEL statement. In a forward, backward, or stepwise regression analysis, if an explanatory variable is not included in the final model, the corresponding parameter estimate and covariances are set to missing.

    • _LNLIKE_, a numeric variable containing the last computed value of the log likelihood

SIMPLE

  • displays simple descriptive statistics (mean, standard deviation, minimum, and maximum) for each explanatory variable in the MODEL statement.
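The following sketch combines several of these options for clustered data. The data set Work.Recur and the variables TStart, TStop, Status, Treatment, Age, and PatID are hypothetical; the example assumes several rows per patient and that Status=0 marks censoring.

```sas
/* Robust sandwich covariance, aggregated over the hypothetical ID variable PatID. */
/* COVM keeps the model-based covariance available alongside the robust one.       */
proc phreg data=Work.Recur covs(aggregate) covm
           outest=Work.Est covout nosummary simple;
   model (TStart, TStop)*Status(0) = Treatment Age;
   id PatID;          /* AGGREGATE has no effect without an ID statement */
run;

/* Inspect the OUTEST= data set: one PARMS row plus covariance rows. */
proc print data=Work.Est;
run;
```

With both COVM and COVS in effect, the covariance rows in Work.Est should be distinguished by _TYPE_=COVM and _TYPE_=COVS, as described for the OUTEST= option.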

ASSESS Statement (Experimental)

  • ASSESS < VAR= (list) >< PH >< / options > ;

The ASSESS statement performs the graphical and numerical methods of Lin, Wei, and Ying (1993) for checking the adequacy of the Cox regression model. The methods are derived from cumulative sums of martingale residuals over follow-up times or covariate values. You can assess the functional form of a covariate or you can check the proportional hazards assumption for each covariate in the Cox model. PROC PHREG uses the experimental ODS graphics for the graphical displays. For specific information about the experimental graphics that are available in PROC PHREG, see the section ODS Graphics on page 3271. You must specify at least one of the following to create an analysis.

VAR=(list)

  • specifies the list of explanatory variables whose functional forms are assessed. For each variable on the list, the observed cumulative martingale residuals are plotted against the values of the explanatory variable along with 20 (or n if NPATHS=n is specified) simulated residual patterns.

PROPORTIONALHAZARDS

PH

  • requests the checking of the proportional hazards assumption. For each explanatory variable in the model, the observed score process component is plotted against the follow-up time along with 20 (or n if NPATHS= n is specified) simulated patterns.

    The following options can be specified after a slash (/).

NPATHS= n

  • specifies the number of simulated residual patterns to be displayed in a cumulative martingale residual plot or a score process plot. The default is n=20.

CRPANEL

  • requests that a plot with four panels, each containing the observed cumulative martingale residuals and two simulated residual patterns, be created.

RESAMPLE < = n >

  • requests that the Kolmogorov-type supremum test be computed on 1,000 simulated patterns or on n simulated patterns if n is specified.

SEED= n

  • specifies an integer seed for the random number generator used in creating simulated realizations for plots and for the Kolmogorov-type supremum tests. Specifying a seed enables you to reproduce identical graphs and p-values for the model assessments from the same PHREG specification. If the SEED= option is not specified, or if you specify a nonpositive seed, a random seed is derived from the time of day.
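A possible ASSESS request is sketched below. The data set and variable names are hypothetical, and the ODS GRAPHICS statements assume an environment in which ODS Graphics is available; depending on the release, an ODS destination such as HTML may also need to be opened.

```sas
ods graphics on;

proc phreg data=Work.Trial;
   model Time*Status(0) = Age Treatment;
   /* Check the functional form of Age and the proportional hazards     */
   /* assumption, with 30 simulated paths per plot and supremum tests   */
   /* based on 500 simulated patterns. A fixed positive seed makes the  */
   /* plots and p-values reproducible.                                  */
   assess var=(Age) ph / npaths=30 resample=500 seed=20231;
run;

ods graphics off;
```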

BASELINE Statement

  • BASELINE < OUT= SAS-data-set >< COVARIATES= SAS-data-set >

    • < keyword=name ... keyword=name >< /options > ;

The BASELINE statement creates a new SAS data set that contains the survivor function estimates at the event times of each stratum for every pattern of explanatory variable values ($x$) given in the COVARIATES= data set. By default, the data set also contains the survivor function estimates corresponding to the means of the explanatory variables ($x = \bar{z}$) for each stratum. If you want only these estimates, you can omit the COVARIATES= option. No BASELINE data set is created if the model contains a time-dependent variable defined by means of programming statements.

The following list explains specifications in the BASELINE statement.

OUT= SAS-data-set

  • names the output BASELINE data set. If you omit the OUT= option, the data set is created and given a default name using the DATAn convention.

COVARIATES= SAS-data-set

  • names the SAS data set containing the set of explanatory variable values for which the survivor functions are estimated. There must be a corresponding variable in the COVARIATES= data set for each explanatory variable in the final model.

keyword=name

  • specifies the statistics included in the BASELINE data set and assigns names to the new variables that contain the statistics. Specify a keyword for each desired statistic (see the following list of keywords), an equal sign, and the variable to contain the statistic. The keywords and the corresponding statistics are

CMF

cumulative mean function estimate for recurrent events data. Specifying CMF=_ALL_ is equivalent to specifying CMF=CMF, STDCMF=StdErrCMF, LOWERCMF=LowerCMF, and UPPERCMF=UpperCMF. Nelson (2002) refers to the mean function estimate as MCF (mean cumulative function).

CUMHAZ

cumulative hazard function estimate for recurrent events data. Specifying CUMHAZ=_ALL_ is equivalent to specifying CUMHAZ=CumHaz, STDCUMHAZ=StdErrCumHaz, LOWERCUMHAZ=LowerCumHaz, and UPPERCUMHAZ=UpperCumHaz.

LOGLOGS

log of the negative log of SURVIVAL

LOGSURV

log of SURVIVAL

LOWER | L

lower pointwise confidence limit for the survivor function. The confidence level is determined by the ALPHA= option.

LOWERCMF

lower pointwise confidence limit for the cumulative mean function. The confidence level is determined by the ALPHA= option.

LOWERCUMHAZ

lower pointwise confidence limit for the cumulative hazard function. The confidence level is determined by the ALPHA= option.

STDERR

standard error of the survivor function estimator

STDCMF

standard error of the cumulative mean function estimator

STDCUMHAZ

standard error of the cumulative hazard function estimator

STDXBETA

standard error of the linear predictor estimator

SURVIVAL

survivor function estimate. Specifying SURVIVAL=_ALL_ is equivalent to specifying SURVIVAL=Survival, STDERR=StdErrSurvival, LOWER=LowerSurvival, and UPPER=UpperSurvival.

UPPER | U

upper pointwise confidence limit for the survivor function. The confidence level is determined by the ALPHA= option.

UPPERCMF

upper pointwise confidence limit for the cumulative mean function. The confidence level is determined by the ALPHA= option.

UPPERCUMHAZ

upper pointwise confidence limit for the cumulative hazard function. The confidence level is determined by the ALPHA= option.

XBETA

estimate of the linear predictor, $x'\hat{\beta}$

The following options can appear in the BASELINE statement after a slash (/).

ALPHA= value

  • specifies the significance level of the confidence interval for the survivor function. The value must be between 0 and 1. The default is 0.05, which results in a 95% confidence interval.

CLTYPE= method

  • specifies the method used to compute the confidence limits for S(t, z), the survivor function for a subject with a fixed covariate vector z at event time t. The CLTYPE= option can take the following values:

LOG

specifies that the confidence limits for log(S(t, z)) are to be computed using the normal theory approximation. The confidence limits for S(t, z) are obtained by back-transforming the confidence limits for log(S(t, z)). The default is CLTYPE=LOG.

LOGLOG

specifies that the confidence limits for log(-log(S(t, z))) are to be computed using the normal theory approximation. The confidence limits for S(t, z) are obtained by back-transforming the confidence limits for log(-log(S(t, z))).

NORMAL

specifies that the confidence limits for S(t, z) are to be computed directly using the normal theory approximation.

METHOD= method

  • specifies the method used to compute the survivor function estimates. The two available methods are

CH | EMP | NELSON

specifies that the Nelson (empirical) cumulative hazard function estimate of the survivor function is to be computed; that is, the survivor function is estimated by exponentiating the negative empirical cumulative hazard function.

PL

specifies that the product-limit estimate of the survivor function is to be computed. The default is METHOD=PL.

NOMEAN

  • excludes the survivor function estimates corresponding to the sample means of the explanatory variables.

    The METHOD= and CLTYPE= options apply only to the survival estimates. For recurrent events data, both the CMF= and CUMHAZ= statistics are Nelson estimators, but their standard errors are not the same. Confidence limits for the cumulative mean function and cumulative hazard function are based on the log transform.
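The sketch below requests survivor function estimates for two covariate patterns. Work.Profiles, Work.Trial, Work.Surv, and all variable names are hypothetical; the COVARIATES= data set is assumed to carry one variable for each explanatory variable in the model.

```sas
/* Hypothetical covariate patterns for which survival curves are wanted. */
data Work.Profiles;
   input Age Treatment;
   datalines;
50 0
50 1
;
run;

proc phreg data=Work.Trial;
   model Time*Status(0) = Age Treatment;
   /* 90% log-log confidence limits, empirical cumulative hazard estimate, */
   /* and no extra curve at the sample means of the covariates.            */
   baseline out=Work.Surv covariates=Work.Profiles
            survival=S stderr=SE_S lower=S_Lo upper=S_Hi
            / alpha=0.10 cltype=loglog method=ch nomean;
run;
```

Omitting NOMEAN would add the estimates evaluated at the covariate means to Work.Surv alongside the two requested patterns.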

BY Statement

  • BY variables ;

You can specify a BY statement with PROC PHREG to obtain separate analyses on observations in groups defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. The variables are one or more variables in the input data set.

If your input data set is not sorted in ascending order, use one of the following alternatives:

  • Sort the data using the SORT procedure with a similar BY statement.

  • Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the PHREG procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.

  • Create an index on the BY variables using the DATASETS procedure (in base SAS software).

For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts. For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide.
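For illustration, the first alternative (sorting before the analysis) might look like the following. The data set Work.Trial and the BY variable Region are hypothetical.

```sas
/* Sort by the hypothetical BY variable so that PROC PHREG can process */
/* the groups in order.                                                 */
proc sort data=Work.Trial out=Work.TrialSorted;
   by Region;
run;

/* One separate Cox analysis per value of Region. */
proc phreg data=Work.TrialSorted;
   by Region;
   model Time*Status(0) = Age Treatment;
run;
```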

FREQ Statement

  • FREQ variable < /option > ;

The variable in the FREQ statement identifies the variable (in the input data set) containing the frequency of occurrence of each observation. PROC PHREG treats each observation as if it appears n times, where n is the value of the FREQ variable for the observation. If not an integer, the frequency value is truncated to an integer. If the frequency value is missing, the observation is not used in the estimation of the regression parameters.

The following option can be specified in the FREQ statement after a slash (/):

NOTRUNCATE

NOTRUNC

  • specifies that frequency values are not truncated to integers.
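A short sketch of the FREQ statement follows, assuming a hypothetical grouped data set Work.Grouped in which Count holds possibly non-integer frequencies.

```sas
/* Each row stands for Count identical observations. NOTRUNCATE keeps */
/* fractional frequencies instead of truncating them to integers.     */
proc phreg data=Work.Grouped;
   model Time*Status(0) = Dose;
   freq Count / notruncate;
run;
```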

ID Statement

  • ID variables ;

The ID statement specifies additional variables for identifying observations in the input data. These variables are placed in the OUT= data set created by the OUTPUT statement. In the computation of the robust sandwich variance estimate, you can aggregate over distinct values of these ID variables.

Only variables in the input data set can be included in the ID statement.

MODEL Statement

  • MODEL response < *censor ( list ) > = variables < /options > ;

    MODEL (t1, t2) < *censor(list) > = variables < /options > ;

The MODEL statement identifies the variables to be used as the failure time variables, the optional censoring variable, and the explanatory variables. Two forms of MODEL syntax can be specified; the first form allows one response variable, while the second form allows two variables for the counting process style of input (see the section Counting Process Style of Input on page 3241 for more information).

In the first MODEL statement, preceding the equal sign, is the name of the failure time variable. This can optionally be followed by an asterisk, the name of the censoring variable, and a list of censoring values (separated by blanks or commas if there is more than one) enclosed in parentheses. If the censoring variable takes on one of these values, the corresponding failure time is considered to be censored. The variables following the equal sign are the explanatory variables (sometimes called independent variables or covariates) for the model.

Instead of a single failure time variable, the second MODEL statement identifies a pair of failure time variables. Their names are enclosed in parentheses, and they signify the endpoints of a semi-closed interval (t1, t2] during which the subject is at risk. If the censoring variable takes on one of the censoring values, the time t2 is considered to be censored.

The censoring variable and the explanatory variables must be numeric. The failure time variables must contain nonnegative values. Any observation with a negative failure time is excluded from the analysis, as is any observation with a missing value for any of the variables listed in the MODEL statement.
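The two MODEL forms can be sketched as follows. All data set and variable names are hypothetical; in the first step both 0 and 2 are assumed to be censoring codes.

```sas
/* Form 1: a single failure time variable with two censoring values. */
proc phreg data=Work.Trial;
   model Time*Status(0, 2) = Age Treatment;
run;

/* Form 2: counting process style of input. Each row describes an */
/* interval (TStart, TStop] during which the subject is at risk.  */
proc phreg data=Work.Recur;
   model (TStart, TStop)*Status(0) = Age Treatment;
run;
```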

You can specify the following options in the MODEL statement.

Ties-Handling Option

TIES= method

specifies how to handle ties in the failure time. The TIES= option can take the following values:

BRESLOW

uses the approximate likelihood of Breslow (1974). This is the default value.

DISCRETE

replaces the proportional hazards model by the discrete logistic model

$$\frac{h(t; z)}{1 - h(t; z)} = \frac{\lambda_0(t)}{1 - \lambda_0(t)} \exp(z'\beta)$$

where $\lambda_0(t)$ and $h(t; z)$ are discrete hazard functions.

EFRON

uses the approximate likelihood of Efron (1977).

EXACT

computes the exact conditional probability under the proportional hazards assumption that all tied event times occur before censored times of the same value or before larger values. This is equivalent to summing all terms of the marginal likelihood for $\beta$ that are consistent with the observed data (Kalbfleisch and Prentice 1980; DeLong, Guirguis, and So 1994).

The EXACT method may take a considerable amount of computer resources. If ties are not extensive, the EFRON and BRESLOW methods provide satisfactory approximations to the EXACT method for the continuous time-scale model. In general, Efron's approximation gives results that are much closer to the EXACT method results than Breslow's approximation does. If the time scale is genuinely discrete, you should use the DISCRETE method. The DISCRETE method is also required in the analysis of case-control studies when there is more than one case in a matched set. If there are no ties, all four methods result in the same likelihood and yield identical estimates. The default, TIES=BRESLOW, is the most efficient method when there are no ties.
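The following sketch shows two typical uses of the TIES= option. The data sets and variables are hypothetical. The dummy time variable in the matched case-control step is a common convention for making every case "fail" before the controls in its matched set; it is an assumption of this example rather than something prescribed in this section.

```sas
/* Continuous time scale with some tied event times. */
proc phreg data=Work.Trial;
   model Time*Status(0) = Age Treatment / ties=efron;
run;

/* Matched case-control data with possibly several cases per set.   */
/* Cases (Case=1) get dummy time 1; controls (Case=0) get dummy     */
/* time 2 and are treated as censored.                              */
data Work.MatchedT;
   set Work.Matched;
   DummyTime = 2 - Case;
run;

proc phreg data=Work.MatchedT;
   model DummyTime*Case(0) = Exposure / ties=discrete;
   strata SetID;      /* one stratum per matched set */
run;
```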

Model-Specification Options

ENTRYTIME= variable

ENTRY= variable

  • specifies the name of the variable that represents the left truncation time. This option has no effect when the counting process style of input is specified. See the section Left Truncation of Failure Times on page 3263 for more information.

NOFIT

  • performs the global score test, which tests the joint significance of all the explanatory variables in the MODEL statement. No parameters are estimated. If the NOFIT option is specified along with other MODEL statement options, NOFIT takes precedence, and all other options are ignored except the TIES= option.

OFFSET= name

  • specifies the name of an offset variable, which is an explanatory variable with a regression coefficient fixed as one. This option can be used to incorporate risk weights for the likelihood function.

SELECTION= method

  • specifies the method used to select the model. The methods available are

    BACKWARD | B

    requests backward elimination.

    FORWARD | F

    requests forward selection.

    NONE | N

    fits the complete model specified in the MODEL statement. This is the default value.

    SCORE

    requests best subset selection. It identifies a specified number of models with the highest score chi-square statistic for all possible model sizes ranging from one explanatory variable to the total number of explanatory variables listed in the MODEL statement.

    STEPWISE | S

    requests stepwise selection.

    For more information, see the section Variable Selection Methods on page 3264.

Model-Building Options

The following options enable you to provide additional specifications for the BACKWARD, FORWARD, SCORE, and STEPWISE model selection methods. They have no effect when SELECTION=NONE. Only the INCLUDE=, START=, STOP=, and BEST= options work with the SCORE method.

BEST= n

  • is used exclusively with the SCORE model selection method. The BEST= n option specifies that n models with the highest score chi-square statistics are to be displayed for each model size. If the option is omitted and there are no more than 10 explanatory variables, then all possible models are listed for each model size. If the option is omitted and there are more than 10 explanatory variables, then the number of models selected for each model size is, at most, equal to the number of explanatory variables listed in the MODEL statement. See Example 54.2 on page 3279 for an illustration of the SCORE selection method and the BEST= option.

DETAILS

  • produces a detailed display at each step of the model-building process. It produces an Analysis of Variables Not in the Model table before displaying the variable selected for entry for FORWARD or STEPWISE selection. For each model fitted, it produces the Analysis of Maximum Likelihood Estimates table. See Example 54.1 on page 3272 for a discussion of these tables.

INCLUDE= n

  • includes the first n explanatory variables listed in the MODEL statement in every model. The value for n ranges from 1 to s, where s is the number of explanatory variables in the MODEL statement. The default value of n is 0.

MAXSTEP= n

  • specifies the maximum number of times the explanatory variables can move in and out of the model before the STEPWISE model-building process ends. The default value for n is twice the number of explanatory variables in the MODEL statement. The option has no effect for other model selection methods.

SEQUENTIAL

  • forces variables to be added to the model in the order specified in the MODEL statement or to be eliminated from the model in the reverse order specified in the MODEL statement.

SLENTRY= value

SLE= value

  • specifies the significance level (a value between 0 and 1) for entering an explanatory variable into the model in the FORWARD or STEPWISE method. For all variables not in the model, the one with the smallest p-value is entered if the p-value is less than or equal to the specified significance level. The default value is 0.05.

SLSTAY= value

SLS= value

  • specifies the significance level (a value between 0 and 1) for removing an explanatory variable from the model in the BACKWARD or STEPWISE method. For all variables in the model, the one with the largest p-value is removed if the p-value exceeds the specified significance level. The default value is 0.05.

START= n

  • begins the FORWARD, BACKWARD, or STEPWISE model selection process with the first n explanatory variables listed in the MODEL statement. The value for n ranges from 0 to s, where s is the total number of explanatory variables in the MODEL statement. The default value of n is s for the BACKWARD method and 0 for the FORWARD and STEPWISE methods. Note that START=n specifies only that the first n explanatory variables appear in the first model, while INCLUDE=n specifies that the first n explanatory variables be included in every model. For the SCORE method, START=n specifies that the smallest models contain n explanatory variables, where n ranges from 1 to s. The default value of n is 1.

STOP= n

  • specifies the maximum (FORWARD method) or minimum (BACKWARD method) number of explanatory variables to be included in the final model. The value for n ranges from 0 to s, where s is the number of explanatory variables in the MODEL statement. The default value of n is 0 for the BACKWARD method and s for the FORWARD method. For the SCORE method, STOP=n specifies that the largest models contain n explanatory variables, where n ranges from 1 to s. The default value of n is s. The STOP= option has no effect for the STEPWISE method.

STOPRES

SR

  • specifies that the addition and deletion of variables are to be based on the result of the likelihood score test for testing the joint significance of variables not in the model. This score chi-square statistic is referred to as the residual chi-square. In the FORWARD method, the STOPRES option enters the explanatory variables into the model one at a time until the residual chi-square becomes insignificant (that is, until the p-value of the residual chi-square exceeds the SLENTRY= value). In the BACKWARD method, the STOPRES option removes variables from the model one at a time until the residual chi-square becomes significant (that is, until the p-value of the residual chi-square becomes less than the SLSTAY= value). The STOPRES option has no effect for the STEPWISE method.
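The selection and model-building options might be combined as in the sketch below. The data set and the five explanatory variables are hypothetical, and the significance levels are arbitrary illustrative choices.

```sas
/* Stepwise selection: Age (the first listed variable) is forced into */
/* every model by INCLUDE=1, and DETAILS prints each step.            */
proc phreg data=Work.Trial;
   model Time*Status(0) = Age Treatment Bmi Smoker Stage
         / selection=stepwise slentry=0.25 slstay=0.15
           include=1 details;
run;

/* Best subset selection: the three highest-scoring models of each */
/* size from 2 to 4 explanatory variables.                         */
proc phreg data=Work.Trial;
   model Time*Status(0) = Age Treatment Bmi Smoker Stage
         / selection=score best=3 start=2 stop=4;
run;
```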

Optimization Options

Four convergence criteria are allowed: ABSFCONV=, FCONV=, GCONV=, and XCONV=. If you specify more than one convergence criterion, the optimization is terminated as soon as one of the criteria is satisfied. If none of the criteria is specified, the default is GCONV=1E-8.

ABSFCONV= value

  • specifies the absolute function convergence criterion. Termination requires a small change in the objective function (log partial likelihood function) in subsequent iterations,

    $$|l_k - l_{k-1}| < \text{value}$$

    where $l_k$ is the value of the objective function at iteration $k$.

CONVERGELIKE= value

  • is the same as specifying the ABSFCONV= option.

CONVERGEPARM= value

  • is the same as specifying the XCONV= option.

FCONV= value

  • specifies the relative function convergence criterion. Termination requires a small relative change in the objective function (log partial likelihood function) in subsequent iterations,

    $$\frac{|l_k - l_{k-1}|}{|l_{k-1}| + 10^{-6}} < \text{value}$$

    where $l_k$ is the value of the objective function at iteration $k$.

GCONV= value

  • specifies the relative gradient convergence criterion. Termination requires that the normalized prediction function reduction is small,

    $$\frac{g_k' H_k^{-1} g_k}{|l_k| + 10^{-6}} < \text{value}$$

    where $l_k$ is the log partial likelihood, $g_k$ is the gradient vector (first partial derivatives of the log partial likelihood), and $H_k$ is the negative Hessian matrix (second partial derivatives of the log partial likelihood), all at iteration $k$.

MAXITER= n

  • specifies the maximum number of iterations allowed. The default value for n is 25. If convergence is not attained in n iterations, the displayed output and all data sets created by PROC PHREG contain results that are based on the last maximum likelihood iteration.

RIDGING=ABSOLUTE | RELATIVE | NONE

  • specifies the technique to improve the log-likelihood when its value is worse than that of the previous step. For RIDGING=ABSOLUTE, the diagonal elements of the negative (expected) Hessian are inflated by adding the ridge value. For RIDGING=RELATIVE, the diagonal elements are inflated by the factor equal to 1 plus the ridge value. For RIDGING=NONE, the crude line-search method of taking half a step is used instead of ridging.

SINGULAR= value

  • specifies the singularity criterion for determining linear dependencies in the set of explanatory variables. The default value is $10^{-12}$.

XCONV= value

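  • specifies the relative parameter convergence criterion. Termination requires a small relative parameter change in subsequent iterations,

    $$\max_i |\delta_i^{(k)}| < \text{value}$$

    where

    $$\delta_i^{(k)} = \begin{cases} \beta_i^{(k)} - \beta_i^{(k-1)} & \text{if } |\beta_i^{(k-1)}| < 0.01 \\ \dfrac{\beta_i^{(k)} - \beta_i^{(k-1)}}{\beta_i^{(k-1)}} & \text{otherwise} \end{cases}$$

    where $\beta_i^{(k)}$ is the estimate of the $i$th parameter at iteration $k$.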
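As a sketch, several optimization options can be set together in the MODEL statement. The data set, variables, and tolerance values are hypothetical choices for illustration; ITPRINT is a display option that prints the iteration history.

```sas
proc phreg data=Work.Trial;
   model Time*Status(0) = Age Treatment
         / gconv=1e-10      /* tighter relative gradient criterion      */
           xconv=1e-6       /* relative parameter change criterion      */
           maxiter=50       /* allow more than the default 25 iterations */
           ridging=absolute /* ridge the Hessian if a step worsens the fit */
           itprint;         /* show the iteration history               */
run;
```

Because two criteria are specified here, the iterations stop as soon as either the GCONV= or the XCONV= criterion is satisfied.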

Display Options

ALPHA= value

  • sets the significance level used for the confidence limits for the hazards ratios. The value must be between 0 and 1. The default value is 0.05, which results in the calculation of a 95% confidence interval. This option has no effect unless the RISKLIMITS option is specified.

CORRB

  • displays the estimated correlation matrix of the parameter estimates.

COVB

  • displays the estimated covariance matrix of the parameter estimates.

ITPRINT

  • displays the iteration history, including the last evaluation of the gradient vector.

RISKLIMITS

RL

  • displays, for each explanatory variable, the $100(1-\alpha)\%$ confidence limits for the hazards ratio ($e^{\beta_i}$). The value for $\alpha$ is determined by the ALPHA= option.
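The display options can be requested together, as in this sketch with hypothetical data set and variable names.

```sas
/* 90% confidence limits for the hazards ratios, plus the estimated */
/* covariance and correlation matrices of the parameter estimates.  */
proc phreg data=Work.Trial;
   model Time*Status(0) = Age Treatment
         / risklimits alpha=0.10 covb corrb;
run;
```

Without RISKLIMITS (or RL), the ALPHA=0.10 specification would have no visible effect.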

OUTPUT Statement

  • OUTPUT < OUT= SAS-data-set >

    < keyword=name ... keyword=name >< /options > ;

The OUTPUT statement creates a new SAS data set containing statistics calculated for each observation. These can include the estimated linear predictor and its standard error, survival distribution estimates, residuals, and influence statistics. In addition, this data set includes the time variable, the explanatory variables listed in the MODEL statement, the censoring variable (if specified), and the BY, STRATA, FREQ, and ID variables (if specified).

For observations with missing values in the time variable or any explanatory variables, the output statistics are set to missing. However, for observations with missing values only in the censoring variable or the FREQ variable, survival estimates are still computed. Therefore, by adding observations with missing values in the FREQ variable or the censoring variable, you can compute the survivor function estimates for new observations or for settings of explanatory variables not present in the data without affecting the model fit.

No OUTPUT data set is created if the model contains a time-dependent variable defined by means of programming statements.
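The missing-value behavior described above suggests a simple way to score new covariate settings, sketched here. All names are hypothetical; the appended row has a missing censoring value, so it should not influence the fit but should still receive a survival estimate.

```sas
/* A hypothetical new subject: 60 years old, treated, evaluated at day 365. */
data Work.New;
   Time = 365;
   Status = .;        /* missing censoring value keeps the row out of the fit */
   Age = 60;
   Treatment = 1;
run;

data Work.Both;
   set Work.Trial Work.New;
run;

proc phreg data=Work.Both;
   model Time*Status(0) = Age Treatment;
   output out=Work.Pred survival=SurvProb xbeta=LinPred;
run;
```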

The following list explains specifications in the OUTPUT statement.

OUT= SAS-data-set

  • names the output data set. If you omit the OUT= option, the OUTPUT data set is created and given a default name using the DATAn convention.

keyword=name

  • specifies the statistics included in the OUTPUT data set and names the new variables that contain the statistics. Specify a keyword for each desired statistic (see the following list of keywords), an equal sign, and either a variable or a list of variables to contain the statistic. The keywords that accept a list of variables are DFBETA, RESSCH, RESSCO, and WTRESSCH. For these keywords, you can specify as many names in name as the number of explanatory variables specified in the MODEL statement. If you specify k names and k is less than the total number of explanatory variables, only the changes for the first k parameter estimates are output. The keywords and the corresponding statistics are as follows:

    DFBETA

    approximate changes in the parameter estimates ($\hat{\beta} - \hat{\beta}_{(j)}$) when the $j$th observation is omitted. These variables are a weighted transform of the score residual variables and are useful in assessing local influence and in computing robust variance estimates.

    LD

    approximate likelihood displacement when the observation is left out. This diagnostic can be used to assess the impact of each observation on the overall fit of the model.

    LMAX

    relative influence of observations on the overall fit of the model. This diagnostic is useful in assessing the sensitivity of the fit of the model to each observation.

    LOGLOGS

    log of the negative log of SURVIVAL

    LOGSURV

    log of SURVIVAL

    NUM_LEFT

    number of subjects at risk at the observation time $\tau_j$ (or at the right endpoint of the at-risk interval when a counting process MODEL specification is used)

    RESDEV

    deviance residual $\hat{D}_j$. This is a transform of the martingale residual to achieve a more symmetric distribution.

    RESMART

    martingale residual $\hat{M}_j$. The residual at the observation time $\tau_j$ can be interpreted as the difference over $[0, \tau_j]$ in the observed number of events minus the expected number of events given by the model.

    RESSCH

    Schoenfeld residuals. These residuals are useful in assessing the proportional hazards assumption.

    RESSCO

    score residuals. These residuals are a decomposition of the first partial derivative of the log likelihood. They can be used to assess the leverage exerted by each subject in the parameter estimation. They are also useful in constructing robust sandwich variance estimators.

    STDXBETA

    standard error of the estimated linear predictor, $\sqrt{z_j' \hat{V}(\hat{\beta}) z_j}$

    SURVIVAL

    survivor function estimate $\hat{S}_j = [\hat{S}_0(\tau_j)]^{\exp(z_j'\hat{\beta})}$, where $\tau_j$ is the observation time

    WTRESSCH

    weighted Schoenfeld residuals. These residuals are useful in investigating the nature of nonproportionality if the proportional hazard assumption does not hold.

    XBETA

    estimate of the linear predictor, $z_j'\hat{\beta}$
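As an illustration of the keyword=name form, the following sketch requests several diagnostics at once. The data set Study, the variables Status, Age, and Trt, and all output variable names are assumptions made for this example.

```sas
proc phreg data=Study;
   model Time*Status(0) = Age Trt;
   output out=Diag
          /* list keywords: one name per explanatory variable, in MODEL order */
          dfbeta=dfAge dfTrt
          ressch=schAge schTrt
          /* single-name keywords */
          resmart=Mart resdev=Dev ld=LikDisp
          xbeta=LinPred stdxbeta=SELinPred;
run;
```

If only one name were given for DFBETA (for example, dfbeta=dfAge), only the statistic for the first explanatory variable, Age, would be written.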

    The following options can appear in the OUTPUT statement after a slash (/).

ORDER= sort_order

  • specifies the order of the observations in the OUTPUT data set. Available values for sort_order are

    DATA

    requests that the output observations be sorted the same as the input data set.

    SORTED

    requests that the output observations be sorted by strata and descending order of the time variable within each stratum.

    The default is ORDER=DATA.

METHOD= method

  • specifies the method used to compute the survivor function estimates. The two available methods are

    CH

    EMP

    specifies that the empirical cumulative hazard function estimate of the survivor function is to be computed; that is, the survivor function is estimated by exponentiating the negative empirical cumulative hazard function.

    PL

    specifies that the product-limit estimate of the survivor function is to be computed.

    The default is METHOD=PL.
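A short sketch of the two OUTPUT options used together follows. The data set and variable names (Study, Status, Age, Trt, Grp, Surv) are hypothetical.

```sas
proc phreg data=Study;
   model Time*Status(0) = Age Trt;
   strata Grp;
   /* Sort by stratum and descending time; estimate the survivor      */
   /* function from the empirical cumulative hazard rather than the   */
   /* default product-limit method.                                   */
   output out=Surv survival=S logsurv=LogS / order=sorted method=ch;
run;
```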

Programming Statements

Programming statements are used to create or modify the values of the explanatory variables in the MODEL statement. They are especially useful in fitting models with time-dependent explanatory variables. Programming statements can also be used to create explanatory variables that are not time dependent. For example, you can create indicator variables from a categorical variable and incorporate them into the model. PROC PHREG programming statements cannot be used to create or modify the values of the response variable, the censoring variable, the frequency variable, or the strata variables.
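For instance, indicator variables for a three-level numeric categorical variable could be created as in the following sketch. The data set Study and the variables Status and CellType (coded 1, 2, 3) are assumptions for illustration; level 1 serves as the reference level.

```sas
proc phreg data=Study;
   model Time*Status(0) = Cell2 Cell3;
   /* A comparison evaluates to 1 when true and 0 when false. */
   Cell2 = (CellType = 2);
   Cell3 = (CellType = 3);
run;
```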

The following DATA step statements are available in PROC PHREG:

  ABORT, ARRAY, assignment statements, CALL, DO, iterative DO, DO UNTIL, DO WHILE, END, GOTO, IF-THEN/ELSE, LINK-RETURN, PUT, SELECT, SUM statement

By default, the PUT statement in PROC PHREG writes to the Output window instead of the Log window. If you want the results of the PUT statements to go to the Log window, add the following statement before the PUT statements:

  FILE LOG;  

DATA step functions are also available. Use these programming statements the same way you use them in the DATA step. For detailed information, refer to SAS Language Reference: Dictionary.
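The following sketch combines a DATA step function with the FILE LOG and PUT statements to inspect a derived covariate. The data set Study and the variables Status, Age, and LogAge are hypothetical, and the simple list form of PUT shown here is assumed to be accepted by the procedure.

```sas
proc phreg data=Study;
   model Time*Status(0) = LogAge;
   LogAge = log(Age);               /* DATA step function */
   file log;                        /* route PUT output to the Log window */
   put 'Age: ' Age ' LogAge: ' LogAge;
run;
```

The PUT statement runs every time the programming statements are executed, so it can produce a large amount of output; it is best reserved for debugging on a small data set.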

Consider the following example of using programming statements in PROC PHREG. Suppose blood pressure is measured at multiple times during the course of a study investigating the effect of blood pressure on some survival time. By treating the blood pressure as a time-dependent explanatory variable, you are able to use the value of the most recent blood pressure at each specific point of time in the modeling process rather than using the initial blood pressure or the final blood pressure. The values of the following variables are recorded for each patient, if they are available. Otherwise, the variables contain missing values.

Time

survival time

Censor

censoring indicator (with 0 as the censoring value)

BP0

blood pressure on entry to the study

T1

time 1

BP1

blood pressure at T1

T2

time 2

BP2

blood pressure at T2

The following programming statements create a variable BP. At each time T, the value of BP is the blood pressure reading for that time, if available. Otherwise, it is the last blood pressure reading.

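  proc phreg;
     model Time*Censor(0)=BP;
     BP = BP0;                             /* start from the entry reading */
     if Time>=T1 and T1^=. then BP=BP1;    /* switch to BP1 once T1 is reached */
     if Time>=T2 and T2^=. then BP=BP2;    /* switch to BP2 once T2 is reached */
  run;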
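With more follow-up readings, the same logic can be written with the ARRAY and iterative DO statements. This is a sketch under assumptions: the data set BPStudy and the third reading (T3, BP3) are hypothetical, and the measurement times are assumed to be in increasing order within each patient.

```sas
proc phreg data=BPStudy;
   model Time*Censor(0) = BP;
   array t{3}  T1-T3;      /* measurement times */
   array b{3}  BP1-BP3;    /* readings taken at those times */
   BP = BP0;
   do i = 1 to 3;
      /* Later readings overwrite earlier ones once their time is reached. */
      if t{i} ^= . and Time >= t{i} then BP = b{i};
   end;
run;
```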

For other illustrations of using programming statements, see the Getting Started section on page 3217 and Example 54.4 on page 3285.

STRATA Statement

  • STRATA variable < ( list ) >< ... variable < ( list ) >>< /option > ;

The proportional hazards assumption may not be realistic for all data. If so, it may still be reasonable to perform a stratified analysis. The STRATA statement names the variables that determine the stratification. Strata are formed according to the nonmissing values of the STRATA variables unless the MISSING option is specified. In the STRATA statement, variable is a variable with values that are used to determine the strata levels, and list is an optional list of values for a numeric variable. Multiple variables can appear in the STRATA statement.

The values for variable can be formatted or unformatted. If the variable is a character variable, or if the variable is numeric and no list appears, then the strata are defined by the unique values of the variable. If the variable is numeric and is followed by a list, then the levels for that variable correspond to the intervals defined by the list. The corresponding strata are formed by the combination of levels and unique values. The list can include numeric values separated by commas or blanks, value to value by value range specifications, or combinations of these.

For example, the specification

  strata age (5, 10 to 40 by 10) sex ;  

indicates that the levels for age are less than 5, 5 to 10, 10 to 20, 20 to 30, 30 to 40, and greater than 40. (Note that observations with exactly the cutpoint value fall into the interval preceding the cutpoint.) Thus, if the sex variable takes two values, this STRATA statement specifies 6 × 2 = 12 strata altogether.

The following option can be specified in the STRATA statement after a slash (/).

MISSING

  • allows missing values ('.' for numeric variables and blanks for character variables) as valid STRATA variable values. Otherwise, observations with missing STRATA variable values are deleted from the analysis.
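Putting the pieces together, a stratified analysis might look like the following sketch. The data set Study and the variables Status and Trt are assumed names; age and sex are the stratification variables from the example above.

```sas
proc phreg data=Study;
   model Time*Status(0) = Trt;
   /* Age is grouped into intervals by the list; Sex uses its unique values. */
   /* MISSING keeps observations with missing Age or Sex as their own strata. */
   strata age (5, 10 to 40 by 10) sex / missing;
run;
```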

TEST Statement

  • < label: > TEST equation1 < , ... , equationk >< /option > ;

The TEST statement tests linear hypotheses about the regression coefficients. PROC PHREG performs a Wald test for the joint hypothesis specified in a single TEST statement. Each equation specifies a linear hypothesis; multiple equations (rows of the joint hypothesis) are separated by commas. The label, which must be a valid SAS name, is used to identify the resulting output, and should always be included. You can submit multiple TEST statements.

The form of an equation is as follows:

  • term < ± term ... >< = < ± term < ± term ... >>>

where term is a variable or a constant or a constant times a variable. The variable is any explanatory variable in the MODEL statement. When no equal sign appears, the expression is set to 0. The following code illustrates possible uses of the TEST statement:

  proc phreg;
     model time= a1 a2 a3 a4;
     Test1: Test a1, a2;
     Test2: Test a1=0,a2=0;
     Test3: Test a1=a2=a3;
     Test4: Test a1=a2,a2=a3;
  run;

Note that the first and second TEST statements are equivalent, as are the third and fourth TEST statements.
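To make the equivalence concrete, the hypothesis in Test3 and Test4 can be written in matrix form. The notation below ($L$ for the matrix of linear coefficients, $c$ for the vector of constants, $\hat{V}(\hat{\beta})$ for the estimated covariance matrix) is a sketch of the standard Wald construction and assumes the parameter order a1, a2, a3, a4.

```latex
H_0 : L\beta = c, \qquad
L = \begin{pmatrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \end{pmatrix}, \qquad
c = \begin{pmatrix} 0 \\ 0 \end{pmatrix}

\chi^2_W = (L\hat{\beta} - c)' \, [L \hat{V}(\hat{\beta}) L']^{-1} \, (L\hat{\beta} - c)
```

Each equation in the TEST statement contributes one row of $L$, so this joint test has 2 degrees of freedom.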

The following options can be specified in the TEST statement after a slash (/).

AVERAGE

  • enables you to assess the average effect of the variables in the given TEST statement. An overall estimate of the treatment effect is computed as a weighted average of the treatment coefficients as illustrated in the following code:

      TREATMENT: test trt1, trt2, trt3, trt4 / average;  

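    Let $\beta_1$, $\beta_2$, $\beta_3$, and $\beta_4$ be the parameters corresponding to trt1, trt2, trt3, and trt4, respectively. Let $\hat{\beta} = (\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \hat{\beta}_4)'$ be the estimated coefficient vector and let $\hat{V}(\hat{\beta})$ be the corresponding variance estimate. Assuming $\beta_1 = \beta_2 = \beta_3 = \beta_4$, the average treatment effect is estimated by $c'\hat{\beta}$, where $c = [1_4' \hat{V}^{-1}(\hat{\beta}) 1_4]^{-1} \hat{V}^{-1}(\hat{\beta}) 1_4$ and $1_4 = (1, 1, 1, 1)'$.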

E

  • specifies that the linear coefficients and constants be printed. When the AVERAGE option is specified along with the E option, the optimal weights of the average effect are also printed in the same tables as the coefficients.

PRINT

  • displays intermediate calculations. This includes $L\hat{V}(\hat{\beta})L'$ bordered by $(L\hat{\beta} - c)$, and $[L\hat{V}(\hat{\beta})L']^{-1}$ bordered by $[L\hat{V}(\hat{\beta})L']^{-1}(L\hat{\beta} - c)$, where L is a matrix of linear coefficients and c is a vector of constants. See the section Testing Linear Hypotheses about Regression Coefficients on page 3247.
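The three TEST options can be combined in a single statement, as in this sketch. The data set Trial and the variable Status are hypothetical; trt1 through trt4 are the treatment variables from the AVERAGE example.

```sas
proc phreg data=Trial;
   model Time*Status(0) = trt1 trt2 trt3 trt4;
   /* AVERAGE: weighted average of the four treatment coefficients   */
   /* E:       print linear coefficients, constants, optimal weights */
   /* PRINT:   print the intermediate matrices of the Wald test      */
   TREATMENT: test trt1, trt2, trt3, trt4 / average e print;
run;
```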

WEIGHT Statement

  • WEIGHT variable < /option > ;

The variable in the WEIGHT statement identifies the variable in the input data set that contains the case weights. When the WEIGHT statement appears, each observation in the input data set is weighted by the value of the WEIGHT variable. The WEIGHT values can be nonintegral and are not truncated. Observations with negative, zero or missing values for the WEIGHT variable are not used in the model fitting. When the WEIGHT statement is not specified, each observation is assigned a weight of 1. The WEIGHT statement is available for TIES=BRESLOW and TIES=EFRON only.

The following option can be specified in the WEIGHT statement after a slash (/):

NORMALIZE

NORM

  • causes the weights specified by the WEIGHT variable to be normalized so that they add up to the actual sample size. With this option, the estimated covariance matrix of the parameter estimators is invariant to the scale of the WEIGHT variable.
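A minimal sketch of a weighted fit follows. The data set Study and the variables Status, Age, Trt, and W are assumed names. TIES=EFRON is given explicitly because the WEIGHT statement is available only for TIES=BRESLOW and TIES=EFRON.

```sas
proc phreg data=Study;
   model Time*Status(0) = Age Trt / ties=efron;
   /* Rescale the case weights so that they sum to the sample size. */
   weight W / normalize;
run;
```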




SAS.STAT 9.1 Users Guide (Vol. 5)
Year: 2004