The following statements are available in PROC LIFETEST:
PROC LIFETEST < options > ;
TIME variable < * censor (list) > ;
BY variables ;
FREQ variable ;
ID variables ;
STRATA variable < (list) ><... variable < (list) >> ;
SURVIVAL options ;
TEST variables ;
The simplest use of PROC LIFETEST is to request the nonparametric estimates of the survivor function for a sample of survival times. In such a case, only the PROC LIFETEST statement and the TIME statement are required. You can use the STRATA statement to divide the data into various strata. A separate survivor function is then estimated for each stratum, and tests of the homogeneity of strata are performed. However, if the GROUP= option is also specified in the STRATA statement, stratified tests are carried out to test the k samples defined by the GROUP= variable while controlling for the effect of the STRATA variables. You can use the SURVIVAL statement to output the estimates of the survivor function into a SAS data set. You can specify covariates in the TEST statement. PROC LIFETEST computes linear rank statistics to test the effects of these covariates on survival.
The PROC LIFETEST statement invokes the procedure. All statements except the TIME statement are optional, and there is no required order for the statements following the PROC LIFETEST statement. The TIME statement is used to specify the variables that define the survival time and censoring indicator. The STRATA statement specifies a variable or set of variables defining the strata for the analysis. The SURVIVAL statement enables you to specify a transformation to be used in the computation of the confidence intervals; it also enables you to output simultaneous confidence intervals. The TEST statement specifies a list of numeric covariates to be tested for their association with the response survival time. Each variable is tested individually, and a joint test statistic is also computed. The ID statement provides a list of variables whose values are used to identify observations in the product-limit estimates of the survival function. When only the TIME statement appears, no strata are defined and no tests of homogeneity are performed.
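As a minimal sketch of the simplest use described above, the following step requests product-limit estimates for each stratum. The data set work.trial and the variables Weeks, Status, and Treatment are hypothetical names chosen for illustration; they do not come from this document.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc lifetest data=work.trial;
   time Weeks*Status(0);   /* assumes Status=0 marks a right-censored time */
   strata Treatment;       /* one survivor function per Treatment level, plus homogeneity tests */
run;
```

Dropping the STRATA statement leaves only the required TIME statement, in which case no strata are defined and no tests of homogeneity are performed.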
PROC LIFETEST < options > ;
The PROC LIFETEST statement invokes the procedure. The following options can appear in the PROC LIFETEST statement and are described in alphabetic order. If no options are requested, PROC LIFETEST computes and displays product-limit estimates of the survival distribution within each stratum and tests the equality of the survival functions across strata.
Task | Options | Description |
---|---|---|
Specify Data Set | DATA= | specifies the input SAS data set |
 | OUTSURV= | names an output data set to contain survival estimates and confidence limits |
 | OUTTEST= | names an output data set to contain rank test statistics for association of survival time with covariates |
Estimate Survival | METHOD= | specifies method to compute survivor function |
 | ALPHA= | sets confidence level for survival estimates |
 | INTERVALS= | specifies interval endpoints for life-table estimates |
 | NINTERVAL= | specifies number of intervals for life-table estimates |
 | WIDTH= | specifies width of intervals for life-table estimates |
Plot Survival | PLOTS= | specifies plots |
 | MAXTIME= | sets maximum value of time variable for plotting |
Traditional High-Resolution Graphics | ANNOTATE= | specifies an annotate data set that adds features to plots |
 | CENSOREDSYMBOL= | defines symbol used for censored observations in plots |
 | DESCRIPTION= | specifies string that appears in the description field of the PROC GREPLAY master menu for the plots |
 | EVENTSYMBOL= | specifies symbol used for event observations in plots |
 | GOUT= | specifies graphics catalog name for saving graphics output |
 | LANNOTATE= | specifies an input data set that contains variables for local annotation |
Line Printer Plots | LINEPRINTER | specifies that plots are produced by line printer |
 | FORMCHAR(1,2,7,9)= | defines characters used for line printer plot axes |
 | NOCENSPLOT | suppresses the plot of censored observations |
Control Output | NOPRINT | suppresses display of printed output |
 | NOTABLE | suppresses display of survival function estimates |
 | INTERVALS= | displays only the product-limit estimate for the smallest time within each specified interval |
 | TIMELIST= | specifies a list of time points at which the Kaplan-Meier estimates are displayed |
 | REDUCEOUT | specifies that only INTERVALS= or TIMELIST= observations are listed in the OUTSURV= data set |
Miscellaneous | ALPHAQT= | sets confidence level for survival time quartiles |
 | MISSING | allows missing values to be a stratum level |
 | SINGULAR= | sets tolerance for testing singularity of covariance matrix of rank statistics |
 | TIMELIM= | specifies the time limit used to estimate the mean survival time and its standard error |
ALPHA= value
specifies a number between 0.0001 and 0.9999 that sets the confidence level for the confidence intervals for the survivor function. The confidence level for the interval is 1 - ALPHA. For example, the option ALPHA=0.05 requests a 95% confidence interval for the SDF at each time point. The default value is 0.05.
ALPHAQT= value
specifies a number between 0.0001 and 0.9999 that sets the level for the confidence intervals for the quartiles of the survival time. The confidence level for the interval is 1 - ALPHAQT. For example, the option ALPHAQT=0.05 requests a 95% confidence interval for the quartiles of the survival time. The default value is 0.05.
ANNOTATE= SAS-data-set
ANNO= SAS-data-set
specifies an input data set that contains appropriate variables for annotation of the traditional high-resolution graphics. The ANNOTATE= option enables you to add features (for example, labels explaining extreme observations) to plots produced on graphics devices. The ANNOTATE= option cannot be used if the LINEPRINTER option or the experimental ODS GRAPHICS statement is specified. The data set specified must be an ANNOTATE= type data set, as described in SAS/GRAPH Software: Reference .
The data set specified with the ANNOTATE= option in the PROC LIFETEST statement is global in the sense that the information in this data set is displayed on every plot produced by a single invocation of PROC LIFETEST.
CENSOREDSYMBOL= name string
CS= name string
specifies the symbol value for the censored observations in the traditional high-resolution graphics. The value, name or string, is the symbol value specification allowed in SAS/GRAPH software. The default is CS=CIRCLE. If you want to omit plotting the censored observations, specify CS=NONE. The CENSOREDSYMBOL= option cannot be used if the LINEPRINTER option or the experimental ODS GRAPHICS statement is specified.
DATA = SAS-data-set
names the SAS data set used by PROC LIFETEST. By default, the most recently created SAS data set is used.
DESCRIPTION= string
DES= string
specifies a descriptive string of up to 40 characters that appears in the Description field of the traditional high-resolution graphics catalog. The description does not appear on the plots. By default, PROC LIFETEST assigns a description of the form PLOT OF vname vs hname, where vname and hname are the names of the y variable and the x variable, respectively. The DESCRIPTION= option cannot be used if the LINEPRINTER option or the experimental ODS GRAPHICS statement is specified.
EVENTSYMBOL= name string
ES= name string
specifies the symbol value for the event observations in the traditional high-resolution graphics. The value, name or string , is the symbol value specification allowed in SAS/GRAPH software. The default is ES=NONE. The EVENTSYMBOL= option cannot be used if the LINEPRINTER option or the experimental ODS GRAPHICS statement is specified.
FORMCHAR(1,2,7,9)= string
defines the characters used for constructing the vertical and horizontal axes of the line printer plots. The string should be four characters. The first and second characters define the vertical and horizontal bars, respectively, which are also used in drawing the steps of the product-limit survival function. The third character defines the tick mark for the axes, and the fourth character defines the lower left corner of the plot. If the FORMCHAR option in PROC LIFETEST is not specified, the value supplied, if any, with the system option FORMCHAR= is used. The default is FORMCHAR(1,2,7,9)='|-+-'. Any character or hexadecimal string can be used to customize the plot appearance. To send the plot output to a printer with the IBM graphics character set (1 or 2) or display it directly on your PC screen, you can use the following hexadecimal representation:
formchar(1,2,7,9)='B3C4C5C0'x
or system option
formchar='B3C4DAC2BFC3C5B4C0C1D9'x
Refer to the chapter titled The PLOT Procedure, in the SAS Procedures Guide or the section System Options in SAS Language Reference: Dictionary for further information.
GOUT= graphics-catalog
specifies the graphics catalog for saving traditional high-resolution graphics output from PROC LIFETEST. The default is WORK.GSEG. The GOUT= option cannot be used if the LINEPRINTER option or the experimental ODS GRAPHICS statement is specified. For more information, refer to the chapter titled The GREPLAY Procedure in SAS/GRAPH Software: Reference .
INTERVALS= values
specifies a list of interval endpoints for the life-table method. These endpoints must all be nonnegative numbers. The initial interval is assumed to start at zero whether or not zero is specified in the list. Each interval contains its lower endpoint but does not contain its upper endpoint. When this option is used with the product-limit method, it reduces the number of survival estimates displayed by displaying only the estimates for the smallest time within each specified interval. The INTERVALS= option can be specified in any of the following ways:
list separated by blanks | intervals=1 3 5 7 |
list separated by commas | intervals=1,3,5,7 |
x to y | intervals=1 to 7 |
x to y by z | intervals=1 to 7 by 1 |
combination of the above | intervals=1,3 to 5,7 |
For example, the specification
intervals=5,10 to 30 by 10
produces the set of intervals
{[0, 5), [5, 10), [10, 20), [20, 30), [30, ∞)}
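For context, a sketch of how this interval specification might appear in a complete step that uses the life-table method. The data set work.trial and the variables Weeks and Status are hypothetical.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc lifetest data=work.trial method=life intervals=5,10 to 30 by 10;
   time Weeks*Status(0);   /* assumes Status=0 marks a right-censored time */
run;
```

With METHOD=LIFE, the list defines the life-table intervals shown above. With the default METHOD=PL, the same list would instead limit the displayed product-limit estimates to the smallest time within each interval.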
LANNOTATE= SAS-data-set
LANN= SAS-data-set
specifies an input data set that contains variables for local annotation of traditional high-resolution graphics. You can use the LANNOTATE= option to specify a different annotation for each BY group, in which case the BY variables must be included in the LANNOTATE= data set. The LANNOTATE= option cannot be used if the LINEPRINTER option or the experimental ODS GRAPHICS statement is specified. The data set specified must be an ANNOTATE= type data set, as described in SAS/GRAPH Software: Reference .
If there is no BY-group processing, the ANNOTATE= and LANNOTATE= options have the same effects.
LINEPRINTER
LS
specifies that plots are produced by a line printer instead of by a graphical device. This option cannot be used if the experimental ODS GRAPHICS statement is specified.
MAXTIME= value
specifies the maximum value of the time variable allowed on the plots so that outlying points do not determine the scale of the time axis of the plots. This parameter only affects the displayed plots and has no effect on the calculations.
METHOD= type
specifies the method used to compute the survival function estimates. Valid values for type are as follows:
PL KM | specifies that product-limit (PL) or Kaplan-Meier (KM) estimates are computed. |
ACT LIFE LT | specifies that life-table (or actuarial) estimates are computed. |
By default, METHOD=PL.
MISSING
allows missing values for numeric variables and blank values for character variables as valid stratum levels. See the section Missing Values on page 2171 for details.
By default, PROC LIFETEST does not use observations with missing values for any stratum variables.
NINTERVAL= value
specifies the number of intervals used to compute the life-table estimates of the survivor function. This parameter is overridden by the WIDTH= option or the INTERVALS= option. When you specify the NINTERVAL= option, PROC LIFETEST tries to find an interval that results in round numbers for the endpoints. Consequently, the number of intervals may be different from the number requested. Use the INTERVALS= option to control the interval endpoints. The default is NINTERVAL=10.
NOCENSPLOT
NOCENS
requests that the plot of censored observations be suppressed when the PLOTS= option is specified. This option is not needed when the life-table method is used to compute the survival estimates, since the plot of censored observations is not produced.
NOPRINT
suppresses the display of output. This option is useful when only an output data set is needed. Note that this option temporarily disables the Output Delivery System (ODS).
For more information, see Chapter 14, Using the Output Delivery System.
NOTABLE
suppresses the display of survival function estimates. Only the number of censored and event times, plots, and test results are displayed.
OUTSURV= SAS-data-set
OUTS= SAS-data-set
creates an output SAS data set to contain the estimates of the survival function and corresponding confidence limits for all strata. See the section Output Data Sets on page 2183 for more information on the contents of the OUTSURV= SAS data set.
OUTTEST= SAS-data-set
OUTT= SAS-data-set
creates an output SAS data set to contain the overall chi-square test statistic for association with failure time for the variables in the TEST statement, the values of the univariate rank test statistics for each variable in the TEST statement, and the estimated covariance matrix of the univariate rank test statistics. See the section Output Data Sets on page 2183 for more information on the contents of the OUTTEST= SAS data set.
PLOTS= (type <(NAME=name)> <, ..., type <(NAME=name)> >)
creates plots of survival estimates or censored observations, where type is the type of plot and name is a catalog entry name of up to eight characters. Valid values of type are as follows:
CENSORED C | specifies a plot of censored observations by strata (product-limit method only). |
SURVIVAL S | specifies a plot of the estimated SDF versus time. |
LOGSURV LS | specifies a plot of the −log(estimated SDF) versus time. |
LOGLOGS LLS | specifies a plot of the log(−log(estimated SDF)) versus log(time). |
HAZARD H | specifies a plot of the estimated hazard function versus time (life-table method only). |
PDF P | specifies a plot of the estimated probability density function versus time (life-table method only). |
Parentheses are required in specifying the plots. For example,
plots = (s)
requests a plot of the estimated survivor function versus time, and
plots = (s(name=Surv2), h(name=Haz2))
requests a plot of the estimated survivor function versus time and a plot of the estimated hazard function versus time, with Surv2 and Haz2 as their catalog names, respectively.
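The following sketch places a PLOTS= specification in a complete step, together with the MAXTIME= and NOTABLE options described in this section. The data set work.trial, the variables Weeks, Status, and Treatment, and the value 100 are hypothetical.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc lifetest data=work.trial plots=(s, lls) maxtime=100 notable;
   time Weeks*Status(0);   /* assumes Status=0 marks a right-censored time */
   strata Treatment;       /* strata appear as separate curves */
run;
```

Here MAXTIME=100 only truncates the time axis of the plots; the estimates themselves are computed from all observations.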
REDUCEOUT
specifies that the OUTSURV= data set contains only those observations that are included in the INTERVALS= or TIMELIST= option. This option has no effect if the OUTSURV= option is not specified. It also has no effect if neither the INTERVALS= option nor the TIMELIST= option is specified.
SINGULAR= value
specifies the tolerance for testing singularity of the covariance matrix for the rank test statistics. The test requires that a pivot for sweeping a covariance matrix be at least this number times a norm of the matrix. The default value is 1E-12.
TIMELIM= time-limit
specifies the time limit used in the estimation of the mean survival time and its standard error. The mean survival time can be shown to be the area under the Kaplan-Meier survival curve. However, if the largest observed time in the data is censored, the area under the survival curve is not a closed area. In such a situation, you can choose a time limit L and estimate the mean survival time limited to a time L (Lee 1992, pp. 72–76). This option is ignored if the largest observed time is an event time. Valid time-limit values are as follows:
EVENT LET | specifies that the time limit L is the largest event time in the data. TIMELIM=EVENT is the default. |
OBSERVED LOT | specifies that the time limit L is the largest observed time in the data. |
number | specifies that the time limit L is the given number. The number must be positive and at least as large as the largest event time in the data. |
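A brief sketch of overriding the default time limit, assuming a hypothetical data set work.trial whose largest observed time is censored:

```sas
/* Hypothetical data set and variable names, for illustration only */
proc lifetest data=work.trial timelim=observed;
   time Weeks*Status(0);   /* assumes Status=0 marks a right-censored time */
run;
```

With TIMELIM=OBSERVED the area under the curve is taken up to the largest observed time rather than the largest event time, so the estimated mean is generally at least as large as under the default.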
TIMELIST= number-list
specifies a list of time points at which the Kaplan-Meier estimates are displayed. The time points are listed in the column labeled as Timelist. Since the Kaplan-Meier survival curve is a decreasing step function, each given time point falls in an interval that has a constant survival estimate. The event time that corresponds to the beginning of the time interval is displayed along with its survival estimate.
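The TIMELIST=, OUTSURV=, and REDUCEOUT options are often combined when only a few time points are of interest. The sketch below assumes a hypothetical data set work.trial and arbitrary time points; the blank-separated list form is an assumption modeled on the INTERVALS= syntax.

```sas
/* Hypothetical data set, variable names, and time points, for illustration only */
proc lifetest data=work.trial timelist=10 20 30
              outsurv=work.km_est reduceout;
   time Weeks*Status(0);   /* assumes Status=0 marks a right-censored time */
run;
```

Without REDUCEOUT, the work.km_est data set would contain an observation for every distinct time rather than only the listed time points.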
WIDTH= value
sets the width of the intervals used in the life-table calculation of the survival function. This parameter is overridden by the INTERVALS= option.
BY variables ;
You can specify a BY statement with PROC LIFETEST to obtain separate analyses on observations in groups defined by the BY variables.
The BY statement is more efficient than the STRATA statement for defining strata in large data sets. However, if you use the BY statement to define strata, PROC LIFETEST does not pool over strata for testing the association of survival time with covariates nor does it test for homogeneity across the BY groups.
Interval size is computed separately for each BY group. When intervals are determined by default, they may be different for each BY group. To make intervals the same for each BY group, use the INTERVALS= option in the PROC LIFETEST statement.
When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. If your input data set is not sorted in ascending order, use one of the following alternatives:
Sort the data using the SORT procedure with a similar BY statement.
Specify the BY statement option NOTSORTED or DESCENDING in the BY statement for the LIFETEST procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.
Create an index on the BY variables using the DATASETS procedure.
For more information on the BY statement, refer to the discussion in SAS Language Reference: Concepts . For more information on the DATASETS procedure, refer to the discussion in the SAS Procedures Guide .
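A sketch of the first alternative, sorting before BY-group processing, with INTERVALS= fixed so that every BY group uses the same life-table intervals. The data set work.trial and the variables Center, Weeks, and Status are hypothetical.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc sort data=work.trial out=work.trial_sorted;
   by Center;
run;

proc lifetest data=work.trial_sorted method=life intervals=0 to 50 by 10;
   by Center;              /* separate analysis per Center; no pooling or homogeneity tests */
   time Weeks*Status(0);   /* assumes Status=0 marks a right-censored time */
run;
```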
FREQ variable ;
The variable in the FREQ statement identifies a variable containing the frequency of occurrence of each observation. PROC LIFETEST treats each observation as if it appeared n times, where n is the value of the FREQ variable for the observation. The FREQ statement is useful for producing life tables when the data are already in the form of a summary data set. If not an integer, the frequency value is truncated to an integer. If the frequency value is less than one, the observation is not used.
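As an illustration of the summary-data case, the sketch below builds a small made-up data set in which each row stands for Count subjects, then produces a life table from it. All names and values are invented for the example.

```sas
/* Made-up summary data: each row represents Count subjects */
data work.summary;
   input Years Censored Count;
   datalines;
0.5 0 12
0.5 1 3
1.5 0 9
1.5 1 5
2.5 0 4
2.5 1 7
;
run;

proc lifetest data=work.summary method=life width=1;
   time Years*Censored(1);   /* Censored=1 marks right-censored rows */
   freq Count;               /* each row is counted Count times */
run;
```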
ID variables ;
The ID variable values are used to label the observations of the product-limit survival function estimates. SAS format statements can be used to format the values of the ID variables.
STRATA variable < (list) ><... variable < (list) >>< /options > ;
The STRATA statement indicates which variables determine strata levels for the computations. The strata are formed according to the nonmissing values of the designated strata variables. The MISSING option can be used to allow missing values as a valid stratum level. Other options enable you to specify various k-sample tests, trend tests, and stratified tests.
In the preceding syntax, variable is a variable whose values determine the stratum levels and list is a list of endpoints for a numeric variable. The values for variable can be formatted or unformatted. If the variable is a character variable, or if the variable is numeric and no list appears, then the strata are defined by the unique values of the strata variable. More than one variable can be specified in the STRATA statement, and each numeric variable can be followed by a list. Each interval contains its lower endpoint but does not contain its upper endpoint. The corresponding strata are formed by the combination of levels. If a variable is numeric and is followed by a list, then the levels for that variable correspond to the intervals defined by the list. The initial interval is assumed to start at −∞ and the final interval is assumed to end at ∞.
The specification of STRATA variables can have any of the following forms:
list separated by blanks | strata age(5 10 20 30) |
list separated by commas | strata age(5,10,20,30) |
x to y | strata age(5 to 10) |
x to y by z | strata age(5 to 30 by 10) |
combination of the above | strata age(5,10 to 50 by 10) |
For example, the specification
strata age(5,20 to 50 by 10) sex;
indicates the following levels for the Age variable
{(−∞, 5), [5, 20), [20, 30), [30, 40), [40, 50), [50, ∞)}
This statement also specifies that the age strata are further subdivided by values of the variable Sex. In this example, there are 6 age groups by 2 sex groups, forming a total of 12 strata.
The specification of several variables (for example, A B C) is equivalent to the A*B*C... syntax of the TABLES statement in the FREQ procedure. The number of strata levels usually grows very rapidly with the number of STRATA variables, so you must be cautious when specifying the list of STRATA variables.
The following options can appear in the STRATA statement after a slash (/). Other than the MISSING option, these options are dedicated to the tests of the two or more samples of survival data.
GROUP= variable
specifies the variable whose formatted values identify the various samples whose underlying survival curves are to be compared. The tests are stratified on the levels of the STRATA variables. For instance, in a multicenter trial in which two forms of therapy are to be compared, you specify the variable identifying therapies as the GROUP= variable and the variable identifying centers as the STRATA variable, in order to perform a stratified 2-sample test to compare the therapies while controlling the effect of the centers.
MISSING
allows missing values to be a stratum level or a valid value of the GROUP= variable.
NODETAIL
suppresses the display of the rank statistics and the corresponding covariance matrices for various strata. If the TREND option is specified, the display of the scores for computing the trend tests is suppressed.
NOTEST
suppresses the k-sample tests, stratified tests, and trend tests.
TREND
computes the trend tests for testing the null hypothesis that the k population hazard rates are the same versus an ordered alternative. If there is only one STRATA variable and the variable is numeric, the unformatted values of the variable are used as the scores; otherwise, the scores are 1, 2, ..., in the given order of the strata.
TEST=( list )
enables you to select the weight functions for the k -sample tests, stratified tests, or trend tests. You can specify a list containing one or more of the following keywords.
LOGRANK | specifies the log-rank test |
WILCOXON | specifies the Wilcoxon test. The test is also referred to as the Gehan test or the Breslow test. |
TARONE | specifies the Tarone-Ware test |
PETO | specifies the Peto-Peto test. The test is also referred to as the Peto-Peto-Prentice test. |
MODPETO | specifies the modified Peto-Peto test |
FLEMING(ρ1, ρ2) | specifies the family of tests in Harrington and Fleming (1982), where ρ1 and ρ2 are nonnegative numbers. FLEMING(ρ1, ρ2) reduces to the Fleming-Harrington G^ρ family (Fleming and Harrington 1981) when ρ2=0, which you can specify as FLEMING(ρ) with one argument. When ρ=0, the test becomes the log-rank test. When ρ=1, the test should be very close to the Peto-Peto test. |
LR | specifies the likelihood ratio test based on the exponential model. |
ALL | specifies all the nonparametric tests with ρ1=1 and ρ2=0 for the FLEMING(. , .) test. |
By default, TEST=(LOGRANK WILCOXON LR) for the k -sample tests, and TEST=(LOGRANK WILCOXON) for stratified and trend tests.
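The multicenter example from the GROUP= description, and a trend test, could be sketched as follows. The data sets work.trial and work.dosing and the variables Center, Treatment, Dose, Weeks, and Status are hypothetical.

```sas
/* Hypothetical data set and variable names, for illustration only */

/* Stratified test: compare Treatment groups while controlling for Center */
proc lifetest data=work.trial;
   time Weeks*Status(0);   /* assumes Status=0 marks a right-censored time */
   strata Center / group=Treatment test=(logrank wilcoxon);
run;

/* Trend test across ordered levels of a single numeric STRATA variable */
proc lifetest data=work.dosing;
   time Weeks*Status(0);
   strata Dose / trend test=(logrank) nodetail;
run;
```

In the second step, Dose is the only STRATA variable and is assumed numeric, so its unformatted values would serve as the trend scores.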
SURVIVAL options ;
The SURVIVAL statement creates an output SAS data set containing the results of the estimation of the survivor function. Although you can use the OUTSURV= option in the PROC LIFETEST statement to produce the output data set, the SURVIVAL statement enables you to output confidence bands and to specify a transformation of survival time in the computation of the pointwise confidence intervals and the confidence bands. Options in the PROC LIFETEST statement (ALPHA=, INTERVALS=, REDUCEOUT, and TIMELIST=) that apply to the OUTSURV= data set can also be specified in the SURVIVAL statement. You can plot these survival estimates using the experimental ODS graphics (see the section ODS Graphics on page 2190).
Task | Options | Description |
---|---|---|
Specify Data Set | OUT= | specifies the output SAS data set |
Specify Transformation | CONFTYPE= | specifies the transformation for the computation of pointwise and simultaneous confidence intervals for the survivor function |
Specify Confidence Bands | CONFBAND= | specifies the confidence bands to be output |
 | BANDMAX= | specifies the maximum time for the confidence bands |
 | BANDMIN= | specifies the minimum time for the confidence bands |
Standard Errors | STDERR | outputs the standard errors |
BANDMAXTIME= value
BANDMAX= value
specifies the maximum time for the confidence bands. The default is the largest observed event time. If the specified BANDMAX= time exceeds the largest observed event time, it is truncated to the largest observed event time.
BANDMINTIME= value
BANDMIN= value
specifies the minimum time for the confidence bands. The default is the smallest observed event time. For the equal precision band, if the BANDMIN= value is less than the smallest observed event time, it is reset to the smallest observed event time.
CONFBAND= keyword
specifies the confidence bands to output. Confidence bands are available only for the product-limit method. You can use the following keywords:
ALL | outputs both the Hall-Wellner and the equal precision confidence bands. |
EP | outputs the equal precision confidence band. |
HW | outputs the Hall and Wellner confidence band. |
CONFTYPE= keyword
specifies the transformation applied to S ( t ) to obtain the pointwise confidence intervals as well as the confidence bands. The following keywords can be used and the default is CONFTYPE=LOGLOG.
ASINSQRT | the arcsine-square root transformation |
LOGLOG | the log-log transformation. This is also referred to as the log cumulative hazard transformation, since it applies the logarithmic function log(.) to the cumulative hazard function. Collett (1994) and Lachin (2000) refer to it as the complementary log-log transformation. |
LINEAR | the identity transformation |
LOG | the logarithmic transformation |
LOGIT | the logit transformation |
OUT= SAS-Data-Set
names the SAS data set that contains the survival estimates. If the OUT= option is omitted, PROC LIFETEST creates an output SAS data set with the default name DATAn. If you do not want to create this output SAS data set, set OUT=_NULL_.
STDERR
specifies that the standard error of the survivor function (SDF_STDERR) be output. If the life-table method is used, the standard error of the density function (PDF_STDERR) and the standard error of the hazard function (HAZ_STDERR) are also output.
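The SURVIVAL statement options above can be combined in one statement. The following sketch assumes a hypothetical data set work.trial and arbitrary band limits; work.surv_est is likewise an invented output name.

```sas
/* Hypothetical data set, variable names, and band limits, for illustration only */
proc lifetest data=work.trial;
   time Weeks*Status(0);   /* assumes Status=0 marks a right-censored time */
   survival out=work.surv_est
            conftype=loglog   /* the default transformation, stated explicitly */
            confband=all      /* Hall-Wellner and equal precision bands */
            bandmin=5 bandmax=40
            stderr;
run;
```

Because confidence bands are available only for the product-limit method, the sketch relies on the default METHOD=PL.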
TEST variables ;
The TEST statement specifies a list of numeric (continuous) covariates that you want tested for association with the failure time.
Two sets of rank statistics are computed. These rank statistics and their variances are pooled over all strata. Univariate (marginal) test statistics are displayed for each of the covariates.
Additionally, a sequence of test statistics for joint effects of covariates is displayed. The first element of the sequence is the largest univariate test statistic. Other variables are then added on the basis of the largest increase in the joint test statistic. The process continues until all the variables have been added or until the remaining variables are linearly dependent on the previously added variables.
See the section Rank Tests for the Association of Survival Time with Covariates on page 2180 for more information.
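A sketch of a TEST statement used together with a STRATA statement, so that the rank statistics are pooled over strata as described above. The data set work.trial and the variables Age, BMI, Center, Weeks, and Status are hypothetical.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc lifetest data=work.trial outtest=work.rank_tests notable;
   time Weeks*Status(0);   /* assumes Status=0 marks a right-censored time */
   strata Center;          /* rank statistics are pooled over Center */
   test Age BMI;           /* numeric covariates tested individually and jointly */
run;
```

The OUTTEST= data set named here would hold the overall chi-square statistic, the univariate rank statistics, and their estimated covariance matrix.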
TIME variable < * censor(list) > ;
The TIME statement is required. It is used to indicate the failure time variable, where variable is the name of the failure time variable that can be optionally followed by an asterisk, the name of the censoring variable, and a parenthetical list of values that correspond to right censoring. The censoring values should be numeric, nonmissing values. For example, the statement
time T*Flag(1,2);
identifies the variable T as containing the values of the event or censored time. If the variable Flag has value 1 or 2, the corresponding value of T is a right-censored value.