
Editors: Shields, Thomas W.; LoCicero, Joseph; Ponn, Ronald B.; Rusch, Valerie W.

Title: General Thoracic Surgery, 6th Edition

Copyright 2005 Lippincott Williams & Wilkins


Chapter 96

Statistical Analysis

Robert D. Stewart

This chapter aims to provide the thoracic surgeon with a working knowledge of statistics to understand the analyses commonly encountered in the thoracic literature. To critically assess the results of any study, a reader must comprehend the statistical tests being used. All too often, inappropriate statistical methods are used or incorrect conclusions are drawn from appropriate statistical methods.

This chapter is not intended to be a guide for performing statistical analyses. Few formulas are presented. Readers interested in further details are invited to consult the reading references. The growing number of powerful and user-friendly statistical software packages has made it tempting for researchers without adequate statistical training to engage in do-it-yourself statistics. I strongly recommend the use of a statistical consultant when preparing scientific data for publication. Many journals have recognized statistical experts as members of their editorial boards.

TYPES OF DATA

Statistical analysis begins with classification of the data, because the selection of an appropriate statistical test is impossible without knowledge of the type of variable being studied. There are two major classes of variables: categorical variables and continuous variables. Categorical data are nominal, ordinal, or discrete. Nominal data have no relative values and are used to denote different but parallel items or groups. For example, gender can be classified as male = 0 and female = 1 or vice versa. When data can assume only two values (e.g., male/female, cancer/no cancer, alive/dead), they are classified as dichotomous. Nominal data are not all dichotomous. Blood groups, geographic region, or race may all be represented by numbers that have no numerical value (e.g., if Midwest = 1 and Southwest = 2, 1 is not less than 2). Categorical data that have relative values, or order, are termed ordinal. Ordinal variables are used to represent graded differences that are qualitative in nature. Cancer stages or disability categories can be represented by numbers where 2 is greater than 1 and 3 is greater than 2; however, unlike inches on a yardstick, the difference between 1 and 2 may not equal the difference between 2 and 3. This difference is important to consider when assessing the mean value of such ordinal values. Finally, there are discrete data that possess both order and relative magnitude. These data are in integer form and usually represent counts (e.g., number of pulmonary segments resected: 1, 2, 3, and so on).

The second major class of data is continuous. As the name implies, data of this type lie on a continuum of possible values. An example of a continuous datum is the mass, in grams, of resected lung tissue. It is worth noting that some data can be seen as either discrete or continuous (e.g., age in years). Also, continuous data can be transformed into categorical data by grouping data into ranges. For most statistical tests, nominal, ordinal, and discrete data are treated similarly, whereas continuous data have an entirely separate group of statistical tests.

SUMMARY STATISTICS

Summary statistics are used to describe a data set containing multiple data points with just a few numbers. Summary statistics account for two basic parameters: central tendency and spread. The central tendency of a data set can be described in three ways: the mean, median, and mode. The mean, or average, is the sum of all data points divided by the number of data points. Although the mean is the most common statistic used and often conveys the true balance point of a data set, it can be misleading in a skewed data set. A few very large or very small data points (outliers) may lead to a mean that does not describe the actual data in a meaningful or useful way. In such a situation, the median may be a more appropriate statistic. The median is the data point that falls at the midpoint when the data are arranged in order. The median is not affected by the values of the largest or smallest data points. Finally, the mode is the data value that appears most frequently. There may be no mode or several modes in any given data set.
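As a brief illustration, the following Python sketch computes the three measures of central tendency for a small hypothetical data set in which a single outlier pulls the mean away from the median:

```python
import statistics

# Hypothetical postoperative lengths of stay (days); the outlier (42)
# skews the data set to the right.
los = [4, 5, 5, 6, 7, 8, 42]

mean = statistics.mean(los)      # sum / count; distorted by the outlier
median = statistics.median(los)  # middle value; robust to the outlier
mode = statistics.mode(los)      # most frequent value
```

Here the mean is 11 days, a value that describes none of the typical patients, whereas the median of 6 days and the mode of 5 days do; this is the skewed-data problem described above.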

The spread of the data around its center is described by the range, variance, and standard deviation. The range is the difference between the smallest and the largest data points. Like the mean, the range is susceptible to distortion by outliers. The interquartile range is the difference between the 75th percentile value and the 25th percentile value and provides a measure of the spread of a data set that is not distorted by a single extreme value.

Variance is a more reliable measure of spread because it is based on the entire data set, not just the largest and smallest values. Variance is the sum of the squared distances of each data point from the mean, divided by n − 1. The standard deviation of a data set is the square root of the variance and is more commonly used because it has the same units as the data that it is describing. Continuous data are conventionally reported as the mean ± the standard deviation. The standard error is another measure of spread and is most commonly encountered as standard error bars in a graphic representation of data. Standard error is calculated as the standard deviation divided by the square root of the number of data points in the set.
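These definitions can be made concrete with a short Python sketch on a hypothetical sample; note the n − 1 denominator for the sample variance:

```python
import math
import statistics

data = [2.0, 4.0, 6.0, 8.0]  # hypothetical measurements
n = len(data)
mean = statistics.mean(data)

# Sample variance: sum of squared distances from the mean, over n - 1
variance = sum((x - mean) ** 2 for x in data) / (n - 1)
sd = math.sqrt(variance)   # standard deviation: same units as the data
se = sd / math.sqrt(n)     # standard error of the mean
```

For these four points the mean is 5 and the variance is 20/3 (about 6.67); the library function statistics.variance(data) returns the same n − 1 (sample) variance.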

STATISTICAL TESTS

The Null Hypothesis and Statistical Error

Statistics are used most commonly to determine if relationships or correlations observed between experimental groups are indicative of an actual trend (i.e., if they can be applied to the population at large). Simply because a difference is found between two samples does not mean that this difference is reflected in the population at large. It is the fundamental assumption of all statistical tests that the groups being compared are not different or are not associated. This is the null hypothesis. The statistical test calculates the probability that this assumption is true (i.e., the p value). At some predetermined low probability that the null hypothesis is true, designated the level of significance, the null hypothesis is rejected and a statistically significant difference is established. The level of significance, designated α, is commonly set at 5% (p < 0.05), but it is sometimes lower, such as 1% (p < 0.01). The label of statistical significance is often applied when the null hypothesis is rejected. However, it is important to remember that statistical significance is completely independent of clinical or scientific relevance.

Furthermore, the demonstration of statistical significance does not necessarily mean that the observed difference is real (i.e., applicable to the population at large). The p value is the probability of obtaining the observed difference, or one more extreme, if the null hypothesis were true. For this reason, it has become standard to report exact p values as opposed to simply noting p < 0.05. A p value of 0.001 indicates that a 1 in 1,000 chance exists that the test will demonstrate a statistically significant difference when in fact no difference actually exists; a p value of 0.05 means that a 1 in 20 chance exists of this error occurring. This kind of error is called a type I error. It occurs when the null hypothesis is falsely rejected (i.e., a false-positive statistical result). The lower the p value threshold for statistical significance, the less likely it is that a type I error will occur.

The converse of the type I error is the type II error. This occurs when the null hypothesis is not rejected, but in reality, in the population at large, a difference actually exists (i.e., a false-negative statistical result). Type II errors are much more common than type I errors, and they usually occur in studies with small sample sizes. It is particularly important to be cognizant of the possibility of a type II error when presented with clinically significant differences in the absence of statistical significance. This is particularly concerning when a small study finds a considerable clinical difference, for example a 50% greater incidence of a particular complication, but because the p value does not reach the level of significance, the researchers claim that there is no difference in morbidity. The correct conclusion is that such a study failed to demonstrate a difference, not that no difference exists.

Power

The power of a study refers to its ability to detect a real difference. Given that the probability of a type II error is designated β, power is 1 − β, or the probability of not making a type II error. Power is proportional to the sample size, the magnitude of the actual difference, and α. Power is inversely proportional to the variance of the data. Using the power equation, one can determine the sample size required to achieve a particular level of power. Post-hoc power analyses are frequently performed to show the power, or lack thereof, of a study (i.e., to explain why a study failed to detect statistically significant differences). The power of a test should be at least 80%.

A sample size calculation should be performed before any study is initiated to ensure sufficient statistical power, so that money and effort are not wasted on a study destined to be inconclusive. The equations for sample size calculations are essentially algebraic rearrangements of the appropriate statistical test for the data to be collected. The equation is then solved for n. The sample size calculation requires the presumed mean and variance of the variables yet to be studied. These values must be provided based on a combination of pilot data, data from the literature, and clinical judgment.
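As a sketch of how such a calculation works, the following Python function implements the standard normal-approximation formula for a two-group comparison of means. The clinical numbers are hypothetical, and a statistical consultant would typically use exact t-based software instead:

```python
import math
from statistics import NormalDist

def two_sample_n(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided comparison of
    two means (normal approximation to the two-sample t test)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the alpha level
    z_beta = z.inv_cdf(power)           # critical value for power = 1 - beta
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)

# Hypothetical planning values: detect a 5 mm Hg difference in mean
# blood pressure with SD 10 mm Hg, alpha = 0.05, and 80% power.
n_per_group = two_sample_n(delta=5, sigma=10)  # 63 patients per group
```

Note how the required n grows as the presumed difference shrinks or the presumed variance grows, exactly as the power relationships above predict.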

Choosing a Statistical Test

When comparing groups of data, several basic questions are asked to arrive at the appropriate statistical test. First, what type of data are being analyzed: continuous or categorical? Second, how many groups or samples are being compared? Third, are the data normally distributed? Finally, are the outcome variables related to the time to the event (i.e., survival data)? The following sections describe the most commonly used tests for data sets categorized by the questions above.

Comparison of Categorical Variables Between Two or More Groups

For categorical variables compared between two or more groups, the Pearson chi-square test and Fisher's exact test are most commonly used. For these tests, categorical data are arranged in a contingency table [R × C (row × column)], the simplest of which is a 2 × 2 table with one dichotomous dependent variable and one dichotomous independent variable. However, both the chi-square test and Fisher's exact test are still applicable when more than two groups and more than two outcomes exist. The null hypothesis is that the proportions of observed outcomes are the same between all groups; in other words, that no difference in the frequency of each outcome exists between groups. The test statistic for the chi-square test is derived from an equation that compares the observed frequencies with the expected frequencies. Expected frequencies are the frequencies that one would find if the null hypothesis were perfectly true, that is, if no difference actually existed between the groups. The chi-square statistic, χ2, has an associated probability density function; the area under the curve bounded by the particular χ2 represents the probability (p value) that the null hypothesis is true (Fig. 96-1). The assumptions for the chi-square test are (a) that the data represent independent random variables and (b) that each cell of the contingency table has a sufficiently large number of data points. The minimum number is not defined, but a conservative number is at least 10 observations per cell. Below this number, the test becomes inaccurate, and Fisher's exact test must be used.

Fig. 96-1. In the chi-square test, the test statistic (T) is calculated in an equation that incorporates the observed frequencies and the expected frequencies that would appear in the contingency table if the null hypothesis were perfectly true. The exact p value is given by the area to the right of T under the associated chi-square probability density function. If this area is less than the predetermined level of statistical significance (i.e., p < 0.05), then the null hypothesis is rejected and the observed differences in frequency are likely to be real. From Rosner BA: Fundamentals of Biostatistics. 2nd Ed. Boston: PWS Publishers, 1986, p. 313. With permission.

Fisher's exact test is also used to test for differences in categorical data. The test is based on the hypergeometric distribution and calculates the exact probability of all possible distributions of the data in the R × C table that are as extreme as or more extreme than the observed distribution. If the total probability of these extreme distributions is less than the level of significance, then the null hypothesis is rejected and the observed proportions are declared statistically different. Like the chi-square test, Fisher's exact test can be used for an R × C table of any size. Unlike the chi-square test, however, Fisher's exact test can be used with cells containing any number of data points, including no data points.
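Both tests are available in standard software. The following sketch applies them to a hypothetical 2 × 2 table using scipy; the counts are invented for illustration:

```python
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical 2 x 2 table: rows = drug vs. placebo,
# columns = remission vs. no remission
table = [[30, 10],
         [18, 22]]

chi2, p_chi2, dof, expected = chi2_contingency(table)  # Yates-corrected
odds_ratio, p_fisher = fisher_exact(table)
```

For a 2 × 2 table there is 1 degree of freedom, and the expected counts (24 and 16 in each row here) are what the null hypothesis predicts. Because every expected cell exceeds 10, the chi-square test is appropriate, and Fisher's exact test returns a similar p value.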

Both the chi-square test and Fisher's exact test can determine if a statistical association exists between categorical dependent and independent variables, but the direction of this association is determined from the raw data, not the test statistic. For example, if there were more patients with tumor remissions in the drug group than in the placebo group, and the test statistic had a p value of less than 0.05, then one would conclude that the drug was statistically associated with increased remissions. However, if there are three groups, for example high-dose drug, low-dose drug, and placebo, and the proportion of remissions is somewhat higher in the low-dose drug group than in the placebo group and higher in the high-dose drug group than in the low-dose drug group, a significant chi-square test or Fisher's exact test for this 3 × 2 table would not distinguish which groups are actually significantly different. If one were to subsequently test each group against the others separately, the level of significance would need to be adjusted for multiple comparisons.

Comparing Categorical Data while Controlling for Other Variables

Most studies with multiple variables rely on regression analyses to sort out associations and control for other variables simultaneously. However, when a single confounding variable needs to be controlled for in a data set of categorical variables, the Mantel–Haenszel test splits, weights, and combines the data, thus controlling for that variable. For example, if gender affected remission rates and the proportion of males and females was not identical between drug and placebo groups, then a statistical analysis of the association between drug and remission must account for gender. The Mantel–Haenszel test of association accomplishes this where the chi-square test and Fisher's exact test cannot. Of note, the Mantel–Haenszel name can be confusing because it is attached to several different statistical methods. Mantel–Haenszel is also the eponym applied to the log rank test used to test for association in Kaplan–Meier survival curves.

Comparison of Two Groups of Continuous Data

The most commonly used test for the comparison of two groups of continuous data is Student's t test. This test is predicated on the assumption that the continuous data being compared follow a normal distribution. It functions by comparing the degree of overlap between two normally shaped curves, each representing a single group's distribution of data. The overlap area determines the probability that the two samples are part of one large curve (i.e., the p value). If the curves have little overlap, then the null hypothesis is rejected and the two distributions are considered separate. The test statistic (T) is derived from an equation based on the between-group differences in mean, variance, and sample size. T defines an area under the t probability distribution. The t distribution has two tails, and it is considered standard practice to perform a two-tailed test, which means the p value is the sum of the area under the t distribution greater than T and less than −T (Fig. 96-2). The rationale is that, when testing the two means, it is impossible to know in advance whether one group will have a higher or lower mean than the other. Using a two-tailed Student's t test is more conservative, lowers α, and thus decreases the probability of a type I error.

When the two samples being tested are not independent, but are related based on subject matching or are the same subject assessed at two points in time, a paired Student's t test is appropriate. The paired Student's t test statistic does not compare group means. Instead, it compares the mean of the differences in the actual paired values. The T statistic generated defines an area under the t probability distribution as in the two independent sample Student's t test.
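A minimal sketch of both forms of the test, with hypothetical FEV1 values in liters; scipy performs a two-tailed test by default:

```python
from scipy.stats import ttest_ind, ttest_rel

# Hypothetical FEV1 (L) in two independent groups of patients
group_a = [2.1, 2.4, 2.3, 2.6, 2.2]
group_b = [1.8, 1.9, 2.0, 1.7, 2.1]
t_stat, p_value = ttest_ind(group_a, group_b)  # independent samples

# Hypothetical paired design: the same five patients before and after
before = [2.1, 2.4, 2.3, 2.6, 2.2]
after = [2.3, 2.5, 2.5, 2.8, 2.3]
t_paired, p_paired = ttest_rel(before, after)  # tests mean of differences
```

The paired test gains power from the within-subject design: it tests whether the mean of the individual differences departs from zero rather than comparing the two group means.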

The determination of whether or not a data set is normally distributed is important because many commonly used statistical analyses are valid only for normally distributed data. Such tests, of which Student's t test is one, are referred to as parametric tests. Normality can be assessed with a statistical test such as the Shapiro–Wilk test or the Kolmogorov–Smirnov test. If the data are not normally distributed, a nonparametric statistical test is required.
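For example, the Shapiro–Wilk test as implemented in scipy (the sample is hypothetical):

```python
from scipy.stats import shapiro

# Hypothetical FEV1 sample; the null hypothesis of the Shapiro-Wilk
# test is that the data come from a normal distribution, so a large
# p value (> 0.05) means normality is not rejected and a parametric
# test may reasonably be used.
sample = [2.1, 2.4, 2.3, 2.6, 2.2, 2.5, 2.0, 2.7, 2.3, 2.4]
w_stat, p_norm = shapiro(sample)
```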

Fig. 96-2. In the Student's t test, the test statistic (T) is calculated in an equation that incorporates the means, variances, and sizes of the two samples. The exact p value is given by the area to the right of T and to the left of −T under the associated t distribution in a two-tailed Student's t test. If the total area is less than the predetermined level of statistical significance (i.e., p < 0.05), then the null hypothesis is rejected, and the observed difference in means is likely to be real. From Rosner BA: Fundamentals of Biostatistics. 2nd Ed. Boston: PWS Publishers, 1986, p. 260. With permission.

For comparison of two groups of continuous data that are not normally distributed, the nonparametric Wilcoxon rank sum test or the Mann–Whitney U test must be used. This situation is most commonly encountered when the sample sizes are small. The Wilcoxon rank sum test is analogous to Student's t test but compares the medians, rather than the means, of the two groups. Data from both groups are ordered sequentially and assigned ordinal ranks (i.e., 1st, 2nd, 3rd, and so on). The ranks of each group are summed and compared with the total expected if the groups had similar medians (i.e., an equal sum of ranks). The test statistic is based on the difference between the observed and expected rank sums and defines an area below the standard normal curve that represents the probability that the two samples are similar.

The Mann–Whitney U test differs from the Wilcoxon rank sum test only in how the test statistic is generated. The same p value is obtained with either test. They are essentially equivalent two-sample nonparametric tests.

The nonparametric equivalent of the paired t test is the Wilcoxon signed rank test. The absolute differences between pairs are ranked in order and summed, but not until each rank is assigned a positive or negative value based on the direction of the difference in the corresponding pair. If no association exists, the positive and negative ranks should cancel each other and the sum should be 0. A test statistic is generated in a manner similar to that used for the Wilcoxon rank sum test.
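The two-sample and paired nonparametric tests are sketched below on small hypothetical samples; scipy reports the Mann–Whitney U and Wilcoxon statistics directly:

```python
from scipy.stats import mannwhitneyu, wilcoxon

# Hypothetical small samples, too small to assume normality
small_a = [3.1, 2.7, 3.5, 2.9, 4.0]
small_b = [2.0, 2.2, 2.5, 1.9, 2.4]
u_stat, p_u = mannwhitneyu(small_a, small_b, alternative="two-sided")

# Hypothetical paired differences for the signed rank test; under the
# null hypothesis, positive and negative differences would cancel.
diffs = [0.2, 0.1, 0.3, 0.25, 0.15, 0.22, 0.18, 0.12]
w_stat, p_w = wilcoxon(diffs)
```

In this example every value in small_a exceeds every value in small_b, so U takes its maximum (n1 × n2 = 25) and the two-sided p value is small despite the tiny samples.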

Comparison of Three or More Groups of Continuous Data

When more than two groups of continuous data exist, the appropriate statistical test is an analysis of variance (ANOVA). Whereas Student's t test compares the means of two groups, ANOVA compares the variances of multiple groups. ANOVA is based on the idea that a total variance exists that can be calculated for all the data in all the groups. This total variance is made up of the variance in the data from one group to the next group and the variance in the data within each group. For example, if blood pressure is being compared between a control group, a meditation group, and a β-blocker group, variation in the blood pressure within the meditation group exists, and variation in the blood pressure between the meditation group and the control and β-blocker groups exists. The ANOVA test statistic compares how much of the total variance is accounted for by within-group variance with how much is accounted for by between-group variance. The null hypothesis is that no association exists between the groups and the outcome (i.e., that neither meditation nor β-blockers affect blood pressure). If the null hypothesis is correct, the total variance should be explained by within-group variance, and little between-group variance should exist. The ratio of between-group variance to within-group variance is the basis for the ANOVA test statistic, F. A large enough F value leads to rejection of the null hypothesis. This means that an association exists between the study groups and the outcome. However, the F statistic does not indicate which groups differ from each other, only that significant differences exist between the groups.

Depending on the types of data being measured, there are several types of ANOVA. The previous example is a one-way ANOVA because one continuous variable, blood pressure, was compared between several groups. Two-way and multiway ANOVAs account for several covariates simultaneously. For example, one could ask whether gender and treatment (control, meditation, β-blockers) affected the blood pressure. The two-way ANOVA is superior to two consecutive one-way ANOVA tests (i.e., treatment and blood pressure, and then gender and blood pressure) because a two-way ANOVA detects interactions between the independent variables, treatment and gender in this example, by including an interaction term.

For ANOVA to be valid, certain conditions must be met. First, the data must follow a normal distribution. Second, there must be a common variance for all groups. Third, each observation must be independent. It is important to note that data that are not independent can be used in an ANOVA under certain circumstances, provided that the lack of independence is accounted for in the analysis. When multiple measurements are made on a population over time, as in any longitudinal study, the simple fact that repeated observations are made on a single subject renders those observations nonindependent. One can account for such a design by including a subject-by-time interaction variable in the analysis. Finally, ANOVA requires that one account for unbalanced data (i.e., unequal numbers of observations in each group).

In all ANOVA designs, a difference may be found among the group means, but each mean must then be compared with each other mean to find the specific differences. In order to find the individual groups that are different, multiple individual comparisons can be made. The problem with multiple comparisons is that the level of significance must be adjusted to avoid a type I error. That is, if one begins to compare each separate group to the others, the chance of finding a difference that is purely random increases. Therefore, the level of significance for all the comparisons must be adjusted downward so that the total is approximately 0.05. The most common correction for multiple comparisons is the Bonferroni method. The corrected level of significance is designated α′, where α′ = α divided by the number of comparisons being made. Other methodologies for determining α′ include the Tukey and the Scheffé procedures, which involve slightly more complex downward adjustments of α to compensate for multiple comparisons.
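A sketch of Bonferroni-corrected pairwise comparisons after a significant overall result; the blood pressure data and group names are hypothetical:

```python
from itertools import combinations
from scipy.stats import ttest_ind

groups = {
    "control":      [140, 138, 145, 142, 139, 144],
    "meditation":   [135, 132, 136, 134, 133, 137],
    "beta_blocker": [128, 126, 130, 127, 129, 125],
}

pairs = list(combinations(groups, 2))  # 3 pairwise comparisons
alpha_corrected = 0.05 / len(pairs)    # Bonferroni: alpha' = alpha / k

p_values = {}
for a, b in pairs:
    t, p = ttest_ind(groups[a], groups[b])
    p_values[(a, b)] = p               # judge against alpha', not 0.05
```

With three groups there are three comparisons, so each pairwise p value must fall below roughly 0.0167 rather than 0.05 to be declared significant.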

Because ANOVA relies on the assumption of normally distributed data, it is a parametric test. A nonparametric equivalent of ANOVA, which is a generalization of the Wilcoxon rank sum test, is called the Kruskal–Wallis test.
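Both the parametric and nonparametric versions are shown below on hypothetical blood pressure data (three groups, one-way design):

```python
from scipy.stats import f_oneway, kruskal

# Hypothetical systolic blood pressures (mm Hg) in three groups
control = [140, 138, 145, 142, 139]
meditation = [135, 132, 136, 134, 133]
beta_blocker = [128, 126, 130, 127, 129]

f_stat, p_anova = f_oneway(control, meditation, beta_blocker)  # parametric
h_stat, p_kw = kruskal(control, meditation, beta_blocker)      # nonparametric
```

A significant p value here says only that some difference exists among the three groups; identifying which pairs differ requires the multiple-comparison procedures described above.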

Fig. 96-3. Linear correlation example. The relationship between the baseline low-density lipoprotein (LDL) cholesterol level (x axis) and the 5 year LDL cholesterol level (y axis) in this clinical study may be expressed by the equation y = 0.455x + 19.2. The linear correlation of these data is statistically significant at the p < 0.001 level with a Pearson correlation coefficient (r) of 0.599. From Campos CT, Matts JP, Santilli SM, et al: Predictors of total and low-density lipoprotein cholesterol change after partial ileal bypass. Am J Surg 155:138, 1987. With permission.

To test for an association between two continuous variables, a linear correlation analysis is performed (Fig. 96-3). The Pearson correlation coefficient, r, estimates the linear relationship between two variables, although no distinction is made between the variables in terms of one being the explanatory variable and the other being the response variable. If r = 0, no linear relationship exists. If r < 0, a negative linear relationship exists. If r > 0, a positive linear relationship exists. The correlation coefficient refers only to the sample data. To test whether the sample data provide evidence that a relationship exists in the population from which they were taken, a test of association based on the t distribution is performed. The Pearson correlation coefficient is a parametric test. If the data are not normally distributed, the nonparametric Spearman's rank correlation coefficient is required. The most significant shortcoming of correlation analysis is that two variables may have a strong and important nonlinear relationship that is overlooked when a linear correlation is not found.
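Both coefficients are illustrated below on hypothetical LDL-style data; Spearman's version is simply Pearson's r computed on the ranks:

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical baseline and 5-year LDL cholesterol values (mg/dL)
baseline = [150, 160, 170, 180, 190, 200, 210]
year5 = [88, 92, 95, 101, 105, 110, 115]

r, p_pearson = pearsonr(baseline, year5)      # parametric, linear
rho, p_spearman = spearmanr(baseline, year5)  # nonparametric, rank-based
```

Because the hypothetical 5-year values rise monotonically with baseline, Spearman's rho is essentially 1, while Pearson's r is slightly below 1 because the relationship is not perfectly linear.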

Multivariate Analysis with a Continuous Dependent Variable

The association of both categorical and continuous independent variables with a single continuous dependent (outcome) variable can be modeled as a linear relationship using linear regression analysis. The data must be normally distributed and independent, and the variance of the outcome variable must be constant across the entire range of the independent variables. Moreover, the anticipated relationship must be linear. The line of best fit is determined by least squares analysis. The model can be calculated for more than one independent variable. These variables may be either continuous or dichotomous. The term dummy variable refers to such a dichotomous independent variable.

The linear regression model follows the equation:

Y = β0 + β1X1 + β2X2 + … + βnXn + ε

where Y is the dependent variable, the Xn are the independent variables, the βn are the coefficients of regression, and ε is the error term. The coefficients of regression represent the slope of the line for each variable and thus convey how much the outcome variable changes given a unit change in each independent variable. In the case of a dichotomous variable, the change is all or none because the X term is equal to either 0 or 1. All variables entered into a linear regression model have a coefficient of regression, but that does not mean that they are statistically significant. A test of significance based on the t distribution is performed for each variable. Multivariate linear regression allows one to determine an independent variable's coefficient of regression and its significance while controlling for all other variables in the model. In this manner, effect modification and confounding can be controlled for using this powerful statistical tool.

The relative ease of performing multivariate linear regression on current statistical software packages has led to an increase in the use of this statistical method without adequate knowledge of how to build an appropriate model. Choosing the variables to enter into a multivariate linear regression model is not an exact science. Usually, a univariate test is performed on each independent variable, and those below some predetermined level of statistical significance are entered into the model. This level is usually much higher than the typical level of significance (i.e., 0.10 to 0.20), to be sure that these variables are controlled for by the multivariate analysis.

However, choosing variables to enter into a multivariate linear regression model does not end with univariate analysis. Sometimes important variables such as age or gender are correctly forced into the model, even if they fail to demonstrate statistically significant effects by univariate analysis. This is done so that they also may be controlled for by the regression model. Clinical judgment also plays a significant role in the selection process. Commonly associated variables noted in the literature may be included in the model even if they do not meet the standard for inclusion during the preliminary univariate analysis. Finally, it is important to remember that many variables are not initially significant on univariate analysis because they are not linearly related. Careful analysis of the data in graphic form may make such a nonlinear relationship obvious. Through transformation of the data (e.g., squaring the data, taking the natural log, taking the square root) or other such manipulation, the relationship may become linear, thus allowing the variable to be accounted for in the regression model.

The important consideration in multivariate regression model design is how to enter the data. There are step-forward designs, step-backward designs, and combinations of the two. Each of these methods may lead to a different, though not necessarily invalid, model. As each variable is added to a model (step forward) or removed (step backward), the other variables' places in the model change. When an item is removed because it is not statistically significant, another variable may be elevated to statistical significance, or alternatively, a previously significant independent variable may be rendered statistically insignificant.

The final aspect of multivariate linear regression is the assessment of the goodness of fit of the model. Residual plots should be examined to assess fit. A calculation of goodness of fit, termed the coefficient of determination or the R2 value, must be performed. This value assesses how much of the variability in the outcome variable is explained by the model. A model with a number of significant independent predictors may have a poor fit and an R2 value of 0.07. This means that the model only explains 7% of the variation in Y, and is therefore not very helpful. No strict guidelines exist for what an acceptable R2 should be, but it is important to report this number whenever presenting data from a multivariate linear regression analysis.
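The mechanics can be sketched with simulated data, where the true coefficients are known and R2 can be computed directly from the residuals (numpy only; the data and coefficients are invented):

```python
import numpy as np

# Simulated data: outcome y built from one continuous predictor x1
# and one dummy (0/1) predictor x2, plus random noise.
rng = np.random.default_rng(0)
n = 50
x1 = rng.normal(50, 10, n)
x2 = rng.integers(0, 2, n)          # dummy variable
y = 2.0 + 0.5 * x1 - 3.0 * x2 + rng.normal(0, 2, n)

X = np.column_stack([np.ones(n), x1, x2])    # design matrix with intercept
beta, *_ = np.linalg.lstsq(X, y, rcond=None)  # least squares fit

# Coefficient of determination R^2: share of the variance in y
# that the fitted model explains.
y_hat = X @ beta
ss_res = np.sum((y - y_hat) ** 2)
ss_tot = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - ss_res / ss_tot
```

With these simulated data most of the variance in y is explained, so R2 is high; a model can, however, have statistically significant predictors and still explain very little of the variance, which is why R2 should always be reported.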

Multivariate Analysis with a Dichotomous Dependent Variable

Like linear regression, logistic regression is useful because it allows one to simultaneously model the relationship of several independent variables with a single outcome variable. One uses logistic regression instead of linear regression when the outcome variable is dichotomous rather than continuous (e.g., dead or alive). The model is based on fitting a linear relationship to the natural log of the odds of the outcome variable by a method termed maximum likelihood estimation. The odds of a dichotomous variable are the probability of one outcome divided by the probability of the opposite outcome. Thus, logistic regression can be expressed by the equation:

ln[p / (1 − p)] = β0 + β1X1 + β2X2 + … + βnXn + ε

where p is the probability of the outcome variable being positive, the Xn are the independent variables, the βn are the coefficients, and ε is the error term.

Building a logistic regression model and assessing the model's goodness of fit present many of the same problems as in a linear regression analysis, but the interpretation of the results is quite different. Whereas in linear regression βn represented the linear change in the outcome variable for a unit change in the independent variable, in logistic regression βn represents the natural log of the odds ratio of the outcome variable for each unit change in the independent variable. By exponentiating both sides of the equation, e^βn becomes the odds ratio of the outcome variable for each unit change in the independent variable. It is important to remember that a seemingly small odds ratio, for instance 1.005, may be attached to a small unit of the independent variable (i.e., minutes or grams); when the unit is converted to hours or kilograms, the associated odds ratio can become quite significant clinically.
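The arithmetic behind this caution is simple compounding. In this hypothetical example, a model fitted with operative time in minutes yields an odds ratio of 1.005 per minute, which is rescaled to a per-hour odds ratio:

```python
import math

# Hypothetical fitted result: odds ratio of 1.005 per additional minute.
or_per_minute = 1.005

# Odds ratios compound multiplicatively with the unit of the covariate,
# so one hour is sixty one-minute steps:
or_per_hour = or_per_minute ** 60

# Equivalently, rescale the coefficient on the log-odds scale:
same = math.exp(60 * math.log(or_per_minute))

print(round(or_per_hour, 2))  # roughly a 35% increase in odds per hour
```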

Analysis of Survival or Time-to-Event Data

The term survival analysis is misleading because this statistical tool is not limited to survival data; it is useful for all data that involve the time until an event occurs, such as time until tumor recurrence or time until hospital discharge. Although these data are continuous, Student's t tests and ANOVA are not appropriate because they do not account for the time-dependent nature of the data. That is, the group of subjects present at the beginning of the study is progressively depleted over the course of follow-up as the event occurs. Once a subject dies, for example, he or she is no longer in the risk pool for death. The probability of the event occurring at any given point in time is termed the hazard rate, and for this reason survival analysis is sometimes described in terms of a hazard function.

The two most common types of survival analyses are the Kaplan-Meier method and Cox regression. The Kaplan-Meier analysis is a nonparametric method in which the survival probability is recalculated at each point at which an event occurs. Only the subjects who remain at risk for the event are included in the denominator; thus, patients who have been lost to follow-up and those who have already experienced the event are excluded from the probability calculation at that time point. The resulting graph looks like a step function (Fig. 96-4). Two or more study groups can be compared to look for a difference in the survival or hazard curves. The test statistic is the Mantel-Haenszel log-rank test, which is essentially a weighted average of a series of chi-square tests performed at each time point. The weighting incorporated into this analysis gives the earlier time points more influence because of the larger number of subjects at risk before censoring.
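The product-limit calculation described above can be sketched directly. This is a toy implementation with invented data; ties are resolved by counting events before censorings at the same time point, one common convention:

```python
def kaplan_meier(durations, events):
    # Product-limit survival estimate. durations: follow-up time for each
    # subject; events: 1 if the event occurred, 0 if the subject was censored.
    n = len(durations)
    # Process subjects in time order; at tied times, events come first.
    order = sorted(range(n), key=lambda i: (durations[i], -events[i]))
    at_risk, surv, curve = n, 1.0, []
    i = 0
    while i < n:
        t = durations[order[i]]
        deaths = censored = 0
        while i < n and durations[order[i]] == t:
            if events[order[i]]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            # The curve steps down only at event times; censored subjects
            # simply leave the denominator for later time points.
            surv *= (at_risk - deaths) / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve
```

With five hypothetical subjects followed for [2, 3, 3, 5, 7] months and events [1, 1, 0, 1, 0], the curve steps to 0.8, 0.6, and 0.3 at months 2, 3, and 5, and the censored subjects at months 3 and 7 produce no step.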

Cox regression is another survival method, one that is similar to logistic regression. Instead of using the natural log of the odds as in logistic regression, Cox regression uses the natural log of the hazard function, set up as a linear function that includes time in the equation:

ln h(t) = ln h0(t) + β1X1 + β2X2 + … + βnXn

Where h(t) is the hazard function at time t, Xn are the independent variables, βn are the coefficients, and t is time. In this model, ln h0(t) does not represent the intercept, but rather the log of the baseline hazard function, that is, the hazard when all of the independent variables equal zero. Cox regression analysis allows for simultaneous assessment of several covariates, as in multivariate linear regression and multivariate logistic regression. The interpretation of the coefficients βn is similar to logistic regression: each equals the increase in the natural log of the hazard ratio for a unit increase in the independent variable. The hazard ratio is the ratio of the adjusted hazard at any given time to the baseline hazard at that time. By exponentiating both sides of the equation, e^βn becomes the hazard ratio of the outcome variable for each unit change in the independent variable.
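The proportional-hazards relationship and the coefficient-to-hazard-ratio conversion can be sketched in a few lines; the function names and the coefficient value are illustrative only:

```python
import math

def adjusted_hazard(h0_t, betas, xs):
    # Proportional-hazards form: h(t) = h0(t) * exp(sum of beta_i * x_i),
    # i.e., covariates scale the baseline hazard multiplicatively.
    return h0_t * math.exp(sum(b * x for b, x in zip(betas, xs)))

def hazard_ratio(beta, delta=1.0):
    # Hazard ratio associated with a delta-unit increase in one covariate.
    return math.exp(beta * delta)
```

Because the baseline hazard cancels when two subjects are compared, the ratio of hazards for subjects differing by one unit of a covariate is e^β regardless of time, which is the proportional-hazards assumption.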

Fig. 96-4. Kaplan-Meier analysis example. The proportion of patients surviving (y axis) at each time interval (x axis) is calculated using the total number of patients at risk at each time interval as the denominator. The resulting survival curve appears as a step function. The two survival curves in this clinical study are tested using a Mantel-Haenszel log-rank test and are significantly different at the p < 0.0001 level of statistical significance. From Sugarbaker DJ, Flores RM, Jaklitsch MT, et al: Resection margins, extrapleural nodal status, and cell type determine postoperative long-term survival in trimodality therapy of malignant pleural mesothelioma: results in 183 patients. J Thorac Cardiovasc Surg 117:54, 1999. With permission.

Reading References

Hosmer DW, Lemeshow S: Applied Logistic Regression. New York: John Wiley & Sons, 1989.

Kleinbaum DG, et al: Applied Regression Analysis and Other Multivariable Methods. 3rd Ed. Pacific Grove, CA: Duxbury Press, 1998.

Pagano M, Gauvreau K: Principles of Biostatistics. Belmont, CA: Duxbury Press, 1993.

Rosner B: Fundamentals of Biostatistics. 4th Ed. Belmont, CA: Duxbury Press, 1995.


