
Editors: Shields, Thomas W.; LoCicero, Joseph; Ponn, Ronald B.; Rusch, Valerie W.

Title: General Thoracic Surgery, 6th Edition

Copyright 2005 Lippincott Williams & Wilkins


Chapter 97

Clinical Trial Design

John J. Crowley

For thoracic surgery as well as for any other discipline of medicine, the best path toward increased knowledge and better outcomes for patients is through carefully designed and conducted clinical trials. Trials are often categorized as phases. Phase I is typically the first experience in humans with a drug, device, or procedure. An important objective in a phase I trial is the establishment of safety. In cancer clinical trials of new drugs, the objective is often to find the highest dose that does not cause excessive toxicity, as this will likely be the most effective dose. The objectives in phase II trials often involve further safety considerations as well as the establishment of sufficient efficacy that the drug, device, procedure, or regimen deserves further study. The usual objective in a phase III trial is to compare one treatment approach with another, often a new, experimental approach to a standard regimen. Randomization is used in phase III trials to ensure comparability of patients across treatments.

The most important elements in any clinical trial are a clear statement of objectives; careful definition of eligibility, treatments, and end points; statistical considerations consistent with the objectives; careful attention to quality control at all levels, including data item definitions, data collection procedures, training for and review of treatment delivery, data management, and statistical analysis; and a clear reporting of results.

STATISTICAL BACKGROUND

End point data can be classified as being either categorical (qualitative) or measurement (quantitative). Categorical data result from end points that can be classified according to one of several mutually exclusive categories based on a predetermined set of criteria. Examples include surgical mortality (yes/no) and response to treatment (may be yes/no or several categories). Measurement data result from end points that are measured quantities. An important special case of measurement data is time to event data (or survival data), for example, the time from entry on a study until death. A characteristic of survival data is the presence of censoring arising due to the fact that not all patients have experienced the event of interest by the time the study is completed and analyses are performed. For such censored patients we know that the time from entry until the event of interest is at least as long as the time from the patient's entry on the study to the time of the analysis, but we do not know the actual event time. Statistical techniques that have been developed to incorporate these censored observations into the analysis are generally referred to as survival analysis.

The essence of the statistical argument is to make inferences from the patients at hand, the sample, to the universe of all patients, the population. The key concept is that outcomes observed in a sample of patients vary from one sample to the next, so that such observations can only be viewed as estimates of what would be observed in the population. To take the simplest case of a dichotomous outcome such as tumor response or surgical mortality (yes/no), the probability of all possible outcomes with a sample of size N can be determined once one knows the chance p of a success for each patient: this is known as the binomial distribution. With r successes in a sample of N patients, the observed success rate r/N (often denoted $\hat{p}$) estimates (but does not determine) the success rate p in the population. Using the properties of the binomial distribution (or an approximation using the normal distribution, or bell-shaped curve), the statistician can derive not just the point estimate $\hat{p}$ but also an interval estimate or confidence interval, which has the interpretation of expressing the likely or credible range of possibilities for the true rate p based on the observed rate $\hat{p}$. For example, an approximate confidence interval for p is of the form

$$\hat{p} \pm Z \sqrt{\hat{p}(1 - \hat{p})/N}.$$

Here Z is taken from tables of the normal distribution, and has the value 1.96 for a 95% confidence interval (in repeated experiments, the population rate p will be contained in the 95% confidence interval 95% of the time).
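As an illustration, the normal-approximation interval (observed rate plus or minus Z times the square root of rate times one minus rate over N) can be sketched in a few lines of Python. The function name and the example counts (12 responses among 40 patients) are hypothetical and chosen only for illustration; exact binomial intervals are preferable when N is small or the rate is near 0 or 1.

```python
from math import sqrt

def wald_interval(r, n, z=1.96):
    """Approximate confidence interval for a binomial rate (normal approximation)."""
    p_hat = r / n                                   # point estimate r/N
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)  # Z * sqrt(p(1-p)/N)
    # A rate cannot fall outside [0, 1], so clip the limits.
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)

# Hypothetical example: 12 responses among 40 patients.
low, high = wald_interval(r=12, n=40)
print(f"observed rate {12 / 40:.2f}, approximate 95% CI ({low:.3f}, {high:.3f})")
```

With z = 1.96 this gives a 95% interval; other confidence levels only require a different value of z.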

Another key distribution is the exponential distribution, useful in describing survival outcomes. The probability of surviving to time t can be expressed by the survival curve S(t), the probability of surviving at least to time t. Note that S(0) = 1 and S(t) decreases toward 0 as t increases. The median survival time, the time past which one half of the patients are expected to live, is that time m for which S(m) = 0.5. Another quantity of interest is the hazard function or hazard rate, often denoted $\lambda(t)$. This function is the instantaneous rate of failure: loosely, the chance of dying at time t given that one is alive just before that time. For exponential distributions, the hazard function $\lambda(t)$ is a constant $\lambda$, and the survival distribution is given by $S(t) = \exp(-\lambda t)$, where exp is the exponential function. Under the assumption of exponential survival, the median survival m is $m = -\ln(0.5)/\lambda$, where ln is the natural logarithm. The assumption of a constant hazard function is particularly useful for deriving sample sizes for designing clinical trials. However, the estimation of S(t), both point estimates and confidence intervals, most often proceeds without this assumption, according to a technique described by Kaplan and Meier (1958).
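The relationship between a constant hazard and the median under exponential survival can be checked numerically. The sketch below assumes a constant hazard; the function names and the 1-year median are hypothetical, chosen only for illustration.

```python
from math import exp, log

def exponential_survival(t, hazard):
    """S(t) = exp(-hazard * t) for a constant hazard rate."""
    return exp(-hazard * t)

def median_from_hazard(hazard):
    """Median m solves S(m) = 0.5, so m = -ln(0.5)/hazard = ln(2)/hazard."""
    return log(2) / hazard

def hazard_from_median(median):
    """Inverse relationship, convenient when planning from a median."""
    return log(2) / median

# Hypothetical planning value: median survival of 1 year.
hazard = hazard_from_median(1.0)
for years in (0.5, 1.0, 2.0):
    print(f"S({years}) = {exponential_survival(years, hazard):.3f}")
```

Because the hazard is constant, each additional median-length interval halves the surviving fraction.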

Clinical trials are often designed using the statistical framework of hypothesis testing. In this framework a null hypothesis $H_0$, often the status quo, is posited, and the purpose of the trial is to reject that null hypothesis in favor of an alternative hypothesis $H_1$. For example, in the comparison of two treatments, an approach A and an approach B, with regard to the outcome operative mortality, the null hypothesis might be that the two mortality rates $p_A$ and $p_B$ are the same, or $H_0: p_A = p_B$. The alternative hypothesis could take one of two forms. If approach A is standard and approach B is experimental, the alternative hypothesis might be that the experimental approach is better than the standard. In terms of operative mortality rates, where lower is better, this is written as $H_1: p_B < p_A$. We will stay with the status quo unless the new approach proves to be better (the issue of whether the new approach is worse is not really of interest). This is known as a one-sided test. If approach A and approach B are two competing standards, we might be interested in seeing if one of the two is better, and if so, which one. This is a two-sided test and the corresponding alternative hypothesis is denoted $H_1: p_A \neq p_B$.

Since we do not have the whole population of patients but only a sample, we do not know the truth. We do a trial and decide between the null and alternative hypotheses. The consequences of this decision relative to the true values of $p_A$ and $p_B$ are summarized below:

Decision              H0 true             H1 true
Reject H0             Type I error        Correct decision
Do not reject H0      Correct decision    Type II error

The probability of rejecting the null hypothesis when it is true is called the type I error rate, or $\alpha$, or significance level, or false positive rate. The acceptable type I error rate is decided in the planning stages of the trial. The most common significance level for testing is 5%. A concept related to the significance level of a test is the p value, the probability under the null hypothesis of observing a statistic that is equal to or more extreme than the one actually observed. The smaller the p value the more doubt is cast on the null hypothesis (although it still might be true). If we reject the null hypothesis when the p value is less than 0.05, then 5% of the time we will make a type I error.

The probability of not rejecting the null hypothesis when, in fact, the alternative hypothesis is true is called the type II error rate, or $\beta$, or false negative rate. The quantity $1 - \beta$ is called the power. The power of a test is a function of the true difference, in this case between $p_A$ and $p_B$. The sample size for a comparative trial is calculated based on specifying fixed $p_A$ and $p_B$ and the power (commonly 80% or 90%) as well as the type I error rate. The sample size should be large enough to ensure high power for differences that are realistic and clinically meaningful.

GENERAL DESIGN CONSIDERATIONS

We discuss briefly the key elements of any clinical trial, which begin with the protocol document and carry through the conduct of the study.

Objectives

There should be a limited number of objectives, each carefully specified in terms of end points that are defined with clarity and can thus be measured without confusion. The statistician will often ask that one end point be considered the primary end point and the others secondary; this helps with the calculation of sample size and lends credibility when the results are reported.

Eligibility, Treatments, End Points

The eligibility criteria define the population to which the results of the trial can be safely generalized. For surgical trials the key criteria will often involve characterizing the patients for whom the surgical approaches are possible and likely to be of benefit. Care must be taken that the criteria can be applied by those screening and entering patients on the trial, which often means that staging criteria, nodal maps, and so forth should be included in the protocol or in an appendix. The criteria should involve factors known (or known in principle) at the time of entry on trial, not determined by events that occur later. There is a trade-off between criteria that are so narrow that the generalizability of the study is compromised, and those that are too broad, in which case the effectiveness of treatment may be masked by the inclusion of inappropriate patients with little chance of benefiting. This is a matter of judgment.

The treatment approaches also need to be carefully spelled out and appropriate to the questions being asked. As with eligibility criteria, specification of treatment should be understandable and unambiguous to those delivering the therapy.

End points should be suitable for the objectives, and should have clear definitions. An end point such as operative mortality needs to be defined (e.g., any deaths within 30 days of surgery). Survival is defined as the time from registration on study to time of death due to any cause, or last contact (the latter case yielding censored survival times). Using time to death due to disease is problematic because cause of death information is often unreliable; there are statistical issues with this end point as well. Progression-free survival (or disease-free survival for situations in which treatment removes all known disease) is defined as the time from registration to the first observation of disease progression or death due to any cause, or last contact. If a patient has not progressed or died, progression-free survival is censored at the time of last follow-up. Because this end point requires disease to be assessed, the assessment schedule should be the same for all treatment arms.

A common end point for phase II trials in oncology is tumor response, previously defined as a 50% decrease in bidimensionally measurable disease lasting 4 weeks. However, as reported by Therasse and colleagues (2000), more recently this end point has been defined in the Response Evaluation Criteria in Solid Tumors (RECIST) as a 30% decrease in unidimensionally measurable disease. Although this sounds like a simple, dichotomous outcome, there are problems in practice with multiple lesions, disease that is present but not measurable, measurement schedules that are not every 4 weeks, and so on. Thus, although tumor response may be a reasonable end point for phase II trials of efficacy, it is not generally appropriate as the primary outcome in a phase III comparative trial.

Side effects or toxicity of treatment will almost always be an end point, though often a secondary one. There is a degree of subjectivity to many of the common toxicity definitions, but there are at least standards such as the Common Toxicity Criteria developed by the U.S. National Cancer Institute. Version 3 of these criteria, due in 2003, will have better coverage of common surgical complications. Patient-reported quality of life is increasingly used as an end point in clinical trials, even a primary one. It is generally considered important to assess many facets of quality of life, such as physical and emotional functioning, and general and treatment-specific symptoms. There are many standard instruments in use, as described for example by Moinpour and co-workers (1989). There are also difficult statistical issues having to do with missing data (e.g., when patients are too sick to fill out the questionnaires). A good reference is Troxel and Moinpour (2001).

Statistical Considerations

The statistical considerations in a protocol should be consistent with the objectives and end points. Ordinarily the sample size will be driven by the primary objective. If the purpose is estimation, then the precision of estimation needs to be specified. If the aim of the study is comparative, then the significance level and the power for a specified difference need to be defined. Further considerations include rate of accrual of patients, power for comparison of any secondary end points, and some characterization of the analysis plan for all end points. For randomized trials, decisions that must be made include what stratification factors to use in the randomization, the specific randomization scheme to be used, and the timing of randomization.

Quality Control

Quality control pertains to all aspects of maintaining quality throughout the conduct of a clinical trial, including a carefully written protocol, data collection forms that will yield the information needed for analysis, data collection and management procedures, training in delivery of treatment and data collection, and central physician review of collected data, including operative and pathology reports. Quality control aspects unique to surgery trials include training on the protocol-specified surgical procedures, requiring experience with the required techniques (either by credentialing or as part of a preprotocol phase), and early centralized review of operative reports. An example (from somewhat below the thorax) is in the trial of extended lymph node dissection for gastric cancer, as reported by Bonenkamp and associates (1999). This was a study conducted in the Netherlands of limited (D1) dissection during surgery versus an extended (D2) dissection as practiced in Japan. All participating surgeons received a videotape and booklet, and were trained during a 4-month period by surgeons from Japan. All extended dissections during the conduct of the protocol were performed in the presence of eight specially trained Dutch surgeons. Regular meetings were held with all the supervising surgeons, and there was central review of operative and pathology reports, with feedback provided (the trial did not support the efficacy of the extended dissection procedure).

Reporting of Results

The report of a clinical trial should feature, in the results section, the definitive protocol-specified primary analysis of the primary end point. Secondary end points (especially those multivariate end points such as toxicity) should be reported in a more descriptive fashion. Other analyses, such as the comparison of treatment outcomes in subsets, and prognostic factor analyses, should be regarded as exploratory only. The primary analysis should include all eligible patients, without regard to whether there were deviations in treatment delivered, by what is known as the intent-to-treat principle (some statistical purists would include all patients, not just all eligible patients, but this would seem to make generalization to the appropriate population difficult).

PHASE I AND PHASE II TRIALS

Phase I trials have a limited role in thoracic surgery because they are generally used to find a safe dose (in oncology often the maximum tolerated dose, or MTD) of a single drug or combination regimen. For studies seeking the MTD, typically three to six patients are entered into the study at a particular dose level and are monitored for toxicity. Doses are escalated or deescalated depending on the toxicities observed. A good overview is provided by Storer (2001).

There are two common types of phase II trials, studies of new agents performed in order to assess whether there is promise of activity, and pilot studies conducted to assess the activity and feasibility of previously tested treatments but in new combinations and schedules. Standard phase II studies, of investigational new drugs (INDs), are usually based on tumor response rates and are formulated statistically as a test of the null hypothesis H0: p = pA versus the alternative hypothesis H1: p = pB, where p is the probability of response, pA is the probability that, if true, would mean that the agent was not worth studying further, and pB is the probability that, if true, would mean the agent is active and worth further study. An alternative end point to tumor response that might apply more generally would be 6-month or 1-year survival.

One standard approach to the design of phase II IND trials, as reported by Green and Dahlberg (1992), is to accrue patients in two stages, with one-sided significance level (probability of rejecting H0: p = pA when it is true) approximately 0.05 and power (probability of accepting H1: p = pB when it is true) approximately 0.9. A specified number of patients is targeted for a first stage of accrual, and when that target is approached, the study is closed temporarily while responses are assessed. The study is stopped early if the agent appears unpromising; otherwise, it is reopened to a second stage of accrual. The agent is declared promising only if H0 is rejected after the second stage of accrual. For example, if pA = 0.05 and pB = 0.20, then the design calls for an accrual of 20 patients in the initial stage, and stopping in favor of the null hypothesis that the agent is not promising if there are no responses (or no patients who survive 6 months, or a year, depending on the primary end point). One or more responses call for the accrual of an additional 20 patients in the second stage, and a decision in favor of the alternative hypothesis that the agent is promising if there are five or more responses out of the 40 patients. If pA = 0.10 and pB = 0.30, then 20 patients are accrued in the initial stage, with stopping for negative results if one or fewer patients respond. Otherwise 15 more patients are accrued, and the agent is declared promising if 8 or more of the 35 patients respond. Calculations for general pA and pB are available on the web at http://www.swogstat.org/stat/public/TwoStage/2stage1.htm. Various other two-stage (or more) phase II designs have been proposed, the most commonly used being due to Simon (1989), who minimizes the expected number of patients required, subject to specific constraints.
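The operating characteristics of a two-stage rule of this kind can be computed exactly from the binomial distribution. The following Python sketch evaluates only the simplified rule as worded above (stop after stage 1 if responses are at or below a cutoff, otherwise declare the agent promising if the total reaches a threshold). It is an illustration rather than the published Green and Dahlberg procedure, which includes further details, and the function and parameter names are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k responses among n patients with response rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_declare_promising(p, n1, stop_at_or_below, n2, min_total):
    """P(pass stage 1 and observe at least min_total responses in n1 + n2 patients)."""
    total = 0.0
    for x1 in range(stop_at_or_below + 1, n1 + 1):   # stage 1 counts that continue
        need = max(0, min_total - x1)                # responses still needed in stage 2
        p_stage2 = sum(binom_pmf(x2, n2, p) for x2 in range(need, n2 + 1))
        total += binom_pmf(x1, n1, p) * p_stage2
    return total

# Design from the text for pA = 0.05 versus pB = 0.20:
# 20 patients, stop if no responses; 20 more; promising if 5 or more of 40.
design = dict(n1=20, stop_at_or_below=0, n2=20, min_total=5)
alpha = prob_declare_promising(0.05, **design)   # false positive rate under pA
power = prob_declare_promising(0.20, **design)   # power under pB
print(f"significance level ~ {alpha:.3f}, power ~ {power:.3f}")
```

The same function can be applied to the second design in the text by setting n1=20, stop_at_or_below=1, n2=15, and min_total=8.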

Single-stage pilot studies are used if the regimen being studied consists of combinations of approaches already shown to be effective. The goal for a pilot study is often estimation (e.g., what is the operative mortality rate or 1-year survival rate to within a given level of uncertainty?), or sometimes a more formal test of hypothesis (e.g., is the hypothesis that the new regimen represents an improvement over standard, and is thus worthy of further testing, tenable or not?). In terms of estimation of a dichotomous outcome, an important concept is that an approximate 95% confidence interval is of the form

$$\hat{p} \pm 1.96 \sqrt{\hat{p}(1 - \hat{p})/N},$$

where $\hat{p}$ is the estimated rate and N is the sample size. If the desire is to achieve a 95% confidence interval of the form $\hat{p} \pm w$, for a specified half-width w, then the sample size required is $N = 3.84\,\hat{p}(1 - \hat{p})/w^2$. Use of this formula requires knowledge of $\hat{p}$, which is not known but can be estimated for planning purposes. For example, if the desired half-width w is 10%, and the estimate of $\hat{p}$ is 0.40, then the required sample size is $N = 3.84 \times 0.4 \times 0.6/(0.1)^2 \approx 92$ patients. This formula for N achieves a maximum when $\hat{p} = 0.5$, and thus (rounding 3.84 to 4) a conservative approximation to sample size for a given half-width w is given by $N = 1/w^2$. This gives N = 100 for a desired 95% confidence interval of $\pm 0.1$, N = 400 for a desired 95% confidence interval of $\pm 0.05$, and so forth.
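The half-width calculation is easy to reproduce. The sketch below, with hypothetical function names, returns the unrounded sample size for a planning guess of the rate, alongside the conservative 1/w^2 rule; in practice the result would be rounded up to a whole number of patients.

```python
def n_for_half_width(w, p_guess=0.5, z=1.96):
    """Unrounded N giving an approximate CI of p_guess +/- w (normal approximation)."""
    return z**2 * p_guess * (1 - p_guess) / w**2

def conservative_n(w):
    """Worst case (p = 0.5) with z^2 = 3.84 rounded up to 4, i.e. N = 1/w^2."""
    return 1 / w**2

# Planning values from the text: half-width 10%, guessed rate 0.40.
print(f"N for w = 0.10, p = 0.40: {n_for_half_width(0.10, 0.40):.1f}")
for w in (0.10, 0.05):
    print(f"conservative N for w = {w}: {conservative_n(w):.0f}")
```

Halving the desired half-width quadruples the required sample size, which is why very precise estimates are expensive.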

Pilot studies with a survival outcome (or disease-free survival) are becoming more common. In these cases, hypotheses are formulated in terms of survival curves, for example H0 : S = SA versus the alternative hypothesis H1 : S = SB where S is a survival curve, SA is the survival curve that, if true, would mean that the regimen was not worth studying further, and SB the survival curve that, if true, would mean the regimen was worth further study. The survival curves are often approximated for planning purposes by the exponential distribution. The sample size for such studies depends on not just the null and alternative hypothesis but also on the accrual rate and the planned length of follow-up after the accrual period. Calculations are available on the web at http://www.swogstat.org/stat/public/one/_survival.htm.

In some cases the aim of a phase II study is not to decide whether a particular regimen should be studied further, but to decide which of several new regimens should be taken to the next phase of testing. In these cases a randomized phase II or selection design may be used. Patients are randomized to the treatments under consideration, but the intent of the study is not a definitive comparison but rather to choose for further study a regimen that is likely to be better (or at least not much worse) than the other new regimens. The number of patients per arm is chosen to be large enough that if one treatment is superior by a specified amount $\Delta$, and the rest are equivalent, the probability of choosing the superior treatment is at least a specified level. Sample sizes for selection designs have been worked out both for response end points by Simon and colleagues (1985) and survival end points by Liu and co-workers (1993).
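For a response end point, the probability that a selection design picks the truly superior arm can be computed exactly. The sketch below assumes a simple decision rule that the text does not state explicitly: select the arm with the most observed responses, breaking ties at random. The response rates and arm sizes are hypothetical planning values, and the function names are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k responses among n patients with response rate p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_correct_selection(n, p_best, p_other, n_other_arms):
    """P(superior arm is chosen) with n patients per arm and random tie-breaking."""
    total = 0.0
    for x in range(n + 1):                         # responses on the superior arm
        p_x = binom_pmf(x, n, p_best)
        p_below = sum(binom_pmf(y, n, p_other) for y in range(x))
        p_tie = binom_pmf(x, n, p_other)
        for j in range(n_other_arms + 1):
            # j inferior arms tie at x; the remaining ones fall strictly below x.
            p_config = comb(n_other_arms, j) * p_tie**j * p_below**(n_other_arms - j)
            total += p_x * p_config / (j + 1)      # win a (j + 1)-way tie at random
    return total

# Hypothetical planning values: 35% versus 20% response, two or three arms.
for n in (20, 30, 40):
    two = prob_correct_selection(n, 0.35, 0.20, n_other_arms=1)
    three = prob_correct_selection(n, 0.35, 0.20, n_other_arms=2)
    print(f"n per arm = {n}: 2 arms {two:.3f}, 3 arms {three:.3f}")
```

Because the goal is selection rather than a formal test, the per-arm sample sizes needed for a high probability of correct selection are much smaller than those of a definitive comparison.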

A combination of end points (such as response and toxicity) might also be considered. Some designs for phase II studies that formally incorporate both response and toxicity into the decision rules are given by Conaway and Petroni (2001).

PHASE III TRIALS

Randomization

Randomization is the key to unbiased comparison of treatment approaches. However, randomization is not sufficient by itself to guarantee that comparable patients are accrued to the treatment arms unless the sample size is large. In small or moderately sized studies, major imbalances in important patient characteristics can occur by chance; thus, it is prudent to protect against this possibility by making sure the most important factors are reasonably well balanced between the arms. Patient characteristics incorporated into the randomization scheme to achieve such balance are called stratification factors, which should be those prognostic factors (preferably few) known to be strongly associated with outcome. Various schemes are used to achieve both random treatment assignment and balance across important prognostic factors. The permuted block design is perhaps the most common, in which the number of patients per arm is equalized after every block of n patients. Dynamic allocation schemes are also often used; instead of trying to balance treatment within small patient subsets, the treatment assigned (with high probability) is the one that achieves the best balance overall across the individual factors. A common approach, described by Pocock and Simon (1975), is the use of sequential treatment assignment.
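A permuted block scheme can be sketched as follows. This is a minimal illustration with hypothetical function and parameter names, not a production randomization system; in a stratified trial, one such list would be generated separately for each stratum.

```python
import random

def permuted_block_assignments(n_patients, arms=("A", "B"), block_size=4, seed=None):
    """Generate treatment assignments that are balanced after every full block."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_patients:
        # Each block holds every arm equally often, in random order.
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_patients]

schedule = permuted_block_assignments(12, seed=2005)
print(schedule)
```

Small blocks keep the arms closely balanced but make the next assignment easier to guess near the end of a block, which is one reason block sizes are usually not disclosed to participating sites.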

The best time to randomize is the closest possible time to the start of the treatments to be compared. If randomization and the time of the start of treatment are separated, patients drop out for various reasons, resulting in a number of patients not treated as required. If these are excluded from the analysis, then the patient groups may no longer be comparable. Randomization before such treatment divergences may be required for practical reasons (to make obtaining consent easier, to add time for insurance coverage to be guaranteed, etc.) but should be avoided if possible. Blinding in surgical trials can rarely be achieved (sham surgeries) and is increasingly regarded as unethical. If a blinded trial is conducted, decisions must be made as to the timing and conditions for unblinding. It is generally best not to unblind anyone until the study is published.

Two Arm Trials

Randomized comparative trials with two arms (or treatment strategies) A and B fall naturally into the paradigm of hypothesis testing, with a null hypothesis $H_0$ as the status quo to be disproved in favor of an alternative hypothesis $H_1$ representing a change from the status quo. If the primary end point is a rate p (e.g., 1-year survival rate), then the null hypothesis might be that the two rates $p_A$ and $p_B$ are the same, or $H_0: p_A = p_B$, and the alternative hypothesis might be one sided, $H_1: p_A < p_B$, representing an improvement, or two sided, $H_1: p_A \neq p_B$, representing a difference in either direction. The choice of false positive rate or significance level ($\alpha$), true positive rate or power ($1 - \beta$), and the difference to be detected are the major determinants of sample size. A 5% false positive rate is usually felt to be reasonable, as is a power of 80% to 90%, but other choices are possible depending on the perceived trade-offs between false negatives and false positives. A formula for the sample size N for each arm of a two-arm trial for testing a hypothesized rate $p_A$ in one arm versus an alternative rate $p_B$ in the other arm is given by

$$N = \frac{\left[ z_\alpha \sqrt{2\bar{p}(1 - \bar{p})} + z_\beta \sqrt{p_A(1 - p_A) + p_B(1 - p_B)} \right]^2}{(p_B - p_A)^2},$$

where $\bar{p} = (p_A + p_B)/2$ and $z_\alpha$ and $z_\beta$ are from tables of the normal distribution for given false positive and false negative rates $\alpha$ and $\beta$. For a 5% false positive rate, $z_\alpha = 1.645$ for a one-sided test and 1.96 for a two-sided test, while $z_\beta = 0.842$ for a power of 80% and $z_\beta = 1.282$ for a power of 90%.

From this formula it can be seen that the closer the alternative pB is to the null value pA, the smaller the denominator and the larger the sample size. Sample size also increases as the false negative and false positive rates decrease. Table 97-1 presents the total sample size 2N (both arms combined) required to detect selected choices of rates in a two-arm clinical trial. (A slightly more accurate formula due to Fleiss and colleagues, 1980, was used.) A program for these calculations can be found on the web at http://www.swogstat.org/stat/public/Binomial/binomial.htm.
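A short Python sketch of the per-arm calculation follows. The basic function uses the standard normal-approximation formula, with the squared sum of z_alpha * sqrt(2 * pbar * (1 - pbar)) and z_beta * sqrt(pA(1 - pA) + pB(1 - pB)) divided by (pB - pA)^2. The second function applies a continuity correction; treating that correction as the "slightly more accurate formula" is an assumption made here, although the corrected totals do come out close to the entries of Table 97-1. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def n_per_arm(p_a, p_b, alpha=0.05, power=0.90, one_sided=True):
    """Unrounded per-arm sample size from the normal approximation."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if one_sided else z(1 - alpha / 2)
    z_beta = z(power)
    p_bar = (p_a + p_b) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))) ** 2
    return numerator / (p_b - p_a) ** 2

def n_per_arm_corrected(p_a, p_b, **kwargs):
    """Continuity-corrected per-arm size (assumed form of the refinement)."""
    n = n_per_arm(p_a, p_b, **kwargs)
    delta = abs(p_b - p_a)
    return n / 4 * (1 + sqrt(1 + 4 / (n * delta))) ** 2

# Compare with the first row of Table 97-1 (pA = 0.1, one-sided 0.05, power 0.90).
for p_b in (0.2, 0.3):
    basic = 2 * n_per_arm(0.1, p_b)
    corrected = 2 * n_per_arm_corrected(0.1, p_b)
    print(f"pB = {p_b}: total 2N ~ {basic:.0f} (basic), ~ {corrected:.0f} (corrected)")
```

The uncorrected formula gives somewhat smaller totals, so it is worth checking which version a published table or program uses before comparing numbers.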

When the primary end point is survival (or disease-free survival, etc.), the hypotheses of interest can be expressed in a completely analogous way in terms of survival curves SA and SB. For planning purposes the simplifying assumption that the survival curves are exponential is often made. Even with that assumption, the sample size required for a given improvement (say in median survival) depends on the accrual rates and the minimum follow-up period in a complicated way. A particular point to remember is that the key determinant is not the number of patients but the number of observed events (e.g., deaths), so that studies in a good prognosis group require many more patients (to observe the same number of events) than studies in a poor prognosis group. Table 97-2 illustrates the effect of significance level, power, median survival m in the control group, increase in median survival to be detected (expressed as a ratio R), and follow-up on sample size. An accrual of 200 patients per year is assumed [the formula is described by Bernstein and Lagakos (1978)]. A general program for these calculations is on the web at http://www.swogstat.org/stat/public/Survival/two/_survival.htm.
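The point that events, rather than patients, drive the sample size can be illustrated with a widely used approximation for the total number of events needed by a log-rank comparison with equal allocation: 4(z_alpha + z_beta)^2 / (ln R)^2, where R is the ratio of medians (equivalently the hazard ratio under exponential survival). This is a swapped-in textbook approximation, not the Bernstein and Lagakos calculation behind Table 97-2, and it does not account for accrual or follow-up; the function name is hypothetical.

```python
from math import ceil, log
from statistics import NormalDist

def total_events_required(median_ratio, alpha=0.05, power=0.90, one_sided=True):
    """Approximate total events (both arms) needed to detect a given ratio of medians."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha) if one_sided else z(1 - alpha / 2)
    z_beta = z(power)
    return ceil(4 * (z_alpha + z_beta) ** 2 / log(median_ratio) ** 2)

for ratio in (1.25, 1.5, 2.0):
    print(f"R = {ratio}: about {total_events_required(ratio)} events in total")
```

Note that the required number of events does not depend on the control-group median; a longer median only means that more patients or longer follow-up are needed before those events are observed.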

Table 97-1. Total Sample Size 2N Required to Detect an Increase in pA by Δ for Significance Level 0.05, Power 0.90, One-Sided Test

pA     Δ = 0.1    Δ = 0.2    Δ = 0.3
0.1    472        152        80
0.2    678        196        96
0.3    816        222        104
0.4    884        230        104

Table 97-2. Sample Size per Arm Required for a One-Sided Two-Arm Trial with a Survival End Point

              α = 0.05      α = 0.05      α = 0.05      α = 0.01
              1 - β = 0.8   1 - β = 0.9   1 - β = 0.9   1 - β = 0.9
m    R        T = 1         T = 1         T = 5         T = 5
1    1.25     330           430           360           530
     1.5      130           170           110           170
     2.0      60            80            40            60
5    1.25     640           790           570           800
     1.5      310           390           220           310
     2.0      170           210           100           140

Note: Sample size per arm required for a one-sided two-arm trial under various assumptions when the accrual rate is 200 per year.
α, significance level; 1 - β, power; m, median survival time in years in the control group; R, ratio of median survival in the two groups; T, number in years of additional follow-up after the accrual period.

Equivalence Trials

There are occasions in which a new regimen is believed to be sufficiently better than the standard on some secondary end point, such as acute toxicity, tolerability, quality of life, or cost, to justify its acceptance if it is not necessarily better but at least comparable on the primary end point, such as survival. An example might be a less aggressive surgical or other maneuver that preserves more organ function. In this case, the null hypothesis reflecting the status quo might be that the experimental arm is a little (but not much) worse than the standard treatment, as against the alternative that the two are equal. Such trials are often called equivalence or noninferiority trials. Besides calling for a slightly different statement of the null and alternative hypotheses (e.g., $H_0: p_B = p_A - \Delta$ instead of $H_0: p_B = p_A$, and $H_1: p_B = p_A$ instead of $H_1: p_B = p_A + \Delta$), the key feature of these designs is that the specified difference $\Delta$ should be small in order for the claim of equivalence to be meaningful, calling for a large sample size. A program for power and sample size calculation for equivalence trials when survival is the end point is on the web at http://www.swogstat.org/stat/public/equivsurv.htm.

MULTIARM TRIALS

There are often several experimental regimens to be compared with a standard, or even several competing standards to be compared, leading to the consideration of multiarm trials. The main issue that arises in the design of such trials has to do with the fact that several comparisons are to be made, so that preservation of a planned false positive rate for the trial as a whole can only be achieved at the cost of a smaller false positive rate for each planned comparison, leading to a larger sample size per arm (not just overall) for such trials. Careful planning is thus required as to the comparisons of interest and the required sample size.
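The cost of multiple comparisons can be gauged with a rough calculation. The sketch below assumes a Bonferroni adjustment (dividing the overall one-sided significance level equally among the planned comparisons), which is only one of several possible approaches and is not specified in the text. It also assumes that per-arm sample size scales with (z_alpha + z_beta)^2, which is approximately true for the normal-approximation formulas. The function name is hypothetical.

```python
from statistics import NormalDist

def per_arm_inflation(n_comparisons, alpha=0.05, power=0.90):
    """Approximate factor by which per-arm size grows under a Bonferroni split of alpha."""
    z = NormalDist().inv_cdf
    z_beta = z(power)
    unadjusted = (z(1 - alpha) + z_beta) ** 2
    adjusted = (z(1 - alpha / n_comparisons) + z_beta) ** 2
    return adjusted / unadjusted

for k in (1, 2, 3):
    print(f"{k} comparison(s): per-arm sample size x {per_arm_inflation(k):.2f}")
```

Under these assumptions, even two planned comparisons raise the per-arm requirement by roughly a fifth, in addition to the extra arm itself.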

EXAMPLES

Some examples will serve to illustrate the major points.

Surgery as the Variable

The Southwest Oncology Group (SWOG) conducted a multiinstitution phase II pilot study (S8805) of concomitant chemotherapy (cisplatin + VP-16) and radiotherapy followed by surgery for patients with stage IIIA (T1-3 N2) or IIIB non-small cell lung cancer, reported by Albain (1995) and Rusch (1993) and their colleagues. Among the objectives were estimation of response rate to the whole program of trimodality therapy, estimation of the proportion of patients undergoing resection, and estimation of the proportion of patients free of microscopic disease after surgery. A general assessment of feasibility, toxicity, and survival, and outcome in stage subsets (IIIA vs. IIIB) was also planned. The statistical design called for the accrual of N = 100 eligible patients, in order to estimate rates with a 95% confidence interval of, at worst, the observed rate ± 10%. Careful attention was given to the drafting of forms to capture information on the surgery performed and the outcome, because this was a new area for this multidisciplinary but medical oncology-dominated group. All treating institutions were credentialed in the sense that documentation of a multidisciplinary team was required. All data items and patient evaluations were reviewed by a surgeon (V. Rusch) and a medical oncologist (K.S. Albain).

The results of S8805 were considered sufficiently promising that a randomized phase III trial was planned, coordinated by the Radiation Therapy Oncology Group (RTOG) but with intellectual and accrual participation from SWOG as well as several other cooperative groups (RTOG 93-09, Intergroup study No. INT 0139). For various reasons, eligibility was restricted to patients with stage IIIA (T1-3N2) disease. The experimental arm was as in S8805 with some modifications, including the addition of postoperative chemotherapy, and the control arm was the same regimen without surgery (but with a boost of radiotherapy given with postinduction chemotherapy). For practical reasons the randomization was performed before the start of induction chemoradiotherapy, so the comparison was made between treatment plans, one without and one with surgery. The primary (though by no means the only) end point was overall survival from the time of randomization. Data from S8805 as well as from pilots of the control arm that were published by Friess and co-workers (1987, 1989) were used in the sample size calculations, which were based on the observed shapes of the survival curves from the pilot studies by the method of Lakatos (1988) and not on an assumption of exponential survival. A one-sided type I error of 5% was chosen along with a type II error of 7%. This led to a required sample size of 556 eligible patients followed for a minimum of 30 months (the same sample size was noted to give 80% power for a one-sided 5% level test of 2-year survival as a dichotomous end point, assumed as 25% vs. 35%). Two formal interim analyses were planned at conservative levels according to the method described in Fleming and colleagues (1984). After some experience with slower than planned accrual (and thus more follow-up per patient), the sample size was revised down to 510 eligible patients. The trial should be reported in 2003.
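The parenthetical remark about the dichotomous end point can be roughly reproduced with a sketch. It assumes equal allocation of the 556 patients to the two arms and a simple normal-approximation test of two proportions with unpooled variance; the calculation actually used for the trial is not described here and may have differed, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(n_total, p_control, p_experimental, alpha_one_sided=0.05):
    """Approximate power of a one-sided z-test comparing two proportions,
    assuming equal allocation and an unpooled variance estimate."""
    n_per_arm = n_total / 2
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_experimental * (1 - p_experimental) / n_per_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha_one_sided)
    return NormalDist().cdf((p_experimental - p_control) / se - z_alpha)


# 2-year survival of 25% vs. 35% with 556 patients in total.
print(f"approximate power: {power_two_proportions(556, 0.25, 0.35):.2f}")
```

Under these assumptions the power comes out a little above 0.8, in line with the figure quoted. The comparison also shows how much information is lost by reducing survival times to a single yes/no outcome at 2 years.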

Surgeons as the Gatekeepers

In contrast to the situation with stage III non-small cell lung cancer, where a key issue is the role of surgery, for patients with early non-small cell lung cancer, surgery as the initial treatment is a given, and a key question is the role of chemotherapy. An ad hoc group of thoracic surgeons and medical oncologists who called themselves the Bimodality Lung Oncology Team (BLOT) decided to perform a multiinstitution phase II pilot study of chemotherapy (carboplatin and docetaxel) before and after surgery (BLOT study X1195), in patients with clinical stage IB, stage II, or stage IIIA (T3N0-1) non-small cell lung cancer. Objectives included estimation of the response rate to induction therapy, the proportion of patients who proved to be resectable, and the surgical mortality rate. A sample size of N = 80 patients was chosen, in order to estimate rates with a 95% confidence interval of at worst ±11%. Surgical quality control was similar to that in S8805 (some of the participants were the same), although this study had fewer institutions with a higher volume of patients. As the study proceeded, a cohort with just preoperative chemotherapy was added. Preliminary results have been reported by Pisters and co-workers (2000).

Based on this pilot study, a randomized trial of chemotherapy followed by surgery versus surgery alone was planned as an intergroup study coordinated by SWOG (S9900, sometimes known as BLOT or NOT). The experimental arm was based on the revision of X1195, which called for three cycles of chemotherapy before surgery and none after. Although surgery is not the variable in this trial, surgery is intended for all patients, and surgeons are crucial in enrolling patients and are thus key participants in the design and conduct of the study. The primary end point was survival from the time of randomization, which was before the initiation of any therapy, and sample size was derived using the assumption of exponential survival curves. A one-sided type I error of 2.5% and a type II error of 19% were used, to detect a 33% improvement in median survival (from 2.7 years in the control group), which with exponential distributions corresponds to a 10% absolute difference in the 5-year survival rate (from 28% to 38%). This led to a sample size of 600 patients accrued over 4 years and followed for a minimum of 3 years. Two formal interim analyses were planned, at conservative levels (0.25%). Accrual is approximately halfway to completion as of January 2003.
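The correspondence between medians and 5-year survival rates quoted above follows directly from the exponential assumption, as the sketch below shows. The second function uses Schoenfeld's approximation for the number of deaths needed by a log-rank comparison with equal allocation; that formula is an assumption introduced here for illustration, not necessarily the calculation used for S9900, and both function names are hypothetical.

```python
from math import exp, log
from statistics import NormalDist


def exp_survival(t_years, median_years):
    """Survival probability at time t under an exponential curve with the given median."""
    return exp(-log(2) * t_years / median_years)


def schoenfeld_events(hazard_ratio, alpha_one_sided, power):
    """Approximate number of deaths required (1:1 allocation), Schoenfeld's formula."""
    z = NormalDist().inv_cdf
    return 4 * (z(1 - alpha_one_sided) + z(power)) ** 2 / log(hazard_ratio) ** 2


control_median = 2.7                          # years, from the text
experimental_median = 1.33 * control_median   # 33% improvement in median survival

print(round(exp_survival(5, control_median), 2))       # 0.28
print(round(exp_survival(5, experimental_median), 2))  # 0.38

# With exponential curves the hazard ratio equals the ratio of the medians.
deaths = schoenfeld_events(1.33, alpha_one_sided=0.025, power=0.81)
print(f"approximate deaths required: {deaths:.0f}")
```

The number of deaths required is smaller than the number of patients enrolled, because not every patient will have died by the time of analysis; the accrual period and minimum follow-up determine how many patients are needed to observe that many events.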

Surgical Procedures

The American College of Surgeons Oncology Group (ACOSOG) has recently initiated a randomized phase III study of the extent of mediastinal lymph node dissection in patients with surgical T1 or T2, N0 or N1 (nonhilar) non-small cell lung cancer. Patients are randomized in the operating room, after the establishment of eligibility by lymph node sampling, to no further sampling versus complete mediastinal lymphadenectomy, both of course followed by pulmonary resection. The experimental arm is considered to be complete lymph node dissection, which might remove otherwise undiscovered cancer but might also compromise the patient and the immune system. The primary end point for sample size determination was overall survival from randomization. An absolute improvement in the 5-year survival rate of 8% was targeted, and exponential survival was assumed (under which this translates to a 30% increase in median survival). Using a one-sided type I error of 5% and a power of 90%, the calculation yielded a sample size of 1,000 eligible patients accrued over 5 years and followed for a minimum of 5 years. Three formal interim analyses are planned, at conservative levels according to the method of Fleming and colleagues (1984). Special procedures for surgical quality control include skills verification (including a video) and rapid central review of operation and pathology reports from the first few cases of each surgeon.
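The translation between an absolute difference in survival at a fixed time and a relative increase in median survival can be written out under the exponential assumption. In the sketch below, the symbols are introduced here for illustration: \lambda_0 and \lambda_1 are the hazard rates, m_0 and m_1 the medians, and S_0(t) and S_1(t) the survival rates at time t in the control and experimental arms. The control arm 5-year rate is not stated in this passage, so no numerical value is filled in.

```latex
S_i(t) = e^{-\lambda_i t}, \qquad m_i = \frac{\ln 2}{\lambda_i}, \qquad i = 0, 1
\quad\Longrightarrow\quad
\frac{m_1}{m_0} = \frac{\lambda_0}{\lambda_1} = \frac{\ln S_0(t)}{\ln S_1(t)}
```

The same absolute difference in 5-year survival therefore corresponds to a different ratio of medians depending on the control arm rate, which is why the assumed baseline must be specified along with the targeted improvement.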

CONCLUSION

We have barely touched on the main principles of clinical trial design. Expanded discussions can be found in the text by Green and co-workers (2002), which focuses on oncology, and that of Piantadosi (1997), which is both a more general and a more mathematically deep treatment of the subject.

REFERENCES

Albain KS, et al: Concurrent cisplatin/etoposide plus chest radiotherapy followed by surgery for stages IIIA (N2) and IIIB non-small-cell lung cancer: mature results of Southwest Oncology Group Phase II study 8805. J Clin Oncol 13:1880, 1995.

Bernstein D, Lagakos S: Sample size and power determination for stratified clinical trials. J Stat Computations Simulation 8:65, 1978.

Bonenkamp JJ, et al: Extended lymph-node dissection for gastric cancer. Dutch Gastric Cancer Group. N Engl J Med 340:908, 1999.

Fleiss JL, Tytun A, Ury HK: A simple approximation for calculating sample sizes for comparing independent proportions. Biometrics 36:346, 1980.

Fleming TR, Harrington DP, O'Brien PC: Designs for group sequential tests. Controlled Clinical Trials 5:348, 1984.

Friess GG, Balkadi M, Harvey WH: Concurrent cisplatin and etoposide with radiotherapy in locally advanced non-small cell lung cancer. Cancer Treat Rep 71:681, 1987.

Friess GG, Balkadi M, Harvey WH: Simultaneous cisplatin and etoposide with radiation therapy in locoregional non-small cell lung cancer. Final results of a pilot trial. In Gralla RJ, Einhorn LH (eds): Small Cell Lung Cancer and Non-small Cell Lung Cancer. New York: Royal Society of Medicine Services Ltd, 1989, pp. 121-126.

Green S, Benedetti J, Crowley J: Clinical Trials in Oncology. 2nd Ed. Boca Raton, FL: CRC Press, 2002.

Green S, Dahlberg S: Planned vs attained design in phase II clinical trials. Stat Med 11:853, 1992.

Kaplan EL, Meier P: Nonparametric estimation from incomplete observations. J Am Stat Assoc 53:457, 1958.

Lakatos E: Sample size based on the log-rank statistic in complex clinical trials. Biometrics 44:229, 1988.

Liu PY, Dahlberg S, Crowley J: Selection designs for pilot studies based on survival end points. Biometrics 49:391, 1993.

Moinpour CM, et al: Quality of life end points in cancer clinical trials: review and recommendations. J Natl Cancer Inst 81:485, 1989.

Petroni GR, Conway MR: Designs based on toxicity and response. In Crowley J (ed): Handbook of Statistics in Clinical Oncology. New York: Marcel Dekker, 2001, pp. 105-118.

Piantadosi S: Clinical Trials: A Methodologic Perspective. New York: John Wiley & Sons, 1997.

Pisters KMW, et al: Induction chemotherapy before surgery for early-stage lung cancer: a novel approach. Bimodality Lung Oncology Team. J Thorac Cardiovasc Surg 119:429, 2000.

Pocock SJ, Simon R: Sequential treatment assignment with balancing for prognostic factors in the controlled clinical trial. Biometrics 31:103, 1975.

Rusch VW, et al: Surgical resection of stage IIIA and stage IIIB non-small-cell lung cancer after concurrent induction chemoradiotherapy. A Southwest Oncology Group trial. J Thorac Cardiovasc Surg 105:97, 1993.

Simon R: Optimal two-stage designs for phase II clinical trials. Controlled Clinical Trials 10:1, 1989.

Simon R, Wittes R, Ellenberg S: Randomized phase II clinical trials. Cancer Treat Rep 69:1375, 1985.

Storer B: Phase I trials. In Crowley J (ed): Handbook of Statistics in Clinical Oncology. New York: Marcel Dekker, 2001, pp. 73-91.

Therasse P, et al: New guidelines to evaluate the response to treatment in solid tumors. European Organization for Research and Treatment of Cancer, National Cancer Institute of the United States, National Cancer Institute of Canada. J Natl Cancer Inst 92:205, 2000.

Troxel AB, Moinpour CM: Statistical analysis of quality of life. In Crowley J (ed): Handbook of Statistics in Clinical Oncology. New York: Marcel Dekker, 2001, pp. 269-289.
