Review Article
The Changing Face of Clinical Trials
Adaptive Designs for Clinical Trials
Randomized clinical trials serve as the standard for clinical research and have contributed immensely to advances in patient care. Nevertheless, several shortcomings of randomized clinical trials have been noted, including the need for a large sample size and long study duration, the lack of power to evaluate efficacy overall or in important subgroups, and cost. These and other shortcomings have been widely acknowledged as constraining medical innovation.1 Adaptive trial design has been proposed as a means to increase the efficiency of randomized clinical trials, potentially benefiting trial participants and future patients while reducing costs and enhancing the likelihood of finding a true benefit, if one exists, of the therapy being studied.2
Adaptive designs are applicable to both exploratory and confirmatory clinical trials. Adaptive designs for exploratory clinical trials deal mainly with finding safe and effective doses or with dose–response modeling. The emphasis is on strategies that will assign a larger proportion of the participants to treatment groups that are performing well, reduce the number of participants in treatment groups that are performing poorly, and investigate a dose range that is larger than ranges in corresponding trials with nonadaptive designs, in order to select effective doses for the confirmatory stage of investigation. Control of the type I error rate is less of an issue. In Table 1, various types of adaptive designs for exploratory clinical trials are classified into categories that reflect the time sequence in which they would be performed in the drug-development process.
In confirmatory trials, the adaptive nomenclature refers to making prospectively planned changes to the future course of an ongoing trial on the basis of an analysis of accumulating data from the trial itself, in a fully blinded or unblinded manner, without undermining the statistical validity of the conclusions.3 However, modifications of randomized clinical trials that are performed in an unblinded manner are subject to closer regulatory scrutiny than those performed in a blinded manner. They require careful attention to statistical techniques and operational procedures to ensure that the implementation is scientific, ethical, and free from bias. In Table 1, different types of adaptations for confirmatory trials are classified into four major categories — seamless phase 2–3 designs, sample-size reestimation, group sequential designs, and population-enrichment designs — and the strengths and weaknesses of each type are identified in relation to corresponding nonadaptive designs. There is some overlap among the different categories. For example, sample-size reestimation could be implemented on its own or incorporated into group sequential, dose-selection, or population-enrichment designs.
In this review, we focus on adaptive designs of confirmatory clinical trials. We discuss the benefits and limitations of such designs, using four case studies that highlight the statistical and operational considerations that are the prerequisites for a successful trial. The statistical methods for hypothesis testing and parameter estimation are provided in the Supplementary Appendix, available with the full text of this article at NEJM.org.
Four Case Studies
Seamless Phase 2–3 Design — the INHANCE Trial
The Indacaterol to Help Achieve New COPD Treatment Excellence (INHANCE) trial was an adaptive two-stage (i.e., phase 2–3), confirmatory, randomized clinical trial of inhaled indacaterol, a once-daily long-acting beta2-agonist bronchodilator for the treatment of chronic obstructive pulmonary disease (COPD); the trial featured multiple treatment groups, with dose selection at the end of stage 1.4,5 In stage 1, patients with COPD were randomly assigned in a double-blind, double-dummy manner to one of seven groups to receive four doses of indacaterol, placebo, formoterol, or tiotropium; the last two regimens were considered to be standard-of-care comparators. Two of the four indacaterol doses were to be selected for further testing at stage 2 along with placebo and tiotropium. The final analysis would be based on the combined data from the two stages.
The primary efficacy objective was to show the superiority of at least one dose of indacaterol over placebo at week 12 with respect to the 24-hour postdose (trough) forced expiratory volume in 1 second (FEV1). Although the final efficacy analysis was to use the FEV1 data through week 12, the dose selection at the interim analysis was to be based on data from patients who had been treated through week 2 only, since indacaterol is known to reach pharmacodynamic steady state within 2 weeks.
The two most important statistical considerations for a design of this type are the dose-selection rule at the interim analysis and the statistical inference at the final analysis. The dose selection would have to be made by an external data and safety monitoring committee that had been equipped with clear, unambiguous decision rules for determining which doses to pick and also some flexibility to deviate from these rules in case of unexpected safety signals or a lack of dose response (see the Supplementary Appendix). Accordingly, a rather complex set of decision rules covering all anticipated contingencies was included in the charter for the data and safety monitoring committee (Table S1 in the Supplementary Appendix).6 The sections on Statistical Methodology in the Supplementary Appendix describe how the type I error is controlled when ineffective doses might be dropped at the end of stage 1 and multiple doses might be compared with a common control group in the final analysis.
In the INHANCE trial, the interim analysis was to be performed when 770 patients (110 patients per group) had completed at least 2 weeks of treatment (Fig. S1 in the Supplementary Appendix). On the basis of the detailed dose-selection guidelines that had been prespecified in the charter, the data and safety monitoring committee selected doses of 150 μg and 300 μg, whereupon the recruitment of patients was immediately resumed for the second stage of the trial. The final analysis was performed when 285 additional patients had been enrolled and evaluated. The difference between each indacaterol dose and either placebo or tiotropium was significant with respect to the primary and key secondary end points.5
This example shows several conditions that are essential for the successful implementation of an adaptive design. First, the highly quantitative, precise, and easily obtained early readout of end-point data made it possible to eliminate two of the trial groups quickly and thereby enroll many more patients in study groups that were receiving the doses and treatments of primary interest. Trials that require rapid recruitment or lengthy or complex patient follow-up (e.g., assessment of freedom from a heart attack over a period of a few years after treatment) may not be suitable for adaptive designs, since enrollment may be almost complete by the time the stage 1 cohort has met its follow-up requirements for decision making. Second, the preliminary planning for this trial was meticulous, with detailed dose-selection criteria, a communication plan for disseminating the interim decisions without unblinding the interim results, a hypothesis-testing strategy that controlled the type I error, and detailed simulations of the operating characteristics before the initiation of the trial.
Although a nonadaptive approach would have the advantage that the sponsor could be fully involved in the selection of the doses for follow-on phase 3 testing, the adaptive design combined the data from the two stages for the final analysis, which meant that the trial required fewer patients and had a shorter overall duration. This gain in efficiency, however, carried the risk that the totality of evidence at the end of the trial might not support a regulatory submission, possibly because of inadequate dose–response modeling or an inadequate safety profile. For this reason, extensive up-front planning and a thorough discussion by the trial team of all possible contingencies that might arise over the course of the two stages of the trial contributed to the success of the INHANCE trial.
Sample-Size Reestimation — the CHAMPION PHOENIX Trial
The Cangrelor versus Standard Therapy to Achieve Optimal Management of Platelet Inhibition (CHAMPION) PHOENIX trial was a double-blind, placebo-controlled trial in which patients who were undergoing urgent or elective percutaneous coronary intervention (PCI) for coronary insufficiency were randomly assigned to receive a bolus and infusion of the intravenous antiplatelet agent cangrelor or a loading dose of the oral antiplatelet agent clopidogrel.7 The primary efficacy end point was a composite of death, myocardial infarction, ischemia-driven revascularization, or stent thrombosis within 48 hours after PCI. The initially planned enrollment of 10,900 patients, with possible early stopping for efficacy on the basis of a gamma (−5) alpha spending function (which generates group-sequential boundaries that resemble the O'Brien–Fleming boundaries) when 70% of the patients had been enrolled, provided the study with 86% power to detect a 24% lower relative risk, from an event rate of 5.1% in the control group to an event rate of 3.9% in the experimental-therapy group. However, small variations in the assumed magnitude of the difference in relative risk or in the event rate in the control group could have led to a substantial reduction in power at the design stage (Table 2).
To mitigate this risk, the trial permitted a possible sample-size reestimation at the interim analysis when 70% of the patients had been enrolled. The sample space of possible outcomes at this interim analysis was partitioned into three zones on the basis of the observed percentage lowering in relative risk — unfavorable zone (observed difference, <13.6%), promising zone (≥13.6% to ≤21.2%), and favorable zone (>21.2%).8 If the observed percentage lowering in relative risk fell in the promising zone, there would be an increase in the sample size according to a prespecified formula. In the favorable or unfavorable zones, there would be no change in the sample size because the probability of achieving statistical significance under the current observed difference in relative risk would already be very high in the favorable zone, whereas in the unfavorable zone it would be too low to make an increase in sample size worthwhile. In the promising zone, however, there could be a substantial benefit from increasing the sample size.
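The three-zone partition described above can be sketched in a few lines of Python. This is only an illustration of the classification step: the function names are hypothetical, the thresholds are the 13.6% and 21.2% cut points quoted in the text, and the prespecified formula for the sample-size increase in the promising zone is not reproduced because the text does not give it.

```python
def relative_risk_reduction_pct(events_ctrl, n_ctrl, events_trt, n_trt):
    """Observed percentage lowering in relative risk (treatment vs. control)."""
    rate_ctrl = events_ctrl / n_ctrl
    rate_trt = events_trt / n_trt
    return 100.0 * (1.0 - rate_trt / rate_ctrl)


def classify_zone(rrr_pct, lower=13.6, upper=21.2):
    """Map an observed relative-risk reduction (percent) to an interim zone.

    The default cut points are the ones quoted in the text; the zone
    names follow the article.
    """
    if rrr_pct < lower:
        return "unfavorable"   # no change in sample size
    if rrr_pct <= upper:
        return "promising"     # sample size increased by a prespecified formula
    return "favorable"         # no change in sample size


# Hypothetical interim counts, for illustration only.
rrr = relative_risk_reduction_pct(events_ctrl=50, n_ctrl=1000,
                                  events_trt=40, n_trt=1000)
print(classify_zone(rrr))
```

Only the promising zone triggers an adaptation; in the other two zones the trial continues to the originally planned sample size.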
For example, if the control group had an event rate of 5.1% and the experimental-therapy group had a relative risk that was lower by only 18%, the overall power would be reduced to 62% (Table 2). However, if an adaptive design were implemented, then the power, which was conditional on falling inside the promising zone at the interim analysis, could be boosted from 66% to 90% by increasing the sample size from 10,900 to an average of 17,373.
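The sensitivity of power to the assumed treatment effect can be checked with a simple normal-approximation calculation. The sketch below is a simplification and not the method behind Table 2: it assumes a single fixed-sample comparison of two proportions with 1:1 allocation and a two-sided alpha of 0.05, and it ignores the interim look and the alpha spending function. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def fixed_sample_power(ctrl_rate, rel_risk_reduction, n_total, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Assumes equal allocation and an unpooled normal approximation; the
    negligible probability of rejecting in the wrong direction is ignored.
    """
    n_per_group = n_total / 2
    trt_rate = ctrl_rate * (1.0 - rel_risk_reduction)
    se = sqrt(ctrl_rate * (1.0 - ctrl_rate) / n_per_group
              + trt_rate * (1.0 - trt_rate) / n_per_group)
    z_effect = (ctrl_rate - trt_rate) / se
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return NormalDist().cdf(z_effect - z_crit)


# Design assumption (24% lower relative risk) and the less favorable 18% case.
print(round(fixed_sample_power(0.051, 0.24, 10_900), 2))
print(round(fixed_sample_power(0.051, 0.18, 10_900), 2))
```

Under these simplifying assumptions the two results come out close to the 86% and 62% figures quoted in the text, which illustrates how a modest change in the assumed effect erodes power.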
The advantage of this approach is that the sample size is only increased after the interim results have been reviewed and observed to be promising (in this case, by the data and safety monitoring committee). This is the major innovation of the adaptive group sequential design as compared with the classic group sequential design, in which the maximum amount of statistical information (in this case, sample size) is fixed at the design stage and there is no flexibility to alter it on the basis of results observed at the interim analysis. Fig. S2 in the Supplementary Appendix shows a detailed comparison between the operating characteristics of the adaptive design that was used in the CHAMPION PHOENIX trial and that of a competing group sequential strategy that used the same expected sample size over a range of clinically meaningful values for the difference in relative risk. In exchange for a small loss of overall power, the adaptive design provides a substantial gain in conditional power if the interim results are promising. Control of the type I error for this type of adaptive design is discussed in the sections on Statistical Methodology in the Supplementary Appendix, as well as in Figure 1 (and see the interactive graphic, available at NEJM.org).
In the CHAMPION PHOENIX trial, the results fell in the favorable zone at the interim analysis, and the sample size was not increased. The final analysis showed statistical significance in favor of cangrelor. On the basis of the results of this trial, regulatory agencies in the United States and the European Union approved cangrelor for use in patients who undergo PCI.
Changing the Primary End Point — the EXAMINE Trial
Before any new antihyperglycemic agent can gain full regulatory approval in the United States, it must be shown to have no association with an unacceptable risk of major adverse cardiovascular events. The specific guidance is that the upper boundary of the two-sided repeated 95% confidence interval for the hazard ratio for major adverse cardiovascular events should not exceed 1.3 in the time-to-event analysis in a prospective phase 3 noninferiority trial of the new agent versus standard of care. The Examination of Cardiovascular Outcomes with Alogliptin versus Standard of Care (EXAMINE) trial was such a cardiovascular-outcome trial of alogliptin, a dipeptidyl peptidase 4 inhibitor.9 The trial enrolled 5380 patients with a median follow-up of 18 months and showed noninferiority by obtaining an upper boundary of the confidence interval of 1.16.
Had the upper boundary of the confidence interval been less than 1, the trial would have shown superiority. That is, the trial would have shown that the new agent was protective instead of merely ruling out an unacceptable increase in cardiovascular risk.10-12 Table S2 in the Supplementary Appendix shows the sample size that would be needed for a cardiovascular-outcome trial to have 90% power to show superiority over a range of hazard ratios. For example, even in the case of a drug with a favorable hazard ratio of 0.85 and an annualized event rate of 2.5%, a trial would require enrollment of almost 18,000 patients over a period of 2 years and an additional 3 years of follow-up. In this context, an adaptive design can generate the best possible estimate of the required sample size, since the actual interim results from the trial itself could be used to repower the trial for superiority. The EXAMINE trial had prespecified that the maximum number of adjudicated major adverse cardiovascular events would be 650, with a planned interim analysis after 550 events and an option to stop the trial and claim noninferiority if the P value for the between-group comparison was less than 0.001.
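To give a sense of why a superiority claim needs so many more events than the 650 that were prespecified, the number of events can be approximated with the Schoenfeld formula for a log-rank comparison. This is an assumption made for illustration (1:1 randomization, two-sided alpha of 0.05) and is not necessarily the method used for Table S2; the function name is hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio, power=0.90, alpha=0.05):
    """Schoenfeld approximation to the number of events needed to detect
    a given hazard ratio with 1:1 randomization and a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)
    return ceil(4.0 * (z_alpha + z_beta) ** 2 / log(hazard_ratio) ** 2)


# Events needed for 90% power at the hazard ratio discussed in the text.
print(required_events(0.85))
```

For a hazard ratio of 0.85 this approximation gives on the order of 1,600 events, well above the prespecified maximum of 650, which is consistent with the large enrollment figure quoted above.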
The trial design included one additional feature. The trial could proceed all the way to 650 events even though the early-stopping boundary for claiming noninferiority was crossed, provided that the conditional power or probability to show superiority by the end of the trial under the current trend exceeded 20%. This feature gave the sponsor a second chance to claim superiority. Since the primary analysis of the noninferiority hypothesis was prespecified to be performed in the intention-to-treat population, the change of goal from noninferiority to superiority would not entail a change of population. However, with only 20% conditional power and no option to increase the total number of adjudicated events beyond 650, the chances of actually claiming superiority were low. This design could have been improved by the inclusion of the adaptive option to increase the required number of events for the final analysis if the noninferiority boundary were crossed at the interim analysis and the conditional power for claiming superiority were sufficiently high (Figure 2).
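Conditional power under the current trend has a simple closed form when the test statistic is treated as a Brownian motion in information time. The sketch below uses that textbook form; it assumes that positive z values favor the experimental treatment and that the final analysis uses a single critical value, and it ignores any group-sequential adjustment. The function name, the default critical value, and the interim z value in the example are hypothetical and are not EXAMINE data.

```python
from math import sqrt
from statistics import NormalDist


def conditional_power_current_trend(z_interim, info_fraction, z_final_crit=1.96):
    """Probability that the final z-statistic exceeds z_final_crit, given the
    interim z-statistic and assuming the observed trend continues.

    info_fraction is the share of the final information (e.g., events)
    available at the interim analysis, strictly between 0 and 1.
    """
    if not 0.0 < info_fraction < 1.0:
        raise ValueError("info_fraction must be strictly between 0 and 1")
    # Under the current trend, the expected final z equals z_interim / sqrt(t).
    projected_final_z = z_interim / sqrt(info_fraction)
    shortfall = (z_final_crit - projected_final_z) / sqrt(1.0 - info_fraction)
    return 1.0 - NormalDist().cdf(shortfall)


# Interim analysis after 550 of a maximum of 650 events, hypothetical z value.
cp = conditional_power_current_trend(z_interim=0.5, info_fraction=550 / 650)
print(cp < 0.20)
```

With most of the information already accrued, a modest interim trend leaves little room to reach superiority, which is why raising the maximum number of events is the lever that the adaptive option would pull.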
Table S3 in the Supplementary Appendix shows the operating characteristics of the design. By doubling the required number of events in the promising zone, the chances of showing superiority increase from 64% to 96% if the true hazard ratio is 0.85. This dramatic increase in power would come at the cost of prolonging the trial by 1 year. In the EXAMINE trial, the early-stopping boundary for noninferiority was crossed after 550 events, but the conditional power for claiming superiority was less than 20%. Thus, the trial was stopped; this decision allowed the sponsor to file a claim of noninferiority without extending the trial for an additional year with a slim chance of being able to show superiority.
Biomarker-Driven Adaptive Population-Enrichment Designs
It has become increasingly apparent that treatment effects can differ greatly among subgroups of patients with different genetic or biomarker characteristics. Table S4 in the Supplementary Appendix lists several targeted therapeutic agents that have been approved in the United States for specific subgroups of patients. These examples show the potential of predictive biomarkers to identify patients who are likely to benefit from targeted therapies and to thereby increase the success rate of confirmatory clinical trials. In these examples, we have focused on oncology trials, but the use of this approach will probably increase in other fields as validated biomarkers that predict response or lack of response to therapy emerge (see the Supplementary Appendix).13
However, most previous studies in which biomarkers have shown predictive capabilities were not designed for this purpose. Even in well-controlled phase 3 trials, the biomarker component of the analysis is often performed retrospectively or the trials restricted enrollment to the targeted subgroups from the start. However, the Food and Drug Administration guidance regarding enrichment strategies for clinical trials recommends that even in cases in which there is a strong biologic basis for a therapy to target a particular genetic marker, it is desirable to enroll patients in whom the marker is absent in order to show sensitivity in patients who have the marker and lack of sensitivity in patients who do not have the marker.14
Thus, the dilemma for the investigator planning a phase 3 confirmatory trial of a targeted therapy is whether to open the enrollment to all patients regardless of biomarker status or to restrict the enrollment to a targeted subgroup on the basis of a biologic understanding of the mechanism of action from early, possibly uncontrolled, clinical data. Restricting enrollment to the targeted subgroup without sufficient empirical evidence of a lack of efficacy in the nontargeted subgroup may deny a large segment of the population access to a potentially beneficial treatment. However, if a large trial is conducted in a heterogeneous population, the treatment effect may be diluted, thus resulting in an underpowered study.15 An easily understood example is anemia due to vitamin B12 deficiency. In a randomized clinical trial involving patients with anemia, treating everyone in the experimental-therapy group with vitamin B12 would produce negative results, but the small subgroup of patients who truly have a deficiency would benefit.
An adaptive population-enrichment design is an efficient way to verify prospectively that a biomarker is predictive for a targeted therapy. The basic idea in such a design is for all participants to undergo randomization regardless of biomarker status but with the use of an interim analysis to identify whether the biomarker-positive patients benefit differentially from the targeted agent as compared with the biomarker-negative patients. If it appears that only the biomarker-positive patients are benefiting, then further enrollment in the biomarker-negative subgroup would be terminated. The final statistical analysis of the data would be based on data from the two stages with the use of closed testing and conditional error rate methods to prevent inflation of the type I error (see the sections on Statistical Methodology in the Supplementary Appendix).16,17 Figure 3 is a schematic representation of such a design.
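One common building block for combining data from two stages without inflating the type I error is the inverse-normal combination of stage-wise one-sided P values, with weights fixed before the trial starts. The sketch below shows that building block for a single hypothesis only. It is not the closed testing or conditional error rate procedure cited in the text, which additionally handles the choice between the full population and the biomarker-positive subgroup; the function name and the example inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def inverse_normal_combination(p_stage1, p_stage2, n_stage1, n_stage2):
    """Combine two independent one-sided stage-wise P values.

    Weights are based on the planned stage sizes and must be fixed in
    advance; they are not updated if the stage 2 sample size changes.
    Both P values must lie strictly between 0 and 1.
    """
    std_normal = NormalDist()
    w1 = sqrt(n_stage1 / (n_stage1 + n_stage2))
    w2 = sqrt(n_stage2 / (n_stage1 + n_stage2))
    z1 = std_normal.inv_cdf(1.0 - p_stage1)
    z2 = std_normal.inv_cdf(1.0 - p_stage2)
    z_combined = w1 * z1 + w2 * z2   # standard normal under the null hypothesis
    return 1.0 - std_normal.cdf(z_combined)


# Two moderately small stage-wise P values reinforce each other.
print(inverse_normal_combination(0.02, 0.02, 100, 100) < 0.02)
```

Because the weights are prespecified, the combined statistic keeps its null distribution even if the interim data are used to restrict stage 2 enrollment, which is the property that adaptive enrichment designs rely on.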
At this time, regulatory agencies tend to review proposals for adaptive designs with greater scrutiny than they give to conventional designs. This situation is probably due to limited experience with such designs and serious concern that sponsors will submit poorly conceived designs that may not control the type I error and may actually be less efficient than conventional designs. As with any new approach, there must be clear design rationale, a demonstration of statistical validity, simulation-based operating characteristics, and a comprehensive charter for the data and safety monitoring committee that addresses both the interim decision rules and the manner in which operational bias will be prevented.
The leakage of interim results could alter investigator behavior and lead to operational bias. Even if there is no leakage of interim results, the mere knowledge that there has been an adaptive change (e.g., sample-size reestimation) could cause investigators to speculate on the efficacy of the new compound, which could potentially change the enrollment and characteristics of the patients after the interim analysis. These risks can be mitigated by double-blind trials, appropriate communication with investigators, detailed and auditable standard operating procedures that document who saw what and when, and demonstration that the baseline characteristics of the patients who were enrolled before the adaptive change match those of the patients who were enrolled after the adaptive change.
Problems can arise with randomization, drug supply, and the recruitment of patients when there are adaptive changes due to dose selection, sample-size increases, or population enrichment. It is critical to ensure that the sample size at the interim analysis is adequate for making the adaptive decision. If patients are enrolled too rapidly relative to the time needed to observe the primary end point, the planned enrollment might be completed before adequate information is available for an adaptive decision to be taken. To date, regulatory agencies have opined favorably about adaptive designs.18,19
Future of Adaptive Trials
More widespread use of adaptive trial designs could accelerate the discovery process, especially if coupled with other evolving trial concepts, such as large, simple trials.20,21 Advances in adaptive trial design will require further dissemination and acceptance of the sometimes complex statistical methods. There is an intuitive appeal of adaptive trial design and its attempt to identify the patients who are most likely to derive benefit from a therapy, and this feature will resonate well with most doctors and patients.
Funding and Disclosures
Disclosure forms provided by the authors are available with the full text of this article at NEJM.org.
1. Fuster V, Bhatt DL, Califf RM, et al. Guided antithrombotic therapy: current status and future research direction: report on a National Heart, Lung and Blood Institute working group. Circulation 2012;126:1645-1662
2. Bauer P, Bretz F, Dragalin V, König F, Wassmer G. Twenty-five years of confirmatory adaptive designs: opportunities and pitfalls. Stat Med 2016;35:325-347
3. Food and Drug Administration, Center for Drug Evaluation and Research and Center for Biologics Evaluation and Research. Guidance for industry: adaptive design clinical trials for drugs and biologics. Silver Spring, MD: Food and Drug Administration, 2010.
4. Barnes PJ, Pocock SJ, Magnussen H, et al. Integrating indacaterol dose selection in a clinical study in COPD using an adaptive seamless design. Pulm Pharmacol Ther 2010;23:165-171
5. Donohue JF, Fogarty C, Lötvall J, et al. Once-daily bronchodilators for chronic obstructive pulmonary disease: indacaterol versus tiotropium. Am J Respir Crit Care Med 2010;182:155-162
6. Lawrence D, Bretz F, Pocock SJ. INHANCE: an adaptive confirmatory study with dose selection at interim. In: Trifilieff A, ed. Indacaterol. Basel, Switzerland: Springer Basel, 2014:77-92.
7. Bhatt DL, Stone GW, Mahaffey KW, et al. Effect of platelet inhibition with cangrelor during PCI on ischemic events. N Engl J Med 2013;368:1303-1313
8. Mehta CR, Pocock SJ. Adaptive increase in sample size when interim results are promising: a practical guide with examples. Stat Med 2011;30:3267-3284
9. White WB, Cannon CP, Heller SR, et al. Alogliptin after acute coronary syndrome in patients with type 2 diabetes. N Engl J Med 2013;369:1327-1335
10. Scirica BM, Bhatt DL, Braunwald E, et al. Saxagliptin and cardiovascular outcomes in patients with type 2 diabetes mellitus. N Engl J Med 2013;369:1317-1326
11. Green JB, Bethel MA, Armstrong PW, et al. Effect of sitagliptin on cardiovascular outcomes in type 2 diabetes. N Engl J Med 2015;373:232-242
12. The ORIGIN Trial Investigators. Basal insulin and cardiovascular and other outcomes in dysglycemia. N Engl J Med 2012;367:319-328
13. Everett BM, Brooks MM, Vlachos HE, et al. Troponin and cardiac events in stable ischemic heart disease and diabetes. N Engl J Med 2015;373:610-620
14. Center for Drug Evaluation and Research (CDER), Center for Biologics Evaluation and Research (CBER), Center for Devices and Radiological Health (CDRH). Guidance for industry: enrichment strategies for clinical trials to support approval of human drugs and biological products. Silver Spring, MD: Food and Drug Administration, 2012.
15. Bhatt DL, Fox KAA, Hacke W, et al. Clopidogrel and aspirin versus aspirin alone for the prevention of atherothrombotic events. N Engl J Med 2006;354:1706-1717
16. Jenkins M, Stone A, Jennison C. An adaptive seamless phase II/III design for oncology trials with subpopulation selection using correlated survival endpoints. Pharm Stat 2011;10:347-356
17. Mehta CR, Schäfer H, Daniel H, Irle S. Biomarker driven population enrichment for adaptive oncology trials with time to event endpoints. Stat Med 2014;33:4515-4531
18. Morgan C, Huyck S, Jenkins M, et al. Adaptive design: results of 2012 survey on perception and use. Ther Innov Regul Sci 2014;48:473-481
19. Elsäßer A, Regnstrom J, Vetter T, et al. Adaptive clinical trial designs for European marketing authorization: a survey of scientific advice letters from the European Medicines Agency. Trials 2014;15:383
20. Calvo G, McMurray JJ, Granger CB, et al. Large streamlined trials in cardiovascular disease. Eur Heart J 2014;35:544-548
21. Califf RM. Large simple trials: really, it can't be that simple! Eur Heart J 2014;35:549-551
- Table 1. Types of Adaptive Designs.
- Table 2. Comparison of Design-Stage and Interim Analysis–Stage Operating Characteristics of an Adaptive Trial.
- Figure 1. Adaptive Features of a Trial That Uses Sample-Size Reestimation.
- Figure 2. Adaptive Design of a Cardiovascular-Outcome Trial with Zones for Decision Making Regarding Superiority.
- Figure 3. Schematic Representation of an Adaptive Two-Stage Population-Enrichment Design.