Review Article: The Changing Face of Clinical Trials

Health Policy Trials

Joseph P. Newhouse, Ph.D., and Sharon-Lise T. Normand, Ph.D.

Introduction

Clinical trials are most commonly associated with drugs and devices, but there are notable examples of trials that involve health policy. Many such trials test innovations in the delivery of services, whereas others focus on financial incentives for patients or providers. This review of health policy trials is not meant to be comprehensive; rather, we mainly consider trials of financial incentives because of their relevance to public policy. Trials of different insurance plans may, for example, vary the degree of cost sharing borne by the patient or the scope of covered services, or they may vary the terms of provider reimbursement.

Trials That Vary Prices Paid by Patients

Table 1. Summary Descriptions of the RAND and Oregon Health Insurance Experiments.

Two well-known examples of trials that varied the prices that patients paid for their care are the RAND Health Insurance Experiment and the Oregon Health Insurance Experiment; Table 1 provides a brief description of each experiment and its findings.1-7 Although these two trials will form the basis for many of the conclusions drawn in this review, other, similar experiments have been conducted. In one trial conducted in rural Ghana, families were randomly assigned to either receive free formal medical care or continue to pay user fees (control group).8,9 The intent was to increase the use of formal care and reduce the prevalence of malaria after one malaria season. Those receiving free care did use formal care about 12% more often than those paying the usual fees, and there was a corresponding reduction in the use of informal care. However, there was no significant difference in the prevalence of anemia between the two groups. A trial similar to the RAND experiment that had similar results on the use of care was carried out in rural China.10 Together, these experiments suggest that the effect of pricing on the use of care holds across diverse settings.

Another trial, the Post–Myocardial Infarction Free Rx Event and Economic Evaluation (MI FREEE) trial, was intended to improve adherence to care by making four classes of drugs free to commercially insured patients after acute myocardial infarction, a form of value-based insurance design.11 Patients in the control group continued with their usual drug coverage, which generally required a copayment. Adherence in the control group was less than 50% for all four drug classes. Making the drugs free raised adherence by 4 to 6 percentage points and reduced rates of major vascular events and revascularization. No significant effect on total medical cost was detected; the savings gained in terms of fewer downstream medical events in the group receiving free drugs roughly offset the additional cost of the drugs.

All these trials exhibit the analogue of the distinction between efficacy and effectiveness in the usual clinical trial, namely the issue of generalizability. The RAND experiment was conducted at a time when the dominant mode of U.S. health insurance was indemnity insurance; as is the case with traditional Medicare today, there were no provider networks. The results of the experiment might have been different if the cost sharing had varied within one of today’s narrow-network plans. Medical technology, of course, has changed greatly over the past four decades, so we do not know whether the outcomes would be similar today. The results of the Oregon experiment are conditional on the details of the state’s Medicaid program, and the results of the MI FREEE trial might not be generalizable to populations covered by noncommercial insurance.

Nonetheless, the RAND and Oregon experiments have been influential. For example, when providing an estimate of the costs of proposed legislation, the Congressional Budget Office has continued to use the RAND experiment as its best estimate of the effects of cost sharing, in part because there has been no subsequent similar trial and in part because the results of subsequent observational studies have generally been consistent with those of the RAND experiment with regard to the use of cost sharing.12-15 In addition, very soon after the results of the RAND experiment were published, the prevalence of deductibles in hospital insurance declined markedly, although an attempt to establish a causal link with the results of the experiment would be speculative.4 The results of the Oregon experiment were used by President Barack Obama’s administration to advocate the expansion of Medicaid to cover all low-income adults.16,17

The trials just described centered on the insurance contract. Trials can also test financial or other consumer incentives to improve health habits, such as diet, physical activity, and tobacco use.18-21

Trials That Vary Reimbursement

The units of observation in trials that assess the effects of changes in reimbursement may be patients, providers, or health plans. Such trials are often conducted in a fairly small number of practices or delivery systems, which means that generalizability may be more of an issue than is the case in experiments involving patients’ insurance. The randomization of some participants to a staff-model health maintenance organization (HMO) in the trial conducted by RAND is an example of an experiment involving reimbursement. The HMO, whose physicians were salaried employees, was paid a fixed per-member, per-month amount to provide necessary medical services. Care at the HMO was free to the participants, but there was no coverage of services outside the HMO. Use of care at the HMO was compared with use by the group receiving free care in the fee-for-service part of the experiment and with use by a random sample of existing, self-selected HMO enrollees. As compared with hospital use by the group in the fee-for-service part of the experiment, hospital use at the HMO was 34% lower, with no measurable effects on health outcomes. As compared with hospital use by those already enrolled at the HMO, use by the new enrollees was not significantly different, but the use of outpatient services by new enrollees was somewhat lower.3

Although the RAND experiment included patients clustered in one HMO, the unit of observation in other trials could consist of patients clustered in several physician practices or physicians clustered in several delivery systems.22 In these cases, as well as with individual patients clustered in families, the observations should not be treated independently in the analysis. We consider this issue below.

Experiments can vary incentives for both patients and providers, as was the case in a trial intended to reduce levels of low-density lipoprotein cholesterol that offered financial incentives to both patients and physicians.23 Physicians and patients in three primary care practices (340 physicians and 1503 patients) were each eligible to receive up to $1,024 if goals were achieved. The trial had four groups, one that provided both patient and physician incentives, one that provided physician incentives only, one that provided patient incentives only, and a control group that offered no financial incentives. The trial showed that both patient and physician incentives were necessary to reduce cholesterol levels; the results in the groups with incentives for patients or physicians alone did not differ significantly from those in the control group.

Design

Many Decisions

Table 2. Considerations Regarding the Design and Analysis of Health Policy Trials.

In all trials, researchers have many decisions to make in addition to the standard consideration of sample size. Table 2 lists several such decisions, a few of which we discuss in greater detail below.

What Inducement, if Any, Should Be Offered to Participants?

The question regarding inducement was a prominent issue in the RAND experiment, in which patients agreed to forgo the benefits of their existing insurance plan. Consequently, some participating families could anticipate paying more out of pocket for medical care, whereas others were offered benefits that were better than those provided by their current insurer. To prevent families with high anticipated costs of care from declining to participate or from withdrawing from the cost-sharing plans, the RAND experiment made “side payments” that were equal to the worst case the family could face. Thus, it was never in the financial interest of a family to decline participation or to withdraw; indeed, this offer could be viewed as an inducement. Some randomly selected families received additional money to determine the effect of side payments on health care spending (the effects proved to be negligible).

The use of side payments appeared to be effective in minimizing potential bias at enrollment because there were no observable differences across plans in either prior utilization or measures of baseline self-reported health status. Nonetheless, rates of refusal and attrition were both higher in plans with cost sharing. This finding has led to some concern about possible bias, but in addition to the lack of observable baseline differences among the plans, the experimental response of participants to cost sharing is similar to that seen in subsequent observational studies.13-15,24,28,29

How Many Sites Should There Be?

Usually, one wants to generalize to a national population, but if fixed costs are involved in opening a site, such as the cost of maintaining a local field office, the researcher faces a trade-off between the number of families per site and the number of sites. The preferred number of sites depends on the ratio of between-site to within-site variance and the cost of another site relative to the cost of an additional person or family.30
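This trade-off can be made concrete with a small calculation. The sketch below, with purely hypothetical variance components and costs (none drawn from the RAND experiment), searches for the budget-feasible design that minimizes the variance of the estimated mean outcome; as the between-site share of variance grows, the optimum shifts toward more sites with fewer families each.

```python
# Sketch: choosing the number of sites vs. families per site under a fixed
# budget. All variance components and costs are hypothetical illustrations.

def variance_of_mean(n_sites, families_per_site, var_between, var_within):
    """Variance of the estimated mean outcome in a two-level design."""
    return var_between / n_sites + var_within / (n_sites * families_per_site)

def best_design(budget, cost_site, cost_family, var_between, var_within):
    """Search over designs that fit the budget; return the most precise one
    as (n_sites, families_per_site, variance)."""
    best = None
    for n_sites in range(1, budget // (cost_site + cost_family) + 1):
        per_site_budget = budget // n_sites - cost_site
        families = per_site_budget // cost_family
        if families < 1:
            continue
        v = variance_of_mean(n_sites, families, var_between, var_within)
        if best is None or v < best[2]:
            best = (n_sites, families, v)
    return best

# When between-site variance is large relative to within-site variance,
# the optimum shifts toward more sites with fewer families each.
few_sites_case = best_design(100_000, cost_site=5_000, cost_family=100,
                             var_between=0.1, var_within=10.0)
many_sites_case = best_design(100_000, cost_site=5_000, cost_family=100,
                              var_between=5.0, var_within=10.0)
print(few_sites_case, many_sites_case)
```

With these hypothetical numbers, raising the between-site variance moves the optimal design from 8 sites of 75 families to 16 sites of 12 families, mirroring the dependence on the variance ratio and relative costs described above.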

How Long Should the Experiment Run?

In all types of social science experiments, there is a “learning-by-doing” effect that may cause later responses to differ from earlier ones. Responses that vary over time raise the issue of whether the trial should include more participants for a shorter period or fewer participants for a longer period. There is obvious value in obtaining trial data as soon as possible, a consideration that favors enrolling more participants for a shorter period, but the possibility of a variable response over time means that ideally the experiment would be conducted until responses have stabilized. This issue is particularly difficult because the follow-up period, along with the rest of the protocol, must be determined in advance, and uncertainty regarding how long it will take for a response to stabilize is inevitable.

How Should Individual Patients or Families Be Assigned to Treatments?

Simple random assignment will yield asymptotically unbiased estimates of treatment effects, but precision can be improved with the use of stratification, blocking, or a generalization of stratification.26 Moreover, ethical and logistic considerations may dictate the randomization of clusters, such as families as opposed to individual participants.
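As a minimal illustration of these ideas, the sketch below block-randomizes intact families, rather than individual participants, within strata; the strata, family identifiers, and arm labels are hypothetical, not those of any actual trial.

```python
# Sketch: randomizing whole families (clusters) to arms within strata,
# so that each stratum stays balanced. All labels here are hypothetical.
import random
from collections import defaultdict

def assign_families(families, arms=("cost_sharing", "free_care"), seed=0):
    """Block-randomize intact families to arms within each stratum."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for family_id, stratum in families:
        by_stratum[stratum].append(family_id)
    assignment = {}
    for stratum, ids in by_stratum.items():
        rng.shuffle(ids)                     # random order within the stratum
        for i, family_id in enumerate(ids):
            assignment[family_id] = arms[i % len(arms)]  # alternate arms
    return assignment

families = [(1, "low_income"), (2, "low_income"), (3, "high_income"),
            (4, "high_income"), (5, "low_income"), (6, "high_income")]
result = assign_families(families)
```

Because assignment alternates through a shuffled list within each stratum, arm sizes within a stratum can differ by at most one family, which is the precision benefit that stratified or blocked designs offer over simple random assignment.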

To What Degree Should Groups of Special Interest Be Oversampled?

In some cases, the researcher may be interested in a subgroup, such as a low-income population or persons who rate their health as fair or poor. However, if the characteristics that define a subgroup are not stable, as in the case of both income and self-rated health, an excessively high rate of oversampling can result in less precision, even for the favored group.31

What Baseline Physiological Characteristics, if Any, Should Be Measured?

Measuring characteristics such as blood pressure and lipid levels at baseline greatly improves power because these characteristics tend to be reasonably stable over a short period, absent medical intervention. However, ethical considerations demand that a participant with a sufficiently abnormal value be treated, an event that compromises the goal of the trial. In the RAND experiment, more than half the participants were randomly assigned to a baseline screening examination, and researchers found that the examination had negligible effects on measures of outcome. The timing of the Oregon experiment was such that baseline physiological measures could not be obtained. This factor may have been one of the reasons why the confidence intervals for the outcomes of blood pressure and total cholesterol level are considerably wider than those in the RAND experiment, despite the larger sample size of the Oregon experiment.

The Role of Randomized Trials in Formulating Policy

Currently, there is controversy over the degree to which the Center for Medicare and Medicaid Innovation (CMMI) should use randomized trials to determine the ways in which Medicare should reimburse providers.32 Although it may use randomization, CMMI at other times simply compares volunteers with nonvolunteers, an approach that raises questions with regard to both selection and generalizability.

Moreover, the experimental variables that CMMI uses — typically, variations in reimbursement — are not necessarily stable. To quote CMMI, the models used by the Centers for Medicare and Medicaid Services (CMS) “are not static. Every model is designed with the intent that CMS will make changes incrementally and refine interventions and incentive structures as more is learned. . . . That is, these models have feedback and learning systems embedded in their design.”33 Such instability of the experimental variable, however, makes causal inference difficult.

CMMI has objections to randomization beyond its desire to allow for tinkering with the experimental variable: “[R]andomization can discourage potential applicants . . . because of concern about being assigned to a comparison group; participants assigned to control groups may be less willing to report key data . . . once the model has started; and randomizing beneficiaries to different levels of care or benefits under a model raises both legal and ethical concerns, particularly when working with vulnerable and at-risk populations.”33

Many of these objections (e.g., unwillingness to participate in a study because of concerns about being randomly assigned to the control group) also arise in clinical trials but have proved to be manageable. Crossover designs may be useful in enticing people to participate in a trial, but their use in cluster-randomized trials can introduce practical complications, such as when investigators need to ensure a common timed switch of the intervention among all participants in the cluster. Stepped-wedge designs, in which the intervention is introduced over time and which eventually allow all clusters to cross over to the experimental group, are promising.34 Offering payment to those who join the control group can boost participation and can sometimes be accomplished with little compromise in validity. Legal and ethical issues seem to be no greater than, and in many cases less than, those in clinical trials of therapeutics or diagnostics. Finkelstein and Taubman discuss these issues at greater length.35,36
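A stepped-wedge rollout can be sketched in a few lines; the cluster names, number of steps, and wave structure below are hypothetical, intended only to show how every cluster starts in the control condition and eventually crosses over.

```python
# Sketch of a stepped-wedge schedule: clusters cross from control (0) to
# intervention (1) in randomized waves, and all clusters end up exposed.
# Cluster names and period counts here are hypothetical.
import random

def stepped_wedge_schedule(clusters, n_steps, seed=0):
    """Return {cluster: [0/1 per period]}; one wave of clusters crosses
    over at each step, after a shared baseline period."""
    rng = random.Random(seed)
    order = list(clusters)
    rng.shuffle(order)                      # randomize crossover order
    wave_size = -(-len(order) // n_steps)   # ceiling division
    n_periods = n_steps + 1                 # baseline + one period per step
    schedule = {}
    for i, cluster in enumerate(order):
        crossover = 1 + i // wave_size      # period at which this cluster switches
        schedule[cluster] = [1 if t >= crossover else 0
                             for t in range(n_periods)]
    return schedule

plan = stepped_wedge_schedule(["A", "B", "C", "D", "E", "F"], n_steps=3)
# Every cluster starts in control and ends in the intervention group.
assert all(row[0] == 0 and row[-1] == 1 for row in plan.values())
```

Because each cluster contributes both control and intervention periods, the design addresses the participation objection directly: no cluster is permanently denied the intervention.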

Statistical Considerations in Cluster Trials

In many health policy trials, randomization occurs not at the level of the individual patient or beneficiary but rather at an intact cluster level. For example, the RAND and Oregon Health Insurance Experiments used randomization to assign families to groups but analyzed outcomes at the individual level.

There are many reasons to use cluster randomization. Individual physicians or physician organizations may be reluctant to have some patients receive an enhanced health service while others in their practice do not. Randomizing a cluster will reduce any contamination bias that may arise as a result of interactions among persons in the same cluster. For example, therapies applied in group settings could result in the sharing of attitudes among patients who have the same therapist; prevention strategies suffer the same risk if patients share advice on prevention. Contamination may also arise when a physician’s or staff member’s knowledge of the new intervention influences either the way in which patients in the usual-care or control group are treated or the way in which those patients themselves behave.

The presence of clusters in a trial affects both power calculations and the analysis, the key feature being the degree of similarity among responses within the cluster. As compared with a design in which there are independent observations, a design that involves within-cluster correlation among outcomes reduces the effective number of observations. Furthermore, if fixed-effect methods are used to analyze trials that use cluster randomization by including a dummy variable for each cluster in the statistical model, the between-cluster variation in responses is not separated from the intervention effect. Analyses that use random cluster effects implemented with the use of either population-averaged models or conditional models are typically required.37 The RAND and Oregon Health Insurance Experiments used a population-averaged approach based on a strategy suggested by Huber38 and White.39 However, if the number of clusters is small — perhaps less than 30 or 40 — population-averaged approaches will underestimate the true variance and will need to be corrected.40,41
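The reduction in the effective number of observations is commonly summarized by the design effect, 1 + (m − 1)ρ, where m is the cluster size and ρ is the intracluster correlation. The sketch below, using hypothetical numbers rather than figures from the RAND or Oregon experiments, shows how even a modest correlation shrinks the effective sample size.

```python
# Sketch: how within-cluster correlation shrinks the effective sample size.
# The cluster size and intracluster correlation (ICC) are hypothetical.

def design_effect(cluster_size, icc):
    """Variance inflation from randomizing clusters of size m with ICC rho:
    1 + (m - 1) * rho."""
    return 1 + (cluster_size - 1) * icc

def effective_n(n_total, cluster_size, icc):
    """Number of independent observations the clustered sample is worth."""
    return n_total / design_effect(cluster_size, icc)

# 2,000 patients in clusters of 20 with a modest ICC of 0.05 carry the
# information of far fewer independent patients.
deff = design_effect(20, 0.05)   # 1 + 19 * 0.05 = 1.95
print(deff, effective_n(2000, 20, 0.05))
```

In this hypothetical case, the 2,000 clustered patients are worth roughly 1,026 independent observations, which is why power calculations for cluster trials must inflate the nominal sample size by the design effect.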

Health Policy Trials vs. Clinical Trials

As is apparent from the foregoing discussion, trials in the policy sphere pose many of the same issues for design and analysis as clinical trials. There are also important differences. The use of cluster designs is probably more common in policy trials. Unlike clinical trials, policy trials may arise opportunistically, as in the case of the Oregon experiment. Because there is no legal requirement to conduct a policy trial and generally no commercial gain to be had from it, obtaining funding for policy trials is more challenging. Health policy trials are complex and can be difficult to execute. Thus, they resemble trials of treatment or diagnostic strategies more than trials of drugs or devices. Nonetheless, randomization can be and has been successfully conducted in the sphere of policy.

Funding and Disclosures

Disclosure forms provided by the authors are available with the full text of this article at NEJM.org.

Author Affiliations

From the Departments of Health Care Policy (J.P.N., S.-L.T.N.) and Biostatistics (S.-L.T.N.), Harvard Medical School, Boston.

References

  1. Newhouse JP, Manning WG, Morris CN, et al. Some interim results from a controlled trial of cost sharing in health insurance. N Engl J Med 1981;305:1501-1507.

  2. Brook RH, Ware JE Jr, Rogers WH, et al. Does free care improve adults’ health? Results from a randomized controlled trial. N Engl J Med 1983;309:1426-1434.

  3. Manning WG, Leibowitz A, Goldberg GA, Rogers WH, Newhouse JP. A controlled trial of the effect of a prepaid group practice on use of services. N Engl J Med 1984;310:1505-1510.

  4. Newhouse JP, the Insurance Experiment Group. Free for all: lessons from the Health Insurance Experiment. Cambridge, MA: Harvard University Press, 1993.

  5. Finkelstein AN, Taubman S, Wright B, et al. The Oregon Health Insurance Experiment: evidence from the first year. Q J Econ 2012;127:1057-1106.

  6. Baicker K, Taubman SL, Allen HL, et al. The Oregon Experiment — effects of Medicaid on clinical outcomes. N Engl J Med 2013;368:1713-1722.

  7. Taubman SL, Allen HL, Wright BJ, Baicker K, Finkelstein AN. Medicaid increases emergency-department use: evidence from Oregon’s Health Insurance Experiment. Science 2014;343:263-268.

  8. Ansah EK, Narh-Bana S, Asiamah S, et al. Effect of removing direct payment for health care on utilisation and health outcomes in Ghanaian children: a randomised controlled trial. PLoS Med 2009;6:48-57.

  9. Powell-Jackson T, Hanson K, Whitty CJM, et al. Who benefits from free health care? Evidence from a randomized experiment in Ghana. J Dev Econ 2014;107:305-319.

  10. Sine JJ. Demand for episodes of care in the China Health Insurance Experiment. Santa Monica, CA: Pardee RAND Graduate School, 1994.

  11. Choudhry NK, Avorn J, Glynn RJ, et al. Full coverage for preventive medications after myocardial infarction. N Engl J Med 2011;365:2088-2097.

  12. Consumer-directed health plans: potential effects on health care spending and outcomes. Washington, DC: Congressional Budget Office, 2006.

  13. Scitovsky AA, McCall N. Coinsurance and the demand for physician services: four years later. Soc Secur Bull 1977;40:19-27.

  14. Chandra A, Gruber J, McKnight R. The impact of patient cost-sharing on low-income populations: evidence from Massachusetts. J Health Econ 2014;33:57-66.

  15. Brot-Goldberg ZC, Chandra A, Handel BR, Kolstad JT. What does a deductible do? The impact of cost-sharing on health care prices, quantities, and spending dynamics. Cambridge, MA: National Bureau of Economic Research, 2015.

  16. Glied SA. Health insurance leads to healthier Americans. July 7, 2011 (https://obamawhitehouse.archives.gov/blog/2011/07/07/health-insurance-leads-healthier-americans).

  17. Council of Economic Advisers. Missed opportunities: the consequences of state decisions not to expand Medicaid. 2014 (https://obamawhitehouse.archives.gov/sites/default/files/docs/missed_opportunities_medicaid_0.pdf).

  18. Patel MS, Asch DA, Troxel AB, et al. Premium-based financial incentives did not promote workplace weight loss in a 2013-15 study. Health Aff (Millwood) 2016;35:71-79.

  19. Patel MS, Asch DA, Rosin R, et al. Framing financial incentives to increase physical activity among overweight and obese adults: a randomized, controlled trial. Ann Intern Med 2016;164:385-394.

  20. Halpern SD, French B, Small DS, et al. Randomized trial of four financial-incentive programs for smoking cessation. N Engl J Med 2015;372:2108-2117.

  21. Volpp KG, Troxel AB, Pauly MV, et al. A randomized, controlled trial of financial incentives for smoking cessation. N Engl J Med 2009;360:699-709.

  22. Song Z, Rose S, Safran DG, Landon BE, Day MP, Chernew ME. Changes in health care spending and quality 4 years into global payment. N Engl J Med 2014;371:1704-1714.

  23. Asch DA, Troxel AB, Stewart WF, et al. Effect of financial incentives to physicians, patients, or both on lipid levels: a randomized clinical trial. JAMA 2015;314:1926-1935.

  24. Aron-Dine A, Einav L, Finkelstein AN. The RAND Health Insurance Experiment, three decades later. J Econ Perspect 2013;27:197-222.

  25. Joffe MM, Rosenbaum PR. Invited commentary: propensity scores. Am J Epidemiol 1999;150:327-333.

  26. Morris CN. A finite selection model for experimental design of the Health Insurance Study. J Econometrics 1979;11:43-61.

  27. Ware JH, Harrington D, Hunter DJ, D’Agostino RB Sr. Missing data. N Engl J Med 2012;367:1353-1354.

  28. Nyman JA. American health policy: cracks in the foundation. J Health Polit Policy Law 2007;32:759-783.

  29. Newhouse JP, Brook RH, Duan N, et al. Attrition in the RAND Health Insurance Experiment: a response to Nyman. J Health Polit Policy Law 2008;33:295-308.

  30. Archibald R, Newhouse JP. Social experimentation: some why’s and how’s. In: Miser HJ, Quade EJ, eds. Handbook of systems analysis: craft issues and procedural choices. New York: North-Holland, 1988:173-214.

  31. Morris CN, Newhouse JP, Archibald R. On the theory and practice of obtaining unbiased and efficient samples in social surveys and experiments. In: Smith V, ed. Experimental economics. Westport, CT: JAI Press, 1979.

  32. Kolata G. Method of study is criticized in group’s health policy tests. New York Times. February 3, 2014:A1.

  33. Howell BL, Conway PH, Rajkumar R. Guiding principles for Center for Medicare & Medicaid Innovation model evaluations. JAMA 2015;313:2317-2318.

  34. Hussey MA, Hughes JP. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials 2007;28:182-191.

  35. Finkelstein A, Taubman S. Health care policy: randomize evaluations to improve health care delivery. Science 2015;347:720-722.

  36. Finkelstein AN, Taubman S. Using randomized evaluations to improve the efficiency of US healthcare delivery. Cambridge, MA: J-PAL, February 2015 (https://www.povertyactionlab.org/sites/default/files/publications/Using%20Randomized%20Evaluations%20to%20Improve%20the%20Efficiency%20of%20US%20Healthcare%20Delivery.pdf).

  37. Murray DM, Varnell SP, Blitstein JL. Design and analysis of group-randomized trials: a review of recent methodological developments. Am J Public Health 2004;94:423-432.

  38. Huber PJ. The behavior of maximum likelihood estimates under nonstandard conditions. In: Lecam LM, Neyman J, eds. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley: University of California Press, 1967:221-233.

  39. White H. A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 1980;48:817-838.

  40. Li P, Redden DT. Small sample performance of bias-corrected sandwich estimators for cluster-randomized trials with binary outcomes. Stat Med 2015;34:281-296.

  41. Cameron AC, Gelbach J, Miller DL. Bootstrap-based improvements for inference with clustered errors. Rev Econ Stat 2008;90:414-427.
