Sounding Board

Ensuring Patient Privacy in Data Sharing for Postapproval Research

  • Ameet Sarpatwari, J.D., Ph.D.,
  • Aaron S. Kesselheim, M.D., J.D., M.P.H.,
  • Bradley A. Malin, Ph.D.,
  • Joshua J. Gagne, Pharm.D., Sc.D.,
  • and Sebastian Schneeweiss, M.D., Sc.D.


Postapproval research is essential to address questions about safety and effectiveness that are not answered in the pivotal trials leading to approval of medical products by the Food and Drug Administration (FDA). There are many reasons why preapproval studies cannot answer all such questions, including the frequent exclusion of key segments of the population in these studies1,2 and the studies' inability to detect rare but life-threatening adverse events.3 The need for methodologically robust and efficient postapproval research will grow more urgent as more investigational products are subjected to expedited preapproval studies.

Postapproval observational studies are both a practical and a necessary means to assess safety and effectiveness. Sharing electronic medical records and other secondary health care data sets facilitates observational studies by enabling rapid capture of a greater number of persons with exposures and outcomes of interest as well as by supplying a broader spectrum of study variables than would otherwise be possible if these resources were not shared. These enhancements improve statistical power, permit more rigorous adjustment for confounding, and enable more detailed subgroup analyses to better understand treatment-effect heterogeneity.

Important questions persist, however, about how data sharing can be best accomplished given patients' privacy rights that are outlined under the Health Insurance Portability and Accountability Act (HIPAA)4 and the Health Information Technology for Economic and Clinical Health (HITECH) Act.5 These rules limit the ability of covered entities — health care providers, insurers, and administrators — and their business associates to share individually identifiable health information for purposes not related to treatment, payment, or health care operations. Such protected health information can be critical for postapproval observational studies. In this article, we detail data-sharing pathways for research purposes under HIPAA and the HITECH Act, and we assess their usefulness and the associated risk of liability among investigators seeking to combine large data sets to conduct postapproval research.

Pathways for Sharing Information for Observational Research under HIPAA

Enacted by Congress in 1996, HIPAA required the Department of Health and Human Services (DHHS) to establish national privacy and security standards for the use of protected health information and authorized criminal fines and imprisonment for covered entities who unlawfully obtained or disclosed protected health information knowingly, under false pretenses, or with the intent to secure commercial gain or inflict harm. Civil fines were reserved for cases of at least willful neglect and were capped at $25,000 per year.4 HIPAA did not, however, permit patients to sue covered entities directly.

Enactment of the HITECH Act in 2009 resulted in three changes to this enforcement system. First, it extended the reach of HIPAA to the business associates of covered entities. Second, it authorized state attorneys general to bring civil actions on behalf of their constituents. Finally, the HITECH Act significantly increased the range of possible civil penalties for a wider range of actions, including inadvertent HIPAA breaches, and it allowed fines of up to $1.5 million per year.5

Table 1. HIPAA Data-Sharing Pathways for Research Purposes without Patient Consent or Waiver of Consent by an Institutional Review Board.

One way in which HIPAA regulations permit observational studies is by allowing covered entities to share protected health information for “public health activities” conducted under the auspices of a public health authority.6 Such activities include current efforts by the FDA to develop a postapproval risk identification system for medical products.7,8 This Sentinel Initiative offers great promise to regulators,9 but it will not obviate the need for researchers to conduct independent studies of safety and effectiveness.

HIPAA permits covered entities to share protected health information for observational research outside the auspices of a public health authority without specific patient authorization or waiver of authorization by an institutional review board under two circumstances: first, if there is conditional use of a limited data set stripped of 16 identifiers, including unique device identifiers and patient-specific addresses that are more specific than a ZIP Code,10 and second, if the data are deidentified (Table 1). Deidentification may be achieved by means of a “safe harbor,” which prohibits sharing of the same 16 identifiers — including increased restrictions on patient-specific addresses — in addition to all elements of dates except years and any other unique identifiable characteristic,11 or by obtaining an expert determination that “the risk is very small that the information could be used, alone, or in combination with other reasonably available information . . . to identify an individual.”12

Although HIPAA regulations authorize states to impose stricter safeguards for data sharing that does not involve public health activities,13 state privacy laws rarely appear to impose additional burdens on researchers.14 The data-sharing pathways generally still necessitate review of the research protocols by an institutional review board, but they can simplify the process, expediting review times and limiting the possibility that an institutional review board will request substantive protocol alterations. This streamlined process is particularly advantageous for ensuring the consistency of multicenter investigations.

Application of the Pathways for Postapproval Research

The three pathways — limited data sets, the safe harbor, and expert determination — have different strengths and limitations in facilitating postapproval research. The ideal pathway would maximize usefulness while minimizing invasion of the patients' privacy and risks of liability among covered entities and expert certifiers.

Although enforcement for noncompliance with HIPAA has historically been limited, the DHHS imposed its first civil penalty for a HIPAA breach, a $4.3 million fine, in February 2011.15 Since that time, the DHHS has collected an additional $18.2 million from 17 other covered entities.16 State attorneys general have also brought independent civil actions for suspected violations in Connecticut, Massachusetts, Minnesota, and Vermont.17 These enforcement actions primarily involved nonsecure maintenance, movement, and disposal of protected health information. No charges have been filed for improper data sharing for research purposes or inadequate deidentification efforts under HIPAA.

Limited Data Sets

The chief advantage of limited data sets in postapproval research is that they can contain precise, patient-specific dates of health care encounters. Specificity in the timing of events such as initiation of treatment, implantation of a device, and the onset of illness is necessary to establish temporal relationships that facilitate causal inference.

However, limited data sets have important constraints. Their restriction on supplying device identifiers could hamper device tracking and, thus, the identification of batch-specific manufacturing defects. It was the desire to enable such tracking that prompted Congress to require the FDA to develop a system of unique device identifiers. This system will soon require most medical devices to be affixed with a device identifier that details its specific model and a production identifier that provides the batch number, manufacturing and expiration dates, and the serial number of the device.18 Although it is clear that serial numbers on devices are precluded from limited data sets, it is uncertain whether batch or model numbers on devices are prohibited.

In addition, in limited data sets, the prohibition on providing patient-specific addresses more granular than ZIP Codes limits investigations of street-level disease burden and drug usage. Using a simulation model, Kamel Boulos et al.19 showed how data aggregation according to census tracts, which generally cover 1200 to 8000 people,20 can mask outbreaks of disease. Limited data sets may therefore prove to be suboptimal for postapproval research involving narrow clustering of outcomes, as in research on infectious diseases. They may also prove to be problematic if socioeconomic confounding is a concern, given that addresses can serve as proxies for income.21

Finally, the use of the limited-data-set pathway poses two risks of liability. First, a covered entity can be found to be in breach of HIPAA for failing to take corrective action if there is awareness of the recipient party's noncompliance with the data-use agreement.10 Second, because limited data sets are protected health information, covered entities can also face penalties for sharing more than the minimum amount of data necessary. The “minimum necessary” standard remains vague, with no case law and only minimal guidance explaining it. In the HITECH Act, Congress specified that all limited data sets would be deemed to be compliant with the minimum necessary standard until the DHHS provided clarification.5 This guidance is overdue and, once issued, it may affect the applicability of this pathway for postapproval research.

Safe Harbor

Covered entities face a lower risk of liability under the safe-harbor pathway. Sharing is not subject to the minimum necessary standard, and covered entities are not responsible for acting on their knowledge of unpermitted transactions by the recipients of data. They are liable only if the shared data are not properly deidentified.

The data restrictions involved in meeting the safe harbor, however, impose substantial restraints on the usefulness of the resulting data set. Of primary concern is the prohibition on sharing components of patient-specific dates other than the year. In many instances, covered entities can mitigate the restriction by sharing temporal data that do not contain actual dates. For example, informing researchers that a patient had a myocardial infarction 45 days after receiving a newly approved drug is normally permitted. Such date shifting is currently used in deidentified electronic medical-records systems run by institutions such as Vanderbilt University Medical Center22 and in publicly accessible clinical trials data sets from GlaxoSmithKline.23
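The relative-time approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the record layout, and the example dates are hypothetical and are not taken from the Vanderbilt or GlaxoSmithKline systems cited above.

```python
from datetime import date


def to_relative_days(index_date: date, events: dict[str, date]) -> dict[str, int]:
    """Replace each calendar date with the number of days since the index date.

    Hypothetical helper: the index date (here, the start of treatment) stays
    with the covered entity, and only the day offsets are shared.
    """
    return {name: (when - index_date).days for name, when in events.items()}


# Invented example record: drug started March 1, 2014; infarction April 15, 2014.
shared = to_relative_days(
    date(2014, 3, 1),
    {
        "drug_start": date(2014, 3, 1),
        "myocardial_infarction": date(2014, 4, 15),
    },
)
print(shared)  # {'drug_start': 0, 'myocardial_infarction': 45}
```

The shared record preserves the 45-day interval needed for causal inference while omitting the month and day of either event.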

Figure 1. Example of a Violation of the Date Restriction in the Safe-Harbor Pathway.

Given the study period of October 15, 2009, through January 31, 2010, if a physician informed researchers that a patient was vaccinated 200 days after a randomly chosen, fixed reference date and had an adverse event 250 days after the reference date, the researchers would know that the patient must have been vaccinated in the second half of 2009. Providing this information would violate the safe harbor, which prohibits covered entities from sharing all elements of patient-specific dates, except the year.

Date shifting is less feasible, however, for postapproval investigations of short duration or long-term collaborations involving routine data updates over short intervals, since the recipient party may be able to use the supplied data to infer the date of occurrence of events with greater specificity than the year. Figure 1 shows such a scenario in a postapproval observational investigation of the influenza A (H1N1) 2009 monovalent vaccine (Focetria) in Italy from October 15, 2009, through January 31, 2010. Data for the study were obtained from treating physicians in two periods: up to 3 weeks after vaccination and between 4 and 6 weeks after vaccination.24 If the study had been conducted in the United States and patient-specific data were obtained from primary care practices (i.e., covered entities) through date shifting, recipient researchers would know the date of vaccination with greater specificity than the year for many patients who had an adverse event in the second follow-up period. For example, if a physician reported that a patient had acute respiratory failure 40 days after vaccination, researchers could deduce (in violation of the safe harbor) that the patient was vaccinated in the latter half of 2009. Although many current postapproval studies of drugs and devices span a period of 3 years or more,25-27 there is a growing impetus to conduct these investigations within shorter time frames, increasing the chance that date shifting will contravene the safe harbor.
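The inference described above amounts to simple interval arithmetic, which the following sketch makes explicit. It assumes, for illustration only, that both the vaccination and the reported event must fall within the known study window; the function name is hypothetical.

```python
from datetime import date, timedelta


def exposure_date_bounds(study_start: date, study_end: date, days_to_event: int):
    """Return the earliest and latest possible exposure dates.

    Assumption (ours, for illustration): exposure and event both occur
    inside the study window, and the event follows exposure by exactly
    `days_to_event` days.
    """
    earliest = study_start
    latest = study_end - timedelta(days=days_to_event)
    if latest < earliest:
        raise ValueError("offset is longer than the study window")
    return earliest, latest


# Study window from the example: October 15, 2009, through January 31, 2010.
earliest, latest = exposure_date_bounds(date(2009, 10, 15), date(2010, 1, 31), 40)
print(earliest, latest)  # 2009-10-15 2009-12-22
```

Because both bounds fall between October and December 2009, a recipient who knows the study window learns more than the year of vaccination, even though no calendar date was shared.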

In addition, as compared with limited data sets, the safe harbor imposes greater restrictions on sharing information about devices and addresses. Although both pathways preclude the transfer of device identifiers, the safe harbor requires that covered entities certify that they have no knowledge that the shared data are identifiable. Such certification will prove difficult if covered entities seek to share batch numbers when the quantity of devices per batch is small and the batch numbers implicate a creation date that implies the date of allocation. This information would narrow the population of possible recipients of data considerably. Covered entities can also share only the first three digits of a ZIP Code and only in circumstances in which the region they encompass has a population of more than 20,000 people.11 Thus, postapproval studies conducted with the use of safe harbors are subject to increased limitations on their ability to produce descriptive information and to perform adjustment for confounding. Such limitations make it unfeasible to conduct a study similar to the one by Brownstein et al.,28 who mapped the frequency of abuse of specific prescription opioids according to three-digit ZIP Codes in New Mexico and adjusted for the availability of these drugs within these regions.
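The ZIP Code restriction can be expressed as a small rule. The sketch below follows the safe-harbor provision cited above, under which the first three digits may be kept only if the area they cover has more than 20,000 people and are otherwise replaced with 000. The function name and the population counts are hypothetical placeholders, not census figures.

```python
def safe_harbor_zip3(zip_code: str, zip3_population: dict[str, int],
                     threshold: int = 20_000) -> str:
    """Generalize a five-digit ZIP Code to the form allowed by the safe harbor.

    `zip3_population` maps each three-digit prefix to the total population of
    all ZIP Codes sharing that prefix (supplied by the caller; invented here).
    """
    zip3 = zip_code[:3]
    if zip3_population.get(zip3, 0) > threshold:
        return zip3
    return "000"  # prefix area too small to disclose


# Invented population counts, for illustration only.
populations = {"123": 250_000, "456": 12_000}

print(safe_harbor_zip3("12345", populations))  # 123
print(safe_harbor_zip3("45678", populations))  # 000
```

Patients in sparsely populated prefix areas are pooled under a single code, which is why fine-grained geographic mapping and area-based adjustment become impractical under this pathway.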

Expert Determination

Using the expert-determination pathway, covered entities can tailor the data they share to specific research needs, enabling transfer of otherwise precluded identifiers (Fig. S1 in the Supplementary Appendix, available with the full text of this article). Expert determination can protect patients' privacy as well as the safe-harbor pathway,29 which is generally considerably safer than limited data sets.30

Widespread usage of the expert-determination pathway, however, requires the generation of additional standards and the existence of experts who are willing to certify that the risk of identification is very small. Risk assessment will need to be tailored to the specific data shared and account for the capabilities and trustworthiness of the recipient.31 At the same time, data protection must be based on real, evidence-based “threat scenarios.”32 For example, a systematic review revealed that although some safe-harbor data can be reidentified,33 the rate of such instances can be small, suggesting that the protections of the safe-harbor pathway are sufficient in many cases.
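One simple quantity that such a risk assessment might examine is the share of records that are unique on a chosen set of quasi-identifiers. The sketch below is an illustrative assumption on our part, not a standard endorsed by the DHHS or a description of any certifier's method; the field names and records are invented.

```python
from collections import Counter


def share_of_unique_records(records: list[dict], quasi_identifiers: list[str]) -> float:
    """Fraction of records whose quasi-identifier combination occurs only once.

    A crude, hypothetical indicator: unique combinations are the easiest to
    link to outside data sources, so a lower value suggests lower risk.
    """
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    counts = Counter(keys)
    unique = sum(1 for key in keys if counts[key] == 1)
    return unique / len(keys)


# Invented records, for illustration only.
records = [
    {"birth_year": 1950, "sex": "F", "zip3": "123"},
    {"birth_year": 1950, "sex": "F", "zip3": "123"},
    {"birth_year": 1972, "sex": "M", "zip3": "123"},
    {"birth_year": 1985, "sex": "F", "zip3": "456"},
]
risk = share_of_unique_records(records, ["birth_year", "sex", "zip3"])
print(risk)  # 0.5
```

A real determination would also need to weigh the recipient's capabilities and the outside data realistically available to that recipient, as noted above.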

Expert certifiers face very little risk of liability under HIPAA. Although the HITECH Act expanded the requirements of HIPAA to the business associates of covered entities, it is usually covered entities, and not expert certifiers, who share data. Expert certifiers nevertheless could face possible tort action for negligence under state laws just as the developers of predictive analytics models may be subject to state product-liability claims.34 Given the infancy of the field, the standard of care required of an expert certifier and the cost of gauging the risk of identification incorrectly are unknown.

Objectively deficient certification could in theory also lead to risks of liability among covered entities. These risks are small, however, because covered entities should be able to rely on certifications, unless they have definitive evidence that the certifications are inadequate or they do not exercise reasonable care in the selection of an expert. As of this writing, no negligence actions have been taken against covered entities for sharing data after deficient certifications.


Although long-term postapproval studies can be successfully conducted with data shared through the safe-harbor pathway, the pathway is not well suited for investigations involving short follow-up or small-scale geographic variation in exposures, covariates, or outcomes. Furthermore, until the DHHS clarifies whether batch and model numbers are considered to be device identifiers under HIPAA, the usefulness of safe harbors for postapproval research on devices remains uncertain. By contrast, limited data sets are more conducive to postapproval studies, but they require a data-use agreement, pose a moderate risk of liability among covered entities, and, like safe harbors, are potentially problematic for investigations of devices or studies in which narrow clustering of diseases is a concern. Expert determination is a promising alternative for cases in which the above limitations are prohibitive, but additional steps are needed to make this pathway more viable.

We think that the DHHS should foster the development of additional standards for expert determination, including model assessments across a sample of interventions and diseases. One way to do so would be through the creation or designation of national deidentification centers of excellence.35 All parties would benefit if the DHHS clarified the risks of liability among expert certifiers and ensured that malpractice insurance is readily available to experts who are willing to offer their services. Greater attention to these issues may facilitate the ability of investigators to use shared data sets to conduct postapproval research of new drugs and devices.

Funding and Disclosures

Disclosure forms provided by the authors are available with the full text of this article.

This article was updated on October 23, 2014.

Author Affiliations

From the Program on Regulation, Therapeutics, and Law, Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston (A.S., A.S.K., J.J.G., S.S.); and the Department of Biomedical Informatics, School of Medicine, and the Department of Electrical Engineering and Computer Science, Vanderbilt University, Nashville (B.A.M.).

References

  1. Geller SE, Koch A, Pellettieri B, Carnes M. Inclusion, analysis, and reporting of sex and race/ethnicity in clinical trials: have we made progress? J Womens Health (Larchmt) 2011;20:315-320.

  2. Zulman DM, Sussman JB, Chen X, Cigolle CT, Blaum CS, Hayward RA. Examining the evidence: a systematic review of the inclusion and analysis of older adults in randomized controlled trials. J Gen Intern Med 2011;26:783-790.

  3. Darrow JJ. Crowdsourcing clinical trials. Minn Law Rev 2014;98:805-866.

  4. Health Insurance Portability and Accountability Act, Pub. L. No. 104-191, 110 Stat. 1936 (Aug. 21, 1996).

  5. Health Information Technology for Economic and Clinical Health Act, Title XIII of Division A and Title IV of Division B of the American Recovery and Reinvestment Act of 2009, Pub. L. No. 111-5, 123 Stat. 226 (Feb. 17, 2009).

  6. Uses and disclosures for which an authorization or opportunity to agree or object is not required, 45 C.F.R. § 164.512(b)(1).

  7. New drugs, 21 U.S.C. § 355(k)(3)(b).

  8. McGraw D, Rosati K, Evans B. A policy framework for public health uses of electronic health data. Pharmacoepidemiol Drug Saf 2012;21:Suppl 1:18-22.

  9. Psaty BM, Breckenridge AM. Mini-Sentinel and regulatory science -- big data rendered fit and functional. N Engl J Med 2014;370:2165-2167.

  10. Other requirements relating to uses and disclosures of protected health information, 45 C.F.R. § 164.514(e).

  11. Other requirements relating to uses and disclosures of protected health information, 45 C.F.R. § 164.514(b)(2).

  12. Other requirements relating to uses and disclosures of protected health information, 45 C.F.R. § 164.514(b)(1).

  13. Evans BJ. Institutional competence to balance privacy and competing values: the forgotten third prong of HIPAA preemption analysis. Univ Cal Davis Law Rev 2013;46:1175-1230.

  14. Rosenbaum S, Borzi PC, Burke T, Nath SW. Does HIPAA preemption pose a legal barrier to health information transparency and interoperability? BNA Health Care Policy Rep 2007;15:1-14.

  15. Sun LH. Clinic fined $4.3 million for failing to provide patients' medical records. Washington Post. February 21, 2011.

  16. Department of Health and Human Services, Office for Civil Rights. Case examples and resolution agreements.

  17. Reisz LP. State attorneys general wade further into HIPAA pool. Lexology. August 7, 2012.

  18. Food and Drug Administration. Unique device identification system: final rule. Fed Regist 2013;78:58786-58828.

  19. Kamel Boulos MN, Cai Q, Padget JA, Rushton G. Using software agents to preserve individual health data confidentiality in micro-scale geographical analyses. J Biomed Inform 2006;39:160-170.

  20. U.S. Census Bureau. Geographic terms and concepts — census tracts.

  21. Glover J, Rosman D, Tennant S. Unpacking analyses relying on area-based data: are the assumptions supportable? Int J Health Geogr 2004;3:30.

  22. Roden DM, Pulley JM, Basford MA, et al. Development of a large-scale de-identified DNA biobank to enable personalized medicine. Clin Pharmacol Ther 2008;84:362-369.

  23. Nisen P, Rockhold F. Access to patient-level data from GlaxoSmithKline clinical trials. N Engl J Med 2013;369:475-478.

  24. Candela S, Pergolizzi S, Ragni P, et al. An early (3-6 weeks) active surveillance study to assess the safety of pandemic influenza vaccine Focetria in a province of Emilia-Romagna region, Italy -- part one. Vaccine 2013;31:1431-1437.

  25. Funch D, Gydesen H, Tornoe K, Major-Pedersen A, Chan KA. A prospective, claims-based assessment of the risk of pancreatitis and pancreatic cancer with liraglutide compared to other antidiabetic drugs. Diabetes Obes Metab 2014;16:273-275.

  26. Burmester GR, Matucci-Cerinic M, Mariette X, et al. Safety and effectiveness of adalimumab in patients with rheumatoid arthritis over 5 years of therapy in a phase 3b and subsequent postmarketing observational study. Arthritis Res Ther 2014;16:R24.

  27. Ho PM, Maddox TM, Wang L, et al. Risk of adverse outcomes associated with concomitant use of clopidogrel and proton pump inhibitors following acute coronary syndrome. JAMA 2009;301:937-944.

  28. Brownstein JS, Green TC, Cassidy TA, Butler SF. Geographic information systems and pharmacoepidemiology: using spatial cluster detection to monitor local patterns of prescription opioid abuse. Pharmacoepidemiol Drug Saf 2010;19:627-637.

  29. Malin B, Benitez K, Masys D. Never too old for anonymity: a statistical standard for demographic data sharing via the HIPAA Privacy Rule. J Am Med Inform Assoc 2011;18:3-10.

  30. Benitez K, Malin B. Evaluating re-identification risks with respect to the HIPAA privacy rule. J Am Med Inform Assoc 2010;17:169-177.

  31. Department of Health and Human Services, Office for Civil Rights. Guidance regarding methods for de-identification of protected health information in accordance with the Health Insurance Portability and Accountability Act (HIPAA) privacy rule.

  32. El Emam K, Arbuckle L. Anonymizing health data: case studies and methods to get you started. Sebastopol, CA: O'Reilly Media, 2013.

  33. El Emam K, Jonker E, Arbuckle L, Malin B. A systematic review of re-identification attacks on health data. PLoS One 2011;6:e28071.

  34. Cohen IG, Amarasingham R, Shah A, Xie B, Lo B. The legal and ethical concerns that rise from using complex predictive analytics in health care. Health Aff (Millwood) 2014;33:1139-1147.

  35. Encouraging the use of, and rethinking protections for de-identified (and “anonymized”) health data. Washington, DC: Center for Democracy & Technology, June 2009.
