Race and Genetic Ancestry in Medicine — A Time for Reckoning with Racism
Dr. Borrell, Ms. Elhawary, and Dr. Burchard contributed equally to this article.
In the United States, race, ancestry, genetics, and medicine are inextricably linked in a complex and fraught history. Medicine is replete with examples of racial injustice inflicted by the use of race and ethnicity as biologic constructs to engender hierarchical discrimination. Race and ethnicity are dynamic, shaped by geographic, cultural, and sociopolitical forces; they can influence people’s socioeconomic position and lead to disproportionately high morbidity and mortality for racial and ethnic minorities by sustaining inequitable access to resources, including health care.1
Nevertheless, we believe that it is inappropriate to simply abandon the use of race and ethnicity in biomedical research and clinical practice, since these variables capture important epidemiologic information, including social determinants of health such as racism and discrimination, socioeconomic position, and environmental exposures. Eliminating the use of race/ethnicity, or implementing a race/ethnicity-blind approach, could enable inequitable health care systems to persist and exacerbate racial/ethnic inequities in health outcomes. Complementing the use of race/ethnicity with data on genetic ancestry, genotypes, or biomarkers might be useful, but risks and benefits should be analyzed carefully for specific clinical applications.
Racial Categorizations in the United States
The 1787 U.S. Constitutional Convention adopted the “Three-Fifths Compromise,” which considered each enslaved African to be three fifths of a person, allowing increased representation for the southern states in the House of Representatives, without what they saw as overtaxation. Thus, three racial categories were defined in the first U.S. Census in 1790 and became deeply ingrained in the social fabric of the United States: White people and Native Americans each counted as one whole tax-paying person, and slaves or Black people counted as three fifths of a person.2 Although the Three-Fifths Compromise was repealed in 1868, the U.S. Census continues to classify people based on their racial identification.3
The Office of Management and Budget classifies people by ethnicity as well as racial identification.4 Ethnicity (as in Hispanic/Latino) captures the common values, cultural norms, and behaviors of people who are linked by shared culture and language, whereas race refers to one’s identification with a group or identity ascribed on the basis of physical characteristics and skin color.5 Census questions are intended to reflect self-defined membership in a social category, without anthropologic or genetic meaning,6 and census data are used to determine resource allocation and political representation.
Race as a Master Status Variable
Race is considered a master status,7 or a primary identifying characteristic reflecting a social position ascribed to a person that may affect every aspect of their life. Race influences social interactions and access to opportunities and societal resources.8 For example, race was the driver of “redlining,” a legal form of residential segregation9 that resulted in disinvestment in education and social services, poor housing, limited community resources such as parks and grocery stores, unemployment, and poor access to health care for Black communities.
Race/ethnicity has been used to evaluate differences in clinical measures and outcomes and is used by researchers in established analytic approaches. Unfortunately, even after analysts control for socioeconomic indicators such as education and income, environmental exposures, and other established risk factors, they frequently observe a greater risk of adverse health outcomes among Black Americans than among White Americans. This increased risk is often reported without explanation or is presented as an intrinsic biologic difference between races. These “intrinsic differences” actually capture racialized expressions of biology or the embodiment of inequities related to unmeasured risk factors or exposures, including exposure to individual and structural racism.10
Genetic Ancestry and Admixture
In a society in which inequities in health care affect many disease outcomes, it may seem reasonable to assume that all racial/ethnic differences in disease incidence and outcomes derive from socioeconomic differences. However, race is also directly associated with genetic ancestry and therefore indirectly related to genetic variants that may affect disease and health outcomes. Genomewide genotyping methods and advanced computational algorithms now enable scientists to infer the geographic origins of a person’s ancestors from minute differences in the cumulative frequency of thousands of genetic variants (alleles). These methods and algorithms have been applied, without bias, to large populations worldwide. The largest genetic clusters of people correspond to geographic regions and specific populations in Africa, Europe, Asia, Oceania, and the Americas,11 suggesting that continental-level ancestry captures the greatest population differences in genetic variation. Ancestry assessment within continents can provide information on a finer scale.12
Although race/ethnicity correlates with genetic ancestry,13 it captures different information. Race and ethnicity are self-ascribed or socially ascribed identities and are often “assigned” by police, hospital staff, or others on the basis of physical characteristics. Genetic ancestry is the genetic origin of one’s population. Although race/ethnicity may capture information about the likely presence of certain genetic variants, ancestry is a better predictor.14 Genetic admixture, or genetic exchange among people from different ancestries, is an important characteristic of many populations and may correlate with individuals’ risk for certain genetic diseases.15 And there may be substantial variation in ancestry among and within populations16; U.S. Black populations, for example, have larger proportions of African than of European ancestry, which vary with the year and location in which samples are obtained.17 Latino Americans, the largest and fastest-growing U.S. minority population, are an admixed group of European, Native American, and African ancestries (Figure 1).18
The race/ethnicity categories used in biomedical research and clinical practice are broad and less precise than ancestry. Consider a Black–White biracial male firefighter who presents with a smoke-inhalation injury. How would he be classified? He could self-identify as Black or White, but society would probably label him as Black. From a clinical perspective, he is a combination of Black and White. This ambiguity may contribute to misdiagnosis and is particularly troubling when someone’s race/ethnicity is assigned by health professionals or police. In addition, different health systems may use different racial/ethnic categories. In contrast, ancestry is a fixed characteristic of the genome.
Ancestry testing using millions of genetic markers has significantly advanced our understanding of globally and geographically diverse populations, leading to improved clinical predictions. For example, in Black and Latino people, the proportion of African ancestry predicts differences in creatinine levels and estimated glomerular filtration rate (eGFR). When 10% of Latino people initially deemed to have stage 3 chronic kidney disease had their disease reclassified as stage 2 on the basis of ancestry, their electrolyte levels were more consistent with their ancestry-adjusted stage than their race-adjusted stage.19 In addition, validation of the eGFR equations within three Asian populations yielded different adjusted predicted values,20 suggesting that GFR varies within racial/ethnic groups. We do not yet know, however, whether ancestry adjustment leads to better estimation of GFR than do race-adjusted or race/ancestry-independent methods. The alarming decision by some health care institutions to remove race from GFR calculations ignores potential population differences without considering the clinical performance characteristics or consequences for Black patients.14,21 Though it may be tempting to consider ancestry in such equations, the true cause of observed racial differences in creatinine levels is unknown.
Racial/ethnic differences in risk for disease and response to treatments are partially related to biologic factors, including genetic and epigenetic variants. Using ancestry as a variable helps to capture and explain a portion of the biologic variation between and within groups. For example, in the first large-scale epigenetic study of asthma in minority children, ancestry explained 75% of the total variance in epigenetic patterns, suggesting that race/ethnicity, as a proxy for socioenvironmental exposures, explained the remaining 25%.22 Thus, race/ethnicity may be better than ancestry as a predictor of nongenetic factors. We would argue that both variables are important and are complementary in biomedical research and clinical practice.
Genetic Ancestry versus Individual Clinical Predictors
The National Institutes of Health has made a concerted effort to include racial/ethnic minority populations in biomedical and clinical studies. However, years of inadequate funding for research in these communities have created significant knowledge gaps regarding the generalizability of biomedical discoveries and clinical advances to non-White populations. Less than 2% of National Cancer Institute–funded clinical trials have included non-White participants.23
Still, population-specific genetic variants contributing to clinical differences between racial/ethnic groups have been identified using a limited number of racially/ethnically diverse studies. For example, genetic variants at the 6q25 locus identified in Latina women are associated with protection against breast cancer and originate from Indigenous American populations.24 APOL1 genotypes, which are more common among people with West African ancestry,25 are strongly associated with focal segmental glomerulosclerosis, nondiabetic kidney disease, and HIV nephropathy, which can lead to early-onset end-stage kidney failure.26 However, most people with the high-risk genotype do not have rapid progression to kidney failure, which suggests that additional genetic and nongenetic factors influence its effect.
Prostate cancer is more than twice as common among Black men as among White men.27 Genomewide association studies have identified variants at 8q24 that are associated with prostate-cancer risk in many populations, including variants that are more common in Black men and account for much of their excess risk of prostate cancer.28 In another example, a black-box warning added to Plavix (clopidogrel) in 2010 stated that “poor metabolizers may not receive the full benefit of Plavix treatment and may remain at risk for heart attack, stroke, and cardiovascular death.”29 Among people with no response to Plavix, as many as 75% of Asians and Pacific Islanders lack the CYP2C19 genetic polymorphism required to metabolize the prodrug into its active form.29,30 Although there are examples of genetic variants underlying racial/ethnic differences in disease occurrence or outcomes, more often the causes of such differences are unknown, either because unrecognized nongenetic factors are key or because genetic research has failed to incorporate racial/ethnic diversity.31
Globally diverse populations must be studied because genetic variation and genome architecture vary among populations. More than 80% of participants in existing genomewide association studies are of European background; Black and Latino people, who account for more than 30% of the U.S. population, are dramatically underrepresented (about 2% and <0.5%, respectively).31 Less than 4.5% of federally funded pulmonary research has included minority populations, despite evidence of significant population-specific differences in the distribution of genetic risk variants for common diseases such as asthma.32,33
Such disparities perpetuate the gap in access to precision medicine for non-White populations. For example, genetic variants within known cancer risk genes are well identified in populations of European ancestry, but often the same variants are classified as “variants of uncertain significance” in people of non-European ancestry.34 As the push toward precision medicine intensifies, this worrisome deficit in genetic research will grow, leaving much of the global population behind. Unless we act now, the promise of precision medicine will be available to, and benefit, only a select few.31,35
Furthermore, genetic studies of non-European populations are important even if genetic variants are not responsible for overall differences in disease incidence or outcomes. Specifically, the frequency and effect sizes of genetic variants associated with disease risk may vary across populations.31 Polygenic risk scores derived from studies of populations with European ancestry have less predictive power when applied to non-European populations.31 For example, the polygenic risk score for breast cancer is about one third as predictive for Black women as for women of European descent,36 a disparity with clear implications for the future of precision medicine.
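The point about effect sizes can be made concrete with a minimal sketch. A polygenic risk score is commonly computed as a weighted sum of risk-allele counts, with the weights taken from association studies. The variant count, the genotype, and both sets of weights below are hypothetical values chosen only for illustration; they are not drawn from the studies cited here.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of risk-allele counts (0, 1, or 2 per variant)."""
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical genotype: risk-allele counts at three variants.
genotype = [0, 1, 2]

# Hypothetical per-allele effect sizes (e.g., log odds ratios).
# The first set stands in for weights estimated in a European-ancestry study;
# the second stands in for the effects that would be estimated in another population.
weights_from_european_study = [0.10, 0.20, 0.30]
weights_in_other_population = [0.10, 0.05, 0.12]

score_as_published = polygenic_risk_score(genotype, weights_from_european_study)
score_population_specific = polygenic_risk_score(genotype, weights_in_other_population)

print(f"score with borrowed weights:            {score_as_published:.2f}")
print(f"score with population-specific weights: {score_population_specific:.2f}")
```

If the true effects differ between populations, as in this hypothetical case, a score built from borrowed weights ranks people by the wrong quantity, which is one way predictive power can be lost.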
Informed Use of Race, Ethnicity, and Ancestry
Race, ethnicity, and ancestry have a complex and intertwined relationship that demands nuanced analyses. We believe that associations between race/ethnicity and disease outcomes should be interpreted carefully and that we should not assume that environmental, social, or genetic factors represent the only contributors to a given disease until causation has been proven. Conversely, we should avoid assuming that genetic causes have been ruled out, as this could undermine the discovery of genetic variants like the 8q24 variants that may partially explain increased prostate-cancer incidence among Black men.28
We believe that decisions regarding the use of race/ethnicity as a predictor in algorithms and mathematical risk models should consider whether the model’s underlying data are strongly associated with race/ethnicity and whether the inclusion or exclusion of race/ethnicity results in better health outcomes and reduced health inequities. For example, it has been claimed that race adjustment may overestimate the GFR in some Black patients and contribute to delays in referral for renal transplantation, but the nonadjusted equation may underestimate Black patients’ GFR, resulting in underdosage or denial of certain medications or foreclosed opportunities for kidney donation. An alternative approach is to calculate the eGFR using cystatin C, a biomarker of renal function, instead of creatinine, but the related testing costs are significantly higher.
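To illustrate how much a race coefficient can matter, the sketch below assumes the 2009 CKD-EPI creatinine equation, one widely used race-adjusted eGFR equation; the text above does not say which equation it has in mind, so that choice is an assumption. The patient values are hypothetical, and the category cutoffs are the standard GFR ranges. The function names are illustrative.

```python
def ckd_epi_2009_egfr(scr_mg_dl, age_years, female, apply_race_coefficient):
    """eGFR (mL/min/1.73 m^2) from the 2009 CKD-EPI creatinine equation.

    apply_race_coefficient toggles the 1.159 multiplier that the equation
    specifies for patients recorded as Black.
    """
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = (
        141.0
        * (min(ratio, 1.0) ** alpha)
        * (max(ratio, 1.0) ** -1.209)
        * (0.993 ** age_years)
    )
    if female:
        egfr *= 1.018
    if apply_race_coefficient:
        egfr *= 1.159
    return egfr


def gfr_category(egfr):
    """Map an eGFR to GFR categories 1 to 5 using the standard cutoffs.

    Note: a clinical diagnosis at categories 1 and 2 also requires other
    evidence of kidney damage; this function looks at eGFR alone.
    """
    if egfr >= 90:
        return 1
    if egfr >= 60:
        return 2
    if egfr >= 30:
        return 3
    if egfr >= 15:
        return 4
    return 5


# Hypothetical patient: 60-year-old man, serum creatinine 1.4 mg/dL.
unadjusted = ckd_epi_2009_egfr(1.4, 60, female=False, apply_race_coefficient=False)
adjusted = ckd_epi_2009_egfr(1.4, 60, female=False, apply_race_coefficient=True)

print(f"without race coefficient: {unadjusted:.1f} -> category {gfr_category(unadjusted)}")
print(f"with race coefficient:    {adjusted:.1f} -> category {gfr_category(adjusted)}")
```

For this hypothetical patient the unadjusted estimate is roughly 54 and the adjusted estimate roughly 63, so the same creatinine value falls on either side of the 60 mL/min/1.73 m^2 cutoff. The sketch shows only that the coefficient can change a classification; it does not show which estimate is closer to the measured GFR.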
Similarly, race-specific reference equations for lung function reflect the lower average measures of normal lung function observed in non-White groups.37,38 Consequently, relative to the equations derived from White populations, those derived from Black populations will yield higher percent-predicted values for the same measured lung function, which could lead to underestimating the severity of lung disease, with clinical implications including delayed detection, missed opportunities for medical management of symptoms, denial of disability claims, and delayed access to lifesaving treatments such as lung transplantation. Conversely, using an equation derived from White populations in other racial/ethnic groups may lead to overdiagnosis, excessive follow-up testing, anxiety for patients, and compromised eligibility for treatments such as stem-cell transplantation for cancer.39 Moreover, the application of White-derived lung- and kidney-function equations to Black patients ignores long-recognized racial/ethnic differences in normal physiological function or biomarkers and is itself a form of racial discrimination.
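The arithmetic behind this effect is simple: percent predicted is the measured value divided by the reference value, times 100, so a lower reference value yields a higher percentage for the same measurement. In the sketch below, the measured FEV1, both reference values, and the 80% cutoff are hypothetical numbers chosen for illustration, not outputs of any published reference equation.

```python
def percent_predicted(observed, predicted):
    """Observed lung-function value as a percentage of the reference value."""
    if predicted <= 0:
        raise ValueError("predicted value must be positive")
    return 100.0 * observed / predicted


# Hypothetical measurement and hypothetical reference values (litres).
observed_fev1 = 2.40
predicted_from_white_reference = 3.20
predicted_from_black_reference = 2.75  # lower reference, as described above

# Hypothetical cutoff below which a result would be flagged as reduced.
cutoff_percent = 80.0

pct_white_ref = percent_predicted(observed_fev1, predicted_from_white_reference)
pct_black_ref = percent_predicted(observed_fev1, predicted_from_black_reference)

for label, pct in [("White-derived reference", pct_white_ref),
                   ("Black-derived reference", pct_black_ref)]:
    flag = "flagged as reduced" if pct < cutoff_percent else "not flagged"
    print(f"{label}: {pct:.1f}% of predicted ({flag})")
```

With these hypothetical numbers the same measurement is flagged under one reference and not under the other, which is the mechanism behind both the underestimation and the overdiagnosis concerns described above.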
As noted above, adjusting eGFR for ancestry rather than race could result in reclassification of patients’ kidney disease. However, before ancestry adjustment is widely adopted, it is important to demonstrate that it provides results at least as accurate as those of race adjustment. Ideally, ancestry-adjusted results should be evaluated on the basis of prediction of disease or clinically significant outcomes. In several diverse cohorts, for example, mathematical risk models of lung function that included ancestry plus self-identified race/ethnicity yielded more strongly predictive results than models including only self-identified race/ethnicity.40 Data from longitudinal clinical studies of diverse populations evaluated for kidney and lung disease are needed to determine whether race-based equations, ancestry-adjusted equations, or equations that ignore both variables better predict clinically significant outcomes such as diagnosis, disease severity, prognosis, risk of surgical complications, and eligibility for lung transplantation. This debate should prompt the National Institutes of Health and its disease-focused and organ-based institutes (in particular, the National Institute of Diabetes and Digestive and Kidney Diseases and the National Heart, Lung, and Blood Institute) to challenge researchers to determine which prediction equation is the most clinically accurate.
Even where there is known genetic variation related to specific diseases, the use of race/ethnicity may be important in measuring and addressing nongenetic causes of health inequities. Although the higher incidence of prostate cancer among Black men, for example, may be partially explained by genetic variants,28 ancestry may be less important than race/ethnicity in determining clinical outcomes: among men with prostate cancer, race/ethnicity is associated with disparities in access and treatment.41,42
Although some such disparities may be partially captured by careful attention to socioeconomic factors, others may be more deeply rooted in racial stratification, which drives access to care, bias, and racial discrimination or racism. For example, access to organ transplantation is systematically lower for Black patients with end-stage renal disease than for their White counterparts,43 possibly owing in part to physician bias.44 Attention to race/ethnicity is important not only for documenting disparities; interventions designed to reduce disparities have been demonstrated to improve outcomes.45
Considering genetic ancestry in addition to self-identified race/ethnicity has improved our understanding of disease and facilitated the development of interventions. But for many conditions, the relative importance of bias, racial discrimination, culture, socioeconomic status, access to care, environmental factors, and genetics to racial/ethnic differences in disease has not been adequately studied. The combination of these influential correlates of health is captured, albeit imperfectly, by the variable of race/ethnicity, and ignoring it would be counterproductive.
Indeed, we contend that the epidemiologic importance of race/ethnicity will never disappear. Genetic research has advanced our understanding of human disease and therapies that, if made available equitably, could advance care and promote health equity in all groups. But we also recognize that financial, privacy, and societal costs associated with advances in genetics and medicine could exacerbate racial/ethnic health inequities. Therefore, ignoring race and ethnicity in biomedical research and medicine is not the answer to the health-inequity epidemic. Instead, scientists and clinicians should continue to use racial/ethnic categories to address and eliminate health inequities until better predictors are available.
By attending to these issues, we can further elucidate variations in disease onset, progression, and severity among and within racial/ethnic groups. Furthermore, given the emergence of precision medicine and the persistent salience of overt racism, abandoning race/ethnicity without substituting better disease predictors not only is irresponsible but also ignores the reality of U.S. social stratification and its implications for population health.
Funding and Disclosures
Disclosure forms provided by the authors are available at NEJM.org.
This article was published on January 6, 2021, at NEJM.org.
1. Health, United States, 2015: with special feature on racial and ethnic health disparities. Hyattsville, MD: National Center for Health Statistics, 2016.
2. Nobles M. Shades of citizenship: race and the Census in modern politics. Stanford, CA: Stanford University Press, 2000.
3. Nobles M. History counts: a comparative analysis of racial/color categorization in US and Brazilian censuses. Am J Public Health 2000;90:1738-1745.
4. United States Census 2020. Questions asked on the form. 2020 (https://2020census.gov/en/about-questions.html).
5. Borrell LN. Racial identity among Hispanics: implications for health and well-being. Am J Public Health 2005;95:379-381.
6. United States Census 2020. 2020 Census questions: race. 2020 (https://2020census.gov/en/about-questions/2020-census-questions-race.html).
7. Hughes EC. Dilemmas and contradictions of status. Am J Sociol 1945;50:353-359.
8. Master status. In: Bell K, ed. Open education sociology dictionary. 2013 (https://sociologydictionary.org/master-status/).
9. Gross T. A ‘forgotten history’ of how the U.S. government segregated America. National Public Radio (NPR). May 3, 2017 (https://www.npr.org/2017/05/03/526655831/a-forgotten-history-of-how-the-u-s-government-segregated-america).
10. Krieger N. Embodying inequality: a review of concepts, measures, and methods for studying health consequences of discrimination. Int J Health Serv 1999;29:295-352.
11. Jakobsson M, Scholz SW, Scheet P, et al. Genotype, haplotype and copy-number variation in worldwide human populations. Nature 2008;451:998-1003.
12. Bergström A, McCarthy SA, Hui R, et al. Insights into human genetic variation and population history from 929 diverse genomes. Science 2020;367(6484):eaay5012.
13. Banda Y, Kvale MN, Hoffmann TJ, et al. Characterizing race/ethnicity and genetic ancestry for 100,000 subjects in the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort. Genetics 2015;200:1285-1295.
14. Denny JC. Chapter 13: mining electronic health records in the genomics era. PLoS Comput Biol 2012;8(12):e1002823.
15. Hoffman JD, Park JJ, Schreiber-Agus N, et al. The Ashkenazi Jewish carrier screening panel: evolution, status quo, and disparities. Prenat Diagn 2014;34:1161-1167.
16. Micheletti SJ, Bryc K, Ancona Esselmann SG, et al. Genetic consequences of the transatlantic slave trade in the Americas. Am J Hum Genet 2020;107:265-277.
17. Bryc K, Durand EY, Macpherson JM, Reich D, Mountain JL. The genetic ancestry of African Americans, Latinos, and European Americans across the United States. Am J Hum Genet 2015;96:37-53.
18. González Burchard E, Borrell LN, Choudhry S, et al. Latino populations: a unique opportunity for the study of race, genetics, and social environment in epidemiological research. Am J Public Health 2005;95:2161-2168.
19. Udler MS, Nadkarni GN, Belbin G, et al. Effect of genetic African ancestry on eGFR and kidney disease. J Am Soc Nephrol 2015;26:1682-1692.
20. Teo BW, Zhang L, Guh J-Y, et al. Glomerular filtration rates in Asians. Adv Chronic Kidney Dis 2018;25:41-48.
21. Powe NR. Black kidney function matters: use or misuse of race? JAMA 2020;324:737-738.
22. Galanter JM, Gignoux CR, Oh SS, et al. Differential methylation between ethnic sub-groups reflects the effect of genetic ancestry and environmental exposures. Elife 2017;6:e20532.
23. Chen MS Jr, Lara PN, Dang JH, Paterniti DA, Kelly K. Twenty years post-NIH Revitalization Act: Enhancing Minority Participation in Clinical Trials (EMPaCT): laying the groundwork for improving minority clinical trial accrual: renewing the case for enhancing minority participation in cancer clinical trials. Cancer 2014;120:Suppl 7:1091-1096.
24. Liu Y, Sun L, Liu J, et al. Multicolor (Vis-NIR) mesoporous silica nanospheres linked with lanthanide complexes using 2-(5-bromothiophen)imidazo[4,5-f][1,10]phenanthroline for in vitro bioimaging. Dalton Trans 2015;44:237-246.
25. Nadkarni GN, Gignoux CR, Sorokin EP, et al. Worldwide frequencies of APOL1 renal risk variants. N Engl J Med 2018;379:2571-2572.
26. Kopp JB, Nelson GW, Sampath K, et al. APOL1 genetic variants in focal segmental glomerulosclerosis and HIV-associated nephropathy. J Am Soc Nephrol 2011;22:2129-2137.
27. National Cancer Institute. Prostate cancer: recent trends in SEER age-adjusted incidence rates, 2000–2017 (https://seer.cancer.gov/explorer/application.html?site=66&data_type=1&graph_type=2&compareBy=race&chk_race_5=5&chk_race_4=4&chk_race_3=3&chk_race_6=6&chk_race_2=2&hdn_sex=2&age_range=1&stage=101&rate_type=1&advopt_precision=1&advopt_display=2).
28. Han Y, Rand KA, Hazelett DJ, et al. Prostate cancer susceptibility in men of African ancestry at 8q24. J Natl Cancer Inst 2016;108:108-108.
29. Food and Drug Administration. FDA drug safety communication: reduced effectiveness of Plavix (clopidogrel) in patients who are poor metabolizers of the drug. March 12, 2010 (https://www.fda.gov/drugs/postmarket-drug-safety-information-patients-and-providers/fda-drug-safety-communication-reduced-effectiveness-plavix-clopidogrel-patients-who-are-poor).
30. Wu AH, White MJ, Oh S, Burchard E. The Hawaii clopidogrel lawsuit: the possible effect on clinical laboratory testing. Per Med 2015;12:179-181.
31. Martin AR, Kanai M, Kamatani Y, Okada Y, Neale BM, Daly MJ. Clinical use of current polygenic risk scores may exacerbate health disparities. Nat Genet 2019;51:584-591.
32. Burchard EG, Oh SS, Foreman MG, Celedón JC. Moving toward true inclusion of racial/ethnic minorities in federally funded studies: a key step for achieving respiratory health equality in the United States. Am J Respir Crit Care Med 2015;191:514-521.
33. Torgerson DG, Ampleford EJ, Chiu GY, et al. Meta-analysis of genome-wide association studies of asthma in ethnically diverse North American populations. Nat Genet 2011;43:887-892.
34. Manrai AK, Funke BH, Rehm HL, et al. Genetic misdiagnoses and the potential for health disparities. N Engl J Med 2016;375:655-665.
35. Sirugo G, Williams SM, Tishkoff SA. The missing diversity in human genetic studies. Cell 2019;177:26-31.
36. Huo D, Hu H, Rhie SK, et al. Comparison of breast cancer molecular features and survival by African and European ancestry in the Cancer Genome Atlas. JAMA Oncol 2017;3:1654-1662.
37. Quanjer PH, Stanojevic S, Cole TJ, et al. Multi-ethnic reference values for spirometry for the 3-95-yr age range: the Global Lung Function 2012 equations. Eur Respir J 2012;40:1324-1343.
38. Hankinson JL, Odencrantz JR, Fedan KB. Spirometric reference values from a sample of the general U.S. population. Am J Respir Crit Care Med 1999;159:179-187.
39. Chien JW, Sullivan KM. Carbon monoxide diffusion capacity: how low can you go for hematopoietic cell transplantation eligibility? Biol Blood Marrow Transplant 2009;15:447-453.
40. Moreno-Estrada A, Gignoux CR, Fernández-López JC, et al. Human genetics: the genetics of Mexico recapitulates Native American substructure and affects biomedical traits. Science 2014;344:1280-1285.
41. Stokes WA, Hendrix LH, Royce TJ, et al. Racial differences in time from prostate cancer diagnosis to treatment initiation: a population-based study. Cancer 2013;119:2486-2493.
42. Crews DC, Liu Y, Boulware LE. Disparities in the burden, outcomes, and care of chronic kidney disease. Curr Opin Nephrol Hypertens 2014;23:298-305.
43. Epstein AM, Ayanian JZ, Keogh JH, et al. Racial disparities in access to renal transplantation — clinically appropriate or due to underuse or overuse? N Engl J Med 2000;343:1537-1544.
44. Ayanian JZ, Cleary PD, Keogh JH, Noonan SJ, David-Kasdan JA, Epstein AM. Physicians’ beliefs about racial differences in referral for renal transplantation. Am J Kidney Dis 2004;43:350-357.
45. Victor RG, Lynch K, Li N, et al. A cluster-randomized trial of blood-pressure reduction in Black barbershops. N Engl J Med 2018;378:1291-1301.