Why Greater Diversity is Needed in Genomic Research

Including participants of diverse ancestries in genome-wide association studies improves disease risk prediction, diagnosis, and development of treatments

Michelle Dotzert, PhD

In 1990, an international team of scientists began the Human Genome Project—an enormous undertaking to sequence and map all the genes in the human genome. By 2003, the Human Genome Project was complete, and the team had elucidated the genetic blueprints for our species. This landmark feat set the stage for further genomic research and has led to significant advancements in precision medicine, which has the potential to revolutionize patient care by enabling earlier diagnoses and individually tailored therapy.

Unfortunately, the genomic research that underlies these developments in precision medicine suffers from a striking lack of diversity, which may restrict its reach to select populations and further widen growing disparities in health care. A 2009 analysis revealed that 96 percent of participants in genome-wide association studies (GWAS) were of European descent. Despite some recent improvement (by 2016, 20 percent of GWAS participants were of non-European descent), people of African and Latin American ancestry, as well as Hispanic people and indigenous peoples, continue to be underrepresented in genomic research.1

In addition to addressing some of the social, legal, and ethical implications of underrepresentation, working toward greater diversity in genomic research will make it easier for researchers to identify genetic variants and predict, diagnose, and treat disease across all populations. 

Increased fine-mapping resolution

In GWAS, the genomes of many individuals are analyzed to identify genotype-phenotype associations, which provide insight into disease susceptibility, drug targets, disease biomarkers, and risk prediction for various therapeutics. Whole-genome sequencing technologies may be used for GWAS, but single-nucleotide polymorphism (SNP) arrays are the most widely used genotyping technology.2 
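As a rough illustration of what a single genotype-phenotype association test can look like, the sketch below runs a Pearson chi-square test on a 2x2 table of allele counts for one SNP. The function name, the allele counts, and the use of this particular test are assumptions made for the example; real GWAS pipelines typically use regression models with covariates and dedicated software.

```python
import math

def allelic_association(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square test (1 degree of freedom) on a 2x2 allele-count table.

    Assumes every count is greater than zero.
    """
    table = [[case_alt, case_ref], [control_alt, control_ref]]
    total = case_alt + case_ref + control_alt + control_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + control_alt, case_ref + control_ref]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_alt * control_ref) / (case_ref * control_alt)
    return chi2, p_value, odds_ratio

# Hypothetical allele counts for one SNP (not taken from any study).
chi2, p_value, odds_ratio = allelic_association(
    case_alt=600, case_ref=1400, control_alt=450, control_ref=1550
)
GENOME_WIDE_THRESHOLD = 5e-8  # conventional GWAS significance threshold
print(f"chi2={chi2:.2f}, p={p_value:.2e}, OR={odds_ratio:.2f}")
print("genome-wide significant:", p_value < GENOME_WIDE_THRESHOLD)
```

Because a GWAS repeats a test like this across hundreds of thousands of SNPs or more, the significance threshold is far stricter than the usual 0.05.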

GWASs compare common genetic variants (often SNPs) between a baseline population and a population with a trait of interest, such as height or a particular disease, to identify trait-associated genetic loci. Using fine-mapping techniques, it is possible to identify the particular genetic variants within these regions that are likely to causally influence the trait. The inclusion of more diverse participants in genomics research has been shown to significantly increase fine-mapping resolution of these causal variants.3,4 The effects of ancestral diversity on fine-mapping resolution may be assessed based on the number of SNPs within a credible set, with smaller credible sets containing fewer SNPs indicative of greater resolution.3 In one study, for example, combining 95 percent credible sets (sets of plausible causal variants) from individuals of European descent with those from a more diverse population significantly shrank the credible sets, whereas the addition of data from another population of European descent had no effect.5
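To make the idea of a credible set concrete, here is a minimal sketch that builds a 95 percent credible set by ranking SNPs by posterior probability of being causal and adding them until the cumulative probability reaches 0.95. The SNP labels and probabilities are hypothetical, chosen only to show how a sharper posterior distribution (as might result when ancestries with different linkage disequilibrium patterns are combined) yields a smaller set; the function name is likewise an assumption.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest list of SNPs whose posterior probabilities sum to >= coverage."""
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for snp, prob in ranked:
        chosen.append(snp)
        cumulative += prob
        if cumulative >= coverage:
            break
    return chosen

# Hypothetical posteriors at one locus from a single-ancestry analysis:
# correlated SNPs share the probability, so the set stays large.
single_ancestry = {"snp1": 0.30, "snp2": 0.25, "snp3": 0.20,
                   "snp4": 0.15, "snp5": 0.06, "snp6": 0.04}

# Hypothetical posteriors at the same locus after adding diverse ancestries:
# the probability concentrates on fewer SNPs.
multi_ancestry = {"snp1": 0.80, "snp2": 0.16, "snp3": 0.02,
                  "snp4": 0.01, "snp5": 0.005, "snp6": 0.005}

single_set = credible_set(single_ancestry)
multi_set = credible_set(multi_ancestry)
print(len(single_set), "SNPs in the single-ancestry credible set")
print(len(multi_set), "SNPs in the multi-ancestry credible set")
```

Fewer SNPs in the set means fewer candidates to follow up experimentally, which is the sense in which resolution improves.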

Identification of previously unknown variants 

Greater diversity in genetic research can provide insights into specific pathogenic variants that differ across populations. This knowledge may be used to guide testing and appropriate clinical interventions. 

Mendelian diseases are often the result of a single pathogenic variant that occurs across different populations, but there are notable exceptions that demonstrate the importance of studying more diverse populations. Cystic fibrosis is reported much less often among individuals of African descent, largely because it has been underdiagnosed: the causative allele frequently differs in this population compared to populations of European descent. The ΔF508 allele in the CFTR gene accounts for over 70 percent of cystic fibrosis cases in Europeans, but only 29 percent of cases among individuals of African ancestry.6 A different mutation, 3120+1G>A, accounts for between 15 and 65 percent of cystic fibrosis chromosomes among individuals of South African ancestry.7 These and other important findings were only possible because the underlying studies included more diverse datasets.

Identifying disease-causing variants in diverse populations also reduces the risk of false negatives and missed diagnoses. For example, type 2 diabetes (T2D) is often diagnosed using glycated hemoglobin (HbA1c), a measure of blood glucose levels over time. However, genetic variation may produce changes in HbA1c that are unrelated to glycemic pathways and do not reflect T2D risk. Using genome-wide association meta-analyses in individuals from numerous cohorts of European, African, East Asian, and South Asian ancestry, researchers discovered an X-linked G6PD G202A variant that can shorten red blood cell lifespan and reduce HbA1c regardless of blood glucose levels.8 As approximately 11 percent of individuals of African American ancestry carry at least one copy of the variant, relying exclusively on HbA1c to test for T2D could leave up to two percent of cases in this population undiagnosed.8
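The reasoning behind these missed diagnoses can be sketched with a simple threshold check. In the example below, the 6.5 percent HbA1c diagnostic cutoff is the commonly used clinical threshold rather than a figure from this article, and the size of the variant's effect, the patient values, and the function names are all hypothetical, chosen only to show how a fixed cutoff can miss carriers.

```python
DIAGNOSTIC_THRESHOLD = 6.5  # HbA1c (%) cutoff commonly used to diagnose diabetes
HYPOTHETICAL_VARIANT_OFFSET = 0.8  # assumed HbA1c reduction in carriers; illustrative only

def measured_hba1c(glycemia_equivalent, carrier):
    """HbA1c a lab would report, given the value blood glucose alone would produce."""
    return glycemia_equivalent - (HYPOTHETICAL_VARIANT_OFFSET if carrier else 0.0)

def is_missed(glycemia_equivalent, carrier):
    """True when glycemia warrants a diagnosis but the measured value is below the cutoff."""
    return (glycemia_equivalent >= DIAGNOSTIC_THRESHOLD
            and measured_hba1c(glycemia_equivalent, carrier) < DIAGNOSTIC_THRESHOLD)

# Hypothetical patients: (HbA1c expected from blood glucose alone, carries the variant?)
patients = [(7.0, True), (7.0, False), (7.6, True), (6.0, True), (6.6, False)]
missed = [patient for patient in patients if is_missed(*patient)]
print(f"{len(missed)} of {len(patients)} hypothetical patients would be missed")
```

Only carriers whose glucose-driven HbA1c sits just above the cutoff are affected, which is why the share of missed cases is small but not negligible at the population level.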

More relevant data for pharmacogenomics-based health care

Our unique genetic makeup also determines how we respond to various drugs. This information can potentially be used to guide clinical decisions, such as selecting an appropriate drug and dosage that is more likely to produce the desired effect and minimize the risk of adverse events. Increasing population diversity in GWASs can help to pinpoint susceptibility regions and, with fine mapping, identify causal variants.

Warfarin, a widely prescribed oral anticoagulant with a narrow therapeutic index, is associated with severe drug-related adverse events. The VKORC1 and CYP2C9 genes contribute to warfarin dose variability, but account for less variability among individuals of African descent than among those of European or Asian descent.9 Despite this, genotype-guided dosing algorithms were developed based on GWASs in which African Americans were largely underrepresented. Subsequent GWASs that included populations of African American adults taking warfarin led to the discovery of a novel CYP2C SNP associated with a clinically relevant effect on warfarin dose in this population, independent of the CYP2C9*2 and CYP2C9*3 alleles.10 These findings led the United States Food & Drug Administration to update the warfarin dosing label in 2010 to include initial dosing ranges for patients with various CYP2C9 and VKORC1 genotypes.

The prevalence of asthma in the United States is highest among Puerto Ricans and African Americans, and bronchodilator drug response (BDR) is lowest among these populations. GWASs have identified SNPs associated with BDR in populations of European descent, but more diverse studies are lacking. Recently, a whole-genome sequencing (WGS) study conducted to identify variants important to BDR among racially diverse children led to the discovery of 27 variants that explained much of the variation among Puerto Rican, Mexican, and African American individuals.11 The findings can help to guide the development of novel therapeutics and enhance precision medicine within these populations. The study also highlights the urgent need for greater diversity within genomic research, as the authors note that a lack of comparable replication cohorts hampered their efforts to replicate rare variant associations.

Advances in precision medicine have the potential to dramatically change disease risk prediction, diagnosis, and treatment. Currently, developments in precision medicine are largely based on genomic research in populations of European ancestry, and numerous populations are excluded from these benefits because of underrepresentation. The inclusion of more diverse populations in genomic research will serve to improve precision medicine for all.


1. Popejoy, A. B. & Fullerton, S. M. Genomics is failing on diversity. Nature 538, 161–164 (2016).

2. Tam, V. et al. Benefits and limitations of genome-wide association studies. Nat. Rev. Genet. 20, 467–484 (2019).

3. Asimit, J. L., Hatzikotoulas, K., McCarthy, M., Morris, A. P. & Zeggini, E. Trans-ethnic study design approaches for finemapping. Eur. J. Hum. Genet. 24, 1330–1336 (2016).

4. Zaitlen, N., Paşaniuc, B., Gur, T., Ziv, E. & Halperin, E. Leveraging genetic variability across populations for the identification of causal variants. Am. J. Hum. Genet. 86, 23–33 (2010).

5. Wojcik, G. L. et al. Genetic analyses of diverse populations improves discovery for complex traits. Nature 570, 514–518 (2019).

6. Sirugo, G., Williams, S. M. & Tishkoff, S. A. The Missing Diversity in Human Genetic Studies. Cell 177, 26–31 (2019).

7. Padoa, C., Goldman, A., Jenkins, T. & Ramsay, M. Cystic fibrosis carrier frequencies in populations of African origin. J. Med. Genet. 36, 41–44 (1999).

8. Wheeler, E. et al. Impact of common genetic determinants of Hemoglobin A1c on type 2 diabetes risk and diagnosis in ancestrally diverse populations: A transethnic genome-wide meta-analysis. PLoS Med. 14, 1–30 (2017).

9. Limdi, N. A. et al. Influence of CYP2C9 and VKORC1 on warfarin dose, anticoagulation attainment and maintenance among European American and African Americans. Pharmacogenomics 9, 511–526 (2008).

10. Perera, M. A. et al. Genetic variants associated with warfarin dose in African-American individuals: A genome-wide association study. Lancet 382, 790–796 (2013).

11. Mak, A. C. Y. et al. Whole-Genome sequencing of pharmacogenetic drug response in racially diverse children with asthma. Am. J. Respir. Crit. Care Med. 197, 1552–1564 (2018).