Solving Rare Disease Mysteries with Genomics

The Solve-RD consortium is striving to improve rare disease diagnostics using genomics

Rare diseases are defined by their prevalence: any individual disease affects only a small proportion of a population. Collectively, however, rare diseases are estimated to affect 3.5–5.9 percent of people worldwide.1 An estimated 80 percent of rare diseases have a genetic cause, and sequencing is often used to aid diagnosis.2 But even with advances in sequencing techniques, many people living with a rare disease remain undiagnosed, as the molecular causes of many rare diseases are unknown.3 Even for rare diseases that have a diagnostic test available, diagnosis can be a lengthy process, in part due to the heterogeneity of these diseases and their clinical presentation.3

The Solve-RD consortium aims to improve rare disease diagnoses by combining expertise to reevaluate existing sequencing data, and incorporating new data from additional techniques.3

Advanced sequencing for rare disease diagnosis

Advanced sequencing methods, such as whole exome sequencing (WES) and whole genome sequencing (WGS), are improving rare disease diagnoses and becoming integral techniques in identifying the genetic causes of rare diseases.4 WGS covers both coding and noncoding regions of the genome, whereas WES includes only the coding regions. The sequence data is typically screened against a reference (such as known rare disease genes or variants). Matches between a patient’s genetic sequence and the reference are reported, guiding clinical diagnosis.5
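The screening step described above can be sketched as a simple lookup. This is a minimal illustration, not a clinical pipeline: the `Variant` type, the `KNOWN_VARIANTS` catalogue, and every coordinate and disorder name in it are hypothetical placeholders.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    """A variant identified by chromosome, position, and alleles."""
    chrom: str
    pos: int
    ref: str
    alt: str


# Hypothetical catalogue of known disease-associated variants.
# All values are placeholders, not real annotations.
KNOWN_VARIANTS = {
    Variant("1", 1000, "A", "G"): "Example disorder A",
    Variant("X", 2500, "C", "T"): "Example disorder B",
}


def screen(patient_variants, known=KNOWN_VARIANTS):
    """Return (variant, disorder) pairs for patient variants in the catalogue."""
    return [(v, known[v]) for v in patient_variants if v in known]


patient = [Variant("1", 1000, "A", "G"), Variant("2", 500, "T", "C")]
matches = screen(patient)
for variant, disorder in matches:
    print(variant, "->", disorder)
```

In practice, matching also depends on normalizing how variants are represented and on the reference genome build, which this sketch leaves out.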

Because rare diseases can also be caused by mutations in noncoding DNA, the broader coverage of WGS can capture variants that WES misses. However, this added information means WGS analysis is associated with higher costs.6 New ways to harness this genetic information are needed to improve diagnostic yields and patient outcomes.

Solve-RD: A strategy for improving rare disease diagnosis

New gene–disease and variant–disease associations are defined each year, and regularly reanalyzing patient data against updated databases can improve diagnostic yields. But reanalysis presents challenges: the process is expensive and time consuming, datasets are growing exponentially, and clinical practices often lack the bioinformatic strategies to perform reanalyses effectively.7 The Solve-RD consortium is one strategy that aims to harness the information gained from WGS and WES while overcoming some of the challenges posed by reanalysis.
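To illustrate why reanalysis against an updated database can solve previously unsolved cases, the sketch below compares two versions of a catalogue of known associations and reports only the matches that the newer version makes possible. The function name, the variant identifier strings, and the case IDs are all hypothetical.

```python
def newly_explained(case_variants, old_catalogue, new_catalogue):
    """Map case ID -> variants that match the new catalogue but not the old one."""
    added = set(new_catalogue) - set(old_catalogue)
    hits = {}
    for case_id, variants in case_variants.items():
        found = sorted(v for v in variants if v in added)
        if found:
            hits[case_id] = found
    return hits


# Placeholder catalogue versions; identifiers are illustrative only.
old = {"1:1000:A>G"}
new = {"1:1000:A>G", "7:4200:C>T"}

cases = {
    "case-001": ["7:4200:C>T", "3:10:G>A"],  # matches only the newer catalogue
    "case-002": ["1:1000:A>G"],              # already matched the older one
    "case-003": ["5:77:T>C"],                # still unmatched
}

result = newly_explained(cases, old, new)
print(result)
```

Only `case-001` is reported here, because its match depends on an association that was added between the two catalogue versions.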

"The Solve-RD consortium aims to improve rare disease diagnoses by combining expertise to reevaluate existing sequencing data, and incorporating new data from additional techniques."

The Solve-RD consortium was set up to resolve undiagnosed (or unsolved) rare disease cases, uniting researchers, clinicians, and patients across 15 countries. The project reanalyzes sequencing data from unsolved cases and combines pre-existing sequencing data with new data generated from additional techniques (epigenomic, metabolomic, deep-WES, RNA sequencing, and deep molecular phenotypic analysis). The data is collected from European Reference Networks (ERNs), or institutes associated with these networks. Unsolved rare disease cases are split into four cohorts, which determine the analysis strategy:

  • Cohort 1: cases that already have WGS or WES data obtained from an ERN or associated institute, where the acquired sequencing data will be reanalyzed.
  • Cohort 2: disease group cases obtained from any ERN that will undergo additional analysis.
  • Cohort 3: cases that have unique phenotypes and have been identified by experts across all ERNs.
  • Cohort 4: cases that are readily identified by experts but lack a known molecular cause. This cohort will undergo further analysis using all appropriate additional techniques.

Analyses and information are shared and accessed through centralized databases, such as the European Genome-phenome Archive (EGA) and the RD-Connect Genome-Phenome Analysis Platform (GPAP). For effective data analysis and interpretation, Solve-RD is organized into task forces and working groups. The Data Analysis Task Force (data scientists and genomics experts) focuses on performing data analysis and developing analytical tools, and includes working groups for specific analyses. The Data Interpretation Task Force (clinicians and geneticists) focuses on disease interpretation, such as identifying cases and requirements for data analysis in the ERNs. The two task forces work together on analysis, enabling effective sharing of expertise.3

Solve-RD in practice

To date, the Solve-RD consortium has diagnosed 255 cases that were previously unsolved.3 As part of this effort, Matalonga and colleagues showed how Solve-RD can diagnose unsolved cases by reanalyzing WES and WGS data, and documented a workflow for reanalysis.7 Their Python package required only simple inputs (identifiers and filtering parameters) and then performed fast, automated reanalysis of the sequencing data. After reanalysis of genomic data from 4,703 individuals, the researchers were able to diagnose 120 unsolved rare disease cases. The authors noted that 13 percent of the identified pathological variants were unknown two years prior, highlighting how quickly rare disease knowledge develops. The workflow can also streamline the reanalysis process by providing information in the output file that is useful for clinical assessment. Consequently, this study demonstrates a viable method for performing time-efficient and regular genomic reanalysis of unsolved rare disease cases using Solve-RD.7
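The idea of driving reanalysis from identifiers and filtering parameters can be sketched as follows. This is not the package described by Matalonga and colleagues; `FilterParams`, `reanalyse`, the field names, and the in-memory `FAKE_STORE` are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FilterParams:
    """Hypothetical filtering parameters applied to every experiment."""
    genes: frozenset            # genes of interest
    max_population_af: float    # upper bound on population allele frequency
    consequences: frozenset     # accepted predicted consequences


def reanalyse(experiment_ids, params, load_variants):
    """Apply the same filters to each experiment and collect candidate rows."""
    rows = []
    for exp_id in experiment_ids:
        for v in load_variants(exp_id):
            if (v["gene"] in params.genes
                    and v["population_af"] <= params.max_population_af
                    and v["consequence"] in params.consequences):
                # Keep the experiment ID so each row can be traced to a case.
                rows.append({"experiment": exp_id, **v})
    return rows


# Placeholder data standing in for a real variant store.
FAKE_STORE = {
    "EXP1": [
        {"gene": "GENE_A", "population_af": 0.0001, "consequence": "missense"},
        {"gene": "GENE_B", "population_af": 0.2, "consequence": "synonymous"},
    ],
    "EXP2": [
        {"gene": "GENE_A", "population_af": 0.05, "consequence": "missense"},
    ],
}

params = FilterParams(
    genes=frozenset({"GENE_A"}),
    max_population_af=0.001,
    consequences=frozenset({"missense", "stop_gained"}),
)
candidates = reanalyse(
    ["EXP1", "EXP2"], params, lambda exp_id: FAKE_STORE.get(exp_id, [])
)
print(candidates)
```

Keeping the parameters in one immutable object means the same filters can be rerun unchanged whenever the underlying annotations are updated.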

Detecting mosaic variants

In addition, several other studies have demonstrated the ability of the Solve-RD consortium to identify the genetic causes of rare diseases. For example, researchers at Radboud University Medical Center reanalyzed WES data from a patient with hereditary diffuse gastric cancer, a condition in which the causative variant is unknown in many affected individuals. Reanalysis revealed a mosaic missense mutation in PIK3CA in the patient.8 Although the researchers note that PIK3CA has not previously been associated with hereditary diffuse gastric cancer, variants of this gene are linked to syndromes with tissue overgrowth, lesions, or malformations. Importantly, by using a low variant allele frequency cut-off in their analysis, te Paske and colleagues were able to detect mosaic variants, which can be supported by few variant reads and so can be missed in analyses with higher variant allele frequency cut-offs.8,9
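The effect of the cut-off can be shown with a small calculation. The thresholds and read counts below are illustrative assumptions, not the values used by te Paske and colleagues, and the function names are hypothetical.

```python
def variant_allele_frequency(alt_reads, total_reads):
    """Fraction of reads at a site that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def passes_cutoff(alt_reads, total_reads, min_vaf, min_alt_reads=1):
    """True if the site has enough variant reads and a high enough frequency."""
    vaf = variant_allele_frequency(alt_reads, total_reads)
    return alt_reads >= min_alt_reads and vaf >= min_vaf


# A site with 12 variant reads out of 200 has a frequency of 0.06.
alt, total = 12, 200

# A higher cut-off (illustrative value) discards the variant.
kept_with_high_cutoff = passes_cutoff(alt, total, min_vaf=0.20)

# A lower cut-off (illustrative value) keeps it; requiring a minimum number
# of supporting reads is one way to limit calls driven by sequencing errors.
kept_with_low_cutoff = passes_cutoff(alt, total, min_vaf=0.02, min_alt_reads=5)

print(kept_with_high_cutoff, kept_with_low_cutoff)
```

Lowering the cut-off trades specificity for sensitivity, so low-frequency calls generally need additional support before they are interpreted.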

Detecting mitochondrial DNA variants

Through the Solve-RD consortium, reanalysis of WES data by de Boer and colleagues revealed a variant in the mitochondrial gene MT-TL1 (encoding a mitochondrial transfer RNA) in a patient with an unsolved severe intellectual disability.10 MT-TL1 variants are associated with other conditions, such as myopathy, encephalopathy, lactic acidosis, and stroke-like episodes, and disease presentation can be variable. The patient in this study presented with additional neurodevelopmental and neuromuscular conditions, and it is unclear whether some of these clinical features arose from another variant (in addition to the MT-TL1 variant). However, the study demonstrated the potential of reanalyzing WES data for probing mitochondrial DNA variants.10

Next-generation sequencing techniques, such as WES and WGS, are powerful tools that can unlock diagnostic information in rare disease patients. The molecular causes of many rare diseases remain unknown, but new causes are revealed each year, so reanalyzing existing patient sequencing data can improve diagnostic yields. The Solve-RD consortium offers a powerful strategy to harness this sequencing data, providing an up-to-date bioinformatic pipeline that can diagnose unsolved rare disease cases.


  1. Nguengang Wakap, S. et al. “Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database.” European Journal of Human Genetics (2020): 165–173.
  2. Daoud, H. et al. “Next-generation sequencing for diagnosis of rare diseases in the neonatal intensive care unit.” CMAJ: Canadian Medical Association Journal (2016): E254–E260.
  3. Zurek, B. et al. “Solve-RD: systematic pan-European data sharing and collaborative analysis to solve rare diseases.” European Journal of Human Genetics (2021).
  4. Vinkšel, M. et al. “Improving diagnostics of rare genetic diseases with NGS approaches.” Journal of Community Genetics (2021): 247–256.
  5. Marshall, C.R. et al. “Best practices for the analytical validation of clinical whole-genome sequencing intended for the diagnosis of germline disease.” npj Genomic Medicine (2020).
  6. Dunn, P. et al. “Next generation sequencing methods for diagnosis of epilepsy syndromes.” Frontiers in Genetics (2018).
  7. Matalonga, L. et al. “Solving patients with rare diseases through programmatic reanalysis of genome-phenome data.” European Journal of Human Genetics (2021).
  8. te Paske, I.B.A.W. et al. “A mosaic PIK3CA variant in a young adult with diffuse gastric cancer: case report.” European Journal of Human Genetics (2021).
  9. Dou, Y. et al. “Accurate detection of mosaic variants in sequencing data without matched controls.” Nature Biotechnology (2020): 314–319.
  10. de Boer, E. et al. “A MT-TL1 variant identified by whole exome sequencing in an individual with intellectual disability, epilepsy, and spastic tetraparesis.” European Journal of Human Genetics (2021).