February 18, 2021
Erica Tennenhouse, PhD
Adam Harris, PhD, is a senior manager, research and development, at Thermo Fisher Scientific. He has been working with next generation sequencing technology for 13 years, focused primarily on library preparation. He led development work on the Ion ReproSeq™ and Ion CarrierSeq™ product lines.
Q: What is the difference between traditional and expanded carrier screening?
A: In traditional carrier screening, there’s usually some kind of family history or a particular concern for a couple, and so they might get tested for a single gene or a few genes. Sometimes it’s a little broader and they will be tested for several genes that show frequent variants among people of their ethnicity. In expanded carrier screening, you don’t make any assumptions based on ethnicity or family history; you look for as many carrier mutations as is practical. The goal is to reduce the overall risk of genetic abnormalities in a child. Expanded screening can catch disease-causing variants that wouldn’t be found by traditional carrier screening due to gaps in knowledge about family history and ambiguity around ethnic background.
Q: What platforms do traditional and expanded carrier screening use?
A: Traditional carrier screening can be done with relatively simple technologies, including PCR-based assays, Sanger sequencing, or low-density microarrays. Expanded carrier screening is much more focused on high-throughput approaches like next generation sequencing (NGS) and high-density microarrays.
Q: How does NGS compare to other approaches for carrier screening?
A: For expanded screening, you really need high-throughput approaches like NGS and microarrays. However, you don’t get the same information from microarrays as you do from sequencing. With sequencing, you have the variant in the context of the surrounding sequence. You get what’s called phased information: if there are multiple variants, you can see how they are linked together within individual reads. That makes it easier to look at highly homologous genes, because the context can be used to tell if the variant is in one gene or a homologous gene, or if it is in a pseudogene where it has no effect versus a gene where it is deleterious. You get more context about where a variant is when you’re running it on a sequencing platform than on an array platform.
Q: What are the greatest informatics challenges of carrier screening?
A: The greatest challenges are from highly homologous genes and gene-pseudogene pairs. Even though we do have more context about the variants from sequencing, we’re still peering at the information through a series of small windows. The very high-throughput NGS platforms that are needed to do expanded screening tend to produce relatively short reads. They give context, but not the entire gene sequence all at once. Take HBA1 and HBA2, for example, which are involved in alpha-thalassemia, or SMN1 and SMN2, which are involved in spinal muscular atrophy. Both of these gene pairs have important copy number variants. NGS gives us sequence information from across these genes, but it comes in pieces. Those pieces need to be brought together informatically to deduce, based on the preponderance of evidence, how the region is affected by variants. For alpha-thalassemia, the severity depends on whether one or both genes are deleted. For spinal muscular atrophy, variants or deletions in SMN1 have profound effects, while SMN2 copy number loss alone cannot cause disease. Because the gene pairs share a lot of homology, we have sequences that could belong to either gene. But we also have sequences covering differences between the two genes. We get average copy number information from one set of reads, and we get differential copy number information from the other. A person could sit down and bring all of these data together to produce a call for a sample in a few hours. But high-throughput screens look at hundreds of genes and a dozen or more samples all at the same time. We need to use well-designed bioinformatic algorithms to quickly and reliably ascertain the variants present across sample sets.
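To illustrate how the two sets of reads described above might be combined, here is a minimal Python sketch. It is not the algorithm used in any Thermo Fisher product; the function name, parameters, and the simple rounding approach are all assumptions made for illustration. It assumes that reads from both genes pile up together over the shared sequence, that the reference regions are present in two copies, and that read counts at a gene-distinguishing position are proportional to each gene's copy number.

```python
def estimate_paralog_copies(shared_depth, reference_depth, gene_a_reads, gene_b_reads):
    """Hypothetical helper: split a gene pair's total copy number between the two genes.

    shared_depth    -- mean read depth over sequence common to both genes
    reference_depth -- mean read depth over control regions assumed to have 2 copies
    gene_a_reads    -- reads supporting gene A at a position that differs between the genes
    gene_b_reads    -- reads supporting gene B at that same position
    """
    if reference_depth <= 0:
        raise ValueError("reference_depth must be positive")
    informative = gene_a_reads + gene_b_reads
    if informative == 0:
        raise ValueError("no reads cover the gene-distinguishing position")

    # Average copy number: depth over shared sequence relative to 2-copy controls.
    total_copies = round(2 * shared_depth / reference_depth)

    # Differential copy number: share of the total that belongs to gene A.
    fraction_a = gene_a_reads / informative
    copies_a = round(total_copies * fraction_a)
    copies_b = total_copies - copies_a
    return copies_a, copies_b


# Two copies of each gene: shared depth is twice the reference depth.
print(estimate_paralog_copies(200, 100, 50, 50))
# One copy of gene A lost: shared depth drops and the read balance shifts.
print(estimate_paralog_copies(150, 100, 25, 50))
```

A production caller would have to handle noisy depth, multiple distinguishing positions, and uncertainty in each estimate, which is why a statistical model rather than simple rounding is typically needed.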