Tracking Emerging COVID-19 Variants

Viral SNPs can be used to track new coronavirus variants, while SNPs in patient genomes could influence clinical care

Suzanne Leech, PhD

Modern molecular methods enable the rapid and precise detection of genomic sequences. Single nucleotide polymorphisms (SNPs), the smallest and most common form of variation in the genetic code, are single-base changes that can be used to distinguish between pathogen variants and to predict patients’ disease susceptibility and treatment responses.

The first whole-genome sequence of the coronavirus SARS-CoV-2, the viral agent of COVID-19, was published in February 2020, and since then, 471,919 SARS-CoV-2 genomes have been deposited in the NCBI database. A study comparing 10,664 genomes sequenced in 73 countries revealed the genetic diversity of the virus, which had formed five predominant clusters of viral variants with 107 SNPs. GISAID now recognizes 10 viral clades, each of which is divided and subdivided into various lineages.

Coronaviruses, like other RNA viruses, are prone to mutation because of errors made when copying their single-stranded RNA genomes. Even so, coronaviruses mutate relatively slowly compared with other RNA viruses, because their replication machinery includes a proofreading exonuclease. SARS-CoV-2 accumulates approximately two single-base mutations per month, about half as many as influenza. Most SARS-CoV-2 mutations of consequence occur in the spike protein, which mediates viral entry into host cells. Common spike mutations allow the virus to enter cells more easily, increasing its transmission rate, and these and other mutations are used to designate SARS-CoV-2 variants. For instance, lineage B.1.617.2, which emerged in India, contains 15 spike protein mutations, including D614G, one of the first spike changes to be identified.
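
Evolutionary studies usually report mutation rates per site per year, so it helps to convert the figure of roughly two mutations per month to that scale. The short Python sketch below does this arithmetic. The genome length it uses (29,903 nucleotides, the length of the Wuhan-Hu-1 reference sequence) is an assumption added for illustration, and the result is only a back-of-envelope estimate.

```python
# Back-of-envelope conversion of the approximate figure quoted in the text
# (~2 single-base mutations per month) into a per-site annual rate.
# Assumption: genome length of 29,903 nt (Wuhan-Hu-1 reference, NC_045512.2).

GENOME_LENGTH_NT = 29_903
MUTATIONS_PER_MONTH = 2  # approximate figure cited in the text


def per_site_rate_per_year(mutations_per_month: float, genome_length: int) -> float:
    """Substitutions per site per year implied by a whole-genome monthly count."""
    return mutations_per_month * 12 / genome_length


def expected_mutations(months: float, mutations_per_month: float = MUTATIONS_PER_MONTH) -> float:
    """Expected number of single-base changes accumulated along one lineage."""
    return months * mutations_per_month


rate = per_site_rate_per_year(MUTATIONS_PER_MONTH, GENOME_LENGTH_NT)
print(f"{rate:.1e} substitutions/site/year")
print(expected_mutations(12))  # changes expected after one year
```

The result, about 0.0008 substitutions per site per year, restates the monthly figure on a different scale. It is not an independent measurement.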

Using SNPs to identify and track new COVID-19 variants

SNPs throughout the genome, including noncoding areas, can be used to track the spread and identify the origin of variants. A July 2020 study of 48,635 SARS-CoV-2 genomes revealed the pattern in which viral clades emerged on different continents over time. The original “L clade” emerged in China in December 2019, and the “G clade” appeared in Europe at the beginning of 2020. The G and related clades reached the US, Canada, and Asia by March 2020 and became the fastest-growing viral populations.
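
Emergence patterns like these come from tallying clade assignments by place and time. The sketch below shows that bookkeeping on a handful of records. The record format and values are invented for illustration and are not data from the study.

```python
from collections import Counter, defaultdict

# Invented toy records: (continent, collection month, assigned clade).
# Real analyses would take these fields from GISAID or NCBI metadata.
records = [
    ("Asia", "2020-01", "L"),
    ("Asia", "2020-01", "L"),
    ("Europe", "2020-02", "G"),
    ("Europe", "2020-02", "L"),
    ("Europe", "2020-03", "G"),
]


def clade_proportions(rows):
    """Return {(continent, month): {clade: fraction of genomes in that group}}."""
    counts = defaultdict(Counter)
    for continent, month, clade in rows:
        counts[(continent, month)][clade] += 1
    return {
        key: {clade: n / sum(c.values()) for clade, n in c.items()}
        for key, c in counts.items()
    }


props = clade_proportions(records)
for key in sorted(props):
    print(key, props[key])
```

Tracking how these proportions shift month by month within each region shows which clades are growing.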

Two scientists from the 100K Pathogen Genome Project combined global COVID-19 incidence, transmission dynamics (the viral reproduction number, R), and SNPs to identify how genomic changes to the virus have influenced its transmission. They found that SNPs in the genome were linked to differences in outbreak dynamics. The team used epidemic modeling to develop the Genomic Identity (GENI) scoring system, which they claim can predict new cases two to five days before an outbreak. Their data also showed that the virus was circulating in the Chinese population one month before the first outbreak was reported in Wuhan.

Using SNPs to identify and track new variants has been invaluable in the battle against COVID-19. The World Health Organization (WHO) is working with laboratories worldwide to monitor changes to the viral genome and to identify important mutations that could affect the transmission rate, disease severity, and the efficacy of vaccines, drugs, diagnostics, and epidemic control strategies. This surveillance can give early warning of consequential changes before they spread widely through human populations.

Using SNPs as markers for known circulating viruses, scientists can easily and rapidly identify emerging variants. A study conducted in the Netherlands used genomic analysis to identify four viral clusters with four SNPs that were unique to the country or very rare in other regions. Monitoring the spread of new mutations and their impact on transmissibility, disease severity, and drug and vaccine efforts could provide key information that prevents a devastating mutation from spreading unchecked through populations. 
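
Marker-based identification can be sketched as a set comparison. A sample is assigned to a known lineage when it carries all of that lineage’s signature SNPs, and any leftover SNPs are flagged for follow-up. The lineage names, positions, and the rule that all markers must match are hypothetical choices made for illustration.

```python
# Hypothetical signature SNPs per lineage: (reference base, genome position, alternate base).
KNOWN_MARKERS = {
    "lineage_X": {("A", 1000, "G"), ("C", 2000, "T")},
    "lineage_Y": {("T", 3000, "C"), ("G", 4000, "T")},
}


def classify(sample_snps):
    """Return (lineage or None, SNPs not explained by that lineage's markers).

    A lineage matches only if the sample carries every one of its markers;
    if several lineages match, the one with the most markers wins.
    """
    best = None
    for lineage, markers in KNOWN_MARKERS.items():
        if markers <= sample_snps and (
            best is None or len(markers) > len(KNOWN_MARKERS[best])
        ):
            best = lineage
    explained = KNOWN_MARKERS[best] if best else set()
    return best, sample_snps - explained


sample = {("A", 1000, "G"), ("C", 2000, "T"), ("G", 5000, "A")}
lineage, unexplained = classify(sample)
print(lineage, sorted(unexplained))  # lineage_X [('G', 5000, 'A')]
```

An unexplained SNP in a single genome may simply be noise. The same unexplained SNP recurring across many samples is the kind of signal that would prompt closer study.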

In the US, scientific teams are closely monitoring mutations thought to prevent human serum antibodies from recognizing SARS-CoV-2. These mutations, such as E484K, found in the Brazilian and South African variants, change the amino acid sequence of the spike protein’s receptor-binding domain, a major target of the antibodies induced by vaccination. Current vaccines are designed to induce a polyclonal immune response against several protein regions, so a single SNP is unlikely to abolish protection completely. However, if several such mutations accumulate within one variant, vaccine efficacy could be significantly impaired. In addition, researchers have found substantial heterogeneity in how individuals’ antibodies respond to variants. As a team in Seattle explained in a recent publication, “emerging lineages in South Africa and Brazil carrying the E484K mutation will have greatly reduced susceptibility to neutralization by the polyclonal serum antibodies of some individuals.”

SNPs in the human genome can affect infectivity, clinical care

In addition to variations in the viral genome, researchers are interested in human genomic variations that may affect people’s susceptibility to COVID-19 and the severity of their disease. A large-scale genetic study of COVID-19 patients found four genomic sites linked to susceptibility and nine linked to severity. These genetic variants, along with socio-demographic differences, may partly explain why some individuals barely experience any symptoms while others develop severe disease, require intensive care, and have a higher risk of death.

Evidence suggests that human SNPs affecting angiotensin-converting enzyme 2 (ACE2), a cell-surface protein targeted by SARS-CoV-2 during cell invasion, may increase or decrease the ability of the virus to bind to host cells. The geographic distribution of ACE2 polymorphisms may contribute to differences in epidemic dynamics between populations, such as the higher mortality rate seen in Italy than in China. A case-control study indicated that patients with a specific SNP in the gene for transmembrane serine protease 2 (TMPRSS2), a host cell-surface enzyme that activates the spike protein to facilitate cell entry, were twice as likely to contract COVID-19.
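
In a case-control study, a result of “twice as likely” typically corresponds to an odds ratio of about 2. An odds ratio approximates the relative risk only when the disease is uncommon in the population studied. The calculation is sketched below with invented counts, which are not the study’s data. The confidence interval uses Woolf’s standard log method.

```python
import math


def odds_ratio(carrier_cases, noncarrier_cases, carrier_controls, noncarrier_controls):
    """Odds ratio with an approximate 95% confidence interval (Woolf's log method)."""
    or_value = (carrier_cases * noncarrier_controls) / (noncarrier_cases * carrier_controls)
    se_log = math.sqrt(
        1 / carrier_cases + 1 / noncarrier_cases + 1 / carrier_controls + 1 / noncarrier_controls
    )
    lower = math.exp(math.log(or_value) - 1.96 * se_log)
    upper = math.exp(math.log(or_value) + 1.96 * se_log)
    return or_value, lower, upper


# Invented counts: 80 of 200 cases and 50 of 200 controls carry the SNP.
or_value, lower, upper = odds_ratio(80, 120, 50, 150)
print(f"OR = {or_value:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```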

Identifying polymorphisms in patient genomes could help direct clinical care and tailor treatments, particularly because some ethnic populations carry certain SNPs at higher frequencies. There is also some evidence that ACE2 polymorphisms affect the efficacy of chloroquine and hydroxychloroquine, antimalarial drugs that were investigated as COVID-19 antivirals. COVID-19 patients with the SNP rs10490770 were more susceptible to complications such as severe respiratory failure, venous blood clots, and liver damage, and they had a higher mortality risk. The association with worse prognosis was particularly strong in patients under 60.

Analysis of massive genomic sequence databases, combined with machine learning, has been used to identify genetic markers that predict responses to COVID-19 vaccination. Basic forms of targeted treatment based on clinical and paraclinical indicators are already being implemented. However, more sophisticated personalized medicine has yet to be realized for COVID-19, and its benefits remain to be seen.

Methods for identifying SNPs

Whole-genome sequencing provides the primary information on variant mutations. Bioinformatic databases such as NCBI and GISAID hold vast amounts of genetic information, allowing researchers to perform sequence alignments that identify SNPs and determine their frequencies. Signature SNPs can then be used to identify viral clades and variants for diagnosis and epidemic monitoring. However, sequencing patient samples is expensive and too slow for clinical and outbreak-control purposes, so more rapid tests are available or in development.
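
The alignment-based step can be sketched in Python. The sketch assumes the genomes have already been aligned to a reference, which in practice is done with a dedicated aligner. It therefore only compares aligned columns, skipping gaps and ambiguous N bases. The toy sequences are invented.

```python
from collections import Counter


def call_snps(reference, aligned):
    """Return (ref_base, position, alt_base) for each single-base difference.

    Both sequences must already be aligned to the same length. Positions are
    1-based reference coordinates; gaps ('-') and ambiguous 'N' calls are skipped.
    """
    if len(reference) != len(aligned):
        raise ValueError("sequences must be aligned to the same length")
    snps, ref_pos = [], 0
    for r, q in zip(reference.upper(), aligned.upper()):
        if r == "-":
            continue  # insertion relative to the reference, not a SNP
        ref_pos += 1
        if q not in ("-", "N") and q != r:
            snps.append((r, ref_pos, q))
    return snps


def snp_frequencies(reference, genomes):
    """Fraction of genomes that carry each SNP."""
    counts = Counter()
    for genome in genomes:
        counts.update(set(call_snps(reference, genome)))
    return {snp: n / len(genomes) for snp, n in counts.items()}


reference = "ACGTACGT"
genomes = ["ACGTACGT", "ACTTACGT", "ACTTACGA", "ACGTNCGT"]  # invented toy alignment
print(snp_frequencies(reference, genomes))
```

SNPs that reach high frequency within one group of genomes but are rare elsewhere are the candidates for signature SNPs.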

RT-PCR assays

Specific reverse-transcription PCR (RT-PCR) assays that detect SNPs are used to identify known variants. Although this method is relatively low-throughput and slow, and sequencing is needed to confirm the presence of variants, it is highly accurate. For instance, the CoronaMeltVAR Real-Time PCR detection kit uses melting curve analysis to detect the Alpha, Beta, and Gamma SARS-CoV-2 variants in nasopharyngeal and oropharyngeal samples. Primerdesign offers a series of diagnostic real-time RT-PCR assays, the SNPsig kits, that use SNPs to detect various SARS-CoV-2 variants of concern.
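
Melting curve analysis relies on the fact that a mismatch between probe and target lowers the temperature at which the duplex dissociates. The sketch below generates a synthetic, idealized melting curve and locates the melting temperature (Tm) as the peak of -dF/dT. It then calls the allele whose expected Tm is closest to that peak. The logistic curve shape and both expected Tm values are assumptions made for illustration, not parameters of any commercial kit.

```python
import math


def melt_curve(tm, temps, width=1.0):
    """Idealized normalized fluorescence: a logistic drop centred on tm."""
    return [1 / (1 + math.exp((t - tm) / width)) for t in temps]


def melting_peak(temps, fluorescence):
    """Temperature at the maximum of -dF/dT, using central differences."""
    best_t, best_slope = None, -math.inf
    for i in range(1, len(temps) - 1):
        slope = -(fluorescence[i + 1] - fluorescence[i - 1]) / (temps[i + 1] - temps[i - 1])
        if slope > best_slope:
            best_t, best_slope = temps[i], slope
    return best_t


# Hypothetical expected melting temperatures for each allele.
EXPECTED_TM = {"wild-type": 72.5, "variant": 70.0}


def call_genotype(peak_tm):
    """Pick the allele whose expected Tm is nearest the observed peak."""
    return min(EXPECTED_TM, key=lambda allele: abs(EXPECTED_TM[allele] - peak_tm))


temps = [60 + 0.1 * i for i in range(301)]  # 60 to 90 degrees C in 0.1-degree steps
peak = melting_peak(temps, melt_curve(70.0, temps))
print(round(peak, 1), call_genotype(peak))
```

Real instruments produce noisy curves, so commercial software smooths and normalizes the data before it looks for peaks. This sketch skips that step.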

British scientists affiliated with the COVID-19 Genomics UK Consortium designed an economical high-throughput genotyping panel to provide larger-scale and more rapid variant detection. The single-step PACE-RT-based assay has been tested using oligonucleotide primers designed against a set of SNPs that can distinguish variants.
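
In PACE-type genotyping, each SNP is read out as two competing fluorescent signals, one per allele. The sketch below shows the call for a single well. The assignment of dyes to alleles and both thresholds are hypothetical placeholders; a real assay would calibrate them against control samples.

```python
def call_allele(ref_signal, alt_signal, min_signal=0.2, ratio_cutoff=0.8):
    """Classify one well from end-point fluorescence of its two allele-specific dyes.

    Hypothetical thresholds: below min_signal total fluorescence no call is made;
    otherwise the fraction of signal from the reference-allele dye decides the call.
    """
    total = ref_signal + alt_signal
    if total < min_signal:
        return "no call"
    ref_fraction = ref_signal / total
    if ref_fraction >= ratio_cutoff:
        return "reference"
    if ref_fraction <= 1 - ratio_cutoff:
        return "alternate"
    return "mixed"  # both alleles present, e.g. co-infection or contamination


# Invented readings for a two-SNP panel.
wells = {"SNP_1": (0.91, 0.04), "SNP_2": (0.03, 0.88)}
print({snp: call_allele(*signals) for snp, signals in wells.items()})
```

A panel combines one such call per SNP into a profile, and that profile is then compared with the known profiles of each variant.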

Likewise, Japanese researchers recently published a reverse-transcription PCR high-resolution melting curve analysis method for variant genotyping. The method uses five key mutations to assign a virus to one of the six major GISAID clades (L, S, V, G, GH, and GR).
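
Five presence/absence markers are enough to separate six clades because some clades are nested: GH and GR arise within G, and L can serve as the default when no marker is present. The decision tree below is a hypothetical sketch of that logic. The marker choices are illustrative, loosely based on GISAID clade definitions, and may not match the five mutations the Japanese team used.

```python
# Illustrative single-marker-per-clade scheme (an assumption, not necessarily the
# five mutations used in the Japanese study). Verify against current GISAID
# clade definitions before any real use.
CLADE_MARKERS = {
    "S": "C8782T",
    "V": "G11083T",
    "G": "A23403G",   # spike D614G
    "GH": "G25563T",  # ORF3a Q57H, found on a G background
    "GR": "G28882A",  # nucleocapsid gene, found on a G background
}


def assign_clade(mutations: set) -> str:
    """Assign one of six GISAID clades from the presence or absence of five markers."""
    if CLADE_MARKERS["G"] in mutations:
        if CLADE_MARKERS["GH"] in mutations:
            return "GH"
        if CLADE_MARKERS["GR"] in mutations:
            return "GR"
        return "G"
    if CLADE_MARKERS["S"] in mutations:
        return "S"
    if CLADE_MARKERS["V"] in mutations:
        return "V"
    return "L"  # none of the markers: the reference-like L clade


print(assign_clade({"C241T", "C3037T", "A23403G", "G28882A"}))  # GR
print(assign_clade({"C8782T", "T28144C"}))  # S
print(assign_clade(set()))  # L
```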

Genotyping methods are rapid and relatively simple to perform and can be invaluable in diagnosis and in tracking the origin and spread of variants of concern. Diagnostic PCR tests, by necessity, tend to target conserved consensus sequences to avoid false negatives. These genotyping assays, by contrast, can discriminate among a wide range of variants, an obvious advantage for epidemiological monitoring and outbreak control.

CRISPR-Cas9-based methods

An alternative commercial technique, the FnCas9 Editor Linked Uniform Detection Assay (FELUDA), is a CRISPR-Cas9-based nucleic acid detection method used with the Milenia HybriDetect lateral flow assay to specifically identify SARS-CoV-2 variants. CRISPR enzymes bind precisely to their target nucleic acid sequences, which activates their ability to cleave target molecules, so they can be used in highly accurate reporter assays. Because of its lateral flow design, FELUDA can be used in the field without expensive, bulky equipment. It detects viruses in 45 minutes, is relatively inexpensive, and is reported to have sensitivity and specificity comparable to those of PCR methods.
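
At the sequence level, SNP discrimination by a Cas9-based assay depends on whether the guide RNA’s spacer matches the target exactly, next to a protospacer-adjacent motif (PAM). The sketch below scans one strand for such sites and counts mismatches. It assumes an NGG PAM, a 20-nucleotide spacer, and invented sequences. The code only shows that the variant site is a perfect match and the wild-type site is one mismatch away. In the assay itself, specificity comes from the enzyme discriminating between those two kinds of site.

```python
def find_target_sites(sequence, spacer, max_mismatches=1, pam="NGG"):
    """Return (start, mismatches) for spacer-length windows followed by a PAM.

    Scans only the given strand; a real guide-design tool would also scan the
    reverse complement and weigh mismatch position, which this sketch ignores.
    """
    seq, spacer, pam = sequence.upper(), spacer.upper(), pam.upper()
    k, hits = len(spacer), []
    for start in range(len(seq) - k - len(pam) + 1):
        site_pam = seq[start + k : start + k + len(pam)]
        if any(p != "N" and p != b for p, b in zip(pam, site_pam)):
            continue  # no PAM next to this window, so Cas9 would not recognize it
        mismatches = sum(a != b for a, b in zip(spacer, seq[start : start + k]))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Invented 20-nt spacer matching a hypothetical variant allele.
spacer = "GATTACAGATTACAGATTAC"
variant = "CC" + spacer + "AGG" + "TT"
wild_type = "CC" + "GATTACAGATCACAGATTAC" + "AGG" + "TT"  # one base differs

print(find_target_sites(variant, spacer))    # [(2, 0)]
print(find_target_sites(wild_type, spacer))  # [(2, 1)]
```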

A drawback of these more rapid approaches is that the predesigned probes need to be regularly updated to keep pace with the evolving viral gene pool, and they cannot identify novel variants and mutations. Therefore, whole genome sequencing is still required to identify and characterize new mutations and variants.

Giving us an edge in the technological arms race

The COVID-19 pandemic, though devastating, is being combated with the help of modern and rapidly evolving molecular techniques. At a time when they are most needed, cutting-edge methods of detecting SNPs and other genetic variations are becoming increasingly affordable and practicable, arming medical teams, epidemiologists, and policy makers with vital information and giving us the edge in a new form of technological and evolutionary arms race.