Rahul Sharma, PhD
There has been continued interest in the machine reading of DNA sequences ever since the structure of DNA was determined. The ability to read genetic sequences and detect changes before an adverse medical condition develops is key to managing diseases with a genetic component, such as inherited cancer predisposition, genetic disorders, and metabolic syndromes. Accurate, fast, and cost-effective identification of DNA changes through sequencing is therefore becoming an increasingly routine part of clinical care.
Classical DNA sequencing (chain termination method)
A major breakthrough was the chain-termination method developed by Frederick Sanger and colleagues in 1977, which was later automated as capillary sequencing and commercialized by Applied Biosystems Inc. (ABI). Sanger sequencing relies on the occasional incorporation of chain-terminating, fluorescently labeled dideoxynucleotides (ddNTPs) during a thermal-cycled, single-primer extension reaction known as cycle sequencing. Because a ddNTP lacks the 3'-hydroxyl group needed to attach the next nucleotide, each incorporation stops synthesis, producing labeled fragments that end at every base of the target sequence. Capillary electrophoresis then separates these fragments by size with single-base resolution, and the instrument reads the sequence from the dye color of the terminal ddNTP on each fragment as it passes the detector.
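The read-out logic described above can be illustrated with a small simulation. This is a minimal sketch under idealized assumptions, not a model of real instrument chemistry: the function names are hypothetical, termination is assumed to occur at every position in at least one molecule, fragment lengths are counted from the end of the primer, and electrophoresis is assumed to separate fragments perfectly by length.

```python
from typing import List, Tuple

# Base-pairing rules used to derive the newly synthesized strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def synthesized_strand(template_5to3: str) -> str:
    """Strand built by the polymerase, written 5'->3'.

    Assumes the primer anneals at the template's 3' end, so the new strand
    is the reverse complement of the template.
    """
    return template_5to3.translate(COMPLEMENT)[::-1]


def terminated_fragments(new_strand: str) -> List[Tuple[int, str]]:
    """One (length, terminal base) pair per position of the new strand.

    Idealization: a dye-labeled ddNTP stops synthesis at every position in
    some molecule, so a fragment of each length exists, and its dye color
    identifies its last base. Lengths are counted from the end of the primer.
    """
    return [(i + 1, base) for i, base in enumerate(new_strand)]


def read_by_electrophoresis(fragments: List[Tuple[int, str]]) -> str:
    """Shortest fragments reach the detector first; record each terminal color in order."""
    return "".join(base for _, base in sorted(fragments))


fragments = terminated_fragments(synthesized_strand("TCAGGTACGCAT"))
fragments.reverse()  # the reaction yields fragments in no particular order
print(read_by_electrophoresis(fragments))  # ATGCGTACCTGA
```

Sorting by length stands in for migration through the capillary: the order in which colors are detected is the sequence of the newly synthesized strand, which is complementary to the template.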
Sanger sequencing is still widely used for targeted sequencing of clinically relevant germline mutations, for pathogen identification, and for confirming genomic variants identified by non-sequencing methods such as DNA hybridization and PCR-based assays. However, it is cost-prohibitive and inefficient for sequencing many genes at once, for sequencing entire genomes or metagenomes, and for detecting low-frequency mutations.
The advent of NGS methods
Newer high-throughput sequencing methods are collectively known as next-generation sequencing (NGS) or massively parallel sequencing. NGS can read millions or billions of DNA molecules concurrently and reports the nucleotide sequence of each molecule individually (a read). In contrast, Sanger sequencing produces a single analog electropherogram that sums the signal from all DNA molecules sequenced in a reaction. As a result, a nucleotide difference (mutation) present in fewer than about 15 to 20 percent of the molecules produces only a minor peak that cannot be reliably distinguished from background. Because of this limited sensitivity, Sanger sequencing is not suitable for detecting rare mutations, monitoring residual disease, or identifying mutations in cell-free circulating DNA.
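The difference in sensitivity can be sketched by comparing a per-read count, as NGS provides, with a single pooled signal, as in a Sanger trace. This is an illustrative sketch only: the function names and the 0.15 cutoff are assumptions (the lower end of the 15 to 20 percent range above), and real NGS variant calling also has to account for sequencing errors, read depth, and base quality, which this code ignores.

```python
from collections import Counter
from typing import Dict, Iterable

# Approximate Sanger limit from the text ("15 to 20 percent"); the lower
# bound is used here as an illustrative assumption.
SANGER_DETECTION_LIMIT = 0.15


def allele_fractions(base_calls: Iterable[str]) -> Dict[str, float]:
    """Fraction of NGS reads supporting each base at one genomic position."""
    counts = Counter(base_calls)
    total = sum(counts.values())
    return {base: n / total for base, n in counts.items()}


def visible_in_sanger_trace(variant_fraction: float,
                            limit: float = SANGER_DETECTION_LIMIT) -> bool:
    """A Sanger trace sums all molecules, so a minor peak below the limit is not called."""
    return variant_fraction >= limit


# Hypothetical pileup: 1,000 reads cover one position and 20 of them carry a T.
calls = "C" * 980 + "T" * 20
fractions = allele_fractions(calls)
print(fractions["T"])                           # 0.02
print(visible_in_sanger_trace(fractions["T"]))  # False
```

Because each NGS read is counted separately, the 20 variant-supporting reads remain visible as a 2 percent allele fraction, whereas the same mixture would sit below the assumed Sanger threshold.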
The advent of NGS methods (such as Illumina’s reversible terminator technology, Life Technologies’ semiconductor sequencing, Pacific Biosciences’ single-molecule real-time sequencing, and Oxford Nanopore Technologies’ nanopore sequencing) has had a tremendous impact on scientific research and health care. The NGS process involves adding a set of adaptors (~60 bp fragments of known sequence) to the fragmented DNA template or amplicons to be sequenced. Because the adaptors provide known priming and binding sites, this library preparation step allows unknown DNA targets to be sequenced; the Sanger method, by contrast, requires target-specific primers to amplify the region before sequencing. One or two unique 6 to 8 bp sequences (barcodes, or indexes) are then attached to the adaptor-ligated DNA of each sample. These sample-specific barcodes allow up to 768 samples to be pooled (multiplexed) in a single run; after sequencing, the reads are sorted back to their samples by barcode (demultiplexing).
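Demultiplexing can be sketched as matching each read's index sequence against a sample sheet. This is a minimal sketch, not the algorithm of any particular vendor's software: the function names, the sample sheet layout, and the tolerance of one mismatch are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index_read: str, sample_sheet: Dict[str, str],
                  max_mismatches: int = 1) -> Optional[str]:
    """Return the sample whose barcode matches within max_mismatches.

    The read is left unassigned (None) if no barcode, or more than one, matches.
    """
    hits = [sample for barcode, sample in sample_sheet.items()
            if len(barcode) == len(index_read)
            and hamming(barcode, index_read) <= max_mismatches]
    return hits[0] if len(hits) == 1 else None


def demultiplex(reads: List[Tuple[str, str]],
                sample_sheet: Dict[str, str]) -> Dict[str, List[str]]:
    """Sort (index_read, insert) pairs into per-sample bins."""
    bins: Dict[str, List[str]] = defaultdict(list)
    for index_read, insert in reads:
        sample = assign_sample(index_read, sample_sheet) or "undetermined"
        bins[sample].append(insert)
    return dict(bins)


# Hypothetical 6 bp barcodes and pooled reads.
sheet = {"ACGTAC": "patient_1", "TGCATG": "patient_2"}
pooled = [("ACGTAC", "TTGACC"), ("ACGTAA", "GGATCA"),  # second read has 1 mismatch
          ("TGCATG", "CCTAGG"), ("GGGGGG", "AATTCC")]  # last read matches nothing
print(demultiplex(pooled, sheet))
```

Requiring a unique match means that a read whose index is ambiguous under the chosen tolerance is set aside rather than assigned to the wrong patient, which is why barcode sets are designed to differ at several positions. With dual indexing, the same idea can be applied to the pair of index reads.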
Various NGS platforms are now available; some can read fragments up to 600 bp long (MiSeq, as 2 × 300 bp paired-end reads), and others produce up to 10 billion reads in a single run (NovaSeq). This output is enough to sequence up to 48 human genomes in 48 hours, bringing the cost of sequencing a human genome close to $1,000, which would be unthinkable with the Sanger method. However, NGS instruments cost $100,000 to $1 million, whereas a Sanger sequencer costs $50,000 to $200,000, depending on the number of capillaries.
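Run output is commonly related to genome coverage with the Lander-Waterman relation, C = N × L / G, where N is the number of reads, L the read length, and G the genome size. The sketch below assumes a ~3.1 Gb human genome, 150 bp reads, and a 30x target depth; these values and the function names are illustrative assumptions, not figures from the text or instrument specifications.

```python
import math


def reads_for_coverage(genome_size_bp: int, target_depth: int, read_length_bp: int) -> int:
    """Reads N needed for mean depth C, from C = N * L / G (rounded up)."""
    bases_needed = target_depth * genome_size_bp
    return -(-bases_needed // read_length_bp)  # integer ceiling division


def fraction_covered(depth: float) -> float:
    """Expected fraction of bases hit by at least one read, assuming Poisson-distributed coverage."""
    return 1 - math.exp(-depth)


n = reads_for_coverage(genome_size_bp=3_100_000_000, target_depth=30, read_length_bp=150)
print(f"{n:,}")                       # 620,000,000
print(round(fraction_covered(1), 3))  # 0.632
```

This is a lower bound: in practice coverage is uneven and some reads are discarded (for example duplicates or low-quality reads), so real runs need more output than the formula suggests.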
NGS versus Sanger sequencing
Clinical applications of NGS include simultaneous detection of thousands of mutations, including low-frequency ones, across hundreds of genes; characterization of the entire microbial population (microbiome) of a clinical sample; and diagnosis of novel or clinically suspected pathogens. Hundreds of disease-specific, ready-to-use NGS panels are commercially available, and CLIA-certified high-complexity clinical laboratories are developing many NGS-based laboratory-developed tests (LDTs), yet only a handful of NGS assays are currently approved by the FDA. Validating an NGS-based assay for clinical decision making is complicated and requires substantial expertise and investment, whereas validating a Sanger sequencing LDT is relatively straightforward and inexpensive.
Unlike the ~500 bp output of a Sanger read, which can be inspected by eye, the billions of bases produced by NGS require specialized computational capabilities and a skill set that is not commonly available in clinical settings. Although automated pipelines and databases are becoming available, data analysis and clinical interpretation remain major challenges in deploying NGS in the clinic.
Although NGS is rapidly becoming accessible to more clinical labs, Sanger sequencing remains a valuable clinical tool because of its low cost and fast turnaround time (TAT). Most clinical labs can return Sanger sequencing results within one week, whereas the TAT for NGS-based tests is two to four weeks, which may be too long for some clinical decisions. Depending on the panel size and genes tested, an NGS-based molecular diagnostic is billed at $2,000 to $3,500, whereas the Medicare and Medicaid reimbursement rate for single-gene Sanger sequencing is about $200 to $300 (for example, $283 for BRCA1). Sanger sequencing data are also easy to interpret, particularly when investigating a few germline mutations or validating mutations detected by other methods, such as hybridization arrays and PCR.
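The prices quoted above allow a rough break-even estimate: how many single-gene Sanger tests it takes for their combined price to reach the price of one NGS panel. This is a deliberately simplified sketch; the function name is hypothetical, and it assumes one Sanger test per gene at a flat price while ignoring labor, turnaround time, and large genes that need several amplicons.

```python
def sanger_tests_to_reach_panel_price(panel_price: int, sanger_price_per_gene: int) -> int:
    """Smallest number of single-gene Sanger tests whose total price reaches the panel price.

    Simplifying assumption: one Sanger test per gene at a flat price.
    """
    return -(-panel_price // sanger_price_per_gene)  # integer ceiling division


for panel_price in (2_000, 3_500):  # NGS billing range quoted in the text
    n = sanger_tests_to_reach_panel_price(panel_price, 283)  # BRCA1 rate quoted in the text
    print(panel_price, n)
# 2000 8
# 3500 13
```

On list price alone, then, a panel in this range costs roughly as much as 8 to 13 single-gene Sanger tests, so Sanger sequencing is the cheaper option when only a few genes need to be examined.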
NGS is now favored for analyzing thousands of genomic variants simultaneously, sequencing entire genomes or exomes to find novel variants, detecting rare mutations through cell-free DNA sequencing, analyzing microbiomes, and subtyping pathogens during critical outbreaks. Sanger sequencing, on the other hand, is still a method of choice for sequencing single genes or gene regions of up to about 500 base pairs, analyzing short tandem repeats, detecting pathogens, and validating PCR results.
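The trade-offs in this section can be condensed into a crude rule of thumb. The sketch below is illustrative only and is not clinical guidance: the class and function names are hypothetical, the 0.15 threshold is taken from the approximate Sanger detection limit (15 to 20 percent) given earlier, and `max_sanger_amplicons` is a hypothetical, lab-specific cutoff that each laboratory would set from its own cost and turnaround data.

```python
from dataclasses import dataclass

# Approximate lower limit of Sanger sensitivity given earlier in the text (15-20%).
SANGER_MIN_VARIANT_FRACTION = 0.15


@dataclass
class SequencingNeed:
    n_amplicons: int             # separate ~500 bp regions to sequence
    min_variant_fraction: float  # lowest variant allele fraction that must be detected
    discovery: bool = False      # whole genome/exome, microbiome, or novel-variant search


def suggest_method(need: SequencingNeed, max_sanger_amplicons: int) -> str:
    """Crude summary of the trade-offs above; not clinical guidance.

    max_sanger_amplicons is a hypothetical cutoff beyond which running many
    separate Sanger reactions is assumed to be less practical than NGS.
    """
    if need.discovery:
        return "NGS"
    if need.min_variant_fraction < SANGER_MIN_VARIANT_FRACTION:
        return "NGS"  # below the approximate Sanger detection limit
    if need.n_amplicons <= max_sanger_amplicons:
        return "Sanger"
    return "NGS"


# A heterozygous germline variant is present in about half of the molecules.
print(suggest_method(SequencingNeed(n_amplicons=1, min_variant_fraction=0.5),
                     max_sanger_amplicons=5))  # Sanger
```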
Overall, Sanger sequencing and NGS should be regarded as complementary rather than rival technologies. In the clinical setting, using NGS for needs that Sanger sequencing can satisfy would stretch resources unnecessarily, especially when lower throughput is sufficient to obtain the genomic information needed for clinical decisions.