The past 60 years have seen researchers develop sequencing technologies and techniques to determine nucleic acid sequences in biological samples. Their efforts have led to the emergence of next-generation sequencing (NGS), which clinicians and researchers use to diagnose, manage, and monitor a plethora of diseases and disorders by identifying somatic or germline mutations. NGS has also become a key player in metagenomic studies, and, in the heat of the COVID-19 crisis, researchers even employed NGS to characterize the SARS-CoV-2 genome and monitor the spread of the virus.
Many researchers and scientists use the life sciences journal BioTechniques to keep up with developments in NGS. BioTechniques publishes insights into the history of NGS, the sequencing methods that researchers can use, what each stage of NGS entails, the difference between short-read and long-read sequencing, the difference between whole-genome and whole-exome sequencing, NGS data analysis, and NGS bottlenecks.
In this general overview, we’ll explore the history of NGS, the NGS technologies that are now in use, and the NGS workflow.
Although many individuals are familiar with NGS today, this modern sequencing technique follows a rich history of scientific developments. Back in 1953, James Watson and Francis Crick drew on Rosalind Franklin’s X-ray crystallography work to determine the structure of DNA. Then, in 1965, Robert Holley sequenced the first nucleic acid molecule: a transfer RNA (tRNA).
Since then, researchers have adapted and extended these methods to advance DNA sequencing. One of the most notable developments came in 1977, when Frederick Sanger and his colleagues developed Sanger sequencing, a chain-termination method. By 1986, researchers had launched the first automated DNA sequencing platform. This development marked the beginning of a golden era for sequencing instruments, including the capillary DNA sequencer.
From here, in 2003, researchers completed the Human Genome Project and went on to launch the first commercially available second-generation (2G) NGS platform in 2005. This platform enabled researchers to amplify millions of copies of a DNA fragment in parallel.
Although 2G NGS and Sanger sequencing share some similarities, 2G sequencing offers a much higher sequencing volume that allows researchers to process millions of reactions in parallel. This approach results in higher throughput, higher speed, and higher sensitivity, all at a lower cost. Therefore, 2G NGS allows researchers to conduct genome sequencing research projects within hours that would have taken years to complete with Sanger sequencing.
For many, 2G technologies spring to mind when they think of NGS. But third- and fourth-generation (3G and 4G) technologies also fall under the NGS umbrella.
2G sequencing methods share a variety of features, but we can categorize them by their underlying detection chemistries: sequencing by ligation (including DNA nanoball sequencing) and sequencing by synthesis (SBS). SBS divides further into proton detection, reversible-terminator, and pyrosequencing approaches.
Although 2G technologies offer many benefits over earlier sequencing techniques, they have some drawbacks. These include poor resolution of homopolymers and the incorporation of incorrect deoxyribonucleotide triphosphates (dNTPs) by polymerases, both of which can lead to sequencing errors; the need for deeper sequencing coverage to compensate for 2G sequencing’s short read lengths; and the need for PCR amplification before sequencing begins.
One of the main benefits of 3G sequencing is that it removes the need for PCR amplification. Researchers can obtain sequence information by monitoring a DNA polymerase as it incorporates fluorescently labeled nucleotides into DNA strands, with single-base resolution. Depending on the technique and tools used, 3G sequencing can offer benefits such as unbiased sequencing, longer read lengths, and real-time monitoring of nucleotide incorporation. However, high error rates, high costs, low read depth, and large quantities of sequencing data can pose challenges.
4G sequencing merges 3G single-molecule sequencing with nanopore technology, which, like 3G technologies, doesn’t require amplification. Nanopore sequencing threads single DNA molecules through nanopores and enables the fastest whole-genome sequence scans to date. However, because these scans are more costly and error-prone than 2G sequencing, less data is available for this technique.
Researchers can employ a variety of NGS methods, which typically follow a four-stage workflow. They can tailor these methods to the target DNA or RNA and their selected sequencing system. These are the four stages:
During the first stage of the NGS workflow, sample preparation, the researcher extracts nucleic acid (DNA or RNA) from the selected samples, such as blood, sputum, or bone marrow. The researcher then quality control checks the extracted material using a standard method, which could be fluorometric, gel electrophoretic, or spectrophotometric. If working with RNA, the researcher reverse transcribes the sample into cDNA; some library preparation kits include this step.
During the second stage of the process, library preparation, the researcher randomly fragments the DNA or cDNA, usually by sonication or enzymatic treatment. The optimum fragment length depends on the sequencing platform. The researcher may need to run a small amount of the fragmented sample on an electrophoresis gel. They can then end-repair the fragments and ligate them to adapters, which are smaller, generic DNA fragments. Adapters have defined lengths with known oligomer sequences, which make them compatible with the applied sequencing platform and recognizable when researchers carry out multiplex sequencing. Multiplex sequencing enables researchers to pool and sequence large numbers of libraries simultaneously.
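To illustrate how the known adapter indices make pooled libraries recognizable, here is a minimal Python sketch of demultiplexing: assigning each read back to its source library by its index sequence. The function name, barcode table, and exact-match policy are illustrative assumptions; real demultiplexers typically tolerate one or two index mismatches and work on full sequencer output.

```python
def demultiplex(reads, barcodes):
    """Assign each (index, sequence) read to its sample by exact
    index match; reads with unknown indices go to 'undetermined'."""
    bins = {sample: [] for sample in barcodes.values()}
    bins["undetermined"] = []
    for index, seq in reads:
        bins[barcodes.get(index, "undetermined")].append(seq)
    return bins

# Hypothetical pooled run: two known barcodes, one unrecognized read
pooled = [("AAA", "ACGT"), ("CCC", "TTTT"), ("GGG", "NNNN")]
bins = demultiplex(pooled, {"AAA": "sample1", "CCC": "sample2"})
```

Here `bins` maps each sample name to its reads, with the unrecognized `GGG` read routed to `undetermined`.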
Next, the researcher performs size selection, either using magnetic beads or gel electrophoresis. The size selection process eliminates any fragments that are too long or too short for the sequencing platform and protocol. From here, the researcher achieves library enrichment/amplification through PCR. They may apply a “clean-up” step, sometimes using magnetic beads, to remove any undesired fragments and improve sequencing efficiency. The researcher can then complete this stage by using qPCR to quality control check the final libraries and confirm the quantity and quality of the DNA. This way, they can prepare the correct concentration of the sample for sequencing.
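Preparing the correct concentration for sequencing, once qPCR has quantified the library, is a simple dilution calculation. The sketch below, with hypothetical function and parameter names, applies the standard C1V1 = C2V2 relationship to work out how much library stock and how much diluent to combine.

```python
def dilution_volumes(stock_nM, target_nM, final_volume_uL):
    """C1*V1 = C2*V2: return (library volume, diluent volume) in uL
    needed to dilute a quantified stock to the target concentration."""
    if target_nM > stock_nM:
        raise ValueError("cannot dilute up to a higher concentration")
    library_uL = target_nM * final_volume_uL / stock_nM
    return library_uL, final_volume_uL - library_uL

# e.g. dilute a 10 nM library to 2 nM in a final volume of 50 uL
library_uL, diluent_uL = dilution_volumes(10.0, 2.0, 50.0)
```

For this example, the calculation gives 10 µL of library plus 40 µL of diluent.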
During the third stage of the workflow, sequencing, the researcher may perform clonal amplification of the library fragments before loading them onto the sequencer (emulsion PCR). Alternatively, they may perform the amplification on the sequencer itself (bridge PCR). The instrument then detects and reports the sequences according to the platform in use.
During the final stage of the workflow, data analysis, the researcher examines the generated data files. They choose an analysis method based on the workflow and the aim of the research. For example, mate-pair and paired-end sequencing are ideal for downstream data analysis, especially de novo assemblies. The researcher links sequencing reads that have been separated by an intervening DNA region (mate pair) or are read from both ends of a fragment (paired end).
When choosing a library preparation and sequencing platform, researchers should consider a variety of factors, including:
• The sample type
• The research question
• Whether short-read or long-read sequencing would be more appropriate
• The read length required
• Whether they need to look at the genome or transcriptome (DNA or RNA)
• Whether they need to sequence the whole genome or specific regions only
• The sample concentration required
• The optimum extraction method
• The read depth (coverage) required
• Whether they should multiplex samples
• Whether to use single-end, paired-end, or mate-pair reads
• Whether they need bioinformatic tools
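Several of the factors above, read length, read depth, and whole-genome versus targeted scope, are linked by simple arithmetic: mean depth equals total sequenced bases divided by the size of the region sequenced (the Lander-Waterman relationship). A minimal Python sketch, with illustrative function names of our own:

```python
def required_reads(genome_size_bp, read_length_bp, target_depth):
    """Reads needed for a target mean depth:
    depth = (reads * read_length) / genome_size, solved for reads."""
    return int(genome_size_bp * target_depth / read_length_bp)

def mean_depth(num_reads, read_length_bp, genome_size_bp):
    """Mean coverage achieved by a given number of reads."""
    return num_reads * read_length_bp / genome_size_bp

# e.g. 30x coverage of a ~3.1 Gb human genome with 150 bp short reads
reads_needed = required_reads(3_100_000_000, 150, 30)
```

This back-of-the-envelope estimate assumes reads land uniformly across the genome; real coverage varies by region, which is one reason researchers sequence deeper than the nominal target.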
As NGS technologies generate large volumes of data, the data analysis process typically involves raw read quality control, pre-processing and mapping, post-alignment processing, variant calling, variant annotation, and visualization steps.
By examining raw sequencing data, researchers can determine the quality of the data and prepare for downstream analyses. These assessments provide a general view of the length and quantity of reads and identify any contaminating sequences or low-quality reads.
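The kind of raw-read assessment described above can be sketched in a few lines of Python: parse FASTQ records, then summarize read lengths and Phred quality scores. This is an illustrative toy, not a substitute for dedicated QC tools; the helper names and the Phred+33 encoding are our assumptions.

```python
import statistics

def parse_fastq(text):
    """Yield (sequence, quality string) pairs from FASTQ-formatted text,
    assuming simple four-line records."""
    lines = text.strip().splitlines()
    for i in range(0, len(lines), 4):
        yield lines[i + 1], lines[i + 3]

def phred_scores(quality_string, offset=33):
    """Decode Phred+33 quality characters into integer scores."""
    return [ord(c) - offset for c in quality_string]

# Tiny in-memory example: one high-quality read, one very poor read
fastq = "@read1\nACGTACGT\n+\nIIIIIIII\n@read2\nACGT\n+\n!!!!\n"
reads = list(parse_fastq(fastq))
lengths = [len(seq) for seq, _ in reads]
mean_q = statistics.mean(q for _, qual in reads for q in phred_scores(qual))
```

Here `lengths` and `mean_q` give the per-run view that QC reports summarize; the `!` characters decode to Phred 0, dragging the mean quality down and flagging the second read as unusable.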
NGS has transformed the sequencing space since its launch in the early noughties. Since then, sequencing methods such as whole-genome, whole-exome, targeted, transcriptome, epigenome, and metagenome sequencing have each offered unique benefits to research settings around the globe. As NGS methods continue to develop and costs continue to dip, many more researchers and clinicians can and will adopt emerging sequencing techniques in their practices.
BioTechniques publishes insights into the laboratory techniques, technologies, and tools that pave the way for developments in the life sciences arena, both in its print journal and on its multimedia website. Its expansive audience of scientists and research professionals use these resources to develop their understanding of lab methodologies and contribute to important industry discussions. From articles, eBooks, and interviews to videos, webinars, and podcasts, BioTechniques curates the materials that users need to keep up with the latest in their fields.