Evaluation of genome assembly software based on long reads
- 1. CNRS, IFB
- 2. CNRS, GenScale
- 3. INRA, IFB
- 4. CNRS, IFB, Elixir
Description
During the last 30 years, genomics has been revolutionized by the development of first- and second-generation sequencing (SGS) technologies, which enabled the completion of many remarkable projects such as the Human Genome Project, the 1000 Genomes Project, and the Human Microbiome Project.
In the last decade, SGS technologies based on massively parallel sequencing have dominated the market, thanks to their ability to produce enormous volumes of data cheaply. However, genes and regions of interest are often not completely or accurately assembled, which complicates analyses or requires additional cloning efforts to obtain the correct sequences. The fundamental obstacle to obtaining high-quality genome assemblies with SGS technologies is the presence of repeated sequences, which short reads often cannot span. A promising solution to this issue is the advent of third-generation sequencing (TGS) technologies based on long-read sequencing.
TGS technologies have been used to produce highly accurate de novo assemblies of hundreds of microbial genomes and highly contiguous reconstructions of many dozens of plant and animal genomes, enabling new insights into evolution and sequence diversity. They have also been applied to resequencing analyses, to create detailed maps of structural variation in many species, and to fill in many of the gaps in the human reference genome. In this report, we compare and evaluate several genome assembly tools designed for TGS data. The experiments were performed on four reference genomes, and the results were evaluated with the QUAST software.
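As an illustration of what "highly contiguous" means in practice, the sketch below computes N50, a contiguity statistic commonly reported by assembly evaluation tools such as QUAST. It is a minimal example and not the evaluation procedure of the report: the function name `n50` and the contig lengths are hypothetical, chosen only to show how two assemblies of the same total size can differ in contiguity.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list).

    N50 is the length of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths (in base pairs), not data from the report.
fragmented = [100, 200, 300, 400, 500]
contiguous = [1200, 300]

print("fragmented assembly N50:", n50(fragmented))
print("contiguous assembly N50:", n50(contiguous))
```

Both example assemblies total 1500 bp, but the second concentrates most of the sequence in a single contig and therefore has a higher N50. N50 alone says nothing about correctness, which is why evaluation against a reference genome typically also considers misassemblies and genome coverage.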
Files
Name | Size |
---|---|
scientific_reports_assembly_long_reads(2).pdf (md5:33ea1aaad2dcf45b2d948f556344825f) | 382.7 kB |