Published March 12, 2023 | Version 0.1.0
Journal article Open

An efficient error correction and accurate assembly tool for noisy long reads

  • 1. GrandOmics Biosciences
  • 2. State Key Laboratory of Genetic Resources and Evolution, Kunming Institute of Zoology, Chinese Academy of Sciences
  • 3. Instituto de Investigación, Facultad de Medicina, Universidad de San Martín de Porres
  • 4. Institute of Medical Genetics, Cardiff University
  • 5. School of Automation Science and Engineering, Faculty of Electronic and Information Engineering, Xi'an Jiaotong University
  • 6. Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences
  • 7. State Key Laboratory of Ophthalmology, Zhongshan Ophthalmic Center, Sun Yat-sen University

Description

Long-read sequencing data, particularly those derived from the Oxford Nanopore Technologies (ONT) platform, tend to exhibit a high error rate. Here, we present NextDenovo, a highly efficient error correction and assembly tool for noisy long reads that achieves a high level of accuracy in genome assembly. NextDenovo corrects reads rapidly; the corrected reads contain fewer errors and fewer chimeric alignments than those produced by comparable tools. We applied NextDenovo to ONT long-read sequencing data to assemble high-quality reference genomes of 35 diverse humans from across the world. Based on these de novo genome assemblies, we identified the landscape of segmental duplications and gene copy number variation in the modern human population. NextDenovo should pave the way for population-scale long-read assembly from Nanopore data, thereby facilitating the construction of human pan-genomes.

Files

35Humans_Biser.zip

Files (1.3 GB)

Name Size
md5:de51d4e94ff3b7381acbb8a799699c18 394.5 MB
md5:701310c383259ba74abcf5b4bf2cadd4 913.3 MB
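
The md5 values listed above can be used to check that a download arrived intact. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical, and the listing does not say which file name belongs to which checksum, so the caller has to supply the local path.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path, listed):
    """Compare a file's MD5 with a listed value, with or without the 'md5:' prefix."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, calling `matches_listed_md5(local_path, "md5:de51d4e94ff3b7381acbb8a799699c18")` on a downloaded file would return `True` only if that file is the one the checksum refers to and it was transferred without corruption.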