There is a newer version of the record available.

Published July 24, 2018 | Version v1
Preprint | Open Access

Haplotype-aware diplotyping from noisy long reads

  • 1. Max Planck Institute for Informatics, Saarbruecken, Germany; Center for Bioinformatics, Saarland University, Saarbruecken, Germany
  • 2. UC Santa Cruz Genomics Institute, University of California Santa Cruz, USA

Description

SNP calls for the individual NA12878, produced by MarginPhase and WhatsHap from PacBio and Oxford Nanopore data.

Paper Abstract: Current genotyping approaches for single-nucleotide variations rely on short, accurate reads from second-generation sequencing devices. Third-generation sequencing platforms are rapidly becoming more widespread, yet approaches that leverage their long but error-prone reads for genotyping are lacking.
Here, we introduce a novel statistical framework for the joint inference of haplotypes and genotypes from noisy long reads, which we term diplotyping. Our technique takes full advantage of linkage information provided by long reads. We validate hundreds of thousands of candidate variants that have not yet been included in the high-confidence reference set (NA12878) of the Genome-in-a-Bottle effort.
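To make the term concrete: a diplotype is a pair of haplotypes over the same variant sites, and it determines the genotype at every site it covers. The sketch below illustrates only that relationship, assuming biallelic SNVs encoded as 0 (reference) and 1 (alternative). It is not the MarginPhase or WhatsHap inference procedure, and the function names are hypothetical.

```python
from typing import List

# Hypothetical encoding: one integer per biallelic SNV site,
# 0 = reference allele, 1 = alternative allele.
Haplotype = List[int]


def _check_same_sites(h1: Haplotype, h2: Haplotype) -> None:
    """Both haplotypes of a diplotype must span the same sites."""
    if len(h1) != len(h2):
        raise ValueError("haplotypes must cover the same sites")


def genotypes_from_diplotype(h1: Haplotype, h2: Haplotype) -> List[str]:
    """Unphased VCF-style genotypes ("0/0", "0/1", "1/1"), one per site.

    The genotype is the unordered pair of alleles, so which haplotype
    carries which allele is discarded.
    """
    _check_same_sites(h1, h2)
    result = []
    for a, b in zip(h1, h2):
        lo, hi = sorted((a, b))
        result.append(f"{lo}/{hi}")
    return result


def phased_genotypes(h1: Haplotype, h2: Haplotype) -> List[str]:
    """Phased VCF-style genotypes ("0|1", "1|0", ...), keeping haplotype membership."""
    _check_same_sites(h1, h2)
    return [f"{a}|{b}" for a, b in zip(h1, h2)]


if __name__ == "__main__":
    h1, h2 = [0, 1, 1, 0], [1, 1, 0, 0]
    print(genotypes_from_diplotype(h1, h2))
    print(phased_genotypes(h1, h2))
```

For example, the diplotypes ([0, 0], [1, 1]) and ([0, 1], [1, 0]) yield the same unphased genotypes but different phased ones. Inferring the full diplotype therefore recovers the linkage between sites, which genotype calls alone do not record.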

Files

Files (388.0 MB)

  • 52.4 MB, md5:5c434ba655f6a9fbfb49500af6804ca7
  • 1.5 MB, md5:212545e8271dd2beab1f7a5ffde0ea38
  • 50.4 MB, md5:f7824eee41a02c1c0ae3062cc27f56d7
  • 1.5 MB, md5:904c62545fc7b654bbb8c17d957be6f3
  • 145.6 MB, md5:b032c019105391e62bd0c4e7fcb604d0
  • 1.6 MB, md5:47032937965350f8c615e41b384d5a9d
  • 133.5 MB, md5:5606e0629e0c021f3d44f4156558ed61
  • 1.5 MB, md5:b11b8ddc1b9c131105701d6113abb901
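The listing gives an MD5 checksum for each file, which can be used to confirm that a download is complete and uncorrupted. A minimal sketch using Python's standard `hashlib` module follows. The file names are not shown in this listing, so the local path in the usage comment is a hypothetical placeholder, and the helper names are illustrative.

```python
import hashlib

# MD5 checksums copied from the file listing of this record.
EXPECTED_MD5 = {
    "5c434ba655f6a9fbfb49500af6804ca7",
    "212545e8271dd2beab1f7a5ffde0ea38",
    "f7824eee41a02c1c0ae3062cc27f56d7",
    "904c62545fc7b654bbb8c17d957be6f3",
    "b032c019105391e62bd0c4e7fcb604d0",
    "47032937965350f8c615e41b384d5a9d",
    "5606e0629e0c021f3d44f4156558ed61",
    "b11b8ddc1b9c131105701d6113abb901",
}


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks so large files fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_listed_checksum(path: str) -> bool:
    """True if the file's MD5 equals one of the checksums in the listing."""
    return md5sum(path) in EXPECTED_MD5


if __name__ == "__main__":
    # Hypothetical local file name; substitute the path of a downloaded file.
    path = "downloaded_file"
    print(path, "OK" if matches_listed_checksum(path) else "MISMATCH")
```

Because the file names are not available here, the sketch only checks that a file's digest appears somewhere in the listing. If the names are known, mapping each name to its own checksum gives a stricter check.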