Published October 6, 2021 | Version v1
Dataset Open

From Forensics to Clinical Research: Expanding the Variant Calling Pipeline for the Precision ID mtDNA Whole Genome Panel

  • 1. VMorais Lab – Mitochondria Biology & Neurodegeneration, Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, Portugal & NeuroCure Clinical Research Center, Charité - Universitätsmedizin Berlin, Germany
  • 2. VMorais Lab – Mitochondria Biology & Neurodegeneration, Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, Portugal
  • 3. José Ferro Lab – Clinical Research in Non-communicable Neurological Diseases, Instituto de Medicina Molecular João Lobo Antunes, Faculdade de Medicina, Universidade de Lisboa, Portugal & Serviço de Neurologia, Hospital de Santa Maria, Centro Hospitalar Universitário Lisboa Norte, Portugal
  • 4. NeuroCure Clinical Research Center, Charité - Universitätsmedizin Berlin, Germany & Experimental and Clinical Research Center, Charité - Universitätsmedizin Berlin and Max Delbrück Center for Molecular Medicine, Germany
  • 5. Institute of Genetic Epidemiology, Department of Genetics and Pharmacology, Medical University of Innsbruck, Austria

Description

In this dataset we provide the 1000 Genomes Project's samples processed in our manuscript:

Cortes-Figueiredo, F.; Carvalho, F.S.; Fonseca, A.C.; Paul, F.; Ferro, J.M.; Schönherr, S.; Weissensteiner, H.; Morais, V.A. From Forensics to Clinical Research: Expanding the Variant Calling Pipeline for the Precision ID mtDNA Whole Genome Panel. Int. J. Mol. Sci. 2021, 22, 12031. https://doi.org/10.3390/ijms222112031.

Abstract

Despite a multitude of methods for the sample preparation, sequencing, and data analysis of mitochondrial DNA (mtDNA), the demand for innovation remains, particularly in comparison with nuclear DNA (nDNA) research. The Applied Biosystems™ Precision ID mtDNA Whole Genome Panel (Thermo Fisher Scientific, USA) is an innovative library preparation kit suitable for degraded samples and low DNA input. However, its bioinformatic processing occurs in the enterprise Ion Torrent Suite™ Software (TSS), yielding BAM files aligned to an unorthodox version of the revised Cambridge Reference Sequence (rCRS), with a heteroplasmy threshold level of 10%. Here, we present an alternative customizable pipeline, the PrecisionCallerPipeline (PCP), for processing samples with the correct rCRS output after Ion Torrent sequencing with the Precision ID library kit. Using 18 samples (3 original samples and 15 mixtures) derived from the 1000 Genomes Project, we achieved overall improved performance metrics in comparison with the proprietary TSS, with optimal performance at a 2.5% heteroplasmy threshold. We further validated our findings with 50 samples from an ongoing independent cohort of stroke patients, with PCP finding 98.31% of TSS’s variants (TSS found 57.92% of PCP’s variants), with a significant correlation between the variant levels of variants found with both pipelines.


Please refer to our GitHub page, filcfig/PCP, for more details on running the PrecisionCallerPipeline.

Notes

This research was funded by Fundação para a Ciência e Tecnologia (FCT) (FCT/PTDC/MED-NEU/7976/2020), and a project cofunded by FEDER (POR Lisboa 2020—Programa Operacional Regional de Lisboa PORTUGAL 2020), and FCT (PAC-PRECISE LISBOA-01-0145-FEDER-016394), and the National Multiple Sclerosis Society (NMSS), NMSS Pilot Research Grant (PP-1712-29466). F.C.F.'s stipend was supported by FCT (PD/BD/114122/2015), and by Merck Germany (restricted research grant). V.A.M. is an iFCT researcher (IF/01693/2014; IMM/CT/27-2020).

Files

10.5281_zenodo.5524539.zip

Files (5.1 GB)

Name: 10.5281_zenodo.5524539.zip
Size: 5.1 GB
md5:e15f8a52650863fda4379cc5a9d1d1e1
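
Because the archive is large (5.1 GB), it may be worth checking the download against the MD5 checksum listed above before unpacking it. The following is a minimal sketch using only the Python standard library. The local path is an assumption (the archive saved under its listed name in the current directory), and the helper names `md5_of_file` and `archive_is_intact` are hypothetical, introduced here for illustration only.

```python
import hashlib
from pathlib import Path

# Checksum as listed for the archive on this record.
EXPECTED_MD5 = "e15f8a52650863fda4379cc5a9d1d1e1"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Chunked reads keep memory use flat even for multi-gigabyte files.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def archive_is_intact(path, expected=EXPECTED_MD5):
    """Compare the file's MD5 digest with the expected checksum."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location of the downloaded archive (hypothetical path).
    archive = Path("10.5281_zenodo.5524539.zip")
    if archive.exists():
        print("OK" if archive_is_intact(archive) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.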

Additional details

Related works

Is supplement to
Journal article: 10.3390/ijms222112031 (DOI)