Published December 6, 2022 | Version v1
Dataset | Open access

Supplementary information for: NUMT PARSER: Automated identification and removal of nuclear mitochondrial pseudogenes (numts) for accurate mitochondrial genome reconstruction in Panthera

1. University of Illinois at Urbana-Champaign

Description

Nuclear mitochondrial pseudogenes (numts) may hinder the reconstruction of mitochondrial genomes (mitogenomes) and affect the reliability of mtDNA datasets for phylogenetic and population genetic comparisons. Here, we present the program Numt Parser, which identifies DNA sequences that likely originate from numt pseudogenes. Sequencing reads are classified as originating from either numt or true cytoplasmic mitochondrial (cymt) DNA by direct comparison against cymt and numt reference sequences, and the classified reads can then be parsed into separate cymt and numt datasets. We tested the program using whole-genome shotgun sequencing data from two ancient Cape lions (Panthera leo), because mtDNA is often the marker of choice in ancient DNA studies and the genus Panthera is known to harbor numt pseudogenes. By removing reads that likely originated from numts, Numt Parser reduced sequence disagreements that were likely caused by numt contamination and equalized read coverage across the mitogenome. We also compared Numt Parser with two other bioinformatic approaches that can be used to account for numt contamination. Numt Parser outperformed approaches that rely only on read alignment or on Basic Local Alignment Search Tool (BLAST) properties, and it identified sequences that likely originated from numts while having minimal impact on the recovery of cymt reads. Numt Parser therefore improves the reconstruction of true mitogenomes, allowing for more accurate and robust biological inferences.
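The read-classification idea described above (compare each read against a cymt and a numt reference, then bin it) can be illustrated with a toy sketch. This is not Numt Parser's actual scoring or code: it assumes ungapped, forward-strand matching and a simple "fewest mismatches wins" rule, and the function names and toy sequences are hypothetical.

```python
from typing import Dict, List

# Hypothetical illustration only: ungapped, forward-strand matching with a
# fewest-mismatches rule. Not the classification criteria used by Numt Parser.


def best_mismatches(read: str, reference: str) -> int:
    """Fewest mismatches over every ungapped placement of `read` in `reference`."""
    read, reference = read.upper(), reference.upper()
    if len(read) > len(reference):
        raise ValueError("read is longer than the reference")
    best = len(read)
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        best = min(best, sum(1 for a, b in zip(read, window) if a != b))
    return best


def classify_read(read: str, cymt_ref: str, numt_ref: str) -> str:
    """Label a read by whichever reference it matches with fewer mismatches."""
    cymt = best_mismatches(read, cymt_ref)
    numt = best_mismatches(read, numt_ref)
    if cymt < numt:
        return "cymt"
    if numt < cymt:
        return "numt"
    return "ambiguous"  # equally good match to both references


def parse_reads(reads: Dict[str, str], cymt_ref: str,
                numt_ref: str) -> Dict[str, List[str]]:
    """Sort read IDs into cymt, numt and ambiguous bins."""
    bins: Dict[str, List[str]] = {"cymt": [], "numt": [], "ambiguous": []}
    for read_id, seq in reads.items():
        bins[classify_read(seq, cymt_ref, numt_ref)].append(read_id)
    return bins


if __name__ == "__main__":
    # Toy references that differ at two positions (index 6 and 13).
    cymt_ref = "ACGTACGTTTGACCA"
    numt_ref = "ACGTACCTTTGACGA"
    reads = {"r1": "ACGTACGT", "r2": "TTTGACGA", "r3": "TTTGAC"}
    print(parse_reads(reads, cymt_ref, numt_ref))
    # {'cymt': ['r1'], 'numt': ['r2'], 'ambiguous': ['r3']}
```

Reads that match both references equally well (r3 here, which falls in a region where the two toy references are identical) cannot be assigned, so the sketch keeps them in a separate bin rather than forcing a call.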

Notes

Files in BAM format (.bam) are binary and must be viewed or converted with a tool such as SAMtools. SAM (.sam) and FASTA (.fa) files are plain text and can be opened in any text editor, either on the command line or in a graphical application.
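Because the SAM and FASTA files are plain text, they can also be read programmatically. The following is a minimal standard-library sketch with hypothetical function names: the FASTA reader joins wrapped sequence lines under each header, and the SAM reader skips `@` header lines and keeps only the 11 mandatory tab-separated columns defined by the SAM format, ignoring optional tags. A BAM file would first need converting to SAM, for example with `samtools view -h input.bam > output.sam`.

```python
from typing import Dict, Iterable, Iterator, List, Optional

# The 11 mandatory, tab-separated columns of a SAM alignment line.
SAM_FIELDS = ("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each FASTA header (text after '>') to its concatenated sequence."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)  # sequences may wrap over several lines
    if name is not None:
        records[name] = "".join(chunks)
    return records


def read_sam(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield the mandatory fields of each alignment line, skipping '@' headers."""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith("@"):
            continue
        fields = line.split("\t")
        yield dict(zip(SAM_FIELDS, fields[:11]))  # optional tags are ignored


if __name__ == "__main__":
    fasta = [">cymt_ref\n", "ACGT\n", "ACGT\n", ">numt_ref\n", "TTGA\n"]
    print(read_fasta(fasta))  # {'cymt_ref': 'ACGTACGT', 'numt_ref': 'TTGA'}
    sam = ["@HD\tVN:1.6\n",
           "read1\t0\tchrM\t100\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII\n"]
    for record in read_sam(sam):
        print(record["QNAME"], record["RNAME"], record["POS"])  # read1 chrM 100
```

Both functions accept any iterable of lines, so an open file handle (e.g. `with open(path) as handle: read_fasta(handle)`) works as well as an in-memory list.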

Funding provided by: Cooperative State Research, Education, and Extension Service, US Department of Agriculture
Crossref Funder Registry ID:
Award Number: ILLU 875–952

Files

README.md

Files (14.2 MB)

Checksum                               Size
md5:a134389764e67b83650b718583e50aff   493.5 kB
md5:1f797900edd20bb5723af1f4057f5b67   409.7 kB
md5:7a88c6ec7793fe2d7f9ea85e139fcefd   536.9 kB
md5:ebd7c49efe0301ad8efb4df174fd4d2e   608.3 kB
md5:be50c50a34f3e1b5e73d77f2b5a25bd4   16.9 kB
md5:21b4da2bea7bcfcee8b8213b276bd9bf   3.8 MB
md5:a47b6708bfdd830935b83f1bb91ce54d   1.7 MB
md5:5e9ad00b159be6b6f551b24a2bdc6d9c   341.1 kB
md5:7ce78884d835d94af438051bc3da76ed   275.1 kB
md5:ac9b51e701a5371505529cf37b8b23c3   367.6 kB
md5:919177b91ff5ba5ae922e5126851e2c6   488.7 kB
md5:129aeff6c047cb099cfcbb1fe9dfae2b   16.9 kB
md5:8f964e02e02732c5e30bb53c6ad5c108   3.1 MB
md5:83e36ee74d4559076dd21b5448e433b9   1.9 MB
md5:c1a6acc72df1d6872025a90268a973be   7.6 kB
md5:a4538860d3900d21cb2f9cd61c85c657   175.8 kB
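Each file in this record is listed with an MD5 checksum (e.g. `md5:a134389764e67b83650b718583e50aff`), which can be used to confirm that a download is complete and uncorrupted. A minimal sketch using Python's `hashlib`; the function names are hypothetical, and files are read in chunks so that large BAM files need not fit in memory.

```python
import hashlib
from typing import Iterable


def md5_of_chunks(chunks: Iterable[bytes]) -> str:
    """Hex MD5 digest of a stream of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks; the file stays open until hashing finishes."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def checksum_matches(expected: str, actual: str) -> bool:
    """Compare a listed checksum such as 'md5:ab12...' with a computed digest."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return expected == actual.lower()
```

Usage, with a placeholder path for a downloaded file: `checksum_matches("md5:a134389764e67b83650b718583e50aff", md5_of_file("path/to/downloaded_file"))`. On Linux, the GNU `md5sum` command gives the same digest.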

Additional details

Related works

Is cited by
10.1101/2022.04.04.487049 (DOI)