Published September 30, 2019 | Version v1
Dataset Open

MitoFinder: efficient automated large-scale extraction of mitogenomic data in target enrichment phylogenomics

  • 1. Institut des Sciences de l'Evolution de Montpellier (ISEM), CNRS, EPHE, IRD, Université de Montpellier, Montpellier, France.
  • 2. Laboratório Multidisciplinar para Análise de Dados (LAMPADA), Instituto de Bioquímica Médica Leopoldo de Meis, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brasil.

Description

Rémi Allio (1), Alex Schomaker-Bastos (2,†), Jonathan Romiguier (1), Francisco Prosdocimi (2), Benoit Nabholz (1), and Frédéric Delsuc (1)

† In Memoriam (08/01/2015)

Correspondence

Rémi Allio

Email: remi.allio@umontpellier.fr

Frédéric Delsuc

Email: frederic.delsuc@umontpellier.fr

Running head

Mitochondrial signal from UCE capture data

Abstract

Thanks to the development of high-throughput sequencing technologies, target enrichment sequencing of nuclear ultraconserved DNA elements (UCEs) now routinely allows phylogenetic relationships to be inferred from thousands of genomic markers. Recently, it has been shown that mitochondrial DNA (mtDNA) is frequently sequenced alongside the targeted loci in such capture experiments. Despite its broad evolutionary interest, mtDNA is rarely assembled and used in conjunction with nuclear markers in capture-based studies. Here, we developed MitoFinder, a user-friendly bioinformatic pipeline, to efficiently assemble and annotate mitogenomic data from hundreds of UCE libraries. As a case study, we used ants (Formicidae), for which 501 UCE libraries have been sequenced whereas only 29 mitogenomes are available. We compared the efficiency of four different assemblers (IDBA-UD, MEGAHIT, MetaSPAdes, and Trinity) for assembling both UCE and mtDNA loci. Using MitoFinder, we show that metagenomic assemblers, in particular MetaSPAdes, are well suited to assemble both UCEs and mtDNA. Mitogenomic signal was successfully extracted from all 501 UCE libraries, which allowed species identification to be confirmed using COI barcoding. Moreover, our automated procedure retrieved 296 cases in which the mitochondrial genome was assembled in a single contig, thus increasing the number of available ant mitogenomes by an order of magnitude. By leveraging the power of metagenomic assemblers, MitoFinder provides an efficient tool to extract complementary mitogenomic data from UCE libraries, making it possible to test for potential mito-nuclear discordance. Our approach is potentially applicable to other sequence capture methods, transcriptomic data, and whole genome shotgun sequencing in diverse taxa.

Figures & Tables

Figure 1. Conceptualization of the pipeline used to assemble and extract UCE and mitochondrial signal from ultraconserved element sequencing data.

Figure 2. Comparison of the efficiency of the assemblers in terms of: A) computational time, B) number of potentially mitochondrial contigs identified, and C) number of mitochondrial genes annotated. Violin plots reflect the data distribution with a horizontal line indicating the median. Note that for the three metagenomic assemblers, 5 CPUs were used compared to 35 CPUs for Trinity. Plots were obtained using PlotsOfData (Postma & Goedhart 2019).

Figure 3. Phylogenomic relationships of ants (Formicidae). A) Mito-nuclear phylogenetic differences among subfamily relationships based on the UCE and mtDNA supermatrices obtained with the MetaSPAdes assembler. Clades corresponding to subfamilies were collapsed. Inter-subfamily relationships with UFBS < 95% were collapsed. Non-maximal node support values are reported. B) The topology obtained reflects the results of phylogenetic analyses based on the amino acid mitochondrial supermatrix (using MetaSPAdes as assembler). Histograms reflect the percentage of UCEs (light grey) and mitochondrial genes (dark grey) recovered for each species. Illustrative pictures (*): Diacamma sp. (Ponerinae; top left), Formica sp. (Formicinae; top right), and Messor barbarus (Myrmicinae; bottom right).

Table 1. Summary statistics on assembly results according to the assembler used. The values are averages over the 501 assemblies, except for the assembly time, which is a median value. The two tables report specific statistics for A) ultraconserved elements data, and B) mitochondrial data. Note that 35 CPUs were used for Trinity whereas 5 CPUs were used for other assemblers.

Table 2. Statistical comparison between the performances of the different assemblers. Statistical significance was estimated with a paired non-parametric test (paired Wilcoxon test). *** = p<0.001; ** = p<0.01; * = p<0.05; NS = p>0.05; and (+)/(-) is the result of the comparison between the row and the column.
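The significance notation in the Table 2 legend can be written down as a small lookup. The following is a minimal sketch only: the function name significance_label is hypothetical, and because the legend does not say how a p-value of exactly 0.05 is labelled, the sketch assumes it falls under NS.

```python
def significance_label(p_value: float) -> str:
    """Map a p-value to the star notation described in the Table 2 legend."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    # Assumption: the legend leaves p == 0.05 undefined; it is treated as NS here.
    return "NS"
```

The thresholds are checked from the strictest to the loosest, so each p-value receives the strongest label it qualifies for.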

Appendices

Appendix S1. List of the 501 UCE libraries (SRA accessions) and associated metadata.

Appendix S2. Summary statistics on mitochondrial signal recovered per species and depending on the assembler used. The table provides the number of contigs and genes recovered with MitoFinder and the size of each annotated gene.

Appendix S3. Summary statistics of barcoding analyses. Detailed results for both BOLDsystem and Megablast analyses are provided for each COI sequence recovered with MitoFinder using MetaSPAdes.

Appendix S4. Detailed results of tree distance analyses performed with Dquad (Ranwez, Criscuolo, & Douzery 2010). Trees obtained with each assembler from the mitochondrial amino acid supermatrix, the mitochondrial nucleotide supermatrix, and the UCE nucleotide supermatrix were compared with each other.

Appendix S5. List of GenBank accession numbers for newly generated mitochondrial contigs.

Zenodo supplementary files

Assembly_results.tar.gz Contains all contigs obtained for each species with the different assemblers implemented in MitoFinder.

MitoFinder_annotations.tar.gz Contains MitoFinder annotations for each species (based on the contigs obtained with MetaSPAdes).

UCE_results.tar.gz Contains all annotated UCEs obtained for each species after UCE identification with PHYLUCE (MetaSPAdes).

Final_mtDNA_alignments.tar.gz Contains the final mitochondrial gene alignments (MetaSPAdes).

Final_UCE_alignments.tar.gz Contains the final UCE alignments (MetaSPAdes).

Final_mtDNA_matrices.tar.gz Contains the final mitochondrial supermatrices (AA and NT) used for the phylogenetic analyses (MetaSPAdes).

Metaspades_final_UCE_matrix.phy The final UCE supermatrix used for the phylogenetic analyses (MetaSPAdes).
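Because the supplementary archives are large, it can help to inspect one before unpacking it. The sketch below lists the regular files in a .tar.gz archive without extracting anything to disk, using Python's standard tarfile module. The function name list_archive_members is hypothetical, and the internal layout of the archives is not documented in this record, so no particular output is implied.

```python
import tarfile


def list_archive_members(archive_path):
    """Return (name, size_in_bytes) for each regular file in a .tar.gz archive."""
    members = []
    # "r:gz" opens a gzip-compressed tar for reading; nothing is written to disk.
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            # Skip directories, links, and other non-file entries.
            if member.isfile():
                members.append((member.name, member.size))
    return members
```

For example, list_archive_members("MitoFinder_annotations.tar.gz") would return one tuple per file stored in that archive, assuming it has been downloaded to the working directory.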

Files

AppendixS1_Formicidae_UCEs_SRA_2018.29.03.pdf

Files (17.2 GB)

Checksum                                Size
md5:5139a99b1b60e87b1a63eedc290cb0fe    456.5 kB
md5:07aab39f28e2745496f857977213f654    553.9 kB
md5:633448e7358e81c67334fd0e3f72b15a    333.7 kB
md5:ff95176d43f3dc8cf3ffa57853e429f2    119.2 kB
md5:515a915a489a98f9f2c43123c641b777    46.7 kB
md5:37afafc2404b601e2b10b6b57fb747cd    16.3 GB
md5:59e2a42162353faaeb10f3f859c80acf    1.3 MB
md5:1e4ae629f46027260c10f279c450607f    606.0 kB
md5:4e92493909db322ba41cabfe485eb568    929.9 kB
md5:dedfb4c391690badaf47fabed1289105    1.6 MB
md5:34e535ffb235452f23d9e28b240c9ad8    1.9 MB
md5:517e5dc8606b0e4d1dae2130897ccdd0    7.0 MB
md5:4743441709e975343d3375a0a7a194bc    78.2 MB
md5:7730a5c2bf347cf766ceb1f3d2540114    27.1 MB
md5:3552430579240e0a2b6b247e6db48664    236.0 kB
md5:d8cb7949ba281c9df8bbb317397cbf7a    72.4 kB
md5:d034ff33c3c28cf5c95a44106967e6ad    773.7 MB
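The md5 values listed above can be used to check that a download completed without corruption. The following is a minimal sketch using Python's standard hashlib module; the function names md5_of_file and matches_listed_checksum are hypothetical. The file is read in chunks so that the largest archive (16.3 GB) does not need to fit in memory. The listing does not pair checksums with file names, so matching a checksum to a given file is left to the reader.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops as soon as read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as matches_listed_checksum("downloaded_file", "md5:...") returns True only when the digest of the local file equals the listed value.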

Additional details

Related works

Is cited by
Preprint: 10.1101/685412 (DOI)

Funding

European Commission
ConvergeAnt - An Integrative Approach to Understanding Convergent Evolution in Ant-eating Mammals (grant no. 683257)