There is a newer version of the record available.

Published February 18, 2020 | Version 1
Dataset Open

Undinarchaeota illuminate the evolution of DPANN archaea

  • 1. NIOZ, Royal Netherlands Institute for Sea Research, Department of Marine Microbiology and Biogeochemistry, and Utrecht University, Netherlands
  • 2. School of Biological Sciences, University of Bristol, UK
  • 3. Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, The University of Queensland, Australia
  • 4. Department of Cell- and Molecular Biology, Science for Life Laboratory, Uppsala University, SE-75123, Uppsala, Sweden
  • 5. Research School of Computer Science and Research School of Biology, Australian National University, Australia

Description

Abstract

The evolution and diversification of Archaea is central to the history of life on Earth. Cultivation-independent approaches have revealed at least ten lineages of Archaea whose members have small cell and genome sizes and limited metabolic capabilities and have been suggested to form the monophyletic “DPANN” superphylum. However, the phylogenetic diversity of DPANN and their placement in the archaeal tree remain controversial. Here, we describe 12 genomes of an uncharacterized archaeal phylum-level lineage UAP2 (Candidatus Undinarchaeota), which in initial analyses distantly affiliated with DPANN archaea. Our analyses revealed that members of the Undinarchaeota have small estimated genome sizes and, while potentially being able to conserve energy through fermentation, likely depend on partner organisms for the acquisition of vitamins and amino acids. Phylogenomic analyses robustly supported the placement of Undinarchaeota as an independent lineage between two major and highly supported clans of ‘DPANN’: one clan comprising Micrarchaeota, Altiarchaeota and Diapherotrites, and another encompassing all other DPANN lineages. These analyses also suggest that DPANN Archaea may have exchanged core genes with their hosts by horizontal gene transfer (HGT), adding to the difficulty of placing DPANN in the archaeal tree. Together, our findings provide crucial insights helping to resolve debates about DPANN origins as well as the evolution of symbiotic lifestyles in archaea.

Repository Contents

1_Genome_files.tar.gz includes all Undinarchaeota (original name UAP2) metagenome-assembled genomes (MAGs). This includes: 

  1. The original contigs for each UAP2 MAG (fna files)
  2. The prokka output for each UAP2 MAG (faa files)
  3. A concatenated file of all proteins from each UAP2 MAG and all archaeal reference genomes (364 genomes in total). This folder also includes a list of archaeal genomes investigated.

2_Phylogenies.tar.gz includes all files for the phylogenetic analyses. This includes the following folders:

1. Files for the concatenated species trees for different taxa sets. These files are related to the following parts of the manuscript: Supplementary Table 6; Figure 1 and Supplementary Figures S6-S56. The folder includes the following:

  • Folder '1_unaligned_sequences' includes the individual protein sequences extracted from the different taxa sets.
  • Folder '2_alignments' includes the alignment files generated by MAFFT.
  • Folder '3_alignments_trimmed' includes the alignments trimmed with BMGE.
  • Folder '4_phylogenies' includes the IQ-TREE output for all phylogenies as well as a color-annotation file for FigTree. Additionally, files rooted with minimal ancestor deviation (MAD) rooting (*.rooted) are provided. Note that for the final figures the *treefile_renamed files (i.e. the IQ-TREE tree files with the full taxa string) were artificially rooted using the DPANN archaea. The numbering corresponds to Supplementary Table S6 of the main manuscript.
  • Folder '5_pdfs' includes the PDFs for each tree.
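
The folder layout above follows an align, trim, infer sequence (MAFFT, then BMGE, then IQ-TREE). The sketch below only illustrates how such a chain could be assembled for one marker protein; it is not the authors' workflow, which is provided in 3_Scripts.tar.gz. The function name `build_commands`, the input file name, the `BMGE.jar` location, the thread count, and in particular the substitution model `LG+G` and the bootstrap setting are assumptions made for illustration.

```python
from pathlib import Path


def build_commands(fasta: Path, out_dir: Path, threads: int = 4) -> list[list[str]]:
    """Return MAFFT, BMGE and IQ-TREE command lines for one marker protein.

    Nothing is executed here; the lists can be passed to subprocess.run().
    """
    aln = out_dir / "2_alignments" / f"{fasta.stem}.aln"
    trimmed = out_dir / "3_alignments_trimmed" / f"{fasta.stem}.trimmed.faa"
    return [
        # MAFFT prints the alignment to stdout; the caller redirects it to `aln`.
        ["mafft", "--auto", "--thread", str(threads), str(fasta)],
        # BMGE: -t AA selects amino-acid mode, -of writes the trimmed FASTA.
        ["java", "-jar", "BMGE.jar", "-i", str(aln), "-t", "AA", "-of", str(trimmed)],
        # IQ-TREE 1.x: -bb = ultrafast bootstraps, -nt = threads.
        # The model (LG+G) is a placeholder, not the model used in the study.
        ["iqtree", "-s", str(trimmed), "-m", "LG+G", "-bb", "1000", "-nt", str(threads)],
    ]


if __name__ == "__main__":
    # "marker_1.faa" is a hypothetical input file name.
    for command in build_commands(Path("marker_1.faa"), Path("out")):
        print(" ".join(command))
```

Returning the commands as lists rather than running them keeps the sketch testable without the tools installed; the models and options actually used should be taken from the example script in the scripts archive.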

2. Files for the single gene trees, which include:

  • The folder '1_arcogs' includes the unaligned proteins, alignments, trimmed alignments, trees and pdfs for the single gene trees based on the arCOG identifiers. The arCOGs were extracted from 12 UAP2 MAGs + 352 archaeal + 3020 bacterial + 100 eukaryotic genomes. ArCOGs were only considered if they occurred in at least 3 UAP2 genomes. Note that these files were used to investigate UAP2 for HGT events and correspond to the following parts of the manuscript: Figure 4 and Supplementary Tables 4, 5, 16 and 17.
  • The folder '151_markers' includes the proteins, alignments, trimmed alignments, trees and pdfs for evaluating the 151 marker set used for the concatenated species tree. Files are provided for the 127 and 364 taxa sets. These files were used as a basis for the concatenated species trees that were used to generate Supplementary Figures S6-S56. Additionally, the trees were used for ranking marker proteins and generating Supplementary Tables 4-5.
  • The folder '3_other_individual_trees' includes the proteins, alignments and phylogenies for the 16S_23S, RubisCO and primase analyses. The data was used to generate the following parts of the manuscript: Supplementary Table 11, Supplementary Figures 3-5, 57 and 59.
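
The selection rule stated for the arCOG trees (an arCOG is kept only if it occurs in at least 3 UAP2 genomes) can be sketched as follows. This is a minimal illustration, assuming the annotations are available as (genome, arCOG) pairs; the function name `arcogs_in_min_genomes` and all genome and arCOG identifiers in the example are hypothetical and not taken from the archive.

```python
from collections import defaultdict


def arcogs_in_min_genomes(hits, min_genomes=3):
    """Return arCOG IDs found in at least `min_genomes` distinct genomes.

    `hits` is an iterable of (genome_id, arcog_id) pairs, one per annotated protein.
    """
    genomes_per_arcog = defaultdict(set)
    for genome_id, arcog_id in hits:
        # A set counts each genome once, even if it holds several copies.
        genomes_per_arcog[arcog_id].add(genome_id)
    return sorted(
        arcog for arcog, genomes in genomes_per_arcog.items()
        if len(genomes) >= min_genomes
    )


# Hypothetical example data: arCOG00001 is in three genomes, arCOG00002 in two.
hits = [
    ("UAP2_A", "arCOG00001"), ("UAP2_B", "arCOG00001"), ("UAP2_C", "arCOG00001"),
    ("UAP2_A", "arCOG00002"), ("UAP2_A", "arCOG00002"), ("UAP2_B", "arCOG00002"),
]
print(arcogs_in_min_genomes(hits))  # ['arCOG00001']
```

Counting distinct genomes rather than proteins means that paralogs within a single MAG do not push an arCOG over the threshold.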

3_Scripts.tar.gz includes the scripts and workflow files used for the annotations and phylogenetic analyses. This includes the following folders:

1. The files for the main workflow for the annotations and phylogenies.

  • This folder includes the workflow to generate annotations for archaeal genomes as well as an example script that was used to generate phylogenies. These analyses were typically run on an in-house bioinformatics cluster with 4x Xeon Gold 6140 2.3 GHz processors using bash, python and perl. The system runs a Linux operating system (Red Hat Enterprise Linux 7.5).

2. A folder providing the required dependencies, which include:

  • any python or perl scripts that were used during this study and/or that are mentioned in the methods section
  • databases used for the annotations, especially those that were slightly modified. Note that the changes typically consist of parsing the mapping files or modifying the sequence headers for easier parsing.
  • mapping files needed to link the genome accession ids to the taxonomy string as well as lists of protein IDs used for different phylogenies (i.e. 14 + 48 arCOGs used for protein phylogenies)
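
To illustrate how a mapping file can link genome accession IDs to taxonomy strings (as in the *treefile_renamed files), the sketch below relabels the tips of a Newick tree. It assumes a two-column, tab-separated mapping (accession, taxonomy string), which may differ from the format used in the archive; the function names and all identifiers in the example are hypothetical.

```python
import re


def load_mapping(lines):
    """Parse 'accession<TAB>taxonomy string' lines into a dict."""
    mapping = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        accession, taxonomy = line.split("\t", 1)
        mapping[accession] = taxonomy
    return mapping


def rename_tips(newick, mapping):
    """Replace tip labels in a Newick string using `mapping`.

    Tip labels directly follow '(' or ','. Support values follow ')' and
    branch lengths follow ':', so neither is touched. Labels without a
    mapping entry are kept unchanged.
    """
    def swap(match):
        label = match.group(2)
        return match.group(1) + mapping.get(label, label)

    return re.sub(r"([(,])([^(),:;]+)", swap, newick)


# Hypothetical example: two of the three tips have a taxonomy string.
mapping = load_mapping([
    "genome_1\tArchaea|Euryarchaeota|genome_1",
    "UAP2_1\tArchaea|Undinarchaeota|UAP2_1",
])
tree = "((genome_1:0.1,genome_2:0.2)95:0.3,UAP2_1:0.4);"
print(rename_tips(tree, mapping))
```

Taxonomy strings must not contain Newick control characters such as parentheses, commas, colons or semicolons, otherwise the renamed tree will no longer parse.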

3. R scripts (including all needed input files) used to: 

  • generate tables and figures for the annotations, i.e. Figures 2 and 3, Supplementary Tables 7, 8, 9, 12 and 13-15, and Supplementary Figures 58 and 60-62. The input folder includes the raw output from the annotation workflow, with annotations for the 12 UAP2 MAGs as well as 352 archaeal reference genomes.
  • generate tables and figures for the HGT analyses, i.e. Figure 4 and Supplementary Tables S16 and S17. Here, proteins based on arCOGs were extracted from 364 archaeal, 3020 bacterial and 98 eukaryotic genomes and used to generate single protein phylogenies. The resulting trees were used to investigate horizontal gene transfer events and the necessary scripts are provided in this folder.
  • generate tables and figures for the amino acid identity (AAI) comparisons, i.e. Supplementary Table S3 and Supplementary Figure S2.
  • rank the marker genes for concatenated species trees for the 127 and 364 taxa sets. These were used to generate Supplementary Tables S4 and S5.

Notes

This work was supported by a grant of the Swedish Research Council (VR starting grant 2016-03559 to Anja Spang) and the NWO-I foundation of the Netherlands Organisation for Scientific Research (WISE fellowship to AS). Tom Williams was supported by a Royal Society University Research Fellowship. Benjamin Woodcroft was supported by the Australian Research Council Discovery Early Career Research Award #DE160100248. Chris Rinke was supported by an Australian Research Council (ARC) Future Fellowship (FT170100213).

Files

Files (2.8 GB)

  • md5:6403b0ad56f1a5a023debf1063e43f0e (137.1 MB)
  • md5:470f723098a678b831e2ed2ffaba974d (988.6 MB)
  • md5:cf7c24f444adb1526d3409c5f0fc0c6a (1.7 GB)
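
After downloading, an archive can be checked against the MD5 checksums listed above. The following is a minimal sketch using Python's standard library. Because the listing does not pair checksums with file names, the check only tests whether a file's digest is one of the three published values; the function names and the command-line usage are assumptions for illustration.

```python
import hashlib
import sys

# Checksums as published in the file listing of this record.
PUBLISHED_MD5 = {
    "6403b0ad56f1a5a023debf1063e43f0e",
    "470f723098a678b831e2ed2ffaba974d",
    "cf7c24f444adb1526d3409c5f0fc0c6a",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_published(path):
    """True if the file's MD5 digest matches one of the published checksums."""
    return md5sum(path) in PUBLISHED_MD5


if __name__ == "__main__":
    # Usage: python check_md5.py <downloaded archive> [...]
    for archive in sys.argv[1:]:
        status = "OK" if is_published(archive) else "MISMATCH"
        print(f"{archive}\t{md5sum(archive)}\t{status}")
```

Reading in chunks keeps memory use constant, which matters for the 1.7 GB archive.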

Additional details

Related works

Is cited by
10.1101/2020.03.05.976373 (DOI)

Funding

Discovery Early Career Researcher Award - Grant ID: DE160100248
Australian Research Council
ARC Future Fellowships - Grant ID: FT170100213
Australian Research Council