Internalizing archaea: a paradigm shift

doi:10.5281/zenodo.4020483

Published September 9, 2020 | Version 1

Dataset Open

1. School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW 2052, Australia
2. The ithree institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
3. Royal Netherlands Institute for Sea Research, Department of Marine Microbiology and Biogeochemistry, Utrecht University, P.O. Box 59, NL-1790 AB Den Burg, The Netherlands
4. Electron Microscope Unit, Mark Wainwright Analytical Centre, The University of New South Wales, Sydney, NSW 2052, Australia
5. Biological Resources Imaging Laboratory, Mark Wainwright Analytical Centre, University of New South Wales, Sydney, NSW 2052, Australia
6. Biomedical Imaging Facility, Mark Wainwright Analytical Centre, The University of New South Wales, Sydney, NSW 2052, Australia
7. Ramaciotti Centre for Cryo-Electron Microscopy, Monash University, Clayton 3168, VIC, Australia
8. Drug Discovery Biology, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Victoria, Australia

Repository contents

1_Phylogenies.tar.gz includes all files needed to generate the phylogeny shown in Figure 4 of the associated manuscript. Specifically, this includes:

The workflow used to generate the species tree
Any required dependencies such as custom scripts or custom databases
The protein files, namely:
1. The proteins from all archaeal reference genomes that were used to generate the protein tree
2. The 51 marker proteins used to generate the species tree
The mafft_linsi alignments of the 51 marker proteins
The BMGE trimmed alignments of the 51 marker proteins
The concatenated alignment used to generate the species tree
The output of the IQ-TREE analysis
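
The concatenation step listed above can be illustrated with a minimal sketch. It is not the workflow shipped in the archive; it assumes that each trimmed marker alignment is a FASTA file whose headers are genome identifiers shared across markers, and that genomes missing a marker are padded with gaps. The function names `parse_fasta` and `concatenate_alignments` are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a {taxon: sequence} mapping (insertion ordered)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the first word of the header identifies the genome.
            name = line[1:].split()[0]
            records[name] = ""
        elif name is not None:
            records[name] += line
    return records


def concatenate_alignments(alignments):
    """Join per-marker alignments column-wise into one supermatrix.

    alignments: list of {taxon: aligned sequence} mappings, one per marker.
    Taxa absent from a marker receive a run of gaps of that marker's width.
    """
    taxa = []
    for aln in alignments:
        for taxon in aln:
            if taxon not in taxa:
                taxa.append(taxon)

    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("all sequences in one alignment must have equal length")
        width = widths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy example with two made-up markers; taxonB lacks the second marker.
marker_a = parse_fasta(">taxonA\nMKV-L\n>taxonB\nMRVAL\n")
marker_b = parse_fasta(">taxonA\nGG-T\n")
combined = concatenate_alignments([marker_a, marker_b])
```

In the toy example every row of `combined` has the same length (5 + 4 columns), which is the property a tree inference program expects from a concatenated alignment.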

2_Protein_search.tar.gz includes all files needed to generate (a) the HMM profiles specific to the two proteins with a coiled-coil protein (CCP) domain (referred to as Locus1 and Locus2 throughout the description) and (b) the Phyre2 results for all potential Locus1 and Locus2 proteins found in the archaea reference set. Specifically, this includes:

(a)

The script used to build the HMM profiles
Any required dependencies
The sequences of Locus1 and Locus2 proteins, including the individual proteins, the aligned proteins and the trimmed alignments.
All HHsearch results
The HMM profiles

(b)

The results for the batch search run for all potential Locus1 and Locus2 proteins found across DPANN archaea
The results for the sensitive search run for the Locus1 and Locus2 proteins from Cand. N. antarcticus

3_Orthogroup_Data.tar.gz includes all files relating to groups of orthologous proteins generated by OrthoFinder. Specifically, this includes:

List of assemblies included in the analysis
List of the number of proteins belonging to each orthogroup for each genome in the analysis.
List of each orthogroup and the identifiers for each protein in each genome that were assigned to that orthogroup.
List of the protein IDs for every protein in each orthogroup. Does not contain genome information.
List of how many orthogroups are shared between each genome
List of genes from each genome not assigned to an orthogroup
Distribution of orthogroups across major phyla.
General statistics for the orthogroups
General statistics for each genome
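
As an illustration of how the per-genome orthogroup counts relate to the shared-orthogroup list, the sketch below counts orthogroups present in two genomes. The table layout is an assumption (tab-separated, first column the orthogroup identifier, one column per genome, optional `Total` column); the actual file names and layout in the archive may differ. The function names and the toy data are hypothetical.

```python
import csv
import io


def read_gene_counts(handle):
    """Read a tab-separated orthogroup count table.

    Returns (genomes, counts), where counts maps orthogroup -> {genome: n}.
    Assumption: first column is the orthogroup ID; a 'Total' column is ignored.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    id_column = reader.fieldnames[0]
    genomes = [name for name in reader.fieldnames[1:] if name != "Total"]
    counts = {}
    for row in reader:
        counts[row[id_column]] = {genome: int(row[genome]) for genome in genomes}
    return genomes, counts


def shared_orthogroups(counts, genome_a, genome_b):
    """Number of orthogroups with at least one protein in both genomes."""
    return sum(
        1
        for per_genome in counts.values()
        if per_genome[genome_a] > 0 and per_genome[genome_b] > 0
    )


# Toy table with made-up genome names and counts.
toy_table = (
    "Orthogroup\tGenomeA\tGenomeB\tTotal\n"
    "OG0000000\t2\t1\t3\n"
    "OG0000001\t0\t4\t4\n"
    "OG0000002\t1\t1\t2\n"
)
genomes, counts = read_gene_counts(io.StringIO(toy_table))
n_shared = shared_orthogroups(counts, "GenomeA", "GenomeB")
```

For a real file, `read_gene_counts` would be called on an open file handle instead of the in-memory toy table.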

Files

Files (1.9 GB)

Name	MD5	Size
1_Phylogenies.tar.gz	d7f33a5c17e2af81709338e9934cd031	300.5 MB
2_Protein_search.tar.gz	f1f515c00762ccaf34bfe63265f745cb	1.6 GB
3_Orthogroup_Data.tar.gz	aa4206812f0cd25928f32ee1f47ce174	14.2 MB
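
The MD5 checksums listed for the three archives can be used to check that a download is intact. The following is a minimal sketch using only the Python standard library; it assumes the archives were saved under their original file names, and the helper names `md5_of_file` and `verify_download` are illustrative.

```python
import hashlib
import os

# Checksums as listed for this record.
EXPECTED_MD5 = {
    "1_Phylogenies.tar.gz": "d7f33a5c17e2af81709338e9934cd031",
    "2_Protein_search.tar.gz": "f1f515c00762ccaf34bfe63265f745cb",
    "3_Orthogroup_Data.tar.gz": "aa4206812f0cd25928f32ee1f47ce174",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Compare a downloaded archive against its listed checksum.

    Assumption: the file keeps its original name, which is used for the lookup.
    Raises KeyError for file names that are not part of this record.
    """
    expected = EXPECTED_MD5[os.path.basename(path)]
    return md5_of_file(path) == expected
```

For example, `verify_download("2_Protein_search.tar.gz")` returns `True` only if the local file matches the listed digest; a `False` result suggests an incomplete or corrupted download.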

	All versions	This version
Views	279	127
Downloads	28	6
Data volume	12.2 GB	3.8 GB
