There is a newer version of the record available.

Published September 9, 2020 | Version 1
Dataset Open

Internalizing archaea: a paradigm shift

  • 1. School of Biotechnology and Biomolecular Sciences, The University of New South Wales, Sydney, NSW 2052, Australia
  • 2. The ithree institute, University of Technology Sydney, Ultimo, NSW 2007, Australia
  • 3. Royal Netherlands Institute for Sea Research, Department of Marine Microbiology and Biogeochemistry, Utrecht University, P.O. Box 59, NL-1790 AB Den Burg, The Netherlands
  • 4. Electron Microscope Unit, Mark Wainwright Analytical Centre, The University of New South Wales, Sydney, NSW 2052, Australia
  • 5. Biological Resources Imaging Laboratory, Mark Wainwright Analytical Centre, University of New South Wales, Sydney, NSW 2052, Australia
  • 6. Biomedical Imaging Facility, Mark Wainwright Analytical Centre, The University of New South Wales, Sydney, NSW 2052, Australia
  • 7. Ramaciotti Centre for Cryo-Electron Microscopy, Monash University, Clayton 3168, VIC, Australia
  • 8. Drug Discovery Biology, Monash Institute of Pharmaceutical Sciences, Monash University, Parkville 3052, Victoria, Australia

Description

Repository contents

1_Phylogenies.tar.gz includes all files needed to generate the phylogeny shown in Figure 4 of the associated manuscript. Specifically, this includes: 

  1. The workflow used to generate the species tree
  2. Any required dependencies such as custom scripts or custom databases
  3. The protein files, comprising:
    1. The proteins from all archaeal reference genomes that were used to generate the protein tree
    2. The 51 marker proteins used to generate the species tree
  4. The mafft_linsi alignments of the 51 marker proteins
  5. The BMGE trimmed alignments of the 51 marker proteins
  6. The concatenated alignment used to generate the species tree
  7. The output of the IQ-TREE analysis
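The concatenation step in item 6 can be illustrated with a minimal Python sketch. It is not the workflow shipped in the archive: the function names `read_fasta` and `concatenate` are hypothetical, and padding taxa that lack a marker with gap characters is an assumption about how missing markers might be handled.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {taxon_id: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {taxon: "".join(parts) for taxon, parts in records.items()}


def concatenate(alignments):
    """Join per-marker alignments into one supermatrix.

    `alignments` is a list of {taxon_id: aligned_sequence} dicts, one per
    marker. Taxa absent from a marker are padded with gaps (an assumption).
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    supermatrix = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment differ in length")
        width = lengths.pop()
        for taxon in taxa:
            supermatrix[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in supermatrix.items()}
```

Each trimmed marker alignment would be read with `read_fasta`, and the resulting list passed to `concatenate`. Every taxon then has a sequence of the same total length, which is what tree-inference software expects from a concatenated alignment.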

2_Protein_search.tar.gz includes all files needed to generate (a) the HMM profiles specific to the two proteins with a coiled-coil protein (CCP) domain (referred to as Locus1 and Locus2 throughout the description) and (b) the Phyre2 results for all potential Locus1 and Locus2 proteins found in the archaea reference set. Specifically, this includes:

(a)

  1. The script used to build the HMM profiles
  2. Any required dependencies
  3. The sequences of Locus1 and Locus2 proteins, including the individual proteins, the aligned proteins and the trimmed alignments.
  4. All HHsearch results
  5. The HMM profiles

(b)

  1. The results for the batch search run for all potential Locus1 and Locus2 proteins found across DPANN archaea
  2. The results for the sensitive search run for the Locus1 and Locus2 proteins from Cand. N. antarcticus

 

3_Orthogroup_Data.tar.gz includes all files relating to groups of orthologous proteins generated by OrthoFinder. Specifically, this includes:

  1. List of assemblies included in the analysis
  2. List of the number of proteins belonging to each orthogroup for each genome in the analysis.
  3. List of each orthogroup and the identifiers for each protein in each genome that were assigned to that orthogroup.
  4. List of the protein IDs for every protein in each orthogroup. Does not contain genome information.
  5. List of how many orthogroups are shared between each genome
  6. List of genes from each genome not assigned to an orthogroup
  7. Distribution of orthogroups across major phyla.
  8. General statistics for the orthogroups
  9. General statistics for each genome
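To show how the per-genome counts in item 2 relate to the pairwise overlaps in item 5, here is a minimal Python sketch. It assumes a tab-separated count table with an `Orthogroup` column, one column per genome, and an optional `Total` column. The file names and exact layout inside the archive are not stated here, so treat that layout as an assumption. The function name `shared_orthogroups` is hypothetical.

```python
import csv
import io
from itertools import combinations


def shared_orthogroups(tsv_text):
    """Count orthogroups present in both genomes, for every genome pair.

    `tsv_text` is the content of a tab-separated table whose first column is
    `Orthogroup` and whose remaining columns hold per-genome protein counts
    (a trailing `Total` column is ignored if present).
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    genomes = [c for c in reader.fieldnames if c not in ("Orthogroup", "Total")]
    present = {genome: set() for genome in genomes}
    for row in reader:
        for genome in genomes:
            # An orthogroup is "present" if the genome has at least one member.
            if int(row[genome]) > 0:
                present[genome].add(row["Orthogroup"])
    return {
        (a, b): len(present[a] & present[b])
        for a, b in combinations(genomes, 2)
    }
```

The result is a dictionary keyed by genome pairs, which can be compared against the overlap list provided in the archive as a consistency check.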

Files (1.9 GB)

  • md5:d7f33a5c17e2af81709338e9934cd031 (300.5 MB)
  • md5:f1f515c00762ccaf34bfe63265f745cb (1.6 GB)
  • md5:aa4206812f0cd25928f32ee1f47ce174 (14.2 MB)
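The md5 checksums listed above can be used to confirm that a download is intact. The following is a minimal Python sketch using the standard-library `hashlib` module. The helper names `md5_of_file` and `verify` are hypothetical, and because the listing does not state which checksum belongs to which archive, the path and expected value must be supplied by the user.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare a file's MD5 digest against an expected value.

    `expected` may be given with or without the leading "md5:" prefix
    used in the file listing.
    """
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

Reading in chunks keeps memory use flat, which matters for the larger archive (1.6 GB). A call such as `verify("downloaded_archive.tar.gz", "md5:...")` returns `True` only when the digests match. The path shown is a placeholder.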