Published May 2, 2025 | Version v1
Dataset Open

Supplementary data for "A cellular entity retaining only its replicative core: Hidden archaeal lineage with an ultra-reduced genome"

  • 1. Dalhousie University
  • 2. The University of Tokyo
  • 3. Yamagata University
  • 4. Japan Agency for Marine-Earth Science and Technology
  • 5. University of Tsukuba

Description

 

  •  M16_Assembly/
    • unicycler_hybrid.fasta: The single-cell amplified genome assembly of the Citharistes regius M16 cell.
    • unicycler_hybrid_longest_orfs.pep: Amino acid sequences of predicted open reading frames used as queries in additional archaeal contig searches.
  • AlphaFold_models_for_annotation/
    This folder contains the predicted tertiary structures of all proteins encoded in the Sukunaarchaeum genome. The AlphaFold version used for each prediction (v2.2.0, v2.3.2, or v3) is given in the respective file name.
  • AlphaFold3_models_for _large_proteins/
    This folder contains AlphaFold3 protein structures of the eight large proteins identified in the Sukunaarchaeum genome.
  • Phylogennomics/
    This folder contains data related to the phylogenomic analyses of the 70-gene alignment.
    • single_gene_fasta/
      • This folder contains amino acid sequences of each of the 70 genes from 150 genomes. For each gene, it includes a raw sequence file, a sequence file aligned by MAFFT, and an alignment file trimmed by BMGE.
    • Analyses_original/
      • 70gene150otu.fasta: Concatenated amino acid alignment comprising 70 genes and 150 archaeal species.
      • 70gene150otu_coverage.tsv: Coverage of each gene and genome used for the concatenated alignment.
      • 70gene150otu_ML_LG+C60+F+I+R10_UFBP_PMSFBP.treefile: ML tree estimated under LG+C60+F+I+R10 model with UFBP and PMSFBP analyses.
      • 70gene150otu_ML_LG+C60+F+I+R10_UFBP_PMSFBP_fullname.treefile: ML tree with full leaf names.
      • 70gene150otu_BI_GTR+CAT+G4_BPP.treefile: Bayesian tree estimated under GTR+CAT+G4 model with BPP supports.
      • 70gene150otu_BI_GTR+CAT+G4_BPP_fullname.treefile: Bayesian tree with full leaf names.
    • AU_test: This folder contains a log file, an iqtree file, and a tree file created by IQ-TREE as a result of the AU test.
    • Analyses_SR4/
      • 70gene150otu_SR4.fasta: SR4-recoded alignment file.
      • SR4_models.nex: NEXUS file containing C60SR4 model.
      • 70gene150otu_MLSR4_GTR+SR4C60+I+R8_UFBP.treefile: ML tree estimated under GTR+SR4C60+I+R8 model with UFBP approximation.
      • 70gene150otu_MLSR4_GTR+SR4C60+I+R8_UFBP_fullname.treefile: ML tree with full leaf names.
      • 70gene150otu_BISR4_GTR+CAT+G4_BPP.treefile: Bayesian tree estimated under GTR+CAT+G4 model with BPP supports.
      • 70gene150otu_BISR4_GTR+CAT+G4_BPP_fullname.treefile: Bayesian tree with full leaf names.
    • FastSiteRemoval/
      • fasta: This folder contains amino acid sequence alignments used for fast-evolving site removal analyses.
      • fasta_SR4: This folder contains SR4-recoded alignments used for fast-evolving site removal analyses.
      • boottrees: This folder contains ultrafast bootstrap trees generated in fast-evolving site removal analyses.
      • boottrees_SR4: This folder contains ultrafast bootstrap trees generated in SR4-recoded fast-evolving site removal analyses.
  • rRNA_phylogeny/
    • Tara_st76_metagenome/
      • megahit.final.contigs.fa.gz: Gzipped FASTA file for a genome assembly generated using metagenomic reads from Tara Oceans sampling station 76 (TOSS76).
      • blastn_out.txt: BLASTN search results against megahit.final.contigs.fa using the 16S and 23S rRNA of Sukunaarchaeum as queries.
      • run_megahit.sh: Bash script used for the MEGAHIT assembly.
    • Tree_with_st76/
      • 16S/23S_with_st76.fasta: These files contain the following rRNA sequences:
        • Sukunaarchaeum rRNA: Names start with Sukuna
        • rRNA sequences detected in the TOSS76 metagenome assembly: Names start with st76
        • rRNA sequences detected in the MATOU metatranscriptome dataset: Names start with MATOU_v1
        • rRNA sequences from GTDB genomes: Names start with NCBI accession numbers
      • 16S/23S_with_st76_linsi.fasta: Sequences aligned by MAFFT.
      • 16S/23S_with_st76_linsi_bmge.fasta: Sequences trimmed by BMGE.
      • 16S/23S_with_st76_GTR+F+G_UFBP.treefile: ML tree estimated by IQ-TREE.
      • 16S/23S_with_st76_GTR+F+G_UFBP_fullname.treefile: ML tree with full leaf names.
    • Tree_without_st76/
      This folder contains almost the same files as above but without the TOSS76 sequences.
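
The naming convention described for the 16S/23S FASTA files (names starting with Sukuna, st76, or MATOU_v1, and otherwise with an NCBI accession number) can be used to group sequences by origin. The following is a minimal sketch, not part of the dataset: the function names are hypothetical, and it assumes that any name without one of the three listed prefixes belongs to a GTDB genome.

```python
from collections import Counter


def classify_source(name: str) -> str:
    """Map a sequence name to its origin using the prefixes listed above."""
    if name.startswith("Sukuna"):
        return "Sukunaarchaeum"
    if name.startswith("st76"):
        return "TOSS76 metagenome"
    if name.startswith("MATOU_v1"):
        return "MATOU metatranscriptome"
    # Assumption: every remaining name starts with an NCBI accession number.
    return "GTDB genome"


def count_sources(fasta_text: str) -> Counter:
    """Count sequences per origin from the header lines of FASTA text."""
    counts = Counter()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:  # skip empty headers
                counts[classify_source(fields[0])] += 1
    return counts
```

To apply it to one of the alignment files, read the file contents into a string and pass that string to `count_sources`.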

Files (1.0 GB)

  • md5:b3f31048054e8db355df4b2ed189bf9c, 3.4 MB
  • md5:58b92ac02a416a2424a6fd27abc7198c, 21.1 MB
  • md5:a6c03144a8294bbf5e71a5d32a5b19f3, 1.0 MB
  • md5:2c71d8f01fb246b05384cb469ead451f, 18.5 MB
  • md5:d7e1ad72d858b80be8a19a80cc2f2498, 998.3 MB
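
The MD5 checksums listed for each file can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard hashlib module. The function names are hypothetical, and the file is read in chunks so that the largest archive does not need to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, listed):
    """Compare a file against a checksum written as 'md5:<hex>' or '<hex>'."""
    expected = listed.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

For example, `matches_checksum(local_path, "md5:d7e1ad72d858b80be8a19a80cc2f2498")` should return True for an intact copy of the largest file, where `local_path` is wherever that file was saved.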

Additional details

Related works

Is supplement to
Preprint: 10.1101/2025.05.02.651781 (DOI)