Published May 2, 2025
| Version v1
Dataset
Open
Supplementary data for "A cellular entity retaining only its replicative core: Hidden archaeal lineage with an ultra-reduced genome"
Creators
Description
- M16_Assembly/
- unicycler_hybrid.fasta: Single-cell amplified genome assembly of the Citharistes regius M16 cell.
- unicycler_hybrid_longest_orfs.pep: Amino acid sequences of predicted open reading frames used as queries in additional archaeal contig searches.
- AlphaFold_models_for_annotation/
This folder contains the predicted tertiary structures of all proteins encoded in the Sukunaarchaeum genome. The AlphaFold version used for each prediction is given in the respective file name (v2.2.0, v2.3.2, or v3).
- AlphaFold3_models_for_large_proteins/
This folder contains AlphaFold3 protein structures of the eight large proteins identified in the Sukunaarchaeum genome.
- Phylogennomics/
This folder contains data related to the phylogenomic analyses of the 70-gene alignment.
- single_gene_fasta/
- This folder contains amino acid sequences of each of the 70 genes from 150 genomes. For each gene, it includes a raw sequence file, a sequence file aligned by MAFFT, and an alignment file trimmed by BMGE.
- Analyses_original/
- 70gene150otu.fasta: Concatenated amino acid alignment comprising 70 genes and 150 archaeal species.
- 70gene150otu_coverage.tsv: Coverage of each gene and genome used for the concatenated alignment.
- 70gene150otu_ML_LG+C60+F+I+R10_UFBP_PMSFBP.treefile: ML tree estimated under LG+C60+F+I+R10 model with UFBP and PMSFBP analyses.
- 70gene150otu_ML_LG+C60+F+I+R10_UFBP_PMSFBP_fullname.treefile: ML tree with full leaf names.
- 70gene150otu_BI_GTR+CAT+G4_BPP.treefile: Bayesian tree estimated under GTR+CAT+G4 model with BPP supports.
- 70gene150otu_BI_GTR+CAT+G4_BPP_fullname.treefile: Bayesian tree with full leaf names.
- AU_test: This folder contains a log file, an iqtree file, and a tree file created by IQ-TREE as a result of the AU test.
- Analyses_SR4/
- 70gene150otu_SR4.fasta: SR4-recoded alignment file.
- SR4_models.nex: NEXUS file containing C60SR4 model.
- 70gene150otu_MLSR4_GTR+SR4C60+I+R8_UFBP.treefile: ML tree estimated under GTR+SR4C60+I+R8 model with UFBP approximation.
- 70gene150otu_MLSR4_GTR+SR4C60+I+R8_UFBP_fullname.treefile: ML tree with full leaf names.
- 70gene150otu_BISR4_GTR+CAT+G4_BPP.treefile: Bayesian tree estimated under GTR+CAT+G4 model with BPP supports.
- 70gene150otu_BISR4_GTR+CAT+G4_BPP_fullname.treefile: Bayesian tree with full leaf names.
- FastSiteRemoval/
- fasta: This folder contains amino acid sequence alignments used for fast-evolving site removal analyses.
- fasta_SR4: This folder contains SR4-recoded alignments used for fast-evolving site removal analyses.
- boottrees: This folder contains ultrafast bootstrap trees generated in fast-evolving site removal analyses.
- boottrees_SR4: This folder contains ultrafast bootstrap trees generated in SR4-recoded fast-evolving site removal analyses.
- rRNA_phylogeny/
- Tara_st76_metagenome/
- megahit.final.contigs.fa.gz: Gzipped FASTA file for a genome assembly generated using metagenomic reads from Tara Oceans sampling station 76 (TOSS76).
- blastn_out.txt: BLASTN search results against megahit.final.contigs.fa using the 16S and 23S rRNA of Sukunaarchaeum as queries.
- run_megahit.sh: Bash script used for the MEGAHIT assembly.
- Tree_with_st76/
- 16S/23S_with_st76.fasta: These files contain the following rRNA sequences.
- Sukunaarchaeum rRNA: Names start with Sukuna
- rRNA sequences detected in the TOSS76 metagenome assembly: Names start with st76
- rRNA sequences detected in the MATOU metatranscriptome dataset: Names start with MATOU_v1
- rRNA sequences from GTDB genomes: Names start with NCBI accession numbers
- 16S/23S_with_st76_linsi.fasta: Sequences aligned by MAFFT.
- 16S/23S_with_st76_linsi_bmge.fasta: Sequences trimmed by BMGE.
- 16S/23S_with_st76_GTR+F+G_UFBP.treefile: ML tree estimated by IQ-TREE.
- 16S/23S_with_st76_GTR+F+G_UFBP_fullname.treefile: ML tree with full leaf names.
- Tree_without_st76/
This folder contains almost the same files as above but without the TOSS76 sequences.
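The leaf-name prefixes listed for the 16S/23S_with_st76.fasta files can be used to group sequences by source. The following is a minimal Python sketch, assuming the sequence name is the first whitespace-delimited token of each FASTA header and that any name without one of the three listed prefixes comes from a GTDB genome. The function names and the example headers are hypothetical and are not taken from the dataset.

```python
from collections import Counter

# Prefix-to-source mapping as stated in the file description.
PREFIX_SOURCES = (
    ("Sukuna", "Sukunaarchaeum"),
    ("st76", "TOSS76 metagenome assembly"),
    ("MATOU_v1", "MATOU metatranscriptome"),
)
# Assumption: every other name starts with an NCBI accession number (GTDB genomes).
FALLBACK_SOURCE = "GTDB genome"


def classify_leaf(name: str) -> str:
    """Return the source category for a sequence or leaf name."""
    for prefix, source in PREFIX_SOURCES:
        if name.startswith(prefix):
            return source
    return FALLBACK_SOURCE


def count_sources(fasta_text: str) -> Counter:
    """Count sequences per source category from FASTA-formatted text."""
    counts = Counter()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            counts[classify_leaf(name)] += 1
    return counts


# Hypothetical headers, for illustration only (not real dataset entries).
EXAMPLE = """>Sukuna_16S
ACGU
>st76_contig_1
ACGU
>MATOU_v1_0001
ACGU
>GCA_000000000.1_16S
ACGU
"""

if __name__ == "__main__":
    for source, n in count_sources(EXAMPLE).items():
        print(source, n)
```

To apply this to a real file, read it with `open(path).read()` and pass the text to `count_sources`. The same `classify_leaf` function could be used on leaf names from the corresponding treefiles.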
Files
(1.0 GB)
Additional details
Related works
- Is supplement to
- Preprint: 10.1101/2025.05.02.651781 (DOI)