Published October 15, 2024 | Version v3
Dataset Open

The genomic origin of the unique chaetognath body plan

  • 1. University College London
  • 2. Queen Mary University of London
  • 3. Naturalis Biodiversity Center
  • 4. University of California, Berkeley
  • 5. Okinawa Institute of Science and Technology Graduate University
  • 6. Mie University

Description

Supplementary files and code for "The genomic origin of the unique chaetognath body plan".

ATAC-seq

Processing of the called peaks (BED) is described in classif-atac.ipynb. The resulting called peaks are in peaks_all_re.txt, with a filtered version in peaks_flt_re.txt.

GeneFamilies

The code used for gene family analyses is detailed in gene_families 2.ipynb, which takes as input:
- the gene families inferred by Broccoli: orthologous_groups_eq.txt
- the reconciled gene trees computed by GeneRax in NHX format (Chaeto_rev0124_recon.nhx) and also as XML (xml/Chaeto_rev0124_recon_xml.tgz), with the corresponding code to parse them. The file Chaeto_rev0124_recon.lab.tre contains the same trees in a human-readable format, labelled with the gene names for mouse and Drosophila.
 
Resulting files include 
- the list of gained, lost and duplicated gene families: Orthogroups_GLD_re
- GO enrichment for chaetognath duplicates: Pgot_DupGO_enrch_r_BP_wn.tsv
- stats4D.py: script used to compute 4DTv from reciprocal gene alignments (PargotALI.out.gz); results are in PargotALI.stats.gz
- Panther_all.txt: PANTHER annotation for all the proteomes
- emapper/*.emapper.annotations.gz: eggNOG annotation for selected proteomes
- GenEra_34758_gene_ages.tsv is the result of GenEra phylostratigraphic analyses
- loss_gnathi_bflo.txt: amphioxus homologues of genes lost in the gnathiferan lineages
- proteomes-pgot-sel.tgz: proteomes of selected genes used for gene family reconstruction 
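
The 4DTv statistic mentioned above is the fraction of fourfold degenerate third-codon positions that differ by a transversion between two aligned coding sequences. The sketch below illustrates that definition only; it is not the authors' stats4D.py, and the function name, the input format (two gap-free, in-frame, equal-length sequences) and the use of the standard genetic code are assumptions made for illustration. It returns the raw, uncorrected value.

```python
# Codon prefixes whose third position is fourfold degenerate
# in the standard genetic code (assumption: standard code applies).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}
PURINES = {"A", "G"}


def four_dtv(cds_a, cds_b):
    """Return (transversions, fourfold_sites, raw_4dtv) for two aligned CDS.

    Hypothetical helper: both inputs must be in frame and of equal length.
    """
    if len(cds_a) != len(cds_b):
        raise ValueError("sequences must be aligned to the same length")

    sites = 0
    transversions = 0
    for i in range(0, len(cds_a) - 2, 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        prefix = codon_a[:2]
        # Only count codon pairs that share a fourfold degenerate prefix.
        if prefix != codon_b[:2] or prefix not in FOURFOLD_PREFIXES:
            continue
        third_a, third_b = codon_a[2], codon_b[2]
        # Skip gaps and ambiguous bases at the third position.
        if third_a not in "ACGT" or third_b not in "ACGT":
            continue
        sites += 1
        # A transversion swaps a purine for a pyrimidine or vice versa.
        if (third_a in PURINES) != (third_b in PURINES):
            transversions += 1

    rate = transversions / sites if sites else float("nan")
    return transversions, sites, rate
```

For example, four_dtv("ATGCTAGGTACC", "ATGCTCGGTACG") counts three fourfold sites (CTN, GGN, ACN), two of which differ by a transversion.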

Methylation

- Script_Chaeto.R: R script to perform data analysis and plotting 
- ChaetoDeepToolsCommands.sh: plots of methylation in genes and TEs
- EMseq_files.tar.gz: result file from EM-seq
- Methylated_genes.tsv: list of methylated genes
- MethylationToolkitGenes.txt: analysis of the methylation toolkit
- Paraspadella_EMseq.CGmap.gz: EM-seq genome-wide map
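
As a starting point for exploring the genome-wide map, the sketch below computes the coverage-weighted methylation level per sequence context. It assumes that Paraspadella_EMseq.CGmap.gz follows the usual tab-separated, eight-column CGmap layout (chromosome, nucleotide, position, context, dinucleotide context, methylation level, methylated read count, total read count); this layout should be checked against the file itself. The function name and the coverage threshold are illustrative choices, not part of the deposited code.

```python
import gzip
from collections import defaultdict


def mean_methylation_by_context(cgmap_path, min_coverage=5):
    """Return {context: methylated_reads / total_reads} for a gzipped CGmap.

    Assumes the conventional 8-column CGmap layout (an assumption, see above).
    """
    methylated = defaultdict(int)
    covered = defaultdict(int)
    with gzip.open(cgmap_path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 8:
                continue  # skip malformed or truncated lines
            context = fields[3]           # e.g. CG, CHG or CHH
            meth_reads = int(fields[6])   # reads supporting methylation
            total_reads = int(fields[7])  # all reads covering the cytosine
            if total_reads < min_coverage:
                continue  # ignore poorly covered positions
            methylated[context] += meth_reads
            covered[context] += total_reads
    return {ctx: methylated[ctx] / covered[ctx] for ctx in covered if covered[ctx]}
```

Calling mean_methylation_by_context("Paraspadella_EMseq.CGmap.gz") would then give one value per context present in the file.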

OperonTransSplicing

- SL_Operon_redux-chim.ipynb: notebook describing the annotation of operons
- SL_status_eq.txt: SL assigned to genes 
- go-basic.obo: gene ontology file
- Pgot_lowinput_SLs_counts_eq.tsv: counts of splice leaders detected for transcripts of annotated genes
- Pgot_operons_filt_eq.txt: list of annotated operons 
- Pgot_OvL0Qm.cro.sizes: list of chromosome and scaffold sizes
- Pgot_oper_GOe_eq.tsv: GO enrichment in operons
 

Ressources 

Main resource files, including:
- Pgot_OvL0Qm_cn.fa.gz: genome fasta file 
- Pgot_OvL0Qm_aPe.gtf.gz: GTF file 
- Pgot_genInfo_rr.txt: list of genes with functional annotation, gene family, domains, phylostrata, etc.
- Pgot_OvL0Qm_aP.repeats.cro.bed: BED file with the positions of repeats

Single-cell

- SAMap_vignette.ipynb: notebook describing how to run SAMap
- markers.tsv: list of cell-types marker genes inferred using Seurat
- ChaetoGN_Lau.Rmd: R Markdown file summarising the main analysis steps
- Ch_v5_chim.RDS: R object containing the analysed datasets
- maps/*: results of SAMap comparisons

Hi-C 

- NOTEBOOK_all_hic_analyses.ipynb: details the analyses of Hi-C data, using the other scripts and files in this folder
- data: contains the main Hi-C data files, including the multi-resolution contact map chaeto.matrix.final.allres.hic

Files

For_Zenodo.zip

Files (2.1 GB)

md5:1ac980053dbb3721d3c9ce388edd8334
2.1 GB
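
Given the size of the archive, it can be worth checking the download against the MD5 checksum listed above before unpacking it. The following is a minimal sketch using only the Python standard library; the function names and the local file path are assumptions, while the expected digest is the one shown on this record.

```python
import hashlib

# Digest listed on this record for For_Zenodo.zip.
EXPECTED_MD5 = "1ac980053dbb3721d3c9ce388edd8334"


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True when the file at `path` matches the expected digest."""
    return md5_of_file(path) == expected
```

Calling verify_download("For_Zenodo.zip") on the downloaded file should return True when the transfer completed without corruption.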

Additional details

Dates

Created
2014-10-20