Published October 15, 2024
Version v3
Dataset
Open
The genomic origin of the unique chaetognath body plan
Description
Supplementary files and code for "The genomic origin of the unique chaetognath body plan".
ATAC-seq
Processing of the called peaks (BED) is described in classif-atac.ipynb; the resulting called peaks are in peaks_all_re.txt and the filtered version is in peaks_flt_re.txt.
GeneFamilies
Code used for the gene family analyses is detailed in gene_families 2.ipynb, using as input:
- the gene families inferred by Broccoli: orthologous_groups_eq.txt
- the reconciled gene trees calculated by GeneRax, in NHX format: Chaeto_rev0124_recon.nhx, and also as XML in xml/Chaeto_rev0124_recon_xml.tgz, with the corresponding code to parse them. The file Chaeto_rev0124_recon.lab.tre contains the same trees in a human-readable format, with the gene names for mouse and Drosophila.
Resulting files include:
- the list of gained, lost and duplicated gene families: Orthogroups_GLD_re
- GO enrichment for chaetognath duplicates: Pgot_DupGO_enrch_r_BP_wn.tsv
- the script used to compute 4DTv, stats4D.py, run on the reciprocal gene alignments in PargotALI.out.gz; results are in PargotALI.stats.gz
- Panther_all.txt: PANTHER annotation for all the proteomes
- emapper/*.emapper.annotations.gz: eggNOG annotation for selected proteomes
- GenEra_34758_gene_ages.tsv: results of the GenEra phylostratigraphic analyses
- loss_gnathi_bflo.txt: amphioxus homologues of genes lost in the gnathiferan lineages
- proteomes-pgot-sel.tgz: proteomes of selected genes used for gene family reconstruction
Methylation
- Script_Chaeto.R: R script used for data analysis and plotting
- ChaetoDeepToolsCommands.sh: deepTools commands used to plot methylation in genes and TEs
- EMseq_files.tar.gz: result files from EM-seq
- Methylated_genes.tsv: list of methylated genes
- MethylationToolkitGenes.txt: analysis of the methylation toolkit
- Paraspadella_EMseq.CGmap.gz: EM-seq genome-wide map
OperonTransSplicing
- SL_Operon_redux-chim.ipynb: notebook describing the annotation of operons
- SL_status_eq.txt: spliced leaders (SLs) assigned to genes
- go-basic.obo: Gene Ontology file
- Pgot_lowinput_SLs_counts_eq.tsv: counts of spliced leaders detected for transcripts of annotated genes
- Pgot_operons_filt_eq.txt: list of annotated operons
- Pgot_OvL0Qm.cro.sizes: list of chromosome and scaffold sizes
- Pgot_oper_GOe_eq.tsv: GO enrichment in operons
Ressources
Main resource files, including:
- Pgot_OvL0Qm_cn.fa.gz: genome FASTA file
- Pgot_OvL0Qm_aPe.gtf.gz: GTF gene annotation file
- Pgot_genInfo_rr.txt: list of genes with functional annotation, gene family, domains, phylostrata, etc.
- Pgot_OvL0Qm_aP.repeats.cro.bed: BED file with the positions of repeats
Single-cell
- SAMap_vignette.ipynb: notebook describing how to run SAMap
- markers.tsv: list of cell-type marker genes inferred using Seurat
- ChaetoGN_Lau.Rmd: R Markdown summarising the main analysis steps
- Ch_v5_chim.RDS: R object containing the analysed datasets
- maps/*: results of the SAMap comparisons
Hi-C
- NOTEBOOK_all_hic_analyses.ipynb: details the analyses of the Hi-C data using the other scripts and files in this folder
- data: contains the main Hi-C data files, including the multi-resolution contact map chaeto.matrix.final.allres.hic
Files
For_Zenodo.zip
Size: 2.1 GB
md5:1ac980053dbb3721d3c9ce388edd8334
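Before unpacking, the downloaded archive can be checked against the md5 checksum listed for it. The following is a minimal sketch using only Python's standard library (hashlib), assuming For_Zenodo.zip has been saved in the current working directory; the helper names md5sum and verify are illustrative and are not part of the dataset.

```python
import hashlib
from pathlib import Path

# md5 checksum listed on this record for For_Zenodo.zip
EXPECTED_MD5 = "1ac980053dbb3721d3c9ce388edd8334"


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks
    (1 MiB by default) so the whole archive never sits in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b""
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_MD5):
    """True if the file's MD5 matches `expected` (case-insensitive)."""
    return md5sum(path) == expected.lower()


if __name__ == "__main__":
    # Assumption: the archive was downloaded into the working directory
    archive = Path("For_Zenodo.zip")
    if archive.exists():
        print("checksum OK" if verify(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use constant for the 2.1 GB file; a mismatch most often indicates an interrupted or incomplete download.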
Additional details
Dates
- Created: 2014-10-20