Published October 15, 2024
| Version v3
Dataset
Open
The genomic origin of the unique chaetognath body plan
Authors/Creators
Description
Supplementary files and code for "The genomic origin of the unique chaetognath body plan".
ATAC-seq
Processing of called peaks (BED) is described in classif-atac.ipynb; the resulting called peaks are in peaks_all_re.txt and the filtered version in peaks_flt_re.txt.
GeneFamilies
Code used for gene family analyses is detailed in gene_families 2.ipynb, using as input:
- the gene families inferred by Broccoli: orthologous_groups_eq.txt
- the reconciled gene trees calculated by GeneRax in NHX format: Chaeto_rev0124_recon.nhx, and also as XML in xml/Chaeto_rev0124_recon_xml.tgz with the corresponding code to parse them. The file Chaeto_rev0124_recon.lab.tre has the same trees in a human-readable format with the gene names for mouse and Drosophila.
Resulting files include:
- the list of gained, lost and duplicated gene families: Orthogroups_GLD_re
- GO enrichment for chaetognath duplicates: Pgot_DupGO_enrch_r_BP_wn.tsv
- script used to compute 4DTv: stats4D.py, from reciprocal gene alignments PargotALI.out.gz. Results are in PargotALI.stats.gz
- Panther_all.txt: PANTHER annotation for all the proteomes
- emapper/*.emapper.annotations.gz: eggNOG annotation for selected proteomes
- GenEra_34758_gene_ages.tsv: result of the GenEra phylostratigraphic analyses
- loss_gnathi_bflo.txt: amphioxus homologues of genes lost in the gnathiferan lineages
- proteomes-pgot-sel.tgz: proteomes of selected genes used for gene family reconstruction
Methylation
- Script_Chaeto.R: R script to perform data analysis and plotting
- ChaetoDeepToolsCommands.sh: commands used to plot methylation in genes and TEs
- EMseq_files.tar.gz: result file from EM-seq
- Methylated_genes.tsv: list of methylated genes
- MethylationToolkitGenes.txt: analysis of the methylation toolkit
- Paraspadella_EMseq.CGmap.gz: EM-seq genome-wide map
OperonTransSplicing
- SL_Operon_redux-chim.ipynb: notebook describing the annotation of operons
- SL_status_eq.txt: splice leaders (SL) assigned to genes
- go-basic.obo: Gene Ontology file
- Pgot_lowinput_SLs_counts_eq.tsv: counts of splice leaders detected for transcripts of annotated genes
- Pgot_operons_filt_eq.txt: list of annotated operons
- Pgot_OvL0Qm.cro.sizes: list of chromosome and scaffold sizes
- Pgot_oper_GOe_eq.tsv: GO enrichment in operons
Resources
Main resource files, including:
- Pgot_OvL0Qm_cn.fa.gz: genome FASTA file
- Pgot_OvL0Qm_aPe.gtf.gz: GTF file
- Pgot_genInfo_rr.txt: list of genes with functional annotation, gene family, domains, phylostrata, etc.
- Pgot_OvL0Qm_aP.repeats.cro.bed: BED file with positions of repeats
Single-cell
- SAMap_vignette.ipynb: notebook describing how to run SAMap
- markers.tsv: list of cell-type marker genes inferred using Seurat
- ChaetoGN_Lau.Rmd: R Markdown file summarising the main analysis steps
- Ch_v5_chim.RDS: R object containing the analysed datasets
- maps/*: results of SAMap comparisons
Hi-C
- NOTEBOOK_all_hic_analyses.ipynb: details the analyses of Hi-C data using other scripts and files in this folder
- data: contains the main Hi-C data files, including the multi-resolution contact map chaeto.matrix.final.allres.hic
Files
| Name | Size |
|---|---|
| For_Zenodo.zip (md5:1ac980053dbb3721d3c9ce388edd8334) | 2.1 GB |
Additional details
Dates
- Created: 2014-10-20