Published April 28, 2021 | Version v1
Dataset Open

Stramenopile dataset for positive selection

  • 1. University of Tübingen

Description

This repository contains the data (sequences, annotations, and intermediary files) collected and produced during the preparation of the following preprint: https://doi.org/10.1101/2021.01.12.426341. Description of files:

  • The samples.zip file contains, for each taxon:
    • The functional annotations from InterProScan, with extension ".tsv", in TSV format.
    • The genome annotations, with extension ".gff", in GFF3 format.
    • The protein sequences, with extension ".faa", in FASTA format.
    • The corresponding coding DNA sequences, with extension ".fna", in FASTA format.
  • The all_ann.csv file contains all annotations of the tested genes, with added information on positive selection and orthology status, in CSV format.
  • The go_mapping.csv file contains the mapping of GO terms to the protein accessions of the dataset, in CSV format.
  • The protein_families.poff.tsv file contains the Proteinortho output classifying the genes in the dataset into orthologous groups, in TSV format.
  • The families.zip file contains the intermediary files for each of the selected orthogroups:
    • Tree files in Newick format in the folder "trees".
    • Protein sequences in the folder "faas".
    • Coding DNA sequences in the folder "fnas".
    • Log outputs from the FUBAR analysis in the folder "logs".
    • Codon alignments in the folder "codon_alns".
  • The families_fubar.zip file contains the same set of files as families.zip for the subset of orthogroups with a positive result in the FUBAR analysis, plus the log outputs from the aBSREL analysis in the "logs" folder.
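A minimal sketch for loading the orthogroup table in protein_families.poff.tsv, assuming it follows the standard Proteinortho TSV layout: a "#"-prefixed header row whose first three columns are "# Species", "Genes" and "Alg.-Conn.", followed by one column per input proteome, where each cell holds comma-separated gene IDs or "*" when that taxon has no member in the group. The function name parse_proteinortho is hypothetical and not part of Proteinortho itself.

```python
import csv
from typing import Dict, Iterable, List


def parse_proteinortho(lines: Iterable[str]) -> List[Dict[str, List[str]]]:
    """Parse a Proteinortho-style TSV (assumed layout) into one dict per orthogroup.

    Each dict maps a taxon column name (the input file name in the header)
    to the list of gene IDs from that taxon; taxa marked '*' are omitted.
    """
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    # Columns after '# Species', 'Genes', 'Alg.-Conn.' are the per-taxon columns.
    taxa = header[3:]
    groups = []
    for row in reader:
        # Skip blank lines and any further comment lines.
        if not row or row[0].startswith("#"):
            continue
        members = {}
        for taxon, cell in zip(taxa, row[3:]):
            if cell != "*":
                members[taxon] = cell.split(",")
        groups.append(members)
    return groups
```

On the real file this would be called as `with open("protein_families.poff.tsv", newline="") as fh: groups = parse_proteinortho(fh)`; the resulting dicts can then be matched against the per-orthogroup folders in families.zip or the accessions in all_ann.csv.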

Files


Files (21.1 GB)

  • 18.5 GB, md5:68d82b41788622e8c978e1f2737a0966
  • 529.8 MB, md5:2d7afbe1d40fcecd7aa0c2ac5e0e2f21
  • 57.2 MB, md5:3f37f12a8b947561a378040aaeac0b40
  • 37.8 MB, md5:fe2f9c20a2edd02bd3453148b7ff4940
  • 2.0 GB, md5:cbdebbf18ded0cbd57d540605c14e9b2
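Because the archives are large (up to 18.5 GB), it is worth checking a download against its listed MD5 checksum before unpacking it. The following is a minimal sketch using Python's standard hashlib module; the helper names md5sum and verify are illustrative, not part of any tool shipped with this dataset. The file is read in chunks so it never has to fit in memory.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare a file's MD5 with a checksum given as 'md5:<hex>' or bare hex."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5sum(path) == expected_hex
```

For example, `verify("families.zip", "md5:...")` returns True only if the local copy matches the listed checksum byte for byte.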

Additional details

Related works

Is cited by
Preprint: 10.1101/2021.01.12.426341v3 (DOI)