The pan-genome of Saccharomyces cerevisiae
Description
These datasets accompany the study 'The pan-genome of Saccharomyces cerevisiae' (Li G., Ji B., and Nielsen J.).
This deposition contains the following datasets:
(1) Genomes.tar.gz: a compressed archive containing the 1392 Saccharomyces cerevisiae genome assemblies analyzed in the study.
(2) genome_information.tsv: a tab-separated text file containing basic information about the genomes in (1).
(3) ClusterFasta.tar.gz: a compressed archive of FASTA files, one per gene cluster. Each FASTA file contains the protein sequences in that cluster and is named after the cluster's representative sequence.
(4) sc_gene_cluster_info_0.7_v4.tsv: a tab-separated text file that contains the properties of gene clusters.
(5) gene_presence_absence_v4.tsv: a tab-separated text file containing gene presence/absence information. Each column is a gene cluster and each row is a genome; Y/N indicates presence/absence.
(6) gene_num_in_clusters_of_each_strain_v4.tsv: a tab-separated text file containing the number of genes from each genome in each cluster (copy number). Each column is a gene cluster and each row is a genome.
(7) feature_importances_cv5_pa_cnv.tsv: a tab-separated text file containing the feature importances of a random forest classifier evaluated with 5-fold cross-validation. The classifier was trained either on the gene presence/absence table (PA) or on the copy number table (CNV). The 'pa_x' columns give the feature importances from each cross-validation fold on the PA dataset, and the 'cnv_x' columns give those from each fold on the CNV dataset.
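As a rough guide to the layouts described in items (5) to (7), the Python sketch below reads such tables with only the standard library. It assumes that the first column of each file holds the row identifier (the genome name, or the gene-cluster name in the feature-importance file) and that the header row names the remaining columns; these layouts, the toy data, and the function names `read_matrix`, `presence_frequency`, `cnv_to_pa` and `mean_importance` are illustrative assumptions, not part of the deposition. The sketch computes the fraction of genomes carrying each cluster, checks that copy numbers greater than zero agree with the Y calls, and averages the per-fold importances.

```python
import csv
import io
from statistics import mean

# Assumption: row identifiers sit in the first column and the header row
# names the remaining columns. Check this against the real files.


def read_matrix(handle):
    """Read a genome-by-cluster TSV into {genome: {cluster: cell}}."""
    reader = csv.reader(handle, delimiter="\t")
    clusters = next(reader)[1:]
    table = {}
    for row in reader:
        if row:  # skip blank lines
            table[row[0]] = dict(zip(clusters, row[1:]))
    return table


def presence_frequency(pa_table):
    """Fraction of genomes in which each cluster is called 'Y'."""
    counts = {}
    for calls in pa_table.values():
        for cluster, call in calls.items():
            counts[cluster] = counts.get(cluster, 0) + (call == "Y")
    return {cluster: k / len(pa_table) for cluster, k in counts.items()}


def cnv_to_pa(cnv_table):
    """Turn copy numbers into Y/N calls (present when copy number > 0)."""
    return {
        genome: {c: "Y" if float(n) > 0 else "N" for c, n in calls.items()}
        for genome, calls in cnv_table.items()
    }


def mean_importance(handle, prefix):
    """Average, per feature, the fold columns whose names start with prefix."""
    reader = csv.DictReader(handle, delimiter="\t")
    id_col = reader.fieldnames[0]
    folds = [name for name in reader.fieldnames if name.startswith(prefix)]
    return {row[id_col]: mean(float(row[f]) for f in folds) for row in reader}


# Hypothetical toy inputs mimicking items (5), (6) and (7).
pa_text = "genome\tC1\tC2\tC3\nG1\tY\tY\tN\nG2\tY\tN\tN\n"
cnv_text = "genome\tC1\tC2\tC3\nG1\t1\t2\t0\nG2\t1\t0\t0\n"
fi_text = "cluster\tpa_0\tpa_1\tcnv_0\tcnv_1\nC1\t0.1\t0.3\t0.2\t0.2\nC2\t0.5\t0.7\t0.4\t0.6\n"

pa = read_matrix(io.StringIO(pa_text))
cnv = read_matrix(io.StringIO(cnv_text))
freq = presence_frequency(pa)        # fraction of genomes per cluster
consistent = cnv_to_pa(cnv) == pa    # copy number > 0 should match 'Y'
pa_importance = mean_importance(io.StringIO(fi_text), "pa_")
cnv_importance = mean_importance(io.StringIO(fi_text), "cnv_")
top_pa = max(pa_importance, key=pa_importance.get)
```

On the real files, a handle from `open("gene_presence_absence_v4.tsv", newline="")` would take the place of the `io.StringIO` toy input. The prefix match on `pa_` and `cnv_` avoids assuming how the folds are numbered.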
Files
(5.6 GB)
| MD5 checksum | Size |
|---|---|
| 875cbee673d146810bb80255683b167c | 96.2 MB |
| e8f95a0b9fe822613e13d766353d998b | 1.1 MB |
| 6da02709890239b3492ff50fc0ca2c23 | 19.5 MB |
| 7df341116b52305127df3cc2bc7cf25b | 19.5 MB |
| 9447ee559d20b95efe121fc130ab851e | 200.3 kB |
| a84b6c04748ce9b9737e0af6b2b83cc2 | 5.5 GB |
| 7f19697360143861ac9d664ce5f97398 | 446.1 kB |
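
The listed MD5 checksums can be used to confirm that a download, in particular the 5.5 GB file, arrived intact. A minimal Python sketch using the standard `hashlib` module follows; the function names `md5sum` and `verify` are illustrative. The file is read in fixed-size chunks so that multi-gigabyte files are never loaded into memory at once.

```python
import hashlib


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunk_size-byte pieces."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected):
    """Compare a file against a listed checksum, e.g. 'md5:<hex digest>'."""
    # Accept the value with or without an 'md5:' prefix.
    return md5sum(path) == expected.strip().lower().split(":")[-1]
```

For example, `verify("path/to/downloaded_file", "md5:875cbee673d146810bb80255683b167c")` (the path is hypothetical) returns True only if the file matches the checksum in that row.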