Published July 29, 2021 | Version v6
Dataset Open

Data and Weights for Reverse Homology

Description

Training data, weights, and classification datasets for the paper "Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning".

Training data:

  • scer_idr_homologues and human_idr_homologues each contain a zip file of the FASTA files of IDR homologues used to train the yeast and human models, respectively. Note that these FASTA files are aligned, but we strip away the alignment gap symbol "-" before the sequences are input into our model. disprot_idr_homologues contains a zip file of IDR homologues corresponding to the DisProt database.
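The gap-stripping step described above could be sketched roughly as follows. This is a minimal illustration, not the preprocessing code from the repository: the function name `read_fasta_ungapped` and the example records are hypothetical, and the sketch assumes standard FASTA text with ">" header lines.

```python
from typing import Dict, Iterable


def read_fasta_ungapped(lines: Iterable[str], gap: str = "-") -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}, dropping alignment gap symbols.

    Hypothetical helper for illustration; the authors' own loader may differ.
    """
    records: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            # Remove the gap symbol so only residues remain.
            records[header] += line.replace(gap, "")
    return records


# Made-up aligned records, used only to show the behaviour.
example = [">species_A", "MS--TPLK", "Q-R", ">species_B", "MSAATP-K", "QNR"]
ungapped = read_fasta_ungapped(example)
```

To apply this to a file from the archive, the same function can be given an open file handle, since it only iterates over lines.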

Weights:

  • scer_idr_model and human_idr_model each contain a zip file of the weights for the yeast and human models, respectively, which can be loaded into the model files at github.com/alexxijielu/reverse_homology. Likewise, disprot_idr_model contains a zip file of our model trained exclusively on DisProt IDRs.

Logo Websites:

  • scer_idr_logo_website, human_idr_logo_website, and disprot_idr_logo_website each contain a zip file of an HTML file for the yeast, human, and DisProt models, respectively, showing sequence logos of the features learned by each model and their enrichments.

Features:

  • human_idr_features contains the raw feature activations for all human IDRs in our human model. (We did not include this file in the supplementary material for the paper because of its size.)

Classification datasets:

  • IDR_classification_datasets contains the datasets used in our benchmarks. These datasets are encoded as binary CSV matrices. cdc28_classification contains IDRs labeled as Cdc28 phosphorylation sites, mitochondrial_targeting_classification contains IDRs labeled as mitochondrial targeting signals, evosig_cluster_classification contains IDRs labeled by the clusters assigned in previous computational work (Zarin et al., eLife 2019), and go_SLIM_classification contains proteins labeled by GO Slim annotations.

Files

disprot_idr_homologues.zip

Files (257.0 MB)

Checksum (MD5)                          Size
md5:6ac20d5e67a799c50d00cccab46d6606    2.6 MB
md5:9f22e83ebd5b282161c98a289831c0c0    5.1 MB
md5:9ae603c65fc835e85b8eecd56dda1a1b    19.7 MB
md5:a601e725d4369d917a210999ea89671b    69.3 MB
md5:beb4abbbebd64a4bfbda2aa8d1a72fde    39.9 MB
md5:fb374c9da0bd481c503325fac2c5cd8f    38.9 MB
md5:5c4d66df6bcd349be7af221a1195c22b    36.0 MB
md5:42b078e818b4deeb02afe8dcbb6f6df1    167.7 kB
md5:0964280f9faaaa9c71f2390dd08631d2    6.6 MB
md5:c749ecc0f6af6b1c5cf155f396afa3ad    18.7 MB
md5:378ad0cb8433af659b98803ab241a4a0    19.9 MB
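
A downloaded archive can be checked against one of the MD5 checksums listed above. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_listed_md5` are hypothetical, and because the listing does not show which checksum belongs to which file name, the pairing of a file with a checksum is left to the reader.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_md5(path: str, listed: str) -> bool:
    """Compare a file against a listed checksum such as 'md5:6ac2...'.

    The optional 'md5:' prefix is dropped before comparing.
    """
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

For example, `matches_listed_md5("some_archive.zip", "md5:6ac20d5e67a799c50d00cccab46d6606")` would return True only if the local file (the name here is a placeholder) is the one that checksum was computed from.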