Data and Weights for Reverse Homology
Training data, weights, and classification datasets for "Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning".
- scer_idr_homologues and human_idr_homologues each contain a zip file of the FASTA files of IDR homologues used to train the yeast and human models, respectively. Note that these FASTA files are aligned, but we strip the alignment gap character "-" from each sequence before it is input into our model.
- scer_idr_model and human_idr_model each contain a zip file of the weights for the yeast and human models, respectively, which can be loaded into the model files at github.com/alexxijielu/reverse_homology. Each zip file also contains z-scores of all model features across all IDRs, which are required to run the mutational scanning map code.
- IDR_classification_datasets contains datasets used in our benchmarks. These datasets are encoded as binary CSV matrices. cdc28_classification contains IDRs labeled as Cdc28 phosphorylation sites, mitochondrial_targeting_classification contains IDRs labeled as mitochondrial targeting signals, evosig_cluster_classification contains IDRs labeled by clusters assigned in previous computational work by Zarin et al. (eLife, 2019), and go_SLIM_classification contains proteins labeled by GO Slim annotations.
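
The gap-stripping step described for the homologue FASTA files can be sketched as follows. This is a minimal illustration using only the Python standard library, not the preprocessing code from the repository; the function name read_ungapped_fasta and the example sequences are hypothetical.

```python
from typing import Dict, Iterable


def read_ungapped_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines and remove '-' gap characters.

    Hypothetical helper for illustration; returns {header: sequence}.
    """
    sequences: Dict[str, str] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Start a new record; the header is the text after '>'
            header = line[1:]
            sequences[header] = ""
        elif header is not None:
            # Append this sequence line with alignment gaps removed
            sequences[header] += line.replace("-", "")
    return sequences


# Made-up aligned records, only to show the behaviour
example = [">seq_a", "MS--TK", "A-Q", ">seq_b", "--MSSK-"]
ungapped = read_ungapped_fasta(example)
print(ungapped)
```

To apply this to one of the unzipped files, pass an open file handle instead of the example list, since iterating over a text file yields its lines.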