Published October 30, 2025 | Version v1
Model
Open
MotifAE: Unsupervised Discovery of Functional Motifs from Protein Language Model
Description
representative_2.3M_seq.csv contains representative proteins from structure-based clustering of the AlphaFold structure database. The ESM2-650M last-layer embeddings of these proteins were used to train a sparse autoencoder (SAE) and MotifAE.
SAE_step_80000.pt and MotifAE_step_80000.pt are checkpoints of the two models at 80,000 training steps. SAE was trained with a reconstruction loss and an L1 norm penalty; MotifAE was trained with an additional local similarity loss.
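The SAE objective described above (reconstruction loss plus an L1 norm) can be sketched as follows. This is a minimal, framework-free illustration, not the authors' implementation: the ReLU encoder, linear decoder, mean-squared-error reconstruction, and the `l1_coeff` weight are all assumptions, and every function and parameter name is hypothetical. The additional local similarity loss used by MotifAE is not specified in this record, so it is left out.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def matvec(matrix, vector):
    """Multiply a row-major matrix (a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def sae_loss(x, w_enc, b_enc, w_dec, b_dec, l1_coeff):
    """Return (total, reconstruction, l1) for one embedding vector x.

    Assumed form (not confirmed by the record):
        z     = ReLU(W_enc x + b_enc)     sparse latent features
        x_hat = W_dec z + b_dec           reconstructed embedding
        loss  = MSE(x, x_hat) + l1_coeff * ||z||_1
    """
    z = relu([a + b for a, b in zip(matvec(w_enc, x), b_enc)])
    x_hat = [a + b for a, b in zip(matvec(w_dec, z), b_dec)]
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    l1 = sum(abs(v) for v in z)
    return recon + l1_coeff * l1, recon, l1
```

In this form the L1 term pushes most latent activations toward zero, so each per-residue embedding is explained by a small number of active features; a real training loop would evaluate the loss over batches of embeddings in a tensor library rather than over Python lists.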
412pros_ddG_ML.csv contains deep mutational scanning data on protein folding stability, which were used to train MotifAE-G. 1404_stability_associated_features.pt contains the features selected using MotifAE-G.
Files
Total size: 1.7 GB

| Checksum | Size |
|---|---|
| md5:5149210e9d5c7b45eebd51a4008c7f40 | 165.0 kB |
| md5:9702c3d046b0c9c8fd78666a9751f7f7 | 52.3 MB |
| md5:cabb07008f26705f07e18344607a2d51 | 419.6 MB |
| md5:2faacfd5c802d9f13c0f455582269e52 | 765.1 MB |
| md5:9e424e4f50e0e30ddd98818430d1c775 | 419.6 MB |
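Each file listed above is published with an MD5 checksum, so a download can be checked for corruption before use. The sketch below uses only Python's standard `hashlib`; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the local file path is whatever location you saved the download to.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file against a checksum given as 'md5:<hex>' or bare hex."""
    expected_hex = expected.strip().lower()
    if expected_hex.startswith("md5:"):
        expected_hex = expected_hex[len("md5:"):]
    return md5_of_file(path) == expected_hex
```

To use it, call `matches_checksum(local_path, "md5:...")` with the checksum string shown for that file in the record. Reading in chunks keeps memory use flat even for the larger files.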
Additional details
Dates
- Available: 2025-11-04
Software
- Repository URL: https://github.com/CHAOHOU-97/MotifAE
- Programming language: Python