Published August 25, 2023 | Version v2
Software
Open
Multiple sequence-alignment-based RNA language model and its application to structural inference
Description
News: More details and updates can be found at https://github.com/yikunpku/RNA-MSM
This project contains the code and [pre-trained weights](https://drive.google.com/file/d/11A-S13qAb5wiBi1YLs3EOrnixSDq7Q0q/view?usp=share_link) for the MSA RNA language model (**RNA-MSM**), together with the downstream RNA secondary structure and solvent accessibility prediction tasks and the corresponding [RNA datasets](https://drive.google.com/drive/folders/1jYqk7rAp9ysJCBXOa5Yx4Z9es89h-f2h?usp=sharing).
RNA-MSM is the first unsupervised multiple sequence alignment (MSA) RNA language model based on aligned homologous sequences. It outputs both embeddings and attention maps, so it can serve different types of downstream tasks.
The resulting RNA-MSM model produced attention maps and embeddings that correlate directly with RNA secondary structure and solvent accessibility without supervised training. Further supervised training led to predicted secondary structure and solvent accessibility that are **significantly more accurate than current state-of-the-art techniques**. Unlike many previous studies, we were **extremely careful to avoid overtraining**, a significant problem in applying deep learning to RNA, by **choosing validation and test sets structurally different from the training set**.
Files
| Name | Size | Checksum |
|---|---|---|
| RNA-MSM.zip | 13.8 MB | md5:a0ae5afad544b12f20ad29daca52ea6f |
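The MD5 checksum listed for `RNA-MSM.zip` can be used to confirm that a download is intact before unpacking it. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_archive` are illustrative and not part of the RNA-MSM code, and the local path `RNA-MSM.zip` is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path

# MD5 value as listed for RNA-MSM.zip on this record.
EXPECTED_MD5 = "a0ae5afad544b12f20ad29daca52ea6f"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical local path; adjust to wherever the archive was saved.
    archive = Path("RNA-MSM.zip")
    if archive.exists():
        print("checksum OK" if verify_archive(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

A mismatch usually indicates an incomplete or corrupted download, in which case fetching the archive again is the simplest remedy.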