There is a newer version of the record available.

Published August 25, 2023 | Version v2
Software Open

Multiple sequence-alignment-based RNA language model and its application to structural inference

  • 1. zhouyq@szbl.ac.cn

Description

News: More details and updates can be found at https://github.com/yikunpku/RNA-MSM

This project contains the code and [pre-trained weights](https://drive.google.com/file/d/11A-S13qAb5wiBi1YLs3EOrnixSDq7Q0q/view?usp=share_link) for the MSA-based RNA language model (**RNA-MSM**). It also includes code for the downstream RNA secondary structure and solvent accessibility prediction tasks, along with the corresponding [RNA datasets](https://drive.google.com/drive/folders/1jYqk7rAp9ysJCBXOa5Yx4Z9es89h-f2h?usp=sharing).


RNA-MSM is the first unsupervised RNA language model built on multiple sequence alignments (MSAs) of homologous sequences. It outputs both embeddings and attention maps, so that each output can be matched to the type of downstream task it suits.
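RNA-MSM takes an alignment of homologous sequences as input rather than a single sequence. As a rough illustration of what such input looks like before it reaches a model, the sketch below parses an aligned FASTA string and maps it to an integer array of shape (number of sequences, alignment length). The vocabulary `VOCAB` and the helpers `parse_fasta_msa` and `tokenize_msa` are hypothetical. The actual RNA-MSM tokenizer and input format are defined in the GitHub repository and may differ.

```python
import numpy as np

# Hypothetical vocabulary: four nucleotides, a gap, and an "unknown" token.
# RNA-MSM's real tokenizer may use different tokens and indices.
VOCAB = {"A": 0, "C": 1, "G": 2, "U": 3, "-": 4, "X": 5}


def parse_fasta_msa(text: str) -> list[str]:
    """Parse an aligned FASTA string into a list of equal-length sequences."""
    seqs: list[str] = []
    current: list[str] = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record (sequences may span lines).
            if current:
                seqs.append("".join(current))
                current = []
        else:
            current.append(line)
    if current:
        seqs.append("".join(current))
    lengths = {len(s) for s in seqs}
    if len(lengths) != 1:
        raise ValueError(f"aligned sequences must share one length, got {sorted(lengths)}")
    return seqs


def tokenize_msa(seqs: list[str]) -> np.ndarray:
    """Map an MSA to an int64 array of shape (num_sequences, alignment_length)."""
    unk = VOCAB["X"]
    # Upper-case everything and treat DNA-style T as U; unlisted symbols become X.
    return np.array(
        [[VOCAB.get(ch, unk) for ch in s.upper().replace("T", "U")] for s in seqs],
        dtype=np.int64,
    )
```

For example, `tokenize_msa(parse_fasta_msa(">query\nGGAUCC\n>hom1\nGGA-CC\n"))` gives a 2 × 6 array. Placing the query sequence in the first row is a common convention for MSA models, but it is an assumption here as well.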

The resulting RNA-MSM model produced attention maps and embeddings that correlate directly with RNA secondary structure and solvent accessibility, without any supervised training. Further supervised training yielded secondary structure and solvent accessibility predictions that are **significantly more accurate than current state-of-the-art techniques**. Unlike many previous studies, we were **extremely careful to avoid overtraining**, a significant problem in applying deep learning to RNA, by **choosing validation and test sets that are structurally different from the training set**.
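One common way to read structure out of a language model's attention without supervised training has three steps. First, symmetrize an L × L attention map. Next, apply average product correction (APC) to suppress background signal. Finally, keep high-scoring position pairs as candidate base pairs. The sketch below assumes a single head-averaged attention map supplied as a NumPy array. The functions, the greedy pairing rule, the `threshold` value, and the minimum hairpin-loop length of 3 are illustrative choices and not the RNA-MSM pipeline. The accuracy reported above comes from further supervised training, which is described in the GitHub repository.

```python
import numpy as np


def symmetrize_apc(attn: np.ndarray) -> np.ndarray:
    """Symmetrize an L x L attention map and apply average product correction."""
    sym = attn + attn.T
    row_sums = sym.sum(axis=1, keepdims=True)  # shape (L, 1)
    col_sums = sym.sum(axis=0, keepdims=True)  # shape (1, L)
    # APC: subtract the expected background score F_i * F_j / F from each entry.
    return sym - row_sums * col_sums / sym.sum()


def greedy_pairs(scores: np.ndarray, threshold: float, min_loop: int = 3) -> list[tuple[int, int]]:
    """Greedily pick base pairs (i, j), i < j, from the highest score downward.

    Each nucleotide pairs at most once, and j - i must exceed min_loop so that
    every hairpin loop keeps at least min_loop unpaired bases.
    """
    length = scores.shape[0]
    i_idx, j_idx = np.triu_indices(length, k=min_loop + 1)
    cand = scores[i_idx, j_idx]
    order = np.argsort(-cand, kind="stable")  # highest scores first
    paired = np.zeros(length, dtype=bool)
    pairs: list[tuple[int, int]] = []
    for k in order:
        if cand[k] < threshold:
            break  # all remaining candidates score lower
        i, j = int(i_idx[k]), int(j_idx[k])
        if not paired[i] and not paired[j]:
            paired[i] = paired[j] = True
            pairs.append((i, j))
    return sorted(pairs)
```

The greedy rule only enforces one partner per nucleotide, so it can return crossing (pseudoknotted) pairs. Ruling those out would require a nested-structure decoder.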

Files

RNA-MSM.zip (13.8 MB)
md5: a0ae5afad544b12f20ad29daca52ea6f
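The record lists an MD5 checksum for RNA-MSM.zip. The following is a minimal way to check a downloaded copy against it using only Python's standard library. It assumes the archive was saved locally as `RNA-MSM.zip`, and the helper names `md5sum` and `verify_download` are illustrative.

```python
import hashlib
from pathlib import Path

# Checksum listed for RNA-MSM.zip on this record.
EXPECTED_MD5 = "a0ae5afad544b12f20ad29daca52ea6f"


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """True if the file's MD5 matches the expected (case-insensitive) hex digest."""
    return md5sum(path) == expected.lower()
```

Calling `verify_download(Path("RNA-MSM.zip"))` should return `True` for an intact download.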