GraphaRNA dataset and model
Description
Graph Neural Network and Diffusion Model for Modeling RNA Interatomic Interactions
This repository contains the datasets and the pre-trained model associated with GraphaRNA, a diffusion-based graph neural network for RNA 3D structure prediction. The data is organized into several archives that provide the training, validation, and test sets, along with a pre-trained model ready for inference.
Data Overview:
- rRNA_tRNA.tar.gz: Contains raw PDB files with extracted descriptors from ribosomal RNA (rRNA) and transfer RNA (tRNA) structures.
- non_rRNA_tRNA.tar.gz: Contains raw PDB files with extracted descriptors from RNA molecules that are neither rRNA nor tRNA. These serve as a separate test set.
- train-pkl.tar.gz: Contains the filtered and preprocessed pickle files for the training set, derived from the rRNA_tRNA dataset. These files are used to train GraphaRNA.
- val-pkl.tar.gz: Contains the validation set, a subset of the training data from train-pkl.tar.gz.
- test-pkl.tar.gz: Contains the preprocessed pickle files for the test set, derived from the non_rRNA_tRNA dataset. Because it includes only RNAs that are neither rRNA nor tRNA, it provides a challenging test scenario.
- model_epoch_800.tar.gz: Contains the pre-trained GraphaRNA model after 800 epochs of training on the train-pkl dataset. This model is ready for inference and evaluation.
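All six resources above are distributed as gzip-compressed tar archives. The following is a minimal sketch for unpacking whichever of them have been downloaded, using only Python's standard `tarfile` module. It assumes the archives sit in a single directory (`root`) and extracts each one into its own subdirectory of a hypothetical output folder `GraphaRNA_data`; neither location is prescribed by this record.

```python
import tarfile
from pathlib import Path

# Archive names as listed in the Data Overview; assumed to be downloaded into `root`.
ARCHIVES = [
    "rRNA_tRNA.tar.gz",
    "non_rRNA_tRNA.tar.gz",
    "train-pkl.tar.gz",
    "val-pkl.tar.gz",
    "test-pkl.tar.gz",
    "model_epoch_800.tar.gz",
]


def extract_archive(archive: Path, dest: Path) -> list[str]:
    """Extract one .tar.gz archive into `dest` and return its member names."""
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        names = tar.getnames()
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject absolute paths, "..", etc.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return names


def extract_all(root: Path = Path("."), out: Path = Path("GraphaRNA_data")) -> dict[str, list[str]]:
    """Extract every archive from ARCHIVES that exists in `root` into out/<archive name>."""
    extracted: dict[str, list[str]] = {}
    for name in ARCHIVES:
        archive = root / name
        if archive.exists():  # skip archives that were not downloaded
            target = out / name.removesuffix(".tar.gz")
            extracted[name] = extract_archive(archive, target)
    return extracted
```

Calling `extract_all()` from the download directory returns a mapping from each archive found to the member names it contained, which is a quick way to see how each archive is laid out internally before wiring it into a training or evaluation script.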
Use of Data and Model:
- The raw PDB files can be used for RNA descriptor extraction, while the pickle files are preprocessed for direct use in training, validation, and testing workflows.
- The GraphaRNA model in model_epoch_800.tar.gz can be used to run inference on new RNA data or to reproduce results from the associated paper.
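This record does not document what object each preprocessed pickle file holds, so the sketch below only shows a generic way to iterate over one extracted split with the standard `pickle` module. It assumes the files use a `.pkl` extension. Depending on how they were written, unpickling may also require GraphaRNA's own code or its dependencies to be importable; that is an assumption, not something stated here.

```python
import pickle
from pathlib import Path
from typing import Any, Iterator


def iter_pickles(split_dir: Path, pattern: str = "*.pkl") -> Iterator[tuple[str, Any]]:
    """Yield (relative path, unpickled object) for every matching file under `split_dir`.

    Only unpickle files from a source you trust: pickle can execute arbitrary code.
    """
    for path in sorted(split_dir.rglob(pattern)):
        with path.open("rb") as fh:
            yield path.relative_to(split_dir).as_posix(), pickle.load(fh)
```

Inspecting the type and keys of the first object yielded is a reasonable first step before writing any split-specific loading code.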
How to Use:
- Training: train-pkl.tar.gz contains data that can be used to retrain the GraphaRNA model from scratch.
- Validation: val-pkl.tar.gz can be used to validate the model during or after training.
- Testing: Use test-pkl.tar.gz to evaluate the model's performance on RNA types it was not trained on (non-rRNA and non-tRNA).
- Inference: model_epoch_800.tar.gz is ready for inference on new RNA sequences.
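Because the test split is meant to probe RNA types absent from training, a quick sanity check after extraction is to confirm that no file appears in both the training and the test split. The sketch below assumes each split was extracted into a directory named after its archive (`train-pkl`, `val-pkl`, `test-pkl` under a common `root`) and that file stems identify individual structures. Both are assumptions about the archive layout, so treat an empty or non-empty result as a prompt to look closer rather than as proof.

```python
from pathlib import Path

# Assumed layout after extraction: one directory per split under a common root.
SPLIT_DIRS = {"train": "train-pkl", "val": "val-pkl", "test": "test-pkl"}


def split_stems(root: Path, pattern: str = "*.pkl") -> dict[str, set[str]]:
    """Collect the file stems (names without extension) found in each split directory."""
    return {
        split: {p.stem for p in (root / subdir).rglob(pattern)}
        for split, subdir in SPLIT_DIRS.items()
    }


def train_test_overlap(root: Path) -> set[str]:
    """Return stems present in both the training and the test split."""
    stems = split_stems(root)
    return stems["train"] & stems["test"]
```

The validation split is deliberately left out of the overlap check, since this record describes it as a subset of the training data.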
Acknowledgments:
If you use this dataset or the pre-trained model in your research, please cite the associated paper (linked here once published).
Files
(5.3 GB)
| MD5 checksum | Size |
|---|---|
| 4d53e950852d618c9104a8834e9e463f | 2.8 GB |
| 2c1411cec54a844575382d897dfc3cdf | 377.6 MB |
| 79ac4708bb6f4f6120827e96c5ecf41b | 2.1 GB |
| 1e702ebcbec1deed93c6c877dfcc5a7f | 19.6 MB |
| 0747f671053994bb0d397686fe1954c0 | 65.8 MB |
| f72e71365f16969f4d41f5ca9ecd8994 | 784.9 kB |
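The MD5 checksums in the file table can be used to confirm that a multi-gigabyte download completed intact. Since the table pairs each checksum with a size rather than a file name, the minimal sketch below, using Python's standard `hashlib`, simply checks whether a downloaded file's digest is one of the listed values. The file is read in chunks so the large archives never need to fit in memory.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file table.
LISTED_MD5 = {
    "4d53e950852d618c9104a8834e9e463f",
    "2c1411cec54a844575382d897dfc3cdf",
    "79ac4708bb6f4f6120827e96c5ecf41b",
    "1e702ebcbec1deed93c6c877dfcc5a7f",
    "0747f671053994bb0d397686fe1954c0",
    "f72e71365f16969f4d41f5ca9ecd8994",
}


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in fixed-size chunks."""
    h = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_listed(path: Path) -> bool:
    """Return True if the file's MD5 digest matches one of the published checksums."""
    return md5sum(path) in LISTED_MD5
```

For example, `is_listed(Path("train-pkl.tar.gz"))` returning `False` after a download suggests the file is truncated or corrupted and should be fetched again.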