There is a newer version of the record available.

Published September 11, 2024 | Version v1
Dataset Open

GraphaRNA dataset and model

  • 1. Poznań University of Technology
  • 2. Bowling Green State University

Description

Graph Neural Network and Diffusion Model for Modeling RNA Interatomic Interactions

This repository contains the datasets and the pre-trained model associated with GraphaRNA, a diffusion-based graph neural network for RNA 3D structure prediction. The record comprises six archives: raw and preprocessed data for training, validation, and testing the model, plus a pre-trained model ready for inference.

Data Overview:

  1. rRNA_tRNA.tar.gz:

    • Contains raw PDB files with extracted descriptors from ribosomal RNA (rRNA) and transfer RNA (tRNA) structures.
  2. non_rRNA_tRNA.tar.gz:

    • Contains raw PDB files with extracted descriptors from RNA molecules that are non-rRNA and non-tRNA. These serve as a separate test set.
  3. train-pkl.tar.gz:

    • Contains the filtered and preprocessed pickle files for the training set, derived from the rRNA_tRNA dataset. These files are used to train GraphaRNA.
  4. val-pkl.tar.gz:

    • Contains the validation set, which is a subset of the training data from train-pkl.tar.gz.
  5. test-pkl.tar.gz:

    • Contains the preprocessed pickle files for the test set, derived from the non_rRNA_tRNA dataset. This set includes RNA descriptors that are not rRNA or tRNA, providing a challenging test scenario.
  6. model_epoch_800.tar.gz:

    • Contains the pre-trained GraphaRNA model after 800 epochs of training on the train-pkl dataset. This model is ready for inference and evaluation.
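The archives listed above are gzip-compressed tarballs, so they can be unpacked with Python's standard library. The following is a minimal sketch, not part of GraphaRNA: the helper name `extract_archive` and the destination directory are illustrative assumptions, and because the internal layout of each archive is not described here, the function simply returns whatever files it finds after extraction.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a .tar.gz archive into dest_dir and return the files found there.

    Hypothetical helper for illustration; it is not part of GraphaRNA.
    """
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            # Basic safety check: refuse entries whose path would land
            # outside the destination directory.
            target = (dest / member.name).resolve()
            if target != dest and dest not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        tar.extractall(dest, members=members)
    # Report every regular file now present under the destination.
    return sorted(p for p in dest.rglob("*") if p.is_file())
```

For example, `extract_archive("train-pkl.tar.gz", "data/train-pkl")` would unpack the training archive, assuming it has already been downloaded to the working directory.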

Use of Data and Model:

  • The raw PDB files can be used for RNA descriptor extraction, while the pickle files are preprocessed for direct use in training, validation, and testing workflows.
  • The GraphaRNA model in model_epoch_800.tar.gz can be used to run inference on new RNA data or to reproduce results from the associated paper.
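Since the structure of the preprocessed pickle files is not described in this record, a reasonable first step is to load a few of them and inspect their types. The sketch below assumes the extracted files use a `.pkl` extension, which is a guess; the function names are hypothetical. Unpickling may also require the libraries that were used to create the objects to be installed, and pickle files should only be loaded from sources you trust.

```python
import pickle
from pathlib import Path


def iter_pickles(directory, pattern="*.pkl"):
    """Yield (path, object) pairs for every pickle file under directory.

    The '*.pkl' pattern is an assumption about how the files are named.
    """
    for path in sorted(Path(directory).rglob(pattern)):
        with path.open("rb") as handle:
            # pickle.load can execute arbitrary code; use trusted files only.
            yield path, pickle.load(handle)


def summarize(directory, limit=5):
    """Return (file name, Python type name) for the first few samples."""
    summary = []
    for index, (path, obj) in enumerate(iter_pickles(directory)):
        if index >= limit:
            break
        summary.append((path.name, type(obj).__name__))
    return summary
```

Calling `summarize("data/train-pkl")` on an extracted directory gives a quick view of what kind of objects the training files contain before wiring them into a data loader.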

How to Use:

  • Training: train-pkl.tar.gz contains the data needed to retrain the GraphaRNA model from scratch.
  • Validation: val-pkl.tar.gz can be used to validate the model during or after training.
  • Testing: Use test-pkl.tar.gz to evaluate the model's performance on RNA types it was not trained on (non-rRNA and non-tRNA).
  • Inference: model_epoch_800.tar.gz is ready for inference on new RNA sequences.
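Because the test set is meant to contain RNA types the model was not trained on, it can be worth confirming that the extracted splits do not share samples before running an evaluation. The sketch below compares file names between two extracted directories. Treating file names as sample identifiers and matching on `*.pkl` are both assumptions, and the function names are hypothetical.

```python
from pathlib import Path


def sample_names(directory, pattern="*.pkl"):
    """Collect the file names of all samples under a split directory."""
    return {p.name for p in Path(directory).rglob(pattern)}


def split_overlap(dir_a, dir_b, pattern="*.pkl"):
    """Return the sorted file names that appear in both split directories."""
    shared = sample_names(dir_a, pattern) & sample_names(dir_b, pattern)
    return sorted(shared)
```

An empty result from `split_overlap("data/train-pkl", "data/test-pkl")` would suggest the two splits share no files by name. The same call on the training and validation directories shows whether the validation files are also present in the training archive.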

Acknowledgments:

If you use this dataset or the pre-trained model in your research, please cite the associated paper (linked here once published).

Files (5.3 GB)

Checksum                                Size
md5:4d53e950852d618c9104a8834e9e463f    2.8 GB
md5:2c1411cec54a844575382d897dfc3cdf    377.6 MB
md5:79ac4708bb6f4f6120827e96c5ecf41b    2.1 GB
md5:1e702ebcbec1deed93c6c877dfcc5a7f    19.6 MB
md5:0747f671053994bb0d397686fe1954c0    65.8 MB
md5:f72e71365f16969f4d41f5ca9ecd8994    784.9 kB
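The MD5 checksums in the file listing can be used to confirm that a download completed without corruption. The sketch below uses only Python's standard `hashlib` module; the function names are hypothetical, and because the listing does not show which checksum belongs to which file name, the caller has to supply the matching pair.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MiB chunks so multi-gigabyte archives fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file against a checksum written as 'md5:<hex digest>'."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected
```

A call such as `matches_listing(downloaded_path, "md5:4d53e950852d618c9104a8834e9e463f")` returns `True` only if the file at `downloaded_path` is the one that checksum was computed from.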