Published July 1, 2023 | Version 1.0

High-quality RNA residues: RNA2023

  • Duke University

Description

Introduction
--------------------------------------------------------------------------------
This is the RNA2023 dataset from the Richardson Lab at Duke University.

These are high-quality residues from high-quality, low-redundancy RNA chains in the PDB.

For a similar set of quality-filtered protein residues, see the top2018 datasets at:
https://doi.org/10.5281/zenodo.4626149
https://doi.org/10.5281/zenodo.5115232


Corresponding authors
--------------------------------------------------------------------------------
dcrjsr at kinemage.biochem.duke.edu
christopher.sci.williams at gmail.com


Usage recommendations
--------------------------------------------------------------------------------
RNA residues that fail the filtering criteria described below have been removed from the files.  As a result, these files can be considered pre-filtered: they contain only residues of good model quality with supporting experimental data.

Files already contain hydrogens added by Reduce in the context of the original full models.

Two datasets are provided.  The standard dataset is rna2023_pruned, and we recommend it as the default.  The RNA backbone conformational space is highly diverse, and some real conformations fall below the statistical threshold for recognition as named suite conformers.  We therefore do not recommend excluding suite outliers except in specialist cases.  We also provide an rna2023_nosuiteout dataset, in which no residues with "!!" outlier suite identifications are permitted.  This set may be useful where suite outliers are undesirable, or where losing some real conformations is an acceptable sacrifice for maximal filtering.
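The difference between the two datasets amounts to one extra rule. As a minimal Python sketch, assuming suite assignments are available as (residue ID, suite name) pairs (a hypothetical input format, not the layout of the distributed files), the nosuiteout rule drops every residue whose suite is "!!":

```python
from typing import Iterable

# Suitename's designation for an outlier suite, as described above.
SUITE_OUTLIER = "!!"


def drop_suite_outliers(suites: Iterable[tuple[str, str]]) -> list[str]:
    """Return the residue IDs whose suite is not a "!!" outlier.

    `suites` holds (residue_id, suite_name) pairs; this input format and the
    residue-ID strings are hypothetical, chosen only for illustration.
    """
    return [res_id for res_id, name in suites if name.strip() != SUITE_OUTLIER]


if __name__ == "__main__":
    example = [("A 12", "1a"), ("A 13", "!!"), ("A 14", "1c")]
    print(drop_suite_outliers(example))  # ['A 12', 'A 14']
```

In practice, use the provided rna2023_nosuiteout files directly; the sketch only illustrates the rule that distinguishes them from rna2023_pruned.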

Each dataset is also provided in mmCIF format.

Note: Chains are named by their author chain IDs, except for 8b0x chain a.  To avoid a conflict with 8b0x chain A on file systems that do not support case-sensitive file names, 8b0x chain a has been renamed to chain AB, matching its PDB/mmCIF designation.
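The kind of conflict this renaming avoids can be checked mechanically. A small Python sketch (the function name is hypothetical, and it assumes file names are derived directly from chain IDs) groups chain IDs that would map to the same file name once letter case is ignored:

```python
from collections import defaultdict


def case_collisions(chain_ids: list[str]) -> dict[str, list[str]]:
    """Group chain IDs that collide when letter case is ignored.

    If file names are built from chain IDs, files for chains "A" and "a"
    would overwrite each other on a case-insensitive file system, so each
    returned group is a potential conflict.
    """
    groups: dict[str, list[str]] = defaultdict(list)
    for chain_id in chain_ids:
        groups[chain_id.casefold()].append(chain_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    print(case_collisions(["A", "a", "B"]))   # {'a': ['A', 'a']}
    print(case_collisions(["A", "AB", "B"]))  # {}
```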


Additional files
--------------------------------------------------------------------------------
rna2023_pdbmetadata.csv contains information on release date, resolution, title, authors, etc., for each source PDB entry.

rna2023_chain_list contains a list of all included chains, plus statistics on the number of residues from each original chain that passed the quality filters.

rna2023_suitename_table.csv and rna2023_suitename_table_nosuiteout.csv contain suitename identifications of rotameric RNA backbone conformations (1a, 1c, 2u, 6d, etc.), precomputed for convenience.
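To tally how often each conformer occurs, a suitename table can be read with Python's standard csv module. This is a sketch only: the column name "suite" is a hypothetical placeholder, so check the header row of rna2023_suitename_table.csv for the actual name before use.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def conformer_counts(stream: TextIO, column: str = "suite") -> Counter:
    """Count suite conformer names (1a, 1c, ...) in a suitename CSV table.

    `column` defaults to a hypothetical header name; pass the real one.
    """
    reader = csv.DictReader(stream)
    return Counter(row[column].strip() for row in reader)


if __name__ == "__main__":
    # Inline sample in the assumed layout, not real dataset rows.
    sample = io.StringIO("residue,suite\nA12,1a\nA13,1a\nA14,1c\n")
    print(conformer_counts(sample).most_common())  # [('1a', 2), ('1c', 1)]
```

For the real file, open it with `open("rna2023_suitename_table.csv", newline="")` and pass the file object, adjusting `column` to match the header.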


Filtering criteria: Chain level
--------------------------------------------------------------------------------
The chain list was derived from http://rna.bgsu.edu/rna3dhub/nrlist, version 3.150 as of 2020/10/28, with a 1.9Å resolution cutoff.

We added 6ugg chain A and two recent EM ribosome structures: 8a3d and 8b0x.

After residue-level filtering, chains with no complete suites were removed.
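A suite is the sugar-to-sugar backbone unit, running from δ of one residue through δ of the next, so one reading of "complete suite" is two sequence-adjacent residues that both survived filtering. Assuming that reading, and assuming adjacency can be taken from consecutive residue numbers (a simplification that ignores insertion codes and numbering gaps), the chain-level check could be sketched as follows; both function names are hypothetical:

```python
def has_complete_suite(residue_numbers: list[int]) -> bool:
    """True if any two sequence-adjacent residues were both retained.

    Adjacency by consecutive numbering is an assumption of this sketch.
    """
    kept = set(residue_numbers)
    return any(n - 1 in kept for n in kept)


def chains_with_suites(chains: dict[str, list[int]]) -> list[str]:
    """Keep the IDs of chains that still contain at least one complete suite."""
    return [chain_id for chain_id, nums in chains.items() if has_complete_suite(nums)]


if __name__ == "__main__":
    retained = {"A": [1, 2, 5], "B": [4, 6, 8]}
    print(chains_with_suites(retained))  # ['A']
```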


Filtering criteria: Residue level
--------------------------------------------------------------------------------
Even excellent structures usually contain some poorly resolved regions.  Residue-level filtering helps avoid including these regions in otherwise high-quality data.

Residues are required to meet the following validation-quality criteria:
No sugar pucker outliers
No steric overlaps ("clashes") of >= 0.5Å, as measured by Probe
No covalent bond or angle geometry outliers
Optionally, no !! suite outliers
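The validation checks above combine into a single per-residue predicate. In this Python sketch, ResidueValidation is a hypothetical summary record (its field names are illustrative, not a format shipped with the dataset), and the optional suite rule corresponds to the rna2023_nosuiteout variant:

```python
from dataclasses import dataclass

# Probe overlaps at or above this size (in Å) count as clashes.
CLASH_CUTOFF = 0.5
SUITE_OUTLIER = "!!"


@dataclass
class ResidueValidation:
    """Hypothetical per-residue validation summary (illustrative fields)."""
    pucker_outlier: bool
    worst_overlap: float    # largest Probe atom-atom overlap, in Å
    geometry_outlier: bool  # any covalent bond or angle outlier
    suite: str              # suitename designation, "!!" for an outlier


def passes_validation(res: ResidueValidation,
                      exclude_suite_outliers: bool = False) -> bool:
    """Apply the validation-quality criteria listed above to one residue."""
    if res.pucker_outlier or res.geometry_outlier:
        return False
    if res.worst_overlap >= CLASH_CUTOFF:
        return False
    if exclude_suite_outliers and res.suite == SUITE_OUTLIER:
        return False
    return True


if __name__ == "__main__":
    res = ResidueValidation(pucker_outlier=False, worst_overlap=0.2,
                            geometry_outlier=False, suite="!!")
    print(passes_validation(res))                               # True
    print(passes_validation(res, exclude_suite_outliers=True))  # False
```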

Residues from X-ray structures are required to meet the following fit-to-map criteria:
Average of worst 2 atoms' 2Fo-Fc map values >= 1.2
Average of worst 2 atoms' RSCC scores >= 0.7
No atoms modeled at partial occupancy
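"Average of worst 2 atoms" means taking the two lowest per-atom values in a residue and averaging them. A minimal Python sketch of the X-ray criteria, assuming per-atom map values, RSCC scores, and occupancies are already available as lists (how they are computed is outside this sketch, and the function names are hypothetical):

```python
def worst_two_mean(values: list[float]) -> float:
    """Average of the two lowest (worst) per-atom values."""
    if len(values) < 2:
        raise ValueError("need at least two atoms")
    lowest = sorted(values)[:2]
    return sum(lowest) / 2


def passes_xray_fit(density: list[float], rscc: list[float],
                    occupancies: list[float]) -> bool:
    """Apply the X-ray fit-to-map thresholds listed above to one residue."""
    return (
        worst_two_mean(density) >= 1.2                 # 2Fo-Fc map values
        and worst_two_mean(rscc) >= 0.7                # per-atom RSCC
        and all(occ >= 1.0 for occ in occupancies)     # no partial occupancy
    )


if __name__ == "__main__":
    print(passes_xray_fit([1.5, 1.4, 2.0], [0.9, 0.8, 0.95], [1.0, 1.0, 1.0]))  # True
    print(passes_xray_fit([1.5, 1.4, 2.0], [0.9, 0.8, 0.95], [1.0, 0.5, 1.0]))  # False
```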

Residues from EM structures are required to meet the following fit-to-map criteria:
RSCC >= 0.7
Residue inclusion fraction = 1.0 or >= 0.95, depending on structure
No atoms modeled at partial occupancy
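The EM criteria use a single per-residue RSCC and an inclusion-fraction cutoff that varies by structure. Because which structures used 1.0 and which used 0.95 is not listed here, this Python sketch takes the cutoff as a parameter; the function name is hypothetical:

```python
def passes_em_fit(rscc: float, inclusion_fraction: float,
                  occupancies: list[float], min_inclusion: float = 1.0) -> bool:
    """Apply the EM fit-to-map thresholds listed above to one residue.

    `min_inclusion` is 1.0 or 0.95 depending on the structure; the choice
    per structure is not given here, so the caller must supply it.
    """
    return (
        rscc >= 0.7
        and inclusion_fraction >= min_inclusion
        and all(occ >= 1.0 for occ in occupancies)  # no partial occupancy
    )


if __name__ == "__main__":
    print(passes_em_fit(0.8, 0.96, [1.0, 1.0]))                      # False
    print(passes_em_fit(0.8, 0.96, [1.0, 1.0], min_inclusion=0.95))  # True
```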

Filtering is documented in each pruned file.  See the USER  DOC lines in .pdb files and the data_rna2023_dataset loops in .cif files.
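Because the filtering record travels with each .pdb file, it can be pulled out with a plain line filter. A minimal Python sketch, assuming only that the records start with the literal text "USER  DOC" (with two spaces, as written above); the function name is hypothetical and the sample lines are invented placeholders, not real file content:

```python
def doc_lines(pdb_text: str) -> list[str]:
    """Return the USER  DOC records from the text of a .pdb file."""
    return [line for line in pdb_text.splitlines() if line.startswith("USER  DOC")]


if __name__ == "__main__":
    # Placeholder text in PDB layout; real DOC record contents will differ.
    sample = "USER  DOC placeholder note\nUSER  MOD other record\nATOM      1  P ...\n"
    for line in doc_lines(sample):
        print(line)
```

For a real file, pass the result of reading it, e.g. `doc_lines(open(path).read())`.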


Version history
--------------------------------------------------------------------------------
Version 1.0 Jun 30, 2023
Initial version

Files (25.1 MB total)

md5 checksum                        Size
0bc079a0b5fcfdc960a4f573adcbd046    4.2 kB
28427e5cf2a62e964d9fbd1143845347    3.3 kB
1619753172b144d20e1b1b311f7073a9    3.3 kB
b1d4dcf93b9fcbfc9da4b63e1c904db9    24.8 kB
f0db1f670f3e12f231eeae7db144308d    6.7 MB
e9943504c35148ef934c225f52285853    6.4 MB
62827266ec09834450a6640dc918b4f5    5.7 MB
bf0c1016159863f69adf659ee7330866    6.0 MB
04c4c4c5e19f5b2ba96428cc396731ea    132.8 kB
89bf68132719b4475ed4727da4df8259    118.9 kB
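To confirm that a download is intact, its md5 digest can be compared with the checksums listed above. A short Python sketch using the standard hashlib module; the function names are hypothetical, and each file must be matched to its checksum on the dataset page:

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def md5sum(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Compute a file's md5 hex digest, reading it in chunks."""
    digest = hashlib.md5()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str | Path, expected: str) -> bool:
    """Compare against a checksum written as "md5:<hex>" or plain hex."""
    return md5sum(path) == expected.strip().lower().removeprefix("md5:")
```

For example, `verify("downloaded_file", "md5:0bc079a0b5fcfdc960a4f573adcbd046")`, where "downloaded_file" is a placeholder path.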