Supplementary data for "New Targets and Procedures for Validating the Valence Geometry of Nucleic Acid Structures"
Contributors
Researcher (3):
Description
This repository contains data and code accompanying the paper "New Targets And Procedures For Validating The Valence Geometry Of Nucleic Acid Structures" by Černý et al. The files are as follows:
- pdb_na_reference_set.zip - the complete PDB-NA Reference Set
- zprime_thresholds.zip - thresholds for the weighted asymmetric non-parametric standard score (Z')
- restraints_in literature_and_refinement_software.xlsx - listing of geometrical restraints for nucleic acid bond lengths and angles found in the literature and refinement programs
- filtering_and_prosco_code.zip - the code for filtering the PDB-NA Reference Set and for calculating the probability percentile score (ProSco)
- residues_removed_after_expert_inspection.csv - list of residues excluded from the PDB-NA Reference Set after manual inspection
- Preferred_CSD_stats.ods - table with PDB-wide summary of proportions of the lower and upper boundaries between Preferred and Allowed determined by CSD 3σ values rather than ProSco 5 values
- Z-prime_analysis.zip - an HTML report with the visualizations used to inspect the effect of different Z' thresholds
- prosco_json.zip - ProSco values in JSON format for all analyzed bond lengths and angles
Additional information about the content of the "filtering_and_prosco_code.zip" file:
The "filtering" directory contains scripts and data for quality filtering of non-redundant DNA and RNA residues forming the "PDB NA Reference Set".
"rcsb_all_DNA+RNA_within_3.5A_xray_with_data.txt" - list of DNA and RNA X-ray PDB structures with a crystallographic resolution of 3.5 Å or better for which experimental data are available. The list was obtained by an Advanced Search query with these parameters at the rcsb.org website.
The scripts in the expected order of calling:
- "validation_xml2json" - Converts the PDB validation reports from XML to JSON.
- "graphQL_GET_protein_clusters" - Queries the https://data.rcsb.org/graphql endpoint for details about the biomolecular chains contained in the listed X-ray structures and generates the "rcsb_DNA+RNA_graphQL.json" file with the requested data.
- "make_non_redundant_DNA.py" and "make_non_redundant_RNA.py" - Identify DNA and RNA sequence clusters in complexes with proteins and in naked NAs. This step relies on a modified BioPython substitution matrix; update your local installation with the files from the attached Bio directory.
- "calculate_scores_single_res.py" - Adds the quality score for each NA chain.
- "process_non_redundant" - Compiles the set of highest-quality non-redundant NA chains.
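The first step of the list above, converting an XML validation report to JSON, can be sketched as follows. This is only an illustration and not the "validation_xml2json" script itself: the helper names `element_to_dict` and `validation_xml_to_json`, the JSON layout, and the sample XML fragment (including its element and attribute names and values) are all assumptions made for the example.

```python
import json
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Recursively convert an XML element into plain dicts and lists."""
    node = {"tag": element.tag, "attributes": dict(element.attrib)}
    text = (element.text or "").strip()
    if text:
        node["text"] = text
    children = [element_to_dict(child) for child in element]
    if children:
        node["children"] = children
    return node


def validation_xml_to_json(xml_text):
    """Parse an XML string and return an indented JSON string."""
    root = ET.fromstring(xml_text)
    return json.dumps(element_to_dict(root), indent=2)


# Made-up fragment for illustration only; a real validation report
# is much larger and its schema may differ from this sample.
SAMPLE_XML = """<wwPDB-validation-information>
  <Entry pdbid="0XXX" resolution="1.80"/>
  <ModelledSubgroup chain="A" resname="DG" resnum="1"/>
</wwPDB-validation-information>"""

report = json.loads(validation_xml_to_json(SAMPLE_XML))
print(report["children"][1]["attributes"]["resname"])
```

Keeping tags, attributes, and children in separate keys avoids name clashes between attributes and child elements, at the cost of a more verbose JSON document.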
The "naval" directory contains a C++ re-implementation of the Python-based annotation code (https://github.com/mkowiel/nucleic-acid-validation.git).
The C++ program depends on the "libLLKA" library used at the https://dnatco.datmos.org web service. The source code is available from the https://github.com/cernylab/libLLKA.git repository.
The code processes the structures from the "filtering" step, measures the bond lengths and angles, and returns an intermediate classification. Only the CSD-related classes are used further for the final composite validation tier (combining the ProSco, CSD, and Z' scores).
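As a minimal sketch of what measuring a bond length and a bond angle from atomic coordinates involves, the following uses plain Cartesian geometry. It is not taken from the "naval" code; the function names `bond_length` and `bond_angle` and the coordinates in the usage lines are hypothetical and chosen only for illustration.

```python
import math


def bond_length(a, b):
    """Euclidean distance between two atoms, in the units of the input."""
    return math.dist(a, b)


def bond_angle(a, b, c):
    """Angle a-b-c in degrees, with atom b at the vertex."""
    v1 = [p - q for p, q in zip(a, b)]
    v2 = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(v1, v2))
    norm = math.hypot(*v1) * math.hypot(*v2)
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


# Hypothetical coordinates (in Å) of three consecutively bonded atoms.
atom_1 = (1.0, 0.0, 0.0)
atom_2 = (0.0, 0.0, 0.0)
atom_3 = (0.0, 1.5, 0.0)

print(bond_length(atom_1, atom_2))
print(bond_angle(atom_1, atom_2, atom_3))
```

The multi-argument form of `math.hypot` and `math.dist` require Python 3.8 or newer.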
The "prosco" directory contains the R script and auxiliary shell scripts for (re)calculation of the *_prosco.json files. It uses the "naval"-annotated CSV files from "pdb_na_reference_set.zip" as input.
Files
Total size: 96.8 MB

| MD5 checksum | Size |
|---|---|
| c2de4601d22834ee4ee474706a2c3f39 | 9.7 MB |
| a531d308a3984eaefb453fc988c91359 | 2.6 MB |
| 1d8ca40cdb9d48c8fc8f0419315b3269 | 53.3 kB |
| cb9a8818f2d2341d986b90af2c0317b4 | 20.1 MB |
| b391b3cbfd833a8c97970827ec76d869 | 291 Bytes |
| c0ea6a31d353d3baa658d19440a3fe74 | 212.1 kB |
| 117a6c16b75dc2eea423022a37e2d0f2 | 64.0 MB |
| fbc182002554d92382fbe31a22e6d56c | 8.0 kB |