There is a newer version of the record available.

Published May 7, 2024 | Version 1.0.0
Dataset Open

Datasets and Jupyter notebook for the structural analysis of protein-RNA interface evolution

  • 1. Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France

Description

The present repository contains data and code related to our manuscript "Structural comparison of protein-RNA homologous interfaces reveals widespread overall conservation contrasted with versatility in polar contacts". In the manuscript, we analyze the evolution of protein-RNA interfaces by building a dataset of protein-RNA interologs (homologous interfaces) and exploring how interface contacts are conserved between homologous interfaces, as well as possible explanations for non-conserved contacts.

This repository contains the following files:

  • DataAnalysisNotebook.ipynb is a Jupyter notebook that reproduces the contact conservation analysis and all figures from our manuscript
  • 2022-02-21-PDB.csv contains data from the PDB about 3D structures of complexes containing interacting protein and RNA chains (PDB structure identifier, chain identifiers, experimental technique and resolution)
  • 2022-02-21-PDB_proteinchainscontactingRNAchains.groupbp.tsv contains more detailed information about interacting protein and RNA chains from these complexes (PDB and chain identifiers, protein and RNA size, interface size and number of contacts)
  • 2022-02-21-PDB_proteinchainscontactingRNAchains.groupbp.txt.selectXE_2.50_p30_r10_pi5_ri5_rep_bc-100.out_RNAcl_0.99.tsv contains the same detailed information, restricted to the filtered dataset used as a starting point in our interolog search pipeline
  • PDBinterfaceAlign.csv contains information about the structural alignment of pairs of protein-RNA interactions (structural alignment TM-scores, sequence identity and coverage)
  • DataInterologsParam.tsv contains information about a pre-filtered set of 2587 potential interologs (including interface RMSD, sequence identity and coverage and interface size)
  • DataInterologsContacts.tsv contains detailed information about conserved and non-conserved contacts in the final set of 2022 interologs (atomic contacts, apolar contacts, hydrogen bonds, salt bridges and stacking information for amino acid-nucleotide pairs, as well as whether each belongs to the interface, secondary structures, and amino acid surface accessibility and evolutionary conservation metrics)
  • DataCons.csv contains precomputed contact conservation metrics for each of the 2022 interolog pairs, for fast reproduction of manuscript figures
  • ListeIntraHbonds.pkl and ListeIntraSaltBridges.pkl are pickle-format data files containing intra-molecular hydrogen bonds and salt bridges, respectively, which are used to analyse compensation scenarios for non-conserved polar contacts
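
The files listed above can be opened outside the notebook with the Python standard library alone. The following is a minimal sketch, not the loading code used in DataAnalysisNotebook.ipynb. It assumes that the .tsv files are tab-separated, that the .csv files are comma-separated, and that every table has a header row. The helper names `load_table` and `load_pickle` and the download directory are hypothetical, and no column names are hard-coded because none are documented here.

```python
import csv
import pickle
from pathlib import Path

# Assumption: delimiter follows the file extension, and each file has a header row.
DELIMITERS = {".tsv": "\t", ".csv": ","}


def load_table(path):
    """Read a delimited text file into a list of dicts keyed by header name."""
    path = Path(path)
    delimiter = DELIMITERS[path.suffix.lower()]
    with path.open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def load_pickle(path):
    """Load a pickle file. Only unpickle files obtained from a trusted source,
    because unpickling can execute arbitrary code."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    data_dir = Path(".")  # hypothetical location of the downloaded files
    for name in ("DataInterologsParam.tsv", "DataCons.csv"):
        target = data_dir / name
        if target.exists():
            rows = load_table(target)
            columns = list(rows[0].keys()) if rows else []
            print(f"{name}: {len(rows)} rows, columns: {columns}")
    for name in ("ListeIntraHbonds.pkl", "ListeIntraSaltBridges.pkl"):
        target = data_dir / name
        if target.exists():
            print(f"{name}: {type(load_pickle(target)).__name__}")
```

All values come back as strings, so numeric columns need an explicit conversion before analysis. Printing the column names and object types first is a cheap way to inspect the files before relying on any particular structure.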

Files


Files (156.3 MB)

Checksum (size)
  • md5:e96b56f90b78abdb3711cd13b0f73484 (3.6 MB)
  • md5:929d3f5cd3e5bad43b7c8ce6e76ddeff (5.5 MB)
  • md5:e5f0b0e3c731a055ced645698227f26d (1.4 MB)
  • md5:91687c2a2570f65fc67ba9d1253c2391 (110.6 kB)
  • md5:e4968d6b5dd4f566ba48dd6c5dbb6d9e (459.3 kB)
  • md5:96bf65939557f47e225680675038f68b (104.8 MB)
  • md5:845d5d2d07b27953f3e88c2f61774cc0 (273.6 kB)
  • md5:f82b42d5ef60af52a7f96f68eda639f0 (2.7 MB)
  • md5:87871d5ae2fbd122bf71746671d9277b (157.7 kB)
  • md5:000672f6e649cc335511a14c0b57d9da (37.2 MB)
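
A downloaded file can be checked against its listed MD5 checksum before use. The sketch below uses only `hashlib` from the Python standard library. The function names `md5_of` and `matches_listing` are hypothetical, and the caller supplies both the local path and the checksum string copied from the listing, since the pairing of checksums with file names is not shown here.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks
    so that large files do not need to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path, listed):
    """Compare a file's MD5 digest with a listed value such as 'md5:<hex>'.
    The 'md5:' prefix is optional and the comparison ignores case."""
    expected = listed.split(":", 1)[-1].strip().lower()
    return md5_of(path) == expected
```

For example, `matches_listing(local_path, "md5:<hex digest from the listing>")` returns `True` only if the download is byte-for-byte identical to the published file. A mismatch usually points to an incomplete download or to a file from a different version of the record.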

Additional details

Funding

Agence Nationale de la Recherche
ESPRINet: Integrating heterogeneous Evolutionary, Structural and Omics data to predict Protein-RNA Interaction Networks (grant ANR-18-CE45-0005)