Datasets and Jupyter notebook for the structural analysis of protein-RNA interface evolution
Authors/Creators
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
Contributors
Project members:
- Université Paris-Saclay, CEA, CNRS, Institute for Integrative Biology of the Cell (I2BC), 91198, Gif-sur-Yvette, France
Description
The present repository contains data and code related to our manuscript "Structural comparison of protein-RNA homologous interfaces reveals widespread overall conservation contrasted with versatility in polar contacts". In the manuscript, we analyze the evolution of protein-RNA interfaces by building a dataset of protein-RNA interologs (homologous interfaces) and exploring how interface contacts are conserved between homologous interfaces, as well as possible explanations for non-conserved contacts.
This repository contains the following files:
- DataAnalysisNotebook.ipynb is a Jupyter notebook that reproduces the contact conservation analysis and all figures from our manuscript
- 2022-02-21-PDB.csv contains data from the PDB about 3D structures of complexes containing interacting protein and RNA chains (PDB structure identifier, chain identifiers, experimental technique and resolution)
- 2022-02-21-PDB_proteinchainscontactingRNAchains.groupbp.tsv contains more detailed information about interacting protein and RNA chains from these complexes (PDB and chain identifiers, protein and RNA size, interface size and number of contacts)
- 2022-02-21-PDB_proteinchainscontactingRNAchains.groupbp.txt.selectXE_2.50_p30_r10_pi5_ri5_rep_bc-100.out_RNAcl_0.99.tsv contains the same detailed information, restricted to the filtered dataset used as a starting point in our interolog search pipeline
- PDBinterfaceAlign.csv contains information about the structural alignment of pairs of protein-RNA interactions (structural alignment TM-scores, sequence identity and coverage)
- DataInterologsParam.tsv contains information about a pre-filtered set of 2587 potential interologs (including interface RMSD, sequence identity, coverage and interface size)
- DataInterologsContacts.tsv contains detailed information about conserved and non-conserved contacts in the final set of 2022 interologs (atomic contacts, apolar contacts, hydrogen bonds, salt bridges and stacking information for amino acid-nucleotide pairs, as well as whether each belongs to the interface, secondary structures, and amino acid surface accessibility and evolutionary conservation metrics)
- DataCons.csv contains precomputed contact conservation metrics for each of the 2022 interolog pairs, for fast reproduction of manuscript figures
- ListeIntraHbonds.pkl and ListeIntraSaltBridges.pkl are pickle-format data files containing intramolecular hydrogen bonds and salt bridges, respectively, which are used to analyse compensation scenarios for non-conserved polar contacts
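The files listed above come in three formats (.csv, .tsv and .pkl). As a minimal sketch of how they might be opened outside the notebook, the helpers below use only the Python standard library. The function names `load_table` and `load_pickle` are hypothetical, and the sketch assumes that .csv files are comma-separated, that .tsv files are tab-separated, and that every table starts with a header row; none of this is confirmed by the description above.

```python
import csv
import pickle
from pathlib import Path


def load_table(path):
    """Read a delimited text file into a list of dicts, one per data row.

    Assumption: .tsv files are tab-separated, anything else is
    comma-separated, and the first row holds the column names.
    """
    path = Path(path)
    delimiter = "\t" if path.suffix == ".tsv" else ","
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter=delimiter))


def load_pickle(path):
    """Load a pickled Python object.

    Unpickling can execute arbitrary code, so only use this on files
    obtained from a trusted source.
    """
    with open(path, "rb") as handle:
        return pickle.load(handle)
```

For instance, `rows = load_table("DataCons.csv")` would return one dictionary per row, with all values as strings. If that file holds one row per interolog pair (an assumption), `len(rows)` would be expected to equal 2022.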
Files
Total size: 156.3 MB
| MD5 checksum | Size |
|---|---|
| e96b56f90b78abdb3711cd13b0f73484 | 3.6 MB |
| 929d3f5cd3e5bad43b7c8ce6e76ddeff | 5.5 MB |
| e5f0b0e3c731a055ced645698227f26d | 1.4 MB |
| 91687c2a2570f65fc67ba9d1253c2391 | 110.6 kB |
| e4968d6b5dd4f566ba48dd6c5dbb6d9e | 459.3 kB |
| 96bf65939557f47e225680675038f68b | 104.8 MB |
| 845d5d2d07b27953f3e88c2f61774cc0 | 273.6 kB |
| f82b42d5ef60af52a7f96f68eda639f0 | 2.7 MB |
| 87871d5ae2fbd122bf71746671d9277b | 157.7 kB |
| 000672f6e649cc335511a14c0b57d9da | 37.2 MB |
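
The MD5 checksums listed above can be used to check that a downloaded file is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `matches_checksum` are hypothetical, and the expected value may be passed with or without the `md5:` prefix.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 digest with an expected value.

    The expected value may carry an optional "md5:" prefix.
    """
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected
```

A call such as `matches_checksum(local_path, "md5:...")` returns `True` only when the local copy matches the published digest. Which checksum belongs to which file has to be read from the record page itself.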
Additional details
Funding
- Agence Nationale de la Recherche
- ESPRINet: Integrating heterogeneous Evolutionary, Structural and Omics data to predict Protein-RNA Interaction Networks (grant ANR-18-CE45-0005)