rbLEC - restricted backbone Local Euler Characteristic - from CATH database
Description
-----------------------------------------------------------------------------------------------------------------------------------
Author: Rodrigo A. Moreira (C) 2023
https://orcid.org/0000-0002-7605-8722
LICENSE: CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/)
----------------------------------------------------------------------------------------------------------------------------------
rbLEC - Local Euler Characteristics - from CATH database
----------------------------------------------------------------------------------------------------------------------------------
A. rbLEC NETWORK
[I] The network for each PDB[1] structure is defined by taking the PDB atoms N, CA and C of each residue as the nodes of a graph G.
[II] An edge of G is set if the distance between two atoms in [I] is greater than 2.0 Angstroms.
[III] The graph G is stored in the files with extension ".network_backboneRE_heavy_gt2".
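The rules [I] and [II] can be sketched in Python as follows. This is only an illustration, not the pdb2network.lua script shipped with the dataset: the function names are hypothetical, the input is assumed to use standard fixed-column PDB ATOM records, and the optional upper bound max_dist is an assumption (suggested by the "cutoff" column of the ".label" files), not something stated in A. With max_dist=None only the stated lower bound of 2.0 Angstroms is applied.

```python
import math

BACKBONE = ("N", "CA", "C")  # atoms used as graph nodes, rule [I]


def backbone_nodes(pdb_lines):
    """Return [(resid, atom_name, (x, y, z)), ...] for backbone N, CA, C atoms.

    Assumes standard fixed-column PDB ATOM records.
    """
    nodes = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        if name in BACKBONE:
            resid = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            nodes.append((resid, name, xyz))
    return nodes


def build_edges(nodes, min_dist=2.0, max_dist=None):
    """Return index pairs (i, j), i < j, whose distance is > min_dist (rule [II]).

    max_dist is a hypothetical upper bound; it is NOT part of the stated rule.
    """
    edges = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            d = math.dist(nodes[i][2], nodes[j][2])
            if d > min_dist and (max_dist is None or d <= max_dist):
                edges.append((i, j))
    return edges
```

The double loop is quadratic in the number of atoms, which is acceptable for a single domain but is not tuned for processing the whole database.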
Equation (1) [6,7]
\begin{equation}
\chi = \sum_{k=1}^{N} \kappa_k = \sum_{k=1}^{N} \underbrace{ \left(1 + \sum_{l=1}^{\infty} (-1)^{l} \frac{v_{l-1}}{l+1} \right)_{k}}_{\kappa_k}
\end{equation}
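As a worked illustration of Equation (1), the sketch below computes the kappas of a small graph. It follows the reading of [7] in which v_{l-1} at vertex k counts the complete subgraphs with l+1 vertices that contain k, so that every clique with n vertices contributes (-1)^(n-1)/n to each of its vertices. The function names are hypothetical and this is not the lec.py implementation; the brute-force clique enumeration is only meant for small graphs.

```python
from fractions import Fraction


def adjacency(n_nodes, edges):
    """Build a dict {vertex: set(neighbours)} from an edge list."""
    adj = {v: set() for v in range(n_nodes)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def all_cliques(adj):
    """Yield every non-empty clique once, as a tuple of increasing vertices."""
    def extend(clique, candidates):
        yield clique
        for idx, v in enumerate(candidates):
            # keep only later candidates that are also adjacent to v
            new_cands = [u for u in candidates[idx + 1:] if u in adj[v]]
            yield from extend(clique + (v,), new_cands)

    for clique in extend((), sorted(adj)):
        if clique:
            yield clique


def knill_curvature(adj):
    """Return {vertex: kappa} following Equation (1)."""
    kappa = {v: Fraction(0) for v in adj}
    for clique in all_cliques(adj):
        n = len(clique)
        for v in clique:
            kappa[v] += Fraction((-1) ** (n - 1), n)
    return kappa


# Triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
adj = adjacency(4, [(0, 1), (1, 2), (0, 2), (2, 3)])
kappa = knill_curvature(adj)
chi = sum(kappa.values())  # 4 vertices - 4 edges + 1 triangle = 1
```

For this graph the kappas are 1/3, 1/3, -1/6 and 1/2, and their sum recovers the Euler characteristic chi = 1 of the clique complex.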
Equation (2)
\begin{equation}
LEC = \sum_{m \in R} \kappa_m = \kappa_{N} + \kappa_{CA} + \kappa_{C}
\end{equation}
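Equation (2) amounts to grouping the kappas by residue and adding them up. A minimal sketch, assuming each node carries a (resid, atom name) label in the same order as the kappas; the function name and the numbers below are made up for illustration and are not taken from the dataset.

```python
from collections import defaultdict


def residue_lec(node_labels, kappas):
    """Sum the kappas of the N, CA and C nodes of each residue (Equation (2)).

    node_labels: list of (resid, atom_name) tuples, one per graph node.
    kappas:      list of kappa values in the same node order.
    """
    lec = defaultdict(float)
    for (resid, _atom), kappa in zip(node_labels, kappas):
        lec[resid] += kappa
    return dict(lec)


# Illustrative values only.
labels = [(1, "N"), (1, "CA"), (1, "C"), (2, "N"), (2, "CA"), (2, "C")]
kappas = [0.5, -0.25, 0.0, 1.0, -0.5, -0.5]
lec = residue_lec(labels, kappas)
```

Here residue 1 gets LEC = 0.5 - 0.25 + 0.0 = 0.25 and residue 2 gets LEC = 1.0 - 0.5 - 0.5 = 0.0.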
B. FILENAME EXTENSIONS
B.1 Basic files
".fixed"
PDB file obtained by applying pdbfixer[2] to structures from the CATH database.
".dssp"
Output of DSSP[3] software
".stride"
Output of STRIDE[4] software
B.2 Data files
".network_backboneRE_heavy_gt2" - Generated by D.2 below.
Describes the network graph, as defined in A. above.
".knill_curvature" - Generated by D.1 below.
Contains the filtration of kappas for each vertex of the network.
".residues_curvature" - Generated by D.1 below.
Contains the filtration of the LEC, Equation (2) above, for each residue, namely the sum of the 3 kappas from the respective ".knill_curvature" file, corresponding to the PDB atoms N, CA and C, as described in A. above.
".label" - Generated by D.3 below.
Extra file for easier assessment of structures. It holds the same LEC information as the respective ".residues_curvature" file, merged with the classes from the ".dssp" and ".stride" files as well as the residue name and residue ID for each molecule.
Format of columns:
cutoff resname resid DSSP_class STRIDE_class LEC
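A record with these six columns could be read along the following lines. This is a sketch under assumptions: fields are taken to be whitespace-separated with one record per line, the numeric types are guessed from the column names, and the sample line is invented rather than copied from a real ".label" file. LabelRow and parse_label_line are hypothetical names.

```python
from typing import NamedTuple


class LabelRow(NamedTuple):
    cutoff: float   # filtration cutoff
    resname: str    # residue name
    resid: int      # residue ID
    dssp: str       # DSSP class
    stride: str     # STRIDE class
    lec: float      # LEC of the residue at this cutoff


def parse_label_line(line):
    """Parse one whitespace-separated record (assumed layout)."""
    cutoff, resname, resid, dssp, stride, lec = line.split()
    return LabelRow(float(cutoff), resname, int(resid), dssp, stride, float(lec))


# Invented sample line, for illustration only.
row = parse_label_line("7.0 ALA 12 H H -0.5")
```

If a class field can be blank in the real files, a plain split() would miscount the columns and the parser would need to be adapted.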
C. FOLDERS
CATH_FIXED (after uncompressing cath_fixed.tar.xz, approximately 13 GB)
Contains the fixed PDBs and LECs from the CATH[5] database.
D. SOFTWARE
D.1 lec.py: computes the kappas in Equation (1) above.
Example usage:
$ python3 lec.py CATH_FIXED/2x0qA02/2x0qA02
It will create the files with extensions ".kappas" and ".relec", which reproduce, respectively, the files with extensions ".knill_curvature" and ".residues_curvature".
D.2 pdb2network.lua: creates the rbLEC network file (number of nodes and edge list) from a PDB file, to be used as input by lec.py.
Example usage:
$ lua pdb2rbLEC.lua CATH_FIXED/2x0qA02/2x0qA02.fixed
Output reproduces the file CATH_FIXED/2x0qA02/2x0qA02.pdb.network_backboneRE_heavy_gt2
D.3 label.lua: creates files with extension '*.label' from the files '*.pdb.stride', '*.pdb.dssp' and '*.pdb.network_backboneRE_heavy_gt2.residues_curvature'.
Example usage:
$ lua label.lua CATH_FIXED/2x0qA02/2x0qA02.pdb
Output reproduces the file CATH_FIXED/2x0qA02/2x0qA02.pdb.network_backboneRE_heavy_gt2.residues_curvature.label
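The file names used in the examples of section D suggest a naming layout per CATH domain, which can be collected in a small helper. This is a sketch: domain_files is a hypothetical name, only the suffixes that appear explicitly in the examples above are included, and the layout is assumed to be the same for every domain.

```python
from pathlib import PurePosixPath


def domain_files(root, domain):
    """Return the expected file paths for one domain, e.g. '2x0qA02'."""
    base = PurePosixPath(root) / domain / domain
    network = f"{base}.pdb.network_backboneRE_heavy_gt2"
    return {
        "fixed": f"{base}.fixed",
        "dssp": f"{base}.pdb.dssp",
        "stride": f"{base}.pdb.stride",
        "network": network,
        "residues_curvature": f"{network}.residues_curvature",
        "label": f"{network}.residues_curvature.label",
    }


paths = domain_files("CATH_FIXED", "2x0qA02")
```

For the domain used in the examples, paths["label"] is CATH_FIXED/2x0qA02/2x0qA02.pdb.network_backboneRE_heavy_gt2.residues_curvature.label.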
REFERENCES
[1] Berman, H., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T., Weissig, H., Shindyalov, I., & Bourne, P. (2000). The protein data bank. Nucleic acids research, 28, 235–42.
[2] Eastman, P., Swails, J., Chodera, J., McGibbon, R., Zhao, Y., Beauchamp, K., Wang, L.P., Simmonett, A., Harrigan, M., Stern, C., & others (2017). OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLoS computational biology, 13(7), e1005659.
[3] Kabsch, W., & Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers: Original Research on Biomolecules, 22(12), 2577–2637.
[4] Frishman, D., & Argos, P. (1995). Knowledge-based protein secondary structure assignment. Proteins: Structure, Function, and Bioinformatics, 23(4), 566–579.
[5] Knudsen, M., & Wiuf, C. (2010). The CATH database. Human genomics, 4(3), 1–6.
[6] Levitt, N. (1992). The Euler characteristic is the unique locally determined numerical homotopy invariant of finite complexes. Discrete & computational geometry, 7, 59–67.
[7] Knill, O. (2011). A graph theoretical Gauss-Bonnet-Chern theorem. arXiv preprint arXiv:1111.5395.