Published September 27, 2023 | Version v1
Dataset Open

rbLEC - restricted backbone Local Euler Characteristic - from CATH database

  • 1. Basque Center for Applied Mathematics

Description

-----------------------------------------------------------------------------------------------------------------------------------

Author: Rodrigo A. Moreira (C) 2023
https://orcid.org/0000-0002-7605-8722
LICENSE: CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/)

----------------------------------------------------------------------------------------------------------------------------------

rbLEC - Local Euler Characteristics - from CATH database

----------------------------------------------------------------------------------------------------------------------------------

A. rbLEC NETWORK

    [I] The network for each PDB[1] structure is a graph G whose nodes are the backbone atoms N, CA and C of each residue.
    [II] An edge of G is set if the distance between two atoms in [I] is greater than 2.0 Angstroms.
    [III] The graph G is stored in the files with extension ".network_backboneRE_heavy_gt2".
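
Rules [I] and [II] can be sketched in Python as follows. This is a minimal illustration, not the pdb2network.lua implementation: it assumes standard fixed-column PDB ATOM records, the function names `backbone_nodes` and `edges` are hypothetical, and the optional `upper` bound is an assumption added for illustration (the text above states only the lower bound of 2.0 Angstroms).

```python
import math
from itertools import combinations

BACKBONE = {"N", "CA", "C"}  # node atoms from rule [I]

def backbone_nodes(pdb_lines):
    """Return [(resid, atom_name, (x, y, z)), ...] for N, CA and C atoms."""
    nodes = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()          # atom name, PDB columns 13-16
        if name in BACKBONE:
            resid = int(line[22:26])        # residue number, columns 23-26
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            nodes.append((resid, name, xyz))
    return nodes

def edges(nodes, lower=2.0, upper=math.inf):
    """Return index pairs (i, j) whose distance d satisfies lower < d <= upper.

    `upper` is a hypothetical parameter; rule [II] only states d > 2.0.
    """
    out = []
    for (i, a), (j, b) in combinations(enumerate(nodes), 2):
        d = math.dist(a[2], b[2])
        if lower < d <= upper:
            out.append((i, j))
    return out
```

With only the lower bound applied, almost every pair of backbone atoms is connected, so an upper cutoff would presumably be varied when building a filtration; that detail is not specified here.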

Equation (1) [6,7]
    \begin{equation}
        \chi = \sum_{k=1}^{N} \kappa_k = \sum_{k=1}^{N} \underbrace{ \left(1 + \sum_{l=1}^{\infty} (-1)^{l} \frac{v_{l-1}}{l+1} \right)_{k}}_{\kappa_k}
    \end{equation}
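
A minimal sketch of how the kappas in Equation (1) could be evaluated, assuming (following Knill [7]) that v_{l-1} counts the complete subgraphs on l vertices among the neighbours of vertex k. The function names are hypothetical and this is not the code of lec.py; the clique enumeration is a plain recursive search and becomes expensive on dense graphs.

```python
from fractions import Fraction

def clique_counts(adj, vertices):
    """Return {size: number of cliques with that many vertices} inside `vertices`."""
    counts = {}

    def extend(size, candidates):
        # Every vertex in `candidates` is adjacent to all members of the
        # current clique; positional order ensures each clique is counted once.
        for i, u in enumerate(candidates):
            counts[size + 1] = counts.get(size + 1, 0) + 1
            extend(size + 1, [w for w in candidates[i + 1:] if w in adj[u]])

    extend(0, list(vertices))
    return counts

def knill_curvature(adj):
    """adj: {vertex: set of neighbours} for an undirected graph without self-loops."""
    kappa = {}
    for v, nbrs in adj.items():
        counts = clique_counts(adj, nbrs)
        # counts[l] plays the role of v_{l-1} in Equation (1)
        kappa[v] = Fraction(1) + sum(
            Fraction((-1) ** l * n, l + 1) for l, n in counts.items()
        )
    return kappa
```

For a triangle, `knill_curvature({0: {1, 2}, 1: {0, 2}, 2: {0, 1}})` gives 1/3 at each vertex (1 - 2/2 + 1/3), and the three values sum to 1, the Euler characteristic of a filled triangle.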

Equation (2)
    \begin{equation}
        LEC = \sum_{m \in R} \kappa_m = \kappa_{N} + \kappa_{CA} + \kappa_{C}
    \end{equation}
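
Equation (2) amounts to grouping the kappas by residue and adding the three backbone values. A minimal sketch, assuming the kappas and the atom identities are already available as dictionaries (both containers and the name `residue_lec` are hypothetical, not taken from lec.py):

```python
from collections import defaultdict

def residue_lec(kappa, atom_info):
    """Sum kappa over the N, CA and C nodes of each residue.

    kappa:     {node: kappa value}
    atom_info: {node: (resid, atom_name)}
    """
    lec = defaultdict(float)
    for node, value in kappa.items():
        resid, name = atom_info[node]
        if name in ("N", "CA", "C"):
            lec[resid] += value
    return dict(lec)
```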

B. FILENAME EXTENSIONS

  B.1 Basic files

".fixed"
    PDB file after use of pdbfixer[2] in structures from CATH database.

".dssp"
    Output of DSSP[3] software

".stride"
    Output of STRIDE[4] software

  B.2 Data files

".network_backboneRE_heavy_gt2" - Generated by D.2 below.
    Describes the network graph defined in A. above.

".knill_curvature" - Generated by D.1 below.
    Contains the filtration of kappas for each vertex of the network.

".residues_curvature" - Generated by D.1 below.
    Contains the filtration of the LEC, Equation (2) above, for each residue, namely the sum of the 3 kappas from the respective ".knill_curvature" file that correspond to the PDB atoms N, CA and C described in A. above.

".label"  - Generated by D.3 below
    Extra file for easier assessment of structures. It holds the same LEC information as the respective ".residues_curvature" file, merged with the classes from ".dssp" and ".stride" as well as the residue name and residue ID for each molecule.
    Format of columns:
        cutoff resname resid DSSP_class STRIDE_class LEC
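
A minimal sketch of reading one row of a ".label" file into named fields. It assumes the six columns are whitespace-separated and never empty, that "cutoff" and "LEC" are real numbers, and that "resid" is an integer; none of this is confirmed by the description above, and `LabelRow` and `parse_label_line` are hypothetical names.

```python
from typing import NamedTuple

class LabelRow(NamedTuple):
    cutoff: float   # filtration cutoff
    resname: str    # residue name
    resid: int      # residue ID
    dssp: str       # DSSP class
    stride: str     # STRIDE class
    lec: float      # LEC value, Equation (2)

def parse_label_line(line):
    """Split one row into the six columns listed above (assumed layout)."""
    cutoff, resname, resid, dssp, stride, lec = line.split()
    return LabelRow(float(cutoff), resname, int(resid), dssp, stride, float(lec))
```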

C. FOLDERS

    CATH_FIXED (after uncompressing cath_fixed.tar.xz, approximately 13 GB)
        contains the fixed PDBs and LECs from the CATH[5] database

D. SOFTWARE
    D.1 lec.py: computes the kappas in Equation (1) above.
        Example usage:
            $ python3 lec.py CATH_FIXED/2x0qA02/2x0qA02
        It creates files with extensions ".kappas" and ".relec", which reproduce the files with extensions ".knill_curvature" and ".residues_curvature", respectively.

    D.2 pdb2network.lua: creates the rbLEC network file (number of nodes and edge list) from a PDB file, to be used as input by lec.py.
        Example usage:
            $ lua pdb2rbLEC.lua CATH_FIXED/2x0qA02/2x0qA02.fixed
        Output reproduces the file CATH_FIXED/2x0qA02/2x0qA02.pdb.network_backboneRE_heavy_gt2

    D.3 label.lua: creates files with extension '*.label' from the files '*.pdb.stride', '*.pdb.dssp' and '*.pdb.network_backboneRE_heavy_gt2.residues_curvature'.
        Example usage:
            $ lua label.lua CATH_FIXED/2x0qA02/2x0qA02.pdb
        Output reproduces the file CATH_FIXED/2x0qA02/2x0qA02.pdb.network_backboneRE_heavy_gt2.residues_curvature.label

REFERENCES
[1] Berman, H., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T., Weissig, H., Shindyalov, I., & Bourne, P. (2000). The protein data bank. Nucleic acids research, 28, 235–42.
[2] Eastman, P., Swails, J., Chodera, J., McGibbon, R., Zhao, Y., Beauchamp, K., Wang, L.P., Simmonett, A., Harrigan, M., Stern, C., & others (2017). OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLoS computational biology, 13(7), e1005659.
[3] Kabsch, W., & Sander, C. (1983). Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers: Original Research on Biomolecules, 22(12), 2577–2637.
[4] Frishman, D., & Argos, P. (1995). Knowledge-based protein secondary structure assignment. Proteins: Structure, Function, and Bioinformatics, 23(4), 566–579.
[5] Knudsen, M., & Wiuf, C. (2010). The CATH database. Human genomics, 4(3), 1–6.
[6] Levitt, N. (1992). The Euler characteristic is the unique locally determined numerical homotopy invariant of finite complexes. Discrete & computational geometry, 7, 59–67.
[7] Knill, O. (2011). A graph theoretical Gauss-Bonnet-Chern theorem. arXiv preprint arXiv:1111.5395.

 

 

Files (1.6 GB)

    Size      MD5
    1.6 GB    bf358307f211e808cd78c290e00a4d01
    7.0 kB    c2d567b6adca86b4fc30f46ef348a644
    1.7 kB    fa983e57013e9e02a4c51c5e5b1c8b24
    13.6 kB   d2b79e51f389d04b18d9d024580b3f78
    3.6 kB    c923786303a0f07ca2b49e9a12eff228
