There is a newer version of the record available.

Published August 24, 2018 | Version v1

Sensitivity Datasets - Leveraging Implicit Knowledge in Neural Networks for Functional Dissection and Engineering of Proteins

  • 1. Synthetic Biology Group, Institute for Pharmacy and Biotechnology (IPMB) and Center for Quantitative Analysis of Molecular and Cellular Biosystems (BioQuant), University of Heidelberg, Heidelberg, 69120, Germany; Digital Health Center, Berlin Institute of Health (BIH) and Charité University Medicine, Berlin, 10117, Germany
  • 2. Synthetic Biology Group, Institute for Pharmacy and Biotechnology (IPMB) and Center for Quantitative Analysis of Molecular and Cellular Biosystems (BioQuant), University of Heidelberg, Heidelberg, 69120, Germany
  • 3. Molecular Epidemiology Unit, Berlin Institute of Health (BIH) and Charité University Medicine, Berlin, 10117, Germany
  • 4. Digital Health Center, Berlin Institute of Health (BIH) and Charité University Medicine, Berlin, 10117, Germany; Health Data Science Unit, University Hospital Heidelberg, Heidelberg, 69120, Germany

Description

Leveraging Implicit Knowledge in Neural Networks for Functional Dissection and Engineering of Proteins

The Sensitivity datasets cover more than 2000 proteins.

They are uploaded as a single tar.gz archive, organized into three separate directories.

  • mean_pdb/ contains the proteins used to analyze the sphere variances, the correlation with information content, the correlations between GO terms (Figure 2c-e) and the ligand binding (Figure 3a-c, Supplementary Figure 3).
  • mean_examples/ contains the proteins used for inferring the protein-receptor hybrids by the Hahn lab (Figure 5)
  • with_biological_activity/ contains the ERK2 data (Figure 3), the spCas9 data (Supplementary Figure 5) and the AcrIIA4 data (Figure 6)
     
  • binding_activities.csv contains the PDB identifiers and ligand descriptions for Figure 3 and Supplementary Figure 3; it is needed by distance_to_ligand.py
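The layout described above can be inspected after unpacking the archive. The following is a minimal sketch: the archive filename and the target directory are hypothetical (the record does not state them), and the helper names are introduced only for illustration.

```python
import tarfile
from pathlib import Path

# Hypothetical names: the record does not state the archive filename.
ARCHIVE = Path("sensitivity_datasets.tar.gz")
TARGET = Path("sensitivity_datasets")

# Directory names as listed in the description.
EXPECTED_DIRS = ("mean_pdb", "mean_examples", "with_biological_activity")


def top_level_entries(names):
    """Return the sorted first path components of archive member names."""
    return sorted({Path(n).parts[0] for n in names if Path(n).parts})


def find_dataset_dirs(root):
    """Map each expected directory name to its path below root, if present."""
    root = Path(root)
    found = {}
    for name in EXPECTED_DIRS:
        matches = [p for p in root.rglob(name) if p.is_dir()]
        if matches:
            found[name] = matches[0]
    return found


if __name__ == "__main__" and ARCHIVE.exists():
    with tarfile.open(ARCHIVE, "r:gz") as tar:
        print(top_level_entries(tar.getnames()))
        # Only extract archives from a source you trust.
        tar.extractall(TARGET)
    for name, path in find_dataset_dirs(TARGET).items():
        print(name, "->", path, sum(1 for _ in path.iterdir()), "entries")
```

The search with `rglob` is used because the archive may or may not wrap the three directories in a common parent folder.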

 

Important

  • The biological activity data for ERK2 is from Brenan et al. [1]: Supplementary Table 1. We used column 'dox_average', here 'mean_dox_average'.
  • The biological activity data for spCas9 is from Oakes et al. [2]: Supplementary Table 2. We used column 'fold_change', log2-transformed, here 'mean_log2_fold_change'.
  • The sequences and secondary structure information were downloaded from the RCSB Protein Data Bank and are available here: https://cdn.rcsb.org/etl/kabschSander/ss_dis.txt.gz (the URL is listed, with some explanation, at http://www.rcsb.org/pdb/static.do?p=download/http/index.html)
  • The secondary structure annotation relies on the DSSP algorithm by Kabsch and Sander [3].
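For readers who want to trace the sequence and secondary structure annotation back to the RCSB file, the sketch below parses ss_dis-style records. It assumes a FASTA-like layout with headers of the form `>PDBID:CHAIN:kind`, where kind is `sequence`, `secstr` or `disorder`; this layout is an assumption about the downloaded file, not something stated in this record, and the function names are hypothetical.

```python
import gzip
from collections import defaultdict


def parse_ss_dis(lines):
    """Parse FASTA-like records into {(pdb_id, chain): {kind: string}}.

    Assumed header layout: '>PDBID:CHAIN:kind'.
    """
    records = defaultdict(dict)
    key = kind = None
    chunks = []

    def flush():
        # Store the record collected so far, if any.
        if key is not None:
            records[key][kind] = "".join(chunks)

    for raw in lines:
        # Strip only line endings: spaces may be part of the annotation.
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            flush()
            pdb_id, chain, kind = line[1:].split(":")[:3]
            key = (pdb_id, chain)
            chunks = []
        elif key is not None:
            chunks.append(line)
    flush()
    return dict(records)


def load_ss_dis(path):
    """Read a gzip-compressed ss_dis file from disk."""
    with gzip.open(path, "rt") as handle:
        return parse_ss_dis(handle)
```

Whitespace inside the annotation lines is preserved on purpose, since stripping it would shift the per-residue alignment between the three strings.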

 

The files are tab-separated and contain the following columns:

  • Pos Position in the sequence, starting from zero
  • AA Amino acid in that position
  • sec Secondary structure as annotated in the RCSB Protein Databank
  • dis Marks regions that have not been experimentally observed (this sometimes explains mismatches with crystal structures)
  • GO:_______ Sensitivity for that GO term
  • svar_GO:_______ Sphere variance of the sensitivity for that GO term
  • ic Information content, based on Pfam seed alignment
  • svar_n_neighbours number of residues in the sphere used to calculate the sphere variance
  • svar_d_center Distance to the center of mass of the chain that was analyzed
  • Any other columns hold biological activity data and depend on the source
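The column layout above can be read with Python's standard csv module. This is a minimal sketch under stated assumptions: the helper names are hypothetical, and the GO identifier and all values in the usage example are invented for illustration, not taken from the dataset.

```python
import csv
import io
import math


def read_table(handle):
    """Read one tab-separated sensitivity file into (fieldnames, rows)."""
    reader = csv.DictReader(handle, delimiter="\t")
    fieldnames = reader.fieldnames
    rows = list(reader)
    return fieldnames, rows


def go_terms(fieldnames):
    """Return the sensitivity columns, i.e. those named 'GO:...'."""
    return [name for name in fieldnames if name.startswith("GO:")]


def top_positions(rows, column, n=5):
    """Return the n rows with the highest value in column as (Pos, AA, value)."""
    scored = []
    for row in rows:
        try:
            value = float(row[column])
            pos = int(row["Pos"])
        except (TypeError, ValueError):
            continue  # skip missing or non-numeric entries
        if not math.isnan(value):
            scored.append((value, pos, row["AA"]))
    scored.sort(reverse=True)
    return [(pos, aa, value) for value, pos, aa in scored[:n]]


if __name__ == "__main__":
    # Invented example data following the documented column names.
    sample = (
        "Pos\tAA\tsec\tdis\tGO:0004672\tsvar_GO:0004672\tic\n"
        "0\tM\t\tX\t0.10\t0.01\t1.2\n"
        "1\tA\tH\t-\t0.75\t0.02\t2.0\n"
    )
    fieldnames, rows = read_table(io.StringIO(sample))
    print(go_terms(fieldnames), top_positions(rows, go_terms(fieldnames)[0]))
```

When reading from disk, opening the file with `open(path, newline="")` is the form recommended for the csv module. Note that Pos is zero-based, so it indexes directly into a Python string holding the sequence.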

 

References

  1. Brenan, L. et al. Phenotypic Characterization of a Comprehensive Set of MAPK1/ERK2 Missense Mutants. Cell Rep 17, 1171-1183, doi:10.1016/j.celrep.2016.09.061 (2016).
  2. Oakes, B. L. et al. Profiling of engineering hotspots identifies an allosteric CRISPR-Cas9 switch. Nat Biotechnol 34, 646-651, doi:10.1038/nbt.3528 (2016).
  3. Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577-2637, doi:10.1002/bip.360221211 (1983).

Notes

Authorship Statement: The following are members of the iGEM (International Genetically Engineered Machine) Team Heidelberg 2017: Lukas Adam, Thore Bürgel, Roland Eils, Catharina Gandor, Daniel Heid, Mareike Daniela Hoffmann, Stefan Holderbach, Michael Jendrusch, Marita Klein, Irina Lehmann, Jan Mathony, Dominik Niopek, Pauline Pfuderer, Lukas Platz, Moritz Przybilla, Carolin Schmelas, Max Schwendemann, Julius Upmeier zu Belzen, Max Waldhauer (all from Germany).

Acknowledgements: This work was funded by the Klaus Tschira Foundation, the German Research Council (DFG) and the Federal Ministry of Education and Research (BMBF). We thank Jürgen Quittek and Matthias Niepert (both NEC, Heidelberg) and Thomas Wollmann (IPMB, BioQuant and German Cancer Research Center (DKFZ), Heidelberg) for helpful discussions, and Marc Hemberger (BioQuant, Heidelberg) for support with IT and GPU cluster use.

Files

Files (110.5 MB)

md5:574f7efa1ae2eb2c363b68cb041e2b93
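A downloaded copy can be checked against the md5 checksum listed above. The sketch below uses Python's hashlib; the archive filename is hypothetical, since the record does not state it.

```python
import hashlib
from pathlib import Path

# Checksum as listed in this record.
EXPECTED_MD5 = "574f7efa1ae2eb2c363b68cb041e2b93"

# Hypothetical filename: the record does not state the archive name.
ARCHIVE = Path("sensitivity_datasets.tar.gz")


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the md5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__" and ARCHIVE.exists():
    actual = md5_of_file(ARCHIVE)
    print("checksum OK" if actual == EXPECTED_MD5 else f"mismatch: {actual}")
```

Reading in chunks keeps memory use constant, which matters little at 110.5 MB but costs nothing.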
