Published August 24, 2018 | Version v1
Dataset
Open
Sensitivity Datasets - Leveraging Implicit Knowledge in Neural Networks for Functional Dissection and Engineering of Proteins
Authors/Creators
- 1. Synthetic Biology Group, Institute for Pharmacy and Biotechnology (IPMB) and Center for Quantitative Analysis of Molecular and Cellular Biosystems (BioQuant), University of Heidelberg, Heidelberg, 69120, Germany; Digital Health Center, Berlin Institute of Health (BIH) and Charité University Medicine, Berlin, 10117, Germany
- 2. Synthetic Biology Group, Institute for Pharmacy and Biotechnology (IPMB) and Center for Quantitative Analysis of Molecular and Cellular Biosystems (BioQuant), University of Heidelberg, Heidelberg, 69120, Germany
- 3. Molecular Epidemiology Unit, Berlin Institute of Health (BIH) and Charité University Medicine, Berlin, 10117, Germany
- 4. Digital Health Center, Berlin Institute of Health (BIH) and Charité University Medicine, Berlin, 10117, Germany; Health Data Science Unit, University Hospital Heidelberg, Heidelberg, 69120, Germany
Description
Leveraging Implicit Knowledge in Neural Networks for Functional Dissection and Engineering of Proteins
The Sensitivity datasets cover more than 2000 proteins. They are uploaded as a single tar.gz archive that contains three directories and one CSV file:
- mean_pdb/ contains the proteins used to analyze the sphere variances, the correlation with information content, the correlations between GO terms (Figure 2c-e) and the ligand binding (Figure 3a-c, Supplementary Figure 3).
- mean_examples/ contains the proteins used for inferring the protein-receptor hybrids by the Hahn lab (Figure 5).
- with_biological_activity/ contains the ERK2 data (Figure 3), the spCas9 data (Supplementary Figure 5) and the AcrIIA4 data (Figure 6).
- binding_activities.csv contains the PDB identifiers and ligand descriptions for Figure 3 and Supplementary Figure 3, and is needed for distance_to_ligand.py.
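As a quick check that a download matches the layout described above, the archive members can be counted per top-level entry. This is a minimal sketch using only the Python standard library. The member names in the demo archive are placeholders invented for illustration, not actual file names from the dataset.

```python
import io
import tarfile
from collections import Counter


def count_members_by_top_level(archive) -> Counter:
    """Count regular files under each top-level entry of a tar.gz stream."""
    counts = Counter()
    with tarfile.open(fileobj=archive, mode="r:gz") as tar:
        for member in tar:
            if not member.isfile():
                continue
            # Drop empty and "." path components so "./mean_pdb/x" and
            # "mean_pdb/x" are counted under the same top-level entry.
            parts = [p for p in member.name.split("/") if p not in ("", ".")]
            counts[parts[0]] += 1
    return counts


def build_demo_archive() -> io.BytesIO:
    """Build a tiny in-memory archive with placeholder names (illustrative)."""
    names = (
        "mean_pdb/a.txt",
        "mean_pdb/b.txt",
        "mean_examples/c.txt",
        "with_biological_activity/d.txt",
        "binding_activities.csv",
    )
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name in names:
            payload = b"placeholder\n"
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


demo_counts = count_members_by_top_level(build_demo_archive())
print(dict(demo_counts))
```

For the real download, the same function can be given a file opened in binary mode, for example `open(path, "rb")`, where `path` points to the downloaded archive.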
Important
- The biological activity data for ERK2 is from Brenan et al.1: Supplementary Table 1. We used column 'dox_average', here 'mean_dox_average'.
- The biological activity data for spCas9 is from Oakes et al.2: Supplementary Table 2. We used column 'fold_change', log2-transformed, here 'mean_log2_fold_change'.
- The sequences and secondary structure information were downloaded from the RCSB Protein Data Bank and are available here: https://cdn.rcsb.org/etl/kabschSander/ss_dis.txt.gz This URL is listed, with some explanation, at http://www.rcsb.org/pdb/static.do?p=download/http/index.html
- The secondary structure annotation relies on the DSSP algorithm by Kabsch and Sander3.
The files are tab-separated and contain the following columns:
- Pos Position in the sequence, starting from zero
- AA Amino acid at that position
- sec Secondary structure as annotated in the RCSB Protein Data Bank
- dis Marks regions that have not been experimentally observed (this sometimes explains mismatches with crystal structures)
- GO:_______ Sensitivity for that GO term
- svar_GO:_______ Sphere variance of the sensitivity for that GO term
- ic Information content, based on the Pfam seed alignment
- svar_n_neighbours Number of residues in the sphere used to calculate the sphere variance
- svar_d_center Distance to the center of mass of the chain that was analyzed
- Other columns hold biological activity data and depend on the source
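The column layout above can be read with Python's standard `csv` module. The sketch below assumes each file starts with a header row carrying the column names listed above. The sample table, its values and the chosen GO term column are synthetic and only illustrate the structure. The helper names are hypothetical.

```python
import csv
import io

# Synthetic three-residue table; all values are invented for illustration.
SAMPLE = (
    "Pos\tAA\tsec\tdis\tGO:0005524\tsvar_GO:0005524\tic\n"
    "0\tM\t-\tX\t0.10\t0.01\t1.5\n"
    "1\tK\tH\t-\t0.75\t0.02\t3.2\n"
    "2\tL\tH\t-\t0.40\t0.03\t2.1\n"
)


def read_sensitivity_table(handle):
    """Read a tab-separated table and return (rows, sensitivity columns)."""
    reader = csv.DictReader(handle, delimiter="\t")
    rows = list(reader)
    # Per-GO-term sensitivities start with "GO:"; their sphere variances
    # start with "svar_GO:" and are therefore not matched here.
    go_columns = [c for c in (reader.fieldnames or []) if c.startswith("GO:")]
    return rows, go_columns


def position_with_max_value(rows, column):
    """Return (Pos, AA) of the row with the largest value in `column`."""
    best = max(rows, key=lambda row: float(row[column]))
    return int(best["Pos"]), best["AA"]


rows, go_columns = read_sensitivity_table(io.StringIO(SAMPLE))
top = position_with_max_value(rows, go_columns[0])
print(go_columns, top)
```

All values arrive as strings from `csv.DictReader`, so numeric columns are converted explicitly before comparison.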
References
- 1. Brenan, L. et al. Phenotypic Characterization of a Comprehensive Set of MAPK1/ERK2 Missense Mutants. Cell Rep 17, 1171-1183, doi:10.1016/j.celrep.2016.09.061 (2016).
- 2. Oakes, B. L. et al. Profiling of engineering hotspots identifies an allosteric CRISPR-Cas9 switch. Nat Biotechnol 34, 646-651, doi:10.1038/nbt.3528 (2016).
- 3. Kabsch, W. & Sander, C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22, 2577-2637, doi:10.1002/bip.360221211 (1983).
Files
- Size: 110.5 MB
- md5: 574f7efa1ae2eb2c363b68cb041e2b93
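The md5 checksum listed for the archive can be used to verify a download. The following is a minimal sketch with `hashlib`. The function names are hypothetical, and the demo hashes a short in-memory byte string rather than the real archive.

```python
import hashlib
import io

# Checksum as listed in the file table of this record.
EXPECTED_MD5 = "574f7efa1ae2eb2c363b68cb041e2b93"


def md5_of_stream(stream, chunk_size=1 << 20):
    """Compute the md5 hex digest of a binary stream, reading in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_expected(stream, expected=EXPECTED_MD5):
    """Return True if the stream's md5 equals the expected hex digest."""
    return md5_of_stream(stream) == expected.lower()


# Demo on in-memory bytes; a real check would pass open(path, "rb").
demo_digest = md5_of_stream(io.BytesIO(b"hello"))
demo_ok = matches_expected(io.BytesIO(b"hello"))
print(demo_digest, demo_ok)
```

Reading in chunks keeps memory use low for an archive of this size.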
Additional details
Related works
- Is compiled by
- https://github.com/juzb/DeeProtein (URL)
- 10.5281/zenodo.1402828 (DOI)