There is a newer version of the record available.

Published February 25, 2022 | Version v1
Dataset Open

AlphScore_final dataset

  • 1. Institute of Human Genetics, University Bonn, and School of Medicine, University Hospital Bonn, Germany
  • 2. Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
  • 3. Institute for Genomic Statistics and Bioinformatics, University Hospital of Bonn, University of Bonn, Bonn, Germany; Institute of Medical Biometry, Informatics and Epidemiology, University Hospital of Bonn, University of Bonn, Bonn, Germany
  • 4. Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany; Institut für Humangenetik, Universität zu Lübeck, Lübeck, Germany

Description

This file contains AlphScore_final as described in our associated publication. The file is based on dbNSFP 4.2a, has a header row, is tab-separated, and is compressed with bgzip. The columns contain the following content:

#chr chromosome (hg38)
pos(1-based) position (hg38)
ref reference allele
alt alternative allele
aaref reference amino acid
aaalt alternative amino acid
rs_dbSNP rs number
hg19_chr chromosome (hg19)
hg19_pos(1-based) position (hg19)
ID variant id in the format: chromosome:position:reference amino acid:alternative amino acid
genename gene name, taken from dbNSFP
Uniprot_acc_split The UniProt IDs of the structural models that were used to create AlphScore_final (multiple entries are separated by ;)
Uniprot_acc Uniprot_acc as provided by dbNSFP
HGVSp_VEP_split The missense variant(s) as used to create AlphScore_final; these variant(s) correspond(s) to the Uniprot_acc_split
HGVSp_VEP HGVSp_VEP as provided by dbNSFP
CADD_raw CADD_raw as provided by dbNSFP
REVEL_score REVEL_score as provided by dbNSFP
DEOGEN2_score DEOGEN2_score as provided by dbNSFP
b_factor AlphaFold's pLDDT score of the residue (if a variant affects multiple proteins, the values for the proteins listed in Uniprot_acc_split are given, separated by ;)
SOLVENT_ACCESSIBILITY_core Solvent accessibility of the residue as calculated for C-alpha by DSSP (if a variant affects multiple proteins, the values for the proteins listed in Uniprot_acc_split are given, separated by ;)
in_gnomad_train TRUE if the variant was in the gnomAD set used for training
in_clinvar_ds TRUE if the variant was in the ClinVar set used for validation / training of combined scores
AlphScore This column corresponds to AlphScore_final
glm_AlphCadd This column corresponds to AlphScore_final + CADD
glm_AlphRevel This column corresponds to AlphScore_final + REVEL
glm_RevelCadd This column corresponds to REVEL + CADD
glm_AlphRevelCadd This column corresponds to AlphScore_final + REVEL + CADD
glm_AlphDeogen This column corresponds to AlphScore_final + DEOGEN2
glm_CaddDeogen This column corresponds to CADD + DEOGEN2
glm_DeogenRevel This column corresponds to DEOGEN2 + REVEL
glm_AlphDeogenRevel This column corresponds to AlphScore_final + DEOGEN2 + REVEL
glm_AlphCaddDeogen This column corresponds to AlphScore_final + CADD + DEOGEN2
glm_CaddDeogenRevel This column corresponds to CADD + DEOGEN2 + REVEL
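
As an illustration of the layout described above, the following is a minimal Python sketch for reading the file and resolving the ;-separated per-protein columns. It is not part of the original record. It assumes that Python's standard gzip module can read the bgzip-compressed file (bgzip output is a sequence of gzip members), that the ;-separated values appear in the same order as the entries of Uniprot_acc_split, and it uses a hypothetical file name. The function names iter_rows and per_protein_values are made up for this example.

```python
import csv
import gzip


def iter_rows(path):
    """Yield one dict per variant, keyed by the column names in the header row."""
    # Assumption: the bgzip-compressed file is readable as ordinary gzip.
    with gzip.open(path, "rt", newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        # Strip stray whitespace around header names, in case any is present.
        reader.fieldnames = [name.strip() for name in reader.fieldnames]
        for row in reader:
            yield row


def per_protein_values(row, column):
    """Pair each entry of Uniprot_acc_split with the matching entry of `column`.

    Intended for the ;-separated columns b_factor and SOLVENT_ACCESSIBILITY_core.
    Values are returned as strings, since the encoding of missing values is not
    specified in the description.
    """
    accessions = row["Uniprot_acc_split"].split(";")
    values = row[column].split(";")
    return list(zip(accessions, values))


# Example use (hypothetical file name; the real file is about 9.4 GB,
# so rows are streamed rather than loaded at once):
#
# for row in iter_rows("AlphScore_final.tsv.gz"):
#     print(row["#chr"], row["pos(1-based)"], row["AlphScore"])
#     print(per_protein_values(row, "b_factor"))
#     break
```

Streaming the rows keeps memory use flat; loading the whole table into memory is unlikely to be practical for a file of this size.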

 

Note that the Creative Commons license applies only to the values of AlphScore_final. REVEL and CADD scores as well as combined scores containing REVEL and CADD are not licensed for commercial use. The full list of references can be found in our manuscript.

Files

Total size: 9.4 GB

Checksum Size
md5:cbb22553cd6976f5a8a6e2333e5a9fed 9.4 GB
md5:a9fe9eae820a54f3c64d0e3372be9564 765.6 kB
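
After downloading, the MD5 checksums listed here can be compared against the local copies. The following Python sketch shows one way to do this with the standard library. The function name md5_of_file and the file name in the usage comment are hypothetical, and the pairing of the first checksum with the 9.4 GB file is inferred from the order of the listing.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte file never sits in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Checksum listed for the 9.4 GB file (pairing assumed from the listing order).
EXPECTED_MD5 = "cbb22553cd6976f5a8a6e2333e5a9fed"

# Example use (hypothetical local file name):
#
# if md5_of_file("AlphScore_final.tsv.gz") != EXPECTED_MD5:
#     raise SystemExit("checksum mismatch: the download may be incomplete")
```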