Published September 12, 2024 | Version v3
Journal article
Open
D2Deep: Combining evolution and protein language models for cancer driver mutation prediction
- 1. Interuniversity Institute of Bioinformatics, Vrije Universiteit Brussel (VUB) - Université Libre de Bruxelles (ULB)
- 2. Interuniversity Institute of Bioinformatics, Vrije Universiteit Brussel (VUB) - Université Libre de Bruxelles (ULB), Brussels Interuniversity Genomics High Throughput Core (BRIGHTcore), VUB-ULB
- 3. Interuniversity Institute of Bioinformatics, Vrije Universiteit Brussel (VUB) - Université Libre de Bruxelles (ULB), Structural Biology Brussels, VUB
Description
Datasets containing predictions, training data and validation data for the D2Deep predictor:
- D2Deep_predictions: D2Deep predictions for mutations in cancer driver proteins included in the Next Generation Sequencing (NGS) panel for biopsies of haematological and solid tumours from the Compermed Guidelines (https://www.compermed.be/en/guidelines)
- Features: Epistatic features that integrate evolutionary and co-evolutionary information and can be used to identify short- and long-range effects of mutations within proteins.
- common_variants: Common variants from the gnomAD database (December 2022)
- dbSNP: Single nucleotide polymorphisms (SNPs) from the Single Nucleotide Polymorphism database (dbSNP)
- humsavar_benign_mutations: UniProtKB/Swiss-Prot human missense variants - release 21st December 2021
- clinvar_benign_deleterious_missense: ClinVar missense variants (March 2023)
- Tier.csv: Missense Tier 1, 2 and 3 mutations from the Catalogue of Somatic Mutations in Cancer (COSMIC, Cancer Mutation Census release v92)
- cgi.csv: Missense oncogenic mutations from Cancer Genome Interpreter (release 2018)
- Balanced_training_set: Pathogenic/benign balanced set (on gene level) used for training the model
- log_probWT_MUT_Tier1_2_3_common_balanced+-2_2200AA_57maxpool: Training set features used for model training
- DMS_mutations: Deep Mutational Scanning mutations used for validation (2021 - https://doi.org/10.15252/msb.202110305)
- DRGN_testset: DRGN test set used for validation
- clinvar_balanced_somatic_germline_missense: ClinVar somatic versus germline subset used for validation (March 2023)
- 5genes_clinvarlabels_D2D_confidence_all: Performance of 6 predictors on mutations in 5 cancer genes (March 2023)
- TP53_expert_multiple_single_submitters, BRAF_expert_multiple_single_submitters, CHEK2_expert_multiple_single_submitters, AR_expert_multiple_single_submitters, PTEN_expert_multiple_single_submitters: ClinVar labels with Review status: Practice guideline, Expert panel, Multiple submitters, Single submitter (March 2023)
- all_msas: MMseqs2 Multiple Sequence Alignments for the proteins used
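The record describes Balanced_training_set as balanced between pathogenic and benign variants "on gene level" but does not say how the balancing was done. One possible reading, sketched below purely as an illustration, is to downsample the larger class within each gene so that every gene contributes equal numbers of both labels. The function name, the `(gene, variant, label)` record layout, the label strings and the toy records are all assumptions made for this sketch and are not taken from the D2Deep files.

```python
import random
from collections import defaultdict


def balance_per_gene(records, seed=0):
    """Downsample the majority class within each gene (hypothetical helper).

    `records` is an iterable of (gene, variant, label) tuples, where label is
    "pathogenic" or "benign". This layout is an assumption for illustration,
    not the actual schema of Balanced_training_set.
    """
    rng = random.Random(seed)
    by_gene = defaultdict(lambda: {"pathogenic": [], "benign": []})
    for gene, variant, label in records:
        by_gene[gene][label].append((gene, variant, label))

    balanced = []
    for gene in sorted(by_gene):
        groups = by_gene[gene]
        # Keep as many variants per class as the smaller class provides.
        n = min(len(groups["pathogenic"]), len(groups["benign"]))
        for label in ("pathogenic", "benign"):
            balanced.extend(rng.sample(groups[label], n))
    return balanced


# Invented toy records, only to show the behaviour.
toy_records = [
    ("GENE_A", "A10V", "pathogenic"),
    ("GENE_A", "G20D", "pathogenic"),
    ("GENE_A", "L30P", "pathogenic"),
    ("GENE_A", "S40T", "benign"),
    ("GENE_B", "K5R", "benign"),
    ("GENE_B", "E6Q", "benign"),
]
balanced_toy = balance_per_gene(toy_records)
```

Under this reading, a gene with variants of only one class (GENE_B in the toy data) contributes nothing to the balanced set. Whether the authors dropped, kept or handled such genes differently is not stated in the record.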
------------------------------------------------------------------------------------------------------------------------
You can use our web server to query protein mutations and explore the interactive visualizations: https://tumorscope.be/d2deep/
------------------------------------------------------------------------------------------------------------------------
Files
| Name | Size | Checksum |
|---|---|---|
| features.zip | 5.1 GB | md5:0782ebea212d6650226c380830754f72 |
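Because the archive is large, it can be worth checking the download against the MD5 checksum listed for features.zip before unpacking it. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `verify_download` are illustrative, and the local path `features.zip` assumes the file was saved under its original name in the working directory.

```python
import hashlib
from pathlib import Path

# Checksum as listed on this record for features.zip.
EXPECTED_MD5 = "0782ebea212d6650226c380830754f72"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_MD5):
    """Return True if the file's MD5 digest matches the expected value."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("features.zip")  # assumed local file name
    if archive.exists():
        print("checksum OK" if verify_download(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use flat regardless of archive size. A mismatch usually points to an interrupted or corrupted download, in which case the file should be fetched again.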
Additional details
Dates
- Available: 2023-11-17, https://www.biorxiv.org/content/10.1101/2023.11.17.567550v1