D2Deep: Combining evolution and protein language models for cancer driver mutation prediction
Authors/Creators
- 1. Interuniversity Institute of Bioinformatics, Vrije Universiteit Brussel (VUB) - Université Libre de Bruxelles (ULB)
- 2. Interuniversity Institute of Bioinformatics, Vrije Universiteit Brussel (VUB) - Université Libre de Bruxelles (ULB), Brussels Interuniversity Genomics High Throughput Core (BRIGHTcore), VUB-ULB
- 3. Interuniversity Institute of Bioinformatics, Vrije Universiteit Brussel (VUB) - Université Libre de Bruxelles (ULB), Structural Biology Brussels, VUB
Description
Datasets containing the predictions, training data and validation data for the D2Deep predictor:
- D2Deep_predictions: D2Deep predictions for mutations in the cancer driver proteins included in the Next Generation Sequencing (NGS) panels for haematological and solid tumour biopsies from the Compermed Guidelines (https://www.compermed.be/en/guidelines)
- common_variants: Common variants from the gnomAD database (December 2022)
- dbSNP: Single nucleotide polymorphisms (SNPs) from the Single Nucleotide Polymorphism database (dbSNP)
- humsavar_benign_mutations: UniProtKB/Swiss-Prot human missense variants - release 21st December 2021
- clinvar_benign_deleterious_missense: ClinVar missense variants (March 2023)
- Tier.csv: Tier 1, 2 and 3 missense mutations from the Catalogue of Somatic Mutations in Cancer (COSMIC Cancer Mutation Census, release v92)
- cgi.csv: Oncogenic missense mutations from the Cancer Genome Interpreter (2018 release)
- Balanced_training_set: Pathogenic/benign set, balanced at the gene level, used for training the model
- log_probWT_MUT_Tier1_2_3_common_balanced+-2_2200AA_57maxpool: Training set features used for model training
- DMS_mutations: Deep Mutational Scanning mutations used for validation (2021 - https://doi.org/10.15252/msb.202110305)
- DRGN_testset: DRGN test set used for validation
- clinvar_balanced_somatic_germline_missense: ClinVar somatic versus germline subset used for validation (March 2023)
- 5genes_clinvarlabels_D2D_confidence_all: Performance of 6 predictors on mutations in 5 cancer genes (March 2023)
- TP53_expert_multiple_single_submitters, BRAF_expert_multiple_single_submitters, CHEK2_expert_multiple_single_submitters, AR_expert_multiple_single_submitters, PTEN_expert_multiple_single_submitters: ClinVar labels grouped by review status (Practice guideline, Expert panel, Multiple submitters, Single submitter) (March 2023)
- all_msas: MMseqs2 multiple sequence alignments for the proteins used
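The Balanced_training_set entry is described only as balanced between pathogenic and benign variants at the gene level. One plausible reading is that each gene contributes equal numbers of pathogenic and benign variants. The Python sketch below implements that reading by random per-gene downsampling. It is an assumption for illustration, not the authors' documented procedure, and `balance_per_gene` and its `(gene, variant_id, label)` input format are hypothetical.

```python
import random
from collections import defaultdict


def balance_per_gene(variants, seed=0):
    """Hypothetical per-gene balancing (an assumed reading of "balanced on gene level").

    variants: iterable of (gene, variant_id, label) tuples, label in {"pathogenic", "benign"}.
    Each gene keeps min(#pathogenic, #benign) variants of each label, chosen at random,
    so genes that lack one of the two labels contribute nothing.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible subsample
    by_gene = defaultdict(lambda: {"pathogenic": [], "benign": []})
    for gene, variant_id, label in variants:
        by_gene[gene][label].append((gene, variant_id, label))

    balanced = []
    for gene in sorted(by_gene):
        groups = by_gene[gene]
        n = min(len(groups["pathogenic"]), len(groups["benign"]))
        for label in ("pathogenic", "benign"):
            balanced.extend(rng.sample(groups[label], n))
    return balanced
```

Balancing within each gene, rather than only globally, keeps a model from scoring mutations by gene identity alone (for example, calling every variant in a well-known driver gene pathogenic).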
------------------------------------------------------------------------------------------------------------------------
You can use our web server to query protein mutations and explore the interactive visualizations: https://tumorscope.be/d2deep/
------------------------------------------------------------------------------------------------------------------------
Files (4.0 GB total)

| MD5 checksum | Size |
|---|---|
| 919080f4dac8c6677e490e1360c04fd9 | 830.5 kB |
| ac819b79633c2fefc2198de4a4a86547 | 1.8 GB |
| 26d48582b3d1b69d61f813b1d720d843 | 49.3 kB |
| 049368b441af5244a6ef1f352ee9a2ff | 19.5 MB |
| d27a4fc868aeeb91d8ee3db04b28f32a | 105.9 kB |
| 0dd49c42ab920c18fb29c9063bd13741 | 4.8 MB |
| 2f9367e8c204ba1b00f64e46e911d26c | 473.3 kB |
| 5d64ca6960fa2f4c3e96b2b27e8ffd2b | 699.1 kB |
| b916e6f7e7bdb6f0291559d0abf2992d | 209.8 MB |
| b2da8b66bc78e2de3852090288e6b5fe | 121.4 MB |
| 36f60a979aeda278427760f5364f12e5 | 1.3 GB |
| 324e3dc3bb177f7ed1e297a36a5e6255 | 189.3 MB |
| e693d17f6f007b149d86276517d3c4b7 | 65.5 MB |
| 6d826c1e3dfcd7f0e3c53cbf6f7039ef | 8.5 MB |
| 6210ffa61683772262a82e0d543b4aa4 | 94.0 MB |
| 60a98831700178c5c4a5e41405163e68 | 125.5 MB |
| f189158eface9fe30cb6af814757aebf | 221.9 kB |
| 8ac5578683479438f517c4379251fc2a | 8.6 MB |
| b76a65658b954310e6cf684805760283 | 272.8 kB |
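Each file is listed with an MD5 checksum. A minimal Python sketch for checking a download against its listed checksum follows. The helpers `md5sum` and `verify` are hypothetical and are not distributed with this record. Files are read in 1 MiB chunks, so even the multi-gigabyte files never need to fit in memory.

```python
import hashlib
from pathlib import Path


def md5sum(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, read in chunks (1 MiB by default)."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        # iter() with a b"" sentinel keeps reading until end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, listed_checksum):
    """Compare a file with a checksum written as in the file table, e.g. 'md5:<hex>'."""
    expected = listed_checksum.strip().lower().removeprefix("md5:")
    return md5sum(path) == expected
```

Call `verify(local_path, checksum)` with the path of a downloaded file and the checksum from its row. A `False` result indicates a truncated or corrupted download. `str.removeprefix` requires Python 3.9 or later.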
Additional details
Dates
- Available: 2023-11-17 (https://www.biorxiv.org/content/10.1101/2023.11.17.567550v1)