Genome-wide AnnotateMissense predictions for 90.6 million hg38 missense variants
Authors/Creators
- 1. The University of Queensland
- 2. The University of Queensland; Baker Heart and Diabetes Institute
Description
This record contains the genome-wide AnnotateMissense prediction database and model-associated output accompanying the manuscript "AnnotateMissense: A Multi-source Annotation and Benchmarking Framework for Genome-wide Missense Pathogenicity Prediction".
AnnotateMissense is a scalable framework for genome-wide missense variant annotation, feature integration, benchmarking, and pathogenicity prediction. Starting from 90,643,830 hg38 missense single-nucleotide variants derived from dbNSFP v5.1/CAGI7 Annotate-All-Missense resources, the workflow integrates ANNOVAR-derived annotations, dbNSFP features, AlphaMissense scores, ESM-derived protein language model features, and engineered biological features.
The uploaded files include:
- variants.duckdb.gz: compressed DuckDB database containing genome-wide missense variant annotations and AnnotateMissense prediction outputs.
- UQ_BioSig_model_Final.tsv.gz: compressed final model-associated prediction/output table.
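The database is distributed gzip-compressed, and a DuckDB client will most likely need the uncompressed file. The sketch below streams the decompression with the Python standard library, so the archive is never held in memory. The function name `gunzip` and the output path are illustrative assumptions, not part of the record, and the uncompressed size is not stated here, so check free disk space first.

```python
import gzip
import shutil


def gunzip(src, dst, chunk_size=16 * 1024 * 1024):
    """Stream-decompress the gzip file `src` into `dst` and return `dst`.

    Hypothetical helper: data is copied in fixed-size chunks, so memory use
    stays bounded regardless of the size of the archive.
    """
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk_size)
    return dst


# Example call (paths are assumptions about where the download was saved):
# gunzip("variants.duckdb.gz", "variants.duckdb")
```

Assuming the `duckdb` Python package is installed, the decompressed file could then be opened with `duckdb.connect("variants.duckdb", read_only=True)`. Table names are not documented in this record, so running `SHOW TABLES` is a reasonable first query.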
The source code and workflow scripts are available at: https://github.com/MuhammadMuneeb007/CAGI7_Annotate_All_Missense
These predictions are intended for research prioritisation, variant triage, and reproducible benchmarking. They are not standalone clinical classifications and should not be used as the sole basis for clinical decision-making.
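The compressed prediction table can be inspected without decompressing it in full. The following minimal sketch reads the header and the first few rows directly from the gzip stream. The column layout is not described in this record, so the code makes no assumption about column names. The helper name `peek_tsv_gz` is hypothetical, and the file is assumed to be UTF-8, tab-delimited, with a header row.

```python
import csv
import gzip
from itertools import islice


def peek_tsv_gz(path, n_rows=5):
    """Return (header, rows) for the first `n_rows` data rows of a gzipped TSV.

    Hypothetical helper: assumes a tab-delimited file whose first line is a
    header. Quote characters are treated as literal text (QUOTE_NONE).
    """
    with gzip.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader, None)
        rows = list(islice(reader, n_rows))
    return header, rows


# Example call (path is an assumption about where the download was saved):
# header, rows = peek_tsv_gz("UQ_BioSig_model_Final.tsv.gz")
# print(header)
```

Only the requested rows are read, so the call returns quickly even on a multi-gigabyte file.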
Files
(22.2 GB)
| Checksum | Size |
|---|---|
| md5:417f2649058118bb3b576e693ba75344 | 1.4 GB |
| md5:578ab4e9c988c2cf78d8aff9f7ceb63b | 20.8 GB |
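For downloads of this size it is worth confirming integrity against the MD5 checksums listed above. The sketch below hashes a file in chunks and compares the result with an expected value. The helper names are hypothetical, and this listing does not show which checksum belongs to which file, so take the expected value from the entry shown next to the file on the record page.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's MD5 with `expected`, accepting an optional 'md5:' prefix."""
    expected = expected.strip().lower()
    if expected.startswith("md5:"):
        expected = expected[len("md5:"):]
    return md5_of_file(path) == expected


# Example call (file name and pairing with the checksum are assumptions):
# matches_checksum("variants.duckdb.gz", "md5:<value from the record page>")
```

MD5 is used here only because it is the checksum the record publishes. It detects corrupted or truncated downloads but is not a security guarantee.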
Additional details
Related works
- Is supplemented by
- Software: https://github.com/MuhammadMuneeb007/CAGI7_Annotate_All_Missense (URL)