Published May 2, 2026 | Version 1.0.0
Dataset Open

Genome-wide AnnotateMissense predictions for 90.6 million hg38 missense variants

  • 1. The University of Queensland
  • 2. The University of Queensland; Baker Heart and Diabetes Institute

Description

This record contains the genome-wide AnnotateMissense prediction database and the model output table that accompany the manuscript "AnnotateMissense: A Multi-source Annotation and Benchmarking Framework for Genome-wide Missense Pathogenicity Prediction".

AnnotateMissense is a scalable framework for genome-wide missense variant annotation, feature integration, benchmarking, and pathogenicity prediction. Starting from 90,643,830 hg38 missense single-nucleotide variants derived from dbNSFP v5.1/CAGI7 Annotate-All-Missense resources, the workflow integrates ANNOVAR-derived annotations, dbNSFP features, AlphaMissense scores, ESM-derived protein language model features, and engineered biological features.

The uploaded files include:

  • variants.duckdb.gz: compressed DuckDB database containing genome-wide missense variant annotations and AnnotateMissense prediction outputs.
  • UQ_BioSig_model_Final.tsv.gz: compressed final model-associated prediction/output table.
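Both files are gzip-compressed, so they have to be decompressed (or streamed) before use. The following Python sketch shows one possible way to do that with the standard library. It makes several assumptions that this record does not confirm: that the TSV has a single header row, that the DuckDB file opens with a current `duckdb` Python package, and that the helper names (`gunzip`, `peek_tsv`, `list_tables`) are purely illustrative. The table and column names inside either file are not documented here, so the sketch only inspects them rather than assuming a schema.

```python
import csv
import gzip
import shutil
from itertools import islice


def gunzip(src_path, dst_path, chunk_size=1 << 20):
    """Stream-decompress src_path to dst_path without loading it into memory."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)
    return dst_path


def peek_tsv(gz_path, n_rows=5):
    """Return (header, first n_rows data rows) of a gzipped TSV.

    Assumes the first line is a header row; this is not stated in the record.
    """
    with gzip.open(gz_path, "rt", encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        header = next(reader)
        rows = list(islice(reader, n_rows))
    return header, rows


def list_tables(db_path):
    """List table names in a decompressed DuckDB file.

    Requires the third-party `duckdb` package, imported lazily so the
    other helpers work without it.
    """
    import duckdb

    con = duckdb.connect(db_path, read_only=True)
    try:
        return [row[0] for row in con.execute("SHOW TABLES").fetchall()]
    finally:
        con.close()
```

A typical session might call `peek_tsv("UQ_BioSig_model_Final.tsv.gz")` to see the column names, then `gunzip("variants.duckdb.gz", "variants.duckdb")` followed by `list_tables("variants.duckdb")`. Decompressing the database needs enough free disk space for the uncompressed file, whose size is not given in this record.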

The source code and workflow scripts are available at: https://github.com/MuhammadMuneeb007/CAGI7_Annotate_All_Missense

These predictions are intended for research prioritisation, variant triage, and reproducible benchmarking. They are not standalone clinical classifications and should not be used as the sole basis for clinical decision-making.

Files

Total size: 22.2 GB

  • md5:417f2649058118bb3b576e693ba75344, 1.4 GB
  • md5:578ab4e9c988c2cf78d8aff9f7ceb63b, 20.8 GB
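Because the downloads are large, it may be worth checking each file against the MD5 checksums listed above before decompressing. The sketch below is one way to do this in Python. The listing here does not say which checksum belongs to which file, so the hypothetical helper `matches_listing` only tests whether a file's digest equals either listed value; `md5_of_file` and `LISTED_MD5` are likewise illustrative names.

```python
import hashlib

# Checksums copied from the file listing in this record. Which one belongs
# to which file is not recoverable from the listing, so they are kept as a set.
LISTED_MD5 = {
    "417f2649058118bb3b576e693ba75344",
    "578ab4e9c988c2cf78d8aff9f7ceb63b",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listing(path):
    """Return True if the file's MD5 equals one of the listed checksums."""
    return md5_of_file(path) in LISTED_MD5
```

For example, `matches_listing("variants.duckdb.gz")` returning `False` would suggest an incomplete or corrupted download. MD5 is adequate for detecting transfer errors but is not a safeguard against deliberate tampering.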
