Published April 23, 2026 | Version v1
Dataset | Open access

EVEE: Interpretable variant effect prediction from genomic foundation model embeddings

Description

This dataset contains the precomputed variant effect predictions and interpretability features behind the Evo Variant Effect Explorer (EVEE) web application. It accompanies the preprint "EVEE: Interpretable variant effect prediction from genomic foundation model embeddings" (Pearce et al., 2026, doi:10.64898/2026.04.10.717844).

Each row is one ClinVar variant (4,252,870 in total) and carries its genomic coordinates, gene and consequence annotations, ClinVar clinical significance, and an Evo 2 embedding-based pathogenicity score. Each row also carries roughly 4,900 additional probe outputs, covering protein-level disruption features (InterPro domains, post-translational modifications, secondary structure, active/binding sites, disorder, etc.), regulatory-track predictions (ChromHMM states, ATAC-seq and ChIP-seq peaks across multiple cell types, cCRE annotations), amino-acid and consequence classifiers, and per-variant reference-predictor scores (AlphaMissense, REVEL, CADD, PrimateAI, SpliceAI, and others).

The table is released as five chromosome-balanced Parquet shards (clean_shard_0.parquet through clean_shard_4.parquet, 6.8–7.3 GB each) plus a manifest.json file recording which chromosomes are stored in each shard. All shards can be read as a single logical table with polars.scan_parquet("clean_shard_*.parquet") or duckdb.read_parquet("clean_shard_*.parquet"). This is the exact artifact used to build the EVEE variants.duckdb database served at https://evee.goodfire.ai.

Notes

This is the per-variant flat table (builds/v5/clean.parquet) used to populate the EVEE web app (https://evee.goodfire.ai). It has one row per ClinVar variant; columns include variant_id, gene_name, consequence, ClinVar significance and label, an Evo 2 probe pathogenicity score, and ~4,900 additional probe heads (disruption, effect, and annotation categories). The variants.duckdb database served by the website is derived from this Parquet table by scripts in https://github.com/goodfire-ai/variant-viewer.

Files (35.0 GB)

Name             Size     MD5 checksum
Parquet shard    6.8 GB   6b25a426954b7590c04e93b42bd0624f
Parquet shard    6.8 GB   5f9fe5fb06689b0b7b6f412d86975e05
Parquet shard    7.2 GB   ffbf39a9d047897932bbd4c7d43c05a2
Parquet shard    7.0 GB   4714443ef1ac68edf7cc92fb49587df7
Parquet shard    7.3 GB   fa19cce9b5d64235eedce017efb62074
manifest.json    1.4 kB   9eae4cd35bb87ce89b7ebdef8735ebe2
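Each file is listed with an MD5 checksum. A minimal sketch for checking a download against those values with Python's standard hashlib module; the helper names md5sum and verify are illustrative. Files are read in 1 MiB chunks so that a multi-gigabyte shard is never held in memory at once.

```python
import hashlib


def md5sum(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, streaming it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5: str) -> bool:
    """Compare a file's MD5 with the checksum listed for it (case-insensitive)."""
    return md5sum(path) == expected_md5.strip().lower()
```

For instance, verify("manifest.json", "9eae4cd35bb87ce89b7ebdef8735ebe2") should return True for an intact copy of the 1.4 kB manifest; the same call works unchanged for each shard and its listed checksum.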

Additional details

Related works

Is supplement to: Preprint, doi:10.64898/2026.04.10.717844

Dates

Available: 2026-04-22