EVEE: Interpretable variant effect prediction from genomic foundation model embeddings
Authors/Creators
- Pearce, Michael T.¹
- Dooms, Thomas¹
- Yamamoto, Ryo¹
- Meehl, Joshua²
- Molnar, Carl²
- Bissell, Mark¹
- Hazra, Dron¹
- Fang, Ching¹
- Nguyen, Nam¹
- Anderson, Michael¹
- Osborne, Collin²
- Duffy, Patrick²
- Toomey, Bridget²
- Klee, Eric²
- Myasoedova, Elena²
- Ryu, Alexander J.²
- Ayanian, Shant²
- Korfiatis, Panos²
- Redlon, Matt²
- Jain, Archa¹
- Balsam, Daniel¹
- Wang, Nicholas K.¹

¹ Goodfire
² Mayo Clinic
Description
This dataset contains the precomputed variant effect predictions and interpretability features that power the Evo Variant Effect Explorer (EVEE) web application. It accompanies the preprint "EVEE: Interpretable variant effect prediction from genomic foundation model embeddings" (Pearce et al., 2026, doi:10.64898/2026.04.10.717844).

Each row is one ClinVar variant (4,252,870 in total). Every row carries the variant's genomic coordinates, gene and consequence annotations, ClinVar clinical significance, an Evo 2 embedding-based pathogenicity score, and roughly 4,900 additional probe outputs. These cover protein-level disruption features (InterPro domains, post-translational modifications, secondary structure, active/binding sites, disorder, etc.), regulatory-track predictions (ChromHMM states, ATAC-seq and ChIP-seq peaks across multiple cell types, cCRE annotations), amino-acid and consequence classifiers, and per-variant reference-predictor scores (AlphaMissense, REVEL, CADD, PrimateAI, SpliceAI, and others).

The table is released as five chromosome-balanced Parquet shards (clean_shard_0.parquet through clean_shard_4.parquet, each 6.8–7.3 GB) plus a manifest.json recording which chromosomes are stored in each shard. All shards can be read as a single logical table with polars.scan_parquet("clean_shard_*.parquet") or duckdb.read_parquet("clean_shard_*.parquet"). This is the exact artifact used to build the EVEE variants.duckdb database served at https://evee.goodfire.ai.
Files (35.0 GB)
| Name | MD5 | Size |
|---|---|---|
| Parquet shard | 6b25a426954b7590c04e93b42bd0624f | 6.8 GB |
| Parquet shard | 5f9fe5fb06689b0b7b6f412d86975e05 | 6.8 GB |
| Parquet shard | ffbf39a9d047897932bbd4c7d43c05a2 | 7.2 GB |
| Parquet shard | 4714443ef1ac68edf7cc92fb49587df7 | 7.0 GB |
| Parquet shard | fa19cce9b5d64235eedce017efb62074 | 7.3 GB |
| manifest.json | 9eae4cd35bb87ce89b7ebdef8735ebe2 | 1.4 kB |
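After downloading, the MD5 values above can be used to check that no file was truncated. The sketch below is a minimal standard-library example. Because the table does not pair each shard checksum with a specific shard name, it assumes a membership check (each file's digest must appear among the published values) rather than a per-name lookup. md5sum and verify_downloads are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# MD5 checksums copied from the file table above. Shard names are not paired
# with checksums there, so files are checked for membership in this set.
PUBLISHED_MD5 = {
    "6b25a426954b7590c04e93b42bd0624f",
    "5f9fe5fb06689b0b7b6f412d86975e05",
    "ffbf39a9d047897932bbd4c7d43c05a2",
    "4714443ef1ac68edf7cc92fb49587df7",
    "fa19cce9b5d64235eedce017efb62074",
    "9eae4cd35bb87ce89b7ebdef8735ebe2",  # manifest.json (1.4 kB)
}


def md5sum(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so multi-GB shards never sit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_downloads(directory="."):
    """Map each expected file name to True if it exists and its MD5 is published."""
    d = Path(directory)
    names = sorted(p.name for p in d.glob("clean_shard_*.parquet")) + ["manifest.json"]
    return {
        name: (d / name).exists() and md5sum(d / name) in PUBLISHED_MD5
        for name in names
    }
```

A complete, intact download should map all six names to True. A False entry points to a missing or corrupted file that should be fetched again.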
Additional details
Related works
- Is supplement to: Preprint, 10.64898/2026.04.10.717844 (DOI)
Dates
- Available: 2026-04-22